Compare commits

...

365 Commits

Author SHA1 Message Date
teknium1
762b681423 feat(curator): add hermes curator usage — all-skills usage view
Surfaces the usage_report()/provenance() data layer added in #36701 as a
user-facing CLI command. Unlike `hermes curator status` (scoped to
curator-managed agent-created candidates), `usage` lists every skill on disk
— bundled built-ins and hub-installed included — with per-skill use/view/patch
counts and an agent/bundled/hub provenance tag.

Flags: --sort {activity,recent,name}, --provenance {agent,bundled,hub} filter,
--json for machine-readable output.
2026-06-01 03:05:41 -07:00
Teknium
023149f665 fix(agent): stop reporting broken streams as output-length truncation (#36705)
A stream that drops mid-response after tokens are delivered (peer-closed
connection, stale-stream reconnect) is converted into a synthetic
finish_reason="length" stub. The conversation loop treated that network
stall as a max-output-tokens truncation: when the dropped content was a
tool call it retried exactly once, then hard-failed with "Response
truncated due to output length limit" — even on large-output models that
never hit any cap (e.g. Opus).

- Tool-call truncation now retries up to 3 times (was 1) with a
  progressive max_tokens boost, and is stub-aware: a PARTIAL_STREAM_STUB_ID
  stall prints "Stream interrupted mid tool-call — retrying (n/3)" instead
  of the false "model hit max output tokens", and the give-up message
  distinguishes a network drop from a real truncation.
- Length-continuation retries preserve the original request's output cap
  as a floor, so a high provider/model default isn't silently downshifted
  to 8K/12K on retry.
- Added _requested_output_cap_from_api_kwargs() helper.

Tests: stub-stall mid-tool-call recovery within 3 retries; continuation
preserves a large provider-default output cap.

Fixes #26425. Salvages the substance of #26427 (cap floor) and #9525
(retry bump), adapted to the post-refactor conversation_loop.py which
handles all three api_modes uniformly.

Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: ygd58 <ygd58@users.noreply.github.com>
2026-06-01 03:01:20 -07:00
Teknium
b571ec298d feat(dashboard): full administration panel — MCP, pairing, webhooks, credentials, memory, gateway, ops (#36704)
* feat(dashboard): backend API for MCP, pairing, webhooks, credential pool, memory, gateway lifecycle

Adds REST endpoints so a remote admin can manage these without CLI access:
- MCP servers: list/add/remove/test (config.yaml parity with hermes mcp)
- Pairing: list/approve/revoke/clear-pending messaging codes
- Webhooks: list/subscribe/remove (hot-reloaded JSON store)
- Credential pool: list/add/remove rotation keys (via CredentialPool API)
- Memory provider: status/select/disable/reset
- Gateway lifecycle: start/stop (restart+update already existed)

Secrets redacted on read; usable values only reach the agent at session start.
All endpoints sit behind the existing dashboard auth gate.

* feat(dashboard): backend API for ops + skills hub

- Ops actions (spawned, log-tailed via /api/actions): doctor, security audit,
  backup, import, checkpoints prune
- Ops reads (structured JSON): hooks list + allowlist status, checkpoints list
  with per-session size
- Skills hub actions (spawned): install / uninstall / update
- Registers new action log files for all spawn-based endpoints

All gated by the existing dashboard auth middleware.

* feat(dashboard): admin pages for MCP, pairing, webhooks, and system ops

Adds four new dashboard pages + nav entries so a remote admin can manage
Hermes without CLI access:
- MCP: list/add/remove/test MCP servers
- Webhooks: list/create/delete subscriptions (one-time secret reveal)
- Pairing: approve/revoke/clear messaging pairing codes
- System: gateway start/stop/restart, memory provider + reset, credential
  pool add/remove, ops (doctor/audit/backup/import/skills update) with a
  live action-log viewer, checkpoints prune, shell-hooks status

api.ts: client methods + types for all new endpoints.
App.tsx: routes + sidebar nav (plain labels, no i18n key required).

Verified: tsc -b clean, production build succeeds, new pages lint clean,
zero new eslint errors in App.tsx.

* test(dashboard): cover admin API endpoints

20 tests across MCP, credential pool, memory, pairing, webhooks, ops, plus
an auth-gate parametrize that asserts every admin endpoint requires the
session token. Asserts request contract + CLI-config parity, not catalog
values (per the no-change-detector-tests rule).

* docs(dashboard): document MCP, Webhooks, Pairing, and System admin pages

Adds Pages sections for the four new admin tabs and an Admin-endpoints table
to the REST API reference. Updates the page description to reflect the
dashboard's expanded role as a full administration panel.
2026-06-01 02:58:02 -07:00
Teknium
2ed96372ad feat(skills): blank-slate skills — install --no-skills + opt-out/opt-in (#36228)
* feat(install): --no-skills flag for blank-slate default profile

Add an install-time --no-skills flag so the default ~/.hermes profile can
be created with zero bundled skills, matching what
`hermes profile create --no-skills` already does for named profiles.

The flag writes $HERMES_HOME/.no-bundled-skills and skips the install-time
seed. sync_skills() now honors that marker with an early return
(skipped_opt_out=True), so neither the installer, a later `hermes update`,
nor a direct sync re-injects bundled skills into a profile that opted out.

Previously the marker was only checked by seed_profile_skills() (named
profiles); the default profile had no opt-out and `hermes update` would
re-seed it every time.

Tests: TestNoBundledSkillsOptOut covers marker-present (no-op) and
marker-absent (normal seed) paths.

* feat(skills): hermes skills opt-out / opt-in for existing profiles

Adds an interactive counterpart to the install-time --no-skills flag so
an already-installed profile (default or named) can toggle the
.no-bundled-skills marker without reinstalling.

- `hermes skills opt-out` writes the marker (stop future seeding). Safe
  by default: nothing on disk is touched.
- `hermes skills opt-out --remove` ALSO deletes already-present bundled
  skills, but ONLY ones that are manifest-tracked AND byte-identical to
  their origin hash. User-edited bundled skills, hub-installed skills, and
  hand-written skills are never removed. Previews + confirms before
  deleting (--yes to skip).
- `hermes skills opt-in [--sync]` removes the marker and optionally
  re-seeds immediately.

Core logic lives in tools/skills_sync.py (set_bundled_skills_opt_out,
is_bundled_skills_opt_out, remove_pristine_bundled_skills) reusing the
existing manifest origin-hash machinery for the safety check.

Tests: TestOptOutToggleAndRemove covers marker toggle idempotency and
proves user-modified + non-bundled skills survive --remove.

* docs: blank-slate skills — install --no-skills + opt-out/opt-in

- features/skills.md: new 'Starting with a blank slate' section covering
  the install flag, profile-create flag, and runtime opt-out/opt-in, with
  a safe-by-default note.
- reference/cli-commands.md: document the new skills opt-out / opt-in
  subcommands + examples.
- reference/profile-commands.md: fix the marker filename (was .no-skills,
  actually .no-bundled-skills) and cross-link the runtime commands.

Validated with a full docusaurus build (exit 0); the three edited pages
compile clean with no new warnings.
2026-06-01 02:57:57 -07:00
Teknium
70e1571d89 feat(curator): prune built-in skills after inactivity + track usage for all skills (#36701)
Two related changes to the skill curator:

1. Built-in pruning. New curator.prune_builtins config (default on) lets the
   curator archive bundled built-in skills after the inactivity period, not
   just agent-created ones. A .curator_suppressed list tells the update-time
   re-seeder (tools/skills_sync) to leave pruned built-ins archived, so the
   prune is durable across `hermes update`. Built-ins are seeded with a
   baseline record on first sight, so the inactivity clock starts at upgrade
   time -- no mass-prune on the first run. Hub-installed skills are never
   pruned regardless of the flag. Restoring a built-in clears its suppression.

2. Usage tracking for all skills. Telemetry (view/use/patch) was wrongly gated
   behind curation-eligibility, so built-ins were tracked only when prunable
   and hub skills never. Telemetry is observability and is now decoupled from
   curation: every skill accrues usage counts regardless of provenance, while
   lifecycle mutators (set_state/set_pinned/mark_agent_created) stay
   curation-gated. New usage_report() + provenance() expose all skills with an
   agent/bundled/hub tag.
2026-06-01 02:07:32 -07:00
Teknium
0622a70eb4 feat(gateway): bring /undo [N] to messaging platforms (parity with CLI/TUI) (#36699)
Gateway /undo was wired into every platform but still ran the old
single-turn hard-truncate. Now it matches the CLI/TUI: /undo [N] backs
up N user turns (default 1, clamps to oldest), soft-deletes the
truncated rows on disk (active=0, kept for audit, hidden from re-prompts
and search) via SessionDB.rewind_to_message, evicts the cached agent so
the next turn rebuilds from the active-only transcript (the gateway's
equivalent of the CLI's in-place history surgery + memory invalidation),
and echoes the backed-up message text so the user can copy/edit and
resend — platforms have no editable composer to prefill.

- gateway/session.py: SessionStore.rewind_session(session_id, n) wraps
  the soft-delete primitive; load_transcript already returns active-only
- gateway/run.py: _handle_undo_command parses [N], calls rewind_session,
  evicts the agent, echoes target text; confirm-prompt detail is count-aware
- locales: undo.removed gains {turns}; new undo.invalid_count, all 16 langs
- tests: tests/gateway/test_undo_rewind_session.py (6 cases)
2026-06-01 02:04:14 -07:00
Teknium
ba6ffd4ff1 fix(skills-guard): stop flagging benign skill content + honor skill ignore files (#36231)
The skill security scanner blocked legitimate community skills on three
intrinsic false-positive patterns:

- read_secrets_file matched `cat > file.env <<` heredocs (writing the
  user's own keys into their own local .env), not just `cat file.env`
  reads. Exclude output redirections.
- allowed-tools frontmatter is REQUIRED by the agent-skill spec; every
  compliant skill declares it. Drop from HIGH privilege_escalation to a
  LOW informational finding so it no longer drives the verdict.
- python_os_environ flagged `os.environ.get("CONFIG_VAR")` config reads
  as HIGH exfiltration. Exempt non-secret `.get()` reads; add a dedicated
  CRITICAL python_environ_get_secret pattern so secret-named reads
  (OPENAI_API_KEY etc.) are still caught.

Also: scan_skill() now honors a skill-provided .skillignore / .clawhubignore
(gitignore-style) so dev/docs artifacts shipped in a skill root are excluded
from both structural checks and pattern scanning. SKILL.md is never ignorable.

80 tests pass (64 existing + 16 new).
2026-06-01 01:58:48 -07:00
Teknium
9074a154c5 feat: explain Quick Setup vs Full setup inline in the first-time setup menu (#36227)
The setup-mode chooser showed two bare labels ('Quick Setup (Nous
Portal) — OAuth login, model & messaging' / 'Full setup — configure
everything') that didn't explain what Quick Setup actually is. Expand
both labels inline so each choice line carries a concise explanation:

  Quick Setup (Nous Portal) — free OAuth login, no API keys, model + tools
  Full setup — configure every provider, tool & option yourself (bring your own keys)

Single-file change to the choice labels; no new plumbing.
2026-06-01 01:58:30 -07:00
Teknium
92a567db2d fix(ci): regen model catalog + stop gui tests consuming macos-fixup subprocess calls (#36687)
Two pre-existing failures on main, unrelated to each other:

- test_model_catalog: website/static/api/model-catalog.json was stale vs
  _PROVIDER_MODELS — minimax/minimax-m2.7 was renamed to minimax/minimax-m3
  without regenerating the committed manifest. Ran scripts/build_model_catalog.py.

- test_gui_command: the macOS relaunchable-signing fixup
  (_desktop_macos_relaunchable_fixup) makes two subprocess.run calls (xattr +
  codesign) on darwin before launch. The two darwin GUI tests set
  sys.platform='darwin' and mock subprocess.run with a 2-element side_effect
  (pack + launch), so the fixup's calls drained the iterator -> StopIteration.
  Mock out the fixup in those two tests so the subprocess accounting stays
  focused on pack/launch.
2026-06-01 01:39:03 -07:00
Teknium
e1951ce704 fix(memory): only forward rewound kwarg when set
The on_session_switch fan-out passed rewound=rewound unconditionally,
injecting rewound=False into every provider's **kwargs on the common
/resume, /branch, /new, and compression paths. Providers that capture
extra kwargs into an 'extra' dict (and the exact-dict-equality tests
guarding them) broke. Forward rewound only when truthy; /undo sets it
explicitly, everyone else stays clean.
2026-06-01 01:22:38 -07:00
Teknium
3f7d1c801d feat(undo): /undo [N] backs up N user turns with prefill + soft-delete
Extends the existing /undo command from a single in-memory exchange
removal into a full rewind: back up N user turns (default 1), soft-delete
the truncated rows in SessionDB (active=0, kept for audit, hidden from
re-prompts and search), notify memory providers, and prefill the composer
with the backed-up message text for editing — CLI and TUI.

Reuses the SessionDB rewind primitives, the on_session_switch(rewound=True)
memory hook, and the TUI command.dispatch prefill payload from SaguaroDev's
#21910 work, wired to /undo [N] instead of a separate /rewind picker.

- cli.py: undo_last(n, prefill) — in-memory truncate + SQLite soft-delete
  + agent surgery (system-prompt invalidate, flush-index reset) + memory
  notify + editable buffer prefill; /undo dispatch parses optional count;
  checkpoint-rollback caller passes prefill=False
- tui_gateway/server.py: command.dispatch undo branch (was rewind) parses
  count, picks Nth-from-last user turn, clamps to oldest
- commands.py: /undo gains [N] args_hint
- tests: rename + expand TUI suite (multi-turn, clamp, invalid-count)
- release.py: AUTHOR_MAP entry for SaguaroDev

Co-authored-by: SaguaroDev <74339271+SaguaroDev@users.noreply.github.com>
2026-06-01 01:22:38 -07:00
SaguaroDev
243e836dce feat(tui): wire /rewind through command.dispatch + prefill payload (#21910)
Adds the TUI half of the /rewind feature so the Ink terminal UI gets
the same affordance as the prompt_toolkit CLI.

Python side (tui_gateway/server.py):
- /rewind added to _PENDING_INPUT_COMMANDS so slash.exec rejects it
  and the TUI falls through to command.dispatch (the only path with
  access to live session state + memory hooks).
- New command.dispatch branch for name == "rewind":
  v1 auto-picks the most recent user turn (Claude-Code-style single-
  step undo), calls SessionDB.rewind_to_message, refreshes the
  in-memory history, fires _memory_manager.on_session_switch with
  rewound=True, and returns the new "prefill" payload.
- A dedicated picker overlay (multi-step rewind) is tracked as a
  follow-up to #21910.

TS side (ui-tui/src/):
- New "prefill" variant on CommandDispatchResponse + asCommandDispatch
  validator. Mirrors "send" but does NOT auto-submit; the client drops
  the message into the composer for editing.
- createSlashHandler renders the optional notice via sys() and calls
  ctx.composer.setInput(d.message), letting the user edit-and-resubmit
  the rewound turn — the core UX promised by the issue.

Tests:
- 7 new tui_gateway tests covering prefill payload shape, in-memory
  history truncation, DB soft-delete, memory-provider notification
  (rewound=True), busy-session refusal, missing-session error, and
  registry placement in _PENDING_INPUT_COMMANDS.
- Extended asCommandDispatch vitest covering the new prefill variant
  (with + without notice, and rejection of malformed payloads).

Out of scope for v1 (tracked as #21910 follow-up):
- Dedicated picker overlay in Ink (the multi-step rewind UI). v1 auto-
  picks the most recent user turn, matching the most common case.
- Gateway platforms (Telegram, Discord, etc.) — issue scopes v1 to
  CLI + TUI only.
2026-06-01 01:22:38 -07:00
SaguaroDev
31cfa08c66 feat(memory): add rewound kwarg to on_session_switch hook 2026-06-01 01:22:38 -07:00
SaguaroDev
3e59be0c41 feat(state): add messages.active flag + rewind primitives (#21910)
Schema v12 adds:
- messages.active (default 1) — soft-delete flag for /rewind
- sessions.rewind_count (default 0) — audit counter
- idx_messages_session_active deferred index

New SessionDB methods:
- rewind_to_message(session_id, target_message_id) — soft-deletes rows
  >= target_id, refuses non-user targets, increments rewind_count
- restore_rewound(session_id, since_message_id) — undo for stretch goal
- list_recent_user_messages — picker source

Existing methods get include_inactive kwarg (default False):
- get_messages, get_messages_as_conversation, search_messages.
  Rewound rows excluded from session_search by default — opt-in for audit.

The deferred index pattern (DEFERRED_INDEX_SQL run after _reconcile_columns)
avoids 'no such column: active' on legacy pre-v12 databases, since
executescript(SCHEMA_SQL) runs before column reconciliation.
2026-06-01 01:22:38 -07:00
kshitijk4poor
6c73e8ffaa fix(gateway): keep code blocks verbatim in cleaned text when media present
Self-review of the code-block masking fix: the cleanup path ran
media_pattern.sub('') over the _mask_protected_spans() copy of the text and
assigned that back to 'cleaned', so whenever a real MEDIA: tag was delivered
(if media: branch), every fenced code block / inline code / blockquote in the
reply was blanked to whitespace in the user-visible text.

Now mask only a length-equal copy of 'cleaned' to locate the real tag spans,
then delete those spans from the unmasked 'cleaned' — masking is a locator,
not a text rewrite. Protected spans survive verbatim. Strengthens the existing
mixed-code test (it only asserted 'Done.' survived, not the code block) and
adds an inline-code-survives regression test. Both fail on the old sub-based
code and pass now.
2026-06-01 00:00:26 -07:00
kshitijk4poor
ec6261ae2f chore(release): add VinciZhu to AUTHOR_MAP for #16721 salvage 2026-06-01 00:00:26 -07:00
liuhao1024
3ccf4fdc6d fix(gateway): skip MEDIA: tags inside code blocks and blockquotes
extract_media() scanned the full response text without distinguishing
live delivery tags from example paths in fenced code blocks, inline code
spans, and blockquotes. This caused false positives where the agent's
explanation of MEDIA: syntax (or tool output containing example paths)
was stripped from user-visible text and the path was added to the media
delivery list.

Added _mask_protected_spans() helper that replaces protected regions
with equal-length whitespace before regex matching, preserving match
offsets. The helper skips backtick-quoted paths in MEDIA: tags to
maintain existing path extraction behavior.

Fixes #35695
2026-06-01 00:00:26 -07:00
VinciZhu
521d06975e fix(gateway): restrict auto-appended media to producer tools 2026-06-01 00:00:26 -07:00
kshitijk4poor
fb1b681b3b fix(gateway): keep JSON-embedded MEDIA: text verbatim in cleaned output
Self-review of #34375 fix: the cleanup path ran media_pattern.sub('') over
the JSON-masked copy of the text, which baked the masking spaces into the
user-visible 'cleaned' string — a serialized tool result like
{"old":"MEDIA:/x.png"} came back as {"old":"          "}.

Now mask only a length-equal copy of 'cleaned' to locate the real tag spans,
then delete those spans from the unmasked 'cleaned'. Real tags are stripped;
JSON-embedded MEDIA: text reads back verbatim. Masking 'cleaned' (not the
original 'content') keeps offsets valid after the [[audio_as_voice]] /
[[as_document]] directives are removed. Adds two cleaned-text regression tests.
2026-05-31 23:51:42 -07:00
liuhao1024
e8827ef704 fix(gateway): skip MEDIA: inside serialized JSON string values
Serialized tool results frequently embed a prior reply's text, e.g.
{"result": "MEDIA:/path/stale.png"}. The bare-path branch of
MEDIA_TAG_CLEANUP_RE matched these and re-delivered stale files (#34375).

Adds BasePlatformAdapter._mask_json_string_media, which blanks (offset-
preserving) only MEDIA:<bare-path> tokens that sit inside a JSON value-
context string (opened by : , { or [). Legitimate tags at line start,
after prose, indented, MEDIA:"quoted" form, and two-line TTS output are
all left untouched.

Reworked from the approach in #34388 (a line-start regex anchor), which
no longer applied to current main and regressed same-line/indented tags.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-05-31 23:51:42 -07:00
Nicolay
b3aaf2676b fix(docker): discover Playwright headless_shell browser (#35717)
Co-authored-by: Nic <nicsequenzy@gmail.com>
2026-06-01 16:06:44 +10:00
Ben Barclay
e3998d4714 chore(attribution): map polnikale for PR #35717 (#36273)
Adds nicsequenzy@gmail.com -> polnikale to AUTHOR_MAP so the
check-attribution gate passes for the Playwright headless_shell browser
discovery fix (#35717).
2026-06-01 16:05:06 +10:00
Amin Vakil
f106e58afa fix(docker): create s6 envdir before browser path export (#34601) 2026-06-01 15:44:30 +10:00
Ben Barclay
c1a531d063 fix(dashboard): guard update endpoint in Docker with structured guidance (salvage #34831) (#36263)
* fix: guard dashboard update in Docker

* fix(dashboard): align action response type

---------

Co-authored-by: Donovan Yohan <donovan-yohan@users.noreply.github.com>
Co-authored-by: Donovan Yohan <34756395+donovan-yohan@users.noreply.github.com>
2026-06-01 15:39:35 +10:00
brooklyn!
359f2be12e feat(desktop): drop files anywhere in the chat area (#36262)
* feat(desktop): drop files anywhere in the chat area

File drops were only wired to the composer input. Add a reusable
useFileDropZone hook (enter/leave depth counting + capture-phase reset so
the affordance clears even when the composer claims the drop) and a
pointer-events-none ChatDropOverlay, wired onto the conversation viewport.
Drops funnel through the existing onAttachDroppedItems; composer drops keep
their own inline-ref behavior.

* fix(desktop): chat-area drops insert inline @file refs, not attachment cards

Match the composer-input drop behavior — funnel dropped paths through
droppedFileInlineRef + the composer insert bus so they render as inline
ref chips instead of attachment cards.

* fix(desktop): don't render bare file paths as tool images (404)

vision_analyze reports its input image as a local filesystem path, which
toolImageUrl handed straight to <img src>. In the renderer that resolves
against the dev-server origin and 404s. Restrict inline tool images to
fetchable sources (data: URLs and remote http(s)); bare paths now fall
back to the tool's codicon.
2026-06-01 00:30:39 -05:00
Ben Barclay
e1eba6f8cc fix(dashboard-auth): drop /api/* paths from OAuth next= round trip (#36244)
When an unauthenticated SPA fetch hit a gated /api/* endpoint (e.g.
GET /api/analytics/models?days=30 fired from ModelsPage on mount or
after a session expiry), the gated middleware stamped the request's
own path into next= on the 401 envelope's login_url. The SPA's global
401 handler in web/src/lib/api.ts full-page-navigated to that URL,
the PKCE cookie carried the encoded /api/* value through the OAuth
round trip to Portal, and /auth/callback's _validate_post_login_target
accepted it as same-origin and redirected the user to the raw JSON
endpoint instead of the dashboard.

Symptom Ben reported: after the OAuth screen he kept landing on
$DOMAIN/api/analytics/models?days=30 (raw JSON) rather than /models.
The bug was deterministic per page — whichever /api/* call ModelsPage,
AnalyticsPage, or SessionsPage fired first owned the redirect race.

Fix: both validators now reject /api/* targets in addition to the
existing /login, /auth/, /api/auth/ exclusions:

  - _safe_next_target in middleware.py drops the value before it ever
    enters login_url, so the SPA's 401 handler navigates to a bare
    /login (which the SPA itself can return-from via its own
    sessionStorage["hermes.lastLocation"] fallback that was already
    saving the actual browser location).
  - _validate_post_login_target in routes.py drops it as second-line
    defence at the callback boundary, so a legacy cookie, a regressed
    middleware, or an attacker-crafted /auth/login?next=/api/... value
    can't smuggle the redirect through. Either layer alone is enough;
    pairing them means a regression in one is caught by the other.

The match is anchored: ``decoded == "/api"`` or
``decoded.startswith("/api/")``. SPA route lookalikes like /apidocs
or /api-keys remain valid landing targets — tests pin that.

Test additions in test_dashboard_auth_401_reauth.py:

  - TestApi401Envelope: rewrote test_login_url_carries_next_for_deep_
    api_path (which asserted the pre-fix behaviour) as
    test_login_url_drops_next_for_deep_api_path, plus added the
    specific analytics-models repro case from Ben's report.
  - TestNextSameOriginValidation: rejects-api-paths + does-not-reject-
    api-prefix-lookalikes (covers /apidocs, /api-keys).
  - TestAuthCallbackNext: end-to-end test_callback_with_api_next_
    lands_at_root drives /auth/login?next=/api/... through to the
    callback and asserts the user lands at "/", not the API URL.
  - TestValidatePostLoginTarget: new class covering the callback-side
    validator directly, including the URL-encoded ``%2Fapi%2F...``
    form the PKCE cookie actually carries.

Mutation-tested: reverting both validators causes exactly the 5 new
or rewritten /api/*-related assertions to fail (each fix layer is
independently tested), while the 31 other assertions in the file
remain green. Full tests/hermes_cli/ suite (288 files, 5,938 tests)
passes with the fix applied.
2026-06-01 15:10:20 +10:00
brooklyn!
7fbe9b79ab fix(desktop): add missing PATCH /api/sessions/{id} so rename works (#36249)
The desktop rename dialog sent PATCH /api/sessions/{id}, but the backend
only defined GET and DELETE for that path — FastAPI returned 405 Method
Not Allowed, surfaced to the user as "Rename failed". Add the PATCH route
backed by SessionDB.set_session_title (handles sanitization, uniqueness,
and clearing the title when empty).

Also fix a misleading notification: any 405 was summarized as an unrelated
"does not support that audio endpoint" message. Make it a generic 405 hint.
2026-06-01 00:01:28 -05:00
Ben Barclay
bdceedf784 fix(docker): chown hermes-owned top-level state files on boot (#35098) (#36236)
The targeted data-volume chown in stage2-hook.sh only covers hermes-owned
*subdirectories*; loose state files living directly under $HERMES_HOME
(auth.json, state.db, gateway.lock, gateway_state.json, …) are missed.
When created or rewritten by `docker exec <container> hermes …` (root
unless `-u` is passed) they land root-owned, and the unprivileged hermes
runtime then hits PermissionError on next startup, producing a gateway
restart loop.

Fix: reset ownership of an explicit allowlist of hermes-owned top-level
files on every boot. The list mirrors the top-level file entries of
hermes_cli.profile_distribution.USER_OWNED_EXCLUDE plus the runtime lock
files.

This uses a targeted allowlist rather than the originally-proposed blanket
`find $HERMES_HOME -maxdepth 1 -user root` sweep, preserving the
targeted-ownership contract from #19788 / PR #19795: a bind-mounted
$HERMES_HOME may contain host-owned files Hermes does not manage, and
those must never be chowned. Verified end-to-end: allowlisted root-owned
files are reset to hermes on restart while a non-allowlisted host file
keeps its root ownership.

Co-authored-by: x1am1 <2663402852@qq.com>
2026-06-01 14:38:08 +10:00
brooklyn!
0bc616ecf9 fix(desktop): darken light-mode code comment color for legibility (#36234)
Shiki's github-light-default colors comments #6e7781 (~4.2:1 on the code
card background), which is borderline unreadable at the 11px code font
size — and worst for shell snippets, where a single `#` turns the rest
of the line into one long comment span. Remap light-mode comments to
GitHub's darker muted gray (#57606a, ~6.4:1) via per-theme
colorReplacements. Dark mode (~6.1:1) reads fine and is left untouched.
2026-05-31 23:21:58 -05:00
helix4u
b14e15c48e fix(gateway): clean service restart notifications 2026-05-31 21:05:53 -07:00
Nacho Avecilla
380ce4789b Remove prviliges drop when you never ran as root (#34837) 2026-06-01 13:54:18 +10:00
Bartok
064875a540 fix(docker): support s6 /init images in terminal sandbox (#34628) (#34635)
s6-overlay images (e.g. hermes-agent:latest) use /init as PID 1 and exec
/run/s6/basedir/bin/init during stage0 startup. The Docker terminal backend
unconditionally added Docker --init and mounted /run as noexec, which broke
those images in two ways: --init created a second competing PID-1 init, and
the noexec /run made s6 stage0 fail with "exec: /run/s6/basedir/bin/init:
Permission denied" (exit 126), so the container died and terminal commands
reported a generic "container is not running" error.

Detect images whose entrypoint is /init via 'docker image inspect' and, for
those images only, skip Docker --init and mount /run with exec. All other
images keep the hardened --init + noexec defaults. Detection is best-effort:
any inspect failure falls back to the safe defaults.
2026-06-01 13:46:04 +10:00
Bartok
a60bff282e fix(docker): add /usr/bin/tini compatibility shim for legacy wrappers (#34192) (#34382)
#34192 reports Hostinger's 'Hermes WebUI' catalog crashes on startup
with:

  /usr/bin/tini: No such file or directory

The image moved from tini to s6-overlay as PID 1 (/init) earlier in
2026. Orchestration templates that still pin /usr/bin/tini as the
entrypoint \u2014 like the Hostinger Hermes WebUI catalog \u2014 have no
binary to exec and the container crashes immediately.

Hermes has no control over the Hostinger catalog template, but we can
make the image backward-compatible by symlinking /usr/bin/tini -> /init
during the s6-overlay install step. External wrappers that exec
/usr/bin/tini will land on the same s6-overlay reaper they would have
landed on if they'd used the canonical /init entrypoint.

The image's own ENTRYPOINT continues to be /init verbatim \u2014 the shim
is purely for legacy external wrappers, not for the image's own
runtime path. Once affected catalogs are updated, the symlink can be
removed.

Other issues #34192 raises that are NOT addressed by this PR:

  * Problem #2 (UID 1024 vs 10000 mismatch): already fixed by #33148
    (S6_KEEP_ENV=1) and #32412 (with-contenv shebangs). The Hostinger
    template likely needs to update its env-var propagation.

  * Problem #3 (incompatible session formats): RFC for pluggable
    SessionDB is tracked in #23717.

  * Problem #4 (Telegram polling conflict): an operations problem on
    Hostinger's side, not in this codebase.

This PR is scoped to the one issue that can be fixed inside
Dockerfile: the missing /usr/bin/tini binary.

Tests (3 in test_dockerfile_tini_compat_shim.py):

  - test_tini_compat_symlink_present
    Guard: the symlink line must exist in Dockerfile.
  - test_tini_compat_comment_explains_why
    The #34192 anchor comment must be present so future readers know
    why the shim is there (avoid accidental removal).
  - test_entrypoint_still_init_not_tini
    Sanity check: ENTRYPOINT remains /init (s6-overlay). The shim is
    only for external wrappers.

Refs: #34192
Partial fix: addresses the immediate tini-binary crash. Catalog-side
fixes still needed by Hostinger for the UID and session-format
problems documented in the issue.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-01 13:32:55 +10:00
Bartok
740fb28d02 fix(config): chown ensure_hermes_home dirs to HERMES_UID/GID in Docker (#34107) (#34268)
Fixes #34107. When Hermes runs in Docker with HERMES_UID=1000 /
HERMES_GID=911, the entrypoint chowns the top-level HERMES_HOME once
at startup — but subdirectories created at runtime by
ensure_hermes_home() (especially for profile namespaces under
profiles/<name>/ spawned by kanban workers) were landing as root:root
and blocking subsequent uid-mapped worker invocations with:

  PermissionError: [Errno 13] Permission denied:
    '/opt/data/profiles/charles/logs/curator'

Fix: add _resolve_hermes_uid_gid + _chown_to_hermes_uid helpers that
read the env vars and apply chown after mkdir. Invoke from _secure_dir
which already runs after every directory creation in the home-init path,
so all newly-created subdirs (including the profile namespaces) get the
right ownership.

Safety properties:

- No-op when HERMES_UID/HERMES_GID unset (the dominant non-Docker path)
- No-op on Windows (os.chown doesn't exist; AttributeError swallowed)
- No-op when running as non-root (EPERM swallowed — the entrypoint's
  startup chown -R picks it up on next restart, and in most cases the
  dir was already correctly-owned by the calling user)
- Uses -1 sentinel for missing field so only the set value applies
- Empty-string env vars treated as unset

Adds 14 tests across:
- TestResolveHermesUidGid (7) — env-var parsing
- TestChownToHermesUid (5) — chown helper invariants
- TestSecureDirChown (2) — end-to-end through _secure_dir

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-01 13:27:30 +10:00
Teknium
e3b3d4d75e feat(models): add MiniMax-M3 to native minimax providers + 1M context (#36214)
Add MiniMax-M3 to the minimax, minimax-oauth, and minimax-cn curated
lists (these are hardcoded — the native Anthropic-format endpoint has no
/v1/models listing and the providers aren't in _MODELS_DEV_PREFERRED, so
new models don't auto-pull). Add a DEFAULT_CONTEXT_LENGTHS key
'minimax-m3' -> 1,000,000 so M3 resolves to its 1M context on every
surface (native ID + OpenRouter/Nous slug) via longest-key-first
substring match, while the M2.x series stays at 204,800.
2026-05-31 20:18:05 -07:00
brooklyn!
79f7e7a1e9 fix(desktop): make locally-built macOS app relaunchable after in-place self-update (#36198)
On macOS the desktop app is built locally and ad-hoc signed (no Developer ID
on the user's machine). An ad-hoc bundle has no stable Designated Requirement,
so when the self-updater rebuilds it in place with a fresh build (new cdhash)
— plus the com.apple.quarantine flag inherited from the downloaded installer
process chain — Gatekeeper/LaunchServices treats the changed code as tampering
and macOS reports "Hermes is damaged and can't be opened," and the app fails to
relaunch. First launch works (fresh registration); the in-place update relaunch
is what breaks.

Fix: after building the desktop app locally, strip quarantine xattrs and
re-apply a clean deep ad-hoc signature (omitting the hardened-runtime flag,
which an ad-hoc build can't satisfy). Applied in both build entry points:
- hermes_cli/main.py cmd_gui (the `hermes desktop --build-only` path the
  updater drives) — so the fix ships via `hermes update` (git), no installer
  re-download needed.
- scripts/install.sh install_desktop (first install) for parity.

Both are no-ops on non-macOS and when a real signing identity (CSC_LINK /
APPLE_SIGNING_IDENTITY) is configured, so signed/notarized builds are untouched.
2026-05-31 21:27:23 -05:00
Teknium
a8526a4159 chore(models): bump minimax to minimax-m3 in openrouter + nous lists (#36191)
Replace minimax/minimax-m2.7 with minimax/minimax-m3 in the OpenRouter
fallback snapshot and the Nous portal model list.
2026-05-31 19:24:17 -07:00
Simon Taggart
a75a45414c fix(tools): fall back to .hermes/.env when forwarded secret is empty (#35583)
The docker_forward_env build loop only consulted the ~/.hermes/.env disk
fallback when a key was unset (value is None), not when it was present
but empty (""). A transient empty value in os.environ was therefore
forwarded into the sandbox container as `-e KEY=`, clobbering the correct
value on disk. Sandboxed workloads then read a zero-length secret and
failed auth (observed as intermittent Linear API 401s) with no gateway
restart and no .env rewrite.

Treat empty-string like unset (`if not value:` on the fallback) and never
forward a blank secret (`if value:` on the guard).

Fixes #35580
2026-06-01 12:20:00 +10:00
Ben Barclay
e2ee9177f0 chore(attribution): map SiTaggart for PR #35583 (#36189)
Adds me@simontaggart.com → SiTaggart to AUTHOR_MAP so the
check-attribution gate passes for the docker_forward_env empty-secret
fix (#35583, fixes #35580).
2026-06-01 12:16:54 +10:00
ethernet
9a82cd33d8 Merge pull request #36190 from NousResearch/ethie/sign-win
add a github action to build& sign a windows installer
2026-05-31 22:10:45 -04:00
ethernet
4e530f1a27 add a github action to build& sign a windows installer 2026-05-31 22:09:44 -04:00
Foldblade
1031031dec fix(docker): skip unnecessary boot chown when volume ownership already matches remapped UID (#35027) 2026-06-01 11:59:43 +10:00
Teknium
758454d1e4 fix(docker): validate HERMES_UID/GID to prevent privilege escalation in stage2-hook (#35340)
Co-authored-by: sprmn24 <oncuevtv@gmail.com>
2026-06-01 11:46:53 +10:00
Donovan Yohan
dcbf62e26a fix(docker): seed s6 gateway state for legacy run cmd (#34829)
* fix(docker): seed s6 gateway state for legacy run cmd

* fix(docker): honor no-supervise during legacy gateway migration

---------

Co-authored-by: Donovan Yohan <donovan-yohan@users.noreply.github.com>
2026-06-01 11:28:56 +10:00
Siddharth Balyan
e1c7a9aa7b feat(tools): surface the free tool pool in entitlement + setup (#36153)
Read the Portal's tool_access claim (JWT + /api/oauth/account) into NousToolAccessInfo and gate managed Tool Gateway access on it: tool_gateway_entitled (paid OR live pool) and per-category tool_gateway_entitled_for(). The pool funds web/image/tts/browser but not video, so per-backend availability, the charge picker (ensure_nous_portal_access coverage_category), and managed defaults all respect coverage.

Setup: rebuild prompt_enable_tool_gateway as a per-tool checklist that renders whenever the pool is enabled, lists only pool-covered tools (video excluded for free-pool users), and is framed as the free tool pool for $0 subscribers rather than a paid subscription. get_gateway_eligible_tools now gates and filters off the entitlement snapshot.
2026-06-01 06:32:48 +05:30
brooklyn!
fa4ebaa8b5 fix(install): build desktop in 'desktop' stage on macOS/Linux instead of silently skipping (#36134)
The thin installer (apps/bootstrap-installer) drives install.sh stage-by-stage,
each in its own process. The `desktop` stage never called check_node, so the
Hermes-managed Node provisioned earlier (at $HERMES_HOME/node/bin) wasn't on
PATH. install_desktop's `command -v npm` check then failed and the build was
skipped — yet the stage still reported {"ok":true,"skipped":false}, so the
installer showed "Installation Complete" and only failed at the end with
"Couldn't find a built Hermes desktop ... the desktop build step may have been
skipped or failed."

Fix:
- Call check_node in the `desktop` stage (mirrors every other Node-dependent
  stage) so the managed Node is on PATH (or installed).
- Make install_desktop self-provision via check_node and hard-fail (return 1)
  if npm is still unavailable, instead of a silent `return 0`. The desktop
  stage only runs when a build is explicitly requested (--include-desktop), so
  an unavailable toolchain is a real failure, not graceful degradation.

Verified on macOS arm64: the `desktop` stage now builds
release/mac-arm64/Hermes.app, which matches resolve_hermes_desktop_exe, so the
installer's "Launch Hermes" succeeds.
2026-05-31 19:03:10 -05:00
brooklyn!
77bb64813c fix(desktop): report desktop_contract in lazy session.create info (#36112)
The lazy session.create path hand-builds a partial info dict that omitted
desktop_contract. The desktop GUI reads a missing contract as undefined and
treats it as an out-of-date backend, so it surfaced a "Backend out of date"
toast on every launch even against a current backend. Carry the contract in
the lazy payload like _session_info already does for resume/branch.
2026-05-31 18:23:10 -05:00
brooklyn!
3ef97a61b9 fix(desktop): track main for self-update now that GUI merged (#36104)
The desktop self-update branch defaulted to bb/gui, the pre-merge feature
branch. Now that the desktop app is on main, flip DEFAULT_UPDATE_BRANCH to
main so freshly built apps check for updates against the right branch
instead of relying on the runtime self-heal fallback.
2026-05-31 17:53:35 -05:00
Teknium
cd8aa389c9 Revert "fix(tui): clamp bogus terminal dimensions (WSL 131072x1) (#35657)" (#36096)
This reverts commit b1d34cf6e2.
2026-05-31 15:51:11 -07:00
brooklyn!
51c68d4ab1 Add Hermes desktop app (#20059)
* feat: better composer etc

* docs: add desktop and dashboard run instructions

* fix(desktop): address security scan findings

* fix(dashboard): resolve @nous-research/ui path under npm workspaces

The sync-assets prebuild step shelled out to 'cp -r
node_modules/@nous-research/ui/dist/fonts ...' with a path relative
to apps/dashboard/. That works only when the dep is installed
locally in the dashboard workspace, but 'npm install' at the repo
root (the documented setup — see apps/desktop/README.md) hoists
shared deps to the root node_modules under npm workspaces. The
relative cp then fails with 'No such file or directory', sync-assets
exits 1, the Vite build aborts, and 'hermes dashboard' surfaces a
generic 'Web UI build failed' message.

Replace the shell one-liner with scripts/sync-assets.cjs, which
walks up from the dashboard directory looking for node_modules/
@nous-research/ui — working in both the hoisted (workspaces) and
co-located (standalone) layouts. Also guards against a missing
dist/fonts or dist/assets with a clearer error pointing at a
rebuild of the UI package rather than silently copying nothing.

* feat(desktop): support connecting to a remote Hermes backend

Add HERMES_DESKTOP_REMOTE_URL and HERMES_DESKTOP_REMOTE_TOKEN env
vars that, when set, short-circuit the local-child spawn in
startHermes() and connect the Electron renderer to an already-
running 'hermes dashboard' server reachable over the network.

Motivating use case: WSL2 users who want to run the Hermes core
(agent loop, tools, filesystem access) inside their WSL
distribution while rendering the Electron GUI on native Windows.
Before this change, the desktop app always spawned a local Python
child on the same host as the renderer, which doesn't cross the
WSL/Windows boundary.

The remote path reuses waitForHermes() as a liveness probe
(/api/status is in the backend's public endpoint allowlist), so
the connection is only returned once the backend is actually
ready. WebSocket URL derivation picks ws:// or wss:// based on
the input scheme. URL validation rejects non-http(s) schemes and
requires both env vars together to avoid a half-configured
connection that would silently fall through to the spawn path.

No behaviour change when the env vars are unset — the default
local-spawn flow is untouched.

Typical usage:

  # in WSL2
  hermes dashboard --tui --no-open --host 0.0.0.0 --port 9119 --insecure

  # on Windows
  set HERMES_DESKTOP_REMOTE_URL=http://localhost:9119
  set HERMES_DESKTOP_REMOTE_TOKEN=<session token>
  set HERMES_DESKTOP_IGNORE_EXISTING=1
  (launch Hermes desktop)

* ci(desktop): automate desktop releases

Add GitHub Actions release channels for signed desktop installers and document the stable/nightly download paths.

* feat: file tabs

* refactor(desktop): tighten right-rail tab close API

Promote closeRightRailTab/closeActiveRightRailTab as the single
public entry point. Drops the activeTabRef + handleCloseDocument
indirection in ChatPreviewRail, the unused $rightRailHasContent
atom, and the legacy dismissFilePreviewTarget alias. -70 LOC.

* feat(desktop): polish composer pill toward reference look

Solid foreground-on-background send/voice-conversation circle (black-on-white
in light, white-on-black in dark) anchors the right edge as the primary CTA
instead of the orange theme primary. Bumps the primary control to 2.125rem so
it visually outranks the ghost mic/plus controls. Opens up the surface padding
(0.625rem x / 0.5rem y) so the input row breathes around its controls, and
nudges the corner radius from 20 to 24px for a slightly pill-ier silhouette.
LiquidGlass distortion is preserved.

* feat(desktop): add startup and onboarding flow

Add phase-based desktop boot progress, fresh-install sandbox testing, and first-run provider credential onboarding so packaged installs can start cleanly without manual settings detours.

* fix(desktop): gate prompts on provider setup

Show the desktop provider onboarding flow before prompt submission when no inference provider is configured, preventing fresh installs from falling through to backend credential errors.

* fix(desktop): surface provider onboarding from session warnings

Propagate credential warnings through session runtime info and open desktop onboarding whenever a session reports no usable provider, so unconfigured installs cannot fall through to prompt errors.

* fix(desktop): route gateway provider errors to onboarding

The "No inference provider configured" auth error reaches the renderer through gateway error events, not the prompt.submit promise; the previous patch only caught the latter, so the error toast still surfaced and onboarding never opened.

Also strip credential-shaped env vars from the test:desktop:fresh sandbox so the packaged backend can't see provider keys leaking from the launching shell.

* fix(desktop): use strict runtime check to drive onboarding

setup.status returned True whenever any provider auth state was discoverable, including indirect fallbacks like a gh-CLI Copilot token. That made desktop think the user was set up while the agent's actual resolve_runtime_provider call still raised AuthError, leaving the user with a useless toast and no onboarding.

Add a setup.runtime_check gateway method that runs the same resolver the agent uses on session creation, and switch the desktop onboarding overlay and prompt precheck to use it.

* feat(desktop): OAuth-first onboarding using existing dashboard provider API

Replace the engineer-flavored API key form with a Sign-in-first onboarding overlay that uses the dashboard's existing /api/providers/oauth catalog and PKCE/device-code endpoints (Anthropic, Nous, OpenAI Codex, etc.). API key entry is now a fallback tab with friendly provider names instead of env var prefixes, and the loud raw resolver error is gone in favor of a one-line welcome message.

* fix(desktop): polish onboarding provider list

Reorder OAuth providers so Nous Portal is first, give the segmented Sign in / API key control equal column widths, and replace the engineer-flavored backend names like "Anthropic (Claude API)" / "MiniMax (OAuth)" with friendlier in-app titles. External-CLI providers now show a softer subtitle and an external-link icon instead of a chevron.

* refactor(desktop): split onboarding overlay into store + view

Move the OAuth state machine, runtime check, copy-to-clipboard, and api-key save into store/onboarding.ts (matching the boot.ts pattern), leaving the overlay as a presentation layer that subscribes via useStore. Tabs are now table-driven, child panels read flow from the store instead of prop-drilling, and the polling/PKCE/error/success branches share a small Status atom.

* fix(desktop): external CLI providers + center mode tabs

External-CLI providers (Claude Code, Qwen Code) now open an in-overlay panel with the CLI command, copy button, and an "I've signed in" recheck instead of firing an invisible toast. Center the Sign in / API key tab control so it sits under the heading instead of hugging the left edge.

* fix(desktop): drop onboarding tabs for an inline link, group device-code waiting state

Replace the Sign in / API key tab pair with an "I have an API key" footer link under the OAuth provider list, with a "Back to sign in" affordance inside the API key form. Group the device-code "Waiting for you to authorize..." status next to the Cancel button so the alignment matches the action.

* refactor(desktop): tighten onboarding store + overlay

Drop the dead isOnboardingBusy/BUSY set, factor the catch-fallback dance into safeReq, and share a single reloadAndConnect helper between PKCE submit, device-code success, external recheck, and api-key save.

In the overlay, extract Step / CodeBlock / FlowFooter / CancelBtn / DocsLink atoms so the four sign-in panels share the same chrome instead of repeating it inline. Net effect: fewer literal divs, one place to touch the spacing, and the code-block + footer rows are reusable across future flows.

* fix(desktop): mount onboarding from frame 1 to kill the FOUT

Default onboarding.configured to null (unknown until the runtime check resolves) and have the onboarding overlay render whenever it's not yet confirmed true. The boot overlay now yields to it, so the very first paint is the Welcome card with a "While we get you set up..." progress strip instead of a flash of the chat shell between boot dismiss and onboarding mount.

The picker swaps in cleanly once the gateway opens and the runtime check confirms the user is not configured. Already-configured users see the same prep card briefly while their existing runtime warms up, then the overlay dismisses without touching the chat shell.

* fix(desktop): top-align empty sessions placeholder

The "Start a chat to build your history." empty state used a min-h-35 grid place-items-center container, which floated the text in a tall dead zone. Render it as a flat paragraph that sits right under the section header like the empty pinned state does.

* refactor(desktop): drop dead boot overlay

Onboarding overlay subsumes the boot card now that it mounts from frame 1 and renders boot progress inline. The standalone DesktopBootOverlay is unreachable in every flow (yields whenever onboarding has not confirmed configured, dismisses once it has).

* fix(desktop): hide pinned/recents sections until first session

A fresh sidebar showed the Pinned and Recent chats headers with floating empty-state copy underneath. Drop both sections (and the now-orphan SidebarEmptySessionState) when there are no sessions yet — they reappear after the first chat. Skeletons during initial load are unchanged.

* feat(gui): route embedded TUI through dashboard gateway (#21979)

Inject HERMES_TUI_GATEWAY_URL into dashboard PTY sessions so embedded ui-tui instances attach to the in-process websocket gateway, with coverage for the new env wiring.

* Add desktop remote gateway settings

Make the desktop gateway connection configurable from settings so local remains the default while remote backends can be saved, tested, and applied without environment variables.

* feat(gui): first-class Messaging page + gateway menu redesign

- Add Messaging page to the desktop app with per-platform setup,
  status, and inline guidance. Catalog derives from gateway.config
  Platform enum + plugin registry, so every messaging adapter the CLI
  supports (Telegram, Discord, Slack, Mattermost, Matrix, WhatsApp,
  Signal, BlueBubbles, Home Assistant, Email, SMS, DingTalk, Feishu,
  WeCom, Weixin, QQ, Yuanbao, API server, Webhooks, plugins) shows up
  without per-platform code.
- New REST endpoints: GET /api/messaging/platforms, PUT and POST
  /test on the same path. Secrets go through the existing .env
  pipeline; enable/disable writes config.yaml.
- Replace gateway statusbar dropdown with a richer panel: status row,
  icon-only restart + system-panel actions, recent activity (with
  timestamps trimmed in display, full text on hover), platform list.
- Auto-poll the messaging page every 6s (paused when hidden) so
  status updates without a manual check.
- Drop Settings / Command Center from the sidebar nav (still
  reachable via shortcuts and the titlebar cog).
- Flatten top corners on Messaging/Skills/Artifacts/Chat panes.
- Share new StatusDot component across messaging + gateway menu.
- Fix gateway/config.py so an explicit platforms.<name>.enabled=false
  in config.yaml is honored when env tokens are present.
- pb-9 on the chat content area for breathing room above the composer.

* Potential fix for pull request finding 'CodeQL / Clear-text logging of sensitive information'

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

* pin electron version

* hide application menu on non-mac systems

* interpret compactPreview for non-string vlaues as JSON or an empty string

* fix(desktop): keep composer contenteditable mounted across stacked toggle

The composer rendered {input} inside two different parent fragments
depending on `stacked`. When auto-expand flipped `stacked` (e.g. the
moment typed text wrapped past two lines), React reconciled the two
branches as different positions and unmounted/remounted the
contenteditable. The fresh mount started empty, so any in-flight
characters — most reliably reproduced by holding a key — were lost.

Replace the conditional with a single CSS Grid whose template-areas
swap on `stacked`. The three children (menu, input, controls) keep
stable identities across the toggle; only their grid placement
changes, which the browser handles without React tearing down the
editor.

* refactor(desktop): align install layout with install.ps1 / install.sh

Make the desktop app's runtime layout match what scripts/install.ps1 and
scripts/install.sh produce, so a desktop-only user and a CLI-only user end
up with the same files in the same places and can share one install.

Layout
- ACTIVE_HERMES_ROOT = HERMES_HOME/hermes-agent  (was: process.resourcesPath/hermes-agent, read-only)
- VENV_ROOT          = HERMES_HOME/hermes-agent/venv  (was: userData/hermes-runtime)
- desktop.log        = HERMES_HOME/logs/desktop.log  (was: userData/desktop.log)
- HERMES_HOME default: %LOCALAPPDATA%\hermes on Windows, ~/.hermes elsewhere

The packaged .app/.exe still ships a read-only payload at
process.resourcesPath/hermes-agent (FACTORY_HERMES_ROOT). On first launch
or after an installer-driven upgrade we sync factory -> active, then
provision the venv and run pip install -e . against the active root.

Key behaviors
- Pin HERMES_HOME in the spawned Python's env so get_hermes_home() resolves
  to the same path resolveHermesHome() picked. Without this, Python falls
  back to ~/.hermes on every platform - fine on mac/linux, a split-state
  bug on Windows where our default is %LOCALAPPDATA%\hermes.
- Detect developer installs by .git presence at ACTIVE; never overwrite
  a user's checkout via factory sync.
- Marker at ACTIVE/.hermes-desktop-runtime.json (schema v4) tracks
  pyproject hash + factory version + runtime schema version. depsFresh
  fast-paths when nothing changed.
- Dev (npm run dev) prefers SOURCE_REPO_ROOT over ACTIVE so devs run
  their local edits, not whatever's under HERMES_HOME.
- Better error messages distinguish "no payload" from "no Python".
- Preserve a legacy ~/.hermes on Windows when no %LOCALAPPDATA%\hermes
  exists, so users with prior pip/manual installs aren't orphaned.

pyproject.toml
- Promote fastapi, uvicorn[standard], ptyprocess (non-Windows), and
  pywinpty (Windows) to main dependencies. The dashboard backend
  (hermes dashboard) needs them at runtime; the previous lazy-import
  fallback was a footgun for fresh installs.
- Empty the [pty] optional-extra; kept as a no-op back-compat alias for
  any existing pip install hermes-agent[pty] invocations.

Drops the hardcoded BUNDLED_RUNTIME_REQUIREMENTS list in main.cjs - the
desktop now installs whatever pyproject.toml says, single source of truth.

Files
- apps/desktop/electron/main.cjs:    runtime layout, HERMES_HOME pin,
                                      factory->active sync, marker v4
- apps/desktop/scripts/test-desktop.mjs:  track new venv location
- apps/desktop/README.md:            new Setup, Runtime Bootstrap, and
                                      Debugging sections
- pyproject.toml:                    fastapi/uvicorn/pty backends in main
                                      dependencies; [pty] extra emptied

Tested locally on Windows: npm run dev boots cleanly, sessions land at
the new location, type-check + lint + test:desktop:platforms all pass.
Verified end-to-end on a fresh Win11 VM via dist:win installer.

Known gaps (filed as follow-ups, not in this PR):
- Skills not seeded on packaged installs (sync_skills only runs in
  cmd_chat, not cmd_dashboard). Need to move to shared pre-dispatch.
- Git Bash not bundled or detected; agent's terminal tool errors out
  with a useful message but desktop bootstrapper should pre-flight it.
- install.ps1 / install.sh should be decomposed into composable phase
  libraries so the desktop bootstrapper can reuse them as a single
  source of truth across all install surfaces.

* feat(desktop): theme polish, prose chat typography, composer chrome

- DS tokens/midground, Backdrop, scoped scrollbars, typography plugin + prose
- Composer liquid/radius utilities, thread font parity, tool/thinking cues
- File tree label scale, preview flex, thread retry loading + streaming tests

* feat(desktop): NSIS prereq detection page + auto-install via winget

The packaged Windows installer now detects Python 3.11+ and Git for Windows
at install time and offers to install missing prereqs via winget. Mirrors
the prereq logic scripts/install.ps1 already runs for CLI installs, so
desktop installer users get the same out-of-the-box experience as
install.ps1 users.

Why
- Hermes' terminal tool calls bash.exe directly (tools/environments/
  local.py); on Windows that's Git Bash from Git for Windows. Without it,
  the agent fails on the first terminal() call.
- Hermes' Python runtime needs 3.11+. Without it, the desktop bootstrapper
  errors out at venv creation.
- Both gaps surfaced on a fresh Windows 11 VM smoke test: VM had Python
  pre-installed but no Git, so the agent's first terminal call failed
  with "Git Bash isn't installed."
- install.ps1 has had Install-Git + Install-Uv functions for ages. The
  desktop installer was the asymmetric outlier.

How — NSIS prereq page
- New file: apps/desktop/installer/prereq-check.nsh (plugged into
  electron-builder via build.nsis.include)
- Real Wizard page using nsDialogs, inserted via customPageAfterChangeDir
  hook (between the Directory page and InstFiles).
  - Group boxes for Python and Git, each showing detection status.
  - Pre-checked install checkboxes when winget is available.
  - Auto-skips silently if both prereqs are already installed.
  - Falls back to manual download URLs when winget itself is missing.
- Detection:
  - Python: probes `py -3.11`/`-3.12`/`-3.13`/`-3.14` via the Python
    launcher. Microsoft Store "Python stub" (no py.exe) is correctly
    classified as not-installed.
  - Git: `where git`.
  - winget: `where winget` (Win10 1809+ / Win11 with App Installer).
- Install execution (in customInstall macro):
  - Python: nsExec::ExecToLog with `--scope user --silent`. Per-user
    install, no UAC prompt, output streams to install log.
  - Git: ExecShellWait via Windows ShellExecute. Critical because Git
    always installs per-machine and triggers UAC; ShellExecute preserves
    the foreground focus chain across non-elevated → elevated process
    spawns, so UAC actually comes to the foreground. nsExec::ExecToLog
    breaks the chain because winget runs hidden.
  - Both pass `--disable-interactivity --accept-package-agreements
    --accept-source-agreements` to suppress winget's own dialogs.
- Verification: probes Git's standard install locations via FileExists
  rather than `where git`. NSIS's process inherits PATH at startup, so
  a freshly-installed Git won't be visible to `where` until restart.
- Silent installs (/S) skip the prompts; managed deploys handle prereqs
  out-of-band via Group Policy / Intune.

How — Electron-side safety net
- New findGitBash() in main.cjs, parallel to findSystemPython(). Probes
  the same locations as tools/environments/local.py:_find_bash() so a
  positive result here means the agent's terminal tool will work.
- ensureRuntime now throws a clear, actionable error on Windows when Git
  Bash isn't found, matching the existing "Python 3.11+ is required"
  error path.
- Catches users the NSIS page doesn't: .msi installer users (NSIS prereq
  page doesn't run for MSI), `npm run dev` users, manual installers,
  anyone who unchecked the install boxes on the NSIS prereq page.
- All gated on `IS_WINDOWS`; macOS / Linux unaffected.

NSIS build issue (resolved)
- electron-builder defaults to `-WX` (warnings as errors). NSIS optimizer
  emits "warning 6010: function not referenced" for our page functions
  because Page custom directives don't count as references in its
  static-analysis pass. The functions ARE called at runtime when NSIS
  invokes the page; the optimizer just can't see it statically.
- Set `build.nsis.warningsAsErrors=false` in package.json so this
  spurious warning doesn't fail the build. (Documented option from
  electron-builder's nsisOptions.)

Out of scope (filed for future work)
- MSI prereq detection: Windows Installer custom actions are a different
  mechanism. Enterprise deploys typically handle prereqs via GP/Intune.
- Bundle PortableGit + python-build-standalone in extraResources for
  zero-network installs. ~80MB increase.
- Mac / Linux GUI prereq flows (different installer formats; Xcode CLT
  covers most macOS prereqs already; Linux is per-distro hard).

Files
- apps/desktop/installer/prereq-check.nsh   (new, ~290 lines NSIS)
- apps/desktop/package.json                 (build.nsis.include +
                                              warningsAsErrors)
- apps/desktop/electron/main.cjs            (findGitBash + preflight)
- apps/desktop/README.md                    (Runtime prerequisites
                                              section)

Cross-platform impact
- macOS / Linux builds (dist:mac, dist:mac:dmg, dist:mac:zip): nsis
  config is ignored entirely; .nsh is dormant.
- npm run dev: .nsh dormant; main.cjs preflight gated on IS_WINDOWS.
- scripts/install.ps1, scripts/install.sh: no reference to any new
  files; CLI install paths untouched.
- Hermes CLI / dashboard / gateway: no reference; runtime untouched.
- All checks: node --check on main.cjs and test-desktop.mjs pass;
  npm run test:desktop:platforms 4/4 passing; node --test green.

Tested
- npm run dist:win produces signed .exe and .msi without errors.
- Fresh Win11 VM (Python pre-installed, no Git): prereq page renders,
  Python check shows detected, Git checkbox pre-checked. Click Next →
  Git installs via winget with UAC prompt in foreground.
- After install completes, Hermes launches and the agent's terminal
  tool can run bash commands. Verified Git Bash is detected at
  `C:\Program Files\Git\bin\bash.exe` by ensureRuntime's preflight.

* feat: theme changes, composer tweaks, in app update ux, finesse

* fix(cli): seed bundled skills on dashboard + gateway entrypoints

`sync_skills(quiet=True)` was only being called from inside `cmd_chat`,
which meant `hermes dashboard` (the desktop GUI's backend) and `hermes
gateway` (Telegram/Discord/Slack/etc daemons) never seeded the bundled
skill library into ~/.hermes/skills/.

This surfaced as "No skills found" in the desktop GUI's skills panel on
fresh installs, despite the agent having access to the full bundled
library when invoked via `hermes chat`. scripts/install.ps1 worked
around it by running skills_sync.py as part of Copy-ConfigTemplates,
but that's not part of the desktop installer's bootstrap chain.

Fix
- Extract the skills-sync block from cmd_chat into a module-level
  `_sync_bundled_skills_quietly()` helper.
- Call the helper from cmd_chat (preserving existing behavior),
  cmd_dashboard (after the --status/--stop early-return paths and
  fastapi import check, so we don't run skills_sync on management
  commands or when deps aren't installed), and cmd_gateway.

Why these three entrypoints
- cmd_chat: the user's primary CLI entrypoint
- cmd_dashboard: the desktop GUI's backend; this is what `hermes
  dashboard --tui` invokes when the desktop bootstrapper spawns Hermes
- cmd_gateway: long-running daemons where the user expects the agent
  to have full skill access

Other entrypoints (cmd_config, cmd_doctor, cmd_login, cmd_status,
etc.) are management commands that don't need skill discovery and were
never running skills_sync in the first place — leaving them alone.

Idempotence
- tools/skills_sync.py is manifest-based: skipped skills cost
  milliseconds. Calling it from multiple entrypoints adds no real
  cost, and users running `hermes chat` then `hermes dashboard` get
  two fast no-ops on the second call.

Failure handling
- Helper wraps skills_sync in try/except. Skills are an enhancement,
  not a hard dependency — Hermes runs fine with an empty skills/ dir.

Files
- hermes_cli/main.py:
  + new helper `_sync_bundled_skills_quietly()` at module level
  + cmd_chat: replace inline block with helper call
  + cmd_dashboard: add helper call after fastapi import succeeds
  + cmd_gateway: add helper call before delegating to gateway_command

* feat(desktop): hoisted todo widget, JSON tool summaries, history grouping & timer fixes

- Hoist todo to first-class widget (shadcn checkboxes, brand colors, no
  tool-accordion). Header derives label from active task; non-active rows fade.
- Replace raw JSON dumps with structured key/value summaries via
  formatToolResultSummary; nested error extraction for clearer failures.
- Fix loaded-session grouping: stitch interleaved assistant/tool iterations
  into one bubble instead of orphaned synthetic messages.
- Stable tool/thinking timers via keyed registry so unmount/scroll doesn't
  reset elapsed counts; gate "running" on real live thread state.
- Reorganize chat-only assistant-ui components under components/chat/.

* fix(desktop): address CodeQL alerts on PR #20059

- settings/helpers.ts: harden setNested against prototype pollution.
  POLLUTING_PATH_PARTS check is now applied at every assignment site
  (loop + leaf) and uses Object.defineProperty so CodeQL can see the
  guard inline rather than via a helper function call.

- lib/markdown-preprocess.ts: rebuild the dangling-fence close regex
  from a fence-char + length instead of marker.replace(...). The marker
  is captured by `(`{3,}|~{3,})` so it can only be backticks or tildes,
  but CodeQL was tracing tainted input text into the RegExp source and
  flagging hostname dots from input as part of the pattern (false
  positive js/incomplete-hostname-regexp on the test fixture URLs).
  Reconstructing from a literal char breaks the dataflow.

- scripts/notarize-artifact.cjs: drop args from the run() rejection
  message. Args carry --key-id / --issuer / key file path; the existing
  outer catch already squashes errors to a generic line, but CodeQL was
  flagging the args.join(' ') as clear-text logging of APPLE_API_KEY_ID.

Composer DOM-text-as-HTML alerts (composer/index.tsx:379, :547) are
already addressed in 4dd9732a9 — innerHTML assignment was replaced with
renderComposerContents which builds DOM via replaceChildren / append
text nodes (no HTML interpretation).

* fix(desktop): inline prototype-pollution guard so CodeQL sees it

CodeQL's dataflow doesn't follow the helper-function guard inside
`safeSet`, so it kept flagging Object.defineProperty as prototype-
polluting. Inline the literal `__proto__`/`constructor`/`prototype`
check at the assignment site to break the dataflow.

Behavior unchanged — same set of disallowed keys, same throw.

* feat(ui-tui): resolve links to readable page titles

Mirror desktop pretty-link behavior in the TUI by resolving HTTP links to page titles with shared caching and safe fetch filters, plus slug-based fallbacks so chat links stay readable even when title fetch fails.

* fix(desktop): drop RegExp from dangling-fence close detection

Previous attempt tried to break the dataflow by reconstructing the
close-fence regex from a literal char + marker.length, but CodeQL still
traced marker.length back to input and kept flagging the test-fixture
URLs as hostname-regex sources (js/incomplete-hostname-regexp).

Replace `new RegExp(...)` + `closeRe.test(body)` with a string-only
hasCloseFenceLine() helper that splits on '\n' and uses ===. No regex
on this path now, so input data can no longer reach a RegExp source.

Behavior preserved: matches lines that are (whitespace + marker +
whitespace), which is what the original `\n[ \t]*${marker}[ \t]*(?=\n|$)`
matched. All 12 markdown-text tests still pass.

* fix(process-registry): suppress windows-footgun false positive on guarded killpg

Keep the existing POSIX-only process-group teardown path, but make the
signal selection explicit via getattr and add an inline windows-footgun
suppression marker on the guarded os.killpg line so the Windows footgun
check no longer blocks CI on this intentionally platform-gated code.

* feat(desktop): reconcile live tool events, polish thread chrome, harden boot

- chat-messages: match tool rows by overlapping query/context/preview values
  so preview-first `tool.progress` rows reliably adopt later stable-id
  `tool.start` payloads instead of spawning ghost rows or mis-merging
  parallel same-name calls; preserve prior args/result across phases.
- tui_gateway: emit full args + parsed result on `tool.start` / `tool.complete`,
  drop redundant `tool.started` re-emit from `tool.progress`.
- electron/main: prefer SOURCE_REPO_ROOT before PATH `hermes` in dev so
  local backend edits actually run; split hardening helpers into
  `electron/hardening.cjs` with tests.
- thread/tool UI: one-shot enter animation keyed by stable ids, braille
  spinner for running rows, Cursor-like disclosure rows, drill-down +
  duration/count formatting via new tool-fallback-model.
- composer: extract `text-utils`, drop liquid-glass overrides.
- right-rail: split preview-pane into preview-console / preview-file.
- runtime: incremental external-store runtime + runtime-readiness gate;
  onboarding store + tests; route-resume hook test.
- regression tests for live tool reconciliation (parallel tools, id-less
  progress, preview-first rows, structured args/results).

* feat(desktop): add ripgrep to NSIS prereq page + polish layout

Add ripgrep as a third (recommended) prereq alongside Python and Git in
the NSIS prereq detection page, and clean up the page layout based on
on-VM testing.

Why ripgrep
- Hermes' search_files tool calls `rg` directly for content + filename
  search (tools/file_operations.py:1382). Falls back to grep/find from
  Git Bash when missing — works but slower and noisier (no .gitignore
  awareness).
- ~5MB winget install via `BurntSushi.ripgrep.MSVC --scope user` — no
  UAC prompt, parallel to how Python installs.
- scripts/install.ps1 already installs ripgrep as part of
  Install-SystemPackages; this brings the desktop installer to parity.

Why "recommended" not "required"
- Python and Git are hard requirements: without them the agent runtime
  or terminal tool refuses to start. The bootstrapper preflight throws.
- ripgrep is a performance enhancement: missing it just means slower
  searches. Page wording reflects this; failure to install is logged
  but doesn't show a MessageBox or block.

Layout polish (response to on-VM screenshot review)
- Wizard header now correctly reads "System Requirements" instead of
  the leftover "Choose Install Location" from the previous page. Set
  via `GetDlgItem $HWNDPARENT 1037/1038` + WM_SETTEXT — the standard
  NSIS pattern for overriding the page header on a custom Page.
- Removed redundant in-body title + verbose intro paragraph; the
  wizard header IS the title now. Body has one short intro line.
- Group boxes tightened to 26u with content positioned just below the
  groupbox title (not top-anchored status + bottom-anchored checkbox
  with empty space in the middle). All three panels + footer fit
  comfortably in 126u, well under the 140u page limit.
- Checkbox labels simplified: dropped "(per-user, no admin prompt)"
  and "(administrator approval required)" suffixes. The footer note
  still calls out UAC for Git when relevant.
- Footer text trimmed to fit cleanly without clipping.

Install order (in customInstall macro)
- Python → ripgrep → Git
- Python and ripgrep are silent and run first; Git's UAC prompt comes
  last so the user's approval interaction isn't interrupted by silent
  activity afterwards.

Skip behavior unchanged
- All three detected → page auto-skips via Abort
- Silent install (/S) → customInstall winget block skips
- User unchecks all → page advances without running winget

Files
- apps/desktop/installer/prereq-check.nsh: ripgrep detection block,
  ripgrep page panel + checkbox, ripgrep customInstall block,
  GetDlgItem header override, layout reflow
- apps/desktop/README.md: Runtime prerequisites section updated to
  list ripgrep as recommended, with manual winget command

* feat(desktop): add model-confirmation step to onboarding

After OAuth/API-key login completes, onboarding now shows a confirmation
card with the curated default model and a Change button before dropping
the user into chat. Closes the gap where the desktop's `model.default`
was empty after first launch and the agent had to fall back to whatever
heuristic happened to fire — leaving users wondering "why am I getting
sonnet-4 when I logged into Nous Portal?"

Why
- Desktop onboarding only persisted credentials, never `model.default`.
  The CLI's `hermes model` command pairs provider + model selection,
  but the desktop's onboarding skipped the model step entirely.
- Result: users saw whichever model the agent's auto-fallback picked,
  unpredictably and undocumented.
- For the BUILD demo we want users to land on the model they expect
  for their provider, with a clear "this is what you're getting" UI
  and a one-click path to change it before chatting.

How
- New `confirming_model` flow status carries the just-authenticated
  provider slug, current default model, label, and a saving flag.
- `completeWithModelConfirm()` runs after credentials succeed: reloads
  env, verifies runtime, fetches /api/model/options to find the curated
  first-model for the provider, persists it via /api/model/set, then
  transitions into `confirming_model`.
- If anything fails (no providers returned, network error), falls
  through to the previous behaviour — onboarding completes without
  the confirm step. Polish, not a hard requirement.
- All four credential paths (device_code OAuth, PKCE OAuth, external
  CLI flow, API key) now use completeWithModelConfirm instead of
  reloadAndConnect.

UI
- `ConfirmingModelPanel` shows: green "<provider> connected" banner,
  card with "Default model: <name>" + Change button, and a "Start
  chatting" CTA that finalises onboarding.
- Reuses the existing `ModelPickerDialog` (the same picker available
  from the chat shell) for the change-model UX. Search, filtering,
  multi-provider listing — all already built.
- Stacking: ModelPickerDialog defaults to z-130, which renders UNDER
  the onboarding overlay (z-1300) and breaks pointer events. Added
  optional `contentClassName` prop to ModelPickerDialog so callers
  can override; onboarding passes `z-[1310]`.

Provider-slug matching
- For OAuth flows: pass `provider.id` directly as the preferred slug.
- For API-key flows: `OPENROUTER_API_KEY` → "openrouter" via env-key
  prefix strip. Also includes the user-visible label as a fallback
  candidate.
- fetchProviderDefaultModel falls back to the first authenticated
  provider in the response if no preferred slug matches — so even a
  miss still surfaces a reasonable default.

Files
- apps/desktop/src/store/onboarding.ts:
  + new `confirming_model` flow variant
  + fetchProviderDefaultModel + completeWithModelConfirm helpers
  + setOnboardingModel (optimistic update + revert on failure)
  + confirmOnboardingModel (finalises onboarding from the card)
  - reloadAndConnect (replaced; the four call sites now go through
    completeWithModelConfirm)
- apps/desktop/src/components/desktop-onboarding-overlay.tsx:
  + ConfirmingModelPanel component
  + new branch in FlowPanel for status `confirming_model`
  + ModelPickerDialog usage with z-[1310] content class
- apps/desktop/src/components/model-picker.tsx:
  + optional `contentClassName` prop on ModelPickerDialog so the
    dialog can be stacked on top of other fixed overlays

Tested
- `npm run type-check` passes
- `npx eslint` clean on touched files
- Live test in `npm run dev`: cleared onboarding cache, walked
  through Nous device-code flow, saw confirm card with curated
  default, clicked Change → ModelPickerDialog rendered above the
  onboarding overlay with working pointer events, picked a different
  model, "Start chatting" persisted to ~/.hermes/config.yaml.

* fix(desktop): suppress generic provider warning in onboarding

Hide the red setup notice when the message is the generic missing-provider guidance, since onboarding already presents provider auth actions. Centralize provider-setup matching across desktop hooks and add coverage for the matcher.

* fix(desktop): add 2u clearance below prereq checkboxes

Group box bottom border was clipping the checkboxes by 1-2px.
Bumped each box height 26u→30u; checkboxes now sit 2u above the bottom border.

* fix(nix): refresh dashboard lockfile hash

Update the web npm deps hash in nix/web.nix to match the committed apps/dashboard/package-lock.json so bb/gui passes the nix lockfile check.

* fix(desktop): install TUI deps in release workflow

Ensure desktop release builds install the standalone ui-tui package before bundling the TUI payload.

* fix(desktop): run release builder from app package

Invoke the desktop builder through the package script so electron-builder uses apps/desktop/package.json.

* fix(desktop): expand release artifact names safely

Build desktop artifact names from workflow version/channel while preserving electron-builder platform macros.

* fix(desktop): use package artifact naming in release workflow

Let electron-builder's desktop package config provide platform-specific artifact extensions while the workflow injects the release version/channel metadata.

* fix(nix): fetch dashboard npm deps from package root

Point the dashboard npm dependency fetch at apps/dashboard so Nix can find the package lockfile after the dashboard move.

* fix(nix): build dashboard from package directory

Set the web package source root to apps/dashboard so npm patch/build phases run beside the dashboard lockfile while keeping apps/shared available as a sibling.

* feat(desktop): render LaTeX math via KaTeX after streaming completes

Add @streamdown/math plugin to the chat markdown renderer.
Inline ($x^2$) and block ($$...$$) math both supported with
singleDollarTextMath enabled. Plugin is gated to non-streaming state
to match the existing pattern for syntax highlighting — math renders
when the message completes, avoiding KaTeX re-render churn during
streaming. KaTeX CSS is imported in styles.css; ~30KB CSS + ~430KB
JS added to the bundle. Smoothness improvements during streaming
deferred to a follow-up.

* perf(desktop): memoize KaTeX renders so math streams without re-rendering

Wrap rehype-katex with a per-equation LRU cache (keyed by
displayMode + source text) and re-enable math during streaming.

Stock @streamdown/math runs rehype-katex on every markdown commit,
so each new token re-katexes every equation in the message. For
math-heavy responses (an equation derived step-by-step) that's
hundreds of ms of wasted work per token and the streaming UI
chokes. With memoization, each equation pays katex.renderToString
exactly once; subsequent tokens re-walk the tree but hit cache for
unchanged equations.

The wrapper mirrors rehype-katex's semantics exactly: same class
detection (language-math, math-inline, math-display), same
<pre>-walk-up for fenced math blocks, same parent.children.splice
replacement, same SKIP traversal, same strict-then-lenient render
strategy with VFile message reporting.

Cached children are structuredCloned on each splice so downstream
rehype plugins or toJsxRuntime can't mutate the cache.

* fix(desktop): declare katex-memo deps directly + drop per-app lockfile

katex-memo.ts (added in 112cad59b) imports hast-util-from-html-isomorphic,
hast-util-to-text, remark-math, katex, and unist-util-visit-parents but
those were never added to apps/desktop/package.json. They were silently
resolving via @streamdown/math at the workspace root, which broke the
moment `npm i --prefix apps/desktop` ran with the per-workspace lockfile
because that install only consults apps/desktop/package.json. Add them
as direct deps, plus unified/vfile/@types/hast for the type imports.

Also delete apps/desktop/package-lock.json — root package.json declares
workspaces: ["apps/*"], so npm manages all lockfile state at the root.
The stale per-app lockfile is what made `npm i --prefix apps/desktop`
diverge from the workspace install in the first place and left an empty
apps/desktop/node_modules/@assistant-ui/ stub that Vite's dep optimizer
then tried (and failed) to open at @assistant-ui/core/dist/internal.js.

* feat(desktop): disable Backdrop noise overlay by default

The noise overlay defaulted to on, which adds a busy speckle layer over
the whole window for every new user. Flip the Leva default to off; the
toggle stays in Backdrop / Noise for anyone who wants it back.

* fix(desktop): polish LaTeX rendering — currency, code blocks, brackets

Five distinct bugs surfaced from a math-heavy stress test:

1. Adjacent code fences glued together. scrubBacktickNoise's
   second-pass regex /``\s*``/g matched the LAST 2 backticks of
   one fence + whitespace + FIRST 2 backticks of the next, collapsing
   two blocks into one. Fixed with lookbehind/lookahead so we only
   match exactly 2 backticks not part of a longer run.

2. Whitespace eaten between fences and following content.
   stripPreviewTargets internally calls .trim() which strips leading/
   trailing whitespace from each split-segment. For segments between
   two fences this collapsed \n\n to '', gluing fence close to next
   block. Fixed by capturing leading/trailing whitespace at the call
   site and restoring it after the transform.

3. Currency dollar signs eaten as math. With singleDollarTextMath:true
   remark-math greedy-matched any pair of $, so '$5 ... $10' became
   one inline math span. Added escapeCurrencyDollars to escape $<digit>
   patterns to \$<digit> in prose segments (not in code). Trade-off:
   math expressions starting with a digit (rare — '$5x = 10$') get
   escaped too. Mirrors the convention in ChatGPT/Claude's UIs.

4. \(...\) and \[...\] LaTeX brackets unsupported. Models often
   emit these instead of $...$ / $$...$$. Added
   rewriteLatexBracketDelimiters preprocessor pass.

5. ```latex / ```tex blocks were being routed to KaTeX via a
   rewrite to ```math. Aligns with GitHub markdown convention:
   ```math = render as math; ```latex / ```tex = LaTeX/TeX
   source code (syntax highlighted, not rendered). Conflating them
   broke teaching/showing-source use cases. MATH_FENCE_LANGUAGES
   pruned to {'math'} only.

Also flipped parseIncompleteMarkdown to true (was !isStreaming) so
the math parser can't see $ inside streaming-but-not-yet-closed code
fences. Shiki was already deferred via defer={isStreaming} so this
doesn't introduce new tokenization cost.

Test: 18/18 existing tests still pass; one test updated to expect
escaped \$ in currency-prose-with-URL case.

* fix(desktop): detect Python via registry/filesystem; pin to 3.11–3.13

Two related fixes for Python detection on Windows:

1. py.exe (Python launcher) is missing from per-user installs that
   didn't check the launcher option, so 'py -3.X --version' alone
   misses real Python installs. User-reported case: clean Win11 +
   official Python.org 3.14 install -> 'where py' returned nothing,
   our installer offered to install Python again. Both NSIS prereq
   page and main.cjs now probe in this order:
     1. py.exe launcher (when present)
     2. PEP 514 registry: HKLM/HKCU\SOFTWARE\Python\PythonCore\<v>\InstallPath
     3. Filesystem: %ProgramFiles%\Python<v>, %LocalAppData%\Programs\Python\Python<v>
   Crucially, we never fall back to running 'python.exe' from PATH
   on Windows — the WindowsApps stub at %LOCALAPPDATA%\Microsoft\
   WindowsApps\python.exe is a redirector that opens the Microsoft
   Store window if no Store Python is installed. Triggering that
   during boot would be terrible UX. Registry/filesystem probes
   never execute the binary.

2. Drop 3.14 from the supported version set. Several Hermes deps
   (notably pywinpty, which carries Rust crates like
   windows_x86_64_msvc) don't yet publish 3.14 wheels. With wheels
   missing, 'pip install -e .' falls back to building from sdist,
   which needs a Rust toolchain — users see 'could not compile
   windows_x86_64_msvc build script' on first run. install.ps1
   sidesteps this by pinning to 3.11 via uv; the desktop installer
   doesn't yet have the same uv-managed-Python pathway, so for now
   we accept 3.11/3.12/3.13 and tell winget to install 3.11 if
   none of those are present. Revisit when the wheel ecosystem
   catches up to 3.14 (~early 2026).

* feat(desktop): Cron, Profiles, usage analytics, and titlebar fixes

- Add Cron and Profiles sidebar routes with full CRUD-style flows and API wiring.
- Extend Command Center with auxiliary task overrides and a Usage panel (7d/30d/90d).
- Fix titlebar geometry for WSL/Windows (native overlay width, tool spacing).
- Remove stray merge conflict markers from pyproject.toml optional deps.

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(title-bar): position sidebar toggle button

* feat(desktop): composer queue — queue many, edit/delete/cancel-edit, Cursor-style

Press Enter while busy with a draft to queue it; with no draft to interrupt
and send the next queued turn. Auto-drains one queued turn each time the
session settles, same as Cursor. Queue persists across reloads so an
interrupted-and-queued turn isn't lost on refresh.

Each queued row supports edit-in-composer (with explicit Save/Cancel),
send-now (↑), and delete. Drain skips only the entry currently being
edited so the rest of the queue keeps flowing.

Queue dequeue is transactional — an entry only leaves the queue after
`prompt.submit` is accepted, so a rejected submit doesn't drop the turn.

Also shrinks the `[interrupted]` marker to a muted one-liner and drops
its assistant footer so it stops looking like a real reply.

* fix(desktop): handle empty usage analytics totals

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix(desktop): address PR review titlebar and usage races

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(desktop): add MCP settings and live subagent tree

Surface configured MCP servers in Settings with JSON edit/save and a gateway-backed reload action so users can manage tool servers without falling back to slash commands.

Track live subagent gateway events in a desktop store, show active subagent counts in the Agents statusbar item, and replace the Agents overlay stub with a live spawn tree for the active session.

* fix(desktop): move power-user views out of sidebar

Keep Cron and Profiles available through lower-prominence chrome entry points so the workspace sidebar stays focused on core chat navigation.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor(desktop): subagent overlay reads like a live transcript, not a dashboard

Strip the card chrome and rewire /agents to feel like peeking into the
child agent's stream:

- subagents store: single `stream` of typed entries (thinking/tool/progress/
  summary) replaces the parallel notes/thinking/tools arrays. Drop unused
  fields (toolsets, depth, apiCalls, reasoningTokens, sessionId).
- agents view: no OverlayCards, no boxed stream, no per-row borders. Goal +
  status pill + indented stream lines, full row width.
- Group root spawns into "Delegation N" sections when batch shape + spawn
  time match — hides task-index interleaving and makes hierarchy obvious.
- Sort tree by spawn time, then task_index. Step indicator is one colored
  pill (primary while running, emerald when done) inside the row, not a
  trailing pill that wrapped under the chevron.
- Tree picks up `subagent.start` (not only `spawn_requested`) and prunes
  delegate-tool fallback rows once native subagent events land for the
  session — fixes duplicate "Delegated task" rows alongside the real ones.

* feat(desktop): Esc closes every OverlayView-based overlay

Lift the keyboard handler into the shared OverlayView so Agents, Settings,
Command Center — and anything we build on top of it later — all dismiss on
Esc by default. Nested Radix dialogs stop propagation themselves, so a
modal opened inside an overlay (e.g. model picker inside Settings) still
closes the modal first, not the overlay underneath.

Drop the now-redundant Esc handlers in Settings (kept Cmd/Ctrl+P) and
Command Center.

* fix(desktop): drop numbered step pill on subagent rows

The pill was getting clipped at the overlay edge anyway. Just use the
status glyph (●/✓/✗/■/○) — the delegation header already conveys
"3 workers, 3 active", and order in the list implies which step you're
looking at.

* fix(desktop): drop noisy "returned N items / empty object" stub strings

When a tool returns nothing useful, the row should be silent — the title
("Search Files", etc.) already tells the user what happened. Counting the
fields in an opaque payload is engineer-noise.

`formatToolResultSummary` and `minimalValueSummary` now return '' for
empty arrays / records / unrecognized values; tool-fallback already hides
the detail section when its body is empty.

* refactor(desktop): subagent rows borrow chat tool patterns (fade-in, lucide glyphs, shimmer)

Pull the agents view closer to how chat tool blocks render:
- statusGlyph() returns the same lucide BrailleSpinner / CheckCircle2 /
  AlertCircle vocabulary as tool-fallback's statusGlyph
- Stream lines fade-in via useEnterAnimation (one-shot WAAPI), keyed per
  entry so streamed deltas settle in instead of popping
- Subagent rows fade in too, and pick up the existing data-slot=tool-block
  spacing rules between blocks
- Active stream line trails a BrailleSpinner instead of a hand-rolled
  pulsing rectangle
- Goal text drops FadeText (which forces nowrap); keep FadeText only for
  the single-line meta subtitle
- Running rows shimmer the title — same affordance the chat thinking row
  uses

* refactor(desktop): make /agents subagent-only, drop sidebar + dead sections

Activity rail and History stub were both noise. Strip the split layout,
sidebar, route enum, and the rail/stub helpers — the overlay is now just
the spawn tree, centered in a max-w-3xl column so it stops claiming the
whole screen for one section's worth of content.

* feat: update cron modals

* Add dedicated GUI log stream for dashboard debugging.

Capture dashboard and PTY websocket lifecycle failures in gui.log and expose it via hermes logs.

* Improve desktop runtime UX by surfacing inference readiness in gateway status and hardening WSL link opening.

This also stabilizes markdown code/table block spacing and adds root-install guards so desktop dev runs use a healthy workspace dependency tree.

* Log detailed GUI websocket failure metadata.

Capture richer reject/disconnect/send/parse context for dashboard gateway websocket flows so GUI connection failures are diagnosable from logs.

* Default dashboard startup logging to GUI mode.

Detect the dashboard subcommand during early CLI bootstrap so gui.log is attached from process start and GUI startup failures are always captured.

* Clean up gateway status conditionals and logging bootstrap mode detection.

Simplify nested dashboard gateway status branches for readability and use a concise first-subcommand check when selecting early GUI logging mode.

* add logging to nsis installer

* feat: glass ui pass

* fix(desktop): persist inline assistant errors across hydrate/resume

- Detect provider failure text arriving via message.complete
  (HTTP 4xx, "API call failed after N retries", Provider/Gateway
  error: ...) and persist as an inline assistant error instead of
  regular completion text, blocking the hydrate that was wiping it.
- preserveLocalAssistantErrors: merge by id so same-id hydrated
  messages keep their local error, and preserve the optimistic
  user+error pair as a unit (with tail-user dedupe).
- Hook all hydrate/resume writers (use-session-actions resume +
  fallback, hydrateFromStoredSession, syncSessionStateToView) into
  the merge so stale snapshots can't clobber a failed turn.
- Add error to chatMessagesEquivalent so the resume diff actually
  sees error-only changes and paints them.
- editMessage on a failed turn now submits a plain resend (no
  truncate_before_user_ordinal) and retries plainly on the
  "no longer in session history" race.

Style polish on touched files:
- Inline error: text-only treatment (no card).
- User stop / edit-composer send: shared Tabler IconPlayerStopFilled
  glyph + shared icon-button class slot for parity.

* feat(desktop): theme xterm with active light/dark mode

The right-sidebar terminal hardcoded a light palette, which read poorly
on the dark glass surface. Subscribe to `useTheme().resolvedMode` and
hot-swap `term.options.theme` so Shift+X (and any other mode change)
updates the terminal in place without tearing down the PTY session.

Dark mode uses xterm's built-in defaults (white fg/cursor + vivid ANSI
16) with just a transparent background so the glass shows through;
light mode keeps the existing hand-tuned overrides for legibility on a
bright surface.

* feat(sidebar): right-click + drag-reorder sessions and workspaces

- Wire right-click on session rows to open the same actions menu;
  suppresses the OS-native context menu so Windows stops looking awful.
- Share dropdown + context menu items via useSessionActions() driving
  a single declarative ItemSpec[]; render polymorphic over MenuItem.
- New shadcn ContextMenu primitive mirroring DropdownMenu styling.
- Restore drag-and-drop reordering for Agents (lost during the cwd
  cleanup) and add reordering of workspace groups via a right-side
  grab handle. Pinned reorder unchanged.
- Generic orderByIds<T> replaces the duplicated session/group orderers;
  useSortableBindings() hook collapses the two Sortable wrappers.
- cursor-pointer on every actionable element; cursor-grab on handles.
- KISS pass: baseName() helper, AGE_TICKS table, single WORKSPACE_PAGE
  constant, flatter SidebarSessionsSection render.

* feat(desktop): solarize the xterm palette in both light & dark

xterm's default ANSI 16 is tuned for dark and reads candy-bright on the
light glass surface (vivid cyans/greens). Ship the canonical Solarized
palette (Schoonover) for both modes — same 16 accents either way, only
fg/cursor swap between `base00/01` (light) and `base0/1` (dark), so a
prompt's colors look uniform across a Shift+X toggle.

Background stays transparent in both modes — Solarized's cream/slate
backgrounds would fight the glass.

* feat(desktop): virtualize chat thread + sidebar via TanStack Virtual

Replaces `use-stick-to-bottom` and per-row session rendering with
`@tanstack/react-virtual`, matching what Cursor uses.

Chat thread (`thread-virtualizer.tsx`):
- Natural-flow virtualization (padding spacers, not absolute items) so
  `position: sticky` on the human bubble still resolves cleanly against
  the scroller.
- Custom at-bottom anchor: pins when armed, disarms on user-driven
  upward scroll, re-arms at bottom, jumps on session switch +
  `thread.runStart`.
- Loading indicator and `--thread-last-message-clearance` move to a
  real `[data-slot=aui_composer-clearance]` node; drops the brittle
  `:nth-last-child(1 of …)` rule that can't fire reliably under
  virtualization.

Sidebar (`virtual-session-list.tsx`):
- Flat agents list virtualizes at >=25 rows; pinned and
  workspace-grouped paths stay direct-render.
- `SortableContext` keeps all IDs; only the window mounts; dnd-kit's
  `setNodeRef` is merged with `virtualizer.measureElement` so rows
  participate in both DnD hit-testing and TanStack measurement.

Drops `use-stick-to-bottom`. Streaming test gets a global
`offsetWidth/offsetHeight` stub so the virtualizer's viewport sizing
works in jsdom; the scroll-up-doesn't-pull-back invariant still passes.

* feat: more ui qa

* fix(desktop): trim sidebar terminal startup spacer

Drop zsh's initial spacer row before writing the first terminal prompt so new sidebar terminal sessions do not open with a selectable blank line.

* chore: uptick

* feat(desktop): thin installer + first-launch install.ps1 bootstrap

Converges the Windows packaged desktop installer onto a single canonical
install topology: drop the Electron shell only (~80MB instead of ~500MB),
clone Hermes Agent at a build-time-pinned commit on first launch via
install.ps1's stage protocol, and treat the resulting git checkout at
%LOCALAPPDATA%\hermes\hermes-agent\ as the canonical install location
(same path the CLI installer uses).  Future updates flow through the
existing applyUpdates() git-pull path.

Replaces the previous fat-installer architecture where the .exe bundled
a pre-staged hermes-agent source tree under resources/hermes-agent/ that
was then sync'd into ACTIVE_HERMES_ROOT at launch -- a complicated
factory-vs-active dance with several footguns (FACTORY_HERMES_ROOT
mismatch on path resolve, isGitCheckout guard regressions, pyproject
hash drift detection inside the sync loop).

Architecture overview
---------------------

  Build time
    apps/desktop/scripts/write-build-stamp.cjs writes
    apps/desktop/build/install-stamp.json with {commit, branch, builtAt,
    dirty}.  Honours $GITHUB_SHA / $GITHUB_REF_NAME in CI, falls back to
    `git rev-parse HEAD` locally.

    apps/desktop/scripts/stage-native-deps.cjs copies the runtime subset
    of @homebridge/node-pty-prebuilt-multiarch from the workspace-root
    node_modules into apps/desktop/build/native-deps/.  Workspace dedup
    hoists this dep to the root, out of reach of electron-builder's
    `files:`-restricted collector; staging gives us a deterministic
    path to extraResources.

    electron-builder ships both into resources/install-stamp.json and
    resources/native-deps/ respectively.

  Boot resolver (electron/main.cjs)
    Resolver order:
      1. HERMES_DESKTOP_HERMES_ROOT override
      2. SOURCE_REPO_ROOT (dev mode)
      3. ACTIVE_HERMES_ROOT git checkout WITH .hermes-bootstrap-complete
         marker -- the post-install fast path
      4. `hermes` on PATH (CLI-installed user adding the desktop)
      5. pip-installed hermes_cli via system Python
      6. bootstrap-needed sentinel -> hand off to runBootstrap

    Deletes the entire FACTORY_HERMES_ROOT / RUNTIME_MARKER /
    syncTreeExcludingVenv machinery (-200 lines).  The isGitCheckout
    guard that bit us in the install.ps1 PR is gone.

  First-launch bootstrap (electron/bootstrap-runner.cjs)
    1. Resolve install.ps1: prefer SOURCE_REPO_ROOT/scripts (dev), else
       download from GitHub raw at INSTALL_STAMP.commit (cached at
       HERMES_HOME\bootstrap-cache\install-<sha>.ps1).
    2. Fetch the stage manifest via install.ps1 -Manifest -Commit X
       -Branch Y.
    3. Iterate stages: install.ps1 -Stage <name> -NonInteractive -Json
       -Commit X -Branch Y per stage.
    4. On all stages green: write the .hermes-bootstrap-complete
       marker with {schemaVersion, pinnedCommit, pinnedBranch,
       completedAt, desktopVersion}.

    Per-run log to HERMES_HOME\logs\bootstrap-<ts>.log.  Cancellation
    via AbortSignal.  Manifest cache so retries don't re-download.

  Install overlay (src/components/desktop-install-overlay.tsx)
    Mounted alongside the existing onboarding overlay; flexbox card
    with header (static) + middle (scrollable) + footer (failure-only,
    static).  Subscribes to hermes:bootstrap:event IPC + resyncs from
    hermes:bootstrap:get on mount/reload.  Renders:
      - 14-stage checklist with per-stage state icons
      - Overall progress bar + current-stage spotlight
      - Auto-expanded installer-output panel on failure
      - "Copy output" button (full ring buffer + error to clipboard)
      - "Reload and retry" wired through hermes:bootstrap:reset to
        clear main.cjs's latched failure
    Synthetic empty-manifest event from main.cjs flips the overlay to
    'active' immediately so the slow install.ps1 download doesn't
    leave the user staring at the generic Preparing splash.

  Failure latching (main.cjs)
    bootstrapFailure module-scope variable holds the rejection after
    install.ps1 fails.  startHermes() throws the latched error
    immediately when set, bypassing the entire ensureRuntime +
    runBootstrap chain.  Without this, the renderer's ensureGatewayOpen
    retries would re-run install.ps1 in a 5-10 min hot loop while the
    user was still reading the failure overlay.  Cleared via
    hermes:bootstrap:reset on user-driven retry.

  Unsupported-platform overlay (1F)
    macOS / Linux packaged builds (no install.sh stage protocol yet)
    emit an unsupported-platform event with a copy-pasteable install
    command + docs URL.  Dedicated overlay branch with "Copy command"
    + "I've run it -- retry" buttons.

install.ps1 additions (Phase 1F.3 + 1F.5)
-----------------------------------------

  New -Commit and -Tag string params.  Precedence Commit > Tag >
  Branch.  Honoured by all three code paths (update / fresh clone /
  ZIP fallback), with archive URL selection that handles each
  ref-type variant.  Detached-HEAD checkouts intentionally -- they're
  pins, not branches the user pulls into.

  EAP=Continue wrap around the new pin-step git invocations.  `git
  fetch origin <commit>` writes the routine 'From <url>' info line to
  stderr; under the script's global EAP=Stop that terminates the
  script even though fetch+checkout succeed.  Matches the established
  pattern in Install-Uv, Test-Python, _Run-NpmInstall.

Backend fix (hermes_cli/web_server.py)
--------------------------------------

  CORS allow_origin_regex now accepts Origin: 'null'.  Packaged
  Electron loads index.html via file://; Chromium sets the WebSocket
  upgrade Origin header to the opaque origin 'null', which the old
  regex rejected with HTTP 403 before gateway_ws() ever ran.  This
  failure mode was masked in the older FACTORY_HERMES_ROOT
  architecture because the resolver often found an existing hermes
  on PATH with different binding behavior.

  Security maintained: localhost-only bind keeps cross-machine pages
  out; per-process session token still gates every authenticated
  /api/ endpoint regardless of Origin.

Desktop QoL
-----------

  DevTools is now enabled in packaged builds (F12 / Cmd+Opt+I).
  Field-debugging trade-off: tiny attack surface increase versus
  a much better support story when CSP / WS / theme issues surface.

  NSIS prereq-check page deleted (-767 lines).  The standard
  Welcome -> License -> Directory -> InstallFiles -> Finish wizard
  now installs without custom Python/Git/ripgrep detection -- those
  prereqs are install.ps1's job at first launch.

Test infrastructure (Phase 1G)
------------------------------

  apps/desktop/scripts/test-desktop.mjs rewritten as a cross-platform
  bundle validator (was darwin-only and asserted on dead factory-
  payload paths):
    NEGATIVE: hermes_cli/main.py is NOT shipped (regression guard)
    POSITIVE: install-stamp.json carries a real commit + branch
    POSITIVE: node-pty native deps shipped under resources/native-deps
    POSITIVE: renderer dist/index.html reachable (asar or unpacked)
  New nsis mode and npm run test:desktop:nsis script.

Validated end-to-end on clean Win10 VM
--------------------------------------

  Confirmed: NSIS installer drops Electron shell, app launches,
  install overlay shows progress, install.ps1 clones the pinned
  commit, 14 stages run to completion, marker written, backend
  spawns, WebSocket connects, onboarding overlay asks for API key,
  main UI loads, integrated terminal works.

  Failures handled: bootstrap stays failed (no hot-loop retry),
  "Copy output" gives actionable transcript, "Reload and retry"
  explicitly re-runs install.ps1.

What's deferred
---------------

  - MSIX wrapping (Phase 2): same Electron .exe under MSIX manifest
    with runFullTrust, signed and submitted to Microsoft Store.
  - install.sh stage protocol parity (Phase 2): once shipped, the
    unsupported-platform overlay becomes drive-it-yourself and
    macOS/Linux packaged installers gain feature parity with Windows.

* feat(desktop): persistent terminal pane + fullscreen takeover

Adds a VSCode-style "focus terminal" toggle to the right sidebar's Terminal
tab that takes over the chat pane area without unmounting the shell. The
xterm host is mounted once at the layout root and CSS-overlayed onto
whichever <TerminalSlot /> is currently active, so the PTY session,
scrollback, selection, focus, and WebGL renderer survive every toggle.

Also:
- WebGL renderer (matching dashboard ChatPage) so Hermes' TUI skins paint
  faithfully instead of muting through xterm's default DOM renderer
- File drag/drop from the project tree or OS into xterm — paths are
  shell-quoted (zsh/bash/pwsh/cmd) and written straight into the PTY
- Solarized dark canvas with brights promoted to real accent variants
  (Schoonover's UI-gray brights washed out every TUI accent)
- Strip NO_COLOR/FORCE_COLOR/COLORFGBG/TERM=dumb leaking from non-tty
  parents (CI runners, Cursor's agent shell) so the embedded shell gets
  truecolor regardless of how Electron was launched
- rAF-debounced ResizeObserver — running fit.fit() synchronously during
  sibling pane transitions crashed the WebGL texture-atlas rebuild

* fix(install.ps1): strip UTF-8 BOM regression that broke 'irm | iex'

The canonical install flow

    irm https://raw.githubusercontent.com/.../scripts/install.ps1 | iex

fails on PowerShell 5.1 with a cascade of 'The assignment expression
is not valid' errors at every param() default value:

    [string]$Branch = 'main',
                      ~~~~~~
    The assignment expression is not valid. The input to an assignment
    operator must be an object that is able to accept assignments...

Root cause: scripts/install.ps1 carries a UTF-8 BOM (0xEF 0xBB 0xBF)
as its first three bytes. 'irm' returns the response body as a string;
on PS 5.1 the BOM survives into that string as a leading \ufeff
character. 'iex' then evaluates the string and PS's parser chokes
on the invisible character before param() -- error recovery proceeds
into the body but every assignment is reported as broken.

This was the exact failure mode the install.ps1 hardening pass (PR
#27224) deliberately fixed by stripping the BOM and ensuring the
file body is pure ASCII. Commit 4279da4db ('fix(windows): make
PowerShell installer parse in 5.1') re-introduced the BOM later,
unintentionally undoing the irm|iex compatibility fix; the merge
that brought it into bb/gui carried it forward.

Fix: strip the three BOM bytes. File body is verified pure ASCII
(any-byte > 127 returns false), so PS 5.1 with no BOM falls back to
Windows-1252 decoding which is identical to ASCII for our content.
Both install paths now work:
  - 'irm ... | iex' (canonical CLI)
  - 'powershell -File install.ps1' (programmatic / desktop bootstrap)

* install.ps1: detect ARM64 Windows reliably for Node and Git stages

Add a Get-WindowsArch helper that reads Win32_Processor.Architecture
via CIM (invariant to PowerShell host bitness) with PROCESSOR_ARCHITEW6432
fallback. Use it in:

- Install-Git: previously only triggered the arm64 PortableGit asset
  when invoked from a native-ARM64 PowerShell host. WoW64 / emulated
  x64 hosts (the default powershell.exe on Windows-on-ARM) saw
  PROCESSOR_ARCHITECTURE=AMD64 and fell through to the x64 PortableGit
  build, leaving ARM64 users on emulated Git for Windows.

- Test-Node: previously hardcoded the Node download to win-x64 on any
  64-bit OS, so ARM64 users always got x64 Node under Prism emulation
  even though Node ships an arm64 build for Windows. The winget
  fallback now also passes --architecture arm64 on ARM64.

Python remains x86_64 by design: uv intentionally prefers
windows-x86_64 cpython on ARM64 hosts for ecosystem (wheel)
compatibility (see astral-sh/uv#19015).

* install.ps1: harden Install-SystemPackages against winget msstore failures

The previous winget invocation discarded stdout/stderr and trusted no
signal at all -- not the exit code (winget exits 0 even when it bails
"please specify --source"), not output (sent to Out-Null), not the
catch handler (winget returning 0 means no exception fires). The only
trust signal was a post-install Get-Command rg / Get-Command ffmpeg
check, which would also miss the package because %LOCALAPPDATA%\
Microsoft\WinGet\Links (where winget puts command aliases) is added to
PATH by AppExecutionAlias machinery only in fresh shells. End result on
machines where the msstore source has a cert problem (0x8a15005e --
common on Windows-on-ARM and some corporate networks): silent failure,
no log, no breadcrumb, and the user is told the install succeeded.

Specifically:

- Pin --source winget on every winget install call. Defeats the broken-
  msstore-source path. We ship nothing from msstore so this is safe and
  forward-compatible.

- Add --exact --id for a tighter package match.

- Capture each winget invocation's combined stdout/stderr + exit code to
  %TEMP%\hermes-winget-<pkg>-<n>.log instead of Out-Null. On the happy
  path the log is deleted after the post-install check confirms the
  binary is on PATH; on failure the log is kept and its path is named in
  a Write-Warn so the user has something to grep.

- Refresh PATH to include %LOCALAPPDATA%\Microsoft\WinGet\Links in
  addition to the User/Machine env-var hives, so Get-Command sees newly-
  installed winget aliases in the same process.

- No behavior change on the happy path. Same Write-Info/Success/Warn
  cadence, same fallback order (winget -> choco -> scoop -> manual),
  same $script:HasRipgrep / $script:HasFfmpeg outputs.

Verified end-to-end on a real Snapdragon ARM64 Windows host: ripgrep
uninstalled, stage re-run, [OK] ripgrep installed in 1.4s, ok:true.

* desktop: swap node-pty fork for upstream microsoft/node-pty 1.1.0

The previous dependency, @homebridge/node-pty-prebuilt-multiarch@0.13.1,
publishes no win32-arm64 prebuilds on its v0.13.x line, and its v0.14.x
betas (which do add an arm64 Windows build) ship no electron-vXXX-win32-
arm64 prebuilds at all -- so packaged Electron 40 builds (NMV 143) would
fail at runtime even on a successful npm install. Net effect: the
desktop's integrated terminal was unbuildable on Windows-on-ARM, in
both dev (npm install fails: 404 fetching the node-vXXX-win32-arm64
prebuilt) and packaged builds (no Electron-ABI prebuilt exists).

The homebridge fork was originally created because upstream node-pty
shipped no prebuilds at all. That hasn't been true since node-pty@1.0
(April 2024), which:

- bundles prebuilts for mac (arm64+x64) and Windows (arm64+x64) directly
  inside the npm tarball -- no GitHub-Releases fetch, no missing-binary
  failure mode
- uses N-API (node-addon-api) for ABI stability across Node and Electron
  major versions, so the same pty.node binary loads under Node 22 (dev)
  and Electron 40+ (packaged) without per-ABI rebuilds
- is what VS Code, Hyper, and Theia actually ship

API surface is identical (spawn / onData / onExit / write / resize /
kill) -- no call-site changes needed.

Specifically:

- apps/desktop/package.json: replace the @homebridge fork with
  node-pty@1.1.0 (exact pin). Widen `asarUnpack` from `["**/*.node"]`
  to also unpack `**/prebuilds/**`, because node-pty ships runtime-
  execed helpers alongside its .node files (darwin spawn-helper has no
  extension and would not be matched by `**/*.node`; conpty.dll,
  OpenConsole.exe, winpty.dll, winpty-agent.exe on Windows are also
  exec'd at runtime and cannot live inside asar).

- apps/desktop/electron/main.cjs: update both require() strings to
  match the new package name and the new staged path under
  resources/native-deps/node-pty/.

- apps/desktop/scripts/stage-native-deps.cjs: point at node_modules/
  node-pty. node-pty's prebuilts live under prebuilds/<plat>-<arch>/
  (not build/Release/), so update the include glob to copy that dir.
  Per-arch staging keeps the resource bundle small (target arch comes
  from npm_config_arch when electron-builder cross-builds, else
  process.arch). Explicitly enumerate file types in the prebuilds glob
  so the ~25 MB of .pdb debug symbols that prebuild-install bundles
  for Windows crash analysis don't bloat the installer (29 MB -> 2.6 MB
  staged on win32-arm64). Re-assert +x on the darwin spawn-helper
  defensively, since a stripped mode bit would manifest as a silent
  ENOENT at first pty.spawn().

- apps/desktop/scripts/test-desktop.mjs: update expectedNativeDepPaths()
  and its assertion site to look at prebuilds/<plat>-<arch>/ instead of
  build/Release/. Add an explicit spawn-helper-exists check on darwin
  so a regression in the asarUnpack glob would fail loudly in CI rather
  than at first PTY spawn.

Trade-off: Linux end-users lose prebuilts and fall back to building
node-pty from source on `npm install`. Acceptable because Hermes
ships no Linux desktop builds (desktop-release.yml matrix is mac + win
only, package.json declares no `linux` target), and Linux developers
hacking on the desktop already need a C++ toolchain for the rest of
the stack.

Verified on Windows 11 ARM64 (Snapdragon):
  npm install                                          -> exit 0
  node -e "require('node-pty').spawn(...)" round-trip  -> OK
  stage-native-deps                                    -> 27 files, 2.6 MB
  load from staged tree (simulates packaged fallback)  -> ConPTY
                                                           round-trip OK

* desktop+gateway: harden Slack socket recovery and Windows restart dedupe (#28873)

* desktop+gateway: harden Slack socket recovery and Windows restart dedupe

Fix Slack Socket Mode reliability by adding a watchdog/reconnect path so silent socket task drops no longer leave the adapter stuck. Harden Windows gateway lifecycle by avoiding desktop-binary path collisions, making gateway PID scans case/extension tolerant, and reusing in-flight restart actions to prevent duplicate gateway spawns.

* test(slack): add Socket Mode watchdog/reconnect behavioural coverage

Drive the new Slack Socket Mode self-healing logic through a fake AsyncSocketModeHandler so we can simulate the P0 silent-hang failure mode (task exit, transport disconnected, intentional shutdown, concurrent reconnect attempts) without touching real Slack.

* fix(slack,desktop): address Copilot review on watchdog races and path normalization

- connect(): explicitly cancel + await the prior socket watchdog before flipping _running, so an old monitor cannot exit between teardown and respawn (Copilot #1)
- _socket_watchdog_loop: wrap the body in try/except + add a done-callback that respawns on unexpected crash, so a transient bug cannot permanently disable self-healing (Copilot #2)
- normalizeExecutablePathForCompare: use the resolved path for realpathSync so non-string inputs cannot leak through (Copilot #3)
- Add tests for crash-recovery and atomic watchdog replacement across reconnects

* fix(slack): tighten connect() error path and clarify watchdog test intent

Address Copilot review round 2.

- connect(): wrap _start_socket_mode_handler/_ensure_socket_watchdog in a focused try/except so any failure rolls back partially-started handler/task state and leaves _running=False, ensuring the platform lock is always released by the outer finally
- Defer _running=True until after the handler is actually started so the watchdog observes a live socket task immediately and never spins against a half-built adapter
- Rename test_watchdog_self_restarts_after_unexpected_crash to test_watchdog_cancellation_does_not_respawn (matches what it actually asserts) and add test_watchdog_unexpected_exit_respawns_via_done_callback that drives a real RuntimeError through _on_socket_watchdog_done and verifies a fresh task replaces the crashed one

* fix(web_server): serialize action spawn check+store under a threading lock

Address Copilot review round 3.

FastAPI runs sync handlers on its threadpool, so two near-simultaneous /api/gateway/restart (or /api/hermes/update) requests could both observe "no live process" in _spawn_hermes_action's poll-based dedupe and double-spawn. Add a module-level _ACTION_SPAWN_LOCK around the entire check + Popen + _ACTION_PROCS store sequence so the dedupe is atomic across threads.

* fix: address Copilot review round 4

- slack.disconnect(): mirror connect()'s defensive cleanup — catch the broad Exception path on watchdog await so handler shutdown and lock release still run if the watchdog raised before cancellation took effect
- web_server._spawn_hermes_action: wrap subprocess.Popen in try/except so a missing executable / permission error closes the log file handle, writes a failure marker, and re-raises instead of leaking a file descriptor
- gateway._scan_gateway_pids: drop the over-broad "hermes.exe --profile" / "hermes.exe -p" patterns that would match any Hermes CLI subcommand using a profile flag (e.g. `hermes.exe --profile foo dashboard`); rely on the "hermes.exe gateway" + "hermes-gateway.exe" tokens instead
- tests: tighten _fake_create_task to assert coroutine input and return a real asyncio.Task that stays pending until pytest teardown, and update the three callsites whose mocked AsyncSocketModeHandler.start_async returned a non-coroutine value

* fix(slack): reset multi-workspace state on reconnect

Address Copilot review round 5.

connect() is reentrant (gateway restart, in-process reconnect), but it was leaving _bot_user_id / _team_clients / _team_bot_user_ids populated from the previous session. A reconnect that rotated the primary token or dropped a workspace would silently keep the stale bot user id and stale workspace client maps, leading to dispatch against gone workspaces.

Clear these three pieces of state right after _stop_socket_mode_handler() and before the auth_test loop, then let the loop repopulate from the current tokens. Add test_reconnect_refreshes_multi_workspace_state to lock it in.

* nix: package apps/desktop as .#desktop (#28964)

Adds nix/desktop.nix building the Electron renderer with buildNpmPackage
and wrapping nixpkgs' electron binary.  Reuses .#default by setting
HERMES_DESKTOP_HERMES to its hermes binary, so the desktop's resolver
picks up the fully-wired nix hermes (venv, bundled skills/plugins,
runtime PATH) without reimplementing agent resolution.

- nix/desktop.nix: renderer + electron wrapper
- nix/hermes-agent.nix: finalAttrs form, exposes hermesDesktop in passthru
- nix/packages.nix: exposes .#desktop + adds to fix-lockfiles
- apps/desktop/package-lock.json: standalone hermetic lockfile

nix build .#desktop && nix run .#desktop both clean.

* fix(desktop): probe steps 4 & 5 of resolveHermesBackend before trusting

A user-reported failure on Windows-on-ARM: a pre-installed Python 3.13
on PATH makes findSystemPython() succeed, so resolveHermesBackend
returns a backend pointing at it -- but hermes_cli isn't in that
interpreter's site-packages. The spawn dies with ModuleNotFoundError
and the user sees a dead GUI instead of the first-launch installer.

Same shape can hit step 4 (existing `hermes` on PATH) when a stale
shim survives a partial uninstall.

Add cheap exit-code probes -- `python -c "import hermes_cli"` for
step 5, `<hermes> --version` for step 4 -- and fall through to step 6
(bootstrap-needed) on failure. install.ps1 then runs as if on a clean
box and the venv gets built.

Probes live in a standalone electron/backend-probes.cjs module so they
can be unit-tested with node --test, same pattern as bootstrap-platform.cjs
and hardening.cjs. New test file wired into test:desktop:platforms.

* test(desktop): allow `node-pty` bare-require in packaged entrypoints

Pre-existing failure on bb/gui since c858484b4 swapped the node-pty
fork for upstream microsoft/node-pty 1.1.0. main.cjs intentionally
bare-requires node-pty (it's hoisted by workspace dedup in dev, and
staged to resources/native-deps via scripts/stage-native-deps.cjs +
extraResources for packaged builds, with a try/catch fallback at
line ~38). The allowlist hadn't been updated to match -- same shape
as `electron`, which was already allowed.

* chore(deps): refresh root lockfile for dashboard @nous-research/ui 0.14.0

apps/dashboard/package.json was bumped to @nous-research/ui 0.14.0 (+
flag-icons ^7.5.0, motion ^12.38.0) but the root package-lock.json was
never refreshed. Running `npm install` from the repo root now
materialises 0.14.0's transitive closure (launder, bumps for
@nanostores/react, nanostores, sanitize-html, tailwind-merge).

No code changes; purely a lockfile catch-up so fresh checkouts on bb/gui
get a working dashboard install.

* chore(desktop): bump version to 0.0.1

First non-placeholder version so electron-builder's artifactName template
produces `Hermes-0.0.1-win-x64.exe` instead of the obviously-unreleased
`Hermes-0.0.0-...`. No release process yet; this just stops the artifact
filename from telling users "you got a debug build."

Bumped in three slots that all carry the desktop app's version:
- apps/desktop/package.json (source of truth)
- apps/desktop/package-lock.json (per-app lockfile, kept for CI parity)
- root package-lock.json's apps/desktop workspace entry

Identity-of-build for first-launch bootstrap continues to come from
build/install-stamp.json (commit SHA + builtAt), unchanged.

* fix: fs icon color

* perf(desktop): cut per-keystroke layout + listener churn in chat composer

Empirical work via CDP harnesses under apps/desktop/scripts/ (see
profile-typing-lag.md):

  jsListeners growth (per round of 200 chars + GC):
    before: +35  (verified leak — listeners stuck after 1st trigger popover use)
    after:  +0

Four narrow edits in src/app/chat/composer/index.tsx:

1. Drop the per-keystroke `editorRef.current.scrollHeight` read used to
   decide composer expansion. Replace with `draft.length > 60` heuristic;
   the existing ResizeObserver still catches edge cases. `scrollHeight`
   is a forced-layout call and was firing on every char until the first
   wrap.

2. Bucket measured composer height to 8px before writing
   `--composer-measured-height` / `--composer-surface-measured-height`
   on `documentElement`. Without this, the editor grows ~1px per char,
   setProperty fires every keystroke, computed style is invalidated tree-
   wide.

3. Remove the dead `$composerDraft` two-way sync. Nothing outside the
   composer subscribed to that atom (verified via grep). Two useEffects
   on `[draft]` were pushing draft→atom and atom→aui per keystroke for
   no consumer. Also drop the per-keystroke
   `reconcileComposerTerminalSelections` call; it was pruning stale
   labels for `terminalContextBlocksFromDraft`, but that helper already
   ignores labels not in the current submitted text, so pruning per
   keystroke was just bookkeeping.

4. `refreshTrigger` fast-bails when the draft contains neither `@` nor
   `/`. Previously `textBeforeCaret(editor)` ran on every input/keyup
   regardless; `range.toString()` inside is O(n) over draft length.

Synthetic typing latency p50/p90/p99 is similar before vs after on a
freshly-loaded session (Blink can already handle ~30cps typing into a
contentEditable on its own); the real win is the listener leak being
gone and the global computed-style invalidations dropping ~8× when the
composer is sitting at a fixed height row.

The `Enter → stall` follow-up (see profile-typing-lag.md §"Submit /
TTFT stall") is unmeasured here — needs a throwaway session because
the harness fires a real prompt. Not blocking this commit.

* perf(desktop): cut FadeText forced layouts during streaming

The slowest user-felt path is typing into the composer while the
assistant is streaming. Profile (scripts/profile-under-stream.mjs):

  FadeText measureOverflow self time:  35.8 ms → 18.1 ms  (-50%)
  total active CPU during 7s window:   ~150 ms → ~50 ms

Two changes in src/components/ui/fade-text.tsx:

1. Drop the `useEffect([children])` that re-ran `measureOverflow`
   (reads scrollWidth + clientWidth — forced layout) on every parent
   re-render. `useResizeObserver` already fires the same callback on
   mount and whenever the host span's box size changes; that covers
   the only case where overflow state can legitimately change. The
   previous explicit useEffect was a forced-layout flush on every
   parent render, which during streaming meant every token tick.

2. Wrap the component in `memo` with a custom comparator that
   short-circuits the entire render when scalar string `children` and
   the className/fadeWidth/style props are unchanged. The hot path
   was tool-fallback's title chips being re-rendered by parent
   streaming updates even though their text was stable; memo+
   comparator skips that.

Also adds two harness scripts under apps/desktop/scripts/:
  - latency-under-stream.mjs (key→paint latency while a turn streams)
  - profile-under-stream.mjs (CPU profile while a turn streams)

Updates profile-typing-lag.md with the streaming numbers and confirms
the Enter→paint submit path is already fast (≤320ms on the populated
session; the 2s "stall after Enter" the user noticed once was a
one-time cold-start, not reproducible at the UI layer).

I'd guess the felt jank in real use is fast-burst typing during a
long-form streaming reply (code blocks + markdown lists multiply the
per-token render cost). The CPU savings here scale linearly with
token volume.

* chore(desktop): drop diag scratch scripts no longer needed

* docs(desktop): correct leak-typing numbers on a real session

Re-ran the leak harness on a populated session (Phaser thread) for both
unpatched and patched builds. The original 'listener leak' was transient
warm-up cost, not a steady-state leak — both versions show 0 listener
growth/round in steady state.

The load-bearing number is forced layouts per character:
  unpatched (HEAD~2):  7.02 layouts/char
  patched   (HEAD):    2.35 layouts/char  (3× fewer)

The patches reduce per-char forced-layout work to Blink's natural floor.
Document node count and heap are flat in both builds.

* perf(desktop): fix "Enter jumps up" on long threads

User reported: after pressing Enter on a long thread, the view jumps up
— the just-submitted message disappears below the fold. Confirmed via
apps/desktop/scripts/measure-jump.mjs:

  before:  distFromBottom 0 → 49.5px, sticks there permanently
  after:   distFromBottom 0 → ~0 (worst case 4px for one frame)

Root cause in useThreadScrollAnchor (thread-virtualizer.tsx):

1. The sticky-bottom logic disarmed on any scroll event where
   `scrollTop < lastTopRef.current`. That check can't distinguish a
   user scrolling up from a programmatic `pinToBottom` write that
   the browser clamped short of bottom (because content also grew in
   the same frame, so `scrollTop = scrollHeight` lands at
   `scrollHeight - clientHeight` for the OLD scrollHeight, which is
   now below the NEW scrollHeight). Result: sticky-bottom disarmed
   permanently on the user's first submit.

2. There was no synchronous pin tied to React's commit phase. By the
   time the ResizeObserver fired and re-pinned, the user had already
   seen ~50ms of "message below the fold" — visually that reads as the
   view jumping up.

Fix:

- `programmaticScrollPendingRef` counter tracks scroll events we
  expect to be ours (one per `pinToBottom` write). The scroll handler
  skips the disarm check when consuming a pending tick, keeps the
  arm bit true, and re-pins synchronously if the browser clamped us
  short of bottom. A depth cap (8) breaks runaway loops in
  pathological streaming-burst layouts.

- `useLayoutEffect` on `groupCount` increase pins BEFORE the browser
  paints, eliminating the visible ~50ms window between optimistic
  user-message insert and the RO/scroll-event chain firing.

Verified on the long Cloud Shadows thread (7-8 turns, ~11k px tall):
all three repro runs now hold within 0–4 px of bottom across the
post-Enter transition. Submit latency unchanged (paint 77–107 ms),
streaming-typing latency unchanged.

Also adds three debug harnesses:
  - measure-jump.mjs   — sample thread scroll across Enter
  - probe-thread.mjs   — dump current thread / scroll state
  - diag-jump.mjs      — intercept scrollTop + RO + mutations across Enter

* perf(desktop): rate-limit thread auto-pin during streaming

Follow-up to the Enter-jump fix. The first version did a synchronous
re-pin loop inside the on-scroll handler when the browser clamped our
`scrollTop = scrollHeight` write short of the new bottom; that gave a
tight 4 px visible jump on Enter, but during streaming the
ResizeObserver fires many times per second as content grows, and each
RO callback re-entered the pin loop. CPU profile showed
`Virtualizer.getMaxScrollOffset` climbing to 22 ms self over a typing-
during-streaming window — the sync re-pin path was paying tanstack-
virtual's recompute cost ~3× per token.

Re-architect:

- RO callback coalesces to one pin per animation frame. Streaming-rate
  RO bursts now cost the same as a single per-frame pin.
- The on-scroll programmatic-counter guard remains (it's what prevents
  the false-disarm bug when the browser clamps a write). It no longer
  does sync re-pins; the next RO/rAF will catch up.
- The useLayoutEffect on groupCount (the path that fires on user
  submit / new turn arrival) ALSO schedules one rAF pin in addition to
  the synchronous pin. This catches the case where React mounts the
  new message in a second commit (after our layout effect ran), which
  grows scrollHeight again. Two pins instead of a tight loop, paid only
  once per turn change.

Net effect on the Cloud Shadows long thread:

  enter-jump transient:   12–20 px for 1 frame (was 49 px permanent)
  CPU during stream+type: `getMaxScrollOffset` dropped out of top-5
                          self-time list
  typing-during-stream:   p50 ~10 ms paint, p99 ~20 ms (1 frame),
                          occasional 40 ms+ outliers during burst
                          token arrivals

Also adds scripts/profile-long-stream.mjs: 20-second streaming profile
with per-500ms FPS histogram + content-length tracking, so we can see
whether streaming render cost grows with message length (it doesn't —
sustained 60 fps).

* perf(desktop): use textContent for trigger precondition

Replace composerPlainText() call inside refreshTrigger's no-trigger
fast-bail with a textContent check. textContent is a browser-native
flat traversal; composerPlainText walks recursively with chip-aware
logic. We only need to know if @ or / appears; either way the trigger
char will be in textContent because chips contain @ in their refText.

Profile shows composerPlainText was ~18ms self over a 12s typing-during-
stream window, called from refreshTrigger on every keystroke. Most of
that was the precondition check (the trigger detection path is the
slow path but only runs when a trigger char is present).

* Revert "perf(desktop): use textContent for trigger precondition"

This reverts commit a6a78ff08a.

* Revert "perf(desktop): cut FadeText forced layouts during streaming"

This reverts commit 88e7d7537c.

* Revert "perf(desktop): cut per-keystroke layout + listener churn in chat composer"

This reverts commit bff1b3261d.

* Revert "Revert "perf(desktop): cut per-keystroke layout + listener churn in chat composer""

This reverts commit b7b378e3a4.

* Revert "Revert "perf(desktop): use textContent for trigger precondition""

This reverts commit 0739588f48.

* chore(desktop): synthetic-stream perf harness + scripts

Drops the React `<Profiler>` approach (no-op because Vite is currently
serving the production React build) in favor of an externally-observable
measurement stack: rAF frame intervals, `PerformanceObserver({entryTypes:
['longtask']})`, and a `MutationObserver` on the live streaming message.

Adds a synthetic stream driver — `window.__PERF_DRIVE__.stream({...})` —
that pushes tokens through the live `$messages` atom at a controlled rate,
so the assistant-ui runtime, incremental repository, and Streamdown
markdown pipeline see the same workload they'd see during a real LLM
stream, without the LLM cost.

The driver lives in `src/app/chat/perf-probe.tsx`; `main.tsx` side-imports
it under `import.meta.env.MODE !== 'production'` so it tree-shakes out of
prod builds. (Using `MODE` rather than `DEV` because our Vite setup
currently reports `DEV=false` even under `vite dev` — see the dev-build
note in `profile-typing-lag.md`.)

Scripts:
  - measure-synthetic-stream.mjs  drive synthetic + record frame/longtask/mutation
  - profile-synth-stream.mjs      CPU profile + top self-time during synthetic
  - measure-real-stream.mjs       same harness, real LLM stream
  - profile-real-stream.mjs       CPU profile bracketing the real stream window
  - eval.mjs / reload.mjs         small CDP helpers

A real-LLM measurement on Cloud Shadows (gpt-4o-mini, 39 s window) showed
12 longtasks in the same 75-127 ms range the synthetic predicted, so the
synthetic is a faithful proxy.

* perf(desktop): memo FadeText so it skips re-renders when text unchanged

FadeText is used 110+ times inside `tool-fallback.tsx` on a tool-heavy
thread. During streaming each parent re-render previously triggered the
component's `useEffect([children])`, which forced a `scrollWidth` layout
read even when the title text was unchanged. The `useResizeObserver` was
already covering the genuine resize case, so that effect was strictly
redundant work.

Drops the effect and wraps the component in `React.memo` with a custom
comparator that field-compares `className`, `fadeWidth`, and `style`,
plus identity-compares `children` (scalar fast-path; correct for JSX
nodes too since a new node should force a re-render).

Verified via temporary render counter on the 34 MB
`session_20260514_215353_fe0ac8` thread (110 FadeText instances): a
2 s synthetic stream went from ~11k FadeText render calls to 122 —
roughly one render per truly-new instance instead of one per parent
commit per instance.

Doesn't move the longtask needle on its own (Streamdown's markdown
re-parse dwarfs it) but eliminates a steady CPU floor and a class of
forced layouts during streaming. Profile-typing-lag.md documents the
full investigation, including the remaining Streamdown cost as the
real source of the perceived "5 fps moment" hitches.

* perf(desktop): memoize MarkdownText plugins to stop churning Streamdown

The inline `plugins={{ math: mathPlugin, ...(isStreaming ? {} : { code }) }}`
on `<StreamdownTextPrimitive>` constructed a new object literal on every
parent render. That broke `<Streamdown>`'s outer memo and forced its
internal `rehypePlugins` / `remarkPlugins` array useMemos to rebuild,
which propagates a new identity into every `<Block>` and defeats Block's
memoization for stable historical blocks.

After memoizing on `[isStreaming]` (the only real dimension of variance),
CPU profile during a 5 s synthetic stream on the 34 MB session shows
`parser` self-time dropping out of the top 10, `compile` cut roughly in
half, and `bn$1` / `m$1` (micromark internals) leaving the top entries.

Doesn't move the visible longtask count on its own — Streamdown's
per-Block parse cost still dominates whenever the last block's content
changes — but it removes a class of unnecessary re-parses for historical
blocks during streaming. See `scripts/profile-typing-lag.md` for the
full investigation.

* perf(desktop): floor assistant-text flush gap to 33ms for predictable batching

`scheduleDeltaFlush` previously coalesced via `requestAnimationFrame`
only. The "at most one flush per frame" guarantee that gives you is fine
for fast streams (>~80 tok/sec) where multiple tokens arrive within a
single frame, but breaks down at typical LLM token rates (30-80 tok/sec)
where each token arrives slower than the rAF cadence and triggers its
own React commit + Streamdown markdown re-parse.

Track `lastFlushAt` and require at least 33 ms between two flushes.
React 18+ auto-batching probabilistically already collapsed some of
these, but the floor makes it deterministic.

A/B on the 34 MB session, 300 tokens at 50 tok/sec (markdown chunks):

| | avgFps | p99 frame | LTs / 5 s | max LT |
|---|---|---|---|---|
| no floor (current rAF) | 54.0 | 38 ms | 2.0 | 145 ms |
| 33 ms floor (this PR) | 54.3 | 41 ms | 1.7 | 110 ms |

`inter-mutation` p50 also tightens from 22-28 ms to a clean 33 ms,
which is the expected signature of a deterministic floor. Doesn't fully
solve the user's perceived hitches — Streamdown's per-Block parse cost
when the last block grows past ~2 k chars is still the elephant — but
it consistently shaves the worst-case longtask and makes the streaming
cadence visibly steadier.

Also threads a matching `flushMinMs` option through the synthetic
stream driver in `perf-probe.tsx` + `scripts/measure-synthetic-stream.mjs`
so the harness can A/B both regimes without spending LLM credits.

See `scripts/profile-typing-lag.md` for the full investigation.

* perf(desktop): useDeferredValue for streaming markdown so parses don't block input

Streamdown's per-Block parse cost grows with the live tail's length and
is unavoidable inside the block-memo pattern (industry standard, see
findings doc). The fix is to stop having that work block the main thread.

`<DeferStreamingText>` is a 12-line wrapper that reads message-part state
via `useMessagePartText`, runs it through `useDeferredValue`, and
re-publishes via assistant-ui's `<TextMessagePartProvider>`. The inner
`<StreamdownTextPrimitive>` reads the deferred value through the normal
`useMessagePartText` hook — no fork, no internal-path imports, fully on
assistant-ui's public API. React's concurrent scheduler then:

  - abandons in-flight deferred renders when a newer token arrives, so
    intermediate states get skipped under fast streams
  - deprioritises the markdown render when the main thread has urgent
    work (typing, scroll), so input stays responsive even while a
    100ms parse is queued

Streamdown already uses `useTransition` for its block-array setState;
this lifts the deferral up to the consumer boundary so it covers the
whole pipeline (preprocess → split → repair → parse → render).

A/B on the 34 MB session, 300 tokens at 50 tok/sec, markdown chunks
(four trials each, with the 33ms flush throttle on for both):

| | avgFps | p99 frame | LTs/5s | max LT | typing-while-stream p95 |
|---|---|---|---|---|---|
| pre  | 54.3 | 41 ms | 1.7 | 110 ms | ~17 ms |
| post | 58.5 | 31 ms | 2.0 | 117 ms | 14-18 ms |

Longtask count + max LT unchanged — useDeferredValue doesn't reduce
CPU, only its priority. The avgFps lift and p99 frame drop are the
proof that the existing CPU is no longer blocking 60 fps cadence. One
clean run logged MUTATIONS=0 — React skipped every intermediate text
state and only committed the final one (textbook deferred-value
behaviour).

The actually-reduce-CPU path is replacing the parser with a state
machine like Flowdown — left for a future PR; see
`apps/desktop/scripts/profile-typing-lag.md` for the full investigation.

* feat(desktop): add hermes gui launcher

* feat(desktop): launch packaged gui builds by default

* bump gui version to 0.0.2

* fix(dashboard): allow file:// origin on loopback WS + diagnostic logging

Upstream commit 2e66eefbc ("fix(dashboard): validate WebSocket Host
and Origin") added a WebSocket Host/Origin guard to block DNS
rebinding against the dashboard.  The guard rejects any Origin whose
scheme is not http/https or whose netloc is empty — which includes
Electron's renderer Origin: file:// when the desktop app loads its
bundle from disk in production mode.

That makes the bb/gui Electron desktop unable to open the gateway
WebSocket against the embedded backend on Windows / macOS prod
builds.  The renderer reports "Desktop boot failed" and the backend
logs:

  WARNING hermes_cli.web_server: gateway-ws reject
      peer=127.0.0.1:NNNN reason=non_loopback_or_bad_origin
      bound_host=127.0.0.1 close_code=4403

DNS-rebinding requires a DNS-resolvable hostname; file:// has no
host component and therefore cannot be the attack vector this guard
exists to block.  When bound to a loopback interface (127.0.0.1 /
::1 / localhost), accept file:// origins so desktop wrappers can
attach.  Non-loopback binds (operator opted into network exposure)
keep rejecting file:// — the loose policy doesn't apply.

Also adds per-reason diagnostic logging in
_ws_host_origin_is_allowed, so future ws-guard rejections name the
specific clause that fired (bad_host / bad_origin_scheme /
origin_host_mismatch) instead of the opaque
"non_loopback_or_bad_origin" surfaced at the call site.

Verified against tests/hermes_cli/test_web_server_host_header.py
(all 11 upstream tests still pass) and hand-tested by opening the
bb/gui Electron desktop dev build against the patched backend.

* fix(tui_gateway): restore _content_display_text helper

Bb/gui had dropped the helper but the orchestrator code merged from main
still calls it (_inflight_text, _message_preview). Re-add the definition
verbatim from main so session.create / _start_inflight_turn don't crash
with NameError on first prompt submit.

* fix(tui-gateway): restore _content_display_text helper lost in main merge

The May 27 merge of origin/main into bb/gui re-introduced two callers of
_content_display_text (in _inflight_text and _history_to_messages) but
dropped the helper definition itself, leaving an unresolved reference.

NameError fires on every user message via _start_inflight_turn ->
_inflight_text, taking down both the TUI and the desktop (which share
this gateway backend) the moment input is dispatched.

Restores the helper verbatim from main (commit 36c99af37) -- pure
structured-content text extractor, no other dependencies.

* fix(telegram): import Set for _dm_topic_chat_ids annotation

self._dm_topic_chat_ids: Set[str] = {...} at line 460 references Set
but only Dict, List, Optional, Any are imported from typing. The file
has no 'from __future__ import annotations', so the annotation is
evaluated at runtime and raises NameError on TelegramAdapter
construction.

* fix(setup): drop shadowing inner importlib.util re-imports

_print_setup_summary and _setup_tts_provider each had 'import
importlib.util' inside a try: block nested deeper in the function
body. Python flips importlib to function-local for the whole scope,
so earlier references in the same function (the neutts branches at
lines 493 / 1109) hit UnboundLocalError before the late import can
run.

The top-of-module 'import importlib.util' at line 14 already covers
both call sites, so dropping the redundant inner imports restores
the intended behavior.

* feat(install.ps1): add -IncludeDesktop switch + Stage-Desktop

The new Hermes-Setup.exe (Tauri bootstrap installer) passes -IncludeDesktop
so users who install via the GUI end up with a launchable Hermes.exe at
apps/desktop/release/<os>-unpacked/. Existing flows are unchanged:

  * The 'irm install.ps1 | iex' CLI one-liner omits the flag — terminal
    users don't need a prebuilt desktop binary; 'hermes desktop' builds
    on demand.
  * The Electron desktop's bootstrap-runner.cjs also omits the flag —
    rebuilding apps/desktop from inside a running Hermes.exe would try
    to overwrite the live binary on disk and fail.

Stage-Desktop runs after Stage-NodeDeps so workspace npm is already
installed when electron-builder fires. It does:
  1. 'npm install' at repo root so apps/* workspaces resolve their deps
     (Electron itself arrives via npm here, ~150MB)
  2. 'npm run pack' in apps/desktop (tsc + vite + electron-builder --dir)
  3. Probes apps/desktop/release/{win-unpacked,win-arm64-unpacked}/Hermes.exe

The --dir mode produces an unpacked launchable binary without an NSIS/MSI
installer artifact — we don't need one because Hermes-Setup.exe spawns the
unpacked binary directly via launch_hermes_desktop.

* feat(installer): Tauri bootstrap installer for first-time onboarding

Hermes-Setup.exe is a small signed Rust+Tauri binary that drives
scripts/install.ps1 stage-by-stage with a native UI matching the
desktop's design language. Replaces the chicken-and-egg pattern of
shipping a 200MB Electron app whose first launch existed only to
run install.ps1.

The architecture:

  Rust backend (src-tauri/):
    bootstrap.rs        orchestrator -- Tauri commands, stage iteration
    install_script.rs   resolve install.ps1 (dev checkout, cache, GitHub raw)
    powershell.rs       spawn powershell, line-stream stdout/stderr, parse JSON
    events.rs           BootstrapEvent types -- mirror bootstrap-runner.cjs
    paths.rs            HERMES_HOME resolution + tracing log setup
    build.rs            bakes BUILD_PIN_COMMIT / BUILD_PIN_BRANCH from
                        'git rev-parse HEAD' at compile time

  React frontend (src/):
    Tauri webview rendering 4 screens (welcome / progress / success /
    failure), driven by nanostores subscribing to the Rust event stream.
    Visual layer reuses the desktop's styles.css wholesale via @import
    so the installer and desktop never drift visually.

  Distribution:
    targets = ['app', 'dmg', 'appimage'] -- no NSIS/MSI wrapper. The
    raw target/release/Hermes-Setup.exe IS the artifact on Windows;
    .dmg + .app on macOS; AppImage on Linux. One file, double-click,
    no installer-installing-an-installer pattern.

  Compile-time pinning:
    build.rs reads 'git rev-parse HEAD' and emits
    cargo:rustc-env=BUILD_PIN_COMMIT=<sha> + BUILD_PIN_BRANCH=<branch>.
    bootstrap.rs's option_env!() picks these up so the binary fetches
    install.ps1 from the exact SHA it was tested against. CI / release
    builds can override via HERMES_BUILD_PIN_COMMIT env var.

  Windows manifest:
    hermes-setup.manifest declares level='asInvoker' so the
    productName 'Hermes Setup' doesn't trip Windows's installer-
    detection heuristic and refuse to launch without elevation.
    Also declares PerMonitorV2 DPI + UTF-8 active code page + Common
    Controls v6.

Limitations of this initial version:

  * No code signing -- Windows SmartScreen will warn once on Hermes-Setup.exe
    ('More info -> Run anyway'). The downstream binaries it produces
    (Hermes.exe in win-unpacked/, the hermes CLI) are locally-built and
    therefore don't carry MOTW, so they launch without SmartScreen
    intervention. Cert procurement tracked separately.

  * macOS and Linux build paths defined but untested -- Windows-only V1.

* fix(installer): pass -IncludeDesktop to manifest, surface launch errors, alias hermes desktop

Three bugs found in the first VM end-to-end test:

1. install.ps1 -Manifest was called WITHOUT -IncludeDesktop, so the
   manifest came back with the 14-stage list (no desktop stage), the
   UI showed '14 steps' and Stage-Desktop never ran. Pass the flag to
   both the manifest fetch and the per-stage runs — install.ps1 gates
   the desktop stage's inclusion on the flag.

2. The Success screen's Launch button silently swallowed the Tauri
   error when no Hermes.exe existed (e.g. Stage-Desktop was skipped).
   Wire the error through to inline UI with an alert callout, so the
   user gets actionable text ('Hermes.exe missing, run hermes desktop
   from a terminal') instead of an unresponsive button.

3. The Success screen tells users to run 'hermes desktop' from a
   terminal but the CLI only accepted 'hermes gui' — invalid choice
   for 'desktop'. Rename the subcommand canonically to 'desktop' with
   'gui' as a backwards-compatible alias. Update the _SUBCOMMANDS sets
   used by session-flag arg parsing + logging-mode probe so both names
   route to the same logic.

* fix(install.ps1): pre-warm electron-builder winCodeSign cache + fix Stage-Desktop $HasNode false-skip

Two bugs caught in the second VM end-to-end run:

1. electron-builder's winCodeSign extraction fails on grandma-class
   Windows boxes because the .7z archive contains macOS symlinks
   (darwin/10.12/lib/libcrypto.dylib and libssl.dylib pointing at
   versioned siblings). Creating symlinks on Windows requires
   SeCreateSymbolicLinkPrivilege, a per-user right that non-admin
   accounts don't have on stock Windows. Result: every fresh install
   on a non-admin user fails Stage-Desktop with a 7-Zip 'cannot create
   symbolic link' error, retried four times, then bails.

   Fix: Initialize-ElectronBuilderCache pre-extracts winCodeSign-2.6.0.7z
   ourselves with -snl (don't preserve symlinks, store as resolved file
   content) AND -x!darwin (skip the entire macOS subtree — irrelevant
   on Windows). Writes to electron-builder's expected cache dir before
   electron-builder gets a chance to try its own broken extraction.
   Idempotent — fast-paths via signtool.exe sentinel check.

2. Install-Desktop's first guard was 'if (-not $HasNode) skip'.
   $HasNode is set by Stage-Node into $script:HasNode, but in
   cross-process driver mode (each -Stage NAME is a fresh powershell.exe
   spawned by Hermes-Setup.exe), that script-scope variable from the
   PREVIOUS process is invisible — so the guard always fired and
   Install-Desktop returned in 900ms with a misleading
   'Node.js not available' reason. The real npm probe below it never
   got to run. Fix: re-probe npm directly via Get-Command when $HasNode
   is empty/false, since by that point Stage-Node has already verified
   Node is installed and the only question is whether *this* process
   can see it on PATH (it can — installer-wide PATH update from Stage-Node).

* fix(install.ps1): tell electron-builder we're NOT signing instead of pre-extracting winCodeSign

The previous commit (c7e46f9f3) worked around the winCodeSign-symlinks-
on-Windows extraction crash by pre-extracting the archive ourselves with
-snl + -x!darwin. That fix was correct but addressed the wrong layer.

The deeper question: why was electron-builder fetching winCodeSign at all
when we have no signing cert configured? Answer: electron-builder
unconditionally pre-warms the toolchain assuming any build MIGHT sign.
The cert auto-discovery never finds anything (we never set CSC_LINK
or anything else), so the signing never happens — but the 100MB fetch
of winCodeSign and its broken-on-Windows symlink extraction does.

Set CSC_IDENTITY_AUTO_DISCOVERY=false (with WIN_CSC_LINK and
WIN_CSC_KEY_PASSWORD also explicitly cleared as belt-and-suspenders)
before invoking npm run pack, and electron-builder skips the entire
winCodeSign apparatus. No download, no extraction, no privilege check.
Env vars are saved/restored around the invocation so we don't leak
the override into Stage-PlatformSdks etc.

Net: removes the 100-line Initialize-ElectronBuilderCache helper that
manually downloaded + extracted winCodeSign-2.6.0.7z. Replaced with
3 env-var assignments. The produced Hermes.exe is functionally
identical — just no longer carries a code-signing-machinery dependency
we never used.

* fix(installer): bump bootstrap-installer.log to capture stage transitions + every install.ps1 line

Diagnosing the second VM failure was impossible because bootstrap-installer.log
contained only the 'starting' banner. Two causes:

1. emit_log() inside run_bootstrap() was tracing::debug! — dropped on the
   floor under the default INFO env-filter.

2. The per-stage sink callbacks (on_stdout_line / on_stderr_line) only
   emitted Tauri events to the frontend; they never tee'd to the log file
   at all. When the failure route mounts, the Tauri event stream is the
   only place the script output lived, and it gets discarded.

3. The Failed / Stage / Manifest / Complete lifecycle frames in emit_event()
   were also Tauri-only — so even the 'which stage failed' frame never
   reached the log.

Fixes:
  * emit_log() → tracing::info!
  * Sink callbacks tee stdout to info!, stderr to warn!, with stage label
    as a structured field for grep'ability
  * emit_event() now matches on the variant and logs each lifecycle frame
    at the right level: Failed → tracing::error!, others → info!

Result: a failing install leaves a complete forensic trail in
bootstrap-installer.log — manifest stage list, every install.ps1
stdout/stderr line tagged by stage, the stage transitions, and the
final error. Same path as before so nothing the user does changes.

* fix(install.ps1): Stage-NodeDeps cross-process $HasNode + stream npm install output to bootstrap log

VM run 3 diagnosis: node-deps stage skipped on the VM (logged
'Skipping Node.js dependencies (Node not installed)') and then
desktop's npm install failed with exit 1 and zero diagnostic detail.

Two root causes:

1. $HasNode false-skip in Stage-NodeDeps — same cross-process bug
   pattern we fixed for Stage-Desktop in c7e46f9f3. Stage-Node ran
   in process A and set $script:HasNode = $true, then exited. Stage-
   NodeDeps ran in fresh process B (Hermes-Setup.exe -Stage NAME
   spawns each stage independently), where that variable doesn't
   exist. Re-probe via Get-Command npm instead of trusting the
   stale script-scope global. The previous stage already verified
   Node so the re-probe succeeds.

2. npm install --silent + Tee to TEMP file hid the real error.
   When the workspace install failed on the VM, the actual reason
   was buffered in $env:TEMP\hermes-npm-desktop-install-*.log and
   the user saw only 'exit 1'. Drop --silent so npm streams its
   full output, drop the TEMP-file dance — the Tauri installer's
   streaming sink already tees every stdout/stderr line to the
   rolling bootstrap-installer.log, so a side log file is dead
   weight that hides the very error we need.

After this, the bootstrap log on a failure will contain npm's full
output (deprecation warnings, ETARGET, native-module compile errors,
whatever) tagged with stage=desktop, making the actual cause
diagnosable instead of an opaque exit code.

* fix(install.ps1): restore Initialize-ElectronBuilderCache (CSC env vars alone aren't enough)

VM run 4 diagnosis: even with CSC_IDENTITY_AUTO_DISCOVERY=false set,
electron-builder still fetches winCodeSign and signs bundled binaries.
The log shows the signing happens BEFORE the cache extraction:

  • signing with signtool.exe  ...\winpty-agent.exe
  • signing with signtool.exe  ...\OpenConsole.exe
  • downloading winCodeSign-2.6.0.7z
  • <symlink privilege error>

Cause: node-pty's bundled prebuilds are listed in apps/desktop's
asarUnpack ['**/*.node', '**/prebuilds/**']. electron-builder
re-signs anything unpacked from asar, regardless of whether OUR
binary gets signed. The signtool invocation needs winCodeSign on
disk, which needs the .7z extracted, which hits the macOS-symlink
crash on non-admin Windows.

The CSC env vars I added in d5fe46727 only kill IDENTITY DISCOVERY
(so OUR Hermes.exe stays unsigned, which is fine — we have no cert).
They don't prevent the toolchain fetch for the bundled-prebuild
re-sign. I removed the pre-extract in d5fe46727 thinking the env
vars subsumed it; that was wrong. Both are needed.

Restoring Initialize-ElectronBuilderCache verbatim from c7e46f9f3
and keeping the CSC env vars. Wrote a clearer doc-comment at the
call site explaining the two-knob interaction so future maintainers
don't drop one half again.

* fix(desktop): disable signtool via signtoolOptions.sign=null, drop dead winCodeSign pre-extract

VM run 5 diagnosis: the pre-extract from 3b29e65c1 ran (extracted 83
files, 24MB) but produced ZERO files at the expected sentinel path
'/winCodeSign-2.6.0/windows-10/x64/signtool.exe'.

Cause: the .7z archive's root entries are 'windows-10/', 'darwin/',
'linux/', etc. — not 'winCodeSign-2.6.0/<arch>'. Extracting with
'-o$cacheRoot' put files at $cacheRoot/windows-10/..., NOT at
$cacheRoot/winCodeSign-2.6.0/windows-10/.... I had the directory
nesting wrong from the start.

And then we observed: electron-builder downloads winCodeSign-2.6.0.7z
under a random numeric filename ('384387955.7z') regardless of what's
already extracted in the parent dir. The cache key isn't the dirname;
it's content-addressed. So the pre-extract approach was doomed even
if the path nesting had been right.

Actual fix: signtoolOptions.sign=null in apps/desktop/package.json's
win build config. electron-builder honors this and skips the bundled-
prebuild signing entirely — no signtool invocation, no winCodeSign
fetch, no symlink-privilege crash. The previous failures all stemmed
from electron-builder pre-signing node-pty's bundled .exes
(winpty-agent.exe, OpenConsole.exe) which are already author-signed
upstream; re-signing with our nonexistent cert was overwriting good
sigs with nothing useful anyway.

Cost: when we DO get a real cert later, we'll add it back with the
sign function pointing at the cert chain. Until then, all-null is
the correct config and unblocks every non-admin Windows user.

Removed Initialize-ElectronBuilderCache (the dead pre-extract).
Removed the call site. Kept the CSC_IDENTITY_AUTO_DISCOVERY env
vars as belt-and-suspenders against a future electron-builder
change that might revive cert auto-discovery.

* fix(desktop): use no-op sign function instead of sign=null

VM run 6 still hit the symlink crash even with signtoolOptions.sign=null.
electron-builder 26.8.1 treats null as 'use the default signtool path'
rather than 'skip signing', so the winCodeSign fetch + extraction still
fired for the bundled prebuild re-sign.

The Electron docs (electronjs.org/docs/latest/tutorial/code-signing)
make it clear signing is OPTIONAL and unsigned apps work fine — users
just see SmartScreen on first launch. The electron-builder mechanism
for 'don't actually sign anything' is to supply a custom sign function
(via signtoolOptions.sign: '<path-to-cjs-module>') that resolves
without invoking signtool.

build-noop-sign.cjs is that module — a 5-line async function that
returns undefined. electron-builder calls it for every binary it would
have signed, gets back a resolved promise, and considers each binary
'signed.' No signtool spawn, no winCodeSign fetch, no symlink crash.

When Nous's cert arrives, replace this file with a real signing hook
(@electron/windows-sign-based or a direct signtool invocation). The
architecture's signing-ready and the cutover is a one-file edit.

* fix(desktop): signAndEditExecutable=false to skip signtool path entirely

After reading app-builder-lib/winPackager.js line 216 + 231 directly:
signAndEditExecutable is the ACTUAL hardcoded gate that short-circuits
both signApp() (which signs Hermes.exe + every shouldSignFile match
including bundled prebuilds) AND createTransformerForExtraFiles().
None of signtoolOptions.sign / sign:null / sign:<custom-fn> gate the
winCodeSign download — that happens before they're consulted.

What we lose: rcedit also runs through signAndEditResources, so
disabling this drops PE metadata (file properties showing 'Hermes' /
'Nous Research' / file description). Cost is real but bounded:
  * Hermes.exe filename, icon, asar contents, app identity intact
  * Task Manager shows 'Hermes.exe' (the filename) not 'Hermes' (PE
    description) — minor downgrade
  * Start menu, taskbar, window title all work normally
  * SmartScreen will warn once (unsigned, same as before)

When the cert lands, flip signAndEditExecutable back to default true,
both signing AND rcedit return, PE metadata is restored.

Removes the no-op sign function (build-noop-sign.cjs) since
signAndEditExecutable=false prevents signtool from being invoked at
all — the custom hook never gets called either.

* feat(install.ps1): write .hermes-bootstrap-complete marker at end of install

The desktop app's main.cjs resolver ladder has a 'bootstrap-needed' rung
that fires when .hermes-bootstrap-complete is missing from
ACTIVE_HERMES_ROOT. Pre-Hermes-Setup, this marker was written by the
packaged-desktop's own bootstrap-runner.cjs at the end of its install
flow. Now that Hermes-Setup.exe runs install.ps1 directly, install.ps1
needs to own the marker — otherwise the desktop sees no marker on first
launch and triggers its legacy first-launch bootstrap (re-running
install.ps1 from inside Electron, the exact recursion Hermes-Setup.exe
was supposed to obviate).

Implementation:
  * New Stage-BootstrapMarker (worker) → Write-BootstrapMarker (helper)
  * Slotted in the manifest right after platform-sdks, before the
    interactive configure/gateway stages, so it runs unconditionally
    when the install reaches the finalize phase
  * Schema mirrors apps/desktop/electron/main.cjs writeBootstrapMarker /
    isBootstrapComplete EXACTLY: {schemaVersion: 1, pinnedCommit,
    pinnedBranch, completedAt}. Schema version stays at 1 so old
    desktops that read marker files written by future install.ps1s
    can still parse them.
  * pinnedCommit comes from -Commit flag (Hermes-Setup.exe passes it)
    or falls back to 'git rev-parse HEAD' in InstallDir
  * pinnedBranch from -Branch flag, defaults to 'main' matching
    install.ps1's own param default

Two PS-5.1 gotchas baked into comments:
  * The ?. null-conditional operator doesn't exist pre-PS7; use
    explicit if-checks on Get-Command results
  * Set-Content -Encoding UTF8 emits a BOM in 5.1 and Node's plain
    JSON.parse rejects BOM — write via .NET's UTF8Encoding(false)
    to produce BOM-less JSON the desktop's readJson() can parse

* feat(installer): drive in-app updates through the Tauri installer

Converge update on the same principle as bootstrap: one driver owns all
repo mutation. The desktop becomes a pure consumer that hands off to
Hermes-Setup.exe --update instead of re-implementing git/pip in Electron.

- hermes desktop --build-only: build without launching, so the installer
  owns the post-update launch (CLI keeps build logic single-sourced).
- Installer AppMode {Install,Update} from argv; get_mode exposed to the UI.
- Installer self-copies to HERMES_HOME/hermes-setup.exe on install success
  (no-op guard during --update re-invocation to avoid the locked-exe copy).
- Installer --update flow (update.rs): wait for the desktop to release the
  venv shim, run 'hermes update --yes --gateway' (branch on exit 0/2/other),
  then 'hermes desktop --build-only', then launch the rebuilt desktop. Reuses
  the bootstrap event channel + progress UI via a synthetic two-stage manifest.
- Desktop applyUpdates() gutted (~105 lines of git/stash/pull/pyproject/pip
  removed) -> thin handoff: spawn updater, app.quit() to free the shim.
  Detection (checkUpdates, commit changelog, behind-count) kept intact.
- install.ps1 creates Start Menu + Desktop shortcuts to the packed Hermes.exe
  (never bare 'hermes desktop', which would rebuild every launch).

* test update

* fix(installer): pass --branch to hermes update in the --update flow

The install is a detached-HEAD checkout of a pinned commit. Without
--branch, 'hermes update' fell back to its default (main) and switched
the checkout to main — a divergent branch that lacks the desktop CLI
command — so the update targeted the wrong branch and the rebuild stage
failed with 'invalid choice: desktop'.

Thread BUILD_PIN_BRANCH (the branch this installer was built against,
and the same branch the desktop detected the update on) into
'hermes update --branch <b>' so update + rebuild stay on-branch.

* test update

* fix(installer): stamp Hermes icon onto Hermes.exe via rcedit (no winCodeSign)

The unpacked Hermes.exe showed the stock Electron icon + name in the
taskbar because build.win.signAndEditExecutable=false disables BOTH
electron-builder's signing AND its rcedit metadata/icon stamping. That
flag is load-bearing: enabling it re-triggers signtool -> winCodeSign,
whose macOS symlinks crash 7-Zip on non-admin Windows (unfixable dead end).

Decouple identity-stamping from signing entirely: after npm run pack,
run rcedit ourselves on the produced exe.
- Add rcedit as a direct devDependency of apps/desktop (the transitive
  electron-winstaller copy is fragile).
- apps/desktop/scripts/set-exe-identity.cjs: Node helper that calls
  rcedit's named export to set icon + ProductName/FileDescription/
  CompanyName. Node builds argv natively — avoids the PowerShell->exe
  ->JSON double-escaping that broke the app-builder rcedit path.
- install.ps1 Set-DesktopExeIdentity invokes the script after the build,
  before shortcuts. Best-effort: failure keeps the stock icon, never
  fails the install. rcedit is a pure PE editor — no signtool, no
  winCodeSign, no symlinks.

Verified locally: stamping a copy of the built Hermes.exe embeds the
32x32 icon and sets ProductName=Hermes.

Also fix update-path success-screen flash: in update mode the installer
hands off + exits in ~600ms, so don't route to the 'launch Hermes'
success view (it flashed before the window closed).

* update test

* fix(desktop): show 'hermes update' guidance for CLI installs instead of dead-end error

A user who installed via the CLI (irm|iex / install.sh) then ran
`hermes desktop` has no staged hermes-setup.exe, so clicking Update
in-app hit resolveUpdaterBinary()=null and showed a misleading error
('re-run the Hermes installer') with a Try-again button that could
never succeed — a dead loop for a perfectly valid install.

Treat the no-updater case as an intentional outcome, not a failure:
- main.cjs applyUpdates returns { ok:true, manual:true, command:'hermes update' }
  (no throw, no 'error' stage) when no updater binary exists.
- New 'manual' update stage + apply-state.command thread the command to the UI.
- updates-overlay ManualView: a polished terminal-native card with the
  exact command and a copy button, framed as the correct path for a CLI
  user rather than an error.

GUI-installer users are unaffected — hermes-setup.exe present => seamless
auto-update runs as before. Zero new process orchestration; can't fail
the update demo.

* update test

* fix(gui): pin /api/hermes/update to the current branch

The desktop command-center 'update' action hits POST /api/hermes/update,
which spawned bare `hermes update` with no --branch. cmd_update then
falls back to its default (main) and checks the working tree OUT of the
tracked branch — a bb/gui install silently jumped to main and lost the
desktop CLI.

Resolve the checkout's current branch and pass --branch <current> from
this endpoint only. The engine default (main) is DELIBERATELY unchanged:
bare `hermes update` from a terminal, the gateway /update bot command,
and the CLI/TUI relaunch path all keep their long-standing 'update against
main' contract for the existing user base. Only the GUI button is scoped
to update-the-branch-you're-on. Detached HEAD / git failure falls back to
the bare default.

* update test

* fix(desktop): branch-pin the CLI manual-update command card

The 'Update from your terminal' card (shown to CLI installs with no staged
updater) hardcoded bare `hermes update` — which defaults to main and would
switch a bb/gui (or any non-main) checkout off-branch. Same bug we fixed for
the GUI button, leaked into the card's copy text.

Resolve the checkout's current branch and show `hermes update --branch
<current>` for non-main checkouts; keep it bare for main so the card stays
clean. Best-effort: bare fallback if branch detection fails. Matches the
GUI button + installer --update contract; bare terminal/bot/TUI update
paths still default to main, unchanged.

* docs: phragg was here

* feat(desktop): lead onboarding with Nous Portal + fix fresh-install detection (#34970)

- Feature Nous Portal as the primary onboarding card (Recommended tag,
  app logo, single pitch line); collapse other OAuth providers behind an
  "Other providers" disclosure whose open/closed state persists.
- Surface OpenRouter as a one-click API-key option inside the disclosure;
  move "I have an API key" to a quiet bottom-right link.
- Treat "no provider configured" as a normal onboarding state, not a red
  error banner (provider-setup-errors copy match).
- Fix setup.runtime_check: it reported ready when the resolved runtime had
  an empty credential or only implicit Bedrock/IAM, so fresh installs never
  saw onboarding. Now requires a usable credential.
- Auto-wire Windows fonts for WSL2 users so the renderer renders real
  Segoe UI instead of the DejaVu fallback; make WSL detection env-independent
  via the /proc kernel marker.

* feat(desktop): live elapsed timer on install bootstrap steps

The first-launch install overlay showed a static "Installing" with no
motion, so long steps (notably the repo clone) looked frozen. Stamp each
stage's start time on the running transition and tick once a second so the
active step shows live elapsed (e.g. "Installing · 1:23"), plus elapsed on
the overall current-step line. Completed steps keep their final duration.

* fix(desktop): resolve PortableGit for update checks + reserve titlebar tools space

- runGit() hardcoded spawn('git'), which ENOENTs on fresh installer-driven
  Windows installs (git is PortableGit under %LOCALAPPDATA%\hermes\git, never
  on PATH) — so "Check for updates" failed with "Couldn't check for updates".
  Add resolveGitBinary() mirroring findGitBash (PortableGit → Git-for-Windows
  → PATH) and use it in runGit.
- PageSearchShell rendered a full-width search input in the titlebar row, so
  on Windows its right edge slid under the fixed top-right tools + native
  window controls. Reserve that footprint via --titlebar-tools-* vars.

* fix(desktop): stop streaming caret from shifting layout on completion

The streaming caret (::after on the running message's last child) was an
in-flow inline-block adding ~0.78em of inline width, which could wrap the
last line mid-stream; when the caret is removed on completion the line
un-wraps and reflows — the visible post-response layout shift. Net-zero its
inline advance with a compensating negative margin so it paints at the text
end without consuming layout width.

* fix(desktop): stop completed-message layout shift while streaming

The assistant message action bar used `hideWhenRunning`, which unmounts it
whenever the thread is streaming. Since the bar reserves vertical space in
each completed assistant message's footer (it's invisible-until-hover via
opacity, not via mount), unmounting it collapsed every prior turn by the
bar's height — then remounting on resolve grew them back, shifting the whole
conversation (visible as "padding appears above the last user message").
Drop hideWhenRunning so the footer height is constant; the bar stays
invisible during streaming via its existing opacity/pointer-events gating.

* fix(merge): keep windows-footgun suppressions inline

* fix(merge): keep remaining gateway footgun suppressions inline

* fix(merge): restore contracts caught by main-target CI

* fix(dashboard): honor injected HERMES_DASHBOARD_SESSION_TOKEN

The desktop shell mints a session token and signs its /api + /api/ws
calls with it via HERMES_DASHBOARD_SESSION_TOKEN, but the main-merge
restored a web_server.py that ignored the env var and minted its own
random _SESSION_TOKEN -- so every desktop request 401'd and the UI
reported "gateway offline". Read the injected token (fall back to a
fresh random one) so loopback HTTP + WS auth line up.

Adds a regression test so a future merge can't silently drop the read.

* fix(desktop): align fresh-install home so upgraders don't brick

Two related first-launch bugs on machines with a legacy ~/.hermes:

- install.ps1 hardcoded $HermesHome/$InstallDir to %LOCALAPPDATA%\hermes
  and ignored the HERMES_HOME the desktop passes through. The desktop
  freezes HERMES_HOME at module load and prefers a legacy ~/.hermes when
  %LOCALAPPDATA%\hermes is absent, so the installer wrote to a different
  home than the shell read -> "Could not connect to Hermes gateway". Honor
  $env:HERMES_HOME in the param defaults.

- isBootstrapComplete() trusted the marker + checkout without verifying a
  runnable venv, so an interrupted/split install spawned a dead backend
  instead of re-bootstrapping. Also require the venv python to exist.

* fix(dashboard): allow packaged desktop file:// origin on loopback WS

The packaged Electron desktop loads its renderer over file://, so its
/api/ws handshake carries Origin: file:// (or null). The DNS-rebinding
WebSocket Origin guard only accepted http(s) origins matching the bound
host, so it rejected the desktop's own renderer with 4403 -> "Could not
connect to Hermes gateway" on macOS.

A browser DNS-rebinding attacker can only ever present an http(s) origin
(the site hosting the malicious page); it cannot forge file://, null, or
a custom app scheme AND hold the loopback session token. So on loopback
binds we now trust non-web origins -- the token in _ws_auth_ok remains
the real authenticator. Public/gated binds still reject them, and
cross-site http(s) origins are still rejected everywhere.

* fix(desktop): resolve renderer assets relative to BASE_URL

Absolute public asset paths (/apple-touch-icon.png, /ds-assets/...) work
under the dev server but break in the packaged app, where the renderer is
loaded from file://.../index.html and a leading slash resolves to the
filesystem root -> broken onboarding provider icon and backdrop image on
macOS. Prefix these with import.meta.env.BASE_URL so they resolve next to
the bundled index.html in both dev and packaged builds.

* feat(desktop): automate first-launch bootstrap on macOS/Linux

Previously a packaged macOS/Linux app with no Hermes install hit a
dead-end ("first-launch install is not yet automated -- run install.sh
manually") because install.sh lacked the staged protocol install.ps1
exposes. Now both platforms bootstrap on first launch with the same
structured, per-step progress UI as Windows.

- install.sh: add --manifest / --stage / --json / --non-interactive plus
  a stage dispatcher (prerequisites, repository, venv, python-deps,
  node-deps, path, config, setup, gateway, complete). User-input stages
  (setup, gateway) are skipped under --non-interactive; the in-app
  onboarding overlay owns API keys/model, matching the Windows flow.
  Each stage runs inside the install dir (its own process) and a new
  --commit flag pins the checkout to the build-stamp SHA.
- bootstrap-runner.cjs: drive the staged manifest/stage/JSON protocol for
  both install.ps1 (PowerShell) and install.sh (bash), selected by
  installer kind; removed the single-blob POSIX shim.
- main.cjs: drop the macOS/Linux unsupported-platform dead-end so the
  bootstrap-needed path runs the installer on every platform.

* fix(dashboard): return 404 JSON for unmatched /api paths instead of SPA HTML

The SPA catch-all (serve_spa) served index.html for any unmatched GET,
including unregistered /api/* endpoints. A missing API route therefore
came back as <!doctype html> with status 200, and JSON clients (the
desktop app's fetchJson) crashed with an opaque
'SyntaxError: Unexpected token <' instead of a clear error.

- web_server.py: unmatched /api or /api/... now returns 404 JSON
  ('No such API endpoint'); non-api paths still serve the SPA for
  client-side routing.
- main.cjs fetchJson: detect an HTML body / text/html content-type on a
  2xx response and reject with a clear message naming the URL, rather
  than a raw JSON.parse SyntaxError. Empty bodies resolve to null;
  malformed JSON reports the URL plus a snippet.

* say 'OS appearance' instead of 'macOS appearance'

* feat(install): add --include-desktop stage + PowerShell-style flags to install.sh

Brings install.sh to parity with install.ps1's bootstrap surface so the
shared Rust/Tauri bootstrapper (apps/bootstrap-installer) can drive a
macOS/Linux install the same way it drives Windows.

- Accept the PowerShell-style aliases the bootstrapper emits to both
  installers: -Commit / -Branch (alongside existing -Manifest / -Stage /
  -Json / -NonInteractive).
- Add --include-desktop / -IncludeDesktop. When set, the manifest gains a
  'desktop' stage (immediately before 'complete'), and a new install_desktop
  runs a root workspace `npm install` + `npm run pack` (electron-builder
  --dir, signing auto-discovery disabled) to produce release/mac*/Hermes.app
  -- mirroring install.ps1's Install-Desktop / Stage-Desktop.
- The flag is opt-in, exactly like Windows: the signed bootstrap installer
  passes it; the Electron app's own first-launch bootstrap and the CLI
  one-liner omit it (building the desktop from inside the running app would
  clobber it).

* fix: tts endpoints

* macOS desktop: install + in-app self-update (#35607)

* fix(installer): align macOS HERMES_HOME with the rest of the stack

paths.rs computed the macOS Hermes home as ~/Library/Application Support/
hermes, but nothing else does: hermes_constants.get_hermes_home() (Python),
scripts/install.sh, and the Electron desktop's resolveHermesHome() all use
~/.hermes on macOS. The drift meant the Tauri installer wrote the install to
one directory and the desktop looked for it in another, so a fresh GUI
install never found its backend (the file's own comment warned this exact
drift would break things). Use ~/.hermes on macOS to match.

* fix(install.sh): always emit a stage result frame on failure

Stage helpers (clone_repo, install_deps, check_python, …) were written for
the monolithic flow and call `exit 1` on failure. Under `--stage`, that
terminated the process before the JSON result frame was printed, so the
installer's parse_stage_result saw "no frame" instead of a clean
{ok:false,...} contract response. Run the stage body in a subshell so an
`exit` only unwinds the subshell and the parent still emits the frame.

* feat(install.sh): auto-provision git on macOS/Linux (parity with install.ps1)

install.ps1 downloads PortableGit on Windows, but install.sh just printed a
"please install git" hint and exited — so a fresh Mac with no developer tools
(no Xcode CLT → no git) couldn't get past the clone step. check_git now tries
to install git before bailing:
  - macOS: Homebrew if present (headless), else `xcode-select --install`
    (the CLT prompt also provides the compiler some wheels need), polling for
    git to appear.
  - Linux: apt/dnf/pacman via sudo when available.
Falls back to the manual instructions only if auto-provision fails.

* feat(desktop): in-app GUI+backend self-update on macOS/Linux

On Windows the staged Hermes-Setup binary drives updates (quit → hermes
update → hermes desktop --build-only → relaunch). The mac drag-install has no
such binary, so "Update now" previously just printed `hermes update`.

Since there's no venv-shim file lock on POSIX, the desktop can drive the whole
update itself. applyUpdates now, when no staged updater exists on mac/linux:
  1. runs `hermes update --yes [--branch <current>]` (backend git pull + deps),
  2. runs `hermes desktop --build-only` (OS-aware GUI rebuild) with the
     Hermes-managed Node + venv on PATH,
  3. spawns a detached swapper that waits for this process to exit, dittos the
     freshly built Hermes.app over the running bundle, clears quarantine, and
     relaunches.
Degrades to "backend updated — restart to load the new GUI" if the rebuild
fails or there's no .app bundle to swap (dev run, Linux AppImage).

* chore: uptick

* chore: uptick

* chore: linux build

* fix(install): detect xcode-select git stub on fresh macOS

* chore: bump

* fix(desktop): repair voice dictation on Windows

Voice dictation was broken on Windows in two ways:

1. Mic access was denied. The Electron permission request handler only
   granted 'media' requests whose details.mediaTypes included 'audio',
   but Chromium on Windows frequently fires the mic request with an empty
   mediaTypes array, so getUserMedia threw NotAllowedError. The handler
   now grants audio-capture when mediaTypes includes 'audio' OR is
   empty/absent, handles the 'audioCapture' permission name, and adds a
   setPermissionCheckHandler (the synchronous path Chromium also consults
   for getUserMedia on Windows). Video is still denied.

2. Transcripts went nowhere. The composer's insertText handler (used by
   dictation and other inserts) only updated the assistant-ui composer
   store via setText, never the contentEditable editor DOM. The
   draft->editor sync effect only re-renders the editor when it is NOT
   focused, and dictation runs while the editor has/regains focus, so the
   transcript was stored but never shown and could not be sent. insertText
   now renders into the editor DOM and places the caret, mirroring
   appendExternalText.

Also hardens fetchJson: a 2xx response with an HTML body (or text/html
content-type) now rejects with a clear message naming the URL instead of
an opaque JSON.parse 'Unexpected token <' error.

* feat(desktop): route Nous subscribers onto the Tool Gateway from the GUI

When the GUI sets the main provider to Nous via POST /api/model/set, call
the same apply_nous_managed_defaults the CLI uses after model selection, so
GUI/onboarding users land on the Nous Tool Gateway the same way CLI users do
— no separate prompt, no duplicated logic.

Purely additive: apply_nous_managed_defaults skips any tool where the user
has a direct key (FIRECRAWL_API_KEY, FAL_KEY, etc.) or explicit config, so it
never overwrites a user's own setup. Only unconfigured tools get routed.

- web_server.py: in set_model_assignment (scope=main, provider=nous), resolve
  enabled toolsets and apply managed defaults; guarded so a Portal hiccup never
  blocks saving the model. Returns routed tools as gateway_tools.
- onboarding.ts: surface a 'Tool Gateway enabled' toast listing routed tools.
- types/hermes.ts: add gateway_tools to ModelAssignmentResponse.
- tests: cover nous-applies, non-nous-skips, and failure-doesnt-block-save.

* feat(desktop): mirror hermes model free/paid curation in GUI onboarding

GUI onboarding picked models[0] from /api/model/options, which ignores the
Nous free/paid tier — a free user could land on a paid default (e.g.
anthropic/claude-opus-4). Now the recommended default mirrors what `hermes
model` does.

- web_server.py: new GET /api/model/recommended-default?provider=<slug>. For
  Nous it runs the same curation as the CLI (get_curated_nous_model_ids +
  pricing + check_nous_free_tier + union_with_portal_{free,paid}_recommendations
  + partition_nous_models_by_tier) so free users get a free model and paid users
  get the curated default. Other providers fall back to the first curated model.
  Never 500s — returns empty model on error so onboarding degrades gracefully.
- hermes.ts: getRecommendedDefaultModel client + RecommendedDefaultModel type.
- onboarding.ts: fetchProviderDefaultModel prefers the recommended endpoint,
  falls back to models[0] when unavailable.
- tests: free-tier picks free model, paid-tier picks curated default, failure
  returns empty without 500.

* feat(desktop): show model pricing + free/paid tier gating in GUI picker

The CLI `hermes model` picker shows per-model $/Mtok pricing and gates paid
models on free Nous accounts. The GUI picker showed bare model names. Bring it
to parity across both the model-picker dialog and onboarding confirm card.

Backend:
- inventory.build_models_payload gains a pricing=True flag → _apply_pricing
  enriches each provider row with formatted per-model pricing
  ({input,output,cache,free}) via the same _format_price_per_mtok the CLI uses,
  and for Nous adds free_tier + unavailable_models (paid models a free user
  can't select) via check_nous_free_tier + partition_nous_models_by_tier.
  Best-effort: any pricing/tier failure is swallowed and fails open (no gating).
- /api/model/options and TUI model.options now pass pricing=True so the
  global picker and in-session picker both carry pricing.

Frontend:
- ModelOptionProvider gains pricing/free_tier/unavailable_models; new
  ModelPricing type.
- model-picker dialog renders In/Out $/Mtok (or a Free pill) per model, a
  Free tier/Pro badge on the Nous heading, and disables + grays unavailable
  paid models for free users with a 'Pro models need a paid subscription' note.
- onboarding confirm card shows the chosen model's price + tier badge.

Tests: test_inventory_pricing covers price formatting, free-tier gating,
paid no-gating, providers without pricing, and swallowed failures.

* fix(desktop): GUI model picker shows curated Nous list in curated order

Two bugs made the GUI Nous model list diverge from the `hermes model` CLI picker:

1. Backend (model_switch.py): the Nous row in list_authenticated_providers
   fell through to cached_provider_model_ids("nous"), dumping the full live
   /v1/models catalog (~50 vendor-prefixed models, alphabetical). Now it uses
   the curated list AND applies the Portal free/paid recommendation union —
   exactly like _model_flow_nous in main.py — so newly-launched models such as
   stepfun/step-3.7-flash:free surface in curated order. Best-effort: falls
   back to the curated list alone if the Portal fetch fails.

2. Frontend (model-picker.tsx): cmdk's Command had shouldFilter on (default),
   which re-sorts items by fuzzy-match score (≈alphabetical) and ignores array
   order. Set shouldFilter={false} + own the search term and do an
   order-preserving substring filter, so the backend's curated order is shown
   verbatim.

* feat(desktop): add/switch providers from the model picker via onboarding reuse

The model picker could only select models from already-authenticated
providers. Switching to a new provider had no in-app path. Rather than
duplicate provider UI, reuse the existing onboarding provider selector
(featured Nous + other providers + API-key form + device-code/PKCE flow +
model-confirm with pricing/tier).

- onboarding store: add a 'manual' flag with startManualOnboarding() /
  closeManualOnboarding(). Manual mode forces the onboarding overlay to show
  even when configured===true and refreshOnboarding no longer auto-dismisses
  on runtime-ready (the app is already working — the user is just adding or
  switching a provider).
- onboarding overlay: render when manual even if configured; show a Close
  button (the first-run flow has none since the app can't run yet).
- model picker: 'Add provider' footer button opens the onboarding selector;
  ModelResults lists only configured (model-bearing) providers.

* feat(desktop): add PUT /api/tools/toolsets/{name} enable/disable endpoint

* feat(desktop): add toggleToolset RPC binding

* feat(desktop): toolset enable/disable switch in Tools settings

* feat(desktop): tool configuration parity in GUI Tools settings

Bring the desktop GUI Tools settings to parity with the CLI `hermes tools`
for provider selection and API-key configuration.

Backend (hermes_cli/web_server.py):
- GET  /api/tools/toolsets/{name}/config  - provider matrix + key status
- PUT  /api/tools/toolsets/{name}/provider - persist provider selection

Shared core (hermes_cli/tools_config.py):
- Extract apply_provider_selection / _write_provider_config from the
  interactive _configure_provider so the CLI and GUI write identical
  config keys (web.backend, tts.provider, browser.cloud_provider, plugin
  image/video providers, use_gateway flags) through one code path.

Desktop UI:
- ToolsetConfigPanel: provider list with select, per-provider API-key
  entry (set/replace/clear/reveal via the shared env RPCs), Ready/Needs
  keys state, guidance for Nous-auth and post-setup providers.
- Wire the Configured/Needs keys pill to expand the panel inline; refresh
  the toolset list after key changes so the pill updates live.
- Add getToolsetConfig / selectToolsetProvider RPC bindings + types.

Post-setup (OAuth/install) flows still defer to the CLI; see
docs spike findings for the planned /api/tools/setup/* endpoint family.

Tests: backend round-trip + 400 cases for the new endpoints and
apply_provider_selection; desktop vitest coverage for the config panel
(provider render, select, key save). No change-detector tests.

Also removes three stale completed plan docs.

* fix(desktop): show real Hermes version + sync package.json on release

The desktop app version was disconnected from the Hermes version: the
release script bumped pyproject.toml + hermes_cli/__init__.py but never
touched apps/desktop/package.json, which sat stale at 0.0.2 (lockfile at
0.0.1).

- main.cjs: hermes:version IPC now resolves __version__ from
  hermes_cli/__init__.py (the canonical source release.py bumps) via a new
  resolveHermesVersion() helper, falling back to app.getVersion() when the
  source tree isn't readable. The About panel now always shows the live
  Hermes version and can't drift.
- release.py: update_version_files() also bumps apps/desktop/package.json
  in lockstep with pyproject (top-level version only; dep specs untouched).
- One-time catch-up: package.json 0.0.2 -> 0.15.1 and the lockfile root
  mirrors 0.0.1 -> 0.15.1.

* fix(desktop): stamp exe identity in afterPack hook so updates stay branded

The packed Hermes.exe reverted to the stock Electron icon + "Electron" name
after an in-app update. The icon/identity stamp (rcedit) lived only in
install.ps1, but the installer's --update path rebuilds the desktop via
`hermes desktop --build-only` -> `npm run pack`, which never ran install.ps1
and so never stamped the rebuilt exe.

Move the stamp into an electron-builder afterPack hook so it runs for EVERY
packed build regardless of caller (first install, hermes desktop, the update
rebuild, or a manual npm run pack):

- set-exe-identity.cjs: refactor to export stampExeIdentity(exe, desktopRoot);
  still runnable as a standalone CLI.
- after-pack.cjs (new): afterPack hook calling stampExeIdentity. Windows-only
  guard; best-effort (logs + resolves on failure, never fails the build).
- package.json: register build.afterPack.
- install.ps1: remove the now-redundant Set-DesktopExeIdentity function + call;
  the hook handles it during npm run pack.

electron-builder's own rcedit step stays disabled (signAndEditExecutable=false)
to avoid the signtool -> winCodeSign -> 7-Zip macOS-symlink crash on non-admin
Windows; the hook runs rcedit directly (pure PE resource edit, no signing).

* fix(desktop): export afterPack hook as exports.default so electron-builder runs it

The afterPack hook used `module.exports = fn`, which electron-builder's hook
loader doesn't pick up — it expects the function as the module's default
export (the same shape afterSign/notarize.cjs uses). The hook silently never
ran, so even first install shipped the stock "Electron" exe.

Switch to `exports.default = async function afterPack(...)`. Verified with a
real `npm run pack`: electron-builder now invokes the hook and the produced
release/win-unpacked/Hermes.exe carries ProductName/FileDescription=Hermes.

* chore(desktop): drop auto-build release CI in favor of manual build + upload

Remove desktop-release.yml (nightly-on-main + stable publish). Installers
are now built locally per platform and uploaded to a GitHub Release by hand;
the website points at them via NEXT_PUBLIC_HERMES_DL_* env. Update README +
docs and drop the dead desktop-nightly channel links.

* fix(desktop): stable shortcut icon + bust icon cache so updates repaint

Symptom on a freshly-installed laptop: Hermes.exe itself shows the correct
Hermes icon (Explorer reads the live exe's stamped PE resource), but the
desktop shortcut still draws the stock Electron icon.

Cause: New-DesktopShortcuts set IconLocation to "<exe>,0", so Windows cached
the icon it extracted from the exe at shortcut-creation time. On an update the
exe gets re-stamped, but the shortcut keeps rendering the stale cached bitmap.

- package.json: ship assets/icon.ico beside the exe via extraResources
  (-> resources/icon.ico). Verified with a real npm run pack.
- install.ps1 New-DesktopShortcuts: point IconLocation at resources/icon.ico
  (fallback to <exe>,0 if absent) — a dedicated .ico is cache-stable and skips
  the per-exe extraction that goes stale. Then run `ie4uinit.exe -show` to bust
  the shell icon cache so the shortcut repaints immediately instead of showing
  the old Electron icon until reboot.

Both best-effort; never fail an otherwise-good install.

* dummy update

* feat(desktop): self-heal update branch + backend contract guard

Two fixes for the bb/gui→main transition:

- Self-update self-heals: if the tracked branch (e.g. bb/gui) no longer
  exists on origin (merged + deleted), the desktop updater falls back to
  main and persists it. Read-only ls-remote probe that only flips on a
  definitive "ref absent" (exit 2), never on a transient network error, so
  already-installed clients migrate themselves with no manual flip.
- Backend contract guard: tui_gateway reports DESKTOP_BACKEND_CONTRACT in
  session runtime info; the desktop warns with a one-click "Update Hermes"
  when the backend predates the GUI's required contract (e.g. a bb/gui app
  pointed at a main checkout) instead of failing cryptically downstream.

* docs(desktop): rewrite README to match current install/update/build flow

The old README contradicted itself (claimed a bundled Python payload while
also saying it no longer bundles source) and predated cross-platform support.
Rewrite for accuracy: Linux is a first-class build target, install.sh/install.ps1
both drive the staged bootstrap, the real self-update handoff (Windows
Hermes-Setup vs in-app macOS/Linux), and the bb/gui→main self-heal + backend
contract guard.

* docs(desktop): rewrite README as a real product readme

Lead with what the app is and how to get it (download an installer, or
`hermes desktop` for existing CLI users) plus a plain-language feature list,
then keep contributor/build/internals as a clearly separated secondary section.

* docs(desktop): fix install framing — releases no longer auto-build installers

Lead with the install-with-Hermes path (`--include-desktop` / `hermes desktop`),
which always works, and describe prebuilt installers as manually published when
a release ships them rather than implying CI attaches them to every release.

* docs(desktop): match base repo README style

Adopt the root README's conventions: centered title + badge row, bold
one-liner intro, a feature <table> grid, --- section dividers, and a
Community / License footer.

* feat(desktop): recover from gateway boot failures + validate API keys on entry (#35864)

Fresh installs that hit a gateway boot failure had no recovery path: the
shell rendered dead ("gateway offline"), logs were undiscoverable, and a
mistyped API key was accepted because onboarding only checked credential
presence, not validity.

- Add BootFailureOverlay: a top-level recovery surface (Retry, Repair
  install, Use local gateway, Open logs + inline recent logs) that mounts
  on any hard boot failure, including post-install. Trims the now-redundant
  recovery button from the onboarding Preparing panel.
- Add hermes:logs:reveal / :recent IPC (reveal desktop.log) and a
  hermes:bootstrap:repair IPC that drops the bootstrap marker to force a
  clean reinstall. Surface "Open logs" in Gateway settings too.
- Add POST /api/providers/validate: a live per-provider probe
  (OpenRouter/OpenAI/xAI/Gemini key check, local endpoint connectivity)
  wired into saveOnboardingApiKey so a rejected key blocks before it's
  persisted, while an unreachable probe falls through (offline-safe).

* test(model-catalog): fix stale nous picker test after curated-list change

ac2e48907 made the GUI/picker Nous row use the curated list (curated["nous"]
= get_curated_nous_model_ids()) + Portal union, matching the `hermes model`
CLI — but test_picker_nous_row_uses_manifest still asserted the old 2-model
manifest snapshot, breaking the test shard.

Rewrite it as an invariant: stub the Portal union to passthrough and assert the
row equals get_curated_nous_model_ids() computed under the same conditions, so
it tracks the real contract instead of a hardcoded model list that rots on every
catalog update.

---------

Co-authored-by: emozilla <emozilla@nousresearch.com>
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: ethernet <arilotter@gmail.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
2026-05-31 17:46:56 -05:00
Teknium
cf328723d4 docs: drop early-beta framing for native Windows support (#36093)
Native Windows is out of beta. Removes the early-beta warnings, headings,
and rough-edge framing across the README and docs (EN + zh-Hans), keeping
the WSL2-only dashboard PTY caveat. Historical RELEASE_v0.14.0.md notes are
left intact since they accurately describe the state at that release.

- README: Windows install + cross-platform notes
- index.mdx, installation.md: headings, warning admonitions, parity note
- windows-native.md: title/sidebar_label/warning, provider-hunting tip
- contributing.md, nous-portal.md: cross-platform / Portal parity prose
- Repoint cross-links to the renamed installation#windows-native-powershell
  anchor (EN) and #windows原生powershell (zh, also fixes pre-existing drift)
2026-05-31 15:33:18 -07:00
kshitijk4poor
c9a28dfb08 feat(model-picker): description on group layer, plain labels on members
For grouped provider families, the descriptive text now lives only on the
collapsed top-level group row. The member sub-picker rows show just the
short provider label (no parenthetical tui_desc), so the description is not
duplicated one layer down.

Ungrouped providers are unaffected — they have no group layer, so their own
row keeps its full tui_desc.

- main.py: member sub-picker uses provider_labels (label) instead of
  canonical_descs (tui_desc).
- Telegram already showed labels + model count on member buttons; group
  buttons keep Label ▸ (count) since inline keyboards can't fit a long blurb.

Member labels retain their short disambiguators (e.g. 'MiniMax (OAuth)') so
the sub-picker rows stay distinguishable.
2026-05-31 15:02:26 -07:00
kshitijk4poor
84d82453ae feat(model-picker): show short description on grouped provider rows
The 7 consolidated provider families (OpenAI, xAI Grok, GitHub Copilot,
Google Gemini, Kimi / Moonshot, MiniMax, OpenCode) collapse to one
top-level picker row. Previously that row showed only the bare group
label (e.g. `OpenAI ▸`); now it carries a short blurb describing the
endpoints folded inside (e.g. `OpenAI ▸ (Codex CLI or direct OpenAI API)`).

- models.py: extend PROVIDER_GROUPS tuples to (label, description, members);
  group_providers() emits the description on group rows.
- main.py: CLI picker renders `<label> ▸ (<description>)` for group rows.
- telegram.py: update the group tuple unpack (button text keeps the member
  count, which fits inline keyboards better than a long blurb).
- tests: assert every group has a non-empty description and the fold emits it.

Member-specific detail still lives in each member's tui_desc and shows in
the drill-down sub-picker. Slug identity, --provider, /model paths unchanged.
2026-05-31 15:02:26 -07:00
kshitijk4poor
47d2d05892 chore(model-picker): refresh provider picker descriptions
Update the tui_desc text shown for each provider in the interactive
`hermes model` / setup wizard / `/model` pickers. Pure copy refresh —
slugs, labels, PROVIDER_GROUPS folding, and all typed paths are unchanged,
so the 7 grouped families (OpenAI, xAI Grok, GitHub Copilot, Google Gemini,
Kimi / Moonshot, MiniMax, OpenCode) still fold identically.

Also aligns the auto-injected alibaba-coding-plan provider description to
the same parenthetical style.
2026-05-31 15:02:26 -07:00
kshitijk4poor
eb3cf9750e fix(gateway): resolve _get_dm_topic_info on adapter class, not instance
Follow-up to the synthetic-notification DM-topic routing fix. The new
_is_telegram_dm_topic_target probed the adapter's _get_dm_topic_info via
instance-level getattr, which a MagicMock auto-creates as a truthy callable —
so any test double with a non-dm chat_type and a thread_id would be
misclassified as a DM topic lane and have the fallback routing keys injected.

Resolve the method on type(adapter) and treat only dict-shaped returns as an
operator-declared topic, mirroring the existing guard in
_rename_telegram_topic_for_session_title. Update the home-channel startup test
to declare _get_dm_topic_info on a real adapter subclass instead of patching a
MagicMock onto the instance.
2026-05-31 12:13:46 -07:00
Dusk1e
4259bab7d4 fix(gateway): preserve Telegram DM topic routing metadata in synthetic notifications 2026-05-31 12:13:46 -07:00
kshitij
59cc7c305d Merge pull request #36023 from kshitijk4poor/fix/spawn-via-env-bg-wrapper
fix(tools): don't compound-rewrite spawn_via_env background wrappers
2026-05-31 12:11:17 -07:00
kshitij
01dda3fa02 Merge pull request #36010 from kshitijk4poor/fix/terminal-cwd-acp-aware
fix(tools): preserve live session cwd in terminal_tool, keep ACP update_cwd authoritative
2026-05-31 11:41:21 -07:00
kshitijk4poor
6f8975dcd8 fix(tools): don't compound-rewrite spawn_via_env background wrappers
Background tasks on non-local backends (SSH/Docker/Modal/Daytona/Singularity)
go through `ProcessRegistry.spawn_via_env`, which builds a hand-crafted,
shell-safe wrapper:

    mkdir -p T && ( nohup bash -lc CMD > LOG 2>&1; rc=$?; ... ) & echo $! > PID && cat PID

`BaseEnvironment.execute()` unconditionally ran `_rewrite_compound_background`
on every command, including this wrapper. The rewrite (meant to defuse the
`A && B &` subshell-wait trap for user commands) turns `( ... ) & echo $!` into
`{ ( ... ) & } echo $!` — note `} echo` with no separator, which is a bash
syntax error. The wrapper then never produces a PID, the redirected output file
is never created, and the agent sees an immediate exit code -1. This breaks
*every* background launch on a non-local backend (e.g. a simple
count-and-redirect script over SSH), not just edge cases.

Fix:
- Add `rewrite_compound_background: bool = True` to `BaseEnvironment.execute()`
  (and the `BaseModalExecutionEnvironment` override, which accepts and ignores
  it). Default preserves existing behavior; the user foreground terminal path
  still rewrites.
- `spawn_via_env` passes `rewrite_compound_background=False` so its already
  shell-safe wrapper is left intact.
- Treat a wrapper that produces no PID as a failed launch (mark the session
  exited with a real exit code instead of exposing a fake running session), and
  don't register/checkpoint a session that never started.

Verified empirically: with the rewrite skipped, the wrapper is valid bash,
launches the process, captures the PID, and writes the log/pid/exit files; the
old rewritten form fails `bash -n` with a syntax error.

Based on #33756 by @CharZhou (extracted from a multi-feature branch; the
unrelated image_gen / docker-media changes are not included here).

Co-authored-by: CharZhou <17255546+CharZhou@users.noreply.github.com>
2026-06-01 00:05:10 +05:30
kshitijk4poor
7a315bd702 fix(tools): preserve live session cwd in terminal_tool, and keep ACP update_cwd authoritative
terminal_tool re-sent the init-time/config cwd on every command, clobbering
session-local `cd` state: the environment tracked the new directory in
`env.cwd`, but foreground/background calls forced the old cwd back. A small
`_resolve_command_cwd` resolver now applies the precedence
`workdir > live env.cwd > config/override cwd` to:
  - foreground `env.execute(...)`
  - background `process_registry.spawn_local(...)`
  - background `process_registry.spawn_via_env(...)`

Additionally, syncing the cwd onto the live cached env when a `cwd` override is
(re-)registered. Preferring live `env.cwd` would otherwise demote the ACP
`update_cwd` override (registered via `register_task_env_overrides` on
`session/load` / `session/resume`) below an already-set `env.cwd`, silently
ignoring an editor's mid-session project-root change once any command had run.
`register_task_env_overrides` now pushes a new cwd onto the cached env so an
explicit ACP cwd change wins, while ordinary in-session `cd` tracking is
preserved.

Regression coverage:
  - foreground/background commands follow live `env.cwd`
  - explicit `workdir` still overrides everything
  - registering a cwd override updates the live env cwd (ACP authority)
  - no-op when no live env exists; non-cwd overrides leave env.cwd untouched

Based on #35510 by @Dusk1e.

Co-authored-by: Dusk1e <yusufalweshdemir@gmail.com>
2026-05-31 23:50:40 +05:30
Teknium
1044d9f25d fix(gateway): /stop can interrupt a sibling participant's run in a per-user thread (#35959)
In a per-user thread (thread_sessions_per_user=True), each participant
gets an isolated session key (...:{thread_id}:{user_id}). A run another
user started lives under a different key, so the caller's own /stop found
nothing and replied 'no active task to stop'.

When /stop finds no run under the caller's own key, fall back to
interrupting any running agent(s) sharing the caller's thread prefix
({chat_id}:{thread_id}), gated on _is_user_authorized. Thread-only — the
fallback returns [] for non-thread channels, and a prefix-collision guard
prevents thr1 from matching thr11.
2026-05-31 09:29:03 -07:00
Teknium
de4f40ed02 feat(setup): thin out setup — Quick Setup via Nous Portal + Full Setup defaults (#35723)
* feat(setup): Quick Setup routes through Nous Portal (OAuth + model + messaging)

First-time quick setup now goes straight to the Nous Portal provider
instead of showing the full provider picker. Runs the device-code OAuth
login, selects a Nous model, configures the terminal backend, and offers
messaging setup — applying recommended defaults for everything else.

- Rename menu entry to 'Quick Setup (Nous Portal)'.
- _run_first_time_quick_setup now calls _model_flow_nous (handles both the
  logged-out OAuth+model-select path and the logged-in curated picker),
  then re-syncs config from disk to avoid the #4172 stale-overwrite.
- Terminal / defaults / messaging steps unchanged.

* feat(setup): thin out Full Setup with happy defaults

Full Setup no longer asks for every config knob — anything with an
obvious default is applied silently and stays tunable via the per-section
commands (hermes setup agent|terminal|tts, hermes auth add).

- Model section: drop the same-provider rotation pool, vision-backend
  picker, and TTS provider sub-flows. Vision auto-detects from the main
  provider; TTS defaults to Edge; rotation lives in hermes auth add.
- Terminal section: keep the backend picker (Local default) and any
  required credentials (Modal token, SSH host/user/key, Daytona key),
  but stop prompting for container image, CPU/mem/disk resources, gateway
  cwd, and sudo password — all use defaults.
- Agent Settings: removed from the wizard. First installs get recommended
  defaults silently; existing installs keep their tuned values.
- New defaults: max_turns 90 -> 150, session_reset both -> none.
- Tests: reconfigure tests assert agent settings are no longer prompted
  on existing installs; drop 3 tests covering the deleted in-setup
  rotation flow.
2026-05-31 09:13:06 -07:00
brooklyn!
a726e8a811 fix(tui): auto-recover session on unexpected gateway death (+ persist lifecycle breadcrumbs) (#35893)
* fix(tui): persist gateway lifecycle breadcrumbs to crash log

A backend SIGTERM (`=== SIGTERM received ===` in tui_gateway_crash.log) is
always a parent action — `gw.kill()` (graceful-exit on a signal to Node, or an
explicit /quit) or `start()` replacing a live child. #31051 added parent-side
lifecycle breadcrumbs but left them in an in-memory CircularBuffer that dies
with the process, so SIGTERM crash reports arrive with no parent context and no
way to tell a signal-driven kill from a memory-critical `process.exit(137)`
(which closes the child's stdin → clean EOF, not SIGTERM).

Persist the death-explaining breadcrumbs (spawn / transport-exit / child-exit /
replace-live-child / kill-reason / startup-timeout) plus the graceful-exit
signal name and the memory-critical exit into the same crash log the Python
side writes, so they interleave by timestamp next to the child's panic entry —
making these recurring reports diagnosable.

Gated off under VITEST so unit tests stay hermetic.

* feat(tui): auto-recover the session when the gateway dies unexpectedly

When a still-owned gateway child dies while the TUI is alive (a crash, OOM
process.exit, or a SIGTERM/SIGHUP forwarded to it), the app currently nulls the
session and drops to an inert "gateway exited" state — the user loses a long
session and has to restart + re-run everything. That single behavior is most of
the "TUI doesn't survive heavy work" complaint, independent of what does the
killing.

The 'exit' event only reaches this handler on an *unexpected* death: a user
/quit calls process.exit before it fires, and a replaced child is identity-
skipped in GatewayClient. So on exit we now respawn the gateway and resume the
session that was live (history is persisted in SQLite) via a one-shot
recoverSidRef the next gateway.ready consults before forging a new session. The
in-flight reply is lost (it died with the process) but the session survives.

Bounded to GATEWAY_RECOVERY_LIMIT (3) attempts per GATEWAY_RECOVERY_WINDOW_MS
(60s) so a gateway that crash-loops on startup can't spawn-storm; past the
budget we fall back to the inert state.

* fix(tui): sanitize newlines + soften SIGTERM-cause claim in parentLog

Address PR review:
- recordParentLifecycle collapses embedded \r\n so a multi-line value (e.g. an
  error message) stays a single breadcrumb and can't masquerade as a separate
  entry or as the child's panic output sharing the crash log.
- Reword the header: a backend SIGTERM is *usually* a parent action but can come
  straight from an external supervisor (s6, cgroup OOM, stray kill); the
  presence/absence of a [tui-parent] line before the child's panic is precisely
  what disambiguates the two.

* fix(tui): clear sid during recovery + extract/test the recovery budget

Address PR review:
- Null `sid` immediately in the gateway exit handler. While the gateway is down
  (busy=false) the old sid would otherwise let sid-guarded effects (the 1.5s
  session.active_list poll, queue drain) fire RPCs at a dead/respawning gateway.
  recoverSidRef carries the session forward; resumeById restores sid on ready.
- Extract the respawn budget into a pure evalRecovery() (gatewayRecovery.ts) and
  unit-test the bound: allows GATEWAY_RECOVERY_LIMIT within the window, blocks
  past it, and prunes attempts older than the window so recovery re-arms.

* fix(tui): cap parent-log breadcrumb length (PR review)

Truncate a single persisted breadcrumb to 4096 chars (matching GatewayClient's
in-memory log-line cap) so a pathological value — e.g. a giant error string —
can't bloat the shared crash log or add noticeable blocking on the synchronous
append during a failure path. Covered by a test.

* fix(tui): keep "recovering session…" status visible during resume (PR review)

resumeById() synchronously sets status to 'resuming…' on entry, so the
recovery branch now applies its 'recovering session…' label *after* calling
resumeById — the distinct label sticks for the duration of the resume RPC
(which later flips to 'ready') instead of being immediately clobbered. Test
updated to assert the ordering.

* fix(tui): keep recovery budget alive across a startup crash-loop (PR review)

deadSid was read from getUiState().sid, which the first exit nulls — so if the
respawned gateway crash-looped before gateway.ready (resumeById never restored
sid), later exits saw null and abandoned the session after a single attempt,
defeating the bounded retry budget.

Lift the whole decision into a pure planGatewayRecovery() that falls back to the
pending recoverSidRef target when the live sid is already cleared, and unit-test
the crash-loop sequence (keeps retrying the same session up to the limit, then
falls back to inert). Supersedes evalRecovery.

* chore(tui): drop non-null assertion + clarify breadcrumb cap comment (PR review)

- Recovery branch guards on `recoverSidRef && recoverSid` so the ref write needs
  no `!` assertion (avoids a future unsafe refactor).
- Reword the parentLog cap comment: it slices the value to 4096 chars and
  appends a short truncation marker (so the written line is slightly longer),
  rather than implying a strict 4096-byte limit.

* chore(tui): soften "absence ⇒ external signal" + "any in-flight reply" (PR review)

- parentLog header: a missing [tui-parent] line only *suggests* an external
  signal (the logger is best-effort: VITEST-disabled, failed append swallowed),
  not a definitive conclusion.
- Recovery notice says "any in-flight reply was lost" since the gateway can also
  exit while idle.
2026-05-31 10:36:57 -05:00
teknium1
04bb74c58e chore: map fesalfayed author email for release notes 2026-05-31 06:14:34 -07:00
fesalfayed
64628ea89b fix(anthropic): demote dead thinking signature when orphan-strip mutates the latest turn
Extended-thinking Claude models (4.6+, e.g. Opus 4.8) emit a signed `thinking`
block on assistant turns that also carry parallel `tool_use` blocks. Anthropic
signs that block against the full, original turn content.

When a parallel tool batch is interrupted before every `tool_result` returns,
`_strip_orphaned_tool_blocks` removes the unanswered `tool_use` on replay — which
mutates the turn. The latest-assistant branch of `_manage_thinking_signatures`
then replays the now-stale signed thinking block verbatim, and Anthropic rejects
the request with a non-retryable HTTP 400:

    messages.N.content.M: `thinking` or `redacted_thinking` blocks in the latest
    assistant message cannot be modified. These blocks must remain as they were
    in the original response.

Because the poisoned turn is rebuilt from the persisted store every turn, the
gateway crash-loops with no self-recovery (a soft session reset does not clear
it). The drifting content index in the error is the changing count of stripped
`tool_use` blocks across rebuilds.

Fix: when orphan-stripping removes a `tool_use` from a turn that also holds a
thinking/redacted_thinking block, flag the turn. `_manage_thinking_signatures`
then demotes every thinking block on that latest turn to a plain text block
(preserving the reasoning text) instead of replaying a signature that can no
longer validate. An intact turn is unaffected — its signed thinking is still
replayed verbatim. The internal flag is stripped before the payload is sent.

Adds two regression tests:
- demotion when an orphaned parallel tool_use is stripped
- control: signed thinking preserved verbatim when nothing is stripped
2026-05-31 06:14:34 -07:00
Teknium
2b5268f716 revert: drop cumulative-resend tool-arg heuristic from shared streaming path (#35718) (#35860)
PR #35718 added a per-slot "cumulative-resend" latch to the universal
streaming tool-call accumulator to fix DeepSeek / Baidu Qianfan (#35592).
The latch fires when a delta is a strict superset of the accumulated
buffer (len(_new) > len(_prev) and _new.startswith(_prev)) and then
REPLACES the buffer instead of appending.

That superset test is not an unambiguous cumulative signature. A normal
incremental stream can emit a single fragment that restates an already-
accumulated prefix — trivially common in large code-patch arguments with
repeated lines / indentation — which trips the latch and clobbers the
accumulated buffer, corrupting the tool call. Observed in the wild on
Anthropic Opus (the primary model) building a large patch: corrupted /
short arguments → finish_reason='length' dead-end → session killed.

A guessing heuristic that can silently clobber a tool-call buffer has no
place on the path every provider and model shares. Reverting restores the
known-good plain `+=` accumulator. The #35592 narrow provider bug should
be re-addressed provider-gated so it is structurally impossible to touch
Anthropic / OpenAI incremental streams, rather than via a heuristic on the
shared path.

Reverts ca03486b6.
2026-05-31 06:14:32 -07:00
Teknium
f2d4cf4f76 fix(cli): clamp post-compression token sentinel in status bar (#35858)
The status bar read context_compressor.last_prompt_tokens directly with
an 'or 0' guard that only catches 0/None. Right after a compression the
compressor parks last_prompt_tokens at the -1 sentinel
(awaiting_real_usage_after_compression) until the next API call reports
real usage. -1 is truthy, so it sailed through and rendered as '-1/200K'
and '-1%' for that one transitional turn.

Clamp negative token/context-length values to 0 in the status-bar
snapshot so the gap reads as empty context until real usage arrives.
2026-05-31 06:03:01 -07:00
Teknium
1fc7bdc5e6 feat(tools): always show Nous Tool Gateway backends, login on select (#35792)
* feat(tools): always show Nous Tool Gateway backends, login on select

The Nous-managed Tool Gateway rows in `hermes tools` (Firecrawl, OpenAI
TTS, Browser Use, FAL image/video) were hidden unless the user was already
logged into Nous Portal with paid access. Now they are always listed.
Selecting one runs an inline Nous Portal device-code OAuth + entitlement
check — auth only, no inference-provider switch and no bulk 'enable all
tools' prompt (that stays in `hermes model`). The row only activates the
gateway once paid access is confirmed.

- _visible_providers: stop hiding managed_nous_feature rows (incl. those
  also flagged requires_nous_auth); pure pre-auth UX rows still gate on login
- nous_subscription.ensure_nous_portal_access(): auth + entitlement gate
  that preserves the user's active inference provider
- _configure_provider / _reconfigure_provider: run the inline gate for
  managed backends; write config only when entitled
- picker marker: 'via Nous Portal (login on select)' for logged-out users
- _hidden_nous_gateway_message: now a no-op (rows are never hidden)

* docs: hermes tools is a first-class Tool Gateway entry point

The Tool Gateway docs framed `hermes setup --portal` / `hermes model` as
the activation path and only mentioned `hermes tools` for mixing in your
own keys. With the inline-login change, picking a Nous-managed backend in
`hermes tools` is a complete path on its own — it logs you into Nous
Portal on select if needed, without switching your inference provider or
prompting to enable every other tool.

- tool-gateway.md: Get started now lists three peer entry points; new
  paragraph explaining login-on-select and the no-prompt fast path when
  OAuth is already active
- nous-portal.md + run-hermes-with-nous-portal.md: note that managed rows
  appear logged-out and trigger inline login on select
2026-05-31 03:39:17 -07:00
kshitijk4poor
8f4c8e7c82 refactor(cli): extract shared curses menu event-loop driver
The three curses menus (curses_checklist / curses_radiolist /
curses_single_select) each hand-rolled an identical event loop: cursor
hide + color-pair init, the per-frame clear/getmaxyx/refresh cycle,
scroll-offset math, row iteration, the read_menu_key dispatch with
NAV_UP/NAV_DOWN cursor wrap, flush_stdin, and the
KeyboardInterrupt/curses-unavailable fallback. Terminal-behavior changes
(e.g. Ghostty raw-escape handling, scroll tweaks, a new key) had to be
made in three places.

Extract that boilerplate into one _run_curses_menu driver. Each public
menu now supplies small callbacks for the parts that genuinely differ:
draw_header (returns the item-list start row), draw_row (checkbox vs
radio vs bare prefix), an on_action reducer (toggle-set vs return-cursor
vs return-None + the single_select cancel-row guard), an optional
draw_footer (the checklist status bar), reserve_bottom, and the numbered
fallback. Behavior is passed as functions; the loop is the only stateful
piece — so future terminal/Ghostty work is a one-place edit.

Duplicated event-loop primitives drop 3 -> 1 (stdscr.clear, read_menu_key
dispatch, scroll math). Verified byte-identical: a render harness records
every addnstr(y, x, clamped-text, attr) call across frames plus the
return value for 6 cases (checklist, checklist+status, radiolist,
radiolist+description, single_select, single_select ESC-cancel); output
diffs clean against origin/main. Non-TTY returns the cancel value
directly (not the input()-based numbered fallback), matching the old
per-menu guard. 150 menu/setup/browse/plugins tests pass.
2026-05-31 03:19:37 -07:00
kshitijk4poor
087be00733 fix(cli): migrate setup model/provider pickers off simple_term_menu to curses
The setup provider->model sub-menu (and three sibling pickers) used
simple_term_menu.TerminalMenu, whose ESC and arrow-key handling was
unreliable across terminals — notably ESC failed to back out of the
model selection list on terminals that emit raw escape sequences (e.g.
Ghostty). The codebase already notes simple_term_menu 'conflicts with
/dev/tty' and causes 'ghost-duplication rendering', and a prior attempt
to migrate these (closed PR) confirmed the same root cause.

Route all four single-select pickers through the shared, already-hardened
curses_radiolist (which decodes raw CSI/SS3 escape sequences and handles
ESC consistently, fixed in #35776):

- auth.py _prompt_model_selection — model picker; the pricing column
  header and the unavailable-models block are passed as the radiolist
  description so they survive the curses screen clear. ESC now cancels.
- main.py _prompt_reasoning_effort_selection — reasoning-effort picker.
- main.py _model_flow_named_custom — named custom-provider model picker.
- main.py _remove_custom_provider — provider-removal picker.

simple_term_menu is no longer imported anywhere (only stale comments
referenced it; one in setup.py is corrected). The numbered-input
fallbacks are unchanged and still trigger on curses errors / non-TTY.

Tests: updated test_terminal_menu_fallbacks / test_reasoning_effort_menu
/ test_custom_provider_model_switch / test_model_provider_persistence to
drive the fallback via curses_radiolist errors instead of breaking
simple_term_menu. New test_setup_menu_curses_migration.py asserts each
picker routes through curses_radiolist, ESC cancels, and the pricing
header is preserved. Net -147/+183 (mostly the new test file; production
code shrinks by removing TerminalMenu boilerplate).
2026-05-31 03:19:37 -07:00
kshitij
4ccd141b15 Merge pull request #35776 from kshitijk4poor/fix/curses-arrow-key-decode
fix(cli): decode raw arrow-key escape sequences in curses menus
2026-05-31 01:41:31 -07:00
kshitijk4poor
3463c97a36 fix(cli): decode raw arrow-key escape sequences in curses menus
The setup wizard's provider/model pickers (curses_radiolist via
prompt_choice) bailed to the numbered "Select [1-N]" fallback the moment
a user pressed up or down. Root cause: even with keypad(True) — which
curses.wrapper sets — many terminals/terminfo entries deliver cursor keys
to getch() as raw CSI/SS3 byte sequences (e.g. 27, 91, 66 for arrow-down)
rather than the translated curses.KEY_DOWN. The menus matched only
curses.KEY_UP/KEY_DOWN and treated the leading 27 (ESC) as cancel, so
navigation dropped into the text fallback and the trailing bytes leaked
into the next input().

Add a shared read_menu_key() helper that decodes CSI/SS3 escape sequences
into normalized NAV_* actions (only a lone ESC, with no continuation byte
within a short timeout, still cancels) and consumes the tail of unhandled
sequences so stray bytes can't corrupt later input(). Route all three
curses menus (checklist, radiolist, single_select) through it.

Add regression tests covering raw CSI/SS3 arrows, translated KEY_*
constants, vim keys, lone-ESC cancel, and full consumption of unhandled
sequences (Delete/Home/End).
2026-05-31 13:59:56 +05:30
Teknium
0cd7d54b00 feat(kanban): goal_mode cards run workers in a /goal loop (#35710)
* feat(kanban): goal_mode cards run workers in a /goal loop

A goal_mode card wraps its dispatched worker in the Ralph-style goal
loop behind /goal: after each turn an auxiliary judge checks the
worker's response against the card title+body, and if not done the
worker keeps going in the SAME session until the judge agrees, the
worker terminates the task itself, or the turn budget runs out (which
blocks the card for human review — never a silent exit).

- kanban_db: goal_mode + goal_max_turns columns (additive migration),
  Task fields, create_task params, INSERT wiring, created-event payload.
- kanban_tools: goal_mode/goal_max_turns on the kanban_create tool so
  orchestrators can opt cards in when fanning out.
- kanban CLI: --goal / --goal-max-turns on 'kanban create'.
- dashboard API: goal_mode/goal_max_turns on the create endpoint
  (auto-surfaced back via asdict).
- _default_spawn: sets HERMES_KANBAN_GOAL_MODE / _GOAL_MAX_TURNS only
  when the card opts in.
- goals.run_kanban_goal_loop: standalone, callback-injected loop engine
  (no SessionDB persistence; ephemeral worker). cli.py quiet path calls
  it after the worker's first turn when the env vars are set.
- Docs: orchestrator skill + kanban feature page.

Tests: DB roundtrip + legacy migration, spawn env gating, and the loop's
continuation/completion/budget-block/finalize-nudge branches. E2E run
against a real kanban DB confirms a budget-exhausted goal worker lands
in a sticky blocked state.

* feat(kanban/dashboard): goal-mode toggle in the create form

Wires the goal_mode card setting into the dashboard UI (the plugin's
hand-written IIFE bundle, no build step):

- InlineCreate: 'goal mode' checkbox after the skills field; checking it
  reveals an optional 'max turns' number input. Both reset on submit and
  only post goal_mode/goal_max_turns when enabled.
- TaskDrawer: a 'Goal mode: on (max N turns)' MetaRow so a card's
  goal-mode setting is visible after creation (auto-fed by asdict via the
  existing _task_dict).

Live-tested through the running dashboard with a browser: created a
goal-mode card with max-turns=8, confirmed it persisted to the kanban DB
(goal_mode=1, goal_max_turns=8) and rendered back in the drawer as
'on (max 8 turns)'. No JS console errors.
2026-05-31 01:16:33 -07:00
kshitijk4poor
32899279a7 fix(gateway): detach pending_watchers batch + normalize LRU caches + align test fixtures + AUTHOR_MAP
Self-review follow-up on top of the salvaged perf fixes:

- gateway/run.py (both watcher-drain sites): the salvaged O(n^2) fix
  (#32708) replaced `while pending_watchers: pop(0)` with iterate-then-
  `watchers.clear()`, but `watchers` aliased the registry's live list.
  A watcher appended by a concurrent session during the `await
  asyncio.sleep(0)` yield would be cleared without ever being scheduled.
  Detach the batch atomically (`pending_watchers = []`) before iterating.

- gateway/platforms/bluebubbles.py: normalize the salvaged _guid_cache
  LRU (#30523) to match feishu/codebase precedent — module-level
  `_GUID_CACHE_SIZE` constant, `while len > cap`, and drop the redundant
  post-insert `move_to_end` (a fresh insert is already most-recent).

- gateway/platforms/feishu.py: drop the same redundant post-insert
  `move_to_end` from the salvaged _message_text_cache LRU (#23706).

- scripts/release.py: add AUTHOR_MAP entries for the salvaged commits'
  authors (amathxbt #22155, ErnestHysa #32636/#32708) so the contributor
  audit passes when these commits land on main.

- tests/tools/test_tool_output_limits.py: autouse fixture resets the new
  module-level limits cache between tests.

- tests/gateway/test_feishu.py: hand-built adapter fixture seeded
  _message_text_cache as a plain dict; it's now an OrderedDict, so the
  fixture type had to match.
2026-05-31 00:50:19 -07:00
ErnestHysa
0036c72923 fix(gateway): upgrade plugin/bundle error logging and fix O(n^2) watcher recovery
N43 — Silent plugin/bundle errors:
- Plugin command dispatch: logger.debug() -> logger.warning()
- Bundle dispatch: logger.debug() -> logger.warning()
Plugin/auth failures are no longer invisible to operators.

N42 — O(n^2) pending_watchers recovery:
- Both recovery loops (startup + per-message) used while+pop(0) which is O(n) per pop
- Replaced with enumerate() over the list + periodic asyncio.sleep(0) yield points
- Clears the list after iteration instead of per-pop
- Batch size of 100 balances throughput vs event-loop responsiveness
2026-05-31 00:50:19 -07:00
ErnestHysa
eb9bfd3924 fix(T5): replace time.sleep(0.25) with asyncio.sleep in MCP auth reconnect poll
PAIN BEFORE:
Inside _handle_auth_error_and_retry() (a sync function that runs on the MCP
event loop thread), there was a blocking polling loop:

    while time.monotonic() < deadline:
        if srv.session is not None and srv._ready.is_set():
            break
        time.sleep(0.25)   # BLOCKS THE ENTIRE EVENT LOOP

Since _handle_auth_error_and_retry is invoked from tool handlers that run ON
the MCP event loop, time.sleep(0.25) blocked ALL concurrent MCP operations
(including other tools, keepalive heartbeats, OAuth refreshes) for 250ms per
iteration. With a 15-second deadline, worst case = 60 * 250ms = 15 seconds
of fully blocked concurrency.

WHAT WAS FIXED:
Extracted the blocking poll into an async helper _await_ready() that uses
asyncio.sleep(0.25) (non-blocking), and runs it via _run_on_mcp_loop().
_run_on_mcp_loop() properly awaits the coroutine on the event loop without
blocking the caller's thread. Added exception handling around the poll so
stuck reconnects still fall through to the error path.

The sync _handle_auth_error_and_retry now:
1. Fires reconnect signal (threadsafe)
2. Calls _run_on_mcp_loop(_await_ready(), timeout=15) — non-blocking
3. Returns; the event loop handles the polling

File: tools/mcp_tool.py
Lines: _handle_auth_error_and_retry() (~1886-1920)

Found by: exhaustive multi-pass audit (10 strategies, 1901 files, 913K lines)
2026-05-31 00:50:19 -07:00
AMATH
91a98d1519 fix: tool_output_limits re-reads config on every call (no caching) 2026-05-31 00:50:19 -07:00
Yuan Li
3c21fed099 fix(bluebubbles): cap _guid_cache with LRU eviction to prevent unbounded growth
The _guid_cache dict grows without bound as new contacts/groups are
resolved.  In a long-running gateway instance with many unique targets
this becomes a slow memory leak.

Replace the plain dict with an OrderedDict capped at 500 entries.
When the cap is exceeded the oldest (least-recently-used) entries are
evicted.
2026-05-31 00:50:19 -07:00
EloquentBrush
e8cacb57d5 fix(feishu): cap _message_text_cache with LRU eviction to prevent unbounded growth
_message_text_cache was a plain dict with no size limit. Every unique
message_id whose text was fetched (for reply-context lookups) stayed in
memory permanently, causing unbounded growth in long-running deployments
with active group chats.

Replace with an OrderedDict and evict the least-recently-used entry
whenever the cache exceeds _FEISHU_MESSAGE_TEXT_CACHE_SIZE (512). Cache
hits call move_to_end() to refresh LRU order. Mirrors the identical
pattern already used by _pending_processing_reactions in the same class.
2026-05-31 00:50:19 -07:00
Teknium
e1293bde4e feat(models): refresh model catalog hourly instead of daily (#35756)
Lower the model_catalog disk-cache TTL from 24h to 1h so freshly
published model-catalog.json deploys reach the picker within an hour
instead of up to a day. The picker now refetches on the next
`hermes model` / `/model` once the cache is older than 1h; younger
than 1h still serves the cache (no network hit), and network failures
still fall back to the stale copy.

- DEFAULT_TTL_HOURS 24 -> 1 (model_catalog.py)
- DEFAULT_CONFIG model_catalog.ttl_hours 24 -> 1, _config_version 24 -> 25
- migration v24->25 rewrites a stale ttl_hours:24 to 1, preserving any
  custom value the user set

E2E: verified >1h refetches / <1h skips, and migration rewrites 24->1
while preserving a custom 6.
2026-05-31 00:29:40 -07:00
Teknium
ca03486b6a fix(streaming): stop duplicating tool-call args from cumulative-resend providers (#35718)
DeepSeek / Baidu Qianfan stream tool-call arguments in cumulative mode:
each chunk resends the full arguments-so-far instead of the new fragment.
The stream accumulator blindly concatenated arg deltas with +=, turning
that into '{...}{...}{...}', which failed json.loads and got nuked to '{}'
— a silently corrupted tool call (#35592). Worse on multi-param tools
(search_files, session_search, memory replace) because longer args take
more chunks, giving more resend opportunities.

- Per-slot cumulative latch in the stream accumulator: a delta that is a
  strict superset of the accumulated buffer marks the slot cumulative and
  replaces (not appends); exact duplicates are dropped only after latching.
  Incremental fragments are untouched (default += path).
- Backstop _collapse_repeated_json_arguments() in the repair pipeline
  collapses pure identical-resend buffers (K exact repeats of a valid-JSON
  unit) for providers that resend the complete object from chunk 1. Only
  reached after json.loads already failed, so compliant single objects are
  never touched.

Not a gateway or DeepSeek-model bug — any OpenAI-wire provider in
cumulative streaming mode is affected.
2026-05-31 00:19:39 -07:00
Teknium
0ffbcbbe7d fix(vision): cap embedded image size before it wedges a session (#35732)
Resize vision tool-result images down to a 4 MB embed cap at load time,
not just at the 20 MB hard ceiling. A 5-20 MB image previously sailed
through the native fast path and got baked into conversation history,
where Anthropic's 5 MB per-image base64 limit rejected every subsequent
turn with a 400 — and because history is immutable, retries could never
clear it, permanently wedging the session.

Also harden the reactive shrink-recovery: it now returns False (don't
retry) when any oversized image part can't be brought under target, so
the single retry isn't burned re-sending a payload that will fail
identically. Previously it returned True after shrinking *any* part,
even when the actual oversized culprit survived.
2026-05-31 00:12:09 -07:00
Teknium
d4e7b2fc19 fix(voice): allow /voice over SSH when a sound server is reachable (#35719)
SSH sessions hard-failed voice mode on the presence of SSH_* env vars
alone, even when a PulseAudio/PipeWire server is running on the host and
audio works (ffplay/aplay/pw-play -> pulseaudio). Probe the default
sound-server sockets (PULSE_SERVER unix path, PULSE_RUNTIME_PATH/native,
$XDG_RUNTIME_DIR/{pulse/native,pipewire-0}) and actually connect() so a
stale socket doesn't count; downgrade the SSH branch to a notice when
audio is reachable. Mirrors the existing Docker/WSL forwarding handling.

Fixes #35622
2026-05-31 00:11:52 -07:00
Teknium
d276018378 docs(toolsets): clarify all/* wildcard does not enable kanban (#35729)
The all/* wildcard expands to every registered toolset, but a handful of
tools have an additional check_fn gate on top of toolset membership and
are intentionally NOT turned on by all/* alone:

- Capability-gated tools (browser, computer_use, code_execution, Feishu,
  Home Assistant, cronjob) require their backend/credential prerequisite.
- The kanban toolset is workflow-gated and deliberately opt-in. Kanban
  tools mutate shared board state, so they stay off by default even under
  all/* — you must list 'kanban' by name (or be a dispatcher-spawned
  worker with HERMES_KANBAN_TASK set).

This was the expectations gap behind #35581 — the docs previously said
all/* expands to 'every registered toolset' without noting the carve-out.

Closes #35581.
2026-05-31 00:10:50 -07:00
teknium1
bd72d333dc fix(gateway,cron): reuse existing _HERMES_GATEWAY marker; tighten cron regex
Follow-up to the salvaged #30728:
- Gateway already exports _HERMES_GATEWAY=1 at startup (gateway/run.py) and
  cli.py already keys off it. Drop the redundant new HERMES_IN_GATEWAY var;
  guard stop/restart on _HERMES_GATEWAY instead. One marker for one fact.
- Drop the greedy \bgateway.*restart alternation from the cron lifecycle
  filter — it false-positived on legit prompts that merely mention an
  unrelated gateway + a restart (API/payment gateway monitoring). The
  specific 'hermes gateway (restart|stop|start)' pattern already covers the
  real command.
- Rework the two negative guard tests to sentinel the first downstream call
  so they don't drive real signal delivery (tripped the live-system guard).
- Add false-positive regression cases to test_safe_commands.
2026-05-30 23:05:56 -07:00
simokiihamaki
5cd6c1717d fix(gateway,cron): prevent agent restart loops via self-targeting gateway commands (#30719)
Three defenses against SIGTERM-respawn loops when agent schedules its
own gateway restart under launchd/systemd KeepAlive:

1. HERMES_IN_GATEWAY env var: gateway sets it at startup; stop/restart
   subcommands refuse to run when set (exit 1 with clear message).

2. Cron create payload filter: regex pre-flight rejects prompts/scripts
   containing hermes gateway restart/stop, launchctl kickstart/unload,
   systemctl restart/stop, and pkill patterns.

3. 30 new tests: pattern matching (14), cron block (5), gateway guard (4),
   safe command negatives (7).
2026-05-30 23:05:56 -07:00
Teknium
9b78f411c8 fix(security): neutralize file paths in mutation-verifier footer (#35584) (#35684)
The per-turn file-mutation verifier footer rendered failed-write paths as
bare absolute paths in the user-facing response. The gateway's
extract_local_files() scans response text for bare paths ending in a
deliverable extension (.yaml/.json/etc.), validates os.path.isfile(), and
auto-attaches matches as native uploads — so a denied write to
~/.hermes/config.yaml surfaced the path in the footer and got the
credential file silently uploaded to the messaging channel.

The gateway denylist (validate_media_delivery_path) already blocks the
config.yaml case after #35634. This is defense-in-depth at the source:
backtick-wrap every path the footer emits — both the bullet path and any
path echoed inside the tool's error preview (the protected-file denial
message embeds the path in single quotes, which do NOT block the
extractor regex). extract_local_files skips paths inside inline-code
spans, so wrapping defeats auto-attachment for ANY protected file while
keeping the path human-readable.

- run_agent.py: _format_file_mutation_failure_footer wraps bullet paths;
  new _neutralize_footer_paths backticks any remaining bare path (covers
  the preview echo). staticmethod -> classmethod (caller unaffected).
- tests: backtick-wrap assertion + end-to-end extract_local_files leak test.
2026-05-30 23:05:23 -07:00
Teknium
dc4de14377 fix(telegram): retry on httpx pool timeout instead of dropping the send (#35664)
When PTB's general httpx pool is exhausted, it converts httpx.PoolTimeout
into telegram.error.TimedOut whose message states the request was *not*
sent to Telegram. The send retry loop treated all non-connect TimedOut as
non-retryable, so a pool timeout raised immediately, skipped all 3 retry
attempts, and was returned as retryable=False -- silently dropping the
message (agent responses, cron reports, etc.).

A pool timeout means the request never left the process, making it the
safest case to retry. Add _looks_like_pool_timeout() and treat it like a
connect timeout in both the in-loop retry decision and the outer retryable
determination, so pool timeouts flow through the existing backoff loop and
stay retryable on exhaustion.

Reported-by: q3874758 (#35610)
2026-05-30 22:58:16 -07:00
LeonSGP43
02d1da49de Block Hermes root config in media delivery 2026-05-30 21:02:36 -07:00
Teknium
50db2d9c12 feat(models): add deepseek-v4-flash, trim variants, group curated lists by maker (#35659)
* feat(models): add deepseek-v4-flash to OpenRouter + Nous curated lists

deepseek/deepseek-v4-flash was already in the native deepseek provider
catalog but missing from the curated OpenRouter and Nous Portal picker
lists. Added it to both and regenerated the model-catalog.json manifest
(drift guard requires same-PR regeneration).

* refactor(models): trim redundant variants, group curated lists by maker

Remove claude-opus-4.7/4.6, gpt-5.4-nano, gpt-5.3-codex,
gemini-3-pro-image-preview, gemini-3.1-flash-lite-preview, grok-4.20,
and the older gemini-3-pro-preview (Nous). Reorder both OPENROUTER_MODELS
and _PROVIDER_MODELS[nous] into contiguous per-maker blocks with comment
headers. Regenerated model-catalog.json (openrouter 27, nous 20).

* feat(models): add gemini-3-pro-preview to OpenRouter + Nous curated lists

Adds google/gemini-3-pro-preview to both curated pickers (new on
OpenRouter, restored on Nous). Regenerated model-catalog.json
(openrouter 28, nous 21).

* test(models): use claude-opus-4.8 in OpenRouter fetch fixtures

The two TestFetchOpenRouterModels tests mocked a live OpenRouter
response with claude-opus-4.6 and relied on it surviving the curated-list
filter. Since 4.6 was removed from OPENROUTER_MODELS, those models got
filtered out and the recommended tag shifted. Swap the fixture to
claude-opus-4.8 (still curated, still first in the Anthropic block).
2026-05-30 20:57:01 -07:00
teknium1
fe62424ac4 test(redact): assert Discord mentions pass through unchanged
Rewrite TestDiscordMentions as negative assertions (mentions survive the
redactor) and clean up the orphaned comment + dangling whitespace left by
removing _DISCORD_MENTION_RE. Follow-up to the salvaged #32259 fix for #35611.
2026-05-30 20:48:41 -07:00
BarnacleBoy
c2cbe2c97d fix: remove Discord mention redaction from secret scrubber 2026-05-30 20:48:41 -07:00
Teknium
9ed9af2f7d fix(update): name new config options in migration prompt; skip prompt for pure version bumps (#35658)
The 'hermes update' config-migration prompt printed only counts ('1 new
config option available') then asked 'configure them now?' without ever
saying what the options were. Users said no because they couldn't tell what
they were agreeing to. For pure config-format version bumps (no new
env/config keys) it still asked the question, where saying yes just bumped
the version and looked like a no-op.

- List each new env var / config key by name + description before prompting
  (cap at 8, then '… and N more'). The data was already available; we just
  threw it away and printed a count.
- Pure version bump (no new options): apply the format migration
  non-interactively and print what happened, instead of asking a misleading
  yes/no.

Reported by ScottFive and Tt2021.
2026-05-30 20:42:37 -07:00
Teknium
b1d34cf6e2 fix(tui): clamp bogus terminal dimensions (WSL 131072x1) (#35657)
Some hosts (notably WSL) report a junk window size such as 131072 columns
by 1 row. Both the Ink fork and our components only guard against
0/null/undefined/NaN (stdout.columns || 80), so a positive-but-absurd
width sails through into createScreen(width*height), allocating tens to
hundreds of MB per frame and tripping the TUI memory monitor's hard exit.

Add clampStdoutDimensions(), installed in entry.tsx before ink.render: it
patches process.stdout.columns/rows with clamping getters (cols 1-2000,
rows 1-1000; out-of-range -> 80x24). One install point fixes the renderer,
its resize handler, and every component read. Live resizes still propagate
through the original descriptor, just clamped.
2026-05-30 20:42:30 -07:00
brooklyn!
cd067ab91e fix(tui): swallow degraded mouse-burst noise so a stalled loop can't lock the composer (#35512)
* fix(tui): swallow degraded mouse-burst noise so a stalled loop can't lock the composer

When the Node event loop blocks during a heavy render/tool-call burst, stdin
stops being drained. Mode-1003 any-motion mouse reports pile up in the kernel
buffer, get partially read, and arrive as text with the `\x1b[<` prefix AND
coordinate digits chewed off across many partial reads. The existing fragment
recovery (SGR_MOUSE_FRAGMENT_RE) only handles clean `button;col;row[Mm]`
triples, so the degraded shards leak into the composer as typed text — the user
can no longer type or exit until the stall clears.

Captured leak (Windows Terminal, during tool calls):

  M6M35;220;56M6M35;218;56M169;48M;157;47M;44M20;43M79;40M78;40M0M7M35;49;41M
  48;41M;47;40M9;15;32M[I;31M5;211;26M35;211;25M7M;220;1MM0M09;25M24M23M3;22M
  M18M99;26M32MM38M63;44M47MM1;51M M4M54M

Add two recovery layers in parseTextWithSgrMouseFragments / the text-token path:

- MOUSE_BURST_NOISE_RE: whole-text fast path. If a text token is drawn only
  from the mouse-leak alphabet (`[ ] < ; I M m`, digits, spaces) AND carries
  the structural signature of mouse coordinates (>=3 M/m terminators, a digit,
  and a `;`), swallow it wholesale.
- MOUSE_BURST_RESIDUE_RE: swallows pure-noise residue in the gaps between and
  after recovered fragments, so a partially-recovered burst doesn't trail a
  chewed-up tail into the prompt.

All three constraints together preserve real prose: `Mmm MMM mmm yummy` has no
digit/`;`, `see 1;2;3M for details` has disqualifying letters, and
`1234;56;78M9;10;11M` has only two terminators — none are swallowed.

This is defense-in-depth: it stops the leak/lockout regardless of what blocks
the loop. The underlying event-loop stall during streaming is a separate,
still-open issue that needs live-turn instrumentation to root-cause.

* fix(tui): check mouse-burst noise before fragment recovery; drop test cast

Copilot review on #35512:

- MOUSE_BURST_NOISE_RE was only evaluated when parseTextWithSgrMouseFragments
  returned null. A noise blob that contains any intact `<b;c;r M` fragment makes
  fragment recovery return non-null, so the whole-text swallow never fired and
  the code emitted a pile of recovered mouse events instead of dropping the blob
  wholesale (contradicting the comment, and doing extra work mid-stall). Move the
  noise check ahead of fragment recovery so pure-noise tokens are dropped early.
  Add a regression test for a noise blob carrying intact fragments.

- Drop the unnecessary `(e as { isPasted?: boolean })` cast in the test;
  discriminated-union narrowing on `e.kind === 'key'` exposes isPasted directly.

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

* Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
2026-05-30 22:27:14 -05:00
helix4u
355af2c20f fix(session): survive missing FTS5 runtimes 2026-05-30 18:59:08 -07:00
Teknium
ec67def5bf fix(install): refresh stale uv so installs actually get FTS5 Python (#35541)
The installer's ensure_fts5() handled a no-FTS5 Python by running
'uv python install --reinstall', but WHICH Python builds a uv can
install is baked into the uv binary's download manifest. A stale uv
(e.g. 'pip install uv==0.7.20', which predates python-build-standalone
#694) only knows about pre-FTS5 builds, so --reinstall just pulls the
same FTS5-less interpreter — a no-op for FTS5. Result: 'Could not obtain
an FTS5-capable Python' and a broken session search even on the
supported installer path.

ensure_fts5() now escalates uv itself: reinstall with current uv ->
'uv self update' + reinstall (stale standalone uv) -> install a fresh
standalone uv into a temp dir and reinstall with that (externally-managed
uv that can't self-update, the reported case). Pythons live in uv's
shared store, so the fresh uv's --reinstall overwrites the stale
interpreter in place and the installer's later 'uv python find' resolves
to the FTS5-capable build.

Verified against the reporter's exact repro (ubuntu:24.04 +
pip install uv==0.7.20): Python 3.11.13 (no FTS5) -> 3.11.15 (FTS5).
2026-05-30 18:59:05 -07:00
teknium1
4ec0adebe8 fix(gateway): denylist config.yaml for media delivery (belt-and-suspenders)
Defense-in-depth on top of the EphemeralReply gate: even if a config.yaml
path reaches response text via some other path, it can never be delivered
as a native attachment. Matches existing protection for .env, auth.json,
and credentials/.

Co-authored-by: JezzaHehn <jezzahehn@gmail.com>
2026-05-30 18:58:46 -07:00
helix4u
bdfba45247 fix(gateway): stop system tips from auto-uploading local files 2026-05-30 18:58:46 -07:00
Teknium
b1a25404b6 perf(read_file): make compact gutter the only format; drop HERMES_READ_GUTTER (#35532)
The compact "<n>|content" gutter from #35368 is now the sole behavior.
Removes the HERMES_READ_GUTTER=padded escape hatch and its env lookup —
no legacy fixed-width path to maintain. Padding was pure token overhead
(~48% more tokens than bare content, ~16% more than compact) with no
measured accuracy gain in the original A/B.

- file_operations.py: drop env lookup + os import; gutter always f"{i}|{line}"
- tests: drop the padded env-override test; compact assertions retained
2026-05-30 14:38:30 -07:00
brooklyn!
5921d66785 fix(cli): stop OSC 11 bg probe from trapping users in a stray editor (#35441)
Over SSH the OSC 11 background-color query round-trip routinely exceeds
the 100ms read budget, so _query_osc11_background() gives up and the late
reply lands after prompt_toolkit has grabbed the tty. prompt_toolkit then
injects the OSC payload as typed text and reads its BEL terminator
(\x07 = Ctrl+G) as a keystroke — Ctrl+G is the open-external-editor
binding, dropping the user into vi with garbage and no obvious way out.

- Skip the OSC 11 probe on remote sessions (SSH_CONNECTION/CLIENT/TTY);
  fall back to COLORFGBG / env hints / the dark default.
- Restore the tty with TCSAFLUSH instead of TCSANOW so any partial/late
  reply is scrubbed from the input buffer before pt reads it.
2026-05-30 11:55:12 -05:00
Sylw3ster
6a72af044c fix(managed-gateway): keep tool availability scans off the Nous token-refresh path 2026-05-30 07:58:08 -07:00
Teknium
96643b4a52 fix(file-tools): anchor relative-path resolution to absolute base; report resolved path (#35399)
Relative paths in write_file/patch could resolve against the agent PROCESS cwd
instead of the terminal's working directory. In a git-worktree session with a
stale TERMINAL_CWD='.' (a relative base), early edits silently landed in the
MAIN checkout, verified there, and reported success — while the agent inspected
the worktree and saw nothing, misreading it as the patch tool no-op'ing.

- _resolve_base_dir(): resolution base is now ALWAYS absolute. A relative
  TERMINAL_CWD is anchored to the process cwd once, deterministically, instead
  of being left to resolve()-time cwd. Live terminal cwd stays authoritative.
- write_file/patch pass the resolved absolute path to the shell FileOps layer
  so the tool layer and shell layer can't disagree about which file is edited.
- Responses now report the absolute resolved_path and files_modified, so a
  wrong-cwd mismatch is visible on the first call.
- _path_resolution_warning(): emits a _warning when a relative path resolves
  OUTSIDE the live terminal cwd (e.g. a worktree session writing into main).

Validation: 11 new unit tests + 43 live E2E assertions (worktree routing,
mid-session cd, V4A patches, divergence warning, absolute paths, consecutive
patches); 466 existing file/path/terminal tests green.
2026-05-30 07:55:36 -07:00
Sylw3ster
0c6e133c04 perf(cli): stop eager MCP discovery from blocking agent-capable startup 2026-05-30 07:45:26 -07:00
Teknium
b47cb1bbf2 feat(kanban): file attachments on tasks (#35395)
Tasks can now carry file attachments (PDFs, images, source docs) that
workers read directly — closes the gap where source material had to be
pasted as a path into the task body.

- kanban_db: task_attachments table (additive), Attachment dataclass,
  add/list/get/delete accessors, attachments_root/task_attachments_dir
  path helpers (per-board, HERMES_KANBAN_ATTACHMENTS_ROOT override)
- build_worker_context: surfaces each attachment's absolute path so the
  worker (full file/terminal tool access) reads it via read_file/pdftotext
- dashboard API: POST/GET/DELETE attachment routes (multipart upload,
  25MB cap, traversal-safe filenames, root-containment check on download)
- dashboard UI: Attachments section in the task drawer — upload button,
  list with download, per-row remove
- docs + tests (13 cases: DB accessors, REST round-trip, traversal
  rejection, collision suffixing, worker-context surfacing)

Closes #35338
2026-05-30 07:41:04 -07:00
teknium1
20d073fd0b test: update extract_local_files Windows-path test for new matching behavior
test_windows_path_not_matched asserted the pre-fix POSIX-only behavior.
The Windows drive-letter support now intentionally matches these paths,
so replace it with parametrized positive cases plus a relative-path
negative guard, mirroring tests/gateway/test_platform_base.py.
2026-05-30 07:38:03 -07:00
teknium1
1b955450e3 test: use raw docstring in test_run_tool_media_re to silence escape warning 2026-05-30 07:38:03 -07:00
Tranquil-Flow
51d165a8e7 fix(gateway): support Windows absolute paths in MEDIA tag regex and extract_local_files (#34632)
The MEDIA_TAG_CLEANUP_RE and extract_local_files path regex both used
(?:~/|/) to anchor paths, which only matches Unix-style absolute and
home-relative paths. Two additional _TOOL_MEDIA_RE patterns in run.py
had the same limitation. Windows absolute paths (C:\Users\..., D:/...)
were silently ignored, causing MEDIA directive delivery to fail.

Add [A-Za-z]:[/\\] as a third anchor alternative in all four regex
locations (base.py x2, run.py x2). Also update path separators in
extract_local_files from / to [/\\] so it can traverse Windows
directory trees.

Revert accidental + quantifier in MEDIA_TAG_CLEANUP_RE lookahead
that changed match-one to match-one-or-more (unrelated to fix).

Fixes: #34632
2026-05-30 07:38:03 -07:00
Teknium
45465b0d5d fix(gateway): never auto-pause platforms on transient network/DNS failures (#35387)
The per-platform reconnect watcher auto-paused a platform after 10
consecutive reconnect failures, setting next_retry=inf and requiring a
manual /platform resume to recover. But both pause sites only ever fire
on *retryable* failures — non-retryable errors (bad auth) already drop
out of the retry queue earlier. So a transient DNS outage that spanned
the watcher's backoff window would silently park the bot forever, even
after connectivity returned.

The watcher's own docstring already promised 'retryable failures keep
retrying at the backoff cap indefinitely' — the code contradicted it.

Remove the auto-pause from both reconnect-failure branches. Retryable
failures now retry at the 5-min backoff cap forever and self-heal once
the network recovers. The circuit breaker (_pause_failed_platform /
_resume_paused_platform) stays for manual /platform pause|resume.

Fixes #35284.
2026-05-30 07:33:34 -07:00
teknium1
cddb7283d9 fix(gateway): config.yaml path for WhatsApp/Weixin text-batch delays
Convert the salvaged text-debounce delays from HERMES_* env vars to
config.yaml (gateway.platforms.<name>.extra.text_batch_delay_seconds /
text_batch_split_delay_seconds), per the '.env is for secrets only'
policy. Adds a finite/non-negative guard so bad YAML values fall back to
the defaults instead of crashing asyncio.sleep().

- whatsapp.py / weixin.py: read delays via _coerce_float_extra(config.extra)
- update Weixin content-dedup regression test for the deferred dispatch path
- add text-debounce coverage (whatsapp + weixin): defaults, config override,
  bad-value fallback, env-var-ignored, burst-collapse, lone-message
- docs: WhatsApp + Weixin config keys
2026-05-30 07:33:15 -07:00
RedPiggy
b0ce47daac feat: add text debounce batching for WhatsApp and WeChat platforms
WhatsApp and WeChat (Weixin/iLink) both deliver messages individually
without any client-side batching, so rapid multi-message bursts (forwarded
batches, paste-splits, etc.) each trigger a separate agent invocation.

This wastes tokens (redundant system prompts / context for each fragment)
and degrades UX (the user receives reply fragments instead of a single
coherent response).

Both adapters now mirror the Telegram adapter's proven text-debounce
pattern:

- _text_batch_delay_seconds / _text_batch_split_delay_seconds
  (configurable via env vars)
- _pending_text_batches dict for per-session aggregation
- _enqueue_text_event() concatenates successive TEXT messages and
  resets the flush timer
- _flush_text_batch() dispatches after the quiet period expires

Configurable via env vars:
  HERMES_WHATSAPP_TEXT_BATCH_DELAY_SECONDS (default 5.0)
  HERMES_WHATSAPP_TEXT_BATCH_SPLIT_DELAY_SECONDS (default 10.0)
  HERMES_WEIXIN_TEXT_BATCH_DELAY_SECONDS (default 3.0)
  HERMES_WEIXIN_TEXT_BATCH_SPLIT_DELAY_SECONDS (default 5.0)
2026-05-30 07:33:15 -07:00
Teknium
234ac00937 fix(dashboard): allow insecure WS peers on explicit non-loopback binds (#35386)
The merged 0.0.0.0/:: insecure-bind fix (#35141) did not cover binding
directly to a specific non-loopback address (e.g. a Tailscale/LAN IP via
--host 100.64.0.10 --insecure). In that mode the dashboard HTML loaded but
every WebSocket upgrade was rejected by the loopback-only peer guard, so
/chat connected then silently received no data.

Generalize _ws_client_is_allowed to lift the loopback-only peer gate for
any explicit non-loopback bound host, not just the 0.0.0.0/:: wildcard.
DNS-rebinding stays blocked: _ws_host_origin_is_allowed already requires
the Host header to exactly match the bound interface for explicit binds,
mirroring _is_accepted_host on the HTTP layer.

Co-authored-by: pxdsgnco <14163800+pxdsgnco@users.noreply.github.com>
2026-05-30 07:33:02 -07:00
teknium
433bffff51 fix(cli): surface oneshot agent exceptions to stderr with rc=1
Layer an exception guard on top of the empty-response fix so a crash
inside the agent (e.g. OSError from prompt_toolkit/Vt100 when stdout is a
non-TTY pipe, per #30623) is surfaced on the real stderr with rc=1 instead
of crashing past the redirect_stderr block. KeyboardInterrupt/SystemExit
are re-raised so Ctrl-C and explicit exits still propagate.

Also map briancl2 in scripts/release.py AUTHOR_MAP for the cherry-picked
empty-response commit.

Adapts the exception-guard approach from sweetcornna's PR #33818.

Co-authored-by: sweetcornna <96944678+ymylive@users.noreply.github.com>
2026-05-30 07:31:48 -07:00
Brian LaFlamme
9fbde54b51 fix(cli): fail closed on empty oneshot responses 2026-05-30 07:31:48 -07:00
Teknium
92ad7cc62c fix(browser): recover from CDP DOM-node serialization crash in browser_console (#35385)
browser_console(expression="document.body") returned the cryptic CDP error
"Object reference chain is too long" instead of a usable result.

With returnByValue=true, Chrome deep-serializes the eval result; for a live
DOM Node/NodeList/Window that serialization overruns CDP's recursion guard
and fails the whole call with a protocol-level error (not a JS exception),
which _browser_eval surfaced raw.

- browser_supervisor.evaluate_runtime: on that specific error, retry once
  with returnByValue=false so Chrome returns the node's description string —
  the same graceful path already used for document.querySelector() results.
- browser_tool._browser_eval (CLI subprocess fallback): the subprocess can't
  retry, so convert the reference-chain error into actionable guidance
  (extract a primitive / use JSON.stringify) instead of leaking it raw.

No expression rewriting — normal evals (1+41 -> 42) are untouched.
2026-05-30 07:31:25 -07:00
Teknium
42bbd221e8 fix(compressor): strip stale handoff prefix on resume; reconcile #26290+#32787 (#35344)
A handoff persisted under an older SUMMARY_PREFIX can be inherited into a
resumed lineage. _strip_summary_prefix only matched the current/legacy
literal, so on re-compaction the old 'resume exactly from Active Task'
directive stayed embedded in the body and kept hijacking replies to new,
unrelated user messages.

- Add _HISTORICAL_SUMMARY_PREFIXES (pre-#35344 prefix) and strip/recognize
  them in _strip_summary_prefix + _is_context_summary_content so resumed
  stale handoffs are re-normalized to the current latest-message-wins prefix.
- Reconcile the overlapping Active Task template edits from the salvaged
  #26290 (reverse-signal cancellation) and #32787 (capture open questions /
  decisions, don't write None too eagerly) — both intents kept.
- Regression coverage in tests/agent/test_resume_stale_active_task.py.
- AUTHOR_MAP entries for both salvaged contributors.
2026-05-30 07:29:21 -07:00
Mathijs van den Hurk
56b8dccf25 fix(compressor): treat unanswered user questions as Active Task, not 'None'
The Active Task field in compression summaries is the single most important
field for task continuity across context boundaries. The previous template
described it narrowly as a 'task assignment' or 'request', which caused the
summary LLM to write 'None' whenever the user's most recent input was a
question, a decision request, or a discussion turn rather than an
imperative command. The assistant on the other side of the compaction then
treated the conversation as resolved and gave a generic recap instead of
answering the still-open question.

Expand the template guidance to cover:

  * explicit task assignments
  * questions awaiting an answer
  * decisions awaiting input (A vs B)
  * ongoing discussions where the assistant owes the next substantive reply

Reserve 'None' for the rare case where the last exchange was fully
resolved (e.g. user said 'thanks, that's all').

Also tighten the trailing CRITICAL instruction in the summary prompt so the
LLM cannot fall back to the old 'no imperative command → None' heuristic.

No behavioural code changes — template strings only. All 83 existing
compressor tests pass.
2026-05-30 07:29:21 -07:00
Zhipeng Li
020601d41e fix(compression): drop conflicting 'resume Active Task' directive in summary prefix
SUMMARY_PREFIX previously contained two contradictory directives:

1. "treat it as background reference, NOT as active instructions"
   "Do NOT answer questions or fulfill requests mentioned in this summary"
   "Respond ONLY to the latest user message that appears AFTER this summary"

2. "Your current task is identified in the '## Active Task' section of the
    summary — resume exactly from there."

When the latest user message contradicted Active Task (e.g. 'stop the
i18n refactor', 'never mind, look at grafana instead'), models tended to
follow (2) anyway because 'resume exactly' is a strong, unambiguous
directive — leading to repeated re-surfacing of already-cancelled work
across turns, even after explicit 'stop'/'don't keep bringing that up'
messages from the user.

This change:
- Removes the conflicting 'resume exactly from Active Task' clause.
- Makes the precedence explicit: latest user message is the single source
  of truth; it WINS on conflict; cancelled Active Task / In Progress /
  Pending User Asks / Remaining Work must be discarded entirely (no
  'wrap up the old task first').
- Names canonical reverse signals (stop, undo, roll back, never mind,
  just verify, topic change) so the model recognizes them as cancellation
  triggers, not background context.
- Updates the summarizer template instruction so the LLM doesn't
  mechanically copy a cancelled task into Active Task on the next
  compaction (it's instructed to copy the reverse signal verbatim).
- Preserves: REFERENCE ONLY framing, MEMORY.md/USER.md authority, and
  the 'don't repeat work already reflected in session state' clause.

Adds tests/agent/test_summary_prefix_semantics.py to pin invariants so
the conflict can't regress.

Tested:
- All compaction tests pass: tests/agent/test_context_compressor.py,
  tests/agent/test_context_compressor_summary_continuity.py,
  tests/run_agent/test_413_compression.py,
  tests/run_agent/test_compression_persistence.py,
  tests/run_agent/test_compression_boundary_hook.py,
  tests/cli/test_manual_compress.py — 117/117 passing.
- Tested on macOS.
2026-05-30 07:29:21 -07:00
teknium1
182739fcda test(interrupt): assert no leaked tid instead of no-op block
Follow-up on the #35309 regression test: the trailing `with _lock: pass`
asserted nothing. Replace it with a concrete assertion that
_interrupted_threads is empty after the worker exits, directly verifying
the leak the fix prevents.
2026-05-30 07:28:11 -07:00
liuhao1024
bede3cf12d fix(tools): wrap _run_tool cleanup in finally to prevent interrupt state leak
When _invoke_tool raises a BaseException (CancelledError, KeyboardInterrupt),
the cleanup code at the end of _run_tool was bypassed because it sat outside
the except block (which only catches Exception).  ThreadPoolExecutor recycles
thread IDs, so the leaked tid in _interrupted_threads poisons the next tool
scheduled on that thread — it instantly aborts with 'Interrupted'.

Move the discard + _set_interrupt(False) into a finally block so cleanup
runs regardless of how the worker exits.

Fixes #35309
2026-05-30 07:28:11 -07:00
Teknium
2b16b756a7 fix(gateway): recover model on post-interrupt turn; gate fallback status (#35381)
Empty model could reach the API on a recovery turn after stream_interrupt_abort,
failing HTTP 400 "No models provided" with no recovery — the session went
silent until the user manually re-sent (#35314).

- gateway/run.py: cache last-successfully-resolved model per session (+ a
  process-wide slot); when a fresh config read returns an empty model on a
  recovery turn, reuse the last-known-good instead of building model="".
- run_agent.py + agent/conversation_loop.py: only emit "trying fallback..."
  status when a fallback chain actually exists, so the UI stops announcing a
  fallback that will never run (also #17446).
- tests: empty-model recovery + _has_pending_fallback gate.
2026-05-30 07:28:06 -07:00
Teknium
10dec7c6dc fix(kanban): respect mobile safe areas in task detail drawer (#35378)
* fix(file-tools): handle UTF-8 BOM in read_file / write_file / patch

Some Windows editors prepend an invisible UTF-8 BOM (U+FEFF) to text
files. We had no awareness of it, so: read_file surfaced a phantom
U+FEFF as the first character; patch matches against the true first
line could miss; and a write/patch round-trip silently stripped the
marker, changing the file's byte signature.

Now:
- read_file / read_file_raw strip a single leading BOM so the model
  never sees it (only on the first chunk — the marker lives at byte 0).
- patch_replace strips the BOM before fuzzy-matching (so an exact
  first-line match works) and its post-write verification compares
  BOM-stripped content.
- write_file restores the BOM when the original file had one and the
  new content doesn't, mirroring the existing line-ending preservation
  (detect on disk via a cheap `head -c 3` probe or reuse pre_content,
  re-prepend across the edit). Guards against double-BOM.

Mid-content U+FEFF is left alone (it's data there, not a file marker).

Tests: TestBomHandling (real LocalEnvironment) — read-strips, raw-read
strips, write preserves, no-BOM-when-original-had-none, no-double-BOM,
patch round-trip preserves, patch matches first line through a BOM,
plus helper unit tests. 208 file-tool tests green.

* fix(kanban): respect mobile safe areas in task detail drawer

The task detail drawer is a body-level z-60 fixed overlay using
height:100vh starting at the viewport top. On mobile this puts the
drawer header behind the dashboard's fixed top bar (min-h-14, z-40)
and lets the bottom comment input sit under the browser's collapsing
nav bar.

- drawer: 100vh -> 100dvh (+ max-height:100dvh), 100vh kept as fallback
- head: padding-top honors env(safe-area-inset-top); mobile (<1024px,
  matching the lg breakpoint where the fixed bar shows) clears the
  3.5rem header
- comment-row + body: bottom padding extended with
  env(safe-area-inset-bottom) so the bottom-most element clears the
  mobile browser chrome

Mirrors the host shell idiom (100dvh + env(safe-area-inset-bottom) in
web/), and web/index.html already sets viewport-fit=cover so the insets
resolve. max()/calc() fallbacks leave desktop unchanged.

Closes #35324
2026-05-30 07:13:26 -07:00
Teknium
ea6eaabd8f perf(read_file): compact line-number gutter — ~14% fewer tokens per read (#35368)
read_file's gutter used a fixed-width zero/space-padded prefix
("     1|content"). The padding is pure token overhead: measured with
cl100k on real Hermes source, the padded gutter costs ~48% more tokens
than bare content and ~16% more than a compact "<n>|content" gutter,
because the leading spaces tokenize into extra tokens on every line.

Switched the default to the compact "<n>|content" form. An A/B
(Sonnet 4.6 via OpenRouter, 2 passes, 4-task battery, every claim
verified against ground truth) showed:
  - padded  : 4/4 PASS both passes
  - compact : 4/4 PASS both passes  ← keeps line-referencing + patch
  - none    : 3/4 PASS both passes  ← dropping numbers entirely made
              the model hand-count lines and answer off-by-one (33 vs 34)

So we keep the line numbers (the model genuinely uses them to reference
lines) but drop the wasteful padding — capturing ~14% of the read-token
cost with zero measured accuracy change. Dropping numbers entirely
(the larger 33% saving) is rejected: it regresses line-referencing.

patch/fuzzy_match never consumed the gutter (they match old_string text
and compute char offsets internally), so editing is unaffected. No
downstream parser keys on the fixed-width columns. HERMES_READ_GUTTER=
padded restores the legacy format for anyone relying on alignment.

Tests: updated the 3 format assertions to the compact gutter; added an
env-override test for the legacy padded format. 209 file-tool tests green.
2026-05-30 07:01:22 -07:00
Teknium
5f84c9144a fix(file-tools): handle UTF-8 BOM in read_file / write_file / patch (#35278)
Some Windows editors prepend an invisible UTF-8 BOM (U+FEFF) to text
files. We had no awareness of it, so: read_file surfaced a phantom
U+FEFF as the first character; patch matches against the true first
line could miss; and a write/patch round-trip silently stripped the
marker, changing the file's byte signature.

Now:
- read_file / read_file_raw strip a single leading BOM so the model
  never sees it (only on the first chunk — the marker lives at byte 0).
- patch_replace strips the BOM before fuzzy-matching (so an exact
  first-line match works) and its post-write verification compares
  BOM-stripped content.
- write_file restores the BOM when the original file had one and the
  new content doesn't, mirroring the existing line-ending preservation
  (detect on disk via a cheap `head -c 3` probe or reuse pre_content,
  re-prepend across the edit). Guards against double-BOM.

Mid-content U+FEFF is left alone (it's data there, not a file marker).

Tests: TestBomHandling (real LocalEnvironment) — read-strips, raw-read
strips, write preserves, no-BOM-when-original-had-none, no-double-BOM,
patch round-trip preserves, patch matches first line through a BOM,
plus helper unit tests. 208 file-tool tests green.
2026-05-30 06:25:50 -07:00
sprmn24
5a1aa9e68c fix(nous_account): add threading lock to prevent TOCTOU race on cache
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-05-30 06:25:43 -07:00
teknium1
44f3e51865 fix(gateway): run adapter config hooks for nested-only platform blocks
The plugin apply_yaml_config_fn dispatch loop only ran when a top-level
platform block (e.g. `discord:`) existed. Configs that defined a platform
only under `platforms.<name>` or `gateway.platforms.<name>` skipped the
hook, so `platforms.discord.extra.allow_from` never reached
DISCORD_ALLOWED_USERS. Fall back to those nested blocks when the top-level
one is absent.

Also map byquenox@gmail.com -> Que0x for the salvaged commits.
2026-05-30 05:23:55 -07:00
quen0xi
6d2727ef1c fix(discord): bridge explicit allow_from configuration to env var mapping 2026-05-30 05:23:55 -07:00
quen0xi
0bfe19ba17 fix(gateway): merge nested gateway.platforms configuration block 2026-05-30 05:23:55 -07:00
Teknium
61268ff7a9 feat(cli): add hermes prompt-size diagnostic (#35276)
Adds a 'hermes prompt-size' command that reports the fixed prompt budget
for a fresh session: system prompt total, skills index, memory, user
profile, prompt tiers, and tool-schema JSON bytes. Runs offline (dummy
credentials force the direct-construction path, no network call).

Lets users see which block dominates their per-call payload — the skills
index is often the largest single block when many skills are installed
(issue #34667). Zero model-tool footprint: it's a top-level CLI
subcommand, not an agent tool.

--platform <name> simulates a channel's platform hint; --json emits a
machine-readable breakdown.

Closes #34667
2026-05-30 02:53:42 -07:00
kshitijk4poor
cbf851ae1d perf(tui): stop slow/dead MCP servers from freezing TUI startup
The 'summoning hermes…' phase blocked on gateway.ready, which ran MCP
tool discovery inline. Any configured-but-unreachable MCP server burned
its full connect-retry backoff (1+2+4s ≈ 7s) before the composer
appeared — startup went from instant to ~7.5s of dead air for anyone
with a down stdio/http server in mcp_servers.

Move discovery into a background daemon thread so gateway.ready fires
immediately; tools register into the shared registry as servers connect,
and the agent isn't built until the first prompt. Measured spawn→ready:
~7500ms → ~115ms (dead twozero_td server in config).

Also drop rich.console + prompt_toolkit off banner.py's import path
(lazy-imported inside cprint/build_welcome_banner). tui_gateway.server
imports banner only to reach the lightweight prefetch_update_check
helper; the eager rich/pt imports added ~45ms before gateway.ready for
no benefit. tui_gateway.server import: ~115ms → ~69ms.
2026-05-30 02:53:37 -07:00
teknium1
bfc4a26032 fix(tools): point email home-channel error at EMAIL_HOME_ADDRESS
The no-home-channel error for send_message derived the env var name
generically as <PLATFORM>_HOME_CHANNEL, producing EMAIL_HOME_CHANNEL for
the email platform. But gateway/config.py reads EMAIL_HOME_ADDRESS, so a
user following the error's guidance would set a variable that is never
consulted. Add a per-platform override map so the email hint names the
variable actually read; all other platforms keep the generic hint.
2026-05-30 02:39:08 -07:00
liuhao1024
d3724c0be6 fix(tools): recognize email addresses as explicit targets in send_message
When using send_message with the email platform, valid email addresses
like user@example.com were not recognized as explicit targets by
_parse_target_ref(). This caused the function to return (None, None,
False), forcing the system into channel-name resolution which has no
way to resolve a raw email address, resulting in 'No home channel set
for email' errors.

Add _EMAIL_TARGET_RE pattern and email platform handler in
_parse_target_ref() so email addresses are treated as explicit targets
and routed directly without requiring a home target configuration.
2026-05-30 02:39:08 -07:00
teknium1
622e534379 test(auxiliary): e2e routing assertions for custom-provider aux resolution
Adds two real-client tests on top of the salvaged #34783 fix:
- config-less custom:<name> endpoint routes via the carried live base_url
  (guards the #34777 symptom directly, not just the wiring)
- named custom:<name> WITH a config entry still resolves via the
  named-custom branch (regression guard against collapsing to bare custom)
2026-05-30 02:38:59 -07:00
liuhao1024
40fcb96585 fix(auxiliary): pass base_url/api_key/api_mode through set_runtime_main for custom providers
When a user configures a custom: provider (e.g. custom:openclaw-router),
set_runtime_main() only stored provider and model in process-local globals.
_resolve_auto() then had no base_url or api_key for the custom endpoint,
causing Step 1 to fail and auxiliary tasks (approval, compression, title
generation) to fall through to the aggregator chain and route to wrong
providers.

Fix: extend set_runtime_main() to accept base_url, api_key, and api_mode
keyword arguments; store them in new globals alongside the existing provider
and model; fall back to these globals in _resolve_auto() when the main_runtime
dict is empty. The call site in conversation_loop.py now passes all five
fields from the agent object.

Fixes #34777
2026-05-30 02:38:59 -07:00
Teknium
2475244ca0 fix(update/windows): robustly exclude launcher-shim ancestors from concurrent check (#35257)
hermes update on Windows still aborted with 'Another hermes.exe is running',
listing its own launcher shim(s) as concurrent instances (issues #29341,
#34795). The distlib Scripts\hermes.exe launcher spawns python.exe and waits;
detection runs in the python child, so the launcher shim shows up in
process_iter.

The prior fix walked the ancestor chain with per-hop current.parent() inside
'except: break' — the first psutil AccessDenied/NoSuchProcess (common on
Windows across session/elevation boundaries) bailed the walk early, leaving
the launcher in the candidate set and re-triggering the false positive.

- Switch to proc.parents() (whole ancestor list in one call), evaluate each
  ancestor independently so one unreadable hop never strands the launcher.
- Only exclude ancestors whose exe is itself a shim, so a genuine second
  hermes.exe under a non-Hermes parent (Desktop backend child) is still flagged.
- Message now prints a copy-pasteable 'taskkill /PID … /F' for the exact stale
  PIDs so a user who already closed everything can self-remediate.

Conservative shim-only ancestor approach credited to the parallel attempts in
PRs #29358 (xxxigm) and #31808 (jquesnelle).
2026-05-30 02:38:40 -07:00
Donovan Yohan
8bd00607dc fix(google-workspace): handle Gmail header casing case-insensitively
Normalize Gmail API message header names to lowercase before lookup so
gmail get/search/reply populate to/subject/from regardless of the casing
the message was stored with. Emit conventional MIME header casing
(To/Subject/Cc/From) on send and reply.

Fixes #34806

Co-authored-by: Donovan Yohan <donovan-yohan@users.noreply.github.com>
2026-05-30 02:38:18 -07:00
beardthelion
6baf0016be fix(run_agent): gate concurrent checkpoint preflight on block_result (fixes #34827)
In the concurrent tool-execution path, checkpoint preflight (write_file,
patch, destructive terminal) fired BEFORE plugin guardrail block_result
was computed. A blocked write_file could still dirty checkpoint state
(doc_modified_this_turn, _last_write_file_call_id, turn_counter).

Move checkpoint preflight to AFTER block_result computation, gated on
`if block_result is None:` — matching the invariant the sequential path
already enforces.
2026-05-30 02:38:12 -07:00
teknium1
e1945ff697 test(state): cover update_session_model overwrite + getattr-guard text path
Follow-up to LengR's #35181 salvage:
- gateway text-path uses getattr(self, '_session_db', None) to match the
  picker callback path (defensive for object.__new__() gateway test pattern).
- add SessionDB.update_session_model test asserting it overwrites the
  COALESCE-pinned model and survives subsequent token updates (#34850).
2026-05-30 02:35:36 -07:00
lengr
794519c6ad fix(state): persist mid-session model switch to database
When a user switches models mid-session via /model, the gateway updates
the in-memory agent and session overrides, but the database was never
updated. The COALESCE(model, ?) in update_token_counts() only fills NULL
values, so the dashboard always showed the original model.

Fix: Add SessionDB.update_session_model() that unconditionally sets the
model column, and call it from both the interactive picker and direct
/model command paths in the gateway.

Fixes #34850
2026-05-30 02:35:36 -07:00
teknium1
c9e31a8e4b chore(release): map tuancookiez-hub for #34865 salvage 2026-05-30 02:08:36 -07:00
Tuna Dev
296fcdfa52 fix(lsp): handle Windows .cmd shims in LSP process spawn
asyncio.create_subprocess_exec cannot run .cmd/.bat files on Windows
because CreateProcess expects a valid PE executable. npm-installed LSP
servers (intelephense, typescript-language-server, etc.) ship as .cmd
shims on Windows, causing WinError 193 on spawn.

Detect .cmd/.bat extensions and wrap with cmd.exe /c before spawning.
Gated behind sys.platform == 'win32' — no code path changes elsewhere.

Fixes #34864
2026-05-30 02:08:36 -07:00
Sylw3ster
460771bf0f fix(lsp): detect Windows wrapper binaries in installer probes 2026-05-30 02:08:36 -07:00
teknium1
41decf2c4a test(mcp): import os and pytest in test_mcp_stability
The salvaged grandchild-reaping tests reference os.getpgid/os.killpg and
pytest.mark/skip/importorskip directly, but the file only imported asyncio,
signal, and unittest.mock. Add the missing imports so collection succeeds
on current main.
2026-05-30 02:08:29 -07:00
konsisumer
a29d64e50c fix(mcp): reap stdio MCP grandchildren via process-group signal
The orphan reaper for stdio MCP subprocesses only tracked the direct child
PID spawned by ``stdio_client`` (e.g. ``openclaw mcp serve``). When that
wrapper itself spawned a helper (``claude mcp serve``) and then exited, the
helper reparented to ``systemd --user`` and survived shutdown.

The MCP SDK already spawns stdio children with ``start_new_session=True``,
so the wrapper is its own pgroup leader and same-pgroup descendants are
reachable via ``killpg``. Capture the pgid at spawn time and reap via
``killpg(pgid, sig)`` so reparented grandchildren are reaped alongside the
direct child, even after the wrapper itself exits. Falls back to per-pid
``os.kill`` on Windows or when no pgid was recorded.

Fixes part 2 (orphan ``claude mcp serve``) of #23799. Part 1 (per-invocation
respawn) was confirmed by the reporter to be an environmental artifact, not
a code bug.
2026-05-30 02:08:29 -07:00
teknium1
4d7ea3fd36 chore(release): map inchargeautomation-lab author email 2026-05-30 02:08:11 -07:00
teknium1
2334228eca fix(update): handle pipx installs + --system fallback in _cmd_update_pip
Extends the uv-tool detection (briandevans, #29703) to cover the
remaining no-venv install layouts that hit the same uv 'No virtual
environment found' error:

- pipx-managed installs (sys.prefix under .../pipx/...) -> 'pipx upgrade',
  matching scripts/auto-update.sh (pipx-detection idea from
  inchargeautomation-lab, #29852)
- bare pip outside any venv -> 'uv pip install --system --upgrade'
- venv (launcher shim) keeps the VIRTUAL_ENV overlay from #35224 and never
  gets --system, so the install always targets the venv, not system Python

The four branches are mutually exclusive; VIRTUAL_ENV is exported only for
the uv-pip-in-venv path (uv tool / pipx upgrade ignore it).

Co-authored-by: Joshua Kimbrell <incharge.automation@gmail.com>
2026-05-30 02:08:11 -07:00
briandevans
bebd4f8516 fix(cli): restrict uv-tool-install detection to running interpreter
Copilot review on PR #29703 flagged two issues with the `uv tool list`
fallback in `is_uv_tool_install`:

1. False positive: `uv tool list` returns the *machine*'s installed
   tools, not the active install. A regular pip/venv Hermes on a host
   that also has `uv tool install hermes-agent` available would be
   misclassified as a uv-tool install, and `hermes update` would
   upgrade the wrong copy.

2. Overhead: the subprocess call (up to a 15s timeout) was triggered
   even from `recommended_update_command_for_method`, which just
   computes a display string.

Restrict detection to properties of the running interpreter
(`sys.prefix` and `sys.executable` — both can carry the uv-tool layout
marker depending on entry point). Drop the `uv tool list` fallback and
the `uv_path` parameter entirely. `_cmd_update_pip` now also surfaces a
clear hint when the runtime looks like a uv-tool install but `uv` is
missing from PATH, instead of silently falling back to `python -m pip`.
2026-05-30 02:08:11 -07:00
briandevans
1bdb29d938 fix(cli): use uv tool upgrade when Hermes is a uv tool install (#29700)
Hermes installed via `uv tool install hermes-agent` lives outside any
venv. `_cmd_update_pip` previously ran `uv pip install --upgrade`, which
errors with `No virtual environment found; run uv venv ...`. The user
hits this on the very first `hermes update` after a standard
non-`--system` install with `uv` on PATH.

Add `is_uv_tool_install()` in `hermes_cli/config.py`: fast path inspects
`sys.prefix` for the standard `uv/tools/hermes-agent/` layout, falls
back to `uv tool list` for non-standard prefixes. Both the
user-facing `recommended_update_command_for_method("pip")` string and
the actual subprocess invocation in `_cmd_update_pip` now switch to
`uv tool upgrade hermes-agent` when detected. Non-tool installs and the
no-`uv` fallback keep their existing commands unchanged.
2026-05-30 02:08:11 -07:00
Teknium
39f6b6e9d2 fix(file-tools): make write_file/patch atomic (temp-file + rename) (#35252)
* Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here'

Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.

Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20

- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
  guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
  compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
  user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
  changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.

Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).

* fix(file-tools): make write_file/patch atomic (temp-file + rename)

write_file streamed content straight into the target via `cat > path`, so
a crash, SIGKILL, or truncated pipe mid-write left the file half-written
and corrupt. patch_replace routes through write_file, so it shared the flaw.

Now writes stream into a temp file in the SAME directory and `mv` it over
the target — a real same-filesystem rename, which is atomic on POSIX and on
every terminal backend (local/docker/ssh/modal). A failed write leaves the
original byte-intact and leaks no temp file. The existing file's mode is
preserved across the swap (stat + chmod, GNU/BSD), and content still rides
stdin so there's no ARG_MAX limit. A trap cleans the temp on any error path.

Tests: added TestAtomicWrite (real LocalEnvironment, no mocks) covering
inode-change-on-overwrite, mode preservation, failed-write-leaves-original,
no-temp-leak, special chars, and patch routing. Updated two mocks in
test_file_operations.py that keyed on the literal `cat >` write command to
key on the stdin_data behavioral signal instead. 200 file-tool tests green.
2026-05-30 02:07:50 -07:00
teknium1
6a08fd3c3f test(skills): assert restore via synced[copied], not manifest re-read
The hermetic CI env (slice 4/6) redirects HERMES_HOME, so a post-restore
_read_manifest() can resolve to an empty/redirected manifest path and return
{}. Assert on sync_skills's in-memory return value (synced["copied"]) instead,
which is the resilient signal that the skill was re-copied and is no longer in
limbo.
2026-05-30 02:05:10 -07:00
teknium1
8ae0802d59 fix(skills): make _rmtree_writable handle read-only directories, not just files
The cherry-picked fix's onerror handler chmod'd only the failing path, but
unlinking a child requires write permission on its PARENT directory. On a true
Nix-store copy (r-xr-xr-x dirs + files) rmtree still failed. Now chmod the
parent dir as well before retrying.

Also rewrites the regression test: the original asserted the helper FAILS on a
read-only dir (documenting the limitation), which is the wrong success criterion.
Split into two tests — restore succeeds on a full read-only tree (real Nix case),
and manifest is preserved when removal genuinely cannot proceed (monkeypatched).
2026-05-30 02:05:10 -07:00
annguyenNous
83a7d0b601 fix(skills): fix transaction ordering in reset_bundled_skill and handle read-only files in rmtree
Two related bugs in tools/skills_sync.py affecting Nix-store and
immutable-package installs:

**#34972 — reset_bundled_skill corrupts manifest on rmtree failure:**
The function deleted the manifest entry BEFORE attempting rmtree. If
rmtree failed (read-only files from Nix store), the function returned
early — leaving the skill in a manifest-less limbo state where future
syncs silently skip it forever.

Fix: reorder steps — attempt rmtree FIRST, only delete manifest entry
after rmtree succeeds. If rmtree fails, nothing is changed.

**#34860 — stale .bak directories after sync:**
sync_skills() called shutil.rmtree(backup, ignore_errors=True) which
silently failed on read-only files, leaving persistent .bak dirs.

Fix: add _rmtree_writable() helper that makes files writable via an
onerror callback before retrying removal. Used in both sync_skills()
backup cleanup and reset_bundled_skill().

Fixes #34972
Fixes #34860
2026-05-30 02:05:10 -07:00
liuhao1024
a57cc00081 fix(packaging): include mcp_serve in py-modules so hermes mcp serve works on pip installs
mcp_serve.py was missing from the setuptools py-modules list, causing
hermes mcp serve to crash with ModuleNotFoundError on standard pip installs.

Fixes #34871
2026-05-30 01:45:30 -07:00
Teknium
93e6a05efc feat(model-picker): group multi-endpoint providers under one row (#35227)
* Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here'

Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.

Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20

- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
  guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
  compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
  user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
  changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.

Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).

* feat(model-picker): group multi-endpoint providers under one row

The interactive provider pickers (hermes model, setup wizard, Telegram
/model) listed every provider slug flat, so vendors with several endpoints
(Kimi/Moonshot, MiniMax, xAI Grok, Google Gemini, OpenAI, OpenCode, GitHub
Copilot) each occupied multiple top-level rows. Now related slugs fold into
one top-level row that drills down to the specific endpoint.

- models.py: add PROVIDER_GROUPS table + group_providers() fold (display
  only — CANONICAL_PROVIDERS, slugs, --provider, /model <provider:model>
  all unchanged and individually addressable).
- hermes model (main.py): group rows drill into a member sub-picker, then
  dispatch to the existing _model_flow_* unchanged. setup wizard inherits it.
- Telegram /model: new mpg:<group> callback expands to member mp:<slug>
  buttons; single authenticated member degrades to a direct button.
- Grouping is the single shared fold across all three surfaces.

Validation: 163 targeted tests pass; E2E confirms group->member->model
resolves to the correct concrete slug for all families.
2026-05-30 01:41:33 -07:00
LeonSGP43
14517ac1f5 fix(update): export launcher virtualenv to uv 2026-05-30 01:41:29 -07:00
teknium1
8e5a6854c3 fix(kanban): align recompute_ready guard with breaker's configured failure_limit
Follow-up to the budget-exhaustion recovery fix. recompute_ready's
new circuit-breaker guard resolved its effective limit from per-task
max_retries -> DEFAULT_FAILURE_LIMIT, skipping the dispatcher's
configured kanban.failure_limit. _record_task_failure resolves
max_retries -> failure_limit(config) -> DEFAULT, so the two disagreed
whenever an operator set kanban.failure_limit != 2:

- config > 2: a task could get stuck at DEFAULT(2) before reaching its
  allowed retry count.
- config < 2: a task the breaker already blocked could be auto-recovered
  back to ready, defeating the stricter limit.

Thread the dispatcher's failure_limit through dispatch_once into
recompute_ready so the guard and the breaker share one resolution order.
Updated test_circuit_breaker_block_still_auto_promotes (it asserted a
failures=5 block auto-recovers and resets the counter — that's the
pre-#35072 behavior the loop fix removes); it now exercises a below-limit
transient block, with the at-limit case covered in test_kanban_db.py.
Added two tests for the config-tier and per-task override resolution.
2026-05-30 01:40:57 -07:00
liuhao1024
6ab71d3bb4 fix(kanban): prevent infinite retry loop when worker exhausts iteration budget
recompute_ready() previously reset consecutive_failures to 0 when
auto-recovering a blocked task.  This defeated the circuit-breaker:
a task that repeatedly exhausted its iteration budget would cycle
forever (block → auto-recover with counter=0 → respawn → budget
exhausted → block → …) with no signal to the operator.

Fix: don't auto-recover tasks whose consecutive_failures has reached
the effective failure limit (per-task max_retries or
DEFAULT_FAILURE_LIMIT).  The counter is also preserved across
recovery so the breaker can accumulate across cycles.

Fixes #35072
2026-05-30 01:40:57 -07:00
teknium1
c70dca3a88 fix(kanban): rebuild legacy TEXT-PK tables to INTEGER AUTOINCREMENT on open
Legacy kanban boards (pre-AUTOINCREMENT schema) crashed the gateway
notifier on every tick — int(None) on a NULL id in unseen_events_for_sub
— silently losing all kanban notifications. CREATE TABLE IF NOT EXISTS
skips existing tables regardless of schema and _add_column_if_missing
only adds columns, so neither could fix a drifted primary-key type.

_rebuild_drifted_tables() detects the legacy shape via PRAGMA table_info
and rebuilds task_events/task_comments/task_runs (TEXT PK -> INTEGER
AUTOINCREMENT) and kanban_notify_subs.last_event_id (TEXT/NULL -> INTEGER
NOT NULL DEFAULT 0), preserving data. The whole pass is one transaction
so an interruption can't leave a table half-renamed, and recreates every
index DROP TABLE would otherwise take down (including idx_events_run).

Co-authored-by: liuhao1024 <liuhao1024@users.noreply.github.com>
2026-05-30 01:40:49 -07:00
teknium1
16882cfded refactor(tui): simplify base64 clipboard write to a stdin flag
The per-entry psScript callback was identical for every PowerShell entry,
so the function-valued union member added structure without behavior. Collapse
WriteCmd to a plain stdin boolean and apply the one shared base64 script in the
write loop. Document the CP936 root cause inline.

Co-authored-by: BROCCOLO1D <279959838+BROCCOLO1D@users.noreply.github.com>
2026-05-30 01:40:44 -07:00
annguyenNous
64998fa93e fix(tui): use base64 encoding for PowerShell clipboard writes to preserve UTF-8
When writing text to the clipboard via PowerShell (WSL2 and native Windows),
the previous implementation piped text through stdin using `Set-Clipboard
-Value $input`. PowerShell reads stdin using the Windows system's default
ANSI code page (e.g. CP936 for Chinese Windows), causing all non-ASCII
characters (CJK, emoji, accented) to become garbled.

Fix: encode the text as base64 in Node.js and pass it as a command argument.
PowerShell decodes it from base64 using explicit UTF-8, bypassing the code
page issue entirely.

Fixes #35107
2026-05-30 01:40:44 -07:00
Teknium
b4cf114f68 fix(vision): fail fast on non-retryable image download errors (#35221)
_download_image() wrapped every download attempt in a blanket
`except Exception` and retried 3x with 2s/4s/8s backoff regardless of
cause. A 404/403 image URL would never resolve on retry, so it just
burned up to 6s of wall-clock + extra GETs before failing — inflating
latency for a deterministic failure (issue #32296, umbrella #35114).

Add _is_retryable_download_error(): 4xx client errors (except 429),
website-policy PermissionError, and too-large/SSRF ValueError now raise
on the first attempt. 429, 5xx, and unclassified network errors stay
retryable. Removed the now-unreachable fall-through branch since the
loop always returns on success or re-raises on the final/terminal attempt.
2026-05-30 01:40:39 -07:00
kshitij
e481b15333 Merge pull request #35216 from kshitijk4poor/fix/agents-nudge-single-delegate
fix: surface /agents nudge for single-delegate fan-out (TUI + CLI)
2026-05-30 00:57:15 -07:00
kshitijk4poor
9d2571c86a fix: surface /agents nudge while delegate_task is in-flight (TUI + CLI)
The subagent spawn-observability overlay added a `(/agents)` hint, but
only on the standalone "Spawn tree" panel, gated behind `!inlineDelegateKey`
— it never showed for a single delegate_task call, and only appeared once
subagents had already registered. A nudge that arrives at the end (or only
after spawn) is useless for the actual goal: letting users open the live
monitor *while* delegation is running.

Surface it the moment delegation starts, on both surfaces:

TUI (ui-tui/src/components/thinking.tsx)
- Show `(/agents)` on any "Delegate Task" tool group as soon as it appears
  (in-flight, before any subagent registers), not gated on subagents
  already existing. Same `startsWith('Delegate Task')` predicate already
  used for delegateGroups.

CLI (agent/tool_executor.py)
- Append `· /agents to monitor` to the delegate spinner label, which is
  displayed for the full duration of the delegate_task call. The previous
  attempt put the hint on the completion line (get_cute_tool_message),
  which only renders after the call finishes — reverted.

TUI tsc clean (pre-existing execFileNoThrow type errors unrelated);
subagentTree 35/35; display.py reverted to upstream.
2026-05-30 13:22:45 +05:30
Teknium
bb79bcde61 fix: detect pyproject.toml / __init__.py version drift in hermes doctor (#35142)
A git conflict resolution (reset --hard or merge) can revert
hermes_cli/__init__.py to a stale __version__ while pyproject.toml stays
current, so 'hermes --version' silently reports the wrong version. Nothing
cross-checked the two files.

Add a version-consistency check to the doctor 'Python Environment' section:
reads the [project] version from pyproject.toml and compares it to
hermes_cli.__version__. Reports OK when they match, fails with a re-sync
hint when they drift, and is a silent no-op for installed wheels where
pyproject.toml isn't present.

Closes #35070
2026-05-30 00:32:05 -07:00
teknium1
e5765e61fa chore(release): map wei.chen.coder@gmail.com -> wenchengxucool 2026-05-30 00:30:55 -07:00
weichengxu
84ee80eb5d feat: set process title to 'hermes' in ps/top/htop
Adds _set_process_title() in hermes_cli/main.py, called first thing in
main(). Tries setproctitle (optional) for a full ps-args rewrite, then
falls back to ctypes prctl(PR_SET_NAME) on Linux / pthread_setname_np on
macOS. No-op on Windows and on any failure. No new dependency: the
setproctitle path is best-effort via ImportError guard.

Fixes #35108
2026-05-30 00:30:55 -07:00
teknium1
17103a1f11 chore: add SeaXen to AUTHOR_MAP for salvaged PR #33278 2026-05-30 00:23:44 -07:00
SeaXen
e8076c1ebe fix(dashboard): allow chat websockets on insecure public bind
Allow non-loopback websocket peers when the dashboard is explicitly exposed with --host 0.0.0.0/:: and --insecure.

This fixes the failure mode where /chat rendered over LAN but /api/ws and /api/events were rejected with HTTP 403, leaving the embedded TUI chat disconnected.

Add regression coverage for the insecure public bind case in the dashboard websocket auth tests.
2026-05-30 00:23:44 -07:00
Max Hsu
636ff636d7 fix(agent): strip schema-foreign keys from max-iterations summary request (#34436)
The max-iterations summary path (`handle_max_iterations`) hand-builds its
message list and calls `chat.completions.create()` directly, bypassing
`ChatCompletionsTransport.convert_messages()`. It only popped
("reasoning", "finish_reason", "_thinking_prefill"), so `tool_name` (SQLite
FTS bookkeeping), the `codex_*` reasoning carriers, and other internal
`_`-prefixed scaffolding leaked to the wire.

Strict OpenAI-compatible gateways (Fireworks-backed OpenCode Go, Mistral,
Moonshot/Kimi) reject these with HTTP 400 "Extra inputs are not permitted,
field: 'messages[N].tool_name'", so a long tool-using session that exhausts
the iteration budget fails to summarise instead of returning the result.

Mirror convert_messages() in this path: also drop tool_name,
codex_reasoning_items, codex_message_items, and every `_`-prefixed key.
Copy-on-write is already in place, so internal history keeps the fields for
FTS / Codex-fallback.

Adds a regression test to TestHandleMaxIterations asserting the summary
request carries none of the schema-foreign keys (fails on main, passes here).
2026-05-30 00:22:53 -07:00
Teknium
c1b2d0917f fix(cli): don't treat any container as the Docker image for updates (#35139)
detect_install_method() returned "docker" for any container (is_container()),
before the .git check. Both supported installs already self-identify via the
.install_method stamp read first: the curl installer (scripts/install.sh)
git-clones and stamps "git"; the published nousresearch/hermes-agent image
stamps "docker" at boot via docker/stage2-hook.sh. An unsupported manual
install dropped into a container has no stamp, so the bare container check
hijacked it to "docker" and 'hermes update' bailed with the docker-pull
guidance.

Drop the redundant is_container() -> docker fallback. Unstamped installs now
fall through to the .git/pip checks like any off-path install; both supported
paths are unaffected because the stamp wins first.

Fixes #34397.
2026-05-30 00:22:46 -07:00
kshitij
8738cb92c3 Merge pull request #34704 from kshitijk4poor/feat/tui-agents-nudge
feat(tui): nudge toward /agents dashboard when delegation starts
2026-05-30 00:01:59 -07:00
kshitijk4poor
5a72e82fd8 feat(tui): nudge toward /agents dashboard when delegation starts
The TUI already ships a rich /agents spawn-tree dashboard (live tree,
timeline, per-child tokens/cost/files/tools, kill/pause), but nothing
surfaced it — during delegation the transcript stayed quiet and users
had to already know to type /agents.

Drop a one-time transient activity hint ("subagents working · /agents
to watch live") the first time a turn starts delegating, matching the
existing "· /logs to inspect" house style. Guards keep it unobtrusive:

- fires at most once per turn (resets on message.start)
- silent when the /agents overlay is already open
- gated by display.tui_agents_nudge (default true)

Hooked on subagent.start, not subagent.spawn_requested: the delegate
progress callback in tools/delegate_tool.py only relays start/complete
to the gateway and drops spawn_requested, so start is the first
delegation event the TUI reliably receives. spawn_requested is wired
too for the future case, guarded once-per-turn.

Adds the display.tui_agents_nudge config default and gatewayTypes entry.
2026-05-30 12:26:36 +05:30
kshitijk4poor
7b0915037c test: remove low-value model-catalog mirror tests
These tests asserted that hardcoded curated model lists/constants still
contained specific model strings (e.g. 'glm-5' in provider_model_ids('zai'),
exact context-length values per model key, PROVIDER_TO_MODELS_DEV entries).
They mirror a constant rather than exercise logic, so they only ever break
when models are added/retired and never catch a real bug.

Removed 22 such functions across 7 files (149 deletions, 0 additions).
Behavioral siblings are kept: live-catalog-wins, fallback ordering,
substring/longest-match resolution, normalization, credential discovery,
and probe-tier stepping all still tested.
2026-05-29 23:45:05 -07:00
Teknium
0437137fff security: pin patched Starlette (>=1.0.1) for CVE-2026-48710 BadHost (#35118)
Starlette < 1.0.1 is affected by CVE-2026-48710 ("BadHost", CWE-444).
The HTTP Host header was not validated before being used to rebuild
`request.url`, so a malformed Host could make `request.url.path` desync
from the raw ASGI path the router actually dispatched. Middleware and
endpoints that apply path-based authorization off `request.url` (rather
than `scope["path"]`) can therefore be bypassed.

Hermes pulls Starlette transitively, never directly:
  - [web]          -> fastapi==0.133.1  (starlette>=0.40.0, no upper bound)
  - [mcp]          -> mcp==1.26.0 + sse-starlette (starlette>=0.27 / >=0.49.1)
  - [computer-use] -> mcp==1.26.0
  - [dev]          -> mcp==1.26.0

A fresh resolve landed starlette 0.52.1 — vulnerable. With no upper
bound on the transitive specs, pip/uv could resolve any pre-1.0.1
release on a fresh install.

Fix: pin starlette==1.0.1 directly in every extra that exposes a
Starlette-backed server surface, regenerate uv.lock (only starlette
moves: 0.52.1 -> 1.0.1, hash-verified), and mirror the pin in the
lazy-install map (tools/lazy_deps.py `tool.dashboard`) so `hermes`
on-demand dashboard installs can't re-resolve a vulnerable version.

1.0.1 is the advisory's named fix floor and the oldest patched release
(more bake time than 1.1.0/1.2.0, which are days old); it satisfies
every carrier constraint and our requires-python>=3.11.

Scope note: this is a dependency-level fix complementing the
application-layer Host-header validator added in #34162
(`hermes_cli/web_server.py` `_is_accepted_host`). Defense in depth at
both the framework and app layers.

Guards: two invariant tests in tests/test_packaging_metadata.py assert
every server-surface extra pins starlette and that pyproject + uv.lock
both resolve >= the 1.0.1 CVE floor — a dropped pin or stale lock fails
in CI instead of shipping the bypass.

Closes #35067
2026-05-29 23:23:54 -07:00
Erosika
827ce602db fix(honcho): harden self-hosted setup paths
Self-hosted Honcho setup had four sharp edges:

- local/cloud URLs ending in /vN double-prefixed by the SDK (/v3/v3/... 404)
- authenticated local servers had no setup prompt for a JWT/bearer token
- profile-derived host keys could be dot-containing workspace IDs Honcho rejects
- memory-provider config files with API keys written world-readable per umask

This keeps existing behavior but makes those paths safer:

- strip a trailing /vN version segment from any configured baseUrl before SDK
  init (the SDK's route builders always prepend their own version prefix);
  auth-skipping stays loopback-only
- add an optional local JWT/bearer prompt in honcho setup, stored under
  hosts.<host>.apiKey
- derive new profile host keys with underscores, still reading legacy
  hermes.<profile> blocks
- write memory-provider config files atomically with 0600 via a shared
  utils.atomic_json_write(mode=) arg (honcho/hindsight/mem0/supermemory)
- skip honcho.json parsing in gateway cache-busting unless Honcho is the active
  memory provider; memoize by honcho.json mtime when active
- bust the gateway agent cache on memory.provider change
- add a hermes memory setup <provider> one-liner so fresh installs can configure
  a named provider without the picker (the per-provider hermes <provider>
  subcommand only registers once that provider is active)

Closes #20688, #29885, #26459, #30246, #33382, #32244.

Co-authored-by: BROCCOLO1D
2026-05-29 22:29:48 -07:00
Siddharth Balyan
aa32edcac5 fix(setup): write config for image_gen and video_gen in apply_nous_managed_defaults (#35109)
apply_nous_managed_defaults() was adding image_gen and video_gen to the
'changed' return set without writing any config values.  The caller
(tools_command first_install flow) uses 'changed' to skip manual
configuration, so these tools ended up in platform_toolsets but with no
video_gen.provider, video_gen.use_gateway, or image_gen.use_gateway in
config.yaml.

At runtime the FAL plugin's is_available() returned False because there
was no FAL_KEY and no use_gateway config — the tool never loaded despite
being 'enabled' in the toolset list.

For image_gen this was a latent bug masked by the gateway offer prompt
(prompt_enable_tool_gateway) running earlier in the setup flow and
writing image_gen.use_gateway=True via apply_gateway_defaults().  But if
the user skipped the gateway offer, image_gen would silently break the
same way.

For video_gen (added in PR #33259) the bug was always hit because the
gateway offer ran before the user checked video_gen in the toolset
checklist.

Fix: write provider/use_gateway config values before adding to 'changed',
matching the pattern used by web, tts, and browser.
2026-05-30 03:45:12 +00:00
teknium1
a7421dc7d2 fix(session): point no-FTS5 warning at the supported install
When FTS5 is missing the warning now explains the likely cause (an
unsupported / pip-managed Python whose bundled SQLite lacks FTS5) and
links the supported install at hermes-agent.nousresearch.com, instead
of just logging the raw error.
2026-05-29 20:11:07 -07:00
teknium1
4fa20f9a8b fix(install): ensure the uv-managed Python ships SQLite FTS5
uv's python-build-standalone distributions only gained FTS5 in mid-2025
(#694). A stale interpreter already in uv's store — which `uv python find`
reuses without checking — can lack it, leaving the supported install with
a SQLite that can't create the FTS5 virtual tables hermes_state.py needs
for full-text session search ("no such module: fts5").

check_python now probes the resolved interpreter for FTS5 and, if missing,
reinstalls the latest patch for $PYTHON_VERSION (which has FTS5) and
re-resolves. If an FTS5-capable Python still can't be obtained (offline,
pinned env), it warns and continues — Hermes degrades gracefully and only
disables session search. No bundled second SQLite, no user action.
2026-05-29 20:11:07 -07:00
teknium1
97ecfa0fc4 fix(session): extend no-FTS5 degradation to the trigram CJK index
The salvaged contributor commit guarded only messages_fts. Current main
also creates a second virtual table, messages_fts_trigram (CJK substring
search), whose CREATE VIRTUAL TABLE ... USING fts5 still raised
"no such module: fts5" on builds without FTS5 — re-crashing SessionDB
init. Wrap the trigram setup with the same guard, and broaden the test's
no-fts5 mock to fail BOTH tables so the regression test actually
exercises a faithful no-FTS5 build.
2026-05-29 20:11:07 -07:00
LeonSGP43
5ad2b4c6da fix(session): degrade gracefully when SQLite lacks FTS5 2026-05-29 20:11:07 -07:00
Teknium
860cf28dab docs: clarify compression threshold is derived from the main model's context window (#35099)
The compression threshold is threshold × context_length where context_length
is the MAIN agent model's window, not the auxiliary/summary model's. On a
262,144-token model at the default 0.50 the threshold is 131,072 — close to a
common 128K figure by coincidence of the percentage, which has led to confusion
that the auxiliary model's context limit is the trigger. Add a note preempting
that misreading and pointing to the separate summary-model-context constraint.
2026-05-29 19:59:04 -07:00
teknium1
fb0ab27649 fix(agent): register explainer config key + shorten footer prefix
Follow-up to the salvaged #34452 turn-completion explainer:
- Register display.turn_completion_explainer: True in DEFAULT_CONFIG so the
  setting is discoverable, matching the file_mutation_verifier precedent.
- Shorten the repeated footer prefix from 'Turn ended without a usable
  reply: ' to 'No reply: ' so the 10 reason variants don't all open with
  the same 8-word boilerplate.
- Update the 7 assertions that referenced the old prefix.
2026-05-29 19:23:05 -07:00
Bartok9
de6d6023d7 test(run_agent): align test_dict_tool_call_args with explainer suffix
PR #34470 adds an explainer suffix to abnormal turn endings (e.g.
max_iterations_reached) so users see why the response is short instead
of receiving a bare/blank reply. test_tool_call_validation_accepts_dict_arguments
runs the agent at max_iterations=3 which hits the explainer path; the
existing strict-equality assertion (== "done") no longer matches once
the suffix is appended.

Switch the assertion to .startswith("done") so the test continues to
verify that the models actual text survives intact while leaving the
explainer suffix wording owned by conversation_loop (where it belongs).

Test now passes (1 passed in 0.88s).
2026-05-29 19:23:05 -07:00
Bartok9
59b0ea98c8 fix(agent): explain abnormal turn endings instead of blank/partial reply
When a turn ends abnormally after substantive tool calls (empty content
after retries, a partial/truncated stream, exhausted retries, or an
iteration/budget limit), the CLI/TUI response area was left blank or
showed only a fragment (e.g. "The") with no consolidated reason. The
internal turn_exit_reason values (empty_response_exhausted,
partial_stream_recovery, etc.) were never surfaced to the user.

Add a turn-completion explainer that mirrors the existing file-mutation
verifier footer: at turn end, map an abnormal turn_exit_reason to a
short, actionable message and either replace the bare "(empty)"
sentinel or append the reason after a partial fragment. Normal
text_response exits (e.g. a terse "Done.") stay quiet.

Gated by display.turn_completion_explainer (default on) with
HERMES_TURN_COMPLETION_EXPLAINER env override, matching the
file-mutation verifier seam.

Closes #34452
2026-05-29 19:23:05 -07:00
Teknium
897f9533ed fix: keep CLI context display in sync with preflight token estimate (#35079)
* Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here'

Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.

Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20

- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
  guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
  compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
  user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
  changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.

Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).

* fix: keep CLI context display in sync with preflight token estimate

The status bar reads compressor.last_prompt_tokens, which only updates
from a successful API response. When loaded history is oversized but
compression no-ops (e.g. the auxiliary summary model times out), no fresh
usage arrives and the bar stays frozen at the old, smaller value while the
preflight estimate reports a much larger number — looking permanently out
of sync (reported: 74.4K display vs ~144,669 preflight).

Seed last_prompt_tokens with the fresh preflight estimate (upward-only, so
a real usage figure is never clobbered and a successful compression's
downward correction still wins). Display-only; no behavioral change to
compression, caching, or the agent loop.
2026-05-29 19:21:15 -07:00
teknium
9d4c81130a fix(gateway): name what the /status token number actually is
Sharpen the label from 'Session usage (cumulative)' to 'Cumulative API
tokens (re-sent each call)'. The number is real provider-reported usage
summed across every API call in the session — not context size. In an
agentic loop the same context is re-sent each iteration, so a one-hour
tool-heavy session legitimately reaches tens of millions of tokens. The
new label explains the magnitude so users stop reading it as a bug or as
a total across all sessions.
2026-05-29 19:14:37 -07:00
helix4u
2259c15e4d fix(gateway): clarify status session usage label 2026-05-29 19:14:37 -07:00
Bartok9
45bc65abbe fix(gateway): drop outbound silence-narration messages pre-send
Hallucinated 'silence' tokens (*(silent)*, _silent_, the bare '.', '...',
'silent', no response/reply, the mute emoji) are emitted when a persona has
nothing actionable to say. In bot-to-bot channels the receiving bot mirrors
the token back, creating a tight loop that burns API tokens and can crash a
model with 'no content after all retries'. SOUL.md/prompt rules drift across
providers and have already failed in practice, so add a substrate-level guard.

_deliver_to_platform now drops a message whose finalized content is only a
silence-narration token, logs a WARNING with platform/chat_id/truncated
content, and returns {success: True, filtered: 'silence_narration',
delivered: False} instead of calling the adapter. Single chokepoint covers
every platform adapter; the regex is anchored start/end with a 64-char guard
so prose like 'Silence is golden — here is the plan...' or 'Silent install
completed' is never dropped. Local/file delivery is a separate path and is
left untouched. Opt out via gateway.filter_silence_narration: false or the
HERMES_FILTER_SILENCE_NARRATION env override (env wins when set).

Closes #34616
2026-05-29 19:06:05 -07:00
teknium1
9dbc3722ae test(compression): fix StopIteration in large-rough-growth preflight test
The rough-estimate mock supplied only 2 side_effect values but the
conversation loop calls estimate_request_tokens_rough a third time for
the post-response real-token estimate, exhausting the iterator. Use a
callable side_effect that returns 125k once (to fire preflight) then
sub-threshold values, independent of call count.
2026-05-29 19:05:03 -07:00
helix4u
e38b0b55d1 fix(compression): avoid repeat preflight compaction from rough estimates 2026-05-29 19:05:03 -07:00
Teknium
04de307d62 fix(cli): repaint input area after inline /steer and /model submit (#34839)
handle_enter dispatches /steer and /model inline on the UI thread while
the agent is running, calling buffer.reset() then returning. Unlike every
other early-return branch in the handler, these two skipped
event.app.invalidate(). process_command() prints through patch_stdout
(scrolls output above the prompt without redrawing the input line), so the
just-cleared input area could keep showing the submitted '/steer <text>'
until an unrelated redraw fired — looking unsent and inviting an accidental
re-submit.

Add event.app.invalidate() after reset in both inline branches to match
the sibling branches. AST regression test pins the invariant: every
reset-then-return branch in handle_enter must invalidate first.

Fixes #34569
2026-05-29 19:04:40 -07:00
Teknium
bcc8301000 Inspired by Claude Code: /compress here [N] — boundary-aware 'summarize up to here' (#35048)
Adds a user-chosen compression boundary to the existing /compress command.
/compress here [N] summarizes everything except the most recent N exchanges
(default 2), which are preserved verbatim — letting the user pick the
compression boundary instead of relying on the automatic token-budget heuristic.

Inspired by Claude Code's Rewind 'Summarize up to here' action (v2.1.139,
Week 20, May 2026): https://code.claude.com/docs/en/whats-new/2026-w20

- hermes_cli/partial_compress.py: pure split/parse helpers + seam-alternation
  guard (shared by CLI and gateway).
- cli.py / gateway/run.py: route 'here [N]' / '--keep N' to partial compression;
  compress only the head, re-append the verbatim tail through the seam guard.
- Preserves message-flow role alternation (seam guard merges any illegal
  user->user / assistant->assistant adjacency).
- Reuses the existing _compress_context session-rotation/lock machinery — no
  changes to the compression core.
- Bare /compress (full) and /compress <focus> behavior unchanged.

Tests: 12 helper unit tests + 5 CLI integration tests + E2E (interleaved
tool-call transcript, degenerate/multimodal seams, real handler path).
2026-05-29 17:49:15 -07:00
Bartok9
54aa4db1de fix(cli): remove Hermes-managed node/npm/npx symlinks on uninstall
The POSIX installer drops node/npm/npx symlinks in ~/.local/bin pointing
into $HERMES_HOME/node and prepends ~/.local/bin to PATH, shadowing an
existing nvm. Uninstall removed the hermes wrapper but left these behind,
so the user's default node/npm/npx stayed redirected after uninstall.

Add remove_node_symlinks() and call it from run_uninstall. It removes
~/.local/bin/{node,npm,npx} only when each is a symlink resolving into the
current Hermes home's node dir, so a link the user repointed at nvm or a
real binary is never touched. Handles dangling links too.

Closes #34536
2026-05-29 17:24:38 -07:00
Teknium
2062a84000 fix(auxiliary): stop capping output with max_tokens by default (#34530) (#34845)
* fix(auxiliary): stop capping output with max_tokens by default

Auxiliary LLM calls (compression, titles, vision, etc.) no longer send
max_tokens on the OpenAI-compatible chat-completions path. Most providers
treat an omitted max_tokens as "use the model max", which is what we want;
an explicit cap only risks truncation or a wire-format 400.

This was surfaced by GitHub Copilot / GPT-5 (#34530): those models reject
max_tokens and require max_completion_tokens, so compression 400'd and fell
back to a static context marker. Omitting the param sidesteps that quirk
(and ZAI vision's error 1210) entirely.

The Anthropic Messages wire (MiniMax + /anthropic endpoints) keeps
max_tokens because it is a mandatory field there.

* test(auxiliary): update temperature-retry assertions for omitted max_tokens

The temperature-retry tests asserted retry_kwargs["max_tokens"] == 500 on an
api.openai.com endpoint. Now that auxiliary calls omit max_tokens on
OpenAI-compatible endpoints (#34530), that key is absent. Assert it's absent
in both first and retry kwargs and use model as the survives-the-retry witness.
2026-05-29 17:24:30 -07:00
Teknium
f9daa4a41d fix(deps): declare setuptools in dev extra for packaging tests (#34851)
* fix(deps): declare setuptools in dev extra for packaging tests

tests/test_packaging_metadata.py imports `from setuptools import
find_packages` at module scope to validate package discovery against
the live tree. setuptools was being picked up ambiently from the CI
runner image, but recent ubuntu-latest images no longer ship it in the
test venv, so collection fails with ModuleNotFoundError on every PR.

Declare setuptools==82.0.1 in the dev optional-dependencies so `.[all,dev]`
installs it explicitly rather than relying on the runner environment.

* test(packaging): skip packaging-metadata tests when setuptools absent

Belt-and-suspenders alongside declaring setuptools in [dev]: guard the
module-level `from setuptools import find_packages` with
pytest.importorskip so a runner missing setuptools SKIPS these checks
instead of erroring out collection for the entire test shard.

* chore(deps): sync uv.lock for setuptools dev dependency
2026-05-29 17:24:23 -07:00
Teknium
689ef5e233 feat(cli): warn on unsupported pip installs + fix stale update-check cache (#34491) (#34846)
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround

The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.

Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.

* feat(cli): warn on unsupported pip installs + fix stale update-check cache after pip upgrade

Banner now shows a yellow warning when detect_install_method() == 'pip':
'pip install hermes-agent' isn't the supported install path (it exists on
PyPI for internal/CI reasons), so updates and issue support don't behave
correctly. Reuses existing install-method detection; warn, never block.

Also fixes #34491: check_for_updates() keyed its 6h cache only on ts+rev.
On the pip path (no HERMES_REVISION), rev is always None, so a
'pip install --upgrade' changed VERSION but left the cache valid — the
stale 'N commits behind' count survived the upgrade. Cache now also keys
on the installed VERSION and invalidates on mismatch.
2026-05-29 13:30:28 -07:00
teknium1
bb50825716 chore(release): map annguyenNous to AUTHOR_MAP
Clears the check-attribution CI gate on PR #34468 — the contributor's
noreply email was unmapped.
2026-05-29 13:29:34 -07:00
annguyenNous
9f5afc7636 fix(mcp): widen isinstance check to BaseException for CancelledError
asyncio.gather(return_exceptions=True) captures CancelledError as a
BaseException value. The previous isinstance(result, Exception) check
missed CancelledError, silently dropping it without logging.

Since Python 3.9, CancelledError is a BaseException subclass (not
Exception). This one-line change ensures all failure types from MCP
server connections are properly logged.

Fixes NousResearch/hermes-agent#34443
2026-05-29 13:29:34 -07:00
teknium1
4fd8521e44 test(tui-gateway): isolate completion_queue in poller requeue test
test_notification_poller_requeues_when_busy drained and reused the
process-global process_registry.completion_queue, so a concurrent test
in the same xdist worker could put/get on the shared singleton mid-run
and empty the event the poller requeues — flaking 'assert not
completion_queue.empty()' under parallel CI load only.

Monkeypatch a fresh Queue onto the singleton for the test's duration so
nothing external can interleave. The poller reads completion_queue by
attribute at runtime, so the isolated queue is what it operates on.
monkeypatch restores the original on teardown. Verified immune: 50/50
passes under a background thread hammering the global queue.
2026-05-29 13:29:24 -07:00
Bartok9
edfdc77664 fix(cli): resume the selected chat when a bare number follows /resume
A bare `/resume` printed the recent-sessions list but armed no selection
state, so typing just `3` on the next line was sent to the agent as chat
instead of resuming session #3. `/resume 3` worked, but the natural
list-then-pick flow did not.

Arm a one-shot pending-resume prompt when bare `/resume` shows the list,
and consume the next bare numeric input as the selection (out-of-range is
reported, non-numeric/other commands disarm it). Resolves against the same
_list_recent_sessions(limit=10) list used everywhere else.

Closes #34584.
2026-05-29 13:29:24 -07:00
Teknium
3a2c03061c fix(stt,tts): restore mistralai — 2.4.8 is clean, ban lifted (#34841)
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround

The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.

Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.

* fix(stt,tts): restore mistralai — 2.4.8 is clean, ban lifted

PyPI quarantined mistralai on 2026-05-12 after the malicious 2.4.6
release (Mini Shai-Hulud worm). 2.4.6 has since been removed from the
registry and clean releases resumed (2.4.7 2026-05-25, 2.4.8 2026-05-28).
This rolls back the blanket runtime ban so Voxtral STT + TTS work again,
following the restoration checklist the repo left in pyproject.toml.

Verified against the real SDK: 2.4.8 keeps the import path the code uses
(from mistralai.client import Mistral) and the audio.transcriptions.complete
/ audio.speech.complete surfaces.

Changes:
- pyproject.toml: re-add mistral extra pinned to mistralai==2.4.8; left
  OUT of [all] per the 2026-05-12 lazy-install policy (one quarantined
  release must not break fresh installs). uv.lock regenerated.
- tools/lazy_deps.py: add stt.mistral / tts.mistral entries so the SDK
  lazy-installs on first use (matches edge / elevenlabs).
- tools/transcription_tools.py: restore explicit-provider gate
  (_HAS_MISTRAL + key) and auto-detect entry (local>groq>openai>mistral>xai);
  _transcribe_mistral lazy-installs before import.
- tools/tts_tool.py: dispatcher routes back to _generate_mistral_tts;
  _import_mistral_client lazy-installs the SDK.
- hermes_cli/tools_config.py, hermes_cli/web_server.py: un-hide Mistral
  from the TTS provider picker and dashboard STT options.
- hermes_cli/security_advisories.py: KEEP the shai-hulud-2026-05 advisory
  (module policy forbids removal) — it is scoped to 2.4.6 only, so it
  still warns anyone with the poisoned build cached and never fires on
  2.4.8. Summary note updated to reflect the un-quarantine.
- tests: revert the disabled-behavior assertions added by the ban commit
  back to routing/positive expectations; add mistral to the
  lazy-installable-extras-excluded-from-[all] contract.

Reported by @SkYNewZ (#34503).

Validation: 189 targeted STT/TTS/lazy_deps/metadata tests pass; E2E with
the real mistralai 2.4.8 SDK routes both STT and TTS to mistral.
2026-05-29 13:24:12 -07:00
Teknium
781604ce4c fix(gateway): unify MEDIA: extraction extension set + close the unknown-ext black hole (#34517) (#34844)
MEDIA:<path> tags for .md/.json/.yaml/.xml/.html and other document
extensions were silently dropped. extract_media() carried a narrow
extension allowlist that omitted them, while extract_local_files()
had a broad one. The dispatch sites then ran an unconditional
re.sub(r'MEDIA:\\s*\\S+', '') that stripped the tag from the body even
when extract_media had not matched it — so extract_local_files (broad
list) ran on text where the path was already gone, and the file was
delivered by neither path.

- Add MEDIA_DELIVERY_EXTS in gateway/platforms/base.py as the single
  source of truth; extract_media and extract_local_files both derive
  their extension set from it (no more drift).
- Replace the loose MEDIA cleanup at the non-streaming dispatch site
  (base.py) and the streaming consumer (stream_consumer.py) with the
  shared, extension-anchored MEDIA_TAG_CLEANUP_RE. A MEDIA: tag with an
  unknown extension is left in the body so the bare-path detector can
  still pick it up instead of being black-holed.
- Chain cleaned text through extract_media -> extract_images ->
  extract_local_files in run.py's post-stream media delivery (it was
  dropping the cleaned text and rescanning raw text with MEDIA: tags).
- Regression tests covering both halves: previously-dropped extensions
  now extract, and unknown-ext paths survive the cleanup.

Consolidates the MEDIA extension-allowlist PR cluster.

Co-authored-by: Bartok9 <259807879+Bartok9@users.noreply.github.com>
Co-authored-by: banditburai <123342691+banditburai@users.noreply.github.com>
Co-authored-by: Kyzcreig <9063726+Kyzcreig@users.noreply.github.com>
2026-05-29 13:24:01 -07:00
teknium1
0dc0c5ea6b chore: add AUTHOR_MAP entry for sweetcornna
Maps the cherry-picked commit's noreply email to the GitHub login so the
release attribution / CI author check passes.
2026-05-29 13:22:54 -07:00
Bartok9
3845d86b93 fix(cron): restore jobs.json emptied by config migration on update
Config-version migrations have been observed to leave cron/jobs.json
valid-but-empty after `hermes update`, silently dropping every scheduled
job (#34600). The existing malformed-shape guards in cron/jobs.py don't
catch this because {"jobs": []} is valid JSON.

Add restore_cron_jobs_if_emptied() as a post-migration safety net: if the
live cron/jobs.json now has zero jobs while the pre-update snapshot held
one or more, restore the snapshot copy in place and warn loudly. The
check is conservative — it only restores on unambiguous evidence of loss
(snapshot had jobs, live file readable-and-empty), so a user who genuinely
cleared their jobs is never second-guessed and an unreadable live file is
left untouched so real corruption still surfaces.

Wired into _cmd_update_impl after migrate_config(), reusing the existing
pre-update quick snapshot (which already captures cron/jobs.json).

Closes #34600
2026-05-29 13:22:54 -07:00
Cornna
d473e7c938 fix(cron): exclude jobs.json registry from disk-cleanup pattern
Closes #32164
2026-05-29 13:22:54 -07:00
Teknium
91b174038c fix(feishu): bound _chat_locks with LRU eviction (#34836)
The Feishu adapter stored one asyncio.Lock per chat_id in a plain dict
with no upper bound, so a long-running gateway that saw many distinct
chats grew _chat_locks without limit. Port the LRU-eviction pattern
already used by the yuanbao adapter: OrderedDict + move_to_end on access,
CHAT_LOCK_MAX_SIZE cap (1000), and eviction that skips currently-held
locks (falling back to dropping the LRU entry only if all are held).
2026-05-29 13:18:15 -07:00
teknium1
8055d0f092 test(ntfy): cover echo-tag filter; tag standalone send path
Adds tests for the echo-loop fix (outgoing X-Tags header, inbound skip
on tagged events, genuine tags pass through) and extends the tag to the
out-of-process _standalone_send() path so cron / send_message deliveries
to a self-subscribed topic are also skipped. Maps both contributors in
release.py AUTHOR_MAP.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
2026-05-29 13:17:46 -07:00
annguyenNous
9405cdc8dd fix(ntfy): prevent echo loop by tagging outgoing messages
When publish_topic equals the subscribe topic, the agent's own replies
are echoed back by ntfy as incoming messages, creating an infinite
reply spiral.

Fix: tag outgoing messages with X-Tags: hermes-agent header, and skip
incoming messages that carry this tag. This is zero-config — works
automatically regardless of topic configuration.

Fixes NousResearch/hermes-agent#34447
2026-05-29 13:17:46 -07:00
Bartok9
08c0b22417 fix(gateway): scope tool-result MEDIA scan to current turn
The post-run scan that appends tool-emitted MEDIA: tags to the final
response iterated every tool/function message in the full conversation
and relied solely on path-based dedup against paths reconstructed from
the replayable transcript. When that reconstruction does not byte-match
the in-memory tool content (timestamp stripping, observed-context
withholding, compression rewrites), a stale path emitted several turns
earlier is absent from the dedup set and leaks onto a later text-only
reply (Telegram 'Sending media group of 1 photo(s)' with no MEDIA
directive present).

Scope the scan to this turn's new messages by slicing result['messages']
at len(agent_history) (agent_history is passed as conversation_history
into run_conversation, so the returned list is history + this turn).
Retain path-based dedup as a secondary guard and as the sole guard on
the compression-shrink fallback, preserving the #160 behaviour.

Closes #34608
2026-05-29 13:13:34 -07:00
teknium1
38c4f8c371 test(gateway): update system-unit cwd assertion to HERMES_HOME anchor
test_system_unit_has_no_root_paths asserted the system unit's
WorkingDirectory was the remapped *checkout* path
(/home/alice/.hermes/hermes-agent). That is the brittle pin this PR
fixes — the system unit now anchors cwd at the target user's HERMES_HOME
(/home/alice/.hermes). The test's intent (no root-home leak, target-user
paths present) is unchanged and still holds.
2026-05-29 12:36:59 -07:00
teknium1
a1cb5fa2c7 fix(gateway): anchor service WorkingDirectory at HERMES_HOME, not the source checkout
The systemd unit (and launchd plist) pinned WorkingDirectory to PROJECT_ROOT
(the checkout the unit was generated from). When that checkout is transient —
a git worktree, or a clone hermes update later relocates/removes — the path
rots. systemd then fails the start at the CHDIR step (status=200/CHDIR) BEFORE
Python loads, so the on-boot refresh_systemd_unit_if_needed() self-heal never
runs and Restart=always crash-loops forever on a dead directory. Observed in
the wild: a gateway that crash-looped 153 times overnight, bot offline until a
manual 'hermes gateway restart' regenerated the unit.

Anchor cwd at HERMES_HOME instead — it never moves, always exists, and the
gateway never needed cwd to be the checkout (ExecStart uses an absolute python
+ -m hermes_cli.main). Existing broken units now differ from the generated unit
and self-heal on the next start/restart/update.
2026-05-29 12:36:59 -07:00
Teknium
45b00bb49a fix(packaging): ship hermes_cli subpackages in wheel (#34811)
[tool.setuptools.packages.find] listed 'hermes_cli' without the
'hermes_cli.*' wildcard, so the wheel shipped hermes_cli/*.py but
dropped the dashboard_auth and proxy subpackages. The dashboard died
on every install with ModuleNotFoundError: No module named
'hermes_cli.dashboard_auth' (#34701); 'hermes proxy' was equally
broken.

Add the wildcard, and add a regression test that drives setuptools'
own find_packages against the live tree so any future subpackage
dropped from the include list fails CI instead of a user's container.
2026-05-29 12:36:09 -07:00
teknium1
8836b3a113 fix(cli): widen Windows .bat wrapper fix to custom-name alias path
The profile alias --name path in main.py rewrote the wrapper with a
hardcoded #!/bin/sh script right after create_wrapper_script(), clobbering
the .bat on Windows and reintroducing the exact bug for custom aliases.

create_wrapper_script() now takes an optional target so the alias file is
named after the alias while the -p content references the profile — one
platform-aware code path, no post-hoc rewrite.
2026-05-29 12:32:47 -07:00
liuhao1024
6312dd8c3a fix(cli): create .bat wrapper on Windows instead of POSIX shell script
On Windows, hermes profile create produced a #!/bin/sh script that the
shell cannot execute.  Now creates a .bat file with @echo off + %* on
Windows, and keeps the POSIX shell script on macOS/Linux.

Also fixes check_alias_collision to use 'where' instead of 'which' on
Windows, and remove_wrapper_script to find .bat files.

Fixes #34708
2026-05-29 12:32:47 -07:00
zapabob
30a0d5bc9e chore(release): map zapabob author email 2026-05-29 12:32:35 -07:00
zapabob
aa283d1e4f fix(model): isolate custom provider picker credentials 2026-05-29 12:32:35 -07:00
Teknium
2fc2280e63 fix(cli): clarify panel clips choices off-screen on short terminals (#34808)
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround

The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.

Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.

* fix(cli): clarify panel clips choices off-screen on short terminals

The clarify multiple-choice panel is a height-less Window inside a
non-full-screen HSplit. When its content exceeds the viewport,
prompt_toolkit distributes height per child and clips the panel's tail
— where the choices live — so options render invisible/cut off (issue
#34645, reported on macOS Terminal.app).

Two budget-accounting bugs let the panel overflow:
- the compact-chrome decision ignored the question rows, so full chrome
  (3 blank separators) was kept even with no room
- the '… (question truncated)' marker was not counted against the
  question's row budget, overshooting by one row at a 1-row budget

Fix: reserve one question row in the compact decision, count the
truncation marker against the budget, and drop the question entirely
when the choices alone already exceed the viewport (choices are the
must-see content for a selection).
2026-05-29 12:32:31 -07:00
Teknium
27a2c4f36f fix(mcp): stop reporting false OAuth success when no token was obtained (#34807)
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround

The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.

Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.

* fix(mcp): stop reporting false OAuth success when no token was obtained

`hermes mcp login` reported "Authenticated — N tool(s) available" for
servers that serve tools/list without auth (e.g. Google's official Drive
MCP server) even when the OAuth flow never completed — dynamic client
registration 400'd because the provider doesn't support RFC 7591, so no
token was ever acquired. Every real tool call then hung until timeout
with no indication of why.

Login now verifies a token actually landed on disk after the probe. When
it didn't, it warns that authentication didn't complete and shows the
config needed to supply a pre-registered client_id/client_secret (the
existing, already-supported workaround for DCR-less providers).

Adds a docs pitfall for Google Drive / Atlassian-style providers.

Fixes #34775
2026-05-29 12:32:19 -07:00
Teknium
1cb850b674 fix(api_server): emit per-turn transcript on run.completed (#34703) (#34804)
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround

The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.

Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.

* fix(api_server): emit per-turn transcript on run.completed (#34703)

WebUI clients lost intermediate (pre-tool-call) assistant text after
switching session pages mid-stream. The session-chat SSE stream delivers
all assistant text as assistant.delta events under one message_id
interleaved with tool.* events, then a single assistant.completed
carrying only the final reply — so a client accumulating deltas into one
buffer cannot reconstruct intermediate text segments that preceded tool
calls, and they vanish from the live view (state.db persists them
correctly).

run.completed now carries the authoritative per-turn transcript
(assistant + tool messages for this turn, in client-safe shape) so any
SSE consumer can reconcile its live view against ground truth without a
separate GET /messages round-trip. Purely additive — clients that ignore
the field are unaffected.
2026-05-29 12:27:49 -07:00
Teknium
b6ed3913d2 feat(skills): categorize tap skills from skills.sh.json grouping sidecar
A GitHub tap can ship a repo-root skills.sh.json (the published skills.sh
schema) declaring category groupings. The Skills Hub now reads it at index
time and uses each grouping title as the skill's category label, instead of
the tag-derived guess. Generic: any tap that ships the file gets real
categorization — NVIDIA's groupings (Inference AI, Decision Optimization,
GPU Development, etc.) flow through automatically.

- GitHubSource: _get_skillsh_groupings() fetches+caches the sidecar per repo;
  _parse_skillsh_groupings() flattens it to {skill_name: title};
  _list_skills_in_repo() stamps meta.extra['category']; _meta_to_dict now
  serializes extra so the category survives the index cache round-trip.
- extract-skills.py: prefers extra['category'] over the tag heuristic and
  exempts sidecar categories from the small-category to Other collapse.
- Docs + 12 tests.
2026-05-29 12:24:39 -07:00
Teknium
4de8009ce4 feat(skills): integrate NVIDIA/skills as a trusted skills hub tap
NVIDIA/skills is now a default trusted tap in the Hermes Skills Hub —
discoverable, browsable, searchable, and auto-updating through the same
pipeline that already serves OpenAI, Anthropic, and HuggingFace skills.

Rebased onto current main.
2026-05-29 12:24:39 -07:00
Teknium
1596bb287e fix(dashboard): chat tab works in gated (OAuth) mode (#34793)
The Chat/TUI dashboard tab showed a false "Session token unavailable"
error and never rendered the terminal whenever the dashboard ran in
gated mode (OAuth auth gate active, --insecure not set), even though
the user was fully authenticated and every other tab worked.

Two checks in ChatPage.tsx gated purely on window.__HERMES_SESSION_TOKEN__,
which the server intentionally omits in gated mode (web_server.py only
injects __HERMES_AUTH_REQUIRED__=true there; the SPA is expected to use
cookie auth + a single-use WS ticket). buildWsAuthParam() already resolves
WS auth correctly for both modes, but the early bail prevented the effect
from ever reaching it.

Both checks now also honor __HERMES_AUTH_REQUIRED__: the banner no longer
fires and the xterm/WS effect no longer bails in gated mode.

Reported-by: wbrione <wbrione@users.noreply.github.com>
Closes #34755
2026-05-29 12:19:51 -07:00
Teknium
90b3c54de9 fix: drain thread no longer crashes on fd-less stdout streams (#34789)
* docs(code-execution): document HERMES_* env narrowing + passthrough workaround

The execute_code sandbox-child env scrub (108397726, #27303) deliberately
dropped the broad HERMES_ prefix passthrough, keeping only an operational
4-var allowlist (HERMES_HOME/PROFILE/CONFIG/ENV). A script that relied on a
non-secret HERMES_* var (HERMES_BASE_URL, HERMES_KANBAN_DB, HERMES_*_WEBHOOK,
or a plugin-defined one) now sees it unset in the child.

Document the behavior change and the two recovery routes (terminal.env_passthrough
in config.yaml, or required_environment_variables in skill frontmatter), plus
the debug log line that surfaces the drop for diagnosis.

* fix: drain thread no longer crashes on fd-less stdout streams

The _wait_for_process drain thread called proc.stdout.fileno()
unconditionally. ProcessHandle implementations whose stdout is not
backed by a real OS fd (iterator-style in-memory streams, mock procs)
raised 'list_iterator' object has no attribute 'fileno' (or
'fileno() returned a non-integer' from select.select), killing the
daemon thread and silently losing all process output.

Resolve the fd defensively at the top of _drain; when stdout has no
usable integer fileno, fall back to draining it as an iterable (the
legacy 'for line in proc.stdout' contract). The real subprocess /
os.pipe-backed select() fast path is unchanged.
2026-05-29 12:16:57 -07:00
teknium1
5641ae6469 chore(release): add AUTHOR_MAP entries for Bucket-1 docs salvage contributors 2026-05-29 12:06:22 -07:00
Twanislas
549a69a925 docs(curator): align 'agent-created' definition with actual provenance semantics
The curator docs stated that any skill not bundled/hub-installed was
'agent-created' and subject to curation — including foreground-created
skills and hand-written ones. Since PR #19621 (May 2026), the curator
requires an explicit  marker in .usage.json, which
only the background self-improvement review fork sets.

Changes:
- Rewrite 'What agent-created means' to document the 3-step eligibility
  check (not bundled + not hub + created_by=agent marker)
- Explain that foreground skill_manage(create) does NOT mark skills as
  agent-created (user-directed by design)
- Warn that hand-written skills are NOT curated
- Add note in Per-run reports explaining the '(not resolved)' display
  when no candidates exist (LLM pass skipped, not a config error)
- Link to skill_provenance.py for the write-origin ContextVar

Ref: PR #19621, tools/skill_provenance.py, tools/skill_manager_tool.py
2026-05-29 12:06:22 -07:00
Aman113114-IITD
3f0d44af8a docs: replace invalid 'hermes config get <key>' with 'hermes config show'
'hermes config get <key>' is referenced in three guides but is not a
valid subcommand. The valid subcommands under 'hermes config' are
{show,edit,set,path,env-path,check,migrate}. 'hermes config show' is
already used elsewhere in the docs (including 'hermes config show |
grep <pattern>' in the FAQ), so it's the idiomatic replacement.

- work-with-skills.md: 'View all skill config' now uses
  'hermes config show | grep ^skills\.config'
- migrate-from-openclaw.md: session-policy check now reads the value
  from 'hermes config show'
- configuring-models.md: 'inspect what the CLI will actually use'
  now uses 'hermes config show | grep ^model\.'

Refs #30195
2026-05-29 12:06:22 -07:00
HKPA
eff4626747 fix(docs): add baseUrl prefix to SVG image paths in sessions and CLI pages
Fixes #24809

The docs site uses baseUrl='/docs/' but the <img> tags in sessions.md
and cli.md referenced images at /img/docs/... which resolves to a 404.
The static files are served at /docs/img/docs/... instead.

Before: <img src="/img/docs/session-recap.svg"> → 404
After:  <img src="/docs/img/docs/session-recap.svg"> → 200

Also fixes cli-layout.svg which had the same issue.
2026-05-29 12:06:22 -07:00
aqilaziz
175885218e fix(docs): align fallback provider config examples
Use the current top-level fallback_providers list in fallback docs and keep fallback_model documented only as the legacy compatibility shape. Also align cron and delegation fallback coverage with current runtime behavior.

Closes #19691

Co-authored-by: Codex <codex@openai.com>
2026-05-29 12:06:22 -07:00
helix4u
119390a2a1 docs(config): deprecate MESSAGING_CWD guidance 2026-05-29 12:06:22 -07:00
helix4u
3625dbb844 docs(security): update redaction skill source 2026-05-29 12:06:22 -07:00
helix4u
aef04b2b53 docs(security): fix secret redaction default docs 2026-05-29 12:06:22 -07:00
TonyPepe
a2d3cff53f docs(cli): refine update gateway restart wording 2026-05-29 12:06:22 -07:00
TonyPepe
ee0a9bf7c7 docs(cli): align hermes update flags 2026-05-29 12:06:22 -07:00
WadydX
b922e3ff93 docs(prompt): align precedence docs with system prompt runtime
- Replace outdated linear ordering in prompt-assembly guide with
  current stable/context/volatile tier contract from system_prompt.py
- Clarify where memory/profile snapshots live versus skills guidance
- Document that pre_llm_call context is user-message injection, not
  cached system-prompt mutation
- Update architecture guide wording to reference system_prompt.py +
  prompt_builder.py tiered assembly

Closes #34118
2026-05-29 12:06:22 -07:00
Octavio Turra
053969fd53 Correct URL format for simplex-chat download
Fix download link for Linux/macOS binary in documentation.
2026-05-29 12:06:22 -07:00
alelpoan
988cf1743b fix(docs): replace channel link with actual playlist URL in quickstart 2026-05-29 12:06:22 -07:00
kurobaryo
03bdeaa876 docs: fix BROWSERBASE_SESSION_TIMEOUT unit (ms → seconds) 2026-05-29 12:06:22 -07:00
haran2001
d86710528a docs(google-workspace): fix dead gws CLI link to googleworkspace/cli
The Google Workspace skill doc linked to https://github.com/nicholasgasior/gws
which returns 404. The actual upstream CLI lives at
https://github.com/googleworkspace/cli (the official Google Workspace CLI in
Rust, dynamically built from the Google Discovery Service).

Closes #28922
2026-05-29 12:06:22 -07:00
Niels Kaspers
6891e05e78 docs: fix session recap image baseUrl 2026-05-29 12:06:22 -07:00
hllqkb
0673638560 fix(docs): correct GitHub org links in memory-providers.md
hermes-ai/hermes-agent → NousResearch/hermes-agent (2 occurrences).
The old org name leads to 404 pages.
2026-05-29 12:06:22 -07:00
Hashclaw
ae9dfa510e docs: fix separate typo; hyphenate built-in trust wording
- ACL LaTeX template comment: seperate -> separate
- CONTRIBUTING and docs site: builtin trust -> built-in trust (prose/table cells)

Made-with: Cursor
2026-05-29 12:06:22 -07:00
kshitij
7379f17556 fix(gateway): only fire planned-stop watcher for self-targeting markers + fix Windows consume (#34749)
* fix(gateway): only fire planned-stop watcher for markers targeting self

Salvaged from #34599 — rebased onto current main.

The planned-stop watcher now only fires shutdown for a marker that targets
the current process, instead of any marker that exists on disk. Fixes the
Windows crash loop (#34597) where a stale marker from a previous Gateway
instance kills a freshly booted Gateway ~400ms after start with a false
"Received UNKNOWN — initiating shutdown".

Co-authored-by: Bartok9 <danielrpike9@gmail.com>

* fix(gateway): match planned-stop/takeover markers by PID alone when start_time is unavailable

Follow-up to the #34599 salvage. The watcher's non-destructive probe
(planned_stop_marker_targets_self) already falls back to PID equality when
a process start_time is unavailable, but the authoritative consume it gates
(_consume_pid_marker_for_self) still required a non-None start_time match.

_get_process_start_time reads /proc/<pid>/stat and returns None on macOS and
native Windows — the only platform the planned-stop watcher exists for. So on
Windows the probe would fire the shutdown handler (PID matches) but the
handler's consume_planned_stop_marker_for_self() would return False, and a
legitimate 'hermes gateway stop' was still misclassified as an unexpected
UNKNOWN exit (exit 1) and revived by the service manager — a residual half of
the #34597 crash loop on the legitimate-stop path.

Align the consume with the probe: when both start_times are known they must
match (PID-reuse guard preserved on Linux); when either is unavailable, fall
back to PID equality alone, bounded by the existing short marker TTL. This
also fixes the parallel --replace takeover consume on Windows, which shares
the same helper.

Adds regression tests for the Windows (None start_time) path, the foreign-PID
rejection under that fallback, and confirmation the start_time-mismatch guard
still rejects when both are known.

---------

Co-authored-by: Bartok9 <danielrpike9@gmail.com>
2026-05-29 17:36:58 +00:00
alt-glitch
0563ab0652 fix(test): add fal_client.submit stub to surface matrix test
The plugin switched from fal_client.subscribe() to submit()+handle.get().
The test mock only had subscribe, causing CI failures.
2026-05-29 22:26:24 +05:30
alt-glitch
e46e4bcf47 fix(video_gen): parse duration suffix in success_response
int(payload["duration"]) blows up on "4s" (veo3.1 format).
Strip non-digit chars before int conversion in the response builder.
2026-05-29 22:26:24 +05:30
alt-glitch
3183b2e28c fix(video_gen): veo3.1 duration format and 4k resolution
FAL veo3.1 API expects duration as "4s"/"6s"/"8s" (with unit suffix),
not bare "4"/"6"/"8" like other families. Add per-family duration_suffix
field and apply it in _build_payload. Also add "4k" to veo3.1 resolutions
per FAL API docs.

Note: the managed gateway currently rejects the "4s" format (expects
integer duration). Gateway-side fix needed for veo3.1 to work through
the Nous subscription path.
2026-05-29 22:26:24 +05:30
alt-glitch
a4c18f65d4 feat(video_gen): wire Nous subscription override into hermes tools UX
Add the same managed-gateway UX that image_gen already has:

- TOOL_CATEGORIES['video_gen'] gets a 'Nous Subscription' provider row
  with managed_nous_feature='video_gen' + video_gen_plugin_name='fal'
- NousSubscriptionFeatures gains a video_gen property + feature state
  computation (managed/active/available using the fal-queue gateway)
- _GATEWAY_TOOL_LABELS, _GATEWAY_DIRECT_LABELS, _ALL_GATEWAY_KEYS,
  _get_gateway_direct_credentials, opted_in all include video_gen
- apply_nous_managed_defaults and apply_gateway_defaults handle video_gen
- _is_toolset_satisfied checks Nous features for video_gen
- _is_provider_active detects managed video_gen (use_gateway + fal provider)
- _select_plugin_video_gen_provider accepts use_gateway kwarg, propagated
  from all 4 call sites in _configure_provider when managed_feature is set
- hermes setup status shows 'Video Generation (FAL via Nous subscription)'

Users on a Nous subscription can now pick 'Nous Subscription' under
hermes tools → Video Generation, which sets video_gen.provider=fal +
video_gen.use_gateway=true. The FAL plugin's _resolve_managed_fal_video_gateway
then routes through the managed queue gateway — no FAL_KEY needed.
2026-05-29 22:26:24 +05:30
alt-glitch
b6294ea9f1 test(video_gen): cover gateway decision matrix gaps and 4xx error path
- Add test for 4xx ValueError with actionable remediation message
- Add test for is_available() returning True via managed gateway
- Add test for prefers_gateway overriding direct FAL_KEY
- Add test for is_available() via gateway in plugin test file
2026-05-29 22:26:24 +05:30
alt-glitch
d04b3c193e feat(video_gen): route FAL video gen through managed Nous gateway
Wire plugins/video_gen/fal/__init__.py to use the same
_ManagedFalSyncClient pattern that image gen already uses.

Changes:
- Add managed gateway resolution, client caching, and
  _submit_fal_video_request() that routes between direct FAL_KEY
  and Nous gateway modes
- Update is_available() to return True when either FAL_KEY or the
  managed gateway is reachable
- Update generate() to use submit+get handle pattern instead of
  fal_client.subscribe() directly
- Fix happy-horse endpoint namespace: fal-ai/ → alibaba/ (matches
  the tool-gateway allowlist from fal-video-gen branch)
- Surface actionable error on 4xx gateway rejections

Tests:
- 4 new tests in test_managed_media_gateways.py (gateway routing,
  client reuse, direct mode fallback, alibaba namespace)
- Updated existing test_fal_plugin.py fixture to use submit/handle
  pattern and patch _resolve_managed_fal_video_gateway for isolation
2026-05-29 22:26:24 +05:30
kshitijk4poor
5cd0673217 ci: harden supply-chain gate jobs against changes-job failure
The scan-gate / dep-bounds-gate jobs use needs.changes; if the changes
job itself fails, its dependents would be skipped via a failed dependency
(not a conditional skip), leaving the required check unreported — the same
"pending forever" failure this PR fixes. Add always() and switch the gate
condition from == 'false' to != 'true' so the gate still fires (and reports
SUCCESS) when changes fails and its output is empty.
2026-05-29 09:17:01 -07:00
ethernet
6bc309baf2 ci: ensure required checks always report status
Remove paths filters from contributor-check and supply-chain-audit
workflows. When no matching files changed, the workflows never ran and
the required checks (check-attribution, supply chain scan, dep bounds)
stayed "pending" forever, blocking merge.

Now both workflows always trigger. A path-check step/job determines
whether the real work should run; gate jobs with matching names report
success when the real job was skipped, so branch protection always
gets a check status.

Also fixes dep-bounds: the old condition
  if: contains(github.event.pull_request.changed_files_url, 'pyproject.toml') || true
was always true (the || true made it unconditional). Now uses the
proper changes.deps output from the shared filter job.
2026-05-29 09:17:01 -07:00
ethernet
6928692cec Merge pull request #33773 from dvir-pashut/fix/nix-full-drop-stale-vercel-group
fix(nix): drop stale "vercel" group from #full variant
2026-05-29 11:16:25 -04:00
teknium1
75cd420b3b docs(skills): move antigravity-cli to autonomous-ai-agents in catalog + sidebar 2026-05-29 05:21:48 -07:00
teknium1
78d7fa1b5c refactor(skills/antigravity-cli): move to autonomous-ai-agents (it's an AI agent CLI) 2026-05-29 05:21:48 -07:00
teknium1
904c0b479b refactor(state): return FTS index count from vacuum()
Have vacuum() return optimize_fts()'s count so the CLI 'sessions optimize'
summary uses the real merged-index count instead of probing the private
_FTS_TABLES / _fts_table_exists() members.
2026-05-29 05:09:56 -07:00
kshitijk4poor
38695254f8 perf(state): merge FTS5 segments on VACUUM + add 'hermes sessions optimize'
The FTS5 indexes (messages_fts, messages_fts_trigram) grow as a series of
incremental b-tree segments — one per trigger-driven insert batch. SQLite's
automerge caps at ~16 segments, so a long-lived store keeps scanning many
segments per MATCH and never collapses them unless the special 'optimize'
command runs. Nothing in the codebase ever ran it: vacuum() only fired after
a prune that deleted rows, and even then never merged FTS segments.

Changes:
- SessionDB.optimize_fts(): merges each FTS5 index to a single segment,
  probing for the (optional/lazy) trigram table first so it is safe to call
  unconditionally. Layout-only — search results and snippet() are unchanged.
- vacuum() now calls optimize_fts() before VACUUM so freed index pages are
  returned to the OS in the same pass.
- 'hermes sessions optimize' CLI subcommand for on-demand reclamation +
  segment compaction (previously there was no way to compact the store
  without a prune deleting rows), with before/after size reporting.

Benchmark (8000 msgs, fragmented to 8 segments/index):
- segments 8 -> 1 on both indexes
- porter MATCH 5.5x faster (0.449 -> 0.081 ms/q)
- trigram MATCH 3.0x faster (0.632 -> 0.207 ms/q)
- 8000 matches before == 8000 after, identical row ids (no functional change)

Orthogonal to the structural FTS-size PRs (#20239 external-content,
#27770 optional trigram) — segment merge helps regardless of those.

Tests: TestOptimizeFts covers index count, search+snippet preservation,
missing-trigram path, and idempotency. Full test_hermes_state.py green (227).
2026-05-29 05:09:56 -07:00
Teknium
2159d2a729 docs(credential-pools): document immediate rotation on usage-limit 429 (#34580)
The rotation flowchart only described the generic 'retry once, rotate on
second 429' path. ChatGPT/Codex plan-limit 429s carry a usage_limit_reached
reason and rotate to the next pool key immediately (no retry, since the cap
won't clear on retry). Document that case so the docs match the code.
2026-05-29 04:50:14 -07:00
teknium1
0dba60f73b docs(skills): regen catalog + sidebar for optional antigravity-cli skill 2026-05-29 04:49:42 -07:00
teknium1
632a7088a3 chore(skills/antigravity-cli): make optional, frame through Hermes tools, tighten frontmatter 2026-05-29 04:49:42 -07:00
Tony Simons
1bba5f27ab feat(skills): add antigravity-cli operator skill 2026-05-29 04:49:42 -07:00
teknium1
d6f2bdabda docs(skills): regen catalog + sidebar for optional grok skill 2026-05-29 04:49:38 -07:00
teknium1
99ddba94ed chore(skills/grok): make optional + tighten SKILL.md to modern format 2026-05-29 04:49:38 -07:00
Matt Maximo
10cd4138cc feat(skills): add grok skill for xAI Grok Build CLI
Adds a `grok` skill under `skills/autonomous-ai-agents/`, a third coding-agent orchestration guide alongside `codex` and `claude-code`. It teaches Hermes to delegate coding tasks to Grok Build (xAI's `grok` CLI).

- Headless `-p` one-shots (preferred)
- Interactive TUI via pty + tmux
- Session resume, background tasks, structured JSON output
- PR review and parallel worktree patterns
- Auth via SuperGrok / X Premium+ (`grok login`)
- Full pitfalls and config notes
2026-05-29 04:49:38 -07:00
Teknium
5e7c2ffa9f chore(models): gemini-3.5-flash replaces gemini-3-flash-preview in OpenRouter + Nous lists (#34581)
* chore(models): swap gemini-3-flash-preview for gemini-3.5-flash in OpenRouter + Nous lists

* chore(models): regenerate model-catalog.json for gemini-3.5-flash swap
2026-05-29 04:27:58 -07:00
teknium1
1c53d39eaa test: deflake process-registry kill + PTY resize tests
Two CI flakes surfaced on PR #34572 (both in files this PR doesn't touch;
pre-existing host-dependent flakes):

1. test_process_registry::TestPopenLeakOnSetupFailure — the failure-cleanup
   tests use a fake proc.pid (8888/9999) and assert proc.kill() runs. But
   spawn_local's primary cleanup is os.killpg(os.getpgid(pid), SIGKILL),
   falling back to proc.kill() only on ProcessLookupError/PermissionError/
   OSError. When the fake PID happens to exist on a busy host, os.getpgid
   succeeds, os.killpg fires against an UNRELATED real process group, and
   proc.kill() is never reached -> flaky AssertionError (and a real risk of
   SIGKILLing an innocent process group from a unit test). Patch os.getpgid
   to raise ProcessLookupError so the fallback path runs deterministically
   and no real killpg is ever issued.

2. test_web_server::test_resize_escape_is_forwarded — the receive loop calls
   the blocking conn.receive_bytes() with no exception guard. Once the child
   prints its winsize and exits, the PTY closes; on a missed-marker run the
   next recv blocks until the 30s pytest-timeout instead of failing fast.
   Add a try/except break (matching the working sibling tests) and bump the
   child's pre-read sleep 0.15s -> 0.5s so the resize reliably lands first.

Verified: 4/4 pass across 3 consecutive runs; root cause for #1 reproduced
(os.getpgid(1) succeeds -> old code skips proc.kill).
2026-05-29 04:22:41 -07:00
teknium1
6a2e3c2d26 fix(gateway): guard adapter-trust check against bare GatewayRunner in tests
_adapter_enforces_own_access_policy accessed self.adapters directly, but
several auth tests build a bare GatewayRunner via object.__new__ without
setting .adapters (pitfalls.md #17). Read it defensively with getattr so a
missing/empty adapter map means "no adapter owns the policy" instead of
raising AttributeError.

Fixes 4 tests: test_feishu_bot_auth_bypass, test_discord_bot_auth_bypass (x2),
test_signal::test_signal_in_allowlist_maps.
2026-05-29 04:22:41 -07:00
teknium1
fd09b2c55e fix(gateway): trust adapter-owned access policy over env default-deny (#34515)
Config-driven platform policies (dm_policy / group_policy / allow_from /
group_allow_from) for WeCom, Weixin, Yuanbao, and QQBot now work without
also setting a PLATFORM_ALLOWED_USERS env var.

These adapters enforce their access policy at intake — a message is dropped
inside the adapter and never dispatched unless it already passed the policy.
The gateway's env-based check (_is_user_authorized) ran afterward and, with
no env allowlist set, fell through to an env-only default-deny — silently
rejecting `dm_policy: open` and config-only allowlists the adapter had
already authorized.

Rather than re-implement each adapter's policy a second time in run.py
(which would drift), adapters that own their gate now declare it via a new
BasePlatformAdapter.enforces_own_access_policy property (default False). The
gateway trusts that flag and skips the env-only default-deny for those
platforms. Env allowlists still take precedence when set.

Also resolves unauthorized DM behavior from config dm_policy so allowlist /
disabled policies drop unauthorized DMs silently instead of leaking pairing
codes, while an explicit pairing policy opts back in.

Co-authored-by: Frowtek <frowte3k@gmail.com>
2026-05-29 04:22:41 -07:00
teknium1
ddaf2f6712 style: restore PEP8 blank-line separation after dead-code removal
The deletions in the salvaged commit left some top-level defs/classes
separated by a single blank line. Restore the 2-blank-line separation.
2026-05-29 04:22:27 -07:00
kshitijk4poor
dc235e93cb chore: remove dead code — 28 unused functions/classes across 16 files
Vulture + per-symbol verification (whole-repo grep incl. tests, string
literals, getattr, decorator/registry/argparse dispatch) confirmed each of
these has zero callers anywhere — not reachable via any dynamic-dispatch path,
not referenced by tests, not re-exported.

Removed:
- acp_adapter/tools.py: _build_patch_mode_content
- agent/anthropic_adapter.py: read_claude_managed_key (diagnostics-only, never called)
- agent/bedrock_adapter.py: get_bedrock_model_ids
- agent/browser_registry.py: get_active_browser_provider
- agent/chat_completion_helpers.py: _take_request_client (x2 nested closures, never invoked)
- gateway/platforms/weixin.py: _rewrite_headers_for_weixin, _rewrite_table_block_for_weixin
- hermes_cli/banner.py: _skin_branding
- hermes_cli/debug.py: _delete_hint
- hermes_cli/gateway.py: _setup_email, _setup_sms, _setup_yuanbao
  (platform keys absent from the _builtin_setup_fn dispatch dict; handled by
  the _setup_standard_platform fallback)
- hermes_cli/kanban_db.py: set_max_runtime, active_run
- hermes_cli/kanban_diagnostics.py: severity_of_highest, _latest_clean_event_ts
- hermes_cli/main.py: _build_provider_choices, cmd_portal
  (portal subcommand is wired via portal_cli.add_parser, not this wrapper)
- hermes_cli/model_switch.py: CustomAutoResult (orphaned by the switch_model() extraction)
- hermes_cli/models.py: format_model_pricing_table, fetch_nous_account_tier
- hermes_cli/portal_cli.py: _nous_portal_base_url
- hermes_cli/proxy/server.py: handle_models_fallback (defined but never registered on the router)
- tools/computer_use/cua_backend.py: _parse_element, _is_arm_mac
- tools/file_operations.py: _get_safe_write_root (prod uses the imported
  agent.file_safety.get_safe_write_root directly)
- tools/skills_tool.py: _load_category_description

Also dropped two imports left unused by the removals:
- tools/file_operations.py: get_safe_write_root alias
- tools/computer_use/cua_backend.py: import platform

Pure deletion: -551 LOC. No behavior change. Test files covering the edited
modules pass (640/640); the broader suite's pre-existing/env-dependent
failures reproduce unchanged on origin/main.
2026-05-29 04:22:27 -07:00
teknium1
0aa9f6acfa docs(nav): wire multi-profile-gateways guide into sidebar
Follow-up for #30240 — the new page was not referenced in sidebars.ts,
leaving it orphaned (unreachable via nav and flagged as a broken relative
link to ./profiles.md). Added under Using Hermes after profile-distributions.
2026-05-29 04:11:10 -07:00
William Chen
0c0a905011 docs(gateway): add multi-profile gateways operations guide
Covers running multiple Hermes profiles as managed services on one host:

- A shell-loop wrapper pattern for start/stop/restart/status across every
  profile (the per-profile CLI commands stay unchanged).
- Per-platform service file locations (LaunchAgent on macOS, systemd user
  unit on Linux), plus the rules around clashes.
- Log paths per profile and how to tail every gateway at once.
- Config file layout per profile and the restart-after-edit workflow.
- Keeping the host awake: caffeinate flags on macOS,
  systemd-inhibit + loginctl enable-linger on Linux.
- Token-conflict auditing across .env files.
- Troubleshooting for the common "Could not find service in domain for
  user gui: 501" message and stale PIDs after a crash.

Tested locally with five profiles on macOS launchd.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 04:11:10 -07:00
Teknium
e4b9532c18 feat: embedder environment-hint hook for the system prompt (#34574)
* fix(security): block AWS SDK creds from subprocess env

* fix(security): narrow Bedrock subprocess strip to inference bearer token only

Scopes the AWS_SDK subprocess strip down from the full AWS credential chain
to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed *inference* secret
(analogous to OPENAI_API_KEY). The general AWS credential chain
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE
/ config + role pointers) is intentionally left inheritable.

Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator
shell. Hard-blocklisting the general chain would (a) regress *every* user who
runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users,
since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be
unrecoverable, because env_passthrough.py refuses to re-allow anything in
_HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes
the reported leak (opencode enumerating the Bedrock catalog off the leaked
bearer token) with no capability loss.

Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future
SDK-cred provider is covered automatically.

Tests: bearer token stripped + general chain preserved (no-regression guard),
on both the runtime strip path and the blocklist-membership path.

Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>

* feat: embedder environment-hint hook for the system prompt

Adds HERMES_ENVIRONMENT_HINT env var (and config.yaml agent.environment_hint)
so a host wrapping Hermes (sandbox runner, managed platform) can describe the
runtime environment — proxy, credential handling, mount layout — in the system
prompt's environment-hints block, without editing the identity slot (SOUL.md).

Read once at prompt-build time, so it lands in the stable, cache-safe portion
of the system prompt. Env var overrides the config key (build-time/container
mechanism). Empty by default — no behavior change for existing installs.

---------

Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
2026-05-29 04:10:05 -07:00
Hariharan Ayappane
c0b17b3c0c docs(weixin): clarify allowed users setup 2026-05-29 04:01:06 -07:00
Dave Tist
2520c9ad68 docs(skills): clarify Reminders alarm timing 2026-05-29 04:01:01 -07:00
LeonSGP43
62e81b2d9b docs(windows): add WSL desktop shortcut guide 2026-05-29 04:00:57 -07:00
SHL0MS
fe7e0a8c1d docs(feishu): add permission scopes, event subscription, and publish steps
The setup guide was missing the specific Feishu permission scopes to
configure and the event subscription (im.message.receive_v1) needed
for the bot to receive messages. Users had to reference external
OpenClaw documentation to complete the setup.

Adds:
- Required permissions table (im:message, im:message:send_as_bot,
  im:resource, im:chat, im:chat:readonly)
- Recommended permissions (reactions, app info, contact)
- Event subscription step (im.message.receive_v1)
- App version publish reminder (permissions require published version)
2026-05-29 04:00:52 -07:00
briandevans
6e179c44b1 fix(web): ensure plugin discovery before web_*_tool registry lookups
Web search/extract dispatch read agent.web_search_registry before plugin
discovery had run, so in any process that hadn't imported model_tools.py
(subprocess agent runs, delegate children, standalone scripts) the registry
was empty: get_provider('firecrawl') returned None and the dispatcher emitted
the misleading 'No web extract provider configured' error even with
web.extract_backend set and FIRECRAWL_API_KEY exported.

Adds an idempotent _ensure_web_plugins_loaded() helper (mirrors
tools.browser_tool._ensure_browser_plugins_loaded) and calls it at the top of
both the web_search_tool and web_extract_tool dispatch sites before the
registry lookup.

Fixes #27580.

Co-authored-by: briandevans <252620095+briandevans@users.noreply.github.com>
2026-05-29 04:00:00 -07:00
teknium1
58e1b04665 chore(release): map tillfalko to GitHub login for PR #29987 salvage 2026-05-29 03:58:56 -07:00
teknium1
c77a697fa4 refactor(vision): consolidate native fast-path gate into one shared helper
The fast-path decision (native routing + provider allowlist OR
supports_vision override) lived inline in vision_analyze and was copied
into browser_vision. Extract it to _should_use_native_vision_fast_path()
so both tools share one source of truth.

- vision_tools: gate logic now one helper; vision_analyze calls it in 3 lines
- browser_tool: thin envelope decoration over the shared helper, not a copy
- browser_vision typed Union[str, Dict] to match its real return shape
- tests slimmed to target the override path + text-mode-wins invariant
2026-05-29 03:58:56 -07:00
tillfalko
c3f28c651d docs(browser): update browser_vision tool description for native vision routing 2026-05-29 03:58:56 -07:00
tillfalko
2402ec5e7b test: extend test coverage to native image routing 2026-05-29 03:58:56 -07:00
tillfalko
f8b8dffccf fix(browser): add native image support to browser_vision and respect supports_vision 2026-05-29 03:58:56 -07:00
tillfalko
f05353397d fix(vision): respect supports_vision in vision_analyze 2026-05-29 03:58:56 -07:00
EloquentBrush0x
784d8dd2c2 fix(matrix): fail-closed approval reaction auth when MATRIX_ALLOWED_USERS is empty
The _on_reaction approval handler used:

    if self._allowed_user_ids and sender not in self._allowed_user_ids:

When MATRIX_ALLOWED_USERS is not configured, _allowed_user_ids is an
empty set. The short-circuit on the empty set caused the deny block to
never execute, allowing any Matrix room member to approve or deny tool
calls via / reactions — even users that run.py's _is_user_authorized
would reject for regular messages.

Fix mirrors the Telegram _is_callback_user_authorized fix (commit
89d32052e, PR #28494): deny by default when no allowlist is configured,
unless GATEWAY_ALLOW_ALL_USERS=true is explicitly set.
2026-05-29 03:58:45 -07:00
teknium1
3171845479 fix(code-exec): make dropped HERMES_* env vars diagnosable in sandbox scrub
Follow-up mitigation for the #27303 env-scrub tightening. Dropping the
broad HERMES_ prefix in favor of a 4-var operational allowlist is correct
hardening, but a sandbox script that imports a repo module reading a
non-allowlisted HERMES_* var at import time would otherwise see it
silently unset. _scrub_child_env now emits a one-shot debug log naming the
dropped non-secret HERMES_* vars and pointing at the env_passthrough
opt-in escape hatch. Secret-shaped vars are never named in the log.

Tests: dropped vars are logged + env_passthrough named; no log when
nothing is dropped; secret vars excluded from the diagnostic.
2026-05-29 03:44:49 -07:00
firefly
4bdae34771 test(code-exec): regression suite for the approval-bypass cluster
Cover context+callback propagation and teardown-clears, a source guard that both RPC threads stay wrapped, the check_execute_code_guard decision matrix (isolated backend, headless-local, cron-deny, gateway approve/deny/timeout/missing-notify, smart mode, session-yolo), the env-scrub allowlist/secret rules, and a behavioral test that execute_code() blocks before spawning on denial.

Refs #4146, #27303, #30882, #33057
2026-05-29 03:44:49 -07:00
firefly
655090b3d3 feat(gateway): warn at startup on manual approvals with no risk assessor
When approvals.mode=manual with security.tirith_enabled off and no auxiliary.approval model, dangerous commands and execute_code scripts can only be gated by live in-chat approval; with routing fixed they now fail closed (block) rather than silently auto-run. Surface that at startup so operators knowingly enable tirith or auxiliary.approval for unattended gateways.

Refs #30882
2026-05-29 03:44:49 -07:00
firefly
1083977261 fix(code-exec): restore approval context in execute_code RPC threads + guard entry
Wrap both execute_code RPC threads (local UDS + remote file-RPC) with propagate_context_to_thread so gateway sessions no longer fall into check_dangerous_command's non-interactive auto-approve branch and the CLI approval prompt stays reachable. Add check_execute_code_guard: one-shot fail-closed approval of the whole script in gateway/ask/cron-deny before the child spawns (skips isolated backends; command-string built only past the early returns). Drop the broad HERMES_ env passthrough for an explicit operational allowlist plus DSN/WEBHOOK secret substrings, and update the POSIX-equivalence oracle.

Refs #4146, #27303, #30882, #33057
2026-05-29 03:44:49 -07:00
firefly
21aeefe5fd fix(code-exec): propagate agent-turn context into tool worker threads
Worker threads that dispatch Hermes tools started with an empty contextvars.Context and no thread-local approval/sudo callbacks. Add tools/thread_context.propagate_context_to_thread factoring that capture/install/clear lifecycle (mirrors the GHSA-qg5c-hvr5-hjgr pattern), and refactor agent/tool_executor onto it so the security-critical logic lives in one audited place. Update the contextvar-propagation source guard for the new call shape.

Refs #33057
2026-05-29 03:44:49 -07:00
kshitijk4poor
a22c250001 refactor(auth): remove vestigial Nous min_key_ttl/inference_auth_mode params
After the legacy session-key path was removed, two parameters became dead
surface on the Nous runtime-resolution chain:

- min_key_ttl_seconds: del'd inside refresh_nous_oauth_pure and pass-through /
  telemetry-only in refresh_nous_oauth_from_state, _try_import_shared_nous_state,
  _nous_device_code_login, and resolve_nous_runtime_credentials. It controlled the
  now-deleted agent-key mint TTL and drives no behavior.
- inference_auth_mode: with the legacy mode gone, AUTO and FRESH are behaviorally
  identical; the value only fed _normalize_nous_inference_auth_mode validation and
  oauth trace output, never a branch.

Removing inference_auth_mode orphaned its whole supporting cluster
(NOUS_INFERENCE_AUTH_MODE_AUTO/FRESH, NOUS_INFERENCE_AUTH_MODES,
_normalize_nous_inference_auth_mode), and dropping min_key_ttl_seconds orphaned
DEFAULT_AGENT_KEY_MIN_TTL_SECONDS — all deleted here.

Updated every caller (run_agent, auxiliary_client, credential_pool, proxy adapter,
runtime_provider, web_server, main, auth_commands, setup) and pruned the matching
test kwargs. Deleted two tests that exercised the removed surface
(test_legacy_auth_mode_is_rejected, test_try_refresh_..._accepts_explicit_auth_mode).

No behavior change: net -134 LOC of dead code.
2026-05-29 02:24:48 -07:00
kshitijk4poor
95cf8f9842 refactor(auth): drop weak JWT-shape fallback in auxiliary _nous_api_key
The import-failure fallback returned any 3-segment token without scope/
expiry validation, a divergent reimplementation of the canonical
_nous_invoke_jwt_is_usable check. The import is from the same module that
provides resolve_nous_runtime_credentials, so a failure means the whole
auxiliary Nous path is unavailable anyway; return "" instead so the caller
falls through to the clear 'run: hermes auth add nous' guidance rather than
handing back an unvalidated token.
2026-05-29 02:24:48 -07:00
Robin Fernandes
4e4984a11a test(auth): update nous jwt-only expectations 2026-05-29 02:24:48 -07:00
Robin Fernandes
7e958dafc2 fix(auth): address Nous JWT fallback review 2026-05-29 02:24:48 -07:00
Robin Fernandes
41ff6e5937 refactor(auth): Disable Nous legacy session key fallback 2026-05-29 02:24:48 -07:00
teknium1
a87f0a82a5 test(tool-search): redact secrets from harness transcripts + console
The live harness runs against a real OpenRouter key; record['error'] is a
full traceback that, on an auth failure, could echo a request header or URL
containing the key. _redact_secrets() now masks the live OPENROUTER_API_KEY,
any sk-/sk-or- bearer token, and Authorization/Bearer headers before
final_response and error enter the transcript or the console print. Addresses
the CodeQL clear-text-storage/logging findings at the source.
2026-05-29 02:04:12 -07:00
teknium1
18c9e89106 test: update _invoke_tool dispatch assertion for new toolset-scope kwargs
The scoping fix added enabled_toolsets/disabled_toolsets to the
agent_runtime_helpers sequential dispatch into handle_function_call, so
test_invoke_tool_dispatches_to_handle_function_call's assert_called_once_with
(exact match) needs the two new kwargs. Both are None for the default agent
fixture.
2026-05-29 02:04:12 -07:00
teknium1
1709776120 test(tool-search): add live A/B harness, drop checked-in transcripts
Brings in the tool_search live-test harness from the original PR but leaves
out the 11 checked-in scripts/out/*.json transcript files — those are
non-deterministic model output that goes stale the moment the model changes
and were the bulk of the diff. scripts/out/ is now gitignored so a harness
run never re-commits them.

Fixes on top:
- API-key loading goes through hermes_cli.env_loader.load_hermes_dotenv
  instead of hand-parsing ~/.hermes/.env and assigning the value to a local.
  The canonical loader never materializes the secret in a local variable in
  this module, which clears the four CodeQL high alerts
  (py/clear-text-storage / py/clear-text-logging-sensitive-data at the
  transcript write/print sites — they were tracing the key from the
  hand-rolled parser into the records) and removes a hand-rolled parser.
- encoding='utf-8' on every write_text/read_text in both harness scripts
  (Windows-footgun hygiene).

Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
2026-05-29 02:04:12 -07:00
teknium1
7427b9d581 fix(tool-search): scope bridge catalog + dispatch to the session's toolsets
Tool Search read its catalog from the global registry (get_tool_definitions
with no toolset scope = 'start with everything'), so a restricted-toolset
session — subagent, kanban worker, curated gateway session — could:

  1. tool_search the entire process registry, not just its granted tools, and
  2. tool_call any registered plugin/MCP tool it was never given, because
     registry.dispatch() has no enabled_tools gate for non-execute_code tools.

A scoped session (enabled_toolsets=['mcp-github']) reported total_available=26
and successfully invoked an out-of-scope plugin tool via tool_call.

Fix:
- handle_function_call gains enabled_toolsets/disabled_toolsets; the bridge
  dispatch scopes get_tool_definitions to them (also stops polluting the
  process-global _last_resolved_tool_names with out-of-scope tools, which
  leaked into execute_code's sandbox-tool fallback).
- A defense-in-depth gate rejects any tool_call'd name not in the scoped
  deferrable catalog.
- tool_executor's unwrap (both concurrent + sequential paths) enforces the
  same scope before dispatch, since it unwraps tool_call -> underlying name
  and bypasses the bridge branch. New _tool_search_scoped_names() helper,
  cached per-agent on registry generation + toolset scope.
- New scoped_deferrable_names() helper in tool_search.py shared by both sites.

Tests: 4 new regression tests in TestRegression_ToolsetScoping (scoped
catalog, out-of-scope tool_call rejection, no global pollution, helper).
2026-05-29 02:04:12 -07:00
teknium1
369075dc95 feat(tools): progressive tool disclosure for MCP and plugin tools
Adds Tool Search, a structured-tools progressive-disclosure layer that
replaces MCP and non-core plugin tools in the model-visible tools array
with three bridge tools (tool_search / tool_describe / tool_call) when
the deferrable surface would consume more than a configurable percentage
of the active model's context window. Core Hermes tools are never deferred.

Default mode is 'auto' with a 10% context threshold, so small toolsets
pay no overhead. Set tools.tool_search.enabled to 'on' to force or 'off'
to disable.

Design carefully reflects the OpenClaw production failure modes
documented in the openclaw-tool-search-report:

  - Core tools never defer (toolsets._HERMES_CORE_TOOLS). Addresses the
    'tools silently missing from isolated cron turns' regression class
    (openclaw#84141) by construction: there is no code path that can
    drop a core tool.
  - Catalog is stateless across turns — rebuilt from the live tool-defs
    list on every assembly. No session-keyed Map that can drift out of
    sync with the registry.
  - tool_call unwraps the bridge call before any hook fires, so plugin
    pre/post hooks, guardrails, approval flows, and the activity feed
    all see the underlying tool name, not the bridge (addresses
    openclaw#85588 and the verbose-mode complaint on openclaw#79823).
  - The unwrap happens in both the parallel and sequential paths of
    agent/tool_executor.py and also in handle_function_call, so direct
    callers (sandboxed code, eval harnesses) are covered too.
  - Bridge tools cannot invoke each other (recursion guard) and cannot
    invoke core tools (those must be called directly).
  - Tools mode only — no JS-sandbox code-mode. Keeps the surface small.
  - Token estimation via cheap char/4 heuristic; precision isn't needed
    for the threshold decision.

Files:
  - tools/tool_search.py — new module (BM25 retrieval, classification,
    threshold gate, bridge dispatch, unwrap helper).
  - tests/tools/test_tool_search.py — 35 tests including the OpenClaw
    #84141 regression guard.
  - model_tools.py — wires assembly into _compute_tool_definitions as the
    final step, adds skip_tool_search_assembly kwarg so the bridge can
    see the real catalog, dispatches the three bridge tools.
  - agent/tool_executor.py — unwraps tool_call in both parallel and
    sequential parsing loops so checkpointing, guardrails, plugin hooks,
    and tool-progress callbacks all observe the underlying tool name.
  - hermes_cli/config.py — DEFAULT_CONFIG['tools']['tool_search'] block.
  - website/docs/user-guide/features/tool-search.md — user docs.

Validation:
  - 35/35 new tests pass.
  - Existing tool/registry/model_tools/config/coercion/executor tests
    (82 + 74 + small adjacents) green.
  - Live E2E: 20 fake MCP tools registered, get_tool_definitions returns
    3 bridges, tool_search returns top 3 hits, tool_describe returns
    full schema, tool_call dispatches to the real underlying handler
    and the underlying result is what the model sees.
  - Reserved-name recursion guard verified live.
  - Core-tool refusal via tool_call verified live.
2026-05-29 02:04:12 -07:00
teknium1
73d73f1f0d fix(codex): relax no-byte TTFB watchdog default from 12s to 120s
The chatgpt.com/backend-api/codex endpoint can spend tens of seconds in
backend admission / prompt prefill before emitting its first SSE event. The
12s no-byte TTFB cutoff aborted those still-valid streams, surfacing as
'Codex stream produced no bytes within 12s' through all retries (Discord
reports). The OpenAI SDK's own streaming read timeout is 600s, so 12s was
~50x more aggressive than the transport layer would have tolerated.

Default the no-byte cutoff to 120s and raise the openai-codex MAX cap default
to 120s so it no longer clamps the new default back to 20s. Disabling stays
available via HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0; the 25k-token auto-disable,
_STRICT override, and post-first-event idle watchdog are unchanged.

Co-authored-by: Gille <4317663+helix4u@users.noreply.github.com>
2026-05-29 02:02:25 -07:00
teknium1
6bebab4761 fix(security): narrow Bedrock subprocess strip to inference bearer token only
Scopes the AWS_SDK subprocess strip down from the full AWS credential chain
to just AWS_BEARER_TOKEN_BEDROCK — the only Hermes-managed *inference* secret
(analogous to OPENAI_API_KEY). The general AWS credential chain
(AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_SESSION_TOKEN / AWS_PROFILE
/ config + role pointers) is intentionally left inheritable.

Why: per SECURITY.md §3.2 the local terminal is the user's trusted operator
shell. Hard-blocklisting the general chain would (a) regress *every* user who
runs aws/terraform/cdk/boto3 in the agent terminal — not just Bedrock users,
since PROVIDER_REGISTRY is iterated unconditionally at import — and (b) be
unrecoverable, because env_passthrough.py refuses to re-allow anything in
_HERMES_PROVIDER_ENV_BLOCKLIST (GHSA-rhgp-j443-p4rf). The narrow strip closes
the reported leak (opencode enumerating the Bedrock catalog off the leaked
bearer token) with no capability loss.

Keeps zapabob's self-healing auth_type=="aws_sdk" mechanism so any future
SDK-cred provider is covered automatically.

Tests: bearer token stripped + general chain preserved (no-regression guard),
on both the runtime strip path and the blocklist-membership path.

Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
2026-05-29 01:48:08 -07:00
zapabob
95b5b72404 fix(security): block AWS SDK creds from subprocess env 2026-05-29 01:48:08 -07:00
Teknium
db2ce9e7d2 fix(compression): fail open when lock subsystem is missing (version skew) (#34475)
A process running mismatched module versions — conversation_compression.py
re-imported with the post-#34351 lock code while a long-lived
hermes_state.SessionDB stays bound to the pre-#34351 class in memory — has
the try_acquire_compression_lock call site but not the method. The
AttributeError it raises is NOT a sqlite3.Error, so the method's own
fail-open guard never runs; the exception escapes to the outer agent loop,
which prints the error and retries. Compression never succeeds, the token
count never drops, and the loop re-triggers compaction forever (the
'API call #47/#48/#49 ... has no attribute try_acquire_compression_lock'
spin a user hit after an update).

Wrap the lock acquire so any unexpected exception fails OPEN: skip locking
and proceed with compression. Skipping the lock risks a rare
concurrent-compression session fork; an infinite no-progress loop that never
compresses at all is strictly worse. The remediation hint in the log points
at the real fix (restart / hermes update to resync the stale module).

Also guards get_compression_lock_holder against the same skew.

Adds a regression test simulating the version skew (real SessionDB wrapped
so only the lock methods raise AttributeError) — asserts _compress_context
proceeds and rotates instead of raising.
2026-05-29 01:32:32 -07:00
Teknium
e28a668b40 fix(gateway): diagnosable MEDIA rejections + canonical cache roots + null-path guard
Operators can now see which MEDIA path was dropped and why, generated
artifacts under the canonical ~/.hermes/cache/{images,...} layout deliver,
and a crafted ~\x00 path no longer aborts the whole attachment batch.

- MEDIA_DELIVERY_SAFE_ROOTS: add canonical cache/{images,audio,videos,
  documents,screenshots} alongside the legacy *_cache dirs (#31733).
- filter_media/local_delivery_paths: log the rejected path (was a blind
  "outside allowed roots") via _log_safe_path, which strips control chars
  and Unicode line separators so a model-emitted path can't forge a log line.
- validate_media_delivery_path + extract_media: guard os.path.expanduser
  so a ~\x00 path returns None / is skipped instead of raising and dropping
  every other attachment in the response.

Salvaged and slimmed from #33251 (780 LOC -> 35): the reason-tag taxonomy,
the parts-eliding redactor, and the extension-partition hoist are dropped in
favor of logging the path directly. All three findings were verified and
reproduced by the contributor.

Co-authored-by: wysie <wysie@users.noreply.github.com>
2026-05-29 01:23:35 -07:00
teknium1
2765b02021 fix(packaging): ship bundled plugin.yaml manifests in wheel and sdist
The v0.15.0 PyPI wheel shipped every plugin's Python code but none of its
plugin.yaml manifests, so plugin discovery (hermes_cli/plugins.py) found zero
plugins and ALL gateway platforms failed with "No adapter available for
<platform>" (discord, slack, mattermost, ...). Same gap also dropped the
web-search provider manifests (#28149).

Declare manifest coverage in both packaging channels:
- wheel: [tool.setuptools.package-data] plugins += **/plugin.yaml, **/plugin.yml
- sdist: MANIFEST.in recursive-include plugins plugin.yaml plugin.yml
  (Homebrew and other downstream packagers build from the sdist)

Verified by building the wheel before/after: plugin.yaml count went 0 -> 69,
discord's manifest now ships. Adds a regression test asserting both channels
cover manifests.

Fixes #34034

Co-authored-by: outsourc-e <201563152+outsourc-e@users.noreply.github.com>
Co-authored-by: Dhruvil Parikh <41384593+dparikh79@users.noreply.github.com>
Co-authored-by: ousiaresearch <261687298+ousiaresearch@users.noreply.github.com>
Co-authored-by: libre-7 <6366424+libre-7@users.noreply.github.com>
2026-05-29 01:23:28 -07:00
Teknium
c01a2df0a3 fix(auth): don't launch a text-mode browser inside the terminal for OAuth (#34479)
OAuth auto-open only checked _is_remote_session() (SSH + cloud-shell env
vars). On a headless/CLI-only Linux box with no GUI browser, none of those
trip, so webbrowser.open() resolved to a console browser (w3m/lynx/links)
and launched it INSIDE the terminal — hijacking the user's TTY with the
xAI 'Account Management' login page instead of letting them copy the URL.

Add _can_open_graphical_browser(): returns False when webbrowser would
resolve to a known console browser, when $BROWSER names one, when there's
no display server on Linux, or when no browser resolves at all. Gate all 5
OAuth auto-open callsites (xAI loopback, Spotify loopback, MiniMax device
code, Anthropic, Google) on it in addition to the existing remote check.
Headless boxes now print the URL / fall through to manual-paste instead.
2026-05-29 01:23:06 -07:00
loongzhao
f247686c42 feat(yuanbao): cache resolved media resources by resourceId
Add an in-memory resourceId->local-path cache (24h TTL, 256-entry LRU) to
MediaResolveMiddleware so the same Yuanbao resource isn't re-downloaded when
it's referenced more than once in a session (own attachment, then quoted, then
group-observed backfill). Each reference otherwise triggers a fresh token
exchange + COS download.

The cache verifies the file still exists on disk before returning a hit (cache
dir may be swept) and is threaded through all three resolve paths:
_resolve_media_urls (rid parsed from placeholder URL), _collect_observed_media,
and the DispatchMiddleware quote path.

Salvaged from PR #30418 by @loongfay; the broader middleware refactor in that
PR converged with work already merged on main, so only the net-new download
cache is carried over.
2026-05-29 01:05:00 -07:00
wysie
f32b66c758 fix: improve plugins list usability 2026-05-29 00:59:42 -07:00
Teknium
c692000a57 docs(xai-oauth): mirror bare-code paste note to the primary guide (#33917)
The original PR diff updated two guides (oauth-over-ssh.md and
xai-grok-oauth.md) but only the oauth-over-ssh.md edit landed in the
PR's actual commit.  Mirror the note to the primary xai-grok-oauth.md
guide too so users reading the main entry point don't miss the
bare-code form that already shipped in #33880.
2026-05-29 00:57:13 -07:00
Evo
2410e11395 docs(xai-oauth): note bare-code manual-paste from #33880 2026-05-29 00:57:13 -07:00
Teknium
0384398c65 chore(release): map blackpilledsoftware-prog email to GitHub login
Required by CI author validation after salvaging PR #16780.
2026-05-29 00:31:44 -07:00
Blake
26b83a5f5f fix(cli): ignore terminal focus reports (salvage of #16780)
Ghostty/macOS window or tab navigation (Cmd+Shift+[ / ], Alt+Tab,
etc.) can deliver terminal focus reports (CSI I / CSI O) to the
running TUI. prompt_toolkit does not map those sequences by default,
so its parser falls back to literal key presses (ESC, [, I/O) and
inserts `[I` / `[O` into the prompt buffer after the ESC byte is
handled.

Fix: register the two sequences as Keys.Ignore in ANSI_SEQUENCES at
parser level, plus a no-op kb.add(Keys.Ignore) handler so the
default self-insert path never inserts focus-report bytes.

Salvage notes: original PR put the helper in cli.py. Salvaged into
hermes_cli/pt_input_extras.py alongside install_shift_enter_alias /
install_ctrl_enter_alias to match the established pattern for
ANSI_SEQUENCES augmentation. setdefault → in-check so any prior user
registration wins.

Closes #16780
2026-05-29 00:31:44 -07:00
Teknium
c1485d52e3 chore(release): add moikapy AUTHOR_MAP for PR #31527 salvage 2026-05-29 00:28:02 -07:00
moikapy
f6a2ba6261 fix(auxiliary): detect xAI OAuth 403 bad-credentials as auth error
xAI returns HTTP 403 (not 401) with unauthenticated:bad-credentials
when an OAuth2 access token has expired or is invalid. The existing
_is_auth_error() only checked for 401 status codes, so these tokens
were never refreshed and the 403 propagated as a generic permission
denied error.

Three fixes:

1. _is_auth_error: Recognize xAI's 403+bad-credentials pattern as
   an auth failure, triggering token refresh instead of silent failure.

2. _refresh_provider_credentials: Add xai-oauth branch with
   pool-level refresh (try_refresh_current with select to ensure
   current entry) then fallback to singleton resolver with
   force_refresh=True.

3. _recoverable_pool_provider: Map api.x.ai host to xai-oauth
   pool for auto-resolved providers, matching existing pattern for
   openai-codex/openrouter/nous/anthropic.

Includes 14 tests covering the new detection logic, host mapping,
and graceful fallback behavior.

Signed-off-by: moikapy <moikapy@devmoi.com>
2026-05-29 00:28:02 -07:00
teknium1
bc736ff543 test(model-catalog): use exact URL equality in fallback tests
CodeQL flagged 'hermes-agent.nousresearch.com' in url and similar substring
checks as py/incomplete-url-substring-sanitization. The rule is about URL
allowlist checks in production code, not test routing — there's no
security boundary here. Switch to url == self.PRIMARY / self.FALLBACK,
which is the same semantic and silences the rule.
2026-05-29 00:25:36 -07:00
teknium1
f2d88c820c fix(model-catalog): fall through to raw.github when Vercel 403s; swap step-3.5-flash for step-3.7-flash on OpenRouter+Nous
The docs site (Vercel) serves /docs/api/model-catalog.json behind a bot
mitigation rule that returns HTTP 403 + x-vercel-mitigated: challenge for
non-browser User-Agents — including urllib (what the CLI uses) and curl.
When that happens, get_catalog() falls back to the stale disk cache and
new model releases (Opus 4.8, etc.) never reach the /model picker even
though they're already in OPENROUTER_MODELS and the live OpenRouter API.

Adds a fallback URL chain: when the primary catalog URL fails, walk
DEFAULT_CATALOG_FALLBACK_URLS — currently the raw.githubusercontent.com
copy of the same file. GitHub raw doesn't bot-gate, so the manifest stays
reachable through Vercel firewall hiccups. Per-provider override URLs
keep their direct-fetch semantics (operators configure those specifically,
no implicit fallback).

Also swaps stepfun/step-3.5-flash for stepfun/step-3.7-flash in the
OpenRouter + Nous Portal curated picker lists. Native stepfun provider
configuration (api.stepfun.ai) is left alone — that depends on what
stepfun.ai itself serves, not what OpenRouter routes.

Test plan: 5 new TestFallbackChain tests cover primary-success,
primary-failure-fallback-success, all-fail, primary==fallback-dedup, and
end-to-end get_catalog routing through the new helper. Existing 23 tests
in test_model_catalog.py still pass (28 total). Wider tests/hermes_cli/
sweep: 5701/5701 pass.
2026-05-29 00:25:36 -07:00
teknium1
8d57281650 chore: add AUTHOR_MAP entry for Interstellar-code 2026-05-29 00:21:54 -07:00
Rohit Sharma
9d4fda9952 feat(kanban): add POST /runs/{run_id}/terminate endpoint
Closes the termination-control gap left by PR #28432, which shipped the
read-only sibling endpoints (/workers/active, /runs/{run_id},
/runs/{run_id}/inspect) but no way to stop a misbehaving worker from
the dashboard without dropping to the CLI.

The new endpoint resolves run_id -> task_id and delegates to the
existing kanban_db.reclaim_task() flow, so the SIGTERM->SIGKILL
escalation, run-outcome bookkeeping, and event-log append all match
POST /tasks/{task_id}/reclaim exactly. No new termination semantics
introduced.

Responses:
  200 {ok, run_id, task_id} on success
  404 unknown run_id
  409 run already ended OR task no longer reclaimable

Refs: #23762
2026-05-29 00:21:54 -07:00
teknium1
7d10105918 test(kanban): update iteration-exhaustion tests for #29747 gap 2
The two tests in TestRunConversation now verify the new behavior:
  - test_kanban_block_called_on_iteration_exhaustion → verifies
    _record_task_failure(outcome='timed_out') is called instead of
    kanban_block
  - test_no_kanban_block_when_not_in_kanban_mode → verifies the bridge
    is a no-op when HERMES_KANBAN_TASK is unset

The function names are kept for diff stability; both assert against
_record_task_failure now, which is the correct contract per the gap-2
fix in this PR.
2026-05-29 00:13:29 -07:00
teknium1
592a4ffb6b fix(kanban): close three blocked/iteration-exhausted handling gaps (#29747)
Reporter diagnosed three independent gaps that together allowed infinite
'unblock → re-stuck' loops with no surfacing or escalation:

GAP 1: `_rule_stuck_in_blocked` resets timer on any `commented`/`unblocked`
event, so a task that cycles every few minutes is invisible to it
regardless of how many times it cycles.

Fix: new `_rule_block_unblock_cycling` rule (`hermes_cli/kanban_diagnostics.py`)
that counts block→unblock cycles in a sliding window. Default threshold
3 cycles within 24h, configurable via `block_cycle_threshold` /
`block_cycle_window_seconds`. Walks events in arrival order (event id)
since multiple events can share the same `created_at` second. Fires as a
warning with a CLI hint to inspect the block reasons.

GAP 2: Iteration-budget-exhausted runs in kanban workers map to
`kanban_block` (status=blocked, but a clean exit from the kernel's
perspective). `_rule_repeated_failures` reads `consecutive_failures`,
which `_record_task_failure` increments only for crashed/timed_out/
spawn_failed — `blocked` outcome bypasses the failure counter, so the
`kanban.failure_limit` circuit breaker never trips on budget-exhaustion
loops.

Fix: `agent/conversation_loop.py` budget-exhaustion path now calls
`_record_task_failure(outcome="timed_out")` instead of `kanban_block`.
Budget exhaustion is genuinely a timeout-shaped failure (the task ran out
of allowed iterations), so this is more honest semantics; it also routes
through the unified failure counter, so repeated budget exhaustions trip
the circuit breaker and the task auto-blocks with `gave_up` after
`failure_limit` retries.

GAP 3: `release_stale_claims` uses `_pid_alive(worker_pid)` only and
ignores `last_heartbeat_at`. Reporter observed a 91-min run that held
its claim with frozen heartbeat because the worker entered a logic loop
with no tool calls — `_pid_alive` kept returning True so the claim was
extended every 15 minutes indefinitely.

Fix: heartbeat-stale backstop. If `last_heartbeat_at` is set AND older
than `DEFAULT_CLAIM_HEARTBEAT_MAX_STALE_SECONDS` (default 1h), reclaim
even if the PID is alive. NULL `last_heartbeat_at` preserves backward
compatibility (no heartbeat yet = extend, as before). The reclaim event
payload now includes a `heartbeat_stale` boolean so operators see why a
live-PID worker was reclaimed.

This works cleanly in concert with PR #34418 (#31752 runtime → heartbeat
bridge): once `_touch_activity` keeps `last_heartbeat_at` fresh as a
side effect of normal API traffic, the backstop only fires for genuinely
wedged workers (no chunks, no tool results, no progress at all).

Co-authored-by: baofuen <45189813+baofuen@users.noreply.github.com>
2026-05-29 00:13:29 -07:00
teknium1
bc31ee5cf8 fix(kanban): bridge worker runtime activity to board heartbeat (#31752)
The dispatcher watchdog (release_stale_claims) reads tasks.last_heartbeat_at
to decide whether to reclaim a running task. The agent maintains its own
in-process `_last_activity_ts` for every chunk/tool result, but those
liveness ticks never reach the board unless the model explicitly calls
the `kanban_heartbeat` tool — so a worker actively executing a long run
without tool-level heartbeats can be reclaimed mid-flight as 'stale',
returning the task to ready and orphaning the in-flight worker's progress.

Fix: in `_touch_activity` (the canonical 'we just did work' hook in
run_agent.py), call a new `heartbeat_current_worker_from_env` helper
in `tools/kanban_tools.py` that:

- No-ops outside dispatcher-spawned worker context (no HERMES_KANBAN_TASK).
- Rate-limited to one DB write per 60s (runtime activity ticks too often
  to faithfully mirror; we just need the watchdog to see liveness).
- Best-effort: never raises. heartbeat_claim + heartbeat_worker calls are
  individually try/except'd; any DB error logs at debug and returns.
- Uses worker env identity: HERMES_KANBAN_TASK + HERMES_KANBAN_RUN_ID +
  HERMES_KANBAN_CLAIM_LOCK (all pinned by the dispatcher at spawn time).
- No durable note on auto-heartbeats — that's reserved for the explicit
  `kanban_heartbeat` tool which carries a model-supplied note.

The explicit `kanban_heartbeat` tool stays available unchanged for
workers that want to attach a note or pre-emptively extend a claim
across a known-long single tool call.

Co-authored-by: faisfamilytravel <223516181+faisfamilytravel@users.noreply.github.com>
2026-05-29 00:05:58 -07:00
teknium1
40217aa194 fix(kanban): tell workers not to use clarify; route to kanban_block instead (#32167)
Kanban workers run headless — no live user is on the other side of `clarify`,
so the call times out (~120s default) and the task sits silently in `running`
with no signal to the operator that input is needed. Reporter observed a real
incident where a worker asked 'promote to production, or check staging first?'
via clarify, the call timed out, the agent hallucinated a fallback, and the
task sat 'running' for hours.

Fix: explicit 'do not call clarify' bullet in two surfaces every kanban worker
sees —

- `agent/prompt_builder.py` KANBAN_GUIDANCE `## Do NOT` section (auto-injected
  into every dispatcher-spawned worker run).
- `skills/devops/kanban-worker/SKILL.md` `## Do NOT` section (the bundled
  worker skill).

Both point at the right pattern: `kanban_comment` (context) + `kanban_block`
(decision needed) — the task surfaces on the board as blocked, the operator
sees it, unblocks with their answer in a comment, and the worker respawns
with the thread.

Co-authored-by: kweiner <17778+kweiner@users.noreply.github.com>
2026-05-28 23:57:20 -07:00
Teknium
86a389fee2 fix(credential-pool): STATUS_DEAD for terminal OAuth failures (#32849) (#34412)
When OpenAI Codex returns 401 token_invalidated or token_revoked, the
credential is broken upstream — retrying after a TTL cooldown cannot
fix it. The existing code treated every 401/429 the same way:
STATUS_EXHAUSTED with a TTL cooldown (5 min for 401, 1 hour for 429).
After the TTL elapsed, the broken credential re-entered rotation and
immediately failed again with the same 401, surfacing as 'Failed to
generate context summary' on every context-compression cycle.

Reporter observed 7 separate 401 token_invalidated failures from the
same revoked credential in a single day; the only workaround was
removing it manually via 'hermes auth'.

Add a STATUS_DEAD terminal state. Only 401 responses whose
error.code/reason matches a known terminal OAuth state (token_invalidated,
token_revoked, invalid_token, invalid_grant, unauthorized_client,
refresh_token_reused) transition to DEAD. Everything else keeps the
existing TTL semantics — 429 rate limits are transient and should
recover.

DEAD entries are excluded from rotation unconditionally. They only
clear when an explicit write-side re-auth sync rewrites the tokens
(the existing _sync_codex_pool_entries / _sync_*_entry_from_auth_store
paths already clear last_status to None). The read-side
auth.json-sync paths also now fire on DEAD so an in-flight pool entry
can adopt fresh tokens written by another process without needing
explicit re-auth.

After 24 hours, DEAD manual entries (source='manual:*') are pruned
from the pool automatically so dead state doesn't accumulate forever.
Singleton-seeded DEAD entries (source='device_code' etc.) are kept
because _seed_from_singletons would recreate them on the next load
with the same stale tokens — pruning would be pointless. The audit
trail stays visible (label, last_error_reason, timestamps).

Closes #32849.
2026-05-28 23:45:42 -07:00
teknium1
ae6817f7f7 fix(kanban): add --reason flag to unblock for symmetry with block (#30897)
`hermes kanban unblock <id> review-required: ...` parsed every trailing word
as another task_id (since `task_ids` is `nargs='+'`), then quietly failed on
each non-existent id with "cannot unblock review-required: (not blocked/scheduled?)".
Reporter saw this as asymmetric with `block <id> <reason...>` which accepts
positional reason words.

Fix: add a `--reason "..."` flag that, when provided, is appended as a
`UNBLOCK: <reason>` comment before the unblock transition. Bulk syntax
(`unblock t_a t_b t_c`) is preserved unchanged.

Co-authored-by: julio-cloudvisor <211828103+julio-cloudvisor@users.noreply.github.com>
2026-05-28 23:41:44 -07:00
AhmetArif0
4126da65ae fix(security): add bws_cache.json to file_safety read guard
The Bitwarden Secrets Manager disk cache introduced in #31968 stores
plaintext secret values at <hermes_home>/cache/bws_cache.json to avoid
re-fetching across back-to-back CLI invocations. The file was not added
to get_read_block_error()'s credential_file_names list, leaving the
agent able to read it directly via the read_file tool.

Add os.path.join("cache", "bws_cache.json") to credential_file_names
so both HERMES_HOME and the global root are covered, matching the
existing pattern used for auth.json, .anthropic_oauth.json, etc.

Other files under cache/ (images, documents, audio) are unaffected —
the check is an exact-file match, not a prefix match.

Verified: 11/11 exploit/regression scenarios pass; 38/38 existing
file_safety tests pass.
2026-05-28 23:31:20 -07:00
Teknium
71ae98b792 chore(release): map seppe@fushia.be to GitHub login
Required by CI author validation after salvaging PR #33193.
2026-05-28 23:30:39 -07:00
Seppe Gadeyne
cf8862cfa3 fix: preserve Ctrl+J newlines in Ghostty 2026-05-28 23:30:39 -07:00
Gabor Barany
1386a7e478 fix(xai-sanitize): deepcopy tools_for_api before in-place mutation (#27907)
The xAI tool-schema sanitizers (strip_slash_enum, strip_pattern_and_format)
mutate their input in place — that's their documented contract. The two
call sites (chat_completion_helpers.build_api_kwargs and the auxiliary
client) were passing agent.tools straight through, so the first xAI
request would permanently strip slash-containing enum constraints and
pattern/format keywords from the per-agent tool registry.

Effect: any subsequent non-xAI call from the same agent (auxiliary task
routed to Anthropic, OpenRouter fallback, mid-session model switch) saw
the already-stripped schema with no way for the user to notice from
their config.

Fix: deepcopy tools_for_api before sanitizing at both call sites.

The slash-enum bug itself (xAI 400ing on enums with '/') was fixed
earlier by #32443 (Nami4D) — that PR landed the strip but used the
sanitizers directly without copying. This salvages #27907's correctness
contribution (the deepcopy) while skipping its redundant parallel
sanitizer (strip_xai_incompatible_enum_values is functionally
equivalent to the existing strip_slash_enum) and its preflight-
neutrality argument (we chose model-gated preflight in #32443).

3 new tests in tests/run_agent/test_run_agent_codex_responses.py:

- strips_slash_enum_from_outgoing_request — outgoing kwargs has no
  slash-containing enum values (functional contract preserved).
- does_not_mutate_agent_tools — headline #27907 regression. Snapshot
  agent.tools before build_api_kwargs, assert it survives intact
  after. Pre-fix this assertion would have caught the mutation.
- is_idempotent_across_repeated_calls — three xAI requests in a row
  each strip cleanly AND don't progressively erode the source schema.

344/344 across tests/agent/test_auxiliary_client.py,
tests/agent/transports/test_codex_transport.py,
tests/run_agent/test_run_agent_codex_responses.py, and
tests/tools/test_schema_sanitizer.py.

Co-authored-by: Gabor Barany <barany.gabor@gmail.com>
2026-05-28 23:29:59 -07:00
Teknium
db96fc60d0 fix(gateway): keep Telegram topic bindings aligned with compression children (#34409)
Telegram DM topic bindings persist (chat_id, thread_id) -> session_id in
SQLite so reopening a topic resumes the right Hermes session. When
compression rotated session_entry.session_id mid-turn, the binding row
stayed pointed at the pre-compression parent. On the next inbound
message in that topic the gateway reloaded the oversized parent
transcript, retriggering preflight compression — sometimes in a loop.

Two-pronged fix:

1. `_sync_telegram_topic_binding(source, entry, *, reason)` helper
   called immediately after each of the three session_id rotation sites
   in _handle_message_with_agent (hygiene compression, agent-result
   compression rotation, /compress command). Keeps future bindings
   fresh.

2. Read-path self-heal: when resolving an existing topic binding, walk
   SessionDB.get_compression_tip() forward and switch_session to the
   descendant instead of the stored parent. Rewrites the binding row to
   the tip so subsequent messages skip the walk. Heals existing stale
   state on the next user message without requiring a gateway restart.

Skipped from competing PRs as not load-bearing for the bug:
- advance_session_after_compression SessionStore primitive (#26204/
  #28870/#33416) — preserves end_reason='compression' analytics nicety
  but doesn't affect routing correctness.
- Cached-agent eviction on session_id mismatch — _compress_context()
  already mutates tmp_agent.session_id on the cached object so the
  in-memory agent self-corrects.
- Startup repair pass (#33416) — redundant once the read path heals on
  the next message; one-line CLI follow-up can address bindings for
  topics users never reopen.

Closes #20470, #29712, #33414. Acknowledges work in #23195
(@litvinovvo), #26204 (@bizyumov), #28870 (@donrhmexe), #29713
(@hehehe0803), #29945 (@eugeneb1ack), #33416 (@bizyumov).
2026-05-28 23:25:52 -07:00
Ben
ec7736f8a7 fix(docker): auto-join Docker socket group for docker-in-docker backend
When users bind-mount /var/run/docker.sock to use TERMINAL_ENV=docker from
inside the container, the supervised hermes user (UID 10000) lacks
permission to talk to the socket — every `docker` invocation EACCES'es and
check_terminal_requirements() returns False. In messaging mode this also
silently strips the file/terminal toolset from the registered tool list,
so the agent rationalizes the missing tools as a platform restriction.

The naive workaround (docker run --group-add <socket-gid>) does NOT work
with our s6-setuidgid privilege drop: s6-setuidgid calls initgroups() for
the target user, which rebuilds supp groups from /etc/group. Without a
matching /etc/group entry the kernel-granted supp group is wiped between
PID 1 and the dropped hermes process. Verified empirically:

  --group-add 998 alone:    PID 1 Groups: 0 998 → after drop: Groups: 10000
  This fix's /etc/group add: id hermes shows 998 → after drop: Groups: 998 10000

Detect the socket's GID at boot in stage2-hook (runs as root before the
privilege drop), reuse an existing group name if one matches the GID,
otherwise create 'hostdocker'. Idempotent across container restarts.
Silent no-op when no socket is mounted.

End-to-end verified by building the image and running the supervised
hermes user against the real host Docker daemon: `docker version`
succeeds and check_terminal_requirements() returns True.

Fixes #16703
2026-05-29 16:15:44 +10:00
Ben Barclay
48083211ef fix(docker): accept PUID/PGID as aliases for HERMES_UID/HERMES_GID (#25872) (#34401)
Salvages #25872 by @konsisumer against current main.

NAS users (UGOS, Synology, unRAID) expect the LinuxServer.io
PUID/PGID convention and bind-mount /opt/data from a host directory
owned by their own UID.  Without this alias those vars are silently
ignored and the s6-setuidgid drop to UID 10000 leaves the runtime
unable to read the volume.  HERMES_UID/HERMES_GID still take
precedence when both are set.

The original PR targeted docker/entrypoint.sh, which is now a 27-line
deprecation shim under s6-overlay (the May 2026 rework moved all
bootstrap logic to docker/stage2-hook.sh, installed as
/etc/cont-init.d/01-hermes-setup).  Re-applied the same 2-line
alias resolution at the equivalent spot in stage2-hook.sh just
before the existing UID/GID remap block.  Test was retargeted at
docker/stage2-hook.sh; docs hunk adapted to current main's wording
("stage2 hook" + s6-setuidgid, not the obsolete "entrypoint drops
via gosu") with the NAS bind-mount example preserved verbatim.

Test-first regression verification: reverted just docker/stage2-hook.sh
to origin/main and re-ran the new tests.  Result:

  FAILED test_stage2_hook_resolves_puid_pgid_aliases
  FAILED test_puid_pgid_populate_hermes_uid_gid
      AssertionError: assert ':' == '1000:10'

That's the exact bug shape — PUID=1000 PGID=10 silently ignored,
HERMES_UID/HERMES_GID stay empty.  With the salvage applied, all 4
tests pass.

Closes #25872

Co-authored-by: konsisumer <11262660+konsisumer@users.noreply.github.com>
2026-05-29 16:07:15 +10:00
wysie
a0fc3df878 fix(browser): rewrite Camofox Docker loopback URLs (#25541)
Co-authored-by: Wysie <wysie@users.noreply.github.com>
2026-05-29 15:43:55 +10:00
Teknium
f61fd59b62 docs(run_agent): clarify why F401 re-exports stay 2026-05-28 22:26:25 -07:00
Teknium
00b8204cf4 fix: restore side-effect imports in test files (test_kanban_tools, test_command_guards)
The previous ruff prune commit removed two categories of test-file
imports whose value is the side effect of importing them, not their
binding:

  tests/tools/test_kanban_tools.py — 5 sites
    `import tools.kanban_tools  # ensure registered`
    The import itself runs tools/kanban_tools.py's @registry.register
    calls; without it, the kanban tool registry is empty and
    test_kanban_tools_visible_with_env_var asserts {} != {7 kanban tools}.

  tests/tools/test_command_guards.py — 1 site
    `import tools.tirith_security  # Ensure the module is importable so we can patch it`
    The comment names the requirement: keep the bare module reference
    so subsequent mock.patch("tools.tirith_security.<fn>") calls find
    a registered submodule.

CI failure: test (5) shard, tests/tools/test_kanban_tools.py:58
  AssertionError: expected {kanban_*}, got set()
2026-05-28 22:26:25 -07:00
Teknium
e371bf5d68 fix: re-export pruned names for tests that mock.patch or from-import them
The mechanical ruff prune in the previous commit removed several names that
`appear` unused inside their defining module but are external test/runtime
anchors:

  run_agent
    OpenAI, _SafeWriter
    get_tool_definitions, handle_function_call, check_toolset_requirements
    estimate_request_tokens_rough
    DEFAULT_AGENT_IDENTITY, build_context_files_prompt,
    build_environment_hints, build_nous_subscription_prompt
    _is_destructive_command, _extract_parallel_scope_path, _paths_overlap,
    _append_subdir_hint_to_multimodal, _trajectory_normalize_msg

  tools/web_tools
    Firecrawl, _get_firecrawl_client

These get accessed via four channels that are invisible to ruff's
in-module usage analysis:

  1. `mock.patch('module.name', ...)` in tests — resolves the attribute
     lazily, so `pytest --collect-only` passes even when the name is
     gone, but every test using the patch fails at runtime with
     AttributeError.
  2. `from run_agent import X` in production siblings (agent/transports
     /codex.py, etc.).
  3. The `_ra().X` indirection pattern in agent/system_prompt.py et al.
     — explicitly documented ("Many tests patch('run_agent.load_soul_md')")
     to preserve the patch contract.
  4. `from tools.web_tools import _get_firecrawl_client` in tests.

Each re-added import carries an explicit `# noqa: F401` with a comment
naming the channel, so future cleanup passes won't strip them again.
2026-05-28 22:26:25 -07:00
kshitijk4poor
66827f8947 chore: prune unused imports and duplicate import redefinitions
Remove unused imports (F401) and duplicate/shadowed import
redefinitions (F811) across the codebase using ruff's safe
autofixes. No behavioral changes -- imports only.

- ~1400 safe autofixes applied across 644 files (net -1072 lines)
- __init__.py re-exports preserved (excluded from F401 removal so
  public re-export surfaces stay intact)
- Re-exports that are imported or monkeypatched by tests but look
  unused in their defining module are kept with explicit # noqa:
  F401 (gateway/run.py load_dotenv; run_agent re-exports from
  agent.message_sanitization, agent.context_compressor,
  agent.retry_utils, agent.prompt_builder, agent.process_bootstrap,
  agent.codex_responses_adapter)
- Unsafe F841 (unused-variable) fixes deliberately skipped -- those
  can change behavior when the RHS has side effects
- ruff lints remain disabled in pyproject.toml (only PLW1514 is
  selected); this is a one-time cleanup, not a config change

Verification:
- python -m compileall: clean
- pytest --collect-only: all 27161 tests collect (zero import errors)
- core entry points import clean (run_agent, model_tools, cli,
  toolsets, hermes_state, batch_runner, gateway)
- static scan: every name any test imports directly from an edited
  module still resolves
2026-05-28 22:26:25 -07:00
Teknium
a4d8f0f62a feat(prompt): universal task-completion guidance + local Python toolchain probe (#34340)
* fix(codex): surface error code in Responses 'failed' status errors

When a Codex Responses turn ends with status=failed, the response carries
the failure details under `response.error` as
`{code, message, param, ...}`. The previous extractor pulled only
`message`, so users seeing a rate-limit failure got a bare "Slow down"
string indistinguishable from a generic stream truncation; an
internal_error with empty message degraded to a dict dump
("{'code': 'internal_error', 'message': ''}").

Extract a `_format_responses_error()` helper that:
- prefixes `code` when both code and message are present
  (e.g. 'rate_limit_exceeded: Slow down')
- falls back to the bare `code` when message is empty
- accepts both dict and attribute-style payloads (SDK and JSON-RPC paths)
- preserves the prior status-only fallback when no error payload exists

Apply the same helper at the sibling site in
`codex_app_server_session.run_turn()` so codex-CLI subprocess turn
failures get the same treatment.

Tests:
- 8 new unit tests for `_format_responses_error` covering both shapes,
  empty/missing fields, non-string fields, and the status-only fallback.
- 2 regression tests on `_normalize_codex_response` for failed status
  with and without a code, asserting the exact RuntimeError message.
- All 3603 tests in tests/agent/ pass.

Adapted from anomalyco/opencode#28757.

* feat(prompt): universal task-completion guidance + local Python toolchain probe

Two cross-model failure modes get a single-line answer in the cached
system prompt. Both gated by config (default on), both add zero overhead
when not needed, both verified via real AIAgent prompt builds.

## What changed

`TASK_COMPLETION_GUIDANCE` — short prompt block applied to ALL models.
Targets two failure modes observed on a real Sarasota real-estate build
task: (1) Opus stopped after writing an 85-byte stub and gave a prose
response with finish_reason=stop on call #3 of 90; (2) DeepSeek pushed
through a PEP-668 wall, then returned fabricated listings instead of
admitting the blocker. Both behaviors are model-family-agnostic, so the
guidance lives outside the existing tool_use_enforcement gate (~192
tokens, paid once per session via prefix cache).

`tools/env_probe.py` — local Python toolchain probe. Detects
python3/pip/uv/PEP-668 state and emits ONE short line in the system
prompt when something is non-default. Emits NOTHING when the env is
clean (zero token cost for normal users). Skipped entirely for remote
terminal backends (docker/modal/ssh) — they have their own probe.

Example output on a broken environment (the actual case):

    Python toolchain: python3=3.11.15 (no pip module),
    python=missing (use python3), pip→python3.12 (mismatch),
    PEP 668=yes (use venv or uv).

## Config

Both flags live under `agent.` in config.yaml, default True:

    agent:
      task_completion_guidance: true   # universal "finish the job" block
      environment_probe: true          # local Python toolchain hints

Neither addition required a `_config_version` bump — deep-merge fills
defaults in for existing user configs.

## Validation

| Test surface | Result |
|---|---|
| tests/tools/test_env_probe.py | 10/10 pass (probe unit) |
| tests/run_agent/test_run_agent.py — new classes | 8/8 pass (integration) |
| TestToolUseEnforcementConfig | 17/17 pass (no regression) |
| TestBuildSystemPrompt | 9/9 pass (no regression) |
| TestInvalidateSystemPrompt | 2/2 pass (no regression) |
| tests/agent/test_prompt_builder.py | 124/124 pass (no regression) |
| tests/hermes_cli/ | 5662/5662 pass (config defaults) |
| E2E AIAgent build (broken env) | Both blocks present, 2,178 chars |
| E2E AIAgent build (clean env) | 771-char net overhead, env probe silent |
2026-05-28 22:26:09 -07:00
Teknium
75d2c081c9 fix(logging): recover gateway.log handler from external rotation (#34349)
External rotation (logrotate, manual `mv gateway.log gateway.log.1`,
another process rotating the file) leaves `_ManagedRotatingFileHandler`'s
open fd pinned to the renamed inode. All subsequent writes go to the
rotated backup instead of the file every operator expects to read,
producing the symptom 'gateway.log frozen mid-write while agent.log
keeps growing with gateway.* records'.

PR #16229 fixed the original CLI->gateway init-order bug (#8404) so the
handler attaches in the first place. This is the sibling fix for what
happens after attach, when something external rotates underneath us.

Adds a WatchedFileHandler-style inode check on emit(): if baseFilename
no longer matches the open stream's (dev,ino), close the stale fd and
reopen at the expected path. doRollover() refreshes the snapshot so our
own rollover isn't misidentified as external.

Five regression tests cover the matrix: external rename, external
unlink, external truncate (must NOT trigger reopen — inode unchanged),
normal doRollover() (must still work), and the end-to-end
Allen-reproduction (rotate + re-call setup_logging).

55/55 tests in tests/test_hermes_logging.py pass; 5972/5972 in
tests/gateway/ pass.
2026-05-28 22:26:00 -07:00
Teknium
a30480bd2b fix(compression): prevent session-id fork from concurrent compressions (#34351)
* fix(compression): prevent session-id fork from concurrent compressions

When two AIAgent instances share the same session_id (most commonly the
parent-turn agent and its background-review fork, which inherits
session_id verbatim via background_review.py L451), both can call
compress_context() on overlapping snapshots of the same conversation.
Each ends the parent and creates its own NEW child session in state.db,
both parented to the same old id. The gateway SessionEntry only catches
one rotation; the other becomes an orphan that silently accumulates
writes — Damien's incident shape (parent 20260527_234659_e65f0e → two
children, only one visible).

Adds a state.db-backed per-session compression lock. Acquired before
the rotation in conversation_compression.compress_context(); on
failure, the caller returns messages unchanged so the auto-compress
retry loop stops cleanly. TTL (5min default) reclaims locks abandoned
by crashed compressors. Lock holder identity (pid:tid:agent:nonce) is
preserved for diagnostics via get_compression_lock_holder().

Schema bumped 13 -> 14 to track the new compression_locks table.
Reconciled additively via the existing declarative-column pattern;
no data migration needed for existing DBs.

Regression test reproduces Damien's shape: two threads racing
_compress_context on a shared parent_sid. Without the lock the test
deterministically produces 2 child sessions; with the lock, exactly 1.

Covers all six compression entry points (preflight in conversation_loop,
mid-turn fallback, hygiene compression in gateway, /compact, CLI
/compress, TUI /compress). ACP /compress was already protected by
nulling out _session_db before its compress call.

* ci: trigger rerun (transient GitHub API rate limit on CodeQL workflow)
2026-05-28 21:40:39 -07:00
liuhao1024
28bb7e0a8e fix(web): bridge Tailwind --font-sans to --theme-font-sans (#20406)
Tailwind v4 defines its own --font-sans and --font-mono tokens
independently of the Hermes theme variables. Components using
font-sans/font-mono utility classes bypass --theme-font-sans and
--theme-font-mono, so theme font changes have no effect.

Add --font-sans and --font-mono bridges in the @theme inline block
so Tailwind's font tokens follow the active Hermes theme.

Fixes #20380
2026-05-29 00:19:06 -04:00
teknium1
100536134c refactor(gateway): generalize topic recovery via adapter hook
Replace the runner-introspection trick in #32998 with an explicit
`set_topic_recovery_fn` setter on `BasePlatformAdapter`. The gateway
runner installs it once at adapter init; the adapter calls
`_apply_topic_recovery(event)` before any session keying.

Also apply the hook in `BasePlatformAdapter.handle_message` so the
running-agent guard and pending-message queue key off the recovered
thread_id too — not just the text-batch coalescence.

Net change vs #32998 alone: -2 files of indirection (no
`_message_handler.__self__` peek, no separate `_normalize_text_batch_source`),
+1 generic mechanism (other adapters can install their own hook later).
2026-05-28 21:18:39 -07:00
LeonSGP43
5407d25599 Fix Telegram DM topic text batch keying 2026-05-28 21:18:39 -07:00
Manzela
90f0f32eae docs(security): add network egress isolation guide for Docker deployments (#26385) 2026-05-29 14:09:10 +10:00
Ben Barclay
40fa0c1d19 fix(docker): skip credential/skills/cache mounts when source is invalid (#24490) (#34331)
Salvages #24490 by @liuhao1024 against current main.

The Docker daemon will silently auto-create a directory at the host
path of any `-v <host>:<container>` bind mount when the host path
doesn't exist.  In Docker-in-Docker setups (where the outer host's
real credential file isn't visible inside the agent's parent
container), this leaves a directory at the credential mount source —
and the inner `docker run` then refuses to mount a directory over a
file destination with exit 125.

Add defensive shape guards to all three mount loops in
DockerEnvironment.__init__:

  * credentials (expected: file)  — skip + warn on directory or missing
  * skills      (expected: dir)   — skip + warn when not a directory
  * cache       (expected: dir)   — skip + warn when not a directory

Failed mounts surface as WARN logs rather than crashing the container
start.  Existing well-formed sources mount unchanged.

The original PR's branch was on a pre-container-reuse-rework base
(May 12) and conflicted with the post-May-28 driver work (label
tagging, container reuse, orphan reaper).  Reconstructed the same
intent on current main; the three guard blocks slot cleanly into
`tools/environments/docker.py` around the existing mount loops.

Three new tests pinned in `tests/tools/test_docker_environment.py`:
directory-source skip, missing-source skip, valid-file mounts.  Test-
first regression verification: reverted just the production code to
`origin/main` and confirmed the new tests fail with
`'deleted_token.json' is contained here: /root/.hermes/...` — the
fixed code makes them pass.  Full file passes (54/54).

Closes #24490

Co-authored-by: liuhao1024 <11816344+liuhao1024@users.noreply.github.com>
2026-05-29 14:09:04 +10:00
Teknium
69b74c15a3 fix(kanban): CLI dispatch honors max_in_progress/max_spawn from config; swap missing 'avoid-ai-writing' skill for bundled humanizer (#33488, #29415) (#34337)
Two small bugs in the kanban dispatcher's CLI surface that were
silently degrading two distinct workflows. Bundled because the test
files and the surrounding code surface overlap.

## #33488: hermes kanban dispatch ignored kanban.max_in_progress / max_spawn

The CLI wrapper in hermes_cli/kanban.py:_cmd_dispatch only passed
default_assignee and max_in_progress_per_profile through to
dispatch_once. The global concurrency cap (kanban.max_in_progress)
and the per-tick spawn limit (kanban.max_spawn) were silently dropped,
so operators using 'hermes kanban dispatch' as a one-shot or in a
custom loop couldn't reach either cap from config — only the gateway
embedded dispatcher honored them.

Fix: read both keys from config in the same coerce-positive-int
helper that already handled max_in_progress_per_profile. CLI --max
still wins over config kanban.max_spawn when both are present
(explicit operator signal beats default), but absent --max falls
back to config.

## #29415: synthesizer crashed in retry loop on missing skill

hermes_cli/kanban_swarm.py:212 hardcoded skills=['avoid-ai-writing'],
a skill that doesn't exist in the bundled skills/ directory or any
registered hub source. Every synthesizer worker spawn failed at CLI
startup with 'Unknown skill(s): avoid-ai-writing' before the agent
loop even started — the dispatcher retried up to failure_limit
(default 2), then auto-blocked the task, then dependency rules could
re-promote it, looping forever until manual intervention.

Fix: replace with 'humanizer' which is bundled at
skills/creative/humanizer/SKILL.md (description: 'Humanize text:
strip AI-isms and add real voice'). That's the obvious intent behind
the 'avoid-ai-writing' name, and the skill is platform-portable
(linux/macos/windows) so it works on every supported runtime.

## Tests

tests/hermes_cli/test_kanban_cli_dispatch_passthrough.py — 4 cases:
- CLI passes max_in_progress / max_spawn / default_assignee /
  max_in_progress_per_profile from config to dispatch_once
- CLI --max flag overrides config kanban.max_spawn
- Invalid cap values (0, -1, 'abc', '1.5') silently fall through to None
- kanban_swarm.py no longer references 'avoid-ai-writing' AND the
  replacement 'humanizer' skill exists at the expected on-disk path

Kanban suite: 468/468 pass (was 464; +4 new regression tests).
2026-05-28 21:00:46 -07:00
teknium1
8cf6b3da9d fix(opencode-go): cap mimo-v2.5-pro max_tokens at 131072
The opencode-go relay defaults max_tokens to 262144 when none is sent,
but Xiami mimo-v2.5-pro only supports 131072 completion tokens — every
request 400s with "max_tokens is too large: 262144" before the agent
can do anything.

Add a get_max_tokens(model) hook on ProviderProfile (default returns
default_max_tokens) so profiles fronting multiple upstreams can vary
the cap per-model. Wire chat_completions transport through the hook.
Override on OpenCodeGoProfile with mimo-v2.5-pro=131072.

Only mimo-v2.5-pro is capped — other opencode-go models (kimi, glm,
qwen, minimax, other mimo variants) unchanged.
2026-05-28 20:49:53 -07:00
teknium1
bfecfabd0f Revert "feat(skills): integrate NVIDIA/skills as a trusted skills hub tap"
This reverts commit 9992e32db3.
2026-05-28 20:39:39 -07:00
liuhao1024
44df52005a fix(tools): guard Path.home() against PermissionError in has_direct_modal_credentials (#33528)
When HOME=/root (Docker containers) and the process runs as unprivileged
user (hermes, uid 10000), Path.home() / '.modal.toml' raises PermissionError
because /root/ is inaccessible. This crashes the dashboard /api/skills endpoint.

Catch PermissionError/OSError and treat as 'no config file'. Env vars still
take priority (tested).

Fixes #33525
2026-05-29 13:35:39 +10:00
Teknium
9992e32db3 feat(skills): integrate NVIDIA/skills as a trusted skills hub tap
NVIDIA's verified skills catalog (https://github.com/NVIDIA/skills) ships
NVIDIA-signed skills for CUDA-X, AIQ, cuOpt, cuPyNumeric, DeepStream, NeMo,
NemoClaw and the Skill Card Generator — each bundle carrying a detached
`skill.oms.sig` signature, a governance `skill-card.md`, and `evals/`. The
sync pipeline drops any skill missing those artifacts before publishing.

Changes:
- tools/skills_hub.py: add NVIDIA/skills to GitHubSource.DEFAULT_TAPS so
  it lights up in `hermes skills browse`, `hermes skills search <q>`, the
  twice-daily skills-index build, and the docs-site Skills Hub page
  (https://hermes-agent.nousresearch.com/docs/skills) automatically.
- tools/skills_guard.py: add NVIDIA/skills to TRUSTED_REPOS so installs
  resolve to trust_level="trusted" (looser install policy than community).
- website/scripts/extract-skills.py: map the `github` source id to a
  friendly "NVIDIA" pill label for the docs hub page.
- website/src/pages/skills/index.tsx: register the NVIDIA pill (green
  #76b900) and slot it into SOURCE_ORDER after HuggingFace.
- website/docs/user-guide/features/skills.md (+ zh-Hans i18n): document
  the new default tap and the expanded trusted-repos list.
- tests/tools/test_skills_guard.py: assert NVIDIA/skills resolves to
  "trusted" (including the skills-sh-wrapped form).
- tests/tools/test_skills_hub.py: invariant — every TRUSTED_REPOS entry
  must be reachable via GitHubSource.DEFAULT_TAPS (prevents future
  trusted repos from being declared but never browseable).

Validation:
- Live GitHub fetch: `src.fetch('NVIDIA/skills/skills/aiq-deploy')` pulled
  17 files including SKILL.md (13 KB), skill-card.md, skill.oms.sig, and
  the full references/ + evals/ tree. trust_level="trusted".
- Live inspect resolved name, description, and trust correctly.
- All 193 existing skills_guard + skills_hub tests still pass.
2026-05-28 20:35:13 -07:00
hinotoi-agent
042c1d6bb0 test: cover fallback dropped-turn handoff 2026-05-28 20:34:40 -07:00
Hinotoi Agent
6dc068ef04 fix: broaden deterministic compression fallback coverage 2026-05-28 20:34:40 -07:00
Hinotoi Agent
e785c0ad70 fix: preserve context when summary generation fails 2026-05-28 20:34:40 -07:00
Dusk
c834624f7d fix(voice): honor PIPEWIRE_REMOTE in PortAudio fallback checks (#33473) 2026-05-29 13:30:17 +10:00
Брагарник Дмитро
54bf798765 approval: add docker restart/stop/kill to DANGEROUS_PATTERNS (#33438)
When docker.sock is mounted (common Docker Compose pattern), the agent
can restart/stop/kill containers without user approval. hermes gateway
restart is already protected, but docker restart, docker stop,
docker kill, and their docker compose equivalents were not.

This caused repeated self-termination: the agent ran docker restart
hermes, killed its own container, Docker restarted it (restart policy),
and the agent resumed the same session — creating a restart loop.

Added patterns mirror the existing gateway lifecycle protection:
- docker compose restart/stop/kill/down
- docker restart/stop/kill

Co-authored-by: Sarbai <sarbai@users.noreply.github.com>
2026-05-29 13:26:54 +10:00
ninjmnky
593e4b435e Add iputils-ping (ping) to Docker image (#32015)
ping is a fundamental network diagnostic tool that most users expect to have available in the container. This adds iputils-ping to the apt install list in the Dockerfile.

Co-authored-by: ninjmnky <ninjmnky@users.noreply.github.com>
2026-05-29 13:25:32 +10:00
Ben
a618789dba fix(dashboard-auth): share /api/* public allowlist between legacy and OAuth gates
Two parallel public-path allowlists drifted: _PUBLIC_API_PATHS in
hermes_cli/web_server.py (legacy _SESSION_TOKEN middleware) and
_GATE_PUBLIC_PREFIXES in hermes_cli/dashboard_auth/middleware.py
(OAuth gate). The legacy list included /api/status (documented as a
non-sensitive read-only liveness target); the OAuth gate's list did not.

Effect: every wildcard-subdomain agent surfaced as STARTING/down to the
portal even though the dashboard was serving correctly. Nous account
service (src/server/agents/fly-provider.ts
getInstanceRuntimeStatus) fetches ``/api/status`` without a cookie
as its sole liveness probe; the OAuth gate's 401 looked identical to
'agent dead' on the portal side.

Fix: lift the allowlist into hermes_cli/dashboard_auth/public_paths.py
and have both middlewares import it. _path_is_public now consults
the shared frozenset first, then falls back to the gate's
auth-bootstrap/static prefix list. Future additions to the public list
hit both gates automatically.

Endpoint inventory (verified safe to remain public):

* /api/status            — version, gateway state, active session count,
                           auth-gate shape. Portal liveness probe target.
* /api/config/defaults   — config-defaults feed for the SPA's Config page
* /api/config/schema     — config schema for the SPA's Config page
* /api/model/info        — model catalogue metadata (context windows)
* /api/dashboard/themes  — theme manifests for the skin engine
* /api/dashboard/plugins — plugin manifests for the dashboard

No user data, no session content, no secrets. Same shape an external
monitoring agent would hit on /healthz.

Tests:

* New: test_gated_status_is_public (regression guard with the NAS
  fly-provider.ts liveness-probe rationale spelled out in the docstring)
* New: test_other_public_api_paths_are_public_under_gate (parametrised
  over the rest of PUBLIC_API_PATHS — proves 401 / 302-to-login is
  never the response)
* New: docker integration check #3 in
  test_dashboard_oauth_gate_engaged_by_default — /api/status
  remains 200 under the gate AND reports auth_required=True so the
  portal can distinguish modes
* Updated: test_full_login_round_trip_unlocks_gated_api now probes
  /api/sessions instead of /api/status (status is public, so it
  can no longer distinguish 'logged in' from 'gate accidentally
  disabled')
* Updated: TestApi401Envelope (the no-cookie / invalid-cookie /
  dead-cookie tests) probes /api/sessions for the same reason
* Updated: docker integration check #2 in
  test_dashboard_oauth_gate_engaged_by_default probes
  /api/sessions to prove the gate is intercepting
* Removed: dead _login() helper in
  test_dashboard_auth_status_endpoint.py (no longer needed since
  /api/status is reachable cold)

Companion to docs/handover/hermes-agent-dashboard-s6-insecure-fix.md
(the --insecure flag fix that shipped earlier).
2026-05-29 12:17:12 +10:00
Teknium
3b6347af15 feat(kanban): default_assignee fallback + per-profile concurrency cap (#27145, #21582) (#34244)
Two related dispatcher behaviors that have been missing for a while.

## kanban.default_assignee (#27145)

Reporter (@agarzon): dashboard creates a task without an assignee, task
parks in 'ready' forever even though the operator's intent ('default')
is perfectly clear. The dispatcher already had a 'skipped_unassigned'
bucket but no fallback routing — users had to manually type 'default'
in the assignee field every time.

Behavior: when 'kanban.default_assignee' is set in config.yaml, the
dispatcher applies that assignee to any unassigned ready task before
deciding whether to spawn. The row is mutated (assignee column + an
'assigned' event with source='kanban.default_assignee' for the audit
trail). Empty/whitespace config value = no fallback, preserving the
existing skipped_unassigned behavior.

Dry-run mode reports what WOULD happen via the new
'auto_assigned_default' bucket on DispatchResult, but does NOT mutate
the DB — operators using 'hermes kanban dispatch --dry-run' see the
routing decision before committing.

## kanban.max_in_progress_per_profile (#21582)

Reporter (@edwardchenchen, @simlu, 4 reactions): fan-out workloads
saturate one profile's local model / API quota / browser pool while
other profiles sit idle. The existing global 'max_in_progress' caps
total workers but doesn't balance across profiles.

Behavior: when 'kanban.max_in_progress_per_profile' is set to a
positive int, the dispatcher tracks per-assignee running counts (one
query at tick start) and refuses to spawn for any assignee already at
the cap. Tasks blocked this way go to a new
'skipped_per_profile_capped' bucket on DispatchResult as
(task_id, assignee, current_running_count) tuples — NOT an
operator-actionable failure, just 'try again next tick when the
profile has capacity'.

Pre-existing 'running' tasks count against the cap (verified via
regression test). The cap respects dry_run mode by incrementing
its in-memory counter on each would-be spawn so dry_run reports
the same balanced subset that a real tick would.

Invalid cap values (0, negative, non-int, None) are treated as 'no
cap', preserving the existing behavior. Backward-compatible for
installs that don't set the config.

## Surfaces

- 'hermes kanban dispatch' CLI now prints 'Auto-assigned to
  kanban.default_assignee=X: ...' and 'Deferred (X at per-profile cap,
  N running): ...' lines, plus matching JSON keys in --json output.
- Gateway dispatcher logs the configured values at startup
  ('default_assignee=X', 'max_in_progress_per_profile=N').
- 'kanban.max_in_progress_per_profile' added to DEFAULT_CONFIG with
  inline docs.

## Validation

- tests/hermes_cli/test_kanban_default_assignee.py (6 cases): no-cap
  baseline, auto-assign + DB mutation, dry-run reports without
  mutating, whitespace treated as None, explicit assignees untouched,
  DispatchResult field schema.
- tests/hermes_cli/test_kanban_per_profile_cap.py (9 cases including
  4 parametrized): no-cap baseline, balanced 2-profile fan-out,
  pre-existing running counts against cap, invalid cap values
  (0/-1/'abc'/None), capped tasks dispatched on next tick after
  running task completes, DispatchResult field schema.
- Broader kanban suite: 464/464 pass (was 449 baseline; +15 new
  regression tests across both features).

## Credit

#27145 — Jimmy Johansson reported the dispatcher skipped-unassigned
gap; @agarzon scoped the simpler 'honor kanban.default_assignee' fix
that matches the existing config knob.
#21582 — @edwardchenchen filed the per-profile cap ask after hitting
model 429s on fan-out research projects; @simlu confirmed the same
pain on local-model setups.
2026-05-28 19:02:55 -07:00
Ben
42612aa350 docs(docker): refresh user-guide page for s6-overlay reality
The page was last meaningfully rewritten in the pre-s6 (tini) era and had
drifted on five points that no longer matched the image:

1. "Running the dashboard" claimed the entrypoint backgrounds
   `hermes dashboard` and prefixes its output with `[dashboard]`. That
   was the pre-s6 entrypoint.sh path; under s6 the dashboard is a
   supervised s6-rc service (`docker/s6-rc.d/dashboard/run`) with no
   sed-prefix pipeline. Rewrote the section accordingly.

2. The default for `HERMES_DASHBOARD_HOST` was documented as
   `127.0.0.1`. The s6 run script defaults it to `0.0.0.0`
   (`dash_host="${HERMES_DASHBOARD_HOST:-0.0.0.0}"`). Fixed the table
   and the surrounding prose.

3. Multi-profile was documented as "not recommended in Docker — run
   one container per profile." That advice was load-bearing when
   there was no in-container supervisor, but the s6 architecture
   explicitly adds per-profile gateway supervision: each profile
   created via `hermes profile create <name>` gets a slot under
   `/run/service/gateway-<name>/`, the `02-reconcile-profiles`
   cont-init script restores them across `docker restart` from
   `gateway_state.json`, and `hermes gateway start/stop/restart` is
   intercepted by `_dispatch_via_service_manager_if_s6` to route
   through `s6-svc`. Pivoted the section to "one container, many
   supervised profile gateways" as the default, with a comparison
   table and a "When you DO want a separate container" escape
   hatch for the genuine resource-isolation / network-segmentation
   cases.

4. The Compose example trailer also claimed `[dashboard]` log
   prefixing. Replaced with the actual log routing.

5. Added a new "Where the logs go" section covering all four log
   surfaces: per-profile gateways (tee'd to `docker logs` AND
   `${HERMES_HOME}/logs/gateways/<profile>/current` since PR
   b34532319), dashboard (`docker logs`, no prefix), boot reconciler
   (`container-boot.log`), and `hermes logs`. The gateway-mode and
   Compose sections cross-reference this rather than each carrying
   their own routing prose.

Added a new "docker exec automatically drops to the hermes user"
subsection under "What the Dockerfile does", next to the existing
Privilege model warning. Documents the `/opt/hermes/bin/hermes` shim
(landed via the docker-exec privilege-drop work) — operators don't
need to remember `--user hermes` for `docker exec hermes login`,
`docker exec hermes profile create …`, etc. The historical footgun
(`auth.json` written as `root:root`, supervised gateway then can't
read its own auth file) is mentioned only as context for what the
fail-loud `exit 126` is protecting against, not as a problem the
reader needs to solve. The `HERMES_DOCKER_EXEC_AS_ROOT=1` opt-out is
documented for diagnostic sessions.

The "Permission denied" troubleshooting subsection now carries a
single-line pointer to the new section instead of duplicating it.

The `--insecure` framing reflects PR #fb5125362 (opt-in via
`HERMES_DASHBOARD_INSECURE`, not derived from bind host): the OAuth
gate is the authority, the bind host alone never implies
`--insecure`, and opting out is an explicit security trade-off.

Anchors verified resolve. i18n zh-Hans mirror left for the
translation flow to catch up.
2026-05-29 11:55:01 +10:00
Ben
3c6e70aef1 docs(docker): document new persist-across-processes contract and orphan reaper (#20561)
Updates the Docker Backend section of the user-guide configuration page
to match the actual behavior shipped in PR #33645. Pre-PR the docs
claimed "container is stopped and removed on shutdown," which was
never quite true for the documented happy path and is now actively
wrong: in default mode the container survives across Hermes processes
so background processes (npm watchers, dev servers, long-running
pytest) carry over the way the "ONE long-lived container shared
across sessions" promise requires.

Changes to `website/docs/user-guide/configuration.md`:

* Reworked the intro paragraph at the top of the Docker Backend
  section to describe the actual cross-process reuse contract.
* Expanded the YAML example with the new keys
  `docker_persist_across_processes` and `docker_orphan_reaper`, plus
  the pre-existing-but-undocumented `docker_env`, `timeout`, and
  `lifetime_seconds`.  Clarified the `container_persistent` comment
  to disambiguate from `docker_persist_across_processes`.
* Added a `docker_env` vs `docker_forward_env` explainer (one
  injects literal KEY=value, the other forwards values from the
  host/.env — easy to confuse).
* Replaced the one-line "Container lifecycle" paragraph with a full
  subsection covering:
    - the three labels Hermes tags every container with
      (hermes-agent, hermes-task-id, hermes-profile)
    - the label-probe reuse mechanism on startup
    - a teardown-trigger table with four rows for every situation
      that destroys the container in default mode
    - edge cases (OOM kill, profile switching)
* Added an "Environment variable overrides" table covering all
  TERMINAL_* env vars relevant to the Docker backend, including the
  previously-undocumented `TERMINAL_DOCKER_ENV` and
  `HERMES_DOCKER_BINARY`.

Changes to `website/docs/user-guide/docker.md`:

* Extended the cross-link admonition (around l.227) so the
  Hermes-in-Docker page points at the new terminal-backend keys
  (`docker_env`, `docker_persist_across_processes`,
  `docker_orphan_reaper`) alongside the ones already mentioned.

No code changes.  Behavior already covered by tests added in earlier
commits on this branch (#33645 commits 1-5).

Refs #20561
2026-05-29 11:49:54 +10:00
Ben
2f0f03c40d fix(docker): cleanup_vm() default honors persist mode (don't kill container on session close)
Commit 4 made cleanup_vm() default to force_remove=True, which was wrong:
cleanup_vm() is called from AIAgent.close() (TUI session close at
tui_gateway/server.py:2991, gateway session teardown at gateway/run.py:3569)
and from per-turn cleanup (agent/chat_completion_helpers.py:1517). All
three are session-lifecycle events that should honor persist mode, not
explicit user-initiated teardown.

Ben reported the symptom: container shared between multiple TUI sessions
(good) but killed as soon as any session closed (bad). With force_remove=True
as the default, every `session.close` JSON-RPC tore down the container.

The fix is to flip cleanup_vm()'s force_remove default back to False.
The kwarg still exists for future explicit-teardown paths (`/reset`-style
flows, "destroy my sandbox" commands) that haven't been wired up yet.

Two new unit tests pin the behavior:

* `test_cleanup_vm_default_honors_persist_mode` — asserts
  `cleanup_vm(task_id)` does neither docker stop nor docker rm on a
  persist-mode container (the regression Ben caught).
* `test_cleanup_vm_force_remove_tears_down_persist_container` —
  asserts the kwarg still flows through the runtime-signature-inspection
  plumbing to the backend's cleanup().

E2E verified against real Docker (in addition to all 17 existing checks):

  ✓ Default cleanup_vm() leaves persist-mode container running
  ✓ cleanup_vm(force_remove=True) removed the container

Refs #20561
2026-05-29 11:49:54 +10:00
Ben
5c2170a7c6 fix(docker): persist-mode cleanup is no-op; add force_remove kwarg (#20561)
The first iteration of this PR did docker stop on every cleanup in
persist mode (only skipping docker rm). Ben caught this as
contradicting the documented "ONE long-lived container shared across
sessions" semantics: stopping the container on every Hermes /quit kills
any background processes inside (npm watchers, pytest watchers,
long-running scripts) — exactly the case persist mode is supposed to
protect.

This commit splits the cleanup paths cleanly:

* **Persist mode (default)** — cleanup() is a NO-OP for the
  container. Container stays running, processes survive, next Hermes
  process attaches via the existing label probe in ~ms instead of
  waiting for docker start. Resource reclamation happens via the
  orphan reaper at next startup (2 × lifetime_seconds threshold), which
  covers the SIGKILL / OOM / abandoned-laptop cases.
* **Opt-out mode (persist_across_processes=False)** — unchanged:
  docker stop + docker rm -f on cleanup as before.
* **Explicit teardown** — new cleanup(force_remove=True) kwarg
  overrides persist mode and tears the container down unconditionally.
  cleanup_vm(task_id) now defaults to force_remove=True since
  it's the user-driven reset path (called from AIAgent.close(),
  /reset-style flows, and the idle reaper's per-turn cleanup).

The idle reaper in _cleanup_inactive_envs calls env.cleanup()
directly with no kwargs, so idle persist-mode envs are no-op'd — the
container survives the in-process pop and the next tool call re-probes
via labels. No state leak: _container_id is still cleared on the
in-process handle.

E2E verified against real Docker:

  ✓ Container is still running after cleanup()
  ✓ Background process (sleep loop) survived cleanup()
  ✓ Filesystem state preserved across cleanup()
  ✓ In-process container_id cleared (next __init__ will re-probe)
  ✓ Background process visible from reused env (no docker start happened)
  ✓ force_remove=True removed the container even in persist mode
  ✓ cleanup_vm() removed the container (defaults to force_remove=True)

Test changes:

* Replaces `test_cleanup_with_persist_only_stops_no_rm` with
  `test_cleanup_with_persist_is_noop_for_container` — asserts neither
  stop nor rm runs in persist mode, and the in-process handle is
  cleared so re-probe works.
* Adds `test_cleanup_force_remove_stops_and_rms_even_in_persist_mode`
  — covers the new kwarg.
* Updates `test_cleanup_uses_subprocess_run_not_detached_shell` and
  `test_wait_for_cleanup_after_cleanup_returns_true` to pass
  `force_remove=True` so they actually exercise the docker code path
  (default no-op would trivially pass).

cleanup_vm() forwards `force_remove` only to backends whose cleanup()
accepts the kwarg (currently just DockerEnvironment) via runtime
signature inspection — Modal/Daytona/SSH `cleanup()` signatures are
unchanged.

Refs #20561
2026-05-29 11:49:54 +10:00
Ben
d77d877665 fix(docker): startup orphan reaper for crashed-process containers
The cleanup-fix in the previous commit handles the graceful-exit leak: a
Hermes process that runs ``atexit`` will now actually wait on the docker
stop/rm worker thread, so containers either survive (persist mode) or are
fully removed (opt-out mode) by the time the interpreter exits.

But ``atexit`` doesn't fire on SIGKILL, OOM-kill, or terminal-window
close. Containers from those exits stay parked with no surviving Python
process to reuse or remove them, so they accumulate until the operator
intervenes with ``docker rm -f``. The cleanup-fix doesn't help this class
— there's no live cleanup() to fix.

This commit adds the safety net: a startup orphan reaper that runs once
per Hermes process and removes long-Exited hermes-labeled containers
that the prior commit couldn't reach.

Implementation:

* New ``reap_orphan_containers()`` in ``tools/environments/docker.py``.
  Filters: ``label=hermes-agent=1`` + ``status=exited`` + (optional)
  ``label=hermes-profile=<current>``. Per-container ``docker inspect``
  parses ``State.FinishedAt`` (with nanosecond-precision trimming for
  Python's microsecond-bound ``fromisoformat``); containers older than
  the threshold get ``docker rm -f``'d. The ``status=exited`` filter is
  load-bearing — a running container may belong to a sibling Hermes
  process whose reuse path will pick it up; killing it would crash the
  sibling mid-command. Single-container failures are logged and the
  sweep continues to the next candidate.

* New ``_maybe_reap_docker_orphans()`` helper in
  ``tools/terminal_tool.py``. Wired into ``_create_environment()`` for
  ``env_type == "docker"``. Gated by:

    - ``terminal.docker_orphan_reaper: true`` (default; opt-out for
      operators running multiple Hermes processes in the same profile
      who don't trust the conservative defaults)
    - ``_docker_orphan_reaper_ran`` module flag with double-checked
      locking — parallel subagents and RL rollouts don't trigger N
      concurrent docker ps storms
    - Age threshold = ``2 × TERMINAL_LIFETIME_SECONDS`` with a 60s floor
      (so ``TERMINAL_LIFETIME_SECONDS=0`` doesn't race the user's own
      setup)
    - Profile scoping — a research profile NEVER reaps the default
      profile's stragglers
    - Exception swallow — a janitor failure must never block container
      creation

* New config ``terminal.docker_orphan_reaper`` wired through all four
  config-bridge sites (cli.py, gateway/run.py, hermes_cli/config.py,
  tests/conftest.py) and pinned by
  ``test_docker_orphan_reaper_is_bridged_everywhere``.

Coverage:

* 9 new unit tests in test_docker_environment.py — happy path, recent-
  container sparing, profile scoping, unparseable-timestamp safety,
  docker-ps-failure handling, partial-failure continuation, nanosecond
  timestamp parsing, zero-value FinishedAt rejection.
* 6 new integration tests in test_docker_orphan_reaper_integration.py
  — once-per-process gate, disable-flag respected, lifetime doubling
  with 60s floor, current-profile filter wiring, exception swallow.
* 1 new bridge-invariant regression test.

Closes #20561 (combined with the two prior commits on this branch).
2026-05-29 11:49:54 +10:00
Ben
ac8e238bc8 fix(docker): reuse containers across processes + fix cleanup leaks
The Docker backend docs claim "Single persistent container — ONE long-
lived container shared across sessions, /new, /reset, and delegate_task
subagents. Stopped/removed on shutdown." In practice the code only
honored that contract within a single Python process via the in-memory
\`_active_environments[task_id]\` cache. Every \`hermes chat\` invocation
spawned a fresh \`hermes-<hex>\` container; older containers piled up in
\`Exited\` state and accumulated until manual \`docker rm\` (issue #20561).

Three root causes, all addressed by this commit:

1. No cross-process container discovery.
2. \`cleanup()\` used fire-and-forget \`subprocess.Popen("... &", shell=True)\`
   which raced with parent-process exit — when Python exited promptly the
   detached shell child got killed mid-\`docker stop\`, leaving stopped
   containers behind.
3. The \`docker rm\` step in cleanup was gated on \`not self._persistent\`
   (the bind-mount-persistence flag). Default config sets
   \`container_persistent: true\`, so the default happy path skipped \`rm\`
   entirely — even when the user explicitly didn't want cross-process
   reuse, containers leaked.

Fix:

* Add \`DockerEnvironment.__init__(persist_across_processes=True)\`. When
  true, init probes
  \`docker ps -a --filter label=hermes-agent=1
                  --filter label=hermes-task-id=<task>
                  --filter label=hermes-profile=<profile>\`
  and reuses a matching container (running → attach; stopped →
  \`docker start\` → attach; \`docker start\` failure → fall through to a
  fresh \`docker run\`). Multiple matches prefer the running one, with the
  stragglers left for the orphan reaper (next commit) to clean up.

* Rewrite \`cleanup()\`. Uses \`subprocess.run(..., timeout=30)\` on a
  daemon \`threading.Thread\`, not the racy \`Popen(... &)\`. The
  \`_persistent\` guard is dropped on the \`rm\` step — \`rm\` now runs
  whenever \`persist_across_processes\` is false, regardless of the
  bind-mount-persistence setting. The leak class is gone in all
  combinations.

* Add \`wait_for_cleanup(timeout)\`. \`tools/terminal_tool.py\`'s atexit
  hook calls this on every active env, blocking up to 15s for the
  cleanup thread before interpreter exit. Without this, \`hermes /quit\`
  raced the daemon-thread teardown and dropped the stop/rm work.

* New config \`terminal.docker_persist_across_processes\` (default
  \`true\` — restores the documented contract). Set \`false\` for hard
  per-process isolation. Wired through all four config-bridge sites
  (cli.py env_mappings, gateway/run.py _terminal_env_map,
  hermes_cli/config.py _config_to_env_sync, tests/conftest.py env-strip
  list); regression-pinned by
  \`test_docker_persist_across_processes_is_bridged_everywhere\` matching
  the existing pattern for docker_run_as_host_user / docker_env.

Reuse intentionally does NOT compare image / mounts / resources — only
the labels. Operators changing those settings should set
\`docker_persist_across_processes: false\` (or \`docker rm -f\` the
labeled container) to force a fresh start. This keeps the probe cheap
and the failure mode obvious.

Coverage: 12 new unit tests in tests/tools/test_docker_environment.py
covering reuse paths (running, stopped, fallback, opt-out, duplicate
preference) and cleanup behavior (persist-mode no-rm, opt-out always-rm,
no-Popen, wait_for_cleanup semantics, partial-init safety). Plus one
config-bridge regression pin.

Refs #20561
2026-05-29 11:49:54 +10:00
Ben
8d129d013b fix(docker): tag containers with hermes-agent labels for identification
Issue #20561 (Docker containers accumulate) needs a way to identify
hermes-created containers from the outside — both for the orphan reaper
(a follow-up commit) and for operators triaging `docker ps -a | grep
hermes-` after a SIGKILL leaves stragglers. The previous `hermes-<hex>`
name prefix was the only signal, which broke down under cross-process
reuse (planned) and against any custom `--name` someone might pass via
`docker_extra_args`.

This commit adds three labels at `docker run` time:

  --label hermes-agent=1                # global sweep target
  --label hermes-task-id=<sanitized>    # per-task reuse key
  --label hermes-profile=<sanitized>    # per-profile isolation key

Values are sanitized to `[A-Za-z0-9_.-]` and truncated to 63 chars so the
label round-trips cleanly through `docker ps --filter label=key=value`.
Empty or non-string inputs collapse to "unknown" rather than producing
an unqueryable empty value.

No behavior change: the labels are pure metadata. The follow-up commits
in this PR (cleanup-fix + orphan reaper) are what use them.

Refs #20561
2026-05-29 11:49:54 +10:00
Teknium
300140e006 test(tui_gateway): stop reloading server module in fixture teardown (#34217)
tui_gateway.server registers two atexit hooks at module load time:
ThreadPoolExecutor shutdown (line 170) and _shutdown_sessions (line 336).
Three test files reloaded the module on each fixture teardown to reset
per-test state. Each reload re-runs module-level code, including the
atexit registrations — duplicates accumulate across the test session.

At pytest interpreter shutdown the duplicated atexit hooks race the
stderr buffer flush:

    Fatal Python error: _enter_buffered_busy: could not acquire lock
    for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown,
    possibly due to daemon threads

pytest reports 'tests passed but the slice exited non-zero', and the
shard turns red on CI. Surfaced today on PR #34193's test slice 1
(204 files, 3572 tests passed, then Fatal Python error during exit).

Fix: drop importlib.reload(mod) from the three fixtures that have it.
Per-test reset is handled by clearing the mutable session dicts
(_sessions, _pending, _answers). _methods is also no longer cleared —
it's populated at module import time and would only be re-populated by
a reload, so clearing it without reload broke session.resume /
command.dispatch / slash.exec method registration across tests.

Affected fixtures:
- tests/tui_gateway/test_goal_command.py
- tests/tui_gateway/test_protocol.py
- tests/tui_gateway/test_review_summary_callback.py

The second reload in test_protocol.py at line 211 (reload of
tui_gateway.transport) is preserved — transport.py has no atexit hooks
or threads, so reload is safe there.

Tests: 84/84 in tests/tui_gateway/ pass cleanly with exit code 0; no
Fatal Python error at interpreter shutdown.
2026-05-28 18:16:54 -07:00
dvir pashut
66265a0571 fix(nix): drop stale "vercel" group from #full variant
The `vercel` optional-dependency was removed from pyproject.toml in
#33067, but `nix/packages.nix` (added a few hours later in #33108)
still references `"vercel"` in the `#full` variant's
`extraDependencyGroups`. uv2nix fails evaluation with:

  error: Extra/group name 'vercel' does not match either extra or
  dependency group

Because `nix/devShell.nix` does
`inputsFrom = builtins.attrValues self'.packages`, the broken `#full`
derivation is pulled into the dev shell too, so `nix develop` /
direnv breaks on a fresh clone — not just `nix build .#full`.
2026-05-28 11:52:31 +03:00
1426 changed files with 157241 additions and 9512 deletions

View File

@@ -417,9 +417,9 @@ IMAGE_TOOLS_DEBUG=false
# Default STT provider is "local" (faster-whisper) — runs on your machine, no API key needed.
# Install with: pip install faster-whisper
# Model downloads automatically on first use (~150 MB for "base").
# To use cloud providers instead, set GROQ_API_KEY or VOICE_TOOLS_OPENAI_KEY above.
# Provider priority: local > groq > openai
# Configure in config.yaml: stt.provider: local | groq | openai
# To use cloud providers instead, set GROQ_API_KEY, VOICE_TOOLS_OPENAI_KEY, or ELEVENLABS_API_KEY above.
# Provider priority: local > groq > openai > mistral > xai > elevenlabs
# Configure in config.yaml: stt.provider: local | groq | openai | mistral | xai | elevenlabs
# =============================================================================
# STT ADVANCED OVERRIDES (optional)
@@ -427,10 +427,12 @@ IMAGE_TOOLS_DEBUG=false
# Override default STT models per provider (normally set via stt.model in config.yaml)
# STT_GROQ_MODEL=whisper-large-v3-turbo
# STT_OPENAI_MODEL=whisper-1
# STT_ELEVENLABS_MODEL=scribe_v2
# Override STT provider endpoints (for proxies or self-hosted instances)
# GROQ_BASE_URL=https://api.groq.com/openai/v1
# STT_OPENAI_BASE_URL=https://api.openai.com/v1
# ELEVENLABS_STT_BASE_URL=https://api.elevenlabs.io/v1
# =============================================================================
# MICROSOFT TEAMS INTEGRATION

View File

@@ -0,0 +1,100 @@
name: Build Windows Installer
on:
workflow_dispatch:
permissions:
contents: read
jobs:
# Gate: workflow_dispatch is already restricted to users with write access,
# but we want ADMIN-only. Explicitly check the triggering actor's repo
# permission via the API and fail fast for anyone below admin.
authorize:
name: Authorize (admins only)
runs-on: ubuntu-latest
timeout-minutes: 5
steps:
- name: Check actor is a repo admin
env:
GH_TOKEN: ${{ github.token }}
ACTOR: ${{ github.actor }}
run: |
set -euo pipefail
perm=$(gh api \
"repos/${{ github.repository }}/collaborators/${ACTOR}/permission" \
--jq '.permission')
echo "Actor '${ACTOR}' has permission: ${perm}"
if [ "${perm}" != "admin" ]; then
echo "::error::'${ACTOR}' is not a repo admin (permission=${perm}). Refusing to build/sign."
exit 1
fi
echo "Authorized: '${ACTOR}' is an admin."
build:
name: Hermes-Setup.exe
needs: authorize
runs-on: windows-latest
timeout-minutes: 30
permissions:
contents: read
# Required for OIDC auth to Azure (azure/login federated credentials).
id-token: write
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Setup Node.js
uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4
with:
node-version: 22
cache: npm
- name: Install npm dependencies
run: npm ci
- name: Setup Rust
uses: dtolnay/rust-toolchain@29eef336d9b2848a0b548edc03f92a220660cdb8 # stable
- name: Cache Rust targets
uses: Swatinem/rust-cache@e18b497796c12c097a38f9edb9d0641fb99eee32 # v2
with:
workspaces: apps/bootstrap-installer/src-tauri
- name: Build installer
run: npm run tauri:build
working-directory: apps/bootstrap-installer
- name: Azure login (OIDC)
uses: azure/login@a457da9ea143d694b1b9c7c869ebb04ebe844ef5 # v2
with:
client-id: ${{ secrets.AZURE_CLIENT_ID }}
tenant-id: ${{ secrets.AZURE_TENANT_ID }}
subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
- name: Sign Hermes-Setup.exe with Azure Artifact Signing
uses: azure/artifact-signing-action@c7ab2a863ab5f9a846ddb8265964877ef296ee82 # v2
with:
endpoint: ${{ vars.AZURE_SIGNING_ENDPOINT }}
signing-account-name: ${{ vars.AZURE_SIGNING_ACCOUNT_NAME }}
certificate-profile-name: ${{ vars.AZURE_SIGNING_CERTIFICATE_PROFILE }}
# Sign both the raw exe and the bundled NSIS installer.
files-folder: ${{ github.workspace }}\apps\bootstrap-installer\src-tauri\target\release
files-folder-filter: exe
files-folder-recurse: true
file-digest: SHA256
timestamp-rfc3161: http://timestamp.acs.microsoft.com
timestamp-digest: SHA256
- name: Upload NSIS installer
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: Hermes-Setup-installer
path: apps/bootstrap-installer/src-tauri/target/release/bundle/nsis/*.exe
- name: Upload raw exe
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: Hermes-Setup-exe
path: apps/bootstrap-installer/src-tauri/target/release/Hermes-Setup.exe

View File

@@ -3,11 +3,9 @@ name: Contributor Attribution Check
on:
pull_request:
branches: [main]
paths:
# Only run when code files change (not docs-only PRs)
- '*.py'
- '**/*.py'
- '.github/workflows/contributor-check.yml'
# No paths filter — the job must always run so the required check
# reports a status (path-gated workflows leave checks "pending" forever
# when no matching files change, which blocks merge).
permissions:
contents: read
@@ -20,7 +18,21 @@ jobs:
with:
fetch-depth: 0 # Full history needed for git log
- name: Check if relevant files changed
id: filter
run: |
BASE="${{ github.event.pull_request.base.sha }}"
HEAD="${{ github.event.pull_request.head.sha }}"
CHANGED=$(git diff --name-only "$BASE"..."$HEAD" -- '*.py' '**/*.py' '.github/workflows/contributor-check.yml' || true)
if [ -n "$CHANGED" ]; then
echo "run=true" >> "$GITHUB_OUTPUT"
else
echo "run=false" >> "$GITHUB_OUTPUT"
echo "No Python files changed, skipping attribution check."
fi
- name: Check for unmapped contributor emails
if: steps.filter.outputs.run == 'true'
run: |
# Get the merge base between this PR and main
MERGE_BASE=$(git merge-base origin/main HEAD)

View File

@@ -6,8 +6,8 @@ on:
paths:
- 'ui-tui/package-lock.json'
- 'ui-tui/package.json'
- 'web/package-lock.json'
- 'web/package.json'
- 'apps/dashboard/package-lock.json'
- 'apps/dashboard/package.json'
workflow_dispatch:
inputs:
pr_number:
@@ -28,7 +28,7 @@ concurrency:
jobs:
# ── Auto-fix on main ───────────────────────────────────────────────
# Fires when a push to main touches package.json or package-lock.json
# in ui-tui/ or web/. Runs fix-lockfiles and pushes the hash
# in ui-tui/ or apps/dashboard/. Runs fix-lockfiles and pushes the hash
# update commit directly to main so Nix builds never stay broken.
#
# Safety invariants:
@@ -110,7 +110,7 @@ jobs:
# run recompute from the correct package-lock state.
pkg_changed="$(git diff --name-only "$BASE_SHA"..origin/main -- \
'ui-tui/package-lock.json' 'ui-tui/package.json' \
'web/package-lock.json' 'web/package.json' || true)"
'apps/dashboard/package-lock.json' 'apps/dashboard/package.json' || true)"
if [ -n "$pkg_changed" ]; then
echo "::warning::Package files changed since hash computation — aborting; a fresh run will recompute"
exit 0

View File

@@ -3,15 +3,9 @@ name: Supply Chain Audit
on:
pull_request:
types: [opened, synchronize, reopened]
paths:
- '**/*.py'
- '**/*.pth'
- '**/setup.py'
- '**/setup.cfg'
- '**/sitecustomize.py'
- '**/usercustomize.py'
- '**/__init__.pth'
- 'pyproject.toml'
# No paths filter — the jobs must always run so required checks
# report a status (path-gated workflows leave checks "pending" forever
# when no matching files change, which blocks merge).
permissions:
pull-requests: write
@@ -27,8 +21,44 @@ permissions:
# advisory-only workflow instead.
jobs:
# ── Path filter (shared by both scan and dep-bounds) ───────────────
changes:
runs-on: ubuntu-latest
outputs:
# True when any file the scanner cares about changed in this PR
scan: ${{ steps.filter.outputs.scan }}
# True when pyproject.toml changed in this PR
deps: ${{ steps.filter.outputs.deps }}
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
fetch-depth: 0
- name: Check for relevant file changes
id: filter
run: |
BASE="${{ github.event.pull_request.base.sha }}"
HEAD="${{ github.event.pull_request.head.sha }}"
SCAN_FILES=$(git diff --name-only "$BASE"..."$HEAD" -- \
'*.py' '**/*.py' '*.pth' '**/*.pth' \
'setup.py' 'setup.cfg' \
'sitecustomize.py' 'usercustomize.py' '__init__.pth' \
'pyproject.toml' || true)
if [ -n "$SCAN_FILES" ]; then
echo "scan=true" >> "$GITHUB_OUTPUT"
else
echo "scan=false" >> "$GITHUB_OUTPUT"
fi
DEPS_FILES=$(git diff --name-only "$BASE"..."$HEAD" -- 'pyproject.toml' || true)
if [ -n "$DEPS_FILES" ]; then
echo "deps=true" >> "$GITHUB_OUTPUT"
else
echo "deps=false" >> "$GITHUB_OUTPUT"
fi
scan:
name: Scan PR for critical supply chain risks
needs: changes
if: needs.changes.outputs.scan == 'true'
runs-on: ubuntu-latest
steps:
- name: Checkout
@@ -147,10 +177,24 @@ jobs:
echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
exit 1
# Gate: reports success when scan was skipped (no relevant files changed).
# This ensures the required check always gets a status.
scan-gate:
name: Scan PR for critical supply chain risks
needs: changes
# always() so the gate still reports SUCCESS even if `changes` fails/is
# skipped — without it, a failed dependency would leave the required
# check unreported (i.e. "pending"), the exact failure mode this fixes.
if: always() && needs.changes.outputs.scan != 'true'
runs-on: ubuntu-latest
steps:
- run: echo "No supply-chain-relevant files changed, skipping scan."
dep-bounds:
name: Check PyPI dependency upper bounds
needs: changes
if: needs.changes.outputs.deps == 'true'
runs-on: ubuntu-latest
if: contains(github.event.pull_request.changed_files_url, 'pyproject.toml') || true
steps:
- name: Checkout
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -211,3 +255,16 @@ jobs:
run: |
echo "::error::PyPI dependencies without upper bounds detected. Add <next_major ceiling per CONTRIBUTING.md policy."
exit 1
# Gate: reports success when dep-bounds was skipped (no pyproject.toml changed).
# This ensures the required check always gets a status.
dep-bounds-gate:
name: Check PyPI dependency upper bounds
needs: changes
# always() so the gate still reports SUCCESS even if `changes` fails/is
# skipped — without it, a failed dependency would leave the required
# check unreported (i.e. "pending"), the exact failure mode this fixes.
if: always() && needs.changes.outputs.deps != 'true'
runs-on: ubuntu-latest
steps:
- run: echo "No pyproject.toml changes, skipping dependency bounds check."

18
.gitignore vendored
View File

@@ -63,6 +63,10 @@ environments/benchmarks/evals/
# Web UI build output
hermes_cli/web_dist/
apps/desktop/build/
apps/desktop/dist/
apps/desktop/release/
apps/desktop/*.tsbuildinfo
# Web UI assets — synced from @nous-research/ui at build time via
# `npm run sync-assets` (see web/package.json).
@@ -85,6 +89,16 @@ website/static/api/skills-index.json
website/static/api/skills.json
website/static/api/skills-meta.json
models-dev-upstream/
# Local editor / agent tooling (machine-specific; keep in global config, not the repo)
.codex/
.cursor/
.gemini/
.zed/
.mcp.json
opencode.json
config/mcporter.json
hermes_cli/tui_dist/*
hermes_cli/scripts/
docs/superpowers/*
@@ -92,3 +106,7 @@ docs/superpowers/*
# also created in-repo when an agent operates in this checkout). Plans, audit
# logs, and per-session caches are never artifacts of the codebase.
.hermes/
# Tool Search live-test harness output — non-deterministic model transcripts,
# regenerated by scripts/tool_search_livetest.py. Never an artifact of the repo.
scripts/out/

View File

@@ -2,6 +2,8 @@
Instructions for AI coding assistants and developers working on the hermes-agent codebase.
**Never give up on the right solution.**
## Development Environment
```bash
@@ -66,6 +68,29 @@ hermes-agent/
`gateway.log` when running the gateway. Profile-aware via `get_hermes_home()`.
Browse with `hermes logs [--follow] [--level ...] [--session ...]`.
## TypeScript Style
Applies to TypeScript across Hermes: desktop, TUI, website, and future TS packages.
- Prefer small nanostores over component state when state is shared, reused, or read by distant UI.
- Let each feature own its atoms. Chat state belongs near chat, shell state near shell, shared state in `src/store`.
- Components that render from an atom should use `useStore`. Non-rendering actions should read with `$atom.get()`.
- Do not pass state through three components when the leaf can subscribe to the atom.
- Keep persistence beside the atom that owns it.
- Keep route roots thin. They compose routes and shell; they should not become controllers.
- No monolithic hooks. A hook should own one narrow job.
- Prefer colocated action modules over hidden god hooks.
- If a callback is pure side effect, use the terse void form:
`onState={st => void setGatewayState(st)}`.
- Async UI handlers should make intent explicit:
`onClick={() => void save()}`.
- Prefer interfaces for public props and shared object shapes. Avoid `type X = { ... }` for object props.
- Extend React primitives for props: `React.ComponentProps<'button'>`, `React.ComponentProps<typeof Dialog>`, `Omit<...>`, `Pick<...>`.
- Table-driven beats condition ladders when mapping ids, routes, or views.
- `src/app` owns routes, pages, and page-specific components.
- `src/store` owns shared atoms.
- `src/lib` owns shared pure helpers.
## File Dependency Chain
```

View File

@@ -43,7 +43,7 @@ Bundled skills (in `skills/`) ship with every Hermes install. They should be **b
- Document handling, web research, common dev workflows, system administration
- Used regularly by a wide range of people
If your skill is official and useful but not universally needed (e.g., a paid service integration, a heavyweight dependency), put it in **`optional-skills/`** — it ships with the repo but isn't activated by default. Users can discover it via `hermes skills browse` (labeled "official") and install it with `hermes skills install` (no third-party warning, builtin trust).
If your skill is official and useful but not universally needed (e.g., a paid service integration, a heavyweight dependency), put it in **`optional-skills/`** — it ships with the repo but isn't activated by default. Users can discover it via `hermes skills browse` (labeled "official") and install it with `hermes skills install` (no third-party warning, built-in trust).
If your skill is specialized, community-contributed, or niche, it's better suited for a **Skills Hub** — upload it to a skills registry and share it in the [Nous Research Discord](https://discord.gg/NousResearch). Users can install it with `hermes skills install`.

View File

@@ -25,7 +25,7 @@ ENV PLAYWRIGHT_BROWSERS_PATH=/opt/hermes/.playwright
# hermes process, the dashboard, and per-profile gateways.
RUN apt-get update && \
apt-get install -y --no-install-recommends \
ca-certificates curl python3 python-is-python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli xz-utils && \
ca-certificates curl iputils-ping python3 python-is-python3 ripgrep ffmpeg gcc python3-dev libffi-dev procps git openssh-client docker-cli xz-utils && \
rm -rf /var/lib/apt/lists/*
# ---------- s6-overlay install ----------
@@ -73,7 +73,17 @@ RUN set -eu; \
tar -C / -Jxpf /tmp/s6-overlay-noarch.tar.xz; \
tar -C / -Jxpf /tmp/s6-overlay-arch.tar.xz; \
tar -C / -Jxpf /tmp/s6-overlay-symlinks-noarch.tar.xz; \
rm /tmp/s6-overlay-*.tar.xz /tmp/s6-overlay.sha256
rm /tmp/s6-overlay-*.tar.xz /tmp/s6-overlay.sha256; \
# #34192: backward-compat shim for orchestration templates that still\
# reference the legacy /usr/bin/tini entrypoint (e.g. Hostinger's\
# 'Hermes WebUI' catalog). The image has moved to s6-overlay /init\
# as PID 1 (see ENTRYPOINT below + the migration comment at the top\
# of this file), but external wrappers pinned to /usr/bin/tini will\
# crash with 'tini: No such file or directory' on startup. The shim\
# symlinks /usr/bin/tini -> /init so legacy wrappers exec the right\
# PID-1 reaper without behavior change for users on the current\
# ENTRYPOINT. Safe to drop once the affected catalogs are updated.\
ln -sf /init /usr/bin/tini
# Non-root user for runtime; UID can be overridden via HERMES_UID at runtime
RUN useradd -u 10000 -m -d /opt/data hermes

View File

@@ -1,4 +1,9 @@
graft skills
graft optional-skills
# Bundled plugin manifests (plugin.yaml / plugin.yml). Without these the
# PluginManager scan (hermes_cli/plugins.py) finds zero plugins on installs
# built from the sdist (e.g. Homebrew, downstream packagers). package-data
# below covers the wheel; this covers the sdist. See #34034 / #28149.
recursive-include plugins plugin.yaml plugin.yml
global-exclude __pycache__
global-exclude *.py[cod]

View File

@@ -36,9 +36,9 @@ Use any model you want — [Nous Portal](https://portal.nousresearch.com), [Open
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
```
### Windows (native, PowerShell) — Early Beta
### Windows (native, PowerShell)
> **Heads up:** Native Windows support is **early beta**. It installs and runs, but hasn't been road-tested as broadly as our Linux/macOS/WSL2 paths. Please [file issues](https://github.com/NousResearch/hermes-agent/issues) when you hit rough edges. For the most battle-tested Windows setup today, run the Linux/macOS one-liner above inside **WSL2**.
> **Heads up:** Native Windows runs Hermes without WSL — CLI, gateway, TUI, and tools all work natively. If you'd rather use WSL2, the Linux/macOS one-liner above works there too. Found a bug? Please [file issues](https://github.com/NousResearch/hermes-agent/issues).
Run this in PowerShell:
@@ -46,13 +46,13 @@ Run this in PowerShell:
iex (irm https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.ps1)
```
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install). Hermes uses this bundled Git Bash to run shell commands.
The installer handles everything: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **and a portable Git Bash** (MinGit, unpacked to `%LOCALAPPDATA%\hermes\git` — no admin required, completely isolated from any system Git install). Hermes uses this bundled Git Bash to run shell commands.
If you already have Git installed, the installer detects it and uses that instead. Otherwise a ~45MB MinGit download is all you need — it won't touch or interfere with any system Git.
If you already have Git installed, the installer detects it and uses that instead. Otherwise a ~45MB MinGit download is all you need — it won't touch or interfere with any system Git.
> **Android / Termux:** The tested manual path is documented in the [Termux guide](https://hermes-agent.nousresearch.com/docs/getting-started/termux). On Termux, Hermes installs a curated `.[termux]` extra because the full `.[all]` extra currently pulls Android-incompatible voice dependencies.
>
> **Windows:** Native Windows is supported as an **early beta** — the PowerShell one-liner above installs everything, but expect rough edges and please file issues when you hit them. If you'd rather use WSL2 (our most battle-tested Windows path), the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux. The only Hermes feature that currently needs WSL2 specifically is the browser-based dashboard chat pane (it uses a POSIX PTY — classic CLI and gateway both run natively).
> **Windows:** Native Windows is fully supported — the PowerShell one-liner above installs everything. If you'd rather use WSL2, the Linux command works there too. Native Windows install lives under `%LOCALAPPDATA%\hermes`; WSL2 installs under `~/.hermes` as on Linux. The only Hermes feature that currently needs WSL2 specifically is the browser-based dashboard chat pane (it uses a POSIX PTY — classic CLI and gateway both run natively).
After installation:
@@ -104,17 +104,17 @@ You can still bring your own keys per-tool whenever you want — the gateway is
Hermes has two entry points: start the terminal UI with `hermes`, or run the gateway and talk to it from Telegram, Discord, Slack, WhatsApp, Signal, or Email. Once you're in a conversation, many slash commands are shared across both interfaces.
| Action | CLI | Messaging platforms |
|---------|-----|---------------------|
| Start chatting | `hermes` | Run `hermes gateway setup` + `hermes gateway start`, then send the bot a message |
| Start fresh conversation | `/new` or `/reset` | `/new` or `/reset` |
| Change model | `/model [provider:model]` | `/model [provider:model]` |
| Set a personality | `/personality [name]` | `/personality [name]` |
| Retry or undo the last turn | `/retry`, `/undo` | `/retry`, `/undo` |
| Compress context / check usage | `/compress`, `/usage`, `/insights [--days N]` | `/compress`, `/usage`, `/insights [days]` |
| Browse skills | `/skills` or `/<skill-name>` | `/<skill-name>` |
| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
| Platform-specific status | `/platforms` | `/status`, `/sethome` |
| Action | CLI | Messaging platforms |
| ------------------------------ | --------------------------------------------- | -------------------------------------------------------------------------------- |
| Start chatting | `hermes` | Run `hermes gateway setup` + `hermes gateway start`, then send the bot a message |
| Start fresh conversation | `/new` or `/reset` | `/new` or `/reset` |
| Change model | `/model [provider:model]` | `/model [provider:model]` |
| Set a personality | `/personality [name]` | `/personality [name]` |
| Retry or undo the last turn | `/retry`, `/undo` | `/retry`, `/undo` |
| Compress context / check usage | `/compress`, `/usage`, `/insights [--days N]` | `/compress`, `/usage`, `/insights [days]` |
| Browse skills | `/skills` or `/<skill-name>` | `/<skill-name>` |
| Interrupt current work | `Ctrl+C` or send a new message | `/stop` or send a new message |
| Platform-specific status | `/platforms` | `/status`, `/sethome` |
For the full command lists, see the [CLI guide](https://hermes-agent.nousresearch.com/docs/user-guide/cli) and the [Messaging Gateway guide](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
@@ -124,23 +124,23 @@ For the full command lists, see the [CLI guide](https://hermes-agent.nousresearc
All documentation lives at **[hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)**:
| Section | What's Covered |
|---------|---------------|
| [Quickstart](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart) | Install → setup → first conversation in 2 minutes |
| [CLI Usage](https://hermes-agent.nousresearch.com/docs/user-guide/cli) | Commands, keybindings, personalities, sessions |
| [Configuration](https://hermes-agent.nousresearch.com/docs/user-guide/configuration) | Config file, providers, models, all options |
| [Messaging Gateway](https://hermes-agent.nousresearch.com/docs/user-guide/messaging) | Telegram, Discord, Slack, WhatsApp, Signal, Home Assistant |
| [Security](https://hermes-agent.nousresearch.com/docs/user-guide/security) | Command approval, DM pairing, container isolation |
| [Tools & Toolsets](https://hermes-agent.nousresearch.com/docs/user-guide/features/tools) | 40+ tools, toolset system, terminal backends |
| [Skills System](https://hermes-agent.nousresearch.com/docs/user-guide/features/skills) | Procedural memory, Skills Hub, creating skills |
| [Memory](https://hermes-agent.nousresearch.com/docs/user-guide/features/memory) | Persistent memory, user profiles, best practices |
| [MCP Integration](https://hermes-agent.nousresearch.com/docs/user-guide/features/mcp) | Connect any MCP server for extended capabilities |
| [Cron Scheduling](https://hermes-agent.nousresearch.com/docs/user-guide/features/cron) | Scheduled tasks with platform delivery |
| [Context Files](https://hermes-agent.nousresearch.com/docs/user-guide/features/context-files) | Project context that shapes every conversation |
| [Architecture](https://hermes-agent.nousresearch.com/docs/developer-guide/architecture) | Project structure, agent loop, key classes |
| [Contributing](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) | Development setup, PR process, code style |
| [CLI Reference](https://hermes-agent.nousresearch.com/docs/reference/cli-commands) | All commands and flags |
| [Environment Variables](https://hermes-agent.nousresearch.com/docs/reference/environment-variables) | Complete env var reference |
| Section | What's Covered |
| --------------------------------------------------------------------------------------------------- | ---------------------------------------------------------- |
| [Quickstart](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart) | Install → setup → first conversation in 2 minutes |
| [CLI Usage](https://hermes-agent.nousresearch.com/docs/user-guide/cli) | Commands, keybindings, personalities, sessions |
| [Configuration](https://hermes-agent.nousresearch.com/docs/user-guide/configuration) | Config file, providers, models, all options |
| [Messaging Gateway](https://hermes-agent.nousresearch.com/docs/user-guide/messaging) | Telegram, Discord, Slack, WhatsApp, Signal, Home Assistant |
| [Security](https://hermes-agent.nousresearch.com/docs/user-guide/security) | Command approval, DM pairing, container isolation |
| [Tools & Toolsets](https://hermes-agent.nousresearch.com/docs/user-guide/features/tools) | 40+ tools, toolset system, terminal backends |
| [Skills System](https://hermes-agent.nousresearch.com/docs/user-guide/features/skills) | Procedural memory, Skills Hub, creating skills |
| [Memory](https://hermes-agent.nousresearch.com/docs/user-guide/features/memory) | Persistent memory, user profiles, best practices |
| [MCP Integration](https://hermes-agent.nousresearch.com/docs/user-guide/features/mcp) | Connect any MCP server for extended capabilities |
| [Cron Scheduling](https://hermes-agent.nousresearch.com/docs/user-guide/features/cron) | Scheduled tasks with platform delivery |
| [Context Files](https://hermes-agent.nousresearch.com/docs/user-guide/features/context-files) | Project context that shapes every conversation |
| [Architecture](https://hermes-agent.nousresearch.com/docs/developer-guide/architecture) | Project structure, agent loop, key classes |
| [Contributing](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) | Development setup, PR process, code style |
| [CLI Reference](https://hermes-agent.nousresearch.com/docs/reference/cli-commands) | All commands and flags |
| [Environment Variables](https://hermes-agent.nousresearch.com/docs/reference/environment-variables) | Complete env var reference |
---
@@ -160,6 +160,7 @@ hermes claw migrate --overwrite # Overwrite existing conflicts
```
What gets imported:
- **SOUL.md** — persona file
- **Memories** — MEMORY.md and USER.md entries
- **Skills** — user-created skills → `~/.hermes/skills/openclaw-imports/`

View File

@@ -3,75 +3,73 @@
**Release Date:** May 16, 2026
**Since v0.13.0:** 808 commits · 633 merged PRs · 1393 files changed · 165,061 insertions · 545 issues closed (12 P0, 50 P1) · 215 community contributors (including co-authors)
> The Foundation Release — Hermes installs and runs anywhere, ships with the things you actually want to use, and stops shipping the things you don't. xAI Grok lands as a SuperGrok OAuth provider with grok-4.3 bumped to a 1M context window. A new OpenAI-compatible local proxy turns any OAuth-authed Hermes provider — Claude Pro, ChatGPT Pro, SuperGrok — into an endpoint that Codex / Aider / Cline / Continue can hit. `x_search` lands as a first-class X (Twitter) search tool with OAuth-or-API-key auth. The Microsoft Teams stack is wired end-to-end (Graph auth + webhook listener + pipeline runtime + outbound delivery). A debloating wave makes installs dramatically lighter — heavyweight backends now lazy-install on first use, the `[all]` extras drop everything covered by lazy-deps, and a tiered install falls back when a wheel rejects on your platform. `pip install hermes-agent` works from PyPI. The cold-start wave shaves ~19 seconds off `hermes` launch. Browser CDP calls are 180x faster. Two new messaging platforms (LINE + SimpleX Chat) bring the total to 22. Cross-session 1-hour Claude prompt caching, `/handoff` that actually transfers sessions live, native button UI for `clarify` on Telegram and Discord, Discord channel history backfill, LSP semantic diagnostics on every write, a unified pluggable `video_generate`, a `computer_use` cua-driver backend that finally works with non-Anthropic providers, clickable URLs in any terminal, Zed ACP Registry integration via `uvx`, native Windows beta, 9 new optional skills, OpenRouter Pareto Code router, huggingface/skills as a trusted default tap. 12 P0 + 50 P1 closures.
> The Foundation Release — Hermes Agent installs and runs anywhere now. Native Windows ships in early beta with a full PowerShell installer story, a `pip install hermes-agent` wheel lands on PyPI, lazy-deps reshape what `pip install hermes-agent` actually pulls down, the supply-chain checker scans every install/upgrade for unsafe versions, and a new OpenAI-compatible local proxy lets Codex / Aider / Cline talk to OAuth-only providers (Claude Pro, ChatGPT Pro, SuperGrok). The cold-start wave shaves ~19 seconds off `hermes` launch, browser-tool CDP calls run 180x faster, and `hermes tools` All-Platforms drops from 14s to under 1.5s. Two new messaging platforms (LINE and SimpleX Chat) and a Microsoft Graph foundation (Teams pipeline + webhook adapter) land alongside `/handoff` that finally transfers sessions live, `vision_analyze` passing pixels through to vision-capable models, `x_search` as a first-class tool, LSP semantic diagnostics on every `write_file` / `patch`, a unified pluggable `video_generate`, a `computer_use` cua-driver backend, cross-session 1-hour Claude prompt caching, a per-turn file-mutation verifier, plus 9 new optional skills. 50+ P1 closures, 12 P0 closures.
---
## ✨ Highlights
- **xAI Grok via SuperGrok OAuth — and grok-4.3 jumps to a 1M context window** — If you pay for SuperGrok, you can now use Grok inside Hermes by signing in with your xAI account — no API key, no separate billing. The wire-through also bumps grok-4.3 to a 1M token context window, so you can drop whole codebases or research corpora into a single prompt. Includes proper handling for entitlement errors and an SSH-to-tunnel docs page for when you're SSH'd into a remote box and need to complete the OAuth flow. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534), [#26664](https://github.com/NousResearch/hermes-agent/pull/26664), [#26644](https://github.com/NousResearch/hermes-agent/pull/26644), [#26592](https://github.com/NousResearch/hermes-agent/pull/26592))
- **Native Windows support (early beta)** — full PowerShell installer, native subprocess/PTY paths, taskkill-based process management, MinGit auto-install, Microsoft Store python stub detection, foreground Ctrl+C preservation, taskkill+ps2 fallback, npm prefix handling, and ~40 follow-up Windows-only fixes across CLI / gateway / TUI / curator / tools. Hermes finally runs natively on `cmd.exe` and PowerShell, no WSL required. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561), [#22130](https://github.com/NousResearch/hermes-agent/pull/22130), [#22752](https://github.com/NousResearch/hermes-agent/pull/22752), [#26618](https://github.com/NousResearch/hermes-agent/pull/26618), and many more)
- **OpenAI-compatible local proxy for OAuth providers** — Run `hermes proxy` and you get a `http://localhost:port` endpoint that speaks the OpenAI API but is backed by whichever OAuth provider you're signed into — Claude Pro, ChatGPT Pro, SuperGrok. Now any tool that expects an OpenAI-compatible endpoint (Codex CLI, Aider, Cline, Continue, your custom scripts) just works with your existing subscription, no API key required. One subscription, every tool. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. One command, no clone, no git, no shell installer. Wheel includes the Ink TUI bundle and shell launcher. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593))
- **`x_search` — first-class X (Twitter) search tool** — The agent can now search X directly without installing a skill or wiring up a custom integration. Search the timeline, find threads, surface specific posts — straight from the chat. Auth with either your X OAuth login or an API key, whichever you have. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
- **Cold-start performance wave — ~19s off `hermes` launch** — skills cache, lazy Feishu import, no Nous HTTP at startup, plus PEP-562 lazy adapter imports (QQ, Yuanbao, Teams, Google Chat), deferred `fal_client` / `google-cloud` / `httpx` loads, models.dev disk-cache-first lookup, parallel doctor API checks, eager-skip plugin discovery on built-in subcommands, `hermes tools` All-Platforms drops from 14s to <1.5s, welcome banner skipped on `chat -q`. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
- **Microsoft Teams — end-to-end** — Hermes can now read messages from Teams and post back. The full Microsoft Graph stack lands together: auth + client foundation, a webhook listener that receives Teams events, a pipeline plugin runtime, and outbound delivery. Wire up the bot once, then chat to your agent from any Teams channel, DM, or group. (salvages of #21408#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
- **180x faster `browser_console` evaluations** — routed through the supervisor's persistent CDP WebSocket instead of spawning a fresh DevTools session per call. Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Debloating wave — lighter installs, less you don't use** — A clean `pip install hermes-agent` used to pull down everything: every messaging adapter SDK, every image-gen SDK, every voice/TTS provider, whether you used them or not. Now those heavy backends (Slack / Matrix / Feishu / DingTalk adapters, hindsight client, codex app-server, Pixverse / Camofox / image-gen SDKs, voice/TTS providers) install automatically the first time you actually use them. The `[all]` extras drop everything covered by lazy-deps, the installer falls back through tiers when a wheel doesn't fit your platform, and a supply-chain advisory checker scans every install for unsafe versions. Faster installs, smaller disk footprint, fewer transitive vulnerabilities. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220), [#24515](https://github.com/NousResearch/hermes-agent/pull/24515), [#25014](https://github.com/NousResearch/hermes-agent/pull/25014), [#25038](https://github.com/NousResearch/hermes-agent/pull/25038), [#25766](https://github.com/NousResearch/hermes-agent/pull/25766), [#21818](https://github.com/NousResearch/hermes-agent/pull/21818))
- **Supply-chain advisory checker + lazy-deps framework + tiered install fallback** — every `pip install` / `hermes update` scans dependencies against an advisory list, lazy-deps replace heavy import-time loads with first-use installs, and the installer falls back through extras tiers when a wheel rejects on the target platform. ([#24220](https://github.com/NousResearch/hermes-agent/pull/24220))
- **`pip install hermes-agent && hermes`** — Hermes Agent is now a real PyPI package. No more cloning the repo or running shell installers — one pip command and you're running. The wheel ships with the Ink TUI bundle and the shell launcher, so the full experience comes out of the box. (salvage of [#26350](https://github.com/NousResearch/hermes-agent/pull/26350)) ([#26593](https://github.com/NousResearch/hermes-agent/pull/26593), [#26148](https://github.com/NousResearch/hermes-agent/pull/26148))
- **OpenAI-compatible local proxy** — `hermes proxy` exposes any OAuth-authed provider (Claude Pro, ChatGPT Pro, SuperGrok) as an OpenAI-compatible endpoint that Codex / Aider / Cline / VS Code Continue can hit. Your subscription, your tools. ([#25969](https://github.com/NousResearch/hermes-agent/pull/25969))
- **Cross-session 1h Claude prompt cache** — When you use Claude through Anthropic, OpenRouter, or Nous Portal, the prompt prefix (system prompt, skills, memory) now caches for an hour across sessions. Start a `/new` session and the first response comes back faster and cheaper because the cache is still warm from your last session. Background memory review hits the cache too, so it's not paying full price every turn. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828), [#25434](https://github.com/NousResearch/hermes-agent/pull/25434), [#24778](https://github.com/NousResearch/hermes-agent/pull/24778))
- **Cross-session 1-hour Claude prompt cache** — Anthropic / OpenRouter / Nous Portal now share a 1h prefix cache across sessions for Claude models. Fast resume, fast `/new`, lower cost on repeat work. ([#23828](https://github.com/NousResearch/hermes-agent/pull/23828))
- **180x faster `browser_console` evaluations** — When the agent uses the browser tool to inspect a page or run JavaScript, those calls now share one persistent connection to Chrome instead of spinning up a new DevTools session every time. The difference is huge: things that used to take a couple of seconds per call return in milliseconds. Real-world page interactions feel instant. ([#23226](https://github.com/NousResearch/hermes-agent/pull/23226))
- **Two new messaging platforms — LINE + SimpleX Chat** — LINE Messaging API lands as a first-class platform, SimpleX Chat salvages #2558 onto the modern adapter spec. Hermes is now on 22 platforms. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
- **Cold-start performance wave — ~19 seconds off `hermes` launch** — Running `hermes` used to make you wait through a chunk of import overhead and network calls before you saw a prompt. Now the launch path is mostly deferred: heavy adapters only load when you use them, model catalogs come from disk cache first, doctor checks run in parallel, and `chat -q` skips the welcome banner entirely. The `hermes tools` All-Platforms screen alone dropped from 14 seconds to under 1.5 seconds. ([#22138](https://github.com/NousResearch/hermes-agent/pull/22138), [#22120](https://github.com/NousResearch/hermes-agent/pull/22120), [#22681](https://github.com/NousResearch/hermes-agent/pull/22681), [#22790](https://github.com/NousResearch/hermes-agent/pull/22790), [#22808](https://github.com/NousResearch/hermes-agent/pull/22808), [#22831](https://github.com/NousResearch/hermes-agent/pull/22831), [#22859](https://github.com/NousResearch/hermes-agent/pull/22859), [#22904](https://github.com/NousResearch/hermes-agent/pull/22904), [#22766](https://github.com/NousResearch/hermes-agent/pull/22766), [#25341](https://github.com/NousResearch/hermes-agent/pull/25341))
- **Microsoft Graph foundation — Teams pipeline + webhook adapter** — `msgraph` auth/client foundation, webhook listener platform, Teams pipeline plugin runtime, and Teams outbound delivery via the existing adapter — Hermes can now read and post to Teams. (salvages of #21408#21411) ([#21922](https://github.com/NousResearch/hermes-agent/pull/21922), [#21969](https://github.com/NousResearch/hermes-agent/pull/21969), [#22007](https://github.com/NousResearch/hermes-agent/pull/22007), [#22024](https://github.com/NousResearch/hermes-agent/pull/22024))
- **Two new messaging platforms — LINE + SimpleX Chat** — LINE is huge in Japan, Korea, and Taiwan, and now Hermes runs natively on the LINE Messaging API. SimpleX Chat is the privacy-focused decentralized messenger with no user IDs — also wired up as a first-class platform. That brings Hermes to 22 messaging platforms total, so wherever you and your team chat, the agent can be there. ([#23197](https://github.com/NousResearch/hermes-agent/pull/23197), [#26232](https://github.com/NousResearch/hermes-agent/pull/26232))
- **`/handoff` actually transfers the session live** — the agent's active session moves to a different model / persona / profile mid-conversation, with messages, tool history, and context preserved. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
- **`/handoff` actually transfers the session live** — Switching models or personalities mid-conversation used to mean losing context or starting over. Now `/handoff` moves your active session — every message, every tool call, every piece of context — to the target model, persona, or profile, live, without dropping anything. Mid-debugging hand off from a fast model to a deep-reasoning one, or pass a session between profiles for different parts of a task. ([#23395](https://github.com/NousResearch/hermes-agent/pull/23395))
- **`x_search` — first-class X (Twitter) search tool** — gated tool with OAuth-or-API-key auth, no skill needed to query the timeline. ([#26763](https://github.com/NousResearch/hermes-agent/pull/26763))
- **Native button UI for `clarify` on Telegram and Discord** — When the agent uses the `clarify` tool to ask you a multiple-choice question, it now shows real platform-native buttons on Telegram and Discord instead of asking you to type back the option number. Tap the button, the agent gets your answer. Especially nice on mobile. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
- **`vision_analyze` returns pixels to vision-capable models** — when the active model can see, `vision_analyze` now hands the image straight through instead of falling back to a text description. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
- **Discord channel history backfill (default on)** — When Hermes joins a Discord channel or thread for the first time, it now reads the recent message history so it knows what's been said before it responds. No more "what are we talking about?" — the agent has the context that's already on screen for everyone else. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
- **LSP semantic diagnostics on every write** — `write_file` and `patch` now run real language-server diagnostics on the post-edit file (delta-only) and surface real errors before they ship downstream. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
- **`vision_analyze` returns pixels to vision-capable models** — When you point the agent at an image with `vision_analyze` and the active model can actually see (GPT-5, Claude, Gemini, Grok-vision), Hermes now passes the raw pixels straight to the model instead of converting them to a text description first. You get the model's actual visual reasoning instead of a degraded text-summary round-trip. ([#22955](https://github.com/NousResearch/hermes-agent/pull/22955))
- **Per-turn file-mutation verifier footer** — after every turn that wrote files, the agent gets a verifier footer summarizing what actually changed on disk — catches silent overwrites and "wrote it but it didn't land" bugs. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
- **Per-turn file-mutation verifier footer** — After every turn that wrote or edited files, the agent now gets a short footer summarizing exactly what changed on disk — the file paths, the line counts, the actual delta. That means the agent catches its own mistakes when a write didn't land or got silently overwritten, instead of confidently telling you "I added the function" when the file wasn't actually saved. ([#24498](https://github.com/NousResearch/hermes-agent/pull/24498))
- **Unified `video_generate` with pluggable provider backends** — single tool, any backend. Drop in a new video provider as a plugin, no core changes. ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
- **LSP semantic diagnostics on every write** — When the agent uses `write_file` or `patch`, Hermes now runs a real language server against the edited file and surfaces any new errors back to the agent before the next turn. Type errors, undefined symbols, missing imports — caught immediately. Goes way beyond v0.13.0's basic Python/JSON/YAML/TOML linting because it's actual semantic analysis. ([#24168](https://github.com/NousResearch/hermes-agent/pull/24168), [#25978](https://github.com/NousResearch/hermes-agent/pull/25978))
- **`computer_use` cua-driver backend** — proper focus-safe ops, non-Anthropic provider support, refresh on `hermes update`. Computer-use is no longer locked to a single SDK. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
- **Unified `video_generate` with pluggable provider backends** — One tool, any video model. Hermes ships with the obvious backends already, but you can drop in a new video provider as a plugin without touching core. So when a new video model lands next month, it can be a one-file plugin instead of a fork. ([#25126](https://github.com/NousResearch/hermes-agent/pull/25126))
- **xAI Grok OAuth provider — SuperGrok via subscription** — sign in with your xAI account, talk to Grok models from Hermes. ([#26534](https://github.com/NousResearch/hermes-agent/pull/26534))
- **`computer_use` cua-driver backend — works with non-Anthropic models now** — Computer-use (the agent controlling your mouse and keyboard to drive GUI apps) used to be locked to Anthropic's SDK. The new cua-driver backend works with non-Anthropic providers too, has proper focus-safe operations, and refreshes itself on `hermes update`. Now any vision-capable model can drive your desktop. (re-salvage of #16936) ([#21967](https://github.com/NousResearch/hermes-agent/pull/21967), [#24063](https://github.com/NousResearch/hermes-agent/pull/24063))
- **Clarify with buttons — native inline keyboards on Telegram + Discord** — the `clarify` tool renders multi-choice prompts as platform-native buttons instead of typed responses. ([#24199](https://github.com/NousResearch/hermes-agent/pull/24199), [#25485](https://github.com/NousResearch/hermes-agent/pull/25485))
- **Clickable URLs in any terminal** — Links in agent output are now real OSC8 hyperlinks with hover-highlight in any terminal that supports them. Click to open in your browser — no more copy-paste-trim of long URLs from the transcript. Just works in iTerm2, Kitty, Ghostty, modern Windows Terminal, etc. (@OutThisLife) ([#25071](https://github.com/NousResearch/hermes-agent/pull/25071), [#24013](https://github.com/NousResearch/hermes-agent/pull/24013))
- **Discord channel history backfill (default on)** — Hermes reads recent channel history when joining a thread so it actually knows what's been said. ([#25984](https://github.com/NousResearch/hermes-agent/pull/25984))
- **Zed ACP Registry — `uvx` install in one click** — Hermes is now listed in Zed's Agent Client Protocol registry, so Zed users can install it with one click. The install path uses `uvx` so there's no npm dependency. `hermes acp --setup-browser` bootstraps the browser tools for registry-driven installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
- **Watchers skill — RSS / HTTP JSON / GitHub polling via cron `no_agent` mode** — skill recipes that wire change-detection sources directly into cron's script-only watchdog mode. ([#21881](https://github.com/NousResearch/hermes-agent/pull/21881))
- **OpenRouter Pareto Code router with `min_coding_score` knob** — OpenRouter's "Pareto" router automatically picks the cheapest model that meets a minimum quality bar. The new `min_coding_score` config lets you set that bar for coding tasks specifically — Hermes routes to the most affordable model that's at least that good at code. Stop paying for top-tier models when a mid-tier one would do. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
- **Zed ACP Registry integration + uvx distribution** — Hermes is in the Zed registry, installable via `uvx` (no npm). Plus `hermes acp --setup-browser` bootstraps browser tools for registry installs. (salvage of [#25908](https://github.com/NousResearch/hermes-agent/pull/25908)) ([#26079](https://github.com/NousResearch/hermes-agent/pull/26079), [#26120](https://github.com/NousResearch/hermes-agent/pull/26120), [#26234](https://github.com/NousResearch/hermes-agent/pull/26234))
- **NovitaAI as a new model provider** — NovitaAI joins the provider lineup, giving you another option for open-source model hosting (Llama, Qwen, DeepSeek, etc.) with their pricing and rate limits. (salvage #7219) (@kshitijk4poor) ([#25507](https://github.com/NousResearch/hermes-agent/pull/25507))
- **OpenRouter Pareto Code router** — wire a new OpenRouter router with `min_coding_score` knob. Pick the cheapest model that meets your quality bar. ([#22838](https://github.com/NousResearch/hermes-agent/pull/22838))
- **Codex app-server runtime for OpenAI/Codex models** — An optional runtime that drives OpenAI's Codex CLI under the hood when you're using OpenAI or Codex paths. You get session reuse, automatic retirement of wedged sessions, and proper OAuth refresh classification — the kind of plumbing that makes long agentic runs not fall over. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **Optional codex app-server runtime for OpenAI/Codex models** — drives the OpenAI Codex CLI under the hood for OpenAI/Codex paths, with session reuse, wedge retirement, and OAuth refresh classification. ([#24182](https://github.com/NousResearch/hermes-agent/pull/24182), [#25769](https://github.com/NousResearch/hermes-agent/pull/25769))
- **`huggingface/skills` as a trusted default tap** — The community skills index hosted at huggingface.co/skills is now wired into the Skills Hub by default. So when somebody publishes a useful skill there, you can install it from your own `hermes skills` browser without any extra config. (closes #2549) ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
- **`hermes-skills/huggingface` as a trusted default tap** — community skills index from huggingface.co/skills is available by default in the Skills Hub. ([#26219](https://github.com/NousResearch/hermes-agent/pull/26219))
- **9 new optional skills** — Hyperliquid (perp + spot trading via the SDK and REST API), Yahoo Finance (live market data, fundamentals, historicals), api-testing (REST + GraphQL debug recipes), unified EVM multi-chain (one skill covers Ethereum + L2s + Base), darwinian-evolver (evolutionary prompt/skill tuning), osint-investigation (OSINT recipes for people / domains / orgs), pinggy-tunnel (expose local services to the public internet), watchers (polls RSS / HTTP JSON / GitHub via cron `no_agent` mode for change detection), and a full Notion overhaul for the May 2026 Developer Platform. ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582), [#23583](https://github.com/NousResearch/hermes-agent/pull/23583), [#23590](https://github.com/NousResearch/hermes-agent/pull/23590), [#25299](https://github.com/NousResearch/hermes-agent/pull/25299), [#26760](https://github.com/NousResearch/hermes-agent/pull/26760), [#26729](https://github.com/NousResearch/hermes-agent/pull/26729), [#26765](https://github.com/NousResearch/hermes-agent/pull/26765), [#21881](https://github.com/NousResearch/hermes-agent/pull/21881), [#26612](https://github.com/NousResearch/hermes-agent/pull/26612))
- **9 new optional skills** — Hyperliquid (perp/spot trading via SDK + REST) (@kshitijk4poor & Hermes), Yahoo Finance market data, api-testing (REST/GraphQL debug), unified EVM multi-chain skill (folds #25291 + #2010 + base/), darwinian-evolver, osint-investigation (closes #355), pinggy-tunnel, watchers (RSS/HTTP/GitHub via cron), Notion overhaul for the Developer Platform (May 2026). ([#23582](https://github.com/NousResearch/hermes-agent/pull/23582), [#23583](https://github.com/NousResearch/hermes-agent/pull/23583), [#23590](https://github.com/NousResearch/hermes-agent/pull/23590), [#25299](https://github.com/NousResearch/hermes-agent/pull/25299), [#26760](https://github.com/NousResearch/hermes-agent/pull/26760), [#26729](https://github.com/NousResearch/hermes-agent/pull/26729), [#26765](https://github.com/NousResearch/hermes-agent/pull/26765), [#21881](https://github.com/NousResearch/hermes-agent/pull/21881), [#26612](https://github.com/NousResearch/hermes-agent/pull/26612))
- **API server exposes run approval events** — If you're driving Hermes programmatically through the HTTP API, long-running runs no longer silently hang when the agent hits an approval-required command. The approval request now surfaces on the API stream so your client can prompt the user and reply — no more silent stalls. (salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899))
- **API server exposes run approval events** — long-running runs surface approval requests over the API stream, no more silent stalls. (salvage of [#20311](https://github.com/NousResearch/hermes-agent/pull/20311)) ([#21899](https://github.com/NousResearch/hermes-agent/pull/21899))
- **Plugins can run any LLM call via `ctx.llm` + replace built-in tools via `tool_override`** — If you're writing a Hermes plugin, you now get first-class access to make LLM calls through the active provider and credentials — no manual client wiring. The new `tool_override` flag lets a plugin swap out a built-in tool with its own implementation cleanly. Plugin authors get the same model-routing and auth plumbing the core agent uses. (closes #11049) ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
- **`/subgoal` — user-added criteria appended to active `/goal`** — layer extra success criteria onto a running goal loop. The judge sees them in the prompt, no behavior change when subgoals are empty. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — Two new free web-search backends join Tavily, SearXNG, and Exa. Brave Search has a generous free tier; DDGS is the DuckDuckGo scraper that needs no key at all. Pick whichever fits your budget and rate-limit needs. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
- **Plugins can run any LLM call via `ctx.llm`** — plugins get a first-class hook to make their own LLM requests through the active provider/credentials, no manual wiring. Plus `tool_override` flag for replacing built-in tools. ([#23194](https://github.com/NousResearch/hermes-agent/pull/23194), [#26759](https://github.com/NousResearch/hermes-agent/pull/26759))
- **Sudo brute-force block + 3 dangerous-command bypasses closed + tool-error sanitization** — The approval gate now blocks `sudo -S` brute-force attempts and classifies stdin-fed or askpass-stripped sudo invocations as DANGEROUS. Three known bypasses of dangerous-command detection are closed (inspired by Claude Code's command-detection work). And tool error strings are now sanitized before being re-injected into the model context, so a malicious file or remote service can't pass instructions to your agent through error output. ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736), [#26829](https://github.com/NousResearch/hermes-agent/pull/26829), [#26823](https://github.com/NousResearch/hermes-agent/pull/26823))
- **Brave Search (free tier) + DuckDuckGo (DDGS) as web-search providers** — two new free search backends alongside Tavily / SearXNG / Exa. ([#21337](https://github.com/NousResearch/hermes-agent/pull/21337))
- **`/subgoal` — user-added criteria appended to an active `/goal`** — When you've got a `/goal` running (the persistent Ralph-loop goal where the agent keeps going until criteria are met), you can now use `/subgoal <text>` to layer extra success criteria onto it mid-run. The judge factors your new criteria into the done-or-keep-going decision without restarting the loop. ([#25449](https://github.com/NousResearch/hermes-agent/pull/25449))
- **Sudo brute-force block + sudo-stdin/askpass DANGEROUS classification** — closes the `sudo -S` brute-force avenue; approval gates classify stdin-fed and askpass-stripped sudo invocations as dangerous. (salvages of #22194 + #21128) ([#23736](https://github.com/NousResearch/hermes-agent/pull/23736))
- **Provider rename — Alibaba Cloud → Qwen Cloud** — The Alibaba Cloud provider is renamed to Qwen Cloud in the picker and config to match what the rest of the world calls it. Existing config keys still work — no breaking changes — but the UI matches the actual brand now. ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
- **Native Windows support (early beta)** — Hermes now runs natively on `cmd.exe` and PowerShell without WSL. A full PowerShell installer handles MinGit auto-install, Microsoft Store python stub detection, and the foreground Ctrl+C dance. There's still rough edges (this is the "early beta" stamp) — ~40 follow-up Windows-only fixes already landed in the window — but the basic loop works end-to-end on a clean Windows box. ([#21561](https://github.com/NousResearch/hermes-agent/pull/21561))
- **Provider rename — Alibaba Cloud → Qwen Cloud, picker reorder** — matches what the world calls it. Existing config keys still work. ([#24835](https://github.com/NousResearch/hermes-agent/pull/24835))
---

View File

@@ -907,72 +907,6 @@ def _build_polished_completion_content(
return [_text(text)]
def _build_patch_mode_content(patch_text: str) -> List[Any]:
"""Parse V4A patch mode input into ACP diff blocks when possible."""
if not patch_text:
return [acp.tool_content(acp.text_block(""))]
try:
from tools.patch_parser import OperationType, parse_v4a_patch
operations, error = parse_v4a_patch(patch_text)
if error or not operations:
return [acp.tool_content(acp.text_block(patch_text))]
content: List[Any] = []
for op in operations:
if op.operation == OperationType.UPDATE:
old_chunks: list[str] = []
new_chunks: list[str] = []
for hunk in op.hunks:
old_lines = [line.content for line in hunk.lines if line.prefix in {" ", "-"}]
new_lines = [line.content for line in hunk.lines if line.prefix in {" ", "+"}]
if old_lines or new_lines:
old_chunks.append("\n".join(old_lines))
new_chunks.append("\n".join(new_lines))
old_text = "\n...\n".join(chunk for chunk in old_chunks if chunk)
new_text = "\n...\n".join(chunk for chunk in new_chunks if chunk)
if old_text or new_text:
content.append(
acp.tool_diff_content(
path=op.file_path,
old_text=old_text or None,
new_text=new_text or "",
)
)
continue
if op.operation == OperationType.ADD:
added_lines = [line.content for hunk in op.hunks for line in hunk.lines if line.prefix == "+"]
content.append(
acp.tool_diff_content(
path=op.file_path,
new_text="\n".join(added_lines),
)
)
continue
if op.operation == OperationType.DELETE:
content.append(
acp.tool_diff_content(
path=op.file_path,
old_text=f"Delete file: {op.file_path}",
new_text="",
)
)
continue
if op.operation == OperationType.MOVE:
content.append(
acp.tool_content(acp.text_block(f"Move file: {op.file_path} -> {op.new_path}"))
)
return content or [acp.tool_content(acp.text_block(patch_text))]
except Exception:
return [acp.tool_content(acp.text_block(patch_text))]
def _strip_diff_prefix(path: str) -> str:
raw = str(path or "").strip()
if raw.startswith(("a/", "b/")):

View File

@@ -27,7 +27,6 @@ import threading
import time
import uuid
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional
from urllib.parse import urlparse, parse_qs, urlunparse
@@ -37,7 +36,6 @@ from agent.memory_manager import StreamingContextScrubber
from agent.model_metadata import (
MINIMUM_CONTEXT_LENGTH,
fetch_model_metadata,
get_model_context_length,
is_local_endpoint,
query_ollama_num_ctx,
)
@@ -52,7 +50,6 @@ from agent.tool_guardrails import (
from hermes_cli.config import cfg_get
from hermes_cli.timeouts import get_provider_request_timeout
from hermes_constants import get_hermes_home
from model_tools import check_toolset_requirements, get_tool_definitions
from utils import base_url_host_matches
# Use the same logger name as run_agent so tests patching ``run_agent.logger``
@@ -1201,6 +1198,18 @@ def init_agent(
_agent_section = {}
agent._tool_use_enforcement = _agent_section.get("tool_use_enforcement", "auto")
# Universal task-completion guidance toggle. Default True. Surfaced
# as a separate flag from tool_use_enforcement because the guidance
# applies to ALL models, not just the model families enforcement
# targets.
agent._task_completion_guidance = bool(_agent_section.get("task_completion_guidance", True))
# Local Python toolchain probe toggle. Default True. When False,
# the probe is skipped entirely (no subprocess calls, no system-prompt
# line). Useful for users on exotic setups where the probe heuristics
# are noisy.
agent._environment_probe = bool(_agent_section.get("environment_probe", True))
# App-level API retry count (wraps each model API call). Default 3,
# overridable via agent.api_max_retries in config.yaml. See #11616.
try:
@@ -1462,7 +1471,6 @@ def init_agent(
# Reject models whose context window is below the minimum required
# for reliable tool-calling workflows (64K tokens).
from agent.model_metadata import MINIMUM_CONTEXT_LENGTH
_ctx = getattr(agent.context_compressor, "context_length", 0)
if _ctx and _ctx < MINIMUM_CONTEXT_LENGTH:
raise ValueError(

View File

@@ -25,24 +25,17 @@ from __future__ import annotations
import copy
import json
import logging
import os
import re
import threading
import time
import uuid
from datetime import datetime
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from typing import Any, Dict, List, Optional
from hermes_cli.timeouts import get_provider_request_timeout
from agent.message_sanitization import (
_repair_tool_call_arguments,
_sanitize_surrogates,
)
from agent.tool_dispatch_helpers import _trajectory_normalize_msg, make_tool_result_message
from agent.trajectory import convert_scratchpad_to_think
from agent.credential_pool import STATUS_EXHAUSTED
from agent.error_classifier import classify_api_error, FailoverReason
from agent.error_classifier import FailoverReason
from utils import base_url_host_matches, base_url_hostname, env_var_enabled, atomic_json_write
logger = logging.getLogger(__name__)
@@ -1699,6 +1692,8 @@ def invoke_tool(agent, function_name: str, function_args: dict, effective_task_i
session_id=agent.session_id or "",
enabled_tools=list(agent.valid_tool_names) if agent.valid_tool_names else None,
skip_pre_tool_call_hook=True,
enabled_toolsets=getattr(agent, "enabled_toolsets", None),
disabled_toolsets=getattr(agent, "disabled_toolsets", None),
)

View File

@@ -894,20 +894,6 @@ def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
return None
def read_claude_managed_key() -> Optional[str]:
"""Read Claude's native managed key from ~/.claude.json for diagnostics only."""
claude_json = Path.home() / ".claude.json"
if claude_json.exists():
try:
data = json.loads(claude_json.read_text(encoding="utf-8"))
primary_key = data.get("primaryApiKey", "")
if isinstance(primary_key, str) and primary_key.strip():
return primary_key.strip()
except (json.JSONDecodeError, OSError, IOError) as e:
logger.debug("Failed to read ~/.claude.json: %s", e)
return None
def is_claude_code_token_valid(creds: Dict[str, Any]) -> bool:
"""Check if Claude Code credentials have a non-expired access token."""
import time
@@ -1256,10 +1242,16 @@ def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]:
print()
try:
webbrowser.open(auth_url)
print(" (Browser opened automatically)")
from hermes_cli.auth import _can_open_graphical_browser as _can_open_gui
except Exception:
pass
_can_open_gui = lambda: True # noqa: E731 — degrade to prior behavior
if _can_open_gui():
try:
webbrowser.open(auth_url)
print(" (Browser opened automatically)")
except Exception:
pass
print()
print("After authorizing, you'll see a code. Paste it below.")
@@ -1791,11 +1783,25 @@ def _strip_orphaned_tool_blocks(result: List[Dict[str, Any]]) -> None:
tool_result_ids.add(block.get("tool_use_id"))
for m in result:
if m["role"] == "assistant" and isinstance(m["content"], list):
m["content"] = [
kept = [
b
for b in m["content"]
if b.get("type") != "tool_use" or b.get("id") in tool_result_ids
]
# If stripping an orphaned tool_use mutated a turn that also carries a
# signed thinking block, that block's Anthropic signature was computed
# against the ORIGINAL (un-stripped) turn content and is now invalid.
# Anthropic rejects the replayed turn with HTTP 400 "thinking blocks in
# the latest assistant message cannot be modified". Flag the turn so
# _manage_thinking_signatures can demote the dead signature instead of
# replaying it verbatim. See hermes-agent: extended-thinking + parallel
# tool batch interrupted mid-flight → non-retryable 400 crash-loop.
if len(kept) != len(m["content"]) and any(
isinstance(b, dict) and b.get("type") in {"thinking", "redacted_thinking"}
for b in m["content"]
):
m["_thinking_signature_invalidated"] = True
m["content"] = kept
if not m["content"]:
m["content"] = [{"type": "text", "text": "(tool call removed)"}]
@@ -1840,6 +1846,10 @@ def _merge_consecutive_roles(result: List[Dict[str, Any]]) -> List[Dict[str, Any
fixed[-1]["content"] = prev_content + curr_content
else:
# Consecutive assistant messages — merge text content.
# Propagate the orphan-strip signature-invalidation flag onto the
# surviving (prev) dict so _manage_thinking_signatures still sees it.
if m.get("_thinking_signature_invalidated"):
fixed[-1]["_thinking_signature_invalidated"] = True
# Drop thinking blocks from the *second* message: their
# signature was computed against a different turn boundary
# and becomes invalid once merged.
@@ -1928,11 +1938,26 @@ def _manage_thinking_signatures(
else:
# Latest assistant on direct Anthropic: keep signed, downgrade unsigned
# to text so the reasoning isn't lost.
#
# Exception: if orphan-stripping (or another structural mutation) removed
# a tool_use block from THIS turn, every thinking signature on it was
# computed against the original turn content and is now dead. Anthropic
# rejects the turn either way — replaying the signed block 400s with
# "thinking blocks in the latest assistant message cannot be modified",
# and a bare signed block with no following tool_use is also invalid.
# Demote ALL thinking blocks on this turn to text so the turn replays
# cleanly and the model can re-plan from the surviving tool results.
signature_dead = bool(m.get("_thinking_signature_invalidated"))
new_content = []
for b in m["content"]:
if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
new_content.append(b)
continue
if signature_dead:
thinking_text = b.get("thinking", "")
if thinking_text:
new_content.append({"type": "text", "text": thinking_text})
continue
if b.get("type") == "redacted_thinking":
# Redacted blocks use 'data' for the signature payload —
# drop the block when 'data' is missing (can't be validated).
@@ -1952,6 +1977,9 @@ def _manage_thinking_signatures(
if isinstance(b, dict) and b.get("type") in _THINKING_TYPES:
b.pop("cache_control", None)
# Drop the internal bookkeeping flag — it must never reach the API payload.
m.pop("_thinking_signature_invalidated", None)
def _evict_old_screenshots(result: List[Dict[str, Any]]) -> None:
"""Keep only the most recent ``_MAX_KEEP_IMAGES`` computer-use screenshots.

View File

@@ -700,12 +700,20 @@ class _CodexCompletionsAdapter:
# xAI's Responses endpoint rejects ``pattern`` and ``format`` JSON Schema
# keywords (HTTP 400). Strip them here to match the parity guarantee that
# chat_completion_helpers.py provides for the main-agent xAI path.
#
# Deep-copy before sanitizing — ``list(tools)`` is only a shallow
# copy of the outer list, but the sanitizers mutate the inner
# parameter dicts in place. Without a deep copy the caller's
# tool registry permanently loses its slash-containing enum
# constraints after the first auxiliary xAI call. See #27907.
try:
import copy as _copy
from tools.schema_sanitizer import (
strip_pattern_and_format,
strip_slash_enum,
)
tools, _ = strip_pattern_and_format(list(tools))
tools = _copy.deepcopy(list(tools))
tools, _ = strip_pattern_and_format(tools)
tools, _ = strip_slash_enum(tools)
except Exception as exc:
logger.warning(
@@ -1235,8 +1243,23 @@ def _read_nous_auth() -> Optional[dict]:
def _nous_api_key(provider: dict) -> str:
"""Extract the Nous runtime credential from the compatibility field."""
return provider.get("agent_key") or provider.get("access_token", "")
"""Extract a usable Nous inference JWT from stored auth state."""
from hermes_cli.auth import _nous_invoke_jwt_is_usable
for token_key, expiry_key in (
("agent_key", "agent_key_expires_at"),
("access_token", "expires_at"),
):
token = provider.get(token_key)
if not isinstance(token, str) or not token.strip():
continue
if _nous_invoke_jwt_is_usable(
token,
scope=provider.get("scope"),
expires_at=provider.get(expiry_key),
):
return token
return ""
def _nous_base_url() -> str:
@@ -1248,25 +1271,16 @@ def _resolve_nous_runtime_api(*, force_refresh: bool = False) -> Optional[tuple[
"""Return fresh Nous runtime credentials when available.
This mirrors the main agent's 401 recovery path and keeps auxiliary
clients aligned with the singleton auth store + JWT/mint flow instead of
clients aligned with the singleton auth store + JWT refresh flow instead of
relying only on whatever raw tokens happen to be sitting in auth.json
or the credential pool.
"""
try:
from hermes_cli.auth import (
NOUS_INFERENCE_AUTH_MODE_AUTO,
NOUS_INFERENCE_AUTH_MODE_LEGACY,
resolve_nous_runtime_credentials,
)
from hermes_cli.auth import resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
inference_auth_mode=(
NOUS_INFERENCE_AUTH_MODE_LEGACY
if force_refresh
else NOUS_INFERENCE_AUTH_MODE_AUTO
),
force_refresh=force_refresh,
)
except Exception as exc:
logger.debug("Auxiliary Nous runtime credential resolution failed: %s", exc)
@@ -1550,13 +1564,9 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
_mark_provider_unhealthy("nous", ttl=60)
return None, None
if runtime is None and nous:
# Runtime credential mint failed but stored Nous auth is still present.
# Falls back to the raw stored token below; surface a debug line so
# operators investigating expired/invalid sessions have a breadcrumb,
# without blocking the fallback path the rest of this function relies on.
logger.debug(
"Auxiliary Nous: runtime credential mint failed; falling back to "
"stored auth.json token."
"Auxiliary Nous: runtime JWT refresh failed; checking stored "
"auth.json token."
)
global auxiliary_is_nous
auxiliary_is_nous = True
@@ -1594,6 +1604,13 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
api_key, base_url = runtime
else:
api_key = _nous_api_key(nous or {})
if not api_key:
logger.warning(
"Auxiliary Nous client unavailable: no usable inference JWT found "
"(run: hermes auth add nous)."
)
_mark_provider_unhealthy("nous", ttl=60)
return None, None
base_url = str((nous or {}).get("inference_base_url") or _nous_base_url()).rstrip("/")
return (
OpenAI(
@@ -1663,26 +1680,48 @@ def _read_main_provider() -> str:
# per turn — no lock needed. Cleared by ``clear_runtime_main()``.
_RUNTIME_MAIN_PROVIDER: str = ""
_RUNTIME_MAIN_MODEL: str = ""
_RUNTIME_MAIN_BASE_URL: str = ""
_RUNTIME_MAIN_API_KEY: str = ""
_RUNTIME_MAIN_API_MODE: str = ""
def set_runtime_main(provider: str, model: str) -> None:
"""Record the live runtime provider/model for the current AIAgent.
def set_runtime_main(
provider: str,
model: str,
*,
base_url: str = "",
api_key: str = "",
api_mode: str = "",
) -> None:
"""Record the live runtime provider/model/credentials for the current AIAgent.
Called by ``run_agent.AIAgent._sync_runtime_main_for_aux_routing`` (or
equivalent setter) at the top of each turn so that
``_read_main_provider`` / ``_read_main_model`` reflect CLI/gateway
overrides instead of the stale config.yaml default.
For ``custom:`` providers, ``base_url`` and ``api_key`` must also be
recorded so that ``_resolve_auto`` can construct a valid client in
Step 1 instead of falling through to the aggregator chain.
"""
global _RUNTIME_MAIN_PROVIDER, _RUNTIME_MAIN_MODEL
global _RUNTIME_MAIN_BASE_URL, _RUNTIME_MAIN_API_KEY, _RUNTIME_MAIN_API_MODE
_RUNTIME_MAIN_PROVIDER = (provider or "").strip().lower()
_RUNTIME_MAIN_MODEL = (model or "").strip()
_RUNTIME_MAIN_BASE_URL = (base_url or "").strip()
_RUNTIME_MAIN_API_KEY = api_key.strip() if isinstance(api_key, str) else ""
_RUNTIME_MAIN_API_MODE = (api_mode or "").strip()
def clear_runtime_main() -> None:
"""Clear the runtime override (e.g. on session end)."""
global _RUNTIME_MAIN_PROVIDER, _RUNTIME_MAIN_MODEL
global _RUNTIME_MAIN_BASE_URL, _RUNTIME_MAIN_API_KEY, _RUNTIME_MAIN_API_MODE
_RUNTIME_MAIN_PROVIDER = ""
_RUNTIME_MAIN_MODEL = ""
_RUNTIME_MAIN_BASE_URL = ""
_RUNTIME_MAIN_API_KEY = ""
_RUNTIME_MAIN_API_MODE = ""
def _resolve_custom_runtime() -> Tuple[Optional[str], Optional[str], Optional[str]]:
@@ -2357,7 +2396,16 @@ def _is_auth_error(exc: Exception) -> bool:
if status == 401:
return True
err_lower = str(exc).lower()
return "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower()
if "error code: 401" in err_lower or "authenticationerror" in type(exc).__name__.lower():
return True
# xAI returns HTTP 403 with "unauthenticated:bad-credentials" when an OAuth2
# access token has expired or is invalid — semantically a 401 auth failure,
# even though the status code is 403 (PermissionDenied).
if status == 403 and "bad-credentials" in err_lower:
return True
if "unauthenticated" in err_lower and "bad-credentials" in err_lower:
return True
return False
def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
@@ -2510,6 +2558,8 @@ def _recoverable_pool_provider(
return "copilot"
if base_url_host_matches(base, "api.kimi.com"):
return "kimi-coding"
if base_url_host_matches(base, "api.x.ai"):
return "xai-oauth"
# For api_key providers not in the hardcoded list (e.g. opencode-go), match
# the client base URL against all registered api_key providers so that
# credential-pool rotation works for any provider the user configured.
@@ -2706,15 +2756,11 @@ def _refresh_provider_credentials(provider: str) -> bool:
_evict_cached_clients(normalized)
return True
if normalized == "nous":
from hermes_cli.auth import (
NOUS_INFERENCE_AUTH_MODE_LEGACY,
resolve_nous_runtime_credentials,
)
from hermes_cli.auth import resolve_nous_runtime_credentials
creds = resolve_nous_runtime_credentials(
min_key_ttl_seconds=max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800"))),
timeout_seconds=float(os.getenv("HERMES_NOUS_TIMEOUT_SECONDS", "15")),
inference_auth_mode=NOUS_INFERENCE_AUTH_MODE_LEGACY,
force_refresh=True,
)
if not str(creds.get("api_key", "") or "").strip():
return False
@@ -2731,6 +2777,24 @@ def _refresh_provider_credentials(provider: str) -> bool:
return False
_evict_cached_clients(normalized)
return True
if normalized == "xai-oauth":
# Preference: pool-level refresh (uses refresh_token from pool entry),
# then fall back to singleton auth-store resolver.
pool = load_pool(normalized)
if pool and pool.has_credentials():
# Ensure a current entry is selected before trying to refresh.
pool.select()
refreshed = pool.try_refresh_current()
if refreshed is not None and str(getattr(refreshed, "runtime_api_key", "") or "").strip():
_evict_cached_clients(normalized)
return True
from hermes_cli.auth import resolve_xai_oauth_runtime_credentials
creds = resolve_xai_oauth_runtime_credentials(force_refresh=True)
if not str(creds.get("api_key", "") or "").strip():
return False
_evict_cached_clients(normalized)
return True
except Exception as exc:
logger.debug("Auxiliary provider credential refresh failed for %s: %s", normalized, exc)
return False
@@ -2938,6 +3002,18 @@ def _resolve_auto(main_runtime: Optional[Dict[str, Any]] = None) -> Tuple[Option
runtime_api_key = runtime.get("api_key", "")
runtime_api_mode = str(runtime.get("api_mode") or "")
# Fall back to process-local globals when main_runtime dict was not
# provided or was incomplete. ``set_runtime_main()`` now records
# base_url/api_key/api_mode alongside provider/model, so custom:
# providers get the full credential surface in Step 1 of the
# auto-detect chain.
if not runtime_base_url and _RUNTIME_MAIN_BASE_URL:
runtime_base_url = _RUNTIME_MAIN_BASE_URL
if not runtime_api_key and _RUNTIME_MAIN_API_KEY:
runtime_api_key = _RUNTIME_MAIN_API_KEY
if not runtime_api_mode and _RUNTIME_MAIN_API_MODE:
runtime_api_mode = _RUNTIME_MAIN_API_MODE
# ── Warn once if OPENAI_BASE_URL is set but config.yaml uses a named
# provider (not 'custom'). This catches the common "env poisoning"
# scenario where a user switches providers via `hermes model` but the
@@ -4683,24 +4759,23 @@ def _build_call_kwargs(
kwargs["temperature"] = temperature
if max_tokens is not None:
# Codex adapter handles max_tokens internally; OpenRouter/Nous use max_tokens.
# Direct OpenAI api.openai.com with newer models needs max_completion_tokens.
# ZAI vision models (glm-4v-flash, glm-4v-plus, etc.) reject max_tokens with
# error code 1210 ("API 调用参数有误") on multimodal requests — skip it.
_model_lower = (model or "").lower()
_skip_max_tokens = (
provider == "zai"
and ("4v" in _model_lower or "5v" in _model_lower or "-v" in _model_lower)
# We do NOT cap output by default. Most chat-completions providers treat
# an omitted max_tokens as "use the model's max output", which is what we
# want for auxiliary tasks (compression summaries, titles, vision, etc.) —
# an explicit cap only risks truncating a summary or 400-ing on providers
# that reject the parameter outright (e.g. GitHub Copilot / newer OpenAI
# GPT-5 models require max_completion_tokens, not max_tokens; ZAI vision
# models reject it entirely with error 1210). Omitting it sidesteps all of
# those wire-format quirks at once.
#
# The one exception is the Anthropic Messages wire (MiniMax and any
# ``/anthropic`` endpoint reached through the OpenAI SDK wrapper), where
# max_tokens is a MANDATORY field — omitting it is a hard 400. Keep it only
# there.
_effective_base = base_url or (
_current_custom_base_url() if provider == "custom" else ""
)
if _skip_max_tokens:
pass # ZAI vision models do not accept max_tokens
elif provider == "custom":
custom_base = base_url or _current_custom_base_url()
if base_url_hostname(custom_base) == "api.openai.com":
kwargs["max_completion_tokens"] = max_tokens
else:
kwargs["max_tokens"] = max_tokens
else:
if _is_anthropic_compat_endpoint(provider, _effective_base):
kwargs["max_tokens"] = max_tokens
if tools:

View File

@@ -1167,18 +1167,6 @@ def _extract_provider_from_arn(arn: str) -> str:
"""
match = re.search(r"foundation-model/([^.]+)", arn)
return match.group(1) if match else ""
def get_bedrock_model_ids(region: str) -> List[str]:
"""Return a flat list of available Bedrock model IDs for the given region.
Convenience wrapper around ``discover_bedrock_models()`` for use in
the model selection UI.
"""
models = discover_bedrock_models(region)
return [m["id"] for m in models]
# ---------------------------------------------------------------------------
# Error classification — Bedrock-specific exceptions
# ---------------------------------------------------------------------------

View File

@@ -186,37 +186,6 @@ def _resolve(configured: Optional[str]) -> Optional[BrowserProvider]:
return None
def get_active_browser_provider() -> Optional[BrowserProvider]:
"""Resolve the currently-active cloud browser provider.
Reads ``browser.cloud_provider`` from config.yaml; falls back per the
module docstring. Returns None for local mode or when no provider is
available.
"""
try:
from hermes_cli.config import read_raw_config
cfg = read_raw_config()
browser_cfg = cfg.get("browser", {})
except Exception as exc:
logger.debug("Could not read browser config: %s", exc)
browser_cfg = {}
configured: Optional[str] = None
if isinstance(browser_cfg, dict) and "cloud_provider" in browser_cfg:
try:
from tools.tool_backend_helpers import normalize_browser_cloud_provider
configured = normalize_browser_cloud_provider(
browser_cfg.get("cloud_provider")
)
except Exception as exc:
logger.debug("normalize_browser_cloud_provider failed: %s", exc)
configured = None
return _resolve(configured)
def _reset_for_tests() -> None:
"""Clear the registry. **Test-only.**"""
with _lock:

View File

@@ -15,49 +15,23 @@ sites unchanged. Symbols that tests patch on ``run_agent`` (e.g.
from __future__ import annotations
import concurrent.futures
import contextvars
import copy
import json
import logging
import os
import random
import re
import sys
import threading
import time
import uuid
from datetime import datetime
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse, parse_qs, urlunparse
from typing import Any, Dict, Optional
from hermes_cli.timeouts import get_provider_request_timeout, get_provider_stale_timeout
from hermes_constants import PARTIAL_STREAM_STUB_ID, FINISH_REASON_LENGTH
from agent.error_classifier import classify_api_error, FailoverReason
from agent.error_classifier import FailoverReason
from agent.model_metadata import is_local_endpoint
from agent.message_sanitization import (
_sanitize_surrogates,
_sanitize_messages_surrogates,
_sanitize_structure_surrogates,
_sanitize_messages_non_ascii,
_sanitize_tools_non_ascii,
_sanitize_structure_non_ascii,
_strip_images_from_messages,
_strip_non_ascii,
_repair_tool_call_arguments,
_escape_invalid_chars_in_json_strings,
)
from agent.tool_dispatch_helpers import (
_is_multimodal_tool_result,
_multimodal_text_summary,
)
from agent.retry_utils import jittered_backoff
from agent.tool_guardrails import (
ToolGuardrailDecision,
append_toolguard_guidance,
toolguard_synthetic_result,
)
from tools.terminal_tool import is_persistent_env
from utils import base_url_host_matches, base_url_hostname
@@ -175,13 +149,6 @@ def interruptible_api_call(agent, api_kwargs: dict):
request_client_holder["owner_tid"] = threading.get_ident()
return client
def _take_request_client():
with request_client_lock:
client = request_client_holder.get("client")
request_client_holder["client"] = None
request_client_holder["owner_tid"] = None
return client
def _close_request_client_once(reason: str) -> None:
# #29507: dispatch on the calling thread.
#
@@ -310,8 +277,15 @@ def interruptible_api_call(agent, api_kwargs: dict):
else:
_codex_idle_timeout_default = 12.0
# No-byte TTFB cutoff. The OpenAI SDK's own streaming read timeout is far
# longer (openai 2.x DEFAULT_TIMEOUT.read = 600s), so a tight 12s default
# killed subscription-backed Codex requests mid-prefill before the backend
# had a chance to emit its first SSE event. Default to 120s — long enough to
# clear normal backend admission / prompt prefill, short enough to still
# reconnect promptly when the socket is genuinely wedged. Set
# HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0 to disable this watchdog entirely.
_ttfb_enabled = _codex_watchdog_enabled
_ttfb_timeout = _env_float("HERMES_CODEX_TTFB_TIMEOUT_SECONDS", 12.0)
_ttfb_timeout = _env_float("HERMES_CODEX_TTFB_TIMEOUT_SECONDS", 120.0)
if _ttfb_timeout <= 0:
_ttfb_enabled = False
elif _openai_codex_backend:
@@ -333,7 +307,7 @@ def interruptible_api_call(agent, api_kwargs: dict):
_ttfb_disable_above,
)
else:
_ttfb_cap = _env_float("HERMES_CODEX_TTFB_MAX_SECONDS", 20.0)
_ttfb_cap = _env_float("HERMES_CODEX_TTFB_MAX_SECONDS", 120.0)
if _ttfb_cap > 0 and _ttfb_timeout > _ttfb_cap:
logger.info(
"Capping openai-codex no-byte TTFB timeout from %.0fs to %.0fs "
@@ -614,12 +588,23 @@ def build_api_kwargs(agent, api_messages: list) -> dict:
# It also rejects ``enum`` values containing ``/`` (HuggingFace IDs
# like ``Qwen/Qwen3.5-0.8B`` shipped by MCP servers) — same 400 with
# the same opaque message; strip those enums too.
#
# Deep-copy ``tools_for_api`` before sanitizing: the sanitizers
# mutate in place (documented contract on ``strip_slash_enum`` /
# ``strip_pattern_and_format``), and ``tools_for_api`` is a direct
# reference to ``agent.tools``. Without the copy, the first xAI
# request permanently strips constraints from the shared per-agent
# tool registry — every subsequent non-xAI call from the same
# agent (auxiliary task routed to Anthropic, OpenRouter fallback,
# main-model swap) sees the already-stripped schema. See #27907.
if is_xai_responses:
try:
import copy as _copy
from tools.schema_sanitizer import (
strip_pattern_and_format,
strip_slash_enum,
)
tools_for_api = _copy.deepcopy(tools_for_api)
tools_for_api, _ = strip_pattern_and_format(tools_for_api)
tools_for_api, _ = strip_slash_enum(tools_for_api)
except Exception as exc:
@@ -1298,6 +1283,18 @@ def handle_max_iterations(agent, messages: list, api_call_count: int) -> str:
agent._copy_reasoning_content_for_api(msg, api_msg)
for internal_field in ("reasoning", "finish_reason", "_thinking_prefill"):
api_msg.pop(internal_field, None)
# Strict OpenAI-compatible gateways (Fireworks-backed OpenCode Go,
# Mistral, Moonshot/Kimi) reject any message key outside the Chat
# Completions schema. The main loop drops these via
# ChatCompletionsTransport.convert_messages(), but the summary path
# hand-builds messages and calls chat.completions.create() directly,
# bypassing the transport — so mirror that sanitization here:
# tool_name (SQLite FTS bookkeeping), the codex_* reasoning carriers,
# and every Hermes-internal underscore-prefixed scaffolding key.
for schema_foreign in ("tool_name", "codex_reasoning_items", "codex_message_items"):
api_msg.pop(schema_foreign, None)
for internal_key in [k for k in api_msg if isinstance(k, str) and k.startswith("_")]:
api_msg.pop(internal_key, None)
if _needs_sanitize:
agent._sanitize_tool_calls_for_strict_api(api_msg)
api_messages.append(api_msg)
@@ -1636,13 +1633,6 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
request_client_holder["owner_tid"] = threading.get_ident()
return client
def _take_request_client():
with request_client_lock:
client = request_client_holder.get("client")
request_client_holder["client"] = None
request_client_holder["owner_tid"] = None
return client
def _close_request_client_once(reason: str) -> None:
# See #29507 explanation in the non-streaming variant above. A
# stranger thread (the interrupt-check / stale-stream detector loop)

View File

@@ -980,6 +980,48 @@ def _extract_responses_reasoning_text(item: Any) -> str:
return ""
def _format_responses_error(error_obj: Any, response_status: str) -> str:
"""Build a human-readable error string from a Responses ``response.error`` payload.
The OpenAI Responses API carries failure details under ``response.error``
on terminal ``response.failed`` events, in the shape
``{"code": "rate_limit_exceeded", "message": "Slow down", "param": ...}``.
Earlier code only surfaced ``message``, which left users staring at bare
strings like ``"Slow down"`` while the failure mode (rate limit vs
context-length vs internal_error vs model-overloaded) was hidden in
``code``. We now prefix ``code`` when both are present so consumers can
distinguish failure modes without parsing the bare message.
Falls back to ``code`` alone when ``message`` is empty, and to a stable
default referencing the response status when no error payload is
available at all. Adapted from anomalyco/opencode#28757.
"""
# Pull code and message from either dict or attribute-style payloads.
code: Any = None
message: Any = None
if isinstance(error_obj, dict):
code = error_obj.get("code")
message = error_obj.get("message")
elif error_obj is not None:
code = getattr(error_obj, "code", None)
message = getattr(error_obj, "message", None)
code_str = str(code).strip() if isinstance(code, str) else (str(code).strip() if code else "")
message_str = str(message).strip() if isinstance(message, str) else (str(message).strip() if message else "")
if code_str and message_str:
return f"{code_str}: {message_str}"
if message_str:
return message_str
if code_str:
return code_str
if error_obj:
# Last-resort: stringify whatever the provider sent so it's at least
# visible in logs/UI rather than silently swallowed.
return str(error_obj)
return f"Responses API returned status '{response_status}'"
# ---------------------------------------------------------------------------
# Full response normalization
# ---------------------------------------------------------------------------
@@ -1023,10 +1065,7 @@ def _normalize_codex_response(
if response_status in {"failed", "cancelled"}:
error_obj = getattr(response, "error", None)
if isinstance(error_obj, dict):
error_msg = error_obj.get("message") or str(error_obj)
else:
error_msg = str(error_obj) if error_obj else f"Responses API returned status '{response_status}'"
error_msg = _format_responses_error(error_obj, response_status)
raise RuntimeError(error_msg)
content_parts: List[str] = []

View File

@@ -16,7 +16,6 @@ compatibility.
from __future__ import annotations
import json
import logging
import os
import time

View File

@@ -40,17 +40,47 @@ SUMMARY_PREFIX = (
"window — treat it as background reference, NOT as active instructions. "
"Do NOT answer questions or fulfill requests mentioned in this summary; "
"they were already addressed. "
"Your current task is identified in the '## Active Task' section of the "
"summary — resume exactly from there. "
"Respond ONLY to the latest user message that appears AFTER this "
"summary — that message is the single source of truth for what to do "
"right now. "
"If the latest user message is consistent with the '## Active Task' "
"section, you may use the summary as background. If the latest user "
"message contradicts, supersedes, changes topic from, or in any way "
"diverges from '## Active Task' / '## In Progress' / '## Pending User "
"Asks' / '## Remaining Work', the latest message WINS — discard those "
"stale items entirely and do not 'wrap up the old task first'. "
"Reverse signals in the latest message (e.g. 'stop', 'undo', 'roll "
"back', 'just verify', 'don't do that anymore', 'never mind', a new "
"topic) must immediately end any in-flight work described in the "
"summary; do not re-surface it in later turns. "
"IMPORTANT: Your persistent memory (MEMORY.md, USER.md) in the system "
"prompt is ALWAYS authoritative and active — never ignore or deprioritize "
"memory content due to this compaction note. "
"Respond ONLY to the latest user message "
"that appears AFTER this summary. The current session state (files, "
"config, etc.) may reflect work described here — avoid repeating it:"
"The current session state (files, config, etc.) may reflect work "
"described here — avoid repeating it:"
)
LEGACY_SUMMARY_PREFIX = "[CONTEXT SUMMARY]:"
# Handoff prefixes that shipped in earlier releases. A summary persisted under
# one of these can be inherited into a resumed lineage (#35344); when it is
# re-normalized on re-compaction we must strip the OLD prefix too, otherwise the
# stale directive it carried (e.g. "resume exactly from Active Task") survives
# embedded in the body and keeps hijacking replies. Keep newest-first; entries
# are matched literally. Add a frozen copy here whenever SUMMARY_PREFIX changes.
_HISTORICAL_SUMMARY_PREFIXES = (
# Pre-#35344: contained the self-contradicting "resume exactly" directive.
"[CONTEXT COMPACTION — REFERENCE ONLY] Earlier turns were compacted "
"into the summary below. This is a handoff from a previous context "
"window — treat it as background reference, NOT as active instructions. "
"Do NOT answer questions or fulfill requests mentioned in this summary; "
"they were already addressed. "
"Your current task is identified in the '## Active Task' section of the "
"summary — resume exactly from there. "
"Respond ONLY to the latest user message "
"that appears AFTER this summary. The current session state (files, "
"config, etc.) may reflect work described here — avoid repeating it:",
)
# Minimum tokens for the summary output
_MIN_SUMMARY_TOKENS = 2000
# Proportion of compressed content to allocate for summary
@@ -75,6 +105,44 @@ _IMAGE_TOKEN_ESTIMATE = 1600
_IMAGE_CHAR_EQUIVALENT = _IMAGE_TOKEN_ESTIMATE * _CHARS_PER_TOKEN
_SUMMARY_FAILURE_COOLDOWN_SECONDS = 600
# Hard ceiling for the deterministic summary-failure handoff. The fallback is
# only meant to preserve continuity anchors from the dropped window, not to
# become another unbounded transcript copy after the LLM summarizer failed.
_FALLBACK_SUMMARY_MAX_CHARS = 8_000
_FALLBACK_TURN_MAX_CHARS = 700
_PATH_MENTION_RE = re.compile(r"(?:/|~/?|[A-Za-z]:\\)[^\s`'\")\]}<>]+")
def _dedupe_append(items: list[str], value: str, *, limit: int) -> None:
value = value.strip()
if value and value not in items and len(items) < limit:
items.append(value)
def _extract_tool_call_name_and_args(tool_call: Any) -> tuple[str, str]:
"""Return a best-effort ``(name, arguments)`` pair for dict/object tool calls."""
if isinstance(tool_call, dict):
fn = tool_call.get("function") or {}
return str(fn.get("name") or "unknown"), str(fn.get("arguments") or "")
fn = getattr(tool_call, "function", None)
if fn is None:
return "unknown", ""
return str(getattr(fn, "name", None) or "unknown"), str(getattr(fn, "arguments", None) or "")
def _extract_tool_call_id(tool_call: Any) -> str:
if isinstance(tool_call, dict):
return str(tool_call.get("id") or "")
return str(getattr(tool_call, "id", "") or "")
def _collect_path_mentions(text: str, relevant_files: list[str], *, limit: int = 12) -> None:
for match in _PATH_MENTION_RE.findall(text):
_dedupe_append(relevant_files, match.rstrip(".,:;"), limit=limit)
def _content_length_for_budget(raw_content: Any) -> int:
"""Return the effective char-length of a message's content for token budgeting.
@@ -480,6 +548,10 @@ class ContextCompressor(ContextEngine):
self._last_compression_savings_pct = 100.0
self._ineffective_compression_count = 0
self._summary_failure_cooldown_until = 0.0 # transient errors must not block a fresh session
self.last_real_prompt_tokens = 0
self.last_compression_rough_tokens = 0
self.last_rough_tokens_when_real_prompt_fit = 0
self.awaiting_real_usage_after_compression = False
def update_model(
self,
@@ -537,8 +609,8 @@ class ContextCompressor(ContextEngine):
self.quiet_mode = quiet_mode
# When True, summary-generation failure aborts compression entirely
# (returns messages unchanged, sets _last_compress_aborted=True).
# When False (default = historical behavior), insert a static
# "summary unavailable" placeholder and drop the middle window.
# When False (default = historical behavior), insert a
# deterministic "summary unavailable" handoff and drop the middle window.
self.abort_on_summary_failure = abort_on_summary_failure
self.context_length = get_model_context_length(
@@ -577,6 +649,10 @@ class ContextCompressor(ContextEngine):
self.last_prompt_tokens = 0
self.last_completion_tokens = 0
self.last_real_prompt_tokens = 0
self.last_compression_rough_tokens = 0
self.last_rough_tokens_when_real_prompt_fit = 0
self.awaiting_real_usage_after_compression = False
self.summary_model = summary_model_override or ""
@@ -610,6 +686,44 @@ class ContextCompressor(ContextEngine):
self.last_prompt_tokens = usage.get("prompt_tokens", 0)
self.last_completion_tokens = usage.get("completion_tokens", 0)
self.last_total_tokens = usage.get("total_tokens", self.last_prompt_tokens + self.last_completion_tokens)
if self.last_prompt_tokens > 0:
self.last_real_prompt_tokens = self.last_prompt_tokens
if self.last_prompt_tokens < self.threshold_tokens:
if self.awaiting_real_usage_after_compression and self.last_compression_rough_tokens > 0:
self.last_rough_tokens_when_real_prompt_fit = self.last_compression_rough_tokens
else:
self.last_rough_tokens_when_real_prompt_fit = 0
self.awaiting_real_usage_after_compression = False
def should_defer_preflight_to_real_usage(self, rough_tokens: int) -> bool:
"""Return True when a high rough preflight estimate is known-noisy.
``estimate_request_tokens_rough(..., tools=...)`` intentionally
overestimates schema-heavy requests so Hermes compresses before a
provider rejects the payload. After a successful compressed API call,
though, provider ``prompt_tokens`` are a better signal than repeating
compaction from the same rough schema overhead. Defer only while the
rough estimate has grown modestly since a request the provider proved
fit under the threshold.
"""
if rough_tokens < self.threshold_tokens:
return False
if self.last_real_prompt_tokens <= 0:
return False
if self.last_real_prompt_tokens >= self.threshold_tokens:
return False
baseline = self.last_rough_tokens_when_real_prompt_fit or self.last_compression_rough_tokens
if baseline <= 0:
return False
growth = max(0, rough_tokens - baseline)
tolerated_growth = max(4096, int(self.threshold_tokens * 0.05))
if growth > tolerated_growth:
return False
self.last_rough_tokens_when_real_prompt_fit = max(baseline, rough_tokens)
return True
def should_compress(self, prompt_tokens: int = None) -> bool:
"""Check if context exceeds the compression threshold.
@@ -884,6 +998,195 @@ class ContextCompressor(ContextEngine):
return "\n\n".join(parts)
def _build_static_fallback_summary(
self,
turns_to_summarize: List[Dict[str, Any]],
reason: str | None = None,
) -> str:
"""Build a deterministic handoff when the LLM summarizer is unavailable.
This is intentionally much less rich than an LLM-written summary, but it
is still better than a bare "N messages were removed" marker. It keeps
the most useful continuity anchors that can be extracted locally:
recent user asks, assistant/tool actions, files/commands mentioned in
tool calls, and any error text. The result uses the normal summary
structure so downstream prompts can recover gracefully after a provider
outage or summary-model failure.
"""
user_asks: list[str] = []
assistant_actions: list[str] = []
tool_actions: list[str] = []
relevant_files: list[str] = []
blockers: list[str] = []
last_dropped_turns: list[str] = []
def _compact_fallback_turn(value: Any) -> str:
text = redact_sensitive_text(_content_text_for_contains(value))
text = re.sub(r"\bgh[pousr]_[A-Za-z0-9_]{8,}\b", "[REDACTED]", text)
text = re.sub(r"\s+", " ", text).strip()
if len(text) > _FALLBACK_TURN_MAX_CHARS:
text = text[: _FALLBACK_TURN_MAX_CHARS - 15].rstrip() + " ...[truncated]"
return re.sub(r"\bgh[pousr]_[A-Za-z0-9_.-]+", "[REDACTED]", text)
def _remember_dropped_turn(label: str, text: str, *, limit: int = 8) -> None:
text = text.strip()
if not text:
return
last_dropped_turns.append(f"{label}: {text}")
if len(last_dropped_turns) > limit:
del last_dropped_turns[0]
def _collect_paths_from_jsonish(obj: Any) -> None:
if isinstance(obj, dict):
for key, val in obj.items():
if key in {"path", "workdir", "file_path", "output_path"} and isinstance(val, str):
_dedupe_append(relevant_files, val, limit=12)
_collect_paths_from_jsonish(val)
elif isinstance(obj, list):
for val in obj:
_collect_paths_from_jsonish(val)
elif isinstance(obj, str):
_collect_path_mentions(obj, relevant_files)
call_id_to_tool: dict[str, tuple[str, str]] = {}
for msg in turns_to_summarize:
if msg.get("role") == "assistant" and msg.get("tool_calls"):
for tc in msg.get("tool_calls") or []:
name, raw_args = _extract_tool_call_name_and_args(tc)
args = redact_sensitive_text(raw_args)
call_id = _extract_tool_call_id(tc)
if call_id:
call_id_to_tool[call_id] = (name, args)
if args:
try:
parsed = json.loads(args)
except Exception:
parsed = args
_collect_paths_from_jsonish(parsed)
for msg in turns_to_summarize:
role = msg.get("role", "unknown")
text = _compact_fallback_turn(msg.get("content"))
_collect_path_mentions(text, relevant_files)
turn_text = text
turn_tool_names: list[str] = []
if role == "assistant" and msg.get("tool_calls"):
for tc in msg.get("tool_calls") or []:
name, _args = _extract_tool_call_name_and_args(tc)
turn_tool_names.append(name)
if turn_tool_names:
prefix = "tool calls: " + ", ".join(turn_tool_names[:6])
turn_text = f"{prefix}; {turn_text}" if turn_text else prefix
_remember_dropped_turn(str(role).upper(), turn_text)
if len(text) > 600:
text = text[:420].rstrip() + " ... " + text[-160:].lstrip()
if role == "user" and text:
user_asks.append(text)
elif role == "assistant":
tool_names: list[str] = []
for tc in msg.get("tool_calls") or []:
name, _args = _extract_tool_call_name_and_args(tc)
tool_names.append(name)
if tool_names:
assistant_actions.append(
"Called tool(s): " + ", ".join(tool_names[:6])
)
elif text:
assistant_actions.append(text)
elif role == "tool":
call_id = str(msg.get("tool_call_id") or "")
tool_name, tool_args = call_id_to_tool.get(call_id, ("unknown", ""))
tool_actions.append(
_summarize_tool_result(tool_name, tool_args, text or "")
)
if re.search(
r"\b(error|failed|exception|traceback|timeout|timed out|fatal)\b",
text,
re.I,
):
blockers.append(text[:500])
def _bullets(items: list[str], limit: int = 8) -> str:
unique: list[str] = []
seen: set[str] = set()
for item in items:
item = item.strip()
if not item or item in seen:
continue
seen.add(item)
unique.append(item)
if len(unique) >= limit:
break
return "\n".join(f"- {item}" for item in unique) if unique else "None."
completed: list[str] = []
for idx, item in enumerate((assistant_actions + tool_actions)[:12], start=1):
completed.append(f"{idx}. {item}")
active_task = (
f"User asked: {user_asks[-1]!r}"
if user_asks
else "Unknown from deterministic fallback."
)
previous_summary_note = ""
if self._previous_summary:
previous_summary_note = (
"\n\nPrevious compaction summary was present and should still be treated as "
"background continuity context, but the latest LLM summary update failed."
)
reason_text = f" Summary failure reason: {reason}." if reason else ""
body = f"""## Active Task
{active_task}
## Goal
Recovered from a deterministic fallback because the LLM context summarizer was unavailable. Continue from the protected recent messages after this summary and use current file/system state for exact details.{previous_summary_note}
## Constraints & Preferences
- This fallback was generated locally without an LLM summary call.
- Secrets and credentials were redacted before preservation.
- The summary may be incomplete; prefer verifying current files, git state, processes, and test results instead of assuming omitted details.
## Completed Actions
{chr(10).join(completed) if completed else "None recoverable from compacted turns."}
## Active State
Unknown from deterministic fallback. Inspect current repository/session state if needed.
## In Progress
{active_task}
## Blocked
{_bullets(blockers, limit=5)}
## Key Decisions
None recoverable from deterministic fallback.
## Resolved Questions
None recoverable from deterministic fallback.
## Pending User Asks
{active_task}
## Relevant Files
{_bullets(relevant_files, limit=12)}
## Remaining Work
Continue from the most recent unfulfilled user ask and protected tail messages. Verify state with tools before making claims.
## Last Dropped Turns
{_bullets(last_dropped_turns, limit=8)}
## Critical Context
Summary generation was unavailable, so this is a best-effort deterministic fallback for {len(turns_to_summarize)} compacted message(s).{reason_text}"""
summary = self._with_summary_prefix(redact_sensitive_text(body.strip()))
if len(summary) > _FALLBACK_SUMMARY_MAX_CHARS:
summary = summary[: _FALLBACK_SUMMARY_MAX_CHARS - 42].rstrip() + "\n...[fallback summary truncated]"
return summary
def _fallback_to_main_for_compression(self, e: Exception, reason: str) -> None:
"""Switch from a separate ``summary_model`` back to the main model.
@@ -911,7 +1214,11 @@ class ContextCompressor(ContextEngine):
self.summary_model = "" # empty = use main model
self._summary_failure_cooldown_until = 0.0 # no cooldown — retry immediately
def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topic: str = None) -> Optional[str]:
def _generate_summary(
self,
turns_to_summarize: List[Dict[str, Any]],
focus_topic: Optional[str] = None,
) -> Optional[str]:
"""Generate a structured summary of conversation turns.
Uses a structured template (Goal, Progress, Decisions, Resolved/Pending
@@ -959,11 +1266,27 @@ class ContextCompressor(ContextEngine):
# Shared structured template (used by both paths).
_template_sections = f"""## Active Task
[THE SINGLE MOST IMPORTANT FIELD. Copy the user's most recent request or
task assignment verbatim — the exact words they used. If multiple tasks
were requested and only some are done, list only the ones NOT yet completed.
Continuation should pick up exactly here. Example:
[THE SINGLE MOST IMPORTANT FIELD. Capture the user's most recent unfulfilled
input verbatim — the exact words they used. This includes:
- Explicit task assignments ("refactor the auth module")
- Questions awaiting an answer ("waarom staat X op Y?", "wat zijn de volgende stappen?")
- Decisions awaiting input ("optie A of B?")
- Ongoing discussions where the assistant owes the next substantive reply
A conversation where the user just asked a question IS an active task — the
task is "answer that question with full context". Do NOT write "None" merely
because the user did not issue an imperative command; reserve "None" for the
rare case where the last exchange was fully resolved and the user said
something like "thanks, that's all".
If multiple items are outstanding, list only the ones NOT yet completed.
Continuation should pick up exactly here. Examples:
"User asked: 'Now refactor the auth module to use JWT instead of sessions'"
"User asked: 'Waarom stond provider ineens op openrouter?' — needs investigation + answer"
"User chose option A; awaiting implementation of step 2"
If the user's most recent message was a reverse signal (stop, undo, roll
back, never mind, just verify, change of topic) that supersedes earlier
work, write the reverse signal verbatim and DO NOT carry forward the
cancelled task. Example: "User asked: 'Stop the i18n refactor and just
verify the current diff' — earlier i18n in-flight work is cancelled."
If no outstanding task exists, write "None."]
## Goal
@@ -1029,7 +1352,7 @@ PREVIOUS SUMMARY:
NEW TURNS TO INCORPORATE:
{content_to_summarize}
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete. CRITICAL: Update "## Active Task" to reflect the user's most recent unfulfilled request — this is the most important field for task continuity.
Update the summary using this exact structure. PRESERVE all existing information that is still relevant. ADD new completed actions to the numbered list (continue numbering). Move items from "In Progress" to "Completed Actions" when done. Move answered questions to "Resolved Questions". Update "Active State" to reflect current state. Remove information only if it is clearly obsolete. CRITICAL: Update "## Active Task" to reflect the user's most recent unfulfilled input — this includes any question, decision request, or discussion turn that the assistant has not yet answered. Only write "None" if the last exchange was fully resolved.
{_template_sections}"""
else:
@@ -1193,9 +1516,16 @@ The user has requested that this compaction PRIORITISE preserving all informatio
@staticmethod
def _strip_summary_prefix(summary: str) -> str:
"""Return summary body without the current or legacy handoff prefix."""
"""Return summary body without the current, legacy, or any historical
handoff prefix.
Historical prefixes must be stripped too: a handoff persisted under an
older prefix can be inherited into a resumed lineage (#35344), and if we
only re-prepend the current prefix without removing the old one, the
stale directive it carried stays embedded in the body.
"""
text = (summary or "").strip()
for prefix in (SUMMARY_PREFIX, LEGACY_SUMMARY_PREFIX):
for prefix in (SUMMARY_PREFIX, LEGACY_SUMMARY_PREFIX, *_HISTORICAL_SUMMARY_PREFIXES):
if text.startswith(prefix):
return text[len(prefix):].lstrip()
return text
@@ -1209,7 +1539,9 @@ The user has requested that this compaction PRIORITISE preserving all informatio
@staticmethod
def _is_context_summary_content(content: Any) -> bool:
text = _content_text_for_contains(content).lstrip()
return text.startswith(SUMMARY_PREFIX) or text.startswith(LEGACY_SUMMARY_PREFIX)
if text.startswith(SUMMARY_PREFIX) or text.startswith(LEGACY_SUMMARY_PREFIX):
return True
return any(text.startswith(p) for p in _HISTORICAL_SUMMARY_PREFIXES)
@classmethod
def _find_latest_context_summary(
@@ -1608,9 +1940,9 @@ The user has requested that this compaction PRIORITISE preserving all informatio
# True → ABORT compression entirely. Return messages unchanged
# and set _last_compress_aborted=True so callers can warn
# the user and stop the auto-compress retry loop.
# False → Fall through to the legacy fallback path below: insert
# a static "summary unavailable" placeholder and drop the
# middle window. Records _last_summary_fallback_used /
# False → Fall through to the default fallback path below: insert
# a deterministic "summary unavailable" handoff and drop
# the middle window. Records _last_summary_fallback_used /
# _last_summary_dropped_count for gateway hygiene to
# surface a warning.
# Default is False (historical behavior).
@@ -1643,21 +1975,18 @@ The user has requested that this compaction PRIORITISE preserving all informatio
)
compressed.append(msg)
# Legacy fallback path: LLM summary failed and abort_on_summary_failure
# is False (the default). Insert a static placeholder so the model
# knows context was lost rather than silently dropping everything.
# If LLM summary failed, insert a deterministic fallback so the model
# gets at least locally recoverable continuity anchors instead of a
# content-free "N messages were removed" marker.
if not summary:
if not self.quiet_mode:
logger.warning("Summary generation failed — inserting static fallback context marker")
logger.warning("Summary generation failed — inserting deterministic fallback context summary")
n_dropped = compress_end - compress_start
self._last_summary_dropped_count = n_dropped
self._last_summary_fallback_used = True
summary = (
f"{SUMMARY_PREFIX}\n"
f"Summary generation was unavailable. {n_dropped} message(s) were "
f"removed to free context space but could not be summarized. The removed "
f"messages contained earlier work in this session. Continue based on the "
f"recent messages below and the current state of any files or resources."
summary = self._build_static_fallback_summary(
turns_to_summarize,
reason=self._last_summary_error,
)
_merge_summary_into_tail = False

View File

@@ -115,6 +115,15 @@ class ContextEngine(ABC):
"""
return False
def should_defer_preflight_to_real_usage(self, rough_tokens: int) -> bool:
"""Return True when preflight should trust recent real usage instead.
Built-in compression uses this to avoid re-compacting from known-noisy
rough estimates after a compressed request has already fit. Third-party
engines can ignore it safely.
"""
return False
# -- Optional: manual /compress preflight ------------------------------
def has_content_to_compress(self, messages: List[Dict[str, Any]]) -> bool:

View File

@@ -34,13 +34,33 @@ import tempfile
import uuid
from datetime import datetime
from pathlib import Path
from typing import Any, List, Optional, Tuple
from typing import Any, Optional, Tuple
from agent.model_metadata import estimate_request_tokens_rough
logger = logging.getLogger(__name__)
def _compression_lock_holder(agent: Any) -> str:
"""Build a unique holder id for the lock: pid:tid:agent-instance:uuid.
The pid+tid prefix lets ops tell crashed/abandoned holders apart from
live ones (expiry-based recovery uses the timestamp, but ``holder``
is what shows up in diagnostics + log lines). The agent instance id
and a per-acquire uuid disambiguate two co-resident agents on the
same thread (background_review forks run on a worker thread, but
on machines where compression itself dispatches to a thread pool
we want each acquire to be unique).
"""
import threading
return (
f"pid={os.getpid()}"
f":tid={threading.get_ident()}"
f":agent={id(agent):x}"
f":nonce={uuid.uuid4().hex[:8]}"
)
def check_compression_model_feasibility(agent: Any) -> None:
"""Warn at session start if the auxiliary compression model's context
window is smaller than the main model's compression threshold.
@@ -305,6 +325,103 @@ def compress_context(
"🗜️ Compacting context — summarizing earlier conversation so I can continue..."
)
# ── Compression lock ────────────────────────────────────────────────
# Atomic, state.db-backed lock per session_id. Without this, two
# AIAgent instances that share the same session_id (most commonly the
# parent-turn agent and its background-review fork — see
# ``agent/background_review.py``: ``review_agent.session_id =
# agent.session_id``) can each call compress() on overlapping
# snapshots of the same conversation. Both succeed, both rotate
# ``agent.session_id`` to a fresh id, both create child sessions in
# state.db parented to the same old id. The gateway's SessionEntry
# only catches one rotation, so the other child becomes an orphan
# that silently accumulates writes — Damien's repro shape.
#
# Acquire keyed on the OLD session_id (the rotation target's parent),
# because that's the id that competing paths see and read from
# SessionEntry at the start of their own compression attempt.
#
# If we can't acquire the lock, another path is mid-compression on
# this session. Aborting is correct: the messages are unchanged, the
# other path's rotation will produce the canonical new session_id,
# and our caller's auto-compress loop sees ``len(returned) == len(input)``
# and stops retrying for this cycle. The session is NOT corrupted —
# we just sit out this round and let the winner finish.
_lock_db = getattr(agent, "_session_db", None)
_lock_sid = agent.session_id or ""
_lock_holder: Optional[str] = None
# Probe whether the lock subsystem is actually available on this
# SessionDB instance. A process running mismatched module versions
# (e.g. ``conversation_compression.py`` reloaded after a pull but the
# long-lived ``hermes_state.SessionDB`` class still bound to the
# pre-#34351 version in memory) has the call site but not the method.
# In that case ``try_acquire_compression_lock`` raises AttributeError —
# NOT a ``sqlite3.Error`` — so the method's own fail-open guard never
# runs and the exception propagates to the outer agent loop, which
# prints the error and retries. Because compression never succeeds,
# the token count never drops and the loop re-triggers compaction
# forever (the "API call #47/#48/#49 ... has no attribute
# try_acquire_compression_lock" spin). Fail OPEN here: if the lock
# subsystem is missing or broken in any unexpected way, skip locking
# and proceed with compression. Skipping the lock risks a rare
# concurrent-compression session fork; an infinite no-progress loop
# that never compresses at all is strictly worse.
if _lock_db is not None and _lock_sid:
_lock_holder = _compression_lock_holder(agent)
try:
_lock_acquired = _lock_db.try_acquire_compression_lock(
_lock_sid, _lock_holder
)
except Exception as _lock_err:
# Broken/absent lock subsystem (version skew, etc.). Log once
# per session and proceed WITHOUT the lock rather than letting
# the exception spin the outer loop.
_lock_holder = None # we don't own anything to release
if getattr(agent, "_last_compression_lock_error_sid", None) != _lock_sid:
agent._last_compression_lock_error_sid = _lock_sid
logger.warning(
"compression lock subsystem unavailable for session=%s "
"(%s: %s) — proceeding without lock. This usually means a "
"stale in-memory module after an update; restart the "
"process (or `hermes update`) to resync.",
_lock_sid, type(_lock_err).__name__, _lock_err,
)
_lock_acquired = True # treat as acquired-but-unlocked; proceed
if not _lock_acquired:
try:
existing = _lock_db.get_compression_lock_holder(_lock_sid)
except Exception:
existing = None
logger.warning(
"compression skipped: another path is compressing session=%s "
"(holder=%s) — returning messages unchanged to avoid session fork",
_lock_sid, existing,
)
_lock_holder = None # don't release a lock we don't own
# Surface to the user once — quiet for downstream auto-compress loops
if getattr(agent, "_last_compression_lock_warning_sid", None) != _lock_sid:
agent._last_compression_lock_warning_sid = _lock_sid
try:
agent._emit_warning(
"⚠ Skipping concurrent compression — another path "
"is already compressing this session. Will retry "
"after it finishes."
)
except Exception:
pass
_existing_sp = getattr(agent, "_cached_system_prompt", None)
if not _existing_sp:
_existing_sp = agent._build_system_prompt(system_message)
return messages, _existing_sp
def _release_lock() -> None:
"""Release the lock keyed on the OLD session_id (before rotation)."""
if _lock_db is not None and _lock_sid and _lock_holder:
try:
_lock_db.release_compression_lock(_lock_sid, _lock_holder)
except Exception as _rel_err:
logger.debug("compression lock release failed: %s", _rel_err)
# Notify external memory provider before compression discards context
if agent._memory_manager:
try:
@@ -318,6 +435,11 @@ def compress_context(
# Plugin context engine with strict signature that doesn't accept
# focus_topic / force — fall back to calling without them.
compressed = agent.context_compressor.compress(messages, current_tokens=approx_tokens)
except BaseException:
# ANY exception during compress() must release the lock so the
# session isn't permanently blocked from future compression.
_release_lock()
raise
# If compression aborted (aux LLM failed to produce a usable summary)
# the compressor returns the input messages unchanged. Surface the
@@ -336,6 +458,7 @@ def compress_context(
_existing_sp = getattr(agent, "_cached_system_prompt", None)
if not _existing_sp:
_existing_sp = agent._build_system_prompt(system_message)
_release_lock() # compression aborted — no rotation will happen
return messages, _existing_sp
summary_error = getattr(agent.context_compressor, "_last_summary_error", None)
@@ -452,19 +575,18 @@ def compress_context(
force=True,
)
# Update token estimate after compaction so pressure calculations
# use the post-compression count, not the stale pre-compression one.
# Use estimate_request_tokens_rough() so tool schemas are included —
# with 50+ tools enabled, schemas alone can add 20-30K tokens, and
# omitting them delays the next compression cycle far past the
# configured threshold (issue #14695).
# Keep the post-compression rough estimate for diagnostics, but do not
# treat it as provider-reported prompt usage. Schema-heavy rough estimates
# can remain above threshold even after the next real API request fits.
_compressed_est = estimate_request_tokens_rough(
compressed,
system_prompt=new_system_prompt or "",
tools=agent.tools or None,
)
agent.context_compressor.last_prompt_tokens = _compressed_est
agent.context_compressor.last_compression_rough_tokens = _compressed_est
agent.context_compressor.last_prompt_tokens = -1
agent.context_compressor.last_completion_tokens = 0
agent.context_compressor.awaiting_real_usage_after_compression = True
# Clear the file-read dedup cache. After compression the original
# read content is summarised away — if the model re-reads the same
@@ -476,10 +598,16 @@ def compress_context(
pass
logger.info(
"context compression done: session=%s messages=%d->%d tokens=~%s",
"context compression done: session=%s messages=%d->%d rough_tokens=~%s awaiting_real_usage=true",
agent.session_id or "none", _pre_msg_count, len(compressed),
f"{_compressed_est:,}",
)
# Release the lock on the OLD session_id only AFTER rotation completed
# and all post-rotation bookkeeping (memory manager, context engine,
# file dedup) ran. A concurrent path that wakes up the moment we
# release will see the NEW session_id in state.db / SessionEntry and
# acquire on that — no race against our just-finished work.
_release_lock()
return compressed, new_system_prompt
@@ -516,6 +644,12 @@ def try_shrink_image_parts_in_messages(api_messages: list) -> bool:
# after a confirmed provider rejection, so the alternative is failure.
target_bytes = 4 * 1024 * 1024
changed_count = 0
# Track parts that are over the target but could NOT be shrunk under it.
# If any survive, retrying is pointless — the same oversized payload will
# be re-sent and rejected again, wasting the single retry budget. We only
# report success (caller retries) when every over-threshold image was
# actually brought under the target.
unshrinkable_oversized = 0
def _shrink_data_url(url: str) -> Optional[str]:
"""Return a smaller data URL, or None if shrink can't help."""
@@ -582,17 +716,34 @@ def try_shrink_image_parts_in_messages(api_messages: list) -> bool:
if resized:
image_value["url"] = resized
changed_count += 1
elif isinstance(url, str) and url.startswith("data:") \
and len(url) > target_bytes:
unshrinkable_oversized += 1
elif isinstance(image_value, str):
resized = _shrink_data_url(image_value)
if resized:
part["image_url"] = resized
changed_count += 1
elif image_value.startswith("data:") \
and len(image_value) > target_bytes:
unshrinkable_oversized += 1
if changed_count:
logger.info(
"image-shrink recovery: re-encoded %d image part(s) to fit under %.0f MB",
changed_count, target_bytes / (1024 * 1024),
)
if unshrinkable_oversized:
# At least one oversized image could not be shrunk under the target.
# Retrying would re-send it and fail identically, so signal "no
# progress" even if other parts shrank — the caller will surface the
# original error rather than burning its single retry on a no-op.
logger.warning(
"image-shrink recovery: %d oversized image part(s) could not be "
"shrunk under %.0f MB — not retrying (would re-send rejected payload)",
unshrinkable_oversized, target_bytes / (1024 * 1024),
)
return False
return changed_count > 0

View File

@@ -27,8 +27,6 @@ import time
import uuid
from typing import Any, Dict, List, Optional
from agent.anthropic_adapter import _is_oauth_token
from agent.auxiliary_client import set_runtime_main
from agent.codex_responses_adapter import _summarize_user_message_for_log
from agent.display import KawaiiSpinner
from agent.error_classifier import FailoverReason, classify_api_error
@@ -53,20 +51,13 @@ from agent.model_metadata import (
parse_available_output_tokens_from_error,
save_context_length,
)
from agent.nous_rate_guard import (
clear_nous_rate_limit,
is_genuine_nous_rate_limit,
nous_rate_limit_remaining,
record_nous_rate_limit,
)
from agent.process_bootstrap import _install_safe_stdio
from agent.prompt_caching import apply_anthropic_cache_control
from agent.retry_utils import jittered_backoff
from agent.trajectory import has_incomplete_scratchpad
from agent.usage_pricing import estimate_usage_cost, normalize_usage
from hermes_constants import display_hermes_home as _dhh_fn, PARTIAL_STREAM_STUB_ID
from hermes_constants import PARTIAL_STREAM_STUB_ID
from hermes_logging import set_session_context
from tools.schema_sanitizer import strip_pattern_and_format
from tools.skill_provenance import set_current_write_origin
from utils import base_url_host_matches, env_var_enabled
@@ -212,15 +203,13 @@ def _print_billing_or_entitlement_guidance(
def _try_refresh_nous_paid_entitlement_credentials(agent) -> bool:
"""Refresh Nous runtime credentials after a fresh paid-entitlement check."""
try:
from hermes_cli.auth import NOUS_INFERENCE_AUTH_MODE_LEGACY
from hermes_cli.nous_account import get_nous_portal_account_info
account_info = get_nous_portal_account_info(force_fresh=True)
if account_info.paid_service_access is not True:
return False
return agent._try_refresh_nous_client_credentials(
force=False,
inference_auth_mode=NOUS_INFERENCE_AUTH_MODE_LEGACY,
force=True,
)
except Exception:
return False
@@ -403,13 +392,15 @@ def run_conversation(
set_runtime_main(
getattr(agent, "provider", "") or "",
getattr(agent, "model", "") or "",
base_url=getattr(agent, "base_url", "") or "",
api_key=getattr(agent, "api_key", "") or "",
api_mode=getattr(agent, "api_mode", "") or "",
)
except Exception:
pass
# Tag all log records on this thread with the session ID so
# ``hermes logs --session <id>`` can filter a single conversation.
from hermes_logging import set_session_context
set_session_context(agent.session_id)
# Bind the skill write-origin ContextVar for this thread so tool
@@ -418,7 +409,6 @@ def run_conversation(
# a foreground user-directed turn. Set at the top of each call;
# the review fork runs on its own thread with a fresh context,
# so the foreground value here does not leak into it.
from tools.skill_provenance import set_current_write_origin
set_current_write_origin(getattr(agent, "_memory_write_origin", "assistant_tool"))
# If the previous turn activated fallback, restore the primary
@@ -613,18 +603,50 @@ def run_conversation(
system_prompt=active_system_prompt or "",
tools=agent.tools or None,
)
_compressor = agent.context_compressor
_defer_preflight = getattr(
_compressor,
"should_defer_preflight_to_real_usage",
lambda _tokens: False,
)
_preflight_deferred = _defer_preflight(_preflight_tokens)
if agent.context_compressor.should_compress(_preflight_tokens):
if not _preflight_deferred:
# Keep the CLI/ACP context display in sync with what preflight
# actually measured. The status bar reads
# ``compressor.last_prompt_tokens``, which otherwise only updates
# from a *successful* API response. When the conversation has grown
# since the last successful call — or when compression then fails
# (e.g. the auxiliary summary model times out) and no fresh usage
# arrives — the bar stays stuck at the old, smaller value while
# preflight reports a much larger number, looking out of sync.
# Seed it with the fresh estimate (only ever revising upward; a real
# ``update_from_response`` will correct it after the next API call).
# Skipped when deferring — a deferred estimate is known to over-count
# vs the last real provider prompt, so trusting it for the display
# would re-introduce the very desync we're avoiding.
if _preflight_tokens > (_compressor.last_prompt_tokens or 0):
_compressor.last_prompt_tokens = _preflight_tokens
if _preflight_deferred:
logger.info(
"Skipping preflight compression: rough estimate ~%s >= %s, "
"but last real provider prompt was %s after compression",
f"{_preflight_tokens:,}",
f"{_compressor.threshold_tokens:,}",
f"{_compressor.last_real_prompt_tokens:,}",
)
elif _compressor.should_compress(_preflight_tokens):
logger.info(
"Preflight compression: ~%s tokens >= %s threshold (model %s, ctx %s)",
f"{_preflight_tokens:,}",
f"{agent.context_compressor.threshold_tokens:,}",
f"{_compressor.threshold_tokens:,}",
agent.model,
f"{agent.context_compressor.context_length:,}",
f"{_compressor.context_length:,}",
)
agent._emit_status(
f"📦 Preflight compression: ~{_preflight_tokens:,} tokens "
f">= {agent.context_compressor.threshold_tokens:,} threshold. "
f">= {_compressor.threshold_tokens:,} threshold. "
"This may take a moment."
)
# May need multiple passes for very large sessions with small
@@ -659,8 +681,8 @@ def run_conversation(
system_prompt=active_system_prompt or "",
tools=agent.tools or None,
)
if _preflight_tokens < agent.context_compressor.threshold_tokens:
break # Under threshold
if not _compressor.should_compress(_preflight_tokens):
break # Under threshold or anti-thrash guard stopped it
# Plugin hook: pre_llm_call
# Fired once per turn before the tool-calling loop. Plugins can
@@ -1470,7 +1492,8 @@ def run_conversation(
if retry_count >= max_retries:
# Try fallback before giving up
agent._buffer_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
if agent._has_pending_fallback():
agent._buffer_status(f"⚠️ Max retries ({max_retries}) for invalid responses — trying fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
@@ -1716,20 +1739,52 @@ def run_conversation(
if agent.api_mode in {"chat_completions", "bedrock_converse", "anthropic_messages"}:
assistant_message = _trunc_msg
if assistant_message is not None and _trunc_has_tool_calls:
if truncated_tool_call_retries < 1:
_is_stub_stall = (
getattr(response, "id", "") == PARTIAL_STREAM_STUB_ID
)
if truncated_tool_call_retries < 3:
truncated_tool_call_retries += 1
agent._buffer_vprint(
f"⚠️ Truncated tool call detected — retrying API call..."
)
if _is_stub_stall:
# The stream broke mid tool-call (network /
# peer-closed connection), not a real output
# cap — say so instead of "max output tokens".
agent._buffer_vprint(
f"⚠️ Stream interrupted mid tool-call — "
f"retrying ({truncated_tool_call_retries}/3)..."
)
else:
agent._buffer_vprint(
f"⚠️ Truncated tool call detected — "
f"retrying API call "
f"({truncated_tool_call_retries}/3)..."
)
# Boost max_tokens on each retry so the model has
# more room to complete the tool-call JSON. A
# network stall doesn't need a bigger budget, but
# a genuine output-cap truncation does, and the
# boost is harmless for the stall case.
_tc_boost_base = agent.max_tokens if agent.max_tokens else 4096
_tc_boost = _tc_boost_base * (truncated_tool_call_retries + 1)
_tc_requested_cap = agent._requested_output_cap_from_api_kwargs(api_kwargs)
if _tc_requested_cap is not None:
_tc_boost = max(_tc_boost, _tc_requested_cap)
_tc_boost_cap = max(32768, _tc_requested_cap or 0)
agent._ephemeral_max_output_tokens = min(_tc_boost, _tc_boost_cap)
# Don't append the broken response to messages;
# just re-run the same API call from the current
# message state, giving the model another chance.
continue
agent._flush_status_buffer()
agent._vprint(
f"{agent.log_prefix}⚠️ Truncated tool call response detected again — refusing to execute incomplete tool arguments.",
force=True,
)
if _is_stub_stall:
agent._vprint(
f"{agent.log_prefix}⚠️ Stream kept dropping mid tool-call after 3 retries — the action was not executed.",
force=True,
)
else:
agent._vprint(
f"{agent.log_prefix}⚠️ Truncated tool call response detected again — refusing to execute incomplete tool arguments.",
force=True,
)
agent._cleanup_task_resources(effective_task_id)
agent._persist_session(messages, conversation_history)
return {
@@ -1738,7 +1793,12 @@ def run_conversation(
"api_calls": api_call_count,
"completed": False,
"partial": True,
"error": "Response truncated due to output length limit",
"error": (
"Stream repeatedly dropped mid tool-call (network); "
"the tool was not executed"
if _is_stub_stall
else "Response truncated due to output length limit"
),
}
# If we have prior messages, roll back to last complete state
@@ -3072,12 +3132,17 @@ def run_conversation(
) and not is_context_length_error
if is_client_error:
# Try fallback before aborting — a different provider
# may not have the same issue (rate limit, auth, etc.)
if classified.reason == FailoverReason.content_policy_blocked:
agent._buffer_status("⚠️ Provider safety filter blocked this request — trying fallback...")
else:
agent._buffer_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
# Try fallback before aborting — a different provider may
# not have the same issue (rate limit, auth, etc.). Only
# announce the attempt when a fallback chain actually
# exists; otherwise "trying fallback..." is a lie and the
# session looks like it's recovering when it's about to
# abort silently (#35314, #17446).
if agent._has_pending_fallback():
if classified.reason == FailoverReason.content_policy_blocked:
agent._buffer_status("⚠️ Provider safety filter blocked this request — trying fallback...")
else:
agent._buffer_status(f"⚠️ Non-retryable error (HTTP {status_code}) — trying fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
@@ -3220,7 +3285,8 @@ def run_conversation(
retry_count = 0
continue
# Try fallback before giving up entirely
agent._buffer_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
if agent._has_pending_fallback():
agent._buffer_status(f"⚠️ Max retries ({max_retries}) exhausted — trying fallback...")
if agent._try_activate_fallback():
retry_count = 0
compression_attempts = 0
@@ -3383,9 +3449,16 @@ def run_conversation(
# Progressively boost the output token budget on each retry.
# Retry 1 → 2× base, retry 2 → 3× base, capped at 32 768.
# Applies to all providers via _ephemeral_max_output_tokens.
# If the original request already used a larger provider/model
# default budget, keep that floor so continuation retries do
# not accidentally downshift to a much smaller cap.
_boost_base = agent.max_tokens if agent.max_tokens else 4096
_boost = _boost_base * (length_continue_retries + 1)
agent._ephemeral_max_output_tokens = min(_boost, 32768)
_requested_cap = agent._requested_output_cap_from_api_kwargs(api_kwargs)
if _requested_cap is not None:
_boost = max(_boost, _requested_cap)
_boost_cap = max(32768, _requested_cap or 0)
agent._ephemeral_max_output_tokens = min(_boost, _boost_cap)
continue
# Guard: if all retries exhausted without a successful response
@@ -3875,6 +3948,11 @@ def run_conversation(
# inflate completion_tokens with reasoning,
# causing premature compression. (#12026)
_real_tokens = _compressor.last_prompt_tokens
elif _compressor.last_prompt_tokens == -1:
# Compression just ran and no API-reported prompt count
# has arrived yet. Avoid treating a schema-heavy rough
# post-compression estimate as real context pressure.
_real_tokens = 0
else:
# Include tool schemas — with 50+ tools enabled
# these add 20-30K tokens the messages-only
@@ -4314,36 +4392,54 @@ def run_conversation(
)
final_response = agent._handle_max_iterations(messages, api_call_count)
# If running as a kanban worker, block the task so the dispatcher
# knows the worker could not complete (rather than treating it as a
# If running as a kanban worker, signal the dispatcher that the
# worker could not complete (rather than treating it as a
# protocol violation). The agent loop strips tools before calling
# _handle_max_iterations, so the model cannot call kanban_block
# itself — we must do it on its behalf.
#
# We route through ``_record_task_failure(outcome="timed_out")``
# rather than ``kanban_block`` so this counts toward the
# ``consecutive_failures`` counter and the dispatcher's
# ``failure_limit`` circuit breaker (#29747 gap 2). Without this,
# a task whose worker keeps exhausting its budget would block
# silently each run, get auto-promoted by the operator (or never
# surface), and re-block in an endless loop with no signal.
_kanban_task = os.environ.get("HERMES_KANBAN_TASK")
if _kanban_task:
try:
_ra().handle_function_call(
"kanban_block",
{
"task_id": _kanban_task,
"reason": (
from hermes_cli import kanban_db as _kb
_conn = _kb.connect()
try:
_kb._record_task_failure(
_conn,
_kanban_task,
error=(
f"Iteration budget exhausted "
f"({api_call_count}/{agent.max_iterations}) — "
"task could not complete within the allowed "
"iterations"
),
},
task_id=effective_task_id,
)
logger.info(
"kanban_block called for task %s after iteration "
"exhaustion (%d/%d)",
_kanban_task, api_call_count, agent.max_iterations,
)
outcome="timed_out",
release_claim=True,
end_run=True,
event_payload_extra={
"budget_used": api_call_count,
"budget_max": agent.max_iterations,
},
)
logger.info(
"recorded budget-exhausted failure for task %s (%d/%d)",
_kanban_task, api_call_count, agent.max_iterations,
)
finally:
try:
_conn.close()
except Exception:
pass
except Exception:
logger.warning(
"Failed to call kanban_block after iteration "
"exhaustion for task %s",
"Failed to record budget-exhausted failure for task %s",
_kanban_task,
exc_info=True,
)
@@ -4438,6 +4534,55 @@ def run_conversation(
except Exception as _ver_err:
logger.debug("file-mutation verifier footer failed: %s", _ver_err)
# Turn-completion explainer.
# When a turn ends abnormally after substantive work — empty content
# after retries, a partial/truncated stream, a still-pending tool
# result, or an iteration/budget limit — the user otherwise gets a
# blank or fragmentary response box with no consolidated reason why
# the agent stopped (#34452). Surface a single user-visible
# explanation derived from ``_turn_exit_reason``, mirroring the
# file-mutation verifier footer pattern above.
#
# Gate carefully so healthy turns stay quiet:
# - ``text_response(...)`` exits never produce an explanation
# (handled inside the formatter), so a terse ``Done.`` is silent.
# - We only ACT when there is no genuinely usable reply this turn:
# an empty response, the "(empty)" terminal sentinel, or a
# suspiciously short partial fragment with no terminating
# punctuation (e.g. "The"). A real short answer keeps its text.
if not interrupted:
try:
if agent._turn_completion_explainer_enabled():
_stripped = (final_response or "").strip()
_is_empty_terminal = _stripped == "" or _stripped == "(empty)"
# A short fragment that is not a normal text_response exit
# and lacks sentence-ending punctuation is treated as a
# truncated partial (the "The" case from #34452).
_is_partial_fragment = (
not _is_empty_terminal
and not str(_turn_exit_reason).startswith("text_response")
and len(_stripped) <= 24
and _stripped[-1:] not in {".", "!", "?", "", "", "", "`", ")"}
)
if _is_empty_terminal or _is_partial_fragment:
_explanation = agent._format_turn_completion_explanation(
_turn_exit_reason
)
if _explanation:
if _is_empty_terminal:
# Replace the bare "(empty)"/blank sentinel with
# the actionable explanation.
final_response = _explanation
else:
# Keep the partial fragment, append the reason so
# the user sees both what arrived and why it
# stopped.
final_response = (
_stripped + "\n\n" + _explanation
)
except Exception as _exp_err:
logger.debug("turn-completion explainer failed: %s", _exp_err)
_response_transformed = False
# Plugin hook: transform_llm_output

View File

@@ -14,7 +14,7 @@ from datetime import datetime, timezone
from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import OPENROUTER_BASE_URL
from hermes_cli.config import get_env_value, load_env
from hermes_cli.config import load_env
from agent.credential_persistence import (
is_borrowed_credential_source,
sanitize_borrowed_credential_payload,
@@ -22,7 +22,6 @@ from agent.credential_persistence import (
import hermes_cli.auth as auth_mod
from hermes_cli.auth import (
CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
PROVIDER_REGISTRY,
_auth_store_lock,
_codex_access_token_is_expiring,
@@ -55,6 +54,38 @@ def _load_config_safe() -> Optional[dict]:
STATUS_OK = "ok"
STATUS_EXHAUSTED = "exhausted"
# Terminal failure — the credential will never recover on its own. Used for
# upstream-permanent OAuth states like ``token_invalidated`` / ``token_revoked``
# where retrying after a TTL cooldown is guaranteed to fail. ``DEAD`` entries
# are excluded from rotation unconditionally and only clear when an explicit
# write-side sync (e.g. ``_save_codex_tokens`` after a fresh device-code
# login) rewrites the tokens.
STATUS_DEAD = "dead"
# OAuth error reasons that indicate the credential is permanently invalid
# server-side and cannot be recovered by retry/refresh. Sourced from
# OpenAI Codex Responses API, Anthropic, xAI, and Google OAuth spec.
_TERMINAL_AUTH_REASONS = frozenset({
"token_invalidated", # OpenAI Codex: "Your authentication token has been invalidated."
"token_revoked", # OAuth 2.0 RFC 7009: token explicitly revoked
"invalid_token", # RFC 6750: bearer token is malformed/expired/revoked
"invalid_grant", # RFC 6749: refresh_token rejected during refresh
"unauthorized_client", # RFC 6749: client no longer authorized
"refresh_token_reused", # Single-use refresh token consumed by another process
})
# How long a DEAD manual credential is preserved before being pruned.
# Manual entries (``manual:*``) are independent credentials with no singleton
# to re-seed from, so pruning them after a quiet window cleans up dead state
# without losing recoverability — the user always has the option to re-add
# via ``hermes auth add``.
#
# Singleton-seeded entries (``device_code``, ``loopback_pkce``, ``claude_code``)
# are NOT pruned because ``_seed_from_singletons`` would just re-create them
# on the next ``load_pool()`` with the same stale singleton tokens, defeating
# the cleanup. They remain in the pool marked DEAD until an explicit re-auth
# write-side sync (``_save_codex_tokens`` etc.) clears the status.
DEAD_MANUAL_PRUNE_TTL_SECONDS = 24 * 60 * 60 # 24 hours
AUTH_TYPE_OAUTH = "oauth"
AUTH_TYPE_API_KEY = "api_key"
@@ -171,8 +202,22 @@ class PooledCredential:
def runtime_api_key(self) -> str:
if self.provider == "nous":
# Nous stores the runtime inference credential in agent_key for
# compatibility. It may be a NAS invoke JWT or legacy opaque key.
return str(self.agent_key or self.access_token or "")
# compatibility. It must be a NAS invoke JWT.
for token, expires_at in (
(self.agent_key, self.agent_key_expires_at),
(self.access_token, self.expires_at),
):
if (
isinstance(token, str)
and token.strip()
and auth_mod._nous_invoke_jwt_is_usable(
token,
scope=getattr(self, "scope", None),
expires_at=expires_at,
)
):
return token.strip()
return ""
return str(self.access_token or "")
@property
@@ -438,6 +483,29 @@ class CredentialPool:
[entry.to_dict() for entry in self._entries],
)
def _is_terminal_auth_failure(
self,
status_code: Optional[int],
normalized_error: Dict[str, Any],
) -> bool:
"""Detect upstream-permanent OAuth failures that won't recover on TTL.
Only fires for 401 responses whose error code/reason matches a known
terminal OAuth state (token_invalidated, token_revoked, invalid_grant,
etc.). Distinguishes permanent failures from transient ones like
token_expired (refreshable) or generic 401 without a specific reason
(could be a server-side glitch worth retrying).
Returns False for non-401 status codes — 429 rate limits and 402
billing failures are transient by nature and should keep TTL semantics.
"""
if status_code != 401:
return False
reason = normalized_error.get("reason")
if not isinstance(reason, str):
return False
return reason.strip().lower() in _TERMINAL_AUTH_REASONS
def _mark_exhausted(
self,
entry: PooledCredential,
@@ -445,9 +513,20 @@ class CredentialPool:
error_context: Optional[Dict[str, Any]] = None,
) -> PooledCredential:
normalized_error = _normalize_error_context(error_context)
# Permanent OAuth failures (token_invalidated, token_revoked, etc.)
# transition to STATUS_DEAD instead of STATUS_EXHAUSTED. Without this,
# a revoked credential gets a 1-hour TTL cooldown and then re-enters
# rotation, failing immediately every hour until the user manually
# removes it (issue #32849). DEAD entries are excluded from rotation
# unconditionally and only clear via an explicit re-auth write-side
# sync (``_save_codex_tokens`` after a fresh device-code login).
if self._is_terminal_auth_failure(status_code, normalized_error):
terminal_status = STATUS_DEAD
else:
terminal_status = STATUS_EXHAUSTED
updated = replace(
entry,
last_status=STATUS_EXHAUSTED,
last_status=terminal_status,
last_status_at=time.time(),
last_error_code=status_code,
last_error_reason=normalized_error.get("reason"),
@@ -852,12 +931,7 @@ class CredentialPool:
if synced is not entry:
entry = synced
auth_mod.resolve_nous_runtime_credentials(
min_key_ttl_seconds=DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
inference_auth_mode=(
auth_mod.NOUS_INFERENCE_AUTH_MODE_LEGACY
if force
else auth_mod.NOUS_INFERENCE_AUTH_MODE_AUTO
),
force_refresh=force,
)
updated = self._sync_nous_entry_from_auth_store(entry)
else:
@@ -1139,7 +1213,7 @@ class CredentialPool:
auth_mod.XAI_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
)
if self.provider == "nous":
# Nous refresh/mint can require network access and should happen when
# Nous refresh can require network access and should happen when
# runtime credentials are actually resolved, not merely when the pool
# is enumerated for listing, migration, or selection.
return False
@@ -1158,13 +1232,14 @@ class CredentialPool:
"""
now = time.time()
cleared_any = False
entries_to_prune: List[str] = []
available: List[PooledCredential] = []
for entry in self._entries:
# For anthropic claude_code entries, sync from the credentials file
# before any status/refresh checks. This picks up tokens refreshed
# by other processes (Claude Code CLI, other Hermes profiles).
if (self.provider == "anthropic" and entry.source == "claude_code"
and entry.last_status == STATUS_EXHAUSTED):
and entry.last_status in {STATUS_EXHAUSTED, STATUS_DEAD}):
synced = self._sync_anthropic_entry_from_credentials_file(entry)
if synced is not entry:
entry = synced
@@ -1175,7 +1250,7 @@ class CredentialPool:
# exhausted status stale.
if (self.provider == "nous"
and entry.source == "device_code"
and entry.last_status == STATUS_EXHAUSTED):
and entry.last_status in {STATUS_EXHAUSTED, STATUS_DEAD}):
synced = self._sync_nous_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
@@ -1187,7 +1262,7 @@ class CredentialPool:
# future for ChatGPT weekly windows).
if (self.provider == "openai-codex"
and entry.source == "device_code"
and entry.last_status == STATUS_EXHAUSTED):
and entry.last_status in {STATUS_EXHAUSTED, STATUS_DEAD}):
synced = self._sync_codex_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
@@ -1198,11 +1273,41 @@ class CredentialPool:
# xAI Grok OAuth login) has since rotated in auth.json.
if (self.provider == "xai-oauth"
and entry.source == "loopback_pkce"
and entry.last_status == STATUS_EXHAUSTED):
and entry.last_status in {STATUS_EXHAUSTED, STATUS_DEAD}):
synced = self._sync_xai_oauth_entry_from_auth_store(entry)
if synced is not entry:
entry = synced
cleared_any = True
if entry.last_status == STATUS_DEAD:
# Manual DEAD credentials get pruned after a 24h quiet window
# so the pool doesn't accumulate dead entries forever. The
# user can always re-add via ``hermes auth add``. Singleton-
# seeded DEAD entries are kept so the audit trail (label,
# last_error_reason, timestamps) stays visible — pruning them
# would just be undone by ``_seed_from_singletons`` on the
# next load anyway.
if _is_manual_source(entry.source):
dead_at = entry.last_status_at or 0
if dead_at and now - dead_at > DEAD_MANUAL_PRUNE_TTL_SECONDS:
_label = entry.label or entry.id[:8]
logger.warning(
"credential pool: pruning DEAD manual entry %s "
"(reason=%s, age=%.1fh) — re-add via `hermes auth add %s`",
_label,
entry.last_error_reason or "unknown",
(now - dead_at) / 3600.0,
self.provider,
)
# Mark for removal after the loop completes; we can't
# mutate self._entries while iterating.
entries_to_prune.append(entry.id)
cleared_any = True
# Permanently failed credentials never re-enter rotation via
# TTL. They only clear when a write-side re-auth sync rewrites
# the tokens (e.g. ``_save_codex_tokens`` after a fresh
# device-code login). The auth.json-sync paths below handle
# the re-auth case for OAuth singletons.
continue
if entry.last_status == STATUS_EXHAUSTED:
exhausted_until = _exhausted_until(entry)
if exhausted_until is not None and now < exhausted_until:
@@ -1226,6 +1331,9 @@ class CredentialPool:
continue
entry = refreshed
available.append(entry)
if entries_to_prune:
pruned_ids = set(entries_to_prune)
self._entries = [e for e in self._entries if e.id not in pruned_ids]
if cleared_any:
self._persist()
return available
@@ -1293,11 +1401,22 @@ class CredentialPool:
if entry is None:
return None
_label = entry.label or entry.id[:8]
logger.info(
"credential pool: marking %s exhausted (status=%s), rotating",
_label, status_code,
)
self._mark_exhausted(entry, status_code, error_context)
# Re-read the updated entry to log the correct terminal state.
updated_entry = next(
(e for e in self._entries if e.id == entry.id), entry,
)
if updated_entry.last_status == STATUS_DEAD:
logger.warning(
"credential pool: marking %s DEAD (status=%s, reason=%s) — "
"permanently failed, will NOT re-enter rotation until re-auth",
_label, status_code, updated_entry.last_error_reason or "unknown",
)
else:
logger.info(
"credential pool: marking %s exhausted (status=%s), rotating",
_label, status_code,
)
self._current_id = None
next_entry = self._select_unlocked()
if next_entry:
@@ -1637,9 +1756,9 @@ def _seed_from_singletons(provider: str, entries: List[PooledCredential]) -> Tup
"inference_base_url": state.get("inference_base_url"),
"agent_key": state.get("agent_key"),
"agent_key_expires_at": state.get("agent_key_expires_at"),
# Carry the mint/refresh timestamps into the pool so
# Carry the refresh timestamps into the pool so
# freshness-sensitive consumers (self-heal hooks, pool
# pruning by age) can distinguish just-minted credentials
# pruning by age) can distinguish just-refreshed credentials
# from stale ones. Without these, fresh device_code
# entries get obtained_at=None and look older than they
# are (#15099).

View File

@@ -183,6 +183,18 @@ def get_archive_after_days() -> int:
return DEFAULT_ARCHIVE_AFTER_DAYS
def get_prune_builtins() -> bool:
"""Whether the curator may prune (archive) bundled built-in skills too.
ON by default. When on, built-ins become curation candidates and are
archived after the same inactivity period as agent-created skills, with a
suppression list keeping them archived across `hermes update` re-seeds.
Hub-installed skills are never pruned regardless of this flag.
"""
cfg = _load_config()
return bool(cfg.get("prune_builtins", True))
# ---------------------------------------------------------------------------
# Idle / interval check
# ---------------------------------------------------------------------------
@@ -254,9 +266,17 @@ def should_run_now(now: Optional[datetime] = None) -> bool:
# ---------------------------------------------------------------------------
def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int]:
"""Walk every agent-created skill and move active/stale/archived based on
"""Walk every curator-managed skill and move active/stale/archived based on
the latest real activity timestamp. Pinned skills are never touched.
Returns a counter dict describing what changed."""
Built-ins (eligible only when ``curator.prune_builtins`` is on) are seeded
with a baseline record the first time they're seen so their inactivity
clock starts NOW rather than at epoch — a long-unused built-in is therefore
archived only after a fresh ``archive_after_days`` of non-use, not on the
first pass after the flag flips on.
Returns a counter dict describing what changed.
"""
from tools import skill_usage as _u
if now is None:
@@ -264,7 +284,7 @@ def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int
stale_cutoff = now - timedelta(days=get_stale_after_days())
archive_cutoff = now - timedelta(days=get_archive_after_days())
counts = {"marked_stale": 0, "archived": 0, "reactivated": 0, "checked": 0}
counts = {"marked_stale": 0, "archived": 0, "reactivated": 0, "checked": 0, "seeded": 0}
for row in _u.agent_created_report():
counts["checked"] += 1
@@ -272,6 +292,13 @@ def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int
if row.get("pinned"):
continue
# First sight of a curation-eligible skill with no persisted record
# (e.g. a newly-eligible built-in): anchor its clock to now and defer.
if not row.get("_persisted", True):
_u.seed_record_if_missing(name)
counts["seeded"] += 1
continue
last_activity = _parse_iso(row.get("last_activity_at"))
# If never active, treat created_at as the anchor so new skills don't
# immediately archive themselves.
@@ -1484,14 +1511,30 @@ def run_curator_review(
"error": None,
}
else:
# When pruning built-ins is enabled, the candidate list now
# includes bundled skills. Override the default "don't touch
# bundled" rule for them — but only archiving is permitted, and
# hub-installed skills remain strictly off-limits.
builtins_note = ""
if get_prune_builtins():
builtins_note = (
"\n\nPRUNE-BUILTINS MODE IS ON: bundled built-in skills "
"ARE included in the candidate list below and MAY be "
"archived for staleness/irrelevance, overriding hard "
"rule #1 for bundled skills ONLY. Hub-installed skills "
"remain strictly off-limits. Treat a stale built-in the "
"same as a stale agent-created skill: archive it (never "
"delete). It will be restored on `hermes update` only if "
"the user explicitly restores it."
)
if dry_run:
prompt = (
f"{CURATOR_DRY_RUN_BANNER}\n\n"
f"{CURATOR_REVIEW_PROMPT}\n\n"
f"{CURATOR_REVIEW_PROMPT}{builtins_note}\n\n"
f"{candidate_list}"
)
else:
prompt = f"{CURATOR_REVIEW_PROMPT}\n\n{candidate_list}"
prompt = f"{CURATOR_REVIEW_PROMPT}{builtins_note}\n\n{candidate_list}"
llm_meta = _run_llm_review(prompt)
final_summary = (
f"{prefix}{auto_summary}; llm: {llm_meta.get('summary', 'no change')}"

View File

@@ -21,6 +21,8 @@ It DOES include:
pointer — otherwise the curator would immediately re-fire on the next
tick)
- ``.bundled_manifest`` (so protection markers stay consistent)
- ``.curator_suppressed`` (so rollback restores the set of pruned built-ins
the re-seeder must leave archived)
Alongside the skills tarball, each snapshot also captures a copy of
``~/.hermes/cron/jobs.json`` as ``cron-jobs.json`` when it exists. Cron
@@ -39,12 +41,9 @@ from __future__ import annotations
import json
import logging
import os
import re
import shutil
import tarfile
import tempfile
import time
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

View File

@@ -249,6 +249,10 @@ def get_read_block_error(path: str) -> Optional[str]:
".env",
"webhook_subscriptions.json",
os.path.join("auth", "google_oauth.json"),
# Bitwarden Secrets Manager disk cache: stores plaintext secret values
# to avoid re-fetching across back-to-back CLI invocations. The file
# was introduced by #31968 but not added to this guard.
os.path.join("cache", "bws_cache.json"),
)
for hd in hermes_dirs:
for name in credential_file_names:

View File

@@ -31,7 +31,6 @@ import json
import logging
import time
import urllib.error
import urllib.parse
import urllib.request
import uuid
from dataclasses import dataclass, field

View File

@@ -899,7 +899,15 @@ def start_oauth_flow(
try:
import webbrowser
webbrowser.open(auth_url, new=1, autoraise=True)
try:
from hermes_cli.auth import (
_can_open_graphical_browser as _can_open_gui,
)
except Exception:
_can_open_gui = lambda: True # noqa: E731
if _can_open_gui():
webbrowser.open(auth_url, new=1, autoraise=True)
except Exception as exc:
logger.debug("webbrowser.open failed: %s", exc)

View File

@@ -16,7 +16,6 @@ from __future__ import annotations
import argparse
import sys
from typing import Optional
def register_subparser(subparsers: argparse._SubParsersAction) -> None:
@@ -248,19 +247,13 @@ def _cmd_restart() -> int:
def _cmd_which(server_id: str) -> int:
from agent.lsp.install import INSTALL_RECIPES, hermes_lsp_bin_dir
import os
import shutil as _shutil
from agent.lsp.install import INSTALL_RECIPES, _existing_binary
recipe = INSTALL_RECIPES.get(server_id)
bin_name = (recipe or {}).get("bin", server_id)
staged = hermes_lsp_bin_dir() / bin_name
if staged.exists():
sys.stdout.write(str(staged) + "\n")
return 0
on_path = _shutil.which(bin_name)
if on_path:
sys.stdout.write(on_path + "\n")
resolved = _existing_binary(bin_name)
if resolved:
sys.stdout.write(resolved + "\n")
return 0
sys.stderr.write(f"{server_id}: not installed\n")
return 1
@@ -294,11 +287,9 @@ def _backend_warnings() -> list:
suggestion across common platforms.
"""
import shutil as _shutil
from agent.lsp.install import hermes_lsp_bin_dir
from agent.lsp.install import _existing_binary
notes: list = []
bash_installed = _shutil.which("bash-language-server") is not None or (
(hermes_lsp_bin_dir() / "bash-language-server").exists()
)
bash_installed = _existing_binary("bash-language-server") is not None
if bash_installed and _shutil.which("shellcheck") is None:
notes.append(
"bash-language-server is installed but shellcheck is missing — "

View File

@@ -44,6 +44,7 @@ from __future__ import annotations
import asyncio
import logging
import os
import sys
from pathlib import Path
from typing import Any, Awaitable, Callable, Dict, List, Optional, Set
from urllib.parse import quote, unquote
@@ -244,15 +245,27 @@ class LSPClient:
await self._cleanup_process()
raise
@staticmethod
def _win_wrap_cmd(cmd: List[str]) -> List[str]:
"""On Windows, wrap .cmd/.bat shims so CreateProcess can run them."""
exe = cmd[0]
if exe.lower().endswith((".cmd", ".bat")):
return ["cmd.exe", "/c", *cmd]
return cmd
async def _spawn(self) -> None:
env = dict(os.environ)
if self._env:
env.update(self._env)
cmd = self._command
if sys.platform == "win32":
cmd = self._win_wrap_cmd(cmd)
try:
self._proc = await asyncio.create_subprocess_exec(
self._command[0],
*self._command[1:],
cmd[0],
*cmd[1:],
stdin=asyncio.subprocess.PIPE,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
@@ -261,7 +274,7 @@ class LSPClient:
)
except FileNotFoundError as e:
raise LSPProtocolError(
f"LSP server binary not found: {self._command[0]} ({e})"
f"LSP server binary not found: {cmd[0]} ({e})"
) from e
# Drain stderr at debug level — if we don't, the pipe buffer

View File

@@ -108,6 +108,11 @@ INSTALL_RECIPES: Dict[str, Dict[str, Any]] = {
_install_locks: Dict[str, threading.Lock] = {}
_install_results: Dict[str, Optional[str]] = {}
_install_lock_meta = threading.Lock()
_WINDOWS_WRAPPER_SUFFIXES = (".cmd", ".exe", ".bat")
def _is_windows() -> bool:
return os.name == "nt"
def hermes_lsp_bin_dir() -> Path:
@@ -120,14 +125,33 @@ def hermes_lsp_bin_dir() -> Path:
return p
def _native_binary_candidates(base: Path) -> list[Path]:
"""Return platform-native executable candidates for a staged binary."""
candidates = [base]
if _is_windows():
existing = {str(base).lower()}
for suffix in _WINDOWS_WRAPPER_SUFFIXES:
candidate = Path(str(base) + suffix)
key = str(candidate).lower()
if key not in existing:
candidates.append(candidate)
existing.add(key)
return candidates
def _existing_binary(name: str) -> Optional[str]:
"""Probe the staging dir + PATH for a binary named ``name``."""
staged = hermes_lsp_bin_dir() / name
if staged.exists() and os.access(staged, os.X_OK):
return str(staged)
for staged in _native_binary_candidates(hermes_lsp_bin_dir() / name):
if staged.exists() and os.access(staged, os.X_OK):
return str(staged)
on_path = shutil.which(name)
if on_path:
return on_path
if _is_windows():
for suffix in _WINDOWS_WRAPPER_SUFFIXES:
on_path = shutil.which(f"{name}{suffix}")
if on_path:
return on_path
return None
@@ -250,12 +274,7 @@ def _install_npm(
# Find the bin
nm_bin = staging / "node_modules" / ".bin" / bin_name
if os.name == "nt":
# On Windows npm sometimes drops `.cmd` shims
candidates = [nm_bin, nm_bin.with_suffix(".cmd")]
else:
candidates = [nm_bin]
for c in candidates:
for c in _native_binary_candidates(nm_bin):
if c.exists():
# Symlink into our `lsp/bin/` for stable PATH access.
link = hermes_lsp_bin_dir() / c.name
@@ -301,7 +320,7 @@ def _install_go(pkg: str, bin_name: str) -> Optional[str]:
logger.warning("[install] go install errored for %s: %s", pkg, e)
return None
bin_path = staging / bin_name
if os.name == "nt":
if _is_windows():
bin_path = bin_path.with_suffix(".exe")
if bin_path.exists():
return str(bin_path)
@@ -337,19 +356,24 @@ def _install_pip(pkg: str, bin_name: str) -> Optional[str]:
except (subprocess.TimeoutExpired, OSError) as e:
logger.warning("[install] pip install errored for %s: %s", pkg, e)
return None
# Look for the script
bin_path = pip_target / "bin" / bin_name
if bin_path.exists():
link = hermes_lsp_bin_dir() / bin_name
if not link.exists():
try:
link.symlink_to(bin_path)
except (OSError, NotImplementedError):
try:
shutil.copy2(bin_path, link)
except OSError:
return str(bin_path)
return str(link if link.exists() else bin_path)
# Look for the console script. POSIX wheels generally write to bin/,
# while native Windows installs use Scripts/.
script_dirs = [pip_target / "bin"]
if _is_windows():
script_dirs.append(pip_target / "Scripts")
for script_dir in script_dirs:
for bin_path in _native_binary_candidates(script_dir / bin_name):
if bin_path.exists():
link = hermes_lsp_bin_dir() / bin_path.name
if not link.exists():
try:
link.symlink_to(bin_path)
except (OSError, NotImplementedError):
try:
shutil.copy2(bin_path, link)
except OSError:
return str(bin_path)
return str(link if link.exists() else bin_path)
return None

View File

@@ -39,25 +39,20 @@ import logging
import os
import threading
import time
from concurrent.futures import Future as ConcurrentFuture
from typing import Any, Callable, Dict, List, Optional, Tuple
from agent.lsp import eventlog
from agent.lsp.client import (
DIAGNOSTICS_DOCUMENT_WAIT,
LSPClient,
file_uri,
)
from agent.lsp.servers import (
ServerContext,
ServerDef,
SpawnSpec,
find_server_for_file,
language_id_for,
)
from agent.lsp.workspace import (
clear_cache,
is_inside_workspace,
resolve_workspace_for_file,
)

View File

@@ -25,7 +25,7 @@ import shutil
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple
from agent.lsp.workspace import nearest_root, normalize_path
from agent.lsp.workspace import nearest_root
logger = logging.getLogger("agent.lsp.servers")

View File

@@ -491,6 +491,7 @@ class MemoryManager:
*,
parent_session_id: str = "",
reset: bool = False,
rewound: bool = False,
**kwargs,
) -> None:
"""Notify all providers that the agent's session_id has rotated.
@@ -503,9 +504,21 @@ class MemoryManager:
per-session state so subsequent writes land in the correct
session's record. See ``MemoryProvider.on_session_switch`` for
the full contract.
``rewound=True`` signals that session_id is unchanged but the
transcript was truncated; providers caching per-turn document
state should invalidate.
"""
if not new_session_id:
return
# Only forward ``rewound`` when it's actually set. Passing it
# unconditionally would inject ``rewound=False`` into every
# provider's **kwargs for the common /resume, /branch, /new, and
# compression paths, polluting providers that capture extra kwargs
# (and breaking exact-dict assertions). The /undo path sets
# rewound=True explicitly; everyone else stays clean.
if rewound:
kwargs["rewound"] = True
for provider in self._providers:
try:
provider.on_session_switch(

View File

@@ -178,6 +178,7 @@ class MemoryProvider(ABC):
*,
parent_session_id: str = "",
reset: bool = False,
rewound: bool = False,
**kwargs,
) -> None:
"""Called when the agent switches session_id mid-process.
@@ -207,6 +208,10 @@ class MemoryProvider(ABC):
(``_session_turns``, ``_turn_counter``, etc.) when this is
set. ``False`` for ``/resume`` / ``/branch`` / compression
where the logical conversation continues under the new id.
rewound:
``True`` if session_id is unchanged but the transcript was
truncated; providers caching per-turn document state should
invalidate.
Default is no-op for backward compatibility.
"""

View File

@@ -200,8 +200,12 @@ DEFAULT_CONTEXT_LENGTHS = {
"qwen3-coder-plus": 1000000, # 1M context
"qwen3-coder": 262144, # 256K context
"qwen": 131072,
# MiniMax — official docs: 204,800 context for all models
# https://platform.minimax.io/docs/api-reference/text-anthropic-api
# MiniMax — M3 is 1M context (max output 512K); M2.x series is 204,800.
# Keys use substring matching (longest-first), so "minimax-m3" wins over
# the generic "minimax" catch-all for the M3 slug on every surface
# (native MiniMax-M3, OpenRouter/Nous minimax/minimax-m3).
# https://platform.minimax.io/docs/api-reference/text-chat-openai
"minimax-m3": 1000000,
"minimax": 204800,
# GLM
"glm": 202752,

View File

@@ -15,18 +15,6 @@ and MoonshotAI/kimi-cli#1595:
2. When ``anyOf`` is used, ``type`` must be on the ``anyOf`` children, not
the parent. Presence of both causes "type should be defined in anyOf
items instead of the parent schema".
3. ``enum`` arrays on scalar-typed nodes may not contain ``null`` or empty
strings. Strip those entries (drop the enum entirely if it becomes empty).
4. ``$ref`` nodes may not carry sibling keywords. Moonshot expands the
reference before validation and then rejects the node if sibling keys
like ``description`` remain on the same node as ``$ref``. Strip every
sibling from ``$ref`` nodes so only ``{"$ref": "..."}`` survives.
(Ported from anomalyco/opencode#24730.)
5. ``items`` may not be a tuple-style array (``items: [schemaA, schemaB]``
for positional element schemas). Moonshot's schema engine requires a
single object schema applied to every array element. Collapse tuple
``items`` to the first element schema (or ``{}`` if the tuple is empty).
(Ported from anomalyco/opencode#24730.)
The ``#/definitions/...`` → ``#/$defs/...`` rewrite for draft-07 refs is
handled separately in ``tools/mcp_tool._normalize_mcp_input_schema`` so it
@@ -78,16 +66,6 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
}
elif key in _SCHEMA_LIST_KEYS and isinstance(value, list):
repaired[key] = [_repair_schema(v, is_schema=True) for v in value]
elif key == "items" and isinstance(value, list):
# Rule 5: tuple-style ``items`` arrays (positional element
# schemas) are not accepted by Moonshot. Collapse to the
# first element schema if present, else to ``{}``. This
# matches opencode's behaviour for moonshotai / kimi models.
first = value[0] if value else {}
if isinstance(first, dict):
repaired[key] = _repair_schema(first, is_schema=True)
else:
repaired[key] = first
elif key in _SCHEMA_NODE_KEYS:
# items / not / additionalProperties: single nested schema.
# additionalProperties can also be a bool — leave those alone.
@@ -152,15 +130,6 @@ def _repair_schema(node: Any, is_schema: bool = True) -> Any:
else:
repaired.pop("enum")
# Rule 4: $ref nodes must not have sibling keywords. Moonshot expands
# the reference before validation and then rejects the node if siblings
# like ``description`` / ``type`` / ``default`` appear alongside $ref.
# The referenced definition still carries its own description on the
# target node, which Moonshot accepts.
# (Ported from anomalyco/opencode#24730.)
if "$ref" in repaired:
return {"$ref": repaired["$ref"]}
return repaired

View File

@@ -7,7 +7,6 @@ assemble pieces, then combines them with memory and ephemeral prompts.
import json
import logging
import os
import re
import threading
from collections import OrderedDict
from pathlib import Path
@@ -236,6 +235,11 @@ KANBAN_GUIDANCE = (
"- Do not shell out to `hermes kanban <verb>` for board operations. Use "
"the `kanban_*` tools — they work across all terminal backends.\n"
"- Do not complete a task you didn't actually finish. Block it.\n"
"- Do not call `clarify` to ask questions. You are running headless — "
"there is no live user to answer. The call will time out and the task "
"will sit silently in `running` with no signal to the operator. Instead: "
"`kanban_comment` the context, then `kanban_block(reason=...)` so the "
"task surfaces on the board as needing input.\n"
"- Do not assign follow-up work to yourself. Assign it to the right "
"specialist profile.\n"
"- Do not call `delegate_task` as a board substitute. `delegate_task` is "
@@ -262,6 +266,37 @@ TOOL_USE_ENFORCEMENT_GUIDANCE = (
# Add new patterns here when a model family needs explicit steering.
TOOL_USE_ENFORCEMENT_MODELS = ("gpt", "codex", "gemini", "gemma", "grok", "glm", "qwen", "deepseek")
# Universal "finish the job" guidance — applied to ALL models, not gated
# by model family. Addresses two cross-model failure modes:
# 1. Stopping after a stub: writing a tiny file or running one command
# and then ending the turn with a description of the plan instead
# of the finished artifact. (Observed on Opus during a real
# Sarasota real-estate build task: 3 API calls, 85-byte file,
# one terminal command, finish_reason=stop.)
# 2. Fabricating output when a real path is blocked. When `pip` or a
# tool fails, some models will synthesize plausible-looking results
# (fake addresses, fake JSON, fake numbers) instead of reporting
# the blocker. (Observed on DeepSeek v4-flash on the same task:
# pushed through PEP-668 wall, then returned fabricated listings.)
#
# Short on purpose. This block is shipped to every user, every session,
# in the cached system prompt — token cost is paid once at install and
# then amortised across all sessions via prefix caching. Keep it tight.
TASK_COMPLETION_GUIDANCE = (
"# Finishing the job\n"
"When the user asks you to build, run, or verify something, the deliverable is "
"a working artifact backed by real tool output — not a description of one. "
"Do not stop after writing a stub, a plan, or a single command. Keep working "
"until you have actually exercised the code or produced the requested result, "
"then report what real execution returned.\n"
"If a tool, install, or network call fails and blocks the real path, say so "
"directly and try an alternative (different package manager, different "
"approach, ask the user). NEVER substitute plausible-looking fabricated "
"output (made-up data, invented file contents, synthesised API responses) "
"for results you couldn't actually produce. Reporting a blocker honestly "
"is always better than inventing a result."
)
# OpenAI GPT/Codex-specific execution guidance. Addresses known failure modes
# where GPT models abandon work on partial results, skip prerequisite lookups,
# hallucinate instead of using tools, and declare "done" without verification.
@@ -813,6 +848,27 @@ def build_environment_hints() -> str:
if is_wsl():
hints.append(WSL_ENVIRONMENT_HINT)
# Embedder-supplied environment description. Lets a host that wraps Hermes
# (e.g. a sandbox runner / managed platform) explain the environment the
# agent is running in — proxy, credential handling, mount layout — without
# forking the identity slot (SOUL.md). Read once at prompt-build time, so
# it's part of the stable, cache-safe system prompt. The env var is the
# build-time/embedder mechanism (set in a container ENV); config.yaml
# ``agent.environment_hint`` is the user-facing surface. Env var wins.
extra = (os.getenv("HERMES_ENVIRONMENT_HINT") or "").strip()
if not extra:
try:
from hermes_cli.config import load_config
extra = str(
(load_config().get("agent", {}) or {}).get("environment_hint", "")
).strip()
except Exception as e:
logger.debug("Could not read agent.environment_hint from config: %s", e)
if extra:
hints.append(extra)
return "\n\n".join(hints)

View File

@@ -150,10 +150,6 @@ _JWT_RE = re.compile(
r"(?:\.[A-Za-z0-9_=-]{4,}){0,2}" # Optional payload and/or signature
)
# Discord user/role mentions: <@123456789012345678> or <@!123456789012345678>
# Snowflake IDs are 17-20 digit integers that resolve to specific Discord accounts.
_DISCORD_MENTION_RE = re.compile(r"<@!?(\d{17,20})>")
# E.164 phone numbers: +<country><number>, 7-15 digits
# Negative lookahead prevents matching hex strings or identifiers
_SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")
@@ -331,7 +327,7 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
"""Apply all redaction patterns to a block of text.
Safe to call on any string -- non-matching text passes through unchanged.
Disabled by default — enable via security.redact_secrets: true in config.yaml.
Enabled by default. Disable via security.redact_secrets: false in config.yaml.
Set force=True for safety boundaries that must never return raw secrets
regardless of the user's global logging redaction preference.
@@ -419,10 +415,6 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
if "&" in text and "=" in text:
text = _redact_form_body(text)
# Discord user/role mentions (<@snowflake_id>)
if "<@" in text:
text = _DISCORD_MENTION_RE.sub(lambda m: f"<@{'!' if '!' in m.group(0) else ''}***>", text)
# E.164 phone numbers (Signal, WhatsApp)
if "+" in text:
def _redact_phone(m):

View File

@@ -37,7 +37,6 @@ import platform
import shutil
import stat
import subprocess
import sys
import tempfile
import time
import urllib.error

View File

@@ -37,6 +37,7 @@ from agent.prompt_builder import (
PLATFORM_HINTS,
SESSION_SEARCH_GUIDANCE,
SKILLS_GUIDANCE,
TASK_COMPLETION_GUIDANCE,
TOOL_USE_ENFORCEMENT_GUIDANCE,
TOOL_USE_ENFORCEMENT_MODELS,
)
@@ -100,6 +101,15 @@ def build_system_prompt_parts(agent: Any, system_message: Optional[str] = None)
# Pointer to the hermes-agent skill + docs for user questions about Hermes itself.
stable_parts.append(HERMES_AGENT_HELP_GUIDANCE)
# Universal task-completion / no-fabrication guidance. Applied to ALL
# models regardless of tool_use_enforcement gating — the failure modes
# this targets (stopping after a stub; fabricating output when a real
# path is blocked) are not model-family specific. Gated only by
# config.yaml ``agent.task_completion_guidance`` (default True) so
# users who want a leaner prompt can turn it off.
if getattr(agent, "_task_completion_guidance", True) and agent.valid_tool_names:
stable_parts.append(TASK_COMPLETION_GUIDANCE)
# Tool-aware behavioral guidance: only inject when the tools are loaded
tool_guidance = []
if "memory" in agent.valid_tool_names:
@@ -205,6 +215,23 @@ def build_system_prompt_parts(agent: Any, system_message: Optional[str] = None)
if _env_hints:
stable_parts.append(_env_hints)
# Local Python toolchain probe — names python/pip/uv/PEP-668 state when
# something is non-default so the model can pick the right install
# strategy without discovering by failure. Emits a single line; emits
# NOTHING when the environment is clean (no token cost). Skipped
# entirely for remote terminal backends (the host's Python state is
# irrelevant when tools run inside docker/modal/ssh). Gated by
# config.yaml ``agent.environment_probe`` (default True).
if getattr(agent, "_environment_probe", True):
try:
from tools.env_probe import get_environment_probe_line
_probe_line = get_environment_probe_line()
if _probe_line:
stable_parts.append(_probe_line)
except Exception:
# Probe failure must never block prompt build.
pass
# Active-profile hint — names the Hermes profile the agent is running
# under so it doesn't conflate ~/.hermes/skills/ (default profile) with
# ~/.hermes/profiles/<active>/skills/ (this profile's). Deterministic

View File

@@ -13,14 +13,13 @@ extracted functions reach back through the ``run_agent`` module via
from __future__ import annotations
import concurrent.futures
import contextvars
import json
import logging
import os
import random
import threading
import time
from typing import Any, Optional
from typing import Optional
from agent.display import (
KawaiiSpinner,
@@ -38,12 +37,9 @@ from agent.tool_dispatch_helpers import (
make_tool_result_message,
)
from tools.terminal_tool import (
_get_approval_callback,
_get_sudo_password_callback,
set_approval_callback as _set_approval_callback,
set_sudo_password_callback as _set_sudo_password_callback,
get_active_env,
)
from tools.thread_context import propagate_context_to_thread
from tools.tool_result_storage import (
maybe_persist_tool_result,
enforce_turn_budget,
@@ -62,6 +58,55 @@ def _ra():
return run_agent
def _tool_search_scoped_names(agent) -> frozenset:
"""Return the deferrable tool names the session may invoke via tool_call.
The Tool Search unwrap dispatches the underlying tool directly, bypassing
the bridge branch (and its scope check) in
``model_tools.handle_function_call``. To keep a restricted-toolset session
(subagent, kanban worker, curated gateway session) from reaching tools it
was never granted, the unwrap validates the underlying name against this
set: the deferrable subset of the session's own enabled/disabled toolset
scope.
Result is cached on the agent and refreshed when the tool registry's
generation changes (e.g. an MCP server reconnects), so the common case is
a dict lookup, not a full tool-defs rebuild on every tool call.
"""
try:
import model_tools
from tools import tool_search as _ts
from tools.registry import registry as _registry
except Exception:
return frozenset()
enabled = getattr(agent, "enabled_toolsets", None)
disabled = getattr(agent, "disabled_toolsets", None)
cache_key = (
getattr(_registry, "_generation", 0),
frozenset(enabled) if enabled is not None else None,
frozenset(disabled) if disabled is not None else None,
)
cached = getattr(agent, "_tool_search_scope_cache", None)
if cached is not None and cached[0] == cache_key:
return cached[1]
try:
scoped_defs = model_tools.get_tool_definitions(
enabled_toolsets=enabled,
disabled_toolsets=disabled,
quiet_mode=True,
skip_tool_search_assembly=True,
) or []
names = _ts.scoped_deferrable_names(scoped_defs)
except Exception:
names = frozenset()
try:
agent._tool_search_scope_cache = (cache_key, names)
except Exception:
pass
return names
def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effective_task_id: str, api_call_count: int = 0) -> None:
"""Execute multiple tool calls concurrently using a thread pool.
@@ -100,45 +145,89 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
if not isinstance(function_args, dict):
function_args = {}
# Checkpoint for file-mutating tools
if function_name in {"write_file", "patch"} and agent._checkpoint_mgr.enabled:
try:
file_path = function_args.get("path", "")
if file_path:
work_dir = agent._checkpoint_mgr.get_working_dir_for_path(file_path)
agent._checkpoint_mgr.ensure_checkpoint(work_dir, f"before {function_name}")
except Exception:
pass
# Checkpoint before destructive terminal commands
if function_name == "terminal" and agent._checkpoint_mgr.enabled:
try:
cmd = function_args.get("command", "")
if _is_destructive_command(cmd):
cwd = function_args.get("workdir") or os.getenv("TERMINAL_CWD", os.getcwd())
agent._checkpoint_mgr.ensure_checkpoint(
cwd, f"before terminal: {cmd[:60]}"
)
except Exception:
pass
# ── Tool Search unwrap ────────────────────────────────────────
# When the model invokes the tool_call bridge, peel it open so
# every downstream check (checkpointing, guardrails, plugin
# pre-tool-call hooks, the display/activity feed, the post-call
# callback) sees the underlying tool — not the bridge. This is
# the OpenClaw lesson: hooks must observe the real tool name.
#
# The original tool_call entry on ``tool_call.function`` is left
# untouched so the conversation transcript and the matching
# tool_call_id are preserved exactly as the model emitted them.
#
# Scope gate: the unwrap dispatches the underlying tool directly
# (bypassing the bridge branch in handle_function_call and its
# scope check), so we enforce session toolset scope HERE. A tool
# the session was not granted is rejected before any checkpoint,
# hook, or dispatch fires.
_ts_scope_block = None
try:
from tools import tool_search as _ts
if function_name == _ts.TOOL_CALL_NAME:
_underlying, _underlying_args, _err = _ts.resolve_underlying_call(function_args)
if not _err and _underlying:
if _underlying in _tool_search_scoped_names(agent):
function_name = _underlying
function_args = _underlying_args
else:
_ts_scope_block = json.dumps({
"error": (
f"'{_underlying}' is not available in this session. "
"Use tool_search to find tools you can call."
),
}, ensure_ascii=False)
except Exception:
pass
# ── Block evaluation (BEFORE checkpoint preflight) ───────────
# We must know whether the tool will execute before touching
# checkpoint state (dedup slot, real snapshots).
block_result = None
blocked_by_guardrail = False
try:
from hermes_cli.plugins import get_pre_tool_call_block_message
block_message = get_pre_tool_call_block_message(
function_name, function_args, task_id=effective_task_id or "",
)
except Exception:
block_message = None
if block_message is not None:
block_result = json.dumps({"error": block_message}, ensure_ascii=False)
if _ts_scope_block is not None:
# Out-of-scope tool_call: reject before hooks/guardrails/dispatch.
block_result = _ts_scope_block
else:
guardrail_decision = agent._tool_guardrails.before_call(function_name, function_args)
if not guardrail_decision.allows_execution:
block_result = agent._guardrail_block_result(guardrail_decision)
blocked_by_guardrail = True
try:
from hermes_cli.plugins import get_pre_tool_call_block_message
block_message = get_pre_tool_call_block_message(
function_name, function_args, task_id=effective_task_id or "",
)
except Exception:
block_message = None
if block_message is not None:
block_result = json.dumps({"error": block_message}, ensure_ascii=False)
else:
guardrail_decision = agent._tool_guardrails.before_call(function_name, function_args)
if not guardrail_decision.allows_execution:
block_result = agent._guardrail_block_result(guardrail_decision)
blocked_by_guardrail = True
# ── Checkpoint preflight (only for tools that will execute) ──
if block_result is None:
# Checkpoint for file-mutating tools
if function_name in {"write_file", "patch"} and agent._checkpoint_mgr.enabled:
try:
file_path = function_args.get("path", "")
if file_path:
work_dir = agent._checkpoint_mgr.get_working_dir_for_path(file_path)
agent._checkpoint_mgr.ensure_checkpoint(work_dir, f"before {function_name}")
except Exception:
pass
# Checkpoint before destructive terminal commands
if function_name == "terminal" and agent._checkpoint_mgr.enabled:
try:
cmd = function_args.get("command", "")
if _is_destructive_command(cmd):
cwd = function_args.get("workdir") or os.getenv("TERMINAL_CWD", os.getcwd())
agent._checkpoint_mgr.ensure_checkpoint(
cwd, f"before terminal: {cmd[:60]}"
)
except Exception:
pass
parsed_calls.append((tool_call, function_name, function_args, block_result, blocked_by_guardrail))
@@ -186,14 +275,6 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
agent._current_tool = tool_names_str
agent._touch_activity(f"executing {num_tools} tools concurrently: {tool_names_str}")
# Capture CLI callbacks from the agent thread so worker threads can
# register them locally. Without this, _get_approval_callback() in
# terminal_tool returns None in ThreadPoolExecutor workers, causing
# the dangerous-command prompt to fall back to input() — which
# deadlocks against prompt_toolkit's raw terminal mode (#13617).
_parent_approval_cb = _get_approval_callback()
_parent_sudo_cb = _get_sudo_password_callback()
def _run_tool(index, tool_call, function_name, function_args):
"""Worker function executed in a thread."""
# Register this worker tid so the agent can fan out an interrupt
@@ -220,54 +301,43 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
set_activity_callback(agent._touch_activity)
except Exception:
pass
# Propagate approval/sudo callbacks to this worker thread.
# Mirrors cli.py run_agent() pattern (GHSA-qg5c-hvr5-hjgr).
if _parent_approval_cb is not None:
try:
_set_approval_callback(_parent_approval_cb)
except Exception:
pass
if _parent_sudo_cb is not None:
try:
_set_sudo_password_callback(_parent_sudo_cb)
except Exception:
pass
# Approval/sudo callbacks (thread-local) and the agent turn's
# ContextVars are propagated by propagate_context_to_thread() at the
# submit site below (GHSA-qg5c-hvr5-hjgr, #13617).
start = time.time()
try:
result = agent._invoke_tool(
function_name,
function_args,
effective_task_id,
tool_call.id,
messages=messages,
pre_tool_block_checked=True,
)
except Exception as tool_error:
result = f"Error executing tool '{function_name}': {tool_error}"
logger.error("_invoke_tool raised for %s: %s", function_name, tool_error, exc_info=True)
duration = time.time() - start
is_error, _ = _detect_tool_failure(function_name, result)
if is_error:
logger.info("tool %s failed (%.2fs): %s", function_name, duration, result[:200])
else:
logger.info("tool %s completed (%.2fs, %d chars)", function_name, duration, len(result))
results[index] = (function_name, function_args, result, duration, is_error, False)
# Tear down worker-tid tracking. Clear any interrupt bit we may
# have set so the next task scheduled onto this recycled tid
# starts with a clean slate.
with agent._tool_worker_threads_lock:
agent._tool_worker_threads.discard(_worker_tid)
try:
_ra()._set_interrupt(False, _worker_tid)
except Exception:
pass
# Clear thread-local callbacks so a recycled worker thread
# doesn't hold stale references to a disposed CLI instance.
try:
_set_approval_callback(None)
_set_sudo_password_callback(None)
except Exception:
pass
try:
result = agent._invoke_tool(
function_name,
function_args,
effective_task_id,
tool_call.id,
messages=messages,
pre_tool_block_checked=True,
)
except Exception as tool_error:
result = f"Error executing tool '{function_name}': {tool_error}"
logger.error("_invoke_tool raised for %s: %s", function_name, tool_error, exc_info=True)
duration = time.time() - start
is_error, _ = _detect_tool_failure(function_name, result)
if is_error:
logger.info("tool %s failed (%.2fs): %s", function_name, duration, result[:200])
else:
logger.info("tool %s completed (%.2fs, %d chars)", function_name, duration, len(result))
results[index] = (function_name, function_args, result, duration, is_error, False)
finally:
# Tear down worker-tid tracking. Clear any interrupt bit we may
# have set so the next task scheduled onto this recycled tid
# starts with a clean slate. This MUST be in a finally block
# because BaseException subclasses (CancelledError, KeyboardInterrupt)
# bypass ``except Exception`` and would otherwise leak the tid
# into _interrupted_threads, poisoning the recycled thread.
with agent._tool_worker_threads_lock:
agent._tool_worker_threads.discard(_worker_tid)
try:
_ra()._set_interrupt(False, _worker_tid)
except Exception:
pass
# Start spinner for CLI mode (skip when TUI handles tool progress)
spinner = None
@@ -287,9 +357,12 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
max_workers = min(len(runnable_calls), _MAX_TOOL_WORKERS)
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
for i, tc, name, args in runnable_calls:
# Propagate ContextVars (e.g. _approval_session_key); mirrors asyncio.to_thread.
ctx = contextvars.copy_context()
f = executor.submit(ctx.run, _run_tool, i, tc, name, args)
# Propagate the agent turn's ContextVars (e.g.
# _approval_session_key) AND thread-local approval/sudo
# callbacks into the worker thread; clears callbacks on exit.
f = executor.submit(
propagate_context_to_thread(_run_tool), i, tc, name, args
)
futures.append(f)
# Wait for all to complete with periodic heartbeats so the
@@ -497,16 +570,39 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
if not isinstance(function_args, dict):
function_args = {}
# Check plugin hooks for a block directive before executing.
_block_msg: Optional[str] = None
# Tool Search unwrap — see execute_tool_calls_concurrent for full
# rationale, including the scope gate (the unwrap dispatches the
# underlying tool directly, so session toolset scope is enforced here).
_ts_scope_block: Optional[str] = None
try:
from hermes_cli.plugins import get_pre_tool_call_block_message
_block_msg = get_pre_tool_call_block_message(
function_name, function_args, task_id=effective_task_id or "",
)
from tools import tool_search as _ts
if function_name == _ts.TOOL_CALL_NAME:
_underlying, _underlying_args, _err = _ts.resolve_underlying_call(function_args)
if not _err and _underlying:
if _underlying in _tool_search_scoped_names(agent):
function_name = _underlying
function_args = _underlying_args
else:
_ts_scope_block = (
f"'{_underlying}' is not available in this session. "
"Use tool_search to find tools you can call."
)
except Exception:
pass
# Check plugin hooks for a block directive before executing.
_block_msg: Optional[str] = None
if _ts_scope_block is not None:
_block_msg = _ts_scope_block
else:
try:
from hermes_cli.plugins import get_pre_tool_call_block_message
_block_msg = get_pre_tool_call_block_message(
function_name, function_args, task_id=effective_task_id or "",
)
except Exception:
pass
_guardrail_block_decision: ToolGuardrailDecision | None = None
if _block_msg is None:
guardrail_decision = agent._tool_guardrails.before_call(function_name, function_args)
@@ -667,10 +763,14 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
elif function_name == "delegate_task":
tasks_arg = function_args.get("tasks")
if tasks_arg and isinstance(tasks_arg, list):
spinner_label = f"🔀 delegating {len(tasks_arg)} tasks"
spinner_label = f"🔀 delegating {len(tasks_arg)} tasks · (/agents to monitor)"
else:
goal_preview = (function_args.get("goal") or "")[:30]
spinner_label = f"🔀 {goal_preview}" if goal_preview else "🔀 delegating"
spinner_label = (
f"🔀 {goal_preview} · (/agents to monitor)"
if goal_preview
else "🔀 delegating · (/agents to monitor)"
)
spinner = None
if agent._should_emit_quiet_tool_messages() and agent._should_start_quiet_spinner():
face = random.choice(KawaiiSpinner.get_waiting_faces())
@@ -752,6 +852,8 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
session_id=agent.session_id or "",
enabled_tools=list(agent.valid_tool_names) if agent.valid_tool_names else None,
skip_pre_tool_call_hook=True,
enabled_toolsets=getattr(agent, "enabled_toolsets", None),
disabled_toolsets=getattr(agent, "disabled_toolsets", None),
)
_spinner_result = function_result
except Exception as tool_error:
@@ -772,6 +874,8 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
session_id=agent.session_id or "",
enabled_tools=list(agent.valid_tool_names) if agent.valid_tool_names else None,
skip_pre_tool_call_hook=True,
enabled_toolsets=getattr(agent, "enabled_toolsets", None),
disabled_toolsets=getattr(agent, "disabled_toolsets", None),
)
except Exception as tool_error:
function_result = f"Error executing tool '{function_name}': {tool_error}"

View File

@@ -10,7 +10,7 @@ reasoning configuration, temperature handling, and extra_body assembly.
"""
import copy
from typing import Any, Dict, List, Optional
from typing import Any, Dict
from agent.lmstudio_reasoning import resolve_lmstudio_effort
from agent.moonshot_schema import is_moonshot_model, sanitize_moonshot_tools
@@ -476,13 +476,17 @@ class ChatCompletionsTransport(ProviderTransport):
ephemeral = params.get("ephemeral_max_output_tokens")
user_max = params.get("max_tokens")
anthropic_max = params.get("anthropic_max_output")
# Per-model default cap — profiles override get_max_tokens() when
# they front several backends with different completion-token limits
# (e.g. opencode-go: mimo-v2.5-pro = 131072).
profile_max = profile.get_max_tokens(model)
if ephemeral is not None and max_tokens_fn:
api_kwargs.update(max_tokens_fn(ephemeral))
elif user_max is not None and max_tokens_fn:
api_kwargs.update(max_tokens_fn(user_max))
elif profile.default_max_tokens and max_tokens_fn:
api_kwargs.update(max_tokens_fn(profile.default_max_tokens))
elif profile_max and max_tokens_fn:
api_kwargs.update(max_tokens_fn(profile_max))
elif anthropic_max is not None:
api_kwargs["max_tokens"] = anthropic_max

View File

@@ -23,7 +23,7 @@ import subprocess
import threading
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional
from typing import Any, Optional
# Default minimum codex version we test against. The PR sets this from the
# `codex --version` parsed at install time; bumping is a one-line change here.

View File

@@ -31,6 +31,7 @@ import time
from dataclasses import dataclass, field
from typing import Any, Callable, Optional
from agent.codex_responses_adapter import _format_responses_error
from agent.redact import redact_sensitive_text
from agent.transports.codex_app_server import (
CodexAppServerClient,
@@ -581,7 +582,7 @@ class CodexAppServerSession:
(note.get("params") or {}).get("turn") or {}
).get("error")
if err_obj:
err_msg = err_obj.get("message") or str(err_obj)
err_msg = _format_responses_error(err_obj, str(turn_status))
# If the turn failed for an auth/refresh reason,
# rewrite the error into a re-auth hint AND mark
# the session for retirement.

40
apps/bootstrap-installer/.gitignore vendored Normal file
View File

@@ -0,0 +1,40 @@
# Rust / Cargo
/src-tauri/target/
/src-tauri/Cargo.lock
# Vite / build output
/dist/
/dist-ssr/
*.local
# TypeScript build info + tsc emit (we don't ship .js for the
# vite.config.ts; Vite reads it directly via ts-node-style loader).
*.tsbuildinfo
vite.config.d.ts
vite.config.js
# Tauri generated artifacts (regenerated on each build)
/src-tauri/gen/schemas/
# Logs
*.log
npm-debug.log*
yarn-debug.log*
yarn-error.log*
# Editor
.vscode/*
!.vscode/extensions.json
.idea/
.DS_Store
*.suo
*.ntvs*
*.njsproj
*.sln
*.sw?
# Node
node_modules/
# Internal placeholder (re-create if needed)
.tauri-note

View File

@@ -0,0 +1,12 @@
<!doctype html>
<html lang="en" class="h-full">
<head>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<title>Hermes Setup</title>
</head>
<body class="h-full antialiased">
<div id="root" class="h-full"></div>
<script type="module" src="/src/main.tsx"></script>
</body>
</html>

View File

@@ -0,0 +1,46 @@
{
"name": "@hermes/bootstrap-installer",
"private": true,
"version": "0.0.1",
"description": "Hermes Setup — signed installer that drives scripts/install.ps1 with a polished native UI.",
"type": "module",
"scripts": {
"dev": "vite --host 127.0.0.1 --port 5175",
"build": "tsc -b && vite build",
"preview": "vite preview",
"tauri": "tauri",
"tauri:dev": "tauri dev",
"tauri:build": "tauri build",
"tauri:build:debug": "tauri build --debug"
},
"dependencies": {
"@nous-research/ui": "0.16.0",
"@tailwindcss/vite": "^4.2.1",
"@tailwindcss/typography": "^0.5.19",
"@tauri-apps/api": "^2.0.0",
"@tauri-apps/plugin-dialog": "^2.0.0",
"@tauri-apps/plugin-opener": "^2.0.0",
"@tauri-apps/plugin-process": "^2.0.0",
"@tauri-apps/plugin-shell": "^2.0.0",
"@vscode/codicons": "^0.0.45",
"class-variance-authority": "^0.7.1",
"clsx": "^2.1.1",
"katex": "^0.16.45",
"lucide-react": "^0.577.0",
"nanostores": "^1.3.0",
"radix-ui": "^1.4.3",
"react": "^19.2.4",
"react-dom": "^19.2.4",
"tailwind-merge": "^3.5.0",
"tailwindcss": "^4.2.1",
"tw-shimmer": "^0.4.11"
},
"devDependencies": {
"@tauri-apps/cli": "^2.0.0",
"@types/react": "^19.2.14",
"@types/react-dom": "^19.2.3",
"@vitejs/plugin-react": "^5.2.0",
"typescript": "~5.9.3",
"vite": "^7.3.1"
}
}

View File

@@ -0,0 +1,75 @@
[package]
name = "hermes-bootstrap"
version = "0.0.1"
description = "Hermes Setup — signed installer that drives scripts/install.ps1"
authors = ["Nous Research <info@nousresearch.com>"]
edition = "2021"
rust-version = "1.77"
# Rename the output binary so the distributed artifact is literally
# `Hermes-Setup.exe` on disk — not `hermes-bootstrap.exe`. Grandma sees
# what we hand her, period. Tauri honors [[bin]] over [package].name
# for the produced executable name.
[[bin]]
name = "Hermes-Setup"
path = "src/main.rs"
# The library target name MUST match the `withGlobalTauri` binding name that
# tauri.conf.json's `app.windows[].label` references. We don't ship a separate
# lib for now; everything is in src/.
[lib]
name = "hermes_bootstrap_lib"
crate-type = ["staticlib", "cdylib", "rlib"]
[build-dependencies]
tauri-build = { version = "2", features = [] }
[dependencies]
# Tauri runtime + plugins
tauri = { version = "2", features = [] }
tauri-plugin-dialog = "2"
tauri-plugin-opener = "2"
tauri-plugin-process = "2"
tauri-plugin-shell = "2"
# Async + IO
tokio = { version = "1", features = ["full"] }
futures = "0.3"
# Serialization
serde = { version = "1", features = ["derive"] }
serde_json = "1"
# HTTP — rustls so we don't need OpenSSL on the build box
reqwest = { version = "0.12", default-features = false, features = ["rustls-tls", "stream"] }
# Logging — emitted to a file under HERMES_HOME/logs/ and (optionally) the
# webview console via Tauri's event channel.
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "fmt"] }
tracing-appender = "0.2"
# Paths + utils
dirs = "5"
which = "6"
anyhow = "1"
thiserror = "1"
once_cell = "1"
uuid = { version = "1", features = ["v4"] }
# Process control on Windows (CREATE_NO_WINDOW etc.)
[target.'cfg(windows)'.dependencies]
windows-sys = { version = "0.59", features = [
"Win32_Foundation",
"Win32_System_Threading",
"Win32_System_Console",
"Win32_UI_WindowsAndMessaging",
] }
[profile.release]
# A 5-10MB signed installer is the goal. LTO + size-opt + single codegen unit.
panic = "abort"
codegen-units = 1
lto = true
opt-level = "s"
strip = true

View File

@@ -0,0 +1,150 @@
use std::process::Command;
fn main() {
// -----------------------------------------------------------------
// Bake the install.ps1 pin into the binary at compile time.
//
// BUILD_PIN_COMMIT and BUILD_PIN_BRANCH are read by bootstrap.rs's
// `option_env!()` macro to default the install-script reference.
// Precedence (matches install.ps1's own arg precedence): commit > branch.
//
// Resolution order:
// 1. Env var override at build time (HERMES_BUILD_PIN_COMMIT, etc.).
// Useful for CI builds that want to pin to a tagged release SHA
// rather than whatever the checkout's HEAD happens to be.
// 2. `git rev-parse HEAD` + `git rev-parse --abbrev-ref HEAD` against
// the repo this build.rs lives in. Default for `cargo tauri build`
// from a dev machine — pins the produced .exe to your current
// checkout state.
// 3. Last-resort fallback: hardcoded `main` branch, no commit. The
// installer will fetch HEAD-of-main at runtime. Used when the
// build is happening outside a git checkout (e.g. cargo install
// from a packaged crate, unlikely for this binary but defensive).
//
// Build script reruns on git HEAD change so a new commit triggers
// a rebuild without `cargo clean`.
// -----------------------------------------------------------------
let commit = resolve_commit_pin();
let branch = resolve_branch_pin();
if let Some(c) = &commit {
println!("cargo:rustc-env=BUILD_PIN_COMMIT={c}");
println!("cargo:warning=hermes-bootstrap: pinning to commit {}", short(c));
}
if let Some(b) = &branch {
println!("cargo:rustc-env=BUILD_PIN_BRANCH={b}");
println!("cargo:warning=hermes-bootstrap: pinning to branch {b}");
}
if commit.is_none() && branch.is_none() {
// Fail loudly rather than silently produce a binary that errors
// at runtime with "no install-script pin supplied". A build that
// can't resolve a pin almost certainly indicates a misconfigured
// build environment.
println!(
"cargo:warning=hermes-bootstrap: no pin resolved at build time; binary will fail at runtime without HERMES_SETUP_DEV_REPO_ROOT or runtime args"
);
}
// Rerun build.rs when HEAD moves so successive builds pick up new
// commits without needing `cargo clean`. .git/HEAD changes on every
// commit / branch switch / rebase.
let git_dir = locate_git_dir();
if let Some(gd) = &git_dir {
println!("cargo:rerun-if-changed={}/HEAD", gd.display());
// .git/HEAD often points at a ref (e.g. `ref: refs/heads/bb/gui`);
// also watch the ref itself so a new commit on the same branch
// re-triggers.
if let Ok(head) = std::fs::read_to_string(gd.join("HEAD")) {
if let Some(rest) = head.trim().strip_prefix("ref: ") {
println!("cargo:rerun-if-changed={}/{}", gd.display(), rest);
}
}
}
println!("cargo:rerun-if-env-changed=HERMES_BUILD_PIN_COMMIT");
println!("cargo:rerun-if-env-changed=HERMES_BUILD_PIN_BRANCH");
// -----------------------------------------------------------------
// Tauri windows manifest. See hermes-setup.manifest for rationale —
// declares level="asInvoker" so Windows's installer-detection
// heuristic doesn't refuse to launch us without UAC elevation.
// -----------------------------------------------------------------
#[cfg(target_os = "windows")]
let attrs = {
let manifest = include_str!("hermes-setup.manifest");
let win = tauri_build::WindowsAttributes::new().app_manifest(manifest);
tauri_build::Attributes::new().windows_attributes(win)
};
#[cfg(not(target_os = "windows"))]
let attrs = tauri_build::Attributes::new();
tauri_build::try_build(attrs).expect("failed to run tauri-build");
}
fn resolve_commit_pin() -> Option<String> {
if let Ok(v) = std::env::var("HERMES_BUILD_PIN_COMMIT") {
if !v.trim().is_empty() {
return Some(v.trim().to_string());
}
}
let out = Command::new("git")
.args(["rev-parse", "HEAD"])
.output()
.ok()?;
if !out.status.success() {
return None;
}
let s = String::from_utf8(out.stdout).ok()?.trim().to_string();
if s.is_empty() {
None
} else {
Some(s)
}
}
fn resolve_branch_pin() -> Option<String> {
if let Ok(v) = std::env::var("HERMES_BUILD_PIN_BRANCH") {
if !v.trim().is_empty() {
return Some(v.trim().to_string());
}
}
let out = Command::new("git")
.args(["rev-parse", "--abbrev-ref", "HEAD"])
.output()
.ok()?;
if !out.status.success() {
return None;
}
let s = String::from_utf8(out.stdout).ok()?.trim().to_string();
// "HEAD" is what you get on a detached checkout — no meaningful branch
// to pin to. The commit pin still applies; just don't emit a branch.
if s.is_empty() || s == "HEAD" {
None
} else {
Some(s)
}
}
fn locate_git_dir() -> Option<std::path::PathBuf> {
let out = Command::new("git")
.args(["rev-parse", "--git-dir"])
.output()
.ok()?;
if !out.status.success() {
return None;
}
let s = String::from_utf8(out.stdout).ok()?.trim().to_string();
if s.is_empty() {
return None;
}
Some(std::path::PathBuf::from(s))
}
fn short(commit: &str) -> &str {
if commit.len() >= 12 {
&commit[..12]
} else {
commit
}
}

View File

@@ -0,0 +1,16 @@
{
"$schema": "https://schema.tauri.app/config/2/capability",
"identifier": "default",
"description": "Capabilities required by Hermes Setup. Narrowly scoped: we don't write user files outside HERMES_HOME, we don't read arbitrary paths, and the only external network call goes through reqwest (Rust side, not exposed to the webview).",
"windows": ["main"],
"permissions": [
"core:default",
"core:window:allow-close",
"core:window:allow-minimize",
"core:event:default",
"opener:default",
"dialog:default",
"process:default",
"shell:default"
]
}

View File

@@ -0,0 +1,75 @@
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!--
Hermes Setup application manifest.
The TL;DR: tell Windows we are NOT an installer in the classic "needs
UAC elevation" sense, despite the product name. We provision into
%LOCALAPPDATA%\hermes which is user-scoped and never touch HKLM or
Program Files. install.ps1 runs as a child process and elevates
itself only if a future stage explicitly needs HKLM access.
Without this manifest, the "Hermes Setup" productName embedded in
the binary's resource trips Windows's installer-detection heuristic
(https://learn.microsoft.com/en-us/windows/security/identity-protection/
user-account-control/how-user-account-control-works#installer-detection)
and CreateProcess fails with ERROR_ELEVATION_REQUIRED (740) when the
user double-clicks. asInvoker disables that.
-->
<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
<assemblyIdentity
version="0.0.1.0"
processorArchitecture="*"
name="NousResearch.Hermes.Setup"
type="win32"
/>
<description>Hermes Setup</description>
<trustInfo xmlns="urn:schemas-microsoft-com:asm.v3">
<security>
<requestedPrivileges>
<requestedExecutionLevel level="asInvoker" uiAccess="false"/>
</requestedPrivileges>
</security>
</trustInfo>
<!-- Tell Windows we know about all supported OSes (10 + 11) so it
doesn't shim us into Vista-compat mode. -->
<compatibility xmlns="urn:schemas-microsoft-com:compatibility.v1">
<application>
<!-- Windows 10 / 11 -->
<supportedOS Id="{8e0f7a12-bfb3-4fe8-b9a5-48fd50a15a9a}"/>
<!-- Windows 8.1 -->
<supportedOS Id="{1f676c76-80e1-4239-95bb-83d0f6d0da78}"/>
<!-- Windows 8 -->
<supportedOS Id="{4a2f28e3-53b9-4441-ba9c-d69d4a4a6e38}"/>
<!-- Windows 7 -->
<supportedOS Id="{35138b9a-5d96-4fbd-8e2d-a2440225f93a}"/>
<!-- Windows Vista -->
<supportedOS Id="{e2011457-1546-43c5-a5fe-008deee3d3f0}"/>
</application>
</compatibility>
<!-- Per-monitor v2 DPI awareness so the installer doesn't go blurry
on high-DPI displays when dragged between monitors. -->
<application xmlns="urn:schemas-microsoft-com:asm.v3">
<windowsSettings>
<dpiAwareness xmlns="http://schemas.microsoft.com/SMI/2016/WindowsSettings">PerMonitorV2</dpiAwareness>
<activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
</windowsSettings>
</application>
<!-- Use the modern common controls (v6 themes). Without this, our
file picker / shell dialogs fall back to 1990s-era visuals. -->
<dependency>
<dependentAssembly>
<assemblyIdentity
type="win32"
name="Microsoft.Windows.Common-Controls"
version="6.0.0.0"
processorArchitecture="*"
publicKeyToken="6595b64144ccf1df"
language="*"
/>
</dependentAssembly>
</dependency>
</assembly>

Binary file not shown.

After

Width:  |  Height:  |  Size: 674 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 674 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 674 KiB

Binary file not shown.

Binary file not shown.

After

Width:  |  Height:  |  Size: 78 KiB

View File

@@ -0,0 +1,712 @@
//! Bootstrap orchestration.
//!
//! Direct port of `runBootstrap` from `apps/desktop/electron/bootstrap-runner.cjs`.
//! Drives install.ps1 / install.sh stage-by-stage, emits progress events
//! over the Tauri `bootstrap` channel, writes a forensic log to
//! HERMES_HOME/logs/bootstrap-<timestamp>.log.
//!
//! Lifecycle:
//! 1. `start_bootstrap` (Tauri command) → spawns the worker task.
//! 2. Worker resolves install script (dev/cache/download).
//! 3. Worker calls `install.ps1 -Manifest` → emits `manifest` event.
//! 4. Worker iterates stages, calling `install.ps1 -Stage NAME -NonInteractive -Json`.
//! 5. On success → `complete`. On any stage failure → `failed`. On cancel → `failed`.
use std::path::PathBuf;
use std::sync::Arc;
use std::time::Instant;
use anyhow::{anyhow, Result};
use serde::{Deserialize, Serialize};
use tauri::{AppHandle, Emitter, State};
use tokio::sync::{mpsc, Mutex};
use crate::events::{BootstrapEvent, Manifest, StageState};
use crate::install_script::{self, Pin, ScriptKind, ScriptSource};
use crate::powershell::{self, StreamSink};
use crate::AppState;
// ---------------------------------------------------------------------------
// Public Tauri commands
// ---------------------------------------------------------------------------
/// Frontend → Rust: kick off the install.
#[derive(Debug, Deserialize)]
pub struct StartBootstrapArgs {
/// Optional override for the commit pin. Defaults to the build-time
/// pin baked in via `BUILD_PIN_COMMIT`.
pub commit: Option<String>,
/// Optional override for the branch pin. Defaults to `BUILD_PIN_BRANCH`.
pub branch: Option<String>,
/// Include Stage-Desktop (build apps/desktop) in the manifest. The
/// signed bootstrap installer passes true; the deprecated Electron-side
/// bootstrap-runner passes false to avoid building-while-running.
#[serde(default = "default_true")]
pub include_desktop: bool,
/// Optional override for HERMES_HOME. Tests use this; production
/// almost always falls back to the OS default.
pub hermes_home: Option<String>,
}
fn default_true() -> bool {
true
}
#[derive(Debug, Serialize)]
pub struct BootstrapStatus {
pub running: bool,
pub completed: bool,
pub install_root: Option<String>,
pub last_error: Option<String>,
}
/// Handle stored in AppState while a bootstrap run is in flight. Carries
/// the cancellation channel and the most recent terminal status so the
/// frontend can re-query after a window refresh.
pub struct BootstrapHandle {
pub cancel_tx: mpsc::Sender<()>,
pub started_at: Instant,
pub status: BootstrapStatus,
}
#[tauri::command]
pub async fn start_bootstrap(
app: AppHandle,
state: State<'_, Arc<AppState>>,
args: StartBootstrapArgs,
) -> Result<(), String> {
let mut guard = state.bootstrap.lock().await;
if let Some(h) = guard.as_ref() {
if h.status.running {
return Err("Bootstrap is already running".into());
}
}
let (cancel_tx, cancel_rx) = mpsc::channel::<()>(1);
let handle = BootstrapHandle {
cancel_tx,
started_at: Instant::now(),
status: BootstrapStatus {
running: true,
completed: false,
install_root: None,
last_error: None,
},
};
*guard = Some(handle);
drop(guard);
let app_for_task = app.clone();
let state_for_task = state.inner().clone();
let args_for_task = args;
let cancel_rx = Arc::new(Mutex::new(Some(cancel_rx)));
tokio::spawn(async move {
let result = run_bootstrap(app_for_task.clone(), args_for_task, cancel_rx).await;
// Reflect terminal state into AppState so get_bootstrap_status()
// can serve it after the task exits.
let mut guard = state_for_task.bootstrap.lock().await;
if let Some(h) = guard.as_mut() {
h.status.running = false;
match &result {
Ok(install_root) => {
h.status.completed = true;
h.status.install_root = Some(install_root.clone());
h.status.last_error = None;
}
Err(err) => {
h.status.completed = false;
h.status.last_error = Some(err.to_string());
}
}
}
});
Ok(())
}
#[tauri::command]
pub async fn cancel_bootstrap(state: State<'_, Arc<AppState>>) -> Result<(), String> {
let guard = state.bootstrap.lock().await;
if let Some(h) = guard.as_ref() {
let _ = h.cancel_tx.try_send(());
}
Ok(())
}
#[tauri::command]
pub async fn get_bootstrap_status(
state: State<'_, Arc<AppState>>,
) -> Result<BootstrapStatus, String> {
let guard = state.bootstrap.lock().await;
Ok(match guard.as_ref() {
Some(h) => BootstrapStatus {
running: h.status.running,
completed: h.status.completed,
install_root: h.status.install_root.clone(),
last_error: h.status.last_error.clone(),
},
None => BootstrapStatus {
running: false,
completed: false,
install_root: None,
last_error: None,
},
})
}
/// Spawn the locally-built Hermes desktop binary, then close the installer
/// window. Caller resolves the binary path from `install_root`.
///
/// Returns Err with a human-readable message if the binary doesn't exist
/// (e.g. when Stage-Desktop was skipped) so the frontend can present
/// actionable failure UI rather than silently doing nothing.
#[tauri::command]
pub async fn launch_hermes_desktop(
app: AppHandle,
install_root: String,
) -> Result<(), String> {
let install_root = PathBuf::from(install_root);
let exe_path = resolve_hermes_desktop_exe(&install_root).ok_or_else(|| {
format!(
"Couldn't find a built Hermes desktop at {}. The desktop build step \
may have been skipped or failed. Run `hermes desktop` from a \
terminal to build and launch it.",
install_root.join("apps").join("desktop").join("release").display()
)
})?;
tracing::info!(?exe_path, "launching Hermes desktop");
// Detach from us — the installer is about to exit.
let mut cmd = tokio::process::Command::new(&exe_path);
cmd.current_dir(exe_path.parent().unwrap_or(&install_root));
#[cfg(target_os = "windows")]
{
use std::os::windows::process::CommandExt;
// DETACHED_PROCESS = 0x00000008
cmd.creation_flags(0x0000_0008);
}
cmd.spawn().map_err(|e| {
format!(
"failed to launch {}: {e}",
exe_path.display()
)
})?;
// Give Windows ~150ms to actually start the new process before we exit.
tokio::time::sleep(std::time::Duration::from_millis(150)).await;
// Exit the installer cleanly. Tauri's process plugin gives us the
// right hook regardless of platform.
app.exit(0);
Ok(())
}
/// Walks the well-known electron-builder unpacked-app paths under
/// `install_root`. Mirrors the resolver in `cmd_gui` (apps/desktop/release/
/// <os>-unpacked/<exe>).
fn resolve_hermes_desktop_exe(install_root: &std::path::Path) -> Option<PathBuf> {
let release_dir = install_root.join("apps").join("desktop").join("release");
let candidates: &[(&str, &str)] = if cfg!(target_os = "windows") {
&[
("win-unpacked", "Hermes.exe"),
("win-arm64-unpacked", "Hermes.exe"),
]
} else if cfg!(target_os = "macos") {
&[
("mac/Hermes.app/Contents/MacOS", "Hermes"),
("mac-arm64/Hermes.app/Contents/MacOS", "Hermes"),
]
} else {
&[("linux-unpacked", "hermes")]
};
for (subdir, exe) in candidates {
let p = release_dir.join(subdir).join(exe);
if p.exists() {
return Some(p);
}
}
None
}
// ---------------------------------------------------------------------------
// Bootstrap implementation
// ---------------------------------------------------------------------------
async fn run_bootstrap(
app: AppHandle,
args: StartBootstrapArgs,
cancel_rx_holder: Arc<Mutex<Option<mpsc::Receiver<()>>>>,
) -> Result<String> {
let kind = ScriptKind::for_current_os();
let pin = Pin {
commit: args.commit.or_else(|| option_env_string("BUILD_PIN_COMMIT")),
branch: args.branch.or_else(|| option_env_string("BUILD_PIN_BRANCH")),
};
tracing::info!(
?pin,
kind = ?kind,
include_desktop = args.include_desktop,
"bootstrap starting"
);
let app_for_log = app.clone();
let emit_log = move |line: &str| {
emit_event(
&app_for_log,
BootstrapEvent::Log {
stage: None,
line: line.to_string(),
},
);
// Bump to info-level so the line shows in bootstrap-installer.log
// under the default INFO filter. Previously this was debug! which
// got dropped on the floor, leaving us blind whenever install.ps1
// failed — the log only had the "bootstrap starting" banner.
tracing::info!(target: "bootstrap.log", "{line}");
};
// 1. Resolve install.ps1
let script = install_script::resolve(kind, &pin, &emit_log)
.await
.map_err(|e| {
let msg = format!("resolve install script failed: {e:#}");
emit_event(
&app,
BootstrapEvent::Failed {
stage: None,
error: msg.clone(),
},
);
anyhow!(msg)
})?;
let source_note = match &script.source {
ScriptSource::DevCheckout => "dev checkout",
ScriptSource::Bundled => "bundled",
ScriptSource::Cached => "cached",
ScriptSource::Downloaded => "downloaded",
};
emit_log(&format!(
"[bootstrap] script {} via {}",
script.path.display(),
source_note
));
// 2. Fetch manifest
//
// -IncludeDesktop MUST be passed to the manifest call too — install.ps1
// gates the desktop stage inclusion on this flag, so without it here
// the manifest comes back missing the desktop stage and we never run
// it. The per-stage call below also passes -IncludeDesktop to keep
// the contracts identical.
let manifest_args = build_pin_args(&script);
let mut manifest_args_full = vec!["-Manifest".to_string()];
manifest_args_full.extend(manifest_args.clone());
if args.include_desktop {
manifest_args_full.push("-IncludeDesktop".to_string());
}
let manifest_result = run_install_script(
&app,
&script.path,
&manifest_args_full,
args.hermes_home.as_deref(),
None,
Some("__manifest__".to_string()),
)
.await?;
if manifest_result.exit_code != Some(0) {
let err = format!(
"install.ps1 -Manifest failed: exit {:?}\n{}",
manifest_result.exit_code,
manifest_result.stderr.trim()
);
emit_event(
&app,
BootstrapEvent::Failed {
stage: None,
error: err.clone(),
},
);
return Err(anyhow!(err));
}
let manifest: Manifest = powershell::parse_manifest(&manifest_result.stdout).ok_or_else(|| {
let err = format!(
"install.ps1 -Manifest produced no parseable JSON payload\n{}",
truncate(&manifest_result.stdout, 4000)
);
emit_event(
&app,
BootstrapEvent::Failed {
stage: None,
error: err.clone(),
},
);
anyhow!(err)
})?;
emit_event(
&app,
BootstrapEvent::Manifest {
stages: manifest.stages.clone(),
protocol_version: manifest.protocol_version,
},
);
// 3. Iterate stages.
for stage in &manifest.stages {
// Skip Stage-Desktop unless explicitly requested. install.ps1 may
// or may not include it in the manifest depending on the flag we
// pass, but if it slipped in, gate client-side too.
if !args.include_desktop && stage.name.eq_ignore_ascii_case("desktop") {
emit_event(
&app,
BootstrapEvent::Stage {
name: stage.name.clone(),
state: StageState::Skipped,
duration_ms: Some(0),
result: None,
error: Some("skipped by include_desktop=false".into()),
},
);
continue;
}
if cancellation_signalled(&cancel_rx_holder).await {
let err = "bootstrap cancelled by user".to_string();
emit_event(
&app,
BootstrapEvent::Failed {
stage: Some(stage.name.clone()),
error: err.clone(),
},
);
return Err(anyhow!(err));
}
let started = Instant::now();
emit_event(
&app,
BootstrapEvent::Stage {
name: stage.name.clone(),
state: StageState::Running,
duration_ms: None,
result: None,
error: None,
},
);
let mut stage_args = vec![
"-Stage".to_string(),
stage.name.clone(),
"-NonInteractive".to_string(),
"-Json".to_string(),
];
stage_args.extend(manifest_args.clone());
if args.include_desktop {
stage_args.push("-IncludeDesktop".to_string());
}
// Each stage gets its own cancel receiver because tokio::select!
// in run_script consumes it. Take/return through the Arc<Mutex>.
let local_cancel_rx = cancel_rx_holder.lock().await.take();
let stage_result = run_install_script(
&app,
&script.path,
&stage_args,
args.hermes_home.as_deref(),
local_cancel_rx,
Some(stage.name.clone()),
)
.await?;
let duration_ms = started.elapsed().as_millis() as u64;
if stage_result.killed {
emit_event(
&app,
BootstrapEvent::Stage {
name: stage.name.clone(),
state: StageState::Failed,
duration_ms: Some(duration_ms),
result: None,
error: Some("cancelled by user".into()),
},
);
emit_event(
&app,
BootstrapEvent::Failed {
stage: Some(stage.name.clone()),
error: "cancelled by user".into(),
},
);
return Err(anyhow!("cancelled by user"));
}
let result_frame = powershell::parse_stage_result(&stage_result.stdout);
match result_frame {
None => {
let err = format!(
"install.ps1 -Stage {} produced no JSON result frame (exit={:?})",
stage.name, stage_result.exit_code
);
emit_event(
&app,
BootstrapEvent::Stage {
name: stage.name.clone(),
state: StageState::Failed,
duration_ms: Some(duration_ms),
result: None,
error: Some(err.clone()),
},
);
emit_event(
&app,
BootstrapEvent::Failed {
stage: Some(stage.name.clone()),
error: err.clone(),
},
);
return Err(anyhow!(err));
}
Some(frame) if frame.ok && frame.skipped => {
emit_event(
&app,
BootstrapEvent::Stage {
name: stage.name.clone(),
state: StageState::Skipped,
duration_ms: Some(duration_ms),
result: Some(frame),
error: None,
},
);
}
Some(frame) if frame.ok => {
emit_event(
&app,
BootstrapEvent::Stage {
name: stage.name.clone(),
state: StageState::Succeeded,
duration_ms: Some(duration_ms),
result: Some(frame),
error: None,
},
);
}
Some(frame) => {
let err = frame
.reason
.clone()
.unwrap_or_else(|| format!("exit code {:?}", stage_result.exit_code));
emit_event(
&app,
BootstrapEvent::Stage {
name: stage.name.clone(),
state: StageState::Failed,
duration_ms: Some(duration_ms),
result: Some(frame),
error: Some(err.clone()),
},
);
emit_event(
&app,
BootstrapEvent::Failed {
stage: Some(stage.name.clone()),
error: err.clone(),
},
);
return Err(anyhow!(err));
}
}
}
// 4. Resolve install_root. install.ps1 doesn't (yet) report this back
// explicitly; we infer it from $HermesHome which Stage-Repository clones
// the repo INTO at $HermesHome\hermes-agent. Mirrors hermes_constants.
let hermes_home = args
.hermes_home
.clone()
.unwrap_or_else(|| crate::paths::hermes_home().to_string_lossy().into_owned());
let install_root = PathBuf::from(&hermes_home).join("hermes-agent");
// Copy ourselves to HERMES_HOME/hermes-setup.exe so the desktop app can
// re-invoke us with `--update` and shortcuts have a stable target. This is
// a one-shot install concern; an `--update` re-invocation no-ops because
// we're already running from that path. Best-effort — a failure here must
// not fail an otherwise-successful install.
if let Err(err) = crate::paths::copy_self_to_hermes_home() {
tracing::warn!(?err, "failed to copy installer into HERMES_HOME (non-fatal)");
emit_log(&format!(
"[bootstrap] warning: could not stage updater binary: {err}"
));
}
emit_event(
&app,
BootstrapEvent::Complete {
install_root: install_root.to_string_lossy().into_owned(),
marker: Some(serde_json::json!({
"pinnedCommit": pin.commit,
"pinnedBranch": pin.branch,
})),
},
);
Ok(install_root.to_string_lossy().into_owned())
}
async fn cancellation_signalled(holder: &Arc<Mutex<Option<mpsc::Receiver<()>>>>) -> bool {
let mut guard = holder.lock().await;
if let Some(rx) = guard.as_mut() {
rx.try_recv().is_ok()
} else {
false
}
}
async fn run_install_script(
app: &AppHandle,
script_path: &std::path::Path,
args: &[String],
hermes_home_override: Option<&str>,
cancel_rx: Option<mpsc::Receiver<()>>,
stage_name: Option<String>,
) -> Result<powershell::ScriptResult> {
let app_for_stdout = app.clone();
let stage_for_stdout = stage_name.clone();
let app_for_stderr = app.clone();
let stage_for_stderr = stage_name.clone();
let stage_for_stdout_log = stage_name.clone();
let stage_for_stderr_log = stage_name.clone();
let sink = StreamSink {
on_stdout_line: Box::new(move |line: &str| {
emit_event(
&app_for_stdout,
BootstrapEvent::Log {
stage: stage_for_stdout.clone(),
line: line.to_string(),
},
);
// Tee to the rolling installer log so we have a persistent
// record of every install.ps1 line. Without this, the only
// log evidence of a failure was the Tauri event stream —
// which gets discarded the moment the failure route mounts.
match &stage_for_stdout_log {
Some(name) => {
tracing::info!(target: "bootstrap.log", stage = %name, "{line}")
}
None => tracing::info!(target: "bootstrap.log", "{line}"),
}
}),
on_stderr_line: Box::new(move |line: &str| {
emit_event(
&app_for_stderr,
BootstrapEvent::Log {
stage: stage_for_stderr.clone(),
line: format!("stderr: {line}"),
},
);
// stderr-level lines get warn! so they're visually distinct
// when scrolling through the log later.
match &stage_for_stderr_log {
Some(name) => {
tracing::warn!(target: "bootstrap.log", stage = %name, "stderr: {line}")
}
None => tracing::warn!(target: "bootstrap.log", "stderr: {line}"),
}
}),
};
powershell::run_script(script_path, args, sink, hermes_home_override, cancel_rx)
.await
.map_err(|e| {
tracing::error!(?e, "install script invocation failed");
anyhow!("install script invocation failed: {e:#}")
})
}
fn build_pin_args(script: &install_script::ResolvedScript) -> Vec<String> {
let mut out = Vec::new();
if let Some(c) = &script.commit {
out.push("-Commit".to_string());
out.push(c.clone());
}
if let Some(b) = &script.branch {
out.push("-Branch".to_string());
out.push(b.clone());
}
out
}
fn emit_event(app: &AppHandle, event: BootstrapEvent) {
// Tee important state transitions to the rolling installer log so
// bootstrap-installer.log isn't just "starting" + final summary.
// Log lines (the noisy stuff) handle their own tracing in
// run_install_script's sink; here we cover the lifecycle frames.
match &event {
BootstrapEvent::Manifest { stages, .. } => {
tracing::info!(
stage_count = stages.len(),
names = ?stages.iter().map(|s| s.name.as_str()).collect::<Vec<_>>(),
"manifest received"
);
}
BootstrapEvent::Stage {
name,
state,
duration_ms,
error,
..
} => {
tracing::info!(
stage = %name,
?state,
duration_ms = ?duration_ms,
error = ?error,
"stage transition"
);
}
BootstrapEvent::Complete { install_root, .. } => {
tracing::info!(install_root = %install_root, "bootstrap complete");
}
BootstrapEvent::Failed { stage, error } => {
tracing::error!(stage = ?stage, error = %error, "bootstrap FAILED");
}
BootstrapEvent::Log { .. } => {
// Log lines are teed via the sink callbacks in
// run_install_script — don't double-emit here.
}
}
if let Err(e) = app.emit(BootstrapEvent::CHANNEL, &event) {
tracing::warn!(?e, "failed to emit bootstrap event");
}
}
fn option_env_string(key: &str) -> Option<String> {
// option_env! only accepts literals, so we hardcode the known keys.
let val = match key {
"BUILD_PIN_COMMIT" => option_env!("BUILD_PIN_COMMIT"),
"BUILD_PIN_BRANCH" => option_env!("BUILD_PIN_BRANCH"),
_ => None,
};
val.map(|s| s.to_string())
}
fn truncate(s: &str, max: usize) -> String {
if s.len() <= max {
s.to_string()
} else {
format!("{}...", &s[..max])
}
}

View File

@@ -0,0 +1,99 @@
//! Event types streamed from Rust → React.
//!
//! These mirror `apps/desktop/electron/bootstrap-runner.cjs`'s event shape
//! 1:1 so the React installer code can be roughly identical to the Electron
//! install-overlay we'll replace.
//!
//! The Tauri event channel name is `"bootstrap"` for all of these — the
//! `type` discriminator on each payload is how the frontend routes.
use serde::{Deserialize, Serialize};
/// Stage definition as reported by `install.ps1 -Manifest`.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct StageInfo {
pub name: String,
pub title: String,
pub category: String,
/// `needs_user_input=true` stages run with -NonInteractive and emit
/// skipped=true; the post-install wizard takes over for those.
#[serde(rename = "needs_user_input", alias = "needsUserInput")]
pub needs_user_input: bool,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Manifest {
pub stages: Vec<StageInfo>,
#[serde(rename = "protocol_version", alias = "protocolVersion", default)]
pub protocol_version: Option<u32>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct StageResultPayload {
pub stage: String,
pub ok: bool,
#[serde(default)]
pub skipped: bool,
#[serde(default)]
pub reason: Option<String>,
/// install.ps1 may attach stage-specific structured data here.
#[serde(default)]
pub data: Option<serde_json::Value>,
}
/// Run-state for a single stage as we transition through it.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum StageState {
Running,
Succeeded,
Skipped,
Failed,
}
/// The single event channel `bootstrap` emits these. `type` discriminates.
#[derive(Debug, Clone, Serialize)]
#[serde(tag = "type", rename_all = "lowercase")]
pub enum BootstrapEvent {
/// Sent once at the start with the full stage list.
Manifest {
stages: Vec<StageInfo>,
#[serde(rename = "protocolVersion")]
protocol_version: Option<u32>,
},
/// Stage state transition. `result` populated only on terminal states.
Stage {
name: String,
state: StageState,
#[serde(rename = "durationMs", skip_serializing_if = "Option::is_none")]
duration_ms: Option<u64>,
#[serde(skip_serializing_if = "Option::is_none")]
result: Option<StageResultPayload>,
#[serde(skip_serializing_if = "Option::is_none")]
error: Option<String>,
},
/// Raw stdout/stderr line from install.ps1 (or our wrapper).
Log {
#[serde(skip_serializing_if = "Option::is_none")]
stage: Option<String>,
line: String,
},
/// Sent once when all stages complete successfully.
Complete {
#[serde(rename = "installRoot")]
install_root: String,
marker: Option<serde_json::Value>,
},
/// Sent once if the run aborts.
Failed {
#[serde(skip_serializing_if = "Option::is_none")]
stage: Option<String>,
error: String,
},
}
impl BootstrapEvent {
/// Tauri event name. Single channel for all bootstrap events; the
/// `type` tag tells the renderer how to interpret the payload.
pub const CHANNEL: &'static str = "bootstrap";
}

View File

@@ -0,0 +1,273 @@
//! Resolves and downloads `scripts/install.ps1` (and `install.sh`).
//!
//! Resolution order:
//! 1. Dev shortcut: a sibling repo checkout via $HERMES_SETUP_DEV_REPO_ROOT
//! env var. Lets devs iterate without re-publishing the script.
//! 2. Bundled fallback: if the installer was bundled with a script (e.g.
//! tauri's `resource` mechanism), serve from there. Not used today.
//! 3. Network: download from GitHub raw at a pinned commit or branch.
//! Commit pins are immutable; branch pins are HEAD-tracking.
//!
//! Mirrors `apps/desktop/electron/bootstrap-runner.cjs`'s `resolveInstallScript`,
//! but the dev-checkout resolution is driven by an env var rather than the
//! Electron app's APP_ROOT/../.. trick, because Hermes-Setup.exe is meant
//! to live OUTSIDE any repo checkout.
use anyhow::{anyhow, Context, Result};
use std::path::{Path, PathBuf};
use tokio::io::AsyncWriteExt;
use crate::paths;
/// Identity of the install.ps1 we'll execute. Used by both the manifest
/// fetch and the per-stage runs.
#[derive(Debug, Clone)]
pub struct ResolvedScript {
pub path: PathBuf,
pub source: ScriptSource,
/// Commit pin (40-char SHA) if known. install.ps1's `-Commit` arg is
/// what makes the repo stage clone the exact tested SHA.
pub commit: Option<String>,
pub branch: Option<String>,
}
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ScriptSource {
DevCheckout,
Bundled,
Cached,
Downloaded,
}
/// What flavor of script (Windows .ps1 vs Unix .sh).
#[derive(Debug, Clone, Copy)]
pub enum ScriptKind {
Ps1,
Sh,
}
impl ScriptKind {
pub fn for_current_os() -> Self {
if cfg!(target_os = "windows") {
Self::Ps1
} else {
Self::Sh
}
}
fn filename(&self) -> &'static str {
match self {
Self::Ps1 => "install.ps1",
Self::Sh => "install.sh",
}
}
}
/// Validates a string looks like a git SHA (7+ hex chars). Mirrors
/// `STAMP_COMMIT_RE` from bootstrap-runner.cjs.
fn is_valid_commit(s: &str) -> bool {
let len = s.len();
(7..=40).contains(&len) && s.chars().all(|c| c.is_ascii_hexdigit())
}
/// Resolves the install script to use for this run.
///
/// `pin` is the commit-or-branch from either Hermes-Setup's build-time
/// constant (compiled into the installer) or a runtime override.
pub async fn resolve(
kind: ScriptKind,
pin: &Pin,
emit_log: &impl Fn(&str),
) -> Result<ResolvedScript> {
// 1. Dev shortcut.
if let Ok(repo_root) = std::env::var("HERMES_SETUP_DEV_REPO_ROOT") {
let candidate = PathBuf::from(repo_root).join("scripts").join(kind.filename());
if candidate.exists() {
emit_log(&format!(
"[bootstrap] dev mode — using local {} at {}",
kind.filename(),
candidate.display()
));
return Ok(ResolvedScript {
path: candidate,
source: ScriptSource::DevCheckout,
commit: pin.commit.clone(),
branch: pin.branch.clone(),
});
}
}
// 2. (Not implemented) bundled fallback.
// 3. Network. Pin must be a real commit or a branch ref.
let commit_or_ref = match (&pin.commit, &pin.branch) {
(Some(c), _) if is_valid_commit(c) => c.clone(),
(_, Some(b)) if !b.trim().is_empty() => b.clone(),
(Some(other), _) => {
return Err(anyhow!(
"install script pin commit `{other}` is not a valid git SHA"
));
}
_ => {
return Err(anyhow!(
"no install-script pin supplied — installer cannot resolve a script source"
));
}
};
let cached = cached_path(kind, &commit_or_ref);
if cached.exists() {
emit_log(&format!(
"[bootstrap] using cached {} for {}",
kind.filename(),
truncate_ref(&commit_or_ref)
));
return Ok(ResolvedScript {
path: cached,
source: ScriptSource::Cached,
commit: pin.commit.clone(),
branch: pin.branch.clone(),
});
}
emit_log(&format!(
"[bootstrap] downloading {} for {} from GitHub",
kind.filename(),
truncate_ref(&commit_or_ref)
));
download(kind, &commit_or_ref, &cached).await?;
emit_log(&format!("[bootstrap] cached to {}", cached.display()));
Ok(ResolvedScript {
path: cached,
source: ScriptSource::Downloaded,
commit: pin.commit.clone(),
branch: pin.branch.clone(),
})
}
#[derive(Debug, Clone, Default)]
pub struct Pin {
pub commit: Option<String>,
pub branch: Option<String>,
}
fn cached_path(kind: ScriptKind, commit_or_ref: &str) -> PathBuf {
let safe = sanitize_ref(commit_or_ref);
let filename = match kind {
ScriptKind::Ps1 => format!("install-{safe}.ps1"),
ScriptKind::Sh => format!("install-{safe}.sh"),
};
paths::bootstrap_cache_dir().join(filename)
}
/// Replace anything that's not [A-Za-z0-9._-] with `_`. Branch refs can
/// contain `/`, dots, etc.; we want a flat filename.
fn sanitize_ref(s: &str) -> String {
s.chars()
.map(|c| {
if c.is_ascii_alphanumeric() || c == '.' || c == '-' || c == '_' {
c
} else {
'_'
}
})
.collect()
}
fn truncate_ref(s: &str) -> &str {
if is_valid_commit(s) && s.len() >= 12 {
&s[..12]
} else {
s
}
}
/// Downloads to `dest_path` via reqwest with rustls. Atomically renames
/// `dest_path.tmp` → `dest_path` so partial writes don't poison the cache.
async fn download(kind: ScriptKind, commit_or_ref: &str, dest_path: &Path) -> Result<()> {
let url = format!(
"https://raw.githubusercontent.com/NousResearch/hermes-agent/{}/scripts/{}",
commit_or_ref,
kind.filename()
);
if let Some(parent) = dest_path.parent() {
std::fs::create_dir_all(parent).with_context(|| {
format!("creating bootstrap-cache parent dir {}", parent.display())
})?;
}
let tmp_path = dest_path.with_extension({
let ext = dest_path
.extension()
.and_then(|s| s.to_str())
.unwrap_or("tmp");
format!("{ext}.tmp")
});
let response = reqwest::Client::new()
.get(&url)
.header("User-Agent", "hermes-setup/0.0.1")
.send()
.await
.with_context(|| format!("GET {url}"))?;
if !response.status().is_success() {
return Err(anyhow!(
"Failed to download {}: HTTP {} from {}",
kind.filename(),
response.status(),
url
));
}
let bytes = response
.bytes()
.await
.with_context(|| format!("reading body of {url}"))?;
let mut file = tokio::fs::File::create(&tmp_path)
.await
.with_context(|| format!("creating temp file {}", tmp_path.display()))?;
file.write_all(&bytes)
.await
.with_context(|| format!("writing temp file {}", tmp_path.display()))?;
file.flush().await.context("flushing temp file")?;
drop(file);
tokio::fs::rename(&tmp_path, dest_path)
.await
.with_context(|| {
format!(
"renaming {}{}",
tmp_path.display(),
dest_path.display()
)
})?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn is_valid_commit_accepts_short_and_full_shas() {
assert!(is_valid_commit("02d26981d3d4ad50e142399b8476f59ad5953ff0"));
assert!(is_valid_commit("02d2698"));
assert!(!is_valid_commit("02d269"));
assert!(!is_valid_commit("not-a-sha"));
assert!(!is_valid_commit(""));
}
#[test]
fn sanitize_ref_replaces_slashes() {
assert_eq!(sanitize_ref("bb/gui"), "bb_gui");
assert_eq!(sanitize_ref("main"), "main");
assert_eq!(sanitize_ref("release/1.2.3"), "release_1.2.3");
}
}

View File

@@ -0,0 +1,134 @@
//! Hermes Setup — Tauri entrypoint.
//!
//! Spawns a single window pointed at the React frontend (apps/bootstrap-installer/src/).
//! All install-time work lives in `bootstrap.rs` and is invoked through the Tauri
//! commands registered at the bottom of `run()`.
//!
//! The Windows-subsystem strip lives on the binary crate (src/main.rs), not
//! here — a crate-level attribute on a lib doesn't propagate to the linker
//! flags of the executable that consumes it.
mod bootstrap;
mod events;
mod install_script;
mod powershell;
mod paths;
mod update;
use std::sync::Arc;
use tokio::sync::Mutex;
/// How the installer was invoked. Resolved once from the process args in
/// `run()` and exposed to the frontend via `get_mode` so it can route to the
/// install flow (first-run onboarding) or the update flow (driven by the
/// desktop app handing off via `Hermes-Setup.exe --update`).
///
/// Bare launch (double-click, first-run) => Install.
/// `--update` (spawned by the desktop's "Update" button) => Update.
#[derive(Debug, Clone, Copy, PartialEq, Eq, serde::Serialize)]
#[serde(rename_all = "lowercase")]
pub enum AppMode {
Install,
Update,
}
impl AppMode {
/// Resolve the mode from an argument iterator. Anything containing the
/// `--update` flag selects Update; otherwise Install. Kept arg-iterator
/// generic (not reading `std::env` directly) so it's unit-testable.
pub fn from_args<I, S>(args: I) -> Self
where
I: IntoIterator<Item = S>,
S: AsRef<str>,
{
for a in args {
if a.as_ref() == "--update" {
return AppMode::Update;
}
}
AppMode::Install
}
}
/// Process-wide install state, shared across Tauri commands.
///
/// The bootstrap is a one-shot, single-tenant process — we only need one
/// of these per window. `Arc<Mutex<...>>` lets command handlers grab it
/// without lifetime gymnastics.
pub struct AppState {
pub bootstrap: Mutex<Option<bootstrap::BootstrapHandle>>,
/// How this process was launched (install vs update). Immutable for the
/// lifetime of the process; read by the `get_mode` command.
pub mode: AppMode,
}
impl AppState {
fn new(mode: AppMode) -> Self {
Self {
bootstrap: Mutex::new(None),
mode,
}
}
}
/// Frontend → Rust: which flow should the UI render?
#[tauri::command]
fn get_mode(state: tauri::State<'_, Arc<AppState>>) -> AppMode {
state.mode
}
#[cfg_attr(mobile, tauri::mobile_entry_point)]
pub fn run() {
// Tracing → bootstrap-installer.log under HERMES_HOME/logs/ so install
// failures leave a trail for support. Console output also goes here in
// debug builds.
let _guard = paths::init_logging();
let mode = AppMode::from_args(std::env::args().skip(1));
tracing::info!(?mode, "Hermes Setup starting");
tauri::Builder::default()
.plugin(tauri_plugin_dialog::init())
.plugin(tauri_plugin_opener::init())
.plugin(tauri_plugin_process::init())
.plugin(tauri_plugin_shell::init())
.manage(Arc::new(AppState::new(mode)))
.invoke_handler(tauri::generate_handler![
// Mode (install vs update)
get_mode,
// Bootstrap lifecycle
bootstrap::start_bootstrap,
bootstrap::cancel_bootstrap,
bootstrap::get_bootstrap_status,
// Update lifecycle
update::start_update,
// Hand-off
bootstrap::launch_hermes_desktop,
// Diagnostics
paths::get_log_path,
paths::get_hermes_home,
paths::open_log_dir,
])
.run(tauri::generate_context!())
.expect("error while running Hermes Setup");
}
#[cfg(test)]
mod tests {
use super::AppMode;
#[test]
fn bare_args_are_install() {
assert_eq!(AppMode::from_args(Vec::<String>::new()), AppMode::Install);
assert_eq!(AppMode::from_args(["--foo", "bar"]), AppMode::Install);
}
#[test]
fn update_flag_selects_update() {
assert_eq!(AppMode::from_args(["--update"]), AppMode::Update);
assert_eq!(
AppMode::from_args(["--something", "--update", "--else"]),
AppMode::Update
);
}
}

View File

@@ -0,0 +1,19 @@
// Hermes Setup — process entrypoint. All logic lives in lib.rs so it can
// be unit-tested as a library; this file just calls into it.
//
// The windows_subsystem attribute MUST live here on the binary crate
// (not lib.rs) — placing it on the lib was the bug that left a stray
// cmd window behind Hermes-Setup.exe on release builds.
//
// `windows_subsystem = "windows"` strips the console allocation that
// the default `windows_subsystem = "console"` would do, so double-clicking
// the .exe gives you ONLY the Tauri window.
//
// debug_assertions guard: dev builds keep the console so tracing output
// is visible during `cargo tauri dev`.
#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]
fn main() {
hermes_bootstrap_lib::run()
}

View File

@@ -0,0 +1,168 @@
//! Filesystem paths + logging setup.
//!
//! Mirrors `hermes_constants.get_hermes_home()` from the Python CLI:
//! Windows: %LOCALAPPDATA%\hermes
//! macOS: ~/.hermes
//! Linux: ~/.hermes (override via $HERMES_HOME)
//!
//! NOTE (macOS): Python's get_hermes_home(), scripts/install.sh, and the
//! Electron desktop's resolveHermesHome() ALL use ~/.hermes on macOS — there
//! is no ~/Library/Application Support branch anywhere else. An earlier
//! version of this file used Application Support, which drifted from every
//! other component: the installer wrote the install to one dir and the
//! desktop looked for it in another, so first launch never found the backend.
//!
//! IMPORTANT: this must match exactly. Drift here means install.ps1
//! writes to one place and the installer reads from another, breaking
//! the bootstrap-complete check.
use std::path::{Path, PathBuf};
use tracing_appender::non_blocking::WorkerGuard;
/// Returns the canonical Hermes home directory, respecting $HERMES_HOME if set.
pub fn hermes_home() -> PathBuf {
if let Ok(override_path) = std::env::var("HERMES_HOME") {
if !override_path.trim().is_empty() {
return PathBuf::from(override_path);
}
}
#[cfg(target_os = "windows")]
{
// %LOCALAPPDATA%\hermes — matches scripts/install.ps1's $HermesHome.
if let Some(local_app_data) = dirs::data_local_dir() {
return local_app_data.join("hermes");
}
}
// macOS + Linux + fallback: ~/.hermes (matches Python get_hermes_home(),
// install.sh, and the Electron desktop's resolveHermesHome()).
if let Some(home) = dirs::home_dir() {
return home.join(".hermes");
}
// Last resort — current dir, almost certainly wrong but at least
// doesn't panic.
PathBuf::from(".hermes")
}
pub fn log_dir() -> PathBuf {
hermes_home().join("logs")
}
pub fn log_path() -> PathBuf {
log_dir().join("bootstrap-installer.log")
}
pub fn bootstrap_cache_dir() -> PathBuf {
hermes_home().join("bootstrap-cache")
}
/// Stable location the installer copies itself to after a successful install.
/// The desktop app re-invokes this with `--update`, and the start-menu /
/// desktop shortcuts can point users back to it. Lives directly under
/// HERMES_HOME so it survives repo checkout deletion (unlike anything under
/// hermes-agent/).
///
/// On Windows this is `%LOCALAPPDATA%\hermes\hermes-setup.exe`; on other
/// platforms the extension differs but the directory is the same.
pub fn installer_dest() -> PathBuf {
let name = if cfg!(target_os = "windows") {
"hermes-setup.exe"
} else {
"hermes-setup"
};
hermes_home().join(name)
}
/// Copy the currently-running installer binary to `installer_dest()` so it's
/// available for future `--update` runs and shortcut launches.
///
/// No-ops (returns Ok) when the running exe is ALREADY the destination — which
/// is exactly the case during an `--update` run (the desktop launched us FROM
/// that path), where copying onto ourselves would be a Windows sharing
/// violation. Best-effort: a failure here must not fail the install, so the
/// caller logs and continues.
pub fn copy_self_to_hermes_home() -> std::io::Result<()> {
let src = std::env::current_exe()?;
let dest = installer_dest();
// Skip if we're already running from the destination (update re-invocation
// or a prior copy). canonicalize both so symlinks / 8.3 short paths / case
// differences don't trick us into a self-copy.
let same = match (src.canonicalize(), dest.canonicalize()) {
(Ok(a), Ok(b)) => a == b,
_ => src == dest,
};
if same {
tracing::info!(?dest, "installer already at destination; skipping self-copy");
return Ok(());
}
if let Some(parent) = dest.parent() {
std::fs::create_dir_all(parent)?;
}
std::fs::copy(&src, &dest)?;
tracing::info!(?src, ?dest, "copied installer to HERMES_HOME");
Ok(())
}
/// Where install.ps1 writes the bootstrap-complete marker (existence-only file
/// the Electron app also checks). Per main.cjs:
/// const BOOTSTRAP_COMPLETE_MARKER = path.join(ACTIVE_HERMES_ROOT, '.hermes-bootstrap-complete')
/// We don't always know ACTIVE_HERMES_ROOT until install.ps1 reports it, so
/// this is a probe helper, not a definitive path.
pub fn likely_bootstrap_marker(install_root: &Path) -> PathBuf {
install_root.join(".hermes-bootstrap-complete")
}
/// Initializes tracing to bootstrap-installer.log under HERMES_HOME/logs/.
/// Returns a guard that flushes the appender on drop — keep it alive for
/// the lifetime of the process.
pub fn init_logging() -> Option<WorkerGuard> {
let dir = log_dir();
if let Err(err) = std::fs::create_dir_all(&dir) {
// No log dir → log to stderr only. Don't panic; the installer
// should still be usable on an exotic filesystem.
eprintln!("[hermes-setup] could not create log dir {dir:?}: {err}");
return None;
}
let file_appender = tracing_appender::rolling::never(&dir, "bootstrap-installer.log");
let (non_blocking, guard) = tracing_appender::non_blocking(file_appender);
let env_filter = tracing_subscriber::EnvFilter::try_from_env("HERMES_BOOTSTRAP_LOG")
.unwrap_or_else(|_| tracing_subscriber::EnvFilter::new("info"));
tracing_subscriber::fmt()
.with_env_filter(env_filter)
.with_writer(non_blocking)
.with_ansi(false)
.with_target(true)
.init();
Some(guard)
}
// ---------------------------------------------------------------------------
// Tauri commands
// ---------------------------------------------------------------------------
#[tauri::command]
pub fn get_log_path() -> String {
log_path().to_string_lossy().into_owned()
}
#[tauri::command]
pub fn get_hermes_home() -> String {
hermes_home().to_string_lossy().into_owned()
}
#[tauri::command]
pub fn open_log_dir(app: tauri::AppHandle) -> Result<(), String> {
use tauri_plugin_opener::OpenerExt;
let path = log_dir();
app.opener()
.open_path(path.to_string_lossy(), None::<&str>)
.map_err(|e| e.to_string())
}

View File

@@ -0,0 +1,267 @@
//! Drives PowerShell (Windows) or bash (Unix) for install.ps1 / install.sh.
//!
//! Port of `spawnPowerShell` from bootstrap-runner.cjs, with the same
//! line-buffered stdout/stderr streaming + cancellation semantics.
//!
//! On Windows we pass `-NoProfile -ExecutionPolicy Bypass -File <script>`.
//! On Unix we shell out to `bash <script>` since install.sh expects bash.
use anyhow::{Context, Result};
use std::path::Path;
use std::process::Stdio;
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::{Child, Command};
use tokio::sync::mpsc;
/// Hooks the caller installs to receive output.
pub struct StreamSink {
pub on_stdout_line: Box<dyn Fn(&str) + Send + Sync>,
pub on_stderr_line: Box<dyn Fn(&str) + Send + Sync>,
}
/// Outcome of a script invocation. Mirrors bootstrap-runner.cjs's
/// `{stdout, stderr, code, signal, killed}` shape.
#[derive(Debug)]
pub struct ScriptResult {
pub stdout: String,
pub stderr: String,
pub exit_code: Option<i32>,
pub killed: bool,
}
/// Cancellation signal — `cancel_tx.send(()).await` aborts the running script.
pub type CancelRx = mpsc::Receiver<()>;
/// Spawns install.ps1 / install.sh with the given args and streams output.
///
/// `hermes_home_override` propagates to the child as $HERMES_HOME so the
/// install script writes to the same directory the installer is reading from.
pub async fn run_script(
script_path: &Path,
args: &[String],
sink: StreamSink,
hermes_home_override: Option<&str>,
mut cancel_rx: Option<CancelRx>,
) -> Result<ScriptResult> {
let mut cmd = build_command(script_path, args);
if let Some(home) = hermes_home_override {
cmd.env("HERMES_HOME", home);
}
cmd.stdin(Stdio::null())
.stdout(Stdio::piped())
.stderr(Stdio::piped());
// On Windows, avoid spawning a flashing cmd window when we're hosted
// inside a GUI process. Tauri's main window is already created, so
// the side-effect console for the child is unwanted.
#[cfg(target_os = "windows")]
{
// CREATE_NO_WINDOW = 0x08000000
cmd.creation_flags(0x0800_0000);
}
let mut child: Child = cmd
.spawn()
.with_context(|| format!("spawning {}", script_path.display()))?;
let stdout = child.stdout.take().expect("stdout was piped");
let stderr = child.stderr.take().expect("stderr was piped");
let mut stdout_reader = BufReader::new(stdout).lines();
let mut stderr_reader = BufReader::new(stderr).lines();
let mut combined_stdout = String::new();
let mut combined_stderr = String::new();
let mut killed = false;
// Loop: poll stdout, stderr, cancel, and child exit concurrently.
loop {
tokio::select! {
line = stdout_reader.next_line() => {
match line {
Ok(Some(l)) => {
(sink.on_stdout_line)(&l);
combined_stdout.push_str(&l);
combined_stdout.push('\n');
}
Ok(None) => {
// EOF on stdout — wait for stderr + exit.
break;
}
Err(e) => {
tracing::warn!("stdout read error: {e}");
break;
}
}
}
line = stderr_reader.next_line() => {
match line {
Ok(Some(l)) => {
(sink.on_stderr_line)(&l);
combined_stderr.push_str(&l);
combined_stderr.push('\n');
}
Ok(None) => {
// stderr EOF — keep draining stdout.
}
Err(e) => {
tracing::warn!("stderr read error: {e}");
}
}
}
_ = recv_cancel(&mut cancel_rx) => {
tracing::warn!("cancellation received — killing child");
killed = true;
// best-effort kill; don't propagate errors
let _ = child.start_kill();
break;
}
}
}
// Drain remaining lines after the loop exited.
while let Ok(Some(l)) = stdout_reader.next_line().await {
(sink.on_stdout_line)(&l);
combined_stdout.push_str(&l);
combined_stdout.push('\n');
}
while let Ok(Some(l)) = stderr_reader.next_line().await {
(sink.on_stderr_line)(&l);
combined_stderr.push_str(&l);
combined_stderr.push('\n');
}
let status = child
.wait()
.await
.context("waiting for install script to exit")?;
Ok(ScriptResult {
stdout: combined_stdout,
stderr: combined_stderr,
exit_code: status.code(),
killed,
})
}
async fn recv_cancel(rx: &mut Option<CancelRx>) {
match rx {
Some(r) => {
let _ = r.recv().await;
}
None => std::future::pending::<()>().await,
}
}
#[cfg(target_os = "windows")]
fn build_command(script_path: &Path, args: &[String]) -> Command {
// We want PowerShell 5.1 / 7. install.ps1 uses 5.1-safe syntax everywhere.
// Prefer `powershell.exe` (5.1 baseline, present on every Windows since 7)
// over `pwsh.exe` (7+, may not be present).
let mut cmd = Command::new("powershell.exe");
cmd.arg("-NoProfile");
cmd.arg("-ExecutionPolicy").arg("Bypass");
cmd.arg("-File").arg(script_path);
for a in args {
cmd.arg(a);
}
cmd
}
#[cfg(not(target_os = "windows"))]
fn build_command(script_path: &Path, args: &[String]) -> Command {
// install.sh expects bash. /bin/bash is fine on macOS (Apple still
// ships an old 3.2 bash; install.sh is written to that baseline).
let mut cmd = Command::new("bash");
cmd.arg(script_path);
for a in args {
cmd.arg(a);
}
cmd
}
/// Parses the LAST line of stdout that looks like a JSON object matching
/// the install.ps1 stage-result contract: `{ok: bool, stage: string, ...}`.
///
/// Mirrors `parseStageResult` from bootstrap-runner.cjs. install.ps1 may
/// print info/banner lines before the result frame; we scan from the end.
pub fn parse_stage_result(stdout: &str) -> Option<crate::events::StageResultPayload> {
for line in stdout.lines().rev() {
let trimmed = line.trim();
if trimmed.is_empty() {
continue;
}
if let Ok(value) = serde_json::from_str::<serde_json::Value>(trimmed) {
if value.get("ok").and_then(|v| v.as_bool()).is_some()
&& value.get("stage").and_then(|v| v.as_str()).is_some()
{
if let Ok(parsed) =
serde_json::from_value::<crate::events::StageResultPayload>(value)
{
return Some(parsed);
}
}
}
}
None
}
/// Same logic but for the `-Manifest` payload (the LAST line with a `stages`
/// array). Returns the parsed manifest.
pub fn parse_manifest(stdout: &str) -> Option<crate::events::Manifest> {
for line in stdout.lines().rev() {
let trimmed = line.trim();
if trimmed.is_empty() {
continue;
}
if let Ok(value) = serde_json::from_str::<serde_json::Value>(trimmed) {
if value.get("stages").and_then(|v| v.as_array()).is_some() {
if let Ok(parsed) = serde_json::from_value::<crate::events::Manifest>(value) {
return Some(parsed);
}
}
}
}
None
}
#[cfg(target_os = "windows")]
use std::os::windows::process::CommandExt;
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn parse_stage_result_picks_last_json_line() {
let stdout = r#"
[bootstrap] some info
{"ok": false, "stage": "venv", "reason": "bad python"}
{"ok": true, "stage": "venv"}
final non-json banner
"#;
let result = parse_stage_result(stdout).unwrap();
assert_eq!(result.stage, "venv");
assert!(result.ok);
}
#[test]
fn parse_manifest_finds_stages_array() {
let stdout = r#"
info line
{"stages": [{"name": "uv", "title": "uv", "category": "prereqs", "needs_user_input": false}], "protocol_version": 1}
"#;
let m = parse_manifest(stdout).unwrap();
assert_eq!(m.stages.len(), 1);
assert_eq!(m.stages[0].name, "uv");
assert_eq!(m.protocol_version, Some(1));
}
#[test]
fn parse_returns_none_when_no_match() {
assert!(parse_stage_result("just banner\n").is_none());
assert!(parse_manifest("just banner\n").is_none());
}
}

View File

@@ -0,0 +1,462 @@
//! Update orchestration.
//!
//! Driven when the installer is launched as `Hermes-Setup.exe --update` (see
//! `AppMode` in lib.rs). The desktop app hands off to us — it exits, then we:
//!
//! 1. wait for the old Hermes desktop process to fully exit (so the venv
//! shim is free; otherwise `hermes update` aborts with exit code 2),
//! 2. run `hermes update --yes --gateway` (Python/repo update; this does NOT
//! rebuild apps/desktop by design — see cmd_update in hermes_cli/main.py),
//! 3. run `hermes desktop --build-only` (the rebuild step update skips),
//! 4. launch the freshly-built desktop (reuses bootstrap::launch logic).
//!
//! We reuse the `BootstrapEvent` channel + the existing progress UI by
//! emitting a synthetic two-stage manifest ("update", "rebuild"). To the
//! frontend an update looks like a short bootstrap.
//!
//! Cross-platform note: `hermes update` already handles macOS/Linux (git/pip).
//! The only OS-specific bits here are the venv shim path (resolve_hermes) and
//! the no-window creation flag — both already cfg-gated. Keep new logic
//! OS-agnostic so the mac/linux port stays "fill in the paths".
use std::path::{Path, PathBuf};
use std::process::Stdio;
use std::time::{Duration, Instant};
use anyhow::{anyhow, Result};
use tauri::{AppHandle, Emitter};
use tokio::io::{AsyncBufReadExt, BufReader};
use tokio::process::Command;
use crate::events::{BootstrapEvent, StageInfo, StageState};
/// `hermes update` exit code meaning "another hermes process is holding the
/// venv shim open / dirty precondition" — see _cmd_update_impl in
/// hermes_cli/main.py (sys.exit(2)). We surface a targeted message for this.
const UPDATE_EXIT_CONCURRENT: i32 = 2;
/// How long to wait for the old desktop process to release the venv shim
/// before giving up and letting `hermes update`'s own guard decide.
const DESKTOP_EXIT_WAIT: Duration = Duration::from_secs(20);
const DESKTOP_EXIT_POLL: Duration = Duration::from_millis(500);
/// Frontend → Rust: kick off the update flow. Mirrors `start_bootstrap`'s
/// fire-and-forget shape; progress arrives on the `bootstrap` event channel.
#[tauri::command]
pub async fn start_update(app: AppHandle) -> Result<(), String> {
tokio::spawn(async move {
if let Err(err) = run_update(app.clone()).await {
// run_update already emits a Failed event on the paths that matter;
// this catches anything that escaped. Emit defensively.
emit(
&app,
BootstrapEvent::Failed {
stage: None,
error: format!("{err:#}"),
},
);
}
});
Ok(())
}
async fn run_update(app: AppHandle) -> Result<()> {
let hermes_home = crate::paths::hermes_home();
let install_root = hermes_home.join("hermes-agent");
let hermes = resolve_hermes(&install_root).ok_or_else(|| {
let msg = format!(
"Could not find the hermes CLI under {}. Is Hermes installed? \
Re-run the installer to repair the install.",
install_root.display()
);
emit(
&app,
BootstrapEvent::Failed {
stage: None,
error: msg.clone(),
},
);
anyhow!(msg)
})?;
// Synthetic manifest so the existing progress UI renders our two stages.
emit(
&app,
BootstrapEvent::Manifest {
stages: vec![
stage_info("update", "Updating Hermes"),
stage_info("rebuild", "Rebuilding the desktop app"),
],
protocol_version: None,
},
);
// ---- pre-step: wait for the old desktop to die -----------------------
// The desktop exec'd us then called app.exit(), but process teardown is
// async on Windows. If it still holds the venv shim, `hermes update`
// aborts with exit 2. Give it a bounded window to clear.
wait_for_venv_free(&install_root, &app).await;
// ---- stage 1: hermes update -----------------------------------------
// Pass --branch so `hermes update` targets the branch this installer was
// built/pinned against (BUILD_PIN_BRANCH), NOT its built-in default of
// `main`. The install was a detached-HEAD checkout of a specific commit;
// without --branch, `hermes update` switches the checkout to `main` (a
// divergent branch that may not even have the desktop CLI command), then
// reports "already up to date" against the wrong branch. The desktop
// detected the update against this same branch, so we must update against
// it too.
let pin_branch = option_env_string("BUILD_PIN_BRANCH");
let mut update_args: Vec<&str> = vec!["update", "--yes", "--gateway"];
if let Some(b) = pin_branch.as_deref() {
update_args.push("--branch");
update_args.push(b);
}
emit_stage(&app, "update", StageState::Running, None, None);
let started = Instant::now();
let update = run_streamed(
&app,
&hermes,
&update_args,
&install_root,
Some("update"),
)
.await?;
let update_ms = started.elapsed().as_millis() as u64;
match update.exit_code {
Some(0) => {
emit_stage(&app, "update", StageState::Succeeded, Some(update_ms), None);
}
Some(code) if code == UPDATE_EXIT_CONCURRENT => {
let msg = "Hermes is still running. Close all Hermes windows and try \
the update again."
.to_string();
emit_stage(
&app,
"update",
StageState::Failed,
Some(update_ms),
Some(msg.clone()),
);
emit(
&app,
BootstrapEvent::Failed {
stage: Some("update".into()),
error: msg.clone(),
},
);
return Err(anyhow!(msg));
}
other => {
let msg = format!(
"hermes update failed (exit {:?}). See {} for details.",
other,
crate::paths::hermes_home()
.join("logs")
.join("update.log")
.display()
);
emit_stage(
&app,
"update",
StageState::Failed,
Some(update_ms),
Some(msg.clone()),
);
emit(
&app,
BootstrapEvent::Failed {
stage: Some("update".into()),
error: msg.clone(),
},
);
return Err(anyhow!(msg));
}
}
// ---- stage 2: hermes desktop --build-only ----------------------------
// `hermes update` deliberately does NOT build apps/desktop (it installs
// repo-root deps with --workspaces=false). This is the rebuild it skips.
emit_stage(&app, "rebuild", StageState::Running, None, None);
let started = Instant::now();
let rebuild = run_streamed(
&app,
&hermes,
&["desktop", "--build-only"],
&install_root,
Some("rebuild"),
)
.await?;
let rebuild_ms = started.elapsed().as_millis() as u64;
if rebuild.exit_code != Some(0) {
let msg = format!(
"Rebuilding the desktop app failed (exit {:?}). The update was \
applied but the app could not be rebuilt; run `hermes desktop` \
from a terminal to see the error.",
rebuild.exit_code
);
emit_stage(
&app,
"rebuild",
StageState::Failed,
Some(rebuild_ms),
Some(msg.clone()),
);
emit(
&app,
BootstrapEvent::Failed {
stage: Some("rebuild".into()),
error: msg.clone(),
},
);
return Err(anyhow!(msg));
}
emit_stage(&app, "rebuild", StageState::Succeeded, Some(rebuild_ms), None);
// ---- done: signal complete, then launch the fresh desktop ------------
emit(
&app,
BootstrapEvent::Complete {
install_root: install_root.to_string_lossy().into_owned(),
marker: None,
},
);
// Reuse the same detached-launch + app.exit(0) used post-install.
if let Err(err) =
crate::bootstrap::launch_hermes_desktop(app.clone(), install_root.to_string_lossy().into_owned())
.await
{
// Launch failed: don't hard-fail the update (it succeeded); surface a
// log line so the success screen can still tell the user to launch
// manually.
emit_log(
&app,
None,
&format!("[update] could not auto-launch desktop: {err}. Launch Hermes manually."),
);
}
Ok(())
}
/// Poll until the venv shim is no longer locked (Windows) or a bounded timeout
/// elapses. On non-Windows this is a short fixed grace since file locking
/// isn't the failure mode there.
async fn wait_for_venv_free(install_root: &Path, app: &AppHandle) {
let shim = venv_hermes(install_root);
let deadline = Instant::now() + DESKTOP_EXIT_WAIT;
emit_log(app, Some("update"), "[update] waiting for Hermes to exit…");
loop {
if !is_locked(&shim) {
return;
}
if Instant::now() >= deadline {
emit_log(
app,
Some("update"),
"[update] timed out waiting for Hermes to exit; proceeding anyway",
);
return;
}
tokio::time::sleep(DESKTOP_EXIT_POLL).await;
}
}
/// Best-effort lock probe: try to open the file for read+write. On Windows an
/// exclusively-held running .exe refuses the open with a sharing violation.
/// On Unix this almost always succeeds (no mandatory locking), which is fine —
/// the venv-shim contention is a Windows-only problem.
fn is_locked(path: &Path) -> bool {
if !path.exists() {
return false;
}
match std::fs::OpenOptions::new().read(true).write(true).open(path) {
Ok(_) => false,
Err(_) => true,
}
}
/// Spawn `hermes <args>` from `cwd`, stream stdout/stderr as Log events on the
/// bootstrap channel, and return the exit code. Mirrors powershell::run_script
/// but for an arbitrary command (no install.ps1 -File wrapping).
async fn run_streamed(
app: &AppHandle,
program: &Path,
args: &[&str],
cwd: &Path,
stage: Option<&str>,
) -> Result<CmdResult> {
let mut cmd = Command::new(program);
cmd.args(args)
.current_dir(cwd)
.stdin(Stdio::null())
.stdout(Stdio::piped())
.stderr(Stdio::piped());
#[cfg(target_os = "windows")]
{
use std::os::windows::process::CommandExt;
// CREATE_NO_WINDOW = 0x08000000 — no flashing console behind the GUI.
cmd.creation_flags(0x0800_0000);
}
let mut child = cmd
.spawn()
.map_err(|e| anyhow!("spawning {} {:?}: {e}", program.display(), args))?;
let stdout = child.stdout.take().expect("stdout piped");
let stderr = child.stderr.take().expect("stderr piped");
let mut out = BufReader::new(stdout).lines();
let mut err = BufReader::new(stderr).lines();
let stage_owned = stage.map(|s| s.to_string());
loop {
tokio::select! {
line = out.next_line() => match line {
Ok(Some(l)) => emit_log(app, stage_owned.as_deref(), &l),
Ok(None) => break,
Err(e) => { tracing::warn!("stdout read error: {e}"); break; }
},
line = err.next_line() => match line {
Ok(Some(l)) => emit_log(app, stage_owned.as_deref(), &format!("stderr: {l}")),
Ok(None) => {}
Err(e) => { tracing::warn!("stderr read error: {e}"); }
},
}
}
while let Ok(Some(l)) = out.next_line().await {
emit_log(app, stage_owned.as_deref(), &l);
}
while let Ok(Some(l)) = err.next_line().await {
emit_log(app, stage_owned.as_deref(), &format!("stderr: {l}"));
}
let status = child.wait().await.map_err(|e| anyhow!("waiting for child: {e}"))?;
Ok(CmdResult {
exit_code: status.code(),
})
}
struct CmdResult {
exit_code: Option<i32>,
}
/// Path to the venv hermes shim under an install root, regardless of existence.
fn venv_hermes(install_root: &Path) -> PathBuf {
if cfg!(target_os = "windows") {
install_root.join("venv").join("Scripts").join("hermes.exe")
} else {
install_root.join("venv").join("bin").join("hermes")
}
}
/// Resolve the hermes CLI to drive. Prefer the venv shim in the install we
/// just updated; fall back to `hermes` on PATH.
fn resolve_hermes(install_root: &Path) -> Option<PathBuf> {
let shim = venv_hermes(install_root);
if shim.exists() {
return Some(shim);
}
// PATH fallback. which-style probe via env, kept dependency-free.
let exe = if cfg!(target_os = "windows") { "hermes.exe" } else { "hermes" };
if let Ok(path) = std::env::var("PATH") {
let sep = if cfg!(target_os = "windows") { ';' } else { ':' };
for dir in path.split(sep) {
let cand = Path::new(dir).join(exe);
if cand.exists() {
return Some(cand);
}
}
}
None
}
// ---------------------------------------------------------------------------
// Event helpers — keep emit shape identical to bootstrap.rs so the UI is reused
// ---------------------------------------------------------------------------
fn stage_info(name: &str, title: &str) -> StageInfo {
StageInfo {
name: name.to_string(),
title: title.to_string(),
category: "update".to_string(),
needs_user_input: false,
}
}
// option_env! only accepts string literals, so the build-time pins are read
// by their literal names here. Mirrors bootstrap.rs's helper of the same name
// (kept local rather than shared because option_env! can't be parameterized).
fn option_env_string(key: &str) -> Option<String> {
let val = match key {
"BUILD_PIN_COMMIT" => option_env!("BUILD_PIN_COMMIT"),
"BUILD_PIN_BRANCH" => option_env!("BUILD_PIN_BRANCH"),
_ => None,
};
val.map(|s| s.to_string())
}
fn emit(app: &AppHandle, event: BootstrapEvent) {
if let Err(e) = app.emit(BootstrapEvent::CHANNEL, &event) {
tracing::warn!(?e, "failed to emit update event");
}
}
fn emit_stage(
app: &AppHandle,
name: &str,
state: StageState,
duration_ms: Option<u64>,
error: Option<String>,
) {
tracing::info!(stage = %name, ?state, ?duration_ms, ?error, "update stage");
emit(
app,
BootstrapEvent::Stage {
name: name.to_string(),
state,
duration_ms,
result: None,
error,
},
);
}
fn emit_log(app: &AppHandle, stage: Option<&str>, line: &str) {
match stage {
Some(s) => tracing::info!(target: "bootstrap.log", stage = %s, "{line}"),
None => tracing::info!(target: "bootstrap.log", "{line}"),
}
emit(
app,
BootstrapEvent::Log {
stage: stage.map(|s| s.to_string()),
line: line.to_string(),
},
);
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn venv_hermes_is_under_install_root() {
let root = Path::new("/x/hermes-agent");
let shim = venv_hermes(root);
assert!(shim.starts_with(root));
assert!(shim.to_string_lossy().contains("venv"));
}
#[test]
fn missing_file_is_not_locked() {
assert!(!is_locked(Path::new("/nonexistent/does/not/exist/xyz")));
}
}

View File

@@ -0,0 +1,67 @@
{
"$schema": "https://schema.tauri.app/config/2",
"productName": "Hermes Setup",
"version": "0.0.1",
"identifier": "com.nousresearch.hermes.setup",
"build": {
"beforeDevCommand": "npm run dev",
"devUrl": "http://127.0.0.1:5175",
"beforeBuildCommand": "npm run build",
"frontendDist": "../dist"
},
"app": {
"windows": [
{
"label": "main",
"title": "Hermes Setup",
"width": 880,
"height": 620,
"minWidth": 720,
"minHeight": 520,
"resizable": true,
"fullscreen": false,
"decorations": true,
"transparent": false,
"center": true
}
],
"security": {
"csp": "default-src 'self'; img-src 'self' data:; style-src 'self' 'unsafe-inline'; script-src 'self'; font-src 'self' data:; connect-src 'self' ipc: http://ipc.localhost"
},
"withGlobalTauri": false
},
"bundle": {
"active": true,
"category": "DeveloperTool",
"shortDescription": "Hermes Setup",
"longDescription": "Installs Hermes Agent on your machine. Drives scripts/install.ps1 (Windows) and scripts/install.sh (macOS/Linux).",
"publisher": "Nous Research",
"copyright": "Copyright © 2026 Nous Research",
"targets": [
"app",
"dmg",
"appimage"
],
"icon": [
"icons/32x32.png",
"icons/128x128.png",
"icons/128x128@2x.png",
"icons/icon.icns",
"icons/icon.ico"
],
"windows": {
"webviewInstallMode": {
"type": "embedBootstrapper"
}
},
"macOS": {
"minimumSystemVersion": "11.0",
"hardenedRuntime": true
}
},
"plugins": {
"shell": {
"open": true
}
}
}

View File

@@ -0,0 +1,35 @@
import { useStore } from '@nanostores/react'
import { useEffect } from 'react'
import { $route, $bootstrap, initialize } from './store'
import Welcome from './routes/welcome'
import Progress from './routes/progress'
import Success from './routes/success'
import Failure from './routes/failure'
/*
* App shell — Hermes Setup.
*
* No header chrome (the OS title bar already says "Hermes Setup"; an
* in-window repeat of the H mark + words was redundant slop).
*
* Route state lives in a single $route atom — 4 screens, no react-router.
*/
export default function App() {
const route = useStore($route)
const bootstrap = useStore($bootstrap)
useEffect(() => {
void initialize()
}, [])
return (
<div className="relative flex h-full flex-col overflow-hidden bg-background text-foreground">
<main className="relative z-10 flex flex-1 flex-col overflow-hidden">
{route === 'welcome' && <Welcome />}
{route === 'progress' && <Progress bootstrap={bootstrap} />}
{route === 'success' && <Success />}
{route === 'failure' && <Failure bootstrap={bootstrap} />}
</main>
</div>
)
}

View File

@@ -0,0 +1,80 @@
import { cva, type VariantProps } from 'class-variance-authority'
import { Slot } from 'radix-ui'
import * as React from 'react'
import { cn } from '../lib/utils'
/*
* Button — copied verbatim from apps/desktop/src/components/ui/button.tsx.
*
* We import the desktop's local shadcn-style Button rather than
* @nous-research/ui's <Button>, because the DS Button uses bg-midground /
* text-background-base utilities that resolve to the DS's hardcoded
* gold/brown brand defaults (#ffac02 / #170d02) unless overridden in
* runtime. The desktop never sets those vars; it routes through its
* own --dt-* token chain via shadcn classes like bg-primary. We do
* the same so visuals match exactly.
*/
const buttonVariants = cva(
"inline-flex shrink-0 items-center justify-center gap-2 rounded-md text-sm font-medium whitespace-nowrap transition-all outline-none focus-visible:border-ring focus-visible:ring-[0.1875rem] focus-visible:ring-ring/50 disabled:pointer-events-none disabled:opacity-50 aria-invalid:border-destructive aria-invalid:ring-destructive/20 dark:aria-invalid:ring-destructive/40 [&_svg]:pointer-events-none [&_svg]:shrink-0 [&_svg:not([class*='size-'])]:size-4",
{
variants: {
variant: {
default: 'bg-primary text-primary-foreground hover:bg-primary/90',
destructive:
'bg-destructive text-white hover:bg-destructive/90 focus-visible:ring-destructive/20 dark:bg-destructive/60 dark:focus-visible:ring-destructive/40',
outline:
'border bg-background shadow-xs hover:bg-accent hover:text-accent-foreground dark:border-input dark:bg-input/30 dark:hover:bg-input/50',
secondary:
'bg-secondary text-secondary-foreground hover:bg-secondary/80',
ghost:
'hover:bg-accent hover:text-accent-foreground dark:hover:bg-accent/50',
link: 'text-primary underline-offset-4 decoration-current/20 hover:underline'
},
size: {
default: 'h-9 px-4 py-2 has-[>svg]:px-3',
xs: "h-6 gap-1 rounded-md px-2 text-xs has-[>svg]:px-1.5 [&_svg:not([class*='size-'])]:size-3",
sm: 'h-8 gap-1.5 rounded-md px-3 has-[>svg]:px-2.5',
lg: 'h-10 rounded-md px-6 has-[>svg]:px-4',
icon: 'size-9',
'icon-xs':
"size-6 rounded-md [&_svg:not([class*='size-'])]:size-3",
'icon-sm': 'size-8',
'icon-lg': 'size-10'
}
},
defaultVariants: {
variant: 'default',
size: 'default'
}
}
)
interface ButtonProps
extends React.ComponentProps<'button'>,
VariantProps<typeof buttonVariants> {
asChild?: boolean
}
export function Button({
className,
variant = 'default',
size = 'default',
asChild = false,
...props
}: ButtonProps) {
const Comp = asChild ? Slot.Root : 'button'
return (
<Comp
className={cn(buttonVariants({ variant, size }), className)}
data-size={size}
data-slot="button"
data-variant={variant}
{...props}
/>
)
}
export { buttonVariants }

View File

@@ -0,0 +1,12 @@
import { type ClassValue, clsx } from 'clsx'
import { twMerge } from 'tailwind-merge'
/*
* cn — Tailwind-aware class merger. Same util the desktop and dashboard
* use. clsx handles conditional classes; twMerge resolves utility
* conflicts so `cn('px-2', condition && 'px-4')` ends up with px-4 only,
* not both.
*/
export function cn(...inputs: ClassValue[]) {
return twMerge(clsx(inputs))
}

View File

@@ -0,0 +1,14 @@
import { StrictMode } from 'react'
import { createRoot } from 'react-dom/client'
import App from './app.tsx'
import './styles.css'
// Default to LIGHT mode — matches the Hermes desktop's default. The
// desktop's runtime theme system can switch to .dark later, but our
// installer ships in light mode only since we don't carry the theme
// provider machinery.
createRoot(document.getElementById('root')!).render(
<StrictMode>
<App />
</StrictMode>
)

View File

@@ -0,0 +1,77 @@
import { type CSSProperties } from 'react'
import { useStore } from '@nanostores/react'
import { Button } from '../components/button'
import {
$logPath,
openLogDir,
startInstall,
type BootstrapStateModel
} from '../store'
import { RefreshCw, FileText } from 'lucide-react'
interface FailureProps {
bootstrap: BootstrapStateModel
}
/*
* Failure screen. Same hero treatment as Welcome/Success — the wordmark
* carries the brand, so we keep it across every terminal state.
*
* The actual error message lives below in muted text. Two clear
* affordances: Retry (primary) and Open log folder (secondary).
*/
export default function Failure({ bootstrap }: FailureProps) {
const logPath = useStore($logPath)
return (
<div className="hermes-fade-in flex h-full flex-col items-center justify-center gap-6 px-12 py-10">
<div className="w-full max-w-2xl min-w-0 text-center">
<p
className="fit-text mx-auto mb-4 w-full font-['Collapse'] font-bold uppercase leading-[0.9] tracking-[0.08em] text-destructive mix-blend-plus-lighter dark:text-destructive/90"
style={
{
'--fit-text-line-height': '0.9',
'--fit-text-max': '5rem',
'--fit-text-min': '2.25rem'
} as CSSProperties
}
>
<span>
<span>Install didn&rsquo;t finish</span>
</span>
<span aria-hidden="true">Install didn&rsquo;t finish</span>
</p>
<p className="m-0 mx-auto max-w-xl text-center text-sm leading-normal tracking-tight text-muted-foreground">
{bootstrap.error ?? 'Something went wrong during installation.'}
</p>
</div>
<div className="flex items-center gap-3">
<Button
onClick={() => void startInstall()}
size="lg"
className="inline-flex items-center gap-2 px-6"
>
<RefreshCw size={16} />
Retry install
</Button>
<Button
variant="outline"
size="lg"
onClick={() => void openLogDir()}
className="inline-flex items-center gap-2"
>
<FileText size={16} />
Open log folder
</Button>
</div>
{logPath && (
<p className="max-w-lg text-center text-xs text-muted-foreground/70">
Log: <code className="font-mono">{logPath}</code>
</p>
)}
</div>
)
}

View File

@@ -0,0 +1,190 @@
import { useEffect, useRef, useState } from 'react'
import { useStore } from '@nanostores/react'
import { Button } from '../components/button'
import {
cancelInstall,
$progress,
type BootstrapStateModel,
type StageState
} from '../store'
import { Check, X, ChevronRight, FileText, Loader2 } from 'lucide-react'
import clsx from 'clsx'
interface ProgressProps {
bootstrap: BootstrapStateModel
}
/*
* Progress screen — drives a stage list + collapsible log panel. Uses
* the DS <Progress> for the top bar so its motion + ring match the rest
* of the product.
*/
export default function ProgressScreen({ bootstrap }: ProgressProps) {
const progress = useStore($progress)
const [showLogs, setShowLogs] = useState(false)
const logEndRef = useRef<HTMLDivElement>(null)
useEffect(() => {
if (showLogs && logEndRef.current) {
logEndRef.current.scrollIntoView({ behavior: 'smooth' })
}
}, [bootstrap.logs.length, showLogs])
const currentStage =
bootstrap.currentStage != null
? bootstrap.stages[bootstrap.currentStage]
: null
return (
<div className="hermes-fade-in flex h-full flex-col">
<div className="border-b border-border px-6 py-4">
<div className="mb-3 flex items-center justify-between text-xs">
<div className="flex items-center gap-2 text-foreground">
{bootstrap.status === 'running' && (
<Loader2 size={12} className="animate-spin text-primary" />
)}
<span>
{bootstrap.status === 'running'
? currentStage
? currentStage.info.title
: 'Preparing\u2026'
: bootstrap.status === 'completed'
? 'Done'
: 'Installing'}
</span>
</div>
<div className="text-muted-foreground">
{progress.done} of {progress.total} steps
</div>
</div>
{/* Top progress bar — plain HTML, derived from --primary so it
tracks the theme accent. */}
<div className="h-1 w-full overflow-hidden rounded-full bg-muted">
<div
className="h-full bg-primary transition-all duration-300 ease-out"
style={{ width: `${Math.max(2, progress.fraction * 100)}%` }}
/>
</div>
</div>
<div className="flex flex-1 overflow-hidden">
<div className="flex-1 overflow-y-auto px-6 py-4">
<ol className="space-y-1">
{bootstrap.stageOrder.map((name) => {
const rec = bootstrap.stages[name]
if (!rec) return null
return (
<li
key={name}
className={clsx(
'flex items-center gap-3 rounded-md px-3 py-2 text-sm transition-colors',
rec.state === 'running' && 'bg-card text-foreground',
rec.state === 'succeeded' && 'text-foreground/80',
rec.state === 'skipped' && 'text-muted-foreground',
rec.state === 'failed' &&
'bg-destructive/10 text-destructive',
!rec.state && 'text-muted-foreground/60'
)}
>
<StateIcon state={rec.state ?? null} />
<span className="flex-1 truncate">{rec.info.title}</span>
{rec.durationMs != null && (
<span className="text-xs text-muted-foreground">
{formatDuration(rec.durationMs)}
</span>
)}
</li>
)
})}
</ol>
</div>
{showLogs && (
<div className="flex w-1/2 flex-col border-l border-border bg-card/40">
<div className="flex shrink-0 items-center justify-between border-b border-border px-3 py-2">
<div className="text-xs font-medium text-foreground/80">
Live output
</div>
<div className="text-xs text-muted-foreground">
{bootstrap.logs.length} lines
</div>
</div>
<div className="flex-1 overflow-y-auto px-3 py-2 font-mono text-[11px] leading-relaxed">
{bootstrap.logs.map((entry, idx) => (
<div
key={idx}
className={clsx(
'whitespace-pre-wrap',
entry.line.startsWith('stderr:')
? 'text-destructive'
: 'text-foreground/70'
)}
>
{entry.line}
</div>
))}
<div ref={logEndRef} />
</div>
</div>
)}
</div>
<div className="flex shrink-0 items-center justify-between border-t border-border px-6 py-3">
<button
type="button"
onClick={() => setShowLogs((v) => !v)}
className="inline-flex items-center gap-1.5 text-xs text-muted-foreground transition-colors hover:text-foreground"
>
<FileText size={14} />
{showLogs ? 'Hide details' : 'Show details'}
<ChevronRight
size={12}
className={clsx(
'transition-transform',
showLogs && 'rotate-90'
)}
/>
</button>
{bootstrap.status === 'running' && (
<Button
variant="outline"
size="sm"
onClick={() => void cancelInstall()}
>
Cancel
</Button>
)}
</div>
</div>
)
}
function StateIcon({ state }: { state: StageState | null }) {
if (state === 'running') {
return <Loader2 size={14} className="animate-spin text-primary" />
}
if (state === 'succeeded') {
return <Check size={14} className="text-emerald-400" />
}
if (state === 'skipped') {
return <ChevronRight size={14} className="text-muted-foreground/70" />
}
if (state === 'failed') {
return <X size={14} className="text-destructive" />
}
return (
<div
className="h-[6px] w-[6px] rounded-full bg-muted-foreground/40"
aria-hidden
/>
)
}
function formatDuration(ms: number): string {
if (ms < 1000) return `${ms}ms`
if (ms < 60000) return `${(ms / 1000).toFixed(1)}s`
const m = Math.floor(ms / 60000)
const s = Math.round((ms % 60000) / 1000)
return `${m}m ${s}s`
}

View File

@@ -0,0 +1,87 @@
import { useState } from 'react'
import { type CSSProperties } from 'react'
import { Button } from '../components/button'
import { launchHermesDesktop } from '../store'
import { Rocket, AlertCircle } from 'lucide-react'
/*
* Success screen. HERMES AGENT wordmark stays as the visual anchor
* (same Collapse Bold treatment as Welcome + the desktop chat intro),
* with a status line below.
*
* Launching the desktop can fail (e.g. Stage-Desktop was skipped and
* Hermes.exe doesn't exist). We catch the Tauri error and surface it
* inline rather than silently doing nothing — the previous version
* had `onClick={() => void launchHermesDesktop()}` which swallowed
* the rejection and left the user staring at an unresponsive button.
*/
export default function Success() {
const [error, setError] = useState<string | null>(null)
const [launching, setLaunching] = useState(false)
async function handleLaunch() {
setError(null)
setLaunching(true)
try {
await launchHermesDesktop()
// On success the installer exits — control never returns here.
} catch (e) {
const msg = e instanceof Error ? e.message : String(e)
setError(msg)
setLaunching(false)
}
}
return (
<div className="hermes-fade-in flex h-full flex-col items-center justify-center gap-8 px-12 py-10">
<div className="w-full max-w-2xl min-w-0 text-center">
<p
className="fit-text mx-auto mb-4 w-full font-['Collapse'] font-bold uppercase leading-[0.9] tracking-[0.08em] text-midground mix-blend-plus-lighter dark:text-foreground/90"
style={
{
'--fit-text-line-height': '0.9',
'--fit-text-max': '5rem',
'--fit-text-min': '2.25rem'
} as CSSProperties
}
>
<span>
<span>Hermes is ready</span>
</span>
<span aria-hidden="true">Hermes is ready</span>
</p>
<p className="m-0 text-center text-base leading-normal tracking-tight text-muted-foreground">
You can launch from here, or any time from your terminal with{' '}
<code className="rounded bg-muted/60 px-1 py-0.5 font-mono text-sm">
hermes desktop
</code>
.
</p>
</div>
<Button
onClick={() => void handleLaunch()}
size="lg"
disabled={launching}
className="inline-flex items-center gap-2 px-6"
>
<Rocket size={18} />
{launching ? 'Launching…' : 'Launch Hermes'}
</Button>
{error && (
<div
role="alert"
className="flex max-w-2xl items-start gap-2 rounded-md border border-destructive/30 bg-destructive/10 px-4 py-3 text-sm text-destructive"
>
<AlertCircle size={16} className="mt-0.5 shrink-0" />
<div className="min-w-0">
<div className="font-medium">Couldn&rsquo;t launch the desktop app</div>
<div className="mt-1 text-destructive/80">{error}</div>
</div>
</div>
)}
</div>
)
}

View File

@@ -0,0 +1,58 @@
import { type CSSProperties } from 'react'
import { Button } from '../components/button'
import { startInstall } from '../store'
import { ArrowRight } from 'lucide-react'
/*
* Welcome screen.
*
* Mirrors the desktop's chat intro (apps/desktop/src/components/chat/intro.tsx):
* - HERMES AGENT wordmark rendered in Collapse Bold, uppercase, tracked
* - mix-blend-plus-lighter so the type "glows" on the canvas
* - fit-text utility so the wordmark sizes itself to the column
*
* No install-path footer. The default install location is correct for
* 99% of users; the rest will use the CLI installer with a -HermesHome
* flag. Showing %LOCALAPPDATA% to grandma is developer-brain.
*/
export default function Welcome() {
return (
<div className="hermes-fade-in flex h-full flex-col items-center justify-center gap-10 px-12 py-10">
{/* Hero — same recipe the desktop's chat/intro.tsx uses */}
<div className="w-full max-w-2xl min-w-0 text-center">
<p
className="fit-text mx-auto mb-4 w-full font-['Collapse'] font-bold uppercase leading-[0.9] tracking-[0.08em] text-midground mix-blend-plus-lighter dark:text-foreground/90"
style={
{
'--fit-text-line-height': '0.9',
'--fit-text-max': '6rem',
'--fit-text-min': '2.5rem'
} as CSSProperties
}
>
<span>
<span>HERMES AGENT</span>
</span>
<span aria-hidden="true">HERMES AGENT</span>
</p>
<p className="m-0 text-center text-base leading-normal tracking-tight text-muted-foreground">
The agent that grows with you. We&rsquo;ll set things up in the
background &mdash; takes a few minutes.
</p>
</div>
<Button
onClick={() => void startInstall()}
size="lg"
className="group inline-flex items-center gap-2 px-6"
>
Install Hermes
<ArrowRight
size={18}
className="transition-transform group-hover:translate-x-0.5"
/>
</Button>
</div>
)
}

View File

@@ -0,0 +1,277 @@
import { atom, computed } from 'nanostores'
import { listen, type UnlistenFn } from '@tauri-apps/api/event'
import { invoke } from '@tauri-apps/api/core'
/*
* Bootstrap state store — single source of truth for installer screens.
*
* Lives in nanostores per the project's TypeScript guidelines (apps/desktop
* AGENTS.md): "Prefer small nanostores over component state when state is
* shared, reused, or read by distant UI."
*
* One channel from Rust ('bootstrap' event), discriminated by payload.type.
* We translate those events into typed atom updates here so the rest of
* the app only deals with React-friendly state.
*/
// ---------------------------------------------------------------------------
// Types — mirror src-tauri/src/events.rs
// ---------------------------------------------------------------------------
export interface StageInfo {
name: string
title: string
category: string
needs_user_input: boolean
}
export type StageState = 'running' | 'succeeded' | 'skipped' | 'failed'
export interface StageRecord {
info: StageInfo
state: StageState | null
durationMs?: number
error?: string
}
export interface BootstrapStateModel {
status: 'idle' | 'running' | 'completed' | 'failed'
protocolVersion: number | null
stages: Record<string, StageRecord>
stageOrder: string[]
currentStage: string | null
installRoot: string | null
error: string | null
logs: Array<{ stage?: string; line: string }>
}
const INITIAL: BootstrapStateModel = {
status: 'idle',
protocolVersion: null,
stages: {},
stageOrder: [],
currentStage: null,
installRoot: null,
error: null,
logs: []
}
// ---------------------------------------------------------------------------
// Atoms
// ---------------------------------------------------------------------------
export type Route = 'welcome' | 'progress' | 'success' | 'failure'
/// How the installer was launched, mirrored from src-tauri AppMode.
/// 'install' = first-run onboarding (bare launch). 'update' = driven by the
/// desktop app handing off via `Hermes-Setup.exe --update`.
export type AppMode = 'install' | 'update'
export const $route = atom<Route>('welcome')
export const $mode = atom<AppMode>('install')
export const $bootstrap = atom<BootstrapStateModel>(INITIAL)
export const $logPath = atom<string | null>(null)
export const $hermesHome = atom<string | null>(null)
export const $progress = computed($bootstrap, (b) => {
const total = b.stageOrder.length
if (total === 0) return { done: 0, total: 0, fraction: 0 }
let done = 0
for (const name of b.stageOrder) {
const s = b.stages[name]?.state
if (s === 'succeeded' || s === 'skipped' || s === 'failed') done += 1
}
return { done, total, fraction: done / total }
})
// ---------------------------------------------------------------------------
// Tauri event subscription
// ---------------------------------------------------------------------------
interface BootstrapManifestEvent {
type: 'manifest'
stages: StageInfo[]
protocolVersion: number | null
}
interface BootstrapStageEvent {
type: 'stage'
name: string
state: StageState
durationMs?: number
error?: string
}
interface BootstrapLogEvent {
type: 'log'
stage?: string
line: string
}
interface BootstrapCompleteEvent {
type: 'complete'
installRoot: string
marker: unknown
}
interface BootstrapFailedEvent {
type: 'failed'
stage?: string
error: string
}
type BootstrapEvent =
| BootstrapManifestEvent
| BootstrapStageEvent
| BootstrapLogEvent
| BootstrapCompleteEvent
| BootstrapFailedEvent
let unlisten: UnlistenFn | null = null
export async function initialize(): Promise<void> {
if (unlisten) return
// Pull static info on mount for the diagnostics footer.
try {
const [logPath, hermesHome, mode] = await Promise.all([
invoke<string>('get_log_path'),
invoke<string>('get_hermes_home'),
invoke<AppMode>('get_mode')
])
$logPath.set(logPath)
$hermesHome.set(hermesHome)
$mode.set(mode)
} catch (err) {
console.warn('failed to fetch installer paths', err)
}
unlisten = await listen<BootstrapEvent>('bootstrap', (event) => {
const payload = event.payload
const cur = $bootstrap.get()
switch (payload.type) {
case 'manifest': {
const stages: Record<string, StageRecord> = {}
const order: string[] = []
for (const s of payload.stages) {
stages[s.name] = { info: s, state: null }
order.push(s.name)
}
$bootstrap.set({
...cur,
status: 'running',
protocolVersion: payload.protocolVersion,
stages,
stageOrder: order,
currentStage: null,
installRoot: null,
error: null,
logs: []
})
$route.set('progress')
break
}
case 'stage': {
const existing = cur.stages[payload.name]
if (!existing) {
console.warn('stage event for unknown stage', payload.name)
break
}
const next: StageRecord = {
...existing,
state: payload.state,
durationMs: payload.durationMs,
error: payload.error
}
$bootstrap.set({
...cur,
stages: { ...cur.stages, [payload.name]: next },
currentStage:
payload.state === 'running' ? payload.name : cur.currentStage
})
break
}
case 'log': {
const logs = [...cur.logs, { stage: payload.stage, line: payload.line }]
// Keep the rolling buffer bounded so the UI doesn't get OOM'd
// during a long install (playwright chromium download is ~10k lines).
const trimmed = logs.length > 2000 ? logs.slice(-2000) : logs
$bootstrap.set({ ...cur, logs: trimmed })
break
}
case 'complete':
$bootstrap.set({
...cur,
status: 'completed',
installRoot: payload.installRoot,
currentStage: null
})
// Install: show the "launch Hermes" success screen. Update: this is a
// hand-off — the installer relaunches the desktop and exits within a
// few hundred ms, so routing to success just flashes that screen
// before the window closes. Stay on progress until we exit.
if ($mode.get() !== 'update') {
$route.set('success')
}
break
case 'failed':
$bootstrap.set({
...cur,
status: 'failed',
error: payload.error,
currentStage: null
})
$route.set('failure')
break
}
})
// Update mode is a hand-off, not a user-initiated flow: the desktop already
// exited and re-launched us as `--update`. Kick the update immediately so
// the user lands on progress, not a redundant "click to update" screen.
if ($mode.get() === 'update') {
void startUpdate()
}
}
// ---------------------------------------------------------------------------
// Actions
// ---------------------------------------------------------------------------
export async function startInstall(opts?: { branch?: string }): Promise<void> {
// Reset before kicking off so a retry from the failure screen clears
// the previous run's state.
$bootstrap.set(INITIAL)
$route.set('progress')
await invoke('start_bootstrap', {
args: {
commit: null,
branch: opts?.branch ?? null,
include_desktop: true,
hermes_home: null
}
})
}
export async function startUpdate(): Promise<void> {
// Update is driven by the desktop handing off (Hermes-Setup.exe --update);
// there's no welcome click. Reset + jump straight to progress, then let the
// Rust side stream the synthetic update manifest.
$bootstrap.set(INITIAL)
$route.set('progress')
await invoke('start_update')
}
export async function cancelInstall(): Promise<void> {
await invoke('cancel_bootstrap')
}
export async function launchHermesDesktop(): Promise<void> {
const installRoot = $bootstrap.get().installRoot
if (!installRoot) throw new Error('no install root')
await invoke('launch_hermes_desktop', { installRoot })
}
export async function openLogDir(): Promise<void> {
await invoke('open_log_dir')
}

View File

@@ -0,0 +1,51 @@
/*
* Hermes Setup — defer entirely to the desktop's styles.css.
*
* Rather than re-implement the Hermes design system (and inevitably drift
* from it), we import apps/desktop/src/styles.css wholesale. The desktop
* is the canonical source of truth for fonts, color tokens, button chrome,
* scrollbars, layout utilities, and animations. Any change to the
* Hermes look propagates here automatically with no copy-paste maintenance.
*
* Path resolution caveats:
* - Tailwind v4's `@import` resolves relative to this file. The desktop's
* `@source '../../../node_modules/...'` declarations therefore re-resolve
* against apps/bootstrap-installer/src/. Since both apps live two levels
* deep under the same repo root, `../../../node_modules` lands in the
* same place. (Verify if either app ever moves.)
* - The desktop's `@font-face url('../../../node_modules/...')` references
* are baked into the *imported* stylesheet; CSS resolves url()s relative
* to the file that contains them, so they continue to point at the
* correct node_modules path even from here.
*
* Forced light mode: the desktop ships with a runtime theme switcher
* (ThemeProvider + applyTheme) that can flip to dark via document.documentElement.
* The installer has no UI for theme switching, so we stay on the desktop's
* default light surface (Nous-blue accent on near-white chrome).
*/
@import '../../desktop/src/styles.css';
/* Installer-only additions: a fade-in animation and a warm radial glow
for the welcome screen. Everything else inherits from the desktop. */
@keyframes hermes-fade-in {
from {
opacity: 0;
transform: translateY(4px);
}
to {
opacity: 1;
transform: translateY(0);
}
}
.hermes-fade-in {
animation: hermes-fade-in 0.45s ease-out both;
}
.hermes-glow {
background: radial-gradient(
ellipse at center,
color-mix(in srgb, var(--ui-warm) 18%, transparent) 0%,
transparent 60%
);
}

View File

@@ -0,0 +1 @@
/// <reference types="vite/client" />

View File

@@ -0,0 +1,26 @@
{
"compilerOptions": {
"target": "ES2022",
"useDefineForClassFields": true,
"lib": ["ES2022", "DOM", "DOM.Iterable"],
"module": "ESNext",
"skipLibCheck": true,
"moduleResolution": "bundler",
"allowImportingTsExtensions": true,
"resolveJsonModule": true,
"isolatedModules": true,
"noEmit": true,
"jsx": "react-jsx",
"strict": true,
"noUnusedLocals": true,
"noUnusedParameters": true,
"esModuleInterop": true,
"noFallthroughCasesInSwitch": true,
"baseUrl": ".",
"paths": {
"@/*": ["src/*"]
}
},
"include": ["src"],
"references": [{ "path": "./tsconfig.node.json" }]
}

View File

@@ -0,0 +1,11 @@
{
"compilerOptions": {
"composite": true,
"skipLibCheck": true,
"module": "ESNext",
"moduleResolution": "bundler",
"allowSyntheticDefaultImports": true,
"strict": true
},
"include": ["vite.config.ts"]
}

View File

@@ -0,0 +1,46 @@
import { defineConfig } from 'vite'
import react from '@vitejs/plugin-react'
import tailwindcss from '@tailwindcss/vite'
import path from 'node:path'
// Hermes Setup — Tauri-targeted Vite config.
//
// Port 5175 keeps us out of the way of:
// web (vite default 5173)
// apps/desktop dev (5174 per its package.json)
//
// `clearScreen: false` is the Tauri convention — they spawn vite as a child
// process and want our errors to stay visible.
const host = process.env.TAURI_DEV_HOST
export default defineConfig({
plugins: [react(), tailwindcss()],
resolve: {
alias: {
'@': path.resolve(__dirname, './src')
}
},
clearScreen: false,
server: {
port: 5175,
strictPort: true,
host: host || '127.0.0.1',
hmr: host
? {
protocol: 'ws',
host,
port: 5176
}
: undefined,
watch: {
// Don't watch the Rust side — tauri-cli handles it.
ignored: ['**/src-tauri/**']
}
},
build: {
target: 'esnext',
outDir: 'dist',
emptyOutDir: true
}
})

11
apps/desktop/.prettierrc Normal file
View File

@@ -0,0 +1,11 @@
{
"arrowParens": "avoid",
"bracketSpacing": true,
"endOfLine": "auto",
"printWidth": 120,
"semi": false,
"singleQuote": true,
"tabWidth": 2,
"trailingComma": "none",
"useTabs": false
}

137
apps/desktop/README.md Normal file
View File

@@ -0,0 +1,137 @@
# Hermes Desktop ☤
<p align="center">
<a href="https://github.com/NousResearch/hermes-agent/releases"><img src="https://img.shields.io/badge/Download-macOS%20%C2%B7%20Windows%20%C2%B7%20Linux-FFD700?style=for-the-badge" alt="Download"></a>
<a href="https://hermes-agent.nousresearch.com/docs/"><img src="https://img.shields.io/badge/Docs-hermes--agent.nousresearch.com-FFD700?style=for-the-badge" alt="Documentation"></a>
<a href="https://discord.gg/NousResearch"><img src="https://img.shields.io/badge/Discord-5865F2?style=for-the-badge&logo=discord&logoColor=white" alt="Discord"></a>
<a href="https://github.com/NousResearch/hermes-agent/blob/main/LICENSE"><img src="https://img.shields.io/badge/License-MIT-green?style=for-the-badge" alt="License: MIT"></a>
</p>
**The native desktop app for [Hermes Agent](../../README.md) — the self-improving AI agent from [Nous Research](https://nousresearch.com).** Same agent, same skills, same memory as the CLI and gateway, in a polished native window — chat with streaming tool output, side-by-side previews, a file browser, voice, and settings, no terminal required. Available for **macOS, Windows, and Linux**.
<table>
<tr><td><b>Chat with the full agent</b></td><td>Streaming responses, live tool activity, structured tool summaries, and the same conversation history as every other Hermes surface.</td></tr>
<tr><td><b>Side-by-side previews</b></td><td>Render web pages, files, and tool outputs in a right-hand pane while you keep chatting.</td></tr>
<tr><td><b>File browser</b></td><td>Explore and preview the working directory without leaving the app.</td></tr>
<tr><td><b>Voice</b></td><td>Talk to Hermes and hear it back.</td></tr>
<tr><td><b>Settings & onboarding</b></td><td>Manage providers, models, tools, and credentials from a real UI. First-run setup gets you to your first message in seconds.</td></tr>
<tr><td><b>Stays current</b></td><td>Built-in updates pull the latest agent and rebuild the app in place.</td></tr>
</table>
---
## Install
### Install with Hermes (recommended)
Add `--include-desktop` to the [one-line installer](../../README.md#quick-install) and it sets up the agent and builds the desktop app in one go:
```bash
curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash -s -- --include-desktop
```
Already have the Hermes CLI? Just run:
```bash
hermes desktop
```
It builds and launches the GUI against your existing install — same config, keys, sessions, and skills. On first launch Hermes walks you through picking a provider and model; nothing else to configure.
### Prebuilt installers
When a release ships desktop installers they're attached to its [releases page](https://github.com/NousResearch/hermes-agent/releases) — `.dmg` (macOS), `.exe` / `.msi` (Windows), `.AppImage` / `.deb` / `.rpm` (Linux). These are published manually, so the install-with-Hermes path above is the most reliable way to get the latest.
---
## Updating
The app checks for updates in the background and offers a one-click update when one is ready. You can also update any time from the CLI:
```bash
hermes update
```
---
## Requirements
The installer handles everything for you (Python 3.11+, a portable Git, ripgrep). The only thing worth knowing:
- **Windows** — the installer bundles its own Git and Python; no admin rights or system changes required.
- **macOS / Linux** — uses your system Python 3.11+ (installed automatically if missing).
---
## Development
Want to hack on the app itself? Install workspace deps from the repo root once, then run the dev server from this directory:
```bash
npm install # from repo root — links apps/desktop, web, apps/shared
cd apps/desktop
npm run dev # Vite renderer + Electron, which boots the Python backend
```
Point the app at a specific source checkout, or sandbox it away from your real config:
```bash
HERMES_DESKTOP_HERMES_ROOT=/path/to/clone npm run dev
HERMES_HOME=/tmp/throwaway npm run dev
npm run dev:fake-boot # exercise the startup overlay with deterministic delays
```
### Building installers
```bash
npm run dist:mac # DMG + zip
npm run dist:win # NSIS + MSI
npm run dist:linux # AppImage + deb + rpm
npm run pack # unpacked app under release/ (no installer)
```
Installers are built and uploaded to GitHub Releases manually. macOS/Windows signing & notarization happen automatically when the relevant credentials are present in the environment (`CSC_LINK` / `CSC_KEY_PASSWORD` / `APPLE_*` for macOS, `WIN_CSC_*` for Windows).
### How it works
The packaged app ships only the Electron shell. On first launch it installs the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. The renderer (React, in `src/`) talks to a `hermes dashboard --tui` backend over the standard gateway APIs and reuses the embedded TUI rather than reimplementing chat. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
### Verification
Run before opening a PR (lint may surface pre-existing warnings but must exit cleanly):
```bash
npm run fix
npm run type-check
npm run lint
npm run test:desktop:all
```
### Troubleshooting
Boot logs land in `HERMES_HOME/logs/desktop.log` (includes backend output and recent Python tracebacks) — check it first if the app reports a boot failure.
```bash
# Force a clean first-launch setup
rm "$HOME/.hermes/hermes-agent/.hermes-bootstrap-complete" # macOS/Linux
# Rebuild a broken Python venv
rm -rf "$HOME/.hermes/hermes-agent/venv" # macOS/Linux
# Reset a stuck macOS microphone prompt
tccutil reset Microphone com.nousresearch.hermes
```
---
## Community
- 💬 [Discord](https://discord.gg/NousResearch)
- 📖 [Documentation](https://hermes-agent.nousresearch.com/docs/)
- 🐛 [Issues](https://github.com/NousResearch/hermes-agent/issues)
---
## License
MIT — see [LICENSE](../../LICENSE).
Built by [Nous Research](https://nousresearch.com).

Binary file not shown.

Binary file not shown.

After

Width:  |  Height:  |  Size: 78 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 674 KiB

View File

@@ -0,0 +1,21 @@
{
"$schema": "https://ui.shadcn.com/schema.json",
"style": "new-york",
"rsc": false,
"tsx": true,
"tailwind": {
"config": "",
"css": "src/styles.css",
"baseColor": "neutral",
"cssVariables": true,
"prefix": ""
},
"aliases": {
"components": "@/components",
"utils": "@/lib/utils",
"ui": "@/components/ui",
"lib": "@/lib",
"hooks": "@/hooks"
},
"iconLibrary": "lucide"
}

View File

@@ -0,0 +1,106 @@
/**
* backend-probes.cjs
*
* Cheap "does this candidate backend actually work" checks used by
* resolveHermesBackend (main.cjs). The resolver walks a ladder of
* candidates -- bootstrap marker, `hermes` on PATH, system Python with
* hermes_cli installed -- and historically returned the first candidate
* whose binary existed on disk. That assumption breaks when a user has
* a pre-installed Python 3.11-3.13 (so findSystemPython() returns a
* path) but no hermes_cli in its site-packages: the resolver hands back
* a backend the spawn step can't actually run, and the user gets a
* dead-on-arrival "ModuleNotFoundError: No module named 'hermes_cli'"
* instead of the first-launch installer.
*
* These probes give the resolver a way to verify a candidate before
* trusting it. Failure (non-zero exit, exception, timeout) means "skip
* this rung, try the next one"; success means "spawn this for real."
* Falling off the bottom of the ladder lands on the bootstrap-needed
* sentinel, which is exactly what we want when nothing pre-existing
* actually works.
*
* Both probes are deliberately fast and forgiving:
* - 5s timeout (a hung interpreter beats forever, but we still give
* slow disks / cold caches room to breathe)
* - stdio ignored (we only care about exit code; stdout/stderr are
* not surfaced to the user, just to recentHermesLog for forensics
* via the caller's catch block if it chooses)
* - any throw -> false (never propagate -- resolver wants a boolean)
*
* Kept in a standalone cjs module so it can be unit-tested with
* `node --test` without dragging in the electron runtime (same pattern
* as bootstrap-platform.cjs and hardening.cjs).
*/
const { execFileSync } = require('node:child_process')
const PROBE_TIMEOUT_MS = 5000
/**
* Return true iff `python -c "import hermes_cli"` exits 0.
*
* Used to gate the "fallback to system Python with hermes_cli installed"
* rung of resolveHermesBackend. Without this, a system Python 3.11-3.13
* registered in PEP 514 makes findSystemPython() succeed regardless of
* whether hermes_cli has actually been pip-installed into its
* site-packages -- and the resolver returns a backend that immediately
* dies on spawn.
*
* @param {string} pythonPath - Absolute path to a python.exe / python.
* @returns {boolean}
*/
function canImportHermesCli(pythonPath) {
if (!pythonPath) return false
try {
execFileSync(pythonPath, ['-c', 'import hermes_cli'], {
stdio: 'ignore',
timeout: PROBE_TIMEOUT_MS,
windowsHide: true
})
return true
} catch {
return false
}
}
/**
* Return true iff `<hermesCommand> --version` exits 0.
*
* Used to gate the "existing `hermes` on PATH" rung. Without this, a
* stale hermes.cmd shim left behind by an uninstalled pip install (or
* a half-built venv whose `hermes` entry-point points at a deleted
* Python) survives findOnPath() and gets selected as the backend.
*
* We intentionally avoid invoking the command with the dashboard args
* here -- `--version` is the cheapest "is this binary alive" smoke
* test that every hermes_cli entry-point has supported since 0.1.
*
* @param {string} hermesCommand - Resolved absolute path to a hermes
* executable (or an interpreter+script wrapper).
* @param {object} [opts]
* @param {boolean} [opts.shell] - Whether to run through a shell. For
* .cmd/.bat shims on Windows execFileSync needs shell:true to find
* the cmd interpreter; mirrors the same flag isCommandScript() drives
* in resolveHermesBackend.
* @returns {boolean}
*/
function verifyHermesCli(hermesCommand, opts = {}) {
if (!hermesCommand) return false
try {
execFileSync(hermesCommand, ['--version'], {
stdio: 'ignore',
timeout: PROBE_TIMEOUT_MS,
shell: Boolean(opts.shell),
windowsHide: true
})
return true
} catch {
return false
}
}
module.exports = {
canImportHermesCli,
verifyHermesCli,
PROBE_TIMEOUT_MS
}

View File

@@ -0,0 +1,80 @@
/**
* Tests for electron/backend-probes.cjs.
*
* Run with: node --test electron/backend-probes.test.cjs
* (Wired into npm test:desktop:platforms in package.json.)
*/
const test = require('node:test')
const assert = require('node:assert/strict')
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { canImportHermesCli, verifyHermesCli } = require('./backend-probes.cjs')
// Resolve the host's own Node binary -- guaranteed to be on disk and
// runnable. We use it as both a stand-in for "a python that doesn't
// have hermes_cli" (since `node -c "import hermes_cli"` will exit
// non-zero) and as a way to script verifyHermesCli's success path
// (a tiny script we write to disk that exits 0 on --version).
const NODE_BIN = process.execPath
test('canImportHermesCli returns false when path is falsy', () => {
assert.equal(canImportHermesCli(''), false)
assert.equal(canImportHermesCli(null), false)
assert.equal(canImportHermesCli(undefined), false)
})
test('canImportHermesCli returns false when interpreter cannot run -c', () => {
// node IS an interpreter, but `node -c "import hermes_cli"` is a
// SyntaxError -- different exit reason from a real Python's
// ModuleNotFoundError, but the predicate is "exit 0 or not" and
// both land on "not", which is exactly what we want for the
// resolver fall-through.
assert.equal(canImportHermesCli(NODE_BIN), false)
})
test('canImportHermesCli returns false when binary does not exist', () => {
const ghost = path.join(os.tmpdir(), 'hermes-probes-ghost-' + Date.now() + '.exe')
assert.equal(canImportHermesCli(ghost), false)
})
test('verifyHermesCli returns false when command is falsy', () => {
assert.equal(verifyHermesCli(''), false)
assert.equal(verifyHermesCli(null), false)
assert.equal(verifyHermesCli(undefined), false)
})
test('verifyHermesCli returns false when binary does not exist', () => {
const ghost = path.join(os.tmpdir(), 'hermes-probes-ghost-' + Date.now() + '.exe')
assert.equal(verifyHermesCli(ghost), false)
})
test('verifyHermesCli returns true when --version exits 0', () => {
// Write a tiny script that exits 0 regardless of args, then invoke
// it through node. This stands in for a working hermes binary --
// verifyHermesCli only cares about the exit code.
const scriptPath = path.join(os.tmpdir(), `hermes-probes-ok-${Date.now()}-${process.pid}.cjs`)
fs.writeFileSync(scriptPath, 'process.exit(0)\n')
try {
// Use node as the launcher and our script as the "command". Pass
// shell:false (default) -- node is a real binary, no shim.
// execFileSync passes ['--version'] as args, which node ignores
// gracefully (well, it prints its version and exits 0, which is
// perfect -- exit code 0 is the only signal we read).
assert.equal(verifyHermesCli(NODE_BIN), true)
} finally {
try {
fs.unlinkSync(scriptPath)
} catch {}
}
})
test('verifyHermesCli swallows timeouts (does not throw)', () => {
// We can't easily provoke a real 5s hang in CI without slowing the
// suite, but we CAN confirm that an invocation that DOES throw
// (because the binary is missing) returns false rather than
// propagating. Same code path the timeout case takes.
assert.equal(verifyHermesCli('/definitely/not/a/real/binary/anywhere'), false)
})

View File

@@ -0,0 +1,39 @@
const fs = require('node:fs')
function isWslEnvironment(env = process.env, platform = process.platform, kernelRelease = null) {
if (platform !== 'linux') return false
if (env.WSL_DISTRO_NAME || env.WSL_INTEROP) return true
try {
const release = kernelRelease ?? fs.readFileSync('/proc/sys/kernel/osrelease', 'utf8')
return /microsoft|wsl/i.test(release)
} catch {
return false
}
}
function isWindowsBinaryPathInWsl(filePath, options = {}) {
const isWsl = options.isWsl ?? isWslEnvironment(options.env, options.platform)
if (!isWsl) return false
const normalized = String(filePath || '')
.replace(/\\/g, '/')
.toLowerCase()
return (
normalized.endsWith('.exe') ||
normalized.endsWith('.cmd') ||
normalized.endsWith('.bat') ||
normalized.endsWith('.ps1')
)
}
function bundledRuntimeImportCheck(platform = process.platform) {
return platform === 'win32' ? 'import fastapi, uvicorn, winpty' : 'import fastapi, uvicorn, ptyprocess'
}
module.exports = {
bundledRuntimeImportCheck,
isWindowsBinaryPathInWsl,
isWslEnvironment
}

View File

@@ -0,0 +1,53 @@
const assert = require('node:assert/strict')
const fs = require('node:fs')
const path = require('node:path')
const test = require('node:test')
const { bundledRuntimeImportCheck, isWindowsBinaryPathInWsl, isWslEnvironment } = require('./bootstrap-platform.cjs')
test('isWslEnvironment detects WSL2 env vars on linux', () => {
assert.equal(isWslEnvironment({ WSL_DISTRO_NAME: 'Ubuntu' }, 'linux'), true)
assert.equal(isWslEnvironment({ WSL_INTEROP: '/run/WSL/123_interop' }, 'linux'), true)
assert.equal(isWslEnvironment({}, 'linux', '6.6.87.2-microsoft-standard-WSL2'), true)
assert.equal(isWslEnvironment({}, 'linux', '6.6.87-generic'), false)
assert.equal(isWslEnvironment({ WSL_DISTRO_NAME: 'Ubuntu' }, 'darwin'), false)
})
test('isWindowsBinaryPathInWsl blocks Windows binary types on WSL', () => {
assert.equal(isWindowsBinaryPathInWsl('/mnt/c/Tools/hermes.exe', { isWsl: true }), true)
assert.equal(isWindowsBinaryPathInWsl('/mnt/c/Tools/hermes.cmd', { isWsl: true }), true)
assert.equal(isWindowsBinaryPathInWsl('/mnt/c/Tools/hermes.bat', { isWsl: true }), true)
assert.equal(isWindowsBinaryPathInWsl('/mnt/c/Tools/install.ps1', { isWsl: true }), true)
assert.equal(isWindowsBinaryPathInWsl('/usr/local/bin/hermes', { isWsl: true }), false)
assert.equal(isWindowsBinaryPathInWsl('/mnt/c/Tools/hermes.exe', { isWsl: false }), false)
})
test('bundledRuntimeImportCheck selects platform-specific import checks', () => {
assert.equal(bundledRuntimeImportCheck('win32'), 'import fastapi, uvicorn, winpty')
assert.equal(bundledRuntimeImportCheck('darwin'), 'import fastapi, uvicorn, ptyprocess')
assert.equal(bundledRuntimeImportCheck('linux'), 'import fastapi, uvicorn, ptyprocess')
})
test('packaged electron entrypoints do not require unpackaged npm modules', () => {
const electronDir = __dirname
const entrypoints = ['main.cjs', 'preload.cjs', 'bootstrap-platform.cjs']
// - electron: provided by the electron runtime, always resolvable in packaged builds.
// - node-pty: hoisted by workspace dedup AND shipped via extraResources to
// resources/native-deps/node-pty (see scripts/stage-native-deps.cjs). main.cjs
// has a try/catch fallback at line ~38 that resolves the staged copy when the
// bare require fails in the packaged asar, so the bare require itself is by
// design rather than an oversight.
const allowedBareRequires = new Set(['electron', 'node-pty'])
const requirePattern = /require\(['"]([^'"]+)['"]\)/g
for (const entrypoint of entrypoints) {
const source = fs.readFileSync(path.join(electronDir, entrypoint), 'utf8')
const bareRequires = Array.from(source.matchAll(requirePattern))
.map(match => match[1])
.filter(specifier => !specifier.startsWith('node:'))
.filter(specifier => !specifier.startsWith('.'))
.filter(specifier => !allowedBareRequires.has(specifier))
assert.deepEqual(bareRequires, [], `${entrypoint} has unpackaged runtime requires`)
}
})

View File

@@ -0,0 +1,579 @@
'use strict'
/**
* bootstrap-runner.cjs
*
* Drives apps/desktop's first-launch install of Hermes Agent by spawning
* scripts/install.ps1 stage-by-stage and streaming progress events back to
* the renderer.
*
* Wired from electron/main.cjs:
* const { runBootstrap } = require('./bootstrap-runner.cjs')
* const result = await runBootstrap({
* installStamp, // INSTALL_STAMP from main.cjs (may be null in dev)
* activeRoot, // ACTIVE_HERMES_ROOT
* sourceRepoRoot, // SOURCE_REPO_ROOT (for dev install.ps1 lookup)
* hermesHome, // HERMES_HOME
* logRoot, // HERMES_HOME/logs
* emit: ev => {...} // event sink (sender.send or similar)
* })
*
* Emits events with shape:
* { type: 'manifest', stages: [{name, title, category, needs_user_input}, ...] }
* { type: 'stage', name, state: 'running'|'succeeded'|'skipped'|'failed',
* json?, durationMs?, error? }
* { type: 'log', stage?, line } // raw line from install.ps1
* { type: 'complete', marker: <written marker payload> }
* { type: 'failed', stage?, error } // bootstrap aborted
*
* Resolves with the same shape as the final 'complete' or 'failed' event so
* callers can await either way.
*
* NOT implemented yet (deferred to Phase 1E / 1F):
* - User-facing retry / cancel from the renderer (event channels exist;
* no UI consumes them yet)
*/
const fs = require('node:fs')
const fsp = require('node:fs/promises')
const path = require('node:path')
const https = require('node:https')
const { spawn } = require('node:child_process')
const STAMP_COMMIT_RE = /^[0-9a-f]{7,40}$/i
// Stages flagged needs_user_input=true in the manifest are skipped by the
// runner (passed -NonInteractive to install.ps1, which the install script
// itself handles by emitting skipped=true frames). The renderer / 1E onboarding
// overlay takes over for those concerns (API keys, model, persona, gateway).
// We let install.ps1's own -NonInteractive logic drive this rather than
// filtering client-side -- single source of truth.
// ---------------------------------------------------------------------------
// install.ps1 source resolution
// ---------------------------------------------------------------------------
function installScriptName() {
return process.platform === 'win32' ? 'install.ps1' : 'install.sh'
}
function installScriptKind() {
return process.platform === 'win32' ? 'powershell' : 'posix'
}
function resolveLocalInstallScript(sourceRepoRoot) {
if (!sourceRepoRoot) return null
const candidate = path.join(sourceRepoRoot, 'scripts', installScriptName())
try {
fs.accessSync(candidate, fs.constants.R_OK)
return candidate
} catch {
return null
}
}
function bootstrapCacheDir(hermesHome) {
return path.join(hermesHome, 'bootstrap-cache')
}
function cachedScriptPath(hermesHome, commit) {
return path.join(bootstrapCacheDir(hermesHome), `install-${commit}.${process.platform === 'win32' ? 'ps1' : 'sh'}`)
}
function downloadInstallScript(commit, destPath) {
// Fetch from GitHub raw at the pinned commit. The raw URL with a SHA
// is immutable (unlike a branch ref), so we don't need integrity
// verification beyond "did the file we wrote pass a syntax probe."
const scriptName = installScriptName()
const url = `https://raw.githubusercontent.com/NousResearch/hermes-agent/${commit}/scripts/${scriptName}`
return new Promise((resolve, reject) => {
fs.mkdirSync(path.dirname(destPath), { recursive: true })
const tmpPath = destPath + '.tmp'
const out = fs.createWriteStream(tmpPath)
https
.get(url, res => {
if (res.statusCode === 301 || res.statusCode === 302) {
// GitHub raw shouldn't redirect for a SHA URL, but follow once
// defensively.
out.close()
fs.unlinkSync(tmpPath)
https
.get(res.headers.location, res2 => {
if (res2.statusCode !== 200) {
reject(
new Error(`Failed to download ${scriptName}: HTTP ${res2.statusCode} from redirect ${res.headers.location}`)
)
return
}
const out2 = fs.createWriteStream(tmpPath)
res2.pipe(out2)
out2.on('finish', () => {
out2.close()
fs.renameSync(tmpPath, destPath)
resolve(destPath)
})
out2.on('error', reject)
})
.on('error', reject)
return
}
if (res.statusCode !== 200) {
out.close()
try {
fs.unlinkSync(tmpPath)
} catch {}
reject(new Error(`Failed to download ${scriptName}: HTTP ${res.statusCode} from ${url}`))
return
}
res.pipe(out)
out.on('finish', () => {
out.close()
fs.renameSync(tmpPath, destPath)
resolve(destPath)
})
out.on('error', err => {
try {
fs.unlinkSync(tmpPath)
} catch {}
reject(err)
})
})
.on('error', err => {
try {
fs.unlinkSync(tmpPath)
} catch {}
reject(err)
})
})
}
async function resolveInstallScript({ installStamp, sourceRepoRoot, hermesHome, emit }) {
// 1. Dev shortcut: prefer a local checkout's installer so we can iterate
// without pushing. SOURCE_REPO_ROOT comes from main.cjs (path.resolve
// of APP_ROOT/../..).
const localScript = resolveLocalInstallScript(sourceRepoRoot)
if (localScript) {
emit({ type: 'log', line: `[bootstrap] using local ${installScriptName()} at ${localScript}` })
return { path: localScript, source: 'local', kind: installScriptKind() }
}
// 2. Packaged path: download from GitHub at the pinned commit (1B's stamp).
if (!installStamp || !installStamp.commit || !STAMP_COMMIT_RE.test(installStamp.commit)) {
throw new Error(
`Cannot resolve ${installScriptName()}: no SOURCE_REPO_ROOT and no install stamp. ` +
'This packaged build was produced without a valid build-time stamp.'
)
}
const cached = cachedScriptPath(hermesHome, installStamp.commit)
try {
await fsp.access(cached, fs.constants.R_OK)
emit({ type: 'log', line: `[bootstrap] using cached ${installScriptName()} for ${installStamp.commit.slice(0, 12)}` })
return { path: cached, source: 'cache', commit: installStamp.commit, kind: installScriptKind() }
} catch {
// not cached; download
}
emit({ type: 'log', line: `[bootstrap] fetching ${installScriptName()} for ${installStamp.commit.slice(0, 12)} from GitHub` })
await downloadInstallScript(installStamp.commit, cached)
emit({ type: 'log', line: `[bootstrap] saved to ${cached}` })
return { path: cached, source: 'download', commit: installStamp.commit, kind: installScriptKind() }
}
// ---------------------------------------------------------------------------
// powershell wrapper
// ---------------------------------------------------------------------------
function spawnPowerShell(scriptPath, args, { emit, stageName, abortSignal, hermesHome } = {}) {
return new Promise((resolve, reject) => {
const ps = process.platform === 'win32' ? 'powershell.exe' : 'pwsh'
const fullArgs = ['-NoProfile', '-ExecutionPolicy', 'Bypass', '-File', scriptPath, ...args]
const child = spawn(ps, fullArgs, {
stdio: ['ignore', 'pipe', 'pipe'],
env: {
...process.env,
// Pass HERMES_HOME through so install.ps1 respects the caller's
// choice rather than re-computing the default.
HERMES_HOME: hermesHome || process.env.HERMES_HOME || ''
}
})
let stdout = ''
let stderr = ''
let killed = false
const onAbort = () => {
killed = true
try {
child.kill('SIGTERM')
} catch {}
}
if (abortSignal) {
if (abortSignal.aborted) {
onAbort()
} else {
abortSignal.addEventListener('abort', onAbort, { once: true })
}
}
child.stdout.setEncoding('utf8')
child.stderr.setEncoding('utf8')
// Stream stdout line-by-line so the renderer sees progress in real time.
let stdoutBuf = ''
child.stdout.on('data', chunk => {
stdout += chunk
stdoutBuf += chunk
let nl
while ((nl = stdoutBuf.indexOf('\n')) !== -1) {
const line = stdoutBuf.slice(0, nl).replace(/\r$/, '')
stdoutBuf = stdoutBuf.slice(nl + 1)
if (line) emit && emit({ type: 'log', stage: stageName, line })
}
})
let stderrBuf = ''
child.stderr.on('data', chunk => {
stderr += chunk
stderrBuf += chunk
let nl
while ((nl = stderrBuf.indexOf('\n')) !== -1) {
const line = stderrBuf.slice(0, nl).replace(/\r$/, '')
stderrBuf = stderrBuf.slice(nl + 1)
if (line) emit && emit({ type: 'log', stage: stageName, line: `stderr: ${line}` })
}
})
child.on('error', err => {
if (abortSignal) abortSignal.removeEventListener('abort', onAbort)
reject(err)
})
child.on('close', (code, signal) => {
if (abortSignal) abortSignal.removeEventListener('abort', onAbort)
// Flush any trailing bytes
if (stdoutBuf) emit && emit({ type: 'log', stage: stageName, line: stdoutBuf })
if (stderrBuf) emit && emit({ type: 'log', stage: stageName, line: `stderr: ${stderrBuf}` })
resolve({ stdout, stderr, code, signal, killed })
})
})
}
function spawnBash(scriptPath, args, { emit, stageName, abortSignal, hermesHome } = {}) {
return new Promise((resolve, reject) => {
const child = spawn('bash', [scriptPath, ...args], {
stdio: ['ignore', 'pipe', 'pipe'],
env: {
...process.env,
HERMES_HOME: hermesHome || process.env.HERMES_HOME || ''
}
})
let stdout = ''
let stderr = ''
let killed = false
const onAbort = () => {
killed = true
try {
child.kill('SIGTERM')
} catch {}
}
if (abortSignal) {
if (abortSignal.aborted) {
onAbort()
} else {
abortSignal.addEventListener('abort', onAbort, { once: true })
}
}
child.stdout.setEncoding('utf8')
child.stderr.setEncoding('utf8')
let stdoutBuf = ''
child.stdout.on('data', chunk => {
stdout += chunk
stdoutBuf += chunk
let nl
while ((nl = stdoutBuf.indexOf('\n')) !== -1) {
const line = stdoutBuf.slice(0, nl).replace(/\r$/, '')
stdoutBuf = stdoutBuf.slice(nl + 1)
if (line) emit && emit({ type: 'log', stage: stageName, line })
}
})
let stderrBuf = ''
child.stderr.on('data', chunk => {
stderr += chunk
stderrBuf += chunk
let nl
while ((nl = stderrBuf.indexOf('\n')) !== -1) {
const line = stderrBuf.slice(0, nl).replace(/\r$/, '')
stderrBuf = stderrBuf.slice(nl + 1)
if (line) emit && emit({ type: 'log', stage: stageName, line: `stderr: ${line}` })
}
})
child.on('error', err => {
if (abortSignal) abortSignal.removeEventListener('abort', onAbort)
reject(err)
})
child.on('close', (code, signal) => {
if (abortSignal) abortSignal.removeEventListener('abort', onAbort)
if (stdoutBuf) emit && emit({ type: 'log', stage: stageName, line: stdoutBuf })
if (stderrBuf) emit && emit({ type: 'log', stage: stageName, line: `stderr: ${stderrBuf}` })
resolve({ stdout, stderr, code, signal, killed })
})
})
}
// ---------------------------------------------------------------------------
// Manifest + stage dispatch
// ---------------------------------------------------------------------------
// Build the install.ps1 pin args (-Commit / -Branch) from the install-stamp
// so the repository stage clones the exact SHA the .exe was tested with
// instead of falling back to install.ps1's default ($Branch = "main").
function buildPinArgs(installStamp) {
const args = []
if (installStamp && installStamp.commit) {
args.push('-Commit', installStamp.commit)
}
if (installStamp && installStamp.branch) {
args.push('-Branch', installStamp.branch)
}
return args
}
function buildPosixPinArgs({ installStamp, activeRoot, hermesHome }) {
const args = ['--dir', activeRoot, '--hermes-home', hermesHome]
if (installStamp && installStamp.branch) {
args.push('--branch', installStamp.branch)
}
if (installStamp && installStamp.commit) {
args.push('--commit', installStamp.commit)
}
return args
}
async function fetchManifest({ scriptPath, installerKind, emit, hermesHome, activeRoot, installStamp }) {
const isPosix = installerKind === 'posix'
const args = isPosix
? ['--manifest', ...buildPosixPinArgs({ installStamp, activeRoot, hermesHome })]
: ['-Manifest', ...buildPinArgs(installStamp)]
const result = await (isPosix ? spawnBash : spawnPowerShell)(scriptPath, args, {
emit,
stageName: '__manifest__',
hermesHome
})
if (result.code !== 0) {
throw new Error(`${isPosix ? 'install.sh --manifest' : 'install.ps1 -Manifest'} failed: exit ${result.code}\n${result.stderr || result.stdout}`)
}
// The manifest is the LAST JSON line on stdout (install.ps1 may print
// banner / info lines first depending on Console.OutputEncoding effects).
// Find the last line that parses as JSON with a `stages` field.
const lines = result.stdout.split(/\r?\n/).filter(Boolean)
for (let i = lines.length - 1; i >= 0; i--) {
try {
const parsed = JSON.parse(lines[i])
if (parsed && Array.isArray(parsed.stages)) {
return parsed
}
} catch {}
}
throw new Error(`${isPosix ? 'install.sh --manifest' : 'install.ps1 -Manifest'} produced no parseable JSON payload\n${result.stdout}`)
}
// Parse the JSON result frame from a stage run. The protocol guarantees
// exactly one JSON line per stage in -Json or -Stage mode (post #27224 fix
// for the double-emit bug we addressed in the install.ps1 PR).
function parseStageResult(stdout) {
const lines = stdout.split(/\r?\n/).filter(Boolean)
for (let i = lines.length - 1; i >= 0; i--) {
try {
const parsed = JSON.parse(lines[i])
if (parsed && typeof parsed.ok === 'boolean' && typeof parsed.stage === 'string') {
return parsed
}
} catch {}
}
return null
}
async function runStage({ scriptPath, installerKind, stage, emit, hermesHome, activeRoot, abortSignal, installStamp }) {
const startedAt = Date.now()
emit({ type: 'stage', name: stage.name, state: 'running' })
const isPosix = installerKind === 'posix'
const args = isPosix
? ['--stage', stage.name, '--non-interactive', '--json', ...buildPosixPinArgs({ installStamp, activeRoot, hermesHome })]
: ['-Stage', stage.name, '-NonInteractive', '-Json', ...buildPinArgs(installStamp)]
const result = await (isPosix ? spawnBash : spawnPowerShell)(
scriptPath,
args,
{ emit, stageName: stage.name, abortSignal, hermesHome }
)
const durationMs = Date.now() - startedAt
if (result.killed) {
const ev = { type: 'stage', name: stage.name, state: 'failed', durationMs, error: 'cancelled by user' }
emit(ev)
return ev
}
const json = parseStageResult(result.stdout)
if (!json) {
const ev = {
type: 'stage',
name: stage.name,
state: 'failed',
durationMs,
error: `${isPosix ? 'install.sh --stage' : 'install.ps1 -Stage'} ${stage.name} produced no JSON result frame (exit=${result.code})`,
json: null
}
emit(ev)
return ev
}
if (json.ok && json.skipped) {
const ev = { type: 'stage', name: stage.name, state: 'skipped', durationMs, json }
emit(ev)
return ev
}
if (json.ok) {
const ev = { type: 'stage', name: stage.name, state: 'succeeded', durationMs, json }
emit(ev)
return ev
}
const ev = { type: 'stage', name: stage.name, state: 'failed', durationMs, json, error: json.reason || `exit code ${result.code}` }
emit(ev)
return ev
}
// ---------------------------------------------------------------------------
// Per-run log file
// ---------------------------------------------------------------------------
function openRunLog(logRoot) {
fs.mkdirSync(logRoot, { recursive: true })
const ts = new Date().toISOString().replace(/[:.]/g, '-')
const logPath = path.join(logRoot, `bootstrap-${ts}.log`)
const stream = fs.createWriteStream(logPath, { flags: 'a' })
return { path: logPath, stream }
}
// ---------------------------------------------------------------------------
// Public entrypoint
// ---------------------------------------------------------------------------
async function runBootstrap(opts) {
const {
installStamp,
activeRoot,
sourceRepoRoot,
hermesHome,
logRoot,
onEvent,
abortSignal,
writeMarker // callback to write the bootstrap-complete marker; main.cjs provides
} = opts
const runLog = openRunLog(logRoot || path.join(hermesHome, 'logs'))
// Tee every event to the runLog AND the caller's onEvent. This gives us a
// forensic trail per bootstrap run AND lets the renderer subscribe live.
const emit = ev => {
try {
runLog.stream.write(JSON.stringify(ev) + '\n')
} catch {}
try {
if (typeof onEvent === 'function') onEvent(ev)
} catch (err) {
// Don't let a subscriber bug crash the bootstrap
runLog.stream.write(`emit error: ${err && err.message}\n`)
}
}
emit({
type: 'log',
line:
`[bootstrap] starting at ${new Date().toISOString()}; ` +
`activeRoot=${activeRoot}; ` +
`stamp=${installStamp ? installStamp.commit.slice(0, 12) : '<none>'}; ` +
`runLog=${runLog.path}`
})
try {
// 1. Resolve the platform installer.
const scriptInfo = await resolveInstallScript({ installStamp, sourceRepoRoot, hermesHome, emit })
const installerKind = scriptInfo.kind || 'powershell'
// 2. Fetch manifest
const manifest = await fetchManifest({
scriptPath: scriptInfo.path,
installerKind,
emit,
hermesHome,
activeRoot,
installStamp
})
emit({
type: 'manifest',
stages: manifest.stages,
protocolVersion: manifest.protocol_version || manifest.protocolVersion || null
})
// 3. Iterate stages in order. Stages flagged needs_user_input are still
// invoked -- install.ps1's own -NonInteractive handler in those stages
// emits skipped=true. We trust the protocol rather than filtering
// client-side.
for (const stage of manifest.stages) {
if (abortSignal && abortSignal.aborted) {
emit({ type: 'failed', error: 'bootstrap cancelled by user' })
return { ok: false, cancelled: true }
}
const ev = await runStage({
scriptPath: scriptInfo.path,
installerKind,
stage,
emit,
hermesHome,
activeRoot,
abortSignal,
installStamp
})
if (ev.state === 'failed') {
emit({ type: 'failed', stage: stage.name, error: ev.error || 'stage failed' })
return { ok: false, failedStage: stage.name, error: ev.error }
}
}
// 4. Write the bootstrap-complete marker.
const markerPayload = {
pinnedCommit: installStamp ? installStamp.commit : null,
pinnedBranch: installStamp ? installStamp.branch : null
}
const marker = typeof writeMarker === 'function' ? writeMarker(markerPayload) : markerPayload
emit({ type: 'complete', marker })
return { ok: true, marker }
} catch (err) {
emit({ type: 'failed', error: err.message || String(err) })
return { ok: false, error: err.message || String(err) }
} finally {
try {
runLog.stream.end()
} catch {}
}
}
module.exports = {
runBootstrap,
// Exposed for testability
parseStageResult,
resolveLocalInstallScript,
cachedScriptPath
}

View File

@@ -0,0 +1,12 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.cs.allow-jit</key>
<true/>
<key>com.apple.security.cs.allow-unsigned-executable-memory</key>
<true/>
<key>com.apple.security.cs.disable-library-validation</key>
<true/>
</dict>
</plist>

View File

@@ -0,0 +1,14 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>com.apple.security.cs.allow-jit</key>
<true/>
<key>com.apple.security.cs.allow-unsigned-executable-memory</key>
<true/>
<key>com.apple.security.cs.disable-library-validation</key>
<true/>
<key>com.apple.security.device.audio-input</key>
<true/>
</dict>
</plist>

View File

@@ -0,0 +1,184 @@
const fs = require('node:fs')
const path = require('node:path')
const { fileURLToPath } = require('node:url')
const DEFAULT_FETCH_TIMEOUT_MS = 15_000
const DATA_URL_READ_MAX_BYTES = 16 * 1024 * 1024
const TEXT_PREVIEW_SOURCE_MAX_BYTES = 64 * 1024 * 1024
const SAFE_ENV_SUFFIXES = new Set(['dist', 'example', 'sample', 'template'])
const SENSITIVE_EXTENSIONS = new Set(['.kdbx', '.p12', '.pem', '.pfx'])
function resolveTimeoutMs(timeoutMs, fallbackMs = DEFAULT_FETCH_TIMEOUT_MS) {
const fallback =
Number.isFinite(fallbackMs) && Number(fallbackMs) > 0 ? Math.round(Number(fallbackMs)) : DEFAULT_FETCH_TIMEOUT_MS
const parsed = Number(timeoutMs)
if (Number.isFinite(parsed) && parsed > 0) {
return Math.round(parsed)
}
return fallback
}
function encryptDesktopSecret(value, safeStorageApi) {
const raw = String(value || '')
if (!raw) {
return null
}
let encryptionAvailable = false
try {
encryptionAvailable = Boolean(safeStorageApi?.isEncryptionAvailable?.())
} catch {
encryptionAvailable = false
}
if (!encryptionAvailable) {
throw new Error(
'Secure token storage is unavailable, so Hermes Desktop cannot save remote gateway tokens. ' +
'Set HERMES_DESKTOP_REMOTE_URL and HERMES_DESKTOP_REMOTE_TOKEN in your environment, or enable OS keychain access and try again.'
)
}
try {
return {
encoding: 'safeStorage',
value: safeStorageApi.encryptString(raw).toString('base64')
}
} catch (error) {
const detail = error instanceof Error && error.message ? ` (${error.message})` : ''
throw new Error(
`Failed to encrypt the remote gateway token for secure storage${detail}. ` +
'Set HERMES_DESKTOP_REMOTE_URL and HERMES_DESKTOP_REMOTE_TOKEN in your environment as a fallback.'
)
}
}
function sensitiveFileBlockReason(filePath) {
const normalized = String(filePath || '')
.replace(/\\/g, '/')
.toLowerCase()
const basename = path.basename(normalized)
const ext = path.extname(basename)
if (!basename) {
return null
}
if (normalized.includes('/.ssh/')) {
return 'SSH key/config files are blocked.'
}
if (normalized.includes('/.gnupg/')) {
return 'GPG key material is blocked.'
}
if (normalized.endsWith('/.aws/credentials')) {
return 'AWS credential files are blocked.'
}
if (basename === '.env') {
return '.env files are blocked because they commonly contain secrets.'
}
if (basename.startsWith('.env.')) {
const suffix = basename.slice('.env.'.length)
if (!SAFE_ENV_SUFFIXES.has(suffix)) {
return `${basename} is blocked because it appears to contain environment secrets.`
}
}
if (/^id_(rsa|dsa|ecdsa|ed25519)(?:\..+)?$/.test(basename) && !basename.endsWith('.pub')) {
return 'SSH private key files are blocked.'
}
if (SENSITIVE_EXTENSIONS.has(ext)) {
return `${ext} key/certificate files are blocked.`
}
if (basename === '.npmrc' || basename === '.netrc' || basename === '.pypirc') {
return `${basename} is blocked because it may include auth credentials.`
}
return null
}
function resolveRequestedFilePath(filePath, baseDir = process.cwd(), purpose = 'File read') {
const raw = String(filePath || '').trim()
if (!raw) {
throw new Error(`${purpose} failed: file path is required.`)
}
if (raw.includes('\0')) {
throw new Error(`${purpose} failed: file path is invalid.`)
}
if (/^file:/i.test(raw)) {
try {
return fileURLToPath(raw)
} catch {
throw new Error(`${purpose} failed: file URL is invalid.`)
}
}
const resolvedBase = path.resolve(String(baseDir || process.cwd()))
return path.resolve(resolvedBase, raw)
}
async function resolveReadableFileForIpc(filePath, options = {}) {
const purpose = String(options.purpose || 'File read')
const resolvedPath = resolveRequestedFilePath(filePath, options.baseDir, purpose)
if (options.blockSensitive !== false) {
const blockReason = sensitiveFileBlockReason(resolvedPath)
if (blockReason) {
throw new Error(`${purpose} blocked for sensitive file: ${blockReason}`)
}
}
let stat
try {
stat = await fs.promises.stat(resolvedPath)
} catch (error) {
const code = error && typeof error === 'object' ? error.code : ''
if (code === 'ENOENT' || code === 'ENOTDIR') {
throw new Error(`${purpose} failed: file does not exist.`)
}
throw new Error(`${purpose} failed: ${error instanceof Error ? error.message : String(error)}`)
}
if (stat.isDirectory()) {
throw new Error(`${purpose} failed: path points to a directory.`)
}
if (!stat.isFile()) {
throw new Error(`${purpose} failed: only regular files can be read.`)
}
const maxBytes = Number.isFinite(options.maxBytes) && Number(options.maxBytes) > 0 ? Number(options.maxBytes) : null
if (maxBytes && stat.size > maxBytes) {
throw new Error(`${purpose} failed: file is too large (${stat.size} bytes; limit ${maxBytes} bytes).`)
}
try {
await fs.promises.access(resolvedPath, fs.constants.R_OK)
} catch {
throw new Error(`${purpose} failed: file is not readable.`)
}
return { resolvedPath, stat }
}
module.exports = {
DATA_URL_READ_MAX_BYTES,
DEFAULT_FETCH_TIMEOUT_MS,
TEXT_PREVIEW_SOURCE_MAX_BYTES,
encryptDesktopSecret,
resolveReadableFileForIpc,
resolveTimeoutMs,
sensitiveFileBlockReason
}

View File

@@ -0,0 +1,116 @@
const assert = require('node:assert/strict')
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const test = require('node:test')
const { pathToFileURL } = require('node:url')
const {
DEFAULT_FETCH_TIMEOUT_MS,
encryptDesktopSecret,
resolveReadableFileForIpc,
resolveTimeoutMs,
sensitiveFileBlockReason
} = require('./hardening.cjs')
test('resolveTimeoutMs falls back to defaults and accepts overrides', () => {
assert.equal(resolveTimeoutMs(undefined), DEFAULT_FETCH_TIMEOUT_MS)
assert.equal(resolveTimeoutMs(0), DEFAULT_FETCH_TIMEOUT_MS)
assert.equal(resolveTimeoutMs(-25), DEFAULT_FETCH_TIMEOUT_MS)
assert.equal(resolveTimeoutMs('2750'), 2750)
})
test('encryptDesktopSecret requires available secure storage', () => {
assert.equal(
encryptDesktopSecret('', { isEncryptionAvailable: () => true, encryptString: () => Buffer.alloc(0) }),
null
)
assert.throws(
() => encryptDesktopSecret('token', { isEncryptionAvailable: () => false, encryptString: () => Buffer.alloc(0) }),
/Secure token storage is unavailable/
)
})
test('encryptDesktopSecret stores safeStorage base64 payload', () => {
const secret = encryptDesktopSecret('token-123', {
isEncryptionAvailable: () => true,
encryptString: value => Buffer.from(`enc:${value}`, 'utf8')
})
assert.deepEqual(secret, {
encoding: 'safeStorage',
value: Buffer.from('enc:token-123', 'utf8').toString('base64')
})
})
test('sensitiveFileBlockReason blocks obvious secret file patterns', () => {
assert.match(String(sensitiveFileBlockReason('/tmp/.env')), /\.env/)
assert.equal(sensitiveFileBlockReason('/tmp/.env.example'), null)
assert.match(String(sensitiveFileBlockReason('/Users/me/.ssh/id_ed25519')), /SSH/)
assert.match(String(sensitiveFileBlockReason('/tmp/server-cert.pem')), /\.pem/)
})
test('resolveReadableFileForIpc validates existence type size and sensitivity', async t => {
const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-desktop-hardening-'))
t.after(() => fs.rmSync(tempDir, { recursive: true, force: true }))
const textPath = path.join(tempDir, 'notes.txt')
fs.writeFileSync(textPath, 'hello world', 'utf8')
const fromRelative = await resolveReadableFileForIpc('notes.txt', {
baseDir: tempDir,
maxBytes: 256,
purpose: 'File preview'
})
assert.equal(fromRelative.resolvedPath, textPath)
assert.equal(fromRelative.stat.size, 11)
const fromFileUrl = await resolveReadableFileForIpc(pathToFileURL(textPath).toString(), {
purpose: 'File preview'
})
assert.equal(fromFileUrl.resolvedPath, textPath)
await assert.rejects(
resolveReadableFileForIpc('missing.txt', {
baseDir: tempDir,
purpose: 'Text preview'
}),
/file does not exist/
)
const nestedDir = path.join(tempDir, 'directory')
fs.mkdirSync(nestedDir)
await assert.rejects(
resolveReadableFileForIpc(nestedDir, {
purpose: 'Text preview'
}),
/path points to a directory/
)
const largePath = path.join(tempDir, 'large.txt')
fs.writeFileSync(largePath, 'x'.repeat(40), 'utf8')
await assert.rejects(
resolveReadableFileForIpc(largePath, {
maxBytes: 8,
purpose: 'File preview'
}),
/file is too large/
)
const envPath = path.join(tempDir, '.env')
fs.writeFileSync(envPath, 'SECRET_TOKEN=123', 'utf8')
await assert.rejects(
resolveReadableFileForIpc(envPath, {
purpose: 'File preview'
}),
/blocked for sensitive file/
)
const envTemplatePath = path.join(tempDir, '.env.example')
fs.writeFileSync(envTemplatePath, 'EXAMPLE_TOKEN=value', 'utf8')
const envTemplate = await resolveReadableFileForIpc(envTemplatePath, {
purpose: 'File preview'
})
assert.equal(envTemplate.resolvedPath, envTemplatePath)
})

File diff suppressed because it is too large Load Diff

Some files were not shown because too many files have changed in this diff Show More