Compare commits

...

450 Commits

Author SHA1 Message Date
Austin Pickett
fd324562d3 feat(desktop): add context usage breakdown popover
Let users click the status bar context indicator to see how tokens are
split across system prompt, tools, rules, skills, MCP, and conversation.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-29 09:18:10 -04:00
HexLab98
f1345290ed test(auxiliary): cover NVIDIA NIM max_tokens in _build_call_kwargs 2026-06-29 18:04:39 +05:30
HexLab98
88e6f9b98c fix(auxiliary): preserve max_tokens for NVIDIA NIM aux calls
NVIDIA integrate.api.nvidia.com models such as minimaxai/minimax-m3 can
return HTTP 200 with empty choices when max_tokens is omitted. Keep the
output cap on auxiliary chat-completions routes, matching the main NVIDIA
provider profile behavior.
2026-06-29 18:04:39 +05:30
Ben Barclay
f53ba9bb54 fix(s6): dot-prefix gateway staging dir so svscan ignores it mid-build (#54834)
The register path builds each profile-gateway slot in a sibling staging
dir under /run/service (the scandir s6-svscan watches), then atomically
renames it to the live gateway-<profile> name. The staging dir was named
gateway-<profile>.tmp — a NON-dotfile — so a concurrent `s6-svscanctl -a`
rescan (fired by the cont-init reconciler registering gateway-default, or
by a sibling register) would supervise the half-built slot the moment it
had a valid type/run: s6-supervise spawns AS ROOT and mkdirs supervise/
root-owned 0700, then the in-flight _seed_supervise_skeleton early-returns
on the now-existing supervise/ and the next `mkdir supervise/event` hits
PermissionError.

That is the arm64-only CI flake on
test_s6_unregister_removes_service_dir_in_live_container
(PermissionError: /run/service/gateway-phase3test.tmp/supervise/event) —
arm64-only because the native-arm runner's wider scheduling jitter lets
the rescan land inside the ~ms seed window; amd64 ran 30/30 clean.

Fix: dot-prefix the staging dir (.gateway-<profile>.tmp) in both register
paths (S6ServiceManager.register_profile_gateway and
container_boot._register_service). s6-svscan skips any scandir entry whose
name begins with '.', so the half-built slot can never be supervised
mid-build. The atomic rename to the dotless live name is unchanged.

Verified on a real s6 image (amd64): a non-dotted staging dir is picked up
by an svscanctl -a rescan (SUPERVISED owner=root) while a dot-prefixed one
is ignored (NOT-SUPERVISED). Added a docker-harness regression test that
asserts both, plus a unit test that the staging dir is dot-prefixed.
2026-06-29 21:33:00 +10:00
Teknium
dbad6d47d3 fix(gateway): also neutralize untrusted Matrix room name in prompt
Widen #5961's _format_untrusted_prompt_value coverage to the Matrix
room display name (**Matrix Room:**), a sibling attacker-controllable
field the original fix missed. chat_name is user-settable, so an
injected room name could render as literal markdown in the system
prompt. Adds a regression test.
2026-06-29 04:25:51 -07:00
Xowiek
09666ceb76 fix(gateway): neutralize untrusted session metadata in prompts 2026-06-29 04:25:51 -07:00
teknium1
ea1372d2af fix(security): wire session-id sanitizer into artifact paths + API boundary
Defense-in-depth on top of _safe_session_filename_component (#5958):

Sink (makes the bad write impossible regardless of entry point):
- run_agent._save_session_log: sanitize session_id before building the
  session_{sid}.json snapshot path.
- agent_runtime_helpers.dump_api_request_debug: sanitize before building
  the request_dump_{sid}_{ts}.json path.

Boundary (clean 400 instead of a silently-hashed filename):
- api_server rejects path-traversal-shaped X-Hermes-Session-Id on the
  session-continuation path and the explicit /api/sessions create path,
  reusing gateway.session._is_path_unsafe (mirrors the native gateway's
  entry-boundary guard). Also enforces the session-header length cap on
  the continuation path.

Tests: traversal session_id stays contained at the write site; sanitizer
always yields a traversal-free segment; the API header rejects
../, absolute, and Windows-traversal IDs with 400.
2026-06-29 04:25:45 -07:00
Xowiek
1debd5e8f9 fix(security): add session-id filename sanitizer to prevent path traversal
Session IDs can originate from untrusted input (e.g. the
X-Hermes-Session-Id API header) and are interpolated raw into on-disk
artifact filenames under ~/.hermes/sessions/. A traversal-shaped ID
(../../../../etc/pwned) would let a caller write the session snapshot
or request dump outside the sessions directory.

_safe_session_filename_component() collapses every non [A-Za-z0-9_-]
character to _, caps the length, and appends a short content hash when
sanitization changed the string, always yielding a single traversal-free
path segment.

Closes #5958.
2026-06-29 04:25:45 -07:00
teknium1
cdd8e0a271 test(gateway): exercise last_prompt_tokens in reset-activity tests
The reset-had-activity tests set total_tokens (dead state) to simulate
activity; production records activity via last_prompt_tokens. Update
the fixtures to match the field the fix and runtime actually use.
2026-06-29 04:25:37 -07:00
Mibayy
0fe9755016 fix(gateway): use last_prompt_tokens for session-reset activity check
reset_had_activity gated on entry.total_tokens, which is never written
(token counts migrated to agent-direct persistence) so it was always 0.
That suppressed session-reset notifications for sessions that genuinely
had activity. Switch to last_prompt_tokens, which is updated on every
turn.
2026-06-29 04:25:37 -07:00
Mibayy
9e490138a0 fix(security): fail-closed feishu webhook rate limiter + whatsapp bridge path guard
Salvages the two still-valid hardenings from #5381 onto the relocated
plugin adapters (the discord/feishu/whatsapp adapters moved to
plugins/platforms/ since the PR was opened, and 4 of its 6 hunks are
already on main or superseded).

- feishu: rate limiter now denies untracked keys when the tracking table
  is at capacity after pruning stale entries (was: allow through without
  tracking). At-capacity-with-all-fresh-entries only happens under abuse,
  so allowing untracked requests let an attacker who flooded the table
  bypass the limiter entirely. Already-tracked keys and post-prune room
  are unaffected.
- whatsapp: absolute file paths handed back by the Baileys bridge are now
  validated to resolve inside a known media cache dir before being
  attached. A compromised/buggy bridge could otherwise return an
  arbitrary path (e.g. /etc/passwd) that would be sent verbatim to the
  model. Guard resolves symlinks and accepts both the canonical
  cache/<kind> and legacy <kind>_cache layouts.
2026-06-29 04:25:31 -07:00
Ruzzgar
576424cc1c fix(security): redact browser CDP endpoint logs 2026-06-29 04:25:26 -07:00
teknium1
23c03ced75 fix(session-db): enrich NULL session metadata via upsert instead of INSERT OR IGNORE
The gateway's get_or_create_session() creates a bare session row (source +
user_id) before the agent exists. The agent's later create_session() carries
the real model/model_config/system_prompt, but _insert_session_row used
INSERT OR IGNORE — silently dropping that enrichment. Gateway sessions were
left with NULL model and NULL billing metadata.

Switch to INSERT ... ON CONFLICT(id) DO UPDATE with COALESCE so NULL columns
get backfilled while values an earlier writer already set are never
overwritten (a later bare write with source='unknown' can't clobber a real
source/model). Credit: original report and fix direction by @LucidPaths (#5048).
2026-06-29 04:25:23 -07:00
teknium1
61f56d27db refactor(dashboard-auth): drop redundant _interactive_providers helper
list_session_providers() already filters on supports_session=True, so the
new helper re-filtered an already-filtered list. Call it directly at the
single auto-SSO call site.
2026-06-29 04:25:18 -07:00
Ben
f5ecbe1ec6 feat(dashboard): auto-initiate portal SSO redirect on unauthenticated load
When the dashboard gateway has no local session cookie, it rendered a
click-through /login interstitial — even though the Nous portal's
/oauth/authorize auto-approves any current member of the dashboard's org
and is a silent 302 when the user already holds a portal session. For the
common case (clicking a hosted-agent dashboard link while signed in to the
portal) that interstitial click is pure friction.

This makes the gate auto-initiate the OAuth redirect on an unauthenticated
HTML document load instead of rendering the interstitial, when exactly one
interactive provider is registered. A one-shot loop-guard cookie
(hermes_sso_attempt, 60s TTL) ensures that a genuinely absent portal
session (the portal bounces back still-unauthenticated) falls back to the
/login page after exactly one bounce rather than ping-ponging forever. The
marker is cleared on a successful callback and whenever the gate falls back
to /login.

Security: this removes a human CLICK, not a security check. The redirect
lands on the existing /auth/login route and runs the unchanged PKCE
auth-code flow; token verification, audience checks, redirect-URI match,
and org-membership checks are all untouched. /api/* fetches still get the
401 JSON envelope (never a 302 a fetch() would follow opaquely), and with
two or more providers the /login chooser still renders.

Phase 1 of the cloud-auto-discovery work.
2026-06-29 04:25:18 -07:00
Teknium
dc5ef20d89 test(reasoning-floor): isolate stale-timeout floor tests from config-module reload races (#54775)
The five _resolved_api_call_stale_timeout_base integration tests reloaded
hermes_cli.config + hermes_cli.timeouts via importlib.reload to clear cached
config. Under xdist that mutates module-global state shared across the worker
process, so a sibling test could leave the config cache in a state that made
get_provider_stale_timeout return a leaked value — intermittently failing
test_reasoning_floor_applies_to_opus_4_thinking (shard 6 flake, #52217 area).

Patch run_agent.get_provider_stale_timeout per-test instead: floor-path tests
get None (resolver falls through to the reasoning floor / env var / default),
the explicit-config test gets 60.0 (priority-1 short-circuit). Same assertions,
no shared-module mutation, deterministic under parallel execution.
2026-06-29 02:42:54 -07:00
sgaofen
194bff0687 fix(gateway): confirm final delivery before suppressing send
Fixes #14238. During a compression/session split at the response
boundary, the interim callback delivered unrelated commentary, setting
response_previewed=True. The suppression logic treated that as proof the
final reply had been delivered and skipped the normal send — the response
was persisted to the child session but never sent to chat.

Only suppress the normal final send when the stream consumer confirms
final delivery (final_response_sent / final_content_delivered) or the
exact final response text was delivered as a preview.
2026-06-29 02:37:11 -07:00
teknium1
fa3dba4b30 docs(infographic): add list_profiles perf-fix infographic 2026-06-29 02:35:57 -07:00
teknium1
10c9eafde2 chore(attribution): map mango001@126.com -> max-chen for salvaged #51194 2026-06-29 02:35:57 -07:00
Sahil-SS9
1bb7b59c5d fix: offload blocking profiles endpoints from asyncio event loop (#54523)
(cherry picked from commit 09f10e2b77)
2026-06-29 02:35:57 -07:00
chenxiang
d5eee133eb perf(profiles): fix list_profiles O(N*M) wrapper rescan (6.4s -> 0.4s)
find_alias_for_profile re-scanned the whole wrapper dir (~/.local/bin) and
read_text every file for EACH profile — including large unrelated binaries
(ffmpeg etc.) read 15x over. With 16 profiles this took ~6.4s, long enough
that the desktop's per-request backend calls timed out (15s) and the sidebar
rendered '全部智能体 0 / 会话 0'.

- Add build_alias_map(): single-pass {profile -> alias} reverse map, reads
  only an 8KB head slice per wrapper, skips binaries via UnicodeDecodeError.
- find_alias_for_profile now delegates to it (behavior preserved).
- Cache _count_skills by skills-dir mtime signature (+30s TTL).

list_profiles: 6.37s -> 0.84s cold / 0.44s warm. 138 profile tests pass.

(cherry picked from commit 89e593749a)
2026-06-29 02:35:57 -07:00
teknium1
2f5950a83a chore(release): add telos-oc to AUTHOR_MAP for PR #14353 salvage 2026-06-29 02:25:48 -07:00
Telos
fa11b11cf5 fix: propagate key_env from custom_providers into ProviderDef
resolve_custom_provider() previously returned api_key_env_vars=()
for every custom provider entry, silently dropping the configured
key_env field. This caused 401 errors for any custom provider that
required an API key via environment variable (e.g. Xiaomi MiMo Token
Plan, self-hosted OpenAI-compatible servers).

The key_env field is already documented in _VALID_CUSTOM_PROVIDER_FIELDS
and normalized by normalize_custom_provider_entry(), so this was just
an oversight in the ProviderDef construction.

Also adds a regression test that verifies key_env is properly
propagated into the resolved ProviderDef.
2026-06-29 02:25:48 -07:00
teknium1
9f97915163 fix(browser): route open-timeout base through _safe_command_timeout
Wire the salvaged _safe_command_timeout() guard into the surviving
open-timeout call site. _get_open_command_timeout() feeds the
browser_navigate 'open' path; this closes the last call site that
could observe a None timeout from a torn cache (#14331), since the
original PR's max(_get_command_timeout(), 60) site no longer exists
on main (now routed through _get_open_command_timeout).
2026-06-29 02:24:57 -07:00
Sanjay Santhanam
c79e6bceae fix(browser_tool): resolve race in _get_command_timeout cache returning None (#14331)
# Conflicts:
#	tools/browser_tool.py
2026-06-29 02:24:57 -07:00
Teknium
bf0d8fed8e fix(config): v32 migration flips baked-in verify_on_stop=true to false (#54740)
The first ship of verify-on-stop (config v30) defaulted
DEFAULT_CONFIG agent.verify_on_stop to a literal True, and migrate_config
persists defaults with strip_defaults=False — so every install that updated
through v30 had verify_on_stop: true written into config.yaml as a literal.

The v30->v31 migration only flipped missing/'auto' values to false and
deliberately preserved an explicit bool, so it skipped that entire population
and left verify-on-stop ON for everyone who had updated. A literal true was
never a user choice: the feature had no off-switch worth setting it against
until v31 introduced one, so a true persisted before v32 is always the old
machine default.

v32 migration flips a literal true -> false once, for both v30 (skipped v31)
and v31 (preserved-by-bug) installs. A true the user sets AFTER v32 is a
deliberate opt-in and is never touched.
2026-06-29 01:51:08 -07:00
teknium1
75317d82d0 fix(vision): narrow the fan-out cap to the CPU encode burst only
The original cap held a process-global slot across the WHOLE vision
analysis (image load + encode + LLM call) with a default of min(CPUs, 4).
That serialized legitimate multi-image workflows — "compare these 6
screenshots", "read this 10-page scan", "analyze every frame" — behind a
4-wide gate, and on the native fast path it even throttled calls that make
no LLM request at all. Excess calls queued (blocking acquire, nothing
dropped), but the latency hit on real fan-out was the wrong tradeoff.

The incident was CPU exhaustion, not call count: concurrent base64/resize
bursts saturated every core and left none to service the shared event loop
serving /api/status. So cap ONLY that:

- A dedicated, bounded ThreadPoolExecutor (_vision_cpu_executor) runs the
  encode/resize/dimension-check off the caller's loop, sized to the host's
  usable core count with NO fixed ceiling — the cap tracks the actual
  exhausted resource (cores), not a magic number. Excess encodes queue on
  the executor; cores stay free for the loop.
- The LLM call is deliberately OUTSIDE the executor, so multi-image
  workflows keep full request concurrency.
- Override via auxiliary.vision.max_concurrency / HERMES_VISION_MAX_CONCURRENCY
  (honored verbatim, including above core count); sub-1 ignored.
- _vision_concurrency_slot() is now a no-op shim for back-compat.

Tests assert: resolver defaults to host cores with no ceiling; env/config
override (incl. above cores); sub-1 rejection; the executor is dedicated and
core-sized; encode runs on a vision-encode thread; and crucially that encode
bursts are bounded to the cap while the analyses themselves stay fully
concurrent (calls_peak > cap).
2026-06-29 01:27:10 -07:00
Ben Barclay
eddfecd2ce fix(vision): cap vision_analyze fan-out concurrency process-wide
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.

Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).

It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.

Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency,
or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can
never be disabled into an unbounded fan-out.

Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
2026-06-29 01:27:10 -07:00
teknium1
115e78c377 test(camofox): accept headers= kwarg in persistence test mocks
The auth-header fix adds headers=_auth_headers() to all Camofox HTTP
calls. Two _capture_post mocks in the persistence test lacked a headers
parameter, so navigate raised TypeError and the success assertions
failed. Add headers=None to both mock signatures.
2026-06-29 01:26:24 -07:00
teknium1
41095fdb04 fix(camofox): register CAMOFOX_API_KEY in OPTIONAL_ENV_VARS
The auth-header fix reads CAMOFOX_API_KEY but it was never registered,
so it didn't surface in `hermes setup` / `hermes tools`. Add it as an
advanced password-category tool env var alongside CAMOFOX_URL.
2026-06-29 01:26:24 -07:00
kaishi00
08d6195bc4 fix(camofox): auto-recover from stale tab 404 on navigate
When a Camofox browser tab is garbage collected (idle timeout, browser
recycle), the held tab_id becomes stale. The next browser_navigate call
hits /tabs/{stale_id}/navigate -> HTTP 404 -> unhandled HTTPError.

Catch the 404 in camofox_navigate, clear the stale tab_id, and create a
fresh tab via _ensure_tab. The agent recovers transparently without
requiring a session restart.

Other tab operations (snapshot, click, type, etc.) use the same pattern
but only fail if the tab dies between successful calls — much rarer.
The navigate fix covers 95%+ of cases since navigate is always the entry
point.
2026-06-29 01:26:24 -07:00
liuhao1024
fe38d50833 fix(tools): read browser.command_timeout in Camofox HTTP client
The Camofox browser backend hardcoded a 30s HTTP timeout via
_DEFAULT_TIMEOUT, ignoring the user's browser.command_timeout config.
The main browser_tool path already reads this config via
_get_command_timeout().

This commit adds an equivalent _get_command_timeout() to
browser_camofox.py that reads browser.command_timeout from config
with caching, and switches all HTTP helper methods (_post, _get,
_get_raw, _delete) to use it as the default timeout.

Fixes #40843
2026-06-29 01:26:24 -07:00
刘昊
babd9168ba fix(browser): send Authorization header in Camofox HTTP calls when CAMOFOX_API_KEY is set
The five HTTP call sites in browser_camofox.py (_ensure_tab, _post,
_get, _get_raw, _delete) did not include Authorization headers, causing
403 Forbidden when the Camofox server has API key auth enabled.

Added _auth_headers() helper and wired it into all five call sites.
The health check endpoint (/health) is left without auth since it is
a connectivity probe, not a browser operation.

Regression test covers: header present when key set, absent when unset,
blank key produces empty headers.

Fixes #20476
2026-06-29 01:26:24 -07:00
liuhao1024
270456308c fix(tools): send listItemId instead of sessionKey in Camofox tab creation
The Camoufox REST API server expects `listItemId` in the `POST /tabs`
body, but `_ensure_tab` was sending `sessionKey`.  This caused a 400
Bad Request on every `browser_navigate` call.

The parameter name mismatch is visible in the same file: line 283
already reads `tab.get("listItemId")` when adopting existing tabs,
confirming the server-side field name.

Fixes #37960
2026-06-29 01:26:24 -07:00
teknium1
34e616e778 feat(slack): nudge stale installs to add mpim scopes; mark message.mpim required
Follow-up to the group-DM manifest fix. The manifest change only helps
NEW installs; existing apps keep their old (mpim-less) scopes until the
admin reinstalls. Since a missing message.mpim event delivers nothing
(no runtime API error to catch), detect stale installs at connect time
from the auth.test x-oauth-scopes header and log an actionable reinstall
nudge when im:history is granted but mpim:history is not. Also promote
message.mpim from Recommended to Required in the docs event tables so the
default setup path can't drop it.
2026-06-29 01:02:53 -07:00
Ben
4125cc3b7c fix(slack): subscribe to message.mpim + mpim scopes so group DMs work
Group DMs (multi-person DMs, channel_type=mpim) were never delivered to
the Slack bot. The adapter already classifies mpim as a DM and replies
ambiently (adapter.py:2526, is_dm = channel_type in {im, mpim}), but the
generated app manifest only subscribed to message.im / im:history — the
1:1 DM pair. Without the message.mpim event subscription Slack drops
group-DM messages before the adapter ever sees them, so 1:1 DMs worked
while group-DM ambient mode was dead.

Add message.mpim to bot_events and mpim:history (the scope that event
requires per Slack docs) + mpim:read (mirrors im:read for the
conversations.info classification call) to bot_scopes. Update the
SLACK_BOT_TOKEN / SLACK_APP_TOKEN setup-help strings and the Slack docs
(EN + zh-Hans: scope table, event table, troubleshooting) so existing
installs are told to add the new scopes and reinstall.

Reported by an enterprise customer. Note: this is a manifest/scope
change, so it only takes effect after the app is reinstalled and the
new scopes are accepted.

Tests: assert message.mpim + mpim:history + mpim:read are in the
manifest (with and without assistant mode); both fail on current main
and pass with this change.
2026-06-29 01:02:53 -07:00
Teknium
29f0968275 test(windows): harden pid-scan no-window assertion against captured-call leakage (#54707)
test_gateway_pid_scan_hides_wmic_and_powershell_windows flaked once in CI
(slice 7/8) with 'KeyError: creationflags' while passing 15/15 under exact
CI-parity locally. The positional 'kwargs["creationflags"]' indexing raises
a bare KeyError the moment any stray subprocess.run call is captured, masking
the real contract. Filter captured calls to the two intended Windows console
spawns (wmic + PowerShell fallback) and assert each is windowless via
.get('creationflags'); a leaked/extra call now surfaces as a readable
len-mismatch with the full captured list, not a cryptic KeyError.
2026-06-29 01:01:29 -07:00
Teknium
0434a9a5ec chore: regenerate uv.lock for supermemory + mem0 extras 2026-06-29 00:25:36 -07:00
Ben Barclay
1289f12812 fix(memory): lazy-install supermemory + mem0 SDKs like honcho/hindsight
The supermemory and mem0 memory providers shipped third-party SDKs
(supermemory / mem0ai) that are not core dependencies, but — unlike the
honcho and hindsight providers — they imported those SDKs directly with
no tools.lazy_deps.ensure() preflight and had no LAZY_DEPS allowlist
entry. On the published Docker image the agent venv is sealed
(HERMES_DISABLE_LAZY_INSTALLS=1) and lazy installs are redirected to a
writable durable target (HERMES_LAZY_INSTALL_TARGET). honcho/hindsight
route through ensure() and install fine there; supermemory/mem0 never
called it, so their SDK was never installed on a hosted instance and the
provider silently reported itself unavailable even with the API key set.

Fixes:
- Add memory.supermemory + memory.mem0 to the LAZY_DEPS allowlist
  (tools/lazy_deps.py), pinned to current PyPI releases.
- Call ensure('memory.<x>', prompt=False) at each SDK-import chokepoint
  (_SupermemoryClient.__init__; Mem0MemoryProvider._create_backend),
  mirroring honcho's wrapped try/except shape.
- Drop the SDK-import gate from supermemory's is_available() — it was a
  chicken-and-egg trap (provider never loaded on a sealed venv, so
  ensure() never ran). Now key-presence only, like honcho/mem0.
- Add matching pyproject extras [supermemory]/[mem0]; update the
  lazy-covered-extras contract test (excluded from [all] by policy).

Tests prove each path fails without the fix and the real sealed-venv
durable-target gate accepts both features.
2026-06-29 00:25:36 -07:00
teknium1
f860492842 test(desktop): match multiline spawn(ps, fullArgs) via regex like sibling sites
The bootstrap-runner PowerShell spawn is formatted multiline (spawn(\n  ps,\n  fullArgs,...), so the literal substring 'spawn(ps, fullArgs' never matched and the assertion was failing on main independent of #54635. Convert it to a whitespace-tolerant regex like every other call-site assertion in this file.
2026-06-28 23:59:32 -07:00
emozilla
aa2ae36c3f fix(desktop): launch Windows backend as console python so child consoles are inherited, not flashed
The recurring Windows desktop console-flash bug (#54220) is governed by the
*parent's* console, not by each child spawn. The desktop backend was launched as
GUI-subsystem pythonw.exe, which has no console at all — so every
console-subsystem child it spawns (git, gh, cmd, wmic, powershell, ...) had to
allocate its own console, flashing a window. That is why the fix had become an
endless per-call-site sweep of CREATE_NO_WINDOW flags: each leaf spawn was
papering over a missing console on the root.

Launch the backend as the venv's console python.exe instead. Under the existing
hiddenWindowsChildOptions() wrapper (windowsHide: true -> CREATE_NO_WINDOW) the
backend owns a single *windowless* console, and every descendant spawn inherits
it instead of allocating a visible one. This makes "no flashing windows" a
property of the one backend launch rather than a flag that must be remembered at
every spawn site — including spawns inside third-party libraries that no
call-site sweep can reach.

Verified on Windows 11 25H2 (Windows Terminal default): with the per-site hide
flag forcibly neutered, the canonical culprits (git/gh/cmd/wmic/powershell)
spawned naively and none flashed, while the same naive spawn from the old
console-less pythonw parent did flash — isolating the parent console as the cause.

Two premises behind the old pythonw approach did not hold up on current Windows
and are dropped here:
- The venv Scripts\python.exe uv shim, under CREATE_NO_WINDOW, re-execs base
  python *windowless* — it does not flash a conhost (the #52239 concern), so the
  base-pythonw detour is unnecessary.
- Console python restores stdout, so the backend announces its port on the normal
  HERMES_DASHBOARD_READY stdout line; the pythonw-only ready-file side channel is
  no longer needed and the readyFile opt-in is removed.

Removes the now-dead pythonw machinery (getNoConsoleVenvPython, toNoConsolePython,
applyWindowsNoConsoleSpawnHints, readVenvHome) and updates the test to assert the
new invariant: backend command is never pythonw, both backend spawns still go
through hiddenWindowsChildOptions, and no backend opts into the ready-file path.

Scope: this fixes the high-frequency backend-descendant flash classes. The
updater/UAC handoff (#54543) and embedded-terminal PTY accumulation (#53555)
classes have separate root causes and are unaffected.
2026-06-28 23:59:32 -07:00
Ben
20b03d9aee i18n: add Custom Keys strings to all locale files
The env translation block is type-checked across every locale (tsc -b), so
the 8 new customKeys strings must exist in all of them, not just en/zh. Add
translated entries to the remaining 14 locales (de, es, fr, it, ja, ko, pt,
ru, tr, uk, hu, ga, af, zh-hant).
2026-06-28 22:53:56 -07:00
Ben
1c75e7c9d8 feat(dashboard): list & add arbitrary custom .env keys on the Keys page
The Keys page only rendered env vars present in a catalog (OPTIONAL_ENV_VARS
or the provider catalog); any other key a user set in .env was invisible, and
there was no way to add an arbitrary env var from the GUI (e.g. to inject a
var a skill or MCP server needs).

Backend: GET /api/env now also emits a row for every on-disk .env key that
isn't in any catalog, flagged category="custom" + custom=true and
password-masked (an unrecognised key could hold anything, so it's redacted and
reveal-gated like any secret). Channel-managed credentials stay excluded. The
write (PUT /api/env) and reveal (POST /api/env/reveal) paths already handle
arbitrary keys, with the existing env-name guard + denylist (PATH, LD_PRELOAD,
PYTHONPATH, …) enforced server-side — no new write surface.

Frontend: a new "Custom Keys" section lists those custom rows and carries an
add-a-key form (client-side name validation mirroring the backend regex; the
new row reuses the normal edit/save flow, so on save it round-trips back from
the backend as a durable custom row). i18n added for en + zh + types.

Tests: behavior-contract coverage that an unknown .env key surfaces as a
masked custom row and a catalogued key does not — verified to fail on the
pre-fix backend.
2026-06-28 22:53:56 -07:00
HexLab98
23f245eda5 test(vision): cover Ollama /api/show vision capability routing (#54511) 2026-06-28 22:52:59 -07:00
HexLab98
d7e573e54d fix(vision): detect Ollama vision models via /api/show (#54511)
When local Ollama models are absent from models.dev, probe the Ollama
server's /api/show capabilities so attached images are routed natively
instead of being stripped as non-vision input.
2026-06-28 22:52:59 -07:00
sgaofen
b481348fbc fix(agent): stream copilot ACP chat completions 2026-06-28 22:52:51 -07:00
sgaofen
0106082d1f fix(agent): return OpenAI-shaped copilot ACP tool calls 2026-06-28 22:52:51 -07:00
sgaofen
032d702140 fix(agent): omit stream_options for native Gemini streaming
Google's native Gemini REST endpoint (generativelanguage.googleapis.com,
non-/openai) rejects OpenAI-only stream_options={"include_usage": true},
crashing every streaming chat-completions call with TypeError. Omit it for
that endpoint while keeping it for the Gemini OpenAI-compat shim and all
OpenAI-compatible aggregators (OpenRouter, etc.) so usage accounting is
preserved.

Reuses is_native_gemini_base_url() so the compat shim (.../openai), which
accepts stream_options, is correctly excluded from the omission.

Fixes #14387

Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
2026-06-28 22:52:46 -07:00
Teknium
25d35cce18 infographic: Windows CLH lock-timeout traceback suppression (#54436 salvage) 2026-06-28 22:35:56 -07:00
helix4u
98a7cfb8f9 fix(logging): suppress Windows lock timeout tracebacks 2026-06-28 22:35:56 -07:00
Teknium
74541beb9c fix(security): cap WeCom callback body size before pre-auth XML parse (#54615)
The WeCom callback endpoint (internet-facing, 0.0.0.0) parsed untrusted
request bodies before signature verification. defusedxml already guards
the entity-expansion class on main, but there was no cap on raw body
size, so an unauthenticated POST could still force unbounded read work
pre-auth.

Set client_max_size=64KB on the aiohttp app (413 at the framework layer)
plus an explicit length guard in _handle_callback as defense in depth.
WeCom callbacks are small encrypted XML envelopes — media is delivered
out-of-band via MediaId, never inline — so 64KB is ample for legitimate
traffic. Adds tests for oversized (413) and normal-sized (not 413) bodies.

Salvaged from #10192 by @memosr (body-size limit half; defusedxml half
already superseded on main).
2026-06-28 22:35:43 -07:00
teknium1
0b733a8418 test(gateway): pin auto-reset cached-agent eviction (#10710)
Relocate marco0158's eviction into the dedicated auto-reset cleanup block
(single source of truth for dropping session-scoped transient state) and
add an AST invariant pinning _evict_cached_agent into that block. Add
AUTHOR_MAP entry for marco0158.
2026-06-28 22:35:17 -07:00
marco0158
b4300f2d96 fix(gateway): evict cached agent on auto-reset to prevent stale context summary leak
When a session is auto-reset by daily schedule, idle timeout, or suspended
state, the agent cache was not being cleared. This caused the old agent's
context_compressor._previous_summary to leak into the new session, mixing
old conversation history into new compaction summaries.

This was the root cause of the "skin making history" appearing after
compaction in fresh sessions reported by the user.

Follow-up to #9893 which only handled compression_exhausted case.

Changes:
- Add _evict_cached_agent(session_key) call after was_auto_reset check
- Covers daily, idle, and suspended auto-reset scenarios
- Matches the behavior of manual /reset command

Related tests: test_session_boundary_hooks, test_async_memory_flush,
test_session_reset_notify, test_session_reset_fix - all passing.
2026-06-28 22:35:17 -07:00
Junass1
61a4526ac7 fix(gateway): clear session-scoped model overrides on /resume
/resume is a conversation boundary, but unlike /new it did not clear the
chat-keyed _session_model_overrides / _pending_model_notes. A /model switch
made in the previous session under the same chat session_key leaked into the
resumed conversation, running it on the wrong model.

Clear both maps for the session_key after the switch (mirroring /new), scoped
to that key so other chats' overrides are untouched. The cached-agent eviction
this leak also implied already landed via #6672.

Closes #10702.
2026-06-28 22:35:12 -07:00
Shannon Sands
476875acb9 Add dashboard backup upload and download 2026-06-28 22:35:09 -07:00
Ben Barclay
8fe800ee1a fix(file-tools): sanitize host/relative cwd override before it reaches container sandbox (#54447) (#54616)
(cherry picked from commit 82132f7911)

Co-authored-by: Tranquil-Flow <66773372+Tranquil-Flow@users.noreply.github.com>
2026-06-29 15:32:20 +10:00
Ben Barclay
e1f4098b9f docs(cron): document explicit per-channel delivery targets for all platforms (#54630)
The cron delivery table only showed Discord/Telegram with explicit
target syntax and described Slack and every other platform as
home-channel-only. In fact the generic platform:<target> routing in
_resolve_single_delivery_target resolves explicit targets for every
platform: Slack (#channel / channel ID / channel:thread_ts), Matrix
(room/user IDs), Feishu (chat:thread), WhatsApp (JID / E.164), Signal
(group / E.164), SMS, Email, and Weixin all have dedicated explicit-
target branches in _parse_target_ref; the remaining platforms accept a
generic platform:<chat_id> passthrough.

Update the Delivery Model table (en + zh-Hans) to show the real
per-platform syntax, document #channel name resolution via the channel
directory, and note the Slack thread_ts nuance. Docs-only.
2026-06-29 15:23:16 +10:00
brooklyn!
388268ecde Merge pull request #54568 from NousResearch/bb/shared-websocket-layer
refactor(desktop+dashboard): shared WebSocket layer + decouple desktop from dashboard (hermes serve)
2026-06-28 23:43:49 -05:00
brooklyn!
fb0644fbc2 Merge pull request #54585 from NousResearch/bb/desktop-terminal-history
feat(desktop): persist & restore terminal tabs + scrollback across relaunch
2026-06-28 23:38:16 -05:00
Brooklyn Nicholson
1af109c79c test(cli): drop pytest dep + use real sentinel handlers in serve test
Clears the ty diff bot's warnings on the new test: pass real callables to
build_dashboard_parser (not object()) and replace the pytest.mark.parametrize
with a plain loop so the file is stdlib-only.
2026-06-28 23:24:45 -05:00
Ruzzgar
313a8c6833 fix(skills): replace string prefix check with strict path containment 2026-06-28 21:14:01 -07:00
Ben Barclay
0943e2a272 fix(cron): don't report a false 'gateway not running' on external-provider instances (#54600)
`hermes cron status` (and the create/list 'gateway not running' nag)
judge whether cron will fire purely from the in-process ticker's
heartbeat file + a live gateway PID. That heuristic is correct for the
built-in ticker but WRONG for an external provider like Chronos:

Chronos arms exactly one external one-shot per job and is fired by a
NAS-mediated webhook (POST /api/cron/fire). Its `start()` returns
immediately and it deliberately runs no 60s loop and writes no ticker
heartbeat — that's the whole point of scale-to-zero (the machine is at
zero between fires). So on a perfectly healthy Chronos instance,
`cron status` always printed '✗ Gateway is not running — cron jobs will
NOT fire' (or a STALLED-ticker warning), and `cron create` always
appended the 'jobs won't fire automatically' nag — both false.

Verified live on a staging Chronos instance: jobs fired and completed on
schedule via the relay while `cron status` insisted the gateway wasn't
running and the heartbeat was 370s+ stale.

Fix: resolve the active provider (offline — `resolve_cron_scheduler`,
whose `is_available()` contract forbids network) and, for any non-builtin
provider, report the managed-scheduler state instead of the ticker
heuristics, and suppress the ticker-only 'gateway not running' warning.
The built-in path is byte-unchanged. Active-job summary is factored into
a shared helper so both paths print it identically.

New tests prove both directions (chronos: no false negative even with no
gateway PID / no heartbeat; builtin: historical warning preserved) and
fail without the fix.
2026-06-29 14:03:02 +10:00
Teknium
e20ff352b9 test(matrix): authorize inviter in DM-invite fixture for new invite-auth gate
_on_invite now rejects auto-joins from users not on the allow-list. The
DM-recording tests invite @alice and expect a join, so the shared
_make_adapter fixture now puts @alice on _allowed_user_ids.
2026-06-28 20:47:33 -07:00
aaronagent
d836b2bac4 fix(matrix,mattermost): invite auth check + API path traversal guard
Two platform-security hardenings:

- Matrix: _on_invite now checks the inviter against the existing
  allow-list (_allowed_user_ids / GATEWAY_ALLOW_ALL_USERS) before
  auto-joining. Without this any federated Matrix user could invite
  the bot into arbitrary rooms, exposing its presence and metadata.
  The message and reaction paths already enforce this allow-list; the
  invite path bypassed it.

- Mattermost: _api_get / _api_post / _api_put reject any path
  containing '..'. WebSocket-event values (channel_id, post_id,
  file_id) are interpolated directly into API paths, so a malicious or
  compromised server could craft traversal payloads to make the bot
  issue authenticated requests to arbitrary endpoints with its bearer
  token.

The configurable-E2EE-passphrase change from the original PR is dropped:
the matrix adapter was rewritten onto mautrix and the passphrase-protected
key-export file no longer exists.
2026-06-28 20:47:33 -07:00
Teknium
9cf9d3a28f chore(release): add AUTHOR_MAP entry for PR #53295 salvage 2026-06-28 20:46:44 -07:00
lkevincc
163562bf88 fix: normalize lmstudio base urls 2026-06-28 20:46:44 -07:00
Teknium
43eaf79ae6 chore: remove committed PR infographics and gitignore the path (#54564)
PR infographics are rendered locally and embedded in PR descriptions via
the image-provider (fal.media) URL — they were never meant to live in the
repo. The intended .gitignore enforcement (documented as added back in May
2026) was never actually committed, so 35 PNGs (~54MB) accumulated under
infographic/ via 'docs: add PR infographic for X' commits.

- Remove all 35 tracked infographic/*.png files.
- Add infographic/ to .gitignore so git add on the path is now a no-op.

The PR body remains the archive for these images.
2026-06-28 20:46:35 -07:00
teknium1
14204b0646 test(agent): cover .hermes.md no-git-root cwd-only behavior
Regression tests for the injection fix: outside a git repo only cwd is
checked (planted ancestor .hermes.md is ignored), a cwd-local .hermes.md
is still found, and inside a git repo the parent walk to the git root
still works.
2026-06-28 20:46:32 -07:00
aaronagent
306b6615cf fix(agent): limit .hermes.md parent walk to git repos only
_find_hermes_md walks parent directories looking for .hermes.md/HERMES.md,
stopping at the git root. But when there is no git repo (_find_git_root
returns None), the stop guard never fires and the loop walks all the way
to /. On shared systems (CI runners, multi-tenant servers), a .hermes.md
planted at /tmp, /home, or / would be loaded into the system prompt of any
agent session not inside a git repo — a cross-user prompt-injection vector.

Fix: when there is no git root, only check cwd; do not walk parents.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-28 20:46:32 -07:00
Brooklyn Nicholson
1c0fa12edb feat(desktop): persist & restore terminal tabs + scrollback across relaunch
User terminal tabs and their recent scrollback now survive an app restart
(VS Code parity). Tabs, active selection, cwd, and a serialized scrollback
snapshot are written to localStorage on every change; on launch the tabs
reopen with their history replayed above a fresh shell. Processes are NOT
revived — a new shell starts one line below the restored block.

- Capture: SerializeAddon snapshots the buffer on a 750ms leading-edge
  throttle, so a `cmd; quit` lands on disk before teardown; the snapshot is
  trimmed of its trailing idle prompt (no "double prompt" on restore) and
  capped (200 scrollback lines / 48k chars) to stay under the storage budget.
- Teardown guard: app quit/reload kills the PTYs from the main process,
  firing onExit in the renderer, but React skips effect cleanups on teardown
  so the per-instance `disposed` flag never flips. A pagehide/beforeunload
  flag stops onExit from calling closeTerminal() and wiping the persisted
  tabs right before relaunch restores them. A real `exit`/Ctrl-D still closes.
- Agent mirror tabs stay runtime-only — only user tabs persist.
2026-06-28 22:12:29 -05:00
Brooklyn Nicholson
9d9a50c2bc test(cli): pin the hermes serve decoupling contract
Add a focused contract test for the headless `serve` command (routes to the
shared dashboard handler, headless by default while `dashboard` is not, accepts
the legacy --no-open, shares the same runtime/lifecycle flag surface). Also
refresh the dashboard.py module docstring to cover both commands.
2026-06-28 22:11:48 -05:00
Brooklyn Nicholson
e684b808ad fix(desktop): route old runtimes through dashboard when serve is absent
`hermes serve` is newer than the desktop binary's release cadence, so a new
app launched against an un-upgraded managed install / PATH `hermes` would
crash on an unknown subcommand and brick the user mid-upgrade. Detect whether
the resolved runtime registers `serve` (fast source read of its dashboard.py,
with a one-time CLI probe fallback) and rewrite the backend argv to the legacy
`dashboard --no-open` only when it does not. Happy path (current runtimes)
pays nothing and still spawns `serve`.

- electron/backend-command.cjs: pure serve/dashboard argv helpers + serve-
  source detection (unit-tested in backend-command.test.cjs)
- main.cjs: backendSupportsServe() cache + getBackendArgsForRuntime() guard at
  both backend spawn sites; expose `root` from the Windows venv unwrap so the
  fast source check covers Windows too
- docs: note the backward-compat fallback in README, desktop.md, AGENTS.md
2026-06-28 22:10:42 -05:00
Brooklyn Nicholson
dff491a2b9 feat(cli): add headless hermes serve backend; desktop no longer launches dashboard
The desktop app spawned `hermes dashboard --no-open` as its backend, which
made the dashboard look like a desktop prerequisite. Add a dedicated headless
`hermes serve` command that boots the same gateway (shared cmd_dashboard /
start_server) but never opens a browser, and point the desktop backend spawn
exclusively at it. dashboard and serve are now independent surfaces — neither
launches the other.

- subcommands/dashboard.py: factor shared server args; add `serve` parser
  (always headless; accepts legacy --no-open as a no-op)
- main.py: register serve in _BUILTIN_SUBCOMMANDS + coalesce set + gui-log
  detection; extend stale-backend reaper patterns to match `serve`
- desktop electron: spawn `serve`, rename dashboardArgs -> backendArgs,
  update comments + windows-child-process test assertions
- docs: desktop README, desktop.md (incl. remote-backend), AGENTS.md, and
  cli-commands.md now describe `hermes serve` as the desktop/headless backend
2026-06-28 22:04:22 -05:00
brooklyn!
4488fe134b Merge pull request #54517 from NousResearch/bb/desktop-multiterminal
feat(desktop): multi-terminal panel with read-only agent terminals
2026-06-28 21:52:14 -05:00
Brooklyn Nicholson
f019a999d8 docs: clarify desktop is self-contained, not dependent on the dashboard
The desktop app spawns a headless `hermes dashboard --no-open` backend and
talks to it through the shared @hermes/shared WebSocket client — it never
runs or requires the browser dashboard UI. Spell this out in the desktop
README, the desktop docs page, and AGENTS.md so "dashboard" stops reading
as a desktop prerequisite.
2026-06-28 21:50:33 -05:00
Brooklyn Nicholson
e9b95dfd19 fix(docker): include apps/shared in dashboard image build
The shared websocket package is a web file: dependency but was excluded
by .dockerignore and never copied into the Docker build context. Also fix
tsc -b errors: expose buildWsUrl on api and drop the GatewayClient state
getter that conflicted with the shared base class.
2026-06-28 21:43:56 -05:00
Brooklyn Nicholson
ae465e9fb8 Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/desktop-multiterminal 2026-06-28 21:37:52 -05:00
Brooklyn Nicholson
1a1e00f37e fix(desktop): stop injecting ctrl-l into terminal startup
Remove the prompt-gap cleanup that sent Ctrl-L into the user's shell; it could
render as literal ^L and create the exact top-line gap it was meant to hide.
Keep first-prompt cleanup renderer-side only, and parse short ESC charset
sequences so the initial newline stripper does not disarm early.

Also add a Close all action to the terminal tab context menu.
2026-06-28 21:33:20 -05:00
brooklyn!
83f09f52f9 Merge pull request #54558 from NousResearch/bb/overlay-panels
feat(desktop): shared overlay Panel primitive for cron/profiles/agents
2026-06-28 21:32:14 -05:00
Brooklyn Nicholson
216ace4bf3 style(shared): apply workspace formatter to websocket helpers
Run the package-appropriate Prettier config on the shared WebSocket files so
the extracted helpers match the surrounding desktop/shared TypeScript style.
2026-06-28 21:30:43 -05:00
Brooklyn Nicholson
5a2906a11b chore(desktop): keep the diff surgical
Revert the repo-wide prettier churn the earlier fmt pass pulled into files
unrelated to this work; run prettier/eslint scoped to the touched files only.
2026-06-28 21:30:14 -05:00
Brooklyn Nicholson
f6ccf08ee6 refactor(web): centralize dashboard websocket URL calls
Keep dashboard pages and components on the dashboard API helper instead of
calling the raw shared URL primitive directly. The shared helper remains the
single low-level implementation; web/src/lib/api.ts is the dashboard-specific
facade for auth, base path, and ticket minting.
2026-06-28 21:27:06 -05:00
Brooklyn Nicholson
6776b2f9b5 feat(desktop): live gateway popout + statusbar/command-center polish
- Gateway status popout: flatten the header to stacked connection + inference
  statuses with system-panel and restart actions (reusing the shared
  runGatewayRestart helper). The recent-activity tail is now live while the
  popout is open via the shared LogView (WS connection churn filtered), and the
  icon / "View all logs" link dismiss the popover.
- Statusbar "menu" items accept a menuContent(close) render fn over a now
  controlled DropdownMenu, so popover content can close itself.
- Drop the always-on gateway-log poll from useStatusSnapshot (logs are fetched
  by the popout only while open).
- SearchField → text-xs to match Input/Select (controlVariants).
- Command center: remove the usage/system section dividers, swap the sessions
  nav icon (Pin → MessageCircle), small padding tweaks.
2026-06-28 21:26:15 -05:00
Brooklyn Nicholson
5a4bdfda50 fix(shared): close websocket clients deterministically
Ensure intentional client closes mark the transport closed and reject pending
RPCs immediately instead of relying on a browser close event that can be
ignored after the socket reference is cleared.
2026-06-28 21:25:12 -05:00
Brooklyn Nicholson
6c52e4a318 fix(desktop): match agent terminal scrollback to user tabs
Keep read-only agent terminal tabs visually and behaviorally aligned with normal
terminal tabs by using the same 1,000-line scrollback cap.
2026-06-28 21:22:17 -05:00
Brooklyn Nicholson
dfb561a3ae refactor(desktop+dashboard): extract shared WebSocket/JSON-RPC layer
The Electron desktop app and the web dashboard each carried their own
copy of the tui_gateway JSON-RPC WebSocket client plus near-identical
auth'd WS-URL construction. The dashboard's copy was the historical
source of the "is the dashboard required to run the desktop app?"
confusion, since the two surfaces looked coupled.

Consolidate the genuinely shared transport into the existing
framework-agnostic `@hermes/shared` package so both surfaces consume it
independently — neither app depends on the other:

- Move `resolveGatewayWsUrl` + `GatewayReauthRequiredError` (single-use
  OAuth ticket re-mint vs long-lived token fallback) into
  `@hermes/shared`; desktop now imports them directly.
- Add `buildHermesWebSocketUrl`, one base-path/scheme/auth-aware URL
  builder, and route every dashboard WS endpoint through it
  (`/api/ws`, `/api/events`, `/api/pty`, plugin WS URLs).
- Reduce the dashboard `GatewayClient` to a thin subclass of the shared
  `JsonRpcGatewayClient`, deleting ~210 lines of duplicated pending-call
  /event-dispatch/connect plumbing while keeping its dashboard-specific
  ticket-vs-token auth selection.
- Drop the stale "start it with --tui" chat banner, which implied the
  dashboard flag was required.

Behavior is preserved on both surfaces; the dashboard additionally
inherits the shared client's 15s connect timeout (previously
desktop-only), so a hung connect now fails fast instead of pinning the
composer in "connecting".
2026-06-28 21:20:35 -05:00
Brooklyn Nicholson
adacb16d62 fix(desktop): make agent terminal tabs fully readable
Register read-only agent terminals with the same renderer-side terminal reader
as user terminals so read_terminal works on whichever tab is active.

Also bring agent xterm rendering closer to user-terminal parity (unicode 11,
web links, font weights/spacing) and make the gateway sink wiring resilient if
only one terminal event sink was already installed.
2026-06-28 21:18:49 -05:00
Ben
dee41d0716 feat(dashboard): catalogue all memory-provider API keys in OPTIONAL_ENV_VARS
The dashboard Keys page and `hermes setup` render API-key rows from
OPTIONAL_ENV_VARS, but only Honcho had an entry — so Hindsight,
Supermemory, Mem0, RetainDB, ByteRover, and OpenViking read their keys
straight from os.environ yet had no place to set them in the GUI.

Add catalog entries (category=tool, password-masked, with get-key URLs
and the tool each powers) for all six, plus the relevant base-URL/endpoint
companions. Pure declaration: the generic GET /api/env endpoint, the
save/reveal write path, and the sandbox env blocklist (which auto-derives
from tool-category OPTIONAL_ENV_VARS) all pick these up with no further
wiring.

Adds a behavior-contract test asserting every memory provider's primary
credential key is catalogued, tool-categorised, and password-masked.
2026-06-28 19:17:02 -07:00
Brooklyn Nicholson
e117cfdff0 feat(desktop): live agent terminals + agent-driven tab close
Make the read-only agent terminal mirrors stream in real time and give
the agent a desktop-only way to dismiss its own tabs.

- Stream background output live: the local reader used a blocking
  read(4096) that buffered small periodic output until EOF, so agent
  tabs only "filled in" at process exit. Switch to buffer.read1(4096)
  (decoded) for incremental chunks.
- Route agent.terminal.output / terminal.close to the window that owns
  the process (its gateway session) instead of an empty session id, so
  events actually reach the desktop renderer.
- Add close_terminal: a HERMES_DESKTOP-gated tool (sibling of
  read_terminal) that drops a process's read-only tab WITHOUT killing it
  via process_registry.on_close; output keeps buffering and the user can
  reopen from the status stack.
- ⌘W now closes a focused agent tab: mark the agent instance
  data-terminal and focus it on activation so isFocusWithin routes there.
- ensureTerminal() no longer spawns an extra user shell when a tab
  already exists (e.g. opening a background task from the status stack).
2026-06-28 21:15:14 -05:00
Brooklyn Nicholson
9f02eea1d2 style(desktop): prettier + eslint pass
Repo-wide `npm run fmt` + `eslint --fix`; also drop two unused destructured
params in titlebar-overlay-width.cjs so the lint run is clean.
2026-06-28 21:04:43 -05:00
Teknium
c8fd47be14 docs: add PR infographic for approval mode validation 2026-06-28 19:04:18 -07:00
LIC99
dda3268d09 fix(approvals): warn and default to manual on unknown approvals.mode
_normalize_approval_mode() previously accepted any string, so an unknown
value like 'auto' fell through every downstream mode check (off/smart) and
silently behaved like manual with no signal. Validate against the known
modes (manual/smart/off), emit a warning for anything else, and default to
manual to match the config default and the rest of the function.

Bug 1 from the original PR (/approve & /deny bypassing the running-agent
guard) already landed on main independently, so only the mode-validation
fix is salvaged here.

Fixes #4261

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-06-28 19:04:18 -07:00
Brooklyn Nicholson
317b94871b chore(desktop): drop dead overlay primitives
Remove zero-consumer overlay code surfaced while auditing the primitive set:
OverlayNewButton (orphaned once "New" moved into PanelAddButton), OverlayCard /
overlayCardClass, and the unused overlay-search-input module. Leaves three
intentional layers: OverlayView (base), Panel (master/detail), and
OverlaySplitLayout (settings/command-center nav→content).
2026-06-28 21:03:04 -05:00
Brooklyn Nicholson
991220747f feat(desktop): unify non-settings overlays under a shared Panel primitive
Extract the agents/trace overlay chrome into overlays/panel.tsx and adopt it
across the Cron, Profiles, and Agents overlays so they share one layout
(centered card, header, master/detail list with built-in search, kebab row
actions, big "+" footer, empty state) instead of three ad-hoc split layouts.

Also in this pass:
- OverlayView insets equidistantly on every side (was top/left-only, which
  left a large left gutter on narrow windows).
- Form-control chrome: input border/background/recessed-inset are now
  per-mode theme-var knobs (--dt-input-border/-bg/-inset) — resting borders
  blend in, strengthen on hover, and go solid on focus / while a Select is open.
- Thread-timeline popover reuses the shared dropdown surface (1:1 with the
  kebab menus) and scrolls the hovered prompt into view.
2026-06-28 20:56:52 -05:00
Teknium
11183e8332 fix(profiles): validate custom alias names to prevent path traversal
`hermes profile alias <profile> --name <custom>` accepted arbitrary
strings and used them verbatim as a filename under ~/.local/bin. Because
normalize_profile_name only lowercases/strips (no regex gate), a value
like `../../.bashrc` escaped the wrapper directory and clobbered
arbitrary user-writable files. remove_wrapper_script had the same sink.

Add validate_alias_name (reusing the profile-id regex, which forbids
`/`, `.`, and `..`) and wire it into check_alias_collision,
create_wrapper_script, remove_wrapper_script, and the CLI alias action so
the rejection surfaces a clear "Invalid alias name" error instead of
silently writing or unlinking outside the wrapper dir.

Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
Co-authored-by: Xowiek <xowiekk@gmail.com>
2026-06-28 18:53:33 -07:00
aaronagent
27ddd8fd80 fix(gateway): sanitize agent error messages, validate webhook gh args
Two of the three fixes from PR #6660 (the cli.py reopen_session change is
moot — that raw _conn.execute reopen block no longer exists on main).

- gateway/run.py: stop sending raw type(e).__name__ and str(e)[:300] to
  end users on chat platforms. Exception text from LLM providers can leak
  API URLs, file paths, and partial credentials. Return a generic message;
  keep curated status hints for known HTTP codes; full detail stays in logs.
- gateway/platforms/webhook.py: validate pr_number (positive int) and repo
  (owner/name regex) before passing to the 'gh pr comment' subprocess.
  Payload-controlled values could otherwise inject gh flags (--help, a
  different --repo). List-form subprocess means this is arg injection, not
  shell injection, but validation is still correct.

Co-authored-by: aaronagent <1115117931@qq.com>
2026-06-28 18:53:26 -07:00
aaronlab
ec148f5d31 fix(agent): guard Anthropic interrupt, cap vision data-URL size
Two independent agent-loop hardening fixes:

- anthropic: when the streaming loop breaks on _interrupt_requested,
  return None instead of calling stream.get_final_message() on the
  partially-drained stream — the SDK may hang draining remaining events
  or return a Message with incomplete tool_use blocks. The outer poll
  loop raises InterruptedError, so the return value is discarded anyway.

- vision: add a 20 MB cap on base64 data-URL payloads before
  base64.b64decode() in _materialize_data_url_for_vision. A 100MB+
  payload creates ~275MB of memory pressure; gateway users sharing the
  process can trivially OOM it. Oversized payloads return ("", None).

The third change from the original PR (streaming tool-name +=  to
assignment dedup) was already landed independently on main.

Co-authored-by: aaronlab <1115117931@qq.com>
2026-06-28 18:53:20 -07:00
Teknium
490f215a19 test: cover export-prefix stripping in .env parsers (PR #6659) 2026-06-28 18:53:00 -07:00
aaronagent
5c1ac6c70d fix(config): strip export prefix in .env parsers across three modules
All three .env parsers use `line.partition("=")` without stripping the
bash-compatible `export ` prefix first.  A line like `export API_KEY=sk-...`
produces key `"export API_KEY"` instead of `"API_KEY"`, silently ignoring
the variable and causing auth failures for users who copy-paste from
bash profiles or follow tutorials that include `export`.

- tools/skills_tool.py: `load_env()` for skill environment
- hermes_cli/config.py: `load_env()` for core config
- hermes_cli/main.py: `_has_any_provider_configured()` inline parser

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-28 18:53:00 -07:00
Teknium
f1cbe4308f fix(gateway): log error-notification failures instead of silently swallowing (#54472)
* fix(gateway): log error-notification failures instead of silently swallowing

The last-resort exception handler in _process_message_background() that
sends an error notice to the user caught all exceptions with a bare pass,
leaving zero trace when the notification itself failed. Upgrade to
logger.error(..., exc_info=True) so a failed error-notification send is
debuggable post-mortem.

Salvaged from #6499 by @BongSuCHOI (the logging-upgrade portion only).

* docs: add PR infographic for gateway error-notify logging
2026-06-28 18:52:51 -07:00
Teknium
3483424aaa fix(security): redact bare-token credentials in URL userinfo (#6396) (#54475)
git remote set-url with an embedded password (https://PASSWORD@github.com)
leaked the credential into agent output — the redaction engine only masked
user:pass@ DB connection strings, never the colon-less bare-token userinfo
form a git remote uses.

Add _URL_BARE_TOKEN_RE: scheme://TOKEN@host for web/transport schemes
(http/https/wss/git/ssh/ftp), 8+ char floor to skip short usernames, token
class forbidding /:@ so an @ in a path/query is never treated as userinfo.

Deliberately scoped to the bare-token form only. The user:pass@ colon form
and query-string tokens stay passing through (#34029, 'pass web URLs through
unchanged') so magic-link / OAuth round-trip skills keep working — a bare
credential in userinfo is never a workflow token (those live in the query
string), so masking it can't break a skill.
2026-06-28 18:52:42 -07:00
Teknium
9860d93f2a fix(terminal): require approval for host-bound Docker commands (#54483)
* fix(terminal): require approval for host-bound Docker commands

The Docker terminal backend blanket-skips dangerous-command approval on
the assumption that the container is isolated from the host. That holds
only when nothing is bind-mounted in. Once a host path is exposed (via
TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE or a host-path entry in
TERMINAL_DOCKER_VOLUMES), a command like `rm -rf /workspace` reaches
real host files but is still auto-approved.

Detect host bind mounts and route those sessions through the normal
approval flow. Isolated Docker keeps the fast path. The same gating is
applied to the execute_code guard, which had the identical blanket skip.

Co-authored-by: Hermes Agent <agent@nousresearch.com>

* chore: add AUTHOR_MAP entry for PR #6436 salvage (Kolektori)

* test: accept has_host_access kwarg in _check_all_guards mocks

The host-bound Docker approval fix adds a has_host_access kwarg to the
_check_all_guards wrapper. Six pre-existing tests monkeypatch it with a
fixed (command, env_type) / (cmd, env) lambda signature, which now
raises TypeError when terminal_tool passes the new kwarg. Widen those
mock signatures to accept **kwargs.

---------

Co-authored-by: Kolektori <256073454+Kolektori@users.noreply.github.com>
Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-06-29 11:35:41 +10:00
Ben Barclay
7cfa2fa13f fix(docker): gate resource limit flags on cgroup controller availability (#54516)
On hosts where the cgroup v2 cpu/memory/pids controllers are not delegated
to the docker/podman process (unprivileged Proxmox LXCs, some rootless and
nested setups), --pids-limit/--cpus/--memory cause every container start to
fail with OCI runtime error / exit 126, breaking terminal + execute_code.

- Add _cgroup_limits_available(image): one-shot, host-wide cached probe that
  spawns a throwaway container from the sandbox image itself (sleep 0) with
  all three flags together, mirroring the existing _storage_opt_supported
  probe-and-degrade pattern.
- Remove --pids-limit from static _BASE_SECURITY_ARGS; apply it (default 256
  via _DEFAULT_PIDS_LIMIT) in resource_args gated on the probe.
- Gate --cpus and --memory on the same probe.

Behavior unchanged on cgroup-capable hosts; graceful degradation with a
one-time warning where controllers aren't delegated.

Fixes #6568.

(cherry picked from commit c933880b7e)

Co-authored-by: angelos <angelos@oikos.lan.home.malaiwah.com>
2026-06-29 11:01:08 +10:00
Brooklyn Nicholson
5d661a3ad7 fix(desktop): show the agent command before terminal output arrives
Seed read-only agent terminal tabs with the background command immediately, so
they never open as a blank pane while stdout is pending or a live stream races
startup. Snapshot fallback now preserves that command header and appends only
missing output without duplicating live chunks.
2026-06-28 19:39:20 -05:00
Brooklyn Nicholson
6ac9ba9fc4 fix(desktop): seed agent terminal tabs from process snapshots
Read-only agent terminal tabs now consume both live agent.terminal.output chunks
and the process-list/status snapshot. The snapshot seeds tabs opened after output
already exists and acts as a fallback if the live stream races startup, so agent
background tabs don't sit blank while the status stack already knows the tail.
2026-06-28 19:35:54 -05:00
Brooklyn Nicholson
520212cc59 feat(desktop): stream agent terminal output live instead of polling
Replace the 5s output_tail poll (which often showed nothing) with a real push
stream. The process registry gains an on_output sink called from its reader
threads with each chunk; the tui_gateway wires it to emit agent.terminal.output
{process_id, chunk} (write_json is _stdout_lock-guarded, so emitting from the
reader thread is safe). The desktop routes chunks by process id straight into
the read-only agent xterm via a small writer registry, with a capped backlog so
a tab opened mid-stream (or reopened) replays what it missed.

Drops the fragile poll/tail path: no session-key matching, no truncation, no
lag — full-fidelity ANSI, env-agnostic (local/docker/ssh).
2026-06-28 19:33:43 -05:00
Brooklyn Nicholson
ad831dd492 feat(desktop): mirror agent background terminals as read-only tabs
When the agent runs terminal(background=true) — Hermes's equivalent of
Cursor's is_background — surface it as a read-only "agent" tab in the rail
(distinct sparkle icon), alongside the glanceable status-stack row, which now
links to the tab. The tab is a write-only xterm (no PTY, no input) fed by the
process output tail, appended live (faster poll while a tab is open) and
env-agnostic (works for local/docker/ssh shells alike).

- terminals.ts: TerminalEntry gains kind ('user'|'agent') + procId; agent tabs
  auto-surface once (closing one doesn't resurrect it) and the status row can
  reopen/focus them. ensureTerminal now guarantees a user shell specifically.
- use-agent-terminal.ts: slim read-only xterm hook, delta-appended.
- workspace: render user vs agent instances; auto-surface from the background
  store; tail faster while an agent tab exists.
- composer-status: $backgroundOutputByProc selector; status row links to the tab
  instead of an inline disclosure.
2026-06-28 19:26:21 -05:00
Brooklyn Nicholson
6e12f8ce4a fix(desktop): force a repaint when a terminal is re-activated
A WebGL terminal doesn't paint while visibility:hidden, so switching to it
(e.g. after closing the active tab) revealed a stale/garbled frame. On
activation, clear the glyph atlas and force a full term.refresh against the
live buffer (after the refit), then focus.
2026-06-28 19:13:58 -05:00
Brooklyn Nicholson
b02f453496 refactor(desktop): generalize focus check to isFocusWithin primitive
Replace the one-off isTerminalFocused with isFocusWithin(selector) in the
keybinds lib (beside isEditableTarget) — the reusable primitive for any
focus-scoped shortcut. The terminal marks itself data-terminal and the ⌘W
handler routes via isFocusWithin('[data-terminal]'); future surfaces just add
their own marker.
2026-06-28 19:11:48 -05:00
Brooklyn Nicholson
2d55ff8fca feat(desktop): ⌘W closes the focused terminal
Fold terminal close into the existing ⌘/Ctrl+W handler so focus decides the
target: a focused terminal takes ⌘W (closes the active tab) and otherwise the
keystroke closes the active preview tab as before. Only the ⌘ gesture is
intercepted — Ctrl+W stays the shell's werase — and a focused terminal never
lets ⌘/Ctrl+W close a preview out from under it.
2026-06-28 19:10:06 -05:00
Brooklyn Nicholson
c1bb34d5e8 fix(desktop): keep inactive terminals sized so switching doesn't garble
Hide inactive terminal tabs with `visibility` (absolute-stacked at full size)
instead of `display:none`. A display:none host is 0×0, so its ResizeObserver
fit bails and the terminal stops tracking pane resizes — re-showing it at a
changed size reflowed the buffer into a garbled prompt. Visibility-hidden
hosts keep their layout size, stay in sync, and switch instantly.
2026-06-28 19:08:25 -05:00
Brooklyn Nicholson
6875d6cd3e feat(desktop): multi-terminal panel with side tab rail
Multiple persistent in-app terminals managed by a thin VS Code-style icon
rail docked on the terminal pane's outer edge. Each tab is its own live
xterm+PTY that survives tab switches, session switches, and hiding the pane
(VS Code parity: only an explicit close or `exit` kills a shell). Terminals
own their state independent of the session — the sole thing they inherit is
an initial cwd snapshotted at creation.

- Rail: icon-only tabs (name + live hotkey on hover), +/hide controls,
  context menu. Sits at z-40 above the collapsed sidebars' hover-reveal
  triggers and marks itself data-suppress-pane-reveal, so reaching for a tab
  can't summon the file-browser/review panel.
- Lifecycle: PersistentTerminal latches mounted on first open so shells stay
  alive while hidden; ensureTerminal re-creates one on reopen.
- Agent reader: id-keyed registry drives read_terminal off the active tab.
- Keybinds (Ctrl-family, OS-aware): toggle Ctrl+`, new Ctrl+Shift+`,
  next/prev Ctrl+Shift+Down/Up, close Ctrl+Shift+W.
2026-06-28 19:06:55 -05:00
brooklyn!
10043c6d0c Merge pull request #54503 from NousResearch/bb/fix-desktop-cross-wired-resume
fix(desktop): restore cross-wired runtime-id guard on session resume
2026-06-28 18:26:01 -05:00
Brooklyn Nicholson
cd5fb760a5 fix(desktop): restore cross-wired runtime-id guard on session resume
resumeSession's warm-cache fast-path once again trusted the
storedSessionId -> runtimeId -> ClientSessionState mapping without
checking the cached state still BELONGS to the session being resumed. A
pooled profile backend that gets idle-reaped and respawned re-mints
runtime ids, so a recycled id resolves to a live-but-DIFFERENT session's
cache entry and paints the wrong transcript under the current route:
click thread A, a totally different thread (often from another worktree)
loads. The session.usage 404 guard only catches a fully-dead id; a
recycled-live id 200s, so the fast-path happily served the stale cache.

Straight regression, not a new bug. f7bf74064 ("reject cross-wired
runtime-id cache on session resume") landed takeWarmCache() + its
regression test; 62af32efe ("keep active sessions aligned with cwd"),
rebased off a stale branch, restructured resumeSession and silently
reverted both 29 minutes later -- the exact stale-branch squash clobber
AGENTS.md warns about ("Squash merges from stale branches silently
revert recent fixes").

Re-apply the whole-class fix on top of the current cwd-aligned code:
takeWarmCache() validates state.storedSessionId === storedSessionId at
BOTH cache reads (the early transcript-keep decision and the fast-path),
purging a cross-wired mapping on a miss so it falls through to a full
resume that rebinds a correct runtime id. Restore the two regression
tests guarding it.

Tests: resumeSession warm-cache mapping integrity -- a cross-wired
mapping is rejected + purged (the bug), a correctly-wired cache is still
served with no needless refetch (no perf regression).

Co-authored-by: professorpalmer <professorpalmer@users.noreply.github.com>
2026-06-28 18:23:09 -05:00
brooklyn!
65d45a0013 Merge pull request #53386 from NousResearch/bb/elevenlabs-voices-401-spam
fix(dashboard): stop ElevenLabs voice-list 401 log spam
2026-06-28 18:10:47 -05:00
Brooklyn Nicholson
f34cf7e3a4 test(gmi): stub profile fetch_models in static-fallback test
The fallback test only mocked fetch_api_models; CI still hit the real GMI
/v1/models endpoint via ProviderProfile.fetch_models and merged live
models into the result.
2026-06-28 18:05:28 -05:00
Brooklyn Nicholson
27f03243a0 fix(dashboard): stop ElevenLabs voice-list 401 log spam
The /api/audio/elevenlabs/voices endpoint logged a WARNING on every
failure, and the desktop re-polls it on each settings open/focus — a
bad/expired/scoped ELEVENLABS_API_KEY floods agent/gui logs with
identical "voice list failed: HTTP Error 401" lines indefinitely.

Treat 401/403 as a persistent "integration unavailable" state: return
{available: false, error: "unauthorized"} with a 200 (the dropdown
already handles available:false) instead of a 502, and collapse repeated
identical failures to a single log line via a small re-arming latch
(logs again on recovery or when the error changes). Non-auth errors keep
the 502 but are throttled the same way.
2026-06-28 17:59:28 -05:00
brooklyn!
d0d2cf1c2f Merge pull request #54492 from NousResearch/bb/windows-hide-checkpoint-skills-git
fix(windows): hide console flash on checkpoint git + skills_hub gh probes
2026-06-28 17:49:37 -05:00
Brooklyn Nicholson
cb1bb1a48d refactor(windows): unify windowless spawn form across the touched sites
windows_hide_flags() already returns 0 on POSIX (and creationflags=0 is
the no-op default there, exactly how server.py::_list_repo_files does it),
so drop the IS_WINDOWS import + ternary/one-use-dict gating and just pass
creationflags=windows_hide_flags() directly. Tests lose the now-pointless
IS_WINDOWS monkeypatch.
2026-06-28 17:44:47 -05:00
Brooklyn Nicholson
ee22d853eb fix(windows): hide pdftoppm console flash on PDF attach
server.py's PDF-attach handler shells out to `pdftoppm` from the
console-less desktop/gateway backend; on Windows that pops a conhost
window each attach. Route it through windows_hide_flags() like the
sibling _list_repo_files git calls (no-op on POSIX).
2026-06-28 17:43:27 -05:00
Brooklyn Nicholson
32087e4bc9 fix(windows): hide console flash on checkpoint git + skills_hub gh probes
The #54236/#54417 backend git/gh sweep routed git_probe, the repo-file
picker, coding_context, context_references, copilot_auth, and the gateway
process scans through CREATE_NO_WINDOW, but two sibling spawn legs that
also run inside the console-less desktop/gateway backend were missed:

- tools/checkpoint_manager.py `_run_git` (and the one-shot `git init
  --bare` in `_init_store`) — when checkpoints are enabled, every
  file-mutating turn fires multiple bare `git` calls (status, add,
  write-tree/commit-tree, update-ref). Spawned from a parent with no
  console (Electron spawns the backend with windowsHide → CREATE_NO_WINDOW),
  each one allocates its own conhost window → a flurry of terminal popups.
- tools/skills_hub.py `GitHubAuth._try_gh_cli` — `gh auth token`, the same
  bug class as the already-fixed copilot_auth gh probe.

Route both through `windows_hide_flags()` (no-op on POSIX), matching the
established per-site pattern. Tests added to
tests/test_windows_subprocess_no_window_flags.py.
2026-06-28 17:41:47 -05:00
Teknium
980622d0ec perf(startup): parse config + plugin manifests with libyaml CSafeLoader (#54486)
The startup config/manifest reads used PyYAML's pure-Python SafeLoader,
which is ~8x slower than the libyaml-backed CSafeLoader C extension.
config.yaml is parsed several times during launch (cli config, raw
config, early interface/redaction bridge, logging config) and every
plugin manifest is parsed once — all on the slow path.

Add utils.fast_safe_load (CSafeLoader-preferring, pure-Python fallback,
true drop-in for safe_load) and route the hot startup parse sites
through it: hermes_cli/config.py (config + manifest reads),
hermes_cli/plugins.py (manifest parse), env_loader, cli.load_cli_config,
hermes_logging, and the two pre-config early YAML bridges in main.py.

Behavior is identical (same restricted safe tag set); only speed changes.
safe_load calls on the startup path drop from ~79 to ~0, cutting the
YAML parse cost from ~0.9s to ~0.15s under profiling.

Adds tests/test_fast_safe_load.py asserting equivalence with safe_load
across input shapes, empty-doc falsiness, C-loader preference, and that
python/object tags are still rejected (safe, not full loader).
2026-06-28 15:38:39 -07:00
Teknium
d65468e7ff fix(security): SSRF guard yuanbao media download_url (#54470)
yuanbao_media.download_url() fetched model-supplied (outbound) and inbound
image/file URLs server-side via httpx with follow_redirects=True and no
SSRF check. A model response containing <img src="http://169.254.169.254/...">
routed through ImageUrlHandler -> download_url and would fetch cloud-metadata
endpoints; same for inbound media.

Add an is_safe_url() pre-flight plus an async redirect event-hook that
re-validates every 30x target, matching the cache_image_from_url() guard in
gateway/platforms/base.py. The other gateway adapters already guard their
URL-fetch paths; this was the remaining unguarded one.
2026-06-28 15:29:59 -07:00
brooklyn!
16ff1a3b93 Merge pull request #54457 from NousResearch/bb/windows-console-launcher-repair
fix(windows): repair missing console script launchers
2026-06-28 17:15:56 -05:00
Hermes Agent
c8b86963d0 docs: add PR infographic for anthropic stale base_url guard 2026-06-28 15:12:03 -07:00
奥森木
e7d4ade8cf fix(anthropic): ignore stale non-Anthropic base_url across all resolution paths
A config left with `provider: anthropic` but a leftover
`base_url: https://openrouter.ai/api/v1` (e.g. after a provider switch)
would route Anthropic OAuth/setup-token traffic to OpenRouter and 404.

Add `_anthropic_base_url_override_ok()` and gate the three native-Anthropic
resolution branches (pool, explicit, native) on it. The guard honors a
configured `model.base_url` only when it plausibly speaks the Anthropic
Messages protocol — official `*.anthropic.com` / `*.claude.com` hosts, Azure
Foundry endpoints, and `/anthropic`-suffixed or Kimi `/coding` proxies — and
falls back to `https://api.anthropic.com` otherwise. Aggregator URLs like
openrouter.ai / api.openai.com are treated as stale.

Reconstructed from @clovericbot's PR #3661 onto current main: the original
patched one branch with an anthropic-only allow-list, which would have broken
Azure-via-anthropic; widened to all three sites and made Azure/proxy-safe.
2026-06-28 15:12:03 -07:00
Teknium
95f2919f91 perf(startup): lazy-load gateway platform adapters (#54448)
Bundled platform plugins (telegram, discord, feishu, teams, ...) were
eagerly imported at plugin-discovery time on every `hermes` invocation,
including plain `hermes chat` which never touches a gateway platform.
Their modules import heavy platform SDKs at module level (lark_oapi,
microsoft_teams, discord.py, slack_bolt, ...) — feishu alone pulled in
lark_oapi (~2.6s), teams pulled microsoft_teams (~1.9s).

Discovery now registers a cheap deferred loader per platform in the
platform_registry; the adapter module is imported only when the gateway
/ cron / setup / send_message path actually asks for that platform.
is_registered() and the iterate-all accessors stay correct (deferred
counts as registered; plugin_entries()/all_entries() materialize all
deferred loaders, since those paths genuinely need every adapter).

Cold start: ~4.4s -> ~2.45s to banner. discover_and_load: 2.0s -> 0.3s
(warm), and the heavy SDKs are no longer imported at all in CLI mode.
Every shipped platform remains available out of the box — it just loads
on first use.
2026-06-28 15:11:59 -07:00
Mibayy
b0b7ff0d75 fix(provider): auto+base_url bypasses cloud API when custom endpoint configured (#3846)
When config.yaml has `provider: auto` and a non-cloud `base_url` (e.g. Ollama
at localhost:11434), requests were silently sent to https://api.anthropic.com
whenever ANTHROPIC_API_KEY was present in the environment, ignoring the
configured local endpoint and returning HTTP 401 / "credit balance too low".

Root cause: resolve_provider("auto") scans env vars and returns "anthropic"
when ANTHROPIC_API_KEY is set, before config.model.base_url is ever consulted.

In resolve_runtime_provider(), before calling resolve_provider(), short-circuit
to the OpenAI-compatible resolver when no explicit creds were passed, provider
is "auto"/unset, and a non-cloud base_url is configured. Well-known cloud roots
(openrouter.ai, anthropic.com, openai.com) are matched on HOST (not substring)
so look-alike hosts can't evade the bypass and leak a cloud credential.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-28 15:11:55 -07:00
Teknium
86e64900b9 fix(gateway): preserve sessions across restarts (#54442) 2026-06-28 15:10:39 -07:00
Teknium
4c2961c511 fix(curator): never archive cron-referenced skills + floor use=0 pruning (#54443)
The curator's inactivity prune archived any non-pinned agent-created
skill whose activity was older than archive_after_days (90d). A skill
loaded only by a cron job had its usage bumped solely when the job
fired, so paused jobs, infrequent (quarterly/annual) schedules, and
far-future one-shots aged their skills out from under them — the next
run then failed to load the now-archived skill.

- cron/jobs.py: add referenced_skill_names() returning skills used by
  ANY job (incl. paused/disabled).
- curator.apply_automatic_transitions(): skip cron-referenced skills
  like pinned; add a use=0 grace floor so a never-used skill is not
  marked stale/archived until it is at least stale_after_days old.
- LLM review pass: candidate list marks cron=yes; prompt forbids
  pruning cron-referenced skills and never-used skills under 30 days.

Tested E2E against a real cron job + real usage records and with 4 new
unit tests.
2026-06-28 15:10:21 -07:00
Gille
df8e2523fa fix(windows): verify launchers after primary install 2026-06-28 17:02:05 -05:00
HexLab98
76bb8f46a0 test(cli): cover Windows console script repair (#52931)
Add unit tests for missing-shim detection and repair trigger in
_verify_console_scripts_installed.
2026-06-28 17:01:31 -05:00
HexLab98
95994bbc56 fix(windows): repair missing hermes.exe after pip install (#52931)
On Windows, uv pip install -e . can register hermes.exe in package metadata
while the launcher never lands on disk. Detect missing [project.scripts]
shims and reinstall entry points under the existing quarantine path in
hermes update and install.ps1.
2026-06-28 17:01:31 -05:00
brooklyn!
28097d9cd9 Merge pull request #54385 from NousResearch/bb/project-folder-picker-remote
feat(desktop): remote-gateway-aware folder picker + git cockpit (status, review, worktrees)
2026-06-28 16:35:57 -05:00
Teknium
e5d22ab80d fix(daytona): quote single-upload mkdir parent path (#54440)
* fix(daytona): quote single-upload mkdir parent path

The single-file _daytona_upload() path shelled out 'mkdir -p {parent}'
with the remote parent interpolated unquoted, so shell metacharacters in
the path could break the command or inject arbitrary commands into the
sandbox. The bulk-upload, bulk-download, and delete paths were already
hardened with shlex-quoting helpers; this single-upload path was missed.

Route it through the existing quoted_mkdir_command() helper and add a
regression test covering a path with shell metacharacters.

Reported by @Gutslabs (#3960); the original branch predated the
file_sync refactor, so the fix is re-applied to the current code path.

* docs(infographic): daytona quote-sync fix
2026-06-28 14:33:03 -07:00
Brooklyn Nicholson
f9b469d7de test(web_git): assert default branch invariant, not hardcoded main
CI git init defaults to master on some runners; compare branch to
defaultBranch instead of pinning a branch name.
2026-06-28 16:29:52 -05:00
teknium1
c648ecdca5 fix(telegram): reject unauthorized users before event construction (#40863)
Removed/unauthorized Telegram users could inject prompt content before the
per-user auth gate fired. The adapter ran `_should_process_message`,
`_build_message_event`, and text/photo batching — and dispatched to the
runner — before `_is_user_authorized()` (gateway/authz_mixin.py) rejected
the sender. Unmentioned group chatter from a removed user was also
persisted into the session transcript via `_observe_unmentioned_group_message`,
leaking into the agent's observed context independent of dispatch.

Add `_is_user_authorized_from_message()` as an intake prefilter that runs
in `_handle_text_message`, `_handle_command`, `_handle_location_message`,
and `_handle_media_message` BEFORE batching, event construction, and the
unmentioned-group observe branch. It reuses the runner's
`_is_user_authorized()` with a correctly-shaped SessionSource (group vs
forum vs dm, real chat_id for TELEGRAM_GROUP_ALLOWED_* allowlists),
falls back to env allowlists, and only rejects when an allowlist actually
exists — unknown DMs with no allowlist still reach the pairing flow.
Channel posts authorize via `sender_chat` identity when `from_user` is
absent.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: Carlos Manuel Cejas <carlosmcejas@gmail.com>
2026-06-28 14:25:15 -07:00
srojk34
61210097a5 fix(browser): extend private-network guard to browser_get_images
The SSRF cluster (7a6fe9bb, 48f5c425, 7ef04ae7) sealed
browser_snapshot, browser_vision, and _browser_eval against
eval-navigated private pages, but browser_get_images bypasses
_browser_eval and calls _run_browser_command("eval", ...) directly.
An eval-driven navigation to a private address followed by
browser_get_images would leak image src URLs and alt text from the
private page.

Add the same _eval_ssrf_guard_active + _current_page_private_url
recheck before returning image data, matching the pattern established
by the sibling guards.

5 new tests cover: block on private page, allow on public page, skip
for local backend, skip when private URLs allowed, no guard needed on
failed eval.
2026-06-28 14:25:10 -07:00
Brooklyn Nicholson
c7542358f2 fix(desktop): remote project picker UX and profile-scoped fs/git routing
Route FS/git REST through the active profile, mount the remote folder picker
at app root, keep the project dialog open while picking, show a first-run
blank state, flip into grouped view on create, and constrain the picker scroll
area so Select stays reachable.
2026-06-28 16:23:39 -05:00
Teknium
9a0010fd46 fix(windows): cover remaining console-flash spawn legs (#54417) 2026-06-28 13:49:08 -07:00
Teknium
b31b0b9d95 docs: reconcile docs with code across last 3 releases (#54254)
Audited the last 3 releases (v2026.5.28..main) against the docs site and
fixed code-vs-docs drift:

- slash-commands: add /moa, /prompt, /pet, /hatch, /timestamps
- cli-commands: add hermes pets / project / desktop / whatsapp-cloud +
  dashboard register; correct --insecure (now a deprecated no-op);
  add gateway migrate-legacy + enroll --wake-url + dashboard --skip-build
- environment-variables: document the remaining ~48 env vars (SimpleX,
  Photon, Teams adapter, per-platform *_ALLOW_ALL_USERS, home-channel vars,
  IRC, Brave/Krea/Notion/Linear/Airtable/Tenor keys, QQ_SANDBOX) — full
  OPTIONAL_ENV_VARS (265) now covered
- configuration: document tool_loop_guardrails, goals, prompt_caching,
  network, onboarding, dashboard config blocks
- toolsets/tools-reference + tools.md: add coding/project toolsets and
  read_terminal/project_* tools; remove the stale messaging toolset and
  send_message agent tool (removed in #47856); drop stale RL-training prose
- messaging: new IRC channel page (adapter shipped without docs) + index
  row + sidebar + env vars
- pets: document the /hatch AI generation pipeline + Nous/OpenRouter image
  backend
- web-dashboard: document the bearer-token / TokenPrincipal service auth path
- purge agent-callable send_message references across guides/features and
  the research-paper-writing skill (tool removed in #47856)

Verified: docusaurus build succeeds; all authored internal links resolve.
2026-06-28 12:47:50 -07:00
Brooklyn Nicholson
19bae1b9e0 test(desktop): assert new backend sessions carry workspace cwd
Pin the desktop-to-gateway cwd handoff: createBackendSessionForSend must pass
the current workspace cwd into session.create so the backend registers the
session cwd before the agent/tools run.
2026-06-28 14:44:28 -05:00
Brooklyn Nicholson
8d8c7111d9 refactor(desktop): keep remote fs routing inside the fs facade
Let UI callers ask for folders/files without knowing remote-picker limits:
selectDesktopPaths now normalizes remote directory selection to a single folder
inside the facade. Project creation and composer context picking no longer branch
on remote mode; they route through desktop-fs helpers just like git callers route
through desktopGit(). Behavior unchanged except remote folder context now works
through the same backend picker path.
2026-06-28 14:39:33 -05:00
Brooklyn Nicholson
453f134b3b refactor(desktop): centralize remote git REST routing
Keep the remote git mirror as a thin facade: route all GETs through gitGet,
all mutations through gitPost, and keep consumers on desktopGit(). On the
backend, route git paths through a single _git_path helper instead of repeating
str(_fs_path(...)) in every endpoint. Behavior unchanged.
2026-06-28 14:37:36 -05:00
Brooklyn Nicholson
4e9439cc3b fix(desktop): route composer context picking through remote-aware fs
Second pass on the remote-project flow: the project dialog and git cockpit were
remote-aware, but the composer's Add file/folder context picker still called the
native Electron picker directly. Route it through selectDesktopPaths so remote
sessions use the backend-aware picker instead of local disk paths; preserve local
multi-select behavior and keep remote folder selection single because the in-app
remote picker only supports one directory.

Also use readDesktopFileDataUrl for image previews so an already-known backend
image path can be read through /api/fs/read-data-url, and add focused coverage
for backend file-diff routing plus the plain-folder git init/worktree path.
2026-06-28 14:35:23 -05:00
Brooklyn Nicholson
9b71221187 fix(desktop): write project IDEA.md through the remote-aware fs path
writeProjectIdea used the local-only Electron writeTextFile, so on a remote
gateway IDEA.md never landed on the backend (where the project folder lives).
Route it through writeDesktopFileText (local Electron / POST /api/fs/write-text).
2026-06-28 14:31:58 -05:00
Brooklyn Nicholson
e4cf3a2e9d refactor(web_git): unify porcelain-v2 parsing into one walker
Collapse the two near-duplicate status parsers (_parse_status_v2 +
_iter_status_entries) into a single _walk_entries generator feeding the rail,
review list, and commit flow; share the staged predicate; hoist `import re`.
Behavior unchanged.
2026-06-28 14:29:59 -05:00
Brooklyn Nicholson
fc86e35764 feat(desktop): make the git cockpit work over a remote gateway
After the folder picker fix, an added remote folder was still half-usable:
the desktop's git GUI (coding-rail status, worktree lanes, review pane,
branch switch, file diff) all ran Electron-local git on the USER's machine,
so against a remote-gateway repo they silently degraded to empty.

Mirror the whole surface over the dashboard REST API so it acts on the
BACKEND repo where sessions actually run:

- hermes_cli/web_git.py: git/gh logic (status, worktrees, branches, review
  list/diff/stage/unstage/revert/commit/commit-context/push/ship-info/
  create-pr, file-diff, worktree add/remove, branch switch) shelling to the
  system git, mirroring the Electron ops' shapes.
- web_server.py: /api/git/* routes (same auth gate + _fs_path hardening as
  /api/fs, executor-offloaded, mutations -> 400).
- apps/desktop desktop-git.ts: remote-aware facade exposing the same shape as
  window.hermesDesktop.git; coding-status / review / projects / model /
  desktop-fs route through desktopGit() so local stays Electron, remote hits
  /api/git/*.

Tests: tests/hermes_cli/test_web_server_git.py (real repo: status counts,
review classification, diff incl. untracked all-add, stage+commit roundtrip,
worktree/branch lifecycle, commit-context, gh-absent ship-info, auth) and
desktop-git.test.ts (local vs remote routing, envelope unwrap, POST bodies).
2026-06-28 14:26:09 -05:00
Brooklyn Nicholson
304f0650c4 style(desktop): tighten pickProjectFolder comment 2026-06-28 14:13:36 -05:00
Brooklyn Nicholson
4526fccdbe fix(desktop): make project "Add folder" picker remote-gateway aware
The new-project / add-folder dialog (PR #49037) picked folders via the
native Electron dialog (pickDefaultProjectDir), which only browses the
LOCAL machine. On a remote gateway that picks a path that doesn't exist
on the backend where sessions actually run.

Route pickProjectFolder() through selectDesktopPaths({directories,
multiple:false}) — the same remote-aware path the retired right-sidebar
picker used: local mode opens the native directory dialog, remote mode
browses the backend filesystem via the in-app RemoteFolderPicker. Seed
it with the backend's default cwd on remote so it opens somewhere useful.
2026-06-28 13:49:45 -05:00
brooklyn!
b699d27a4a Merge pull request #54357 from NousResearch/bb/browser-chromium-autoinstall
feat(browser): auto-install Chromium binary on local cold-start failure
2026-06-28 12:36:22 -05:00
brooklyn!
27868e5b55 Merge pull request #54353 from NousResearch/bb/browser-first-open-timeout
fix(browser): extend first-open timeout & surface daemon errors on Linux (salvage #52575)
2026-06-28 12:32:41 -05:00
Brooklyn Nicholson
70292596ef feat(browser): auto-install Chromium binary on local cold-start failure
When a local browser_navigate (or any browser command) fails fast because
Chromium isn't on disk, attempt a one-shot binary download via
`agent-browser install` and retry instead of only printing a hint.

Scope is narrow on purpose:
- binary only, never `--with-deps` (that shells apt/needs root, so missing
  system libraries stay a user action)
- gated by `security.allow_lazy_installs` (same opt-out as every lazy install)
- skipped in Docker (Chromium ships in the image)
- attempted once per process

Follow-up to #54353, which made the cold-start failure legible; this closes
the "doesn't actually install the missing browser" gap for the common case.
2026-06-28 12:25:15 -05:00
Brooklyn Nicholson
1ab5c3cdda refactor(browser): drop redundant sandbox-hint substring check 2026-06-28 12:14:47 -05:00
infinitycrew39
7bb8aa3bd5 test(browser): cover open timeout diagnostics and failed navigate title
Add regression tests for open-command timeout floors, sandbox bypass,
stderr capture formatting, first-navigation timeout wiring, and desktop
failed-navigate labeling.
2026-06-28 12:14:21 -05:00
infinitycrew39
a10727a555 fix(browser): extend first-open timeout and surface daemon errors
Local browser_navigate cold-starts the agent-browser daemon and Chromium;
60s was too short on slow Linux hosts and timeouts discarded stderr,
leaving users with a generic failure. Use a 120s floor on first open,
inject --no-sandbox in Docker, include captured daemon output plus install
hints when commands time out, and show "Failed to open" in the desktop
tool chip when navigation returns success=false.
2026-06-28 12:14:21 -05:00
brooklyn!
23021be26e Merge pull request #52656 from helix4u/fix-desktop-empty-resume-view
fix(desktop): retry empty resumed transcripts
2026-06-28 11:57:57 -05:00
ygd58
3e16176ba4 fix(tools): reconcile agent.disabled_toolsets when a toolset is enabled
_get_platform_tools() applies agent.disabled_toolsets as a final
override AFTER reading platform_toolsets.<platform>, so a toolset
listed there stays permanently OFF no matter what the toggle write
path saves. Blank Slate installs pre-populate this list with ~27
toolsets, making most of the desktop Toolsets UI un-enableable
(issue #49995).

Fix: _save_platform_tools() now removes any toolset the user just
explicitly enabled FOR THIS PLATFORM from agent.disabled_toolsets.
Toolsets the user did not touch, or that remain disabled on other
platforms, are left alone -- disabled_toolsets keeps working as a
cross-platform suppression list for anything not actively re-enabled.
Disabling a toolset (unchecking it) does not touch disabled_toolsets
at all -- only enables reconcile it.

Verified end-to-end with the exact repro from the issue: Blank Slate
config (disabled_toolsets=['todo','memory','browser'], cli=['file',
'terminal']) -> enable 'todo' via the toggle -> _get_platform_tools()
now resolves 'todo' as enabled while 'memory'/'browser' (untouched)
remain disabled.

Added 4 regression tests. Full tools_config suite: 101 passed
(97 existing + 4 new), no regressions.

Fixes #49995
2026-06-28 21:59:03 +05:30
brooklyn!
020966574d Merge pull request #53892 from NousResearch/bb/windows-popup-spawn-legs 2026-06-28 11:16:35 -05:00
Brooklyn Nicholson
eeca59f489 fix(windows): hide remaining backend console-flash legs missed on main
main (cb982ad99) wired windows_hide_flags() into the auxiliary git/gh/wmic/
bash/powershell/taskkill legs but left two it didn't reach, plus the Electron
backend-launch leg it explicitly deferred. Cover them the same way:

- apps/desktop/electron/main.cjs: getNoConsoleVenvPython resolves the BASE
  pythonw.exe instead of the venv Scripts\pythonw.exe shim, which re-execs a
  console python.exe and flashes a conhost the desktop backend can't suppress.
  Both backend creators put the venv site-packages on PYTHONPATH so imports
  still resolve under the base interpreter. (main's commit said this Electron
  leg "needs a Windows-tested change of its own".)
- tools/tts_tool.py, tools/transcription_tools.py, plugins/platforms/discord:
  ffmpeg conversions (voice notes / TTS / STT) via windows_hide_flags().
- plugins/platforms/whatsapp: netstat + taskkill bridge-port cleanup via
  windows_hide_flags().

All no-ops on POSIX. Tests assert the base-pythonw preference and the ffmpeg
legs pass CREATE_NO_WINDOW.
2026-06-28 10:19:21 -05:00
Teknium
0c2e6c0049 test: make active session cross-process race deterministic (#54248) 2026-06-28 05:49:21 -07:00
teknium1
1ffa01f35f test(windows): cover no-window backend subprocess flags 2026-06-28 05:28:45 -07:00
Teknium
cb982ad997 fix(windows): hide console-window flash on backend git/gh/wmic/bash subprocess spawns
The Windows desktop GUI runs its backend headless via pythonw.exe. Several
auxiliary subprocess sites that run inside that windowless backend spawned
console-subsystem children (git, gh, wmic, powershell, bash, rg, taskkill)
WITHOUT CREATE_NO_WINDOW, so Windows allocated a fresh conhost per call and
flashed a black window on screen — sometimes continuously (the dashboard
Projects-tree git probe alone fired ~118 spawns in 60s on startup).

The terminal tool, cron, browser, code_execution, and gateway-spawn paths
already carry windows_hide_flags(); these auxiliary probe/scan/launcher legs
were missed. Wire the existing helper into them:

- tui_gateway/git_probe.py: run_git (+ encoding=utf-8/errors=replace, fixes the
  cp950 UnicodeDecodeError on CJK paths from the same site)
- agent/coding_context.py: _git (per-turn git status/log/diff)
- agent/context_references.py: _run_git + _rg_files (@file/@ref resolution)
- hermes_cli/copilot_auth.py: gh auth token probe (auxiliary provider:auto)
- hermes_cli/gateway.py: wmic + PowerShell Get-CimInstance PID scan
- hermes_cli/main.py: wmic stale-dashboard PID scan
- gateway/status.py: taskkill /T /F force-kill

windows_hide_flags() returns 0 on POSIX, so every changed call is a no-op on
Linux/macOS (verified: real git/rg probes still work; Windows-simulated calls
all pass creationflags=CREATE_NO_WINDOW).

Scoped to the windowless-backend paths that cause the reported flashing. The
Electron updater-handoff leg (main.cjs windowsHide:false) and the
interactive-CLI banner probes (cli.py) are intentionally NOT touched here —
the former needs a Windows-tested change of its own, the latter runs in a
visible console anyway.

Tracking: #54220
Refs: #53178 #53631 #53781 #53957 #49602 #52982 #53424 #53053 #53016
2026-06-28 05:28:45 -07:00
teknium1
f25f235722 chore: map salvaged PR #49845 author email for AUTHOR_MAP 2026-06-28 04:47:39 -07:00
homelab-ha-agent
d05cc8f4d6 fix(mcp): skip preflight content-type probe for OAuth servers
OAuth-protected MCP servers (e.g. Hospitable) return 200 text/html on an
unauthenticated HEAD probe — a login/landing page the server cannot substitute
for a real MCP response without a Bearer token.  The preflight cannot
distinguish this from a misconfigured URL, so it raises NonMcpEndpointError
before the OAuth browser flow has a chance to run.

Add `and self._auth_type != "oauth"` to the preflight condition in
MCPServerTask.run().  The probe is inapplicable to OAuth servers: their URL
legitimacy is established by .well-known/oauth-protected-resource during the
OAuth handshake, not by a GET content-type check.

Concrete repro: Hospitable (https://mcp.hospitable.com/mcp) returns
`200 text/html` to an unauthenticated httpx HEAD.  Without the guard:
  ✗ NonMcpEndpointError at `hermes mcp test`
With the guard:
  ✓ Connected (1487ms) — 63 tools discovered

Relation to open PRs:
- #37598 adds a POST probe fallback for POST-only non-OAuth servers (e.g.
  DocuSeal), but only passes when POST returns 2xx + MCP content-type.
  Hospitable returns 401 on the POST probe (Bearer challenge), so #37598
  does not cover this case.
- #49463 extends the POST probe to also pass on non-2xx auth challenges
  (making it OAuth-aware), but is labeled duplicate of #37598 and may not
  land independently.
This fix is complementary: it handles OAuth servers with zero extra
round-trips rather than adding a POST probe step.

Tests:
- test_oauth_server_html_response_raises_without_skip: documents that
  _preflight_content_type raises NonMcpEndpointError for 200 text/html
  (the underlying issue), with an OAuth-server docstring.
- test_run_skips_preflight_for_oauth: verifies that run() does NOT invoke
  _preflight_content_type when auth_type=="oauth", using class-level
  monkeypatching so the gate is exercised without a live MCP transport.

23 passed  tests/tools/test_mcp_preflight_content_type.py
2026-06-28 04:47:39 -07:00
liuhao1024
9d919daf44 fix(gateway): mark platform lock failure as retryable instead of permanently fatal
When a stale lock file survives a gateway crash, `acquire_scoped_lock()`
may return `(False, existing_dict)` even after detecting and deleting
the stale lock (e.g. if unlink fails or a race condition occurs).

Previously, `_acquire_platform_lock()` called
`_set_fatal_error(..., retryable=False)`, which permanently killed the
platform — the reconnect watcher never retries a non-retryable fatal
error.

Change to `retryable=True` so the platform enters the "retrying"
state and the reconnect watcher can attempt acquisition again after the
standard backoff delay.

Fixes #54167
2026-06-28 04:35:37 -07:00
teknium1
61622bb56a fix(tui): use role=user for model switch marker to avoid HTTP 400 on strict providers (#48338)
_append_model_switch_marker() appended the post-/model-switch context marker
to session history as {"role": "system"}. The cached system prompt is
prepended to the API message list (conversation_loop.py), so this marker
became a SECOND system message mid-array after prior user/assistant turns.
Strict OpenAI-compatible providers (vLLM, Qwen) reject any system message
that is not at the beginning of the array, returning HTTP 400 and killing
the conversation on the next turn.

Flip the marker to role="user" (history entry + both session-DB persist
sites), matching the existing personality-overlay marker which already uses
role="user". repair_message_sequence() then coalesces it with adjacent user
turns as needed.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: Lucas Nicolas <lucas.nicolas@proton.me>
2026-06-28 04:34:55 -07:00
Brad Hallett
376d021fee fix(desktop): force app exit after update/uninstall handoff on macOS
On macOS app.quit() closes windows but window-all-closed deliberately keeps
the process alive (Dock convention). Every detached hand-off (update swap,
relaunch, Windows bootstrap recovery, uninstall cleanup) waits for the
desktop PID to exit before replacing/removing the bundle — so the process
never dying means the script spins its full PID-wait and the user sees a
blank app, or an uninstall that appears to do nothing.

Add a module-level isQuittingForHandoff flag, set before every hand-off
app.quit(); window-all-closed then quits on all platforms when it's set.

Covers all five hand-off sites including the Linux relaunch path.
2026-06-28 04:30:14 -07:00
teknium1
e54bedd8ea docs: add infographic for #42006 launchd bootout fix 2026-06-28 04:17:13 -07:00
izumi0uu
c4719aa51c fix(gateway): boot out stale launchd registration before restart bootstrap
launchd restart can leave the gateway job stopped but still registered after
update-time drain logic, so a direct bootstrap hits exit 5 and falls back to a
detached process. Booting the stale registration out before bootstrap keeps the
launchd-managed restart path intact and locks it with a regression test.

Constraint: Keep upstream-facing conventional commit style while preserving local decision context
Rejected: Treat bootstrap exit 5 as expected | Leaves macOS launchd restart outside launchd supervision after update
Confidence: high
Scope-risk: narrow
Directive: Keep launchd start/restart recovery flows aligned when changing launchctl handling
Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k "launchd_restart_boots_out_stale_registration_before_bootstrap or launchd_restart_falls_back_to_detached_on_error_5 or launchd_restart_drains_running_gateway_before_kickstart or launchd_restart_self_requests_graceful_restart_without_kickstart"
Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k launchd
Not-tested: Manual macOS launchctl restart after hermes update
2026-06-28 04:17:13 -07:00
Teknium
52a853f5c3 fix(test): pin monotonic clock in spinner-elapsed test to fix CI flake (#54203)
test_spinner_elapsed_format_is_fixed_width_to_reduce_wrap_jitter derived
_tool_start_time from the live time.monotonic() clock (now - 65.2 / now - 9.2).
monotonic()'s epoch is arbitrary — on a host where monotonic() < 65.2 (fresh
subprocess on a freshly-booted CI runner) the start time went negative, the
(t0 > 0) guard in _render_spinner_text() dropped the '(elapsed)' suffix, and
short.split('(',1)[1] raised IndexError: list index out of range. Deterministic
given a small clock, so it would keep flaking, not clear on rerun.

Pin time.monotonic to a fixed 1000.0 and offset _tool_start_time from it so both
the <60s and >=60s paths always render the elapsed suffix regardless of the
runner's monotonic epoch.

Pre-existing main flake (surfaced in CI test slice 1/8).
2026-06-28 04:16:25 -07:00
Teknium
8e356eccea docs(readme): trim provider list to a few names plus docs link (#54169)
The README line enumerated 11 providers inline, which dilutes the point
and goes stale as providers come and go. Replace with Nous Portal,
OpenRouter, OpenAI, your own endpoint, and a 'many others' link to the
canonical AI Providers docs page that already lists them all.
2026-06-28 04:14:59 -07:00
teknium1
f22b9d3867 docs: add infographic for MCP WS discovery fix (#38945) 2026-06-28 04:14:12 -07:00
Cornna
5c2c85c545 fix(tui): start MCP discovery for websocket sessions
The desktop app and dashboard chat reach the agent through the /api/ws
JSON-RPC sidecar (tui_gateway.ws.handle_ws), NOT through
tui_gateway.entry.main() — the stdio-TUI path that spawns the background
MCP discovery thread. In the WS process discovery was therefore never
started: _make_agent only *waits* (wait_for_mcp_discovery), which no-ops
when the thread was never created, so the agent snapshotted an MCP-less
tool list. The only discovery trigger reachable was a manual /reload-mcp,
which is why tools appeared after a reload but vanished on restart.

Start the shared, idempotent, config-gated background discovery in
handle_ws right after accept() and before gateway.ready, so the first
agent build picks up already-spawning servers (and the existing
late-binding refresh handles slow ones).

Fixes #38945.
2026-06-28 04:14:12 -07:00
teknium1
091ce825fe test(redact): fix file_read regression-guard for current-main YAML collapse
The salvaged #35519 regression guard asserted that default (non-file_read)
mode keeps a head/tail `ghp_S1...Pn2T` mask for a `token: <key>` line. On
current main the YAML config pass (`_YAML_ASSIGN_RE`, key `token`) re-masks
the already-prefix-masked value to `***`, so the assertion was stale. Switch
to a bare-token context so the guard isolates what it claims (prefix-mask
head/tail shape in default mode) without depending on the YAML collapse.
2026-06-28 04:13:20 -07:00
kshitijk4poor
de928bccde fix(redact): non-reusable sentinel for prefix secrets in file reads (#35519)
When security.redact_secrets is on (default), read_file/search_files/cat
applied redact_sensitive_text(code_file=True) to file content, which still
ran prefix masking. An API key in config.yaml (ghp_..., sk-..., xai-..., etc.)
came back as a head/tail mask like `ghp_S1...Pn2T` — a plausible-looking
truncated key. When an agent read that and wrote it back to config, the masked
value replaced the real credential, silently breaking auth (401). Production
evidence: a config.yaml found containing the exact 13-char masked GitHub PAT.

The two community PRs (#35529, #35534) fixed the corruption by NOT redacting
prefixes for config reads — but that exposes the user's real keys to the agent
context, model, and logs (a security regression). This takes the safer route:
keep redacting, but for file content emit a NON-REUSABLE sentinel.

- New `_mask_token_nonreusable`: prefix secrets -> `«redacted:ghp_…»` (vendor
  label preserved for debuggability; zero secret bytes; angle-bracket/ellipsis
  wrapper is syntactically invalid as a token so it can't be mistaken for or
  written back as a usable key).
- New `redact_sensitive_text(file_read=True)` routes prefix matches through it
  (implies code_file=True). Default/log/display mode is UNCHANGED — `_mask_token`
  still keeps head/tail (fine for logs, never written back).
- Wired the 3 file_tools.py call sites (read_file / search_files / cat) to
  file_read=True.

Fixes both the corruption AND avoids the secret-exposure of the un-redact
approach. 6 new tests (sentinel shape, no-leak, not-a-plausible-key, default
mode unchanged, file_read implies code_file, sk- prefix); 88 redact tests pass;
mutation-verified (reverting to the old mask fails the sentinel/leak tests).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: adammatski1972 <289282750+adammatski1972@users.noreply.github.com>

Closes #35519. Supersedes #35529, #35534.
2026-06-28 04:13:20 -07:00
teknium1
19cbbe304a docs: add infographic for clarify typed-replies fix 2026-06-28 04:13:19 -07:00
tymrtn
d7f655f370 fix: accept typed clarify choice replies 2026-06-28 04:13:19 -07:00
teknium1
9bb5a809b5 fix(gateway): make zombie check defensive against partial psutil stubs
The zombie status probe referenced psutil.Process/NoSuchProcess/Error
unconditionally, which raised AttributeError when psutil is a partial
stub that only defines pid_exists (as in test_windows_native_support's
fallback tests). Guard the probe so any failure to read status degrades
to the authoritative pid_exists() instead of raising.
2026-06-28 04:11:14 -07:00
MorAlekss
acca526286 fix(gateway): treat zombie PIDs as dead in _pid_exists to unblock --replace (closes #42126)
Under systemd Restart=always, the old gateway becomes a zombie (in the
process table, awaiting reap) when the replacement starts. _pid_exists()
reported the zombie as alive, so --replace waited on a PID that never
dies, then aborted with exit 1 — a silent crash loop. Standalone runs are
unaffected because nothing respawns the gateway into a zombie.

The live path is psutil.pid_exists(), which returns True for zombies, so
the check is added there (Process.status() == STATUS_ZOMBIE -> dead). The
psutil-less POSIX fallback also reads /proc/<pid>/stat (state Z) with a ps
state= fallback for macOS/BSD, before the os.kill(pid, 0) liveness probe.

Diagnosis and the /proc + ps POSIX fallback by MorAlekss (PR #44898);
extended to cover the psutil hot path so the fix applies on normal installs.

Co-authored-by: MorAlekss <mor.aleksandr@yahoo.com>
2026-06-28 04:11:14 -07:00
teknium1
463225caf1 fix(gateway): bypass legacy-unit prompt in non-TTY systemd install
Folds in PR #42124 (kyssta-exe): systemd_install gained a non_interactive
flag so the 'Remove the legacy unit(s)?' prompt — the second hidden prompt
not guarded by --start-now/--start-on-login — is also skipped in headless
contexts. Updates systemd_install test mocks to accept the new kwarg and
adds coverage for the legacy-unit-skip path.
2026-06-28 04:09:54 -07:00
liuhao1024
831d443b03 fix(gateway): honor --start-now/--start-on-login flags and support non-TTY headless installs
When running `hermes gateway install` on Linux/systemd, the command
unconditionally prompts with two `prompt_yes_no` questions, breaking
headless installs (SSH, CI, provisioning scripts) and ignoring the
existing --start-now / --start-on-login CLI flags that the Windows
branch already respects.

The fix mirrors the Windows path: read CLI flags first, prompt only
when flags are not provided AND stdin is a TTY, and fall back to True
defaults for non-TTY contexts. The argparse help strings are promoted
from SUPPRESS to visible so users can discover the flags.

Fixes #42065
2026-06-28 04:09:54 -07:00
Teknium
5e7bca95d9 fix(tui): coalesce render frames while stdout backpressure is unresolved (#31486) (#54171)
When the previous frame's stdout.write has not drained (the outer terminal
parser is overwhelmed by a wide CR+LF burst — CJK + ANSI tool output on a
high-context session), the renderer kept writing a new frame every tick. That
piled writes onto an already-backed-up pipe and kept the macrotask queue hot,
starving the stdin 'readable' callback — the observed stdin freeze where the
agent loop keeps running but keystrokes/Ctrl-C are dead.

onRender now coalesces: while pendingWriteStart is non-null (prior write's
drain callback hasn't fired) it skips the frame and retries on the drain tick
instead of writing. A MAX_COALESCED_BACKPRESSURE_FRAMES ceiling forces a write
through after N skips so a terminal whose drain callback never fires (OSError
EIO on flush) self-heals once the pipe recovers rather than wedging forever.
TTY-only; piped stdout has no flow control. Coalesce counter resets on every
real write.

This is the stdout-backpressure strand left open after #54046 fixed the
swallowed-exception strand.
2026-06-28 04:00:22 -07:00
Teknium
a06d0198cd fix(dashboard): reap PTY bridge on child EOF, not only in writer finally (#54190)
The /api/pty handler only closed the PtyBridge in the writer loop's finally.
On child EOF the reader task closes the WebSocket, but if the handler task is
cancelled the instant the socket closes, the writer's finally can be skipped
and the PTY fds leak (#54028) — the FD-leak the regression test guards. Under
dashboard auto-reconnect this stacks orphaned PTYs until fds are exhausted.

Reap the bridge in the reader's EOF finally too (close() is idempotent), so
the PTY is reaped independently of the writer-loop cancellation race. Harden
the regression test to poll for teardown instead of asserting on the same
tick. Was flaky on main (2/20); now 25/25.
2026-06-28 03:58:18 -07:00
Teknium
7968c90318 test(install): track run_with_timeout extraction after #39219 refactor (#54185)
PR #39219 split run_browser_install_with_timeout into a thin wrapper that
delegates to a new run_with_timeout helper (and parameterized the timeout
binary as $timeout_bin for macOS gtimeout support), but did not update
tests/test_install_sh_browser_install.py. The behavioral harness extracted
only the now-empty wrapper, so the install command never ran (runs==[]),
failing all 8 behavioral cases; two text assertions also still expected the
old literal 'timeout' invocation.

Fix the stale test: extract run_with_timeout alongside the wrapper, and match
the $timeout_bin-parameterized GNU-timeout strings. Behavior unchanged.
2026-06-28 03:58:01 -07:00
Christian Persico
135f235165 docs: fix incorrect web search instructions 2026-06-28 02:54:27 -07:00
kshitijk4poor
546193aa6d fix(install): time-box desktop + node-deps installs so a stalled download self-heals (#39219)
The desktop install step ran npm ci / npm run pack with no wall-clock cap, and
the sibling browser-tools / TUI / agent-browser dependency installs had the same
gap. The Electron binary (~150MB) is fetched from GitHub during the pack; on a
throttled or region-blocked link that download can *stall* rather than fail —
npm never errors and never exits, so the installer sits on "Build desktop app"
(step 9/11) indefinitely with only harmless 'npm warn deprecated' lines visible.
The existing self-heal escalation (cache purge -> dist restore -> npmmirror
fallback) only fires when pack returns non-zero, so a stall bypassed it.

- run_with_timeout (generalized from run_browser_install_with_timeout): GNU
  timeout --foreground -k 10 (Ctrl+C-aware, #35166) / gtimeout for external
  commands, else a pure-shell process-group watchdog so stock macOS (neither
  binary present) is protected. Shell functions (_desktop_pack) always take the
  pure-shell path — the timeout binary can't exec a function. Integer-normalized
  budget + a boundary recheck so a command finishing in the final poll second
  isn't mislabeled 124. The internal wait is guarded so set -e can't abort
  mid-function before the real exit code is computed.
- Wrap the desktop npm ci/install (sharing ONE budget via a computed deadline so
  a stall can't cost 2x DESKTOP_BUILD_TIMEOUT) + all three _desktop_pack attempts
  (DESKTOP_BUILD_TIMEOUT, default 900s), and the browser-tools / TUI / agent-
  browser registry installs (NODE_DEPS_TIMEOUT, default 600s).

A stall now converts to a bounded non-zero exit that feeds the existing mirror
self-heal instead of hanging the whole install.
2026-06-28 02:47:47 -07:00
Teknium
c1c179a239 fix(security): redact secrets in background process + foreground env-dump output (#43025) (#54149)
* fix(security): redact secrets in background process + foreground env-dump output

Terminal-output redaction was incomplete (#43025):

- Gap 1: process(action=poll/log/wait) returned background stdout verbatim —
  no redaction at all. A background printenv/server/test emitting a key leaked
  raw to the model, session.db, and CLI display. Same for the gateway
  background-process watcher's completion/progress notifications.
- Gap 2: the foreground terminal path hardcoded code_file=True, which skips the
  ENV-assignment pass, so an opaque token (no vendor prefix) from env/printenv
  leaked even there.

Adds agent.redact.redact_terminal_output(output, command) as the single policy
for ALL terminal-output surfaces: env-dump commands (env/printenv/set/export/
declare) get the ENV-assignment pass (code_file=False) to mask opaque tokens;
other commands stay on code_file=True to avoid false positives on source dumps.
Wired into terminal_tool, process_registry (_handle_process boundary), and the
gateway watcher. Respects security.redact_secrets (no force) — opt-out preserved.

* docs: add infographic for #43025 terminal-output redaction fix
2026-06-28 02:44:21 -07:00
teknium1
d5ba374c03 fix(telegram): detect wedged getUpdates consumer via pending_update_count
The merged CLOSE-WAIT heartbeat (#52744) only probes get_me(), which uses the
general request path and stays healthy while PTB's getUpdates consumer is
silently wedged (updater.running=True but the long-poll task is stuck, observed
on WSL2). DMs then queue in the Bot API and never reach handlers (#42909).

Augment the existing _polling_heartbeat_loop to also probe
get_webhook_info().pending_update_count. After two consecutive probes that see a
non-draining queue while the updater claims to be running, escalate into the
existing _handle_polling_network_error recovery ladder — no new restart
machinery. No-ops in webhook mode, when the updater is not running, or when a
reconnect is already in flight.

Credit to @gazzumatteo, whose PR #42959 identified the pending_update_count
signal as the missing liveness probe. This reuses the existing heartbeat +
recovery path rather than adding a parallel watchdog.

Fixes #42909.
2026-06-28 02:44:17 -07:00
teknium1
822b71cbf8 docs: add infographic for #43083 secret-redaction fix 2026-06-28 02:44:06 -07:00
teknium1
bbe1bf4045 fix(agent): stop redacting tool-call args in history; fix auth-header quote-eating
Two related redaction bugs from #43083:

1. build_assistant_message redacted tool-call arguments in-memory. That dict
   feeds both the replayed conversation history and state.db (which is itself
   replayed verbatim on session resume), so the model read back its own
   PGPASSWORD='***' psql call and copied the placeholder, breaking every
   credential-dependent command on the second turn. The masking gave no real
   protection either — the same secret still leaks through tool OUTPUT. Remove
   it. Keeping secrets out of the replayable store is a separate
   tokenization/vault concern (security.redact_secrets still governs
   storage-time redaction elsewhere).

2. _AUTH_HEADER_RE's greedy \S+ credential class ate a closing quote when the
   token sat flush against it (Authorization: Bearer sk-.."), turning value
   corruption into syntax corruption (unterminated quote -> shell EOF /
   SyntaxError). Exclude " and ' from the token class; real credentials never
   contain them.

Closes #43083.
2026-06-28 02:44:06 -07:00
yoniebans
204a67f0c8 fix(kanban): retry write_txn on transient SQLITE_BUSY 2026-06-28 02:44:04 -07:00
yoniebans
90c1dc0493 test(kanban): cover write_txn BUSY retry (currently failing) 2026-06-28 02:44:04 -07:00
teknium1
9844243b18 fix(gateway): gate quick_commands through slash access policy
Config-backed quick_commands bypassed the admin-only slash gate. The
early gate in _handle_message only fires for registry-known commands
(is_gateway_known_command), but quick_commands are never in the gateway
registry, so they reached the type:exec dispatch sink unchecked. An
allowlisted non-admin gateway user could invoke admin-only quick
commands — including shell exec in the gateway process — even when the
operator set allow_admin_from / user_allowed_commands to lock them out.

Apply _check_slash_access(source, command) at the quick_commands
dispatch site (the single exec chokepoint, cold-path only) using the
raw typed name. Admins and users with the command in
user_allowed_commands still run it; backward-compat (no policy set)
is unaffected.

Fixes #44727.

Co-authored-by: maxpetrusenko <max.petrusenko.agent@gmail.com>
Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
2026-06-28 02:43:23 -07:00
Teknium
6d879d486b fix(dashboard): close PTY WebSocket on child EOF to stop FD leak (#54028) (#54123)
* fix(dashboard): close PTY WebSocket on child EOF to stop FD leak

The /api/pty handler's reader task returns on child EOF, but the writer
loop stayed blocked on ws.receive() until the browser sent a disconnect.
When the browser socket is half-open (no FIN delivered — common on
macOS/launchd), that disconnect never arrives, so the handler never
reaches its finally and the PTY master fd + child process leak. With
dashboard auto-reconnect (#52962), every dropped socket then spawns a
fresh PTY on top of the orphaned one, exhausting file descriptors within
hours (EMFILE / Errno 24).

Fix: the reader task now closes the WebSocket in a finally when the child
EOFs or the send side breaks, which unblocks ws.receive() so the existing
finally runs bridge.close(). The writer loop also guards ws.receive()
against the RuntimeError Starlette raises once the socket is closed.

Reported by @fifteenzhang.

Fixes #54028

* docs: add infographic for #54028 PTY FD leak fix
2026-06-28 02:42:21 -07:00
teknium1
7ef04ae7a7 fix(browser): close eval return-value SSRF bypass (sibling of #44731)
The snapshot/vision guards re-check the page URL before returning content,
but browser_console(expression=...) -> _browser_eval returns arbitrary JS
results directly, leaving two same-class bypasses open:

  1. Direct fetch: fetch('http://127.0.0.1/secret').then(r=>r.text()) reads
     a private endpoint and returns the body — the page URL stays public so
     the post-eval recheck never sees it.
  2. Navigate-then-read: location.href='http://127.0.0.1/' then a later eval
     reads document.body.innerText.

Guard _browser_eval on the same condition as navigate/snapshot/vision
(not local backend, not local sidecar, not allow_private_urls):
  - pre-scan the expression for private/always-blocked URL literals
  - re-check window.location.href after the eval at both success-return
    sites (supervisor fast-path + subprocess fallback)

Probe failures fail-open (matching the snapshot/vision guards).
2026-06-28 02:42:01 -07:00
liuhao1024
0ae6196087 fix(browser): allow local sidecar sessions to bypass SSRF guard
The private-network guard in browser_snapshot() and browser_vision()
blocked all private URLs, including those accessed via local sidecar
sessions (hybrid routing). Local sidecar sessions intentionally access
private URLs — the cloud provider never sees the URL in that case.

Add `_is_local_sidecar_key(effective_task_id)` check to both guards,
matching the existing pattern in browser_navigate().

Fixes #45101 review feedback from egilewski.
2026-06-28 02:42:01 -07:00
liuhao1024
48f5c42599 fix(browser): extend private-network guard to browser_vision
The SSRF bypass in #44731 was only patched for browser_snapshot(), but
browser_vision() exposes the same vulnerability — it takes a screenshot
and sends it to the vision model without checking if eval-driven
navigation moved the page to a private/internal URL.

Add the same current-page URL safety check to browser_vision() before
any screenshot is captured, encoded, or forwarded to the vision model.
This covers both the normal screenshot path and the Lightpanda Chrome
fallback path.

7 new tests: blocks private URL, allows public URL, skips in local
backend, skips when private URLs allowed, handles eval failure/empty/exception.
2026-06-28 02:42:01 -07:00
liuhao1024
7a6fe9bbfa fix(browser): block snapshot from eval-navigated private pages
browser_snapshot() now checks the current page URL before returning
content. When browser_console() changes location.href to a private or
internal address (e.g., http://127.0.0.1:8080/), the snapshot returns
an error instead of exposing the private page content.

This closes the SSRF bypass where an attacker could:
1. Navigate to a public page
2. Use browser_console to eval location.href = 'http://127.0.0.1:port/'
3. Use browser_snapshot to read the private page content

The fix reuses the existing _is_safe_url() and _allow_private_urls()
infrastructure, and fails open if the URL check itself fails.

Fixes #44731
2026-06-28 02:42:01 -07:00
Teknium
7c0a5def58 fix(memory/holographic): close DB connection on shutdown instead of leaking to GC (#54133)
HolographicMemoryProvider.shutdown() dropped its MemoryStore reference
without calling the existing MemoryStore.close(). Since the connection is
opened check_same_thread=False (one per session), its fd was released by
refcount/GC at a non-deterministic time on a non-deterministic thread,
churning a DB fd through the kernel free pool on every session teardown.
Call close() so the fd is released deterministically.

Reported by @alfranli123 (#44037), who pinpointed the exact code location.
Note: the report's TLS-fd-recycle corruption attribution could not be
reproduced from the code — dropping a sqlite connection flushes valid
SQLite pages via the VFS, never TLS framing, and the provider is at most a
releaser of DB fds, not a TLS-flushing socket owner. This change is correct
resource hygiene that removes per-session fd churn regardless.
2026-06-28 02:41:52 -07:00
Teknium
00d8c2c915 fix(gateway): prune stale sessions.json entries on startup
A hard gateway crash (exit code 1) skips the graceful shutdown path, so
sessions.json is never cleared and is left pointing at sessions already
ended in state.db. On the next startup get_or_create_session() reuses
those stale entries as long as the time/policy reset checks pass — it
never consults end_reason — so every incoming message is silently routed
into a closed session, with no log or error (#52804).

SessionStore._ensure_loaded_locked() now calls a new
_prune_stale_sessions_locked() that drops any entry whose session_id has
end_reason IS NOT NULL in state.db. Idempotent, _db=None / legacy-absent
safe, DB errors non-fatal, sessions.json rewritten only when something
was pruned. Self-heals into a fresh session on the next message.

Reported and diagnosed by @terry197913 (#52808).
2026-06-28 02:41:47 -07:00
teknium1
c38dfba3a7 docs: add infographic for #53175 gateway cleanup off-loop fix 2026-06-28 02:41:36 -07:00
teknium1
ea5aaa7a22 fix(gateway): offload remaining inline agent cleanup off the event loop (#53175)
#35994 moved /new reset cleanup off the loop, but _cleanup_agent_resources
(agent.close() subprocess teardown; shutdown_memory_provider() plugin IO) was
still called INLINE on the event loop from three other sites:

  - _session_expiry_watcher (5-min idle sweep) — live loop
  - _handle_message_with_agent cache-hygiene re-eviction — live loop
  - _finalize_shutdown_agents / stop() idle-cache loop — shutdown

A wedged memory provider on any of these froze the loop: bot goes silent,
runtime-status updated_at heartbeat stops advancing, and SIGTERM can't be
serviced (requires kill -9) — exactly the #53175 zombie pattern.

Adds _cleanup_agent_resources_off_loop: a bounded (30s) worker-thread offload
mirroring the #35994 reset fix, and routes all four sites through it.
2026-06-28 02:41:36 -07:00
teknium1
aa50c1ba5d fix(prompt): repair backend probe import (get_environment never existed)
The system-prompt backend probe imported a nonexistent symbol —
`from tools.environments import get_environment` — which always raised
ImportError: cannot import name 'get_environment'. The exception is caught
and only drops the live backend description to a static fallback, so it is
cosmetic, but it broke the live OS/user/cwd probe for every non-local
backend (docker/singularity/modal/daytona/ssh).

The real factory is `_create_environment` in tools.terminal_tool. Build the
environment the same way the live terminal path does (select backend image,
assemble ssh/container config from _get_env_config()), then run the probe.

Note: this does NOT affect tool loading — tool selection runs each tool's
check_fn and never consults this probe. Regression from #52147 (2026-06-25).

Closes #53667 (probe import); the 'cronjob-only' tool-collapse symptom is
not reproducible — tool selection has no probe dependency and memory's
check_fn is unconditionally True.
2026-06-28 02:41:31 -07:00
Teknium
b508d4296e test(ci): raise per-file timeout 140s → 300s to stop false timeouts (#54143)
* test(ci): raise per-file timeout 140s to 300s to stop false timeouts

The per-file parallel runner caps each test-file subprocess at a flat
wall-clock budget. Combined with per-test subprocess isolation (a fresh
Python process per test), a large-collection file pays N x (interpreter
startup + import) of overhead before any test logic runs. That overhead
dilates under load on shared CI runners, so a file that finishes in
~100s on a quiet box can blow the old 140s cap purely from scheduling
jitter, surfacing as a false 'no tests ran' timeout (rc=124) with zero
actual test failures.

Raise the default to 300s (5 min). The Docker build matrix jobs already
take 7-10 min, so this headroom costs nothing on total CI wall time
while still bounding a genuinely hung file.

* docs: add infographic for CI per-file timeout bump
2026-06-28 02:41:07 -07:00
teknium1
dcc6cd1b42 docs: add infographic for #52378 Windows update-loop salvage 2026-06-28 02:40:37 -07:00
teknium1
fe89ce0694 chore(release): map Cossackx in AUTHOR_MAP for #52528 salvage 2026-06-28 02:40:37 -07:00
Cossackx
ba37c910e0 fix(desktop/windows): resolve real hermes over extensionless shim + prefer --update on recovery
Two Windows-only desktop boot bugs that caused spurious reinstall/repair loops:

1. findOnPath() searched the empty extension BEFORE PATHEXT, so an
   extensionless Git-Bash `hermes` shim shadowed the real hermes.cmd/.exe.
   The shim then failed the shell:false --version probe and the resolver
   fell through to bootstrap/repair even though a working CLI was on PATH.
   Fix: try PATHEXT extensions first, keep the empty entry LAST so callers
   that already include the extension (py.exe, pwsh.exe) still resolve.

2. handOffWindowsBootstrapRecovery() chose the destructive --repair over the
   gentle --update by checking only venv\Scripts\hermes.exe -- the setuptools
   console-script shim, written at the END of venv setup and absent in
   interrupted/quarantined states. Fix: take --update when ANY real-install
   signal is present (venv python, the shim, or .hermes-bootstrap-complete).

Adds windows-hermes-resolution.test.cjs (source-assertion pattern, wired into
test:desktop:platforms) guarding both regressions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 02:40:37 -07:00
Cornna
0229246ab8 fix(desktop): probe venv runtime health before trusting bootstrap marker
A broken/empty Windows launcher venv can see the source tree via PYTHONPATH
but lack PyYAML, so 'import hermes_cli' succeeds while the first real CLI
import dies — the desktop then trusts the bootstrap marker, spawns a dead
backend, and loops on 'gateway offline' (#52378).

- backend-probes.cjs: canImportHermesCli now runs 'import yaml; import
  hermes_cli.config' (extracted as hermesRuntimeImportProbe) and accepts an
  env override, so a dependency regression is caught without a real broken
  venv fixture.
- main.cjs: isBootstrapComplete() routes through new isActiveRuntimeUsable(),
  which requires the venv python to pass the runtime import probe (with
  ACTIVE_HERMES_ROOT on PYTHONPATH) — not just exist on disk.

Salvaged from PR #38179. The PR's install.ps1 reset/clean + autocrlf changes
and their tests are dropped: current main already preserves dirty checkouts
via stash (the data-loss-safe #38542 path) rather than the PR's older
reset-based Repair-ManagedCheckoutBeforeUpdate approach.
2026-06-28 02:40:37 -07:00
teknium1
7c9cdad9fd test(cli): cover Windows self-lock recovery guard + cmd-quote its hint
Add two tests for the self-lock guard in _recover_from_interrupted_install:
one asserting it clears the marker and skips install when hermes.exe is a
process ancestor (breaking the #52378/#45542 loop), one asserting it falls
through to a normal recovery install when the shim is NOT an ancestor.

The guard's manual-recovery hint runs only inside the Windows branch, so
quote it for cmd.exe (cd /d, double-quoted paths) — the cross-platform
fallback hint at the end of the function is left POSIX-correct.

Map Icather in scripts/release.py AUTHOR_MAP for the salvage.
2026-06-28 02:40:37 -07:00
灵越羽毛
b6f592dbdc fix(cli): detect self-lock in update recovery to break infinite retry loop on Windows 2026-06-28 02:40:37 -07:00
liuhao1024
14baeefe1d fix(matrix): record DM rooms in m.direct on invite to prevent group misclassification
Rebase onto plugins/platforms/matrix/adapter.py (code moved from
gateway/platforms/matrix.py). Same logic: _on_invite checks is_direct
on invite events and calls _record_dm_room to persist in m.direct
account data.

Fixes #44679
2026-06-28 02:37:52 -07:00
Teknium
fde1c8570f fix(tui_gateway): suppress WS peer-hangup teardown error flood (#50005) (#54126)
When the Desktop forcibly closes its WebSocket mid-write, asyncio logs a
full traceback for every pending connection-lost callback — 50+ identical
WinError 10054 (ConnectionResetError) lines per disconnect on Windows, the
equivalent ConnectionResetError/BrokenPipeError on POSIX. These are not
actionable: they are the expected side effect of the peer hanging up before
our writes drained.

Install a loop exception handler on the gateway serving loop that collapses
exactly this teardown class (ConnectionResetError/ConnectionAbortedError/
BrokenPipeError originating from _call_connection_lost) to a single debug
line, forwarding every other loop error to the existing/default handler
unchanged so genuine loop bugs still surface. Idempotent per loop.
2026-06-28 02:35:01 -07:00
teknium1
6eec0d4f08 docs: add infographic for #53107 gateway force-exit fix 2026-06-28 02:34:23 -07:00
LeonSGP43
9f0e64cedd fix(gateway): force exit after graceful shutdown
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-06-28 02:34:23 -07:00
teknium1
dddaea0c98 chore(release): map yungchentang author for #53622 salvage 2026-06-28 02:34:17 -07:00
yungchentang
7e2ca7f68d fix(telegram): reset send pool after pool timeouts 2026-06-28 02:34:17 -07:00
kshitij
f3d8f20a59 Merge pull request #54116 from kshitijk4poor/fix/36658-gateway-drain-microtask
fix(tui): defer buffered gateway events to stop dashboard chat #301 (#36658)
2026-06-28 14:50:07 +05:30
Teknium
f646b82ff0 docs: add infographic for #38249 atomic env-snapshot fix 2026-06-28 02:08:57 -07:00
Teknium
9f17f16c66 fix(environments): use $BASHPID for atomic snapshot temp + harden failure path
The atomic mv approach (kyssta-exe's commit) narrows but does not close the
#38249 race: the temp name used $$ (parent shell PID), which is identical
across &-launched concurrent subshells. Two concurrent writers pick the same
temp file, clobber each other mid-write, and mv then publishes a torn snapshot
— a reader sourcing it absorbs declare-x/export fragments into PATH.

- Use $BASHPID (actual per-subshell PID) so concurrent writers never collide.
- Chain mv on export success (&&) and rm the temp on failure so a partial dump
  never replaces a good snapshot; apply the same to the init_session bootstrap.
- shlex-quote the static temp-path portion (Windows/spaces), $BASHPID outside.
- LocalEnvironment.cleanup sweeps orphaned snap.tmp.* temps.
- Regression tests: string-shape + a behavioral concurrent writers/readers test
  that proves the snapshot never tears (would still tear with $$).
2026-06-28 02:08:57 -07:00
kyssta-exe
6a2958a521 fix(environments): use atomic file replacement for snapshot writes
Fix race condition in terminal environment snapshots that could corrupt
PATH with declare -x entries. When concurrent terminal calls share the
same snapshot file, the non-atomic 'export -p > snapshot.sh' write could
be read mid-write by another process, causing partial/corrupted env vars
to be sourced and mixed into PATH.

The fix uses atomic file replacement:
- Write to a temp file: export -p > snapshot.sh.tmp.303651
- Atomically replace: mv -f snapshot.sh.tmp.303651 snapshot.sh

On POSIX, mv within the same filesystem is atomic, so source() will
either see the old complete snapshot or the new complete one, never a
partial/truncated file.

Fixes #38249
2026-06-28 02:08:57 -07:00
teknium1
c23f394eb8 fix: satisfy ruff encoding + windows-footgun lints for cgroup reaper
- read_text(encoding='utf-8') (PLW1514)
- # windows-footgun: ok on signal.SIGKILL — module is Linux-only (reads
  /proc, /sys/fs/cgroup; runs from a systemd unit)
- test lambda accepts the new encoding kwarg
2026-06-28 02:05:50 -07:00
teknium1
86ec979f66 chore(release): map PRATHAMESH75 author for #37550 salvage 2026-06-28 02:05:50 -07:00
PRATHAMESH75
e551da6ddb fix(gateway): reap cgroup orphans via ExecStopPost to unblock restart
Long-lived helpers spawned indirectly by tool calls (adb, platform
bridges) were left in the service cgroup after the gateway's main
process exited. When the kernel rejected the deferred cgroup-wide kill
with EINVAL, systemd blocked Restart=always for 6+ minutes, taking
down all platforms and cron windows (#37454).

Add a small ExecStopPost helper (gateway.cgroup_cleanup) that walks
cgroup.procs and sends per-PID SIGKILLs — a different kernel code path
than cgroup.kill, so it succeeds where the cgroup-wide write failed.
KillMode=mixed is preserved so the gateway still reaps its own
tool-call children before systemd intervenes (#8202).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-28 02:05:50 -07:00
Coy Geek
d7a1052424 fix(env-passthrough): fail closed when provider blocklist import fails
When tools.environments.local can't be imported (partial install,
import-time error), _is_hermes_provider_credential() returned False —
fail-open. A skill could then register a Hermes provider credential
(ANTHROPIC_API_KEY, etc.) as env passthrough; _scrub_child_env lets
passthrough vars bypass the secret-substring net (rule 1), so the
operator's real key would land in the execute_code child. Reopens the
GHSA-rhgp-j443-p4rf bypass.

Fail closed instead: on import failure, treat the name as a protected
provider credential and refuse passthrough. Regression test exercises
the full register -> scrub path under a simulated import failure.

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-28 02:05:43 -07:00
teknium1
58c36b1798 fix(api-server): widen error redaction to cron-endpoint + SSE sites
Follow-up to the salvaged #37733 fix. The contributor centralized
redaction at _openai_error and the chat/responses failure paths, which
covers the OpenAI-compatible envelopes transitively. Two sibling classes
crossed the same authenticated HTTP boundary unredacted:

- 8x cron-management endpoints returning {"error": str(e)} on 500
- the session-chat SSE error event ({"message": str(exc)})

Route both through the same _redact_api_error_text(force=True) helper.
Add AUTHOR_MAP entry for coygeek and a TestRedactApiErrorText guard
covering mask/force/limit/passthrough behavior.
2026-06-28 02:05:38 -07:00
Coy Geek
5e774de76e fix(api-server): redact provider errors at HTTP boundary
Force API-server error text through the existing secret redactor before returning OpenAI-compatible errors, response fallback text, response snapshots, and run failure events. This prevents credential-shaped provider failure text from crossing the API-server boundary while preserving debuggable sanitized messages.
2026-06-28 02:05:38 -07:00
HexLab98
d2fda5925d test(gateway): cover Discord/Slack compression status suppression (#39293) 2026-06-28 14:35:32 +05:30
HexLab98
d2ea948bc0 fix(gateway): suppress compression status noise on Discord and other chats (#39293)
Extend the gateway noisy-status filter beyond Telegram so internal
compression lifecycle messages stay in logs instead of spamming Discord,
Slack, and other messaging channels.
2026-06-28 14:35:32 +05:30
teknium1
9f7d520caf docs: add infographic for #36664 WhatsApp LID session-path fix 2026-06-28 02:05:26 -07:00
teknium1
3aaa98dd01 test(whatsapp): cover LID allowlist match on modern session layout
Add an _is_user_authorized E2E for the platforms/whatsapp/session layout
on top of fesalfayed's resolver fix (#36665) — guards the actual
silently-dropped-LID-sender path from #36664.
2026-06-28 02:05:26 -07:00
fesalfayed
263ffec1b0 fix(whatsapp): resolve LID aliases on modern platforms/ session layout
expand_whatsapp_aliases hardcoded get_hermes_home()/whatsapp/session, but
the adapter writes lid-mapping files via get_hermes_dir("platforms/whatsapp/
session", "whatsapp/session"). On installs without the legacy directory the
two paths diverge, so the resolver finds no mappings and returns the bare LID,
which misses the allowlist and silently drops the message. Resolve through the
same helper so both sides stay in lockstep on new and legacy layouts.
2026-06-28 02:05:26 -07:00
teknium1
d0f087e7f9 docs: add infographic for #36109 empty-400 diagnostics 2026-06-28 02:05:20 -07:00
xxxigm
093f567f0d fix(agent,cli): surface empty-body API errors and fail oneshot exit code
When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}),
`_summarize_api_error` fell through to a bare `str(error)`, so users saw only
"HTTP 400" with no provider detail (reported on Windows in #36109). The SDK
leaves `body` empty in this case, but the httpx `response` still carries the
payload in `.text`.

- run_agent.py `_summarize_api_error`: when `body` is empty, fall back to
  `response.text` — parse a JSON `error.message`/`message` when present, else
  surface the raw (truncated) body. Platform-agnostic diagnostics.
- hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns
  exit code 2 when the run is failed/partial with no usable final response, so
  scripts can detect LLM failures (still 0 when a response — incl. an error
  summary as output — is produced).

Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw
text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests.

NOTE: #36109's original root cause (Windows "all providers return empty 400")
is not reproducible on current main (heavy provider-transport churn since
v0.15.1). This change does not claim to fix that root cause — it makes any
empty-body API error LEGIBLE so a future occurrence shows the real provider
message instead of a bare HTTP 400. Relates to #36109 (does not close it).
2026-06-28 02:05:20 -07:00
teknium1
c0b4a3438a fix(install): scope Playwright override to too-new apt releases + keep step interruptible
Follow-up on #54032 for #35166:
- Gate the PLAYWRIGHT_HOST_PLATFORM_OVERRIDE retry on the host being an apt
  release newer than Playwright recognizes (Ubuntu >24.04 / Debian >13) via
  playwright_host_unrecognized(), instead of retrying on ANY install failure.
  A network/disk/permission failure on a supported host now surfaces unchanged
  rather than getting a mismatched-glibc build forced onto it.
- detect_os() now captures DISTRO_VERSION from os-release.
- Fold in the interruptibility fix (was PR #35304, self-closed): wrap the
  download in 'timeout --foreground -k 10' (probed, with plain-timeout
  fallback) so a terminal Ctrl+C reaches the child and a wedged download is
  force-killed after the deadline.
- Add behavioral tests that source the helpers and assert the retry fires only
  on Ubuntu 26.04 / Debian 14, not on supported hosts, non-apt distros,
  native-success, operator-pinned override, or unsupported arch.
2026-06-28 02:05:18 -07:00
kshitijk4poor
a28fe788a6 fix(install): retry Playwright install with platform override on unrecognized host (#35166)
On apt releases newer than the bundled Playwright recognizes (Ubuntu 26.04,
Debian 14, and future distros), 'npx playwright install --with-deps chromium'
hangs uninterruptibly at 'Installing Playwright Chromium with system
dependencies' because Playwright's resolver maps the host to a platform with
no download build (#35166).

Wrap every installer Playwright call in run_playwright_install(), which tries
the native install first and, only if it fails or times out, retries once with
PLAYWRIGHT_HOST_PLATFORM_OVERRIDE pinned to the newest known build
(ubuntu24.04-<arch>). This is the escape hatch Playwright's maintainers bless
for unrecognized platforms (microsoft/playwright#33434).

Try-native-first (not a hardcoded distro/version table) is deliberate:
- Self-correcting — when Playwright already supports the host (e.g. Ubuntu
  26.04 on Playwright >=1.61) the first attempt succeeds and the override is
  never applied, so we never force a mismatched-glibc build onto a release
  Playwright handles correctly (microsoft/playwright#35114).
- Zero-maintenance — new distro releases work the moment Playwright adds them.
- Covers Debian 14+ and future releases, not just Ubuntu 26.04.

An operator-set PLAYWRIGHT_HOST_PLATFORM_OVERRIDE is always respected (applied
to the first attempt; retry skipped). Non-x64/arm64 arches have no fallback
build and skip the retry.

Refs #35166
2026-06-28 02:05:18 -07:00
teknium1
64972b6403 fix(config): canonicalize model.name/model.model to model.default (#34500)
A custom_providers config that names the model under model.name (or
model.model) resolved to an empty model, so the API request went out
with model= — HTTP 400 from OpenAI-compatible backends. Display paths
(hermes status/dump) already read model.name and showed the model,
making the failure silent.

The model id was read via 'default or model' at ~14 independent sites
(cli, gateway, cron, curator, oneshot, fallback, profiles, ...), none
of which honored 'name'. Rather than patch every site, canonicalize at
the single load/save chokepoint: _normalize_root_model_keys() now
promotes model.model/model.name -> model.default (precedence
default > model > name) and drops the stale alias, so every reader —
present and future — sees a populated default and config.yaml is
migrated canonical on next save. The gateway, which bypasses
load_config(), replays the same normalization in _load_gateway_config().

Co-authored-by: Bartok9 <danielrpike9@gmail.com>

Credit: root-cause analysis and fix direction from @Bartok9 (#34502,
first) and @v86861062 (#34527).
2026-06-28 02:05:13 -07:00
kshitijk4poor
f64d15ccb7 fix(tui): defer buffered gateway events to stop dashboard chat #301 (#36658)
Dashboard /chat spawns the TUI attached to the dashboard's in-memory
gateway via HERMES_TUI_GATEWAY_URL. In that attach mode the already-running
gateway replays `gateway.ready` (and `session.info`) the instant the socket
connects, so those events land in GatewayClient.bufferedEvents *before* the
consumer's mount-time subscribe effect (useMainApp.ts) calls drain().

drain() then emitted the buffered events synchronously, so the
`gateway.ready` handler's patchUiState / setHistoryItems cascade ran while
React was still inside the first commit — tripping "Too many re-renders"
(Minified React error #301) and breaking Dashboard chat after `hermes update`.
Spawn / inline / sidecar modes never hit this: their `gateway.ready` only
arrives after the Python child boots, on a later async tick.

Fix: drain() defers the replay to the next microtask AND keeps `subscribed`
false until that microtask runs. Keeping `subscribed` false in the gap means
any live event arriving before the flush keeps buffering (publish() pushes
when !subscribed) instead of emitting synchronously and jumping ahead of the
chronologically-earlier replayed events — the flush re-drains the buffer
right after flipping `subscribed`, preserving FIFO order. A drainGeneration
token (bumped in resetStartupState) makes a queued flush a no-op if the
transport was reset/killed in the meantime, avoiding use-after-teardown and
duplicate/reordered exits.

Regression tests: (1) drain() does not dispatch buffered events synchronously;
(2) a live event arriving in the post-drain / pre-microtask window still
delivers BEHIND the earlier-buffered event (FIFO). Both are red against the
old synchronous behavior, green with this fix. Same class of fix as #44528.

Closes #36658
2026-06-28 14:18:47 +05:30
Teknium
2ecb6f7fe6 fix(telegram): clear send_path_degraded on successful reconnect (#35205) (#54076)
* fix(telegram): clear send_path_degraded on successful reconnect

_send_path_degraded was cleared only in _verify_polling_after_reconnect,
60s after reconnect and only if scheduled. A clean start_polling() reconnect
left the flag stuck True, short-circuiting send() and blocking all outbound
messages until the deferred probe ran (or forever if it never did).

Clear the flag the moment start_polling() succeeds — that is the recovery
signal. The deferred probe remains a defensive re-check that re-enters the
reconnect ladder (re-setting the flag) if it detects a silent wedge.

Fixes #35205.

* docs: add infographic for #35205 telegram send-path fix
2026-06-28 01:38:17 -07:00
Teknium
674e16e7c6 fix(redact): stop DB-connstr redaction from corrupting code output (#33801) (#54061)
Secret redaction is display/output-scoped on main — write_file writes
content verbatim, terminal/execute_code redact only output not the
command/source. The real bug is in displayed tool OUTPUT (read_file,
terminal, execute_code):

_DB_CONNSTR_RE's password group [^@]+ was greedy across newlines, so on a
multi-line block it scanned past the DSN line to the next stray '@' (a
Python @decorator), replacing every intervening character — including line
breaks — with ***. That dropped lines and concatenated the next line onto
the f-string line, making read_file output look corrupted (the file on disk
was always correct). Reported in #33801.

Fix:
- Forbid whitespace in the userinfo/password groups ([^:\s]+ / [^@\s]+) so
  the match can never span a line break. A real DSN password never contains
  whitespace. This alone kills the catastrophic line-dropping.
- Under code_file=True, preserve a password group that is a pure {...} brace
  expression — f"postgresql://{user}:{pass}@{host}" is an f-string template,
  not a live credential. Literal passwords are still masked.
- Pass code_file=True at the terminal and execute_code output redaction call
  sites (file_tools already did) so code-execution output isn't corrupted by
  ENV/JSON/template false positives. Real prefixes, auth headers, JWTs, and
  private keys are still redacted.

Verified E2E against the reporter's exact pydantic-settings module: file
written verbatim, read_file shows the DSN f-string + @model_validator intact
with zero *** corruption, while a literal postgresql://admin:pw@host DSN and
a real sk- key are still masked.

Reported-by: koishi70
Reported-by: pfrenssen
2026-06-28 01:15:39 -07:00
Teknium
de6e9ac760 docs(discord): document bot-to-bot comms as unsupported (#32791) (#54063)
* docs(discord): document bot-to-bot comms as unsupported (#32791)

Multi-profile bot-to-bot conversation is not a supported topology.
DISCORD_ALLOW_BOTS=none (the default) blocks all bot-originated
messages; setting mentions/all across multiple Hermes profiles to make
them reply to each other ack-loops because Discord's reply auto-mention
satisfies the mention gate every turn. Document the safe default and
the loop hazard so operators don't wire it up.

* docs(discord): infographic for bot-to-bot unsupported stance (#32791)
2026-06-28 01:15:34 -07:00
teknium1
4f16950e9a docs: add infographic for #32421 content-filter fallback fix 2026-06-28 01:15:21 -07:00
teknium1
578e3989d4 fix(agent): route content-filter stream stalls to fallback chain (#32421)
When a provider's output-layer safety filter (MiniMax "output new_sensitive
(1027)", Azure content_filter, etc.) kills a streaming response after deltas
were already sent, interruptible_streaming_api_call swallows the raw error
into a finish_reason=length partial-stream stub. The conversation loop then
burned 3 continuation retries against the SAME primary — re-hitting the
content-deterministic filter every time — and gave up with "Response remained
truncated after 3 continuation attempts", never consulting fallback_providers.

Builds on @595650661's classifier change (cherry-picked) so error_classifier
recognizes the filter; then:
- chat_completion_helpers: run the swallowed error through error_classifier at
  the stub-creation point and stamp _content_filter_terminated on the stub
  (single source of truth — no parallel pattern list).
- conversation_loop: read the tag and activate the fallback chain BEFORE
  burning any continuation retries; roll partial content back to the last
  clean turn and re-issue against the new provider (restart_with_rebuilt_messages).
  Plain network stalls are unaffected (only content_policy_blocked is tagged).

Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the
same issue via the stub-tag and loop-escalation approaches respectively.

Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback
fires on the first stub and the fallback provider completes the turn.
2026-06-28 01:15:21 -07:00
595650661
b8e2268628 fix(agent): add MiniMax 'new_sensitive' to content_policy_blocked patterns
The MiniMax output-layer safety filter surfaces the error verbatim as
`output new_sensitive (1027)` (sometimes with additional provider
wrapping like 'Stream stalled mid tool-call: output new_sensitive (1027)').
When the model emits a large tool-call argument block, the upstream
filter trips and the SSE stream is truncated mid-flight, producing
'stream stalled mid tool-call' errors. Until now this case was
misclassified and retried 3x on the same provider, reproducing the same
refusal and burning paid attempts.

Adding `new_sensitive` to `_CONTENT_POLICY_BLOCKED_PATTERNS` routes
it through the existing is_client_error path: skip 3x retry, activate
configured fallback model immediately, surface a clear provider-safety
message to the user.

Refs #32421
2026-06-28 01:15:21 -07:00
Teknium
c9df4bc094 fix(gateway): default restart_drain_timeout to 0 to kill systemd crash loop (#54066)
A restart now interrupts in-flight agents immediately rather than holding
the gateway open for a grace window. The previous 180s default coupled two
independently-set timers: the gateway's own drain timer and systemd's
TimeoutStopSec. On a stale unit where TimeoutStopSec < drain, systemd
SIGKILLed the gateway mid-cleanup, leaving a stale lock that made the next
startup exit immediately ('already running') — an infinite crash loop under
Restart=on-failure (#31981).

Setting drain to 0 makes the mismatch structurally impossible: with drain 0
the generated unit gets TimeoutStopSec=90 against a near-instant drain, so
systemd never kills mid-cleanup. Contract: restart the gateway, in-flight
work stops. A grace window large enough to 'save' a long agent turn would
have to outlast an unbounded task, which is impossible.

Also fixes the stale-unit warning's suggested command
(hermes gateway service install --replace -> hermes gateway install --force);
the former subcommand does not exist.

Closes #31981
2026-06-28 01:14:34 -07:00
teknium1
0800f1c28b infographic: whatsapp send-queue serialization (#33360) 2026-06-28 01:10:14 -07:00
teknium1
cb9f855c2b test(whatsapp-bridge): drop structural send-queue integration test
The .integration.test.mjs greps bridge.js source text for the queue
wiring — a change-detector that breaks on any benign refactor of the
same code. The behavioral unit test (bridge.sendqueue.test.mjs) already
covers FIFO ordering, error isolation, timeout propagation, and
single-consumer concurrency, which is the contract that matters.
2026-06-28 01:10:14 -07:00
Tranquil-Flow
c393a8e55f fix(whatsapp-bridge): serialize sendMessage to prevent cross-chat contamination (#33360)
Concurrent sock.sendMessage() calls on a single Baileys socket can cause
the WhatsApp protocol-level routing to misdeliver messages — responses
intended for one chat appear in another.

Add a promise-based send queue that serialises all sendMessage() calls
across concurrent HTTP /send, /edit, and /send-media handlers so only
one send is in-flight at a time.

Includes unit tests for queue ordering, error isolation, timeout
propagation, and single-consumer concurrency semantics, plus an
integration check that the queue is wired into sendWithTimeout.
2026-06-28 01:10:14 -07:00
teknium1
1f72ad9be9 refactor(cli): extract interrupt recovery to a testable helper
Pull the #33271 post-interrupt recovery (flush_stdin + _force_full_redraw)
out of process_loop's finally block into _recover_terminal_after_interrupt(),
and replace the inline-logic-copy tests with ones that exercise the real
helper plus a source guard that process_loop still invokes it behind the
_last_turn_interrupted gate.
2026-06-28 01:08:09 -07:00
zccyman
f3aaba7f85 fix(cli): recover terminal state after interrupt to prevent raw control sequence freeze
When the agent is interrupted during processing, prompt_toolkit's
renderer and VT100 input parser can be left in an inconsistent state.
CSI 6n cursor position report responses leak as literal text
(^[[19;1R) and the terminal stops accepting keyboard input.

Fix: in process_loop's finally block, after an interrupted turn:
- flush_stdin() to drain stray escape bytes from the OS input buffer
- _force_full_redraw() to reset prompt_toolkit's renderer cache

Closes #33271
2026-06-28 01:08:09 -07:00
teknium1
2e1b48ed31 chore: map kurlyk local email → skabartem for PR #32867 salvage 2026-06-28 01:08:04 -07:00
kurlyk
def97bcd96 fix: eliminate race condition in OpenAI client replacement
Make check-and-replace atomic in _ensure_primary_openai_client by
keeping both operations under the same lock acquisition. Previously,
the lock was released between detecting a closed client and replacing
it, allowing two threads to simultaneously replace the client.

Fixes #32846

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 01:08:04 -07:00
teknium1
4a0fe4e54a docs: add PR infographic for #32762 clarify-expiry fix 2026-06-28 01:07:53 -07:00
teknium1
aacc15b2c9 fix(clarify): raise default clarify_timeout to 3600s (#32762)
The 600s default evicted the gateway clarify entry while users were
still away (meeting/AFK); a later button tap then landed on a dead
entry and the agent hung on 'running: clarify'. Raise the default to
1h in DEFAULT_CONFIG and the get_clarify_timeout() code-level fallback,
documenting the running-agent-guard tradeoff. User overrides still win.
2026-06-28 01:07:53 -07:00
konsisumer
3f543229f2 fix(telegram): notify user when clarify button tap arrives after expiry 2026-06-28 01:07:53 -07:00
Teknium
90d25adc9e fix(gateway): deliver profile-scoped cache media on symlinked HERMES_HOME (#54060)
Generated images under a profile gateway's cache (profiles/<name>/cache/
images/...) were silently dropped from Telegram/Discord delivery when
HERMES_HOME is symlinked under a denied prefix (e.g. /opt/data ->
/root/.hermes) and $HOME is not that prefix. The resolved path lands
under /root (a system denylist prefix), the root-home exception only
fires when the denied prefix IS $HOME, and the static safe-roots list
only covers the active HERMES_HOME's top-level cache — not per-profile
cache dirs. Both gates fail, so validate_media_delivery_path returns
None and the gateway logs 'Skipping unsafe MEDIA directive path'.

_media_delivery_allowed_roots() now also enumerates per-profile cache
roots (<root>/profiles/*/cache/{images,audio,videos,documents,
screenshots}) at check time. Allowlist match runs before the denylist,
so the profile artifact delivers regardless of the /root interaction;
profile-dir credentials (auth.json) stay blocked since they aren't
under a cache subdir.

Reopened regression of #34485/#38108, neither of which covered the
profile-scoped symlink case. Fixes #31733.
2026-06-28 01:07:28 -07:00
sweetcornna
2701ea2f0c fix(agent): reopen fallback chain after primary recovery 2026-06-28 00:57:42 -07:00
teknium1
7b9ff310b6 fix: salvage #33830 for current main — relocate allow_bots bridge to telegram plugin hook, fix stale adapter import in test 2026-06-28 00:57:03 -07:00
sweetcornna
fc70d023d8 fix(telegram): apply bot auth policy to Telegram sources
# Conflicts:
#	gateway/config.py
2026-06-28 00:57:03 -07:00
sweetcornna
002357a83f fix(tui): repump stdin after readable handler errors 2026-06-28 00:53:29 -07:00
teknium1
3a03d03bdc docs: add infographic for #30636 macOS state.db fix 2026-06-28 00:53:19 -07:00
teknium1
52d774f0f9 fix(state): F_FULLFSYNC barrier at WAL checkpoints on macOS (#30636)
On Darwin, synchronous=FULL (the WAL default) only issues a plain
fsync(), which Apple documents does NOT guarantee writes reach stable
storage or stay ordered. SQLite's WAL corruption-safety guarantee
assumes the OS honors the fsync barrier; macOS does not unless the app
uses F_FULLFSYNC. During a launchd *system* shutdown the page cache is
dropped (effectively power-loss for in-flight pages), so a WAL
checkpoint whose fsync 'reported' durable may never hit the platter —
corrupting state.db with a malformed image. That is the trigger in
#30636 ('SIGTERM during launchd shutdown under high load').

Apply PRAGMA checkpoint_fullfsync=1 (macOS-guarded) in
apply_wal_with_fallback. It forces the F_FULLFSYNC barrier only at
checkpoint boundaries (where WAL frames land in the main DB), so cost
amortizes to ~+0.1ms/commit vs ~+4ms for the broader fullfsync=1.
No-op off Darwin (F_FULLFSYNC is macOS-only).

Root-cause analysis by @catapreta on #30636. Supersedes #30654, whose
synchronous=FULL is a no-op (already FULL in WAL mode) and whose
TRUNCATE-on-close is already on main.

Co-authored-by: catapreta <catapreta@users.noreply.github.com>
2026-06-28 00:53:19 -07:00
Gille
9229d0db17 fix(moa): preserve Nous provider identity for references 2026-06-28 00:47:15 -07:00
Teknium
7c38249c79 feat(moa): references see full tool state + fire on every user/tool response (#54016)
The advisory reference view stripped all tool calls and tool results, so
reference models judged a task whose actions and results they never saw — and
references only fired once per user turn, never re-running as the agent's
state advanced through the tool loop.

Two fixes:
- _reference_messages() now PRESERVES the agent's tool calls and tool results,
  rendering them inline as text ([called tool: ...] / [tool result: ...]) so a
  reference gives an informed judgement on the real current state. Still emits
  zero tool-role messages and zero tool_calls arrays (strict providers reject
  those), and large tool results are previewed head+tail (4000-char budget).
  The required end-on-user shape is met by APPENDING a synthetic advisory user
  turn — not by deleting the agent's latest context (which the prior fix did).
- References now re-run on every state change — each new user message AND each
  new tool result — instead of once per user turn. The state-sensitive advisory
  signature drives the cache: new tool result = miss (re-run), identical-state
  re-call = hit (no re-run, no re-emit).

The acting aggregator still receives the full, untrimmed transcript.
2026-06-28 00:30:11 -07:00
kshitijk4poor
fc7a01b6cb test+harden: modernize salvaged Matrix path for current plugin layout
Two follow-ups on top of the salvaged #46365 fix:

1. Tests: the salvaged tests injected the ephemeral MatrixAdapter via
   sys.modules["gateway.platforms.matrix"], but Matrix migrated to a plugin
   (#41112) and the fallback now imports from plugins.platforms.matrix.adapter.
   Point the three sys.modules patches at the current module path so the
   ephemeral-fallback tests actually exercise the injected fake adapter.

2. Harden the live-adapter lookup: split the gateway import guard from the
   adapter lookup and log (instead of silently swallowing) when a runner
   exists but adapters.get() raises. A silent fall-through there would
   re-introduce the per-send reconnect/OTK-exhaustion storm this fix exists
   to prevent (#46310). Documented that the live adapter is gateway-owned and
   must not be disconnected, and why the ephemeral finally never touches it.
2026-06-28 12:48:08 +05:30
liuhao1024
a7fd62d824 fix(send_message): reuse live gateway adapter for Matrix media sends
When a live gateway adapter is available (i.e. the tool runs inside a
running gateway), reuse the persistent connection instead of creating a
new MatrixAdapter per call. This eliminates per-message E2EE re-init
storms that exhaust recipient OTKs and silently drop messages.

The fix follows the same pattern as _send_to_platform (line 618):
gateway_runner_ref → runner.adapters[Platform.MATRIX]. Falls back to
the ephemeral connect/disconnect cycle for standalone contexts.

Also extracts the shared send logic into _send_via_matrix_adapter()
to avoid duplicating the media dispatch code between the two paths.

Fixes #46310
2026-06-28 12:48:08 +05:30
Ben Barclay
1466eab4ee test(docker): wait for cont-init to finish before privilege-drop shim tests (#54026)
The docker-exec privilege-drop shim tests started a sleep container and
released the fixture as soon as `docker exec <c> true` returned 0. On
s6-overlay that succeeds almost immediately — ~0.05s in measurement —
long before the `01-hermes-setup` cont-init hook (docker/stage2-hook.sh)
has finished seeding + `chown hermes:hermes` config.yaml and running the
Python config migration (cont-init only fully settles at ~9.8s under
arm64 QEMU emulation).

`test_shim_opt_out_keeps_root` wipes config.yaml, writes it as root with
HERMES_DOCKER_EXEC_AS_ROOT=1, and asserts root:root ownership. When the
fixture released the test inside that ~10s window, stage2-hook's
boot-time `chown hermes:hermes config.yaml` raced the root-written file
and reset it to hermes:hermes — failing the assertion. The window is
invisible on native amd64 (stage2-hook completes in a blink) but wide
open under the arm64 build's QEMU emulation, which is why only build-arm64
flaked while build-amd64 stayed green.

Replace the responsiveness poll with a wait on the canonical
'cont-init finished' signal: $HERMES_HOME/logs/container-boot.log gaining
a `profile=default` line, written by 02-reconcile-profiles which s6 runs
strictly after 01-hermes-setup. Mirrors the readiness pattern already
used in test_container_restart.py. Also bumps the readiness timeout 20s->60s
to cover slow emulation.

No production code change — test-only hardening of a timing race.
2026-06-28 17:06:26 +10:00
Jeffrey Quesnelle
2c9b017696 Merge pull request #54000 from NousResearch/fix/desktop-main-cjs-clobber-stage-simple-git
fix(desktop): stop hermes desktop from clobbering tracked main.cjs
2026-06-28 01:56:51 -04:00
Teknium
4f61d48aef test(cron): deterministically wait for ticker, fix wall-clock flake (#54010)
tests/cron/test_scheduler_provider.py spawned a background ticker thread,
slept a fixed 0.2s, then asserted the loop had called tick()/heartbeat() at
least N times. Under loaded CI the worker thread isn't always scheduled
within that window, so the loop hadn't ticked yet — flaking with 'provider
never called tick()' (assert 0 >= 1).

Add a _wait_until(predicate, timeout) helper and replace all five fixed
time.sleep(0.2) sites with a poll on the actual predicate (calls/beats count
reached). Same contract assertions, no wall-clock dependence.
2026-06-27 22:52:29 -07:00
Teknium
1fa44180b0 fix(moa): advisory references end on a user turn + get a reference-role system prompt (#54007)
* fix(moa): reference advisory view must end with a user turn

MoA reference calls failed with Anthropic models that don't support
assistant prefill (e.g. Claude Opus 4.8): '400 ... must end with a user
message'. The advisory view built by _reference_messages() kept the last
assistant turn's text while dropping the following tool result, leaving a
trailing assistant turn — which Anthropic (and OpenRouter->Anthropic)
interpret as an assistant prefill to continue. References are advisory and
must end on the user turn they answer.

Strip trailing assistant turns from the advisory view (preserving
intervening ones). Update the existing test that encoded the buggy shape
and add a mid-tool-loop regression test.

* feat(moa): give reference models an advisory-role system prompt

Reference models received the bare trimmed conversation with no role
framing, so they assumed they were the acting agent and refused ("I can't
access repositories/URLs from here") or tried to call tools they don't have.

Prepend a dedicated advisory system prompt to every reference call: the
model is an analyst, not the actor — it cannot execute, should not
apologize for lacking tools, and should reason about the presented state to
advise the aggregator/orchestrator on approach, next steps, tool-use
strategy, risks, and anything the acting agent missed. Its output is private
guidance for the aggregator, not a user-facing answer.
2026-06-27 22:52:25 -07:00
Teknium
2523917680 fix(tests): bare pytest flags pass through run_tests.sh without a '--' separator (#54008)
The parallel runner only forwarded pytest args after a literal '--', so a
bare 'scripts/run_tests.sh tests/foo.py -q' (or -v/-x/-k/--tb=long) errored
out with 'unrecognized arguments'. This contradicted the docstring's
promise that common pytest flags pass through, and forced a retry on every
run that used pytest muscle-memory.

Now any token starting with '-' that isn't one of the runner's own options
(-j/--jobs, --paths, --slice, --file-timeout, --generate-slices, --files,
--include-integration) is routed to each per-file pytest invocation
automatically. Value-taking flags given space-separated (-k expr, -m mark,
-p plugin, -o name=val, etc.) keep their value instead of having it stolen
by positional-path discovery. The explicit '--' separator still works and
stacks with bare flags.

- scripts/run_tests_parallel.py: argv splitter routes bare unknown flags to
  pytest; value-flag lookahead; updated docstring.
- scripts/run_tests.sh: usage comment reflects bare-flag passthrough.
- tests/test_run_tests_parallel.py: 4 behavior-contract tests (bare -q runs,
  -k keeps its value/filters, '--' still works, positional path stays a root).
2026-06-27 22:43:26 -07:00
emozilla
2d206a3a42 fix(desktop): stop hermes desktop from clobbering tracked main.cjs (#52735)
`npm run build` ended with `bundle-electron-main.mjs`, which esbuild-bundled
electron/main.cjs and renamed the bundle on top of the tracked source file.
Because every `hermes desktop` runs `npm run build`, each launch rewrote a
checked-in source file (~7.5k-line source -> ~14.8k-line bundle), dirtying the
working tree with a build artifact that `git restore` couldn't keep (the next
launch re-clobbered it) and forcing autostash/restore conflicts on update.

The bundle only existed to inline `simple-git` so the packaged app.asar (which
ships no node_modules) wouldn't crash at launch with "Cannot find module
'simple-git'". Replace it with the mechanism the repo already uses for the
other hoisted runtime dep (node-pty): stage the dependency closure and resolve
it from process.resourcesPath at runtime.

- stage-native-deps.cjs: resolve simple-git's runtime closure (walking
  dependencies + optionalDependencies, so a version bump that adds a transitive
  dep can't silently reintroduce the crash) and stage it under
  build/native-deps/vendor/node_modules/. The `vendor/` nesting is load-bearing:
  electron-builder drops a node_modules dir at the ROOT of an extraResources
  copy but keeps a nested one.
- git-review-ops.cjs: fall back to the staged
  native-deps/vendor/node_modules/simple-git when the hoisted require() fails;
  dev runs resolve the hoisted copy and never hit the fallback.
- package.json: drop the bundler from the `build` script so main.cjs is never a
  build target again.
- nix/desktop.nix: drop the direct bundler call (the closure rides the existing
  `cp -rn native-deps` into $out) and patch process.resourcesPath in
  git-review-ops.cjs alongside main.cjs.
- delete scripts/bundle-electron-main.mjs.

Verified: electron-builder's own file filter keeps the full staged closure
(0 dropped), and a packaged win-unpacked build launches with the git-review
pane resolving simple-git from the staged vendor path.
2026-06-28 01:30:09 -04:00
teknium1
c918d42d88 feat(desktop): config-driven Electron launch flags + GPU policy
Adds a desktop: section to config.yaml so headless/VM users can make
`hermes desktop` launch correctly without a wrapper command:

- desktop.electron_flags: extra Electron CLI flags (e.g. --ozone-platform=x11)
  appended to every launch. Accepts a list or a shell-split string.
- desktop.disable_gpu: auto|true|false, bridged to the HERMES_DESKTOP_DISABLE_GPU
  env var the Electron app already reads. An explicit env var still wins.

cmd_gui() reads these via _desktop_launch_options() and applies them. This is
the config.yaml form of the capability proposed as a raw env var in #38934
(@1RB) — behavioral settings belong in config.yaml, not a new HERMES_* env var.

Co-authored-by: ray <86501179+1RB@users.noreply.github.com>
2026-06-27 22:26:43 -07:00
Teknium
1b70a91844 docs: third-party-product plugins ship standalone, not into core tree (#54001)
* docs: third-party-product plugins ship standalone, not into core tree

Generalizes the closed-set memory-provider policy to any plugin that
integrates someone else's product/project (observability backends,
vendor SaaS, analytics dashboards, paid-service tie-ins). These create
an open-ended maintenance burden on us for backends we don't own, so
they ship as standalone plugin repos installed into ~/.hermes/plugins/
and are promoted in #plugins-skills-and-skins — not merged into core.

- AGENTS.md: new 'what we don't want' bullet + generalized policy note
  beside the memory-provider closed-set rule
- CONTRIBUTING.md: new 'Third-Party Product Integrations' section
- build-a-hermes-plugin.md: caution callout at the top of the guide

It's a coupling decision, not a quality bar — a plugin can clear review
and still be a close.

* docs: add infographic for standalone-plugin policy
2026-06-27 22:23:50 -07:00
Rafael Millan
54ea059919 fix: fall back to no-sandbox for desktop launch on restricted Linux hosts 2026-06-27 22:16:20 -07:00
teknium1
97640fd9ad fix(desktop): reserve WCO width on plain Linux + author map
The plain-Linux overlay re-enable (#53185) left nativeOverlayWidth() at 0
for plain Linux, so the native min/max/close buttons painted on top of the
app's right-edge titlebar tools. Reserve the fallback width everywhere the
WCO overlay is painted (Windows, WSLg, plain Linux); macOS still reserves 0
since it uses traffic lights.
2026-06-27 22:05:33 -07:00
Chris Wesley
8194dbf612 fix(desktop): re-enable titleBarOverlay on plain Linux
Commit da5484b61 disabled the Window Controls Overlay on all Linux
(non-Windows, non-WSL) with the note that WCO is a Windows/macOS-only
Electron feature. However, several Linux compositors (KDE/KWin,
GNOME/Mutter) do support it — plain Electron titleBarOverlay paints
native min/max/close buttons that were working before that change.

Narrow the exclusion to only WSLg, where the RDP host draws its own
window controls and an Electron overlay would leave a dead gap.

Fixes: da5484b61 ("fix(desktop): WSL2 clipboard image paste + Linux titlebar overlay")
2026-06-27 22:05:33 -07:00
teknium1
9c7f9f9502 infographic: partial-stream recovery fix (salvage #41498) 2026-06-27 22:03:14 -07:00
infinitycrew39
1fa46570fb test(agent,gateway): cover partial-stream recovery and restart helper salvage 2026-06-27 22:03:14 -07:00
infinitycrew39
e860a40e14 fix(agent,gateway): surface partial-stream recovery and bound detached restart
Salvage of NousResearch/hermes-agent#41498 (0-CYBERDYNE-SYSTEMS-0).

- Leave response_previewed false on partial_stream_recovery so gateway
  fallback delivery can send the recovered fragment plus explanation.
- Always append the turn-completion explainer for partial_stream_recovery,
  not only for empty or very short fragments (#34452 gap).
- Launch the detached /restart helper before drain, idempotently, with a
  bounded wait of restart_drain_timeout + 5s.
2026-06-27 22:03:14 -07:00
Teknium
e3c9924b8b fix(cli): correct stale hermes auth login nous hints to hermes auth add nous (#53929)
* fix(cli): correct stale `hermes auth login nous` hints to `hermes auth add nous`

There is no `hermes auth login` subcommand — valid auth verbs are
add/list/remove/reset/status/logout/spotify. Six user-facing strings told
users to run `hermes auth login nous`, which fails with
`invalid choice: 'login'` — the same broken-hint class reported in #28089
for the proxy flow (already fixed there to `hermes auth add nous`).

Sites corrected to `hermes auth add nous`:
- hermes_cli/dashboard_register.py (401 retry hint, not-logged-in hint)
- hermes_cli/gateway_enroll.py (401 retry hint, not-logged-in hint)
- cli-config.yaml.example (two provider-requirement comments)

* docs(infographic): auth login nous hint fix
2026-06-27 21:30:37 -07:00
Teknium
4626ceb747 fix(gateway): only offer system-scope gateway install to root sessions (#53975)
Non-root users picking 'System service' in the setup wizard were handed a
'sudo hermes gateway install --system --run-as-user <you>' recipe that fails
on most distros: sudo's secure_path strips ~/.local/bin (pipx/uv installs),
so 'sudo hermes' is command-not-found. Worse, it funnels a non-root user
toward a system install they shouldn't be doing from a user session.

Now prompt_linux_gateway_install_scope() only offers system scope when
os.geteuid()==0. Non-root sessions get user-service or skip, with a tip to
re-run as root for a boot service. The non-root branch in
install_linux_gateway_from_setup becomes a defensive guard that refuses
without printing any self-elevation recipe. Gated the matching deferral hint
in setup.py behind root too.
2026-06-27 21:24:08 -07:00
teknium1
b304023fc6 docs(infographic): model picker fixes (#49129 + #51488) 2026-06-27 21:23:25 -07:00
teknium1
c72d68715f chore(release): map salvaged contributor emails for #49129 and #51488 2026-06-27 21:23:25 -07:00
Priyanshu Sharma
f6deabca0d fix(gateway): clear stale base_url on model switches 2026-06-27 21:23:25 -07:00
teknium1
f54c52800a fix(models): scope live-first picker merge to opencode aggregators only
Follow-up to the salvaged #49129 commit. The original change flipped the
shared generic-provider merge in provider_model_ids() to live-first
unconditionally, which regressed curated-first for single providers
(kimi/zai, #46309) — and the PR encoded that regression by flipping the
kimi-coding and zai test assertions to expect live-first.

Gate live-first on an explicit _LIVE_FIRST_PICKER_PROVIDERS set
({opencode-zen, opencode-go}); every other provider keeps curated-first.
Also widen the uncapped picker + live-first sets to opencode-go, which has
the same 70+ model catalog problem as opencode-zen. Restore the
kimi-coding curated-first test and rewrite the merge-order test to assert
the per-provider contract.
2026-06-27 21:23:25 -07:00
Afnath Ahamed
f98ffbc246 fix(models): live-first merge + update opencode-zen catalog + uncap aggregator picker 2026-06-27 21:23:25 -07:00
teknium1
2e7e600eaa chore(release): map HexLab98 author for PR #53863 salvage 2026-06-27 21:22:49 -07:00
HexLab98
04ff4d9b54 test(auxiliary): cover env-only proxy policy for auxiliary clients (#53702) 2026-06-27 21:22:49 -07:00
HexLab98
073847c0f2 fix(auxiliary): use env-only proxy policy for OpenAI SDK clients (#53702)
Auxiliary clients now inject a keepalive httpx transport with explicit
HTTPS_PROXY/NO_PROXY resolution, matching the main agent. This avoids
macOS system proxy settings (which omit the ExceptionsList) breaking
vision and other auxiliary calls to internal provider endpoints.
2026-06-27 21:22:49 -07:00
Teknium
3b23a984b5 feat(kanban): stamp handoff freshness so workers don't read stale state as current (#53973)
Multi-agent boards leak staleness: a sibling worker's parent handoff,
comment, or prior-attempt summary gets read by the next worker as live
truth even when it's a day old. build_worker_context surfaced the text
with (at best) a bare absolute timestamp, which an LLM reads as fact
regardless of age — parent results had no timestamp at all.

Adds a coarse relative-age stamp (just now / 18h ago / 3d ago) to every
recalled-state line and a one-line 'point-in-time snapshot, re-verify
against source' frame on the parent-results section, so the worker sees
when handoffs were produced and re-checks stale ones before acting.
2026-06-27 21:21:54 -07:00
Teknium
131c9c542c test(tui-gateway): stop deferred-resume build thread leaking into next test
test_session_resume_uses_parent_lineage_for_display resumes via the
deferred (non-eager) path, which fires a 50ms background Timer
(_schedule_agent_build) calling whatever server._make_agent is patched
in at that moment. The timer outlived the test and landed in the next
test's (_follows_compression_tip) _make_agent mock, racily setting
agent_session_id='tip' and flaking 'assert tip == cont_tip' on CI.

Root-cause fix: stub _schedule_agent_build to a no-op in the leaking
test (it only asserts display history). Defense in depth: the victim's
fake_make_agent now setdefault()s so a stray late build can't overwrite
the synchronous eager build's captured id.
2026-06-27 21:07:53 -07:00
Teknium
e418605450 test(24996): freeze monotonic clock to de-flake fallback cooldown timing
The exhaustion-cooldown timing assertions relied on a wall-clock budget
(before + window + 1.0s). On loaded CI runners the activation calls could
exceed the 1s slack, flaking 'Run tests slice 4/8'. Freeze
chat_completion_helpers.time.monotonic so the cooldown math is exact and
load-independent across all four tests.
2026-06-27 21:07:53 -07:00
teknium1
1ad8b44413 docs(infographic): skill sync external_dirs shadow fix 2026-06-27 21:07:53 -07:00
zccyman
db11849c9d fix(skills): skip shadowing when external_dirs provides the skill
Fixes #28126. sync_skills() was unconditionally writing bundled skills
into the local <profile_home>/skills/ tree even when the profile's
config.yaml delegated skill resolution to an external directory
via skills.external_dirs. The skill loader then saw two candidates
for the same name (local shadow + external canonical), refused to
resolve on collision, and every worker that auto-loaded such a skill
crashed with 'Unknown skill(s): <name>'.

Changes:
- _build_external_skill_index() indexes skills available in external
  dirs (by directory name and frontmatter name)
- sync_skills() skips writing a bundled skill when it finds the same
  name in the external index; records the hash in the manifest so
  subsequent syncs treat it as already handled
- Self-healing: removes stale local shadows left by prior buggy syncs
  (only when origin_hash == bundled_hash == user_hash, i.e. we wrote
  it and user didn't touch it)
- New 'shadowed_by_external' key in sync_skills() return dict

3 new tests in TestExternalDirsIndexing (all passing).
All 48 tests in test_skills_sync.py pass.

Closes #28126
2026-06-27 21:07:53 -07:00
Teknium
a8c862900b fix(tui): sanitize replay history on WebUI/TUI session resume (#29086) (#53939)
A WebUI/TUI session whose last turn died mid-tool-loop (stale-timeout kill,
interrupt, or process restart before the tool result was written) persists a
dangling assistant(tool_calls) or interrupted assistant->tool tail. The
messaging gateway already strips these tails before replay (the #49201 fix),
but the TUI/WebUI resume path fed db.get_messages_as_conversation() straight
in as the agent's conversation_history with no cleanup. The model re-issued
the unanswered call on every resume -- including after a full WebUI + Gateway
restart, since the poison lives in the SessionDB, not memory -- leaving the
session permanently 'thinking'. Only deleting the session recovered it.

- Extract the two strippers + helper from gateway/run.py into a shared
  agent/replay_cleanup.py (sanitize_replay_history wraps both).
- gateway/run.py re-exports under the historical private names; messaging
  behavior unchanged.
- Both TUI cold-resume sites now sanitize the model-fed history while leaving
  the display transcript untouched, so the user still sees their full history.

Verified E2E against a real SessionDB: dangling and interrupted tails are
stripped from the model feed, healthy mid-progress tool sequences are
preserved, and the display transcript is always the full raw history.
2026-06-27 20:56:49 -07:00
Teknium
f03823014b fix(telegram): kill 409 polling conflict loop by disarming PTB retry synchronously (#53941)
Telegram polling entered a self-inflicted ~31s loop of 409 Conflict ->
retry -> resume -> Conflict. The error_callback PTB invokes synchronously
inside its internal network_retry_loop only scheduled our async recovery
task (loop.create_task) and returned, so PTB kept polling getUpdates on its
own while our handler concurrently ran stop -> sleep -> start_polling. The
two polling sessions overlapped and Telegram returned a fresh 409.

Fix: in the conflict branch of the error_callback, synchronously set PTB's
private polling stop_event before scheduling recovery. PTB's loop exits on
its next tick (it races that event in do_action), so our handler owns
polling alone. The handler's await updater.stop() drains the task and PTB
clears the event, so the subsequent start_polling() builds a fresh event
and is not poisoned.

Keeps the existing reconnect ladder intact (option B) — fixes only the
race. Defensive: probes mangled + unmangled stop_event spellings and no-ops
(prior behaviour) if neither exists; never flips _running, which would make
the handler skip stop() and leave the loop wedged.
2026-06-27 20:46:08 -07:00
Teknium
d43e0cf304 fix(agent): config-driven intent-ack continuation for all api_modes (#27881) (#53943)
* fix(agent): config-driven intent-ack continuation for all api_modes (#27881)

The agent could end a turn after only stating intent ('I will run a health
check...') without executing the announced tool call, forcing the user to
re-prompt. A continuation guard that catches this and nudges the model to
proceed already existed but was hard-gated to the codex_responses api_mode,
so Gemini/Claude/OpenRouter turns never benefited.

- New agent.intent_ack_continuation config (default 'auto' = codex-only,
  byte-stable for existing conversations). 'true'/model-list opts every
  api_mode in; 'false' disables. Mirrors agent.tool_use_enforcement's shape.
- looks_like_codex_intermediate_ack gains require_workspace (default True).
  The opted-in path drops the codebase/filesystem requirement so general
  autonomous workflows (server ops, deploys, API calls) are caught, not just
  coding tasks. Future-ack + action-verb + short-content + no-prior-tool
  guards still apply; the 2-nudge-per-turn cap is unchanged.
- Resolution centralized in intent_ack_continuation_mode (off/codex_only/all).

* docs(infographic): intent-ack continuation (#27881)
2026-06-27 20:46:00 -07:00
Teknium
56abbaeac3 fix(curator): fail closed on unverified skill deletes during consolidation (#53935)
The curator's LLM consolidation pass could archive whole clusters of
active skills with zero verified consolidations (#29912): a bare prune
(skill_manage delete with absorbed_into empty/omitted) from the forked
review agent was accepted, removing the skill's name from lookup even
though counts.consolidated_this_run was 0.

- _delete_skill now fails closed during the curator/background-review
  pass: a delete is only allowed when it declares a verified
  consolidation (absorbed_into=<umbrella>, umbrella must exist). A prune
  with no forwarding target is refused; the skill stays active. The
  deterministic inactivity prune (archive_skill) is unaffected.
- A verified consolidation delete during the curator pass now routes
  through the recoverable archive primitive instead of shutil.rmtree, so
  a misjudged consolidation can be undone with hermes curator restore.
  The usage record is kept (state=archived) rather than forgotten.
- Foreground, user-directed deletes keep their existing hard-delete
  semantics.
2026-06-27 20:45:57 -07:00
konsisumer
11b0be8d15 fix(gateway): avoid Matrix pending invite boot loops 2026-06-27 20:45:51 -07:00
teknium1
a1ac6baac4 fix(gateway): make bg-process reset TTL configurable + surface session-scoped processes
Follow-up to the cherry-picked #29212 (#29177):

- Promote the 24h stale-process threshold to config.yaml
  (session_reset.bg_process_max_age_hours) instead of a hardcoded
  constant. 0 disables the cutoff (legacy: any live process blocks reset).
  Wired through GatewayConfig.default_reset_policy in gateway/run.py.
- Bug 2: process(action=list) now resolves the gateway session_key from
  the contextvar and surfaces session-scoped background processes (a
  forgotten preview server under a different task), flagged
  session_scoped — so the agent/user can discover and kill the blocker.
  Previously the task-scoped list returned [] and the blocker was invisible.
- Tests: config round-trip for the new field, cross-task list visibility.
- Docs: messaging session-reset section.
2026-06-27 20:45:43 -07:00
annguyenNous
33d8b66d5b fix: stale background processes no longer permanently block session reset
Background processes (e.g. http.server preview) that Hermes starts and
forgets about previously blocked session idle/daily reset indefinitely.
The reset guard in session.py checked has_active_for_session() with no
max age — a 3-day-old preview server blocked reset the same as a task
started 30 seconds ago.

Changes:
- Add max_active_age parameter to has_active_for_session() in
  process_registry.py. Processes older than this threshold are ignored.
- Add MAX_ACTIVE_PROCESS_AGE constant (24h / 86400s).
- Wire max_active_age into the gateway's session store callback in
  run.py so stale processes no longer block session lifecycle.
- Add debug logging when reset is skipped due to active processes.
- Add 3 tests covering recent, stale, and legacy (None) max age.

Fixes #29177
2026-06-27 20:45:43 -07:00
teknium1
8c8967a50b fix: defer hermes_subprocess_env import in browser_tool
The module-level import broke tests/tools/test_managed_browserbase_and_modal.py,
which loads browser_tool.py via spec_from_file_location against a stubbed
'tools' package that does not include tools.environments.local. Move the import
into a _build_browser_env() helper called at the two agent-browser spawn sites,
matching the lazy-import pattern already used by lazy_deps.py.
2026-06-27 20:45:31 -07:00
teknium1
9c6229ce24 fix(security): centralize credential-safe subprocess env (#29157)
Subprocesses spawned outside the terminal/execute_code path (agent-browser,
copilot ACP, dep-ensure, lazy_deps uv install, TUI Node host, cli.exec)
inherited the operator's full credential environment via os.environ.copy().
The terminal path was already scrubbed by _HERMES_PROVIDER_ENV_BLOCKLIST
(#1002/#1264/#32314); these spawn sites bypassed it.

Adds hermes_subprocess_env(inherit_credentials=) in tools/environments/local.py
reusing the existing dynamic blocklist as the single source of truth:

  - Tier 1 (_ALWAYS_STRIP_KEYS): gateway bot tokens, GitHub auth, infra
    secrets -- stripped even for credential-inheriting children.
  - Tier 2 (_HERMES_PROVIDER_ENV_BLOCKLIST): provider/tool keys -- stripped
    unless inherit_credentials=True. The opt-in is grep-able for audit.

Browser worker keeps a _BROWSER_PASSTHROUGH_KEYS allowlist (BROWSERBASE/
FIRECRAWL) re-added after the strip. Model-driving children (ACP, TUI Node
host, cli.exec) use inherit_credentials=True so they still get provider keys
while losing Tier-1 secrets. Installers (dep-ensure, lazy_deps) inherit
nothing sensitive. cua_backend already routed through _sanitize_subprocess_env
on main -- left as-is. Gateway adapter utility spawns (gh pr comment, ffmpeg)
are left inheriting env: gh needs GH_TOKEN by design, ffmpeg is a trusted
system binary -- no untrusted-dependency exposure.

This is defense-in-depth (personal-assistant trust model: same-user spawns),
making the existing scrub policy uniform across the spawn surface; the main
real payoff is shrinking the blast radius if a transitive npm dep in
agent-browser is compromised.

Reconstructed on current main from the design in #31959 (Tranquil-Flow);
also credits #39003 (rodboev), #37843 (coygeek), #35769 (egilewski).

Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
Co-authored-by: rodboev <rod.boev@gmail.com>
Co-authored-by: egilewski <egilewski@egilewski.com>
2026-06-27 20:45:31 -07:00
Hermes Agent
88b3d8638e test: de-flake SIGKILL-tree, compression-tip resume, and fallback-cooldown tests
Three CI flakes hit while landing the credential-pool restore fix; all three
were timing/wall-clock races in the tests, not product bugs (each passes
locally and the assertions are correct):

- test_entire_tree_is_sigkilled_not_just_parent: _terminate_host_pid SIGKILLs
  synchronously, but the test's 4s budget after a 1s in-function SIGTERM grace
  left almost no slack for the kernel to tear down 3 processes + reparent the
  children to zombies under loaded-CI scheduling. Widen the wait to 15s and
  make the liveness predicate tolerant of vanished-pid / zombie races. The
  assertion never weakens: every tree member must end up dead or zombie.

- test_session_resume_follows_compression_tip: appended messages got
  time.time() timestamps (~now) while the test forced session started_at into
  the past, so the get_compression_tip MAX(m.timestamp) tiebreaker depended on
  wall-clock ordering. Pass explicit, well-separated message timestamps so the
  chain resolution is deterministic by construction.

- test_non_retryable_exhaustion_arms_cooldown: asserted the short (5s)
  exhaustion cooldown with a tight +1.0s slack, which false-fails when
  wall-clock jitter between the 'before' snapshot and the cooldown computation
  exceeds a second on a loaded runner. Widen to +30s — still cleanly below the
  60s rate-limit window it must distinguish from.
2026-06-27 20:04:45 -07:00
Jack Maloney
f0de4c6a47 fix(pool): re-select from credential pool on primary runtime restore
_restore_primary_runtime restored the construction-time api_key snapshot and
never consulted the credential pool. After the pool rotated away from a
revoked/exhausted entry mid-session, every new turn restored the dead key,
re-failed instantly, burned the remaining entries, and fell through to
cross-provider fallback.

After restoring the snapshot, re-select the pool's current best entry and
swap the live credential in via _swap_credential (which already rebuilds the
OpenAI/Anthropic client, reapplies base-url headers, and carries the #33163
base_url / OAuth-detection fixes). Falls back to the snapshot key when the
pool is absent, empty, or the entry has no usable key.

Salvaged from #25206 onto current main: the original targeted the pre-refactor
monolithic method in run_agent.py; the logic now lives in
agent/agent_runtime_helpers.py and is collapsed onto _swap_credential instead
of re-inlining the client rebuild.

Fixes #25205
2026-06-27 20:04:45 -07:00
teknium1
a590c5efdc docs: add infographic for provider-precedence fix (#29285) 2026-06-27 19:49:02 -07:00
kshitijk4poor
2af1678bfc fix(auth): explicit provider intent beats stale OAuth active_provider (#29285)
`resolve_provider("auto")` checked `auth.json` `active_provider` BEFORE the
config.yaml `model.provider` and env-var API-key checks. So a user who was
OAuth-logged-into one provider (e.g. Anthropic) but had set an explicit
`model.provider` or exported an API key (e.g. `OPENAI_API_KEY`) was silently
routed to the stale OAuth provider — the override was invisible and surprising.

Reorder the auto-path so explicit intent wins (the order the issue asks for):

  1. explicit CLI api_key/base_url
  2. config.yaml `model.provider`            (safety net — see below)
  3. OPENAI_API_KEY / OPENROUTER_API_KEY env
  4. OpenRouter credential pool
  5. provider-specific API-key env vars
  6. auth.json `active_provider` (OAuth)      ← demoted to last-resort
  7. AWS Bedrock credential chain
  8. error

`active_provider` is still honored — it's just a last-resort fallback chosen
only when the user expressed no other preference, instead of overriding one.

The normal chat/gateway/TUI/ACP/status path already resolves config.provider
upstream in `resolve_requested_provider()` before "auto" is reached, so this
duplicate config check is the safety net for the lone direct caller
(`main.py` `resolve_provider("auto")`) and any future bypass. Because every
surface funnels through this one resolver, the fix propagates everywhere with
a single edit — no sibling path re-implements precedence.

Also add a one-shot WARN when resolution lands on `active_provider` while a
populated `model` config dict lacks a `provider` key — surfacing the silent
override the issue reported without breaking first-install.

Synthesizes the two competing PRs: #29615 (LifeJiggy — config-before-auth +
the silent-override framing) and #29809 (Minksgo — the env-before-auth
reorder). #29809 could not be merged directly (bundled unrelated, un-opt-in
cost-tagging telemetry); its reorder idea is incorporated here and credited.

Tests: tests/hermes_cli/test_provider_precedence.py — config/env beat stale
OAuth, OAuth still used as last resort, explicit request short-circuits, WARN
fires on silent fall-through. Full provider-resolution suites: 374 passed.

Fixes #29285

Co-authored-by: LifeJiggy <141562589+LifeJiggy@users.noreply.github.com>
Co-authored-by: Minksgo <153416856+Minksgo@users.noreply.github.com>
2026-06-27 19:49:02 -07:00
teknium1
2b73dd1ca6 fix(gateway): namespace --replace takeover marker by HERMES_HOME to stop cross-profile flap (#29092)
Two profile gateway services sharing the default ~/.hermes resolve the
takeover marker to the same path. A --replace from profile B could land
in profile A's marker, match on PID + start_time by coincidence of a
shared PID namespace, and make profile A exit 0 — only to be revived by
systemd Restart=always, which races the replacer again, flapping
indefinitely.

write_takeover_marker now stamps replacer_hermes_home; the shared
consume path rejects markers written under a different HERMES_HOME and
leaves them in place for the correct profile. Absent field (older
markers) is treated as same-home, so single-profile and mixed old/new
deployments are unaffected.

Salvaged from #31414 by @CryptoByz onto current main (branch was ~3962
commits behind; the consume function had since been refactored for
issue #34597). Co-authored-by: CryptoByz.
2026-06-27 19:43:02 -07:00
Teknium
28ed883959 docs: add PR infographic for config-defaults fix 2026-06-27 19:38:11 -07:00
Teknium
45b2e4dd6b fix(config): opt newer migrations out of default-stripping
The salvaged #27354 fix made save_config strip schema-default leaves by
default. Five migration sites added to main after the PR was authored
still called bare save_config(config) and intentionally materialize a
(often default-valued) key: model_catalog.ttl_hours, write_approval,
curator.consolidate, agent.verify_on_stop, and the suspicious-MCP-server
disable. Pass strip_defaults=False so those one-time deliberate writes
survive, matching the opt-out the PR applied to the other migrations.
2026-06-27 19:38:11 -07:00
郝鹏宇
98488c4be4 fix(config): prevent save_config from materialising schema defaults
Fixes #27354

Root cause:  called during init (or by any code path
that saves ) wrote injected schema defaults into
config.yaml as if the user had authored them.  Two fix layers:

1.  now only injects
    when the user actually set
    somewhere (root or agent).  A user who never set
    keeps it absent, so 's explicit-path
   detection won't treat it as user-authored.

2.  gains a  parameter and a
   new  pass that removes keys matching
    unless those paths were explicitly present in the
   **raw** (pre-normalization) config on disk.  Explicit-path detection
   uses  on  *before* any
   normalisation runs — preventing injected-in defaults from being
   mistaken for user-set values.

All migration and edit-config call sites pass
to preserve their intentional default-seeding behaviour.

New helpers:
-   — collects leaf-key paths from a raw dict
-    — removes keys matching schema defaults

Test coverage: 4 new regression tests (59 total, all passing).
2026-06-27 19:38:11 -07:00
teknium1
6dcc579bcb test(streaming): repoint anthropic stream-cleanup test to close+rebuild path
The existing test_anthropic_stream_parser_valueerror_retries_before_delivery
asserted mock_replace.call_count == 1 — i.e. it passed precisely because the
buggy OpenAI rebuild was invoked on the Anthropic path. Repoint it to assert
the corrected close+rebuild-Anthropic behavior (#28161).
2026-06-27 19:37:33 -07:00
EloquentBrush0x
a0b9663c7c fix(streaming): rebuild Anthropic client on stream cleanup instead of OpenAI client
interruptible_streaming_api_call() has three connection-pool cleanup
sites that called _replace_primary_openai_client() unconditionally.
For api_mode=anthropic_messages this has two consequences:

1. _replace_primary_openai_client() fails (OPENAI_API_KEY unset on
   Anthropic-only configs), so dead connections are never purged.
2. The stale-stream detector's outer-poll site (L1977) is the only
   mechanism that can interrupt the worker thread while it blocks in
   for event in stream:. Because the Anthropic client is never closed,
   the thread stays blocked until the 900 s httpx read-timeout fires,
   producing a visible 15-minute hang for Telegram/gateway users on
   claude-opus-4-7.

Fix: mirror the existing interrupt-path pattern (L1989-1997) at all
three cleanup sites — if api_mode == "anthropic_messages", call
_anthropic_client.close() + _rebuild_anthropic_client() instead of
_replace_primary_openai_client(). _rebuild_anthropic_client() handles
both direct Anthropic and Bedrock-hosted Claude correctly, unlike the
inline build_anthropic_client() calls in open PR #14430.

PR #14430 (open) covers only the outer stale-detector site (L1977).
PR #23678 (open) covers only the inner retry sites (L1774, L1833).
This PR covers all three sites and uses _rebuild_anthropic_client()
for Bedrock parity.

Fixes #28161
2026-06-27 19:37:33 -07:00
xxxigm
6f1a176b33 fix(gateway/discord): REST liveness probe to detect zombie clients (#26656)
The Discord adapter could enter a silent zombie state after a network
outage / proxy stall: the process is alive, _client looks open, but the
underlying socket is dead. discord.py's WebSocket reconnect never sees a
RST through a wedged proxy/NAT, so client.start() spins forever without
exiting — which means the bot-task done callback (which only fires on
task completion) never trips either. The bot stays "offline" in Discord
until a manual `hermes gateway restart`. Reported offline for 13-17h.

Adds an out-of-band REST liveness probe in DiscordAdapter. Every
`discord.liveness_interval_seconds` (default 60s) the adapter issues a
cheap fetch_user(bot_id) — the same REST path as message delivery, so it
fails when the proxy/NAT is wedged. After
`discord.liveness_failure_threshold` consecutive failures (default 3) the
probe closes the wedged client and surfaces a retryable fatal error,
which trips the gateway's existing _platform_reconnect_watcher and
rebuilds the adapter. Operators disable it by setting either knob to 0.

Config lives in config.yaml (discord.liveness_*) per the .env-is-secrets
policy; _apply_yaml_config bridges it to internal env vars the adapter
reads, matching the existing HERMES_DISCORD_TEXT_BATCH_* pattern.

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-06-27 19:30:32 -07:00
teknium1
457c8a0a7c fix(file-ops): keep worktree isolation when restoring preserved cwd (#26211)
The durable _last_known_cwd anchor is keyed by the shared 'default' container,
so a non-owning worktree session could inherit the owning session's cwd through
it — breaking the wrong-worktree-routing fix (test_file_tools_cwd_resolution::
test_resolution_routes_to_resolving_sessions_worktree).

Reorder _authoritative_workspace_root so the session-specific registered cwd
override (keyed by raw session id) is checked BEFORE the shared-container
_last_known_cwd fallback. A non-owning session now resolves into its own
registered worktree; the durable anchor only fills in when there's no
session-specific override (the #26211 single-session case). Adds a regression
test covering the owner-mirrors-then-other-session-resolves interaction.
2026-06-27 19:29:06 -07:00
teknium1
b2faeba182 fix(file-ops): make preserved cwd reachable at write-time resolution (#26211)
Belt-and-suspenders on top of the cherry-picked cwd-preservation fix:

- Proactively mirror every live terminal cwd into _last_known_cwd on each
  successful read, so the durable anchor survives even when the cleanup
  thread pops both _file_ops_cache and _active_environments before
  _get_file_ops' stale-cache save branch can fire.
- Fall back to _last_known_cwd in _authoritative_workspace_root. write_file_tool
  resolves the path (via _resolve_path_for_task) BEFORE _get_file_ops rebuilds
  the env, so restoring only the rebuilt env's cwd was insufficient — the
  resolution that decides where the file lands runs first. This closes that gap.

The local env's persisted _cwd_file can't serve this role: it's keyed by a
random per-session uuid and deleted on cleanup (the same cleanup that triggers
the bug). The in-memory _last_known_cwd registry is the durable anchor instead.

Adds a real-IO E2E regression (TestSilentFileMisplacementE2E) exercising the
actual write_file_tool path after env cleanup.
2026-06-27 19:29:06 -07:00
zccyman
adeba1d7a8 fix(file-ops): preserve CWD across terminal environment re-creation (#26211)
Root cause: when the terminal environment (`_active_environments` entry) is
cleaned up and re-created during a long conversation, the new environment
always starts with the default config CWD (typically `~/.hermes/hermes-agent`)
instead of preserving the user's last-known working directory. Subsequent
relative-path writes (`write_file`, `execute_code`, shell commands) silently
land in the default CWD, making files appear to be "created but absent."

Fix: add `_last_known_cwd` dict that preserves the old environment's CWD
before the stale cache entry is invalidated. When a new environment is
created for the same task_id, we check `_last_known_cwd` first and use the
preserved CWD instead of the config default.

Changes:
- tools/file_tools.py: add `_last_known_cwd` dict, save CWD before stale
  cache invalidation, restore CWD on env recreation
- tests/tools/test_file_tools.py: add `TestLastKnownCwd` with 2 tests
  verifying CWD preservation and fallback behavior

Fixes #26211
2026-06-27 19:29:06 -07:00
teknium1
926a1b915d fix(tools): suppress transient check_fn flakes so subagents keep file/terminal tools
A flaky external probe in a tool's check_fn (e.g. check_terminal_requirements
running `docker version` with a 5s timeout, momentarily timing out under load)
would return False for a single get_tool_definitions() call. Because file
tools delegate their check_fn to the terminal check, that one flake silently
stripped read_file/write_file/patch/search_files AND terminal from whatever
agent was being constructed at that instant — most visibly a delegate_task
subagent, which then reported "Tool read_file does not exist". This explains
both the intermittent (~80% success) user-session failures and the
deterministic cron failures in #21658 / #5304.

The existing _check_fn TTL cache made this worse: it cached the transient
False for the full 30s window, poisoning every subagent spawned in that span.

Fix: remember the last time each check_fn returned True; when a fresh probe
fails within a short grace window of that success, treat it as a flake —
serve the last-good True and do NOT cache the failure (so the next call
re-probes). A failure with no recent success, or past the grace window, is
honored normally so a backend that genuinely went down stops advertising its
tools. Probe failures now log at WARNING regardless of quiet mode, making the
previously-silent tool loss diagnosable in subagent (quiet) sessions.

Co-authored-by: Stuart Horner <5261694+djstunami@users.noreply.github.com>
2026-06-27 19:29:00 -07:00
Shashwat Gokhe
505bc27d8d fix(gateway): classify mixed attachments per-attachment + transcode uncommon image formats
A document attached alongside an image in the same Discord message was
swept into the vision pipeline and 400'd the whole turn ("Could not
process image"), and was simultaneously never surfaced to the agent as a
readable file. Restores the "any file type works" contract for mixed
messages and fixes the HTTP 400.

Bug 1 — mixed attachments: the inbound routing loop keyed image/audio/video
classification off the message-level type (PHOTO/VOICE/AUDIO), so a doc in
a PHOTO message landed in image_paths and poisoned the vision call. The
document context-note path was gated on message_type == DOCUMENT, so that
same doc never reached the agent at all. Now classification is
per-attachment (trust each attachment's own MIME; fall back to the
message-level type only when MIME is unknown), via shared _event_media_is_*
helpers used by both _build_media_placeholder and the main inbound loop.
The document note now fires for any non-image/audio/video attachment
regardless of message-level type.

Bug 2 — uncommon formats: AVIF/HEIC/BMP/TIFF/ICO produced the same generic
400 because providers only accept PNG/JPEG/GIF/WEBP. image_routing now
transcodes those to PNG via Pillow before declaring media_type, skipping
cleanly (logged) if Pillow/plugins are missing. SVG is vector — Pillow
can't rasterize it — so it's skipped rather than transcoded.

Closes #25935.

Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: cypres0099 <74935762+cypres0099@users.noreply.github.com>
2026-06-27 19:26:04 -07:00
teknium1
0c372274cd fix(agent): disable OpenAI SDK auto-retry that double-fires inside the rate-limit loop
Same bug class as the Anthropic fix (#26293): the OpenAI/aggregator client is
built without max_retries, so the SDK default of 2 applies. The SDK's own 1-2s
backoff ignores Retry-After and retries inside hermes's outer conversation loop,
burning request slots against a rate-limited bucket. Set max_retries=0 at the
single create_openai_client chokepoint (covers init, switch_model, recovery,
restore, request-scoped). auxiliary_client builds its own clients and is not
wrapped by the loop, so it keeps SDK retries.
2026-06-27 19:23:15 -07:00
konsisumer
1ab35ba25d fix(anthropic): stop SDK auto-retry double-firing and raise Retry-After cap to 600s
The Anthropic SDK clients were built without max_retries, so the SDK
default (max_retries=2) retried 429/5xx with its own backoff that ignores
Retry-After — double-retrying inside hermes's outer loop and burning
request slots against a bucket that won't refill for minutes. Set
max_retries=0 on all Anthropic/AnthropicBedrock client constructions so
the outer conversation loop (which already honors Retry-After) owns retry.

Also raise the Retry-After cap in the conversation loop from 120s to 600s.
Anthropic Tier 1 input-token buckets reset in ~171s, so the 120s cap made
hermes retry before the reset window and re-trip the limit.

Refs #26293
2026-06-27 19:23:15 -07:00
LeonSGP43
32732a8f83 fix(agent): cap same-entry credential refreshes so fallback can activate (#26080)
A persistent upstream 401 on a single-entry OAuth pool (common for Claude
Max subscribers) made the credential-pool recovery spin forever:
try_refresh_current() re-mints a fresh token and reports success on every
401, so recover_with_credential_pool returned True and the retry loop
continue'd without ever incrementing retry_count or reaching the
auth-failover block. The configured fallback_model never activated and the
agent appeared to hang.

Cap consecutive successful same-entry refreshes (keyed by provider +
pool-entry id) at 2; once exceeded, treat the credential as unrecoverable
and return not-recovered so the loop falls through to
_try_activate_fallback. The 429/billing paths already rotate-or-fall-through
correctly (mark_exhausted_and_rotate returns None on a single entry), so
only the auth-refresh branch needed the cap.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-27 19:20:07 -07:00
Teknium
fae920642a fix(agent): throttle cross-turn fallback-switch replay storm (#24996) (#53909)
When every provider in the fallback chain fails non-retryably back-to-back
(e.g. HTTP 400/402/429 across distinct providers), the within-turn walk is
already bounded — _fallback_index advances monotonically and the loop aborts
when the chain exhausts. The damaging mode is cross-turn: restore_primary_
runtime resets _fallback_index=0 every turn, so a client that re-submits
immediately replays the entire chain, re-marshaling the full (potentially
80k-token) context once per provider every turn with no throttle on the
non-rate-limit path. On constrained hosts this exhausts memory/swap.

Rate-limit/billing failures already arm a 60s cooldown via _rate_limited_until;
the gap was the non-rate-limit case. Now, when the chain exhausts on a non-
rate-limit failure with a non-empty chain, arm a short (5s) cooldown on the
same _rate_limited_until gate (max(), never shrinking an existing window).
The next turn's restore stays gated and does NOT reset the index, so the
chain isn't replayed until the cooldown clears. No new state, no thread sleep,
no false-trip on legitimately long chains (those walk normally within a turn).

Tests: tests/run_agent/test_24996_fallback_exhaustion_cooldown.py
2026-06-27 19:15:40 -07:00
Chaz Dinkle
1dde7e2f2a fix(anthropic): adopt Claude Code's already-refreshed token before racing refresh
Claude Code OAuth refresh tokens are single-use; Claude Code refreshes on
its own schedule, so by the time Hermes notices an expired token Claude
Code may have already rotated it. Re-read live credential sources first and
adopt a valid token rather than POSTing a possibly-stale refresh token.

Ports the _refresh_oauth_token hardening from PR #40107 (chazmaniandinkle)
on top of the keychain/file reconciliation from PR #21112 (nodejun).
Adds AUTHOR_MAP entry for nodejun.
2026-06-27 19:14:43 -07:00
jun
5a5396aecb fix(anthropic): reconcile keychain/file credentials when one is expired
read_claude_code_credentials() previously returned the macOS Keychain
entry as soon as one existed, even if its OAuth token was already
expired. Callers then ran is_claude_code_token_valid() on the result
and got False, so resolve_anthropic_token() returned None — surfacing
the misleading 'No Anthropic credentials found' error even when
~/.claude/.credentials.json held a perfectly valid token.

Now reads both sources and prefers the non-expired one. When both are
valid (or both expired), prefers the later expiresAt so any subsequent
refresh uses the freshest refresh_token.

Adds TestReadClaudeCodeCredentialsDesync covering the four reconciliation
cases. The existing 'keychain wins' priority test still passes because
both fixtures share the same expiresAt and the tiebreaker is >=.
2026-06-27 19:14:43 -07:00
Teknium
db16854f34 fix(telegram): surface failed media downloads to user and agent, not a silent empty turn (#53912)
When a Telegram attachment download/cache fails (typically a transient
httpx.ConnectError to Telegram's CDN), the except handler logged a warning
and fell through to handle_message() with empty media and no text — the user
thought the file was delivered, the agent saw a content-less turn with no
signal an attachment was attempted, and the only record was a buried log line.

Adds _surface_media_cache_failure(): replies to the user in Telegram so they
know to retry, and appends an agent-visible notice to event.text via the
existing _append_observed_note channel so the agent knows an attachment was
attempted and failed. No new event fields (structured-event refactor is out
of scope per #23045). Wired into all five cache-failure sites — photo, voice,
audio, video, document — since they shared the identical silent fall-through.

Bug 1 from #23045 (unsupported types routed as fake user messages) no longer
exists on main: the document handler now accepts any file type, so there is no
rejection branch to fix.

Closes #23045
2026-06-27 19:12:57 -07:00
teknium1
6514be5a28 chore(release): add AUTHOR_MAP entry for linyubin (#50228 salvage) 2026-06-27 19:12:21 -07:00
teknium1
4133cd9fbf docs(infographic): eager fallback on persistent transport failures 2026-06-27 19:12:21 -07:00
linyubin
c946e6709f fix(agent): activate fallback on persistent transport failures (#22277)
Eager fallback previously fired only on rate_limit/billing. A stale-
detector-killed hung stream classifies as FailoverReason.timeout
(retryable=True) and the retry loop re-hit the same dead primary until
the budget exhausted -- 3 x ~180-300s stale kills compounding into a
15+ min silent hang while the configured fallback chain sat idle.

Extend the existing eager-fallback gate to also cover timeout and
overloaded, but only after one real retry (retry_count >= 2) so genuine
transient hiccups still recover on the primary. Reuses the same
pool-recovery guard and state-reset as the rate_limit branch -- no new
config flag, no change to the rate-limit intent.

Salvaged from PR #50228 by @linyubin. Closes #22277.

Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
2026-06-27 19:12:21 -07:00
bykim0119
851f75d4df fix(discord): honor "*" wildcard in DISCORD_ALLOWED_USERS (#22334)
DISCORD_ALLOWED_USERS="*" now means "allow everyone", matching the
SIGNAL_ALLOWED_USERS / DISCORD_ALLOWED_CHANNELS wildcard convention and
the value `claw migrate` emits. Previously _is_allowed_user did exact
ID matching only, so "*" matched no user and blocked every non-self
sender — a P1 with no workaround.

Three sites, all required for the fix to hold at runtime:
- _is_allowed_user: short-circuit when "*" is in the allowlist.
- connect(): exclude "*" from the intents.members trigger so the
  wildcard does not request the privileged Server Members intent
  (which can block the bot from coming online).
- _resolve_allowed_usernames: preserve "*" verbatim; otherwise it lands
  in the username-resolution bucket, matches no member, and is silently
  dropped from the set and env var on the first on_ready — quietly
  undoing the fix.

Slash auth delegates to _is_allowed_user (auto-covered); component auth
already honors "*" on main.
2026-06-27 19:11:30 -07:00
Teknium
1207d81eed fix(gateway): unify outbound chat redaction onto authoritative redactor (#23810) (#53907)
The gateway banner promises 'chat responses are scrubbed before delivery',
but _redact_gateway_user_facing_secrets used a divergent 6-pattern subset that
leaked credential shapes the comprehensive agent.redact catches — notably the
GitHub fine-grained PAT (github_pat_...) and the Telegram bot-token shape
(bot<digits>:<token>), the gateway's own credential type.

_redact_gateway_user_facing_secrets now delegates to
agent.redact.redact_sensitive_text(force=True) — the same Tirith-grade redactor
already applied to logs, tool output, and approval-command prompts — so the
outbound LLM-response path (final_response -> _sanitize_gateway_final_response)
masks the full credential set. The narrow local pattern set is kept as a
fail-soft second pass. force=True honors redaction even when
security.redact_secrets is off, matching _redact_approval_command.

Test: regression guard parametrizing all 5 issue shapes x every chat surface;
asserts secret body never reaches the user and surrounding prose survives. The
existing bearer-token test's marker assertion is loosened from the literal
'[REDACTED]' to mask-agnostic (the redactor masks as '***'/partial) — it
asserts the security invariant, not the implementation's mask string.
2026-06-27 19:09:41 -07:00
LeonSGP43
c56b39c11e fix(auxiliary): fall back to OPENROUTER_API_KEY when credential pool exhausted
_try_openrouter() returned (None, None) whenever an OpenRouter credential
pool existed but was exhausted (_select_pool_entry -> (True, None)), making
the OPENROUTER_API_KEY env-var fallback unreachable. Auxiliary tasks
(compression, vision, web_extract) silently failed even with a valid env key.

Now the pool-present branch only returns early when it successfully builds a
client; an exhausted pool falls through to the env-var path. The final
failure (pool exhausted AND no env var) still marks the provider unhealthy.

Fixes #23452.

Co-authored-by: ambition0802 <noreply@github.com>
2026-06-27 19:09:27 -07:00
qWaitCrypto
46e18804ad fix(auxiliary): fall back on 401 auth errors in auto mode (#21165)
When the primary provider returns 401 and the auth-refresh path is
unavailable or fails, both call_llm() and async_call_llm() reached the
should_fallback gate without _is_auth_error in the condition, so the
auxiliary task (e.g. compression) was dropped silently — losing message
history. Add _is_auth_error to should_fallback (NOT is_capacity_error) in
both sync and async paths, plus an 'auth error' reason branch.

Auth stays a non-capacity error: it falls back in auto mode via the
is_auto gate, but on an explicitly-configured provider it still respects
the user's choice and raises rather than silently switching providers.
2026-06-27 19:07:04 -07:00
Teknium
1a570dae00 fix(image-routing): unblock message queue on OpenRouter 'no endpoints' image 404 (#53901)
The agent's image-rejection fallback strips images and retries text-only when
a provider rejects image content, which is what lets the gateway drain its
queued messages. The fallback only fires on a hardcoded phrase list, and the
OpenRouter wording — HTTP 404 'No endpoints found that support image input' —
was missing. For OpenRouter-routed non-vision models the fallback never fired,
the retry loop re-sent the same rejected request until exhaustion, and every
subsequent message (including plain text) stayed queued behind the stuck turn.

Add the phrase to _IMAGE_REJECTION_PHRASES (the 404 already passes the 4xx
gate). Add a positive test and a guard test so the sibling OpenRouter
'no endpoints ... data policy / guardrail' 404s do NOT get their images
stripped.

Fixes #21160. Reported by @liu14goal14-ux; PR #21198 by @ygd58.
2026-06-27 19:07:02 -07:00
Teknium
a94f657a50 fix(tui): route completion RPCs to the pool so they can't freeze the TUI (#53895)
complete.path and complete.slash ran inline on the tui_gateway stdin
reader thread. complete.path spawns git ls-files and fuzzy-ranks the
whole repo; complete.slash does first-call prompt_toolkit imports plus a
skill-dir scan. While either ran, prompt.submit / session.interrupt sat
unread in the stdin pipe, freezing the TUI until the 120s RPC timeout
fired — most reliably reproduced by typing @ on a large repo / WSL2 mount.

Add both to _LONG_HANDLERS so completion runs on the existing thread
pool (write_json is already _stdout_lock-guarded). Root-cause fix:
covers any slow completion, not just the bare-@ trigger.

Fixes #21123
2026-06-27 19:06:01 -07:00
teknium1
ccf526964a fix(gateway): bound adapter teardown awaits on the stop path (#14128)
The main stop loop in _stop_impl() awaited adapter.cancel_background_tasks()
and adapter.disconnect() with no timeout, for both the primary and the
secondary-profile (multiplex) adapter maps. A half-dead platform — a wedged
Feishu/Lark WebSocket thread blocked on network I/O is the reported case —
makes one of those awaits block forever, so the process never exits. systemd
then SIGKILLs it after TimeoutStopSec, skipping atexit PID-file cleanup, and
the next start dies with 'PID file race lost' and enters a restart loop.

The per-adapter timeout infra already existed on main
(_adapter_disconnect_timeout_secs / HERMES_GATEWAY_ADAPTER_DISCONNECT_TIMEOUT,
default 5s) but was only wired into _safe_adapter_disconnect, which the
teardown path never calls.

Add _bounded_adapter_teardown(): wraps BOTH cancel_background_tasks() and
disconnect() in the existing timeout budget, logs and forces forward progress
on timeout, and never raises. Both teardown loops now route through it, so the
stop sequence always completes regardless of any adapter's internal behavior
and PID-file cleanup runs.

Original report + fix direction by @happy5318 (#14128, #14130); this widens it
to cover cancel_background_tasks(), the multiplex loop, and the config knob.

Co-authored-by: happy5318 <happy5318@users.noreply.github.com>
2026-06-27 19:05:04 -07:00
Teknium
6717cfc805 docs(gateway): warn against custom ExecStopPost kill drop-in (restart loop) (#53903)
A user-added systemd drop-in like ExecStopPost=/bin/kill -9 $MAINPID fires
on every stop, including clean restarts — it SIGKILLs the freshly spawned
gateway before it stabilizes and Restart=always respawns it, producing an
infinite restart loop (issue #23272). The unit Hermes installs already shuts
down cleanly via KillMode=mixed + KillSignal=SIGTERM with Restart=always +
RestartForceExitStatus, so no extra kill is needed. Document this as a danger
callout in the gateway service-management section.
2026-06-27 19:04:29 -07:00
teknium1
ea8facee81 chore(release): add konsisumer to AUTHOR_MAP for PR #19608 salvage 2026-06-27 19:01:37 -07:00
konsisumer
8b4c29f0f0 fix(auth): preserve concurrently-added credentials on pool rewrite 2026-06-27 19:01:37 -07:00
Teknium
163cb24d45 feat(moa): render reference-model blocks in TUI and desktop, not just CLI (#53855)
The MoA reference-block display (each reference model's output shown as a
labelled thinking block before the aggregator responds) previously existed
only in the classic CLI. The facade already emits moa.reference / moa.aggregating
through tool_progress_callback; this wires the TUI and desktop consumers.

- tui_gateway/server.py: _on_tool_progress relays moa.reference (label / text /
  index / count) and moa.aggregating to the Ink/desktop client as their own
  events.
- ui-tui: gatewayTypes adds the two event shapes; createGatewayEventHandler
  routes them; turnController.recordMoaReference pushes a committed
  thinking-style segment tagged with the source model. Shown regardless of
  showReasoning — references ARE the mixture-of-agents process the user opted
  into, not ordinary reasoning. moa.aggregating is a status-only transition
  (no transcript entry).
- apps/desktop: use-message-stream appends each reference as a labelled
  reasoning chunk via the existing reasoning disclosure; GatewayEventPayload
  gains label/index/aggregator.

Tests: tui_gateway emit (3), Ink handler render + showReasoning-independence +
aggregating-no-segment (3). TUI typecheck/lint clean; desktop typecheck/lint
clean.
2026-06-27 18:46:20 -07:00
Teknium
d3d621f7c3 revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853)
* Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)"

This reverts commit 2ecca1e7d3.

* Revert "fix(windows): stop terminal-window popups from background spawns (#53810)"

This reverts commit 5db1430af9.

* Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)"

This reverts commit ef17cd204d.
2026-06-27 15:59:00 -07:00
Teknium
1d32e5d98c fix(gateway): relay _thinking bubbles when thinking_progress is on but tool_progress is off (#53849)
display.thinking_progress is documented as independent of tool_progress —
users can keep tool progress quiet while opting into mid-turn assistant
scratch-text bubbles. But two gates were keyed on tool_progress_enabled alone,
so with tool_progress:off the _thinking relay was silently dead even when
thinking_progress:true:

1. agent.tool_progress_callback was set to None unless tool_progress_enabled,
   so the callback that queues _thinking text never fired.
2. The send_progress_messages drain task was only started when
   tool_progress_enabled, so even queued messages had no consumer.

Both now gate on needs_progress_queue (tool_progress OR thinking_progress) —
the same condition that already decides whether to create the progress queue
at all. No effect when both are off (queue is None) or when tool_progress is
on (unchanged).

Tests: _thinking relays with thinking_progress:on/tool_progress:off, and is
suppressed when thinking_progress:off. Full progress-topics suite: 35 pass.
2026-06-27 15:48:20 -07:00
Teknium
2ecca1e7d3 fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)
Follow-up to #53791 addressing review feedback: the footgun checker treated
capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a
Windows console. That invariant is false — stream redirection controls where a
child's output goes, not whether a console is allocated. From a console-less
parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem
child still flashes a window even when fully captured.

- check-windows-footguns.py: capture/redirect/check_output is no longer a blanket
  safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/
  docker/powershell/…); calls to those are flagged even when captured. Non-flashing
  programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/
  popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW).
- Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the
  _subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the
  durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional).
- Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.
2026-06-27 14:49:41 -07:00
Teknium
3ac96d3308 fix(moa): resolve auxiliary tasks to the aggregator, not the preset name (#53827)
On a MoA session, auxiliary tasks (title generation, compression, vision, …)
ran through _resolve_auto with provider='moa' / model='<preset>', which sent
the preset name (e.g. 'opus-gpt') as the model id to resolve_provider_client —
producing 'HTTP 400: opus-gpt is not a valid model ID' on every turn (visible
as the title-generation warning).

MoA is a virtual provider with no real HTTP endpoint; aux tasks don't need the
reference fan-out. _resolve_auto now resolves a 'moa' main provider to the
preset's aggregator slot (its acting model) and continues Step 1 with that real
provider+model, dropping the virtual moa://local base_url + placeholder key so
the aggregator resolves via its own provider credentials. Mirrors the MoA
context-length resolution.

Verified live: a MoA turn no longer emits the 'not a valid model ID' warning.
Test: tests/agent/test_auxiliary_main_first.py (19 pass).
2026-06-27 14:21:26 -07:00
Gille
e7bb67332d fix(moa): preserve Codex slot routing 2026-06-27 14:20:51 -07:00
Gille
66aeda3550 fix(moa): keep virtual provider on MoA client 2026-06-27 14:20:51 -07:00
brooklyn!
5db1430af9 fix(windows): stop terminal-window popups from background spawns (#53810)
* fix(windows): stop terminal-window popups from background spawns

Native-Windows desktop/gateway users saw cmd/conhost windows flash on
gateway restart, image paste, the dashboard Projects tree, voice notes,
and ~5 min after closing the app (detached cron). Two root causes:

- Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist,
  agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw
  subprocess allocate a fresh console when the launching process has
  none (pythonw desktop backend / detached gateway) - even with output
  captured.
- uv venv pythonw shims re-exec console python.exe, so Python children
  get a console regardless of how they're launched.

Fixes:
- Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs
  CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned
  console-exe spawn through it.
- FreeConsole() catch-all in hermes_bootstrap: any Python child that
  exclusively owns an auto-allocated console detaches it at startup
  (GetConsoleProcessList()==1 gate leaves shared interactive consoles
  untouched).
- Replace PowerShell/wmic gateway PID scans with in-process psutil.
- Skip schtasks queries on non-interactive desktop restarts.
- Prefer native agent-browser .exe over .cmd shims.
- Guard test bans raw subprocess spawns of the Windows-only console
  tools repo-wide so the popup class can't regress.

* fix(windows): scope FreeConsole to background entry points; fix merge fallout

Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't
tell a uv pythonw->python phantom console apart from a user opening the
interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) —
both report a single attached process with a tty. Running FreeConsole() in the
import-time bootstrap therefore risked detaching a legitimately-interactive
terminal.

- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console();
  remove it from apply_windows_utf8_bootstrap() (import side effect).
- Call it only from known background mains: gateway run, dashboard backend
  (start_server, what the desktop spawns), cron standalone, tui_gateway entry,
  slash worker. Interactive CLI/TUI never calls it.
- Behavior-contract tests: frees only when solo owner, leaves shared console,
  no-op without console / on POSIX, and asserts it's not an import side effect.

Merge fallout from origin/main (#53791):
- local.py: 3-way merge left a dangling **_popen_kwargs (NameError crashing
  every terminal init). _subprocess_compat.popen already hides the window, so
  drop it.
- discord adapter: merge stacked an undefined windows_hide_flags() onto the
  primitive call; drop the redundant arg.
- test_gateway: scan now goes psutil-first (zero spawn); rewrite the
  case-variant test to drive that production path.

* test(claw): mock _subprocess_compat.run seam for Windows process scan

claw.py's Windows tasklist/powershell scan routes through the hidden-spawn
primitive; the tests still patched claw_mod.subprocess, so on win32 the mock
was never hit and real spawns returned nothing. Patch the actual seam.
2026-06-27 14:02:24 -07:00
Teknium
ef17cd204d fix(windows): stop subprocess console-window popups + add CI guard (#53791)
* fix(windows): stop subprocess console-window popups + add CI guard

The single biggest source of Windows 'terminal popup' bug reports was bare
subprocess.run/Popen calls spawning a console window. The compat helpers
(windows_hide_flags / windows_detach_popen_kwargs) already existed but the
footgun checker had no rule to stop new bare calls from reintroducing the flash.

- scripts/check-windows-footguns.py: new AST-based rule flagging subprocess
  calls that can create a new console — output-redirection-aware (capture/
  redirect/check_output exempt) and POSIX-only-program-aware (launchctl/
  systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation
  burden on calls that can't flash.
- Swept all genuine window-spawning sites through windows_hide_flags()/
  windows_detach_popen_kwargs(); marked intentionally-visible launches
  (editor/terminal/foreground re-exec) with '# windows-footgun: ok'.
- tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract
  tests + full-repo cleanliness invariant.
- CONTRIBUTING.md: documents the rule + the helper pattern.

* test: accept creationflags kwarg in psutil_android fake_subprocess_run

The Windows no-window sweep added creationflags=windows_hide_flags() to
install_psutil_android.py's subprocess.run call; the test's fake stub had a
fixed (cmd) signature and raised TypeError on the new kwarg.
2026-06-27 13:03:51 -07:00
Teknium
3b44a3c8bb feat(moa): show each reference model's output as a labelled block before the aggregator (#53793)
When a MoA preset is selected, each reference model's answer now renders in the
CLI as a thinking-style block labelled with its source model, BEFORE the
aggregator responds — so the mixture-of-agents process is visible instead of a
silent pause. The aggregator's response (and its tool actions) follow as normal.

Mechanism (shared seam, all surfaces):
- MoAChatCompletions/MoAClient take an optional reference_callback and emit
  'moa.reference' (index/count/label/text) per reference, then 'moa.aggregating'
  (aggregator label) once. agent_init wires this to the agent's
  tool_progress_callback, which every surface already consumes — so the events
  reach CLI/TUI/desktop/gateway with no new plumbing.
- CLI _on_tool_progress renders 'moa.reference' as a labelled '┊ ◇ Reference
  i/n — <model>' header + a thinking-style preview (reusing _emit_reasoning_
  preview), and 'moa.aggregating' as a spinner transition. Display-only; never
  touches message history (cache-safe).

Turn-scoped reference cache: the agent loop calls the facade once per tool-loop
iteration, but the advisory message view is identical across iterations within a
turn, so references are now run AND displayed once per user turn (keyed by the
advisory view's signature) instead of re-running/re-spamming on every iteration.
This also cuts reference API cost from O(iterations) back to O(turns).

Verified live via interactive PTY on the opus-gpt preset (gpt-5.5 + opus refs):
reference blocks render once per turn, labelled by model, before the aggregator;
fresh blocks on each new turn; aggregator tool actions still execute.

Follow-up: TUI/desktop rich rendering + gateway batched-summary already receive
the events via tool_progress_callback; their surface-specific renderers are a
separate change.
2026-06-27 12:45:23 -07:00
Dale Nguyen
dbbf102b8e fix(terminal): strip VIRTUAL_ENV/CONDA_PREFIX from terminal subprocess env
The Hermes gateway runs inside its own venv, so its process environment
carries VIRTUAL_ENV (and possibly CONDA_PREFIX). The terminal tool spawned
subprocesses inheriting those markers. When the agent ran `uv sync`,
`uv pip install`, `poetry install`, etc. in ANY other project directory,
those tools honored the inherited VIRTUAL_ENV and rebuilt/synced that
project's dependencies into the Hermes venv path — wiping Hermes' own runtime
deps (and, when the other project pinned a different Python, replacing the
interpreter), bricking the gateway on the next restart (#23473).

Strip VIRTUAL_ENV/CONDA_PREFIX in both subprocess-env construction points in
tools/environments/local.py — `_sanitize_subprocess_env` and `_make_run_env`
— via a shared `_ACTIVE_VENV_MARKER_VARS` constant. The Hermes venv stays
reachable because its bin dir is already first on PATH, so removing the
active-environment markers is safe and only prevents the cross-project clobber.

Adds TestActiveVenvMarkerStripping: end-to-end (markers in os.environ don't
reach the spawned subprocess) and unit coverage for both functions, plus a
guard on the marker constant.

Also adds the AUTHOR_MAP entry for the salvaged contributor.

Closes #23473
2026-06-28 01:04:20 +05:30
Teknium
d470ed0c4c fix(cli): commit tool scrollback lines in verbose mode (non-streaming/MoA) (#53785)
In the interactive CLI, the aggregator's tool calls under a MoA preset (or
any non-streaming model call, e.g. copilot-acp) appeared to overwrite each
other instead of building scrollable history. Each tool only updated the
transient spinner line; no committed scrollback line was printed.

Root cause: persistent tool lines in _on_tool_progress's tool.completed
branch were gated on tool_progress_mode in {all, new}, omitting 'verbose'.
Streaming models hid the bug because _on_tool_gen_start commits a 'preparing'
line per tool during streaming; non-streaming calls (MoA forces
_use_streaming=False) never emit that, so under 'verbose' there was no
committed line at all — only the self-overwriting spinner.

'verbose' is strictly more than 'all', so it now commits the same scrollback
line. Verified live via interactive PTY on the MoA opus-gpt preset: three
terminal calls in turn 1 and two in turn 2 each render as separate persistent
lines.
2026-06-27 12:29:55 -07:00
Teknium
227e6c0143 fix(moa): resolve context window from the aggregator, not the 256K default (#53780)
A MoA session's model is the preset name (e.g. 'opus-gpt') and its base_url is
the virtual local endpoint, so get_model_context_length() missed every probe
and fell through to the 256K fallback — even when the aggregator is a 1M-context
model. The acting model in MoA IS the aggregator, so resolve the context window
from the aggregator slot's real provider+model.

- model_metadata.get_model_context_length: when provider=='moa', resolve the
  preset's aggregator slot through resolve_runtime_provider and recurse with the
  aggregator's real provider/model/base_url. Explicit model.context_length still
  wins (checked first); falls through to the generic default if resolution fails.

Tests: opus-gpt preset now reports 1M (the aggregator window), config override
still honored.
2026-06-27 12:08:09 -07:00
ailthrim
25ec01f79f fix(desktop): don't purge Electron cache / mirror-retry after a late build failure
`hermes desktop` / `hermes update` recover from a corrupt Electron download by
purging the cached zip + re-downloading and retrying the pack, and then by
falling back to a public mirror. That recovery is only meaningful when the
packaged executable is MISSING — the signature of a partial/corrupt unpack.

A LATE failure such as macOS code signing (#40187) leaves
`Hermes.app/Contents/MacOS/Hermes` (or the platform equivalent) in place.
Re-downloading Electron can't repair a signing failure, so the purge +
slow mirror retry just grind through another identical failure before the
build finally errors out.

Gate both recovery blocks on `_desktop_packaged_executable(desktop_dir) is None`
so a build that already produced the executable fails fast instead of
triggering the destructive download recovery. The corrupt-download path
(executable missing) is unchanged.

Salvage of #42782, re-applied onto current main (the surrounding recovery was
refactored to `_electron_dist_ok` / `_redownload_electron_dist` since the PR
was opened). Adds a regression test asserting no purge / mirror retry runs when
the executable exists, and updates the existing retry/mirror tests to model the
corrupt-download case (executable absent) the recovery is actually for.

Related to #40187 (the residual cache-purge sub-issue; the signing failure
itself is fixed by #52591).
2026-06-28 00:29:34 +05:30
teknium1
1ef19bad90 fix(model): show MoA preset picker on selection and label MoA in the banner
Selecting 'Mixture of Agents' in the `hermes model` provider picker fell
through silently — select_provider_and_model had no moa branch, so it just
reprinted the current model/provider summary and exited. And the CLI session
banner rendered the bare preset name (e.g. 'opus-gpt · Nous Research'),
which is meaningless out of context.

- Add _model_flow_moa: always lists the available presets (even one), then
  prints the full reference-models + aggregator breakdown for the selection
  and persists model.provider=moa / model.default=<preset> (dropping stale
  base_url + endpoint creds, since moa is a virtual local provider).
- Wire the branch into select_provider_and_model.
- build_welcome_banner takes provider; when 'moa' it renders
  'MoA: <preset> · agg <aggregator>' instead of a bare slug. Both CLI call
  sites pass self.provider.

Tests: 2 new banner tests (moa + non-moa unchanged); E2E verified the picker
persists the preset and clears stale base_url/api_key.
2026-06-27 11:45:07 -07:00
konsisumer
1b6ebb24c0 fix(agent): validate OpenRouter provider sort before request dispatch 2026-06-27 11:43:08 -07:00
Teknium
27322612b4 fix(update): route loud build/installer output to update.log instead of the terminal (#53616)
* fix(update): route loud build/installer output to update.log instead of the terminal

hermes update flooded the terminal with the full vite asset dump,
electron-builder logs, npm deprecation warnings from the desktop build,
and the cua-driver installer's 'Next steps' wall. All of that is
low-signal noise the user doesn't need on a successful update.

- Capture the desktop --build-only subprocess (vite + electron-builder)
  into ~/.hermes/logs/update.log; print a one-line status, and on
  failure surface the last 15 lines + a pointer to the full log.
- Capture the cua-driver installer's output when verbose=False (the
  hermes update refresh path); concise upgrade line is unchanged.
- Add _log_only_write() / _run_logged_subprocess() helpers that write to
  the update.log handle without echoing to the terminal.

The repo-root npm install keeps streaming (capture_output=False) — that
is the deliberate #18840 guard so a slow postinstall download doesn't
look hung. The desktop npm install is a separate Electron process with
no such progress concern and is captured.

* fix(update): persist full cua-driver installer output to update.log

The captured cua-driver installer output was only sent to logger.debug
(agent.log) on failure, so the 'Next steps' wall was lost from
update.log entirely on success. Write the full captured output straight
to the update.log handle (sys.stdout._log) on both success and failure,
matching the desktop-build capture, so update.log keeps the complete
record of everything an update did.
2026-06-27 11:43:01 -07:00
ethernet
f53b184c48 fix(ci): pass secrets down to docker workflows 2026-06-27 09:53:28 -07:00
Teknium
190e1ffac9 fix(redact): mask passwords in lowercase/dotted config keys (#53590)
The secret redactor only matched uppercase env-style keys ([A-Z0-9_]),
so config-file assignments like spring.datasource.password=secret,
app.api.key=xyz, and YAML password: secret leaked verbatim when the
agent ran cat/grep on application.properties or .env files (issue #16413).

Adds three case-insensitive config-key matchers that run only in a
config-file context, preserving the existing #4367 (lowercase code/prose)
and web-URL-passthrough carve-outs:
  - _CFG_DOTTED_RE: namespaced keys (contain a dot) — unambiguously config
  - _CFG_ANCHORED_RE: bare secret-word keys at line start (incl. export)
  - _YAML_ASSIGN_RE: unquoted colon config (password: value)
Value capture stops at whitespace and '&' so form bodies stay pair-wise;
the '://' guard keeps intentional web-URL query-param passthrough intact.

Reported-by: Murtaza1211
2026-06-27 04:43:28 -07:00
Teknium
917f6bdb00 fix(tools): let vision pick any provider+model, not just OpenRouter (#53606)
* fix(tools): let vision pick any provider+model, not just OpenRouter

hermes tools → configure → vision no longer forces an OPENROUTER_API_KEY.
It now offers the same any-provider surface as the model command: Auto
(use main model / aggregator fallback), pick any authenticated provider +
model, or a custom OpenAI-compatible endpoint. Selections persist to
auxiliary.vision.{provider,model,base_url} — the keys the vision resolver
already reads. Custom endpoint pins provider=custom so base_url routes
correctly. Reconfigure path uses the same picker instead of re-prompting
for OPENROUTER_API_KEY.

* docs: add PR infographic for vision any-provider picker
2026-06-27 04:41:42 -07:00
Brandon Zarnitz
9c81c938d3 fix(approval): honour tirith_fail_open=false on Tirith ImportError (#20733)
check_all_command_guards() swallowed ImportError from tools.tirith_security
with an unconditional pass, leaving tirith_result["action"] as "allow"
regardless of security.tirith_fail_open.  When an operator sets
tirith_fail_open: false they have explicitly opted into fail-closed
behaviour; a missing or broken Tirith module must not silently permit
command execution.

Inside the except ImportError handler, read the live security config.
When tirith_enabled is true and tirith_fail_open is false, synthesise a
"warn"-action Tirith result so the command flows through the normal
approval path (prompt the user, or block in cron/gateway contexts)
instead of bypassing it.  The default tirith_fail_open: true behaviour
is unchanged.

Adds three regression tests to tests/tools/test_approval.py:
- fail_open=true  + ImportError → silently allowed (no regression)
- fail_open=false + ImportError → approval callback invoked, command denied
- tirith_enabled=false           → always allowed regardless of fail_open

Fixes #20733

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# Conflicts:
#	tests/tools/test_approval.py
2026-06-27 04:41:24 -07:00
Teknium
fe1c1c1121 fix(session_search): demote cron below interactive sessions in discover ranking (#53597)
Cron jobs accumulate large volumes of repetitive vocabulary (recurring
project names, dates, summaries) and out-number a user's interactive
sessions. Under bare BM25 they dominate the top FTS rows, so discover's
early-exit-at-N dedup collects only cron sessions and the user's own
conversations never surface — "recall blindness" (#19434).

- _order_for_recall() stable-sorts FTS rows so interactive sources rank
  above cron before lineage dedup; within each class BM25/recency order
  is preserved. Cron is demoted, not excluded, so it still surfaces when
  it is the only match.
- raise discover scan limit 50 -> 300 so buried interactive matches are
  in hand for the demotion pass.

Fixes the cron-flooding sub-bug of #19434. The split-brain sub-bug is
covered by #52798; the child-session sub-bug is superseded by in-place
compaction.
2026-06-27 04:41:22 -07:00
Teknium
cd592c105c feat(send_message): native WhatsApp media delivery via Baileys bridge (#53598)
send_message with MEDIA:/path to a WhatsApp target previously dropped the
attachment: the WhatsApp branch never passed media_files, the plugin's
_standalone_send accepted the param but only POSTed text, and WhatsApp was
absent from the media-supported platform list.

- send_message_tool: add a Platform.WHATSAPP media block (mirrors Feishu) that
  routes media_files through the whatsapp plugin's standalone_sender_fn, and
  add whatsapp to the supported-media list strings.
- whatsapp adapter: _standalone_send now sends text first (skipped when the
  chunk is media-only), then uploads each file via the bridge /send-media
  endpoint with a mediaType derived from extension/is_voice/force_document, so
  images/videos/voice arrive as native bubbles instead of documents.
- _bridge_media_type classifier maps ext -> image|video|audio|document.

Closes #19105 (remaining send_message gap). Other items in the report
(inbound video paths, image_generate auto-deliver, history dedup, native
gateway bubbles) already landed on main.
2026-06-27 04:40:05 -07:00
Teknium
88c02469cc fix(mcp): never permanently wedge the circuit breaker on a dead transport (#53599)
A long-running gateway session could permanently lose an MCP server: once a
stdio subprocess died (or transient drops accumulated over the session), the
run loop exhausted its reconnect budget and returned, orphaning the task. With
no listener for _reconnect_event, the circuit breaker's half-open probe could
never revive the server — every probe hit a dead/absent session, re-armed the
60s cooldown, and looped forever until a full gateway restart (#16788).

Root cause was split ownership of transport liveness between the run loop and
the tool handler, plus a permanent give-up path. Fixed by one invariant: a
non-shutdown server task is always reconnectable.

- run loop parks (deregisters phantom tools, then awaits _reconnect_event)
  instead of returning when the reconnect budget is exhausted, so the task
  stays alive as a dormant listener
- retry budget resets on every successful (re)connect, so a healthy
  long-lived server can't accumulate lifetime drops into a death sentence
- half-open probe with no live session signals a reconnect (reviving a
  parked/dead task and respawning a dead stdio subprocess) and returns a
  clean 'reconnecting' error instead of writing into a dead pipe
- breaker resets on successful session init across all transports
  (stdio/HTTP/SSE) — fully transport-agnostic, no PID/pipe polling

Builds on the closed-PR cluster for this issue: keeps #49255's deregister-on-
exhaustion insight and #21006's signal-don't-probe insight, discards the racy
os.kill PID machinery.

Co-authored-by: LeonSGP43 <LeonSGP43@users.noreply.github.com>
Co-authored-by: srojk34 <srojk34@users.noreply.github.com>
2026-06-27 04:39:54 -07:00
r266-tech
dbc925b755 Guard oversized Telegram video downloads 2026-06-27 04:39:48 -07:00
Teknium
02b32e2d7c fix(moa): call reference + aggregator models through their provider's real route (#53580)
MoA was calling reference and aggregator models through a bare
call_llm(provider=slot["provider"], model=slot["model"]) with a forced
temperature and a forced max_tokens (the preset's hardcoded 4096). That left
base_url/api_key/api_mode unresolved — so the auxiliary auto-detector guessed
the API surface instead of using the provider's real runtime, and the 4096 cap
truncated long aggregator syntheses.

A MoA slot is just a model selection and must be called the same way any model
is called elsewhere. Each slot is now resolved through resolve_runtime_provider
(the canonical provider→api_mode/base_url/api_key resolver the CLI, gateway, and
delegate_task all use) via a new _slot_runtime() helper, and the resolved
endpoint is passed into call_llm. So a reference/aggregator gets its provider's
actual API surface — MiniMax → anthropic_messages, GPT-5/o-series →
max_completion_tokens, custom endpoints → their base_url — identical to how that
model is handled as the acting model.

MoA also no longer imposes its own output cap: max_tokens defaults to None
(omitted → the model's real maximum) for references and is passed through from
the caller for the aggregator. The preset's hardcoded 4096 is gone. The
max_tokens preset config field is left in place (config/web/desktop unchanged);
it is simply no longer applied as a forced cap.

Tests: slots route through resolve_runtime_provider with resolved base_url/
api_key; resolution errors fall back to bare provider/model; neither call
carries an output cap even when the preset config still contains max_tokens.
2026-06-27 04:39:42 -07:00
herbalizer404
3fe16e3cd5 fix(fallback): attach credential pool after provider switch
When automatic fallback activates a provider that differs from the
primary, try_activate_fallback() cleared the primary's pool (to avoid
cross-provider base_url contamination, #33163) but never loaded the
fallback provider's own pool. The fallback then ran with no pool, so
rate_limit/billing/auth recovery couldn't rotate its credentials.

After clearing a mismatched pool, load_pool(fb_provider) and attach it
when it has credentials, so provider-specific rotation continues to
work on the fallback target.
2026-06-27 04:39:26 -07:00
Tranquil-Flow
635841d210 fix(agent): reload credential pool on switch_model provider change (#52727)
switch_model() swapped model/provider/base_url/api_key but never
refreshed agent._credential_pool, which stays bound to the original
provider. recover_with_credential_pool() then sees a pool.provider !=
agent.provider mismatch and short-circuits — so a 429/401 on the new
provider gets no rotation and falls through to fallback instead.

Reload load_pool(new_provider) inside switch_model when the provider
changes (or the pool is missing). The reload is inside the protected
swap block and the pool is added to the rollback snapshot, so a failed
client rebuild restores the original pool.

Fixes #16678, #52727.
2026-06-27 04:39:26 -07:00
Teknium
2002bb49a7 test(telegram): make config-bridge tests immune to ambient .env pollution (#53594)
test_config_bridges_telegram_group_settings and
test_config_bridges_telegram_user_allowlists asserted the YAML→env bridge
via os.environ. A developer's real ~/.hermes/.env can repopulate TELEGRAM_*
vars during load_gateway_config(): the microsoft_teams plugin runs
load_dotenv(find_dotenv(usecwd=True)) at import time, which walks up from the
cwd (under ~/.hermes/ in worktrees) and reloads the user's .env, defeating the
env-over-YAML bridge for any key present there (e.g. TELEGRAM_GROUP_ALLOWED_CHATS).

Assert the returned PlatformConfig.extra instead — it is parsed straight from
the test's config.yaml and is immune to that ambient leak. free_response_chats
is bridged to the env var only (not extra), and TELEGRAM_FREE_RESPONSE_CHATS
doesn't appear in developer .env files, so it stays a deterministic os.environ
assertion.
2026-06-27 04:36:45 -07:00
Teknium
d4c2217e87 fix(gateway): offload /model switch off the event loop (#53603)
The Telegram/Discord /model command's actual switch calls switch_model()
directly on the asyncio event loop. switch_model() can fall through to a
synchronous models.dev HTTP fetch (requests.get, 15s timeout) on a cold or
expired cache, freezing the gateway for up to 15s and dropping the Telegram
connection while a user switches models.

The picker provider-list and fallback text-list sites were already offloaded
(#41289), but the two _switch_model() calls — the picker callback and the
direct /model <name> path — were not. Wrap both in asyncio.to_thread.

Closes #20525.
2026-06-27 04:36:22 -07:00
Teknium
caf4dcc7ad fix(whatsapp): resolve phone↔LID aliases in adapter DM/group allowlist (#53588)
The adapter-level intake gate (_is_dm_allowed / _is_group_allowed, reached
via _should_process_message) did a raw set-membership check against the
configured allowlist. WhatsApp now delivers inbound DM senders in LID form
(<id>@lid) while operators configure allowlists with phone numbers, so the
check never matched and every DM from an allowed contact was silently
dropped before the gateway authz layer ran.

Route both gates through the existing gateway.whatsapp_identity.
expand_whatsapp_aliases helper (already used by gateway authz and session
keys), which walks the bridge's lid-mapping-*.json session files. Phone and
LID forms now resolve to each other in both directions; exact JID matches,
wildcard, disabled/open policies, and empty-allowlist fail-closed behavior
are all preserved.

Fixes #14486
2026-06-27 04:17:12 -07:00
teknium1
38e7bd8a08 fix(agent): classify 429 'overloaded' bodies as overloaded, not rate_limit
Z.AI / Zhipu reuse HTTP 429 for server-wide overload. The 429 status
path classified these unconditionally as rate_limit with
should_rotate_credential=True, so an overloaded provider exhausted the
credential pool after two errors — fatal for a single-key user, who has
nothing to rotate to.

The credential is valid; the server is just busy. Disambiguate the 429
body against a shared _OVERLOADED_PATTERNS list and route overload
language to FailoverReason.overloaded (retryable, no rotation), matching
the existing 503/529 path and the message-only path (#52890). Genuine
rate limits (no overload language) still rotate.

Extracted the inline overloaded tuple #52890 added into the shared
_OVERLOADED_PATTERNS constant so the status-code and message paths use
one list.

Closes #14038.
2026-06-27 04:16:54 -07:00
ms-alan
16192103f4 fix(config): accept placeholder base_url in custom provider validation
_normalize_custom_provider_entry() ran urlparse() on base_url and dropped
any entry whose value was an un-expanded placeholder, so a caller reaching
the normalizer with raw config (e.g. the Dockerized gateway path) silently
skipped the provider with a 'not a valid URL' warning. Skip URL validation
when the candidate contains a placeholder token — both ${ENV_VAR} env-refs
and bare {region}-style templates — since those are expanded at runtime.

Closes #14457
2026-06-27 04:15:27 -07:00
HiddenPuppy
b34771fc06 fix(cli): disable prompt_toolkit CPR queries to stop escape-sequence leak (#13870)
prompt_toolkit's renderer sends ESC[6n cursor-position queries before
painting in non-fullscreen mode; the terminal replies ESC[<row>;<col>R.
Over SSH/cloudflared tunnels and slow PTYs these replies race past the
input parser and land in the display as raw '20;1R21;1R' text, and the
pending-CPR future can stall the renderer so the prompt freezes after the
agent's final answer.

Build the prompt_toolkit output with enable_cpr=False so CPR is marked
NOT_SUPPORTED up front and ESC[6n is never sent. This is the root-cause
counterpart to the existing input-side _strip_leaked_terminal_responses
scrubbing. Vt100_Output.from_pty() does not expose enable_cpr in
prompt_toolkit 3.x, so _build_cpr_disabled_output() reproduces its
get_size setup and calls the constructor directly; it returns None on any
failure so startup falls back to the default output.

Verified in a real PTY: baseline emits 1 ESC[6n query, the fix emits 0,
banner/UI render identically. Layout is unaffected — with CPR off the
renderer sizes the prompt to its preferred height (the same fallback
prompt_toolkit uses on any terminal that doesn't answer CPR).

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-27 04:15:20 -07:00
LeonSGP43
e7c013494d fix(agent): preserve nested API error bodies 2026-06-27 04:13:53 -07:00
Teknium
5ab4136631 fix(webui): switch provider when Config-page model field changes (#53583)
The dashboard Config tab's Model field is a flat string with no provider
info. _denormalize_config_from_web only updated model.default and kept the
stale provider, so picking an OpenRouter model while the default provider was
ollama-local left provider=ollama-local and every call 404'd.

When the model string actually changes, infer the serving provider — curated
catalog first, then a vendor/model-slug heuristic for non-aggregator providers
— and route the switch through the existing _normalize_main_model_assignment /
_apply_main_model_assignment chokepoints so stale base_url/api_mode/api_key are
cleared on a provider change and preserved on a same-provider re-pick. Saving
an unchanged model never re-detects, so unrelated config saves keep an explicit
provider.

Closes #14058
2026-06-27 04:13:44 -07:00
teknium1
7ee0b68973 fix(gateway,feishu): refuse executor resurrection during real shutdown
Add an explicit _closing guard to both owned executors so the
recreate-on-shutdown path only recovers from an *external* teardown of
the loop default — never resurrects a pool the gateway/adapter itself
stopped. _shutdown_*executor() sets the flag; _get_*executor() raises if
closing; feishu connect() re-arms on reconnect. Updates the gateway
recreate test to assert the refusal contract and adds feishu coverage.
2026-06-27 04:13:09 -07:00
teknium1
b296915c82 fix(feishu): route blocking SDK calls through an adapter-owned executor
Feishu SDK calls ran on asyncio's shared default executor, so a torn-down
default executor wedged every send with 'Executor shutdown has been called'
and left the gateway a zombie (#10849). The adapter now owns a
ThreadPoolExecutor recreated on demand if shut down, mirroring the
gateway-owned executor change. Routes all 17 self._client SDK calls through
_run_blocking; shuts the pool down on disconnect.
2026-06-27 04:13:09 -07:00
konsisumer
1011c07966 fix(gateway): use owned executor for agent work 2026-06-27 04:13:09 -07:00
LeonSGP43
52a09d8faf fix(byterover): honor auto extract config 2026-06-27 04:04:15 -07:00
teknium1
f062cf076b fix(agent): also treat provider=ollama as an Ollama GLM backend
Follow-up to the #13971 fix: a genuine native Ollama provider reached
through a reverse proxy carries no ollama/:11434 URL signature, so the
restricted detection would miss it. Add provider=="ollama" as an
explicit True case (idea from #14789, @Tranquil-Flow) and cover both it
and the #13971 LiteLLM-proxy-to-zai false-positive with E2E tests.
2026-06-27 04:03:07 -07:00
YuShu
266521b55f refactor(agent): trim docstring per review feedback
Remove commentary about the previous is_local_endpoint() approach
from _is_ollama_glm_backend() — git history suffices.
2026-06-27 04:03:07 -07:00
YuShu
00a8252b7d fix(agent): scope Ollama/GLM stop-to-length heuristic to Ollama only
The _is_ollama_glm_backend() function was too broad: any local endpoint
running a GLM model was treated as Ollama, triggering the stop->length
misreport heuristic introduced in 8011aa3. This caused false truncation
detection on sglang, vLLM, LM Studio, and other non-Ollama servers that
correctly report finish_reason.

When a GLM model on sglang/vLLM returned finish_reason='stop', the agent
mistakenly reclassified it as 'length' if the response didn't end with
a whitelisted punctuation character (ASCII or CJK). This particularly
affected Chinese-language responses and Markdown-formatted text.

Root cause: the is_local_endpoint() fallback assumed any local GLM
endpoint = Ollama. But many non-Ollama servers also run on localhost.

Fix: remove the is_local_endpoint() catch-all. Only detect Ollama via
its distinctive signatures (port 11434, 'ollama' in URL). All other
local servers are assumed to report finish_reason correctly.

This is the correct tradeoff because:
- False negatives (Ollama at custom port, heuristic not triggered) only
  mean the user sees a truncated response — same as having no heuristic
- False positives (non-Ollama server, heuristic wrongly triggered) inject
  spurious continuation messages into the conversation — strictly worse

Adds two tests:
- sglang GLM response is NOT reclassified as truncated
- Ollama GLM on port 11434 still triggers the heuristic as before

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-27 04:03:07 -07:00
teknium1
ab1f9b94c5 fix(telegram): accept @username chat_id in delivery paths (#13206)
TELEGRAM_HOME_CHANNEL set to an @username (not a numeric chat ID) crashed
all webhook/cron->Telegram home-channel delivery with 'ValueError: invalid
literal for int()'. The Telegram Bot API accepts both a numeric chat_id and
an @username string; Hermes was force-coercing every chat_id with int().

Add normalize_telegram_chat_id() (returns int for numeric values, passes
@username strings through) and apply it at the Bot API send/edit sites in
the Telegram adapter and the send_message tool. Username targets are now
recognized as explicit targets in _parse_target_ref.

Reapplies the approach from #13274 (season179), whose branch predated the
gateway/platforms/telegram.py -> plugins/platforms/telegram/adapter.py
relocation. Dupes: #13535 (Tranquil-Flow), #37572 (chewkaah).

Co-authored-by: season179 <season.saw@gmail.com>
2026-06-27 04:01:58 -07:00
teknium1
f2ca3e3d84 fix(gateway): hold _run_restart on _restart_task + explicit cancel-loop skip
Follow-up on the cherry-picked #13173 fix. Holds the _run_restart task in
self._restart_task (a bare asyncio.create_task keeps only a weak reference,
so a still-pending task can be GC'd mid-flight) and explicitly skips it in
the _stop_impl cancel loop alongside _stop_task. Adds AUTHOR_MAP entry for
the contributor and a regression test that fails when the task is cancellable.

Refs #12875
2026-06-27 03:57:31 -07:00
zeapsu
1ce5d6d974 fix(gateway): exclude _run_restart from _background_tasks to prevent zombie on /restart
When request_restart() adds _run_restart to _background_tasks, _stop_impl
later cancels all entries in that set.  Since _run_restart is awaiting
_stop_task at that point, the CancelledError propagates into _stop_impl,
interrupting cleanup before _shutdown_event.set() and _exit_code = 75
execute.  This leaves the gateway as a zombie (alive but disconnected) or
exiting with code 0 instead of 75, preventing systemd Restart=on-failure
from restarting the service.

Fix: don't add _run_restart to _background_tasks — it self-terminates in
~50ms and needs no lifecycle management.

Fixes #12875
2026-06-27 03:57:31 -07:00
teknium1
08e131f77c test(telegram): cover bot self-message ingestion guard (#11905)
Regression tests for the self-author guard added in the salvaged fix:
- bot-authored DM-topic watcher echo is dropped (the exact #11905 symptom)
- bot self-messages dropped in groups/supergroups too
- other bots in the same chat are still processed (self-id, not is_bot)
- observe-unmentioned sibling path also rejects self-messages
- missing from_user does not crash

Test scaffolding ported from @cola-runner's PR #12817 and adapted to the
current plugins/platforms/telegram/adapter.py and _is_own_message().
2026-06-27 03:56:52 -07:00
Sahil-SS9
6fb25f86ac fix(telegram): filter out bot's own messages from inbound processing (#52363) 2026-06-27 03:56:52 -07:00
Teknium
68a65ed7a1 fix(agent_init): correct misleading sub-64K context_length error message (#53569)
The error raised when a model's context window is below the 64K minimum
advertised "or set model.context_length in config.yaml to override" — but
the guard intentionally has no sub-64K escape hatch. Sub-64K models are
rejected by design (tool schemas + system prompt need the headroom).

The misleading clause invited a cluster of dup PRs (#11097, #11110, #8962,
#9142, #37548) all trying to wire an override that we don't want. Reword to
state the real options: pick a >=64K model, or — if your local server
under-reports its true window — declare the real value (which must itself
be >=64K). Guard behavior is unchanged.
2026-06-27 03:56:25 -07:00
Teknium
d73078e7b0 fix(cron): make per-profile cron isolation intentional and tested (#4707) (#53570)
A profile's cron jobs now provably live in AND execute under that profile's
HERMES_HOME. A job authored under profile `coder` is stored at
`~/.hermes/profiles/coder/cron/jobs.json` and runs with coder's .env,
config.yaml, scripts and skills — never the default root's.

This was the de-facto behavior on main but only by accident: PR #50112 had
re-anchored cron storage at the shared default root, and a later stale-branch
squash merge (#52147) silently reverted it back to the profile home. Neither
direction was guarded by a test, so it could flip again on the next stale merge.

Changes:
- cron/jobs.py: document the per-profile storage anchor (get_hermes_home, NOT
  get_default_hermes_root) and why anchoring at the root leaks
  config/credentials/skills across profiles — the #4707 security boundary.
- cron/scheduler.py, cron/suggestions.py: same intent documented at the
  dynamic resolution helper and the suggestions store.
- tests/cron/test_cron_profile_isolation.py: pin storage, lock-path, and
  execution-home resolution to the active profile so a re-anchor can't regress.

Verified E2E: jobs created under two profiles land in separate per-profile
stores with zero cross-profile leakage and no shared-root store; scheduler
execution-home follows the active profile. Full cron suite: 576/576.
2026-06-27 03:55:01 -07:00
Bartok
864d5521ad test(curator): join straggler curator-review thread on fixture teardown
The curator_env fixture left async review threads (synchronous=False spawns
a daemon 'curator-review' thread that calls save_state() on completion)
running past test teardown. save_state() resolves the state path from
HERMES_HOME at write time, so a straggler could write into the next test's
tmp home, corrupting test_state_file_survives_corrupt_read (and others)
under CI load. Join the thread on teardown while HERMES_HOME is still
pinned to this test's home.
2026-06-27 03:52:52 -07:00
Bartok9
45ce35ed72 fix(agent): classify message-only 'overloaded' as server overload
Salvage of #14261 by @ms-alan — rebased onto current main, scoped to the
overloaded-classification fix, with a regression test that fails without it.
2026-06-27 03:52:52 -07:00
teknium1
151ae1e937 test(api-server): cover SSE failure finish_reason for both failure modes
Lock the contract that a clean stream-queue termination followed by an
agent failure never reports finish_reason: "stop". Covers the raised-
exception case (#12422 repro), the flagged failed-result case, truncation
(length), and the success happy path.

Follow-up to the salvaged #12504 fix from @flobo3.
2026-06-27 03:52:44 -07:00
flobo3
b8b695e2cd fix(api): surface agent crash in SSE chat completions stream 2026-06-27 03:52:44 -07:00
Teknium
f67c0b3e60 docs(hermes-agent skill): cover v0.13–v0.17 features, fix stale claims, tighten (#53566)
Refresh the hermes-agent skill against the last 5 major releases and the
current codebase, and cut verbose prose.

Coverage added (v0.13.0–v0.17.0):
- New gateway platforms: iMessage (Photon), Teams, LINE, SimpleX, ntfy,
  Google Chat, Raft, official WhatsApp Business Cloud API (now 20+).
- New surfaces section: desktop app, web dashboard admin panel,
  hermes proxy (OpenAI-compatible OAuth proxy), Automation Blueprints.
- delegate_task(background=true) async subagents; memory-tool atomic
  batch operations; session_search three-mode shape; x_search/video_analyze
  toolsets; image_gen image-to-image; xAI Grok via SuperGrok OAuth.
- display.interface (cli/tui), curator.consolidate opt-in, PyPI install.

Accuracy fixes:
- Adding-a-Tool is two files (auto-discovery), not three.
- Testing uses scripts/run_tests.sh (canonical runner), not bare pytest.
- Dropped change-detector test count and a dangling references/ pointer.
- Refreshed overview (Windows-native, 20+ providers, many surfaces).

Conciseness: trimmed over-explained Windows keybinding/sandbox/test prose
and deep prompt-builder internals to pointers.
2026-06-27 03:51:25 -07:00
Teknium
d3db73210c chore(release): map blaryx@gmail.com → Blaryxoff for PR #32602 salvage 2026-06-27 03:48:18 -07:00
blaryx
76af2456a2 fix(dashboard): merge PUT /api/config with existing on-disk config
The dashboard form is built from CONFIG_SCHEMA, which doesn't enumerate
every root-level key the YAML supports. Most visibly, `custom_providers`
is in `_KNOWN_ROOT_KEYS` but is absent from the schema — so the frontend
never sends it in the PUT body. The previous full-replace save() then
silently wiped the key from disk every time the user clicked anything
that triggered a save. Other casualties (less visible because defaults
re-mask them on load) include `agent.personalities`,
`agent.reasoning_effort`, `terminal.lifetime_seconds`, etc.

Fix: read the raw on-disk config and deep-merge the incoming PUT body
on top of it before saving. The frontend can only overwrite what it
explicitly sends; everything else is preserved verbatim.

Reuses the existing `_deep_merge` helper from `hermes_cli.config`.

Tests:
- `test_round_trip_preserves_custom_providers` exercises the exact bug:
  seed config with custom_providers, GET → drop the key → PUT,
  assert it's still on disk.
- `test_round_trip_preserves_schema_invisible_nested_keys` covers the
  shallow-vs-deep-merge case for nested dicts under `agent` etc.
Both fail on current main; both pass with this patch.
2026-06-27 03:48:18 -07:00
Teknium
ec769e49d2 fix(gateway): WhatsApp/Signal hints affirm markdown instead of forbidding it (#53564)
The 'whatsapp' and 'signal' PLATFORM_HINTS told the agent 'Please do not
use markdown as it does not render' — factually wrong. Both adapters
actively convert markdown to native formatting:

- whatsapp_common.format_message(): **bold**, ~~strike~~, # headers,
  links, code blocks -> WhatsApp native syntax
- signal_format.markdown_to_signal(): same conversions via bodyRanges,
  plus '- item' / '* item' bullets -> '• ' Unicode bullets

The wrong hint made the agent strip bullets and bold the adapter would
have rendered (#12224). Rewrote both hints to mirror whatsapp_cloud:
markdown is auto-converted, bullet lists work, tables are not supported.
Added a contract test asserting markdown-converting platforms never
forbid markdown in their hint.
2026-06-27 03:46:41 -07:00
teknium1
a5d1f68c74 refactor(moa): share one virtual-provider row builder across pickers
Follow-up on the gateway-picker salvage: the cherry-picked change added a
second copy of the MoA virtual-provider row in model_switch.py, duplicating
inventory._moa_provider_row (same slug/name/preset-models, identical extra
fields). Make _moa_provider_row take a bare current_provider string and reuse
it from the gateway picker path so the row shape lives in one place and the
two surfaces can't drift.
2026-06-27 03:43:38 -07:00
dodo-reach
ed54469d06 fix(gateway): show MoA presets in model picker 2026-06-27 03:43:38 -07:00
Teknium
789f8b7dc2 docs(webhook): clarify authenticated != trusted-content trust model (#53562)
HMAC validation authenticates the webhook sender, not the business
fields inside the payload (PR titles, commit messages, issue bodies),
which are authored by untrusted third parties. Expand the prompt-
injection section to make the trust boundary explicit: the agent's
capability surface, not the input channel. Document the hardening
levers (sandbox the runtime, scope the toolset, keep approvals on,
template narrowly) instead of pretending to sanitize untrusted text.

Refs #8820.
2026-06-27 03:43:33 -07:00
teknium1
4e0788783b refactor(gateway): extract MoA one-shot restore helper; restore #28686 comment; real-method tests
Follow-up on the salvaged MoA restore fix:
- Extract the finally-block restore into _restore_moa_one_shot() so the
  behavior is unit-testable without re-implementing it, and so the gateway
  /moa handler and the finally block share one implementation.
- Restore the load-bearing #28686 zombie-eviction comment above
  _release_running_agent_state that the original diff dropped.
- Rewrite the tests to call the real _restore_moa_one_shot helper (the
  originals re-implemented the restore logic inline, so they passed
  regardless of the production code).
2026-06-27 03:43:28 -07:00
srojk34
2f29e3cfc5 fix(gateway): restore MoA one-shot model override on failed turns
The MoA one-shot restore ran inside the try block after
_handle_message_with_agent returned. When that call raised an
exception (agent init failure, interpreter shutdown, OOM), the
restore was skipped and the MoA model override stayed permanently
on _session_model_overrides — silently routing all subsequent
messages through the MoA reference fan-out with no user-visible
indication.

Move the restore to the finally block so it fires on every exit
path (success, exception, interrupt). The restore data lives on
the per-turn event object and would be lost if not consumed here.
2026-06-27 03:43:28 -07:00
briandevans
17cb829991 test(moa): cover non-list/bare-dict reference_models normalization 2026-06-27 03:43:16 -07:00
briandevans
8dd4e576d0 fix(moa): tolerate non-list reference_models in hand-edited MoA preset config 2026-06-27 03:43:16 -07:00
Teknium
60f58a2b95 feat(verify-on-stop): default OFF, one-time migration, skip doc-only edits (#53552)
The verify-on-stop guard fired too eagerly — including on doc/markdown/skill
edits with nothing to verify, where it pushed a pointless /tmp verification
script. Three changes:

1. Default OFF for new installs: agent.verify_on_stop defaults to false
   (was the "auto" surface-aware sentinel). _config_version bumped 30 -> 31.
2. One-time migration (v30 -> v31): existing installs are switched off once,
   but only when the value is missing or still the "auto" sentinel — an
   explicit true/false the user set is preserved.
3. Path filter: build_verify_on_stop_nudge() now drops documentation/prose
   paths (.md/.mdx/.rst/.txt/LICENSE/CHANGELOG/...) so even when explicitly
   enabled, a doc-only turn never nudges. Mixed doc+code turns still nudge on
   the code paths.

The legacy "auto" sentinel is still honored when set explicitly (ON for
interactive coding surfaces, OFF for messaging). HERMES_VERIFY_ON_STOP env
override unchanged.
2026-06-27 03:23:22 -07:00
teknium1
29ee4bbff6 refactor(dashboard): tighten cron-job form helpers
Collapse the three near-identical optional-text helpers
(optionalText/optionalBaseUrl/listToText) into one optionalText with a
strip-trailing-slash flag, route listToText + toolsets through the
existing splitCronList, and replace the repeated
typeof x === 'string' ? x : '' ladders with a single asString helper.
Behavior-identical; all 16 vitest cases pass.
2026-06-27 03:20:32 -07:00
Versun
c655cdf2c1 feat(dashboard): expose cron job execution fields 2026-06-27 03:20:32 -07:00
teknium1
50f6855217 feat(moa): make /moa one-shot only; route preset switching through the model picker
/moa no longer does a sticky model switch. It now always runs a single
prompt through the default MoA preset and restores the prior model
afterward; the whole argument is the prompt (no preset-name matching).
To switch to a MoA preset for the session, select it from the model
picker, where presets already surface under a virtual Mixture of Agents
provider on every model-selection surface.

Also fixes #53444: the TUI one-shot only set session[model_override],
which the already-built cached agent ignored, so MoA silently never ran
and the turn used the original model. The TUI now does a real in-place
agent.switch_model() via _apply_model_switch() when a live agent exists
(with a proper restore after the turn), and falls back to a model_override
for lazy/unbuilt sessions.

Removes the redundant sticky-switch branch from the CLI, gateway, and TUI
/moa handlers; updates the command description, usage string, and docs.
2026-06-27 03:09:09 -07:00
teknium1
3cd4693494 chore: add DiamondEyesFox to AUTHOR_MAP for PR #53351 salvage 2026-06-27 03:04:26 -07:00
diamondeyesfox
8df231c941 fix(agent): rebaseline in-place compression flushes 2026-06-27 03:04:26 -07:00
Mahesh Sanikommu
1b75b3fd90 feat(memory): add Supermemory setup connection summary
Add post_setup() and get_status_config() to the Supermemory memory
provider so `hermes memory setup` and `hermes memory status` print a
one-line connection summary (container, profile fact count,
auto_recall/auto_capture). Point API-key onboarding at the Hermes
connect URL (app.supermemory.ai/integrations?connect=hermes).

Salvage of #52988. Two fixes folded in:

- Test isolation: the new probe/status tests mocked _SupermemoryClient
  but not the __import__("supermemory") guard inside
  _probe_supermemory_connection, so they passed only where the optional
  supermemory package was installed and failed on a clean checkout / CI
  (the PR shipped with red CI). Added _stub_supermemory_importable()
  mirroring the existing test_is_available_false_when_import_missing
  pattern; the suite now passes with supermemory absent.

- post_setup: `if api_key and api_key not in os.environ` checked whether
  the key's *value* named an env var (always false in practice). Fixed to
  compare the value: `os.environ.get("SUPERMEMORY_API_KEY") != api_key`.

Verified: 38/38 in test_supermemory_provider.py and the full
tests/plugins/memory/ suite green with supermemory not installed.

Closes #52988
2026-06-27 15:07:34 +05:30
underthestars-zhy
8827300267 fix(photon): correlate tapbacks to bot message context
Populate `reply_to_message_id`, `reply_to_text`, and
`reply_to_is_own_message` on reaction events so the gateway injects
`[Replying to your previous message: "..."]` when the agent receives
a tapback.

The sidecar now extracts a capped text preview from the hydrated
reaction target (plain text and mixed group messages; null for
attachment/voice-only targets), emitting it as `targetText` in the
NDJSON reaction payload. The Python adapter reads this field and sets
the reply correlation fields on the `MessageEvent`.
2026-06-27 00:51:34 -07:00
underthestars-zhy
4345b3e767 fix(photon): upgrade spectrum-ts sidecar to v8.0.0
v8 made `richlink` outbound-only; inbound rich links now arrive as
plain `text`. Remove the `getBalloonBundleId`/`toRichlinkMessage`
branches from the iMessage mapper patch and update the fixture,
lockfile, and README accordingly.
2026-06-27 00:51:34 -07:00
underthestars-zhy
5636c22828 feat(photon): upgrade spectrum-ts sidecar to v7.0.0
Update the Photon platform plugin's Node.js sidecar from spectrum-ts
3.1.0 to 7.0.0, which splits the SDK into scoped `@spectrum-ts/*`
packages with `spectrum-ts` as the umbrella re-export.

- Bump exact pin in package.json/package-lock.json to 7.0.0
- Update mixed-attachments patch script to target the new
  `@spectrum-ts/imessage/dist/index.js` path and tab-indented output
- Rewrite test fixture to match v7.x mapper shape (tab-indented,
  `const ... = async` declarations, single-line builder calls) and
  point at `@spectrum-ts/imessage/dist/index.js`
- Update README upgrade guide to document the v5 package split and
  the postinstall patch validation step
- Update comments in cli.py and index.mjs to reference v5/v7 changes
2026-06-27 00:51:34 -07:00
Teknium
d712a7fd73 fix(model-picker): surface the current custom/uncurated model in picker rows (#53457)
A model selected via the CLI (e.g. /model openrouter/<uncurated-name>) was
absent from every model picker — the main picker AND the MoA reference/
aggregator slot pickers — because each provider row only carried its curated
catalog. Inject the current model at the front of its provider's row so it is
selectable and shown everywhere.
2026-06-27 00:06:34 -07:00
Ben Barclay
fbf748b282 fix(dashboard-auth): follow redirects on self-hosted OIDC discovery (#53399)
The self-hosted OIDC provider fetched the discovery document with a bare
httpx.get(). httpx defaults to follow_redirects=False (unlike curl -L or
the requests library), so when an IDP answers GET
/.well-known/openid-configuration with a 3xx — Authentik canonicalises the
.well-known path, and any IDP behind a reverse proxy doing an http→https
upgrade redirects too — the bare redirect (empty body) tripped the
status != 200 guard and raised 'OIDC discovery returned 302', which
routes.py maps to the provider_unreachable audit event and a 503. The
browser surfaced 'Auth provider self-hosted unreachable'.

The user's smoking gun (curl -o writing zero bytes from inside the
container) is exactly a redirect with no body — the same wall the code hit.

Add follow_redirects=True to the discovery GET only. It's safe: the
issuer-pin check and _require_https_or_loopback still validate the resolved
document and every endpoint, so a redirect can't smuggle in a bad issuer or
a cleartext endpoint. The token/revocation POSTs deliberately keep the
no-follow default (they carry an auth code / refresh token and the endpoint
is already the canonical absolute URL).

Existing discovery tests mocked httpx.get with a canned 200 and never
exercised a real 3xx. Add a regression test that runs a real loopback
server returning a 302 on the .well-known path — fails without the fix
(ProviderError: discovery returned 302), passes with it.
2026-06-27 14:14:51 +10:00
ethernet
dd0e4ab81a change(ci): slice files in matrix job
avoid duplicating work, avoid file discovery on each job
2026-06-26 19:15:18 -07:00
ethernet
1a75387fa8 change(ci): log json decode error in durations 2026-06-26 19:15:18 -07:00
ethernet
707ae6e623 change(tests): don't count with pytest collect
it's way too slow. just grep files lol
2026-06-26 19:15:18 -07:00
ethernet
bcc3eb3419 fix(ci): rip out some xdist legacy stuff... how did these ever work?? 2026-06-26 19:15:18 -07:00
ethernet
2fa66950e8 change(ci): upload-artifact from v4 -> v7 2026-06-26 19:15:18 -07:00
ethernet
4b0a2040e7 change(ci): use run_tests in docker 2026-06-26 19:15:18 -07:00
ethernet
18f7ad49ab change(ci): update all UV installs 2026-06-26 19:15:18 -07:00
ethernet
f0cb049217 change(ci): migrate docker smoketests to real tests 2026-06-26 19:15:18 -07:00
ethernet
2bd17221b7 change(ci): pretty names 2026-06-26 19:15:18 -07:00
ethernet
9a861cd0ab change(tests): don't pass pytest args when counting tests 2026-06-26 19:15:18 -07:00
ethernet
447f9e7c89 change(nix): simpler dev setup 2026-06-26 19:15:18 -07:00
ethernet
8ae793d3de change(nix): ship fat hermes agent by default 2026-06-26 19:15:18 -07:00
ethernet
fb1dd1bf91 change(ci): docker-publish.yml -> docker.yml 2026-06-26 19:15:18 -07:00
ethernet
35dfe7b58f change(ci): docker runs again on PRs 2026-06-26 19:15:18 -07:00
ethernet
4cf69f0da4 refactor(ci): more test slices 2026-06-26 19:15:18 -07:00
ethernet
d4aec4e92f refactor(ci): run tests thru run_tests.sh 2026-06-26 19:15:18 -07:00
ethernet
c918d07b50 refactor(ci): rewrite docker tests to check built container 2026-06-26 19:15:18 -07:00
ethernet
638243726e refactor(ci): faster docker builds via --link and chmod removal 2026-06-26 19:15:18 -07:00
brooklyn!
f6e815e378 Merge pull request #53357 from helix4u/fix/desktop-titlebar-overlay 2026-06-26 20:42:11 -05:00
Gille
1bff85cf66 fix(desktop): keep titlebar overlay off session title 2026-06-26 19:18:26 -06:00
Nacho Avecilla
dbe734beff fix(dashboard-auth): exclude non-interactive providers from interactive login surfaces (#53239)
* Return None instead of erroring on drain login failure

* Fix login on drain

* Remove login for drained endpoints flow and clean the code

* chore: drop unrelated credits changes from this PR

* Remove extra comments that were not really necessary
2026-06-27 10:08:13 +10:00
brooklyn!
7a38d64a85 Merge pull request #53335 from NousResearch/bb/desktop-custom-model-blank-selector
fix(desktop): show custom (non-curated) model in Settings model pickers
2026-06-26 18:59:43 -05:00
Brooklyn Nicholson
a6ae179f43 fix(desktop): show custom (non-curated) model in Settings model pickers
A Radix <Select> renders a blank trigger when its `value` matches no
<SelectItem>. The Settings model pickers built their options solely from
each provider's curated `models` list, so a model added via config that
isn't in that list (e.g. anthropic/claude-opus-4.7 on nous) selected
nothing and showed an empty selector.

Union the active value into the options via a small `withActive` helper,
applied to the main, auxiliary, MoA reference, and MoA aggregator model
selects so the configured model always stays visible and selectable.
2026-06-26 18:55:44 -05:00
kshitijk4poor
7475d125d2 test(mcp): stub mcp_oauth in backgrounding test to deflake CI
The backgrounding-contract test (test_prepare_agent_startup_backgrounds_
blocking_mcp_for_chat) failed intermittently on loaded CI shards: it stubs
tools.mcp_tool.discover_mcp_tools but NOT tools.mcp_oauth, so the background
discovery thread paid the real, cold ~0.75s 'import tools.mcp_oauth' (added by
this PR's _discover_mcp_tools_without_interactive_oauth) before calling the
stubbed discovery. On a slow/loaded runner that import plus thread scheduling
exceeded the 1.0s polling deadline, leaving calls['mcp'] == 0.

Fix: stub tools.mcp_oauth with a nullcontext suppress_interactive_oauth (the
same no-op production falls back to when mcp_oauth is unavailable), so the
test exercises the backgrounding contract without paying an unrelated cold
import in its timing window. Bumped the poll deadline 1.0s -> 3.0s as
belt-and-suspenders. Production behaviour is unchanged; the import cost was
always off the main thread.

Verified: 5/5 pass repeatedly via scripts/run_tests.sh (per-file isolation,
matching CI), ruff clean.
2026-06-27 04:59:23 +05:30
zapabob
e55ddc3e33 fix(mcp): suppress interactive OAuth stdin prompts during background discovery (#35927)
When an MCP server requires OAuth, the interactive `hermes` TUI froze on
startup: background MCP discovery hit the OAuth flow, which on an interactive
TTY spawns a daemon thread doing a blocking `sys.stdin.readline()` (the
"paste the redirect URL" fallback in mcp_oauth._wait_for_callback). That
thread competes with the TUI's own stdin reader for the same terminal, so
keystrokes get swallowed and the TUI appears frozen (up to the 300s OAuth
timeout). Reported symptom: "MCP OAuth: authorization required / Open this URL
... the tui is freezing, not respond to typing."

Add a thread-local `suppress_interactive_oauth()` context manager in
tools/mcp_oauth.py; `_is_interactive()` returns False while it's active, so the
stdin paste-thread and prompt are never created. Background discovery
(hermes_cli/mcp_startup.py, tui_gateway/entry.py) now runs discovery inside
that context, so OAuth-requiring servers soft-skip (raise
OAuthNonInteractiveError, already handled) instead of stealing the TUI's stdin.
A real `hermes mcp login` on the main thread is unaffected (thread-local).

Salvaged from #35945 by @zapabob (authorship preserved via cherry-pick;
resolved a conflict against main's new mcp_discovery_timeout / wait_for_mcp_
discovery refactor, keeping both). Verified E2E: with suppression the paste
prompt is NOT printed and no stdin thread spawns (raises OAuthNonInteractive
soft-skip); without it the prompt shows (the freeze). Mutation-verified
(removing the suppress check in _is_interactive fails the regression test).
76 tests pass, ruff clean.

Closes #35927.

SELF-REVIEW FIX: the original #35945 used threading.local(), which does NOT
propagate to the dedicated mcp-event-loop thread where OAuth actually runs
(discover_mcp_tools dispatches the connect via run_coroutine_threadsafe), so
the suppression was a NO-OP in production (the tests passed only by stubbing
out the cross-thread dispatch). Converted to a contextvars.ContextVar, which
asyncio copies onto the scheduled coroutine — empirically verified suppression
now holds on the mcp-event-loop thread through the real _run_on_mcp_loop path.
Added a cross-thread regression test (fails on threading.local, passes on the
ContextVar) so the no-op can't regress.
2026-06-27 04:59:23 +05:30
briandevans
2d8c44ac87 fix(hermes-home): only honour legacy dir layout when it has content
get_hermes_dir(new_subpath, old_name) returned the legacy <old_name>/
location as soon as it existed on disk — even when empty. When an empty
legacy stub is created on a profile that already has populated data at
the new consolidated <new_subpath>/ (install scaffolds, profile init, a
stray mkdir, or ensure_hermes_home() recreating legacy dirs), the
resolver silently flipped to the empty legacy dir and the real data
became invisible. No log, no error — the feature behaved as if state was
wiped. Reproduced as a Discord pairing store losing every approved user
when an empty pairing/ shadowed the populated platforms/pairing/.

Resolve the legacy path only when it has content: a populated directory
(any entry) or a non-directory file counts; an empty directory falls
through to the new layout. Inspection failures (PermissionError on
lstat/iterdir, or any OSError short of FileNotFoundError) are treated as
"occupied" so a transient error never orphans legacy data — only a
genuine FileNotFoundError counts as absent. The lstat()-based gate also
fixes the prior exists()/is_dir() path swallowing PermissionError and
mis-reading an unreadable legacy dir as absent.

This hardens all 11+ call sites that share the resolver (pairing,
image/audio/video/document caches, matrix/whatsapp session stores,
vision/credential/tts/browser dirs).

Adds TestGetHermesDir regression coverage (empty/populated/subdir/file/
unreadable/unstatable cases) and updates test_credential_files to
populate its legacy dirs so they still count as content.

Closes #27602
Closes #27715
2026-06-27 04:57:15 +05:30
briandevans
c377e954fb test(gateway): isolate secret-redaction layer from provider-error rewrite
The existing test_chat_gateways_redact_secret_in_provider_error feeds a
provider-error envelope (HTTP 401), which _sanitize_gateway_final_response
rewrites wholesale to a generic category string. That rewrite strips the
secret regardless of whether the redaction layer works, so the test cannot
on its own prove _redact_gateway_user_facing_secrets is exercised.

Add test_chat_gateways_redact_secret_in_non_error_body: ordinary assistant
prose that echoes a bearer token but is NOT a provider-error envelope, so
the rewrite path does not fire and secret redaction is the only defense.
Verified fail-before (token leaks when _GATEWAY_SECRET_PATTERNS is emptied)
and pass-after across whatsapp/slack/signal/matrix, while non-secret prose
is preserved intact.
2026-06-27 04:47:10 +05:30
briandevans
57864d07ed fix(gateway): suppress operational status/error noise on all chat gateways, not just Telegram (#39293)
The Telegram noise/secret filter added in #28533 gated its work on
`_gateway_platform_value(platform) != "telegram"`, so
`_sanitize_gateway_final_response` and `_prepare_gateway_status_message`
only ran for Telegram. Every other human-facing chat surface
(WhatsApp, Discord, Slack, Signal, Matrix, plugin platforms, etc.)
received raw provider-error bodies verbatim — including any leaked
credentials the secret-redaction pass (`sk-…`, `Bearer …`, `gh[pousr]_…`,
`xox[baprs]-…`, `hf_…`, `glpat-…`) was meant to strip.

Invert the gate from a one-platform allowlist into a small
programmatic-surface denylist: only `local`, `api_server`, `webhook`,
and `msgraph_webhook` consume gateway text programmatically and keep raw
status/error text. Every other (chat) surface — including unknown/empty
platform values and on-demand plugin pseudo-members — fails closed to
the redacted, noise-filtered, sanitized path. This widens the same
root-cause fix to both call sites: status callbacks and final replies.
2026-06-27 04:47:10 +05:30
kshitijk4poor
244a6f2ceb fix(desktop): broken "Open setup guide" button for plugin platforms
On the desktop Channels / Messaging page, the "Open setup guide" button was
rendered as a bare <a href={platform.docs_url} target="_blank"> with no guard.
Plugin-provided platforms (Microsoft Teams, Google Chat, Line, Raft, Yuanbao,
…) ship an empty docs_url, so the anchor's href was "".

In a packaged build, Electron resolves an empty href against the current
document — the app's own index.html inside the asar bundle — and
shell.openPath then fails with an OS "file not found" dialog. This is exactly
the Windows error reported for Messaging → Teams → Open guide.

Fix (3 changes):

1. fix(desktop) — Only render the "Open setup guide" button when docs_url is
   non-empty, and route clicks through openExternalLink so a relative/empty
   value can never be treated as a local bundle path. Fixes the whole class
   (every plugin platform), not just Teams.

2. fix(messaging) — Give the Teams platform plugin a real docs_url (Microsoft
   Teams setup guide) so its card shows a working button instead of nothing.

3. fix(messaging) — Give the Google Chat platform plugin a real docs_url
   (Google Chat setup guide) so its card shows a working button instead of
   nothing. Originally from #48940; folded in here because that PR's test
   was broken (it queried the HTTP endpoint, but google_chat is a dynamic
   enum member that only appears after the adapter module is imported).

Test plan:
- apps/desktop — new src/app/messaging/index.test.tsx: button is hidden when
  docs_url is empty; a real URL opens via the validated external opener (does
  not navigate).
- apps/desktop typecheck (tsc --noEmit) clean.
- backend — test_teams_messaging_metadata_links_setup_guide: the Teams catalog
  entry exposes the setup-guide docs_url.
- backend — test_google_chat_messaging_metadata_links_setup_guide: the Google
  Chat catalog entry exposes the setup-guide docs_url.

Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: p-andhika <andhika.prakasiwi@gmail.com>
2026-06-27 04:34:08 +05:30
kshitijk4poor
58919f68ab fix: also preserve provider selection on Esc-clear-filter path
The back() handler had the same filtered-index drift bug as the Enter
and Ctrl+D transitions: when the user presses Esc to clear an active
filter on the provider stage, providerIdx was reset to 0, losing the
highlighted provider. Apply the same providerIndexAfterClearingFilter
fix as the other three transition paths.

Also adds edge-case tests for the helper: undefined provider, slug not
found, empty rows, and duplicate slug first-match behavior.

Found by hermes-pr-review Phase 2 + hermes-agent-dev 3-agent review.
2026-06-27 04:33:48 +05:30
Donovan Yohan
386478211b fix(tui): preserve filtered model provider selection 2026-06-27 04:33:48 +05:30
kshitijk4poor
b0f44d3fad fix(gateway): remove process-global HERMES_SESSION_KEY write that misroutes approval prompts across concurrent sessions
GatewayRunner._run_agent's run_sync() wrote the per-turn session key to
the process-global os.environ["HERMES_SESSION_KEY"]. Because os.environ
is shared across the whole process, concurrent gateway sessions (e.g.
two Discord threads) clobbered each other's value. A tool worker thread
whose approval contextvar was unset then fell back to os.environ via
get_current_session_key() and read whichever session ran run_sync()
last — routing "Command Approval Required" prompts to the wrong thread.

Session routing is already concurrency-safe via contextvars:
- gateway/session_context.py _SESSION_KEY (set in set_session_vars)
- tools/approval.py _approval_session_key (set via set_current_session_key
  right before the agent runs, inherited by tool worker threads)

The only non-test readers of HERMES_SESSION_KEY (tools/approval.py,
tools/terminal_tool.py, tools/kanban_tools.py) all prefer the contextvar
with os.environ as a mere fallback. CLI/cron/TUI set their own os.environ
via separate export paths (e.g. the TUI parent exporting it into the
agent subprocess), so removing this in-process write does not affect them.

Adds regression tests asserting the resolver prefers the contextvar and
does not leak a concurrent session's cleared/clobbered os.environ value.

Closes #24100

Co-authored-by: Yosapol Jitrak <yosapol@jitrak.dev>
2026-06-27 04:31:37 +05:30
helix4u
5191ebba22 fix(desktop): retry empty resumed transcripts 2026-06-25 13:36:19 -06:00
609 changed files with 43565 additions and 6259 deletions

View File

@@ -66,8 +66,12 @@ runtime/
# ---------- Not needed inside the Docker image ----------
# Desktop app source (Tauri/Electron); never installed in the container
# Desktop app source (Tauri/Electron); never installed in the container.
# apps/shared is the dashboard↔desktop websocket helper and is linked from
# web/package.json as a file: workspace dep — keep it in the build context.
apps/
!apps/shared/
!apps/shared/**
# Test suite — not shipped in production images
tests/

2
.envrc
View File

@@ -1,5 +1,5 @@
watch_file pyproject.toml uv.lock
watch_file package-lock.json package.json web/package.json ui-tui/package.json website/package.json apps/shared/package.json apps/desktop/package.json ui-tui/packages/hermes-ink/package.json
watch_file flake.nix flake.lock nix/devShell.nix nix/tui.nix nix/package.nix nix/python.nix
watch_file flake.nix flake.lock nix/devShell.nix nix/tui.nix nix/package.nix nix/python.nix nix/hermes-agent.nix nix/desktop.nix
use flake

View File

@@ -1,50 +0,0 @@
name: Hermes smoke test
description: >
Run the image's built-in entrypoint against `--help` and `dashboard --help`
to catch basic runtime regressions before publishing. Requires the image
to already be loaded into the local Docker daemon under `image`.
Works identically on amd64 and arm64 runners.
inputs:
image:
description: Fully-qualified image tag (e.g. nousresearch/hermes-agent:test)
required: true
runs:
using: composite
steps:
- name: Ensure /tmp/hermes-test is hermes-writable
shell: bash
run: |
# The image runs as the hermes user (UID 10000). GitHub Actions
# creates /tmp/hermes-test root-owned by default, which hermes
# can't write to — chown it to match the in-container UID before
# bind-mounting. Real users doing `docker run -v ~/.hermes:...`
# with their own UID hit the same issue and have their own
# remediations (HERMES_UID env var, or chown locally).
mkdir -p /tmp/hermes-test
sudo chown -R 10000:10000 /tmp/hermes-test
- name: hermes --help
shell: bash
run: |
# Use the image's real ENTRYPOINT (/init + main-wrapper.sh) so
# this exercises the actual production startup path. PR #30136
# review caught that an --entrypoint override here had been
# silently neutered by the s6-overlay migration — stage2-hook
# ignores its CMD args, so the smoke test was a no-op.
docker run --rm \
-v /tmp/hermes-test:/opt/data \
"${{ inputs.image }}" --help
- name: hermes dashboard --help
shell: bash
run: |
# Regression guard for #9153: dashboard was present in source but
# missing from the published image. If this fails, something in
# the Dockerfile is excluding the dashboard subcommand from the
# installed package.
docker run --rm \
-v /tmp/hermes-test:/opt/data \
"${{ inputs.image }}" dashboard --help

View File

@@ -20,6 +20,7 @@ permissions:
pull-requests: write # needed by lint (PR comment) + supply-chain (PR comment)
actions: read # needed by osv-scanner (SARIF upload)
security-events: write # needed by osv-scanner (SARIF upload)
packages: write # needed by docker build
concurrency:
group: ci-${{ github.ref }}
@@ -32,6 +33,7 @@ jobs:
# (all lanes true) so post-merge validation is never weakened.
# ─────────────────────────────────────────────────────────────────────
detect:
name: Detect affected areas
runs-on: ubuntu-latest
outputs:
python: ${{ steps.classify.outputs.python }}
@@ -53,11 +55,15 @@ jobs:
# Skipped workflows (if condition is false) don't spin up runners.
# ─────────────────────────────────────────────────────────────────────
tests:
name: Python tests
needs: detect
if: needs.detect.outputs.python == 'true'
uses: ./.github/workflows/tests.yml
with:
slice_count: 8
lint:
name: Python lints
needs: detect
if: needs.detect.outputs.python == 'true'
uses: ./.github/workflows/lint.yml
@@ -65,35 +71,49 @@ jobs:
event_name: ${{ needs.detect.outputs.event_name }}
typecheck:
name: TypeScript
needs: detect
if: needs.detect.outputs.frontend == 'true'
uses: ./.github/workflows/typecheck.yml
docs-site:
name: Docs Site
needs: detect
if: needs.detect.outputs.site == 'true'
uses: ./.github/workflows/docs-site-checks.yml
history-check:
name: Deny unrelated histories
needs: detect
if: needs.detect.outputs.event_name == 'pull_request'
uses: ./.github/workflows/history-check.yml
contributor-check:
name: Check contributors
needs: detect
if: needs.detect.outputs.python == 'true'
uses: ./.github/workflows/contributor-check.yml
uv-lockfile:
name: Check uv.lock
needs: detect
uses: ./.github/workflows/uv-lockfile-check.yml
docker-lint:
name: Lint Docker scripts
needs: detect
if: needs.detect.outputs.docker_meta == 'true'
uses: ./.github/workflows/docker-lint.yml
docker:
name: Build&Test Docker image
needs: detect
if: needs.detect.outputs.python == 'true' || needs.detect.outputs.frontend == 'true' || needs.detect.outputs.docker_meta == 'true'
uses: ./.github/workflows/docker.yml
secrets: inherit
supply-chain:
name: Supply-chain scan
needs: detect
if: needs.detect.outputs.event_name == 'pull_request' && (needs.detect.outputs.scan == 'true' || needs.detect.outputs.deps == 'true' || needs.detect.outputs.mcp_catalog == 'true')
uses: ./.github/workflows/supply-chain-audit.yml
@@ -104,7 +124,7 @@ jobs:
mcp_catalog: ${{ needs.detect.outputs.mcp_catalog == 'true' }}
osv-scanner:
needs: detect
name: OSV scan
uses: ./.github/workflows/osv-scanner.yml
# ─────────────────────────────────────────────────────────────────────
@@ -127,6 +147,8 @@ jobs:
- docker-lint
- supply-chain
- osv-scanner
# we don't require docker to pass rn because it's so slow lol
# - docker
if: always()
runs-on: ubuntu-latest
steps:

View File

@@ -2,7 +2,7 @@ name: Docker / shell lint
# Lints the container build inputs: Dockerfile (via hadolint) and any shell
# scripts under docker/ (via shellcheck). These catch the class of regression
# the behavioral docker-publish smoke test can't — unquoted variable
# the behavioral docker smoke test can't — unquoted variable
# expansions, silently-failing RUN commands, etc.
#
# Rules and ignores are documented in .hadolint.yaml at the repo root.

View File

@@ -1,24 +1,9 @@
name: Docker Build and Publish
name: Docker Build, Test, and Publish
on:
push:
branches: [main]
paths:
- '**/*.py'
- 'pyproject.toml'
- 'uv.lock'
- 'Dockerfile'
- 'docker/**'
- '.github/workflows/docker-publish.yml'
- '.github/actions/hermes-smoke-test/**'
# No paths filter — the job must always run so the required check
# reports a status (path-gated workflows leave checks "pending" forever
# when no matching files change, which blocks merge).
pull_request:
release:
types: [published]
workflow_call:
permissions:
contents: read
@@ -39,11 +24,7 @@ env:
IMAGE_NAME: nousresearch/hermes-agent
jobs:
# ---------------------------------------------------------------------------
# Build amd64 natively. This job also runs the smoke tests (basic --help
# and the dashboard subcommand regression guard from #9153), because amd64
# is the only arch we can `load` into the local daemon on an amd64 runner.
# ---------------------------------------------------------------------------
# Build, test, and optionally push the amd64 image.
build-amd64:
# Only run on the upstream repository, not on forks
if: github.repository == 'NousResearch/hermes-agent'
@@ -53,24 +34,19 @@ jobs:
digest: ${{ steps.push.outputs.digest }}
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
# The image build + smoke test + integration tests run ONLY on
# push-to-main and release — never on PRs. They are the heaviest jobs
# in CI (~15-45 min) and a broken build surfaces on the main push (and
# is gated pre-merge by docker-lint + uv-lockfile-check). Every step
# below is skipped on PRs, so the job still reports green and the
# required check never hangs.
# The image build + integration tests run on every event
# (PRs, push-to-main, release). Publish steps below are gated to
# push-to-main / release only.
- name: Set up Docker Buildx
if: github.event_name != 'pull_request'
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
# Build once, load into the local daemon for smoke testing. Cached
# Build once, load into the local daemon for testing. Cached
# to gha with a per-arch scope; the push step below reuses every
# layer from this build.
- name: Build image (amd64, smoke test)
if: github.event_name != 'pull_request'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
- name: Build image (amd64)
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
@@ -82,25 +58,12 @@ jobs:
cache-from: type=gha,scope=docker-amd64
cache-to: type=gha,mode=max,scope=docker-amd64
- name: Smoke test image
if: github.event_name != 'pull_request'
uses: ./.github/actions/hermes-smoke-test
with:
image: ${{ env.IMAGE_NAME }}:test
# ---------------------------------------------------------------------
# Run the docker-integration test suite against the freshly-built
# image already loaded into the local daemon (`:test`). These tests
# are excluded from the sharded `tests.yml :: test` matrix on purpose
# (see `_SKIP_PARTS` in scripts/run_tests_parallel.py) because each
# shard would otherwise reach the session-scoped ``built_image``
# fixture in ``tests/docker/conftest.py`` and start a 3-7min
# ``docker build`` — guaranteed to
# die in fixture setup.
# image already loaded into the local daemon (`:test`).
#
# Piggybacking here avoids a second image build: the smoke test
# already proved the image loads + runs, so the daemon has it under
# `${IMAGE_NAME}:test` and we just point ``HERMES_TEST_IMAGE`` at
# Piggybacking here avoids a second image build: the build step
# already loaded the image into the daemon under
# `${IMAGE_NAME}:test`, so we just point ``HERMES_TEST_IMAGE`` at
# that. The fixture's ``HERMES_TEST_IMAGE`` branch (see
# tests/docker/conftest.py:62-63) short-circuits the rebuild.
#
@@ -110,26 +73,20 @@ jobs:
# cheapest path to coverage on every PR that touches docker code.
# ---------------------------------------------------------------------
- name: Install uv (for docker tests)
if: github.event_name != 'pull_request'
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
- name: Set up Python 3.11 (for docker tests)
if: github.event_name != 'pull_request'
run: uv python install 3.11
- name: Install Python dependencies (for docker tests)
if: github.event_name != 'pull_request'
run: |
uv venv .venv --python 3.11
source .venv/bin/activate
# ``dev`` extra pulls in pytest, pytest-asyncio —
# everything tests/docker/ needs. We deliberately avoid ``all``
# here because the docker tests only drive the container via
# subprocess and don't import hermes_agent's optional deps.
uv pip install -e ".[dev]"
uv sync --locked --python 3.11 --extra dev
- name: Run docker integration tests
if: github.event_name != 'pull_request'
env:
# Skip rebuild; use the image already loaded by the build step.
HERMES_TEST_IMAGE: ${{ env.IMAGE_NAME }}:test
@@ -139,12 +96,11 @@ jobs:
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
run: |
source .venv/bin/activate
python -m pytest tests/docker/ -v --tb=short
scripts/run_tests.sh tests/docker/ --file-timeout 600
- name: Log in to Docker Hub
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
@@ -155,7 +111,7 @@ jobs:
- name: Push amd64 by digest
id: push
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
@@ -179,7 +135,7 @@ jobs:
- name: Upload digest artifact
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
with:
name: digest-amd64
path: /tmp/digests/*
@@ -187,10 +143,7 @@ jobs:
retention-days: 1
# ---------------------------------------------------------------------------
# Build arm64 natively on GitHub's free arm64 runner. This replaces the
# previous QEMU-emulated arm64 build, which was ~5-10x slower and shared
# a cache scope with amd64. Matches the amd64 job's shape: build+load,
# smoke test, then on push/release push by digest.
# Build, test, and optionally push the arm64 image.
# ---------------------------------------------------------------------------
build-arm64:
if: github.repository == 'NousResearch/hermes-agent'
@@ -200,29 +153,26 @@ jobs:
digest: ${{ steps.push.outputs.digest }}
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
# arm64 build runs only on push-to-main and release (see build-amd64).
- name: Set up Docker Buildx
if: github.event_name != 'pull_request'
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
# Log in to ghcr.io so the registry-backed build cache below can be
# read (cache-from) on every event and written (cache-to) on
# push/release. Uses the workflow's GITHUB_TOKEN, which is valid for
# the whole job — unlike the gha cache backend's short-lived Azure SAS
# token, which expired mid-build on slow cold-cache arm64 runs and
# crashed the build before the smoke test (the reason the gha cache
# crashed the build before the tests ran (the reason the gha cache
# was removed from arm64 PRs in the first place).
- name: Log in to ghcr.io (build cache)
if: github.event_name != 'pull_request'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
# Build once, load into the local daemon for smoke testing, then push
# Build once, load into the local daemon for testing, then push
# by digest below. Reads AND writes the registry-backed cache so the
# push reuses layers from this build and the next build starts warm.
#
@@ -230,9 +180,8 @@ jobs:
# cache that previously broke here: its credential is the job-lifetime
# GITHUB_TOKEN, not a short-lived SAS token, so the cold-build-outlives-
# token failure mode cannot recur.
- name: Build image (arm64, smoke test, cached publish)
if: github.event_name != 'pull_request'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
- name: Build image (arm64, cached publish)
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
@@ -244,15 +193,29 @@ jobs:
cache-from: type=registry,ref=ghcr.io/nousresearch/hermes-agent:buildcache-arm64
cache-to: type=registry,ref=ghcr.io/nousresearch/hermes-agent:buildcache-arm64,mode=max
- name: Smoke test image
if: github.event_name != 'pull_request'
uses: ./.github/actions/hermes-smoke-test
with:
image: ${{ env.IMAGE_NAME }}:test
- name: Install uv for docker tests
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
- name: Set up Python 3.11 for docker tests
run: uv python install 3.11
- name: Install Python dependencies for docker tests
run: |
uv sync --locked --python 3.11 --extra dev
- name: Run docker tests
env:
# Skip rebuild; use the image already loaded by the build step.
HERMES_TEST_IMAGE: ${{ env.IMAGE_NAME }}:test
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
run: |
scripts/run_tests.sh tests/docker/ --file-timeout 600
- name: Log in to Docker Hub
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
@@ -260,7 +223,7 @@ jobs:
- name: Push arm64 by digest
id: push
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
@@ -282,7 +245,7 @@ jobs:
- name: Upload digest artifact
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
with:
name: digest-arm64
path: /tmp/digests/*
@@ -304,17 +267,17 @@ jobs:
timeout-minutes: 10
steps:
- name: Download digests
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
path: /tmp/digests
pattern: digest-*
merge-multiple: true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
- name: Log in to Docker Hub
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}

View File

@@ -37,7 +37,7 @@ jobs:
fetch-depth: 0 # need full history for merge-base + worktree
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
- name: Install ruff + ty
uses: ./.github/actions/retry
@@ -110,7 +110,7 @@ jobs:
cat .lint-reports/summary.md >> "$GITHUB_STEP_SUMMARY"
- name: Upload reports as artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
with:
name: lint-reports
path: .lint-reports/
@@ -164,7 +164,7 @@ jobs:
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
- name: Install ruff
uses: ./.github/actions/retry

View File

@@ -3,17 +3,17 @@ name: Build Skills Index
on:
schedule:
# Run twice daily: 6 AM and 6 PM UTC
- cron: '0 6,18 * * *'
workflow_dispatch: # Manual trigger
- cron: "0 6,18 * * *"
workflow_dispatch: # Manual trigger
push:
branches: [main]
paths:
- 'scripts/build_skills_index.py'
- '.github/workflows/skills-index.yml'
- "scripts/build_skills_index.py"
- ".github/workflows/skills-index.yml"
permissions:
contents: read
actions: write # to trigger deploy-site.yml on schedule
actions: write # to trigger deploy-site.yml on schedule
jobs:
build-index:
@@ -21,11 +21,11 @@ jobs:
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.11'
python-version: "3.11"
- name: Install dependencies
run: pip install httpx==0.28.1 pyyaml==6.0.2
@@ -36,7 +36,7 @@ jobs:
run: python scripts/build_skills_index.py
- name: Upload index artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
with:
name: skills-index
path: website/static/api/skills-index.json

View File

@@ -2,6 +2,11 @@ name: Tests
on:
workflow_call:
inputs:
slice_count:
description: Number of parallel test slices
type: number
default: 8
permissions:
contents: read
@@ -12,13 +17,11 @@ concurrency:
cancel-in-progress: true
jobs:
test:
generate:
name: "Generate slices"
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
slice: [1, 2, 3, 4, 5, 6]
outputs:
matrix: ${{ steps.matrix.outputs.matrix }}
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -27,13 +30,26 @@ jobs:
uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: test_durations.json
# main always writes a new suffix, but jobs pick the latest one with the same prefix
# quote from https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching#cache-hits-and-misses
# If you provide restore-keys, the cache action sequentially searches for any caches that match the list of restore-keys.
# If there are no exact matches, the action searches for partial matches of the restore keys.
# When the action finds a partial match, the most recent cache is restored to the path directory.
key: test-durations
- name: Generate test slices
id: matrix
run: |
MATRIX=$(python3 scripts/run_tests_parallel.py --generate-slices ${{ inputs.slice_count }})
echo "matrix=$MATRIX" >> "$GITHUB_OUTPUT"
test:
name: Run tests slice ${{ matrix.slice.index }}/${{ inputs.slice_count }}
needs: generate
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
fail-fast: false
matrix: ${{ fromJSON(needs.generate.outputs.matrix) }}
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install ripgrep (prebuilt binary)
run: |
set -euo pipefail
@@ -49,7 +65,7 @@ jobs:
rg --version
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
with:
# Persist uv's download/wheel cache (~/.cache/uv) across runs.
# Keyed on the dependency manifests, so the cache is reused until
@@ -78,33 +94,19 @@ jobs:
# re-download, keeping the persisted cache small and fast to restore.
run: uv cache prune --ci
- name: Run tests (slice ${{ matrix.slice }}/6)
# Per-file isolation via scripts/run_tests_parallel.py: discovers
# every test_*.py file under tests/ (excluding integration/ + e2e/),
# then runs `python -m pytest <file>` in a freshly-spawned subprocess
- name: Run tests (slice ${{ matrix.slice.index }}/${{ inputs.slice_count }})
# Per-file isolation via scripts/run_tests.sh: each test file runs
# in its own freshly-spawned `python -m pytest <file>` subprocess
# with bounded parallelism. No xdist, no shared workers, no
# module-level state leakage between files.
#
# Why per-file (not per-test): per-test spawn cost (~250ms × 17k
# tests = 70min CPU minimum) blew the wall-clock budget. Per-file
# spawn (~250ms × ~850 files = ~3.5min) fits while still giving
# every file a fresh interpreter — the only isolation boundary
# that matters in practice (cross-file leakage was the original
# flake source; intra-file is the test author's responsibility).
#
# Why drop xdist entirely: xdist's persistent workers accumulate
# state across files, which is exactly the leakage we wanted to
# fix. ThreadPoolExecutor + subprocess.run is ~60 lines and does
# the job with cleaner semantics.
#
# Matrix slicing (--slice I/N): files are distributed across 6
# jobs by cached duration (LPT algorithm) so each job gets
# roughly equal wall time. Without a cache, files default to 2s
# estimate and get split roughly evenly by count — still correct,
# just not perfectly balanced.
# File list is pre-computed by the generate job (--generate-slices)
# which runs LPT distribution once and passes the file list to each
# matrix job via --files. Previously each job re-discovered files and
# re-ran LPT independently — redundant N times.
run: |
source .venv/bin/activate
python scripts/run_tests_parallel.py --slice ${{ matrix.slice }}/6
scripts/run_tests.sh --files '${{ matrix.slice.files }}'
env:
# Ensure tests don't accidentally call real APIs
OPENROUTER_API_KEY: ""
@@ -114,7 +116,7 @@ jobs:
- name: Upload per-slice durations
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: test-durations-slice-${{ matrix.slice }}
name: test-durations-slice-${{ matrix.slice.index }}
path: test_durations.json
retention-days: 1
@@ -173,7 +175,7 @@ jobs:
rg --version
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
with:
# Persist uv's download/wheel cache (~/.cache/uv) across runs.
# Keyed on the dependency manifests, so the cache is reused until

View File

@@ -6,6 +6,7 @@ on:
jobs:
typecheck:
name: Check TypeScript
runs-on: ubuntu-latest
strategy:
matrix:
@@ -22,8 +23,7 @@ jobs:
# native builds. Skipping install scripts drops node-pty's node-gyp
# header fetch — the transient flake that killed this job pre-`tsc` — and
# is faster. retry covers the remaining registry blips.
-
uses: ./.github/actions/retry
- uses: ./.github/actions/retry
with:
command: npm ci --ignore-scripts
- run: npm run --prefix ${{ matrix.package }} typecheck
@@ -35,6 +35,7 @@ jobs:
# users build apps/desktop from source on install/update. Run the real
# `vite build` here so that class of break fails in CI instead.
desktop-build:
name: Build desktop app
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -44,8 +45,7 @@ jobs:
cache: npm
# Keep install scripts here: the production build may need node-pty's
# native binary. retry handles the transient install-time fetch flakes.
-
uses: ./.github/actions/retry
- uses: ./.github/actions/retry
with:
command: npm ci
- run: npm run --prefix apps/desktop build

View File

@@ -5,11 +5,11 @@ name: Publish to PyPI
on:
push:
tags:
- 'v20*' # CalVer tags: v2026.5.15, v2026.5.15.2, etc.
- "v20*" # CalVer tags: v2026.5.15, v2026.5.15.2, etc.
workflow_dispatch:
inputs:
confirm_tag:
description: 'Tag to publish (e.g. v2026.5.15). Must already exist.'
description: "Tag to publish (e.g. v2026.5.15). Must already exist."
required: true
type: string
@@ -27,7 +27,7 @@ jobs:
name: Build distribution 📦
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
# On workflow_dispatch, check out the confirmed tag.
@@ -43,17 +43,17 @@ jobs:
fi
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.13'
python-version: "3.13"
- name: Install uv
uses: astral-sh/setup-uv@d0cc045d04ccac9d8b7881df0226f9e82c39688e # v6
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
- name: Set up Node.js
uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4
uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4
with:
node-version: '22'
node-version: "22"
- name: Build web dashboard
run: cd web && npm ci && npm run build
@@ -81,7 +81,7 @@ jobs:
run: uv build --sdist --wheel
- name: Upload distribution artifacts
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
with:
name: python-package-distributions
path: dist/
@@ -94,17 +94,17 @@ jobs:
name: pypi
url: https://pypi.org/p/hermes-agent
permissions:
id-token: write # OIDC trusted publishing
id-token: write # OIDC trusted publishing
steps:
- name: Download distribution artifacts
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
name: python-package-distributions
path: dist/
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b # v1.14.0
uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b # v1.14.0
with:
skip-existing: true
@@ -116,12 +116,12 @@ jobs:
needs: publish
runs-on: ubuntu-latest
permissions:
contents: write # attach assets to the existing release
id-token: write # sigstore signing
contents: write # attach assets to the existing release
id-token: write # sigstore signing
steps:
- name: Download distribution artifacts
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
name: python-package-distributions
path: dist/
@@ -145,7 +145,7 @@ jobs:
- name: Sign with Sigstore
if: env.skip_sign != 'true'
uses: sigstore/gh-action-sigstore-python@04cffa1d795717b140764e8b640de88853c92acc # v3.3.0
uses: sigstore/gh-action-sigstore-python@04cffa1d795717b140764e8b640de88853c92acc # v3.3.0
with:
inputs: >-
./dist/*.tar.gz

View File

@@ -4,7 +4,7 @@ name: uv.lock check
# that modify pyproject.toml without regenerating uv.lock (or vice versa)
# must not merge, because the Docker build's `uv sync --frozen` step will
# fail on a stale lockfile and we'd rather catch it here than in the
# docker-publish workflow on main.
# docker workflow on main.
#
# ─────────────────────────────────────────────────────────────────────────
# IMPORTANT: this check runs against the MERGED state, not just your branch
@@ -63,7 +63,7 @@ jobs:
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
# `uv lock --check` re-resolves the project from pyproject.toml and
# compares the result to uv.lock, exiting non-zero if they disagree.
@@ -100,7 +100,7 @@ jobs:
This check is blocking because the Docker image build uses
`uv sync --frozen --extra all`, which rejects stale lockfiles
— catching it here avoids a ~15 min failed docker-publish run
— catching it here avoids a ~15 min failed docker run
on `main` post-merge.
EOF
echo "::error title=uv.lock out of sync::Run \`uv lock\` locally and commit the result. If on a PR, sync with main first."

6
.gitignore vendored
View File

@@ -137,3 +137,9 @@ RELEASE_v*.md
# Desktop demo-run scratch output (hermes writes demo/*.txt during recorded
# walkthroughs). Throwaway artifacts, never part of the app.
apps/desktop/demo/
# PR infographics are rendered locally and embedded in PR descriptions via the
# image-provider (fal.media) URL — they are NEVER committed to the repo. The
# PR body is the archive. See the hermes-agent-dev skill's
# pr-infographic-workflow reference (storage rule + lapse #8 / #COMMIT-1).
infographic/

View File

@@ -123,6 +123,17 @@ conservative at the waist.
without E2E proof, and plugins that touch core files.** Plugins live in their
own directory and work within the ABCs/hooks we provide; if a plugin needs
more, widen the generic plugin surface, don't special-case it in core.
- **Third-party products / other people's projects integrated into the core
tree.** Observability backends, vendor SaaS integrations, analytics dashboards,
and similar "someone else's product" plugins do NOT land under `plugins/` in
this repo. They place an ongoing maintenance burden on us to keep them working
against a fast-moving core, for a backend we don't own. Ship them as a
**standalone plugin repo** users install into `~/.hermes/plugins/` (or via a
pip entry point), and promote them in the Nous Research Discord
(`#plugins-skills-and-skins`). This is a coupling-and-maintenance decision, not
a quality bar — the plugin can be excellent and still be a close. PRs that add
such a directory to the tree are closed with a pointer to publish it as its own
repo.
### Before you call it a bug — verify the premise (and when NOT to close)
@@ -480,7 +491,7 @@ The dashboard embeds the real `hermes --tui` — **not** a rewrite. See `hermes
### Electron Desktop Chat App (`apps/desktop/`)
A **separate** chat surface from both the classic CLI and the dashboard's embedded TUI. It is an Electron + React + nanostore renderer (`@assistant-ui/react`) that talks to a `tui_gateway` backend over JSON-RPC (`requestGateway(method, params)`). It does NOT embed `hermes --tui` — it has its own composer, transcript, and slash-command pipeline. Route desktop bugs to the `hermes-desktop-app-work` skill, not `hermes-dashboard-work`.
A **separate** chat surface from both the classic CLI and the dashboard's embedded TUI. It is an Electron + React + nanostore renderer (`@assistant-ui/react`) that talks to a `tui_gateway` backend over JSON-RPC (`requestGateway(method, params)`). The WebSocket/JSON-RPC transport lives in the framework-agnostic `apps/shared` package (`@hermes/shared``JsonRpcGatewayClient` + WS URL helpers), which the web dashboard (`web/`) also consumes; **desktop has no build/runtime dependency on the dashboard frontend** — it spawns a headless `hermes serve` backend server (the same gateway `dashboard` serves, minus the browser UI). `dashboard` and `serve` share `cmd_dashboard`/`start_server` but are independent surfaces — neither launches the other. The one exception is a backward-compat *fallback*: `serve` is newer, so the desktop spawn (`electron/backend-command.cjs` + `backendSupportsServe()` in `main.cjs`) detects whether the resolved runtime registers `serve` and, only when it does not (an older managed install / PATH `hermes` the app hasn't updated yet), rewrites the argv to the legacy `dashboard --no-open`. Without that, a new app against an un-upgraded runtime would crash on an unknown subcommand and brick every mid-upgrade user. It does NOT embed `hermes --tui` — it has its own composer, transcript, and slash-command pipeline. Route desktop bugs to the `hermes-desktop-app-work` skill, not `hermes-dashboard-work`.
**Slash commands in the desktop app are curated client-side, then dispatched to the backend.** The pipeline:
@@ -783,6 +794,24 @@ landing in this tree. PRs that add a new directory under
provider as its own repo. Existing in-tree providers stay; bug fixes
to them are welcome.
**No new third-party-product plugins in-tree (policy, June 2026):** the
same rule applies beyond memory providers. Plugins that integrate
someone else's product or project — observability/metrics backends,
vendor SaaS connectors, analytics dashboards, paid-service tie-ins —
must ship as **standalone plugin repos** that users install into
`~/.hermes/plugins/` (or via pip entry points). They register through
the existing plugin discovery path and use the ABCs/hooks/ctx surface
we expose; nothing special is needed in core. The reason is
maintenance load: every product we absorb into the tree becomes our
burden to keep working against a fast-moving core, for a backend we
don't own. Promote standalone plugins in the Nous Research Discord
(`#plugins-skills-and-skins`). PRs that add such a directory under
`plugins/` are closed with a pointer to publish it as its own repo —
this is a coupling decision, not a quality judgment. (The
`observability/`, `kanban/`, `disk-cleanup/`, etc. directories already
in the tree are existing precedent, not an invitation to add more
third-party-product plugins alongside them.)
### Model-provider plugins (`plugins/model-providers/<name>/`)
Every inference backend (openrouter, anthropic, gmi, deepseek, nvidia, …)

View File

@@ -85,6 +85,23 @@ This isn't a quality bar — it's a coupling-and-maintenance decision. Memory pr
---
## Third-Party Product Integrations: Ship as a Standalone Plugin
The same rule extends to **any plugin that integrates someone else's product or project** — observability/metrics backends, vendor SaaS connectors, analytics dashboards, paid-service tie-ins, and similar third-party integrations. **These do not land in this repo.**
The reason is maintenance load, not quality. Every external product absorbed into the core tree becomes ours to keep working against a fast-moving codebase, for a backend we don't own and can't control. Hermes ships a lot and the core moves quickly; coupling third-party products into it creates an open-ended burden on the maintainers.
Publish these as a **standalone plugin repo** instead:
- Implement the relevant ABC and use the existing plugin discovery path (`~/.hermes/plugins/`, project `.hermes/plugins/`, or a pip entry point) — see [Build a Hermes Plugin](https://hermes-agent.nousresearch.com/docs/guides/build-a-hermes-plugin)
- Register lifecycle hooks (`pre_tool_call`, `post_tool_call`, `pre_llm_call`, `post_llm_call`, `on_session_start`, `on_session_end`), tools (`ctx.register_tool`), and CLI subcommands (`ctx.register_cli_command`) through the surface we already expose — no core changes needed
- If your plugin needs a capability the framework doesn't expose, that's a feature request to **widen the generic plugin surface** (a new hook or `ctx` method) — never special-case your plugin in core
- Promote it in the [Nous Research Discord](https://discord.gg/NousResearch) `#plugins-skills-and-skins` channel so users can find and install it
A well-built third-party-product plugin can clear automated review and still be closed for this reason — it's a placement decision, not a verdict on the code. PRs that add such a directory under `plugins/` will be closed with a pointer to publish it as its own repo.
---
## Development Setup
### Prerequisites

View File

@@ -119,6 +119,9 @@ COPY package.json package-lock.json ./
COPY web/package.json web/
COPY ui-tui/package.json ui-tui/
COPY ui-tui/packages/hermes-ink/ ui-tui/packages/hermes-ink/
# apps/shared/ is copied IN FULL because web/package.json references it as a
# `file:` workspace dependency (same pattern as hermes-ink above).
COPY apps/shared/ apps/shared/
# `npm_config_install_links=false` forces npm to install `file:` deps as
# symlinks instead of copies. This is the default since npm 10+, which is
@@ -184,12 +187,19 @@ RUN uv sync --frozen --no-install-project --extra all --extra messaging --extra
# invalidate the (relatively slow) web + ui-tui build layer.
COPY web/ web/
COPY ui-tui/ ui-tui/
COPY apps/shared/ apps/shared/
RUN cd web && npm run build && \
cd ../ui-tui && npm run build
# ---------- Source code ----------
# .dockerignore excludes node_modules, so the installs above survive.
COPY . .
# --link decouples this layer from parents for cache purposes; --chmod bakes
# the final read-only permissions at copy time so we skip the separate
# `chmod -R` pass that previously walked ~30k files across the venv +
# node_modules + source (21s amd64 / 222s arm64 — #49113). `a+rX,go-w`
# gives the non-root hermes user read + traverse but no write; root retains
# write so the build steps below don't need chmod u+w dances.
COPY --link --chmod=a+rX,go-w . .
# ---------- Permissions ----------
# Link hermes-agent itself (editable). Deps are already installed in the
@@ -197,19 +207,15 @@ COPY . .
# resolution or downloads.
RUN uv pip install --no-cache-dir --no-deps -e "."
# Keep /opt/hermes immutable for the runtime hermes user. Hosted/container
# instances must not be able to self-edit the installed source or venv; user
# data, skills, plugins, config, logs, and dashboard uploads live under
# /opt/data instead. Root can still repair the image during build/boot, but
# supervised Hermes processes drop to the non-root hermes user.
# Wire the exec shim and install-method stamp. Files under /opt/hermes are
# already root-owned (COPY, uv sync, npm install all run as root) and
# read-only for the hermes user (go-w from the --chmod above).
USER root
RUN mkdir -p /opt/hermes/bin && \
cp /opt/hermes/docker/hermes-exec-shim.sh /opt/hermes/bin/hermes && \
chmod 0755 /opt/hermes/bin/hermes && \
printf 'docker\n' > /opt/hermes/.install_method && \
chown -R root:root /opt/hermes && \
chmod -R a+rX /opt/hermes && \
chmod -R a-w /opt/hermes
printf 'docker\n' > /opt/hermes/.install_method
# The ``.install_method`` stamp is baked next to the running code (the install
# tree), NOT into $HERMES_HOME. $HERMES_HOME (/opt/data) is a shared data
# volume that is commonly bind-mounted from the host and even shared with a
@@ -236,13 +242,11 @@ RUN mkdir -p /opt/hermes/bin && \
#
# The arg is optional — local `docker build` without --build-arg simply
# omits the file, and the runtime falls back to live-git lookup. CI
# (.github/workflows/docker-publish.yml) passes ${{ github.sha }} so
# (.github/workflows/docker.yml) passes ${{ github.sha }} so
# every published image has it.
ARG HERMES_GIT_SHA=
RUN if [ -n "${HERMES_GIT_SHA}" ]; then \
chmod u+w /opt/hermes && \
printf '%s\n' "${HERMES_GIT_SHA}" > /opt/hermes/.hermes_build_sha && \
chmod a-w /opt/hermes /opt/hermes/.hermes_build_sha; \
printf '%s\n' "${HERMES_GIT_SHA}" > /opt/hermes/.hermes_build_sha; \
fi
# ---------- s6-overlay service wiring ----------

View File

@@ -18,7 +18,7 @@
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NovitaAI](https://novita.ai) (AI-native cloud for Model API, Agent Sandbox, and GPU Cloud), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), OpenRouter, OpenAI, your own endpoint, and [many others](https://hermes-agent.nousresearch.com/docs/integrations/providers). Switch with `hermes model` — no code changes, no lock-in.
<table>
<tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>

View File

@@ -722,10 +722,50 @@ def init_agent(
elif agent.provider == "moa":
from agent.moa_loop import MoAClient
agent.api_mode = "chat_completions"
agent.client = MoAClient(agent.model or "default")
# Route reference-model outputs to the agent's tool_progress_callback so
# every surface that already consumes it (CLI spinner/scrollback, TUI,
# desktop, gateway) can show each reference's answer as a labelled block
# before the aggregator acts. The facade emits "moa.reference" and
# "moa.aggregating" events; we forward them through the same callback
# the tool lifecycle uses. Best-effort and cache-safe — these are
# display-only events, they never touch the message history.
def _moa_reference_relay(event: str, **kwargs: Any) -> None:
cb = getattr(agent, "tool_progress_callback", None)
if cb is None:
return
try:
if event == "moa.reference":
label = str(kwargs.get("label") or "")
text = str(kwargs.get("text") or "")
idx = kwargs.get("index")
count = kwargs.get("count")
cb(
"moa.reference",
label,
text,
None,
moa_index=idx,
moa_count=count,
)
elif event == "moa.aggregating":
cb(
"moa.aggregating",
str(kwargs.get("aggregator") or ""),
None,
None,
moa_ref_count=kwargs.get("ref_count"),
)
except Exception:
pass
agent.client = MoAClient(
agent.model or "default",
reference_callback=_moa_reference_relay,
)
agent._client_kwargs = {}
agent.api_key = api_key or "moa-virtual-provider"
agent.base_url = base_url or "moa://local"
agent.base_url = "moa://local"
if not agent.quiet_mode:
print(f"🤖 AI Agent initialized with MoA preset: {agent.model}")
elif agent.api_mode == "bedrock_converse":
@@ -1267,6 +1307,12 @@ def init_agent(
_agent_section = {}
agent._tool_use_enforcement = _agent_section.get("tool_use_enforcement", "auto")
# Intent-ack continuation config: "auto" (default — codex_responses only,
# the historical gate), true (all api_modes), false (never), or a list of
# model-name substrings. Resolved against the active api_mode/model in the
# conversation loop's intent-ack block.
agent._intent_ack_continuation = _agent_section.get("intent_ack_continuation", "auto")
# Universal task-completion guidance toggle. Default True. Surfaced
# as a separate flag from tool_use_enforcement because the guidance
# applies to ALL models, not just the model families enforcement
@@ -1630,8 +1676,10 @@ def init_agent(
f"Model {agent.model} has a context window of {_ctx:,} tokens, "
f"which is below the minimum {MINIMUM_CONTEXT_LENGTH:,} required "
f"by Hermes Agent. Choose a model with at least "
f"{MINIMUM_CONTEXT_LENGTH // 1000}K context, or set "
f"model.context_length in config.yaml to override."
f"{MINIMUM_CONTEXT_LENGTH // 1000}K context. If your server "
f"reports a window smaller than the model's true window, set "
f"model.context_length in config.yaml to the real value "
f"(this must be at least {MINIMUM_CONTEXT_LENGTH // 1000}K)."
)
# Inject context engine tool schemas (e.g. lcm_grep, lcm_describe, lcm_expand).

View File

@@ -42,6 +42,14 @@ from utils import base_url_host_matches, base_url_hostname, env_var_enabled, ato
logger = logging.getLogger(__name__)
# Max consecutive successful credential-pool token refreshes of the SAME entry
# on a persistent auth failure before we give up and let the fallback chain
# activate. A single-entry OAuth pool can re-mint a fresh token indefinitely
# even when the upstream keeps rejecting it, so without this cap the retry loop
# spins forever and never reaches ``_try_activate_fallback``. See #26080.
_MAX_AUTH_REFRESH_ATTEMPTS = 2
def _ra():
"""Lazy ``run_agent`` reference for test-patch routing."""
import run_agent
@@ -775,6 +783,30 @@ def recover_with_credential_pool(
return False, has_retried_429
refreshed = pool.try_refresh_current()
if refreshed is not None:
# ``try_refresh_current()`` re-mints a fresh OAuth token and reports
# success even when the upstream keeps rejecting it — a single-entry
# pool (common for OAuth/Max subscribers) has nothing to rotate to,
# so a bare "refreshed → retry" loop spins forever on the same dead
# token and the configured fallback never activates. Cap consecutive
# same-entry refreshes and fall through to fallback once exceeded.
# See #26080.
refreshed_id = getattr(refreshed, "id", None)
if refreshed_id is not None:
refresh_counts = getattr(agent, "_auth_pool_refresh_counts", None)
if refresh_counts is None:
refresh_counts = {}
agent._auth_pool_refresh_counts = refresh_counts
refresh_key = (agent.provider, refreshed_id)
refresh_counts[refresh_key] = refresh_counts.get(refresh_key, 0) + 1
if refresh_counts[refresh_key] > _MAX_AUTH_REFRESH_ATTEMPTS:
_ra().logger.warning(
"Credential auth failure persists after %s refreshes for "
"pool entry %s — treating as unrecoverable and allowing "
"fallback to activate.",
refresh_counts[refresh_key] - 1,
refreshed_id,
)
return False, has_retried_429
_ra().logger.info(f"Credential auth failure — refreshed pool entry {getattr(refreshed, 'id', '?')}")
agent._swap_credential(refreshed)
return True, has_retried_429
@@ -1046,6 +1078,34 @@ def restore_primary_runtime(agent) -> bool:
api_mode=rt.get("compressor_api_mode", ""),
)
# ── Re-select from the credential pool if one is available ──
# The snapshot's api_key was captured at construction time. Across
# turns the pool may have rotated (token revocation, billing/rate-limit
# exhaustion, cooldown), leaving the snapshot key stale. Restoring it
# blindly re-fails on the first request and burns through the remaining
# pool entries before cross-provider fallback even gets a chance. Ask
# the pool for its current best entry and swap the live credential in.
# When the pool is absent, empty, or the entry has no usable key, we
# keep the snapshot key (the existing behavior). Fixes #25205.
pool = getattr(agent, "_credential_pool", None)
if pool is not None and pool.has_available():
entry = pool.select()
if entry is not None:
entry_key = (
getattr(entry, "runtime_api_key", None)
or getattr(entry, "access_token", "")
)
if entry_key:
# ``_swap_credential`` rebuilds the OpenAI/Anthropic client,
# reapplies base-url-scoped headers, and carries the
# accumulated base_url / OAuth-detection fixes (#33163).
agent._swap_credential(entry)
logger.info(
"Restore re-selected pool entry %s (%s)",
getattr(entry, "id", "?"),
getattr(entry, "label", "?"),
)
# ── Reset fallback chain for the new turn ──
agent._fallback_activated = False
agent._fallback_index = 0
@@ -1221,7 +1281,11 @@ def dump_api_request_debug(
dump_payload["error"] = error_info
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
dump_file = agent.logs_dir / f"request_dump_{agent.session_id}_{timestamp}.json"
# Sanitize the session ID into a traversal-free path segment — it can
# originate from untrusted input (X-Hermes-Session-Id header), and an
# unsanitized "../"-shaped ID would write the dump outside logs_dir.
safe_sid = _ra()._safe_session_filename_component(agent.session_id)
dump_file = agent.logs_dir / f"request_dump_{safe_sid}_{timestamp}.json"
# Redact secrets before persisting/printing. This dump captures the
# full request body (system prompt, tool defs, context-embedded
@@ -1420,6 +1484,15 @@ def create_openai_client(agent, client_kwargs: dict, *, reason: str, shared: boo
keepalive_http = agent._build_keepalive_http_client(client_kwargs.get("base_url", ""))
if keepalive_http is not None:
client_kwargs["http_client"] = keepalive_http
# Delegate all rate-limit / 5xx retry to hermes's outer conversation loop,
# which honors Retry-After and applies adaptive/jittered backoff. The OpenAI
# SDK default (max_retries=2) uses its own 1-2s backoff that ignores
# Retry-After and double-retries inside our loop — the same deadlock the
# Anthropic clients hit (#26293). This is the single chokepoint every primary
# OpenAI/aggregator client passes through (init, switch_model, recovery,
# restore, request-scoped); auxiliary_client builds its own clients and keeps
# SDK retries because it is NOT wrapped by the conversation loop.
client_kwargs.setdefault("max_retries", 0)
# Uses the module-level `OpenAI` name, resolved lazily on first
# access via __getattr__ below. Tests patch via `run_agent.OpenAI`.
client = _ra().OpenAI(**client_kwargs)
@@ -1499,6 +1572,10 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
# _client_kwargs is a dict — snapshot a shallow copy so mutating the
# live dict doesn't poison the rollback target.
_snapshot["_client_kwargs"] = dict(getattr(agent, "_client_kwargs", {}) or {})
# Snapshot the credential pool reference so a failed client rebuild can
# restore the original pool (issue #52727: pool reload is part of this
# switch and must be reversible on rollback).
_snapshot["_credential_pool"] = getattr(agent, "_credential_pool", _MISSING)
try:
# Clear the per-config context_length override so the new model's
@@ -1523,8 +1600,36 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
if api_key:
agent.api_key = api_key
# ── Reload credential pool for the new provider (issue #52727) ──
# Without this, ``recover_with_credential_pool`` sees a
# ``pool.provider != agent.provider`` mismatch and short-circuits,
# leaving the new provider with no rotation/recovery on 401/429 and
# burning the original pool's entries. Only reload when the provider
# actually changed (or the pool was missing) — re-selecting the same
# provider must not churn the pool reference. A reload failure is
# logged + swallowed: the switch itself must still complete.
old_norm = (old_provider or "").strip().lower()
new_norm = (new_provider or "").strip().lower()
if old_norm != new_norm or getattr(agent, "_credential_pool", None) is None:
try:
from agent.credential_pool import load_pool
agent._credential_pool = load_pool(new_provider)
except Exception as _pool_exc: # noqa: BLE001
logger.warning(
"switch_model: credential pool reload failed for %s (%s); "
"continuing without pool rotation this turn",
new_provider, _pool_exc,
)
# ── Build new client ──
if api_mode == "anthropic_messages":
if (new_provider or "").strip().lower() == "moa":
from agent.moa_loop import MoAClient
agent.api_key = api_key or "moa-virtual-provider"
agent.base_url = "moa://local"
agent._client_kwargs = {}
agent.client = MoAClient(agent.model or "default")
elif api_mode == "anthropic_messages":
from agent.anthropic_adapter import (
build_anthropic_client,
resolve_anthropic_token,
@@ -2104,8 +2209,21 @@ def looks_like_codex_intermediate_ack(
user_message: str,
assistant_content: str,
messages: List[Dict[str, Any]],
require_workspace: bool = True,
) -> bool:
"""Detect a planning/ack message that should continue instead of ending the turn."""
"""Detect a planning/ack message that should continue instead of ending the turn.
``require_workspace`` (default True) keeps the original codex-coding scope:
the ack must reference a filesystem/repo workspace. The conversation loop
passes ``require_workspace=False`` when the user has explicitly opted into
intent-ack continuation for all api_modes (``agent.intent_ack_continuation``
is ``true`` or a model-list), so general autonomous workflows ("I'll run a
health check on the server", "I'll start the deployment") — which carry a
future-ack and an action verb but no filesystem reference — are caught too.
The future-ack + short-content + no-prior-tools + action-verb requirements
always apply, which is what keeps conversational "I'll help you brainstorm"
replies from tripping it.
"""
if any(isinstance(msg, dict) and msg.get("role") == "tool" for msg in messages):
return False
@@ -2158,17 +2276,67 @@ def looks_like_codex_intermediate_ack(
"path",
)
assistant_mentions_action = any(marker in assistant_text for marker in action_markers)
if not assistant_mentions_action:
return False
# Opted-in (all-api_mode) path: a future-ack + action verb + no prior tool
# call is enough — the user asked us to keep going when the model only
# announces intent, regardless of whether a filesystem is involved.
if not require_workspace:
return True
user_text = (user_message or "").strip().lower()
user_targets_workspace = (
any(marker in user_text for marker in workspace_markers)
or "~/" in user_text
or "/" in user_text
)
assistant_mentions_action = any(marker in assistant_text for marker in action_markers)
assistant_targets_workspace = any(
marker in assistant_text for marker in workspace_markers
)
return (user_targets_workspace or assistant_targets_workspace) and assistant_mentions_action
return user_targets_workspace or assistant_targets_workspace
def intent_ack_continuation_mode(agent) -> str:
"""Classify the resolved intent-ack continuation mode for this turn.
Returns one of:
* ``"off"`` — never continue.
* ``"codex_only"`` — historical scope: continue only on the
``codex_responses`` api_mode, and only for codebase/workspace acks
(``require_workspace=True``).
* ``"all"`` — user opted in for every api_mode; continue on any
future-ack + action verb (``require_workspace=False``).
Mirrors the four-mode shape of ``agent.tool_use_enforcement``: ``"auto"``
(default) → codex_only; ``True``/"true"/"always"/"yes"/"on" → all;
``False``/"false"/"never"/"no"/"off" → off; ``list`` → all when a substring
matches the active model name, else off.
"""
mode = getattr(agent, "_intent_ack_continuation", "auto")
if mode is True or (isinstance(mode, str) and mode.lower() in {"true", "always", "yes", "on"}):
return "all"
if mode is False or (isinstance(mode, str) and mode.lower() in {"false", "never", "no", "off"}):
return "off"
if isinstance(mode, list):
model_lower = (agent.model or "").lower()
return "all" if any(p.lower() in model_lower for p in mode if isinstance(p, str)) else "off"
# "auto" or any unrecognised value — historical codex-only behavior.
return "codex_only" if agent.api_mode == "codex_responses" else "off"
def intent_ack_continuation_enabled(agent) -> bool:
"""Whether intent-ack continuation should fire at all for this turn.
The ``codex_ack_continuations < 2`` per-turn cap and the
``looks_like_codex_intermediate_ack`` detector are applied by the caller;
this only decides the on/off gate. Callers that also need to know whether
the workspace requirement applies should use ``intent_ack_continuation_mode``
directly (``"codex_only"`` ⇒ require_workspace=True, ``"all"`` ⇒ False).
"""
return intent_ack_continuation_mode(agent) != "off"

View File

@@ -673,6 +673,9 @@ def _build_anthropic_client_with_bearer_hook(
kwargs = {
"timeout": timeout_obj,
"http_client": http_client,
# Delegate retry to hermes's outer loop (honors Retry-After); the SDK
# default max_retries=2 ignores it and double-retries. (#26293)
"max_retries": 0,
# The SDK requires *something* for api_key/auth_token. Our
# event hook overrides Authorization per request so this value
# is never sent. The sentinel string makes accidental leaks
@@ -757,6 +760,12 @@ def build_anthropic_client(
_read_timeout = timeout if (isinstance(timeout, (int, float)) and timeout > 0) else 900.0
kwargs = {
"timeout": Timeout(timeout=float(_read_timeout), connect=10.0),
# Delegate all rate-limit / 5xx retry to hermes's outer conversation
# loop, which honors Retry-After. The SDK default (max_retries=2) uses
# its own 1-2s backoff that ignores Retry-After and double-retries
# inside our loop — burning request slots against a bucket that won't
# refill for minutes. (#26293)
"max_retries": 0,
}
if normalized_base_url:
# Azure Anthropic endpoints require an ``api-version`` query parameter.
@@ -852,6 +861,9 @@ def build_anthropic_bedrock_client(region: str):
return _anthropic_sdk.AnthropicBedrock(
aws_region=region,
timeout=Timeout(timeout=900.0, connect=10.0),
# Delegate retry to hermes's outer loop (honors Retry-After); the SDK
# default max_retries=2 ignores it and double-retries. (#26293)
max_retries=0,
default_headers={"anthropic-beta": ",".join([*_COMMON_BETAS, _CONTEXT_1M_BETA])},
)
@@ -914,44 +926,72 @@ def _read_claude_code_credentials_from_keychain() -> Optional[Dict[str, Any]]:
return None
def _read_claude_code_credentials_from_file() -> Optional[Dict[str, Any]]:
"""Read Claude Code OAuth credentials from ~/.claude/.credentials.json.
Returns dict with {accessToken, refreshToken?, expiresAt?, source} or None.
"""
cred_path = Path.home() / ".claude" / ".credentials.json"
if not cred_path.exists():
return None
try:
data = json.loads(cred_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError, IOError) as e:
logger.debug("Failed to read ~/.claude/.credentials.json: %s", e)
return None
oauth_data = data.get("claudeAiOauth")
if not (oauth_data and isinstance(oauth_data, dict)):
return None
access_token = oauth_data.get("accessToken", "")
if not access_token:
return None
return {
"accessToken": access_token,
"refreshToken": oauth_data.get("refreshToken", ""),
"expiresAt": oauth_data.get("expiresAt", 0),
"source": "claude_code_credentials_file",
}
def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
"""Read refreshable Claude Code OAuth credentials.
Checks two sources in order:
Reads from two possible sources and reconciles them:
1. macOS Keychain (Darwin only) — "Claude Code-credentials" entry
2. ~/.claude/.credentials.json file
Selection rules when both are present:
- If exactly one is non-expired, prefer that one. (Handles the case
where Claude Code refreshes one source but not the other — observed
in the wild on Claude Code 2.1.x.)
- Otherwise, prefer the source with the later ``expiresAt`` so that
any subsequent refresh uses the most recent ``refreshToken``.
This intentionally excludes ~/.claude.json primaryApiKey. Opencode's
subscription flow is OAuth/setup-token based with refreshable credentials,
and native direct Anthropic provider usage should follow that path rather
than auto-detecting Claude's first-party managed key.
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
Returns dict with {accessToken, refreshToken?, expiresAt?, source} or None.
"""
# Try macOS Keychain first (covers Claude Code >=2.1.114)
kc_creds = _read_claude_code_credentials_from_keychain()
if kc_creds:
return kc_creds
file_creds = _read_claude_code_credentials_from_file()
# Fall back to JSON file
cred_path = Path.home() / ".claude" / ".credentials.json"
if cred_path.exists():
try:
data = json.loads(cred_path.read_text(encoding="utf-8"))
oauth_data = data.get("claudeAiOauth")
if oauth_data and isinstance(oauth_data, dict):
access_token = oauth_data.get("accessToken", "")
if access_token:
return {
"accessToken": access_token,
"refreshToken": oauth_data.get("refreshToken", ""),
"expiresAt": oauth_data.get("expiresAt", 0),
"source": "claude_code_credentials_file",
}
except (json.JSONDecodeError, OSError, IOError) as e:
logger.debug("Failed to read ~/.claude/.credentials.json: %s", e)
if kc_creds and file_creds:
kc_valid = is_claude_code_token_valid(kc_creds)
file_valid = is_claude_code_token_valid(file_creds)
if kc_valid and not file_valid:
return kc_creds
if file_valid and not kc_valid:
return file_creds
# Both valid or both expired: prefer the later expiresAt so the
# downstream refresh path uses the freshest refresh_token.
kc_exp = kc_creds.get("expiresAt", 0) or 0
file_exp = file_creds.get("expiresAt", 0) or 0
return kc_creds if kc_exp >= file_exp else file_creds
return None
return kc_creds or file_creds
def is_claude_code_token_valid(creds: Dict[str, Any]) -> bool:
@@ -1034,8 +1074,40 @@ def refresh_anthropic_oauth_pure(refresh_token: str, *, use_json: bool = False)
def _refresh_oauth_token(creds: Dict[str, Any]) -> Optional[str]:
"""Attempt to refresh an expired Claude Code OAuth token."""
refresh_token = creds.get("refreshToken", "")
"""Attempt to refresh an expired Claude Code OAuth token.
Claude Code's OAuth refresh tokens are single-use: a successful refresh
rotates the pair and invalidates the old refresh token. Claude Code itself
also refreshes on its own schedule (IDE/CLI activity), so by the time
Hermes notices an expired token, Claude Code may have already rotated it.
POSTing our now-stale refresh token in that window races Claude Code and
fails with ``invalid_grant``.
So before refreshing, re-read the live credential sources. If Claude Code
has already produced a valid token, adopt it and skip the POST entirely.
Only fall back to refreshing ourselves when no fresh credential is found.
"""
# Claude Code may have already refreshed — adopt its token rather than
# racing it with our (possibly already-rotated) refresh token. Only adopt
# when the live re-read produced a DIFFERENT token with a real future
# expiry: re-adopting the same credential we were just handed would be a
# no-op, and a 0/absent ``expiresAt`` means "managed key / unknown expiry"
# (see is_claude_code_token_valid) which must NOT be treated as a fresh
# refresh here.
current = read_claude_code_credentials()
if current:
current_token = current.get("accessToken", "")
current_exp = current.get("expiresAt", 0) or 0
if (
current_token
and current_token != creds.get("accessToken", "")
and current_exp > 0
and is_claude_code_token_valid(current)
):
logger.debug("Adopted Claude Code's already-refreshed OAuth token")
return current_token
refresh_token = (current or {}).get("refreshToken", "") or creds.get("refreshToken", "")
if not refresh_token:
logger.debug("No refresh token available — cannot refresh")
return None

View File

@@ -102,6 +102,7 @@ OpenAI = _OpenAIProxy() # module-level name, resolves lazily on call/isinstance
from agent.credential_pool import load_pool
from agent.model_metadata import MINIMUM_CONTEXT_LENGTH, get_model_context_length
from agent.process_bootstrap import build_keepalive_http_client
from hermes_cli.config import get_hermes_home
from hermes_constants import OPENROUTER_BASE_URL
from utils import base_url_host_matches, base_url_hostname, env_float, model_forces_max_completion_tokens, normalize_proxy_env_vars
@@ -109,6 +110,23 @@ from utils import base_url_host_matches, base_url_hostname, env_float, model_for
logger = logging.getLogger(__name__)
def _openai_http_client_kwargs(
base_url: Optional[str],
*,
async_mode: bool = False,
) -> Dict[str, Any]:
"""Inject keepalive httpx client with env-only proxy (not macOS system proxy)."""
client = build_keepalive_http_client(str(base_url or ""), async_mode=async_mode)
if client is None:
return {}
return {"http_client": client}
def _create_openai_client(*, api_key: str, base_url: str, **kwargs: Any) -> Any:
kwargs = {**_openai_http_client_kwargs(base_url), **kwargs}
return OpenAI(api_key=api_key, base_url=base_url, **kwargs)
# ── Interrupt protection for atomic auxiliary tasks ──────────────────────
# Some auxiliary tasks must NOT be aborted mid-flight by a gateway interrupt
# (e.g. an incoming user message while the agent is busy). Context
@@ -1614,7 +1632,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
_merged_aux = _apply_user_default_headers(extra.get("default_headers"))
if _merged_aux:
extra["default_headers"] = _merged_aux
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _create_openai_client(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, raw_base_url)
return _client, model
@@ -1654,7 +1672,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
_merged_aux2 = _apply_user_default_headers(extra.get("default_headers"))
if _merged_aux2:
extra["default_headers"] = _merged_aux2
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _create_openai_client(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, raw_base_url)
return _client, model
@@ -1669,20 +1687,21 @@ def _try_openrouter(explicit_api_key: str = None, model: str = None) -> Tuple[Op
pool_present, entry = _select_pool_entry("openrouter")
if pool_present:
or_key = explicit_api_key or _pool_runtime_api_key(entry)
if not or_key:
_mark_provider_unhealthy("openrouter", ttl=60)
return None, None
base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL
logger.debug("Auxiliary client: OpenRouter via pool")
return OpenAI(api_key=or_key, base_url=base_url,
default_headers=build_or_headers()), model or _OPENROUTER_MODEL
if or_key:
base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL
logger.debug("Auxiliary client: OpenRouter via pool")
return _create_openai_client(api_key=or_key, base_url=base_url,
default_headers=build_or_headers()), model or _OPENROUTER_MODEL
# Pool exists but is exhausted (no usable runtime key) — fall through to
# the OPENROUTER_API_KEY env-var path rather than failing outright.
logger.debug("Auxiliary client: OpenRouter pool exhausted, trying OPENROUTER_API_KEY")
or_key = explicit_api_key or os.getenv("OPENROUTER_API_KEY")
if not or_key:
_mark_provider_unhealthy("openrouter", ttl=60)
return None, None
logger.debug("Auxiliary client: OpenRouter")
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
return _create_openai_client(api_key=or_key, base_url=OPENROUTER_BASE_URL,
default_headers=build_or_headers()), model or _OPENROUTER_MODEL
@@ -1775,7 +1794,7 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
return None, None
base_url = str((nous or {}).get("inference_base_url") or _nous_base_url()).rstrip("/")
return (
OpenAI(
_create_openai_client(
api_key=api_key,
base_url=base_url,
),
@@ -2052,7 +2071,7 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
if _custom_headers:
_extra["default_headers"] = _custom_headers
if custom_mode == "codex_responses":
real_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
real_client = _create_openai_client(api_key=custom_key, base_url=_clean_base, **_extra)
return CodexAuxiliaryClient(real_client, model), model
if custom_mode == "anthropic_messages":
# Third-party Anthropic-compatible gateway (MiniMax, Zhipu GLM,
@@ -2066,14 +2085,14 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
"Custom endpoint declares api_mode=anthropic_messages but the "
"anthropic SDK is not installed — falling back to OpenAI-wire."
)
return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
return _create_openai_client(api_key=custom_key, base_url=_clean_base, **_extra), model
return (
AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
model,
)
# URL-based anthropic detection for custom endpoints that didn't set
# api_mode explicitly (e.g. kimi.com/coding reached via custom config).
_fallback_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
_fallback_client = _create_openai_client(api_key=custom_key, base_url=_clean_base, **_extra)
_fallback_client = _maybe_wrap_anthropic(
_fallback_client, model, custom_key, custom_base, custom_mode,
)
@@ -2102,7 +2121,7 @@ def _build_xai_oauth_aux_client(model: str) -> Tuple[Optional[Any], Optional[str
return None, None
api_key, base_url = resolved
logger.debug("Auxiliary client: xAI OAuth (%s via Responses API)", model)
real_client = OpenAI(api_key=api_key, base_url=base_url)
real_client = _create_openai_client(api_key=api_key, base_url=base_url)
return CodexAuxiliaryClient(real_client, model), model
@@ -2139,7 +2158,7 @@ def _build_codex_client(model: str) -> Tuple[Optional[Any], Optional[str]]:
return None, None
base_url = _CODEX_AUX_BASE_URL
logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", model)
real_client = OpenAI(
real_client = _create_openai_client(
api_key=codex_token,
base_url=base_url,
default_headers=_codex_cloudflare_headers(codex_token),
@@ -2239,7 +2258,7 @@ def _try_azure_foundry(
if _dq:
extra["default_query"] = _dq
client = OpenAI(api_key=api_key, base_url=_clean_base, **extra)
client = _create_openai_client(api_key=api_key, base_url=_clean_base, **extra)
if runtime_api_mode == "codex_responses":
# GPT-5.x / o-series / codex models on Azure Foundry are
@@ -3624,6 +3643,37 @@ def _resolve_auto(
# config.yaml (auxiliary.<task>.provider) still win over this.
main_provider = str(runtime_provider or _read_main_provider() or "")
main_model = str(runtime_model or _read_main_model() or "")
# MoA virtual provider: the "model" is a preset name (e.g. "opus-gpt") and
# there is no real "moa" HTTP endpoint, so resolving an aux client against
# provider="moa"/model=<preset> sends the preset name as the model id and
# the provider 400s ("opus-gpt is not a valid model ID"). Auxiliary tasks
# (title generation, compression, vision, …) don't need the reference
# fan-out — they should run on the aggregator, which is the preset's acting
# model. Resolve the MoA preset to its aggregator slot and continue Step 1
# with that real provider+model. Mirrors the MoA context-length resolution.
if main_provider == "moa":
try:
from hermes_cli.config import load_config
from hermes_cli.moa_config import resolve_moa_preset
_preset = resolve_moa_preset(load_config().get("moa") or {}, main_model)
_agg = _preset.get("aggregator") or {}
_agg_provider = str(_agg.get("provider") or "").strip()
_agg_model = str(_agg.get("model") or "").strip()
if _agg_provider and _agg_model and _agg_provider.lower() != "moa":
main_provider = _agg_provider
main_model = _agg_model
# The MoA virtual runtime carries a non-HTTP base_url
# ("moa://local") and a placeholder api_key; they belong to the
# facade, not the aggregator's real provider. Drop them so the
# aggregator resolves through its own provider credentials.
runtime_base_url = ""
runtime_api_key = ""
runtime_api_mode = ""
except Exception:
logger.debug("MoA aux resolution to aggregator failed", exc_info=True)
if (main_provider and main_model
and main_provider not in {"auto", ""}):
resolved_provider = main_provider
@@ -3770,6 +3820,10 @@ def _to_async_client(sync_client, model: str, is_vision: bool = False):
_merged_async = _apply_user_default_headers(async_kwargs.get("default_headers"))
if _merged_async:
async_kwargs["default_headers"] = _merged_async
async_kwargs = {
**_openai_http_client_kwargs(sync_base_url, async_mode=True),
**async_kwargs,
}
return AsyncOpenAI(**async_kwargs), model
@@ -3980,7 +4034,7 @@ def resolve_provider_client(
"but no Codex OAuth token found (run: hermes model)")
return None, None
final_model = _normalize_resolved_model(model, provider)
raw_client = OpenAI(
raw_client = _create_openai_client(
api_key=codex_token,
base_url=_CODEX_AUX_BASE_URL,
default_headers=_codex_cloudflare_headers(codex_token),
@@ -4061,7 +4115,7 @@ def resolve_provider_client(
_merged_custom = _apply_user_default_headers(extra.get("default_headers"))
if _merged_custom:
extra["default_headers"] = _merged_custom
client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
client = _create_openai_client(api_key=custom_key, base_url=_clean_base, **extra)
client = _wrap_if_needed(client, final_model, custom_base, custom_key)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
@@ -4165,7 +4219,7 @@ def resolve_provider_client(
_fb_headers = _apply_user_default_headers(_fb_extra.get("default_headers"))
if _fb_headers:
_fb_extra["default_headers"] = _fb_headers
client = OpenAI(api_key=custom_key, base_url=_fb_clean, **_fb_extra)
client = _create_openai_client(api_key=custom_key, base_url=_fb_clean, **_fb_extra)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
sync_anthropic = AnthropicAuxiliaryClient(
@@ -4174,7 +4228,7 @@ def resolve_provider_client(
if async_mode:
return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
return sync_anthropic, final_model
client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
client = _create_openai_client(api_key=custom_key, base_url=_clean_base2, **_extra2)
# codex_responses or inherited auto-detect (via _wrap_if_needed).
# _wrap_if_needed reads the closed-over `api_mode` (the task-level
# override). Named-provider entry api_mode=codex_responses also
@@ -4316,7 +4370,7 @@ def resolve_provider_client(
_merged_main = _apply_user_default_headers(headers)
if _merged_main:
headers = _merged_main
client = OpenAI(api_key=api_key, base_url=base_url,
client = _create_openai_client(api_key=api_key, base_url=base_url,
**({"default_headers": headers} if headers else {}))
# Copilot GPT-5+ models (except gpt-5-mini) require the Responses
@@ -4852,7 +4906,7 @@ def _refresh_nous_auxiliary_client(
return None, model
fresh_key, fresh_base_url = runtime
sync_client = OpenAI(api_key=fresh_key, base_url=fresh_base_url)
sync_client = _create_openai_client(api_key=fresh_key, base_url=fresh_base_url)
final_model = model
current_loop = None
@@ -5435,10 +5489,24 @@ def _build_call_kwargs(
# ``/anthropic`` endpoint reached through the OpenAI SDK wrapper), where
# max_tokens is a MANDATORY field — omitting it is a hard 400. Keep it only
# there.
#
# NVIDIA NIM (integrate.api.nvidia.com and local NIM endpoints) is a
# second exception: some models—notably minimaxai/minimax-m3—return HTTP
# 200 with an empty choices[] payload when max_tokens is omitted. The main
# NVIDIA chat path already sends an output cap via the provider profile;
# preserve it on the auxiliary path too.
_effective_base = base_url or (
_current_custom_base_url() if provider == "custom" else ""
)
if _is_anthropic_compat_endpoint(provider, _effective_base):
_provider_norm = str(provider or "").strip().lower()
_is_nvidia_nim = (
_provider_norm in {"nvidia", "nvidia-nim", "nim", "build-nvidia", "nemotron"}
or base_url_host_matches(_effective_base, "integrate.api.nvidia.com")
)
if (
_is_anthropic_compat_endpoint(provider, _effective_base)
or _is_nvidia_nim
):
kwargs["max_tokens"] = max_tokens
if tools:
@@ -5962,8 +6030,17 @@ def call_llm(
# When the provider returns a 429 rate-limit (not billing), fall
# back to an alternative provider instead of exhausting retries
# against the same rate-limited endpoint.
#
# ── Auth error fallback (#21165) ─────────────────────────────
# When the resolved provider returns 401 and neither the Nous
# refresh path nor explicit provider credential refresh applies,
# fall back to an alternative provider instead of dropping the
# auxiliary task on the floor (silent compression failure /
# message loss). Auth is NOT a capacity error: it only bypasses
# the explicit-provider gate when the user is in auto mode.
should_fallback = (
_is_payment_error(first_err)
_is_auth_error(first_err)
or _is_payment_error(first_err)
or _is_connection_error(first_err)
or _is_rate_limit_error(first_err)
or _is_model_incompatible_error(first_err)
@@ -5993,7 +6070,9 @@ def call_llm(
or _is_invalid_aux_response_error(first_err)
)
if should_fallback and (is_auto or is_capacity_error):
if _is_payment_error(first_err):
if _is_auth_error(first_err):
reason = "auth error"
elif _is_payment_error(first_err):
reason = "payment error"
# Resolve the actual provider label (resolved_provider may be
# "auto"; the client's base_url tells us which backend got the
@@ -6442,8 +6521,13 @@ async def async_call_llm(
raise
# ── Payment / connection / rate-limit fallback (mirrors sync call_llm) ──
# Auth error fallback (#21165): a 401 that survived the refresh path
# falls back in auto mode just like the sync call_llm() path. Auth is
# NOT a capacity error, so on an explicit provider it still respects
# the user's choice (handled by the is_auto/is_capacity_error gate).
should_fallback = (
_is_payment_error(first_err)
_is_auth_error(first_err)
or _is_payment_error(first_err)
or _is_connection_error(first_err)
or _is_rate_limit_error(first_err)
or _is_model_incompatible_error(first_err)
@@ -6465,7 +6549,9 @@ async def async_call_llm(
or _is_invalid_aux_response_error(first_err)
)
if should_fallback and (is_auto or is_capacity_error):
if _is_payment_error(first_err):
if _is_auth_error(first_err):
reason = "auth error"
elif _is_payment_error(first_err):
reason = "payment error"
_mark_provider_unhealthy(
_recoverable_pool_provider(resolved_provider, client) or resolved_provider

View File

@@ -28,6 +28,7 @@ from typing import Any, Dict, Optional
from hermes_cli.timeouts import get_provider_request_timeout, get_provider_stale_timeout
from hermes_constants import PARTIAL_STREAM_STUB_ID, FINISH_REASON_LENGTH
from agent.error_classifier import FailoverReason
from agent.gemini_native_adapter import is_native_gemini_base_url
from agent.model_metadata import is_local_endpoint
from agent.message_sanitization import (
_sanitize_surrogates,
@@ -37,6 +38,18 @@ from tools.terminal_tool import is_persistent_env
from utils import base_url_host_matches, base_url_hostname, env_float, env_int
logger = logging.getLogger(__name__)
_OPENROUTER_PROVIDER_SORT_VALUES = {"throughput", "latency", "price"}
# When the fallback chain is fully exhausted on a non-rate-limit failure
# (e.g. every provider returns a non-retryable client error like HTTP 400),
# arm a short cooldown so the NEXT turn's restore_primary_runtime stays gated
# and does not reset _fallback_index=0 to replay the entire chain again.
# Without this, a client/gateway that re-submits immediately would re-marshal
# the full (potentially 80k-token) context once per provider every turn and
# can drive a constrained host into memory/swap exhaustion. Rate-limit /
# billing reasons keep their own 60s cooldown (set above); this is the
# narrower non-rate-limit case. See issue #24996.
_FALLBACK_EXHAUSTED_COOLDOWN_S = 5.0
def _ra():
@@ -115,6 +128,23 @@ def _is_openai_codex_backend(agent) -> bool:
)
def _validated_openrouter_provider_sort(raw_sort: Any) -> Optional[str]:
"""Return a normalized OpenRouter provider.sort value or None."""
if not isinstance(raw_sort, str):
return None
sort_value = raw_sort.strip().lower()
if not sort_value:
return None
if sort_value in _OPENROUTER_PROVIDER_SORT_VALUES:
return sort_value
logger.warning(
"Ignoring invalid OpenRouter provider.sort value %r (allowed: %s)",
raw_sort,
", ".join(sorted(_OPENROUTER_PROVIDER_SORT_VALUES)),
)
return None
def _env_float(name: str, default: float) -> float:
try:
return float(os.getenv(name, str(default)))
@@ -229,6 +259,11 @@ def interruptible_api_call(agent, api_kwargs: dict):
invalidate_runtime_client(region)
raise
result["response"] = normalize_converse_response(raw_response)
elif agent.provider == "moa":
# MoA is a virtual chat-completions provider backed by the
# in-process MoAClient facade. Do not rebuild a request-local
# OpenAI client from the virtual runtime metadata.
result["response"] = agent.client.chat.completions.create(**api_kwargs)
else:
request_client = _set_request_client(
agent._create_request_openai_client(
@@ -698,8 +733,9 @@ def build_api_kwargs(agent, api_messages: list) -> dict:
_prefs["ignore"] = agent.providers_ignored
if agent.providers_order:
_prefs["order"] = agent.providers_order
if agent.provider_sort:
_prefs["sort"] = agent.provider_sort
_provider_sort = _validated_openrouter_provider_sort(agent.provider_sort)
if _provider_sort:
_prefs["sort"] = _provider_sort
if agent.provider_require_parameters:
_prefs["require_parameters"] = True
if agent.provider_data_collection:
@@ -1015,18 +1051,23 @@ def build_assistant_message(agent, assistant_message, finish_reason: str) -> dic
"arguments": tool_call.function.arguments
},
}
# Defence-in-depth: redact credentials from tool call arguments
# before they enter conversation history. Tool execution uses the
# raw API response object, not this dict, so redacting the
# persisted shape is safe and only affects storage. Catches the
# case where a model accidentally inlines a secret into a tool
# call (e.g. `terminal(command="curl -H 'Authorization: Bearer
# sk-...'")`). (#19798)
if isinstance(tc_dict["function"]["arguments"], str):
from agent.redact import redact_sensitive_text
tc_dict["function"]["arguments"] = redact_sensitive_text(
tc_dict["function"]["arguments"]
)
# Tool-call arguments are intentionally NOT redacted here. This
# dict enters the in-memory conversation history that is replayed
# to the model on every subsequent turn AND persisted to state.db,
# which is itself replayed verbatim on session resume
# (get_messages_as_conversation). Masking a credential to `***`
# here poisons that replay: the model reads back its own
# `PGPASSWORD='***' psql ...` call and copies the placeholder into
# the next tool call, breaking every credential-dependent command
# on the second turn (#43083). The masking also provided no real
# protection — the same secret still leaks verbatim through tool
# OUTPUT (file contents, command output, diffs, the compaction
# block), none of which this pass ever touched. Keeping secrets
# out of the replayable store is a separate tokenization/vault
# concern, not something arg-redaction can deliver without
# breaking replay. Storage-time redaction remains governed by the
# `security.redact_secrets` toggle. (#19798 introduced this;
# #43083 removed it.)
# Preserve extra_content (e.g. Gemini thought_signature) so it
# is sent back on subsequent API calls. Without this, Gemini 3
# thinking models reject the request with a 400 error.
@@ -1093,8 +1134,22 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
if (not fallback_already_active) or (primary_provider and current_provider == primary_provider):
agent._rate_limited_until = time.monotonic() + 60
if agent._fallback_index >= len(agent._fallback_chain):
# Chain exhausted. If we actually walked a non-empty chain and the
# failure was NOT a rate-limit/billing event (those already armed
# their own 60s cooldown above), arm a short cooldown so the next
# turn's restore_primary_runtime stays gated instead of resetting
# _fallback_index=0 and re-marshaling the whole context across every
# provider again. Guards the cross-turn replay storm in #24996.
if (
len(agent._fallback_chain) > 0
and reason not in {FailoverReason.rate_limit, FailoverReason.billing}
):
_existing_cooldown = getattr(agent, "_rate_limited_until", 0) or 0
agent._rate_limited_until = max(
_existing_cooldown,
time.monotonic() + _FALLBACK_EXHAUSTED_COOLDOWN_S,
)
return False
fb = agent._fallback_chain[agent._fallback_index]
agent._fallback_index += 1
fb_provider = (fb.get("provider") or "").strip().lower()
@@ -1210,14 +1265,16 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
agent._transport_cache.clear()
agent._fallback_activated = True
# Clear the credential pool when the fallback provider doesn't match
# the pool's provider. The pool was seeded for the primary provider;
# leaving it attached means downstream recovery (rate_limit / billing /
# auth) calls ``_swap_credential`` with a primary entry which overwrites
# the agent's ``base_url`` back to the primary's endpoint — every
# fallback request then 404s against the wrong host. See #33163.
# Rebind the credential pool to the fallback provider when the provider
# changes. Keeping the primary pool attached would make downstream
# recovery (rate_limit / billing / auth) mutate the wrong credential
# set and can overwrite the fallback's base_url back to the primary
# endpoint. See #33163.
#
# When the fallback shares the pool's provider (e.g. both openrouter
# entries with different routing) the pool is preserved.
# entries with different routing) the pool is preserved. When the
# providers differ, load the fallback provider's own pool if one exists
# so provider-specific rotation continues to work after the switch.
_existing_pool = getattr(agent, "_credential_pool", None)
if _existing_pool is not None:
_pool_provider = (getattr(_existing_pool, "provider", "") or "").strip().lower()
@@ -1228,6 +1285,22 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
fb_provider, fb_model, _pool_provider,
)
agent._credential_pool = None
if getattr(agent, "_credential_pool", None) is None:
try:
from agent.credential_pool import load_pool
fallback_pool = load_pool(fb_provider)
if fallback_pool and fallback_pool.has_credentials():
agent._credential_pool = fallback_pool
logger.info(
"Fallback to %s/%s: attached fallback credential pool",
fb_provider, fb_model,
)
except Exception as exc:
logger.debug(
"Fallback to %s/%s: could not attach credential pool: %s",
fb_provider, fb_model, exc,
)
# Honor per-provider / per-model request_timeout_seconds for the
# fallback target (same knob the primary client uses). None = use
@@ -1458,8 +1531,9 @@ def handle_max_iterations(agent, messages: list, api_call_count: int) -> str:
provider_preferences["ignore"] = agent.providers_ignored
if agent.providers_order:
provider_preferences["order"] = agent.providers_order
if agent.provider_sort:
provider_preferences["sort"] = agent.provider_sort
_provider_sort = _validated_openrouter_provider_sort(agent.provider_sort)
if _provider_sort:
provider_preferences["sort"] = _provider_sort
if provider_preferences and (
(agent.provider or "").strip().lower() == "openrouter"
or agent._is_openrouter_url()
@@ -1838,7 +1912,6 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
stream_kwargs = {
**api_kwargs,
"stream": True,
"stream_options": {"include_usage": True},
"timeout": _httpx.Timeout(
connect=_conn_cap,
read=_stream_read_timeout,
@@ -1846,6 +1919,14 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
pool=_conn_cap,
),
}
# OpenAI's `stream_options={"include_usage": True}` drives usage
# accounting on OpenAI-compatible endpoints (incl. the Gemini OpenAI
# compat shim and aggregators like OpenRouter). Google's *native*
# Gemini REST endpoint rejects the keyword outright
# (`Completions.create() got an unexpected keyword argument
# 'stream_options'`), so omit it only for that endpoint.
if not is_native_gemini_base_url(agent.base_url):
stream_kwargs["stream_options"] = {"include_usage": True}
request_client = _set_request_client(
agent._create_request_openai_client(
reason="chat_completion_stream_request",
@@ -2246,7 +2327,15 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
_fire_first_delta()
agent._fire_reasoning_delta(thinking_text)
# Return the native Anthropic Message for downstream processing
# Return the native Anthropic Message for downstream processing.
# If the stream was interrupted (the event loop broke out above on
# agent._interrupt_requested), do NOT call get_final_message() — on
# a partially-consumed stream the SDK may hang draining remaining
# events or return a Message with incomplete tool_use blocks (partial
# JSON in `input`). The outer poll loop raises InterruptedError, so
# this return value is discarded anyway.
if agent._interrupt_requested:
return None
return stream.get_final_message()
def _call():
@@ -2391,12 +2480,19 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
diag=request_client_holder.get("diag"),
)
_close_request_client_once("stream_mid_tool_retry_cleanup")
try:
agent._replace_primary_openai_client(
reason="stream_mid_tool_retry_pool_cleanup"
)
except Exception:
pass
if agent.api_mode == "anthropic_messages":
try:
agent._anthropic_client.close()
agent._rebuild_anthropic_client()
except Exception:
pass
else:
try:
agent._replace_primary_openai_client(
reason="stream_mid_tool_retry_pool_cleanup"
)
except Exception:
pass
continue
# SSE error events from proxies (e.g. OpenRouter sends
@@ -2444,12 +2540,19 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
_close_request_client_once("stream_retry_cleanup")
# Also rebuild the primary client to purge
# any dead connections from the pool.
try:
agent._replace_primary_openai_client(
reason="stream_retry_pool_cleanup"
)
except Exception:
pass
if agent.api_mode == "anthropic_messages":
try:
agent._anthropic_client.close()
agent._rebuild_anthropic_client()
except Exception:
pass
else:
try:
agent._replace_primary_openai_client(
reason="stream_retry_pool_cleanup"
)
except Exception:
pass
continue
# Retries exhausted. Log the final failure with
# full diagnostic detail (chain, headers,
@@ -2620,10 +2723,17 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
pass
# Rebuild the primary client too — its connection pool
# may hold dead sockets from the same provider outage.
try:
agent._replace_primary_openai_client(reason="stale_stream_pool_cleanup")
except Exception:
pass
if agent.api_mode == "anthropic_messages":
try:
agent._anthropic_client.close()
agent._rebuild_anthropic_client()
except Exception:
pass
else:
try:
agent._replace_primary_openai_client(reason="stale_stream_pool_cleanup")
except Exception:
pass
# Reset the timer so we don't kill repeatedly while
# the inner thread processes the closure.
last_chunk_time["t"] = time.time()
@@ -2699,7 +2809,30 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
role="assistant", content=_partial_text, tool_calls=None,
reasoning_content=None,
)
return SimpleNamespace(
# Detect provider output-layer content filtering (e.g. MiniMax
# "output new_sensitive (1027)", Azure/OpenAI content_filter,
# Anthropic safety refusal). The raw error is about to be
# swallowed into a finish_reason=length stub, so classify it HERE
# while we still have it and stamp the stub. Retrying such a
# content-deterministic filter on the same primary just re-hits
# the filter — the conversation loop reads this tag and activates
# the fallback chain instead of burning continuation retries.
# error_classifier is the single source of truth for "what counts
# as a content filter" (#32421).
_content_filter_terminated = False
try:
from agent.error_classifier import classify_api_error, FailoverReason
_cls = classify_api_error(
result["error"],
provider=str(getattr(agent, "provider", "") or ""),
model=str(getattr(agent, "model", "") or ""),
)
_content_filter_terminated = (
_cls.reason == FailoverReason.content_policy_blocked
)
except Exception:
_content_filter_terminated = False
_stub = SimpleNamespace(
id=PARTIAL_STREAM_STUB_ID,
model=getattr(agent, "model", "unknown"),
choices=[SimpleNamespace(
@@ -2708,6 +2841,9 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
usage=None,
_dropped_tool_names=_partial_names or None,
)
if _content_filter_terminated:
_stub._content_filter_terminated = True
return _stub
raise result["error"]
return result["response"]

View File

@@ -60,6 +60,8 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional
from hermes_cli._subprocess_compat import IS_WINDOWS, windows_hide_flags
logger = logging.getLogger("hermes.coding_context")
CODING_TOOLSET = "coding"
@@ -647,12 +649,14 @@ def _enabled_mcp_servers(config: Optional[dict[str, Any]]) -> list[str]:
def _git(cwd: Path, *args: str) -> str:
_popen_kwargs = {"creationflags": windows_hide_flags()} if IS_WINDOWS else {}
try:
out = subprocess.run(
["git", "-C", str(cwd), *args],
capture_output=True,
text=True,
timeout=_GIT_TIMEOUT,
**_popen_kwargs,
)
except (OSError, subprocess.SubprocessError):
return ""

156
agent/context_breakdown.py Normal file
View File

@@ -0,0 +1,156 @@
"""Live session context-window breakdown for UI surfaces.
Estimates how the next provider request is composed: system prompt tiers,
tool schemas, and conversation history. Uses the same rough char/4 heuristic
as ``agent.model_metadata.estimate_request_tokens_rough`` so numbers align
with compression thresholds — not exact tokenizer counts.
"""
from __future__ import annotations
import json
import re
from typing import Any, Dict, List, Optional, Sequence, Tuple
_SKILLS_BLOCK_RE = re.compile(r"<available_skills>.*?</available_skills>", re.DOTALL)
_SUBAGENT_TOOL_NAMES = frozenset({"delegate_task"})
_CATEGORY_COLORS = {
"system_prompt": "var(--context-usage-system)",
"tool_definitions": "var(--context-usage-tools)",
"rules": "var(--context-usage-rules)",
"skills": "var(--context-usage-skills)",
"mcp": "var(--context-usage-mcp)",
"subagent_definitions": "var(--context-usage-subagents)",
"memory": "var(--context-usage-memory)",
"conversation": "var(--context-usage-conversation)",
}
def _chars_to_tokens(text: str) -> int:
if not text:
return 0
return (len(text) + 3) // 4
def _json_tokens(value: Any) -> int:
if not value:
return 0
return _chars_to_tokens(json.dumps(value, ensure_ascii=False))
def _tool_name(tool: dict) -> str:
fn = tool.get("function") if isinstance(tool, dict) else None
if isinstance(fn, dict):
return str(fn.get("name") or "")
return str(tool.get("name") or "")
def _split_tools(tools: Sequence[dict]) -> Tuple[List[dict], List[dict], List[dict]]:
builtin: List[dict] = []
mcp: List[dict] = []
subagent: List[dict] = []
for tool in tools:
name = _tool_name(tool)
if name.startswith("mcp_"):
mcp.append(tool)
elif name in _SUBAGENT_TOOL_NAMES:
subagent.append(tool)
else:
builtin.append(tool)
return builtin, mcp, subagent
def _memory_blocks(agent: Any) -> Tuple[str, str]:
memory_block = ""
user_block = ""
store = getattr(agent, "_memory_store", None)
if store is None:
return memory_block, user_block
try:
if getattr(agent, "_memory_enabled", True):
memory_block = store.format_for_system_prompt("memory") or ""
if getattr(agent, "_user_profile_enabled", True):
user_block = store.format_for_system_prompt("user") or ""
except Exception:
pass
return memory_block, user_block
def _strip_blocks(text: str, *blocks: str) -> str:
out = text
for block in blocks:
if block:
out = out.replace(block, "")
return out.strip()
def compute_session_context_breakdown(
agent: Any,
messages: Optional[List[dict]] = None,
) -> Dict[str, Any]:
"""Return a Cursor-style context usage breakdown for one live agent."""
from agent.model_metadata import estimate_messages_tokens_rough
from agent.system_prompt import build_system_prompt_parts
parts = build_system_prompt_parts(agent)
stable = parts.get("stable", "") or ""
context = parts.get("context", "") or ""
volatile = parts.get("volatile", "") or ""
skills_match = _SKILLS_BLOCK_RE.search(stable)
skills_index = skills_match.group(0) if skills_match else ""
memory_block, user_block = _memory_blocks(agent)
memory_text = "\n\n".join(part for part in (memory_block, user_block) if part).strip()
system_core = _strip_blocks(stable, skills_index)
system_tail = _strip_blocks(volatile, memory_block, user_block)
system_prompt_text = "\n\n".join(part for part in (system_core, system_tail) if part).strip()
tools = list(getattr(agent, "tools", None) or [])
builtin_tools, mcp_tools, subagent_tools = _split_tools(tools)
conversation_tokens = estimate_messages_tokens_rough(messages or [])
categories = [
("system_prompt", "System prompt", _chars_to_tokens(system_prompt_text)),
("tool_definitions", "Tool definitions", _json_tokens(builtin_tools)),
("rules", "Rules", _chars_to_tokens(context)),
("skills", "Skills", _chars_to_tokens(skills_index)),
("mcp", "MCP", _json_tokens(mcp_tools)),
("subagent_definitions", "Subagent definitions", _json_tokens(subagent_tools)),
("memory", "Memory", _chars_to_tokens(memory_text)),
("conversation", "Conversation", conversation_tokens),
]
estimated_total = sum(tokens for _, _, tokens in categories)
comp = getattr(agent, "context_compressor", None)
context_max = int(getattr(comp, "context_length", 0) or 0) if comp else 0
measured_used = int(getattr(comp, "last_prompt_tokens", 0) or 0) if comp else 0
context_used = measured_used if measured_used > 0 else estimated_total
context_percent = (
max(0, min(100, round(context_used / context_max * 100)))
if context_max
else 0
)
return {
"categories": [
{
"color": _CATEGORY_COLORS.get(category_id, "var(--ui-text-tertiary)"),
"id": category_id,
"label": label,
"tokens": tokens,
}
for category_id, label, tokens in categories
if tokens > 0
],
"context_max": context_max,
"context_percent": context_percent,
"context_used": context_used,
"estimated_total": estimated_total,
"model": getattr(agent, "model", "") or "",
}

View File

@@ -12,6 +12,7 @@ from pathlib import Path
from typing import Awaitable, Callable
from agent.model_metadata import estimate_tokens_rough
from hermes_cli._subprocess_compat import IS_WINDOWS, windows_hide_flags
_QUOTED_REFERENCE_VALUE = r'(?:`[^`\n]+`|"[^"\n]+"|\'[^\'\n]+\')'
REFERENCE_PATTERN = re.compile(
@@ -290,6 +291,7 @@ def _expand_git_reference(
args: list[str],
label: str,
) -> tuple[str | None, str | None]:
_popen_kwargs = {"creationflags": windows_hide_flags()} if IS_WINDOWS else {}
try:
result = subprocess.run(
["git", *args],
@@ -298,6 +300,7 @@ def _expand_git_reference(
text=True,
timeout=30,
stdin=subprocess.DEVNULL,
**_popen_kwargs,
)
except subprocess.TimeoutExpired:
return f"{ref.raw}: git command timed out (30s)", None
@@ -483,6 +486,7 @@ def _iter_visible_entries(path: Path, cwd: Path, limit: int) -> list[Path]:
def _rg_files(path: Path, cwd: Path, limit: int) -> list[Path] | None:
_popen_kwargs = {"creationflags": windows_hide_flags()} if IS_WINDOWS else {}
try:
result = subprocess.run(
["rg", "--files", str(path.relative_to(cwd))],
@@ -491,6 +495,7 @@ def _rg_files(path: Path, cwd: Path, limit: int) -> list[Path] | None:
text=True,
timeout=10,
stdin=subprocess.DEVNULL,
**_popen_kwargs,
)
except (FileNotFoundError, OSError, subprocess.TimeoutExpired):
return None

View File

@@ -288,6 +288,29 @@ def replay_compression_warning(agent: Any) -> None:
pass
def conversation_history_after_compression(agent: Any, messages: list) -> Optional[list]:
"""Return the correct flush baseline after a compression boundary.
Legacy compression rotates to a fresh child session. That child has not
seen the compacted transcript through the normal same-turn flush path yet,
so callers must clear ``conversation_history`` to ``None`` and let the next
persistence call write the whole compacted list.
In-place compaction is different: ``archive_and_compact()`` has already
soft-archived the previous active rows and inserted ``messages`` as the new
active live transcript under the same session id. If the same agent turn
continues with ``conversation_history=None``, the identity-based flush path
treats those already-persisted compacted dicts as new and appends them a
second time, doubling the active context and retriggering compression.
A shallow copy is intentional: it captures the current compacted dict
identities as history while allowing later same-turn appends to remain new.
"""
if bool(getattr(agent, "_last_compaction_in_place", False)):
return list(messages)
return None
def compress_context(
agent: Any,
messages: list,

View File

@@ -28,6 +28,7 @@ import uuid
from typing import Any, Dict, List, Optional
from agent.codex_responses_adapter import _summarize_user_message_for_log
from agent.conversation_compression import conversation_history_after_compression
from agent.display import KawaiiSpinner
from agent.error_classifier import FailoverReason, classify_api_error
from agent.iteration_budget import IterationBudget
@@ -587,6 +588,13 @@ def run_conversation(
compression_attempts = 0
_turn_exit_reason = "unknown" # Diagnostic: why the loop ended
# Per-turn tally of consecutive successful credential-pool token refreshes,
# keyed by (provider, pool-entry-id). A persistent upstream 401 lets
# ``try_refresh_current()`` "succeed" forever on a single-entry OAuth pool,
# so this tally caps same-entry refreshes and lets the fallback chain take
# over instead of spinning. Reset here so each turn starts fresh. See #26080.
agent._auth_pool_refresh_counts = {}
# Optional opt-in runtime: if api_mode == codex_app_server, hand the
# turn to the codex app-server subprocess (terminal/file ops/patching
# all run inside Codex). Default Hermes path is bypassed entirely.
@@ -827,7 +835,6 @@ def run_conversation(
aggregator=moa_config.get("aggregator") or {},
temperature=float(moa_config.get("reference_temperature", 0.6) or 0.6),
aggregator_temperature=float(moa_config.get("aggregator_temperature", 0.4) or 0.4),
max_tokens=int(moa_config.get("max_tokens", 4096) or 4096),
)
if _moa_context:
for _msg in reversed(api_messages):
@@ -1692,6 +1699,56 @@ def run_conversation(
if agent.api_mode in {"chat_completions", "bedrock_converse", "anthropic_messages"}:
assistant_message = _trunc_msg
# ── Content-filter stream stall → fallback (#32421) ──
# When the provider's output-layer safety filter (e.g.
# MiniMax "output new_sensitive (1027)", Azure
# content_filter) kills the stream mid-delivery, the
# raw error was classified at the swallow point and the
# stub tagged ``_content_filter_terminated``. This
# filter is content-deterministic — continuation
# retries against the SAME primary just re-hit it and
# burn paid attempts (the loop used to give up with
# "Response remained truncated after 3 continuation
# attempts" and never consult the fallback chain).
# Escalate to the configured fallback BEFORE retrying.
_cf_terminated = getattr(
response, "_content_filter_terminated", False
)
if (
_cf_terminated
and agent._fallback_index < len(agent._fallback_chain)
):
agent._vprint(
f"{agent.log_prefix}🛡️ Content filter terminated "
f"stream — activating fallback provider...",
force=True,
)
agent._emit_status(
"Content filter terminated stream; switching to fallback..."
)
if agent._try_activate_fallback():
# Roll the partial content (if any was already
# appended in a prior continuation pass) back to
# the last clean turn so the fallback provider
# gets a coherent continuation point.
if truncated_response_parts:
messages = agent._get_messages_up_to_last_assistant(messages)
agent._session_messages = messages
length_continue_retries = 0
truncated_response_parts = []
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
_retry.restart_with_rebuilt_messages = True
break
# No fallback available — fall through to normal
# continuation (best-effort, may loop).
agent._vprint(
f"{agent.log_prefix}⚠️ No fallback provider "
f"configured — retrying with same provider "
f"(may re-hit filter)...",
force=True,
)
if assistant_message is not None and not _trunc_has_tool_calls:
length_continue_retries += 1
interim_msg = agent._build_assistant_message(assistant_message, finish_reason)
@@ -2259,6 +2316,15 @@ def run_conversation(
# "unknown variant `image_url`, expected `text`".
"unknown variant `image_url`, expected `text`",
"unknown variant image_url, expected text",
# OpenRouter routes a request to upstream endpoints and,
# when none of the candidate endpoints for the model accept
# image input, returns HTTP 404 "No endpoints found that
# support image input". Without this phrase the agent never
# strips the images, the retry loop re-sends the same
# rejected request until exhaustion, and the gateway leaves
# every subsequent message queued behind the stuck turn —
# the P1 in issue #21160. The 404 passes the 4xx gate below.
"no endpoints found that support image input",
)
_err_lower = _err_body.lower()
_looks_like_image_rejection = any(
@@ -2830,10 +2896,9 @@ def run_conversation(
approx_tokens=approx_tokens,
task_id=effective_task_id,
)
# Compression created a new session — clear history
# so _flush_messages_to_session_db writes compressed
# messages to the new session, not skipping them.
conversation_history = None
conversation_history = conversation_history_after_compression(
agent, messages
)
if len(messages) < original_len or old_ctx > _reduced_ctx:
agent._buffer_status(
f"🗜️ Context reduced to {_reduced_ctx:,} tokens "
@@ -2845,15 +2910,25 @@ def run_conversation(
# Fall through to normal error handling if compression
# is exhausted or didn't help.
# Eager fallback for rate-limit errors (429 or quota exhaustion).
# When a fallback model is configured, switch immediately instead
# of burning through retries with exponential backoff -- the
# primary provider won't recover within the retry window.
# Eager fallback for rate-limit errors (429 or quota exhaustion)
# and transport errors (connection failure / timeout / provider
# overloaded). Rate limits and billing: switch immediately —
# the primary provider won't recover within the retry window.
# Transport errors: allow 1 retry first (transient hiccups
# recover), then fall back if the provider is truly unreachable.
is_rate_limited = classified.reason in {
FailoverReason.rate_limit,
FailoverReason.billing,
}
if is_rate_limited and agent._fallback_index < len(agent._fallback_chain):
_is_transport_failure = classified.reason in {
FailoverReason.timeout,
FailoverReason.overloaded,
}
_should_fallback = (
is_rate_limited
or (_is_transport_failure and retry_count >= 2)
)
if _should_fallback and agent._fallback_index < len(agent._fallback_chain):
# Don't eagerly fallback if credential pool rotation may
# still recover. See _pool_may_recover_from_rate_limit
# for the single-credential-pool and CloudCode-quota
@@ -2868,6 +2943,10 @@ def run_conversation(
agent._buffer_status(
"⚠️ Billing or credits exhausted — switching to fallback provider..."
)
elif _is_transport_failure:
agent._buffer_status(
"⚠️ Provider unreachable — switching to fallback provider..."
)
else:
agent._buffer_status("⚠️ Rate limited — switching to fallback provider...")
if agent._try_activate_fallback(reason=classified.reason):
@@ -3042,10 +3121,9 @@ def run_conversation(
messages, system_message, approx_tokens=approx_tokens,
task_id=effective_task_id,
)
# Compression created a new session — clear history
# so _flush_messages_to_session_db writes compressed
# messages to the new session, not skipping them.
conversation_history = None
conversation_history = conversation_history_after_compression(
agent, messages
)
# Re-estimate tokens after compression. Same-message-count
# compression (tool-result pruning, in-place summarization)
@@ -3209,10 +3287,9 @@ def run_conversation(
messages, system_message, approx_tokens=approx_tokens,
task_id=effective_task_id,
)
# Compression created a new session — clear history
# so _flush_messages_to_session_db writes compressed
# messages to the new session, not skipping them.
conversation_history = None
conversation_history = conversation_history_after_compression(
agent, messages
)
# Re-estimate tokens after compression. Same-message-count
# compression (tool-result pruning, in-place summarization)
@@ -3474,6 +3551,13 @@ def run_conversation(
):
_retry.primary_recovery_attempted = True
retry_count = 0
# Primary transport recovery starts a fresh attempt
# cycle. Re-open fallback state so a follow-on 429 can
# still activate fallback_providers after stale
# pre-recovery fallback/credential-pool bookkeeping.
_retry.has_retried_429 = False
agent._fallback_index = 0
agent._fallback_activated = False
continue
# Try fallback before giving up entirely
if agent._has_pending_fallback():
@@ -3661,7 +3745,12 @@ def run_conversation(
_ra_raw = _resp_headers.get("retry-after") or _resp_headers.get("Retry-After")
if _ra_raw:
try:
_retry_after = min(float(_ra_raw), 120) # Cap at 2 minutes
# Cap at 10 minutes. Anthropic Tier 1 input-token
# buckets reset in ~171s, so a 120s cap caused us to
# retry before the actual reset window and re-trip the
# limit. 600s covers all realistic provider reset
# windows while still rejecting pathological values. (#26293)
_retry_after = min(float(_ra_raw), 600)
except (TypeError, ValueError):
pass
wait_time = _retry_after if _retry_after else jittered_backoff(retry_count, base_delay=2.0, max_delay=60.0)
@@ -3742,6 +3831,17 @@ def run_conversation(
_retry.restart_with_compressed_messages = False
continue
if _retry.restart_with_rebuilt_messages:
# A content-filter stream stall (#32421) was escalated to the
# fallback chain and the partial content rolled back. Re-issue
# the API call against the now-active fallback provider. Refund
# the budget/count for the stalled attempt so the fallback gets a
# fair turn.
api_call_count -= 1
agent.iteration_budget.refund()
_retry.restart_with_rebuilt_messages = False
continue
if _retry.restart_with_length_continuation:
# Progressively boost the output token budget on each retry.
# Retry 1 → 2× base, retry 2 → 3× base, capped at 32 768.
@@ -4316,10 +4416,9 @@ def run_conversation(
approx_tokens=agent.context_compressor.last_prompt_tokens,
task_id=effective_task_id,
)
# Compression created a new session — clear history so
# _flush_messages_to_session_db writes compressed messages
# to the new session (see preflight compression comment).
conversation_history = None
conversation_history = conversation_history_after_compression(
agent, messages
)
# Save session log incrementally (so progress is visible even if interrupted)
agent._session_messages = messages
@@ -4361,7 +4460,11 @@ def run_conversation(
"as final response"
)
final_response = _recovered
agent._response_was_previewed = True
# Streaming delivered a fragment, not a confirmed
# final preview. Leave response_previewed false so
# gateway fallback delivery can send the recovered
# text plus the abnormal-turn explanation.
agent._response_was_previewed = False
break
# If the previous turn already delivered real content alongside
@@ -4606,14 +4709,20 @@ def run_conversation(
# status from earlier failed attempts in this turn.
agent._clear_status_buffer()
from agent.agent_runtime_helpers import (
intent_ack_continuation_mode,
)
_ack_mode = intent_ack_continuation_mode(agent)
if (
agent.api_mode == "codex_responses"
_ack_mode != "off"
and agent.valid_tool_names
and codex_ack_continuations < 2
and agent._looks_like_codex_intermediate_ack(
user_message=user_message,
assistant_content=final_response,
messages=messages,
require_workspace=(_ack_mode == "codex_only"),
)
):
codex_ack_continuations += 1

View File

@@ -21,8 +21,14 @@ from pathlib import Path
from types import SimpleNamespace
from typing import Any
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from agent.file_safety import get_read_block_error, is_write_denied
from agent.redact import redact_sensitive_text
from tools.environments.local import hermes_subprocess_env
ACP_MARKER_BASE_URL = "acp://copilot"
_DEFAULT_TIMEOUT_SECONDS = 900.0
@@ -94,7 +100,10 @@ def _resolve_home_dir() -> str:
def _build_subprocess_env() -> dict[str, str]:
env = os.environ.copy()
# Copilot ACP is a model-driving CLI executor: it legitimately needs LLM
# provider credentials. Route through the central helper so Tier-1 secrets
# (gateway bot tokens, GitHub auth, infra) are still stripped (#29157).
env = hermes_subprocess_env(inherit_credentials=True)
home = _resolve_home_dir()
env["HOME"] = home
from hermes_constants import apply_subprocess_home_env
@@ -224,11 +233,73 @@ def _render_message_content(content: Any) -> str:
return str(content).strip()
def _extract_tool_calls_from_text(text: str) -> tuple[list[SimpleNamespace], str]:
def _build_openai_tool_call(
*,
call_id: str,
name: str,
arguments: str,
) -> ChatCompletionMessageToolCall:
"""Build an OpenAI-compatible tool-call object for downstream handling."""
return ChatCompletionMessageToolCall(
id=call_id,
call_id=call_id,
response_item_id=None,
type="function",
function=Function(name=name, arguments=arguments),
)
def _completion_to_stream_chunks(completion: SimpleNamespace) -> list[SimpleNamespace]:
"""Convert a one-shot ACP response into OpenAI-style stream chunks."""
choice = completion.choices[0]
message = choice.message
tool_call_deltas = None
if message.tool_calls:
tool_call_deltas = []
for index, tool_call in enumerate(message.tool_calls):
tool_call_deltas.append(
SimpleNamespace(
index=index,
id=getattr(tool_call, "id", None),
type=getattr(tool_call, "type", "function"),
function=SimpleNamespace(
name=getattr(tool_call.function, "name", None),
arguments=getattr(tool_call.function, "arguments", None),
),
)
)
delta = SimpleNamespace(
role="assistant",
content=message.content or None,
tool_calls=tool_call_deltas,
reasoning_content=message.reasoning_content,
reasoning=message.reasoning,
)
data_chunk = SimpleNamespace(
choices=[
SimpleNamespace(
index=0,
delta=delta,
finish_reason=choice.finish_reason,
)
],
model=completion.model,
usage=None,
)
usage_chunk = SimpleNamespace(
choices=[],
model=completion.model,
usage=completion.usage,
)
return [data_chunk, usage_chunk]
def _extract_tool_calls_from_text(text: str) -> tuple[list[ChatCompletionMessageToolCall], str]:
if not isinstance(text, str) or not text.strip():
return [], ""
extracted: list[SimpleNamespace] = []
extracted: list[ChatCompletionMessageToolCall] = []
consumed_spans: list[tuple[int, int]] = []
def _try_add_tool_call(raw_json: str) -> None:
@@ -252,12 +323,10 @@ def _extract_tool_calls_from_text(text: str) -> tuple[list[SimpleNamespace], str
call_id = f"acp_call_{len(extracted)+1}"
extracted.append(
SimpleNamespace(
id=call_id,
_build_openai_tool_call(
call_id=call_id,
response_item_id=None,
type="function",
function=SimpleNamespace(name=fn_name.strip(), arguments=fn_args),
name=fn_name.strip(),
arguments=fn_args,
)
)
@@ -376,6 +445,7 @@ class CopilotACPClient:
timeout: float | None = None,
tools: list[dict[str, Any]] | None = None,
tool_choice: Any = None,
stream: bool = False,
**_: Any,
) -> Any:
prompt_text = _format_messages_as_prompt(
@@ -422,11 +492,14 @@ class CopilotACPClient:
)
finish_reason = "tool_calls" if tool_calls else "stop"
choice = SimpleNamespace(message=assistant_message, finish_reason=finish_reason)
return SimpleNamespace(
completion = SimpleNamespace(
choices=[choice],
usage=usage,
model=model or "copilot-acp",
)
if stream:
return _completion_to_stream_chunks(completion)
return completion
def _run_prompt(self, prompt_text: str, *, timeout_seconds: float) -> tuple[str, str]:
try:

View File

@@ -537,10 +537,11 @@ class CredentialPool:
self._entries[idx] = new
return
def _persist(self) -> None:
def _persist(self, *, removed_ids: Optional[List[str]] = None) -> None:
write_credential_pool(
self.provider,
[entry.to_dict() for entry in self._entries],
removed_ids=removed_ids,
)
def _is_terminal_auth_failure(
@@ -1124,13 +1125,17 @@ class CredentialPool:
logger.debug(
"Failed to clear terminal xAI OAuth state: %s", clear_exc
)
removed_ids = [
item.id for item in self._entries
if item.source == "loopback_pkce"
]
self._entries = [
item for item in self._entries
if item.source != "loopback_pkce"
]
if self._current_id == entry.id:
self._current_id = None
self._persist()
self._persist(removed_ids=removed_ids)
return None
# For openai-codex: same race as xAI/nous — another Hermes process
# may have consumed the refresh token between our proactive sync
@@ -1190,13 +1195,17 @@ class CredentialPool:
logger.debug(
"Failed to clear terminal Codex OAuth state: %s", clear_exc
)
removed_ids = [
item.id for item in self._entries
if item.source == "device_code"
]
self._entries = [
item for item in self._entries
if item.source != "device_code"
]
if self._current_id == entry.id:
self._current_id = None
self._persist()
self._persist(removed_ids=removed_ids)
return None
# For nous: another process may have consumed the refresh token
# between our proactive sync and the HTTP call. Re-sync from
@@ -1253,13 +1262,17 @@ class CredentialPool:
auth_mod.NOUS_DEVICE_CODE_SOURCE,
f"manual:{auth_mod.NOUS_DEVICE_CODE_SOURCE}",
}
removed_ids = [
item.id for item in self._entries
if item.source in singleton_sources
]
self._entries = [
item for item in self._entries
if item.source not in singleton_sources
]
if self._current_id == entry.id:
self._current_id = None
self._persist()
self._persist(removed_ids=removed_ids)
return None
self._mark_exhausted(entry, None)
return None
@@ -1421,7 +1434,7 @@ class CredentialPool:
pruned_ids = set(entries_to_prune)
self._entries = [e for e in self._entries if e.id not in pruned_ids]
if cleared_any:
self._persist()
self._persist(removed_ids=entries_to_prune)
return available
def _select_unlocked(self) -> Optional[PooledCredential]:
@@ -1595,7 +1608,11 @@ class CredentialPool:
replace(entry, priority=new_priority)
for new_priority, entry in enumerate(self._entries)
]
self._persist()
write_credential_pool(
self.provider,
[entry.to_dict() for entry in self._entries],
removed_ids=[removed.id],
)
if self._current_id == removed.id:
self._current_id = None
return removed
@@ -2257,6 +2274,11 @@ def _seed_custom_pool(pool_key: str, entries: List[PooledCredential]) -> Tuple[b
def load_pool(provider: str) -> CredentialPool:
provider = (provider or "").strip().lower()
raw_entries = read_credential_pool(provider)
disk_ids = {
entry.get("id")
for entry in raw_entries
if isinstance(entry, dict) and entry.get("id")
}
raw_needs_sanitization = any(
isinstance(payload, dict)
and sanitize_borrowed_credential_payload(payload, provider) != payload
@@ -2285,8 +2307,10 @@ def load_pool(provider: str) -> CredentialPool:
changed |= _normalize_pool_priorities(provider, entries)
if changed:
new_ids = {entry.id for entry in entries}
write_credential_pool(
provider,
[entry.to_dict() for entry in sorted(entries, key=lambda item: item.priority)],
removed_ids=disk_ids - new_ids,
)
return CredentialPool(provider, entries)

View File

@@ -273,6 +273,21 @@ def should_run_now(now: Optional[datetime] = None) -> bool:
# Automatic state transitions (pure function, no LLM)
# ---------------------------------------------------------------------------
def _cron_referenced_skills() -> Set[str]:
"""Skill names referenced by any cron job (incl. paused/disabled).
Best-effort: a cron-module import error or corrupt jobs store must never
break the curator, so any failure yields an empty set (no protection,
but no crash).
"""
try:
from cron.jobs import referenced_skill_names as _refs
return _refs()
except Exception as e:
logger.debug("Curator could not read cron skill references: %s", e, exc_info=True)
return set()
def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int]:
"""Walk every curator-managed skill and move active/stale/archived based on
the latest real activity timestamp. Pinned skills are never touched.
@@ -292,6 +307,8 @@ def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int
stale_cutoff = now - timedelta(days=get_stale_after_days())
archive_cutoff = now - timedelta(days=get_archive_after_days())
cron_referenced = _cron_referenced_skills()
counts = {"marked_stale": 0, "archived": 0, "reactivated": 0, "checked": 0, "seeded": 0}
for row in _u.agent_created_report():
@@ -300,6 +317,15 @@ def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int
if row.get("pinned"):
continue
# A skill referenced by any cron job (incl. paused/disabled) is in
# use by definition — resuming or the next fire must find it. The
# scheduler only bumps usage when a job actually fires, so jobs that
# fire less often than archive_after_days, paused jobs, and far-future
# one-shots would otherwise have their skills aged out from under
# them. Treat referenced skills like pinned: never auto-transition.
if name in cron_referenced:
continue
# First sight of a curation-eligible skill with no persisted record
# (e.g. a newly-eligible built-in): anchor its clock to now and defer.
if not row.get("_persisted", True):
@@ -316,6 +342,18 @@ def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int
current = row.get("state", _u.STATE_ACTIVE)
# Never-used skills (use_count == 0) get a grace floor: don't archive
# one until it is at least stale_after_days old. A use=0 skill is
# absence of evidence, not evidence of staleness — a skill created
# recently may simply not have had its trigger come up yet.
never_used = int(row.get("use_count", 0) or 0) == 0
if never_used and anchor > stale_cutoff:
# Younger than the stale window — leave it alone entirely.
if current == _u.STATE_STALE:
_u.set_state(name, _u.STATE_ACTIVE)
counts["reactivated"] += 1
continue
if anchor <= archive_cutoff and current != _u.STATE_ARCHIVED:
ok, _msg = _u.archive_skill(name)
if ok:
@@ -390,10 +428,19 @@ CURATOR_REVIEW_PROMPT = (
"back load-bearing UX (slash-command entry points referenced in docs and "
"tips) and are filtered out of the candidate list below — never resurrect "
"one as an archive or absorb target.\n"
"3c. DO NOT archive or prune any skill marked `cron=yes` in the candidate "
"list. A cron job depends on it and will fail to load it on its next "
"run. You MAY still consolidate it into an umbrella — but only because "
"the curator rewrites cron job skill references to follow consolidations; "
"never simply prune it.\n"
"4. DO NOT use usage counters as a reason to skip consolidation. The "
"counters are new and often mostly zero. Judge overlap on CONTENT, "
"not on use_count. 'use=0' is not evidence a skill is valuable; it's "
"absence of evidence either way.\n"
"absence of evidence either way. Corollary: 'use=0' is ALSO not a "
"reason to PRUNE a skill. Never archive a never-used skill (use=0) "
"unless it is at least 30 days old (check last_activity / created date) "
"AND its content is genuinely obsolete or fully absorbed elsewhere — a "
"recently-created skill simply may not have had its trigger come up yet.\n"
"5. DO NOT reject consolidation on the grounds that 'each skill has "
"a distinct trigger'. Pairwise distinctness is the wrong bar. The "
"right bar is: 'would a human maintainer write this as N separate "
@@ -1413,12 +1460,14 @@ def _render_candidate_list() -> str:
rows = skill_usage.agent_created_report()
if not rows:
return "No agent-created skills to review."
cron_referenced = _cron_referenced_skills()
lines = [f"Agent-created skills ({len(rows)}):\n"]
for r in rows:
lines.append(
f"- {r['name']} "
f"state={r['state']} "
f"pinned={'yes' if r.get('pinned') else 'no'} "
f"cron={'yes' if r['name'] in cron_referenced else 'no'} "
f"activity={r.get('activity_count', 0)} "
f"use={r.get('use_count', 0)} "
f"view={r.get('view_count', 0)} "

View File

@@ -133,6 +133,31 @@ _RATE_LIMIT_PATTERNS = [
"servicequotaexceededexception",
]
# Patterns that indicate provider-side overload, NOT a per-credential rate
# limit or billing problem. The credential is valid — the server is just
# busy — so the correct recovery is "back off and retry the same key", never
# "rotate the credential" (rotating exhausts the pool while the endpoint is
# still busy; a single-key user has nothing to rotate to). Some providers
# (notably Z.AI / Zhipu) reuse HTTP 429 for server-wide overload, so the 429
# status path matches the body against this list before falling through to
# the rate_limit default. Phrases are kept narrow and overload-flavoured so a
# normal rate-limit message ("you have been rate-limited") doesn't hit this
# bucket. (#14038, #15297)
_OVERLOADED_PATTERNS = [
"overloaded",
"temporarily overloaded",
"service is temporarily overloaded",
"service may be temporarily overloaded",
"server is overloaded",
"server overloaded",
"service overloaded",
"service is overloaded",
"upstream overloaded",
"currently overloaded",
"at capacity",
"over capacity",
]
# Usage-limit patterns that need disambiguation (could be billing OR rate_limit)
_USAGE_LIMIT_PATTERNS = [
"usage limit",
@@ -330,6 +355,14 @@ _CONTENT_POLICY_BLOCKED_PATTERNS = [
# echo back; the underscore form is provider-specific enough.
"content_filter",
"responsibleaipolicyviolation",
# MiniMax output-layer safety filter. The error string is surfaced
# verbatim by MiniMax SDK / OpenAI-compatible endpoints, usually in the
# form "output new_sensitive (1027)" when the model's *output* (often a
# large tool-call argument block) trips the upstream safety filter and
# the SSE stream is truncated mid-flight. ``new_sensitive`` is the
# filter name and is narrow enough that billing / format / auth error
# strings will not collide. See #32421.
"new_sensitive",
]
# Auth patterns (non-status-code signals)
@@ -863,7 +896,19 @@ def _classify_by_status(
)
if status_code == 429:
# Already checked long_context_tier above; this is a normal rate limit
# Already checked long_context_tier above. Some providers (notably
# Z.AI / Zhipu) reuse HTTP 429 for server-wide overload — same status
# code as a true per-credential rate limit, but the credential is
# valid and the correct recovery is "back off and retry the same key",
# NOT "rotate the credential" (which exhausts the pool while the
# endpoint is still busy, and does nothing for a single-key user).
# Disambiguate on the error body so an overload 429 takes the
# transient-overload path instead of burning the pool. (#14038)
if any(p in error_msg for p in _OVERLOADED_PATTERNS):
return result_fn(
FailoverReason.overloaded,
retryable=True,
)
return result_fn(
FailoverReason.rate_limit,
retryable=True,
@@ -1214,6 +1259,17 @@ def _classify_by_message(
should_fallback=True,
)
# Overloaded / server-busy patterns — must come BEFORE the rate_limit and
# billing checks so that a message-only "overloaded" (no 503/529 status,
# e.g. some Anthropic-compatible proxies) classifies as a transient
# overload (backoff + retry) instead of falling through to `unknown` or
# incorrectly triggering credential rotation.
if any(p in error_msg for p in _OVERLOADED_PATTERNS):
return result_fn(
FailoverReason.overloaded,
retryable=True,
)
# Billing patterns
if any(p in error_msg for p in _BILLING_PATTERNS):
return result_fn(
@@ -1303,19 +1359,25 @@ def _extract_status_code(error: Exception) -> Optional[int]:
def _extract_error_body(error: Exception) -> dict:
"""Extract the structured error body from an SDK exception."""
body = getattr(error, "body", None)
if isinstance(body, dict):
return body
# Some errors have .response.json()
response = getattr(error, "response", None)
if response is not None:
try:
json_body = response.json()
if isinstance(json_body, dict):
return json_body
except Exception:
pass
"""Extract the structured error body from an SDK exception or its cause chain."""
current = error
for _ in range(5): # Match _extract_status_code() traversal depth.
body = getattr(current, "body", None)
if isinstance(body, dict):
return body
# Some errors have .response.json()
response = getattr(current, "response", None)
if response is not None:
try:
json_body = response.json()
if isinstance(json_body, dict):
return json_body
except Exception:
pass
cause = getattr(current, "__cause__", None) or getattr(current, "__context__", None)
if cause is None or cause is current:
break
current = cause
return {}

View File

@@ -251,6 +251,78 @@ def _supports_vision_override(
return None
def _resolve_inference_base_url(
cfg: Optional[Dict[str, Any]],
provider: str,
) -> str:
"""Best-effort base URL for the active inference provider."""
try:
from agent.auxiliary_client import _RUNTIME_MAIN_BASE_URL
runtime = str(_RUNTIME_MAIN_BASE_URL or "").strip()
if runtime:
return runtime
except Exception:
pass
if not isinstance(cfg, dict):
return ""
model_cfg_raw = cfg.get("model")
model_cfg: Dict[str, Any] = model_cfg_raw if isinstance(model_cfg_raw, dict) else {}
base_url = str(model_cfg.get("base_url") or "").strip()
if base_url:
return base_url
config_provider = str(model_cfg.get("provider") or "").strip()
candidate_names: set[str] = set()
for p in filter(None, (provider, config_provider)):
candidate_names.add(p)
if p.lower().startswith("custom:"):
candidate_names.add(p.split(":", 1)[1])
else:
candidate_names.add(f"custom:{p}")
providers_cfg = cfg.get("providers")
if isinstance(providers_cfg, dict):
for name in candidate_names:
entry = providers_cfg.get(name)
if isinstance(entry, dict):
bu = str(entry.get("base_url") or "").strip()
if bu:
return bu
custom_providers = cfg.get("custom_providers")
if isinstance(custom_providers, list):
lowered = {n.lower() for n in candidate_names}
for entry_raw in custom_providers:
if not isinstance(entry_raw, dict):
continue
entry_name = str(entry_raw.get("name") or "").strip()
if entry_name not in candidate_names and entry_name.lower() not in lowered:
continue
bu = str(entry_raw.get("base_url") or "").strip()
if bu:
return bu
return ""
def _should_probe_ollama_vision(provider: str, base_url: str) -> bool:
"""True when the active provider likely fronts a local Ollama server."""
p = (provider or "").strip().lower()
if p == "ollama":
return True
if not base_url:
return False
try:
from agent.model_metadata import detect_local_server_type
return detect_local_server_type(base_url) == "ollama"
except Exception:
return False
def _coerce_mode(raw: Any) -> str:
"""Normalize a config value into one of the valid modes."""
if not isinstance(raw, str):
@@ -302,15 +374,33 @@ def _lookup_supports_vision(
return override
if not provider or not model:
return None
caps = None
try:
from agent.models_dev import get_model_capabilities
caps = get_model_capabilities(provider, model)
except Exception as exc: # pragma: no cover - defensive
logger.debug("image_routing: caps lookup failed for %s:%s%s", provider, model, exc)
return None
if caps is None:
return None
return bool(caps.supports_vision)
if caps is not None:
return bool(caps.supports_vision)
base_url = _resolve_inference_base_url(cfg, provider)
if not base_url and (provider or "").strip().lower() == "ollama":
base_url = "http://localhost:11434/v1"
if _should_probe_ollama_vision(provider, base_url):
try:
from agent.model_metadata import query_ollama_supports_vision
ollama_vision = query_ollama_supports_vision(model, base_url)
if ollama_vision is not None:
return ollama_vision
except Exception as exc: # pragma: no cover - defensive
logger.debug(
"image_routing: ollama vision probe failed for %s:%s%s",
provider,
model,
exc,
)
return None
def decide_image_input_mode(
@@ -388,14 +478,98 @@ def _sniff_mime_from_bytes(raw: bytes) -> Optional[str]:
# BMP: "BM"
if raw.startswith(b"BM"):
return "image/bmp"
# HEIC/HEIF: ftypheic / ftypheix / ftypmif1 / ftypmsf1 etc.
if len(raw) >= 12 and raw[4:8] == b"ftyp" and raw[8:12] in {
b"heic", b"heix", b"hevc", b"hevx", b"mif1", b"msf1", b"heim", b"heis",
}:
return "image/heic"
# ISO-BMFF family (HEIC/HEIF/AVIF): bytes 4..8 == 'ftyp', major brand at 8..12
if len(raw) >= 12 and raw[4:8] == b"ftyp":
brand = raw[8:12]
if brand in {b"avif", b"avis"}:
return "image/avif"
if brand in {
b"heic", b"heix", b"hevc", b"hevx",
b"mif1", b"msf1", b"heim", b"heis",
}:
return "image/heic"
# TIFF: II*\0 (little-endian) or MM\0* (big-endian)
if raw[:4] in {b"II*\x00", b"MM\x00*"}:
return "image/tiff"
# ICO: 00 00 01 00 (reserved=0, type=1=icon)
if raw[:4] == b"\x00\x00\x01\x00":
return "image/x-icon"
# SVG: text-based, look for an <svg tag near the start (skip BOM/whitespace)
head = raw[:512].lstrip().lower()
if head.startswith(b"<?xml") or head.startswith(b"<svg"):
if b"<svg" in head:
return "image/svg+xml"
return None
# Formats every major vision provider (Anthropic, OpenAI, Gemini, Bedrock)
# accepts natively. Anything outside this set has to be transcoded to PNG
# before we declare media_type, otherwise the provider returns HTTP 400
# ("Could not process image" / "Unsupported image media type") and the
# whole turn fails with no salvage path.
#
# Discord (and a few other chat platforms) freely accept attachments in
# formats outside this set -- AVIF screenshots from Chromium, HEIC from
# iPhones, TIFF from scanners, BMP from old Windows tools, ICO -- so users
# do hit this in practice. SVG is vector and Pillow cannot rasterize it;
# it is skipped (logged) rather than transcoded.
_UNIVERSALLY_SUPPORTED_MIMES = frozenset({
"image/png", "image/jpeg", "image/gif", "image/webp",
})
def _transcode_to_png(raw: bytes) -> Optional[bytes]:
"""Decode arbitrary image bytes with Pillow and re-encode as PNG.
Returns None if Pillow isn't installed or can't decode the input
(rare formats, corrupted bytes, missing optional decoder plugin for
HEIC/AVIF, or vector formats like SVG). Caller falls back to skipping
the image so the rest of the turn still works.
HEIC/HEIF and AVIF need optional Pillow plugins; we try to register
them on demand and swallow ImportError so a missing plugin just
looks like 'Pillow can't decode this' rather than crashing.
"""
try:
from PIL import Image
except ImportError:
logger.info(
"image_routing: Pillow not installed; cannot transcode "
"non-standard image format to PNG. Install with `pip install Pillow` "
"(and `pillow-heif` / `pillow-avif-plugin` for those formats)."
)
return None
# Optional plugin registration. Silent on failure: an unsupported
# format will just fall through to Image.open raising below.
try:
import pillow_heif # type: ignore
pillow_heif.register_heif_opener()
except Exception:
pass
try:
import pillow_avif # type: ignore # noqa: F401 -- registers AVIF on import
except Exception:
pass
try:
from io import BytesIO
with Image.open(BytesIO(raw)) as im:
# Pick an output mode PNG can serialise. Anything other than
# the standard set gets normalised to RGBA so transparency is
# preserved where the source had it.
if im.mode not in {"RGB", "RGBA", "L", "LA", "P"}:
im = im.convert("RGBA")
buf = BytesIO()
im.save(buf, format="PNG", optimize=False)
return buf.getvalue()
except Exception as exc:
logger.info(
"image_routing: Pillow could not transcode image to PNG -- %s", exc
)
return None
def _guess_mime(path: Path, raw: Optional[bytes] = None) -> str:
"""Return image MIME type for *path*.
@@ -431,8 +605,18 @@ def _file_to_data_url(path: Path) -> Optional[str]:
accept large images (OpenAI 49 MB+, Gemini 100 MB) don't pay a silent
quality tax just because one other provider is stricter.
Returns None only if the file can't be read (missing, permission
denied, etc.); the caller reports those paths in ``skipped``.
Format compatibility IS handled here: if the sniffed MIME isn't one
of ``_UNIVERSALLY_SUPPORTED_MIMES`` (i.e. it's something like AVIF,
HEIC, BMP, TIFF, or ICO that some providers reject outright), we
transcode to PNG with Pillow before declaring media_type. This fixes
the user-visible "Could not process image" HTTP 400 from Anthropic on
Discord-attached AVIF/HEIC/BMP files.
Returns None if the file can't be read OR if the format isn't
universally supported AND Pillow can't transcode it (Pillow missing,
HEIC/AVIF plugin missing, vector format like SVG, corrupt bytes). The
caller reports those paths in ``skipped`` and the rest of the turn
proceeds.
"""
try:
raw = path.read_bytes()
@@ -440,6 +624,22 @@ def _file_to_data_url(path: Path) -> Optional[str]:
logger.warning("image_routing: failed to read %s%s", path, exc)
return None
mime = _guess_mime(path, raw=raw)
if mime not in _UNIVERSALLY_SUPPORTED_MIMES:
transcoded = _transcode_to_png(raw)
if transcoded is None:
logger.warning(
"image_routing: %s is %s which is not accepted by all major "
"vision providers and could not be transcoded to PNG; "
"skipping this attachment.",
path, mime,
)
return None
logger.info(
"image_routing: transcoded %s (%s) -> image/png for provider compatibility",
path.name, mime,
)
raw = transcoded
mime = "image/png"
b64 = base64.b64encode(raw).decode("ascii")
return f"data:{mime};base64,{b64}"

View File

@@ -8,6 +8,7 @@ iteration.
from __future__ import annotations
import hashlib
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Any
@@ -25,20 +26,112 @@ logger = logging.getLogger(__name__)
# opening dozens of sockets at once.
_MAX_REFERENCE_WORKERS = 8
# Per-tool-result character budget for the advisory reference view. Tool
# results can be huge (a full diff, a 5000-line file dump); replaying them
# verbatim per reference per tool-loop step would blow the reference model's
# context window and cost. We keep the agent's *actions* (tool calls) in full —
# they are cheap, high-signal, and tell the reference what the agent did — but
# preview each tool *result* head+tail so the reference still sees what came
# back without replaying megabytes. The acting aggregator always gets the full,
# untrimmed transcript; this budget only shapes the advisory copy.
_REFERENCE_TOOL_RESULT_BUDGET = 4000
# System prompt prepended to every reference-model call. References are
# advisory — they do NOT act, call tools, or own the task. Without this
# framing a reference receives the bare trimmed conversation and assumes it is
# the acting agent: it then refuses ("I can't access repositories / URLs from
# here") or tries to call tools it doesn't have. The prompt reframes the model
# as an analyst whose job is to reason about the presented state and hand its
# best thinking to the aggregator/orchestrator that will actually act.
_REFERENCE_SYSTEM_PROMPT = (
"You are a reference advisor in a Mixture of Agents (MoA) process. You are "
"NOT the acting agent and you do NOT execute anything: you cannot call "
"tools, run commands, browse, or access files, repositories, or URLs, and "
"you should not try to or apologize for being unable to. A separate "
"aggregator/orchestrator model holds those capabilities and will take the "
"actual actions.\n\n"
"The conversation below is the current state of a task handled by that "
"acting agent. Your job is to give your most intelligent analysis of that "
"state: understand the goal, reason about the problem, and advise on what "
"to do next. Surface the best approach, concrete next steps and tool-use "
"strategy, likely pitfalls and risks, and anything the acting agent may "
"have missed or gotten wrong. Assume any referenced files, URLs, or "
"systems exist and reason about them from the context given rather than "
"asking for access.\n\n"
"Respond with your advice directly — no preamble, no disclaimers about "
"tools or access. Your response is private guidance handed to the "
"aggregator, not an answer shown to the user."
)
def _slot_label(slot: dict[str, str]) -> str:
return f"{slot.get('provider', '').strip()}:{slot.get('model', '').strip()}"
def _slot_runtime(slot: dict[str, str]) -> dict[str, Any]:
"""Resolve a reference/aggregator slot to real runtime call kwargs.
A MoA slot is just a model selection — it must be called the same way any
model is called elsewhere, not through a bare ``call_llm(provider=...,
model=...)`` that leaves base_url/api_key/api_mode unresolved and lets the
auxiliary auto-detector guess. We route the slot's provider through
``resolve_runtime_provider`` (the canonical provider→api_mode/base_url/
api_key resolver the CLI, gateway, and delegate_task all use), so the slot
gets its provider's real API surface — e.g. MiniMax → anthropic_messages,
GPT-5/o-series → max_completion_tokens, custom endpoints → their base_url.
Returns the kwargs to pass through to ``call_llm`` (provider/model plus the
resolved base_url/api_key when available). Falls back to the bare
provider/model on any resolution error so a misconfigured slot still
attempts the call rather than aborting the whole MoA turn.
"""
provider = str(slot.get("provider") or "").strip()
model = str(slot.get("model") or "").strip()
out: dict[str, Any] = {"provider": provider, "model": model}
try:
from hermes_cli.runtime_provider import resolve_runtime_provider
rt = resolve_runtime_provider(requested=provider, target_model=model)
resolved_provider = str(rt.get("provider") or provider).strip().lower()
# call_llm treats an explicit base_url as a custom endpoint. That is
# correct for ordinary OpenAI-compatible targets, but wrong for OAuth /
# provider-backed targets whose provider branch adds auth refresh,
# request metadata, or request-shape adapters. Keep those providers
# identified by name.
if resolved_provider in {"nous", "openai-codex", "xai-oauth"}:
return out
# Pass the resolved endpoint through so call_llm builds the request for
# the provider's actual API surface instead of auto-detecting. base_url
# routes call_llm to the right adapter (incl. anthropic_messages mode);
# api_key is the resolved credential for that provider.
if rt.get("base_url"):
out["base_url"] = rt["base_url"]
if rt.get("api_key"):
out["api_key"] = rt["api_key"]
except Exception as exc: # pragma: no cover - defensive
logger.debug("MoA slot runtime resolution failed for %s: %s", _slot_label(slot), exc)
return out
def _run_reference(
slot: dict[str, str],
ref_messages: list[dict[str, Any]],
*,
temperature: float,
max_tokens: int,
temperature: float | None = None,
max_tokens: int | None = None,
) -> tuple[str, str]:
"""Call one reference model and return ``(label, text)``.
The slot is resolved to its provider's real runtime (via ``_slot_runtime``)
and called through the same ``call_llm`` request-building path any model
uses, so per-model wire-format handling (anthropic_messages,
max_completion_tokens, fixed/forbidden temperature) applies identically to
a reference as it would if that model were the acting model. MoA imposes no
cap of its own (``max_tokens`` defaults to ``None`` → omitted → the model's
real maximum); ``temperature`` is only the user's configured preset value,
which call_llm may still override per model.
Never raises: a failed reference becomes a labelled note so the aggregator
can still act with partial context. Designed to run inside a thread pool —
``call_llm`` is synchronous/blocking, so threads (not asyncio) are the right
@@ -46,13 +139,17 @@ def _run_reference(
"""
label = _slot_label(slot)
try:
# Prepend the advisory-role system prompt so the reference understands
# it is analyzing state for an aggregator, not acting on the task. The
# trimmed view (_reference_messages) already strips the agent's own
# system prompt, so this is the only system message the reference sees.
messages = [{"role": "system", "content": _REFERENCE_SYSTEM_PROMPT}, *ref_messages]
response = call_llm(
task="moa_reference",
provider=slot["provider"],
model=slot["model"],
messages=ref_messages,
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
**_slot_runtime(slot),
)
return label, _extract_text(response) or "(empty response)"
except Exception as exc:
@@ -64,8 +161,8 @@ def _run_references_parallel(
reference_models: list[dict[str, str]],
ref_messages: list[dict[str, Any]],
*,
temperature: float,
max_tokens: int,
temperature: float | None = None,
max_tokens: int | None = None,
) -> list[tuple[str, str]]:
"""Fan out all reference models in parallel, returning outputs in order.
@@ -106,40 +203,140 @@ def _run_references_parallel(
return [r for r in results if r is not None]
def _reference_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""Build an advisory-safe view of the conversation for reference models.
def _truncate_tool_result(text: str, budget: int = _REFERENCE_TOOL_RESULT_BUDGET) -> str:
"""Head+tail preview of a tool result for the advisory view.
Reference calls are advisory: they never call tools and never emit the
``tool_calls`` the main model did. Replaying the full transcript verbatim
(a) re-bills the ~8K-token Hermes system prompt per reference per
iteration and (b) risks 400s from strict providers (Mistral, Fireworks)
that reject orphan ``tool`` messages or ``tool_calls`` the reference never
produced. We keep only the user/assistant *text* turns, dropping the
system prompt, any ``tool``-role messages, and any ``tool_calls`` payloads.
Keeps the first and last halves of the budget with a ``[... N chars
omitted ...]`` marker between them, so a reference sees both how the result
started and how it ended without replaying the whole payload.
"""
trimmed: list[dict[str, Any]] = []
if not text or len(text) <= budget:
return text
half = budget // 2
omitted = len(text) - 2 * half
return f"{text[:half]}\n[... {omitted} chars omitted ...]\n{text[-half:]}"
def _render_tool_calls(tool_calls: Any) -> str:
"""Render an assistant turn's tool_calls as readable text lines.
The advisory view cannot carry real ``tool_calls`` payloads (strict
providers reject tool_calls the reference never produced), so the agent's
actions are flattened to text the reference can read and reason about.
"""
lines: list[str] = []
for tc in tool_calls or []:
fn = (tc.get("function") or {}) if isinstance(tc, dict) else {}
name = fn.get("name") or (tc.get("name") if isinstance(tc, dict) else "") or "tool"
args = fn.get("arguments")
if isinstance(args, str):
args_text = args
elif args is not None:
try:
import json
args_text = json.dumps(args, ensure_ascii=False)
except Exception:
args_text = str(args)
else:
args_text = ""
lines.append(f"[called tool: {name}({args_text})]" if args_text else f"[called tool: {name}]")
return "\n".join(lines)
def _reference_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""Build an advisory view of the conversation for reference models.
A reference gives an INFORMED judgement on the current state, so it must
see what the agent actually did — its tool calls AND the tool results that
came back — not just the agent's narration. We therefore preserve the whole
conversation flow, but flatten it into clean user/assistant *text* turns:
- system prompt: dropped (8K of Hermes boilerplate, not advisory signal).
- assistant turns: kept; any ``tool_calls`` are rendered inline as
``[called tool: name(args)]`` text lines appended to the turn's text.
- ``tool``-role results: NOT dropped. Each is folded (head+tail preview,
see ``_truncate_tool_result``) into the *preceding* assistant turn as a
``[tool result: ...]`` block, so the reference sees what came back.
This emits ZERO ``tool``-role messages and ZERO ``tool_calls`` arrays — only
plain user/assistant text — so strict providers (Mistral, Fireworks) that
reject orphan tool messages / unproduced tool_calls don't 400, while the
reference still has the full picture.
The view MUST end with a ``user`` turn. Anthropic (and OpenRouter→Anthropic)
interpret a trailing assistant turn as an assistant *prefill* to continue,
and no-prefill models (e.g. Claude Opus 4.8) reject it with
``400 ... must end with a user message``. Rather than DELETE the agent's
latest context to satisfy that (which would blind the reference to the
current state), we APPEND a synthetic user turn asking the reference to
judge the state above. End-on-user is satisfied and no context is lost.
The acting aggregator always receives the full, untrimmed transcript; this
function only shapes the disposable advisory copy.
"""
advisory_instruction = (
"[The conversation above is the current state of the task. Give your "
"most intelligent judgement: what is going on, what should happen next, "
"what risks or mistakes you see, and how the acting agent should "
"proceed.]"
)
rendered: list[dict[str, Any]] = []
last_user_content: str | None = None
for msg in messages:
role = msg.get("role")
if role not in ("user", "assistant"):
# Drop system prompt and tool-result messages.
continue
content = msg.get("content")
if not isinstance(content, str):
# Skip non-text (multimodal/tool-call-only) assistant turns.
if not content:
continue
text = content if isinstance(content, str) else ""
if role == "assistant" and not text.strip():
# Assistant turn that was purely tool calls — nothing advisory.
if role == "system":
continue
trimmed.append({"role": role, "content": text})
if not trimmed:
# Degenerate case (e.g. first turn was stripped): fall back to a
# minimal user turn so the reference still has something to answer.
if role == "user":
if text.strip():
last_user_content = text
rendered.append({"role": "user", "content": text})
elif role == "assistant":
parts: list[str] = []
if text.strip():
parts.append(text.strip())
calls_text = _render_tool_calls(msg.get("tool_calls"))
if calls_text:
parts.append(calls_text)
# Empty assistant turns (no text, no calls) carry nothing advisory.
if parts:
rendered.append({"role": "assistant", "content": "\n".join(parts)})
elif role == "tool":
# Fold the tool result into the preceding assistant turn as text so
# the reference sees what came back, without emitting a tool-role
# message a reference never produced.
result_text = _truncate_tool_result(text)
block = f"[tool result: {result_text}]"
if rendered and rendered[-1].get("role") == "assistant":
rendered[-1]["content"] = rendered[-1]["content"] + "\n" + block
else:
# No assistant turn to attach to (e.g. a leading tool result);
# keep it as advisory context on its own assistant-role line.
rendered.append({"role": "assistant", "content": block})
# Any other role is ignored.
# End on a user turn: append a synthetic advisory request rather than
# deleting the agent's latest assistant context. This satisfies Anthropic's
# no-trailing-assistant-prefill rule while preserving full state.
if rendered and rendered[-1].get("role") == "assistant":
rendered.append({"role": "user", "content": advisory_instruction})
elif rendered and rendered[-1].get("role") == "user":
# Already ends on a user turn (fresh user prompt, no agent action yet).
# Leave it — the reference answers that prompt directly.
pass
if not rendered:
# Degenerate case: nothing rendered. Fall back to the latest user turn.
if last_user_content is not None:
return [{"role": "user", "content": last_user_content}]
for msg in reversed(messages):
if msg.get("role") == "user" and isinstance(msg.get("content"), str):
return [{"role": "user", "content": msg["content"]}]
return trimmed
return rendered
@@ -169,12 +366,18 @@ def aggregate_moa_context(
aggregator: dict[str, str],
temperature: float = 0.6,
aggregator_temperature: float = 0.4,
max_tokens: int = 4096,
max_tokens: int | None = None,
) -> str:
"""Run configured reference models and synthesize their advice.
Failures are returned as model-specific notes instead of aborting the normal
agent loop; the main model can still act with partial context.
``max_tokens`` is ``None`` by default: MoA does not cap reference or
aggregator output, so each model uses its own maximum. ``call_llm`` omits
the parameter entirely when it is ``None`` (see its docstring), which also
sidesteps providers that reject ``max_tokens`` outright. A hardcoded cap
here previously truncated long aggregator syntheses.
"""
reference_outputs: list[tuple[str, str]] = []
ref_messages = _reference_messages(api_messages)
@@ -203,11 +406,10 @@ def aggregate_moa_context(
try:
response = call_llm(
task="moa_aggregator",
provider=aggregator["provider"],
model=aggregator["model"],
messages=[{"role": "user", "content": synth_prompt}],
temperature=aggregator_temperature,
max_tokens=max_tokens,
**_slot_runtime(aggregator),
)
synthesis = _extract_text(response)
except Exception as exc:
@@ -230,8 +432,38 @@ def aggregate_moa_context(
class MoAChatCompletions:
"""OpenAI-chat-compatible facade where the aggregator is the acting model."""
def __init__(self, preset_name: str):
def __init__(self, preset_name: str, reference_callback: Any = None):
self.preset_name = preset_name or "default"
# Optional display hook. Called as reference outputs become available so
# frontends can show each reference model's answer as a labelled block
# before the aggregator acts. Signature:
# reference_callback(event, **kwargs)
# where event is one of:
# "moa.reference" kwargs: index, count, label, text
# "moa.aggregating" kwargs: aggregator (label), ref_count
# Never raises into the model call — display is best-effort.
self.reference_callback = reference_callback
# State-scoped reference cache. The agent loop calls create() once per
# tool-loop iteration; references should re-run whenever the task STATE
# advances — i.e. on every new user message AND every new tool result —
# so each reference judges the latest state. The advisory view
# (_reference_messages) now renders tool calls + results as text, so its
# signature changes on every new tool response; the cache key is that
# signature, so a new tool result is a cache MISS (references re-run)
# while a redundant create() call with identical state is a HIT (no
# re-run, no re-emit). This gives "fire on every user/tool response"
# for free, without re-firing on a pure no-op re-call.
self._ref_cache_key: tuple | None = None
self._ref_cache_outputs: list[tuple[str, str]] = []
def _emit(self, event: str, **kwargs: Any) -> None:
cb = self.reference_callback
if cb is None:
return
try:
cb(event, **kwargs)
except Exception as exc: # pragma: no cover - display must never break the turn
logger.debug("MoA reference_callback failed for %s: %s", event, exc)
def create(self, **api_kwargs: Any) -> Any:
from hermes_cli.config import load_config
@@ -241,7 +473,10 @@ class MoAChatCompletions:
messages = list(api_kwargs.get("messages") or [])
reference_models = preset.get("reference_models") or []
aggregator = preset.get("aggregator") or {}
max_tokens = int(preset.get("max_tokens", api_kwargs.get("max_tokens") or 4096) or 4096)
# MoA does not cap reference or aggregator output: each model uses its
# own maximum. Passing max_tokens=None makes call_llm omit the parameter
# (it never caps by default), so a long aggregator synthesis is never
# truncated and providers that reject max_tokens don't 400.
temperature = float(preset.get("reference_temperature", 0.6) or 0.6)
aggregator_temperature = float(preset.get("aggregator_temperature", api_kwargs.get("temperature") or 0.4) or 0.4)
@@ -253,12 +488,52 @@ class MoAChatCompletions:
reference_outputs: list[tuple[str, str]] = []
ref_messages = _reference_messages(messages)
reference_outputs = _run_references_parallel(
reference_models,
ref_messages,
temperature=temperature,
max_tokens=max_tokens,
)
# Turn-scoped cache: only run + display references when the advisory
# view changed (i.e. a new user turn). Within one turn the agent loop
# calls create() once per tool iteration with the same advisory view;
# reuse the cached outputs and skip both the re-run and the re-emit.
_sig = hashlib.sha256(
"\u0000".join(
f"{m.get('role')}:{m.get('content')}" for m in ref_messages
).encode("utf-8", "replace")
).hexdigest()
_cache_key = (self.preset_name, _sig, tuple(_slot_label(s) for s in reference_models))
_refs_from_cache = _cache_key == self._ref_cache_key and bool(self._ref_cache_outputs)
if _refs_from_cache:
reference_outputs = list(self._ref_cache_outputs)
else:
reference_outputs = _run_references_parallel(
reference_models,
ref_messages,
temperature=temperature,
max_tokens=None,
)
self._ref_cache_key = _cache_key
self._ref_cache_outputs = list(reference_outputs)
# Surface each reference model's answer to the display BEFORE the
# aggregator acts — once per turn (only on the iteration that
# actually ran them). The user sees one labelled block per
# reference (rendered like a thinking block) so the MoA process is
# visible rather than a silent pause. Best-effort: never blocks the
# turn.
_ref_count = len(reference_outputs)
for _idx, (_label, _text) in enumerate(reference_outputs, start=1):
self._emit(
"moa.reference",
index=_idx,
count=_ref_count,
label=_label,
text=_text,
)
if _ref_count:
self._emit(
"moa.aggregating",
aggregator=_slot_label(aggregator),
ref_count=_ref_count,
)
agg_messages = [dict(m) for m in messages]
if reference_outputs:
@@ -286,21 +561,26 @@ class MoAChatCompletions:
raise RuntimeError("MoA aggregator cannot be another MoA preset")
agg_kwargs = dict(api_kwargs)
agg_kwargs["messages"] = agg_messages
agg_kwargs["model"] = aggregator.get("model")
agg_kwargs["temperature"] = aggregator_temperature
# The aggregator is the acting model. Resolve its slot to the provider's
# real runtime (base_url/api_key/api_mode) and call it through the same
# request-building path any model uses — so per-model wire-format
# handling (anthropic_messages, max_completion_tokens, fixed/forbidden
# temperature) applies identically to it. MoA imposes no output cap:
# max_tokens is passed through from the caller (normally None → omitted
# → the model's real maximum). The preset's old hardcoded 4096 default
# is gone — it truncated long syntheses.
return call_llm(
task="moa_aggregator",
provider=aggregator.get("provider"),
model=aggregator.get("model"),
messages=agg_messages,
temperature=aggregator_temperature,
max_tokens=agg_kwargs.get("max_tokens"),
tools=agg_kwargs.get("tools"),
extra_body=agg_kwargs.get("extra_body"),
**_slot_runtime(aggregator),
)
class MoAClient:
def __init__(self, preset_name: str):
def __init__(self, preset_name: str, reference_callback: Any = None):
self.chat = type("_MoAChat", (), {})()
self.chat.completions = MoAChatCompletions(preset_name)
self.chat.completions = MoAChatCompletions(preset_name, reference_callback=reference_callback)

View File

@@ -478,6 +478,16 @@ def _infer_provider_from_url(base_url: str) -> Optional[str]:
return None
def _lmstudio_server_root(base_url: str) -> str:
"""Return the LM Studio server root for native ``/api/v1`` endpoints."""
root = _normalize_base_url(base_url).rstrip("/")
for suffix in ("/api/v1", "/api", "/v1"):
if root.endswith(suffix):
root = root[: -len(suffix)].rstrip("/")
break
return root
def _is_known_provider_base_url(base_url: str) -> bool:
return _infer_provider_from_url(base_url) is not None
@@ -549,6 +559,7 @@ def detect_local_server_type(base_url: str, api_key: str = "") -> Optional[str]:
server_url = normalized
if server_url.endswith("/v1"):
server_url = server_url[:-3]
lmstudio_url = _lmstudio_server_root(base_url)
headers = _auth_headers(api_key)
@@ -556,7 +567,7 @@ def detect_local_server_type(base_url: str, api_key: str = "") -> Optional[str]:
with httpx.Client(timeout=2.0, headers=headers) as client:
# LM Studio exposes /api/v1/models — check first (most specific)
try:
r = client.get(f"{server_url}/api/v1/models")
r = client.get(f"{lmstudio_url}/api/v1/models")
if r.status_code == 200:
return "lm-studio"
except Exception:
@@ -774,7 +785,7 @@ def fetch_endpoint_model_metadata(
if is_local_endpoint(normalized):
try:
if detect_local_server_type(normalized, api_key=api_key) == "lm-studio":
server_url = normalized[:-3].rstrip("/") if normalized.endswith("/v1") else normalized
server_url = _lmstudio_server_root(normalized)
response = requests.get(
server_url.rstrip("/") + "/api/v1/models",
headers=headers,
@@ -1188,6 +1199,56 @@ def query_ollama_num_ctx(model: str, base_url: str, api_key: str = "") -> Option
return None
def query_ollama_supports_vision(model: str, base_url: str, api_key: str = "") -> Optional[bool]:
"""Return True/False when Ollama ``/api/show`` reports vision support.
Uses the ``capabilities`` field on Ollama 0.6.0+ and falls back to
``model_info.*.vision.block_count`` on older servers. Returns None when
the server is unreachable, not Ollama, or the model is unknown.
"""
import httpx
bare_model = _strip_provider_prefix(model)
if not bare_model or not base_url:
return None
try:
if detect_local_server_type(base_url, api_key=api_key) != "ollama":
return None
except Exception:
return None
server_url = base_url.rstrip("/")
if server_url.endswith("/v1"):
server_url = server_url[:-3]
headers = _auth_headers(api_key)
try:
with httpx.Client(timeout=3.0, headers=headers) as client:
resp = client.post(f"{server_url}/api/show", json={"name": bare_model})
if resp.status_code != 200:
return None
data = resp.json()
except Exception:
return None
caps = data.get("capabilities")
if isinstance(caps, list):
if any(str(cap).lower() == "vision" for cap in caps):
return True
if caps:
return False
model_info = data.get("model_info")
if isinstance(model_info, dict):
for key in model_info:
if "vision.block_count" in str(key).lower():
return True
return None
def _query_ollama_api_show(model: str, base_url: str, api_key: str = "") -> Optional[int]:
"""Query an Ollama server's native ``/api/show`` for context length.
@@ -1297,6 +1358,7 @@ def _query_local_context_length(model: str, base_url: str, api_key: str = "") ->
server_url = base_url.rstrip("/")
if server_url.endswith("/v1"):
server_url = server_url[:-3]
lmstudio_url = _lmstudio_server_root(base_url)
headers = _auth_headers(api_key)
@@ -1340,7 +1402,7 @@ def _query_local_context_length(model: str, base_url: str, api_key: str = "") ->
# Use _model_id_matches for fuzzy matching: LM Studio stores models as
# "publisher/slug" but users configure only "slug" after "local:" prefix.
if server_type == "lm-studio":
resp = client.get(f"{server_url}/api/v1/models")
resp = client.get(f"{lmstudio_url}/api/v1/models")
if resp.status_code == 200:
data = resp.json()
for m in data.get("models", []):
@@ -1646,6 +1708,34 @@ def get_model_context_length(
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
return config_context_length
# 0a. MoA virtual provider — ``model`` is a preset name, not a real model,
# and ``base_url`` is the local virtual endpoint, so every probe below would
# miss and fall through to the 256K default. The aggregator is the acting
# model, so resolve the context window from the aggregator slot's real
# provider+model instead. References are advisory-only and never bound the
# acting context, so they're ignored here.
if (provider or "").strip().lower() == "moa":
try:
from hermes_cli.config import load_config
from hermes_cli.moa_config import resolve_moa_preset
from hermes_cli.runtime_provider import resolve_runtime_provider
preset = resolve_moa_preset(load_config().get("moa") or {}, model)
agg = preset.get("aggregator") or {}
agg_provider = str(agg.get("provider") or "").strip()
agg_model = str(agg.get("model") or "").strip()
if agg_model and agg_provider and agg_provider.lower() != "moa":
rt = resolve_runtime_provider(requested=agg_provider, target_model=agg_model)
return get_model_context_length(
agg_model,
base_url=rt.get("base_url", "") or "",
api_key=rt.get("api_key", "") or "",
provider=agg_provider,
)
except Exception:
logger.debug("MoA aggregator context-length resolution failed", exc_info=True)
# Fall through to the generic default if aggregator resolution failed.
# 0b. custom_providers per-model override — check before any probe.
# This closes the gap where /model switch and display paths used to fall
# back to 128K despite the user having a per-model context_length set.

View File

@@ -26,7 +26,7 @@ from __future__ import annotations
import os
import sys
import urllib.request
from typing import Optional
from typing import Any, Optional
from utils import base_url_hostname, normalize_proxy_url
@@ -142,6 +142,46 @@ def _get_proxy_for_base_url(base_url: Optional[str]) -> Optional[str]:
return proxy
def build_keepalive_http_client(
base_url: str = "",
*,
async_mode: bool = False,
) -> Optional[Any]:
"""Build an httpx client for OpenAI SDK calls with env-only proxy policy.
Uses explicit ``HTTPS_PROXY`` / ``NO_PROXY`` env vars via
``_get_proxy_for_base_url``. A custom transport disables httpx's default
``trust_env`` path, so macOS system proxy settings from
``urllib.request.getproxies()`` (which omit the ExceptionsList) are not
applied. Mirrors ``AIAgent._build_keepalive_http_client``.
"""
try:
import httpx
import socket
if "api.githubcopilot.com" in str(base_url or "").lower():
client_cls = httpx.AsyncClient if async_mode else httpx.Client
return client_cls()
sock_opts = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
if hasattr(socket, "TCP_KEEPIDLE"):
sock_opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30))
sock_opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10))
sock_opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3))
elif hasattr(socket, "TCP_KEEPALIVE"):
sock_opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, 30))
proxy = _get_proxy_for_base_url(base_url)
transport_cls = httpx.AsyncHTTPTransport if async_mode else httpx.HTTPTransport
client_cls = httpx.AsyncClient if async_mode else httpx.Client
return client_cls(
transport=transport_cls(socket_options=sock_opts),
proxy=proxy,
)
except Exception:
return None
def _install_safe_stdio() -> None:
"""Wrap stdout/stderr so best-effort console output cannot crash the agent."""
for stream_name in ("stdout", "stderr"):
@@ -164,4 +204,5 @@ __all__ = [
"_install_safe_stdio",
"_get_proxy_from_env",
"_get_proxy_for_base_url",
"build_keepalive_http_client",
]

View File

@@ -88,12 +88,15 @@ def _find_hermes_md(cwd: Path) -> Optional[Path]:
stop_at = _find_git_root(cwd)
current = cwd.resolve()
for directory in [current, *current.parents]:
# When there is no git root, only check cwd itself walking parents
# could pick up a .hermes.md planted in /tmp, /home, etc.
search_dirs = [current, *current.parents] if stop_at else [current]
for directory in search_dirs:
for name in _HERMES_MD_NAMES:
candidate = directory / name
if candidate.is_file():
return candidate
# Stop walking at the git root (or filesystem root).
if stop_at and directory == stop_at:
break
return None
@@ -617,7 +620,12 @@ DEVELOPER_ROLE_MODELS = ("gpt-5", "codex")
PLATFORM_HINTS = {
"whatsapp": (
"You are on a text messaging communication platform, WhatsApp. "
"Please do not use markdown as it does not render. "
"Standard markdown (**bold**, *italic*, ~~strike~~, # headers, "
"`code`, ```code blocks```, [links](url)) is auto-converted to "
"WhatsApp's native syntax (*bold*, _italic_, ~strike~, monospace) — "
"feel free to write in markdown, and use bullet lists ('- item') "
"freely. Tables are NOT supported — prefer bullet lists or labeled "
"key:value pairs. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. The file "
"will be sent as a native WhatsApp attachment — images (.jpg, .png, "
@@ -682,7 +690,11 @@ PLATFORM_HINTS = {
),
"signal": (
"You are on a text messaging communication platform, Signal. "
"Please do not use markdown as it does not render. "
"Standard markdown (**bold**, *italic*, ~~strike~~, # headers, "
"`code`, ```code blocks```) is auto-converted to Signal's native "
"rich formatting — feel free to write in markdown, and use bullet "
"lists ('- item') freely (they render as • bullets). Tables are NOT "
"supported — prefer bullet lists or labeled key:value pairs. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio as attachments, and other "
@@ -917,8 +929,7 @@ def _probe_remote_backend(env_type: str) -> str | None:
try:
# Import locally: tools/ imports are heavy and only relevant when a
# non-local backend is actually configured.
from tools.terminal_tool import _get_env_config # type: ignore
from tools.environments import get_environment # type: ignore
from tools.terminal_tool import _create_environment, _get_env_config # type: ignore
except Exception as e:
logger.debug("Backend probe unavailable (import failed): %s", e)
_BACKEND_PROBE_CACHE[cache_key] = ""
@@ -926,7 +937,59 @@ def _probe_remote_backend(env_type: str) -> str | None:
try:
config = _get_env_config()
env = get_environment(config)
# Build the environment the same way tools/terminal_tool.py does for a
# live command: select the backend image, then assemble ssh/container
# config from the env-derived dict. (There is no `get_environment`
# factory — the real entry point is `_create_environment`.)
if env_type == "docker":
image = config.get("docker_image", "")
elif env_type == "singularity":
image = config.get("singularity_image", "")
elif env_type == "modal":
image = config.get("modal_image", "")
elif env_type == "daytona":
image = config.get("daytona_image", "")
else:
image = ""
ssh_config = None
if env_type == "ssh":
ssh_config = {
"host": config.get("ssh_host", ""),
"user": config.get("ssh_user", ""),
"port": config.get("ssh_port", 22),
"key": config.get("ssh_key", ""),
"persistent": config.get("ssh_persistent", False),
}
container_config = None
if env_type in {"docker", "singularity", "modal", "daytona"}:
container_config = {
"container_cpu": config.get("container_cpu", 1),
"container_memory": config.get("container_memory", 5120),
"container_disk": config.get("container_disk", 51200),
"container_persistent": config.get("container_persistent", True),
"modal_mode": config.get("modal_mode", "auto"),
"docker_volumes": config.get("docker_volumes", []),
"docker_mount_cwd_to_workspace": config.get("docker_mount_cwd_to_workspace", False),
"docker_forward_env": config.get("docker_forward_env", []),
"docker_env": config.get("docker_env", {}),
"docker_run_as_host_user": config.get("docker_run_as_host_user", False),
"docker_extra_args": config.get("docker_extra_args", []),
"docker_persist_across_processes": config.get("docker_persist_across_processes", True),
"docker_orphan_reaper": config.get("docker_orphan_reaper", True),
}
env = _create_environment(
env_type=env_type,
image=image,
cwd=config.get("cwd", ""),
timeout=config.get("timeout", 180),
ssh_config=ssh_config,
container_config=container_config,
task_id="prompt-backend-probe",
host_cwd=config.get("host_cwd"),
)
# Single-line POSIX probe — works on any Unixy backend. Wrapped in
# `2>/dev/null` so a missing binary doesn't pollute the output.
probe_cmd = (

View File

@@ -10,6 +10,7 @@ the first 6 and last 4 characters for debuggability.
import logging
import os
import re
import shlex
logger = logging.getLogger(__name__)
@@ -107,12 +108,60 @@ _PREFIX_PATTERNS = [
r"ntn_[A-Za-z0-9]{10,}", # Notion internal integration token
]
# ENV assignment patterns: KEY=value where KEY contains a secret-like name
# ENV assignment patterns: KEY=value where KEY contains a secret-like name.
# Uppercase keys tolerate spaces around "=" (e.g. ``FOO_SECRET = bar``) because
# an all-caps key is almost never prose/code.
_SECRET_ENV_NAMES = r"(?:API_?KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|AUTH)"
_ENV_ASSIGN_RE = re.compile(
rf"([A-Z0-9_]{{0,50}}{_SECRET_ENV_NAMES}[A-Z0-9_]{{0,50}})\s*=\s*(['\"]?)(\S+)\2",
)
# Lowercase / dotted / hyphenated config keys from config files
# (application.properties, .env, YAML-ish dumps): ``spring.datasource.password=secret``,
# ``app.api.key=xyz``, ``password=secret``. The uppercase _ENV_ASSIGN_RE above
# never matched these, so config-file passwords leaked verbatim (issue #16413).
#
# These run only in a config-file context, NOT in prose, code, or URLs — three
# carve-outs preserved from the original design (#4367 + the documented
# web-URL passthrough below):
# 1. The value is bounded by ``[^\s&]`` (stops at whitespace AND ``&``) so
# form-urlencoded bodies are handled pair-by-pair (by _redact_form_body),
# not greedily swallowed.
# 2. _CFG_DOTTED_RE only matches when the key is NAMESPACED (contains a dot),
# which is unambiguously a config key — never a prose word.
# 3. _CFG_ANCHORED_RE matches a bare secret-word key only at line start
# (optionally after ``export``), so conversational ``I have password=foo``
# mid-sentence is left alone.
# The colon-form URL guard (skip when ``://`` present) lives at the call site.
_SECRET_CFG_NAMES = r"(?:api[ _.\-]?key|token|secret|passwd|password|credential|auth)"
_CFG_VALUE = r"(['\"]?)([^\s&]+?)\2(?=[\s&]|$)"
# Namespaced (dotted) key: the secret word may sit anywhere in a dotted path.
_CFG_DOTTED_RE = re.compile(
rf"((?:[A-Za-z0-9_\-]+\.)+[A-Za-z0-9_.\-]*{_SECRET_CFG_NAMES}[A-Za-z0-9_.\-]*"
rf"|[A-Za-z0-9_.\-]*{_SECRET_CFG_NAMES}[A-Za-z0-9_.\-]*\.[A-Za-z0-9_.\-]+)"
rf"={_CFG_VALUE}",
re.IGNORECASE,
)
# Line-anchored bare key: ``password=…`` / ``export api_key=…`` at start of line.
_CFG_ANCHORED_RE = re.compile(
rf"(^[ \t]*(?:export[ \t]+)?[A-Za-z0-9_\-]*{_SECRET_CFG_NAMES}[A-Za-z0-9_\-]*)={_CFG_VALUE}",
re.IGNORECASE | re.MULTILINE,
)
# Unquoted YAML / colon config (e.g. ``password: secret``,
# ``spring.datasource.password: hunter2``). The secret keyword must be part of
# the KEY (anchored to the start of the line/indent), and the value is a single
# whitespace-free token — so prose like ``note: secret meeting`` (keyword in the
# value) and ``error: token expired`` are left alone. Bare ``auth`` is excluded
# from the key set so ``Authorization:`` / ``author:`` don't match (the former
# is masked by _AUTH_HEADER_RE); ``auth_token``/``auth-token`` still match via
# the ``token`` keyword. Quoted values defer to _JSON_FIELD_RE via the lookahead.
_YAML_CFG_NAMES = r"(?:api[ _.\-]?key|token|secret|passwd|password|credential)"
_YAML_ASSIGN_RE = re.compile(
rf"(^[ \t]*[A-Za-z0-9_.\-]*{_YAML_CFG_NAMES}[A-Za-z0-9_.\-]*)(:[ \t]*)(?!['\"])([^\s&]+)",
re.IGNORECASE | re.MULTILINE,
)
# JSON field patterns: "apiKey": "value", "token": "value", etc.
_JSON_KEY_NAMES = r"(?:api_?[Kk]ey|token|secret|password|access_token|refresh_token|auth_token|bearer|secret_value|raw_secret|secret_input|key_material)"
_JSON_FIELD_RE = re.compile(
@@ -125,8 +174,15 @@ _JSON_FIELD_RE = re.compile(
# while the header name and scheme word are preserved for debuggability. The
# previous rule only matched ``Bearer``, so ``Basic <base64 user:pass>`` and
# ``token <pat>`` leaked verbatim into logs/transcripts.
#
# The credential class excludes quote characters (``"`` / ``'``): a token sitting
# flush against a closing quote (``"Authorization: Bearer sk-..."``) must not pull
# that quote into the match, or masking turns value corruption into *syntax*
# corruption — the closing quote vanishes and the command/string no longer parses
# (unterminated quote → shell EOF / Python SyntaxError). Real credentials never
# contain ``"`` or ``'``, so excluding them is safe. See #43083.
_AUTH_HEADER_RE = re.compile(
r"((?:Proxy-)?Authorization:\s*)([A-Za-z][\w.+-]*\s+)?(\S+)",
r"((?:Proxy-)?Authorization:\s*)([A-Za-z][\w.+-]*\s+)?([^\s\"']+)",
re.IGNORECASE,
)
@@ -154,9 +210,37 @@ _PRIVATE_KEY_RE = re.compile(
)
# Database connection strings: protocol://user:PASSWORD@host
# Catches postgres, mysql, mongodb, redis, amqp URLs and redacts the password
# Catches postgres, mysql, mongodb, redis, amqp URLs and redacts the password.
# The userinfo and password groups forbid whitespace ([^:\s]+ / [^@\s]+) so the
# match can never span a line break. A real DSN password never contains
# whitespace; without this bound the greedy [^@]+ would scan past the end of a
# code line to the next stray "@" (e.g. a Python decorator), swallowing
# intervening lines and corrupting tool OUTPUT for any source containing a
# postgresql:// f-string template. See issue #33801.
_DB_CONNSTR_RE = re.compile(
r"((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis|amqp)://[^:]+:)([^@]+)(@)",
r"((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis|amqp)://[^:\s]+:)([^@\s]+)(@)",
re.IGNORECASE,
)
# Bare-token credential in a web/transport URL: ``scheme://TOKEN@host``.
# This is the ``git remote set-url origin https://PASSWORD@github.com/...``
# shape from issue #6396 — a single opaque credential in the userinfo position
# with NO ``user:pass`` colon. It is unambiguously a secret: legitimate
# round-trip URLs (OAuth callbacks, magic links, pre-signed shares — see the
# "Web-URL redaction is intentionally OFF" note in redact_sensitive_text) carry
# their tokens in the QUERY STRING, never in bare userinfo. The colon form
# ``user:pass@`` is deliberately left to pass through (commit "pass web URLs
# through unchanged", #34029) and is NOT matched here — the token class forbids
# ``:``. DB schemes are handled by _DB_CONNSTR_RE above and excluded here.
#
# Guards against false positives:
# - 8+ char floor skips short usernames (git, admin, root, deploy, ubuntu).
# - The token class ``[^\s:@/]`` cannot cross ``/``, so an ``@`` sitting in a
# path or query (e.g. ``?q=user@example.com``) is never treated as userinfo.
_URL_BARE_TOKEN_RE = re.compile(
r"((?:https?|wss?|git|ssh|ftp|ftps|sftp)://)" # scheme
r"([^\s:@/]{8,})" # bare token (no colon/slash/@), 8+ chars
r"(@[^\s]+)", # @host...
re.IGNORECASE,
)
@@ -340,7 +424,40 @@ def _redact_form_body(text: str) -> str:
return _redact_query_string(text.strip())
def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = False) -> str:
def _mask_token_nonreusable(token: str) -> str:
"""Redact a prefix-matched credential to a NON-REUSABLE sentinel.
Unlike :func:`_mask_token` (which keeps head/tail chars — fine for logs
that are never fed back into a config), this emits a marker that:
* cannot be mistaken for a usable-but-truncated key, so an agent that
reads it from a config file and writes it back does NOT corrupt the
stored credential into a dead 13-char string (issue #35519); and
* still does not leak the secret material (no head/tail chars).
The vendor prefix label is preserved for debuggability so the agent can
still tell *which* credential is present (e.g. a GitHub PAT vs an OpenAI
key) without seeing any of its bytes.
"""
if not token:
return "«redacted-secret»"
# Preserve only the recognizable vendor prefix label (e.g. "ghp_", "sk-"),
# never any of the random secret body.
label = ""
for sub in _PREFIX_SUBSTRINGS:
if token.startswith(sub):
label = sub
break
return f"«redacted:{label}…»" if label else "«redacted-secret»"
def redact_sensitive_text(
text: str,
*,
force: bool = False,
code_file: bool = False,
file_read: bool = False,
) -> str:
"""Apply all redaction patterns to a block of text.
Safe to call on any string -- non-matching text passes through unchanged.
@@ -353,6 +470,17 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
constants, "apiKey": "test" fixtures). Prefix patterns, auth headers,
private keys, DB connstrings, JWTs, and URL secrets are still redacted.
Set file_read=True for file *content* returned to the agent (read_file /
search_files / cat). Secrets are STILL redacted — they are never exposed —
but prefix-matched credentials are replaced with a non-reusable sentinel
(``«redacted:ghp_…»``) instead of a head/tail-preserving mask
(``ghp_S1...Pn2T``). The old mask looked like a real-but-truncated key, so
an agent reading it from config.yaml and writing it back silently corrupted
the stored credential into a dead 13-char value → 401 (issue #35519). The
sentinel is syntactically invalid as a token, so it can't be mistaken for a
usable key or written back as one. Implies code_file=True (config/data
files shouldn't trigger the source-code ENV/JSON false-positive paths).
Performance: each regex pattern is gated behind a cheap substring
pre-check (e.g. ``"=" in text`` for ENV assignments, ``"://" in text``
for URLs, ``"eyJ" in text`` for JWTs). On a typical hermes log line
@@ -371,9 +499,15 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
if not (force or _REDACT_ENABLED):
return text
# file_read content shouldn't hit the source-code ENV/JSON false-positive
# paths either (it's config/data, not log lines).
if file_read:
code_file = True
# Known prefixes (sk-, ghp_, etc.) — gate on substring presence
if _has_known_prefix_substring(text):
text = _PREFIX_RE.sub(lambda m: _mask_token(m.group(1)), text)
_prefix_sub = _mask_token_nonreusable if file_read else _mask_token
text = _PREFIX_RE.sub(lambda m: _prefix_sub(m.group(1)), text)
# ENV assignments: OPENAI_API_KEY=*** (skip for code files — false positives)
if not code_file:
@@ -382,6 +516,13 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
name, quote, value = m.group(1), m.group(2), m.group(3)
return f"{name}={quote}{_mask_token(value)}{quote}"
text = _ENV_ASSIGN_RE.sub(_redact_env, text)
# Lowercase/dotted config keys (issue #16413). Skip URLs entirely —
# web-URL query params are intentionally passed through (see note
# near the bottom of this function); _DB_CONNSTR_RE still guards
# connection-string passwords.
if "://" not in text:
text = _CFG_DOTTED_RE.sub(_redact_env, text)
text = _CFG_ANCHORED_RE.sub(_redact_env, text)
# JSON fields: "apiKey": "***" (skip for code files — false positives)
if ":" in text and '"' in text:
@@ -390,6 +531,15 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
return f'{key}: "{_mask_token(value)}"'
text = _JSON_FIELD_RE.sub(_redact_json, text)
# Unquoted YAML / colon config: password: *** (after JSON so quoted
# values are handled there; the lookahead in _YAML_ASSIGN_RE skips
# quotes). Skip URLs — web-URL query params pass through by design.
if ":" in text and "://" not in text:
def _redact_yaml(m):
key, sep, value = m.group(1), m.group(2), m.group(3)
return f"{key}{sep}{_mask_token(value)}"
text = _YAML_ASSIGN_RE.sub(_redact_yaml, text)
# Authorization headers — _AUTH_HEADER_RE matches any scheme after
# "[Proxy-]Authorization:" case-insensitively, so "uthorization" is the
# cheapest substring gate that covers every casing without a casefold().
@@ -419,9 +569,32 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
if "BEGIN" in text and "-----" in text:
text = _PRIVATE_KEY_RE.sub("[REDACTED PRIVATE KEY]", text)
# Database connection string passwords
# Database connection string passwords. With code_file=True, a password
# group that is a pure ``{...}`` brace expression is an f-string template
# reference (e.g. f"postgresql://{user}:{pass}@{host}"), not a literal
# credential — preserve it. Literal passwords are still redacted. The regex
# forbids whitespace in the password group, so a single-line template's
# group(2) is exactly the brace expression. See issue #33801.
if "://" in text:
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
if code_file:
def _redact_db(m):
pw = m.group(2)
if pw.startswith("{") and pw.endswith("}"):
return m.group(0)
return f"{m.group(1)}***{m.group(3)}"
text = _DB_CONNSTR_RE.sub(_redact_db, text)
else:
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
# Bare-token userinfo in web/transport URLs: ``scheme://TOKEN@host``.
# The git-remote-with-embedded-password shape from #6396. Only the
# colon-less bare-token form is redacted — ``user:pass@`` and
# query-string tokens are left to pass through (see the web-URL note
# below). See _URL_BARE_TOKEN_RE for the false-positive guards.
text = _URL_BARE_TOKEN_RE.sub(
lambda m: f"{m.group(1)}{_mask_token(m.group(2))}{m.group(3)}",
text,
)
# JWT tokens (eyJ... — base64-encoded JSON headers)
if "eyJ" in text:
@@ -434,7 +607,12 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
# blanket-redacting param values by name breaks those skills mid-flow.
# Known credential shapes (sk-, ghp_, JWTs, etc.) inside URLs are still
# caught by _PREFIX_RE and _JWT_RE above. DB connection-string passwords
# are still caught by _DB_CONNSTR_RE.
# are still caught by _DB_CONNSTR_RE. The ONE userinfo case still redacted
# is the colon-less bare-token form ``scheme://TOKEN@host`` (#6396, handled
# by _URL_BARE_TOKEN_RE in the ``://`` block above): a bare credential in
# userinfo is never a round-trip workflow token (those live in the query
# string), so masking it can't break a skill. The ``user:pass@`` form is
# left to pass through per #34029.
# Form-urlencoded bodies (only triggers on clean k=v&k=v inputs).
if "&" in text and "=" in text:
@@ -452,6 +630,66 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
return text
# Commands whose stdout is an environment-variable dump (KEY=value lines),
# NOT source code. For these, terminal-output redaction must run the
# ENV-assignment pass (code_file=False) so opaque tokens with no recognized
# vendor prefix (e.g. ``MY_SERVICE_TOKEN=abc123randomstring``) are still
# masked. For all other commands, code_file=True is used to avoid mangling
# legitimate source/config dumps (``MAX_TOKENS=100``, ``"apiKey": "x"``
# fixtures, ``postgresql://{user}`` f-string templates). See issue #43025.
_ENV_DUMP_COMMANDS = frozenset({"env", "printenv", "set", "export", "declare"})
def is_env_dump_command(command: str | None) -> bool:
"""Return True if ``command`` dumps environment variables to stdout.
Detects ``env`` / ``printenv`` / ``set`` / ``export`` / ``declare`` as the
first token of any segment in a pipeline or sequence (``;`` / ``&&`` /
``||`` / ``|``). Conservative: a parse failure or anything unrecognized
returns False (callers then fall back to the safer code_file=True path,
which still masks prefix-shaped keys).
"""
if not command or not isinstance(command, str):
return False
# Split on shell separators, then inspect the first token of each segment.
segments = re.split(r"[|;&]+", command)
for seg in segments:
seg = seg.strip()
if not seg:
continue
try:
tokens = shlex.split(seg)
except ValueError:
tokens = seg.split()
if tokens and tokens[0] in _ENV_DUMP_COMMANDS:
return True
return False
def redact_terminal_output(
output: str, command: str | None = None, *, force: bool = False
) -> str:
"""Redact secrets from terminal/process stdout.
Single redaction policy for ALL terminal-output surfaces — foreground
``terminal`` results AND background ``process(action=poll/log/wait)``
output — so they can't diverge. Picks ``code_file`` based on whether
``command`` is an environment dump:
- env-dump command (``env``/``printenv``/``set``/``export``/``declare``)
→ ``code_file=False`` so the ENV-assignment pass masks opaque tokens.
- anything else (or unknown command) → ``code_file=True`` to avoid
false positives on source/config dumps.
``force=True`` bypasses the global ``security.redact_secrets`` preference
for safety boundaries that must never emit raw credentials.
"""
if not output:
return output
code_file = not is_env_dump_command(command or "")
return redact_sensitive_text(output, force=force, code_file=code_file)
# Substrings used to gate ``_PREFIX_RE`` execution. If none of these appear in
# the input string, the prefix regex cannot match anything, so we skip it.
# False positives are fine (they just run the regex, which then matches

140
agent/replay_cleanup.py Normal file
View File

@@ -0,0 +1,140 @@
"""Replay-history sanitization shared across resume code paths.
When a session's last turn dies mid-tool-loop — the process is killed by a
restart/shutdown command, a stale-timeout fires, or an interrupt lands before
the tool result is written — the persisted transcript can end with a dangling
``assistant(tool_calls)`` (no matching ``tool`` answer) or an interrupted
``assistant→tool`` block. On resume the model sees that broken tail and
re-issues the unanswered call, producing an endless "thinking"/reboot loop
(#49201, #29086).
These pure helpers strip those tails before the history is replayed to the
model. They were originally local to ``gateway/run.py`` (which fixed the
messaging-gateway path) and are extracted here so every resume surface — the
messaging gateway AND the TUI/WebUI gateway — shares the same cleanup instead
of the WebUI path silently skipping it.
"""
from __future__ import annotations
import logging
from typing import Any, Dict, List
logger = logging.getLogger(__name__)
def is_interrupted_tool_result(content: Any) -> bool:
"""Return True if a tool result indicates the tool was interrupted."""
if not isinstance(content, str):
return False
lowered = content.lower()
if "[command interrupted]" in lowered:
return True
if "exit_code" in lowered and ("130" in lowered or "-1" in lowered):
return "interrupt" in lowered
return False
def strip_interrupted_tool_tails(
agent_history: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
"""Strip interrupted assistant→tool sequences from replay history.
Older interrupted gateway turns can be followed by a queued real user
message, so the interrupted assistant/tool block is not necessarily the
final tail by the time we rebuild replay history. Remove any contiguous
assistant(tool_calls) + tool-result block that contains an interrupted tool
result, while preserving successful tool-call sequences intact.
"""
if not agent_history:
return agent_history
cleaned: List[Dict[str, Any]] = []
i = 0
n = len(agent_history)
while i < n:
msg = agent_history[i]
if msg.get("role") == "assistant" and "tool_calls" in msg:
j = i + 1
tool_results: List[Dict[str, Any]] = []
while j < n and agent_history[j].get("role") == "tool":
tool_results.append(agent_history[j])
j += 1
if tool_results and any(
is_interrupted_tool_result(m.get("content", ""))
for m in tool_results
):
logger.debug(
"Stripping interrupted assistant→tool replay block "
"(indices %d%d, tool_results=%d)",
i, j - 1, len(tool_results),
)
i = j
continue
if msg.get("role") == "tool" and is_interrupted_tool_result(msg.get("content", "")):
logger.debug("Stripping orphan interrupted tool result from replay history")
i += 1
continue
cleaned.append(msg)
i += 1
return cleaned
def strip_dangling_tool_call_tail(
agent_history: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
"""Strip a trailing ``assistant(tool_calls)`` block left with NO answers.
When a tool call itself kills the gateway process (``docker restart``,
``systemctl restart``, ``kill``, ``hermes gateway restart``), the process
is terminated by SIGKILL *mid-call* — before the tool result is ever
written and before the orderly shutdown rewind
(``_drop_trailing_empty_response_scaffolding``) can run. The last thing
persisted is the ``assistant`` message that issued the ``tool_calls``,
with zero matching ``tool`` rows.
On resume the model sees an unanswered tool call at the tail and naturally
re-issues it — which restarts the gateway again, producing the infinite
reboot loop in #49201. ``strip_interrupted_tool_tails`` does not catch
this because there is no tool result to inspect for an interrupt marker.
This strips that dangling tail at the source so there is nothing for the
model to re-execute. It only acts when the tail is an
``assistant(tool_calls)`` whose calls have NO corresponding ``tool``
results — a completed assistant→tool pair (any tool answers present) is
left untouched so genuine mid-progress tool loops still resume.
"""
if not agent_history:
return agent_history
last = agent_history[-1]
if not (
isinstance(last, dict)
and last.get("role") == "assistant"
and last.get("tool_calls")
):
return agent_history
logger.debug(
"Stripping dangling unanswered assistant(tool_calls) tail "
"(%d call(s)) — process likely killed mid-tool-call by a "
"restart/shutdown command (#49201)",
len(last.get("tool_calls") or []),
)
return agent_history[:-1]
def sanitize_replay_history(
agent_history: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
"""Apply both replay-tail strippers in the canonical order.
Convenience entry point for resume code paths: removes interrupted
assistant→tool blocks anywhere in the history, then removes a dangling
unanswered ``assistant(tool_calls)`` tail. Returns the same list object
when there is nothing to strip.
"""
if not agent_history:
return agent_history
return strip_dangling_tool_call_tail(strip_interrupted_tool_tails(agent_history))

View File

@@ -122,6 +122,8 @@ from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, Dict, Iterator, List, Optional, Set, Tuple
from hermes_cli._subprocess_compat import IS_WINDOWS, windows_hide_flags
try:
import fcntl # POSIX only; Windows falls back to best-effort without flock.
except ImportError: # pragma: no cover
@@ -441,6 +443,7 @@ def _spawn(spec: ShellHookSpec, stdin_json: str) -> Dict[str, Any]:
return result
t0 = time.monotonic()
_popen_kwargs = {"creationflags": windows_hide_flags()} if IS_WINDOWS else {}
try:
proc = subprocess.run(
argv,
@@ -449,6 +452,7 @@ def _spawn(spec: ShellHookSpec, stdin_json: str) -> Dict[str, Any]:
timeout=spec.timeout,
text=True,
shell=False,
**_popen_kwargs,
)
except subprocess.TimeoutExpired:
result["timed_out"] = True

View File

@@ -5,6 +5,8 @@ import re
import subprocess
from pathlib import Path
from hermes_cli._subprocess_compat import IS_WINDOWS, windows_hide_flags
logger = logging.getLogger(__name__)
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
@@ -66,6 +68,7 @@ def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
_popen_kwargs = {"creationflags": windows_hide_flags()} if IS_WINDOWS else {}
try:
completed = subprocess.run(
["bash", "-c", command],
@@ -75,6 +78,7 @@ def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
timeout=max(1, int(timeout)),
check=False,
stdin=subprocess.DEVNULL,
**_popen_kwargs,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"

View File

@@ -28,6 +28,7 @@ import uuid
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
from agent.conversation_compression import conversation_history_after_compression
from agent.iteration_budget import IterationBudget
from agent.model_metadata import (
estimate_messages_tokens_rough,
@@ -400,7 +401,9 @@ def build_turn_context(
_orig_len, len(messages), _orig_tokens, _preflight_tokens
):
break # Cannot compress further: neither rows nor tokens moved
conversation_history = None
conversation_history = conversation_history_after_compression(
agent, messages
)
agent._empty_content_retries = 0
agent._thinking_prefill_retries = 0
agent._last_content_with_tools = None

View File

@@ -289,7 +289,14 @@ def finalize_turn(
and len(_stripped) <= 24
and _stripped[-1:] not in {".", "!", "?", "", "", "", "`", ")"}
)
if _is_empty_terminal or _is_partial_fragment:
_is_partial_stream_recovery = (
str(_turn_exit_reason) == "partial_stream_recovery"
)
if (
_is_empty_terminal
or _is_partial_fragment
or _is_partial_stream_recovery
):
_explanation = agent._format_turn_completion_explanation(
_turn_exit_reason
)

View File

@@ -67,6 +67,11 @@ class TurnRetryState:
# ── Restart signals (read by the outer loop after the attempt) ───────
restart_with_compressed_messages: bool = False
restart_with_length_continuation: bool = False
# Set when a content-filter stream stall (e.g. MiniMax "new_sensitive")
# has been escalated to the fallback chain: the partial-stream content
# was rolled back off ``messages`` and the loop should re-issue the API
# call against the newly-activated provider (#32421).
restart_with_rebuilt_messages: bool = False
def __iter__(self):
# Convenience for debugging / tests: iterate (name, value) pairs.

View File

@@ -15,6 +15,63 @@ from typing import Any, Iterable
_MAX_CHANGED_PATHS_IN_NUDGE = 8
# Non-code file extensions whose edits carry no verifiable runtime behavior:
# documentation, prose, and data/markup that no test/build exercises. When a
# turn touches ONLY these, verify-on-stop has nothing to check, so the nudge is
# suppressed (this is fix "C" for the doc/markdown/skill false-positive — a
# SKILL.md or README edit must never demand a /tmp verification script). A turn
# that edits any non-listed path (a real source/code/config file) still nudges.
_NON_CODE_VERIFY_EXTENSIONS = frozenset(
{
".md",
".markdown",
".mdx",
".rst",
".txt",
".text",
".adoc",
".asciidoc",
".org",
".log",
".csv",
".tsv",
}
)
# Filenames (case-insensitive, extension-less or otherwise) that are pure prose
# even without a recognized doc extension.
_NON_CODE_VERIFY_FILENAMES = frozenset(
{
"license",
"licence",
"notice",
"authors",
"contributors",
"changelog",
"codeowners",
}
)
def _is_non_code_path(raw: str) -> bool:
"""Return True when a changed path is documentation/prose with nothing to verify."""
try:
p = Path(str(raw))
except Exception:
return False
suffix = p.suffix.lower()
if suffix in _NON_CODE_VERIFY_EXTENSIONS:
return True
if not suffix and p.name.lower() in _NON_CODE_VERIFY_FILENAMES:
return True
return False
def _filter_verifiable_paths(paths: Iterable[str]) -> list[str]:
"""Drop documentation/prose paths; keep paths that could have verifiable behavior."""
return [p for p in paths if p and not _is_non_code_path(p)]
# Session identities (platform or source) that are NOT human conversational
# messaging surfaces: interactive coding surfaces (CLI, TUI, desktop, codex,
# local, gateway) and programmatic callers (API server, webhooks, tools).
@@ -79,12 +136,13 @@ def verify_on_stop_enabled(config: dict[str, Any] | None = None) -> bool:
"""Return whether edit -> verify-before-finish behavior is enabled.
Precedence: an explicit ``HERMES_VERIFY_ON_STOP`` env var wins, then an
explicit boolean ``agent.verify_on_stop`` config value, then a surface-aware
default. The config default is the sentinel ``"auto"`` (see
``DEFAULT_CONFIG``), which resolves to ON for interactive coding surfaces
explicit ``agent.verify_on_stop`` config value. The config default is
``False`` (see ``DEFAULT_CONFIG``) — verify-on-stop is OFF unless the user
opts in. The legacy ``"auto"`` sentinel is still honored for anyone who
sets it explicitly: it resolves to ON for interactive coding surfaces
(CLI, TUI, desktop) and programmatic callers, and OFF for conversational
messaging surfaces (Telegram, Discord, etc.) where the verification
narrative would otherwise reach a human as chat noise.
messaging surfaces (Telegram, Discord, etc.). A missing/unknown value
falls back to OFF.
"""
env = os.environ.get("HERMES_VERIFY_ON_STOP")
if env is not None:
@@ -106,8 +164,11 @@ def verify_on_stop_enabled(config: dict[str, Any] | None = None) -> bool:
return True
if token in {"0", "false", "no", "off"}:
return False
# "auto", missing, or any other value -> surface-aware default.
return not _session_is_messaging_surface()
if token == "auto":
# Explicit opt-in to the legacy surface-aware behavior.
return not _session_is_messaging_surface()
# Missing or unknown value -> OFF (the new default).
return False
def _candidate_cwds(paths: Iterable[str]) -> list[Path]:
@@ -190,7 +251,10 @@ def build_verify_on_stop_nudge(
max_attempts: int = 2,
) -> str | None:
"""Return a synthetic follow-up when edited code lacks fresh verification."""
paths = sorted({str(p) for p in changed_paths if p})
# Drop documentation/prose paths (markdown, skills, README, LICENSE, ...) —
# they carry no verifiable behavior, so a turn that touched only those has
# nothing to verify and must not nudge.
paths = sorted({str(p) for p in _filter_verifiable_paths(changed_paths)})
if not paths or attempts >= max_attempts:
return None

View File

@@ -85,7 +85,7 @@ Installers are built and uploaded to GitHub Releases manually. macOS/Windows sig
### How it works
The packaged app ships the Electron shell and a native React chat surface. On first launch it can install the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. Backend resolution first honours `HERMES_DESKTOP_HERMES_ROOT`, then a completed managed install, then a probed `hermes` on `PATH` (unless `HERMES_DESKTOP_IGNORE_EXISTING=1` is set), and finally an explicit `HERMES_DESKTOP_HERMES` command override for packagers/troubleshooting. The renderer (React, in `src/`) talks to a `hermes dashboard` backend over the `tui_gateway`/dashboard APIs and reuses the agent runtime rather than embedding `hermes --tui`. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
The packaged app ships the Electron shell and a native React chat surface. On first launch it can install the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. Backend resolution first honours `HERMES_DESKTOP_HERMES_ROOT`, then a completed managed install, then a probed `hermes` on `PATH` (unless `HERMES_DESKTOP_IGNORE_EXISTING=1` is set), and finally an explicit `HERMES_DESKTOP_HERMES` command override for packagers/troubleshooting. The renderer (React, in `src/`) talks to a headless backend the app launches for you — a `hermes serve` process that serves the `tui_gateway` JSON-RPC/WebSocket API — through the framework-agnostic client in [`apps/shared`](../shared/) (the same client the web dashboard consumes), and reuses the agent runtime rather than embedding `hermes --tui`. The app is **self-contained**: it runs its own `hermes serve` backend and never opens or requires the web dashboard UI. (For backward compatibility, a runtime that predates the `serve` command automatically falls back to a headless `dashboard --no-open` — see `electron/backend-command.cjs` — so mid-upgrade installs never break.) The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
### Verification

View File

@@ -0,0 +1,51 @@
'use strict'
// Backend subcommand routing for the desktop-managed Hermes process.
//
// The desktop app launches its own headless backend via `hermes serve` — it
// must NEVER depend on or launch the browser `dashboard`. But `serve` is a
// newer subcommand: a runtime that predates it (an older managed install the
// app hasn't updated yet, or an older `hermes` resolved from PATH) only knows
// `dashboard --no-open`. To avoid bricking those users mid-upgrade we detect
// whether the resolved runtime understands `serve` and, only when it does not,
// fall back to the legacy `dashboard --no-open` invocation. Both produce the
// exact same headless gateway; `serve` is just the decoupled name.
//
// These helpers are pure so they can be unit-tested without Electron.
/**
* Build the canonical headless backend argv (always `serve`).
* @param {string} [profile] optional Hermes profile to pin via `--profile`.
*/
function serveBackendArgs(profile) {
const head = profile ? ['--profile', profile] : []
return [...head, 'serve', '--host', '127.0.0.1', '--port', '0']
}
/**
* Rewrite a resolved backend argv from `serve` to the legacy
* `dashboard --no-open` form, preserving every other argument (incl. a leading
* `-m hermes_cli.main` and any `--profile <name>`). Returns a copy; if there is
* no `serve` token the argv is returned unchanged.
*/
function dashboardFallbackArgs(args) {
const i = args.indexOf('serve')
if (i === -1) return args.slice()
return [...args.slice(0, i), 'dashboard', '--no-open', ...args.slice(i + 1)]
}
/**
* True when a runtime's `hermes_cli/subcommands/dashboard.py` source registers
* the `serve` subcommand. Matches `add_parser("serve"` / `add_parser('serve'`
* specifically so the substring "server" (e.g. "start_server", "web server")
* never produces a false positive.
*/
function sourceDeclaresServe(dashboardPySource) {
return /add_parser\(\s*["']serve["']/.test(String(dashboardPySource || ''))
}
module.exports = {
serveBackendArgs,
dashboardFallbackArgs,
sourceDeclaresServe,
}

View File

@@ -0,0 +1,83 @@
'use strict'
const test = require('node:test')
const assert = require('node:assert/strict')
const {
serveBackendArgs,
dashboardFallbackArgs,
sourceDeclaresServe,
} = require('./backend-command.cjs')
test('serveBackendArgs builds a headless serve invocation', () => {
assert.deepEqual(serveBackendArgs(), [
'serve',
'--host',
'127.0.0.1',
'--port',
'0',
])
})
test('serveBackendArgs pins a profile when provided', () => {
assert.deepEqual(serveBackendArgs('worker'), [
'--profile',
'worker',
'serve',
'--host',
'127.0.0.1',
'--port',
'0',
])
})
test('dashboardFallbackArgs rewrites serve -> dashboard --no-open, keeping the -m prefix', () => {
const serve = ['-m', 'hermes_cli.main', 'serve', '--host', '127.0.0.1', '--port', '0']
assert.deepEqual(dashboardFallbackArgs(serve), [
'-m',
'hermes_cli.main',
'dashboard',
'--no-open',
'--host',
'127.0.0.1',
'--port',
'0',
])
})
test('dashboardFallbackArgs preserves a --profile flag ahead of serve', () => {
const serve = ['-m', 'hermes_cli.main', '--profile', 'worker', 'serve', '--host', '127.0.0.1', '--port', '0']
assert.deepEqual(dashboardFallbackArgs(serve), [
'-m',
'hermes_cli.main',
'--profile',
'worker',
'dashboard',
'--no-open',
'--host',
'127.0.0.1',
'--port',
'0',
])
})
test('dashboardFallbackArgs is a no-op (copy) when there is no serve token', () => {
const args = ['-m', 'hermes_cli.main', 'dashboard', '--no-open']
const out = dashboardFallbackArgs(args)
assert.deepEqual(out, args)
assert.notEqual(out, args, 'should return a copy, not the same reference')
})
test('sourceDeclaresServe detects the serve subparser registration', () => {
assert.equal(sourceDeclaresServe('subparsers.add_parser("serve", help="...")'), true)
assert.equal(sourceDeclaresServe("subparsers.add_parser('serve')"), true)
assert.equal(sourceDeclaresServe('subparsers.add_parser(\n "serve",\n)'), true)
})
test('sourceDeclaresServe does not false-positive on the substring "server"', () => {
const oldSource = `
dashboard_parser = subparsers.add_parser("dashboard", help="Start the web UI dashboard")
from hermes_cli.web_server import start_server # web server
`
assert.equal(sourceDeclaresServe(oldSource), false)
})

View File

@@ -37,7 +37,18 @@ const { execFileSync } = require('node:child_process')
const PROBE_TIMEOUT_MS = 5000
/**
* Return true iff `python -c "import hermes_cli"` exits 0.
* Return the Python snippet used to verify Hermes can import far enough to
* launch the CLI. Kept exported for tests so dependency regressions are
* caught without needing a real broken venv fixture.
*
* @returns {string}
*/
function hermesRuntimeImportProbe() {
return 'import yaml; import hermes_cli.config'
}
/**
* Return true iff the Hermes runtime import probe exits 0.
*
* Used to gate the "fallback to system Python with hermes_cli installed"
* rung of resolveHermesBackend. Without this, a system Python 3.11-3.13
@@ -46,13 +57,20 @@ const PROBE_TIMEOUT_MS = 5000
* site-packages -- and the resolver returns a backend that immediately
* dies on spawn.
*
* The probe intentionally imports hermes_cli.config, not just the top-level
* package: a broken/empty Windows launcher venv can still see the source tree
* through PYTHONPATH but lack PyYAML, then die on the first real CLI import.
*
* @param {string} pythonPath - Absolute path to a python.exe / python.
* @param {object} [opts]
* @param {object} [opts.env] - Additional environment for the probe.
* @returns {boolean}
*/
function canImportHermesCli(pythonPath) {
function canImportHermesCli(pythonPath, opts = {}) {
if (!pythonPath) return false
try {
execFileSync(pythonPath, ['-c', 'import hermes_cli'], {
execFileSync(pythonPath, ['-c', hermesRuntimeImportProbe()], {
env: { ...process.env, ...(opts.env || {}) },
stdio: 'ignore',
timeout: PROBE_TIMEOUT_MS,
windowsHide: true
@@ -101,6 +119,7 @@ function verifyHermesCli(hermesCommand, opts = {}) {
module.exports = {
canImportHermesCli,
hermesRuntimeImportProbe,
verifyHermesCli,
PROBE_TIMEOUT_MS
}

View File

@@ -11,7 +11,7 @@ const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { canImportHermesCli, verifyHermesCli } = require('./backend-probes.cjs')
const { canImportHermesCli, hermesRuntimeImportProbe, verifyHermesCli } = require('./backend-probes.cjs')
// Resolve the host's own Node binary -- guaranteed to be on disk and
// runnable. We use it as both a stand-in for "a python that doesn't
@@ -40,6 +40,12 @@ test('canImportHermesCli returns false when binary does not exist', () => {
assert.equal(canImportHermesCli(ghost), false)
})
test('hermes runtime import probe checks config dependencies', () => {
const probe = hermesRuntimeImportProbe()
assert.match(probe, /\bimport yaml\b/)
assert.match(probe, /\bimport hermes_cli\.config\b/)
})
test('verifyHermesCli returns false when command is falsy', () => {
assert.equal(verifyHermesCli(''), false)
assert.equal(verifyHermesCli(null), false)

View File

@@ -10,7 +10,26 @@ const { execFile } = require('node:child_process')
const fs = require('node:fs/promises')
const path = require('node:path')
const simpleGit = require('simple-git')
// `simple-git` is a pure-JS runtime dep that workspace dedup hoists into the
// repo-root node_modules. Packaged builds set `files:` in package.json, which
// excludes node_modules from the asar, so the normal require() fails at launch
// (issue #52735: "Cannot find module 'simple-git'"). We ship the dep's
// closure under resources/native-deps/vendor/node_modules/ via extraResources
// + scripts/stage-native-deps.cjs, and resolve from there when the hoisted
// require() isn't reachable. The `vendor/` nesting matters: electron-builder
// drops a node_modules dir at the root of an extraResources copy but keeps a
// nested one. Dev mode never hits the fallback -- Node's normal lookup finds
// the hoisted copy.
let simpleGit
try {
simpleGit = require('simple-git')
} catch {
const resourcesPath = process.resourcesPath
if (!resourcesPath) {
throw new Error("git-review IPC: 'simple-git' not found and no resourcesPath to fall back to")
}
simpleGit = require(path.join(resourcesPath, 'native-deps', 'vendor', 'node_modules', 'simple-git'))
}
const { resolveRequestedPathForIpc } = require('./hardening.cjs')

View File

@@ -39,6 +39,7 @@ const { createLinkTitleWindow } = require('./link-title-window.cjs')
const { probeGatewayWebSocket } = require('./gateway-ws-probe.cjs')
const { adoptServedDashboardToken } = require('./dashboard-token.cjs')
const { waitForDashboardPortAnnouncement } = require('./backend-ready.cjs')
const { dashboardFallbackArgs, sourceDeclaresServe } = require('./backend-command.cjs')
const { serializeJsonBody, setJsonRequestHeaders } = require('./oauth-net-request.cjs')
const { fetchMarketplaceThemes, searchMarketplaceThemes } = require('./vscode-marketplace.cjs')
const { buildDesktopBackendEnv, normalizeHermesHomeRoot } = require('./backend-env.cjs')
@@ -534,9 +535,10 @@ function getTitleBarOverlayOptions() {
return { height: TITLEBAR_HEIGHT }
}
// Windows + WSLg paint WCO natively; plain Linux disables it (frameless hidden
// titlebar still applies).
if (!IS_WINDOWS && !IS_WSL) {
// WSLg paints WCO via the RDP host's own min/max/close, so requesting
// an Electron overlay there just leaves a dead gap. Plain Linux (KDE,
// GNOME) can use the native overlay — let it through.
if (!IS_WINDOWS && IS_WSL) {
return false
}
@@ -790,7 +792,7 @@ let rendererReloadTimes = []
// the renderer's "Reload and retry" path or by quitting the app.
let bootstrapFailure = null
// Latched non-bootstrap backend spawn failure — stops getConnection() from
// respawning hermes dashboard children in a tight loop while boot is broken.
// respawning hermes serve backend children in a tight loop while boot is broken.
let backendStartFailure = null
// Active first-launch install, so the renderer's Cancel button (and app quit)
// can abort the in-flight install.sh/ps1 instead of leaving it running.
@@ -1284,8 +1286,14 @@ function findOnPath(command) {
const pathEntries = String(process.env.PATH || '')
.split(path.delimiter)
.filter(Boolean)
// On Windows, try PATHEXT extensions BEFORE the bare (empty-extension) name.
// A real command must resolve via its .exe/.cmd (Windows command-resolution
// semantics consult PATHEXT); an extensionless file — e.g. a Git-Bash
// shell-script shim named `hermes` — must not shadow `hermes.cmd`/`hermes.exe`.
// The empty entry is kept LAST so callers that already include the extension
// (py.exe, pwsh.exe, powershell.exe) still resolve.
const extensions = IS_WINDOWS
? ['', ...(process.env.PATHEXT || '.COM;.EXE;.BAT;.CMD').split(';').filter(Boolean)]
? [...(process.env.PATHEXT || '.COM;.EXE;.BAT;.CMD').split(';').filter(Boolean), '']
: ['']
for (const entry of pathEntries) {
@@ -1302,7 +1310,7 @@ function isCommandScript(command) {
return IS_WINDOWS && /\.(cmd|bat)$/i.test(command || '')
}
function unwrapWindowsVenvHermesCommand(command, dashboardArgs) {
function unwrapWindowsVenvHermesCommand(command, backendArgs) {
if (!IS_WINDOWS || !command || isCommandScript(command)) return null
const resolved = path.resolve(String(command))
@@ -1312,14 +1320,14 @@ function unwrapWindowsVenvHermesCommand(command, dashboardArgs) {
if (path.basename(scriptsDir).toLowerCase() !== 'scripts') return null
const venvRoot = path.dirname(scriptsDir)
const python = getNoConsoleVenvPython(venvRoot)
const python = getVenvPython(venvRoot)
if (!fileExists(python)) return null
const root = path.dirname(venvRoot)
return {
label: `existing Hermes no-console Python at ${python}`,
label: `existing Hermes Python at ${python}`,
command: python,
args: ['-m', 'hermes_cli.main', ...dashboardArgs],
args: ['-m', 'hermes_cli.main', ...backendArgs],
bootstrap: false,
env: buildDesktopBackendEnv({
hermesHome: HERMES_HOME,
@@ -1327,11 +1335,72 @@ function unwrapWindowsVenvHermesCommand(command, dashboardArgs) {
venvRoot
}),
kind: 'python',
readyFile: true,
// Surfaced so backendSupportsServe() can read this runtime's source for the
// `serve` capability check instead of falling back to a heavyweight probe.
root,
shell: false
}
}
// Does the resolved runtime understand the `serve` subcommand? The desktop
// spawns `hermes serve`; runtimes older than serve only have `dashboard`. We
// detect support so getBackendArgsForRuntime() can route old runtimes through
// the legacy `dashboard --no-open` form instead of crashing on an unknown
// subcommand (would brick every user mid-upgrade — #54568 follow-up).
//
// Fast path: read the runtime's own dashboard.py (instant, covers managed
// installs, dev checkouts, and the Windows venv). Fallback: probe the CLI once
// (covers a bare `hermes` resolved from PATH with no known source root). Result
// is cached per resolved runtime so we probe at most once per backend.
const _serveSupportCache = new Map()
function backendSupportsServe(backend) {
if (!backend || !backend.command) return true
const key = `${backend.command}::${backend.root || ''}`
if (_serveSupportCache.has(key)) return _serveSupportCache.get(key)
let supported = null
if (backend.root) {
try {
const src = fs.readFileSync(
path.join(backend.root, 'hermes_cli', 'subcommands', 'dashboard.py'),
'utf8'
)
supported = sourceDeclaresServe(src)
} catch {
supported = null // source unreadable — fall through to the probe
}
}
if (supported === null) {
try {
const prefix = backend.args && backend.args[0] === '-m' ? backend.args.slice(0, 2) : []
execFileSync(backend.command, [...prefix, 'serve', '--help'], {
cwd: backend.root || undefined,
env: { ...process.env, HERMES_HOME, ...(backend.env || {}) },
timeout: 15000,
stdio: 'ignore',
windowsHide: true
})
supported = true
} catch {
supported = false
}
}
_serveSupportCache.set(key, supported)
rememberLog(
`[backend] \`serve\` ${supported ? 'supported' : 'unsupported → routing via legacy `dashboard`'} for ${backend.label || key}`
)
return supported
}
// Given a resolved backend whose args target `serve`, return the args the
// runtime actually understands: unchanged when `serve` is supported, or
// rewritten to `dashboard --no-open` for older runtimes.
function getBackendArgsForRuntime(backend) {
return backendSupportsServe(backend) ? backend.args : dashboardFallbackArgs(backend.args)
}
function normalizeExecutablePathForCompare(commandPath) {
if (!commandPath) return null
@@ -1552,64 +1621,26 @@ function getVenvPython(venvRoot) {
return path.join(venvRoot, IS_WINDOWS ? path.join('Scripts', 'python.exe') : path.join('bin', 'python'))
}
function readVenvHome(venvRoot) {
try {
const cfg = fs.readFileSync(path.join(venvRoot, 'pyvenv.cfg'), 'utf8')
const match = cfg.match(/^home\s*=\s*(.+?)\s*$/im)
return match ? match[1].trim() : null
} catch {
return null
}
}
function getNoConsoleVenvPython(venvRoot) {
if (!IS_WINDOWS) return getVenvPython(venvRoot)
// Prefer the venv's own pythonw shim — it carries pyvenv.cfg / site-packages
// wiring. Falling back to the base uv/python.org pythonw.exe skips the venv
// and breaks imports (yaml, hermes_cli, …) even when PYTHONPATH is patched.
const venvPythonw = path.join(venvRoot, 'Scripts', 'pythonw.exe')
if (fileExists(venvPythonw)) return venvPythonw
const baseHome = readVenvHome(venvRoot)
if (baseHome) {
const basePythonw = path.join(baseHome, 'pythonw.exe')
if (fileExists(basePythonw)) return basePythonw
}
return venvPythonw
}
function toNoConsolePython(pythonPath) {
if (!IS_WINDOWS || !pythonPath) return pythonPath
const resolved = String(pythonPath)
if (/pythonw\.exe$/i.test(resolved)) return resolved
if (/python\.exe$/i.test(resolved)) {
const pythonw = path.join(path.dirname(resolved), 'pythonw.exe')
if (fileExists(pythonw)) return pythonw
}
return pythonPath
}
function applyWindowsNoConsoleSpawnHints(backend) {
if (!IS_WINDOWS || !backend?.command) return backend
const usesHermesModule =
backend.kind === 'python' ||
(Array.isArray(backend.args) && backend.args[0] === '-m' && backend.args[1] === 'hermes_cli.main')
if (!usesHermesModule) return backend
backend.command = toNoConsolePython(backend.command)
if (/pythonw\.exe$/i.test(path.basename(String(backend.command || '')))) {
backend.readyFile = true
}
return backend
}
// Windows console-window flashes are governed by the *parent's* console, not by
// each child spawn. A GUI-subsystem parent (pythonw.exe) has no console, so every
// console-subsystem child it spawns (git, gh, cmd, ...) must allocate its own —
// which flashes a window. A console-subsystem parent (python.exe) instead owns a
// single console that all of its children inherit, so none of them flash.
//
// Note this change adds no new creationflag: the backend spawn is ALREADY wrapped
// in hiddenWindowsChildOptions() (windowsHide: true), but that setting is INERT
// against pythonw.exe — a GUI-subsystem process has no console for it to act on.
// Switching the backend to the venv's console python.exe is what makes the
// existing wrapper load-bearing: with windowsHide the process comes up owning a
// *windowless* console (verified at runtime — it has an attachable console whose
// window handle is NULL), and its children inherit that one windowless console
// instead of each allocating a visible one.
//
// This makes "no flashing windows" a property of the one backend launch rather
// than a flag that has to be remembered at every descendant spawn site. Restoring
// console python also restores stdout, so the backend announces its port on the
// normal HERMES_DASHBOARD_READY stdout line and no ready-file side channel is
// needed.
function getVenvSitePackagesEntries(venvRoot) {
const entries = []
@@ -1964,6 +1995,16 @@ async function readCommitLog(cwd, branch) {
let updateInFlight = false
// Set to true when the desktop is about to quit so a detached swap/install/
// uninstall script can take over. On macOS, app.quit() closes windows but
// window-all-closed deliberately keeps the process alive (standard Electron
// macOS convention). Without this flag the process never exits — the detached
// hand-off script spins its PID-wait for the full timeout, and the user sees a
// blank app with no window (and an uninstall that appears to do nothing). When
// set, window-all-closed calls app.quit() on every platform so the process
// actually dies and the hand-off script can proceed immediately.
let isQuittingForHandoff = false
// Resolve the staged updater binary. The Tauri installer copies itself to
// HERMES_HOME/hermes-setup.exe on a successful install (see
// apps/bootstrap-installer paths::copy_self_to_hermes_home). That binary owns
@@ -2219,6 +2260,7 @@ async function applyUpdates(opts = {}) {
// appears), THEN quit to release the venv shim. The updater rebuilds and
// relaunches us when it's done. (#50419 — a 600ms quit looked like a crash
// and lured users into the #50238 relaunch loop.)
isQuittingForHandoff = true
setTimeout(() => {
app.quit()
}, UPDATE_HANDOFF_DWELL_MS)
@@ -2242,7 +2284,18 @@ async function handOffWindowsBootstrapRecovery(reason) {
: configuredBranch || DEFAULT_UPDATE_BRANCH
const venvBin = path.join(updateRoot, 'venv', IS_WINDOWS ? 'Scripts' : 'bin')
const venvHermes = path.join(venvBin, IS_WINDOWS ? 'hermes.exe' : 'hermes')
const updaterArgs = fileExists(venvHermes) ? ['--update', '--branch', branch] : ['--repair', '--branch', branch]
const venvPython = path.join(venvBin, IS_WINDOWS ? 'python.exe' : 'python')
// Choose the gentle in-place --update when ANY real-install signal is present,
// not just the `hermes.exe` console-script shim. That shim is generated at the
// END of venv setup and is absent in exactly the interrupted/quarantined states
// this recovery exists to heal — gating on it alone forced the destructive
// --repair (full venv recreate) and drove reinstall loops. The venv interpreter
// and the bootstrap-complete marker are present earlier and are better signals.
const haveRealInstall =
fileExists(venvPython) ||
fileExists(venvHermes) ||
fileExists(path.join(updateRoot, '.hermes-bootstrap-complete'))
const updaterArgs = haveRealInstall ? ['--update', '--branch', branch] : ['--repair', '--branch', branch]
await releaseBackendLockForUpdate(updateRoot)
@@ -2265,6 +2318,7 @@ async function handOffWindowsBootstrapRecovery(reason) {
// Same dwell as the in-app update hand-off (#50419): give the updater's
// window time to appear before we vanish, so the recovery doesn't look like
// a crash and provoke a mid-recovery relaunch.
isQuittingForHandoff = true
setTimeout(() => {
app.quit()
}, UPDATE_HANDOFF_DWELL_MS)
@@ -2344,14 +2398,14 @@ async function applyUpdatesPosixInApp() {
PATH: pathWithHermesManagedNode(path.join(updateRoot, 'venv', 'bin'))
}
// `hermes update` reaps stale `hermes dashboard` backends (a code update
// `hermes update` reaps stale `hermes serve` backends (a code update
// leaves the running process serving old Python against the freshly-updated
// JS bundle). But OUR backend is one of those processes, and killing it
// mid-update produces the boot→kill→crash loop in #37532 — the desktop
// already restarts its own backend via the rebuild+relaunch below, so the
// reap must spare it. Hand the live backend's PID to the update process;
// _kill_stale_dashboard_processes reads HERMES_DESKTOP_CHILD_PID and excludes
// it while still reaping any genuinely-orphaned dashboards. (#37532)
// it while still reaping any genuinely-orphaned backends. (#37532)
// Exclude every desktop-managed backend (primary + all pool profiles) from
// the update reaper. _kill_stale_dashboard_processes accepts a comma-separated
// list (a single int still parses for back-compat).
@@ -2472,6 +2526,7 @@ async function applyUpdatesPosixInApp() {
`[updates] launched linux relaunch: ${scriptPath} -> ${process.execPath} ` +
`(args=${relaunchArgs.length}, env=${Object.keys(relaunchEnv).length})`
)
isQuittingForHandoff = true
setTimeout(() => app.quit(), UPDATE_HANDOFF_DWELL_MS)
return { ok: true, handedOff: true }
} catch (err) {
@@ -2577,6 +2632,7 @@ fi
child.unref()
rememberLog(`[updates] launched mac swap+relaunch: ${scriptPath} (${rebuiltApp} -> ${targetApp})`)
isQuittingForHandoff = true
setTimeout(() => app.quit(), 600)
return { ok: true, handedOff: true, rebuiltApp, targetApp }
}
@@ -2607,6 +2663,24 @@ function readBootstrapMarker() {
return readJson(BOOTSTRAP_COMPLETE_MARKER)
}
// Marker-independent: is the canonical install at ACTIVE_HERMES_ROOT actually
// runnable right now? A complete CLI install (`install.sh --include-desktop`)
// or a DMG launch over a prior CLI install satisfies this WITHOUT the desktop
// ever having written the bootstrap marker -- so we must be able to recognise
// "already installed" off the filesystem alone, not just the marker.
function isActiveRuntimeUsable() {
const venvPython = getVenvPython(VENV_ROOT)
return (
isHermesSourceRoot(ACTIVE_HERMES_ROOT) &&
fileExists(venvPython) &&
canImportHermesCli(venvPython, {
env: {
PYTHONPATH: [ACTIVE_HERMES_ROOT, process.env.PYTHONPATH].filter(Boolean).join(path.delimiter)
}
})
)
}
function isBootstrapComplete() {
const marker = readBootstrapMarker()
if (!marker || typeof marker !== 'object') return false
@@ -2619,7 +2693,7 @@ function isBootstrapComplete() {
// a runnable venv: an interrupted or split-home install can leave the marker
// + checkout without a venv, and trusting that spawns a dead backend
// ("gateway offline") instead of re-running bootstrap to repair it.
return isHermesSourceRoot(ACTIVE_HERMES_ROOT) && fileExists(getVenvPython(VENV_ROOT))
return isActiveRuntimeUsable()
}
function writeBootstrapMarker(payload) {
@@ -2782,60 +2856,60 @@ function writeDefaultProjectDir(dir) {
}
}
function createPythonBackend(root, label, dashboardArgs, options = {}) {
function createPythonBackend(root, label, backendArgs, options = {}) {
const python = findPythonForRoot(root)
if (!python) return null
const venvRoot = path.join(root, 'venv')
const venvPython = getVenvPython(venvRoot)
const command = IS_WINDOWS && fileExists(venvPython) ? getNoConsoleVenvPython(venvRoot) : toNoConsolePython(python)
const command = IS_WINDOWS && fileExists(venvPython) ? venvPython : python
return applyWindowsNoConsoleSpawnHints({
return {
kind: 'python',
label,
command,
args: ['-m', 'hermes_cli.main', ...dashboardArgs],
args: ['-m', 'hermes_cli.main', ...backendArgs],
env: buildDesktopBackendEnv({
hermesHome: HERMES_HOME,
pythonPathEntries: [root],
pythonPathEntries: [root, ...getVenvSitePackagesEntries(venvRoot)],
venvRoot
}),
root,
bootstrap: Boolean(options.bootstrap),
shell: false
})
}
}
// createActiveBackend — build a backend pointing at ACTIVE_HERMES_ROOT, the
// canonical install location shared with the CLI installer. The venv at
// VENV_ROOT may not exist yet on first run; bootstrap=true tells
// ensureRuntime() to create / refresh it before launch.
function createActiveBackend(dashboardArgs) {
function createActiveBackend(backendArgs) {
const venvPython = getVenvPython(VENV_ROOT)
const command = fileExists(venvPython) ? getNoConsoleVenvPython(VENV_ROOT) : toNoConsolePython(findSystemPython())
const command = fileExists(venvPython) ? venvPython : findSystemPython()
return applyWindowsNoConsoleSpawnHints({
return {
kind: 'python',
label: `Hermes at ${ACTIVE_HERMES_ROOT}`,
command,
args: ['-m', 'hermes_cli.main', ...dashboardArgs],
args: ['-m', 'hermes_cli.main', ...backendArgs],
env: buildDesktopBackendEnv({
hermesHome: HERMES_HOME,
pythonPathEntries: [ACTIVE_HERMES_ROOT],
pythonPathEntries: [ACTIVE_HERMES_ROOT, ...getVenvSitePackagesEntries(VENV_ROOT)],
venvRoot: VENV_ROOT
}),
root: ACTIVE_HERMES_ROOT,
bootstrap: true,
shell: false
})
}
}
function resolveHermesBackend(dashboardArgs) {
function resolveHermesBackend(backendArgs) {
// 1. Explicit override -- HERMES_DESKTOP_HERMES_ROOT points at a developer
// checkout. Honour it as-is (no bootstrap; the user is driving).
const overrideRoot = process.env.HERMES_DESKTOP_HERMES_ROOT && path.resolve(process.env.HERMES_DESKTOP_HERMES_ROOT)
if (overrideRoot && isHermesSourceRoot(overrideRoot)) {
const backend = createPythonBackend(overrideRoot, `Hermes source at ${overrideRoot}`, dashboardArgs)
const backend = createPythonBackend(overrideRoot, `Hermes source at ${overrideRoot}`, backendArgs)
if (backend) return backend
}
@@ -2844,7 +2918,7 @@ function resolveHermesBackend(dashboardArgs) {
// installed `hermes` on PATH so local Python edits are actually exercised.
// (In dev with no checkout, SOURCE_REPO_ROOT won't pass isHermesSourceRoot.)
if (!IS_PACKAGED && isHermesSourceRoot(SOURCE_REPO_ROOT)) {
const backend = createPythonBackend(SOURCE_REPO_ROOT, `Hermes source at ${SOURCE_REPO_ROOT}`, dashboardArgs)
const backend = createPythonBackend(SOURCE_REPO_ROOT, `Hermes source at ${SOURCE_REPO_ROOT}`, backendArgs)
if (backend) return backend
}
@@ -2855,7 +2929,7 @@ function resolveHermesBackend(dashboardArgs) {
// to spawning hermes. Updates flow through the in-app update path
// (applyUpdates -> git pull) or `hermes update` from the CLI.
if (isBootstrapComplete()) {
return createActiveBackend(dashboardArgs)
return createActiveBackend(backendArgs)
}
// 4. Existing `hermes` on PATH -- installed via install.ps1 / install.sh from
@@ -2888,7 +2962,7 @@ function resolveHermesBackend(dashboardArgs) {
}
if (hermesCommand) {
const unwrapped = unwrapWindowsVenvHermesCommand(hermesCommand, dashboardArgs)
const unwrapped = unwrapWindowsVenvHermesCommand(hermesCommand, backendArgs)
if (unwrapped) {
return unwrapped
}
@@ -2903,10 +2977,10 @@ function resolveHermesBackend(dashboardArgs) {
const shellForProbe = isCommandScript(hermesCommand)
if (verifyHermesCli(hermesCommand, { shell: shellForProbe })) {
return (
unwrapWindowsVenvHermesCommand(hermesCommand, dashboardArgs) || {
unwrapWindowsVenvHermesCommand(hermesCommand, backendArgs) || {
label: `existing Hermes CLI at ${hermesCommand}`,
command: hermesCommand,
args: dashboardArgs,
args: backendArgs,
bootstrap: false,
env: {},
kind: 'command',
@@ -2934,15 +3008,15 @@ function resolveHermesBackend(dashboardArgs) {
// failure, fall through to step 6 so the bootstrap runner pulls
// a uv-managed 3.11 into %LOCALAPPDATA%\hermes\hermes-agent\venv.
if (canImportHermesCli(python)) {
return applyWindowsNoConsoleSpawnHints({
return {
kind: 'python',
label: `installed hermes_cli module via ${python}`,
command: toNoConsolePython(python),
args: ['-m', 'hermes_cli.main', ...dashboardArgs],
command: python,
args: ['-m', 'hermes_cli.main', ...backendArgs],
bootstrap: false,
env: {},
shell: false
})
}
}
rememberLog(`Ignoring system Python ${python}: hermes_cli is not importable; falling through to bootstrap.`)
}
@@ -2961,7 +3035,7 @@ function resolveHermesBackend(dashboardArgs) {
kind: 'bootstrap-needed',
label: 'Hermes Agent not installed yet; bootstrap required',
command: null,
args: dashboardArgs,
args: backendArgs,
bootstrap: true,
env: {},
shell: false,
@@ -2976,7 +3050,7 @@ function resolveHermesBackend(dashboardArgs) {
async function ensureRuntime(backend) {
if (!backend.bootstrap) {
await advanceBootProgress('runtime.external', `Using ${backend.label}`, 32)
return applyWindowsNoConsoleSpawnHints(backend)
return backend
}
// backend.kind === 'bootstrap-needed' means resolveHermesBackend couldn't
@@ -3118,7 +3192,7 @@ async function ensureRuntime(backend) {
)
}
backend.command = getNoConsoleVenvPython(VENV_ROOT)
backend.command = getVenvPython(VENV_ROOT)
backend.label = `Hermes at ${ACTIVE_HERMES_ROOT} (venv: ${VENV_ROOT})`
updateBootProgress({
phase: 'runtime.ready',
@@ -3127,7 +3201,7 @@ async function ensureRuntime(backend) {
running: true,
error: null
})
return applyWindowsNoConsoleSpawnHints(backend)
return backend
}
function fetchJson(url, token, options = {}) {
@@ -3788,7 +3862,7 @@ function getWindowButtonPosition() {
}
function getNativeOverlayWidth() {
return computeNativeOverlayWidth({ isWindows: IS_WINDOWS, isWsl: IS_WSL })
return computeNativeOverlayWidth({ isWindows: IS_WINDOWS, isWsl: IS_WSL, isMac: IS_MAC })
}
function getWindowState() {
@@ -5194,8 +5268,10 @@ async function spawnPoolBackend(profile, entry) {
// --profile wins over the inherited HERMES_HOME env (see _apply_profile_override
// step 3 in hermes_cli/main.py), so the child re-homes to this profile.
// --port 0: the OS assigns an ephemeral port; the child announces it on stdout.
const dashboardArgs = ['--profile', profile, 'dashboard', '--no-open', '--host', '127.0.0.1', '--port', '0']
const backend = await ensureRuntime(resolveHermesBackend(dashboardArgs))
const backendArgs = ['--profile', profile, 'serve', '--host', '127.0.0.1', '--port', '0']
const backend = await ensureRuntime(resolveHermesBackend(backendArgs))
// Route old runtimes (no `serve`) through the legacy `dashboard --no-open`.
backend.args = getBackendArgsForRuntime(backend)
const hermesCwd = resolveHermesCwd()
const webDist = resolveWebDist()
const readyFile = backend.readyFile ? makeDashboardReadyFile() : null
@@ -5411,7 +5487,7 @@ async function startHermes() {
const token = crypto.randomBytes(32).toString('base64url')
// --port 0: the OS assigns an ephemeral port; the child announces it on stdout.
const dashboardArgs = ['dashboard', '--no-open', '--host', '127.0.0.1', '--port', '0']
const backendArgs = ['serve', '--host', '127.0.0.1', '--port', '0']
// Pin the desktop's chosen profile via the global --profile flag. This is
// deterministic (it wins over the sticky ~/.hermes/active_profile file) and
// resolves HERMES_HOME the same way `hermes -p <name>` does on the CLI. An
@@ -5419,10 +5495,12 @@ async function startHermes() {
// unaffected.
const activeProfile = readActiveDesktopProfile()
if (activeProfile) {
dashboardArgs.unshift('--profile', activeProfile)
backendArgs.unshift('--profile', activeProfile)
}
await advanceBootProgress('backend.runtime', 'Resolving Hermes runtime', 28)
const backend = await ensureRuntime(resolveHermesBackend(dashboardArgs))
const backend = await ensureRuntime(resolveHermesBackend(backendArgs))
// Route old runtimes (no `serve`) through the legacy `dashboard --no-open`.
backend.args = getBackendArgsForRuntime(backend)
const hermesCwd = resolveHermesCwd()
const webDist = resolveWebDist()
const readyFile = backend.readyFile ? makeDashboardReadyFile() : null
@@ -7323,6 +7401,7 @@ async function runDesktopUninstall(mode) {
// Give the renderer a beat to show its "uninstalling…" state, then quit so
// the venv python shim + app bundle unlock and the cleanup script can run.
isQuittingForHandoff = true
setTimeout(() => app.quit(), 800)
return { ok: true, mode, willRemoveAppBundle: Boolean(removeBundle), scriptPath }
}
@@ -7528,5 +7607,11 @@ app.on('before-quit', () => {
})
app.on('window-all-closed', () => {
if (process.platform !== 'darwin') app.quit()
// macOS convention: keep the process alive in the Dock when the user closes
// the last window. But when we're handing off to a detached updater / swap /
// uninstall script, the process MUST exit so the script can replace or remove
// the bundle and relaunch — without this the script's PID-wait spins to its
// full timeout and the user is left with an invisible app (or an uninstall
// that appears to do nothing).
if (process.platform !== 'darwin' || isQuittingForHandoff) app.quit()
})

View File

@@ -1,11 +1,24 @@
// Pre-layout fallback for WCO right-edge reservation (--titlebar-tools-right).
// Live width comes from navigator.windowControlsOverlay in the renderer.
'use strict'
const OVERLAY_FALLBACK_WIDTH = 144
/** @param {{ isWindows?: boolean, isWsl?: boolean }} opts */
function nativeOverlayWidth({ isWindows = false, isWsl = false } = {}) {
return isWindows || isWsl ? OVERLAY_FALLBACK_WIDTH : 0
/**
* Static pre-layout reservation (px) for the right-side native window-controls
* overlay (min/max/close). Only a FALLBACK — once laid out the renderer reads
* the exact width from navigator.windowControlsOverlay
* (use-window-controls-overlay-width.ts) and uses this value only when the WCO
* API is unavailable.
*
* macOS uses traffic lights positioned via trafficLightPosition, not a WCO
* overlay, so it reserves nothing here. Every other desktop platform now paints
* the Electron overlay (Windows, WSLg, and plain Linux KDE/GNOME), so they all
* reserve the fallback width.
*
* @param {{ isWindows?: boolean, isWsl?: boolean, isMac?: boolean }} opts
*/
function nativeOverlayWidth({ isWindows = false, isWsl = false, isMac = false } = {}) {
if (isMac) return 0
return OVERLAY_FALLBACK_WIDTH
}
module.exports = { OVERLAY_FALLBACK_WIDTH, nativeOverlayWidth }

View File

@@ -18,10 +18,17 @@ test('WSLg paints the same WCO, so it reserves the same fallback width', () => {
assert.equal(nativeOverlayWidth({ isWsl: true }), OVERLAY_FALLBACK_WIDTH)
})
test('plain Linux and macOS reserve nothing', () => {
assert.equal(nativeOverlayWidth({ isWindows: false, isWsl: false }), 0)
assert.equal(nativeOverlayWidth(), 0)
assert.equal(nativeOverlayWidth({}), 0)
test('plain Linux paints the WCO too, so it reserves the fallback width', () => {
// Regression #53185: re-enabling the overlay on plain Linux (KDE/GNOME)
// without reserving its width left the native min/max/close buttons painting
// on top of the app's right-edge titlebar tools.
assert.equal(nativeOverlayWidth({ isWindows: false, isWsl: false }), OVERLAY_FALLBACK_WIDTH)
assert.equal(nativeOverlayWidth(), OVERLAY_FALLBACK_WIDTH)
assert.equal(nativeOverlayWidth({}), OVERLAY_FALLBACK_WIDTH)
})
test('macOS uses traffic lights, not a WCO overlay, so it reserves nothing', () => {
assert.equal(nativeOverlayWidth({ isMac: true }), 0)
})
test('the fallback width is a sane positive pixel value', () => {

View File

@@ -38,19 +38,40 @@ test('desktop background child processes opt into hidden Windows consoles', () =
requireHiddenChildOptions(source, /hermesProcess = spawn\(\s*backend\.command,\s*backend\.args/)
requireHiddenChildOptions(source, /spawn\(\s*py,\s*\['-m', 'hermes_cli\.main', 'uninstall', '--gui-summary'\]/)
assert.match(source, /function unwrapWindowsVenvHermesCommand\(command, dashboardArgs\)/)
assert.match(source, /existing Hermes no-console Python at/)
assert.match(source, /function getNoConsoleVenvPython\(venvRoot\)/)
assert.match(source, /function toNoConsolePython\(pythonPath\)/)
assert.match(source, /function applyWindowsNoConsoleSpawnHints\(backend\)/)
assert.match(source, /function readVenvHome\(venvRoot\)/)
assert.match(source, /path\.join\(venvRoot, 'Scripts', 'pythonw\.exe'\)/)
assert.match(source, /backendStartFailure/)
assert.match(source, /HERMES_DESKTOP_READY_FILE/)
assert.match(source, /readyFile: true/)
assert.match(source, /function unwrapWindowsVenvHermesCommand\(command, backendArgs\)/)
assert.match(source, /function getVenvSitePackagesEntries\(venvRoot\)/)
assert.match(source, /path\.join\(venvRoot, 'Lib', 'site-packages'\)/)
assert.match(source, /args: \['-m', 'hermes_cli\.main', \.\.\.dashboardArgs\]/)
assert.match(source, /args: \['-m', 'hermes_cli\.main', \.\.\.backendArgs\]/)
})
test('desktop backend launches console python so child consoles are inherited, not pythonw', () => {
const source = readElectronFile('main.cjs')
// The flash fix is structural: the backend runs as a console-subsystem
// python.exe under hiddenWindowsChildOptions() (-> CREATE_NO_WINDOW), so it
// owns ONE windowless console that every descendant spawn inherits. Launching
// it as GUI-subsystem pythonw.exe is what made each child allocate (and flash)
// its own console, so the backend command must never be pythonw.
assert.doesNotMatch(source, /pythonw\.exe'\)/, 'backend must not be launched via pythonw.exe')
assert.doesNotMatch(
source,
/function getNoConsoleVenvPython\b/,
'pythonw-conversion helper should be gone; console python is launched directly'
)
assert.doesNotMatch(
source,
/function applyWindowsNoConsoleSpawnHints\b/,
'pythonw spawn-hint rewriter should be gone'
)
// Console python restores stdout, so the port is announced on the normal
// HERMES_DASHBOARD_READY stdout line — no ready-file side channel is set.
assert.doesNotMatch(source, /readyFile: true/, 'no backend should opt into the pythonw ready-file path')
// Both desktop backend launches must still go through hiddenWindowsChildOptions
// so the single backend console is created windowless.
requireHiddenChildOptions(source, /spawn\(\s*backend\.command,\s*backend\.args/)
requireHiddenChildOptions(source, /hermesProcess = spawn\(\s*backend\.command,\s*backend\.args/)
})
test('intentional or interactive desktop child processes stay documented', () => {
@@ -68,5 +89,5 @@ test('bootstrap PowerShell runner hides Windows console children', () => {
const source = readElectronFile('bootstrap-runner.cjs')
assert.match(source, /function hiddenWindowsChildOptions\(options = \{\}\)/)
requireHiddenChildOptions(source, 'spawn(ps, fullArgs')
requireHiddenChildOptions(source, /spawn\(\s*ps,\s*fullArgs/)
})

View File

@@ -0,0 +1,67 @@
'use strict'
// Regression guards for Windows `hermes` resolution in main.cjs.
//
// main.cjs has no module.exports, so these follow the repo's source-assertion
// test pattern (see windows-child-process.test.cjs). They pin the two Windows
// resolution bugs that caused desktop reinstall loops:
// 1. findOnPath() tried the empty extension FIRST, so an extensionless
// Git-Bash `hermes` shim shadowed the real hermes.cmd/hermes.exe; the
// shim then failed the --version probe and the desktop fell through to a
// spurious bootstrap/repair.
// 2. handOffWindowsBootstrapRecovery() chose --update vs the destructive
// --repair by checking ONLY venv\Scripts\hermes.exe (the console-script
// shim, written at the END of venv setup and absent in interrupted
// states), so it escalated to a full venv recreate even on healthy
// installs.
const test = require('node:test')
const assert = require('node:assert/strict')
const fs = require('node:fs')
const path = require('node:path')
function readMain() {
return fs.readFileSync(path.join(__dirname, 'main.cjs'), 'utf8').replace(/\r\n/g, '\n')
}
test('findOnPath tries PATHEXT extensions before the bare (empty) name on Windows', () => {
const source = readMain()
// Fixed order: PATHEXT first, empty string LAST.
assert.match(
source,
/\(process\.env\.PATHEXT \|\| '\.COM;\.EXE;\.BAT;\.CMD'\)\.split\(';'\)\.filter\(Boolean\), ''\]/,
'extensions array must end with the empty string, not start with it'
)
// The buggy empty-first order must not return.
assert.doesNotMatch(
source,
/\['', \.\.\.\(process\.env\.PATHEXT/,
'empty-extension-first order regressed: an extensionless shim can shadow hermes.cmd/.exe'
)
})
test('Windows bootstrap recovery chooses --update when any real-install signal is present', () => {
const source = readMain()
assert.match(source, /const haveRealInstall =/, 'recovery must compute haveRealInstall')
assert.match(
source,
/fileExists\(venvPython\)/,
'recovery must accept the venv interpreter as a real-install signal'
)
assert.match(
source,
/\.hermes-bootstrap-complete/,
'recovery must accept the bootstrap-complete marker as a real-install signal'
)
assert.match(
source,
/updaterArgs = haveRealInstall \? \['--update'/,
'updaterArgs must gate on haveRealInstall'
)
// The old too-narrow check (only venv\Scripts\hermes.exe) must not return.
assert.doesNotMatch(
source,
/updaterArgs = fileExists\(venvHermes\) \?/,
'recovery regressed to gating only on the hermes.exe shim, which forces destructive --repair'
)
})

View File

@@ -18,7 +18,7 @@
"profile:main": "wait-on http://127.0.0.1:5174 && cross-env XCURSOR_SIZE=24 HERMES_DESKTOP_DEV_SERVER=http://127.0.0.1:5174 electron --inspect=9229 .",
"profile:main:cpu": "wait-on http://127.0.0.1:5174 && cross-env XCURSOR_SIZE=24 NODE_OPTIONS=--cpu-prof HERMES_DESKTOP_DEV_SERVER=http://127.0.0.1:5174 electron .",
"start": "npm run build && electron .",
"build": "node scripts/assert-root-install.cjs && node scripts/write-build-stamp.cjs && node scripts/stage-native-deps.cjs && tsc -b && vite build && node scripts/bundle-electron-main.mjs && npm run postbuild",
"build": "node scripts/assert-root-install.cjs && node scripts/write-build-stamp.cjs && node scripts/stage-native-deps.cjs && tsc -b && vite build && npm run postbuild",
"postbuild": "node scripts/assert-dist-built.cjs",
"prebuilder": "node scripts/patch-electron-builder-mac-binary.cjs",
"builder": "cross-env NODE_OPTIONS=--max-old-space-size=16384 node scripts/run-electron-builder.cjs",
@@ -37,7 +37,7 @@
"test:desktop:nsis": "node scripts/test-desktop.mjs nsis",
"test:desktop:existing": "node scripts/test-desktop.mjs existing",
"test:desktop:fresh": "node scripts/test-desktop.mjs fresh",
"test:desktop:platforms": "node --test electron/bootstrap-platform.test.cjs electron/hardening.test.cjs electron/backend-env.test.cjs electron/backend-probes.test.cjs electron/backend-ready.test.cjs electron/bootstrap-runner.test.cjs electron/connection-config.test.cjs electron/dashboard-token.test.cjs electron/gateway-ws-probe.test.cjs electron/oauth-net-request.test.cjs electron/desktop-uninstall.test.cjs electron/session-windows.test.cjs electron/link-title-window.test.cjs electron/workspace-cwd.test.cjs electron/fs-read-dir.test.cjs electron/git-root.test.cjs electron/git-worktree-ops.test.cjs electron/windows-child-process.test.cjs electron/update-remote.test.cjs electron/update-count.test.cjs electron/update-rebuild.test.cjs electron/update-marker.test.cjs electron/update-relaunch.test.cjs electron/windows-user-env.test.cjs electron/wsl-clipboard-image.test.cjs electron/titlebar-overlay-width.test.cjs electron/window-state.test.cjs",
"test:desktop:platforms": "node --test electron/bootstrap-platform.test.cjs electron/hardening.test.cjs electron/backend-env.test.cjs electron/backend-probes.test.cjs electron/backend-ready.test.cjs electron/bootstrap-runner.test.cjs electron/connection-config.test.cjs electron/dashboard-token.test.cjs electron/gateway-ws-probe.test.cjs electron/oauth-net-request.test.cjs electron/desktop-uninstall.test.cjs electron/session-windows.test.cjs electron/link-title-window.test.cjs electron/workspace-cwd.test.cjs electron/fs-read-dir.test.cjs electron/git-root.test.cjs electron/git-worktree-ops.test.cjs electron/windows-child-process.test.cjs electron/update-remote.test.cjs electron/update-count.test.cjs electron/update-rebuild.test.cjs electron/update-marker.test.cjs electron/update-relaunch.test.cjs electron/windows-user-env.test.cjs electron/wsl-clipboard-image.test.cjs electron/titlebar-overlay-width.test.cjs electron/window-state.test.cjs electron/windows-hermes-resolution.test.cjs",
"typecheck": "tsc -p . --noEmit",
"lint": "eslint src/ electron/",
"lint:fix": "eslint src/ electron/ --fix",
@@ -73,6 +73,7 @@
"@tanstack/react-virtual": "^3.13.24",
"@vscode/codicons": "^0.0.45",
"@xterm/addon-fit": "^0.11.0",
"@xterm/addon-serialize": "^0.14.0",
"@xterm/addon-unicode11": "^0.9.0",
"@xterm/addon-web-links": "^0.12.0",
"@xterm/addon-webgl": "^0.19.0",

View File

@@ -1,33 +0,0 @@
#!/usr/bin/env node
// bundle-electron-main.mjs — bundles electron/main.cjs into a single
// self-contained file so the nix build doesn't need to ship node_modules/.
//
// `electron` is provided by the runtime; `node-pty` is staged separately
// via stage-native-deps.cjs. `preload.cjs` is NOT require()'d by main —
// Electron loads it via path.join(__dirname, 'preload.cjs') — so it stays
// as a separate file and doesn't need bundling.
import { build } from 'esbuild'
import { resolve, dirname } from 'node:path'
import { fileURLToPath } from 'node:url'
import { renameSync } from 'node:fs'
const here = dirname(fileURLToPath(import.meta.url))
const root = resolve(here, '..')
const entry = resolve(root, 'electron/main.cjs')
const tmp = resolve(root, 'electron/main.bundled.cjs')
await build({
entryPoints: [entry],
bundle: true,
platform: 'node',
format: 'cjs',
target: 'node20',
outfile: tmp,
external: ['electron', 'node-pty'],
logLevel: 'info'
})
// Overwrite the original with the bundled version.
renameSync(tmp, entry)
console.log(`bundled ${entry}`)

View File

@@ -66,6 +66,31 @@ const NATIVE_DEPS = [
}
]
// Pure-JS runtime dependencies that the packaged electron main require()s but
// that workspace dedup hoists into the repo-root node_modules -- out of reach
// of electron-builder's file collector, exactly like node-pty above. Unlike
// node-pty there is no native binary to select; we stage each package's whole
// directory into build/native-deps/vendor/node_modules/<name> so the dep's own
// internal require()s resolve against a real node_modules tree, and the
// requiring file (electron/git-review-ops.cjs) falls back to that path via
// process.resourcesPath when the normal require() fails. See issue #52735
// (packaged app crashed at launch on `Cannot find module 'simple-git'`).
//
// The closure is resolved at stage time by walking dependencies +
// optionalDependencies, so a simple-git version bump that pulls in a new
// transitive dep can't silently re-introduce the crash.
//
// Layout note: the closure lands in build/native-deps/vendor/node_modules/,
// NOT build/native-deps/node_modules/. electron-builder's file collector
// hard-drops a `node_modules` directory that sits at the ROOT of an
// extraResources copy (app-builder-lib/out/util/filter.js: `if (relative ===
// "node_modules") return false`), but keeps a NESTED one. Nesting under
// `vendor/` makes node_modules a subdirectory so it survives packing; the
// require() fallback in git-review-ops.cjs resolves the matching
// vendor/node_modules path.
const JS_DEP_ROOTS = ['simple-git']
const JS_DEP_STAGE_ROOT = path.join(STAGE_ROOT, 'vendor', 'node_modules')
function rmrf(target) {
fs.rmSync(target, { recursive: true, force: true })
}
@@ -148,12 +173,111 @@ function stageOne(spec) {
console.log(`[stage-native-deps] ${path.relative(APP_ROOT, spec.to)}: ${copied} files`)
}
// Resolve a package's directory by name, searching the repo-root node_modules
// first (where workspace dedup hoists everything) and then the requiring
// package's own node_modules for any non-hoisted nested copy.
//
// We deliberately do NOT use require.resolve(`${name}/package.json`): packages
// with an "exports" map that doesn't list "./package.json" (e.g. simple-git
// 3.x) make that subpath unresolvable under Node's exports enforcement
// (ERR_PACKAGE_PATH_NOT_EXPORTED), which fails on CI even though it happened to
// work locally. Instead resolve the package's main entry (exports-aware) and
// walk up to the directory whose package.json's "name" matches.
function resolvePkgDir(name, fromDir) {
const searchPaths = [fromDir, REPO_ROOT, path.join(REPO_ROOT, 'node_modules')]
let entry
try {
entry = require.resolve(name, { paths: searchPaths })
} catch {
return null
}
// Walk up from the resolved entry file to the package root: the first
// ancestor dir whose package.json declares this package's name.
let dir = path.dirname(entry)
while (true) {
const pjPath = path.join(dir, 'package.json')
try {
const pj = JSON.parse(fs.readFileSync(pjPath, 'utf8'))
if (pj.name === name) {
return dir
}
} catch {
// no package.json here (or unreadable) — keep walking up
}
const parent = path.dirname(dir)
if (parent === dir) {
return null
}
dir = parent
}
}
// Walk dependencies + optionalDependencies from each root package and return
// the set of resolved package directories in the runtime closure. Keyed by
// package name so a dep reached via two paths is staged once.
function resolveJsClosure(roots) {
const closure = new Map() // name -> absolute package dir
const stack = roots.map(name => ({ name, fromDir: REPO_ROOT }))
while (stack.length) {
const { name, fromDir } = stack.pop()
if (closure.has(name)) continue
const dir = resolvePkgDir(name, fromDir)
if (!dir) {
throw new Error(
`stage-native-deps: could not resolve '${name}' for the simple-git ` +
`closure. Run \`npm install\` at the workspace root first.`
)
}
closure.set(name, dir)
let pj
try {
pj = JSON.parse(fs.readFileSync(path.join(dir, 'package.json'), 'utf8'))
} catch {
continue
}
const deps = { ...(pj.dependencies || {}), ...(pj.optionalDependencies || {}) }
for (const depName of Object.keys(deps)) {
stack.push({ name: depName, fromDir: dir })
}
}
return closure
}
// Stage the resolved JS dependency closure into build/native-deps/vendor/node_modules/
// so the packaged app (and the nix output) can require() it from
// process.resourcesPath when the hoisted-root require() isn't reachable. Each
// package is copied whole (minus node_modules/ — the closure is flattened so
// every dep already has its own top-level entry) into a real node_modules
// layout, which keeps the deps' own internal require()s working unchanged.
function stageJsClosure(roots) {
const closure = resolveJsClosure(roots)
rmrf(JS_DEP_STAGE_ROOT)
ensureDir(JS_DEP_STAGE_ROOT)
let staged = 0
for (const [name, fromDir] of closure) {
const dest = path.join(JS_DEP_STAGE_ROOT, name)
ensureDir(path.dirname(dest))
// Copy the package directory but skip any nested node_modules/ — the
// closure is flattened, so nested copies would just bloat the bundle.
fs.cpSync(fromDir, dest, {
recursive: true,
filter: src => path.basename(src) !== 'node_modules'
})
staged += 1
}
console.log(
`[stage-native-deps] vendor/node_modules/: ${staged} package(s) ` +
`(${[...closure.keys()].sort().join(', ')})`
)
}
function main() {
rmrf(STAGE_ROOT)
ensureDir(STAGE_ROOT)
for (const spec of NATIVE_DEPS) {
stageOne(spec)
}
stageJsClosure(JS_DEP_ROOTS)
}
main()

View File

@@ -19,7 +19,7 @@ import {
type SubagentStreamEntry
} from '@/store/subagents'
import { OverlayView } from '../overlays/overlay-view'
import { Panel, PanelEmpty, PanelHeader } from '../overlays/panel'
// Mirrors statusGlyph() in tool-fallback.tsx so subagent rows speak the
// same visual vocabulary as the chat tool blocks.
@@ -86,18 +86,16 @@ export function AgentsView({ onClose }: AgentsViewProps) {
const tree = useMemo(() => buildSubagentTree(allSubagents(subagentsBySession)), [subagentsBySession])
return (
<OverlayView
closeLabel={t.agents.close}
contentClassName="px-5 pt-5 pb-4 sm:px-6"
onClose={onClose}
rootClassName="mx-auto max-w-3xl"
>
<header className="mb-3 shrink-0">
<h2 className="text-sm font-semibold text-foreground">{t.agents.title}</h2>
<p className="text-xs text-muted-foreground/80">{t.agents.subtitle}</p>
</header>
<SubagentTree tree={tree} />
</OverlayView>
<Panel closeLabel={t.agents.close} onClose={onClose}>
{tree.length === 0 ? (
<PanelEmpty description={t.agents.emptyDesc} icon="hubot" title={t.agents.emptyTitle} />
) : (
<>
<PanelHeader subtitle={t.agents.subtitle} title={t.agents.title} />
<SubagentTree tree={tree} />
</>
)}
</Panel>
)
}

View File

@@ -1,10 +1,9 @@
import { Fragment, memo, type ReactNode, useState } from 'react'
import { Fragment, memo, type ReactNode } from 'react'
import { openAgentTerminal } from '@/app/right-sidebar/terminal/terminals'
import { StatusRow } from '@/components/chat/status-row'
import { TerminalOutput } from '@/components/chat/terminal-output'
import { Button } from '@/components/ui/button'
import { Codicon } from '@/components/ui/codicon'
import { DisclosureCaret } from '@/components/ui/disclosure-caret'
import { GlyphSpinner } from '@/components/ui/glyph-spinner'
import { Tip } from '@/components/ui/tooltip'
import { type Translations, useI18n } from '@/i18n'
@@ -82,7 +81,6 @@ interface StatusItemRowProps {
export const StatusItemRow = memo(function StatusItemRow({ item, onDismiss, onOpen, onStop }: StatusItemRowProps) {
const { t } = useI18n()
const s = t.statusStack
const [outputOpen, setOutputOpen] = useState(false)
const failed = item.state === 'failed'
const running = item.state === 'running'
@@ -94,8 +92,10 @@ export const StatusItemRow = memo(function StatusItemRow({ item, onDismiss, onOp
: null
const canOpen = item.type === 'subagent' && !!onOpen
const hasOutput = item.type === 'background' && !!item.output
const onActivate = canOpen ? onOpen : hasOutput ? () => setOutputOpen(open => !open) : undefined
// Background rows link to their read-only terminal tab; subagents open their session.
const onActivate =
item.type === 'background' ? () => openAgentTerminal(item.id, item.title) : canOpen ? onOpen : undefined
return (
<Fragment>
@@ -146,9 +146,7 @@ export const StatusItemRow = memo(function StatusItemRow({ item, onDismiss, onOp
{s.exit(item.exitCode)}
</span>
)}
{hasOutput && <DisclosureCaret className="shrink-0 text-muted-foreground/45" open={outputOpen} size="0.8em" />}
</StatusRow>
{hasOutput && outputOpen && <TerminalOutput className="mx-auto mb-1 max-w-[90%]" text={item.output!} />}
</Fragment>
)
})

View File

@@ -5,6 +5,7 @@ import { droppedFileInlineRef } from '@/app/chat/composer/inline-refs'
import { formatRefValue } from '@/components/assistant-ui/directive-text'
import { useI18n } from '@/i18n'
import { attachmentId, contextPath, pathLabel } from '@/lib/chat-runtime'
import { readDesktopFileDataUrl, selectDesktopPaths } from '@/lib/desktop-fs'
import {
addComposerAttachment,
type ComposerAttachment,
@@ -262,7 +263,7 @@ export function useComposerActions({ activeSessionId, currentCwd, requestGateway
const pickContextPaths = useCallback(
async (kind: 'file' | 'folder') => {
const paths = await window.hermesDesktop?.selectPaths({
const paths = await selectDesktopPaths({
title: kind === 'file' ? 'Add files as context' : 'Add folders as context',
defaultPath: currentCwd || undefined,
directories: kind === 'folder'
@@ -347,7 +348,7 @@ export function useComposerActions({ activeSessionId, currentCwd, requestGateway
attachToMain(baseAttachment)
try {
const previewUrl = await window.hermesDesktop?.readFileDataUrl(filePath)
const previewUrl = await readDesktopFileDataUrl(filePath)
if (previewUrl) {
addComposerAttachment({ ...baseAttachment, previewUrl })
@@ -395,7 +396,7 @@ export function useComposerActions({ activeSessionId, currentCwd, requestGateway
)
const pickImages = useCallback(async () => {
const paths = await window.hermesDesktop?.selectPaths({
const paths = await selectDesktopPaths({
title: copy.attachImages,
defaultPath: currentCwd || undefined,
filters: [

View File

@@ -1149,7 +1149,8 @@ export function ChatSidebar({
const showSessionSkeletons = sessionsLoading && sortedSessions.length === 0
const showSessionSections = showSessionSkeletons || sortedSessions.length > 0
const showSessionSections =
showSessionSkeletons || sortedSessions.length > 0 || projectModel.length > 0
// Each reorderable list reports its OWN new id order; persisting is a direct,
// typed write — no id-prefix sniffing to figure out which level moved.
@@ -1537,7 +1538,7 @@ export function ChatSidebar({
</div>
)}
{contentVisible && !showSessionSections && <div className="min-h-0 flex-1" />}
{contentVisible && !showSessionSections && <SidebarBlankState onNewProject={openProjectCreate} />}
{contentVisible && (
<div className="shrink-0 px-0.5 pb-1 pt-0.5">
@@ -1618,6 +1619,29 @@ function SidebarSessionSkeletons() {
)
}
function SidebarBlankState({ onNewProject }: { onNewProject: () => void }) {
const { t } = useI18n()
const s = t.sidebar
return (
<div className="grid min-h-0 flex-1 place-items-center px-4 text-center">
<div className="flex flex-col items-center gap-2">
<Codicon className="text-(--ui-text-quaternary)" name="root-folder" size="1.25rem" />
<p className="text-xs text-(--ui-text-tertiary)">{s.noSessions}</p>
<Button
className="mt-0.5 text-(--ui-text-secondary)"
onClick={onNewProject}
size="sm"
variant="ghost"
>
<Codicon name="add" size="0.75rem" />
{s.projects.newButton}
</Button>
</div>
</div>
)
}
function SidebarPinnedEmptyState() {
const { t } = useI18n()

View File

@@ -87,21 +87,25 @@ export function ProjectDialog() {
}
const pickFolder = async () => {
const dir = await pickProjectFolder()
try {
const dir = await pickProjectFolder()
if (!dir) {
return
if (!dir) {
return
}
const projectId = state?.projectId
if (mode === 'add-folder' && projectId) {
await runSubmit(() => addProjectFolder(projectId, dir))
return
}
setFolders(prev => (prev.includes(dir) ? prev : [...prev, dir]))
} catch (err) {
notifyError(err, p.createFailed)
}
const projectId = state?.projectId
if (mode === 'add-folder' && projectId) {
await runSubmit(() => addProjectFolder(projectId, dir))
return
}
setFolders(prev => (prev.includes(dir) ? prev : [...prev, dir]))
}
const submit = async () => {
@@ -145,7 +149,10 @@ export function ProjectDialog() {
return (
<Dialog onOpenChange={onOpenChange} open={open}>
<DialogContent className="max-w-md">
<DialogContent
className="max-w-md"
onInteractOutside={event => event.preventDefault()}
>
<DialogHeader>
<DialogTitle>{title}</DialogTitle>
{mode === 'create' && <DialogDescription>{p.createDesc}</DialogDescription>}

View File

@@ -3,6 +3,7 @@ import { useEffect, useMemo, useState } from 'react'
import type { HermesGitWorktree } from '@/global'
import type { SessionInfo } from '@/hermes'
import { desktopGit } from '@/lib/desktop-git'
import { mapPool } from '@/lib/pool'
import { $sidebarWorkspaceCollapsedIds, toggleWorkspaceNodeCollapsed } from '@/store/layout'
import { $worktreeRefreshToken } from '@/store/projects'
@@ -88,7 +89,7 @@ export function useRepoWorktreeMap(
const refreshToken = useStore($worktreeRefreshToken)
useEffect(() => {
const git = window.hermesDesktop?.git
const git = desktopGit()
if (!enabled || !repoPaths.length || !git?.worktreeList) {
setMap({})

View File

@@ -9,7 +9,16 @@ import { getActionStatus, getLogs, getStatus, getUsageAnalytics, restartGateway,
import type { ActionStatusResponse, AnalyticsResponse, StatusResponse } from '@/hermes'
import { useI18n } from '@/i18n'
import { sessionTitle } from '@/lib/chat-runtime'
import { Activity, AlertCircle, BarChart3, Bookmark, BookmarkFilled, Download, Pin, Trash2 } from '@/lib/icons'
import {
Activity,
AlertCircle,
BarChart3,
Bookmark,
BookmarkFilled,
Download,
MessageCircle,
Trash2
} from '@/lib/icons'
import { exportSession } from '@/lib/session-export'
import { cn } from '@/lib/utils'
import { upsertDesktopActionTask } from '@/store/activity'
@@ -263,7 +272,7 @@ export function CommandCenterView({ initialSection, onClose, onDeleteSession, on
{SECTIONS.map(value => (
<OverlayNavItem
active={section === value}
icon={value === 'sessions' ? Pin : value === 'system' ? Activity : BarChart3}
icon={value === 'sessions' ? MessageCircle : value === 'system' ? Activity : BarChart3}
key={value}
label={cc.sections[value]}
onClick={() => setSection(value)}
@@ -361,7 +370,7 @@ export function CommandCenterView({ initialSection, onClose, onDeleteSession, on
/>
) : (
<div className="grid min-h-0 flex-1 grid-rows-[auto_minmax(0,1fr)] gap-4">
<div className="border-b border-(--ui-stroke-tertiary) pb-4">
<div>
{status ? (
<div className="grid gap-2">
<div className="flex items-start justify-between gap-3">
@@ -406,7 +415,7 @@ export function CommandCenterView({ initialSection, onClose, onDeleteSession, on
)}
</div>
<div className="flex min-h-0 flex-col">
<div className="flex min-h-0 flex-col pt-2">
<div className="mb-2 flex items-center justify-between">
<span className="text-[0.625rem] font-medium uppercase tracking-[0.08em] text-(--ui-text-tertiary)">
{cc.recentLogs}
@@ -503,7 +512,7 @@ function UsagePanel({ error, loading, onRefresh, period, usage }: UsagePanelProp
</span>
)}
<div className="grid grid-cols-2 gap-x-4 gap-y-4 border-b border-(--ui-stroke-tertiary) pb-5 sm:grid-cols-3">
<div className="grid grid-cols-2 gap-x-4 gap-y-4 py-2 sm:grid-cols-3">
<UsageStat label={cc.statSessions} value={formatInteger(totals.total_sessions)} />
<UsageStat label={cc.statApiCalls} value={formatInteger(totals.total_api_calls)} />
<UsageStat
@@ -563,7 +572,7 @@ function UsagePanel({ error, loading, onRefresh, period, usage }: UsagePanelProp
)}
</section>
<div className="grid min-h-0 gap-x-8 gap-y-5 border-t border-(--ui-stroke-tertiary) pt-5 sm:grid-cols-2">
<div className="grid min-h-0 gap-x-8 gap-y-5 pt-1 sm:grid-cols-2">
<UsageList
emptyLabel={cc.noModelUsage}
rows={byModel.slice(0, 6).map(entry => ({

View File

@@ -14,7 +14,6 @@ import {
DialogTitle
} from '@/components/ui/dialog'
import { Input } from '@/components/ui/input'
import { SearchField } from '@/components/ui/search-field'
import { Select, SelectContent, SelectItem, SelectTrigger, SelectValue } from '@/components/ui/select'
import { Textarea } from '@/components/ui/textarea'
import {
@@ -30,14 +29,28 @@ import {
updateCronJob
} from '@/hermes'
import { type Translations, useI18n } from '@/i18n'
import { AlertTriangle, Clock } from '@/lib/icons'
import { cn } from '@/lib/utils'
import { AlertTriangle } from '@/lib/icons'
import { $cronFocusJobId, $cronJobs, setCronFocusJobId, setCronJobs, updateCronJobs } from '@/store/cron'
import { notify, notifyError } from '@/store/notifications'
import { useRefreshHotkey } from '../hooks/use-refresh-hotkey'
import { OverlayMain, OverlayNewButton, OverlaySidebar, OverlaySplitLayout } from '../overlays/overlay-split-layout'
import { OverlayView } from '../overlays/overlay-view'
import {
Panel,
PanelAction,
PanelAddButton,
PanelBlock,
PanelBody,
PanelDetail,
PanelEmpty,
PanelHeader,
PanelList,
PanelListRow,
PanelMeta,
PanelPill,
type PanelPillTone,
PanelRowMenu,
PanelSectionLabel
} from '../overlays/panel'
import type { SetStatusbarItemGroup } from '../shell/statusbar-controls'
import { jobState, jobTitle, STATE_DOT } from './job-state'
@@ -56,7 +69,7 @@ const SCHEDULE_OPTIONS: ReadonlyArray<ScheduleOption> = [
{ value: 'custom' }
]
const STATE_TONE: Record<string, 'good' | 'muted' | 'warn' | 'bad'> = {
const STATE_TONE: Record<string, PanelPillTone> = {
enabled: 'good',
scheduled: 'good',
running: 'good',
@@ -66,13 +79,6 @@ const STATE_TONE: Record<string, 'good' | 'muted' | 'warn' | 'bad'> = {
completed: 'muted'
}
const PILL_TONE: Record<'good' | 'muted' | 'warn' | 'bad', string> = {
good: 'bg-primary/10 text-primary',
muted: 'bg-muted text-muted-foreground',
warn: 'bg-amber-500/10 text-amber-600 dark:text-amber-300',
bad: 'bg-destructive/10 text-destructive'
}
const asText = (value: unknown): string => (typeof value === 'string' ? value : '')
const truncate = (value: string, max = 80): string => (value.length > max ? `${value.slice(0, max)}` : value)
@@ -321,7 +327,7 @@ export function CronView({ onClose, onOpenSession, setStatusbarItemGroup: _setSt
pendingScrollRef.current = null
requestAnimationFrame(() => {
document.querySelector(`[data-cron-row="${CSS.escape(target)}"]`)?.scrollIntoView({ block: 'nearest' })
document.querySelector(`[data-panel-row="${CSS.escape(target)}"]`)?.scrollIntoView({ block: 'nearest' })
})
}, [selectedJob])
@@ -406,60 +412,66 @@ export function CronView({ onClose, onOpenSession, setStatusbarItemGroup: _setSt
}
return (
<OverlayView closeLabel={c.close} onClose={onClose}>
<Panel closeLabel={c.close} onClose={onClose}>
{loading && jobs.length === 0 ? (
<PageLoader label={c.loading} />
) : totalCount === 0 ? (
<PanelEmpty
action={
<Button onClick={() => setEditor({ mode: 'create' })} size="sm">
{c.newCron}
</Button>
}
description={c.emptyDescNew}
icon="watch"
title={c.emptyTitleNew}
/>
) : (
<OverlaySplitLayout>
<OverlaySidebar>
<OverlayNewButton label={c.newCron} onClick={() => setEditor({ mode: 'create' })} />
{totalCount > 0 && (
<SearchField
aria-label={c.search}
containerClassName="mb-1 w-full px-2"
onChange={setQuery}
placeholder={c.search}
value={query}
/>
)}
{visibleJobs.map(job => (
<CronJobListRow
active={selectedJob?.id === job.id}
c={c}
job={job}
key={job.id}
onSelect={() => setSelectedJobId(job.id)}
/>
))}
{visibleJobs.length === 0 && (
<p className="px-2 py-4 text-center text-xs text-muted-foreground">
{totalCount === 0 ? c.emptyTitleNew : c.emptyTitleSearch}
</p>
)}
</OverlaySidebar>
<>
<PanelHeader subtitle={c.count(totalCount)} title={c.title} />
<PanelBody>
<PanelList
onSearchChange={setQuery}
searchLabel={c.search}
searchPlaceholder={c.search}
searchValue={query}
>
{visibleJobs.map(job => (
<CronJobListRow
active={selectedJob?.id === job.id}
job={job}
key={job.id}
menu={
<PanelRowMenu
items={[
{ icon: 'edit', label: c.edit, onSelect: () => setEditor({ mode: 'edit', job }) },
{ icon: 'trash', label: t.common.delete, onSelect: () => setPendingDelete(job), tone: 'danger' }
]}
/>
}
onSelect={() => setSelectedJobId(job.id)}
/>
))}
{visibleJobs.length === 0 && (
<p className="px-2 py-4 text-center text-xs text-muted-foreground">{c.emptyTitleSearch}</p>
)}
<PanelAddButton label={c.newCron} onClick={() => setEditor({ mode: 'create' })} />
</PanelList>
<OverlayMain className="px-0">
{selectedJob ? (
<CronJobDetail
busy={busyJobId === selectedJob.id}
c={c}
job={selectedJob}
onDelete={() => setPendingDelete(selectedJob)}
onEdit={() => setEditor({ mode: 'edit', job: selectedJob })}
onOpenSession={onOpenSession}
onPauseResume={() => void handlePauseResume(selectedJob)}
onTrigger={() => void handleTrigger(selectedJob)}
/>
) : (
<div className="grid h-full place-items-center px-6 py-12 text-center text-sm text-muted-foreground">
<div>
<Clock className="mx-auto size-6 text-muted-foreground/60" />
<p className="mt-3">{totalCount === 0 ? c.emptyDescNew : c.emptyDescSearch}</p>
</div>
</div>
<PanelEmpty description={c.emptyDescSearch} icon="search" />
)}
</OverlayMain>
</OverlaySplitLayout>
</PanelBody>
</>
)}
<CronEditorDialog editor={editor} onClose={() => setEditor({ mode: 'closed' })} onSave={handleEditorSave} />
@@ -488,42 +500,32 @@ export function CronView({ onClose, onOpenSession, setStatusbarItemGroup: _setSt
</DialogFooter>
</DialogContent>
</Dialog>
</OverlayView>
</Panel>
)
}
function CronJobListRow({
active,
c,
job,
menu,
onSelect
}: {
active: boolean
c: Translations['cron']
job: CronJob
menu?: React.ReactNode
onSelect: () => void
}) {
const state = jobState(job)
return (
<button
className={cn(
'flex w-full flex-col items-start gap-0.5 rounded-md px-2 py-1.5 text-left transition-colors',
active ? 'bg-accent text-foreground' : 'text-foreground/85 hover:bg-accent/60'
)}
data-cron-row={job.id}
onClick={onSelect}
type="button"
>
<span className="flex w-full items-center gap-2">
<span
aria-hidden="true"
className={cn('size-1.5 shrink-0 rounded-full', STATE_DOT[state] ?? 'bg-muted-foreground')}
/>
<span className="min-w-0 flex-1 truncate text-sm font-medium">{jobTitle(job)}</span>
</span>
<span className="truncate pl-3.5 text-[0.66rem] text-muted-foreground">{jobScheduleDisplay(job)}</span>
</button>
<PanelListRow
active={active}
dotClassName={STATE_DOT[state] ?? 'bg-muted-foreground'}
menu={menu}
onSelect={onSelect}
rowKey={job.id}
title={jobTitle(job)}
/>
)
}
@@ -531,8 +533,6 @@ function CronJobDetail({
busy,
c,
job,
onDelete,
onEdit,
onOpenSession,
onPauseResume,
onTrigger
@@ -540,8 +540,6 @@ function CronJobDetail({
busy: boolean
c: Translations['cron']
job: CronJob
onDelete: () => void
onEdit: () => void
onOpenSession?: (sessionId: string) => void
onPauseResume: () => void
onTrigger: () => void
@@ -552,69 +550,49 @@ function CronJobDetail({
const prompt = jobPrompt(job)
return (
<div className="flex h-full min-h-0 flex-col">
<div className="min-h-0 flex-1 overflow-y-auto">
<div className="mx-auto max-w-2xl space-y-6 px-6 py-6">
<header className="space-y-3">
<div className="flex flex-wrap items-start justify-between gap-3">
<div className="min-w-0 space-y-1">
<div className="flex flex-wrap items-center gap-2">
<h3 className="text-xl font-semibold tracking-tight">{jobTitle(job)}</h3>
<StatePill tone={STATE_TONE[state] ?? 'muted'}>{c.states[state] ?? state}</StatePill>
{deliver && deliver !== DEFAULT_DELIVER && (
<StatePill tone="muted">{c.deliveryLabels[deliver] ?? deliver}</StatePill>
)}
</div>
<div className="flex flex-wrap items-center gap-x-4 gap-y-1 text-[0.7rem] text-muted-foreground">
<span className="inline-flex items-center gap-1">
<Clock className="size-3" />
{jobScheduleDisplay(job)}
</span>
<span>
{c.last} {formatTime(job.last_run_at)}
</span>
<span>
{c.next} {formatTime(job.next_run_at)}
</span>
</div>
</div>
<div className="flex shrink-0 items-center gap-1">
<Button disabled={busy} onClick={onPauseResume} size="sm" variant="outline">
<Codicon name={isPaused ? 'play' : 'debug-pause'} size="0.875rem" />
{isPaused ? c.resumeTitle : c.pauseTitle}
</Button>
<Button disabled={busy} onClick={onTrigger} size="sm" variant="outline">
<Codicon name="zap" size="0.875rem" />
{c.triggerNow}
</Button>
<Button onClick={onEdit} size="sm" variant="outline">
<Codicon name="edit" size="0.875rem" />
{c.edit}
</Button>
<Button
className="text-muted-foreground hover:bg-destructive/10 hover:text-destructive"
onClick={onDelete}
size="sm"
variant="ghost"
>
<Codicon name="trash" size="0.875rem" />
</Button>
</div>
</div>
{prompt && <p className="line-clamp-3 text-xs text-muted-foreground">{prompt}</p>}
{job.last_error && (
<p className="inline-flex items-start gap-1 text-[0.7rem] text-destructive">
<AlertTriangle className="mt-px size-3 shrink-0" />
<span className="line-clamp-2">{job.last_error}</span>
</p>
)}
</header>
<CronJobRuns c={c} jobId={job.id} onOpenSession={onOpenSession} />
<PanelDetail>
<header className="space-y-3">
<div className="flex flex-wrap items-start justify-between gap-3">
<div className="flex min-w-0 flex-wrap items-center gap-2">
<h3 className="text-[0.95rem] font-semibold tracking-tight text-foreground">{jobTitle(job)}</h3>
<PanelPill tone={STATE_TONE[state] ?? 'muted'}>{c.states[state] ?? state}</PanelPill>
</div>
<div className="flex shrink-0 items-center gap-0.5">
<PanelAction disabled={busy} icon={isPaused ? 'play' : 'debug-pause'} onClick={onPauseResume}>
{isPaused ? c.resumeTitle : c.pauseTitle}
</PanelAction>
<PanelAction disabled={busy} icon="zap" onClick={onTrigger}>
{c.triggerNow}
</PanelAction>
</div>
</div>
</div>
</div>
<PanelMeta
rows={[
{ label: c.frequencyLabel, value: jobScheduleDisplay(job) },
{ label: c.last.replace(/:$/, ''), value: formatTime(job.last_run_at) },
{ label: c.next.replace(/:$/, ''), value: formatTime(job.next_run_at) },
{ label: c.deliverLabel, value: c.deliveryLabels[deliver] ?? deliver }
]}
/>
{job.last_error ? (
<div className="flex items-start gap-1.5 rounded bg-destructive/10 p-2 text-[0.7rem] text-destructive">
<AlertTriangle className="mt-px size-3 shrink-0" />
<span className="min-w-0 break-words">{job.last_error}</span>
</div>
) : null}
</header>
{prompt ? (
<section className="space-y-1.5">
<PanelSectionLabel>{c.promptLabel}</PanelSectionLabel>
<PanelBlock>{prompt}</PanelBlock>
</section>
) : null}
<CronJobRuns c={c} jobId={job.id} onOpenSession={onOpenSession} />
</PanelDetail>
)
}
@@ -685,10 +663,10 @@ function CronJobRuns({
return (
<div>
<div className="mb-1.5 text-[0.62rem] font-medium uppercase tracking-wide text-muted-foreground">
<PanelSectionLabel className="mb-1.5">
{c.runHistory}
{runs && runs.length > 0 ? ` · ${runs.length}` : ''}
</div>
</PanelSectionLabel>
{runs === null ? (
<div className="flex items-center gap-1.5 py-1 text-xs text-muted-foreground">
<Codicon name="loading" size="0.75rem" spinning />
@@ -699,13 +677,13 @@ function CronJobRuns({
<div className="flex flex-col gap-px">
{runs.map(run => (
<button
className="flex items-center justify-between gap-3 rounded-md px-2 py-1 text-left text-xs hover:bg-(--chrome-action-hover) focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring/40"
className="flex items-center justify-between gap-3 rounded-md px-2 py-1 text-left text-xs transition-colors duration-100 hover:bg-(--ui-row-hover-background) focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring/40"
key={run.id}
onClick={() => onOpenSession?.(run.id)}
type="button"
>
<span className="truncate text-foreground">{run.title?.trim() || run.preview?.trim() || run.id}</span>
<span className="shrink-0 text-[0.62rem] text-muted-foreground tabular-nums">
<span className="truncate text-foreground/85">{run.title?.trim() || run.preview?.trim() || run.id}</span>
<span className="shrink-0 text-[0.62rem] text-muted-foreground/55 tabular-nums">
{formatRunTime(run.last_active || run.started_at)}
</span>
</button>
@@ -716,16 +694,6 @@ function CronJobRuns({
)
}
function StatePill({ children, tone }: { children: string; tone: keyof typeof PILL_TONE }) {
return (
<span
className={cn('inline-flex items-center rounded-full px-1.5 py-0.5 text-[0.64rem] capitalize', PILL_TONE[tone])}
>
{children}
</span>
)
}
function CronEditorDialog({
editor,
onClose,

View File

@@ -10,6 +10,7 @@ import { GatewayConnectingOverlay } from '@/components/gateway-connecting-overla
import { Pane, PaneMain } from '@/components/pane-shell'
import { RemoteDisplayBanner } from '@/components/remote-display-banner'
import { useMediaQuery } from '@/hooks/use-media-query'
import { isFocusWithin } from '@/lib/keybinds/combo'
import { cn } from '@/lib/utils'
import { useSkinCommand } from '@/themes/use-skin-command'
@@ -124,9 +125,12 @@ import { ModelVisibilityOverlay } from './model-visibility-overlay'
import { PetGenerateOverlay } from './pet-generate/pet-generate-overlay'
import { RightSidebarPane } from './right-sidebar'
import { FileActionDialogs } from './right-sidebar/file-actions'
import { RemoteFolderPicker } from './right-sidebar/files/remote-picker'
import { ReviewPane } from './right-sidebar/review'
import { $terminalTakeover } from './right-sidebar/store'
import { PersistentTerminal, TerminalSlot } from './right-sidebar/terminal/persistent'
import { TerminalPaneChrome } from './right-sidebar/terminal/chrome'
import { PersistentTerminal } from './right-sidebar/terminal/persistent'
import { closeActiveTerminal } from './right-sidebar/terminal/terminals'
import { CRON_ROUTE, NEW_CHAT_ROUTE, routeSessionId, sessionRoute, SETTINGS_ROUTE } from './routes'
import { SessionPickerOverlay } from './session-picker-overlay'
import { SessionSwitcher } from './session-switcher'
@@ -387,11 +391,25 @@ export function DesktopController() {
useEffect(() => {
const onKeyDown = (event: KeyboardEvent) => {
if (!$filePreviewTarget.get() && !$previewTarget.get()) {
if (event.altKey || event.shiftKey || event.key.toLowerCase() !== 'w' || (!event.metaKey && !event.ctrlKey)) {
return
}
if ((event.metaKey || event.ctrlKey) && !event.altKey && !event.shiftKey && event.key.toLowerCase() === 'w') {
// Terminal focused: ⌘W closes the active terminal. Ctrl+W is left untouched
// for the shell's werase, and nothing else may steal ⌘/Ctrl+W from a
// focused terminal (so it never closes a preview tab out from under it).
if (isFocusWithin('[data-terminal]')) {
if (event.metaKey && !event.ctrlKey) {
event.preventDefault()
event.stopPropagation()
closeActiveTerminal()
}
return
}
// Otherwise ⌘/Ctrl+W closes the active preview tab when one is open.
if ($filePreviewTarget.get() || $previewTarget.get()) {
event.preventDefault()
event.stopPropagation()
closeActiveRightRailTab()
@@ -580,7 +598,7 @@ export function DesktopController() {
}
}, [])
const { gatewayLogLines, inferenceStatus, statusSnapshot } = useStatusSnapshot(gatewayState, requestGateway)
const { inferenceStatus, statusSnapshot } = useStatusSnapshot(gatewayState, requestGateway)
const updateActiveSessionRuntimeInfo = useCallback(
(info: { branch?: string; cwd?: string }) => {
@@ -1060,7 +1078,6 @@ export function DesktopController() {
commandCenterOpen,
extraLeftItems: statusbarItemGroups.flat.left,
extraRightItems: statusbarItemGroups.flat.right,
gatewayLogLines,
gatewayState,
inferenceStatus,
openAgents,
@@ -1095,11 +1112,13 @@ export function DesktopController() {
/>
)
// One PTY-backed terminal mounted forever; <TerminalSlot /> placeholders decide
// where it shows. Lives in main's stacking context (not the root overlay layer)
// so pane resize handles still paint above it. Toggling never rebuilds the shell.
// The persistent xterm layer (one host per terminal tab), CSS-overlaid onto the
// pane's <TerminalSlot />. Lives in main's stacking context (not the root overlay
// layer) so pane resize handles still paint above it. Terminals own their state
// (incl. a snapshotted cwd) independent of the session, so switching sessions
// never rebuilds or closes them; toggling the pane never rebuilds the shells.
const mainOverlays = (
<PersistentTerminal cwd={currentCwd} onAddSelectionToChat={composer.addTerminalSelectionAttachment} />
<PersistentTerminal onAddSelectionToChat={composer.addTerminalSelectionAttachment} />
)
const overlays = (
@@ -1127,6 +1146,7 @@ export function DesktopController() {
<PetGenerateOverlay />
<SessionSwitcher />
<FileActionDialogs />
<RemoteFolderPicker />
{settingsOpen && (
<Suspense fallback={null}>
@@ -1329,7 +1349,7 @@ export function DesktopController() {
terminalAsRow ? 'border-l border-(--ui-stroke-secondary) pt-0' : 'pt-(--titlebar-height)'
)}
>
<TerminalSlot />
<TerminalPaneChrome />
</div>
</Pane>
)

View File

@@ -1,10 +1,10 @@
import { isGatewayReauthRequired, resolveGatewayWsUrl } from '@hermes/shared'
import { useEffect, useRef } from 'react'
import type { HermesConnection } from '@/global'
import { HermesGateway } from '@/hermes'
import { translateNow } from '@/i18n'
import { desktopDefaultCwd } from '@/lib/desktop-fs'
import { isGatewayReauthRequired, resolveGatewayWsUrl } from '@/lib/gateway-ws-url'
import {
$desktopBoot,
applyDesktopBootProgress,

View File

@@ -1,8 +1,8 @@
import { isGatewayReauthRequired, resolveGatewayWsUrl } from '@hermes/shared'
import { useStore } from '@nanostores/react'
import { useCallback, useEffect, useRef } from 'react'
import type { HermesGateway } from '@/hermes'
import { isGatewayReauthRequired, resolveGatewayWsUrl } from '@/lib/gateway-ws-url'
import { $gateway, ensureActiveGatewayOpen, isActivePrimary } from '@/store/gateway'
import { $activeGatewayProfile } from '@/store/profile'
import { $gatewayState, setConnection } from '@/store/session'

View File

@@ -2,6 +2,7 @@ import { useEffect, useRef } from 'react'
import { useNavigate } from 'react-router-dom'
import { $terminalTakeover, setTerminalTakeover } from '@/app/right-sidebar/store'
import { closeActiveTerminal, createTerminal, cycleTerminal } from '@/app/right-sidebar/terminal/terminals'
import { PANE_TOGGLE_REVEAL_EVENT } from '@/components/pane-shell'
import { matchesQuery } from '@/hooks/use-media-query'
import { PROFILE_SLOT_COUNT, SESSION_SLOT_COUNT } from '@/lib/keybinds/actions'
@@ -164,6 +165,17 @@ export function useKeybinds(deps: KeybindRuntimeDeps): void {
'view.toggleReview': toggleReview,
'view.showFiles': showFiles,
'view.showTerminal': () => setTerminalTakeover(!$terminalTakeover.get()),
// Create first so the pane's open-effect ensure sees a non-empty set and
// doesn't also spawn one — net effect is exactly one fresh terminal.
'view.newTerminal': () => {
createTerminal()
setTerminalTakeover(true)
},
// Switch / close only act while the pane is open (no focus-scoping here, so
// this stands in for "terminal is showing").
'view.nextTerminal': () => $terminalTakeover.get() && cycleTerminal(1),
'view.prevTerminal': () => $terminalTakeover.get() && cycleTerminal(-1),
'view.closeTerminal': () => $terminalTakeover.get() && closeActiveTerminal(),
'view.flipPanes': togglePanesFlipped,
'appearance.toggleMode': () => setMode(resolvedMode === 'dark' ? 'light' : 'dark'),

View File

@@ -0,0 +1,89 @@
// @vitest-environment jsdom
import { cleanup, fireEvent, render, screen, waitFor } from '@testing-library/react'
import { MemoryRouter } from 'react-router-dom'
import { afterEach, beforeEach, describe, expect, it, vi } from 'vitest'
import type { MessagingPlatformInfo } from '@/types/hermes'
const getMessagingPlatforms = vi.fn()
const updateMessagingPlatform = vi.fn()
const openExternalLink = vi.fn()
vi.mock('@/hermes', () => ({
getMessagingPlatforms: () => getMessagingPlatforms(),
updateMessagingPlatform: (id: string, body: unknown) => updateMessagingPlatform(id, body)
}))
vi.mock('@/lib/external-link', () => ({
openExternalLink: (href: string) => openExternalLink(href)
}))
vi.mock('@/store/notifications', () => ({
notify: vi.fn(),
notifyError: vi.fn()
}))
vi.mock('@/store/system-actions', () => ({
runGatewayRestart: vi.fn()
}))
function platform(patch: Partial<MessagingPlatformInfo> = {}): MessagingPlatformInfo {
return {
configured: false,
description: 'A platform.',
docs_url: '',
enabled: false,
env_vars: [],
gateway_running: true,
id: 'teams',
name: 'Microsoft Teams',
state: 'disabled',
...patch
}
}
beforeEach(() => {
updateMessagingPlatform.mockResolvedValue({ ok: true, platform: 'teams' })
})
afterEach(() => {
cleanup()
vi.clearAllMocks()
})
async function renderMessaging() {
const { MessagingView } = await import('./index')
return render(
<MemoryRouter>
<MessagingView />
</MemoryRouter>
)
}
describe('MessagingView setup-guide link', () => {
it('hides the setup-guide button for a plugin platform with no docs URL', async () => {
// Teams (and other plugin platforms) ship an empty docs_url. Rendering an
// anchor with href="" let Electron resolve it to the app's own packaged
// index.html and fail with an OS "file not found" dialog. The button must
// simply not appear when there is no guide to open.
getMessagingPlatforms.mockResolvedValue({ platforms: [platform({ docs_url: '' })] })
await renderMessaging()
expect((await screen.findAllByText('Microsoft Teams')).length).toBeGreaterThan(0)
expect(screen.queryByText('Open setup guide')).toBeNull()
})
it('opens a real docs URL through the validated external opener', async () => {
const docsUrl = 'https://hermes-agent.nousresearch.com/docs/user-guide/messaging/teams'
getMessagingPlatforms.mockResolvedValue({ platforms: [platform({ docs_url: docsUrl })] })
await renderMessaging()
const link = await screen.findByText('Open setup guide')
fireEvent.click(link)
await waitFor(() => expect(openExternalLink).toHaveBeenCalledWith(docsUrl))
})
})

View File

@@ -14,6 +14,7 @@ import {
updateMessagingPlatform
} from '@/hermes'
import { type Translations, useI18n } from '@/i18n'
import { openExternalLink } from '@/lib/external-link'
import { AlertTriangle, ExternalLink, Save, Trash2 } from '@/lib/icons'
import { cn } from '@/lib/utils'
import { notify, notifyError } from '@/store/notifications'
@@ -404,14 +405,31 @@ function PlatformDetail({
<p className="mt-1 text-[length:var(--conversation-caption-font-size)] leading-(--conversation-caption-line-height) text-(--ui-text-tertiary)">
{introCopy(platform, m)}
</p>
<div className="mt-3">
<Button asChild size="sm" variant="textStrong">
<a href={platform.docs_url} rel="noreferrer" target="_blank">
{m.openSetupGuide}
<ExternalLink className="size-3.5" />
</a>
</Button>
</div>
{platform.docs_url && (
<div className="mt-3">
<Button asChild size="sm" variant="textStrong">
<a
href={platform.docs_url}
onClick={event => {
// Route through the validated external opener instead of
// letting Electron resolve the anchor. A packaged build's
// empty/relative href resolves to the app's own
// index.html file path, which shell.openPath then fails to
// open ("file not found"). Plugin platforms (Teams, etc.)
// ship no docs_url, so this guard + handler keeps the
// button from ever pointing at a local bundle path.
event.preventDefault()
openExternalLink(platform.docs_url)
}}
rel="noreferrer"
target="_blank"
>
{m.openSetupGuide}
<ExternalLink className="size-3.5" />
</a>
</Button>
</div>
)}
</section>
<section>

View File

@@ -1,26 +1,11 @@
import type { ButtonHTMLAttributes, ComponentProps, ReactNode } from 'react'
import type { ButtonHTMLAttributes, ReactNode } from 'react'
import { cn } from '@/lib/utils'
export const overlayCardClass =
'rounded-lg border border-[color-mix(in_srgb,var(--dt-border)_52%,transparent)] bg-[color-mix(in_srgb,var(--dt-card)_72%,transparent)] shadow-[inset_0_0.0625rem_0_color-mix(in_srgb,white_34%,transparent)]'
interface OverlayCardProps extends ComponentProps<'div'> {
children: ReactNode
}
interface OverlayActionButtonProps extends ButtonHTMLAttributes<HTMLButtonElement> {
tone?: 'default' | 'danger' | 'subtle'
}
export function OverlayCard({ children, className, ...props }: OverlayCardProps) {
return (
<div className={cn(overlayCardClass, className)} {...props}>
{children}
</div>
)
}
export function OverlayActionButton({
children,
className,

View File

@@ -1,33 +0,0 @@
import type { RefObject } from 'react'
import { SearchField } from '@/components/ui/search-field'
interface OverlaySearchInputProps {
containerClassName?: string
inputRef?: RefObject<HTMLInputElement | null>
loading?: boolean
onChange: (value: string) => void
placeholder: string
value: string
}
// Borderless underline search — matches the tools/skills page (PageSearchShell).
export function OverlaySearchInput({
containerClassName,
inputRef,
loading = false,
onChange,
placeholder,
value
}: OverlaySearchInputProps) {
return (
<SearchField
containerClassName={containerClassName}
inputRef={inputRef}
loading={loading}
onChange={onChange}
placeholder={placeholder}
value={value}
/>
)
}

View File

@@ -1,7 +1,5 @@
import type { ReactNode } from 'react'
import { Button } from '@/components/ui/button'
import { Codicon } from '@/components/ui/codicon'
import type { IconComponent } from '@/lib/icons'
import { cn } from '@/lib/utils'
@@ -50,9 +48,10 @@ export function OverlaySidebar({ children, className }: OverlaySidebarProps) {
return (
<aside
className={cn(
// pt clears the floating titlebar/header; the bg itself fills from the
// card's top edge so there's no surface-colored gap above the sidebar.
'flex min-h-0 flex-col gap-0.5 overflow-y-auto bg-(--ui-sidebar-surface-background) px-2.5 pb-3 pt-[calc(var(--titlebar-height)+1rem)]',
// pt clears the in-card close button (the OverlayView now insets the
// whole card below the OS titlebar); the bg fills from the card's top
// edge so there's no surface-colored gap above the sidebar.
'flex min-h-0 flex-col gap-0.5 overflow-y-auto bg-(--ui-sidebar-surface-background) px-2.5 pb-3 pt-[calc(var(--titlebar-height)/2+1rem)]',
className
)}
>
@@ -65,7 +64,7 @@ export function OverlayMain({ children, className }: OverlayMainProps) {
return (
<main
className={cn(
'flex min-h-0 flex-1 flex-col overflow-hidden bg-transparent pb-3 pt-[calc(var(--titlebar-height)+1rem)]',
'flex min-h-0 flex-1 flex-col overflow-hidden bg-transparent pb-3 pt-[calc(var(--titlebar-height)/2+1rem)]',
PAGE_INSET_X,
className
)}
@@ -75,31 +74,6 @@ export function OverlayMain({ children, className }: OverlayMainProps) {
)
}
// Boxless "+ New …" action that tops an OverlaySidebar list (profiles, cron, …).
// The text variant underlines on hover, which also strokes the icon glyph — so
// we keep the button itself underline-free and underline only the label span.
export function OverlayNewButton({
icon = 'add',
label,
onClick
}: {
icon?: string
label: string
onClick: () => void
}) {
return (
<Button
className="group mb-1 w-full justify-start gap-2 text-muted-foreground hover:bg-transparent hover:text-foreground"
onClick={onClick}
size="sm"
variant="ghost"
>
<Codicon name={icon} />
<span className="underline-offset-4 group-hover:underline">{label}</span>
</Button>
)
}
export function OverlayNavItem({ active, icon: Icon, label, nested, onClick, trailing }: OverlayNavItemProps) {
return (
<button

View File

@@ -49,7 +49,15 @@ export function OverlayView({
return (
<div
className="fixed inset-0 z-50 bg-black/22 p-3 backdrop-blur-[0.125rem] sm:p-6"
className={cn(
'fixed inset-0 z-50 bg-black/22 backdrop-blur-[0.125rem]',
// Equidistant inset on every side. The top value is driven by the
// titlebar height so the card clears the OS traffic-lights vertically;
// since the card top already sits below them, the left needs no extra
// inset — keeping all sides equal so the card is ~full-width at any size.
'p-[calc(var(--titlebar-height)+0.625rem)]',
'sm:p-[calc(var(--titlebar-height)+0.875rem)]'
)}
onClick={event => {
if (event.target === event.currentTarget) {
closeOverlay()

View File

@@ -0,0 +1,377 @@
import type { ReactNode } from 'react'
import { Button } from '@/components/ui/button'
import { Codicon } from '@/components/ui/codicon'
import { DropdownMenu, DropdownMenuContent, DropdownMenuItem, DropdownMenuTrigger } from '@/components/ui/dropdown-menu'
import { SearchField } from '@/components/ui/search-field'
import { translateNow } from '@/i18n'
import { cn } from '@/lib/utils'
import { OverlayView } from './overlay-view'
// Overlay "panel" primitive — the centered, capped card + framed chrome lifted
// straight from the trace / agents overlay so every non-settings overlay (cron,
// profiles, …) speaks the same visual language: tight type scale, muted
// opacities, NO container borders (rows separate via the row-hover/active bg
// vars + gaps, exactly like the trace waterfall labels).
//
// Compose it as:
// <Panel onClose>
// <PanelHeader title subtitle actions={…} />
// <PanelBody> // master/detail row
// <PanelList>…</PanelList>
// <PanelDetail>…</PanelDetail>
// </PanelBody>
// </Panel>
//
// Single-column views drop their content straight after the header.
interface PanelProps {
children: ReactNode
// Root layout override (the card already fills the equidistant inset).
className?: string
closeLabel?: string
contentClassName?: string
onClose: () => void
}
export function Panel({
children,
className,
closeLabel = translateNow('common.close'),
contentClassName,
onClose
}: PanelProps) {
return (
<OverlayView
closeLabel={closeLabel}
// Top pad aligns the header title's center with the floating close button
// (which sits at 0.1875rem + titlebar/2, -translate-y-1/2). The X is
// absolute so it costs no layout space — the header rides up next to it.
contentClassName={cn(
'flex h-full min-h-0 flex-col px-4 pb-4 pt-[calc(var(--titlebar-height)/2-0.4375rem)] sm:px-5',
contentClassName
)}
onClose={onClose}
rootClassName={cn('flex h-full w-full flex-col', className)}
>
{children}
</OverlayView>
)
}
interface PanelHeaderProps {
// Right-aligned controls (search, "+ New", segmented control, …).
actions?: ReactNode
subtitle?: ReactNode
title: ReactNode
}
export function PanelHeader({ actions, subtitle, title }: PanelHeaderProps) {
return (
<header className="mb-3 flex shrink-0 items-start justify-between gap-3">
<div className="min-w-0">
<h2 className="text-sm font-semibold text-foreground">{title}</h2>
{subtitle ? <p className="truncate text-xs text-muted-foreground/80">{subtitle}</p> : null}
</div>
{actions ? <div className="flex shrink-0 items-center gap-1.5">{actions}</div> : null}
</header>
)
}
export function PanelBody({ children, className }: { children: ReactNode; className?: string }) {
return <div className={cn('flex min-h-0 flex-1 gap-5 overflow-hidden', className)}>{children}</div>
}
interface PanelListProps {
children: ReactNode
className?: string
// Pass an onSearchChange to bake a full-bleed filter field in above the items
// (pinned; the rows scroll under it). Controlled via searchValue.
onSearchChange?: (value: string) => void
searchLabel?: string
searchPlaceholder?: string
searchValue?: string
}
// Left master list. Dense + borderless, like the trace waterfall's label tree:
// single-line rows that touch, separated from the detail only by the body gap.
// An optional search field pins to the top, full-bleed, above the scroll.
export function PanelList({
children,
className,
onSearchChange,
searchLabel,
searchPlaceholder,
searchValue
}: PanelListProps) {
return (
<div className={cn('flex w-52 shrink-0 flex-col', className)}>
{onSearchChange ? (
<SearchField
aria-label={searchLabel ?? searchPlaceholder ?? ''}
containerClassName="mb-1 w-full shrink-0"
onChange={onSearchChange}
placeholder={searchPlaceholder ?? ''}
value={searchValue ?? ''}
/>
) : null}
<div className="flex min-h-0 flex-1 flex-col overflow-y-auto overscroll-contain">{children}</div>
</div>
)
}
interface PanelListRowProps {
active: boolean
// Leading status dot color class (e.g. 'bg-emerald-500'); omit for none.
dotClassName?: string
// Leading codicon glyph name (used when there's no lead/dot).
icon?: string
// Custom leading element (colored swatch, avatar, …). Wins over dot/icon.
lead?: ReactNode
// Trailing per-row kebab menu (pass a <PanelRowMenu/>). Reveals on hover/focus.
menu?: ReactNode
// Short always-visible trailing meta (a tag/time, like the trace label's duration).
meta?: ReactNode
onSelect: () => void
rowKey?: string
title: ReactNode
}
// A row is a container (not a <button>) so it can host both the select target
// and a kebab menu without nesting interactive elements. Hover/active bg lives
// on the wrapper so the whole row highlights as one.
export function PanelListRow({
active,
dotClassName,
icon,
lead,
menu,
meta,
onSelect,
rowKey,
title
}: PanelListRowProps) {
return (
<div
className={cn(
'group/row relative flex h-7 w-full items-center rounded-md text-[0.78rem] transition-colors duration-100 ease-out',
active
? 'bg-(--ui-row-active-background) text-foreground'
: 'text-(--ui-text-secondary) hover:bg-(--ui-row-hover-background) hover:text-foreground'
)}
data-panel-row={rowKey}
>
<button
className="flex h-full min-w-0 flex-1 items-center gap-2 rounded-md pl-2 pr-1 text-left"
onClick={onSelect}
type="button"
>
{lead ??
(dotClassName ? (
<span aria-hidden="true" className={cn('size-1.5 shrink-0 rounded-full', dotClassName)} />
) : icon ? (
<Codicon className="shrink-0 text-muted-foreground/55" name={icon} size="0.85rem" />
) : null)}
<span className="min-w-0 flex-1 truncate font-medium text-foreground/85">{title}</span>
</button>
{meta ? <span className="shrink-0 pr-2 text-[0.62rem] tabular-nums text-muted-foreground/45">{meta}</span> : null}
{menu ? <div className="shrink-0 pr-1">{menu}</div> : null}
</div>
)
}
export interface PanelMenuItem {
disabled?: boolean
icon?: string
label: string
onSelect: () => void
tone?: 'danger' | 'default'
}
// Per-row "⋮" actions menu — mirrors the sidebar session row's settled pattern
// (size-5 ghost trigger + kebab-vertical codicon + w-40 content). Hidden until
// the row is hovered/focused (or the menu is open). Returns null with no items
// (e.g. the default profile, which can't be renamed/deleted).
export function PanelRowMenu({ items, label = 'Actions' }: { items: PanelMenuItem[]; label?: string }) {
if (items.length === 0) {
return null
}
return (
<DropdownMenu>
<DropdownMenuTrigger asChild>
<Button
aria-label={label}
className="size-5 rounded-[4px] bg-transparent text-(--ui-text-tertiary) opacity-0 transition-colors duration-100 hover:bg-(--ui-control-active-background) hover:text-foreground focus-visible:opacity-100 focus-visible:ring-0 group-hover/row:opacity-100 data-[state=open]:bg-(--ui-control-active-background) data-[state=open]:text-foreground data-[state=open]:opacity-100 [&_svg]:size-3.5!"
size="icon"
title={label}
variant="ghost"
>
<Codicon name="kebab-vertical" size="0.875rem" />
</Button>
</DropdownMenuTrigger>
<DropdownMenuContent align="end" className="w-40" sideOffset={6}>
{items.map(item => (
<DropdownMenuItem
disabled={item.disabled}
key={item.label}
onSelect={item.onSelect}
variant={item.tone === 'danger' ? 'destructive' : undefined}
>
{item.icon ? <Codicon name={item.icon} size="0.875rem" /> : null}
<span>{item.label}</span>
</DropdownMenuItem>
))}
</DropdownMenuContent>
</DropdownMenu>
)
}
// Scrolling detail region. Fills the column (no right rail here, unlike the
// trace inspector), so the content stretches the full available width.
export function PanelDetail({ children, className }: { children: ReactNode; className?: string }) {
return (
<div className={cn('min-h-0 flex-1 overflow-y-auto overscroll-contain', className)}>
<div className="space-y-4 pb-6 pl-1 pr-2">{children}</div>
</div>
)
}
interface PanelEmptyProps {
action?: ReactNode
description?: ReactNode
// Codicon glyph name (e.g. 'hubot', 'warning', 'loading~spin').
icon?: string
title?: ReactNode
}
export function PanelEmpty({ action, description, icon = 'inbox', title }: PanelEmptyProps) {
return (
<div className="grid flex-1 place-items-center px-6 py-10 text-center">
<div className="flex flex-col items-center gap-2">
<Codicon className="text-muted-foreground/50" name={icon} size="1.25rem" />
{title ? <p className="text-sm font-medium text-foreground/90">{title}</p> : null}
{description ? (
<p className="max-w-sm text-xs leading-relaxed text-muted-foreground/70">{description}</p>
) : null}
{action ? <div className="mt-2">{action}</div> : null}
</div>
</div>
)
}
export function PanelSectionLabel({ children, className }: { children: ReactNode; className?: string }) {
return (
<div className={cn('text-[0.6rem] font-medium uppercase tracking-wider text-muted-foreground/50', className)}>
{children}
</div>
)
}
// Inspector-style key/value grid (mirrors the trace span inspector's <dl>).
export interface PanelMetaRow {
label: ReactNode
value: ReactNode
}
export function PanelMeta({ className, rows }: { className?: string; rows: PanelMetaRow[] }) {
return (
<dl className={cn('grid grid-cols-[5rem_1fr] gap-x-2 gap-y-1 text-[0.7rem]', className)}>
{rows.map((row, i) => (
<div className="contents" key={typeof row.label === 'string' ? row.label : i}>
<dt className="truncate text-muted-foreground/55">{row.label}</dt>
<dd className="min-w-0 break-words text-foreground/85">{row.value}</dd>
</div>
))}
</dl>
)
}
// Monospace content block (job prompt, etc.) — mirrors the inspector's
// input/output <pre> blocks: subtle bg, no border.
export function PanelBlock({ children, className }: { children: ReactNode; className?: string }) {
return (
<pre
className={cn(
'max-h-48 overflow-auto whitespace-pre-wrap break-words rounded bg-foreground/5 p-2.5 text-[0.68rem] leading-relaxed text-foreground/80',
className
)}
>
{children}
</pre>
)
}
export type PanelPillTone = 'bad' | 'good' | 'muted' | 'warn'
const PILL_TONE: Record<PanelPillTone, string> = {
bad: 'bg-destructive/10 text-destructive',
good: 'bg-primary/10 text-primary',
muted: 'bg-foreground/10 text-muted-foreground',
warn: 'bg-amber-500/10 text-amber-600 dark:text-amber-300'
}
export function PanelPill({ children, tone = 'muted' }: { children: ReactNode; tone?: PanelPillTone }) {
return (
<span
className={cn(
'inline-flex items-center rounded-full px-1.5 py-0.5 text-[0.62rem] font-medium capitalize',
PILL_TONE[tone]
)}
>
{children}
</span>
)
}
// Self-describing centered "+" that sits as the LAST item in a PanelList. The
// label rides aria/title only — no visible text.
export function PanelAddButton({
icon = 'add',
label,
onClick
}: {
icon?: string
label: string
onClick: () => void
}) {
return (
<Button
aria-label={label}
className="h-7 w-full shrink-0 justify-center text-muted-foreground/70 hover:bg-(--ui-row-hover-background) hover:text-foreground"
onClick={onClick}
size="sm"
title={label}
variant="ghost"
>
<Codicon name={icon} size="0.875rem" />
</Button>
)
}
// Visible ghost action for a detail header (cron pause/resume/trigger, …).
export function PanelAction({
children,
disabled,
icon,
onClick
}: {
children: ReactNode
disabled?: boolean
icon: string
onClick: () => void
}) {
return (
<Button
className="gap-1.5 text-muted-foreground hover:bg-(--ui-row-hover-background) hover:text-foreground"
disabled={disabled}
onClick={onClick}
size="sm"
variant="ghost"
>
<Codicon name={icon} size="0.875rem" />
{children}
</Button>
)
}

View File

@@ -1,8 +1,10 @@
import { useStore } from '@nanostores/react'
import type * as React from 'react'
import { useCallback, useEffect, useMemo, useRef, useState } from 'react'
import { PageLoader } from '@/components/page-loader'
import { Button } from '@/components/ui/button'
import { Codicon } from '@/components/ui/codicon'
import {
Dialog,
DialogContent,
@@ -18,21 +20,34 @@ import {
createProfile,
deleteProfile,
getProfiles,
getProfileSetupCommand,
getProfileSoul,
type ProfileInfo,
renameProfile,
updateProfileSoul
} from '@/hermes'
import { useI18n } from '@/i18n'
import { AlertTriangle, Pencil, Save, Terminal, Trash2, Users } from '@/lib/icons'
import { AlertTriangle, Save } from '@/lib/icons'
import { profileColorSoft, resolveProfileColor } from '@/lib/profile-color'
import { slug } from '@/lib/sanitize'
import { cn } from '@/lib/utils'
import { notify, notifyError } from '@/store/notifications'
import { $profileColors } from '@/store/profile'
import { useRefreshHotkey } from '../hooks/use-refresh-hotkey'
import { OverlayMain, OverlayNewButton, OverlaySidebar, OverlaySplitLayout } from '../overlays/overlay-split-layout'
import { OverlayView } from '../overlays/overlay-view'
import {
Panel,
PanelAddButton,
PanelBody,
PanelDetail,
PanelEmpty,
PanelHeader,
PanelList,
PanelListRow,
PanelMeta,
PanelPill,
PanelRowMenu,
PanelSectionLabel
} from '../overlays/panel'
const PROFILE_NAME_RE = /^[a-z0-9][a-z0-9_-]{0,63}$/
@@ -49,7 +64,9 @@ export function ProfilesView({ onClose }: ProfilesViewProps) {
const p = t.profiles
const [profiles, setProfiles] = useState<null | ProfileInfo[]>(null)
const [selectedName, setSelectedName] = useState<null | string>(null)
const [query, setQuery] = useState('')
const [createOpen, setCreateOpen] = useState(false)
const [pendingRename, setPendingRename] = useState<null | ProfileInfo>(null)
const [pendingDelete, setPendingDelete] = useState<null | ProfileInfo>(null)
const [deleting, setDeleting] = useState(false)
@@ -83,6 +100,18 @@ export function ProfilesView({ onClose }: ProfilesViewProps) {
return profiles.find(p => p.name === selectedName) ?? profiles[0] ?? null
}, [profiles, selectedName])
const visibleProfiles = useMemo(() => {
const q = query.trim().toLowerCase()
if (!profiles || !q) {
return profiles ?? []
}
return profiles.filter(
profile => profile.name.toLowerCase().includes(q) || (profile.model ?? '').toLowerCase().includes(q)
)
}, [profiles, query])
const handleCreate = useCallback(
async (name: string, cloneFrom: null | string) => {
const trimmed = name.trim()
@@ -140,46 +169,79 @@ export function ProfilesView({ onClose }: ProfilesViewProps) {
}, [p, pendingDelete, refresh])
return (
<OverlayView closeLabel={p.close} onClose={onClose}>
<Panel closeLabel={p.close} onClose={onClose}>
{!profiles ? (
<PageLoader label={p.loading} />
) : profiles.length === 0 ? (
<PanelEmpty
action={
<Button onClick={() => setCreateOpen(true)} size="sm">
{p.newProfile}
</Button>
}
description={p.createDesc}
icon="organization"
title={p.noProfiles}
/>
) : (
<OverlaySplitLayout>
<OverlaySidebar>
<OverlayNewButton label={p.newProfile} onClick={() => setCreateOpen(true)} />
{profiles.map(profile => (
<ProfileRow
active={selected?.name === profile.name}
key={profile.name}
onSelect={() => setSelectedName(profile.name)}
profile={profile}
/>
))}
{profiles.length === 0 && (
<p className="px-2 py-4 text-center text-xs text-muted-foreground">{p.noProfiles}</p>
)}
</OverlaySidebar>
<>
<PanelHeader subtitle={p.count(profiles.length)} title={p.title} />
<PanelBody>
<PanelList
onSearchChange={setQuery}
searchLabel={p.search}
searchPlaceholder={p.search}
searchValue={query}
>
{visibleProfiles.map(profile => (
<ProfileRow
active={selected?.name === profile.name}
key={profile.name}
menu={
<PanelRowMenu
items={
profile.is_default
? []
: [
{ icon: 'edit', label: p.rename, onSelect: () => setPendingRename(profile) },
{
icon: 'trash',
label: t.common.delete,
onSelect: () => setPendingDelete(profile),
tone: 'danger'
}
]
}
/>
}
onSelect={() => setSelectedName(profile.name)}
profile={profile}
/>
))}
<PanelAddButton label={p.newProfile} onClick={() => setCreateOpen(true)} />
</PanelList>
<OverlayMain className="px-0">
{selected ? (
<ProfileDetail
key={selected.name}
onDelete={() => setPendingDelete(selected)}
onRename={newName => handleRename(selected.name, newName)}
profile={selected}
/>
<ProfileDetail key={selected.name} profile={selected} />
) : (
<div className="grid h-full place-items-center px-6 py-12 text-center text-sm text-muted-foreground">
<div>
<Users className="mx-auto size-6 text-muted-foreground/60" />
<p className="mt-3">{p.selectPrompt}</p>
</div>
</div>
<PanelEmpty description={p.selectPrompt} icon="account" />
)}
</OverlayMain>
</OverlaySplitLayout>
</PanelBody>
</>
)}
<RenameProfileDialog
currentName={pendingRename?.name ?? ''}
onClose={() => setPendingRename(null)}
onRename={async newName => {
if (pendingRename) {
await handleRename(pendingRename.name, newName)
setPendingRename(null)
}
}}
open={pendingRename !== null}
/>
<CreateProfileDialog
onClose={() => setCreateOpen(false)}
onCreate={async (name, cloneFrom) => handleCreate(name, cloneFrom)}
@@ -213,150 +275,106 @@ export function ProfilesView({ onClose }: ProfilesViewProps) {
</DialogFooter>
</DialogContent>
</Dialog>
</OverlayView>
</Panel>
)
}
function ProfileRow({ active, onSelect, profile }: { active: boolean; onSelect: () => void; profile: ProfileInfo }) {
const { t } = useI18n()
const p = t.profiles
return (
<button
className={cn(
'flex w-full flex-col items-start gap-0.5 rounded-md px-2 py-1.5 text-left transition-colors',
active ? 'bg-accent text-foreground' : 'text-foreground/85 hover:bg-accent/60'
)}
onClick={onSelect}
type="button"
>
<span className="flex w-full items-center justify-between gap-2">
<span className="truncate text-sm font-medium">{profile.name}</span>
{profile.is_default && <span className="text-[0.6rem] text-primary">{p.default}</span>}
</span>
<span className="text-[0.66rem] text-muted-foreground">
{p.skills(profile.skill_count)}
{profile.has_env ? ` · ${p.env}` : ''}
</span>
</button>
)
}
function ProfileDetail({
onDelete,
onRename,
function ProfileRow({
active,
menu,
onSelect,
profile
}: {
onDelete: () => void
onRename: (newName: string) => Promise<void>
active: boolean
menu?: React.ReactNode
onSelect: () => void
profile: ProfileInfo
}) {
const { t } = useI18n()
const p = t.profiles
const [renameOpen, setRenameOpen] = useState(false)
const [copying, setCopying] = useState(false)
const handleCopySetup = useCallback(async () => {
setCopying(true)
try {
const { command } = await getProfileSetupCommand(profile.name)
await navigator.clipboard.writeText(command)
notify({ kind: 'success', title: p.setupCopied, message: command })
} catch (err) {
notifyError(err, p.failedCopy)
} finally {
setCopying(false)
}
}, [p, profile.name])
const colors = useStore($profileColors)
return (
<div className="flex h-full min-h-0 flex-col">
<div className="min-h-0 flex-1 overflow-y-auto">
<div className="mx-auto max-w-2xl space-y-6 px-6 py-6">
<header className="space-y-3">
<div className="flex flex-wrap items-start justify-between gap-3">
<div className="min-w-0">
<div className="flex flex-wrap items-center gap-2">
<h3 className="text-xl font-semibold tracking-tight">{profile.name}</h3>
{profile.is_default && (
<span className="rounded-full bg-primary/10 px-2 py-0.5 text-[0.65rem] font-medium text-primary">
{p.defaultBadge}
</span>
)}
{profile.has_env && (
<span className="rounded-full bg-muted px-2 py-0.5 text-[0.65rem] font-medium text-muted-foreground">
.env
</span>
)}
</div>
<p className="mt-1 font-mono text-[0.7rem] text-muted-foreground" title={profile.path}>
{profile.path}
</p>
</div>
<div className="flex shrink-0 items-center gap-1">
{!profile.is_default && (
<Button onClick={() => setRenameOpen(true)} size="sm" variant="outline">
<Pencil />
{p.rename}
</Button>
)}
<Button disabled={copying} onClick={() => void handleCopySetup()} size="sm" variant="outline">
<Terminal />
{copying ? p.copying : p.copySetup}
</Button>
{!profile.is_default && (
<Button
className="text-muted-foreground hover:bg-destructive/10 hover:text-destructive"
onClick={onDelete}
size="sm"
variant="ghost"
>
<Trash2 />
{t.common.delete}
</Button>
)}
</div>
</div>
<dl className="grid gap-2 text-xs sm:grid-cols-2">
<DetailRow label={p.modelLabel}>
{profile.model ? (
<>
<span className="font-mono">{profile.model}</span>
{profile.provider && <span className="text-muted-foreground"> · {profile.provider}</span>}
</>
) : (
<span className="text-muted-foreground">{p.notSet}</span>
)}
</DetailRow>
<DetailRow label={p.skillsLabel}>{profile.skill_count}</DetailRow>
</dl>
</header>
<SoulEditor profileName={profile.name} />
</div>
</div>
<RenameProfileDialog
currentName={profile.name}
onClose={() => setRenameOpen(false)}
onRename={async newName => {
await onRename(newName)
setRenameOpen(false)
}}
open={renameOpen}
/>
</div>
<PanelListRow
active={active}
lead={
<ProfileGlyph
color={resolveProfileColor(profile.name, colors)}
isDefault={profile.is_default}
name={profile.name}
/>
}
menu={menu}
onSelect={onSelect}
rowKey={profile.name}
title={profile.name}
/>
)
}
function DetailRow({ children, label }: { children: React.ReactNode; label: string }) {
// Leading glyph for a profile row, mirroring the sidebar rail: the default
// profile gets the `home` icon; named profiles get a soft color-tinted square
// with their initial in the profile's color.
function ProfileGlyph({ color, isDefault, name }: { color: null | string; isDefault: boolean; name: string }) {
if (isDefault) {
return <Codicon className="shrink-0 text-muted-foreground/70" name="home" size="0.9rem" />
}
const hue = color ?? 'var(--ui-text-quaternary)'
const initial =
name
.replace(/[^a-z0-9]/gi, '')
.charAt(0)
.toUpperCase() || '?'
return (
<div className="flex flex-wrap items-baseline gap-2">
<dt className="text-[0.65rem] font-semibold uppercase tracking-[0.12em] text-muted-foreground">{label}</dt>
<dd className="text-sm text-foreground">{children}</dd>
</div>
<span
aria-hidden="true"
className="grid size-4 shrink-0 place-items-center rounded-[3px] text-[0.5rem] font-semibold uppercase leading-none"
style={{ backgroundColor: profileColorSoft(hue, 22), color: color ?? undefined }}
>
{initial}
</span>
)
}
function ProfileDetail({ profile }: { profile: ProfileInfo }) {
const { t } = useI18n()
const p = t.profiles
return (
<PanelDetail>
<header className="space-y-3">
<div className="min-w-0">
<div className="flex flex-wrap items-center gap-2">
<h3 className="text-[0.95rem] font-semibold tracking-tight text-foreground">{profile.name}</h3>
{profile.is_default && <PanelPill tone="good">{p.defaultBadge}</PanelPill>}
{profile.has_env && <PanelPill tone="muted">.env</PanelPill>}
</div>
<p className="mt-1 truncate font-mono text-[0.66rem] text-muted-foreground/55" title={profile.path}>
{profile.path}
</p>
</div>
<PanelMeta
rows={[
{
label: p.modelLabel,
value: profile.model ? (
<span className="font-mono">
{profile.model}
{profile.provider ? <span className="text-muted-foreground/55"> · {profile.provider}</span> : null}
</span>
) : (
<span className="text-muted-foreground/55">{p.notSet}</span>
)
},
{ label: p.skillsLabel, value: profile.skill_count }
]}
/>
</header>
<SoulEditor profileName={profile.name} />
</PanelDetail>
)
}
@@ -419,7 +437,7 @@ function SoulEditor({ profileName }: { profileName: string }) {
<section className="space-y-2">
<div className="flex flex-wrap items-baseline justify-between gap-2">
<div>
<h4 className="text-[0.7rem] font-semibold uppercase tracking-[0.14em] text-muted-foreground">SOUL.md</h4>
<PanelSectionLabel className="text-[0.7rem] tracking-[0.14em]">SOUL.md</PanelSectionLabel>
<p className="text-xs text-muted-foreground">{p.soulDesc}</p>
</div>
{dirty && <span className="text-[0.65rem] text-muted-foreground">{p.unsavedChanges}</span>}
@@ -429,7 +447,7 @@ function SoulEditor({ profileName }: { profileName: string }) {
<PageLoader className="min-h-44" label={p.loadingSoul} />
) : (
<Textarea
className="min-h-72 font-mono text-xs leading-5"
className="min-h-48 font-mono text-xs leading-5"
onChange={event => setContent(event.target.value)}
placeholder={isEmpty ? p.emptySoul : undefined}
value={content}
@@ -437,7 +455,7 @@ function SoulEditor({ profileName }: { profileName: string }) {
)}
{error && (
<div className="flex items-start gap-2 rounded-md border border-destructive/30 bg-destructive/10 px-3 py-2 text-xs text-destructive">
<div className="flex items-start gap-2 rounded bg-destructive/10 px-3 py-2 text-xs text-destructive">
<AlertTriangle className="mt-0.5 size-3.5 shrink-0" />
<span>{error}</span>
</div>

View File

@@ -120,14 +120,14 @@ export function RemoteFolderPicker() {
return (
<Dialog onOpenChange={open => !open && close()} open={Boolean(pending)}>
<DialogContent className="max-w-lg gap-0 overflow-hidden p-0">
<div className="border-b border-border/70 px-4 py-3">
<DialogContent className="flex h-[min(36rem,calc(100vh-4rem))] max-w-lg flex-col gap-0 overflow-hidden p-0">
<div className="shrink-0 border-b border-border/70 px-4 py-3">
<DialogTitle className="text-sm">{pending?.title || r.remotePickerTitle}</DialogTitle>
<DialogDescription className="mt-1 text-xs">{r.remotePickerDescription}</DialogDescription>
</div>
<div className="flex min-h-[22rem] flex-col">
<div className="flex flex-wrap items-center gap-1 border-b border-border/50 px-3 py-2 text-xs text-muted-foreground">
<div className="flex min-h-0 flex-1 flex-col">
<div className="shrink-0 flex flex-wrap items-center gap-1 border-b border-border/50 px-3 py-2 text-xs text-muted-foreground">
{crumbs.map((crumb, index) => (
<button
className={cn(
@@ -166,7 +166,7 @@ export function RemoteFolderPicker() {
</div>
</div>
<div className="flex items-center justify-between gap-2 border-t border-border/70 px-4 py-3">
<div className="shrink-0 flex items-center justify-between gap-2 border-t border-border/70 px-4 py-3">
<div className="min-w-0 truncate text-xs text-muted-foreground">{currentPath}</div>
<div className="flex shrink-0 items-center gap-2">
<Button onClick={() => close()} size="sm" variant="ghost">

View File

@@ -16,7 +16,6 @@ import { $currentCwd } from '@/store/session'
import { SidebarPanelLabel } from '../shell/sidebar-label'
import { RemoteFolderPicker } from './files/remote-picker'
import { ProjectTree } from './files/tree'
import { useProjectTree } from './files/use-project-tree'
@@ -82,8 +81,6 @@ export function RightSidebarPane({ onActivateFile, onActivateFolder }: RightSide
: 'border-l shadow-[inset_0.0625rem_0_0_color-mix(in_srgb,white_18%,transparent)]'
)}
>
<RemoteFolderPicker />
<FilesystemTab
canCollapse={canCollapse}
collapseNonce={collapseNonce}

View File

@@ -0,0 +1,100 @@
// Live agent-terminal output, pushed from the backend as `agent.terminal.output`
// events (see tui_gateway `_wire_agent_terminal_output`). Chunks route straight
// to the matching read-only xterm, keyed by process id — no polling, no tail
// truncation. A capped per-proc backlog lets a tab opened mid-stream replay what
// it missed, and lets a closed-then-reopened tab restore its history.
type Writer = (chunk: string) => void
const writers = new Map<string, Writer>()
const backlog = new Map<string, string>()
const commandHeaders = new Map<string, string>()
const lastSnapshots = new Map<string, string>()
const seededCommands = new Set<string>()
const MAX_BACKLOG = 256_000
/** A live agent terminal registers its xterm write and replays the backlog.
* Returns an idempotent unregister. */
export function registerAgentTerminalWriter(procId: string, write: Writer): () => void {
writers.set(procId, write)
const history = backlog.get(procId)
if (history) {
write(history)
}
return () => {
if (writers.get(procId) === write) {
writers.delete(procId)
}
}
}
/** Append a streamed chunk: buffer it (capped) for future opens and write it to
* the live terminal, if one is mounted. */
export function writeAgentTerminalChunk(procId: string, chunk: string): void {
if (!procId || !chunk) {
return
}
const next = (backlog.get(procId) ?? '') + chunk
backlog.set(procId, next.length > MAX_BACKLOG ? next.slice(-MAX_BACKLOG) : next)
writers.get(procId)?.(chunk)
}
/** Seed the tab with the command immediately, so an agent terminal never opens
* as an empty void while stdout is still pending or not yet observed. */
export function seedAgentTerminalCommand(procId: string, command: string): void {
const trimmed = command.trim()
if (!procId || !trimmed || seededCommands.has(procId)) {
return
}
seededCommands.add(procId)
const header = `$ ${trimmed}\r\n`
commandHeaders.set(procId, header)
writeAgentTerminalChunk(procId, header)
}
/** Ingest a full output snapshot from process.list/status-stack. This is the
* fallback for older/not-yet-restarted gateways and a seed for tabs opened
* after output already exists. If it extends our current backlog, append only
* the delta; if the registry's rolling tail slid, reset to that tail. */
export function syncAgentTerminalSnapshot(procId: string, output: string): void {
if (!procId || !output) {
return
}
const current = backlog.get(procId) ?? ''
const header = commandHeaders.get(procId) ?? ''
const body = header && current.startsWith(header) ? current.slice(header.length) : current
const previous = lastSnapshots.get(procId) ?? ''
if (output === previous || output === body || body.endsWith(output)) {
lastSnapshots.set(procId, output)
return
}
if (output.startsWith(previous)) {
writeAgentTerminalChunk(procId, output.slice(previous.length))
lastSnapshots.set(procId, output)
return
}
if (output.startsWith(body)) {
writeAgentTerminalChunk(procId, output.slice(body.length))
lastSnapshots.set(procId, output)
return
}
const next = `${header}${output}`.slice(-MAX_BACKLOG)
lastSnapshots.set(procId, output)
backlog.set(procId, next)
writers.get(procId)?.(`\x1bc${next}`)
}

View File

@@ -19,17 +19,32 @@ export interface TerminalReadOptions {
type Reader = (opts: TerminalReadOptions) => TerminalReadResult
// The persistent terminal is a singleton (one xterm mounted forever), so a
// module-level slot is enough — set while the session is live, cleared on
// dispose. The gateway `terminal.read.request` handler reads through this.
let activeReader: Reader | null = null
// Each live terminal registers a reader keyed by its id; a single `activeId`
// (driven by the tab selection) decides which one the agent's `read_terminal`
// tool sees. Keying by id keeps switching race-free — a deactivating tab's
// cleanup can't null out the tab that just activated.
const readers = new Map<string, Reader>()
let activeId: string | null = null
export function setActiveTerminalReader(reader: Reader | null): void {
activeReader = reader
/** Register a live terminal's reader; returns an idempotent unregister. */
export function registerTerminalReader(id: string, reader: Reader): () => void {
readers.set(id, reader)
return () => {
if (readers.get(id) === reader) {
readers.delete(id)
}
}
}
export function setActiveTerminalId(id: string | null): void {
activeId = id
}
export function readActiveTerminal(opts: TerminalReadOptions = {}): TerminalReadResult | null {
return activeReader ? activeReader(opts) : null
const reader = activeId === null ? null : readers.get(activeId)
return reader ? reader(opts) : null
}
export function makeTerminalReader(term: Terminal): Reader {

View File

@@ -0,0 +1,24 @@
import { useStore } from '@nanostores/react'
import { TerminalSlot } from './persistent'
import { TerminalRail } from './rail'
import { $terminals } from './terminals'
/** Pane-side terminal chrome: the body slot (which the persistent overlay chases)
* plus the always-on tab rail. Lives in the real pane DOM — NOT the z-4 terminal
* overlay — so the rail sits above the collapsed sidebars' z-30 hover-reveal
* triggers (z-40, like the thread timeline) and suppresses them while hovered.
* The rail is always shown when a terminal exists (even one), so every tab keeps
* its close affordance; closing the last one hides the pane (reopen re-creates). */
export function TerminalPaneChrome() {
const terminals = useStore($terminals)
return (
<div className="flex min-h-0 min-w-0 flex-1">
<div className="relative flex min-h-0 min-w-0 flex-1 flex-col">
<TerminalSlot />
</div>
{terminals.length > 0 && <TerminalRail />}
</div>
)
}

View File

@@ -1,88 +0,0 @@
import '@xterm/xterm/css/xterm.css'
import { Button } from '@/components/ui/button'
import { Codicon } from '@/components/ui/codicon'
import { KbdCombo } from '@/components/ui/kbd'
import { Loader } from '@/components/ui/loader'
import { Tip } from '@/components/ui/tooltip'
import { useI18n } from '@/i18n'
import { SidebarPanelLabel } from '../../shell/sidebar-label'
import { setTerminalTakeover } from '../store'
import { useTerminalSession } from './use-terminal-session'
interface TerminalTabProps {
cwd: string
onAddSelectionToChat: (text: string, label?: string) => void
}
export function TerminalTab({ cwd, onAddSelectionToChat }: TerminalTabProps) {
const { t } = useI18n()
const { addSelectionToChat, hostRef, selection, selectionStyle, shellName, status } = useTerminalSession({
cwd,
onAddSelectionToChat
})
const label = t.rightSidebar.terminalHide
return (
<div className="relative flex min-h-0 min-w-0 flex-1 flex-col">
<div className="flex h-8 shrink-0 items-center gap-2 px-2.5">
<SidebarPanelLabel className="text-(--ui-text-secondary)!">{shellName}</SidebarPanelLabel>
<Tip label={label}>
<Button
aria-label={label}
className="ml-auto size-6 rounded-md text-(--ui-text-secondary)!"
onClick={() => setTerminalTakeover(false)}
size="icon"
type="button"
variant="ghost"
>
<Codicon name="close" size="0.875rem" />
</Button>
</Tip>
</div>
<div className="relative min-h-0 flex-1 bg-(--ui-editor-surface-background) p-2">
{status === 'starting' && (
<div className="pointer-events-none absolute inset-0 z-10 grid place-items-center">
<Loader
className="size-8 text-(--ui-text-tertiary)"
pathSteps={180}
strokeScale={0.68}
type="spiral-search"
/>
</div>
)}
{selection.trim() && (
<div className="absolute z-50 flex items-center gap-1" style={selectionStyle ?? { right: 12, top: 8 }}>
<Button
className="h-6 rounded-md px-2 text-[0.68rem] shadow-md backdrop-blur-md"
onClick={event => event.preventDefault()}
onMouseDown={event => {
event.preventDefault()
event.stopPropagation()
addSelectionToChat()
}}
type="button"
variant="secondary"
>
{t.rightSidebar.addToChat}
<KbdCombo className="ml-1 opacity-70" combo="mod+l" size="sm" />
</Button>
</div>
)}
{/* Outer div paints terminal inset; inner div is the xterm host so the
canvas sizes to the content area and p-2 stays as terminal padding.
Screen/viewport inherit the live skin surface so the terminal blends
with the app and follows light/dark; the xterm canvas itself is
painted the resolved surface color in use-terminal-session. */}
<div
className="h-full min-h-0 overflow-hidden text-(--ui-text-secondary) [&_.xterm]:h-full [&_.xterm-screen]:bg-(--ui-editor-surface-background)! [&_.xterm-viewport]:bg-(--ui-editor-surface-background)!"
ref={hostRef}
/>
</div>
</div>
)
}

View File

@@ -0,0 +1,102 @@
import '@xterm/xterm/css/xterm.css'
import { Button } from '@/components/ui/button'
import { KbdCombo } from '@/components/ui/kbd'
import { Loader } from '@/components/ui/loader'
import { useI18n } from '@/i18n'
import { cn } from '@/lib/utils'
import { reportTerminalShell } from './terminals'
import { useAgentTerminal } from './use-agent-terminal'
import { useTerminalSession } from './use-terminal-session'
// Absolute-stacked so inactive tabs keep layout size (a display:none host goes
// 0×0 and renders garbled on re-show); visibility toggles which one is seen.
const INSTANCE_CLASS = 'absolute inset-0 flex flex-col bg-(--ui-editor-surface-background) px-2 pb-2 pt-0'
interface TerminalInstanceProps {
id: string
cwd: string
active: boolean
onAddSelectionToChat: (text: string, label?: string) => void
reviveBuffer?: string
}
/** One persistent xterm+PTY. Every open tab stays mounted (so its shell and
* scrollback survive tab switches); only the active one is shown. */
export function TerminalInstance({ id, active, cwd, onAddSelectionToChat, reviveBuffer }: TerminalInstanceProps) {
const { t } = useI18n()
const { addSelectionToChat, hostRef, selection, selectionStyle, status } = useTerminalSession({
id,
cwd,
active,
onAddSelectionToChat,
reviveBuffer,
onShell: shell => reportTerminalShell(id, shell)
})
return (
<div
className={cn(INSTANCE_CLASS, active ? 'visible' : 'invisible pointer-events-none')}
// Focus-scope marker so isFocusWithin('[data-terminal]') can route ⌘W here.
data-terminal=""
>
{status === 'starting' && (
<div className="pointer-events-none absolute inset-0 z-10 grid place-items-center">
<Loader className="size-8 text-(--ui-text-tertiary)" pathSteps={180} strokeScale={0.68} type="spiral-search" />
</div>
)}
{selection.trim() && (
<div className="absolute z-50 flex items-center gap-1" style={selectionStyle ?? { right: 12, top: 8 }}>
<Button
className="h-6 rounded-md px-2 text-[0.68rem] shadow-md backdrop-blur-md"
onClick={event => event.preventDefault()}
onMouseDown={event => {
event.preventDefault()
event.stopPropagation()
addSelectionToChat()
}}
type="button"
variant="secondary"
>
{t.rightSidebar.addToChat}
<KbdCombo className="ml-1 opacity-70" combo="mod+l" size="sm" />
</Button>
</div>
)}
{/* Outer div paints the terminal inset; inner div is the xterm host so the
canvas sizes to the content area and p-2 stays as terminal padding. */}
<div
className="h-full min-h-0 overflow-hidden text-(--ui-text-secondary) [&_.xterm]:h-full [&_.xterm-screen]:bg-(--ui-editor-surface-background)! [&_.xterm-viewport]:bg-(--ui-editor-surface-background)!"
ref={hostRef}
/>
</div>
)
}
interface AgentTerminalInstanceProps {
active: boolean
id: string
procId: string
}
/** Read-only mirror of an agent background process — a write-only xterm streamed
* live from the backend output (no PTY, no input). */
export function AgentTerminalInstance({ active, id, procId }: AgentTerminalInstanceProps) {
const { hostRef } = useAgentTerminal({ active, id, procId })
return (
<div
className={cn(INSTANCE_CLASS, active ? 'visible' : 'invisible pointer-events-none')}
// Same focus-scope marker as the user terminal so isFocusWithin('[data-terminal]')
// routes ⌘W here and closes the focused agent tab (not a preview).
data-terminal=""
>
<div
className="h-full min-h-0 overflow-hidden text-(--ui-text-secondary) [&_.xterm]:h-full [&_.xterm-screen]:bg-(--ui-editor-surface-background)! [&_.xterm-viewport]:bg-(--ui-editor-surface-background)!"
ref={hostRef}
/>
</div>
)
}

View File

@@ -4,7 +4,8 @@ import { type CSSProperties, useEffect, useLayoutEffect, useRef, useState } from
import { $terminalTakeover } from '../store'
import { TerminalTab } from './index'
import { ensureTerminal } from './terminals'
import { TerminalWorkspace } from './workspace'
/**
* One xterm Terminal mounted at the layout root and CSS-overlayed onto
@@ -40,7 +41,6 @@ export function TerminalSlot({ className = SLOT_CLASS }: { className?: string })
}
interface PersistentTerminalProps {
cwd: string
onAddSelectionToChat: (text: string, label?: string) => void
}
@@ -54,12 +54,26 @@ interface Rect {
const sameRect = (a: Rect | null, b: Rect) =>
!!a && a.top === b.top && a.left === b.left && a.width === b.width && a.height === b.height
export function PersistentTerminal({ cwd, onAddSelectionToChat }: PersistentTerminalProps) {
export function PersistentTerminal({ onAddSelectionToChat }: PersistentTerminalProps) {
const slot = useStore($slot)
const terminalTakeover = useStore($terminalTakeover)
const [rect, setRect] = useState<Rect | null>(null)
const [ready, setReady] = useState(false)
// VS Code parity: once the pane has ever been opened, keep the terminals
// mounted — and their shells alive — even while hidden. Hiding the pane just
// collapses the slot, so the overlay below goes invisible; nothing is torn
// down. Only an explicit per-tab close kills a PTY. Re-opening re-ensures one
// terminal exists (covers having closed the last tab).
const [mounted, setMounted] = useState(false)
useEffect(() => {
if (terminalTakeover && ready) {
setMounted(true)
ensureTerminal()
}
}, [terminalTakeover, ready])
useLayoutEffect(() => {
if (!slot) {
setRect(null)
@@ -114,12 +128,12 @@ export function PersistentTerminal({ cwd, onAddSelectionToChat }: PersistentTerm
contain: 'layout size paint'
}
// Defer mount until the terminal sidebar is open and the slot has real dims.
// Booting xterm/node-pty at 0×0 starts the shell at 80×24 and spawns a
// visible conhost on Windows even when the pane is collapsed.
// Defer the FIRST mount until the pane is open and the slot has real dims
// booting xterm/node-pty at 0×0 starts the shell at 80×24 and spawns a visible
// conhost on Windows. After that `mounted` latches: shells persist while hidden.
return (
<div aria-hidden={!visible} style={style}>
{terminalTakeover && ready && <TerminalTab cwd={cwd} onAddSelectionToChat={onAddSelectionToChat} />}
{mounted && <TerminalWorkspace onAddSelectionToChat={onAddSelectionToChat} />}
</div>
)
}

View File

@@ -0,0 +1,177 @@
import { useStore } from '@nanostores/react'
import { Codicon } from '@/components/ui/codicon'
import {
ContextMenu,
ContextMenuContent,
ContextMenuItem,
ContextMenuSeparator,
ContextMenuTrigger
} from '@/components/ui/context-menu'
import { Tip } from '@/components/ui/tooltip'
import { useI18n } from '@/i18n'
import { formatCombo } from '@/lib/keybinds/combo'
import { cn } from '@/lib/utils'
import { $bindings } from '@/store/keybinds'
import { setTerminalTakeover } from '../store'
import {
$activeTerminalId,
$terminals,
closeAllTerminals,
closeOtherTerminals,
closeTerminal,
createTerminal,
selectTerminal,
type TerminalEntry
} from './terminals'
const RAIL_ACTION =
'grid size-6 place-items-center rounded text-(--ui-text-tertiary) transition-colors hover:bg-(--chrome-action-hover) hover:text-foreground focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-sidebar-ring [-webkit-app-region:no-drag]'
/** Tooltip label with a trailing hotkey hint (the user's live binding). */
function hintLabel(text: string, combo?: string) {
return combo ? (
<span className="flex items-center gap-2">
<span>{text}</span>
<span className="opacity-55">{formatCombo(combo)}</span>
</span>
) : (
text
)
}
/** Thin icon "bookmark" strip blended into the terminal surface, shown whenever a
* terminal exists. Each square is a tab (name + hotkey on hover); close via the
* shell's `exit`, middle-click, or the context menu. */
export function TerminalRail() {
const { t } = useI18n()
const terminals = useStore($terminals)
const activeId = useStore($activeTerminalId)
const bindings = useStore($bindings)
const toggleHint = bindings['view.showTerminal']?.[0]
const newHint = bindings['view.newTerminal']?.[0]
return (
<div
className="group/rail relative z-40 flex h-full w-9 shrink-0 flex-col items-center border-l border-(--ui-stroke-quaternary) bg-(--ui-editor-surface-background)"
// The rail sits at the pane's outer edge, under the collapsed sidebars'
// hover-reveal triggers; mark it so those triggers go pointer-transparent
// while it's hovered (see the suppression rules in styles.css) and a reach
// for a tab can't drag in the file-browser/review panel.
data-suppress-pane-reveal=""
>
<ul
aria-label={t.rightSidebar.terminalsAria}
className="flex min-h-0 flex-1 flex-col items-center gap-0.5 self-stretch overflow-y-auto overflow-x-hidden overscroll-contain py-1 [-ms-overflow-style:none] [scrollbar-width:none] [&::-webkit-scrollbar]:hidden"
role="tablist"
>
{terminals.map((term, index) => (
<TerminalRailItem
active={term.id === activeId}
canCloseOthers={terminals.length > 1}
index={index}
key={term.id}
term={term}
toggleHint={toggleHint}
/>
))}
<li className="flex w-full justify-center">
<Tip label={hintLabel(t.rightSidebar.terminalNew, newHint)} side="left">
<button
aria-label={t.rightSidebar.terminalNew}
className={cn(RAIL_ACTION, 'size-7 text-(--ui-text-quaternary)')}
onClick={() => createTerminal()}
type="button"
>
<Codicon name="add" size="0.8125rem" />
</button>
</Tip>
</li>
</ul>
<div className="flex shrink-0 flex-col items-center pb-1.5">
<Tip label={t.rightSidebar.terminalHide} side="left">
<button
aria-label={t.rightSidebar.terminalHide}
className={cn(RAIL_ACTION, 'opacity-0 transition-opacity group-hover/rail:opacity-100')}
onClick={() => setTerminalTakeover(false)}
type="button"
>
<Codicon name="chevron-down" size="0.8125rem" />
</button>
</Tip>
</div>
</div>
)
}
interface TerminalRailItemProps {
active: boolean
canCloseOthers: boolean
index: number
term: TerminalEntry
toggleHint?: string
}
function TerminalRailItem({ active, canCloseOthers, index, term, toggleHint }: TerminalRailItemProps) {
const { t } = useI18n()
const label = `${index + 1}. ${term.title}`
return (
<ContextMenu>
<ContextMenuTrigger asChild>
<li className="relative flex w-full justify-center [-webkit-app-region:no-drag]">
{active && (
<span
aria-hidden="true"
className="absolute inset-y-0.5 right-0 w-0.5 rounded-l-sm bg-(--ui-stroke-primary)"
/>
)}
<Tip label={hintLabel(label, toggleHint)} side="left">
<button
aria-label={label}
aria-selected={active}
className={cn(
'grid size-7 place-items-center rounded-md transition-colors',
active
? 'bg-(--chrome-action-hover) text-foreground'
: 'text-(--ui-text-tertiary) hover:bg-(--chrome-action-hover) hover:text-foreground'
)}
onAuxClick={event => {
if (event.button === 1) {
event.preventDefault()
closeTerminal(term.id)
}
}}
onClick={() => selectTerminal(term.id)}
onMouseDown={event => {
if (event.button === 1) {
event.preventDefault()
}
}}
role="tab"
type="button"
>
<Codicon
className={cn(term.kind === 'agent' && !active && 'text-primary')}
name={term.kind === 'agent' ? 'agent' : 'terminal'}
size="0.875rem"
/>
</button>
</Tip>
</li>
</ContextMenuTrigger>
<ContextMenuContent>
<ContextMenuItem onSelect={() => closeTerminal(term.id)}>{t.common.close}</ContextMenuItem>
<ContextMenuItem disabled={!canCloseOthers} onSelect={() => closeOtherTerminals(term.id)}>
{t.rightSidebar.terminalCloseOthers}
</ContextMenuItem>
<ContextMenuItem onSelect={closeAllTerminals}>{t.rightSidebar.terminalCloseAll}</ContextMenuItem>
<ContextMenuSeparator />
<ContextMenuItem onSelect={() => setTerminalTakeover(false)}>{t.rightSidebar.terminalHide}</ContextMenuItem>
</ContextMenuContent>
</ContextMenu>
)
}

View File

@@ -0,0 +1,90 @@
import { atom } from 'nanostores'
import { beforeEach, describe, expect, it, vi } from 'vitest'
const STORAGE_KEY = 'hermes.desktop.terminals.v1'
async function loadTerminalStore() {
vi.doMock('@/store/session', () => ({
$currentCwd: atom('/workspace')
}))
return import('./terminals')
}
describe('terminal store persistence', () => {
beforeEach(() => {
window.localStorage.clear()
vi.resetModules()
})
it('restores user tabs, active tab, and history on module load', async () => {
window.localStorage.setItem(
STORAGE_KEY,
JSON.stringify({
activeTerminalId: 'term-two',
terminals: [
{ auto: false, cwd: '/repo/one', id: 'term-one', reviveBuffer: 'last output', title: 'zsh' },
{ auto: true, cwd: '/repo/two', id: 'term-two', title: 'Terminal' }
]
})
)
const { $activeTerminalId, $terminals } = await loadTerminalStore()
expect($activeTerminalId.get()).toBe('term-two')
expect($terminals.get()).toEqual([
{ auto: false, cwd: '/repo/one', id: 'term-one', kind: 'user', reviveBuffer: 'last output', title: 'zsh' },
{ auto: true, cwd: '/repo/two', id: 'term-two', kind: 'user', title: 'Terminal' }
])
})
it('persists user tabs and history synchronously, skipping agent mirrors', async () => {
const { createTerminal, ensureAgentTerminal, renameTerminal, selectTerminal, updateTerminalReviveBuffer } =
await loadTerminalStore()
const userId = createTerminal('/repo')
renameTerminal(userId, 'server')
updateTerminalReviveBuffer(userId, 'recent scrollback')
ensureAgentTerminal('proc-1', 'background task')
selectTerminal(userId)
// No flush/tick: persistence is synchronous, so the snapshot is already on
// disk (this is what makes app-quit restore reliable).
expect(JSON.parse(window.localStorage.getItem(STORAGE_KEY) ?? '{}')).toEqual({
activeTerminalId: userId,
terminals: [{ auto: false, cwd: '/repo', id: userId, reviveBuffer: 'recent scrollback', title: 'server' }]
})
})
it('never attaches a revive buffer to an agent tab', async () => {
const { $terminals, ensureAgentTerminal, updateTerminalReviveBuffer } = await loadTerminalStore()
const agentId = ensureAgentTerminal('proc-1', 'background task')!
updateTerminalReviveBuffer(agentId, 'should be ignored')
expect($terminals.get().find(term => term.id === agentId)?.reviveBuffer).toBeUndefined()
expect(window.localStorage.getItem(STORAGE_KEY)).toBeNull()
})
it('tail-trims an oversized revive buffer to stay under the storage budget', async () => {
const { $terminals, createTerminal, updateTerminalReviveBuffer } = await loadTerminalStore()
const userId = createTerminal('/repo')
const huge = 'x'.repeat(60_000)
updateTerminalReviveBuffer(userId, huge)
const stored = $terminals.get().find(term => term.id === userId)?.reviveBuffer ?? ''
expect(stored.length).toBe(48_000)
expect(stored).toBe(huge.slice(-48_000))
})
it('clears remembered tabs when all terminals close', async () => {
const { closeAllTerminals, createTerminal } = await loadTerminalStore()
createTerminal('/repo')
expect(window.localStorage.getItem(STORAGE_KEY)).not.toBeNull()
closeAllTerminals()
expect(window.localStorage.getItem(STORAGE_KEY)).toBeNull()
})
})

View File

@@ -0,0 +1,330 @@
import { atom, computed } from 'nanostores'
import { readKey, writeKey } from '@/lib/storage'
import { $currentCwd } from '@/store/session'
import { setTerminalTakeover } from '../store'
import { seedAgentTerminalCommand } from './agent-terminal-stream'
/** One in-app terminal tab. `id` is the renderer-side handle (distinct from the
* PTY session id the main process mints); each instance owns its own shell. */
export interface TerminalEntry {
id: string
/** Display label. `auto` adopts the resolved shell name until the user renames. */
title: string
auto: boolean
/** Working directory, snapshotted once at creation. Terminals live outside
* session/project state — the only thing they inherit is this initial cwd
* (the project root if opened in one, else the backend's default). Switching
* sessions never moves or recreates a terminal. */
cwd: string
/** Serialized xterm scrollback from the last session, replayed on relaunch so
* the tab reopens with its recent history (VS Code parity). Processes are NOT
* revived — a fresh shell starts beneath the restored buffer. Captured live
* for user tabs only; agent mirrors stay runtime-only. */
reviveBuffer?: string
/** `user` = interactive PTY shell. `agent` = read-only mirror of an agent
* background process (`terminal(background=true)`), keyed by `procId`. */
kind: 'user' | 'agent'
procId?: string
}
interface PersistedTerminalEntry {
auto: boolean
cwd: string
id: string
reviveBuffer?: string
title: string
}
interface PersistedTerminalState {
activeTerminalId: null | string
terminals: PersistedTerminalEntry[]
}
const TERMINALS_STORAGE_KEY = 'hermes.desktop.terminals.v1'
// Cap a single tab's replayed history so the persisted layout can't blow the
// localStorage quota. Roughly mirrors VS Code's persistentSessionScrollback
// default (100 lines) once the serialized escape codes are counted in.
const MAX_REVIVE_BUFFER_CHARS = 48_000
function sanitizePersistedTerminal(value: unknown): PersistedTerminalEntry | null {
if (!value || typeof value !== 'object' || Array.isArray(value)) {
return null
}
const record = value as Record<string, unknown>
const id = typeof record.id === 'string' ? record.id.trim() : ''
const title = typeof record.title === 'string' ? record.title.trim() : ''
const cwd = typeof record.cwd === 'string' ? record.cwd : ''
const reviveBuffer = typeof record.reviveBuffer === 'string' ? record.reviveBuffer : undefined
if (!id) {
return null
}
return {
auto: typeof record.auto === 'boolean' ? record.auto : true,
cwd,
id,
...(reviveBuffer ? { reviveBuffer } : {}),
title: title || 'Terminal'
}
}
function loadPersistedTerminals(): PersistedTerminalState {
const fallback: PersistedTerminalState = { activeTerminalId: null, terminals: [] }
const raw = readKey(TERMINALS_STORAGE_KEY)
if (!raw) {
return fallback
}
try {
const parsed = JSON.parse(raw) as unknown
if (!parsed || typeof parsed !== 'object' || Array.isArray(parsed)) {
return fallback
}
const record = parsed as Record<string, unknown>
const terminals = Array.isArray(record.terminals)
? record.terminals.map(sanitizePersistedTerminal).filter((term): term is PersistedTerminalEntry => Boolean(term))
: []
const active =
typeof record.activeTerminalId === 'string' && terminals.some(term => term.id === record.activeTerminalId)
? record.activeTerminalId
: (terminals[0]?.id ?? null)
return { activeTerminalId: active, terminals }
} catch {
return fallback
}
}
// Persist synchronously on every change (the app-wide convention — see panes.ts
// / layout.ts). Capturing history this way means a snapshot is already on disk
// well before the renderer tears down, so app quit needs no unload hook.
function persistTerminals(list: readonly TerminalEntry[], activeTerminalId: null | string) {
const terminals = list
.filter(term => term.kind === 'user')
.map(term => ({
auto: term.auto,
cwd: term.cwd,
id: term.id,
...(term.reviveBuffer ? { reviveBuffer: term.reviveBuffer } : {}),
title: term.title
}))
if (!terminals.length) {
writeKey(TERMINALS_STORAGE_KEY, null)
return
}
const active = terminals.some(term => term.id === activeTerminalId) ? activeTerminalId : (terminals[0]?.id ?? null)
writeKey(TERMINALS_STORAGE_KEY, JSON.stringify({ activeTerminalId: active, terminals }))
}
const restored = loadPersistedTerminals()
export const $terminals = atom<readonly TerminalEntry[]>(
restored.terminals.map(term => ({ ...term, kind: 'user' as const }))
)
export const $activeTerminalId = atom<string | null>(restored.activeTerminalId)
$terminals.subscribe(list => persistTerminals(list, $activeTerminalId.get()))
$activeTerminalId.subscribe(active => persistTerminals($terminals.get(), active))
export const $activeTerminal = computed(
[$terminals, $activeTerminalId],
(list, id) => list.find(term => term.id === id) ?? null
)
const newId = () =>
globalThis.crypto?.randomUUID?.() ?? `term-${Date.now().toString(36)}-${Math.random().toString(36).slice(2)}`
/** Append a fresh terminal and focus it. Captures the current cwd once (its only
* tie to session/project state); pass an explicit cwd to override. Returns the id. */
export function createTerminal(cwd: string = $currentCwd.get()): string {
const id = newId()
$terminals.set([...$terminals.get(), { id, title: 'Terminal', auto: true, cwd, kind: 'user' }])
$activeTerminalId.set(id)
return id
}
// Procs we've already surfaced a tab for — so closing an agent tab doesn't
// resurrect it on the next poll while the process is still running.
const surfacedProcs = new Set<string>()
const findByProc = (procId: string) => $terminals.get().find(term => term.procId === procId)
/** Auto-surface an agent background process as a read-only tab — once. Returns
* the tab id, or null if it was already surfaced and the user has since closed it. */
export function ensureAgentTerminal(procId: string, title: string): string | null {
const existing = findByProc(procId)
if (existing) {
return existing.id
}
if (surfacedProcs.has(procId)) {
return null
}
surfacedProcs.add(procId)
const id = newId()
$terminals.set([...$terminals.get(), { id, title: title || 'agent', auto: false, cwd: '', kind: 'agent', procId }])
return id
}
/** Open + focus an agent process's tab (the status-stack link), recreating it if
* the user had closed it. Opens the pane. */
export function openAgentTerminal(procId: string, title: string): void {
surfacedProcs.add(procId)
seedAgentTerminalCommand(procId, title)
let id = findByProc(procId)?.id
if (!id) {
id = newId()
$terminals.set([...$terminals.get(), { id, title: title || 'agent', auto: false, cwd: '', kind: 'agent', procId }])
}
$activeTerminalId.set(id)
setTerminalTakeover(true)
}
/** Guarantee at least one tab exists when the pane opens.
* If a status-stack click already opened an agent tab, don't create a
* second, unrelated user shell just because the pane became visible. */
export function ensureTerminal(): void {
if ($terminals.get().length === 0) {
createTerminal()
}
}
export function selectTerminal(id: string): void {
if ($terminals.get().some(term => term.id === id)) {
$activeTerminalId.set(id)
}
}
/** Move the active tab by `direction` (+1 next / -1 prev), wrapping around. */
export function cycleTerminal(direction: 1 | -1): void {
const list = $terminals.get()
if (list.length < 2) {
return
}
const current = Math.max(
0,
list.findIndex(term => term.id === $activeTerminalId.get())
)
$activeTerminalId.set(list[(current + direction + list.length) % list.length].id)
}
/** Drop a terminal. Focus slides to the neighbor that fills its slot; closing
* the last one closes the whole pane. */
export function closeTerminal(id: string): void {
const list = $terminals.get()
const index = list.findIndex(term => term.id === id)
if (index < 0) {
return
}
const next = list.filter(term => term.id !== id)
$terminals.set(next)
if ($activeTerminalId.get() === id) {
$activeTerminalId.set((next[index] ?? next[index - 1])?.id ?? null)
}
if (!next.length) {
setTerminalTakeover(false)
}
}
/** Close the read-only agent tab mirroring a background process. The agent
* drives this via the desktop-gated `close_terminal` tool → `terminal.close`.
* The process is NOT killed — only the view is dropped; `surfacedProcs` keeps
* it from auto-resurfacing, and the status-stack row can reopen it on demand.
* No-op when no such tab exists. */
export function closeAgentTerminalByProc(procId: string): boolean {
const term = $terminals.get().find(t => t.kind === 'agent' && t.procId === procId)
if (!term) {
return false
}
closeTerminal(term.id)
return true
}
export function closeActiveTerminal(): void {
const id = $activeTerminalId.get()
if (id) {
closeTerminal(id)
}
}
export function closeAllTerminals(): void {
if ($terminals.get().length === 0) {
return
}
$terminals.set([])
$activeTerminalId.set(null)
setTerminalTakeover(false)
}
export function closeOtherTerminals(id: string): void {
const keep = $terminals.get().find(term => term.id === id)
if (keep) {
$terminals.set([keep])
$activeTerminalId.set(keep.id)
}
}
/** Record the latest serialized scrollback for a tab so it can be replayed on
* the next launch. Oversized buffers are tail-trimmed to stay under the storage
* budget; only user tabs ever carry one. */
export function updateTerminalReviveBuffer(id: string, reviveBuffer: string): void {
const capped =
reviveBuffer.length > MAX_REVIVE_BUFFER_CHARS ? reviveBuffer.slice(-MAX_REVIVE_BUFFER_CHARS) : reviveBuffer
$terminals.set(
$terminals.get().map(term => (term.id === id && term.kind === 'user' ? { ...term, reviveBuffer: capped } : term))
)
}
export function renameTerminal(id: string, title: string): void {
const trimmed = title.trim()
$terminals.set(
$terminals.get().map(term => (term.id === id ? { ...term, title: trimmed || term.title, auto: false } : term))
)
}
/** A live terminal reports its resolved shell; adopt it as the label only while
* the user hasn't named the tab themselves. */
export function reportTerminalShell(id: string, shell: string): void {
const name = shell.trim()
if (!name) {
return
}
$terminals.set($terminals.get().map(term => (term.id === id && term.auto ? { ...term, title: name } : term)))
}

View File

@@ -0,0 +1,142 @@
import { FitAddon } from '@xterm/addon-fit'
import { Unicode11Addon } from '@xterm/addon-unicode11'
import { WebLinksAddon } from '@xterm/addon-web-links'
import { WebglAddon } from '@xterm/addon-webgl'
import { Terminal } from '@xterm/xterm'
import { useEffect, useRef } from 'react'
import { useTheme } from '@/themes/context'
import { registerAgentTerminalWriter } from './agent-terminal-stream'
import { makeTerminalReader, registerTerminalReader } from './buffer'
import { resolveSurfaceColor, terminalTheme } from './selection'
// Read-only terminal for an agent background process: a write-only xterm (no PTY,
// no input) fed live by the backend output stream, keyed by process id. Shares
// the user terminal's look so the two read as one surface.
export function useAgentTerminal({ active, id, procId }: { active: boolean; id: string; procId: string }) {
const { renderedMode, theme, themeName } = useTheme()
const hostRef = useRef<HTMLDivElement | null>(null)
const termRef = useRef<Terminal | null>(null)
const webglRef = useRef<WebglAddon | null>(null)
const fitRef = useRef<(() => void) | null>(null)
const surfaceTheme = () => {
const ansi = renderedMode === 'dark' ? (theme.darkTerminal ?? theme.terminal) : theme.terminal
const surface = resolveSurfaceColor('#ffffff')
return { ...terminalTheme(renderedMode, ansi), background: surface, cursorAccent: surface }
}
useEffect(() => {
const host = hostRef.current
if (!host) {
return
}
const term = new Terminal({
allowProposedApi: true,
allowTransparency: false,
convertEol: true,
cursorBlink: false,
disableStdin: true,
fontFamily: "'JetBrains Mono', 'Cascadia Code', 'SF Mono', Menlo, Consolas, monospace",
fontSize: 11,
fontWeight: 'normal',
fontWeightBold: 'bold',
letterSpacing: 0,
lineHeight: 1.12,
minimumContrastRatio: 4.5,
scrollback: 1000,
theme: surfaceTheme()
})
const fit = new FitAddon()
term.loadAddon(fit)
term.loadAddon(new Unicode11Addon())
term.loadAddon(new WebLinksAddon())
term.unicode.activeVersion = '11'
term.open(host)
termRef.current = term
fitRef.current = () => {
if (host.clientWidth > 0 && host.clientHeight > 0) {
try {
fit.fit()
} catch {
// Mid-transition layout — the next observer tick refits.
}
}
}
try {
const webgl = new WebglAddon()
webgl.onContextLoss(() => {
webgl.dispose()
webglRef.current = null
})
term.loadAddon(webgl)
webglRef.current = webgl
} catch {
// No WebGL — xterm falls back to the DOM renderer.
}
fitRef.current()
const observer = new ResizeObserver(() => fitRef.current?.())
observer.observe(host)
// Stream live output straight into the terminal (replays backlog on attach).
const unregister = registerAgentTerminalWriter(procId, chunk => term.write(chunk))
const unregisterReader = registerTerminalReader(id, makeTerminalReader(term))
return () => {
unregister()
unregisterReader()
observer.disconnect()
term.dispose()
termRef.current = null
webglRef.current = null
}
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [])
useEffect(() => {
const term = termRef.current
if (!term) {
return
}
const raf = requestAnimationFrame(() => {
term.options.theme = surfaceTheme()
webglRef.current?.clearTextureAtlas()
})
return () => cancelAnimationFrame(raf)
// eslint-disable-next-line react-hooks/exhaustive-deps
}, [renderedMode, themeName])
// A visibility:hidden xterm doesn't paint — refit + redraw on re-activation.
useEffect(() => {
if (!active) {
return
}
const frame = requestAnimationFrame(() => {
const term = termRef.current
fitRef.current?.()
webglRef.current?.clearTextureAtlas()
term?.refresh(0, term.rows - 1)
// Take focus on activation (parity with the user terminal) so the active
// agent tab holds focus and ⌘W's isFocusWithin('[data-terminal]') routes
// the close to this tab rather than to a preview.
term?.focus()
})
return () => cancelAnimationFrame(frame)
}, [active])
return { hostRef }
}

View File

@@ -1,4 +1,5 @@
import { FitAddon } from '@xterm/addon-fit'
import { SerializeAddon } from '@xterm/addon-serialize'
import { Unicode11Addon } from '@xterm/addon-unicode11'
import { WebLinksAddon } from '@xterm/addon-web-links'
import { WebglAddon } from '@xterm/addon-webgl'
@@ -12,7 +13,7 @@ import { useTheme } from '@/themes/context'
import { $terminalInjection } from '../store'
import { makeTerminalReader, setActiveTerminalReader } from './buffer'
import { makeTerminalReader, registerTerminalReader } from './buffer'
import {
isAddSelectionShortcut,
resolveSurfaceColor,
@@ -20,6 +21,34 @@ import {
terminalSelectionLabel,
terminalTheme
} from './selection'
import { closeTerminal, updateTerminalReviveBuffer } from './terminals'
// How many scrollback lines to serialize for relaunch restore. Mirrors VS Code's
// terminal.integrated.persistentSessionScrollback default; the store caps the
// resulting string so a long line-wrapped buffer can't blow the storage budget.
const PERSISTENT_SESSION_SCROLLBACK = 200
// Leading-edge throttle window for capturing history. The first output after an
// idle gap persists almost immediately (so `cmd; quit` is on disk before the
// renderer tears down), then at most once per window while output streams.
const SNAPSHOT_THROTTLE_MS = 750
// True once the page/app is tearing down (Cmd+Q, Alt+F4, window close, reload).
// App quit kills the PTYs from the main process, which fires onExit in the
// renderer — but React skips effect cleanups on teardown, so the per-instance
// `disposed` flag never flips. Without this guard those teardown exits would call
// closeTerminal() and wipe the persisted terminal list right before relaunch
// reads it. A real `exit`/Ctrl-D still closes the tab (flag stays false).
let appTearingDown = false
if (typeof window !== 'undefined') {
const markTearingDown = () => {
appTearingDown = true
}
window.addEventListener('pagehide', markTearingDown)
window.addEventListener('beforeunload', markTearingDown)
}
type TerminalStatus = 'closed' | 'open' | 'starting'
@@ -65,6 +94,14 @@ function readEscapeSequence(data: string, index: number) {
}
}
// Character-set and other short ESC forms are three bytes (e.g. ESC ( B).
// Treating only ESC+( as a sequence leaves the final selector ("B") as
// printable text, which disarms the initial prompt-gap stripper before it can
// eat the shell's leading newline.
if (['(', ')', '*', '+', '-', '.', '/'].includes(kind) && index + 2 < data.length) {
return data.slice(index, index + 3)
}
return data.slice(index, Math.min(index + 2, data.length))
}
@@ -131,9 +168,49 @@ function stripInitialPromptGap(data: string) {
return prefix
}
// Trim the shell's trailing idle prompt from a serialized snapshot before it's
// persisted. Without it, the saved buffer ends in the old prompt, so the next
// launch replays it directly above the fresh shell's prompt ("double bar"). The
// prompt is the short block after the last blank line (starship's add_newline
// gap); only a short tail is dropped, so real command output is never trimmed and
// configs without that blank line simply keep the historical prompt (no loss).
function cleanReviveSnapshot(serialized: string): string {
const visible = (line: string) => stripEscapeSequences(line).replace(/[\s%]/g, '')
const lines = serialized.split(/\r?\n/)
while (lines.length && visible(lines[lines.length - 1]) === '') {
lines.pop()
}
let lastBlank = -1
for (let i = lines.length - 1; i >= 0; i -= 1) {
if (visible(lines[i]) === '') {
lastBlank = i
break
}
}
// A prompt is a short block; a long tail after the blank is real output, leave it.
if (lastBlank >= 0 && lines.length - 1 - lastBlank <= 3) {
lines.length = lastBlank
}
return lines.join('\r\n')
}
interface UseTerminalSessionOptions {
/** Renderer-side terminal id (the tab handle), used to key the agent reader. */
id: string
cwd: string
/** Only the active tab is visible, owns the agent reader, and runs injections. */
active: boolean
onAddSelectionToChat: (text: string, label?: string) => void
/** Serialized scrollback from the previous session, replayed once on mount. */
reviveBuffer?: string
/** Reports the resolved shell name once the PTY is live (for the tab label). */
onShell?: (shell: string) => void
}
// Bind the palette to the live skin surface so the terminal blends with the app
@@ -232,7 +309,14 @@ function quotePathForShell(path: string, shellName: string): string {
return `'${path.replace(/'/g, "'\\''")}'`
}
export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSessionOptions) {
export function useTerminalSession({
id,
cwd,
active,
onAddSelectionToChat,
reviveBuffer,
onShell
}: UseTerminalSessionOptions) {
// Key off renderedMode (the painted surface type), not resolvedMode (the
// clicked switch) — a skin can keep a light surface in "dark" mode, and we
// must match the surface or the ANSI palette inverts against it. themeName
@@ -249,10 +333,17 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
const termRef = useRef<Terminal | null>(null)
const webglRef = useRef<WebglAddon | null>(null)
const sessionIdRef = useRef<string | null>(null)
// Snapshot the revive buffer once: live snapshots feed updateTerminalReviveBuffer
// and would otherwise re-arm replay on every store-driven re-render.
const initialReviveBufferRef = useRef(reviveBuffer)
const shellNameRef = useRef('shell')
const selectionLabelRef = useRef('')
const selectionRef = useRef('')
const onAddSelectionToChatRef = useRef(onAddSelectionToChat)
const onShellRef = useRef(onShell)
// Re-fit on activation: a tab hidden via display:none has a 0×0 host, so its
// last fit is stale by the time it's shown again.
const fitRef = useRef<(() => void) | null>(null)
const [status, setStatus] = useState<TerminalStatus>('starting')
const [selection, setSelection] = useState('')
const [selectionStyle, setSelectionStyle] = useState<CSSProperties | null>(null)
@@ -260,7 +351,8 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
useEffect(() => {
onAddSelectionToChatRef.current = onAddSelectionToChat
}, [onAddSelectionToChat])
onShellRef.current = onShell
}, [onAddSelectionToChat, onShell])
// Live selection at call time. A redraw-heavy TUI (spinners, clocks) outruns
// onSelectionChange, so trust xterm directly — fall back to the native
@@ -361,16 +453,71 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
})
const fit = new FitAddon()
const serialize = new SerializeAddon()
termRef.current = term
term.loadAddon(fit)
term.loadAddon(serialize)
term.loadAddon(new Unicode11Addon())
term.loadAddon(new WebLinksAddon())
term.unicode.activeVersion = '11'
// Let the GUI chat agent read this pane via the `read_terminal` tool: the
// gateway's terminal.read.request handler serializes the buffer through this.
setActiveTerminalReader(makeTerminalReader(term))
// Replay last session's scrollback before the fresh shell boots. The process
// is NOT revived — a new shell starts one line below the restored history.
// Stripping the boot gap still applies to the live shell output that follows,
// so the fresh prompt lands flush under the restored block.
const initialReviveBuffer = initialReviveBufferRef.current
if (initialReviveBuffer) {
term.write(initialReviveBuffer)
term.write('\r\n')
}
// Capture the buffer on a leading-edge throttle and persist synchronously via
// the store. No unload hook: by the time the user quits, a recent snapshot is
// already on disk (the prior beforeunload-based attempt lost the last output).
let snapshotTimer = 0
let lastSnapshotAt = 0
const persistSnapshot = () => {
if (disposed) {
return
}
lastSnapshotAt = Date.now()
try {
const snapshot = serialize.serialize({ scrollback: PERSISTENT_SESSION_SCROLLBACK })
updateTerminalReviveBuffer(id, cleanReviveSnapshot(snapshot))
} catch {
// Best-effort restore: never let serialization break a live terminal.
}
}
const scheduleSnapshot = () => {
if (snapshotTimer) {
return
}
const elapsed = Date.now() - lastSnapshotAt
if (elapsed >= SNAPSHOT_THROTTLE_MS) {
persistSnapshot()
return
}
snapshotTimer = window.setTimeout(() => {
snapshotTimer = 0
persistSnapshot()
}, SNAPSHOT_THROTTLE_MS - elapsed)
}
cleanup.push(() => {
if (snapshotTimer) {
window.clearTimeout(snapshotTimer)
}
})
const onDragOver = (e: DragEvent) => {
if (!e.dataTransfer || !transferHasDropCandidates(e.dataTransfer)) {
@@ -411,18 +558,9 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
host.removeEventListener('drop', onDrop)
})
// A fresh prompt should sit at the top. Every resize SIGWINCHes the shell,
// which reprints its prompt and can leave stale blank rows above it. While
// the session is pristine (nothing run yet) we ask the shell to clear +
// redraw via Ctrl-L (\f) after the resize settles. Ctrl-L preserves
// multi-line prompts (term.clear() would drop all but the cursor row) and we
// stop the moment real output exists, so command scrollback is never wiped.
let promptPristine = true
let gapCleanupTimer = 0
// While armed, strip leading blank rows so the prompt lands at the very top
// (no starship `add_newline` gap). Re-armed before each Ctrl-L redraw so the
// resize cleanup doesn't reintroduce the blank line.
// While armed, strip leading blank rows so the first prompt lands at the
// very top (no starship `add_newline` gap). Do this only on renderer output:
// never inject Ctrl-L or other cleanup keystrokes into the user's shell.
let stripLeading = true
const armedWrite = (data: string) => {
@@ -451,35 +589,6 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
term.write(next)
}
const scheduleGapCleanup = () => {
if (!promptPristine) {
return
}
if (gapCleanupTimer) {
window.clearTimeout(gapCleanupTimer)
}
gapCleanupTimer = window.setTimeout(() => {
gapCleanupTimer = 0
const id = sessionIdRef.current
if (disposed || !id || !promptPristine) {
return
}
stripLeading = true
void terminalApi.write(id, '\f')
term.clearSelection()
}, 120)
}
cleanup.push(() => {
if (gapCleanupTimer) {
window.clearTimeout(gapCleanupTimer)
}
})
const fitAndResize = () => {
if (disposed || !host.isConnected || host.clientWidth <= 0 || host.clientHeight <= 0) {
return
@@ -496,10 +605,11 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
if (id && (lastSentSize?.cols !== term.cols || lastSentSize?.rows !== term.rows)) {
lastSentSize = { cols: term.cols, rows: term.rows }
void terminalApi.resize(id, { cols: term.cols, rows: term.rows })
scheduleGapCleanup()
}
}
fitRef.current = fitAndResize
// Coalesce ResizeObserver bursts through rAF — running fit.fit()
// synchronously while sibling panes are mid-transition (e.g. file browser
// collapsing to 0px) crashes the WebGL renderer mid texture-atlas rebuild.
@@ -533,12 +643,6 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
const id = sessionIdRef.current
if (id) {
// Once the user submits a line, real output may follow — stop the
// pristine-prompt gap cleanup so we never clear command scrollback.
if (promptPristine && data.includes('\r')) {
promptPristine = false
}
void terminalApi.write(id, data)
}
})
@@ -569,6 +673,7 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
lastSentSize = { cols: term.cols, rows: term.rows }
shellNameRef.current = session.shell || 'shell'
setShellName(session.shell || 'shell')
onShellRef.current?.(session.shell || 'shell')
const initial = term.hasSelection() ? term.getSelection() : ''
selectionRef.current = initial
@@ -577,10 +682,21 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
setStatus('open')
cleanup.push(
terminalApi.onData(session.id, armedWrite),
terminalApi.onExit(session.id, ({ code, signal }) => {
setStatus('closed')
term.write(`\r\n[terminal exited${signal ? `: ${signal}` : code !== null ? `: ${code}` : ''}]\r\n`)
terminalApi.onData(session.id, data => {
armedWrite(data)
scheduleSnapshot()
}),
terminalApi.onExit(session.id, () => {
// Shell exited (`exit` / Ctrl-D / crash) — drop the tab like a real
// terminal. closeTerminal hides the pane when it's the last one.
// Skip if we're tearing down (cleanup disposes the PTY) OR the app
// is quitting/reloading: on quit the main process kills every PTY,
// firing this exit, but React skips the cleanup so `disposed` stays
// false — running closeTerminal here would wipe the persisted tabs
// right before relaunch restores them.
if (!disposed && !appTearingDown) {
closeTerminal(id)
}
})
)
@@ -638,7 +754,7 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
return () => {
disposed = true
cleanup.forEach(run => run())
setActiveTerminalReader(null)
fitRef.current = null
const id = sessionIdRef.current
sessionIdRef.current = null
@@ -654,7 +770,10 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
selectionRef.current = ''
selectionLabelRef.current = ''
}
}, [addSelectionToChat, cwd])
// `id` is stable for the instance's life (keyed by tab id), so listing it
// doesn't re-create the shell — it just satisfies the deps check for the
// closeTerminal(id) call in onExit.
}, [addSelectionToChat, cwd, id])
useEffect(() => {
const term = termRef.current
@@ -677,27 +796,61 @@ export function useTerminalSession({ cwd, onAddSelectionToChat }: UseTerminalSes
return () => cancelAnimationFrame(raf)
}, [activeTheme, themeName])
// Flush a queued command (e.g. a provider-disconnect) into the live session.
// Only active while open; the subscribe fires immediately, so a command set
// before this pane mounted runs as soon as the session is ready. Clearing the
// atom after writing stops a later remount from replaying a stale command.
// Expose this terminal's buffer to the agent's `read_terminal` tool, keyed by
// id. The tab selection (setActiveTerminalId) decides which one it reads, so
// every live terminal stays registered regardless of visibility.
useEffect(() => {
if (status !== 'open') {
return
}
return $terminalInjection.subscribe(command => {
const id = sessionIdRef.current
const term = termRef.current
if (!command || !id) {
return term ? registerTerminalReader(id, makeTerminalReader(term)) : undefined
}, [id, status])
// On (re)activation: a WebGL terminal doesn't paint while visibility:hidden, so
// it reveals a stale/garbled frame. Refit, rebuild the glyph atlas, and force a
// full redraw against the live buffer, then focus.
useEffect(() => {
if (!active || status !== 'open') {
return
}
const frame = requestAnimationFrame(() => {
const term = termRef.current
fitRef.current?.()
webglRef.current?.clearTextureAtlas()
term?.refresh(0, term.rows - 1)
term?.focus()
})
return () => cancelAnimationFrame(frame)
}, [active, status])
// Flush a queued command (e.g. a provider-disconnect) into the live session.
// Only the active tab runs it (so a broadcast doesn't fan out to every shell);
// the subscribe fires immediately, so a command set before this pane mounted
// runs as soon as the session is ready. Cleared after writing so a later
// remount can't replay a stale command.
useEffect(() => {
if (!active || status !== 'open') {
return
}
return $terminalInjection.subscribe(command => {
const sessionId = sessionIdRef.current
if (!command || !sessionId) {
return
}
void window.hermesDesktop?.terminal?.write(id, `${command}\r`)
void window.hermesDesktop?.terminal?.write(sessionId, `${command}\r`)
$terminalInjection.set(null)
termRef.current?.focus()
})
}, [status])
}, [active, status])
return {
addSelectionToChat,

View File

@@ -0,0 +1,65 @@
import { useStore } from '@nanostores/react'
import { useEffect } from 'react'
import { $backgroundStatusBySession } from '@/store/composer-status'
import { seedAgentTerminalCommand, syncAgentTerminalSnapshot } from './agent-terminal-stream'
import { setActiveTerminalId } from './buffer'
import { AgentTerminalInstance, TerminalInstance } from './instance'
import { $activeTerminalId, $terminals, ensureAgentTerminal } from './terminals'
interface TerminalWorkspaceProps {
onAddSelectionToChat: (text: string, label?: string) => void
}
/** The persistent-overlay layer: the stack of live xterm instances (only these
* must stay in the fixed overlay, for the WebGL host). Mount/visibility is owned
* by PersistentTerminal (latched so shells survive hiding); the tab rail and
* new-terminal control live in the pane DOM — see TerminalPaneChrome. */
export function TerminalWorkspace({ onAddSelectionToChat }: TerminalWorkspaceProps) {
const terminals = useStore($terminals)
const activeId = useStore($activeTerminalId)
const background = useStore($backgroundStatusBySession)
// Mirror the tab selection into the agent reader (read_terminal reads it).
useEffect(() => {
const unsubscribe = $activeTerminalId.subscribe(setActiveTerminalId)
return () => {
unsubscribe()
setActiveTerminalId(null)
}
}, [])
// Surface the agent's background processes as read-only tabs (once each).
// Live chunks stream via agent.terminal.output; the process-list snapshot also
// seeds/falls back so the tab never stays blank if the stream races startup.
useEffect(() => {
for (const list of Object.values(background)) {
for (const item of list) {
ensureAgentTerminal(item.id, item.title)
seedAgentTerminalCommand(item.id, item.title)
syncAgentTerminalSnapshot(item.id, item.output ?? '')
}
}
}, [background])
return (
<>
{terminals.map(term =>
term.kind === 'agent' ? (
<AgentTerminalInstance active={term.id === activeId} id={term.id} key={term.id} procId={term.procId!} />
) : (
<TerminalInstance
active={term.id === activeId}
cwd={term.cwd}
id={term.id}
key={term.id}
onAddSelectionToChat={onAddSelectionToChat}
reviveBuffer={term.reviveBuffer}
/>
)
)}
</>
)
}

View File

@@ -1,6 +1,8 @@
import type { QueryClient } from '@tanstack/react-query'
import { type MutableRefObject, useCallback, useEffect, useRef } from 'react'
import { writeAgentTerminalChunk } from '@/app/right-sidebar/terminal/agent-terminal-stream'
import { closeAgentTerminalByProc } from '@/app/right-sidebar/terminal/terminals'
import { readActiveTerminal } from '@/app/right-sidebar/terminal/buffer'
import { translateNow } from '@/i18n'
import {
@@ -900,6 +902,29 @@ export function useMessageStream({
appendReasoningDelta(sessionId, coerceThinkingText(payload?.text), true)
}
if (isActiveEvent) {
setPetActivity({ reasoning: true })
}
} else if (event.type === 'moa.reference') {
// MoA reference-model output — surface as a labelled thinking chunk
// (tagged with the source model) before the aggregator's response, so
// the mixture-of-agents process is visible. Reuses the reasoning
// disclosure rather than introducing a parallel surface.
if (sessionId) {
const label = coerceGatewayText(payload?.label) || 'reference'
const idx = typeof payload?.index === 'number' ? payload.index : undefined
const cnt = typeof payload?.count === 'number' ? payload.count : undefined
const header = idx && cnt ? `◇ Reference ${idx}/${cnt}${label}` : `◇ Reference — ${label}`
const body = coerceThinkingText(payload?.text)
appendReasoningDelta(sessionId, `${header}\n${body}\n\n`, true)
}
if (isActiveEvent) {
setPetActivity({ reasoning: true })
}
} else if (event.type === 'moa.aggregating') {
// Status transition only; the aggregator's reply arrives via the normal
// message stream. No reasoning/transcript mutation here.
if (isActiveEvent) {
setPetActivity({ reasoning: true })
}
@@ -1142,6 +1167,13 @@ export function useMessageStream({
text: result ? JSON.stringify(result) : ''
})
}
} else if (event.type === 'agent.terminal.output') {
// Live chunk from a background process → its read-only agent terminal tab.
writeAgentTerminalChunk(payload?.process_id ?? '', payload?.chunk ?? '')
} else if (event.type === 'terminal.close') {
// Agent closed its own read-only tab via the desktop-gated close_terminal tool.
// The process is untouched — this only drops the view.
closeAgentTerminalByProc(payload?.process_id ?? '')
} else if (event.type === 'status.update') {
if (sessionId && payload?.kind === 'compacting') {
setSessionCompacting(sessionId, true)

View File

@@ -3,9 +3,19 @@ import type { MutableRefObject } from 'react'
import { useEffect } from 'react'
import { afterEach, describe, expect, it, vi } from 'vitest'
import { getSessionMessages } from '@/hermes'
import { getSessionMessages, type SessionInfo } from '@/hermes'
import { createClientSessionState } from '@/lib/chat-runtime'
import { $activeGatewayProfile, $newChatProfile } from '@/store/profile'
import { $currentCwd, $messages, $resumeFailedSessionId, setMessages, setResumeFailedSessionId } from '@/store/session'
import {
$activeSessionId,
$currentCwd,
$messages,
$resumeFailedSessionId,
setActiveSessionId,
setMessages,
setResumeFailedSessionId,
setSessions
} from '@/store/session'
import type { ClientSessionState } from '../../types'
@@ -22,6 +32,25 @@ vi.mock('@/hermes', async importOriginal => ({
const RUNTIME_SESSION_ID = 'rt-new-001'
function storedSession(overrides: Partial<SessionInfo> = {}): SessionInfo {
return {
ended_at: null,
id: 'stored-1',
input_tokens: 0,
is_active: false,
last_active: 1,
message_count: 0,
model: null,
output_tokens: 0,
preview: null,
source: 'desktop',
started_at: 1,
title: 'stored',
tool_call_count: 0,
...overrides
}
}
function Harness({
onReady,
requestGateway
@@ -84,6 +113,7 @@ describe('createBackendSessionForSend profile routing', () => {
cleanup()
$newChatProfile.set(null)
$activeGatewayProfile.set('default')
$currentCwd.set('')
vi.restoreAllMocks()
})
@@ -117,6 +147,14 @@ describe('createBackendSessionForSend profile routing', () => {
expect(params).toMatchObject({ profile: 'default' })
})
it('passes the current workspace cwd into session.create', async () => {
const params = await createWith(() => {
$currentCwd.set('/remote/worktree')
})
expect(params).toMatchObject({ cwd: '/remote/worktree' })
})
})
// ── Resume failure recovery (the "stuck loading session window" bug) ──────────
@@ -126,10 +164,14 @@ describe('createBackendSessionForSend profile routing', () => {
// succeeds must NOT leave the flag armed.
function ResumeHarness({
onReady,
requestGateway
requestGateway,
runtimeIdByStoredSessionIdRef,
sessionStateByRuntimeIdRef
}: {
onReady: (resume: (storedSessionId: string, replaceRoute?: boolean) => Promise<unknown>) => void
requestGateway: <T>(method: string, params?: Record<string, unknown>) => Promise<T>
runtimeIdByStoredSessionIdRef?: MutableRefObject<Map<string, string>>
sessionStateByRuntimeIdRef?: MutableRefObject<Map<string, ClientSessionState>>
}) {
const ref = <T,>(value: T): MutableRefObject<T> => ({ current: value })
@@ -142,10 +184,10 @@ function ResumeHarness({
getRouteToken: () => 'token',
navigate: vi.fn() as never,
requestGateway,
runtimeIdByStoredSessionIdRef: ref(new Map<string, string>()),
runtimeIdByStoredSessionIdRef: runtimeIdByStoredSessionIdRef ?? ref(new Map<string, string>()),
selectedStoredSessionId: null,
selectedStoredSessionIdRef: ref<string | null>(null),
sessionStateByRuntimeIdRef: ref(new Map<string, ClientSessionState>()),
sessionStateByRuntimeIdRef: sessionStateByRuntimeIdRef ?? ref(new Map<string, ClientSessionState>()),
syncSessionStateToView: vi.fn(),
updateSessionState: (_sessionId, updater) => updater({} as ClientSessionState)
})
@@ -160,16 +202,22 @@ function ResumeHarness({
describe('resumeSession failure recovery', () => {
afterEach(() => {
cleanup()
setActiveSessionId(null)
setResumeFailedSessionId(null)
setMessages([])
setSessions([])
vi.restoreAllMocks()
})
async function runResume(
requestGateway: <T>(method: string, params?: Record<string, unknown>) => Promise<T>
requestGateway: <T>(method: string, params?: Record<string, unknown>) => Promise<T>,
options: {
runtimeIdByStoredSessionIdRef?: MutableRefObject<Map<string, string>>
sessionStateByRuntimeIdRef?: MutableRefObject<Map<string, ClientSessionState>>
} = {}
): Promise<void> {
let resume: ((storedSessionId: string, replaceRoute?: boolean) => Promise<unknown>) | null = null
render(<ResumeHarness onReady={r => (resume = r)} requestGateway={requestGateway} />)
render(<ResumeHarness onReady={r => (resume = r)} requestGateway={requestGateway} {...options} />)
await waitFor(() => expect(resume).not.toBeNull())
await resume!('stored-1', true)
}
@@ -281,4 +329,187 @@ describe('resumeSession failure recovery', () => {
expect(resumeParams).not.toHaveProperty('lazy')
expect(resumeParams).not.toHaveProperty('eager_build')
})
it('arms the failure latch when resume succeeds with an empty transcript for a non-empty stored session', async () => {
setSessions([storedSession({ message_count: 4 })])
const requestGateway = vi.fn(async (method: string, params?: Record<string, unknown>) => {
if (method === 'session.resume') {
return { session_id: 'runtime-1', resumed: params?.session_id, messages: [], info: {} } as never
}
return {} as never
})
vi.mocked(getSessionMessages).mockResolvedValue({ messages: [], session_id: 'stored-1' } as never)
await runResume(requestGateway)
expect($resumeFailedSessionId.get()).toBe('stored-1')
expect($activeSessionId.get()).toBeNull()
expect($messages.get()).toEqual([])
})
it('does not reuse an empty cached runtime view for a stored session with history', async () => {
const runtimeIdByStoredSessionIdRef = {
current: new Map([['stored-1', 'runtime-stale']])
} satisfies MutableRefObject<Map<string, string>>
const sessionStateByRuntimeIdRef = {
current: new Map([
[
'runtime-stale',
{
awaitingResponse: false,
branch: '',
busy: false,
cwd: '',
fast: false,
interrupted: false,
messages: [],
model: '',
needsInput: false,
pendingBranchGroup: null,
personality: '',
provider: '',
reasoningEffort: '',
sawAssistantPayload: false,
serviceTier: '',
storedSessionId: 'stored-1',
streamId: null,
turnStartedAt: null,
yolo: false
}
]
])
} satisfies MutableRefObject<Map<string, ClientSessionState>>
setSessions([storedSession({ message_count: 4 })])
const requestGateway = vi.fn(async (method: string, params?: Record<string, unknown>) => {
if (method === 'session.resume') {
return { session_id: 'runtime-1', resumed: params?.session_id, messages: [], info: {} } as never
}
return {} as never
})
vi.mocked(getSessionMessages).mockResolvedValue({
messages: [{ content: 'existing text', role: 'user', timestamp: 1 }],
session_id: 'stored-1'
} as never)
await runResume(requestGateway, {
runtimeIdByStoredSessionIdRef,
sessionStateByRuntimeIdRef
})
expect(requestGateway).not.toHaveBeenCalledWith('session.usage', { session_id: 'runtime-stale' })
expect(runtimeIdByStoredSessionIdRef.current.has('stored-1')).toBe(false)
expect(sessionStateByRuntimeIdRef.current.has('runtime-stale')).toBe(false)
expect($activeSessionId.get()).toBe('runtime-1')
expect($messages.get().length).toBe(1)
})
})
// ── Warm-cache mapping integrity (the "open chat A, chat B loads" bug) ─────────
// resumeSession's warm fast-path maps storedSessionId -> runtimeId -> cached
// state. A reaped/respawned pooled backend re-mints runtime ids, so a recycled
// id can resolve to a live-but-DIFFERENT session's cache entry. The fast-path
// must verify the cached state still BELONGS to the resumed session before it
// paints, or it shows a totally different thread under the current route.
const clientState = (storedSessionId: string | null): ClientSessionState => createClientSessionState(storedSessionId)
describe('resumeSession warm-cache mapping integrity', () => {
afterEach(() => {
cleanup()
setActiveSessionId(null)
setResumeFailedSessionId(null)
setMessages([])
setSessions([])
vi.restoreAllMocks()
})
it('rejects a cross-wired runtime mapping and falls through to a full resume', async () => {
// A recycled runtime id ('rt-recycled') is mapped to 'stored-A', but its
// cached state actually belongs to a DIFFERENT session ('stored-B') — the
// exact "open chat A, chat B loads" corruption a reaped/respawned pooled
// backend can leave behind.
const runtimeIdByStoredSessionIdRef: MutableRefObject<Map<string, string>> = {
current: new Map([['stored-A', 'rt-recycled']])
}
const sessionStateByRuntimeIdRef: MutableRefObject<Map<string, ClientSessionState>> = {
current: new Map([['rt-recycled', clientState('stored-B')]])
}
const requestGateway = vi.fn(async (method: string, params?: Record<string, unknown>) => {
if (method === 'session.resume') {
return { session_id: 'rt-A-fresh', resumed: params?.session_id, messages: [], info: {} } as never
}
return {} as never
})
vi.mocked(getSessionMessages).mockResolvedValue({ messages: [] } as never)
let resume: ((storedSessionId: string, replaceRoute?: boolean) => Promise<unknown>) | null = null
render(
<ResumeHarness
onReady={r => (resume = r)}
requestGateway={requestGateway}
runtimeIdByStoredSessionIdRef={runtimeIdByStoredSessionIdRef}
sessionStateByRuntimeIdRef={sessionStateByRuntimeIdRef}
/>
)
await waitFor(() => expect(resume).not.toBeNull())
await resume!('stored-A', true)
// The fast-path did NOT short-circuit on the cross-wired cache — the full
// resume RPC ran, for the session that was actually requested.
const resumeCalls = requestGateway.mock.calls.filter(([method]) => method === 'session.resume')
expect(resumeCalls.length).toBe(1)
expect(resumeCalls[0][1]).toMatchObject({ session_id: 'stored-A' })
// The corrupt mapping was purged so it can't mis-resolve again.
expect(runtimeIdByStoredSessionIdRef.current.has('stored-A')).toBe(false)
expect(sessionStateByRuntimeIdRef.current.has('rt-recycled')).toBe(false)
})
it('honours a warm cache entry whose stored id matches (no needless refetch)', async () => {
// Correctly-wired mapping: 'rt-A' <-> 'stored-A'. The fast-path should trust
// it and never reach session.resume (only the lightweight usage probe).
const runtimeIdByStoredSessionIdRef: MutableRefObject<Map<string, string>> = {
current: new Map([['stored-A', 'rt-A']])
}
const sessionStateByRuntimeIdRef: MutableRefObject<Map<string, ClientSessionState>> = {
current: new Map([['rt-A', clientState('stored-A')]])
}
const requestGateway = vi.fn(async (method: string) => {
if (method === 'session.usage') {
return { input: 0, output: 0, total: 0 } as never
}
return {} as never
})
let resume: ((storedSessionId: string, replaceRoute?: boolean) => Promise<unknown>) | null = null
render(
<ResumeHarness
onReady={r => (resume = r)}
requestGateway={requestGateway}
runtimeIdByStoredSessionIdRef={runtimeIdByStoredSessionIdRef}
sessionStateByRuntimeIdRef={sessionStateByRuntimeIdRef}
/>
)
await waitFor(() => expect(resume).not.toBeNull())
await resume!('stored-A', true)
// Fast-path served the session from cache: no full resume RPC, mapping intact.
const methods = requestGateway.mock.calls.map(([method]) => method)
expect(methods).not.toContain('session.resume')
expect(runtimeIdByStoredSessionIdRef.current.get('stored-A')).toBe('rt-A')
})
})

View File

@@ -252,6 +252,10 @@ function sessionMatchesStoredId(session: SessionInfo, storedSessionId: string):
return session.id === storedSessionId || session._lineage_root_id === storedSessionId
}
function sessionShouldHaveTranscript(session: SessionInfo | undefined): boolean {
return (session?.message_count ?? 0) > 0
}
function upsertResolvedSession(session: SessionInfo, storedSessionId: string) {
const lineage = session._lineage_root_id ?? session.id
@@ -627,9 +631,34 @@ export function useSessionActions({
// chat view drops the error state and shows the loader again.
setResumeExhaustedSessionId(current => (current === storedSessionId ? null : current))
const warmRuntimeId = runtimeIdByStoredSessionIdRef.current.get(storedSessionId)
// A warm cache entry is only trustworthy when it still BELONGS to the
// session being resumed. A pooled profile backend that gets idle-reaped
// and respawned (pruneSecondaryGateways) re-mints runtime ids, so a
// recycled id can resolve to a live-but-DIFFERENT session's cache entry.
// The session.usage 404 guard below only catches a fully-DEAD id — a
// recycled-live id 200s, so an unchecked hit paints the wrong transcript
// under the current route (the "open chat A, chat B loads" bug). On a
// mismatch the mapping is cross-wired: purge both sides and report a miss
// so the caller falls through to a full resume that rebinds a correct id.
const takeWarmCache = (): { runtimeId: string; state: ClientSessionState } | null => {
const runtimeId = runtimeIdByStoredSessionIdRef.current.get(storedSessionId)
const state = runtimeId ? sessionStateByRuntimeIdRef.current.get(runtimeId) : undefined
if (!warmRuntimeId || !sessionStateByRuntimeIdRef.current.get(warmRuntimeId)) {
if (!runtimeId || !state) {
return null
}
if (state.storedSessionId !== storedSessionId) {
runtimeIdByStoredSessionIdRef.current.delete(storedSessionId)
sessionStateByRuntimeIdRef.current.delete(runtimeId)
return null
}
return { runtimeId, state }
}
if (!takeWarmCache()) {
setActiveSessionId(null)
activeSessionIdRef.current = null
setMessages([])
@@ -648,11 +677,15 @@ export function useSessionActions({
await ensureGatewayProfile(sessionProfile)
const cachedRuntimeId = runtimeIdByStoredSessionIdRef.current.get(storedSessionId)
const cachedState = cachedRuntimeId && sessionStateByRuntimeIdRef.current.get(cachedRuntimeId)
// Re-check after the profile-resolve / gateway-swap awaits above: the
// cache may have changed, and takeWarmCache re-validates belongs-to and
// purges a cross-wired mapping before we trust the fast-path.
const warmHit = takeWarmCache()
if (cachedRuntimeId && cachedState) {
const stored = $sessions.get().find(session => session.id === storedSessionId)
if (warmHit) {
const cachedRuntimeId = warmHit.runtimeId
const cachedState = warmHit.state
const stored = $sessions.get().find(session => sessionMatchesStoredId(session, storedSessionId)) ?? storedForProfile
const cachedViewState =
!cachedState.model && stored?.model != null
@@ -666,41 +699,46 @@ export function useSessionActions({
sessionStateByRuntimeIdRef.current.set(cachedRuntimeId, cachedViewState)
}
setFreshDraftReady(false)
clearNotifications()
setSelectedStoredSessionId(storedSessionId)
selectedStoredSessionIdRef.current = storedSessionId
setActiveSessionId(cachedRuntimeId)
activeSessionIdRef.current = cachedRuntimeId
syncSessionStateToView(cachedRuntimeId, cachedViewState)
setCurrentCwd(cachedViewState.cwd)
setCurrentBranch(cachedViewState.branch)
setSessionStartedAt(Date.now())
try {
const usage = await requestGateway<UsageStats>('session.usage', { session_id: cachedRuntimeId })
if (!isCurrentResume()) {
return
}
if (usage) {
setCurrentUsage(current => ({ ...current, ...usage }))
}
return
} catch {
// The cached runtime id was minted by a prior backend instance. A
// pooled profile backend that gets idle-reaped (pruneSecondaryGateways)
// and respawned across a profile swap mints fresh ids, so this mapping
// now 404s ("session not found"). Drop it and fall through to a full
// resume that rebinds a live runtime id.
if (!isCurrentResume()) {
return
}
if (sessionShouldHaveTranscript(stored) && cachedViewState.messages.length === 0) {
runtimeIdByStoredSessionIdRef.current.delete(storedSessionId)
sessionStateByRuntimeIdRef.current.delete(cachedRuntimeId)
} else {
setFreshDraftReady(false)
clearNotifications()
setSelectedStoredSessionId(storedSessionId)
selectedStoredSessionIdRef.current = storedSessionId
setActiveSessionId(cachedRuntimeId)
activeSessionIdRef.current = cachedRuntimeId
syncSessionStateToView(cachedRuntimeId, cachedViewState)
setCurrentCwd(cachedViewState.cwd)
setCurrentBranch(cachedViewState.branch)
setSessionStartedAt(Date.now())
try {
const usage = await requestGateway<UsageStats>('session.usage', { session_id: cachedRuntimeId })
if (!isCurrentResume()) {
return
}
if (usage) {
setCurrentUsage(current => ({ ...current, ...usage }))
}
return
} catch {
// The cached runtime id was minted by a prior backend instance. A
// pooled profile backend that gets idle-reaped (pruneSecondaryGateways)
// and respawned across a profile swap mints fresh ids, so this mapping
// now 404s ("session not found"). Drop it and fall through to a full
// resume that rebinds a live runtime id.
if (!isCurrentResume()) {
return
}
runtimeIdByStoredSessionIdRef.current.delete(storedSessionId)
sessionStateByRuntimeIdRef.current.delete(cachedRuntimeId)
}
}
}
@@ -714,7 +752,7 @@ export function useSessionActions({
setSelectedStoredSessionId(storedSessionId)
selectedStoredSessionIdRef.current = storedSessionId
setSessionStartedAt(Date.now())
const stored = $sessions.get().find(session => sessionMatchesStoredId(session, storedSessionId))
const stored = $sessions.get().find(session => sessionMatchesStoredId(session, storedSessionId)) ?? storedForProfile
applyStoredSessionPreviewRuntimeInfo(stored)
if (stored) {
@@ -804,6 +842,15 @@ export function useSessionActions({
? currentMessages
: preserveLocalAssistantErrors(preferredMessages, currentMessages)
if (sessionShouldHaveTranscript(stored) && messagesForView.length === 0) {
setActiveSessionId(null)
activeSessionIdRef.current = null
setResumeFailedSessionId(storedSessionId)
resumedRunning = false
return
}
setActiveSessionId(resumed.session_id)
activeSessionIdRef.current = resumed.session_id
const runtimeInfo = applyRuntimeInfo(resumed.info)

View File

@@ -213,7 +213,7 @@ export function SettingsView({ gateway, onClose, onConfigSaved, onMainModelChang
</div>
</OverlaySidebar>
<OverlayMain className="px-0 pb-0 pt-[calc(var(--titlebar-height)+1rem)]">
<OverlayMain className="px-0 pb-0 pt-[calc(var(--titlebar-height)/2+1rem)]">
{activeView === 'config:appearance' ? (
<AppearanceSettings />
) : activeView === 'about' ? (

View File

@@ -78,6 +78,12 @@ const AUX_TASKS: readonly AuxTaskMeta[] = [
const NO_PROVIDERS: readonly ModelOptionProvider[] = [{ name: '—', slug: '', models: [] }]
// Radix <Select> renders a blank trigger when `value` matches no <SelectItem>.
// A custom model (e.g. one added via config that isn't in the provider's
// curated list) would vanish — surface the active value so it stays selectable.
export const withActive = (models: readonly string[], active: string): readonly string[] =>
active && !models.includes(active) ? [active, ...models] : models
interface StaleAuxWarningProps {
applying: boolean
onReset: () => void
@@ -555,7 +561,7 @@ export function ModelSettings({ onMainModelChanged }: ModelSettingsProps) {
<SelectValue placeholder={m.model} />
</SelectTrigger>
<SelectContent>
{(selectedProviderModels.length ? selectedProviderModels : []).map(model => (
{withActive(selectedProviderModels, selectedModel).map(model => (
<SelectItem key={model} value={model}>
{model}
</SelectItem>
@@ -708,7 +714,7 @@ export function ModelSettings({ onMainModelChanged }: ModelSettingsProps) {
<SelectValue placeholder={m.model} />
</SelectTrigger>
<SelectContent>
{(auxDraftProviderModels.length ? auxDraftProviderModels : []).map(model => (
{withActive(auxDraftProviderModels, auxDraft.model).map(model => (
<SelectItem key={model} value={model}>
{model}
</SelectItem>
@@ -880,7 +886,7 @@ export function ModelSettings({ onMainModelChanged }: ModelSettingsProps) {
<SelectValue placeholder={m.model} />
</SelectTrigger>
<SelectContent>
{modelsForProvider(slot.provider).map(model => (
{withActive(modelsForProvider(slot.provider), slot.model).map(model => (
<SelectItem key={model} value={model}>
{model}
</SelectItem>
@@ -957,7 +963,10 @@ export function ModelSettings({ onMainModelChanged }: ModelSettingsProps) {
<SelectValue placeholder={m.model} />
</SelectTrigger>
<SelectContent>
{modelsForProvider(currentMoaPreset.aggregator.provider).map(model => (
{withActive(
modelsForProvider(currentMoaPreset.aggregator.provider),
currentMoaPreset.aggregator.model
).map(model => (
<SelectItem key={model} value={model}>
{model}
</SelectItem>

View File

@@ -0,0 +1,34 @@
import { describe, expect, it } from 'vitest'
import { withActive } from './model-settings'
// A Radix <Select> shows a blank trigger when its `value` matches no
// <SelectItem>. `withActive` guarantees the controlled value is always
// representable so a config-only / custom model never renders blank.
describe('withActive', () => {
const curated = ['hermes-4', 'hermes-4-mini']
it('prepends a custom model missing from the curated list', () => {
expect(withActive(curated, 'anthropic/claude-opus-4.7')).toEqual([
'anthropic/claude-opus-4.7',
...curated
])
})
it('leaves the list untouched when the active model is already curated', () => {
expect(withActive(curated, 'hermes-4')).toEqual(curated)
})
it('does not inject an empty active value', () => {
expect(withActive(curated, '')).toEqual(curated)
})
it('surfaces the active model even when the curated list is empty', () => {
expect(withActive([], 'anthropic/claude-opus-4.7')).toEqual(['anthropic/claude-opus-4.7'])
})
it('keeps the active model selectable as the invariant', () => {
const out = withActive(curated, 'custom/model')
expect(out).toContain('custom/model')
})
})

View File

@@ -192,7 +192,7 @@ export function AppShell({
{nativeOverlayWidth > 0 && (
<div
aria-hidden
className="pointer-events-none fixed inset-x-0 top-0 z-[4] h-(--titlebar-height) border-b border-(--ui-stroke-tertiary) bg-(--ui-chat-surface-background)"
className="pointer-events-none fixed right-0 top-0 z-[4] h-(--titlebar-height) w-(--titlebar-tools-right) border-b border-(--ui-stroke-tertiary) bg-(--ui-chat-surface-background)"
/>
)}

View File

@@ -0,0 +1,147 @@
import { useEffect, useMemo, useState } from 'react'
import { useI18n } from '@/i18n'
import { formatK } from '@/lib/statusbar'
import { cn } from '@/lib/utils'
import type { ContextBreakdown, ContextUsageCategory, UsageStats } from '@/types/hermes'
interface ContextUsagePanelProps {
currentUsage: UsageStats
requestGateway: <T = unknown>(method: string, params?: Record<string, unknown>) => Promise<T>
sessionId: string | null
}
export function ContextUsagePanel({ currentUsage, requestGateway, sessionId }: ContextUsagePanelProps) {
const { t } = useI18n()
const copy = t.shell.statusbar.contextUsagePanel
const [breakdown, setBreakdown] = useState<ContextBreakdown | null>(null)
const [loading, setLoading] = useState(false)
useEffect(() => {
if (!sessionId) {
setBreakdown(null)
setLoading(false)
return
}
let cancelled = false
setLoading(true)
void requestGateway<ContextBreakdown>('session.context_breakdown', { session_id: sessionId })
.then(data => {
if (!cancelled) {
setBreakdown(data)
}
})
.catch(() => {
if (!cancelled) {
setBreakdown(null)
}
})
.finally(() => {
if (!cancelled) {
setLoading(false)
}
})
return () => {
cancelled = true
}
}, [requestGateway, sessionId])
const contextMax = breakdown?.context_max ?? currentUsage.context_max ?? 0
const contextUsed = breakdown?.context_used ?? currentUsage.context_used ?? 0
const contextPercent = Math.max(
0,
Math.min(100, Math.round(breakdown?.context_percent ?? currentUsage.context_percent ?? 0))
)
const categories = useMemo(
() =>
(breakdown?.categories ?? []).map(category => ({
...category,
label: copy.categories[category.id as keyof typeof copy.categories] ?? category.label
})),
[breakdown?.categories, copy.categories]
)
const segmentTotal = categories.reduce((sum, category) => sum + category.tokens, 0) || contextUsed || 1
return (
<div className="flex w-72 flex-col gap-3 p-3 text-[0.75rem]" data-slot="context-usage-panel">
<div className="flex items-baseline justify-between gap-2">
<p className="font-medium text-foreground">{copy.title}</p>
<span className="text-[0.6875rem] text-muted-foreground">
{copy.tokenSummary(`~${formatK(contextUsed)}`, formatK(contextMax))}
</span>
</div>
<p className="text-[0.6875rem] text-foreground">{copy.percentFull(contextPercent)}</p>
<ContextUsageBar categories={categories} segmentTotal={segmentTotal} />
<ul className="flex flex-col gap-1.5">
{categories.map(category => (
<li className="flex items-center justify-between gap-2" key={category.id}>
<span className="flex min-w-0 items-center gap-2">
<span
className="size-2 shrink-0 rounded-[2px]"
style={{ background: category.color }}
/>
<span className="truncate text-muted-foreground">{category.label}</span>
</span>
<span className="shrink-0 tabular-nums text-foreground">{formatCategoryTokens(category.tokens)}</span>
</li>
))}
</ul>
{loading && <p className="text-[0.6875rem] text-muted-foreground">{copy.loading}</p>}
{!loading && !categories.length && <p className="text-[0.6875rem] text-muted-foreground">{copy.empty}</p>}
</div>
)
}
function ContextUsageBar({
categories,
segmentTotal
}: {
categories: readonly ContextUsageCategory[]
segmentTotal: number
}) {
return (
<div
className={cn(
'flex h-1.5 overflow-hidden rounded-full',
categories.length ? 'bg-(--ui-stroke-tertiary)' : 'dither bg-(--ui-bg-elevated)'
)}
data-slot="context-usage-bar"
>
{categories.map(category => (
<span
className="h-full min-w-px"
key={category.id}
style={{
background: category.color,
width: `${(category.tokens / segmentTotal) * 100}%`
}}
/>
))}
</div>
)
}
function formatCategoryTokens(value: number): string {
if (!Number.isFinite(value) || value <= 0) {
return '0'
}
if (value >= 1_000) {
return `${formatK(value)}`
}
return value.toLocaleString()
}

Some files were not shown because too many files have changed in this diff Show More