Compare commits

..

951 Commits

Author SHA1 Message Date
Brooklyn Nicholson
f3ce17bf9e refactor(desktop): extract filesystem IPC handlers from main.cjs into fs-ipc.cjs
Second main.cjs cluster peel (after git-ipc). The six hermes:fs:* handlers
(readDir, gitRoot, reveal, rename, writeText, trash) move verbatim into
electron/fs-ipc.cjs behind a registerFsIpc({ ipcMain, directoryExists,
expandUserPath }) registrar — same injection pattern as registerGitIpc. Path
hardening / read-dir / git-root come from their sibling modules directly; the
two main-process path helpers are injected so the module stays side-effect free.

Channel names are unchanged, so preload + renderer are untouched. main.cjs drops
~85 lines; the now-dead fs-read-dir / git-root requires in main.cjs are removed.
Adds electron/fs-ipc.test.cjs asserting the hermes:fs:* surface by invariant.
2026-06-30 13:26:53 -05:00
Brooklyn Nicholson
b29bb6ef9d refactor(desktop): assert git-ipc surface by invariant, drop channel snapshot 2026-06-30 02:05:07 -05:00
Brooklyn Nicholson
025c8f0604 refactor(desktop): extract git IPC handlers from main.cjs into git-ipc.cjs
electron/main.cjs is the worst god file in the desktop app (~7.6k lines, 93 IPC
handlers across unrelated domains). Begin peeling cohesive handler clusters into
sibling modules — the established main.cjs pattern.

First cluster: the 19 git/worktree/review IPC handlers (all thin delegators to
the existing git-*-ops modules) move into a new electron/git-ipc.cjs exposing
registerGitIpc({ ipcMain, resolveGitBinary, resolveGhBinary }). The git/gh
binary resolvers stay in main.cjs (Windows PATH discovery) and are injected, so
the new module is pure. Channel names are unchanged, so preload/renderer are
unaffected.

Adds electron/git-ipc.test.cjs (wired into test:desktop:platforms) asserting
the full channel surface and resolver delegation. main.cjs: 7,617 -> 7,530.
2026-06-30 01:42:33 -05:00
brooklyn!
a81c5922a2 Merge pull request #55413 from NousResearch/bb/pre-stop-hook
feat(agent): add pre_verify hook and coding guidance config
2026-06-30 01:10:08 -05:00
Brooklyn Nicholson
821d9f709f feat(agent): add configurable coding_instructions
agent.coding_instructions (a string or list) is appended to the coding brief as
its own stable system block, so users can pin project-wide workflow rules
without editing the shipped brief. Coding-posture only and cache-safe (resolved
once per session; takes effect next session). Empty by default.
2026-06-30 00:59:59 -05:00
Brooklyn Nicholson
a10113658b feat(agent): add pre_verify hook and verify-on-stop coding guidance
Add a `pre_verify` user/plugin/shell hook fired once per turn when the agent
edited code and is about to finish, after the existing verify-on-stop guard. A
hook can keep the agent going one more turn (run a check, defer it, tidy the
diff) by returning {"action":"continue","message":...} (the Claude-Code Stop
shape {"decision":"block","reason":...} is accepted too). Hooks receive coding,
attempt, final_response, and sorted changed_paths so they can self-scope and
self-throttle; the path is bounded by agent.max_verify_nudges and preserves
message-role alternation.

Hermes still ships its default coding guidance (agent.verify_guidance, on by
default), but it now rides the evidence-based verify-on-stop missing-evidence
nudge instead of a separate default pre_verify continuation, so it costs no
extra model turn of its own. Guidance reuses the shared utils.is_truthy_value
parser rather than a local copy.
2026-06-30 00:59:29 -05:00
beardthelion
14c4a849b7 fix(kanban): make goal_mode judge gate truly fail-open
Follow-up to the judge gate. judge_goal() is fail-open at the source:
when no auxiliary model is reachable it returns a "continue" verdict
that is indistinguishable from a real "not done yet" judgment. The gate
treated any non-"done" verdict as a rejection, so an unconfigured or
degraded auxiliary model would wedge every goal_mode worker — it could
never close its own task. That contradicted the gate's own "fail-open"
comment.

Probe judge availability before enforcing (the same auxiliary client
lookup judge_goal performs) and only gate when a judge is actually
reachable. When none is, completion proceeds.

Also fix the rejection guidance: kanban_create takes parents=[...], not
parent=.

Add test_complete_goal_mode_allows_when_judge_unavailable covering the
fail-open path; update the rejection test to force the availability probe.
2026-06-29 22:20:19 -07:00
beardthelion
b3c1b3b3f3 fix(kanban): address review feedback on goal_mode judge gate
Apply naqerl's review comments on PR #38388:

- Hoist `from hermes_cli.goals import judge_goal` to module-level
  imports so an import failure surfaces at module init, not lazily
  on the first goal-mode completion (no circular import: hermes_cli
  package init is trivial and does not load tools.kanban_tools).
- Narrow the fail-open `try` to wrap only the judge_goal() call.
  The verdict check and its rejection `return tool_error(...)` now
  live outside the handler, so a failure there can no longer be
  swallowed by the broad except.
- Pass `exc_info=True` to the logger.warning call per CONTRIBUTING.md.

Update the test mock target to tools.kanban_tools.judge_goal, since
the hoisted import rebinds the name into this module's namespace.
2026-06-29 22:20:19 -07:00
beardthelion
0b33bc5396 fix(kanban): gate goal_mode task completion with auxiliary judge
Prevents workers in goal_mode from bypassing the auxiliary judge by
calling kanban_complete before acceptance criteria are met. The tool
handler now synchronously invokes the goal judge against the task's
title/body and the completion summary. If the verdict is not "done",
the completion is rejected with actionable guidance for the agent.

This keeps kanban_db.py as a pure SQLite wrapper while intercepting
the bypass exactly at the agent tool-call boundary, aligning with
Hermes separation of concerns.

Fixes #38367

Co-authored-by: CommandCodeBot <noreply@commandcode.ai>
2026-06-29 22:20:19 -07:00
brooklyn!
972b162090 Merge pull request #55410 from NousResearch/bb/install-theme-dedupe
feat(desktop): flag already-installed themes in the install pickers
2026-06-29 23:34:10 -05:00
Brooklyn Nicholson
04639adace feat(desktop): flag already-installed themes in the install pickers
The Cmd-K "Install theme…" palette listed Marketplace themes with no hint
that you already had them, and clicking one re-downloaded + re-installed a
theme you owned. The Appearance settings grid already detected this, but by
parsing theme descriptions inline on every render — plumbing that never made
it to the palette.

Lift it into one reactive source and reuse it everywhere:
- $marketplaceInstalls (computed over $userThemes): extensionId -> installed
  theme, derived once via marketplaceIdOf and memoized, instead of rebuilding
  a Set per render.
- Both install surfaces now mark owned rows installed and, on click,
  re-activate the installed theme rather than re-fetching it.
- Drops the duplicated description-parsing in settings and the per-session
  "installed here" state in both surfaces (the store is the source of truth,
  so previously-installed themes show correctly too).
2026-06-29 23:29:20 -05:00
Ben Barclay
05ac16778b feat(gateway): per-platform typing_indicator toggle
Add a generic per-platform PlatformConfig.typing_indicator flag (default
True) that gates the _keep_typing refresh loop in
_process_message_background. When false, the loop is never spawned, so no
typing/"is thinking…" status is shown on that platform — message delivery
is otherwise unchanged.

Mirrors the gateway_restart_notification contract exactly: dataclass field
+ to_dict/from_dict (with extra-fallback resolution) + shared-key bridge in
load_gateway_config, so 'slack: typing_indicator: false' under platforms
works without a separate block. Generic by design — the same key works for
every platform (Slack 'is thinking…', Telegram/Discord/Signal typing).

Motivated by users who find Slack's assistant 'is thinking…' status noisy
(it also briefly disables the compose box, via the Assistant API).
2026-06-29 21:12:57 -07:00
teknium1
463b1dfa9c fix(container-boot): also autostart a gateway stranded in 'degraded'
degraded is the same wedge class as draining: the gateway came up with
some platforms queued for retry, fell through to the running state
(gateway/run.py #5196), and is serving. A hard-kill there strands
gateway_state=degraded, which (like draining) is not in _AUTOSTART_STATES
and is not an operator stop or a failed boot — so it would stay DOWN
forever on every recreate. Add degraded to _TRANSIENT_RUNNING_STATES so
the fallback path normalises it to running-intent too.
2026-06-29 21:12:36 -07:00
Ben
d3f2931b8c fix(container-boot): autostart a gateway stranded in 'draining' state
A gateway hard-killed while draining (a container/VM recreate SIGTERMs it
before _stop_impl reaches its terminal-state persist) leaves
gateway_state.json frozen at 'draining'. With no explicit desired_state to
fall back to, container_boot read that transient value literally, found it
not in _AUTOSTART_STATES, and left the gateway DOWN on every subsequent
boot — dashboard up, messaging silently dark. Observed on a relay-opted-in
staging instance (2026-06): the s6 gateway-default slot kept its 'down'
marker across recreates and the gateway never came back.

'draining' is a transient sub-state of RUNNING (written by the drain
watcher / scale-to-zero go-dormant path), never an operator stop and never
a failed boot. Normalise it to 'running' in the gateway_state fallback so a
stranded drain marker reads as the run-intent it represents. This extends
gateway/run.py's #42675 handling (persist 'running' on an unexpected signal)
to the case where the gateway died before persisting anything at all.

'starting'/'startup_failed' are deliberately NOT normalised — those mean a
mid-boot death and must stay down to avoid the crash-loop the down-marker
guard prevents. An explicit desired_state still wins verbatim, so an
operator stop survives a transient 'draining' runtime value.

Tests: draining named-profile + default-root autostart (both fail without
the fix), plus a guard that an explicit desired_state=stopped still blocks a
draining runtime.
2026-06-29 21:12:36 -07:00
Teknium
d4c14011eb feat(claude-design): add surface-first conditioning + slop diagnostic (#55399)
Port the two genuinely-novel ideas from Command Code's /design skill into
our existing claude-design skill (skill-only, zero model-tool footprint):

- Surface-First: commit to one of 7 surface archetypes (Monitor/Operate/
  Compare/Configure/Decide/Explore/Command) before any visual tokens. Most
  AI design slop is compositional, not cosmetic — conditioning generation on
  a surface choice collapses entropy the way a CoT step does. Workflow step 3.
- Slop Diagnostic: the ~10 tells that account for ~90% of the 'this is AI'
  signal, as a score-out-of-10 self-audit. Diagnose-then-treat: the report is
  context not a to-do list; repair only what fired, matched to the tell
  (re-layout vs recolor vs de-decorate). Workflow step 7 (Verify).

Did NOT clone /design's 16-mode CLI, proprietary reference corpus, or make it
a core tool. Docs page regenerated via generate-skill-docs.py.
2026-06-29 21:12:29 -07:00
teknium1
5a3d7fb99d fix(xai): suppress false-positive windows-footgun on binary image read
open(..., "rb") is binary mode and needs no encoding=; the checker's
regex doesn't recognize the mode. Add the documented suppression comment.
2026-06-29 21:11:58 -07:00
Jaaneek
9ce79cd642 feat(xai): Imagine public-URL storage, chaining & video edit/extend
Add durable public-URL output and URL-based chaining to xAI Grok Imagine:

- Store generated media on files-cdn with permanent public HTTPS URLs
  (public_url: true, no expiry by default).
- Chain by URL: generate -> edit -> extend each take a prior result's
  public HTTPS URL (or a data URI / local file for inputs).
- Add provider-specific xai_video_edit and xai_video_extend tools.
- Image generation: public-URL/storage output, multi-reference edits,
  and ~/ local-path support for image edits.

Credentials use xAI Grok device-code OAuth (separate PR).
2026-06-29 21:11:58 -07:00
Ben
184c10cf97 fix(slack): warn when configured token is a user token, not a bot token
A Slack user/legacy token (xoxp-...) makes auth.test resolve to the
installing human's member ID with no bot_id, so the adapter binds its
identity (_bot_user_id / _team_bot_user_ids) to that human. Every
"is this the bot?" check then misfires: that person's <@...> mentions
wake the bot and are stripped as the bot's own mention, so the agent is
genuinely told it was @mentioned and replies to messages merely
addressed to that human (symptom: bot responds to "@trevor ..." and
insists it was explicitly mentioned).

There is no runtime API error to catch — a user token still
sends/receives — so the only detectable moment is connect time. Add a
warning-only nudge (_warn_if_not_bot_token) alongside the existing
group-DM scope nudge: when auth.test resolves a user_id but no bot_id,
log that the token is a user token and to use the xoxb-... Bot User
OAuth Token. Warning-only: does not block a working-but-misconfigured
install. Fires once per workspace per process.
2026-06-29 20:57:43 -07:00
brooklyn!
33d044c3af Merge pull request #55400 from NousResearch/bb/pet-roam-calmer
feat(desktop): calmer, more realistic pet roam + split roam modules
2026-06-29 22:55:30 -05:00
Brooklyn Nicholson
30cd39dc56 refactor(desktop): collapse stroll-direction coin to a single draw
DRY: the roomier-side bias computed its probability two ways
(STROLL_TOWARD_ROOM and 1 - STROLL_TOWARD_ROOM). One draw XNOR'd against
the roomier side says the same thing more plainly.
2026-06-29 22:53:38 -05:00
Brooklyn Nicholson
b7322f946d feat(desktop): calmer, more realistic pet roam + split roam modules
The floating pet wandered almost constantly: every idle beat picked a new
walk and hops fired ~45% of the time, so it read as nervous rather than
alive. Make movement the exception, not the default, and split the
overgrown roam hook into focused modules.

Behavior (per ambient game-AI: GameAIPro ch.36 + idle/wander state
machines):
- Loaf, don't pace: most decision beats just keep resting (REST_CHANCE
  0.62) instead of always re-walking.
- Memoryless dwell: pauses now draw from an exponential distribution
  (mostly short rests, the occasional long loaf) instead of a uniform
  1.8-5.2s window, so the cadence never reads as a metronome.
- Hops dialed back 0.45 -> 0.2 (the jumpiest, noisiest motion).

Structure (no god-file; a hook should own one narrow job):
- roam-behavior.ts - what to do & when (dwellMs, chooseMove,
  pickStrollTarget) + tuning. Pure, rng-injectable.
- roam-geometry.ts - where it can stand (snapshotLedges, overlayLedge,
  resolveLedge, overlapsX, groundTop). DOM measurement + pure ledge math.
- use-pet-roam.ts - the physics/RAF loop only.

Tests: deterministic, rng-seeded unit coverage for the decision + geometry
helpers (behavior contracts, not snapshots).
2026-06-29 22:51:09 -05:00
Teknium
6aefc9d925 feat(gateway): show per-category context breakdown in /usage (#55204)
Channel users get the same context split the desktop popover shows
(PR #54907) — system prompt, tools, rules, skills, MCP, subagents,
memory, conversation — under the existing Context line in /usage.

Reuses agent.context_breakdown.compute_session_context_breakdown, so
there is no new tool and no new engine. The slices are estimates
(chars/4) and the block is labelled _(estimated)_; the headline
Context line keeps using the provider-measured last_prompt_tokens.
Rendering is fail-open: any engine error returns no breakdown and the
rest of /usage is unaffected.

- gateway/slash_commands.py: _context_breakdown_lines() helper + wire
  into _handle_usage_command
- locales/*.yaml: breakdown_header, breakdown_line, and 8 category
  labels across all 16 locales (parity gate)
- tests/gateway/test_usage_command.py: render + fail-open coverage
2026-06-29 20:42:19 -07:00
Ben Barclay
53a75f147f feat(dashboard_auth): support confidential clients (client_secret) in self-hosted OIDC (#55344)
The self-hosted OIDC dashboard provider was public-client + PKCE only, with
two `# TODO(confidential-client)` seams. Authentik and Keycloak commonly
default a new OIDC client to *confidential*, whose token endpoint rejects an
unauthenticated exchange (`invalid_client`) — so a self-hoster who accepts
their IDP's default could not complete dashboard login without manually
flipping the client to public.

Add optional confidential-client support:

- New optional `client_secret` (env `HERMES_DASHBOARD_OIDC_CLIENT_SECRET`,
  or `dashboard.oauth.self_hosted.client_secret`; env-wins-config, empty
  treated as unset). It is a credential, so docs steer operators to the
  `.env` file; config.yaml is supported only for precedence symmetry.
- `_token_endpoint_auth()` selects `client_secret_basic` (HTTP Basic header)
  vs `client_secret_post` (form body) from the IDP's advertised
  `token_endpoint_auth_methods_supported`, defaulting to basic (the OIDC
  default) when absent. Applied to complete_login, refresh_session, and
  revoke_session (RFC 7009 §2.1).
- PKCE is sent in BOTH modes — the secret is client authentication layered
  on top, never a replacement (OAuth 2.1 / RFC 9700 keep PKCE mandatory).
- Basic header url-encodes client_id/secret before base64 per RFC 6749
  §2.3.1, so reserved chars (`:`, `@`, space) round-trip correctly.

Non-breaking: with no secret configured the provider is a pure public PKCE
client, byte-identical to prior behaviour (no Authorization header, no
client_secret in the body). The secret is never logged — register() reports
only a `confidential=<bool>` flag.

Tests: 16 new cases covering basic/post selection, default-when-absent,
public-unchanged contract, PKCE-preserved, reserved-char url-encoding,
blank-secret-is-public, refresh + revoke auth, no-secret-in-logs, and
env/config register wiring. Full dashboard-auth suite (nous provider,
middleware, gate, cookies, WS, 401-reauth, status endpoint) — 396 tests —
green, proving no existing auth path regressed.
2026-06-30 13:32:51 +10:00
Teknium
481caa66f2 feat(display): friendly human-phrased tool labels for built-in tools (#55166)
* feat(display): friendly human-phrased tool labels for built-in tools

Built-in tools now render ChatGPT-style status verbs ('Searching the web
for ...', 'Reading <file>', 'Browsing <url>') on the CLI spinner and
gateway/desktop tool-progress instead of the raw tool name.

- agent/display.py: _TOOL_VERBS map + build_tool_label() + set/get
  friendly-labels flag (default on). Custom/plugin/MCP tools fall back to
  the raw preview; verbose gateway mode left untouched (debug surface).
- tool_executor.py / tui_gateway / gateway: route the three spinner sites,
  the TUI _tool_ctx, and the gateway all/new progress line through the label.
- config: display.friendly_tool_labels (default True, per-platform aware).

Zero new core tool / schema footprint — pure display layer.

* docs: add PR infographic for friendly tool labels

* fix(display): preserve arg preview in gateway friendly labels + update tests

The first gateway pass re-derived the label from the callback's `args`, which
is empty ({}) at the gateway tool.started callsite — the command/query lives in
the `preview` string, so terminal rendered as a bare '💻 Running' and dedup
collapsed consecutive commands. Now the gateway prefixes the verb onto the
already-computed preview via get_tool_verb/tool_verb_connector/verb_drops_preview,
preserving the command/url/query. CLI spinner path (real args) keeps build_tool_label.

Tests: update test_run_progress_topics exact-format assertions to the friendly
form ('💻 Running pwd'), add a format-agnostic preview extractor for the
truncation tests (works for both quoted-legacy and verb-prefixed output).

* test(tui): update resume-display context to friendly tool label

_tool_ctx now uses build_tool_label, so the desktop resume-view context for a
search_files turn reads 'Searching files for resume' instead of the bare
'resume' preview — consistent with live tool-progress. Update the assertion.

* test(tui): harden no-race worker test against sibling shard leakage

test_session_create_no_race_keeps_worker_alive flaked under -j 8: a daemon
build thread leaked from a prior session.create test in the same shard process
fires close/unregister against its own (foreign) session_key after this test
patches the global approval hooks, polluting the captured lists. Scope the
assertions to this session's own session_key so the regression intent
(this session's worker/notify must survive) is preserved while the test
becomes immune to shard composition. Not related to friendly-tool-labels.
2026-06-29 20:31:17 -07:00
ethernet
41c85fb946 fix(agents.md): fix documentation on subprocess isolation in tests 2026-06-29 19:17:04 -07:00
ethernet
cca8b4ef4e fix(ci): unify amd64/arm64 docker pipelines 2026-06-29 19:17:04 -07:00
ethernet
66ba9e06d9 change(ci): remove lint PR comment
it's already in the job summary.
having it as a comment just makes people ignore it. don't waste sapce.
2026-06-29 19:07:00 -07:00
ethernet
808ba82125 feat(ci): add CI timing report 2026-06-29 19:07:00 -07:00
Ben Barclay
3a55f66602 refactor(relay): adopt scope_id wire key (guild_id → scope_id dual-read/write) (#55289)
Gateway half of relay-platform-parity Phase 2.5 (D-Q2.5). The relay wire's
platform-neutral scope discriminator is renamed guild_id → scope_id; this is the
hermes-agent side of the cross-repo wire-compatible migration.

- SessionSource: scope_id is canonical; guild_id kept as @deprecated alias.
  __post_init__ mirrors the two so all existing SessionSource(guild_id=...)
  constructors across native adapters keep working unchanged. to_dict dual-WRITES
  scope_id+guild_id; from_dict dual-READS scope_id ?? guild_id.
- relay/adapter.py: capture + outbound metadata dual-read/write scope_id.
- relay/ws_transport.py: _frame_to_event dual-reads scope_id ?? guild_id.
- docs/relay-connector-contract.md: document scope_id (canonical) + guild_id
  (deprecated alias) in the §3 SessionSource field table (conformance test).

250 relay+session+contract tests green. Solo lane (relay).
2026-06-30 11:16:53 +10:00
Joey Kerper
f3d2dfbec6 fix(dashboard_auth): allow any http:// host in self-hosted OIDC redirect_uri (#55099)
The self-hosted OIDC dashboard login rejected any http:// redirect_uri
whose host was not localhost/127.0.0.1, surfacing "redirect_uri may only use http:// for localhost/127.0.0.1" before reaching the IDP. This broke self-hosted dashboards reached over plain HTTP (including LAN IPs, internal hostnames, and reverse proxies that terminate TLS upstream).

#38827 already dropped this check from the nous provider, but the generic self-hosted provider  copied the old localhost-only
branch and reintroduced the bug for HERMES_DASHBOARD_OIDC_ISSUER setups.

The IDP's own allowlist is authoritative on which redirect_uris are
permitted; this client-side _validate_redirect_uri is only a fast-fail for
obvious operator error and should not second-guess valid http:// deployments.

Fix: drop the localhost-only branch on the http scheme. Validation now enforces only that the scheme is http(s) and the path ends with
/auth/callback. Updated the docstring to explain the relaxed contract,
and added test_allows_http_with_arbitrary_host covering an internal
hostname and a LAN IP alongside the existing localhost case.
2026-06-30 09:45:11 +10:00
yoniebans
d2ce2c852d test(gateway): assert interleaving safety of concurrent offloaded DB calls 2026-06-29 15:51:57 -07:00
yoniebans
6735162531 fix(gateway): offload the Telegram topic-recovery helper tree off the loop
The topic-mode helpers (_telegram_topic_mode_enabled,
_recover_telegram_topic_thread_id, _record/_sync_telegram_topic_binding,
_is_telegram_topic_lane/_root_lobby, _normalize_source_for_session_key,
_telegram_topic_new_header, _schedule_telegram_topic_title_rename, and the
base.py _apply_topic_recovery hook) each run a synchronous SessionDB read or
write. They reach the event loop through async handlers, so a contended
state.db froze the loop the same way the handoff watcher did.

These helpers already run off-loop in the run_sync thread-pool closure, so
they are proven thread-safe there. Rather than colour them async, loop-side
callers now invoke them via asyncio.to_thread(...); the executor callers are
unchanged. Inside the helpers the SessionDB handle is unwrapped to the sync
door (getattr(db, '_db', db)) since they always run on a worker thread, and
AIAgent construction + query_session_listing are handed the sync SessionDB
directly. base.py wraps its single _apply_topic_recovery call in to_thread.

The guard is now alias-aware (catches db = getattr(self, '_session_db', None);
db.method(...)) and enforces the offload contract: the offloaded sync helpers
may never be called bare on the loop. Sibling test fixtures wrap their injected
SessionDB in AsyncSessionDB to match how the gateway holds it.
2026-06-29 15:51:57 -07:00
yoniebans
0a997aabbc fix(gateway): route aliased SessionDB calls through AsyncSessionDB
The migration's call-site sweep keyed on the literal self._session_db.
spelling and missed calls bound to a local first
(db = getattr(self, '_session_db', None); db.method(...)). Convert the
three in async contexts: get_telegram_topic_binding in the topic-rename
coroutine, and the two update_session_model sites on the model-switch path.
2026-06-29 15:51:57 -07:00
yoniebans
0896facce8 fix(gateway): route SessionDB calls through AsyncSessionDB 2026-06-29 15:51:57 -07:00
yoniebans
ea26f22710 feat(gateway): add AsyncSessionDB offload facade 2026-06-29 15:51:57 -07:00
yoniebans
89daacb454 test(gateway): cover AsyncSessionDB offload + raw-call guard (failing) 2026-06-29 15:51:57 -07:00
brooklyn!
f171842f0d Merge pull request #55154 from NousResearch/bb/desktop-auto-speak-replies
feat(desktop): read replies aloud (auto-TTS) composer toggle
2026-06-29 15:25:27 -05:00
Teknium
290fa7fd2b fix(gateway): skip confirmed-dead delivery targets (deleted groups, blocked bots) (#55115)
* fix(gateway): skip confirmed-dead delivery targets (deleted groups, blocked bots)

A deleted Telegram group, kicked/blocked bot, or deactivated user keeps
throwing Forbidden/not_found on every cron tick and fan-out delivery. Each
retry burns a send against the platform's flood-control envelope and spams
the logs, making the whole session feel broken even when the model call
completed.

Add a small persistent DeadTargetRegistry (per-profile JSON under
HERMES_HOME) that records a target the moment a send reports a whole-chat
death (forbidden / chat-level not_found), and have DeliveryRouter.deliver()
short-circuit it on subsequent attempts. Self-healing: any successful send
clears the flag, so a user re-adding the bot recovers with no manual cleanup.
Thread/topic-level not_found is NOT recorded (adapters already self-heal that
by retrying without reply_to). Transient/timeout errors are never marked dead.

* infographic: dead delivery target skipping
2026-06-29 13:23:29 -07:00
Brooklyn Nicholson
596b813c9b feat(desktop): add read-replies-aloud toggle and wire auto-speak 2026-06-29 15:22:37 -05:00
Brooklyn Nicholson
fcdc05c891 feat(desktop): add auto-speak watcher hook 2026-06-29 15:22:37 -05:00
Brooklyn Nicholson
572c7dbd93 feat(desktop): add read-replies-aloud composer strings 2026-06-29 15:22:37 -05:00
Brooklyn Nicholson
09abbf8a63 feat(desktop): mirror voice.auto_tts into an $autoSpeakReplies store 2026-06-29 15:22:37 -05:00
Brooklyn Nicholson
bff91f978f feat(desktop): type voice.auto_tts in desktop config 2026-06-29 15:22:37 -05:00
brooklyn!
d417ffb363 Merge pull request #55114 from NousResearch/bb/pet-roam
feat(desktop): roaming pet (opt-in)
2026-06-29 15:00:03 -05:00
Brooklyn Nicholson
a1e699ae55 feat(desktop): roaming pet patrols the base of an open overlay
When a full-screen route overlay (settings/profiles/cron/agents/command-center) is up, the pet's walkable surface swaps to a single ledge at the overlay card's bottom edge — derived from OverlayView's shared inset, not measured — so it patrols there; closing the overlay restores the normal surfaces and it drops back down.
2026-06-29 14:57:26 -05:00
Brooklyn Nicholson
0e2a5a3206 feat(desktop): ground the roaming pet — sprite-paced walk + feet on surface
Walk speed is derived from the sprite's animation loop + on-screen size (one body-width per loop) instead of a fixed px/s, so it steps rather than glides; the pet also sinks a few px so its feet meet the surface instead of hovering.
2026-06-29 14:47:37 -05:00
Austin Pickett
75d4aa9325 fix(web): confirm sidebar Update Hermes before running
Match the Restart Gateway flow with a confirm dialog that fetches cached
update metadata so users see commit-behind context before applying.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-29 12:30:24 -07:00
Austin Pickett
dbe92b9ed1 fix(web): confirm sidebar gateway restart and use DS checkboxes
Prompt before restarting from the sidebar system menu, and replace native
checkboxes on the System page with the design-system Checkbox component.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-29 12:30:24 -07:00
Austin Pickett
1abf0c6cbf fix(web): polish dashboard sidebar chrome and model card menus
Use momentum easing for sidebar transitions, switch sidebar typography to
sans-serif, replace the profile native select with the DS Select, and stop
clipping the Models page Use-as dropdown inside model cards.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-29 12:30:24 -07:00
Austin Pickett
10374bb7a2 fix(web): theme terminal foreground and restore backdrop plugin slot
Make Nous Blue terminal text readable without the inversion layer, re-mount
the backdrop plugin slot, and drop unused backdrop CSS vars from theme apply.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-29 12:30:24 -07:00
Austin Pickett
57d98ebed7 fix(web): remove marketing backdrop stack for lighter dashboard shell
Drop the CSS lens overlay (blend modes, noise, inversion) and backdrop-blur
from the ops dashboard so compositing no longer competes with xterm on /chat.
Use flat theme backgrounds and direct Nous Blue palette colors instead of
FG-inversion authoring.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-29 12:30:24 -07:00
Brooklyn Nicholson
b72c9e1b2c feat(desktop): add pet roam opt-in toggle + i18n 2026-06-29 14:26:02 -05:00
Brooklyn Nicholson
4da744ef9b feat(desktop): let the pet perch on the status bar and profile rail
Tag both bars with data-slots; the roam loop stands on the status bar's top edge (not over it) and treats the profile rail as a climbable ledge.
2026-06-29 14:26:02 -05:00
Brooklyn Nicholson
7d3c1d55f4 feat(desktop): wire roaming into the floating pet 2026-06-29 14:26:02 -05:00
Brooklyn Nicholson
a8f1d9cc76 feat(desktop): add surface-aware pet wander loop
usePetRoam re-measures ledges from the live DOM each beat and walks/hops/falls between them, driving DOM position imperatively (no per-frame re-render).
2026-06-29 14:26:02 -05:00
Brooklyn Nicholson
964ec680cc feat(desktop): pick directional run row from travel direction
roamWalkRow() prefers running-left/running-right rows, falling back to the generic running row with a mirror for pets that lack them.
2026-06-29 14:26:02 -05:00
Brooklyn Nicholson
c6d6a1c30d feat(desktop): add pet roam + motion/direction store signals
Opt-in $petRoam (localStorage), $petMotion (run/jump pose) and $petRoamDir (-1/0/1) feed the shared $petState only while the agent is at rest ($petAtRest), so a wander never overrides real activity.
2026-06-29 14:26:02 -05:00
Ben Barclay
b963d3238b feat(gateway): suppress home-channel shutdown broadcast on flagged drains (#54824)
Add a generic suppress_notification flag to the drain-request marker. When a
drain that ends in process exit (e.g. a NAS auto-update image migration on the
always-on Hermes Cloud fleet) is flagged, the gateway skips ONLY the
home-channel 'gateway shutting down' broadcast — the operator-flavoured ping
that would otherwise fire on every routine auto-update, dozens of times a day.

The per-active-session interrupt ping is ALWAYS kept: on a drained shutdown
it's empty by construction, and in the force-interrupt (deadline-exceeded) case
it carries the user-valuable 'your task was cut off, message me to resume' hint.

The gateway stays agnostic about WHY a drain is quiet (generic boolean, not a
kind enum); the policy of which drain causes set the flag lives in the caller
(NAS). Default-false so legacy/operator drains behave exactly as before. The
reader reuses the NS-570 epoch-staleness check so an orphaned marker on the
durable volume can never silence a fresh gateway's legitimate broadcast.

- drain_control.py: write_drain_request gains suppress_notification; new
  drain_notification_suppressed() reader (current-epoch + truthy flag).
- web_server.py: /api/gateway/drain reads + echoes the flag.
- run.py: _notify_active_sessions_of_shutdown skips the home-channel loop only.

Tests prove: flag round-trips; home-channel suppressed when set, kept when
unset; active-session ping always fires; stale/legacy/corrupt markers never
suppress.
2026-06-29 12:18:11 -07:00
brooklyn!
ccc92c5213 Merge pull request #55086 from NousResearch/fix/gateway-statusbar-tooltip
fix(desktop): show Gateway statusbar tooltip via composed trigger Slots
2026-06-29 13:50:50 -05:00
Brooklyn Nicholson
7a6b3cb923 fix(desktop): show Gateway statusbar tooltip via composed trigger Slots
The Gateway item is the only statusbar entry with variant === 'menu'.
Since da73223f4 wrapped every render branch in `Tip`, the menu branch
nested `<DropdownMenu>` (a Radix Root that renders no DOM node) inside
`Tip`'s `<TooltipTrigger asChild>`. With no element to attach to, Radix
could never wire hover listeners, so the tooltip silently never showed.

`Tip` also can't be moved inside `DropdownMenuTrigger asChild` (the shape
proposed in #54859): it's a plain component, not a Slot-forwarding one, so
the trigger's injected ref/handlers would land on `TooltipContent` instead
of the button and break the menu's click + popper anchoring.

Fix by composing both trigger Slots directly onto a single <button>
(`TooltipTrigger asChild` over `DropdownMenuTrigger asChild`), the pattern
already used in profile-switcher.tsx, and skip the tooltip wrapper entirely
when the item has no title.

Supersedes #54859.

Co-authored-by: wnuuee1 <wnuuee1@users.noreply.github.com>
2026-06-29 13:48:56 -05:00
brooklyn!
929dd9c0d7 Merge pull request #55033 from NousResearch/bb/subagent-watch-readonly
feat(desktop): read-only spectator transcript for subagent watch windows
2026-06-29 12:09:53 -05:00
Brooklyn Nicholson
7cf6758e33 feat(desktop): read-only spectator transcript for subagent watch windows
Subagent session pop-outs (`watch=1`) spectate a run driven elsewhere, so
editing/steering the transcript from there makes no sense. Gate the composer
and the user-bubble mutations on `isWatchWindow()`:

- hide the composer (folds into `showChatBar`)
- user prompts become a read-only button that toggles the 2-line clamp so long
  prompts stay fully readable, instead of opening the edit composer
- drop the stop/restore actions and the checkpoint branch-picker

Keyed off the narrow `isWatchWindow()` (not `isSecondaryWindow()`), so the
new-session and cmd-click pop-outs are unaffected.
2026-06-29 12:06:25 -05:00
Teknium
ee8cbfdc03 feat(web_extract): truncate-and-store instead of LLM summarization (#54843)
* feat(web_extract): truncate-and-store instead of LLM summarization

web_extract no longer runs an auxiliary LLM over scraped pages. The extract
backends (Firecrawl/Tavily/Exa/Parallel) already return clean, boilerplate-
stripped markdown, so we return it directly: pages within a char budget
(default 15000, web.extract_char_limit) come back whole; larger pages get a
head+tail window plus an explicit footer giving the stored full-text path and
the read_file call to page through the omitted middle. The full clean text is
written to cache/web (mounted read-only into remote backends like the other
cache dirs), so nothing is lost.

Inline base64 images are converted to [IMAGE: alt] placeholders (token bombs
dropped) while real http(s) image URLs are preserved as links so the agent can
still web_extract/vision_analyze them.

Removes process_content_with_llm + the chunked summarizer + check_auxiliary_model
+ _resolve_web_extract_auxiliary. context_references._default_url_fetcher is
updated to the truncate path and its stale data.documents shape read is fixed
to results (it was silently returning empty).

Live before/after eval (firecrawl, 4 URLs): 11.7x faster overall (176.6s ->
15.1s); 10-60x on large pages. Quality identical; findability 4/4 (answer
recoverable from stored full text on every truncated page). web_search is
unchanged.

No own scraper added; no changes to web_search.

* fix(web_extract): add char_limit to execute_code web_extract stub

The new web_extract char_limit param must appear in the code_execution_tool
_TOOL_STUBS signature (and doc line) or test_stubs_cover_all_schema_params
fails — the stub schema must cover every real schema param.
2026-06-29 10:00:49 -07:00
Teknium
c6c1fd8b6b docs: create dev venv outside the source tree (root-cause fix for #7779) (#54862)
A manually-installed venv inside the cloned repo can be destroyed by the
agent running a relative-path command against its own checkout (rm -rf venv,
uv venv venv, etc.), silently wiping the running runtime mid-session. Moving
the canonical manual-install venv to ~/.hermes/venvs/hermes-dev means no
relative path from the agent's workspace resolves to its own runtime, making
the bug class impossible without any command-detection code.

Closes the root cause of #7779. The managed install.sh layout is unchanged.
2026-06-29 10:00:37 -07:00
brooklyn!
3bbeb9e008 Merge pull request #54907 from NousResearch/austin/feat/context-usage-popover
feat(desktop): add context usage breakdown popover
2026-06-29 11:45:23 -05:00
Austin Pickett
fd324562d3 feat(desktop): add context usage breakdown popover
Let users click the status bar context indicator to see how tokens are
split across system prompt, tools, rules, skills, MCP, and conversation.

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-29 09:18:10 -04:00
HexLab98
f1345290ed test(auxiliary): cover NVIDIA NIM max_tokens in _build_call_kwargs 2026-06-29 18:04:39 +05:30
HexLab98
88e6f9b98c fix(auxiliary): preserve max_tokens for NVIDIA NIM aux calls
NVIDIA integrate.api.nvidia.com models such as minimaxai/minimax-m3 can
return HTTP 200 with empty choices when max_tokens is omitted. Keep the
output cap on auxiliary chat-completions routes, matching the main NVIDIA
provider profile behavior.
2026-06-29 18:04:39 +05:30
Ben Barclay
f53ba9bb54 fix(s6): dot-prefix gateway staging dir so svscan ignores it mid-build (#54834)
The register path builds each profile-gateway slot in a sibling staging
dir under /run/service (the scandir s6-svscan watches), then atomically
renames it to the live gateway-<profile> name. The staging dir was named
gateway-<profile>.tmp — a NON-dotfile — so a concurrent `s6-svscanctl -a`
rescan (fired by the cont-init reconciler registering gateway-default, or
by a sibling register) would supervise the half-built slot the moment it
had a valid type/run: s6-supervise spawns AS ROOT and mkdirs supervise/
root-owned 0700, then the in-flight _seed_supervise_skeleton early-returns
on the now-existing supervise/ and the next `mkdir supervise/event` hits
PermissionError.

That is the arm64-only CI flake on
test_s6_unregister_removes_service_dir_in_live_container
(PermissionError: /run/service/gateway-phase3test.tmp/supervise/event) —
arm64-only because the native-arm runner's wider scheduling jitter lets
the rescan land inside the ~ms seed window; amd64 ran 30/30 clean.

Fix: dot-prefix the staging dir (.gateway-<profile>.tmp) in both register
paths (S6ServiceManager.register_profile_gateway and
container_boot._register_service). s6-svscan skips any scandir entry whose
name begins with '.', so the half-built slot can never be supervised
mid-build. The atomic rename to the dotless live name is unchanged.

Verified on a real s6 image (amd64): a non-dotted staging dir is picked up
by an svscanctl -a rescan (SUPERVISED owner=root) while a dot-prefixed one
is ignored (NOT-SUPERVISED). Added a docker-harness regression test that
asserts both, plus a unit test that the staging dir is dot-prefixed.
2026-06-29 21:33:00 +10:00
Teknium
dbad6d47d3 fix(gateway): also neutralize untrusted Matrix room name in prompt
Widen #5961's _format_untrusted_prompt_value coverage to the Matrix
room display name (**Matrix Room:**), a sibling attacker-controllable
field the original fix missed. chat_name is user-settable, so an
injected room name could render as literal markdown in the system
prompt. Adds a regression test.
2026-06-29 04:25:51 -07:00
Xowiek
09666ceb76 fix(gateway): neutralize untrusted session metadata in prompts 2026-06-29 04:25:51 -07:00
teknium1
ea1372d2af fix(security): wire session-id sanitizer into artifact paths + API boundary
Defense-in-depth on top of _safe_session_filename_component (#5958):

Sink (makes the bad write impossible regardless of entry point):
- run_agent._save_session_log: sanitize session_id before building the
  session_{sid}.json snapshot path.
- agent_runtime_helpers.dump_api_request_debug: sanitize before building
  the request_dump_{sid}_{ts}.json path.

Boundary (clean 400 instead of a silently-hashed filename):
- api_server rejects path-traversal-shaped X-Hermes-Session-Id on the
  session-continuation path and the explicit /api/sessions create path,
  reusing gateway.session._is_path_unsafe (mirrors the native gateway's
  entry-boundary guard). Also enforces the session-header length cap on
  the continuation path.

Tests: traversal session_id stays contained at the write site; sanitizer
always yields a traversal-free segment; the API header rejects
../, absolute, and Windows-traversal IDs with 400.
2026-06-29 04:25:45 -07:00
Xowiek
1debd5e8f9 fix(security): add session-id filename sanitizer to prevent path traversal
Session IDs can originate from untrusted input (e.g. the
X-Hermes-Session-Id API header) and are interpolated raw into on-disk
artifact filenames under ~/.hermes/sessions/. A traversal-shaped ID
(../../../../etc/pwned) would let a caller write the session snapshot
or request dump outside the sessions directory.

_safe_session_filename_component() collapses every non [A-Za-z0-9_-]
character to _, caps the length, and appends a short content hash when
sanitization changed the string, always yielding a single traversal-free
path segment.

Closes #5958.
2026-06-29 04:25:45 -07:00
teknium1
cdd8e0a271 test(gateway): exercise last_prompt_tokens in reset-activity tests
The reset-had-activity tests set total_tokens (dead state) to simulate
activity; production records activity via last_prompt_tokens. Update
the fixtures to match the field the fix and runtime actually use.
2026-06-29 04:25:37 -07:00
Mibayy
0fe9755016 fix(gateway): use last_prompt_tokens for session-reset activity check
reset_had_activity gated on entry.total_tokens, which is never written
(token counts migrated to agent-direct persistence) so it was always 0.
That suppressed session-reset notifications for sessions that genuinely
had activity. Switch to last_prompt_tokens, which is updated on every
turn.
2026-06-29 04:25:37 -07:00
Mibayy
9e490138a0 fix(security): fail-closed feishu webhook rate limiter + whatsapp bridge path guard
Salvages the two still-valid hardenings from #5381 onto the relocated
plugin adapters (the discord/feishu/whatsapp adapters moved to
plugins/platforms/ since the PR was opened, and 4 of its 6 hunks are
already on main or superseded).

- feishu: rate limiter now denies untracked keys when the tracking table
  is at capacity after pruning stale entries (was: allow through without
  tracking). At-capacity-with-all-fresh-entries only happens under abuse,
  so allowing untracked requests let an attacker who flooded the table
  bypass the limiter entirely. Already-tracked keys and post-prune room
  are unaffected.
- whatsapp: absolute file paths handed back by the Baileys bridge are now
  validated to resolve inside a known media cache dir before being
  attached. A compromised/buggy bridge could otherwise return an
  arbitrary path (e.g. /etc/passwd) that would be sent verbatim to the
  model. Guard resolves symlinks and accepts both the canonical
  cache/<kind> and legacy <kind>_cache layouts.
2026-06-29 04:25:31 -07:00
Ruzzgar
576424cc1c fix(security): redact browser CDP endpoint logs 2026-06-29 04:25:26 -07:00
teknium1
23c03ced75 fix(session-db): enrich NULL session metadata via upsert instead of INSERT OR IGNORE
The gateway's get_or_create_session() creates a bare session row (source +
user_id) before the agent exists. The agent's later create_session() carries
the real model/model_config/system_prompt, but _insert_session_row used
INSERT OR IGNORE — silently dropping that enrichment. Gateway sessions were
left with NULL model and NULL billing metadata.

Switch to INSERT ... ON CONFLICT(id) DO UPDATE with COALESCE so NULL columns
get backfilled while values an earlier writer already set are never
overwritten (a later bare write with source='unknown' can't clobber a real
source/model). Credit: original report and fix direction by @LucidPaths (#5048).
2026-06-29 04:25:23 -07:00
teknium1
61f56d27db refactor(dashboard-auth): drop redundant _interactive_providers helper
list_session_providers() already filters on supports_session=True, so the
new helper re-filtered an already-filtered list. Call it directly at the
single auto-SSO call site.
2026-06-29 04:25:18 -07:00
Ben
f5ecbe1ec6 feat(dashboard): auto-initiate portal SSO redirect on unauthenticated load
When the dashboard gateway has no local session cookie, it rendered a
click-through /login interstitial — even though the Nous portal's
/oauth/authorize auto-approves any current member of the dashboard's org
and is a silent 302 when the user already holds a portal session. For the
common case (clicking a hosted-agent dashboard link while signed in to the
portal) that interstitial click is pure friction.

This makes the gate auto-initiate the OAuth redirect on an unauthenticated
HTML document load instead of rendering the interstitial, when exactly one
interactive provider is registered. A one-shot loop-guard cookie
(hermes_sso_attempt, 60s TTL) ensures that a genuinely absent portal
session (the portal bounces back still-unauthenticated) falls back to the
/login page after exactly one bounce rather than ping-ponging forever. The
marker is cleared on a successful callback and whenever the gate falls back
to /login.

Security: this removes a human CLICK, not a security check. The redirect
lands on the existing /auth/login route and runs the unchanged PKCE
auth-code flow; token verification, audience checks, redirect-URI match,
and org-membership checks are all untouched. /api/* fetches still get the
401 JSON envelope (never a 302 a fetch() would follow opaquely), and with
two or more providers the /login chooser still renders.

Phase 1 of the cloud-auto-discovery work.
2026-06-29 04:25:18 -07:00
Teknium
dc5ef20d89 test(reasoning-floor): isolate stale-timeout floor tests from config-module reload races (#54775)
The five _resolved_api_call_stale_timeout_base integration tests reloaded
hermes_cli.config + hermes_cli.timeouts via importlib.reload to clear cached
config. Under xdist that mutates module-global state shared across the worker
process, so a sibling test could leave the config cache in a state that made
get_provider_stale_timeout return a leaked value — intermittently failing
test_reasoning_floor_applies_to_opus_4_thinking (shard 6 flake, #52217 area).

Patch run_agent.get_provider_stale_timeout per-test instead: floor-path tests
get None (resolver falls through to the reasoning floor / env var / default),
the explicit-config test gets 60.0 (priority-1 short-circuit). Same assertions,
no shared-module mutation, deterministic under parallel execution.
2026-06-29 02:42:54 -07:00
sgaofen
194bff0687 fix(gateway): confirm final delivery before suppressing send
Fixes #14238. During a compression/session split at the response
boundary, the interim callback delivered unrelated commentary, setting
response_previewed=True. The suppression logic treated that as proof the
final reply had been delivered and skipped the normal send — the response
was persisted to the child session but never sent to chat.

Only suppress the normal final send when the stream consumer confirms
final delivery (final_response_sent / final_content_delivered) or the
exact final response text was delivered as a preview.
2026-06-29 02:37:11 -07:00
teknium1
fa3dba4b30 docs(infographic): add list_profiles perf-fix infographic 2026-06-29 02:35:57 -07:00
teknium1
10c9eafde2 chore(attribution): map mango001@126.com -> max-chen for salvaged #51194 2026-06-29 02:35:57 -07:00
Sahil-SS9
1bb7b59c5d fix: offload blocking profiles endpoints from asyncio event loop (#54523)
(cherry picked from commit 09f10e2b77)
2026-06-29 02:35:57 -07:00
chenxiang
d5eee133eb perf(profiles): fix list_profiles O(N*M) wrapper rescan (6.4s -> 0.4s)
find_alias_for_profile re-scanned the whole wrapper dir (~/.local/bin) and
read_text every file for EACH profile — including large unrelated binaries
(ffmpeg etc.) read 15x over. With 16 profiles this took ~6.4s, long enough
that the desktop's per-request backend calls timed out (15s) and the sidebar
rendered '全部智能体 0 / 会话 0'.

- Add build_alias_map(): single-pass {profile -> alias} reverse map, reads
  only an 8KB head slice per wrapper, skips binaries via UnicodeDecodeError.
- find_alias_for_profile now delegates to it (behavior preserved).
- Cache _count_skills by skills-dir mtime signature (+30s TTL).

list_profiles: 6.37s -> 0.84s cold / 0.44s warm. 138 profile tests pass.

(cherry picked from commit 89e593749a)
2026-06-29 02:35:57 -07:00
teknium1
2f5950a83a chore(release): add telos-oc to AUTHOR_MAP for PR #14353 salvage 2026-06-29 02:25:48 -07:00
Telos
fa11b11cf5 fix: propagate key_env from custom_providers into ProviderDef
resolve_custom_provider() previously returned api_key_env_vars=()
for every custom provider entry, silently dropping the configured
key_env field. This caused 401 errors for any custom provider that
required an API key via environment variable (e.g. Xiaomi MiMo Token
Plan, self-hosted OpenAI-compatible servers).

The key_env field is already documented in _VALID_CUSTOM_PROVIDER_FIELDS
and normalized by normalize_custom_provider_entry(), so this was just
an oversight in the ProviderDef construction.

Also adds a regression test that verifies key_env is properly
propagated into the resolved ProviderDef.
2026-06-29 02:25:48 -07:00
teknium1
9f97915163 fix(browser): route open-timeout base through _safe_command_timeout
Wire the salvaged _safe_command_timeout() guard into the surviving
open-timeout call site. _get_open_command_timeout() feeds the
browser_navigate 'open' path; this closes the last call site that
could observe a None timeout from a torn cache (#14331), since the
original PR's max(_get_command_timeout(), 60) site no longer exists
on main (now routed through _get_open_command_timeout).
2026-06-29 02:24:57 -07:00
Sanjay Santhanam
c79e6bceae fix(browser_tool): resolve race in _get_command_timeout cache returning None (#14331)
# Conflicts:
#	tools/browser_tool.py
2026-06-29 02:24:57 -07:00
Teknium
bf0d8fed8e fix(config): v32 migration flips baked-in verify_on_stop=true to false (#54740)
The first ship of verify-on-stop (config v30) defaulted
DEFAULT_CONFIG agent.verify_on_stop to a literal True, and migrate_config
persists defaults with strip_defaults=False — so every install that updated
through v30 had verify_on_stop: true written into config.yaml as a literal.

The v30->v31 migration only flipped missing/'auto' values to false and
deliberately preserved an explicit bool, so it skipped that entire population
and left verify-on-stop ON for everyone who had updated. A literal true was
never a user choice: the feature had no off-switch worth setting it against
until v31 introduced one, so a true persisted before v32 is always the old
machine default.

v32 migration flips a literal true -> false once, for both v30 (skipped v31)
and v31 (preserved-by-bug) installs. A true the user sets AFTER v32 is a
deliberate opt-in and is never touched.
2026-06-29 01:51:08 -07:00
teknium1
75317d82d0 fix(vision): narrow the fan-out cap to the CPU encode burst only
The original cap held a process-global slot across the WHOLE vision
analysis (image load + encode + LLM call) with a default of min(CPUs, 4).
That serialized legitimate multi-image workflows — "compare these 6
screenshots", "read this 10-page scan", "analyze every frame" — behind a
4-wide gate, and on the native fast path it even throttled calls that make
no LLM request at all. Excess calls queued (blocking acquire, nothing
dropped), but the latency hit on real fan-out was the wrong tradeoff.

The incident was CPU exhaustion, not call count: concurrent base64/resize
bursts saturated every core and left none to service the shared event loop
serving /api/status. So cap ONLY that:

- A dedicated, bounded ThreadPoolExecutor (_vision_cpu_executor) runs the
  encode/resize/dimension-check off the caller's loop, sized to the host's
  usable core count with NO fixed ceiling — the cap tracks the actual
  exhausted resource (cores), not a magic number. Excess encodes queue on
  the executor; cores stay free for the loop.
- The LLM call is deliberately OUTSIDE the executor, so multi-image
  workflows keep full request concurrency.
- Override via auxiliary.vision.max_concurrency / HERMES_VISION_MAX_CONCURRENCY
  (honored verbatim, including above core count); sub-1 ignored.
- _vision_concurrency_slot() is now a no-op shim for back-compat.

Tests assert: resolver defaults to host cores with no ceiling; env/config
override (incl. above cores); sub-1 rejection; the executor is dedicated and
core-sized; encode runs on a vision-encode thread; and crucially that encode
bursts are bounded to the cap while the analyses themselves stay fully
concurrent (calls_peak > cap).
2026-06-29 01:27:10 -07:00
Ben Barclay
eddfecd2ce fix(vision): cap vision_analyze fan-out concurrency process-wide
A single agent turn can fan out N vision_analyze calls at once — the
classic trigger is "analyze every frame of this video", where ffmpeg
explodes a clip into dozens of frames and the model calls vision_analyze
on each. Every call does a CPU-heavy base64-encode/resize burst AND holds
a long-lived LLM stream open. The tool executor runs concurrent tool calls
on a per-session ThreadPoolExecutor (_MAX_TOOL_WORKERS=8), and multiple
agent sessions share one process (the dashboard runs the agent in-process),
so there was no global ceiling. In prod (June 2026) a video-frame fan-out
pinned a worker thread at ~100% CPU and starved the shared asyncio event
loop that also serves the dashboard's /api/status liveness probe, flapping
the instance to UNHEALTHY even though nothing had crashed.

Add a process-global threading.BoundedSemaphore that bounds how many vision
analyses run concurrently across the whole process, held across the entire
analysis (image load + encode + LLM call) in the single _handle_vision_analyze
chokepoint (covers both the native fast path and the legacy aux-LLM path).

It is a threading semaphore, NOT asyncio: each vision call is dispatched
through model_tools._run_async on a per-thread event loop, so an asyncio
primitive bound to one loop cannot coordinate across them. The acquire is
offloaded via run_in_executor so waiting for a slot never blocks the calling
loop.

Default: min(host CPUs, 4), floored at 1 — respect the host's concurrency,
or lower. Override via auxiliary.vision.max_concurrency (config.yaml) or
HERMES_VISION_MAX_CONCURRENCY (env). Values < 1 are ignored so the cap can
never be disabled into an unbounded fan-out.

Tests: bounded-fan-out regression guard + a control proving it would fail
without the cap; resolver tests for host-cpu default, ceiling clamp, low-cpu
host, env override, and sub-1 rejection. Pre-existing handler tests updated
for the now-async _handle_vision_analyze. Verified via the real
registry.dispatch -> _run_async per-thread-loop path (16 concurrent calls,
peak bounded to cap).
2026-06-29 01:27:10 -07:00
teknium1
115e78c377 test(camofox): accept headers= kwarg in persistence test mocks
The auth-header fix adds headers=_auth_headers() to all Camofox HTTP
calls. Two _capture_post mocks in the persistence test lacked a headers
parameter, so navigate raised TypeError and the success assertions
failed. Add headers=None to both mock signatures.
2026-06-29 01:26:24 -07:00
teknium1
41095fdb04 fix(camofox): register CAMOFOX_API_KEY in OPTIONAL_ENV_VARS
The auth-header fix reads CAMOFOX_API_KEY but it was never registered,
so it didn't surface in `hermes setup` / `hermes tools`. Add it as an
advanced password-category tool env var alongside CAMOFOX_URL.
2026-06-29 01:26:24 -07:00
kaishi00
08d6195bc4 fix(camofox): auto-recover from stale tab 404 on navigate
When a Camofox browser tab is garbage collected (idle timeout, browser
recycle), the held tab_id becomes stale. The next browser_navigate call
hits /tabs/{stale_id}/navigate -> HTTP 404 -> unhandled HTTPError.

Catch the 404 in camofox_navigate, clear the stale tab_id, and create a
fresh tab via _ensure_tab. The agent recovers transparently without
requiring a session restart.

Other tab operations (snapshot, click, type, etc.) use the same pattern
but only fail if the tab dies between successful calls — much rarer.
The navigate fix covers 95%+ of cases since navigate is always the entry
point.
2026-06-29 01:26:24 -07:00
liuhao1024
fe38d50833 fix(tools): read browser.command_timeout in Camofox HTTP client
The Camofox browser backend hardcoded a 30s HTTP timeout via
_DEFAULT_TIMEOUT, ignoring the user's browser.command_timeout config.
The main browser_tool path already reads this config via
_get_command_timeout().

This commit adds an equivalent _get_command_timeout() to
browser_camofox.py that reads browser.command_timeout from config
with caching, and switches all HTTP helper methods (_post, _get,
_get_raw, _delete) to use it as the default timeout.

Fixes #40843
2026-06-29 01:26:24 -07:00
刘昊
babd9168ba fix(browser): send Authorization header in Camofox HTTP calls when CAMOFOX_API_KEY is set
The five HTTP call sites in browser_camofox.py (_ensure_tab, _post,
_get, _get_raw, _delete) did not include Authorization headers, causing
403 Forbidden when the Camofox server has API key auth enabled.

Added _auth_headers() helper and wired it into all five call sites.
The health check endpoint (/health) is left without auth since it is
a connectivity probe, not a browser operation.

Regression test covers: header present when key set, absent when unset,
blank key produces empty headers.

Fixes #20476
2026-06-29 01:26:24 -07:00
liuhao1024
270456308c fix(tools): send listItemId instead of sessionKey in Camofox tab creation
The Camoufox REST API server expects `listItemId` in the `POST /tabs`
body, but `_ensure_tab` was sending `sessionKey`.  This caused a 400
Bad Request on every `browser_navigate` call.

The parameter name mismatch is visible in the same file: line 283
already reads `tab.get("listItemId")` when adopting existing tabs,
confirming the server-side field name.

Fixes #37960
2026-06-29 01:26:24 -07:00
teknium1
34e616e778 feat(slack): nudge stale installs to add mpim scopes; mark message.mpim required
Follow-up to the group-DM manifest fix. The manifest change only helps
NEW installs; existing apps keep their old (mpim-less) scopes until the
admin reinstalls. Since a missing message.mpim event delivers nothing
(no runtime API error to catch), detect stale installs at connect time
from the auth.test x-oauth-scopes header and log an actionable reinstall
nudge when im:history is granted but mpim:history is not. Also promote
message.mpim from Recommended to Required in the docs event tables so the
default setup path can't drop it.
2026-06-29 01:02:53 -07:00
Ben
4125cc3b7c fix(slack): subscribe to message.mpim + mpim scopes so group DMs work
Group DMs (multi-person DMs, channel_type=mpim) were never delivered to
the Slack bot. The adapter already classifies mpim as a DM and replies
ambiently (adapter.py:2526, is_dm = channel_type in {im, mpim}), but the
generated app manifest only subscribed to message.im / im:history — the
1:1 DM pair. Without the message.mpim event subscription Slack drops
group-DM messages before the adapter ever sees them, so 1:1 DMs worked
while group-DM ambient mode was dead.

Add message.mpim to bot_events and mpim:history (the scope that event
requires per Slack docs) + mpim:read (mirrors im:read for the
conversations.info classification call) to bot_scopes. Update the
SLACK_BOT_TOKEN / SLACK_APP_TOKEN setup-help strings and the Slack docs
(EN + zh-Hans: scope table, event table, troubleshooting) so existing
installs are told to add the new scopes and reinstall.

Reported by an enterprise customer. Note: this is a manifest/scope
change, so it only takes effect after the app is reinstalled and the
new scopes are accepted.

Tests: assert message.mpim + mpim:history + mpim:read are in the
manifest (with and without assistant mode); both fail on current main
and pass with this change.
2026-06-29 01:02:53 -07:00
Teknium
29f0968275 test(windows): harden pid-scan no-window assertion against captured-call leakage (#54707)
test_gateway_pid_scan_hides_wmic_and_powershell_windows flaked once in CI
(slice 7/8) with 'KeyError: creationflags' while passing 15/15 under exact
CI-parity locally. The positional 'kwargs["creationflags"]' indexing raises
a bare KeyError the moment any stray subprocess.run call is captured, masking
the real contract. Filter captured calls to the two intended Windows console
spawns (wmic + PowerShell fallback) and assert each is windowless via
.get('creationflags'); a leaked/extra call now surfaces as a readable
len-mismatch with the full captured list, not a cryptic KeyError.
2026-06-29 01:01:29 -07:00
Teknium
0434a9a5ec chore: regenerate uv.lock for supermemory + mem0 extras 2026-06-29 00:25:36 -07:00
Ben Barclay
1289f12812 fix(memory): lazy-install supermemory + mem0 SDKs like honcho/hindsight
The supermemory and mem0 memory providers shipped third-party SDKs
(supermemory / mem0ai) that are not core dependencies, but — unlike the
honcho and hindsight providers — they imported those SDKs directly with
no tools.lazy_deps.ensure() preflight and had no LAZY_DEPS allowlist
entry. On the published Docker image the agent venv is sealed
(HERMES_DISABLE_LAZY_INSTALLS=1) and lazy installs are redirected to a
writable durable target (HERMES_LAZY_INSTALL_TARGET). honcho/hindsight
route through ensure() and install fine there; supermemory/mem0 never
called it, so their SDK was never installed on a hosted instance and the
provider silently reported itself unavailable even with the API key set.

Fixes:
- Add memory.supermemory + memory.mem0 to the LAZY_DEPS allowlist
  (tools/lazy_deps.py), pinned to current PyPI releases.
- Call ensure('memory.<x>', prompt=False) at each SDK-import chokepoint
  (_SupermemoryClient.__init__; Mem0MemoryProvider._create_backend),
  mirroring honcho's wrapped try/except shape.
- Drop the SDK-import gate from supermemory's is_available() — it was a
  chicken-and-egg trap (provider never loaded on a sealed venv, so
  ensure() never ran). Now key-presence only, like honcho/mem0.
- Add matching pyproject extras [supermemory]/[mem0]; update the
  lazy-covered-extras contract test (excluded from [all] by policy).

Tests prove each path fails without the fix and the real sealed-venv
durable-target gate accepts both features.
2026-06-29 00:25:36 -07:00
teknium1
f860492842 test(desktop): match multiline spawn(ps, fullArgs) via regex like sibling sites
The bootstrap-runner PowerShell spawn is formatted multiline (spawn(\n  ps,\n  fullArgs,...), so the literal substring 'spawn(ps, fullArgs' never matched and the assertion was failing on main independent of #54635. Convert it to a whitespace-tolerant regex like every other call-site assertion in this file.
2026-06-28 23:59:32 -07:00
emozilla
aa2ae36c3f fix(desktop): launch Windows backend as console python so child consoles are inherited, not flashed
The recurring Windows desktop console-flash bug (#54220) is governed by the
*parent's* console, not by each child spawn. The desktop backend was launched as
GUI-subsystem pythonw.exe, which has no console at all — so every
console-subsystem child it spawns (git, gh, cmd, wmic, powershell, ...) had to
allocate its own console, flashing a window. That is why the fix had become an
endless per-call-site sweep of CREATE_NO_WINDOW flags: each leaf spawn was
papering over a missing console on the root.

Launch the backend as the venv's console python.exe instead. Under the existing
hiddenWindowsChildOptions() wrapper (windowsHide: true -> CREATE_NO_WINDOW) the
backend owns a single *windowless* console, and every descendant spawn inherits
it instead of allocating a visible one. This makes "no flashing windows" a
property of the one backend launch rather than a flag that must be remembered at
every spawn site — including spawns inside third-party libraries that no
call-site sweep can reach.

Verified on Windows 11 25H2 (Windows Terminal default): with the per-site hide
flag forcibly neutered, the canonical culprits (git/gh/cmd/wmic/powershell)
spawned naively and none flashed, while the same naive spawn from the old
console-less pythonw parent did flash — isolating the parent console as the cause.

Two premises behind the old pythonw approach did not hold up on current Windows
and are dropped here:
- The venv Scripts\python.exe uv shim, under CREATE_NO_WINDOW, re-execs base
  python *windowless* — it does not flash a conhost (the #52239 concern), so the
  base-pythonw detour is unnecessary.
- Console python restores stdout, so the backend announces its port on the normal
  HERMES_DASHBOARD_READY stdout line; the pythonw-only ready-file side channel is
  no longer needed and the readyFile opt-in is removed.

Removes the now-dead pythonw machinery (getNoConsoleVenvPython, toNoConsolePython,
applyWindowsNoConsoleSpawnHints, readVenvHome) and updates the test to assert the
new invariant: backend command is never pythonw, both backend spawns still go
through hiddenWindowsChildOptions, and no backend opts into the ready-file path.

Scope: this fixes the high-frequency backend-descendant flash classes. The
updater/UAC handoff (#54543) and embedded-terminal PTY accumulation (#53555)
classes have separate root causes and are unaffected.
2026-06-28 23:59:32 -07:00
Ben
20b03d9aee i18n: add Custom Keys strings to all locale files
The env translation block is type-checked across every locale (tsc -b), so
the 8 new customKeys strings must exist in all of them, not just en/zh. Add
translated entries to the remaining 14 locales (de, es, fr, it, ja, ko, pt,
ru, tr, uk, hu, ga, af, zh-hant).
2026-06-28 22:53:56 -07:00
Ben
1c75e7c9d8 feat(dashboard): list & add arbitrary custom .env keys on the Keys page
The Keys page only rendered env vars present in a catalog (OPTIONAL_ENV_VARS
or the provider catalog); any other key a user set in .env was invisible, and
there was no way to add an arbitrary env var from the GUI (e.g. to inject a
var a skill or MCP server needs).

Backend: GET /api/env now also emits a row for every on-disk .env key that
isn't in any catalog, flagged category="custom" + custom=true and
password-masked (an unrecognised key could hold anything, so it's redacted and
reveal-gated like any secret). Channel-managed credentials stay excluded. The
write (PUT /api/env) and reveal (POST /api/env/reveal) paths already handle
arbitrary keys, with the existing env-name guard + denylist (PATH, LD_PRELOAD,
PYTHONPATH, …) enforced server-side — no new write surface.

Frontend: a new "Custom Keys" section lists those custom rows and carries an
add-a-key form (client-side name validation mirroring the backend regex; the
new row reuses the normal edit/save flow, so on save it round-trips back from
the backend as a durable custom row). i18n added for en + zh + types.

Tests: behavior-contract coverage that an unknown .env key surfaces as a
masked custom row and a catalogued key does not — verified to fail on the
pre-fix backend.
2026-06-28 22:53:56 -07:00
HexLab98
23f245eda5 test(vision): cover Ollama /api/show vision capability routing (#54511) 2026-06-28 22:52:59 -07:00
HexLab98
d7e573e54d fix(vision): detect Ollama vision models via /api/show (#54511)
When local Ollama models are absent from models.dev, probe the Ollama
server's /api/show capabilities so attached images are routed natively
instead of being stripped as non-vision input.
2026-06-28 22:52:59 -07:00
sgaofen
b481348fbc fix(agent): stream copilot ACP chat completions 2026-06-28 22:52:51 -07:00
sgaofen
0106082d1f fix(agent): return OpenAI-shaped copilot ACP tool calls 2026-06-28 22:52:51 -07:00
sgaofen
032d702140 fix(agent): omit stream_options for native Gemini streaming
Google's native Gemini REST endpoint (generativelanguage.googleapis.com,
non-/openai) rejects OpenAI-only stream_options={"include_usage": true},
crashing every streaming chat-completions call with TypeError. Omit it for
that endpoint while keeping it for the Gemini OpenAI-compat shim and all
OpenAI-compatible aggregators (OpenRouter, etc.) so usage accounting is
preserved.

Reuses is_native_gemini_base_url() so the compat shim (.../openai), which
accepts stream_options, is correctly excluded from the omission.

Fixes #14387

Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
2026-06-28 22:52:46 -07:00
Teknium
25d35cce18 infographic: Windows CLH lock-timeout traceback suppression (#54436 salvage) 2026-06-28 22:35:56 -07:00
helix4u
98a7cfb8f9 fix(logging): suppress Windows lock timeout tracebacks 2026-06-28 22:35:56 -07:00
Teknium
74541beb9c fix(security): cap WeCom callback body size before pre-auth XML parse (#54615)
The WeCom callback endpoint (internet-facing, 0.0.0.0) parsed untrusted
request bodies before signature verification. defusedxml already guards
the entity-expansion class on main, but there was no cap on raw body
size, so an unauthenticated POST could still force unbounded read work
pre-auth.

Set client_max_size=64KB on the aiohttp app (413 at the framework layer)
plus an explicit length guard in _handle_callback as defense in depth.
WeCom callbacks are small encrypted XML envelopes — media is delivered
out-of-band via MediaId, never inline — so 64KB is ample for legitimate
traffic. Adds tests for oversized (413) and normal-sized (not 413) bodies.

Salvaged from #10192 by @memosr (body-size limit half; defusedxml half
already superseded on main).
2026-06-28 22:35:43 -07:00
teknium1
0b733a8418 test(gateway): pin auto-reset cached-agent eviction (#10710)
Relocate marco0158's eviction into the dedicated auto-reset cleanup block
(single source of truth for dropping session-scoped transient state) and
add an AST invariant pinning _evict_cached_agent into that block. Add
AUTHOR_MAP entry for marco0158.
2026-06-28 22:35:17 -07:00
marco0158
b4300f2d96 fix(gateway): evict cached agent on auto-reset to prevent stale context summary leak
When a session is auto-reset by daily schedule, idle timeout, or suspended
state, the agent cache was not being cleared. This caused the old agent's
context_compressor._previous_summary to leak into the new session, mixing
old conversation history into new compaction summaries.

This was the root cause of the "skin making history" appearing after
compaction in fresh sessions reported by the user.

Follow-up to #9893 which only handled compression_exhausted case.

Changes:
- Add _evict_cached_agent(session_key) call after was_auto_reset check
- Covers daily, idle, and suspended auto-reset scenarios
- Matches the behavior of manual /reset command

Related tests: test_session_boundary_hooks, test_async_memory_flush,
test_session_reset_notify, test_session_reset_fix - all passing.
2026-06-28 22:35:17 -07:00
Junass1
61a4526ac7 fix(gateway): clear session-scoped model overrides on /resume
/resume is a conversation boundary, but unlike /new it did not clear the
chat-keyed _session_model_overrides / _pending_model_notes. A /model switch
made in the previous session under the same chat session_key leaked into the
resumed conversation, running it on the wrong model.

Clear both maps for the session_key after the switch (mirroring /new), scoped
to that key so other chats' overrides are untouched. The cached-agent eviction
this leak also implied already landed via #6672.

Closes #10702.
2026-06-28 22:35:12 -07:00
Shannon Sands
476875acb9 Add dashboard backup upload and download 2026-06-28 22:35:09 -07:00
Ben Barclay
8fe800ee1a fix(file-tools): sanitize host/relative cwd override before it reaches container sandbox (#54447) (#54616)
(cherry picked from commit 82132f7911)

Co-authored-by: Tranquil-Flow <66773372+Tranquil-Flow@users.noreply.github.com>
2026-06-29 15:32:20 +10:00
Ben Barclay
e1f4098b9f docs(cron): document explicit per-channel delivery targets for all platforms (#54630)
The cron delivery table only showed Discord/Telegram with explicit
target syntax and described Slack and every other platform as
home-channel-only. In fact the generic platform:<target> routing in
_resolve_single_delivery_target resolves explicit targets for every
platform: Slack (#channel / channel ID / channel:thread_ts), Matrix
(room/user IDs), Feishu (chat:thread), WhatsApp (JID / E.164), Signal
(group / E.164), SMS, Email, and Weixin all have dedicated explicit-
target branches in _parse_target_ref; the remaining platforms accept a
generic platform:<chat_id> passthrough.

Update the Delivery Model table (en + zh-Hans) to show the real
per-platform syntax, document #channel name resolution via the channel
directory, and note the Slack thread_ts nuance. Docs-only.
2026-06-29 15:23:16 +10:00
brooklyn!
388268ecde Merge pull request #54568 from NousResearch/bb/shared-websocket-layer
refactor(desktop+dashboard): shared WebSocket layer + decouple desktop from dashboard (hermes serve)
2026-06-28 23:43:49 -05:00
brooklyn!
fb0644fbc2 Merge pull request #54585 from NousResearch/bb/desktop-terminal-history
feat(desktop): persist & restore terminal tabs + scrollback across relaunch
2026-06-28 23:38:16 -05:00
Brooklyn Nicholson
1af109c79c test(cli): drop pytest dep + use real sentinel handlers in serve test
Clears the ty diff bot's warnings on the new test: pass real callables to
build_dashboard_parser (not object()) and replace the pytest.mark.parametrize
with a plain loop so the file is stdlib-only.
2026-06-28 23:24:45 -05:00
Ruzzgar
313a8c6833 fix(skills): replace string prefix check with strict path containment 2026-06-28 21:14:01 -07:00
Ben Barclay
0943e2a272 fix(cron): don't report a false 'gateway not running' on external-provider instances (#54600)
`hermes cron status` (and the create/list 'gateway not running' nag)
judge whether cron will fire purely from the in-process ticker's
heartbeat file + a live gateway PID. That heuristic is correct for the
built-in ticker but WRONG for an external provider like Chronos:

Chronos arms exactly one external one-shot per job and is fired by a
NAS-mediated webhook (POST /api/cron/fire). Its `start()` returns
immediately and it deliberately runs no 60s loop and writes no ticker
heartbeat — that's the whole point of scale-to-zero (the machine is at
zero between fires). So on a perfectly healthy Chronos instance,
`cron status` always printed '✗ Gateway is not running — cron jobs will
NOT fire' (or a STALLED-ticker warning), and `cron create` always
appended the 'jobs won't fire automatically' nag — both false.

Verified live on a staging Chronos instance: jobs fired and completed on
schedule via the relay while `cron status` insisted the gateway wasn't
running and the heartbeat was 370s+ stale.

Fix: resolve the active provider (offline — `resolve_cron_scheduler`,
whose `is_available()` contract forbids network) and, for any non-builtin
provider, report the managed-scheduler state instead of the ticker
heuristics, and suppress the ticker-only 'gateway not running' warning.
The built-in path is byte-unchanged. Active-job summary is factored into
a shared helper so both paths print it identically.

New tests prove both directions (chronos: no false negative even with no
gateway PID / no heartbeat; builtin: historical warning preserved) and
fail without the fix.
2026-06-29 14:03:02 +10:00
Teknium
e20ff352b9 test(matrix): authorize inviter in DM-invite fixture for new invite-auth gate
_on_invite now rejects auto-joins from users not on the allow-list. The
DM-recording tests invite @alice and expect a join, so the shared
_make_adapter fixture now puts @alice on _allowed_user_ids.
2026-06-28 20:47:33 -07:00
aaronagent
d836b2bac4 fix(matrix,mattermost): invite auth check + API path traversal guard
Two platform-security hardenings:

- Matrix: _on_invite now checks the inviter against the existing
  allow-list (_allowed_user_ids / GATEWAY_ALLOW_ALL_USERS) before
  auto-joining. Without this any federated Matrix user could invite
  the bot into arbitrary rooms, exposing its presence and metadata.
  The message and reaction paths already enforce this allow-list; the
  invite path bypassed it.

- Mattermost: _api_get / _api_post / _api_put reject any path
  containing '..'. WebSocket-event values (channel_id, post_id,
  file_id) are interpolated directly into API paths, so a malicious or
  compromised server could craft traversal payloads to make the bot
  issue authenticated requests to arbitrary endpoints with its bearer
  token.

The configurable-E2EE-passphrase change from the original PR is dropped:
the matrix adapter was rewritten onto mautrix and the passphrase-protected
key-export file no longer exists.
2026-06-28 20:47:33 -07:00
Teknium
9cf9d3a28f chore(release): add AUTHOR_MAP entry for PR #53295 salvage 2026-06-28 20:46:44 -07:00
lkevincc
163562bf88 fix: normalize lmstudio base urls 2026-06-28 20:46:44 -07:00
Teknium
43eaf79ae6 chore: remove committed PR infographics and gitignore the path (#54564)
PR infographics are rendered locally and embedded in PR descriptions via
the image-provider (fal.media) URL — they were never meant to live in the
repo. The intended .gitignore enforcement (documented as added back in May
2026) was never actually committed, so 35 PNGs (~54MB) accumulated under
infographic/ via 'docs: add PR infographic for X' commits.

- Remove all 35 tracked infographic/*.png files.
- Add infographic/ to .gitignore so git add on the path is now a no-op.

The PR body remains the archive for these images.
2026-06-28 20:46:35 -07:00
teknium1
14204b0646 test(agent): cover .hermes.md no-git-root cwd-only behavior
Regression tests for the injection fix: outside a git repo only cwd is
checked (planted ancestor .hermes.md is ignored), a cwd-local .hermes.md
is still found, and inside a git repo the parent walk to the git root
still works.
2026-06-28 20:46:32 -07:00
aaronagent
306b6615cf fix(agent): limit .hermes.md parent walk to git repos only
_find_hermes_md walks parent directories looking for .hermes.md/HERMES.md,
stopping at the git root. But when there is no git repo (_find_git_root
returns None), the stop guard never fires and the loop walks all the way
to /. On shared systems (CI runners, multi-tenant servers), a .hermes.md
planted at /tmp, /home, or / would be loaded into the system prompt of any
agent session not inside a git repo — a cross-user prompt-injection vector.

Fix: when there is no git root, only check cwd; do not walk parents.

Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
2026-06-28 20:46:32 -07:00
Brooklyn Nicholson
1c0fa12edb feat(desktop): persist & restore terminal tabs + scrollback across relaunch
User terminal tabs and their recent scrollback now survive an app restart
(VS Code parity). Tabs, active selection, cwd, and a serialized scrollback
snapshot are written to localStorage on every change; on launch the tabs
reopen with their history replayed above a fresh shell. Processes are NOT
revived — a new shell starts one line below the restored block.

- Capture: SerializeAddon snapshots the buffer on a 750ms leading-edge
  throttle, so a `cmd; quit` lands on disk before teardown; the snapshot is
  trimmed of its trailing idle prompt (no "double prompt" on restore) and
  capped (200 scrollback lines / 48k chars) to stay under the storage budget.
- Teardown guard: app quit/reload kills the PTYs from the main process,
  firing onExit in the renderer, but React skips effect cleanups on teardown
  so the per-instance `disposed` flag never flips. A pagehide/beforeunload
  flag stops onExit from calling closeTerminal() and wiping the persisted
  tabs right before relaunch restores them. A real `exit`/Ctrl-D still closes.
- Agent mirror tabs stay runtime-only — only user tabs persist.
2026-06-28 22:12:29 -05:00
Brooklyn Nicholson
9d9a50c2bc test(cli): pin the hermes serve decoupling contract
Add a focused contract test for the headless `serve` command (routes to the
shared dashboard handler, headless by default while `dashboard` is not, accepts
the legacy --no-open, shares the same runtime/lifecycle flag surface). Also
refresh the dashboard.py module docstring to cover both commands.
2026-06-28 22:11:48 -05:00
Brooklyn Nicholson
e684b808ad fix(desktop): route old runtimes through dashboard when serve is absent
`hermes serve` is newer than the desktop binary's release cadence, so a new
app launched against an un-upgraded managed install / PATH `hermes` would
crash on an unknown subcommand and brick the user mid-upgrade. Detect whether
the resolved runtime registers `serve` (fast source read of its dashboard.py,
with a one-time CLI probe fallback) and rewrite the backend argv to the legacy
`dashboard --no-open` only when it does not. Happy path (current runtimes)
pays nothing and still spawns `serve`.

- electron/backend-command.cjs: pure serve/dashboard argv helpers + serve-
  source detection (unit-tested in backend-command.test.cjs)
- main.cjs: backendSupportsServe() cache + getBackendArgsForRuntime() guard at
  both backend spawn sites; expose `root` from the Windows venv unwrap so the
  fast source check covers Windows too
- docs: note the backward-compat fallback in README, desktop.md, AGENTS.md
2026-06-28 22:10:42 -05:00
Brooklyn Nicholson
dff491a2b9 feat(cli): add headless hermes serve backend; desktop no longer launches dashboard
The desktop app spawned `hermes dashboard --no-open` as its backend, which
made the dashboard look like a desktop prerequisite. Add a dedicated headless
`hermes serve` command that boots the same gateway (shared cmd_dashboard /
start_server) but never opens a browser, and point the desktop backend spawn
exclusively at it. dashboard and serve are now independent surfaces — neither
launches the other.

- subcommands/dashboard.py: factor shared server args; add `serve` parser
  (always headless; accepts legacy --no-open as a no-op)
- main.py: register serve in _BUILTIN_SUBCOMMANDS + coalesce set + gui-log
  detection; extend stale-backend reaper patterns to match `serve`
- desktop electron: spawn `serve`, rename dashboardArgs -> backendArgs,
  update comments + windows-child-process test assertions
- docs: desktop README, desktop.md (incl. remote-backend), AGENTS.md, and
  cli-commands.md now describe `hermes serve` as the desktop/headless backend
2026-06-28 22:04:22 -05:00
brooklyn!
4488fe134b Merge pull request #54517 from NousResearch/bb/desktop-multiterminal
feat(desktop): multi-terminal panel with read-only agent terminals
2026-06-28 21:52:14 -05:00
Brooklyn Nicholson
f019a999d8 docs: clarify desktop is self-contained, not dependent on the dashboard
The desktop app spawns a headless `hermes dashboard --no-open` backend and
talks to it through the shared @hermes/shared WebSocket client — it never
runs or requires the browser dashboard UI. Spell this out in the desktop
README, the desktop docs page, and AGENTS.md so "dashboard" stops reading
as a desktop prerequisite.
2026-06-28 21:50:33 -05:00
Brooklyn Nicholson
e9b95dfd19 fix(docker): include apps/shared in dashboard image build
The shared websocket package is a web file: dependency but was excluded
by .dockerignore and never copied into the Docker build context. Also fix
tsc -b errors: expose buildWsUrl on api and drop the GatewayClient state
getter that conflicted with the shared base class.
2026-06-28 21:43:56 -05:00
Brooklyn Nicholson
ae465e9fb8 Merge branch 'main' of github.com:NousResearch/hermes-agent into bb/desktop-multiterminal 2026-06-28 21:37:52 -05:00
Brooklyn Nicholson
1a1e00f37e fix(desktop): stop injecting ctrl-l into terminal startup
Remove the prompt-gap cleanup that sent Ctrl-L into the user's shell; it could
render as literal ^L and create the exact top-line gap it was meant to hide.
Keep first-prompt cleanup renderer-side only, and parse short ESC charset
sequences so the initial newline stripper does not disarm early.

Also add a Close all action to the terminal tab context menu.
2026-06-28 21:33:20 -05:00
brooklyn!
83f09f52f9 Merge pull request #54558 from NousResearch/bb/overlay-panels
feat(desktop): shared overlay Panel primitive for cron/profiles/agents
2026-06-28 21:32:14 -05:00
Brooklyn Nicholson
216ace4bf3 style(shared): apply workspace formatter to websocket helpers
Run the package-appropriate Prettier config on the shared WebSocket files so
the extracted helpers match the surrounding desktop/shared TypeScript style.
2026-06-28 21:30:43 -05:00
Brooklyn Nicholson
5a2906a11b chore(desktop): keep the diff surgical
Revert the repo-wide prettier churn the earlier fmt pass pulled into files
unrelated to this work; run prettier/eslint scoped to the touched files only.
2026-06-28 21:30:14 -05:00
Brooklyn Nicholson
f6ccf08ee6 refactor(web): centralize dashboard websocket URL calls
Keep dashboard pages and components on the dashboard API helper instead of
calling the raw shared URL primitive directly. The shared helper remains the
single low-level implementation; web/src/lib/api.ts is the dashboard-specific
facade for auth, base path, and ticket minting.
2026-06-28 21:27:06 -05:00
Brooklyn Nicholson
6776b2f9b5 feat(desktop): live gateway popout + statusbar/command-center polish
- Gateway status popout: flatten the header to stacked connection + inference
  statuses with system-panel and restart actions (reusing the shared
  runGatewayRestart helper). The recent-activity tail is now live while the
  popout is open via the shared LogView (WS connection churn filtered), and the
  icon / "View all logs" link dismiss the popover.
- Statusbar "menu" items accept a menuContent(close) render fn over a now
  controlled DropdownMenu, so popover content can close itself.
- Drop the always-on gateway-log poll from useStatusSnapshot (logs are fetched
  by the popout only while open).
- SearchField → text-xs to match Input/Select (controlVariants).
- Command center: remove the usage/system section dividers, swap the sessions
  nav icon (Pin → MessageCircle), small padding tweaks.
2026-06-28 21:26:15 -05:00
Brooklyn Nicholson
5a4bdfda50 fix(shared): close websocket clients deterministically
Ensure intentional client closes mark the transport closed and reject pending
RPCs immediately instead of relying on a browser close event that can be
ignored after the socket reference is cleared.
2026-06-28 21:25:12 -05:00
Brooklyn Nicholson
6c52e4a318 fix(desktop): match agent terminal scrollback to user tabs
Keep read-only agent terminal tabs visually and behaviorally aligned with normal
terminal tabs by using the same 1,000-line scrollback cap.
2026-06-28 21:22:17 -05:00
Brooklyn Nicholson
dfb561a3ae refactor(desktop+dashboard): extract shared WebSocket/JSON-RPC layer
The Electron desktop app and the web dashboard each carried their own
copy of the tui_gateway JSON-RPC WebSocket client plus near-identical
auth'd WS-URL construction. The dashboard's copy was the historical
source of the "is the dashboard required to run the desktop app?"
confusion, since the two surfaces looked coupled.

Consolidate the genuinely shared transport into the existing
framework-agnostic `@hermes/shared` package so both surfaces consume it
independently — neither app depends on the other:

- Move `resolveGatewayWsUrl` + `GatewayReauthRequiredError` (single-use
  OAuth ticket re-mint vs long-lived token fallback) into
  `@hermes/shared`; desktop now imports them directly.
- Add `buildHermesWebSocketUrl`, one base-path/scheme/auth-aware URL
  builder, and route every dashboard WS endpoint through it
  (`/api/ws`, `/api/events`, `/api/pty`, plugin WS URLs).
- Reduce the dashboard `GatewayClient` to a thin subclass of the shared
  `JsonRpcGatewayClient`, deleting ~210 lines of duplicated pending-call
  /event-dispatch/connect plumbing while keeping its dashboard-specific
  ticket-vs-token auth selection.
- Drop the stale "start it with --tui" chat banner, which implied the
  dashboard flag was required.

Behavior is preserved on both surfaces; the dashboard additionally
inherits the shared client's 15s connect timeout (previously
desktop-only), so a hung connect now fails fast instead of pinning the
composer in "connecting".
2026-06-28 21:20:35 -05:00
Brooklyn Nicholson
adacb16d62 fix(desktop): make agent terminal tabs fully readable
Register read-only agent terminals with the same renderer-side terminal reader
as user terminals so read_terminal works on whichever tab is active.

Also bring agent xterm rendering closer to user-terminal parity (unicode 11,
web links, font weights/spacing) and make the gateway sink wiring resilient if
only one terminal event sink was already installed.
2026-06-28 21:18:49 -05:00
Ben
dee41d0716 feat(dashboard): catalogue all memory-provider API keys in OPTIONAL_ENV_VARS
The dashboard Keys page and `hermes setup` render API-key rows from
OPTIONAL_ENV_VARS, but only Honcho had an entry — so Hindsight,
Supermemory, Mem0, RetainDB, ByteRover, and OpenViking read their keys
straight from os.environ yet had no place to set them in the GUI.

Add catalog entries (category=tool, password-masked, with get-key URLs
and the tool each powers) for all six, plus the relevant base-URL/endpoint
companions. Pure declaration: the generic GET /api/env endpoint, the
save/reveal write path, and the sandbox env blocklist (which auto-derives
from tool-category OPTIONAL_ENV_VARS) all pick these up with no further
wiring.

Adds a behavior-contract test asserting every memory provider's primary
credential key is catalogued, tool-categorised, and password-masked.
2026-06-28 19:17:02 -07:00
Brooklyn Nicholson
e117cfdff0 feat(desktop): live agent terminals + agent-driven tab close
Make the read-only agent terminal mirrors stream in real time and give
the agent a desktop-only way to dismiss its own tabs.

- Stream background output live: the local reader used a blocking
  read(4096) that buffered small periodic output until EOF, so agent
  tabs only "filled in" at process exit. Switch to buffer.read1(4096)
  (decoded) for incremental chunks.
- Route agent.terminal.output / terminal.close to the window that owns
  the process (its gateway session) instead of an empty session id, so
  events actually reach the desktop renderer.
- Add close_terminal: a HERMES_DESKTOP-gated tool (sibling of
  read_terminal) that drops a process's read-only tab WITHOUT killing it
  via process_registry.on_close; output keeps buffering and the user can
  reopen from the status stack.
- ⌘W now closes a focused agent tab: mark the agent instance
  data-terminal and focus it on activation so isFocusWithin routes there.
- ensureTerminal() no longer spawns an extra user shell when a tab
  already exists (e.g. opening a background task from the status stack).
2026-06-28 21:15:14 -05:00
Brooklyn Nicholson
9f02eea1d2 style(desktop): prettier + eslint pass
Repo-wide `npm run fmt` + `eslint --fix`; also drop two unused destructured
params in titlebar-overlay-width.cjs so the lint run is clean.
2026-06-28 21:04:43 -05:00
Teknium
c8fd47be14 docs: add PR infographic for approval mode validation 2026-06-28 19:04:18 -07:00
LIC99
dda3268d09 fix(approvals): warn and default to manual on unknown approvals.mode
_normalize_approval_mode() previously accepted any string, so an unknown
value like 'auto' fell through every downstream mode check (off/smart) and
silently behaved like manual with no signal. Validate against the known
modes (manual/smart/off), emit a warning for anything else, and default to
manual to match the config default and the rest of the function.

Bug 1 from the original PR (/approve & /deny bypassing the running-agent
guard) already landed on main independently, so only the mode-validation
fix is salvaged here.

Fixes #4261

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-06-28 19:04:18 -07:00
Brooklyn Nicholson
317b94871b chore(desktop): drop dead overlay primitives
Remove zero-consumer overlay code surfaced while auditing the primitive set:
OverlayNewButton (orphaned once "New" moved into PanelAddButton), OverlayCard /
overlayCardClass, and the unused overlay-search-input module. Leaves three
intentional layers: OverlayView (base), Panel (master/detail), and
OverlaySplitLayout (settings/command-center nav→content).
2026-06-28 21:03:04 -05:00
Brooklyn Nicholson
991220747f feat(desktop): unify non-settings overlays under a shared Panel primitive
Extract the agents/trace overlay chrome into overlays/panel.tsx and adopt it
across the Cron, Profiles, and Agents overlays so they share one layout
(centered card, header, master/detail list with built-in search, kebab row
actions, big "+" footer, empty state) instead of three ad-hoc split layouts.

Also in this pass:
- OverlayView insets equidistantly on every side (was top/left-only, which
  left a large left gutter on narrow windows).
- Form-control chrome: input border/background/recessed-inset are now
  per-mode theme-var knobs (--dt-input-border/-bg/-inset) — resting borders
  blend in, strengthen on hover, and go solid on focus / while a Select is open.
- Thread-timeline popover reuses the shared dropdown surface (1:1 with the
  kebab menus) and scrolls the hovered prompt into view.
2026-06-28 20:56:52 -05:00
Teknium
11183e8332 fix(profiles): validate custom alias names to prevent path traversal
`hermes profile alias <profile> --name <custom>` accepted arbitrary
strings and used them verbatim as a filename under ~/.local/bin. Because
normalize_profile_name only lowercases/strips (no regex gate), a value
like `../../.bashrc` escaped the wrapper directory and clobbered
arbitrary user-writable files. remove_wrapper_script had the same sink.

Add validate_alias_name (reusing the profile-id regex, which forbids
`/`, `.`, and `..`) and wire it into check_alias_collision,
create_wrapper_script, remove_wrapper_script, and the CLI alias action so
the rejection surfaces a clear "Invalid alias name" error instead of
silently writing or unlinking outside the wrapper dir.

Co-authored-by: Gutslabs <gutslabsxyz@gmail.com>
Co-authored-by: Xowiek <xowiekk@gmail.com>
2026-06-28 18:53:33 -07:00
aaronagent
27ddd8fd80 fix(gateway): sanitize agent error messages, validate webhook gh args
Two of the three fixes from PR #6660 (the cli.py reopen_session change is
moot — that raw _conn.execute reopen block no longer exists on main).

- gateway/run.py: stop sending raw type(e).__name__ and str(e)[:300] to
  end users on chat platforms. Exception text from LLM providers can leak
  API URLs, file paths, and partial credentials. Return a generic message;
  keep curated status hints for known HTTP codes; full detail stays in logs.
- gateway/platforms/webhook.py: validate pr_number (positive int) and repo
  (owner/name regex) before passing to the 'gh pr comment' subprocess.
  Payload-controlled values could otherwise inject gh flags (--help, a
  different --repo). List-form subprocess means this is arg injection, not
  shell injection, but validation is still correct.

Co-authored-by: aaronagent <1115117931@qq.com>
2026-06-28 18:53:26 -07:00
aaronlab
ec148f5d31 fix(agent): guard Anthropic interrupt, cap vision data-URL size
Two independent agent-loop hardening fixes:

- anthropic: when the streaming loop breaks on _interrupt_requested,
  return None instead of calling stream.get_final_message() on the
  partially-drained stream — the SDK may hang draining remaining events
  or return a Message with incomplete tool_use blocks. The outer poll
  loop raises InterruptedError, so the return value is discarded anyway.

- vision: add a 20 MB cap on base64 data-URL payloads before
  base64.b64decode() in _materialize_data_url_for_vision. A 100MB+
  payload creates ~275MB of memory pressure; gateway users sharing the
  process can trivially OOM it. Oversized payloads return ("", None).

The third change from the original PR (streaming tool-name +=  to
assignment dedup) was already landed independently on main.

Co-authored-by: aaronlab <1115117931@qq.com>
2026-06-28 18:53:20 -07:00
Teknium
490f215a19 test: cover export-prefix stripping in .env parsers (PR #6659) 2026-06-28 18:53:00 -07:00
aaronagent
5c1ac6c70d fix(config): strip export prefix in .env parsers across three modules
All three .env parsers use `line.partition("=")` without stripping the
bash-compatible `export ` prefix first.  A line like `export API_KEY=sk-...`
produces key `"export API_KEY"` instead of `"API_KEY"`, silently ignoring
the variable and causing auth failures for users who copy-paste from
bash profiles or follow tutorials that include `export`.

- tools/skills_tool.py: `load_env()` for skill environment
- hermes_cli/config.py: `load_env()` for core config
- hermes_cli/main.py: `_has_any_provider_configured()` inline parser

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-06-28 18:53:00 -07:00
Teknium
f1cbe4308f fix(gateway): log error-notification failures instead of silently swallowing (#54472)
* fix(gateway): log error-notification failures instead of silently swallowing

The last-resort exception handler in _process_message_background() that
sends an error notice to the user caught all exceptions with a bare pass,
leaving zero trace when the notification itself failed. Upgrade to
logger.error(..., exc_info=True) so a failed error-notification send is
debuggable post-mortem.

Salvaged from #6499 by @BongSuCHOI (the logging-upgrade portion only).

* docs: add PR infographic for gateway error-notify logging
2026-06-28 18:52:51 -07:00
Teknium
3483424aaa fix(security): redact bare-token credentials in URL userinfo (#6396) (#54475)
git remote set-url with an embedded password (https://PASSWORD@github.com)
leaked the credential into agent output — the redaction engine only masked
user:pass@ DB connection strings, never the colon-less bare-token userinfo
form a git remote uses.

Add _URL_BARE_TOKEN_RE: scheme://TOKEN@host for web/transport schemes
(http/https/wss/git/ssh/ftp), 8+ char floor to skip short usernames, token
class forbidding /:@ so an @ in a path/query is never treated as userinfo.

Deliberately scoped to the bare-token form only. The user:pass@ colon form
and query-string tokens stay passing through (#34029, 'pass web URLs through
unchanged') so magic-link / OAuth round-trip skills keep working — a bare
credential in userinfo is never a workflow token (those live in the query
string), so masking it can't break a skill.
2026-06-28 18:52:42 -07:00
Teknium
9860d93f2a fix(terminal): require approval for host-bound Docker commands (#54483)
* fix(terminal): require approval for host-bound Docker commands

The Docker terminal backend blanket-skips dangerous-command approval on
the assumption that the container is isolated from the host. That holds
only when nothing is bind-mounted in. Once a host path is exposed (via
TERMINAL_DOCKER_MOUNT_CWD_TO_WORKSPACE or a host-path entry in
TERMINAL_DOCKER_VOLUMES), a command like `rm -rf /workspace` reaches
real host files but is still auto-approved.

Detect host bind mounts and route those sessions through the normal
approval flow. Isolated Docker keeps the fast path. The same gating is
applied to the execute_code guard, which had the identical blanket skip.

Co-authored-by: Hermes Agent <agent@nousresearch.com>

* chore: add AUTHOR_MAP entry for PR #6436 salvage (Kolektori)

* test: accept has_host_access kwarg in _check_all_guards mocks

The host-bound Docker approval fix adds a has_host_access kwarg to the
_check_all_guards wrapper. Six pre-existing tests monkeypatch it with a
fixed (command, env_type) / (cmd, env) lambda signature, which now
raises TypeError when terminal_tool passes the new kwarg. Widen those
mock signatures to accept **kwargs.

---------

Co-authored-by: Kolektori <256073454+Kolektori@users.noreply.github.com>
Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-06-29 11:35:41 +10:00
Ben Barclay
7cfa2fa13f fix(docker): gate resource limit flags on cgroup controller availability (#54516)
On hosts where the cgroup v2 cpu/memory/pids controllers are not delegated
to the docker/podman process (unprivileged Proxmox LXCs, some rootless and
nested setups), --pids-limit/--cpus/--memory cause every container start to
fail with OCI runtime error / exit 126, breaking terminal + execute_code.

- Add _cgroup_limits_available(image): one-shot, host-wide cached probe that
  spawns a throwaway container from the sandbox image itself (sleep 0) with
  all three flags together, mirroring the existing _storage_opt_supported
  probe-and-degrade pattern.
- Remove --pids-limit from static _BASE_SECURITY_ARGS; apply it (default 256
  via _DEFAULT_PIDS_LIMIT) in resource_args gated on the probe.
- Gate --cpus and --memory on the same probe.

Behavior unchanged on cgroup-capable hosts; graceful degradation with a
one-time warning where controllers aren't delegated.

Fixes #6568.

(cherry picked from commit c933880b7e)

Co-authored-by: angelos <angelos@oikos.lan.home.malaiwah.com>
2026-06-29 11:01:08 +10:00
Brooklyn Nicholson
5d661a3ad7 fix(desktop): show the agent command before terminal output arrives
Seed read-only agent terminal tabs with the background command immediately, so
they never open as a blank pane while stdout is pending or a live stream races
startup. Snapshot fallback now preserves that command header and appends only
missing output without duplicating live chunks.
2026-06-28 19:39:20 -05:00
Brooklyn Nicholson
6ac9ba9fc4 fix(desktop): seed agent terminal tabs from process snapshots
Read-only agent terminal tabs now consume both live agent.terminal.output chunks
and the process-list/status snapshot. The snapshot seeds tabs opened after output
already exists and acts as a fallback if the live stream races startup, so agent
background tabs don't sit blank while the status stack already knows the tail.
2026-06-28 19:35:54 -05:00
Brooklyn Nicholson
520212cc59 feat(desktop): stream agent terminal output live instead of polling
Replace the 5s output_tail poll (which often showed nothing) with a real push
stream. The process registry gains an on_output sink called from its reader
threads with each chunk; the tui_gateway wires it to emit agent.terminal.output
{process_id, chunk} (write_json is _stdout_lock-guarded, so emitting from the
reader thread is safe). The desktop routes chunks by process id straight into
the read-only agent xterm via a small writer registry, with a capped backlog so
a tab opened mid-stream (or reopened) replays what it missed.

Drops the fragile poll/tail path: no session-key matching, no truncation, no
lag — full-fidelity ANSI, env-agnostic (local/docker/ssh).
2026-06-28 19:33:43 -05:00
Brooklyn Nicholson
ad831dd492 feat(desktop): mirror agent background terminals as read-only tabs
When the agent runs terminal(background=true) — Hermes's equivalent of
Cursor's is_background — surface it as a read-only "agent" tab in the rail
(distinct sparkle icon), alongside the glanceable status-stack row, which now
links to the tab. The tab is a write-only xterm (no PTY, no input) fed by the
process output tail, appended live (faster poll while a tab is open) and
env-agnostic (works for local/docker/ssh shells alike).

- terminals.ts: TerminalEntry gains kind ('user'|'agent') + procId; agent tabs
  auto-surface once (closing one doesn't resurrect it) and the status row can
  reopen/focus them. ensureTerminal now guarantees a user shell specifically.
- use-agent-terminal.ts: slim read-only xterm hook, delta-appended.
- workspace: render user vs agent instances; auto-surface from the background
  store; tail faster while an agent tab exists.
- composer-status: $backgroundOutputByProc selector; status row links to the tab
  instead of an inline disclosure.
2026-06-28 19:26:21 -05:00
Brooklyn Nicholson
6e12f8ce4a fix(desktop): force a repaint when a terminal is re-activated
A WebGL terminal doesn't paint while visibility:hidden, so switching to it
(e.g. after closing the active tab) revealed a stale/garbled frame. On
activation, clear the glyph atlas and force a full term.refresh against the
live buffer (after the refit), then focus.
2026-06-28 19:13:58 -05:00
Brooklyn Nicholson
b02f453496 refactor(desktop): generalize focus check to isFocusWithin primitive
Replace the one-off isTerminalFocused with isFocusWithin(selector) in the
keybinds lib (beside isEditableTarget) — the reusable primitive for any
focus-scoped shortcut. The terminal marks itself data-terminal and the ⌘W
handler routes via isFocusWithin('[data-terminal]'); future surfaces just add
their own marker.
2026-06-28 19:11:48 -05:00
Brooklyn Nicholson
2d55ff8fca feat(desktop): ⌘W closes the focused terminal
Fold terminal close into the existing ⌘/Ctrl+W handler so focus decides the
target: a focused terminal takes ⌘W (closes the active tab) and otherwise the
keystroke closes the active preview tab as before. Only the ⌘ gesture is
intercepted — Ctrl+W stays the shell's werase — and a focused terminal never
lets ⌘/Ctrl+W close a preview out from under it.
2026-06-28 19:10:06 -05:00
Brooklyn Nicholson
c1bb34d5e8 fix(desktop): keep inactive terminals sized so switching doesn't garble
Hide inactive terminal tabs with `visibility` (absolute-stacked at full size)
instead of `display:none`. A display:none host is 0×0, so its ResizeObserver
fit bails and the terminal stops tracking pane resizes — re-showing it at a
changed size reflowed the buffer into a garbled prompt. Visibility-hidden
hosts keep their layout size, stay in sync, and switch instantly.
2026-06-28 19:08:25 -05:00
Brooklyn Nicholson
6875d6cd3e feat(desktop): multi-terminal panel with side tab rail
Multiple persistent in-app terminals managed by a thin VS Code-style icon
rail docked on the terminal pane's outer edge. Each tab is its own live
xterm+PTY that survives tab switches, session switches, and hiding the pane
(VS Code parity: only an explicit close or `exit` kills a shell). Terminals
own their state independent of the session — the sole thing they inherit is
an initial cwd snapshotted at creation.

- Rail: icon-only tabs (name + live hotkey on hover), +/hide controls,
  context menu. Sits at z-40 above the collapsed sidebars' hover-reveal
  triggers and marks itself data-suppress-pane-reveal, so reaching for a tab
  can't summon the file-browser/review panel.
- Lifecycle: PersistentTerminal latches mounted on first open so shells stay
  alive while hidden; ensureTerminal re-creates one on reopen.
- Agent reader: id-keyed registry drives read_terminal off the active tab.
- Keybinds (Ctrl-family, OS-aware): toggle Ctrl+`, new Ctrl+Shift+`,
  next/prev Ctrl+Shift+Down/Up, close Ctrl+Shift+W.
2026-06-28 19:06:55 -05:00
brooklyn!
10043c6d0c Merge pull request #54503 from NousResearch/bb/fix-desktop-cross-wired-resume
fix(desktop): restore cross-wired runtime-id guard on session resume
2026-06-28 18:26:01 -05:00
Brooklyn Nicholson
cd5fb760a5 fix(desktop): restore cross-wired runtime-id guard on session resume
resumeSession's warm-cache fast-path once again trusted the
storedSessionId -> runtimeId -> ClientSessionState mapping without
checking the cached state still BELONGS to the session being resumed. A
pooled profile backend that gets idle-reaped and respawned re-mints
runtime ids, so a recycled id resolves to a live-but-DIFFERENT session's
cache entry and paints the wrong transcript under the current route:
click thread A, a totally different thread (often from another worktree)
loads. The session.usage 404 guard only catches a fully-dead id; a
recycled-live id 200s, so the fast-path happily served the stale cache.

Straight regression, not a new bug. f7bf74064 ("reject cross-wired
runtime-id cache on session resume") landed takeWarmCache() + its
regression test; 62af32efe ("keep active sessions aligned with cwd"),
rebased off a stale branch, restructured resumeSession and silently
reverted both 29 minutes later -- the exact stale-branch squash clobber
AGENTS.md warns about ("Squash merges from stale branches silently
revert recent fixes").

Re-apply the whole-class fix on top of the current cwd-aligned code:
takeWarmCache() validates state.storedSessionId === storedSessionId at
BOTH cache reads (the early transcript-keep decision and the fast-path),
purging a cross-wired mapping on a miss so it falls through to a full
resume that rebinds a correct runtime id. Restore the two regression
tests guarding it.

Tests: resumeSession warm-cache mapping integrity -- a cross-wired
mapping is rejected + purged (the bug), a correctly-wired cache is still
served with no needless refetch (no perf regression).

Co-authored-by: professorpalmer <professorpalmer@users.noreply.github.com>
2026-06-28 18:23:09 -05:00
brooklyn!
65d45a0013 Merge pull request #53386 from NousResearch/bb/elevenlabs-voices-401-spam
fix(dashboard): stop ElevenLabs voice-list 401 log spam
2026-06-28 18:10:47 -05:00
Brooklyn Nicholson
f34cf7e3a4 test(gmi): stub profile fetch_models in static-fallback test
The fallback test only mocked fetch_api_models; CI still hit the real GMI
/v1/models endpoint via ProviderProfile.fetch_models and merged live
models into the result.
2026-06-28 18:05:28 -05:00
Brooklyn Nicholson
27f03243a0 fix(dashboard): stop ElevenLabs voice-list 401 log spam
The /api/audio/elevenlabs/voices endpoint logged a WARNING on every
failure, and the desktop re-polls it on each settings open/focus — a
bad/expired/scoped ELEVENLABS_API_KEY floods agent/gui logs with
identical "voice list failed: HTTP Error 401" lines indefinitely.

Treat 401/403 as a persistent "integration unavailable" state: return
{available: false, error: "unauthorized"} with a 200 (the dropdown
already handles available:false) instead of a 502, and collapse repeated
identical failures to a single log line via a small re-arming latch
(logs again on recovery or when the error changes). Non-auth errors keep
the 502 but are throttled the same way.
2026-06-28 17:59:28 -05:00
brooklyn!
d0d2cf1c2f Merge pull request #54492 from NousResearch/bb/windows-hide-checkpoint-skills-git
fix(windows): hide console flash on checkpoint git + skills_hub gh probes
2026-06-28 17:49:37 -05:00
Brooklyn Nicholson
cb1bb1a48d refactor(windows): unify windowless spawn form across the touched sites
windows_hide_flags() already returns 0 on POSIX (and creationflags=0 is
the no-op default there, exactly how server.py::_list_repo_files does it),
so drop the IS_WINDOWS import + ternary/one-use-dict gating and just pass
creationflags=windows_hide_flags() directly. Tests lose the now-pointless
IS_WINDOWS monkeypatch.
2026-06-28 17:44:47 -05:00
Brooklyn Nicholson
ee22d853eb fix(windows): hide pdftoppm console flash on PDF attach
server.py's PDF-attach handler shells out to `pdftoppm` from the
console-less desktop/gateway backend; on Windows that pops a conhost
window each attach. Route it through windows_hide_flags() like the
sibling _list_repo_files git calls (no-op on POSIX).
2026-06-28 17:43:27 -05:00
Brooklyn Nicholson
32087e4bc9 fix(windows): hide console flash on checkpoint git + skills_hub gh probes
The #54236/#54417 backend git/gh sweep routed git_probe, the repo-file
picker, coding_context, context_references, copilot_auth, and the gateway
process scans through CREATE_NO_WINDOW, but two sibling spawn legs that
also run inside the console-less desktop/gateway backend were missed:

- tools/checkpoint_manager.py `_run_git` (and the one-shot `git init
  --bare` in `_init_store`) — when checkpoints are enabled, every
  file-mutating turn fires multiple bare `git` calls (status, add,
  write-tree/commit-tree, update-ref). Spawned from a parent with no
  console (Electron spawns the backend with windowsHide → CREATE_NO_WINDOW),
  each one allocates its own conhost window → a flurry of terminal popups.
- tools/skills_hub.py `GitHubAuth._try_gh_cli` — `gh auth token`, the same
  bug class as the already-fixed copilot_auth gh probe.

Route both through `windows_hide_flags()` (no-op on POSIX), matching the
established per-site pattern. Tests added to
tests/test_windows_subprocess_no_window_flags.py.
2026-06-28 17:41:47 -05:00
Teknium
980622d0ec perf(startup): parse config + plugin manifests with libyaml CSafeLoader (#54486)
The startup config/manifest reads used PyYAML's pure-Python SafeLoader,
which is ~8x slower than the libyaml-backed CSafeLoader C extension.
config.yaml is parsed several times during launch (cli config, raw
config, early interface/redaction bridge, logging config) and every
plugin manifest is parsed once — all on the slow path.

Add utils.fast_safe_load (CSafeLoader-preferring, pure-Python fallback,
true drop-in for safe_load) and route the hot startup parse sites
through it: hermes_cli/config.py (config + manifest reads),
hermes_cli/plugins.py (manifest parse), env_loader, cli.load_cli_config,
hermes_logging, and the two pre-config early YAML bridges in main.py.

Behavior is identical (same restricted safe tag set); only speed changes.
safe_load calls on the startup path drop from ~79 to ~0, cutting the
YAML parse cost from ~0.9s to ~0.15s under profiling.

Adds tests/test_fast_safe_load.py asserting equivalence with safe_load
across input shapes, empty-doc falsiness, C-loader preference, and that
python/object tags are still rejected (safe, not full loader).
2026-06-28 15:38:39 -07:00
Teknium
d65468e7ff fix(security): SSRF guard yuanbao media download_url (#54470)
yuanbao_media.download_url() fetched model-supplied (outbound) and inbound
image/file URLs server-side via httpx with follow_redirects=True and no
SSRF check. A model response containing <img src="http://169.254.169.254/...">
routed through ImageUrlHandler -> download_url and would fetch cloud-metadata
endpoints; same for inbound media.

Add an is_safe_url() pre-flight plus an async redirect event-hook that
re-validates every 30x target, matching the cache_image_from_url() guard in
gateway/platforms/base.py. The other gateway adapters already guard their
URL-fetch paths; this was the remaining unguarded one.
2026-06-28 15:29:59 -07:00
brooklyn!
16ff1a3b93 Merge pull request #54457 from NousResearch/bb/windows-console-launcher-repair
fix(windows): repair missing console script launchers
2026-06-28 17:15:56 -05:00
Hermes Agent
c8b86963d0 docs: add PR infographic for anthropic stale base_url guard 2026-06-28 15:12:03 -07:00
奥森木
e7d4ade8cf fix(anthropic): ignore stale non-Anthropic base_url across all resolution paths
A config left with `provider: anthropic` but a leftover
`base_url: https://openrouter.ai/api/v1` (e.g. after a provider switch)
would route Anthropic OAuth/setup-token traffic to OpenRouter and 404.

Add `_anthropic_base_url_override_ok()` and gate the three native-Anthropic
resolution branches (pool, explicit, native) on it. The guard honors a
configured `model.base_url` only when it plausibly speaks the Anthropic
Messages protocol — official `*.anthropic.com` / `*.claude.com` hosts, Azure
Foundry endpoints, and `/anthropic`-suffixed or Kimi `/coding` proxies — and
falls back to `https://api.anthropic.com` otherwise. Aggregator URLs like
openrouter.ai / api.openai.com are treated as stale.

Reconstructed from @clovericbot's PR #3661 onto current main: the original
patched one branch with an anthropic-only allow-list, which would have broken
Azure-via-anthropic; widened to all three sites and made Azure/proxy-safe.
2026-06-28 15:12:03 -07:00
Teknium
95f2919f91 perf(startup): lazy-load gateway platform adapters (#54448)
Bundled platform plugins (telegram, discord, feishu, teams, ...) were
eagerly imported at plugin-discovery time on every `hermes` invocation,
including plain `hermes chat` which never touches a gateway platform.
Their modules import heavy platform SDKs at module level (lark_oapi,
microsoft_teams, discord.py, slack_bolt, ...) — feishu alone pulled in
lark_oapi (~2.6s), teams pulled microsoft_teams (~1.9s).

Discovery now registers a cheap deferred loader per platform in the
platform_registry; the adapter module is imported only when the gateway
/ cron / setup / send_message path actually asks for that platform.
is_registered() and the iterate-all accessors stay correct (deferred
counts as registered; plugin_entries()/all_entries() materialize all
deferred loaders, since those paths genuinely need every adapter).

Cold start: ~4.4s -> ~2.45s to banner. discover_and_load: 2.0s -> 0.3s
(warm), and the heavy SDKs are no longer imported at all in CLI mode.
Every shipped platform remains available out of the box — it just loads
on first use.
2026-06-28 15:11:59 -07:00
Mibayy
b0b7ff0d75 fix(provider): auto+base_url bypasses cloud API when custom endpoint configured (#3846)
When config.yaml has `provider: auto` and a non-cloud `base_url` (e.g. Ollama
at localhost:11434), requests were silently sent to https://api.anthropic.com
whenever ANTHROPIC_API_KEY was present in the environment, ignoring the
configured local endpoint and returning HTTP 401 / "credit balance too low".

Root cause: resolve_provider("auto") scans env vars and returns "anthropic"
when ANTHROPIC_API_KEY is set, before config.model.base_url is ever consulted.

In resolve_runtime_provider(), before calling resolve_provider(), short-circuit
to the OpenAI-compatible resolver when no explicit creds were passed, provider
is "auto"/unset, and a non-cloud base_url is configured. Well-known cloud roots
(openrouter.ai, anthropic.com, openai.com) are matched on HOST (not substring)
so look-alike hosts can't evade the bypass and leak a cloud credential.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-28 15:11:55 -07:00
Teknium
86e64900b9 fix(gateway): preserve sessions across restarts (#54442) 2026-06-28 15:10:39 -07:00
Teknium
4c2961c511 fix(curator): never archive cron-referenced skills + floor use=0 pruning (#54443)
The curator's inactivity prune archived any non-pinned agent-created
skill whose activity was older than archive_after_days (90d). A skill
loaded only by a cron job had its usage bumped solely when the job
fired, so paused jobs, infrequent (quarterly/annual) schedules, and
far-future one-shots aged their skills out from under them — the next
run then failed to load the now-archived skill.

- cron/jobs.py: add referenced_skill_names() returning skills used by
  ANY job (incl. paused/disabled).
- curator.apply_automatic_transitions(): skip cron-referenced skills
  like pinned; add a use=0 grace floor so a never-used skill is not
  marked stale/archived until it is at least stale_after_days old.
- LLM review pass: candidate list marks cron=yes; prompt forbids
  pruning cron-referenced skills and never-used skills under 30 days.

Tested E2E against a real cron job + real usage records and with 4 new
unit tests.
2026-06-28 15:10:21 -07:00
Gille
df8e2523fa fix(windows): verify launchers after primary install 2026-06-28 17:02:05 -05:00
HexLab98
76bb8f46a0 test(cli): cover Windows console script repair (#52931)
Add unit tests for missing-shim detection and repair trigger in
_verify_console_scripts_installed.
2026-06-28 17:01:31 -05:00
HexLab98
95994bbc56 fix(windows): repair missing hermes.exe after pip install (#52931)
On Windows, uv pip install -e . can register hermes.exe in package metadata
while the launcher never lands on disk. Detect missing [project.scripts]
shims and reinstall entry points under the existing quarantine path in
hermes update and install.ps1.
2026-06-28 17:01:31 -05:00
brooklyn!
28097d9cd9 Merge pull request #54385 from NousResearch/bb/project-folder-picker-remote
feat(desktop): remote-gateway-aware folder picker + git cockpit (status, review, worktrees)
2026-06-28 16:35:57 -05:00
Teknium
e5d22ab80d fix(daytona): quote single-upload mkdir parent path (#54440)
* fix(daytona): quote single-upload mkdir parent path

The single-file _daytona_upload() path shelled out 'mkdir -p {parent}'
with the remote parent interpolated unquoted, so shell metacharacters in
the path could break the command or inject arbitrary commands into the
sandbox. The bulk-upload, bulk-download, and delete paths were already
hardened with shlex-quoting helpers; this single-upload path was missed.

Route it through the existing quoted_mkdir_command() helper and add a
regression test covering a path with shell metacharacters.

Reported by @Gutslabs (#3960); the original branch predated the
file_sync refactor, so the fix is re-applied to the current code path.

* docs(infographic): daytona quote-sync fix
2026-06-28 14:33:03 -07:00
Brooklyn Nicholson
f9b469d7de test(web_git): assert default branch invariant, not hardcoded main
CI git init defaults to master on some runners; compare branch to
defaultBranch instead of pinning a branch name.
2026-06-28 16:29:52 -05:00
teknium1
c648ecdca5 fix(telegram): reject unauthorized users before event construction (#40863)
Removed/unauthorized Telegram users could inject prompt content before the
per-user auth gate fired. The adapter ran `_should_process_message`,
`_build_message_event`, and text/photo batching — and dispatched to the
runner — before `_is_user_authorized()` (gateway/authz_mixin.py) rejected
the sender. Unmentioned group chatter from a removed user was also
persisted into the session transcript via `_observe_unmentioned_group_message`,
leaking into the agent's observed context independent of dispatch.

Add `_is_user_authorized_from_message()` as an intake prefilter that runs
in `_handle_text_message`, `_handle_command`, `_handle_location_message`,
and `_handle_media_message` BEFORE batching, event construction, and the
unmentioned-group observe branch. It reuses the runner's
`_is_user_authorized()` with a correctly-shaped SessionSource (group vs
forum vs dm, real chat_id for TELEGRAM_GROUP_ALLOWED_* allowlists),
falls back to env allowlists, and only rejects when an allowlist actually
exists — unknown DMs with no allowlist still reach the pairing flow.
Channel posts authorize via `sender_chat` identity when `from_user` is
absent.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: Carlos Manuel Cejas <carlosmcejas@gmail.com>
2026-06-28 14:25:15 -07:00
srojk34
61210097a5 fix(browser): extend private-network guard to browser_get_images
The SSRF cluster (7a6fe9bb, 48f5c425, 7ef04ae7) sealed
browser_snapshot, browser_vision, and _browser_eval against
eval-navigated private pages, but browser_get_images bypasses
_browser_eval and calls _run_browser_command("eval", ...) directly.
An eval-driven navigation to a private address followed by
browser_get_images would leak image src URLs and alt text from the
private page.

Add the same _eval_ssrf_guard_active + _current_page_private_url
recheck before returning image data, matching the pattern established
by the sibling guards.

5 new tests cover: block on private page, allow on public page, skip
for local backend, skip when private URLs allowed, no guard needed on
failed eval.
2026-06-28 14:25:10 -07:00
Brooklyn Nicholson
c7542358f2 fix(desktop): remote project picker UX and profile-scoped fs/git routing
Route FS/git REST through the active profile, mount the remote folder picker
at app root, keep the project dialog open while picking, show a first-run
blank state, flip into grouped view on create, and constrain the picker scroll
area so Select stays reachable.
2026-06-28 16:23:39 -05:00
Teknium
9a0010fd46 fix(windows): cover remaining console-flash spawn legs (#54417) 2026-06-28 13:49:08 -07:00
Teknium
b31b0b9d95 docs: reconcile docs with code across last 3 releases (#54254)
Audited the last 3 releases (v2026.5.28..main) against the docs site and
fixed code-vs-docs drift:

- slash-commands: add /moa, /prompt, /pet, /hatch, /timestamps
- cli-commands: add hermes pets / project / desktop / whatsapp-cloud +
  dashboard register; correct --insecure (now a deprecated no-op);
  add gateway migrate-legacy + enroll --wake-url + dashboard --skip-build
- environment-variables: document the remaining ~48 env vars (SimpleX,
  Photon, Teams adapter, per-platform *_ALLOW_ALL_USERS, home-channel vars,
  IRC, Brave/Krea/Notion/Linear/Airtable/Tenor keys, QQ_SANDBOX) — full
  OPTIONAL_ENV_VARS (265) now covered
- configuration: document tool_loop_guardrails, goals, prompt_caching,
  network, onboarding, dashboard config blocks
- toolsets/tools-reference + tools.md: add coding/project toolsets and
  read_terminal/project_* tools; remove the stale messaging toolset and
  send_message agent tool (removed in #47856); drop stale RL-training prose
- messaging: new IRC channel page (adapter shipped without docs) + index
  row + sidebar + env vars
- pets: document the /hatch AI generation pipeline + Nous/OpenRouter image
  backend
- web-dashboard: document the bearer-token / TokenPrincipal service auth path
- purge agent-callable send_message references across guides/features and
  the research-paper-writing skill (tool removed in #47856)

Verified: docusaurus build succeeds; all authored internal links resolve.
2026-06-28 12:47:50 -07:00
Brooklyn Nicholson
19bae1b9e0 test(desktop): assert new backend sessions carry workspace cwd
Pin the desktop-to-gateway cwd handoff: createBackendSessionForSend must pass
the current workspace cwd into session.create so the backend registers the
session cwd before the agent/tools run.
2026-06-28 14:44:28 -05:00
Brooklyn Nicholson
8d8c7111d9 refactor(desktop): keep remote fs routing inside the fs facade
Let UI callers ask for folders/files without knowing remote-picker limits:
selectDesktopPaths now normalizes remote directory selection to a single folder
inside the facade. Project creation and composer context picking no longer branch
on remote mode; they route through desktop-fs helpers just like git callers route
through desktopGit(). Behavior unchanged except remote folder context now works
through the same backend picker path.
2026-06-28 14:39:33 -05:00
Brooklyn Nicholson
453f134b3b refactor(desktop): centralize remote git REST routing
Keep the remote git mirror as a thin facade: route all GETs through gitGet,
all mutations through gitPost, and keep consumers on desktopGit(). On the
backend, route git paths through a single _git_path helper instead of repeating
str(_fs_path(...)) in every endpoint. Behavior unchanged.
2026-06-28 14:37:36 -05:00
Brooklyn Nicholson
4e9439cc3b fix(desktop): route composer context picking through remote-aware fs
Second pass on the remote-project flow: the project dialog and git cockpit were
remote-aware, but the composer's Add file/folder context picker still called the
native Electron picker directly. Route it through selectDesktopPaths so remote
sessions use the backend-aware picker instead of local disk paths; preserve local
multi-select behavior and keep remote folder selection single because the in-app
remote picker only supports one directory.

Also use readDesktopFileDataUrl for image previews so an already-known backend
image path can be read through /api/fs/read-data-url, and add focused coverage
for backend file-diff routing plus the plain-folder git init/worktree path.
2026-06-28 14:35:23 -05:00
Brooklyn Nicholson
9b71221187 fix(desktop): write project IDEA.md through the remote-aware fs path
writeProjectIdea used the local-only Electron writeTextFile, so on a remote
gateway IDEA.md never landed on the backend (where the project folder lives).
Route it through writeDesktopFileText (local Electron / POST /api/fs/write-text).
2026-06-28 14:31:58 -05:00
Brooklyn Nicholson
e4cf3a2e9d refactor(web_git): unify porcelain-v2 parsing into one walker
Collapse the two near-duplicate status parsers (_parse_status_v2 +
_iter_status_entries) into a single _walk_entries generator feeding the rail,
review list, and commit flow; share the staged predicate; hoist `import re`.
Behavior unchanged.
2026-06-28 14:29:59 -05:00
Brooklyn Nicholson
fc86e35764 feat(desktop): make the git cockpit work over a remote gateway
After the folder picker fix, an added remote folder was still half-usable:
the desktop's git GUI (coding-rail status, worktree lanes, review pane,
branch switch, file diff) all ran Electron-local git on the USER's machine,
so against a remote-gateway repo they silently degraded to empty.

Mirror the whole surface over the dashboard REST API so it acts on the
BACKEND repo where sessions actually run:

- hermes_cli/web_git.py: git/gh logic (status, worktrees, branches, review
  list/diff/stage/unstage/revert/commit/commit-context/push/ship-info/
  create-pr, file-diff, worktree add/remove, branch switch) shelling to the
  system git, mirroring the Electron ops' shapes.
- web_server.py: /api/git/* routes (same auth gate + _fs_path hardening as
  /api/fs, executor-offloaded, mutations -> 400).
- apps/desktop desktop-git.ts: remote-aware facade exposing the same shape as
  window.hermesDesktop.git; coding-status / review / projects / model /
  desktop-fs route through desktopGit() so local stays Electron, remote hits
  /api/git/*.

Tests: tests/hermes_cli/test_web_server_git.py (real repo: status counts,
review classification, diff incl. untracked all-add, stage+commit roundtrip,
worktree/branch lifecycle, commit-context, gh-absent ship-info, auth) and
desktop-git.test.ts (local vs remote routing, envelope unwrap, POST bodies).
2026-06-28 14:26:09 -05:00
Brooklyn Nicholson
304f0650c4 style(desktop): tighten pickProjectFolder comment 2026-06-28 14:13:36 -05:00
Brooklyn Nicholson
4526fccdbe fix(desktop): make project "Add folder" picker remote-gateway aware
The new-project / add-folder dialog (PR #49037) picked folders via the
native Electron dialog (pickDefaultProjectDir), which only browses the
LOCAL machine. On a remote gateway that picks a path that doesn't exist
on the backend where sessions actually run.

Route pickProjectFolder() through selectDesktopPaths({directories,
multiple:false}) — the same remote-aware path the retired right-sidebar
picker used: local mode opens the native directory dialog, remote mode
browses the backend filesystem via the in-app RemoteFolderPicker. Seed
it with the backend's default cwd on remote so it opens somewhere useful.
2026-06-28 13:49:45 -05:00
brooklyn!
b699d27a4a Merge pull request #54357 from NousResearch/bb/browser-chromium-autoinstall
feat(browser): auto-install Chromium binary on local cold-start failure
2026-06-28 12:36:22 -05:00
brooklyn!
27868e5b55 Merge pull request #54353 from NousResearch/bb/browser-first-open-timeout
fix(browser): extend first-open timeout & surface daemon errors on Linux (salvage #52575)
2026-06-28 12:32:41 -05:00
Brooklyn Nicholson
70292596ef feat(browser): auto-install Chromium binary on local cold-start failure
When a local browser_navigate (or any browser command) fails fast because
Chromium isn't on disk, attempt a one-shot binary download via
`agent-browser install` and retry instead of only printing a hint.

Scope is narrow on purpose:
- binary only, never `--with-deps` (that shells apt/needs root, so missing
  system libraries stay a user action)
- gated by `security.allow_lazy_installs` (same opt-out as every lazy install)
- skipped in Docker (Chromium ships in the image)
- attempted once per process

Follow-up to #54353, which made the cold-start failure legible; this closes
the "doesn't actually install the missing browser" gap for the common case.
2026-06-28 12:25:15 -05:00
Brooklyn Nicholson
1ab5c3cdda refactor(browser): drop redundant sandbox-hint substring check 2026-06-28 12:14:47 -05:00
infinitycrew39
7bb8aa3bd5 test(browser): cover open timeout diagnostics and failed navigate title
Add regression tests for open-command timeout floors, sandbox bypass,
stderr capture formatting, first-navigation timeout wiring, and desktop
failed-navigate labeling.
2026-06-28 12:14:21 -05:00
infinitycrew39
a10727a555 fix(browser): extend first-open timeout and surface daemon errors
Local browser_navigate cold-starts the agent-browser daemon and Chromium;
60s was too short on slow Linux hosts and timeouts discarded stderr,
leaving users with a generic failure. Use a 120s floor on first open,
inject --no-sandbox in Docker, include captured daemon output plus install
hints when commands time out, and show "Failed to open" in the desktop
tool chip when navigation returns success=false.
2026-06-28 12:14:21 -05:00
brooklyn!
23021be26e Merge pull request #52656 from helix4u/fix-desktop-empty-resume-view
fix(desktop): retry empty resumed transcripts
2026-06-28 11:57:57 -05:00
ygd58
3e16176ba4 fix(tools): reconcile agent.disabled_toolsets when a toolset is enabled
_get_platform_tools() applies agent.disabled_toolsets as a final
override AFTER reading platform_toolsets.<platform>, so a toolset
listed there stays permanently OFF no matter what the toggle write
path saves. Blank Slate installs pre-populate this list with ~27
toolsets, making most of the desktop Toolsets UI un-enableable
(issue #49995).

Fix: _save_platform_tools() now removes any toolset the user just
explicitly enabled FOR THIS PLATFORM from agent.disabled_toolsets.
Toolsets the user did not touch, or that remain disabled on other
platforms, are left alone -- disabled_toolsets keeps working as a
cross-platform suppression list for anything not actively re-enabled.
Disabling a toolset (unchecking it) does not touch disabled_toolsets
at all -- only enables reconcile it.

Verified end-to-end with the exact repro from the issue: Blank Slate
config (disabled_toolsets=['todo','memory','browser'], cli=['file',
'terminal']) -> enable 'todo' via the toggle -> _get_platform_tools()
now resolves 'todo' as enabled while 'memory'/'browser' (untouched)
remain disabled.

Added 4 regression tests. Full tools_config suite: 101 passed
(97 existing + 4 new), no regressions.

Fixes #49995
2026-06-28 21:59:03 +05:30
brooklyn!
020966574d Merge pull request #53892 from NousResearch/bb/windows-popup-spawn-legs 2026-06-28 11:16:35 -05:00
Brooklyn Nicholson
eeca59f489 fix(windows): hide remaining backend console-flash legs missed on main
main (cb982ad99) wired windows_hide_flags() into the auxiliary git/gh/wmic/
bash/powershell/taskkill legs but left two it didn't reach, plus the Electron
backend-launch leg it explicitly deferred. Cover them the same way:

- apps/desktop/electron/main.cjs: getNoConsoleVenvPython resolves the BASE
  pythonw.exe instead of the venv Scripts\pythonw.exe shim, which re-execs a
  console python.exe and flashes a conhost the desktop backend can't suppress.
  Both backend creators put the venv site-packages on PYTHONPATH so imports
  still resolve under the base interpreter. (main's commit said this Electron
  leg "needs a Windows-tested change of its own".)
- tools/tts_tool.py, tools/transcription_tools.py, plugins/platforms/discord:
  ffmpeg conversions (voice notes / TTS / STT) via windows_hide_flags().
- plugins/platforms/whatsapp: netstat + taskkill bridge-port cleanup via
  windows_hide_flags().

All no-ops on POSIX. Tests assert the base-pythonw preference and the ffmpeg
legs pass CREATE_NO_WINDOW.
2026-06-28 10:19:21 -05:00
Teknium
0c2e6c0049 test: make active session cross-process race deterministic (#54248) 2026-06-28 05:49:21 -07:00
teknium1
1ffa01f35f test(windows): cover no-window backend subprocess flags 2026-06-28 05:28:45 -07:00
Teknium
cb982ad997 fix(windows): hide console-window flash on backend git/gh/wmic/bash subprocess spawns
The Windows desktop GUI runs its backend headless via pythonw.exe. Several
auxiliary subprocess sites that run inside that windowless backend spawned
console-subsystem children (git, gh, wmic, powershell, bash, rg, taskkill)
WITHOUT CREATE_NO_WINDOW, so Windows allocated a fresh conhost per call and
flashed a black window on screen — sometimes continuously (the dashboard
Projects-tree git probe alone fired ~118 spawns in 60s on startup).

The terminal tool, cron, browser, code_execution, and gateway-spawn paths
already carry windows_hide_flags(); these auxiliary probe/scan/launcher legs
were missed. Wire the existing helper into them:

- tui_gateway/git_probe.py: run_git (+ encoding=utf-8/errors=replace, fixes the
  cp950 UnicodeDecodeError on CJK paths from the same site)
- agent/coding_context.py: _git (per-turn git status/log/diff)
- agent/context_references.py: _run_git + _rg_files (@file/@ref resolution)
- hermes_cli/copilot_auth.py: gh auth token probe (auxiliary provider:auto)
- hermes_cli/gateway.py: wmic + PowerShell Get-CimInstance PID scan
- hermes_cli/main.py: wmic stale-dashboard PID scan
- gateway/status.py: taskkill /T /F force-kill

windows_hide_flags() returns 0 on POSIX, so every changed call is a no-op on
Linux/macOS (verified: real git/rg probes still work; Windows-simulated calls
all pass creationflags=CREATE_NO_WINDOW).

Scoped to the windowless-backend paths that cause the reported flashing. The
Electron updater-handoff leg (main.cjs windowsHide:false) and the
interactive-CLI banner probes (cli.py) are intentionally NOT touched here —
the former needs a Windows-tested change of its own, the latter runs in a
visible console anyway.

Tracking: #54220
Refs: #53178 #53631 #53781 #53957 #49602 #52982 #53424 #53053 #53016
2026-06-28 05:28:45 -07:00
teknium1
f25f235722 chore: map salvaged PR #49845 author email for AUTHOR_MAP 2026-06-28 04:47:39 -07:00
homelab-ha-agent
d05cc8f4d6 fix(mcp): skip preflight content-type probe for OAuth servers
OAuth-protected MCP servers (e.g. Hospitable) return 200 text/html on an
unauthenticated HEAD probe — a login/landing page the server cannot substitute
for a real MCP response without a Bearer token.  The preflight cannot
distinguish this from a misconfigured URL, so it raises NonMcpEndpointError
before the OAuth browser flow has a chance to run.

Add `and self._auth_type != "oauth"` to the preflight condition in
MCPServerTask.run().  The probe is inapplicable to OAuth servers: their URL
legitimacy is established by .well-known/oauth-protected-resource during the
OAuth handshake, not by a GET content-type check.

Concrete repro: Hospitable (https://mcp.hospitable.com/mcp) returns
`200 text/html` to an unauthenticated httpx HEAD.  Without the guard:
  ✗ NonMcpEndpointError at `hermes mcp test`
With the guard:
  ✓ Connected (1487ms) — 63 tools discovered

Relation to open PRs:
- #37598 adds a POST probe fallback for POST-only non-OAuth servers (e.g.
  DocuSeal), but only passes when POST returns 2xx + MCP content-type.
  Hospitable returns 401 on the POST probe (Bearer challenge), so #37598
  does not cover this case.
- #49463 extends the POST probe to also pass on non-2xx auth challenges
  (making it OAuth-aware), but is labeled duplicate of #37598 and may not
  land independently.
This fix is complementary: it handles OAuth servers with zero extra
round-trips rather than adding a POST probe step.

Tests:
- test_oauth_server_html_response_raises_without_skip: documents that
  _preflight_content_type raises NonMcpEndpointError for 200 text/html
  (the underlying issue), with an OAuth-server docstring.
- test_run_skips_preflight_for_oauth: verifies that run() does NOT invoke
  _preflight_content_type when auth_type=="oauth", using class-level
  monkeypatching so the gate is exercised without a live MCP transport.

23 passed  tests/tools/test_mcp_preflight_content_type.py
2026-06-28 04:47:39 -07:00
liuhao1024
9d919daf44 fix(gateway): mark platform lock failure as retryable instead of permanently fatal
When a stale lock file survives a gateway crash, `acquire_scoped_lock()`
may return `(False, existing_dict)` even after detecting and deleting
the stale lock (e.g. if unlink fails or a race condition occurs).

Previously, `_acquire_platform_lock()` called
`_set_fatal_error(..., retryable=False)`, which permanently killed the
platform — the reconnect watcher never retries a non-retryable fatal
error.

Change to `retryable=True` so the platform enters the "retrying"
state and the reconnect watcher can attempt acquisition again after the
standard backoff delay.

Fixes #54167
2026-06-28 04:35:37 -07:00
teknium1
61622bb56a fix(tui): use role=user for model switch marker to avoid HTTP 400 on strict providers (#48338)
_append_model_switch_marker() appended the post-/model-switch context marker
to session history as {"role": "system"}. The cached system prompt is
prepended to the API message list (conversation_loop.py), so this marker
became a SECOND system message mid-array after prior user/assistant turns.
Strict OpenAI-compatible providers (vLLM, Qwen) reject any system message
that is not at the beginning of the array, returning HTTP 400 and killing
the conversation on the next turn.

Flip the marker to role="user" (history entry + both session-DB persist
sites), matching the existing personality-overlay marker which already uses
role="user". repair_message_sequence() then coalesces it with adjacent user
turns as needed.

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: Lucas Nicolas <lucas.nicolas@proton.me>
2026-06-28 04:34:55 -07:00
Brad Hallett
376d021fee fix(desktop): force app exit after update/uninstall handoff on macOS
On macOS app.quit() closes windows but window-all-closed deliberately keeps
the process alive (Dock convention). Every detached hand-off (update swap,
relaunch, Windows bootstrap recovery, uninstall cleanup) waits for the
desktop PID to exit before replacing/removing the bundle — so the process
never dying means the script spins its full PID-wait and the user sees a
blank app, or an uninstall that appears to do nothing.

Add a module-level isQuittingForHandoff flag, set before every hand-off
app.quit(); window-all-closed then quits on all platforms when it's set.

Covers all five hand-off sites including the Linux relaunch path.
2026-06-28 04:30:14 -07:00
teknium1
e54bedd8ea docs: add infographic for #42006 launchd bootout fix 2026-06-28 04:17:13 -07:00
izumi0uu
c4719aa51c fix(gateway): boot out stale launchd registration before restart bootstrap
launchd restart can leave the gateway job stopped but still registered after
update-time drain logic, so a direct bootstrap hits exit 5 and falls back to a
detached process. Booting the stale registration out before bootstrap keeps the
launchd-managed restart path intact and locks it with a regression test.

Constraint: Keep upstream-facing conventional commit style while preserving local decision context
Rejected: Treat bootstrap exit 5 as expected | Leaves macOS launchd restart outside launchd supervision after update
Confidence: high
Scope-risk: narrow
Directive: Keep launchd start/restart recovery flows aligned when changing launchctl handling
Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k "launchd_restart_boots_out_stale_registration_before_bootstrap or launchd_restart_falls_back_to_detached_on_error_5 or launchd_restart_drains_running_gateway_before_kickstart or launchd_restart_self_requests_graceful_restart_without_kickstart"
Tested: pytest -q tests/hermes_cli/test_gateway_service.py -k launchd
Not-tested: Manual macOS launchctl restart after hermes update
2026-06-28 04:17:13 -07:00
Teknium
52a853f5c3 fix(test): pin monotonic clock in spinner-elapsed test to fix CI flake (#54203)
test_spinner_elapsed_format_is_fixed_width_to_reduce_wrap_jitter derived
_tool_start_time from the live time.monotonic() clock (now - 65.2 / now - 9.2).
monotonic()'s epoch is arbitrary — on a host where monotonic() < 65.2 (fresh
subprocess on a freshly-booted CI runner) the start time went negative, the
(t0 > 0) guard in _render_spinner_text() dropped the '(elapsed)' suffix, and
short.split('(',1)[1] raised IndexError: list index out of range. Deterministic
given a small clock, so it would keep flaking, not clear on rerun.

Pin time.monotonic to a fixed 1000.0 and offset _tool_start_time from it so both
the <60s and >=60s paths always render the elapsed suffix regardless of the
runner's monotonic epoch.

Pre-existing main flake (surfaced in CI test slice 1/8).
2026-06-28 04:16:25 -07:00
Teknium
8e356eccea docs(readme): trim provider list to a few names plus docs link (#54169)
The README line enumerated 11 providers inline, which dilutes the point
and goes stale as providers come and go. Replace with Nous Portal,
OpenRouter, OpenAI, your own endpoint, and a 'many others' link to the
canonical AI Providers docs page that already lists them all.
2026-06-28 04:14:59 -07:00
teknium1
f22b9d3867 docs: add infographic for MCP WS discovery fix (#38945) 2026-06-28 04:14:12 -07:00
Cornna
5c2c85c545 fix(tui): start MCP discovery for websocket sessions
The desktop app and dashboard chat reach the agent through the /api/ws
JSON-RPC sidecar (tui_gateway.ws.handle_ws), NOT through
tui_gateway.entry.main() — the stdio-TUI path that spawns the background
MCP discovery thread. In the WS process discovery was therefore never
started: _make_agent only *waits* (wait_for_mcp_discovery), which no-ops
when the thread was never created, so the agent snapshotted an MCP-less
tool list. The only discovery trigger reachable was a manual /reload-mcp,
which is why tools appeared after a reload but vanished on restart.

Start the shared, idempotent, config-gated background discovery in
handle_ws right after accept() and before gateway.ready, so the first
agent build picks up already-spawning servers (and the existing
late-binding refresh handles slow ones).

Fixes #38945.
2026-06-28 04:14:12 -07:00
teknium1
091ce825fe test(redact): fix file_read regression-guard for current-main YAML collapse
The salvaged #35519 regression guard asserted that default (non-file_read)
mode keeps a head/tail `ghp_S1...Pn2T` mask for a `token: <key>` line. On
current main the YAML config pass (`_YAML_ASSIGN_RE`, key `token`) re-masks
the already-prefix-masked value to `***`, so the assertion was stale. Switch
to a bare-token context so the guard isolates what it claims (prefix-mask
head/tail shape in default mode) without depending on the YAML collapse.
2026-06-28 04:13:20 -07:00
kshitijk4poor
de928bccde fix(redact): non-reusable sentinel for prefix secrets in file reads (#35519)
When security.redact_secrets is on (default), read_file/search_files/cat
applied redact_sensitive_text(code_file=True) to file content, which still
ran prefix masking. An API key in config.yaml (ghp_..., sk-..., xai-..., etc.)
came back as a head/tail mask like `ghp_S1...Pn2T` — a plausible-looking
truncated key. When an agent read that and wrote it back to config, the masked
value replaced the real credential, silently breaking auth (401). Production
evidence: a config.yaml found containing the exact 13-char masked GitHub PAT.

The two community PRs (#35529, #35534) fixed the corruption by NOT redacting
prefixes for config reads — but that exposes the user's real keys to the agent
context, model, and logs (a security regression). This takes the safer route:
keep redacting, but for file content emit a NON-REUSABLE sentinel.

- New `_mask_token_nonreusable`: prefix secrets -> `«redacted:ghp_…»` (vendor
  label preserved for debuggability; zero secret bytes; angle-bracket/ellipsis
  wrapper is syntactically invalid as a token so it can't be mistaken for or
  written back as a usable key).
- New `redact_sensitive_text(file_read=True)` routes prefix matches through it
  (implies code_file=True). Default/log/display mode is UNCHANGED — `_mask_token`
  still keeps head/tail (fine for logs, never written back).
- Wired the 3 file_tools.py call sites (read_file / search_files / cat) to
  file_read=True.

Fixes both the corruption AND avoids the secret-exposure of the un-redact
approach. 6 new tests (sentinel shape, no-leak, not-a-plausible-key, default
mode unchanged, file_read implies code_file, sk- prefix); 88 redact tests pass;
mutation-verified (reverting to the old mask fails the sentinel/leak tests).

Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Co-authored-by: adammatski1972 <289282750+adammatski1972@users.noreply.github.com>

Closes #35519. Supersedes #35529, #35534.
2026-06-28 04:13:20 -07:00
teknium1
19cbbe304a docs: add infographic for clarify typed-replies fix 2026-06-28 04:13:19 -07:00
tymrtn
d7f655f370 fix: accept typed clarify choice replies 2026-06-28 04:13:19 -07:00
teknium1
9bb5a809b5 fix(gateway): make zombie check defensive against partial psutil stubs
The zombie status probe referenced psutil.Process/NoSuchProcess/Error
unconditionally, which raised AttributeError when psutil is a partial
stub that only defines pid_exists (as in test_windows_native_support's
fallback tests). Guard the probe so any failure to read status degrades
to the authoritative pid_exists() instead of raising.
2026-06-28 04:11:14 -07:00
MorAlekss
acca526286 fix(gateway): treat zombie PIDs as dead in _pid_exists to unblock --replace (closes #42126)
Under systemd Restart=always, the old gateway becomes a zombie (in the
process table, awaiting reap) when the replacement starts. _pid_exists()
reported the zombie as alive, so --replace waited on a PID that never
dies, then aborted with exit 1 — a silent crash loop. Standalone runs are
unaffected because nothing respawns the gateway into a zombie.

The live path is psutil.pid_exists(), which returns True for zombies, so
the check is added there (Process.status() == STATUS_ZOMBIE -> dead). The
psutil-less POSIX fallback also reads /proc/<pid>/stat (state Z) with a ps
state= fallback for macOS/BSD, before the os.kill(pid, 0) liveness probe.

Diagnosis and the /proc + ps POSIX fallback by MorAlekss (PR #44898);
extended to cover the psutil hot path so the fix applies on normal installs.

Co-authored-by: MorAlekss <mor.aleksandr@yahoo.com>
2026-06-28 04:11:14 -07:00
teknium1
463225caf1 fix(gateway): bypass legacy-unit prompt in non-TTY systemd install
Folds in PR #42124 (kyssta-exe): systemd_install gained a non_interactive
flag so the 'Remove the legacy unit(s)?' prompt — the second hidden prompt
not guarded by --start-now/--start-on-login — is also skipped in headless
contexts. Updates systemd_install test mocks to accept the new kwarg and
adds coverage for the legacy-unit-skip path.
2026-06-28 04:09:54 -07:00
liuhao1024
831d443b03 fix(gateway): honor --start-now/--start-on-login flags and support non-TTY headless installs
When running `hermes gateway install` on Linux/systemd, the command
unconditionally prompts with two `prompt_yes_no` questions, breaking
headless installs (SSH, CI, provisioning scripts) and ignoring the
existing --start-now / --start-on-login CLI flags that the Windows
branch already respects.

The fix mirrors the Windows path: read CLI flags first, prompt only
when flags are not provided AND stdin is a TTY, and fall back to True
defaults for non-TTY contexts. The argparse help strings are promoted
from SUPPRESS to visible so users can discover the flags.

Fixes #42065
2026-06-28 04:09:54 -07:00
Teknium
5e7bca95d9 fix(tui): coalesce render frames while stdout backpressure is unresolved (#31486) (#54171)
When the previous frame's stdout.write has not drained (the outer terminal
parser is overwhelmed by a wide CR+LF burst — CJK + ANSI tool output on a
high-context session), the renderer kept writing a new frame every tick. That
piled writes onto an already-backed-up pipe and kept the macrotask queue hot,
starving the stdin 'readable' callback — the observed stdin freeze where the
agent loop keeps running but keystrokes/Ctrl-C are dead.

onRender now coalesces: while pendingWriteStart is non-null (prior write's
drain callback hasn't fired) it skips the frame and retries on the drain tick
instead of writing. A MAX_COALESCED_BACKPRESSURE_FRAMES ceiling forces a write
through after N skips so a terminal whose drain callback never fires (OSError
EIO on flush) self-heals once the pipe recovers rather than wedging forever.
TTY-only; piped stdout has no flow control. Coalesce counter resets on every
real write.

This is the stdout-backpressure strand left open after #54046 fixed the
swallowed-exception strand.
2026-06-28 04:00:22 -07:00
Teknium
a06d0198cd fix(dashboard): reap PTY bridge on child EOF, not only in writer finally (#54190)
The /api/pty handler only closed the PtyBridge in the writer loop's finally.
On child EOF the reader task closes the WebSocket, but if the handler task is
cancelled the instant the socket closes, the writer's finally can be skipped
and the PTY fds leak (#54028) — the FD-leak the regression test guards. Under
dashboard auto-reconnect this stacks orphaned PTYs until fds are exhausted.

Reap the bridge in the reader's EOF finally too (close() is idempotent), so
the PTY is reaped independently of the writer-loop cancellation race. Harden
the regression test to poll for teardown instead of asserting on the same
tick. Was flaky on main (2/20); now 25/25.
2026-06-28 03:58:18 -07:00
Teknium
7968c90318 test(install): track run_with_timeout extraction after #39219 refactor (#54185)
PR #39219 split run_browser_install_with_timeout into a thin wrapper that
delegates to a new run_with_timeout helper (and parameterized the timeout
binary as $timeout_bin for macOS gtimeout support), but did not update
tests/test_install_sh_browser_install.py. The behavioral harness extracted
only the now-empty wrapper, so the install command never ran (runs==[]),
failing all 8 behavioral cases; two text assertions also still expected the
old literal 'timeout' invocation.

Fix the stale test: extract run_with_timeout alongside the wrapper, and match
the $timeout_bin-parameterized GNU-timeout strings. Behavior unchanged.
2026-06-28 03:58:01 -07:00
Christian Persico
135f235165 docs: fix incorrect web search instructions 2026-06-28 02:54:27 -07:00
kshitijk4poor
546193aa6d fix(install): time-box desktop + node-deps installs so a stalled download self-heals (#39219)
The desktop install step ran npm ci / npm run pack with no wall-clock cap, and
the sibling browser-tools / TUI / agent-browser dependency installs had the same
gap. The Electron binary (~150MB) is fetched from GitHub during the pack; on a
throttled or region-blocked link that download can *stall* rather than fail —
npm never errors and never exits, so the installer sits on "Build desktop app"
(step 9/11) indefinitely with only harmless 'npm warn deprecated' lines visible.
The existing self-heal escalation (cache purge -> dist restore -> npmmirror
fallback) only fires when pack returns non-zero, so a stall bypassed it.

- run_with_timeout (generalized from run_browser_install_with_timeout): GNU
  timeout --foreground -k 10 (Ctrl+C-aware, #35166) / gtimeout for external
  commands, else a pure-shell process-group watchdog so stock macOS (neither
  binary present) is protected. Shell functions (_desktop_pack) always take the
  pure-shell path — the timeout binary can't exec a function. Integer-normalized
  budget + a boundary recheck so a command finishing in the final poll second
  isn't mislabeled 124. The internal wait is guarded so set -e can't abort
  mid-function before the real exit code is computed.
- Wrap the desktop npm ci/install (sharing ONE budget via a computed deadline so
  a stall can't cost 2x DESKTOP_BUILD_TIMEOUT) + all three _desktop_pack attempts
  (DESKTOP_BUILD_TIMEOUT, default 900s), and the browser-tools / TUI / agent-
  browser registry installs (NODE_DEPS_TIMEOUT, default 600s).

A stall now converts to a bounded non-zero exit that feeds the existing mirror
self-heal instead of hanging the whole install.
2026-06-28 02:47:47 -07:00
Teknium
c1c179a239 fix(security): redact secrets in background process + foreground env-dump output (#43025) (#54149)
* fix(security): redact secrets in background process + foreground env-dump output

Terminal-output redaction was incomplete (#43025):

- Gap 1: process(action=poll/log/wait) returned background stdout verbatim —
  no redaction at all. A background printenv/server/test emitting a key leaked
  raw to the model, session.db, and CLI display. Same for the gateway
  background-process watcher's completion/progress notifications.
- Gap 2: the foreground terminal path hardcoded code_file=True, which skips the
  ENV-assignment pass, so an opaque token (no vendor prefix) from env/printenv
  leaked even there.

Adds agent.redact.redact_terminal_output(output, command) as the single policy
for ALL terminal-output surfaces: env-dump commands (env/printenv/set/export/
declare) get the ENV-assignment pass (code_file=False) to mask opaque tokens;
other commands stay on code_file=True to avoid false positives on source dumps.
Wired into terminal_tool, process_registry (_handle_process boundary), and the
gateway watcher. Respects security.redact_secrets (no force) — opt-out preserved.

* docs: add infographic for #43025 terminal-output redaction fix
2026-06-28 02:44:21 -07:00
teknium1
d5ba374c03 fix(telegram): detect wedged getUpdates consumer via pending_update_count
The merged CLOSE-WAIT heartbeat (#52744) only probes get_me(), which uses the
general request path and stays healthy while PTB's getUpdates consumer is
silently wedged (updater.running=True but the long-poll task is stuck, observed
on WSL2). DMs then queue in the Bot API and never reach handlers (#42909).

Augment the existing _polling_heartbeat_loop to also probe
get_webhook_info().pending_update_count. After two consecutive probes that see a
non-draining queue while the updater claims to be running, escalate into the
existing _handle_polling_network_error recovery ladder — no new restart
machinery. No-ops in webhook mode, when the updater is not running, or when a
reconnect is already in flight.

Credit to @gazzumatteo, whose PR #42959 identified the pending_update_count
signal as the missing liveness probe. This reuses the existing heartbeat +
recovery path rather than adding a parallel watchdog.

Fixes #42909.
2026-06-28 02:44:17 -07:00
teknium1
822b71cbf8 docs: add infographic for #43083 secret-redaction fix 2026-06-28 02:44:06 -07:00
teknium1
bbe1bf4045 fix(agent): stop redacting tool-call args in history; fix auth-header quote-eating
Two related redaction bugs from #43083:

1. build_assistant_message redacted tool-call arguments in-memory. That dict
   feeds both the replayed conversation history and state.db (which is itself
   replayed verbatim on session resume), so the model read back its own
   PGPASSWORD='***' psql call and copied the placeholder, breaking every
   credential-dependent command on the second turn. The masking gave no real
   protection either — the same secret still leaks through tool OUTPUT. Remove
   it. Keeping secrets out of the replayable store is a separate
   tokenization/vault concern (security.redact_secrets still governs
   storage-time redaction elsewhere).

2. _AUTH_HEADER_RE's greedy \S+ credential class ate a closing quote when the
   token sat flush against it (Authorization: Bearer sk-.."), turning value
   corruption into syntax corruption (unterminated quote -> shell EOF /
   SyntaxError). Exclude " and ' from the token class; real credentials never
   contain them.

Closes #43083.
2026-06-28 02:44:06 -07:00
yoniebans
204a67f0c8 fix(kanban): retry write_txn on transient SQLITE_BUSY 2026-06-28 02:44:04 -07:00
yoniebans
90c1dc0493 test(kanban): cover write_txn BUSY retry (currently failing) 2026-06-28 02:44:04 -07:00
teknium1
9844243b18 fix(gateway): gate quick_commands through slash access policy
Config-backed quick_commands bypassed the admin-only slash gate. The
early gate in _handle_message only fires for registry-known commands
(is_gateway_known_command), but quick_commands are never in the gateway
registry, so they reached the type:exec dispatch sink unchecked. An
allowlisted non-admin gateway user could invoke admin-only quick
commands — including shell exec in the gateway process — even when the
operator set allow_admin_from / user_allowed_commands to lock them out.

Apply _check_slash_access(source, command) at the quick_commands
dispatch site (the single exec chokepoint, cold-path only) using the
raw typed name. Admins and users with the command in
user_allowed_commands still run it; backward-compat (no policy set)
is unaffected.

Fixes #44727.

Co-authored-by: maxpetrusenko <max.petrusenko.agent@gmail.com>
Co-authored-by: zapabob <1920071390@campus.ouj.ac.jp>
2026-06-28 02:43:23 -07:00
Teknium
6d879d486b fix(dashboard): close PTY WebSocket on child EOF to stop FD leak (#54028) (#54123)
* fix(dashboard): close PTY WebSocket on child EOF to stop FD leak

The /api/pty handler's reader task returns on child EOF, but the writer
loop stayed blocked on ws.receive() until the browser sent a disconnect.
When the browser socket is half-open (no FIN delivered — common on
macOS/launchd), that disconnect never arrives, so the handler never
reaches its finally and the PTY master fd + child process leak. With
dashboard auto-reconnect (#52962), every dropped socket then spawns a
fresh PTY on top of the orphaned one, exhausting file descriptors within
hours (EMFILE / Errno 24).

Fix: the reader task now closes the WebSocket in a finally when the child
EOFs or the send side breaks, which unblocks ws.receive() so the existing
finally runs bridge.close(). The writer loop also guards ws.receive()
against the RuntimeError Starlette raises once the socket is closed.

Reported by @fifteenzhang.

Fixes #54028

* docs: add infographic for #54028 PTY FD leak fix
2026-06-28 02:42:21 -07:00
teknium1
7ef04ae7a7 fix(browser): close eval return-value SSRF bypass (sibling of #44731)
The snapshot/vision guards re-check the page URL before returning content,
but browser_console(expression=...) -> _browser_eval returns arbitrary JS
results directly, leaving two same-class bypasses open:

  1. Direct fetch: fetch('http://127.0.0.1/secret').then(r=>r.text()) reads
     a private endpoint and returns the body — the page URL stays public so
     the post-eval recheck never sees it.
  2. Navigate-then-read: location.href='http://127.0.0.1/' then a later eval
     reads document.body.innerText.

Guard _browser_eval on the same condition as navigate/snapshot/vision
(not local backend, not local sidecar, not allow_private_urls):
  - pre-scan the expression for private/always-blocked URL literals
  - re-check window.location.href after the eval at both success-return
    sites (supervisor fast-path + subprocess fallback)

Probe failures fail-open (matching the snapshot/vision guards).
2026-06-28 02:42:01 -07:00
liuhao1024
0ae6196087 fix(browser): allow local sidecar sessions to bypass SSRF guard
The private-network guard in browser_snapshot() and browser_vision()
blocked all private URLs, including those accessed via local sidecar
sessions (hybrid routing). Local sidecar sessions intentionally access
private URLs — the cloud provider never sees the URL in that case.

Add `_is_local_sidecar_key(effective_task_id)` check to both guards,
matching the existing pattern in browser_navigate().

Fixes #45101 review feedback from egilewski.
2026-06-28 02:42:01 -07:00
liuhao1024
48f5c42599 fix(browser): extend private-network guard to browser_vision
The SSRF bypass in #44731 was only patched for browser_snapshot(), but
browser_vision() exposes the same vulnerability — it takes a screenshot
and sends it to the vision model without checking if eval-driven
navigation moved the page to a private/internal URL.

Add the same current-page URL safety check to browser_vision() before
any screenshot is captured, encoded, or forwarded to the vision model.
This covers both the normal screenshot path and the Lightpanda Chrome
fallback path.

7 new tests: blocks private URL, allows public URL, skips in local
backend, skips when private URLs allowed, handles eval failure/empty/exception.
2026-06-28 02:42:01 -07:00
liuhao1024
7a6fe9bbfa fix(browser): block snapshot from eval-navigated private pages
browser_snapshot() now checks the current page URL before returning
content. When browser_console() changes location.href to a private or
internal address (e.g., http://127.0.0.1:8080/), the snapshot returns
an error instead of exposing the private page content.

This closes the SSRF bypass where an attacker could:
1. Navigate to a public page
2. Use browser_console to eval location.href = 'http://127.0.0.1:port/'
3. Use browser_snapshot to read the private page content

The fix reuses the existing _is_safe_url() and _allow_private_urls()
infrastructure, and fails open if the URL check itself fails.

Fixes #44731
2026-06-28 02:42:01 -07:00
Teknium
7c0a5def58 fix(memory/holographic): close DB connection on shutdown instead of leaking to GC (#54133)
HolographicMemoryProvider.shutdown() dropped its MemoryStore reference
without calling the existing MemoryStore.close(). Since the connection is
opened check_same_thread=False (one per session), its fd was released by
refcount/GC at a non-deterministic time on a non-deterministic thread,
churning a DB fd through the kernel free pool on every session teardown.
Call close() so the fd is released deterministically.

Reported by @alfranli123 (#44037), who pinpointed the exact code location.
Note: the report's TLS-fd-recycle corruption attribution could not be
reproduced from the code — dropping a sqlite connection flushes valid
SQLite pages via the VFS, never TLS framing, and the provider is at most a
releaser of DB fds, not a TLS-flushing socket owner. This change is correct
resource hygiene that removes per-session fd churn regardless.
2026-06-28 02:41:52 -07:00
Teknium
00d8c2c915 fix(gateway): prune stale sessions.json entries on startup
A hard gateway crash (exit code 1) skips the graceful shutdown path, so
sessions.json is never cleared and is left pointing at sessions already
ended in state.db. On the next startup get_or_create_session() reuses
those stale entries as long as the time/policy reset checks pass — it
never consults end_reason — so every incoming message is silently routed
into a closed session, with no log or error (#52804).

SessionStore._ensure_loaded_locked() now calls a new
_prune_stale_sessions_locked() that drops any entry whose session_id has
end_reason IS NOT NULL in state.db. Idempotent, _db=None / legacy-absent
safe, DB errors non-fatal, sessions.json rewritten only when something
was pruned. Self-heals into a fresh session on the next message.

Reported and diagnosed by @terry197913 (#52808).
2026-06-28 02:41:47 -07:00
teknium1
c38dfba3a7 docs: add infographic for #53175 gateway cleanup off-loop fix 2026-06-28 02:41:36 -07:00
teknium1
ea5aaa7a22 fix(gateway): offload remaining inline agent cleanup off the event loop (#53175)
#35994 moved /new reset cleanup off the loop, but _cleanup_agent_resources
(agent.close() subprocess teardown; shutdown_memory_provider() plugin IO) was
still called INLINE on the event loop from three other sites:

  - _session_expiry_watcher (5-min idle sweep) — live loop
  - _handle_message_with_agent cache-hygiene re-eviction — live loop
  - _finalize_shutdown_agents / stop() idle-cache loop — shutdown

A wedged memory provider on any of these froze the loop: bot goes silent,
runtime-status updated_at heartbeat stops advancing, and SIGTERM can't be
serviced (requires kill -9) — exactly the #53175 zombie pattern.

Adds _cleanup_agent_resources_off_loop: a bounded (30s) worker-thread offload
mirroring the #35994 reset fix, and routes all four sites through it.
2026-06-28 02:41:36 -07:00
teknium1
aa50c1ba5d fix(prompt): repair backend probe import (get_environment never existed)
The system-prompt backend probe imported a nonexistent symbol —
`from tools.environments import get_environment` — which always raised
ImportError: cannot import name 'get_environment'. The exception is caught
and only drops the live backend description to a static fallback, so it is
cosmetic, but it broke the live OS/user/cwd probe for every non-local
backend (docker/singularity/modal/daytona/ssh).

The real factory is `_create_environment` in tools.terminal_tool. Build the
environment the same way the live terminal path does (select backend image,
assemble ssh/container config from _get_env_config()), then run the probe.

Note: this does NOT affect tool loading — tool selection runs each tool's
check_fn and never consults this probe. Regression from #52147 (2026-06-25).

Closes #53667 (probe import); the 'cronjob-only' tool-collapse symptom is
not reproducible — tool selection has no probe dependency and memory's
check_fn is unconditionally True.
2026-06-28 02:41:31 -07:00
Teknium
b508d4296e test(ci): raise per-file timeout 140s → 300s to stop false timeouts (#54143)
* test(ci): raise per-file timeout 140s to 300s to stop false timeouts

The per-file parallel runner caps each test-file subprocess at a flat
wall-clock budget. Combined with per-test subprocess isolation (a fresh
Python process per test), a large-collection file pays N x (interpreter
startup + import) of overhead before any test logic runs. That overhead
dilates under load on shared CI runners, so a file that finishes in
~100s on a quiet box can blow the old 140s cap purely from scheduling
jitter, surfacing as a false 'no tests ran' timeout (rc=124) with zero
actual test failures.

Raise the default to 300s (5 min). The Docker build matrix jobs already
take 7-10 min, so this headroom costs nothing on total CI wall time
while still bounding a genuinely hung file.

* docs: add infographic for CI per-file timeout bump
2026-06-28 02:41:07 -07:00
teknium1
dcc6cd1b42 docs: add infographic for #52378 Windows update-loop salvage 2026-06-28 02:40:37 -07:00
teknium1
fe89ce0694 chore(release): map Cossackx in AUTHOR_MAP for #52528 salvage 2026-06-28 02:40:37 -07:00
Cossackx
ba37c910e0 fix(desktop/windows): resolve real hermes over extensionless shim + prefer --update on recovery
Two Windows-only desktop boot bugs that caused spurious reinstall/repair loops:

1. findOnPath() searched the empty extension BEFORE PATHEXT, so an
   extensionless Git-Bash `hermes` shim shadowed the real hermes.cmd/.exe.
   The shim then failed the shell:false --version probe and the resolver
   fell through to bootstrap/repair even though a working CLI was on PATH.
   Fix: try PATHEXT extensions first, keep the empty entry LAST so callers
   that already include the extension (py.exe, pwsh.exe) still resolve.

2. handOffWindowsBootstrapRecovery() chose the destructive --repair over the
   gentle --update by checking only venv\Scripts\hermes.exe -- the setuptools
   console-script shim, written at the END of venv setup and absent in
   interrupted/quarantined states. Fix: take --update when ANY real-install
   signal is present (venv python, the shim, or .hermes-bootstrap-complete).

Adds windows-hermes-resolution.test.cjs (source-assertion pattern, wired into
test:desktop:platforms) guarding both regressions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-28 02:40:37 -07:00
Cornna
0229246ab8 fix(desktop): probe venv runtime health before trusting bootstrap marker
A broken/empty Windows launcher venv can see the source tree via PYTHONPATH
but lack PyYAML, so 'import hermes_cli' succeeds while the first real CLI
import dies — the desktop then trusts the bootstrap marker, spawns a dead
backend, and loops on 'gateway offline' (#52378).

- backend-probes.cjs: canImportHermesCli now runs 'import yaml; import
  hermes_cli.config' (extracted as hermesRuntimeImportProbe) and accepts an
  env override, so a dependency regression is caught without a real broken
  venv fixture.
- main.cjs: isBootstrapComplete() routes through new isActiveRuntimeUsable(),
  which requires the venv python to pass the runtime import probe (with
  ACTIVE_HERMES_ROOT on PYTHONPATH) — not just exist on disk.

Salvaged from PR #38179. The PR's install.ps1 reset/clean + autocrlf changes
and their tests are dropped: current main already preserves dirty checkouts
via stash (the data-loss-safe #38542 path) rather than the PR's older
reset-based Repair-ManagedCheckoutBeforeUpdate approach.
2026-06-28 02:40:37 -07:00
teknium1
7c9cdad9fd test(cli): cover Windows self-lock recovery guard + cmd-quote its hint
Add two tests for the self-lock guard in _recover_from_interrupted_install:
one asserting it clears the marker and skips install when hermes.exe is a
process ancestor (breaking the #52378/#45542 loop), one asserting it falls
through to a normal recovery install when the shim is NOT an ancestor.

The guard's manual-recovery hint runs only inside the Windows branch, so
quote it for cmd.exe (cd /d, double-quoted paths) — the cross-platform
fallback hint at the end of the function is left POSIX-correct.

Map Icather in scripts/release.py AUTHOR_MAP for the salvage.
2026-06-28 02:40:37 -07:00
灵越羽毛
b6f592dbdc fix(cli): detect self-lock in update recovery to break infinite retry loop on Windows 2026-06-28 02:40:37 -07:00
liuhao1024
14baeefe1d fix(matrix): record DM rooms in m.direct on invite to prevent group misclassification
Rebase onto plugins/platforms/matrix/adapter.py (code moved from
gateway/platforms/matrix.py). Same logic: _on_invite checks is_direct
on invite events and calls _record_dm_room to persist in m.direct
account data.

Fixes #44679
2026-06-28 02:37:52 -07:00
Teknium
fde1c8570f fix(tui_gateway): suppress WS peer-hangup teardown error flood (#50005) (#54126)
When the Desktop forcibly closes its WebSocket mid-write, asyncio logs a
full traceback for every pending connection-lost callback — 50+ identical
WinError 10054 (ConnectionResetError) lines per disconnect on Windows, the
equivalent ConnectionResetError/BrokenPipeError on POSIX. These are not
actionable: they are the expected side effect of the peer hanging up before
our writes drained.

Install a loop exception handler on the gateway serving loop that collapses
exactly this teardown class (ConnectionResetError/ConnectionAbortedError/
BrokenPipeError originating from _call_connection_lost) to a single debug
line, forwarding every other loop error to the existing/default handler
unchanged so genuine loop bugs still surface. Idempotent per loop.
2026-06-28 02:35:01 -07:00
teknium1
6eec0d4f08 docs: add infographic for #53107 gateway force-exit fix 2026-06-28 02:34:23 -07:00
LeonSGP43
9f0e64cedd fix(gateway): force exit after graceful shutdown
Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-06-28 02:34:23 -07:00
teknium1
dddaea0c98 chore(release): map yungchentang author for #53622 salvage 2026-06-28 02:34:17 -07:00
yungchentang
7e2ca7f68d fix(telegram): reset send pool after pool timeouts 2026-06-28 02:34:17 -07:00
kshitij
f3d8f20a59 Merge pull request #54116 from kshitijk4poor/fix/36658-gateway-drain-microtask
fix(tui): defer buffered gateway events to stop dashboard chat #301 (#36658)
2026-06-28 14:50:07 +05:30
Teknium
f646b82ff0 docs: add infographic for #38249 atomic env-snapshot fix 2026-06-28 02:08:57 -07:00
Teknium
9f17f16c66 fix(environments): use $BASHPID for atomic snapshot temp + harden failure path
The atomic mv approach (kyssta-exe's commit) narrows but does not close the
#38249 race: the temp name used $$ (parent shell PID), which is identical
across &-launched concurrent subshells. Two concurrent writers pick the same
temp file, clobber each other mid-write, and mv then publishes a torn snapshot
— a reader sourcing it absorbs declare-x/export fragments into PATH.

- Use $BASHPID (actual per-subshell PID) so concurrent writers never collide.
- Chain mv on export success (&&) and rm the temp on failure so a partial dump
  never replaces a good snapshot; apply the same to the init_session bootstrap.
- shlex-quote the static temp-path portion (Windows/spaces), $BASHPID outside.
- LocalEnvironment.cleanup sweeps orphaned snap.tmp.* temps.
- Regression tests: string-shape + a behavioral concurrent writers/readers test
  that proves the snapshot never tears (would still tear with $$).
2026-06-28 02:08:57 -07:00
kyssta-exe
6a2958a521 fix(environments): use atomic file replacement for snapshot writes
Fix race condition in terminal environment snapshots that could corrupt
PATH with declare -x entries. When concurrent terminal calls share the
same snapshot file, the non-atomic 'export -p > snapshot.sh' write could
be read mid-write by another process, causing partial/corrupted env vars
to be sourced and mixed into PATH.

The fix uses atomic file replacement:
- Write to a temp file: export -p > snapshot.sh.tmp.303651
- Atomically replace: mv -f snapshot.sh.tmp.303651 snapshot.sh

On POSIX, mv within the same filesystem is atomic, so source() will
either see the old complete snapshot or the new complete one, never a
partial/truncated file.

Fixes #38249
2026-06-28 02:08:57 -07:00
teknium1
c23f394eb8 fix: satisfy ruff encoding + windows-footgun lints for cgroup reaper
- read_text(encoding='utf-8') (PLW1514)
- # windows-footgun: ok on signal.SIGKILL — module is Linux-only (reads
  /proc, /sys/fs/cgroup; runs from a systemd unit)
- test lambda accepts the new encoding kwarg
2026-06-28 02:05:50 -07:00
teknium1
86ec979f66 chore(release): map PRATHAMESH75 author for #37550 salvage 2026-06-28 02:05:50 -07:00
PRATHAMESH75
e551da6ddb fix(gateway): reap cgroup orphans via ExecStopPost to unblock restart
Long-lived helpers spawned indirectly by tool calls (adb, platform
bridges) were left in the service cgroup after the gateway's main
process exited. When the kernel rejected the deferred cgroup-wide kill
with EINVAL, systemd blocked Restart=always for 6+ minutes, taking
down all platforms and cron windows (#37454).

Add a small ExecStopPost helper (gateway.cgroup_cleanup) that walks
cgroup.procs and sends per-PID SIGKILLs — a different kernel code path
than cgroup.kill, so it succeeds where the cgroup-wide write failed.
KillMode=mixed is preserved so the gateway still reaps its own
tool-call children before systemd intervenes (#8202).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-28 02:05:50 -07:00
Coy Geek
d7a1052424 fix(env-passthrough): fail closed when provider blocklist import fails
When tools.environments.local can't be imported (partial install,
import-time error), _is_hermes_provider_credential() returned False —
fail-open. A skill could then register a Hermes provider credential
(ANTHROPIC_API_KEY, etc.) as env passthrough; _scrub_child_env lets
passthrough vars bypass the secret-substring net (rule 1), so the
operator's real key would land in the execute_code child. Reopens the
GHSA-rhgp-j443-p4rf bypass.

Fail closed instead: on import failure, treat the name as a protected
provider credential and refuse passthrough. Regression test exercises
the full register -> scrub path under a simulated import failure.

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-28 02:05:43 -07:00
teknium1
58c36b1798 fix(api-server): widen error redaction to cron-endpoint + SSE sites
Follow-up to the salvaged #37733 fix. The contributor centralized
redaction at _openai_error and the chat/responses failure paths, which
covers the OpenAI-compatible envelopes transitively. Two sibling classes
crossed the same authenticated HTTP boundary unredacted:

- 8x cron-management endpoints returning {"error": str(e)} on 500
- the session-chat SSE error event ({"message": str(exc)})

Route both through the same _redact_api_error_text(force=True) helper.
Add AUTHOR_MAP entry for coygeek and a TestRedactApiErrorText guard
covering mask/force/limit/passthrough behavior.
2026-06-28 02:05:38 -07:00
Coy Geek
5e774de76e fix(api-server): redact provider errors at HTTP boundary
Force API-server error text through the existing secret redactor before returning OpenAI-compatible errors, response fallback text, response snapshots, and run failure events. This prevents credential-shaped provider failure text from crossing the API-server boundary while preserving debuggable sanitized messages.
2026-06-28 02:05:38 -07:00
HexLab98
d2fda5925d test(gateway): cover Discord/Slack compression status suppression (#39293) 2026-06-28 14:35:32 +05:30
HexLab98
d2ea948bc0 fix(gateway): suppress compression status noise on Discord and other chats (#39293)
Extend the gateway noisy-status filter beyond Telegram so internal
compression lifecycle messages stay in logs instead of spamming Discord,
Slack, and other messaging channels.
2026-06-28 14:35:32 +05:30
teknium1
9f7d520caf docs: add infographic for #36664 WhatsApp LID session-path fix 2026-06-28 02:05:26 -07:00
teknium1
3aaa98dd01 test(whatsapp): cover LID allowlist match on modern session layout
Add an _is_user_authorized E2E for the platforms/whatsapp/session layout
on top of fesalfayed's resolver fix (#36665) — guards the actual
silently-dropped-LID-sender path from #36664.
2026-06-28 02:05:26 -07:00
fesalfayed
263ffec1b0 fix(whatsapp): resolve LID aliases on modern platforms/ session layout
expand_whatsapp_aliases hardcoded get_hermes_home()/whatsapp/session, but
the adapter writes lid-mapping files via get_hermes_dir("platforms/whatsapp/
session", "whatsapp/session"). On installs without the legacy directory the
two paths diverge, so the resolver finds no mappings and returns the bare LID,
which misses the allowlist and silently drops the message. Resolve through the
same helper so both sides stay in lockstep on new and legacy layouts.
2026-06-28 02:05:26 -07:00
teknium1
d0f087e7f9 docs: add infographic for #36109 empty-400 diagnostics 2026-06-28 02:05:20 -07:00
xxxigm
093f567f0d fix(agent,cli): surface empty-body API errors and fail oneshot exit code
When an LLM API call returns HTTP 4xx with an empty parsed SDK `body` ({}),
`_summarize_api_error` fell through to a bare `str(error)`, so users saw only
"HTTP 400" with no provider detail (reported on Windows in #36109). The SDK
leaves `body` empty in this case, but the httpx `response` still carries the
payload in `.text`.

- run_agent.py `_summarize_api_error`: when `body` is empty, fall back to
  `response.text` — parse a JSON `error.message`/`message` when present, else
  surface the raw (truncated) body. Platform-agnostic diagnostics.
- hermes_cli/oneshot.py: `hermes -z` now runs via `run_conversation` and returns
  exit code 2 when the run is failed/partial with no usable final response, so
  scripts can detect LLM failures (still 0 when a response — incl. an error
  summary as output — is produced).

Tests: new tests/run_agent/test_summarize_api_error.py (empty-body JSON + raw
text, RED/GREEN verified) + oneshot exit-code/`run_conversation` wiring tests.

NOTE: #36109's original root cause (Windows "all providers return empty 400")
is not reproducible on current main (heavy provider-transport churn since
v0.15.1). This change does not claim to fix that root cause — it makes any
empty-body API error LEGIBLE so a future occurrence shows the real provider
message instead of a bare HTTP 400. Relates to #36109 (does not close it).
2026-06-28 02:05:20 -07:00
teknium1
c0b4a3438a fix(install): scope Playwright override to too-new apt releases + keep step interruptible
Follow-up on #54032 for #35166:
- Gate the PLAYWRIGHT_HOST_PLATFORM_OVERRIDE retry on the host being an apt
  release newer than Playwright recognizes (Ubuntu >24.04 / Debian >13) via
  playwright_host_unrecognized(), instead of retrying on ANY install failure.
  A network/disk/permission failure on a supported host now surfaces unchanged
  rather than getting a mismatched-glibc build forced onto it.
- detect_os() now captures DISTRO_VERSION from os-release.
- Fold in the interruptibility fix (was PR #35304, self-closed): wrap the
  download in 'timeout --foreground -k 10' (probed, with plain-timeout
  fallback) so a terminal Ctrl+C reaches the child and a wedged download is
  force-killed after the deadline.
- Add behavioral tests that source the helpers and assert the retry fires only
  on Ubuntu 26.04 / Debian 14, not on supported hosts, non-apt distros,
  native-success, operator-pinned override, or unsupported arch.
2026-06-28 02:05:18 -07:00
kshitijk4poor
a28fe788a6 fix(install): retry Playwright install with platform override on unrecognized host (#35166)
On apt releases newer than the bundled Playwright recognizes (Ubuntu 26.04,
Debian 14, and future distros), 'npx playwright install --with-deps chromium'
hangs uninterruptibly at 'Installing Playwright Chromium with system
dependencies' because Playwright's resolver maps the host to a platform with
no download build (#35166).

Wrap every installer Playwright call in run_playwright_install(), which tries
the native install first and, only if it fails or times out, retries once with
PLAYWRIGHT_HOST_PLATFORM_OVERRIDE pinned to the newest known build
(ubuntu24.04-<arch>). This is the escape hatch Playwright's maintainers bless
for unrecognized platforms (microsoft/playwright#33434).

Try-native-first (not a hardcoded distro/version table) is deliberate:
- Self-correcting — when Playwright already supports the host (e.g. Ubuntu
  26.04 on Playwright >=1.61) the first attempt succeeds and the override is
  never applied, so we never force a mismatched-glibc build onto a release
  Playwright handles correctly (microsoft/playwright#35114).
- Zero-maintenance — new distro releases work the moment Playwright adds them.
- Covers Debian 14+ and future releases, not just Ubuntu 26.04.

An operator-set PLAYWRIGHT_HOST_PLATFORM_OVERRIDE is always respected (applied
to the first attempt; retry skipped). Non-x64/arm64 arches have no fallback
build and skip the retry.

Refs #35166
2026-06-28 02:05:18 -07:00
teknium1
64972b6403 fix(config): canonicalize model.name/model.model to model.default (#34500)
A custom_providers config that names the model under model.name (or
model.model) resolved to an empty model, so the API request went out
with model= — HTTP 400 from OpenAI-compatible backends. Display paths
(hermes status/dump) already read model.name and showed the model,
making the failure silent.

The model id was read via 'default or model' at ~14 independent sites
(cli, gateway, cron, curator, oneshot, fallback, profiles, ...), none
of which honored 'name'. Rather than patch every site, canonicalize at
the single load/save chokepoint: _normalize_root_model_keys() now
promotes model.model/model.name -> model.default (precedence
default > model > name) and drops the stale alias, so every reader —
present and future — sees a populated default and config.yaml is
migrated canonical on next save. The gateway, which bypasses
load_config(), replays the same normalization in _load_gateway_config().

Co-authored-by: Bartok9 <danielrpike9@gmail.com>

Credit: root-cause analysis and fix direction from @Bartok9 (#34502,
first) and @v86861062 (#34527).
2026-06-28 02:05:13 -07:00
kshitijk4poor
f64d15ccb7 fix(tui): defer buffered gateway events to stop dashboard chat #301 (#36658)
Dashboard /chat spawns the TUI attached to the dashboard's in-memory
gateway via HERMES_TUI_GATEWAY_URL. In that attach mode the already-running
gateway replays `gateway.ready` (and `session.info`) the instant the socket
connects, so those events land in GatewayClient.bufferedEvents *before* the
consumer's mount-time subscribe effect (useMainApp.ts) calls drain().

drain() then emitted the buffered events synchronously, so the
`gateway.ready` handler's patchUiState / setHistoryItems cascade ran while
React was still inside the first commit — tripping "Too many re-renders"
(Minified React error #301) and breaking Dashboard chat after `hermes update`.
Spawn / inline / sidecar modes never hit this: their `gateway.ready` only
arrives after the Python child boots, on a later async tick.

Fix: drain() defers the replay to the next microtask AND keeps `subscribed`
false until that microtask runs. Keeping `subscribed` false in the gap means
any live event arriving before the flush keeps buffering (publish() pushes
when !subscribed) instead of emitting synchronously and jumping ahead of the
chronologically-earlier replayed events — the flush re-drains the buffer
right after flipping `subscribed`, preserving FIFO order. A drainGeneration
token (bumped in resetStartupState) makes a queued flush a no-op if the
transport was reset/killed in the meantime, avoiding use-after-teardown and
duplicate/reordered exits.

Regression tests: (1) drain() does not dispatch buffered events synchronously;
(2) a live event arriving in the post-drain / pre-microtask window still
delivers BEHIND the earlier-buffered event (FIFO). Both are red against the
old synchronous behavior, green with this fix. Same class of fix as #44528.

Closes #36658
2026-06-28 14:18:47 +05:30
Teknium
2ecb6f7fe6 fix(telegram): clear send_path_degraded on successful reconnect (#35205) (#54076)
* fix(telegram): clear send_path_degraded on successful reconnect

_send_path_degraded was cleared only in _verify_polling_after_reconnect,
60s after reconnect and only if scheduled. A clean start_polling() reconnect
left the flag stuck True, short-circuiting send() and blocking all outbound
messages until the deferred probe ran (or forever if it never did).

Clear the flag the moment start_polling() succeeds — that is the recovery
signal. The deferred probe remains a defensive re-check that re-enters the
reconnect ladder (re-setting the flag) if it detects a silent wedge.

Fixes #35205.

* docs: add infographic for #35205 telegram send-path fix
2026-06-28 01:38:17 -07:00
Teknium
674e16e7c6 fix(redact): stop DB-connstr redaction from corrupting code output (#33801) (#54061)
Secret redaction is display/output-scoped on main — write_file writes
content verbatim, terminal/execute_code redact only output not the
command/source. The real bug is in displayed tool OUTPUT (read_file,
terminal, execute_code):

_DB_CONNSTR_RE's password group [^@]+ was greedy across newlines, so on a
multi-line block it scanned past the DSN line to the next stray '@' (a
Python @decorator), replacing every intervening character — including line
breaks — with ***. That dropped lines and concatenated the next line onto
the f-string line, making read_file output look corrupted (the file on disk
was always correct). Reported in #33801.

Fix:
- Forbid whitespace in the userinfo/password groups ([^:\s]+ / [^@\s]+) so
  the match can never span a line break. A real DSN password never contains
  whitespace. This alone kills the catastrophic line-dropping.
- Under code_file=True, preserve a password group that is a pure {...} brace
  expression — f"postgresql://{user}:{pass}@{host}" is an f-string template,
  not a live credential. Literal passwords are still masked.
- Pass code_file=True at the terminal and execute_code output redaction call
  sites (file_tools already did) so code-execution output isn't corrupted by
  ENV/JSON/template false positives. Real prefixes, auth headers, JWTs, and
  private keys are still redacted.

Verified E2E against the reporter's exact pydantic-settings module: file
written verbatim, read_file shows the DSN f-string + @model_validator intact
with zero *** corruption, while a literal postgresql://admin:pw@host DSN and
a real sk- key are still masked.

Reported-by: koishi70
Reported-by: pfrenssen
2026-06-28 01:15:39 -07:00
Teknium
de6e9ac760 docs(discord): document bot-to-bot comms as unsupported (#32791) (#54063)
* docs(discord): document bot-to-bot comms as unsupported (#32791)

Multi-profile bot-to-bot conversation is not a supported topology.
DISCORD_ALLOW_BOTS=none (the default) blocks all bot-originated
messages; setting mentions/all across multiple Hermes profiles to make
them reply to each other ack-loops because Discord's reply auto-mention
satisfies the mention gate every turn. Document the safe default and
the loop hazard so operators don't wire it up.

* docs(discord): infographic for bot-to-bot unsupported stance (#32791)
2026-06-28 01:15:34 -07:00
teknium1
4f16950e9a docs: add infographic for #32421 content-filter fallback fix 2026-06-28 01:15:21 -07:00
teknium1
578e3989d4 fix(agent): route content-filter stream stalls to fallback chain (#32421)
When a provider's output-layer safety filter (MiniMax "output new_sensitive
(1027)", Azure content_filter, etc.) kills a streaming response after deltas
were already sent, interruptible_streaming_api_call swallows the raw error
into a finish_reason=length partial-stream stub. The conversation loop then
burned 3 continuation retries against the SAME primary — re-hitting the
content-deterministic filter every time — and gave up with "Response remained
truncated after 3 continuation attempts", never consulting fallback_providers.

Builds on @595650661's classifier change (cherry-picked) so error_classifier
recognizes the filter; then:
- chat_completion_helpers: run the swallowed error through error_classifier at
  the stub-creation point and stamp _content_filter_terminated on the stub
  (single source of truth — no parallel pattern list).
- conversation_loop: read the tag and activate the fallback chain BEFORE
  burning any continuation retries; roll partial content back to the last
  clean turn and re-issue against the new provider (restart_with_rebuilt_messages).
  Plain network stalls are unaffected (only content_policy_blocked is tagged).

Credits #32479 (@sweetcornna) and #33845 (@Tranquil-Flow) which fixed the
same issue via the stub-tag and loop-escalation approaches respectively.

Live E2E confirmed: before, _try_activate_fallback called 0x; after, fallback
fires on the first stub and the fallback provider completes the turn.
2026-06-28 01:15:21 -07:00
595650661
b8e2268628 fix(agent): add MiniMax 'new_sensitive' to content_policy_blocked patterns
The MiniMax output-layer safety filter surfaces the error verbatim as
`output new_sensitive (1027)` (sometimes with additional provider
wrapping like 'Stream stalled mid tool-call: output new_sensitive (1027)').
When the model emits a large tool-call argument block, the upstream
filter trips and the SSE stream is truncated mid-flight, producing
'stream stalled mid tool-call' errors. Until now this case was
misclassified and retried 3x on the same provider, reproducing the same
refusal and burning paid attempts.

Adding `new_sensitive` to `_CONTENT_POLICY_BLOCKED_PATTERNS` routes
it through the existing is_client_error path: skip 3x retry, activate
configured fallback model immediately, surface a clear provider-safety
message to the user.

Refs #32421
2026-06-28 01:15:21 -07:00
Teknium
c9df4bc094 fix(gateway): default restart_drain_timeout to 0 to kill systemd crash loop (#54066)
A restart now interrupts in-flight agents immediately rather than holding
the gateway open for a grace window. The previous 180s default coupled two
independently-set timers: the gateway's own drain timer and systemd's
TimeoutStopSec. On a stale unit where TimeoutStopSec < drain, systemd
SIGKILLed the gateway mid-cleanup, leaving a stale lock that made the next
startup exit immediately ('already running') — an infinite crash loop under
Restart=on-failure (#31981).

Setting drain to 0 makes the mismatch structurally impossible: with drain 0
the generated unit gets TimeoutStopSec=90 against a near-instant drain, so
systemd never kills mid-cleanup. Contract: restart the gateway, in-flight
work stops. A grace window large enough to 'save' a long agent turn would
have to outlast an unbounded task, which is impossible.

Also fixes the stale-unit warning's suggested command
(hermes gateway service install --replace -> hermes gateway install --force);
the former subcommand does not exist.

Closes #31981
2026-06-28 01:14:34 -07:00
teknium1
0800f1c28b infographic: whatsapp send-queue serialization (#33360) 2026-06-28 01:10:14 -07:00
teknium1
cb9f855c2b test(whatsapp-bridge): drop structural send-queue integration test
The .integration.test.mjs greps bridge.js source text for the queue
wiring — a change-detector that breaks on any benign refactor of the
same code. The behavioral unit test (bridge.sendqueue.test.mjs) already
covers FIFO ordering, error isolation, timeout propagation, and
single-consumer concurrency, which is the contract that matters.
2026-06-28 01:10:14 -07:00
Tranquil-Flow
c393a8e55f fix(whatsapp-bridge): serialize sendMessage to prevent cross-chat contamination (#33360)
Concurrent sock.sendMessage() calls on a single Baileys socket can cause
the WhatsApp protocol-level routing to misdeliver messages — responses
intended for one chat appear in another.

Add a promise-based send queue that serialises all sendMessage() calls
across concurrent HTTP /send, /edit, and /send-media handlers so only
one send is in-flight at a time.

Includes unit tests for queue ordering, error isolation, timeout
propagation, and single-consumer concurrency semantics, plus an
integration check that the queue is wired into sendWithTimeout.
2026-06-28 01:10:14 -07:00
teknium1
1f72ad9be9 refactor(cli): extract interrupt recovery to a testable helper
Pull the #33271 post-interrupt recovery (flush_stdin + _force_full_redraw)
out of process_loop's finally block into _recover_terminal_after_interrupt(),
and replace the inline-logic-copy tests with ones that exercise the real
helper plus a source guard that process_loop still invokes it behind the
_last_turn_interrupted gate.
2026-06-28 01:08:09 -07:00
zccyman
f3aaba7f85 fix(cli): recover terminal state after interrupt to prevent raw control sequence freeze
When the agent is interrupted during processing, prompt_toolkit's
renderer and VT100 input parser can be left in an inconsistent state.
CSI 6n cursor position report responses leak as literal text
(^[[19;1R) and the terminal stops accepting keyboard input.

Fix: in process_loop's finally block, after an interrupted turn:
- flush_stdin() to drain stray escape bytes from the OS input buffer
- _force_full_redraw() to reset prompt_toolkit's renderer cache

Closes #33271
2026-06-28 01:08:09 -07:00
teknium1
2e1b48ed31 chore: map kurlyk local email → skabartem for PR #32867 salvage 2026-06-28 01:08:04 -07:00
kurlyk
def97bcd96 fix: eliminate race condition in OpenAI client replacement
Make check-and-replace atomic in _ensure_primary_openai_client by
keeping both operations under the same lock acquisition. Previously,
the lock was released between detecting a closed client and replacing
it, allowing two threads to simultaneously replace the client.

Fixes #32846

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-28 01:08:04 -07:00
teknium1
4a0fe4e54a docs: add PR infographic for #32762 clarify-expiry fix 2026-06-28 01:07:53 -07:00
teknium1
aacc15b2c9 fix(clarify): raise default clarify_timeout to 3600s (#32762)
The 600s default evicted the gateway clarify entry while users were
still away (meeting/AFK); a later button tap then landed on a dead
entry and the agent hung on 'running: clarify'. Raise the default to
1h in DEFAULT_CONFIG and the get_clarify_timeout() code-level fallback,
documenting the running-agent-guard tradeoff. User overrides still win.
2026-06-28 01:07:53 -07:00
konsisumer
3f543229f2 fix(telegram): notify user when clarify button tap arrives after expiry 2026-06-28 01:07:53 -07:00
Teknium
90d25adc9e fix(gateway): deliver profile-scoped cache media on symlinked HERMES_HOME (#54060)
Generated images under a profile gateway's cache (profiles/<name>/cache/
images/...) were silently dropped from Telegram/Discord delivery when
HERMES_HOME is symlinked under a denied prefix (e.g. /opt/data ->
/root/.hermes) and $HOME is not that prefix. The resolved path lands
under /root (a system denylist prefix), the root-home exception only
fires when the denied prefix IS $HOME, and the static safe-roots list
only covers the active HERMES_HOME's top-level cache — not per-profile
cache dirs. Both gates fail, so validate_media_delivery_path returns
None and the gateway logs 'Skipping unsafe MEDIA directive path'.

_media_delivery_allowed_roots() now also enumerates per-profile cache
roots (<root>/profiles/*/cache/{images,audio,videos,documents,
screenshots}) at check time. Allowlist match runs before the denylist,
so the profile artifact delivers regardless of the /root interaction;
profile-dir credentials (auth.json) stay blocked since they aren't
under a cache subdir.

Reopened regression of #34485/#38108, neither of which covered the
profile-scoped symlink case. Fixes #31733.
2026-06-28 01:07:28 -07:00
sweetcornna
2701ea2f0c fix(agent): reopen fallback chain after primary recovery 2026-06-28 00:57:42 -07:00
teknium1
7b9ff310b6 fix: salvage #33830 for current main — relocate allow_bots bridge to telegram plugin hook, fix stale adapter import in test 2026-06-28 00:57:03 -07:00
sweetcornna
fc70d023d8 fix(telegram): apply bot auth policy to Telegram sources
# Conflicts:
#	gateway/config.py
2026-06-28 00:57:03 -07:00
sweetcornna
002357a83f fix(tui): repump stdin after readable handler errors 2026-06-28 00:53:29 -07:00
teknium1
3a03d03bdc docs: add infographic for #30636 macOS state.db fix 2026-06-28 00:53:19 -07:00
teknium1
52d774f0f9 fix(state): F_FULLFSYNC barrier at WAL checkpoints on macOS (#30636)
On Darwin, synchronous=FULL (the WAL default) only issues a plain
fsync(), which Apple documents does NOT guarantee writes reach stable
storage or stay ordered. SQLite's WAL corruption-safety guarantee
assumes the OS honors the fsync barrier; macOS does not unless the app
uses F_FULLFSYNC. During a launchd *system* shutdown the page cache is
dropped (effectively power-loss for in-flight pages), so a WAL
checkpoint whose fsync 'reported' durable may never hit the platter —
corrupting state.db with a malformed image. That is the trigger in
#30636 ('SIGTERM during launchd shutdown under high load').

Apply PRAGMA checkpoint_fullfsync=1 (macOS-guarded) in
apply_wal_with_fallback. It forces the F_FULLFSYNC barrier only at
checkpoint boundaries (where WAL frames land in the main DB), so cost
amortizes to ~+0.1ms/commit vs ~+4ms for the broader fullfsync=1.
No-op off Darwin (F_FULLFSYNC is macOS-only).

Root-cause analysis by @catapreta on #30636. Supersedes #30654, whose
synchronous=FULL is a no-op (already FULL in WAL mode) and whose
TRUNCATE-on-close is already on main.

Co-authored-by: catapreta <catapreta@users.noreply.github.com>
2026-06-28 00:53:19 -07:00
Gille
9229d0db17 fix(moa): preserve Nous provider identity for references 2026-06-28 00:47:15 -07:00
Teknium
7c38249c79 feat(moa): references see full tool state + fire on every user/tool response (#54016)
The advisory reference view stripped all tool calls and tool results, so
reference models judged a task whose actions and results they never saw — and
references only fired once per user turn, never re-running as the agent's
state advanced through the tool loop.

Two fixes:
- _reference_messages() now PRESERVES the agent's tool calls and tool results,
  rendering them inline as text ([called tool: ...] / [tool result: ...]) so a
  reference gives an informed judgement on the real current state. Still emits
  zero tool-role messages and zero tool_calls arrays (strict providers reject
  those), and large tool results are previewed head+tail (4000-char budget).
  The required end-on-user shape is met by APPENDING a synthetic advisory user
  turn — not by deleting the agent's latest context (which the prior fix did).
- References now re-run on every state change — each new user message AND each
  new tool result — instead of once per user turn. The state-sensitive advisory
  signature drives the cache: new tool result = miss (re-run), identical-state
  re-call = hit (no re-run, no re-emit).

The acting aggregator still receives the full, untrimmed transcript.
2026-06-28 00:30:11 -07:00
kshitijk4poor
fc7a01b6cb test+harden: modernize salvaged Matrix path for current plugin layout
Two follow-ups on top of the salvaged #46365 fix:

1. Tests: the salvaged tests injected the ephemeral MatrixAdapter via
   sys.modules["gateway.platforms.matrix"], but Matrix migrated to a plugin
   (#41112) and the fallback now imports from plugins.platforms.matrix.adapter.
   Point the three sys.modules patches at the current module path so the
   ephemeral-fallback tests actually exercise the injected fake adapter.

2. Harden the live-adapter lookup: split the gateway import guard from the
   adapter lookup and log (instead of silently swallowing) when a runner
   exists but adapters.get() raises. A silent fall-through there would
   re-introduce the per-send reconnect/OTK-exhaustion storm this fix exists
   to prevent (#46310). Documented that the live adapter is gateway-owned and
   must not be disconnected, and why the ephemeral finally never touches it.
2026-06-28 12:48:08 +05:30
liuhao1024
a7fd62d824 fix(send_message): reuse live gateway adapter for Matrix media sends
When a live gateway adapter is available (i.e. the tool runs inside a
running gateway), reuse the persistent connection instead of creating a
new MatrixAdapter per call. This eliminates per-message E2EE re-init
storms that exhaust recipient OTKs and silently drop messages.

The fix follows the same pattern as _send_to_platform (line 618):
gateway_runner_ref → runner.adapters[Platform.MATRIX]. Falls back to
the ephemeral connect/disconnect cycle for standalone contexts.

Also extracts the shared send logic into _send_via_matrix_adapter()
to avoid duplicating the media dispatch code between the two paths.

Fixes #46310
2026-06-28 12:48:08 +05:30
Ben Barclay
1466eab4ee test(docker): wait for cont-init to finish before privilege-drop shim tests (#54026)
The docker-exec privilege-drop shim tests started a sleep container and
released the fixture as soon as `docker exec <c> true` returned 0. On
s6-overlay that succeeds almost immediately — ~0.05s in measurement —
long before the `01-hermes-setup` cont-init hook (docker/stage2-hook.sh)
has finished seeding + `chown hermes:hermes` config.yaml and running the
Python config migration (cont-init only fully settles at ~9.8s under
arm64 QEMU emulation).

`test_shim_opt_out_keeps_root` wipes config.yaml, writes it as root with
HERMES_DOCKER_EXEC_AS_ROOT=1, and asserts root:root ownership. When the
fixture released the test inside that ~10s window, stage2-hook's
boot-time `chown hermes:hermes config.yaml` raced the root-written file
and reset it to hermes:hermes — failing the assertion. The window is
invisible on native amd64 (stage2-hook completes in a blink) but wide
open under the arm64 build's QEMU emulation, which is why only build-arm64
flaked while build-amd64 stayed green.

Replace the responsiveness poll with a wait on the canonical
'cont-init finished' signal: $HERMES_HOME/logs/container-boot.log gaining
a `profile=default` line, written by 02-reconcile-profiles which s6 runs
strictly after 01-hermes-setup. Mirrors the readiness pattern already
used in test_container_restart.py. Also bumps the readiness timeout 20s->60s
to cover slow emulation.

No production code change — test-only hardening of a timing race.
2026-06-28 17:06:26 +10:00
Jeffrey Quesnelle
2c9b017696 Merge pull request #54000 from NousResearch/fix/desktop-main-cjs-clobber-stage-simple-git
fix(desktop): stop hermes desktop from clobbering tracked main.cjs
2026-06-28 01:56:51 -04:00
Teknium
4f61d48aef test(cron): deterministically wait for ticker, fix wall-clock flake (#54010)
tests/cron/test_scheduler_provider.py spawned a background ticker thread,
slept a fixed 0.2s, then asserted the loop had called tick()/heartbeat() at
least N times. Under loaded CI the worker thread isn't always scheduled
within that window, so the loop hadn't ticked yet — flaking with 'provider
never called tick()' (assert 0 >= 1).

Add a _wait_until(predicate, timeout) helper and replace all five fixed
time.sleep(0.2) sites with a poll on the actual predicate (calls/beats count
reached). Same contract assertions, no wall-clock dependence.
2026-06-27 22:52:29 -07:00
Teknium
1fa44180b0 fix(moa): advisory references end on a user turn + get a reference-role system prompt (#54007)
* fix(moa): reference advisory view must end with a user turn

MoA reference calls failed with Anthropic models that don't support
assistant prefill (e.g. Claude Opus 4.8): '400 ... must end with a user
message'. The advisory view built by _reference_messages() kept the last
assistant turn's text while dropping the following tool result, leaving a
trailing assistant turn — which Anthropic (and OpenRouter->Anthropic)
interpret as an assistant prefill to continue. References are advisory and
must end on the user turn they answer.

Strip trailing assistant turns from the advisory view (preserving
intervening ones). Update the existing test that encoded the buggy shape
and add a mid-tool-loop regression test.

* feat(moa): give reference models an advisory-role system prompt

Reference models received the bare trimmed conversation with no role
framing, so they assumed they were the acting agent and refused ("I can't
access repositories/URLs from here") or tried to call tools they don't have.

Prepend a dedicated advisory system prompt to every reference call: the
model is an analyst, not the actor — it cannot execute, should not
apologize for lacking tools, and should reason about the presented state to
advise the aggregator/orchestrator on approach, next steps, tool-use
strategy, risks, and anything the acting agent missed. Its output is private
guidance for the aggregator, not a user-facing answer.
2026-06-27 22:52:25 -07:00
Teknium
2523917680 fix(tests): bare pytest flags pass through run_tests.sh without a '--' separator (#54008)
The parallel runner only forwarded pytest args after a literal '--', so a
bare 'scripts/run_tests.sh tests/foo.py -q' (or -v/-x/-k/--tb=long) errored
out with 'unrecognized arguments'. This contradicted the docstring's
promise that common pytest flags pass through, and forced a retry on every
run that used pytest muscle-memory.

Now any token starting with '-' that isn't one of the runner's own options
(-j/--jobs, --paths, --slice, --file-timeout, --generate-slices, --files,
--include-integration) is routed to each per-file pytest invocation
automatically. Value-taking flags given space-separated (-k expr, -m mark,
-p plugin, -o name=val, etc.) keep their value instead of having it stolen
by positional-path discovery. The explicit '--' separator still works and
stacks with bare flags.

- scripts/run_tests_parallel.py: argv splitter routes bare unknown flags to
  pytest; value-flag lookahead; updated docstring.
- scripts/run_tests.sh: usage comment reflects bare-flag passthrough.
- tests/test_run_tests_parallel.py: 4 behavior-contract tests (bare -q runs,
  -k keeps its value/filters, '--' still works, positional path stays a root).
2026-06-27 22:43:26 -07:00
emozilla
2d206a3a42 fix(desktop): stop hermes desktop from clobbering tracked main.cjs (#52735)
`npm run build` ended with `bundle-electron-main.mjs`, which esbuild-bundled
electron/main.cjs and renamed the bundle on top of the tracked source file.
Because every `hermes desktop` runs `npm run build`, each launch rewrote a
checked-in source file (~7.5k-line source -> ~14.8k-line bundle), dirtying the
working tree with a build artifact that `git restore` couldn't keep (the next
launch re-clobbered it) and forcing autostash/restore conflicts on update.

The bundle only existed to inline `simple-git` so the packaged app.asar (which
ships no node_modules) wouldn't crash at launch with "Cannot find module
'simple-git'". Replace it with the mechanism the repo already uses for the
other hoisted runtime dep (node-pty): stage the dependency closure and resolve
it from process.resourcesPath at runtime.

- stage-native-deps.cjs: resolve simple-git's runtime closure (walking
  dependencies + optionalDependencies, so a version bump that adds a transitive
  dep can't silently reintroduce the crash) and stage it under
  build/native-deps/vendor/node_modules/. The `vendor/` nesting is load-bearing:
  electron-builder drops a node_modules dir at the ROOT of an extraResources
  copy but keeps a nested one.
- git-review-ops.cjs: fall back to the staged
  native-deps/vendor/node_modules/simple-git when the hoisted require() fails;
  dev runs resolve the hoisted copy and never hit the fallback.
- package.json: drop the bundler from the `build` script so main.cjs is never a
  build target again.
- nix/desktop.nix: drop the direct bundler call (the closure rides the existing
  `cp -rn native-deps` into $out) and patch process.resourcesPath in
  git-review-ops.cjs alongside main.cjs.
- delete scripts/bundle-electron-main.mjs.

Verified: electron-builder's own file filter keeps the full staged closure
(0 dropped), and a packaged win-unpacked build launches with the git-review
pane resolving simple-git from the staged vendor path.
2026-06-28 01:30:09 -04:00
teknium1
c918d42d88 feat(desktop): config-driven Electron launch flags + GPU policy
Adds a desktop: section to config.yaml so headless/VM users can make
`hermes desktop` launch correctly without a wrapper command:

- desktop.electron_flags: extra Electron CLI flags (e.g. --ozone-platform=x11)
  appended to every launch. Accepts a list or a shell-split string.
- desktop.disable_gpu: auto|true|false, bridged to the HERMES_DESKTOP_DISABLE_GPU
  env var the Electron app already reads. An explicit env var still wins.

cmd_gui() reads these via _desktop_launch_options() and applies them. This is
the config.yaml form of the capability proposed as a raw env var in #38934
(@1RB) — behavioral settings belong in config.yaml, not a new HERMES_* env var.

Co-authored-by: ray <86501179+1RB@users.noreply.github.com>
2026-06-27 22:26:43 -07:00
Teknium
1b70a91844 docs: third-party-product plugins ship standalone, not into core tree (#54001)
* docs: third-party-product plugins ship standalone, not into core tree

Generalizes the closed-set memory-provider policy to any plugin that
integrates someone else's product/project (observability backends,
vendor SaaS, analytics dashboards, paid-service tie-ins). These create
an open-ended maintenance burden on us for backends we don't own, so
they ship as standalone plugin repos installed into ~/.hermes/plugins/
and are promoted in #plugins-skills-and-skins — not merged into core.

- AGENTS.md: new 'what we don't want' bullet + generalized policy note
  beside the memory-provider closed-set rule
- CONTRIBUTING.md: new 'Third-Party Product Integrations' section
- build-a-hermes-plugin.md: caution callout at the top of the guide

It's a coupling decision, not a quality bar — a plugin can clear review
and still be a close.

* docs: add infographic for standalone-plugin policy
2026-06-27 22:23:50 -07:00
Rafael Millan
54ea059919 fix: fall back to no-sandbox for desktop launch on restricted Linux hosts 2026-06-27 22:16:20 -07:00
teknium1
97640fd9ad fix(desktop): reserve WCO width on plain Linux + author map
The plain-Linux overlay re-enable (#53185) left nativeOverlayWidth() at 0
for plain Linux, so the native min/max/close buttons painted on top of the
app's right-edge titlebar tools. Reserve the fallback width everywhere the
WCO overlay is painted (Windows, WSLg, plain Linux); macOS still reserves 0
since it uses traffic lights.
2026-06-27 22:05:33 -07:00
Chris Wesley
8194dbf612 fix(desktop): re-enable titleBarOverlay on plain Linux
Commit da5484b61 disabled the Window Controls Overlay on all Linux
(non-Windows, non-WSL) with the note that WCO is a Windows/macOS-only
Electron feature. However, several Linux compositors (KDE/KWin,
GNOME/Mutter) do support it — plain Electron titleBarOverlay paints
native min/max/close buttons that were working before that change.

Narrow the exclusion to only WSLg, where the RDP host draws its own
window controls and an Electron overlay would leave a dead gap.

Fixes: da5484b61 ("fix(desktop): WSL2 clipboard image paste + Linux titlebar overlay")
2026-06-27 22:05:33 -07:00
teknium1
9c7f9f9502 infographic: partial-stream recovery fix (salvage #41498) 2026-06-27 22:03:14 -07:00
infinitycrew39
1fa46570fb test(agent,gateway): cover partial-stream recovery and restart helper salvage 2026-06-27 22:03:14 -07:00
infinitycrew39
e860a40e14 fix(agent,gateway): surface partial-stream recovery and bound detached restart
Salvage of NousResearch/hermes-agent#41498 (0-CYBERDYNE-SYSTEMS-0).

- Leave response_previewed false on partial_stream_recovery so gateway
  fallback delivery can send the recovered fragment plus explanation.
- Always append the turn-completion explainer for partial_stream_recovery,
  not only for empty or very short fragments (#34452 gap).
- Launch the detached /restart helper before drain, idempotently, with a
  bounded wait of restart_drain_timeout + 5s.
2026-06-27 22:03:14 -07:00
Teknium
e3c9924b8b fix(cli): correct stale hermes auth login nous hints to hermes auth add nous (#53929)
* fix(cli): correct stale `hermes auth login nous` hints to `hermes auth add nous`

There is no `hermes auth login` subcommand — valid auth verbs are
add/list/remove/reset/status/logout/spotify. Six user-facing strings told
users to run `hermes auth login nous`, which fails with
`invalid choice: 'login'` — the same broken-hint class reported in #28089
for the proxy flow (already fixed there to `hermes auth add nous`).

Sites corrected to `hermes auth add nous`:
- hermes_cli/dashboard_register.py (401 retry hint, not-logged-in hint)
- hermes_cli/gateway_enroll.py (401 retry hint, not-logged-in hint)
- cli-config.yaml.example (two provider-requirement comments)

* docs(infographic): auth login nous hint fix
2026-06-27 21:30:37 -07:00
Teknium
4626ceb747 fix(gateway): only offer system-scope gateway install to root sessions (#53975)
Non-root users picking 'System service' in the setup wizard were handed a
'sudo hermes gateway install --system --run-as-user <you>' recipe that fails
on most distros: sudo's secure_path strips ~/.local/bin (pipx/uv installs),
so 'sudo hermes' is command-not-found. Worse, it funnels a non-root user
toward a system install they shouldn't be doing from a user session.

Now prompt_linux_gateway_install_scope() only offers system scope when
os.geteuid()==0. Non-root sessions get user-service or skip, with a tip to
re-run as root for a boot service. The non-root branch in
install_linux_gateway_from_setup becomes a defensive guard that refuses
without printing any self-elevation recipe. Gated the matching deferral hint
in setup.py behind root too.
2026-06-27 21:24:08 -07:00
teknium1
b304023fc6 docs(infographic): model picker fixes (#49129 + #51488) 2026-06-27 21:23:25 -07:00
teknium1
c72d68715f chore(release): map salvaged contributor emails for #49129 and #51488 2026-06-27 21:23:25 -07:00
Priyanshu Sharma
f6deabca0d fix(gateway): clear stale base_url on model switches 2026-06-27 21:23:25 -07:00
teknium1
f54c52800a fix(models): scope live-first picker merge to opencode aggregators only
Follow-up to the salvaged #49129 commit. The original change flipped the
shared generic-provider merge in provider_model_ids() to live-first
unconditionally, which regressed curated-first for single providers
(kimi/zai, #46309) — and the PR encoded that regression by flipping the
kimi-coding and zai test assertions to expect live-first.

Gate live-first on an explicit _LIVE_FIRST_PICKER_PROVIDERS set
({opencode-zen, opencode-go}); every other provider keeps curated-first.
Also widen the uncapped picker + live-first sets to opencode-go, which has
the same 70+ model catalog problem as opencode-zen. Restore the
kimi-coding curated-first test and rewrite the merge-order test to assert
the per-provider contract.
2026-06-27 21:23:25 -07:00
Afnath Ahamed
f98ffbc246 fix(models): live-first merge + update opencode-zen catalog + uncap aggregator picker 2026-06-27 21:23:25 -07:00
teknium1
2e7e600eaa chore(release): map HexLab98 author for PR #53863 salvage 2026-06-27 21:22:49 -07:00
HexLab98
04ff4d9b54 test(auxiliary): cover env-only proxy policy for auxiliary clients (#53702) 2026-06-27 21:22:49 -07:00
HexLab98
073847c0f2 fix(auxiliary): use env-only proxy policy for OpenAI SDK clients (#53702)
Auxiliary clients now inject a keepalive httpx transport with explicit
HTTPS_PROXY/NO_PROXY resolution, matching the main agent. This avoids
macOS system proxy settings (which omit the ExceptionsList) breaking
vision and other auxiliary calls to internal provider endpoints.
2026-06-27 21:22:49 -07:00
Teknium
3b23a984b5 feat(kanban): stamp handoff freshness so workers don't read stale state as current (#53973)
Multi-agent boards leak staleness: a sibling worker's parent handoff,
comment, or prior-attempt summary gets read by the next worker as live
truth even when it's a day old. build_worker_context surfaced the text
with (at best) a bare absolute timestamp, which an LLM reads as fact
regardless of age — parent results had no timestamp at all.

Adds a coarse relative-age stamp (just now / 18h ago / 3d ago) to every
recalled-state line and a one-line 'point-in-time snapshot, re-verify
against source' frame on the parent-results section, so the worker sees
when handoffs were produced and re-checks stale ones before acting.
2026-06-27 21:21:54 -07:00
Teknium
131c9c542c test(tui-gateway): stop deferred-resume build thread leaking into next test
test_session_resume_uses_parent_lineage_for_display resumes via the
deferred (non-eager) path, which fires a 50ms background Timer
(_schedule_agent_build) calling whatever server._make_agent is patched
in at that moment. The timer outlived the test and landed in the next
test's (_follows_compression_tip) _make_agent mock, racily setting
agent_session_id='tip' and flaking 'assert tip == cont_tip' on CI.

Root-cause fix: stub _schedule_agent_build to a no-op in the leaking
test (it only asserts display history). Defense in depth: the victim's
fake_make_agent now setdefault()s so a stray late build can't overwrite
the synchronous eager build's captured id.
2026-06-27 21:07:53 -07:00
Teknium
e418605450 test(24996): freeze monotonic clock to de-flake fallback cooldown timing
The exhaustion-cooldown timing assertions relied on a wall-clock budget
(before + window + 1.0s). On loaded CI runners the activation calls could
exceed the 1s slack, flaking 'Run tests slice 4/8'. Freeze
chat_completion_helpers.time.monotonic so the cooldown math is exact and
load-independent across all four tests.
2026-06-27 21:07:53 -07:00
teknium1
1ad8b44413 docs(infographic): skill sync external_dirs shadow fix 2026-06-27 21:07:53 -07:00
zccyman
db11849c9d fix(skills): skip shadowing when external_dirs provides the skill
Fixes #28126. sync_skills() was unconditionally writing bundled skills
into the local <profile_home>/skills/ tree even when the profile's
config.yaml delegated skill resolution to an external directory
via skills.external_dirs. The skill loader then saw two candidates
for the same name (local shadow + external canonical), refused to
resolve on collision, and every worker that auto-loaded such a skill
crashed with 'Unknown skill(s): <name>'.

Changes:
- _build_external_skill_index() indexes skills available in external
  dirs (by directory name and frontmatter name)
- sync_skills() skips writing a bundled skill when it finds the same
  name in the external index; records the hash in the manifest so
  subsequent syncs treat it as already handled
- Self-healing: removes stale local shadows left by prior buggy syncs
  (only when origin_hash == bundled_hash == user_hash, i.e. we wrote
  it and user didn't touch it)
- New 'shadowed_by_external' key in sync_skills() return dict

3 new tests in TestExternalDirsIndexing (all passing).
All 48 tests in test_skills_sync.py pass.

Closes #28126
2026-06-27 21:07:53 -07:00
Teknium
a8c862900b fix(tui): sanitize replay history on WebUI/TUI session resume (#29086) (#53939)
A WebUI/TUI session whose last turn died mid-tool-loop (stale-timeout kill,
interrupt, or process restart before the tool result was written) persists a
dangling assistant(tool_calls) or interrupted assistant->tool tail. The
messaging gateway already strips these tails before replay (the #49201 fix),
but the TUI/WebUI resume path fed db.get_messages_as_conversation() straight
in as the agent's conversation_history with no cleanup. The model re-issued
the unanswered call on every resume -- including after a full WebUI + Gateway
restart, since the poison lives in the SessionDB, not memory -- leaving the
session permanently 'thinking'. Only deleting the session recovered it.

- Extract the two strippers + helper from gateway/run.py into a shared
  agent/replay_cleanup.py (sanitize_replay_history wraps both).
- gateway/run.py re-exports under the historical private names; messaging
  behavior unchanged.
- Both TUI cold-resume sites now sanitize the model-fed history while leaving
  the display transcript untouched, so the user still sees their full history.

Verified E2E against a real SessionDB: dangling and interrupted tails are
stripped from the model feed, healthy mid-progress tool sequences are
preserved, and the display transcript is always the full raw history.
2026-06-27 20:56:49 -07:00
Teknium
f03823014b fix(telegram): kill 409 polling conflict loop by disarming PTB retry synchronously (#53941)
Telegram polling entered a self-inflicted ~31s loop of 409 Conflict ->
retry -> resume -> Conflict. The error_callback PTB invokes synchronously
inside its internal network_retry_loop only scheduled our async recovery
task (loop.create_task) and returned, so PTB kept polling getUpdates on its
own while our handler concurrently ran stop -> sleep -> start_polling. The
two polling sessions overlapped and Telegram returned a fresh 409.

Fix: in the conflict branch of the error_callback, synchronously set PTB's
private polling stop_event before scheduling recovery. PTB's loop exits on
its next tick (it races that event in do_action), so our handler owns
polling alone. The handler's await updater.stop() drains the task and PTB
clears the event, so the subsequent start_polling() builds a fresh event
and is not poisoned.

Keeps the existing reconnect ladder intact (option B) — fixes only the
race. Defensive: probes mangled + unmangled stop_event spellings and no-ops
(prior behaviour) if neither exists; never flips _running, which would make
the handler skip stop() and leave the loop wedged.
2026-06-27 20:46:08 -07:00
Teknium
d43e0cf304 fix(agent): config-driven intent-ack continuation for all api_modes (#27881) (#53943)
* fix(agent): config-driven intent-ack continuation for all api_modes (#27881)

The agent could end a turn after only stating intent ('I will run a health
check...') without executing the announced tool call, forcing the user to
re-prompt. A continuation guard that catches this and nudges the model to
proceed already existed but was hard-gated to the codex_responses api_mode,
so Gemini/Claude/OpenRouter turns never benefited.

- New agent.intent_ack_continuation config (default 'auto' = codex-only,
  byte-stable for existing conversations). 'true'/model-list opts every
  api_mode in; 'false' disables. Mirrors agent.tool_use_enforcement's shape.
- looks_like_codex_intermediate_ack gains require_workspace (default True).
  The opted-in path drops the codebase/filesystem requirement so general
  autonomous workflows (server ops, deploys, API calls) are caught, not just
  coding tasks. Future-ack + action-verb + short-content + no-prior-tool
  guards still apply; the 2-nudge-per-turn cap is unchanged.
- Resolution centralized in intent_ack_continuation_mode (off/codex_only/all).

* docs(infographic): intent-ack continuation (#27881)
2026-06-27 20:46:00 -07:00
Teknium
56abbaeac3 fix(curator): fail closed on unverified skill deletes during consolidation (#53935)
The curator's LLM consolidation pass could archive whole clusters of
active skills with zero verified consolidations (#29912): a bare prune
(skill_manage delete with absorbed_into empty/omitted) from the forked
review agent was accepted, removing the skill's name from lookup even
though counts.consolidated_this_run was 0.

- _delete_skill now fails closed during the curator/background-review
  pass: a delete is only allowed when it declares a verified
  consolidation (absorbed_into=<umbrella>, umbrella must exist). A prune
  with no forwarding target is refused; the skill stays active. The
  deterministic inactivity prune (archive_skill) is unaffected.
- A verified consolidation delete during the curator pass now routes
  through the recoverable archive primitive instead of shutil.rmtree, so
  a misjudged consolidation can be undone with hermes curator restore.
  The usage record is kept (state=archived) rather than forgotten.
- Foreground, user-directed deletes keep their existing hard-delete
  semantics.
2026-06-27 20:45:57 -07:00
konsisumer
11b0be8d15 fix(gateway): avoid Matrix pending invite boot loops 2026-06-27 20:45:51 -07:00
teknium1
a1ac6baac4 fix(gateway): make bg-process reset TTL configurable + surface session-scoped processes
Follow-up to the cherry-picked #29212 (#29177):

- Promote the 24h stale-process threshold to config.yaml
  (session_reset.bg_process_max_age_hours) instead of a hardcoded
  constant. 0 disables the cutoff (legacy: any live process blocks reset).
  Wired through GatewayConfig.default_reset_policy in gateway/run.py.
- Bug 2: process(action=list) now resolves the gateway session_key from
  the contextvar and surfaces session-scoped background processes (a
  forgotten preview server under a different task), flagged
  session_scoped — so the agent/user can discover and kill the blocker.
  Previously the task-scoped list returned [] and the blocker was invisible.
- Tests: config round-trip for the new field, cross-task list visibility.
- Docs: messaging session-reset section.
2026-06-27 20:45:43 -07:00
annguyenNous
33d8b66d5b fix: stale background processes no longer permanently block session reset
Background processes (e.g. http.server preview) that Hermes starts and
forgets about previously blocked session idle/daily reset indefinitely.
The reset guard in session.py checked has_active_for_session() with no
max age — a 3-day-old preview server blocked reset the same as a task
started 30 seconds ago.

Changes:
- Add max_active_age parameter to has_active_for_session() in
  process_registry.py. Processes older than this threshold are ignored.
- Add MAX_ACTIVE_PROCESS_AGE constant (24h / 86400s).
- Wire max_active_age into the gateway's session store callback in
  run.py so stale processes no longer block session lifecycle.
- Add debug logging when reset is skipped due to active processes.
- Add 3 tests covering recent, stale, and legacy (None) max age.

Fixes #29177
2026-06-27 20:45:43 -07:00
teknium1
8c8967a50b fix: defer hermes_subprocess_env import in browser_tool
The module-level import broke tests/tools/test_managed_browserbase_and_modal.py,
which loads browser_tool.py via spec_from_file_location against a stubbed
'tools' package that does not include tools.environments.local. Move the import
into a _build_browser_env() helper called at the two agent-browser spawn sites,
matching the lazy-import pattern already used by lazy_deps.py.
2026-06-27 20:45:31 -07:00
teknium1
9c6229ce24 fix(security): centralize credential-safe subprocess env (#29157)
Subprocesses spawned outside the terminal/execute_code path (agent-browser,
copilot ACP, dep-ensure, lazy_deps uv install, TUI Node host, cli.exec)
inherited the operator's full credential environment via os.environ.copy().
The terminal path was already scrubbed by _HERMES_PROVIDER_ENV_BLOCKLIST
(#1002/#1264/#32314); these spawn sites bypassed it.

Adds hermes_subprocess_env(inherit_credentials=) in tools/environments/local.py
reusing the existing dynamic blocklist as the single source of truth:

  - Tier 1 (_ALWAYS_STRIP_KEYS): gateway bot tokens, GitHub auth, infra
    secrets -- stripped even for credential-inheriting children.
  - Tier 2 (_HERMES_PROVIDER_ENV_BLOCKLIST): provider/tool keys -- stripped
    unless inherit_credentials=True. The opt-in is grep-able for audit.

Browser worker keeps a _BROWSER_PASSTHROUGH_KEYS allowlist (BROWSERBASE/
FIRECRAWL) re-added after the strip. Model-driving children (ACP, TUI Node
host, cli.exec) use inherit_credentials=True so they still get provider keys
while losing Tier-1 secrets. Installers (dep-ensure, lazy_deps) inherit
nothing sensitive. cua_backend already routed through _sanitize_subprocess_env
on main -- left as-is. Gateway adapter utility spawns (gh pr comment, ffmpeg)
are left inheriting env: gh needs GH_TOKEN by design, ffmpeg is a trusted
system binary -- no untrusted-dependency exposure.

This is defense-in-depth (personal-assistant trust model: same-user spawns),
making the existing scrub policy uniform across the spawn surface; the main
real payoff is shrinking the blast radius if a transitive npm dep in
agent-browser is compromised.

Reconstructed on current main from the design in #31959 (Tranquil-Flow);
also credits #39003 (rodboev), #37843 (coygeek), #35769 (egilewski).

Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
Co-authored-by: rodboev <rod.boev@gmail.com>
Co-authored-by: egilewski <egilewski@egilewski.com>
2026-06-27 20:45:31 -07:00
Hermes Agent
88b3d8638e test: de-flake SIGKILL-tree, compression-tip resume, and fallback-cooldown tests
Three CI flakes hit while landing the credential-pool restore fix; all three
were timing/wall-clock races in the tests, not product bugs (each passes
locally and the assertions are correct):

- test_entire_tree_is_sigkilled_not_just_parent: _terminate_host_pid SIGKILLs
  synchronously, but the test's 4s budget after a 1s in-function SIGTERM grace
  left almost no slack for the kernel to tear down 3 processes + reparent the
  children to zombies under loaded-CI scheduling. Widen the wait to 15s and
  make the liveness predicate tolerant of vanished-pid / zombie races. The
  assertion never weakens: every tree member must end up dead or zombie.

- test_session_resume_follows_compression_tip: appended messages got
  time.time() timestamps (~now) while the test forced session started_at into
  the past, so the get_compression_tip MAX(m.timestamp) tiebreaker depended on
  wall-clock ordering. Pass explicit, well-separated message timestamps so the
  chain resolution is deterministic by construction.

- test_non_retryable_exhaustion_arms_cooldown: asserted the short (5s)
  exhaustion cooldown with a tight +1.0s slack, which false-fails when
  wall-clock jitter between the 'before' snapshot and the cooldown computation
  exceeds a second on a loaded runner. Widen to +30s — still cleanly below the
  60s rate-limit window it must distinguish from.
2026-06-27 20:04:45 -07:00
Jack Maloney
f0de4c6a47 fix(pool): re-select from credential pool on primary runtime restore
_restore_primary_runtime restored the construction-time api_key snapshot and
never consulted the credential pool. After the pool rotated away from a
revoked/exhausted entry mid-session, every new turn restored the dead key,
re-failed instantly, burned the remaining entries, and fell through to
cross-provider fallback.

After restoring the snapshot, re-select the pool's current best entry and
swap the live credential in via _swap_credential (which already rebuilds the
OpenAI/Anthropic client, reapplies base-url headers, and carries the #33163
base_url / OAuth-detection fixes). Falls back to the snapshot key when the
pool is absent, empty, or the entry has no usable key.

Salvaged from #25206 onto current main: the original targeted the pre-refactor
monolithic method in run_agent.py; the logic now lives in
agent/agent_runtime_helpers.py and is collapsed onto _swap_credential instead
of re-inlining the client rebuild.

Fixes #25205
2026-06-27 20:04:45 -07:00
teknium1
a590c5efdc docs: add infographic for provider-precedence fix (#29285) 2026-06-27 19:49:02 -07:00
kshitijk4poor
2af1678bfc fix(auth): explicit provider intent beats stale OAuth active_provider (#29285)
`resolve_provider("auto")` checked `auth.json` `active_provider` BEFORE the
config.yaml `model.provider` and env-var API-key checks. So a user who was
OAuth-logged-into one provider (e.g. Anthropic) but had set an explicit
`model.provider` or exported an API key (e.g. `OPENAI_API_KEY`) was silently
routed to the stale OAuth provider — the override was invisible and surprising.

Reorder the auto-path so explicit intent wins (the order the issue asks for):

  1. explicit CLI api_key/base_url
  2. config.yaml `model.provider`            (safety net — see below)
  3. OPENAI_API_KEY / OPENROUTER_API_KEY env
  4. OpenRouter credential pool
  5. provider-specific API-key env vars
  6. auth.json `active_provider` (OAuth)      ← demoted to last-resort
  7. AWS Bedrock credential chain
  8. error

`active_provider` is still honored — it's just a last-resort fallback chosen
only when the user expressed no other preference, instead of overriding one.

The normal chat/gateway/TUI/ACP/status path already resolves config.provider
upstream in `resolve_requested_provider()` before "auto" is reached, so this
duplicate config check is the safety net for the lone direct caller
(`main.py` `resolve_provider("auto")`) and any future bypass. Because every
surface funnels through this one resolver, the fix propagates everywhere with
a single edit — no sibling path re-implements precedence.

Also add a one-shot WARN when resolution lands on `active_provider` while a
populated `model` config dict lacks a `provider` key — surfacing the silent
override the issue reported without breaking first-install.

Synthesizes the two competing PRs: #29615 (LifeJiggy — config-before-auth +
the silent-override framing) and #29809 (Minksgo — the env-before-auth
reorder). #29809 could not be merged directly (bundled unrelated, un-opt-in
cost-tagging telemetry); its reorder idea is incorporated here and credited.

Tests: tests/hermes_cli/test_provider_precedence.py — config/env beat stale
OAuth, OAuth still used as last resort, explicit request short-circuits, WARN
fires on silent fall-through. Full provider-resolution suites: 374 passed.

Fixes #29285

Co-authored-by: LifeJiggy <141562589+LifeJiggy@users.noreply.github.com>
Co-authored-by: Minksgo <153416856+Minksgo@users.noreply.github.com>
2026-06-27 19:49:02 -07:00
teknium1
2b73dd1ca6 fix(gateway): namespace --replace takeover marker by HERMES_HOME to stop cross-profile flap (#29092)
Two profile gateway services sharing the default ~/.hermes resolve the
takeover marker to the same path. A --replace from profile B could land
in profile A's marker, match on PID + start_time by coincidence of a
shared PID namespace, and make profile A exit 0 — only to be revived by
systemd Restart=always, which races the replacer again, flapping
indefinitely.

write_takeover_marker now stamps replacer_hermes_home; the shared
consume path rejects markers written under a different HERMES_HOME and
leaves them in place for the correct profile. Absent field (older
markers) is treated as same-home, so single-profile and mixed old/new
deployments are unaffected.

Salvaged from #31414 by @CryptoByz onto current main (branch was ~3962
commits behind; the consume function had since been refactored for
issue #34597). Co-authored-by: CryptoByz.
2026-06-27 19:43:02 -07:00
Teknium
28ed883959 docs: add PR infographic for config-defaults fix 2026-06-27 19:38:11 -07:00
Teknium
45b2e4dd6b fix(config): opt newer migrations out of default-stripping
The salvaged #27354 fix made save_config strip schema-default leaves by
default. Five migration sites added to main after the PR was authored
still called bare save_config(config) and intentionally materialize a
(often default-valued) key: model_catalog.ttl_hours, write_approval,
curator.consolidate, agent.verify_on_stop, and the suspicious-MCP-server
disable. Pass strip_defaults=False so those one-time deliberate writes
survive, matching the opt-out the PR applied to the other migrations.
2026-06-27 19:38:11 -07:00
郝鹏宇
98488c4be4 fix(config): prevent save_config from materialising schema defaults
Fixes #27354

Root cause:  called during init (or by any code path
that saves ) wrote injected schema defaults into
config.yaml as if the user had authored them.  Two fix layers:

1.  now only injects
    when the user actually set
    somewhere (root or agent).  A user who never set
    keeps it absent, so 's explicit-path
   detection won't treat it as user-authored.

2.  gains a  parameter and a
   new  pass that removes keys matching
    unless those paths were explicitly present in the
   **raw** (pre-normalization) config on disk.  Explicit-path detection
   uses  on  *before* any
   normalisation runs — preventing injected-in defaults from being
   mistaken for user-set values.

All migration and edit-config call sites pass
to preserve their intentional default-seeding behaviour.

New helpers:
-   — collects leaf-key paths from a raw dict
-    — removes keys matching schema defaults

Test coverage: 4 new regression tests (59 total, all passing).
2026-06-27 19:38:11 -07:00
teknium1
6dcc579bcb test(streaming): repoint anthropic stream-cleanup test to close+rebuild path
The existing test_anthropic_stream_parser_valueerror_retries_before_delivery
asserted mock_replace.call_count == 1 — i.e. it passed precisely because the
buggy OpenAI rebuild was invoked on the Anthropic path. Repoint it to assert
the corrected close+rebuild-Anthropic behavior (#28161).
2026-06-27 19:37:33 -07:00
EloquentBrush0x
a0b9663c7c fix(streaming): rebuild Anthropic client on stream cleanup instead of OpenAI client
interruptible_streaming_api_call() has three connection-pool cleanup
sites that called _replace_primary_openai_client() unconditionally.
For api_mode=anthropic_messages this has two consequences:

1. _replace_primary_openai_client() fails (OPENAI_API_KEY unset on
   Anthropic-only configs), so dead connections are never purged.
2. The stale-stream detector's outer-poll site (L1977) is the only
   mechanism that can interrupt the worker thread while it blocks in
   for event in stream:. Because the Anthropic client is never closed,
   the thread stays blocked until the 900 s httpx read-timeout fires,
   producing a visible 15-minute hang for Telegram/gateway users on
   claude-opus-4-7.

Fix: mirror the existing interrupt-path pattern (L1989-1997) at all
three cleanup sites — if api_mode == "anthropic_messages", call
_anthropic_client.close() + _rebuild_anthropic_client() instead of
_replace_primary_openai_client(). _rebuild_anthropic_client() handles
both direct Anthropic and Bedrock-hosted Claude correctly, unlike the
inline build_anthropic_client() calls in open PR #14430.

PR #14430 (open) covers only the outer stale-detector site (L1977).
PR #23678 (open) covers only the inner retry sites (L1774, L1833).
This PR covers all three sites and uses _rebuild_anthropic_client()
for Bedrock parity.

Fixes #28161
2026-06-27 19:37:33 -07:00
xxxigm
6f1a176b33 fix(gateway/discord): REST liveness probe to detect zombie clients (#26656)
The Discord adapter could enter a silent zombie state after a network
outage / proxy stall: the process is alive, _client looks open, but the
underlying socket is dead. discord.py's WebSocket reconnect never sees a
RST through a wedged proxy/NAT, so client.start() spins forever without
exiting — which means the bot-task done callback (which only fires on
task completion) never trips either. The bot stays "offline" in Discord
until a manual `hermes gateway restart`. Reported offline for 13-17h.

Adds an out-of-band REST liveness probe in DiscordAdapter. Every
`discord.liveness_interval_seconds` (default 60s) the adapter issues a
cheap fetch_user(bot_id) — the same REST path as message delivery, so it
fails when the proxy/NAT is wedged. After
`discord.liveness_failure_threshold` consecutive failures (default 3) the
probe closes the wedged client and surfaces a retryable fatal error,
which trips the gateway's existing _platform_reconnect_watcher and
rebuilds the adapter. Operators disable it by setting either knob to 0.

Config lives in config.yaml (discord.liveness_*) per the .env-is-secrets
policy; _apply_yaml_config bridges it to internal env vars the adapter
reads, matching the existing HERMES_DISCORD_TEXT_BATCH_* pattern.

Co-authored-by: Hermes Agent <agent@nousresearch.com>
2026-06-27 19:30:32 -07:00
teknium1
457c8a0a7c fix(file-ops): keep worktree isolation when restoring preserved cwd (#26211)
The durable _last_known_cwd anchor is keyed by the shared 'default' container,
so a non-owning worktree session could inherit the owning session's cwd through
it — breaking the wrong-worktree-routing fix (test_file_tools_cwd_resolution::
test_resolution_routes_to_resolving_sessions_worktree).

Reorder _authoritative_workspace_root so the session-specific registered cwd
override (keyed by raw session id) is checked BEFORE the shared-container
_last_known_cwd fallback. A non-owning session now resolves into its own
registered worktree; the durable anchor only fills in when there's no
session-specific override (the #26211 single-session case). Adds a regression
test covering the owner-mirrors-then-other-session-resolves interaction.
2026-06-27 19:29:06 -07:00
teknium1
b2faeba182 fix(file-ops): make preserved cwd reachable at write-time resolution (#26211)
Belt-and-suspenders on top of the cherry-picked cwd-preservation fix:

- Proactively mirror every live terminal cwd into _last_known_cwd on each
  successful read, so the durable anchor survives even when the cleanup
  thread pops both _file_ops_cache and _active_environments before
  _get_file_ops' stale-cache save branch can fire.
- Fall back to _last_known_cwd in _authoritative_workspace_root. write_file_tool
  resolves the path (via _resolve_path_for_task) BEFORE _get_file_ops rebuilds
  the env, so restoring only the rebuilt env's cwd was insufficient — the
  resolution that decides where the file lands runs first. This closes that gap.

The local env's persisted _cwd_file can't serve this role: it's keyed by a
random per-session uuid and deleted on cleanup (the same cleanup that triggers
the bug). The in-memory _last_known_cwd registry is the durable anchor instead.

Adds a real-IO E2E regression (TestSilentFileMisplacementE2E) exercising the
actual write_file_tool path after env cleanup.
2026-06-27 19:29:06 -07:00
zccyman
adeba1d7a8 fix(file-ops): preserve CWD across terminal environment re-creation (#26211)
Root cause: when the terminal environment (`_active_environments` entry) is
cleaned up and re-created during a long conversation, the new environment
always starts with the default config CWD (typically `~/.hermes/hermes-agent`)
instead of preserving the user's last-known working directory. Subsequent
relative-path writes (`write_file`, `execute_code`, shell commands) silently
land in the default CWD, making files appear to be "created but absent."

Fix: add `_last_known_cwd` dict that preserves the old environment's CWD
before the stale cache entry is invalidated. When a new environment is
created for the same task_id, we check `_last_known_cwd` first and use the
preserved CWD instead of the config default.

Changes:
- tools/file_tools.py: add `_last_known_cwd` dict, save CWD before stale
  cache invalidation, restore CWD on env recreation
- tests/tools/test_file_tools.py: add `TestLastKnownCwd` with 2 tests
  verifying CWD preservation and fallback behavior

Fixes #26211
2026-06-27 19:29:06 -07:00
teknium1
926a1b915d fix(tools): suppress transient check_fn flakes so subagents keep file/terminal tools
A flaky external probe in a tool's check_fn (e.g. check_terminal_requirements
running `docker version` with a 5s timeout, momentarily timing out under load)
would return False for a single get_tool_definitions() call. Because file
tools delegate their check_fn to the terminal check, that one flake silently
stripped read_file/write_file/patch/search_files AND terminal from whatever
agent was being constructed at that instant — most visibly a delegate_task
subagent, which then reported "Tool read_file does not exist". This explains
both the intermittent (~80% success) user-session failures and the
deterministic cron failures in #21658 / #5304.

The existing _check_fn TTL cache made this worse: it cached the transient
False for the full 30s window, poisoning every subagent spawned in that span.

Fix: remember the last time each check_fn returned True; when a fresh probe
fails within a short grace window of that success, treat it as a flake —
serve the last-good True and do NOT cache the failure (so the next call
re-probes). A failure with no recent success, or past the grace window, is
honored normally so a backend that genuinely went down stops advertising its
tools. Probe failures now log at WARNING regardless of quiet mode, making the
previously-silent tool loss diagnosable in subagent (quiet) sessions.

Co-authored-by: Stuart Horner <5261694+djstunami@users.noreply.github.com>
2026-06-27 19:29:00 -07:00
Shashwat Gokhe
505bc27d8d fix(gateway): classify mixed attachments per-attachment + transcode uncommon image formats
A document attached alongside an image in the same Discord message was
swept into the vision pipeline and 400'd the whole turn ("Could not
process image"), and was simultaneously never surfaced to the agent as a
readable file. Restores the "any file type works" contract for mixed
messages and fixes the HTTP 400.

Bug 1 — mixed attachments: the inbound routing loop keyed image/audio/video
classification off the message-level type (PHOTO/VOICE/AUDIO), so a doc in
a PHOTO message landed in image_paths and poisoned the vision call. The
document context-note path was gated on message_type == DOCUMENT, so that
same doc never reached the agent at all. Now classification is
per-attachment (trust each attachment's own MIME; fall back to the
message-level type only when MIME is unknown), via shared _event_media_is_*
helpers used by both _build_media_placeholder and the main inbound loop.
The document note now fires for any non-image/audio/video attachment
regardless of message-level type.

Bug 2 — uncommon formats: AVIF/HEIC/BMP/TIFF/ICO produced the same generic
400 because providers only accept PNG/JPEG/GIF/WEBP. image_routing now
transcodes those to PNG via Pillow before declaring media_type, skipping
cleanly (logged) if Pillow/plugins are missing. SVG is vector — Pillow
can't rasterize it — so it's skipped rather than transcoded.

Closes #25935.

Co-authored-by: LeonSGP43 <cine.dreamer.one@gmail.com>
Co-authored-by: cypres0099 <74935762+cypres0099@users.noreply.github.com>
2026-06-27 19:26:04 -07:00
teknium1
0c372274cd fix(agent): disable OpenAI SDK auto-retry that double-fires inside the rate-limit loop
Same bug class as the Anthropic fix (#26293): the OpenAI/aggregator client is
built without max_retries, so the SDK default of 2 applies. The SDK's own 1-2s
backoff ignores Retry-After and retries inside hermes's outer conversation loop,
burning request slots against a rate-limited bucket. Set max_retries=0 at the
single create_openai_client chokepoint (covers init, switch_model, recovery,
restore, request-scoped). auxiliary_client builds its own clients and is not
wrapped by the loop, so it keeps SDK retries.
2026-06-27 19:23:15 -07:00
konsisumer
1ab35ba25d fix(anthropic): stop SDK auto-retry double-firing and raise Retry-After cap to 600s
The Anthropic SDK clients were built without max_retries, so the SDK
default (max_retries=2) retried 429/5xx with its own backoff that ignores
Retry-After — double-retrying inside hermes's outer loop and burning
request slots against a bucket that won't refill for minutes. Set
max_retries=0 on all Anthropic/AnthropicBedrock client constructions so
the outer conversation loop (which already honors Retry-After) owns retry.

Also raise the Retry-After cap in the conversation loop from 120s to 600s.
Anthropic Tier 1 input-token buckets reset in ~171s, so the 120s cap made
hermes retry before the reset window and re-trip the limit.

Refs #26293
2026-06-27 19:23:15 -07:00
LeonSGP43
32732a8f83 fix(agent): cap same-entry credential refreshes so fallback can activate (#26080)
A persistent upstream 401 on a single-entry OAuth pool (common for Claude
Max subscribers) made the credential-pool recovery spin forever:
try_refresh_current() re-mints a fresh token and reports success on every
401, so recover_with_credential_pool returned True and the retry loop
continue'd without ever incrementing retry_count or reaching the
auth-failover block. The configured fallback_model never activated and the
agent appeared to hang.

Cap consecutive successful same-entry refreshes (keyed by provider +
pool-entry id) at 2; once exceeded, treat the credential as unrecoverable
and return not-recovered so the loop falls through to
_try_activate_fallback. The 429/billing paths already rotate-or-fall-through
correctly (mark_exhausted_and_rotate returns None on a single entry), so
only the auth-refresh branch needed the cap.

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-27 19:20:07 -07:00
Teknium
fae920642a fix(agent): throttle cross-turn fallback-switch replay storm (#24996) (#53909)
When every provider in the fallback chain fails non-retryably back-to-back
(e.g. HTTP 400/402/429 across distinct providers), the within-turn walk is
already bounded — _fallback_index advances monotonically and the loop aborts
when the chain exhausts. The damaging mode is cross-turn: restore_primary_
runtime resets _fallback_index=0 every turn, so a client that re-submits
immediately replays the entire chain, re-marshaling the full (potentially
80k-token) context once per provider every turn with no throttle on the
non-rate-limit path. On constrained hosts this exhausts memory/swap.

Rate-limit/billing failures already arm a 60s cooldown via _rate_limited_until;
the gap was the non-rate-limit case. Now, when the chain exhausts on a non-
rate-limit failure with a non-empty chain, arm a short (5s) cooldown on the
same _rate_limited_until gate (max(), never shrinking an existing window).
The next turn's restore stays gated and does NOT reset the index, so the
chain isn't replayed until the cooldown clears. No new state, no thread sleep,
no false-trip on legitimately long chains (those walk normally within a turn).

Tests: tests/run_agent/test_24996_fallback_exhaustion_cooldown.py
2026-06-27 19:15:40 -07:00
Chaz Dinkle
1dde7e2f2a fix(anthropic): adopt Claude Code's already-refreshed token before racing refresh
Claude Code OAuth refresh tokens are single-use; Claude Code refreshes on
its own schedule, so by the time Hermes notices an expired token Claude
Code may have already rotated it. Re-read live credential sources first and
adopt a valid token rather than POSTing a possibly-stale refresh token.

Ports the _refresh_oauth_token hardening from PR #40107 (chazmaniandinkle)
on top of the keychain/file reconciliation from PR #21112 (nodejun).
Adds AUTHOR_MAP entry for nodejun.
2026-06-27 19:14:43 -07:00
jun
5a5396aecb fix(anthropic): reconcile keychain/file credentials when one is expired
read_claude_code_credentials() previously returned the macOS Keychain
entry as soon as one existed, even if its OAuth token was already
expired. Callers then ran is_claude_code_token_valid() on the result
and got False, so resolve_anthropic_token() returned None — surfacing
the misleading 'No Anthropic credentials found' error even when
~/.claude/.credentials.json held a perfectly valid token.

Now reads both sources and prefers the non-expired one. When both are
valid (or both expired), prefers the later expiresAt so any subsequent
refresh uses the freshest refresh_token.

Adds TestReadClaudeCodeCredentialsDesync covering the four reconciliation
cases. The existing 'keychain wins' priority test still passes because
both fixtures share the same expiresAt and the tiebreaker is >=.
2026-06-27 19:14:43 -07:00
Teknium
db16854f34 fix(telegram): surface failed media downloads to user and agent, not a silent empty turn (#53912)
When a Telegram attachment download/cache fails (typically a transient
httpx.ConnectError to Telegram's CDN), the except handler logged a warning
and fell through to handle_message() with empty media and no text — the user
thought the file was delivered, the agent saw a content-less turn with no
signal an attachment was attempted, and the only record was a buried log line.

Adds _surface_media_cache_failure(): replies to the user in Telegram so they
know to retry, and appends an agent-visible notice to event.text via the
existing _append_observed_note channel so the agent knows an attachment was
attempted and failed. No new event fields (structured-event refactor is out
of scope per #23045). Wired into all five cache-failure sites — photo, voice,
audio, video, document — since they shared the identical silent fall-through.

Bug 1 from #23045 (unsupported types routed as fake user messages) no longer
exists on main: the document handler now accepts any file type, so there is no
rejection branch to fix.

Closes #23045
2026-06-27 19:12:57 -07:00
teknium1
6514be5a28 chore(release): add AUTHOR_MAP entry for linyubin (#50228 salvage) 2026-06-27 19:12:21 -07:00
teknium1
4133cd9fbf docs(infographic): eager fallback on persistent transport failures 2026-06-27 19:12:21 -07:00
linyubin
c946e6709f fix(agent): activate fallback on persistent transport failures (#22277)
Eager fallback previously fired only on rate_limit/billing. A stale-
detector-killed hung stream classifies as FailoverReason.timeout
(retryable=True) and the retry loop re-hit the same dead primary until
the budget exhausted -- 3 x ~180-300s stale kills compounding into a
15+ min silent hang while the configured fallback chain sat idle.

Extend the existing eager-fallback gate to also cover timeout and
overloaded, but only after one real retry (retry_count >= 2) so genuine
transient hiccups still recover on the primary. Reuses the same
pool-recovery guard and state-reset as the rate_limit branch -- no new
config flag, no change to the rate-limit intent.

Salvaged from PR #50228 by @linyubin. Closes #22277.

Co-authored-by: Hermes Agent <127238744+teknium1@users.noreply.github.com>
2026-06-27 19:12:21 -07:00
bykim0119
851f75d4df fix(discord): honor "*" wildcard in DISCORD_ALLOWED_USERS (#22334)
DISCORD_ALLOWED_USERS="*" now means "allow everyone", matching the
SIGNAL_ALLOWED_USERS / DISCORD_ALLOWED_CHANNELS wildcard convention and
the value `claw migrate` emits. Previously _is_allowed_user did exact
ID matching only, so "*" matched no user and blocked every non-self
sender — a P1 with no workaround.

Three sites, all required for the fix to hold at runtime:
- _is_allowed_user: short-circuit when "*" is in the allowlist.
- connect(): exclude "*" from the intents.members trigger so the
  wildcard does not request the privileged Server Members intent
  (which can block the bot from coming online).
- _resolve_allowed_usernames: preserve "*" verbatim; otherwise it lands
  in the username-resolution bucket, matches no member, and is silently
  dropped from the set and env var on the first on_ready — quietly
  undoing the fix.

Slash auth delegates to _is_allowed_user (auto-covered); component auth
already honors "*" on main.
2026-06-27 19:11:30 -07:00
Teknium
1207d81eed fix(gateway): unify outbound chat redaction onto authoritative redactor (#23810) (#53907)
The gateway banner promises 'chat responses are scrubbed before delivery',
but _redact_gateway_user_facing_secrets used a divergent 6-pattern subset that
leaked credential shapes the comprehensive agent.redact catches — notably the
GitHub fine-grained PAT (github_pat_...) and the Telegram bot-token shape
(bot<digits>:<token>), the gateway's own credential type.

_redact_gateway_user_facing_secrets now delegates to
agent.redact.redact_sensitive_text(force=True) — the same Tirith-grade redactor
already applied to logs, tool output, and approval-command prompts — so the
outbound LLM-response path (final_response -> _sanitize_gateway_final_response)
masks the full credential set. The narrow local pattern set is kept as a
fail-soft second pass. force=True honors redaction even when
security.redact_secrets is off, matching _redact_approval_command.

Test: regression guard parametrizing all 5 issue shapes x every chat surface;
asserts secret body never reaches the user and surrounding prose survives. The
existing bearer-token test's marker assertion is loosened from the literal
'[REDACTED]' to mask-agnostic (the redactor masks as '***'/partial) — it
asserts the security invariant, not the implementation's mask string.
2026-06-27 19:09:41 -07:00
LeonSGP43
c56b39c11e fix(auxiliary): fall back to OPENROUTER_API_KEY when credential pool exhausted
_try_openrouter() returned (None, None) whenever an OpenRouter credential
pool existed but was exhausted (_select_pool_entry -> (True, None)), making
the OPENROUTER_API_KEY env-var fallback unreachable. Auxiliary tasks
(compression, vision, web_extract) silently failed even with a valid env key.

Now the pool-present branch only returns early when it successfully builds a
client; an exhausted pool falls through to the env-var path. The final
failure (pool exhausted AND no env var) still marks the provider unhealthy.

Fixes #23452.

Co-authored-by: ambition0802 <noreply@github.com>
2026-06-27 19:09:27 -07:00
qWaitCrypto
46e18804ad fix(auxiliary): fall back on 401 auth errors in auto mode (#21165)
When the primary provider returns 401 and the auth-refresh path is
unavailable or fails, both call_llm() and async_call_llm() reached the
should_fallback gate without _is_auth_error in the condition, so the
auxiliary task (e.g. compression) was dropped silently — losing message
history. Add _is_auth_error to should_fallback (NOT is_capacity_error) in
both sync and async paths, plus an 'auth error' reason branch.

Auth stays a non-capacity error: it falls back in auto mode via the
is_auto gate, but on an explicitly-configured provider it still respects
the user's choice and raises rather than silently switching providers.
2026-06-27 19:07:04 -07:00
Teknium
1a570dae00 fix(image-routing): unblock message queue on OpenRouter 'no endpoints' image 404 (#53901)
The agent's image-rejection fallback strips images and retries text-only when
a provider rejects image content, which is what lets the gateway drain its
queued messages. The fallback only fires on a hardcoded phrase list, and the
OpenRouter wording — HTTP 404 'No endpoints found that support image input' —
was missing. For OpenRouter-routed non-vision models the fallback never fired,
the retry loop re-sent the same rejected request until exhaustion, and every
subsequent message (including plain text) stayed queued behind the stuck turn.

Add the phrase to _IMAGE_REJECTION_PHRASES (the 404 already passes the 4xx
gate). Add a positive test and a guard test so the sibling OpenRouter
'no endpoints ... data policy / guardrail' 404s do NOT get their images
stripped.

Fixes #21160. Reported by @liu14goal14-ux; PR #21198 by @ygd58.
2026-06-27 19:07:02 -07:00
Teknium
a94f657a50 fix(tui): route completion RPCs to the pool so they can't freeze the TUI (#53895)
complete.path and complete.slash ran inline on the tui_gateway stdin
reader thread. complete.path spawns git ls-files and fuzzy-ranks the
whole repo; complete.slash does first-call prompt_toolkit imports plus a
skill-dir scan. While either ran, prompt.submit / session.interrupt sat
unread in the stdin pipe, freezing the TUI until the 120s RPC timeout
fired — most reliably reproduced by typing @ on a large repo / WSL2 mount.

Add both to _LONG_HANDLERS so completion runs on the existing thread
pool (write_json is already _stdout_lock-guarded). Root-cause fix:
covers any slow completion, not just the bare-@ trigger.

Fixes #21123
2026-06-27 19:06:01 -07:00
teknium1
ccf526964a fix(gateway): bound adapter teardown awaits on the stop path (#14128)
The main stop loop in _stop_impl() awaited adapter.cancel_background_tasks()
and adapter.disconnect() with no timeout, for both the primary and the
secondary-profile (multiplex) adapter maps. A half-dead platform — a wedged
Feishu/Lark WebSocket thread blocked on network I/O is the reported case —
makes one of those awaits block forever, so the process never exits. systemd
then SIGKILLs it after TimeoutStopSec, skipping atexit PID-file cleanup, and
the next start dies with 'PID file race lost' and enters a restart loop.

The per-adapter timeout infra already existed on main
(_adapter_disconnect_timeout_secs / HERMES_GATEWAY_ADAPTER_DISCONNECT_TIMEOUT,
default 5s) but was only wired into _safe_adapter_disconnect, which the
teardown path never calls.

Add _bounded_adapter_teardown(): wraps BOTH cancel_background_tasks() and
disconnect() in the existing timeout budget, logs and forces forward progress
on timeout, and never raises. Both teardown loops now route through it, so the
stop sequence always completes regardless of any adapter's internal behavior
and PID-file cleanup runs.

Original report + fix direction by @happy5318 (#14128, #14130); this widens it
to cover cancel_background_tasks(), the multiplex loop, and the config knob.

Co-authored-by: happy5318 <happy5318@users.noreply.github.com>
2026-06-27 19:05:04 -07:00
Teknium
6717cfc805 docs(gateway): warn against custom ExecStopPost kill drop-in (restart loop) (#53903)
A user-added systemd drop-in like ExecStopPost=/bin/kill -9 $MAINPID fires
on every stop, including clean restarts — it SIGKILLs the freshly spawned
gateway before it stabilizes and Restart=always respawns it, producing an
infinite restart loop (issue #23272). The unit Hermes installs already shuts
down cleanly via KillMode=mixed + KillSignal=SIGTERM with Restart=always +
RestartForceExitStatus, so no extra kill is needed. Document this as a danger
callout in the gateway service-management section.
2026-06-27 19:04:29 -07:00
teknium1
ea8facee81 chore(release): add konsisumer to AUTHOR_MAP for PR #19608 salvage 2026-06-27 19:01:37 -07:00
konsisumer
8b4c29f0f0 fix(auth): preserve concurrently-added credentials on pool rewrite 2026-06-27 19:01:37 -07:00
Teknium
163cb24d45 feat(moa): render reference-model blocks in TUI and desktop, not just CLI (#53855)
The MoA reference-block display (each reference model's output shown as a
labelled thinking block before the aggregator responds) previously existed
only in the classic CLI. The facade already emits moa.reference / moa.aggregating
through tool_progress_callback; this wires the TUI and desktop consumers.

- tui_gateway/server.py: _on_tool_progress relays moa.reference (label / text /
  index / count) and moa.aggregating to the Ink/desktop client as their own
  events.
- ui-tui: gatewayTypes adds the two event shapes; createGatewayEventHandler
  routes them; turnController.recordMoaReference pushes a committed
  thinking-style segment tagged with the source model. Shown regardless of
  showReasoning — references ARE the mixture-of-agents process the user opted
  into, not ordinary reasoning. moa.aggregating is a status-only transition
  (no transcript entry).
- apps/desktop: use-message-stream appends each reference as a labelled
  reasoning chunk via the existing reasoning disclosure; GatewayEventPayload
  gains label/index/aggregator.

Tests: tui_gateway emit (3), Ink handler render + showReasoning-independence +
aggregating-no-segment (3). TUI typecheck/lint clean; desktop typecheck/lint
clean.
2026-06-27 18:46:20 -07:00
Teknium
d3d621f7c3 revert(windows): roll back terminal-popup PRs #53791 #53810 #53829 (#53853)
* Revert "fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)"

This reverts commit 2ecca1e7d3.

* Revert "fix(windows): stop terminal-window popups from background spawns (#53810)"

This reverts commit 5db1430af9.

* Revert "fix(windows): stop subprocess console-window popups + add CI guard (#53791)"

This reverts commit ef17cd204d.
2026-06-27 15:59:00 -07:00
Teknium
1d32e5d98c fix(gateway): relay _thinking bubbles when thinking_progress is on but tool_progress is off (#53849)
display.thinking_progress is documented as independent of tool_progress —
users can keep tool progress quiet while opting into mid-turn assistant
scratch-text bubbles. But two gates were keyed on tool_progress_enabled alone,
so with tool_progress:off the _thinking relay was silently dead even when
thinking_progress:true:

1. agent.tool_progress_callback was set to None unless tool_progress_enabled,
   so the callback that queues _thinking text never fired.
2. The send_progress_messages drain task was only started when
   tool_progress_enabled, so even queued messages had no consumer.

Both now gate on needs_progress_queue (tool_progress OR thinking_progress) —
the same condition that already decides whether to create the progress queue
at all. No effect when both are off (queue is None) or when tool_progress is
on (unchanged).

Tests: _thinking relays with thinking_progress:on/tool_progress:off, and is
suppressed when thinking_progress:off. Full progress-topics suite: 35 pass.
2026-06-27 15:48:20 -07:00
Teknium
2ecca1e7d3 fix(windows): capture is not a no-window boundary; route flashing spawns through chokepoint (#53829)
Follow-up to #53791 addressing review feedback: the footgun checker treated
capture_output=/stdout=/stderr=/check_output as proof a subprocess can't pop a
Windows console. That invariant is false — stream redirection controls where a
child's output goes, not whether a console is allocated. From a console-less
parent (Desktop/Electron, pythonw.exe, detached gateway/cron) a console-subsystem
child still flashes a window even when fully captured.

- check-windows-footguns.py: capture/redirect/check_output is no longer a blanket
  safe-pass. Added _WINDOWS_FLASHING_PROGRAMS (git/gh/npm/node/python/uv/ffmpeg/
  docker/powershell/…); calls to those are flagged even when captured. Non-flashing
  programs keep the capture exemption (no 271-site noise). _subprocess_compat.run/
  popen calls are inherently safe (wrapper injects CREATE_NO_WINDOW).
- Routed the 35 genuine flashing git/gh/npm/uv/ffmpeg/docker spawns through the
  _subprocess_compat.run/popen chokepoint (Brooklyn's wrapper from #53810) — the
  durable fix, not per-site annotations. cmd.exe /c start stays # ok (intentional).
- Updated tests + CONTRIBUTING.md rule #17 to the corrected invariant.
2026-06-27 14:49:41 -07:00
Teknium
3ac96d3308 fix(moa): resolve auxiliary tasks to the aggregator, not the preset name (#53827)
On a MoA session, auxiliary tasks (title generation, compression, vision, …)
ran through _resolve_auto with provider='moa' / model='<preset>', which sent
the preset name (e.g. 'opus-gpt') as the model id to resolve_provider_client —
producing 'HTTP 400: opus-gpt is not a valid model ID' on every turn (visible
as the title-generation warning).

MoA is a virtual provider with no real HTTP endpoint; aux tasks don't need the
reference fan-out. _resolve_auto now resolves a 'moa' main provider to the
preset's aggregator slot (its acting model) and continues Step 1 with that real
provider+model, dropping the virtual moa://local base_url + placeholder key so
the aggregator resolves via its own provider credentials. Mirrors the MoA
context-length resolution.

Verified live: a MoA turn no longer emits the 'not a valid model ID' warning.
Test: tests/agent/test_auxiliary_main_first.py (19 pass).
2026-06-27 14:21:26 -07:00
Gille
e7bb67332d fix(moa): preserve Codex slot routing 2026-06-27 14:20:51 -07:00
Gille
66aeda3550 fix(moa): keep virtual provider on MoA client 2026-06-27 14:20:51 -07:00
brooklyn!
5db1430af9 fix(windows): stop terminal-window popups from background spawns (#53810)
* fix(windows): stop terminal-window popups from background spawns

Native-Windows desktop/gateway users saw cmd/conhost windows flash on
gateway restart, image paste, the dashboard Projects tree, voice notes,
and ~5 min after closing the app (detached cron). Two root causes:

- Console-subsystem exes (taskkill, schtasks, wmic, netstat, tasklist,
  agent-browser, git, ffmpeg, powershell, git-bash) spawned via raw
  subprocess allocate a fresh console when the launching process has
  none (pythonw desktop backend / detached gateway) - even with output
  captured.
- uv venv pythonw shims re-exec console python.exe, so Python children
  get a console regardless of how they're launched.

Fixes:
- Single hidden-spawn primitive (_subprocess_compat.run/.popen) that ORs
  CREATE_NO_WINDOW on Windows, no-op on POSIX. Route every Hermes-owned
  console-exe spawn through it.
- FreeConsole() catch-all in hermes_bootstrap: any Python child that
  exclusively owns an auto-allocated console detaches it at startup
  (GetConsoleProcessList()==1 gate leaves shared interactive consoles
  untouched).
- Replace PowerShell/wmic gateway PID scans with in-process psutil.
- Skip schtasks queries on non-interactive desktop restarts.
- Prefer native agent-browser .exe over .cmd shims.
- Guard test bans raw subprocess spawns of the Windows-only console
  tools repo-wide so the popup class can't regress.

* fix(windows): scope FreeConsole to background entry points; fix merge fallout

Console detach review (per #53810 feedback): GetConsoleProcessList()==1 can't
tell a uv pythonw->python phantom console apart from a user opening the
interactive CLI/TUI in its own fresh console (double-click, shortcut, ConPTY) —
both report a single attached process with a tty. Running FreeConsole() in the
import-time bootstrap therefore risked detaching a legitimately-interactive
terminal.

- Extract FreeConsole into explicit hermes_bootstrap.detach_orphan_console();
  remove it from apply_windows_utf8_bootstrap() (import side effect).
- Call it only from known background mains: gateway run, dashboard backend
  (start_server, what the desktop spawns), cron standalone, tui_gateway entry,
  slash worker. Interactive CLI/TUI never calls it.
- Behavior-contract tests: frees only when solo owner, leaves shared console,
  no-op without console / on POSIX, and asserts it's not an import side effect.

Merge fallout from origin/main (#53791):
- local.py: 3-way merge left a dangling **_popen_kwargs (NameError crashing
  every terminal init). _subprocess_compat.popen already hides the window, so
  drop it.
- discord adapter: merge stacked an undefined windows_hide_flags() onto the
  primitive call; drop the redundant arg.
- test_gateway: scan now goes psutil-first (zero spawn); rewrite the
  case-variant test to drive that production path.

* test(claw): mock _subprocess_compat.run seam for Windows process scan

claw.py's Windows tasklist/powershell scan routes through the hidden-spawn
primitive; the tests still patched claw_mod.subprocess, so on win32 the mock
was never hit and real spawns returned nothing. Patch the actual seam.
2026-06-27 14:02:24 -07:00
Teknium
ef17cd204d fix(windows): stop subprocess console-window popups + add CI guard (#53791)
* fix(windows): stop subprocess console-window popups + add CI guard

The single biggest source of Windows 'terminal popup' bug reports was bare
subprocess.run/Popen calls spawning a console window. The compat helpers
(windows_hide_flags / windows_detach_popen_kwargs) already existed but the
footgun checker had no rule to stop new bare calls from reintroducing the flash.

- scripts/check-windows-footguns.py: new AST-based rule flagging subprocess
  calls that can create a new console — output-redirection-aware (capture/
  redirect/check_output exempt) and POSIX-only-program-aware (launchctl/
  systemctl/brew/etc. exempt). Comprehensive on real popups, no annotation
  burden on calls that can't flash.
- Swept all genuine window-spawning sites through windows_hide_flags()/
  windows_detach_popen_kwargs(); marked intentionally-visible launches
  (editor/terminal/foreground re-exec) with '# windows-footgun: ok'.
- tests/scripts/test_windows_footgun_subprocess_rule.py: behavior-contract
  tests + full-repo cleanliness invariant.
- CONTRIBUTING.md: documents the rule + the helper pattern.

* test: accept creationflags kwarg in psutil_android fake_subprocess_run

The Windows no-window sweep added creationflags=windows_hide_flags() to
install_psutil_android.py's subprocess.run call; the test's fake stub had a
fixed (cmd) signature and raised TypeError on the new kwarg.
2026-06-27 13:03:51 -07:00
Teknium
3b44a3c8bb feat(moa): show each reference model's output as a labelled block before the aggregator (#53793)
When a MoA preset is selected, each reference model's answer now renders in the
CLI as a thinking-style block labelled with its source model, BEFORE the
aggregator responds — so the mixture-of-agents process is visible instead of a
silent pause. The aggregator's response (and its tool actions) follow as normal.

Mechanism (shared seam, all surfaces):
- MoAChatCompletions/MoAClient take an optional reference_callback and emit
  'moa.reference' (index/count/label/text) per reference, then 'moa.aggregating'
  (aggregator label) once. agent_init wires this to the agent's
  tool_progress_callback, which every surface already consumes — so the events
  reach CLI/TUI/desktop/gateway with no new plumbing.
- CLI _on_tool_progress renders 'moa.reference' as a labelled '┊ ◇ Reference
  i/n — <model>' header + a thinking-style preview (reusing _emit_reasoning_
  preview), and 'moa.aggregating' as a spinner transition. Display-only; never
  touches message history (cache-safe).

Turn-scoped reference cache: the agent loop calls the facade once per tool-loop
iteration, but the advisory message view is identical across iterations within a
turn, so references are now run AND displayed once per user turn (keyed by the
advisory view's signature) instead of re-running/re-spamming on every iteration.
This also cuts reference API cost from O(iterations) back to O(turns).

Verified live via interactive PTY on the opus-gpt preset (gpt-5.5 + opus refs):
reference blocks render once per turn, labelled by model, before the aggregator;
fresh blocks on each new turn; aggregator tool actions still execute.

Follow-up: TUI/desktop rich rendering + gateway batched-summary already receive
the events via tool_progress_callback; their surface-specific renderers are a
separate change.
2026-06-27 12:45:23 -07:00
Dale Nguyen
dbbf102b8e fix(terminal): strip VIRTUAL_ENV/CONDA_PREFIX from terminal subprocess env
The Hermes gateway runs inside its own venv, so its process environment
carries VIRTUAL_ENV (and possibly CONDA_PREFIX). The terminal tool spawned
subprocesses inheriting those markers. When the agent ran `uv sync`,
`uv pip install`, `poetry install`, etc. in ANY other project directory,
those tools honored the inherited VIRTUAL_ENV and rebuilt/synced that
project's dependencies into the Hermes venv path — wiping Hermes' own runtime
deps (and, when the other project pinned a different Python, replacing the
interpreter), bricking the gateway on the next restart (#23473).

Strip VIRTUAL_ENV/CONDA_PREFIX in both subprocess-env construction points in
tools/environments/local.py — `_sanitize_subprocess_env` and `_make_run_env`
— via a shared `_ACTIVE_VENV_MARKER_VARS` constant. The Hermes venv stays
reachable because its bin dir is already first on PATH, so removing the
active-environment markers is safe and only prevents the cross-project clobber.

Adds TestActiveVenvMarkerStripping: end-to-end (markers in os.environ don't
reach the spawned subprocess) and unit coverage for both functions, plus a
guard on the marker constant.

Also adds the AUTHOR_MAP entry for the salvaged contributor.

Closes #23473
2026-06-28 01:04:20 +05:30
Teknium
d470ed0c4c fix(cli): commit tool scrollback lines in verbose mode (non-streaming/MoA) (#53785)
In the interactive CLI, the aggregator's tool calls under a MoA preset (or
any non-streaming model call, e.g. copilot-acp) appeared to overwrite each
other instead of building scrollable history. Each tool only updated the
transient spinner line; no committed scrollback line was printed.

Root cause: persistent tool lines in _on_tool_progress's tool.completed
branch were gated on tool_progress_mode in {all, new}, omitting 'verbose'.
Streaming models hid the bug because _on_tool_gen_start commits a 'preparing'
line per tool during streaming; non-streaming calls (MoA forces
_use_streaming=False) never emit that, so under 'verbose' there was no
committed line at all — only the self-overwriting spinner.

'verbose' is strictly more than 'all', so it now commits the same scrollback
line. Verified live via interactive PTY on the MoA opus-gpt preset: three
terminal calls in turn 1 and two in turn 2 each render as separate persistent
lines.
2026-06-27 12:29:55 -07:00
Teknium
227e6c0143 fix(moa): resolve context window from the aggregator, not the 256K default (#53780)
A MoA session's model is the preset name (e.g. 'opus-gpt') and its base_url is
the virtual local endpoint, so get_model_context_length() missed every probe
and fell through to the 256K fallback — even when the aggregator is a 1M-context
model. The acting model in MoA IS the aggregator, so resolve the context window
from the aggregator slot's real provider+model.

- model_metadata.get_model_context_length: when provider=='moa', resolve the
  preset's aggregator slot through resolve_runtime_provider and recurse with the
  aggregator's real provider/model/base_url. Explicit model.context_length still
  wins (checked first); falls through to the generic default if resolution fails.

Tests: opus-gpt preset now reports 1M (the aggregator window), config override
still honored.
2026-06-27 12:08:09 -07:00
ailthrim
25ec01f79f fix(desktop): don't purge Electron cache / mirror-retry after a late build failure
`hermes desktop` / `hermes update` recover from a corrupt Electron download by
purging the cached zip + re-downloading and retrying the pack, and then by
falling back to a public mirror. That recovery is only meaningful when the
packaged executable is MISSING — the signature of a partial/corrupt unpack.

A LATE failure such as macOS code signing (#40187) leaves
`Hermes.app/Contents/MacOS/Hermes` (or the platform equivalent) in place.
Re-downloading Electron can't repair a signing failure, so the purge +
slow mirror retry just grind through another identical failure before the
build finally errors out.

Gate both recovery blocks on `_desktop_packaged_executable(desktop_dir) is None`
so a build that already produced the executable fails fast instead of
triggering the destructive download recovery. The corrupt-download path
(executable missing) is unchanged.

Salvage of #42782, re-applied onto current main (the surrounding recovery was
refactored to `_electron_dist_ok` / `_redownload_electron_dist` since the PR
was opened). Adds a regression test asserting no purge / mirror retry runs when
the executable exists, and updates the existing retry/mirror tests to model the
corrupt-download case (executable absent) the recovery is actually for.

Related to #40187 (the residual cache-purge sub-issue; the signing failure
itself is fixed by #52591).
2026-06-28 00:29:34 +05:30
teknium1
1ef19bad90 fix(model): show MoA preset picker on selection and label MoA in the banner
Selecting 'Mixture of Agents' in the `hermes model` provider picker fell
through silently — select_provider_and_model had no moa branch, so it just
reprinted the current model/provider summary and exited. And the CLI session
banner rendered the bare preset name (e.g. 'opus-gpt · Nous Research'),
which is meaningless out of context.

- Add _model_flow_moa: always lists the available presets (even one), then
  prints the full reference-models + aggregator breakdown for the selection
  and persists model.provider=moa / model.default=<preset> (dropping stale
  base_url + endpoint creds, since moa is a virtual local provider).
- Wire the branch into select_provider_and_model.
- build_welcome_banner takes provider; when 'moa' it renders
  'MoA: <preset> · agg <aggregator>' instead of a bare slug. Both CLI call
  sites pass self.provider.

Tests: 2 new banner tests (moa + non-moa unchanged); E2E verified the picker
persists the preset and clears stale base_url/api_key.
2026-06-27 11:45:07 -07:00
konsisumer
1b6ebb24c0 fix(agent): validate OpenRouter provider sort before request dispatch 2026-06-27 11:43:08 -07:00
Teknium
27322612b4 fix(update): route loud build/installer output to update.log instead of the terminal (#53616)
* fix(update): route loud build/installer output to update.log instead of the terminal

hermes update flooded the terminal with the full vite asset dump,
electron-builder logs, npm deprecation warnings from the desktop build,
and the cua-driver installer's 'Next steps' wall. All of that is
low-signal noise the user doesn't need on a successful update.

- Capture the desktop --build-only subprocess (vite + electron-builder)
  into ~/.hermes/logs/update.log; print a one-line status, and on
  failure surface the last 15 lines + a pointer to the full log.
- Capture the cua-driver installer's output when verbose=False (the
  hermes update refresh path); concise upgrade line is unchanged.
- Add _log_only_write() / _run_logged_subprocess() helpers that write to
  the update.log handle without echoing to the terminal.

The repo-root npm install keeps streaming (capture_output=False) — that
is the deliberate #18840 guard so a slow postinstall download doesn't
look hung. The desktop npm install is a separate Electron process with
no such progress concern and is captured.

* fix(update): persist full cua-driver installer output to update.log

The captured cua-driver installer output was only sent to logger.debug
(agent.log) on failure, so the 'Next steps' wall was lost from
update.log entirely on success. Write the full captured output straight
to the update.log handle (sys.stdout._log) on both success and failure,
matching the desktop-build capture, so update.log keeps the complete
record of everything an update did.
2026-06-27 11:43:01 -07:00
ethernet
f53b184c48 fix(ci): pass secrets down to docker workflows 2026-06-27 09:53:28 -07:00
Teknium
190e1ffac9 fix(redact): mask passwords in lowercase/dotted config keys (#53590)
The secret redactor only matched uppercase env-style keys ([A-Z0-9_]),
so config-file assignments like spring.datasource.password=secret,
app.api.key=xyz, and YAML password: secret leaked verbatim when the
agent ran cat/grep on application.properties or .env files (issue #16413).

Adds three case-insensitive config-key matchers that run only in a
config-file context, preserving the existing #4367 (lowercase code/prose)
and web-URL-passthrough carve-outs:
  - _CFG_DOTTED_RE: namespaced keys (contain a dot) — unambiguously config
  - _CFG_ANCHORED_RE: bare secret-word keys at line start (incl. export)
  - _YAML_ASSIGN_RE: unquoted colon config (password: value)
Value capture stops at whitespace and '&' so form bodies stay pair-wise;
the '://' guard keeps intentional web-URL query-param passthrough intact.

Reported-by: Murtaza1211
2026-06-27 04:43:28 -07:00
Teknium
917f6bdb00 fix(tools): let vision pick any provider+model, not just OpenRouter (#53606)
* fix(tools): let vision pick any provider+model, not just OpenRouter

hermes tools → configure → vision no longer forces an OPENROUTER_API_KEY.
It now offers the same any-provider surface as the model command: Auto
(use main model / aggregator fallback), pick any authenticated provider +
model, or a custom OpenAI-compatible endpoint. Selections persist to
auxiliary.vision.{provider,model,base_url} — the keys the vision resolver
already reads. Custom endpoint pins provider=custom so base_url routes
correctly. Reconfigure path uses the same picker instead of re-prompting
for OPENROUTER_API_KEY.

* docs: add PR infographic for vision any-provider picker
2026-06-27 04:41:42 -07:00
Brandon Zarnitz
9c81c938d3 fix(approval): honour tirith_fail_open=false on Tirith ImportError (#20733)
check_all_command_guards() swallowed ImportError from tools.tirith_security
with an unconditional pass, leaving tirith_result["action"] as "allow"
regardless of security.tirith_fail_open.  When an operator sets
tirith_fail_open: false they have explicitly opted into fail-closed
behaviour; a missing or broken Tirith module must not silently permit
command execution.

Inside the except ImportError handler, read the live security config.
When tirith_enabled is true and tirith_fail_open is false, synthesise a
"warn"-action Tirith result so the command flows through the normal
approval path (prompt the user, or block in cron/gateway contexts)
instead of bypassing it.  The default tirith_fail_open: true behaviour
is unchanged.

Adds three regression tests to tests/tools/test_approval.py:
- fail_open=true  + ImportError → silently allowed (no regression)
- fail_open=false + ImportError → approval callback invoked, command denied
- tirith_enabled=false           → always allowed regardless of fail_open

Fixes #20733

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

# Conflicts:
#	tests/tools/test_approval.py
2026-06-27 04:41:24 -07:00
Teknium
fe1c1c1121 fix(session_search): demote cron below interactive sessions in discover ranking (#53597)
Cron jobs accumulate large volumes of repetitive vocabulary (recurring
project names, dates, summaries) and out-number a user's interactive
sessions. Under bare BM25 they dominate the top FTS rows, so discover's
early-exit-at-N dedup collects only cron sessions and the user's own
conversations never surface — "recall blindness" (#19434).

- _order_for_recall() stable-sorts FTS rows so interactive sources rank
  above cron before lineage dedup; within each class BM25/recency order
  is preserved. Cron is demoted, not excluded, so it still surfaces when
  it is the only match.
- raise discover scan limit 50 -> 300 so buried interactive matches are
  in hand for the demotion pass.

Fixes the cron-flooding sub-bug of #19434. The split-brain sub-bug is
covered by #52798; the child-session sub-bug is superseded by in-place
compaction.
2026-06-27 04:41:22 -07:00
Teknium
cd592c105c feat(send_message): native WhatsApp media delivery via Baileys bridge (#53598)
send_message with MEDIA:/path to a WhatsApp target previously dropped the
attachment: the WhatsApp branch never passed media_files, the plugin's
_standalone_send accepted the param but only POSTed text, and WhatsApp was
absent from the media-supported platform list.

- send_message_tool: add a Platform.WHATSAPP media block (mirrors Feishu) that
  routes media_files through the whatsapp plugin's standalone_sender_fn, and
  add whatsapp to the supported-media list strings.
- whatsapp adapter: _standalone_send now sends text first (skipped when the
  chunk is media-only), then uploads each file via the bridge /send-media
  endpoint with a mediaType derived from extension/is_voice/force_document, so
  images/videos/voice arrive as native bubbles instead of documents.
- _bridge_media_type classifier maps ext -> image|video|audio|document.

Closes #19105 (remaining send_message gap). Other items in the report
(inbound video paths, image_generate auto-deliver, history dedup, native
gateway bubbles) already landed on main.
2026-06-27 04:40:05 -07:00
Teknium
88c02469cc fix(mcp): never permanently wedge the circuit breaker on a dead transport (#53599)
A long-running gateway session could permanently lose an MCP server: once a
stdio subprocess died (or transient drops accumulated over the session), the
run loop exhausted its reconnect budget and returned, orphaning the task. With
no listener for _reconnect_event, the circuit breaker's half-open probe could
never revive the server — every probe hit a dead/absent session, re-armed the
60s cooldown, and looped forever until a full gateway restart (#16788).

Root cause was split ownership of transport liveness between the run loop and
the tool handler, plus a permanent give-up path. Fixed by one invariant: a
non-shutdown server task is always reconnectable.

- run loop parks (deregisters phantom tools, then awaits _reconnect_event)
  instead of returning when the reconnect budget is exhausted, so the task
  stays alive as a dormant listener
- retry budget resets on every successful (re)connect, so a healthy
  long-lived server can't accumulate lifetime drops into a death sentence
- half-open probe with no live session signals a reconnect (reviving a
  parked/dead task and respawning a dead stdio subprocess) and returns a
  clean 'reconnecting' error instead of writing into a dead pipe
- breaker resets on successful session init across all transports
  (stdio/HTTP/SSE) — fully transport-agnostic, no PID/pipe polling

Builds on the closed-PR cluster for this issue: keeps #49255's deregister-on-
exhaustion insight and #21006's signal-don't-probe insight, discards the racy
os.kill PID machinery.

Co-authored-by: LeonSGP43 <LeonSGP43@users.noreply.github.com>
Co-authored-by: srojk34 <srojk34@users.noreply.github.com>
2026-06-27 04:39:54 -07:00
r266-tech
dbc925b755 Guard oversized Telegram video downloads 2026-06-27 04:39:48 -07:00
Teknium
02b32e2d7c fix(moa): call reference + aggregator models through their provider's real route (#53580)
MoA was calling reference and aggregator models through a bare
call_llm(provider=slot["provider"], model=slot["model"]) with a forced
temperature and a forced max_tokens (the preset's hardcoded 4096). That left
base_url/api_key/api_mode unresolved — so the auxiliary auto-detector guessed
the API surface instead of using the provider's real runtime, and the 4096 cap
truncated long aggregator syntheses.

A MoA slot is just a model selection and must be called the same way any model
is called elsewhere. Each slot is now resolved through resolve_runtime_provider
(the canonical provider→api_mode/base_url/api_key resolver the CLI, gateway, and
delegate_task all use) via a new _slot_runtime() helper, and the resolved
endpoint is passed into call_llm. So a reference/aggregator gets its provider's
actual API surface — MiniMax → anthropic_messages, GPT-5/o-series →
max_completion_tokens, custom endpoints → their base_url — identical to how that
model is handled as the acting model.

MoA also no longer imposes its own output cap: max_tokens defaults to None
(omitted → the model's real maximum) for references and is passed through from
the caller for the aggregator. The preset's hardcoded 4096 is gone. The
max_tokens preset config field is left in place (config/web/desktop unchanged);
it is simply no longer applied as a forced cap.

Tests: slots route through resolve_runtime_provider with resolved base_url/
api_key; resolution errors fall back to bare provider/model; neither call
carries an output cap even when the preset config still contains max_tokens.
2026-06-27 04:39:42 -07:00
herbalizer404
3fe16e3cd5 fix(fallback): attach credential pool after provider switch
When automatic fallback activates a provider that differs from the
primary, try_activate_fallback() cleared the primary's pool (to avoid
cross-provider base_url contamination, #33163) but never loaded the
fallback provider's own pool. The fallback then ran with no pool, so
rate_limit/billing/auth recovery couldn't rotate its credentials.

After clearing a mismatched pool, load_pool(fb_provider) and attach it
when it has credentials, so provider-specific rotation continues to
work on the fallback target.
2026-06-27 04:39:26 -07:00
Tranquil-Flow
635841d210 fix(agent): reload credential pool on switch_model provider change (#52727)
switch_model() swapped model/provider/base_url/api_key but never
refreshed agent._credential_pool, which stays bound to the original
provider. recover_with_credential_pool() then sees a pool.provider !=
agent.provider mismatch and short-circuits — so a 429/401 on the new
provider gets no rotation and falls through to fallback instead.

Reload load_pool(new_provider) inside switch_model when the provider
changes (or the pool is missing). The reload is inside the protected
swap block and the pool is added to the rollback snapshot, so a failed
client rebuild restores the original pool.

Fixes #16678, #52727.
2026-06-27 04:39:26 -07:00
Teknium
2002bb49a7 test(telegram): make config-bridge tests immune to ambient .env pollution (#53594)
test_config_bridges_telegram_group_settings and
test_config_bridges_telegram_user_allowlists asserted the YAML→env bridge
via os.environ. A developer's real ~/.hermes/.env can repopulate TELEGRAM_*
vars during load_gateway_config(): the microsoft_teams plugin runs
load_dotenv(find_dotenv(usecwd=True)) at import time, which walks up from the
cwd (under ~/.hermes/ in worktrees) and reloads the user's .env, defeating the
env-over-YAML bridge for any key present there (e.g. TELEGRAM_GROUP_ALLOWED_CHATS).

Assert the returned PlatformConfig.extra instead — it is parsed straight from
the test's config.yaml and is immune to that ambient leak. free_response_chats
is bridged to the env var only (not extra), and TELEGRAM_FREE_RESPONSE_CHATS
doesn't appear in developer .env files, so it stays a deterministic os.environ
assertion.
2026-06-27 04:36:45 -07:00
Teknium
d4c2217e87 fix(gateway): offload /model switch off the event loop (#53603)
The Telegram/Discord /model command's actual switch calls switch_model()
directly on the asyncio event loop. switch_model() can fall through to a
synchronous models.dev HTTP fetch (requests.get, 15s timeout) on a cold or
expired cache, freezing the gateway for up to 15s and dropping the Telegram
connection while a user switches models.

The picker provider-list and fallback text-list sites were already offloaded
(#41289), but the two _switch_model() calls — the picker callback and the
direct /model <name> path — were not. Wrap both in asyncio.to_thread.

Closes #20525.
2026-06-27 04:36:22 -07:00
Teknium
caf4dcc7ad fix(whatsapp): resolve phone↔LID aliases in adapter DM/group allowlist (#53588)
The adapter-level intake gate (_is_dm_allowed / _is_group_allowed, reached
via _should_process_message) did a raw set-membership check against the
configured allowlist. WhatsApp now delivers inbound DM senders in LID form
(<id>@lid) while operators configure allowlists with phone numbers, so the
check never matched and every DM from an allowed contact was silently
dropped before the gateway authz layer ran.

Route both gates through the existing gateway.whatsapp_identity.
expand_whatsapp_aliases helper (already used by gateway authz and session
keys), which walks the bridge's lid-mapping-*.json session files. Phone and
LID forms now resolve to each other in both directions; exact JID matches,
wildcard, disabled/open policies, and empty-allowlist fail-closed behavior
are all preserved.

Fixes #14486
2026-06-27 04:17:12 -07:00
teknium1
38e7bd8a08 fix(agent): classify 429 'overloaded' bodies as overloaded, not rate_limit
Z.AI / Zhipu reuse HTTP 429 for server-wide overload. The 429 status
path classified these unconditionally as rate_limit with
should_rotate_credential=True, so an overloaded provider exhausted the
credential pool after two errors — fatal for a single-key user, who has
nothing to rotate to.

The credential is valid; the server is just busy. Disambiguate the 429
body against a shared _OVERLOADED_PATTERNS list and route overload
language to FailoverReason.overloaded (retryable, no rotation), matching
the existing 503/529 path and the message-only path (#52890). Genuine
rate limits (no overload language) still rotate.

Extracted the inline overloaded tuple #52890 added into the shared
_OVERLOADED_PATTERNS constant so the status-code and message paths use
one list.

Closes #14038.
2026-06-27 04:16:54 -07:00
ms-alan
16192103f4 fix(config): accept placeholder base_url in custom provider validation
_normalize_custom_provider_entry() ran urlparse() on base_url and dropped
any entry whose value was an un-expanded placeholder, so a caller reaching
the normalizer with raw config (e.g. the Dockerized gateway path) silently
skipped the provider with a 'not a valid URL' warning. Skip URL validation
when the candidate contains a placeholder token — both ${ENV_VAR} env-refs
and bare {region}-style templates — since those are expanded at runtime.

Closes #14457
2026-06-27 04:15:27 -07:00
HiddenPuppy
b34771fc06 fix(cli): disable prompt_toolkit CPR queries to stop escape-sequence leak (#13870)
prompt_toolkit's renderer sends ESC[6n cursor-position queries before
painting in non-fullscreen mode; the terminal replies ESC[<row>;<col>R.
Over SSH/cloudflared tunnels and slow PTYs these replies race past the
input parser and land in the display as raw '20;1R21;1R' text, and the
pending-CPR future can stall the renderer so the prompt freezes after the
agent's final answer.

Build the prompt_toolkit output with enable_cpr=False so CPR is marked
NOT_SUPPORTED up front and ESC[6n is never sent. This is the root-cause
counterpart to the existing input-side _strip_leaked_terminal_responses
scrubbing. Vt100_Output.from_pty() does not expose enable_cpr in
prompt_toolkit 3.x, so _build_cpr_disabled_output() reproduces its
get_size setup and calls the constructor directly; it returns None on any
failure so startup falls back to the default output.

Verified in a real PTY: baseline emits 1 ESC[6n query, the fix emits 0,
banner/UI render identically. Layout is unaffected — with CPR off the
renderer sizes the prompt to its preferred height (the same fallback
prompt_toolkit uses on any terminal that doesn't answer CPR).

Co-authored-by: Hermes Agent <noreply@nousresearch.com>
2026-06-27 04:15:20 -07:00
LeonSGP43
e7c013494d fix(agent): preserve nested API error bodies 2026-06-27 04:13:53 -07:00
Teknium
5ab4136631 fix(webui): switch provider when Config-page model field changes (#53583)
The dashboard Config tab's Model field is a flat string with no provider
info. _denormalize_config_from_web only updated model.default and kept the
stale provider, so picking an OpenRouter model while the default provider was
ollama-local left provider=ollama-local and every call 404'd.

When the model string actually changes, infer the serving provider — curated
catalog first, then a vendor/model-slug heuristic for non-aggregator providers
— and route the switch through the existing _normalize_main_model_assignment /
_apply_main_model_assignment chokepoints so stale base_url/api_mode/api_key are
cleared on a provider change and preserved on a same-provider re-pick. Saving
an unchanged model never re-detects, so unrelated config saves keep an explicit
provider.

Closes #14058
2026-06-27 04:13:44 -07:00
teknium1
7ee0b68973 fix(gateway,feishu): refuse executor resurrection during real shutdown
Add an explicit _closing guard to both owned executors so the
recreate-on-shutdown path only recovers from an *external* teardown of
the loop default — never resurrects a pool the gateway/adapter itself
stopped. _shutdown_*executor() sets the flag; _get_*executor() raises if
closing; feishu connect() re-arms on reconnect. Updates the gateway
recreate test to assert the refusal contract and adds feishu coverage.
2026-06-27 04:13:09 -07:00
teknium1
b296915c82 fix(feishu): route blocking SDK calls through an adapter-owned executor
Feishu SDK calls ran on asyncio's shared default executor, so a torn-down
default executor wedged every send with 'Executor shutdown has been called'
and left the gateway a zombie (#10849). The adapter now owns a
ThreadPoolExecutor recreated on demand if shut down, mirroring the
gateway-owned executor change. Routes all 17 self._client SDK calls through
_run_blocking; shuts the pool down on disconnect.
2026-06-27 04:13:09 -07:00
konsisumer
1011c07966 fix(gateway): use owned executor for agent work 2026-06-27 04:13:09 -07:00
LeonSGP43
52a09d8faf fix(byterover): honor auto extract config 2026-06-27 04:04:15 -07:00
teknium1
f062cf076b fix(agent): also treat provider=ollama as an Ollama GLM backend
Follow-up to the #13971 fix: a genuine native Ollama provider reached
through a reverse proxy carries no ollama/:11434 URL signature, so the
restricted detection would miss it. Add provider=="ollama" as an
explicit True case (idea from #14789, @Tranquil-Flow) and cover both it
and the #13971 LiteLLM-proxy-to-zai false-positive with E2E tests.
2026-06-27 04:03:07 -07:00
YuShu
266521b55f refactor(agent): trim docstring per review feedback
Remove commentary about the previous is_local_endpoint() approach
from _is_ollama_glm_backend() — git history suffices.
2026-06-27 04:03:07 -07:00
YuShu
00a8252b7d fix(agent): scope Ollama/GLM stop-to-length heuristic to Ollama only
The _is_ollama_glm_backend() function was too broad: any local endpoint
running a GLM model was treated as Ollama, triggering the stop->length
misreport heuristic introduced in 8011aa3. This caused false truncation
detection on sglang, vLLM, LM Studio, and other non-Ollama servers that
correctly report finish_reason.

When a GLM model on sglang/vLLM returned finish_reason='stop', the agent
mistakenly reclassified it as 'length' if the response didn't end with
a whitelisted punctuation character (ASCII or CJK). This particularly
affected Chinese-language responses and Markdown-formatted text.

Root cause: the is_local_endpoint() fallback assumed any local GLM
endpoint = Ollama. But many non-Ollama servers also run on localhost.

Fix: remove the is_local_endpoint() catch-all. Only detect Ollama via
its distinctive signatures (port 11434, 'ollama' in URL). All other
local servers are assumed to report finish_reason correctly.

This is the correct tradeoff because:
- False negatives (Ollama at custom port, heuristic not triggered) only
  mean the user sees a truncated response — same as having no heuristic
- False positives (non-Ollama server, heuristic wrongly triggered) inject
  spurious continuation messages into the conversation — strictly worse

Adds two tests:
- sglang GLM response is NOT reclassified as truncated
- Ollama GLM on port 11434 still triggers the heuristic as before

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-06-27 04:03:07 -07:00
teknium1
ab1f9b94c5 fix(telegram): accept @username chat_id in delivery paths (#13206)
TELEGRAM_HOME_CHANNEL set to an @username (not a numeric chat ID) crashed
all webhook/cron->Telegram home-channel delivery with 'ValueError: invalid
literal for int()'. The Telegram Bot API accepts both a numeric chat_id and
an @username string; Hermes was force-coercing every chat_id with int().

Add normalize_telegram_chat_id() (returns int for numeric values, passes
@username strings through) and apply it at the Bot API send/edit sites in
the Telegram adapter and the send_message tool. Username targets are now
recognized as explicit targets in _parse_target_ref.

Reapplies the approach from #13274 (season179), whose branch predated the
gateway/platforms/telegram.py -> plugins/platforms/telegram/adapter.py
relocation. Dupes: #13535 (Tranquil-Flow), #37572 (chewkaah).

Co-authored-by: season179 <season.saw@gmail.com>
2026-06-27 04:01:58 -07:00
teknium1
f2ca3e3d84 fix(gateway): hold _run_restart on _restart_task + explicit cancel-loop skip
Follow-up on the cherry-picked #13173 fix. Holds the _run_restart task in
self._restart_task (a bare asyncio.create_task keeps only a weak reference,
so a still-pending task can be GC'd mid-flight) and explicitly skips it in
the _stop_impl cancel loop alongside _stop_task. Adds AUTHOR_MAP entry for
the contributor and a regression test that fails when the task is cancellable.

Refs #12875
2026-06-27 03:57:31 -07:00
zeapsu
1ce5d6d974 fix(gateway): exclude _run_restart from _background_tasks to prevent zombie on /restart
When request_restart() adds _run_restart to _background_tasks, _stop_impl
later cancels all entries in that set.  Since _run_restart is awaiting
_stop_task at that point, the CancelledError propagates into _stop_impl,
interrupting cleanup before _shutdown_event.set() and _exit_code = 75
execute.  This leaves the gateway as a zombie (alive but disconnected) or
exiting with code 0 instead of 75, preventing systemd Restart=on-failure
from restarting the service.

Fix: don't add _run_restart to _background_tasks — it self-terminates in
~50ms and needs no lifecycle management.

Fixes #12875
2026-06-27 03:57:31 -07:00
teknium1
08e131f77c test(telegram): cover bot self-message ingestion guard (#11905)
Regression tests for the self-author guard added in the salvaged fix:
- bot-authored DM-topic watcher echo is dropped (the exact #11905 symptom)
- bot self-messages dropped in groups/supergroups too
- other bots in the same chat are still processed (self-id, not is_bot)
- observe-unmentioned sibling path also rejects self-messages
- missing from_user does not crash

Test scaffolding ported from @cola-runner's PR #12817 and adapted to the
current plugins/platforms/telegram/adapter.py and _is_own_message().
2026-06-27 03:56:52 -07:00
Sahil-SS9
6fb25f86ac fix(telegram): filter out bot's own messages from inbound processing (#52363) 2026-06-27 03:56:52 -07:00
Teknium
68a65ed7a1 fix(agent_init): correct misleading sub-64K context_length error message (#53569)
The error raised when a model's context window is below the 64K minimum
advertised "or set model.context_length in config.yaml to override" — but
the guard intentionally has no sub-64K escape hatch. Sub-64K models are
rejected by design (tool schemas + system prompt need the headroom).

The misleading clause invited a cluster of dup PRs (#11097, #11110, #8962,
#9142, #37548) all trying to wire an override that we don't want. Reword to
state the real options: pick a >=64K model, or — if your local server
under-reports its true window — declare the real value (which must itself
be >=64K). Guard behavior is unchanged.
2026-06-27 03:56:25 -07:00
Teknium
d73078e7b0 fix(cron): make per-profile cron isolation intentional and tested (#4707) (#53570)
A profile's cron jobs now provably live in AND execute under that profile's
HERMES_HOME. A job authored under profile `coder` is stored at
`~/.hermes/profiles/coder/cron/jobs.json` and runs with coder's .env,
config.yaml, scripts and skills — never the default root's.

This was the de-facto behavior on main but only by accident: PR #50112 had
re-anchored cron storage at the shared default root, and a later stale-branch
squash merge (#52147) silently reverted it back to the profile home. Neither
direction was guarded by a test, so it could flip again on the next stale merge.

Changes:
- cron/jobs.py: document the per-profile storage anchor (get_hermes_home, NOT
  get_default_hermes_root) and why anchoring at the root leaks
  config/credentials/skills across profiles — the #4707 security boundary.
- cron/scheduler.py, cron/suggestions.py: same intent documented at the
  dynamic resolution helper and the suggestions store.
- tests/cron/test_cron_profile_isolation.py: pin storage, lock-path, and
  execution-home resolution to the active profile so a re-anchor can't regress.

Verified E2E: jobs created under two profiles land in separate per-profile
stores with zero cross-profile leakage and no shared-root store; scheduler
execution-home follows the active profile. Full cron suite: 576/576.
2026-06-27 03:55:01 -07:00
Bartok
864d5521ad test(curator): join straggler curator-review thread on fixture teardown
The curator_env fixture left async review threads (synchronous=False spawns
a daemon 'curator-review' thread that calls save_state() on completion)
running past test teardown. save_state() resolves the state path from
HERMES_HOME at write time, so a straggler could write into the next test's
tmp home, corrupting test_state_file_survives_corrupt_read (and others)
under CI load. Join the thread on teardown while HERMES_HOME is still
pinned to this test's home.
2026-06-27 03:52:52 -07:00
Bartok9
45ce35ed72 fix(agent): classify message-only 'overloaded' as server overload
Salvage of #14261 by @ms-alan — rebased onto current main, scoped to the
overloaded-classification fix, with a regression test that fails without it.
2026-06-27 03:52:52 -07:00
teknium1
151ae1e937 test(api-server): cover SSE failure finish_reason for both failure modes
Lock the contract that a clean stream-queue termination followed by an
agent failure never reports finish_reason: "stop". Covers the raised-
exception case (#12422 repro), the flagged failed-result case, truncation
(length), and the success happy path.

Follow-up to the salvaged #12504 fix from @flobo3.
2026-06-27 03:52:44 -07:00
flobo3
b8b695e2cd fix(api): surface agent crash in SSE chat completions stream 2026-06-27 03:52:44 -07:00
Teknium
f67c0b3e60 docs(hermes-agent skill): cover v0.13–v0.17 features, fix stale claims, tighten (#53566)
Refresh the hermes-agent skill against the last 5 major releases and the
current codebase, and cut verbose prose.

Coverage added (v0.13.0–v0.17.0):
- New gateway platforms: iMessage (Photon), Teams, LINE, SimpleX, ntfy,
  Google Chat, Raft, official WhatsApp Business Cloud API (now 20+).
- New surfaces section: desktop app, web dashboard admin panel,
  hermes proxy (OpenAI-compatible OAuth proxy), Automation Blueprints.
- delegate_task(background=true) async subagents; memory-tool atomic
  batch operations; session_search three-mode shape; x_search/video_analyze
  toolsets; image_gen image-to-image; xAI Grok via SuperGrok OAuth.
- display.interface (cli/tui), curator.consolidate opt-in, PyPI install.

Accuracy fixes:
- Adding-a-Tool is two files (auto-discovery), not three.
- Testing uses scripts/run_tests.sh (canonical runner), not bare pytest.
- Dropped change-detector test count and a dangling references/ pointer.
- Refreshed overview (Windows-native, 20+ providers, many surfaces).

Conciseness: trimmed over-explained Windows keybinding/sandbox/test prose
and deep prompt-builder internals to pointers.
2026-06-27 03:51:25 -07:00
Teknium
d3db73210c chore(release): map blaryx@gmail.com → Blaryxoff for PR #32602 salvage 2026-06-27 03:48:18 -07:00
blaryx
76af2456a2 fix(dashboard): merge PUT /api/config with existing on-disk config
The dashboard form is built from CONFIG_SCHEMA, which doesn't enumerate
every root-level key the YAML supports. Most visibly, `custom_providers`
is in `_KNOWN_ROOT_KEYS` but is absent from the schema — so the frontend
never sends it in the PUT body. The previous full-replace save() then
silently wiped the key from disk every time the user clicked anything
that triggered a save. Other casualties (less visible because defaults
re-mask them on load) include `agent.personalities`,
`agent.reasoning_effort`, `terminal.lifetime_seconds`, etc.

Fix: read the raw on-disk config and deep-merge the incoming PUT body
on top of it before saving. The frontend can only overwrite what it
explicitly sends; everything else is preserved verbatim.

Reuses the existing `_deep_merge` helper from `hermes_cli.config`.

Tests:
- `test_round_trip_preserves_custom_providers` exercises the exact bug:
  seed config with custom_providers, GET → drop the key → PUT,
  assert it's still on disk.
- `test_round_trip_preserves_schema_invisible_nested_keys` covers the
  shallow-vs-deep-merge case for nested dicts under `agent` etc.
Both fail on current main; both pass with this patch.
2026-06-27 03:48:18 -07:00
Teknium
ec769e49d2 fix(gateway): WhatsApp/Signal hints affirm markdown instead of forbidding it (#53564)
The 'whatsapp' and 'signal' PLATFORM_HINTS told the agent 'Please do not
use markdown as it does not render' — factually wrong. Both adapters
actively convert markdown to native formatting:

- whatsapp_common.format_message(): **bold**, ~~strike~~, # headers,
  links, code blocks -> WhatsApp native syntax
- signal_format.markdown_to_signal(): same conversions via bodyRanges,
  plus '- item' / '* item' bullets -> '• ' Unicode bullets

The wrong hint made the agent strip bullets and bold the adapter would
have rendered (#12224). Rewrote both hints to mirror whatsapp_cloud:
markdown is auto-converted, bullet lists work, tables are not supported.
Added a contract test asserting markdown-converting platforms never
forbid markdown in their hint.
2026-06-27 03:46:41 -07:00
teknium1
a5d1f68c74 refactor(moa): share one virtual-provider row builder across pickers
Follow-up on the gateway-picker salvage: the cherry-picked change added a
second copy of the MoA virtual-provider row in model_switch.py, duplicating
inventory._moa_provider_row (same slug/name/preset-models, identical extra
fields). Make _moa_provider_row take a bare current_provider string and reuse
it from the gateway picker path so the row shape lives in one place and the
two surfaces can't drift.
2026-06-27 03:43:38 -07:00
dodo-reach
ed54469d06 fix(gateway): show MoA presets in model picker 2026-06-27 03:43:38 -07:00
Teknium
789f8b7dc2 docs(webhook): clarify authenticated != trusted-content trust model (#53562)
HMAC validation authenticates the webhook sender, not the business
fields inside the payload (PR titles, commit messages, issue bodies),
which are authored by untrusted third parties. Expand the prompt-
injection section to make the trust boundary explicit: the agent's
capability surface, not the input channel. Document the hardening
levers (sandbox the runtime, scope the toolset, keep approvals on,
template narrowly) instead of pretending to sanitize untrusted text.

Refs #8820.
2026-06-27 03:43:33 -07:00
teknium1
4e0788783b refactor(gateway): extract MoA one-shot restore helper; restore #28686 comment; real-method tests
Follow-up on the salvaged MoA restore fix:
- Extract the finally-block restore into _restore_moa_one_shot() so the
  behavior is unit-testable without re-implementing it, and so the gateway
  /moa handler and the finally block share one implementation.
- Restore the load-bearing #28686 zombie-eviction comment above
  _release_running_agent_state that the original diff dropped.
- Rewrite the tests to call the real _restore_moa_one_shot helper (the
  originals re-implemented the restore logic inline, so they passed
  regardless of the production code).
2026-06-27 03:43:28 -07:00
srojk34
2f29e3cfc5 fix(gateway): restore MoA one-shot model override on failed turns
The MoA one-shot restore ran inside the try block after
_handle_message_with_agent returned. When that call raised an
exception (agent init failure, interpreter shutdown, OOM), the
restore was skipped and the MoA model override stayed permanently
on _session_model_overrides — silently routing all subsequent
messages through the MoA reference fan-out with no user-visible
indication.

Move the restore to the finally block so it fires on every exit
path (success, exception, interrupt). The restore data lives on
the per-turn event object and would be lost if not consumed here.
2026-06-27 03:43:28 -07:00
briandevans
17cb829991 test(moa): cover non-list/bare-dict reference_models normalization 2026-06-27 03:43:16 -07:00
briandevans
8dd4e576d0 fix(moa): tolerate non-list reference_models in hand-edited MoA preset config 2026-06-27 03:43:16 -07:00
Teknium
60f58a2b95 feat(verify-on-stop): default OFF, one-time migration, skip doc-only edits (#53552)
The verify-on-stop guard fired too eagerly — including on doc/markdown/skill
edits with nothing to verify, where it pushed a pointless /tmp verification
script. Three changes:

1. Default OFF for new installs: agent.verify_on_stop defaults to false
   (was the "auto" surface-aware sentinel). _config_version bumped 30 -> 31.
2. One-time migration (v30 -> v31): existing installs are switched off once,
   but only when the value is missing or still the "auto" sentinel — an
   explicit true/false the user set is preserved.
3. Path filter: build_verify_on_stop_nudge() now drops documentation/prose
   paths (.md/.mdx/.rst/.txt/LICENSE/CHANGELOG/...) so even when explicitly
   enabled, a doc-only turn never nudges. Mixed doc+code turns still nudge on
   the code paths.

The legacy "auto" sentinel is still honored when set explicitly (ON for
interactive coding surfaces, OFF for messaging). HERMES_VERIFY_ON_STOP env
override unchanged.
2026-06-27 03:23:22 -07:00
teknium1
29ee4bbff6 refactor(dashboard): tighten cron-job form helpers
Collapse the three near-identical optional-text helpers
(optionalText/optionalBaseUrl/listToText) into one optionalText with a
strip-trailing-slash flag, route listToText + toolsets through the
existing splitCronList, and replace the repeated
typeof x === 'string' ? x : '' ladders with a single asString helper.
Behavior-identical; all 16 vitest cases pass.
2026-06-27 03:20:32 -07:00
Versun
c655cdf2c1 feat(dashboard): expose cron job execution fields 2026-06-27 03:20:32 -07:00
teknium1
50f6855217 feat(moa): make /moa one-shot only; route preset switching through the model picker
/moa no longer does a sticky model switch. It now always runs a single
prompt through the default MoA preset and restores the prior model
afterward; the whole argument is the prompt (no preset-name matching).
To switch to a MoA preset for the session, select it from the model
picker, where presets already surface under a virtual Mixture of Agents
provider on every model-selection surface.

Also fixes #53444: the TUI one-shot only set session[model_override],
which the already-built cached agent ignored, so MoA silently never ran
and the turn used the original model. The TUI now does a real in-place
agent.switch_model() via _apply_model_switch() when a live agent exists
(with a proper restore after the turn), and falls back to a model_override
for lazy/unbuilt sessions.

Removes the redundant sticky-switch branch from the CLI, gateway, and TUI
/moa handlers; updates the command description, usage string, and docs.
2026-06-27 03:09:09 -07:00
teknium1
3cd4693494 chore: add DiamondEyesFox to AUTHOR_MAP for PR #53351 salvage 2026-06-27 03:04:26 -07:00
diamondeyesfox
8df231c941 fix(agent): rebaseline in-place compression flushes 2026-06-27 03:04:26 -07:00
Mahesh Sanikommu
1b75b3fd90 feat(memory): add Supermemory setup connection summary
Add post_setup() and get_status_config() to the Supermemory memory
provider so `hermes memory setup` and `hermes memory status` print a
one-line connection summary (container, profile fact count,
auto_recall/auto_capture). Point API-key onboarding at the Hermes
connect URL (app.supermemory.ai/integrations?connect=hermes).

Salvage of #52988. Two fixes folded in:

- Test isolation: the new probe/status tests mocked _SupermemoryClient
  but not the __import__("supermemory") guard inside
  _probe_supermemory_connection, so they passed only where the optional
  supermemory package was installed and failed on a clean checkout / CI
  (the PR shipped with red CI). Added _stub_supermemory_importable()
  mirroring the existing test_is_available_false_when_import_missing
  pattern; the suite now passes with supermemory absent.

- post_setup: `if api_key and api_key not in os.environ` checked whether
  the key's *value* named an env var (always false in practice). Fixed to
  compare the value: `os.environ.get("SUPERMEMORY_API_KEY") != api_key`.

Verified: 38/38 in test_supermemory_provider.py and the full
tests/plugins/memory/ suite green with supermemory not installed.

Closes #52988
2026-06-27 15:07:34 +05:30
underthestars-zhy
8827300267 fix(photon): correlate tapbacks to bot message context
Populate `reply_to_message_id`, `reply_to_text`, and
`reply_to_is_own_message` on reaction events so the gateway injects
`[Replying to your previous message: "..."]` when the agent receives
a tapback.

The sidecar now extracts a capped text preview from the hydrated
reaction target (plain text and mixed group messages; null for
attachment/voice-only targets), emitting it as `targetText` in the
NDJSON reaction payload. The Python adapter reads this field and sets
the reply correlation fields on the `MessageEvent`.
2026-06-27 00:51:34 -07:00
underthestars-zhy
4345b3e767 fix(photon): upgrade spectrum-ts sidecar to v8.0.0
v8 made `richlink` outbound-only; inbound rich links now arrive as
plain `text`. Remove the `getBalloonBundleId`/`toRichlinkMessage`
branches from the iMessage mapper patch and update the fixture,
lockfile, and README accordingly.
2026-06-27 00:51:34 -07:00
underthestars-zhy
5636c22828 feat(photon): upgrade spectrum-ts sidecar to v7.0.0
Update the Photon platform plugin's Node.js sidecar from spectrum-ts
3.1.0 to 7.0.0, which splits the SDK into scoped `@spectrum-ts/*`
packages with `spectrum-ts` as the umbrella re-export.

- Bump exact pin in package.json/package-lock.json to 7.0.0
- Update mixed-attachments patch script to target the new
  `@spectrum-ts/imessage/dist/index.js` path and tab-indented output
- Rewrite test fixture to match v7.x mapper shape (tab-indented,
  `const ... = async` declarations, single-line builder calls) and
  point at `@spectrum-ts/imessage/dist/index.js`
- Update README upgrade guide to document the v5 package split and
  the postinstall patch validation step
- Update comments in cli.py and index.mjs to reference v5/v7 changes
2026-06-27 00:51:34 -07:00
Teknium
d712a7fd73 fix(model-picker): surface the current custom/uncurated model in picker rows (#53457)
A model selected via the CLI (e.g. /model openrouter/<uncurated-name>) was
absent from every model picker — the main picker AND the MoA reference/
aggregator slot pickers — because each provider row only carried its curated
catalog. Inject the current model at the front of its provider's row so it is
selectable and shown everywhere.
2026-06-27 00:06:34 -07:00
Ben Barclay
fbf748b282 fix(dashboard-auth): follow redirects on self-hosted OIDC discovery (#53399)
The self-hosted OIDC provider fetched the discovery document with a bare
httpx.get(). httpx defaults to follow_redirects=False (unlike curl -L or
the requests library), so when an IDP answers GET
/.well-known/openid-configuration with a 3xx — Authentik canonicalises the
.well-known path, and any IDP behind a reverse proxy doing an http→https
upgrade redirects too — the bare redirect (empty body) tripped the
status != 200 guard and raised 'OIDC discovery returned 302', which
routes.py maps to the provider_unreachable audit event and a 503. The
browser surfaced 'Auth provider self-hosted unreachable'.

The user's smoking gun (curl -o writing zero bytes from inside the
container) is exactly a redirect with no body — the same wall the code hit.

Add follow_redirects=True to the discovery GET only. It's safe: the
issuer-pin check and _require_https_or_loopback still validate the resolved
document and every endpoint, so a redirect can't smuggle in a bad issuer or
a cleartext endpoint. The token/revocation POSTs deliberately keep the
no-follow default (they carry an auth code / refresh token and the endpoint
is already the canonical absolute URL).

Existing discovery tests mocked httpx.get with a canned 200 and never
exercised a real 3xx. Add a regression test that runs a real loopback
server returning a 302 on the .well-known path — fails without the fix
(ProviderError: discovery returned 302), passes with it.
2026-06-27 14:14:51 +10:00
ethernet
dd0e4ab81a change(ci): slice files in matrix job
avoid duplicating work, avoid file discovery on each job
2026-06-26 19:15:18 -07:00
ethernet
1a75387fa8 change(ci): log json decode error in durations 2026-06-26 19:15:18 -07:00
ethernet
707ae6e623 change(tests): don't count with pytest collect
it's way too slow. just grep files lol
2026-06-26 19:15:18 -07:00
ethernet
bcc3eb3419 fix(ci): rip out some xdist legacy stuff... how did these ever work?? 2026-06-26 19:15:18 -07:00
ethernet
2fa66950e8 change(ci): upload-artifact from v4 -> v7 2026-06-26 19:15:18 -07:00
ethernet
4b0a2040e7 change(ci): use run_tests in docker 2026-06-26 19:15:18 -07:00
ethernet
18f7ad49ab change(ci): update all UV installs 2026-06-26 19:15:18 -07:00
ethernet
f0cb049217 change(ci): migrate docker smoketests to real tests 2026-06-26 19:15:18 -07:00
ethernet
2bd17221b7 change(ci): pretty names 2026-06-26 19:15:18 -07:00
ethernet
9a861cd0ab change(tests): don't pass pytest args when counting tests 2026-06-26 19:15:18 -07:00
ethernet
447f9e7c89 change(nix): simpler dev setup 2026-06-26 19:15:18 -07:00
ethernet
8ae793d3de change(nix): ship fat hermes agent by default 2026-06-26 19:15:18 -07:00
ethernet
fb1dd1bf91 change(ci): docker-publish.yml -> docker.yml 2026-06-26 19:15:18 -07:00
ethernet
35dfe7b58f change(ci): docker runs again on PRs 2026-06-26 19:15:18 -07:00
ethernet
4cf69f0da4 refactor(ci): more test slices 2026-06-26 19:15:18 -07:00
ethernet
d4aec4e92f refactor(ci): run tests thru run_tests.sh 2026-06-26 19:15:18 -07:00
ethernet
c918d07b50 refactor(ci): rewrite docker tests to check built container 2026-06-26 19:15:18 -07:00
ethernet
638243726e refactor(ci): faster docker builds via --link and chmod removal 2026-06-26 19:15:18 -07:00
brooklyn!
f6e815e378 Merge pull request #53357 from helix4u/fix/desktop-titlebar-overlay 2026-06-26 20:42:11 -05:00
Gille
1bff85cf66 fix(desktop): keep titlebar overlay off session title 2026-06-26 19:18:26 -06:00
Nacho Avecilla
dbe734beff fix(dashboard-auth): exclude non-interactive providers from interactive login surfaces (#53239)
* Return None instead of erroring on drain login failure

* Fix login on drain

* Remove login for drained endpoints flow and clean the code

* chore: drop unrelated credits changes from this PR

* Remove extra comments that were not really necessary
2026-06-27 10:08:13 +10:00
brooklyn!
7a38d64a85 Merge pull request #53335 from NousResearch/bb/desktop-custom-model-blank-selector
fix(desktop): show custom (non-curated) model in Settings model pickers
2026-06-26 18:59:43 -05:00
Brooklyn Nicholson
a6ae179f43 fix(desktop): show custom (non-curated) model in Settings model pickers
A Radix <Select> renders a blank trigger when its `value` matches no
<SelectItem>. The Settings model pickers built their options solely from
each provider's curated `models` list, so a model added via config that
isn't in that list (e.g. anthropic/claude-opus-4.7 on nous) selected
nothing and showed an empty selector.

Union the active value into the options via a small `withActive` helper,
applied to the main, auxiliary, MoA reference, and MoA aggregator model
selects so the configured model always stays visible and selectable.
2026-06-26 18:55:44 -05:00
kshitijk4poor
7475d125d2 test(mcp): stub mcp_oauth in backgrounding test to deflake CI
The backgrounding-contract test (test_prepare_agent_startup_backgrounds_
blocking_mcp_for_chat) failed intermittently on loaded CI shards: it stubs
tools.mcp_tool.discover_mcp_tools but NOT tools.mcp_oauth, so the background
discovery thread paid the real, cold ~0.75s 'import tools.mcp_oauth' (added by
this PR's _discover_mcp_tools_without_interactive_oauth) before calling the
stubbed discovery. On a slow/loaded runner that import plus thread scheduling
exceeded the 1.0s polling deadline, leaving calls['mcp'] == 0.

Fix: stub tools.mcp_oauth with a nullcontext suppress_interactive_oauth (the
same no-op production falls back to when mcp_oauth is unavailable), so the
test exercises the backgrounding contract without paying an unrelated cold
import in its timing window. Bumped the poll deadline 1.0s -> 3.0s as
belt-and-suspenders. Production behaviour is unchanged; the import cost was
always off the main thread.

Verified: 5/5 pass repeatedly via scripts/run_tests.sh (per-file isolation,
matching CI), ruff clean.
2026-06-27 04:59:23 +05:30
zapabob
e55ddc3e33 fix(mcp): suppress interactive OAuth stdin prompts during background discovery (#35927)
When an MCP server requires OAuth, the interactive `hermes` TUI froze on
startup: background MCP discovery hit the OAuth flow, which on an interactive
TTY spawns a daemon thread doing a blocking `sys.stdin.readline()` (the
"paste the redirect URL" fallback in mcp_oauth._wait_for_callback). That
thread competes with the TUI's own stdin reader for the same terminal, so
keystrokes get swallowed and the TUI appears frozen (up to the 300s OAuth
timeout). Reported symptom: "MCP OAuth: authorization required / Open this URL
... the tui is freezing, not respond to typing."

Add a thread-local `suppress_interactive_oauth()` context manager in
tools/mcp_oauth.py; `_is_interactive()` returns False while it's active, so the
stdin paste-thread and prompt are never created. Background discovery
(hermes_cli/mcp_startup.py, tui_gateway/entry.py) now runs discovery inside
that context, so OAuth-requiring servers soft-skip (raise
OAuthNonInteractiveError, already handled) instead of stealing the TUI's stdin.
A real `hermes mcp login` on the main thread is unaffected (thread-local).

Salvaged from #35945 by @zapabob (authorship preserved via cherry-pick;
resolved a conflict against main's new mcp_discovery_timeout / wait_for_mcp_
discovery refactor, keeping both). Verified E2E: with suppression the paste
prompt is NOT printed and no stdin thread spawns (raises OAuthNonInteractive
soft-skip); without it the prompt shows (the freeze). Mutation-verified
(removing the suppress check in _is_interactive fails the regression test).
76 tests pass, ruff clean.

Closes #35927.

SELF-REVIEW FIX: the original #35945 used threading.local(), which does NOT
propagate to the dedicated mcp-event-loop thread where OAuth actually runs
(discover_mcp_tools dispatches the connect via run_coroutine_threadsafe), so
the suppression was a NO-OP in production (the tests passed only by stubbing
out the cross-thread dispatch). Converted to a contextvars.ContextVar, which
asyncio copies onto the scheduled coroutine — empirically verified suppression
now holds on the mcp-event-loop thread through the real _run_on_mcp_loop path.
Added a cross-thread regression test (fails on threading.local, passes on the
ContextVar) so the no-op can't regress.
2026-06-27 04:59:23 +05:30
briandevans
2d8c44ac87 fix(hermes-home): only honour legacy dir layout when it has content
get_hermes_dir(new_subpath, old_name) returned the legacy <old_name>/
location as soon as it existed on disk — even when empty. When an empty
legacy stub is created on a profile that already has populated data at
the new consolidated <new_subpath>/ (install scaffolds, profile init, a
stray mkdir, or ensure_hermes_home() recreating legacy dirs), the
resolver silently flipped to the empty legacy dir and the real data
became invisible. No log, no error — the feature behaved as if state was
wiped. Reproduced as a Discord pairing store losing every approved user
when an empty pairing/ shadowed the populated platforms/pairing/.

Resolve the legacy path only when it has content: a populated directory
(any entry) or a non-directory file counts; an empty directory falls
through to the new layout. Inspection failures (PermissionError on
lstat/iterdir, or any OSError short of FileNotFoundError) are treated as
"occupied" so a transient error never orphans legacy data — only a
genuine FileNotFoundError counts as absent. The lstat()-based gate also
fixes the prior exists()/is_dir() path swallowing PermissionError and
mis-reading an unreadable legacy dir as absent.

This hardens all 11+ call sites that share the resolver (pairing,
image/audio/video/document caches, matrix/whatsapp session stores,
vision/credential/tts/browser dirs).

Adds TestGetHermesDir regression coverage (empty/populated/subdir/file/
unreadable/unstatable cases) and updates test_credential_files to
populate its legacy dirs so they still count as content.

Closes #27602
Closes #27715
2026-06-27 04:57:15 +05:30
briandevans
c377e954fb test(gateway): isolate secret-redaction layer from provider-error rewrite
The existing test_chat_gateways_redact_secret_in_provider_error feeds a
provider-error envelope (HTTP 401), which _sanitize_gateway_final_response
rewrites wholesale to a generic category string. That rewrite strips the
secret regardless of whether the redaction layer works, so the test cannot
on its own prove _redact_gateway_user_facing_secrets is exercised.

Add test_chat_gateways_redact_secret_in_non_error_body: ordinary assistant
prose that echoes a bearer token but is NOT a provider-error envelope, so
the rewrite path does not fire and secret redaction is the only defense.
Verified fail-before (token leaks when _GATEWAY_SECRET_PATTERNS is emptied)
and pass-after across whatsapp/slack/signal/matrix, while non-secret prose
is preserved intact.
2026-06-27 04:47:10 +05:30
briandevans
57864d07ed fix(gateway): suppress operational status/error noise on all chat gateways, not just Telegram (#39293)
The Telegram noise/secret filter added in #28533 gated its work on
`_gateway_platform_value(platform) != "telegram"`, so
`_sanitize_gateway_final_response` and `_prepare_gateway_status_message`
only ran for Telegram. Every other human-facing chat surface
(WhatsApp, Discord, Slack, Signal, Matrix, plugin platforms, etc.)
received raw provider-error bodies verbatim — including any leaked
credentials the secret-redaction pass (`sk-…`, `Bearer …`, `gh[pousr]_…`,
`xox[baprs]-…`, `hf_…`, `glpat-…`) was meant to strip.

Invert the gate from a one-platform allowlist into a small
programmatic-surface denylist: only `local`, `api_server`, `webhook`,
and `msgraph_webhook` consume gateway text programmatically and keep raw
status/error text. Every other (chat) surface — including unknown/empty
platform values and on-demand plugin pseudo-members — fails closed to
the redacted, noise-filtered, sanitized path. This widens the same
root-cause fix to both call sites: status callbacks and final replies.
2026-06-27 04:47:10 +05:30
kshitijk4poor
244a6f2ceb fix(desktop): broken "Open setup guide" button for plugin platforms
On the desktop Channels / Messaging page, the "Open setup guide" button was
rendered as a bare <a href={platform.docs_url} target="_blank"> with no guard.
Plugin-provided platforms (Microsoft Teams, Google Chat, Line, Raft, Yuanbao,
…) ship an empty docs_url, so the anchor's href was "".

In a packaged build, Electron resolves an empty href against the current
document — the app's own index.html inside the asar bundle — and
shell.openPath then fails with an OS "file not found" dialog. This is exactly
the Windows error reported for Messaging → Teams → Open guide.

Fix (3 changes):

1. fix(desktop) — Only render the "Open setup guide" button when docs_url is
   non-empty, and route clicks through openExternalLink so a relative/empty
   value can never be treated as a local bundle path. Fixes the whole class
   (every plugin platform), not just Teams.

2. fix(messaging) — Give the Teams platform plugin a real docs_url (Microsoft
   Teams setup guide) so its card shows a working button instead of nothing.

3. fix(messaging) — Give the Google Chat platform plugin a real docs_url
   (Google Chat setup guide) so its card shows a working button instead of
   nothing. Originally from #48940; folded in here because that PR's test
   was broken (it queried the HTTP endpoint, but google_chat is a dynamic
   enum member that only appears after the adapter module is imported).

Test plan:
- apps/desktop — new src/app/messaging/index.test.tsx: button is hidden when
  docs_url is empty; a real URL opens via the validated external opener (does
  not navigate).
- apps/desktop typecheck (tsc --noEmit) clean.
- backend — test_teams_messaging_metadata_links_setup_guide: the Teams catalog
  entry exposes the setup-guide docs_url.
- backend — test_google_chat_messaging_metadata_links_setup_guide: the Google
  Chat catalog entry exposes the setup-guide docs_url.

Co-authored-by: xxxigm <tuancanhnguyen706@gmail.com>
Co-authored-by: p-andhika <andhika.prakasiwi@gmail.com>
2026-06-27 04:34:08 +05:30
kshitijk4poor
58919f68ab fix: also preserve provider selection on Esc-clear-filter path
The back() handler had the same filtered-index drift bug as the Enter
and Ctrl+D transitions: when the user presses Esc to clear an active
filter on the provider stage, providerIdx was reset to 0, losing the
highlighted provider. Apply the same providerIndexAfterClearingFilter
fix as the other three transition paths.

Also adds edge-case tests for the helper: undefined provider, slug not
found, empty rows, and duplicate slug first-match behavior.

Found by hermes-pr-review Phase 2 + hermes-agent-dev 3-agent review.
2026-06-27 04:33:48 +05:30
Donovan Yohan
386478211b fix(tui): preserve filtered model provider selection 2026-06-27 04:33:48 +05:30
kshitijk4poor
b0f44d3fad fix(gateway): remove process-global HERMES_SESSION_KEY write that misroutes approval prompts across concurrent sessions
GatewayRunner._run_agent's run_sync() wrote the per-turn session key to
the process-global os.environ["HERMES_SESSION_KEY"]. Because os.environ
is shared across the whole process, concurrent gateway sessions (e.g.
two Discord threads) clobbered each other's value. A tool worker thread
whose approval contextvar was unset then fell back to os.environ via
get_current_session_key() and read whichever session ran run_sync()
last — routing "Command Approval Required" prompts to the wrong thread.

Session routing is already concurrency-safe via contextvars:
- gateway/session_context.py _SESSION_KEY (set in set_session_vars)
- tools/approval.py _approval_session_key (set via set_current_session_key
  right before the agent runs, inherited by tool worker threads)

The only non-test readers of HERMES_SESSION_KEY (tools/approval.py,
tools/terminal_tool.py, tools/kanban_tools.py) all prefer the contextvar
with os.environ as a mere fallback. CLI/cron/TUI set their own os.environ
via separate export paths (e.g. the TUI parent exporting it into the
agent subprocess), so removing this in-process write does not affect them.

Adds regression tests asserting the resolver prefers the contextvar and
does not leak a concurrent session's cleared/clobbered os.environ value.

Closes #24100

Co-authored-by: Yosapol Jitrak <yosapol@jitrak.dev>
2026-06-27 04:31:37 +05:30
kshitijk4poor
cdb1dfbc49 fix: use os.pathsep, add tests, update tips for multi-root support
- Use os.pathsep instead of literal ':' so Windows paths (C:\dir) and
  the Windows separator ';' work correctly.
- Add 9 tests covering multi-root behavior: writes inside first/second
  root, writes outside all roots, trailing/leading/double separators,
  all-separators edge case, static deny priority, duplicate dedup.
- Update hermes_cli/tips.py tip string to mention multiple paths.
- Update docs to mention os.pathsep / ; on Windows.

Follow-up for salvaged PR #49557.
2026-06-27 04:01:12 +05:30
Zheng Tao
d15cc9bc83 docs: update HERMES_WRITE_SAFE_ROOT docs with multi-path format
Add note about colon-separated multiple directories support.
2026-06-27 04:01:12 +05:30
Zheng Tao
fa8f1517da feat(file_safety): support multiple HERMES_WRITE_SAFE_ROOT dirs
Supports multiple directories separated by ':' (Unix PATH-style).
E.g., HERMES_WRITE_SAFE_ROOT=/opt/data:/var/www/html

Fixes #49535
2026-06-27 04:01:12 +05:30
kshitijk4poor
a67ddf5983 fix: drop isinstance(str) guard so client.base_url fallback works with httpx.URL
The OpenAI SDK exposes client.base_url as an httpx.URL object, not str.
The isinstance(live_raw, str) guard made this branch dead code in
production. Use _normalized_runtime_url (which coerces via str()) so
the fallback actually fires.
2026-06-27 03:59:36 +05:30
xxxigm
2608f78b93 test(delegate): cover stale parent base_url inheritance for subagents
Add regression tests ensuring delegate_task passes the parent's active
localhost endpoint to child agents instead of a leftover OpenRouter URL.
2026-06-27 03:59:36 +05:30
xxxigm
25b7348457 fix(delegate): inherit subagent endpoint from parent active client
When parent_agent.base_url still carries a stale OpenRouter URL but the
live OpenAI client already points at local Ollama, subagents were routing
API calls to OpenRouter and failing with HTTP 401. Prefer _client_kwargs
and the mounted client base_url when they disagree with the surface field.
2026-06-27 03:59:36 +05:30
kshitijk4poor
6326d5c6f6 fix: remove duplicated table renderer from Telegram adapter
The PR's original refactor commit only replaced the primitives (regex,
is_table_row, split_markdown_table_row) with shared imports but left the
verbatim-copied renderer (_render_table_block_for_telegram) and driver
(_wrap_markdown_tables) in place. Both are logic-identical to the shared
convert_table_to_bullets in gateway/platforms/helpers.py.

Replace both with a direct import alias. _TABLE_SEPARATOR_RE is still
imported separately because it's used by the rich-message routing logic
(lines 1024, 1044) to detect whether content contains tables.

Found by 3-agent parallel code-reuse review.
2026-06-27 03:57:24 +05:30
Yashiel Sookdeo
24a4df9cd1 refactor(telegram): import shared table-detection primitives from helpers.py
Replace local _TABLE_SEPARATOR_RE, _is_table_row, and
_split_markdown_table_row with imports from the shared module.
Telegram-specific rendering stays local.

Co-authored-by: Yashiel Sookdeo <yashiel@skyner.co.za>
2026-06-27 03:57:24 +05:30
Yashiel Sookdeo
cf7bf5bdc9 fix(discord): auto-convert markdown tables to bullet groups
Discord does not render GFM pipe tables — raw pipe characters display
as garbage text. format_message now rewrites tables into bold-heading +
bullet groups using the shared helpers.

Fixes #21168

Co-authored-by: Yashiel Sookdeo <yashiel@skyner.co.za>
2026-06-27 03:57:24 +05:30
Yashiel Sookdeo
70c834a740 refactor: extract shared GFM table→bullet helpers into helpers.py
Move table-detection regex, row-splitting, and table-to-bullet
conversion into gateway/platforms/helpers.py so both Discord and
Telegram adapters can share them.

Co-authored-by: Yashiel Sookdeo <yashiel@skyner.co.za>
2026-06-27 03:57:24 +05:30
kshitij
9c9b28a2b3 Merge pull request #53296 from kshitijk4poor/chore/author-map-yashiels-45781
chore: AUTHOR_MAP — yashiel@skyner.co.za → yashiels
2026-06-27 03:47:07 +05:30
kshitijk4poor
5eb108f06c chore: AUTHOR_MAP — yashiel@skyner.co.za → yashiels
PR #53284 salvage (discord markdown table-to-bullet conversion; #21168)
2026-06-27 03:38:29 +05:30
Teknium
391090083c fix(desktop): persist MoA preset add/delete/set-default immediately (#53290)
The desktop MoA settings 'Add preset', 'Set default', and 'Delete' buttons
mutated local React state only and never called the save endpoint, so a newly
constructed preset vanished on refresh. Each now builds the next config and
calls saveMoa() so the change is written to config.yaml via PUT /api/model/moa.
2026-06-26 15:06:18 -07:00
Teknium
7e101e553b fix(moa): block the moa virtual provider as a reference or aggregator slot (#53281)
A MoA preset whose reference or aggregator slot points at the moa virtual
provider creates a recursive MoA tree. The runtime guards in moa_loop.py only
surface this mid-turn (references silently skipped, aggregator raises). Reject
it at the config chokepoint (_clean_slot) so it can never be saved, and hide it
from the desktop/dashboard slot pickers so it isn't offered as a dead choice.
2026-06-26 14:42:42 -07:00
liuhao1024
515192c4b9 fix(tools): use start_new_session instead of preexec_fn to prevent SIGSEGV in multi-threaded processes
preexec_fn=os.setsid runs Python code in the forked child before exec,
which is unsafe in multi-threaded processes (CPython docs). When the
Desktop gateway loads native libraries (onnxruntime, BLAS, provider SDKs)
with active thread pools, the fork can SIGSEGV before the child execs.

Replace all preexec_fn usage with start_new_session=True, which provides
the same setsid/process-group semantics without running Python in the
fork. This is already the pattern used throughout hermes_cli/gateway.py
and hermes_cli/_subprocess_compat.py.

Fixes #46789
2026-06-27 03:08:41 +05:30
srojk34
f0678b031e fix(moa): tolerate non-numeric values in hand-edited MoA preset config
_normalize_preset uses bare float() and int() to coerce
reference_temperature, aggregator_temperature, and max_tokens from
config.yaml.  When a user hand-edits a non-numeric value (e.g.
max_tokens: "8k" or reference_temperature: "hot"), the coercion raises
ValueError.  Since normalize_moa_config runs on every model-selection
and MoA turn (via resolve_moa_preset), the crash is unrecoverable and
blocks all MoA usage until the config is manually fixed.

Replace the bare casts with _coerce_float / _coerce_int helpers that
fall back to the default on TypeError/ValueError instead of raising.
2026-06-26 14:35:38 -07:00
Teknium
9b2af36d5a docs(moa): document prompt-caching behavior for references and aggregator (#53218)
* docs(moa): document prompt-caching behavior for references and aggregator

* docs(moa): clarify references preserve cache, only aggregator trades reuse

* docs(moa): correct caching prose — tail-append preserves aggregator cache too
2026-06-26 12:58:05 -07:00
Teknium
525e1e775d fix(skills): background review fork respects pinned skills (#53226)
The autonomous self-improvement review fork could still write to a pinned
skill — only external/bundled/hub-installed/protected-builtin skills were
guarded. The curator skips pinned skills from every auto-transition; the
review fork is the same kind of no-user-present actor and must too.

Adds a pin check to _background_review_write_guard so background-origin
edit/patch/delete/write_file/remove_file on a pinned skill are refused.
Stricter than the foreground _pinned_guard (delete-only) by design: with
no user in the loop there is no one to consent to an edit.

Fixes #25839
2026-06-26 12:49:33 -07:00
Nacho Avecilla
f509f6e598 fix(dashboard): offload PTY spawn/close off the event loop (#53227)
* Fix blocking tasks on the dashboard

* Remove unnecessary comments
2026-06-26 12:47:23 -07:00
Teknium
217047de2d fix(agent): silence verification-stop loop status line (#53223)
The verify-on-stop guard (#52296) printed '↻ Verification required before
finishing' to the terminal on every internal nudge turn, adding noise to
CLI/gateway sessions whenever code was edited without fresh passing checks.
Demote the user-facing status emit to a logger.debug breadcrumb — the loop
still nudges the model to verify before finishing, just silently.
2026-06-26 11:52:11 -07:00
briandevans
3c8d3ecfa0 fix(approval): extend gateway-lifecycle guard to launchctl and pidof-based kills
The dangerous-command approval layer already blocks `hermes gateway
(stop|restart)`, `pkill/killall hermes|gateway`, and `kill ... $(pgrep ...)`.
A reporter noted on #33071 that the agent can still achieve the same
effect by driving launchd directly against the gateway's service label
(`launchctl stop ai.hermes.gateway`, `launchctl kickstart -k
system/ai.hermes.gateway`, etc.) or by substituting `pidof` for `pgrep`
in the kill-expansion form.

This widens the "Gateway lifecycle protection" block in
`tools/approval.py` to cover both vectors:

- `launchctl (stop|kickstart|bootout|unload|kill|disable|remove)`
  scoped to commands that target a Hermes label (`hermes`,
  `ai.hermes`). Read-only inspection (`launchctl print …`,
  `launchctl list`) and operations against unrelated labels remain
  unflagged.
- `kill ... $(pidof …)` and the backtick form, alongside the existing
  `pgrep` expansion. `pidof` is the BSD/Linux equivalent and is
  equally opaque to the `(pkill|killall) … hermes` name pattern.

Intentionally left out of scope: plain `kill -TERM <numeric_pid>` with
a PID looked up out-of-band. Catching that would require runtime PID
state and would break the existing
`TestPgrepKillExpansion::test_safe_kill_pid_not_flagged` contract,
which guarantees that a plain literal-PID `kill 12345` stays safe.
2026-06-26 11:38:28 -07:00
ethernet
ba7026c376 feat(docs): clarify termux/nix as t2 platforoms 2026-06-26 11:37:56 -07:00
ethernet
772cf847b0 feat(docs): clarify platform support 2026-06-26 11:37:56 -07:00
ethernet
699adc2ca5 regen package-lock for integrity checks 2026-06-26 11:37:56 -07:00
brooklyn!
ed962104c8 Merge pull request #52935 from NousResearch/bb/desktop-inline-rendering
feat(desktop): inline rich embeds, diagrams & alerts in assistant markdown
2026-06-26 13:36:43 -05:00
Brooklyn Nicholson
db6ced4712 feat(desktop): consent gate for inline embeds (per-embed / per-service)
Embeds reach out to third parties on render, so default to a placeholder that
mirrors the tool-approval UX: "Load <service>" (this embed) or "Always allow
<service>" (persisted). A desktop-local store ($embedMode ask|always|off +
per-service allowlist) gates the fetch with zero gateway round-trip; an
Appearance setting controls the global default. Local renderers (mermaid, svg,
alerts) are never gated. Addresses review feedback on outbound third-party
requests.
2026-06-26 13:35:01 -05:00
Teknium
2d3071f9d4 docs(moa): clarify MoA presets are selectable on every surface (CLI, hermes model, Dashboard, Desktop, TUI) (#53211) 2026-06-26 11:16:14 -07:00
Teknium
9dd56f0dfb docs(moa): add HermesBench results to Mixture of Agents page (#53206) 2026-06-26 11:05:07 -07:00
Teknium
3d735fe156 fix(skills-hub): surface per-tap providers (NVIDIA/OpenAI/...) in runtime search (#53191)
Natural-language skill search returned a short, arbitrary list and never
surfaced NVIDIA (or OpenAI/Anthropic/HuggingFace) skills. Two causes:

1. The runtime index collapses every GitHub tap into source="github", so
   there was no way to find or filter by provider at the CLI — the per-tap
   identity only existed in the docs-site catalog.
2. HermesIndexSource.search matched only name/description/tags (not the
   identifier or provider) and broke at the first `limit` hits in raw index
   order, burying the most relevant skills. `search` also defaulted to
   --limit 10 against an 86k-entry catalog.

Changes:
- GitHubSource stamps a per-tap provider label (extra.provider) on each
  skill via github_provider_for(); source stays "github" so dedup/floor/
  index-skip logic is untouched. Flows into the built index.
- HermesIndexSource.search now matches identifier + provider too, and
  collect-then-ranks (exact > prefix > whole-word > substring) instead of
  break-at-limit.
- --source nvidia|openai|anthropic|huggingface|voltagent|gstack|minimax
  provider filters for browse/search (narrows merged results by provider).
- search --limit default 10 -> 25; table Source column shows the provider
  label for github skills.

Tested: 181 unit tests pass; E2E against the live runtime index confirms
'nvidia'/'cuda' searches now surface NVIDIA-provider skills and
--source nvidia narrows to exactly the NVIDIA catalog.
2026-06-26 11:04:41 -07:00
Teknium
d430684d7c fix(gateway,windows): respawn gateway windowless after GUI update (#52239)
The post-update gateway restart path relaunched the gateway with the
venv's console `python.exe` (via `get_python_path()` in
`_gateway_run_args_for_profile`). On Windows this leaves a terminal
window open permanently: uv's `venv\Scripts\python.exe` is a launcher
shim that re-execs the *base* console interpreter, which allocates its
own conhost — and `CREATE_NO_WINDOW` cannot suppress that second window.
The clean-start path (`_spawn_detached`) already dodges this by routing
through `_resolve_detached_python` to use the windowless base
`pythonw.exe`; the restart watcher did not.

Symptom (reported on Windows 11): after an in-app GUI update, a console
window for the gateway stays open and never closes. Confirmed on the
reporter's box — the running gateway was `python.exe ... gateway run
--replace` with a live conhost child and the foreground "Press Ctrl+C to
stop" banner, born exactly at the update's "Restarting Windows gateway"
log line.

Fix:
- Add `gateway_windows.windowless_gateway_restart_spec(run_argv)` which
  rewrites a console-python gateway argv into the windowless `pythonw.exe`
  equivalent and returns the cwd + env overlay (VIRTUAL_ENV / PYTHONPATH /
  HERMES_HOME) the base interpreter needs to import `hermes_cli` without
  the venv launcher's site config. No-op on POSIX.
- `_spawn_gateway_restart_watcher` now applies that rewrite on Windows and
  threads cwd= / env= into the inlined respawn Popen. Covers both restart
  entry points (`launch_detached_profile_gateway_restart` and
  `launch_detached_gateway_restart_by_cmdline`). CREATE_NO_WINDOW |
  DETACHED_PROCESS | CREATE_BREAKAWAY_FROM_JOB and the breakaway-denied
  fallback are all preserved.

Verified E2E on a real Windows 11 box: drove the actual watcher against a
dummy old-pid; the respawned gateway came up as `pythonw.exe` (zero
console python, no conhost child) and booted fully (housekeeping + kanban
dispatcher started → imports resolved under the base interpreter).

Tests: TestWindowlessGatewayRestartSpec (behavior) +
TestGatewayDetachedWatcherWindowsFlags regression assert. Pre-existing
Linux-only failures on a Windows host (SIGKILL, systemd, docker-root)
confirmed identical on the bare base.
2026-06-26 17:39:46 +00:00
Teknium
bb6a4d2a57 docs(nix): mark Nix/NixOS as no longer explicitly supported (#52975)
Add a deprecation banner to the top of the dedicated Nix & NixOS setup
guide and consistency notes at the Nix sections of installation, updating,
and the plugin-distribution guide. Nix is now best-effort only; the
supported install paths are the curl|bash installer, Docker, and Windows.
2026-06-26 10:17:43 -07:00
kyssta-exe
c0568ca95f fix(config): use read_raw_config() in migrations to prevent expanding defaults (#40821) 2026-06-26 22:40:52 +05:30
brooklyn!
5cc4009deb Merge pull request #52828 from helix4u/fix/desktop-backend-update-indicator
fix(desktop): show remote backend updates without counts
2026-06-26 11:49:07 -05:00
kshitij
5038678647 Merge pull request #53110 from NousResearch/salvage/42203-find-shell-prefer-usershell
fix(terminal): prefer $SHELL over bash for background process spawning (#42203)
2026-06-26 21:11:57 +05:30
liuhao1024
d9f1f1a1de fix(terminal): prefer $SHELL over bash for background process spawning (#42203)
On macOS, terminal(background=true) silently failed: the process returned a
session_id and exit_code=0 but the command never ran (empty stdout, no side
effects). Root cause is two interacting issues:

1. _find_shell was aliased to _find_bash, which prefers `shutil.which("bash")`
   → /bin/bash (GNU bash 3.2, still shipped on macOS) over $SHELL (/bin/zsh).
2. process_registry.spawn_local runs [shell, "-lic", "set +m; <cmd>"] with
   stdin=/dev/null. bash 3.2 as a login shell sources ~/.bash_profile, which on
   many macOS setups contains `exec /bin/zsh -l`; that exec replaces bash but
   drops the -c argument, so the command is swallowed (exit 0, no output).

Decouple _find_shell from _find_bash: _find_shell now prefers the user's
configured $SHELL on POSIX (the shell they actually log in with), falling back
to _find_bash when $SHELL is unset/missing. _find_bash is unchanged, so callers
that genuinely need bash (e.g. the _run_bash login-shell snapshot) keep bash
semantics. zsh handles -lic correctly even with redirected stdin.

Salvaged from #42219 by @liuhao1024 (authorship preserved via cherry-pick).
On top of the original (8 unit tests covering $SHELL-set/unset/missing/empty,
Windows-ignores-$SHELL, _find_bash-unchanged), added an E2E regression test
that reproduces the real bash-3.2 login-shell swallow (exit 0 / no file) and
asserts the shell _find_shell selects actually executes a -lic background
command. Mutation-verified: reverting _find_shell to the bash alias fails the
$SHELL-preference test. Bug reproduced directly: /bin/bash 3.2 -lic with a
.bash_profile->exec-zsh creates no file; zsh -lic does.

Closes #42203. Supersedes #42290.
2026-06-26 20:45:32 +05:30
xxxigm
65be0061e0 fix(hermes): heal broken managed Node tree instead of PATH fallback
When a Hermes-managed node/npm/npx shim exists but fails --version, redownload
the pinned nodejs.org bundle under HERMES_HOME/node and retry. Do not fall
back to system npm on PATH when a managed tree is present.

POSIX heal probes node, npm, and npx (npm can break while node still runs).
2026-06-26 20:10:20 +05:30
xxxigm
3c5bcd3eee test(hermes): cover broken managed npm fallback in node resolution
Add POSIX runnable-probe coverage plus Windows fallback wiring that skips
a managed npm.cmd when node_tool_runnable rejects it.
2026-06-26 20:10:20 +05:30
xxxigm
9274f73e48 fix(hermes): fall back when managed node/npm fails health probe
A stale or partial Hermes-managed Node tree under the active HERMES_HOME
can leave bin/npm behind while lib/cli.js is missing. File-existence checks
alone made hermes update pick that broken npm and skip healthy system npm
on PATH. Probe managed candidates with --version before preferring them.
2026-06-26 20:10:20 +05:30
kshitij
7b2c51152a Merge pull request #52990 from NousResearch/salvage/52889-backup-projects-kanban
fix(backup): include projects.db and kanban boards in pre-update snapshot (#52889)
2026-06-26 20:09:15 +05:30
0xDevNinja
9ef49cd78f fix(backup): include projects.db, kanban boards, and sibling stores in pre-update snapshot (#52889)
projects.db (per-profile project store) and kanban.db were missing from
_QUICK_STATE_FILES, so the pre-update quick snapshot never backed them up.
On a desktop upgrade, when the update flow removes/replaces the file and the
post-update schema-init re-creates an empty one, all user-created projects,
folder mappings, the active-project pointer, kanban board bindings, and tasks
vanish silently — no error.

Add the per-profile user-created stores to the snapshot set:
- projects.db               — project store
- response_store.db         — gateway conversation history / tool payloads (WAL)
- memory_store.db           — holographic memory facts/entities (WAL)
- verification_evidence.db  — agent verification audit trail
- kanban.db                 — default board (back-compat <root>/kanban.db)
- kanban/boards             — non-default boards (<root>/kanban/boards/<slug>/kanban.db
                              + metadata); workspaces/ and attachments/ subtrees
                              are skipped as large + regenerable.

Also: the directory-branch of create_quick_snapshot now routes *.db through the
WAL-safe _safe_copy_db (SQLite backup() API), matching the top-level file path —
previously a non-default board DB with an open WAL could be copied inconsistently.

Salvaged from #52930 by @0xDevNinja (authorship preserved via cherry-pick).
On top of the original (which covered only projects.db + the default kanban.db),
this adds: non-default-board coverage, the three sibling per-profile DBs that
meet the same upgrade-wipe criteria, WAL-safe directory copies, and a
workspaces/attachments skip to avoid snapshot bloat (×20 retained). 8 tests,
all mutation-verified; E2E verified snapshot→wipe→restore preserves all six
store types on the real code path.

Closes #52889. Supersedes #52930.
2026-06-26 19:23:33 +05:30
Ben
8ab7246c45 fix(gateway): stamp drain marker with instantiation epoch so a durable-volume restart clears it (NS-570)
The external-drain marker .drain_request.json is written under HERMES_HOME,
which on Hermes Cloud is a persistent Fly volume (/opt/data). A begin-drain
marker therefore SURVIVES the post-update machine restart. But the disruptive
lifecycle actions a drain protects (auto-update / image migrate / env edit /
profile change) all restart the machine — which is exactly the signal the drain
is over. The freshly-restarted gateway re-read the orphaned marker on its
startup reconcile and parked itself back in 'draining', refusing every new turn
indefinitely (NS-570: ~52 min until manually cleared).

Fix: stamp the marker with an identity of THIS container/VM instantiation
(kernel boot_id + PID 1 start time, read from /proc) and treat a marker whose
epoch differs from the current instantiation as absent. A deliberate restart →
new PID 1 → new epoch → stale marker ignored → gateway boots 'running'. A marker
written during the current instantiation (the live drain) still matches; an s6
respawn of just the gateway (PID 1/init unchanged) keeps the same epoch, so an
in-flight drain is still honoured (D4a reversibility preserved).

The staleness check is lenient and never fail-closed: a legacy marker with no
epoch, a corrupt/contentless marker, or an environment with no /proc (epoch
unavailable) all degrade to the original presence-only behaviour. NAS is
untouched — it only ever POSTs begin/cancel-drain over HTTP; the marker file is
purely gateway-internal IPC.

The fix is entirely within gateway/drain_control.py; the watcher and the
dashboard endpoint go through the same drain_requested()/write_drain_request()
chokepoints and need no functional change.
2026-06-26 18:59:41 +05:30
Dr1985
e3db1ef92d fix(macos): clearly distinguish launchd supervision from detached fallback in gateway status
## Description

On macOS 26.x, `launchctl bootstrap` and `launchctl kickstart` return exit code 5 ("Input/output error"), which Hermes already anticipates and handles by spawning a detached fallback process. However, the gateway status reporting is ambiguous:

- `gateway status` says "Gateway service is loaded" (because `launchctl list` returns exit 0)
- But `launchctl print` shows `state = not running` — launchd isn't actually supervising anything
- The detached fallback PID running is invisible to the status command
- Users can't tell whether auto-start at login and auto-restart on crash are available

### Root Cause

Two problems in `hermes_cli/gateway.py`:

1. **`_probe_launchd_service_running()`** (line 1067): Determined launchd service liveness solely by `launchctl list <label>` exit code. On macOS 26, this returns 0 even when the service is only *registered* but not running (output lacks a `"PID"` field). This caused `GatewayRuntimeSnapshot.service_running = True` incorrectly, which suppressed the process/service mismatch warning.

2. **`launchd_status()`** (line 3569): Used the same binary "loaded/not loaded" check without inspecting whether launchd actually has a PID, whether a detached fallback is running, or whether auto-start/restart are available.

### Changes

**`hermes_cli/gateway.py`:**

1. **New `_parse_launchd_pid_from_list_output()` helper** — Extracts the PID from `launchctl list` output. When launchd is actively supervising, the output includes `"PID" = <number>;`. When only registered but not running, no PID field is present.

2. **Fixed `_probe_launchd_service_running()`** — Now requires a PID in the `launchctl list` output to confirm launchd is actually supervising. This correctly sets `service_running = False` when launchd has the service registered but `state = not running`, which triggers the existing process/service mismatch detection.

3. **Reworked `launchd_status()`** — Reports clearly separated information:
   - LaunchAgent plist currentness (stale or current)
   - Whether launchd is actively supervising (with PID)
   - Whether a detached fallback PID is running
   - Whether auto-start at login and auto-restart on crash are available
   - When launchd supervision is known to be unavailable, explains why

4. **Persistent unsupported marker** (`~/.hermes/.gateway-launchd-unsupported`) — Written when `_launchd_fallback_to_detached()` is called (launchd exit 5/125). Allows `launchd_status()` to explain *why* launchd can't supervise even when no fallback process is currently running. Cleared automatically when a future bootstrap/kickstart succeeds (e.g., after an OS update fixes the issue).

5. **Updated `_print_gateway_process_mismatch()`** — Distinguishes the managed detached fallback from a genuinely manual `nohup hermes gateway run`, providing accurate guidance for each case.

### Status Output Examples

**Before** (macOS 26, fallback active):
```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
✓ Gateway service is loaded
{
    "Label" = "ai.hermes.gateway";
    "OnDemand" = true;
    ...
};
```

**After** (macOS 26, fallback active):
```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
⚠ Gateway service is registered but launchd is not supervising it
  launchd cannot manage the gateway on this macOS version.
✓ Detached fallback process is running (PID 12345)
  Cron jobs will fire. Stop with: hermes gateway stop
  ⚠ Auto-start at login and auto-restart on crash are NOT available.
```

**After** (normal launchd supervision):
```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
✓ Gateway is supervised by launchd (PID 12345)
  Auto-start at login and auto-restart on crash are available.
```

### Tests

Updated 5 existing tests and added 11 new tests in `tests/hermes_cli/test_gateway_service.py`:
- PID parsing from `launchctl list` output (with PID, without PID, empty, unquoted PID)
- `_probe_launchd_service_running()` requires PID presence
- Unsupport marker lifecycle (write, clear, persist across fallback)
- Marker cleared on successful bootstrap
- `launchd_status()` reporting: supervised, fallback-running, fallback-unavailable
- Existing fallback tests now verify marker creation

### Related Issues

- Issue #23387 (original macOS 26 launchd workaround)
- Issue #42524 (this issue)
2026-06-26 16:30:30 +05:30
kshitij
1c832762a8 Merge pull request #52983 from kshitijk4poor/chore/author-map-dr1985
chore: add Dr1985 to AUTHOR_MAP for launchd salvage
2026-06-26 16:29:22 +05:30
kyssta-exe
07cc567dfa fix(security): add circuit breaker for tirith crashes to prevent agent hangs (#41400) 2026-06-26 15:26:08 +05:30
brooklyn!
ca82d0accc Merge pull request #52993 from NousResearch/bb/desktop-clarify-redesign
feat(desktop): redesign the clarify prompt + fix its awaiting-input states
2026-06-26 03:57:43 -05:00
Brooklyn Nicholson
54b50037e1 fix(desktop): treat a pending prompt as paused-on-you, not working
A clarify/approval/sudo/secret prompt blocks the turn on the user, but the UI
treated it as an in-flight turn: the "thinking" timer kept ticking and Esc
interrupted the run — discarding a question you might want to come back to. Add
$activeSessionAwaitingInput (the pet's awaitingInput concept, scoped to the
active session) and use it to suppress the stall indicator and disarm Esc while a
prompt waits. Clear the session's prompts (and needsInput) on Stop and on turn
end so a resolved/aborted turn can't leave a dead panel or a stuck "needs input"
dot.
2026-06-26 03:55:34 -05:00
Brooklyn Nicholson
8559246bfb feat(desktop): rebuild the clarify prompt to match the chat UI
The inline clarify panel used its own card tokens, an animated ring, and
oversized spacing — out of step with every other tool row. Rebuild it on the
shared --ui-*/--conversation-* tokens: a compact panel, letter-key badges
(A/B/C…) that double as a/b/c… shortcuts, an inline content-sizing "Other" field
(CSS field-sizing — no view swap, no layout shift on focus), and a Continue
button so picking an option selects rather than auto-sends. Selection lives on
the letter badge alone (solid primary; outlined while Other is focused-but-empty).
Also settle the panel into the standard tool block once the turn stops running,
so a stopped turn no longer strands a live, unanswerable prompt.
2026-06-26 03:55:29 -05:00
kshitij
1aa458a1e6 Merge pull request #52920 from NousResearch/salvage/38798-toolset-validation
fix(config): surface invalid platform_toolsets instead of silently dropping tools (#38798)
2026-06-26 14:14:55 +05:30
Brooklyn Nicholson
da0ed979fa feat(desktop): zoomable primitive — open full, pan/zoom, copy
Add a content-agnostic Zoomable primitive (useZoomPan hook + overlay viewer):
click to open full-screen, wheel-zoom toward the cursor, drag to pan, toolbar
zoom/reset, and an optional copy action. Wire Mermaid diagrams into it with
copy-as-PNG; reusable for other inline content later.
2026-06-26 03:40:49 -05:00
kshitijk4poor
05ba5f3962 chore: add Dr1985 to AUTHOR_MAP for launchd salvage (#42567) 2026-06-26 14:09:11 +05:30
lEWFkRAD
41ede84b93 fix(config): surface invalid platform_toolsets instead of silently dropping tools (#38798)
A config migration (or hand-edit) that leaves an invalid toolset name in
`platform_toolsets` — e.g. the #38798 corruption that rewrote `hermes-cli` to
the non-existent `hermes` — silently disabled all affected tools:
resolve_toolset() returns [] for an unknown name, so the agent quietly lost its
tools with no error, warning, or log entry and degraded to text-only replies.

Surface it loudly at two points:
- After migration (migrate_config): validate platform_toolsets and record/print
  a warning per unknown name, with a `hermes-<platform>` suggestion when that
  would have been valid (the exact #38798 shape).
- At runtime (_get_platform_tools): if a platform was explicitly configured but
  every toolset name is invalid, log a warning when tools are resolved for a
  session — so an ALREADY-corrupted config is caught at startup, not only on the
  next `hermes update`.

Logic lives in a new pure, side-effect-free helper (toolset_validation.py) with
validate_toolset injected, so it is unit-testable without the tool registry.

Note: the original v25→v26 migration that caused the corruption no longer
exists (config format is now v30; no migration step rewrites toolset names).
This change is the durable defense against the silent-failure mode regardless
of cause, matching the issue's "Expected: log a warning".

Salvaged from #39207 by @lEWFkRAD (authorship preserved via cherry-pick).
Tests: 9 helper cases (incl. the #38798 corruption shape, mixed valid/invalid,
zero-tools state, non-dict/scalar/non-string) + a runtime caplog test — both the
helper warning and the runtime guard mutation-verified to fail without the fix.

Closes #38798. Supersedes #39581 (prevent-in-v25→v26 — that path is gone),
#41006 / #40208 (repair-migration for already-corrupted configs).
2026-06-26 14:07:43 +05:30
Brooklyn Nicholson
e36d9862ec feat(desktop): render embeds, fences and alerts in assistant markdown
Wire the embeds module into the markdown surface: bare provider autolinks unfurl
to inline embeds, ```mermaid/```svg fences route to the rich renderers, and
`> [!NOTE]`-style blockquotes become alert callouts. Labeled links stay plain.
2026-06-26 03:22:14 -05:00
Brooklyn Nicholson
0c190083cd feat(desktop): lazy embed renderers + fenced diagrams/alerts
Per-kind renderers, each a lazy split chunk: plain-iframe video/maps (wheel
chains to the transcript; maps gate scroll behind ⌘), the in-document
blockquote-script path for X/Instagram, the dark Spotify player, and the
YouTube iframe. Adds Mermaid and DOMPurify-sanitised SVG fences and GFM alert
callouts, all sized to 33dvh and theme-matched to avoid white color-scheme
artifacts. Main-process stamps a Referer on YouTube embed requests.
2026-06-26 03:22:08 -05:00
Brooklyn Nicholson
81ac562bf0 feat(desktop): inline embed detection + module primitives
Pure, synchronous URL→descriptor matchers for YouTube, Vimeo, Instagram,
Pinterest, TikTok, X, Spotify, Google Maps and OpenStreetMap, plus the shared
embed primitives (error boundary, fail card, escape-html, dark-mode hook,
sizing token). Declares the mermaid + dompurify deps used by the fenced
renderers.
2026-06-26 03:21:59 -05:00
helix4u
063fe4f6ef fix(auxiliary): fallback on invalid provider responses 2026-06-26 13:49:46 +05:30
teknium1
fbfccbb3ee fix(security): align cron invisible-unicode set with install-time scanner
The cron runtime tripwire (_scan_cron_prompt) used a 10-char invisible-unicode
set while the install-time scanner (threat_patterns.INVISIBLE_CHARS) flags 17.
The cron-local set was missing U+2062-U+2064 (invisible math operators) and
U+2066-U+2069 (directional isolates), so a directive obfuscated with one of
those codepoints (e.g. "ig<U+2063>nore all previous instructions") slipped past
the runtime cron gate while being caught at install time.

Import the canonical set so the cron tripwire and install scanner can't drift
apart again. Emoji-ZWJ protection (_zwj_has_emoji_neighbour) is unchanged.

Fixes #35075

Co-authored-by: rlaope <piyrw9754@gmail.com>
2026-06-26 01:11:11 -07:00
Shannon Sands
a0dc92450b Split dashboard PTY reconnect tests 2026-06-26 01:06:02 -07:00
Shannon Sands
41f8126148 Reconnect dashboard PTY chat after socket drops 2026-06-26 01:06:02 -07:00
Shannon Sands
6a319f570f Settle TUI resume scroll after hydration 2026-06-26 01:05:26 -07:00
Teknium
619dc4a561 fix(whatsapp_cloud): resolve reply-to text so the agent sees reply context (#52957)
Replies on WhatsApp Cloud arrived at the agent with reply_to_id set but
reply_to_text=None, so run.py never injected the "[Replying to: ...]"
disambiguation prefix (it gates on reply_to_text). Meta's webhook context
object carries only the quoted message's id, never its text.

Index (chat_id, wamid) -> text in rich_sent_store on every inbound message
and every outbound text send -- the same store that solved the identical
Telegram rich-send problem -- then look up the quoted text in
_build_message_event_from_cloud and populate reply_to_text plus
reply_to_is_own_message, derived from context.from versus the business
number.
2026-06-26 01:05:05 -07:00
Ben
19b2624404 feat(gateway): external drain trigger + accept-gating (begin/cancel + control channel)
Tasks 2.1 + 2.2 + 2.3 of the safe-shutdown plan — the reversible
quiesce-without-restart machinery NAS drives during a lifecycle action (D4a).
These ship together because the endpoint, the control channel, and the gateway
state machine are one coherent slice.

2.2 — control channel (gateway/drain_control.py, new):
The dashboard has no HTTP path into a running gateway (guardrails: "there is NO
external control channel into a running gateway"); restart/drain is driven only
by markers the gateway reacts to. So begin/cancel-drain writes/removes a
presence-based marker .drain_request.json (HERMES_HOME-scoped, atomic write,
never-raises read; a corrupt marker reads as present-contentless → fail-safe
toward quiescing). This is Q-B option A.

2.2 — gateway state machine (gateway/run.py):
- _external_drain_active flag, DISTINCT from the shutdown _draining flag: this
  one does NOT exit the process and is fully reversible.
- _enter_external_drain / _exit_external_drain: idempotent transitions that
  flip gateway_state→draining / →running via _update_runtime_status (preserving
  the live active_agents count). exit refuses to revert to running during a
  real shutdown or after the loop stops (shutdown wins).
- _drain_control_watcher: 1s background task (modelled on _handoff_watcher)
  reconciling accept-state with the marker; honours a marker that survived a
  restart on its first tick. Registered alongside the other watchers in start.
- New-turn accept gate in _handle_message, placed BEFORE the session-slot
  claim: when draining, refuse to START a new turn (so active_agents can only
  fall → no TOCTOU race), while in-flight turns finish untouched. Internal/
  system events (restart-recovery replays, bg-process completions) bypass it.

2.1 — endpoint (hermes_cli/web_server.py):
POST /api/gateway/drain {action: drain|cancel}. Authenticated by the Task-2.0a
token seam (the drain plugin registered this exact path as a token route);
attributes the request to the verified token principal. Begin writes the
marker, cancel removes it — the gateway process owns the actual transition.
Force-override (D6) is NOT here; it maps onto the existing immediate
/api/gateway/restart force path.

Tests (mocked — necessary-not-sufficient; the HARD live gate Q-B is next):
- tests/gateway/test_external_drain_control.py — marker contract (write/clear/
  read/corrupt/atomic), state machine (enter/exit/idempotency/shutdown-wins/
  loop-stopped), watcher reconcile-enter-then-exit, new-turn refusal, and
  in-flight-not-interrupted. 15 tests.
- tests/hermes_cli/test_web_server.py — /api/gateway/drain begin/default-begin/
  cancel/cancel-idempotent/bad-action-400. 6 tests.
- dashboard.drain_auth config section already added in 2.0b commit.

All touched suites green: 301 (gateway+auth) + 9 (web_server endpoints) passed.

Intentionally deferred:
- HARD live-validation gate (Q-B): real isolated `hermes gateway run`, drive a
  real begin-drain marker, prove the 5-point checklist a–e.
- Spec-doc status flip + Phase-2 PR.

Build status: external-drain, restart-drain, status, dashboard-auth, drain-plugin,
token-auth, and web_server-endpoint suites green.
2026-06-26 00:47:19 -07:00
Ben
2e322466b1 feat(dashboard-auth): drain shared-bearer-secret provider plugin
Task 2.0b: the concrete shared-bearer-secret auth provider, the FIRST consumer
of the generic token-auth capability (Task 2.0a). Implements decisions.md Q-A.

plugins/dashboard_auth/drain/ (bundled, discovered like dashboard_auth/basic):
- DrainSecretProvider: non-interactive provider, supports_token=True. Verifies
  an inbound Authorization bearer token against a per-agent shared secret with
  hmac.compare_digest (constant-time, no timing oracle) and, on a match,
  vouches for the caller as the "drain-control" principal scoped to "drain".
  The five interactive ABC methods raise NotImplementedError; verify_session
  returns None (stacks harmlessly in the cookie-verify loop).
- assess_secret_strength(): fail-closed entropy gate. Rejects secrets shorter
  than 43 url-safe-b64 chars (~256 bits), with < 16 distinct characters, or
  below 128 bits Shannon entropy — so a weak/structured/repeated secret can
  never be silently accepted. Enforced both at register() (friendly skip
  reason) and in __init__ (raises — defence in depth).
- register(ctx): no-op + skip reason when HERMES_DASHBOARD_DRAIN_SECRET is
  unset; rejects a weak secret fail-closed (drain endpoint stays gated). On a
  strong secret, registers the provider AND opts /api/gateway/drain into the
  generic token-auth seam via register_token_route().

Config: the secret is a CREDENTIAL → carried via HERMES_DASHBOARD_DRAIN_SECRET
(per-agent, provisioned by NAS at deploy). Behavioural knobs only
(dashboard.drain_auth.{scope,min_secret_chars}) live in config.yaml — added to
DEFAULT_CONFIG with the .env-is-for-secrets rationale documented inline.

Tests: tests/plugins/dashboard_auth/test_drain_provider.py — entropy gate
(strong pass; empty/short/repeated/few-distinct/custom-min reject), verify_token
(match → scoped principal, wrong/empty → None, custom scope), protocol
compliance, interactive-methods-raise, and register() (skip-no-secret,
fail-closed-weak-secret, strong-env-secret registers + route opt-in, config
scope + min_secret_chars). 21 new tests; drain + token-auth suites 44 passed.
Verified the plugin is discovered as dashboard_auth/drain alongside basic/nous.

Intentionally deferred:
- The begin/cancel-drain endpoint handler itself — Task 2.1.
- The dashboard→gateway control channel — Task 2.2.

Build status: dashboard-auth + drain-plugin suites green.
2026-06-26 00:47:19 -07:00
Ben
cb9cb6ba1c feat(dashboard-auth): generic non-interactive API-token capability
Task 2.0a of the safe-shutdown drain-coordination plan. Widens the dashboard
auth framework GENERICALLY to support non-interactive (service-to-service)
bearer-token auth, mirroring the existing supports_password precedent. This is
a reusable capability — any future machine-credential provider plugs in without
core changes (decisions.md Q-C). The drain bearer-secret plugin (Task 2.0b) is
the first consumer, not the definition.

- base.py: add TokenPrincipal dataclass (the token analog of Session) +
  supports_token capability flag + verify_token() on the ABC (default raises
  NotImplementedError so a misconfigured provider fails loud). Contract mirrors
  verify_session stacking: return None for unrecognised tokens (never raise),
  raise ProviderError only on a genuine backing-store outage.
- registry.py: list_token_providers() — the supports_token subset, in
  registration order. Empty when none registered (token routes fail closed).
- token_auth.py (new): route-agnostic seam. Routes opt in via
  register_token_route(exact path); token_auth_middleware owns the auth
  decision for those routes only — authenticate via stacked providers, attach
  request.state.token_principal + token_authenticated, pass through. 401 on
  missing/unrecognised token, 503 when a provider was unreachable, untouched
  passthrough for non-token routes. Fails closed (never open).
- web_server.py: install the seam OUTERMOST (registered last → runs first).
  Both downstream gates (legacy auth_middleware + gated_auth_middleware) honour
  request.state.token_authenticated and skip enforcement, so a token-authed
  service request is never bounced to /login.
- audit.py: TOKEN_AUTH_SUCCESS / TOKEN_AUTH_FAILURE events.

Tests: tests/hermes_cli/test_dashboard_token_auth.py — ABC flag default,
verify_token NotImplementedError, registry filter, bearer extraction
(case-insensitive scheme, malformed/non-bearer → ""), provider stacking
(first-match-wins, unreachable-remembered, unreachable-then-valid, buggy
provider doesn't crash the gate), and the seam's passthrough/401/503/
fail-closed behaviour. 29 new tests; full dashboard-auth suite 169 passed.

Intentionally deferred:
- The concrete shared-bearer-secret provider plugin — Task 2.0b.
- The begin/cancel-drain endpoint that registers itself as a token route —
  Task 2.1.

Build status: dashboard-auth + plugin-hook suites green.
2026-06-26 00:47:19 -07:00
Teknium
099df3cd89 fix(security): stop blocking AGENTS.md/SOUL.md that name an agent 'Praxis' (#52925)
The known_c2_framework threat pattern included 'praxis' in its
alternation alongside genuine offensive-security tool brands (Cobalt
Strike, Sliver, Havoc, Mythic, Metasploit, Brainworm). Unlike those
distinctive brand names, 'praxis' is a common English word (Greek for
practice/action) and a legitimate agent name, so any context file that
mentioned an agent named Praxis matched at 'context' scope and the whole
AGENTS.md / SOUL.md was replaced with a [BLOCKED] placeholder before it
reached the system prompt.

Remove 'praxis' from the alternation and add a guard comment: every
token in this list must be a distinctive tool brand, not a common word.
Real C2 brands still fire.
2026-06-26 00:36:01 -07:00
teknium1
4d0dd6bd52 test(mcp): make invalid_client tests interactive under hermetic env
The new _maybe_flag_poisoned_client tests built a provider via
get_or_build_provider without an interactive stdin. Under the hermetic
test env (no TTY, no cached tokens), the non-interactive guard in
mcp_oauth_manager._make_provider raised OAuthNonInteractiveError before
the provider was built, failing 6 tests in CI parity (they passed
locally where stdin was a TTY).

Thread monkeypatch into _provider_with_token_endpoint and present an
interactive stdin, matching the sibling test_manager_builds_hermes_provider_subclass.
2026-06-26 00:35:27 -07:00
Max Hsu
075f93ad78 fix(mcp): auto-recover from invalid_client on stale OAuth client registration
Fixes #36767.

Two complementary recoveries for the recurring "delete three cache files and
re-auth by hand" ritual when an MCP server's dynamically-registered OAuth
client goes dead server-side (IdP redeploy / DB wipe / rebrand):

- Auto-heal (token-endpoint subset): HermesMCPOAuthProvider now sniffs
  auth-flow responses and, on a 400/401 `invalid_client` from the discovered
  token endpoint, backs up + deletes `<server>.client.json` and `.meta.json`
  and clears the in-memory client so the SDK re-runs RFC 7591 dynamic client
  registration on the next flow. Conservative by construction: only
  dynamically-registered (non config-supplied) clients, only the token
  endpoint, only on a word-boundary `invalid_client` match (so RFC 7591's
  `invalid_client_metadata` does not trip it); best-effort so a miss never
  breaks the live flow. Covers both code-exchange and refresh when the token
  endpoint was discovered. Tokens are preserved.

- `hermes mcp reauth [<name>|--all]`: the reporter's primary symptom — the
  IdP's in-browser "Redirect URI Mismatch" — produces no HTTP signal (the SDK
  only sees a callback timeout), so it cannot be auto-detected. The new
  command re-auths one or ALL `auth: oauth` servers, serially: one browser
  flow at a time, which also fixes the startup popup storm when several
  servers are stale at once. Single-server reauth is factored out of
  `mcp login` and shared.

Tests: +14 (poison helper x2; token-endpoint detection x5 incl. wrong-endpoint,
success-response, pre-registered, and invalid_client_metadata negative guards;
a bridge integration test driving the real async_auth_flow generator to prove
the detection hook preserves the bidirectional asend() forwarding contract;
reauth CLI x6). Verified against the pinned mcp==1.26.0: scripts/run_tests.sh
122/122 green for the touched suites; check-windows-footguns.py and ruff clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-26 00:35:27 -07:00
Ben Barclay
6e4e5967f7 feat(relay): multi-platform-per-agent — list identity, provision-loop, N-hello, per-frame egress (Phase 1.5) (#52830)
Cut over the agent half of Shape A (D-Q1.5a/b.1/c) to front a SET of platforms on
one relay WS:

- relay_platform_identities() parses GATEWAY_RELAY_PLATFORMS (list) +
  GATEWAY_RELAY_BOT_IDS (JSON keyed map {platform:{botId,username?}}). Cut over
  from the scalar GATEWAY_RELAY_PLATFORM/_BOT_ID (no fallback, D-Q1.5c).
- self_provision_relay() loops one /relay/provision per platform under one
  gatewayId+secret, partial-failure-tolerant.
- WebSocketRelayTransport takes the identity SET, sends one hello per identity
  (connector accumulates the advertised set), and stamps the per-frame
  OutboundFrame.platform + its matching advertised botId on outbound.
- RelayAdapter remembers each chat's underlying source.platform (mirroring the
  existing guild/dm scope capture) and tags the reply's egress platform.
- send_relay_policy() declares one relevance policy per fronted platform (the
  connector keys policy by (tenant,platform,instanceId)).

Single-platform deploys are byte-identical on the wire (1-element list, no per-frame
tag -> connector session-default fallback). typecheck/ruff clean; relay unit 221 pass
(+10 new); all 15 cross-repo E2E drivers green vs connector origin/main.
2026-06-26 17:32:46 +10:00
brooklyn!
a2b49e60b6 Merge pull request #52412 from GodsBoy/fix/verify-on-stop-messaging-surface-leak
fix(agent): gate verify-on-stop nudge off for messaging surfaces
2026-06-26 02:30:08 -05:00
kshitij
7d568293f9 Merge pull request #52891 from kshitijk4poor/salvage/52623-aux-host
fix(auxiliary): gate Anthropic base_url override on Anthropic-compatible host (#52608)
2026-06-26 12:24:19 +05:30
konsisumer
3cf900eb67 fix(install): discard managed lockfile churn before stashing 2026-06-25 23:49:11 -07:00
Ben Barclay
cb7d1f68f8 fix(relay): accept is_reconnect kwarg in RelayAdapter.connect (#52911)
The gateway reconnect watcher (gateway/run.py) recovers a platform after a
fatal adapter error by building a fresh adapter and calling
connect(is_reconnect=True). Every BasePlatformAdapter implements
connect(*, is_reconnect: bool = False) for this — except RelayAdapter, whose
connect() was bare. So the watcher's recovery path raised:

    TypeError: connect() got an unexpected keyword argument 'is_reconnect'

Observed live on a hosted staging agent: after a fatal relay adapter error the
watcher could never re-establish relay, so the shared-bot inbound never reached
the gateway and Discord DMs stopped (dashboard surfaced the TypeError).

Relay deliberately ignores the flag: the #46621 server-side-queue-preservation
concern doesn't apply, because relay's outage buffer is the connector's durable
buffer (replayed on the transport's re-handshake), not a gateway-side queue the
adapter owns. Routine WS drops are already handled by the transport's own
reconnect supervisor (WebSocketRelayTransport, reconnect=True); the watcher path
is fatal-error recovery, and the fatal handler disconnect()s the old adapter
(cancelling its supervisor) before a fresh adapter+transport is built, so there
is no double-dial.

Adds two regression tests (both proven red without the fix): connect(is_reconnect=True)
reaches the same transport-less RuntimeError instead of TypeError, and the
signature matches BasePlatformAdapter.connect.
2026-06-26 16:46:09 +10:00
brooklyn!
0f81b0d458 Merge pull request #52901 from NousResearch/bb/desktop-tui-lint-fixes
style(desktop,tui): fix all lint/type/formatting issues
2026-06-26 01:07:14 -05:00
Brooklyn Nicholson
62fe9fd101 style(desktop,tui): fix all lint/type/formatting issues
Bring apps/desktop and ui-tui to a clean state for typecheck, eslint,
and prettier:

- Run prettier across both trees (printWidth/wrap drift; prettier is not
  CI-enforced for these JS projects, so main had accumulated drift).
- Apply eslint --fix for padding-line-between-statements and perfectionist
  import/export sorting.
- Manual fixes for non-auto-fixable rules:
  - remove unused node:net import in electron/main.cjs (uses Electron net)
  - replace inline `typeof import(...)` annotations with top-level
    `import type * as EnvModule` in two ui-tui test files
  - scoped eslint-disable no-control-regex on intentional sentinel/ANSI
    regexes (mathUnicode.ts, text.ts)
  - resolve react-hooks/exhaustive-deps per-case: correct swapped/missing
    deps, collapse redundant session.* members, and justified disables on
    settings mount-only data-load effects to preserve run-once behavior

No behavior changes; test pass/fail counts are unchanged from the main
baseline.
2026-06-26 01:04:33 -05:00
Moonsong
4e66bf1f80 fix(auxiliary): gate Anthropic base_url override on Anthropic-compatible host (#52608)
When operator config has provider=anthropic with model.base_url pointing
at a non-Anthropic host (e.g. https://openrouter.ai/api/v1 with provider=anthropic),
the auxiliary Anthropic path was unconditionally applying that override.
Main-session traffic routed correctly because the main path attaches the
right credential for the actual destination, but every side-channel call
(memory extractors, reflection, vision, title generation, janus
extractor/promise) sent ANTHROPIC_API_KEY to the foreign host and 401'd.

Gate the override on hostname == api.anthropic.com. Operators routing main
through a non-Anthropic provider must use that provider's own auxiliary
client; the Anthropic aux path now stays pointed at api.anthropic.com.

Regression tests cover openrouter, openai, anthropic-with-path, empty, and
anthropic-default-base_url cases.
2026-06-26 11:21:05 +05:30
kshitij
7aa32ec82f Merge pull request #52888 from kshitijk4poor/chore/author-map-tranquil-flow
chore: add Tranquil-Flow to AUTHOR_MAP for auxiliary base_url salvage
2026-06-26 11:20:54 +05:30
brooklyn!
2b86e9dae4 Merge pull request #52877 from NousResearch/bb/pet-scale-gesture
feat(desktop): Alt+wheel to scale the pet, never cropped
2026-06-26 00:45:07 -05:00
Brooklyn Nicholson
bf60bbb6c5 refactor(desktop): collapse overlay zoom-anchor math 2026-06-26 00:42:19 -05:00
kshitijk4poor
fe255ab28b chore: add Tranquil-Flow to AUTHOR_MAP for auxiliary base_url salvage (#52623) 2026-06-26 11:11:33 +05:30
Brooklyn Nicholson
7d1b72a15d feat(desktop): zoom the pet toward the cursor
Alt+wheel now scales about the pixel under the pointer instead of growing from a
corner, so the pet stays put under the cursor instead of running away. In-window
shifts its top-left; the overlay repositions its OS window (cursor-anchored on
wheel, bottom-center for slider-driven changes).
2026-06-26 00:37:45 -05:00
brooklyn!
6ba551e942 Merge pull request #52871 from NousResearch/bb/fix-tui-interrupt-queued
fix(tui-gateway): make stop interrupt queued turns
2026-06-26 00:37:13 -05:00
Brooklyn Nicholson
dd980aaba1 feat(desktop): Alt+wheel to scale the pet, never cropped
Hold Alt/Option and scroll over the mascot to resize it (same on Mac and
Windows); the modifier keeps a plain scroll passing through to the page. The
gesture drives the same `display.pet.scale` path as the settings slider.

The popped-out overlay grows its OS window to fit the pet at any scale (anchored
bottom-center) so the sprite is never clipped by the window edge, and the
in-window pet re-clamps against its actual size so growing near an edge can't
crop it. Also makes the overlay click-through per-pixel: only solid sprite
pixels (plus bubble / mail button) are interactive, transparent margins pass
clicks through.
2026-06-26 00:33:22 -05:00
Brooklyn Nicholson
594380d44a fix(tui): make stop interrupt queued desktop turns
Ensure TUI/desktop stop targets the actual conversation thread and cancels any queued next prompt, including the lazy agent-start window, so a stopped session cannot keep running or restart itself.
2026-06-26 00:31:06 -05:00
kshitij
a28b939092 Merge pull request #52678 from kshitijk4poor/salvage/52502-fuzzy-boundary
fix(fuzzy-match): preserve boundary space after whitespace-normalized match (#52491)
2026-06-26 10:59:14 +05:30
DavidMetcalfe
27c486e3b1 feat(agent): apply per-reasoning-model stale-timeout floor in stream + non-stream detectors
Wire get_reasoning_stale_timeout_floor() into both stale detectors so known
reasoning models (Nemotron 3 Ultra, OpenAI o1/o3, Opus 4.x thinking, DeepSeek
R1, Qwen QwQ, Grok reasoning) tolerate multi-minute thinking phases instead of
the upstream gateway idle-killing the socket (BrokenPipeError) before first
token. Applied as max(default, floor) — never overrides explicit user config,
never lowers an existing threshold.

The reasoning_timeouts.py allowlist module already landed on main via #52795,
so this salvage carries only the wiring + tests (the duplicate module and the
stale-base MoA reverts from the original PR branch are dropped).

Salvaged from #52238. Fixes #52217.
2026-06-25 22:12:06 -07:00
brooklyn!
f4c656b0a0 Merge pull request #52854 from NousResearch/bb/fix-interrupt-partial-reply
fix(interrupt): keep partial streamed reply when stopped mid-response
2026-06-26 00:04:37 -05:00
teknium1
4d04c652f2 fix(curator): make external-skill write guard actually fire during curation
The salvaged #51875 added a background-review write guard in skill_manage
that refuses mutations to skills.external_dirs skills — but it only fires
when is_background_review() is true. The curator's LLM review fork ran with
the default _memory_write_origin='assistant_tool', so the guard never
triggered during the exact curation pass it exists to protect against
(GH-47688).

- Set _memory_write_origin='background_review' on the curator review fork so
  turn_context binds it onto the write-origin ContextVar and the guard fires.
- Add a regression test asserting the fork runs under the background_review
  origin (the invariant linking the fork to the guard).
- AUTHOR_MAP: map yu-xin-c for the salvaged commit.
2026-06-25 22:03:02 -07:00
yu-xin-c
96bc524a71 fix(curator): protect external skills from background curation 2026-06-25 22:03:02 -07:00
teknium1
eed9bbeb0a chore(release): add rebel0789 to AUTHOR_MAP for salvaged PR #47308 2026-06-25 22:02:22 -07:00
teknium1
6c58878e7d fix(browser): force secret-pattern redaction on browser_type display
Force redact_sensitive_text(force=True) on the browser_type text arg so
recognized credentials (API keys, tokens, JWTs) are masked in tool
progress, previews, callbacks, and return payloads even when the global
security.redact_secrets opt-out is set — a typed credential reaching chat
history is a security boundary, not log hygiene. Normal typed text matches
no pattern and stays fully readable for debuggability.

Tests assert the API-key-shaped secret is masked across every surface and
that normal text passes through unchanged.
2026-06-25 22:02:22 -07:00
rebel
8ff426e53b fix: redact browser typed text surfaces 2026-06-25 22:02:22 -07:00
brooklyn!
5add283ec8 Merge pull request #52833 from NousResearch/bb/wsl-desktop-fixes
fix(desktop): WSL2 clipboard paste, titlebar layout, HMR survival, and GPU acceleration
2026-06-25 23:55:42 -05:00
Brooklyn Nicholson
8233598e64 fix(interrupt): keep partial streamed reply when stopped mid-response
Stopping a turn while the model is streaming (stop/esc to redirect) raised
InterruptedError, set final_response to the throwaway "waiting for model
response" sentinel, and persisted messages WITHOUT the assistant text that
was already streamed to the screen. The next turn then had no record of the
half-finished reply, so the model appeared to "forget" what it just said.

Recover the on-screen text from _current_streamed_assistant_text in the
InterruptedError branch and append it as the assistant turn (and surface it
as final_response). The metadata sentinel is kept only when nothing was
streamed yet, preserving the ACP/client suppression behavior.

Completes the partial-stream recovery from 397eae5d9 (which wired the same
_current_streamed_assistant_text salvage into the connection-failure twin
but missed the user-interrupt path). The lossy handler dates to c98ee9852.
2026-06-25 23:54:20 -05:00
Brooklyn Nicholson
76074b2145 fix(desktop): transparent WCO titlebar chrome on Windows/WSLg
Use a transparent native overlay so renderer chrome shows through the min/max/close
band. Sync window pre-paint bg to the computed chrome mix.
2026-06-25 23:50:59 -05:00
Brooklyn Nicholson
3b1344c18c fix(desktop): WSL titlebar layout and WSL2 GPU acceleration
Live-measure WCO width in the renderer, drop the right rail below the titlebar
band, and re-enable GPU compositing under WSLg when /dev/dxg is present.
2026-06-25 23:50:59 -05:00
Brooklyn Nicholson
da5484b61f fix(desktop): WSL2 clipboard image paste + Linux titlebar overlay
WSLg bridges clipboard text but not images — pull host screenshots via
PowerShell. Disable titleBarOverlay on plain Linux; gate overlay width per
platform in titlebar-overlay-width.cjs.
2026-06-25 23:50:59 -05:00
Teknium
5b5c79a8ef feat(kanban): typed block reasons + unblock-loop breaker (#52848)
* feat(kanban): typed block reasons + unblock-loop breaker

Stops the kanban blocked-task loop: a worker blocks a task, a cron
unblocks it, the worker re-blocks for the same reason, repeat forever.

block_task now takes a typed kind and a persistent block_recurrences
counter on the tasks table:

- kind=dependency routes to todo (parent-gated, auto-resumed), never
  the human 'blocked' bucket a cron would keep unblocking.
- needs_input/capability/transient/untyped land in blocked; each
  same-cause re-block after an unblock increments block_recurrences,
  and at BLOCK_RECURRENCE_LIMIT (default 2) the task routes to triage
  for a human instead of blocked.
- unblock_task no longer resets block_recurrences (the amnesia that
  let the loop run unbounded); complete_task clears it on success.

Wired through the worker kanban_block tool (new kind arg) and the
hermes kanban block --kind CLI flag, both reporting where the task
actually landed. Docs + 11 new tests; 536 existing kanban tests green.

* test(kanban): make second-block notify test use a distinct block cause

test_notifier_second_blocked_delivers blocked the same task twice with
the same (untyped) reason, which now trips the new unblock-loop breaker
and routes the second block to triage instead of blocked — so only one
'blocked' notification fired. The test's actual intent is that TWO
distinct block cycles each notify; give the two cycles different kinds
(needs_input then capability) so they're genuinely separate blocks. The
same-cause loop→triage path is covered by test_kanban_block_kinds.py.
2026-06-25 21:46:58 -07:00
teknium1
43b8ba4181 fix(telegram): preserve Bot API update queue on watcher reconnect
After a prolonged outage the in-process network-error ladder escalates to
fatal and GatewayRunner._platform_reconnect_watcher rebuilds a fresh adapter
that reconnects through the bootstrap path. That path called
start_polling(drop_pending_updates=True), discarding every update Telegram
queued during the outage — all messages sent while the bot was down were
silently lost. The in-process ladder and 409-conflict handler already passed
drop_pending_updates=False; only bootstrap did not distinguish a cold first
boot from a reconnect.

Thread an is_reconnect signal from the watcher through
_connect_adapter_with_timeout into adapter.connect(). The base
BasePlatformAdapter.connect() gains a keyword-only is_reconnect=False so every
adapter inherits a tolerant signature (no per-platform breakage when the
runner forwards the kwarg). Telegram translates is_reconnect into
drop_pending_updates=not is_reconnect on both the polling and webhook bootstrap
calls. Cold boot still drops the stale queue; a watcher reconnect preserves it.

Fixes #46621.

Co-authored-by: annguyenNous <annguyen@nousresearch.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Co-authored-by: Kewe63 <Kewe63@users.noreply.github.com>
2026-06-25 21:29:57 -07:00
liuhao1024
f44415e71a fix(gateway): add init-time provider fallback to _make_agent
When the primary provider raises AuthError (e.g. expired OAuth token),
_make_agent now walks the configured fallback_providers/fallback_model
chain before giving up — matching the behavior that cron/scheduler.py
and cli_agent_setup_mixin.py already have.

Fixes #47627
2026-06-25 21:21:58 -07:00
Teknium
0b7128582f fix(state): detect and repair FTS write corruption that silently drops gateway history (#52798)
A readable state.db can still reject every message write through the
messages_fts* triggers when the FTS5 index is corrupt: base-table reads and
PRAGMA integrity_check pass, but INSERT INTO messages fails with 'database
disk image is malformed'. The gateway reloads conversation_history from disk
each turn, so a silently-failed write hands the next turn stale/empty history
even though the same cached AIAgent still holds the live transcript — causing
immediate same-session amnesia. (#50502)

- hermes_state.py: _db_opens_cleanly() now drives a rolled-back message write
  through the FTS triggers, so write-only corruption (which the read-only
  probe reported healthy) is detected. repair_state_db_schema() gains an
  in-place FTS5 'rebuild' strategy (tier 0) before the dedup/drop tiers, plus
  an already_healthy short-circuit. Both 'hermes sessions repair' and
  'hermes doctor' route through these, so the fix covers the whole class.
- hermes_cli/doctor.py: the state.db check runs the write-health probe even on
  the success (readable) path and repairs in place with --fix.
- gateway/run.py: _select_cached_agent_history() prefers the cached agent's
  longer live _session_messages over a shorter persisted transcript, so an
  FTS write failure can't wipe in-session context.
- tests: regressions for write-health detection, in-place repair preserving
  rows + resuming writes, the already_healthy shortcut, and the gateway guard.

Combines the approaches from #50504 (@0-CYBERDYNE-SYSTEMS-0, issue author),
#52165 (@davidgut1982), and #50576 (@trevorgordon981).
2026-06-25 21:18:41 -07:00
teknium1
85e084d60d fix(email): reject spoofed From: header for authorization (GHSA-rxqh-5572-8m77)
The email adapter authorized senders entirely off the From: header, which is
attacker-controlled and unauthenticated by IMAP. An attacker could forge
From: an-allowlisted-address and pass both the adapter's EMAIL_ALLOWED_USERS
pre-filter and the gateway's allowlist authz (both key on the same spoofable
sender_addr), getting unauthorized commands executed by the agent.

Verify the From: domain against the trusted Authentication-Results header the
receiving mail server stamps (SPF/DKIM/DMARC) before trusting it for
authorization. Enforced only when an allowlist is in effect and allow-all is
off — fail-closed. Operators whose server does not stamp the header can opt
out via platforms.email.require_authenticated_sender: false (or
EMAIL_TRUST_FROM_HEADER=true).
2026-06-25 21:11:02 -07:00
Ben Barclay
dedf5643d8 fix(gateway): scale-to-zero never armed — arm-gate counted disabled placeholder platforms (#52831)
The scale-to-zero idle watcher never started on a correctly-opted-in,
relay-only instance, so the gateway never ran its idle decision, never called
go_dormant(), and never sent going_idle to the connector. Fly's autostop still
suspended the machine on traffic-idle, but the connector never flipped the
instance to buffered-only — so an inbound DM took the live delivery path,
found no live session for the suspended machine, and was dropped fail-closed
with no wake poke. The machine slept and never woke.

Root cause: _scale_to_zero_should_arm() passed list(config.platforms.keys())
to messaging_is_relay_only_or_absent(). config.platforms is pre-seeded with a
DISABLED placeholder PlatformConfig for every known platform (telegram,
discord, slack, matrix, …), so the key set is always the full ~20-entry
catalog regardless of what the instance actually runs. The relay-only check
discarded "relay", saw the disabled placeholders as live direct-socket
platforms, and returned False — so should_arm() was False and the watcher was
never created. Verified live on a staging instance: config.platforms keys =
[telegram, discord, slack, mattermost, matrix, relay] with only relay
enabled=True; should_arm() = False.

Fix: filter config.platforms to ENABLED entries before the relay-only check,
mirroring the adapter-connect loop which already gates on
`if not platform_config.enabled: continue`. This arms off the same notion of
"active platform" the rest of start() already uses — no parallel concept.

Also add a one-line not-armed diagnostic: when an instance IS opted in (the
HERMES_SCALE_TO_ZERO stamp is set) but the watcher still doesn't arm, log why
(relay_only_or_absent, the enabled platforms, wake_url present/missing). A
non-opted instance stays silent. The arm path previously logged only on
success, so a failed arm was invisible.

Tests: the existing pure-helper tests passed bare names so they never
exercised the call site that feeds the placeholder-laden config. Add
behaviour-contract tests against the REAL _scale_to_zero_should_arm with a
realistic config.platforms (relay enabled + others disabled). The F25
regression test (relay-only + disabled placeholders must arm) and the
no-platform case are RED without this fix, GREEN with it; the
genuinely-enabled-direct-platform / not-opted-in / no-wake-url cases stay
correctly non-arming so the filter can't over-broaden.

Wake mechanism itself verified healthy independently (direct wakeUrl GET
resumed a suspended staging instance in 1.15s, clean resume signature).
2026-06-26 14:01:48 +10:00
helix4u
1c8594b634 fix(desktop): show remote backend updates without counts 2026-06-25 21:39:29 -06:00
Teknium
a4091e49f1 fix(auth): write rotated Codex/xAI pool grant through to global root (#48415) (#52760)
CredentialPool._sync_device_code_entry_to_auth_store rotated single-use
OAuth refresh tokens but wrote the new chain only into the active profile
store. When a profile resolves a grant from the global-root fallback
(read_credential_pool, #18594) and the pool then refreshes it, root was
left holding a now-revoked refresh token — every other profile reading the
stale root grant subsequently died with refresh_token_reused / invalid_grant
once its access token expired.

This is the credential-pool analog of #43589 (which fixed the non-pool xAI
refresh path in _save_xai_oauth_tokens). Detect the read-from-root case
(profile lacks its own providers.<id> block) BEFORE the profile save and,
after it, write the rotated chain back to the global root via a best-effort,
seat-belted write-through. A profile that genuinely shadows root (owns the
block) is untouched; classic mode (profile == root) is a no-op; a failed root
write never breaks the profile's own save. Covers openai-codex (reported),
xai-oauth, and nous through the shared sync path.
2026-06-25 19:14:06 -07:00
Harjoth Khara
233ef98afe fix(docker): skip symlinked stage2 chown targets (#52789)
Prevents stage2-hook.sh recursive chown from following a symlinked $HERMES_HOME/home (or profiles/cron) and destroying the host user's home directory. Also guards top-level state-file chowns and refuses first-boot seeding through symlinks. Fixes #52781.

Co-authored-by: harjoth <harjoth.khara@gmail.com>
2026-06-26 12:09:52 +10:00
teknium1
1abfa66ba6 chore(release): add DavidMetcalfe to AUTHOR_MAP for PR #52272 salvage 2026-06-25 19:00:48 -07:00
DavidMetcalfe
865a09a610 fix(agent): detect thinking-timeout for reasoning models and surface actionable guidance instead of misleading file-write advice
Two-part fix:

Part 1 (classifier override at agent/error_classifier.py:720-738):
A transport disconnect on a reasoning model — even on a large session —
now routes to FailoverReason.timeout instead of context_overflow. Without
this, large-session reasoning-model disconnects route to the compression
branch and silently delete conversation history on a phantom
context-length error. The override is strictly targeted: non-reasoning
models (gpt-4o, claude-3-5-sonnet, llama-3.3-70b, etc.) still route to
context_overflow on large sessions — the existing intentional behavior
for chat models whose proxy doesn't idle-kill during prefill/generation.

Part 2 (new agent/thinking_timeout_guidance.py + integration at
agent/conversation_loop.py:3488-3567):
New is_thinking_timeout() and build_thinking_timeout_guidance() helpers.
When a known reasoning model (NVIDIA Nemotron 3 Ultra, OpenAI o1/o3,
Anthropic Opus 4.x thinking, DeepSeek R1, Qwen QwQ, xAI Grok reasoning)
hits a transport-kill on a small session (classifier says timeout
directly) or after Part 1 routes correctly (large session), the user
now sees reasoning-specific guidance with three actionable workarounds
in priority order:

  1. Set providers.<provider>.models.<model>.stale_timeout_seconds: 900
     in ~/.hermes/config.yaml (Hermes's built-in floor is already 600s
     for known reasoning models; raise further if upstream is even
     tighter).
  2. Lower reasoning_budget or set reasoning_effort: medium on this
     model if the provider supports it.
  3. Use a smaller / faster reasoning model if the task doesn't
     require deep thinking.

The new guidance takes precedence via if/elif over the existing
_is_stream_drop block, so a reasoning-model user with a transport-kill
message sees actionable advice instead of the misleading "try
execute_code with Python's open() for large files" advice (which is
correct for the unrelated large-file-write stream-drop case but
actively wrong for the thinking-timeout case).

Verified:
- 478 tests passing across 9 directly-relevant files (49 new + 429
  existing, zero regressions).
- Ruff lint clean on all 4 modified/new files.
- Negative test: 6 parametrized regression guards confirm non-reasoning
  models still route to context_overflow on large sessions; 4
  parametrized gates confirm non-timeout classifier reasons never
  trigger the guidance; 5 parametrized cases confirm non-transport
  messages never trigger it.
- Regression guard: new guidance message does NOT contain
  "execute_code" or "open()" — the misleading advice is fully
  replaced, not appended alongside.
- Cross-vendor dual review via agy -p:
  - Gemini 3.5 Flash (Medium) — passed: true, zero blockers, one
    SHOULD-FIX (vprint block duplication — fixed by extracting
    detection into a helper module).
  - GPT-OSS 120B (Medium) — passed: true, zero blockers, two nits
    (test placement — adopted at tests/agent/test_thinking_timeout_guidance.py;
    primary-model capture — accepted as non-issue per Flash's nit).

Dependency note for maintainers:
This PR includes agent/reasoning_timeouts.py (the reasoning-model
allowlist module from PR #52238) because the Layer 1 override is
load-bearing on get_reasoning_stale_timeout_floor(). After PR #52238
lands on main, this PR's duplicate agent/reasoning_timeouts.py should
be rebased away. Either PR can land first; the other rebase is
mechanical.

Fixes #52271.
2026-06-25 19:00:48 -07:00
Teknium
811df74a10 fix(gateway): defer cross-process cache cleanup off the cache lock (#52197) (#52761)
The #45966 cross-process coherence guard popped the stale cached agent
and then called the blocking _cleanup_agent_resources (memory-provider
shutdown, tool-resource teardown, async-client teardown) while still
holding _agent_cache_lock, on the gateway event-loop thread. While that
ran, _sweep_idle_cached_agents (driven by _session_expiry_watcher)
blocked acquiring the same lock and the asyncio loop stalled for minutes,
tripping repeated Discord 'heartbeat blocked' warnings.

Fix mirrors the cap-enforcer / idle-sweep paths: pop the stale entry
under the lock, release it, then schedule the SOFT release on a daemon
thread. The soft path (_release_evicted_agent_soft) is also more correct
here than the hard teardown the regression used — the same session
rebuilds a fresh agent immediately after invalidation, so its terminal
sandbox / browser / bg processes (keyed on task_id) must be preserved
for the rebuilt agent to inherit, not torn down.

Verified the cross-process site was the only cleanup-under-lock instance;
the other _cleanup_agent_resources call sites run outside the lock.
2026-06-25 18:58:47 -07:00
teknium1
e29823f1e8 chore(release): map agt-user noreply email for #48496 salvage 2026-06-25 18:50:11 -07:00
Teknium
ce802e932c fix(telegram): heartbeat loop exits cleanly when bot has no get_me
CI shard test_telegram_conflict.py timed out (140s) because the new
_polling_heartbeat_loop, started by connect(), busy-spun under those
tests: they monkeypatch asyncio.sleep to instant and pass a bot double
with no get_me(), so the probe raised AttributeError (swallowed) and the
loop re-entered immediately with no real pacing, starving the event loop.

Guard the loop to return when bot.get_me is not callable — a real PTB Bot
always exposes it, so this only triggers on a torn-down app or a test
double, where there is nothing to probe. Also cancel the heartbeat task in
the conflict tests that call connect() without disconnect(), matching the
production disconnect() teardown.

Verified: test_telegram_conflict.py now runs in ~4.5s; the 22
heartbeat/reconnect tests still pass; E2E confirms a hanging get_me still
fires the reconnect ladder while a missing get_me exits without spinning.
2026-06-25 18:50:11 -07:00
agt-user
8501caf51f fix(telegram): persistent heartbeat loop to detect CLOSE-WAIT polling sockets
When a Telegram long-poll TCP socket enters CLOSE-WAIT (remote sent FIN
but httpx hasn't noticed), epoll still reports it readable so no
exception is raised. PTB's error_callback never fires, the reconnect
ladder never engages, and the gateway silently stops receiving messages
while the process stays alive — until a manual systemctl restart.

The existing recovery only covers two cases: error_callback-driven
reconnects (which require an exception PTB never gets) and a one-shot
_verify_polling_after_reconnect probe (which runs only right after an
explicit reconnect). A socket that wedges during steady-state operation
is never detected.

Add _polling_heartbeat_loop: a background asyncio.Task started in
connect() (polling mode only) that probes get_me() every 90s on the
general request pool (not the getUpdates pool, so healthy long-polls are
never interrupted). On asyncio.TimeoutError/OSError it hands off to the
existing _handle_polling_network_error ladder; other errors are
swallowed. disconnect() cancels and awaits the task. Worst-case
detection window ~105s.

Complementary to #51541 (general-pool keepalive limits / fd leak) — that
recycles idle pooled connections; this detects a wedged active read.

Fixes #48495

Co-authored-by: agt-user <267614622+agt-user@users.noreply.github.com>
2026-06-25 18:50:11 -07:00
liuhao1024
56cf517ccd fix(cron): detect partial job loss in restore_cron_jobs_if_emptied (#52144)
The desktop scheduler can overwrite cron/jobs.json with its own small
set of internally-tracked crons after an update/restart, causing
partial loss of tool-created cron jobs. The previous guard only
checked for total loss (live_count == 0), missing the case where
live_count > 0 but less than the pre-update snapshot count.

Compare live_count against snap_count instead of checking for zero,
so both total loss (0 vs N) and partial loss (1 vs 19) trigger
restoration.

Salvaged from #52161 by @liuhao1024.

Closes #52144
2026-06-25 18:49:18 -07:00
brooklyn!
6b639bc2b9 Merge pull request #52772 from NousResearch/bb/editor
feat(desktop): in-app spot editor for the file preview pane
2026-06-25 20:25:06 -05:00
brooklyn!
41f4dce828 Merge pull request #52756 from NousResearch/bb/delegate-bg-resume-ux
feat(delegation): calm "will resume" affordance for background delegate_task
2026-06-25 20:08:06 -05:00
Brooklyn Nicholson
985350dd85 feat(cli): note background delegate_task dispatch in _on_tool_complete
A top-level delegate_task dispatches in the background and re-enters as a
fresh turn when done. Print a one-line dispatch-time note — no spinner,
nothing to poll — so the idle prompt doesn't read as "nothing happened."
2026-06-25 19:57:58 -05:00
Brooklyn Nicholson
7f02f30b76 feat(tui): add width-budgeted "resumes when subagent finishes" status segment
When idle with a background subagent still in flight, append a tail status
segment spelling out that the agent resumes on its own. Width-budgeted like
every tail segment, so it drops first on a tight terminal where the ⛓ count
already carries the signal.
2026-06-25 19:57:58 -05:00
Brooklyn Nicholson
563d347e4d feat(desktop): show a calm "will resume" notice for background delegate_task
When idle with a top-level delegate_task still in flight, render a static,
shimmering system-note at the transcript tail instead of a spinner (which
reads as "stuck"). Reuses the shared steer / slash-status chrome (centered,
0.6875rem, muted, Codicon) so it sits in the thread like every other meta
line, and mirrors the primary child's latest stream line, falling back to
generic copy. i18n across en/ja/zh/zh-hant; markdown prose/heading rhythm
tuned so a re-entered turn breathes.
2026-06-25 19:57:51 -05:00
Brooklyn Nicholson
6e096a850a feat(desktop): add $backgroundResume store for parked delegate_task
Track top-level delegate_task work that dispatches in the background and
re-enters as a fresh turn. $backgroundResume returns {count, activity} for
the active session while idle — count of parked tasks plus the primary
child's latest stream line (tool/progress/thinking) when readable.
2026-06-25 19:57:45 -05:00
Brooklyn Nicholson
09623b4527 fix(desktop): make the tab modified dot amber with a separating ring
Use the app's amber warn color for the unsaved-edits tab dot (was inheriting
the label text color) and add a tab-bg ring + soft drop shadow so it stays
legible where it overlaps the filename.
2026-06-25 19:55:31 -05:00
Brooklyn Nicholson
c456029b4e Merge remote-tracking branch 'origin/main' into bb/editor 2026-06-25 19:53:31 -05:00
Brooklyn Nicholson
1f950e189c feat(desktop): vertical resize for the bottom-row terminal pane
Extends the pane store with heightOverride (alongside widthOverride) and a
get/set/clear API, and wires the pane shell + desktop controller so the
bottom-row terminal pane can be resized on the Y axis with its size persisted.
2026-06-25 19:50:29 -05:00
Brooklyn Nicholson
ff81365988 feat(desktop): in-app spot editor for the file preview pane
Adds a CodeMirror 6 spot editor to the right-rail file preview so users can
make quick edits in-app without leaving for an IDE. Entering edit mode is a
pure in-place swap of the read view — same fixed-height header, same gutter
geometry/typography (mirrors SourceView 1:1) so nothing shifts — toggled via
the Edit button, a bare `e` when the pane is hovered/focused, or the tab.

- Save path is transport-agnostic (writeDesktopFileText): local Electron IPC
  or a new hardened POST /api/fs/write-text on the dashboard server (path
  validation, parent-must-exist, regular-files-only, size cap, atomic
  temp-file + os.replace), behind the existing auth middleware.
- Stale-on-disk guard re-reads before writing and offers overwrite vs
  discard-and-reload instead of clobbering external/agent edits.
- VS Code-style modified dot on the tab; ⌘/Ctrl+S and ⌘/Ctrl+Enter save,
  Esc cancels; GitHub highlight style matched to the read view's Shiki theme.
- Typing stays render-free (draft in a ref; dirty flips once at the boundary).
2026-06-25 19:50:25 -05:00
Que0x
b8fc8c908b fix(approval): fold Windows absolute home paths in dangerous-command detection
The detector folds absolute home / Hermes-home prefixes into their canonical
~/ and ~/.hermes/ forms so static patterns catch /home/alice/.bashrc the same
way they catch ~/.bashrc (abd69b81). On native Windows this fold never fired,
so terminal commands writing to shell startup files, ~/.ssh/authorized_keys,
or ~/.hermes/config.yaml / .env returned "safe" and skipped the approval
prompt — and config.yaml carries the approval policy itself.

Two compounding causes:

1. The fold ran after the backslash-escape strip (r\m -> rm), which dissolves
   the backslash separators in a Windows path (C:\Users\alice\.bashrc ->
   C:Usersalice...) before the fold could match. It now runs before the strip.
2. The fold only recognized POSIX absolute paths and only the home prefix,
   leaving multi-segment backslash suffixes (\.ssh\authorized_keys) to be
   mangled by the strip.

Consolidated into _home_prefix_fold_regex / _fold_home_prefixes: match a home
prefix with either separator, capture the rest of the path token, and
normalize its separators to / so multi-segment patterns match. The
degenerate-path guard generalizes count("/") >= 2 to "at least two components
below the root" (also rejecting a bare drive root C:\). HOME is consulted
directly because Windows' expanduser ignores it; the more specific Hermes home
is folded first, longest candidate first, so neither fold clobbers the other.

POSIX behavior unchanged; the r\m -> rm anti-obfuscation strip still runs.
Adds TestWindowsAbsolutePathFolding, which monkeypatches a Windows-style
HOME/HERMES_HOME so the behavior is also exercised on the CI runner.
2026-06-25 17:49:39 -07:00
brooklyn!
7cd5eaa646 Merge pull request #52745 from NousResearch/desktop/bundle-main
desktop: bundle main.cjs for electron
2026-06-25 19:06:59 -05:00
ethernet
df514654ba desktop: bundle main.cjs for electron
fixes simple-git not found
2026-06-25 20:05:20 -04:00
brooklyn!
55af6c447a Merge pull request #52206 from NousResearch/bb/desktop-tools-curation
fix(desktop): hide platform/internal toolsets from the Skills & Tools list
2026-06-25 18:56:04 -05:00
teknium1
6dfb8326f5 fix(state): exclude delegate/branch/tool children from resume walk + reconcile salvaged fixes
Follow-up to the salvage of #45035 + #48682. The two PRs touched different
functions (resolve_resume_session_id vs get_compression_tip) but #45035's
descendant walk followed ANY parent_session_id child, so a delegate/subagent
child could hijack the resume target. Apply the same _branched_from /
_delegate_from / source!='tool' exclusion the rest of hermes_state.py uses,
so the resume walk only follows genuine compression continuations.

Also updates the unrealistic delegation test fixture to carry the real
_delegate_from marker, and updates 3 list_sessions_rich test mocks for the
order_by_last_active kwarg #48682 added.

AUTHOR_MAP: map PINKIIILQWQ + ailang323 salvage authors.
2026-06-25 16:29:09 -07:00
longer
6d9ca04574 fix(desktop): resume latest compression continuation 2026-06-25 16:29:09 -07:00
Pink
263f6b03eb chore: rename test to reflect new semantics of resolve_resume_session_id 2026-06-25 16:29:09 -07:00
PINKIIILQWQ
abd6b85200 fix(state): resolve compression chain tip in resolve_resume_session_id
After context compression, the parent session holds pre-compression messages
and a child (or deeper descendant) holds the continuation.
resolve_resume_session_id() short-circuited when the input session already
had messages (row is not None -> return session_id), causing REST API
endpoints, gateway resume, and CLI resume to serve stale parent messages.

Remove the early-return. Walk the full descendant chain, record the
deepest node that has messages (best), and return best if not None
else the original session_id (preserving the empty-chain fallback).

Callers (api_server.py, web_server.py, cli_agent_setup_mixin.py,
cli_commands_mixin.py) all use the resolved != input -> redirect pattern
and are transparent to this change.
2026-06-25 16:29:09 -07:00
Teknium
208f0d7c3b fix(update): default pre-update backup to off (#52729)
The pre-update HERMES_HOME zip shipped on by default (DEFAULT_CONFIG +
runtime fallback both True), so every `hermes update` zipped the entire
~/.hermes — sessions DB, caches, skills — adding minutes to each update.
The shipped cli-config.yaml.example, the --backup help, and the example
config all already said "off by default," so the live default
contradicted its own documentation.

Flip the default to off everywhere: DEFAULT_CONFIG, the runtime
`.get(..., False)` fallback in _run_pre_update_backup, and the stale
--backup help string. Users who want the #48200 safety net opt in via
updates.pre_update_backup: true or --backup for a single run.

Updated test_default_enabled_creates_backup -> test_default_disabled_is_silent
to assert the new default (silent no-op, no zip).
2026-06-25 16:01:09 -07:00
kshitij
e4ff494860 fix(cron): add default retention to per-run job output (#52383) (#52646)
* fix(cron): add default retention to per-run job output to bound disk usage (#52383)

Per-run cron output (cron/output/<job>/<timestamp>.md) is written once
per execution and was never pruned, so a frequently-scheduled job on
a long-running deploy accumulates one file per run indefinitely and
can fill the volume ('no space left on device').

save_job_output() now keeps the most recent N output files per job and
removes older ones. N defaults to 50 and is configurable via
cron.output_retention; a non-positive value disables pruning for
operators who manage cleanup externally.

Salvaged from #52402 by @0xDevNinja.

Closes #52383

* fix(config): add cron.output_retention to DEFAULT_CONFIG

Follow-up to #52383: the retention config key was functional via
get()-with-default but missing from DEFAULT_CONFIG, so the deep-merge
wouldn't auto-populate it for new installs. Add it explicitly.

---------

Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
2026-06-25 16:00:13 -07:00
brooklyn!
ffa3d3c811 Merge pull request #49037 from NousResearch/bb/projects-paradigm
feat(desktop): first-class projects — sidebar, coding rail, review pane, and agent project tools
2026-06-25 17:49:05 -05:00
Teknium
fd2a35b169 fix: stop reporting cache-hit rate and cost across all UI surfaces (#52717)
* fix: stop reporting cache-hit rate and cost across all UI surfaces

Cost estimates and cache read/write token reporting are unreliable on
providers that don't surface cached_tokens (e.g. ollama-cloud, which doesn't
implement prompt_tokens_details.cached_tokens), producing misleading
near-zero 'cache hit' readouts and cost figures. Remove cost + cache-hit
reporting from every user-facing surface; keep input/output/total token
counts (provider-agnostic and accurate) and the Nous account billing UI
(real account money, separate from per-conversation estimates).

Surfaces:
- CLI /usage + model-info: drop cost lines + cache read/write token lines
- Gateway /usage + /model: drop cost + cache lines
- tui_gateway/server.py: stop emitting cost_usd / cache_read in usage and
  subagent.complete payloads
- TUI (Ink): drop cost from status bar (+ showCost plumbing), /usage panel,
  thinking rollup, agents overlay (incl. compare view); keep token counts
- Desktop Command Center: drop cost stat, per-model cost, actual-cost hint

Underlying estimate_usage_cost / format_cost / insights cost columns are
left intact but no longer surfaced (display-only change, reversible).

* test: update TUI + gateway + CLI tests for removed cost/cache-hit reporting

- CLI /usage test asserts cost/cache lines are absent, tokens present
- gateway /usage test drops cost + cache asserts; removes cost-included test
- TUI subagentTree summary expectation drops the cost segment
- useConfigSync + appChrome status-rule tests drop showCost prop/state
2026-06-25 15:21:22 -07:00
Brooklyn Nicholson
19ca295a84 fix(desktop): clarify branch convert actions
Open checked-out branches, switch the primary checkout for the default branch, and create linked worktrees only for non-trunk free branches.
2026-06-25 17:19:36 -05:00
teknium1
3e99ec0ff9 test(hermes_state): cover update_session_billing_route overwrite + prompt null
Regression for the salvaged #48254 fix: billing route is first-writer-wins
via update_token_counts (COALESCE), so a mid-session provider switch left
the dashboard attributing cost to the original provider. Asserts the new
update_session_billing_route() overwrites unconditionally, nulls system_prompt
so the next turn rebuilds Model:/Provider:, and preserves billing_mode when
omitted (COALESCE on None).
2026-06-25 14:44:00 -07:00
x7peeps
c7e934a5b4 fix(hermes_state): persist billing provider/base_url after mid-session /model switch
The session database records billing_provider and billing_base_url using
COALESCE(column, ?) in update_token_counts(), making them write-once.
When a user switches models mid-session via /model, the runtime (agent.provider,
agent.base_url) updates correctly, but the session row never reflects the new
provider. This causes the dashboard Models page to display a stale provider
badge and misattributes token usage / cost analytics.

Fix: add update_session_billing_route() that unconditionally sets
billing_provider, billing_base_url, and billing_mode (no COALESCE), and call
it from switch_model() in agent_runtime_helpers.py after the swap succeeds.

This follows the same pattern as update_session_model() which already
unconditionally updates the model column (added for the identical COALESCE
problem on the model field).

Closes #48248
2026-06-25 14:44:00 -07:00
Gille
bf0513bca0 test(windows): align gateway restart CI coverage 2026-06-25 14:42:38 -07:00
Gille
e7d2f0b93c fix(windows): suppress console flashes and harden gateway restarts 2026-06-25 14:42:38 -07:00
Brooklyn Nicholson
9f3aa1685c fix(cli): register project command beside MoA 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
890e890281 chore(desktop): update package lock 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
a391523bcc i18n(desktop): add project and worktree strings 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
b8d220f268 feat(desktop): wire project settings and shell chrome 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
62af32efe7 feat(desktop): keep active sessions aligned with cwd 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
68680db10d feat(desktop): add Codex-style review pane 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
7a7f9a5b3d feat(desktop): add composer coding rail and worktree flow 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
488ae376db feat(desktop): render backend-authoritative projects sidebar 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
74352a1e61 feat(desktop): add project and coding stores 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
344415892f feat(desktop): add shared project UI primitives 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
e2b8018729 feat(desktop): add git worktree and review IPC 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
86e748df13 fix(agent): require code for coding posture 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
cb3f8ec03d fix(tools): isolate per-session worktree cwd 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
4ffdedd369 feat(tools): add project workspace tools 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
4e023f5bc9 feat(gateway): build authoritative project tree 2026-06-25 16:40:27 -05:00
Brooklyn Nicholson
e7811345c1 feat(kanban): link tasks to project worktrees 2026-06-25 16:40:26 -05:00
Brooklyn Nicholson
8a45ce2dd4 feat(projects): add per-profile project store 2026-06-25 16:40:26 -05:00
Brooklyn Nicholson
4cdd1a3230 feat(sessions): record git workspace metadata 2026-06-25 16:40:26 -05:00
brooklyn!
c4ba4770eb Merge pull request #52704 from NousResearch/bb/desktop-root-boundary-recover
fix(desktop): recover root error boundary from transient render races (salvage #41787)
2026-06-25 16:17:18 -05:00
brooklyn!
43f9d24513 Merge pull request #52703 from NousResearch/bb/desktop-resume-cross-wired-cache
fix(desktop): reject cross-wired runtime-id cache on session resume (salvage #50464)
2026-06-25 16:17:14 -05:00
Brooklyn Nicholson
2e3efce66e fix(desktop): recover the root error boundary from transient render races
A stale-index render race in assistant-ui (a just-shrunk thread rendered
at an old message index during a session switch / teardown) throws
errors like "tapClientLookup: Index N out of bounds", "Cannot read
properties of undefined (reading 'type')", or "Tried to unmount a fiber
that is already unmounted". These bubble to the root ErrorBoundary and
latch the WHOLE desktop app on the "Reload window" fallback even though
the next render against fresh state would be fine.

Teach the root boundary to treat that small set of known-transient
renderer errors as recoverable: log them and schedule a next-tick
reset() so React re-renders against current state instead of stranding
the user on the fallback.

Auto-recovery is BOUNDED -- at most MAX_RECOVERIES (3) attempts within a
5s window -- so a genuinely persistent error can't spin the boundary in
a reset -> throw -> reset loop; after the budget is spent the fallback
is left up for the user. Manual retry (the button) resets the budget.
Only the root boundary auto-recovers; scoped boundaries keep their own
fallbacks, and unrecognized errors are never swallowed.

Tests: transient race recovers (fallback never sticks), a persistent
recoverable error stops at the cap and surfaces the fallback (proving
the loop is bounded), and neither a non-root boundary nor an
unrecognized root error auto-recovers.

Closes #41693. Supersedes #41787 by @izumi0uu, reimplemented with a
bounded recovery budget so a non-transient error can't loop forever.

Co-authored-by: izumi0uu <izumi0uu@gmail.com>
2026-06-25 16:15:20 -05:00
Brooklyn Nicholson
f7bf740640 fix(desktop): reject cross-wired runtime-id cache on session resume
resumeSession's warm-cache fast-path trusted the
storedSessionId -> runtimeId -> ClientSessionState mapping without
checking the cached state still BELONGS to the session being resumed.
A pooled profile backend that gets idle-reaped and respawned
(pruneSecondaryGateways) re-mints runtime ids, so a recycled id can
resolve to a live-but-DIFFERENT session's cache entry. The only
existing guard was a session.usage 404 -- that catches a fully-dead
runtime id, but a recycled id still 200s, so the fast-path happily
painted the wrong transcript under the current route (open chat A,
chat B loads).

Fold the belongs-to check into a single takeWarmCache() helper used at
BOTH cache reads -- the early transcript-keep decision and the fast-path
itself -- so a cross-wired entry can't even briefly flash a stale
transcript before the full resume repaints. On a mismatch the helper
purges both stale map entries and reports a miss, falling through to a
full resume that rebinds a correct runtime id. The full-resume path
already guards its final paint with isCurrentResume(), so only the
cached fast-path was missing the belongs-to check.

Pre-existing bug from the initial desktop app (#20059); not introduced
by the session-switch perf work (#49807), which left these lines
untouched.

Tests: two cases in use-session-actions.test.tsx driven through a
harness that owns the two cache maps -- a cross-wired mapping is
rejected + purged (the bug), and a correctly-wired cache still serves
from memory with no needless refetch (no perf regression).

Supersedes #50464 by @professorpalmer, reimplemented to also guard the
early transcript-keep read (whole-class fix, not just the fast-path).

Co-authored-by: professorpalmer <professorpalmer@users.noreply.github.com>
2026-06-25 16:11:18 -05:00
Teknium
c6575df927 feat(moa): expose MoA presets as selectable virtual models (#46081)
* feat(moa): expose MoA presets as selectable virtual models

Reconstructed onto current main (PR #46081's base had diverged with no common
ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual
provider: each named preset is a selectable model under provider 'moa', and the
preset's aggregator is the acting model that answers and calls tools.

Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same
batch pattern delegate_task uses) — all references dispatched at once, collected
when every one finishes, then handed to the aggregator. Output order is
preserved, failures and the MoA-recursion guard stay isolated per reference.

- Removed the old mixture_of_agents model tool and moa toolset.
- Added moa as a virtual provider in the provider/model inventory.
- /moa is shortcut behavior over model selection (default preset / named preset
  / one-shot prompt).
- Dashboard + Desktop manage named presets; presets appear in model pickers.
- Parallel reference fan-out in agent/moa_loop.py with regression test.

* fix(moa): thread moa_config through _run_agent to _run_agent_inner

The reconstructed gateway MoA wiring declared moa_config on _run_agent (the
profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper
never forwarded it — _run_agent_inner had no such parameter, so the runtime hit
NameError: name 'moa_config' is not defined on the compression-failure session
sync path. Add moa_config to _run_agent_inner's signature and forward it from
both wrapper call sites (multiplex and non-multiplex). Caught by
tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).

* fix(moa): classify moa as a virtual provider in the catalog

The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so
provider_catalog() fell through to the default auth_type="api_key" with no
env vars — tripping two catalog invariants:
  - test_provider_catalog: api_key providers must expose a credential env var
  - test_provider_parity: every hermes-model provider must be desktop-configurable

moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that
overlay as an auth_type fallback so the catalog reports moa as virtual (no real
credential, no network endpoint). Exempt virtual providers from the desktop
parity union check the same way 'custom' is exempt — derived from the catalog,
not a hardcoded slug, so future virtual providers are covered too.
2026-06-25 13:52:06 -07:00
teknium1
f284d85efa fix(cron): restore [SILENT] silence + suppress empty-turn explainer on Telegram
Scheduled jobs delivering to Telegram/etc. started posting a literal
'⚠️ No reply: the model returned empty content…' message instead of
staying silent. Two interacting causes:

1. The turn-completion explainer (#34452) replaces an empty model turn
   with a user-facing '⚠️ No reply…' string. In a cron context that is
   not a silence marker, so the scheduler delivered it — a regression
   from the previously-silent empty turn. run_job now detects the
   explainer text deterministically (via the same formatter that
   produced it) for abnormal-empty turn_exit_reasons and strips it to
   empty, so the existing empty-response suppression + soft-fail guard
   apply. The explainer is unchanged on CLI/gateway.

2. The cron suppression used a loose 'SILENT_MARKER in ...upper()'
   substring check. It leaked bracketless near-markers the model emits
   ('SILENT', 'NO_REPLY', 'NO REPLY' — #51438, #46917) and wrongly
   swallowed a real report that merely quoted '[SILENT]' mid-sentence.
   Replaced with _is_cron_silence_response(): suppresses a canonical
   token as the whole response, its own first/last line, or the
   documented bracketed '[SILENT] <note>' prefix — while a token buried
   mid-sentence in a genuine report is delivered. Preserves the
   intentional cron trailing/prefix tolerance (existing tests unchanged).

Tests: bracketless-variant suppression, mid-sentence-quote delivery,
direct matcher contract, and explainer-strip + defensive real-report
delivery.
2026-06-25 13:45:09 -07:00
kshitij
42bea9e298 Merge pull request #52618 from NousResearch/salvage/14185-todo-coercion
fix(tools): defensive type coercion in todo_tool for malformed LLM input (#14185)
2026-06-26 02:02:18 +05:30
liuhao1024
f23d077b5f fix(fuzzy-match): preserve boundary space after whitespace-normalized match
The trailing-whitespace expansion in _map_normalized_positions
unconditionally consumed whitespace after the matched region — including
the word-boundary space that separates the match from the next token.
This caused silent file corruption when the fuzzy matcher fell back to
the whitespace_normalized strategy.

Guard the expansion on the normalized match actually ending with
whitespace (i.e. the original had a run of spaces that were collapsed).
When the match ends with a non-space character, the first whitespace in
the original is a boundary and must not be consumed.

Fixes #52491
2026-06-26 01:55:27 +05:30
infinitycrew39
d40b5735a4 test(telegram): cover table auto-rich and topic routing
Assert bare tables upgrade to sendRichMessage under default/opt-out config,
DM-topic resumed sends without reply anchors, and rich finalize edits carry
forum topic routing metadata.
2026-06-25 13:10:54 -07:00
infinitycrew39
9d225fbf4e fix(telegram): auto-rich pipe tables and topic routing for sendRichMessage
Pipe-only markdown tables now use sendRichMessage even when rich_messages
is off, and resumed DM-topic sends route via direct_messages_topic_id
without requiring a reply anchor. Rich finalize edits forward topic kwargs.
2026-06-25 13:10:54 -07:00
teknium1
92b5987ca2 chore: add herbalizer404 + pyxl-dev to AUTHOR_MAP for auxiliary fallback salvage 2026-06-25 13:08:18 -07:00
teknium1
0d777453fa fix(auxiliary): fall back when a route can't run the model at all (400 capability mismatch)
The salvaged context-window screen (#52392) skips fallback candidates that
are too small, and the rate-limit/403 fixes skip candidates that are at
capacity. A third hard failure remained uncovered: a fallback that builds a
client fine but returns a 400 because it structurally cannot run the model.
The canonical case is a configured openai-codex / ChatGPT-account fallback
asked to compress a glm-5.2 conversation:

    400 - {'detail': "The 'glm-5.2' model is not supported when using
    Codex with a ChatGPT account."}

This is a request-validation error, so should_fallback was False and the
explicit-provider gate blocked it — the auxiliary task (compression) aborted
every turn, dropping middle turns without a summary and churning the session,
which is exactly what destroys the prompt cache.

Adds _is_model_incompatible_error() (400 + capability phrasing, excluding
not-found and billing 400s which the sibling predicates own) and treats it as
a fallback-worthy capacity error in both sync and async call_llm, so the chain
skips the incapable route and continues to the next viable candidate.
2026-06-25 13:08:18 -07:00
Tranquil-Flow
e4d026aa3b fix(auxiliary): screen fallback chain by context window for compression (#52392)
The runtime auxiliary fallback chain (_try_configured_fallback_chain and
_try_main_fallback_chain) returned the first reachable candidate without
checking whether the candidate's context window was large enough for the
task. For task='compression' this meant a reachable but undersized
fallback (e.g. 32K) could be selected and then fail, even when a later
larger-context fallback was available.

This adds two small helpers:

  _task_minimum_context_length(task)
      Returns MINIMUM_CONTEXT_LENGTH (64K) for compression, None for
      other tasks (vision, web_extract, etc.).

  _candidate_context_window(provider, model, ...)
      Thin wrapper around get_model_context_length that returns None on
      probe failure so unknown/custom endpoints pass through unchanged
      (preserves the existing fallback surface).

Both fallback loops now skip reachable candidates whose resolved context
is below the task minimum and continue iterating. The success path
(first viable candidate wins) is unchanged. Return shape and ordering
for healthy candidates are preserved.

Six regression tests cover:
  L2 configured chain skips too-small candidate
  L2 chain continues after skipping, returns last viable
  L3 main chain skips too-small candidate
  L4 unknown-context candidate passes through
  L5 non-compression task is not filtered
  L6 minimum constant matches MINIMUM_CONTEXT_LENGTH (64K)

3/6 fail on upstream/main without the production change (verified); all
6 pass with the fix. Full test_auxiliary_client.py suite (231 tests)
and related compression tests (130 tests) remain green.
2026-06-25 13:08:18 -07:00
herbalizer404
b82c83d320 fix(auxiliary): honor fallback chain when compression provider auth is unavailable
When an explicit aux provider cannot build a client before any request is
sent (missing raw env key, exhausted/unavailable OAuth or credential-pool
auth, resolver returning (None, None)), call_llm raised a misleading
"no API key was found" error and bypassed the configured fallback_chain
entirely. A provider authenticated through Hermes auth / the credential
pool (e.g. ollama-cloud) whose pool entry is exhausted hit this path, so
compression failed instead of routing to the configured fallback.

Adds _try_configured_fallback_for_unavailable_client() and wires it into
both sync and async call_llm before the raise, and into the startup
compression feasibility check.

Salvaged from #51835 by @herbalizer404.
2026-06-25 13:08:18 -07:00
pyxl-dev
751adfa6b9 fix: include rate-limit in auxiliary capacity-error fallback gate
Rate-limit (429) errors on explicit-provider auxiliary tasks were
silently failing instead of triggering the fallback chain. The
is_capacity_error gate only checked payment and connection errors,
excluding rate limits — so when a configured provider like
openai-codex hit its rate limit, auxiliary tasks (kanban_decomposer,
vision, web_extract, approval, etc.) had zero resilience.

Add _is_rate_limit_error() to is_capacity_error at both call sites
(sync and async paths) so rate limits trigger fallback regardless
of whether the provider was auto-detected or explicitly configured.

Fixes #52228
2026-06-25 13:08:18 -07:00
herbalizer404
ff8920299c fix(auxiliary): treat 403 subscription and session-usage-limit errors as payment errors for fallback
Ollama Cloud (and similar) return 403 with bodies like "this model requires
a subscription, upgrade for access" or "you have reached your session usage
limit, upgrade for higher limits". These are capacity/billing conditions
semantically identical to credit exhaustion, but _is_payment_error() did not
recognize them (403 missing from the status set; keywords missing), so the
configured fallback_chain was never tried and compression failed outright.

Adds 403 to the status set and the subscription/session-usage keywords.

Salvaged from #49076 by @herbalizer404.
2026-06-25 13:08:18 -07:00
kshitij
ca714f6189 Merge pull request #52653 from kshitijk4poor/salvage/33814-env-quote-hash
fix(config): quote .env values containing # to prevent token truncation (#30355)
2026-06-26 01:32:49 +05:30
kshitijk4poor
0654319644 chore(release): map srojk34 legacy prefix-less noreply in AUTHOR_MAP (#50098) 2026-06-25 12:56:05 -07:00
kshitijk4poor
d9bd7ce827 test(compression): pin rotation-fallback tests to in_place=False ahead of default flip
These 7 test sites assert rotation behavior (fork, child sessions, lock
contention, logging session-context follows id rotation, boundary hooks fire
on rotation). Pin each builder to in_place=False explicitly so they keep
exercising the retained rotation fallback regardless of the global default
(flipped to True in #38763). Rotation stays a working opt-out fallback and
deserves continued coverage — these are NOT deleted.

Pinned sites:
- test_compression_concurrent_fork._build_agent_with_db
- test_compression_logging_session_context._build_agent_with_db
- test_compression_rotation_state._build_agent_with_db
- test_compression_boundary_hook._make_agent (2 helpers: CompressionBoundaryHook + SessionCompressEvent)
- test_compression_concurrent_sessions._build_agent_with_db
2026-06-25 12:56:05 -07:00
kshitijk4poor
2107b86024 feat(compression): flip in_place default to True (#38763) [2/2]
In-place compaction (single durable session id, non-destructive soft-archive)
becomes the default. Rotation is now the opt-out fallback via
compression.in_place: false.

Prerequisite: #50098 (hygiene guard reads result flag not config flag) merged
first — without it, flipping the default causes permanent transcript loss on
gateway hygiene-compress and /compress when no session_db is available.

Blast radius (empirically measured on current main): 7 rotation-asserting
tests broke and are pinned to in_place=False in the companion test commit:
- tests/agent/test_compression_concurrent_fork.py (2)
- tests/agent/test_compression_logging_session_context.py (1)
- tests/agent/test_compression_rotation_state.py (1)
- tests/run_agent/test_compression_boundary_hook.py (2 _make_agent helpers)
- tests/gateway/test_compression_concurrent_sessions.py (2)
Rotation stays as a working fallback and deserves continued coverage.

Plan: .hermes/plans/in-place-compaction-38763.md
2026-06-25 12:56:05 -07:00
srojk34
510bf40705 fix(gateway): read compaction result flag not config flag in hygiene guard (#50098)
Salvage of #50098 by @srojk34, cherry-picked onto current main.

The hygiene auto-compress guard and the /compress slash command both read
compression_in_place (config flag — is in-place mode enabled?) instead of
_last_compaction_in_place (result flag — did in-place compaction actually
succeed?). Both agents are built without a session_db, so archive_and_compact
always fails silently and _last_compaction_in_place stays False. Reading the
config flag makes the guard think in-place succeeded, triggering
rewrite_transcript() which replaces the original messages with only the
compressed summary — permanent data loss.

Co-authored-by: srojk34 <srojk34@users.noreply.github.com>
2026-06-25 12:56:05 -07:00
Teknium
2a1e615565 fix: persist non-NULL system prompt on fresh turn setup (#45499) (#52616)
build_turn_context() created the DB session row via _ensure_db_session()
before the system prompt was restored/built, so a fresh API/gateway agent
carrying client-managed history inserted a row with system_prompt=NULL. That
tripped the misleading 'stored system prompt is null; rebuilding from scratch
... investigate the previous turn's write path' warning and a guaranteed
first-turn prefix cache miss. Move row creation to after _cached_system_prompt
is populated.

Verified live (OpenRouter + claude-sonnet-4.5): persistent-agent turns show
cache_read jumping to the full prefix on turn 2+ (write 24411 -> read 24411),
and the persisted system_prompt is non-NULL so fresh-agent restore keeps the
prefix cache warm.

Tests: turn-context ordering regression asserting _ensure_db_session runs
after _cached_system_prompt is populated.
2026-06-25 12:54:19 -07:00
Teknium
d7021af30f fix(learn): name distilled skills as author Hermes, not the host OS user (#52388)
/learn told the agent to fill the skill `author` field, and the system
prompt environment probe surfaces the OS login name (user=$(whoami) in
prompt_builder.py), so the model wrote the host username into published
SKILL.md frontmatter — a privacy leak the user never opted into, and
inconsistent run to run as the most-salient identity changed.

The /learn authoring prompt now sets `author` to the literal value
`Hermes` and explicitly forbids deriving it from the host environment
(OS/login user, git config, or any probeable identity). The skill names
itself as the tool that wrote it.

Closes #52368.
2026-06-25 12:48:08 -07:00
helix4u
4efec63a34 fix(tools): let session_search match session titles 2026-06-26 01:12:26 +05:30
rob-maron
2c02583c2b fix shape 2026-06-25 12:38:33 -07:00
rob-maron
525ee58b43 krea 2026-06-25 12:38:33 -07:00
helix4u
5191ebba22 fix(desktop): retry empty resumed transcripts 2026-06-25 13:36:19 -06:00
sweetcornna
150afea942 fix(config): quote env values containing hash 2026-06-26 00:54:34 +05:30
kshitijk4poor
73c8d5a1e7 fix: use self._session_db directly + add regression test
- Replace getattr(self.session_store, '_db', None) with self._session_db
  (the GatewayRunner's own SessionDB, consistent with existing usage in
  slash_commands.py L240/L499).
- Remove verbose comment referencing a branch name as an issue number.
- Update stale comment in run.py that said 'today it has no session_db'.
- Add regression test verifying session_db is passed and rotated session
  is persisted (adapted from #51624 by @LeonSGP43).
- Add _session_db=None to _make_runner fixtures in test_compress_command,
  test_compress_focus, and test_compress_plugin_engine.
2026-06-26 00:50:40 +05:30
Omar B
1a38a8ff7d fix(gateway): pass session_db to compress temp agents so persistence works
Manual /compress and session hygiene auto-compress both create temporary
AIAgent instances to run compression. These agents were created without
a session_db, so compress_context computed the compressed messages in
memory, rotated the session ID, and reported success — but never wrote
to the database. The next user message reloaded the original full
transcript, making compression appear to do nothing.

Fix: pass session_db=self.session_store._db to both temp agents so the
session rotation is properly persisted. Also set _end_session_on_close
on the /compress temp agent (already done in hygiene path) to prevent
cleanup from ending the newly rotated session.
2026-06-26 00:50:40 +05:30
brooklyn!
edf35918be Merge pull request #52620 from NousResearch/bb/desktop-session-switch-perf 2026-06-25 14:19:59 -05:00
Brooklyn Nicholson
e8561d61e6 test(tui_gateway): pin synchronous-build resume tests to eager_build
These three assert the eager build contract — stored runtime overrides /
profile db reach _make_agent synchronously, and the agent binds to the
compression tip. Under deferred-by-default the build runs off-thread, so
they raced the timer (green in CI, flaky locally). Pin them to
eager_build; deferred coverage lives in the protocol tests.
2026-06-25 14:13:07 -05:00
David Metcalfe
da73223f4a fix(desktop): show statusbar item tooltips on hover
Statusbar items declared a 'title' string (e.g. YOLO, gateway health,
agents, cron, version, context usage) that was populated by
use-statusbar-items.tsx but never forwarded to the rendered DOM in
StatusbarControls — so every statusbar button/menu/text/link had no
hover hint.

Wrap the four render branches (menu trigger, text, link, action) in
the existing 'Tip' component from components/ui/tooltip.tsx. Tip is
self-contained (carries its own Provider), instant (delayDuration=0),
themed (bg-foreground/text-background, auto-inverts per theme), and
already in use elsewhere in the desktop shell. Renders the child
untouched when label is falsy, so items without a title stay
zero-cost.
2026-06-25 12:11:17 -07:00
Brooklyn Nicholson
1ca1f9f2c7 refactor(tui_gateway): DRY the deferred-session paths
Collapse the duplicated cold-resume / lazy-watch / create scaffolding into
shared helpers: _deferred_session_record (the live-session dict minus the
agent), _lazy_resume_info (the not-yet-built session.info), _claim_or_reuse_live
(lock + double-checked register-or-reuse), and _schedule_agent_build (the
pre-warm timer). Net -12 lines, three copies of the ~30-key session dict and
the lazy-info block down to one each. No behavior change.
2026-06-25 14:03:03 -05:00
Brooklyn Nicholson
3bf00e459a perf(desktop): make deferred resume the default, not an opt-in flag
Per review: gating the faster path behind a `defer_build` flag that the
only caller always sends is pointless. Flip it — `session.resume` now
defers the agent build by default for every caller (desktop + Ink TUI);
a caller that needs the agent built synchronously passes `eager_build:
true` (used by the build-race test). The desktop no longer sends a flag.

While verifying the flip, fixed two real parity gaps the deferred path
had vs the old eager (`_init_session`) path:

- `_enable_gateway_prompts()` was never called on a deferred resume, so
  approvals/clarify wouldn't route through the gateway prompt callbacks.
- `_start_agent_build` never wired `background_review_callback` /
  `memory_notifications`, so a deferred-built session's self-improvement
  "💾 …" summary leaked to stdout instead of rendering in-transcript.
  Wiring it there also fixes it for `session.create` sessions, which
  build through the same path.

ACP is unaffected (it uses its own session_manager, not this RPC); the
Ink TUI already consumes the same lazy `info` shape from session.create
and upgrades on the later `session.info` event.
2026-06-25 14:03:03 -05:00
Brooklyn Nicholson
c4c590e4a1 perf(desktop): make session switching fast under load
Switching sessions in the desktop app could freeze the whole UI for
several seconds on heavy, tool-rich chats. Root causes and fixes:

- Cold `session.resume` built the AIAgent (MCP discovery, prompt/skill
  build) *before* returning, and the desktop awaits that RPC before it
  paints — so the entire switch blocked on the build. Add an opt-in
  `defer_build` resume path (the contract `session.create` already uses):
  return the full display transcript immediately, register an upgradable
  live session, and pre-warm the agent on a short timer. The persisted
  runtime identity (model/provider/base_url/api_mode/reasoning/tier) is
  restored on the deferred build so it can't drop the provider.

- Nothing bounded how many in-memory agents accumulate; a user who
  reconnects often piled up detached sessions for the full 6h TTL. Add a
  soft LRU cap (`max_live_sessions`, default 16) that evicts the
  least-recently-active DETACHED sessions (no live client) — never a
  running, awaiting-input, mid-build, or live-transport one. Reopening
  re-resumes from disk.

- On the prefetch-hit cold-resume path, skip rebuilding a throwaway
  merged-message array (and its 1000-entry Map) when the prefetch already
  painted the exact transcript; the downstream sameMessageList guard
  already drops the publish, so it was pure main-thread cost.

The desktop opts into `defer_build` for every non-watch cold resume; the
eager path stays for CLI/TUI and existing callers.
2026-06-25 14:03:03 -05:00
kshitij
5de8a8fbe8 Merge pull request #52375 from NousResearch/salvage/47237-dedupe-user-turns
fix(gateway): dedupe user turns on transient failure (#47237)
2026-06-26 00:30:59 +05:30
davidgut1982
6208d6b3be fix(gateway): dedupe user turns on transient failure (#47237)
When the gateway persists a user message after a transient provider
failure (429/timeout/auth error), subsequent retries of the same
Telegram message could stack duplicate user turns in the transcript,
causing the agent to fall behind by 1-2 messages.

Add has_platform_message_id() to SessionDB (using the existing
idx_messages_platform_msg_id partial index) and a SessionStore wrapper.
The gateway's transient-failure path checks this before
append_to_transcript -- if the platform_message_id is already
persisted, the duplicate write is skipped.

Salvaged from #47869 by @davidgut1982. Adapted to current main which
has additional append sites and an existing content-based dedupe in
the exception handler path.

Closes #47237
2026-06-26 00:11:17 +05:30
Tranquil-Flow
0be10607d9 fix(tools): defensive type coercion in todo_tool for malformed LLM input (#14185)
todo_tool crashed with `AttributeError: 'str' object has no attribute 'get'`
when the LLM emitted the `todos` param as a JSON-encoded string instead of an
array, or as a list containing non-dict items (observed intermittently on
Claude 4.5/4.6/4.7, and after a prior tool-call rejection where the model
"self-corrects" by wrapping the list in json.dumps).

Three additive guards, no behavior change for well-formed input:
- todo_tool(): if `todos` is a str, json.loads it; reject unparseable strings
  and non-list values with a clear tool_error instead of crashing downstream.
- _validate(): non-dict items return a {id:"?", content:"(invalid item)"}
  placeholder rather than calling .get() on a str/int/None.
- _dedupe_by_id(): non-dict items get a synthetic key so _validate handles them.

Salvaged from #14785 by @Tranquil-Flow (authorship preserved via cherry-pick).
Comprehensive tests: JSON-string coercion (parse / unparseable / non-list /
non-string), non-dict list items (str/None/int/mixed), and a well-formed-
unchanged regression class — both guards mutation-verified to fail without them.

Closes #14185. Supersedes #14187, #22505, #14350 (same fix, less/no test
coverage) and #16952 (bundled unrelated scope-creep).
2026-06-25 23:42:42 +05:30
kshitij
d682f320b3 Merge pull request #52147 from NousResearch/salvage/29184-mcp-osv-nonblocking
fix(mcp): run OSV malware preflight off the event loop with a bounded timeout (#29184)
2026-06-25 23:39:44 +05:30
kshitij
c210e23a02 Merge pull request #52386 from NousResearch/salvage/31999-yaml-indent
fix(utils): unify YAML list indent across all config writers (#31999)
2026-06-25 23:39:37 +05:30
qdaszx
6305ac0e4b fix(mcp): run OSV malware preflight off the event loop with a bounded timeout (#29184)
During stdio MCP server startup, _run_stdio (an async method) called the
synchronous check_package_for_malware() inline. That makes a blocking
urllib HTTPS POST to api.osv.dev whose own timeout doesn't reliably cover a
stalled SSL handshake, so an intermittent network issue froze the entire
asyncio event loop for up to ~120s — blowing past the TUI/gateway's 15s
startup budget and showing "gateway startup timeout".

Run the check via asyncio.to_thread (off the loop) AND bound it with
asyncio.wait_for(timeout=_OSV_MALWARE_CHECK_TIMEOUT_S=12s). The malware check
is fail-open, so on timeout we log and proceed rather than blocking startup.

Salvaged from #29190 by @qdaszx (re-applied on current main — the call site
moved since the PR was opened), combining the to_thread approach also proposed
in #29192 by @ygd58. Two load-bearing tests: event-loop-not-blocked-during-
check and timeout-fails-open — both mutation-verified to fail against the old
inline blocking call.

Closes #29184.

Co-authored-by: ygd58 <buraysandro9@gmail.com>
2026-06-25 23:30:41 +05:30
xxxigm
0aea0c3654 fix(utils): unify YAML list indent across all config writers (#31999)
atomic_yaml_write used default yaml.dump which emits indentless
sequences (list items at column 0), while atomic_roundtrip_yaml_update
(ruamel.yaml) emits 2-space-indented sequences. Cross-path writes to
the same config.yaml toggled indentation on every save, eventually
producing a mixed-indent file that js-yaml rejects with 'bad indentation
of a mapping entry', silently dropping custom_providers and breaking
model switching.

Add IndentDumper SafeDumper subclass that forces indentless=False,
route atomic_yaml_write through it. Route tui_gateway._save_cfg and
the Telegram adapter's config writer through atomic_yaml_write so all
paths emit the same 2-indent layout.

Salvaged from #32034 by @xxxigm. Adapted to current main which already
has allow_unicode=True (from #51356) but was missing IndentDumper.

Closes #31999
2026-06-25 23:27:44 +05:30
brooklyn!
a53fc78c02 Merge pull request #52594 from NousResearch/bb/queue-resubmit-on-busy
fix(tui_gateway): queue mid-turn prompts instead of dropping them on a busy retry
2026-06-25 12:50:18 -05:00
kshitijk4poor
15ee2d6f04 refactor: lightweight sudo count + drop chatty multi-sudo tip
Replace _count_real_sudo_invocations (which called
_rewrite_real_sudo_invocations and discarded the rewritten string) with
a lightweight token scan that reuses the same tokeniser but skips string
building. Remove the agent-facing tip about nested sudo in heredocs —
the cache-cleared warning is enough.
2026-06-25 23:08:48 +05:30
xxxigm
d93abd75d1 test(terminal): cover sudo cache invalidation and multi-invocation piping 2026-06-25 23:08:48 +05:30
xxxigm
8278d82e17 fix(terminal): improve sudo -S password delivery and cache invalidation
Pipe one password line per sudo invocation in compound commands so a correct
password is not rejected on the second `sudo` in `sudo a && sudo b`. Drop the
session cache when sudo returns Authentication failed, surface sudo_auth_failed
in the tool result, and add hints for interactive sessions.
2026-06-25 23:08:48 +05:30
brooklyn!
931a5e92cc Merge pull request #52592 from NousResearch/bb/close-interrupt-tool-seq-sibling-paths
fix(agent): close tool-call sequence on all interrupt aborts (#48879 follow-up)
2026-06-25 12:31:27 -05:00
Brooklyn Nicholson
70319626a9 fix(tui_gateway): queue mid-turn prompts instead of dropping them on a busy retry
A prompt sent while a turn was in flight got rejected with 4009 "session busy",
which pushed clients (the desktop app) into a deadline-bounded busy-retry. When
turn teardown outlived that deadline — e.g. the user hits stop while a slow,
non-interruptible tool (web_search, read_file, an MCP call) is mid-flight, since
the sequential executor only checks the interrupt flag between tools — the
resubmitted message was silently dropped: "it just doesn't listen".

Wire the previously-dead display.busy_input_mode config into prompt.submit:
instead of rejecting, apply the policy and queue the message to run as the next
turn (drained in run()'s tail, ahead of goal/notification follow-ups). Modes:
interrupt (default) interrupts the live turn so it winds down promptly then runs
the queued message; queue runs it after the current turn finishes; steer injects
it into the live turn when accepted, else queues. The queued slot pins the
sender's transport and losslessly merges a second arrival. No client deadline,
no dropped sends.
2026-06-25 12:29:49 -05:00
Brooklyn Nicholson
2d286a6d00 fix(agent): close tool-call sequence on all interrupt aborts, not just finalize_turn
#48879 closed the tool-call sequence on interrupt inside finalize_turn so a
/stop after a tool no longer persists a `tool` tail that the next user message
turns into a `tool -> user` role-alternation violation (which strict providers
like Gemini/Claude react to by hallucinating a continuation and ignoring prior
context — what users see as "lost context after stop").

But the retry-wait, error-handling, and post-error retry-wait interrupt aborts
in conversation_loop return early and never reach finalize_turn, so they still
persisted and returned a raw `tool` tail. Interrupting during provider
backoff/rate-limiting (common under heavy work) hit exactly this path.

Extract the close into a shared close_interrupted_tool_sequence helper and apply
it at every interrupt abort (finalize_turn + the three early returns) so the
whole bug class is fixed, not just the one site.
2026-06-25 12:24:34 -05:00
brooklyn!
88e01d92e6 Merge pull request #52591 from NousResearch/bb/desktop-update-adhoc-sign
fix(desktop): ad-hoc sign macOS self-update rebuilds
2026-06-25 12:23:59 -05:00
Brooklyn Nicholson
1d9ed7f48a fix(desktop): ad-hoc sign macOS self-update rebuilds
The desktop self-updater rebuilds and re-signs the .app on each user's own
machine (`hermes desktop --build-only` -> electron-builder `--dir`). With
CSC_IDENTITY_AUTO_DISCOVERY on (its default), electron-builder signs the
type=distribution, hardened-runtime bundle with whatever identity is in that
user's keychain -- typically a personal "Apple Development" cert -- which
stalls/fails the sign step (no Developer ID, no provisioning profile) or
clobbers the original notarized signature with an unusable one, tripping
Gatekeeper on every post-update launch.

Force ad-hoc signing for the local packaged rebuild instead: deterministic,
and exactly what _desktop_macos_relaunchable_fixup already finishes off.
No-op for source runs, off-macOS, when a real identity is configured
(CSC_LINK / APPLE_SIGNING_IDENTITY), or when the caller already pinned the flag.
2026-06-25 12:08:29 -05:00
ethernet
a6a28ce3e2 fix(ci): run CI on all PRs to anywhere
fixes stacked PRs no-checks bug where
main < a < b
a merges into main
b is retargeted to main

but b doesn't run checks since it's not considered a new pr to main

now b will simply already have passing ci :)
2026-06-25 09:15:20 -07:00
Ben Barclay
d6269da7fd fix(gateway): harden scale-to-zero dormancy guards (#52359)
Block scale-to-zero suspend while background async delegations are active, and restore runtime status to running on real inbound after a dormant wake.\n\nAdd regression coverage for both review findings.
2026-06-25 20:41:03 +10:00
GodsBoy
f168631be0 fix(agent): gate verify-on-stop nudge off for messaging surfaces
The verify-on-stop guard (PRs #52296, #52297) defaulted ON for every
session, so on gateway messaging surfaces (Telegram, Discord, etc.) the
model complied with the nudge by writing a hermes-verify temp script and
emitting an ad-hoc verification summary, which the gateway delivered to
the end user as chat noise.

Resolve a surface-aware default instead. The DEFAULT_CONFIG value becomes
the sentinel "auto", which verify_on_stop_enabled() resolves to ON for
interactive coding surfaces (CLI, TUI, desktop) and programmatic callers,
and OFF for conversational messaging surfaces. The surface is read from
HERMES_SESSION_PLATFORM (what the gateway actually binds), with
HERMES_SESSION_SOURCE and HERMES_PLATFORM as fallbacks, matching the
sibling resolution in skill_commands.py and prompt_builder.py. An explicit
HERMES_VERIFY_ON_STOP env var or a boolean agent.verify_on_stop config
still overrides in either direction.

The passive evidence ledger and the call site are untouched.
2026-06-25 10:05:04 +02:00
Teknium
e62afaca62 fix(learn): teach /learn the full CONTRIBUTING.md skill standards (#52372)
The /learn authoring prompt taught a subset of the HARDLINE skill rules,
and stated the <=60-char description rule without making the model enforce
it — so generated descriptions overshot (up to 202 chars), which the
60-char system-prompt skill index then silently truncates.

- description: add the index-truncation rationale, a count-and-trim
  self-check, and a good/bad length example so the model actually hits <=60.
- add platforms-gating rule (OS-bound primitives -> declare platforms:).
- add author-credits-human-first rule.
- round out the Hermes-tool framing with the full wrapped-tool mapping and
  references/templates layout.

Closes #52367.
2026-06-25 00:17:23 -07:00
Teknium
60a2feeebf chore: add benbenlijie to AUTHOR_MAP for PR #47205 salvage 2026-06-25 00:17:17 -07:00
benbenwyb
6f2b2a1f34 fix: handle named custom providers and Z.AI overload retries 2026-06-25 00:17:17 -07:00
Ben Barclay
736e981abf fix(auth): honor NOUS_INFERENCE_BASE_URL env override for Nous OAuth sessions (#52270)
The host-allowlist hardening (#30611) plus the refresh heal (#49735) left
the documented NOUS_INFERENCE_BASE_URL dev/staging escape hatch unreachable
for OAuth sessions, despite three code comments asserting it still works.

Root cause — resolution precedence in resolve_nous_runtime_credentials:

    inference_base_url = (
        _optional_base_url(state.get("inference_base_url"))  # stored — wins
        or os.getenv("NOUS_INFERENCE_BASE_URL")              # env — unreachable
        or DEFAULT_NOUS_INFERENCE_URL
    )

A staging OAuth login persists its inference_base_url, but the allowlist
rejects the staging host and the refresh heal rewrites the stored value to
the production default. The stored (now prod) value is then read BEFORE the
env var, so the override never takes effect — every request 401s against
prod or is pinned to prod, and setting the env var does nothing.

Fix: the user-set env override is the most-trusted source, so consult it
FIRST for the URL used to build the client / returned to callers — while
keeping the PERSISTED value the validated, network-provenance one (the
override is a runtime overlay, never written to auth.json, so unsetting it
cleanly reverts to prod). Applied at both chokepoints:

- resolve_nous_runtime_credentials (no-refresh read path AND refresh path)
- the nous_portal proxy adapter, which re-validates the resolver's returned
  base_url against the prod allowlist as defense-in-depth and would
  otherwise reject a legitimate staging override at the forward boundary.

New _nous_inference_env_override() / split of stored-vs-effective URL keep
the threat model intact: Portal-returned URLs are still allowlist-validated
at every network site, and the env path stays ungated (trusted OS user).

Also folds in the no-refresh read-path heal (supersedes the approach in
the open #50265): a poisoned stored staging host now heals to the prod
default on read even when no refresh fires.

Tests: TestEnvOverrideWins (env wins on read + refresh paths; override never
persisted; poisoned stored heals) and TestProxyAdapterEnvOverride. Verified
the 4 behavioral tests fail against pre-fix code and pass with the fix; full
inference-validation + nous-provider suites green (85 passed). E2E-validated
against a real temp HERMES_HOME exercising the real resolver + proxy adapter:
resolver→staging, persisted→prod, proxy→staging, unset→reverts to prod.
2026-06-25 00:11:15 -07:00
kshitijk4poor
d6cf383d74 refactor(setup): simplify Z.AI picker — drop dead fallback, fix tests
- Remove dead `chosen_base or effective_base` fallback; _select_zai_endpoint
  always returns a non-empty base URL (returns current_base on cancel).
- Add .rstrip("/") to official-endpoint return for symmetry with custom-proxy
  path (both now return normalized URLs).
- Replace magic index 4 with len(ZAI_ENDPOINTS) in custom-proxy tests so they
  don't break if a 5th endpoint is added to ZAI_ENDPOINTS.
2026-06-25 12:07:01 +05:30
kshitijk4poor
d0df264213 test(setup): add ZAI endpoint picker tests, move base-URL tests to MiniMax
Z.AI now uses a curses picker instead of plain text input for base URL,
so the existing TestBaseUrlValidation tests (which used zai as their test
subject) are migrated to MiniMax, which still uses the text input path.

Add TestZaiEndpointPicker covering:
- Selecting each official endpoint (Global, China, Coding Plan Global,
  Coding Plan China) saves the correct base URL to config
- Custom proxy URL entry (valid + invalid rejection)
- Cancel keeps the existing base URL
- Current endpoint is the default choice in the picker
- Non-standard URL defaults to the Custom proxy option
2026-06-25 12:07:01 +05:30
kshitijk4poor
f3372d3407 feat(setup): wire Z.AI endpoint picker into _model_flow_api_key_provider
When provider_id == 'zai', replace the plain text Base URL input with
_select_zai_endpoint, which presents a curses picker offering Global,
China, Coding Plan Global, Coding Plan China, and custom proxy options.
Other API-key providers (MiniMax, DeepSeek, etc.) keep the text input.
2026-06-25 12:07:01 +05:30
kshitijk4poor
d0f9c4bcc6 feat(setup): add _select_zai_endpoint helper for Z.AI endpoint picker
Presents a curses-based picker (via _prompt_provider_choice) offering the
four official Z.AI endpoints — Global, China, Coding Plan Global, Coding
Plan China — plus a custom-proxy option. Sourced from ZAI_ENDPOINTS in
auth.py so it stays in sync with the probe list.

Not yet wired into the setup flow; that comes in the next commit.
2026-06-25 12:07:01 +05:30
brooklyn!
818f03cdd8 Merge pull request #52366 from NousResearch/bb/pet-gen-variant-remix
feat(pets): remix a draft into a fresh round
2026-06-25 01:12:47 -05:00
Brooklyn Nicholson
6b3ea2cea6 refactor(pets): tighten remix comments and confirm handler 2026-06-25 01:10:56 -05:00
Brooklyn Nicholson
5196575d40 feat(pets): remix a draft into a fresh round
Add a hover/focus "Remix" action on each completed draft card in the
generation grid. It re-runs generation with the chosen draft fed back in
as the reference image, keeping the same prompt and staying on step 2 so
the user can explore variations without starting over.

Because regenerating is slow and replaces the current drafts, the first
remix shows a one-time confirmation; the acknowledgement is persisted so
subsequent remixes fire immediately.
2026-06-25 01:09:19 -05:00
brooklyn!
4362c1a3af Merge pull request #52326 from NousResearch/bb/shared-tool-labels
fix(ui): share compact tool previews across clients
2026-06-25 00:53:11 -05:00
Brooklyn Nicholson
f3d6d9bbd3 fix(ui): share compact tool previews across clients
Move terminal/execute_code/read_file preview compaction into agent.display so CLI, gateway, and Ink TUI all inherit the same labels that desktop introduced in #52321.

The shared preview keeps raw args intact while trimming display-only shell plumbing (`cd`, pipe tails, banner/status echoes) and read_file line ranges. Desktop now prefers backend `context` for live rows and keeps its TypeScript fallback only for hydrated history.
2026-06-25 00:47:14 -05:00
brooklyn!
3af22c0ed5 Merge pull request #52338 from NousResearch/bb/pets-gen-timeouts
fix(pets): raise generation timeouts for the slow quality-first model path
2026-06-25 00:45:43 -05:00
Brooklyn Nicholson
a5849917a8 test(pets): make slow pet generation suite opt-in
The pet generation image-processing suite is deterministic but expensive enough
to blow the per-file CI timeout on Linux (140s), and it is not relevant to the
fast timeout PR's normal signal. Keep it available for manual validation, but do
not run it by default.

Set HERMES_RUN_SLOW_PET_TESTS=1 to enable the suite. The canonical test wrapper
now preserves that opt-in variable through its hermetic env.
2026-06-25 00:44:53 -05:00
Brooklyn Nicholson
25c31cab62 fix(pets): soften step-1 ETA copy to "several minutes"
The fixed "up to 5 minutes" wording undersells the slow quality-first path
(OpenAI image via OpenRouter), where a full hatch can run far longer. Use an
open-ended "several minutes" instead so the banner stays honest across the
fast and slow providers.
2026-06-25 00:35:54 -05:00
Brooklyn Nicholson
7078d9d1e2 fix(pets): raise generation timeouts for the slow quality-first model path
The quality-first default (OpenAI image via OpenRouter) is slow, and a full
hatch fans out ~8 rows with up to 3 retries each (300s/call) across 2 parallel
waves, so the absolute backend worst case is ~30 min. The old ceilings fired
mid-run:

- per-image HTTP call: 180s -> 300s (a single cold row can exceed 3 min)
- drafts RPC: 240s -> 420s (single wave, no retries — 7 min is ample)
- hatch RPC: 420s -> 1hr (sits above the ~30 min backend worst case)

The hatch ceiling is intentionally well above the realistic max so the frontend
never throws "request timed out" before the backend has exhausted its own
retries. The background-resumable notification path remains the real UX safety
net — the user can close the modal and get pinged on completion.
2026-06-25 00:34:52 -05:00
brooklyn!
a8e6a4f00b Merge pull request #52321 from NousResearch/bb/desktop-cmd-label-summary
fix(desktop): compact tool row titles
2026-06-25 00:03:07 -05:00
Brooklyn Nicholson
41f302fa73 fix(desktop): compact tool row titles
Make completed desktop tool rows read like useful activity labels instead of raw plumbing: terminal rows use a dispatch-style shell summarizer for agent wrappers, and read_file rows keep the action plus filename and requested line range.

The shell cleanup follows condensed-milk-pi's shape: split command compounds on real separators, strip pipe tails inside each segment, clean redirects/env prefixes, then classify setup/banner/status segments. Multi-command probes render as `first command + N commands`; the full command remains available in copy/detail.

Read rows now render as `Read package.json` or `Read main.ts L25-34`, using requested positive offset/limit and returned line numbers only as fallback for negative/unknown offsets.
2026-06-25 00:01:11 -05:00
Teknium
7a65800fed fix(cache): content-address prompt_cache_key so recurring cron jobs reuse the warm prefix (#52295)
Recurring cron jobs were prompt-cache-cold on every fire. session_id is
built as cron_<job_id>_<timestamp>, and the Codex/Responses transport used
session_id directly as prompt_cache_key — so the timestamp changed the cache
key on every run and the static prefix (agent identity + tool schemas) was
re-paid each tick.

Derive prompt_cache_key from a SHA-256 of the static prefix (instructions +
sorted tool schemas) instead. Repeated fires of the same job share one
content-addressed key (pck_<hash>) and reuse the warm prefix within the
provider's cache TTL. The key changes exactly when the prefix changes —
edit the job's prompt or toolset and it re-keys; leave it alone and it stays
stable.

session_id is left untouched for transcript isolation, log correlation, and
the Codex/xAI session-scope routing headers (session_id, x-client-request-id,
x-grok-conv-id) — those are the per-fire identity, not the cache key. Only the
prompt_cache_key body field (standard OpenAI/Codex path and the xAI extra_body
field) is content-addressed.

Closes #51395.

Co-authored-by: spiky02plateau <spiky02plateau@users.noreply.github.com>
Co-authored-by: JoaoMarcos44 <JoaoMarcos44@users.noreply.github.com>
2026-06-24 21:46:30 -07:00
Ben Barclay
72ae163250 fix(relay): authorize relay-delivered events by delivery, not source.platform (#52306)
* fix(relay): authorize relay-delivered events by delivery, not source.platform

The #52190 upstream-authz fix keyed _is_user_authorized off
source.platform via _adapter_authorization_is_upstream(source.platform).
But a relay *message* inbound carries the UNDERLYING platform
(source.platform == discord/telegram/...), NOT Platform.RELAY, because
ws_transport._event_from_wire maps the connector's wire payload
(platform="discord") straight onto SessionSource for session-keying and
egress. The relay adapter is registered only under Platform.RELAY, so
adapters.get(Platform.DISCORD) misses, the trusted-upstream branch is
skipped, and the user hits the env-allowlist default-deny:

    WARNING gateway.run: Unauthorized user: <id> (<name>) on discord

(Live staging bug: alpha tester linked successfully, then every
follow-up DM was silently dropped.)

Fix: the authentic trust signal is that the event was delivered over the
per-instance-authenticated relay WS, not which platform it underlies. Add
a wire-INVISIBLE SessionSource.delivered_via_upstream_relay flag, stamped
by the relay transport in _event_from_wire, and authorize on it. The flag
is excluded from to_dict/from_dict so a peer can neither forge it across
the wire nor have it restored from persistence. The existing adapter-flag
check is retained for events whose source.platform IS Platform.RELAY
(interaction-passthrough). A direct Discord event on a multiplexing
gateway (direct + relay adapters) is unmarked and still default-denies.

* fix(relay): use identity check on delivery marker to avoid MagicMock fail-open

A MagicMock() source (used by test_signal.py and other gateway tests) auto-
vivifies source.delivered_via_upstream_relay as a truthy Mock, which a bare
truthiness check would treat as authorized — flipping
test_signal_in_allowlist_maps from False to True. The marker is a real bool on
SessionSource, so check 'is True' explicitly: refuses to authorize any non-bool
stand-in, defensive against accidental fail-open.
2026-06-25 14:21:09 +10:00
brooklyn!
0c442fa1d3 Merge pull request #52303 from NousResearch/bb/pets-gen-qa
feat(pets): quality-first OpenRouter chain, stronger atlas gates, global pet-gen notifications
2026-06-24 23:16:40 -05:00
Brooklyn Nicholson
e92b5c6af8 feat(pets): quality-first OpenRouter model chain + stronger atlas gates + global pet-gen notifications
OpenRouter/Nous image gen now runs a quality-first model chain by default:
attempt the highest-fidelity OpenAI image model first, then fall back to
Gemini 3 Pro Image when it's access-gated/unavailable/times out. An explicit
OPENROUTER_IMAGE_MODEL / config model override pins one model with no fallback.

Atlas validation rejects malformed model output instead of shipping it: adds a
per-state collapse guard (a single sliver/fragment row no longer passes because
other rows are healthy), on top of the existing postage-stamp + multi-pose
checks.

Desktop: pet-gen native notifications are now "global" (not tied to a chat
session), so a background generation started from the command center fires an
OS notification when the user is away even with no active session. Adds a
neutral "This can take up to 5 minutes." banner on step 1, and lets the
provider picker auto-size.

Tests updated/added for the OpenRouter fallback chain, the collapse guard, and
the global notification path.
2026-06-24 23:11:21 -05:00
brooklyn!
380d660cab Merge pull request #52297 from NousResearch/bb/ad-hoc-verify
Support ad-hoc verification scripts
2026-06-24 23:10:15 -05:00
brooklyn!
d473e5d07a Merge pull request #52296 from NousResearch/bb/verify-stop-loop
Add verification stop loop
2026-06-24 23:10:03 -05:00
brooklyn!
1512bad0bc Merge pull request #52286 from NousResearch/bb/verify-status
feat(gateway): expose coding verification status
2026-06-24 23:09:45 -05:00
brooklyn!
da0320bf40 Merge pull request #52285 from NousResearch/bb/verify-ledger
feat(agent): record coding verification evidence
2026-06-24 23:07:10 -05:00
Brooklyn Nicholson
a5a2edd451 feat(agent): recognize focused ad-hoc verification scripts
Allow focused temporary scripts to satisfy verification when no canonical suite is detected, while keeping suite evidence distinct from ad-hoc proof.
2026-06-24 23:03:45 -05:00
Brooklyn Nicholson
2f1a47b90e feat(agent): require verification before finishing edits
Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.
2026-06-24 23:02:48 -05:00
Brooklyn Nicholson
7ef0f360d0 feat(gateway): expose coding verification status
Add a read-only gateway RPC for querying the passive verification ledger without running checks from the UI surface.
2026-06-24 22:36:03 -05:00
Brooklyn Nicholson
f0beb6f617 test(agent): cover verification evidence ledger
Exercise command classification, session scoping, stale edits, bounded retention, and natural expiry for recorded verification evidence.
2026-06-24 22:35:27 -05:00
Brooklyn Nicholson
fcbdf3c356 feat(agent): record coding verification evidence
Record foreground verification commands in a bounded, profile-scoped ledger and mark evidence stale when code edits change the workspace.
2026-06-24 22:35:27 -05:00
Victor Kyriazakos
b177d4ee48 fix(cron): mirror continuable cron as a labelled user turn (alternation-safe)
Addresses review on #51077 (kxee). The continuable-cron mirror reused
gateway.mirror.mirror_to_session, which writes role=assistant — re-
introducing the exact alternation violation #2313 (37a997945)
deliberately removed: a cron brief landing as assistant after the
agent's last turn yields assistant->assistant, which breaks strict-
alternation providers (OpenAI/OpenRouter) per issue #2221. The mirror/
mirror_source metadata is also dropped at the SQLite boundary, so the
[Delivered from cron] label is lost on replay.

This is an intentional, opt-in (default OFF) reversal of #2313's
'cron output does not belong in interactive history' for the reply-to-
cron use case — gated behind cron.mirror_delivery / attach_to_session.

Fixes:
- mirror_to_session gains a role param (default 'assistant' — interactive
  send_message mirror unchanged, it IS the agent speaking). Cron paths
  pass role='user' with a '[Cron delivery: <task>]' prefix so the brief
  collapses via repair_message_sequence's consecutive-user merge on every
  provider, and stays distinguishable on replay despite the metadata drop.
- thread_seeded: defer seeding + the flag until delivery into the new
  thread actually succeeds. Previously set pre-delivery, so an open-
  succeeds / deliver-fails case both stranded a seeded-but-unseen brief
  AND suppressed the DM-fallback mirror.
- seed mirror now passes user_id='system:cron' to resolve the exact
  thread-keyed session row it just created.
- dedupe the duplicate BasePlatformAdapter import in _deliver_result.
- trim oversized docstrings to non-obvious WHY (AGENTS.md).
- docs: document cron.mirror_delivery / attach_to_session in
  website/docs/user-guide/features/cron.md.
- test: assert the cron mirror writes role='user' with the label prefix.

204 cron+mirror tests pass.
2026-06-24 20:27:05 -07:00
Victor Kyriazakos
b693bee100 feat(cron): thread-preferred continuable delivery (open a thread, mirror DM fallback)
Continuable cron jobs (attach_to_session / cron.mirror_delivery, default
OFF) now prefer a dedicated thread on thread-capable platforms, falling
back to origin-DM mirroring where threads don't exist.

- Thread-capable (Telegram topics, Discord/Slack threads): open a fresh
  thread for the job via the shipped adapter.create_handoff_thread,
  route the brief into it, and seed the thread-keyed session so the
  user's in-thread reply continues with full context. This is the
  'continuable cron opens its own thread' interface.
- DM-only (WhatsApp/Signal/SMS): create_handoff_thread returns None ->
  fall back to mirroring into the origin DM session (existing behaviour).

Reuses existing infrastructure end-to-end — no new adapter surface, no
provider-chain signature change:
- adapter.create_handoff_thread (already implemented per-platform,
  returns None on unsupported platforms = the fallback signal)
- the live SessionStore via adapter._session_store (already set on every
  adapter), reached without threading a new param through the frozen
  CronScheduler.start() contract
- gateway.mirror.mirror_to_session for the seed/append
- existing per-target delivery routing carries the new thread_id for free

Mirrors GatewayRunner._process_handoff's open-thread-or-fallback +
seed pattern, standalone for the cron delivery path. thread_seeded
guards against a double-mirror after seeding. Scoped to the origin
target only; fan-out/broadcast targets are never threaded or mirrored.

Config docs updated (cron.mirror_delivery) + cronjob tool
attach_to_session description reframed around continuable/thread-preferred.

Tests: +5 (thread id returned on thread platform; None on DM platform;
None without capability/loop; seed creates thread session + mirrors;
seed no-op on empty). 22/22 in TestCronDeliveryMirror; 532 cron tests
pass (4 failures pre-existing: croniter-not-installed + TZ).
2026-06-24 20:27:05 -07:00
Victor Kyriazakos
98f3c19282 feat(cron): pass origin user_id to delivery mirror (send_message parity)
Multi-participant parity with interactive send_message, which passes
HERMES_SESSION_USER_ID to gateway.mirror.mirror_to_session so the mirror
lands in the exact participant's session.

- cronjob_tools._origin_from_env now captures user_id from the session
  context at job-create time (alongside platform/chat_id/thread_id).
- _maybe_mirror_cron_delivery forwards user_id to mirror_to_session.
- _deliver_result threads origin.user_id through for the origin target.

Effect: in a per-user-isolated group chat (group_sessions_per_user=True,
the default), the mirror resolves to the member who scheduled the job
instead of conservatively no-op'ing on ambiguous candidates. DMs and
shared group/thread sessions are unaffected (single candidate). Default
still OFF.

Tests: helper forwards user_id; E2E _deliver_result forwards origin
user_id. 17/17 in TestCronDeliveryMirror; 527 cron tests pass (4 failures
pre-existing: croniter-not-installed + TZ, identical on baseline).
2026-06-24 20:27:05 -07:00
Victor Kyriazakos
c06ceb3232 refactor(cron): scope delivery mirror to the origin conversation
The cron->session mirror now fires ONLY for the delivery target that
equals the job's origin (platform+chat_id[+thread_id]). A job created
from a live gateway chat stamps that chat as origin, and that session is
guaranteed to exist (it is the conversation the user scheduled the job
in). Fan-out / broadcast / home-channel-fallback targets are never
mirrored: they are not a continuation of a conversation and may have no
session at all.

This makes the prior 'cold-start session seeding' concern a non-case by
construction: when the mirror semantically applies the session exists;
when none exists the target was never the origin, so we no-op.

Adds _target_matches_origin() + origin-scoping tests (exact match,
other-chat/other-platform/no-origin rejection, thread scoping, fan-out
mirrors only the origin target).
2026-06-24 20:27:05 -07:00
Victor Kyriazakos
1b181724fa feat(cron): optional mirror of cron delivery into target chat session
Adds an opt-in path so a cron job's delivered output is also appended to
the TARGET chat's gateway session transcript (as an assistant turn), so a
user reply to a recurring delivery (daily brief, reminder) is answered with
the delivery in context instead of 'what is that?' amnesia.

- Reuses the shipped gateway.mirror.mirror_to_session — the same primitive
  interactive send_message mirroring already uses. No messaging-toolset
  change (cron still can't call send_message; this rides delivery).
- Gated: per-job attach_to_session overrides global cron.mirror_delivery
  (config.yaml). Default OFF — historical isolation preserved byte-for-byte.
- Mirrors the CLEAN agent output, not the cron header/footer wrapper.
- Alternation/cache-safe: append lands at a turn boundary, never mid-loop,
  never mutates the cached system prompt. Cold-start (no target session)
  is a silent no-op; mirror errors never fail a successful delivery.
- Surfaced on the cronjob tool (attach_to_session) + config schema.

Driven by enterprise cron-as-control-plane use case. 10 new tests; full
cron + cronjob-tool suites pass (600).
2026-06-24 20:27:05 -07:00
brooklyn!
532b7ed408 Merge pull request #52265 from NousResearch/bb/desktop-tool-verb-shimmer
fix(desktop): localize tool title shimmer
2026-06-24 22:01:11 -05:00
Brooklyn Nicholson
281b333cc5 test(desktop): cover localized tool title shimmer 2026-06-24 21:59:41 -05:00
Brooklyn Nicholson
f2c45e2c81 fix(desktop): limit pending tool shimmer to action verb
Localize tool titles and split pending rows so only the action segment
shimmers — paths, commands, and URLs stay static.
2026-06-24 21:59:41 -05:00
brooklyn!
cbe5c5689f perf(desktop): bound tool-result rendering so big /learn runs don't freeze (#52273)
ToolFallback rebuilt the `part` wrapper every render, defeating the
buildToolView memo and re-running a full JSON.stringify of the result on
every ~33ms stream delta. A /learn over a large directory (many ~100KB
tool results) saturated the renderer main thread (hang/throttle) and
spiked memory until it OOMd (crash).

- Re-derive a stable `part` from the referentially-stable args/result so
  the view/copy memos hold across deltas.
- Clamp every inline-painted payload (detail, stdout/stderr, rawResult,
  technical trace) to MAX_TOOL_RENDER_CHARS; the row's Copy button still
  reads the uncapped view.detail for the full output.
2026-06-25 02:52:51 +00:00
Ben
0c3f197cff fix(relay): re-attach DM author user_id on outbound for connector egress
A DM reply carries no guild_id, so the connector's egress guard cannot
resolve the owning tenant from metadata.guild_id and declines the send
with "discord egress declined: target not routed to an onboarded tenant"
— the bug behind "the bot never replies in DMs". Guild replies are
unaffected (they carry guild_id), which is why the guild path worked
end-to-end while DMs looked broken.

The connector now resolves a DM reply's tenant from the recipient's
author binding (gateway-gateway #67, resolveByUser keyed on
metadata.user_id) — the outbound counterpart to inbound Phase 7a
author-first resolution. But it needs the recipient user_id ON the
outbound action, and the adapter only re-attached guild_id
(_capture_scope/_with_scope), no-op for DMs (the docstring even said so).

This extends the adapter's inbound-scope capture: for a DM (no guild_id)
remember chat_id -> the authentic author user_id we observed, and
re-attach it as metadata.user_id on outbound. Guild capture is unchanged
and wins when present; user_id is the DM-only fallback. The id is the one
the connector observed inbound (never gateway-asserted), so the trust
invariant holds.

+4 unit tests (DM reply re-attaches user_id + no guild_id; unknown chat
invents nothing; explicit user_id preserved; guild reply never carries
user_id). Proved load-bearing (reverting the re-attach fails the DM
test). 144 relay tests pass, ruff clean.

Pairs with gateway-gateway #67 (the connector-side resolver). Together
they close the DM-reply egress gap end-to-end.
2026-06-25 12:43:54 +10:00
Ben Barclay
c15945655f fix(terminal): sanitize host/relative cwd OVERRIDE before it reaches docker run -w (#50636)
terminal_tool() resolves a per-task cwd override that WINS over config["cwd"]:

    cwd = overrides.get("cwd") or config["cwd"]

config["cwd"] is sanitized for container backends in _get_env_config() (host
prefixes /Users//home//C:\\/C:/ and relative paths are replaced with the
backend default /root). But the override was applied RAW — it was never run
through that guard. The gateway/TUI registers the host launch dir as a cwd
override for workspace tracking (tui_gateway/server.py _register_session_cwd
-> _terminal_task_cwd -> _session_cwd -> os.getcwd()), so on a container
backend a host path leaked straight to `docker run -w <host-path>`:

  - Windows desktop: -w C:\Users\<user>  -> container fails to start (exit 125)
  - POSIX:           -w /home/<user>      -> same

The ACP adapter translates its override cwd (acp_adapter/session.py
_translate_acp_cwd), but the gateway path did neither translation nor
sanitization, so the override bypassed the one guard that would have caught it.

Fix: extract the host/relative-path predicate into a shared
_is_unusable_container_cwd() helper (so the existing _get_env_config()
sanitizer and the new guard can't drift), and re-apply it to the *resolved*
cwd at the override-resolution site. Valid in-container override paths
(RL/benchmark sandboxes that set cwd to /workspace, /root, ...) are absolute
non-host paths and pass through untouched.

Tests: unit-pin the predicate (Windows backslash/forwardslash, POSIX home,
macOS /Users, relative, valid container paths) AND an E2E call-site pin that
drives terminal_tool() with a host-path override registered and asserts the
cwd reaching _create_environment is sanitized. Mutation-verified: reverting
the call-site guard makes the two host-path E2E tests fail (showing the raw
host path leaking) while the valid-/workspace-override test stays green.
2026-06-25 02:33:40 +00:00
Teknium
411faf08bd fix(soul): installers seed the real default persona, upgrade legacy empty templates (#52246)
The desktop bootstrap (and curl/PowerShell/docker installs) seeded
~/.hermes/SOUL.md with a comment-only scaffold that contained no persona
text. That shadowed the runtime default (_ensure_default_soul_md ->
DEFAULT_SOUL_MD), since seeding is guarded by 'if SOUL.md doesn't exist'.
Result: every fresh installer install got the empty template instead of
the documented Hermes persona; desktop just made it visible in onboarding.

- install.sh / install.ps1 / docker/SOUL.md now write DEFAULT_SOUL_MD.
- _ensure_default_soul_md() upgrades a SOUL.md still matching the known
  legacy scaffold in place; customized files (any deviation, incl. a
  persona appended below the comment) are never touched.
- Detection normalizes CRLF/BOM so Windows-installer drift still matches.
2026-06-24 18:56:26 -07:00
Teknium
a4fa1481e2 fix(tui): route /learn through command.dispatch so the prompt fires (#52232)
The Desktop GUI (tui_gateway) slash worker subprocess has no reader for
the CLI's _pending_input queue. /learn's CLI handler prints the ack and
puts the built prompt onto that queue, so in the TUI the prompt was
silently dropped — ack shown, no LLM turn, no skill created (#51829).

command.dispatch already handles 'learn' correctly (returns
{type: send, message: build_learn_prompt(arg)}), but 'learn' was missing
from _PENDING_INPUT_COMMANDS, so slash.exec fell through to the worker
instead of routing to command.dispatch. Add it to the frozenset, matching
the existing goal/queue/steer/plan pattern.
2026-06-24 18:48:50 -07:00
Ben
d1cac0e5ef feat(gateway): scale-to-zero idle detection + dormant-quiesce (Phase 0)
The gateway-side BEHAVIOUR layer that consumes the relay scale-to-zero
primitives (gateway-gateway Phase 5): the gateway decides it is idle and
drives the relay transport dormant so the platform (Fly autostop:"suspend")
can suspend the now-traffic-idle machine, which wakes on the connector's
wakeUrl poke (decisions.md Q3=C', D1-D13).

- gateway/scale_to_zero.py: pure helpers — scale_to_zero_enabled (the NAS
  Labs HERMES_SCALE_TO_ZERO stamp, D11/Q8=A), parse_idle_timeout_seconds
  (config.yaml gateway.scale_to_zero.idle_timeout_minutes, D2),
  messaging_is_relay_only_or_absent (F6/D1), should_arm (D1/D11/§3.4(1)),
  is_idle (D2/D3/F7).
- gateway/run.py: _last_inbound_at clock stamped on user inbound in
  _handle_message (F13); the arm-gate + idle predicate + the
  _scale_to_zero_watcher dormant sequence (mark draining -> adapter
  go_dormant() -> cooldown), started only when armed. Deliberately NOT the
  stop path and NOT mark_resume_pending (F12/D13).
- tools/process_registry.py: has_any_active() for the bg-work guard (D3/F7).
- hermes_cli/config.py: gateway.scale_to_zero.idle_timeout_minutes default 5.

Tests: 38 pure-logic + 6 watcher (incl. bg-work regression guard proven RED).
Full relay + scale-to-zero suites: 184 passed. The 20 unrelated failures in
the broader run are PRE-EXISTING on origin/main (custom-provider/tools tests),
confirmed via a pristine baseline worktree.
2026-06-24 18:47:18 -07:00
Ben
96af4bec30 feat(relay): add go_dormant() transport mode for scale-to-zero (0.E0)
Net-new WebSocketRelayTransport.go_dormant() + RelayAdapter.go_dormant() —
the third transport mode the scale-to-zero behaviour layer needs, distinct
from both disconnect() and an unexpected close (decisions.md D12/F14):

- disconnect() sets _closing=True and CANCELS the reconnect supervisor
  (terminal "shutting down for good") -> a suspended machine never re-dials
  on wake, stranding its buffered backlog.
- an unexpected close re-dials IMMEDIATELY -> the socket never stays down,
  so the platform proxy never suspends the machine.

go_dormant(): going_idle->ack (reuse go_idle), then close the socket WITHOUT
setting _closing, so the reader's fall-through still arms the reconnect
supervisor (wake path stays live) but on the longer _dormant_redial_s
cadence so it doesn't fight the platform suspend window. A successful re-dial
clears _dormant. Honors the §3.4 wake->reconnect->drain contract.

Tests: 6 new in test_relay_going_idle.py incl. the F14 regression guard
(routing dormancy through disconnect() fails exactly the 4 wake-path tests).
Full relay suite 140 passed.
2026-06-24 18:47:18 -07:00
xxxigm
4aeaba6922 test(desktop): cover undefined/null attachment holes in ref helpers
Regression for the refText crash: attachmentDisplayText and
optimisticAttachmentRef must return null (not throw) when handed an
undefined/null attachment hole, so the submit path can't reproduce
"Cannot read properties of undefined (reading 'refText')".
2026-06-24 18:22:01 -07:00
xxxigm
7e2db0a140 fix(desktop): stop refText crash on undefined composer attachment holes
A session switch or draft restore can leave undefined/null holes in the
composer attachments array. AttachmentList was guarded against this in
#49624, but the sibling submit path was not: submitPromptText maps the
same array through attachmentDisplayText/optimisticAttachmentRef and
buildContextText (a.kind / a.label / a.refText), so a hole threw
"Cannot read properties of undefined (reading 'refText')" — an uncaught
renderer error that blanks the chat pane and shows "Desktop app link
offline".

Close the whole bug class:
- attachmentDisplayText / optimisticAttachmentRef no-op on a falsy
  attachment (shared chokepoint, also protects thread.tsx drop handler).
- submitPromptText filters falsy entries from the source array, and
  buildContextText filters its (possibly post-sync) input before reading
  fields.
2026-06-24 18:22:01 -07:00
helix4u
17beb55e3c fix(telegram): gate rich draft previews separately 2026-06-24 18:11:14 -07:00
Gille
284be6cc24 Merge pull request #52210 from helix4u/fix/desktop-update-progress-visibility
fix(desktop): surface update progress lines
2026-06-24 19:45:05 -05:00
brooklyn!
7157b213f5 Merge pull request #47959 from NousResearch/bb/pets-gen
Pet generation: frame-perfect hatch flow, backend picker, CPU-safe chroma, and CI-hardening
2026-06-24 19:41:34 -05:00
brooklyn!
153ad79524 Merge pull request #52201 from NousResearch/bb/desktop-shallow-update-count
fix(desktop): don't report a bogus update count for a shallow checkout
2026-06-24 19:34:02 -05:00
Brooklyn Nicholson
a05a9b0e07 test(delegate): harden heartbeat in-tool stale timing assertion
Stabilize the long-running-tool heartbeat test by patching stale thresholds inside the test and asserting the heartbeat exceeds the idle ceiling, which preserves intent while removing scheduler-sensitive assumptions that flake in CI.
2026-06-24 19:33:40 -05:00
Brooklyn Nicholson
2ea94c6c45 fix(pets): make inline generate cancel discard draft flow
Wire the sparkle generate button's cancel action to the same discard/reset path as step-2 cancel so abort semantics are consistent and always return to step 1 while retaining the prompt input.
2026-06-24 19:33:33 -05:00
brooklyn!
d635a6d507 Merge pull request #52208 from NousResearch/bb/desktop-update-steps
fix(desktop): stop the update overlay looking frozen while it works
2026-06-24 19:29:02 -05:00
brooklyn!
42e14d1089 Merge pull request #52205 from NousResearch/bb/desktop-restart-profile
fix(desktop): route gateway restart / status / update to the active profile
2026-06-24 19:28:53 -05:00
brooklyn!
b649cdee4a Merge pull request #52203 from NousResearch/bb/update-drain-announce
fix(update): announce gateway drain waits so desktop updates don't look hung
2026-06-24 19:28:44 -05:00
Ben
538c419d2e fix(gateway): scope dashboard liveness fallback to the profile
PR #52151 hardened the runtime-status liveness check to trust a readable
live process command line over stale gateway_state.json argv, so a recycled
PID now owned by an s6 supervisor no longer counts as a running gateway.

That fix is correct but incomplete for the reported symptom: the web
dashboard showed a named profile's gateway green while
`hermes -p <name> gateway status` showed it stopped. Two further issues:

1. Cross-profile PID reuse. In per-profile Docker supervision, one profile's
   stale `gateway_state.json` can record a PID the OS later recycled onto a
   DIFFERENT profile's live gateway. That PID's command line still
   `looks_like_gateway`, so the dead profile was reported running. The
   recorded argv has its `-p <name>` selector stripped in-process by
   `_apply_profile_override`, so it cannot disambiguate; the live `/proc`
   cmdline still carries it. `get_runtime_status_running_pid` now accepts an
   `expected_home` and validates the live command line belongs to THAT
   profile (mirroring `hermes_cli.gateway._matches_current_profile`, the
   logic the CLI scan path already uses — which is why the CLI was correct).
   `_check_gateway_running` passes the enumerated profile dir.

2. The existing regression test `test_gateway_running_check_falls_back_to_
   runtime_state` used the live pytest PID with a gateway-shaped record; once
   the live cmdline became authoritative it no longer looked like a gateway.
   Updated to mock the live cmdline to the real separate-process scenario it
   describes.

The active-profile path (`get_running_pid`) is intentionally left unscoped:
it is lock-verified and any live gateway cmdline is acceptable there. Multiplex
mode is unaffected — `running` state is only ever written to a gateway's own
home, never a secondary served profile's.

Adds coverage for: cross-profile PID reuse (named + default), matching
profile cmdline (`-p`, `--profile`, explicit HERMES_HOME=), the bare default
gateway, and the unreadable-cmdline cross-platform fallback. Each new
cross-profile assertion fails without the profile scope and passes with it.

Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
2026-06-25 10:25:54 +10:00
helix4u
f1617a7ebb fix(gateway): validate runtime status pid command line 2026-06-25 10:25:54 +10:00
Brooklyn Nicholson
592c462e3c refine(pets): preserve user-requested tone in generation prompts
Remove cute/chibi-biased wording from base draft variations and explicitly preserve the requested mood across base and row prompts so scary, eerie, or other non-cute concepts are honored while keeping sprite constraints.
2026-06-24 19:22:00 -05:00
Brooklyn Nicholson
9a4600c5fb fix(desktop): stop the update overlay looking frozen while it works
Two ways the update overlay read as stuck even though the update was
streaming progress underneath:

- In-app (macOS/Linux) UpdatesOverlay: runStreamedUpdate forwards every
  stdout line as a progress event with percent: null, and ingestProgress
  wrote that straight through — clobbering the milestone percents (10/60)
  so the bar fell back to indeterminate on every log line. Keep the last
  percent when a line carries null.

- Staged install/update overlay: the bar is completedCount / totalCount,
  which counts only *finished* stages, so a long first stage pinned it at
  "0 of 2" / 0% until the stage ended. Count the running stage as half a
  unit so the bar advances during the stage (the per-stage spinner already
  shows which step is live).

Both are display-only; no stage/event semantics change. (The Windows
hermes-setup Tauri progress UI in apps/bootstrap-installer has the same
counter-only-on-completion logic — parity follow-up.)
2026-06-24 19:20:38 -05:00
Brooklyn Nicholson
00779800f6 fix(desktop): hide platform/internal toolsets from the Skills & Tools list
GET /api/tools/toolsets returns the full CONFIGURABLE_TOOLSETS set with no
desktop curation, so the Skills & Tools → Toolsets list shows entries that
don't belong in a flat per-user toggle: platform-coupled toolsets (discord,
discord_admin, yuanbao — which `hermes tools` already platform-restricts off
the CLI) and internal plumbing (context_engine, moa). `hermes tools` curates
these out; the desktop didn't.

Add a small documented block-list + predicate (mirroring
desktop-slash-commands.ts) and apply it in the toolset list filter. Hiding a
row is cosmetic — enabled state and runtime gating are untouched.
2026-06-24 19:18:29 -05:00
Brooklyn Nicholson
65b13e9dbc fix(desktop): route gateway restart / status / update to the active profile
restartGateway, getActionStatus, getStatus, updateHermes and
checkHermesUpdate all hit window.hermesDesktop.api WITHOUT spreading
profileScoped() — unlike their siblings (getModelInfo, setModelAssignment,
grantComputerUsePermissions). _apiProfile tracks the active gateway
profile, and the Electron proxy uses request.profile to pick which pooled
/ remote backend serves the call.

So for a multi-profile or global-remote user, the System-panel "Restart
gateway" (and its status poll, plus Update / status reads) targeted the
primary/default backend instead of the one they're on: the restart hit
the wrong gateway and the poll never saw the action → it looked like
restart silently failed. Single-profile users are unaffected
(profileScoped() returns {} when no profile is active).

Add ...profileScoped() to the five backend-action helpers so they follow
the active profile like the rest of the API surface.
2026-06-24 19:16:26 -05:00
AIalliAI
463bf2be25 fix(update): announce gateway drain waits so desktop updates don't look hung
On macOS, the desktop updater's stage 1 (hermes update --gateway) ends by
restarting running gateways. launchd_restart() SIGTERMs the gateway and
silently waits up to agent.restart_drain_timeout (default 180s) for the
drain; the manual profile-gateway loop waits its drain budget per gateway
the same way. Neither path prints anything before the wait, so the desktop
updater's live output goes dead for minutes right after '✓ Update
complete!' — users read it as a hung update and force-kill their gateway
processes to make it move (#44515). The systemd branch already announces
its drain ('draining (up to Ns)...'); launchd and the manual loop did not.

Print the stop/drain (with PID and budget) before the wait in both paths,
mirroring the systemd branch, and assert the message in the existing
launchd drain test.

Fixes #44515
2026-06-24 19:12:44 -05:00
briandevans
cb6edbf448 fix(desktop): skip the rev-list count when it is discarded anyway
checkUpdates() ran `git rev-list HEAD..origin/<branch> --count`
unconditionally in the parallel probe batch, even on the shallow +
no-merge-base path where resolveBehindCount() ignores the result and
falls back to a SHA compare. In the #51922 failure mode that count walks
the entire remote ancestry (thousands of commits), so the work was pure
latency on every update check for the exact case the fix targets.

Split the probes into two phases: resolve --is-shallow-repository and
merge-base first, then run rev-list --count only when shouldCountCommits
says the number is meaningful (full clone, or shallow-with-merge-base).
The shallow/no-merge-base SHA fallback is preserved unchanged.
2026-06-24 19:12:09 -05:00
briandevans
a6485bddb8 fix(desktop): don't report a bogus update count for a shallow checkout
The desktop installer clones with `--depth 1`, so a public install's local
history often shares no merge-base with the freshly fetched origin tip. In
that state `git rev-list HEAD..origin/<branch> --count` enumerates the
entire remote ancestry and returns a meaningless huge number, surfacing as
e.g. "v0.17.0 (+12104)" in the update indicator (#51922).

The official-SSH branch of checkUpdates() already sidesteps this by reporting
a binary up-to-date check (`behind: currentSha === targetSha ? 0 : 1`), and
hermes_cli/banner.py guards the identical class for the CLI banner. The
passive desktop count path was the one place the shallow guard was missing.

Detect shallow + no-merge-base up front and fall back to the same SHA-based
binary check; full clones (developers / Docker dev images) keep the exact
count path unchanged. The resolution logic lives in a pure update-count.cjs
helper so it is unit-testable without booting Electron.
2026-06-24 19:12:09 -05:00
Brooklyn Nicholson
1fe013ee16 feat(pets): polish generate flow and reduce hatch CPU pressure
Ship the final pet-generation UX polish (provider picker behavior, step-2 cancel flow, banner integration, and visual consistency) and make saturated-chroma background removal C-op driven so hatch processing no longer hammers the machine during long runs.
2026-06-24 19:08:06 -05:00
Ben
d335164833 fix(relay): authorize relay inbound via connector-enforced upstream authz
A hosted instance fronted by the Team Gateway connector dropped EVERY relay
message as "Unauthorized user" and the agent never replied — despite the
message routing correctly through the connector to the instance.

Root cause: gateway authorization (_is_user_authorized) had no notion of
upstream-enforced authz. Platform.RELAY matches no {PLATFORM}_ALLOWED_USERS
allowlist and isn't in the HA/WEBHOOK always-authorized set, so a relay user
with no env allowlist configured hit the default-deny ("No user allowlists
configured. All unauthorized users will be denied."). The message was received,
then silently denied before reaching the agent.

This is incorrect for relay: the connector authenticates the gateway's WS with
a per-instance secret and performs owner-only author-binding resolution BEFORE
delivering. A message only reaches this gateway because the connector resolved
it to THIS instance's bound user (user_instance_binding), keyed on the author id
the connector OBSERVED off the event — never a gateway claim. The authorization
decision is already made by a trusted, authenticated upstream; there is no local
RELAY_ALLOWED_USERS allowlist to consult, and default-denying for its absence is
the bug.

Fix: add a generic BasePlatformAdapter.authorization_is_upstream capability
(default False) that the relay adapter overrides to True, plus a dedicated
trusted branch in _is_user_authorized that honors it. This is delegation to a
trusted upstream, NOT a fail-open: it fires only for an adapter that explicitly
declares the flag; every direct network-exposed adapter leaves it False and the
env-allowlist default-deny (SECURITY.md §2.6) is unchanged. Distinct from
enforces_own_access_policy, which mirrors a LOCAL config-driven allowlist —
this delegates to an authenticated upstream's decision.

Tests: behavior contract that the base defaults False, the relay adapter
declares True, a relay user (group + DM) is authorized with no env allowlist,
and crucially a non-upstream adapter with no allowlist still default-denies
(guards against the fix becoming a blanket fail-open). 6 new tests; relay +
authz + config-policy suites green (134 + 90).

Found via live staging debug of the Discord self-serve onboarding flow.
2026-06-25 10:06:21 +10:00
brooklyn!
a378b1e980 Merge pull request #52192 from NousResearch/bb/session-loop-guard
fix(desktop): let the session watchdog heal a stuck "looping" turn
2026-06-24 19:03:43 -05:00
brooklyn!
4127332f15 Merge pull request #52189 from NousResearch/bb/desktop-offline
fix(desktop): give the gateway reconnect loop an escape hatch
2026-06-24 19:03:34 -05:00
brooklyn!
70650e82a3 Merge pull request #52187 from NousResearch/bb/desktop-voice
fix(desktop): wire Ctrl+B voice, declutter voice settings, stop endless TTS hang
2026-06-24 19:03:25 -05:00
brooklyn!
9a94865552 Merge pull request #52183 from NousResearch/bb/desktop-agents-status
fix(desktop): make Agents indicator match the Spawn-tree panel
2026-06-24 19:03:11 -05:00
Brooklyn Nicholson
93192059c9 fix(desktop): let the session watchdog heal a stuck "looping" turn
The 8-minute stream-silence watchdog only removed a stuck session from
$workingSessionIds (the sidebar dot). The composer's busy state lives in
the session-state cache and was never cleared, so a hung or looping turn
that never delivered its terminal event — including an old session
re-opened while the backend still reports it "running" — stayed wedged on
"Thinking" / Stop indefinitely.

Have the watchdog notify subscribers when it force-clears a session, and
subscribe from the session-state cache to also drop that session's
busy/awaiting/needsInput flags. updateSessionState re-syncs $busy when the
healed session is the one on screen, so the composer recovers instead of
spinning forever.

Frontend-only safety net; doesn't touch the turn lifecycle. The backend
root (a stale in-memory session["running"] surviving a dead turn thread
and re-arming busy on every resume) is a separate follow-up.
2026-06-24 18:36:17 -05:00
Brooklyn Nicholson
2a75c4a8cb fix(desktop): give the gateway reconnect loop an escape hatch
When a remote gateway dropped after a healthy boot (internet loss,
sleep/wake, VPS restart), use-gateway-boot retried with backoff forever
and never surfaced an error. The renderer sat behind the fullscreen
CONNECTING overlay with gatewayState non-open and boot.error null — no
way to reach Settings, sign in again, or switch to a local gateway. To
the user the app was simply broken on connection loss.

Raise a recoverable boot error once the reconnect loop crosses
RECONNECT_ESCALATE_AFTER (6 attempts, ≈45s), so the BootFailureOverlay
(Retry / Sign in / Use local gateway) replaces the dead-end CONNECTING
screen. The loop keeps retrying underneath; the next successful reconnect
(or a manual/wake-driven one) clears the error and dismisses the overlay.

This implements the contract already specified — but never wired up — in
use-gateway-boot.test.tsx (desktop vitest isn't in CI, so the failing
"FIX:" specs went unnoticed). All 4 hook tests + the 3 connecting-overlay
tests pass.
2026-06-24 18:32:29 -05:00
Brooklyn Nicholson
8d1706ae5c fix(desktop): wire Ctrl+B voice, declutter voice settings, stop endless TTS hang
Three voice-mode papercuts in the desktop app:

1. Ctrl+B did nothing. The docs + `voice.record_key` advertise Ctrl+B to
   talk, but the desktop never bound it (only ⌘B = sidebar existed). Add a
   rebindable `composer.voice` action that toggles the voice conversation,
   defaulting to ⌃B on macOS (distinct from ⌘B; off-macOS `ctrl` folds to
   the sidebar chord, so it ships unbound there to avoid stealing it). The
   global keybind reaches the composer through a new focus-bus event.

2. The Voice settings page rendered every provider's options at once (~30
   fields). Filter to the *selected* TTS/STT provider's sub-fields; STT
   provider fields hide when STT is off. Picking "edge" now shows just the
   Edge voice, making it obvious voice chat also needs STT enabled.

3. Voice mode could hang "speaking" forever. Free Edge TTS sometimes returns
   audio that never fires `playing`/`ended`/`error`, so the playback promise
   never settled. Add a stall watchdog (rearmed on each progress tick, so
   long speech is never cut off) that rejects a stuck stream, letting the
   loop recover with a clear error.
2026-06-24 18:26:14 -05:00
Ben
41b9b7e719 test(lazy-deps): make durable-target tests network-free
CI test shard has no PyPI egress: the real 'pip install packaging==20.9'
in test_core_package_is_not_shadowed failed (the pypi.org reachability
probe passed but the actual install didn't), failing slice 2/6.

- Prove the anti-shadow invariant deterministically: synthesize a fake
  'packaging' in the durable target with a sentinel and assert the import
  still resolves to the core copy (TestCoreNeverShadowed). No network.
- Cover the install wire offline: stub subprocess and assert --target +
  --constraint are built in durable mode and absent in venv-scoped mode
  (TestInstallArgConstruction).
- Gate the genuine PyPI install behind HERMES_RUN_NETWORK_TESTS=1 (opt-in,
  skipped in CI) instead of a flaky reachability probe that doesn't predict
  install success.
2026-06-25 09:20:13 +10:00
Ben
cbd6ba1bdd fix(docker): redirect lazy installs to a durable target so opt-in backends work in the immutable image (#51136)
The published Docker image seals the agent venv (root-owned, read-only
/opt/hermes) and sets HERMES_DISABLE_LAZY_INSTALLS=1 so a runtime install
can't mutate and brick the core. But opt-in backends (Firecrawl web search,
Exa, Feishu, ...) deliberately keep their SDKs in tools/lazy_deps.py and out
of [all] (pyproject policy 2026-05-12: one quarantined release must not break
every install). The two policies collided: the SDK isn't baked in AND can't
lazy-install, so the default Firecrawl web_search/web_extract fail out of the
box in Docker (#51136), as do Exa (#49445) and Feishu (#50205).

Fix the whole class instead of baking in one backend: when
HERMES_LAZY_INSTALL_TARGET is set, lazy installs are redirected to a writable
dir on the durable /opt/data volume via `pip/uv install --target`, and that
dir is APPENDED to the end of sys.path. Because the core venv always wins
name collisions, a package installed this way can only ADD new modules — it
can never shadow, downgrade, or break a module the core ships. The worst a
bad/incompatible backend package can do is fail to import and report itself
unavailable; the agent core stays healthy. That structural guarantee is what
made it safe to seal the venv, and it is preserved here even with installs
re-enabled.

- tools/lazy_deps.py: durable-target mode — `--target` install + core-pinned
  `--constraint` file (shared deps resolve to core's versions, conflicts fail
  loudly at install time), append-only sys.path activation, ABI/Python-version
  stamp that wipes the store if an image rebuild bumps the interpreter, and a
  reworked gate so HERMES_DISABLE_LAZY_INSTALLS=1 redirects (rather than hard-
  blocks) when a target is set. security.allow_lazy_installs=false still
  disables installs in every mode.
- hermes_bootstrap.py: activate the durable target on sys.path at first import
  (before any backend imports its SDK) so packages installed on a previous run
  are importable on this run.
- Dockerfile: set HERMES_LAZY_INSTALL_TARGET=/opt/data/lazy-packages.
- docker/stage2-hook.sh: seed + chown the dir on the data volume.
- tests: real-install E2E proving installs land in the target, import cleanly,
  don't leak into the sealed venv, and that a core package is never shadowed;
  ABI-stamp wipe/preserve; gate matrix; Dockerfile/stage2 contract test.

Fixes #51136
2026-06-25 09:20:13 +10:00
Brooklyn Nicholson
a268dfff0a fix(desktop): make Agents indicator match the Spawn-tree panel
The status-bar "Agents" item conflated three unrelated signals — running
subagents (aggregated across all sessions), in-flight session turns, and
failed background *system* actions (gateway restarts, toolset installs,
computer-use grants via $desktopActionTasks/preview restart) — yet
clicking it opens AgentsView, which renders only subagents. A failed
gateway restart therefore showed "Agents (1 Failed)" over an empty
"No live subagents" tree. AgentsView also filtered to the active session,
so a subagent running in a background session showed "Agents N running"
with nothing in the tree (the desync reported in #49808).

Unify the scope both surfaces speak:
- AgentsView aggregates subagents across every session (salvages #49819).
- The indicator's running/failed counts come from subagents only
  (aggregated), never background system actions — those keep their own
  surfaces in settings / command center.

So "Agents (N …)" now always points at a populated Spawn tree.

Supersedes #49819. Fixes #49808.
2026-06-24 18:16:14 -05:00
liuhao1024
404b06ac4f fix(gateway): honor server retry_after in _send_with_retry for Telegram flood control (#46762)
When Telegram's sendRichMessage returns a FloodWait/RetryAfter error,
_try_send_rich() now extracts the server-provided retry_after value and
propagates it through SendResult.retry_after. The base _send_with_retry()
layer honors this value instead of using its default short exponential
backoff (~2s, ~4s), preventing the retry budget from being exhausted
against a server that demands a 25-37s wait.

Salvaged from #46774 by @liuhao1024. Telegram adapter path moved from
gateway/platforms/telegram.py to plugins/platforms/telegram/adapter.py
since the original PR.

Closes #46762
2026-06-25 02:43:47 +05:30
kshitij
cedbb4cfa2 Merge pull request #52140 from NousResearch/salvage/47707-tool-schema-validation
fix(agent): validate context/memory tool schemas before wrapping (#47707)
2026-06-25 02:36:19 +05:30
kshitij
085096fd59 Merge pull request #52135 from NousResearch/salvage/51826-tirith-mkdtemp-oerror
fix(tools): catch mkdtemp OSError in tirith install (#51826)
2026-06-25 02:35:27 +05:30
kshitij
7d2c1f3f84 Merge pull request #52134 from NousResearch/salvage/42449-deepcopy-ctx-engine
fix(agent): deepcopy plugin context engine to prevent parent corruption on delegate_task (#42449)
2026-06-25 02:28:37 +05:30
Bartok9
710cd48fb1 fix(agent): validate context/memory tool schemas before wrapping
Closes #47707

Context engines and memory providers expose tool schemas via
get_tool_schemas(). agent_init.py wrapped each as
{"type":"function","function":_schema} without validating that
_schema carries a top-level name. A provider returning an entry already
in OpenAI tool form ({"type":"function","function":{...}}) was then
double-wrapped into a tool whose function has no name. Strict providers
(e.g. DeepSeek) reject the entire request with HTTP 400
'tools[N].function: missing field name', so one malformed schema
silently disables the whole toolset and breaks every turn. The schema
was also never added to valid_tool_names, so even lenient providers
could not call it.

Add a shared normalize_tool_schema() helper that unwraps an
already-wrapped entry and returns None for anything lacking a resolvable
string name. Wire it into the agent_init context-engine loop and all
three memory_manager surfaces (inject_memory_provider_tools,
add_provider routing index, get_all_tool_schemas), so a single bad
plugin schema is skipped with a warning instead of poisoning the
request.

Verification: 209 targeted agent/memory tests pass (incl. 9 new).
New tests assert the unwrap + skip-nameless behavior and fail without
the fix.
2026-06-25 02:17:29 +05:30
liuhao1024
dbf0797335 fix(tools): catch mkdtemp OSError in tirith install to prevent unbounded retry and temp-dir leak (#51826)
When tempfile.mkdtemp() raises OSError (e.g. disk full), the exception
propagated past the try/finally block, so _mark_install_failed() was
never called. The 24h backoff marker never engaged, causing unbounded
retry on every command -- each attempt leaked a tirith-install-* temp
directory, eventually filling /tmp completely.

Fix: wrap mkdtemp in its own try/except OSError, returning
(None, "no_space") so the caller's normal failure path (including
_mark_install_failed) executes.

Salvaged from #51831 by @liuhao1024.

Closes #51826
2026-06-25 02:13:56 +05:30
liuhao1024
8d1f6debfd fix(agent): deepcopy plugin context engine to prevent parent corruption on delegate_task (#42449)
When delegate_task spawns a child agent with a different model/provider, the
child's init_agent loaded the plugin context-engine GLOBAL singleton by
reference (`_selected_engine = _candidate`) and then called update_model() on
it with the child's (smaller) context_length. Because parent and child shared
the same object, this mutated the PARENT's compressor: e.g. DeepSeek 1M ctx
silently dropped to 204800 and the compression threshold from 200K to 40K
after any delegate_task with a different model.

Deepcopy the singleton before assigning/mutating it (agent_init.py) so the
child gets its own instance and the parent's compressor is untouched.

Salvaged from #42452 by @liuhao1024 (authorship preserved). Added a
source-pin regression test that fails if the production line reverts to the
bare alias, plus an end-to-end test driving get_plugin_context_engine() and a
StubEngine.update_model() — the original PR's tests exercised copy.deepcopy in
isolation but did not guard the actual agent_init code path.

Closes #42449. Supersedes #42469, #42474 (same one-line fix, no test).
2026-06-25 02:13:26 +05:30
kshitij
77d2b50751 Merge pull request #52118 from NousResearch/salvage/36776-ddgs-timeout
fix(ddgs): bound DuckDuckGo search with a wall-clock timeout (#36776)
2026-06-25 01:56:26 +05:30
kshitij
4d589b1e13 Merge pull request #52121 from NousResearch/salvage/43466-strip-cronjob-toolset
fix(delegate): strip cronjob toolset from delegated children (#43466)
2026-06-25 01:54:37 +05:30
uzunkuyruk
489b85ee1e fix(ddgs): bound DuckDuckGo search with a wall-clock timeout (#36776)
A single ddgs (DuckDuckGo) search could hang indefinitely and block the
shared agent loop — and therefore every platform (CLI, Telegram, Matrix...).
The DDGS constructor's timeout only bounds individual HTTP requests; ddgs's
multi-engine retry loop has no overall cap, so a slow/rate-limited response
could spin for 20+ minutes with no output and no error.

Run the synchronous ddgs call in a single-worker ThreadPoolExecutor and cap
it with future.result(timeout=_SEARCH_TIMEOUT_SECS=30). On timeout, return a
clear failure ("DuckDuckGo search timed out ... try a different provider")
instead of blocking; the pool is shut down with cancel_futures so a hung
worker is never awaited.

Salvaged from #37422 by @uzunkuyruk (authorship preserved). Re-applied on
current main (the PR's provider.py base had diverged). Added a load-bearing
timeout regression test (the original PR only updated the fake's constructor
and had no timeout-behavior test) — mutation-verified to fail without the cap.

Closes #36776.
2026-06-25 01:45:06 +05:30
kshitijk4poor
e25b56fc64 chore: AUTHOR_MAP entry for riyas22 (PR #43687 salvage) 2026-06-25 01:39:11 +05:30
Riyasudeen Farook
1e4df599ec fix(delegate): strip cronjob toolset from delegated children (#43466)
_strip_blocked_tools used a hardcoded set missing 'cronjob'. Children
on gateway platforms could inherit the cronjob toolset, scheduling
persistent jobs that outlive the delegation despite DELEGATE_BLOCKED_TOOLS.

Fix: derive the strip set from DELEGATE_BLOCKED_TOOLS at runtime so the
two lists can never drift. Add 'cronjob' to DELEGATE_BLOCKED_TOOLS for
documentation consistency. Two regression tests lock the invariant.

Salvaged from #43687 by @riyas22. Adapted test to current main (no
'messaging' toolset exists -- send_message is intentionally not
registered as an agent tool).

Closes #43466
2026-06-25 01:37:25 +05:30
kshitij
7a79a4447c Merge pull request #52116 from NousResearch/fix/46994-session-load-bool-iterable
fix(gateway): skip non-dict entries in session loading (#46994)
2026-06-25 01:33:36 +05:30
kshitij
8f0a12ce09 Merge pull request #52114 from NousResearch/salvage/27405-preflight-fewbig
fix(agent): trigger preflight compression on few-but-huge sessions (#27405)
2026-06-25 01:27:07 +05:30
kshitijk4poor
9c994377ed fix(gateway): skip non-dict entries in session loading (#46994)
Corrupted sessions.json entries (e.g. a bare bool where a dict is
expected) caused TypeError on 'origin' in data' which escaped the
(ValueError, KeyError) inner except and aborted loading ALL remaining
sessions, not just the corrupted one.

Two-layer fix:
- Loop level: isinstance(entry_data, dict) guard before from_dict
- from_dict: isinstance(data['origin'], dict) instead of bare truthiness
- Added TypeError to the inner except as defense-in-depth

Closes #46994
2026-06-25 01:26:13 +05:30
texhy
aacc6bb0a8 fix(agent): trigger preflight compression on few-but-huge sessions (#27405)
The preflight-compression gate only ran the (expensive) token estimate when
the message COUNT exceeded protect_first_n + protect_last_n + 1. A session
with a handful of very large messages never tripped the count condition, so
compression was never attempted and the turn eventually hit a hard
context-overflow error.

Add _should_run_preflight_estimate() with OR semantics: run the estimate when
either the message count exceeds the protected ranges (the historical gate)
OR a cheap char-based estimate already crosses the configured threshold. The
downstream estimate_request_tokens_rough() stays authoritative — this is only
a hint that decides whether to pay for the full estimate.

Salvaged from #27435 by @texhy (authorship preserved). Re-applied on current
main: the preflight gate moved from conversation_loop.py to turn_context.py
since the PR was opened, so the helper + gate are placed there; the test
imports the real MINIMUM_CONTEXT_LENGTH instead of a hardcoded literal.

Closes #27405.
2026-06-25 01:20:23 +05:30
kshitij
ed1fdb5b61 Merge pull request #52112 from NousResearch/revert/52053-minimum-context-floor
revert(plugins): revert minimum context floor configurable (#52053)
2026-06-25 01:11:53 +05:30
kshitijk4poor
e0272cfef2 Revert "fix(compression): make minimum context floor configurable (#31600)"
This reverts commit cae1ee44a7.
2026-06-25 01:04:44 +05:30
kshitij
59acaa972f Merge pull request #52053 from NousResearch/salvage/31600-minimum-context-length-configurable
fix(compression): make minimum context floor configurable (#31600)
2026-06-25 01:02:52 +05:30
kshitij
6800fd6608 Merge pull request #52091 from NousResearch/salvage/42874-memory-drift-guard-add
fix(memory): skip drift guard for add (append-only) action (#42874)
2026-06-25 00:58:39 +05:30
Tranquil-Flow
cae1ee44a7 fix(compression): make minimum context floor configurable (#31600)
Add compression.minimum_context_floor config key that allows users
to lower the compression threshold floor below the hardcoded 64K
default, preventing infinite tool-call loops on models whose
structured output degrades well before 64K tokens.

- agent/model_metadata.py: add get_configurable_minimum_context()
  helper with 16K hard safety limit
- agent/context_compressor.py: accept minimum_context_floor param,
  thread it through _compute_threshold_tokens
- agent/conversation_compression.py: use compressor's floor for
  aux model context validation
- agent/agent_init.py: read compression.minimum_context_floor from
  config and pass to ContextCompressor
- gateway/run.py: cache-busting includes new key

Salvaged from #31686 by @Tranquil-Flow onto current main.
Resolves conflicts with in-place compaction (#38763) and max_tokens
threshold computation (#43547) that landed after the original PR.

Closes #31600
2026-06-25 00:56:04 +05:30
liuhao1024
25e2312230 fix(memory): skip drift guard for add (append-only) action (#42874)
The drift guard (introduced for #26045) correctly protects replace/remove
from clobbering un-roundtrippable content, but it also fires on the add
path. Since add only appends and never overwrites, the guard is
unnecessary and causes false positives when prior add() calls in the same
session shift the byte count of the on-disk file.

Add skip_drift parameter to _reload_target() and pass True from add().
Replace/remove continue to use the drift guard unchanged.

Salvaged from #42880 by @liuhao1024.

Closes #42874
2026-06-25 00:51:12 +05:30
Jeffrey Quesnelle
b13e2fd694 Merge pull request #52044 from NousResearch/fix/install-venv-kill-venv-processes
fix(install): kill venv-resident gateway before recreating venv on Windows
2026-06-24 15:16:58 -04:00
Brooklyn Nicholson
b674f7ba28 feat(pets): offer backend setup when generation is unavailable
When no reference-capable image backend is configured, generating a pet is
impossible — so instead of a dead prompt + post-hoc error, the overlay now
detects it up front and offers a way out:

- pet.generate.status RPC reports whether a reference-capable provider
  (OpenRouter / Nous Portal / OpenAI) is set up; the overlay probes it on
  open and swaps the prompt for a friendly setup card (paw, one-line copy,
  "Set up image generation" → /settings?tab=providers, key links).
- useRouteOverlayActive(): reusable hook so any portaled modal yields the
  screen to a full-screen route overlay (e.g. settings) and reappears —
  re-running its mount effects — on return, instead of closing. The probe
  re-runs on that remount, so adding a key flips the card to the prompt.
2026-06-24 14:10:19 -05:00
kshitij
9214aa7dde Merge pull request #52090 from NousResearch/salvage/35994-reset-deadlock
fix(gateway): offload agent cleanup off the event loop in /new reset (#35994)
2026-06-25 00:34:21 +05:30
kshitijk4poor
0225480369 fix(gateway): offload agent cleanup off the event loop in /new reset (#35994)
The /new (and /reset) confirmation-button callback runs the slash-confirm
handler on the asyncio event loop (see _request_slash_confirm). That handler
calls _handle_reset_command, which invoked the SYNCHRONOUS, potentially
long-blocking _cleanup_agent_resources inline: agent.close() tears down
terminal sandboxes, browser daemons and background processes (subprocess
waits), and shutdown_memory_provider() can make a network call. A slow
teardown wedged the entire event loop, so the bot went silent and stopped
processing all messages until a manual restart.

Offload _cleanup_agent_resources via the existing contextvar-preserving
_run_in_executor_with_context helper, bounded by asyncio.wait_for with a
named _RESET_CLEANUP_TIMEOUT_S (30s). The loop is never blocked; on timeout
the reset proceeds and the worker thread is left to finish on its own (it
cannot be cancelled). The text /new path is unaffected (already off-loop).

Tests (tests/gateway/test_35994_reset_button_deadlock.py): the loop keeps
ticking while close() blocks in its worker thread; a cleanup that raises is
swallowed (warning logged) and the reset still rotates the session; a
cleanup that times out degrades gracefully. All three are mutation-verified
to fail without their respective production branch.
2026-06-25 00:27:22 +05:30
Brooklyn Nicholson
743985bf1e feat(pets): Pokédex generate UI — overlay, animated egg, hatch FX, manage
Dedicated generate modal (Cmd-K → Pets → Generate): prompt → 2×2 draft
grid → egg hatch → preview → adopt, width fits each phase.

- Reuses shared primitives (Button/Input/Dialog/Alert/GenerateButton);
  cards use selectableCardClass; only canvas + range stay raw.
- Animated creme pixel egg + PetStarShower hatch celebration (canvas).
- Live streamed drafts with a real Stop (AbortSignal); clean default name.
- Manage generated pets: badge + top ranking, rename (optimistic), safe
  delete (confirm + drop), export — in both the Cmd-K and Settings lists.
- pet-gallery routes every RPC through profile-scoped petRpc; i18n ×5.
2026-06-24 13:51:34 -05:00
Brooklyn Nicholson
aab49f6927 feat(pets): generation RPCs, non-blocking gallery + gateway plumbing
- pet.generate / pet.hatch (parallel rows, off the reader thread) +
  cooperative pet.cancel; pet.export / pet.rename.
- pet.gallery localOnly fast path + background manifest prefetch so the
  picker never blocks on petdex; rename follows the active-pet config.
- gateway request gains optional timeout + AbortSignal for real Stop.
2026-06-24 13:48:38 -05:00
Brooklyn Nicholson
3faf768cde feat(pets): OpenRouter + Nous Portal image backend
Reference-grounded image provider over the OpenRouter-compatible
chat-completions image protocol (Gemini Flash Image et al.). Nous Portal
proxies OpenRouter, so one provider serves both — giving pet generation a
reference-capable backend beyond OpenAI gpt-image.
2026-06-24 13:48:38 -05:00
Brooklyn Nicholson
32f837add1 feat(pets): prompt → atlas sprite-generation engine
Turn a text prompt into a petdex-spec spritesheet (8×9 grid of 192×208
cells), grounded so every animation row stays the same creature:

- orchestrate: base drafts (distinct variation nudges) → per-row grounded
  generation → atlas compose; one image call per row, rows fan out in parallel.
- atlas: frame-perfect registration in normalize_cells — 1-D cross-correlation
  of each frame's column-mass profile locks the body (robust to limbs/cape),
  one shared per-state scale, bottom-anchored; plus alpha-hole repair, gutter
  severing, and interior-seeded chroma-pocket clearing.
- prompts: pixel-art-by-default style hints + registration constraints.
- store: local pet write (register_local_pet), slugify/unique_slug,
  export_pet, slug-realigning rename_pet, createdBy provenance.
2026-06-24 13:48:29 -05:00
kshitij
de281bcebc Merge pull request #52084 from NousResearch/salvage/31884-silent-drop-after-stop
fix(gateway): surface retry hint instead of silently dropping turn after /stop (#31884)
2026-06-25 00:06:32 +05:30
kshitij
5b065e32ed Merge pull request #51051 from NousResearch/salvage/cron-provider-pin
fix(cron): fail closed when an unpinned job provider drifts from creation snapshot (#44585)
2026-06-25 00:05:52 +05:30
brooklyn!
a130b62678 Merge pull request #52086 from NousResearch/bb/salvage-desktop-window-state
feat(desktop): remember window size/position/maximized across launches (salvage #39154)
2026-06-24 13:35:46 -05:00
Brooklyn Nicholson
2de7549fe0 feat(desktop): remember window size/position/maximized across launches (salvage #39154)
The desktop window opened at a hardcoded 1220×800 every launch, discarding
whatever size and position the user left it at (#39101) — on macOS the dock
reopen was the most visible case, but every restart reset it.

A small window-state.json under userData (same pattern as connection.json /
updates.json) records the window's normal bounds plus its maximized flag,
written debounced on resize/move/maximize and flushed on close, applied on the
next createWindow(). getNormalBounds() captures the pre-maximize size so an
un-maximize next session lands where the user actually sized it.

Restore is defensive: sanitize rejects garbage, drops off-screen positions
(window falls back to Electron centering), and caps a size saved on a
since-disconnected larger monitor to the largest current display. The geometry
math lives in a side-effect-free window-state.cjs so it unit-tests with
node --test, no Electron boot. No new dependency.

Salvages #39154 by @jeffrobodie-glitch — same userData approach and validation
intent, reimplemented tighter and folded into one module.

Co-authored-by: jeffrobodie-glitch <jeffrobodie@gmail.com>
2026-06-24 13:32:05 -05:00
sweetcornna
b41d9b845d fix(gateway): surface retry hint instead of silently dropping turn after /stop (#31884)
After /stop, the next user message can hit a stale generation token and
return with api_calls=0, no failure, no interruption. _normalize_empty_agent_response
fell through to an empty string, so the gateway logged "response=0 chars"
and sent nothing — the message was silently lost while internal work
sometimes continued.

Add the api_calls==0 / not-failed / not-interrupted / not-partial branch
to the single normalization chokepoint so the user gets a short retry hint
instead of silence. Regression test asserts the hint surfaces.

Salvaged from #33851 (re-applied on current main; original was 1401 commits
behind and the function had moved).
2026-06-24 23:51:31 +05:30
brooklyn!
35e9c63d89 Merge pull request #52008 from infinitycrew39/fix/desktop-nous-onboarding-stale-provider
fix(desktop): stop Nous Portal onboarding from validating stale Anthropic config
2026-06-24 13:12:44 -05:00
emozilla
6638199c53 fix(install): harden venv-resident process sweep on Windows
Follow-up to the salvaged venv-recreate fix. Three changes to the
Install-Venv pre-delete sweep:

- Match the venv path with a case-insensitive StartsWith instead of the
  PowerShell -like operator. A venv path containing wildcard
  metacharacters ('[', ']') — legal in a Windows user name — silently
  fails to match under -like, which would let the locking process slip
  through and reintroduce the exact access-denied failure this fix
  closes.
- Retry Remove-Item once after a short pause. A force-killed process can
  take a moment to release its file handles, so the first delete may
  still hit a locked .pyd; retry before failing the stage.
- Note in a comment that the gateway autostart task runs at LIMITED
  integrity as the current user, so the installer always runs at
  equal-or-higher integrity and can read the process executable path,
  and that Get-CimInstance is preferred over Get-Process because it
  returns a null path for an uninspectable process instead of throwing.

Adds a regression test asserting the recreate branch sweeps by venv path
prefix, uses StartsWith rather than -like, and runs the sweep before
Remove-Item.

Covers issues #47036, #47557, #47910.
2026-06-24 13:25:44 -04:00
Dana Moverman
7e55b934ea fix(install): kill gateway running from venv before recreating it (Windows)
The Windows venv-recreate guard only runs `taskkill /IM hermes.exe`, but the
gateway that a scheduled task or watchdog autostarts runs as
`pythonw.exe -m hermes_cli.main gateway run` straight out of venv\Scripts\.
Its image name is python/pythonw, so taskkill never matches it; it keeps the
venv's native extensions (e.g. tornado\speedups.pyd) loaded, and the following
Remove-Item fails with "Access to the path is denied" -- aborting boot at the
venv stage so the desktop app never loads.

Additionally stop any process whose executable lives under this venv, matched
by path so the image name is irrelevant and a global/system python outside the
venv is never touched.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-24 13:22:36 -04:00
infinitycrew39
d8fe1c0b41 test(desktop): cover scoped onboarding runtime readiness checks
Assert setup.runtime_check honors provider params and that Nous OAuth
onboarding persists model config before validating the connected provider.
2026-06-24 23:19:51 +07:00
infinitycrew39
6da615c77c fix(desktop): scope onboarding runtime check to connected provider
Let setup.runtime_check accept an optional provider, persist the selected
provider/model before the gate, and validate the provider the user just
connected instead of a stale config entry such as anthropic.
2026-06-24 23:19:45 +07:00
Teknium
9259d1e5da chore(desktop): sync package-lock version to apps/desktop 0.17.0
The apps/desktop workspace was bumped to 0.17.0 in apps/desktop/package.json
but package-lock.json still recorded 0.15.1, so npm install reports the lock
as out of date and rewrites it on every fresh install. Regenerate the lock
(npm install --package-lock-only) to record the current 0.17.0; one-line
change, no dependency resolution churn.
2026-06-24 07:50:30 -07:00
kshitij
c42d44cb2f revert(plugins): restore user dashboard plugin backend API auto-import (#43719) (#51950)
* Revert "refactor(security): centralize non-bundled plugin sources in one constant"

This reverts commit e2bea0abe6.

* Revert "fix(security): restrict dashboard plugin backend import to bundled plugins (#43719)"

This reverts commit 8845f3316c.
2026-06-24 07:46:54 -07:00
kshitij
7fb2027d85 Merge pull request #51881 from NousResearch/fix/29559-compression-abort-on-network-failure
fix(compression): abort + preserve context on transient network summary failure (#29559, #25585)
2026-06-24 19:54:21 +05:30
kshitij
f477f892b3 Merge pull request #51043 from NousResearch/salvage/tui-config-destruction
fix(tui): preserve config on model switch — atomic writes + custom-provider guard (#48305)
2026-06-24 19:42:56 +05:30
kshitijk4poor
fce2af780f chore(release): add Elshayib to AUTHOR_MAP (PR #48351) 2026-06-24 19:34:33 +05:30
Elshayib
1a435a6d5d fix(model-switch): prevent custom-provider misattribution in model picker (#48305)
When the current provider is a custom endpoint (custom or custom:*), the model
switch pipeline must NOT auto-switch to a native provider/OpenRouter based on a
static-catalog match. The user explicitly configured their own endpoint and the
same model name may be served there; silently rewriting model.provider destroys
their config.

- detect_static_provider_for_model(): skip the static-catalog scan when the
  current provider is custom/custom:*
- switch_model() Step e: extend is_custom to cover custom:* so the
  detect_provider_for_model() last-resort fallback cannot fire

Salvaged from #48351 by Elshayib (authorship preserved).

Fixes #48305
2026-06-24 19:34:33 +05:30
kyssta-exe
b85c460540 fix(tui): targeted save_config_value for model persistence (#48305)
The TUI model-switch persistence (_persist_model_switch) rewrote the entire
model config block via save_config(), destroying sibling keys the user set
under model: (model_slots, model_fallback, base_url, ...) on every switch.

Use targeted, atomic, comment-preserving save_config_value("model.default" /
"model.provider" / "model.base_url") writes instead, so a model switch only
touches the keys it changes.

Salvaged from #48391 by kyssta-exe (authorship preserved).

Fixes #48305
2026-06-24 19:34:33 +05:30
kshitij
2187fd884c Merge pull request #51027 from NousResearch/salvage/typed-model-routing
fix(model_switch): route typed configured models off openai-codex (#45006)
2026-06-24 19:32:35 +05:30
kshitijk4poor
1a174dfb50 fix(models): gate openai-codex/xai-oauth soft-accept to family-shaped slugs (#45006)
Completes the #45006 fix. PR-base commit (configured-provider routing) handles
the case where a typed model IS declared in user/custom provider config. This
commit closes the other root: when a typed model is NOT in any config and the
current provider is a soft-accepting one (openai-codex / xai-oauth), the
hidden-model soft-accept (#16172 / #19729) would accept ANY unknown name as a
hidden model — so `qwen3.5-4b` typed on a Codex-default session "succeeded" and
mislabeled the provider as "OpenAI Codex" (the exact reported symptom), then
400'd on the next turn.

Gate the soft-accept to slugs that plausibly belong to the provider's family
(openai-codex -> gpt-/codex-/o1/o3/o4; xai-oauth -> grok-). Family-shaped
unknown slugs are still soft-accepted (preserving the #16172 entitlement-gated
hidden-model intent); unrelated names are rejected with actionable guidance to
pin the right provider via `--provider <slug>` or the picker.

Adds TestCodexSoftAcceptPlausibilityGate (5 tests): unrelated names rejected on
codex/xai, family-shaped hidden slugs still accepted, real catalog models
unaffected. Verified load-bearing.
2026-06-24 19:23:53 +05:30
kshitij
ae20c3fb90 Merge pull request #51025 from NousResearch/salvage/cron-autoreset-override
fix(gateway): consume was_auto_reset so /model survives session auto-reset (#48031)
2026-06-24 19:20:11 +05:30
x7peeps
6879d77d74 fix(gateway): consume was_auto_reset so /model survives session auto-reset
When `/model X` is the FIRST message after an idle/daily/suspended auto-reset,
the slash-command path stores a session model override but leaves
`session_entry.was_auto_reset = True` (it never passes through
`_handle_message_with_agent`, which is where the flag was consumed). On the
NEXT regular message, the auto-reset cleanup block pops the freshly-stored
model/reasoning override BEFORE the flag is consumed — so the switch is
silently lost and resolution falls back to the config default, while the
session DB still shows the switched model (a two-sources-of-truth divergence).

Consume the flag at both sites:
  1. gateway/run.py — capture `was_auto_reset` into a local and set the
     attribute False immediately at the top of the cleanup block, so the
     cleanup can't re-fire on a later message and wipe an override stored
     between turns. Downstream reads use the captured local.
  2. gateway/slash_commands.py — the model path consumes the flag before
     storing the override, so a /model-first-after-auto-reset isn't wiped by
     the next message's cleanup.

Salvaged from #48062 by x7peeps (authorship preserved).

Tests: tests/gateway/test_48031_model_switch_after_auto_reset.py — AST
invariants pinning both consume sites (load-bearing; verified they fail when
either consume is removed). Mirrors the AST-pin approach in
test_35809_auto_reset_clean_context.py. Gateway session/reset suite: 16 passed.

Fixes #48031
2026-06-24 19:12:44 +05:30
kshitij
d68a133458 Merge pull request #51890 from NousResearch/salvage/40695-handoff-watcher-async
fix(gateway): offload handoff-watcher SQLite calls to avoid blocking the async heartbeat (#40695)
2026-06-24 19:10:52 +05:30
kshitij
7634488074 Merge pull request #51889 from NousResearch/salvage/41289-model-cmd-async
fix(gateway): offload Discord /model provider-listing off the event loop (#41289)
2026-06-24 19:06:23 +05:30
kshitij
4f521a5382 Merge pull request #51898 from kshitijk4poor/salvage/openviking-recall-48927
feat(openviking): add full recall prefetch policy (salvage #48927)
2026-06-24 19:01:15 +05:30
kshitijk4poor
ab9134bf16 feat(openviking): add full recall prefetch policy
Salvage of PR #48927 by @ehz0ah, which consolidates OpenViking recall
work from #41706 (@huangxun375-stack), #33260, #49975, and #32444.

Replaces stale background post-turn prefetch warming with synchronous
current-query recall. The old queue_prefetch warmed the PREVIOUS user
message while turn-start recall consumed the CURRENT one, so injected
context was always about the wrong topic.

Changes:
- prefetch() now does session-aware /api/v1/search/search with the
  current query, falls back to /api/v1/search/find on failure
- Contract-safe payloads: limit, score_threshold, context_type,
  session_id — no top_k, no search-body mode, no target_uri
- L2 content reads for items with level=2 or empty abstracts, capped
  at full_read_limit (default 2)
- Local ranking (score + query-token overlap + leaf boost), dedup,
  score threshold, and injected-char budget
- queue_prefetch() is now a no-op (background warming removed)
- Additive batched viking_read: uris param accepts up to 3 URIs
- Per-request timeout support on _VikingClient.get/post/delete
- Removes stale _prefetch_result/_prefetch_thread/_prefetch_generation
  state and _invalidate_prefetch_state()
- Strengthened system_prompt_block guidance

Salvage follow-up fixes:
- Expose all 8 recall config knobs in get_config_schema() (PR #48927
  had removed them; #41706 correctly exposed them). Env vars remain
  as internal mechanism but are now visible in setup wizard.
- Lower default timeout 8s→4s, request_timeout 6s→3s, full_read_limit
  3→2 to reduce per-turn blocking latency.

Co-authored-by: Hao Zhe <haozhe4547@gmail.com>
Co-authored-by: Eurekaxun <eurekaxun@163.com>
2026-06-24 18:53:49 +05:30
liuhao1024
721cf54fb1 fix(gateway): offload /model provider-listing off the event loop (#41289)
The Discord/Telegram /model slash command listed providers synchronously
on the gateway's async event loop. list_picker_providers /
list_authenticated_providers are blocking and can fall through to a
synchronous urllib HTTP fetch when the on-disk provider cache is stale,
freezing the loop for 120-150s -> "application did not respond" and
delayed agent starts.

Port #41304's asyncio.to_thread offload to the current handler location.
The handler moved from gateway/run.py to gateway/slash_commands.py
(_handle_model_command); wrap BOTH blocking call sites so the whole bug
class is covered:

  - picker path        -> list_picker_providers
  - text-fallback path -> list_authenticated_providers

asyncio.to_thread is already idiomatic in this module (and asyncio is
imported), so the loop now stays responsive while the (possibly
network-bound) listing runs on a worker thread.

Adds tests/gateway/test_model_command_async_offload.py asserting the
offload contract at the real handler seam for both paths (mutation-
survivable: reverting either to_thread wrap fails the matching test).

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-06-24 18:40:52 +05:30
r266-tech
f0c5d812b0 fix(gateway): offload handoff watcher SessionDB polling off the event loop
The Discord gateway heartbeat stalled ('Shard ID None heartbeat blocked
for more than N seconds') because _handoff_watcher polled the synchronous,
blocking SQLite-backed SessionDB directly on the asyncio event loop every
2s. Each list_pending/claim/complete/fail call performed blocking disk I/O
on the loop thread, starving the Discord heartbeat coroutine.

Wrap every blocking SessionDB call inside the watcher loop in
asyncio.to_thread(...) so the SQLite work runs on a worker thread and the
event loop (and heartbeat) stays responsive. These four call sites are the
only synchronous self._session_db.* calls inside the watcher loop body.

Adds tests/gateway/test_handoff_watcher_async_db.py asserting the watcher
offloads its SessionDB calls via asyncio.to_thread (mutation-survivable:
reverting any to_thread wrap fails the corresponding assertion).

Fixes #40695

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-06-24 18:40:23 +05:30
kshitijk4poor
ac822e4d36 fix(compression): abort (preserve context) on transient network summary failure (#29559, #25585)
When context compaction's summary generation fails, the compressor's default
path (abort_on_summary_failure=False) drops the middle window and inserts a
static 'summary unavailable' marker — destroying the compacted turns. #29559
reported the field impact: a Connection error at the compaction moment dropped
124->15 messages (110 lost) for a long browser-automation task; #25585 is the
same failure mode (failed summary commits a destructive compaction anyway).

compress() already has an EXCEPTION to the historical drop default: auth
failures (401/403) ALWAYS abort and preserve the session, because rotating into
a placeholder-summary child on a broken credential strands the user. A transient
network/connection error is the same situation in reverse: it WILL recover, and
retrying then is strictly better than discarding context for a momentary blip.

Extend the always-abort carve-out to terminal connection/network failures:
- new _last_summary_network_failure flag, set in _generate_summary's terminal
  failure branch when _is_connection_error(e) (reached only after any main-model
  fallback is exhausted), reset alongside the auth flag;
- compress() aborts when it's set (returns messages unchanged,
  _last_compress_aborted=True), independent of abort_on_summary_failure;
- a network-specific operator warning (distinct from the auth + config-flag
  messages).

Scoped to connection errors only: a generic 500/400 still takes the historical
fallback-drop path (test_non_auth_failure_still_uses_fallback_path stays green).

Tests: network-failure detection + abort-despite-flag-false, both mutation-checked
(removing the flag-set fails detection; removing the carve-out fails the abort).
2026-06-24 18:31:51 +05:30
kshitijk4poor
a4a74ca9e9 fix(desktop): use notify() with stable id for fallback notification
hermes-pr-review findings:
- notifyError('runtime-not-ready', msg) misused the (error, fallback) API:
  the key became the notification body and the message became the title.
  Switch to notify({ id, kind, title, message }) which puts content in the
  right slots.
- The stable id 'runtime-not-ready' deduplicates: notify() replaces by id,
  so repeated refreshOnboarding calls during an outage no longer stack
  up to 4 persistent error toasts.
- Remove dead !state.manual guard from shouldPreserveConfiguredOnFallback:
  refreshOnboarding already short-circuits on manual before the helper.
- Test: seed localStorage with '1' before asserting it survives (was testing
  the wrong invariant — null in, null out).
- Test: use static import for spy instead of fragile await import.
- Test: add negative case for requested=true + configured=true (should
  still downgrade — requested overrides preservation).
2026-06-24 18:29:15 +05:30
kshitijk4poor
d398076c21 fix(desktop): show non-blocking notification on fallback runtime probe
When shouldPreserveConfiguredOnFallback keeps configured=true, also call
notifyError('runtime-not-ready', ...) so the user knows the backend wasn't
verified instead of silently proceeding. Adapted from @mohamedorigami-jpg's
approach in PR #37634.
2026-06-24 18:29:15 +05:30
infinitycrew39
7243111c57 test(desktop): cover fallback timeout onboarding downgrade regression 2026-06-24 18:29:15 +05:30
infinitycrew39
66a0907c95 fix(desktop): keep configured onboarding state on fallback runtime probes 2026-06-24 18:29:15 +05:30
xxxigm
89540d592b test(cli): cover non-interactive prompt_yes_no fallback
Regression coverage for the desktop gateway-restart hang: prompt_yes_no
returns its default when HERMES_NONINTERACTIVE=1 or on a bare EOFError
(closed/redirected stdin), and still exits on KeyboardInterrupt.
2026-06-24 17:56:30 +05:30
xxxigm
33926eb315 fix(cli): honor non-interactive context in prompt_yes_no
The dashboard/desktop spawn gateway actions with stdin=DEVNULL and
HERMES_NONINTERACTIVE=1 (hermes_cli/web_server.py), but prompt_yes_no
ignored that contract and called sys.exit(1) on the resulting EOFError.

On Windows, `gateway start` asks "Install it now so the gateway starts on
login? [Y/n]" when the scheduled task / startup entry is not yet
installed. Spawned from the desktop app there is no stdin to answer it, so
every desktop-triggered gateway restart aborted at that prompt and the
gateway never started ("Gateway service is not installed").

Fall back to the prompt's default when HERMES_NONINTERACTIVE is set, and
treat a bare EOFError as "accept default" rather than exiting. This lets
the Windows start path proceed unattended (Startup-folder fallback + direct
spawn) while interactive TTY usage is unchanged. Ctrl+C still exits.
2026-06-24 17:56:30 +05:30
Ben
8446c15706 docs(chronos): pin hop-1 auth to the hosted-agent bootstrap token
The wire contract said hop 1 uses "the agent's existing Nous Portal
access token" but didn't name WHICH of an agent's two identities that is.
A hosted agent never holds an `agent:{instanceId}` OAuth client (that
shape is minted only by the interactive dashboard auth-code grant); its
own outbound portal calls use the bootstrap-session token (client
`hermes-cli-vps`) planted in auth.json on first boot. NAS must resolve
the instance id from either an `agent:{id}` client OR the bootstrap
session (AgentInstance.bootstrapSessionId), not gate on `agent:*` alone —
which 403'd every real hosted-agent provision in prod.

Documents the NAS-side fix (resolveAgentCronInstanceId) so the contract
and the implementation agree.
2026-06-24 20:57:43 +10:00
Ben
c93b9f9057 feat(relay): terminal 4401 (opt-out) → clean "Relay disabled" state
Phase 7 Unit 7d-B. When an operator opts an instance OUT of the Team Gateway
relay (Unit 7b deprovision), the connector revokes the per-gateway secret and
closes the gateway's WS with 4401. The reconnect supervisor previously treated
EVERY close as retryable, so the live process spun "retrying 4401" forever and
the dashboard showed a red error — opt-out looked like a failure.

Now a 4401 close that arrives AFTER a successful handshake is recognized as a
terminal credential revocation:

- ws_transport.py: track `_handshake_succeeded` (set when a descriptor is
  received); on a 4401 close after a prior success, latch `auth_revoked` and do
  NOT spawn the reconnect supervisor. A 4401 BEFORE any successful handshake
  stays retryable (cold-start / not-yet-provisioned race, not a revocation).
  New `auth_revoked` property + a websockets-version-safe close-code reader
  (prefers `.rcvd`/`.sent` Close frames; `.code` is deprecated in websockets 13+).
- adapter.py: a revocation monitor turns `transport.auth_revoked` into a clean,
  NON-retryable `relay_disabled` fatal and notifies the gateway's fatal-error
  handler (so the adapter is removed and NOT queued for reconnection — the
  credential is dead until the instance is recreated). Monitor is cancelled on
  disconnect; only started when the transport exposes `auth_revoked` (prod WS).
- run.py: `_handle_adapter_fatal_error` maps the `relay_disabled` code to a
  `disabled` platform_state (not `fatal`/`retrying`).
- web: PlatformsCard renders the `disabled` state with a neutral outline badge,
  a PowerOff icon, and muted (not destructive-red) text + message. New optional
  `status.disabled` i18n string ("Disabled").

Also bundles the Phase 7 contract-doc update (this doc is authoritative in
hermes-agent): docs/relay-connector-contract.md gains an "Author-first
resolution + the account-link (DM) path" section documenting the
multi-tenant-guild rule (D-7.2 — route by authenticated author binding, never by
guild; unlinked → fail-closed), the `/link <code>` DM flow, and the
connector-authoritative opt-out + terminal-4401 behavior this PR implements.

Tests: +2 ws_transport (4401-after-handshake terminal / no-reconnect;
4401-before-handshake stays retryable) and +2 adapter (revocation → non-retryable
relay_disabled fatal + handler fired; no-revocation → no fatal). 138 relay tests
pass (incl. the contract-doc conformance test); ruff clean; web tsc clean.

Phase 7 Unit 7d-B (relay-adapter solo lane). Q17 → Option 2; Option 3 (live
de-register, no recreate) + the restart-re-provision hole deferred post-alpha.
2026-06-24 18:43:01 +10:00
Teknium
3c75e11571 fix(browser): validate agent-browser is runnable, not just present (#51740)
After `hermes update`, a globally-installed agent-browser's npm postinstall
(fixUnixSymlink) re-points the global symlink (e.g. /opt/homebrew/bin/agent-browser)
at our local node_modules binary. The next update wipes node_modules, leaving a
dangling symlink that `which` still reports but exec fails on with exit 127 —
silently breaking every browser tool (#48521).

Root cause is trust-on-presence: shutil.which/Path.exists accept a name that
resolves but won't run. Add hermes_constants.agent_browser_runnable() (resolves
the path + runs --version) and gate all four resolution sites on it:
_find_agent_browser now skips a dead candidate and falls through to the next
working one (extended PATH -> local .bin -> npx), self-healing the dangling link.
dep_ensure/doctor/nous_subscription validate too; doctor warns on a broken link.

Closes #48521.
2026-06-24 00:14:49 -07:00
Teknium
a911bcda18 docs: stop recommending pip install; curl installer is the only supported path (#51743)
* docs: stop recommending pip install hermes-agent; point to install script

The install script is the only supported install path (it provisions a
managed, isolated uv environment). Replace bare `pip install hermes-agent`
primary-install recommendations with the curl install script, and rewrite
optional-extra snippets (`pip install "hermes-agent[X]"`) to the managed-env
form `cd ~/.hermes/hermes-agent && uv pip install -e ".[X]"` that matches the
installer and the English quickstart.

Covers English docs + zh-Hans mirrors, the achievements plugin README, and
realigns the zh-Hans quickstart to the English Desktop-installer-first layout
(dropping its stale "Method A — pip (simplest)" section).

* docs: drop pip as a supported install/update method

Removes the 'pip installs' supported-method sections from updating.md and
cli-commands.md (EN + zh-Hans): the curl install script is the only supported
way to install/update the Hermes CLI. The _cmd_update_pip pip/pipx branches
remain in code as an undocumented safety net for users who already have such an
install, but the docs no longer advertise pip as a path.

Also normalizes a bare `pip install -e '.[acp]'` to the managed-env form.

Leaves python-library.md untouched: importing AIAgent as a library dependency
into your own project is a distinct use case where pip is correct.
2026-06-24 00:14:32 -07:00
teknium1
98224ce8b6 chore: add chazmaniandinkle to AUTHOR_MAP for PR #43888 salvage 2026-06-24 00:14:25 -07:00
Chaz Dinkle
abc3662bf6 fix(gateway): detect launchd in /restart service-manager probe (#43475)
On a launchd-managed gateway (macOS), /restart stopped the gateway but
never relaunched it: the handler's service detection checks only
INVOCATION_ID (systemd) and container markers, so under launchd it takes
the detached path and exits 0 — which KeepAlive.SuccessfulExit=false
treats as a deliberate stop. The gateway stays silently dead until a
manual launchctl kickstart.

Detect launchd via XPC_SERVICE_NAME, which launchd sets to the job label
for processes it spawns. The probe deliberately excludes the literal
"0": interactive macOS shells inherit XPC_SERVICE_NAME=0 (a truthy
string), and routing an unsupervised interactive gateway to the service
path would make it exit non-zero with nothing to revive it.

Routing through via_service=True (rather than forcing a non-zero exit
on the detached path) matters: the detached path also spawns a helper
that relaunches the gateway, so exiting non-zero there would have BOTH
the helper and launchd respawn it — two gateways racing for the same
bot tokens. The service path spawns no helper; launchd is the single
respawner.

Fixes #43475. Supersedes the run.py-era probes in #19940/#33393 (the
handler has since moved to gateway/slash_commands.py) and avoids the
double-spawn risk in the exit-code-site approaches (#43498, #43596).
2026-06-24 00:14:25 -07:00
Tranquil-Flow
73a20a6ad6 fix(telegram): clip mid-stream overflow instead of splitting (#48648) 2026-06-24 00:00:46 -07:00
Teknium
47fccc0735 refactor(dashboard): remove the dead tools box from the chat sidebar (#51737)
The dashboard chat sidebar's tool-call activity card was disabled in the
product — both ChatPage mounts passed showTools={false} (since #49077),
so the box never rendered. The sidebar still subscribed to tool.* events
and accumulated them in state for a panel nobody saw.

Remove the tools card, the showTools prop, the tool.* event handling and
state, and the now-orphaned ToolCall component. The /api/events
subscription stays for session.info (live title) and
dashboard.new_session_requested. The sidebar is now just the model
selector box; the session list (ChatSessionList) is unchanged.

No behavior change in the live dashboard — the tools box was already
hidden.
2026-06-23 23:59:55 -07:00
teknium1
ba50787180 test(anthropic-oauth): cover login token-endpoint host + fallback
Add two regression tests for the salvaged #48706 fix:
- login token exchange targets platform.claude.com first
- falls back to console.anthropic.com when the new host is unreachable

Also map the salvaged contributor's noreply email in release.py
AUTHOR_MAP (CI author-map gate).
2026-06-23 23:59:40 -07:00
yusekiotacode
2ee6449fe5 fix(anthropic): use platform.claude.com for OAuth token exchange
Anthropic migrated the OAuth token endpoint from
console.anthropic.com/v1/oauth/token (now returns HTTP 404) to
platform.claude.com/v1/oauth/token. The token *refresh* path already
iterated both hosts, but the two initial code-exchange call sites were
hardcoded to the dead console host, so every new Claude OAuth login
failed with 'Token exchange failed: HTTP Error 404: Not Found' and saved
no credentials.

Fix the whole bug class:
- Add _OAUTH_TOKEN_URLS [platform.claude.com, console.anthropic.com] in
  agent/anthropic_adapter.py; _OAUTH_TOKEN_URL now points at the live
  host for backward-compat with existing imports.
- run_hermes_oauth_login_pure() (CLI flow) iterates the list, first
  success wins, mirroring the refresh path.
- hermes_cli/web_server.py (desktop dashboard flow) imports the list and
  iterates it too, so the GUI login path is fixed identically.

Probe: console.anthropic.com/v1/oauth/token -> HTTP 404 (gone),
platform.claude.com/v1/oauth/token -> HTTP 400 (alive). Verified a real
Claude MAX OAuth login now succeeds end-to-end.
2026-06-23 23:59:40 -07:00
Teknium
be78fbd70e Revert "fix(profiles): clone auth.json so OAuth credentials carry to cloned profiles (#51719)" (#51732)
This reverts commit f504aecffe.
2026-06-23 23:58:43 -07:00
justemu
4aa793345e fix(matrix): use member_count as DM signal for named DM rooms
Most Matrix clients auto-set a room name when creating a DM (e.g.
"Alice & Bot" from participant display names), so the old
`is_direct and not has_explicit_name` heuristic classified virtually
all client-created DM rooms as "room", forcing require_mention gating
in legitimate one-on-one DMs.

member_count is now the primary DM signal: <=2 members means the room
is necessarily a 1:1 conversation, regardless of m.direct or an explicit
name. A room that grew to 3+ members but is still in stale m.direct is
still classified as a room (conflict flag set). Falls back to the
m.direct + name heuristic when the count is unavailable.

Also hardens _get_room_member_count with a joined_members API fallback
when the cache-backed state_store is empty.

Salvaged from #48554 by @justemu onto the current plugin adapter path
(gateway/platforms/matrix.py -> plugins/platforms/matrix/adapter.py).

Fixes #48551
2026-06-23 23:57:38 -07:00
Teknium
0ef86febe2 docs(sessions): clarify sessions.json is the gateway routing index, not the session list (#51726)
Users who inspect ~/.hermes/sessions/sessions.json see only gateway entries
(e.g. agent:main:whatsapp:dm:...) and mistake it for the session index that
hermes sessions list / /sessions read — which is actually state.db. Issue
#49361 reported CLI sessions as 'invisible' on this premise.

- gateway/session.py: write a self-documenting _README sentinel at the top of
  sessions.json explaining it's the gateway routing index and that ALL sessions
  (CLI/TUI/gateway) live in state.db; skip _-prefixed keys on load so the
  sentinel never round-trips into a SessionEntry.
- Harden every sessions.json reader against the sentinel: mcp_serve loader,
  gateway/mirror.py, gateway/channel_directory.py all skip _-prefixed keys.
- docs/user-guide/sessions.md: warning callout naming the exact symptom.
- tests: assert prune ignores metadata sentinels; add round-trip coverage.
2026-06-23 23:56:36 -07:00
liuhao1024
7ff48a6291 fix(discord): check pairing store for component button auth
Component button interactions (approve/deny, slash confirm, model
picker, clarify) were not checking the pairing store for authorization.
Users approved via `hermes pairing approve` could send messages and use
slash commands (which go through the gateway authz_mixin), but button
clicks were rejected because `_component_check_auth` only checked
env-var allowlists (DISCORD_ALLOWED_USERS, GATEWAY_ALLOW_ALL_USERS,
etc.) and not the pairing store.

This was a regression from commit f6f363662 which intentionally made
component auth fail-closed when no allowlist is set (security fix for
GHSA-mc26-p6fw-7pp6), but did not account for pairing-based auth.

Fix: add a `PairingStore.is_approved("discord", uid)` check to
`_component_check_auth`, mirroring `authz_mixin._check_authorization`.
The pairing store check runs after all allowlist checks, preserving the
fail-closed behavior for non-paired, non-allowed users.

Fixes #50627
2026-06-23 23:55:18 -07:00
Teknium
0957d77187 test(agent): cover interrupt tool-tail alternation close (#48879)
Regression coverage for the synthetic-assistant close: interrupt after a
successful tool must persist an assistant tail (placeholder when no
delivered text), real delivered text is preserved, and non-interrupted
or non-tool tails are left untouched.
2026-06-23 23:52:28 -07:00
kyssta-exe
81d2dc5d0f fix(agent): close tool-call sequence on interrupt to prevent role alternation violation (#48879) 2026-06-23 23:52:28 -07:00
teknium1
53f8386587 test(delegation): regression for bedrock Claude target_model api_mode routing
Asserts resolve_runtime_provider honors target_model over the stale
persisted model.default when choosing the Bedrock dual-path api_mode:
Claude target -> anthropic_messages, Nova target -> bedrock_converse.
Both fail without the #49095 fix.
2026-06-23 23:49:37 -07:00
kyssta-exe
284d06cabf fix(delegation): use target_model for bedrock api_mode routing (#49095) 2026-06-23 23:49:37 -07:00
teknium1
3dfbc0ad1d chore(release): map thestral123 author email for PR #42021 salvage 2026-06-23 23:49:22 -07:00
teknium1
d4be583d98 fix(telegram): raise default command-menu cap to 60 so skills stay visible
The 30-slot default could not fit Hermes's ~50 built-in commands, so
every skill command (and 20 built-ins) were silently dropped from the
Telegram \`/\` menu by default — they only worked when typed manually.
Raising the default to 60 keeps all built-ins plus common skill commands
visible out of the box while staying under Telegram's ~4KB payload limit.
Users can still tune it via platforms.telegram.extra.command_menu.
2026-06-23 23:49:22 -07:00
Thestral
dbe14ce35d feat(gateway): configure Telegram command menu priority
Adds a configurable Telegram BotCommand menu cap and priority list via
platforms.telegram.extra.command_menu (max_commands clamped 1..100;
priority_mode prepend|append|replace). Default cap stays 30; hidden
commands remain invokable when typed and /commands lists the full set.

Salvaged from PR #42021. Cherry-picked onto current main; the original
edited gateway/platforms/telegram.py, now relocated to
plugins/platforms/telegram/adapter.py.
2026-06-23 23:49:22 -07:00
Teknium
281a439ad4 fix(desktop): guard composer mutations when the composer core isn't bound (#51728)
The desktop composer threw an uncaught "Composer is not available" at
startup and the input went unresponsive (#49903). assistant-ui's composer
mutators (setText/send/…) throw when the thread's composer core isn't bound
yet; the read path is null-safe but the writes are not. ChatBar pushes draft
text via aui.composer().setText() from mount-time effects (draft restore,
clearDraft, external inserts), and the v0.17.0 popout refactor (#49488)
widened the unbound window by moving the composer out of the contain wrapper
into a sibling of the thread — so the throw surfaced as an uncaught error
that wedged the input.

Wrap every composer mutation in a setComposerText helper that swallows the
unbound-core throw. The contentEditable DOM + draftRef already hold the text
and the draft-editor sync re-applies it once the core attaches, so the draft
is never lost — only the premature state push is skipped.
2026-06-23 23:47:45 -07:00
Teknium
f504aecffe fix(profiles): clone auth.json so OAuth credentials carry to cloned profiles (#51719)
Selective --clone / --clone-from / --clone-config copied .env but not
auth.json, silently dropping the credential pool — including OAuth tokens
(Anthropic `claude /login`, Codex, xAI) that never land in .env. A profile
cloned from an OAuth-authenticated default therefore resolved a different
provider (or none) than the source under provider: auto. --clone-all already
carried auth.json via the full copytree; only the selective path missed it.

Add auth.json to _CLONE_CONFIG_FILES and tighten it to 0o600 after copy,
matching .env semantics.
2026-06-23 23:44:34 -07:00
Teknium
050bd01b7b fix(dashboard): serve uvicorn on SelectorEventLoop on Windows (#50641) (#51717)
On Windows, start_server() served uvicorn via a bare asyncio.run(_serve()),
which uses the default ProactorEventLoop. uvicorn's socket-serving stack
assumes a SelectorEventLoop on win32 (uvicorn/loops/asyncio.py forces it, and
uvicorn.Server.run threads config.get_loop_factory() into its runner for
exactly this reason). Driving uvicorn on the proactor loop makes
server.startup() bind a socket that never accepts: the dashboard and desktop
backend print "Skipping web UI build" then hang forever with the port
LISTENING but no TCP handshake completing.

Fix is win32-scoped to keep the blast radius minimal: POSIX keeps the exact
asyncio.run(_serve()) it had (its default loop is already a SelectorEventLoop /
uvloop, which is what uvicorn serves on). Only on Windows do we mirror
uvicorn.Server.run and run on the loop factory uvicorn picks, with a fallback
to WindowsSelectorEventLoopPolicy for uvicorn < 0.36.

Fixes hermes dashboard and hermes desktop (the Electron app spawns a
hermes dashboard backend). The gateway symptom in the report has a separate
root cause (no uvicorn) and is not addressed here.
2026-06-23 23:43:24 -07:00
teknium1
901165b5a4 fix(cron): complete plugins.cron_providers rename in 2 missed test files
uperLu's #50958 renamed plugins/cron → plugins/cron_providers but left
two test files patching the now-gone plugins.cron.chronos.verify path,
which would fail collection. Point them at plugins.cron_providers.*.
Add uperLu to release.py AUTHOR_MAP.
2026-06-23 23:39:22 -07:00
uperLu
0d4cecb352 fix(cron): avoid provider package shadowing core cron 2026-06-23 23:39:22 -07:00
Ben
31bced1607 fix(profiles): detect a separate-process gateway in profile status
The dashboard Profiles view showed "Gateway stopped" for a gateway that
is in fact running — while the sidebar status strip and `hermes gateway
status` (CLI) both correctly showed it running. Reported on v0.17.0
running the gateway + dashboard in one Docker container.

Root cause: three liveness surfaces with three detection strengths, all
reading the same `gateway.pid`:

  - `hermes gateway status` -> find_gateway_pids() (process-table scan)
  - sidebar /api/status     -> get_running_pid() + gateway_state.json PID
                               fallback + health-URL probe
  - Profiles view           -> _check_gateway_running() = get_running_pid()
                               ONLY, no fallback

`get_running_pid()` short-circuits to None the moment the runtime lock
(`gateway.lock`) doesn't register as held by the *calling* process —
which is always true when the reader is a separate process from the
gateway (the dashboard is its own s6 service in the container), and also
for any launch-service-managed gateway that left a fresh
`gateway_state.json` but no live PID file. So the Profiles view alone
reported the live gateway as stopped.

Fix: give _check_gateway_running the same fallback the sidebar already
has — after the pid-file/lock check misses, validate the PID recorded in
that profile's gateway_state.json against the live process table via the
existing get_runtime_status_running_pid(). read_runtime_status() gains an
optional path arg so a profile's state file can be read without mutating
the process-global HERMES_HOME (preserving the contextvar-based profile
isolation the dashboard relies on). Backward compatible: every existing
caller passes no argument.

Tests: a regression test that fails pre-fix (live gateway, lock check
returns None -> must still report running) and a guard test that a
'stopped' state file is never reported running even with a live PID.
2026-06-24 16:36:17 +10:00
teknium1
fa2f0bf3da chore(release): add francescomucio to AUTHOR_MAP for salvaged PR #51357 2026-06-24 16:34:51 +10:00
teknium1
366c2a3766 fix(gateway): propagate fatal-config exit code through start_gateway clean-exit path
The contributor PR stamped runner._exit_code=78 on non-retryable startup
errors, but start_gateway()'s clean-exit branch returned True before the
SystemExit(runner.exit_code) site, so main() exited 0. The s6 finish
script's [ "$1" = "78" ] check never matched and s6 crash-looped the
gateway anyway — the fix was dead as shipped (#51228).

Honor runner.exit_code in the clean-exit branch: raise SystemExit(code)
when set, else return True (normal /restart clean exit). Add a
start_gateway()-level test that asserts process-level SystemExit(78)
propagation — the gap the PR's object-level test missed — plus exit_code
on the existing _CleanExitRunner mocks.
2026-06-24 16:34:51 +10:00
Francesco Mucio
776f68e1ee fix(gateway): exit 78 (EX_CONFIG) on fatal startup errors, s6 finish script stops restart loop
Profiles without their own messaging token inherit the default
profile's token via os.getenv, hit a token collision, and exit with
startup_failed.  s6 restarts them immediately, creating ~30MB tirith
sandbox dirs in /tmp each cycle — filling the disk in hours (#51228).

Changes:
- gateway/restart.py: add GATEWAY_FATAL_CONFIG_EXIT_CODE = 78
- gateway/run.py: set exit_code=78 on non-retryable startup errors
  (token collision, no platforms)
- hermes_cli/service_manager.py: add _render_finish_script() that
  translates exit 78 → exit 125 (s6 permanent failure)
- hermes_cli/container_boot.py: write finish script alongside run
  script during profile registration

The s6 finish script pattern follows docker/s6-rc.d/dashboard/finish.

Closes #51228
2026-06-24 16:34:51 +10:00
Teknium
d93d0aee83 fix(cron): anchor naive schedule timestamps to configured timezone (#51695)
A naive ISO timestamp (e.g. 2026-06-22T20:07:00) was anchored to the
server's local timezone via dt.astimezone(), but the due-check
(get_due_jobs -> _hermes_now()) runs in the CONFIGURED Hermes timezone.
When the two diverge (cloud host on UTC with a different timezone: set,
or vice-versa) the stored instant lands hours off the user's wall-clock
intent, so one-shots never become due and recurring jobs fire at the
wrong time. The ticker stays healthy (heartbeat + success markers fresh)
because every tick finds nothing due, matching the silent no-fire in #51021.

Anchor naive timestamps to _hermes_now().tzinfo so '20:07' means 20:07 on
the same clock the scheduler checks against. The legacy _ensure_aware path
still treats already-stored naive values as server-local for back-compat.

Fixes #51021
2026-06-23 23:29:57 -07:00
Teknium
78e122ae1a feat(cron): warn when gateway not running on cron create/list (#51696)
The cron ticker only runs inside the gateway (_start_cron_ticker); there
is no standalone cron daemon. When the gateway isn't running, next_run_at
passes but jobs never fire and last_run_at stays null — and manual
'hermes cron run' (which bypasses the ticker) appears to work, masking
the real cause. This is the most common cron support report (#51038).

cron list already warned; extend the same warning to cron create (the
moment the user is most likely to hit this) via a shared helper, and add
a pointer to 'hermes cron status'. Silent when a gateway is running, so
the gateway /cron path is unaffected.
2026-06-23 23:29:50 -07:00
Teknium
c39b2b50ee fix(tui): stop a cwd package named utils/proxy/ui from crashing the gateway child (#51693)
Launching Hermes from a directory that ships its own top-level package with a
Hermes-internal name (utils/, proxy/, ui/) crashed the gateway/TUI child with
an ImportError (exit 1, crash loop): from utils import atomic_replace resolved
to the user's package.

tui_gateway/entry.py already stripped the relative cwd forms ('' / '.'), but
the launch dir also reaches sys.path as its own ABSOLUTE path (venv activation
or a project that adds itself to PYTHONPATH), which the strip missed and which
sat ahead of the Hermes root.

Centralize a hardened guard in hermes_bootstrap.harden_import_path(): drop the
relative forms AND force the Hermes source root to the front even when an
absolute cwd entry is present. Wire it into tui_gateway/entry.py and
acp_adapter/entry.py (both spawn into arbitrary cwds); hermes_cli/main.py and
gateway/run.py already insert the root at front. gatewayClient.ts now also
exports HERMES_PYTHON_SRC_ROOT for defense in depth.
2026-06-23 23:29:45 -07:00
teknium1
3d56807fbd fix(gateway): actively reap no-systemd gateway orphan before restart
Builds on @wgu9's runtime-tracking fix: now that find_gateway_pids() can
see a no-supervisor `gateway restart` runtime, have stop_profile_gateway()
fall back to an orphan-aware, profile-scoped reap (SIGTERM then SIGKILL)
when the pidfile/runtime record is missing or stale. Closes the duplicate-
accumulation path in #51325 — a follow-up restart now kills the prior
orphan instead of stacking another listener on :8644. Gated on
not supports_systemd_services() so a transient `gateway restart` argv on
supervised hosts is never killed.

Also adds the AUTHOR_MAP entry for the salvaged contributor.
2026-06-23 23:29:28 -07:00
jeremy gu
044996e403 fix(gateway): track no-systemd restart runtimes 2026-06-23 23:29:28 -07:00
Teknium
d539cd9004 fix(config): write config.yaml as UTF-8 to stop emoji/personality corruption (#51676)
atomic_yaml_write (and two sibling config writers) called yaml.dump
without allow_unicode=True. The default personalities shipped in cli.py
contain emoji/kaomoji, so PyYAML escaped astral-plane chars as 8-digit
\\UXXXXXXXX sequences inside multi-line double-quoted strings wrapped
with \\ line-continuations. Stricter/non-PyYAML parsers, editors, and
hand-edits break that structure into unclosed quotes, failing the whole
config parse -> silent fallback to defaults -> custom_providers lost.

Add allow_unicode=True to the canonical writer plus tui_gateway/server.py
and the telegram adapter's atomic config write so config is written as
readable UTF-8 with no escape/fold artifacts.

Fixes #51356
2026-06-23 23:28:21 -07:00
Teknium
8e7e104521 fix(cron): tell the user TUI/CLI cron jobs are local-only at create time (#51683)
deliver=origin (or omitted) from a TUI or classic-CLI session produces a
job with origin=null, because those sessions never populate the
HERMES_SESSION_PLATFORM/CHAT_ID context vars that _origin_from_env reads.
The scheduler then resolves no delivery target and skips delivery — the
job runs and saves output to last_output, but nothing reaches the user
and they only find out by polling cronjob(action='list') (#51568).

This is by design (local sessions have no live-delivery channel), so the
fix surfaces it instead of silently dropping the intent:

- cronjob create now appends an informational notice to its result when
  a created job resolves to zero delivery targets and the user did not
  explicitly ask for deliver='local'. The check uses the scheduler's own
  _resolve_delivery_targets so it accounts for origin, home channels,
  'all', and explicit platform targets — no false positives.
- PLATFORM_HINTS gains a 'tui' entry (the TUI had none) and the 'cli'
  hint now states that cron jobs from these sessions are local-only and
  that deliver must target a gateway-connected platform to notify the
  user. This stops the agent promising a delivery that never happens.

No scheduler/delivery behavior change; no new env var; cron isolation
invariant untouched.
2026-06-23 23:27:48 -07:00
Teknium
a39283bf09 test(docker): assert boot migration keeps .env byte-identical across reboots
Adds the #51579 regression test the issue asked for: run the real
docker_config_migrate.py boot path twice (host-reboot scenario under
--restart unless-stopped) and assert $HERMES_HOME/.env survives
byte-for-byte and the second boot is a no-op (no re-migration, no new
backup). Exercises real migrate_config + real file I/O via subprocess.
2026-06-24 15:23:23 +10:00
LeonSGP43
60d3b8cbce fix(docker): restore config backups after failed boot migration 2026-06-24 15:23:23 +10:00
teknium1
7f1c278db8 fix(photon): intercept console.log so 'stream interrupted' bursts escalate
spectrum-ts routes stream telemetry through @photon-ai/otel's createLogger,
which sends severity>=ERROR to console.error and WARN/INFO to console.log.
The two lines the health monitor keys off land on different channels:
log.error("stream persistently failing") -> console.error (caught), but
log.warn("stream interrupted; reconnecting") -> console.log (was missed).

The original interception patched console.error only, so the recovering->
degraded escalation counter never saw the interrupt bursts that are the
primary silent-inbound symptom. Verified live against spectrum-ts 3.1.0 +
@photon-ai/otel: 3 real log.warn('stream interrupted') calls now escalate
to degraded -> process.exit(75) -> adapter reconnect.

Adds a shared classifyStreamLog() fed by both console.error and console.log,
plus a regression test asserting both channels are intercepted.
2026-06-23 21:33:10 -07:00
Teknium
b60260c61a chore(release): add SidUParis to AUTHOR_MAP for salvaged PR #50071 2026-06-23 21:33:10 -07:00
XU SUN
0952acbf4d fix(photon): label upstream CatchUpEvents failures 2026-06-23 21:33:10 -07:00
helix4u
06cbc3bae9 fix(photon): recover degraded upstream stream 2026-06-23 21:33:10 -07:00
xxxigm
34bd6a0db5 test(installer): lock Python-fallback propagation into the venv stage (#50769)
Source-level regression guard (the script only runs on Windows, so there's no
runner on Linux CI). Asserts Resolve-AvailablePythonVersion exists, that
Install-Venv re-resolves the interpreter before the venv-creation line, and
that Test-Python and the resolver share the single $PythonFallbackVersions
constant so detection and venv creation can't drift apart again.
2026-06-23 21:33:08 -07:00
xxxigm
23683c3353 fix(installer): re-resolve Python fallback at venv stage on Windows (#50769)
The Windows installer runs each -Stage NAME in its own powershell.exe under
Hermes-Setup.exe. Test-Python records a detected fallback (e.g. 3.12 when 3.11
is absent) via an in-memory $script:PythonVersion = $fallbackVer mutation,
which dies with the python stage's process. The fresh venv stage starts with
$PythonVersion back at its "3.11" default, so it logged "Creating virtual
environment with Python 3.11..." and ran uv venv venv --python 3.11, failing
with exit 2 on machines that only had the fallback installed.

Add a cross-process-safe Resolve-AvailablePythonVersion helper (preferring the
requested version, then the shared $PythonFallbackVersions list, probed via
uv python find) and call it at the top of Install-Venv before creating the
venv. Test-Python's fallback loop now iterates the same shared constant so
detection and venv creation can't drift.
2026-06-23 21:33:08 -07:00
Ben Barclay
935f2bc48d docs(relay): add §3.4 — obligations on a future scale-to-zero behaviour layer (#51633)
The contract already documents the scale-to-zero PRIMITIVES (§3.2 going-idle/
buffered-flip, §3.3 wake poke) and what's out of scope. This adds the missing
half: the contract FROM the primitives TO the behaviour layer — the guarantees
a separate scale-to-zero workstream must honour to consume them safely (register
a wakeUrl before suspend; drain+ack before teardown; keep the reconnect loop
live; treat suspended != down in the health model; don't assume exactly-once/
prompt wake; suspend only when genuinely idle, composing with the existing drain
machine). Docs-only; lets the independent scale-to-zero stream build against a
written contract instead of re-reading the connector.
2026-06-24 12:27:19 +10:00
pefontana
4ea3096a85 chore(release): map jinhyuk9714 to AUTHOR_MAP for attribution check
The cherry-picked commit is authored by jinhyuk9714@gmail.com (GitHub
sjh9714); the check-attribution CI gate requires every PR commit author
to be present in scripts/release.py AUTHOR_MAP.
2026-06-23 18:42:05 -07:00
pefontana
667a9f5139 fix(update): reuse an existing PATH uv on Termux before pip
_ensure_uv_for_termux only checked resolve_uv() (the managed
$HERMES_HOME/bin/uv) before falling back to pip, so a uv installed via
`pkg install uv` lives on PATH but is invisible to the helper. Combined
with the cherry-picked wheel-only fallback, a Termux user with no managed
uv still hit `pip install uv`, which has no Android wheel and tried to
source-build the Rust crate, OOM-killing low-memory devices.

Probe shutil.which("uv") right after the Termux guard and reuse it before
pip. Add a regression test that keeps resolve_uv() returning None while a
uv exists on PATH and asserts pip is never invoked.
2026-06-23 18:42:05 -07:00
jinhyuk9714
3e508363f7 fix(update): avoid source-building uv on Termux 2026-06-23 18:42:05 -07:00
Ben Barclay
6e88f7b6f7 feat(relay): Phase 5 Unit C — wake primitive (gateway side) (#51595)
Register a per-instance wakeUrl and forward it to the connector at
self-provision so a suspended gateway can be poked awake when buffered
work arrives (pairs with the connector-side WakePoker).

- relay_wake_url() resolver (env GATEWAY_RELAY_WAKE_URL, then
  gateway.relay_wake_url in config.yaml), mirroring relay_instance_id()
- thread wake_url through _post_provision (adds wakeUrl to the body only
  when set) + self_provision_relay (resolve, forward, log)
- hermes gateway enroll --wake-url <url> persists GATEWAY_RELAY_WAKE_URL
- document the §5.2 wake poke in relay-connector-contract.md §3.3
- tests: relay_wake_url resolution (env/config/absent), provision
  forwarding, body-only-when-set (6 new; 130 relay tests pass)

The actual reconnect+drain on wake is Unit B's loop; this unit only
wires the wake SIGNAL. Opt-in: absent wakeUrl => connector never pokes.
2026-06-24 11:00:11 +10:00
brooklyn!
6ef679420e Merge pull request #46464 from NousResearch/bb/pets
Pets: animated mascots across CLI, TUI, and desktop
2026-06-23 19:20:47 -05:00
Brooklyn Nicholson
6afeea2bea harden(pets): host-pin asset downloads + sanitize slug paths
install_pet now refuses spritesheet/pet.json URLs that aren't on a petdex
host (matching thumbnail_png's existing _is_petdex_host guard), so a
spoofed manifest can't redirect a download at an arbitrary host. Slugs
are normalized to a single path segment before indexing into pets_dir(),
closing a path-traversal vector in load_pet/remove_pet/install_pet.
2026-06-23 19:13:08 -05:00
Brooklyn Nicholson
e495b33bf1 Merge remote-tracking branch 'origin/main' into bb/pets-merge
# Conflicts:
#	hermes_cli/commands.py
#	tui_gateway/server.py
2026-06-23 19:05:22 -05:00
Ben Barclay
40fddc9e4c feat(relay): Phase 5 §5.3 going-idle / buffered-flip primitive (gateway side) (#51572)
The gateway half of the going-idle/buffered-flip primitive (scale-to-zero
PRIMITIVE, not the behaviour). Integrates with the EXISTING drain transition:

- ws_transport: `go_idle()` sends `going_idle` + awaits the connector's
  `going_idle_ack` (connector-authoritative flip-then-ack, Q-5.3c — stays
  serving until the ack so nothing is lost in the flip window); acks a buffered
  inbound (bufferId present) via `inbound_ack` after the handler runs
  (drain-without-dup on the delivery leg); NET-NEW reconnect loop re-dials +
  re-handshakes after an unexpected close (off by default, on in production).
- adapter: emits `going_idle` from its existing `disconnect()` drain seam before
  tearing down the socket; best-effort + guarded (never blocks shutdown).
- transport Protocol + contract doc §3.2 document the 3 new frames.

+6 relay tests (124 pass). NOT in scope: the autonomous idle timer / machine
suspend / NAS health model (deferred behaviour). Ben's relay-adapter solo lane.
2026-06-24 09:50:30 +10:00
lEWFkRAD
433db17c0a fix(windows): harden gateway scheduled task (#45610)
* fix(windows): harden gateway scheduled task

* fix(windows): launch gateway scheduled task via console-less wscript

The Scheduled Task ran the gateway through cmd.exe, which allocates a
console. During logon Windows broadcasts CTRL_CLOSE_EVENT to console
process groups, reaping cmd.exe and the half-initialized gateway with
STATUS_CONTROL_C_EXIT (0xC000013A) - which Task Scheduler treats as a
user cancel, so RestartOnFailure never fires and the gateway vanishes on
every reboot (issue #45599 root cause #1).

Add a console-less .vbs launcher (wscript.exe -> pythonw.exe, both
GUI-subsystem) mirroring the gateway.cmd env + argv, and point the task
action at it. The .cmd stays for the Startup-folder fallback and /Run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Jeff <jeffrobodie@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-23 16:07:52 -07:00
fyzanshaik
0ba1dfed78 fix(gateway): refuse model switch on stale checkout to avoid env_float ImportError 2026-06-24 04:16:54 +05:30
manusjs
807bdc17f6 fix(gateway): prevent double dispatch of Discord messages via thread-starter dedup
When _auto_create_thread() creates a thread from a user message via
message.create_thread(), Discord fires a second MESSAGE_CREATE event
for the 'thread starter message'.  That starter message carries
message.id == thread.id and may arrive with type=default instead of
type=21 (thread_starter_message), so the existing type filter in
on_message does not catch it — triggering a second call into
_handle_message and thus a second agent run and response.

Fix: after _auto_create_thread succeeds and returns a thread, pre-seed
the dedup cache with str(thread.id) via self._dedup.is_duplicate().
The dedup cache is the same TTL-based MessageDeduplicator that already
guards against Discord RESUME event replays.  Calling is_duplicate()
marks the ID as seen; when the duplicate thread-starter MESSAGE_CREATE
arrives, on_message's guard returns True and the event is dropped.

This is a minimal, targeted fix:
- No new state: reuses the existing _dedup instance
- No timing/race: the pre-seed happens synchronously inside the async
  _handle_message, before the thread-starter event can be dispatched
- Scoped: only fires when auto-threading is enabled AND thread creation
  succeeds (thread object is not None)

Also adds tests in tests/gateway/test_discord_double_dispatch.py
covering the pre-seed behaviour, failure modes (thread creation fails,
auto-thread disabled), and dedup cache integrity.

Closes #51057
2026-06-24 03:25:33 +05:30
kshitij
89538d47b8 Merge pull request #51553 from NousResearch/salvage/48300-stale-session-lock
fix(gateway): preserve _session_tasks on guard mismatch to heal stale session lock (#48300)
2026-06-24 03:21:38 +05:30
kshitij
b56aafc2ef Merge pull request #51554 from kshitijk4poor/chore/authormap-manusjs
chore(release): map manusjs email to manus-use
2026-06-24 03:17:11 +05:30
kshitijk4poor
5511fcf944 chore(release): map manusjs email to manus-use GitHub login
Required by contributor-check/check-attribution before salvaging PR #51129
(Discord thread-starter dedup, #51057). The CI step greps AUTHOR_MAP by
exact email and does not special-case noreply addresses.
2026-06-24 03:09:23 +05:30
islam666
0c79992db5 fix(gateway): preserve _session_tasks on guard mismatch to enable stale lock healing (#48300)
_session_task_is_stale() failed to detect a stale session lock when the owner
task completed and cleaned _session_tasks (del in _process_message_background's
finally) but _active_sessions was NOT released because _release_session_guard
skipped on a guard mismatch (a concurrent reset/new command or drain handoff
swapped _active_sessions[key] to a different guard). With no owner task left to
inspect, _session_task_is_stale reported 'not stale', the orphaned guard was
never healed, and the session deadlocked permanently — later messages received
but never dispatched.

Reorder the finally cleanup to release-then-conditional-delete: release the
guard first, then drop the _session_tasks entry ONLY if the guard was actually
released (session_key no longer in _active_sessions). On a guard mismatch the
done-task entry survives, so the on-entry self-heal (_session_task_is_stale ->
_heal_stale_session_lock) detects the stale lock and clears it on the next
inbound message.

Extracted the cleanup into a callable _cleanup_finished_session_task() helper so
the regression test drives the REAL production code path rather than a copy of
its logic (the original test inlined the fixed logic and passed regardless of
the production order — mutation-verified the rewritten tests now fail on the
buggy del-first order). Added a positive-path test (guard matches -> release +
delete) so both branches are pinned.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-06-24 03:06:21 +05:30
helix4u
292a456c06 fix(agent): handle concurrent tool submit shutdown 2026-06-24 02:56:56 +05:30
kshitij
74265c8e84 Merge pull request #51541 from NousResearch/salvage/31599-telegram-closewait
fix(telegram): wire keepalive limits into general request pool to fix CLOSE_WAIT fd leak (#31599)
2026-06-24 02:35:37 +05:30
kshitij
9e924f79a8 Merge pull request #51539 from NousResearch/salvage/49045-toolcall-persist
fix(agent): persist tool calls before turn-end flush (#49045)
2026-06-24 02:27:36 +05:30
Teknium
e32ebc6aa2 feat(skills): /learn — distill a reusable skill from anything you describe (#51506)
Open-ended skill learning across every surface. /learn <free text> takes a
description of any source — a directory, a URL, the workflow you just walked
the agent through, or pasted notes — and the live agent gathers it with the
tools it already has (read_file/search_files, web_extract, the conversation,
the pasted text), then authors a SKILL.md via skill_manage following the
house authoring standards (<=60-char description, the standard section order,
Hermes-tool framing, no invented commands).

No engine, no model-tool footprint, works on any terminal backend (local,
Docker, remote): /learn builds a standards-guided prompt and hands it to the
agent as a normal turn.

- agent/learn_prompt.py: shared standards-guided prompt builder
- /learn registry entry (both surfaces) + CLI handler (inject onto input
  queue) + gateway handler (rewrite turn, fall through, /blueprint pattern)
- tui_gateway command.dispatch returns a send directive -> TUI + dashboard chat
- dashboard Skills page 'Learn a skill' panel (dir + URL + open-ended text)
  composes a /learn request and runs it in chat
- docs (slash-commands ref + skills feature page), 11 targeted tests

Inspired by OpenAI Codex's Record & Replay and the /learn concept from #47234
(dir-distillation engine); reworked to be open-ended and engine-free per
review.
2026-06-23 13:51:28 -07:00
konsisumer
190b01c553 fix(agent): persist tool calls before turn-end flush
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-06-24 02:15:57 +05:30
kshitijk4poor
4b7f3826c2 fix(telegram): wire platform_httpx_limits into general-pool HTTPXRequest (#31599)
PTB's HTTPXRequest builds its httpx.AsyncClient with
`limits = httpx.Limits(max_connections=connection_pool_size)` and no
keepalive tuning, so httpx's default keepalive_expiry=5.0 applies. Behind
an HTTP proxy (Cloudflare Warp etc.) a peer-initiated FIN can sit in
CLOSE_WAIT longer than that, leaking fds in the general request pool
(_request[1], which routes bot.send_message/set_my_commands) — the pool
_drain_polling_connections never resets. Telegram was the lone holdout
adapter not using the shared #18451 CLOSE_WAIT helper.

Wire gateway.platforms._http_client_limits.platform_httpx_limits() into
the httpx client across ALL THREE request-construction branches —
fallback-transport, proxy, and plain — via httpx_kwargs["limits"], which
PTB spreads last into its client kwargs so our tuned limits win. PTB's
connection_pool_size (max_connections) is preserved; only keepalive
behaviour is tightened (max_keepalive_connections + keepalive_expiry<5.0).

The fix is macOS-import-safe: no Linux-only socket TCP_KEEPIDLE/INTVL/CNT
constants at module scope (unlike the broken candidate which crashed on
import on the reporter's OS), and it patches the actual proxy path the
repro hits rather than TelegramFallbackTransport, which the proxy repro
never instantiates.

Adds a mutation-survivable behavior-contract test asserting every
HTTPXRequest built by connect() receives httpx_kwargs["limits"] with
keepalive_expiry < httpx's 5.0 default, across both the proxy and plain
branches. Reverting the limits wiring fails the test.

Co-authored-by: indigokarasu <mx.indigo.karasu@gmail.com>
2026-06-24 02:15:47 +05:30
kshitij
aaa2e2cb88 Merge pull request #51509 from NousResearch/salvage/49041-compression-session-lineage
fix(tui): preserve live session identity across compression (#49041)
2026-06-24 02:04:48 +05:30
kshitij
e155ca20ea Merge pull request #51507 from NousResearch/salvage/47134-mcp-killpg-guard
fix(mcp): skip killpg when child shares gateway's process group (#47134)
2026-06-24 01:48:26 +05:30
konsisumer
02050859f3 fix(tui): preserve live session identity across compression (#49041)
When a session rotates id on compression, _sync_session_key_after_compress()
re-anchored the session_key, approval-notify routing, yolo state, and slash
worker — but never moved the active-session lease, which stayed keyed to the
pre-compression id. And _find_live_session_by_key() matched live sessions on
the stale session_key, not the live agent's current agent.session_id. After
compression a resume/create path failed to recognize the existing live agent
and could build a SECOND live agent against the same DB continuation -> forked
lineage / cross-session message mixing.

- active_sessions.transfer_active_session(): move a lease in place to the new
  id under the exclusive file lock (no slot drop).
- gateway _transfer_active_session_slot(): call it inside
  _sync_session_key_after_compress(); on the rare fallback (entry pruned)
  RESERVE the new slot before releasing the old lease (reserve-before-release),
  so a concurrent gateway at the session cap cannot grab the freed slot in a
  release-then-reacquire window and leave this session with no lease; if the
  reserve fails, keep the existing lease (review fix).
- _session_lookup_key(): make live-session lookup authoritative on
  agent.session_id, wired into all stale-session_key consumers
  (_find_live_session_by_key, _session_live_item, _live_session_payload) —
  fixes the whole lookup class.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-06-24 00:54:18 +05:30
kyssta-exe
23c47371d2 fix(mcp): skip killpg when child shares gateway's process group (#47134)
/reload-mcp -> shutdown_mcp_servers -> _kill_orphaned_mcp_children(include_active=True)
-> _send_signal -> killpg(pgid, SIGTERM). When a tracked MCP stdio child shares
the gateway's OWN process group, killpg delivers SIGTERM to the gateway itself,
firing its SIGTERM handler -> os._exit(0): /reload-mcp crashes the gateway.

Pre-compute the gateway's own pgid (os.getpgrp(), None on Windows/restricted)
and, in _send_signal, skip killpg when pgid == own pgid, falling through to the
per-pid os.kill path so the child is still reaped without self-signaling.

Adds a regression test (folded in) that pins the guard: with a tracked pgid
equal to the gateway's own pgid, killpg is never called for that pgid and the
per-pid kill fallback is used. Mutation-checked.

Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
2026-06-24 00:52:18 +05:30
Teknium
64131bf975 chore: add s010mn to AUTHOR_MAP for PR #29221 salvage 2026-06-23 11:51:43 -07:00
s010mn
221cd60242 feat: add reasoning_effort support to ollama-cloud provider
Map Hermes xhigh→max to unlock DeepSeek V4's 'Max thinking' tier
through Ollama Cloud's OpenAI-compatible /v1/chat/completions endpoint.
low/medium/high pass through unchanged; disabled/none suppress
reasoning entirely.

Empirically confirmed: reasoning_effort:max produces ~2.5× more
thinking tokens than high on deepseek-v4-pro:cloud (1576 vs 642).
2026-06-23 11:51:43 -07:00
Teknium
72bfc48e63 feat(tui): track background subagents in the status bar (#51485)
Parity with the classic CLI status bar's ⛓ indicator (PR #51441). The
Ink TUI status bar now shows ⛓ N for live background/async subagents
(delegate_task batches + background single delegations).

- tui_gateway/server.py: _get_usage() embeds active_subagents from
  tools.async_delegation.active_count() — the same registry the CLI
  reads — onto the existing per-update usage payload, guarded so a
  raising active_count() leaves the field off without breaking usage.
- ui-tui appChrome: new 'subagents' status segment (breakpoint w>=92,
  slots between bg and cost in the shed-order), renders ⛓ N from
  usage.active_subagents.
- Usage / SessionUsageResponse types gain active_subagents?.

Distinct from the turn-scoped SpawnHud / /agents overlay, which mirror
live in-turn subagent.* events; this is the persistent registry count.
2026-06-23 11:32:00 -07:00
Victor Kyriazakos
da80ac0042 feat(slack): add --no-assistant flag to manifest generation
By default `hermes slack manifest` opts the app into Slack's AI Assistant
container (assistant_view feature + assistant:write scope +
assistant_thread_* events). Slack then renders DMs as the right-hand
Assistant split-pane, where every exchange is a thread and bare slash
commands (/help, /new, ...) are not delivered as normal command events —
they only work when the bot is @mentioned. There was no way to opt out
short of hand-editing the generated JSON.

Add --no-assistant to emit a flat-DM manifest that omits those three
pieces, so DMs render as a normal chat and slash commands dispatch
inline. The regular messaging surface (Messages tab, slash commands,
Socket Mode, channel + DM scopes/events) is preserved in both modes.

Default behaviour is unchanged (assistant mode still on).

Tests: cover both manifest modes and the argparse wiring.
2026-06-23 11:30:10 -07:00
kshitijk4poor
a4e61ddf04 fix(cron): fail closed when an unpinned job's provider drifts from creation snapshot (#44585)
An unpinned cron job follows the global default provider (config.yaml
model.default + resolve_runtime_provider). If that global state is changed
after the job is created — e.g. a temporary switch to a paid provider like
nous/claude-fable-5 — the job silently inherits it on its next tick and spends
real money. This is the reported $7.73 incident: a job created under a
free/default provider later inherited a temporary paid switch.

Fix (ask #1 only) preserves the legitimate "unpinned job should follow
model.default" use case by detecting *drift* rather than freezing the model:

- create_job (cron/jobs.py): for UNPINNED, agent-backed jobs (no explicit
  provider, not no_agent), snapshot the provider that resolution WOULD pick
  right now into a new optional `provider_snapshot` field, resolved via the
  same resolve_runtime_provider() path the ticker uses. Fail-open to None on
  any resolution error so job creation never breaks.

- run_job (cron/scheduler.py): right after runtime resolution, if the job has
  a provider_snapshot AND is unpinned AND the currently-resolved provider
  DIFFERS from the snapshot, fail closed for that run — make no paid call and
  deliver a loud, actionable alert naming both providers and telling the user
  to pin explicitly (`cronjob action=update job_id=.. provider=..`).

Back-compat: jobs with no snapshot (pre-existing jobs, no_agent jobs, or any
job whose creation-time resolution failed) behave exactly as before — the
guard only engages when a snapshot exists. Explicitly-pinned jobs (job.provider
set) are unaffected since they don't drift with global state.

Tests: tests/cron/test_cron_provider_pin.py covers snapshot-matches (runs),
snapshot-differs (fail closed, no agent constructed), no-snapshot back-compat,
None-snapshot back-compat, explicitly-pinned (runs regardless), plus create_job
snapshot capture/skip/fail-open. The fail-closed case is load-bearing (fails
without the guard).

Issue #44585 asks #2-4 (hard-stop a running job, gateway-stop containment,
fail-closed on provider mutation) are out of scope for this change.
2026-06-23 02:45:52 +05:30
harjothkhara
791c992b55 fix(model_switch): route typed configured models off openai-codex (#45006)
A typed `/model <name>` where `<name>` is declared under `providers.<slug>` or
`custom_providers` — but typed while the current provider is a soft-accepting
one (e.g. `openai-codex`) — stayed on the current provider and was swallowed as
an unknown hidden Codex model, instead of routing to the provider that actually
declares it.

Add configured-provider exact-match detection (`_configured_provider_matches`)
and a new Step d.5 in `switch_model`: if the typed model is declared in
user/custom provider config, route to that provider BEFORE
`detect_provider_for_model()` guesses from static catalogs and BEFORE the
common-path validation lets a soft-accepting current provider swallow the name.

- Matching is exact (case-insensitive) against explicitly-declared model
  collections only (`models`, `model`, `default_model`) — never fuzzy/family.
- Same-provider declarer → keep current provider (canonicalize the id).
- Multiple declarers → fail clearly and ask for `--provider <slug>`.
- Single declarer → route there; for `providers.<slug>` user providers, set
  `explicit_provider` so the credential block resolves base_url/key from config.
- Step e (`detect_provider_for_model`) is gated off when `config_routed`.

The deliberately-supported openai-codex / xai-oauth hidden-model soft-accept
(#16172 / #19729) is left untouched: when nothing in config matches, detection
is a no-op.

Salvaged from #45442 by harjothkhara (authorship preserved).

Tests: tests/hermes_cli/test_model_switch_configured_provider_routing.py
(7 tests). Full model_switch suite: 214 passed.

Fixes #45006
2026-06-23 02:03:21 +05:30
Brooklyn Nicholson
5342eccf12 Merge remote-tracking branch 'origin/main' into bb/pets 2026-06-22 05:25:49 -05:00
Brooklyn Nicholson
6fd839ac84 docs(pets): feature guide, petdex skill + catalog
Add the pets feature guide and the petdex skill (SKILL.md + bundled doc),
and register them in the website sidebar and skills catalog.
2026-06-20 14:18:43 -05:00
Brooklyn Nicholson
86b990fe0f feat(desktop): floating pet, pop-out overlay + Cmd+K picker
Add the in-window floating pet (sprite, speech bubble, contact shadow,
profile-scoped, resize-safe) and a pop-out always-on-top overlay window
with gestures and notifications. Add the Cmd+K pet picker page plus the
appearance gallery and size slider in settings. Includes the pet stores,
electron overlay wiring, i18n strings, and store tests.
2026-06-20 14:18:40 -05:00
Brooklyn Nicholson
75b36a138f feat(pets): TUI pet pane, picker + gateway RPCs
Add the Ink pet sprite pane, the interactive /pet picker overlay, and live
pet switching/rescale driven by new tui_gateway RPCs (pet state, pet.scale,
per-state frames). Wires pet flash state and the picker into the TUI layout
and slash handler. Covered by the slash-handler test.
2026-06-20 14:18:36 -05:00
Brooklyn Nicholson
83aa84ae3b feat(pets): CLI pet pane + /pet command
Render the reactive pet pane in the classic CLI (steady redraw,
right-aligned) and wire the /pet command to list and switch pets, plus an
enable/disable toggle. Backed by hermes_cli/pets.py and the CLI commands
mixin, registered in the central command registry. Covered by the CLI pet
pane and toggle tests.
2026-06-20 14:18:33 -05:00
Brooklyn Nicholson
e7dbfdaad7 feat(pets): pet engine + display.pet config
Add the shared pet engine under agent/pet/: spritesheet manifest loading
and in-process caching, six-state animation model, frame rendering, and
the persistent pet store. Register the display.pet config block (pet,
scale, enabled, etc.) that every surface reads from. Covered by
tests/agent/test_pet_engine.py.
2026-06-20 14:18:30 -05:00
1456 changed files with 134638 additions and 16623 deletions

View File

@@ -66,8 +66,12 @@ runtime/
# ---------- Not needed inside the Docker image ----------
# Desktop app source (Tauri/Electron); never installed in the container
# Desktop app source (Tauri/Electron); never installed in the container.
# apps/shared is the dashboard↔desktop websocket helper and is linked from
# web/package.json as a file: workspace dep — keep it in the build context.
apps/
!apps/shared/
!apps/shared/**
# Test suite — not shipped in production images
tests/

2
.envrc
View File

@@ -1,5 +1,5 @@
watch_file pyproject.toml uv.lock
watch_file package-lock.json package.json web/package.json ui-tui/package.json website/package.json apps/shared/package.json apps/desktop/package.json ui-tui/packages/hermes-ink/package.json
watch_file flake.nix flake.lock nix/devShell.nix nix/tui.nix nix/package.nix nix/python.nix
watch_file flake.nix flake.lock nix/devShell.nix nix/tui.nix nix/package.nix nix/python.nix nix/hermes-agent.nix nix/desktop.nix
use flake

View File

@@ -1,50 +0,0 @@
name: Hermes smoke test
description: >
Run the image's built-in entrypoint against `--help` and `dashboard --help`
to catch basic runtime regressions before publishing. Requires the image
to already be loaded into the local Docker daemon under `image`.
Works identically on amd64 and arm64 runners.
inputs:
image:
description: Fully-qualified image tag (e.g. nousresearch/hermes-agent:test)
required: true
runs:
using: composite
steps:
- name: Ensure /tmp/hermes-test is hermes-writable
shell: bash
run: |
# The image runs as the hermes user (UID 10000). GitHub Actions
# creates /tmp/hermes-test root-owned by default, which hermes
# can't write to — chown it to match the in-container UID before
# bind-mounting. Real users doing `docker run -v ~/.hermes:...`
# with their own UID hit the same issue and have their own
# remediations (HERMES_UID env var, or chown locally).
mkdir -p /tmp/hermes-test
sudo chown -R 10000:10000 /tmp/hermes-test
- name: hermes --help
shell: bash
run: |
# Use the image's real ENTRYPOINT (/init + main-wrapper.sh) so
# this exercises the actual production startup path. PR #30136
# review caught that an --entrypoint override here had been
# silently neutered by the s6-overlay migration — stage2-hook
# ignores its CMD args, so the smoke test was a no-op.
docker run --rm \
-v /tmp/hermes-test:/opt/data \
"${{ inputs.image }}" --help
- name: hermes dashboard --help
shell: bash
run: |
# Regression guard for #9153: dashboard was present in source but
# missing from the published image. If this fails, something in
# the Dockerfile is excluding the dashboard subcommand from the
# installed package.
docker run --rm \
-v /tmp/hermes-test:/opt/data \
"${{ inputs.image }}" dashboard --help

View File

@@ -12,7 +12,6 @@ name: CI
on:
pull_request:
branches: [main]
push:
branches: [main]
@@ -21,6 +20,7 @@ permissions:
pull-requests: write # needed by lint (PR comment) + supply-chain (PR comment)
actions: read # needed by osv-scanner (SARIF upload)
security-events: write # needed by osv-scanner (SARIF upload)
packages: write # needed by docker build
concurrency:
group: ci-${{ github.ref }}
@@ -33,6 +33,7 @@ jobs:
# (all lanes true) so post-merge validation is never weakened.
# ─────────────────────────────────────────────────────────────────────
detect:
name: Detect affected areas
runs-on: ubuntu-latest
outputs:
python: ${{ steps.classify.outputs.python }}
@@ -54,11 +55,15 @@ jobs:
# Skipped workflows (if condition is false) don't spin up runners.
# ─────────────────────────────────────────────────────────────────────
tests:
name: Python tests
needs: detect
if: needs.detect.outputs.python == 'true'
uses: ./.github/workflows/tests.yml
with:
slice_count: 8
lint:
name: Python lints
needs: detect
if: needs.detect.outputs.python == 'true'
uses: ./.github/workflows/lint.yml
@@ -66,35 +71,49 @@ jobs:
event_name: ${{ needs.detect.outputs.event_name }}
typecheck:
name: TypeScript
needs: detect
if: needs.detect.outputs.frontend == 'true'
uses: ./.github/workflows/typecheck.yml
docs-site:
name: Docs Site
needs: detect
if: needs.detect.outputs.site == 'true'
uses: ./.github/workflows/docs-site-checks.yml
history-check:
name: Deny unrelated histories
needs: detect
if: needs.detect.outputs.event_name == 'pull_request'
uses: ./.github/workflows/history-check.yml
contributor-check:
name: Check contributors
needs: detect
if: needs.detect.outputs.python == 'true'
uses: ./.github/workflows/contributor-check.yml
uv-lockfile:
name: Check uv.lock
needs: detect
uses: ./.github/workflows/uv-lockfile-check.yml
docker-lint:
name: Lint Docker scripts
needs: detect
if: needs.detect.outputs.docker_meta == 'true'
uses: ./.github/workflows/docker-lint.yml
docker:
name: Build&Test Docker image
needs: detect
if: needs.detect.outputs.python == 'true' || needs.detect.outputs.frontend == 'true' || needs.detect.outputs.docker_meta == 'true'
uses: ./.github/workflows/docker.yml
secrets: inherit
supply-chain:
name: Supply-chain scan
needs: detect
if: needs.detect.outputs.event_name == 'pull_request' && (needs.detect.outputs.scan == 'true' || needs.detect.outputs.deps == 'true' || needs.detect.outputs.mcp_catalog == 'true')
uses: ./.github/workflows/supply-chain-audit.yml
@@ -105,7 +124,7 @@ jobs:
mcp_catalog: ${{ needs.detect.outputs.mcp_catalog == 'true' }}
osv-scanner:
needs: detect
name: OSV scan
uses: ./.github/workflows/osv-scanner.yml
# ─────────────────────────────────────────────────────────────────────
@@ -128,6 +147,8 @@ jobs:
- docker-lint
- supply-chain
- osv-scanner
# we don't require docker to pass rn because it's so slow lol
# - docker
if: always()
runs-on: ubuntu-latest
steps:
@@ -144,3 +165,67 @@ jobs:
sys.exit(1)
print('All checks passed (or were skipped)')
"
# ─────────────────────────────────────────────────────────────────────
# CI timing report: collect per-job/step durations from the GitHub API,
# cache them on main (as a baseline), and on PRs generate an HTML diff
# report with a gantt chart + per-step breakdown. The report is uploaded
# as an artifact and a markdown summary is written to $GITHUB_STEP_SUMMARY.
# ─────────────────────────────────────────────────────────────────────
ci-timings:
name: CI timing report
needs: [all-checks-pass, docker]
if: always()
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Restore baseline cache (PR only)
if: github.event_name == 'pull_request'
uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: ci-timings-baseline.json
# Prefix-match: exact key will never hit (run_id differs), so
# restore-keys finds the most recent baseline from main.
key: ci-timings-baseline-never-exact
restore-keys: |
ci-timings-baseline-
- name: Collect timings and generate report
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
run: |
python3 scripts/ci/timings_report.py \
--baseline ci-timings-baseline.json \
--output ci-timings-report.html \
--json-out ci-timings.json \
--summary-out ci-timings-summary.md
- name: Upload HTML report
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
id: ci-timings-artifact
with:
name: ci-timings-report
path: ci-timings-report.html
retention-days: 14
archive: false
- name: Output summary
env:
REPORT_URL: ${{ steps.ci-timings-artifact.outputs.artifact-url}}
run: |
echo "# CI Timing report" >> "$GITHUB_STEP_SUMMARY"
echo "[View the full interactive report]($REPORT_URL)" >> "$GITHUB_STEP_SUMMARY"
cat ci-timings-summary.md >> "$GITHUB_STEP_SUMMARY"
- name: Save baseline cache (main only)
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
run: cp ci-timings.json ci-timings-baseline.json
- name: Upload baseline to cache (main only)
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: ci-timings-baseline.json
key: ci-timings-baseline-${{ github.run_id }}

View File

@@ -2,7 +2,7 @@ name: Docker / shell lint
# Lints the container build inputs: Dockerfile (via hadolint) and any shell
# scripts under docker/ (via shellcheck). These catch the class of regression
# the behavioral docker-publish smoke test can't — unquoted variable
# the behavioral docker smoke test can't — unquoted variable
# expansions, silently-failing RUN commands, etc.
#
# Rules and ignores are documented in .hadolint.yaml at the repo root.

View File

@@ -1,353 +0,0 @@
name: Docker Build and Publish
on:
push:
branches: [main]
paths:
- '**/*.py'
- 'pyproject.toml'
- 'uv.lock'
- 'Dockerfile'
- 'docker/**'
- '.github/workflows/docker-publish.yml'
- '.github/actions/hermes-smoke-test/**'
# No paths filter — the job must always run so the required check
# reports a status (path-gated workflows leave checks "pending" forever
# when no matching files change, which blocks merge).
pull_request:
branches: [main]
release:
types: [published]
permissions:
contents: read
# Needed so the arm64 job can push/pull its registry-backed build cache
# to ghcr.io (cache-to/cache-from type=registry). See the build-arm64
# job for why registry cache replaced the gha cache on that arch.
packages: write
# Concurrency: push/release runs are NEVER cancelled so every merge gets
# its own image. PR runs reuse a PR-scoped group with
# cancel-in-progress: true so rapid pushes to the same PR collapse to the
# latest commit.
concurrency:
group: docker-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
env:
IMAGE_NAME: nousresearch/hermes-agent
jobs:
# ---------------------------------------------------------------------------
# Build amd64 natively. This job also runs the smoke tests (basic --help
# and the dashboard subcommand regression guard from #9153), because amd64
# is the only arch we can `load` into the local daemon on an amd64 runner.
# ---------------------------------------------------------------------------
build-amd64:
# Only run on the upstream repository, not on forks
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
timeout-minutes: 45
outputs:
digest: ${{ steps.push.outputs.digest }}
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
# The image build + smoke test + integration tests run ONLY on
# push-to-main and release — never on PRs. They are the heaviest jobs
# in CI (~15-45 min) and a broken build surfaces on the main push (and
# is gated pre-merge by docker-lint + uv-lockfile-check). Every step
# below is skipped on PRs, so the job still reports green and the
# required check never hangs.
- name: Set up Docker Buildx
if: github.event_name != 'pull_request'
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
# Build once, load into the local daemon for smoke testing. Cached
# to gha with a per-arch scope; the push step below reuses every
# layer from this build.
- name: Build image (amd64, smoke test)
if: github.event_name != 'pull_request'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
load: true
platforms: linux/amd64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
cache-from: type=gha,scope=docker-amd64
cache-to: type=gha,mode=max,scope=docker-amd64
- name: Smoke test image
if: github.event_name != 'pull_request'
uses: ./.github/actions/hermes-smoke-test
with:
image: ${{ env.IMAGE_NAME }}:test
# ---------------------------------------------------------------------
# Run the docker-integration test suite against the freshly-built
# image already loaded into the local daemon (`:test`). These tests
# are excluded from the sharded `tests.yml :: test` matrix on purpose
# (see `_SKIP_PARTS` in scripts/run_tests_parallel.py) because each
# shard would otherwise reach the session-scoped ``built_image``
# fixture in ``tests/docker/conftest.py`` and start a 3-7min
# ``docker build`` — guaranteed to
# die in fixture setup.
#
# Piggybacking here avoids a second image build: the smoke test
# already proved the image loads + runs, so the daemon has it under
# `${IMAGE_NAME}:test` and we just point ``HERMES_TEST_IMAGE`` at
# that. The fixture's ``HERMES_TEST_IMAGE`` branch (see
# tests/docker/conftest.py:62-63) short-circuits the rebuild.
#
# Why this job and not a standalone one: the image is 5GB+; passing
# it between jobs via ``docker save``/``upload-artifact`` is slower
# than the build itself. Reusing the existing daemon state is the
# cheapest path to coverage on every PR that touches docker code.
# ---------------------------------------------------------------------
- name: Install uv (for docker tests)
if: github.event_name != 'pull_request'
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
- name: Set up Python 3.11 (for docker tests)
if: github.event_name != 'pull_request'
run: uv python install 3.11
- name: Install Python dependencies (for docker tests)
if: github.event_name != 'pull_request'
run: |
uv venv .venv --python 3.11
source .venv/bin/activate
# ``dev`` extra pulls in pytest, pytest-asyncio —
# everything tests/docker/ needs. We deliberately avoid ``all``
# here because the docker tests only drive the container via
# subprocess and don't import hermes_agent's optional deps.
uv pip install -e ".[dev]"
- name: Run docker integration tests
if: github.event_name != 'pull_request'
env:
# Skip rebuild; use the image already loaded by the build step.
HERMES_TEST_IMAGE: ${{ env.IMAGE_NAME }}:test
# Match the policy in tests.yml :: test job — no accidental
# real-API calls from inside the harness.
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
run: |
source .venv/bin/activate
python -m pytest tests/docker/ -v --tb=short
- name: Log in to Docker Hub
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
# Push amd64 by digest only (no tag). The merge job assembles the
# tagged manifest list. `push-by-digest=true` is docker's recommended
# pattern for multi-runner multi-platform builds.
- name: Push amd64 by digest
id: push
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
platforms: linux/amd64
labels: |
org.opencontainers.image.revision=${{ github.sha }}
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
outputs: type=image,name=${{ env.IMAGE_NAME }},push-by-digest=true,name-canonical=true,push=true
cache-from: type=gha,scope=docker-amd64
cache-to: type=gha,mode=max,scope=docker-amd64
# Write the digest to a file and upload it as an artifact so the
# merge job can stitch both per-arch digests into a manifest list.
- name: Export digest
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
run: |
mkdir -p /tmp/digests
digest="${{ steps.push.outputs.digest }}"
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest artifact
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
with:
name: digest-amd64
path: /tmp/digests/*
if-no-files-found: error
retention-days: 1
# ---------------------------------------------------------------------------
# Build arm64 natively on GitHub's free arm64 runner. This replaces the
# previous QEMU-emulated arm64 build, which was ~5-10x slower and shared
# a cache scope with amd64. Matches the amd64 job's shape: build+load,
# smoke test, then on push/release push by digest.
# ---------------------------------------------------------------------------
build-arm64:
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-24.04-arm
timeout-minutes: 45
outputs:
digest: ${{ steps.push.outputs.digest }}
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
# arm64 build runs only on push-to-main and release (see build-amd64).
- name: Set up Docker Buildx
if: github.event_name != 'pull_request'
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
# Log in to ghcr.io so the registry-backed build cache below can be
# read (cache-from) on every event and written (cache-to) on
# push/release. Uses the workflow's GITHUB_TOKEN, which is valid for
# the whole job — unlike the gha cache backend's short-lived Azure SAS
# token, which expired mid-build on slow cold-cache arm64 runs and
# crashed the build before the smoke test (the reason the gha cache
# was removed from arm64 PRs in the first place).
- name: Log in to ghcr.io (build cache)
if: github.event_name != 'pull_request'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
registry: ghcr.io
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
# Build once, load into the local daemon for smoke testing, then push
# by digest below. Reads AND writes the registry-backed cache so the
# push reuses layers from this build and the next build starts warm.
#
# Registry cache (type=registry on ghcr.io) is used instead of the gha
# cache that previously broke here: its credential is the job-lifetime
# GITHUB_TOKEN, not a short-lived SAS token, so the cold-build-outlives-
# token failure mode cannot recur.
- name: Build image (arm64, smoke test, cached publish)
if: github.event_name != 'pull_request'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
load: true
platforms: linux/arm64
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
cache-from: type=registry,ref=ghcr.io/nousresearch/hermes-agent:buildcache-arm64
cache-to: type=registry,ref=ghcr.io/nousresearch/hermes-agent:buildcache-arm64,mode=max
- name: Smoke test image
if: github.event_name != 'pull_request'
uses: ./.github/actions/hermes-smoke-test
with:
image: ${{ env.IMAGE_NAME }}:test
- name: Log in to Docker Hub
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Push arm64 by digest
id: push
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
platforms: linux/arm64
labels: |
org.opencontainers.image.revision=${{ github.sha }}
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
outputs: type=image,name=${{ env.IMAGE_NAME }},push-by-digest=true,name-canonical=true,push=true
cache-from: type=registry,ref=ghcr.io/nousresearch/hermes-agent:buildcache-arm64
cache-to: type=registry,ref=ghcr.io/nousresearch/hermes-agent:buildcache-arm64,mode=max
- name: Export digest
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
run: |
mkdir -p /tmp/digests
digest="${{ steps.push.outputs.digest }}"
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest artifact
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
with:
name: digest-arm64
path: /tmp/digests/*
if-no-files-found: error
retention-days: 1
# ---------------------------------------------------------------------------
# Stitch both per-arch digests into a single tagged multi-arch manifest.
# This is a registry-side operation — no building, no layer re-push —
# so it runs in ~30 seconds.
#
# On main pushes: tags both :main and :latest.
# On releases: tags :<release_tag_name>.
# ---------------------------------------------------------------------------
merge:
if: github.repository == 'NousResearch/hermes-agent' && (github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release')
runs-on: ubuntu-latest
needs: [build-amd64, build-arm64]
timeout-minutes: 10
steps:
- name: Download digests
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
path: /tmp/digests
pattern: digest-*
merge-multiple: true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
- name: Log in to Docker Hub
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
set -euo pipefail
args=()
for digest_file in *; do
args+=("${IMAGE_NAME}@sha256:${digest_file}")
done
if [ "${{ github.event_name }}" = "release" ]; then
TAG="${{ github.event.release.tag_name }}"
docker buildx imagetools create \
-t "${IMAGE_NAME}:${TAG}" \
"${args[@]}"
else
docker buildx imagetools create \
-t "${IMAGE_NAME}:main" \
-t "${IMAGE_NAME}:latest" \
"${args[@]}"
fi
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
- name: Inspect image
run: |
if [ "${{ github.event_name }}" = "release" ]; then
docker buildx imagetools inspect "${IMAGE_NAME}:${{ github.event.release.tag_name }}"
else
docker buildx imagetools inspect "${IMAGE_NAME}:main"
fi
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}

209
.github/workflows/docker.yml vendored Normal file
View File

@@ -0,0 +1,209 @@
name: Docker Build, Test, and Publish
on:
release:
types: [published]
workflow_call:
permissions:
contents: read
# Concurrency: push/release runs are NEVER cancelled so every merge gets
# its own image. PR runs reuse a PR-scoped group with
# cancel-in-progress: true so rapid pushes to the same PR collapse to
# the latest commit.
concurrency:
group: docker-${{ github.event.pull_request.number || github.ref }}
cancel-in-progress: ${{ github.event_name == 'pull_request' }}
env:
IMAGE_NAME: nousresearch/hermes-agent
jobs:
# Build, test, and optionally push the image for each architecture.
build:
if: github.repository == 'NousResearch/hermes-agent'
strategy:
fail-fast: false
matrix:
include:
- arch: amd64
runner: ubuntu-latest
platform: linux/amd64
cache-from: type=gha,scope=docker-amd64
cache-to: type=gha,mode=max,scope=docker-amd64
- arch: arm64
runner: ubuntu-24.04-arm
platform: linux/arm64
cache-from: type=gha,scope=docker-arm64
cache-to: type=gha,mode=max,scope=docker-arm64
runs-on: ${{ matrix.runner }}
timeout-minutes: 45
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
# Build once, load into the local daemon for testing. Cached
# per-arch; the push step below reuses every layer from this build.
- name: Build image (${{ matrix.arch }})
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
load: true
platforms: ${{ matrix.platform }}
tags: ${{ env.IMAGE_NAME }}:test
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
cache-from: ${{ matrix.cache-from }}
cache-to: ${{ (github.event_name != 'pull_request') && matrix.cache-to || '' }}
- name: Log in to Docker Hub
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
# Push by digest only (no tag). The merge job assembles the
# tagged manifest list. `push-by-digest=true` is docker's recommended
# pattern for multi-runner multi-platform builds.
- name: Push ${{ matrix.arch }} by digest
id: push
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: docker/build-push-action@bcafcacb16a39f128d818304e6c9c0c18556b85f # v7.1.0
with:
context: .
file: Dockerfile
platforms: ${{ matrix.platform }}
labels: |
org.opencontainers.image.revision=${{ github.sha }}
build-args: |
HERMES_GIT_SHA=${{ github.sha }}
outputs: type=image,name=${{ env.IMAGE_NAME }},push-by-digest=true,name-canonical=true,push=true
cache-from: ${{ matrix.cache-from }}
cache-to: ${{ matrix.cache-to }}
# Write the digest to a file and upload it as an artifact so the
# merge job can stitch both per-arch digests into a manifest list.
- name: Export digest
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
run: |
mkdir -p /tmp/digests
digest="${{ steps.push.outputs.digest }}"
touch "/tmp/digests/${digest#sha256:}"
- name: Upload digest artifact
if: github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release'
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
with:
name: digest-${{ matrix.arch }}
path: /tmp/digests/*
if-no-files-found: error
retention-days: 1
# Run the docker-integration test suite against the freshly-built
# image already loaded into the local daemon (`:test`).
#
# Piggybacking here avoids a second image build: the build step
# already loaded the image into the daemon under
# `${IMAGE_NAME}:test`, so we just point ``HERMES_TEST_IMAGE`` at
# that. The fixture's ``HERMES_TEST_IMAGE`` branch (see
# tests/docker/conftest.py:62-63) short-circuits the rebuild.
#
# Why this job and not a standalone one: the image is 5GB+; passing
# it between jobs via ``docker save``/``upload-artifact`` is slower
# than the build itself. Reusing the existing daemon state is the
# cheapest path to coverage on every PR that touches docker code.
# ---------------------------------------------------------------------
- name: Install uv (for docker tests)
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
- name: Set up Python 3.11 (for docker tests)
run: uv python install 3.11
- name: Install Python dependencies (for docker tests)
run: |
# ``dev`` extra pulls in pytest, pytest-asyncio —
# everything tests/docker/ needs. We deliberately avoid ``all``
# here because the docker tests only drive the container via
# subprocess and don't import hermes_agent's optional deps.
uv sync --locked --python 3.11 --extra dev
- name: Run docker integration tests
env:
# Skip rebuild; use the image already loaded by the build step.
HERMES_TEST_IMAGE: ${{ env.IMAGE_NAME }}:test
# Match the policy in tests.yml :: test job — no accidental
# real-API calls from inside the harness.
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
run: |
scripts/run_tests.sh tests/docker/ --file-timeout 600
# ---------------------------------------------------------------------------
# Stitch both per-arch digests into a single tagged multi-arch manifest.
# This is a registry-side operation — no building, no layer re-push —
# so it runs in ~30 seconds.
#
# On main pushes: tags both :main and :latest.
# On releases: tags :<release_tag_name>.
# ---------------------------------------------------------------------------
merge:
if: github.repository == 'NousResearch/hermes-agent' && (github.event_name == 'push' && github.ref == 'refs/heads/main' || github.event_name == 'release')
runs-on: ubuntu-latest
needs: [build]
timeout-minutes: 10
steps:
- name: Download digests
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
path: /tmp/digests
pattern: digest-*
merge-multiple: true
- name: Set up Docker Buildx
uses: docker/setup-buildx-action@8d2750c68a42422c14e847fe6c8ac0403b4cbd6f # v3
- name: Log in to Docker Hub
uses: docker/login-action@4907a6ddec9925e35a0a9e82d7399ccc52663121 # v4.1.0
with:
username: ${{ secrets.DOCKERHUB_USERNAME }}
password: ${{ secrets.DOCKERHUB_TOKEN }}
- name: Create manifest list and push
working-directory: /tmp/digests
run: |
set -euo pipefail
args=()
for digest_file in *; do
args+=("${IMAGE_NAME}@sha256:${digest_file}")
done
if [ "${{ github.event_name }}" = "release" ]; then
TAG="${{ github.event.release.tag_name }}"
docker buildx imagetools create \
-t "${IMAGE_NAME}:${TAG}" \
"${args[@]}"
else
docker buildx imagetools create \
-t "${IMAGE_NAME}:main" \
-t "${IMAGE_NAME}:latest" \
"${args[@]}"
fi
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}
- name: Inspect image
run: |
if [ "${{ github.event_name }}" = "release" ]; then
docker buildx imagetools inspect "${IMAGE_NAME}:${{ github.event.release.tag_name }}"
else
docker buildx imagetools inspect "${IMAGE_NAME}:main"
fi
env:
IMAGE_NAME: ${{ env.IMAGE_NAME }}

View File

@@ -37,7 +37,7 @@ jobs:
fetch-depth: 0 # need full history for merge-base + worktree
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
- name: Install ruff + ty
uses: ./.github/actions/retry
@@ -109,46 +109,6 @@ jobs:
--output .lint-reports/summary.md
cat .lint-reports/summary.md >> "$GITHUB_STEP_SUMMARY"
- name: Upload reports as artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
with:
name: lint-reports
path: .lint-reports/
retention-days: 14
- name: Post / update PR comment
if: inputs.event_name == 'pull_request' && github.event.pull_request.head.repo.full_name == github.repository
continue-on-error: true
uses: actions/github-script@60a0d83039c74a4aee543508d2ffcb1c3799cdea # v7
with:
script: |
const fs = require('fs');
const body = fs.readFileSync('.lint-reports/summary.md', 'utf8');
const marker = '<!-- lint-diff-summary -->';
const fullBody = marker + '\n' + body;
const { data: comments } = await github.rest.issues.listComments({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.issue.number,
});
const existing = comments.find(c => c.body && c.body.includes(marker));
if (existing) {
await github.rest.issues.updateComment({
owner: context.repo.owner,
repo: context.repo.repo,
comment_id: existing.id,
body: fullBody,
});
} else {
await github.rest.issues.createComment({
owner: context.repo.owner,
repo: context.repo.repo,
issue_number: context.issue.number,
body: fullBody,
});
}
ruff-blocking:
# Enforce the rules in pyproject.toml [tool.ruff.lint.select]. Currently
# PLW1514 (unspecified-encoding) — catches bare ``open()`` /
@@ -164,7 +124,7 @@ jobs:
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
- name: Install ruff
uses: ./.github/actions/retry

View File

@@ -3,17 +3,17 @@ name: Build Skills Index
on:
schedule:
# Run twice daily: 6 AM and 6 PM UTC
- cron: '0 6,18 * * *'
workflow_dispatch: # Manual trigger
- cron: "0 6,18 * * *"
workflow_dispatch: # Manual trigger
push:
branches: [main]
paths:
- 'scripts/build_skills_index.py'
- '.github/workflows/skills-index.yml'
- "scripts/build_skills_index.py"
- ".github/workflows/skills-index.yml"
permissions:
contents: read
actions: write # to trigger deploy-site.yml on schedule
actions: write # to trigger deploy-site.yml on schedule
jobs:
build-index:
@@ -21,11 +21,11 @@ jobs:
if: github.repository == 'NousResearch/hermes-agent'
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
- uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.11'
python-version: "3.11"
- name: Install dependencies
run: pip install httpx==0.28.1 pyyaml==6.0.2
@@ -36,7 +36,7 @@ jobs:
run: python scripts/build_skills_index.py
- name: Upload index artifact
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
with:
name: skills-index
path: website/static/api/skills-index.json

View File

@@ -2,6 +2,11 @@ name: Tests
on:
workflow_call:
inputs:
slice_count:
description: Number of parallel test slices
type: number
default: 8
permissions:
contents: read
@@ -12,13 +17,11 @@ concurrency:
cancel-in-progress: true
jobs:
test:
generate:
name: "Generate slices"
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
fail-fast: false
matrix:
slice: [1, 2, 3, 4, 5, 6]
outputs:
matrix: ${{ steps.matrix.outputs.matrix }}
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -27,13 +30,26 @@ jobs:
uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: test_durations.json
# main always writes a new suffix, but jobs pick the latest one with the same prefix
# quote from https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching#cache-hits-and-misses
# If you provide restore-keys, the cache action sequentially searches for any caches that match the list of restore-keys.
# If there are no exact matches, the action searches for partial matches of the restore keys.
# When the action finds a partial match, the most recent cache is restored to the path directory.
key: test-durations
- name: Generate test slices
id: matrix
run: |
MATRIX=$(python3 scripts/run_tests_parallel.py --generate-slices ${{ inputs.slice_count }})
echo "matrix=$MATRIX" >> "$GITHUB_OUTPUT"
test:
name: Run tests slice ${{ matrix.slice.index }}/${{ inputs.slice_count }}
needs: generate
runs-on: ubuntu-latest
timeout-minutes: 30
strategy:
fail-fast: false
matrix: ${{ fromJSON(needs.generate.outputs.matrix) }}
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install ripgrep (prebuilt binary)
run: |
set -euo pipefail
@@ -49,7 +65,7 @@ jobs:
rg --version
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
with:
# Persist uv's download/wheel cache (~/.cache/uv) across runs.
# Keyed on the dependency manifests, so the cache is reused until
@@ -78,33 +94,19 @@ jobs:
# re-download, keeping the persisted cache small and fast to restore.
run: uv cache prune --ci
- name: Run tests (slice ${{ matrix.slice }}/6)
# Per-file isolation via scripts/run_tests_parallel.py: discovers
# every test_*.py file under tests/ (excluding integration/ + e2e/),
# then runs `python -m pytest <file>` in a freshly-spawned subprocess
- name: Run tests (slice ${{ matrix.slice.index }}/${{ inputs.slice_count }})
# Per-file isolation via scripts/run_tests.sh: each test file runs
# in its own freshly-spawned `python -m pytest <file>` subprocess
# with bounded parallelism. No xdist, no shared workers, no
# module-level state leakage between files.
#
# Why per-file (not per-test): per-test spawn cost (~250ms × 17k
# tests = 70min CPU minimum) blew the wall-clock budget. Per-file
# spawn (~250ms × ~850 files = ~3.5min) fits while still giving
# every file a fresh interpreter — the only isolation boundary
# that matters in practice (cross-file leakage was the original
# flake source; intra-file is the test author's responsibility).
#
# Why drop xdist entirely: xdist's persistent workers accumulate
# state across files, which is exactly the leakage we wanted to
# fix. ThreadPoolExecutor + subprocess.run is ~60 lines and does
# the job with cleaner semantics.
#
# Matrix slicing (--slice I/N): files are distributed across 6
# jobs by cached duration (LPT algorithm) so each job gets
# roughly equal wall time. Without a cache, files default to 2s
# estimate and get split roughly evenly by count — still correct,
# just not perfectly balanced.
# File list is pre-computed by the generate job (--generate-slices)
# which runs LPT distribution once and passes the file list to each
# matrix job via --files. Previously each job re-discovered files and
# re-ran LPT independently — redundant N times.
run: |
source .venv/bin/activate
python scripts/run_tests_parallel.py --slice ${{ matrix.slice }}/6
scripts/run_tests.sh --files '${{ matrix.slice.files }}'
env:
# Ensure tests don't accidentally call real APIs
OPENROUTER_API_KEY: ""
@@ -114,7 +116,7 @@ jobs:
- name: Upload per-slice durations
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: test-durations-slice-${{ matrix.slice }}
name: test-durations-slice-${{ matrix.slice.index }}
path: test_durations.json
retention-days: 1
@@ -173,7 +175,7 @@ jobs:
rg --version
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
with:
# Persist uv's download/wheel cache (~/.cache/uv) across runs.
# Keyed on the dependency manifests, so the cache is reused until

View File

@@ -6,6 +6,7 @@ on:
jobs:
typecheck:
name: Check TypeScript
runs-on: ubuntu-latest
strategy:
matrix:
@@ -22,8 +23,7 @@ jobs:
# native builds. Skipping install scripts drops node-pty's node-gyp
# header fetch — the transient flake that killed this job pre-`tsc` — and
# is faster. retry covers the remaining registry blips.
-
uses: ./.github/actions/retry
- uses: ./.github/actions/retry
with:
command: npm ci --ignore-scripts
- run: npm run --prefix ${{ matrix.package }} typecheck
@@ -35,6 +35,7 @@ jobs:
# users build apps/desktop from source on install/update. Run the real
# `vite build` here so that class of break fails in CI instead.
desktop-build:
name: Build desktop app
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -44,8 +45,7 @@ jobs:
cache: npm
# Keep install scripts here: the production build may need node-pty's
# native binary. retry handles the transient install-time fetch flakes.
-
uses: ./.github/actions/retry
- uses: ./.github/actions/retry
with:
command: npm ci
- run: npm run --prefix apps/desktop build

View File

@@ -5,11 +5,11 @@ name: Publish to PyPI
on:
push:
tags:
- 'v20*' # CalVer tags: v2026.5.15, v2026.5.15.2, etc.
- "v20*" # CalVer tags: v2026.5.15, v2026.5.15.2, etc.
workflow_dispatch:
inputs:
confirm_tag:
description: 'Tag to publish (e.g. v2026.5.15). Must already exist.'
description: "Tag to publish (e.g. v2026.5.15). Must already exist."
required: true
type: string
@@ -27,7 +27,7 @@ jobs:
name: Build distribution 📦
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
# On workflow_dispatch, check out the confirmed tag.
@@ -43,17 +43,17 @@ jobs:
fi
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
with:
python-version: '3.13'
python-version: "3.13"
- name: Install uv
uses: astral-sh/setup-uv@d0cc045d04ccac9d8b7881df0226f9e82c39688e # v6
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
- name: Set up Node.js
uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4
uses: actions/setup-node@49933ea5288caeca8642d1e84afbd3f7d6820020 # v4
with:
node-version: '22'
node-version: "22"
- name: Build web dashboard
run: cd web && npm ci && npm run build
@@ -81,7 +81,7 @@ jobs:
run: uv build --sdist --wheel
- name: Upload distribution artifacts
uses: actions/upload-artifact@ea165f8d65b6e75b540449e92b4886f43607fa02 # v4
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7
with:
name: python-package-distributions
path: dist/
@@ -94,17 +94,17 @@ jobs:
name: pypi
url: https://pypi.org/p/hermes-agent
permissions:
id-token: write # OIDC trusted publishing
id-token: write # OIDC trusted publishing
steps:
- name: Download distribution artifacts
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
name: python-package-distributions
path: dist/
- name: Publish to PyPI
uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b # v1.14.0
uses: pypa/gh-action-pypi-publish@cef221092ed1bacb1cc03d23a2d87d1d172e277b # v1.14.0
with:
skip-existing: true
@@ -116,12 +116,12 @@ jobs:
needs: publish
runs-on: ubuntu-latest
permissions:
contents: write # attach assets to the existing release
id-token: write # sigstore signing
contents: write # attach assets to the existing release
id-token: write # sigstore signing
steps:
- name: Download distribution artifacts
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
uses: actions/download-artifact@d3f86a106a0bac45b974a628896c90dbdf5c8093 # v4
with:
name: python-package-distributions
path: dist/
@@ -145,7 +145,7 @@ jobs:
- name: Sign with Sigstore
if: env.skip_sign != 'true'
uses: sigstore/gh-action-sigstore-python@04cffa1d795717b140764e8b640de88853c92acc # v3.3.0
uses: sigstore/gh-action-sigstore-python@04cffa1d795717b140764e8b640de88853c92acc # v3.3.0
with:
inputs: >-
./dist/*.tar.gz

View File

@@ -4,7 +4,7 @@ name: uv.lock check
# that modify pyproject.toml without regenerating uv.lock (or vice versa)
# must not merge, because the Docker build's `uv sync --frozen` step will
# fail on a stale lockfile and we'd rather catch it here than in the
# docker-publish workflow on main.
# docker workflow on main.
#
# ─────────────────────────────────────────────────────────────────────────
# IMPORTANT: this check runs against the MERGED state, not just your branch
@@ -63,7 +63,7 @@ jobs:
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@fac544c07dec837d0ccb6301d7b5580bf5edae39 # 8.2.0
# `uv lock --check` re-resolves the project from pyproject.toml and
# compares the result to uv.lock, exiting non-zero if they disagree.
@@ -100,7 +100,7 @@ jobs:
This check is blocking because the Docker image build uses
`uv sync --frozen --extra all`, which rejects stale lockfiles
— catching it here avoids a ~15 min failed docker-publish run
— catching it here avoids a ~15 min failed docker run
on `main` post-merge.
EOF
echo "::error title=uv.lock out of sync::Run \`uv lock\` locally and commit the result. If on a PR, sync with main first."

6
.gitignore vendored
View File

@@ -137,3 +137,9 @@ RELEASE_v*.md
# Desktop demo-run scratch output (hermes writes demo/*.txt during recorded
# walkthroughs). Throwaway artifacts, never part of the app.
apps/desktop/demo/
# PR infographics are rendered locally and embedded in PR descriptions via the
# image-provider (fal.media) URL — they are NEVER committed to the repo. The
# PR body is the archive. See the hermes-agent-dev skill's
# pr-infographic-workflow reference (storage rule + lapse #8 / #COMMIT-1).
infographic/

View File

@@ -123,6 +123,17 @@ conservative at the waist.
without E2E proof, and plugins that touch core files.** Plugins live in their
own directory and work within the ABCs/hooks we provide; if a plugin needs
more, widen the generic plugin surface, don't special-case it in core.
- **Third-party products / other people's projects integrated into the core
tree.** Observability backends, vendor SaaS integrations, analytics dashboards,
and similar "someone else's product" plugins do NOT land under `plugins/` in
this repo. They place an ongoing maintenance burden on us to keep them working
against a fast-moving core, for a backend we don't own. Ship them as a
**standalone plugin repo** users install into `~/.hermes/plugins/` (or via a
pip entry point), and promote them in the Nous Research Discord
(`#plugins-skills-and-skins`). This is a coupling-and-maintenance decision, not
a quality bar — the plugin can be excellent and still be a close. PRs that add
such a directory to the tree are closed with a pointer to publish it as its own
repo.
### Before you call it a bug — verify the premise (and when NOT to close)
@@ -480,7 +491,7 @@ The dashboard embeds the real `hermes --tui` — **not** a rewrite. See `hermes
### Electron Desktop Chat App (`apps/desktop/`)
A **separate** chat surface from both the classic CLI and the dashboard's embedded TUI. It is an Electron + React + nanostore renderer (`@assistant-ui/react`) that talks to a `tui_gateway` backend over JSON-RPC (`requestGateway(method, params)`). It does NOT embed `hermes --tui` — it has its own composer, transcript, and slash-command pipeline. Route desktop bugs to the `hermes-desktop-app-work` skill, not `hermes-dashboard-work`.
A **separate** chat surface from both the classic CLI and the dashboard's embedded TUI. It is an Electron + React + nanostore renderer (`@assistant-ui/react`) that talks to a `tui_gateway` backend over JSON-RPC (`requestGateway(method, params)`). The WebSocket/JSON-RPC transport lives in the framework-agnostic `apps/shared` package (`@hermes/shared``JsonRpcGatewayClient` + WS URL helpers), which the web dashboard (`web/`) also consumes; **desktop has no build/runtime dependency on the dashboard frontend** — it spawns a headless `hermes serve` backend server (the same gateway `dashboard` serves, minus the browser UI). `dashboard` and `serve` share `cmd_dashboard`/`start_server` but are independent surfaces — neither launches the other. The one exception is a backward-compat *fallback*: `serve` is newer, so the desktop spawn (`electron/backend-command.cjs` + `backendSupportsServe()` in `main.cjs`) detects whether the resolved runtime registers `serve` and, only when it does not (an older managed install / PATH `hermes` the app hasn't updated yet), rewrites the argv to the legacy `dashboard --no-open`. Without that, a new app against an un-upgraded runtime would crash on an unknown subcommand and brick every mid-upgrade user. It does NOT embed `hermes --tui` — it has its own composer, transcript, and slash-command pipeline. Route desktop bugs to the `hermes-desktop-app-work` skill, not `hermes-dashboard-work`.
**Slash commands in the desktop app are curated client-side, then dispatched to the backend.** The pipeline:
@@ -783,6 +794,24 @@ landing in this tree. PRs that add a new directory under
provider as its own repo. Existing in-tree providers stay; bug fixes
to them are welcome.
**No new third-party-product plugins in-tree (policy, June 2026):** the
same rule applies beyond memory providers. Plugins that integrate
someone else's product or project — observability/metrics backends,
vendor SaaS connectors, analytics dashboards, paid-service tie-ins —
must ship as **standalone plugin repos** that users install into
`~/.hermes/plugins/` (or via pip entry points). They register through
the existing plugin discovery path and use the ABCs/hooks/ctx surface
we expose; nothing special is needed in core. The reason is
maintenance load: every product we absorb into the tree becomes our
burden to keep working against a fast-moving core, for a backend we
don't own. Promote standalone plugins in the Nous Research Discord
(`#plugins-skills-and-skins`). PRs that add such a directory under
`plugins/` are closed with a pointer to publish it as its own repo —
this is a coupling decision, not a quality judgment. (The
`observability/`, `kanban/`, `disk-cleanup/`, etc. directories already
in the tree are existing precedent, not an invitation to add more
third-party-product plugins alongside them.)
### Model-provider plugins (`plugins/model-providers/<name>/`)
Every inference backend (openrouter, anthropic, gmi, deepseek, nvidia, …)
@@ -1260,65 +1289,22 @@ scripts/run_tests.sh # full suite, CI-parity
scripts/run_tests.sh tests/gateway/ # one directory
scripts/run_tests.sh tests/agent/test_foo.py::test_x # one test
scripts/run_tests.sh -v --tb=long # pass-through pytest flags
scripts/run_tests.sh --no-isolate tests/foo/ # disable subprocess isolation (faster, for debugging)
```
### Subprocess-per-test isolation
### Subprocess-per-test-file isolation
Every test runs in a freshly-spawned Python subprocess via the in-tree plugin
at `tests/_isolate_plugin.py`. This means module-level dicts/sets and
ContextVars from one test cannot leak into the next — the historic
`_reset_module_state` autouse fixture is gone.
Every test file runs in a freshly-spawned Python subprocess via `run_tests_parallel.py`. This means module-level dicts/sets and
ContextVars from one test file cannot leak into the next.
Implementation notes:
### Why the wrapper
- The plugin uses `multiprocessing.get_context("spawn")`, which works on
Linux, macOS, and Windows alike (POSIX `fork` is not used).
- Per-test overhead is ~0.51.0s (Python startup + pytest collection). xdist
parallelism amortizes this across cores; on a 20-core box the full suite
finishes in roughly the same wall time as before, but flake-free.
- `isolate_timeout` (configured in `pyproject.toml`) caps each test at 30s.
Hangs are killed and surfaced as a failure report.
- Pass `--no-isolate` to disable isolation — useful when debugging a single
test interactively, or when you specifically want to verify state leakage.
- The plugin disables itself in child processes (sentinel envvar
`HERMES_ISOLATE_CHILD=1`), so there's no fork-bomb risk.
| | Without wrapper | With wrapper |
| ------------------- | ------------------------------------------- | ----------------------------------------- |
| Provider API keys | Whatever is in your env (auto-detects pool) | All env vars except a specific few unset. |
| HOME / `~/.hermes/` | Your real config+auth.json | Temp dir per test |
| Timezone | Local TZ (PDT etc.) | UTC |
| Locale | Whatever is set | C.UTF-8 |
### Why the wrapper (and why the old "just call pytest" doesn't work)
Five real sources of local-vs-CI drift the script closes:
| | Without wrapper | With wrapper |
|---|---|---|
| Provider API keys | Whatever is in your env (auto-detects pool) | All `*_API_KEY`/`*_TOKEN`/etc. unset |
| HOME / `~/.hermes/` | Your real config+auth.json | Temp dir per test |
| Timezone | Local TZ (PDT etc.) | UTC |
| Locale | Whatever is set | C.UTF-8 |
| xdist workers | `-n auto` = all cores | `-n auto` (safe — subprocess isolation prevents cross-worker flakes) |
`tests/conftest.py` also enforces points 1-4 as an autouse fixture so ANY pytest
invocation (including IDE integrations) gets hermetic behavior — but the wrapper
is belt-and-suspenders.
### Running without the wrapper (only if you must)
If you can't use the wrapper (e.g. inside an IDE that shells pytest directly),
at minimum activate the venv. The isolation plugin loads automatically from
`addopts` in `pyproject.toml`, so you get the same per-test process isolation
either way.
```bash
source .venv/bin/activate # or: source venv/bin/activate
python -m pytest tests/ -q
```
If you need to bypass isolation for fast feedback while debugging:
```bash
python -m pytest tests/agent/test_foo.py -q --no-isolate
```
Always run the full suite before pushing changes.
### Don't write change-detector tests

View File

@@ -85,6 +85,23 @@ This isn't a quality bar — it's a coupling-and-maintenance decision. Memory pr
---
## Third-Party Product Integrations: Ship as a Standalone Plugin
The same rule extends to **any plugin that integrates someone else's product or project** — observability/metrics backends, vendor SaaS connectors, analytics dashboards, paid-service tie-ins, and similar third-party integrations. **These do not land in this repo.**
The reason is maintenance load, not quality. Every external product absorbed into the core tree becomes ours to keep working against a fast-moving codebase, for a backend we don't own and can't control. Hermes ships a lot and the core moves quickly; coupling third-party products into it creates an open-ended burden on the maintainers.
Publish these as a **standalone plugin repo** instead:
- Implement the relevant ABC and use the existing plugin discovery path (`~/.hermes/plugins/`, project `.hermes/plugins/`, or a pip entry point) — see [Build a Hermes Plugin](https://hermes-agent.nousresearch.com/docs/guides/build-a-hermes-plugin)
- Register lifecycle hooks (`pre_tool_call`, `post_tool_call`, `pre_llm_call`, `post_llm_call`, `on_session_start`, `on_session_end`), tools (`ctx.register_tool`), and CLI subcommands (`ctx.register_cli_command`) through the surface we already expose — no core changes needed
- If your plugin needs a capability the framework doesn't expose, that's a feature request to **widen the generic plugin surface** (a new hook or `ctx` method) — never special-case your plugin in core
- Promote it in the [Nous Research Discord](https://discord.gg/NousResearch) `#plugins-skills-and-skins` channel so users can find and install it
A well-built third-party-product plugin can clear automated review and still be closed for this reason — it's a placement decision, not a verdict on the code. PRs that add such a directory under `plugins/` will be closed with a pointer to publish it as its own repo.
---
## Development Setup
### Prerequisites
@@ -132,13 +149,20 @@ this way, make sure you run the `hermes` entrypoint from this venv; running the
system `python3 -m hermes_cli.main` can pick up unrelated system Python
packages.
Create the venv **outside** the cloned source tree. A venv that lives inside
the directory the agent operates from can be wiped by a relative-path command
the agent runs against its own checkout (`rm -rf venv`, `uv venv venv`, etc.),
which silently destroys the running runtime mid-session. Keeping it outside the
tree means no relative path from the workspace resolves to it.
```bash
git clone https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
# Create venv with Python 3.11
uv venv venv --python 3.11
export VIRTUAL_ENV="$(pwd)/venv"
# Create venv with Python 3.11, OUTSIDE the source tree
uv venv ~/.hermes/venvs/hermes-dev --python 3.11
export VIRTUAL_ENV="$HOME/.hermes/venvs/hermes-dev"
export PATH="$VIRTUAL_ENV/bin:$PATH"
# Install with all extras (messaging, cron, CLI menus, dev tools)
uv pip install -e ".[all,dev]"

View File

@@ -119,6 +119,9 @@ COPY package.json package-lock.json ./
COPY web/package.json web/
COPY ui-tui/package.json ui-tui/
COPY ui-tui/packages/hermes-ink/ ui-tui/packages/hermes-ink/
# apps/shared/ is copied IN FULL because web/package.json references it as a
# `file:` workspace dependency (same pattern as hermes-ink above).
COPY apps/shared/ apps/shared/
# `npm_config_install_links=false` forces npm to install `file:` deps as
# symlinks instead of copies. This is the default since npm 10+, which is
@@ -184,12 +187,19 @@ RUN uv sync --frozen --no-install-project --extra all --extra messaging --extra
# invalidate the (relatively slow) web + ui-tui build layer.
COPY web/ web/
COPY ui-tui/ ui-tui/
COPY apps/shared/ apps/shared/
RUN cd web && npm run build && \
cd ../ui-tui && npm run build
# ---------- Source code ----------
# .dockerignore excludes node_modules, so the installs above survive.
COPY . .
# --link decouples this layer from parents for cache purposes; --chmod bakes
# the final read-only permissions at copy time so we skip the separate
# `chmod -R` pass that previously walked ~30k files across the venv +
# node_modules + source (21s amd64 / 222s arm64 — #49113). `a+rX,go-w`
# gives the non-root hermes user read + traverse but no write; root retains
# write so the build steps below don't need chmod u+w dances.
COPY --link --chmod=a+rX,go-w . .
# ---------- Permissions ----------
# Link hermes-agent itself (editable). Deps are already installed in the
@@ -197,19 +207,15 @@ COPY . .
# resolution or downloads.
RUN uv pip install --no-cache-dir --no-deps -e "."
# Keep /opt/hermes immutable for the runtime hermes user. Hosted/container
# instances must not be able to self-edit the installed source or venv; user
# data, skills, plugins, config, logs, and dashboard uploads live under
# /opt/data instead. Root can still repair the image during build/boot, but
# supervised Hermes processes drop to the non-root hermes user.
# Wire the exec shim and install-method stamp. Files under /opt/hermes are
# already root-owned (COPY, uv sync, npm install all run as root) and
# read-only for the hermes user (go-w from the --chmod above).
USER root
RUN mkdir -p /opt/hermes/bin && \
cp /opt/hermes/docker/hermes-exec-shim.sh /opt/hermes/bin/hermes && \
chmod 0755 /opt/hermes/bin/hermes && \
printf 'docker\n' > /opt/hermes/.install_method && \
chown -R root:root /opt/hermes && \
chmod -R a+rX /opt/hermes && \
chmod -R a-w /opt/hermes
printf 'docker\n' > /opt/hermes/.install_method
# The ``.install_method`` stamp is baked next to the running code (the install
# tree), NOT into $HERMES_HOME. $HERMES_HOME (/opt/data) is a shared data
# volume that is commonly bind-mounted from the host and even shared with a
@@ -236,13 +242,11 @@ RUN mkdir -p /opt/hermes/bin && \
#
# The arg is optional — local `docker build` without --build-arg simply
# omits the file, and the runtime falls back to live-git lookup. CI
# (.github/workflows/docker-publish.yml) passes ${{ github.sha }} so
# (.github/workflows/docker.yml) passes ${{ github.sha }} so
# every published image has it.
ARG HERMES_GIT_SHA=
RUN if [ -n "${HERMES_GIT_SHA}" ]; then \
chmod u+w /opt/hermes && \
printf '%s\n' "${HERMES_GIT_SHA}" > /opt/hermes/.hermes_build_sha && \
chmod a-w /opt/hermes /opt/hermes/.hermes_build_sha; \
printf '%s\n' "${HERMES_GIT_SHA}" > /opt/hermes/.hermes_build_sha; \
fi
# ---------- s6-overlay service wiring ----------
@@ -290,6 +294,19 @@ ENV HERMES_TUI_DIR=/opt/hermes/ui-tui
ENV HERMES_HOME=/opt/data
ENV HERMES_WRITE_SAFE_ROOT=/opt/data
ENV HERMES_DISABLE_LAZY_INSTALLS=1
# The published image seals /opt/hermes (root-owned, read-only) so a runtime
# lazy install can't mutate the agent's own venv and brick it. But opt-in
# backends (Firecrawl web search, Exa, Feishu, …) keep their SDKs in
# tools/lazy_deps.py — deliberately NOT baked into [all] (see pyproject.toml
# policy 2026-05-12: one quarantined release must not break every install).
# Redirect those lazy installs to a writable dir on the durable data volume.
# lazy_deps appends this dir to the END of sys.path, so a package installed
# here can only ADD modules — it can never shadow or downgrade a core module,
# so the sealed-venv guarantee holds even with installs re-enabled. The dir
# is seeded + chowned to the hermes user by docker/stage2-hook.sh and lives
# on the /opt/data volume, so it persists across container recreates / image
# updates (an ABI stamp invalidates it if a rebuild bumps the interpreter).
ENV HERMES_LAZY_INSTALL_TARGET=/opt/data/lazy-packages
# `docker exec` privilege-drop shim. When operators run
# `docker exec <c> hermes ...` they default to root, and any file the

View File

@@ -18,7 +18,7 @@
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (200+ models), [NovitaAI](https://novita.ai) (AI-native cloud for Model API, Agent Sandbox, and GPU Cloud), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, or your own endpoint. Switch with `hermes model` — no code changes, no lock-in.
Use any model you want — [Nous Portal](https://portal.nousresearch.com), OpenRouter, OpenAI, your own endpoint, and [many others](https://hermes-agent.nousresearch.com/docs/integrations/providers). Switch with `hermes model` — no code changes, no lock-in.
<table>
<tr><td><b>A real terminal interface</b></td><td>Full TUI with multiline editing, slash-command autocomplete, conversation history, interrupt-and-redirect, and streaming tool output.</td></tr>
@@ -232,10 +232,14 @@ scripts/run_tests.sh
Manual clone fallback (for throwaway clones/CI where you intentionally do not
want the managed install layout):
Create the venv outside the cloned source tree — a venv inside the directory
the agent operates from can be wiped by a relative-path command the agent runs
against its own checkout, destroying the running runtime mid-session.
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv .venv --python 3.11
source .venv/bin/activate
uv venv ~/.hermes/venvs/hermes-dev --python 3.11
source ~/.hermes/venvs/hermes-dev/bin/activate
uv pip install -e ".[all,dev]"
scripts/run_tests.sh
```

View File

@@ -23,6 +23,11 @@ except ModuleNotFoundError:
# new code but ``uv pip install -e .`` didn't finish. Missing bootstrap
# means UTF-8 stdio setup is skipped on Windows; POSIX is unaffected.
pass
else:
# Stop a ``utils/``/``proxy/``/``ui/`` package in the launch directory from
# shadowing Hermes's own modules — ``hermes acp`` can be started from any
# cwd, including a project that has same-named packages on its path.
hermes_bootstrap.harden_import_path()
import argparse
import asyncio

View File

@@ -74,7 +74,7 @@ _POLISHED_TOOLS = {
"kanban_create", "kanban_show", "kanban_comment", "kanban_complete",
"kanban_block", "kanban_link", "kanban_heartbeat",
"yb_query_group_info", "yb_query_group_members", "yb_search_sticker",
"yb_send_dm", "yb_send_sticker", "mixture_of_agents",
"yb_send_dm", "yb_send_sticker",
}

View File

@@ -106,7 +106,12 @@ def _custom_provider_extra_body_for_agent(
base_url: str,
custom_providers: List[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
if (provider or "").strip().lower() != "custom":
provider_norm = (provider or "").strip().lower()
if provider_norm == "custom":
provider_key_filter = ""
elif provider_norm.startswith("custom:"):
provider_key_filter = provider_norm.split(":", 1)[1].strip()
else:
return None
target_url = _normalized_custom_base_url(base_url)
@@ -117,6 +122,13 @@ def _custom_provider_extra_body_for_agent(
for entry in custom_providers or []:
if not isinstance(entry, dict):
continue
if provider_key_filter:
entry_keys = {
str(entry.get("provider_key", "") or "").strip().lower(),
str(entry.get("name", "") or "").strip().lower(),
}
if provider_key_filter not in entry_keys:
continue
if _normalized_custom_base_url(entry.get("base_url")) != target_url:
continue
extra_body = entry.get("extra_body")
@@ -707,6 +719,55 @@ def init_agent(
print("🔑 Using credentials: Microsoft Entra ID")
elif isinstance(effective_key, str) and len(effective_key) > 12:
print(f"🔑 Using token: {effective_key[:8]}...{effective_key[-4:]}")
elif agent.provider == "moa":
from agent.moa_loop import MoAClient
agent.api_mode = "chat_completions"
# Route reference-model outputs to the agent's tool_progress_callback so
# every surface that already consumes it (CLI spinner/scrollback, TUI,
# desktop, gateway) can show each reference's answer as a labelled block
# before the aggregator acts. The facade emits "moa.reference" and
# "moa.aggregating" events; we forward them through the same callback
# the tool lifecycle uses. Best-effort and cache-safe — these are
# display-only events, they never touch the message history.
def _moa_reference_relay(event: str, **kwargs: Any) -> None:
cb = getattr(agent, "tool_progress_callback", None)
if cb is None:
return
try:
if event == "moa.reference":
label = str(kwargs.get("label") or "")
text = str(kwargs.get("text") or "")
idx = kwargs.get("index")
count = kwargs.get("count")
cb(
"moa.reference",
label,
text,
None,
moa_index=idx,
moa_count=count,
)
elif event == "moa.aggregating":
cb(
"moa.aggregating",
str(kwargs.get("aggregator") or ""),
None,
None,
moa_ref_count=kwargs.get("ref_count"),
)
except Exception:
pass
agent.client = MoAClient(
agent.model or "default",
reference_callback=_moa_reference_relay,
)
agent._client_kwargs = {}
agent.api_key = api_key or "moa-virtual-provider"
agent.base_url = "moa://local"
if not agent.quiet_mode:
print(f"🤖 AI Agent initialized with MoA preset: {agent.model}")
elif agent.api_mode == "bedrock_converse":
# AWS Bedrock — uses boto3 directly, no OpenAI client needed.
# Region is extracted from the base_url or defaults to us-east-1.
@@ -1246,6 +1307,12 @@ def init_agent(
_agent_section = {}
agent._tool_use_enforcement = _agent_section.get("tool_use_enforcement", "auto")
# Intent-ack continuation config: "auto" (default — codex_responses only,
# the historical gate), true (all api_modes), false (never), or a list of
# model-name substrings. Resolved against the active api_mode/model in the
# conversation loop's intent-ack block.
agent._intent_ack_continuation = _agent_section.get("intent_ack_continuation", "auto")
# Universal task-completion guidance toggle. Default True. Surfaced
# as a separate flag from tool_use_enforcement because the guidance
# applies to ALL models, not just the model families enforcement
@@ -1506,6 +1573,7 @@ def init_agent(
# 3. Check general plugin system (user-installed plugins)
# 4. Fall back to built-in ContextCompressor
_selected_engine = None
_copy_failed = False
_engine_name = "compressor" # default
try:
_ctx_cfg = _agent_cfg.get("context", {}) if isinstance(_agent_cfg, dict) else {}
@@ -1523,15 +1591,35 @@ def init_agent(
# Try general plugin system as fallback
if _selected_engine is None:
_candidate = None
try:
from hermes_cli.plugins import get_plugin_context_engine
_candidate = get_plugin_context_engine()
if _candidate and _candidate.name == _engine_name:
_selected_engine = _candidate
except Exception:
pass
_candidate = None
if _candidate is not None and _candidate.name == _engine_name:
# Deep-copy the shared plugin singleton so a child agent's
# update_model() can't mutate the parent's compressor (#42449).
# Copy can fail for engines holding uncopyable state (locks, DB
# connections, clients); in that case fall back to the built-in
# compressor with an ACCURATE message rather than silently
# mislabelling it "not found".
import copy
try:
_selected_engine = copy.deepcopy(_candidate)
except Exception as _copy_err:
_copy_failed = True
_ra().logger.warning(
"Context engine '%s' could not be safely copied for this "
"agent (%s) — falling back to built-in compressor. Plugin "
"engines that hold uncopyable state (locks, DB connections) "
"should implement __deepcopy__ to copy only mutable budget "
"state.",
_engine_name, _copy_err,
)
_selected_engine = None
if _selected_engine is None:
if _selected_engine is None and not _copy_failed:
_ra().logger.warning(
"Context engine '%s' not found — falling back to built-in compressor",
_engine_name,
@@ -1588,8 +1676,10 @@ def init_agent(
f"Model {agent.model} has a context window of {_ctx:,} tokens, "
f"which is below the minimum {MINIMUM_CONTEXT_LENGTH:,} required "
f"by Hermes Agent. Choose a model with at least "
f"{MINIMUM_CONTEXT_LENGTH // 1000}K context, or set "
f"model.context_length in config.yaml to override."
f"{MINIMUM_CONTEXT_LENGTH // 1000}K context. If your server "
f"reports a window smaller than the model's true window, set "
f"model.context_length in config.yaml to the real value "
f"(this must be at least {MINIMUM_CONTEXT_LENGTH // 1000}K)."
)
# Inject context engine tool schemas (e.g. lcm_grep, lcm_describe, lcm_expand).
@@ -1621,16 +1711,27 @@ def init_agent(
for t in agent.tools
if isinstance(t, dict)
}
for _schema in agent.context_compressor.get_tool_schemas():
_tname = _schema.get("name", "")
if _tname and _tname in _existing_tool_names:
from agent.memory_manager import normalize_tool_schema as _normalize_tool_schema
for _raw_schema in agent.context_compressor.get_tool_schemas():
_schema = _normalize_tool_schema(_raw_schema)
if _schema is None:
# A schema with no resolvable name (e.g. an already-wrapped
# entry) would append a nameless tool that strict providers
# 400 on, disabling the whole toolset (#47707). Skip it.
_ra().logger.warning(
"Context engine returned a tool schema with no resolvable "
"name; skipping to avoid poisoning the request (%r)",
_raw_schema,
)
continue
_tname = _schema["name"]
if _tname in _existing_tool_names:
continue # already registered via plugin/cache path
_wrapped = {"type": "function", "function": _schema}
agent.tools.append(_wrapped)
if _tname:
agent.valid_tool_names.add(_tname)
agent._context_engine_tool_names.add(_tname)
_existing_tool_names.add(_tname)
agent.valid_tool_names.add(_tname)
agent._context_engine_tool_names.add(_tname)
_existing_tool_names.add(_tname)
# Notify context engine of session start
if hasattr(agent, "context_compressor") and agent.context_compressor:

View File

@@ -42,6 +42,14 @@ from utils import base_url_host_matches, base_url_hostname, env_var_enabled, ato
logger = logging.getLogger(__name__)
# Max consecutive successful credential-pool token refreshes of the SAME entry
# on a persistent auth failure before we give up and let the fallback chain
# activate. A single-entry OAuth pool can re-mint a fresh token indefinitely
# even when the upstream keeps rejecting it, so without this cap the retry loop
# spins forever and never reaches ``_try_activate_fallback``. See #26080.
_MAX_AUTH_REFRESH_ATTEMPTS = 2
def _ra():
"""Lazy ``run_agent`` reference for test-patch routing."""
import run_agent
@@ -775,6 +783,30 @@ def recover_with_credential_pool(
return False, has_retried_429
refreshed = pool.try_refresh_current()
if refreshed is not None:
# ``try_refresh_current()`` re-mints a fresh OAuth token and reports
# success even when the upstream keeps rejecting it — a single-entry
# pool (common for OAuth/Max subscribers) has nothing to rotate to,
# so a bare "refreshed → retry" loop spins forever on the same dead
# token and the configured fallback never activates. Cap consecutive
# same-entry refreshes and fall through to fallback once exceeded.
# See #26080.
refreshed_id = getattr(refreshed, "id", None)
if refreshed_id is not None:
refresh_counts = getattr(agent, "_auth_pool_refresh_counts", None)
if refresh_counts is None:
refresh_counts = {}
agent._auth_pool_refresh_counts = refresh_counts
refresh_key = (agent.provider, refreshed_id)
refresh_counts[refresh_key] = refresh_counts.get(refresh_key, 0) + 1
if refresh_counts[refresh_key] > _MAX_AUTH_REFRESH_ATTEMPTS:
_ra().logger.warning(
"Credential auth failure persists after %s refreshes for "
"pool entry %s — treating as unrecoverable and allowing "
"fallback to activate.",
refresh_counts[refresh_key] - 1,
refreshed_id,
)
return False, has_retried_429
_ra().logger.info(f"Credential auth failure — refreshed pool entry {getattr(refreshed, 'id', '?')}")
agent._swap_credential(refreshed)
return True, has_retried_429
@@ -1046,6 +1078,34 @@ def restore_primary_runtime(agent) -> bool:
api_mode=rt.get("compressor_api_mode", ""),
)
# ── Re-select from the credential pool if one is available ──
# The snapshot's api_key was captured at construction time. Across
# turns the pool may have rotated (token revocation, billing/rate-limit
# exhaustion, cooldown), leaving the snapshot key stale. Restoring it
# blindly re-fails on the first request and burns through the remaining
# pool entries before cross-provider fallback even gets a chance. Ask
# the pool for its current best entry and swap the live credential in.
# When the pool is absent, empty, or the entry has no usable key, we
# keep the snapshot key (the existing behavior). Fixes #25205.
pool = getattr(agent, "_credential_pool", None)
if pool is not None and pool.has_available():
entry = pool.select()
if entry is not None:
entry_key = (
getattr(entry, "runtime_api_key", None)
or getattr(entry, "access_token", "")
)
if entry_key:
# ``_swap_credential`` rebuilds the OpenAI/Anthropic client,
# reapplies base-url-scoped headers, and carries the
# accumulated base_url / OAuth-detection fixes (#33163).
agent._swap_credential(entry)
logger.info(
"Restore re-selected pool entry %s (%s)",
getattr(entry, "id", "?"),
getattr(entry, "label", "?"),
)
# ── Reset fallback chain for the new turn ──
agent._fallback_activated = False
agent._fallback_index = 0
@@ -1221,7 +1281,11 @@ def dump_api_request_debug(
dump_payload["error"] = error_info
timestamp = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
dump_file = agent.logs_dir / f"request_dump_{agent.session_id}_{timestamp}.json"
# Sanitize the session ID into a traversal-free path segment — it can
# originate from untrusted input (X-Hermes-Session-Id header), and an
# unsanitized "../"-shaped ID would write the dump outside logs_dir.
safe_sid = _ra()._safe_session_filename_component(agent.session_id)
dump_file = agent.logs_dir / f"request_dump_{safe_sid}_{timestamp}.json"
# Redact secrets before persisting/printing. This dump captures the
# full request body (system prompt, tool defs, context-embedded
@@ -1420,6 +1484,15 @@ def create_openai_client(agent, client_kwargs: dict, *, reason: str, shared: boo
keepalive_http = agent._build_keepalive_http_client(client_kwargs.get("base_url", ""))
if keepalive_http is not None:
client_kwargs["http_client"] = keepalive_http
# Delegate all rate-limit / 5xx retry to hermes's outer conversation loop,
# which honors Retry-After and applies adaptive/jittered backoff. The OpenAI
# SDK default (max_retries=2) uses its own 1-2s backoff that ignores
# Retry-After and double-retries inside our loop — the same deadlock the
# Anthropic clients hit (#26293). This is the single chokepoint every primary
# OpenAI/aggregator client passes through (init, switch_model, recovery,
# restore, request-scoped); auxiliary_client builds its own clients and keeps
# SDK retries because it is NOT wrapped by the conversation loop.
client_kwargs.setdefault("max_retries", 0)
# Uses the module-level `OpenAI` name, resolved lazily on first
# access via __getattr__ below. Tests patch via `run_agent.OpenAI`.
client = _ra().OpenAI(**client_kwargs)
@@ -1499,6 +1572,10 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
# _client_kwargs is a dict — snapshot a shallow copy so mutating the
# live dict doesn't poison the rollback target.
_snapshot["_client_kwargs"] = dict(getattr(agent, "_client_kwargs", {}) or {})
# Snapshot the credential pool reference so a failed client rebuild can
# restore the original pool (issue #52727: pool reload is part of this
# switch and must be reversible on rollback).
_snapshot["_credential_pool"] = getattr(agent, "_credential_pool", _MISSING)
try:
# Clear the per-config context_length override so the new model's
@@ -1523,8 +1600,36 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
if api_key:
agent.api_key = api_key
# ── Reload credential pool for the new provider (issue #52727) ──
# Without this, ``recover_with_credential_pool`` sees a
# ``pool.provider != agent.provider`` mismatch and short-circuits,
# leaving the new provider with no rotation/recovery on 401/429 and
# burning the original pool's entries. Only reload when the provider
# actually changed (or the pool was missing) — re-selecting the same
# provider must not churn the pool reference. A reload failure is
# logged + swallowed: the switch itself must still complete.
old_norm = (old_provider or "").strip().lower()
new_norm = (new_provider or "").strip().lower()
if old_norm != new_norm or getattr(agent, "_credential_pool", None) is None:
try:
from agent.credential_pool import load_pool
agent._credential_pool = load_pool(new_provider)
except Exception as _pool_exc: # noqa: BLE001
logger.warning(
"switch_model: credential pool reload failed for %s (%s); "
"continuing without pool rotation this turn",
new_provider, _pool_exc,
)
# ── Build new client ──
if api_mode == "anthropic_messages":
if (new_provider or "").strip().lower() == "moa":
from agent.moa_loop import MoAClient
agent.api_key = api_key or "moa-virtual-provider"
agent.base_url = "moa://local"
agent._client_kwargs = {}
agent.client = MoAClient(agent.model or "default")
elif api_mode == "anthropic_messages":
from agent.anthropic_adapter import (
build_anthropic_client,
resolve_anthropic_token,
@@ -1697,6 +1802,27 @@ def switch_model(agent, new_model, new_provider, api_key='', base_url='', api_mo
old_model, old_provider, new_model, new_provider,
)
# ── Persist billing route to session DB ──
# The agent's _session_db / session_id may not be set in all contexts
# (tests, bare agents without a session DB, etc.). This ensures the
# dashboard Model cards show the actual provider after a mid-session
# /model switch instead of the stale session-creation provider.
# See #48248 for the full bug description.
_session_db = getattr(agent, "_session_db", None)
_session_id = getattr(agent, "session_id", None)
if _session_db is not None and _session_id:
try:
_session_db.update_session_billing_route(
_session_id,
provider=agent.provider,
base_url=agent.base_url,
billing_mode=getattr(agent, "api_mode", None),
)
except Exception:
logger.warning(
"Failed to persist billing route after model switch",
exc_info=True,
)
def invoke_tool(agent, function_name: str, function_args: dict, effective_task_id: str,
@@ -2083,8 +2209,21 @@ def looks_like_codex_intermediate_ack(
user_message: str,
assistant_content: str,
messages: List[Dict[str, Any]],
require_workspace: bool = True,
) -> bool:
"""Detect a planning/ack message that should continue instead of ending the turn."""
"""Detect a planning/ack message that should continue instead of ending the turn.
``require_workspace`` (default True) keeps the original codex-coding scope:
the ack must reference a filesystem/repo workspace. The conversation loop
passes ``require_workspace=False`` when the user has explicitly opted into
intent-ack continuation for all api_modes (``agent.intent_ack_continuation``
is ``true`` or a model-list), so general autonomous workflows ("I'll run a
health check on the server", "I'll start the deployment") — which carry a
future-ack and an action verb but no filesystem reference — are caught too.
The future-ack + short-content + no-prior-tools + action-verb requirements
always apply, which is what keeps conversational "I'll help you brainstorm"
replies from tripping it.
"""
if any(isinstance(msg, dict) and msg.get("role") == "tool" for msg in messages):
return False
@@ -2137,17 +2276,67 @@ def looks_like_codex_intermediate_ack(
"path",
)
assistant_mentions_action = any(marker in assistant_text for marker in action_markers)
if not assistant_mentions_action:
return False
# Opted-in (all-api_mode) path: a future-ack + action verb + no prior tool
# call is enough — the user asked us to keep going when the model only
# announces intent, regardless of whether a filesystem is involved.
if not require_workspace:
return True
user_text = (user_message or "").strip().lower()
user_targets_workspace = (
any(marker in user_text for marker in workspace_markers)
or "~/" in user_text
or "/" in user_text
)
assistant_mentions_action = any(marker in assistant_text for marker in action_markers)
assistant_targets_workspace = any(
marker in assistant_text for marker in workspace_markers
)
return (user_targets_workspace or assistant_targets_workspace) and assistant_mentions_action
return user_targets_workspace or assistant_targets_workspace
def intent_ack_continuation_mode(agent) -> str:
"""Classify the resolved intent-ack continuation mode for this turn.
Returns one of:
* ``"off"`` — never continue.
* ``"codex_only"`` — historical scope: continue only on the
``codex_responses`` api_mode, and only for codebase/workspace acks
(``require_workspace=True``).
* ``"all"`` — user opted in for every api_mode; continue on any
future-ack + action verb (``require_workspace=False``).
Mirrors the four-mode shape of ``agent.tool_use_enforcement``: ``"auto"``
(default) → codex_only; ``True``/"true"/"always"/"yes"/"on" → all;
``False``/"false"/"never"/"no"/"off" → off; ``list`` → all when a substring
matches the active model name, else off.
"""
mode = getattr(agent, "_intent_ack_continuation", "auto")
if mode is True or (isinstance(mode, str) and mode.lower() in {"true", "always", "yes", "on"}):
return "all"
if mode is False or (isinstance(mode, str) and mode.lower() in {"false", "never", "no", "off"}):
return "off"
if isinstance(mode, list):
model_lower = (agent.model or "").lower()
return "all" if any(p.lower() in model_lower for p in mode if isinstance(p, str)) else "off"
# "auto" or any unrecognised value — historical codex-only behavior.
return "codex_only" if agent.api_mode == "codex_responses" else "off"
def intent_ack_continuation_enabled(agent) -> bool:
"""Whether intent-ack continuation should fire at all for this turn.
The ``codex_ack_continuations < 2`` per-turn cap and the
``looks_like_codex_intermediate_ack`` detector are applied by the caller;
this only decides the on/off gate. Callers that also need to know whether
the workspace requirement applies should use ``intent_ack_continuation_mode``
directly (``"codex_only"`` ⇒ require_workspace=True, ``"all"`` ⇒ False).
"""
return intent_ack_continuation_mode(agent) != "off"

View File

@@ -673,6 +673,9 @@ def _build_anthropic_client_with_bearer_hook(
kwargs = {
"timeout": timeout_obj,
"http_client": http_client,
# Delegate retry to hermes's outer loop (honors Retry-After); the SDK
# default max_retries=2 ignores it and double-retries. (#26293)
"max_retries": 0,
# The SDK requires *something* for api_key/auth_token. Our
# event hook overrides Authorization per request so this value
# is never sent. The sentinel string makes accidental leaks
@@ -757,6 +760,12 @@ def build_anthropic_client(
_read_timeout = timeout if (isinstance(timeout, (int, float)) and timeout > 0) else 900.0
kwargs = {
"timeout": Timeout(timeout=float(_read_timeout), connect=10.0),
# Delegate all rate-limit / 5xx retry to hermes's outer conversation
# loop, which honors Retry-After. The SDK default (max_retries=2) uses
# its own 1-2s backoff that ignores Retry-After and double-retries
# inside our loop — burning request slots against a bucket that won't
# refill for minutes. (#26293)
"max_retries": 0,
}
if normalized_base_url:
# Azure Anthropic endpoints require an ``api-version`` query parameter.
@@ -852,6 +861,9 @@ def build_anthropic_bedrock_client(region: str):
return _anthropic_sdk.AnthropicBedrock(
aws_region=region,
timeout=Timeout(timeout=900.0, connect=10.0),
# Delegate retry to hermes's outer loop (honors Retry-After); the SDK
# default max_retries=2 ignores it and double-retries. (#26293)
max_retries=0,
default_headers={"anthropic-beta": ",".join([*_COMMON_BETAS, _CONTEXT_1M_BETA])},
)
@@ -914,44 +926,72 @@ def _read_claude_code_credentials_from_keychain() -> Optional[Dict[str, Any]]:
return None
def _read_claude_code_credentials_from_file() -> Optional[Dict[str, Any]]:
"""Read Claude Code OAuth credentials from ~/.claude/.credentials.json.
Returns dict with {accessToken, refreshToken?, expiresAt?, source} or None.
"""
cred_path = Path.home() / ".claude" / ".credentials.json"
if not cred_path.exists():
return None
try:
data = json.loads(cred_path.read_text(encoding="utf-8"))
except (json.JSONDecodeError, OSError, IOError) as e:
logger.debug("Failed to read ~/.claude/.credentials.json: %s", e)
return None
oauth_data = data.get("claudeAiOauth")
if not (oauth_data and isinstance(oauth_data, dict)):
return None
access_token = oauth_data.get("accessToken", "")
if not access_token:
return None
return {
"accessToken": access_token,
"refreshToken": oauth_data.get("refreshToken", ""),
"expiresAt": oauth_data.get("expiresAt", 0),
"source": "claude_code_credentials_file",
}
def read_claude_code_credentials() -> Optional[Dict[str, Any]]:
"""Read refreshable Claude Code OAuth credentials.
Checks two sources in order:
Reads from two possible sources and reconciles them:
1. macOS Keychain (Darwin only) — "Claude Code-credentials" entry
2. ~/.claude/.credentials.json file
Selection rules when both are present:
- If exactly one is non-expired, prefer that one. (Handles the case
where Claude Code refreshes one source but not the other — observed
in the wild on Claude Code 2.1.x.)
- Otherwise, prefer the source with the later ``expiresAt`` so that
any subsequent refresh uses the most recent ``refreshToken``.
This intentionally excludes ~/.claude.json primaryApiKey. Opencode's
subscription flow is OAuth/setup-token based with refreshable credentials,
and native direct Anthropic provider usage should follow that path rather
than auto-detecting Claude's first-party managed key.
Returns dict with {accessToken, refreshToken?, expiresAt?} or None.
Returns dict with {accessToken, refreshToken?, expiresAt?, source} or None.
"""
# Try macOS Keychain first (covers Claude Code >=2.1.114)
kc_creds = _read_claude_code_credentials_from_keychain()
if kc_creds:
return kc_creds
file_creds = _read_claude_code_credentials_from_file()
# Fall back to JSON file
cred_path = Path.home() / ".claude" / ".credentials.json"
if cred_path.exists():
try:
data = json.loads(cred_path.read_text(encoding="utf-8"))
oauth_data = data.get("claudeAiOauth")
if oauth_data and isinstance(oauth_data, dict):
access_token = oauth_data.get("accessToken", "")
if access_token:
return {
"accessToken": access_token,
"refreshToken": oauth_data.get("refreshToken", ""),
"expiresAt": oauth_data.get("expiresAt", 0),
"source": "claude_code_credentials_file",
}
except (json.JSONDecodeError, OSError, IOError) as e:
logger.debug("Failed to read ~/.claude/.credentials.json: %s", e)
if kc_creds and file_creds:
kc_valid = is_claude_code_token_valid(kc_creds)
file_valid = is_claude_code_token_valid(file_creds)
if kc_valid and not file_valid:
return kc_creds
if file_valid and not kc_valid:
return file_creds
# Both valid or both expired: prefer the later expiresAt so the
# downstream refresh path uses the freshest refresh_token.
kc_exp = kc_creds.get("expiresAt", 0) or 0
file_exp = file_creds.get("expiresAt", 0) or 0
return kc_creds if kc_exp >= file_exp else file_creds
return None
return kc_creds or file_creds
def is_claude_code_token_valid(creds: Dict[str, Any]) -> bool:
@@ -1034,8 +1074,40 @@ def refresh_anthropic_oauth_pure(refresh_token: str, *, use_json: bool = False)
def _refresh_oauth_token(creds: Dict[str, Any]) -> Optional[str]:
"""Attempt to refresh an expired Claude Code OAuth token."""
refresh_token = creds.get("refreshToken", "")
"""Attempt to refresh an expired Claude Code OAuth token.
Claude Code's OAuth refresh tokens are single-use: a successful refresh
rotates the pair and invalidates the old refresh token. Claude Code itself
also refreshes on its own schedule (IDE/CLI activity), so by the time
Hermes notices an expired token, Claude Code may have already rotated it.
POSTing our now-stale refresh token in that window races Claude Code and
fails with ``invalid_grant``.
So before refreshing, re-read the live credential sources. If Claude Code
has already produced a valid token, adopt it and skip the POST entirely.
Only fall back to refreshing ourselves when no fresh credential is found.
"""
# Claude Code may have already refreshed — adopt its token rather than
# racing it with our (possibly already-rotated) refresh token. Only adopt
# when the live re-read produced a DIFFERENT token with a real future
# expiry: re-adopting the same credential we were just handed would be a
# no-op, and a 0/absent ``expiresAt`` means "managed key / unknown expiry"
# (see is_claude_code_token_valid) which must NOT be treated as a fresh
# refresh here.
current = read_claude_code_credentials()
if current:
current_token = current.get("accessToken", "")
current_exp = current.get("expiresAt", 0) or 0
if (
current_token
and current_token != creds.get("accessToken", "")
and current_exp > 0
and is_claude_code_token_valid(current)
):
logger.debug("Adopted Claude Code's already-refreshed OAuth token")
return current_token
refresh_token = (current or {}).get("refreshToken", "") or creds.get("refreshToken", "")
if not refresh_token:
logger.debug("No refresh token available — cannot refresh")
return None
@@ -1297,7 +1369,15 @@ def run_oauth_setup_token() -> Optional[str]:
# Stores credentials in ~/.hermes/.anthropic_oauth.json (our own file).
_OAUTH_CLIENT_ID = "9d1c250a-e61b-44d9-88ed-5944d1962f5e"
_OAUTH_TOKEN_URL = "https://console.anthropic.com/v1/oauth/token"
# Anthropic migrated the OAuth token endpoint to platform.claude.com;
# console.anthropic.com now 404s. Callers should iterate _OAUTH_TOKEN_URLS
# (new host first, console fallback). _OAUTH_TOKEN_URL is kept as the primary
# for backward compatibility with existing imports and now points at the live host.
_OAUTH_TOKEN_URLS = [
"https://platform.claude.com/v1/oauth/token",
"https://console.anthropic.com/v1/oauth/token",
]
_OAUTH_TOKEN_URL = _OAUTH_TOKEN_URLS[0]
_OAUTH_REDIRECT_URI = "https://console.anthropic.com/oauth/code/callback"
_OAUTH_SCOPES = "org:create_api_key user:profile user:inference"
_HERMES_OAUTH_FILE = get_hermes_home() / ".anthropic_oauth.json"
@@ -1395,18 +1475,34 @@ def run_hermes_oauth_login_pure() -> Optional[Dict[str, Any]]:
"code_verifier": verifier,
}).encode()
req = urllib.request.Request(
_OAUTH_TOKEN_URL,
data=exchange_data,
headers={
"Content-Type": "application/json",
"User-Agent": f"claude-cli/{_get_claude_code_version()} (external, cli)",
},
method="POST",
)
# Anthropic migrated the OAuth token endpoint to platform.claude.com;
# console.anthropic.com now 404s. Try the new host first, then fall
# back to console for older deployments (mirrors the refresh path).
result = None
last_error = None
for endpoint in _OAUTH_TOKEN_URLS:
req = urllib.request.Request(
endpoint,
data=exchange_data,
headers={
"Content-Type": "application/json",
"User-Agent": f"claude-cli/{_get_claude_code_version()} (external, cli)",
},
method="POST",
)
try:
with urllib.request.urlopen(req, timeout=15) as resp:
result = json.loads(resp.read().decode())
break
except Exception as exc:
last_error = exc
logger.debug("Anthropic token exchange failed at %s: %s", endpoint, exc)
continue
with urllib.request.urlopen(req, timeout=15) as resp:
result = json.loads(resp.read().decode())
if result is None:
raise last_error if last_error is not None else ValueError(
"Anthropic token exchange failed"
)
except Exception as e:
print(f"Token exchange failed: {e}")
return None

View File

@@ -101,6 +101,8 @@ class _OpenAIProxy:
OpenAI = _OpenAIProxy() # module-level name, resolves lazily on call/isinstance
from agent.credential_pool import load_pool
from agent.model_metadata import MINIMUM_CONTEXT_LENGTH, get_model_context_length
from agent.process_bootstrap import build_keepalive_http_client
from hermes_cli.config import get_hermes_home
from hermes_constants import OPENROUTER_BASE_URL
from utils import base_url_host_matches, base_url_hostname, env_float, model_forces_max_completion_tokens, normalize_proxy_env_vars
@@ -108,6 +110,23 @@ from utils import base_url_host_matches, base_url_hostname, env_float, model_for
logger = logging.getLogger(__name__)
def _openai_http_client_kwargs(
base_url: Optional[str],
*,
async_mode: bool = False,
) -> Dict[str, Any]:
"""Inject keepalive httpx client with env-only proxy (not macOS system proxy)."""
client = build_keepalive_http_client(str(base_url or ""), async_mode=async_mode)
if client is None:
return {}
return {"http_client": client}
def _create_openai_client(*, api_key: str, base_url: str, **kwargs: Any) -> Any:
kwargs = {**_openai_http_client_kwargs(base_url), **kwargs}
return OpenAI(api_key=api_key, base_url=base_url, **kwargs)
# ── Interrupt protection for atomic auxiliary tasks ──────────────────────
# Some auxiliary tasks must NOT be aborted mid-flight by a gateway interrupt
# (e.g. an incoming user message while the agent is busy). Context
@@ -665,6 +684,28 @@ def _pool_runtime_base_url(entry: Any, fallback: str = "") -> str:
return str(url or "").strip().rstrip("/")
# Hostnames (lowercase, exact) that the auxiliary Anthropic path is allowed to
# be pointed at via config.yaml model.base_url. Anything else falls back to the
# Anthropic default — operators routing main-session traffic through a
# non-Anthropic host (e.g. OpenRouter, OpenAI) with provider=anthropic in config
# must NOT have that foreign host leak into the auxiliary client. See #52608.
_ANTHROPIC_COMPATIBLE_HOSTS = frozenset({
"api.anthropic.com",
})
def _is_anthropic_compatible_host(url: str) -> bool:
"""Return True if ``url``'s hostname is an Anthropic endpoint we trust for aux calls."""
if not url:
return False
try:
from urllib.parse import urlparse
host = (urlparse(url).hostname or "").strip().lower().rstrip(".")
return host in _ANTHROPIC_COMPATIBLE_HOSTS
except Exception:
return False
def _nous_min_key_ttl_seconds() -> int:
try:
return max(60, int(os.getenv("HERMES_NOUS_MIN_KEY_TTL_SECONDS", "1800")))
@@ -1591,7 +1632,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
_merged_aux = _apply_user_default_headers(extra.get("default_headers"))
if _merged_aux:
extra["default_headers"] = _merged_aux
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _create_openai_client(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, raw_base_url)
return _client, model
@@ -1631,7 +1672,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
_merged_aux2 = _apply_user_default_headers(extra.get("default_headers"))
if _merged_aux2:
extra["default_headers"] = _merged_aux2
_client = OpenAI(api_key=api_key, base_url=base_url, **extra)
_client = _create_openai_client(api_key=api_key, base_url=base_url, **extra)
_client = _maybe_wrap_anthropic(_client, model, api_key, raw_base_url)
return _client, model
@@ -1646,20 +1687,21 @@ def _try_openrouter(explicit_api_key: str = None, model: str = None) -> Tuple[Op
pool_present, entry = _select_pool_entry("openrouter")
if pool_present:
or_key = explicit_api_key or _pool_runtime_api_key(entry)
if not or_key:
_mark_provider_unhealthy("openrouter", ttl=60)
return None, None
base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL
logger.debug("Auxiliary client: OpenRouter via pool")
return OpenAI(api_key=or_key, base_url=base_url,
default_headers=build_or_headers()), model or _OPENROUTER_MODEL
if or_key:
base_url = _pool_runtime_base_url(entry, OPENROUTER_BASE_URL) or OPENROUTER_BASE_URL
logger.debug("Auxiliary client: OpenRouter via pool")
return _create_openai_client(api_key=or_key, base_url=base_url,
default_headers=build_or_headers()), model or _OPENROUTER_MODEL
# Pool exists but is exhausted (no usable runtime key) — fall through to
# the OPENROUTER_API_KEY env-var path rather than failing outright.
logger.debug("Auxiliary client: OpenRouter pool exhausted, trying OPENROUTER_API_KEY")
or_key = explicit_api_key or os.getenv("OPENROUTER_API_KEY")
if not or_key:
_mark_provider_unhealthy("openrouter", ttl=60)
return None, None
logger.debug("Auxiliary client: OpenRouter")
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
return _create_openai_client(api_key=or_key, base_url=OPENROUTER_BASE_URL,
default_headers=build_or_headers()), model or _OPENROUTER_MODEL
@@ -1752,7 +1794,7 @@ def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
return None, None
base_url = str((nous or {}).get("inference_base_url") or _nous_base_url()).rstrip("/")
return (
OpenAI(
_create_openai_client(
api_key=api_key,
base_url=base_url,
),
@@ -2029,7 +2071,7 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
if _custom_headers:
_extra["default_headers"] = _custom_headers
if custom_mode == "codex_responses":
real_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
real_client = _create_openai_client(api_key=custom_key, base_url=_clean_base, **_extra)
return CodexAuxiliaryClient(real_client, model), model
if custom_mode == "anthropic_messages":
# Third-party Anthropic-compatible gateway (MiniMax, Zhipu GLM,
@@ -2043,14 +2085,14 @@ def _try_custom_endpoint() -> Tuple[Optional[Any], Optional[str]]:
"Custom endpoint declares api_mode=anthropic_messages but the "
"anthropic SDK is not installed — falling back to OpenAI-wire."
)
return OpenAI(api_key=custom_key, base_url=_clean_base, **_extra), model
return _create_openai_client(api_key=custom_key, base_url=_clean_base, **_extra), model
return (
AnthropicAuxiliaryClient(real_client, model, custom_key, custom_base, is_oauth=False),
model,
)
# URL-based anthropic detection for custom endpoints that didn't set
# api_mode explicitly (e.g. kimi.com/coding reached via custom config).
_fallback_client = OpenAI(api_key=custom_key, base_url=_clean_base, **_extra)
_fallback_client = _create_openai_client(api_key=custom_key, base_url=_clean_base, **_extra)
_fallback_client = _maybe_wrap_anthropic(
_fallback_client, model, custom_key, custom_base, custom_mode,
)
@@ -2079,7 +2121,7 @@ def _build_xai_oauth_aux_client(model: str) -> Tuple[Optional[Any], Optional[str
return None, None
api_key, base_url = resolved
logger.debug("Auxiliary client: xAI OAuth (%s via Responses API)", model)
real_client = OpenAI(api_key=api_key, base_url=base_url)
real_client = _create_openai_client(api_key=api_key, base_url=base_url)
return CodexAuxiliaryClient(real_client, model), model
@@ -2116,7 +2158,7 @@ def _build_codex_client(model: str) -> Tuple[Optional[Any], Optional[str]]:
return None, None
base_url = _CODEX_AUX_BASE_URL
logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", model)
real_client = OpenAI(
real_client = _create_openai_client(
api_key=codex_token,
base_url=base_url,
default_headers=_codex_cloudflare_headers(codex_token),
@@ -2216,7 +2258,7 @@ def _try_azure_foundry(
if _dq:
extra["default_query"] = _dq
client = OpenAI(api_key=api_key, base_url=_clean_base, **extra)
client = _create_openai_client(api_key=api_key, base_url=_clean_base, **extra)
if runtime_api_mode == "codex_responses":
# GPT-5.x / o-series / codex models on Azure Foundry are
@@ -2255,9 +2297,16 @@ def _try_anthropic(explicit_api_key: str = None) -> Tuple[Optional[Any], Optiona
if not token:
return None, None
# Allow base URL override from config.yaml model.base_url, but only
# when the configured provider is anthropic otherwise a non-Anthropic
# base_url (e.g. Codex endpoint) would leak into Anthropic requests.
# Allow base URL override from config.yaml model.base_url, but only when:
# 1. the configured provider is anthropic (otherwise a non-Anthropic
# base_url, e.g. Codex endpoint, would leak into Anthropic requests), AND
# 2. the override URL actually points at an Anthropic-compatible endpoint.
# Without gate (2), operators who route main-session traffic through a
# non-Anthropic provider that accepts Anthropic-format requests (e.g.
# OpenRouter at openrouter.ai/api/v1, with provider=anthropic in config.yaml)
# would have every auxiliary side-channel call (memory extractors,
# reflection, vision, title generation) 401 from the foreign host —
# see issue #52608.
base_url = _pool_runtime_base_url(entry, _ANTHROPIC_DEFAULT_BASE_URL) if pool_present else _ANTHROPIC_DEFAULT_BASE_URL
try:
from hermes_cli.config import load_config
@@ -2267,7 +2316,7 @@ def _try_anthropic(explicit_api_key: str = None) -> Tuple[Optional[Any], Optiona
cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
if cfg_provider == "anthropic":
cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
if cfg_base_url:
if cfg_base_url and _is_anthropic_compatible_host(cfg_base_url):
base_url = cfg_base_url
except Exception:
pass
@@ -2470,7 +2519,7 @@ def _is_payment_error(exc: Exception) -> bool:
# but sometimes wrap them in 429 or other codes.
# Daily quota exhaustion from Bedrock, Vertex AI, and similar providers
# uses different language but is semantically identical to credit exhaustion.
if status in {402, 404, 429, None}:
if status in {402, 403, 404, 429, None}:
if any(kw in err_lower for kw in (
"credits", "insufficient funds",
"can only afford", "billing",
@@ -2479,6 +2528,8 @@ def _is_payment_error(exc: Exception) -> bool:
"balance_depleted", "no usable credits",
"model_not_supported_on_free_tier",
"not available on the free tier",
"requires a subscription", "upgrade for access",
"upgrade for higher limits", "reached your session usage limit",
# Daily / monthly / weekly quota exhaustion keywords
"quota exceeded", "quota_exceeded",
"too many tokens per day", "daily limit",
@@ -2697,6 +2748,79 @@ def _is_model_not_found_error(exc: Exception) -> bool:
))
def _is_model_incompatible_error(exc: Exception) -> bool:
"""Detect "this route cannot serve this model" 400s (capability mismatch).
Distinct from :func:`_is_model_not_found_error` (the model does not exist
anywhere): here the model name is valid but the *current provider/account*
is structurally unable to run it. The canonical case is a configured
fallback that cannot run the main model — e.g. an ``openai-codex`` /
ChatGPT-account fallback asked to compress a ``glm-5.2`` conversation::
Error code: 400 - {'detail': "The 'glm-5.2' model is not supported
when using Codex with a ChatGPT account."}
The candidate authenticates fine and builds a client, so the auth and
payment predicates don't fire and the call would otherwise raise and
abort the whole auxiliary task (commonly compression — which then drops
middle turns and churns the session, destroying the prompt cache).
Treating it as a fallback-worthy capability error lets the chain skip the
incapable route and continue to the next candidate, mirroring the
context-window feasibility screen (#52392).
Billing/quota 400s belong to :func:`_is_payment_error`; "model does not
exist" 400s belong to :func:`_is_model_not_found_error`. This predicate
explicitly excludes both so the three don't overlap.
"""
status = getattr(exc, "status_code", None)
if status not in {400, None}:
return False
err_lower = str(exc).lower()
# Not-found 400s ("invalid model ID", "model does not exist") are owned by
# _is_model_not_found_error. Billing/free-tier 400s are owned by the
# payment path — key on the billing keywords directly here rather than
# calling _is_payment_error(), because that predicate is status-gated
# ({402,403,404,429,None}) and would not recognise a 400-coded billing
# body, letting it leak into this capability bucket.
if _is_model_not_found_error(exc):
return False
if any(kw in err_lower for kw in (
"credits", "insufficient funds", "billing", "out of funds",
"balance_depleted", "no usable credits", "payment required",
"free tier", "free-tier", "not available on the free tier",
"model_not_supported_on_free_tier", "quota",
)):
return False
return any(kw in err_lower for kw in (
"is not supported when using", # codex/ChatGPT-account model gating
"model is not supported",
"not supported with this",
"not supported for this account",
"model_not_supported",
"does not support this model",
"unsupported model",
))
def _is_invalid_aux_response_error(exc: Exception) -> bool:
"""Detect provider responses that authenticated but cannot serve aux shape.
Some OpenAI-compatible routes return HTTP 200 with an empty/malformed
ChatCompletion instead of a normal provider error. That is still a
provider/model capability failure for auxiliary tasks: downstream callers
need ``choices[0].message`` and should be able to continue through the
same fallback path as explicit model-incompatibility errors.
"""
if not isinstance(exc, RuntimeError):
return False
msg = str(exc).lower()
return (
"auxiliary " in msg
and "llm returned invalid response" in msg
and "choices[0].message" in msg
)
def _evict_cached_clients(provider: str) -> None:
"""Drop cached auxiliary clients for a provider so fresh creds are used."""
normalized = _normalize_aux_provider(provider)
@@ -3147,6 +3271,88 @@ def _try_main_agent_model_fallback(
return client, resolved_model or main_model, label
# ── Context-window screening for runtime fallback chains (issue #52392) ──
#
# When the runtime auxiliary fallback chain selects a candidate that is
# reachable but has a context window smaller than the compression task
# requires, the call errors out instead of continuing to the next, viable
# candidate. The startup feasibility check in
# ``agent.conversation_compression.check_compression_model_feasibility``
# already filters too-small auxiliary models at startup, but the runtime
# fallback chain (``_try_configured_fallback_chain`` and
# ``_try_main_fallback_chain``) does not apply the same filter, so
# compression can stop at the first alive door even if the room behind it
# is too small.
#
# The helpers below screen each candidate by its effective context window
# before it is returned. ``None`` results from ``get_model_context_length``
# are passed through (we cannot prove a model is too small, so we do not
# block it). This preserves the existing fallback surface for
# unrecognised/custom models while closing the gap on the well-known ones.
def _task_minimum_context_length(task: Optional[str]) -> Optional[int]:
"""Return the minimum context length required for an auxiliary task.
Only ``compression`` carries an explicit minimum today (the same
``MINIMUM_CONTEXT_LENGTH`` (64K) floor that
``check_compression_model_feasibility`` already enforces at startup).
Other tasks (``vision``, ``title_generation``, ``web_extract``,
``skills_hub``, ``mcp``, ``session_search``) return ``None`` — they
have no per-task context floor and the runtime chain must remain
permissive for them.
Returns ``None`` for an empty/``None`` task name so the helper is a
safe no-op when called from generic sites.
"""
if not task:
return None
if task == "compression":
return MINIMUM_CONTEXT_LENGTH
return None
def _candidate_context_window(
provider: str,
model: str,
base_url: str = "",
api_key: str = "",
) -> Optional[int]:
"""Resolve the effective context window for a fallback candidate.
Thin wrapper around :func:`agent.model_metadata.get_model_context_length`
that swallows probe failures (returns ``None``). Callers treat
``None`` as "unknown — pass through" so the existing fallback
surface is preserved when the context-length resolver chain cannot
determine a value (custom endpoints, models not in the registry,
offline endpoints).
Best-effort, never raises — the runtime fallback chain must keep
moving even if the resolver hits a probe error.
"""
if not model:
return None
try:
ctx = get_model_context_length(
model,
base_url=base_url,
api_key=api_key,
provider=provider,
)
except Exception as exc:
logger.debug(
"Auxiliary fallback: could not resolve context window for %s/%s: %s",
provider, model, exc,
)
return None
# ``get_model_context_length`` returns an int (with a 256K default
# fallback when nothing else matches). We still propagate ``None`` if
# a future change returns ``Optional[int]`` — being explicit is
# cheap and the test suite covers both shapes.
if isinstance(ctx, int) and ctx > 0:
return ctx
return None
def _try_configured_fallback_chain(
task: str,
failed_provider: str,
@@ -3171,6 +3377,7 @@ def _try_configured_fallback_chain(
skip = failed_provider.lower().strip()
tried = []
min_ctx = _task_minimum_context_length(task)
for i, entry in enumerate(chain):
if not isinstance(entry, dict):
@@ -3188,6 +3395,20 @@ def _try_configured_fallback_chain(
fb_client, resolved_model = None, None
if fb_client is not None:
if min_ctx is not None and resolved_model:
fb_ctx = _candidate_context_window(
fb_provider,
resolved_model,
base_url=str(entry.get("base_url") or ""),
api_key=_fallback_entry_api_key(entry) or "",
)
if fb_ctx is not None and fb_ctx < min_ctx:
logger.info(
"Auxiliary %s: skipping %s (%s context=%d < min=%d), continuing chain",
task, label, resolved_model, fb_ctx, min_ctx,
)
tried.append(f"{label} (context too small: {fb_ctx}<{min_ctx})")
continue
logger.info(
"Auxiliary %s: %s on %s — configured fallback to %s (%s)",
task, reason, failed_provider, label, resolved_model or fb_model or "default",
@@ -3203,6 +3424,28 @@ def _try_configured_fallback_chain(
return None, None, ""
def _try_configured_fallback_for_unavailable_client(
task: Optional[str],
failed_provider: str,
) -> Tuple[Optional[Any], Optional[str], str]:
"""Try task fallback_chain when an explicit aux provider cannot build.
This covers the "no client" case before any request is sent: missing
raw env key, unavailable OAuth/pool credentials, or provider resolver
returning ``(None, None)``. It deliberately stops at the configured
per-task fallback chain; the main-agent model remains the last-resort
runtime fallback for request-time capacity errors.
"""
explicit = (failed_provider or "").strip().lower()
if not task or not explicit or explicit in {"auto"}:
return None, None, ""
return _try_configured_fallback_chain(
task,
explicit,
reason="provider unavailable",
)
def _fallback_entry_api_key(entry: Dict[str, Any]) -> Optional[str]:
"""Resolve inline or env-backed API key from a fallback-chain entry."""
explicit = str(entry.get("api_key") or "").strip()
@@ -3261,6 +3504,7 @@ def _try_main_fallback_chain(
main_norm = (_read_main_provider() or "").strip().lower()
skip = {p for p in (failed_norm, main_norm, "auto") if p}
tried: List[str] = []
min_ctx = _task_minimum_context_length(task)
for i, entry in enumerate(chain):
if not isinstance(entry, dict):
@@ -3284,6 +3528,20 @@ def _try_main_fallback_chain(
logger.debug("Auxiliary %s: main fallback %s failed to resolve: %s", task or "call", label, exc)
fb_client, resolved_model = None, None
if fb_client is not None:
if min_ctx is not None:
fb_ctx = _candidate_context_window(
fb_provider,
resolved_model or fb_model,
base_url=str(entry.get("base_url") or ""),
api_key=_fallback_entry_api_key(entry) or "",
)
if fb_ctx is not None and fb_ctx < min_ctx:
logger.info(
"Auxiliary %s: skipping %s (context=%d < min=%d), continuing chain",
task or "call", label, fb_ctx, min_ctx,
)
tried.append(f"{label} (context too small: {fb_ctx}<{min_ctx})")
continue
logger.info(
"Auxiliary %s: %s on %s — main fallback chain to %s (%s)",
task or "call", reason, failed_provider or "auto", label,
@@ -3385,6 +3643,37 @@ def _resolve_auto(
# config.yaml (auxiliary.<task>.provider) still win over this.
main_provider = str(runtime_provider or _read_main_provider() or "")
main_model = str(runtime_model or _read_main_model() or "")
# MoA virtual provider: the "model" is a preset name (e.g. "opus-gpt") and
# there is no real "moa" HTTP endpoint, so resolving an aux client against
# provider="moa"/model=<preset> sends the preset name as the model id and
# the provider 400s ("opus-gpt is not a valid model ID"). Auxiliary tasks
# (title generation, compression, vision, …) don't need the reference
# fan-out — they should run on the aggregator, which is the preset's acting
# model. Resolve the MoA preset to its aggregator slot and continue Step 1
# with that real provider+model. Mirrors the MoA context-length resolution.
if main_provider == "moa":
try:
from hermes_cli.config import load_config
from hermes_cli.moa_config import resolve_moa_preset
_preset = resolve_moa_preset(load_config().get("moa") or {}, main_model)
_agg = _preset.get("aggregator") or {}
_agg_provider = str(_agg.get("provider") or "").strip()
_agg_model = str(_agg.get("model") or "").strip()
if _agg_provider and _agg_model and _agg_provider.lower() != "moa":
main_provider = _agg_provider
main_model = _agg_model
# The MoA virtual runtime carries a non-HTTP base_url
# ("moa://local") and a placeholder api_key; they belong to the
# facade, not the aggregator's real provider. Drop them so the
# aggregator resolves through its own provider credentials.
runtime_base_url = ""
runtime_api_key = ""
runtime_api_mode = ""
except Exception:
logger.debug("MoA aux resolution to aggregator failed", exc_info=True)
if (main_provider and main_model
and main_provider not in {"auto", ""}):
resolved_provider = main_provider
@@ -3531,6 +3820,10 @@ def _to_async_client(sync_client, model: str, is_vision: bool = False):
_merged_async = _apply_user_default_headers(async_kwargs.get("default_headers"))
if _merged_async:
async_kwargs["default_headers"] = _merged_async
async_kwargs = {
**_openai_http_client_kwargs(sync_base_url, async_mode=True),
**async_kwargs,
}
return AsyncOpenAI(**async_kwargs), model
@@ -3741,7 +4034,7 @@ def resolve_provider_client(
"but no Codex OAuth token found (run: hermes model)")
return None, None
final_model = _normalize_resolved_model(model, provider)
raw_client = OpenAI(
raw_client = _create_openai_client(
api_key=codex_token,
base_url=_CODEX_AUX_BASE_URL,
default_headers=_codex_cloudflare_headers(codex_token),
@@ -3822,7 +4115,7 @@ def resolve_provider_client(
_merged_custom = _apply_user_default_headers(extra.get("default_headers"))
if _merged_custom:
extra["default_headers"] = _merged_custom
client = OpenAI(api_key=custom_key, base_url=_clean_base, **extra)
client = _create_openai_client(api_key=custom_key, base_url=_clean_base, **extra)
client = _wrap_if_needed(client, final_model, custom_base, custom_key)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
@@ -3926,7 +4219,7 @@ def resolve_provider_client(
_fb_headers = _apply_user_default_headers(_fb_extra.get("default_headers"))
if _fb_headers:
_fb_extra["default_headers"] = _fb_headers
client = OpenAI(api_key=custom_key, base_url=_fb_clean, **_fb_extra)
client = _create_openai_client(api_key=custom_key, base_url=_fb_clean, **_fb_extra)
return (_to_async_client(client, final_model, is_vision=is_vision) if async_mode
else (client, final_model))
sync_anthropic = AnthropicAuxiliaryClient(
@@ -3935,7 +4228,7 @@ def resolve_provider_client(
if async_mode:
return AsyncAnthropicAuxiliaryClient(sync_anthropic), final_model
return sync_anthropic, final_model
client = OpenAI(api_key=custom_key, base_url=_clean_base2, **_extra2)
client = _create_openai_client(api_key=custom_key, base_url=_clean_base2, **_extra2)
# codex_responses or inherited auto-detect (via _wrap_if_needed).
# _wrap_if_needed reads the closed-over `api_mode` (the task-level
# override). Named-provider entry api_mode=codex_responses also
@@ -4077,7 +4370,7 @@ def resolve_provider_client(
_merged_main = _apply_user_default_headers(headers)
if _merged_main:
headers = _merged_main
client = OpenAI(api_key=api_key, base_url=base_url,
client = _create_openai_client(api_key=api_key, base_url=base_url,
**({"default_headers": headers} if headers else {}))
# Copilot GPT-5+ models (except gpt-5-mini) require the Responses
@@ -4613,7 +4906,7 @@ def _refresh_nous_auxiliary_client(
return None, model
fresh_key, fresh_base_url = runtime
sync_client = OpenAI(api_key=fresh_key, base_url=fresh_base_url)
sync_client = _create_openai_client(api_key=fresh_key, base_url=fresh_base_url)
final_model = model
current_loop = None
@@ -5196,10 +5489,24 @@ def _build_call_kwargs(
# ``/anthropic`` endpoint reached through the OpenAI SDK wrapper), where
# max_tokens is a MANDATORY field — omitting it is a hard 400. Keep it only
# there.
#
# NVIDIA NIM (integrate.api.nvidia.com and local NIM endpoints) is a
# second exception: some models—notably minimaxai/minimax-m3—return HTTP
# 200 with an empty choices[] payload when max_tokens is omitted. The main
# NVIDIA chat path already sends an output cap via the provider profile;
# preserve it on the auxiliary path too.
_effective_base = base_url or (
_current_custom_base_url() if provider == "custom" else ""
)
if _is_anthropic_compat_endpoint(provider, _effective_base):
_provider_norm = str(provider or "").strip().lower()
_is_nvidia_nim = (
_provider_norm in {"nvidia", "nvidia-nim", "nim", "build-nvidia", "nemotron"}
or base_url_host_matches(_effective_base, "integrate.api.nvidia.com")
)
if (
_is_anthropic_compat_endpoint(provider, _effective_base)
or _is_nvidia_nim
):
kwargs["max_tokens"] = max_tokens
if tools:
@@ -5254,6 +5561,9 @@ def _validate_llm_response(response: Any, task: str = None) -> Any:
if not choices or not hasattr(choices[0], "message"):
raise AttributeError("missing choices[0].message")
except (AttributeError, TypeError, IndexError) as exc:
recovered = _recover_aux_response_message(response)
if recovered is not None:
return recovered
response_type = type(response).__name__
response_preview = str(response)[:120]
raise RuntimeError(
@@ -5265,6 +5575,64 @@ def _validate_llm_response(response: Any, task: str = None) -> Any:
return response
def _recover_aux_response_message(response: Any) -> Optional[Any]:
"""Synthesize chat-completions shape from Responses-style text fields.
Auxiliary callers consume ``choices[0].message``. Some compatible
endpoints return text outside ``choices`` (for example ``output_text`` or
``output`` items). Preserve that response before declaring it malformed.
"""
text = _extract_aux_response_text(response)
if not text:
return None
choice = SimpleNamespace(
message=SimpleNamespace(content=text),
finish_reason=getattr(response, "finish_reason", None) or "stop",
)
try:
response.choices = [choice]
return response
except Exception:
return SimpleNamespace(
id=getattr(response, "id", ""),
model=getattr(response, "model", ""),
object=getattr(response, "object", "chat.completion"),
choices=[choice],
usage=getattr(response, "usage", None),
)
def _extract_aux_response_text(response: Any) -> str:
output_text = _obj_get(response, "output_text")
if isinstance(output_text, str) and output_text.strip():
return output_text.strip()
output = _obj_get(response, "output")
if not isinstance(output, list):
return ""
parts: List[str] = []
for item in output:
item_type = _obj_get(item, "type")
if item_type and item_type != "message":
continue
for part in (_obj_get(item, "content") or []):
part_type = _obj_get(part, "type")
if part_type in {"output_text", "text", None}:
text = _obj_get(part, "text")
if isinstance(text, str) and text.strip():
parts.append(text.strip())
return "\n".join(parts).strip()
def _obj_get(obj: Any, key: str, default: Any = None) -> Any:
value = getattr(obj, key, default)
if value is default and isinstance(obj, dict):
value = obj.get(key, default)
return value
def call_llm(
task: str = None,
*,
@@ -5344,21 +5712,30 @@ def call_llm(
)
if client is None:
# When the user explicitly chose a non-OpenRouter provider but no
# credentials were found, fail fast instead of silently routing
# through OpenRouter (which causes confusing 404s).
# credentials were found, honor the task fallback_chain before
# raising. Missing raw env keys are recoverable for auxiliary
# tasks because fallback entries may use OAuth / credential-pool
# auth (for example openai-codex).
_explicit = (resolved_provider or "").strip().lower()
if _explicit and _explicit not in {"auto", "openrouter", "custom"}:
raise RuntimeError(
f"Provider '{_explicit}' is set in config.yaml but no API key "
f"was found. Set the {_explicit.upper()}_API_KEY environment "
f"variable, or switch to a different provider with `hermes model`."
fb_client, fb_model, fb_label = _try_configured_fallback_for_unavailable_client(
task, _explicit,
)
if fb_client is not None:
client, final_model = fb_client, fb_model
resolved_provider = fb_label or resolved_provider
else:
raise RuntimeError(
f"Provider '{_explicit}' is set in config.yaml but no API key "
f"was found. Set the {_explicit.upper()}_API_KEY environment "
f"variable, or switch to a different provider with `hermes model`."
)
# For auto/custom with no credentials, try the full auto chain
# rather than hardcoding OpenRouter (which may be depleted).
# Pass model=None so each provider uses its own default —
# resolved_model may be an OpenRouter-format slug that doesn't
# work on other providers.
if not resolved_base_url:
if client is None and not resolved_base_url:
logger.info("Auxiliary %s: provider %s unavailable, trying auto-detection chain",
task or "call", resolved_provider)
client, final_model = _get_cached_client("auto", main_runtime=main_runtime, task=task)
@@ -5653,10 +6030,21 @@ def call_llm(
# When the provider returns a 429 rate-limit (not billing), fall
# back to an alternative provider instead of exhausting retries
# against the same rate-limited endpoint.
#
# ── Auth error fallback (#21165) ─────────────────────────────
# When the resolved provider returns 401 and neither the Nous
# refresh path nor explicit provider credential refresh applies,
# fall back to an alternative provider instead of dropping the
# auxiliary task on the floor (silent compression failure /
# message loss). Auth is NOT a capacity error: it only bypasses
# the explicit-provider gate when the user is in auto mode.
should_fallback = (
_is_payment_error(first_err)
_is_auth_error(first_err)
or _is_payment_error(first_err)
or _is_connection_error(first_err)
or _is_rate_limit_error(first_err)
or _is_model_incompatible_error(first_err)
or _is_invalid_aux_response_error(first_err)
)
# Respect explicit provider choice for transient errors (auth, request
# validation, etc.) but allow fallback when the provider clearly cannot
@@ -5667,9 +6055,24 @@ def call_llm(
is_auto = resolved_provider in {"auto", "", None}
# Capacity errors bypass the explicit-provider gate: the provider
# literally cannot serve this request regardless of user intent.
is_capacity_error = _is_payment_error(first_err) or _is_connection_error(first_err)
# Rate limits are included: after retries are exhausted, a 429 means
# the provider cannot serve this request — fall back. See #52228.
# Model-incompatibility 400s are also a hard capability mismatch (the
# route cannot run this model at all — e.g. a codex/ChatGPT-account
# fallback asked to compress a glm-5.2 conversation), so they bypass
# the explicit-provider gate and continue to the next candidate
# instead of aborting the auxiliary task and churning the session.
is_capacity_error = (
_is_payment_error(first_err)
or _is_connection_error(first_err)
or _is_rate_limit_error(first_err)
or _is_model_incompatible_error(first_err)
or _is_invalid_aux_response_error(first_err)
)
if should_fallback and (is_auto or is_capacity_error):
if _is_payment_error(first_err):
if _is_auth_error(first_err):
reason = "auth error"
elif _is_payment_error(first_err):
reason = "payment error"
# Resolve the actual provider label (resolved_provider may be
# "auto"; the client's base_url tells us which backend got the
@@ -5680,6 +6083,10 @@ def call_llm(
)
elif _is_rate_limit_error(first_err):
reason = "rate limit"
elif _is_model_incompatible_error(first_err):
reason = "model incompatible with route"
elif _is_invalid_aux_response_error(first_err):
reason = "invalid provider response"
else:
reason = "connection error"
logger.info("Auxiliary %s: %s on %s (%s), trying fallback",
@@ -5854,12 +6261,21 @@ async def async_call_llm(
if client is None:
_explicit = (resolved_provider or "").strip().lower()
if _explicit and _explicit not in {"auto", "openrouter", "custom"}:
raise RuntimeError(
f"Provider '{_explicit}' is set in config.yaml but no API key "
f"was found. Set the {_explicit.upper()}_API_KEY environment "
f"variable, or switch to a different provider with `hermes model`."
fb_client, fb_model, fb_label = _try_configured_fallback_for_unavailable_client(
task, _explicit,
)
if not resolved_base_url:
if fb_client is not None:
client, final_model = _to_async_client(
fb_client, fb_model or "", is_vision=(task == "vision")
)
resolved_provider = fb_label or resolved_provider
else:
raise RuntimeError(
f"Provider '{_explicit}' is set in config.yaml but no API key "
f"was found. Set the {_explicit.upper()}_API_KEY environment "
f"variable, or switch to a different provider with `hermes model`."
)
if client is None and not resolved_base_url:
logger.info("Auxiliary %s: provider %s unavailable, trying auto-detection chain",
task or "call", resolved_provider)
client, final_model = _get_cached_client("auto", async_mode=True, main_runtime=main_runtime, task=task)
@@ -6105,24 +6521,47 @@ async def async_call_llm(
raise
# ── Payment / connection / rate-limit fallback (mirrors sync call_llm) ──
# Auth error fallback (#21165): a 401 that survived the refresh path
# falls back in auto mode just like the sync call_llm() path. Auth is
# NOT a capacity error, so on an explicit provider it still respects
# the user's choice (handled by the is_auto/is_capacity_error gate).
should_fallback = (
_is_auth_error(first_err)
or _is_payment_error(first_err)
or _is_connection_error(first_err)
or _is_rate_limit_error(first_err)
or _is_model_incompatible_error(first_err)
or _is_invalid_aux_response_error(first_err)
)
# Capacity errors (payment/quota/connection/rate-limit) bypass the
# explicit-provider gate — the provider cannot serve the request
# regardless of user intent. Rate limits are included: after retries
# are exhausted, a 429 means the provider is at capacity. See #52228.
# See #26803: daily token quota must fall back like a 402 credit error.
# Model-incompatibility 400s (route cannot run this model at all)
# bypass the gate too — see the sync call_llm() path for rationale.
is_auto = resolved_provider in {"auto", "", None}
is_capacity_error = (
_is_payment_error(first_err)
or _is_connection_error(first_err)
or _is_rate_limit_error(first_err)
or _is_model_incompatible_error(first_err)
or _is_invalid_aux_response_error(first_err)
)
# Capacity errors (payment/quota/connection) bypass the explicit-provider
# gate — the provider cannot serve the request regardless of user intent.
# See #26803: daily token quota must fall back like a 402 credit error.
is_auto = resolved_provider in {"auto", "", None}
is_capacity_error = _is_payment_error(first_err) or _is_connection_error(first_err)
if should_fallback and (is_auto or is_capacity_error):
if _is_payment_error(first_err):
if _is_auth_error(first_err):
reason = "auth error"
elif _is_payment_error(first_err):
reason = "payment error"
_mark_provider_unhealthy(
_recoverable_pool_provider(resolved_provider, client) or resolved_provider
)
elif _is_rate_limit_error(first_err):
reason = "rate limit"
elif _is_model_incompatible_error(first_err):
reason = "model incompatible with route"
elif _is_invalid_aux_response_error(first_err):
reason = "invalid provider response"
else:
reason = "connection error"
logger.info("Auxiliary %s (async): %s on %s (%s), trying fallback",

View File

@@ -28,6 +28,7 @@ from typing import Any, Dict, Optional
from hermes_cli.timeouts import get_provider_request_timeout, get_provider_stale_timeout
from hermes_constants import PARTIAL_STREAM_STUB_ID, FINISH_REASON_LENGTH
from agent.error_classifier import FailoverReason
from agent.gemini_native_adapter import is_native_gemini_base_url
from agent.model_metadata import is_local_endpoint
from agent.message_sanitization import (
_sanitize_surrogates,
@@ -37,6 +38,18 @@ from tools.terminal_tool import is_persistent_env
from utils import base_url_host_matches, base_url_hostname, env_float, env_int
logger = logging.getLogger(__name__)
_OPENROUTER_PROVIDER_SORT_VALUES = {"throughput", "latency", "price"}
# When the fallback chain is fully exhausted on a non-rate-limit failure
# (e.g. every provider returns a non-retryable client error like HTTP 400),
# arm a short cooldown so the NEXT turn's restore_primary_runtime stays gated
# and does not reset _fallback_index=0 to replay the entire chain again.
# Without this, a client/gateway that re-submits immediately would re-marshal
# the full (potentially 80k-token) context once per provider every turn and
# can drive a constrained host into memory/swap exhaustion. Rate-limit /
# billing reasons keep their own 60s cooldown (set above); this is the
# narrower non-rate-limit case. See issue #24996.
_FALLBACK_EXHAUSTED_COOLDOWN_S = 5.0
def _ra():
@@ -115,6 +128,23 @@ def _is_openai_codex_backend(agent) -> bool:
)
def _validated_openrouter_provider_sort(raw_sort: Any) -> Optional[str]:
"""Return a normalized OpenRouter provider.sort value or None."""
if not isinstance(raw_sort, str):
return None
sort_value = raw_sort.strip().lower()
if not sort_value:
return None
if sort_value in _OPENROUTER_PROVIDER_SORT_VALUES:
return sort_value
logger.warning(
"Ignoring invalid OpenRouter provider.sort value %r (allowed: %s)",
raw_sort,
", ".join(sorted(_OPENROUTER_PROVIDER_SORT_VALUES)),
)
return None
def _env_float(name: str, default: float) -> float:
try:
return float(os.getenv(name, str(default)))
@@ -229,6 +259,11 @@ def interruptible_api_call(agent, api_kwargs: dict):
invalidate_runtime_client(region)
raise
result["response"] = normalize_converse_response(raw_response)
elif agent.provider == "moa":
# MoA is a virtual chat-completions provider backed by the
# in-process MoAClient facade. Do not rebuild a request-local
# OpenAI client from the virtual runtime metadata.
result["response"] = agent.client.chat.completions.create(**api_kwargs)
else:
request_client = _set_request_client(
agent._create_request_openai_client(
@@ -698,8 +733,9 @@ def build_api_kwargs(agent, api_messages: list) -> dict:
_prefs["ignore"] = agent.providers_ignored
if agent.providers_order:
_prefs["order"] = agent.providers_order
if agent.provider_sort:
_prefs["sort"] = agent.provider_sort
_provider_sort = _validated_openrouter_provider_sort(agent.provider_sort)
if _provider_sort:
_prefs["sort"] = _provider_sort
if agent.provider_require_parameters:
_prefs["require_parameters"] = True
if agent.provider_data_collection:
@@ -1015,18 +1051,23 @@ def build_assistant_message(agent, assistant_message, finish_reason: str) -> dic
"arguments": tool_call.function.arguments
},
}
# Defence-in-depth: redact credentials from tool call arguments
# before they enter conversation history. Tool execution uses the
# raw API response object, not this dict, so redacting the
# persisted shape is safe and only affects storage. Catches the
# case where a model accidentally inlines a secret into a tool
# call (e.g. `terminal(command="curl -H 'Authorization: Bearer
# sk-...'")`). (#19798)
if isinstance(tc_dict["function"]["arguments"], str):
from agent.redact import redact_sensitive_text
tc_dict["function"]["arguments"] = redact_sensitive_text(
tc_dict["function"]["arguments"]
)
# Tool-call arguments are intentionally NOT redacted here. This
# dict enters the in-memory conversation history that is replayed
# to the model on every subsequent turn AND persisted to state.db,
# which is itself replayed verbatim on session resume
# (get_messages_as_conversation). Masking a credential to `***`
# here poisons that replay: the model reads back its own
# `PGPASSWORD='***' psql ...` call and copies the placeholder into
# the next tool call, breaking every credential-dependent command
# on the second turn (#43083). The masking also provided no real
# protection — the same secret still leaks verbatim through tool
# OUTPUT (file contents, command output, diffs, the compaction
# block), none of which this pass ever touched. Keeping secrets
# out of the replayable store is a separate tokenization/vault
# concern, not something arg-redaction can deliver without
# breaking replay. Storage-time redaction remains governed by the
# `security.redact_secrets` toggle. (#19798 introduced this;
# #43083 removed it.)
# Preserve extra_content (e.g. Gemini thought_signature) so it
# is sent back on subsequent API calls. Without this, Gemini 3
# thinking models reject the request with a 400 error.
@@ -1093,8 +1134,22 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
if (not fallback_already_active) or (primary_provider and current_provider == primary_provider):
agent._rate_limited_until = time.monotonic() + 60
if agent._fallback_index >= len(agent._fallback_chain):
# Chain exhausted. If we actually walked a non-empty chain and the
# failure was NOT a rate-limit/billing event (those already armed
# their own 60s cooldown above), arm a short cooldown so the next
# turn's restore_primary_runtime stays gated instead of resetting
# _fallback_index=0 and re-marshaling the whole context across every
# provider again. Guards the cross-turn replay storm in #24996.
if (
len(agent._fallback_chain) > 0
and reason not in {FailoverReason.rate_limit, FailoverReason.billing}
):
_existing_cooldown = getattr(agent, "_rate_limited_until", 0) or 0
agent._rate_limited_until = max(
_existing_cooldown,
time.monotonic() + _FALLBACK_EXHAUSTED_COOLDOWN_S,
)
return False
fb = agent._fallback_chain[agent._fallback_index]
agent._fallback_index += 1
fb_provider = (fb.get("provider") or "").strip().lower()
@@ -1210,14 +1265,16 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
agent._transport_cache.clear()
agent._fallback_activated = True
# Clear the credential pool when the fallback provider doesn't match
# the pool's provider. The pool was seeded for the primary provider;
# leaving it attached means downstream recovery (rate_limit / billing /
# auth) calls ``_swap_credential`` with a primary entry which overwrites
# the agent's ``base_url`` back to the primary's endpoint — every
# fallback request then 404s against the wrong host. See #33163.
# Rebind the credential pool to the fallback provider when the provider
# changes. Keeping the primary pool attached would make downstream
# recovery (rate_limit / billing / auth) mutate the wrong credential
# set and can overwrite the fallback's base_url back to the primary
# endpoint. See #33163.
#
# When the fallback shares the pool's provider (e.g. both openrouter
# entries with different routing) the pool is preserved.
# entries with different routing) the pool is preserved. When the
# providers differ, load the fallback provider's own pool if one exists
# so provider-specific rotation continues to work after the switch.
_existing_pool = getattr(agent, "_credential_pool", None)
if _existing_pool is not None:
_pool_provider = (getattr(_existing_pool, "provider", "") or "").strip().lower()
@@ -1228,6 +1285,22 @@ def try_activate_fallback(agent, reason: "FailoverReason | None" = None) -> bool
fb_provider, fb_model, _pool_provider,
)
agent._credential_pool = None
if getattr(agent, "_credential_pool", None) is None:
try:
from agent.credential_pool import load_pool
fallback_pool = load_pool(fb_provider)
if fallback_pool and fallback_pool.has_credentials():
agent._credential_pool = fallback_pool
logger.info(
"Fallback to %s/%s: attached fallback credential pool",
fb_provider, fb_model,
)
except Exception as exc:
logger.debug(
"Fallback to %s/%s: could not attach credential pool: %s",
fb_provider, fb_model, exc,
)
# Honor per-provider / per-model request_timeout_seconds for the
# fallback target (same knob the primary client uses). None = use
@@ -1458,8 +1531,9 @@ def handle_max_iterations(agent, messages: list, api_call_count: int) -> str:
provider_preferences["ignore"] = agent.providers_ignored
if agent.providers_order:
provider_preferences["order"] = agent.providers_order
if agent.provider_sort:
provider_preferences["sort"] = agent.provider_sort
_provider_sort = _validated_openrouter_provider_sort(agent.provider_sort)
if _provider_sort:
provider_preferences["sort"] = _provider_sort
if provider_preferences and (
(agent.provider or "").strip().lower() == "openrouter"
or agent._is_openrouter_url()
@@ -1838,7 +1912,6 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
stream_kwargs = {
**api_kwargs,
"stream": True,
"stream_options": {"include_usage": True},
"timeout": _httpx.Timeout(
connect=_conn_cap,
read=_stream_read_timeout,
@@ -1846,6 +1919,14 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
pool=_conn_cap,
),
}
# OpenAI's `stream_options={"include_usage": True}` drives usage
# accounting on OpenAI-compatible endpoints (incl. the Gemini OpenAI
# compat shim and aggregators like OpenRouter). Google's *native*
# Gemini REST endpoint rejects the keyword outright
# (`Completions.create() got an unexpected keyword argument
# 'stream_options'`), so omit it only for that endpoint.
if not is_native_gemini_base_url(agent.base_url):
stream_kwargs["stream_options"] = {"include_usage": True}
request_client = _set_request_client(
agent._create_request_openai_client(
reason="chat_completion_stream_request",
@@ -2246,7 +2327,15 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
_fire_first_delta()
agent._fire_reasoning_delta(thinking_text)
# Return the native Anthropic Message for downstream processing
# Return the native Anthropic Message for downstream processing.
# If the stream was interrupted (the event loop broke out above on
# agent._interrupt_requested), do NOT call get_final_message() — on
# a partially-consumed stream the SDK may hang draining remaining
# events or return a Message with incomplete tool_use blocks (partial
# JSON in `input`). The outer poll loop raises InterruptedError, so
# this return value is discarded anyway.
if agent._interrupt_requested:
return None
return stream.get_final_message()
def _call():
@@ -2391,12 +2480,19 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
diag=request_client_holder.get("diag"),
)
_close_request_client_once("stream_mid_tool_retry_cleanup")
try:
agent._replace_primary_openai_client(
reason="stream_mid_tool_retry_pool_cleanup"
)
except Exception:
pass
if agent.api_mode == "anthropic_messages":
try:
agent._anthropic_client.close()
agent._rebuild_anthropic_client()
except Exception:
pass
else:
try:
agent._replace_primary_openai_client(
reason="stream_mid_tool_retry_pool_cleanup"
)
except Exception:
pass
continue
# SSE error events from proxies (e.g. OpenRouter sends
@@ -2444,12 +2540,19 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
_close_request_client_once("stream_retry_cleanup")
# Also rebuild the primary client to purge
# any dead connections from the pool.
try:
agent._replace_primary_openai_client(
reason="stream_retry_pool_cleanup"
)
except Exception:
pass
if agent.api_mode == "anthropic_messages":
try:
agent._anthropic_client.close()
agent._rebuild_anthropic_client()
except Exception:
pass
else:
try:
agent._replace_primary_openai_client(
reason="stream_retry_pool_cleanup"
)
except Exception:
pass
continue
# Retries exhausted. Log the final failure with
# full diagnostic detail (chain, headers,
@@ -2561,6 +2664,17 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
_stream_stale_timeout = max(_stream_stale_timeout_base, 240.0)
else:
_stream_stale_timeout = _stream_stale_timeout_base
# Reasoning-model floor: known reasoning models (Nemotron 3 Ultra,
# OpenAI o1/o3, Anthropic Opus 4.x thinking, DeepSeek R1, Qwen QwQ,
# xAI Grok reasoning, etc.) routinely exceed the default 180s chat-
# model threshold during their thinking phase. The cloud gateway
# upstream kills the socket first, surfacing as BrokenPipeError.
# Raises the floor only — never overrides explicit user config
# (handled by get_provider_stale_timeout above).
from agent.reasoning_timeouts import get_reasoning_stale_timeout_floor
_reasoning_floor = get_reasoning_stale_timeout_floor(api_kwargs.get("model"))
if _reasoning_floor is not None:
_stream_stale_timeout = max(_stream_stale_timeout, _reasoning_floor)
t = threading.Thread(target=_call, daemon=True)
t.start()
@@ -2609,10 +2723,17 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
pass
# Rebuild the primary client too — its connection pool
# may hold dead sockets from the same provider outage.
try:
agent._replace_primary_openai_client(reason="stale_stream_pool_cleanup")
except Exception:
pass
if agent.api_mode == "anthropic_messages":
try:
agent._anthropic_client.close()
agent._rebuild_anthropic_client()
except Exception:
pass
else:
try:
agent._replace_primary_openai_client(reason="stale_stream_pool_cleanup")
except Exception:
pass
# Reset the timer so we don't kill repeatedly while
# the inner thread processes the closure.
last_chunk_time["t"] = time.time()
@@ -2688,7 +2809,30 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
role="assistant", content=_partial_text, tool_calls=None,
reasoning_content=None,
)
return SimpleNamespace(
# Detect provider output-layer content filtering (e.g. MiniMax
# "output new_sensitive (1027)", Azure/OpenAI content_filter,
# Anthropic safety refusal). The raw error is about to be
# swallowed into a finish_reason=length stub, so classify it HERE
# while we still have it and stamp the stub. Retrying such a
# content-deterministic filter on the same primary just re-hits
# the filter — the conversation loop reads this tag and activates
# the fallback chain instead of burning continuation retries.
# error_classifier is the single source of truth for "what counts
# as a content filter" (#32421).
_content_filter_terminated = False
try:
from agent.error_classifier import classify_api_error, FailoverReason
_cls = classify_api_error(
result["error"],
provider=str(getattr(agent, "provider", "") or ""),
model=str(getattr(agent, "model", "") or ""),
)
_content_filter_terminated = (
_cls.reason == FailoverReason.content_policy_blocked
)
except Exception:
_content_filter_terminated = False
_stub = SimpleNamespace(
id=PARTIAL_STREAM_STUB_ID,
model=getattr(agent, "model", "unknown"),
choices=[SimpleNamespace(
@@ -2697,6 +2841,9 @@ def interruptible_streaming_api_call(agent, api_kwargs: dict, *, on_first_delta=
usage=None,
_dropped_tool_names=_partial_names or None,
)
if _content_filter_terminated:
_stub._content_filter_terminated = True
return _stub
raise result["error"]
return result["response"]

View File

@@ -60,6 +60,8 @@ from dataclasses import dataclass
from pathlib import Path
from typing import Any, Optional
from hermes_cli._subprocess_compat import IS_WINDOWS, windows_hide_flags
logger = logging.getLogger("hermes.coding_context")
CODING_TOOLSET = "coding"
@@ -83,6 +85,59 @@ _PROJECT_MARKERS = (
# Agent-instruction files surfaced separately from manifests in the snapshot.
_CONTEXT_FILES = ("AGENTS.md", "CLAUDE.md", ".cursorrules")
# Source-file extensions that make a git repo a *code* workspace even with no
# manifest. Without this, `git init` on a notes/writing/research folder (a huge
# non-coding use case) would flip the whole session into the coding posture just
# for having a `.git`. A manifest still wins on its own (see `_PROJECT_MARKERS`).
_CODE_EXTENSIONS = frozenset({
".py", ".pyi", ".ipynb", ".js", ".jsx", ".ts", ".tsx", ".mjs", ".cjs",
".go", ".rs", ".java", ".kt", ".kts", ".scala", ".rb", ".php", ".c", ".h",
".cc", ".cpp", ".hpp", ".cs", ".swift", ".m", ".mm", ".dart", ".ex", ".exs",
".lua", ".sh", ".bash", ".zsh", ".sql", ".vue", ".svelte", ".r", ".jl",
".hs", ".clj", ".erl", ".pl",
})
# Dirs never worth scanning for the code check (deps/build/vcs/venv noise).
_CODE_SCAN_SKIP_DIRS = frozenset({
".git", "node_modules", "venv", ".venv", "__pycache__", "dist", "build",
"target", ".next", ".turbo", "vendor",
})
# Bounded sweep: a code workspace reveals itself in the first handful of entries.
_CODE_SCAN_MAX_ENTRIES = 500
def _has_code_files(root: Path) -> bool:
"""Cheap, bounded check for source files in a repo's top two levels.
Lets a git repo of loose scripts (no manifest) still read as a code
workspace while a bare notes/writing repo does not. Scans the root and its
immediate subdirectories only, capped at ``_CODE_SCAN_MAX_ENTRIES`` stats —
a handful of readdirs at session start, not a full walk.
"""
seen = 0
stack = [(root, True)]
while stack:
directory, is_root = stack.pop()
try:
with os.scandir(directory) as entries:
for entry in entries:
seen += 1
if seen > _CODE_SCAN_MAX_ENTRIES:
return False
name = entry.name
try:
if entry.is_file():
if os.path.splitext(name)[1].lower() in _CODE_EXTENSIONS:
return True
elif is_root and entry.is_dir() and name not in _CODE_SCAN_SKIP_DIRS and not name.startswith("."):
stack.append((Path(entry.path), False))
except OSError:
continue
except OSError:
continue
return False
# Lockfile → package manager, checked in priority order.
_PY_LOCKFILES = (("uv.lock", "uv"), ("poetry.lock", "poetry"), ("Pipfile.lock", "pipenv"))
_JS_LOCKFILES = (
@@ -298,6 +353,29 @@ def _coding_mode(config: Optional[dict[str, Any]]) -> str:
return "auto"
def _coding_instructions(config: Optional[dict[str, Any]]) -> str:
"""Standing operator instructions for the coding posture (config).
``agent.coding_instructions`` — a string or list of strings appended to the
coding brief as an extra stable system block, so a user can pin project-wide
coding-workflow rules (e.g. "for UI work don't run tsc/lint until I approve;
clean the diff before committing") without editing the shipped brief.
Cache-safe: resolved once per session into the stable system-prompt tier,
like the rest of the posture.
"""
if config is None:
try:
from hermes_cli.config import load_config
config = load_config()
except Exception:
config = {}
raw = ((config or {}).get("agent", {}) or {}).get("coding_instructions", "")
if isinstance(raw, (list, tuple)):
return "\n".join(str(item).strip() for item in raw if str(item).strip())
return str(raw or "").strip()
def _resolve_cwd(cwd: Optional[str | Path]) -> Path:
if cwd:
return Path(cwd).expanduser()
@@ -368,10 +446,16 @@ def _detect_profile_name(mode: str, platform: str, cwd_str: str) -> str:
if platform and platform.strip().lower() not in INTERACTIVE_CODING_PLATFORMS:
return GENERAL_PROFILE.name
cwd = Path(cwd_str)
# A recognized project root (manifest / AGENTS.md / .cursorrules) is a code
# workspace on its own — cheap stat checks, no scan.
if _marker_root(cwd) is not None:
return CODING_PROFILE.name
git_root = _git_root(cwd)
if git_root is not None and git_root == _home():
git_root = None # dotfiles repo at $HOME — not a code workspace
if git_root is not None or _marker_root(cwd) is not None:
# A bare git repo only counts when it actually holds code, so `git init` on a
# notes/writing/research folder stays in the general posture.
if git_root is not None and _has_code_files(git_root):
return CODING_PROFILE.name
return GENERAL_PROFILE.name
@@ -398,6 +482,9 @@ class RuntimeMode:
# only to steer edit-format guidance toward the model's family — see
# ``_edit_format_line``. Fixed for the session, so cache-safe.
model: Optional[str] = None
# Standing operator instructions (``agent.coding_instructions``), appended
# as an extra stable system block. Empty unless the user configures it.
instructions: str = ""
@property
def kind(self) -> str:
@@ -444,6 +531,10 @@ class RuntimeMode:
workspace = build_coding_workspace_block(self.cwd)
if workspace:
blocks.append(workspace)
# Operator instructions ride their own block so the brief (block 0) stays
# byte-stable and cache-keyed independently of user config.
if self.instructions:
blocks.append(f"Operator instructions (from config):\n{self.instructions}")
return blocks
def compact_skill_categories(self) -> frozenset[str]:
@@ -496,6 +587,7 @@ def resolve_runtime_mode(
cwd=resolved_cwd,
config_mode=mode,
model=model,
instructions=_coding_instructions(config),
)
@@ -588,12 +680,14 @@ def _enabled_mcp_servers(config: Optional[dict[str, Any]]) -> list[str]:
def _git(cwd: Path, *args: str) -> str:
_popen_kwargs = {"creationflags": windows_hide_flags()} if IS_WINDOWS else {}
try:
out = subprocess.run(
["git", "-C", str(cwd), *args],
capture_output=True,
text=True,
timeout=_GIT_TIMEOUT,
**_popen_kwargs,
)
except (OSError, subprocess.SubprocessError):
return ""

156
agent/context_breakdown.py Normal file
View File

@@ -0,0 +1,156 @@
"""Live session context-window breakdown for UI surfaces.
Estimates how the next provider request is composed: system prompt tiers,
tool schemas, and conversation history. Uses the same rough char/4 heuristic
as ``agent.model_metadata.estimate_request_tokens_rough`` so numbers align
with compression thresholds — not exact tokenizer counts.
"""
from __future__ import annotations
import json
import re
from typing import Any, Dict, List, Optional, Sequence, Tuple
_SKILLS_BLOCK_RE = re.compile(r"<available_skills>.*?</available_skills>", re.DOTALL)
_SUBAGENT_TOOL_NAMES = frozenset({"delegate_task"})
_CATEGORY_COLORS = {
"system_prompt": "var(--context-usage-system)",
"tool_definitions": "var(--context-usage-tools)",
"rules": "var(--context-usage-rules)",
"skills": "var(--context-usage-skills)",
"mcp": "var(--context-usage-mcp)",
"subagent_definitions": "var(--context-usage-subagents)",
"memory": "var(--context-usage-memory)",
"conversation": "var(--context-usage-conversation)",
}
def _chars_to_tokens(text: str) -> int:
if not text:
return 0
return (len(text) + 3) // 4
def _json_tokens(value: Any) -> int:
if not value:
return 0
return _chars_to_tokens(json.dumps(value, ensure_ascii=False))
def _tool_name(tool: dict) -> str:
fn = tool.get("function") if isinstance(tool, dict) else None
if isinstance(fn, dict):
return str(fn.get("name") or "")
return str(tool.get("name") or "")
def _split_tools(tools: Sequence[dict]) -> Tuple[List[dict], List[dict], List[dict]]:
builtin: List[dict] = []
mcp: List[dict] = []
subagent: List[dict] = []
for tool in tools:
name = _tool_name(tool)
if name.startswith("mcp_"):
mcp.append(tool)
elif name in _SUBAGENT_TOOL_NAMES:
subagent.append(tool)
else:
builtin.append(tool)
return builtin, mcp, subagent
def _memory_blocks(agent: Any) -> Tuple[str, str]:
memory_block = ""
user_block = ""
store = getattr(agent, "_memory_store", None)
if store is None:
return memory_block, user_block
try:
if getattr(agent, "_memory_enabled", True):
memory_block = store.format_for_system_prompt("memory") or ""
if getattr(agent, "_user_profile_enabled", True):
user_block = store.format_for_system_prompt("user") or ""
except Exception:
pass
return memory_block, user_block
def _strip_blocks(text: str, *blocks: str) -> str:
out = text
for block in blocks:
if block:
out = out.replace(block, "")
return out.strip()
def compute_session_context_breakdown(
agent: Any,
messages: Optional[List[dict]] = None,
) -> Dict[str, Any]:
"""Return a Cursor-style context usage breakdown for one live agent."""
from agent.model_metadata import estimate_messages_tokens_rough
from agent.system_prompt import build_system_prompt_parts
parts = build_system_prompt_parts(agent)
stable = parts.get("stable", "") or ""
context = parts.get("context", "") or ""
volatile = parts.get("volatile", "") or ""
skills_match = _SKILLS_BLOCK_RE.search(stable)
skills_index = skills_match.group(0) if skills_match else ""
memory_block, user_block = _memory_blocks(agent)
memory_text = "\n\n".join(part for part in (memory_block, user_block) if part).strip()
system_core = _strip_blocks(stable, skills_index)
system_tail = _strip_blocks(volatile, memory_block, user_block)
system_prompt_text = "\n\n".join(part for part in (system_core, system_tail) if part).strip()
tools = list(getattr(agent, "tools", None) or [])
builtin_tools, mcp_tools, subagent_tools = _split_tools(tools)
conversation_tokens = estimate_messages_tokens_rough(messages or [])
categories = [
("system_prompt", "System prompt", _chars_to_tokens(system_prompt_text)),
("tool_definitions", "Tool definitions", _json_tokens(builtin_tools)),
("rules", "Rules", _chars_to_tokens(context)),
("skills", "Skills", _chars_to_tokens(skills_index)),
("mcp", "MCP", _json_tokens(mcp_tools)),
("subagent_definitions", "Subagent definitions", _json_tokens(subagent_tools)),
("memory", "Memory", _chars_to_tokens(memory_text)),
("conversation", "Conversation", conversation_tokens),
]
estimated_total = sum(tokens for _, _, tokens in categories)
comp = getattr(agent, "context_compressor", None)
context_max = int(getattr(comp, "context_length", 0) or 0) if comp else 0
measured_used = int(getattr(comp, "last_prompt_tokens", 0) or 0) if comp else 0
context_used = measured_used if measured_used > 0 else estimated_total
context_percent = (
max(0, min(100, round(context_used / context_max * 100)))
if context_max
else 0
)
return {
"categories": [
{
"color": _CATEGORY_COLORS.get(category_id, "var(--ui-text-tertiary)"),
"id": category_id,
"label": label,
"tokens": tokens,
}
for category_id, label, tokens in categories
if tokens > 0
],
"context_max": context_max,
"context_percent": context_percent,
"context_used": context_used,
"estimated_total": estimated_total,
"model": getattr(agent, "model", "") or "",
}

View File

@@ -890,7 +890,15 @@ class ContextCompressor(ContextEngine):
# This is independent of the abort_on_summary_failure config flag:
# rotating on a broken credential is never the right behavior.
self._last_summary_auth_failure: bool = False
# When a user-configured summary model fails and we recover by
# Set when summary generation ultimately fails due to a transient
# network/connection error (httpx/httpcore connection drop, premature
# stream close, etc.) — distinct from auth failures but treated the
# same way by compress(): ABORT and preserve the session unchanged
# rather than destroy the middle window for a deterministic
# "summary unavailable" marker. Retrying once the network recovers is
# strictly better than discarding context for a transient blip
# (#29559, #25585). Independent of abort_on_summary_failure.
self._last_summary_network_failure: bool = False
# retrying on the main model, record the failure so gateway /
# CLI callers can still warn the user even though compression
# succeeded. Silent recovery would hide the broken config.
@@ -1687,6 +1695,7 @@ This compaction should PRIORITISE preserving all information related to the focu
self._summary_model_fallen_back = False
self._last_summary_error = None
self._last_summary_auth_failure = False
self._last_summary_network_failure = False
return self._with_summary_prefix(summary)
except Exception as e:
# ``call_llm`` raises ``RuntimeError`` for two very different cases:
@@ -1819,6 +1828,15 @@ This compaction should PRIORITISE preserving all information related to the focu
if len(err_text) > 220:
err_text = err_text[:217].rstrip() + "..."
self._last_summary_error = err_text
# A terminal connection/network failure (we reach this branch only
# after any main-model fallback has already been tried or is
# unavailable). Flag it so compress() ABORTS and preserves the
# session unchanged instead of destroying the middle window for a
# placeholder marker — retrying once the network recovers is
# strictly better than dropping context (#29559, #25585). Mirrors
# the auth-failure carve-out; independent of abort_on_summary_failure.
if _is_streaming_closed:
self._last_summary_network_failure = True
logger.warning(
"Failed to generate context summary: %s. "
"Further summary attempts paused for %d seconds.",
@@ -2382,6 +2400,7 @@ This compaction should PRIORITISE preserving all information related to the focu
self._last_aux_model_failure_model = None
self._last_compress_aborted = False
self._last_summary_auth_failure = False
self._last_summary_network_failure = False
# Manual /compress (force=True) bypasses the failure cooldown so the
# user can retry immediately after an auto-compress abort. Without
@@ -2498,15 +2517,21 @@ This compaction should PRIORITISE preserving all information related to the focu
# surface a warning.
# Default is False (historical behavior).
#
# EXCEPTION — auth failures always abort. A 401/403 from the summary
# call means the credential or endpoint is broken (invalid/blocked
# key, or a token pointed at the wrong inference host). Rotating into
# EXCEPTION — auth AND transient network failures always abort. A
# 401/403 from the summary call means the credential or endpoint is
# broken (invalid/blocked key, or a token pointed at the wrong
# inference host). A connection/stream-close error means the network
# blipped at the compaction moment (#29559). In BOTH cases rotating into
# a child session with a placeholder summary on a broken credential
# strands the user on a degraded session for zero benefit — every
# subsequent call fails the same way. So when the failure was an auth
# error we abort regardless of abort_on_summary_failure, preserving
# the conversation unchanged until the credential is fixed.
if not summary and (self.abort_on_summary_failure or self._last_summary_auth_failure):
if not summary and (
self.abort_on_summary_failure
or self._last_summary_auth_failure
or self._last_summary_network_failure
):
n_skipped = compress_end - compress_start
self._last_summary_dropped_count = 0 # nothing actually dropped
self._last_summary_fallback_used = False
@@ -2521,6 +2546,15 @@ This compaction should PRIORITISE preserving all information related to the focu
"with /compress or start fresh with /new.",
n_skipped,
)
elif self._last_summary_network_failure:
logger.warning(
"Summary generation failed with a network/connection "
"error — aborting compression. %d message(s) preserved "
"unchanged; the session was NOT rotated. This is "
"transient: retry with /compress once connectivity "
"recovers, or continue the conversation as-is.",
n_skipped,
)
else:
logger.warning(
"Summary generation failed — aborting compression "

View File

@@ -12,6 +12,7 @@ from pathlib import Path
from typing import Awaitable, Callable
from agent.model_metadata import estimate_tokens_rough
from hermes_cli._subprocess_compat import IS_WINDOWS, windows_hide_flags
_QUOTED_REFERENCE_VALUE = r'(?:`[^`\n]+`|"[^"\n]+"|\'[^\'\n]+\')'
REFERENCE_PATTERN = re.compile(
@@ -290,6 +291,7 @@ def _expand_git_reference(
args: list[str],
label: str,
) -> tuple[str | None, str | None]:
_popen_kwargs = {"creationflags": windows_hide_flags()} if IS_WINDOWS else {}
try:
result = subprocess.run(
["git", *args],
@@ -298,6 +300,7 @@ def _expand_git_reference(
text=True,
timeout=30,
stdin=subprocess.DEVNULL,
**_popen_kwargs,
)
except subprocess.TimeoutExpired:
return f"{ref.raw}: git command timed out (30s)", None
@@ -325,9 +328,9 @@ async def _fetch_url_content(
async def _default_url_fetcher(url: str) -> str:
from tools.web_tools import web_extract_tool
raw = await web_extract_tool([url], format="markdown", use_llm_processing=True)
raw = await web_extract_tool([url], format="markdown")
payload = json.loads(raw)
docs = payload.get("data", {}).get("documents", [])
docs = payload.get("results", [])
if not docs:
return ""
doc = docs[0]
@@ -483,6 +486,7 @@ def _iter_visible_entries(path: Path, cwd: Path, limit: int) -> list[Path]:
def _rg_files(path: Path, cwd: Path, limit: int) -> list[Path] | None:
_popen_kwargs = {"creationflags": windows_hide_flags()} if IS_WINDOWS else {}
try:
result = subprocess.run(
["rg", "--files", str(path.relative_to(cwd))],
@@ -491,6 +495,7 @@ def _rg_files(path: Path, cwd: Path, limit: int) -> list[Path] | None:
text=True,
timeout=10,
stdin=subprocess.DEVNULL,
**_popen_kwargs,
)
except (FileNotFoundError, OSError, subprocess.TimeoutExpired):
return None

View File

@@ -90,6 +90,7 @@ def check_compression_model_feasibility(agent: Any) -> None:
try:
from agent.auxiliary_client import (
_resolve_task_provider_model,
_try_configured_fallback_for_unavailable_client,
get_text_auxiliary_client,
)
from agent.model_metadata import (
@@ -97,10 +98,6 @@ def check_compression_model_feasibility(agent: Any) -> None:
get_model_context_length,
)
client, aux_model = get_text_auxiliary_client(
"compression",
main_runtime=agent._current_main_runtime(),
)
# Best-effort aux provider label for the warning message. The
# configured provider may be "auto", in which case we fall back
# to the client's base_url hostname so the user can still tell
@@ -109,6 +106,19 @@ def check_compression_model_feasibility(agent: Any) -> None:
_aux_cfg_provider, _, _, _, _ = _resolve_task_provider_model("compression")
except Exception:
_aux_cfg_provider = ""
client, aux_model = get_text_auxiliary_client(
"compression",
main_runtime=agent._current_main_runtime(),
)
if client is None or not aux_model:
fb_client, fb_model, fb_label = _try_configured_fallback_for_unavailable_client(
"compression",
_aux_cfg_provider,
)
if fb_client is not None and fb_model:
client, aux_model = fb_client, fb_model
if "(" in fb_label and fb_label.endswith(")"):
_aux_cfg_provider = fb_label.rsplit("(", 1)[1][:-1]
if client is None or not aux_model:
if _aux_cfg_provider and _aux_cfg_provider != "auto":
msg = (
@@ -278,6 +288,29 @@ def replay_compression_warning(agent: Any) -> None:
pass
def conversation_history_after_compression(agent: Any, messages: list) -> Optional[list]:
"""Return the correct flush baseline after a compression boundary.
Legacy compression rotates to a fresh child session. That child has not
seen the compacted transcript through the normal same-turn flush path yet,
so callers must clear ``conversation_history`` to ``None`` and let the next
persistence call write the whole compacted list.
In-place compaction is different: ``archive_and_compact()`` has already
soft-archived the previous active rows and inserted ``messages`` as the new
active live transcript under the same session id. If the same agent turn
continues with ``conversation_history=None``, the identity-based flush path
treats those already-persisted compacted dicts as new and appends them a
second time, doubling the active context and retriggering compression.
A shallow copy is intentional: it captures the current compacted dict
identities as history while allowing later same-turn appends to remain new.
"""
if bool(getattr(agent, "_last_compaction_in_place", False)):
return list(messages)
return None
def compress_context(
agent: Any,
messages: list,

View File

@@ -28,6 +28,7 @@ import uuid
from typing import Any, Dict, List, Optional
from agent.codex_responses_adapter import _summarize_user_message_for_log
from agent.conversation_compression import conversation_history_after_compression
from agent.display import KawaiiSpinner
from agent.error_classifier import FailoverReason, classify_api_error
from agent.iteration_budget import IterationBudget
@@ -35,6 +36,7 @@ from agent.turn_context import build_turn_context
from agent.turn_retry_state import TurnRetryState
from agent.memory_manager import build_memory_context_block
from agent.message_sanitization import (
close_interrupted_tool_sequence,
_repair_tool_call_arguments,
_sanitize_messages_non_ascii,
_sanitize_messages_surrogates,
@@ -55,7 +57,7 @@ from agent.model_metadata import (
)
from agent.process_bootstrap import _install_safe_stdio
from agent.prompt_caching import apply_anthropic_cache_control
from agent.retry_utils import jittered_backoff
from agent.retry_utils import adaptive_rate_limit_backoff, jittered_backoff
from agent.trajectory import has_incomplete_scratchpad
from agent.usage_pricing import estimate_usage_cost, normalize_usage
from hermes_constants import PARTIAL_STREAM_STUB_ID
@@ -501,6 +503,7 @@ def run_conversation(
stream_callback: Optional[callable] = None,
persist_user_message: Optional[str] = None,
persist_user_timestamp: Optional[float] = None,
moa_config: Optional[dict[str, Any]] = None,
) -> Dict[str, Any]:
"""
Run a complete conversation with tool calling until completion.
@@ -523,6 +526,19 @@ def run_conversation(
Returns:
Dict: Complete conversation result with final response and message history
"""
if moa_config is None:
try:
from hermes_cli.moa_config import decode_moa_turn
_decoded_message, _decoded_moa_config = decode_moa_turn(user_message)
if _decoded_moa_config is not None:
user_message = _decoded_message
moa_config = _decoded_moa_config
if persist_user_message is None:
persist_user_message = _decoded_message
except Exception:
pass
# ── Per-turn setup (the prologue) ──
# All once-per-turn setup — stdio guarding, retry-counter resets, user
# message sanitization, todo/nudge hydration, system-prompt restore-or-
@@ -572,6 +588,13 @@ def run_conversation(
compression_attempts = 0
_turn_exit_reason = "unknown" # Diagnostic: why the loop ended
# Per-turn tally of consecutive successful credential-pool token refreshes,
# keyed by (provider, pool-entry-id). A persistent upstream 401 lets
# ``try_refresh_current()`` "succeed" forever on a single-entry OAuth pool,
# so this tally caps same-entry refreshes and lets the fallback chain take
# over instead of spinning. Reset here so each turn starts fresh. See #26080.
agent._auth_pool_refresh_counts = {}
# Optional opt-in runtime: if api_mode == codex_app_server, hand the
# turn to the codex app-server subprocess (terminal/file ops/patching
# all run inside Codex). Default Hermes path is bypassed entirely.
@@ -801,6 +824,28 @@ def run_conversation(
if effective_system:
api_messages = [{"role": "system", "content": effective_system}] + api_messages
if moa_config:
try:
from agent.moa_loop import aggregate_moa_context
_moa_context = aggregate_moa_context(
user_prompt=original_user_message if isinstance(original_user_message, str) else str(original_user_message),
api_messages=api_messages,
reference_models=moa_config.get("reference_models") or [],
aggregator=moa_config.get("aggregator") or {},
temperature=float(moa_config.get("reference_temperature", 0.6) or 0.6),
aggregator_temperature=float(moa_config.get("aggregator_temperature", 0.4) or 0.4),
)
if _moa_context:
for _msg in reversed(api_messages):
if _msg.get("role") == "user":
_base = _msg.get("content", "")
if isinstance(_base, str):
_msg["content"] = _base + "\n\n" + _moa_context
break
except Exception as _moa_exc:
logger.warning("MoA context aggregation failed: %s", _moa_exc)
# Inject ephemeral prefill messages right after the system prompt
# but before conversation history. Same API-call-time-only pattern.
if agent.prefill_messages:
@@ -1122,7 +1167,7 @@ def run_conversation(
# stream. Mirror the ACP exclusion used for Responses
# API upgrade (lines ~1083-1085).
elif (
agent.provider == "copilot-acp"
agent.provider in {"copilot-acp", "moa"}
or str(agent.base_url or "").lower().startswith("acp://copilot")
or str(agent.base_url or "").lower().startswith("acp+tcp://")
):
@@ -1396,10 +1441,12 @@ def run_conversation(
while time.time() < sleep_end:
if agent._interrupt_requested:
agent._vprint(f"{agent.log_prefix}⚡ Interrupt detected during retry wait, aborting.", force=True)
_interrupt_text = f"Operation interrupted during retry ({_failure_hint}, attempt {retry_count}/{max_retries})."
close_interrupted_tool_sequence(messages, _interrupt_text)
agent._persist_session(messages, conversation_history)
agent.clear_interrupt()
return {
"final_response": f"Operation interrupted during retry ({_failure_hint}, attempt {retry_count}/{max_retries}).",
"final_response": _interrupt_text,
"messages": messages,
"api_calls": api_call_count,
"completed": False,
@@ -1652,6 +1699,56 @@ def run_conversation(
if agent.api_mode in {"chat_completions", "bedrock_converse", "anthropic_messages"}:
assistant_message = _trunc_msg
# ── Content-filter stream stall → fallback (#32421) ──
# When the provider's output-layer safety filter (e.g.
# MiniMax "output new_sensitive (1027)", Azure
# content_filter) kills the stream mid-delivery, the
# raw error was classified at the swallow point and the
# stub tagged ``_content_filter_terminated``. This
# filter is content-deterministic — continuation
# retries against the SAME primary just re-hit it and
# burn paid attempts (the loop used to give up with
# "Response remained truncated after 3 continuation
# attempts" and never consult the fallback chain).
# Escalate to the configured fallback BEFORE retrying.
_cf_terminated = getattr(
response, "_content_filter_terminated", False
)
if (
_cf_terminated
and agent._fallback_index < len(agent._fallback_chain)
):
agent._vprint(
f"{agent.log_prefix}🛡️ Content filter terminated "
f"stream — activating fallback provider...",
force=True,
)
agent._emit_status(
"Content filter terminated stream; switching to fallback..."
)
if agent._try_activate_fallback():
# Roll the partial content (if any was already
# appended in a prior continuation pass) back to
# the last clean turn so the fallback provider
# gets a coherent continuation point.
if truncated_response_parts:
messages = agent._get_messages_up_to_last_assistant(messages)
agent._session_messages = messages
length_continue_retries = 0
truncated_response_parts = []
retry_count = 0
compression_attempts = 0
_retry.primary_recovery_attempted = False
_retry.restart_with_rebuilt_messages = True
break
# No fallback available — fall through to normal
# continuation (best-effort, may loop).
agent._vprint(
f"{agent.log_prefix}⚠️ No fallback provider "
f"configured — retrying with same provider "
f"(may re-hit filter)...",
force=True,
)
if assistant_message is not None and not _trunc_has_tool_calls:
length_continue_retries += 1
interim_msg = agent._build_assistant_message(assistant_message, finish_reason)
@@ -1971,9 +2068,21 @@ def run_conversation(
agent.thinking_callback("")
api_elapsed = time.time() - api_start_time
agent._vprint(f"{agent.log_prefix}⚡ Interrupted during API call.", force=True)
agent._persist_session(messages, conversation_history)
interrupted = True
final_response = f"{INTERRUPT_WAITING_FOR_MODEL_PREFIX}{api_elapsed:.1f}s elapsed)."
# Preserve any assistant text already streamed to the user
# before the stop landed. Dropping it leaves history with no
# record of the half-finished reply on screen, so the next turn
# the model "forgets" what it just said — exactly what users hit
# when they stop to redirect mid-response.
_partial = agent._strip_think_blocks(
getattr(agent, "_current_streamed_assistant_text", "") or ""
).strip()
if _partial:
messages.append({"role": "assistant", "content": _partial})
final_response = _partial
else:
final_response = f"{INTERRUPT_WAITING_FOR_MODEL_PREFIX}{api_elapsed:.1f}s elapsed)."
agent._persist_session(messages, conversation_history)
break
except Exception as api_error:
@@ -2207,6 +2316,15 @@ def run_conversation(
# "unknown variant `image_url`, expected `text`".
"unknown variant `image_url`, expected `text`",
"unknown variant image_url, expected text",
# OpenRouter routes a request to upstream endpoints and,
# when none of the candidate endpoints for the model accept
# image input, returns HTTP 404 "No endpoints found that
# support image input". Without this phrase the agent never
# strips the images, the retry loop re-sends the same
# rejected request until exhaustion, and the gateway leaves
# every subsequent message queued behind the stuck turn —
# the P1 in issue #21160. The 404 passes the 4xx gate below.
"no endpoints found that support image input",
)
_err_lower = _err_body.lower()
_looks_like_image_rejection = any(
@@ -2663,10 +2781,12 @@ def run_conversation(
# Check for interrupt before deciding to retry
if agent._interrupt_requested:
agent._vprint(f"{agent.log_prefix}⚡ Interrupt detected during error handling, aborting retries.", force=True)
_interrupt_text = f"Operation interrupted: handling API error ({error_type}: {agent._clean_error_message(str(api_error))})."
close_interrupted_tool_sequence(messages, _interrupt_text)
agent._persist_session(messages, conversation_history)
agent.clear_interrupt()
return {
"final_response": f"Operation interrupted: handling API error ({error_type}: {agent._clean_error_message(str(api_error))}).",
"final_response": _interrupt_text,
"messages": messages,
"api_calls": api_call_count,
"completed": False,
@@ -2776,10 +2896,9 @@ def run_conversation(
approx_tokens=approx_tokens,
task_id=effective_task_id,
)
# Compression created a new session — clear history
# so _flush_messages_to_session_db writes compressed
# messages to the new session, not skipping them.
conversation_history = None
conversation_history = conversation_history_after_compression(
agent, messages
)
if len(messages) < original_len or old_ctx > _reduced_ctx:
agent._buffer_status(
f"🗜️ Context reduced to {_reduced_ctx:,} tokens "
@@ -2791,26 +2910,43 @@ def run_conversation(
# Fall through to normal error handling if compression
# is exhausted or didn't help.
# Eager fallback for rate-limit errors (429 or quota exhaustion).
# When a fallback model is configured, switch immediately instead
# of burning through retries with exponential backoff -- the
# primary provider won't recover within the retry window.
# Eager fallback for rate-limit errors (429 or quota exhaustion)
# and transport errors (connection failure / timeout / provider
# overloaded). Rate limits and billing: switch immediately —
# the primary provider won't recover within the retry window.
# Transport errors: allow 1 retry first (transient hiccups
# recover), then fall back if the provider is truly unreachable.
is_rate_limited = classified.reason in {
FailoverReason.rate_limit,
FailoverReason.billing,
}
if is_rate_limited and agent._fallback_index < len(agent._fallback_chain):
_is_transport_failure = classified.reason in {
FailoverReason.timeout,
FailoverReason.overloaded,
}
_should_fallback = (
is_rate_limited
or (_is_transport_failure and retry_count >= 2)
)
if _should_fallback and agent._fallback_index < len(agent._fallback_chain):
# Don't eagerly fallback if credential pool rotation may
# still recover. See _pool_may_recover_from_rate_limit
# for the single-credential-pool exception. Fixes #11314.
# for the single-credential-pool and CloudCode-quota
# exceptions. Fixes #11314 and #13636.
pool_may_recover = _ra()._pool_may_recover_from_rate_limit(
agent._credential_pool,
provider=agent.provider,
base_url=getattr(agent, "base_url", None),
)
if not pool_may_recover:
if classified.reason == FailoverReason.billing:
agent._buffer_status(
"⚠️ Billing or credits exhausted — switching to fallback provider..."
)
elif _is_transport_failure:
agent._buffer_status(
"⚠️ Provider unreachable — switching to fallback provider..."
)
else:
agent._buffer_status("⚠️ Rate limited — switching to fallback provider...")
if agent._try_activate_fallback(reason=classified.reason):
@@ -2985,10 +3121,9 @@ def run_conversation(
messages, system_message, approx_tokens=approx_tokens,
task_id=effective_task_id,
)
# Compression created a new session — clear history
# so _flush_messages_to_session_db writes compressed
# messages to the new session, not skipping them.
conversation_history = None
conversation_history = conversation_history_after_compression(
agent, messages
)
# Re-estimate tokens after compression. Same-message-count
# compression (tool-result pruning, in-place summarization)
@@ -3152,10 +3287,9 @@ def run_conversation(
messages, system_message, approx_tokens=approx_tokens,
task_id=effective_task_id,
)
# Compression created a new session — clear history
# so _flush_messages_to_session_db writes compressed
# messages to the new session, not skipping them.
conversation_history = None
conversation_history = conversation_history_after_compression(
agent, messages
)
# Re-estimate tokens after compression. Same-message-count
# compression (tool-result pruning, in-place summarization)
@@ -3417,6 +3551,13 @@ def run_conversation(
):
_retry.primary_recovery_attempted = True
retry_count = 0
# Primary transport recovery starts a fresh attempt
# cycle. Re-open fallback state so a follow-on 429 can
# still activate fallback_providers after stale
# pre-recovery fallback/credential-pool bookkeeping.
_retry.has_retried_429 = False
agent._fallback_index = 0
agent._fallback_activated = False
continue
# Try fallback before giving up entirely
if agent._has_pending_fallback():
@@ -3482,6 +3623,65 @@ def run_conversation(
force=True,
)
# Detect thinking-timeout pattern: a known reasoning model
# hit a transport-layer error before the first content
# token arrived. Distinct from _is_stream_drop above
# (which fires for large file-write stream drops) and
# from any classifier reason that's not a transport
# timeout. Reuses the reasoning-model allowlist from
# agent/reasoning_timeouts.py (Fixes #52217) so the
# trigger is consistent with what the per-model
# stale-timeout floor covers. After the classifier
# override at agent/error_classifier.py:720-738 (this
# PR), transport disconnects on reasoning models route
# to FailoverReason.timeout rather than
# context_overflow, so this branch actually fires.
# Detection and message text live in
# agent.thinking_timeout_guidance so they're
# unit-testable without driving the full retry loop.
# (Part 2 of Fixes #52310.)
from agent.thinking_timeout_guidance import (
is_thinking_timeout,
)
_is_thinking_timeout = is_thinking_timeout(
classified,
_model,
error_msg,
)
if _is_thinking_timeout:
agent._vprint(
f"{agent.log_prefix} 💡 The model's thinking "
f"phase exceeded the upstream proxy's idle "
f"timeout before the first content token "
f"arrived. This is a known issue with "
f"reasoning models behind cloud gateways "
f"(NVIDIA NIM, OpenAI, Anthropic, DeepSeek).",
force=True,
)
agent._vprint(
f"{agent.log_prefix} Workarounds in priority order:",
force=True,
)
agent._vprint(
f"{agent.log_prefix} 1. Set "
f"`providers.{_provider}.models.{_model}.stale_timeout_seconds: 900` "
f"in `~/.hermes/config.yaml` to extend the per-call "
f"timeout. (Hermes's built-in floor is 600s for "
f"known reasoning models — if you still see this "
f"after raising, the upstream cap is even shorter.)",
force=True,
)
agent._vprint(
f"{agent.log_prefix} 2. Lower `reasoning_budget` or set "
f"`reasoning_effort: medium` on this model if the provider supports it.",
force=True,
)
agent._vprint(
f"{agent.log_prefix} 3. Use a smaller / faster reasoning "
f"model if the task doesn't require deep thinking.",
force=True,
)
logger.error(
"%sAPI call failed after %s retries. %s | provider=%s model=%s msgs=%s tokens=~%s",
agent.log_prefix, max_retries, _final_summary,
@@ -3498,7 +3698,22 @@ def run_conversation(
_final_response += f"\n\n{_billing_guidance}"
else:
_final_response = f"API call failed after {max_retries} retries: {_final_summary}"
if _is_stream_drop:
if _is_thinking_timeout:
# Thinking-timeout guidance overrides the generic
# stream-drop guidance — the latter is wrong for
# this case (it suggests splitting large file
# writes, which isn't what happened). See the
# reasoning-model override at
# agent/error_classifier.py:720-738 and the
# detection block above for context.
from agent.thinking_timeout_guidance import (
build_thinking_timeout_guidance,
)
_final_response += build_thinking_timeout_guidance(
provider=_provider,
model=_model,
)
elif _is_stream_drop:
_final_response += (
"\n\nThe provider's stream connection keeps "
"dropping — this often happens when generating "
@@ -3530,20 +3745,47 @@ def run_conversation(
_ra_raw = _resp_headers.get("retry-after") or _resp_headers.get("Retry-After")
if _ra_raw:
try:
_retry_after = min(float(_ra_raw), 120) # Cap at 2 minutes
# Cap at 10 minutes. Anthropic Tier 1 input-token
# buckets reset in ~171s, so a 120s cap caused us to
# retry before the actual reset window and re-trip the
# limit. 600s covers all realistic provider reset
# windows while still rejecting pathological values. (#26293)
_retry_after = min(float(_ra_raw), 600)
except (TypeError, ValueError):
pass
wait_time = _retry_after if _retry_after else jittered_backoff(retry_count, base_delay=2.0, max_delay=60.0)
_backoff_policy = None
if is_rate_limited and not _retry_after:
wait_time, _backoff_policy = adaptive_rate_limit_backoff(
retry_count,
base_url=str(_base),
model=_model,
error=api_error,
default_wait=wait_time,
)
if is_rate_limited:
agent._buffer_status(f"⏱️ Rate limited. Waiting {wait_time:.1f}s (attempt {retry_count + 1}/{max_retries})...")
_policy_note = ""
if _backoff_policy == "zai_coding_overload_long":
_policy_note = " (Z.AI Coding overload adaptive long backoff)"
elif _backoff_policy == "zai_coding_overload_short":
_policy_note = " (Z.AI Coding overload short retry)"
_rate_limit_status = f"⏱️ Rate limited. Waiting {wait_time:.1f}s (attempt {retry_count + 1}/{max_retries}){_policy_note}..."
# Normal retries are buffered to avoid noisy transient chatter. Long
# Z.AI Coding waits are different: they can last minutes, so surface
# progress immediately instead of making the TUI look frozen.
if _backoff_policy == "zai_coding_overload_long":
agent._emit_status(_rate_limit_status)
else:
agent._buffer_status(_rate_limit_status)
else:
agent._buffer_status(f"⏳ Retrying in {wait_time:.1f}s (attempt {retry_count}/{max_retries})...")
logger.warning(
"Retrying API call in %ss (attempt %s/%s) %s error=%s",
"Retrying API call in %ss (attempt %s/%s) %s policy=%s error=%s",
wait_time,
retry_count,
max_retries,
agent._client_log_context(),
_backoff_policy or "default",
api_error,
)
# Sleep in small increments so we can respond to interrupts quickly
@@ -3553,10 +3795,12 @@ def run_conversation(
while time.time() < sleep_end:
if agent._interrupt_requested:
agent._vprint(f"{agent.log_prefix}⚡ Interrupt detected during retry wait, aborting.", force=True)
_interrupt_text = f"Operation interrupted: retrying API call after error (retry {retry_count}/{max_retries})."
close_interrupted_tool_sequence(messages, _interrupt_text)
agent._persist_session(messages, conversation_history)
agent.clear_interrupt()
return {
"final_response": f"Operation interrupted: retrying API call after error (retry {retry_count}/{max_retries}).",
"final_response": _interrupt_text,
"messages": messages,
"api_calls": api_call_count,
"completed": False,
@@ -3587,6 +3831,17 @@ def run_conversation(
_retry.restart_with_compressed_messages = False
continue
if _retry.restart_with_rebuilt_messages:
# A content-filter stream stall (#32421) was escalated to the
# fallback chain and the partial content rolled back. Re-issue
# the API call against the now-active fallback provider. Refund
# the budget/count for the stalled attempt so the fallback gets a
# fair turn.
api_call_count -= 1
agent.iteration_budget.refund()
_retry.restart_with_rebuilt_messages = False
continue
if _retry.restart_with_length_continuation:
# Progressively boost the output token budget on each retry.
# Retry 1 → 2× base, retry 2 → 3× base, capped at 32 768.
@@ -4047,6 +4302,19 @@ def run_conversation(
messages.append(assistant_msg)
agent._emit_interim_assistant_message(assistant_msg)
try:
# Persist the assistant tool-call turn before any tool
# side effects run. If a destructive tool restarts or
# terminates Hermes mid-turn, resume logic still sees the
# exact tool-call block that already executed.
agent._flush_messages_to_session_db(messages, conversation_history)
except Exception as exc:
logger.warning(
"Incremental tool-call persistence failed before execution "
"(session=%s): %s",
agent.session_id or "none",
exc,
)
# Close any open streaming display (response box, reasoning
# box) before tool execution begins. Intermediate turns may
@@ -4148,10 +4416,9 @@ def run_conversation(
approx_tokens=agent.context_compressor.last_prompt_tokens,
task_id=effective_task_id,
)
# Compression created a new session — clear history so
# _flush_messages_to_session_db writes compressed messages
# to the new session (see preflight compression comment).
conversation_history = None
conversation_history = conversation_history_after_compression(
agent, messages
)
# Save session log incrementally (so progress is visible even if interrupted)
agent._session_messages = messages
@@ -4193,7 +4460,11 @@ def run_conversation(
"as final response"
)
final_response = _recovered
agent._response_was_previewed = True
# Streaming delivered a fragment, not a confirmed
# final preview. Leave response_previewed false so
# gateway fallback delivery can send the recovered
# text plus the abnormal-turn explanation.
agent._response_was_previewed = False
break
# If the previous turn already delivered real content alongside
@@ -4438,14 +4709,20 @@ def run_conversation(
# status from earlier failed attempts in this turn.
agent._clear_status_buffer()
from agent.agent_runtime_helpers import (
intent_ack_continuation_mode,
)
_ack_mode = intent_ack_continuation_mode(agent)
if (
agent.api_mode == "codex_responses"
_ack_mode != "off"
and agent.valid_tool_names
and codex_ack_continuations < 2
and agent._looks_like_codex_intermediate_ack(
user_message=user_message,
assistant_content=final_response,
messages=messages,
require_workspace=(_ack_mode == "codex_only"),
)
):
codex_ack_continuations += 1
@@ -4476,9 +4753,10 @@ def run_conversation(
final_msg = agent._build_assistant_message(assistant_message, finish_reason)
# Pop thinking-only prefill and empty-response retry
# scaffolding before appending the final response. These
# internal turns are only for the next API retry and should
# not become durable transcript context.
# scaffolding before appending either a final response or a
# verification-stop follow-up. These internal turns are only
# for the next API retry and should not become durable
# transcript context.
while (
messages
and isinstance(messages[-1], dict)
@@ -4490,6 +4768,97 @@ def run_conversation(
):
messages.pop()
try:
from agent.verification_stop import (
build_verify_on_stop_nudge,
verify_on_stop_enabled,
)
if verify_on_stop_enabled():
_verify_nudge = build_verify_on_stop_nudge(
session_id=getattr(agent, "session_id", None),
changed_paths=getattr(agent, "_turn_file_mutation_paths", set()),
attempts=getattr(agent, "_verification_stop_nudges", 0),
)
else:
_verify_nudge = None
except Exception:
logger.debug("verification stop-loop check failed", exc_info=True)
_verify_nudge = None
if _verify_nudge:
agent._verification_stop_nudges = (
getattr(agent, "_verification_stop_nudges", 0) + 1
)
final_msg["finish_reason"] = "verification_required"
messages.append(final_msg)
# Keep the attempted final answer in model history so the
# synthetic user nudge preserves role alternation, but do
# not surface it to the user as an interim answer. The
# whole point of this guard is to prevent premature
# "done" claims before checks run.
messages.append({
"role": "user",
"content": _verify_nudge,
"_verification_stop_synthetic": True,
})
agent._session_messages = messages
# Run the verification-stop loop silently — the nudge is an
# internal turn that should not add noise to the user's
# terminal. Keep a debug breadcrumb in agent.log for tracing.
logger.debug("verification stop-loop nudge issued (attempt %d)",
agent._verification_stop_nudges)
continue
# User verification-loop gate: when the agent edited code this
# turn, let a registered `pre_verify` hook (plugin/shell) keep it
# going one more turn. The shipped guidance is folded into the
# evidence-based verify-on-stop nudge above, so this path has no
# default continuation cost.
_verify_nudge2 = None
_edited = sorted(getattr(agent, "_turn_file_mutation_paths", set()) or [])
_attempt = getattr(agent, "_pre_verify_nudges", 0)
try:
from agent.verify_hooks import max_verify_nudges
from hermes_cli.plugins import get_pre_verify_continue_message, has_hook
if _edited and has_hook("pre_verify") and _attempt < max_verify_nudges():
# Posture is fixed for the session — resolve once + cache.
coding = getattr(agent, "_resolved_is_coding", None)
if coding is None:
from agent.coding_context import is_coding_context
coding = bool(is_coding_context(platform=getattr(agent, "platform", "") or ""))
agent._resolved_is_coding = coding
_verify_nudge2 = get_pre_verify_continue_message(
session_id=getattr(agent, "session_id", None) or "",
platform=getattr(agent, "platform", "") or "",
model=getattr(agent, "model", "") or "",
coding=coding,
attempt=_attempt,
final_response=final_response,
changed_paths=_edited,
)
except Exception:
logger.debug("pre_verify hook check failed", exc_info=True)
_verify_nudge2 = None
if _verify_nudge2:
agent._pre_verify_nudges = _attempt + 1
final_msg["finish_reason"] = "verify_hook_continue"
# Same alternation contract as verify-on-stop: keep the
# attempted answer in history, follow it with a synthetic
# user nudge, and don't surface the premature answer.
messages.append(final_msg)
messages.append({
"role": "user",
"content": _verify_nudge2,
"_pre_verify_synthetic": True,
})
agent._session_messages = messages
logger.debug("pre_verify nudge issued (attempt %d)",
agent._pre_verify_nudges)
continue
messages.append(final_msg)
_turn_exit_reason = f"text_response(finish_reason={finish_reason})"

View File

@@ -21,8 +21,14 @@ from pathlib import Path
from types import SimpleNamespace
from typing import Any
from openai.types.chat.chat_completion_message_tool_call import (
ChatCompletionMessageToolCall,
Function,
)
from agent.file_safety import get_read_block_error, is_write_denied
from agent.redact import redact_sensitive_text
from tools.environments.local import hermes_subprocess_env
ACP_MARKER_BASE_URL = "acp://copilot"
_DEFAULT_TIMEOUT_SECONDS = 900.0
@@ -94,7 +100,10 @@ def _resolve_home_dir() -> str:
def _build_subprocess_env() -> dict[str, str]:
env = os.environ.copy()
# Copilot ACP is a model-driving CLI executor: it legitimately needs LLM
# provider credentials. Route through the central helper so Tier-1 secrets
# (gateway bot tokens, GitHub auth, infra) are still stripped (#29157).
env = hermes_subprocess_env(inherit_credentials=True)
home = _resolve_home_dir()
env["HOME"] = home
from hermes_constants import apply_subprocess_home_env
@@ -224,11 +233,73 @@ def _render_message_content(content: Any) -> str:
return str(content).strip()
def _extract_tool_calls_from_text(text: str) -> tuple[list[SimpleNamespace], str]:
def _build_openai_tool_call(
*,
call_id: str,
name: str,
arguments: str,
) -> ChatCompletionMessageToolCall:
"""Build an OpenAI-compatible tool-call object for downstream handling."""
return ChatCompletionMessageToolCall(
id=call_id,
call_id=call_id,
response_item_id=None,
type="function",
function=Function(name=name, arguments=arguments),
)
def _completion_to_stream_chunks(completion: SimpleNamespace) -> list[SimpleNamespace]:
"""Convert a one-shot ACP response into OpenAI-style stream chunks."""
choice = completion.choices[0]
message = choice.message
tool_call_deltas = None
if message.tool_calls:
tool_call_deltas = []
for index, tool_call in enumerate(message.tool_calls):
tool_call_deltas.append(
SimpleNamespace(
index=index,
id=getattr(tool_call, "id", None),
type=getattr(tool_call, "type", "function"),
function=SimpleNamespace(
name=getattr(tool_call.function, "name", None),
arguments=getattr(tool_call.function, "arguments", None),
),
)
)
delta = SimpleNamespace(
role="assistant",
content=message.content or None,
tool_calls=tool_call_deltas,
reasoning_content=message.reasoning_content,
reasoning=message.reasoning,
)
data_chunk = SimpleNamespace(
choices=[
SimpleNamespace(
index=0,
delta=delta,
finish_reason=choice.finish_reason,
)
],
model=completion.model,
usage=None,
)
usage_chunk = SimpleNamespace(
choices=[],
model=completion.model,
usage=completion.usage,
)
return [data_chunk, usage_chunk]
def _extract_tool_calls_from_text(text: str) -> tuple[list[ChatCompletionMessageToolCall], str]:
if not isinstance(text, str) or not text.strip():
return [], ""
extracted: list[SimpleNamespace] = []
extracted: list[ChatCompletionMessageToolCall] = []
consumed_spans: list[tuple[int, int]] = []
def _try_add_tool_call(raw_json: str) -> None:
@@ -252,12 +323,10 @@ def _extract_tool_calls_from_text(text: str) -> tuple[list[SimpleNamespace], str
call_id = f"acp_call_{len(extracted)+1}"
extracted.append(
SimpleNamespace(
id=call_id,
_build_openai_tool_call(
call_id=call_id,
response_item_id=None,
type="function",
function=SimpleNamespace(name=fn_name.strip(), arguments=fn_args),
name=fn_name.strip(),
arguments=fn_args,
)
)
@@ -376,6 +445,7 @@ class CopilotACPClient:
timeout: float | None = None,
tools: list[dict[str, Any]] | None = None,
tool_choice: Any = None,
stream: bool = False,
**_: Any,
) -> Any:
prompt_text = _format_messages_as_prompt(
@@ -422,11 +492,14 @@ class CopilotACPClient:
)
finish_reason = "tool_calls" if tool_calls else "stop"
choice = SimpleNamespace(message=assistant_message, finish_reason=finish_reason)
return SimpleNamespace(
completion = SimpleNamespace(
choices=[choice],
usage=usage,
model=model or "copilot-acp",
)
if stream:
return _completion_to_stream_chunks(completion)
return completion
def _run_prompt(self, prompt_text: str, *, timeout_seconds: float) -> tuple[str, str]:
try:

View File

@@ -11,6 +11,7 @@ import uuid
import re
from dataclasses import dataclass, fields, replace
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, List, Optional, Set, Tuple
from hermes_constants import OPENROUTER_BASE_URL
@@ -447,6 +448,63 @@ def get_pool_strategy(provider: str) -> str:
DEFAULT_MAX_CONCURRENT_PER_CREDENTIAL = 1
def _write_through_provider_state_to_global_root(
provider_id: str, state: Dict[str, Any]
) -> None:
"""Persist a rotated OAuth ``state`` into the global-root auth.json.
Best-effort write-through for the multi-profile rotation hazard
(#48415 / #43589): nous, openai-codex, and xai-oauth rotate the
refresh_token on refresh, so when a profile pool refresh rotates a grant
it resolved from the root fallback, the rotated chain must land back in
root. Otherwise root keeps a now-revoked refresh token and every other
profile reading the stale root grant dies with ``refresh_token_reused`` /
``invalid_grant`` once its access token expires.
Only updates ``providers.<provider_id>`` in the root store; never touches
the profile store (the caller already saved that). Swallows all errors — a
failed write-through degrades to the pre-existing behavior (root stale), it
must never break the profile's own successful save. Mirrors
``hermes_cli.auth._write_through_xai_oauth_to_global_root`` (which covers
the non-pool xAI refresh path) for the credential-pool refresh path.
"""
try:
global_path = auth_mod._global_auth_file_path()
except Exception:
return
if global_path is None:
# Classic mode (profile == root); the profile save already hit root.
return
# Seat belt: under pytest, refuse to write the real user's
# ~/.hermes/auth.json even when HERMES_HOME points at a profile path
# (mirrors the read-side guard in _load_global_auth_store). Uses the
# unmodified HOME env, not Path.home() which fixtures may monkeypatch.
if os.environ.get("PYTEST_CURRENT_TEST"):
real_home_env = os.environ.get("HOME", "")
if real_home_env:
real_root = Path(real_home_env) / ".hermes" / "auth.json"
try:
if global_path.resolve(strict=False) == real_root.resolve(strict=False):
return
except Exception:
return
try:
if global_path.exists():
global_store = _load_auth_store(global_path)
else:
global_store = {}
if not isinstance(global_store, dict):
return
_store_provider_state(global_store, provider_id, dict(state), set_active=False)
auth_mod._save_auth_store(global_store, global_path)
except Exception as exc: # pragma: no cover - best effort
logger.debug(
"%s pool refresh: write-through to global root failed: %s",
provider_id,
exc,
)
class CredentialPool:
def __init__(self, provider: str, entries: List[PooledCredential]):
self.provider = provider
@@ -479,10 +537,11 @@ class CredentialPool:
self._entries[idx] = new
return
def _persist(self) -> None:
def _persist(self, *, removed_ids: Optional[List[str]] = None) -> None:
write_credential_pool(
self.provider,
[entry.to_dict() for entry in self._entries],
removed_ids=removed_ids,
)
def _is_terminal_auth_failure(
@@ -800,6 +859,28 @@ class CredentialPool:
try:
with _auth_store_lock():
auth_store = _load_auth_store()
# Decide BEFORE writing whether this profile is reading the
# grant from the global root (no own providers.<id> block) vs.
# genuinely shadowing it. A pool refresh rotates single-use
# OAuth refresh tokens, so a profile that resolved the grant
# from root MUST write the rotated chain back to root too —
# otherwise root keeps a revoked refresh token and every other
# profile reading the stale root grant dies with
# refresh_token_reused / invalid_grant once its access token
# expires. This mirrors the xAI write-through in
# hermes_cli.auth._save_xai_oauth_tokens (#43589); the pool
# refresh path is the Codex/xAI analog reported in #48415.
_wt_provider_id = {
"nous": "nous",
"openai-codex": "openai-codex",
"xai-oauth": "xai-oauth",
}.get(self.provider)
write_through_to_root = bool(_wt_provider_id) and not (
isinstance(auth_store.get("providers"), dict)
and isinstance(
auth_store["providers"].get(_wt_provider_id), dict
)
)
if self.provider == "nous":
state = _load_provider_state(auth_store, "nous")
if state is None:
@@ -855,6 +936,10 @@ class CredentialPool:
return
_save_auth_store(auth_store)
if write_through_to_root and _wt_provider_id:
_write_through_provider_state_to_global_root(
_wt_provider_id, state
)
except Exception as exc:
logger.debug("Failed to sync %s pool entry back to auth store: %s", self.provider, exc)
@@ -1040,13 +1125,17 @@ class CredentialPool:
logger.debug(
"Failed to clear terminal xAI OAuth state: %s", clear_exc
)
removed_ids = [
item.id for item in self._entries
if item.source == "loopback_pkce"
]
self._entries = [
item for item in self._entries
if item.source != "loopback_pkce"
]
if self._current_id == entry.id:
self._current_id = None
self._persist()
self._persist(removed_ids=removed_ids)
return None
# For openai-codex: same race as xAI/nous — another Hermes process
# may have consumed the refresh token between our proactive sync
@@ -1106,13 +1195,17 @@ class CredentialPool:
logger.debug(
"Failed to clear terminal Codex OAuth state: %s", clear_exc
)
removed_ids = [
item.id for item in self._entries
if item.source == "device_code"
]
self._entries = [
item for item in self._entries
if item.source != "device_code"
]
if self._current_id == entry.id:
self._current_id = None
self._persist()
self._persist(removed_ids=removed_ids)
return None
# For nous: another process may have consumed the refresh token
# between our proactive sync and the HTTP call. Re-sync from
@@ -1169,13 +1262,17 @@ class CredentialPool:
auth_mod.NOUS_DEVICE_CODE_SOURCE,
f"manual:{auth_mod.NOUS_DEVICE_CODE_SOURCE}",
}
removed_ids = [
item.id for item in self._entries
if item.source in singleton_sources
]
self._entries = [
item for item in self._entries
if item.source not in singleton_sources
]
if self._current_id == entry.id:
self._current_id = None
self._persist()
self._persist(removed_ids=removed_ids)
return None
self._mark_exhausted(entry, None)
return None
@@ -1337,7 +1434,7 @@ class CredentialPool:
pruned_ids = set(entries_to_prune)
self._entries = [e for e in self._entries if e.id not in pruned_ids]
if cleared_any:
self._persist()
self._persist(removed_ids=entries_to_prune)
return available
def _select_unlocked(self) -> Optional[PooledCredential]:
@@ -1511,7 +1608,11 @@ class CredentialPool:
replace(entry, priority=new_priority)
for new_priority, entry in enumerate(self._entries)
]
self._persist()
write_credential_pool(
self.provider,
[entry.to_dict() for entry in self._entries],
removed_ids=[removed.id],
)
if self._current_id == removed.id:
self._current_id = None
return removed
@@ -2173,6 +2274,11 @@ def _seed_custom_pool(pool_key: str, entries: List[PooledCredential]) -> Tuple[b
def load_pool(provider: str) -> CredentialPool:
provider = (provider or "").strip().lower()
raw_entries = read_credential_pool(provider)
disk_ids = {
entry.get("id")
for entry in raw_entries
if isinstance(entry, dict) and entry.get("id")
}
raw_needs_sanitization = any(
isinstance(payload, dict)
and sanitize_borrowed_credential_payload(payload, provider) != payload
@@ -2201,8 +2307,10 @@ def load_pool(provider: str) -> CredentialPool:
changed |= _normalize_pool_priorities(provider, entries)
if changed:
new_ids = {entry.id for entry in entries}
write_credential_pool(
provider,
[entry.to_dict() for entry in sorted(entries, key=lambda item: item.priority)],
removed_ids=disk_ids - new_ids,
)
return CredentialPool(provider, entries)

View File

@@ -273,6 +273,21 @@ def should_run_now(now: Optional[datetime] = None) -> bool:
# Automatic state transitions (pure function, no LLM)
# ---------------------------------------------------------------------------
def _cron_referenced_skills() -> Set[str]:
"""Skill names referenced by any cron job (incl. paused/disabled).
Best-effort: a cron-module import error or corrupt jobs store must never
break the curator, so any failure yields an empty set (no protection,
but no crash).
"""
try:
from cron.jobs import referenced_skill_names as _refs
return _refs()
except Exception as e:
logger.debug("Curator could not read cron skill references: %s", e, exc_info=True)
return set()
def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int]:
"""Walk every curator-managed skill and move active/stale/archived based on
the latest real activity timestamp. Pinned skills are never touched.
@@ -292,6 +307,8 @@ def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int
stale_cutoff = now - timedelta(days=get_stale_after_days())
archive_cutoff = now - timedelta(days=get_archive_after_days())
cron_referenced = _cron_referenced_skills()
counts = {"marked_stale": 0, "archived": 0, "reactivated": 0, "checked": 0, "seeded": 0}
for row in _u.agent_created_report():
@@ -300,6 +317,15 @@ def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int
if row.get("pinned"):
continue
# A skill referenced by any cron job (incl. paused/disabled) is in
# use by definition — resuming or the next fire must find it. The
# scheduler only bumps usage when a job actually fires, so jobs that
# fire less often than archive_after_days, paused jobs, and far-future
# one-shots would otherwise have their skills aged out from under
# them. Treat referenced skills like pinned: never auto-transition.
if name in cron_referenced:
continue
# First sight of a curation-eligible skill with no persisted record
# (e.g. a newly-eligible built-in): anchor its clock to now and defer.
if not row.get("_persisted", True):
@@ -316,6 +342,18 @@ def apply_automatic_transitions(now: Optional[datetime] = None) -> Dict[str, int
current = row.get("state", _u.STATE_ACTIVE)
# Never-used skills (use_count == 0) get a grace floor: don't archive
# one until it is at least stale_after_days old. A use=0 skill is
# absence of evidence, not evidence of staleness — a skill created
# recently may simply not have had its trigger come up yet.
never_used = int(row.get("use_count", 0) or 0) == 0
if never_used and anchor > stale_cutoff:
# Younger than the stale window — leave it alone entirely.
if current == _u.STATE_STALE:
_u.set_state(name, _u.STATE_ACTIVE)
counts["reactivated"] += 1
continue
if anchor <= archive_cutoff and current != _u.STATE_ARCHIVED:
ok, _msg = _u.archive_skill(name)
if ok:
@@ -377,8 +415,10 @@ CURATOR_REVIEW_PROMPT = (
"bodies + `references/`, `templates/`, and `scripts/` subfiles for "
"session-specific detail — not one-session-one-skill micro-entries.\n\n"
"Hard rules — do not violate:\n"
"1. DO NOT touch bundled or hub-installed skills. The candidate list "
"below is already filtered to agent-created skills only.\n"
"1. DO NOT touch bundled, hub-installed, or external-dir skills "
"(`skills.external_dirs`). The candidate list below is already filtered "
"to local curator-managed skills only; external skills are externally "
"owned and read-only to this background curator.\n"
"2. DO NOT delete any skill. Archiving (moving the skill's directory "
"into ~/.hermes/skills/.archive/) is the maximum destructive action. "
"Archives are recoverable; deletion is not.\n"
@@ -388,10 +428,19 @@ CURATOR_REVIEW_PROMPT = (
"back load-bearing UX (slash-command entry points referenced in docs and "
"tips) and are filtered out of the candidate list below — never resurrect "
"one as an archive or absorb target.\n"
"3c. DO NOT archive or prune any skill marked `cron=yes` in the candidate "
"list. A cron job depends on it and will fail to load it on its next "
"run. You MAY still consolidate it into an umbrella — but only because "
"the curator rewrites cron job skill references to follow consolidations; "
"never simply prune it.\n"
"4. DO NOT use usage counters as a reason to skip consolidation. The "
"counters are new and often mostly zero. Judge overlap on CONTENT, "
"not on use_count. 'use=0' is not evidence a skill is valuable; it's "
"absence of evidence either way.\n"
"absence of evidence either way. Corollary: 'use=0' is ALSO not a "
"reason to PRUNE a skill. Never archive a never-used skill (use=0) "
"unless it is at least 30 days old (check last_activity / created date) "
"AND its content is genuinely obsolete or fully absorbed elsewhere — a "
"recently-created skill simply may not have had its trigger come up yet.\n"
"5. DO NOT reject consolidation on the grounds that 'each skill has "
"a distinct trigger'. Pairwise distinctness is the wrong bar. The "
"right bar is: 'would a human maintainer write this as N separate "
@@ -469,8 +518,9 @@ CURATOR_REVIEW_PROMPT = (
"skill, or `absorbed_into=\"\"` when you're truly pruning with no "
"forwarding target. This drives cron-job skill-reference migration — "
"guessing from your YAML summary after the fact is fragile.\n"
" - terminal — mv a sibling into the archive "
"OR move its content into a support subfile\n\n"
" - terminal — move LOCAL candidate content into "
"a support subfile when package integrity requires it; never mv, cp, rm, "
"patch, or rewrite bundled, hub-installed, or external-dir skills\n\n"
"'keep' is a legitimate decision ONLY when the skill is already a "
"class-level umbrella and none of the proposed merges would improve "
"discoverability. 'This is narrow but distinct from its siblings' "
@@ -1410,12 +1460,14 @@ def _render_candidate_list() -> str:
rows = skill_usage.agent_created_report()
if not rows:
return "No agent-created skills to review."
cron_referenced = _cron_referenced_skills()
lines = [f"Agent-created skills ({len(rows)}):\n"]
for r in rows:
lines.append(
f"- {r['name']} "
f"state={r['state']} "
f"pinned={'yes' if r.get('pinned') else 'no'} "
f"cron={'yes' if r['name'] in cron_referenced else 'no'} "
f"activity={r.get('activity_count', 0)} "
f"use={r.get('use_count', 0)} "
f"view={r.get('view_count', 0)} "
@@ -1843,6 +1895,14 @@ def _run_llm_review(prompt: str) -> Dict[str, Any]:
# Disable recursive nudges — the curator must never spawn its own review.
review_agent._memory_nudge_interval = 0
review_agent._skill_nudge_interval = 0
# Tag this fork as autonomous background curation so skill_manage's
# background-review write guard fires. Without this the fork inherits
# the default "assistant_tool" origin, is_background_review() is False,
# and the external/bundled/hub-installed skill_manage guards never
# trigger during the curation pass they exist to protect against.
# turn_context.py binds this onto the write-origin ContextVar at turn
# start (see agent/turn_context.py).
review_agent._memory_write_origin = "background_review"
# Redirect the forked agent's stdout/stderr to /dev/null while it
# runs so its tool-call chatter doesn't pollute the foreground

View File

@@ -6,6 +6,7 @@ Used by AIAgent._execute_tool_calls for CLI feedback.
import logging
import os
import re
import sys
import threading
import time
@@ -15,6 +16,7 @@ from pathlib import Path
from typing import Any
from utils import safe_json_loads
from agent.redact import redact_sensitive_text
from agent.tool_result_classification import file_mutation_result_landed
# ANSI escape codes for coloring tool failure indicators
@@ -177,6 +179,223 @@ def _truncate_preview(text: str, max_len: int | None) -> str:
return text
_SHELL_SILENT_HEADS = {"cd", "pushd", "popd", "export", "set", "unset", "source", ".", "true", "false", ":"}
_SHELL_PIPE_TAIL_HEADS = {"head", "tail", "wc", "sort", "uniq"}
def _shell_basename(head: str) -> str:
return head.rsplit("/", 1)[-1] if head else ""
def _split_shell_words(segment: str) -> list[str]:
words: list[str] = []
buf: list[str] = []
quote: str | None = None
for i, ch in enumerate(segment):
if quote:
buf.append(ch)
if ch == quote and (i == 0 or segment[i - 1] != "\\"):
quote = None
continue
if ch in {"'", '"'}:
quote = ch
buf.append(ch)
continue
if ch.isspace():
if buf:
words.append("".join(buf))
buf = []
continue
buf.append(ch)
if buf:
words.append("".join(buf))
return words
def _strip_shell_pipe_tail(segment: str) -> str:
words = _split_shell_words(segment)
out: list[str] = []
for i, word in enumerate(words):
if word == "|" and _shell_basename(words[i + 1] if i + 1 < len(words) else "") in _SHELL_PIPE_TAIL_HEADS:
break
out.append(word)
return " ".join(out).strip()
def _split_shell_compound(command: str) -> list[str]:
segments: list[str] = []
buf: list[str] = []
quote: str | None = None
i = 0
while i < len(command):
ch = command[i]
if quote:
buf.append(ch)
if ch == quote and (i == 0 or command[i - 1] != "\\"):
quote = None
i += 1
continue
if ch in {"'", '"'}:
quote = ch
buf.append(ch)
i += 1
continue
op_len = 2 if command.startswith("&&", i) or command.startswith("||", i) else 1 if ch in {";", "\n"} else 0
if op_len:
segment = _strip_shell_pipe_tail("".join(buf).strip())
if segment:
segments.append(segment)
buf = []
i += op_len
continue
buf.append(ch)
i += 1
segment = _strip_shell_pipe_tail("".join(buf).strip())
if segment:
segments.append(segment)
return segments
def _shell_head_word(segment: str) -> str:
words = _split_shell_words(segment)
index = 0
while index < len(words) and re.match(r"^[A-Za-z_]\w*=", words[index]):
index += 1
return _shell_basename(words[index] if index < len(words) else "")
def _clean_shell_segment(segment: str) -> str:
words = _split_shell_words(segment)
out: list[str] = []
i = 0
while i < len(words):
word = words[i]
if re.match(r"^\d*(?:>>?|<)$", word):
i += 2
continue
if re.match(r"^\d*(?:>&|<&)\d+$", word) or re.match(r"^\d*>&\d+$", word):
i += 1
continue
out.append(word)
i += 1
return " ".join(out).strip()
def _is_shell_boundary_echo(segment: str) -> bool:
words = _split_shell_words(segment)
if _shell_basename(words[0] if words else "") != "echo":
return False
rest = " ".join(words[1:])
return bool(re.search(r"-{2,}|_exit=|(?:^|\s|=)\$[?{]|PIPESTATUS", rest))
def summarize_shell_command(command: str) -> str:
"""Compact shell wrapper/plumbing for display while preserving raw command elsewhere."""
original = _oneline(command)
if not original:
return ""
segments = _split_shell_compound(original)
if len(segments) <= 1:
return _clean_shell_segment(segments[0] if segments else original) or original
core: list[str] = []
for segment in segments:
cleaned = _clean_shell_segment(segment)
head = _shell_head_word(cleaned)
if cleaned and head not in _SHELL_SILENT_HEADS and not _is_shell_boundary_echo(cleaned):
core.append(cleaned)
if not core:
return original
if len(core) == 1:
return core[0]
count = len(core) - 1
return f"{core[0]} + {count} {'command' if count == 1 else 'commands'}"
def _read_file_line_label(args: dict) -> str:
offset = args.get("offset")
limit = args.get("limit")
if not isinstance(offset, int) or offset <= 0:
return ""
if not isinstance(limit, int) or limit <= 1:
return f"L{offset}"
return f"L{offset}-{offset + limit - 1}"
def redact_browser_typed_text_for_display(value: Any, typed_text: Any) -> Any:
"""Apply secret redaction to browser_type text in display-facing payloads.
Backends sometimes echo the attempted input in error strings or fallback
metadata. When the raw typed value contains a recognizable secret (API
key, token, JWT, etc.) the redacted form differs from the raw value, so we
replace every occurrence of the raw value with its redacted form before a
browser_type result reaches logs, callbacks, the model, or chat history.
Normal typed text (search queries, addresses, form fields) matches no
secret pattern, so it passes through unchanged and stays readable.
Redaction is forced here regardless of the global ``security.redact_secrets``
preference: a typed credential leaking into chat history is a security
boundary, not mere log hygiene.
"""
if typed_text is None:
return value
needle = str(typed_text)
if needle == "":
return value
redacted = redact_sensitive_text(needle, force=True)
if redacted == needle:
# Nothing secret-looking in the typed text; leave payload untouched.
return value
if isinstance(value, str):
return value.replace(needle, redacted)
if isinstance(value, dict):
return {
key: redact_browser_typed_text_for_display(item, typed_text)
for key, item in value.items()
}
if isinstance(value, list):
return [redact_browser_typed_text_for_display(item, typed_text) for item in value]
if isinstance(value, tuple):
return tuple(redact_browser_typed_text_for_display(item, typed_text) for item in value)
return value
def redact_tool_args_for_display(tool_name: str, args: dict | None) -> dict | None:
"""Return a copy of tool args safe for logs/progress UI.
For ``browser_type`` the ``text`` argument is run through the same
secret-pattern redactor used for logs. Recognizable credentials (API
keys, tokens) are masked before the value reaches tool progress
notifications; normal typed text is left intact for debuggability.
"""
if not isinstance(args, dict):
return args
if tool_name == "browser_type" and isinstance(args.get("text"), str):
safe_args = dict(args)
safe_args["text"] = redact_sensitive_text(args["text"], force=True)
return safe_args
return args
def _delegate_task_goal_parts(tasks: Any, *, per_goal_len: int) -> tuple[int, list[str]]:
if not isinstance(tasks, list):
return 0, []
@@ -200,13 +419,14 @@ def build_tool_preview(tool_name: str, args: dict, max_len: int | None = None) -
max_len = _tool_preview_max_len
if not args:
return None
args = redact_tool_args_for_display(tool_name, args) or args
primary_args = {
"terminal": "command", "web_search": "query", "web_extract": "urls",
"read_file": "path", "write_file": "path", "patch": "path",
"search_files": "pattern", "browser_navigate": "url",
"browser_click": "ref", "browser_type": "text",
"image_generate": "prompt", "text_to_speech": "text",
"vision_analyze": "question", "mixture_of_agents": "user_prompt",
"vision_analyze": "question",
"skill_view": "name", "skills_list": "category",
"cronjob": "action",
"execute_code": "code", "delegate_task": "goal",
@@ -253,6 +473,23 @@ def build_tool_preview(tool_name: str, args: dict, max_len: int | None = None) -
else:
return f"planning {len(todos_arg)} task(s)"
if tool_name in {"terminal", "execute_code"}:
key = "code" if tool_name == "execute_code" else "command"
command = args.get(key)
if command is None:
return None
preview = summarize_shell_command(str(command))
return _truncate_preview(preview, max_len) if preview else None
if tool_name == "read_file":
path = args.get("path") or args.get("file") or args.get("filepath")
if path is None:
return None
label = Path(str(path).replace("\\", "/")).name or str(path)
line_label = _read_file_line_label(args)
preview = f"{label} {line_label}".strip()
return _truncate_preview(preview, max_len) if preview else None
if tool_name == "session_search":
query = _oneline(args.get("query", ""))
return f"recall: \"{query[:25]}{'...' if len(query) > 25 else ''}\""
@@ -300,6 +537,122 @@ def build_tool_preview(tool_name: str, args: dict, max_len: int | None = None) -
return preview
# =========================================================================
# Friendly tool labels (human-phrased verbs for built-in tools)
#
# Turns "web_search <query>" into "Searching the web for <query>" — the
# ChatGPT-style "Searching…/Reading…" surface. Curated and built-in only:
# we know each core tool's semantics, so the verb is fixed, not computed.
# Custom/plugin/MCP tools have no entry and fall back to the raw preview.
# =========================================================================
# Each entry maps a built-in tool name to its present-participle verb phrase.
# A trailing space-then-preview is appended by build_tool_label() when the
# tool's argument preview is available (e.g. "Reading docs/api.md").
_TOOL_VERBS: dict[str, str] = {
"web_search": "Searching the web",
"web_extract": "Reading",
"browser_navigate": "Browsing",
"browser_click": "Clicking",
"browser_type": "Typing",
"read_file": "Reading",
"write_file": "Writing",
"patch": "Editing",
"search_files": "Searching files",
"terminal": "Running",
"execute_code": "Running code",
"image_generate": "Generating image",
"video_generate": "Generating video",
"text_to_speech": "Generating speech",
"vision_analyze": "Looking at the image",
"session_search": "Searching past sessions",
"skill_view": "Reading skill",
"skills_list": "Listing skills",
"skill_manage": "Updating skill",
"delegate_task": "Delegating",
"cronjob": "Scheduling",
"clarify": "Asking",
"memory": "Updating memory",
"todo": "Updating tasks",
}
# Verbs that read better without the raw argument preview appended.
_TOOL_VERBS_NO_PREVIEW: frozenset[str] = frozenset({
"skills_list",
"session_search",
})
# Verbs that take a "for" connector before the preview (search-style phrasing):
# "Searching the web for <query>" reads better than "Searching the web <query>".
_TOOL_VERBS_FOR_CONNECTOR: frozenset[str] = frozenset({
"web_search",
"search_files",
})
_friendly_tool_labels: bool = True
def set_friendly_tool_labels(enabled: bool) -> None:
"""Toggle friendly human-phrased tool labels (display.friendly_tool_labels)."""
global _friendly_tool_labels
_friendly_tool_labels = bool(enabled)
def get_friendly_tool_labels() -> bool:
"""Return whether friendly tool labels are enabled."""
return _friendly_tool_labels
def get_tool_verb(tool_name: str) -> str | None:
"""Return the friendly verb for a built-in tool, or None.
Returns None when friendly labels are disabled or the tool has no curated
verb (custom/plugin/MCP tools). Callers that already hold a computed
argument preview can compose ``f"{verb} {preview}"`` themselves; use
:func:`tool_verb_connector` to pick the right joiner.
"""
if not _friendly_tool_labels:
return None
return _TOOL_VERBS.get(tool_name)
def tool_verb_connector(tool_name: str) -> str:
"""Return the connector between a verb and its preview (" for " or " ")."""
return " for " if tool_name in _TOOL_VERBS_FOR_CONNECTOR else " "
def verb_drops_preview(tool_name: str) -> bool:
"""Whether the verb should render alone, without the argument preview."""
return tool_name in _TOOL_VERBS_NO_PREVIEW
def build_tool_label(tool_name: str, args: dict, max_len: int | None = None) -> str | None:
"""Build a human-phrased status label for a tool call.
For built-in tools with a known verb (``web_search`` -> "Searching the
web for ..."), returns the verb optionally followed by the argument
preview. For everything else (custom/plugin/MCP tools, or when friendly
labels are disabled) returns the raw preview, so callers can use this as a
drop-in replacement for :func:`build_tool_preview`.
"""
if not _friendly_tool_labels:
return build_tool_preview(tool_name, args, max_len=max_len)
verb = _TOOL_VERBS.get(tool_name)
if not verb:
return build_tool_preview(tool_name, args, max_len=max_len)
if tool_name in _TOOL_VERBS_NO_PREVIEW:
return verb
preview = build_tool_preview(tool_name, args, max_len=max_len)
if not preview:
return verb
if tool_name in _TOOL_VERBS_FOR_CONNECTOR:
return f"{verb} for {preview}"
return f"{verb} {preview}"
# =========================================================================
# Inline diff previews for write actions
# =========================================================================
@@ -906,6 +1259,7 @@ def get_cute_tool_message(
When *result* is provided the line is checked for failure indicators.
Failed tool calls get a red prefix and an informational suffix.
"""
args = redact_tool_args_for_display(tool_name, args) or args
dur = f"{duration:.1f}s"
is_failure, failure_suffix = _detect_tool_failure(tool_name, result)
skin_prefix = get_skin_tool_prefix()
@@ -943,7 +1297,7 @@ def get_cute_tool_message(
return _wrap(f"┊ 📄 fetch {_trunc(domain, 35)}{extra} {dur}")
return _wrap(f"┊ 📄 fetch pages {dur}")
if tool_name == "terminal":
return _wrap(f"┊ 💻 $ {_trunc(args.get('command', ''), 42)} {dur}")
return _wrap(f"┊ 💻 $ {_trunc(build_tool_preview(tool_name, args) or args.get('command', ''), 42)} {dur}")
if tool_name == "process":
action = args.get("action", "?")
sid = args.get("session_id", "")[:12]
@@ -951,7 +1305,7 @@ def get_cute_tool_message(
"wait": f"wait {sid}", "kill": f"kill {sid}", "write": f"write {sid}", "submit": f"submit {sid}"}
return _wrap(f"┊ ⚙️ proc {labels.get(action, f'{action} {sid}')} {dur}")
if tool_name == "read_file":
return _wrap(f"┊ 📖 read {_path(args.get('path', ''))} {dur}")
return _wrap(f"┊ 📖 read {_trunc(build_tool_preview(tool_name, args) or args.get('path', ''), 42)} {dur}")
if tool_name == "write_file":
return _wrap(f"┊ ✍️ write {_path(args.get('path', ''))} {dur}")
if tool_name == "patch":
@@ -1037,8 +1391,6 @@ def get_cute_tool_message(
return _wrap(f"┊ 🔊 speak {_trunc(args.get('text', ''), 30)} {dur}")
if tool_name == "vision_analyze":
return _wrap(f"┊ 👁️ vision {_trunc(args.get('question', ''), 30)} {dur}")
if tool_name == "mixture_of_agents":
return _wrap(f"┊ 🧠 reason {_trunc(args.get('user_prompt', ''), 30)} {dur}")
if tool_name == "send_message":
return _wrap(f"┊ 📨 send {args.get('target', '?')}: \"{_trunc(args.get('message', ''), 25)}\" {dur}")
if tool_name == "cronjob":

View File

@@ -133,6 +133,31 @@ _RATE_LIMIT_PATTERNS = [
"servicequotaexceededexception",
]
# Patterns that indicate provider-side overload, NOT a per-credential rate
# limit or billing problem. The credential is valid — the server is just
# busy — so the correct recovery is "back off and retry the same key", never
# "rotate the credential" (rotating exhausts the pool while the endpoint is
# still busy; a single-key user has nothing to rotate to). Some providers
# (notably Z.AI / Zhipu) reuse HTTP 429 for server-wide overload, so the 429
# status path matches the body against this list before falling through to
# the rate_limit default. Phrases are kept narrow and overload-flavoured so a
# normal rate-limit message ("you have been rate-limited") doesn't hit this
# bucket. (#14038, #15297)
_OVERLOADED_PATTERNS = [
"overloaded",
"temporarily overloaded",
"service is temporarily overloaded",
"service may be temporarily overloaded",
"server is overloaded",
"server overloaded",
"service overloaded",
"service is overloaded",
"upstream overloaded",
"currently overloaded",
"at capacity",
"over capacity",
]
# Usage-limit patterns that need disambiguation (could be billing OR rate_limit)
_USAGE_LIMIT_PATTERNS = [
"usage limit",
@@ -330,6 +355,14 @@ _CONTENT_POLICY_BLOCKED_PATTERNS = [
# echo back; the underscore form is provider-specific enough.
"content_filter",
"responsibleaipolicyviolation",
# MiniMax output-layer safety filter. The error string is surfaced
# verbatim by MiniMax SDK / OpenAI-compatible endpoints, usually in the
# form "output new_sensitive (1027)" when the model's *output* (often a
# large tool-call argument block) trips the upstream safety filter and
# the SSE stream is truncated mid-flight. ``new_sensitive`` is the
# filter name and is narrow enough that billing / format / auth error
# strings will not collide. See #32421.
"new_sensitive",
]
# Auth patterns (non-status-code signals)
@@ -717,6 +750,26 @@ def classify_api_error(
is_disconnect = any(p in error_msg for p in _SERVER_DISCONNECT_PATTERNS)
if is_disconnect and not status_code:
# Reasoning-model override: a transport disconnect on a reasoning
# model is much more likely the upstream proxy idle-killing a
# long thinking stream than a true context overflow — even on
# large sessions. The default disconnect+large-session routing
# below would otherwise send the user into the compression
# branch (should_compress=True) and silently delete
# conversation history on a phantom context-length error.
# Reasoning models have multi-minute thinking phases that
# routinely exceed the cloud gateway's idle window (NVIDIA
# NIM ~120s — first-party repro at NVIDIA/NemoClaw#4846;
# OpenAI worker / Anthropic stream-idle similar). The
# per-reasoning-model stale-timeout floor in
# agent/reasoning_timeouts.py raises the stale-detector
# threshold to tolerate long thinking, so a true
# transport-layer failure here is recoverable via the retry
# path — not via context compression. Reclassify as timeout.
# (Part 1 of Fixes #52310.)
from agent.reasoning_timeouts import get_reasoning_stale_timeout_floor
if get_reasoning_stale_timeout_floor(model) is not None:
return _result(FailoverReason.timeout, retryable=True)
# Absolute token/message-count thresholds are only a proxy for smaller
# context windows. Large-context sessions can have hundreds of
# messages while still being far below their actual token budget.
@@ -843,7 +896,19 @@ def _classify_by_status(
)
if status_code == 429:
# Already checked long_context_tier above; this is a normal rate limit
# Already checked long_context_tier above. Some providers (notably
# Z.AI / Zhipu) reuse HTTP 429 for server-wide overload — same status
# code as a true per-credential rate limit, but the credential is
# valid and the correct recovery is "back off and retry the same key",
# NOT "rotate the credential" (which exhausts the pool while the
# endpoint is still busy, and does nothing for a single-key user).
# Disambiguate on the error body so an overload 429 takes the
# transient-overload path instead of burning the pool. (#14038)
if any(p in error_msg for p in _OVERLOADED_PATTERNS):
return result_fn(
FailoverReason.overloaded,
retryable=True,
)
return result_fn(
FailoverReason.rate_limit,
retryable=True,
@@ -1194,6 +1259,17 @@ def _classify_by_message(
should_fallback=True,
)
# Overloaded / server-busy patterns — must come BEFORE the rate_limit and
# billing checks so that a message-only "overloaded" (no 503/529 status,
# e.g. some Anthropic-compatible proxies) classifies as a transient
# overload (backoff + retry) instead of falling through to `unknown` or
# incorrectly triggering credential rotation.
if any(p in error_msg for p in _OVERLOADED_PATTERNS):
return result_fn(
FailoverReason.overloaded,
retryable=True,
)
# Billing patterns
if any(p in error_msg for p in _BILLING_PATTERNS):
return result_fn(
@@ -1283,19 +1359,25 @@ def _extract_status_code(error: Exception) -> Optional[int]:
def _extract_error_body(error: Exception) -> dict:
"""Extract the structured error body from an SDK exception."""
body = getattr(error, "body", None)
if isinstance(body, dict):
return body
# Some errors have .response.json()
response = getattr(error, "response", None)
if response is not None:
try:
json_body = response.json()
if isinstance(json_body, dict):
return json_body
except Exception:
pass
"""Extract the structured error body from an SDK exception or its cause chain."""
current = error
for _ in range(5): # Match _extract_status_code() traversal depth.
body = getattr(current, "body", None)
if isinstance(body, dict):
return body
# Some errors have .response.json()
response = getattr(current, "response", None)
if response is not None:
try:
json_body = response.json()
if isinstance(json_body, dict):
return json_body
except Exception:
pass
cause = getattr(current, "__cause__", None) or getattr(current, "__context__", None)
if cause is None or cause is current:
break
current = cause
return {}

View File

@@ -77,15 +77,22 @@ def build_write_denied_prefixes(home: str) -> list[str]:
]
def get_safe_write_root() -> Optional[str]:
"""Return the resolved HERMES_WRITE_SAFE_ROOT path, or None if unset."""
root = os.getenv("HERMES_WRITE_SAFE_ROOT", "")
if not root:
return None
try:
return os.path.realpath(os.path.expanduser(root))
except Exception:
return None
def get_safe_write_roots() -> set[str]:
"""Return resolved HERMES_WRITE_SAFE_ROOT paths. Supports multiple directories
separated by ``os.pathsep`` (``:`` on Unix, ``;`` on Windows).
E.g., ``/opt/data:/var/www/html`` on Unix, ``C:\\data;D:\\www`` on Windows."""
env = os.getenv("HERMES_WRITE_SAFE_ROOT", "")
if not env:
return set()
roots: set[str] = set()
for path in env.split(os.pathsep):
if path:
try:
resolved = os.path.realpath(os.path.expanduser(path))
roots.add(resolved)
except (OSError, ValueError):
continue
return roots
def is_write_denied(path: str) -> bool:
@@ -124,9 +131,15 @@ def is_write_denied(path: str) -> bool:
except Exception:
pass
safe_root = get_safe_write_root()
if safe_root and not (resolved == safe_root or resolved.startswith(safe_root + os.sep)):
return True
safe_roots = get_safe_write_roots()
if safe_roots:
allowed = False
for safe_root in safe_roots:
if resolved == safe_root or resolved.startswith(safe_root + os.sep):
allowed = True
break
if not allowed:
return True
return False

View File

@@ -251,6 +251,78 @@ def _supports_vision_override(
return None
def _resolve_inference_base_url(
cfg: Optional[Dict[str, Any]],
provider: str,
) -> str:
"""Best-effort base URL for the active inference provider."""
try:
from agent.auxiliary_client import _RUNTIME_MAIN_BASE_URL
runtime = str(_RUNTIME_MAIN_BASE_URL or "").strip()
if runtime:
return runtime
except Exception:
pass
if not isinstance(cfg, dict):
return ""
model_cfg_raw = cfg.get("model")
model_cfg: Dict[str, Any] = model_cfg_raw if isinstance(model_cfg_raw, dict) else {}
base_url = str(model_cfg.get("base_url") or "").strip()
if base_url:
return base_url
config_provider = str(model_cfg.get("provider") or "").strip()
candidate_names: set[str] = set()
for p in filter(None, (provider, config_provider)):
candidate_names.add(p)
if p.lower().startswith("custom:"):
candidate_names.add(p.split(":", 1)[1])
else:
candidate_names.add(f"custom:{p}")
providers_cfg = cfg.get("providers")
if isinstance(providers_cfg, dict):
for name in candidate_names:
entry = providers_cfg.get(name)
if isinstance(entry, dict):
bu = str(entry.get("base_url") or "").strip()
if bu:
return bu
custom_providers = cfg.get("custom_providers")
if isinstance(custom_providers, list):
lowered = {n.lower() for n in candidate_names}
for entry_raw in custom_providers:
if not isinstance(entry_raw, dict):
continue
entry_name = str(entry_raw.get("name") or "").strip()
if entry_name not in candidate_names and entry_name.lower() not in lowered:
continue
bu = str(entry_raw.get("base_url") or "").strip()
if bu:
return bu
return ""
def _should_probe_ollama_vision(provider: str, base_url: str) -> bool:
"""True when the active provider likely fronts a local Ollama server."""
p = (provider or "").strip().lower()
if p == "ollama":
return True
if not base_url:
return False
try:
from agent.model_metadata import detect_local_server_type
return detect_local_server_type(base_url) == "ollama"
except Exception:
return False
def _coerce_mode(raw: Any) -> str:
"""Normalize a config value into one of the valid modes."""
if not isinstance(raw, str):
@@ -302,15 +374,33 @@ def _lookup_supports_vision(
return override
if not provider or not model:
return None
caps = None
try:
from agent.models_dev import get_model_capabilities
caps = get_model_capabilities(provider, model)
except Exception as exc: # pragma: no cover - defensive
logger.debug("image_routing: caps lookup failed for %s:%s%s", provider, model, exc)
return None
if caps is None:
return None
return bool(caps.supports_vision)
if caps is not None:
return bool(caps.supports_vision)
base_url = _resolve_inference_base_url(cfg, provider)
if not base_url and (provider or "").strip().lower() == "ollama":
base_url = "http://localhost:11434/v1"
if _should_probe_ollama_vision(provider, base_url):
try:
from agent.model_metadata import query_ollama_supports_vision
ollama_vision = query_ollama_supports_vision(model, base_url)
if ollama_vision is not None:
return ollama_vision
except Exception as exc: # pragma: no cover - defensive
logger.debug(
"image_routing: ollama vision probe failed for %s:%s%s",
provider,
model,
exc,
)
return None
def decide_image_input_mode(
@@ -388,14 +478,98 @@ def _sniff_mime_from_bytes(raw: bytes) -> Optional[str]:
# BMP: "BM"
if raw.startswith(b"BM"):
return "image/bmp"
# HEIC/HEIF: ftypheic / ftypheix / ftypmif1 / ftypmsf1 etc.
if len(raw) >= 12 and raw[4:8] == b"ftyp" and raw[8:12] in {
b"heic", b"heix", b"hevc", b"hevx", b"mif1", b"msf1", b"heim", b"heis",
}:
return "image/heic"
# ISO-BMFF family (HEIC/HEIF/AVIF): bytes 4..8 == 'ftyp', major brand at 8..12
if len(raw) >= 12 and raw[4:8] == b"ftyp":
brand = raw[8:12]
if brand in {b"avif", b"avis"}:
return "image/avif"
if brand in {
b"heic", b"heix", b"hevc", b"hevx",
b"mif1", b"msf1", b"heim", b"heis",
}:
return "image/heic"
# TIFF: II*\0 (little-endian) or MM\0* (big-endian)
if raw[:4] in {b"II*\x00", b"MM\x00*"}:
return "image/tiff"
# ICO: 00 00 01 00 (reserved=0, type=1=icon)
if raw[:4] == b"\x00\x00\x01\x00":
return "image/x-icon"
# SVG: text-based, look for an <svg tag near the start (skip BOM/whitespace)
head = raw[:512].lstrip().lower()
if head.startswith(b"<?xml") or head.startswith(b"<svg"):
if b"<svg" in head:
return "image/svg+xml"
return None
# Formats every major vision provider (Anthropic, OpenAI, Gemini, Bedrock)
# accepts natively. Anything outside this set has to be transcoded to PNG
# before we declare media_type, otherwise the provider returns HTTP 400
# ("Could not process image" / "Unsupported image media type") and the
# whole turn fails with no salvage path.
#
# Discord (and a few other chat platforms) freely accept attachments in
# formats outside this set -- AVIF screenshots from Chromium, HEIC from
# iPhones, TIFF from scanners, BMP from old Windows tools, ICO -- so users
# do hit this in practice. SVG is vector and Pillow cannot rasterize it;
# it is skipped (logged) rather than transcoded.
_UNIVERSALLY_SUPPORTED_MIMES = frozenset({
"image/png", "image/jpeg", "image/gif", "image/webp",
})
def _transcode_to_png(raw: bytes) -> Optional[bytes]:
"""Decode arbitrary image bytes with Pillow and re-encode as PNG.
Returns None if Pillow isn't installed or can't decode the input
(rare formats, corrupted bytes, missing optional decoder plugin for
HEIC/AVIF, or vector formats like SVG). Caller falls back to skipping
the image so the rest of the turn still works.
HEIC/HEIF and AVIF need optional Pillow plugins; we try to register
them on demand and swallow ImportError so a missing plugin just
looks like 'Pillow can't decode this' rather than crashing.
"""
try:
from PIL import Image
except ImportError:
logger.info(
"image_routing: Pillow not installed; cannot transcode "
"non-standard image format to PNG. Install with `pip install Pillow` "
"(and `pillow-heif` / `pillow-avif-plugin` for those formats)."
)
return None
# Optional plugin registration. Silent on failure: an unsupported
# format will just fall through to Image.open raising below.
try:
import pillow_heif # type: ignore
pillow_heif.register_heif_opener()
except Exception:
pass
try:
import pillow_avif # type: ignore # noqa: F401 -- registers AVIF on import
except Exception:
pass
try:
from io import BytesIO
with Image.open(BytesIO(raw)) as im:
# Pick an output mode PNG can serialise. Anything other than
# the standard set gets normalised to RGBA so transparency is
# preserved where the source had it.
if im.mode not in {"RGB", "RGBA", "L", "LA", "P"}:
im = im.convert("RGBA")
buf = BytesIO()
im.save(buf, format="PNG", optimize=False)
return buf.getvalue()
except Exception as exc:
logger.info(
"image_routing: Pillow could not transcode image to PNG -- %s", exc
)
return None
def _guess_mime(path: Path, raw: Optional[bytes] = None) -> str:
"""Return image MIME type for *path*.
@@ -431,8 +605,18 @@ def _file_to_data_url(path: Path) -> Optional[str]:
accept large images (OpenAI 49 MB+, Gemini 100 MB) don't pay a silent
quality tax just because one other provider is stricter.
Returns None only if the file can't be read (missing, permission
denied, etc.); the caller reports those paths in ``skipped``.
Format compatibility IS handled here: if the sniffed MIME isn't one
of ``_UNIVERSALLY_SUPPORTED_MIMES`` (i.e. it's something like AVIF,
HEIC, BMP, TIFF, or ICO that some providers reject outright), we
transcode to PNG with Pillow before declaring media_type. This fixes
the user-visible "Could not process image" HTTP 400 from Anthropic on
Discord-attached AVIF/HEIC/BMP files.
Returns None if the file can't be read OR if the format isn't
universally supported AND Pillow can't transcode it (Pillow missing,
HEIC/AVIF plugin missing, vector format like SVG, corrupt bytes). The
caller reports those paths in ``skipped`` and the rest of the turn
proceeds.
"""
try:
raw = path.read_bytes()
@@ -440,6 +624,22 @@ def _file_to_data_url(path: Path) -> Optional[str]:
logger.warning("image_routing: failed to read %s%s", path, exc)
return None
mime = _guess_mime(path, raw=raw)
if mime not in _UNIVERSALLY_SUPPORTED_MIMES:
transcoded = _transcode_to_png(raw)
if transcoded is None:
logger.warning(
"image_routing: %s is %s which is not accepted by all major "
"vision providers and could not be transcoded to PNG; "
"skipping this attachment.",
path, mime,
)
return None
logger.info(
"image_routing: transcoded %s (%s) -> image/png for provider compatibility",
path.name, mime,
)
raw = transcoded
mime = "image/png"
b64 = base64.b64encode(raw).decode("ascii")
return f"data:{mime};base64,{b64}"

136
agent/learn_prompt.py Normal file
View File

@@ -0,0 +1,136 @@
#!/usr/bin/env python3
"""``/learn`` — build the standards-guided prompt that turns whatever the user
described into a reusable skill.
``/learn`` is open-ended. The user can point it at anything they can describe:
a directory of code, an API doc URL, a workflow they just walked the agent
through in this conversation, or pasted notes. This module builds ONE prompt
that instructs the live agent to:
1. Gather the sources the user named, using the tools it already has
(``read_file`` / ``search_files`` for dirs, ``web_extract`` for URLs, the
current conversation for "what I just did", the user's text for pasted
material).
2. Author a single ``SKILL.md`` via ``skill_manage`` that follows the Hermes
skill-authoring standards (description <=60 chars, the modern section
order, Hermes-tool framing, no invented commands).
There is no separate distillation engine and no model-tool footprint: the
agent does the work with its existing toolset, so this works identically on
local, Docker, and remote terminal backends. Every surface (CLI ``/learn``,
gateway ``/learn``, the dashboard "Learn a skill" panel) calls
:func:`build_learn_prompt` and feeds the result to the agent as a normal turn.
"""
from __future__ import annotations
# The house-style rules, distilled from AGENTS.md "Skill authoring standards
# (HARDLINE)" and the hermes-agent-dev new-skill salvage reference. Embedded in
# the prompt so the agent authors skills the way a maintainer would by hand.
_AUTHORING_STANDARDS = """\
Follow the Hermes skill-authoring standards exactly. These are the same
HARDLINE rules a maintainer enforces in review:
Frontmatter:
- name: lowercase-hyphenated, <=64 chars, no spaces.
- description: ONE sentence, **<=60 characters**, ends with a period. State the
capability, not the implementation. No marketing words (powerful,
comprehensive, seamless, advanced, robust). Do NOT repeat the skill name. If
the description contains a colon, wrap the whole value in double quotes.
This is the most-violated rule and it is NOT cosmetic: the system-prompt
skill index truncates the description to 60 chars and loads it every
session, so anything past char 60 is silently cut and never routes. After
you write the description, COUNT the characters; if it is over 60, cut it
down before saving — do not ship a sentence and hope.
Good (<=60): `Search arXiv papers by keyword, author, or ID.`
Bad (123): `A comprehensive skill that lets the agent search arXiv for
academic papers using keywords, authors, and categories.`
- version: 0.1.0
- author: always the literal value `Hermes`. NEVER fill it from the host
environment — the OS/login username (e.g. the `user=` line in your
environment hints), git config, or any identity you can probe must not be
written. Skills get shared and published, so an environment-derived name is
a privacy leak the user never opted into; the skill names itself as Hermes.
- platforms: declare `[macos]`, `[linux]`, and/or `[windows]` IF the skill
uses OS-bound primitives (osascript/apt/systemctl => the matching OS; /proc,
os.setsid, signal.SIGKILL => linux; fcntl/termios => POSIX). Prefer fixing it
cross-platform first (tempfile.gettempdir(), pathlib.Path, psutil); gate only
when the dependency is genuinely platform-bound. Omit the field for portable
skills.
- metadata.hermes.tags: a few Capitalized, Relevant, Tags.
Body section order (omit a section only if it genuinely has no content):
1. "# <Human Title>" then a 2-3 sentence intro: what it does, what it does NOT
do, and the key dependency stance (e.g. "stdlib only").
2. "## When to Use" — bullet list of concrete trigger phrases.
3. "## Prerequisites" — exact env vars, install steps, credentials.
4. "## How to Run" — the canonical invocation, framed through Hermes tools.
5. "## Quick Reference" — a flat command/endpoint list, no narration.
6. "## Procedure" — numbered steps with copy-paste-exact commands.
7. "## Pitfalls" — known limits, rate limits, things that look broken but aren't.
8. "## Verification" — a single command/check that proves the skill worked.
Hermes-tool framing (this is what makes it a skill, not shell docs):
- Frame running scripts as "invoke through the `terminal` tool".
- Reference Hermes tools by name in backticks: `terminal`, `read_file`,
`write_file`, `search_files`, `patch`, `web_extract`, `web_search`,
`vision_analyze`, `browser_navigate`, `delegate_task`, `image_generate`,
`text_to_speech`, `cronjob`, `memory`, `skill_view`, `execute_code`.
- Do NOT name shell utilities the agent already has wrapped: say `read_file`
not cat/head/tail, `search_files` not grep/rg/find/ls, `patch` not sed/awk,
`web_extract` not curl-to-scrape, `write_file` not echo>file or heredocs.
- Third-party CLIs (ffmpeg, gh, an SDK) are fine inside a script file, but the
prose still frames them as "invoke through the `terminal` tool". If the
skill needs an MCP server, name it and document its setup in Prerequisites.
Quality bar:
- Prefer exact commands, endpoint URLs, function signatures, and config keys
that appear VERBATIM in the source. NEVER invent flags, paths, or APIs — if
you didn't see it in the source, don't write it.
- Keep it tight and scannable: ~100 lines for a simple skill, ~200 for a
complex one. Don't re-paste the source docs.
- Don't write a router/index/hub skill that only points at other skills.
- Larger scripts/parsers belong in a `scripts/` file (add via
`skill_manage` write_file), referenced from SKILL.md by relative path — not
inlined for the agent to re-type every run. References go in `references/`,
templates in `templates/`."""
def build_learn_prompt(user_request: str) -> str:
"""Build the agent prompt for an open-ended ``/learn`` request.
Args:
user_request: the free-text the user gave after ``/learn`` — a
description of the workflow, paths, URLs, or "what I just did".
Returns:
A complete instruction the agent runs as a normal turn. The agent
gathers the described sources with its existing tools and authors the
skill via ``skill_manage``.
"""
req = (user_request or "").strip()
if not req:
req = (
"the workflow we just went through in this conversation — review "
"the steps taken and distill them into a reusable skill"
)
return (
"[/learn] The user wants you to learn a reusable skill from the "
"source(s) they described below, and save it.\n\n"
f"WHAT TO LEARN FROM:\n{req}\n\n"
"Do this:\n"
"1. Gather the material. Resolve whatever the user named using the "
"tools you already have — `read_file`/`search_files` for local files "
"or directories, `web_extract` for URLs, the current conversation "
"history if they referred to something you just did, and the text "
"they pasted as-is. If the request is ambiguous about scope, make a "
"reasonable choice and note it; do not stall.\n"
"2. Author ONE SKILL.md and save it with the `skill_manage` tool "
"(action=\"create\"). Pick a sensible category. If the procedure needs "
"a non-trivial script, add it under the skill's `scripts/` with "
"`skill_manage` write_file and reference it by relative path.\n\n"
f"{_AUTHORING_STANDARDS}\n\n"
"When done, tell the user the skill name, its category, and a "
"one-line summary of what it captured."
)

View File

@@ -46,6 +46,39 @@ logger = logging.getLogger(__name__)
_SYNC_DRAIN_TIMEOUT_S = 5.0
def normalize_tool_schema(schema: Any) -> Optional[Dict[str, Any]]:
"""Return a function-tool dict with a resolvable top-level ``name``.
Context engines and memory providers expose tool schemas via
``get_tool_schemas()``. The expected shape is a bare function schema
(``{"name": ..., "description": ..., "parameters": ...}``) which callers
wrap as ``{"type": "function", "function": schema}``.
Some providers instead return an entry that is *already* in OpenAI tool
form (``{"type": "function", "function": {"name": ...}}``). Wrapping that
a second time produces ``{"type": "function", "function": {"type":
"function", "function": {...}}}`` whose ``function`` has no top-level
``name``. Strict providers (e.g. DeepSeek) reject the *entire* request
with ``tools[N].function: missing field name`` (HTTP 400), so one bad
schema disables the whole toolset and breaks every turn (#47707).
This helper normalizes both shapes to the bare function schema and
returns ``None`` for anything without a resolvable name, so callers can
skip-with-warning rather than appending a nameless tool.
"""
if not isinstance(schema, dict):
return None
# Unwrap an already-wrapped OpenAI tool entry.
if schema.get("type") == "function" and isinstance(schema.get("function"), dict):
schema = schema["function"]
if not isinstance(schema, dict):
return None
name = schema.get("name", "")
if not name or not isinstance(name, str):
return None
return schema
def memory_provider_tools_enabled(enabled_toolsets: Optional[List[str]]) -> bool:
"""Return whether external memory-provider tools should be exposed."""
if enabled_toolsets is None:
@@ -92,11 +125,17 @@ def inject_memory_provider_tools(agent: Any) -> int:
agent.valid_tool_names = valid_tool_names
added = 0
for schema in get_schemas():
if not isinstance(schema, dict):
for raw_schema in get_schemas():
schema = normalize_tool_schema(raw_schema)
if schema is None:
logger.warning(
"Memory provider returned a tool schema with no resolvable "
"name; skipping to avoid poisoning the request (%r)",
raw_schema,
)
continue
tool_name = schema.get("name", "")
if not tool_name or tool_name in existing_tool_names:
tool_name = schema["name"]
if tool_name in existing_tool_names:
continue
tools.append({"type": "function", "function": schema})
valid_tool_names.add(tool_name)
@@ -370,8 +409,11 @@ class MemoryManager:
_core_tool_names = set(_HERMES_CORE_TOOLS)
# Index tool names → provider for routing
for schema in provider.get_tool_schemas():
tool_name = schema.get("name", "")
for raw_schema in provider.get_tool_schemas():
schema = normalize_tool_schema(raw_schema)
if schema is None:
continue
tool_name = schema["name"]
if tool_name in _core_tool_names:
logger.warning(
"Memory provider '%s' tool '%s' shadows a reserved core "
@@ -658,11 +700,19 @@ class MemoryManager:
seen = set()
for provider in self._providers:
try:
for schema in provider.get_tool_schemas():
name = schema.get("name", "")
for raw_schema in provider.get_tool_schemas():
schema = normalize_tool_schema(raw_schema)
if schema is None:
logger.warning(
"Memory provider '%s' returned a tool schema with "
"no resolvable name; skipping (%r)",
provider.name, raw_schema,
)
continue
name = schema["name"]
if name in _core_tool_names:
continue
if name and name not in seen:
if name not in seen:
schemas.append(schema)
seen.add(name)
except Exception as e:

View File

@@ -279,6 +279,38 @@ def _repair_tool_call_arguments(raw_args: str, tool_name: str = "?") -> str:
return "{}"
def close_interrupted_tool_sequence(messages: list, final_response: Any = None) -> bool:
"""Append a synthetic assistant turn when an interrupted tail is a tool result.
A turn cut short by ``/stop`` can leave the transcript ending on a raw
``tool`` message (a tool finished, or its execution was cancelled, but the
model never streamed a closing assistant turn). Persisting that tail means
the next user message lands as ``… tool → user`` — a role-alternation
violation that strict providers (Gemini, Claude) react to by hallucinating
a continuation of the user's message and ignoring prior context, which
reads to the user as "lost context" (#48879).
``finalize_turn`` closes this on the happy interrupt path, but the
retry/backoff/error interrupt aborts in ``conversation_loop`` ``return``
early and never reach it — this shared helper closes the sequence on all of
them. ``final_response`` is usually empty on an interrupt, so an explicit
placeholder is used rather than an empty-content assistant turn.
Mutates ``messages`` in place. Returns True if a closing turn was appended.
"""
if not messages:
return False
last = messages[-1]
if not isinstance(last, dict) or last.get("role") != "tool":
return False
text = final_response if isinstance(final_response, str) else ""
messages.append({
"role": "assistant",
"content": text.strip() or "Operation interrupted.",
})
return True
def _strip_non_ascii(text: str) -> str:
"""Remove non-ASCII characters, replacing with closest ASCII equivalent or removing.
@@ -431,6 +463,7 @@ def _sanitize_structure_non_ascii(payload: Any) -> bool:
__all__ = [
"_SURROGATE_RE",
"close_interrupted_tool_sequence",
"_sanitize_surrogates",
"_sanitize_structure_surrogates",
"_sanitize_messages_surrogates",

586
agent/moa_loop.py Normal file
View File

@@ -0,0 +1,586 @@
"""Mixture-of-Agents runtime helpers for /moa turns.
The slash command is deliberately not a model tool. It marks one user turn as
MoA-enabled; the normal Hermes agent loop still owns tool calling and turn
termination, while this module gathers reference-model context before each model
iteration.
"""
from __future__ import annotations
import hashlib
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import Any
from agent.auxiliary_client import call_llm
from agent.transports import get_transport
logger = logging.getLogger(__name__)
# Upper bound on concurrent reference-model calls. References are independent
# advisory calls (no tools, no inter-dependence), so we fan them out the same
# way delegate_task runs a batch: all in flight at once, results collected when
# every reference finishes. Presets rarely list more than a handful of
# references; this cap just protects against a pathologically large preset
# opening dozens of sockets at once.
_MAX_REFERENCE_WORKERS = 8
# Per-tool-result character budget for the advisory reference view. Tool
# results can be huge (a full diff, a 5000-line file dump); replaying them
# verbatim per reference per tool-loop step would blow the reference model's
# context window and cost. We keep the agent's *actions* (tool calls) in full —
# they are cheap, high-signal, and tell the reference what the agent did — but
# preview each tool *result* head+tail so the reference still sees what came
# back without replaying megabytes. The acting aggregator always gets the full,
# untrimmed transcript; this budget only shapes the advisory copy.
_REFERENCE_TOOL_RESULT_BUDGET = 4000
# System prompt prepended to every reference-model call. References are
# advisory — they do NOT act, call tools, or own the task. Without this
# framing a reference receives the bare trimmed conversation and assumes it is
# the acting agent: it then refuses ("I can't access repositories / URLs from
# here") or tries to call tools it doesn't have. The prompt reframes the model
# as an analyst whose job is to reason about the presented state and hand its
# best thinking to the aggregator/orchestrator that will actually act.
_REFERENCE_SYSTEM_PROMPT = (
"You are a reference advisor in a Mixture of Agents (MoA) process. You are "
"NOT the acting agent and you do NOT execute anything: you cannot call "
"tools, run commands, browse, or access files, repositories, or URLs, and "
"you should not try to or apologize for being unable to. A separate "
"aggregator/orchestrator model holds those capabilities and will take the "
"actual actions.\n\n"
"The conversation below is the current state of a task handled by that "
"acting agent. Your job is to give your most intelligent analysis of that "
"state: understand the goal, reason about the problem, and advise on what "
"to do next. Surface the best approach, concrete next steps and tool-use "
"strategy, likely pitfalls and risks, and anything the acting agent may "
"have missed or gotten wrong. Assume any referenced files, URLs, or "
"systems exist and reason about them from the context given rather than "
"asking for access.\n\n"
"Respond with your advice directly — no preamble, no disclaimers about "
"tools or access. Your response is private guidance handed to the "
"aggregator, not an answer shown to the user."
)
def _slot_label(slot: dict[str, str]) -> str:
return f"{slot.get('provider', '').strip()}:{slot.get('model', '').strip()}"
def _slot_runtime(slot: dict[str, str]) -> dict[str, Any]:
"""Resolve a reference/aggregator slot to real runtime call kwargs.
A MoA slot is just a model selection — it must be called the same way any
model is called elsewhere, not through a bare ``call_llm(provider=...,
model=...)`` that leaves base_url/api_key/api_mode unresolved and lets the
auxiliary auto-detector guess. We route the slot's provider through
``resolve_runtime_provider`` (the canonical provider→api_mode/base_url/
api_key resolver the CLI, gateway, and delegate_task all use), so the slot
gets its provider's real API surface — e.g. MiniMax → anthropic_messages,
GPT-5/o-series → max_completion_tokens, custom endpoints → their base_url.
Returns the kwargs to pass through to ``call_llm`` (provider/model plus the
resolved base_url/api_key when available). Falls back to the bare
provider/model on any resolution error so a misconfigured slot still
attempts the call rather than aborting the whole MoA turn.
"""
provider = str(slot.get("provider") or "").strip()
model = str(slot.get("model") or "").strip()
out: dict[str, Any] = {"provider": provider, "model": model}
try:
from hermes_cli.runtime_provider import resolve_runtime_provider
rt = resolve_runtime_provider(requested=provider, target_model=model)
resolved_provider = str(rt.get("provider") or provider).strip().lower()
# call_llm treats an explicit base_url as a custom endpoint. That is
# correct for ordinary OpenAI-compatible targets, but wrong for OAuth /
# provider-backed targets whose provider branch adds auth refresh,
# request metadata, or request-shape adapters. Keep those providers
# identified by name.
if resolved_provider in {"nous", "openai-codex", "xai-oauth"}:
return out
# Pass the resolved endpoint through so call_llm builds the request for
# the provider's actual API surface instead of auto-detecting. base_url
# routes call_llm to the right adapter (incl. anthropic_messages mode);
# api_key is the resolved credential for that provider.
if rt.get("base_url"):
out["base_url"] = rt["base_url"]
if rt.get("api_key"):
out["api_key"] = rt["api_key"]
except Exception as exc: # pragma: no cover - defensive
logger.debug("MoA slot runtime resolution failed for %s: %s", _slot_label(slot), exc)
return out
def _run_reference(
slot: dict[str, str],
ref_messages: list[dict[str, Any]],
*,
temperature: float | None = None,
max_tokens: int | None = None,
) -> tuple[str, str]:
"""Call one reference model and return ``(label, text)``.
The slot is resolved to its provider's real runtime (via ``_slot_runtime``)
and called through the same ``call_llm`` request-building path any model
uses, so per-model wire-format handling (anthropic_messages,
max_completion_tokens, fixed/forbidden temperature) applies identically to
a reference as it would if that model were the acting model. MoA imposes no
cap of its own (``max_tokens`` defaults to ``None`` → omitted → the model's
real maximum); ``temperature`` is only the user's configured preset value,
which call_llm may still override per model.
Never raises: a failed reference becomes a labelled note so the aggregator
can still act with partial context. Designed to run inside a thread pool —
``call_llm`` is synchronous/blocking, so threads (not asyncio) are the right
concurrency primitive, mirroring ``delegate_task``'s batch fan-out.
"""
label = _slot_label(slot)
try:
# Prepend the advisory-role system prompt so the reference understands
# it is analyzing state for an aggregator, not acting on the task. The
# trimmed view (_reference_messages) already strips the agent's own
# system prompt, so this is the only system message the reference sees.
messages = [{"role": "system", "content": _REFERENCE_SYSTEM_PROMPT}, *ref_messages]
response = call_llm(
task="moa_reference",
messages=messages,
temperature=temperature,
max_tokens=max_tokens,
**_slot_runtime(slot),
)
return label, _extract_text(response) or "(empty response)"
except Exception as exc:
logger.warning("MoA reference model %s failed: %s", label, exc)
return label, f"[failed: {exc}]"
def _run_references_parallel(
reference_models: list[dict[str, str]],
ref_messages: list[dict[str, Any]],
*,
temperature: float | None = None,
max_tokens: int | None = None,
) -> list[tuple[str, str]]:
"""Fan out all reference models in parallel, returning outputs in order.
Like ``delegate_task``'s batch mode, every reference is dispatched at once
and we block until all of them finish before handing the joined results to
the aggregator. Output order matches ``reference_models`` so the
``Reference {idx}`` labelling stays stable. MoA presets that reference
another MoA preset are skipped here (recursion guard) with a labelled note.
"""
if not reference_models:
return []
results: list[tuple[str, str] | None] = [None] * len(reference_models)
futures = {}
workers = min(_MAX_REFERENCE_WORKERS, len(reference_models))
with ThreadPoolExecutor(max_workers=workers) as executor:
for idx, slot in enumerate(reference_models):
if slot.get("provider") == "moa":
results[idx] = (
_slot_label(slot),
"[skipped: MoA presets cannot recursively reference MoA]",
)
continue
futures[
executor.submit(
_run_reference,
slot,
ref_messages,
temperature=temperature,
max_tokens=max_tokens,
)
] = idx
# Collect every reference before returning — the aggregator needs the
# complete set, so there is no early-exit / first-completed path here.
for future, idx in futures.items():
results[idx] = future.result()
return [r for r in results if r is not None]
def _truncate_tool_result(text: str, budget: int = _REFERENCE_TOOL_RESULT_BUDGET) -> str:
"""Head+tail preview of a tool result for the advisory view.
Keeps the first and last halves of the budget with a ``[... N chars
omitted ...]`` marker between them, so a reference sees both how the result
started and how it ended without replaying the whole payload.
"""
if not text or len(text) <= budget:
return text
half = budget // 2
omitted = len(text) - 2 * half
return f"{text[:half]}\n[... {omitted} chars omitted ...]\n{text[-half:]}"
def _render_tool_calls(tool_calls: Any) -> str:
"""Render an assistant turn's tool_calls as readable text lines.
The advisory view cannot carry real ``tool_calls`` payloads (strict
providers reject tool_calls the reference never produced), so the agent's
actions are flattened to text the reference can read and reason about.
"""
lines: list[str] = []
for tc in tool_calls or []:
fn = (tc.get("function") or {}) if isinstance(tc, dict) else {}
name = fn.get("name") or (tc.get("name") if isinstance(tc, dict) else "") or "tool"
args = fn.get("arguments")
if isinstance(args, str):
args_text = args
elif args is not None:
try:
import json
args_text = json.dumps(args, ensure_ascii=False)
except Exception:
args_text = str(args)
else:
args_text = ""
lines.append(f"[called tool: {name}({args_text})]" if args_text else f"[called tool: {name}]")
return "\n".join(lines)
def _reference_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
"""Build an advisory view of the conversation for reference models.
A reference gives an INFORMED judgement on the current state, so it must
see what the agent actually did — its tool calls AND the tool results that
came back — not just the agent's narration. We therefore preserve the whole
conversation flow, but flatten it into clean user/assistant *text* turns:
- system prompt: dropped (8K of Hermes boilerplate, not advisory signal).
- assistant turns: kept; any ``tool_calls`` are rendered inline as
``[called tool: name(args)]`` text lines appended to the turn's text.
- ``tool``-role results: NOT dropped. Each is folded (head+tail preview,
see ``_truncate_tool_result``) into the *preceding* assistant turn as a
``[tool result: ...]`` block, so the reference sees what came back.
This emits ZERO ``tool``-role messages and ZERO ``tool_calls`` arrays — only
plain user/assistant text — so strict providers (Mistral, Fireworks) that
reject orphan tool messages / unproduced tool_calls don't 400, while the
reference still has the full picture.
The view MUST end with a ``user`` turn. Anthropic (and OpenRouter→Anthropic)
interpret a trailing assistant turn as an assistant *prefill* to continue,
and no-prefill models (e.g. Claude Opus 4.8) reject it with
``400 ... must end with a user message``. Rather than DELETE the agent's
latest context to satisfy that (which would blind the reference to the
current state), we APPEND a synthetic user turn asking the reference to
judge the state above. End-on-user is satisfied and no context is lost.
The acting aggregator always receives the full, untrimmed transcript; this
function only shapes the disposable advisory copy.
"""
advisory_instruction = (
"[The conversation above is the current state of the task. Give your "
"most intelligent judgement: what is going on, what should happen next, "
"what risks or mistakes you see, and how the acting agent should "
"proceed.]"
)
rendered: list[dict[str, Any]] = []
last_user_content: str | None = None
for msg in messages:
role = msg.get("role")
content = msg.get("content")
text = content if isinstance(content, str) else ""
if role == "system":
continue
if role == "user":
if text.strip():
last_user_content = text
rendered.append({"role": "user", "content": text})
elif role == "assistant":
parts: list[str] = []
if text.strip():
parts.append(text.strip())
calls_text = _render_tool_calls(msg.get("tool_calls"))
if calls_text:
parts.append(calls_text)
# Empty assistant turns (no text, no calls) carry nothing advisory.
if parts:
rendered.append({"role": "assistant", "content": "\n".join(parts)})
elif role == "tool":
# Fold the tool result into the preceding assistant turn as text so
# the reference sees what came back, without emitting a tool-role
# message a reference never produced.
result_text = _truncate_tool_result(text)
block = f"[tool result: {result_text}]"
if rendered and rendered[-1].get("role") == "assistant":
rendered[-1]["content"] = rendered[-1]["content"] + "\n" + block
else:
# No assistant turn to attach to (e.g. a leading tool result);
# keep it as advisory context on its own assistant-role line.
rendered.append({"role": "assistant", "content": block})
# Any other role is ignored.
# End on a user turn: append a synthetic advisory request rather than
# deleting the agent's latest assistant context. This satisfies Anthropic's
# no-trailing-assistant-prefill rule while preserving full state.
if rendered and rendered[-1].get("role") == "assistant":
rendered.append({"role": "user", "content": advisory_instruction})
elif rendered and rendered[-1].get("role") == "user":
# Already ends on a user turn (fresh user prompt, no agent action yet).
# Leave it — the reference answers that prompt directly.
pass
if not rendered:
# Degenerate case: nothing rendered. Fall back to the latest user turn.
if last_user_content is not None:
return [{"role": "user", "content": last_user_content}]
for msg in reversed(messages):
if msg.get("role") == "user" and isinstance(msg.get("content"), str):
return [{"role": "user", "content": msg["content"]}]
return rendered
def _extract_text(response: Any) -> str:
try:
transport = get_transport("chat_completions")
if transport is None:
raise RuntimeError("chat_completions transport unavailable")
normalized = transport.normalize_response(response)
text = (normalized.content or "").strip()
if text:
return text
except Exception:
pass
try:
content = response.choices[0].message.content
return (content or "").strip()
except Exception:
return ""
def aggregate_moa_context(
*,
user_prompt: str,
api_messages: list[dict[str, Any]],
reference_models: list[dict[str, str]],
aggregator: dict[str, str],
temperature: float = 0.6,
aggregator_temperature: float = 0.4,
max_tokens: int | None = None,
) -> str:
"""Run configured reference models and synthesize their advice.
Failures are returned as model-specific notes instead of aborting the normal
agent loop; the main model can still act with partial context.
``max_tokens`` is ``None`` by default: MoA does not cap reference or
aggregator output, so each model uses its own maximum. ``call_llm`` omits
the parameter entirely when it is ``None`` (see its docstring), which also
sidesteps providers that reject ``max_tokens`` outright. A hardcoded cap
here previously truncated long aggregator syntheses.
"""
reference_outputs: list[tuple[str, str]] = []
ref_messages = _reference_messages(api_messages)
reference_outputs = _run_references_parallel(
reference_models,
ref_messages,
temperature=temperature,
max_tokens=max_tokens,
)
joined = "\n\n".join(
f"Reference {idx}{label}:\n{text}"
for idx, (label, text) in enumerate(reference_outputs, start=1)
)
synth_prompt = (
"You are the aggregator in a Mixture of Agents process. Synthesize the "
"reference responses into concise, actionable guidance for the main "
"Hermes agent. Focus on next steps, tool-use strategy, risks, and any "
"disagreements. Do not answer the user directly unless that is all that "
"is needed; produce context the main agent should use in its normal loop.\n\n"
f"Original user prompt:\n{user_prompt}\n\n"
f"Reference responses:\n{joined}"
)
agg_label = _slot_label(aggregator)
try:
response = call_llm(
task="moa_aggregator",
messages=[{"role": "user", "content": synth_prompt}],
temperature=aggregator_temperature,
max_tokens=max_tokens,
**_slot_runtime(aggregator),
)
synthesis = _extract_text(response)
except Exception as exc:
logger.warning("MoA aggregator model %s failed: %s", agg_label, exc)
synthesis = ""
if not synthesis:
synthesis = joined
return (
"[Mixture of Agents context — use this as private guidance for the "
"normal Hermes agent loop. You may call tools, continue reasoning, or "
"finish normally.]\n"
f"Aggregator: {agg_label}\n"
f"References: {', '.join(_slot_label(slot) for slot in reference_models)}\n\n"
f"{synthesis.strip()}"
)
class MoAChatCompletions:
"""OpenAI-chat-compatible facade where the aggregator is the acting model."""
def __init__(self, preset_name: str, reference_callback: Any = None):
self.preset_name = preset_name or "default"
# Optional display hook. Called as reference outputs become available so
# frontends can show each reference model's answer as a labelled block
# before the aggregator acts. Signature:
# reference_callback(event, **kwargs)
# where event is one of:
# "moa.reference" kwargs: index, count, label, text
# "moa.aggregating" kwargs: aggregator (label), ref_count
# Never raises into the model call — display is best-effort.
self.reference_callback = reference_callback
# State-scoped reference cache. The agent loop calls create() once per
# tool-loop iteration; references should re-run whenever the task STATE
# advances — i.e. on every new user message AND every new tool result —
# so each reference judges the latest state. The advisory view
# (_reference_messages) now renders tool calls + results as text, so its
# signature changes on every new tool response; the cache key is that
# signature, so a new tool result is a cache MISS (references re-run)
# while a redundant create() call with identical state is a HIT (no
# re-run, no re-emit). This gives "fire on every user/tool response"
# for free, without re-firing on a pure no-op re-call.
self._ref_cache_key: tuple | None = None
self._ref_cache_outputs: list[tuple[str, str]] = []
def _emit(self, event: str, **kwargs: Any) -> None:
cb = self.reference_callback
if cb is None:
return
try:
cb(event, **kwargs)
except Exception as exc: # pragma: no cover - display must never break the turn
logger.debug("MoA reference_callback failed for %s: %s", event, exc)
def create(self, **api_kwargs: Any) -> Any:
from hermes_cli.config import load_config
from hermes_cli.moa_config import resolve_moa_preset
preset = resolve_moa_preset(load_config().get("moa") or {}, self.preset_name)
messages = list(api_kwargs.get("messages") or [])
reference_models = preset.get("reference_models") or []
aggregator = preset.get("aggregator") or {}
# MoA does not cap reference or aggregator output: each model uses its
# own maximum. Passing max_tokens=None makes call_llm omit the parameter
# (it never caps by default), so a long aggregator synthesis is never
# truncated and providers that reject max_tokens don't 400.
temperature = float(preset.get("reference_temperature", 0.6) or 0.6)
aggregator_temperature = float(preset.get("aggregator_temperature", api_kwargs.get("temperature") or 0.4) or 0.4)
# When the preset is disabled, skip the reference fan-out and let the
# configured aggregator act alone — it is the preset's acting model, so
# a disabled MoA preset is simply "use the aggregator directly."
if not preset.get("enabled", True):
reference_models = []
reference_outputs: list[tuple[str, str]] = []
ref_messages = _reference_messages(messages)
# Turn-scoped cache: only run + display references when the advisory
# view changed (i.e. a new user turn). Within one turn the agent loop
# calls create() once per tool iteration with the same advisory view;
# reuse the cached outputs and skip both the re-run and the re-emit.
_sig = hashlib.sha256(
"\u0000".join(
f"{m.get('role')}:{m.get('content')}" for m in ref_messages
).encode("utf-8", "replace")
).hexdigest()
_cache_key = (self.preset_name, _sig, tuple(_slot_label(s) for s in reference_models))
_refs_from_cache = _cache_key == self._ref_cache_key and bool(self._ref_cache_outputs)
if _refs_from_cache:
reference_outputs = list(self._ref_cache_outputs)
else:
reference_outputs = _run_references_parallel(
reference_models,
ref_messages,
temperature=temperature,
max_tokens=None,
)
self._ref_cache_key = _cache_key
self._ref_cache_outputs = list(reference_outputs)
# Surface each reference model's answer to the display BEFORE the
# aggregator acts — once per turn (only on the iteration that
# actually ran them). The user sees one labelled block per
# reference (rendered like a thinking block) so the MoA process is
# visible rather than a silent pause. Best-effort: never blocks the
# turn.
_ref_count = len(reference_outputs)
for _idx, (_label, _text) in enumerate(reference_outputs, start=1):
self._emit(
"moa.reference",
index=_idx,
count=_ref_count,
label=_label,
text=_text,
)
if _ref_count:
self._emit(
"moa.aggregating",
aggregator=_slot_label(aggregator),
ref_count=_ref_count,
)
agg_messages = [dict(m) for m in messages]
if reference_outputs:
joined = "\n\n".join(
f"Reference {idx}{label}:\n{text}"
for idx, (label, text) in enumerate(reference_outputs, start=1)
)
guidance = (
"[Mixture of Agents reference context]\n"
f"Preset: {self.preset_name}\n"
f"Aggregator/acting model: {_slot_label(aggregator)}\n"
f"References: {', '.join(label for label, _ in reference_outputs)}\n\n"
"Use the reference responses below as private context. You are the aggregator and acting model: "
"answer the user directly or call tools as needed.\n\n"
f"{joined}"
)
for msg in reversed(agg_messages):
if msg.get("role") == "user" and isinstance(msg.get("content"), str):
msg["content"] = msg["content"] + "\n\n" + guidance
break
else:
agg_messages.append({"role": "user", "content": guidance})
if aggregator.get("provider") == "moa":
raise RuntimeError("MoA aggregator cannot be another MoA preset")
agg_kwargs = dict(api_kwargs)
agg_kwargs["messages"] = agg_messages
# The aggregator is the acting model. Resolve its slot to the provider's
# real runtime (base_url/api_key/api_mode) and call it through the same
# request-building path any model uses — so per-model wire-format
# handling (anthropic_messages, max_completion_tokens, fixed/forbidden
# temperature) applies identically to it. MoA imposes no output cap:
# max_tokens is passed through from the caller (normally None → omitted
# → the model's real maximum). The preset's old hardcoded 4096 default
# is gone — it truncated long syntheses.
return call_llm(
task="moa_aggregator",
messages=agg_messages,
temperature=aggregator_temperature,
max_tokens=agg_kwargs.get("max_tokens"),
tools=agg_kwargs.get("tools"),
extra_body=agg_kwargs.get("extra_body"),
**_slot_runtime(aggregator),
)
class MoAClient:
def __init__(self, preset_name: str, reference_callback: Any = None):
self.chat = type("_MoAChat", (), {})()
self.chat.completions = MoAChatCompletions(preset_name, reference_callback=reference_callback)

View File

@@ -478,6 +478,16 @@ def _infer_provider_from_url(base_url: str) -> Optional[str]:
return None
def _lmstudio_server_root(base_url: str) -> str:
"""Return the LM Studio server root for native ``/api/v1`` endpoints."""
root = _normalize_base_url(base_url).rstrip("/")
for suffix in ("/api/v1", "/api", "/v1"):
if root.endswith(suffix):
root = root[: -len(suffix)].rstrip("/")
break
return root
def _is_known_provider_base_url(base_url: str) -> bool:
return _infer_provider_from_url(base_url) is not None
@@ -549,6 +559,7 @@ def detect_local_server_type(base_url: str, api_key: str = "") -> Optional[str]:
server_url = normalized
if server_url.endswith("/v1"):
server_url = server_url[:-3]
lmstudio_url = _lmstudio_server_root(base_url)
headers = _auth_headers(api_key)
@@ -556,7 +567,7 @@ def detect_local_server_type(base_url: str, api_key: str = "") -> Optional[str]:
with httpx.Client(timeout=2.0, headers=headers) as client:
# LM Studio exposes /api/v1/models — check first (most specific)
try:
r = client.get(f"{server_url}/api/v1/models")
r = client.get(f"{lmstudio_url}/api/v1/models")
if r.status_code == 200:
return "lm-studio"
except Exception:
@@ -774,7 +785,7 @@ def fetch_endpoint_model_metadata(
if is_local_endpoint(normalized):
try:
if detect_local_server_type(normalized, api_key=api_key) == "lm-studio":
server_url = normalized[:-3].rstrip("/") if normalized.endswith("/v1") else normalized
server_url = _lmstudio_server_root(normalized)
response = requests.get(
server_url.rstrip("/") + "/api/v1/models",
headers=headers,
@@ -1188,6 +1199,56 @@ def query_ollama_num_ctx(model: str, base_url: str, api_key: str = "") -> Option
return None
def query_ollama_supports_vision(model: str, base_url: str, api_key: str = "") -> Optional[bool]:
"""Return True/False when Ollama ``/api/show`` reports vision support.
Uses the ``capabilities`` field on Ollama 0.6.0+ and falls back to
``model_info.*.vision.block_count`` on older servers. Returns None when
the server is unreachable, not Ollama, or the model is unknown.
"""
import httpx
bare_model = _strip_provider_prefix(model)
if not bare_model or not base_url:
return None
try:
if detect_local_server_type(base_url, api_key=api_key) != "ollama":
return None
except Exception:
return None
server_url = base_url.rstrip("/")
if server_url.endswith("/v1"):
server_url = server_url[:-3]
headers = _auth_headers(api_key)
try:
with httpx.Client(timeout=3.0, headers=headers) as client:
resp = client.post(f"{server_url}/api/show", json={"name": bare_model})
if resp.status_code != 200:
return None
data = resp.json()
except Exception:
return None
caps = data.get("capabilities")
if isinstance(caps, list):
if any(str(cap).lower() == "vision" for cap in caps):
return True
if caps:
return False
model_info = data.get("model_info")
if isinstance(model_info, dict):
for key in model_info:
if "vision.block_count" in str(key).lower():
return True
return None
def _query_ollama_api_show(model: str, base_url: str, api_key: str = "") -> Optional[int]:
"""Query an Ollama server's native ``/api/show`` for context length.
@@ -1297,6 +1358,7 @@ def _query_local_context_length(model: str, base_url: str, api_key: str = "") ->
server_url = base_url.rstrip("/")
if server_url.endswith("/v1"):
server_url = server_url[:-3]
lmstudio_url = _lmstudio_server_root(base_url)
headers = _auth_headers(api_key)
@@ -1340,7 +1402,7 @@ def _query_local_context_length(model: str, base_url: str, api_key: str = "") ->
# Use _model_id_matches for fuzzy matching: LM Studio stores models as
# "publisher/slug" but users configure only "slug" after "local:" prefix.
if server_type == "lm-studio":
resp = client.get(f"{server_url}/api/v1/models")
resp = client.get(f"{lmstudio_url}/api/v1/models")
if resp.status_code == 200:
data = resp.json()
for m in data.get("models", []):
@@ -1646,6 +1708,34 @@ def get_model_context_length(
if config_context_length is not None and isinstance(config_context_length, int) and config_context_length > 0:
return config_context_length
# 0a. MoA virtual provider — ``model`` is a preset name, not a real model,
# and ``base_url`` is the local virtual endpoint, so every probe below would
# miss and fall through to the 256K default. The aggregator is the acting
# model, so resolve the context window from the aggregator slot's real
# provider+model instead. References are advisory-only and never bound the
# acting context, so they're ignored here.
if (provider or "").strip().lower() == "moa":
try:
from hermes_cli.config import load_config
from hermes_cli.moa_config import resolve_moa_preset
from hermes_cli.runtime_provider import resolve_runtime_provider
preset = resolve_moa_preset(load_config().get("moa") or {}, model)
agg = preset.get("aggregator") or {}
agg_provider = str(agg.get("provider") or "").strip()
agg_model = str(agg.get("model") or "").strip()
if agg_model and agg_provider and agg_provider.lower() != "moa":
rt = resolve_runtime_provider(requested=agg_provider, target_model=agg_model)
return get_model_context_length(
agg_model,
base_url=rt.get("base_url", "") or "",
api_key=rt.get("api_key", "") or "",
provider=agg_provider,
)
except Exception:
logger.debug("MoA aggregator context-length resolution failed", exc_info=True)
# Fall through to the generic default if aggregator resolution failed.
# 0b. custom_providers per-model override — check before any probe.
# This closes the gap where /model switch and display paths used to fall
# back to 128K despite the user having a per-model context_length set.

51
agent/pet/__init__.py Normal file
View File

@@ -0,0 +1,51 @@
"""Petdex pet engine — shared core for the CLI, TUI, and desktop surfaces.
Petdex (https://github.com/crafter-station/petdex) is a public gallery of
animated sprite "pets" for coding agents. Each pet is a ``pet.json`` plus a
``spritesheet.{webp,png}`` of 192×208 px cells. Current Codex/petdex sheets use
an 8-column × 9-row atlas; older Hermes/petdex sheets used an 8-row atlas.
Hermes infers the row taxonomy from the sheet and maps agent activity onto
idle/run/review/failed/wave/jump.
This package is the **single source of truth** for the feature so the base
CLI (Python) and TUI (Ink, via ``tui_gateway``) never duplicate the hard
parts:
- :mod:`agent.pet.constants` — frame geometry + the :class:`PetState` enum.
- :mod:`agent.pet.state` — map agent activity → a :class:`PetState`.
- :mod:`agent.pet.manifest` — fetch the public petdex manifest.
- :mod:`agent.pet.store` — install / list / resolve pets on disk
(profile-aware via ``get_hermes_home()``).
- :mod:`agent.pet.render` — decode a spritesheet and encode frames for a
terminal (kitty / iTerm2 / sixel graphics
protocols, with a Unicode half-block
fallback).
Rendering in the Electron desktop is necessarily TypeScript (canvas), but it
reuses the same on-disk store and the same state semantics.
The whole feature is a *display* concern: it adds no model tool, mutates no
system prompt or toolset, and therefore has zero effect on prompt caching.
"""
from agent.pet.constants import (
DEFAULT_SCALE,
FRAME_H,
FRAME_W,
FRAMES_PER_STATE,
LOOP_MS,
STATE_ROWS,
PetState,
)
from agent.pet.state import derive_pet_state
__all__ = [
"DEFAULT_SCALE",
"FRAME_H",
"FRAME_W",
"FRAMES_PER_STATE",
"LOOP_MS",
"STATE_ROWS",
"PetState",
"derive_pet_state",
]

167
agent/pet/constants.py Normal file
View File

@@ -0,0 +1,167 @@
"""Pet sprite geometry + animation-state taxonomy.
These values are the common petdex/Codex pet geometry. The real ``pet.json``
usually only carries ``id``/``displayName``/``description``/``spritesheetPath``;
row taxonomy is inferred from the atlas shape so Hermes can render both legacy
8-row sheets and current 9-row Codex sheets.
"""
from __future__ import annotations
from enum import Enum
# Frame geometry (pixels). Current Codex/petdex spritesheets are 8 columns x 9
# rows (1536x1872), while older Hermes/petdex sheets used 9 columns x 8 rows
# (1728x1664). Renderers derive both row taxonomy and real column count from the
# concrete sheet, so either shape works.
FRAME_W = 192
FRAME_H = 208
# Frames consumed per animation state (the petdex web app uses CSS
# ``steps(6)``). A sheet may physically contain more columns; we only step
# through the first ``FRAMES_PER_STATE``.
FRAMES_PER_STATE = 6
# Full-loop duration for one state, milliseconds (petdex default).
LOOP_MS = 1100
# Default on-screen scale relative to native frame size. ``display.pet.scale``
# is the single master scalar: the desktop canvas multiplies its native pixels
# by it and every terminal surface derives its half-block/kitty column width
# from it (see :func:`cols_for_scale`), so one number shrinks all three
# interfaces together. (petdex's own clients render at 0.7; we default smaller
# so the kitty/GUI mascot stays a glanceable corner sprite. The half-block
# fallback can't shrink as far — see ``UNICODE_MIN_COLS`` — and clamps to its
# legibility floor instead.)
DEFAULT_SCALE = 0.33
# User-settable scale bounds (``/pet scale``, desktop slider). Floor keeps the
# pet clickable/visible; ceiling stops a fat-fingered value from filling the
# screen. The unicode fallback additionally clamps to ``UNICODE_MIN_COLS``.
MIN_SCALE = 0.1
MAX_SCALE = 3.0
def clamp_scale(scale: float) -> float:
"""Clamp *scale* to ``[MIN_SCALE, MAX_SCALE]`` (the single validation point)."""
return max(MIN_SCALE, min(MAX_SCALE, scale))
# Terminal cells one native frame spans at ``scale == 1.0``. A cell is ~8px
# wide, a frame is ``FRAME_W`` (192) px → 24 cells. This mirrors the kitty
# graphics placement (``scaled_px // 8``) so at full scale every renderer agrees.
BASE_UNICODE_COLS = FRAME_W // 8
# Legibility floor for the half-block fallback. A half-block cell samples the
# sprite at only 1 horizontal + 2 vertical taps, so below this width a 192×208
# pet collapses into an unreadable blob *regardless* of scale. kitty/GUI draw
# true pixels and have no such floor — that's why the same ``scale: 0.33`` is
# crisp there but mush in half-blocks. ``scale`` shrinks the unicode pet down
# TO this floor (and grows it above), instead of past it into noise.
UNICODE_MIN_COLS = 16
def cols_for_scale(scale: float) -> int:
"""Half-block width implied by *scale*, clamped to the legibility floor.
Above the floor it tracks the kitty cell box (``scaled_px // 8``) so the two
renderers converge at larger sizes; below it the floor keeps the sprite
readable rather than letting it devolve into a blob.
"""
return max(UNICODE_MIN_COLS, round(BASE_UNICODE_COLS * (scale or DEFAULT_SCALE)))
def resolve_cols(scale: float, unicode_cols: int = 0) -> int:
"""Resolve terminal width: explicit *unicode_cols* override, else from *scale*."""
return int(unicode_cols) if unicode_cols and int(unicode_cols) > 0 else cols_for_scale(scale)
class PetState(str, Enum):
"""Animation state a pet can be shown in.
These are Hermes' activity state names. They are not always identical to the
source atlas row names: Codex-format pets use rows like ``jumping`` /
``running`` while the UI keeps the shorter ``jump`` / ``run`` names.
"""
IDLE = "idle"
WAVE = "wave"
RUN = "run"
FAILED = "failed"
REVIEW = "review"
JUMP = "jump"
WAITING = "waiting"
# Legacy Hermes/petdex row order (top -> bottom) used by the older 8-row,
# 9-column atlas shape.
LEGACY_STATE_ROWS: list[str] = [
PetState.IDLE.value,
PetState.WAVE.value,
PetState.RUN.value,
PetState.FAILED.value,
PetState.REVIEW.value,
PetState.JUMP.value,
"extra1",
"extra2",
]
# Current Petdex row order (top -> bottom) used by 1536x1872 atlases:
# 8 columns x 9 rows of 192x208 cells.
CODEX_STATE_ROWS: list[str] = [
PetState.IDLE.value,
"running-right",
"running-left",
"waving",
"jumping",
PetState.FAILED.value,
PetState.WAITING.value,
"running",
PetState.REVIEW.value,
]
# Default/fallback for callers without a sheet. Prefer the current 9-row Codex
# format because generated pets and the public Codex pet contract use it.
STATE_ROWS: list[str] = CODEX_STATE_ROWS
# Canonical Hermes activity names -> accepted row-name aliases in descending
# preference. This keeps our internal state names stable (`wave`/`jump`/`run`)
# while matching Petdex's current `waving`/`jumping`/`running` taxonomy.
STATE_ALIASES: dict[str, tuple[str, ...]] = {
PetState.IDLE.value: (PetState.IDLE.value,),
PetState.WAVE.value: (PetState.WAVE.value, "waving"),
PetState.JUMP.value: (PetState.JUMP.value, "jumping"),
PetState.RUN.value: (PetState.RUN.value, "running"),
PetState.FAILED.value: (PetState.FAILED.value,),
PetState.REVIEW.value: (PetState.REVIEW.value,),
PetState.WAITING.value: (PetState.WAITING.value,),
}
def state_aliases_for(state: "PetState | str") -> tuple[str, ...]:
"""Return accepted row-name aliases for *state* (always non-empty)."""
value = state.value if isinstance(state, PetState) else str(state)
aliases = STATE_ALIASES.get(value)
return aliases if aliases else (value,)
def state_rows_for_grid(row_count: int | None) -> list[str]:
"""Return the row taxonomy for a spritesheet with *row_count* rows."""
try:
rows = int(row_count or 0)
except (TypeError, ValueError):
rows = 0
if rows >= len(CODEX_STATE_ROWS):
return CODEX_STATE_ROWS
return LEGACY_STATE_ROWS
def state_row_index(state: "PetState | str", row_count: int | None = None) -> int:
"""Return the spritesheet row index for *state* (clamped, never raises)."""
rows = state_rows_for_grid(row_count)
for name in state_aliases_for(state):
try:
return rows.index(name)
except ValueError:
continue
return 0 # fall back to the idle row

View File

@@ -0,0 +1,29 @@
"""Pet generation — base-draft → hatch pipeline.
Public surface used by the gateway RPCs, the CLI ``hermes pets generate``
command, and tests:
- :func:`generate_base_drafts` / :func:`hatch_pet` — the two-step flow.
- :class:`HatchResult`, :class:`GenerationError`.
- :mod:`atlas` — deterministic frame extraction + atlas composition/validation.
Image generation is delegated to the active reference-capable
:class:`~agent.image_gen_provider.ImageGenProvider` (OpenAI gpt-image-2 or Krea);
atlas assembly is fully deterministic so it's testable without any API calls.
"""
from __future__ import annotations
from agent.pet.generate.imagegen import GenerationError
from agent.pet.generate.orchestrate import (
HatchResult,
generate_base_drafts,
hatch_pet,
)
__all__ = [
"GenerationError",
"HatchResult",
"generate_base_drafts",
"hatch_pet",
]

1183
agent/pet/generate/atlas.py Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,251 @@
"""Thin image-generation layer for pet sprites.
Wraps the active :class:`~agent.image_gen_provider.ImageGenProvider` with the
two things sprite generation needs that the agent-facing ``image_generate`` tool
doesn't expose: **N variants** (loop) and **reference-image grounding** (so each
animation row stays the same character as the chosen base).
Reference grounding only works on providers that support it — currently OpenAI
``gpt-image-2`` (image edits) and Krea (style references). We resolve to one of
those and surface a clear, actionable error otherwise rather than silently
producing an ungrounded, drifting pet.
"""
from __future__ import annotations
import logging
import os
from dataclasses import dataclass
from pathlib import Path
logger = logging.getLogger(__name__)
# Providers that can ground generation on a reference image, in preference order
# (Nous Portal → OpenAI → OpenRouter → …). OpenRouter/Nous run a quality-first
# model chain and may fall back depending on account access and endpoint behavior,
# so fidelity can vary by configured backend + model availability.
_REF_CAPABLE = ("nous", "openai", "openai-codex", "openrouter", "krea")
# Friendly display label per reference-capable provider, surfaced in the desktop
# pet-gen picker.
_PROVIDER_LABELS: dict[str, str] = {
"nous": "Nous Portal",
"openrouter": "OpenRouter",
"openai": "OpenAI",
"openai-codex": "OpenAI (Codex)",
"krea": "Krea",
}
def _forced_provider_from_env() -> str | None:
"""Optional QA override to force a pet-gen backend.
`HERMES_PET_IMAGE_PROVIDER=<name>` (e.g. `openrouter`) bypasses the normal
active/default provider resolution for pet generation only. Unknown values are
ignored so existing users are unaffected.
"""
forced = os.environ.get("HERMES_PET_IMAGE_PROVIDER", "").strip().lower()
return forced if forced in _REF_CAPABLE else None
class GenerationError(RuntimeError):
"""Raised on any image-generation failure (no provider, API error, IO)."""
@dataclass(frozen=True)
class SpriteProvider:
"""Resolved provider plus whether it can take reference images."""
name: str
provider: object
supports_references: bool
def _discover() -> None:
try:
from hermes_cli.plugins import _ensure_plugins_discovered
_ensure_plugins_discovered()
except Exception as exc: # noqa: BLE001 - discovery is best-effort
logger.debug("image-gen plugin discovery failed: %s", exc)
def resolve_provider(*, require_references: bool = True, prefer: str | None = None) -> SpriteProvider:
"""Pick the image provider to use for sprite work.
Preference: an explicit *prefer* choice (the desktop pet-gen picker) when it's
reference-capable and configured, then the configured/active provider when
it's reference-capable, else the first available reference-capable provider.
With *require_references* off we fall back to any available provider (used for
prompt-only base drafts).
"""
_discover()
from agent.image_gen_registry import get_active_provider, get_provider
# QA override: force one provider for pet-gen iteration regardless of the
# globally active image_gen backend.
forced = _forced_provider_from_env()
if forced:
chosen = get_provider(forced)
if chosen is not None and chosen.is_available():
return SpriteProvider(name=forced, provider=chosen, supports_references=True)
# An explicit user pick wins when it's reference-capable and has credentials;
# otherwise we ignore it and fall through to the normal resolution.
if prefer:
chosen = get_provider(prefer)
if prefer in _REF_CAPABLE and chosen is not None and chosen.is_available():
return SpriteProvider(name=prefer, provider=chosen, supports_references=True)
# Configured / active provider first.
active = None
try:
active = get_active_provider()
except Exception: # noqa: BLE001
active = None
if active is not None:
name = getattr(active, "name", "")
if name in _REF_CAPABLE and active.is_available():
return SpriteProvider(name=name, provider=active, supports_references=True)
# Any available reference-capable provider.
for name in _REF_CAPABLE:
provider = get_provider(name)
if provider is not None and provider.is_available():
return SpriteProvider(name=name, provider=provider, supports_references=True)
if not require_references and active is not None and active.is_available():
return SpriteProvider(
name=getattr(active, "name", "unknown"), provider=active, supports_references=False
)
raise GenerationError(
"Pet generation needs an image backend that supports reference images. "
"Open `hermes tools` → Image Generation and configure Nous Portal, "
"OpenRouter, or OpenAI (gpt-image-2) with an API key."
)
def list_sprite_providers() -> list[dict]:
"""The reference-capable providers available to pick for pet generation.
Returns ``[{name, label, default}]`` for every ref-capable provider the user
actually has credentials for, in preference order, marking the one
:func:`resolve_provider` would choose with no explicit preference. Empty when
none is configured (the picker hides itself). Best-effort: discovery hiccups
yield an empty list.
"""
_discover()
from agent.image_gen_registry import get_provider
try:
default_name = resolve_provider(require_references=True).name
except GenerationError:
default_name = ""
out: list[dict] = []
for name in _REF_CAPABLE:
provider = get_provider(name)
if provider is None or not provider.is_available():
continue
out.append(
{
"name": name,
"label": _PROVIDER_LABELS.get(name, name),
"default": name == default_name,
}
)
return out
def _save_local(image_ref: str, *, prefix: str) -> Path:
"""Return a local path for *image_ref*, downloading it if it's a URL."""
if image_ref.startswith(("http://", "https://")):
from agent.image_gen_provider import save_url_image
return Path(save_url_image(image_ref, prefix=prefix))
return Path(image_ref)
def _rejected_background(error: str) -> bool:
"""True when a provider error is specifically about the ``background`` param.
Transparent backgrounds are a per-model capability (e.g. some gpt-image tiers
reject ``background=transparent`` outright). We detect that one rejection so
we can retry without the flag rather than failing the whole pet — our chroma
key pass makes the result transparent regardless.
"""
lowered = (error or "").lower()
return "background" in lowered and ("not supported" in lowered or "transparent" in lowered)
def generate(
prompt: str,
*,
n: int = 1,
reference_images: list[Path] | None = None,
provider: SpriteProvider | None = None,
prefix: str = "pet_gen",
aspect_ratio: str = "square",
) -> list[Path]:
"""Generate *n* sprite images and return their local paths.
*reference_images* grounds the output on a base image (required for rows).
*aspect_ratio* picks the canvas: ``"square"`` for single-character base
drafts, ``"landscape"`` for multi-frame row strips (the wider 1536px canvas
gives every frame real horizontal room so winged poses don't have to be
shrunk to avoid touching their neighbors).
We *ask* for a transparent background, but fall back to an opaque generation
(cleaned up downstream by the chroma-key pass) on models that reject the
flag. Raises :class:`GenerationError` if nothing usable comes back.
"""
sprite = provider or resolve_provider(require_references=bool(reference_images))
if reference_images and not sprite.supports_references:
raise GenerationError(
f"image backend '{sprite.name}' cannot use reference images; "
"configure OpenAI gpt-image-2 or Krea for pet generation"
)
refs = [str(p) for p in (reference_images or [])]
def _run(extra: dict) -> tuple[Path | None, str]:
kwargs: dict = {"aspect_ratio": aspect_ratio, **extra}
if refs:
# Providers disagree on the ref kwarg name: our OpenRouter/Nous
# backends read ``reference_images``, OpenAI's gpt-image-2 reads
# ``reference_image_urls``. Send both; each ignores the other.
kwargs["reference_images"] = refs
kwargs["reference_image_urls"] = refs
try:
result = sprite.provider.generate(prompt, **kwargs)
except Exception as exc: # noqa: BLE001 - normalize provider crashes
logger.debug("provider.generate crashed: %s", exc)
return None, str(exc)
if not isinstance(result, dict) or not result.get("success"):
return None, (result or {}).get("error", "unknown error") if isinstance(result, dict) else "no result"
image_ref = result.get("image")
if not image_ref:
return None, "provider returned no image"
try:
return _save_local(str(image_ref), prefix=prefix), ""
except Exception as exc: # noqa: BLE001
return None, f"could not save generated image: {exc}"
out: list[Path] = []
last_error = ""
allow_transparent = True
for _ in range(max(1, n)):
path, err = _run({"background": "transparent"} if allow_transparent else {})
# Model doesn't support the transparent flag → drop it for this and every
# remaining variant (no point re-probing a capability we just disproved).
if path is None and allow_transparent and _rejected_background(err):
allow_transparent = False
path, err = _run({})
if path is not None:
out.append(path)
else:
last_error = err
if not out:
raise GenerationError(last_error or "image generation produced no output")
return out

View File

@@ -0,0 +1,358 @@
"""Pet generation orchestration — the base-draft → hatch flow.
Two steps, mirroring the UX across every surface:
1. :func:`generate_base_drafts` — a handful of prompt-only "what should this pet
look like" variants. Cheap; the user picks one (or retries for a fresh set).
2. :func:`hatch_pet` — takes the chosen base and generates one grounded row
strip per Hermes state, slices each into frames, composes the atlas, validates
it, and writes the pet into the store.
Splitting it this way bounds cost (4 cheap base calls per round; the ~6 row
calls happen once, on the pet you actually keep) and gives each UI a natural
preview/loading point.
"""
from __future__ import annotations
import logging
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from pathlib import Path
from typing import Callable
from agent.pet.generate import atlas, imagegen, prompts
from agent.pet.generate.imagegen import GenerationError, SpriteProvider
logger = logging.getLogger(__name__)
# (event, detail) — e.g. ("row", "idle"), ("compose", ""), ("save", "<slug>").
ProgressFn = Callable[[str, str], None]
# Image generations are independent network calls, so we fan them out instead of
# blocking on each in turn — a hatch is ~8 row calls that would otherwise run
# back-to-back and routinely blow past the client's RPC timeout. Capped so we
# don't hammer the provider's rate limit (one cold call can still be slow).
_MAX_PARALLEL_GENERATIONS = 4
# How many times to (re)generate a single row before accepting a best-effort
# slice. Early attempts demand clean per-pose gutters; the last is lenient so a
# stubborn row still yields frames instead of dropping out entirely.
_ROW_GEN_ATTEMPTS = 3
_MIN_FILLED_STATES = 6
_REQUIRED_STATES = frozenset({"idle", "running-right", "waving"})
@dataclass(frozen=True)
class HatchResult:
"""Outcome of a successful :func:`hatch_pet`."""
slug: str
display_name: str
spritesheet: Path
states: list[str]
validation: dict
def _harden_transparency(path: Path) -> Path:
"""Key out any solid backdrop the provider painted; save as an RGBA PNG.
``background=transparent`` is requested on every call, but image models honor
it inconsistently — some still paint a flat (often near-white) backdrop. We
run the same chroma-key pass the row extractor uses so every base draft the
user picks between (and the reference the rows are grounded on) is a clean
cutout. Best-effort: a decode failure leaves the original untouched.
"""
from PIL import Image
try:
with Image.open(path) as opened:
keyed = atlas.remove_background(opened.convert("RGBA"))
# Zero the RGB of any leftover semi-transparent edge pixels so a keyed
# draft has no colored halo when composited on the dark UI.
keyed = atlas._clear_transparent_rgb(keyed)
out = path.with_suffix(".png")
keyed.save(out, format="PNG")
return out
except Exception as exc: # noqa: BLE001 - cosmetic; fall back to the raw image
logger.debug("base draft transparency hardening failed for %s: %s", path, exc)
return path
def generate_base_drafts(
concept: str,
*,
n: int = 4,
style: str = "auto",
reference_images: list[Path] | None = None,
provider: SpriteProvider | None = None,
on_draft: Callable[[int, Path], None] | None = None,
is_cancelled: Callable[[], bool] | None = None,
) -> list[Path]:
"""Generate *n* candidate base looks for *concept*; returns image paths.
Each draft is hardened to a transparent cutout (see :func:`_harden_transparency`).
Drafts are generated concurrently and *on_draft(index, path)* fires as each
one finishes (not at the end) so callers can stream previews to the UI
instead of leaving it blank until the whole batch is done.
*is_cancelled*, when supplied, is polled cooperatively: a draft that hasn't
started yet is skipped, and once it trips we stop staging/streaming further
drafts and cancel any queued work (already-in-flight provider calls can't be
hard-killed, but their results are dropped).
"""
# A user reference image (e.g. their own pet) grounds every draft, so it
# needs a reference-capable provider — same requirement as the row passes.
refs = reference_images or None
sprite = provider or imagegen.resolve_provider(require_references=bool(refs))
cancelled = is_cancelled or (lambda: False)
# Each draft is its own one-shot generation, run concurrently so the user
# waits for one image, not N. A single draft failing must not sink the set.
# Each gets a distinct variation nudge so the options aren't near-duplicates.
logger.info("pet generate: drafting %d base looks for %r (style=%s)", n, concept, style)
def _one(index: int) -> tuple[int, Path | None, str | None]:
if cancelled():
return index, None, None
t0 = time.monotonic()
variation = prompts.BASE_VARIATIONS[index % len(prompts.BASE_VARIATIONS)]
prompt = prompts.build_base_prompt(concept, style=style, variation=variation)
try:
out = imagegen.generate(prompt, n=1, reference_images=refs, provider=sprite, prefix="pet_base")
except Exception as exc: # noqa: BLE001 - tolerate a single failed draft
logger.warning("pet generate: draft %d failed after %.1fs: %s", index, time.monotonic() - t0, exc)
return index, None, str(exc)
if not out:
logger.warning("pet generate: draft %d produced no image", index)
return index, None, "the image provider returned no image"
logger.info("pet generate: draft %d ready in %.1fs", index, time.monotonic() - t0)
return index, _harden_transparency(out[0]), None
workers = max(1, min(n, _MAX_PARALLEL_GENERATIONS))
results: dict[int, Path] = {}
errors: list[str] = []
with ThreadPoolExecutor(max_workers=workers) as pool:
futures = [pool.submit(_one, i) for i in range(n)]
# as_completed runs in *this* (the caller's) thread, so on_draft — and any
# gateway event it emits — inherits the request's bound transport, unlike
# the worker threads above.
for fut in as_completed(futures):
if cancelled():
logger.info("pet generate: cancelled — dropping remaining drafts")
for pending in futures:
pending.cancel()
break
index, path, err = fut.result()
if path is None:
if err:
errors.append(err)
continue
results[index] = path
if on_draft is not None:
try:
on_draft(index, path)
except Exception as exc: # noqa: BLE001 - progress is best-effort
logger.debug("on_draft callback failed: %s", exc)
drafts = [results[i] for i in sorted(results)]
if not drafts and not cancelled():
# Surface *why* — every draft failed for a reason (a content-policy refusal
# on a name like "minion", a provider/auth error, …); the most common one
# is the representative cause. Far more useful than "no usable drafts".
raise GenerationError(_drafts_failed_reason(errors))
return drafts
def _drafts_failed_reason(errors: list[str]) -> str:
"""The representative reason a draft round produced nothing, humanized."""
if not errors:
return "image generation produced no usable drafts"
from collections import Counter
return _humanize_image_error(Counter(errors).most_common(1)[0][0])
def _humanize_image_error(error: str) -> str:
"""Turn a raw provider error into a friendly, actionable sentence.
The big one is moderation: image models refuse trademarked characters and
real people (e.g. "minion"), which reads as an opaque 400 otherwise.
"""
low = error.lower()
if any(s in low for s in ("moderation_blocked", "safety system", "content policy", "content_policy")):
return (
"The image provider blocked this prompt — its safety filter rejects "
"trademarked characters and real people. Try an original description."
)
if any(s in low for s in ("api key", "unauthorized", "401", "auth")):
return "The image provider rejected the request — check your API key in Settings → Providers."
if "rate limit" in low or "429" in low:
return "The image provider is rate-limiting — wait a moment and try again."
# Otherwise the first line, trimmed of the noisy provider envelope.
return error.splitlines()[0].strip()[:200]
def hatch_pet(
*,
base_image: str | Path,
slug: str,
display_name: str = "",
description: str = "",
concept: str = "",
style: str = "auto",
on_progress: ProgressFn | None = None,
provider: SpriteProvider | None = None,
is_cancelled: Callable[[], bool] | None = None,
) -> HatchResult:
"""Turn an approved base image into a full, installed Hermes pet.
Generates a grounded row strip per state, extracts frames, composes +
validates the atlas, and registers it. The idle row falls back to the base
look so the pet always renders. Raises :class:`GenerationError` on failure.
*is_cancelled*, when supplied, is polled cooperatively: rows that haven't
started are skipped, queued rows are cancelled, and once every row is done we
abort (raising :class:`GenerationError`) before composing/saving so a stopped
hatch never writes a half-built pet.
"""
base = Path(base_image)
if not base.is_file():
raise GenerationError(f"base image not found: {base}")
sprite = provider or imagegen.resolve_provider(require_references=True)
progress = on_progress or (lambda *_: None)
cancelled = is_cancelled or (lambda: False)
label = concept or display_name or slug
frames_by_state: dict[str, list] = {}
total_rows = len(atlas.ROW_SPECS)
logger.info("pet hatch %r: generating %d animation rows", slug, total_rows)
# Generate every state's row strip concurrently — they're independent
# grounded calls, so the hatch waits for the slowest row, not their sum. A
# single row failing is tolerated (idle is guaranteed below).
def _gen_row(spec: tuple[str, int, int]) -> tuple[str, list | None]:
state, _row, count = spec
if cancelled():
return state, None
t0 = time.monotonic()
last_exc: Exception | None = None
# Self-healing: a model occasionally returns a row whose poses are touching
# (no clean gutters), which slices badly. We retry such rolls; only the
# final attempt falls back to lenient ``auto`` slicing so a stubborn row
# still yields *something* rather than dropping the whole row.
for attempt in range(_ROW_GEN_ATTEMPTS):
if cancelled():
return state, None
strict = attempt < _ROW_GEN_ATTEMPTS - 1
try:
strips = imagegen.generate(
prompts.build_row_prompt(state, count, label, style=style),
n=1,
reference_images=[base],
provider=sprite,
prefix=f"pet_row_{state}",
# Wider canvas → each frame gets real horizontal room, so winged
# poses keep a full, healthy size and still leave clean gutters.
aspect_ratio="landscape",
)
# ``components`` requires clean per-pose gutters (raises otherwise),
# so a touching roll is rejected and regenerated; the last attempt
# uses ``auto`` (equal-slot fallback, never raises). Raw (fit=False)
# so normalize_cells registers the whole pet at once.
method = "components" if strict else "auto"
frames = atlas.extract_strip_frames(strips[0], count, method=method, fit=False)
logger.info(
"pet hatch %r: row %r ready in %.1fs (attempt %d)",
slug, state, time.monotonic() - t0, attempt + 1,
)
return state, frames
except Exception as exc: # noqa: BLE001 - retried; one bad row is tolerated
last_exc = exc
logger.warning(
"pet hatch %r: row %r attempt %d/%d failed: %s",
slug, state, attempt + 1, _ROW_GEN_ATTEMPTS, exc,
)
logger.warning(
"pet hatch %r: row %r gave up after %.1fs: %s",
slug, state, time.monotonic() - t0, last_exc,
)
return state, None
# running-left is derived by mirroring running-right (guaranteed-consistent
# and one fewer generation), so we don't generate it directly.
generated_specs = [spec for spec in atlas.ROW_SPECS if spec[0] != "running-left"]
workers = max(1, min(len(generated_specs), _MAX_PARALLEL_GENERATIONS))
done = 0
with ThreadPoolExecutor(max_workers=workers) as pool:
futures = [pool.submit(_gen_row, spec) for spec in generated_specs]
# as_completed runs on the caller (request) thread, so progress events
# emitted here inherit the request transport — unlike the worker threads.
for fut in as_completed(futures):
if cancelled():
logger.info("pet hatch %r: cancelled — dropping remaining rows", slug)
for pending in futures:
pending.cancel()
break
state, frames = fut.result()
done += 1
progress("row", f"{state}:{done}:{total_rows}")
if frames:
frames_by_state[state] = frames
if cancelled():
raise GenerationError("hatch cancelled")
# Derive running-left from the approved running-right row (per-frame mirror,
# preserving order/timing). Missing running-right is rejected below; a pet
# without its canonical walk cycle is a failed hatch, not a shippable mascot.
right = frames_by_state.get("running-right")
if right:
done += 1
progress("row", f"running-left:{done}:{total_rows}")
frames_by_state["running-left"] = atlas.mirror_frames(right)
logger.info("pet hatch %r: row 'running-left' mirrored from running-right", slug)
else:
logger.warning("pet hatch %r: no running-right to mirror; left walk left empty", slug)
# Idle is the resting state the renderer falls back to — guarantee it.
if not frames_by_state.get("idle"):
progress("row", "idle-fallback")
frames_by_state["idle"] = [atlas.single_frame(base, fit=False)]
progress("compose", "")
logger.info("pet hatch %r: composing atlas from %d states", slug, len(frames_by_state))
# One shared scale + baseline across every state so the pet never slides or
# pulses size between frames; compose just packs the normalized cells.
sheet = atlas.compose_atlas(atlas.normalize_cells(frames_by_state))
validation = atlas.validate_atlas(sheet)
if not validation["ok"]:
raise GenerationError("; ".join(validation["errors"]) or "atlas validation failed")
filled_states = set(validation["filled_states"])
missing_required = sorted(_REQUIRED_STATES - filled_states)
if missing_required:
raise GenerationError(f"missing required animation row(s): {', '.join(missing_required)}")
if len(filled_states) < _MIN_FILLED_STATES:
raise GenerationError(
f"only {len(filled_states)}/{len(atlas.ROW_SPECS)} animation rows were usable; regenerate"
)
from agent.pet import store
progress("save", slug)
logger.info("pet hatch %r: saving pet", slug)
pet = store.register_local_pet(
sheet,
slug=slug,
display_name=display_name or slug,
description=description,
)
return HatchResult(
slug=pet.slug,
display_name=pet.display_name,
spritesheet=pet.spritesheet,
states=validation["filled_states"],
validation=validation,
)

View File

@@ -0,0 +1,183 @@
"""Prompt builders for pet generation.
Two prompt shapes: a *base* prompt (prompt-only, produces the canonical look the
user picks between) and per-*state* *row* prompts (grounded on the chosen base,
produce one horizontal strip of N poses). Prompts stay concise and
sprite-production oriented; the identity lock and "one transparent row" framing
matter more than flowery description.
We generate the full petdex/Codex nine-state set (see
:data:`agent.pet.generate.atlas.ROW_SPECS`) so a hatched pet is a valid
``petdex submit`` spritesheet.
"""
from __future__ import annotations
# What each petdex/Codex state should depict (kept short — these go straight into
# the row prompt). Phrased to avoid the common sprite-gen failure modes (detached
# effects, motion lines, shadows). Critical distinction: ``running`` is the
# *working* state (in place), while ``running-right`` / ``running-left`` are the
# actual directional walk/run cycles.
STATE_ACTIONS: dict[str, str] = {
"idle": "a calm idle loop: subtle breathing, a tiny blink or gentle bob, no big gestures",
"running-right": (
"a sideways walk/run locomotion cycle moving to the RIGHT: the character "
"faces and travels right with clear directional steps, a smooth gait loop"
),
"running-left": (
"a sideways walk/run locomotion cycle moving to the LEFT: the character "
"faces and travels left with clear directional steps (the mirror of the "
"right-facing run)"
),
"waving": "a friendly greeting: raising a paw/hand/limb to wave, clear up-and-down gesture",
"jumping": "a happy celebration jump: anticipation, lift off the ground, peak, and land",
"failed": "a sad or deflated reaction: slumped, dejected, small frown — readable but not noisy",
"waiting": (
"an expectant 'waiting on you' pose: looking up/out as if asking for input "
"or approval — distinct from idle and review"
),
"running": (
"focused active work, staying IN PLACE (NOT walking or foot-running): "
"leaning in, concentrating, busy 'thinking / processing / typing' energy"
),
"review": "careful inspection: a focused lean, head tilt, studying something intently",
}
_STYLE_HINTS: dict[str, str] = {
# Default to the popular petdex look: crisp 16-bit PIXEL ART, not the smooth
# 2D illustration (let alone 3D render) gpt-image reaches for by default.
"auto": (
" Style: crisp 16-bit PIXEL-ART game sprite — visible square pixels, a small "
"limited palette, clean dark outline, flat cel shading, chunky chibi "
"proportions, like a classic SNES/JRPG party member or a petdex.dev mascot. "
"Absolutely NOT 3D-rendered, NOT a smooth painted or vector illustration, "
"NOT photorealistic — no soft gradients, no realistic lighting, no figurine look."
),
"pixel": " Render in clean 16-bit pixel-art style with visible square pixels and a limited palette.",
"plush": " Render as a soft plush toy.",
"clay": " Render as a claymation / soft 3D clay figure.",
"sticker": " Render as a glossy die-cut sticker.",
"flat-vector": " Render in flat vector mascot style.",
"3d-toy": " Render as a glossy 3D toy.",
"painterly": " Render in a soft painterly style.",
}
_BACKGROUND = (
"Center the character on a SINGLE flat, uniform, high-contrast chroma-key "
"background — pure hot magenta #FF00FF (only if magenta appears on the "
"character, use pure green #00FF00 instead). The background is ONE continuous "
"even color that completely surrounds the character with NO gradient, "
"vignette, texture, pattern, scenery, shadow, ground line, frame, border, "
"panel, comic cell, gutter line, grid, or divider of any kind, so it keys out "
"cleanly. The background color must not appear anywhere on the character. "
"No text, no labels, no speech bubbles, no UI."
)
def style_hint(style: str | None) -> str:
return _STYLE_HINTS.get((style or "auto").strip().lower(), "")
# Row strips are generated on the wider landscape canvas (see imagegen.generate /
# orchestrate). The extra width is what lets each pose stay a healthy size AND
# leave a real gutter — used here only to cite concrete pixel numbers.
_ASSUMED_STRIP_WIDTH = 1536
def _spacing_spec(frame_count: int) -> tuple[int, int]:
"""(per-pose width px, gap px) for a row of *frame_count* poses.
Pixel counts alone don't hold — the model fills each slot edge-to-edge with
the full wingspan, so neighbors touch even when bodies are spaced. The lever
that works is proportional containment on a wide canvas: give each pose its
own equal cell and keep the ENTIRE silhouette (wings/tail/halo included)
inside it. On the 1536px landscape strip ~70% occupancy still leaves a
generous gutter, so the pet stays a normal, good-looking size — no shrinking.
"""
slots = max(1, frame_count)
slot_w = _ASSUMED_STRIP_WIDTH / slots
pose_px = round(slot_w * 0.7)
gap_px = max(48, round(slot_w * 0.3))
return pose_px, gap_px
# Per-draft nudges so the 4 base options are actually distinct — gpt-image returns
# near-duplicates for a single prompt. We vary the *look* (palette, build,
# expression, accents), NOT the pose, so the chosen base still grounds clean,
# consistent animation rows.
BASE_VARIATIONS: tuple[str, ...] = (
"",
"a distinctly different colour palette and markings",
"a heavier, broader silhouette with sturdier proportions",
"a different facial structure and expression matching the concept tone, with unique accent/accessory details",
"a leaner, taller build and an alternate colour scheme",
"bolder, more saturated colours and a stronger expression matching the concept tone",
)
def build_base_prompt(concept: str, *, style: str | None = "auto", variation: str = "") -> str:
"""The base look: a single, clean, centered full-body mascot.
*variation* differentiates one draft from the next (see :data:`BASE_VARIATIONS`).
"""
concept = (concept or "a distinctive mascot creature").strip()
nudge = f" Make this design distinct: {variation}." if variation else ""
return (
f"A stylized mascot pet character: {concept}. "
"Honor the requested tone and mood exactly (cute, eerie, scary, menacing, whimsical, etc.) "
"while staying non-graphic. "
"Compact, whole-body silhouette that reads clearly at small size, "
"clear readable facial features, simple consistent palette. "
# A neutral, symmetric, at-rest stance makes the cleanest identity anchor
"Neutral front-facing standing pose, upright and symmetric, arms/limbs "
"relaxed at the sides, feet together on the ground, any cape/accessories "
"hanging straight and still."
f"{nudge} "
f"{_BACKGROUND}{style_hint(style)}"
)
def build_row_prompt(state: str, frame_count: int, concept: str, *, style: str | None = "auto") -> str:
"""A row strip: *frame_count* poses of the SAME character, left→right.
The attached base image is the identity source of truth; the prompt locks
species, palette, face, and props to it.
"""
action = STATE_ACTIONS.get(state, "a simple idle pose")
concept = (concept or "the mascot").strip()
pose_px, gap_px = _spacing_spec(frame_count)
return (
f"Using the attached reference image as the exact same character "
f"(same species, face, colors, markings, proportions, and props), "
"preserving the same emotional tone/mood (e.g., scary stays scary, cute stays cute), "
f"draw a single WIDE horizontal strip of {frame_count} animation frames showing {action}. "
f"LAYOUT: arrange {frame_count} poses in ONE horizontal row at equal spacing, "
"each pose centered in its own imaginary equal region. Draw NO panel borders, "
"NO comic cells, NO boxes, NO vertical divider/gutter lines, NO grid, NO frame "
"outlines between poses — the backdrop is one unbroken flat field behind all of them. "
"Fill the WHOLE strip with the SAME single flat chroma-key color as the attached "
"reference image's background (identical hue in every frame, no per-pose color shifts). "
f"SPACING (critical): draw each pose at a consistent, healthy, clearly "
f"visible size (roughly {pose_px}px wide on a {_ASSUMED_STRIP_WIDTH}px "
f"strip) — do NOT shrink it tiny — but keep its ENTIRE silhouette "
f"(wings, tail, halo, horns, cape, every appendage) fully INSIDE its own "
f"cell. Leave at least {gap_px}px of empty chroma-key background between "
f"neighboring silhouettes at their closest point (wingtip to wingtip), and "
f"the same empty margin before the first pose and after the last. If a wing, "
f"cape, or tail would reach into a neighbor, FOLD or angle it inward rather "
f"than letting it cross the gap. Silhouettes must NEVER touch, overlap, "
f"share a shadow, share a ground line, share motion trails, or merge into "
f"one connected shape. "
# Registration: a clean sprite sheet keeps the character locked in place
# so only the action moves — this is what stops the loop sliding/pulsing.
"REGISTRATION (critical): the character is the SAME height and SAME width "
"in every frame, drawn at the SAME scale, centered over the SAME point, "
"with all feet aligned to the SAME invisible horizontal baseline across the "
"whole strip — this baseline is conceptual ONLY: draw NO ground line, floor, "
"platform, horizon, or contact shadow beneath the feet. Keep the body's center, size, and stance fixed frame to "
"frame — ONLY the limbs/features the action needs may move. Capes, cloaks, "
"bags, and scarves stay in the SAME place and shape every frame (no "
"swinging, flowing, or drifting) unless the action itself requires it. No "
"pose is cropped at the strip edges. "
f"{_BACKGROUND}{style_hint(style)}"
)

165
agent/pet/manifest.py Normal file
View File

@@ -0,0 +1,165 @@
"""Fetch the public petdex manifest.
``https://petdex.dev/api/manifest`` 307-redirects to a JSON document on R2:
{
"generatedAt": "...",
"total": 2926,
"pets": [
{"slug": "boba", "displayName": "Boba", "kind": "creature",
"submittedBy": "railly",
"spritesheetUrl": "https://assets.petdex.dev/.../spritesheet.webp",
"petJsonUrl": "https://assets.petdex.dev/.../pet.json",
"zipUrl": "https://assets.petdex.dev/.../boba.zip"},
...
]
}
Read-only and unauthenticated; no credentials involved.
"""
from __future__ import annotations
import logging
import threading
import time
from dataclasses import dataclass
logger = logging.getLogger(__name__)
MANIFEST_URL = "https://petdex.dev/api/manifest"
_DEFAULT_TIMEOUT = 10.0
# In-process cache for the (large, slow, identical-per-call) manifest. The list
# is a static CDN object that barely changes, yet a single session can ask for
# it many times — every gallery open, plus a full re-fetch per install/select
# (``find_entry``). A short TTL collapses those into one network hit without
# going stale for long. Cleared by :func:`clear_cache` (tests).
_MANIFEST_TTL = 300.0
_cache: tuple[float, list[ManifestEntry]] | None = None
_prefetch_lock = threading.Lock()
_prefetching = False
def clear_cache() -> None:
"""Drop the cached manifest (forces the next fetch to hit the network)."""
global _cache
_cache = None
def _cache_is_warm() -> bool:
return _cache is not None and time.monotonic() - _cache[0] < _MANIFEST_TTL
def prefetch(*, timeout: float = _DEFAULT_TIMEOUT) -> None:
"""Warm the manifest cache in a daemon thread — idempotent, never blocks.
The desktop picker calls this when it loads the (instant) local-only gallery
so the full petdex catalog is usually cached by the time it's requested,
without ever holding up the user's own pets on a network round-trip.
"""
global _prefetching
if _cache_is_warm():
return
with _prefetch_lock:
if _prefetching:
return
_prefetching = True
def _run() -> None:
global _prefetching
try:
fetch_manifest(timeout=timeout)
except Exception as exc: # noqa: BLE001 - best-effort warm
logger.debug("petdex manifest prefetch failed: %s", exc)
finally:
_prefetching = False
threading.Thread(target=_run, name="petdex-prefetch", daemon=True).start()
@dataclass(frozen=True)
class ManifestEntry:
"""A single pet's row in the manifest."""
slug: str
display_name: str
kind: str
submitted_by: str
spritesheet_url: str
pet_json_url: str
zip_url: str
@classmethod
def from_dict(cls, data: dict) -> "ManifestEntry":
return cls(
slug=str(data.get("slug", "")).strip(),
display_name=str(data.get("displayName", "") or data.get("slug", "")),
kind=str(data.get("kind", "") or "pet"),
submitted_by=str(data.get("submittedBy", "") or ""),
spritesheet_url=str(data.get("spritesheetUrl", "") or ""),
pet_json_url=str(data.get("petJsonUrl", "") or ""),
zip_url=str(data.get("zipUrl", "") or ""),
)
class ManifestError(RuntimeError):
"""Raised when the manifest can't be fetched or parsed."""
def fetch_manifest(*, timeout: float = _DEFAULT_TIMEOUT, force: bool = False) -> list[ManifestEntry]:
"""Return every approved pet from the public manifest.
Cached in-process for ``_MANIFEST_TTL`` seconds (pass ``force=True`` to
bypass). Follows the 307 redirect to R2. Raises :class:`ManifestError` on
any network/parse failure so callers can surface a clean message.
"""
global _cache
if not force and _cache is not None and time.monotonic() - _cache[0] < _MANIFEST_TTL:
return _cache[1]
try:
import httpx
except ImportError as exc: # pragma: no cover - httpx is a core dep
raise ManifestError("httpx is required to fetch the petdex manifest") from exc
try:
resp = httpx.get(
MANIFEST_URL,
timeout=timeout,
follow_redirects=True,
headers={"User-Agent": "hermes-agent-petdex"},
)
resp.raise_for_status()
payload = resp.json()
except Exception as exc: # noqa: BLE001 - normalize to one error type
raise ManifestError(f"could not fetch petdex manifest: {exc}") from exc
pets = payload.get("pets") if isinstance(payload, dict) else None
if not isinstance(pets, list):
raise ManifestError("petdex manifest had no 'pets' array")
entries: list[ManifestEntry] = []
for raw in pets:
if not isinstance(raw, dict):
continue
entry = ManifestEntry.from_dict(raw)
if entry.slug and entry.spritesheet_url:
entries.append(entry)
_cache = (time.monotonic(), entries)
return entries
def find_entry(slug: str, *, timeout: float = _DEFAULT_TIMEOUT) -> ManifestEntry | None:
"""Return the manifest entry for *slug*, or ``None`` if not listed."""
slug = slug.strip().lower()
for entry in fetch_manifest(timeout=timeout):
if entry.slug.lower() == slug:
return entry
return None

618
agent/pet/render.py Normal file
View File

@@ -0,0 +1,618 @@
"""Decode a pet spritesheet and encode frames for a terminal.
Shared by the base CLI (writes the escape bytes to its own stdout) and the
TUI (``tui_gateway`` ships the encoded bytes to Ink, which writes them) so the
decode + capability-detection + protocol-encoding logic exists exactly once.
Supported output modes, in fidelity order:
- ``kitty`` — the kitty graphics protocol (kitty, Ghostty, WezTerm).
- ``iterm`` — iTerm2 inline images (iTerm2, WezTerm).
- ``sixel`` — DEC sixel (xterm -ti vt340, foot, mlterm, WezTerm, …).
- ``unicode`` — 24-bit half-block downscale; works in any truecolor terminal.
Frame decoding requires Pillow (a core Hermes dependency). If Pillow or the
spritesheet is unavailable the renderer degrades to ``unicode`` text or an
empty string rather than raising.
"""
from __future__ import annotations
import base64
import io
import logging
import os
import sys
from functools import lru_cache
from pathlib import Path
from agent.pet.constants import (
DEFAULT_SCALE,
FRAME_H,
FRAME_W,
FRAMES_PER_STATE,
PetState,
state_row_index,
)
logger = logging.getLogger(__name__)
# Public render-mode names accepted by ``display.pet.render_mode``.
RENDER_MODES = ("auto", "kitty", "iterm", "sixel", "unicode", "off")
# ─────────────────────────────────────────────────────────────────────────
# Terminal capability detection
# ─────────────────────────────────────────────────────────────────────────
def detect_terminal_graphics() -> str:
"""Best-effort detection of the richest graphics protocol available.
Env-based (non-blocking — we never issue a DA1/terminal query that could
hang a pipe). Returns one of ``kitty`` / ``iterm`` / ``sixel`` /
``unicode``. Conservative: unknown terminals get ``unicode``, which works
anywhere with truecolor.
"""
term = os.environ.get("TERM", "").lower()
term_program = os.environ.get("TERM_PROGRAM", "").lower()
# The VS Code / Cursor integrated terminal sets TERM_PROGRAM=vscode
# authoritatively but does NOT scrub the terminal env vars it inherits when
# launched from another emulator (ITERM_SESSION_ID, KITTY_WINDOW_ID, …).
# Trusting those leaks emits an image protocol the embedded xterm.js can't
# display — you get a blank frame. Inline images there are opt-in
# (terminal.integrated.enableImages), so default to half-blocks, which
# always render in its truecolor grid. Users who enabled images can pin
# display.pet.render_mode explicitly.
if term_program == "vscode":
return "unicode"
# kitty graphics protocol
if os.environ.get("KITTY_WINDOW_ID") or "kitty" in term or "ghostty" in term:
return "kitty"
if term_program in {"ghostty"}:
return "kitty"
# WezTerm speaks both kitty and iterm; prefer kitty (richer placement).
if term_program == "wezterm" or os.environ.get("WEZTERM_PANE"):
return "kitty"
# iTerm2 inline images
if term_program == "iterm.app" or os.environ.get("ITERM_SESSION_ID"):
return "iterm"
# sixel-capable terminals (env heuristics only)
if term_program in {"mintty"} or "foot" in term or "mlterm" in term:
return "sixel"
if "sixel" in term:
return "sixel"
return "unicode"
def resolve_mode(configured: str | None, *, stream=None) -> str:
"""Resolve the effective render mode from config + the environment.
``configured`` is ``display.pet.render_mode`` (``auto`` → detect). Returns
``off`` when not attached to a TTY (no point emitting graphics into a pipe
or logfile).
"""
mode = (configured or "auto").strip().lower()
if mode not in RENDER_MODES:
mode = "auto"
if mode == "off":
return "off"
stream = stream or sys.stdout
try:
if not (hasattr(stream, "isatty") and stream.isatty()):
return "off"
except (ValueError, OSError):
return "off"
if mode == "auto":
return detect_terminal_graphics()
return mode
# ─────────────────────────────────────────────────────────────────────────
# Frame decoding
# ─────────────────────────────────────────────────────────────────────────
def _open_sheet(path: Path):
from PIL import Image
img = Image.open(path)
return img.convert("RGBA")
# Max alpha at/below which a frame counts as blank padding. petdex sheets are
# left-packed: a state with fewer real frames than ``FRAMES_PER_STATE`` fills
# the trailing columns with fully transparent cells. Animating into one flashes
# the pet blank, so we stop the row at the first such gap.
_BLANK_ALPHA = 8
def _frame_is_blank(frame) -> bool:
"""True if *frame* has no meaningfully opaque pixel (transparent padding)."""
return frame.getchannel("A").getextrema()[1] <= _BLANK_ALPHA
@lru_cache(maxsize=16)
def _raw_frames(
sheet_path: str,
state_value: str,
frame_w: int,
frame_h: int,
frames_per_state: int,
) -> tuple:
"""Cropped, padding-trimmed RGBA frames for one state row (unscaled).
Steps across the row until the first blank column so pets with ragged
per-state frame counts never animate into empty padding. Cached; returns
``()`` on any decode failure.
"""
try:
sheet = _open_sheet(Path(sheet_path))
cols = max(1, sheet.width // frame_w)
rows = max(1, sheet.height // frame_h)
row = state_row_index(state_value, rows)
top = row * frame_h
# Clamp the row to the sheet (some pets ship fewer rows than the 8 the
# taxonomy reserves).
if top + frame_h > sheet.height:
top = max(0, sheet.height - frame_h)
frames = []
for i in range(min(frames_per_state, cols)):
left = i * frame_w
frame = sheet.crop((left, top, left + frame_w, top + frame_h))
if _frame_is_blank(frame):
break # trailing transparent padding — real frames end here
frames.append(frame)
return tuple(frames)
except Exception as exc: # noqa: BLE001 - cosmetic feature, never fatal
logger.debug("pet frame decode failed (%s, %s): %s", sheet_path, state_value, exc)
return ()
@lru_cache(maxsize=8)
def _frames_for(
sheet_path: str,
state_value: str,
frame_w: int,
frame_h: int,
frames_per_state: int,
scale_w: int,
scale_h: int,
):
"""Return padding-trimmed RGBA frames for one state row, scaled.
Thin scaling layer over :func:`_raw_frames`; both are cached so repeated
frame requests during animation are free.
"""
raw = _raw_frames(sheet_path, state_value, frame_w, frame_h, frames_per_state)
if not raw or (scale_w, scale_h) == (frame_w, frame_h):
return list(raw)
from PIL import Image
return [f.resize((scale_w, scale_h), Image.LANCZOS) for f in raw]
def state_frame_counts(
sheet_path: str | Path,
*,
frame_w: int = FRAME_W,
frame_h: int = FRAME_H,
frames_per_state: int = FRAMES_PER_STATE,
) -> dict[str, int]:
"""Map each driven :class:`PetState` → its real (padding-trimmed) frame count.
The single source of truth for "how many frames does this state actually
have?". The CLI/TUI consume the trimmed frame lists directly; the gateway
ships this map to the desktop canvas, which steps its own loop.
"""
return {
state.value: len(
_raw_frames(str(sheet_path), state.value, frame_w, frame_h, frames_per_state)
)
for state in PetState
}
# ─────────────────────────────────────────────────────────────────────────
# Encoders
# ─────────────────────────────────────────────────────────────────────────
def _png_bytes(frame) -> bytes:
buf = io.BytesIO()
frame.save(buf, format="PNG")
return buf.getvalue()
def _kitty_apc(ctrl: str, data: str) -> str:
"""Emit a kitty APC escape for *data*, chunked into ≤4096-byte ``m`` pieces."""
chunk = 4096
if len(data) <= chunk:
return f"\x1b_G{ctrl},m=0;{data}\x1b\\"
out = [f"\x1b_G{ctrl},m=1;{data[:chunk]}\x1b\\"]
rest = data[chunk:]
while rest:
piece, rest = rest[:chunk], rest[chunk:]
out.append(f"\x1b_Gm={1 if rest else 0};{piece}\x1b\\")
return "".join(out)
def _encode_kitty(frame, *, cell_cols: int | None = None, cell_rows: int | None = None) -> str:
"""Encode one frame via the kitty graphics protocol (transmit + display).
``a=T`` transmits & displays at the cursor; ``c``/``r`` request a display
box in terminal cells so successive frames overwrite the same area.
"""
ctrl = "f=100,a=T,q=2"
if cell_cols:
ctrl += f",c={cell_cols}"
if cell_rows:
ctrl += f",r={cell_rows}"
return _kitty_apc(ctrl, base64.standard_b64encode(_png_bytes(frame)).decode("ascii"))
# ─────────────────────────────────────────────────────────────────────────
# kitty Unicode placeholders
#
# Ink (the TUI's React-for-terminal layer) owns the screen and measures every
# cell's width, so it can't host raw kitty image escapes (no width to count,
# clobbered on the next repaint). kitty's *Unicode placeholder* protocol is the
# grid-safe path: transmit the image once (q=2, virtual placement U=1), then the
# host app prints ordinary-width placeholder cells (U+10EEEE + diacritics) whose
# foreground color encodes the image id. Ink counts those as width-1 text, so
# layout stays correct and the terminal paints the image underneath.
# https://sw.kovidgoyal.net/kitty/graphics-protocol/#unicode-placeholders
# ─────────────────────────────────────────────────────────────────────────
_KITTY_PLACEHOLDER = "\U0010eeee"
# Row/column diacritics, in order (index → diacritic). Verbatim from kitty's
# gen/rowcolumn-diacritics.txt (Unicode 6.0.0, combining class 230). Index i is
# the diacritic that encodes the number i; we only ever need the row index.
_ROWCOL_DIACRITICS: tuple[int, ...] = (
0x0305, 0x030D, 0x030E, 0x0310, 0x0312, 0x033D, 0x033E, 0x033F, 0x0346, 0x034A,
0x034B, 0x034C, 0x0350, 0x0351, 0x0352, 0x0357, 0x035B, 0x0363, 0x0364, 0x0365,
0x0366, 0x0367, 0x0368, 0x0369, 0x036A, 0x036B, 0x036C, 0x036D, 0x036E, 0x036F,
0x0483, 0x0484, 0x0485, 0x0486, 0x0487, 0x0592, 0x0593, 0x0594, 0x0595, 0x0597,
0x0598, 0x0599, 0x059C, 0x059D, 0x059E, 0x059F, 0x05A0, 0x05A1, 0x05A8, 0x05A9,
0x05AB, 0x05AC, 0x05AF, 0x05C4, 0x0610, 0x0611, 0x0612, 0x0613, 0x0614, 0x0615,
0x0616, 0x0617, 0x0657, 0x0658, 0x0659, 0x065A, 0x065B, 0x065D, 0x065E, 0x06D6,
0x06D7, 0x06D8, 0x06D9, 0x06DA, 0x06DB, 0x06DC, 0x06DF, 0x06E0, 0x06E1, 0x06E2,
0x06E4, 0x06E7, 0x06E8, 0x06EB, 0x06EC, 0x0730, 0x0732, 0x0733, 0x0735, 0x0736,
0x073A, 0x073D, 0x073F, 0x0740, 0x0741, 0x0743, 0x0745, 0x0747, 0x0749, 0x074A,
0x07EB, 0x07EC, 0x07ED, 0x07EE, 0x07EF, 0x07F0, 0x07F1, 0x07F3, 0x0816, 0x0817,
0x0818, 0x0819, 0x081B, 0x081C, 0x081D, 0x081E, 0x081F, 0x0820, 0x0821, 0x0822,
0x0823, 0x0825, 0x0826, 0x0827, 0x0829, 0x082A, 0x082B, 0x082C, 0x082D, 0x0951,
0x0953, 0x0954, 0x0F82, 0x0F83, 0x0F86, 0x0F87, 0x135D, 0x135E, 0x135F, 0x17DD,
0x193A, 0x1A17, 0x1A75, 0x1A76, 0x1A77, 0x1A78, 0x1A79, 0x1A7A, 0x1A7B, 0x1A7C,
0x1B6B, 0x1B6D, 0x1B6E, 0x1B6F, 0x1B70, 0x1B71, 0x1B72, 0x1B73, 0x1CD0, 0x1CD1,
0x1CD2, 0x1CDA, 0x1CDB, 0x1CE0, 0x1DC0, 0x1DC1, 0x1DC3, 0x1DC4, 0x1DC5, 0x1DC6,
0x1DC7, 0x1DC8, 0x1DC9, 0x1DCB, 0x1DCC, 0x1DD1, 0x1DD2, 0x1DD3, 0x1DD4, 0x1DD5,
0x1DD6, 0x1DD7, 0x1DD8, 0x1DD9, 0x1DDA, 0x1DDB, 0x1DDC, 0x1DDD, 0x1DDE, 0x1DDF,
0x1DE0, 0x1DE1, 0x1DE2, 0x1DE3, 0x1DE4, 0x1DE5, 0x1DE6, 0x1DFE, 0x20D0, 0x20D1,
0x20D4, 0x20D5, 0x20D6, 0x20D7, 0x20DB, 0x20DC, 0x20E1, 0x20E7, 0x20E9, 0x20F0,
0x2CEF, 0x2CF0, 0x2CF1, 0x2DE0, 0x2DE1, 0x2DE2, 0x2DE3, 0x2DE4, 0x2DE5, 0x2DE6,
0x2DE7, 0x2DE8, 0x2DE9, 0x2DEA, 0x2DEB, 0x2DEC, 0x2DED, 0x2DEE, 0x2DEF, 0x2DF0,
0x2DF1, 0x2DF2, 0x2DF3, 0x2DF4, 0x2DF5, 0x2DF6, 0x2DF7, 0x2DF8, 0x2DF9, 0x2DFA,
0x2DFB, 0x2DFC, 0x2DFD, 0x2DFE, 0x2DFF, 0xA66F, 0xA67C, 0xA67D, 0xA6F0, 0xA6F1,
0xA8E0, 0xA8E1, 0xA8E2, 0xA8E3, 0xA8E4, 0xA8E5, 0xA8E6, 0xA8E7, 0xA8E8, 0xA8E9,
0xA8EA, 0xA8EB, 0xA8EC, 0xA8ED, 0xA8EE, 0xA8EF, 0xA8F0, 0xA8F1, 0xAAB0, 0xAAB2,
0xAAB3, 0xAAB7, 0xAAB8, 0xAABE, 0xAABF, 0xAAC1, 0xFE20, 0xFE21, 0xFE22, 0xFE23,
0xFE24, 0xFE25, 0xFE26, 0x10A0F, 0x10A38, 0x1D185, 0x1D186, 0x1D187, 0x1D188,
0x1D189, 0x1D1AA, 0x1D1AB, 0x1D1AC, 0x1D1AD, 0x1D242, 0x1D243, 0x1D244,
)
def kitty_image_id(slug: str) -> int:
"""Stable per-pet image id in ``[1, 0x7FFF]``.
The id is encoded in the placeholder's 24-bit foreground color, so it must
be non-zero and fit comfortably under ``0xFFFFFF``. A small CRC keeps it
deterministic per slug (so re-renders reuse the same terminal-side image)
while making collisions between two different pets unlikely.
"""
import zlib
return (zlib.crc32(slug.encode("utf-8")) % 0x7FFE) + 1
def kitty_color_hex(image_id: int) -> str:
"""Hex foreground color (``#rrggbb``) that encodes *image_id* for kitty."""
return "#%06x" % (image_id & 0xFFFFFF)
def kitty_placeholder_rows(cols: int, rows: int) -> list[str]:
"""Build the placeholder text grid for an *rows*×*cols* image.
Each line is one row of the grid: the first cell carries the row diacritic
(column defaults to 0), and the remaining ``cols-1`` bare placeholders let
the terminal auto-increment the column. The foreground color (the image id)
is applied by the caller / Ink, not embedded here.
"""
cols = max(1, cols)
out: list[str] = []
for r in range(max(1, rows)):
idx = min(r, len(_ROWCOL_DIACRITICS) - 1)
first = _KITTY_PLACEHOLDER + chr(_ROWCOL_DIACRITICS[idx])
out.append(first + _KITTY_PLACEHOLDER * (cols - 1))
return out
def _encode_kitty_virtual(frame, *, image_id: int, cols: int, rows: int) -> str:
"""Transmit a frame as a kitty *virtual* placement for Unicode placeholders.
``a=T`` transmits and creates the placement in one shot; ``U=1`` marks it
virtual (no on-screen output, cursor untouched); ``q=2`` suppresses the
terminal's OK/error replies that would otherwise corrupt the host app's
output. Re-sending with the same ``i`` replaces the image, so the static
placeholder cells animate underneath.
"""
ctrl = f"a=T,U=1,i={image_id},c={cols},r={rows},f=100,q=2"
return _kitty_apc(ctrl, base64.standard_b64encode(_png_bytes(frame)).decode("ascii"))
def _encode_iterm(frame, *, cell_cols: int | None = None, cell_rows: int | None = None) -> str:
"""Encode one frame as an iTerm2 inline image (OSC 1337 File)."""
payload = base64.standard_b64encode(_png_bytes(frame)).decode("ascii")
size = len(payload)
args = [f"inline=1", f"size={size}", "preserveAspectRatio=1"]
if cell_cols:
args.append(f"width={cell_cols}")
if cell_rows:
args.append(f"height={cell_rows}")
return f"\x1b]1337;File={';'.join(args)}:{payload}\x07"
def _encode_sixel(frame) -> str:
"""Encode one frame as DEC sixel.
Quantizes to an adaptive palette (≤255 colors) and emits the sixel band
stream. Pillow has no sixel writer, so this is a compact hand-rolled
encoder. Transparent pixels render as background (color register skipped).
"""
from PIL import Image
rgba = frame
# Composite onto transparent-as-skip: track alpha to decide background.
pal = rgba.convert("RGB").quantize(colors=255, method=Image.MEDIANCUT)
palette = pal.getpalette() or []
px = pal.load()
alpha = rgba.getchannel("A").load()
w, h = pal.size
out = ["\x1bP0;1;0q", '"1;1;%d;%d' % (w, h)]
# Color register definitions (sixel uses 0..100 scale).
used = sorted({px[x, y] for y in range(h) for x in range(w)})
for idx in used:
r = palette[idx * 3] if idx * 3 < len(palette) else 0
g = palette[idx * 3 + 1] if idx * 3 + 1 < len(palette) else 0
b = palette[idx * 3 + 2] if idx * 3 + 2 < len(palette) else 0
out.append("#%d;2;%d;%d;%d" % (idx, r * 100 // 255, g * 100 // 255, b * 100 // 255))
# Emit in 6-row bands.
for band in range(0, h, 6):
for color_idx in used:
line = ["#%d" % color_idx]
run_char = None
run_len = 0
def flush():
nonlocal run_char, run_len
if run_char is None:
return
if run_len > 3:
line.append("!%d%s" % (run_len, run_char))
else:
line.append(run_char * run_len)
run_char, run_len = None, 0
for x in range(w):
bits = 0
for bit in range(6):
y = band + bit
if y < h and alpha[x, y] > 32 and px[x, y] == color_idx:
bits |= 1 << bit
ch = chr(63 + bits)
if ch == run_char:
run_len += 1
else:
flush()
run_char, run_len = ch, 1
flush()
out.append("".join(line) + "$") # carriage return within band
out.append("-") # next band
out.append("\x1b\\")
return "".join(out)
_HALF_BLOCK = ""
# A single half-block cell: top pixel + bottom pixel as (r, g, b, a) tuples.
Cell = tuple[tuple[int, int, int, int], tuple[int, int, int, int]]
def _downscale_cells(frame, *, target_cols: int) -> list[list[Cell]]:
"""Downscale a frame to a grid of half-block cells.
Each cell pairs a top and bottom pixel so one terminal row encodes two
pixel rows. Returns rows of ``((tr,tg,tb,ta),(br,bg,bb,ba))`` — the
framework-neutral representation shared by the ANSI encoder (CLI) and the
structured ``cells`` API (Ink).
"""
from PIL import Image
target_cols = max(4, target_cols)
aspect = frame.height / max(1, frame.width)
target_rows = max(2, int(round(target_cols * aspect * 0.5)) * 2)
small = frame.resize((target_cols, target_rows), Image.LANCZOS).convert("RGBA")
px = small.load()
grid: list[list[Cell]] = []
for y in range(0, target_rows, 2):
row: list[Cell] = []
for x in range(target_cols):
top = px[x, y]
bottom = px[x, y + 1] if y + 1 < target_rows else (0, 0, 0, 0)
row.append((top, bottom))
grid.append(row)
return grid
def _encode_unicode(frame, *, target_cols: int) -> str:
"""Downscale to truecolor ANSI half-blocks (one char = 2 vertical pixels)."""
lines: list[str] = []
for row in _downscale_cells(frame, target_cols=target_cols):
cells: list[str] = []
for (tr, tg, tb, ta), (br, bg, bb, ba) in row:
if ta < 32 and ba < 32:
cells.append("\x1b[0m ") # fully transparent → blank
continue
cells.append(f"\x1b[38;2;{tr};{tg};{tb}m\x1b[48;2;{br};{bg};{bb}m{_HALF_BLOCK}")
lines.append("".join(cells) + "\x1b[0m")
return "\n".join(lines)
# ─────────────────────────────────────────────────────────────────────────
# Public renderer
# ─────────────────────────────────────────────────────────────────────────
class PetRenderer:
"""Holds a pet's spritesheet and yields encoded frames per (state, index).
Construct once per pet, then call :meth:`frame` on an animation timer.
Cheap to call repeatedly — decoded frames are cached.
"""
def __init__(
self,
spritesheet: str | Path,
*,
mode: str = "unicode",
scale: float = DEFAULT_SCALE,
unicode_cols: int = 20,
frame_w: int = FRAME_W,
frame_h: int = FRAME_H,
frames_per_state: int = FRAMES_PER_STATE,
) -> None:
self.spritesheet = str(spritesheet)
self.mode = mode if mode in RENDER_MODES else "unicode"
self.scale = scale
self.unicode_cols = unicode_cols
self.frame_w = frame_w
self.frame_h = frame_h
self.frames_per_state = frames_per_state
@property
def available(self) -> bool:
return self.mode != "off" and Path(self.spritesheet).is_file()
def frame_count(self, state: PetState | str) -> int:
return len(self._frames(state))
def _frames(self, state: PetState | str):
value = state.value if isinstance(state, PetState) else str(state)
scale_w = max(1, int(self.frame_w * self.scale))
scale_h = max(1, int(self.frame_h * self.scale))
return _frames_for(
self.spritesheet,
value,
self.frame_w,
self.frame_h,
self.frames_per_state,
scale_w,
scale_h,
)
def cells(self, state: PetState | str, index: int, *, cols: int | None = None) -> list[list[Cell]]:
"""Return one frame as a half-block cell grid (framework-neutral).
Used by the TUI, which renders the grid with native Ink color props
instead of raw ANSI. Returns ``[]`` when no frame is available.
"""
frames = self._frames(state)
if not frames:
return []
frame = frames[index % len(frames)]
return _downscale_cells(frame, target_cols=cols or self.unicode_cols)
def _cell_box(self, frame) -> tuple[int, int]:
"""Terminal cell box for a scaled frame (~8×16 px per cell).
Must match :meth:`frame` graphics sizing — kitty stretches the image to
fill ``c``×``r`` cells, so these must reflect the scaled pixel
dimensions, not a native-aspect column count (that upscales small pets).
"""
return max(1, frame.width // 8), max(1, frame.height // 16)
def kitty_payload(self, state: PetState | str, *, image_id: int) -> dict | None:
"""Build the kitty Unicode-placeholder payload for one state.
Returns ``{cols, rows, placeholder, frames}`` where ``frames`` is a
list of transmit escapes (one per animation frame, all reusing
``image_id``) and ``placeholder`` is the static text grid Ink paints.
Placement geometry is derived from the scaled frame pixels (via
:meth:`_cell_box`), not ``unicode_cols`` — kitty upscales to fill
``c``×``r`` cells. ``None`` when no frame is available.
"""
frames = self._frames(state)
if not frames:
return None
cols, rows = self._cell_box(frames[0])
return {
"cols": cols,
"rows": rows,
"placeholder": kitty_placeholder_rows(cols, rows),
"frames": [
_encode_kitty_virtual(f, image_id=image_id, cols=cols, rows=rows) for f in frames
],
}
def frame(self, state: PetState | str, index: int) -> str:
"""Return the encoded escape string for one frame, or ``""``.
``index`` is taken modulo the available frame count so callers can pass
a free-running counter.
"""
if self.mode == "off":
return ""
frames = self._frames(state)
if not frames:
return ""
frame = frames[index % len(frames)]
cell_cols, cell_rows = self._cell_box(frame)
try:
if self.mode == "kitty":
return _encode_kitty(frame, cell_cols=cell_cols, cell_rows=cell_rows)
if self.mode == "iterm":
return _encode_iterm(frame, cell_cols=cell_cols, cell_rows=cell_rows)
if self.mode == "sixel":
return _encode_sixel(frame)
return _encode_unicode(frame, target_cols=self.unicode_cols)
except Exception as exc: # noqa: BLE001 - degrade silently
logger.debug("pet frame encode failed (mode=%s): %s", self.mode, exc)
return ""
def build_renderer(
spritesheet: str | Path,
*,
configured_mode: str | None = None,
scale: float = DEFAULT_SCALE,
unicode_cols: int = 20,
stream=None,
) -> PetRenderer:
"""Convenience factory: resolve the mode from config+env, then construct."""
mode = resolve_mode(configured_mode, stream=stream)
return PetRenderer(
spritesheet,
mode=mode,
scale=scale,
unicode_cols=unicode_cols,
)

81
agent/pet/state.py Normal file
View File

@@ -0,0 +1,81 @@
"""Map agent activity → a :class:`PetState`.
This is the one place the "what is the agent doing right now?""which
animation row?" decision lives. Each surface feeds it the signals it already
tracks:
- CLI — ``KawaiiSpinner`` waiting/thinking state + tool outcomes.
- TUI — gateway ``tool.start/complete`` + ``message.delta/complete`` events.
- Desktop — the ``$busy``/``$awaitingResponse``/tool-event nanostores
(re-implemented in TS, but mirroring this priority order).
Keeping the priority order here (and documenting it) lets the TypeScript
mirror stay faithful without a second design.
"""
from __future__ import annotations
from collections.abc import Iterable
from typing import Any
from agent.pet.constants import PetState
def todos_all_done(todos: Iterable[Any] | None) -> bool:
"""True iff there's ≥1 todo and every one is completed/cancelled.
The "celebrate" beat (``JUMP``) fires when a plan finishes; this mirrors
the TUI's ``isTodoDone`` so the trigger is defined once across surfaces.
Accepts dicts (``{"status": ...}``) or objects with a ``status`` attr.
"""
items = list(todos or [])
if not items:
return False
def _status(t: Any) -> Any:
return t.get("status") if isinstance(t, dict) else getattr(t, "status", None)
return all(_status(t) in ("completed", "cancelled") for t in items)
def derive_pet_state(
*,
busy: bool = False,
awaiting_input: bool = False,
error: bool = False,
celebrate: bool = False,
just_completed: bool = False,
tool_running: bool = False,
reasoning: bool = False,
) -> PetState:
"""Resolve the animation state from coarse activity signals.
Priority (highest first) — only one row can show at a time, so the most
salient signal wins:
1. ``error`` → ``FAILED`` (a tool/turn just failed)
2. ``celebrate`` → ``JUMP`` (explicit success beat, e.g. todos done)
3. ``just_completed`` → ``WAVE`` (turn finished cleanly / greeting)
4. ``awaiting_input`` → ``WAITING`` (blocked on the user — a clarify/approval
prompt is open; this outranks the in-flight signals below because the turn
is paused on *you*, even though a tool is technically mid-call)
5. ``tool_running`` → ``RUN`` (a tool is executing)
6. ``reasoning`` → ``REVIEW`` (model is thinking / reading)
7. ``busy`` → ``RUN`` (turn in flight, unspecified work)
8. otherwise → ``IDLE``
"""
if error:
return PetState.FAILED
if celebrate:
return PetState.JUMP
if just_completed:
return PetState.WAVE
if awaiting_input:
return PetState.WAITING
if tool_running:
return PetState.RUN
if reasoning:
return PetState.REVIEW
if busy:
return PetState.RUN
return PetState.IDLE

503
agent/pet/store.py Normal file
View File

@@ -0,0 +1,503 @@
"""On-disk pet store — install / list / resolve pets.
Pets live under ``get_hermes_home()/pets/<slug>/`` so every profile gets its
own set (we deliberately do **not** reuse petdex's ``~/.codex/pets`` default —
that's owned by the petdex npm CLI and isn't profile-aware). Each installed
pet directory holds:
pets/<slug>/
pet.json # {id, displayName, description, spritesheetPath}
spritesheet.webp # (or .png)
The active pet is resolved from the caller-supplied ``display.pet.slug`` config
value (falling back to the first installed pet), so this module stays free of
the config loader.
"""
from __future__ import annotations
import json
import logging
import re
from dataclasses import dataclass
from pathlib import Path
from hermes_constants import get_hermes_home
logger = logging.getLogger(__name__)
_DOWNLOAD_TIMEOUT = 60.0
class PetStoreError(RuntimeError):
"""Raised on install/IO failures."""
@dataclass(frozen=True)
class InstalledPet:
"""A pet present on disk."""
slug: str
display_name: str
description: str
directory: Path
spritesheet: Path
created_by: str = "" # "generator" for pets hatched locally; "" for petdex installs
@property
def exists(self) -> bool:
return self.spritesheet.is_file()
@property
def generated(self) -> bool:
return self.created_by == "generator"
def pets_dir() -> Path:
"""Return the profile-scoped pets directory (created on demand)."""
path = get_hermes_home() / "pets"
path.mkdir(parents=True, exist_ok=True)
return path
def _read_pet_json(directory: Path) -> dict:
pet_json = directory / "pet.json"
if not pet_json.is_file():
return {}
try:
return json.loads(pet_json.read_text(encoding="utf-8"))
except (OSError, ValueError) as exc:
logger.debug("unreadable pet.json in %s: %s", directory, exc)
return {}
def _resolve_spritesheet(directory: Path, meta: dict) -> Path:
"""Find the spritesheet for a pet dir.
Honors ``spritesheetPath`` from pet.json, else probes the conventional
filenames (``spritesheet.{webp,png}`` and petdex R2's ``sprite.webp``).
"""
declared = str(meta.get("spritesheetPath", "") or "").strip()
if declared:
candidate = directory / declared
if candidate.is_file():
return candidate
for name in ("spritesheet.webp", "spritesheet.png", "sprite.webp", "sprite.png"):
candidate = directory / name
if candidate.is_file():
return candidate
# Default expectation even if missing, so callers get a stable path.
return directory / "spritesheet.webp"
def _safe_slug(slug: str) -> str:
"""Normalize a slug to a single bare path segment.
Pet slugs index into ``pets_dir()/<slug>/`` for load/remove, so a value
carrying path separators (``../``, absolute paths) could escape the pets
directory. Strip every separator and reject ``.``/``..`` so callers can
only ever name a direct child of the pets directory.
"""
segment = Path(str(slug).strip()).name
if segment in ("", ".", ".."):
return ""
return segment
def load_pet(slug: str) -> InstalledPet | None:
"""Return the :class:`InstalledPet` for *slug*, or ``None`` if absent."""
slug = _safe_slug(slug)
if not slug:
return None
directory = pets_dir() / slug
if not directory.is_dir():
return None
meta = _read_pet_json(directory)
return InstalledPet(
slug=slug,
display_name=str(meta.get("displayName", "") or slug),
description=str(meta.get("description", "") or ""),
directory=directory,
spritesheet=_resolve_spritesheet(directory, meta),
created_by=str(meta.get("createdBy", "") or ""),
)
def installed_pets() -> list[InstalledPet]:
"""Return every installed pet (dirs containing a usable spritesheet)."""
out: list[InstalledPet] = []
for child in sorted(pets_dir().iterdir()):
if not child.is_dir():
continue
pet = load_pet(child.name)
if pet and pet.exists:
out.append(pet)
return out
def resolve_active_pet(configured_slug: str | None = None) -> InstalledPet | None:
"""Resolve which pet to display.
Precedence: the configured slug (``display.pet.slug``) if it's installed,
otherwise the first installed pet alphabetically, otherwise ``None``.
"""
if configured_slug:
pet = load_pet(configured_slug.strip())
if pet and pet.exists:
return pet
pets = installed_pets()
return pets[0] if pets else None
def install_pet(slug: str, *, force: bool = False, timeout: float = _DOWNLOAD_TIMEOUT) -> InstalledPet:
"""Download *slug* from the manifest into the pets directory.
Idempotent: a fully-installed pet is returned as-is unless *force*. Raises
:class:`PetStoreError` / :class:`~agent.pet.manifest.ManifestError` on
failure.
"""
from agent.pet.manifest import find_entry
slug = _safe_slug(slug)
if not slug:
raise PetStoreError("invalid pet slug")
existing = load_pet(slug)
if existing and existing.exists and not force:
return existing
entry = find_entry(slug, timeout=timeout)
if entry is None:
raise PetStoreError(f"pet '{slug}' is not in the petdex manifest")
# Host-pin every asset URL to petdex. The manifest is trusted (HTTPS from
# petdex.dev), but pin the asset hosts too so a compromised/spoofed manifest
# can't redirect the download at an arbitrary host. Matches thumbnail_png.
if not _is_petdex_host(entry.spritesheet_url):
raise PetStoreError(f"refusing non-petdex spritesheet host for '{slug}'")
directory = pets_dir() / slug
directory.mkdir(parents=True, exist_ok=True)
sprite_ext = ".png" if entry.spritesheet_url.lower().split("?")[0].endswith(".png") else ".webp"
sprite_path = directory / f"spritesheet{sprite_ext}"
_download(entry.spritesheet_url, sprite_path, timeout=timeout)
# Fetch the upstream pet.json if present; otherwise synthesize a minimal
# one so the local layout is self-describing.
meta: dict = {}
if entry.pet_json_url and _is_petdex_host(entry.pet_json_url):
try:
meta = _download_json(entry.pet_json_url, timeout=timeout)
except Exception as exc: # noqa: BLE001 - non-fatal, fall back below
logger.debug("pet.json fetch failed for %s: %s", slug, exc)
if not isinstance(meta, dict) or not meta:
meta = {"id": slug, "displayName": entry.display_name, "description": ""}
meta["spritesheetPath"] = sprite_path.name
meta.setdefault("id", slug)
meta.setdefault("displayName", entry.display_name)
(directory / "pet.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
pet = load_pet(slug)
if pet is None or not pet.exists:
raise PetStoreError(f"install of '{slug}' did not produce a spritesheet")
return pet
def slugify(name: str) -> str:
"""Lowercase, hyphenate, and strip a display name into a filesystem slug."""
slug = re.sub(r"[^a-z0-9]+", "-", (name or "").strip().lower()).strip("-")
return slug or "pet"
def unique_slug(name: str) -> str:
"""A :func:`slugify` result that doesn't collide with an existing pet dir."""
base = slugify(name)
slug = base
counter = 2
while (pets_dir() / slug).exists():
slug = f"{base}-{counter}"
counter += 1
return slug
def _write_spritesheet(source, dest: Path) -> None:
"""Write *source* (PIL image, bytes, or path) as a lossless WebP at *dest*."""
if isinstance(source, (bytes, bytearray)):
dest.write_bytes(bytes(source))
return
from PIL import Image
if isinstance(source, (str, Path)):
with Image.open(source) as opened:
image = opened.convert("RGBA")
else:
image = source.convert("RGBA")
image.save(dest, format="WEBP", lossless=True, quality=100, method=6, exact=True)
def register_local_pet(
spritesheet,
*,
slug: str,
display_name: str = "",
description: str = "",
) -> InstalledPet:
"""Write a locally-generated pet into the store and return it.
*spritesheet* may be a PIL image, raw WebP/PNG bytes, or a path. The pet
appears in :func:`installed_pets` immediately, and because :func:`install_pet`
returns an already-on-disk pet before consulting the manifest, it can be
adopted (``pet.select`` / ``/pet <slug>``) without a manifest entry.
"""
slug = slugify(slug)
directory = pets_dir() / slug
directory.mkdir(parents=True, exist_ok=True)
sprite_path = directory / "spritesheet.webp"
try:
_write_spritesheet(spritesheet, sprite_path)
except Exception as exc: # noqa: BLE001 - normalize to one error type
raise PetStoreError(f"could not write spritesheet for '{slug}': {exc}") from exc
meta = {
"id": slug,
"displayName": display_name or slug,
"description": description or "",
"spritesheetPath": sprite_path.name,
"createdBy": "generator",
}
(directory / "pet.json").write_text(json.dumps(meta, indent=2), encoding="utf-8")
pet = load_pet(slug)
if pet is None or not pet.exists:
raise PetStoreError(f"register of generated pet '{slug}' did not produce a spritesheet")
return pet
def export_pet(slug: str) -> tuple[str, bytes]:
"""Zip an installed pet's folder (pet.json + spritesheet) → (filename, bytes).
Dotfiles (cached thumbs, backups) are skipped so the archive is a clean,
re-importable pet package. Raises :class:`PetStoreError` if not installed.
"""
import io
import zipfile
root = pets_dir()
directory = root / slug.strip()
# Guard against traversal: the target must be a direct child of pets_dir.
if directory.resolve().parent != root.resolve() or not directory.is_dir():
raise PetStoreError(f"pet '{slug}' is not installed")
name = directory.name
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
for path in sorted(directory.iterdir()):
if path.is_file() and not path.name.startswith("."):
archive.write(path, f"{name}/{path.name}")
return f"{name}.zip", buf.getvalue()
_THUMB_FRAME_W = 192
_THUMB_FRAME_H = 208
_THUMB_W = 96 # rendered ~40px; 2x+ keeps it crisp on HiDPI
def _thumbs_dir() -> Path:
path = pets_dir() / ".thumbs"
path.mkdir(parents=True, exist_ok=True)
return path
def _is_petdex_host(url: str) -> bool:
"""True only for petdex.dev hosts — bounds server-side fetch (anti-SSRF)."""
from urllib.parse import urlparse
try:
host = (urlparse(url).hostname or "").lower()
except ValueError:
return False
return host == "petdex.dev" or host.endswith(".petdex.dev")
def thumbnail_png(slug: str, *, source_url: str = "", timeout: float = 30.0) -> bytes | None:
"""Return a small idle-frame PNG for *slug*, cached on disk.
Crops the top-left (idle, frame 0) cell of the spritesheet and downsamples
it to a thumbnail. Source preference: an installed spritesheet on disk, else
*source_url* — but only when it points at petdex (so the gateway never
fetches an arbitrary client-supplied URL). Returns ``None`` when there's no
usable source or Pillow/network fails; callers render a placeholder.
Doing this server-side sidesteps the renderer's CSP / R2 hotlink limits that
break a direct ``<img src=cdn>`` and lets the result ride the authenticated
gateway as a same-origin data URL.
"""
slug = slug.strip()
if not slug:
return None
cache = _thumbs_dir() / f"{slug}.png"
if cache.is_file():
try:
return cache.read_bytes()
except OSError:
pass
sheet_bytes: bytes | None = None
pet = load_pet(slug)
if pet and pet.exists:
try:
sheet_bytes = pet.spritesheet.read_bytes()
except OSError:
sheet_bytes = None
if sheet_bytes is None and source_url and _is_petdex_host(source_url):
try:
import httpx
resp = httpx.get(
source_url,
timeout=timeout,
follow_redirects=True,
headers={"User-Agent": "hermes-agent-petdex"},
)
resp.raise_for_status()
sheet_bytes = resp.content
except Exception as exc: # noqa: BLE001 - cosmetic, degrade to placeholder
logger.debug("thumb fetch failed for %s: %s", slug, exc)
if not sheet_bytes:
return None
try:
import io
from PIL import Image
with Image.open(io.BytesIO(sheet_bytes)) as im:
frame = im.convert("RGBA").crop(
(0, 0, min(_THUMB_FRAME_W, im.width), min(_THUMB_FRAME_H, im.height))
)
height = round(_THUMB_W * _THUMB_FRAME_H / _THUMB_FRAME_W)
frame = frame.resize((_THUMB_W, height), Image.NEAREST)
buf = io.BytesIO()
frame.save(buf, format="PNG")
data = buf.getvalue()
except Exception as exc: # noqa: BLE001
logger.debug("thumb crop failed for %s: %s", slug, exc)
return None
try:
cache.write_bytes(data)
except OSError:
pass
return data
def remove_pet(slug: str) -> bool:
"""Delete an installed pet directory. Returns True if anything was removed."""
import shutil
slug = _safe_slug(slug)
if not slug:
return False
# The cached thumbnail lives in pets/.thumbs/<slug>.png — OUTSIDE the pet
# dir, so rmtree won't catch it. Drop it too, or a later pet that reuses this
# slug renders this one's stale thumbnail.
try:
(_thumbs_dir() / f"{slug}.png").unlink(missing_ok=True)
except OSError:
pass
directory = pets_dir() / slug
if not directory.is_dir():
return False
shutil.rmtree(directory, ignore_errors=True)
return not directory.exists()
def rename_pet(slug: str, display_name: str) -> str | None:
"""Rename a pet's ``displayName`` AND realign its slug/dir to match.
Generated pets are hatched under a provisional, prompt-derived slug; when
the user names the pet on the reveal screen we make that name the real
identity so lists/subtitles show what they typed, not the prompt. The dir is
renamed to ``slugify(name)`` (and the cached thumbnail moved alongside it)
whenever that yields a free, different slug — otherwise the slug is left as
is. Returns the resulting slug on success, or ``None`` on failure.
"""
slug = _safe_slug(slug)
display_name = (display_name or "").strip()
if not slug or not display_name:
return None
directory = pets_dir() / slug
pet_json = directory / "pet.json"
if not pet_json.is_file():
return None
try:
meta = json.loads(pet_json.read_text(encoding="utf-8"))
except (OSError, ValueError):
meta = {}
if not isinstance(meta, dict):
meta = {}
meta["displayName"] = display_name
new_slug = slug
desired = slugify(display_name)
if desired and desired != slug and not (pets_dir() / desired).exists():
try:
directory.rename(pets_dir() / desired)
try:
(_thumbs_dir() / f"{slug}.png").rename(_thumbs_dir() / f"{desired}.png")
except OSError:
pass
directory = pets_dir() / desired
pet_json = directory / "pet.json"
new_slug = desired
meta["id"] = new_slug
except OSError:
new_slug = slug # keep the provisional slug if the move fails
try:
pet_json.write_text(json.dumps(meta, indent=2), encoding="utf-8")
except OSError:
return None
return new_slug
def _download(url: str, dest: Path, *, timeout: float) -> None:
import httpx
try:
with httpx.stream(
"GET",
url,
timeout=timeout,
follow_redirects=True,
headers={"User-Agent": "hermes-agent-petdex"},
) as resp:
resp.raise_for_status()
tmp = dest.with_suffix(dest.suffix + ".part")
with tmp.open("wb") as fh:
for chunk in resp.iter_bytes():
fh.write(chunk)
tmp.replace(dest)
except Exception as exc: # noqa: BLE001
raise PetStoreError(f"download failed for {url}: {exc}") from exc
def _download_json(url: str, *, timeout: float) -> dict:
import httpx
resp = httpx.get(
url,
timeout=timeout,
follow_redirects=True,
headers={"User-Agent": "hermes-agent-petdex"},
)
resp.raise_for_status()
data = resp.json()
return data if isinstance(data, dict) else {}

View File

@@ -26,7 +26,7 @@ from __future__ import annotations
import os
import sys
import urllib.request
from typing import Optional
from typing import Any, Optional
from utils import base_url_hostname, normalize_proxy_url
@@ -142,6 +142,46 @@ def _get_proxy_for_base_url(base_url: Optional[str]) -> Optional[str]:
return proxy
def build_keepalive_http_client(
base_url: str = "",
*,
async_mode: bool = False,
) -> Optional[Any]:
"""Build an httpx client for OpenAI SDK calls with env-only proxy policy.
Uses explicit ``HTTPS_PROXY`` / ``NO_PROXY`` env vars via
``_get_proxy_for_base_url``. A custom transport disables httpx's default
``trust_env`` path, so macOS system proxy settings from
``urllib.request.getproxies()`` (which omit the ExceptionsList) are not
applied. Mirrors ``AIAgent._build_keepalive_http_client``.
"""
try:
import httpx
import socket
if "api.githubcopilot.com" in str(base_url or "").lower():
client_cls = httpx.AsyncClient if async_mode else httpx.Client
return client_cls()
sock_opts = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
if hasattr(socket, "TCP_KEEPIDLE"):
sock_opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 30))
sock_opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 10))
sock_opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3))
elif hasattr(socket, "TCP_KEEPALIVE"):
sock_opts.append((socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, 30))
proxy = _get_proxy_for_base_url(base_url)
transport_cls = httpx.AsyncHTTPTransport if async_mode else httpx.HTTPTransport
client_cls = httpx.AsyncClient if async_mode else httpx.Client
return client_cls(
transport=transport_cls(socket_options=sock_opts),
proxy=proxy,
)
except Exception:
return None
def _install_safe_stdio() -> None:
"""Wrap stdout/stderr so best-effort console output cannot crash the agent."""
for stream_name in ("stdout", "stderr"):
@@ -164,4 +204,5 @@ __all__ = [
"_install_safe_stdio",
"_get_proxy_from_env",
"_get_proxy_for_base_url",
"build_keepalive_http_client",
]

View File

@@ -88,12 +88,15 @@ def _find_hermes_md(cwd: Path) -> Optional[Path]:
stop_at = _find_git_root(cwd)
current = cwd.resolve()
for directory in [current, *current.parents]:
# When there is no git root, only check cwd itself walking parents
# could pick up a .hermes.md planted in /tmp, /home, etc.
search_dirs = [current, *current.parents] if stop_at else [current]
for directory in search_dirs:
for name in _HERMES_MD_NAMES:
candidate = directory / name
if candidate.is_file():
return candidate
# Stop walking at the git root (or filesystem root).
if stop_at and directory == stop_at:
break
return None
@@ -243,7 +246,10 @@ KANBAN_GUIDANCE = (
"- **Workspace.** `cd $HERMES_KANBAN_WORKSPACE` first. For a `worktree` kind "
"with no `.git`, `git worktree add <path> "
"${HERMES_KANBAN_BRANCH:-wt/$HERMES_KANBAN_TASK}` from the main repo, then "
"cd there.\n"
"cd there. For a project-linked task the workspace is a fresh "
"`<repo>/.worktrees/<task-id>` and `$HERMES_KANBAN_BRANCH` a deterministic "
"`<project-slug>/<task-id>` — the main repo is two levels up, so run "
"`git worktree add` from there.\n"
"- **Deliverables.** Files a human wants go in "
"`kanban_complete(artifacts=[<absolute paths>])` (top-level param; paths in "
"`metadata` are NOT uploaded). Files must exist at completion.\n"
@@ -614,7 +620,12 @@ DEVELOPER_ROLE_MODELS = ("gpt-5", "codex")
PLATFORM_HINTS = {
"whatsapp": (
"You are on a text messaging communication platform, WhatsApp. "
"Please do not use markdown as it does not render. "
"Standard markdown (**bold**, *italic*, ~~strike~~, # headers, "
"`code`, ```code blocks```, [links](url)) is auto-converted to "
"WhatsApp's native syntax (*bold*, _italic_, ~strike~, monospace) — "
"feel free to write in markdown, and use bullet lists ('- item') "
"freely. Tables are NOT supported — prefer bullet lists or labeled "
"key:value pairs. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. The file "
"will be sent as a native WhatsApp attachment — images (.jpg, .png, "
@@ -679,7 +690,11 @@ PLATFORM_HINTS = {
),
"signal": (
"You are on a text messaging communication platform, Signal. "
"Please do not use markdown as it does not render. "
"Standard markdown (**bold**, *italic*, ~~strike~~, # headers, "
"`code`, ```code blocks```) is auto-converted to Signal's native "
"rich formatting — feel free to write in markdown, and use bullet "
"lists ('- item') freely (they render as • bullets). Tables are NOT "
"supported — prefer bullet lists or labeled key:value pairs. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio as attachments, and other "
@@ -709,7 +724,24 @@ PLATFORM_HINTS = {
"(those are only intercepted on messaging platforms like Telegram, "
"Discord, Slack, etc.; on the CLI they render as literal text). "
"When referring to a file you created or changed, just state its "
"absolute path in plain text; the user can open it from there."
"absolute path in plain text; the user can open it from there. "
"Cron jobs scheduled from this session are LOCAL-ONLY: their output is "
"saved (viewable via cronjob action='list') but is NOT delivered back "
"into this terminal — there is no live-delivery channel here. If the "
"user wants to be notified when a job runs, the job's `deliver` must "
"target a gateway-connected messaging platform (e.g. deliver='telegram' "
"or 'all'). Do not promise the user that a deliver='origin' or "
"default-deliver cron job will message them in this session."
),
"tui": (
"You are running in the Hermes terminal UI (TUI). "
"Cron jobs scheduled from this session are LOCAL-ONLY: their output is "
"saved (viewable via cronjob action='list') but is NOT delivered back "
"into this TUI session — there is no live-delivery channel here. If the "
"user wants to be notified when a job runs, the job's `deliver` must "
"target a gateway-connected messaging platform (e.g. deliver='telegram' "
"or 'all'). Do not promise the user that a deliver='origin' or "
"default-deliver cron job will message them in this session."
),
"sms": (
"You are communicating via SMS. Keep responses concise and use plain text "
@@ -897,8 +929,7 @@ def _probe_remote_backend(env_type: str) -> str | None:
try:
# Import locally: tools/ imports are heavy and only relevant when a
# non-local backend is actually configured.
from tools.terminal_tool import _get_env_config # type: ignore
from tools.environments import get_environment # type: ignore
from tools.terminal_tool import _create_environment, _get_env_config # type: ignore
except Exception as e:
logger.debug("Backend probe unavailable (import failed): %s", e)
_BACKEND_PROBE_CACHE[cache_key] = ""
@@ -906,7 +937,59 @@ def _probe_remote_backend(env_type: str) -> str | None:
try:
config = _get_env_config()
env = get_environment(config)
# Build the environment the same way tools/terminal_tool.py does for a
# live command: select the backend image, then assemble ssh/container
# config from the env-derived dict. (There is no `get_environment`
# factory — the real entry point is `_create_environment`.)
if env_type == "docker":
image = config.get("docker_image", "")
elif env_type == "singularity":
image = config.get("singularity_image", "")
elif env_type == "modal":
image = config.get("modal_image", "")
elif env_type == "daytona":
image = config.get("daytona_image", "")
else:
image = ""
ssh_config = None
if env_type == "ssh":
ssh_config = {
"host": config.get("ssh_host", ""),
"user": config.get("ssh_user", ""),
"port": config.get("ssh_port", 22),
"key": config.get("ssh_key", ""),
"persistent": config.get("ssh_persistent", False),
}
container_config = None
if env_type in {"docker", "singularity", "modal", "daytona"}:
container_config = {
"container_cpu": config.get("container_cpu", 1),
"container_memory": config.get("container_memory", 5120),
"container_disk": config.get("container_disk", 51200),
"container_persistent": config.get("container_persistent", True),
"modal_mode": config.get("modal_mode", "auto"),
"docker_volumes": config.get("docker_volumes", []),
"docker_mount_cwd_to_workspace": config.get("docker_mount_cwd_to_workspace", False),
"docker_forward_env": config.get("docker_forward_env", []),
"docker_env": config.get("docker_env", {}),
"docker_run_as_host_user": config.get("docker_run_as_host_user", False),
"docker_extra_args": config.get("docker_extra_args", []),
"docker_persist_across_processes": config.get("docker_persist_across_processes", True),
"docker_orphan_reaper": config.get("docker_orphan_reaper", True),
}
env = _create_environment(
env_type=env_type,
image=image,
cwd=config.get("cwd", ""),
timeout=config.get("timeout", 180),
ssh_config=ssh_config,
container_config=container_config,
task_id="prompt-backend-probe",
host_cwd=config.get("host_cwd"),
)
# Single-line POSIX probe — works on any Unixy backend. Wrapped in
# `2>/dev/null` so a missing binary doesn't pollute the output.
probe_cmd = (

216
agent/reasoning_timeouts.py Normal file
View File

@@ -0,0 +1,216 @@
"""Per-reasoning-model stale-timeout floor for known reasoning models.
Reasoning models (those that emit extended thinking blocks before their
first content token) routinely exceed Hermes's default chat-model
stale detectors:
* Stream stale detector: ``HERMES_STREAM_STALE_TIMEOUT`` default 180s
``agent/chat_completion_helpers.py:2544``
* Non-stream stale detector: ``HERMES_API_CALL_STALE_TIMEOUT`` default 90s
``run_agent.py:1140``
For NVIDIA Nemotron 3 Ultra on the hosted NIM gateway the empirical
upstream idle kill is ~120s (first-party reproduction at
NVIDIA/NemoClaw#4846 — TTFB ~31s, stream dies at 120s). The same
failure mode exists on OpenAI o1/o3, Anthropic Opus 4.x thinking,
DeepSeek R1, Qwen QwQ, xAI Grok reasoning — every cloud reasoning
model hits upstream-proxies / load-balancers with idle timeouts
shorter than the model's thinking phase. Result: the stale detector
kills the connection mid-think, surfacing as
``BrokenPipeError``/``RemoteProtocolError`` on the next read.
This module provides a floor that the existing stale-detector scaling
blocks consult via :func:`get_reasoning_stale_timeout_floor` and
apply as ``max(default, floor)``. It is a FLOOR:
* Never overrides explicit user config (``providers.<id>.models.<model>.stale_timeout_seconds``
or ``request_timeout_seconds`` already wins — this code never runs
in that branch).
* Never lowers an existing threshold.
* Has zero effect on non-reasoning models — they are not in the
allowlist and the resolver returns ``None``.
Matching uses start-anchored regex on the slug-only component of
the model name (after stripping any aggregator prefix like
``openai/``, ``x-ai/``, ``anthropic/``). The right-anchor matches
end-of-string or a ``-``/``.``/``_`` slug separator, so ``qwen3-235b``
matches the ``qwen3`` family entry (a future model slug would be
``qwen3-235b-instruct`` and would also match) but ``some-other-qwen3``
does NOT match ``qwen3`` (the ``-qwen3`` is not at start of slug).
The ``o1`` case is the most delicate: a model named
``llama-4-70b-o1-preview`` is a hypothetical community derivative that
should NOT trigger the reasoning-model floor for the user (the user
chose a non-OpenAI model, not a reasoning model). The start-of-slug
anchor naturally excludes this — the matched ``o1-preview`` is at
position 11 of the slug, not at position 0. The previous substring-
with-trailing-hyphen design would have over-matched here, which is
why start-of-slug anchoring is the right shape.
Fixes #52217.
"""
from __future__ import annotations
import re
from typing import Optional
# (slug, floor_seconds). Each slug is matched as a discrete
# word-boundary component via the wrapper regex in ``_match_any``
# below. Order is irrelevant — the first regex match wins.
_REASONING_STALE_TIMEOUT_FLOORS: tuple[tuple[str, int], ...] = (
# NVIDIA Nemotron — reasoning models behind hosted NIM with
# documented 60-180s upstream idle kill (NVIDIA/NemoClaw#4846:
# 120s measured).
("nemotron-3-ultra", 600),
("nemotron-3-super", 600),
("nemotron-3-nano", 300),
# DeepSeek — R1 reasoning model on hosted NIM / DeepSeek direct.
("deepseek-r1", 600),
("deepseek-reasoner", 600),
# Qwen — QwQ reasoning + Qwen3 thinking variants. QwQ-32B
# preview is the stable slug; ``qwen3`` covers the family of
# thinking-mode Qwen3 models (qwen3-235b-a22b, qwen3-32b, etc.)
# without over-matching every Qwen3 instruct variant — the
# right-anchor requires the slug to be at the start of the
# remaining model name, so ``qwen3-235b-instruct`` (instruct is
# NOT a thinking variant) would still match. Acceptable
# trade-off: instruct variants of qwen3 get the 180s floor
# even though they don't reason. The cost is a slightly longer
# wait on a hung provider; the alternative (matching only
# ``qwen3-.*-thinking``) breaks the moment NVIDIA or Alibaba
# ships a slightly different naming shape.
("qwq-32b", 300),
("qwen3", 180),
# OpenAI o-series — known multi-minute TTFB. Each variant
# enumerated explicitly so bare ``o1`` doesn't over-match
# ``olmo-1`` or hypothetical future community derivatives.
("o1", 600),
("o1-mini", 600),
("o1-pro", 600),
("o1-preview", 600),
("o3", 600),
("o3-pro", 600),
("o3-mini", 300),
("o4-mini", 300),
# Anthropic Claude 4.x thinking variants. Anchored at
# ``claude-opus-4`` so non-thinking Claude 3.x or future
# non-reasoning Claude variants don't match.
("claude-opus-4", 240),
("claude-sonnet-4.5", 180),
("claude-sonnet-4.6", 180),
# xAI Grok reasoning variants. Explicit reasoning-only keys
# plus one for the ``non-reasoning`` variant so users picking
# the fast variant don't get the 300s floor. Bare ``grok-3``,
# ``grok-4`` etc. don't match — only the explicit reasoning /
# non-reasoning pairs.
("grok-4-fast-reasoning", 300),
("grok-4.20-reasoning", 300),
("grok-4-fast-non-reasoning", 180),
)
# Pre-compile each pattern. Wrapper = start-of-slug + slug + end-or-
# separator, where ``start-of-slug`` means start-of-string OR
# immediately after the last ``/`` (aggregator separator) and
# ``end-or-separator`` means end-of-string OR a ``-``/``.``/``_``.
#
# Why start-of-slug and not start-of-string: aggregator prefixes
# like ``openai/`` should not affect matching — the slug identity is
# the part after the last ``/``. Stripping the aggregator prefix in
# :func:`get_reasoning_stale_timeout_floor` before regex matching
# gives the wrapper a clean start-of-string anchor.
#
# Why end-or-separator on the right: ``openai/o3-mini`` must match
# the ``o3-mini`` slug (the right anchor is end-of-string). And
# ``openai/o3-mini-2025-01-31`` must also match ``o3-mini`` (the right
# anchor is the ``-`` separator). But ``openai/o3-mini-fork`` should
# NOT match ``o3-mini`` if we wanted to exclude forks — though the
# pattern ``o3-mini-fork`` would be matched as a derivative anyway,
# so we accept that community forks inheriting the same prefix are
# treated as reasoning models (a reasonable default — the upstream
# gateway timing is the same).
_PATTERN_CACHE: dict[str, re.Pattern[str]] = {}
def _get_pattern(slug: str) -> re.Pattern[str]:
compiled = _PATTERN_CACHE.get(slug)
if compiled is None:
compiled = re.compile(
r"^"
+ re.escape(slug)
+ r"(?:$|[\-._])"
)
_PATTERN_CACHE[slug] = compiled
return compiled
def _match_any(model_lower: str) -> Optional[float]:
"""Return the floor for the first matching slug, else None.
Each table entry is matched as a start-of-slug prefix with the
slug-separator-or-end-of-string right-anchor. Table iteration
order is irrelevant: longest slug wins (so ``o3-mini`` beats
``o3`` on a model like ``openai/o3-mini``).
"""
# Sort by slug length descending so longer / more-specific slugs
# win on shared prefixes (o3-mini beats o3).
sorted_floors = sorted(
_REASONING_STALE_TIMEOUT_FLOORS, key=lambda kv: -len(kv[0])
)
for slug, floor in sorted_floors:
if _get_pattern(slug).search(model_lower):
return float(floor)
return None
def get_reasoning_stale_timeout_floor(model: object) -> Optional[float]:
"""Return the stale-timeout floor (seconds) for a known reasoning model.
Returns ``None`` when the model is not in the allowlist or the
argument is empty / not a string. Matching uses
word-boundary-anchored regex on the lowercased model name, so
``openai/o3-mini`` matches the ``o3-mini`` slug but
``olmo-1`` does NOT match ``o1`` (the ``o1`` substring is not
at a word boundary inside ``olmo-1``).
Aggregator prefixes (``openai/``, ``x-ai/``, ``anthropic/`` etc.)
are preserved through matching — the ``/`` is itself a word
boundary, so ``openai/o3-mini`` matches ``o3-mini`` because the
``/`` before ``o3-mini`` satisfies the left-anchor alternation.
This is a FLOOR — callers must apply it as ``max(default, floor)``
and only when no explicit user-configured per-model
``stale_timeout_seconds`` exists.
>>> get_reasoning_stale_timeout_floor("nvidia/nemotron-3-ultra-550b-a55b")
600.0
>>> get_reasoning_stale_timeout_floor("openai/o3-mini")
300.0
>>> get_reasoning_stale_timeout_floor("deepseek/deepseek-r1")
600.0
>>> get_reasoning_stale_timeout_floor("qwen/qwen3-235b-a22b-thinking")
180.0
>>> get_reasoning_stale_timeout_floor("x-ai/grok-4-fast-reasoning")
300.0
>>> get_reasoning_stale_timeout_floor("anthropic/claude-opus-4-6")
240.0
>>> get_reasoning_stale_timeout_floor("gpt-4o") is None
True
>>> get_reasoning_stale_timeout_floor("olmo-1") is None
True
>>> get_reasoning_stale_timeout_floor(None) is None
True
"""
if not model or not isinstance(model, str):
return None
name = model.strip().lower()
if not name:
return None
# Strip aggregator prefix (everything before and including the
# last ``/``). The wrapper regex anchors at start-of-string, so
# the slug identity is the bare model name.
if "/" in name:
name = name.rsplit("/", 1)[1]
return _match_any(name)

View File

@@ -10,6 +10,7 @@ the first 6 and last 4 characters for debuggability.
import logging
import os
import re
import shlex
logger = logging.getLogger(__name__)
@@ -107,12 +108,60 @@ _PREFIX_PATTERNS = [
r"ntn_[A-Za-z0-9]{10,}", # Notion internal integration token
]
# ENV assignment patterns: KEY=value where KEY contains a secret-like name
# ENV assignment patterns: KEY=value where KEY contains a secret-like name.
# Uppercase keys tolerate spaces around "=" (e.g. ``FOO_SECRET = bar``) because
# an all-caps key is almost never prose/code.
_SECRET_ENV_NAMES = r"(?:API_?KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|AUTH)"
_ENV_ASSIGN_RE = re.compile(
rf"([A-Z0-9_]{{0,50}}{_SECRET_ENV_NAMES}[A-Z0-9_]{{0,50}})\s*=\s*(['\"]?)(\S+)\2",
)
# Lowercase / dotted / hyphenated config keys from config files
# (application.properties, .env, YAML-ish dumps): ``spring.datasource.password=secret``,
# ``app.api.key=xyz``, ``password=secret``. The uppercase _ENV_ASSIGN_RE above
# never matched these, so config-file passwords leaked verbatim (issue #16413).
#
# These run only in a config-file context, NOT in prose, code, or URLs — three
# carve-outs preserved from the original design (#4367 + the documented
# web-URL passthrough below):
# 1. The value is bounded by ``[^\s&]`` (stops at whitespace AND ``&``) so
# form-urlencoded bodies are handled pair-by-pair (by _redact_form_body),
# not greedily swallowed.
# 2. _CFG_DOTTED_RE only matches when the key is NAMESPACED (contains a dot),
# which is unambiguously a config key — never a prose word.
# 3. _CFG_ANCHORED_RE matches a bare secret-word key only at line start
# (optionally after ``export``), so conversational ``I have password=foo``
# mid-sentence is left alone.
# The colon-form URL guard (skip when ``://`` present) lives at the call site.
_SECRET_CFG_NAMES = r"(?:api[ _.\-]?key|token|secret|passwd|password|credential|auth)"
_CFG_VALUE = r"(['\"]?)([^\s&]+?)\2(?=[\s&]|$)"
# Namespaced (dotted) key: the secret word may sit anywhere in a dotted path.
_CFG_DOTTED_RE = re.compile(
rf"((?:[A-Za-z0-9_\-]+\.)+[A-Za-z0-9_.\-]*{_SECRET_CFG_NAMES}[A-Za-z0-9_.\-]*"
rf"|[A-Za-z0-9_.\-]*{_SECRET_CFG_NAMES}[A-Za-z0-9_.\-]*\.[A-Za-z0-9_.\-]+)"
rf"={_CFG_VALUE}",
re.IGNORECASE,
)
# Line-anchored bare key: ``password=…`` / ``export api_key=…`` at start of line.
_CFG_ANCHORED_RE = re.compile(
rf"(^[ \t]*(?:export[ \t]+)?[A-Za-z0-9_\-]*{_SECRET_CFG_NAMES}[A-Za-z0-9_\-]*)={_CFG_VALUE}",
re.IGNORECASE | re.MULTILINE,
)
# Unquoted YAML / colon config (e.g. ``password: secret``,
# ``spring.datasource.password: hunter2``). The secret keyword must be part of
# the KEY (anchored to the start of the line/indent), and the value is a single
# whitespace-free token — so prose like ``note: secret meeting`` (keyword in the
# value) and ``error: token expired`` are left alone. Bare ``auth`` is excluded
# from the key set so ``Authorization:`` / ``author:`` don't match (the former
# is masked by _AUTH_HEADER_RE); ``auth_token``/``auth-token`` still match via
# the ``token`` keyword. Quoted values defer to _JSON_FIELD_RE via the lookahead.
_YAML_CFG_NAMES = r"(?:api[ _.\-]?key|token|secret|passwd|password|credential)"
_YAML_ASSIGN_RE = re.compile(
rf"(^[ \t]*[A-Za-z0-9_.\-]*{_YAML_CFG_NAMES}[A-Za-z0-9_.\-]*)(:[ \t]*)(?!['\"])([^\s&]+)",
re.IGNORECASE | re.MULTILINE,
)
# JSON field patterns: "apiKey": "value", "token": "value", etc.
_JSON_KEY_NAMES = r"(?:api_?[Kk]ey|token|secret|password|access_token|refresh_token|auth_token|bearer|secret_value|raw_secret|secret_input|key_material)"
_JSON_FIELD_RE = re.compile(
@@ -125,8 +174,15 @@ _JSON_FIELD_RE = re.compile(
# while the header name and scheme word are preserved for debuggability. The
# previous rule only matched ``Bearer``, so ``Basic <base64 user:pass>`` and
# ``token <pat>`` leaked verbatim into logs/transcripts.
#
# The credential class excludes quote characters (``"`` / ``'``): a token sitting
# flush against a closing quote (``"Authorization: Bearer sk-..."``) must not pull
# that quote into the match, or masking turns value corruption into *syntax*
# corruption — the closing quote vanishes and the command/string no longer parses
# (unterminated quote → shell EOF / Python SyntaxError). Real credentials never
# contain ``"`` or ``'``, so excluding them is safe. See #43083.
_AUTH_HEADER_RE = re.compile(
r"((?:Proxy-)?Authorization:\s*)([A-Za-z][\w.+-]*\s+)?(\S+)",
r"((?:Proxy-)?Authorization:\s*)([A-Za-z][\w.+-]*\s+)?([^\s\"']+)",
re.IGNORECASE,
)
@@ -154,9 +210,37 @@ _PRIVATE_KEY_RE = re.compile(
)
# Database connection strings: protocol://user:PASSWORD@host
# Catches postgres, mysql, mongodb, redis, amqp URLs and redacts the password
# Catches postgres, mysql, mongodb, redis, amqp URLs and redacts the password.
# The userinfo and password groups forbid whitespace ([^:\s]+ / [^@\s]+) so the
# match can never span a line break. A real DSN password never contains
# whitespace; without this bound the greedy [^@]+ would scan past the end of a
# code line to the next stray "@" (e.g. a Python decorator), swallowing
# intervening lines and corrupting tool OUTPUT for any source containing a
# postgresql:// f-string template. See issue #33801.
_DB_CONNSTR_RE = re.compile(
r"((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis|amqp)://[^:]+:)([^@]+)(@)",
r"((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis|amqp)://[^:\s]+:)([^@\s]+)(@)",
re.IGNORECASE,
)
# Bare-token credential in a web/transport URL: ``scheme://TOKEN@host``.
# This is the ``git remote set-url origin https://PASSWORD@github.com/...``
# shape from issue #6396 — a single opaque credential in the userinfo position
# with NO ``user:pass`` colon. It is unambiguously a secret: legitimate
# round-trip URLs (OAuth callbacks, magic links, pre-signed shares — see the
# "Web-URL redaction is intentionally OFF" note in redact_sensitive_text) carry
# their tokens in the QUERY STRING, never in bare userinfo. The colon form
# ``user:pass@`` is deliberately left to pass through (commit "pass web URLs
# through unchanged", #34029) and is NOT matched here — the token class forbids
# ``:``. DB schemes are handled by _DB_CONNSTR_RE above and excluded here.
#
# Guards against false positives:
# - 8+ char floor skips short usernames (git, admin, root, deploy, ubuntu).
# - The token class ``[^\s:@/]`` cannot cross ``/``, so an ``@`` sitting in a
# path or query (e.g. ``?q=user@example.com``) is never treated as userinfo.
_URL_BARE_TOKEN_RE = re.compile(
r"((?:https?|wss?|git|ssh|ftp|ftps|sftp)://)" # scheme
r"([^\s:@/]{8,})" # bare token (no colon/slash/@), 8+ chars
r"(@[^\s]+)", # @host...
re.IGNORECASE,
)
@@ -340,7 +424,40 @@ def _redact_form_body(text: str) -> str:
return _redact_query_string(text.strip())
def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = False) -> str:
def _mask_token_nonreusable(token: str) -> str:
"""Redact a prefix-matched credential to a NON-REUSABLE sentinel.
Unlike :func:`_mask_token` (which keeps head/tail chars — fine for logs
that are never fed back into a config), this emits a marker that:
* cannot be mistaken for a usable-but-truncated key, so an agent that
reads it from a config file and writes it back does NOT corrupt the
stored credential into a dead 13-char string (issue #35519); and
* still does not leak the secret material (no head/tail chars).
The vendor prefix label is preserved for debuggability so the agent can
still tell *which* credential is present (e.g. a GitHub PAT vs an OpenAI
key) without seeing any of its bytes.
"""
if not token:
return "«redacted-secret»"
# Preserve only the recognizable vendor prefix label (e.g. "ghp_", "sk-"),
# never any of the random secret body.
label = ""
for sub in _PREFIX_SUBSTRINGS:
if token.startswith(sub):
label = sub
break
return f"«redacted:{label}…»" if label else "«redacted-secret»"
def redact_sensitive_text(
text: str,
*,
force: bool = False,
code_file: bool = False,
file_read: bool = False,
) -> str:
"""Apply all redaction patterns to a block of text.
Safe to call on any string -- non-matching text passes through unchanged.
@@ -353,6 +470,17 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
constants, "apiKey": "test" fixtures). Prefix patterns, auth headers,
private keys, DB connstrings, JWTs, and URL secrets are still redacted.
Set file_read=True for file *content* returned to the agent (read_file /
search_files / cat). Secrets are STILL redacted — they are never exposed —
but prefix-matched credentials are replaced with a non-reusable sentinel
(``«redacted:ghp_…»``) instead of a head/tail-preserving mask
(``ghp_S1...Pn2T``). The old mask looked like a real-but-truncated key, so
an agent reading it from config.yaml and writing it back silently corrupted
the stored credential into a dead 13-char value → 401 (issue #35519). The
sentinel is syntactically invalid as a token, so it can't be mistaken for a
usable key or written back as one. Implies code_file=True (config/data
files shouldn't trigger the source-code ENV/JSON false-positive paths).
Performance: each regex pattern is gated behind a cheap substring
pre-check (e.g. ``"=" in text`` for ENV assignments, ``"://" in text``
for URLs, ``"eyJ" in text`` for JWTs). On a typical hermes log line
@@ -371,9 +499,15 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
if not (force or _REDACT_ENABLED):
return text
# file_read content shouldn't hit the source-code ENV/JSON false-positive
# paths either (it's config/data, not log lines).
if file_read:
code_file = True
# Known prefixes (sk-, ghp_, etc.) — gate on substring presence
if _has_known_prefix_substring(text):
text = _PREFIX_RE.sub(lambda m: _mask_token(m.group(1)), text)
_prefix_sub = _mask_token_nonreusable if file_read else _mask_token
text = _PREFIX_RE.sub(lambda m: _prefix_sub(m.group(1)), text)
# ENV assignments: OPENAI_API_KEY=*** (skip for code files — false positives)
if not code_file:
@@ -382,6 +516,13 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
name, quote, value = m.group(1), m.group(2), m.group(3)
return f"{name}={quote}{_mask_token(value)}{quote}"
text = _ENV_ASSIGN_RE.sub(_redact_env, text)
# Lowercase/dotted config keys (issue #16413). Skip URLs entirely —
# web-URL query params are intentionally passed through (see note
# near the bottom of this function); _DB_CONNSTR_RE still guards
# connection-string passwords.
if "://" not in text:
text = _CFG_DOTTED_RE.sub(_redact_env, text)
text = _CFG_ANCHORED_RE.sub(_redact_env, text)
# JSON fields: "apiKey": "***" (skip for code files — false positives)
if ":" in text and '"' in text:
@@ -390,6 +531,15 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
return f'{key}: "{_mask_token(value)}"'
text = _JSON_FIELD_RE.sub(_redact_json, text)
# Unquoted YAML / colon config: password: *** (after JSON so quoted
# values are handled there; the lookahead in _YAML_ASSIGN_RE skips
# quotes). Skip URLs — web-URL query params pass through by design.
if ":" in text and "://" not in text:
def _redact_yaml(m):
key, sep, value = m.group(1), m.group(2), m.group(3)
return f"{key}{sep}{_mask_token(value)}"
text = _YAML_ASSIGN_RE.sub(_redact_yaml, text)
# Authorization headers — _AUTH_HEADER_RE matches any scheme after
# "[Proxy-]Authorization:" case-insensitively, so "uthorization" is the
# cheapest substring gate that covers every casing without a casefold().
@@ -419,9 +569,32 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
if "BEGIN" in text and "-----" in text:
text = _PRIVATE_KEY_RE.sub("[REDACTED PRIVATE KEY]", text)
# Database connection string passwords
# Database connection string passwords. With code_file=True, a password
# group that is a pure ``{...}`` brace expression is an f-string template
# reference (e.g. f"postgresql://{user}:{pass}@{host}"), not a literal
# credential — preserve it. Literal passwords are still redacted. The regex
# forbids whitespace in the password group, so a single-line template's
# group(2) is exactly the brace expression. See issue #33801.
if "://" in text:
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
if code_file:
def _redact_db(m):
pw = m.group(2)
if pw.startswith("{") and pw.endswith("}"):
return m.group(0)
return f"{m.group(1)}***{m.group(3)}"
text = _DB_CONNSTR_RE.sub(_redact_db, text)
else:
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
# Bare-token userinfo in web/transport URLs: ``scheme://TOKEN@host``.
# The git-remote-with-embedded-password shape from #6396. Only the
# colon-less bare-token form is redacted — ``user:pass@`` and
# query-string tokens are left to pass through (see the web-URL note
# below). See _URL_BARE_TOKEN_RE for the false-positive guards.
text = _URL_BARE_TOKEN_RE.sub(
lambda m: f"{m.group(1)}{_mask_token(m.group(2))}{m.group(3)}",
text,
)
# JWT tokens (eyJ... — base64-encoded JSON headers)
if "eyJ" in text:
@@ -434,7 +607,12 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
# blanket-redacting param values by name breaks those skills mid-flow.
# Known credential shapes (sk-, ghp_, JWTs, etc.) inside URLs are still
# caught by _PREFIX_RE and _JWT_RE above. DB connection-string passwords
# are still caught by _DB_CONNSTR_RE.
# are still caught by _DB_CONNSTR_RE. The ONE userinfo case still redacted
# is the colon-less bare-token form ``scheme://TOKEN@host`` (#6396, handled
# by _URL_BARE_TOKEN_RE in the ``://`` block above): a bare credential in
# userinfo is never a round-trip workflow token (those live in the query
# string), so masking it can't break a skill. The ``user:pass@`` form is
# left to pass through per #34029.
# Form-urlencoded bodies (only triggers on clean k=v&k=v inputs).
if "&" in text and "=" in text:
@@ -452,6 +630,66 @@ def redact_sensitive_text(text: str, *, force: bool = False, code_file: bool = F
return text
# Commands whose stdout is an environment-variable dump (KEY=value lines),
# NOT source code. For these, terminal-output redaction must run the
# ENV-assignment pass (code_file=False) so opaque tokens with no recognized
# vendor prefix (e.g. ``MY_SERVICE_TOKEN=abc123randomstring``) are still
# masked. For all other commands, code_file=True is used to avoid mangling
# legitimate source/config dumps (``MAX_TOKENS=100``, ``"apiKey": "x"``
# fixtures, ``postgresql://{user}`` f-string templates). See issue #43025.
_ENV_DUMP_COMMANDS = frozenset({"env", "printenv", "set", "export", "declare"})
def is_env_dump_command(command: str | None) -> bool:
"""Return True if ``command`` dumps environment variables to stdout.
Detects ``env`` / ``printenv`` / ``set`` / ``export`` / ``declare`` as the
first token of any segment in a pipeline or sequence (``;`` / ``&&`` /
``||`` / ``|``). Conservative: a parse failure or anything unrecognized
returns False (callers then fall back to the safer code_file=True path,
which still masks prefix-shaped keys).
"""
if not command or not isinstance(command, str):
return False
# Split on shell separators, then inspect the first token of each segment.
segments = re.split(r"[|;&]+", command)
for seg in segments:
seg = seg.strip()
if not seg:
continue
try:
tokens = shlex.split(seg)
except ValueError:
tokens = seg.split()
if tokens and tokens[0] in _ENV_DUMP_COMMANDS:
return True
return False
def redact_terminal_output(
output: str, command: str | None = None, *, force: bool = False
) -> str:
"""Redact secrets from terminal/process stdout.
Single redaction policy for ALL terminal-output surfaces — foreground
``terminal`` results AND background ``process(action=poll/log/wait)``
output — so they can't diverge. Picks ``code_file`` based on whether
``command`` is an environment dump:
- env-dump command (``env``/``printenv``/``set``/``export``/``declare``)
→ ``code_file=False`` so the ENV-assignment pass masks opaque tokens.
- anything else (or unknown command) → ``code_file=True`` to avoid
false positives on source/config dumps.
``force=True`` bypasses the global ``security.redact_secrets`` preference
for safety boundaries that must never emit raw credentials.
"""
if not output:
return output
code_file = not is_env_dump_command(command or "")
return redact_sensitive_text(output, force=force, code_file=code_file)
# Substrings used to gate ``_PREFIX_RE`` execution. If none of these appear in
# the input string, the prefix regex cannot match anything, so we skip it.
# False positives are fine (they just run the regex, which then matches

140
agent/replay_cleanup.py Normal file
View File

@@ -0,0 +1,140 @@
"""Replay-history sanitization shared across resume code paths.
When a session's last turn dies mid-tool-loop — the process is killed by a
restart/shutdown command, a stale-timeout fires, or an interrupt lands before
the tool result is written — the persisted transcript can end with a dangling
``assistant(tool_calls)`` (no matching ``tool`` answer) or an interrupted
``assistant→tool`` block. On resume the model sees that broken tail and
re-issues the unanswered call, producing an endless "thinking"/reboot loop
(#49201, #29086).
These pure helpers strip those tails before the history is replayed to the
model. They were originally local to ``gateway/run.py`` (which fixed the
messaging-gateway path) and are extracted here so every resume surface — the
messaging gateway AND the TUI/WebUI gateway — shares the same cleanup instead
of the WebUI path silently skipping it.
"""
from __future__ import annotations
import logging
from typing import Any, Dict, List
logger = logging.getLogger(__name__)
def is_interrupted_tool_result(content: Any) -> bool:
"""Return True if a tool result indicates the tool was interrupted."""
if not isinstance(content, str):
return False
lowered = content.lower()
if "[command interrupted]" in lowered:
return True
if "exit_code" in lowered and ("130" in lowered or "-1" in lowered):
return "interrupt" in lowered
return False
def strip_interrupted_tool_tails(
agent_history: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
"""Strip interrupted assistant→tool sequences from replay history.
Older interrupted gateway turns can be followed by a queued real user
message, so the interrupted assistant/tool block is not necessarily the
final tail by the time we rebuild replay history. Remove any contiguous
assistant(tool_calls) + tool-result block that contains an interrupted tool
result, while preserving successful tool-call sequences intact.
"""
if not agent_history:
return agent_history
cleaned: List[Dict[str, Any]] = []
i = 0
n = len(agent_history)
while i < n:
msg = agent_history[i]
if msg.get("role") == "assistant" and "tool_calls" in msg:
j = i + 1
tool_results: List[Dict[str, Any]] = []
while j < n and agent_history[j].get("role") == "tool":
tool_results.append(agent_history[j])
j += 1
if tool_results and any(
is_interrupted_tool_result(m.get("content", ""))
for m in tool_results
):
logger.debug(
"Stripping interrupted assistant→tool replay block "
"(indices %d%d, tool_results=%d)",
i, j - 1, len(tool_results),
)
i = j
continue
if msg.get("role") == "tool" and is_interrupted_tool_result(msg.get("content", "")):
logger.debug("Stripping orphan interrupted tool result from replay history")
i += 1
continue
cleaned.append(msg)
i += 1
return cleaned
def strip_dangling_tool_call_tail(
agent_history: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
"""Strip a trailing ``assistant(tool_calls)`` block left with NO answers.
When a tool call itself kills the gateway process (``docker restart``,
``systemctl restart``, ``kill``, ``hermes gateway restart``), the process
is terminated by SIGKILL *mid-call* — before the tool result is ever
written and before the orderly shutdown rewind
(``_drop_trailing_empty_response_scaffolding``) can run. The last thing
persisted is the ``assistant`` message that issued the ``tool_calls``,
with zero matching ``tool`` rows.
On resume the model sees an unanswered tool call at the tail and naturally
re-issues it — which restarts the gateway again, producing the infinite
reboot loop in #49201. ``strip_interrupted_tool_tails`` does not catch
this because there is no tool result to inspect for an interrupt marker.
This strips that dangling tail at the source so there is nothing for the
model to re-execute. It only acts when the tail is an
``assistant(tool_calls)`` whose calls have NO corresponding ``tool``
results — a completed assistant→tool pair (any tool answers present) is
left untouched so genuine mid-progress tool loops still resume.
"""
if not agent_history:
return agent_history
last = agent_history[-1]
if not (
isinstance(last, dict)
and last.get("role") == "assistant"
and last.get("tool_calls")
):
return agent_history
logger.debug(
"Stripping dangling unanswered assistant(tool_calls) tail "
"(%d call(s)) — process likely killed mid-tool-call by a "
"restart/shutdown command (#49201)",
len(last.get("tool_calls") or []),
)
return agent_history[:-1]
def sanitize_replay_history(
agent_history: List[Dict[str, Any]],
) -> List[Dict[str, Any]]:
"""Apply both replay-tail strippers in the canonical order.
Convenience entry point for resume code paths: removes interrupted
assistant→tool blocks anywhere in the history, then removes a dangling
unanswered ``assistant(tool_calls)`` tail. Returns the same list object
when there is nothing to strip.
"""
if not agent_history:
return agent_history
return strip_dangling_tool_call_tail(strip_interrupted_tool_tails(agent_history))

View File

@@ -8,6 +8,7 @@ rate-limited provider concurrently.
import random
import threading
import time
from typing import Any
# Monotonic counter for jitter seed uniqueness within the same process.
# Protected by a lock to avoid race conditions in concurrent retry paths
@@ -15,6 +16,14 @@ import time
_jitter_counter = 0
_jitter_lock = threading.Lock()
# Z.AI Coding Plan's GLM-5.2 endpoint often returns HTTP 429 code 1305
# ("The service may be temporarily overloaded...") for otherwise valid
# Hermes requests. Short retries tend to hammer the same overloaded window;
# after a few normal retries, progressively widen the wait window. Keep the
# cap interactive-friendly: a simple TUI message should fail visibly in minutes,
# not sit silent for 20+ minutes.
_ZAI_CODING_OVERLOAD_LONG_BACKOFF = (30.0, 60.0, 90.0, 120.0)
def jittered_backoff(
attempt: int,
@@ -55,3 +64,66 @@ def jittered_backoff(
jitter = rng.uniform(0, jitter_ratio * delay)
return delay + jitter
def _error_text(error: Any) -> str:
"""Best-effort flattened provider error text for retry classification."""
parts = [
error,
getattr(error, "message", None),
getattr(error, "body", None),
getattr(error, "response", None),
]
return " ".join(str(part) for part in parts if part is not None).lower()
def is_zai_coding_overload_error(*, base_url: str | None, model: str | None, error: Any) -> bool:
"""Return True for Z.AI Coding Plan transient overload 429s.
The coding-plan endpoint reports overload as HTTP 429 with body code 1305
and message "The service may be temporarily overloaded...". Treat only
that narrow shape specially so ordinary quota/billing 429s still fail fast
through the existing classifier.
"""
base = (base_url or "").lower()
model_name = (model or "").lower()
status = getattr(error, "status_code", None)
text = _error_text(error)
return (
status == 429
and "api.z.ai/api/coding/paas/v4" in base
and "glm-5.2" in model_name
and ("1305" in text or "temporarily overloaded" in text)
)
def adaptive_rate_limit_backoff(
attempt: int,
*,
base_url: str | None,
model: str | None,
error: Any,
default_wait: float,
short_attempts: int = 3,
) -> tuple[float, str | None]:
"""Provider-aware rate-limit backoff.
For most providers this returns ``default_wait`` unchanged. For Z.AI
Coding Plan GLM-5.2 overloads, keep the first ``short_attempts`` retries on
the normal short exponential schedule, then switch to progressively longer
waits (30s → 60s → 90s → 120s, capped) plus light jitter.
``attempt`` is 1-based, matching the retry loop's logged attempt number.
Returns ``(wait_seconds, reason_label)`` where ``reason_label`` is suitable
for status/log decoration when a provider-specific policy fired.
"""
if not is_zai_coding_overload_error(base_url=base_url, model=model, error=error):
return default_wait, None
if attempt <= short_attempts:
return default_wait, "zai_coding_overload_short"
idx = min(attempt - short_attempts - 1, len(_ZAI_CODING_OVERLOAD_LONG_BACKOFF) - 1)
base_delay = _ZAI_CODING_OVERLOAD_LONG_BACKOFF[idx]
# A smaller jitter ratio keeps long waits readable while still avoiding
# synchronized retry storms across concurrent Hermes sessions.
return jittered_backoff(1, base_delay=base_delay, max_delay=base_delay, jitter_ratio=0.2), "zai_coding_overload_long"

View File

@@ -122,6 +122,8 @@ from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Callable, Dict, Iterator, List, Optional, Set, Tuple
from hermes_cli._subprocess_compat import IS_WINDOWS, windows_hide_flags
try:
import fcntl # POSIX only; Windows falls back to best-effort without flock.
except ImportError: # pragma: no cover
@@ -441,6 +443,7 @@ def _spawn(spec: ShellHookSpec, stdin_json: str) -> Dict[str, Any]:
return result
t0 = time.monotonic()
_popen_kwargs = {"creationflags": windows_hide_flags()} if IS_WINDOWS else {}
try:
proc = subprocess.run(
argv,
@@ -449,6 +452,7 @@ def _spawn(spec: ShellHookSpec, stdin_json: str) -> Dict[str, Any]:
timeout=spec.timeout,
text=True,
shell=False,
**_popen_kwargs,
)
except subprocess.TimeoutExpired:
result["timed_out"] = True
@@ -584,6 +588,17 @@ def _parse_response(event: str, stdout: str) -> Optional[Dict[str, Any]]:
return {"action": "block", "message": _block_message(data.get("reason"), data.get("message"))}
return None
if event == "pre_verify":
# "continue" (Hermes) / "block" (Claude-Code Stop: block the stop) both
# mean keep going; the message/reason is the follow-up for the model. A
# continue with no message is a no-op — let the turn finish.
action = str(data.get("action") or data.get("decision") or "").strip().lower()
if action in {"continue", "block"}:
message = data.get("message") or data.get("reason")
if isinstance(message, str) and message.strip():
return {"action": "continue", "message": message.strip()}
return None
context = data.get("context")
if isinstance(context, str) and context.strip():
return {"context": context}

View File

@@ -5,6 +5,8 @@ import re
import subprocess
from pathlib import Path
from hermes_cli._subprocess_compat import IS_WINDOWS, windows_hide_flags
logger = logging.getLogger(__name__)
# Matches ${HERMES_SKILL_DIR} / ${HERMES_SESSION_ID} tokens in SKILL.md.
@@ -66,6 +68,7 @@ def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
Failures return a short ``[inline-shell error: ...]`` marker instead of
raising, so one bad snippet can't wreck the whole skill message.
"""
_popen_kwargs = {"creationflags": windows_hide_flags()} if IS_WINDOWS else {}
try:
completed = subprocess.run(
["bash", "-c", command],
@@ -75,6 +78,7 @@ def run_inline_shell(command: str, cwd: Path | None, timeout: int) -> str:
timeout=max(1, int(timeout)),
check=False,
stdin=subprocess.DEVNULL,
**_popen_kwargs,
)
except subprocess.TimeoutExpired:
return f"[inline-shell timeout after {timeout}s: {command}]"

View File

@@ -507,6 +507,34 @@ def get_all_skills_dirs() -> List[Path]:
return dirs
def _resolve_for_skill_ownership(path) -> Path:
path_obj = path if isinstance(path, Path) else Path(str(path))
try:
return path_obj.expanduser().resolve()
except (OSError, RuntimeError):
return path_obj.expanduser().absolute()
def is_external_skill_path(path) -> bool:
"""Return True when ``path`` lives under a configured external skills dir.
``skills.external_dirs`` are externally owned: Hermes can discover and view
their skills, and foreground user-directed tool calls may still edit them,
but autonomous lifecycle maintenance must treat them as read-only. This
helper centralizes the ownership boundary so curator/reporting/tool paths do
not each need to re-interpret the config.
"""
candidate = _resolve_for_skill_ownership(path)
for root in get_external_skills_dirs():
resolved_root = _resolve_for_skill_ownership(root)
try:
candidate.relative_to(resolved_root)
return True
except ValueError:
continue
return False
# ── Condition extraction ──────────────────────────────────────────────────

View File

@@ -0,0 +1,136 @@
"""Thinking-timeout detection and user-facing guidance for reasoning models.
When a known reasoning model (NVIDIA Nemotron 3 Ultra, OpenAI o1/o3,
Anthropic Opus 4.x thinking, DeepSeek R1, Qwen QwQ, xAI Grok reasoning)
hits a transport-layer error before the first content token arrives, the
upstream proxy has almost certainly idle-killed a long thinking stream —
not a true context overflow or a configuration error. The user needs
distinct guidance for this case:
"The model's thinking phase exceeded the upstream proxy's idle
timeout before the first content token arrived. This is a known
issue with reasoning models behind cloud gateways (NVIDIA NIM,
OpenAI, Anthropic, DeepSeek). Workarounds in priority order:
1. Set `providers.<provider>.models.<model>.stale_timeout_seconds: 900`
in `~/.hermes/config.yaml` to extend the per-call timeout...
2. Lower `reasoning_budget` or set `reasoning_effort: medium`...
3. Use a smaller / faster reasoning model..."
The existing `_is_stream_drop` guidance at
``agent/conversation_loop.py:3464-3486`` fires for large-file-write
stream drops ("try execute_code with Python's open() for large files")
which is the WRONG advice for the thinking-timeout case. This module
provides the detection and the message as standalone helpers so the
detection logic is unit-testable without driving the full retry loop,
and the message text can be regression-tested for spelling and accuracy.
Part 2 of Fixes #52310.
"""
from __future__ import annotations
from typing import Optional
# Substring set that identifies a transport-layer failure on the
# response stream. Same shape as the existing
# ``_SERVER_DISCONNECT_PATTERNS`` in ``agent/error_classifier.py:394``
# but extended to also catch the OSS-level error signature
# (``broken pipe`` / ``errno 32``) that the upstream kill surfaces
# to the OpenAI SDK wrapper.
_THINKING_TIMEOUT_SUBSTRINGS: tuple[str, ...] = (
"broken pipe",
"errno 32",
"remote protocol",
"connection reset",
"connection lost",
"peer closed",
"server disconnected",
)
def is_thinking_timeout(classified: object, model: str, error_msg: str) -> bool:
"""Return True when a reasoning model's thinking phase hit a transport kill.
Args:
classified: a :class:`agent.error_classifier.ClassifiedError` instance
(duck-typed here to avoid an import cycle in unit tests).
model: the model slug at failure time (e.g.
``"nvidia/nemotron-3-ultra-550b-a55b"``).
error_msg: lowercased string representation of the underlying
exception (typically ``str(api_error).lower()``).
Returns True when ALL conditions hold:
1. ``classified.reason == FailoverReason.timeout`` (the classifier
override at ``agent/error_classifier.py:720-738`` ensures this
is the case for reasoning models even on large sessions).
2. ``api_error`` has no ``.status_code`` attribute set (transport
disconnect, not an HTTP error).
3. ``model`` is in the reasoning-model allowlist (reuses
``agent.reasoning_timeouts.get_reasoning_stale_timeout_floor``).
4. ``error_msg`` contains one of the transport-kill substrings.
Non-reasoning models always return False. Non-transport errors
(billing / rate_limit / auth / context_overflow / format_error)
always return False. HTTP-status errors always return False.
"""
# Import here (not at module top) to keep this helper cheap to
# import even from callers that don't need it. ``agent.reasoning_timeouts``
# is small and dependency-free.
from agent.reasoning_timeouts import get_reasoning_stale_timeout_floor
# Condition 1: classifier says timeout. Use a string/value check
# rather than importing FailoverReason so this module has zero
# import cycles from the error_classifier package.
reason = getattr(classified, "reason", None)
reason_value = getattr(reason, "value", None)
if reason_value != "timeout":
return False
# Condition 2: no HTTP status code (transport, not API error).
# Caller is expected to gate on ``getattr(api_error, "status_code", None) is None``
# before calling this helper; the surface here is just the post-gate
# boolean so the caller can pass an already-prepped error_msg.
# Condition 3: reasoning model allowlist.
if get_reasoning_stale_timeout_floor(model) is None:
return False
# Condition 4: transport-kill substring in the error message.
error_msg_lower = (error_msg or "").lower()
return any(p in error_msg_lower for p in _THINKING_TIMEOUT_SUBSTRINGS)
def build_thinking_timeout_guidance(
provider: str, model: str, model_label: Optional[str] = None,
) -> str:
"""Return the user-facing guidance string appended to ``_final_response``.
Args:
provider: provider slug (e.g. ``"nvidia"``, ``"openai"``).
model: bare model slug the user would put in their config
(e.g. ``"nemotron-3-ultra-550b-a55b"`` if the user uses
NVIDIA direct, or the full ``"nvidia/nemotron-3-ultra-550b-a55b"``
if they go through an aggregator). Used verbatim in the
config snippet so the user can copy-paste.
model_label: optional short label for the model name in the
prose (e.g. ``"Nemotron 3 Ultra"``). Falls back to the
slug if not provided.
"""
label = model_label or model
return (
"\n\nThe model's thinking phase exceeded the upstream proxy's "
"idle timeout before the first content token arrived. This is a "
f"known issue with reasoning models (like {label}) behind cloud "
"gateways (NVIDIA NIM, OpenAI, Anthropic, DeepSeek). Workarounds "
"in priority order:\n"
f"1. Set `providers.{provider}.models.{model}.stale_timeout_seconds: 900` "
"in `~/.hermes/config.yaml` to extend the per-call timeout. "
"(Hermes's built-in floor is 600s for known reasoning models — "
"if you still see this after raising, the upstream cap is even "
"shorter.)\n"
"2. Lower `reasoning_budget` or set `reasoning_effort: medium` on this "
"model if the provider supports it.\n"
"3. Use a smaller / faster reasoning model if the task doesn't "
"require deep thinking."
)

View File

@@ -11,7 +11,8 @@ Pure module-level utilities extracted from ``run_agent.py``:
``_append_subdir_hint_to_multimodal`` — envelope helpers for the
``{"_multimodal": True, "content": [...], "text_summary": ...}`` dict
shape returned by tools like ``computer_use``.
* ``_extract_file_mutation_targets`` / ``_extract_error_preview``
* ``_extract_file_mutation_targets`` / ``_extract_landed_file_mutation_paths`` /
``_extract_error_preview`` —
per-turn file-mutation verifier inputs.
* ``_trajectory_normalize_msg`` — strip image blobs from a message for
trajectory saving.
@@ -269,6 +270,35 @@ def _extract_file_mutation_targets(tool_name: str, args: Dict[str, Any]) -> List
return []
def _extract_landed_file_mutation_paths(
tool_name: str,
args: Dict[str, Any],
result: Any,
) -> List[str]:
"""Return the concrete file paths a successful mutation reports."""
targets = _extract_file_mutation_targets(tool_name, args)
if tool_name not in _FILE_MUTATING_TOOLS or not isinstance(result, str):
return targets
try:
data = json.loads(result.strip())
except Exception:
return targets
if not isinstance(data, dict):
return targets
files = data.get("files_modified")
if isinstance(files, list):
landed = [str(p) for p in files if p]
if landed:
return landed
resolved = data.get("resolved_path")
if resolved:
return [str(resolved)]
return targets
def _extract_error_preview(result: Any, max_len: int = 180) -> str:
"""Pull a one-line error summary out of a tool result for footer display."""
text = _multimodal_text_summary(result) if result is not None else ""
@@ -411,6 +441,7 @@ __all__ = [
"_multimodal_text_summary",
"_append_subdir_hint_to_multimodal",
"_extract_file_mutation_targets",
"_extract_landed_file_mutation_paths",
"_extract_error_preview",
"_trajectory_normalize_msg",
"make_tool_result_message",

View File

@@ -24,8 +24,10 @@ from typing import Any, Optional
from agent.display import (
KawaiiSpinner,
build_tool_preview as _build_tool_preview,
build_tool_label as _build_tool_label,
get_cute_tool_message as _get_cute_tool_message_impl,
get_tool_emoji as _get_tool_emoji,
redact_tool_args_for_display as _redact_tool_args_for_display,
_detect_tool_failure,
)
from agent.tool_guardrails import ToolGuardrailDecision
@@ -69,12 +71,35 @@ def _budget_for_agent(agent) -> BudgetConfig:
_MAX_TOOL_WORKERS = 8
def _flush_session_db_after_tool_progress(
agent,
messages: list,
*,
stage: str,
) -> None:
"""Best-effort incremental SessionDB flush for tool-call progress.
Tool execution can perform side effects that terminate or restart the
current Hermes process before the normal turn-end persistence path runs.
Flush the already-appended assistant/tool messages immediately so the
transcript survives destructive-but-valid tool calls.
"""
try:
agent._flush_messages_to_session_db(messages)
except Exception as exc:
logger.warning("Incremental tool-call persistence failed after %s: %s", stage, exc)
def _ra():
"""Lazy reference to ``run_agent`` so patches like ``run_agent._set_interrupt`` work."""
import run_agent
return run_agent
def _is_interpreter_shutdown_submit_error(exc: RuntimeError) -> bool:
return "cannot schedule new futures after interpreter shutdown" in str(exc)
def _emit_terminal_post_tool_call(
agent,
*,
@@ -279,6 +304,11 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
f"[Tool execution cancelled — {tc.function.name} was skipped due to user interrupt]",
tc.id,
))
_flush_session_db_after_tool_progress(
agent,
messages,
stage=f"cancelled tool result {tc.function.name}",
)
return
# ── Parse args + pre-execution bookkeeping ───────────────────────
@@ -441,10 +471,11 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
if not agent.quiet_mode and getattr(agent, "tool_progress_mode", "all") != "off":
print(f" ⚡ Concurrent: {num_tools} tool calls — {tool_names_str}")
for i, (tc, name, args, middleware_trace, block_result, blocked_by_guardrail) in enumerate(parsed_calls, 1):
args_str = json.dumps(args, ensure_ascii=False)
display_args = _redact_tool_args_for_display(name, args) or args
args_str = json.dumps(display_args, ensure_ascii=False)
if agent.verbose_logging:
print(f" 📞 Tool {i}: {name}({list(args.keys())})")
print(agent._wrap_verbose("Args: ", json.dumps(args, indent=2, ensure_ascii=False)))
print(f" 📞 Tool {i}: {name}({list(display_args.keys())})")
print(agent._wrap_verbose("Args: ", json.dumps(display_args, indent=2, ensure_ascii=False)))
else:
args_preview = args_str[:agent.log_prefix_chars] + "..." if len(args_str) > agent.log_prefix_chars else args_str
print(f" 📞 Tool {i}: {name}({list(args.keys())}) - {args_preview}")
@@ -454,8 +485,9 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
continue
if agent.tool_progress_callback:
try:
preview = _build_tool_preview(name, args)
agent.tool_progress_callback("tool.started", name, preview, args)
display_args = _redact_tool_args_for_display(name, args) or args
preview = _build_tool_preview(name, display_args)
agent.tool_progress_callback("tool.started", name, preview, display_args)
except Exception as cb_err:
logging.debug(f"Tool progress callback error: {cb_err}")
@@ -464,7 +496,8 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
continue
if agent.tool_start_callback:
try:
agent.tool_start_callback(tc.id, name, args)
display_args = _redact_tool_args_for_display(name, args) or args
agent.tool_start_callback(tc.id, name, display_args)
except Exception as cb_err:
logging.debug(f"Tool start callback error: {cb_err}")
@@ -581,13 +614,40 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
if runnable_calls:
max_workers = min(len(runnable_calls), _MAX_TOOL_WORKERS)
with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
for i, tc, name, args in runnable_calls:
for submit_index, (i, tc, name, args) in enumerate(runnable_calls):
# Propagate the agent turn's ContextVars (e.g.
# _approval_session_key) AND thread-local approval/sudo
# callbacks into the worker thread; clears callbacks on exit.
f = executor.submit(
propagate_context_to_thread(_run_tool), i, tc, name, args, parsed_calls[i][3]
)
try:
f = executor.submit(
propagate_context_to_thread(_run_tool), i, tc, name, args, parsed_calls[i][3]
)
except RuntimeError as submit_error:
if not _is_interpreter_shutdown_submit_error(submit_error):
raise
skipped_calls = runnable_calls[submit_index:]
logger.warning(
"interpreter shutdown while scheduling concurrent tools; "
"skipping %d unsubmitted tool(s)",
len(skipped_calls),
)
for skipped_i, _tc, skipped_name, skipped_args in skipped_calls:
if results[skipped_i] is None:
middleware_trace = parsed_calls[skipped_i][3]
result = (
f"Error executing tool '{skipped_name}': "
"Python interpreter is shutting down; tool was not started"
)
results[skipped_i] = (
skipped_name,
skipped_args,
result,
0.0,
True,
False,
middleware_trace,
)
break
futures.append(f)
# Wait for all to complete with periodic heartbeats so the
@@ -737,7 +797,8 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
if not blocked and agent.tool_complete_callback:
try:
agent.tool_complete_callback(tc.id, name, args, function_result)
display_args = _redact_tool_args_for_display(name, args) or args
agent.tool_complete_callback(tc.id, name, display_args, function_result)
except Exception as cb_err:
logging.debug(f"Tool complete callback error: {cb_err}")
@@ -768,6 +829,11 @@ def execute_tool_calls_concurrent(agent, assistant_message, messages: list, effe
# String results pass through unchanged.
_tool_content = agent._tool_result_content_for_active_model(name, function_result)
messages.append(make_tool_result_message(name, _tool_content, tc.id))
_flush_session_db_after_tool_progress(
agent,
messages,
stage=f"tool result {name}",
)
# ── Per-tool /steer drain ───────────────────────────────────
# Same as the sequential path: drain between each collected
@@ -803,13 +869,16 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
agent._vprint(f"{agent.log_prefix}⚡ Interrupt: skipping {len(remaining_calls)} tool call(s)", force=True)
for skipped_tc in remaining_calls:
skipped_name = skipped_tc.function.name
skip_msg = {
"role": "tool",
"name": skipped_name,
"content": f"[Tool execution cancelled — {skipped_name} was skipped due to user interrupt]",
"tool_call_id": skipped_tc.id,
}
messages.append(skip_msg)
messages.append(make_tool_result_message(
skipped_name,
f"[Tool execution cancelled — {skipped_name} was skipped due to user interrupt]",
skipped_tc.id,
))
_flush_session_db_after_tool_progress(
agent,
messages,
stage=f"cancelled tool result {skipped_name}",
)
break
function_name = tool_call.function.name
@@ -891,10 +960,11 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
agent._iters_since_skill = 0
if not agent.quiet_mode and getattr(agent, "tool_progress_mode", "all") != "off":
args_str = json.dumps(function_args, ensure_ascii=False)
display_args = _redact_tool_args_for_display(function_name, function_args) or function_args
args_str = json.dumps(display_args, ensure_ascii=False)
if agent.verbose_logging:
print(f" 📞 Tool {i}: {function_name}({list(function_args.keys())})")
print(agent._wrap_verbose("Args: ", json.dumps(function_args, indent=2, ensure_ascii=False)))
print(f" 📞 Tool {i}: {function_name}({list(display_args.keys())})")
print(agent._wrap_verbose("Args: ", json.dumps(display_args, indent=2, ensure_ascii=False)))
else:
args_preview = args_str[:agent.log_prefix_chars] + "..." if len(args_str) > agent.log_prefix_chars else args_str
print(f" 📞 Tool {i}: {function_name}({list(function_args.keys())}) - {args_preview}")
@@ -915,14 +985,16 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
if not _execution_blocked and agent.tool_progress_callback:
try:
preview = _build_tool_preview(function_name, function_args)
agent.tool_progress_callback("tool.started", function_name, preview, function_args)
display_args = _redact_tool_args_for_display(function_name, function_args) or function_args
preview = _build_tool_preview(function_name, display_args)
agent.tool_progress_callback("tool.started", function_name, preview, display_args)
except Exception as cb_err:
logging.debug(f"Tool progress callback error: {cb_err}")
if not _execution_blocked and agent.tool_start_callback:
try:
agent.tool_start_callback(tool_call.id, function_name, function_args)
display_args = _redact_tool_args_for_display(function_name, function_args) or function_args
agent.tool_start_callback(tool_call.id, function_name, display_args)
except Exception as cb_err:
logging.debug(f"Tool start callback error: {cb_err}")
@@ -1152,7 +1224,8 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
if agent._should_emit_quiet_tool_messages():
face = random.choice(KawaiiSpinner.get_waiting_faces())
emoji = _get_tool_emoji(function_name)
preview = _build_tool_preview(function_name, function_args) or function_name
display_args = _redact_tool_args_for_display(function_name, function_args) or function_args
preview = _build_tool_label(function_name, display_args) or function_name
spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots', print_fn=agent._print_fn)
spinner.start()
_ce_result = None
@@ -1185,7 +1258,8 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
if agent._should_emit_quiet_tool_messages() and agent._should_start_quiet_spinner():
face = random.choice(KawaiiSpinner.get_waiting_faces())
emoji = _get_tool_emoji(function_name)
preview = _build_tool_preview(function_name, function_args) or function_name
display_args = _redact_tool_args_for_display(function_name, function_args) or function_args
preview = _build_tool_label(function_name, display_args) or function_name
spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots', print_fn=agent._print_fn)
spinner.start()
_mem_result = None
@@ -1216,7 +1290,8 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
if agent._should_emit_quiet_tool_messages() and agent._should_start_quiet_spinner():
face = random.choice(KawaiiSpinner.get_waiting_faces())
emoji = _get_tool_emoji(function_name)
preview = _build_tool_preview(function_name, function_args) or function_name
display_args = _redact_tool_args_for_display(function_name, function_args) or function_args
preview = _build_tool_label(function_name, display_args) or function_name
spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots', print_fn=agent._print_fn)
spinner.start()
_spinner_result = None
@@ -1378,7 +1453,8 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
if not _execution_blocked and agent.tool_complete_callback:
try:
agent.tool_complete_callback(tool_call.id, function_name, function_args, function_result)
display_args = _redact_tool_args_for_display(function_name, function_args) or function_args
agent.tool_complete_callback(tool_call.id, function_name, display_args, function_result)
except Exception as cb_err:
logging.debug(f"Tool complete callback error: {cb_err}")
@@ -1402,6 +1478,11 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
# (see parallel path for rationale). String results pass through.
_tool_content = agent._tool_result_content_for_active_model(function_name, function_result)
messages.append(make_tool_result_message(function_name, _tool_content, tool_call.id))
_flush_session_db_after_tool_progress(
agent,
messages,
stage=f"tool result {function_name}",
)
# ── Per-tool /steer drain ───────────────────────────────────
# Drain pending steer BETWEEN individual tool calls so the
@@ -1428,6 +1509,11 @@ def execute_tool_calls_sequential(agent, assistant_message, messages: list, effe
f"[Tool execution skipped — {skipped_name} was not started. User sent a new message]",
skipped_tc.id,
))
_flush_session_db_after_tool_progress(
agent,
messages,
stage=f"skipped tool result {skipped_name}",
)
break
if agent.tool_delay > 0 and i < len(assistant_message.tool_calls):

View File

@@ -5,12 +5,47 @@ This transport owns format conversion and normalization — NOT client lifecycle
streaming, or the _run_codex_stream() call path.
"""
import hashlib
import json
from typing import Any, Dict, List, Optional
from agent.transports.base import ProviderTransport
from agent.transports.types import NormalizedResponse, ToolCall
def _content_cache_key(instructions: str, tools: Optional[List[Dict[str, Any]]]) -> Optional[str]:
"""Content-address the prompt cache key from the static request prefix.
Returns ``pck_<sha256[:24]>`` of (instructions + sorted tool schemas), or
None when there is nothing static to key on. The cache key is a routing
hint only — never a correctness boundary — so two requests sharing a system
prompt and tool set intentionally resolve to the same warm prefix bucket.
The fix this exists for: recurring cron jobs build session_id as
``cron_<id>_<timestamp>``, so using session_id as the cache key made every
fire cache-cold. The static prefix (identity + tools) is identical across
fires, so hashing it gives a stable key that stays warm within the
provider's cache TTL. Sorting tools by name keeps the hash insertion-order
independent.
"""
if not instructions and not tools:
return None
tools_part = ""
if tools:
sorted_tools = sorted(
(t for t in tools if isinstance(t, dict)),
key=lambda t: str(t.get("name") or t.get("type") or ""),
)
tools_part = json.dumps(
sorted_tools, sort_keys=True, ensure_ascii=False, separators=(",", ":")
)
# \x00 separator so instructions ending in the tool JSON can't collide with
# a request whose instructions contain that JSON and whose tools are empty.
content = f"{instructions or ''}\x00{tools_part}"
digest = hashlib.sha256(content.encode("utf-8", errors="replace")).hexdigest()[:24]
return f"pck_{digest}"
class ResponsesApiTransport(ProviderTransport):
"""Transport for api_mode='codex_responses'.
@@ -71,7 +106,10 @@ class ResponsesApiTransport(ProviderTransport):
params:
instructions: str — system prompt (extracted from messages[0] if not given)
reasoning_config: dict | None — {effort, enabled}
session_id: str | None — used for prompt_cache_key + xAI conv header
session_id: str | None — transcript/session id; drives the xAI
x-grok-conv-id header and the Codex cache-scope headers, and is
the fallback prompt_cache_key when there is no static prefix to
content-address
max_tokens: int | None — max_output_tokens
timeout: float | None — per-request timeout forwarded to the SDK
request_overrides: dict | None — extra kwargs merged in
@@ -212,10 +250,17 @@ class ResponsesApiTransport(ProviderTransport):
kwargs["parallel_tool_calls"] = True
session_id = params.get("session_id")
# prompt_cache_key is content-addressed from the static prefix
# (instructions + tools), NOT session_id — recurring cron jobs carry a
# per-fire timestamp in session_id (cron_<id>_<ts>) that made every run
# cache-cold. session_id is left untouched for transcript isolation and
# the cache-scope routing headers below. Falls back to session_id when
# there is no static content to hash.
cache_key = _content_cache_key(instructions, response_tools) or session_id
# xAI Responses takes prompt_cache_key in extra_body (set further
# down); GitHub Models opts out of cache-key routing entirely.
if not is_github_responses and not is_xai_responses and session_id:
kwargs["prompt_cache_key"] = session_id
if not is_github_responses and not is_xai_responses and cache_key:
kwargs["prompt_cache_key"] = cache_key
if reasoning_enabled and is_xai_responses:
from agent.model_metadata import grok_supports_reasoning_effort
@@ -326,7 +371,7 @@ class ResponsesApiTransport(ProviderTransport):
merged_extra_body: Dict[str, Any] = {}
if isinstance(existing_extra_body, dict):
merged_extra_body.update(existing_extra_body)
merged_extra_body.setdefault("prompt_cache_key", session_id)
merged_extra_body.setdefault("prompt_cache_key", cache_key)
kwargs["extra_body"] = merged_extra_body
return kwargs

View File

@@ -28,8 +28,12 @@ import uuid
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
from agent.conversation_compression import conversation_history_after_compression
from agent.iteration_budget import IterationBudget
from agent.model_metadata import estimate_request_tokens_rough
from agent.model_metadata import (
estimate_messages_tokens_rough,
estimate_request_tokens_rough,
)
logger = logging.getLogger(__name__)
@@ -57,6 +61,34 @@ def _compression_made_progress(
return orig_tokens > 0 and new_tokens < orig_tokens * 0.95
def _should_run_preflight_estimate(
messages: List[Dict[str, Any]],
protect_first_n: int,
protect_last_n: int,
threshold_tokens: int,
) -> bool:
"""Cheap gate for the (expensive) full preflight token estimate.
Returns ``True`` when either:
(a) message count exceeds the protected ranges (the historical gate), or
(b) a cheap char-based estimate already crosses the configured threshold
— the few-but-huge case from issue #27405 that the count-only gate
would silently skip (a handful of very large messages never trips
the count condition, so compression was never attempted and the
turn hit a hard context-overflow error).
Branch (b) uses ``estimate_messages_tokens_rough`` (the shared char-based
estimator) so a single large base64 image isn't mistaken for ~250K tokens.
It intentionally undercounts vs. the full request estimate — it omits the
system prompt and tool schemas — because it is only a *hint* deciding
whether to pay for the authoritative ``estimate_request_tokens_rough``,
which (together with ``should_compress``) makes the real decision.
"""
if len(messages) > protect_first_n + protect_last_n + 1:
return True
return estimate_messages_tokens_rough(messages) >= threshold_tokens
@dataclass
class TurnContext:
"""Values produced by the turn prologue and consumed by the turn loop."""
@@ -111,7 +143,13 @@ def build_turn_context(
# Guard stdio against OSError from broken pipes (systemd/headless/daemon).
install_safe_stdio()
agent._ensure_db_session()
# NOTE: the DB session row is created later, AFTER the system prompt is
# restored/built (see _ensure_db_session() below the system-prompt block).
# Creating it here — before _cached_system_prompt is populated — inserts a
# row with system_prompt=NULL on a fresh API/gateway agent that carries
# client-managed history, which then trips the "stored system prompt is
# null; rebuilding from scratch" warning and a needless first-turn prefix
# cache miss. (Issue #45499.)
# Tell auxiliary_client what the live main provider/model are for this turn.
try:
@@ -278,6 +316,11 @@ def build_turn_context(
active_system_prompt = agent._cached_system_prompt
# Create the DB session row now that _cached_system_prompt is populated, so
# the persisted snapshot is written non-NULL on the first turn (Issue
# #45499). Idempotent: _ensure_db_session() no-ops once the row exists.
agent._ensure_db_session()
# Crash-resilience: persist the inbound user turn as soon as the session row exists.
try:
agent._persist_session(messages, conversation_history)
@@ -289,10 +332,14 @@ def build_turn_context(
)
# ── Preflight context compression ──
if (
agent.compression_enabled
and len(messages) > agent.context_compressor.protect_first_n
+ agent.context_compressor.protect_last_n + 1
# Gate the (expensive) full token estimate behind a cheap pre-check.
# See ``_should_run_preflight_estimate`` for the OR semantics that fix
# issue #27405 (a few very large messages slipping past the count gate).
if agent.compression_enabled and _should_run_preflight_estimate(
messages,
agent.context_compressor.protect_first_n,
agent.context_compressor.protect_last_n,
agent.context_compressor.threshold_tokens,
):
_preflight_tokens = estimate_request_tokens_rough(
messages,
@@ -354,7 +401,9 @@ def build_turn_context(
_orig_len, len(messages), _orig_tokens, _preflight_tokens
):
break # Cannot compress further: neither rows nor tokens moved
conversation_history = None
conversation_history = conversation_history_after_compression(
agent, messages
)
agent._empty_content_retries = 0
agent._thinking_prefill_retries = 0
agent._last_content_with_tools = None
@@ -392,6 +441,9 @@ def build_turn_context(
# Per-turn file-mutation verifier state.
agent._turn_failed_file_mutations = {}
agent._turn_file_mutation_paths = set()
agent._verification_stop_nudges = 0
agent._pre_verify_nudges = 0
# Record the execution thread so interrupt()/clear_interrupt() can scope
# the tool-level interrupt signal to THIS agent's thread only.

View File

@@ -166,6 +166,25 @@ def finalize_turn(
# same empty-response loop again.
try:
agent._drop_trailing_empty_response_scaffolding(messages)
# When the turn was interrupted and the last message is a tool
# result, append a synthetic assistant message to close the
# tool-call sequence. Without this, the session persists a
# ``tool → user`` alternation that strict providers (Gemini,
# Claude) reject, causing them to hallucinate a continuation of
# the user's message on the next turn (#48879).
#
# ``_drop_trailing_empty_response_scaffolding`` only rewinds the
# tool tail when an empty-response scaffolding flag is present; a
# clean ``/stop`` interrupt after a successful tool sets no such
# flag, so the tool result survives as the tail and we close it
# here instead. On an interrupt ``final_response`` is typically
# empty, so fall back to an explicit placeholder rather than
# persisting an empty-content assistant turn.
if interrupted:
from agent.message_sanitization import close_interrupted_tool_sequence
close_interrupted_tool_sequence(messages, final_response)
agent._persist_session(messages, conversation_history)
except Exception as _persist_err:
_cleanup_errors.append(f"persist_session: {_persist_err}")
@@ -270,7 +289,14 @@ def finalize_turn(
and len(_stripped) <= 24
and _stripped[-1:] not in {".", "!", "?", "", "", "", "`", ")"}
)
if _is_empty_terminal or _is_partial_fragment:
_is_partial_stream_recovery = (
str(_turn_exit_reason) == "partial_stream_recovery"
)
if (
_is_empty_terminal
or _is_partial_fragment
or _is_partial_stream_recovery
):
_explanation = agent._format_turn_completion_explanation(
_turn_exit_reason
)

View File

@@ -67,6 +67,11 @@ class TurnRetryState:
# ── Restart signals (read by the outer loop after the attempt) ───────
restart_with_compressed_messages: bool = False
restart_with_length_continuation: bool = False
# Set when a content-filter stream stall (e.g. MiniMax "new_sensitive")
# has been escalated to the fallback chain: the partial-stream content
# was rolled back off ``messages`` and the loop should re-issue the API
# call against the newly-activated provider (#32421).
restart_with_rebuilt_messages: bool = False
def __iter__(self):
# Convenience for debugging / tests: iterate (name, value) pairs.

View File

@@ -0,0 +1,618 @@
"""Coding verification evidence ledger.
This module records what the agent actually proved while working in a code
workspace. It is deliberately passive: it never decides to run a suite, never
blocks completion, and never upgrades targeted checks into "repo green".
"""
from __future__ import annotations
import json
import re
import shlex
import sqlite3
import tempfile
import threading
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Any, Optional
from hermes_constants import get_hermes_home
_DB_LOCK = threading.Lock()
_MAX_OUTPUT_SUMMARY_CHARS = 2000
_MAX_EVIDENCE_AGE_DAYS = 30
_MAX_EVENTS_PER_SESSION_ROOT = 100
_MAX_TOTAL_UNREFERENCED_EVENTS = 10_000
_AD_HOC_SCRIPT_NAME_PREFIXES = ("hermes-verify-", "hermes-ad-hoc-")
_VERIFY_SCHEMA_VERSION = 1
_SHELL_SPLIT_RE = re.compile(r"\s*(?:&&|\|\||;)\s*")
@dataclass(frozen=True)
class VerificationEvidence:
"""A classified command result worth recording."""
command: str
canonical_command: str
kind: str
scope: str
status: str
exit_code: int
cwd: str
root: str
session_id: str
output_summary: str = ""
def _utc_now() -> str:
return datetime.now(timezone.utc).isoformat()
def _retention_cutoff() -> str:
return (datetime.now(timezone.utc) - timedelta(days=_MAX_EVIDENCE_AGE_DAYS)).isoformat()
def _db_path() -> Path:
return get_hermes_home() / "verification_evidence.db"
def _connect() -> sqlite3.Connection:
path = _db_path()
path.parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("PRAGMA busy_timeout=5000")
conn.row_factory = sqlite3.Row
_ensure_schema(conn)
return conn
def _ensure_schema(conn: sqlite3.Connection) -> None:
conn.execute(
"""
CREATE TABLE IF NOT EXISTS meta (
key TEXT PRIMARY KEY,
value TEXT NOT NULL
)
"""
)
conn.execute(
"""
CREATE TABLE IF NOT EXISTS verification_events (
id INTEGER PRIMARY KEY AUTOINCREMENT,
created_at TEXT NOT NULL,
session_id TEXT NOT NULL,
cwd TEXT NOT NULL,
root TEXT NOT NULL,
command TEXT NOT NULL,
canonical_command TEXT NOT NULL,
kind TEXT NOT NULL,
scope TEXT NOT NULL,
status TEXT NOT NULL,
exit_code INTEGER NOT NULL,
output_summary TEXT NOT NULL
)
"""
)
conn.execute(
"""
CREATE TABLE IF NOT EXISTS verification_state (
session_id TEXT NOT NULL,
root TEXT NOT NULL,
last_event_id INTEGER,
last_edit_at TEXT,
changed_paths_json TEXT NOT NULL DEFAULT '[]',
PRIMARY KEY (session_id, root)
)
"""
)
conn.execute(
"""
CREATE INDEX IF NOT EXISTS idx_verification_events_session_root
ON verification_events(session_id, root, id DESC)
"""
)
conn.execute(
"INSERT OR REPLACE INTO meta(key, value) VALUES ('schema_version', ?)",
(str(_VERIFY_SCHEMA_VERSION),),
)
conn.commit()
def _split_segment_tokens(command: str) -> list[list[str]]:
segments: list[list[str]] = []
for segment in _SHELL_SPLIT_RE.split(command.strip()):
if not segment:
continue
try:
tokens = shlex.split(segment)
except ValueError:
continue
if tokens:
segments.append(tokens)
return segments
def _clean_token(token: str) -> str:
token = token.strip()
while token.startswith("./"):
token = token[2:]
return token
def _canonical_tokens(canonical: str) -> list[str]:
try:
return [_clean_token(t) for t in shlex.split(canonical) if t]
except ValueError:
return []
def _find_subsequence(tokens: list[str], needle: list[str]) -> Optional[int]:
if not tokens or not needle or len(needle) > len(tokens):
return None
cleaned = [_clean_token(t) for t in tokens]
for idx in range(0, len(cleaned) - len(needle) + 1):
if cleaned[idx:idx + len(needle)] == needle:
return idx
return None
def _strip_command_prefix(tokens: list[str]) -> list[str]:
"""Remove harmless command prefixes before matching canonical commands."""
remaining = list(tokens)
if remaining and remaining[0] == "env":
remaining = remaining[1:]
while remaining and "=" in remaining[0] and not remaining[0].startswith("-"):
remaining = remaining[1:]
while remaining and remaining[0] in {"command", "time", "noglob"}:
remaining = remaining[1:]
return remaining
def _equivalent_needles(needle: list[str]) -> list[list[str]]:
"""Return command spellings equivalent to the detected canonical command."""
candidates = [needle]
if len(needle) >= 3 and needle[1] == "run":
package_manager = needle[0]
script_name = needle[2]
if package_manager in {"npm", "pnpm", "yarn", "bun"}:
candidates.append([package_manager, script_name])
if len(needle) == 1 and "/" in needle[0]:
candidates.extend([["bash", needle[0]], ["sh", needle[0]]])
if needle == ["pytest"]:
candidates.extend(
[
["python", "-m", "pytest"],
["python3", "-m", "pytest"],
["uv", "run", "pytest"],
["poetry", "run", "pytest"],
["pipenv", "run", "pytest"],
]
)
return candidates
def _find_canonical_match(command: str, canonical_commands: list[str]) -> Optional[tuple[str, list[str]]]:
"""Return ``(canonical, trailing_args)`` for the first detected command."""
segments = _split_segment_tokens(command)
for canonical in canonical_commands:
needle = _canonical_tokens(canonical)
if not needle:
continue
for tokens in segments:
candidate_tokens = _strip_command_prefix(tokens)
for candidate in _equivalent_needles(needle):
if candidate_tokens[:len(candidate)] == candidate:
return canonical, candidate_tokens[len(candidate):]
return None
def _kind_for_command(canonical: str) -> str:
lowered = canonical.lower()
if any(word in lowered for word in ("lint", "eslint", "ruff")):
return "lint"
if any(word in lowered for word in ("typecheck", "tsc", "mypy", "pyright", "ty")):
return "typecheck"
if "build" in lowered:
return "build"
if "fmt" in lowered or "format" in lowered:
return "format"
if "check" in lowered and "test" not in lowered:
return "check"
return "test"
def _looks_like_target(arg: str) -> bool:
if not arg or arg.startswith("-") or "=" in arg:
return False
return (
"/" in arg
or "\\" in arg
or "::" in arg
or arg.endswith((".py", ".js", ".jsx", ".ts", ".tsx", ".rs", ".go", ".java"))
or arg.startswith(("test_", "tests", "spec", "__tests__"))
)
def _scope_for_args(args: list[str]) -> str:
return "targeted" if any(_looks_like_target(arg) for arg in args) else "full"
def _is_under_temp_dir(token: str) -> bool:
if not token or token.startswith("-"):
return False
try:
path = Path(token).expanduser()
if not path.is_absolute():
return False
resolved = path.resolve()
temp_root = Path(tempfile.gettempdir()).resolve()
return resolved == temp_root or temp_root in resolved.parents
except Exception:
return False
def _is_under_root(token: str, root: str | Path | None) -> bool:
if not root:
return False
try:
path = Path(token).expanduser().resolve()
root_path = Path(root).expanduser().resolve()
return path == root_path or root_path in path.parents
except Exception:
return False
def _is_temp_script_path(token: str, root: str | Path | None) -> bool:
try:
name = Path(token).expanduser().name
except Exception:
return False
return (
name.startswith(_AD_HOC_SCRIPT_NAME_PREFIXES)
and _is_under_temp_dir(token)
and not _is_under_root(token, root)
)
def _ad_hoc_script_args(tokens: list[str], root: str | Path | None) -> Optional[list[str]]:
candidate_tokens = _strip_command_prefix(tokens)
if not candidate_tokens:
return None
command = candidate_tokens[0]
if _is_temp_script_path(command, root):
return candidate_tokens[1:]
if command in {"python", "python3", "node", "bash", "sh", "ruby", "perl"}:
for idx, token in enumerate(candidate_tokens[1:], start=1):
if token == "--":
continue
if _is_temp_script_path(token, root):
return candidate_tokens[idx + 1:]
if not token.startswith("-"):
return None
return None
def _find_ad_hoc_match(command: str, root: str | Path | None) -> Optional[list[str]]:
for tokens in _split_segment_tokens(command):
trailing_args = _ad_hoc_script_args(tokens, root)
if trailing_args is not None:
return trailing_args
return None
def _summarize_output(output: str) -> str:
text = (output or "").strip()
if len(text) <= _MAX_OUTPUT_SUMMARY_CHARS:
return text
head = _MAX_OUTPUT_SUMMARY_CHARS // 3
tail = _MAX_OUTPUT_SUMMARY_CHARS - head
return (
text[:head]
+ f"\n... [{len(text) - _MAX_OUTPUT_SUMMARY_CHARS} chars omitted] ...\n"
+ text[-tail:]
)
def _prune_old_events(conn: sqlite3.Connection, *, session_id: str, root: str) -> None:
"""Bound ledger growth without deleting the current state pointer."""
cutoff = _retention_cutoff()
conn.execute(
"""
DELETE FROM verification_events
WHERE session_id = ?
AND root = ?
AND id NOT IN (
SELECT id FROM verification_events
WHERE session_id = ? AND root = ?
ORDER BY id DESC
LIMIT ?
)
""",
(session_id, root, session_id, root, _MAX_EVENTS_PER_SESSION_ROOT),
)
conn.execute(
"""
DELETE FROM verification_state
WHERE (
last_edit_at IS NOT NULL
AND last_edit_at < ?
)
OR (
last_edit_at IS NULL
AND last_event_id IN (
SELECT id FROM verification_events
WHERE created_at < ?
)
)
""",
(cutoff, cutoff),
)
conn.execute(
"""
DELETE FROM verification_events
WHERE created_at < ?
AND id NOT IN (
SELECT last_event_id FROM verification_state
WHERE last_event_id IS NOT NULL
)
""",
(cutoff,),
)
conn.execute(
"""
DELETE FROM verification_events
WHERE id NOT IN (
SELECT id FROM verification_events
ORDER BY id DESC
LIMIT ?
)
AND id NOT IN (
SELECT last_event_id FROM verification_state
WHERE last_event_id IS NOT NULL
)
""",
(_MAX_TOTAL_UNREFERENCED_EVENTS,),
)
def classify_verification_command(
command: str,
*,
cwd: str | Path | None = None,
session_id: str | None = None,
exit_code: int = 0,
output: str = "",
) -> Optional[VerificationEvidence]:
"""Classify a terminal command as verification evidence, if applicable."""
if not command or not isinstance(command, str):
return None
try:
from agent.coding_context import project_facts_for
facts = project_facts_for(cwd)
except Exception:
facts = None
if not facts:
return None
verify_commands = list(facts.get("verifyCommands") or [])
match = _find_canonical_match(command, verify_commands)
is_ad_hoc = False
if match is None and not verify_commands:
ad_hoc_args = _find_ad_hoc_match(command, facts.get("root"))
if ad_hoc_args is not None:
match = ("ad-hoc verification script", ad_hoc_args)
is_ad_hoc = True
if match is None:
return None
canonical, trailing_args = match
return VerificationEvidence(
command=command,
canonical_command=canonical,
kind="ad_hoc" if is_ad_hoc else _kind_for_command(canonical),
scope="targeted" if is_ad_hoc else _scope_for_args(trailing_args),
status="passed" if int(exit_code) == 0 else "failed",
exit_code=int(exit_code),
cwd=str(Path(cwd or ".").resolve()),
root=str(facts.get("root") or Path(cwd or ".").resolve()),
session_id=str(session_id or "default"),
output_summary=_summarize_output(output),
)
def record_terminal_result(
*,
command: str,
cwd: str | Path | None,
session_id: str | None,
exit_code: int,
output: str = "",
) -> Optional[dict[str, Any]]:
"""Record a foreground terminal result when it is verification evidence."""
evidence = classify_verification_command(
command,
cwd=cwd,
session_id=session_id,
exit_code=exit_code,
output=output,
)
if evidence is None:
return None
created_at = _utc_now()
with _DB_LOCK:
with _connect() as conn:
cur = conn.execute(
"""
INSERT INTO verification_events(
created_at, session_id, cwd, root, command, canonical_command,
kind, scope, status, exit_code, output_summary
) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
""",
(
created_at,
evidence.session_id,
evidence.cwd,
evidence.root,
evidence.command,
evidence.canonical_command,
evidence.kind,
evidence.scope,
evidence.status,
evidence.exit_code,
evidence.output_summary,
),
)
if cur.lastrowid is None:
raise RuntimeError("verification event insert did not return an id")
event_id = int(cur.lastrowid)
conn.execute(
"""
INSERT INTO verification_state(
session_id, root, last_event_id, last_edit_at, changed_paths_json
) VALUES (?, ?, ?, NULL, '[]')
ON CONFLICT(session_id, root) DO UPDATE SET
last_event_id = excluded.last_event_id,
last_edit_at = NULL,
changed_paths_json = '[]'
""",
(evidence.session_id, evidence.root, event_id),
)
_prune_old_events(conn, session_id=evidence.session_id, root=evidence.root)
conn.commit()
return {"id": event_id, **evidence.__dict__, "created_at": created_at}
def mark_workspace_edited(
*,
session_id: str | None,
cwd: str | Path | None,
paths: list[str] | tuple[str, ...] | None = None,
) -> Optional[dict[str, Any]]:
"""Mark verification evidence stale after a successful file edit."""
try:
from agent.coding_context import project_facts_for
facts = project_facts_for(cwd)
except Exception:
facts = None
if not facts:
return None
sid = str(session_id or "default")
root = str(facts.get("root") or Path(cwd or ".").resolve())
changed_paths = sorted({str(p) for p in (paths or []) if p})
edited_at = _utc_now()
with _DB_LOCK:
with _connect() as conn:
row = conn.execute(
"""
SELECT changed_paths_json FROM verification_state
WHERE session_id = ? AND root = ?
""",
(sid, root),
).fetchone()
existing: set[str] = set()
if row is not None:
try:
existing = set(json.loads(row["changed_paths_json"] or "[]"))
except (TypeError, ValueError):
existing = set()
merged = sorted((existing | set(changed_paths)))[-200:]
conn.execute(
"""
INSERT INTO verification_state(
session_id, root, last_event_id, last_edit_at, changed_paths_json
) VALUES (?, ?, NULL, ?, ?)
ON CONFLICT(session_id, root) DO UPDATE SET
last_edit_at = excluded.last_edit_at,
changed_paths_json = excluded.changed_paths_json
""",
(sid, root, edited_at, json.dumps(merged)),
)
conn.commit()
return {"session_id": sid, "root": root, "last_edit_at": edited_at, "changed_paths": changed_paths}
def verification_status(
*,
session_id: str | None,
cwd: str | Path | None,
) -> dict[str, Any]:
"""Return the best known verification state for a session/workspace."""
try:
from agent.coding_context import project_facts_for
facts = project_facts_for(cwd)
except Exception:
facts = None
if not facts:
return {"status": "not_applicable", "evidence": None}
sid = str(session_id or "default")
root = str(facts.get("root") or Path(cwd or ".").resolve())
with _DB_LOCK:
with _connect() as conn:
state = conn.execute(
"""
SELECT last_event_id, last_edit_at, changed_paths_json
FROM verification_state
WHERE session_id = ? AND root = ?
""",
(sid, root),
).fetchone()
if state is None:
return {
"status": "unverified",
"evidence": None,
"root": root,
"session_id": sid,
"changed_paths": [],
}
event = None
if state["last_event_id"] is not None:
event = conn.execute(
"SELECT * FROM verification_events WHERE id = ?",
(state["last_event_id"],),
).fetchone()
changed_paths: list[str] = []
try:
changed_paths = json.loads(state["changed_paths_json"] or "[]")
except (TypeError, ValueError):
changed_paths = []
if event is None:
return {
"status": "unverified",
"evidence": None,
"root": root,
"session_id": sid,
"changed_paths": changed_paths,
}
evidence = dict(event)
if state["last_edit_at"] and state["last_edit_at"] > evidence["created_at"]:
status = "stale"
else:
status = evidence["status"]
return {
"status": status,
"evidence": evidence,
"root": root,
"session_id": sid,
"changed_paths": changed_paths,
}

314
agent/verification_stop.py Normal file
View File

@@ -0,0 +1,314 @@
"""Turn-end verification guard for coding edits.
This module is intentionally policy-only. It never runs checks itself; it turns
the passive verification ledger into a bounded follow-up when the model tries to
finish immediately after editing code without fresh evidence.
"""
from __future__ import annotations
import os
import tempfile
from pathlib import Path
from typing import Any, Iterable
_MAX_CHANGED_PATHS_IN_NUDGE = 8
# Non-code file extensions whose edits carry no verifiable runtime behavior:
# documentation, prose, and data/markup that no test/build exercises. When a
# turn touches ONLY these, verify-on-stop has nothing to check, so the nudge is
# suppressed (this is fix "C" for the doc/markdown/skill false-positive — a
# SKILL.md or README edit must never demand a /tmp verification script). A turn
# that edits any non-listed path (a real source/code/config file) still nudges.
_NON_CODE_VERIFY_EXTENSIONS = frozenset(
{
".md",
".markdown",
".mdx",
".rst",
".txt",
".text",
".adoc",
".asciidoc",
".org",
".log",
".csv",
".tsv",
}
)
# Filenames (case-insensitive, extension-less or otherwise) that are pure prose
# even without a recognized doc extension.
_NON_CODE_VERIFY_FILENAMES = frozenset(
{
"license",
"licence",
"notice",
"authors",
"contributors",
"changelog",
"codeowners",
}
)
def _is_non_code_path(raw: str) -> bool:
"""Return True when a changed path is documentation/prose with nothing to verify."""
try:
p = Path(str(raw))
except Exception:
return False
suffix = p.suffix.lower()
if suffix in _NON_CODE_VERIFY_EXTENSIONS:
return True
if not suffix and p.name.lower() in _NON_CODE_VERIFY_FILENAMES:
return True
return False
def _filter_verifiable_paths(paths: Iterable[str]) -> list[str]:
"""Drop documentation/prose paths; keep paths that could have verifiable behavior."""
return [p for p in paths if p and not _is_non_code_path(p)]
# Session identities (platform or source) that are NOT human conversational
# messaging surfaces: interactive coding surfaces (CLI, TUI, desktop, codex,
# local, gateway) and programmatic callers (API server, webhooks, tools).
# Verify-on-stop stays ON by default for these. Any other resolved gateway
# platform is a conversational messaging surface (Telegram, Discord, WhatsApp,
# Signal, Slack, etc.) where the verification narrative would reach a human as
# chat noise, so it defaults OFF. Mirrors LOCAL_SESSION_SOURCE_IDS in
# apps/desktop/src/lib/session-source.ts; keep roughly in sync when adding a
# local or programmatic surface. Default-deny by design: an unrecognized
# identity is treated as messaging (OFF) so a new chat platform never leaks the
# verification receipt before this set is updated.
_NON_MESSAGING_SESSION_SURFACES = frozenset(
{
"",
"cli",
"codex",
"desktop",
"gateway",
"local",
"tui",
"tool",
"api_server",
"webhook",
"msgraph_webhook",
}
)
def _session_is_messaging_surface() -> bool:
"""Return whether this turn is delivered over a human messaging channel.
The gateway binds the platform value (e.g. ``telegram``) to
``HERMES_SESSION_PLATFORM``; the CLI and TUI set ``HERMES_SESSION_SOURCE``
(e.g. ``cli``, ``tui``) instead. Both are consulted via the session-context
helper (with an ``os.environ`` fallback), alongside the ``HERMES_PLATFORM``
override, matching the sibling platform resolution in
``agent/skill_commands.py`` and ``agent/prompt_builder.py``. A turn is a
messaging surface when a resolved identity is present and is not a known
non-messaging surface.
"""
try:
from gateway.session_context import get_session_env
platform = (
os.getenv("HERMES_PLATFORM")
or get_session_env("HERMES_SESSION_PLATFORM", "")
)
source = get_session_env("HERMES_SESSION_SOURCE", "")
except Exception:
platform = os.getenv("HERMES_PLATFORM", "") or os.environ.get(
"HERMES_SESSION_PLATFORM", ""
)
source = os.environ.get("HERMES_SESSION_SOURCE", "")
for identity in (platform, source):
identity = str(identity or "").strip().lower()
if identity and identity not in _NON_MESSAGING_SESSION_SURFACES:
return True
return False
def verify_on_stop_enabled(config: dict[str, Any] | None = None) -> bool:
"""Return whether edit -> verify-before-finish behavior is enabled.
Precedence: an explicit ``HERMES_VERIFY_ON_STOP`` env var wins, then an
explicit ``agent.verify_on_stop`` config value. The config default is
``False`` (see ``DEFAULT_CONFIG``) — verify-on-stop is OFF unless the user
opts in. The legacy ``"auto"`` sentinel is still honored for anyone who
sets it explicitly: it resolves to ON for interactive coding surfaces
(CLI, TUI, desktop) and programmatic callers, and OFF for conversational
messaging surfaces (Telegram, Discord, etc.). A missing/unknown value
falls back to OFF.
"""
env = os.environ.get("HERMES_VERIFY_ON_STOP")
if env is not None:
return env.strip().lower() not in {"0", "false", "no", "off"}
if config is None:
try:
from hermes_cli.config import load_config
config = load_config()
except Exception:
config = {}
agent_cfg = (config or {}).get("agent") if isinstance(config, dict) else None
cfg_val = agent_cfg.get("verify_on_stop") if isinstance(agent_cfg, dict) else None
if isinstance(cfg_val, bool):
return cfg_val
if isinstance(cfg_val, str):
token = cfg_val.strip().lower()
if token in {"1", "true", "yes", "on"}:
return True
if token in {"0", "false", "no", "off"}:
return False
if token == "auto":
# Explicit opt-in to the legacy surface-aware behavior.
return not _session_is_messaging_surface()
# Missing or unknown value -> OFF (the new default).
return False
def _candidate_cwds(paths: Iterable[str]) -> list[Path]:
candidates: list[Path] = []
seen: set[str] = set()
for raw in paths:
if not raw:
continue
try:
path = Path(raw).expanduser()
candidate = path if path.is_dir() else path.parent
resolved = str(candidate.resolve())
except Exception:
continue
if resolved not in seen:
seen.add(resolved)
candidates.append(Path(resolved))
return candidates
def _verification_snapshot(
*,
session_id: str | None,
changed_paths: list[str],
) -> tuple[dict[str, Any], dict[str, Any]] | None:
"""Return ``(status, facts)`` for the first edited workspace needing proof."""
try:
from agent.coding_context import project_facts_for
from agent.verification_evidence import verification_status
except Exception:
return None
first_snapshot: tuple[dict[str, Any], dict[str, Any]] | None = None
for cwd in _candidate_cwds(changed_paths):
facts = project_facts_for(cwd)
if not facts:
continue
status = verification_status(session_id=session_id, cwd=cwd)
snapshot = (status, facts)
if first_snapshot is None:
first_snapshot = snapshot
if str(status.get("status") or "unverified") != "passed":
return snapshot
return first_snapshot
def _format_changed_paths(paths: list[str]) -> str:
shown = paths[:_MAX_CHANGED_PATHS_IN_NUDGE]
lines = [f"- `{path}`" for path in shown]
remaining = len(paths) - len(shown)
if remaining > 0:
lines.append(f"- ... and {remaining} more")
return "\n".join(lines)
def _status_detail(status: dict[str, Any]) -> str:
state = str(status.get("status") or "unverified")
evidence = status.get("evidence") if isinstance(status.get("evidence"), dict) else None
if not evidence:
return state
command = evidence.get("canonical_command") or evidence.get("command")
summary = str(evidence.get("output_summary") or "").strip()
parts = [state]
if command:
parts.append(f"last command `{command}`")
if summary:
max_summary = 1200
if len(summary) > max_summary:
summary = summary[:max_summary].rstrip() + "\n... [truncated]"
parts.append(f"last output:\n{summary}")
return "\n".join(parts)
def build_verify_on_stop_nudge(
*,
session_id: str | None,
changed_paths: Iterable[str],
attempts: int = 0,
max_attempts: int = 2,
) -> str | None:
"""Return a synthetic follow-up when edited code lacks fresh verification."""
# Drop documentation/prose paths (markdown, skills, README, LICENSE, ...) —
# they carry no verifiable behavior, so a turn that touched only those has
# nothing to verify and must not nudge.
paths = sorted({str(p) for p in _filter_verifiable_paths(changed_paths)})
if not paths or attempts >= max_attempts:
return None
snapshot = _verification_snapshot(session_id=session_id, changed_paths=paths)
if snapshot is None:
return None
status, facts = snapshot
verify_commands = [
str(cmd).strip()
for cmd in (facts.get("verifyCommands") or [])
if str(cmd).strip()
]
state = str(status.get("status") or "unverified")
if state == "passed":
return None
# Optional shipped coding guidance, only paid when this evidence gate fires.
try:
from agent.verify_hooks import coding_verify_guidance
guidance = coding_verify_guidance()
except Exception:
guidance = None
addendum = f"\n\n{guidance}" if guidance else ""
if verify_commands:
command_instruction = (
"Run the relevant verification command now ("
+ ", ".join(f"`{cmd}`" for cmd in verify_commands[:3])
+ (", ..." if len(verify_commands) > 3 else "")
+ "), read any failure, repair the code, and summarize what passed."
)
else:
temp_dir = tempfile.gettempdir()
command_instruction = (
"No canonical test/lint/build command was detected. Create a focused "
f"temporary verification script under `{temp_dir}` using an OS-safe "
"`tempfile` path with a `hermes-verify-` filename prefix, run it "
"against the changed behavior, clean it up when possible, and "
"summarize it explicitly as ad-hoc verification rather than suite "
"green."
)
return (
"[System: You edited code in this turn, but the workspace does not have "
"fresh passing verification evidence yet.\n\n"
f"Verification status: {_status_detail(status)}\n\n"
f"Changed paths:\n{_format_changed_paths(paths)}\n\n"
f"{command_instruction} If verification is not possible, explain the "
"concrete blocker instead of claiming the work is fully verified."
f"{addendum}]"
)
__all__ = ["build_verify_on_stop_nudge", "verify_on_stop_enabled"]

69
agent/verify_hooks.py Normal file
View File

@@ -0,0 +1,69 @@
"""Verification-loop helpers for the ``pre_verify`` round-end gate.
When the agent has edited code and is about to verify/finish, the loop fires the
``pre_verify`` hook (user directives resolved by
:func:`hermes_cli.plugins.get_pre_verify_continue_message`). A directive keeps
the agent going one more turn — run a check, defer it, tidy the diff — instead of
stopping immediately.
The shipped coding guidance lives on the evidence-based verification-stop nudge
(``agent/verification_stop.py``), not as a second default stop gate. That keeps
the default token cost tied to the existing "missing verification evidence"
decision while preserving ``pre_verify`` for user/plugin policy.
"""
from __future__ import annotations
from typing import Any, Optional
from utils import is_truthy_value
DEFAULT_MAX_VERIFY_NUDGES = 3
# Shipped guidance appended to the verification-stop nudge when code lacks fresh
# verification evidence. Wording mirrors the user-facing "clean your work"
# workflow, but does not create its own extra model turn.
CODING_VERIFY_GUIDANCE = (
"[Coding] Before you run tests/linters or call this done: if this is "
"creative UI/visual work, hold off on tests and linters until the user says "
"they like the result or you're about to commit. And before every commit, "
"clean your work: keep it KISS/DRY, match the surrounding code style, and be "
"elitist, shorthand, clever, concise, efficient, and elegant."
)
def max_verify_nudges(config: Optional[dict[str, Any]] = None) -> int:
"""Bound on consecutive ``pre_verify`` continue directives per turn (>= 0)."""
agent_cfg = _agent_cfg(config)
raw = agent_cfg.get("max_verify_nudges")
try:
return max(0, int(raw))
except (TypeError, ValueError):
return DEFAULT_MAX_VERIFY_NUDGES
def coding_verify_guidance(config: Optional[dict[str, Any]] = None) -> Optional[str]:
"""Return the optional guidance appended to verification-stop nudges."""
if not is_truthy_value(_agent_cfg(config).get("verify_guidance", True), default=True):
return None
return CODING_VERIFY_GUIDANCE
def _agent_cfg(config: Optional[dict[str, Any]]) -> dict[str, Any]:
if config is None:
try:
from hermes_cli.config import load_config
config = load_config()
except Exception:
config = {}
agent_cfg = (config or {}).get("agent") if isinstance(config, dict) else None
return agent_cfg if isinstance(agent_cfg, dict) else {}
__all__ = [
"CODING_VERIFY_GUIDANCE",
"DEFAULT_MAX_VERIFY_NUDGES",
"coding_verify_guidance",
"max_verify_nudges",
]

View File

@@ -85,7 +85,7 @@ Installers are built and uploaded to GitHub Releases manually. macOS/Windows sig
### How it works
The packaged app ships the Electron shell and a native React chat surface. On first launch it can install the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. Backend resolution first honours `HERMES_DESKTOP_HERMES_ROOT`, then a completed managed install, then a probed `hermes` on `PATH` (unless `HERMES_DESKTOP_IGNORE_EXISTING=1` is set), and finally an explicit `HERMES_DESKTOP_HERMES` command override for packagers/troubleshooting. The renderer (React, in `src/`) talks to a `hermes dashboard` backend over the `tui_gateway`/dashboard APIs and reuses the agent runtime rather than embedding `hermes --tui`. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
The packaged app ships the Electron shell and a native React chat surface. On first launch it can install the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. Backend resolution first honours `HERMES_DESKTOP_HERMES_ROOT`, then a completed managed install, then a probed `hermes` on `PATH` (unless `HERMES_DESKTOP_IGNORE_EXISTING=1` is set), and finally an explicit `HERMES_DESKTOP_HERMES` command override for packagers/troubleshooting. The renderer (React, in `src/`) talks to a headless backend the app launches for you — a `hermes serve` process that serves the `tui_gateway` JSON-RPC/WebSocket API — through the framework-agnostic client in [`apps/shared`](../shared/) (the same client the web dashboard consumes), and reuses the agent runtime rather than embedding `hermes --tui`. The app is **self-contained**: it runs its own `hermes serve` backend and never opens or requires the web dashboard UI. (For backward compatibility, a runtime that predates the `serve` command automatically falls back to a headless `dashboard --no-open` — see `electron/backend-command.cjs` — so mid-upgrade installs never break.) The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
### Verification

View File

@@ -17,5 +17,5 @@
"lib": "@/lib",
"hooks": "@/hooks"
},
"iconLibrary": "lucide"
"iconLibrary": "tabler"
}

View File

@@ -0,0 +1,51 @@
'use strict'
// Backend subcommand routing for the desktop-managed Hermes process.
//
// The desktop app launches its own headless backend via `hermes serve` — it
// must NEVER depend on or launch the browser `dashboard`. But `serve` is a
// newer subcommand: a runtime that predates it (an older managed install the
// app hasn't updated yet, or an older `hermes` resolved from PATH) only knows
// `dashboard --no-open`. To avoid bricking those users mid-upgrade we detect
// whether the resolved runtime understands `serve` and, only when it does not,
// fall back to the legacy `dashboard --no-open` invocation. Both produce the
// exact same headless gateway; `serve` is just the decoupled name.
//
// These helpers are pure so they can be unit-tested without Electron.
/**
* Build the canonical headless backend argv (always `serve`).
* @param {string} [profile] optional Hermes profile to pin via `--profile`.
*/
function serveBackendArgs(profile) {
const head = profile ? ['--profile', profile] : []
return [...head, 'serve', '--host', '127.0.0.1', '--port', '0']
}
/**
* Rewrite a resolved backend argv from `serve` to the legacy
* `dashboard --no-open` form, preserving every other argument (incl. a leading
* `-m hermes_cli.main` and any `--profile <name>`). Returns a copy; if there is
* no `serve` token the argv is returned unchanged.
*/
function dashboardFallbackArgs(args) {
const i = args.indexOf('serve')
if (i === -1) return args.slice()
return [...args.slice(0, i), 'dashboard', '--no-open', ...args.slice(i + 1)]
}
/**
* True when a runtime's `hermes_cli/subcommands/dashboard.py` source registers
* the `serve` subcommand. Matches `add_parser("serve"` / `add_parser('serve'`
* specifically so the substring "server" (e.g. "start_server", "web server")
* never produces a false positive.
*/
function sourceDeclaresServe(dashboardPySource) {
return /add_parser\(\s*["']serve["']/.test(String(dashboardPySource || ''))
}
module.exports = {
serveBackendArgs,
dashboardFallbackArgs,
sourceDeclaresServe,
}

View File

@@ -0,0 +1,83 @@
'use strict'
const test = require('node:test')
const assert = require('node:assert/strict')
const {
serveBackendArgs,
dashboardFallbackArgs,
sourceDeclaresServe,
} = require('./backend-command.cjs')
test('serveBackendArgs builds a headless serve invocation', () => {
assert.deepEqual(serveBackendArgs(), [
'serve',
'--host',
'127.0.0.1',
'--port',
'0',
])
})
test('serveBackendArgs pins a profile when provided', () => {
assert.deepEqual(serveBackendArgs('worker'), [
'--profile',
'worker',
'serve',
'--host',
'127.0.0.1',
'--port',
'0',
])
})
test('dashboardFallbackArgs rewrites serve -> dashboard --no-open, keeping the -m prefix', () => {
const serve = ['-m', 'hermes_cli.main', 'serve', '--host', '127.0.0.1', '--port', '0']
assert.deepEqual(dashboardFallbackArgs(serve), [
'-m',
'hermes_cli.main',
'dashboard',
'--no-open',
'--host',
'127.0.0.1',
'--port',
'0',
])
})
test('dashboardFallbackArgs preserves a --profile flag ahead of serve', () => {
const serve = ['-m', 'hermes_cli.main', '--profile', 'worker', 'serve', '--host', '127.0.0.1', '--port', '0']
assert.deepEqual(dashboardFallbackArgs(serve), [
'-m',
'hermes_cli.main',
'--profile',
'worker',
'dashboard',
'--no-open',
'--host',
'127.0.0.1',
'--port',
'0',
])
})
test('dashboardFallbackArgs is a no-op (copy) when there is no serve token', () => {
const args = ['-m', 'hermes_cli.main', 'dashboard', '--no-open']
const out = dashboardFallbackArgs(args)
assert.deepEqual(out, args)
assert.notEqual(out, args, 'should return a copy, not the same reference')
})
test('sourceDeclaresServe detects the serve subparser registration', () => {
assert.equal(sourceDeclaresServe('subparsers.add_parser("serve", help="...")'), true)
assert.equal(sourceDeclaresServe("subparsers.add_parser('serve')"), true)
assert.equal(sourceDeclaresServe('subparsers.add_parser(\n "serve",\n)'), true)
})
test('sourceDeclaresServe does not false-positive on the substring "server"', () => {
const oldSource = `
dashboard_parser = subparsers.add_parser("dashboard", help="Start the web UI dashboard")
from hermes_cli.web_server import start_server # web server
`
assert.equal(sourceDeclaresServe(oldSource), false)
})

View File

@@ -61,10 +61,7 @@ function buildDesktopBackendPath({
const venvBin = venvRoot ? pathModule.join(venvRoot, platform === 'win32' ? 'Scripts' : 'bin') : null
const saneEntries = platform === 'win32' ? [] : POSIX_SANE_PATH_ENTRIES
return appendUniquePathEntries(
[hermesNodeBin, venvBin, currentPath, saneEntries],
{ delimiter }
)
return appendUniquePathEntries([hermesNodeBin, venvBin, currentPath, saneEntries], { delimiter })
}
function normalizeHermesHomeRoot(hermesHome, { pathModule = pathModuleForPlatform(process.platform) } = {}) {

View File

@@ -76,10 +76,7 @@ test('normalizeHermesHomeRoot maps profile homes back to the global Hermes root'
normalizeHermesHomeRoot('C:\\Users\\test\\AppData\\Local\\hermes\\profiles\\oracle', { pathModule: path.win32 }),
'C:\\Users\\test\\AppData\\Local\\hermes'
)
assert.equal(
normalizeHermesHomeRoot('/Users/test/.hermes', { pathModule: path.posix }),
'/Users/test/.hermes'
)
assert.equal(normalizeHermesHomeRoot('/Users/test/.hermes', { pathModule: path.posix }), '/Users/test/.hermes')
})
test('Windows PATH casing and delimiter are preserved without POSIX sane entries', () => {
@@ -104,8 +101,5 @@ test('Windows PATH casing and delimiter are preserved without POSIX sane entries
})
test('appendUniquePathEntries drops empty entries and keeps first occurrence', () => {
assert.equal(
appendUniquePathEntries([':/a::/b', ['/a', '/c']], { delimiter: ':' }),
'/a:/b:/c'
)
assert.equal(appendUniquePathEntries([':/a::/b', ['/a', '/c']], { delimiter: ':' }), '/a:/b:/c')
})

View File

@@ -37,7 +37,18 @@ const { execFileSync } = require('node:child_process')
const PROBE_TIMEOUT_MS = 5000
/**
* Return true iff `python -c "import hermes_cli"` exits 0.
* Return the Python snippet used to verify Hermes can import far enough to
* launch the CLI. Kept exported for tests so dependency regressions are
* caught without needing a real broken venv fixture.
*
* @returns {string}
*/
function hermesRuntimeImportProbe() {
return 'import yaml; import hermes_cli.config'
}
/**
* Return true iff the Hermes runtime import probe exits 0.
*
* Used to gate the "fallback to system Python with hermes_cli installed"
* rung of resolveHermesBackend. Without this, a system Python 3.11-3.13
@@ -46,13 +57,20 @@ const PROBE_TIMEOUT_MS = 5000
* site-packages -- and the resolver returns a backend that immediately
* dies on spawn.
*
* The probe intentionally imports hermes_cli.config, not just the top-level
* package: a broken/empty Windows launcher venv can still see the source tree
* through PYTHONPATH but lack PyYAML, then die on the first real CLI import.
*
* @param {string} pythonPath - Absolute path to a python.exe / python.
* @param {object} [opts]
* @param {object} [opts.env] - Additional environment for the probe.
* @returns {boolean}
*/
function canImportHermesCli(pythonPath) {
function canImportHermesCli(pythonPath, opts = {}) {
if (!pythonPath) return false
try {
execFileSync(pythonPath, ['-c', 'import hermes_cli'], {
execFileSync(pythonPath, ['-c', hermesRuntimeImportProbe()], {
env: { ...process.env, ...(opts.env || {}) },
stdio: 'ignore',
timeout: PROBE_TIMEOUT_MS,
windowsHide: true
@@ -101,6 +119,7 @@ function verifyHermesCli(hermesCommand, opts = {}) {
module.exports = {
canImportHermesCli,
hermesRuntimeImportProbe,
verifyHermesCli,
PROBE_TIMEOUT_MS
}

View File

@@ -11,7 +11,7 @@ const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const { canImportHermesCli, verifyHermesCli } = require('./backend-probes.cjs')
const { canImportHermesCli, hermesRuntimeImportProbe, verifyHermesCli } = require('./backend-probes.cjs')
// Resolve the host's own Node binary -- guaranteed to be on disk and
// runnable. We use it as both a stand-in for "a python that doesn't
@@ -40,6 +40,12 @@ test('canImportHermesCli returns false when binary does not exist', () => {
assert.equal(canImportHermesCli(ghost), false)
})
test('hermes runtime import probe checks config dependencies', () => {
const probe = hermesRuntimeImportProbe()
assert.match(probe, /\bimport yaml\b/)
assert.match(probe, /\bimport hermes_cli\.config\b/)
})
test('verifyHermesCli returns false when command is falsy', () => {
assert.equal(verifyHermesCli(''), false)
assert.equal(verifyHermesCli(null), false)

View File

@@ -1,3 +1,5 @@
const fs = require('node:fs')
const _READY_RE = /^HERMES_DASHBOARD_READY port=(\d+)/m
// The announcement clock starts the instant the backend process is spawned —
@@ -94,9 +96,76 @@ function waitForDashboardPort(child, timeoutMs = resolvePortAnnounceTimeoutMs())
})
}
function readDashboardReadyFile(readyFile) {
if (!readyFile) return null
try {
const parsed = JSON.parse(fs.readFileSync(readyFile, 'utf8'))
const port = Number(parsed?.port)
return Number.isInteger(port) && port > 0 ? port : null
} catch {
return null
}
}
function waitForDashboardReadyFile(readyFile, child, timeoutMs = resolvePortAnnounceTimeoutMs()) {
return new Promise((resolve, reject) => {
let done = false
let interval = null
function cleanup() {
if (done) return
done = true
clearTimeout(timer)
if (interval) clearInterval(interval)
child.off('exit', onExit)
child.off('error', onError)
}
function check() {
const port = readDashboardReadyFile(readyFile)
if (port) {
cleanup()
resolve(port)
}
}
function onExit(code, signal) {
cleanup()
reject(new Error(`Hermes backend: exited before port announcement (${signal || code})`))
}
function onError(err) {
cleanup()
reject(err)
}
const timer = setTimeout(() => {
cleanup()
reject(new Error(`Timed out waiting for Hermes backend port announcement (${timeoutMs}ms)`))
}, timeoutMs)
child.on('exit', onExit)
child.on('error', onError)
interval = setInterval(check, 50)
if (typeof interval.unref === 'function') interval.unref()
check()
})
}
function waitForDashboardPortAnnouncement(child, options = {}) {
const timeoutMs = options.timeoutMs ?? resolvePortAnnounceTimeoutMs()
if (options.readyFile) {
return waitForDashboardReadyFile(options.readyFile, child, timeoutMs)
}
return waitForDashboardPort(child, timeoutMs)
}
module.exports = {
waitForDashboardPort,
waitForDashboardPortAnnouncement,
waitForDashboardReadyFile,
readDashboardReadyFile,
resolvePortAnnounceTimeoutMs,
DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
MIN_PORT_ANNOUNCE_TIMEOUT_MS,
MIN_PORT_ANNOUNCE_TIMEOUT_MS
}

View File

@@ -14,12 +14,18 @@
const test = require('node:test')
const assert = require('node:assert/strict')
const { EventEmitter } = require('node:events')
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const {
readDashboardReadyFile,
waitForDashboardPort,
waitForDashboardPortAnnouncement,
waitForDashboardReadyFile,
resolvePortAnnounceTimeoutMs,
DEFAULT_PORT_ANNOUNCE_TIMEOUT_MS,
MIN_PORT_ANNOUNCE_TIMEOUT_MS,
MIN_PORT_ANNOUNCE_TIMEOUT_MS
} = require('./backend-ready.cjs')
// A minimal stand-in for a spawned child process: an EventEmitter with a
@@ -119,3 +125,75 @@ test('a late announcement after timeout does not throw (listeners torn down)', a
child.stdout.emit('data', 'HERMES_DASHBOARD_READY port=9999\n')
})
})
// ---------------------------------------------------------------------------
// ready-file port announcement
// ---------------------------------------------------------------------------
function mkTmpReadyFile() {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-ready-test-'))
return {
dir,
file: path.join(dir, 'ready.json'),
cleanup: () => fs.rmSync(dir, { recursive: true, force: true })
}
}
test('readDashboardReadyFile returns a valid port from JSON', () => {
const tmp = mkTmpReadyFile()
try {
fs.writeFileSync(tmp.file, JSON.stringify({ port: 4567 }))
assert.equal(readDashboardReadyFile(tmp.file), 4567)
} finally {
tmp.cleanup()
}
})
test('readDashboardReadyFile ignores missing, malformed, or invalid files', () => {
const tmp = mkTmpReadyFile()
try {
assert.equal(readDashboardReadyFile(tmp.file), null)
fs.writeFileSync(tmp.file, '{')
assert.equal(readDashboardReadyFile(tmp.file), null)
fs.writeFileSync(tmp.file, JSON.stringify({ port: 0 }))
assert.equal(readDashboardReadyFile(tmp.file), null)
} finally {
tmp.cleanup()
}
})
test('waitForDashboardReadyFile resolves when the ready file appears', async () => {
const tmp = mkTmpReadyFile()
const child = makeFakeChild()
try {
const p = waitForDashboardReadyFile(tmp.file, child, 1000)
setTimeout(() => fs.writeFileSync(tmp.file, JSON.stringify({ port: 8765 })), 20)
assert.equal(await p, 8765)
} finally {
tmp.cleanup()
}
})
test('waitForDashboardPortAnnouncement uses ready file when provided', async () => {
const tmp = mkTmpReadyFile()
const child = makeFakeChild()
try {
const p = waitForDashboardPortAnnouncement(child, { readyFile: tmp.file, timeoutMs: 1000 })
setTimeout(() => fs.writeFileSync(tmp.file, JSON.stringify({ port: 9876 })), 20)
assert.equal(await p, 9876)
} finally {
tmp.cleanup()
}
})
test('waitForDashboardReadyFile rejects when the child exits before file readiness', async () => {
const tmp = mkTmpReadyFile()
const child = makeFakeChild()
try {
const p = waitForDashboardReadyFile(tmp.file, child, 1000)
child.emit('exit', 1, null)
await assert.rejects(p, /exited before port announcement/)
} finally {
tmp.cleanup()
}
})

View File

@@ -179,7 +179,13 @@ function downloadInstallScript(commit, destPath) {
})
}
async function resolveInstallScript({ installStamp, sourceRepoRoot, hermesHome, emit, _download = downloadInstallScript }) {
async function resolveInstallScript({
installStamp,
sourceRepoRoot,
hermesHome,
emit,
_download = downloadInstallScript
}) {
// 1. Dev shortcut: prefer a local checkout's installer so we can iterate
// without pushing. SOURCE_REPO_ROOT comes from main.cjs (path.resolve
// of APP_ROOT/../..).
@@ -293,15 +299,19 @@ function spawnPowerShell(scriptPath, args, { emit, stageName, abortSignal, herme
const ps = process.platform === 'win32' ? resolveWindowsPowerShell() : 'pwsh'
const fullArgs = ['-NoProfile', '-ExecutionPolicy', 'Bypass', '-File', scriptPath, ...args]
const child = spawn(ps, fullArgs, hiddenWindowsChildOptions({
stdio: ['ignore', 'pipe', 'pipe'],
env: {
...process.env,
// Pass HERMES_HOME through so install.ps1 respects the caller's
// choice rather than re-computing the default.
HERMES_HOME: hermesHome || process.env.HERMES_HOME || ''
}
}))
const child = spawn(
ps,
fullArgs,
hiddenWindowsChildOptions({
stdio: ['ignore', 'pipe', 'pipe'],
env: {
...process.env,
// Pass HERMES_HOME through so install.ps1 respects the caller's
// choice rather than re-computing the default.
HERMES_HOME: hermesHome || process.env.HERMES_HOME || ''
}
})
)
let stdout = ''
let stderr = ''

View File

@@ -261,12 +261,7 @@ function cookiesHaveSession(cookies) {
*/
function cookiesHaveLiveSession(cookies) {
if (!Array.isArray(cookies)) return false
return cookies.some(
c =>
c &&
c.value &&
(AT_COOKIE_VARIANTS.includes(c.name) || RT_COOKIE_VARIANTS.includes(c.name))
)
return cookies.some(c => c && c.value && (AT_COOKIE_VARIANTS.includes(c.name) || RT_COOKIE_VARIANTS.includes(c.name)))
}
module.exports = {

View File

@@ -138,10 +138,7 @@ function buildPosixCleanupScript({ desktopPid, pythonExe, pythonPath, agentRoot,
if (pythonPath) {
lines.push(`export PYTHONPATH=${q(pythonPath)}\${PYTHONPATH:+:$PYTHONPATH}`)
}
lines.push(
`cd ${q(agentRoot)} 2>/dev/null || true`,
`${q(pythonExe)} ${uninstallArgs.map(q).join(' ')} || true`
)
lines.push(`cd ${q(agentRoot)} 2>/dev/null || true`, `${q(pythonExe)} ${uninstallArgs.map(q).join(' ')} || true`)
if (appPath) {
lines.push(`rm -rf ${q(appPath)} || true`)
}
@@ -169,7 +166,15 @@ function buildPosixCleanupScript({ desktopPid, pythonExe, pythonPath, agentRoot,
* Removal: even after the desktop PID is gone, Windows releases directory
* handles lazily, so a single `rmdir /s /q` can half-fail — retry up to 10x.
*/
function buildWindowsCleanupScript({ desktopPid, pythonExe, pythonPath, agentRoot, uninstallArgs, appPath, hermesHome }) {
function buildWindowsCleanupScript({
desktopPid,
pythonExe,
pythonPath,
agentRoot,
uninstallArgs,
appPath,
hermesHome
}) {
const pid = Number(desktopPid) || 0
// cmd.exe has no string escaping inside quotes; strip embedded quotes (paths
// under %LOCALAPPDATA% never contain them). `&`/`^` in a path would still be

View File

@@ -101,10 +101,7 @@ test('resolveRemovableAppPath uses APPIMAGE on Linux when set', () => {
})
test('resolveRemovableAppPath finds the unpacked dir on Linux', () => {
assert.equal(
resolveRemovableAppPath('/opt/hermes/linux-unpacked/hermes', 'linux', {}),
'/opt/hermes/linux-unpacked'
)
assert.equal(resolveRemovableAppPath('/opt/hermes/linux-unpacked/hermes', 'linux', {}), '/opt/hermes/linux-unpacked')
// A system-package install (/usr/bin) → null, left to apt/dnf.
assert.equal(resolveRemovableAppPath('/usr/bin/hermes', 'linux', {}), null)
})

View File

@@ -0,0 +1,48 @@
'use strict'
const { session } = require('electron')
const EMBED_SESSION_PARTITION = 'persist:hermes-embed'
const EMBED_REFERER = 'https://www.youtube.com/'
const YOUTUBE_REFERER_HOST_RE =
/(^|\.)(youtube\.com|youtube-nocookie\.com|googlevideo\.com|ytimg\.com|youtubei\.googleapis\.com)$/i
function installEmbedRefererForSession(embedSession) {
if (!embedSession) {
return
}
embedSession.webRequest.onBeforeSendHeaders((details, callback) => {
let host = ''
try {
host = new URL(details.url).hostname
} catch {
host = ''
}
if (!YOUTUBE_REFERER_HOST_RE.test(host)) {
callback({ requestHeaders: details.requestHeaders })
return
}
const headers = { ...details.requestHeaders }
if (!headers.Referer && !headers.referer) {
headers.Referer = EMBED_REFERER
}
callback({ requestHeaders: headers })
})
}
/** Stamp Referer on YouTube requests in the embed webview partition only. */
function installEmbedReferer() {
try {
installEmbedRefererForSession(session.fromPartition(EMBED_SESSION_PARTITION))
} catch {
// Non-fatal: embeds still render; YouTube may show referer errors.
}
}
module.exports = { installEmbedReferer }

View File

@@ -0,0 +1,105 @@
'use strict'
const { shell } = require('electron')
const fs = require('fs')
const path = require('path')
const { readDirForIpc } = require('./fs-read-dir.cjs')
const { gitRootForIpc } = require('./git-root.cjs')
const { resolveRequestedPathForIpc } = require('./hardening.cjs')
// Filesystem IPC: read-dir, git-root, reveal, rename, write-text, trash. Path
// hardening + `~` expansion + dir-existence checks live in the main process and
// are injected so this module stays side-effect free.
function registerFsIpc({ directoryExists, expandUserPath, ipcMain }) {
ipcMain.handle('hermes:fs:readDir', async (_event, dirPath) => readDirForIpc(dirPath))
ipcMain.handle('hermes:fs:gitRoot', async (_event, startPath) => gitRootForIpc(startPath))
// Reveal a path in the OS file manager (Finder / Explorer / Files).
ipcMain.handle('hermes:fs:reveal', async (_event, targetPath) => {
const target = String(targetPath || '').trim()
if (!target) {
return false
}
try {
shell.showItemInFolder(target)
return true
} catch {
return false
}
})
// Rename a file/folder in place. The renderer passes the existing path + a new
// base name; the destination is resolved in the SAME parent dir so a rename can
// never move the item elsewhere or traverse out. Rejects on a name collision.
ipcMain.handle('hermes:fs:rename', async (_event, targetPath, newName) => {
const src = String(targetPath || '').trim()
const name = String(newName || '').trim()
if (!src || !name || name === '.' || name === '..' || name.includes('/') || name.includes('\\')) {
throw new Error('Invalid rename')
}
const dst = path.join(path.dirname(src), name)
if (dst === src) {
return { path: dst }
}
if (fs.existsSync(dst)) {
throw new Error(`"${name}" already exists`)
}
await fs.promises.rename(src, dst)
return { path: dst }
})
// Write a small UTF-8 text file (e.g. a project's IDEA.md at creation). The path
// is hardened (resolveRequestedPathForIpc) and the parent must already exist —
// this never creates directory trees or escapes the allowed roots, and content
// is size-capped so it can't be abused as a bulk-write primitive.
ipcMain.handle('hermes:fs:writeText', async (_event, filePath, content) => {
const raw = String(filePath || '').trim()
if (!raw) {
throw new Error('Invalid path')
}
const text = String(content ?? '')
if (text.length > 1_000_000) {
throw new Error('Content too large')
}
const resolved = resolveRequestedPathForIpc(expandUserPath(raw), { purpose: 'Write text file' })
if (!directoryExists(path.dirname(resolved))) {
throw new Error('Parent directory does not exist')
}
await fs.promises.writeFile(resolved, text, 'utf8')
return { path: resolved }
})
// Move a file/folder to the OS trash (recoverable) — the VS Code "Delete"
// default. `shell.trashItem` routes to Finder/Explorer/Files trash per platform.
ipcMain.handle('hermes:fs:trash', async (_event, targetPath) => {
const target = String(targetPath || '').trim()
if (!target) {
throw new Error('Invalid delete')
}
await shell.trashItem(target)
return true
})
}
module.exports = { registerFsIpc }

View File

@@ -0,0 +1,49 @@
'use strict'
const assert = require('node:assert/strict')
const test = require('node:test')
const { registerFsIpc } = require('./fs-ipc.cjs')
function fakeIpcMain() {
const handlers = new Map()
return {
handlers,
handle(channel, handler) {
assert.ok(!handlers.has(channel), `duplicate registration for ${channel}`)
handlers.set(channel, handler)
}
}
}
test('registerFsIpc wires only hermes:fs:* channels, each to a handler fn', () => {
const ipcMain = fakeIpcMain()
registerFsIpc({ ipcMain, directoryExists: () => true, expandUserPath: p => p })
assert.ok(ipcMain.handlers.size >= 6, `expected the full fs surface, got ${ipcMain.handlers.size}`)
for (const [channel, handler] of ipcMain.handlers) {
assert.match(channel, /^hermes:fs:/, `${channel} is not an fs channel`)
assert.equal(typeof handler, 'function', `${channel} should register a handler`)
}
for (const channel of ['hermes:fs:readDir', 'hermes:fs:rename', 'hermes:fs:trash']) {
assert.ok(ipcMain.handlers.has(channel), `missing ${channel}`)
}
})
test('rename rejects names that traverse out of the parent dir', async () => {
const ipcMain = fakeIpcMain()
registerFsIpc({ ipcMain, directoryExists: () => true, expandUserPath: p => p })
for (const bad of ['..', '.', 'a/b', 'a\\b']) {
await assert.rejects(
() => ipcMain.handlers.get('hermes:fs:rename')({}, '/tmp/x', bad),
/Invalid rename/,
`"${bad}" should be rejected`
)
}
})

View File

@@ -92,9 +92,7 @@ async function readDirForIpc(dirPath, options = {}) {
try {
const dirents = await fsImpl.promises.readdir(resolved, { withFileTypes: true })
const visibleDirents = dirents.filter(dirent => !FS_READDIR_HIDDEN.has(dirent.name))
const entries = await mapWithStatConcurrency(visibleDirents, dirent =>
entryForDirent(dirent, resolved, fsImpl)
)
const entries = await mapWithStatConcurrency(visibleDirents, dirent => entryForDirent(dirent, resolved, fsImpl))
entries.sort((a, b) => Number(b.isDirectory) - Number(a.isDirectory) || a.name.localeCompare(b.name))

View File

@@ -349,7 +349,10 @@ test('readDirForIpc bounds concurrent stats while preserving complete sorted out
assert.equal(result.error, undefined)
assert.equal(result.entries.length, names.length)
assert.equal(statCalls.length, names.length)
assert.equal(statCalls.some(fullPath => fullPath.endsWith(`${path.sep}node_modules`)), false)
assert.equal(
statCalls.some(fullPath => fullPath.endsWith(`${path.sep}node_modules`)),
false
)
assert.ok(peak > 1, `expected concurrent stats, observed peak ${peak}`)
assert.ok(peak <= 16, `expected at most 16 concurrent stats, observed peak ${peak}`)
assert.deepEqual(
@@ -357,8 +360,5 @@ test('readDirForIpc bounds concurrent stats while preserving complete sorted out
expectedNames
)
assert.equal(result.entries.find(entry => entry.name === failedName)?.isDirectory, false)
assert.equal(
result.entries.filter(entry => entry.isDirectory).length,
successfulDirectoryNames.size
)
assert.equal(result.entries.filter(entry => entry.isDirectory).length, successfulDirectoryNames.size)
})

View File

@@ -0,0 +1,96 @@
'use strict'
const { scanGitRepos } = require('./git-repo-scan.cjs')
const {
fileDiffVsHead,
repoStatus,
reviewCommit,
reviewCommitContext,
reviewCreatePr,
reviewDiff,
reviewList,
reviewPush,
reviewRevParse,
reviewRevert,
reviewShipInfo,
reviewStage,
reviewUnstage
} = require('./git-review-ops.cjs')
const { addWorktree, listBranches, listWorktrees, removeWorktree, switchBranch } = require('./git-worktree-ops.cjs')
// Register the git/worktree/review IPC handlers. Thin delegators to the
// git-*-ops sibling modules; the git/gh binary resolution lives in the main
// process (Windows PATH discovery) and is injected so this module stays pure.
function registerGitIpc({ ipcMain, resolveGitBinary, resolveGhBinary }) {
// Git-driven worktree management ("Start work" flow). Errors surface to the
// renderer as rejected promises so it can toast a friendly message.
ipcMain.handle('hermes:git:worktreeList', async (_event, repoPath) => listWorktrees(repoPath, resolveGitBinary()))
ipcMain.handle('hermes:git:worktreeAdd', async (_event, repoPath, options) =>
addWorktree(repoPath, options || {}, resolveGitBinary())
)
ipcMain.handle('hermes:git:worktreeRemove', async (_event, repoPath, worktreePath, options) =>
removeWorktree(repoPath, worktreePath, options || {}, resolveGitBinary())
)
ipcMain.handle('hermes:git:branchSwitch', async (_event, repoPath, branch) =>
switchBranch(repoPath, branch, resolveGitBinary())
)
ipcMain.handle('hermes:git:branchList', async (_event, repoPath) => listBranches(repoPath, resolveGitBinary()))
// Compact repo status (branch, ahead/behind, change counts + files) for the
// composer coding rail. Returns null on a non-repo / remote backend so the rail
// hides cleanly rather than erroring.
ipcMain.handle('hermes:git:repoStatus', async (_event, repoPath) => repoStatus(repoPath, resolveGitBinary()))
// Codex-style review pane: list changed files for a scope, fetch one file's
// unified diff, and stage / unstage / revert. Reads return empty on failure;
// mutations reject so the renderer can toast.
ipcMain.handle('hermes:git:review:list', async (_event, repoPath, scope, baseRef) =>
reviewList(repoPath, scope, baseRef, resolveGitBinary())
)
ipcMain.handle('hermes:git:review:diff', async (_event, repoPath, filePath, scope, baseRef, staged) =>
reviewDiff(repoPath, filePath, scope, baseRef, staged, resolveGitBinary())
)
// Working-tree-vs-HEAD diff for one file (the preview's "show the diff" view).
ipcMain.handle('hermes:git:fileDiff', async (_event, repoPath, filePath) =>
fileDiffVsHead(repoPath, filePath, resolveGitBinary())
)
ipcMain.handle('hermes:git:review:stage', async (_event, repoPath, filePath) =>
reviewStage(repoPath, filePath ?? null, resolveGitBinary())
)
ipcMain.handle('hermes:git:review:unstage', async (_event, repoPath, filePath) =>
reviewUnstage(repoPath, filePath ?? null, resolveGitBinary())
)
ipcMain.handle('hermes:git:review:revert', async (_event, repoPath, filePath) =>
reviewRevert(repoPath, filePath ?? null, resolveGitBinary())
)
ipcMain.handle('hermes:git:review:revParse', async (_event, repoPath, ref) =>
reviewRevParse(repoPath, ref, resolveGitBinary())
)
ipcMain.handle('hermes:git:review:commit', async (_event, repoPath, message, push) =>
reviewCommit(repoPath, message, Boolean(push), resolveGitBinary())
)
ipcMain.handle('hermes:git:review:commitContext', async (_event, repoPath) =>
reviewCommitContext(repoPath, resolveGitBinary())
)
ipcMain.handle('hermes:git:review:push', async (_event, repoPath) => reviewPush(repoPath, resolveGitBinary()))
ipcMain.handle('hermes:git:review:shipInfo', async (_event, repoPath) => reviewShipInfo(repoPath, resolveGhBinary()))
ipcMain.handle('hermes:git:review:createPr', async (_event, repoPath) =>
reviewCreatePr(repoPath, resolveGitBinary(), resolveGhBinary())
)
// Repo-first project discovery: scan bounded roots for git repos (pure fs walk,
// no native addon). Never throws to the renderer — failures yield an empty list.
ipcMain.handle('hermes:git:scanRepos', async (_event, roots, options) => {
try {
return await scanGitRepos(roots || [], options || {})
} catch {
return []
}
})
}
module.exports = { registerGitIpc }

View File

@@ -0,0 +1,61 @@
'use strict'
const assert = require('node:assert/strict')
const test = require('node:test')
const { registerGitIpc } = require('./git-ipc.cjs')
function fakeIpcMain() {
const handlers = new Map()
return {
handlers,
handle(channel, handler) {
assert.ok(!handlers.has(channel), `duplicate registration for ${channel}`)
handlers.set(channel, handler)
}
}
}
test('registerGitIpc wires only hermes:git:* channels, each to a handler fn', () => {
const ipcMain = fakeIpcMain()
registerGitIpc({ ipcMain, resolveGitBinary: () => 'git', resolveGhBinary: () => 'gh' })
assert.ok(ipcMain.handlers.size >= 19, `expected the full git surface, got ${ipcMain.handlers.size}`)
for (const [channel, handler] of ipcMain.handlers) {
assert.match(channel, /^hermes:git:/, `${channel} is not a git channel`)
assert.equal(typeof handler, 'function', `${channel} should register a handler`)
}
// Spot-check the load-bearing channels across the worktree / review / scan groups.
for (const channel of ['hermes:git:worktreeList', 'hermes:git:review:commit', 'hermes:git:scanRepos']) {
assert.ok(ipcMain.handlers.has(channel), `missing ${channel}`)
}
})
test('handlers thread the injected resolver into the ops layer', async () => {
const ipcMain = fakeIpcMain()
const calls = []
registerGitIpc({
ipcMain,
resolveGitBinary: () => {
calls.push('git')
return 'git'
},
resolveGhBinary: () => 'gh'
})
// The resolver is consulted synchronously to build the ops call; whatever the
// ops layer does with a non-repo path is irrelevant to the wiring.
try {
await ipcMain.handlers.get('hermes:git:worktreeList')({}, '/definitely/not/a/repo')
} catch {
// ops layer may reject on a bad path — not what this test asserts.
}
assert.deepEqual(calls, ['git'])
})

View File

@@ -0,0 +1,96 @@
'use strict'
// Repo-first discovery: walk bounded roots for git repos using only Node's `fs`
// — no native addon, so it just works for anyone who pulls main (no
// electron-rebuild). Mirrors how GitHub Desktop scans: stop at the first `.git`
// (don't descend into a repo), cap depth, and skip heavy non-repo trees so the
// first scan stays fast. Results are cached by the backend after the first run.
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const fsp = fs.promises
// Shallow on purpose: real projects live a few levels under home
// (`~/www/repo`, `~/code/org/repo`); deeper `.git` dirs are almost always
// fixtures/vendored/eval checkouts (e.g. `~/www/ha-evals/tasks/*/repo`). Repos
// you actually use but keep deeper still surface via session-derived discovery,
// so this only prunes noise, never repos with history.
const DEFAULT_MAX_DEPTH = 3
const MAX_CONCURRENCY = 32
// Big trees that are never themselves repos and would waste the walk. Anything
// hidden (dotdirs like .cache/.Trash/.npm) is skipped wholesale below, so this
// only needs the non-hidden heavyweights.
const JUNK_DIRS = new Set(['Applications', 'Library', 'node_modules', 'site-packages', 'vendor', 'venv'])
async function mapLimit(items, limit, fn) {
let cursor = 0
async function worker() {
while (cursor < items.length) {
const index = cursor
cursor += 1
await fn(items[index])
}
}
await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker))
}
/**
* Scan `roots` (default: the home dir) for git repositories. Returns deduped
* `{ root, label }` entries. `options.maxDepth` caps recursion (default 3).
*/
async function scanGitRepos(roots, options = {}) {
const maxDepth = Number(options.maxDepth) || DEFAULT_MAX_DEPTH
const searchRoots = Array.isArray(roots) && roots.length > 0 ? roots : [os.homedir()]
const found = new Map()
async function walk(dir, depth) {
if (depth > maxDepth) {
return
}
let entries
try {
entries = await fsp.readdir(dir, { withFileTypes: true })
} catch {
return // unreadable / permission denied
}
// A `.git` DIRECTORY marks a real repo root (a main checkout). A `.git`
// FILE is a linked worktree or submodule — those belong to their parent
// repo as lanes, not as separate projects, so we don't list them (and we
// keep descending in case a real repo sits deeper). This is what kills the
// worktree/eval-repo duplicate explosion.
if (entries.some(entry => entry.name === '.git' && entry.isDirectory())) {
const root = dir.replace(/[/\\]+$/, '')
found.set(root, path.basename(root) || root)
return
}
const subdirs = []
for (const entry of entries) {
// Real directories only (skip symlinks to avoid loops), no hidden dirs, no
// known heavy trees.
if (!entry.isDirectory() || entry.name.startsWith('.') || JUNK_DIRS.has(entry.name)) {
continue
}
subdirs.push(path.join(dir, entry.name))
}
await mapLimit(subdirs, MAX_CONCURRENCY, sub => walk(sub, depth + 1))
}
await mapLimit(searchRoots.map(root => String(root || '').trim()).filter(Boolean), MAX_CONCURRENCY, root =>
walk(root, 0)
)
return [...found.entries()].map(([root, label]) => ({ label, root }))
}
module.exports = { scanGitRepos }

View File

@@ -0,0 +1,703 @@
'use strict'
// Git ops backing the coding rail + Codex-style review pane. Built on `simple-git`
// (a maintained wrapper around the system git binary — same git the rest of the
// app shells to, no native build) so we read structured status()/diffSummary()
// results instead of hand-parsing porcelain. Reads degrade to null/empty on a
// non-repo / remote backend; mutations reject so the renderer can toast.
const { execFile } = require('node:child_process')
const fs = require('node:fs/promises')
const path = require('node:path')
// `simple-git` is a pure-JS runtime dep that workspace dedup hoists into the
// repo-root node_modules. Packaged builds set `files:` in package.json, which
// excludes node_modules from the asar, so the normal require() fails at launch
// (issue #52735: "Cannot find module 'simple-git'"). We ship the dep's
// closure under resources/native-deps/vendor/node_modules/ via extraResources
// + scripts/stage-native-deps.cjs, and resolve from there when the hoisted
// require() isn't reachable. The `vendor/` nesting matters: electron-builder
// drops a node_modules dir at the root of an extraResources copy but keeps a
// nested one. Dev mode never hits the fallback -- Node's normal lookup finds
// the hoisted copy.
let simpleGit
try {
simpleGit = require('simple-git')
} catch {
const resourcesPath = process.resourcesPath
if (!resourcesPath) {
throw new Error("git-review IPC: 'simple-git' not found and no resourcesPath to fall back to")
}
simpleGit = require(path.join(resourcesPath, 'native-deps', 'vendor', 'node_modules', 'simple-git'))
}
const { resolveRequestedPathForIpc } = require('./hardening.cjs')
const COMMIT_CONTEXT_DIFF_MAX_CHARS = 120_000
const COMMIT_CONTEXT_UNTRACKED_MAX = 80
const UNTRACKED_LINE_COUNT_CONCURRENCY = 16
const UNTRACKED_LINE_COUNT_MAX_BYTES = 1024 * 1024
// GUI-launched Electron apps on macOS inherit only a minimal PATH (no
// /opt/homebrew/bin or /usr/local/bin), so `gh` — and the `git` gh shells out
// to — aren't found. Augment PATH with the resolved gh dir + the common
// package-manager bins so gh runs the same way it does in a terminal.
function ghEnv(ghBin) {
const extra = [ghBin ? path.dirname(ghBin) : '', '/opt/homebrew/bin', '/usr/local/bin', '/usr/bin'].filter(
dir => dir && dir !== '.'
)
return { ...process.env, PATH: [...extra, process.env.PATH].filter(Boolean).join(path.delimiter) }
}
// Run the `gh` CLI in a repo. Resolves { ok, stdout } so callers branch on
// availability/auth without a throw. gh missing/unauthed → ok:false.
function runGh(args, cwd, ghBin) {
return new Promise(resolve => {
execFile(
ghBin || 'gh',
args,
{ cwd, env: ghEnv(ghBin), windowsHide: true, timeout: 30_000, maxBuffer: 8 * 1024 * 1024 },
(err, stdout) => resolve({ ok: !err, stdout: String(stdout || '') })
)
})
}
function gitFor(cwd, gitBin) {
return simpleGit({ baseDir: cwd, binary: gitBin || 'git', maxConcurrentProcesses: 4, trimmed: false })
}
// simple-git reports renames as `old => new` (and `dir/{old => new}/f`); resolve
// to the NEW path so the row addresses the real file for diff/stage.
function resolveRenamePath(raw) {
const path = String(raw || '').trim()
if (!path.includes(' => ')) {
return path
}
const brace = path.match(/^(.*)\{(.*) => (.*)\}(.*)$/)
if (brace) {
const [, prefix, , to, suffix] = brace
return `${prefix}${to}${suffix}`.replace(/\/{2,}/g, '/')
}
return path.split(' => ').pop().trim()
}
// DiffResult.files → Map<path, {added, removed}> (binary files carry no line
// delta).
function countsByPath(summary) {
const map = new Map()
for (const file of summary.files) {
map.set(resolveRenamePath(file.file), {
added: file.binary ? 0 : file.insertions,
removed: file.binary ? 0 : file.deletions
})
}
return map
}
// Untracked files don't appear in diffSummary(); count insertions from disk so
// the review tree can show +N for new files (matches an all-add diff view).
// Insertions = line count: newline bytes, plus one for a final unterminated
// line. Binary (NUL byte) → 0, mirroring git numstat's "-".
async function untrackedInsertions(cwd, relPath) {
try {
const fullPath = path.join(cwd, relPath)
const stat = await fs.stat(fullPath)
if (!stat.isFile() || stat.size > UNTRACKED_LINE_COUNT_MAX_BYTES) {
return 0
}
const buf = await fs.readFile(fullPath)
if (buf.includes(0)) {
return 0
}
let lines = 0
for (const byte of buf) {
if (byte === 10) {
lines++
}
}
return buf.length > 0 && buf[buf.length - 1] !== 10 ? lines + 1 : lines
} catch {
return 0
}
}
function capText(text, maxChars, label = 'truncated') {
const value = String(text || '')
if (value.length <= maxChars) {
return value
}
return `${value.slice(0, maxChars)}\n# ${label}: ${value.length - maxChars} chars omitted\n`
}
async function fillUntrackedCounts(cwd, files) {
const pending = files.filter(file => file.status === '?' && file.added === 0 && file.removed === 0)
for (let i = 0; i < pending.length; i += UNTRACKED_LINE_COUNT_CONCURRENCY) {
await Promise.all(
pending.slice(i, i + UNTRACKED_LINE_COUNT_CONCURRENCY).map(async file => {
file.added = await untrackedInsertions(cwd, file.path)
})
)
}
}
// Resolve the base ref for "all branch changes": merge-base with the remote
// default branch (origin/HEAD), falling back to common trunk names.
async function branchBase(git) {
const candidates = []
try {
const head = (await git.revparse(['--abbrev-ref', 'origin/HEAD'])).trim()
if (head) {
candidates.push(head)
}
} catch {
// No origin/HEAD configured.
}
candidates.push('origin/main', 'origin/master', 'main', 'master')
for (const ref of candidates) {
try {
const base = (await git.raw(['merge-base', 'HEAD', ref])).trim()
if (base) {
return base
}
} catch {
// Ref doesn't exist; try the next candidate.
}
}
return null
}
// Resolve the repo's default branch NAME ("main" / "master" / …), preferring
// the remote's HEAD, then common local trunk names. Null when none is found
// (e.g. a fresh repo with only a feature branch). Used to offer "branch off the
// trunk" regardless of which branch you're currently on.
async function defaultBranchName(git) {
try {
const head = (await git.revparse(['--abbrev-ref', 'origin/HEAD'])).trim()
// "origin/main" → "main"; skip the bare "origin/HEAD" placeholder.
if (head && head !== 'origin/HEAD') {
return head.replace(/^origin\//, '')
}
} catch {
// No origin/HEAD configured.
}
// Prefer a local trunk, then a remote-only one (returns the clean name either
// way) so "branch off main" works even before main is checked out locally.
for (const ref of [
'refs/heads/main',
'refs/heads/master',
'refs/remotes/origin/main',
'refs/remotes/origin/master'
]) {
try {
await git.raw(['rev-parse', '--verify', '--quiet', ref])
return ref.replace(/^refs\/(?:heads|remotes\/origin)\//, '')
} catch {
// Ref doesn't exist; try the next candidate.
}
}
return null
}
// A status file's single-letter classification, preferring the staged (index)
// code over the worktree code; untracked wins (simple-git marks both '?').
function statusLetter(file) {
if (file.index === '?' || file.working_dir === '?') {
return '?'
}
const code = file.index && file.index !== ' ' ? file.index : file.working_dir
return (code || 'M').toUpperCase()
}
const isStaged = file => Boolean(file.index && file.index !== ' ' && file.index !== '?')
async function reviewList(repoPath, scope, baseRef, gitBin) {
let cwd
try {
cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review list' })
} catch {
return { files: [], base: null }
}
const git = gitFor(cwd, gitBin)
try {
if (scope === 'branch' || scope === 'lastTurn') {
const base = scope === 'branch' ? await branchBase(git) : baseRef
if (!base) {
return { files: [], base: null }
}
const range = scope === 'branch' ? `${base}...HEAD` : base
const summary = await git.diffSummary([range])
const files = summary.files.map(file => ({
path: resolveRenamePath(file.file),
added: file.binary ? 0 : file.insertions,
removed: file.binary ? 0 : file.deletions,
status: 'M',
staged: false
}))
// "Last turn" also surfaces files created since the baseline (untracked).
if (scope === 'lastTurn') {
const status = await git.status()
for (const path of status.not_added) {
if (!files.some(f => f.path === path)) {
files.push({ path, added: 0, removed: 0, status: '?', staged: false })
}
}
}
files.sort((a, b) => a.path.localeCompare(b.path))
await fillUntrackedCounts(cwd, files)
return { files, base }
}
// Default: uncommitted (staged + unstaged + untracked), one row per path.
const [status, staged, unstaged] = await Promise.all([
git.status(),
git.diffSummary(['--cached']),
git.diffSummary([])
])
const stagedCounts = countsByPath(staged)
const unstagedCounts = countsByPath(unstaged)
const files = status.files.map(file => {
const filePath = resolveRenamePath(file.path)
const sc = stagedCounts.get(filePath) || { added: 0, removed: 0 }
const uc = unstagedCounts.get(filePath) || { added: 0, removed: 0 }
return {
path: filePath,
added: sc.added + uc.added,
removed: sc.removed + uc.removed,
status: statusLetter(file),
staged: isStaged(file)
}
})
files.sort((a, b) => a.path.localeCompare(b.path))
await fillUntrackedCounts(cwd, files)
return { files, base: null }
} catch {
return { files: [], base: null }
}
}
async function reviewDiff(repoPath, filePath, scope, baseRef, staged, gitBin) {
let cwd
try {
cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review diff' })
} catch {
return ''
}
const git = gitFor(cwd, gitBin)
const safe = args => git.diff(args).catch(() => '')
if (scope === 'branch') {
const base = await branchBase(git)
return base ? safe([`${base}...HEAD`, '--', filePath]) : ''
}
if (scope === 'lastTurn') {
return baseRef ? safe([baseRef, '--', filePath]) : ''
}
if (staged) {
return safe(['--cached', '--', filePath])
}
const worktree = await safe(['--', filePath])
if (worktree.trim()) {
return worktree
}
// Untracked file: no worktree diff exists, so synthesize an all-add diff via
// --no-index (exits non-zero by design when files differ, so go around
// simple-git's reject-on-nonzero with a raw execFile).
return new Promise(resolve => {
execFile(
gitBin || 'git',
['diff', '--no-index', '--', '/dev/null', filePath],
{ cwd, windowsHide: true, timeout: 30_000, maxBuffer: 32 * 1024 * 1024 },
(_err, stdout) => resolve(String(stdout || ''))
)
})
}
// Working-tree-vs-HEAD diff for ONE file — the "what changed since the last
// commit" view used by the file preview. Unlike reviewDiff this never synthesizes
// a full-add for a clean tracked file (so a pristine file shows no diff); it only
// all-adds a genuinely untracked file.
async function fileDiffVsHead(repoPath, filePath, gitBin) {
let cwd
try {
cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'File diff' })
} catch {
return ''
}
const git = gitFor(cwd, gitBin)
const head = await git.diff(['HEAD', '--', filePath]).catch(() => '')
if (head.trim()) {
return head
}
// No tracked changes vs HEAD. Only synthesize an all-add diff for a file git
// doesn't know yet; a clean tracked file must return empty.
const status = await git.raw(['status', '--porcelain', '--', filePath]).catch(() => '')
if (!status.trim().startsWith('??')) {
return ''
}
return new Promise(resolve => {
execFile(
gitBin || 'git',
['diff', '--no-index', '--', '/dev/null', filePath],
{ cwd, windowsHide: true, timeout: 30_000, maxBuffer: 32 * 1024 * 1024 },
(_err, stdout) => resolve(String(stdout || ''))
)
})
}
async function reviewStage(repoPath, filePath, gitBin) {
const cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review stage' })
await gitFor(cwd, gitBin).raw(filePath ? ['add', '--', filePath] : ['add', '-A'])
return { ok: true }
}
async function reviewUnstage(repoPath, filePath, gitBin) {
const cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review unstage' })
await gitFor(cwd, gitBin).raw(filePath ? ['reset', '-q', 'HEAD', '--', filePath] : ['reset', '-q', 'HEAD'])
return { ok: true }
}
// Discard changes back to the committed state. Destructive — the renderer
// confirms first. Restores tracked files and removes untracked ones.
async function reviewRevert(repoPath, filePath, gitBin) {
const cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review revert' })
const git = gitFor(cwd, gitBin)
if (filePath) {
await git.raw(['checkout', 'HEAD', '--', filePath]).catch(() => undefined)
await git.raw(['clean', '-fd', '--', filePath]).catch(() => undefined)
} else {
await git.raw(['checkout', 'HEAD', '--', '.']).catch(() => undefined)
await git.raw(['clean', '-fd']).catch(() => undefined)
}
return { ok: true }
}
// Resolve a ref to a commit sha (captures the turn baseline for "Last turn").
async function reviewRevParse(repoPath, ref, gitBin) {
let cwd
try {
cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review rev-parse' })
} catch {
return null
}
try {
return (await gitFor(cwd, gitBin).revparse([ref || 'HEAD'])).trim() || null
} catch {
return null
}
}
// Commit the working tree. Mirrors VS Code: if nothing is staged, stage
// everything first ("commit all"), then commit. Optionally push afterward,
// setting upstream on the first push.
async function reviewCommit(repoPath, message, push, gitBin) {
const cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review commit' })
const git = gitFor(cwd, gitBin)
const status = await git.status()
if (status.staged.length === 0) {
await git.raw(['add', '-A'])
}
await git.commit(message)
if (push) {
const fresh = await git.status()
if (fresh.tracking) {
await git.push()
} else if (fresh.current) {
await git.raw(['push', '-u', 'origin', fresh.current])
}
}
return { ok: true }
}
// Gather the context the model needs to draft a commit message: the diff of
// what *will* be committed (staged when anything is staged, else everything
// vs HEAD — mirroring reviewCommit's "stage all when nothing staged" rule),
// the names of untracked files (which carry no diff), and recent commit
// subjects for style. Diff is capped so the payload stays bounded. Reads only.
async function reviewCommitContext(repoPath, gitBin) {
let cwd
try {
cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review commit context' })
} catch {
return { diff: '', recent: '' }
}
const git = gitFor(cwd, gitBin)
const safe = args => git.diff(args).catch(() => '')
let status
try {
status = await git.status()
} catch {
return { diff: '', recent: '' }
}
// What will land: staged changes if any, otherwise all tracked changes vs HEAD.
let diff = capText(
status.staged.length > 0 ? await safe(['--cached']) : await safe(['HEAD']),
COMMIT_CONTEXT_DIFF_MAX_CHARS,
'diff truncated for commit-message generation'
)
// Untracked files have no diff — list them so new files aren't invisible.
const untracked = status.not_added || []
if (untracked.length > 0) {
const visible = untracked.slice(0, COMMIT_CONTEXT_UNTRACKED_MAX)
const omitted = untracked.length - visible.length
const note =
`\n# New (untracked) files:\n${visible.map(p => `# ${p}`).join('\n')}\n` +
(omitted > 0 ? `# ... ${omitted} more omitted\n` : '')
diff = diff ? `${diff}${note}` : note
}
const recent = await git.raw(['log', '-n', '10', '--pretty=format:%s']).catch(() => '')
return { diff: diff || '', recent: String(recent || '').trim() }
}
async function reviewPush(repoPath, gitBin) {
const cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review push' })
const git = gitFor(cwd, gitBin)
const status = await git.status()
if (status.tracking) {
await git.push()
} else if (status.current) {
await git.raw(['push', '-u', 'origin', status.current])
}
return { ok: true }
}
// gh availability + auth + whether this branch already has a PR. Reads only;
// drives the PR button's enabled/label state. `ghReady` is false when gh is
// missing OR not authenticated — either way the PR action can't run.
async function reviewShipInfo(repoPath, ghBin) {
let cwd
try {
cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review ship info' })
} catch {
return { ghReady: false, pr: null }
}
const auth = await runGh(['auth', 'status'], cwd, ghBin)
if (!auth.ok) {
return { ghReady: false, pr: null }
}
const view = await runGh(['pr', 'view', '--json', 'url,state,number'], cwd, ghBin)
if (!view.ok) {
// gh exits non-zero when no PR exists for the branch — that's not an error.
return { ghReady: true, pr: null }
}
try {
const pr = JSON.parse(view.stdout)
return { ghReady: true, pr: pr && pr.url ? { url: pr.url, state: pr.state, number: pr.number } : null }
} catch {
return { ghReady: true, pr: null }
}
}
// Create a PR for the current branch (pushing first so gh has a remote ref),
// letting gh fill title/body from the commits. Returns the new PR url.
async function reviewCreatePr(repoPath, gitBin, ghBin) {
const cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Review create PR' })
await reviewPush(repoPath, gitBin).catch(() => undefined)
const created = await runGh(['pr', 'create', '--fill'], cwd, ghBin)
if (!created.ok) {
throw new Error('gh pr create failed (is gh installed and authenticated?)')
}
const url = created.stdout.trim().split('\n').filter(Boolean).pop() || ''
return { url }
}
// Compact working-tree status for the composer coding rail: branch, ahead/behind,
// per-state change counts, +/- vs HEAD, and a capped changed-file list.
async function repoStatus(repoPath, gitBin) {
let cwd
try {
cwd = resolveRequestedPathForIpc(repoPath, { purpose: 'Repo status' })
} catch {
return null
}
// Session cwds can point at a deleted worktree for a moment (or forever in a
// stale row). simple-git throws at construction time on a missing baseDir, so
// fail soft and hide the coding rail instead of spamming IPC handler errors.
try {
const stat = await fs.stat(cwd)
if (!stat.isDirectory()) {
return null
}
} catch {
return null
}
let git
try {
git = gitFor(cwd, gitBin)
} catch {
return null
}
let status
try {
status = await git.status()
} catch {
// Not a repo / git unavailable / remote backend.
return null
}
const detached = typeof status.detached === 'boolean' ? status.detached : !status.current
const files = status.files.map(file => ({
path: file.path,
staged: isStaged(file),
unstaged: Boolean(file.working_dir && file.working_dir !== ' ' && file.working_dir !== '?'),
untracked: file.index === '?' || file.working_dir === '?',
conflicted: file.index === 'U' || file.working_dir === 'U'
}))
const result = {
branch: detached ? null : status.current || null,
defaultBranch: await defaultBranchName(git),
detached,
ahead: status.ahead || 0,
behind: status.behind || 0,
staged: files.filter(f => f.staged).length,
unstaged: files.filter(f => f.unstaged).length,
untracked: status.not_added.length,
conflicted: status.conflicted.length,
changed: files.length,
added: 0,
removed: 0,
files: files.slice(0, 200)
}
// +/- vs HEAD (staged + unstaged tracked changes). No HEAD yet → leave 0.
try {
const summary = await git.diffSummary(['HEAD'])
result.added = summary.insertions
result.removed = summary.deletions
} catch {
// No commits yet.
}
// `git diff HEAD` ignores untracked files, so a turn that only creates new
// files (the common case — a fresh module, a demo dir) showed +0 in the rail
// while the review pane counted them. Fold untracked insertions into `added`
// so the rail matches reality. Bounded (size cap + concurrency) like the
// review tree; only the capped file slice is counted so a huge untracked tree
// can't stall the probe.
try {
const untracked = status.not_added.slice(0, 500)
for (let i = 0; i < untracked.length; i += UNTRACKED_LINE_COUNT_CONCURRENCY) {
const batch = await Promise.all(
untracked.slice(i, i + UNTRACKED_LINE_COUNT_CONCURRENCY).map(path => untrackedInsertions(cwd, path))
)
result.added += batch.reduce((sum, n) => sum + n, 0)
}
} catch {
// Best-effort: a probe failure just leaves untracked lines uncounted.
}
return result
}
module.exports = {
branchBase,
fileDiffVsHead,
repoStatus,
resolveRenamePath,
reviewCommit,
reviewCommitContext,
reviewCreatePr,
reviewDiff,
reviewList,
reviewPush,
reviewRevParse,
reviewRevert,
reviewShipInfo,
reviewStage,
reviewUnstage
}

View File

@@ -0,0 +1,22 @@
'use strict'
const assert = require('node:assert/strict')
const test = require('node:test')
const { resolveRenamePath } = require('./git-review-ops.cjs')
test('resolveRenamePath: plain path is unchanged', () => {
assert.equal(resolveRenamePath('src/a.ts'), 'src/a.ts')
})
test('resolveRenamePath: simple rename resolves to the new path', () => {
assert.equal(resolveRenamePath('old.ts => new.ts'), 'new.ts')
})
test('resolveRenamePath: brace rename resolves to the new path', () => {
assert.equal(resolveRenamePath('src/{old => new}/file.ts'), 'src/new/file.ts')
})
test('resolveRenamePath: brace rename collapsing a segment', () => {
assert.equal(resolveRenamePath('src/{lib => }/file.ts'), 'src/file.ts')
})

View File

@@ -0,0 +1,350 @@
'use strict'
// Git-driven worktree operations for the desktop "Start work" flow: spin up a
// fresh worktree the lightest way (`git worktree add -b`), list real worktrees,
// and remove them. Git is the source of truth; the renderer just drives these.
const path = require('node:path')
const fs = require('node:fs')
const { execFile } = require('node:child_process')
const { resolveRequestedPathForIpc } = require('./hardening.cjs')
function runGit(gitBin, args, cwd) {
return new Promise((resolve, reject) => {
execFile(
gitBin,
args,
{ cwd, windowsHide: true, timeout: 30_000, maxBuffer: 8 * 1024 * 1024 },
(err, stdout, stderr) => {
if (err) {
err.stderr = String(stderr || '')
reject(err)
return
}
resolve(String(stdout || ''))
}
)
})
}
// Parse `git worktree list --porcelain`. The first record is the main worktree.
function parseWorktrees(out) {
const trees = []
let cur = null
for (const line of out.split('\n')) {
if (line.startsWith('worktree ')) {
if (cur) {
trees.push(cur)
}
cur = { path: line.slice(9).trim(), branch: null, detached: false, bare: false, locked: false }
} else if (!cur) {
continue
} else if (line.startsWith('branch ')) {
cur.branch = line
.slice(7)
.trim()
.replace(/^refs\/heads\//, '')
} else if (line === 'detached') {
cur.detached = true
} else if (line === 'bare') {
cur.bare = true
} else if (line.startsWith('locked')) {
cur.locked = true
}
}
if (cur) {
trees.push(cur)
}
return trees
}
async function listWorktrees(repoPath, gitBin) {
let resolved
try {
resolved = resolveRequestedPathForIpc(repoPath, { purpose: 'Worktree list' })
} catch {
return []
}
try {
const out = await runGit(gitBin, ['worktree', 'list', '--porcelain'], resolved)
return parseWorktrees(out).map((tree, index) => ({
path: tree.path,
branch: tree.branch,
isMain: index === 0,
detached: tree.detached,
locked: tree.locked
}))
} catch {
return []
}
}
// A git-ref-safe branch name (spaces → "-", drop forbidden chars, trim edges),
// or "" when nothing usable remains. Mirrors the renderer's `gitRef`, so a bad
// value can't reach `git` no matter the caller (the GUI also enforces live).
function sanitizeBranch(name) {
return String(name || '')
.replace(/\s+/g, '-')
.replace(/[^\w./-]/g, '')
.replace(/-{2,}/g, '-')
.replace(/\/{2,}/g, '/')
.replace(/\.{2,}/g, '.')
.replace(/^[-./]+|[-./]+$/g, '')
}
function slugify(name) {
const slug = String(name || '')
.trim()
.toLowerCase()
.replace(/[^a-z0-9]+/g, '-')
.replace(/^-+|-+$/g, '')
.slice(0, 40)
.replace(/-+$/g, '')
return slug || 'work'
}
const TRUNK_BRANCHES = ['main', 'master']
async function gitLine(gitBin, args, cwd) {
try {
return (await runGit(gitBin, args, cwd)).trim()
} catch {
return ''
}
}
async function defaultBranch(gitBin, cwd) {
const remote = (
await gitLine(gitBin, ['symbolic-ref', '--quiet', '--short', 'refs/remotes/origin/HEAD'], cwd)
).replace(/^origin\//, '')
if (remote) {
return remote
}
const configured = await gitLine(gitBin, ['config', '--get', 'init.defaultBranch'], cwd)
if (configured) {
return configured
}
for (const branch of TRUNK_BRANCHES) {
if (await gitLine(gitBin, ['show-ref', '--verify', `refs/heads/${branch}`], cwd)) {
return branch
}
}
return ''
}
// A brand-new project folder isn't a git repo — and a freshly-init'd one has no
// commit to branch from — so `git worktree add` would fail. Make the dir a repo
// with a root commit on the user's behalf so worktrees "just work". No-op for a
// repo that already has commits; never touches the user's files (the seed commit
// is `--allow-empty`), and never inits a dir that already lives inside a repo.
async function ensureGitRepo(gitBin, dir) {
let needsRoot = false
try {
const inside = (await runGit(gitBin, ['rev-parse', '--is-inside-work-tree'], dir)).trim()
if (inside !== 'true') {
await runGit(gitBin, ['init'], dir)
needsRoot = true
} else {
// Repo exists; a worktree still needs a HEAD to branch from.
try {
await runGit(gitBin, ['rev-parse', '--verify', 'HEAD'], dir)
} catch {
needsRoot = true
}
}
} catch {
await runGit(gitBin, ['init'], dir)
needsRoot = true
}
if (needsRoot) {
// Inline identity so the seed commit lands even with no global git config.
await runGit(
gitBin,
[
'-c',
'user.email=hermes@localhost',
'-c',
'user.name=Hermes',
'commit',
'--allow-empty',
'-m',
'Initial commit'
],
dir
)
}
}
// Resolve the repo's MAIN worktree root, so `.worktrees/` always nests under the
// primary checkout even when called from a linked worktree.
async function mainRoot(gitBin, cwd) {
const list = await listWorktrees(cwd, gitBin)
const main = list.find(tree => tree.isMain)
return main ? main.path : cwd
}
function uniqueDir(base) {
let dir = base
let n = 1
while (fs.existsSync(dir)) {
n += 1
dir = `${base}-${n}`
}
return dir
}
async function addExistingBranchWorktree(gitBin, root, name) {
const branch = sanitizeBranch(name)
if (!branch) {
throw new Error('Branch name is required.')
}
if (branch === (await defaultBranch(gitBin, root))) {
await runGit(gitBin, ['switch', branch], root)
return { path: root, branch, repoRoot: root }
}
const dir = uniqueDir(path.join(root, '.worktrees', slugify(branch)))
await runGit(gitBin, ['worktree', 'add', dir, branch], root)
return { path: dir, branch, repoRoot: root }
}
async function addWorktree(repoPath, options, gitBin) {
const resolved = resolveRequestedPathForIpc(repoPath, { purpose: 'Worktree add' })
// A new project's folder may not be a git repo yet — init it (with a root
// commit) so the worktree has something to branch from.
await ensureGitRepo(gitBin, resolved)
const root = await mainRoot(gitBin, resolved)
const opts = options || {}
if (opts.existingBranch) {
return addExistingBranchWorktree(gitBin, root, opts.existingBranch)
}
const slug = slugify(opts.name || `work-${Date.now().toString(36)}`)
const branch = sanitizeBranch(opts.branch) || `hermes/${slug}`
const dir = uniqueDir(path.join(root, '.worktrees', slug))
const args = ['worktree', 'add', '-b', branch, dir]
if (opts.base) {
args.push(String(opts.base))
}
try {
await runGit(gitBin, args, root)
} catch (err) {
// Branch name may already exist — retry checking out the existing branch
// into a fresh worktree dir instead of failing the whole flow.
if (/already exists/i.test(err.stderr || '')) {
await runGit(gitBin, ['worktree', 'add', dir, branch], root)
} else {
throw err
}
}
return { path: dir, branch, repoRoot: root }
}
async function removeWorktree(repoPath, worktreePath, options, gitBin) {
const resolvedRepo = resolveRequestedPathForIpc(repoPath, { purpose: 'Worktree remove (repo)' })
const resolvedTree = resolveRequestedPathForIpc(worktreePath, { purpose: 'Worktree remove (tree)' })
const root = await mainRoot(gitBin, resolvedRepo)
const args = ['worktree', 'remove']
if (options && options.force) {
args.push('--force')
}
args.push(resolvedTree)
await runGit(gitBin, args, root)
return { removed: resolvedTree }
}
// List local branches for the "convert a branch into a worktree" picker, most
// recently committed first. Each carries whether it's already checked out in a
// worktree and, when checked out, that worktree's path. Empty on a non-repo /
// remote backend where the probe can't run.
async function listBranches(repoPath, gitBin) {
let resolved
try {
resolved = resolveRequestedPathForIpc(repoPath, { purpose: 'Branch list' })
} catch {
return []
}
try {
const out = await runGit(
gitBin,
['for-each-ref', '--format=%(refname:short)', '--sort=-committerdate', 'refs/heads'],
resolved
)
const trees = await listWorktrees(resolved, gitBin)
const pathByBranch = new Map(trees.filter(tree => tree.branch).map(tree => [tree.branch, tree.path]))
const trunk = await defaultBranch(gitBin, resolved)
return out
.split('\n')
.map(line => line.trim())
.filter(Boolean)
.map(name => ({
name,
checkedOut: pathByBranch.has(name),
isDefault: Boolean(trunk && name === trunk),
worktreePath: pathByBranch.get(name) || null
}))
} catch {
return []
}
}
async function switchBranch(repoPath, branch, gitBin) {
const resolved = resolveRequestedPathForIpc(repoPath, { purpose: 'Branch switch' })
const target = sanitizeBranch(branch)
if (!target) {
throw new Error('Branch name is required.')
}
await runGit(gitBin, ['switch', target], resolved)
return { branch: target }
}
module.exports = {
addWorktree,
ensureGitRepo,
listBranches,
listWorktrees,
parseWorktrees,
removeWorktree,
sanitizeBranch,
switchBranch
}

View File

@@ -0,0 +1,214 @@
'use strict'
const assert = require('node:assert/strict')
const { execFileSync } = require('node:child_process')
const fs = require('node:fs')
const os = require('node:os')
const path = require('node:path')
const test = require('node:test')
const {
addWorktree,
ensureGitRepo,
listBranches,
parseWorktrees,
sanitizeBranch,
switchBranch
} = require('./git-worktree-ops.cjs')
test('sanitizeBranch: spaces → hyphens, forbidden chars dropped, edges trimmed', () => {
assert.equal(sanitizeBranch('beach vibes'), 'beach-vibes')
assert.equal(sanitizeBranch('feat/cool thing'), 'feat/cool-thing')
assert.equal(sanitizeBranch(' wip~^:? '), 'wip')
assert.equal(sanitizeBranch('///'), '')
})
test('parseWorktrees: main checkout + linked worktree', () => {
const out = [
'worktree /repo',
'HEAD abc123',
'branch refs/heads/main',
'',
'worktree /repo/.worktrees/feat',
'HEAD def456',
'branch refs/heads/hermes/feat',
''
].join('\n')
const trees = parseWorktrees(out)
assert.equal(trees.length, 2)
assert.equal(trees[0].path, '/repo')
assert.equal(trees[0].branch, 'main')
assert.equal(trees[1].path, '/repo/.worktrees/feat')
assert.equal(trees[1].branch, 'hermes/feat')
})
test('parseWorktrees: detached + locked flags', () => {
const out = ['worktree /repo/wt', 'HEAD abc', 'detached', 'locked reason', ''].join('\n')
const trees = parseWorktrees(out)
assert.equal(trees.length, 1)
assert.equal(trees[0].detached, true)
assert.equal(trees[0].locked, true)
assert.equal(trees[0].branch, null)
})
test('parseWorktrees: empty input', () => {
assert.deepEqual(parseWorktrees(''), [])
})
test('ensureGitRepo: inits a plain dir with a root commit so worktrees branch', async () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-wt-'))
const git = (...args) => execFileSync('git', args, { cwd: dir }).toString().trim()
try {
await ensureGitRepo('git', dir)
assert.match(git('rev-parse', '--verify', 'HEAD'), /^[0-9a-f]{7,}$/)
// The whole point: a worktree can now branch off the seeded root commit.
execFileSync('git', ['worktree', 'add', '-b', 'wt', path.join(dir, '.worktrees', 'wt')], { cwd: dir })
assert.ok(fs.existsSync(path.join(dir, '.worktrees', 'wt')))
// Idempotent: an already-committed repo gets no extra commit.
await ensureGitRepo('git', dir)
assert.equal(git('rev-list', '--count', 'HEAD'), '1')
} finally {
fs.rmSync(dir, { recursive: true, force: true })
}
})
test('switchBranch: switches a normal checkout branch', async () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-switch-'))
const git = (...args) => execFileSync('git', args, { cwd: dir }).toString().trim()
try {
await ensureGitRepo('git', dir)
execFileSync('git', ['branch', 'feature'], { cwd: dir })
await switchBranch(dir, 'feature', 'git')
assert.equal(git('branch', '--show-current'), 'feature')
} finally {
fs.rmSync(dir, { recursive: true, force: true })
}
})
test('listBranches: lists locals and flags the checked-out branch', async () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-branches-'))
try {
await ensureGitRepo('git', dir)
const current = execFileSync('git', ['branch', '--show-current'], { cwd: dir }).toString().trim()
execFileSync('git', ['branch', 'feature'], { cwd: dir })
const branches = await listBranches(dir, 'git')
const names = branches.map(b => b.name).sort()
assert.deepEqual(names, [current, 'feature'].sort())
// The repo's own checkout is flagged; the unused branch is convertible.
assert.equal(branches.find(b => b.name === current).checkedOut, true)
assert.equal(branches.find(b => b.name === current).isDefault, true)
assert.equal(fs.realpathSync(branches.find(b => b.name === current).worktreePath), fs.realpathSync(dir))
assert.equal(branches.find(b => b.name === 'feature').checkedOut, false)
assert.equal(branches.find(b => b.name === 'feature').isDefault, false)
assert.equal(branches.find(b => b.name === 'feature').worktreePath, null)
} finally {
fs.rmSync(dir, { recursive: true, force: true })
}
})
test('listBranches: flags a free default branch as default, not checked out', async () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-branches-default-'))
const git = (...args) => execFileSync('git', args, { cwd: dir }).toString().trim()
try {
await ensureGitRepo('git', dir)
const trunk = git('branch', '--show-current')
execFileSync('git', ['switch', '-c', 'rawr'], { cwd: dir })
const branches = await listBranches(dir, 'git')
const defaultBranch = branches.find(b => b.name === trunk)
assert.equal(defaultBranch.checkedOut, false)
assert.equal(defaultBranch.isDefault, true)
assert.equal(defaultBranch.worktreePath, null)
} finally {
fs.rmSync(dir, { recursive: true, force: true })
}
})
test('listBranches: a branch claimed by a worktree is flagged checked out', async () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-branches-wt-'))
try {
await ensureGitRepo('git', dir)
execFileSync('git', ['branch', 'feature'], { cwd: dir })
// addWorktree converts the existing "feature" branch into a worktree.
const result = await addWorktree(dir, { existingBranch: 'feature' }, 'git')
assert.equal(result.branch, 'feature')
assert.ok(fs.existsSync(result.path))
const branches = await listBranches(dir, 'git')
assert.equal(branches.find(b => b.name === 'feature').checkedOut, true)
} finally {
fs.rmSync(dir, { recursive: true, force: true })
}
})
test('listBranches: empty on a non-repo path', async () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-nonrepo-'))
try {
assert.deepEqual(await listBranches(dir, 'git'), [])
} finally {
fs.rmSync(dir, { recursive: true, force: true })
}
})
test('addWorktree: existingBranch checks the branch out without a new branch', async () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-convert-'))
const git = (...args) => execFileSync('git', args, { cwd: dir }).toString().trim()
try {
await ensureGitRepo('git', dir)
execFileSync('git', ['branch', 'cool/feature'], { cwd: dir })
const before = git('branch', '--list').split('\n').length
const result = await addWorktree(dir, { existingBranch: 'cool/feature' }, 'git')
// No new branch was created — only the existing one is checked out.
assert.equal(git('branch', '--list').split('\n').length, before)
assert.equal(result.branch, 'cool/feature')
// Dir is named off the branch slug, nested under the main repo's .worktrees.
assert.match(result.path, /[/\\]\.worktrees[/\\]cool-feature/)
assert.equal(
execFileSync('git', ['branch', '--show-current'], { cwd: result.path }).toString().trim(),
'cool/feature'
)
} finally {
fs.rmSync(dir, { recursive: true, force: true })
}
})
test('addWorktree: existing default branch switches the main checkout, not .worktrees/main', async () => {
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'hermes-convert-default-'))
const git = (...args) => execFileSync('git', args, { cwd: dir }).toString().trim()
try {
await ensureGitRepo('git', dir)
const trunk = git('branch', '--show-current')
execFileSync('git', ['switch', '-c', 'rawr'], { cwd: dir })
const result = await addWorktree(dir, { existingBranch: trunk }, 'git')
assert.equal(result.branch, trunk)
assert.equal(fs.realpathSync(result.path), fs.realpathSync(dir))
assert.equal(git('branch', '--show-current'), trunk)
assert.equal(fs.existsSync(path.join(dir, '.worktrees', trunk)), false)
} finally {
fs.rmSync(dir, { recursive: true, force: true })
}
})

View File

@@ -1,174 +0,0 @@
'use strict'
// Resolve git-worktree relationships for a set of session cwds, reading git's
// on-disk metadata directly (no `git` spawn per path):
//
// - A normal checkout has a `.git` DIRECTORY at its root → it's the main
// worktree; its repo root IS that directory's parent.
// - A linked worktree has a `.git` FILE: `gitdir: <repo>/.git/worktrees/<name>`.
// That admin dir's `commondir` points back at the shared `<repo>/.git`, whose
// parent is the main repo root.
//
// Grouping by repoRoot therefore clusters a repo's main checkout with all of its
// linked worktrees, regardless of how the worktree directories are named. The
// branch (read from the worktree's own HEAD) gives each worktree a meaningful
// label.
const fs = require('node:fs')
const path = require('node:path')
const { resolveRequestedPathForIpc } = require('./hardening.cjs')
// Walk up from `start` to the nearest ancestor that carries a `.git` entry
// (file for a linked worktree, dir for the main checkout). Capped so a stray
// path can't loop forever.
function findGitHost(start, fsImpl) {
let dir = start
for (let i = 0; i < 64; i += 1) {
const dotgit = path.join(dir, '.git')
try {
if (fsImpl.existsSync(dotgit)) {
return dir
}
} catch {
return null
}
const parent = path.dirname(dir)
if (parent === dir) {
return null
}
dir = parent
}
return null
}
function readBranch(gitDir, fsImpl) {
try {
const head = fsImpl.readFileSync(path.join(gitDir, 'HEAD'), 'utf8').trim()
const ref = head.match(/^ref:\s*refs\/heads\/(.+)$/)
if (ref) {
return ref[1]
}
// Detached HEAD: surface a short sha so the worktree still gets a label.
return /^[0-9a-f]{7,40}$/i.test(head) ? head.slice(0, 8) : null
} catch {
return null
}
}
// Given the directory that owns the `.git` entry, resolve its worktree identity.
function resolveFromHost(host, fsImpl) {
const dotgit = path.join(host, '.git')
let stat
try {
stat = fsImpl.statSync(dotgit)
} catch {
return null
}
if (stat.isDirectory()) {
return {
repoRoot: host,
worktreeRoot: host,
isMainWorktree: true,
branch: readBranch(dotgit, fsImpl)
}
}
// Linked worktree: `.git` is a file pointing at the admin dir.
let contents
try {
contents = fsImpl.readFileSync(dotgit, 'utf8').trim()
} catch {
return null
}
const match = contents.match(/^gitdir:\s*(.+)$/m)
if (!match) {
return null
}
const adminDir = path.resolve(host, match[1].trim())
// `commondir` resolves to the shared `<repo>/.git`; fall back to walking two
// levels up from `<repo>/.git/worktrees/<name>` if it's missing.
let commonDir
try {
const rel = fsImpl.readFileSync(path.join(adminDir, 'commondir'), 'utf8').trim()
commonDir = path.resolve(adminDir, rel)
} catch {
commonDir = path.dirname(path.dirname(adminDir))
}
return {
repoRoot: path.dirname(commonDir),
worktreeRoot: host,
isMainWorktree: false,
branch: readBranch(adminDir, fsImpl)
}
}
function resolveWorktree(startPath, fsImpl = fs) {
let resolved
try {
resolved = resolveRequestedPathForIpc(startPath, { purpose: 'Worktree lookup' })
} catch {
return null
}
let start = resolved
try {
const stat = fsImpl.statSync(resolved)
if (!stat.isDirectory()) {
start = path.dirname(resolved)
}
} catch {
return null
}
const host = findGitHost(start, fsImpl)
if (!host) {
return null
}
return resolveFromHost(host, fsImpl)
}
// Batch entry point for the renderer: maps each requested cwd to its worktree
// info (or null when it isn't inside a git checkout / can't be read). Dedupes so
// many sessions sharing a cwd cost one lookup.
async function worktreesForIpc(cwds, options = {}) {
const fsImpl = options.fs || fs
const list = Array.isArray(cwds) ? cwds : []
const out = {}
for (const cwd of list) {
if (typeof cwd !== 'string' || !cwd.trim() || cwd in out) {
continue
}
out[cwd] = resolveWorktree(cwd, fsImpl)
}
return out
}
module.exports = {
resolveWorktree,
worktreesForIpc
}

Some files were not shown because too many files have changed in this diff Show More