The messaging gateway logs every inbound message to gateway.log, but its
dashboard/TUI twin (gui.log) was nearly silent: the dashboard FastAPI app had
host-header/auth-gate/auth middlewares but no access log, and 3 of 4 WebSocket
endpoints logged nothing at all. Worst of these, /api/pty logged 'pty accepted'
on connect but was completely silent on close — a PTY EOF (backend crash), a
send failure, or a client drop left no trace, so user-reported 'chat
disconnected / TUI froze' was unreproducible.
Extend the convention tui_gateway/ws.py::handle_ws already establishes (a
structured 'ws closed peer=... reason=... <counters>' line) across the whole
surface, at the same INFO granularity as gateway.log, into the gui.log that is
already sized for it (10MB x5):
- HTTP access-log middleware (registered LIFO-outermost so it captures the
final status, including 400/401 from the middlewares above): one INFO line
per request with method, path, status, latency, request id, peer. Path only,
never the query string (tokens ride in query on some routes). UA/referer at
DEBUG (-v). Reads/echoes X-Request-ID for client/proxy correlation.
- /api/pty: structured close line covering all exit paths
(client_disconnect | pty_eof | send_failed | error) with duration and
bytes_in/out counters.
- /api/pub + /api/events: accept + structured close (reason/duration/frames)
+ all reject paths.
- /api/ws: reject paths logged; request id threaded into handle_ws and stamped
on its accept/close lines so a WS session correlates with the HTTP upgrade.
Metadata only — no request/response bodies, no WS frame payloads, no
headers/cookies on the INFO lines. Opt-in body capture is a separate change so
this stays clear of the debug-share privacy surface.
handle_ws gains an optional rid=None arg, backward-compatible with the stdio
entry-point (tui_gateway.entry) which calls handle_ws(ws).
Tests (behavior-contract style, not frozen strings): HTTP access line shape +
query-string redaction + 401-still-logged + X-Request-ID round-trip; WS
accept/close lines for /api/pub and /api/events; WS reject logging; rid
propagation through handle_ws.
TMUX is not forwarded over SSH, so a TUI launched on a remote host from
inside local tmux only sees TERM=tmux/tmux-256color with no TMUX var --
the cursor-drift bug still applies there. Extend supportsFastEchoTerminal()
to also fall back when TERM is tmux-flavored.
Deliberately scoped to tmux* only, NOT screen*: GNU screen sets the same
screen/screen-256color TERM and has no reported drift, so widening to
screen would disable the optimization for those users with no evidence of
a bug (matching the original PR's stated out-of-scope note).
Adds tests for tmux-flavored TERM (disabled) and screen/xterm TERM
(stays enabled) to guard against accidental widening.
When /goal (and other _PENDING_INPUT_COMMANDS: retry, queue, q, steer,
plan, undo) were typed in the TUI desktop app, slash.exec returned error
4018 instructing the frontend to fall back to command.dispatch. Some
clients failed that client-side fallback, leaving the command empty and
surfacing "empty command" — the user's typed text was silently dropped.
slash.exec now routes pending-input commands to command.dispatch
internally, eliminating the fragile client-side fallback hop. The
response is exactly what command.dispatch would have produced, so the
TUI client behaves identically once the round-trip succeeds.
Salvaged from #48944 — rebased onto current main. The original PR's
source change and test_goal_command.py update are correct, but it missed
the second test surface: tests/tui_gateway/test_protocol.py's
parametrized test_slash_exec_rejects_pending_input_commands still
asserted the old 4018 rejection for retry/queue/q/steer/plan, turning CI
red (5 failures). That test is rewritten here as a behavior contract:
slash.exec for a pending-input command must yield the same payload as a
direct command.dispatch call, and must no longer emit the old
"pending-input command" fallback rejection.
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
`hermes backup` walked every file under HERMES_HOME, excluding only
hermes-agent / node_modules / __pycache__ / backups / checkpoints. Python
dependency trees (plugin and MCP-server venvs, site-packages) and pip/uv
tool caches that live under HERMES_HOME were swept in file-by-file,
ballooning a backup to hundreds of thousands of entries that crawl for
hours — the reported "backup stuck for days / 426543 files" symptom.
Add the canonical regeneratable-dir names (.venv, venv, site-packages,
.tox, .nox, .pytest_cache, .mypy_cache, .ruff_cache — mirroring
agent.skill_utils.EXCLUDED_SKILL_DIRS) plus .cache to the backup's
exclusion set, used by both run_backup and the pre-update/pre-migration
_write_full_zip_backup. .archive is intentionally left in so the curator's
restorable archived skills still get backed up.
Tests cover each new dir name (excluded at any depth), that .archive and
cache-resembling files are kept, and an integration check that a planted
venv/site-packages/cache is pruned from the actual backup zip while
skills/config survive.
The batch tool_status values ('completed'/'error'/'pending') and the inbound
status alias sets were inline magic strings, duplicated across two checks in
_tool_result_status. Hoist them to module-level constants
(_TOOL_STATUS_* + _TOOL_STATUS_{ERROR,COMPLETED}_ALIASES) so the canonical
wire values and the alias->canonical mapping live in one place. Emitted
values are unchanged.
_messages_to_openviking_batch's pre-scan already parses and caches each
tool call's arguments into tool_calls_by_id. The pending-tool-call branch
re-parsed them via _tool_call_input(), a second parse and a second source
of truth. Reuse the cached tool_input when the id was cached (non-empty),
falling back to a parse only for the uncached empty-id case so arguments
are never dropped. No behavior change.
_OPENVIKING_RECALL_TOOL_NAMES hardcoded the three read-tool names as string
literals, which can silently desync from the *_SCHEMA["name"] constants on a
rename (the same drift the adjacent _CATEGORY_SUBDIR_MAP comment warns about).
Derive the set from SEARCH/READ/BROWSE_SCHEMA["name"] instead. Write tools
(viking_remember / viking_add_resource) remain intentionally excluded. Set
contents are unchanged.
The npm workspace pins a single npmDepsHash for fetchNpmDeps. Any change to
package-lock.json that doesn't also refresh that hash breaks the bundled
hermes-tui / hermes-desktop-renderer build for Nix flake consumers, and no
nix CI catches it — the workflow that ran fix-lockfiles was removed in
9eb0bcd6 ("change(ci): rip out nix ci for now").
Fetch the workspace deps with pkgs.importNpmLock instead. It resolves each
package from the lockfile's own integrity hashes, so package-lock.json is the
single source of truth and there is no separate hash to drift.
This also removes:
- the fix-lockfiles checker/refresher and its devShell wiring — it existed
only to keep npmDepsHash in sync, so it is dead once the hash is gone, and
its sole CI consumer was already removed in 9eb0bcd6;
- the patchPhase that normalized lockfile trailing newlines — importNpmLock's
npmConfigHook overwrites the lockfile rather than diffing it, so the
normalization is unnecessary.
npm-lockfile-fix is retained: importNpmLock requires an integrity-complete
lockfile, which that tool guarantees when the lockfile is regenerated.
Co-authored-by: ak2k <19240940+ak2k@users.noreply.github.com>
Two follow-up fixes on top of the cherry-picked structured-sync work:
- _messages_to_openviking_batch only added a recall tool result's id to
skipped_tool_ids when the id was non-empty. An empty tool_call_id (which
the canonical transcript can carry; agent_runtime_helpers defaults it to
"") poisoned the skip set with "", silently dropping any *other* tool
result that also lacked an id. Move the recall-skip add inside the
existing `if tool_id:` guard. Adds a regression test (mutation-checked:
fails on pre-fix code, passes after).
- _sync_trace_enabled() open-coded the canonical truthy-env check; reuse
utils.env_var_enabled (byte-identical {1,true,yes,on} semantics).
* feat(skills): add html-artifact skill, fold in sketch + architecture-diagram + concept-diagrams
Adds a unified `html-artifact` creative skill that produces self-contained,
single-file HTML artifacts — concept explainers, implementation plans,
status/incident reports, code-review walkthroughs, technical + educational
SVG diagrams, multi-variant design comparisons, and throwaway editors that
export their state back to the clipboard. Grounded in Anthropic's
html-effectiveness gallery (MIT); the house style (token block, serif/sans/
mono split, hand-rolled diffs, inline-SVG diagrams, graceful degradation) is
distilled from reading all 20 reference files.
Supersedes and removes three overlapping skills, folding their unique value in:
- sketch -> the fidelity dial (throwaway vs presentation) + the
multi-variant comparison layouts + the browser-vision
verify loop (references/fidelity-and-verify.md)
- architecture-diagram-> the dark "infra" token variant + double-rect masking +
semantic component palette (references/dark-tech.md,
templates/diagram.html infra mode)
- concept-diagrams -> the 9-ramp educational color system + the concept
archetype library (references/concept-archetypes.md,
the light design system in templates/diagram.html)
Structure:
- SKILL.md (description exactly 60 chars), 6 references, 3 templates
- templates verified by headless-Chrome render + vision inspection
- editor export logic (file://-safe clipboard, Promise-normalized) verified in node
Cross-references updated in claude-design (new disambiguation table row drawing
the design-taste vs information-artifact boundary), design-md, pretext, spike,
and kanban-video-orchestrator. Website skill docs + catalogs regenerated;
stale EN/zh-Hans per-skill pages pruned and i18n cross-refs fixed.
Not folded (intentionally orthogonal): excalidraw (.excalidraw JSON), p5js
(generative canvas), claude-design / popular-web-designs / design-md (visual
design taste / brand vocab / token spec).
* feat(skills): ship html-effectiveness gallery as fetched reference examples
Add scripts/fetch-examples.sh (idempotent clone/pull of Anthropic's MIT
html-effectiveness gallery) + references/examples.md mapping each of the 20
example files to a mode so the agent reads the right worked example. The clone
lands in references/examples/ and is gitignored (it's a 384KB upstream repo,
not vendored). SKILL.md workflow + reference list now point at it; falls back to
the distilled pattern references when offline.
* feat(skills): make reading a gallery example a required authoring step
Reading the matching html-effectiveness example is now workflow step 2 (was an
optional aside in step 3): fetch the gallery, read_file the file for your mode,
mirror its structure. Models skip optional steps; the examples are the ground
truth, so consulting one is mandatory. Added an 'Example' column to the
mode->build quick-reference table and a 'don't skip the example' pitfall.
Also dogfooded the skill: read 03-code-review-pr.html and 13-flowchart-diagram.html
raw and reconciled the distilled references against source — aligned diff-row tint
opacity to the source's 0.15 (was 0.18) and added the .ctx/.hunk rows in
house-style.md + base.html so they match 03-code-review-pr.html verbatim.
* docs(skills): explain the consolidation + bundled-vs-optional rationale
The supersession note only stated *what* was folded, not *why* the prune is
sound. Expand SKILL.md's intro into a 'Why this skill exists' section: the three
former skills emitted the same artifact and overlapped, so consolidating removes
which-one-do-I-load ambiguity; and the optional->bundled promotion of
concept-diagrams is footprint-safe because this skill has zero deps (only cost is
the 60-char description; everything else is progressive-disclosure). States the
bundling dividing line explicitly: zero install cost + broadly useful gets
bundled, real install cost (hyperframes: Node+FFmpeg+Chromium) stays optional.
Regenerated website per-skill page to match.
Follow-up to the salvaged hosted /exit fix. Instead of a separate 4-env-var
fingerprint (HERMES_TUI_INLINE + /opt/data HERMES_HOME + HERMES_WRITE_SAFE_ROOT
+ HERMES_DISABLE_LAZY_INSTALLS), gate /exit and /quit on the existing
DASHBOARD_TUI_MODE flag (HERMES_TUI_DASHBOARD) that the keyboard idle-exit
(useInputHandlers) and SIGINT-ignore (entry.tsx) paths already use. One hosted
detection mechanism instead of two divergent ones.
Extract the refusal text to an exported DASHBOARD_EXIT_DISABLED_MESSAGE so the
test asserts the same source of truth as production (no change-detector on the
literal). Test mocks only the DASHBOARD_TUI_MODE export via importActual so the
other env exports stay real.
- Drop empty entries before validating SLACK_ALLOWED_USERS so a trailing or
interior comma (which the gateway silently tolerates in
gateway/platforms/slack.py) is no longer rejected at the dashboard.
- Hoist the member-ID regex to a module-level _SLACK_MEMBER_ID_RE constant
and note it stays in sync with the frontend SLACK_MEMBER_ID_RE.
- Add a regression test for the trailing-comma case.
The new SLACK_ALLOWED_USERS validation rejected '*', but the Slack gateway
honors '*' as an allow-all wildcard (gateway/platforms/slack.py DM auth,
slash-confirm, and approval-button paths). Accept '*' as a valid list entry
in both the API validator and the dashboard form so a value the runtime
honors is no longer blocked at setup.
* fix(relay): enable RELAY platform + normalize dial URL so hosted gateways actually connect
Three bugs blocked a self-provisioned hosted gateway from ever establishing its
inbound relay WS (found while standing up the live staging end-to-end). Each
masked the next; all three are needed for inbound to work.
1. RELAY platform never enabled in config.platforms (gateway/config.py).
register_relay_adapter() puts the adapter in the platform_registry, but
start_gateway()'s connect loop iterates self.config.platforms — which never
contained Platform.RELAY. So the adapter was "registered" but never connected
(logs showed "relay adapter registered" then "No messaging platforms
enabled"). Fix: _apply_env_overrides now enables Platform.RELAY (mirroring
relay_url into extra for the connected-checker) when GATEWAY_RELAY_URL (env)
or gateway.relay_url (yaml) is set. Absent -> no RELAY entry (direct/
single-tenant gateways unaffected).
2. URL scheme not converted for the WS dial (gateway/relay/ws_transport.py).
The relay URL is configured once as the http(s):// base (used as-is for the
provision POST), but websockets.connect rejects http(s):// with "scheme isn't
ws or wss". Fix: _ws_dial_url converts https->wss / http->ws.
3. /relay path not appended (same helper). The connector mounts its
WebSocketServer at path "/relay" and returns HTTP 400 on an upgrade to any
other path. GATEWAY_RELAY_URL is the base (no /relay), so the dial hit "/"
-> 400. Fix: _ws_dial_url ensures the path ends in /relay. Idempotent — a URL
already carrying ws(s):// and/or /relay is unchanged, so provision's
_provision_url (which derives /relay/provision from either form) still works.
Why the cross-repo E2E missed #2/#3: the stub connector binds ws://host:port and
its websockets.serve accepts ANY path, so neither the scheme nor the /relay path
was exercised. Real connector needs both.
Verified live on staging hermes-agent-stg-automated-perception-5054: after the
fixes the gateway logs "Connecting to relay..." -> "✓ relay connected" ->
"Gateway running with 1 platform(s)" against
wss://gateway-gateway.staging-nousresearch.com/relay, stable.
Tests: added _ws_dial_url scheme+path+idempotency cases (test_ws_transport.py)
and RELAY-platform-enablement cases for env + yaml + absent (test_config.py).
Full gateway/relay + config suites green (191 passed).
Relay-adapter lane. EXPERIMENTAL.
* fix(relay): re-attach guild_id to outbound so connector egress resolves the tenant
The final bug in the hosted-relay round-trip. Inbound worked end to end (Discord
-> connector -> bus -> agent WS -> agent runs -> reply), but the reply's egress
was declined by the connector: "discord egress declined: target not routed to an
onboarded tenant".
Cause: the connector's routedEgressGuard resolves the owning tenant from the
OUTBOUND action's metadata.guild_id (Discord's routing discriminator). The
gateway's generic delivery path builds outbound metadata via
run.py _thread_metadata_for_source, which only carries thread_id (and returns
None entirely for a non-threaded message) — so guild_id never reached the
connector, tenant resolution failed, and the shared bot refused to post.
Fix (relay-adapter-local, no perturbation of the generic delivery path or other
platforms): RelayAdapter learns chat_id -> guild_id from each inbound event
(_capture_scope) and re-attaches it to the outbound action's metadata in send()
(_with_scope) when not already present. No-op for chats we never saw inbound
(e.g. DMs) and never overwrites an explicit guild_id.
Verified live on staging hermes-agent-stg-automated-perception-5054: an
@mention in #general now produces a visible bot reply — full multi-tenant relay
round-trip (real Discord -> shared connector bot -> tenant routing -> agent WS ->
reply egress -> Discord).
Tests: _capture_scope/_with_scope reattach, no-scope no-op, explicit-guild_id
preserved (test_relay_adapter.py). Full relay + config suites green (160 passed).
Relay-adapter lane. EXPERIMENTAL.
systemctl --user restart hermes-gateway run via the terminal tool is a
child of the gateway itself. When systemd delivers SIGTERM the gateway
kills this subprocess before it can complete, so the service may never
restart — reproducing issue #37453.
The hermes gateway restart/stop guard (hermes_cli/gateway.py) and the
cron-path guard (hermes_cli/cron.py) already block equivalent commands
in their respective paths but the terminal tool had no such defense.
Add a hard-block before command execution in terminal_tool: when
_HERMES_GATEWAY=1 and the command matches _contains_gateway_lifecycle_command,
return an error immediately. force=True cannot bypass it — unlike the
normal dangerous-command approval flow, here even a user-approved restart
would fail because the SIGTERM propagates to child processes.
Also extend _GATEWAY_LIFECYCLE_PATTERNS to match systemctl with flags
(e.g. systemctl --user restart) — the previous regex required the
action word immediately after systemctl with no flags in between.
Adds 9 regression tests: 6 blocked variants (parametrized), force bypass
attempt, safe systemctl passthrough, and guard-inactive-outside-gateway.
* feat(image-gen): add image-to-image / editing to image_generate
Brings image generation to parity with video generation: the unified
image_generate tool now edits/transforms a source image (image-to-image)
when given image_url / reference_image_urls, routing to each backend's
edit endpoint, exactly as video_generate routes to image-to-video.
- ImageGenProvider ABC: generate() gains keyword-only image_url +
reference_image_urls; new capabilities() declares modalities +
max_reference_images (defaults to text-only, backward compatible).
success_response gains a modality field; adds normalize_reference_images.
- image_generate tool: schema exposes image_url + reference_image_urls;
dynamic schema reflects the active model's actual edit capability so the
agent knows when image_url is honored. Handler + plugin dispatch forward
the new inputs; legacy/text-only providers get a clear modality_unsupported
error instead of silently dropping the source image.
- In-tree FAL: 7 models gain edit endpoints (flux-2-klein, flux-2-pro,
nano-banana-pro, gpt-image-1.5, gpt-image-2, ideogram/v3, qwen-image)
with per-model edit_supports whitelists + reference caps; routes to the
/edit endpoint and skips the upscaler for edits.
- Plugins: openai (images.edit, 16 refs), xai (/v1/images/edits via
grok-imagine-image-quality, JSON body per xAI docs), krea
(image_style_references, 10 refs). openai-codex stays text-only and
rejects edits with an actionable error.
- Tests: 15 new (payload, routing, dispatch forwarding, dynamic schema,
capabilities); updated 2 change-detector/lambda tests for the new schema.
- Docs: image-generation feature page, image-gen provider plugin guide,
tools reference.
* fix(image-gen): preserve legacy passthrough in fal/krea plugin tests
Two existing plugin tests asserted pre-image-to-image behavior:
- fal: forward image_url/reference_image_urls only when supplied, so a
text-to-image delegation stays byte-identical (no None kwargs).
- krea: keep dict-shaped image_style_references refs verbatim (the unified
string refs go through normalize_reference_images; legacy non-string ref
objects pass through unchanged) — fixes KeyError when callers pass the
richer Krea ref-object shape.
* fix(image-gen): clearer not-capable message for text-to-image-only models
When a text-to-image-only model (incl. gpt-image-2 on the Codex OAuth path,
which can't do editing through the Responses image_generation tool) gets a
source image, say 'this model is not capable of image-to-image / editing —
provide a text-only prompt' rather than sending the user shopping for other
backends. Applies to the openai-codex guard, the in-tree FAL no-edit-endpoint
error, and the dynamic tool-schema text-only line.
The desktop model picker had no way to force a fresh model fetch: model.options
went through the 1h-cached provider_models_cache.json, and there was no flag to
bust it. When a provider's cached list expired and its next live fetch failed,
the picker fell back to the curated static list — silently dropping live-only
models (e.g. OpenCode Zen's free tier like deepseek-v4-flash-free) the user had
been using.
- Thread refresh through model.options (RPC + REST /api/model/options) ->
build_models_payload -> list_authenticated_providers, which calls
clear_provider_models_cache() up front when set so every row re-fetches live.
- Add a 'Refresh Models' control to the desktop picker (5-locale i18n, spinning
sync icon). Normal opens leave refresh=false to stay snappy on the cache.
Verified: stale cache hides deepseek-v4-flash-free -> refresh busts it -> live
re-fetch surfaces it. refresh=false never touches the cache.
* fix(dashboard): resolve chat TUI argv off event loop
Dashboard chat now resolves its TUI launch command off the
FastAPI/WebSocket event loop. The resolver can run `npm install` /
`npm run build` through `_make_tui_argv()`, and doing that synchronously
in `/api/pty` can block proxy keepalives and other dashboard WebSocket
work long enough for reverse-proxy deployments to drop the chat
connection.
This keeps the current TUI build policy intact: normal production
launches still run the correctness-first `npm run build` path, while
`HERMES_TUI_DIR` remains the prebuilt/no-build path for distros and
containers. The change only moves the potentially slow resolver work to
a worker thread for the dashboard chat path, serialized by an
`asyncio.Lock` so concurrent chat tabs preserve one-build-at-a-time
behavior. `SystemExit` (node/npm missing) and the profile `HTTPException`
path still propagate cleanly through `asyncio.to_thread()`.
Salvaged from #26124 — rebased onto current main. The async wrapper now
threads the `profile` parameter that `_resolve_chat_argv` gained on main
since the PR was opened, so cross-profile chat is preserved.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
* chore: add 0xdany to AUTHOR_MAP
* fix(dashboard): bind chat-argv lock to app.state; cover error propagation
Self-review hardening on top of the salvaged fix:
- Move `_chat_argv_lock` from a module-level `asyncio.Lock()` onto
`app.state` (initialised in `_lifespan`, lazy fallback via
`_get_chat_argv_lock`), mirroring `event_lock`. A module-level
`asyncio.Lock()` binds to whatever event loop is active at import time,
which is the exact pattern `_get_event_state`'s docstring warns against
(breaks across TestClient instances / uvicorn reloads). This keeps the
lock on the running loop.
- Add two tests exercising the real `_resolve_chat_argv_async` →
`asyncio.to_thread` → lock → re-raise chain: `SystemExit` (node/npm
missing) and `HTTPException` (invalid profile) both propagate out of the
worker thread and are caught by `pty_ws`'s existing handlers. The prior
tests mocked `asyncio.to_thread` away and never covered this path.
* test(dashboard): dedupe pty error-propagation tests; assert close code
simplify-code cleanup pass on the salvage stack:
- Extract the shared scaffolding of the two pty_ws error-propagation tests
into `_assert_pty_propagates`, keeping the two tests as distinct contracts
for the `except SystemExit` and `except HTTPException` arms.
- Assert the stable WebSocket close code (1011) instead of relying solely on
the user-facing "Chat unavailable" notice wording — a behavior contract per
the AGENTS.md "behavior contracts over snapshots" rule, robust to notice
rewording. The detail substring ("unknown profile") is still checked for the
HTTPException case since proving the detail survives the thread hop is the
point of that test.
No production-code change; the helper exercises the same real
_resolve_chat_argv_async -> asyncio.to_thread -> lock -> re-raise chain.
---------
Co-authored-by: draihan <draihan@student.ubc.ca>
git worktree lock at creation and unlock before removal. A locked
worktree refuses 'git worktree remove' (and prune), so a second hermes
process or a stray cleanup can't silently delete an in-use isolated
worktree. Fail-soft on both paths — a lock/unlock error never blocks
the session or cleanup.
Salvaged from #47029 (Issue #46303). Unlock moved to the actual-removal
path so a preserved (unpushed-commits) worktree stays locked while in use.
When SessionDB init fails, the CLI/Desktop previously continued live with only
a buried log line. The chat looks healthy, but the transcript is never written
to state.db — so resume later shows a truncated or empty session and the user
only discovers the loss after the fact (#41386).
Emit a prominent stderr banner at startup when the store is unavailable, making
it explicit that the conversation will not be saved and cannot be resumed, with
a pointer to fix the store. Also set _session_db_unavailable so downstream code
can detect the degraded state.
- Scope 'no such tokenizer' matcher to trigram specifically (#779)
- Decouple base FTS and trigram backfill in v11 migration (#1195)
- CJK search falls back to LIKE when trigram unavailable (#3384/#3430)
- Add _trigram_available tracking across init, migration, and startup
- Add regression tests for migration backfill and CJK LIKE fallback
- Add _is_trigram_unavailable_error and _warn_trigram_unavailable helpers
_is_fts5_unavailable_error only matched 'no such module: fts5', but
SQLite builds that ship FTS5 without the optional trigram tokenizer
raise 'no such tokenizer: trigram' instead. This caused SessionDB init
to crash on those builds.
Additionally, the trigram failure path called _warn_fts5_unavailable
which set _fts_enabled = False, globally disabling full-text search
even though the base FTS5 table was created successfully.
Fix:
- Extend _is_fts5_unavailable_error to also match 'no such tokenizer'
- Add _is_tokenizer_unavailable_error to distinguish tokenizer-specific
failures from whole-module absence
- Only call _warn_fts5_unavailable for module-level failures; skip it
for tokenizer-specific failures so base FTS5 remains usable
Fixes#47002
self_provision_if_managed() gated on is_managed(), but is_managed() means
"NixOS/package-manager-managed" (it keys on HERMES_MANAGED or a ~/.hermes/.managed
marker) — NOT "NAS-hosted". A NAS-provisioned Fly agent sets NEITHER, so the gate
was always False and relay self-provision SILENTLY no-oped on exactly the hosted
agents it was built for. Caught live: a staging agent with GATEWAY_RELAY_URL
correctly stamped logged "No messaging platforms enabled" and never dialed the
connector; HERMES_MANAGED was unset on the machine. The unit tests had mocked
is_managed()->True, so they passed while the real trigger never fired (mocked-
trigger blind spot).
Fix: drop the is_managed() gate and rename self_provision_if_managed ->
self_provision_relay. The real trigger is now "relay_url() set + no pinned secret
+ a resolvable NAS token", which is both NAS-independent and self-guarding:
- NAS-hosted agent: GATEWAY_RELAY_URL + no pinned secret + bootstrapped NAS
token -> self-provisions.
- Self-hosted + `hermes gateway enroll`: pinned GATEWAY_RELAY_SECRET -> skipped
(existing secret-present guard).
- Self-hosted, unenrolled, no NAS identity: resolve_nous_access_token() fails
-> graceful no-op (existing fail-soft path).
Security: unchanged trust model. The connector still derives tenant from the
validated NAS token; this only broadens WHEN the provision attempt fires, and
every broadened case is still guarded by token-resolution + pinned-secret-skip.
Tests: replaced the (wrong) "skips when not managed" test with a regression test
proving a NAS host where is_managed()==False STILL provisions; renamed all call
sites; added a "no NAS token -> non-fatal skip" test for the self-hosted branch.
88 relay tests pass.
Relay-adapter lane. EXPERIMENTAL.
The connector now delivers inbound (messages + interrupts) over the gateway's
OUTBOUND /relay WebSocket, not a signed HTTP POST to an inbound endpoint. The
gateway needs no inbound HTTP port — which is what makes hosted gateways (no
public IP) able to receive inbound at all.
- gateway/relay/adapter.py: connect() wires set_interrupt_inbound_handler(
self.on_interrupt) so connector->gateway interrupt_inbound frames bridge into
the existing per-session interrupt path (the inbound message handler was
already wired). Removed _maybe_start_inbound_receiver() + the _inbound_runner
lifecycle — there is no HTTP receiver anymore.
- gateway/relay/inbound_receiver.py: deleted (the signed-HTTP InboundDelivery
receiver).
- gateway/relay/__init__.py: removed relay_inbound_config() (dead with the
receiver gone). The delivery key is still set in-process by self-provision for
forward-compat but is no longer consumed for inbound.
- docs/relay-connector-contract.md: §3 rewritten — inbound is the WS back-channel
routed cross-instance via the connector's relay bus; §5 interrupt + §6 auth
table updated; the old signed-HTTP-POST + per-tenant-delivery-key-signing path
is documented as superseded. gatewayEndpoint noted as passthrough-plane only.
Tests: stub_connector grows set_interrupt_inbound_handler + push_interrupt;
new test_relay_interrupt case proves connect() wires BOTH inbound handlers and an
interrupt_inbound frame over the WS cancels the right session. Removed the
HTTP-receiver test; updated the crypto-shedding scan + self-provision delivery-key
assertion. 88 relay tests pass.
EXPERIMENTAL. Pairs with gateway-gateway (relay bus + WsGatewayDelivery) and the
NAS GATEWAY_RELAY_URL stamp. The cross-repo E2E (connector repo) proves the full
multi-instance path against this production adapter code.
* fix(desktop): show Hindsight memory provider
* feat(desktop): configure Hindsight memory provider
* fix(desktop): limit Hindsight modes to supported setup
* refactor(desktop): generic memory-provider config surface
Replace the bespoke Hindsight settings surface with a declarative,
schema-driven path so adding a memory provider is pure declaration —
no per-provider page, conditional, or endpoint.
- memory_providers.py: declarative registry. Each provider lists its
fields {key, label, kind, default, options, secret-vs-plain}. Hindsight's
mode is a select(cloud, local_external), so rejecting local_embedded
falls out of generic enum validation instead of a hand-written check.
- One generic endpoint pair GET/PUT /api/memory/providers/{name}/config.
GET returns declared fields + current values (secrets only as is_set,
never read back); PUT validates selects against their options, writes
plain fields to the provider config file, secrets to the env store,
and flips memory.provider.
- ProviderConfigPanel renders straight from the schema, replacing
hindsight-settings.tsx and the memory.provider === 'hindsight'
conditional in config-settings.tsx — same pattern as
toolset-config-panel.tsx off env_vars.
Scoped to memory providers; storage layout is unchanged so the runtime
Hindsight plugin reads the same config.json / HINDSIGHT_API_KEY / provider
keys as before. Tests cover the registry, endpoint behavior (defaults,
write+secret, select rejection, unknown provider, secret-never-returned),
and the generic panel.
Adds a 'Customizing platform hints' section to the Prompt Assembly
developer guide covering the append/replace/shorthand shapes, the
defensive fallback, and the cache-stable lifecycle (stable tier,
resolved at build time).
Add platform_hints config so an admin can append to or replace Hermes'
built-in platform hint for a single messaging platform (WhatsApp, Slack,
Telegram, ...) without affecting other platforms. Enables enterprise
managed profiles to steer platform-aware skills (e.g. invoke a custom
table-formatting skill on WhatsApp where Markdown tables don't render)
while leaving Telegram/Slack/CLI behavior unchanged.
- hermes_cli/config.py: document platform_hints in DEFAULT_CONFIG
- agent/agent_init.py: load platform_hints -> agent._platform_hint_overrides
- agent/system_prompt.py: _resolve_platform_hint() applies append/replace
(replace wins; bare string = append shorthand); defensive on bad config
- tests: 16 cases covering append/replace/shorthand/isolation/malformed
Override only affects the platform-hint segment of the system prompt;
SOUL/context/memory tiers and general instructions are unchanged.
DELETE /api/sessions/{id} was the only session endpoint that didn't
resolve the id (detail, messages, rename, export all call
resolve_session_id) and 404'd when the row was already gone. The desktop
optimistically removes the sidebar row, then RESTORES it and shows the
error on any failure — so deleting a session that had just been reaped
(empty-session hygiene) or removed by a concurrent client resurrected a
ghost row and surfaced "session not found". /goal + auto-compression churn
leaves transient empty rows that race the sidebar snapshot, which is the
exact "I deleted the empty one and got 'session not found'" report.
Resolve exact ids / unique prefixes, and treat an already-absent session
as an idempotent success — DELETE's contract is "ensure it's gone". This
mirrors the bulk-delete endpoint, which already treats ghost ids as
success.
Tests: deleting an absent id is idempotent (200, not 404); delete resolves
a unique prefix; a real session still deletes.
When a worker calls kanban_create from inside a session that has a
persistent delivery channel, the originating session is now subscribed
to the new task's completion/block events automatically. The agent
that dispatched the task gets notified instead of having to poll.
- Gateway sessions (telegram/discord/slack): HERMES_SESSION_PLATFORM +
HERMES_SESSION_CHAT_ID ContextVars, set by the messaging gateway.
- TUI / desktop sessions: HERMES_SESSION_KEY in the subprocess env.
The TUI notification poller keys on platform='tui' + chat_id=<key>.
- CLI / cron / test: no persistent channel, no subscription.
Gated by kanban.auto_subscribe_on_create in config.yaml (default True).
Disable to mirror pre-feature behaviour — users who want explicit
kanban_notify-subscribe calls per task can set it to false. This
config gate addresses the design concern that got PR #19718 reverted
upstream (unconditional implicit auto-subscribe on tool-driven
kanban_create was too aggressive for orchestrator users).
HERMES_SESSION_ID is intentionally not a fallback channel — it is
set by ACP/agent subprocess telemetry for every invocation, not just
TUI, so treating it as a notification target would auto-subscribe
every CLI session and re-introduce the over-eager behaviour.
The kanban_create response now includes a 'subscribed' bool so
orchestrators can react if subscription failed (e.g. by falling
back to explicit kanban_notify-subscribe or to polling).
Includes 6 tests covering the gateway / TUI / CLI / partial-context /
gated / add_notify_sub-failure paths. All 90 tests in
test_kanban_tools.py pass; 509 broader kanban tests pass.
The new compression-tip tests poke started_at/ended_at directly via
db._conn to force deterministic lineage ordering. _conn is typed
Optional[Connection], so ty flagged .execute/.commit as unresolved on
None. Bind a local and assert it's non-None first to narrow the union.
Auto-compression ends the live session and forks a continuation child
(linked via parent_session_id). A long-lived parent keeps its own flushed
message rows, so resolve_resume_session_id()'s empty-head walk never
redirected it — resuming the parent id reloaded the pre-compression
transcript and dropped every turn generated after compression, including
the assistant's response. On the desktop this is the recurring "I sent a
message, came back, and the reply isn't there" report on large sessions:
the chat's routed id is the pre-rotation id, and both the gateway
session.resume RPC and the REST /messages read anchored on it.
Fix the resolver at the chokepoint: resolve_resume_session_id() now
follows the compression-continuation chain forward via get_compression_tip()
before its existing empty-head descendant walk. get_compression_tip() only
follows children whose parent ended with end_reason='compression' (created
after the parent was ended), so delegation/branch children never hijack a
resume. This fixes every resume caller at once (REST /messages, CLI
--resume, gateway /resume).
session.resume in tui_gateway was the one resume path that never called the
resolver — it used the raw target id directly. Route it through
resolve_resume_session_id() too (non-lazy only; lazy watch windows must
stay on their exact child branch). Resolving up front also re-anchors the
live-session fast path so a still-live rotated session is reused by its new
key instead of rebuilding a duplicate agent on the stale parent.
Tests:
- resolve_resume_session_id follows the tip even when the parent retains
messages, and is not confused by a delegation child.
- session.resume binds the agent to the continuation tip and returns the
post-compression reply.
Follow-up to the cap-removal salvage. The contributor guarded the new
unlimited default with `[:max_models] if max_models else ...`, which conflates
max_models=0 (used by slug-only callers that want an empty model list) with
None (unlimited). Tighten to `is not None` at all five slicing sites in
list_authenticated_providers / list_picker_providers, and add a regression test
asserting the three-way contract: None=full, 0=empty, N=first N.
The interactive model pickers (Desktop REST API, TUI model.options, CLI
/model) were hard-capped at max_models=50, which truncated large provider
catalogs like Kilo Gateway (336 models) to just 50 entries. This made
most models undiscoverable via the picker search box.
Changes:
- Change build_models_payload() default from max_models=50 to None (unlimited)
- Change list_authenticated_providers() default from max_models=8 to None
- Change list_picker_providers() default from max_models=8 to None
- Fix all [:max_models] slicing to handle None as 'no limit'
- Remove max_models=50 from 5 interactive picker callers:
* web_server.py: get_model_options (Desktop /api/model/options)
* web_server.py: get_recommended_default_model
* model_switch.py: prewarm_picker_cache_async
* tui_gateway/server.py: model.options JSON-RPC
* cli.py: HermesCLI model picker
- Telegram/Discord inline keyboard picker (gateway/slash_commands.py)
still passes max_models=50 explicitly — unchanged behavior.
The total_models field was already in the response payload and is now
meaningful since models.length == total_models for interactive pickers.
Fixes#48279
The manual /compress handler called rewrite_transcript() unconditionally on
the session id returned by _compress_context(). When rotation does not occur
(e.g. _session_db unavailable, or the DB split raised), session_id is unchanged
and rewrite_transcript() DELETEs the original messages and replaces them with
only the compressed summary — permanent data loss (#44794, #39704).
Guard the rewrite on actual rotation: only overwrite when _compress_context
produced a new session id. Otherwise leave the original transcript intact and
log a warning.
compress_context() rotates the session (end_session -> create_session)
mid-turn when auto-compress triggers, but never called
_flush_messages_to_session_db() first. Messages generated during the
current turn that hadn't been persisted to state.db were silently lost.
The same bug existed in cli.py:new_session() (/new command). Both paths
now flush un-persisted messages before ending the old session.
* feat(billing): nous_billing http client + BillingState core (phase 2b)
Phase 2b terminal-billing client foundation:
- hermes_cli/nous_billing.py: typed client for the 4 /api/billing/* endpoints
(state/charge/poll/auto-top-up). Raises typed errors (BillingScopeRequired,
BillingRateLimited, BillingAuthError) mapped from the live-verified contract;
fail-open is the caller's job. Idempotency-Key enforced client-side.
- agent/billing_view.py: surface-agnostic BillingState core + Decimal money
parsing (server emits decimal strings, not 2dp), fail-open builder,
idempotency-key gen, custom-amount validation.
- 51 unit tests (decimal parse/format, payload tiering, error->exception
matrix, fail-open, amount validation).
Plan: docs/plans/2026-06-13-001-phase-2b-terminal-billing-tui-plan.md
* feat(billing): billing:manage scope + lazy step-up re-auth (phase 2b)
- NOUS_BILLING_MANAGE_SCOPE constant.
- nous_token_has_billing_scope(): split-based scope check (no false-positive
substring match).
- step_up_nous_billing_scope(): re-runs the device flow requesting
billing:manage, reusing the held credential's portal/inference URLs + client_id
(so a preview stays a preview), persists like _login_nous but WITHOUT the model
picker. Returns True iff the minted token carries the scope (False when NAS
silently downscopes a non-admin / unticked grant).
Lazy step-up (plan D-A): normal login path unchanged; 403 insufficient_scope
from a billing call triggers this. 7 unit tests.
* feat(billing): billing JSON-RPC methods for the TUI (phase 2b)
billing.state / charge / charge_status / auto_reload / step_up in
tui_gateway/server.py. Return STRUCTURED success envelopes (result.ok +
result.error=<code>) rather than JSON-RPC-level errors, so the Ink rpc() promise
always resolves and the TUI branches on the typed billing error code
(insufficient_scope, rate_limited, no_payment_method, …) to render the right
affordance. Money serialized as decimal STRINGS + display strings. charge mints
+ echoes an idempotency_key for retry reuse. 16 unit tests.
* feat(billing): /billing CLI handler + command registry (phase 2b)
- CommandDef("billing", subcommands=buy|auto-reload|limit), added to
_SLACK_VIA_HERMES_ONLY so it routes via /hermes on Slack (keeps the 50-cap
parity test green, same as /credits).
- cli.py::_show_billing + screen helpers: all 5 screens (overview, buy→confirm→
poll, auto-reload, monthly-limit read-only). Reuses _prompt_text_input_modal /
_prompt_text_input (D-C). Non-interactive (_app is None) renders text + portal
deep-link, never prompts (R7). Decimal money end-to-end. 2s/5-min cancellable
poll loop; 429/503 = retry not failure; settled = ledger truth. Lazy step-up on
403 insufficient_scope. no_payment_method treated as mainline funnel-to-portal.
- 6 CLI tests; 156 command tests (incl. Slack/Telegram parity) green.
* feat(billing): /billing Ink TUI screens + tests (phase 2b)
- ui-tui/src/app/slash/commands/billing.ts: /billing TUI command covering all 5
screens — overview (text), buy <amt> → ConfirmReq → charge → non-blocking 2s/
5-min poll loop → settled/failed/timeout branches, auto-reload <below> <to> →
ConfirmReq → PATCH, limit (read-only). Reuses the existing ConfirmReq overlay
(D-C) — no bespoke component. Typed-error envelope branching: insufficient_scope
arms the lazy step-up confirm; no_payment_method/rate_limited/cap funnel to
portal. Client-side amount validation mirrors the server (bounds + 2dp).
- gatewayTypes.ts: Billing* response interfaces.
- registry.ts: register billingCommands.
- billingCommand.test.ts: 12 vitest cases (overview/gating/buy-confirm-poll-
settled/no_payment_method/step-up/limit/auto-reload/validation).
TUI build green; 12/12 vitest pass; slash tests pass once @hermes/ink is built.
* docs(billing): scrub private cross-repo references
NAS is a private repo — remove all references to it from the public PR:
- drop the cross-repo planning doc (planning scaffolding, not a deliverable;
the PR description documents the design)
- replace 'NAS' / 'PR #412 preview' mentions in code + test comments with
generic 'the server' / 'a preview deployment'
* docs(billing): scrub final NAS reference in step-up docstring
* docs(billing): drop dangling plan-doc refs
The phase-2b plan doc was removed in the cross-repo scrub (300afcc0b)
but two module docstrings still pointed at it. Drop the dead refs.
* feat(billing): interactive /billing overlay + step-up UX, portal-URL & token fixes
Adds the interactive /billing TUI overlay and hardens the terminal-billing
client across CLI and TUI.
- TUI: full /billing overlay state machine (overview to buy to confirm,
auto-reload, read-only monthly limit) reusing the existing confirm overlay.
- Step-up: surface the verification link in-transcript and open the browser
via the TUI's own opener (the device flow runs in the headless gateway, so a
printed URL was being dropped); run the step-up handler off the main loop and
emit the link as an out-of-band event so the gateway stays responsive.
- Step-up copy is scope-accurate ("Billing permission granted") and re-checks
/state so it never claims "enabled" when the org kill-switch is still off.
- Portal deep-links resolve to absolute URLs against the active portal base
(the server emits them relative) - fixes a bare "/billing?topup=open" link.
- Billing calls refresh an expired access token via the stored refresh token
instead of reporting a false "not logged in".
- Optimistic funnel: advise "set up a saved card on the portal" up front when
no card is on file (advisory, not a hard gate).
- Token resolution is cached briefly so the 2s charge poll loop stops
re-locking + re-reading the auth store on every tick; 401 re-resolves fresh.
- Remove the temporary demo-mode shims.
Validation: 87 Python billing tests, 88 TS tests (billing command + gateway
event handler), tsc clean, ink + ui-tui builds green.
* docs(billing): add /billing TUI screenshots for PR
* fix(cli): guard _last_invalidate on bare instances; update stale prompt-fallback test
The UI-invalidate throttle read self._last_invalidate unconditionally, which
raised AttributeError on HermesCLI instances built without __init__ (the
thread-safety test's object.__new__ shell). Guard the read with getattr.
The off-main-thread branch of _prompt_text_input was changed (#23185) to cancel
cleanly to None instead of falling back to a bare input() that would hang on the
slash-worker thread; the test still asserted the old direct-input fallback.
Update it to assert the current intended behavior: returns None, calls neither
run_in_terminal nor input(), and does not hang.
@nous-research/ui@0.18.2 Button is grid-based: size=xs is an
aspect-square icon-only box, and icons belong in prefix/suffix.
The dashboard used shadcn-style size=xs + inline <Icon/> text
children, which forced text buttons into broken tall squares
(Configure, Run setup, Select, Save keys) and split icon/label
across grid columns elsewhere (Schedule it, Prune/Delete actions).
Move leading icons to prefix and size text buttons as sm/default.
For the post-setup spinner, drive the spin from a button-level
[&_svg]:animate-spin selector since the prefix slot clones the
icon and overwrites its className.
- ToolsetConfigDrawer: Select, Save keys, Run setup
- SkillsPage: New skill, Configure
- AutomationBlueprints: Schedule it
- SessionsPage: Prune old sessions, Delete empty, Delete selected
The "💾 Self-improvement review" summary (skill/memory updated) was invisible
on two surfaces:
- Desktop Electron app had no review.summary event handler — skill/memory
writes happened silently. Now appends a persistent system message to the
transcript (matching the Ink TUI's persistent-line semantics, not a
transient toast that can be missed).
- tui_gateway (backs both 'hermes --tui' and the desktop) never read
display.memory_notifications, so it always behaved as 'on' and ignored a
user who set 'off'/'verbose'. Added _load_memory_notifications() (mirrors
the messaging gateway's bool->str normalization, defaults to 'on') and
wired it to agent.memory_notifications, matching gateway/run.py and the CLI.
Delivery chain now reaches all surfaces:
background_review.py -> background_review_callback -> review.summary event ->
desktop transcript / Ink TUI line / gateway message / CLI print.
The universal PARALLEL_TOOL_CALL_GUIDANCE block already lives on main, but it
shipped with two rough edges this change cleans up:
- It duplicated the batching steer for Google models. The
GOOGLE_MODEL_OPERATIONAL_GUIDANCE block still carried its own
"Parallel tool calls" bullet, so Gemini/Gemma received the instruction
twice in one prompt. Drop the redundant bullet — the universal block is now
the single source.
- Its comment claimed "nothing in the open-source system prompt encouraged
batching," which was wrong: the steer existed for Google models only. Reword
to say the gap was that every *other* model got nothing.
- Tighten the test that asserts the steer (precedence-correct), and add an
invariant guarding against re-introducing the Google duplicate.
* Port from cline/cline#11514: encourage parallel tool calls
Add a universal system-prompt guidance block telling the model to batch
independent tool calls (reads, searches, web fetches, read-only commands)
into a single assistant turn instead of one call per turn. The runtime
already executes independent batches concurrently (read-only tools always;
non-overlapping path-scoped file ops); the open-source system prompt had
nothing steering the model to PRODUCE the batch. Fewer round-trips means
less resent context, which compounds over a long conversation.
- prompt_builder.py: new PARALLEL_TOOL_CALL_GUIDANCE block (short, static,
cache-amortised) modeled on TASK_COMPLETION_GUIDANCE.
- system_prompt.py: inject right after the task-completion block, gated by
agent.valid_tool_names + the new toggle.
- agent_init.py: read agent.parallel_tool_call_guidance (default True).
- config.py: add the default under the agent section.
- test_prompt_builder.py: behavior-contract tests (batching steer, dependent
carve-out, length bound) — invariants, not wording snapshots.
Adapted from Cline's TypeScript tool-surface guidance to hermes-agent's
Python prompt-assembly architecture and config-over-env conventions.
* fix(desktop): never persist or restore a named custom provider as bare "custom"
Custom providers vanish from the Desktop/TUI model picker with
"No LLM provider configured" — repeatedly fixed (#44062, #44109, #45578)
and repeatedly regressed (#44022, #47714) because every fix only recovered
the entry identity from a persisted base_url. When a session is
persisted/restored with the resolved provider "custom" and NO base_url, bare
"custom" leaked through verbatim; resolve_runtime_provider("custom") routes to
the OpenRouter default URL with no api_key, so the next turn/resume dies.
Bare "custom" is the resolved billing class shared by every named providers:/
custom_providers: entry — it is not a routable identity. Centralize the
"never let bare custom escape" invariant in one helper,
runtime_provider.canonical_custom_identity(), and apply it at all four leak
sites in tui_gateway/server.py:
- _ensure_session_db_row — the ORIGIN: first DB write seeds the bad row
- _runtime_model_config — live persist
- _stored_session_runtime_overrides — resume restore (heals old rows; drops
unrecoverable bare custom so resume falls back to config default)
- _make_agent — rebuild / per-turn
The helper recovers custom:<name> from the endpoint URL when present, else
from config.model.provider (the durable identity left when no base_url
survived). Regression tests in test_custom_provider_session_persistence.py
lock the no-base_url vector at every site so it cannot regress again.
The memory tool was strictly one-op-per-call. With the store running near
its char limit by design, a new add that would overflow gets rejected with
'consolidate now, then retry' -- but the model could not consolidate and add
in one call. It had to remove/replace across several turns, then retry the
add, each turn re-sending the whole conversation context. Expensive thrash.
Add an 'operations' array: a list of add/replace/remove ops applied
atomically against the FINAL char budget. The model frees space and adds new
entries in ONE call, even when an add alone would overflow. All-or-nothing:
any bad op aborts the whole batch, nothing written.
Root-cause note: the two agent-level memory interception sites
(agent_runtime_helpers.py, tool_executor.py) silently dropped any param not
in their explicit kwarg list, so 'operations' never reached the handler and
batch calls failed with 'Unknown action None'. Both now pass it through and
bridge each add/replace op to external memory providers.
Also: success response is now terminal (done=true + 'do not repeat' note,
no full-entries echo that invited re-edits); schema rewritten to lead with
the batch mechanism and an explicit one-shot stop rule (2138 -> 1476 chars).
Live-verified: near-full consolidate-and-add went 7 calls -> 1 call,
stable across 3 reps. 103 memory/approval tests + 398 background-review/
run_agent tests green; 6 new batch tests added.
PR #48372 relaxes EAP=Stop around the uv venv call so PowerShell 5.1
doesn't mistake uv's 'Using CPython ...' stderr for a terminating
NativeCommandError. But relaxing EAP also means a *genuine* uv venv
failure (exit != 0) no longer aborts on its own — Install-Venv would
continue and print 'Virtual environment ready', and in stage mode
Invoke-Stage would report ok=true, even though no venv was created.
Capture $LASTEXITCODE immediately after the relaxed call and throw on
non-zero (Pop-Location first, matching the function's other exit paths),
so the venv stage fails fast instead of falsely succeeding. This is the
explicit guard originally proposed in #48463 (devorun), composed on top
of #48372's reusable helper + regression test.
Adds a regression test asserting the uv venv exit-code capture + throw.
The dashboard MCP catalog only showed name/description/transport and a
non-clickable source. Users couldn't see what an entry connects to or runs
before installing — the exact detail the docs trust model tells them to vet.
- /api/mcp/catalog now returns transport target (url, or command+args),
auth_type, git install source/ref + bootstrap commands, default-enabled
tool hint, and post-install guidance per entry.
- McpPage renders the endpoint URL (http) or command+args (stdio), the git
install source/ref, a collapsible bootstrap-commands list, setup notes,
and the source as a clickable link when it's a URL.
- Docs: drop the 'uv pip install -e .[mcp]' quick-start step (Hermes does
not support pip installs; MCP ships with the standard install) and note
the dashboard now surfaces this detail.
- Strengthen the catalog endpoint test to assert the new inspection fields.
Epic's experimental Unreal MCP plugin embeds an MCP server inside the
Unreal Editor process, served over local HTTP (127.0.0.1:8000/mcp by
default). HTTP transport, no auth, no install block — the user enables
the plugin in-editor and Hermes connects to the URL.
Also drops test_optional_mcps_manifests_ship_in_both_wheel_and_sdist:
it asserted wheel/sdist packaging targets for pip/Homebrew/Nix installs,
which Hermes does not support — installs run from the repo checkout, where
the catalog is discovered by directory iteration with no packaging step.
* fix(tui): don't make Enter swallow trailing-space-only slash completions
Submitting a slash command in the TUI took three Enter presses: one to
complete the name (/ex → /exit), a second that only appended the trailing
space the gateway adds to keep the classic-CLI prompt_toolkit dropdown open
(/exit → "/exit "), and a third to actually submit.
The composer's submit handler accepted the highlighted completion whenever
applying it changed the input at all, so the whitespace-only delta ate an
extra keypress. Treat a completion whose only change is trailing whitespace
on an already-complete token as "already complete" and fall through to
submit. Partial-name and argument completions (a real token change) still
accept on Enter as before.
The replace/accept logic is extracted into pure helpers (applyCompletion,
completionToApplyOnSubmit) in domain/slash.ts.
* test(tui): cover Enter/completion trailing-space behavior and isolate poller queue
- completionApply.test.ts asserts completionToApplyOnSubmit accepts real
token completions (partial command name, argument) but returns null for a
trailing-space-only delta on an already-complete command, so Enter submits
instead of needing extra presses.
- test_notification_poller_delivers_completion / _skips_consumed previously
shared the process-global process_registry.completion_queue. Their events
carry no session_key, so a leaked/concurrent poller could dequeue and
dispatch them to a fixture agent without run_conversation, flaking CI
("AttributeError: '_FakeAgent' object has no attribute 'run_conversation'").
Isolate the queue per test (fresh queue.Queue via monkeypatch), matching the
sibling poller tests that already do this.
The salvaged guard allowed _rmtree_writable(SKILLS_DIR) itself. No call
site ever passes the root — every site passes a skill subdir or its .bak
sibling — so allowing the root only preserves the #48200 footgun (a dest
that collapses to the root wipes every installed skill). Require a strict
strict-child relationship and update the test that documented the
nonexistent 'full reset' capability.
Defense-in-depth fix for the silent wipe of ~/.hermes/ documented in
#48200. A `hermes update --yes` run silently destroyed a user's
.env, MEMORY.md, kanban.db, custom skills, and scripts. Two changes:
1. `_rmtree_writable` in tools/skills_sync.py now refuses to rmtree
anything outside SKILLS_DIR (the HERMES_HOME/skills/ root).
All five call sites pass paths under SKILLS_DIR, so the guard is
a no-op for current code and a loud, recoverable failure for
any future regression (bad path join, malicious bundled
manifest, stale path in scope after an exception).
2. The default `updates.pre_update_backup` flips from false to
true in hermes_cli/config.py. A few minutes of zip per update
is negligible compared to silent total data loss. Still
overridable; --no-backup still works for one-off opt-out.
Five new tests in TestRmtreeWritableScopeGuard (root path,
hermes home, sibling dir, skills root itself, subdir) plus a
flipped `test_default_enabled_creates_backup` in test_backup.py.
178/178 tests pass in the two affected files. Public method
signatures unchanged, no test-stub blast radius.
Closes#48200
Review feedback from egilewski:
1. Remove trailing whitespace from test docstring and mock patches (lines 1430, 1469, 1476, 1482)
2. Expand test coverage: also verify ANTHROPIC_API_KEY is stripped (not just OPENAI_API_KEY)
Changes:
- Remove trailing whitespace from test file
- Add ANTHROPIC_API_KEY to test environment
- Add assertion verifying ANTHROPIC_API_KEY is stripped from cua-driver subprocess env
- Syntax verified: python3 -m py_compile tests/tools/test_computer_use.py ✓
- Use _sanitize_subprocess_env() to filter Hermes-managed credentials
from the cua-driver subprocess environment (issue #37878)
- Prevents credential exfiltration to the third-party cua-driver binary
- Aligns with existing pattern used by browser-tool and other tools
- Add regression test to verify environment sanitization
The cua-driver is a lower-trust MCP subprocess per SECURITY.md §2.3.
Its inherited environment is now scrubbed by default, removing provider
API keys, gateway tokens, and platform credentials that should not leak
to third-party binaries.
Fixes#37878
PR #47792 pinned Electron to an exact 40.10.2 and regenerated the root
package-lock.json (dropping @electron/get@5 + @electron-internal/extract-zip,
restoring @electron/get@2 + extract-zip@2 + yauzl), but did not refresh the
shared npmDepsHash in nix/lib.nix. The hash still described the previous
40.10.3 lockfile, so npmConfigHook fails on every Nix build with
"npmDepsHash is out of date" for hermes-tui / hermes-web / hermes-desktop.
Regenerate the single shared hash to match the current lockfile.
Verified with fetchNpmDeps (authoritative, not prefetch-npm-deps):
nix build .#tui.npmDeps -> builds clean
nix build .#tui -> Validating consistency -> Installing dependencies
-> Finished npmConfigHook (no hash error)
The `before-quit` handler tears down the bootstrap controller, preview
watchers, and the Python backend but never disposes live PTY sessions.
When `app.quit()` proceeds to `FreeEnvironment()`, node-pty's
`ThreadSafeFunction::CallJS` callback fires on a half-torn-down
environment, throws a C++ exception that can no longer be caught, and
the process aborts (microsoft/node-pty#904).
Iterate `terminalSessions` and call `disposeTerminalSession()` (which
already calls `pty.kill()` + deletes the map entry) before killing the
backend, so the ThreadSafeFunctions are removed before teardown begins.
Closes#48335
The TUI banner reported fewer tools than the classic CLI for the same
config (e.g. 32 vs 38) when an MCP server connected slowly. Root cause:
the agent snapshots `agent.tools` once at build time and never re-reads
the registry. `_make_agent` briefly joins the background MCP discovery
thread (`wait_for_mcp_discovery`, ~0.75s) so fast servers land in that
snapshot, but a server slower than the bound — common for an HTTP MCP
server on first connect — lands *after* the agent is built. Its tools are
then absent from both the agent (uncallable until `/reload-mcp`) and the
banner for the whole session.
The classic CLI doesn't hit this because it re-derives
`get_tool_definitions()` at banner render time (which re-waits for
discovery), so it picks the late tools up.
Fix: after a fresh agent is built and its first `session.info` emitted,
if discovery is still in flight, schedule an off-critical-path daemon that
waits for it to finish, then rebuilds the tool snapshot and re-emits
`session.info` — the same rebuild `/reload-mcp` performs, but automatic.
Both the agent's callable tools and the banner count catch up.
Cache safety: the rebuild runs only while the session is still
pre-first-turn (`_user_turn_count`/`_api_call_count` both 0 → nothing
cached to invalidate). Once the user has sent a message we leave the
snapshot frozen rather than break the cached prompt prefix mid-conversation;
late tools then require an explicit `/reload-mcp` (user-consented), exactly
as today. No-op when discovery finished before the agent build, when the
join times out, when the registry was unchanged, or when the session was
swapped/closed while waiting.
Adds entry.mcp_discovery_in_flight() / join_mcp_discovery() accessors and
covers the matrix (added/none/post-turn/timeout/unchanged/replaced) with
unit tests.
The TUI banner footer used the raw `info.mcp_servers.length`, so a
configured-but-disabled server (e.g. `linear`) was counted alongside
connected ones. With a disabled `linear` and a connected `nous-support`,
the TUI reported "2 MCP" while the classic CLI correctly reported "1 MCP"
(`mcp_connected = sum(1 for s in mcp_status if s["connected"])` in
hermes_cli/banner.py).
The collapse toggle even labels the count "connected", which was wrong
for the same reason.
Count connected servers for both the toggle and the footer segment, and
drop the `· N MCP` segment entirely when none are connected (matching the
classic banner, which only appends it when the count is > 0). The
expandable MCP section still lists every configured server, including
disabled ones.
Invariant test renders SessionPanel and asserts the headline equals the
connected count, never the configured total.
The non-retryable abort path now computes _nonretryable_summary once and
reuses it at the emit sites and the returned error field. The
content-policy-blocked return branch still recomputed the identical
value into a separate _summary local, half-honoring the 'summarize once'
intent. _summarize_api_error is a pure staticmethod and api_error is
never reassigned in this block, so _summary was provably byte-identical
to _nonretryable_summary. Reuse the hoisted value and drop the redundant
call. Behavior-preserving.
Locks the contract that a non-retryable failure (a Cloudflare 403
"managed challenge" page) returns a short, HTML-free `error` field —
guarding the field path where the raw page was dumped to Discord as
~31 messages.
The test drives the standard chat-completions path with a concrete
model so the turn actually reaches `client.chat.completions.create`,
where the mocked 403 is raised. It asserts the create call happened
(guarding against a vacuous pass — an empty model on the Codex
Responses path would otherwise abort on a validation ValueError before
any API call) and that the summarized error includes "403" while
excluding <html> / _cf_chl_opt. The non-retryable abort path is
provider-agnostic; a Cloudflare managed-challenge 403 can surface on
any provider behind Cloudflare.
When a non-retryable client error aborts the turn (e.g. a Codex/Cloudflare
HTTP 403 "managed challenge" page), the conversation loop returned the
failure dict with `error: str(api_error)` — the entire ~60KB HTML page.
Downstream consumers deliver that field verbatim: a cron job dumped a
Cloudflare challenge page to Discord, where it was split into ~31 messages.
The sibling "max retries exhausted" path already collapses such bodies via
`_summarize_api_error` (which extracts the <title> / status from HTML error
pages). This makes the non-retryable path consistent: compute the summary
once and use it for both the status emit and the returned `error`.
Source-level guard (install.ps1 only runs on Windows, so there's no Linux CI
runner to execute it): the astral uv install line must be invoked via the call
operator on a resolved host variable, the bare-`powershell` literal that
produced the field-reported "The term 'powershell' is not recognized" must be
gone, and the resolver must be PATH-independent (Get-Process -Id $PID) and
pwsh-aware.
The Windows installer's Install-Uv spawned the astral uv installer with a
hardcoded bare `powershell -ExecutionPolicy ByPass -c "irm .../uv | iex"`.
That name resolves only to Windows PowerShell, and only when its System32
directory is on PATH. Run under PowerShell 7+ (`pwsh`) — or any session where
`powershell` isn't on PATH — the spawn dies with "The term 'powershell' is not
recognized", and uv installation aborts (the installer then appears stuck).
Add Get-PowerShellHostExe, which prefers the absolute path of the host we're
already running in (PATH-independent), then falls back to powershell/pwsh via
Get-Command, then to the bare name. Install-Uv now invokes that resolved exe.
Add infinitycrew39@gmail.com -> infinitycrew39 to AUTHOR_MAP so the
contributor audit resolves the two cherry-picked commits from the #47945
langfuse trace-scope salvage (merged as #48292) to a GitHub handle instead
of flagging them as an unmapped author email.
The prior assertion `all("turn1" in k or "turn2" in k for k in keys)` was
weak on two counts: it passes vacuously when keys is empty (a regression
that lost all state would slip through), and after turn 2 finalizes only
turn 1 lingers, so it only ever inspected turn 1 anyway. Replace it with an
exact check that one key survives, it is turn 1, and turn 2 never merged
into it — the real isolation invariant the test name claims.
Scoping the trace key by turn_id (the prior commit) fixed cross-turn
collisions but introduced a slow leak: _finish_trace only pops a key when a
turn ends cleanly (final response has content and no tool calls), so any
turn that is interrupted, ends on a tool call, or has empty final content
now leaves its uniquely-keyed entry in _TRACE_STATE forever. Previously the
constant per-session key was overwritten by the next turn, capping growth at
~1 entry per session.
Add an LRU cap (_MAX_TRACE_STATE) enforced by _evict_stale_locked, called
under _STATE_LOCK immediately before each insert. It evicts the
least-recently-updated entries (using the previously-dead last_updated_at
field) and ends their root span so nothing dangles. Regression test drives
50 non-finalizing turns against a cap of 8 and asserts the dict stays bounded
with the most-recent turns surviving.
The turn- and api-scoped branches each repeated the same
task/session/thread fallback ladder with only the infix differing. Extract
the shared prefix into _scope_prefix so a future scope dimension touches one
ladder instead of three. The legacy branch still returns a bare task_id (not
the task: prefix) for backward compatibility, so it stays separate.
Output key strings are unchanged; a new test pins them across every
task/session/turn/api combination since the keys are matched across hooks
and any drift would silently break trace finalization.
Cleanup pass on the salvage (behavior-preserving):
- diff_bundled_skill now uses the existing _skill_file_list() helper
instead of reimplementing the rglob/is_file/relative_to file-set
enumeration inline (twice).
- Extract _is_tracked_user_modification(origin_hash, user_hash) and use
it in BOTH the sync loop and list_user_modified_bundled_skills() so the
'kept user edit' rule can't drift between the two sites.
- _read_text_for_diff -> _read_for_diff returns (bytes, text); the binary
branch now compares the bytes it already read instead of re-reading
both files from disk.
- Drop the unused 'user_present' key from diff_bundled_skill's return
contract (no consumer or test ever read it).
- test_update_modified_notice: drop the brittle '>= 2 sites' count-floor
so consolidating the two print paths into a shared helper stays a
welcome refactor; keep the per-site 'count notice => discovery hint'
invariant (still mutation-tested).
The PR added helper-level tests for _trace_key but nothing exercised the
keys through the real hooks. This adds TestTurnTraceIsolation, which drives
on_pre_llm_request / on_post_llm_call across two turns of one gateway
session (task_id == session_id, unique turn_id, api_call_count reset per
turn) and asserts each turn opens its own root trace when the first turn
fails to finalize (tool-only final step). This test fails on the pre-fix
code (only one trace opened, turn 2 absorbed into turn 1) and passes with
the scoping fix.
Also pins the turn_id-over-api_request_id key precedence: the turn-scoped
post_llm_call carries no api_request_id, so it must still resolve to the
same key as the request-scoped hooks or finalization breaks.
Salvage follow-up to the cherry-picked feat/test commits:
- W1: the unpack/install update path in main.py printed the
'~ N user-modified (kept)' notice without the new
'hermes skills list-modified' hint that the git-pull path got.
Mirror the hint to both sites so the count is actionable
regardless of which update path runs.
- W2: 'hermes skills diff <name>' (bundled-vs-stock) now shares the
verb with the gateway write-approval 'diff <id>'. The gateway
handler's docstring + truncation message pointed users to
'/skills diff <id>' on the CLI, which now resolves a bundled skill
by that name instead. Point at the pending JSON file and note the
two diff commands are distinct.
- Add an invariant test asserting every 'user-modified (kept)' notice
in main.py carries the discovery hint (guards sibling drift).
Exercises the real sync pipeline (no mocked comparison logic): a pristine
synced skill is not flagged; an edited one is listed and diffed (modified +
added files); an unknown skill returns not-ok; and `reset --restore` clears
the modified state so revert and discovery stay consistent.
`hermes update` keeps (won't overwrite) bundled skills the user edited
locally, but only printed a count — "~ N user-modified (kept)" — with no way
to learn which skills, or see what changed. Reverting already existed
(`hermes skills reset <name> [--restore]`); discovery and inspection did not.
Add two CLI commands (zero model-tool footprint), reusing the manifest
origin-hash that sync already maintains:
- `hermes skills list-modified [--json]` — list the bundled skills whose
on-disk copy diverges from the last-synced origin hash (the exact test the
sync loop uses to decide what to skip).
- `hermes skills diff <name>` — unified diff between the user's copy and the
current bundled (stock) version, so the user can confirm what changed
before reverting.
Both are mirrored as `/skills list-modified` and `/skills diff`. The
`hermes update` notice now points at `hermes skills list-modified`. Core
helpers `list_user_modified_bundled_skills()` and `diff_bundled_skill()` live
in tools/skills_sync.py alongside the existing reset logic.
Follow-up cleanup on the OpenViking setup path merged in #48262:
- _write_ovcli_config now uses utils.atomic_json_write(path, data, mode=0o600)
instead of the local _precreate_secret_file + write_text + chmod sequence.
The shared helper (already used by honcho/mem0/supermemory/hindsight) writes
via temp-file + fchmod(0600) + fsync + os.replace, so the ovcli.conf is
written atomically (no half-written secret file on crash) and with no
chmod-after-write TOCTOU window. _precreate_secret_file stays for the .env
writer path.
- Remove dead _DEFAULT_ACCOUNT/_DEFAULT_USER constants (0 references; the
empty->'default' tenant fallback lives in the _VikingClient constructor).
Tests: tests/plugins/memory/test_openviking_provider.py + test_memory_setup.py
+ openviking_plugin/test_openviking.py -> 130 passed; ruff clean.
PR infographics are decorative visual hooks for a PR body, not repo
artifacts. The established convention (commit 5772e638c, "chore: drop
in-repo infographic/ directory; keep PR-body URLs only", #30854) is to
hotlink an externally-hosted image so GitHub camo-proxies it inline,
leaving zero binary footprint in the tree.
Two such assets had been committed anyway and are referenced nowhere in
the codebase:
- docs/assets/ns504-chat-session-reconnect.png (1024-equiv, NS-504 PR
infographic, added in #47674 alongside the ChatPage.tsx fix)
- infographic/kanban-db-corruption-defense/infographic.png (re-added a
directory #30854 had explicitly removed, in #30952)
Both are unreferenced decorative infographics, so removing them has no
effect on docs, website, or app builds. Removing the latter also clears
the stray top-level infographic/ directory that #30854 had retired.
These blobs remain in history (the commits that introduced them are
already on main and bundled with real code, so they can't be dropped);
this just removes them from the working tree going forward.
Follow-up to #47663 (streaming multipart upload), fixing two issues that
landed with it.
1. Temp file leaked on client disconnect. The streaming upload endpoint's
except chain caught only HTTPException / PermissionError / OSError — all
Exception subclasses. asyncio.CancelledError, raised when a browser aborts
a large upload mid-stream (the exact NS-501 scenario), is a BaseException,
so it bypassed every except clause and reached a finally that only closed
the file handle and never unlinked the temp file. Every aborted large
upload orphaned a partial `.{name}.*.upload` file (up to ~100 MB) in the
target directory. Cleanup now lives in finally, keyed on a `renamed`
success flag, so the temp file is removed on every non-success exit
including BaseException paths. Added test_stream_upload_cleans_temp_on_cancellation,
which fails on the pre-fix code (leaks the temp file) and passes with the fix.
2. python-multipart pinned to ==0.0.27 instead of ==0.0.20. The package was
already resolved at 0.0.27 transitively (via daytona) before #47663; the
explicit ==0.0.20 pin in the [web] extra and the tool.dashboard lazy-install
set downgraded it. Bumped both to ==0.0.27 and regenerated with `uv lock`,
keeping the lockfile coherent. The base dependency stays >=0.0.9,<1.
Resolves conflicts from the OpenViking churn that merged after #32445 was
opened (#48042/#47662 session-switch + write hardening, #47311/#47973):
- plugins/memory/openviking/__init__.py: keep both __init__ field groups
(the PR's _runtime_start_* alongside main's _prefetch_threads/_shutting_down).
- tests/plugins/memory/test_openviking_provider.py: keep BOTH the PR's new
setup-validation tests and main's session-switch/concurrency tests (disjoint
additions to the same region).
Two fixes layered while reconciling (contributor work otherwise preserved):
- Restore the merged tenant-header contract (#22414/#21232). The PR had changed
_VikingClient defaults to '' and made empty account/user OMIT the tenant
headers; main's contract is that empty falls back to 'default' and the
X-OpenViking-Account/User headers are ALWAYS sent (ROOT API keys need them).
Reverted the constructor to 'account or os.environ.get(..., "default")' and
updated the two PR tests that asserted the omit-when-empty behavior.
- Close a secret-file TOCTOU in the setup writers. _write_env_vars and
_write_ovcli_config wrote the api_key/root_api_key file and chmod 0600
AFTERWARD, leaving a world-readable window on newly-created files. Added
_precreate_secret_file() to create with 0600 before any secret bytes land.
* fix(dashboard): stream file uploads via multipart instead of base64 JSON
The dashboard file manager uploaded files (including backup/restore zip
archives) by reading them client-side with FileReader.readAsDataURL and
POSTing a base64 data URL inside a JSON body to /api/files/upload. For a
large backup this (a) inflates the payload ~33%, (b) buffers the whole
file plus its decoded copy in memory, and (c) reliably trips an upstream
proxy body-size/timeout limit, surfacing as a 502 with the upload
appearing to hang indefinitely (NS-501). Dashboard-only hosted users have
no shell fallback to place the archive, so backup restore was unusable.
Add a streaming multipart endpoint POST /api/files/upload-stream
(UploadFile + Form) that reads the request body in 1 MiB chunks straight
to a sibling temp file, enforces the existing 100 MB size cap as it
streams (413 on overflow, before buffering the whole file), and
atomically renames into place so a partial/aborted/over-limit upload
never clobbers an existing file. The frontend api.uploadFile now sends
multipart/form-data (raw bytes, no base64, browser-set boundary) and
FilesPage passes the File object directly; the dead readAsDataUrl helper
is removed. The legacy base64 JSON endpoint stays for backward compat.
FastAPI's UploadFile/Form require python-multipart, which is NOT pulled in
by fastapi itself, so it is added to the base deps, the [web] extra, and
the tool.dashboard lazy-install set (kept in sync).
Validated: 5 new endpoint tests (roundtrip, multi-chunk >1 MiB,
over-limit 413 without clobbering + no temp-file leak, overwrite=false
conflict, forced-root traversal containment); existing base64 tests still
pass; web typecheck + vite build clean; and a real uvicorn server E2E
(5 MB multipart upload -> HTTP 200 in 0.21s, exact byte match) plus a
30 MB TestClient roundtrip confirm constant-memory streaming end to end.
Reported via beta (NS-501).
* build(deps): regenerate uv.lock for python-multipart (NS-501)
CI ran uv lock --check / uv sync --locked which failed because the
python-multipart dependency add was not reflected in uv.lock. Regenerate
the lockfile (resolves to 0.0.20, matching the [web] extra pin) after
merging current main.
Importing a backup wrote every file from the zip over the target home
wholesale. On a hosted instance this clobbered gateway_state.json with the
source machine's last recorded run/desired state — driving the container-boot
reconciler (container_boot._read_desired_state, which only auto-starts a
gateway whose state is "running") off stale/foreign state and leaving the
gateway stuck "starting", disconnected from the Nous portal.
Add _IMPORT_SKIP_NAMES (gateway_state.json, gateway.pid, cron.pid,
gateway.lock, processes.json) and skip them by basename in run_import, so both
the root profile and named profiles preserve the target's own runtime state.
This mirrors what container_boot._STALE_RUNTIME_FILES already sweeps on every
container boot, and protects against older backups that predate the
backup-side exclusions. The import summary reports which files were preserved.
This is the second half of NS-501 (filed separately as NS-508): the upload
502 was fixed in #47663; this fixes the import-breaks-the-instance half.
The gateway half of relay Phase 3. On a MANAGED boot with relay configured and
no secret pinned, the runtime self-provisions its relay credentials IN-PROCESS:
resolve the agent's own Nous access token (resolve_nous_access_token) -> POST
the connector's /relay/provision asserting its own endpoint + route keys ->
set GATEWAY_RELAY_ID/SECRET/DELIVERY_KEY into os.environ so the immediately-
following register_relay_adapter() reads them and dials out authenticated.
No human, no enrollment token, no disk write — the creds live only in process
memory (save_env_value refuses under managed anyway, and keeping the secret off
any volume is the stronger posture). Stateless: process-env creds don't survive
a restart, so a managed container re-provisions every boot; the connector's
rotation window covers a still-connected prior instance. An explicitly-pinned
GATEWAY_RELAY_SECRET is respected (skip). Self-hosted is unchanged: humans keep
using `hermes gateway enroll`.
Endpoint provenance is gateway-asserted (GATEWAY_RELAY_ENDPOINT +
GATEWAY_RELAY_ROUTE_KEYS, env or gateway.relay_* config) — uniform code path
whether the operator sets it (self-hosted) or NAS stamps it (hosted, the only
case NAS knows the public URL). Both absent -> outbound-only provisioning
(credentials, no inbound routes). The connector scopes the asserted endpoint to
the verified tenant, so it stays within the security model.
- gateway/relay/__init__.py: relay_endpoint(), relay_route_keys(),
_provision_url(), _post_provision(), self_provision_if_managed() (never
raises — a provision failure logs and boots without relay auth).
- gateway/run.py: call self_provision_if_managed() immediately before
register_relay_adapter() in the startup path.
Tests: 12 unit (trigger logic, respect-pinned-secret, in-process env wiring,
endpoint+routes vs outbound-only, fail-soft on token/connector failure);
mutation-checked (drop is_managed guard / pinned-secret guard -> tests fail).
Cross-repo live E2E driver lands on the connector side (depends on this).
EXPERIMENTAL: relay auth scheme may change until >=2 Class-1 platforms validate.
The install method (docker/git/pip/...) describes the *running binary*, but
detect_install_method() read it from $HERMES_HOME/.install_method — a shared
DATA directory. The Docker docs deliberately bind-mount $HERMES_HOME
(~/.hermes:/opt/data) so config/sessions/memory persist and can be shared with
a host-side Desktop/CLI install.
When a containerized gateway and a host install share one $HERMES_HOME, the
home-scoped stamp is a single slot describing two installs: the published image
stamps 'docker' on every boot, the host install then reads 'docker' and the
in-app updater refuses to run 'hermes update' ("doesn't apply inside the Docker
container"). Reinstalling the Desktop app from the DMG doesn't help because the
contaminated stamp is re-read every time.
Fix (option 1 — code-scoped stamp):
- detect_install_method() reads <install tree>/.install_method first (next to
the running code, immune to the shared data dir). It falls back to the legacy
$HERMES_HOME stamp for back-compat, but IGNORES a 'docker' home stamp when
not actually containerized — so already-poisoned shared homes self-heal.
- stamp_install_method() writes the code-scoped stamp.
- install.sh stamps $INSTALL_DIR instead of $HERMES_HOME.
- Dockerfile bakes 'docker' into /opt/hermes/.install_method at build time
(inside the immutable block); stage2-hook.sh no longer writes the home stamp
and proactively removes a stale 'docker' one to heal existing shared homes.
Genuine containers still resolve to 'docker' (baked stamp, or legacy home stamp
honored when containerized). Unstamped installs in generic containers still fall
through to git/pip (preserves the #34397 fix).
* feat(relay): authenticate the connector⇄gateway WS channel
The relay gateway may be customer-managed and internet-exposed, so the
connector⇄gateway channel is itself authenticated (distinct from the
platform crypto the relay path sheds). Add gateway/relay/auth.py — a
Python port of the connector's HMAC token + delivery-signature schemes
(relayAuthToken.ts / deliverySigning.ts), verified byte-for-byte against
the connector's compiled TypeScript via cross-language test vectors.
Present an Authorization bearer on the /relay WS upgrade keyed by the
per-gateway secret (resolved from GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET
in env or config). The connector rejects an unauthenticated/invalid/
revoked upgrade with close 4401.
* feat(relay): signed-HTTP inbound delivery receiver
The connector delivers normalized inbound events to a tenant's gateway
over a signed HTTP POST, not the outbound /relay WS: the connector
instance owning a platform socket is generally not the instance a given
gateway dialed out to, so inbound targets a tenant endpoint that may
load-balance across gateway instances.
Add gateway/relay/inbound_receiver.py — verifies x-relay-signature /
x-relay-timestamp over the EXACT raw request bytes (re-serializing would
break the HMAC: JS JSON.stringify is compact, Python json.dumps spaces)
against the per-tenant delivery key verify list within a 300s replay
window, then dispatches messages to handle_message and interrupts to the
interrupt handler. Wire it into the adapter lifecycle (start in connect()
when a delivery key + bind port are configured, tear down in disconnect();
a purely-outbound dev gateway runs without it).
Refine test_relay_sheds_crypto to distinguish PLATFORM crypto (Discord
ed25519, Twilio/WeCom HMAC — still shed) from the connector⇄gateway
CHANNEL auth (intended): auth.py / inbound_receiver.py are exempt from
the platform-symbol scan but still banned from importing platform-crypto
modules, plus a positive guard that auth.py uses only stdlib hmac/hashlib.
* feat(relay): hermes gateway enroll CLI
Add the gateway half of zero-touch enrollment. `hermes gateway enroll`
resolves a fresh Nous Portal access token (the tenant-proving identity),
POSTs {enrollmentToken, gatewayId} to the connector's /relay/enroll, and
persists GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET / GATEWAY_RELAY_DELIVERY_KEY
to ~/.hermes/.env. The per-gateway secret authenticates the WS upgrade;
the per-tenant delivery key verifies signed inbound deliveries.
Refuses under is_managed() (hosted installs get the secret stamped in by
the orchestrator). Added as an 'enroll' subcommand on the existing
gateway subparser — not a new top-level command.
* docs(relay): inbound is signed HTTP, not WS; document channel auth
Fix the stale contract: §3/§5 said inbound rode the WS socket (single-
instance only, predates the multi-instance socket-ownership + channel-auth
model). Inbound + connector→gateway interrupt are signed HTTP POSTs to the
tenant endpoint. Add §6.1 documenting the two channel-auth schemes (per-
gateway WS-upgrade secret, per-tenant inbound delivery key) and how they
differ from the platform crypto the relay path sheds.
* test(relay): update build_gateway_parser callers for cmd_gateway_enroll
The enroll subcommand added cmd_gateway_enroll as a required keyword-only
arg to build_gateway_parser, but two existing parser-extraction tests still
called it with only cmd_gateway/cmd_proxy — failing CI with TypeError.
Thread the new handler through both call sites and add a test asserting
`gateway enroll` dispatches to cmd_gateway_enroll with its flags parsed.
* fix(docker): supervised gateway uses --replace to take over stale holder
Inside the s6 container image the per-profile gateway service rendered a
bare `hermes gateway run` (no --replace). When a gateway is started
OUTSIDE s6 — a stray shell `hermes gateway run`, an agent action, or the
Open WebUI helper (scripts/setup_open_webui.sh) — it grabs the
per-HERMES_HOME PID lock first. The supervised slot then execs the bare
`gateway run`, hits the "Another gateway instance is already running"
guard, exits non-zero, and s6 restarts it: a restart loop that floods the
log every ~12s and never binds. The container looks up but the gateway is
permanently down, and dashboard-only users (no shell) cannot recover.
Render the supervised run script as `gateway run --replace` so s6 is
authoritative for its slot: it reaps the stale holder via the hardened
takeover path (takeover marker + SIGTERM->SIGKILL-with-confirmation +
scoped-lock cleanup in gateway/run.py) and binds. This matches the
systemd service path, which already builds its argv with --replace
(_build_gateway_argv / 'nohup hermes gateway run --replace'), and the
intent already documented in _maybe_redirect_run_to_s6_supervision. The
existing HERMES_S6_SUPERVISED_CHILD sentinel still prevents the
run->start->run redirect recursion. Each profile is scoped to its own
HERMES_HOME and s6 guarantees one supervised instance per slot, so there
is no legitimate supervised sibling for --replace to clobber.
Reported via beta (NS-505): gateway.log showed PID 17907 'running
(manual process)' with the guard error repeating every ~12s on
v2026.6.5.
Adds a regression test asserting every gateway-run exec line in the
rendered script (default + named profile, both privilege branches)
carries --replace, and updates the existing render-script assertion.
* fix(ci): remove stray .venv symlink committed into repo
The PR's commit accidentally tracked a .venv symlink pointing at the
developer's local venv (mode 120000 -> /home/ben/nous/hermes-agent/.venv).
The CI test/e2e/build jobs run `uv venv` to create .venv and failed with
`failed to create directory .venv: File exists (os error 17)` because the
checkout already contained the symlink. All test shards aborted in <15s
during setup, before any test ran.
Untrack the symlink and add a bare `.venv` entry to .gitignore (the
existing `.venv/` rule only matches a directory, so a symlink slipped
through).
Salvage corrections on top of @XVVH's #44341:
- Make native web_search injection a 1:1 swap for an already-present client
web_search function, NOT an additive grant. The original unconditionally
appended {"type":"web_search"} on every is_xai_responses turn with any
tools, force-enabling Grok server-side search even when the user never
enabled the web toolset (bypassing Hermes web-provider config + tool-trace
plumbing). Now gated on a client web_search actually being present.
- Reconcile grok-composer context to 200000 (merged in #47908) rather than
262144; 200k is xAI's published usable context window for Composer 2.5,
262144 is the /v1/responses input+output budget.
- Update tests to match scoped behavior + add a no-web-toolset guard test.
- AUTHOR_MAP entry for #44341 salvage.
Incomplete-guard (server-side *_call items at in_progress no longer flip
has_incomplete_items) and preflight built-in-tool allowlist kept as-is.
- model_metadata: grok-composer-2.5-fast → 262144 (OAuth slug not in /v1/models)
- codex transport: inject native {"type":"web_search"} for is_xai_responses;
drop client web_search to avoid duplicate-name 400s
- codex adapter: do not treat in-progress server-side *_call items as incomplete
- tests: adapter, transport build_kwargs, model_metadata, oauth recovery
The desktop self-update runs `hermes update` then `hermes desktop
--build-only`, and only relaunches if the rebuild returns 0. The first
`--build-only` can exit nonzero on a still-settling post-update tree or a
network-blocked Electron fetch that the installer's self-heal repaired
mid-run — so both updaters (the Tauri setup binary and the in-app POSIX
path) bailed before the relaunch step. The update landed but the app
never restarted; a manual launch worked because the heal had completed.
Retry `--build-only` once in both paths before failing, mirroring the
retry-once `hermes update` already does (and the CLI `hermes update`'s
own desktop rebuild). A second run builds clean off the healed dist and
is a near-no-op when the first actually succeeded (content-hash stamp).
- update.rs: retry stage 2; add rebuild_needs_retry() + test
- main.cjs: retry via new update-rebuild.cjs helper (behavior-tested)
Weak open models (mimo, nemotron-class) that see tool-call XML/JSON sitting in
file contents or tool output get primed and emit their own structured tool
calls mimicking the payload — usually with an empty/whitespace name. Those
calls can't be fuzzy-repaired toward a real tool, so the dispatch loop returns
an error and the model retries. Before this fix, every empty-name error dumped
the full tool catalog back to the model, which fed the priming loop more names
to mimic and inflated context 3-4x across the retry budget.
A blank/whitespace-only tool name now gets a terse anti-priming error that
tells the model in-context tool-call syntax is DATA, with no catalog dump. A
genuinely-wrong-but-nonempty name (a real typo) still gets the full catalog so
the model can self-correct.
Not a sandbox/auth boundary issue: Hermes never parses tool-call text from
content into executable calls (structured tool_calls only; the lone text->call
parser is the Copilot ACP transport and it also rejects empty names). The
reporter's own debug dump confirms the injection never executed.
Behavior-contract test added: empty-name -> terse error, no catalog; nonempty
unknown -> catalog preserved. Exercised end-to-end via run_conversation against
an in-process mock provider.
* fix(dashboard): recover the Chat tab when the agent session ends (NS-504)
In the dashboard Chat tab, when the agent process exits — the user types
`/exit`, or starts a new session that ends the current PTY child — the
`/api/pty` WebSocket closes with a normal code (not one of the
4401/4403/4404/4408/1011 rejection codes the server emits). The frontend
handled only those rejection codes; the normal-exit fallback just printed
"[session ended]" into the dead terminal and stopped, with `wsRef` nulled
and no respawn path. The only recovery was a full page refresh — exactly
the beta report ("typing /exit breaks functionality, no way to restart
without refreshing"; "starting a new session completely breaks the
agent").
On a clean/normal close the Chat tab now flips `sessionEnded` and renders
an in-place "Start new session" overlay (mirroring ChatSidebar's existing
reconnect affordance). Clicking it bumps a `reconnectNonce` that is a
dependency of the connect effect, so the effect tears down and re-runs,
spawning a fresh PTY in place — no page refresh. `onopen` clears the
flag so a successful reconnect dismisses the overlay.
An explicit button (rather than auto-respawn) is deliberate: if the agent
is crash-looping, auto-respawn would hide the failure and spin; the user
stays in control.
Verified against a live uvicorn `/api/pty` socket: a child that exits
closes with a non-rejection code (client sees close_code None / 1000-class),
which is precisely the branch that now sets sessionEnded=true. web
typecheck + vite build clean.
Reported via beta (NS-504).
* docs(assets): add NS-504 chat session recovery infographic
* feat(mcp): raise default tool-call timeout 120s -> 300s
Port from openai/codex#28234. Long-running MCP tools (web fetches,
sandboxed builds, deep-research servers) routinely exceed 120s, causing
spurious timeout failures. Codex bumped its default MCP tool timeout from
120 to 300 for the same reason.
- _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server
'timeout' config override unchanged)
- update test_default_timeout assertion
- document the default in mcp-config-reference.md
* fix(dump): show commit date instead of release date in hermes dump
The version line in `hermes dump` (the top of the /debug report) appended
the package release date in parentheses, which reads like a wall-clock
"generated at" timestamp and confuses support triage. Replace it with the
date the HEAD commit was actually made, resolved live via
`git log -1 --format=%cd --date=short`, kept next to the commit SHA.
On Docker/wheel installs with no .git the date resolves to '' and the
suffix is simply omitted (the baked SHA still identifies the build).
* fix(desktop): resolve electronDist dynamically + self-heal blocked installs
Supersedes the static-path approach (#48081) and the install-step self-heal
(#48082) with a fix that removes the whole failure class instead of chasing each
symptom. Three distinct faults converged into the June desktop-build outage; this
closes all three.
Root cause (the part #48081 left open — "Gap B"):
build.electronDist was a static relative path in apps/desktop/package.json, but
npm workspace hoisting is NOT deterministic — depending on the npm version and
what else is installed, npm nests the workspace-only electron devDep under
apps/desktop/node_modules/electron OR hoists it to the repo root. A static path
matches only one layout, so a clean install intermittently fails with "The
specified electronDist does not exist". #48081 re-pointed the path at the
nested layout (correct today) but electron-builder reads electronDist
STATICALLY, so any future hoist change silently breaks it again — only caught
by a CI invariant, never self-corrected.
Fix:
- scripts/run-electron-builder.cjs: resolve electron the way Node's runtime does
— require.resolve("electron/package.json") walks node_modules from the desktop
project upward and finds electron wherever npm actually put it. The path can
never drift out of sync with the install layout again, on any OS/npm version.
* dist present -> pass -c.electronDist=<abs>/dist so electron-builder reuses
the unpacked runtime (keeps the #38673 fast path that dodges the 26.8.x
missing-binary re-unpack bug).
* dist absent -> omit electronDist; electron-builder fetches Electron itself
via @electron/get honoring electronVersion + ELECTRON_MIRROR.
package.json: builder script now runs the wrapper; the static build.electronDist
is removed (the resolver owns it).
- main.py / install.sh / install.ps1: on a dependency-install failure where the
electron package staged but its dist is missing (electron's install.js
process.exit(1) on a blocked/throttled binary download — #47266/#47917/#48021),
repopulate the dist via electron's downloader (canonical, then npmmirror.com)
and CONTINUE to the build instead of aborting. npm runs postinstall LAST, so
the only casualty is electron/dist; bailing here is what made the pack-time
mirror self-heal unreachable on a blocked network. Hard-fail only when electron
never staged at all (a genuine dependency error).
- The pack-time mirror fallback now retries the build even when the pre-fetch
can't populate the dist: the wrapper lets electron-builder download Electron
itself via the mirror, so the retry is no longer a no-op (it was, when
electronDist was a static path).
The exact 40.10.2 pin (already on main) keeps the third mode — the native
@electron-internal/extract-zip win32 binding that 40.10.3/40.10.4 ship without a
published prebuild — from recurring.
Tests:
- test_desktop_electron_pin.py: replace the static-path-matches-lockfile
invariant with contracts that there is no hardcoded electronDist to drift, the
builder script routes through the resolver, and the resolver uses Node module
resolution + injects -c.electronDist.
- test_gui_command.py: install-failure self-heal continues to build; genuine
(electron-never-staged) install failure still hard-fails; pack retries under
the mirror even when the pre-fetch is blocked.
Salvages/supersedes the overlapping community work in #48003 (sitkarev),
#48012 (omegazheng), #48033 (james47kjv), and #48082.
Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>
* fix(desktop): narrow Electron self-heal to real missing-dist failures
Follow-up on #48091 to remove the remaining misdiagnosis risk from the
installer/build fallback path (#46785 concern): only take the Electron
repair/retry path when Electron's package files are staged and dist is actually
missing/corrupt.
- main.py: add _electron_pkg_staged_missing_dist() and use it to gate install
failure recovery; fail fast for unrelated npm install errors.
- main.py/install.sh/install.ps1: run cache purge + retry only when dist is
missing; do not retry unrelated tsc/vite/build failures under an
Electron-specific narrative.
- install.sh/install.ps1: tighten install-stage self-heal guard to require both
package.json + install.js and missing dist.
- tests: add coverage that install failure hard-fails when Electron dist already
exists, and update retry test to reflect the tightened recovery condition.
Validation:
- Python tests: 64 passed
- install.sh-related tests included in the run
- Real mac build on this machine:
- npm ci at repo root: success
- cd apps/desktop && npm run pack: success
- electron-builder packaged darwin arm64 and used custom unpacked Electron dist
* refactor(desktop): trim electron self-heal helpers and comments
Deduplicate mirror-retry into _try_redownload_electron_dist / shell
counterparts; shorten wrapper and install-script commentary without
changing recovery semantics.
---------
Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>
- test_ws_transport.py: drives WebSocketRelayTransport against a REAL in-process
websockets server (not a mock socket): handshake (hello->descriptor), inbound
frame -> handler, outbound request/response correlation, follow_up routing,
and clean disconnect failing pending waiters. Skips if websockets is absent.
- test_relay_registration.py: rewritten for the config-driven gate — registers
when GATEWAY_RELAY_URL is set / an explicit url is passed / force=True; no-op
without a URL; trailing slash stripped; adapter constructs through the registry.
Full relay suite: 57 passed.
Wire the relay adapter into gateway startup and make activation config-driven
instead of a dark-launch flag.
- gateway/relay/__init__.py: replace relay_enabled()/HERMES_GATEWAY_RELAY with
relay_url() (GATEWAY_RELAY_URL env or gateway.relay_url in config.yaml) — the
same shape as gateway.proxy_url. register_relay_adapter() registers when a URL
is configured and builds a live WebSocketRelayTransport; with no URL it's a
no-op (direct/single-tenant deployments unaffected). force=True keeps the
transport-less adapter for unit tests. relay_platform_identity() reads the
hello platform/botId from GATEWAY_RELAY_PLATFORM/GATEWAY_RELAY_BOT_ID.
- gateway/run.py: call register_relay_adapter() during GatewayRunner.start(),
right after plugin discovery, so a configured connector relay is registered
on every boot. Failures are logged, never block startup.
This removes the dark-launch posture: the relay is on whenever it's configured,
shipping the production end state rather than hiding it behind a flag.
Adds the concrete transport behind the RelayTransport Protocol — the missing
'later-phase work' the relay scaffold deferred. The gateway dials OUT to the
connector over a WebSocket and speaks the newline-delimited JSON frame protocol
(docs/relay-connector-contract.md; connector src/relay/protocol.ts):
- connect(): opens the ws, sends hello{platform,botId}, starts a background
read loop, and resolves handshake() when the connector's descriptor frame
arrives.
- inbound frames -> the registered InboundHandler (rebuilt into a MessageEvent
via _event_from_wire, mapping the snake_case SessionSource wire form back
onto the gateway dataclasses).
- send_outbound / send_follow_up / get_chat_info: request/response correlated
by a uuid requestId against a per-request future, with a timeout so a caller
never hangs; send_interrupt is fire-and-forget.
- disconnect(): cancels the reader, closes the ws, and fails any in-flight
outbound waiters with a structured error.
RelayAdapter.connect() now negotiates the real CapabilityDescriptor from the
transport and adopts it (_apply_descriptor updates MAX_MESSAGE_LENGTH +
markdown surface), replacing the construction-time placeholder. Lazy
'import websockets' mirrors gateway/platforms/feishu.py; WEBSOCKETS_AVAILABLE
gates construction.
The contract's §6 still said the connector 'forwards the signed body
byte-for-byte so the gateway's existing crypto validates against unmodified
bytes.' That model is incoherent under an untrusted, disposable tenant
gateway on a shared bot:
- re-validating Twilio HMAC / WeCom crypto needs the shared signing secret
(handing it over IS the cross-tenant leak),
- WeCom payloads are encrypted with that secret (the connector must decrypt
at the edge just to route),
- a Discord interaction token lives inside the signed body — you can't both
preserve the bytes and strip the credential.
Rewrites §6 to the actual model: the connector is the SOLE crypto/identity
boundary — verifies/decrypts at the edge, normalizes to a tenant-scoped
MessageEvent, strips shared-identity capabilities into its vault, and
forwards only the sanitized event. The gateway re-validates nothing (the
invariant test from the crypto-shed commit enforces this). Notes that this
unifies the passthrough + relay planes and points to the connector repo's
capability-trust-boundary.md.
Also documents the follow_up op in §4 (token-less capability action added
in the previous commit). The conformance test (§2/§3 tables) stays green;
contract is unpublished/EXPERIMENTAL so no version-bump ceremony. 55 passed.
The relay outbound surface had send/edit/typing but no way to act on a
SHARED-identity capability (e.g. a Discord interaction follow-up token,
~15min) that the connector captured + stripped at the edge. Under A2 that
credential never reaches the gateway, so the gateway can't just 'send with
the token' — it needs a semantic op naming the session it's already in.
Adds the follow_up op end to end on the gateway side:
- RelayTransport.send_follow_up(action): protocol method. Action carries
op='follow_up' + session_key + kind + content (+ metadata) and NO token.
- RelayAdapter.send_follow_up(session_key, kind, content, metadata): builds
that action and returns a SendResult. The connector resolves the real
capability (its resolveOutboundCapability), enforces the tenant match so
tenant B can't wield tenant A's capability, and egresses; success=False
when the capability is absent/expired/mismatched (nothing to retry — a
leaked gateway holds zero capability material).
- StubConnector records follow_ups + a canned next_follow_up_result.
Tests: round-trips without a token; the wire action carries only session
refs (no credential value field — the 'kind' string is a type ref, not the
secret); failure surfaces when the connector can't resolve; no-transport
fails cleanly. 55 passed. §4 doc entry follows in the contract-rewrite commit.
Under the A2 trust model the connector is the SOLE crypto/identity
boundary: it verifies/decrypts every inbound platform payload at the edge
(it holds the tenant secrets), normalizes to a tenant-scoped MessageEvent,
and forwards only the sanitized event. The gateway re-validates nothing —
it cannot without being handed the shared signing secret, which on a
shared bot is itself the cross-tenant leak.
The relay path already imports no platform-crypto today; this locks that
in as an enforced invariant so nobody bolts re-validation (Discord
ed25519, Twilio HMAC, WeCom BizMsgCrypt, generic webhook signature checks)
onto the relay later and silently re-couples the gateway to platform
secrets it must never hold. Verification stays in the direct platform
adapters (gateway/platforms/*) which serve non-relay deployments.
- test_relay_package_imports_no_platform_crypto: AST-walks gateway/relay/*
and fails on any import of a platform-crypto/verification module.
- test_relay_package_calls_no_signature_verification: fails on any
verification-symbol reference (ed25519/hmac/bizmsg/verify_*).
Invariants (assert the relation 'relay re-validates nothing'), not frozen
snapshots. Verified the guard bites: injecting a wecom_crypto import makes
it fail, removing it goes green. docs §6 rewrite follows in a later commit.
The Phase 1 exit gate requires BOTH Discord and Telegram to round-trip
through the relay stub, but test_relay_roundtrip.py only covered Discord.
Add the Telegram companion exercising its distinct discriminator profile:
- no guild_id — two chats isolate on chat_id alone
- forum topics share one chat_id and isolate by thread_id (the Telegram
analog of Discord per-guild isolation), shared across participants by
default (thread_sessions_per_user=False)
- DM isolation by chat_id
- utf16 len_unit + markdown_v2 dialect round-trip and configure the adapter
- outbound send round-trips through the stub
Proves the CapabilityDescriptor + build_session_key generalize beyond
Discord, not just the struct (which the descriptor unit tests already
covered).
Add an invariant test pinning docs/relay-connector-contract.md to the
Python source of truth so the doc (which the connector repo mirrors by
hand) cannot silently drift:
- CapabilityDescriptor §2 table ⟷ dataclass fields + required/optional
- SessionSource wire keys (to_dict output) ⟷ §3 documented fields
- per-platform discriminator columns exist as real SessionSource fields
- guard that is_bot stays off the wire until deliberately promoted
Writing the test surfaced a real gap: §3 only enumerated 5 discriminators
in its per-platform table while to_dict() emits 12 keys. Seven wire keys
the connector must populate (chat_name, chat_topic, user_id_alt,
chat_id_alt, parent_chat_id, message_id, user_name) were undocumented —
a connector author reading the doc would never know to set them. Added a
complete SessionSource wire-field table to §3. The connector's existing
contract.ts already carries all 12, so no connector change is needed; the
doc was the lagging artifact.
The platform-connected-checker invariant test requires every built-in
Platform enum member to have either a generic token path or a bespoke
entry in _PLATFORM_CONNECTED_CHECKERS. Platform.RELAY was added without
one, so test_all_builtins_have_checker_or_generic_token_path failed.
Relay dials OUT to a connector and is 'connected' once an endpoint URL
is configured (extra['relay_url'] or extra['url']); the capability
descriptor is negotiated at handshake time, so the URL is the only
config-level signal in the experimental phase. Add the checker plus a
synthetic-config case exercising its True path.
CI guard: fails if gateway/ or plugins/ ever imports the test-only stub
connector or defines StubConnector. Matches code leaks (imports / class defs),
not prose mentions, so the transport.py docstring reference to the stub's path
is allowed.
Phase 1 complete. Task 1.6 of the gateway-relay plan.
Formal interface between the Hermes gateway (RelayAdapter) and the Node
connector repo: handshake, CapabilityDescriptor field table, MessageEvent
inbound envelope with per-platform SessionSource discriminators (Discord
guild_id is REQUIRED for server isolation), outbound action set, /stop
interrupt routing, signed-body verify-at-edge/byte-preserving rule, and the
additive-only contract_version policy. Documents bot-identity-vs-tenant
separation so single-bot consolidation (Phase 6) stays open. Read-first
artifact for the connector implementer.
Phase 1, Task 1.5 of the gateway-relay plan.
RelayAdapter.on_interrupt(session_key, chat_id) bridges a connector-delivered
mid-turn /stop into the existing interrupt_session_activity path, setting the
per-session _active_sessions Event and clearing typing — cancelling exactly the
targeted session's turn without touching siblings (mirrors test_stop_thread_
sibling isolation). Transport.send_interrupt carries the gateway-side egress to
the connector for socket-owner routing.
Phase 1, Task 1.4 of the gateway-relay plan.
register_relay_adapter() registers the generic 'relay' platform via the same
PlatformRegistry path as plugin adapters — no core dispatch changes. OFF by
default (dark-launch): only registers when HERMES_GATEWAY_RELAY is truthy (or
force=True for tests), so existing single-tenant/direct deployments are
unaffected. Factory builds a transport-less RelayAdapter with a placeholder
descriptor; the real descriptor is negotiated at handshake.
Phase 1, Task 1.3 of the gateway-relay plan.
Defines RelayTransport (lifecycle/handshake/inbound/outbound/interrupt) as the
gateway<->connector wire contract; RelayAdapter.connect now registers an inbound
handler that bridges connector-delivered MessageEvents into handle_message.
Adds an in-memory StubConnector under tests/ and an E2E round-trip proving:
connect registers the handler, inbound events reach the adapter, guild_id drives
build_session_key isolation (two guilds -> two keys; same guild/channel/user ->
one), outbound send round-trips, get_chat_info is proxied.
Phase 1, Task 1.2 of the gateway-relay plan.
One BasePlatformAdapter subclass that reads its capability profile from a
CapabilityDescriptor: MAX_MESSAGE_LENGTH attribute, message_len_fn (table-driven
by len_unit: chars=len, utf16=Telegram-style code units), supports_draft_streaming.
Implements the four abstract methods (connect/disconnect/send/get_chat_info) by
delegating to an injected RelayTransport (full protocol lands in Task 1.2). Adds
Platform.RELAY enum member. No per-platform gateway code.
Phase 1, Task 1.1 of the gateway-relay plan.
CapabilityDescriptor.from_platform_entry() projects an existing PlatformEntry
(label, max_message_length, emoji, platform_hint, pii_safe, name) into a
descriptor, proving the descriptor is a projection of existing config rather
than a parallel concept. Runtime-only capabilities (len_unit, draft/edit/
thread/markdown) are caller-supplied. max_message_length==0 ('no limit') maps
to the stream_consumer 4096 default.
Phase 0 complete. Task 0.3 of the gateway-relay plan.
Behavioral regression harness locking the capability surface that the future
RelayAdapter must reproduce: the abstract-method set (connect/disconnect/send/
get_chat_info), message_len_fn default, supports_draft_streaming default, and
the stream_consumer MAX_MESSAGE_LENGTH attribute read. Passes on main before
any RelayAdapter exists.
Phase 0, Task 0.1 of the gateway-relay plan.
After the June lockfile regeneration (#46652) floated electron and reshuffled
npm workspace hoisting, the desktop pack fails with "The specified electronDist
does not exist". apps/desktop/package.json pointed electronDist at the repo
root (../../node_modules/electron/dist) while npm now installs electron nested
under apps/desktop/node_modules/electron. The two contradict, so a clean
install can never package the app (Windows + macOS).
- electronDist -> node_modules/electron/dist (resolved relative to apps/desktop,
i.e. the workspace-local install npm actually produces).
- hermes_cli/main.py, scripts/install.sh, scripts/install.ps1: add a runtime
electron-dir resolver that prefers apps/desktop/node_modules/electron and
falls back to the root hoist, so dist checks + the mirror re-download work
under either npm layout.
- patch-electron-builder-mac-binary.cjs: try the workspace-local Electron.app
before the root hoist in the macOS binary-restore fallback (sibling site no
PR touched).
- test: assert build.electronDist resolves to where the lockfile installs
electron, so a future hoist change (root <-> nested) can't silently break it.
Salvages the overlapping work in #48003 (sitkarev), #48012 (omegazheng), and
#48033 (james47kjv).
Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>
* fix(desktop): recover stranded session windows when resume fails
Opening a session in a new window (or any routed resume) could latch the
thread loader on "session" forever — the reported "stays stuck loading,
even after a nap" bug. Two compounding causes:
1. use-session-actions.resumeSession's catch ran the REST transcript
fallback OUTSIDE its own try. When session.resume rejected AND the
fallback also threw (the common case on a wedged/unreachable backend),
the throw skipped setMessages and left activeSessionId null with an
empty transcript — exactly the state the loader gates on
(messagesEmpty && !activeSessionId), with no terminal/error state.
2. use-route-resume's self-heal could never re-fire: resumeSession sets
selectedStoredSessionIdRef synchronously at entry (before failing), so
stuckOnRoutedSession stays false, and on an already-open idle window
neither pathnameChanged nor gatewayBecameOpen fire again. The window
never retried — naps, focus, nothing recovered it.
Fix:
- Wrap the REST fallback in its own try so a fallback failure can't strand
the loader.
- Add $resumeFailedSessionId: armed on terminal resume failure, cleared at
the next resume's entry (and left clear on success).
- use-route-resume gains a bounded backoff auto-retry (4 attempts, 1s→8s)
that re-resumes while the routed session matches the failure flag, with a
fire-time liveness recheck so a recovered session isn't double-resumed.
Regression tests cover: fallback-wrap arming the flag without throwing,
flag cleared on success, retry fires on backoff, no retry for a
non-routed/recovered session, and the retry cap.
* feat(desktop): show error + manual Retry when resume retries exhaust
When a stranded session window's bounded auto-retry gives up (gateway
resume RPC + REST fallback fail through all MAX_RESUME_RETRIES attempts),
the loader latched forever. Add a $resumeExhaustedSessionId atom armed at
the give-up point so the chat view swaps the perpetual spinner for an
explicit error state + manual Retry button. Retry / reconnect / reselect
clears the latch and resets the auto-retry counter for a fresh cycle; a
route-change away from the stranded session also clears it.
Distinct from $resumeFailedSessionId (armed during the backoff window) so
the error UI only appears once auto-recovery has actually given up, not
mid-retry. Adds i18n strings across en/ja/zh/zh-hant and 3 tests covering
latch-arms-on-exhaustion, stays-clear-while-retries-remain, and
clears-on-route-change.
* fix(desktop): address review on stranded-resume recovery layer
Follow-up to review on #47655 (PR head 253bfc0e3). Four issues on the
recovery layer:
1. (blocking) Arm $resumeFailedSessionId only when the transcript is still
empty after the REST fallback ($messages.get().length === 0), matching the
atom's documented contract and the loader's messagesEmpty gate. Previously
armed on any resume-RPC reject regardless of fallback outcome, so a window
that recovered its history via REST still auto-retried and, on exhaustion,
blanked the visible transcript behind the error overlay.
2. Reset the bounded-retry attempt counter on the $resumeExhaustedSessionId
armed->cleared edge so a manual Retry / reconnect / reselect on the SAME
stranded session gets a fresh backoff cycle, not a single one-shot attempt
that immediately re-arms the error. (Keyed on the exhausted latch rather
than the resumeFailedSessionId null->value transition the review suggested:
the auto-retry loop itself toggles resumeFailedSessionId every cycle, so
keying the reset there would defeat the MAX_RESUME_RETRIES cap. Only
resumeSession clears the exhausted latch, making its clear edge the
unambiguous manual-retry signal.)
3. Advance retryAttemptRef only when the timer actually dispatches a resume,
not at schedule time. Prevents unrelated dep changes during the 1s-8s
backoff window (transient gatewayState flip, non-stable resumeSession) from
burning attempts and hitting MAX with fewer than 4 real resume attempts.
4. Drop unrelated blank-line-only insertions in store/session.ts and
use-session-actions.ts to keep the diff tight.
Tests: +3 (RPC-fails-REST-succeeds-no-arm; manual-retry-fresh-cycle;
no-attempts-burned-on-dep-churn). All 19 resume tests + full session-hook
suite (65) pass; tsc --noEmit clean.
---------
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
* fix(photon): preserve text in mixed iMessage attachments
When an iMessage bubble carried both text and an attachment, spectrum-ts'
inbound mapper returned only buildAttachmentMessage(...), dropping the user's
typed text before Hermes could see it. The Photon adapter then had no 'group'
content path, so the text was lost entirely.
- adapter.py: handle a new 'group' content type that flattens text + attachment
items, preserving the typed text alongside cached media (extracted shared
_normalize_binary_payload helper).
- sidecar: emit 'group' content in normalizeContent, and ship
patch-spectrum-mixed-attachments.mjs which patches spectrum-ts' pinned mapper
(at npm postinstall AND at sidecar startup, so existing installs self-heal).
Windows robustness fixes on top of the original PR:
- The patcher's CLI guard used 'import.meta.url === file://${argv[1]}', which
never matches on Windows (file:/// + drive letter) — it silently no-opped.
Switched to pathToFileURL(argv[1]).href.
- The patcher matched \n-joined strings, so a CRLF checkout (Windows git
autocrlf) defeated every replacement. It now normalizes CRLF->LF for matching
and restores the original EOL style on write.
Co-authored-by: Yuhang Lin <yuhanglin@YuhangdeMac-mini.local>
* chore: map YuhangLin contributor email for attribution (#46513)
---------
Co-authored-by: Yuhang Lin <yuhanglin@YuhangdeMac-mini.local>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
A failed turn leaves a red error banner inline in the transcript. These
errors are renderer-local state (never persisted) and stay pinned to the
message until the session is reloaded, so a stale, no-longer-relevant
error (e.g. a transient provider/inference error) lingers with no way to
clear it.
Add an 'x' dismiss button inside the existing MessagePrimitive.Error
block. Clicking it clears the error from BOTH the live $messages view
and the per-runtime session cache — the view first, because
preserveLocalAssistantErrors re-grafts any still-errored message it finds
in the view onto the next session.info flush, so clearing only the cache
would let the heartbeat resurrect the banner. A bare error placeholder
(no streamed content) is dropped entirely; a turn that streamed partial
output before failing keeps its text and just sheds the error.
The control only renders when an onDismissError handler is wired, so
secondary/embedded Thread usages are unaffected. Adds the dismissError
string to all four locales (en/ja/zh/zh-hant) and two behavior tests.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
A /title typed before any message in a fresh desktop chat could be silently
lost: the session DB row is deferred to the first prompt, so session.title
found no row, only stashed pending_title, and returned pending:true. It then
relied on a post-turn apply block to write the title. When that turn never
landed under the same session_key (or the apply path didn't fire), the title
was dropped and the sidebar fell back to the first-message preview — e.g.
"/title my-custom-name" then "hello" left the session titled "hello".
Mirror the messaging gateway's _handle_title_command: an explicit /title is
clear user intent, not an abandoned draft, so create the row up front
(_ensure_session_db_row) and set the title immediately via the profile-aware
_session_db handle, returning pending:false. This also fixes the frontend
symptom for free — the desktop handler's immediate refreshSessions() now pulls
the correct persisted title instead of clobbering the optimistic value with a
still-NULL row.
If row creation can't take (DB unavailable / racing writer), fall back to the
existing pending_title queue so the post-turn apply block remains a recovery
path. The sidebar's min-messages filter keeps a titled 0-message row hidden, so
a /title'd-but-never-used draft still doesn't clutter the list.
Updates the test that asserted the old queue-on-missing-row behavior and adds a
fallback-to-queue regression test.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
* feat(search_files): path-grouped lossless densification of content matches
Content-mode search_files results repeat the {path,line,content} JSON keys
and the full path string for every match. Group consecutive same-path matches
under one path header with indented '<line>: <content>' rows — lossless (every
path/line/content byte preserved), self-describing (matches_format key), and
readable by the model with no decode step.
57.8% mean token reduction on real search_files content outputs (422-output
corpus), fires on 97% of them. Gated at >=5 matches; below that the verbose
array is left untouched. Default to_dict(densify=False) is unchanged, so no
other caller is affected.
ripgrep emits matches path-ordered, so consecutive grouping never reorders
results.
* test: accept densify kwarg in _FakeSearchResult.to_dict
The search loop-detection tests stub SearchResult with a fake whose
to_dict() must mirror the real signature now that it takes densify=.
* test(search_files): edge-case losslessness battery for densification
Adversarial single-line content (colons, indentation, unicode/emoji, empty,
trailing whitespace, quotes+commas), paths with spaces, and an explicit
one-line-per-match invariant documenting the ripgrep contract the format
relies on (0/6775 real match contents contained a newline).
* fix(logging): alias RotatingFileHandler to concurrent-log-handler
On Windows, stdlib RotatingFileHandler.doRollover() uses os.rename(), which
fails with PermissionError [WinError 32] whenever another process holds an
append-mode handle on agent.log — essentially always in Hermes (TUI, gateway,
hy_memory server, MCP servers, and on-demand CLI commands all log from separate
processes). This pinned agent.log at the 5 MiB threshold and spammed stderr
with a traceback on every emit (#44873).
Add concurrent-log-handler==0.9.29 as a core dep and alias its
ConcurrentRotatingFileHandler as RotatingFileHandler in hermes_logging.py. It
wraps the rename in a cross-process file lock (via portalocker: pywin32 on
Windows, fcntl on POSIX) so only one process rotates at a time. Aliasing keeps
every existing isinstance/class-declaration reference working unchanged.
Co-authored-by: tuancookiez-hub <tuancookiez@gmail.com>
* fix(logging): gate concurrent-log-handler swap to Windows only
The initial salvage aliased RotatingFileHandler -> ConcurrentRotatingFileHandler
unconditionally, which regressed POSIX: CLH opens lazily and rotates via its own
lock path, breaking managed-mode (NixOS) group-writable perms and eager file
creation that _ManagedRotatingFileHandler depends on. CI caught it as 2 failures
in test_managed_mode_*_group_writable on Linux.
The WinError 32 bug (#44873) is Windows-specific — POSIX renames an open file
fine, so stdlib already works on Linux/macOS. Gate the swap behind
sys.platform == 'win32': Windows uses CLH, POSIX keeps stdlib RotatingFileHandler.
- hermes_logging.py: platform-conditional import.
- tests/test_hermes_logging.py: import RotatingFileHandler from hermes_logging
(single source of truth) so the autouse fixture's isinstance checks match the
real handler class on both platforms.
- pyproject.toml/uv.lock: mark the dep 'sys_platform == "win32"' so portalocker
/pywin32 only ship where used.
---------
Co-authored-by: tuancookiez-hub <tuancookiez@gmail.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Follow-up hardening on @ehz0ah / @harshitAgr's session-switch work (#28296):
- on_session_switch no longer runs the old-session writer-drain + pending-token
GET + commit POST inline on the caller's command thread. /new, /branch,
/resume, /undo call it synchronously, so a slow drain (up to 10s) or wedged
commit blocked the user-facing command — the same hazard #41945 fixed for
end-of-turn sync. State now rotates synchronously (cheap) and the old-session
commit is offloaded to a daemon finalizer (generalized _finalize_session_async).
- Guard the (_session_id, _turn_count) pair with _session_state_lock: sync_turn
runs on the memory-manager executor thread while the session hooks run on the
command thread, so the snapshot+reset vs increment was a cross-thread race.
- _session_needs_commit checks the committed-session guard BEFORE the
turn_count>0 shortcut, closing a double-commit window when a racing sync_turn
re-increments after commit+reset.
- Add a _shutting_down flag so deferred finalizers stop POSTing against a
torn-down client; track all prefetch threads in a set so invalidate/shutdown
join every one, not just the latest slot.
Tests: regression for the non-blocking switch (asserts the caller returns while
a slow drain is parked off-thread) and the committed-guard ordering; updated the
deferred-commit test to the unified finalizer contract.
* fix(desktop): pin Electron below the broken native extract-zip install
The Windows desktop install fails at "Building desktop app": Electron's
postinstall aborts with `ERR_DLOPEN_FAILED loading
index.win32-x64-msvc.node` / "Cannot find native binding" from
`@electron-internal/extract-zip`.
Root cause is a dependency drift, not the user's machine. Electron changed
its install mechanism mid-patch-series:
electron 40.9.3 .. 40.10.2 -> @electron/get@^2 + extract-zip@^2 (pure JS)
electron 40.10.3 / 40.10.4 -> @electron/get@^5 + @electron-internal/extract-zip@^1 (native napi)
apps/desktop declares `electronVersion: 40.9.3` (the tested, JS-extract
build) but pinned the dependency as `electron: ^40.9.3`, so `npm ci`/`npm
install` silently resolved 40.10.3/40.10.4 — onto the brand-new native
extract-zip whose win32-x64 binding fails to dlopen on some Windows hosts.
The committed lockfile already carried 40.10.3, and the installer's mirror
fallback can't help (it re-runs Electron's own `install.js`, which uses the
same broken native module).
Fix:
- Pin `electron` to an exact `40.10.2` — the newest build before the native
extract-zip switch — and align `build.electronVersion` to match (Electron
Builder needs electronVersion/electronDist to match the installed binary).
- Add a root `yauzl: ^3.3.1` override so the (re-introduced) JS extract-zip
path also works on Node >= 24.16 / >= 26.1, where the old yauzl hangs.
This is the same workaround the wider Electron ecosystem adopted.
- Regenerate package-lock.json: drops @electron-internal/extract-zip and
@electron/get@5, restores @electron/get@2 + extract-zip@2 + yauzl@3.4.0.
* test(desktop): lock the Electron pin/version/lockfile consistency contract
Guards against the dependency drift that broke the Windows desktop install:
the Electron dependency must be an exact version, must equal
build.electronVersion, and the lockfile must resolve to that same version so
`npm ci` installs exactly what electron-builder packages. Asserts the
relationships, not a specific version number.
* fix(desktop): keep streaming painting in unfocused secondary chat windows
The chat transcript streams to screen through a requestAnimationFrame-gated
flush, which Chromium pauses for blurred/occluded windows. The primary window
opted out with `backgroundThrottling: false`, but the secondary "session
windows" (cmd-click pop-out, new-session, subagent-watch) hand-copied their
webPreferences and silently lost that flag — so a streamed answer in one of them
stalled until the window regained focus (reported on Windows 11). The primary
window's own comment even claimed it was "matching the secondary windows," which
was no longer true.
Hoist the chat-window webPreferences into a single shared factory
(`chatWindowWebPreferences`) in session-windows.cjs and use it for BOTH windows,
so they can never drift on this flag again.
* test(desktop): assert chat windows disable background throttling
Cover chatWindowWebPreferences: it must set backgroundThrottling=false (so the
streaming transcript paints while the window is blurred) and pass the preload
path through while keeping the hardened defaults (contextIsolation, sandbox,
nodeIntegration=false).
The model is callable via xAI OAuth but omitted from models.dev and
/v1/models listings. Merge it into the curated xAI catalog so it appears
in `hermes model` without requiring a custom model name.
Avoid applying text-only persist_user_message overrides to multimodal current-turn user messages. Early crash-resilience persistence mutates the same messages list later used for the API call, so clobbering list content drops ACP image blocks before model dispatch.\n\nAdd regression coverage for both text override behavior and multimodal preservation.\n\nCloses #44242
* feat(mcp): raise default tool-call timeout 120s -> 300s
Port from openai/codex#28234. Long-running MCP tools (web fetches,
sandboxed builds, deep-research servers) routinely exceed 120s, causing
spurious timeout failures. Codex bumped its default MCP tool timeout from
120 to 300 for the same reason.
- _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server
'timeout' config override unchanged)
- update test_default_timeout assertion
- document the default in mcp-config-reference.md
* refactor: remove agent-callable send_message tool
The agent should not decide on its own to fire off cross-platform
messages or reactions. Outbound platform messaging is handled outside
the agent loop — cron delivery, the gateway kanban notifier
(dashboard-toggled), and the `hermes send` CLI.
Removes the model-tool registration only; the send engine in
send_message_tool.py (_send_to_platform, _send_via_adapter,
_parse_target_ref, per-platform _send_* helpers) is kept intact for
those non-agent callers. Drops the now-empty 'messaging' toolset and
its `hermes tools` toggle. Yuanbao DM guidance now points at the
native yb_send_dm tool.
A multi-MB message (logged bundle, huge tool dump) froze the renderer
before any paint: Streamdown runs `preprocess` + `marked` lex over the
whole string synchronously in a useMemo, an uninterruptible long task
that no try/catch or content-visibility can help (our JS runs before the
browser ever skips layout). Tiered fix:
- Message gate: past 200KB, bypass markdown entirely and render the raw
text in `content-visibility:auto` line-chunks — synchronous work is
bounded to a string split, the browser virtualizes layout natively,
and every line stays in the DOM (selectable, find-in-page).
- Code-block budget: past 3k lines / 150KB, skip Shiki (which emits a
span per token) and render plain, chunked the same way.
- Collapse/expand: a reusable ExpandableBlock clamps code blocks and the
huge-text fallback to a 120px preview with a gradient + chevron,
expanding to 300px. The inner element is always a scroll container so
the content-visibility chunks stay lazily laid out in both states.
No content is ever dropped; the copy button (card header) always yields
the full block.
restore_skill() falls back to p.name.startswith(f"{skill_name}-") when no
archive directory matches the requested name exactly. That fallback is meant
to catch the timestamped duplicate archive_skill() writes on a name collision
(<skill>-YYYYMMDDHHMMSS), but the bare prefix also matches any unrelated
archived skill named <name>-something. So restoring "git" can pull an archived
"git-helpers" out of .archive/, rename it to "git", and report success: the
requested skill is not restored and the sibling is gone from the archive.
Constrain the fallback to the exact suffix archive_skill() produces, a 14 digit
timestamp. The exact-name match and the recursive nested-archive walk are
unchanged, so nested and timestamped restores still work; unrelated siblings no
longer match.
Fixes#47647
The OpenAI device-code login (POST auth.openai.com/.../deviceauth/usercode)
had no retry or 429 handling — a transient throttle from OpenAI surfaced as
a bare "Device code request returned status 429" with no guidance, reading
as a hard login failure.
- Retry the device-code request with capped exponential backoff (honoring
Retry-After), up to 4 attempts.
- On persistent 429, raise a clear AuthError tagged CODEX_RATE_LIMITED_CODE
(classified transient, not a credential problem) with a wait hint.
- Apply the same 429 classification to the token-exchange step (same bug
class).
Unrelated to PR #47399 (Responses-API cache headers); this is the OAuth
device-code path in hermes_cli/auth.py.
get_runtime_status_running_pid() validates liveness with a local
os.kill(pid, 0) probe. In /api/status the runtime record can be the
REMOTE health-probe body (cross-container), whose PID belongs to another
host and is display-only — probing it locally is wrong and trips the
test live-system guard (os.kill on a PID outside the test subtree).
Run the fallback only against the local read_runtime_status() record.
_profile_scope swaps process-global skills_tool/skill_manager module
attrs under an RLock; /api/status holds that scope across the
run_in_executor remote-health probe await, so a concurrent
/api/skills?profile=X request can cross-restore the status profile's
skill dir on its finally. Add _config_profile_scope (contextvar-only,
task-local, await-safe) and use it for status, which only resolves
get_hermes_home() at call time for config/env/gateway state and never
needs the skills-module globals.
Context files (AGENTS.md, CLAUDE.md, .hermes.md, .cursorrules, SOUL.md) were
hard-capped at a flat 20K chars before head/tail truncation. Among the agent
harnesses we track, only Codex caps project docs at all (32 KiB); Claude Code,
OpenCode, and Cline load them whole. The flat 20K predates large context
windows and silently truncates real-world AGENTS.md files.
B — dynamic cap: when context_file_max_chars is unset (now the shipped
default), the cap scales with the model's context window
(ctx_tokens * 4 * 0.06, floor 20K, ceiling 500K). Small-context models stay at
the historical 20K; a 200K model gets 48K; large models stop truncating real
docs. An explicit context_file_max_chars still wins. Context length is resolved
once per conversation (stable -> prompt cache untouched).
C — when truncation does happen, the marker now names the concrete file path
and tells the agent to read_file it for the full content.
Validation: 154 targeted tests + full agent/ + hermes_cli/ + test_config
(0 failures); E2E against a real 60K AGENTS.md confirms small windows truncate
with the path-bearing marker, large windows load whole, and the system prompt
is byte-stable across rebuilds.
unicode-bidi:plaintext (#44596) resolves text direction per line, but
list markers and the blockquote border are box chrome driven by the CSS
direction property, which plaintext never sets, so an RTL list renders
its numbers stranded at the far left edge. CSS cannot close this gap
(:dir() only reads the dir attribute, never plaintext resolution), so
ul/ol/blockquote carry dir="auto" and the browser resolves their box
direction natively while the plaintext rules keep owning the text.
Inline code carries dir="ltr", which HTML's auto algorithm skips,
matching the no-vote contract the CSS isolate already gives it.
Rolling back to the oldest curator snapshot failed and deleted that
snapshot. rollback() takes a safety snapshot first, and snapshot_skills()
ends by pruning the backups directory down to keep (5 by default). At the
steady keep limit that prune removed the oldest snapshot, which is the very
one being restored, so the extract found no skills.tar.gz and the rollback
stopped with "snapshot extract failed (state restored)".
Thread an optional protect set through snapshot_skills() into _prune_old()
so the pre rollback safety snapshot can never evict the snapshot being
restored. Add two regression tests covering restore of the oldest snapshot
at the keep limit.
Fixes#47612
The curator now defaults to prune-only: the deterministic inactivity pass
(mark stale / archive long-unused skills) still runs whenever the curator is
enabled, but the opinionated LLM umbrella-building consolidation fork is OFF
by default.
- agent/curator.py: add DEFAULT_CONSOLIDATE=False + get_consolidate(); gate
the forked aux-model review in run_curator_review behind it (new consolidate
param, None=read config). When off, the LLM pass is skipped entirely (no
aux-model cost); the run is still recorded and reported.
- config.py: add curator.consolidate (default false); v29->v30 migration seeds
the key for existing installs without clobbering a user-set value.
- hermes_cli/curator.py: 'hermes curator run --consolidate' override; status
shows consolidate state; prune-only notice on run.
- docs + tests.
When refresh_launchd_plist_if_needed() runs from inside the gateway's own
launchd process tree (agent-initiated self-update via the terminal tool), a
direct launchctl bootout tears down the service's process group — including
the CLI doing the refresh — before the follow-up bootstrap can run. The
gateway is left unloaded and KeepAlive can't revive it (#43842).
Detect in-service execution via gateway.status.get_running_pid() +
_is_pid_ancestor_of_current_process(), and delegate the bootout->bootstrap to
a detached (start_new_session=True) helper that survives the process-group
teardown. The normal out-of-tree CLI path is unchanged.
Fixes#43842.
A turn that ends in an error (e.g. an out-of-funds state) was being
re-rendered in unrelated threads. On a warm thread switch the on-screen
`$messages` still belongs to the previously viewed thread, and
`flushPendingViewState` fed it into `preserveLocalAssistantErrors`, which
grafted the prior thread's failed turn onto the newly opened one. Because
the polluted view then became the next switch's baseline, the error
cascaded into every thread the user visited.
Only carry local errors across a view flush when the on-screen baseline is
the same session being flushed; the cached state we publish already retains
that session's own errors. Also surface the turn error as a global toast
even when the failing turn ran in a background thread, since the error
blocks all subsequent interactions until the user acts.
The double-underscore prefix swap fixed bare native tools but SKIPPED tools
already named mcp_<server>_<tool> (real MCP servers, e.g. mcp_linear_get_issue):
they went on the OAuth wire single-underscore and still tripped Anthropic's
third-party billing classifier -> HTTP 400 'extra usage, not plan limits'.
Verified empirically against a live Max subscription: a single mcp_ tool flips
the whole request to the extra-usage lane; mcp__ is accepted.
- build_anthropic_kwargs: promote ANY leading single-underscore mcp_ to mcp__
(bare names -> mcp__name; mcp_<server>_<tool> -> mcp__<server>_<tool>),
never double-prefixing an already-mcp__ name. Same for tool_use blocks in
history.
- normalize_response: reverse the mcp__ wire name back to whichever original
the registry knows — the single-underscore mcp_<server>_<tool> form for MCP
server tools, or the bare name for native tools — preferring a name that
already resolves natively.
- Tests rewritten to assert the invariant: ZERO single-underscore mcp_ names
reach the OAuth wire, and the mcp__ round-trip resolves back to the
registered name for both native and MCP-server tools.
Builds on liuhao1024's mcp__ prefix commit (cherry-picked). Closes the
MCP-server gap that left any session with an MCP server configured still
billing to extra usage.
Anthropic's Claude-Code request classifier treats tool names with a
single-underscore `mcp_<x>` prefix as non-Claude-Code / third-party,
routing the request to extra-usage billing (HTTP 400). Real Claude Code
uses double underscores: `mcp__<server>__<tool>`.
Change the tool-name prefix from `mcp_` to `mcp__` in both the outgoing
path (build_anthropic_kwargs) and the incoming path
(normalize_response). Update the skip-guard to check for both `mcp_`
and `mcp__` prefixes so native MCP server tools (which use the legacy
single-underscore format) are not double-prefixed.
Fixes#46675
`hermes login` was removed in favor of `hermes auth` / `hermes model`, but
the subparser still validated `--provider` against a hardcoded choices list
(nous, openai-codex, xai-oauth). Running `hermes login --provider anthropic`
therefore crashed in argparse with `invalid choice: 'anthropic'` *before* the
deprecation handler could print the redirect to `hermes model` — so a user
trying to authenticate a perfectly valid provider just saw a hard error and
assumed the feature was broken rather than relocated.
- Drop the restrictive `choices=` so every `--provider` value reaches the
deprecation handler (which ignores the value and prints guidance).
- Omit the subparser `help=` kwarg so the dead command no longer advertises
itself in `hermes --help` (#24756). Avoids the `==SUPPRESS==` placeholder
leak that `help=argparse.SUPPRESS` emits for a top-level subparser on 3.12+.
- `hermes login [--flags]` still reaches the actionable deprecation message
for old scripts/aliases; `hermes login --help` shows the redirect.
Picks up the intent of the inactivity-closed #24902, rebased onto the
post-refactor parser location (hermes_cli/subcommands/login.py) and extended
to fix the whole bug class (any provider value), not just hiding from --help.
Tests: parametrized provider acceptance + help-suppression (no SUPPRESS leak).
The docstring described a token as path-like when it contains a "/"
separator, but the keystroke-latency fix now excludes "://" scheme tokens
(URLs) even though they contain "/". Document the exclusion so the contract
matches the behavior.
Regression coverage for the keystroke-latency fix: a URL token contains
"/", so the bare-slash path heuristic used to return it as a path word and
run os.listdir on every keystroke. Assert _extract_path_word rejects
http/https/ssh scheme tokens, that ordinary paths (incl. a bare colon) are
unaffected, and that the completer never touches the filesystem for a URL
under the cursor.
The interactive CLI input box runs its completer with
`complete_while_typing=True`, so `SlashCommandCompleter.get_completions`
is invoked on *every* keystroke. That completer does blocking I/O:
fuzzy `@`-file indexing shells out to `rg`/`fd` (up to a 2s timeout) and
file-path completion calls `os.listdir` + `stat`. Because the completer
was passed inline (never wrapped in `ThreadedCompleter`), all of this ran
synchronously on the prompt_toolkit event loop, stalling the render after
each key — very noticeable on WSL2 and other slow-filesystem setups
("typing in the prompt box being very latent").
Two fixes:
- Wrap the input completer in `ThreadedCompleter` so completion work runs
off the UI event loop and never blocks rendering between keystrokes.
- Stop treating URLs as file paths in `_extract_path_word`: a token like
`https://example.com/x` contains `/`, so it triggered `os.listdir` on
every keystroke while typing/pasting a link (listing a bogus `https:`
dir) for a completion that can never be useful. Skip any token with a
`://` scheme separator.
(cherry picked from commit b5be2ba276)
Streamdown runs our `preprocess` inside its own useMemo, and the user
bubble runs `extractEmbeddedImages`/directive parsing inside theirs — so
anything thrown while rendering one message (a regex/stack overflow on
adversarial content) escapes to the ROOT error boundary and takes down
the entire app, as seen in a reported `RangeError: Maximum call stack
size exceeded` from a single message.
Wrap both the assistant preprocess pipeline and the user-message
directive passes in try/catch that degrade to the raw text. One bad
message now renders plain instead of nuking the transcript.
`normalizeFenceBlocks`/`pushProseFence` appended block bodies with
`out.push(...lines)`, which spreads every line as a separate call
argument. A single message carrying a large fenced block (a logged
minified bundle, base64 blob, or big tool dump — common in long
sessions) overflows V8's argument-count limit and throws
`RangeError: Maximum call stack size exceeded`, breaking the transcript
render. Compression doesn't save us: it gates on tokens vs. window, not
a single message's line count, and the protected recent tail renders
verbatim regardless.
Append iteratively via a small `extend()` helper. Behavior is identical
for normal-sized blocks.
sync_turn's bounded join could drop a still-alive previous worker by
replacing the single _sync_thread slot. The dropped worker kept POSTing
under the old sid but was no longer visible to on_session_end /
on_session_switch, so the commit could fire while orphaned writes were
still in flight — those writes landed past the commit boundary and were
never extracted.
Replace the single _sync_thread slot with _inflight_writers:
Dict[sid, Set[Thread]]. Writers self-register on spawn (sync_turn,
on_memory_write) and self-deregister on exit. The commit path drains
_drain_writers(sid, 10.0) and skips the commit if any writer for that
sid is still alive after the bounded budget.
Also trim inline review-rationale comments to short invariants per
reviewer style ask: "commit only after session writes drain" and
"drop prefetch results from older switch generations."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 7537ee6f5b)
Three follow-ups from review on #28296:
1. Sync worker outliving the bounded join. Each sync_turn POST has
_TIMEOUT=30s and there are two per turn, but on_session_end and
on_session_switch only join for 10s. If the worker is still alive
after the join, committing the old session orphans the worker's
late writes past the commit boundary — they land in an already-
committed session and never get extracted. Both hooks now re-check
is_alive() after the join and skip the commit when the worker
hasn't drained.
2. on_memory_write late session_id capture. Same shape as the
pre-fix sync_turn: f-string for the post path read self._session_id
inside the worker, so a switch between thread spawn and post call
landed the memory note in the new session. Snapshot sid at call
time, same pattern as sync_turn.
3. Stale prefetch repopulating the new session. The pre-switch
drain+clear only protects against workers that finish before the
join completes; one finishing after the clear would write its
result into the new generation's slot. Added a monotonic
_prefetch_generation; workers capture it at spawn and refuse to
write if it has advanced.
Tests: existing in-flight-sync test updated to drain (it tested the
join-before-commit happy path); four new tests cover hung-writer skip
on end + switch, on_memory_write sid capture, and prefetch generation
gating. 177/177 memory tests pass.
(cherry picked from commit 3791a87dbe)
Two hardening fixes prompted by review on #28296:
1. sync_turn() now snapshots the target session id before spawning the
worker. The previous code read self._session_id inside the worker, so
a worker delayed past on_session_switch's bounded join could read the
rotated-in NEW id and write the OLD turn's messages into the wrong
session.
2. on_session_end() resets _turn_count to 0 after a successful commit,
making the old-session commit path idempotent with the new switch
hook. /new and compression call commit_memory_session() (which fires
on_session_end) immediately before on_session_switch; without this,
the old session would be committed twice. On commit failure we leave
_turn_count > 0 so on_session_switch retries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 2ea8d5c537)
OpenVikingMemoryProvider only overrides on_session_end and inherits the
base-class no-op for on_session_switch. When the agent rotates session_id
(via /new, /branch, /reset, /resume, or context compression), the
provider's cached _session_id stays at the value initialize() captured.
All subsequent sync_turn writes then land in the already-closed old
session, and on_session_end tries to commit it a second time — the new
session never accumulates messages and never triggers memory extraction.
The fix mirrors the pattern Hindsight uses (#17508):
1. Wait for any in-flight sync thread to drain under the OLD _session_id
before we mutate it, otherwise the commit below races the last
message write.
2. Commit the old session if it accumulated turns — same extraction
semantics as on_session_end. Skip if empty (nothing to extract).
3. Drain in-flight prefetch from the old session and clear its cached
result so the new session doesn't see stale recall.
4. Rotate _session_id to the new value and reset _turn_count.
Commit failures are swallowed (logged at WARN) so a flaky server can't
strand the provider on the old session forever — same posture as the
existing on_session_end commit.
(cherry picked from commit a1e7185e8a)
Closes#47111
is_container() only recognized Docker (/.dockerenv), Podman
(/run/.containerenv), and docker/podman/lxc markers in /proc/1/cgroup.
Under cgroup v2 (Kubernetes/k3s on containerd or CRI-O) /proc/1/cgroup
collapses to a single "0::/" line with no runtime marker, so
is_container() returned False on every containerd/CRI pod.
That false negative bypassed container-aware behavior across the CLI.
The most damaging case (reported): even after #46290 fixed
detect_service_manager() to gate on _s6_running() alone, other
is_container() call sites (profile home resolution, gateway behaviors,
config, doctor) still misbehave on containerd.
Broaden detection conservatively:
- KUBERNETES_SERVICE_HOST env var (present in every k8s pod).
- kubepods/containerd/crio markers in /proc/1/cgroup (cgroup v1 nested).
- same markers in /proc/self/mountinfo as a cgroup-v2 fallback.
Tests: 3 new (k8s env, kubepods cgroup, cgroup-v2-via-mountinfo) plus the
existing negative case hardened to stub mountinfo + env; 108 constants +
service_manager tests pass.
Follow-up to salvaged PR #41633: the timestamp prefix injection was
unconditional. Gate the in-context render behind
gateway.message_timestamps.enabled (default false) at both the live-message
and history-replay sites; timestamp metadata is still captured + persisted
regardless so the toggle can be flipped on later. Add DEFAULT_CONFIG entry,
docs, and gate tests.
Consolidates these related Amy fork patches:
- 429830f39 feat(gateway): inject message timestamps into user messages for LLM context
- 3c3d6fac0 fix: handle both ISO string and epoch float timestamps in history replay
- 2874f7725 feat: human-friendly timestamp format with weekday and timezone name
- 3735f4c8b fix: render gateway message timestamps once
* fix(desktop): keep the pre-session model pick selected in the picker
The composer picker derived its "current" row from `model.options ?? store`,
so model.options always won. Pre-session that query returns the PROFILE
DEFAULT, not the sticky composer pick — so selecting a model before a session
exists left the checkmark (and the picker's "current" line) on the default,
making the pick look ignored even though the pill updated.
Add `currentPickerSelection()`: with a live session the gateway's model.options
is authoritative; pre-session the sticky `$currentModel`/`$currentProvider`
wins, falling back to options. Wire it into ModelMenuPanel and ModelPickerDialog.
* feat(desktop): global reasoning/speed defaults in Settings → Model
The composer picker is now sticky-UI/per-session only and never writes the
profile default (#46959), but Settings → Model had no reasoning/speed control
and `agent.reasoning_effort` wasn't in the curated config surface at all
(`service_tier` was buried in Advanced) — so there was nowhere to set the
profile default that crons/subagents/messaging resolve from.
Add capability-gated Reasoning (effort) + Fast controls beside the main model,
gated by the applied model's reported capabilities (reasoning defaults on, fast
off when unreported — same as the composer). They read/write `agent.reasoning_effort`
and `agent.service_tier` by round-tripping the config record, matching the
gateway's value semantics (service_tier "fast"/"priority"/"on" ⇒ fast).
* refactor(desktop): don't open the reasoning select from its row label
A <label> wrapping the Select forwarded text clicks to the trigger, opening
the dropdown unexpectedly. Plain row for reasoning; Fast stays a <label> so
clicking its text toggles the switch (expected for a checkbox-like control).
* fix(desktop): re-download Electron binary via mirror when pack fails (#47266)
Since #38673 pinned build.electronDist to node_modules/electron/dist,
electron-builder reads the Electron binary straight from there and never
downloads it during `npm run pack`. That dist tree is only produced by the
electron package's postinstall (install.js) during `npm ci`. When that
download is blocked or throttled (GitHub's release host is unreachable in
some regions), the dist is missing and the build dies with:
The specified electronDist does not exist: .../node_modules/electron/dist
The existing ELECTRON_MIRROR fallback in all three desktop-build paths
(scripts/install.ps1, scripts/install.sh, and `hermes desktop` in
hermes_cli/main.py) re-ran `npm run pack` with ELECTRON_MIRROR set — but
pack never downloads Electron anymore, so the mirror was never used and the
retry re-read the same missing dist. The fallback was effectively dead.
Drive the mirror through electron's own downloader instead:
- Add a dist-presence check + a downloader helper (Test-ElectronDist /
Restore-ElectronDist, _electron_dist_ok / _restore_electron_dist,
_electron_dist_ok / _redownload_electron_dist) that wipes a partial dist
+ the path.txt version marker (electron's install.js short-circuits on it)
and re-runs `node install.js`, optionally via a mirror.
- On the first retry, repopulate a missing dist from the canonical source;
on the mirror retry, re-fetch through npmmirror.com, then pack.
- Gate the re-download on the dist check so an unrelated build failure
(tsc/vite) doesn't trigger a pointless ~200 MB refetch, and skip the final
pack when the binary still can't be fetched instead of failing the same way.
* test(desktop): cover Electron dist re-download mirror fallback (#47266)
Add behavior coverage for the electronDist re-download fix:
- _electron_dist_ok across linux/win32/darwin, including the partial-dist
case (dir present but binary missing) that makes the pinned electronDist
fail.
- _redownload_electron_dist: no-op when the binary is present, bail when
install.js is absent, wipe a stale dist + path.txt marker and run
electron's downloader with ELECTRON_MIRROR injected, and report failure
when the download still produces no binary.
- `hermes desktop`: the mirror fallback now drives electron's own downloader
before re-running pack, and skips the final pack entirely when the binary
can't be fetched.
Replaces the old mirror test that asserted the (now-fixed) dead behavior of
re-running `npm run pack` with ELECTRON_MIRROR set — pack never downloads
Electron under the pinned electronDist, so that retry could never help.
Guards that two user-defined custom endpoints exposing an overlapping
model each keep their full catalog — the dedup must never cross-filter
two user-defined rows against each other.
The /model interactive picker resolved a base_url from user credentials
but never passed it to ProviderProfile.fetch_models(), causing the
picker to always query the provider's hardcoded default endpoint
instead of the user's custom URL (e.g. a company litellm proxy).
- providers/base.py: add optional base_url parameter to fetch_models()
- hermes_cli/models.py: pass resolved base_url to fetch_models()
- Update all subclass overrides for signature compatibility
- Add 6 regression tests covering override, fallback, and integration
Support files under references/, templates/, assets/, and scripts/ are progressive-disclosure data loaded through skill_view(..., file_path=...). They should not be treated as standalone skills during discovery or collision checks.
This prevents archived skill packages or support markdown files inside a real skill from shadowing active skills with the same name while still allowing top-level categories named scripts/templates/assets/references.
Tests cover:
- pruning nested SKILL.md files inside skill support directories
- preserving support-named top-level categories
- avoiding skill_view collisions from support markdown
- keeping archived package SKILL.md files accessible only through file_path
Salvage follow-up for PR #29575: add regression tests for the section-3
no-api_key /v1/models probe (probes bare endpoints, skips when explicit
models set) and add the contributor AUTHOR_MAP entry.
Section 3 of list_authenticated_providers (user-defined endpoints from
the providers: config section) required an api_key before probing the
endpoint's /v1/models for live model discovery. This broke local
self-hosted backends (llama.cpp, Ollama, vLLM, etc.) that don't require
authentication — they would only ever show the single default_model
from config instead of the full model catalog.
Section 4 (custom_providers list) already handled this correctly with
the policy: probe when api_key is set OR when no explicit models are
configured. Apply the same logic to Section 3 so local backends get
full model discovery without requiring a placeholder api_key workaround.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cover #47375 fix: record-on-rich-send + lookup-on-reply round trip,
lookup miss leaving reply_to_text None, and precedence (native quote
and echoed caption both win over the index fallback).
Telegram does not echo a sendRichMessage's content back in
reply_to_message (.text/.caption empty, .api_kwargs None), so replies
to rich sends (briefings, the gateway's own rich finals) arrived with
no quotable text and the [Replying to: ...] injection was skipped.
Remember message_id -> text at send time in a best-effort JSON index
(gateway/rich_sent_store.py), and recover it on inbound when text and
caption are both empty. Best-effort and no-throw throughout: any
failure degrades to prior behavior and never breaks a send or message.
Salvaged from #47375 by @x1erra. Dropped the cross-platform run.py
reply-prefix rewrite (out of scope; bloated every reply on every
platform) and scrubbed a docstring reference to an out-of-repo script.
Kept the inbound reply_to logging enrichment used to verify the fix.
The #45954 model-dedup builds `user_models` from every is_user_defined
row, then strips those model IDs from every row where is_aggregator(slug)
is True. But is_aggregator() returns True for *every* `custom:*` slug, and
list_authenticated_providers emits named custom providers with slug
`custom:<name>` and is_user_defined=True. So a user's own custom provider
is treated as an aggregator and filtered against user_models — which holds
exactly its own models (the row helped build that set). Every model is
removed, the row drops to zero, and the provider disappears from the model
picker.
Guard the dedup loop to skip is_user_defined rows: a user's configured
provider is never an aggregator duplicate of itself. Built-in aggregators
(openrouter, etc.) are still deduped as before. Adds a regression test.
Reflect the default-model change in the xAI Grok OAuth guide, the web
search docs (EN + zh-Hans), and the web provider docstring. grok-4.3 is
kept in the model tables as the previous default; the Nous/OpenRouter
aggregator catalog still lists grok-4.3 and is left unchanged.
Switch the default model for the xAI/Grok provider and the xAI web
search backend from grok-4.3 to grok-build-0.1. grok-build-0.1 is
already recognized by the model metadata, so no new model definition
is required; grok-4.3 remains selectable.
Follow-up to salvaged PR #41624:
- Remove stray urllib.parse import in run_agent.py (cherry-pick cruft, unused)
- Add tests: session:compress emits with correct context, no-callback is
safe, and a callback exception does not break compression
* feat(desktop): stream subagent replies into watch windows
A desktop watch window resumes a child session lazily (no full agent) and
mirrors the parent-relayed `subagent.*` events into native child-session
stream events. The child's streamed reply text was never relayed, so the
window sat blank while the subagent "talked".
- delegate_tool: forward the child's `run_conversation` stream tokens up the
progress relay as `subagent.text` (inert under CLI/TUI — their progress
handlers ignore non-tool event types; only a gateway watch window mirrors it).
- server: mirror `subagent.text` -> `message.delta` on the child sid only, and
skip the parent emit (per-token frames are meaningless on the parent session,
which shows the child via the spawn tree). Demote `subagent.start` to a
one-time goal header and drop the noisy `subagent.progress` mirror — tools
already mirror natively.
- server: guard `_start_agent_build` so a lazy watch session spectating an
in-flight child stays lazy; incidental RPCs were upgrading it to a full
agent mid-stream and silently killing the mirror.
* fix(desktop): keep watch-window chat clear of titlebar chrome
Secondary windows (new-session scratch, subagent watch, cmd-click pop-out)
hide the titlebar tool cluster + session header, so the transcript ran to the
window's top edge and streamed text slid up under the OS traffic lights.
- Gate the hidden chrome on `isSecondaryWindow()` everywhere (app-shell,
chat header, thread list) instead of the narrower new-session flag.
- Add a fixed opaque drag-strip at the top of the secondary-window transcript:
content padding alone scrolls away with the text, so the strip masks
anything behind it and keeps the window draggable like the main header.
* fix: WSL subagent window
* fix: subagent window top padding
---------
Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Follow-up to salvaged PR #41619: replace the module-global
_truncation_warnings list with a contextvars.ContextVar so concurrent
gateway-session prompt builds can't drain or clear each other's pending
warnings (cross-session leak). Adds a context-isolation test.
PROBLEM: Automatic context files such as SOUL.md and AGENTS.md were capped by a hardcoded CONTEXT_FILE_MAX_CHARS value. Amy's local fork had raised that constant from 20K to 25K so a larger SOUL.md would not be silently truncated, but the hardcoded 25K value changed upstream default behavior and made the patch less generally useful.
SOLUTION: Restore the upstream-compatible 20K default, add a context_file_max_chars config setting for users who intentionally keep larger identity/project-context files, keep chat-visible truncation warnings, and document the new setting. Tests cover the default, config override, explicit max_chars precedence, and the warning text.
Z.ai released GLM 5.2 on 2026-06-15, available on OpenRouter:
- https://openrouter.ai/z-ai/glm-5.2
GLM-5.2 is Z.ai's flagship for long-horizon tasks, shipping a 1M-token
context window (up from 200K on GLM 5.1) and tool calling. Per the
OpenRouter API: text-only, context_length 1048576, tools supported.
No separate -fast variant exists.
The 1M context length, native zai picker entry, setup wizard, and Z.ai
coding-plan auth entries for glm-5.2 already landed on main. This fills
the remaining gap: the two aggregator surfaces where glm-5.1 appears but
glm-5.2 did not.
Changes:
hermes_cli/models.py
- Add z-ai/glm-5.2 to the OpenRouter fallback snapshot (OPENROUTER_MODELS)
and the Nous Portal curated list (_PROVIDER_MODELS["nous"]), newest
flagship first. Live catalogs surface it automatically when reachable;
the fallback lists matter when the manifest fetch fails.
website/static/api/model-catalog.json
- Regenerated via scripts/build_model_catalog.py (not hand-edited) so the
manifest stays in sync with the source lists; guarded by
tests/hermes_cli/test_model_catalog.py.
The generic live+curated merge (commit 630b438) seeded the merged list
from live results, demoting curated-only models below live ones. That
regressed #46309, which deliberately surfaces the newest curated model
(kimi-k2.7-code) FIRST in the native picker even when the live /models
listing lags. Restore curated-first ordering: curated entries lead (in
catalog order), live-only entries are appended for discovery. This keeps
the #46850 fix (zai glm-5.2 now appears) without the kimi regression.
Also switch the validate_requested_model curated fallback (commit
ee7b8a4) from provider_model_ids() — which triggers a second, uncached
live /models fetch with its own 8s timeout and may resolve different
credentials than the api_key/base_url just probed — to the pure-catalog
helper _model_in_provider_catalog(). Membership is checked against the
shipped catalog only, with no extra network call.
Tests: restore the curated-first assertion in
test_kimi_coding_live_catalog_does_not_hide_curated_k2_7_code; update
the new merge tests to curated-first semantics; de-circularize the
validation fallback tests to patch _PROVIDER_MODELS (the real source)
instead of mocking the function under test.
Generalizes #32663 (@ehz0ah). The slash-skill scaffolding pollution
affected every auto-syncing memory provider — mem0, hindsight, retaindb,
byterover, honcho, supermemory all store/embed the raw user turn, so a
/skill invocation poisoned their stores with the full skill body, not just
openviking.
- Lift the contributor's parser into agent/skill_commands.py as the canonical
extract_user_instruction_from_skill_message(), co-located with the message
builders so the markers can't drift.
- Strip once in MemoryManager.{prefetch_all,queue_prefetch_all,sync_all} —
fixes the whole provider fan-out, bare /skill turns are skipped entirely.
- OpenViking's _derive_openviking_user_text() now delegates to the shared
helper as defense-in-depth (no duplicated marker literals).
- Marker-drift regression now asserts against the canonical skill_commands
constants; add manager-level coverage proving every provider gets clean text.
Rewrites the Shop personal-shopping-assistant skill to use the
@shopify/shop-cli (with a full direct-API fallback in references/),
replacing the previous curl-only shop-app skill.
- Rename optional-skills/productivity/shop-app -> shop
- Add references/: catalog-mcp.md, direct-api.md, safety.md, legal.md
- Catalog discovery via Shopify Global Catalog MCP (search / lookup /
get-product), device-authorization sign-in, UCP agent checkout with
delegated spending budget, and order tracking / returns / reorder
- One-product-per-message presentation rules + per-channel overrides
- Expanded security, safety, and legal guidance
Website docs are auto-generated from SKILL.md by CI
(website/scripts/generate-skill-docs.py), so no docs are hand-edited here.
Support linking, copying, and creating ovcli.conf during OpenViking memory setup.
Make setup cancellation write nothing and cover OpenViking/Hindsight picker cancellation paths.
Clicking a model row in the composer dropdown now commits and closes the menu
(via a close context); the hover-revealed reasoning/fast submenu stays open to
tweak. The pill shows a quiet braille loader instead of literal "No model"
until one resolves, and steer takes over the mic slot while typing into a
running agent.
A live config.set model switch already moved the next API call to the new model,
but the conversation could still restore an old sessions.system_prompt snapshot
whose Model/Provider lines named the previous runtime. That made "what model are
you?" answer from stale metadata even while inference ran on the new model.
After a live switch we now refresh the stored system prompt and append a real
system-history pivot (not a fake user turn) so the transcript itself records the
new model/provider. Restore also rejects already-stale prompt snapshots when
their Model/Provider lines disagree with the runtime, so existing bad sessions
self-heal.
The picker no longer touches the profile default. Model/effort/fast live as
plain UI state persisted in localStorage, so a pick follows across Cmd+N and
restarts instead of snapping back. New chats ship that state through
session.create as per-session overrides; live chats still scope switches to the
current session. Settings -> Model remains the only surface that writes the
profile default.
The gateway now accepts those session.create overrides, builds the agent with
them directly, reflects them in the immediate session.info payload, and writes
the chat's own model_config into the lazy DB row so reconnect/resume restores
that chat instead of the global default.
* fix(skills): guard recursive skill delete against tree-escape
Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire
working directory: a built-in-skill sentinel location resolved to the server
cwd and the skill-removal endpoint ran a recursive delete on it.
Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the
agent-facing skill_manage(action='delete') path did a bare
shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target():
refuse to rmtree a path that (1) isn't strictly inside a known skills root,
(2) is a skills root itself, or (3) is reached via a symlink/junction.
Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree
all refused). E2E verified with real symlink + file I/O.
* feat(desktop): allow /browser connect on a local gateway
/browser was hardcoded as terminal-only in the desktop slash palette, so
the chat GUI rejected it with "only available in the terminal interface."
The TUI already drives the live CDP connection via the browser.manage RPC.
Wire the same RPC into the desktop dispatcher as a /browser action handler,
gated to local-gateway connections ($connection.mode !== 'remote'). connect
mutates BROWSER_CDP_URL (and may launch Chrome) in the gateway process, so
it's only meaningful when that process runs on this machine; a remote
gateway gets a clear "local gateway only" message instead.
PROBLEM: Mattermost threads can become invalid or enormous, exposing two failure modes: internal scratch/reasoning/commentary displays could leak into persistent Mattermost threads via global display toggles, while rejected threaded user-visible replies could disappear unless every failed send fell back flat. A broad flat fallback would pollute channels with tool/status/progress noise.
SOLUTION: Require explicit Mattermost platform opt-in for scratch displays, keep using the existing notify=True metadata marker for user-visible final text/media/file replies, and allow the Mattermost plugin adapter to flat-fallback only notify-worthy sends whose threaded POST failure looks like a broken root/thread. Keep tool/status/progress and other non-notify sends thread-strict. Add regression tests for display opt-in, notify-only broken-thread fallback, generic API failure suppression, and stream notify metadata.
Verification: tests/gateway/test_mattermost.py tests/gateway/test_stream_consumer.py tests/gateway/test_stream_consumer_thread_routing.py tests/gateway/test_stream_consumer_fresh_final.py tests/gateway/test_stream_consumer_draft.py; tests/gateway/test_session_api.py tests/gateway/test_status_command.py tests/gateway/test_resume_command.py tests/hermes_cli/test_commands.py; py_compile touched gateway files; git diff --check.
Session: Mattermost thread 6qg8e9dd1pd9pkhi74xyaa1mry, 2026-06-01.
`BasePlatformAdapter.send_multiple_images` passes `metadata=metadata` to
`send_image` / `send_image_file` / `send_animation` on every send. The
WhatsApp and email `send_image` overrides stopped their signature at
`reply_to`, so any image delivered as a URL (the common case — image-gen
backends return URLs) raised:
TypeError: send_image() got an unexpected keyword argument "metadata"
and the image silently failed to send. Their sibling overrides
(`send_image_file` / `send_video` / `send_voice` / `send_document`)
already absorb it via **kwargs, which is why only plain image-URL sends
broke.
- whatsapp/email `send_image`: accept `metadata` (matches the base
signature); WhatsApp forwards it to the super() text fallback.
- Add `tests/gateway/test_media_metadata_contract.py`: asserts WhatsApp +
email accept it, plus a best-effort sweep over every adapter so the next
slip fails at test time instead of in production.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Slack caps apps at 50 slash commands and the registry is at that ceiling, so
adding /debug clamped it out of the native list and broke the telegram-parity
test (debug on Telegram, absent from Slack native slashes, in neither
exclusion set). Add 'debug' to _SLACK_VIA_HERMES_ONLY — same treatment credits
already gets. /debug stays native on CLI/TUI/Telegram/Discord and reachable via
/hermes debug on Slack.
test_session_create_no_race_keeps_worker_alive flaked on CI shard 3 with
'build thread unregistered its own notify despite no race' while passing
20/20 in isolation locally. Root cause: daemon build threads from sibling
session.create tests in the same shard process mutate the shared
server._sessions dict under _sessions_lock and can replace/pop entries
mid-run, flipping this build thread's 'replaced' check (server.py:1011) to
True and triggering a spurious unregister_gateway_notify.
Fix is test-only: snapshot + clear server._sessions before the request so
the test sees only its own session, restore siblings in finally. Also assert
agent_ready.wait() actually returned True (was silently ignoring timeout) and
bump the timeout 2s -> 10s for loaded CI runners.
Classify exhausted pool-only openai-codex credentials as quota/rate-limited instead of missing auth. This prevents auth status and runtime credential resolution from reporting missing credentials when a valid manual:device_code pool credential exists but is temporarily in a 429 usage-limit cooldown.
Adds regression coverage for pool-only Codex auth status and runtime resolution.
Add display.tool_progress_style setting to control how tool progress
messages are displayed in chat platforms:
- 'accumulate' (default): Edit a single message with all tool calls
(new v0.9.0 behavior)
- 'separate': Send each tool call as its own message, interleaved
with thinking messages (pre-v0.9 behavior, better readability)
The setting participates in the per-platform display override system
and can be set globally or per-platform.
Files: gateway/display_config.py, gateway/run.py
When display.memory_notifications is set to 'verbose', skill_manage
notifications now show meaningful change details instead of just the
generic tool message.
Before (verbose mode):
💾📝 Patched SKILL.md in skill 'gogcli' (1 replacement).
After (verbose mode):
💾📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..."
Changes:
- skill_manager_tool.py: _patch_skill() now includes old/new string
previews (truncated to 200 chars) in the result via '_change' key.
_create_skill() and _edit_skill() include skill description from
frontmatter for verbose create/edit notifications.
- run_agent.py: Background review notification builder now reads the
'_change' dict from skill tool results and formats descriptive
notifications per action type (patch → old→new diff, create/edit →
description preview). Falls back to generic message when _change
data is unavailable (backwards compatible).
This is especially useful when subagents patch skills, since neither
the user nor the parent agent can see what the subagent changed.
Background memory reviews now support three notification modes,
configured via display.memory_notifications in config.yaml:
off — no chat notification (still logged to stdout/HA log)
on — generic '💾 Memory updated' (default, unchanged behavior)
verbose — content preview with action indicators:
💾 Memory ➕ Hermes Repo liegt unter /config/amy/hermes-agent/...
💾 Memory ✏️ Updated repo path from claude-code to hermes-agent...
💾 Memory ➖ old entry about claude-code path...
Previews are truncated to 120 chars for adds/replaces, 60 for removes.
Each action gets its own line in verbose mode for readability.
Files: run_agent.py, gateway/run.py
Streamed Telegram replies that finalize through editMessageText were
converted to MarkdownV2, which has no table syntax and rewrites pipe
tables into bullet lists — users saw a table while streaming that
collapsed to a list at the last moment.
Finalize now edits the existing preview IN PLACE via Bot API 10.1's
editMessageText rich_message parameter when the content has constructs
the legacy path degrades (tables, task lists, <details>, block math).
No fresh send + delete, so no duplicate-preview flicker — the reason
#46206 reverted the fresh-final re-send path. prefers_fresh_final_streaming
stays False; the in-place edit replaces it.
- _needs_rich_rendering(): rich reserved for table/task-list/details/math
(adapted from #45995, @YonganZhang); plain replies stay on MarkdownV2.
- _try_edit_rich(): editMessageText + rich_message via do_api_request,
mirroring _try_send_rich's fallback/latch/transient contract.
- edit_message finalize tries rich in place before the 4,096 overflow
pre-flight (rich cap is 32,768), falling back to legacy on rejection.
- rich_messages default flipped back to True (DEFAULT_CONFIG + adapter).
- docs (en + zh-Hans) + cli-config example updated to default-on.
Closes the root cause behind #45911 / #46009.
When live /v1/models responds but omits a model that exists in the
curated static catalog, validate_requested_model now accepts it with
a note instead of rejecting. This covers the /model slash-command path
(the picker path was already fixed in the parent commit).
Addresses review feedback from potatogim on #46857.
When a provider's live /v1/models endpoint returns a stale or incomplete
list (e.g. Z.AI missing glm-5.2), the generic profile-based code path
returned only the live results, silently dropping curated models.
Generalize the kimi-coding merge pattern to all providers: live entries
come first (provider's preferred order), then curated-only entries are
appended with case-insensitive dedup. This ensures models that the live
endpoint omits still appear in /model picker.
Fixes#46850
External providers (Claude Code) store creds outside Hermes, so the
disconnect API refuses them. The backend now hands the GUI a per-OS
`disconnect_command` that clears the credential the same way the CLI's
logout does (macOS Keychain entry + ~/.claude/.credentials.json), and
the misleading "use claude setup-token" hint is corrected.
Settings → Providers offers a Disconnect button for these: it confirms,
leaves Settings, and runs the removal command in the embedded terminal
via a new runInTerminal() (queues onto $terminalInjection; the terminal
pane flushes and clears it once its session is live). The expanded list
also gets its own "Other providers" header so it no longer reads as
grouped under "Connected". API-managed providers keep the one-click
(trash) disconnect.
Each model remembers its own reasoning effort / fast mode (localStorage,
like model-visibility): editing a model's effort/fast in the submenu
writes its preset, and selecting a model restores its preset onto the
session (capability-gated, Hermes defaults when unset). Every row shows
its own remembered settings (grayed), and the row label and edit submenu
read the same effective value so they can't disagree.
Presets are desktop-client state only — applyModelPreset() no-ops without
a live session id, so selecting a model can't fall through to the
gateway's persistent agent.reasoning_effort / agent.service_tier writes.
Inactive variant `-fast` edits stay preset-only: toggleFast() records
{ fast } on the base model and only swaps models when the row is active,
and selectFamily() honors a saved variant-fast preset by selecting the
`-fast` sibling id.
Provider catalogs surface date-pinned snapshots (`…-20251101`) that the
picker rendered as standalone rows with the date baked into the name
("Opus 4 5 20251101"). Strip the trailing date from display names, and
fold a snapshot out of the list when its rolling alias is present so the
alias stays selectable/searchable while the exact dated id isn't shown
as its own row.
Relocate the model pill to the composer, left of the mic. A new
ModelPill reuses the live ModelMenuPanel dropdown verbatim (single
click target) and the formatModelStatusLabel "Model · Fast Med" label,
anchored to its right edge so the menu doesn't drift with model-name
length. modelMenuContent now flows to ChatView instead of
useStatusbarItems, and the status-bar model-summary item is removed;
the pill subscribes to the model atoms directly and falls back to the
full picker when the gateway is closed.
On a remote gateway connection, agent-written files live on the gateway
host, not the desktop's disk, so the Artifacts view's file:// hrefs failed
("Invalid external URL") and image thumbnails broke.
Make mediaExternalUrl() remote-aware in one place: in remote mode it
rewrites gateway-local paths to GET /api/files/download (a new endpoint
that streams the file as a Content-Disposition: attachment). The artifacts
view now resolves through it, and so do the existing chat-media and
generated-image callers, for free.
The download endpoint stays auth-gated; auth_middleware additionally
accepts the session token as a ?token= query param for this one path so a
shell/browser-opened download (which can't set the session header) still
authenticates — the same query-token tradeoff as the /api/pty WebSocket.
It is NOT added to PUBLIC_API_PATHS.
Salvages #46663 (which carried ~19k lines of CRLF noise and made the
endpoint public). Reimplemented on a clean LF base with the security hole
closed and tests added.
Co-authored-by: qingshan89 <qs2816661685@gmail.com>
* fix(skills): guard recursive skill delete against tree-escape
Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire
working directory: a built-in-skill sentinel location resolved to the server
cwd and the skill-removal endpoint ran a recursive delete on it.
Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the
agent-facing skill_manage(action='delete') path did a bare
shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target():
refuse to rmtree a path that (1) isn't strictly inside a known skills root,
(2) is a skills root itself, or (3) is reached via a symlink/junction.
Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree
all refused). E2E verified with real symlink + file I/O.
* fix(delegation): forward background flag in delegate_task dispatch
delegate_task is an _AGENT_LOOP_TOOLS member, so every surface (CLI,
gateway, desktop/TUI) routes it through AIAgent._dispatch_delegate_task.
That forwarder passed every schema field except background, so
delegate_task(background=true) was silently downgraded to a synchronous
run and returned the sync results payload instead of a delegation_id.
The model sees background in the schema (the call validates), but the
value never reached the function. Add the one missing kwarg so async
background delegation actually engages.
Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire
working directory: a built-in-skill sentinel location resolved to the server
cwd and the skill-removal endpoint ran a recursive delete on it.
Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the
agent-facing skill_manage(action='delete') path did a bare
shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target():
refuse to rmtree a path that (1) isn't strictly inside a known skills root,
(2) is a skills root itself, or (3) is reached via a symlink/junction.
Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree
all refused). E2E verified with real symlink + file I/O.
Collapse segmentMergeIndex + mergeTextInto + the three append helpers
into a single segment-aware appendStreamPart core plus a part-factory
table. Same behavior, DRY.
Models that interleave their reasoning_content and content token streams
(Kimi/DeepSeek/GLM-style routes) emit text -> reasoning -> text deltas
within a single tool-bounded segment. Appending each delta as its own
part shredded one sentence into "Let me" / Thinking / "verify the file",
with a Thinking disclosure wedged mid-sentence.
Coalesce streaming deltas into the most recent same-type part within the
current segment (bounded by any non-streaming part, e.g. a tool call).
The opposite streaming channel is transparent, so a reasoning burst
between two content deltas no longer opens a fresh text part, while a
real tool call still starts a new segment and preserves narration order.
Data-layer only; the renderer already groups consecutive reasoning.
* feat(skills): add optional payments skills (Stripe Link, MPP, Projects)
Adds four optional skills under optional-skills/payments/ wrapping the
Stripe Link CLI, the Machine Payments Protocol (MPP) clients, and the
Stripe Projects CLI plugin. Plus a router skill (payments) that picks
between them based on user intent.
All four are gated [linux, macos] — Stripe's Link CLI does not yet
support Windows. The other CLIs (mppx, stripe projects) are
cross-platform on paper but the payments cluster moves as a unit until
Link CLI gains Windows support.
Skills:
- stripe-link-cli - one-time virtual cards + Shared Payment Tokens
- mpp-agent - HTTP 402 payments via mppx/Tempo/Privy/AgentCash
- stripe-projects - provision SaaS services + credential sync
- payments - router/index skill for the cluster
Hard invariants encoded in every skill:
- Card PANs/wallet keys never enter agent transcripts, logs, or memory
- Spend approvals are not self-bypassable (Link app / wallet UI / CLI prompt)
- Final totals confirmed with user before any --request-approval call
- Credential output files cleaned up after one-time use
Zero core touches. Skills install via:
hermes skills install official/payments/<skill>
* chore(skills/payments): drop router skill — skills shouldn't depend on other skills
Removed optional-skills/payments/payments/ — the router skill that
existed to hand off between stripe-link-cli, mpp-agent, and
stripe-projects.
Per project convention: skills should be independently loadable; a
router is a footgun because (a) it assumes the loader will follow its
recommendation rather than just loading what the user asked for, and
(b) it duplicates the trigger logic that already lives in each
sub-skill's '## When to Use' section.
The three remaining skills declare their own triggers and routing
hints. The optional-skills catalog still groups them under '## payments',
which is the appropriate place for cluster-level discoverability.
Also drops 'payments' from each remaining skill's 'related_skills' list
and removes the corresponding entries from the docs catalog + sidebars.
* feat(skills/payments): fold in danhill-stripe review feedback
- mpp-agent: add link-cli as a client option (when Link is already set
up, or the 402 challenge advertises method="stripe")
- stripe-link-cli: reframe Link account / payment method / approval app
as first-run setup, not hard preconditions (CLI configures them on
first run)
- regenerate the two affected optional-skills docs pages
The Honcho provider page documented the per-profile peer model (user
peer / AI peer / observation) but never the gateway axis — how platform
runtime IDs map to peers. Adds the three keys to the config table and a
short Gateway identity mapping subsection that points at the Honcho page
for the resolver ladder.
Uses the corrected pinUserPeer wording (pins non-agent users, overrides
aliases) so the provider-comparison reader gets the same accurate framing
as the dedicated page.
'everyone collapses to your peer' read as a promise about all traffic.
pinUserPeer pins the user-side peer and is checked before userPeerAliases
(session.py:335), so a pin overrides every alias — including agent peers.
For a multi-agent operator that silently pools distinct agents onto one
peer, the opposite of intent.
Scopes the wording to 'every non-agent gateway user', notes the pin
overrides aliases, and points agent-mesh operators at pinUserPeer:false +
userPeerAliases instead. Same correction in the wizard menu/echo text,
the plugin README, and the website Honcho page.
* feat(delegation): async background subagents via delegate_task(background=true)
delegate_task(background=true) dispatches a subagent that runs in the
background and returns a handle immediately, so the user and model keep
working while it runs. The full result — plus the original task source —
re-enters the conversation as a new turn when the subagent finishes,
riding the same completion-queue rail as terminal background processes.
- tools/async_delegation.py: daemon-executor registry, capacity cap,
rich self-contained completion event pushed onto the shared
process_registry.completion_queue (type='async_delegation').
- delegate_tool.py: background param + single-task dispatch branch;
batch async rejected (v1).
- process_registry.py: format_process_notification renders the rich
task-source block (goal/context/toolsets/model/status/result).
- gateway/run.py: dedicated _async_delegation_watcher drains + injects
results into the originating session (idle + post-turn), session_key
routing enrichment, shutdown interrupt of dangling delegations.
- config: delegation.max_async_children (default 3).
Reuses the existing idle-drain wiring rather than mutating a running
agent loop, preserving message-role alternation and prompt-cache
invariants. 13 targeted tests; CLI + gateway paths E2E-verified.
* test(delegation): make async non-blocking tests environment-independent
CI 'test (5)' flaked on a cold, 8-worker runner: the first
delegate_task(background=true) call measured 2.27s of one-time setup
(config load + child-agent construction + imports), tripping the
elapsed < 1.0 wall-clock assertion. That assertion was testing setup
overhead, not blocking.
Replace the wall-clock thresholds with the real invariant: dispatch
returns while the child is still gated (active_count == 1, completion
queue empty), which a synchronous impl could not do. Keep only a loose
4s sanity backstop well under the runner's 5s gate.
* fix(delegation): harden async background delegation
Follow-up review fixes:
- Detach background child from parent._active_children at dispatch —
otherwise parent-turn interrupts (Ctrl+C, mid-turn steering), cache
evicts (release_clients), and session close (/new) kill/close the
detached subagent mid-run, defeating the point of background mode.
Lifecycle is owned by the async registry's interrupt_fn.
- Make the capacity check atomic with the record insert (TOCTOU: two
concurrent dispatches could both pass active_count() and exceed the cap).
- TUI dedup: key async_delegation events by delegation_id — the
fallthrough keyed them all as ("", type), suppressing every completion
after the first in the desktop/TUI status feed.
- CLI /stop now interrupts running background delegations and /agents
lists them (they live outside the process registry and were invisible).
- Drop stray unbalanced ']' line from the re-injection block and the
unused _ASYNC_DEFAULT import.
Tests: detach-at-dispatch + concurrent-capacity race added (15 total in
test_async_delegation.py); 137 delegate + 140 process-registry/notify/watch
+ 7 TUI dedup tests pass.
* fix(delegation): harden async background completion drains
A GUI app launched from Explorer inherits the environment block captured at
login, so a HERMES_HOME set via 'setx' AFTER login is invisible in process.env
even though the CLI (a fresh shell) sees it. The desktop then silently fell
back to %LOCALAPPDATA%\hermes and reported 'No inference provider configured'
despite a valid configured home (#45471).
resolveHermesHome() now consults the live HKCU\Environment registry value on
Windows before the LOCALAPPDATA default. New windows-user-env.cjs helper parses
'reg query' output, expands %VAR% refs, and fails safe (returns null off-Windows,
on spawn error, or empty value). The registry value is normalized through the
same normalizeHermesHomeRoot() path as the env var for consistency.
Co-authored-by: jeffrobodie-glitch <jeffrobodie@gmail.com>
When a desktop/dashboard session had no agent built yet and the user explicitly
picked a provider in the model picker, config.set('model', ...) would first try
to initialize the agent from the (possibly broken) config default provider —
failing before the user's explicit switch could take effect, trapping them on a
misconfigured default.
config.set now pre-parses the model flags: if an explicit --provider is present
and no agent exists yet, it skips the default-provider agent build and routes
straight through _apply_model_switch with the explicit provider. _apply_model_switch
gained a parsed_flags passthrough (avoids double-parsing) and only falls back to
resolve_runtime_provider(requested=None) when no explicit provider was given.
The desktop hook now sends config.set instead of slash.exec for active-session
model changes, so errors from the selected provider surface to the user instead
of being swallowed.
Co-authored-by: rodboev <rod.boev@gmail.com>
Verifies `hermes debug` surfaces a TERMINAL_ENV override of
terminal.backend, reports the config value when no override is present,
and emits no spurious note when env and config agree.
`terminal.backend` in config.yaml is bridged to the TERMINAL_ENV env var,
but a TERMINAL_ENV set in .env / the shell overrides config and is what
terminal_tool actually uses. The dump printed only the config value, so a
user whose agent was jailed in a docker/podman sandbox via a stale
TERMINAL_ENV still saw `terminal: local` — hiding the real cause. Report
the effective backend and flag when TERMINAL_ENV overrides config.yaml.
When a user-defined provider (e.g. litellm-proxy) and an aggregator
(e.g. openrouter) both advertise the same model name, the Desktop/TUI
model picker would show the model under both groups. Selecting it from
the aggregator row silently set model.provider to the aggregator,
breaking calls because the aggregator doesn't actually serve that model
ID.
Fix: after list_authenticated_providers() returns, collect all models
from user-defined provider rows and filter them out of aggregator rows.
Uses is_aggregator() from hermes_cli/providers.py to identify
aggregators. Case-insensitive matching.
Fixes#45954
NVIDIA NIM API uses vendor-prefixed model IDs (e.g. qwen/qwen3.5-122b-a10b,
nvidia/nemotron-3-super-120b-a12b). The doctor command incorrectly warns that
vendor-prefixed slugs belong to aggregators like openrouter when nvidia is
the configured provider.
Add 'nvidia' to the providers_accepting_vendor_slugs set so doctor no longer
raises false-positive warnings for valid NVIDIA NIM configurations.
Fixes#35425
electron-builder 26.8.x can stage an Electron.app without its
Contents/MacOS/Electron binary, then fail renaming it to Hermes:
ENOENT: no such file or directory, rename .../MacOS/Electron -> .../MacOS/Hermes
This breaks `npm run pack` and the installer desktop stage before a
launchable Hermes.app exists.
- Point build.electronDist at the already-installed Electron dist so
electron-builder reuses it instead of re-unpacking from cache.
- Add a darwin-only prebuilder patch that restores the missing main
binary from the runtime dist before the rename. Idempotent (marker
guard), soft-fails on shape mismatch, survives node_modules reinstall.
Co-authored-by: ChasLui <chaslui@outlook.com>
* fix(teams): package Microsoft Teams SDK as an installable extra
The Teams adapter imports the microsoft-teams-apps SDK, but it was never
declared as a dependency, so source/local installs hit ImportError and the
adapter silently reported the SDK as unavailable. Add a 'teams' extra
(microsoft-teams-apps==2.0.13.4 + aiohttp) and document 'uv sync --extra teams'.
Per the 2026-05-12 [all] policy, opt-in messaging-platform SDKs are NOT added
to [all] (they would break every fresh install on a quarantined release); the
teams extra is installed on demand like the other platform backends.
Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>
* chore: map rio-jeong contributor email for attribution (#43945)
* feat(teams): lazy-install the Teams SDK on demand (parity with other channels)
The teams extra alone left Teams as the only messaging platform that wouldn't
auto-install its SDK — every other channel (telegram, discord, slack, matrix,
dingtalk, feishu) lazy-installs via tools.lazy_deps on first connect. Bring
Teams to parity:
- Add 'platform.teams' to LAZY_DEPS (microsoft-teams-apps + aiohttp).
- Replace the passive 'check_teams_requirements = check_requirements' alias with
a real lazy-installer that calls ensure_and_bind('platform.teams', ...),
rebinding all Teams SDK globals on success (mirrors check_slack_requirements).
- Call check_teams_requirements() at the top of TeamsAdapter.connect() so
enabling Teams installs the SDK on demand.
- Keep the passive check_requirements() as the registry check_fn so 'gateway
status' probes never trigger a pip install.
The 'teams' extra remains for packagers / explicit 'uv sync --extra teams'.
Tests: rework the alias test into shortcircuit + lazy-install assertions, and
update test_connect_fails_without_sdk to simulate an uninstallable SDK.
---------
Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
The initial fix only wrote the prefix npmrc on a fresh Node install, so
pre-existing bundled-Node installs (Node already present) were not repaired
by re-running the installer — install_node/ensure_node skip when Node is
already up to date.
Extract the redirect into an idempotent helper
(configure_managed_node_npm_prefix / _nb_configure_npm_prefix) that no-ops
when there's no Hermes-managed npm, and call it unconditionally from
check_node (install.sh) and at the top of ensure_node (node-bootstrap.sh).
Re-running the install command now repairs an affected install in place,
not just brand-new ones.
Guards that install.sh and node-bootstrap.sh redirect the bundled Node's
npm global prefix to the command link dir's parent via a prefix-local
global npmrc, so `npm install -g` binaries land on PATH instead of the
off-PATH $HERMES_HOME/node/bin.
When the installer falls back to a bundled Node under $HERMES_HOME/node,
npm's default global prefix is that Node dir, so `npm install -g <pkg>`
drops the package binary in $HERMES_HOME/node/bin. Only node/npm/npx are
symlinked into the command link dir (~/.local/bin, /usr/local/bin, or
$PREFIX/bin) — so user-installed global package binaries are NOT on PATH
and can't be run, even though `npm i -g` reports success. They also get
wiped on every Node upgrade (the dir is rm -rf'd and re-extracted).
Redirect the bundled Node's npm global prefix to the command link dir's
parent, so global bins land in the link dir (already on PATH, alongside
node/npm/npx) and survive Node upgrades. Scoped to the bundled Node via
its prefix-local global npmrc ($HERMES_HOME/node/etc/npmrc), so the user's
other Node installs and their ~/.npmrc are untouched. Hermes's own global
installs (agent-browser) pass an explicit --prefix and are unaffected.
Registers 200 plugin commands on top of the native + COMMAND_REGISTRY set
and asserts the tree never exceeds Discord's 100-command limit, that native
high-priority commands survive the cap, and that overflow is actually
dropped. Regression guard for the recurring error 30032
("Maximum number of application commands reached") sync failures.
Discord enforces a hard cap of 100 global application commands per app.
The adapter registers ~27 native commands plus every gateway-available
entry in COMMAND_REGISTRY plus all plugin commands plus the consolidated
/skill group. On a loaded install (many plugins/quick commands) the
desired set exceeds 100, so tree.sync() / _safe_sync_slash_commands()
hits error 30032 ("Maximum number of application commands reached") and
Discord rejects the ENTIRE batch — silently breaking every slash command,
not just the overflow.
Cap registration at the 100-command limit: native commands (registered
first, highest priority) and the /skill group are always kept; lower-
priority auto-registered COMMAND_REGISTRY and plugin commands are added
only until the cap is reached, with a single concise warning telling the
user how to surface the rest. Since both sync paths read from
tree.get_commands(), bounding the tree fixes the root cause for both.
The identity-mapping keys never made it to the site docs. Add the three keys
to the config reference and a Gateway Identity Mapping section: when it
applies (gateway only, setup-gated), the intent tree, resolver order, the
un-pin orphan warning, and the deprecated pinPeerName alias.
Drop pinPeerName from the key table (now a deprecated-alias note), and replace
the single/multi/hybrid 'deployment shapes' section with the gateway-gated
intent tree the wizard actually presents, including the [e] raw-edit hatch and
the un-pin pooling steer.
The single/multi/hybrid 'deployment shape' was a misnomer: these keys only
affect the gateway (the one entrypoint supplying a runtime user ID), and the
three preset names stamped a lossy taxonomy onto three orthogonal knobs while
hiding which keys got written.
Replace it with an intent-led tree gated on gateway detection:
- _gateway_platforms() lazily inspects the gateway config (best-effort, no
hard dependency); the step auto-skips when no platform is connected.
- 'who talks to this?' → just me / me+others (pooled?) / only others, deriving
pinUserPeer + userPeerAliases + runtimePeerPrefix and echoing the result.
- [e] drops to a raw-knob editor for power users.
- The single→multi orphan guard survives as a pooling steer.
The setup wizard wrote the legacy pinPeerName even though pinUserPeer is
the canonical key that outranks it in the resolver — so it had to scrub
the canonical key afterward to stop it winning. Write pinUserPeer directly
and migrate any legacy pinPeerName onto it on touch (setup load + clone),
which removes the precedence-fighting entirely.
Resolver still reads pinPeerName as a back-compat alias; that's deferred.
2026-06-10 16:07:53 -04:00
593 changed files with 43008 additions and 10401 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.