Replaces the tautological test from the original PR (which asserted a
plain assignment it performed itself in the test body) with one that
exercises the actual contracts: _init_cached_agent_for_turn leaves
max_iterations untouched, and the per-turn IterationBudget rebuild
(turn_context.py) propagates a refreshed cap.
When a gateway agent is reused from cache, it retains the max_iterations
from its initial creation. If config.yaml agent.max_turns or HERMES_MAX_ITERATIONS
changed between turns, the cached agent's budget becomes stale.
Before reusing a cached agent, refresh agent.max_iterations from the
freshly-resolved value (read from env/config at line 14585).
Fixes partial issue from PR #48127: handles fresh agent creation + cached agent reuse.
The desktop model picker had no way to force a fresh model fetch: model.options
went through the 1h-cached provider_models_cache.json, and there was no flag to
bust it. When a provider's cached list expired and its next live fetch failed,
the picker fell back to the curated static list — silently dropping live-only
models (e.g. OpenCode Zen's free tier like deepseek-v4-flash-free) the user had
been using.
- Thread refresh through model.options (RPC + REST /api/model/options) ->
build_models_payload -> list_authenticated_providers, which calls
clear_provider_models_cache() up front when set so every row re-fetches live.
- Add a 'Refresh Models' control to the desktop picker (5-locale i18n, spinning
sync icon). Normal opens leave refresh=false to stay snappy on the cache.
Verified: stale cache hides deepseek-v4-flash-free -> refresh busts it -> live
re-fetch surfaces it. refresh=false never touches the cache.
* fix(dashboard): resolve chat TUI argv off event loop
Dashboard chat now resolves its TUI launch command off the
FastAPI/WebSocket event loop. The resolver can run `npm install` /
`npm run build` through `_make_tui_argv()`, and doing that synchronously
in `/api/pty` can block proxy keepalives and other dashboard WebSocket
work long enough for reverse-proxy deployments to drop the chat
connection.
This keeps the current TUI build policy intact: normal production
launches still run the correctness-first `npm run build` path, while
`HERMES_TUI_DIR` remains the prebuilt/no-build path for distros and
containers. The change only moves the potentially slow resolver work to
a worker thread for the dashboard chat path, serialized by an
`asyncio.Lock` so concurrent chat tabs preserve one-build-at-a-time
behavior. `SystemExit` (node/npm missing) and the profile `HTTPException`
path still propagate cleanly through `asyncio.to_thread()`.
Salvaged from #26124 — rebased onto current main. The async wrapper now
threads the `profile` parameter that `_resolve_chat_argv` gained on main
since the PR was opened, so cross-profile chat is preserved.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
* chore: add 0xdany to AUTHOR_MAP
* fix(dashboard): bind chat-argv lock to app.state; cover error propagation
Self-review hardening on top of the salvaged fix:
- Move `_chat_argv_lock` from a module-level `asyncio.Lock()` onto
`app.state` (initialised in `_lifespan`, lazy fallback via
`_get_chat_argv_lock`), mirroring `event_lock`. A module-level
`asyncio.Lock()` binds to whatever event loop is active at import time,
which is the exact pattern `_get_event_state`'s docstring warns against
(breaks across TestClient instances / uvicorn reloads). This keeps the
lock on the running loop.
- Add two tests exercising the real `_resolve_chat_argv_async` →
`asyncio.to_thread` → lock → re-raise chain: `SystemExit` (node/npm
missing) and `HTTPException` (invalid profile) both propagate out of the
worker thread and are caught by `pty_ws`'s existing handlers. The prior
tests mocked `asyncio.to_thread` away and never covered this path.
* test(dashboard): dedupe pty error-propagation tests; assert close code
simplify-code cleanup pass on the salvage stack:
- Extract the shared scaffolding of the two pty_ws error-propagation tests
into `_assert_pty_propagates`, keeping the two tests as distinct contracts
for the `except SystemExit` and `except HTTPException` arms.
- Assert the stable WebSocket close code (1011) instead of relying solely on
the user-facing "Chat unavailable" notice wording — a behavior contract per
the AGENTS.md "behavior contracts over snapshots" rule, robust to notice
rewording. The detail substring ("unknown profile") is still checked for the
HTTPException case since proving the detail survives the thread hop is the
point of that test.
No production-code change; the helper exercises the same real
_resolve_chat_argv_async -> asyncio.to_thread -> lock -> re-raise chain.
---------
Co-authored-by: draihan <draihan@student.ubc.ca>
git worktree lock at creation and unlock before removal. A locked
worktree refuses 'git worktree remove' (and prune), so a second hermes
process or a stray cleanup can't silently delete an in-use isolated
worktree. Fail-soft on both paths — a lock/unlock error never blocks
the session or cleanup.
Salvaged from #47029 (Issue #46303). Unlock moved to the actual-removal
path so a preserved (unpushed-commits) worktree stays locked while in use.
When SessionDB init fails, the CLI/Desktop previously continued live with only
a buried log line. The chat looks healthy, but the transcript is never written
to state.db — so resume later shows a truncated or empty session and the user
only discovers the loss after the fact (#41386).
Emit a prominent stderr banner at startup when the store is unavailable, making
it explicit that the conversation will not be saved and cannot be resumed, with
a pointer to fix the store. Also set _session_db_unavailable so downstream code
can detect the degraded state.
- Scope 'no such tokenizer' matcher to trigram specifically (#779)
- Decouple base FTS and trigram backfill in v11 migration (#1195)
- CJK search falls back to LIKE when trigram unavailable (#3384/#3430)
- Add _trigram_available tracking across init, migration, and startup
- Add regression tests for migration backfill and CJK LIKE fallback
- Add _is_trigram_unavailable_error and _warn_trigram_unavailable helpers
_is_fts5_unavailable_error only matched 'no such module: fts5', but
SQLite builds that ship FTS5 without the optional trigram tokenizer
raise 'no such tokenizer: trigram' instead. This caused SessionDB init
to crash on those builds.
Additionally, the trigram failure path called _warn_fts5_unavailable
which set _fts_enabled = False, globally disabling full-text search
even though the base FTS5 table was created successfully.
Fix:
- Extend _is_fts5_unavailable_error to also match 'no such tokenizer'
- Add _is_tokenizer_unavailable_error to distinguish tokenizer-specific
failures from whole-module absence
- Only call _warn_fts5_unavailable for module-level failures; skip it
for tokenizer-specific failures so base FTS5 remains usable
Fixes#47002
self_provision_if_managed() gated on is_managed(), but is_managed() means
"NixOS/package-manager-managed" (it keys on HERMES_MANAGED or a ~/.hermes/.managed
marker) — NOT "NAS-hosted". A NAS-provisioned Fly agent sets NEITHER, so the gate
was always False and relay self-provision SILENTLY no-oped on exactly the hosted
agents it was built for. Caught live: a staging agent with GATEWAY_RELAY_URL
correctly stamped logged "No messaging platforms enabled" and never dialed the
connector; HERMES_MANAGED was unset on the machine. The unit tests had mocked
is_managed()->True, so they passed while the real trigger never fired (mocked-
trigger blind spot).
Fix: drop the is_managed() gate and rename self_provision_if_managed ->
self_provision_relay. The real trigger is now "relay_url() set + no pinned secret
+ a resolvable NAS token", which is both NAS-independent and self-guarding:
- NAS-hosted agent: GATEWAY_RELAY_URL + no pinned secret + bootstrapped NAS
token -> self-provisions.
- Self-hosted + `hermes gateway enroll`: pinned GATEWAY_RELAY_SECRET -> skipped
(existing secret-present guard).
- Self-hosted, unenrolled, no NAS identity: resolve_nous_access_token() fails
-> graceful no-op (existing fail-soft path).
Security: unchanged trust model. The connector still derives tenant from the
validated NAS token; this only broadens WHEN the provision attempt fires, and
every broadened case is still guarded by token-resolution + pinned-secret-skip.
Tests: replaced the (wrong) "skips when not managed" test with a regression test
proving a NAS host where is_managed()==False STILL provisions; renamed all call
sites; added a "no NAS token -> non-fatal skip" test for the self-hosted branch.
88 relay tests pass.
Relay-adapter lane. EXPERIMENTAL.
The connector now delivers inbound (messages + interrupts) over the gateway's
OUTBOUND /relay WebSocket, not a signed HTTP POST to an inbound endpoint. The
gateway needs no inbound HTTP port — which is what makes hosted gateways (no
public IP) able to receive inbound at all.
- gateway/relay/adapter.py: connect() wires set_interrupt_inbound_handler(
self.on_interrupt) so connector->gateway interrupt_inbound frames bridge into
the existing per-session interrupt path (the inbound message handler was
already wired). Removed _maybe_start_inbound_receiver() + the _inbound_runner
lifecycle — there is no HTTP receiver anymore.
- gateway/relay/inbound_receiver.py: deleted (the signed-HTTP InboundDelivery
receiver).
- gateway/relay/__init__.py: removed relay_inbound_config() (dead with the
receiver gone). The delivery key is still set in-process by self-provision for
forward-compat but is no longer consumed for inbound.
- docs/relay-connector-contract.md: §3 rewritten — inbound is the WS back-channel
routed cross-instance via the connector's relay bus; §5 interrupt + §6 auth
table updated; the old signed-HTTP-POST + per-tenant-delivery-key-signing path
is documented as superseded. gatewayEndpoint noted as passthrough-plane only.
Tests: stub_connector grows set_interrupt_inbound_handler + push_interrupt;
new test_relay_interrupt case proves connect() wires BOTH inbound handlers and an
interrupt_inbound frame over the WS cancels the right session. Removed the
HTTP-receiver test; updated the crypto-shedding scan + self-provision delivery-key
assertion. 88 relay tests pass.
EXPERIMENTAL. Pairs with gateway-gateway (relay bus + WsGatewayDelivery) and the
NAS GATEWAY_RELAY_URL stamp. The cross-repo E2E (connector repo) proves the full
multi-instance path against this production adapter code.
* fix(desktop): show Hindsight memory provider
* feat(desktop): configure Hindsight memory provider
* fix(desktop): limit Hindsight modes to supported setup
* refactor(desktop): generic memory-provider config surface
Replace the bespoke Hindsight settings surface with a declarative,
schema-driven path so adding a memory provider is pure declaration —
no per-provider page, conditional, or endpoint.
- memory_providers.py: declarative registry. Each provider lists its
fields {key, label, kind, default, options, secret-vs-plain}. Hindsight's
mode is a select(cloud, local_external), so rejecting local_embedded
falls out of generic enum validation instead of a hand-written check.
- One generic endpoint pair GET/PUT /api/memory/providers/{name}/config.
GET returns declared fields + current values (secrets only as is_set,
never read back); PUT validates selects against their options, writes
plain fields to the provider config file, secrets to the env store,
and flips memory.provider.
- ProviderConfigPanel renders straight from the schema, replacing
hindsight-settings.tsx and the memory.provider === 'hindsight'
conditional in config-settings.tsx — same pattern as
toolset-config-panel.tsx off env_vars.
Scoped to memory providers; storage layout is unchanged so the runtime
Hindsight plugin reads the same config.json / HINDSIGHT_API_KEY / provider
keys as before. Tests cover the registry, endpoint behavior (defaults,
write+secret, select rejection, unknown provider, secret-never-returned),
and the generic panel.
Adds a 'Customizing platform hints' section to the Prompt Assembly
developer guide covering the append/replace/shorthand shapes, the
defensive fallback, and the cache-stable lifecycle (stable tier,
resolved at build time).
Add platform_hints config so an admin can append to or replace Hermes'
built-in platform hint for a single messaging platform (WhatsApp, Slack,
Telegram, ...) without affecting other platforms. Enables enterprise
managed profiles to steer platform-aware skills (e.g. invoke a custom
table-formatting skill on WhatsApp where Markdown tables don't render)
while leaving Telegram/Slack/CLI behavior unchanged.
- hermes_cli/config.py: document platform_hints in DEFAULT_CONFIG
- agent/agent_init.py: load platform_hints -> agent._platform_hint_overrides
- agent/system_prompt.py: _resolve_platform_hint() applies append/replace
(replace wins; bare string = append shorthand); defensive on bad config
- tests: 16 cases covering append/replace/shorthand/isolation/malformed
Override only affects the platform-hint segment of the system prompt;
SOUL/context/memory tiers and general instructions are unchanged.
DELETE /api/sessions/{id} was the only session endpoint that didn't
resolve the id (detail, messages, rename, export all call
resolve_session_id) and 404'd when the row was already gone. The desktop
optimistically removes the sidebar row, then RESTORES it and shows the
error on any failure — so deleting a session that had just been reaped
(empty-session hygiene) or removed by a concurrent client resurrected a
ghost row and surfaced "session not found". /goal + auto-compression churn
leaves transient empty rows that race the sidebar snapshot, which is the
exact "I deleted the empty one and got 'session not found'" report.
Resolve exact ids / unique prefixes, and treat an already-absent session
as an idempotent success — DELETE's contract is "ensure it's gone". This
mirrors the bulk-delete endpoint, which already treats ghost ids as
success.
Tests: deleting an absent id is idempotent (200, not 404); delete resolves
a unique prefix; a real session still deletes.
When a worker calls kanban_create from inside a session that has a
persistent delivery channel, the originating session is now subscribed
to the new task's completion/block events automatically. The agent
that dispatched the task gets notified instead of having to poll.
- Gateway sessions (telegram/discord/slack): HERMES_SESSION_PLATFORM +
HERMES_SESSION_CHAT_ID ContextVars, set by the messaging gateway.
- TUI / desktop sessions: HERMES_SESSION_KEY in the subprocess env.
The TUI notification poller keys on platform='tui' + chat_id=<key>.
- CLI / cron / test: no persistent channel, no subscription.
Gated by kanban.auto_subscribe_on_create in config.yaml (default True).
Disable to mirror pre-feature behaviour — users who want explicit
kanban_notify-subscribe calls per task can set it to false. This
config gate addresses the design concern that got PR #19718 reverted
upstream (unconditional implicit auto-subscribe on tool-driven
kanban_create was too aggressive for orchestrator users).
HERMES_SESSION_ID is intentionally not a fallback channel — it is
set by ACP/agent subprocess telemetry for every invocation, not just
TUI, so treating it as a notification target would auto-subscribe
every CLI session and re-introduce the over-eager behaviour.
The kanban_create response now includes a 'subscribed' bool so
orchestrators can react if subscription failed (e.g. by falling
back to explicit kanban_notify-subscribe or to polling).
Includes 6 tests covering the gateway / TUI / CLI / partial-context /
gated / add_notify_sub-failure paths. All 90 tests in
test_kanban_tools.py pass; 509 broader kanban tests pass.
The new compression-tip tests poke started_at/ended_at directly via
db._conn to force deterministic lineage ordering. _conn is typed
Optional[Connection], so ty flagged .execute/.commit as unresolved on
None. Bind a local and assert it's non-None first to narrow the union.
Auto-compression ends the live session and forks a continuation child
(linked via parent_session_id). A long-lived parent keeps its own flushed
message rows, so resolve_resume_session_id()'s empty-head walk never
redirected it — resuming the parent id reloaded the pre-compression
transcript and dropped every turn generated after compression, including
the assistant's response. On the desktop this is the recurring "I sent a
message, came back, and the reply isn't there" report on large sessions:
the chat's routed id is the pre-rotation id, and both the gateway
session.resume RPC and the REST /messages read anchored on it.
Fix the resolver at the chokepoint: resolve_resume_session_id() now
follows the compression-continuation chain forward via get_compression_tip()
before its existing empty-head descendant walk. get_compression_tip() only
follows children whose parent ended with end_reason='compression' (created
after the parent was ended), so delegation/branch children never hijack a
resume. This fixes every resume caller at once (REST /messages, CLI
--resume, gateway /resume).
session.resume in tui_gateway was the one resume path that never called the
resolver — it used the raw target id directly. Route it through
resolve_resume_session_id() too (non-lazy only; lazy watch windows must
stay on their exact child branch). Resolving up front also re-anchors the
live-session fast path so a still-live rotated session is reused by its new
key instead of rebuilding a duplicate agent on the stale parent.
Tests:
- resolve_resume_session_id follows the tip even when the parent retains
messages, and is not confused by a delegation child.
- session.resume binds the agent to the continuation tip and returns the
post-compression reply.
Follow-up to the cap-removal salvage. The contributor guarded the new
unlimited default with `[:max_models] if max_models else ...`, which conflates
max_models=0 (used by slug-only callers that want an empty model list) with
None (unlimited). Tighten to `is not None` at all five slicing sites in
list_authenticated_providers / list_picker_providers, and add a regression test
asserting the three-way contract: None=full, 0=empty, N=first N.
The interactive model pickers (Desktop REST API, TUI model.options, CLI
/model) were hard-capped at max_models=50, which truncated large provider
catalogs like Kilo Gateway (336 models) to just 50 entries. This made
most models undiscoverable via the picker search box.
Changes:
- Change build_models_payload() default from max_models=50 to None (unlimited)
- Change list_authenticated_providers() default from max_models=8 to None
- Change list_picker_providers() default from max_models=8 to None
- Fix all [:max_models] slicing to handle None as 'no limit'
- Remove max_models=50 from 5 interactive picker callers:
* web_server.py: get_model_options (Desktop /api/model/options)
* web_server.py: get_recommended_default_model
* model_switch.py: prewarm_picker_cache_async
* tui_gateway/server.py: model.options JSON-RPC
* cli.py: HermesCLI model picker
- Telegram/Discord inline keyboard picker (gateway/slash_commands.py)
still passes max_models=50 explicitly — unchanged behavior.
The total_models field was already in the response payload and is now
meaningful since models.length == total_models for interactive pickers.
Fixes#48279
The manual /compress handler called rewrite_transcript() unconditionally on
the session id returned by _compress_context(). When rotation does not occur
(e.g. _session_db unavailable, or the DB split raised), session_id is unchanged
and rewrite_transcript() DELETEs the original messages and replaces them with
only the compressed summary — permanent data loss (#44794, #39704).
Guard the rewrite on actual rotation: only overwrite when _compress_context
produced a new session id. Otherwise leave the original transcript intact and
log a warning.
compress_context() rotates the session (end_session -> create_session)
mid-turn when auto-compress triggers, but never called
_flush_messages_to_session_db() first. Messages generated during the
current turn that hadn't been persisted to state.db were silently lost.
The same bug existed in cli.py:new_session() (/new command). Both paths
now flush un-persisted messages before ending the old session.
* feat(billing): nous_billing http client + BillingState core (phase 2b)
Phase 2b terminal-billing client foundation:
- hermes_cli/nous_billing.py: typed client for the 4 /api/billing/* endpoints
(state/charge/poll/auto-top-up). Raises typed errors (BillingScopeRequired,
BillingRateLimited, BillingAuthError) mapped from the live-verified contract;
fail-open is the caller's job. Idempotency-Key enforced client-side.
- agent/billing_view.py: surface-agnostic BillingState core + Decimal money
parsing (server emits decimal strings, not 2dp), fail-open builder,
idempotency-key gen, custom-amount validation.
- 51 unit tests (decimal parse/format, payload tiering, error->exception
matrix, fail-open, amount validation).
Plan: docs/plans/2026-06-13-001-phase-2b-terminal-billing-tui-plan.md
* feat(billing): billing:manage scope + lazy step-up re-auth (phase 2b)
- NOUS_BILLING_MANAGE_SCOPE constant.
- nous_token_has_billing_scope(): split-based scope check (no false-positive
substring match).
- step_up_nous_billing_scope(): re-runs the device flow requesting
billing:manage, reusing the held credential's portal/inference URLs + client_id
(so a preview stays a preview), persists like _login_nous but WITHOUT the model
picker. Returns True iff the minted token carries the scope (False when NAS
silently downscopes a non-admin / unticked grant).
Lazy step-up (plan D-A): normal login path unchanged; 403 insufficient_scope
from a billing call triggers this. 7 unit tests.
* feat(billing): billing JSON-RPC methods for the TUI (phase 2b)
billing.state / charge / charge_status / auto_reload / step_up in
tui_gateway/server.py. Return STRUCTURED success envelopes (result.ok +
result.error=<code>) rather than JSON-RPC-level errors, so the Ink rpc() promise
always resolves and the TUI branches on the typed billing error code
(insufficient_scope, rate_limited, no_payment_method, …) to render the right
affordance. Money serialized as decimal STRINGS + display strings. charge mints
+ echoes an idempotency_key for retry reuse. 16 unit tests.
* feat(billing): /billing CLI handler + command registry (phase 2b)
- CommandDef("billing", subcommands=buy|auto-reload|limit), added to
_SLACK_VIA_HERMES_ONLY so it routes via /hermes on Slack (keeps the 50-cap
parity test green, same as /credits).
- cli.py::_show_billing + screen helpers: all 5 screens (overview, buy→confirm→
poll, auto-reload, monthly-limit read-only). Reuses _prompt_text_input_modal /
_prompt_text_input (D-C). Non-interactive (_app is None) renders text + portal
deep-link, never prompts (R7). Decimal money end-to-end. 2s/5-min cancellable
poll loop; 429/503 = retry not failure; settled = ledger truth. Lazy step-up on
403 insufficient_scope. no_payment_method treated as mainline funnel-to-portal.
- 6 CLI tests; 156 command tests (incl. Slack/Telegram parity) green.
* feat(billing): /billing Ink TUI screens + tests (phase 2b)
- ui-tui/src/app/slash/commands/billing.ts: /billing TUI command covering all 5
screens — overview (text), buy <amt> → ConfirmReq → charge → non-blocking 2s/
5-min poll loop → settled/failed/timeout branches, auto-reload <below> <to> →
ConfirmReq → PATCH, limit (read-only). Reuses the existing ConfirmReq overlay
(D-C) — no bespoke component. Typed-error envelope branching: insufficient_scope
arms the lazy step-up confirm; no_payment_method/rate_limited/cap funnel to
portal. Client-side amount validation mirrors the server (bounds + 2dp).
- gatewayTypes.ts: Billing* response interfaces.
- registry.ts: register billingCommands.
- billingCommand.test.ts: 12 vitest cases (overview/gating/buy-confirm-poll-
settled/no_payment_method/step-up/limit/auto-reload/validation).
TUI build green; 12/12 vitest pass; slash tests pass once @hermes/ink is built.
* docs(billing): scrub private cross-repo references
NAS is a private repo — remove all references to it from the public PR:
- drop the cross-repo planning doc (planning scaffolding, not a deliverable;
the PR description documents the design)
- replace 'NAS' / 'PR #412 preview' mentions in code + test comments with
generic 'the server' / 'a preview deployment'
* docs(billing): scrub final NAS reference in step-up docstring
* docs(billing): drop dangling plan-doc refs
The phase-2b plan doc was removed in the cross-repo scrub (300afcc0b)
but two module docstrings still pointed at it. Drop the dead refs.
* feat(billing): interactive /billing overlay + step-up UX, portal-URL & token fixes
Adds the interactive /billing TUI overlay and hardens the terminal-billing
client across CLI and TUI.
- TUI: full /billing overlay state machine (overview to buy to confirm,
auto-reload, read-only monthly limit) reusing the existing confirm overlay.
- Step-up: surface the verification link in-transcript and open the browser
via the TUI's own opener (the device flow runs in the headless gateway, so a
printed URL was being dropped); run the step-up handler off the main loop and
emit the link as an out-of-band event so the gateway stays responsive.
- Step-up copy is scope-accurate ("Billing permission granted") and re-checks
/state so it never claims "enabled" when the org kill-switch is still off.
- Portal deep-links resolve to absolute URLs against the active portal base
(the server emits them relative) - fixes a bare "/billing?topup=open" link.
- Billing calls refresh an expired access token via the stored refresh token
instead of reporting a false "not logged in".
- Optimistic funnel: advise "set up a saved card on the portal" up front when
no card is on file (advisory, not a hard gate).
- Token resolution is cached briefly so the 2s charge poll loop stops
re-locking + re-reading the auth store on every tick; 401 re-resolves fresh.
- Remove the temporary demo-mode shims.
Validation: 87 Python billing tests, 88 TS tests (billing command + gateway
event handler), tsc clean, ink + ui-tui builds green.
* docs(billing): add /billing TUI screenshots for PR
* fix(cli): guard _last_invalidate on bare instances; update stale prompt-fallback test
The UI-invalidate throttle read self._last_invalidate unconditionally, which
raised AttributeError on HermesCLI instances built without __init__ (the
thread-safety test's object.__new__ shell). Guard the read with getattr.
The off-main-thread branch of _prompt_text_input was changed (#23185) to cancel
cleanly to None instead of falling back to a bare input() that would hang on the
slash-worker thread; the test still asserted the old direct-input fallback.
Update it to assert the current intended behavior: returns None, calls neither
run_in_terminal nor input(), and does not hang.
@nous-research/ui@0.18.2 Button is grid-based: size=xs is an
aspect-square icon-only box, and icons belong in prefix/suffix.
The dashboard used shadcn-style size=xs + inline <Icon/> text
children, which forced text buttons into broken tall squares
(Configure, Run setup, Select, Save keys) and split icon/label
across grid columns elsewhere (Schedule it, Prune/Delete actions).
Move leading icons to prefix and size text buttons as sm/default.
For the post-setup spinner, drive the spin from a button-level
[&_svg]:animate-spin selector since the prefix slot clones the
icon and overwrites its className.
- ToolsetConfigDrawer: Select, Save keys, Run setup
- SkillsPage: New skill, Configure
- AutomationBlueprints: Schedule it
- SessionsPage: Prune old sessions, Delete empty, Delete selected
The "💾 Self-improvement review" summary (skill/memory updated) was invisible
on two surfaces:
- Desktop Electron app had no review.summary event handler — skill/memory
writes happened silently. Now appends a persistent system message to the
transcript (matching the Ink TUI's persistent-line semantics, not a
transient toast that can be missed).
- tui_gateway (backs both 'hermes --tui' and the desktop) never read
display.memory_notifications, so it always behaved as 'on' and ignored a
user who set 'off'/'verbose'. Added _load_memory_notifications() (mirrors
the messaging gateway's bool->str normalization, defaults to 'on') and
wired it to agent.memory_notifications, matching gateway/run.py and the CLI.
Delivery chain now reaches all surfaces:
background_review.py -> background_review_callback -> review.summary event ->
desktop transcript / Ink TUI line / gateway message / CLI print.
The universal PARALLEL_TOOL_CALL_GUIDANCE block already lives on main, but it
shipped with two rough edges this change cleans up:
- It duplicated the batching steer for Google models. The
GOOGLE_MODEL_OPERATIONAL_GUIDANCE block still carried its own
"Parallel tool calls" bullet, so Gemini/Gemma received the instruction
twice in one prompt. Drop the redundant bullet — the universal block is now
the single source.
- Its comment claimed "nothing in the open-source system prompt encouraged
batching," which was wrong: the steer existed for Google models only. Reword
to say the gap was that every *other* model got nothing.
- Tighten the test that asserts the steer (precedence-correct), and add an
invariant guarding against re-introducing the Google duplicate.
* Port from cline/cline#11514: encourage parallel tool calls
Add a universal system-prompt guidance block telling the model to batch
independent tool calls (reads, searches, web fetches, read-only commands)
into a single assistant turn instead of one call per turn. The runtime
already executes independent batches concurrently (read-only tools always;
non-overlapping path-scoped file ops); the open-source system prompt had
nothing steering the model to PRODUCE the batch. Fewer round-trips means
less resent context, which compounds over a long conversation.
- prompt_builder.py: new PARALLEL_TOOL_CALL_GUIDANCE block (short, static,
cache-amortised) modeled on TASK_COMPLETION_GUIDANCE.
- system_prompt.py: inject right after the task-completion block, gated by
agent.valid_tool_names + the new toggle.
- agent_init.py: read agent.parallel_tool_call_guidance (default True).
- config.py: add the default under the agent section.
- test_prompt_builder.py: behavior-contract tests (batching steer, dependent
carve-out, length bound) — invariants, not wording snapshots.
Adapted from Cline's TypeScript tool-surface guidance to hermes-agent's
Python prompt-assembly architecture and config-over-env conventions.
* fix(desktop): never persist or restore a named custom provider as bare "custom"
Custom providers vanish from the Desktop/TUI model picker with
"No LLM provider configured" — repeatedly fixed (#44062, #44109, #45578)
and repeatedly regressed (#44022, #47714) because every fix only recovered
the entry identity from a persisted base_url. When a session is
persisted/restored with the resolved provider "custom" and NO base_url, bare
"custom" leaked through verbatim; resolve_runtime_provider("custom") routes to
the OpenRouter default URL with no api_key, so the next turn/resume dies.
Bare "custom" is the resolved billing class shared by every named providers:/
custom_providers: entry — it is not a routable identity. Centralize the
"never let bare custom escape" invariant in one helper,
runtime_provider.canonical_custom_identity(), and apply it at all four leak
sites in tui_gateway/server.py:
- _ensure_session_db_row — the ORIGIN: first DB write seeds the bad row
- _runtime_model_config — live persist
- _stored_session_runtime_overrides — resume restore (heals old rows; drops
unrecoverable bare custom so resume falls back to config default)
- _make_agent — rebuild / per-turn
The helper recovers custom:<name> from the endpoint URL when present, else
from config.model.provider (the durable identity left when no base_url
survived). Regression tests in test_custom_provider_session_persistence.py
lock the no-base_url vector at every site so it cannot regress again.
The memory tool was strictly one-op-per-call. With the store running near
its char limit by design, a new add that would overflow gets rejected with
'consolidate now, then retry' -- but the model could not consolidate and add
in one call. It had to remove/replace across several turns, then retry the
add, each turn re-sending the whole conversation context. Expensive thrash.
Add an 'operations' array: a list of add/replace/remove ops applied
atomically against the FINAL char budget. The model frees space and adds new
entries in ONE call, even when an add alone would overflow. All-or-nothing:
any bad op aborts the whole batch, nothing written.
Root-cause note: the two agent-level memory interception sites
(agent_runtime_helpers.py, tool_executor.py) silently dropped any param not
in their explicit kwarg list, so 'operations' never reached the handler and
batch calls failed with 'Unknown action None'. Both now pass it through and
bridge each add/replace op to external memory providers.
Also: success response is now terminal (done=true + 'do not repeat' note,
no full-entries echo that invited re-edits); schema rewritten to lead with
the batch mechanism and an explicit one-shot stop rule (2138 -> 1476 chars).
Live-verified: near-full consolidate-and-add went 7 calls -> 1 call,
stable across 3 reps. 103 memory/approval tests + 398 background-review/
run_agent tests green; 6 new batch tests added.
PR #48372 relaxes EAP=Stop around the uv venv call so PowerShell 5.1
doesn't mistake uv's 'Using CPython ...' stderr for a terminating
NativeCommandError. But relaxing EAP also means a *genuine* uv venv
failure (exit != 0) no longer aborts on its own — Install-Venv would
continue and print 'Virtual environment ready', and in stage mode
Invoke-Stage would report ok=true, even though no venv was created.
Capture $LASTEXITCODE immediately after the relaxed call and throw on
non-zero (Pop-Location first, matching the function's other exit paths),
so the venv stage fails fast instead of falsely succeeding. This is the
explicit guard originally proposed in #48463 (devorun), composed on top
of #48372's reusable helper + regression test.
Adds a regression test asserting the uv venv exit-code capture + throw.
The dashboard MCP catalog only showed name/description/transport and a
non-clickable source. Users couldn't see what an entry connects to or runs
before installing — the exact detail the docs trust model tells them to vet.
- /api/mcp/catalog now returns transport target (url, or command+args),
auth_type, git install source/ref + bootstrap commands, default-enabled
tool hint, and post-install guidance per entry.
- McpPage renders the endpoint URL (http) or command+args (stdio), the git
install source/ref, a collapsible bootstrap-commands list, setup notes,
and the source as a clickable link when it's a URL.
- Docs: drop the 'uv pip install -e .[mcp]' quick-start step (Hermes does
not support pip installs; MCP ships with the standard install) and note
the dashboard now surfaces this detail.
- Strengthen the catalog endpoint test to assert the new inspection fields.
Epic's experimental Unreal MCP plugin embeds an MCP server inside the
Unreal Editor process, served over local HTTP (127.0.0.1:8000/mcp by
default). HTTP transport, no auth, no install block — the user enables
the plugin in-editor and Hermes connects to the URL.
Also drops test_optional_mcps_manifests_ship_in_both_wheel_and_sdist:
it asserted wheel/sdist packaging targets for pip/Homebrew/Nix installs,
which Hermes does not support — installs run from the repo checkout, where
the catalog is discovered by directory iteration with no packaging step.
* fix(tui): don't make Enter swallow trailing-space-only slash completions
Submitting a slash command in the TUI took three Enter presses: one to
complete the name (/ex → /exit), a second that only appended the trailing
space the gateway adds to keep the classic-CLI prompt_toolkit dropdown open
(/exit → "/exit "), and a third to actually submit.
The composer's submit handler accepted the highlighted completion whenever
applying it changed the input at all, so the whitespace-only delta ate an
extra keypress. Treat a completion whose only change is trailing whitespace
on an already-complete token as "already complete" and fall through to
submit. Partial-name and argument completions (a real token change) still
accept on Enter as before.
The replace/accept logic is extracted into pure helpers (applyCompletion,
completionToApplyOnSubmit) in domain/slash.ts.
* test(tui): cover Enter/completion trailing-space behavior and isolate poller queue
- completionApply.test.ts asserts completionToApplyOnSubmit accepts real
token completions (partial command name, argument) but returns null for a
trailing-space-only delta on an already-complete command, so Enter submits
instead of needing extra presses.
- test_notification_poller_delivers_completion / _skips_consumed previously
shared the process-global process_registry.completion_queue. Their events
carry no session_key, so a leaked/concurrent poller could dequeue and
dispatch them to a fixture agent without run_conversation, flaking CI
("AttributeError: '_FakeAgent' object has no attribute 'run_conversation'").
Isolate the queue per test (fresh queue.Queue via monkeypatch), matching the
sibling poller tests that already do this.
The salvaged guard allowed _rmtree_writable(SKILLS_DIR) itself. No call
site ever passes the root — every site passes a skill subdir or its .bak
sibling — so allowing the root only preserves the #48200 footgun (a dest
that collapses to the root wipes every installed skill). Require a strict
strict-child relationship and update the test that documented the
nonexistent 'full reset' capability.
Defense-in-depth fix for the silent wipe of ~/.hermes/ documented in
#48200. A `hermes update --yes` run silently destroyed a user's
.env, MEMORY.md, kanban.db, custom skills, and scripts. Two changes:
1. `_rmtree_writable` in tools/skills_sync.py now refuses to rmtree
anything outside SKILLS_DIR (the HERMES_HOME/skills/ root).
All five call sites pass paths under SKILLS_DIR, so the guard is
a no-op for current code and a loud, recoverable failure for
any future regression (bad path join, malicious bundled
manifest, stale path in scope after an exception).
2. The default `updates.pre_update_backup` flips from false to
true in hermes_cli/config.py. A few minutes of zip per update
is negligible compared to silent total data loss. Still
overridable; --no-backup still works for one-off opt-out.
Five new tests in TestRmtreeWritableScopeGuard (root path,
hermes home, sibling dir, skills root itself, subdir) plus a
flipped `test_default_enabled_creates_backup` in test_backup.py.
178/178 tests pass in the two affected files. Public method
signatures unchanged, no test-stub blast radius.
Closes#48200
Review feedback from egilewski:
1. Remove trailing whitespace from test docstring and mock patches (lines 1430, 1469, 1476, 1482)
2. Expand test coverage: also verify ANTHROPIC_API_KEY is stripped (not just OPENAI_API_KEY)
Changes:
- Remove trailing whitespace from test file
- Add ANTHROPIC_API_KEY to test environment
- Add assertion verifying ANTHROPIC_API_KEY is stripped from cua-driver subprocess env
- Syntax verified: python3 -m py_compile tests/tools/test_computer_use.py ✓
- Use _sanitize_subprocess_env() to filter Hermes-managed credentials
from the cua-driver subprocess environment (issue #37878)
- Prevents credential exfiltration to the third-party cua-driver binary
- Aligns with existing pattern used by browser-tool and other tools
- Add regression test to verify environment sanitization
The cua-driver is a lower-trust MCP subprocess per SECURITY.md §2.3.
Its inherited environment is now scrubbed by default, removing provider
API keys, gateway tokens, and platform credentials that should not leak
to third-party binaries.
Fixes#37878
PR #47792 pinned Electron to an exact 40.10.2 and regenerated the root
package-lock.json (dropping @electron/get@5 + @electron-internal/extract-zip,
restoring @electron/get@2 + extract-zip@2 + yauzl), but did not refresh the
shared npmDepsHash in nix/lib.nix. The hash still described the previous
40.10.3 lockfile, so npmConfigHook fails on every Nix build with
"npmDepsHash is out of date" for hermes-tui / hermes-web / hermes-desktop.
Regenerate the single shared hash to match the current lockfile.
Verified with fetchNpmDeps (authoritative, not prefetch-npm-deps):
nix build .#tui.npmDeps -> builds clean
nix build .#tui -> Validating consistency -> Installing dependencies
-> Finished npmConfigHook (no hash error)
The `before-quit` handler tears down the bootstrap controller, preview
watchers, and the Python backend but never disposes live PTY sessions.
When `app.quit()` proceeds to `FreeEnvironment()`, node-pty's
`ThreadSafeFunction::CallJS` callback fires on a half-torn-down
environment, throws a C++ exception that can no longer be caught, and
the process aborts (microsoft/node-pty#904).
Iterate `terminalSessions` and call `disposeTerminalSession()` (which
already calls `pty.kill()` + deletes the map entry) before killing the
backend, so the ThreadSafeFunctions are removed before teardown begins.
Closes#48335
The TUI banner reported fewer tools than the classic CLI for the same
config (e.g. 32 vs 38) when an MCP server connected slowly. Root cause:
the agent snapshots `agent.tools` once at build time and never re-reads
the registry. `_make_agent` briefly joins the background MCP discovery
thread (`wait_for_mcp_discovery`, ~0.75s) so fast servers land in that
snapshot, but a server slower than the bound — common for an HTTP MCP
server on first connect — lands *after* the agent is built. Its tools are
then absent from both the agent (uncallable until `/reload-mcp`) and the
banner for the whole session.
The classic CLI doesn't hit this because it re-derives
`get_tool_definitions()` at banner render time (which re-waits for
discovery), so it picks the late tools up.
Fix: after a fresh agent is built and its first `session.info` emitted,
if discovery is still in flight, schedule an off-critical-path daemon that
waits for it to finish, then rebuilds the tool snapshot and re-emits
`session.info` — the same rebuild `/reload-mcp` performs, but automatic.
Both the agent's callable tools and the banner count catch up.
Cache safety: the rebuild runs only while the session is still
pre-first-turn (`_user_turn_count`/`_api_call_count` both 0 → nothing
cached to invalidate). Once the user has sent a message we leave the
snapshot frozen rather than break the cached prompt prefix mid-conversation;
late tools then require an explicit `/reload-mcp` (user-consented), exactly
as today. No-op when discovery finished before the agent build, when the
join times out, when the registry was unchanged, or when the session was
swapped/closed while waiting.
Adds entry.mcp_discovery_in_flight() / join_mcp_discovery() accessors and
covers the matrix (added/none/post-turn/timeout/unchanged/replaced) with
unit tests.
The TUI banner footer used the raw `info.mcp_servers.length`, so a
configured-but-disabled server (e.g. `linear`) was counted alongside
connected ones. With a disabled `linear` and a connected `nous-support`,
the TUI reported "2 MCP" while the classic CLI correctly reported "1 MCP"
(`mcp_connected = sum(1 for s in mcp_status if s["connected"])` in
hermes_cli/banner.py).
The collapse toggle even labels the count "connected", which was wrong
for the same reason.
Count connected servers for both the toggle and the footer segment, and
drop the `· N MCP` segment entirely when none are connected (matching the
classic banner, which only appends it when the count is > 0). The
expandable MCP section still lists every configured server, including
disabled ones.
Invariant test renders SessionPanel and asserts the headline equals the
connected count, never the configured total.
Source-level guard (install.ps1 only runs on Windows, so there's no Linux CI
runner to execute it): the astral uv install line must be invoked via the call
operator on a resolved host variable, the bare-`powershell` literal that
produced the field-reported "The term 'powershell' is not recognized" must be
gone, and the resolver must be PATH-independent (Get-Process -Id $PID) and
pwsh-aware.
The Windows installer's Install-Uv spawned the astral uv installer with a
hardcoded bare `powershell -ExecutionPolicy ByPass -c "irm .../uv | iex"`.
That name resolves only to Windows PowerShell, and only when its System32
directory is on PATH. Run under PowerShell 7+ (`pwsh`) — or any session where
`powershell` isn't on PATH — the spawn dies with "The term 'powershell' is not
recognized", and uv installation aborts (the installer then appears stuck).
Add Get-PowerShellHostExe, which prefers the absolute path of the host we're
already running in (PATH-independent), then falls back to powershell/pwsh via
Get-Command, then to the bare name. Install-Uv now invokes that resolved exe.
Add infinitycrew39@gmail.com -> infinitycrew39 to AUTHOR_MAP so the
contributor audit resolves the two cherry-picked commits from the #47945
langfuse trace-scope salvage (merged as #48292) to a GitHub handle instead
of flagging them as an unmapped author email.
The prior assertion `all("turn1" in k or "turn2" in k for k in keys)` was
weak on two counts: it passes vacuously when keys is empty (a regression
that lost all state would slip through), and after turn 2 finalizes only
turn 1 lingers, so it only ever inspected turn 1 anyway. Replace it with an
exact check that one key survives, it is turn 1, and turn 2 never merged
into it — the real isolation invariant the test name claims.
Scoping the trace key by turn_id (the prior commit) fixed cross-turn
collisions but introduced a slow leak: _finish_trace only pops a key when a
turn ends cleanly (final response has content and no tool calls), so any
turn that is interrupted, ends on a tool call, or has empty final content
now leaves its uniquely-keyed entry in _TRACE_STATE forever. Previously the
constant per-session key was overwritten by the next turn, capping growth at
~1 entry per session.
Add an LRU cap (_MAX_TRACE_STATE) enforced by _evict_stale_locked, called
under _STATE_LOCK immediately before each insert. It evicts the
least-recently-updated entries (using the previously-dead last_updated_at
field) and ends their root span so nothing dangles. Regression test drives
50 non-finalizing turns against a cap of 8 and asserts the dict stays bounded
with the most-recent turns surviving.
The turn- and api-scoped branches each repeated the same
task/session/thread fallback ladder with only the infix differing. Extract
the shared prefix into _scope_prefix so a future scope dimension touches one
ladder instead of three. The legacy branch still returns a bare task_id (not
the task: prefix) for backward compatibility, so it stays separate.
Output key strings are unchanged; a new test pins them across every
task/session/turn/api combination since the keys are matched across hooks
and any drift would silently break trace finalization.
Cleanup pass on the salvage (behavior-preserving):
- diff_bundled_skill now uses the existing _skill_file_list() helper
instead of reimplementing the rglob/is_file/relative_to file-set
enumeration inline (twice).
- Extract _is_tracked_user_modification(origin_hash, user_hash) and use
it in BOTH the sync loop and list_user_modified_bundled_skills() so the
'kept user edit' rule can't drift between the two sites.
- _read_text_for_diff -> _read_for_diff returns (bytes, text); the binary
branch now compares the bytes it already read instead of re-reading
both files from disk.
- Drop the unused 'user_present' key from diff_bundled_skill's return
contract (no consumer or test ever read it).
- test_update_modified_notice: drop the brittle '>= 2 sites' count-floor
so consolidating the two print paths into a shared helper stays a
welcome refactor; keep the per-site 'count notice => discovery hint'
invariant (still mutation-tested).
The PR added helper-level tests for _trace_key but nothing exercised the
keys through the real hooks. This adds TestTurnTraceIsolation, which drives
on_pre_llm_request / on_post_llm_call across two turns of one gateway
session (task_id == session_id, unique turn_id, api_call_count reset per
turn) and asserts each turn opens its own root trace when the first turn
fails to finalize (tool-only final step). This test fails on the pre-fix
code (only one trace opened, turn 2 absorbed into turn 1) and passes with
the scoping fix.
Also pins the turn_id-over-api_request_id key precedence: the turn-scoped
post_llm_call carries no api_request_id, so it must still resolve to the
same key as the request-scoped hooks or finalization breaks.
Salvage follow-up to the cherry-picked feat/test commits:
- W1: the unpack/install update path in main.py printed the
'~ N user-modified (kept)' notice without the new
'hermes skills list-modified' hint that the git-pull path got.
Mirror the hint to both sites so the count is actionable
regardless of which update path runs.
- W2: 'hermes skills diff <name>' (bundled-vs-stock) now shares the
verb with the gateway write-approval 'diff <id>'. The gateway
handler's docstring + truncation message pointed users to
'/skills diff <id>' on the CLI, which now resolves a bundled skill
by that name instead. Point at the pending JSON file and note the
two diff commands are distinct.
- Add an invariant test asserting every 'user-modified (kept)' notice
in main.py carries the discovery hint (guards sibling drift).
Exercises the real sync pipeline (no mocked comparison logic): a pristine
synced skill is not flagged; an edited one is listed and diffed (modified +
added files); an unknown skill returns not-ok; and `reset --restore` clears
the modified state so revert and discovery stay consistent.
`hermes update` keeps (won't overwrite) bundled skills the user edited
locally, but only printed a count — "~ N user-modified (kept)" — with no way
to learn which skills, or see what changed. Reverting already existed
(`hermes skills reset <name> [--restore]`); discovery and inspection did not.
Add two CLI commands (zero model-tool footprint), reusing the manifest
origin-hash that sync already maintains:
- `hermes skills list-modified [--json]` — list the bundled skills whose
on-disk copy diverges from the last-synced origin hash (the exact test the
sync loop uses to decide what to skip).
- `hermes skills diff <name>` — unified diff between the user's copy and the
current bundled (stock) version, so the user can confirm what changed
before reverting.
Both are mirrored as `/skills list-modified` and `/skills diff`. The
`hermes update` notice now points at `hermes skills list-modified`. Core
helpers `list_user_modified_bundled_skills()` and `diff_bundled_skill()` live
in tools/skills_sync.py alongside the existing reset logic.
Follow-up cleanup on the OpenViking setup path merged in #48262:
- _write_ovcli_config now uses utils.atomic_json_write(path, data, mode=0o600)
instead of the local _precreate_secret_file + write_text + chmod sequence.
The shared helper (already used by honcho/mem0/supermemory/hindsight) writes
via temp-file + fchmod(0600) + fsync + os.replace, so the ovcli.conf is
written atomically (no half-written secret file on crash) and with no
chmod-after-write TOCTOU window. _precreate_secret_file stays for the .env
writer path.
- Remove dead _DEFAULT_ACCOUNT/_DEFAULT_USER constants (0 references; the
empty->'default' tenant fallback lives in the _VikingClient constructor).
Tests: tests/plugins/memory/test_openviking_provider.py + test_memory_setup.py
+ openviking_plugin/test_openviking.py -> 130 passed; ruff clean.
PR infographics are decorative visual hooks for a PR body, not repo
artifacts. The established convention (commit 5772e638c, "chore: drop
in-repo infographic/ directory; keep PR-body URLs only", #30854) is to
hotlink an externally-hosted image so GitHub camo-proxies it inline,
leaving zero binary footprint in the tree.
Two such assets had been committed anyway and are referenced nowhere in
the codebase:
- docs/assets/ns504-chat-session-reconnect.png (1024-equiv, NS-504 PR
infographic, added in #47674 alongside the ChatPage.tsx fix)
- infographic/kanban-db-corruption-defense/infographic.png (re-added a
directory #30854 had explicitly removed, in #30952)
Both are unreferenced decorative infographics, so removing them has no
effect on docs, website, or app builds. Removing the latter also clears
the stray top-level infographic/ directory that #30854 had retired.
These blobs remain in history (the commits that introduced them are
already on main and bundled with real code, so they can't be dropped);
this just removes them from the working tree going forward.
Resolves conflicts from the OpenViking churn that merged after #32445 was
opened (#48042/#47662 session-switch + write hardening, #47311/#47973):
- plugins/memory/openviking/__init__.py: keep both __init__ field groups
(the PR's _runtime_start_* alongside main's _prefetch_threads/_shutting_down).
- tests/plugins/memory/test_openviking_provider.py: keep BOTH the PR's new
setup-validation tests and main's session-switch/concurrency tests (disjoint
additions to the same region).
Two fixes layered while reconciling (contributor work otherwise preserved):
- Restore the merged tenant-header contract (#22414/#21232). The PR had changed
_VikingClient defaults to '' and made empty account/user OMIT the tenant
headers; main's contract is that empty falls back to 'default' and the
X-OpenViking-Account/User headers are ALWAYS sent (ROOT API keys need them).
Reverted the constructor to 'account or os.environ.get(..., "default")' and
updated the two PR tests that asserted the omit-when-empty behavior.
- Close a secret-file TOCTOU in the setup writers. _write_env_vars and
_write_ovcli_config wrote the api_key/root_api_key file and chmod 0600
AFTERWARD, leaving a world-readable window on newly-created files. Added
_precreate_secret_file() to create with 0600 before any secret bytes land.
* fix(dashboard): stream file uploads via multipart instead of base64 JSON
The dashboard file manager uploaded files (including backup/restore zip
archives) by reading them client-side with FileReader.readAsDataURL and
POSTing a base64 data URL inside a JSON body to /api/files/upload. For a
large backup this (a) inflates the payload ~33%, (b) buffers the whole
file plus its decoded copy in memory, and (c) reliably trips an upstream
proxy body-size/timeout limit, surfacing as a 502 with the upload
appearing to hang indefinitely (NS-501). Dashboard-only hosted users have
no shell fallback to place the archive, so backup restore was unusable.
Add a streaming multipart endpoint POST /api/files/upload-stream
(UploadFile + Form) that reads the request body in 1 MiB chunks straight
to a sibling temp file, enforces the existing 100 MB size cap as it
streams (413 on overflow, before buffering the whole file), and
atomically renames into place so a partial/aborted/over-limit upload
never clobbers an existing file. The frontend api.uploadFile now sends
multipart/form-data (raw bytes, no base64, browser-set boundary) and
FilesPage passes the File object directly; the dead readAsDataUrl helper
is removed. The legacy base64 JSON endpoint stays for backward compat.
FastAPI's UploadFile/Form require python-multipart, which is NOT pulled in
by fastapi itself, so it is added to the base deps, the [web] extra, and
the tool.dashboard lazy-install set (kept in sync).
Validated: 5 new endpoint tests (roundtrip, multi-chunk >1 MiB,
over-limit 413 without clobbering + no temp-file leak, overwrite=false
conflict, forced-root traversal containment); existing base64 tests still
pass; web typecheck + vite build clean; and a real uvicorn server E2E
(5 MB multipart upload -> HTTP 200 in 0.21s, exact byte match) plus a
30 MB TestClient roundtrip confirm constant-memory streaming end to end.
Reported via beta (NS-501).
* build(deps): regenerate uv.lock for python-multipart (NS-501)
CI ran uv lock --check / uv sync --locked which failed because the
python-multipart dependency add was not reflected in uv.lock. Regenerate
the lockfile (resolves to 0.0.20, matching the [web] extra pin) after
merging current main.
Importing a backup wrote every file from the zip over the target home
wholesale. On a hosted instance this clobbered gateway_state.json with the
source machine's last recorded run/desired state — driving the container-boot
reconciler (container_boot._read_desired_state, which only auto-starts a
gateway whose state is "running") off stale/foreign state and leaving the
gateway stuck "starting", disconnected from the Nous portal.
Add _IMPORT_SKIP_NAMES (gateway_state.json, gateway.pid, cron.pid,
gateway.lock, processes.json) and skip them by basename in run_import, so both
the root profile and named profiles preserve the target's own runtime state.
This mirrors what container_boot._STALE_RUNTIME_FILES already sweeps on every
container boot, and protects against older backups that predate the
backup-side exclusions. The import summary reports which files were preserved.
This is the second half of NS-501 (filed separately as NS-508): the upload
502 was fixed in #47663; this fixes the import-breaks-the-instance half.
The gateway half of relay Phase 3. On a MANAGED boot with relay configured and
no secret pinned, the runtime self-provisions its relay credentials IN-PROCESS:
resolve the agent's own Nous access token (resolve_nous_access_token) -> POST
the connector's /relay/provision asserting its own endpoint + route keys ->
set GATEWAY_RELAY_ID/SECRET/DELIVERY_KEY into os.environ so the immediately-
following register_relay_adapter() reads them and dials out authenticated.
No human, no enrollment token, no disk write — the creds live only in process
memory (save_env_value refuses under managed anyway, and keeping the secret off
any volume is the stronger posture). Stateless: process-env creds don't survive
a restart, so a managed container re-provisions every boot; the connector's
rotation window covers a still-connected prior instance. An explicitly-pinned
GATEWAY_RELAY_SECRET is respected (skip). Self-hosted is unchanged: humans keep
using `hermes gateway enroll`.
Endpoint provenance is gateway-asserted (GATEWAY_RELAY_ENDPOINT +
GATEWAY_RELAY_ROUTE_KEYS, env or gateway.relay_* config) — uniform code path
whether the operator sets it (self-hosted) or NAS stamps it (hosted, the only
case NAS knows the public URL). Both absent -> outbound-only provisioning
(credentials, no inbound routes). The connector scopes the asserted endpoint to
the verified tenant, so it stays within the security model.
- gateway/relay/__init__.py: relay_endpoint(), relay_route_keys(),
_provision_url(), _post_provision(), self_provision_if_managed() (never
raises — a provision failure logs and boots without relay auth).
- gateway/run.py: call self_provision_if_managed() immediately before
register_relay_adapter() in the startup path.
Tests: 12 unit (trigger logic, respect-pinned-secret, in-process env wiring,
endpoint+routes vs outbound-only, fail-soft on token/connector failure);
mutation-checked (drop is_managed guard / pinned-secret guard -> tests fail).
Cross-repo live E2E driver lands on the connector side (depends on this).
EXPERIMENTAL: relay auth scheme may change until >=2 Class-1 platforms validate.
The install method (docker/git/pip/...) describes the *running binary*, but
detect_install_method() read it from $HERMES_HOME/.install_method — a shared
DATA directory. The Docker docs deliberately bind-mount $HERMES_HOME
(~/.hermes:/opt/data) so config/sessions/memory persist and can be shared with
a host-side Desktop/CLI install.
When a containerized gateway and a host install share one $HERMES_HOME, the
home-scoped stamp is a single slot describing two installs: the published image
stamps 'docker' on every boot, the host install then reads 'docker' and the
in-app updater refuses to run 'hermes update' ("doesn't apply inside the Docker
container"). Reinstalling the Desktop app from the DMG doesn't help because the
contaminated stamp is re-read every time.
Fix (option 1 — code-scoped stamp):
- detect_install_method() reads <install tree>/.install_method first (next to
the running code, immune to the shared data dir). It falls back to the legacy
$HERMES_HOME stamp for back-compat, but IGNORES a 'docker' home stamp when
not actually containerized — so already-poisoned shared homes self-heal.
- stamp_install_method() writes the code-scoped stamp.
- install.sh stamps $INSTALL_DIR instead of $HERMES_HOME.
- Dockerfile bakes 'docker' into /opt/hermes/.install_method at build time
(inside the immutable block); stage2-hook.sh no longer writes the home stamp
and proactively removes a stale 'docker' one to heal existing shared homes.
Genuine containers still resolve to 'docker' (baked stamp, or legacy home stamp
honored when containerized). Unstamped installs in generic containers still fall
through to git/pip (preserves the #34397 fix).
* feat(relay): authenticate the connector⇄gateway WS channel
The relay gateway may be customer-managed and internet-exposed, so the
connector⇄gateway channel is itself authenticated (distinct from the
platform crypto the relay path sheds). Add gateway/relay/auth.py — a
Python port of the connector's HMAC token + delivery-signature schemes
(relayAuthToken.ts / deliverySigning.ts), verified byte-for-byte against
the connector's compiled TypeScript via cross-language test vectors.
Present an Authorization bearer on the /relay WS upgrade keyed by the
per-gateway secret (resolved from GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET
in env or config). The connector rejects an unauthenticated/invalid/
revoked upgrade with close 4401.
* feat(relay): signed-HTTP inbound delivery receiver
The connector delivers normalized inbound events to a tenant's gateway
over a signed HTTP POST, not the outbound /relay WS: the connector
instance owning a platform socket is generally not the instance a given
gateway dialed out to, so inbound targets a tenant endpoint that may
load-balance across gateway instances.
Add gateway/relay/inbound_receiver.py — verifies x-relay-signature /
x-relay-timestamp over the EXACT raw request bytes (re-serializing would
break the HMAC: JS JSON.stringify is compact, Python json.dumps spaces)
against the per-tenant delivery key verify list within a 300s replay
window, then dispatches messages to handle_message and interrupts to the
interrupt handler. Wire it into the adapter lifecycle (start in connect()
when a delivery key + bind port are configured, tear down in disconnect();
a purely-outbound dev gateway runs without it).
Refine test_relay_sheds_crypto to distinguish PLATFORM crypto (Discord
ed25519, Twilio/WeCom HMAC — still shed) from the connector⇄gateway
CHANNEL auth (intended): auth.py / inbound_receiver.py are exempt from
the platform-symbol scan but still banned from importing platform-crypto
modules, plus a positive guard that auth.py uses only stdlib hmac/hashlib.
* feat(relay): hermes gateway enroll CLI
Add the gateway half of zero-touch enrollment. `hermes gateway enroll`
resolves a fresh Nous Portal access token (the tenant-proving identity),
POSTs {enrollmentToken, gatewayId} to the connector's /relay/enroll, and
persists GATEWAY_RELAY_ID / GATEWAY_RELAY_SECRET / GATEWAY_RELAY_DELIVERY_KEY
to ~/.hermes/.env. The per-gateway secret authenticates the WS upgrade;
the per-tenant delivery key verifies signed inbound deliveries.
Refuses under is_managed() (hosted installs get the secret stamped in by
the orchestrator). Added as an 'enroll' subcommand on the existing
gateway subparser — not a new top-level command.
* docs(relay): inbound is signed HTTP, not WS; document channel auth
Fix the stale contract: §3/§5 said inbound rode the WS socket (single-
instance only, predates the multi-instance socket-ownership + channel-auth
model). Inbound + connector→gateway interrupt are signed HTTP POSTs to the
tenant endpoint. Add §6.1 documenting the two channel-auth schemes (per-
gateway WS-upgrade secret, per-tenant inbound delivery key) and how they
differ from the platform crypto the relay path sheds.
* test(relay): update build_gateway_parser callers for cmd_gateway_enroll
The enroll subcommand added cmd_gateway_enroll as a required keyword-only
arg to build_gateway_parser, but two existing parser-extraction tests still
called it with only cmd_gateway/cmd_proxy — failing CI with TypeError.
Thread the new handler through both call sites and add a test asserting
`gateway enroll` dispatches to cmd_gateway_enroll with its flags parsed.
* fix(docker): supervised gateway uses --replace to take over stale holder
Inside the s6 container image the per-profile gateway service rendered a
bare `hermes gateway run` (no --replace). When a gateway is started
OUTSIDE s6 — a stray shell `hermes gateway run`, an agent action, or the
Open WebUI helper (scripts/setup_open_webui.sh) — it grabs the
per-HERMES_HOME PID lock first. The supervised slot then execs the bare
`gateway run`, hits the "Another gateway instance is already running"
guard, exits non-zero, and s6 restarts it: a restart loop that floods the
log every ~12s and never binds. The container looks up but the gateway is
permanently down, and dashboard-only users (no shell) cannot recover.
Render the supervised run script as `gateway run --replace` so s6 is
authoritative for its slot: it reaps the stale holder via the hardened
takeover path (takeover marker + SIGTERM->SIGKILL-with-confirmation +
scoped-lock cleanup in gateway/run.py) and binds. This matches the
systemd service path, which already builds its argv with --replace
(_build_gateway_argv / 'nohup hermes gateway run --replace'), and the
intent already documented in _maybe_redirect_run_to_s6_supervision. The
existing HERMES_S6_SUPERVISED_CHILD sentinel still prevents the
run->start->run redirect recursion. Each profile is scoped to its own
HERMES_HOME and s6 guarantees one supervised instance per slot, so there
is no legitimate supervised sibling for --replace to clobber.
Reported via beta (NS-505): gateway.log showed PID 17907 'running
(manual process)' with the guard error repeating every ~12s on
v2026.6.5.
Adds a regression test asserting every gateway-run exec line in the
rendered script (default + named profile, both privilege branches)
carries --replace, and updates the existing render-script assertion.
* fix(ci): remove stray .venv symlink committed into repo
The PR's commit accidentally tracked a .venv symlink pointing at the
developer's local venv (mode 120000 -> /home/ben/nous/hermes-agent/.venv).
The CI test/e2e/build jobs run `uv venv` to create .venv and failed with
`failed to create directory .venv: File exists (os error 17)` because the
checkout already contained the symlink. All test shards aborted in <15s
during setup, before any test ran.
Untrack the symlink and add a bare `.venv` entry to .gitignore (the
existing `.venv/` rule only matches a directory, so a symlink slipped
through).
Salvage corrections on top of @XVVH's #44341:
- Make native web_search injection a 1:1 swap for an already-present client
web_search function, NOT an additive grant. The original unconditionally
appended {"type":"web_search"} on every is_xai_responses turn with any
tools, force-enabling Grok server-side search even when the user never
enabled the web toolset (bypassing Hermes web-provider config + tool-trace
plumbing). Now gated on a client web_search actually being present.
- Reconcile grok-composer context to 200000 (merged in #47908) rather than
262144; 200k is xAI's published usable context window for Composer 2.5,
262144 is the /v1/responses input+output budget.
- Update tests to match scoped behavior + add a no-web-toolset guard test.
- AUTHOR_MAP entry for #44341 salvage.
Incomplete-guard (server-side *_call items at in_progress no longer flip
has_incomplete_items) and preflight built-in-tool allowlist kept as-is.
- model_metadata: grok-composer-2.5-fast → 262144 (OAuth slug not in /v1/models)
- codex transport: inject native {"type":"web_search"} for is_xai_responses;
drop client web_search to avoid duplicate-name 400s
- codex adapter: do not treat in-progress server-side *_call items as incomplete
- tests: adapter, transport build_kwargs, model_metadata, oauth recovery
The desktop self-update runs `hermes update` then `hermes desktop
--build-only`, and only relaunches if the rebuild returns 0. The first
`--build-only` can exit nonzero on a still-settling post-update tree or a
network-blocked Electron fetch that the installer's self-heal repaired
mid-run — so both updaters (the Tauri setup binary and the in-app POSIX
path) bailed before the relaunch step. The update landed but the app
never restarted; a manual launch worked because the heal had completed.
Retry `--build-only` once in both paths before failing, mirroring the
retry-once `hermes update` already does (and the CLI `hermes update`'s
own desktop rebuild). A second run builds clean off the healed dist and
is a near-no-op when the first actually succeeded (content-hash stamp).
- update.rs: retry stage 2; add rebuild_needs_retry() + test
- main.cjs: retry via new update-rebuild.cjs helper (behavior-tested)
Weak open models (mimo, nemotron-class) that see tool-call XML/JSON sitting in
file contents or tool output get primed and emit their own structured tool
calls mimicking the payload — usually with an empty/whitespace name. Those
calls can't be fuzzy-repaired toward a real tool, so the dispatch loop returns
an error and the model retries. Before this fix, every empty-name error dumped
the full tool catalog back to the model, which fed the priming loop more names
to mimic and inflated context 3-4x across the retry budget.
A blank/whitespace-only tool name now gets a terse anti-priming error that
tells the model in-context tool-call syntax is DATA, with no catalog dump. A
genuinely-wrong-but-nonempty name (a real typo) still gets the full catalog so
the model can self-correct.
Not a sandbox/auth boundary issue: Hermes never parses tool-call text from
content into executable calls (structured tool_calls only; the lone text->call
parser is the Copilot ACP transport and it also rejects empty names). The
reporter's own debug dump confirms the injection never executed.
Behavior-contract test added: empty-name -> terse error, no catalog; nonempty
unknown -> catalog preserved. Exercised end-to-end via run_conversation against
an in-process mock provider.
* fix(dashboard): recover the Chat tab when the agent session ends (NS-504)
In the dashboard Chat tab, when the agent process exits — the user types
`/exit`, or starts a new session that ends the current PTY child — the
`/api/pty` WebSocket closes with a normal code (not one of the
4401/4403/4404/4408/1011 rejection codes the server emits). The frontend
handled only those rejection codes; the normal-exit fallback just printed
"[session ended]" into the dead terminal and stopped, with `wsRef` nulled
and no respawn path. The only recovery was a full page refresh — exactly
the beta report ("typing /exit breaks functionality, no way to restart
without refreshing"; "starting a new session completely breaks the
agent").
On a clean/normal close the Chat tab now flips `sessionEnded` and renders
an in-place "Start new session" overlay (mirroring ChatSidebar's existing
reconnect affordance). Clicking it bumps a `reconnectNonce` that is a
dependency of the connect effect, so the effect tears down and re-runs,
spawning a fresh PTY in place — no page refresh. `onopen` clears the
flag so a successful reconnect dismisses the overlay.
An explicit button (rather than auto-respawn) is deliberate: if the agent
is crash-looping, auto-respawn would hide the failure and spin; the user
stays in control.
Verified against a live uvicorn `/api/pty` socket: a child that exits
closes with a non-rejection code (client sees close_code None / 1000-class),
which is precisely the branch that now sets sessionEnded=true. web
typecheck + vite build clean.
Reported via beta (NS-504).
* docs(assets): add NS-504 chat session recovery infographic
* feat(mcp): raise default tool-call timeout 120s -> 300s
Port from openai/codex#28234. Long-running MCP tools (web fetches,
sandboxed builds, deep-research servers) routinely exceed 120s, causing
spurious timeout failures. Codex bumped its default MCP tool timeout from
120 to 300 for the same reason.
- _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server
'timeout' config override unchanged)
- update test_default_timeout assertion
- document the default in mcp-config-reference.md
* fix(dump): show commit date instead of release date in hermes dump
The version line in `hermes dump` (the top of the /debug report) appended
the package release date in parentheses, which reads like a wall-clock
"generated at" timestamp and confuses support triage. Replace it with the
date the HEAD commit was actually made, resolved live via
`git log -1 --format=%cd --date=short`, kept next to the commit SHA.
On Docker/wheel installs with no .git the date resolves to '' and the
suffix is simply omitted (the baked SHA still identifies the build).
* fix(desktop): resolve electronDist dynamically + self-heal blocked installs
Supersedes the static-path approach (#48081) and the install-step self-heal
(#48082) with a fix that removes the whole failure class instead of chasing each
symptom. Three distinct faults converged into the June desktop-build outage; this
closes all three.
Root cause (the part #48081 left open — "Gap B"):
build.electronDist was a static relative path in apps/desktop/package.json, but
npm workspace hoisting is NOT deterministic — depending on the npm version and
what else is installed, npm nests the workspace-only electron devDep under
apps/desktop/node_modules/electron OR hoists it to the repo root. A static path
matches only one layout, so a clean install intermittently fails with "The
specified electronDist does not exist". #48081 re-pointed the path at the
nested layout (correct today) but electron-builder reads electronDist
STATICALLY, so any future hoist change silently breaks it again — only caught
by a CI invariant, never self-corrected.
Fix:
- scripts/run-electron-builder.cjs: resolve electron the way Node's runtime does
— require.resolve("electron/package.json") walks node_modules from the desktop
project upward and finds electron wherever npm actually put it. The path can
never drift out of sync with the install layout again, on any OS/npm version.
* dist present -> pass -c.electronDist=<abs>/dist so electron-builder reuses
the unpacked runtime (keeps the #38673 fast path that dodges the 26.8.x
missing-binary re-unpack bug).
* dist absent -> omit electronDist; electron-builder fetches Electron itself
via @electron/get honoring electronVersion + ELECTRON_MIRROR.
package.json: builder script now runs the wrapper; the static build.electronDist
is removed (the resolver owns it).
- main.py / install.sh / install.ps1: on a dependency-install failure where the
electron package staged but its dist is missing (electron's install.js
process.exit(1) on a blocked/throttled binary download — #47266/#47917/#48021),
repopulate the dist via electron's downloader (canonical, then npmmirror.com)
and CONTINUE to the build instead of aborting. npm runs postinstall LAST, so
the only casualty is electron/dist; bailing here is what made the pack-time
mirror self-heal unreachable on a blocked network. Hard-fail only when electron
never staged at all (a genuine dependency error).
- The pack-time mirror fallback now retries the build even when the pre-fetch
can't populate the dist: the wrapper lets electron-builder download Electron
itself via the mirror, so the retry is no longer a no-op (it was, when
electronDist was a static path).
The exact 40.10.2 pin (already on main) keeps the third mode — the native
@electron-internal/extract-zip win32 binding that 40.10.3/40.10.4 ship without a
published prebuild — from recurring.
Tests:
- test_desktop_electron_pin.py: replace the static-path-matches-lockfile
invariant with contracts that there is no hardcoded electronDist to drift, the
builder script routes through the resolver, and the resolver uses Node module
resolution + injects -c.electronDist.
- test_gui_command.py: install-failure self-heal continues to build; genuine
(electron-never-staged) install failure still hard-fails; pack retries under
the mirror even when the pre-fetch is blocked.
Salvages/supersedes the overlapping community work in #48003 (sitkarev),
#48012 (omegazheng), #48033 (james47kjv), and #48082.
Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>
* fix(desktop): narrow Electron self-heal to real missing-dist failures
Follow-up on #48091 to remove the remaining misdiagnosis risk from the
installer/build fallback path (#46785 concern): only take the Electron
repair/retry path when Electron's package files are staged and dist is actually
missing/corrupt.
- main.py: add _electron_pkg_staged_missing_dist() and use it to gate install
failure recovery; fail fast for unrelated npm install errors.
- main.py/install.sh/install.ps1: run cache purge + retry only when dist is
missing; do not retry unrelated tsc/vite/build failures under an
Electron-specific narrative.
- install.sh/install.ps1: tighten install-stage self-heal guard to require both
package.json + install.js and missing dist.
- tests: add coverage that install failure hard-fails when Electron dist already
exists, and update retry test to reflect the tightened recovery condition.
Validation:
- Python tests: 64 passed
- install.sh-related tests included in the run
- Real mac build on this machine:
- npm ci at repo root: success
- cd apps/desktop && npm run pack: success
- electron-builder packaged darwin arm64 and used custom unpacked Electron dist
* refactor(desktop): trim electron self-heal helpers and comments
Deduplicate mirror-retry into _try_redownload_electron_dist / shell
counterparts; shorten wrapper and install-script commentary without
changing recovery semantics.
---------
Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>
- test_ws_transport.py: drives WebSocketRelayTransport against a REAL in-process
websockets server (not a mock socket): handshake (hello->descriptor), inbound
frame -> handler, outbound request/response correlation, follow_up routing,
and clean disconnect failing pending waiters. Skips if websockets is absent.
- test_relay_registration.py: rewritten for the config-driven gate — registers
when GATEWAY_RELAY_URL is set / an explicit url is passed / force=True; no-op
without a URL; trailing slash stripped; adapter constructs through the registry.
Full relay suite: 57 passed.
Wire the relay adapter into gateway startup and make activation config-driven
instead of a dark-launch flag.
- gateway/relay/__init__.py: replace relay_enabled()/HERMES_GATEWAY_RELAY with
relay_url() (GATEWAY_RELAY_URL env or gateway.relay_url in config.yaml) — the
same shape as gateway.proxy_url. register_relay_adapter() registers when a URL
is configured and builds a live WebSocketRelayTransport; with no URL it's a
no-op (direct/single-tenant deployments unaffected). force=True keeps the
transport-less adapter for unit tests. relay_platform_identity() reads the
hello platform/botId from GATEWAY_RELAY_PLATFORM/GATEWAY_RELAY_BOT_ID.
- gateway/run.py: call register_relay_adapter() during GatewayRunner.start(),
right after plugin discovery, so a configured connector relay is registered
on every boot. Failures are logged, never block startup.
This removes the dark-launch posture: the relay is on whenever it's configured,
shipping the production end state rather than hiding it behind a flag.
Adds the concrete transport behind the RelayTransport Protocol — the missing
'later-phase work' the relay scaffold deferred. The gateway dials OUT to the
connector over a WebSocket and speaks the newline-delimited JSON frame protocol
(docs/relay-connector-contract.md; connector src/relay/protocol.ts):
- connect(): opens the ws, sends hello{platform,botId}, starts a background
read loop, and resolves handshake() when the connector's descriptor frame
arrives.
- inbound frames -> the registered InboundHandler (rebuilt into a MessageEvent
via _event_from_wire, mapping the snake_case SessionSource wire form back
onto the gateway dataclasses).
- send_outbound / send_follow_up / get_chat_info: request/response correlated
by a uuid requestId against a per-request future, with a timeout so a caller
never hangs; send_interrupt is fire-and-forget.
- disconnect(): cancels the reader, closes the ws, and fails any in-flight
outbound waiters with a structured error.
RelayAdapter.connect() now negotiates the real CapabilityDescriptor from the
transport and adopts it (_apply_descriptor updates MAX_MESSAGE_LENGTH +
markdown surface), replacing the construction-time placeholder. Lazy
'import websockets' mirrors gateway/platforms/feishu.py; WEBSOCKETS_AVAILABLE
gates construction.
The contract's §6 still said the connector 'forwards the signed body
byte-for-byte so the gateway's existing crypto validates against unmodified
bytes.' That model is incoherent under an untrusted, disposable tenant
gateway on a shared bot:
- re-validating Twilio HMAC / WeCom crypto needs the shared signing secret
(handing it over IS the cross-tenant leak),
- WeCom payloads are encrypted with that secret (the connector must decrypt
at the edge just to route),
- a Discord interaction token lives inside the signed body — you can't both
preserve the bytes and strip the credential.
Rewrites §6 to the actual model: the connector is the SOLE crypto/identity
boundary — verifies/decrypts at the edge, normalizes to a tenant-scoped
MessageEvent, strips shared-identity capabilities into its vault, and
forwards only the sanitized event. The gateway re-validates nothing (the
invariant test from the crypto-shed commit enforces this). Notes that this
unifies the passthrough + relay planes and points to the connector repo's
capability-trust-boundary.md.
Also documents the follow_up op in §4 (token-less capability action added
in the previous commit). The conformance test (§2/§3 tables) stays green;
contract is unpublished/EXPERIMENTAL so no version-bump ceremony. 55 passed.
The relay outbound surface had send/edit/typing but no way to act on a
SHARED-identity capability (e.g. a Discord interaction follow-up token,
~15min) that the connector captured + stripped at the edge. Under A2 that
credential never reaches the gateway, so the gateway can't just 'send with
the token' — it needs a semantic op naming the session it's already in.
Adds the follow_up op end to end on the gateway side:
- RelayTransport.send_follow_up(action): protocol method. Action carries
op='follow_up' + session_key + kind + content (+ metadata) and NO token.
- RelayAdapter.send_follow_up(session_key, kind, content, metadata): builds
that action and returns a SendResult. The connector resolves the real
capability (its resolveOutboundCapability), enforces the tenant match so
tenant B can't wield tenant A's capability, and egresses; success=False
when the capability is absent/expired/mismatched (nothing to retry — a
leaked gateway holds zero capability material).
- StubConnector records follow_ups + a canned next_follow_up_result.
Tests: round-trips without a token; the wire action carries only session
refs (no credential value field — the 'kind' string is a type ref, not the
secret); failure surfaces when the connector can't resolve; no-transport
fails cleanly. 55 passed. §4 doc entry follows in the contract-rewrite commit.
Under the A2 trust model the connector is the SOLE crypto/identity
boundary: it verifies/decrypts every inbound platform payload at the edge
(it holds the tenant secrets), normalizes to a tenant-scoped MessageEvent,
and forwards only the sanitized event. The gateway re-validates nothing —
it cannot without being handed the shared signing secret, which on a
shared bot is itself the cross-tenant leak.
The relay path already imports no platform-crypto today; this locks that
in as an enforced invariant so nobody bolts re-validation (Discord
ed25519, Twilio HMAC, WeCom BizMsgCrypt, generic webhook signature checks)
onto the relay later and silently re-couples the gateway to platform
secrets it must never hold. Verification stays in the direct platform
adapters (gateway/platforms/*) which serve non-relay deployments.
- test_relay_package_imports_no_platform_crypto: AST-walks gateway/relay/*
and fails on any import of a platform-crypto/verification module.
- test_relay_package_calls_no_signature_verification: fails on any
verification-symbol reference (ed25519/hmac/bizmsg/verify_*).
Invariants (assert the relation 'relay re-validates nothing'), not frozen
snapshots. Verified the guard bites: injecting a wecom_crypto import makes
it fail, removing it goes green. docs §6 rewrite follows in a later commit.
The Phase 1 exit gate requires BOTH Discord and Telegram to round-trip
through the relay stub, but test_relay_roundtrip.py only covered Discord.
Add the Telegram companion exercising its distinct discriminator profile:
- no guild_id — two chats isolate on chat_id alone
- forum topics share one chat_id and isolate by thread_id (the Telegram
analog of Discord per-guild isolation), shared across participants by
default (thread_sessions_per_user=False)
- DM isolation by chat_id
- utf16 len_unit + markdown_v2 dialect round-trip and configure the adapter
- outbound send round-trips through the stub
Proves the CapabilityDescriptor + build_session_key generalize beyond
Discord, not just the struct (which the descriptor unit tests already
covered).
Add an invariant test pinning docs/relay-connector-contract.md to the
Python source of truth so the doc (which the connector repo mirrors by
hand) cannot silently drift:
- CapabilityDescriptor §2 table ⟷ dataclass fields + required/optional
- SessionSource wire keys (to_dict output) ⟷ §3 documented fields
- per-platform discriminator columns exist as real SessionSource fields
- guard that is_bot stays off the wire until deliberately promoted
Writing the test surfaced a real gap: §3 only enumerated 5 discriminators
in its per-platform table while to_dict() emits 12 keys. Seven wire keys
the connector must populate (chat_name, chat_topic, user_id_alt,
chat_id_alt, parent_chat_id, message_id, user_name) were undocumented —
a connector author reading the doc would never know to set them. Added a
complete SessionSource wire-field table to §3. The connector's existing
contract.ts already carries all 12, so no connector change is needed; the
doc was the lagging artifact.
The platform-connected-checker invariant test requires every built-in
Platform enum member to have either a generic token path or a bespoke
entry in _PLATFORM_CONNECTED_CHECKERS. Platform.RELAY was added without
one, so test_all_builtins_have_checker_or_generic_token_path failed.
Relay dials OUT to a connector and is 'connected' once an endpoint URL
is configured (extra['relay_url'] or extra['url']); the capability
descriptor is negotiated at handshake time, so the URL is the only
config-level signal in the experimental phase. Add the checker plus a
synthetic-config case exercising its True path.
CI guard: fails if gateway/ or plugins/ ever imports the test-only stub
connector or defines StubConnector. Matches code leaks (imports / class defs),
not prose mentions, so the transport.py docstring reference to the stub's path
is allowed.
Phase 1 complete. Task 1.6 of the gateway-relay plan.
Formal interface between the Hermes gateway (RelayAdapter) and the Node
connector repo: handshake, CapabilityDescriptor field table, MessageEvent
inbound envelope with per-platform SessionSource discriminators (Discord
guild_id is REQUIRED for server isolation), outbound action set, /stop
interrupt routing, signed-body verify-at-edge/byte-preserving rule, and the
additive-only contract_version policy. Documents bot-identity-vs-tenant
separation so single-bot consolidation (Phase 6) stays open. Read-first
artifact for the connector implementer.
Phase 1, Task 1.5 of the gateway-relay plan.
RelayAdapter.on_interrupt(session_key, chat_id) bridges a connector-delivered
mid-turn /stop into the existing interrupt_session_activity path, setting the
per-session _active_sessions Event and clearing typing — cancelling exactly the
targeted session's turn without touching siblings (mirrors test_stop_thread_
sibling isolation). Transport.send_interrupt carries the gateway-side egress to
the connector for socket-owner routing.
Phase 1, Task 1.4 of the gateway-relay plan.
register_relay_adapter() registers the generic 'relay' platform via the same
PlatformRegistry path as plugin adapters — no core dispatch changes. OFF by
default (dark-launch): only registers when HERMES_GATEWAY_RELAY is truthy (or
force=True for tests), so existing single-tenant/direct deployments are
unaffected. Factory builds a transport-less RelayAdapter with a placeholder
descriptor; the real descriptor is negotiated at handshake.
Phase 1, Task 1.3 of the gateway-relay plan.
Defines RelayTransport (lifecycle/handshake/inbound/outbound/interrupt) as the
gateway<->connector wire contract; RelayAdapter.connect now registers an inbound
handler that bridges connector-delivered MessageEvents into handle_message.
Adds an in-memory StubConnector under tests/ and an E2E round-trip proving:
connect registers the handler, inbound events reach the adapter, guild_id drives
build_session_key isolation (two guilds -> two keys; same guild/channel/user ->
one), outbound send round-trips, get_chat_info is proxied.
Phase 1, Task 1.2 of the gateway-relay plan.
One BasePlatformAdapter subclass that reads its capability profile from a
CapabilityDescriptor: MAX_MESSAGE_LENGTH attribute, message_len_fn (table-driven
by len_unit: chars=len, utf16=Telegram-style code units), supports_draft_streaming.
Implements the four abstract methods (connect/disconnect/send/get_chat_info) by
delegating to an injected RelayTransport (full protocol lands in Task 1.2). Adds
Platform.RELAY enum member. No per-platform gateway code.
Phase 1, Task 1.1 of the gateway-relay plan.
CapabilityDescriptor.from_platform_entry() projects an existing PlatformEntry
(label, max_message_length, emoji, platform_hint, pii_safe, name) into a
descriptor, proving the descriptor is a projection of existing config rather
than a parallel concept. Runtime-only capabilities (len_unit, draft/edit/
thread/markdown) are caller-supplied. max_message_length==0 ('no limit') maps
to the stream_consumer 4096 default.
Phase 0 complete. Task 0.3 of the gateway-relay plan.
Behavioral regression harness locking the capability surface that the future
RelayAdapter must reproduce: the abstract-method set (connect/disconnect/send/
get_chat_info), message_len_fn default, supports_draft_streaming default, and
the stream_consumer MAX_MESSAGE_LENGTH attribute read. Passes on main before
any RelayAdapter exists.
Phase 0, Task 0.1 of the gateway-relay plan.
After the June lockfile regeneration (#46652) floated electron and reshuffled
npm workspace hoisting, the desktop pack fails with "The specified electronDist
does not exist". apps/desktop/package.json pointed electronDist at the repo
root (../../node_modules/electron/dist) while npm now installs electron nested
under apps/desktop/node_modules/electron. The two contradict, so a clean
install can never package the app (Windows + macOS).
- electronDist -> node_modules/electron/dist (resolved relative to apps/desktop,
i.e. the workspace-local install npm actually produces).
- hermes_cli/main.py, scripts/install.sh, scripts/install.ps1: add a runtime
electron-dir resolver that prefers apps/desktop/node_modules/electron and
falls back to the root hoist, so dist checks + the mirror re-download work
under either npm layout.
- patch-electron-builder-mac-binary.cjs: try the workspace-local Electron.app
before the root hoist in the macOS binary-restore fallback (sibling site no
PR touched).
- test: assert build.electronDist resolves to where the lockfile installs
electron, so a future hoist change (root <-> nested) can't silently break it.
Salvages the overlapping work in #48003 (sitkarev), #48012 (omegazheng), and
#48033 (james47kjv).
Co-authored-by: sitkarev <59806492+sitkarev@users.noreply.github.com>
Co-authored-by: omegazheng <zheng@omegasys.eu>
Co-authored-by: james47kjv <220877172+james47kjv@users.noreply.github.com>
* fix(desktop): recover stranded session windows when resume fails
Opening a session in a new window (or any routed resume) could latch the
thread loader on "session" forever — the reported "stays stuck loading,
even after a nap" bug. Two compounding causes:
1. use-session-actions.resumeSession's catch ran the REST transcript
fallback OUTSIDE its own try. When session.resume rejected AND the
fallback also threw (the common case on a wedged/unreachable backend),
the throw skipped setMessages and left activeSessionId null with an
empty transcript — exactly the state the loader gates on
(messagesEmpty && !activeSessionId), with no terminal/error state.
2. use-route-resume's self-heal could never re-fire: resumeSession sets
selectedStoredSessionIdRef synchronously at entry (before failing), so
stuckOnRoutedSession stays false, and on an already-open idle window
neither pathnameChanged nor gatewayBecameOpen fire again. The window
never retried — naps, focus, nothing recovered it.
Fix:
- Wrap the REST fallback in its own try so a fallback failure can't strand
the loader.
- Add $resumeFailedSessionId: armed on terminal resume failure, cleared at
the next resume's entry (and left clear on success).
- use-route-resume gains a bounded backoff auto-retry (4 attempts, 1s→8s)
that re-resumes while the routed session matches the failure flag, with a
fire-time liveness recheck so a recovered session isn't double-resumed.
Regression tests cover: fallback-wrap arming the flag without throwing,
flag cleared on success, retry fires on backoff, no retry for a
non-routed/recovered session, and the retry cap.
* feat(desktop): show error + manual Retry when resume retries exhaust
When a stranded session window's bounded auto-retry gives up (gateway
resume RPC + REST fallback fail through all MAX_RESUME_RETRIES attempts),
the loader latched forever. Add a $resumeExhaustedSessionId atom armed at
the give-up point so the chat view swaps the perpetual spinner for an
explicit error state + manual Retry button. Retry / reconnect / reselect
clears the latch and resets the auto-retry counter for a fresh cycle; a
route-change away from the stranded session also clears it.
Distinct from $resumeFailedSessionId (armed during the backoff window) so
the error UI only appears once auto-recovery has actually given up, not
mid-retry. Adds i18n strings across en/ja/zh/zh-hant and 3 tests covering
latch-arms-on-exhaustion, stays-clear-while-retries-remain, and
clears-on-route-change.
* fix(desktop): address review on stranded-resume recovery layer
Follow-up to review on #47655 (PR head 253bfc0e3). Four issues on the
recovery layer:
1. (blocking) Arm $resumeFailedSessionId only when the transcript is still
empty after the REST fallback ($messages.get().length === 0), matching the
atom's documented contract and the loader's messagesEmpty gate. Previously
armed on any resume-RPC reject regardless of fallback outcome, so a window
that recovered its history via REST still auto-retried and, on exhaustion,
blanked the visible transcript behind the error overlay.
2. Reset the bounded-retry attempt counter on the $resumeExhaustedSessionId
armed->cleared edge so a manual Retry / reconnect / reselect on the SAME
stranded session gets a fresh backoff cycle, not a single one-shot attempt
that immediately re-arms the error. (Keyed on the exhausted latch rather
than the resumeFailedSessionId null->value transition the review suggested:
the auto-retry loop itself toggles resumeFailedSessionId every cycle, so
keying the reset there would defeat the MAX_RESUME_RETRIES cap. Only
resumeSession clears the exhausted latch, making its clear edge the
unambiguous manual-retry signal.)
3. Advance retryAttemptRef only when the timer actually dispatches a resume,
not at schedule time. Prevents unrelated dep changes during the 1s-8s
backoff window (transient gatewayState flip, non-stable resumeSession) from
burning attempts and hitting MAX with fewer than 4 real resume attempts.
4. Drop unrelated blank-line-only insertions in store/session.ts and
use-session-actions.ts to keep the diff tight.
Tests: +3 (RPC-fails-REST-succeeds-no-arm; manual-retry-fresh-cycle;
no-attempts-burned-on-dep-churn). All 19 resume tests + full session-hook
suite (65) pass; tsc --noEmit clean.
---------
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
* fix(photon): preserve text in mixed iMessage attachments
When an iMessage bubble carried both text and an attachment, spectrum-ts'
inbound mapper returned only buildAttachmentMessage(...), dropping the user's
typed text before Hermes could see it. The Photon adapter then had no 'group'
content path, so the text was lost entirely.
- adapter.py: handle a new 'group' content type that flattens text + attachment
items, preserving the typed text alongside cached media (extracted shared
_normalize_binary_payload helper).
- sidecar: emit 'group' content in normalizeContent, and ship
patch-spectrum-mixed-attachments.mjs which patches spectrum-ts' pinned mapper
(at npm postinstall AND at sidecar startup, so existing installs self-heal).
Windows robustness fixes on top of the original PR:
- The patcher's CLI guard used 'import.meta.url === file://${argv[1]}', which
never matches on Windows (file:/// + drive letter) — it silently no-opped.
Switched to pathToFileURL(argv[1]).href.
- The patcher matched \n-joined strings, so a CRLF checkout (Windows git
autocrlf) defeated every replacement. It now normalizes CRLF->LF for matching
and restores the original EOL style on write.
Co-authored-by: Yuhang Lin <yuhanglin@YuhangdeMac-mini.local>
* chore: map YuhangLin contributor email for attribution (#46513)
---------
Co-authored-by: Yuhang Lin <yuhanglin@YuhangdeMac-mini.local>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
A failed turn leaves a red error banner inline in the transcript. These
errors are renderer-local state (never persisted) and stay pinned to the
message until the session is reloaded, so a stale, no-longer-relevant
error (e.g. a transient provider/inference error) lingers with no way to
clear it.
Add an 'x' dismiss button inside the existing MessagePrimitive.Error
block. Clicking it clears the error from BOTH the live $messages view
and the per-runtime session cache — the view first, because
preserveLocalAssistantErrors re-grafts any still-errored message it finds
in the view onto the next session.info flush, so clearing only the cache
would let the heartbeat resurrect the banner. A bare error placeholder
(no streamed content) is dropped entirely; a turn that streamed partial
output before failing keeps its text and just sheds the error.
The control only renders when an onDismissError handler is wired, so
secondary/embedded Thread usages are unaffected. Adds the dismissError
string to all four locales (en/ja/zh/zh-hant) and two behavior tests.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
A /title typed before any message in a fresh desktop chat could be silently
lost: the session DB row is deferred to the first prompt, so session.title
found no row, only stashed pending_title, and returned pending:true. It then
relied on a post-turn apply block to write the title. When that turn never
landed under the same session_key (or the apply path didn't fire), the title
was dropped and the sidebar fell back to the first-message preview — e.g.
"/title my-custom-name" then "hello" left the session titled "hello".
Mirror the messaging gateway's _handle_title_command: an explicit /title is
clear user intent, not an abandoned draft, so create the row up front
(_ensure_session_db_row) and set the title immediately via the profile-aware
_session_db handle, returning pending:false. This also fixes the frontend
symptom for free — the desktop handler's immediate refreshSessions() now pulls
the correct persisted title instead of clobbering the optimistic value with a
still-NULL row.
If row creation can't take (DB unavailable / racing writer), fall back to the
existing pending_title queue so the post-turn apply block remains a recovery
path. The sidebar's min-messages filter keeps a titled 0-message row hidden, so
a /title'd-but-never-used draft still doesn't clutter the list.
Updates the test that asserted the old queue-on-missing-row behavior and adds a
fallback-to-queue regression test.
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
* feat(search_files): path-grouped lossless densification of content matches
Content-mode search_files results repeat the {path,line,content} JSON keys
and the full path string for every match. Group consecutive same-path matches
under one path header with indented '<line>: <content>' rows — lossless (every
path/line/content byte preserved), self-describing (matches_format key), and
readable by the model with no decode step.
57.8% mean token reduction on real search_files content outputs (422-output
corpus), fires on 97% of them. Gated at >=5 matches; below that the verbose
array is left untouched. Default to_dict(densify=False) is unchanged, so no
other caller is affected.
ripgrep emits matches path-ordered, so consecutive grouping never reorders
results.
* test: accept densify kwarg in _FakeSearchResult.to_dict
The search loop-detection tests stub SearchResult with a fake whose
to_dict() must mirror the real signature now that it takes densify=.
* test(search_files): edge-case losslessness battery for densification
Adversarial single-line content (colons, indentation, unicode/emoji, empty,
trailing whitespace, quotes+commas), paths with spaces, and an explicit
one-line-per-match invariant documenting the ripgrep contract the format
relies on (0/6775 real match contents contained a newline).
* fix(logging): alias RotatingFileHandler to concurrent-log-handler
On Windows, stdlib RotatingFileHandler.doRollover() uses os.rename(), which
fails with PermissionError [WinError 32] whenever another process holds an
append-mode handle on agent.log — essentially always in Hermes (TUI, gateway,
hy_memory server, MCP servers, and on-demand CLI commands all log from separate
processes). This pinned agent.log at the 5 MiB threshold and spammed stderr
with a traceback on every emit (#44873).
Add concurrent-log-handler==0.9.29 as a core dep and alias its
ConcurrentRotatingFileHandler as RotatingFileHandler in hermes_logging.py. It
wraps the rename in a cross-process file lock (via portalocker: pywin32 on
Windows, fcntl on POSIX) so only one process rotates at a time. Aliasing keeps
every existing isinstance/class-declaration reference working unchanged.
Co-authored-by: tuancookiez-hub <tuancookiez@gmail.com>
* fix(logging): gate concurrent-log-handler swap to Windows only
The initial salvage aliased RotatingFileHandler -> ConcurrentRotatingFileHandler
unconditionally, which regressed POSIX: CLH opens lazily and rotates via its own
lock path, breaking managed-mode (NixOS) group-writable perms and eager file
creation that _ManagedRotatingFileHandler depends on. CI caught it as 2 failures
in test_managed_mode_*_group_writable on Linux.
The WinError 32 bug (#44873) is Windows-specific — POSIX renames an open file
fine, so stdlib already works on Linux/macOS. Gate the swap behind
sys.platform == 'win32': Windows uses CLH, POSIX keeps stdlib RotatingFileHandler.
- hermes_logging.py: platform-conditional import.
- tests/test_hermes_logging.py: import RotatingFileHandler from hermes_logging
(single source of truth) so the autouse fixture's isinstance checks match the
real handler class on both platforms.
- pyproject.toml/uv.lock: mark the dep 'sys_platform == "win32"' so portalocker
/pywin32 only ship where used.
---------
Co-authored-by: tuancookiez-hub <tuancookiez@gmail.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Follow-up hardening on @ehz0ah / @harshitAgr's session-switch work (#28296):
- on_session_switch no longer runs the old-session writer-drain + pending-token
GET + commit POST inline on the caller's command thread. /new, /branch,
/resume, /undo call it synchronously, so a slow drain (up to 10s) or wedged
commit blocked the user-facing command — the same hazard #41945 fixed for
end-of-turn sync. State now rotates synchronously (cheap) and the old-session
commit is offloaded to a daemon finalizer (generalized _finalize_session_async).
- Guard the (_session_id, _turn_count) pair with _session_state_lock: sync_turn
runs on the memory-manager executor thread while the session hooks run on the
command thread, so the snapshot+reset vs increment was a cross-thread race.
- _session_needs_commit checks the committed-session guard BEFORE the
turn_count>0 shortcut, closing a double-commit window when a racing sync_turn
re-increments after commit+reset.
- Add a _shutting_down flag so deferred finalizers stop POSTing against a
torn-down client; track all prefetch threads in a set so invalidate/shutdown
join every one, not just the latest slot.
Tests: regression for the non-blocking switch (asserts the caller returns while
a slow drain is parked off-thread) and the committed-guard ordering; updated the
deferred-commit test to the unified finalizer contract.
* fix(desktop): pin Electron below the broken native extract-zip install
The Windows desktop install fails at "Building desktop app": Electron's
postinstall aborts with `ERR_DLOPEN_FAILED loading
index.win32-x64-msvc.node` / "Cannot find native binding" from
`@electron-internal/extract-zip`.
Root cause is a dependency drift, not the user's machine. Electron changed
its install mechanism mid-patch-series:
electron 40.9.3 .. 40.10.2 -> @electron/get@^2 + extract-zip@^2 (pure JS)
electron 40.10.3 / 40.10.4 -> @electron/get@^5 + @electron-internal/extract-zip@^1 (native napi)
apps/desktop declares `electronVersion: 40.9.3` (the tested, JS-extract
build) but pinned the dependency as `electron: ^40.9.3`, so `npm ci`/`npm
install` silently resolved 40.10.3/40.10.4 — onto the brand-new native
extract-zip whose win32-x64 binding fails to dlopen on some Windows hosts.
The committed lockfile already carried 40.10.3, and the installer's mirror
fallback can't help (it re-runs Electron's own `install.js`, which uses the
same broken native module).
Fix:
- Pin `electron` to an exact `40.10.2` — the newest build before the native
extract-zip switch — and align `build.electronVersion` to match (Electron
Builder needs electronVersion/electronDist to match the installed binary).
- Add a root `yauzl: ^3.3.1` override so the (re-introduced) JS extract-zip
path also works on Node >= 24.16 / >= 26.1, where the old yauzl hangs.
This is the same workaround the wider Electron ecosystem adopted.
- Regenerate package-lock.json: drops @electron-internal/extract-zip and
@electron/get@5, restores @electron/get@2 + extract-zip@2 + yauzl@3.4.0.
* test(desktop): lock the Electron pin/version/lockfile consistency contract
Guards against the dependency drift that broke the Windows desktop install:
the Electron dependency must be an exact version, must equal
build.electronVersion, and the lockfile must resolve to that same version so
`npm ci` installs exactly what electron-builder packages. Asserts the
relationships, not a specific version number.
* fix(desktop): keep streaming painting in unfocused secondary chat windows
The chat transcript streams to screen through a requestAnimationFrame-gated
flush, which Chromium pauses for blurred/occluded windows. The primary window
opted out with `backgroundThrottling: false`, but the secondary "session
windows" (cmd-click pop-out, new-session, subagent-watch) hand-copied their
webPreferences and silently lost that flag — so a streamed answer in one of them
stalled until the window regained focus (reported on Windows 11). The primary
window's own comment even claimed it was "matching the secondary windows," which
was no longer true.
Hoist the chat-window webPreferences into a single shared factory
(`chatWindowWebPreferences`) in session-windows.cjs and use it for BOTH windows,
so they can never drift on this flag again.
* test(desktop): assert chat windows disable background throttling
Cover chatWindowWebPreferences: it must set backgroundThrottling=false (so the
streaming transcript paints while the window is blurred) and pass the preload
path through while keeping the hardened defaults (contextIsolation, sandbox,
nodeIntegration=false).
The model is callable via xAI OAuth but omitted from models.dev and
/v1/models listings. Merge it into the curated xAI catalog so it appears
in `hermes model` without requiring a custom model name.
Avoid applying text-only persist_user_message overrides to multimodal current-turn user messages. Early crash-resilience persistence mutates the same messages list later used for the API call, so clobbering list content drops ACP image blocks before model dispatch.\n\nAdd regression coverage for both text override behavior and multimodal preservation.\n\nCloses #44242
* feat(mcp): raise default tool-call timeout 120s -> 300s
Port from openai/codex#28234. Long-running MCP tools (web fetches,
sandboxed builds, deep-research servers) routinely exceed 120s, causing
spurious timeout failures. Codex bumped its default MCP tool timeout from
120 to 300 for the same reason.
- _DEFAULT_TOOL_TIMEOUT 120 -> 300 in tools/mcp_tool.py (per-server
'timeout' config override unchanged)
- update test_default_timeout assertion
- document the default in mcp-config-reference.md
* refactor: remove agent-callable send_message tool
The agent should not decide on its own to fire off cross-platform
messages or reactions. Outbound platform messaging is handled outside
the agent loop — cron delivery, the gateway kanban notifier
(dashboard-toggled), and the `hermes send` CLI.
Removes the model-tool registration only; the send engine in
send_message_tool.py (_send_to_platform, _send_via_adapter,
_parse_target_ref, per-platform _send_* helpers) is kept intact for
those non-agent callers. Drops the now-empty 'messaging' toolset and
its `hermes tools` toggle. Yuanbao DM guidance now points at the
native yb_send_dm tool.
A multi-MB message (logged bundle, huge tool dump) froze the renderer
before any paint: Streamdown runs `preprocess` + `marked` lex over the
whole string synchronously in a useMemo, an uninterruptible long task
that no try/catch or content-visibility can help (our JS runs before the
browser ever skips layout). Tiered fix:
- Message gate: past 200KB, bypass markdown entirely and render the raw
text in `content-visibility:auto` line-chunks — synchronous work is
bounded to a string split, the browser virtualizes layout natively,
and every line stays in the DOM (selectable, find-in-page).
- Code-block budget: past 3k lines / 150KB, skip Shiki (which emits a
span per token) and render plain, chunked the same way.
- Collapse/expand: a reusable ExpandableBlock clamps code blocks and the
huge-text fallback to a 120px preview with a gradient + chevron,
expanding to 300px. The inner element is always a scroll container so
the content-visibility chunks stay lazily laid out in both states.
No content is ever dropped; the copy button (card header) always yields
the full block.
restore_skill() falls back to p.name.startswith(f"{skill_name}-") when no
archive directory matches the requested name exactly. That fallback is meant
to catch the timestamped duplicate archive_skill() writes on a name collision
(<skill>-YYYYMMDDHHMMSS), but the bare prefix also matches any unrelated
archived skill named <name>-something. So restoring "git" can pull an archived
"git-helpers" out of .archive/, rename it to "git", and report success: the
requested skill is not restored and the sibling is gone from the archive.
Constrain the fallback to the exact suffix archive_skill() produces, a 14 digit
timestamp. The exact-name match and the recursive nested-archive walk are
unchanged, so nested and timestamped restores still work; unrelated siblings no
longer match.
Fixes#47647
The OpenAI device-code login (POST auth.openai.com/.../deviceauth/usercode)
had no retry or 429 handling — a transient throttle from OpenAI surfaced as
a bare "Device code request returned status 429" with no guidance, reading
as a hard login failure.
- Retry the device-code request with capped exponential backoff (honoring
Retry-After), up to 4 attempts.
- On persistent 429, raise a clear AuthError tagged CODEX_RATE_LIMITED_CODE
(classified transient, not a credential problem) with a wait hint.
- Apply the same 429 classification to the token-exchange step (same bug
class).
Unrelated to PR #47399 (Responses-API cache headers); this is the OAuth
device-code path in hermes_cli/auth.py.
get_runtime_status_running_pid() validates liveness with a local
os.kill(pid, 0) probe. In /api/status the runtime record can be the
REMOTE health-probe body (cross-container), whose PID belongs to another
host and is display-only — probing it locally is wrong and trips the
test live-system guard (os.kill on a PID outside the test subtree).
Run the fallback only against the local read_runtime_status() record.
_profile_scope swaps process-global skills_tool/skill_manager module
attrs under an RLock; /api/status holds that scope across the
run_in_executor remote-health probe await, so a concurrent
/api/skills?profile=X request can cross-restore the status profile's
skill dir on its finally. Add _config_profile_scope (contextvar-only,
task-local, await-safe) and use it for status, which only resolves
get_hermes_home() at call time for config/env/gateway state and never
needs the skills-module globals.
Context files (AGENTS.md, CLAUDE.md, .hermes.md, .cursorrules, SOUL.md) were
hard-capped at a flat 20K chars before head/tail truncation. Among the agent
harnesses we track, only Codex caps project docs at all (32 KiB); Claude Code,
OpenCode, and Cline load them whole. The flat 20K predates large context
windows and silently truncates real-world AGENTS.md files.
B — dynamic cap: when context_file_max_chars is unset (now the shipped
default), the cap scales with the model's context window
(ctx_tokens * 4 * 0.06, floor 20K, ceiling 500K). Small-context models stay at
the historical 20K; a 200K model gets 48K; large models stop truncating real
docs. An explicit context_file_max_chars still wins. Context length is resolved
once per conversation (stable -> prompt cache untouched).
C — when truncation does happen, the marker now names the concrete file path
and tells the agent to read_file it for the full content.
Validation: 154 targeted tests + full agent/ + hermes_cli/ + test_config
(0 failures); E2E against a real 60K AGENTS.md confirms small windows truncate
with the path-bearing marker, large windows load whole, and the system prompt
is byte-stable across rebuilds.
unicode-bidi:plaintext (#44596) resolves text direction per line, but
list markers and the blockquote border are box chrome driven by the CSS
direction property, which plaintext never sets, so an RTL list renders
its numbers stranded at the far left edge. CSS cannot close this gap
(:dir() only reads the dir attribute, never plaintext resolution), so
ul/ol/blockquote carry dir="auto" and the browser resolves their box
direction natively while the plaintext rules keep owning the text.
Inline code carries dir="ltr", which HTML's auto algorithm skips,
matching the no-vote contract the CSS isolate already gives it.
Rolling back to the oldest curator snapshot failed and deleted that
snapshot. rollback() takes a safety snapshot first, and snapshot_skills()
ends by pruning the backups directory down to keep (5 by default). At the
steady keep limit that prune removed the oldest snapshot, which is the very
one being restored, so the extract found no skills.tar.gz and the rollback
stopped with "snapshot extract failed (state restored)".
Thread an optional protect set through snapshot_skills() into _prune_old()
so the pre rollback safety snapshot can never evict the snapshot being
restored. Add two regression tests covering restore of the oldest snapshot
at the keep limit.
Fixes#47612
The curator now defaults to prune-only: the deterministic inactivity pass
(mark stale / archive long-unused skills) still runs whenever the curator is
enabled, but the opinionated LLM umbrella-building consolidation fork is OFF
by default.
- agent/curator.py: add DEFAULT_CONSOLIDATE=False + get_consolidate(); gate
the forked aux-model review in run_curator_review behind it (new consolidate
param, None=read config). When off, the LLM pass is skipped entirely (no
aux-model cost); the run is still recorded and reported.
- config.py: add curator.consolidate (default false); v29->v30 migration seeds
the key for existing installs without clobbering a user-set value.
- hermes_cli/curator.py: 'hermes curator run --consolidate' override; status
shows consolidate state; prune-only notice on run.
- docs + tests.
When refresh_launchd_plist_if_needed() runs from inside the gateway's own
launchd process tree (agent-initiated self-update via the terminal tool), a
direct launchctl bootout tears down the service's process group — including
the CLI doing the refresh — before the follow-up bootstrap can run. The
gateway is left unloaded and KeepAlive can't revive it (#43842).
Detect in-service execution via gateway.status.get_running_pid() +
_is_pid_ancestor_of_current_process(), and delegate the bootout->bootstrap to
a detached (start_new_session=True) helper that survives the process-group
teardown. The normal out-of-tree CLI path is unchanged.
Fixes#43842.
A turn that ends in an error (e.g. an out-of-funds state) was being
re-rendered in unrelated threads. On a warm thread switch the on-screen
`$messages` still belongs to the previously viewed thread, and
`flushPendingViewState` fed it into `preserveLocalAssistantErrors`, which
grafted the prior thread's failed turn onto the newly opened one. Because
the polluted view then became the next switch's baseline, the error
cascaded into every thread the user visited.
Only carry local errors across a view flush when the on-screen baseline is
the same session being flushed; the cached state we publish already retains
that session's own errors. Also surface the turn error as a global toast
even when the failing turn ran in a background thread, since the error
blocks all subsequent interactions until the user acts.
The double-underscore prefix swap fixed bare native tools but SKIPPED tools
already named mcp_<server>_<tool> (real MCP servers, e.g. mcp_linear_get_issue):
they went on the OAuth wire single-underscore and still tripped Anthropic's
third-party billing classifier -> HTTP 400 'extra usage, not plan limits'.
Verified empirically against a live Max subscription: a single mcp_ tool flips
the whole request to the extra-usage lane; mcp__ is accepted.
- build_anthropic_kwargs: promote ANY leading single-underscore mcp_ to mcp__
(bare names -> mcp__name; mcp_<server>_<tool> -> mcp__<server>_<tool>),
never double-prefixing an already-mcp__ name. Same for tool_use blocks in
history.
- normalize_response: reverse the mcp__ wire name back to whichever original
the registry knows — the single-underscore mcp_<server>_<tool> form for MCP
server tools, or the bare name for native tools — preferring a name that
already resolves natively.
- Tests rewritten to assert the invariant: ZERO single-underscore mcp_ names
reach the OAuth wire, and the mcp__ round-trip resolves back to the
registered name for both native and MCP-server tools.
Builds on liuhao1024's mcp__ prefix commit (cherry-picked). Closes the
MCP-server gap that left any session with an MCP server configured still
billing to extra usage.
Anthropic's Claude-Code request classifier treats tool names with a
single-underscore `mcp_<x>` prefix as non-Claude-Code / third-party,
routing the request to extra-usage billing (HTTP 400). Real Claude Code
uses double underscores: `mcp__<server>__<tool>`.
Change the tool-name prefix from `mcp_` to `mcp__` in both the outgoing
path (build_anthropic_kwargs) and the incoming path
(normalize_response). Update the skip-guard to check for both `mcp_`
and `mcp__` prefixes so native MCP server tools (which use the legacy
single-underscore format) are not double-prefixed.
Fixes#46675
`hermes login` was removed in favor of `hermes auth` / `hermes model`, but
the subparser still validated `--provider` against a hardcoded choices list
(nous, openai-codex, xai-oauth). Running `hermes login --provider anthropic`
therefore crashed in argparse with `invalid choice: 'anthropic'` *before* the
deprecation handler could print the redirect to `hermes model` — so a user
trying to authenticate a perfectly valid provider just saw a hard error and
assumed the feature was broken rather than relocated.
- Drop the restrictive `choices=` so every `--provider` value reaches the
deprecation handler (which ignores the value and prints guidance).
- Omit the subparser `help=` kwarg so the dead command no longer advertises
itself in `hermes --help` (#24756). Avoids the `==SUPPRESS==` placeholder
leak that `help=argparse.SUPPRESS` emits for a top-level subparser on 3.12+.
- `hermes login [--flags]` still reaches the actionable deprecation message
for old scripts/aliases; `hermes login --help` shows the redirect.
Picks up the intent of the inactivity-closed #24902, rebased onto the
post-refactor parser location (hermes_cli/subcommands/login.py) and extended
to fix the whole bug class (any provider value), not just hiding from --help.
Tests: parametrized provider acceptance + help-suppression (no SUPPRESS leak).
The docstring described a token as path-like when it contains a "/"
separator, but the keystroke-latency fix now excludes "://" scheme tokens
(URLs) even though they contain "/". Document the exclusion so the contract
matches the behavior.
Regression coverage for the keystroke-latency fix: a URL token contains
"/", so the bare-slash path heuristic used to return it as a path word and
run os.listdir on every keystroke. Assert _extract_path_word rejects
http/https/ssh scheme tokens, that ordinary paths (incl. a bare colon) are
unaffected, and that the completer never touches the filesystem for a URL
under the cursor.
The interactive CLI input box runs its completer with
`complete_while_typing=True`, so `SlashCommandCompleter.get_completions`
is invoked on *every* keystroke. That completer does blocking I/O:
fuzzy `@`-file indexing shells out to `rg`/`fd` (up to a 2s timeout) and
file-path completion calls `os.listdir` + `stat`. Because the completer
was passed inline (never wrapped in `ThreadedCompleter`), all of this ran
synchronously on the prompt_toolkit event loop, stalling the render after
each key — very noticeable on WSL2 and other slow-filesystem setups
("typing in the prompt box being very latent").
Two fixes:
- Wrap the input completer in `ThreadedCompleter` so completion work runs
off the UI event loop and never blocks rendering between keystrokes.
- Stop treating URLs as file paths in `_extract_path_word`: a token like
`https://example.com/x` contains `/`, so it triggered `os.listdir` on
every keystroke while typing/pasting a link (listing a bogus `https:`
dir) for a completion that can never be useful. Skip any token with a
`://` scheme separator.
(cherry picked from commit b5be2ba276)
Streamdown runs our `preprocess` inside its own useMemo, and the user
bubble runs `extractEmbeddedImages`/directive parsing inside theirs — so
anything thrown while rendering one message (a regex/stack overflow on
adversarial content) escapes to the ROOT error boundary and takes down
the entire app, as seen in a reported `RangeError: Maximum call stack
size exceeded` from a single message.
Wrap both the assistant preprocess pipeline and the user-message
directive passes in try/catch that degrade to the raw text. One bad
message now renders plain instead of nuking the transcript.
`normalizeFenceBlocks`/`pushProseFence` appended block bodies with
`out.push(...lines)`, which spreads every line as a separate call
argument. A single message carrying a large fenced block (a logged
minified bundle, base64 blob, or big tool dump — common in long
sessions) overflows V8's argument-count limit and throws
`RangeError: Maximum call stack size exceeded`, breaking the transcript
render. Compression doesn't save us: it gates on tokens vs. window, not
a single message's line count, and the protected recent tail renders
verbatim regardless.
Append iteratively via a small `extend()` helper. Behavior is identical
for normal-sized blocks.
sync_turn's bounded join could drop a still-alive previous worker by
replacing the single _sync_thread slot. The dropped worker kept POSTing
under the old sid but was no longer visible to on_session_end /
on_session_switch, so the commit could fire while orphaned writes were
still in flight — those writes landed past the commit boundary and were
never extracted.
Replace the single _sync_thread slot with _inflight_writers:
Dict[sid, Set[Thread]]. Writers self-register on spawn (sync_turn,
on_memory_write) and self-deregister on exit. The commit path drains
_drain_writers(sid, 10.0) and skips the commit if any writer for that
sid is still alive after the bounded budget.
Also trim inline review-rationale comments to short invariants per
reviewer style ask: "commit only after session writes drain" and
"drop prefetch results from older switch generations."
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 7537ee6f5b)
Three follow-ups from review on #28296:
1. Sync worker outliving the bounded join. Each sync_turn POST has
_TIMEOUT=30s and there are two per turn, but on_session_end and
on_session_switch only join for 10s. If the worker is still alive
after the join, committing the old session orphans the worker's
late writes past the commit boundary — they land in an already-
committed session and never get extracted. Both hooks now re-check
is_alive() after the join and skip the commit when the worker
hasn't drained.
2. on_memory_write late session_id capture. Same shape as the
pre-fix sync_turn: f-string for the post path read self._session_id
inside the worker, so a switch between thread spawn and post call
landed the memory note in the new session. Snapshot sid at call
time, same pattern as sync_turn.
3. Stale prefetch repopulating the new session. The pre-switch
drain+clear only protects against workers that finish before the
join completes; one finishing after the clear would write its
result into the new generation's slot. Added a monotonic
_prefetch_generation; workers capture it at spawn and refuse to
write if it has advanced.
Tests: existing in-flight-sync test updated to drain (it tested the
join-before-commit happy path); four new tests cover hung-writer skip
on end + switch, on_memory_write sid capture, and prefetch generation
gating. 177/177 memory tests pass.
(cherry picked from commit 3791a87dbe)
Two hardening fixes prompted by review on #28296:
1. sync_turn() now snapshots the target session id before spawning the
worker. The previous code read self._session_id inside the worker, so
a worker delayed past on_session_switch's bounded join could read the
rotated-in NEW id and write the OLD turn's messages into the wrong
session.
2. on_session_end() resets _turn_count to 0 after a successful commit,
making the old-session commit path idempotent with the new switch
hook. /new and compression call commit_memory_session() (which fires
on_session_end) immediately before on_session_switch; without this,
the old session would be committed twice. On commit failure we leave
_turn_count > 0 so on_session_switch retries.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit 2ea8d5c537)
OpenVikingMemoryProvider only overrides on_session_end and inherits the
base-class no-op for on_session_switch. When the agent rotates session_id
(via /new, /branch, /reset, /resume, or context compression), the
provider's cached _session_id stays at the value initialize() captured.
All subsequent sync_turn writes then land in the already-closed old
session, and on_session_end tries to commit it a second time — the new
session never accumulates messages and never triggers memory extraction.
The fix mirrors the pattern Hindsight uses (#17508):
1. Wait for any in-flight sync thread to drain under the OLD _session_id
before we mutate it, otherwise the commit below races the last
message write.
2. Commit the old session if it accumulated turns — same extraction
semantics as on_session_end. Skip if empty (nothing to extract).
3. Drain in-flight prefetch from the old session and clear its cached
result so the new session doesn't see stale recall.
4. Rotate _session_id to the new value and reset _turn_count.
Commit failures are swallowed (logged at WARN) so a flaky server can't
strand the provider on the old session forever — same posture as the
existing on_session_end commit.
(cherry picked from commit a1e7185e8a)
Closes#47111
is_container() only recognized Docker (/.dockerenv), Podman
(/run/.containerenv), and docker/podman/lxc markers in /proc/1/cgroup.
Under cgroup v2 (Kubernetes/k3s on containerd or CRI-O) /proc/1/cgroup
collapses to a single "0::/" line with no runtime marker, so
is_container() returned False on every containerd/CRI pod.
That false negative bypassed container-aware behavior across the CLI.
The most damaging case (reported): even after #46290 fixed
detect_service_manager() to gate on _s6_running() alone, other
is_container() call sites (profile home resolution, gateway behaviors,
config, doctor) still misbehave on containerd.
Broaden detection conservatively:
- KUBERNETES_SERVICE_HOST env var (present in every k8s pod).
- kubepods/containerd/crio markers in /proc/1/cgroup (cgroup v1 nested).
- same markers in /proc/self/mountinfo as a cgroup-v2 fallback.
Tests: 3 new (k8s env, kubepods cgroup, cgroup-v2-via-mountinfo) plus the
existing negative case hardened to stub mountinfo + env; 108 constants +
service_manager tests pass.
Follow-up to salvaged PR #41633: the timestamp prefix injection was
unconditional. Gate the in-context render behind
gateway.message_timestamps.enabled (default false) at both the live-message
and history-replay sites; timestamp metadata is still captured + persisted
regardless so the toggle can be flipped on later. Add DEFAULT_CONFIG entry,
docs, and gate tests.
Consolidates these related Amy fork patches:
- 429830f39 feat(gateway): inject message timestamps into user messages for LLM context
- 3c3d6fac0 fix: handle both ISO string and epoch float timestamps in history replay
- 2874f7725 feat: human-friendly timestamp format with weekday and timezone name
- 3735f4c8b fix: render gateway message timestamps once
* fix(desktop): keep the pre-session model pick selected in the picker
The composer picker derived its "current" row from `model.options ?? store`,
so model.options always won. Pre-session that query returns the PROFILE
DEFAULT, not the sticky composer pick — so selecting a model before a session
exists left the checkmark (and the picker's "current" line) on the default,
making the pick look ignored even though the pill updated.
Add `currentPickerSelection()`: with a live session the gateway's model.options
is authoritative; pre-session the sticky `$currentModel`/`$currentProvider`
wins, falling back to options. Wire it into ModelMenuPanel and ModelPickerDialog.
* feat(desktop): global reasoning/speed defaults in Settings → Model
The composer picker is now sticky-UI/per-session only and never writes the
profile default (#46959), but Settings → Model had no reasoning/speed control
and `agent.reasoning_effort` wasn't in the curated config surface at all
(`service_tier` was buried in Advanced) — so there was nowhere to set the
profile default that crons/subagents/messaging resolve from.
Add capability-gated Reasoning (effort) + Fast controls beside the main model,
gated by the applied model's reported capabilities (reasoning defaults on, fast
off when unreported — same as the composer). They read/write `agent.reasoning_effort`
and `agent.service_tier` by round-tripping the config record, matching the
gateway's value semantics (service_tier "fast"/"priority"/"on" ⇒ fast).
* refactor(desktop): don't open the reasoning select from its row label
A <label> wrapping the Select forwarded text clicks to the trigger, opening
the dropdown unexpectedly. Plain row for reasoning; Fast stays a <label> so
clicking its text toggles the switch (expected for a checkbox-like control).
* fix(desktop): re-download Electron binary via mirror when pack fails (#47266)
Since #38673 pinned build.electronDist to node_modules/electron/dist,
electron-builder reads the Electron binary straight from there and never
downloads it during `npm run pack`. That dist tree is only produced by the
electron package's postinstall (install.js) during `npm ci`. When that
download is blocked or throttled (GitHub's release host is unreachable in
some regions), the dist is missing and the build dies with:
The specified electronDist does not exist: .../node_modules/electron/dist
The existing ELECTRON_MIRROR fallback in all three desktop-build paths
(scripts/install.ps1, scripts/install.sh, and `hermes desktop` in
hermes_cli/main.py) re-ran `npm run pack` with ELECTRON_MIRROR set — but
pack never downloads Electron anymore, so the mirror was never used and the
retry re-read the same missing dist. The fallback was effectively dead.
Drive the mirror through electron's own downloader instead:
- Add a dist-presence check + a downloader helper (Test-ElectronDist /
Restore-ElectronDist, _electron_dist_ok / _restore_electron_dist,
_electron_dist_ok / _redownload_electron_dist) that wipes a partial dist
+ the path.txt version marker (electron's install.js short-circuits on it)
and re-runs `node install.js`, optionally via a mirror.
- On the first retry, repopulate a missing dist from the canonical source;
on the mirror retry, re-fetch through npmmirror.com, then pack.
- Gate the re-download on the dist check so an unrelated build failure
(tsc/vite) doesn't trigger a pointless ~200 MB refetch, and skip the final
pack when the binary still can't be fetched instead of failing the same way.
* test(desktop): cover Electron dist re-download mirror fallback (#47266)
Add behavior coverage for the electronDist re-download fix:
- _electron_dist_ok across linux/win32/darwin, including the partial-dist
case (dir present but binary missing) that makes the pinned electronDist
fail.
- _redownload_electron_dist: no-op when the binary is present, bail when
install.js is absent, wipe a stale dist + path.txt marker and run
electron's downloader with ELECTRON_MIRROR injected, and report failure
when the download still produces no binary.
- `hermes desktop`: the mirror fallback now drives electron's own downloader
before re-running pack, and skips the final pack entirely when the binary
can't be fetched.
Replaces the old mirror test that asserted the (now-fixed) dead behavior of
re-running `npm run pack` with ELECTRON_MIRROR set — pack never downloads
Electron under the pinned electronDist, so that retry could never help.
Guards that two user-defined custom endpoints exposing an overlapping
model each keep their full catalog — the dedup must never cross-filter
two user-defined rows against each other.
The /model interactive picker resolved a base_url from user credentials
but never passed it to ProviderProfile.fetch_models(), causing the
picker to always query the provider's hardcoded default endpoint
instead of the user's custom URL (e.g. a company litellm proxy).
- providers/base.py: add optional base_url parameter to fetch_models()
- hermes_cli/models.py: pass resolved base_url to fetch_models()
- Update all subclass overrides for signature compatibility
- Add 6 regression tests covering override, fallback, and integration
Support files under references/, templates/, assets/, and scripts/ are progressive-disclosure data loaded through skill_view(..., file_path=...). They should not be treated as standalone skills during discovery or collision checks.
This prevents archived skill packages or support markdown files inside a real skill from shadowing active skills with the same name while still allowing top-level categories named scripts/templates/assets/references.
Tests cover:
- pruning nested SKILL.md files inside skill support directories
- preserving support-named top-level categories
- avoiding skill_view collisions from support markdown
- keeping archived package SKILL.md files accessible only through file_path
Salvage follow-up for PR #29575: add regression tests for the section-3
no-api_key /v1/models probe (probes bare endpoints, skips when explicit
models set) and add the contributor AUTHOR_MAP entry.
Section 3 of list_authenticated_providers (user-defined endpoints from
the providers: config section) required an api_key before probing the
endpoint's /v1/models for live model discovery. This broke local
self-hosted backends (llama.cpp, Ollama, vLLM, etc.) that don't require
authentication — they would only ever show the single default_model
from config instead of the full model catalog.
Section 4 (custom_providers list) already handled this correctly with
the policy: probe when api_key is set OR when no explicit models are
configured. Apply the same logic to Section 3 so local backends get
full model discovery without requiring a placeholder api_key workaround.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Cover #47375 fix: record-on-rich-send + lookup-on-reply round trip,
lookup miss leaving reply_to_text None, and precedence (native quote
and echoed caption both win over the index fallback).
Telegram does not echo a sendRichMessage's content back in
reply_to_message (.text/.caption empty, .api_kwargs None), so replies
to rich sends (briefings, the gateway's own rich finals) arrived with
no quotable text and the [Replying to: ...] injection was skipped.
Remember message_id -> text at send time in a best-effort JSON index
(gateway/rich_sent_store.py), and recover it on inbound when text and
caption are both empty. Best-effort and no-throw throughout: any
failure degrades to prior behavior and never breaks a send or message.
Salvaged from #47375 by @x1erra. Dropped the cross-platform run.py
reply-prefix rewrite (out of scope; bloated every reply on every
platform) and scrubbed a docstring reference to an out-of-repo script.
Kept the inbound reply_to logging enrichment used to verify the fix.
The #45954 model-dedup builds `user_models` from every is_user_defined
row, then strips those model IDs from every row where is_aggregator(slug)
is True. But is_aggregator() returns True for *every* `custom:*` slug, and
list_authenticated_providers emits named custom providers with slug
`custom:<name>` and is_user_defined=True. So a user's own custom provider
is treated as an aggregator and filtered against user_models — which holds
exactly its own models (the row helped build that set). Every model is
removed, the row drops to zero, and the provider disappears from the model
picker.
Guard the dedup loop to skip is_user_defined rows: a user's configured
provider is never an aggregator duplicate of itself. Built-in aggregators
(openrouter, etc.) are still deduped as before. Adds a regression test.
Reflect the default-model change in the xAI Grok OAuth guide, the web
search docs (EN + zh-Hans), and the web provider docstring. grok-4.3 is
kept in the model tables as the previous default; the Nous/OpenRouter
aggregator catalog still lists grok-4.3 and is left unchanged.
Switch the default model for the xAI/Grok provider and the xAI web
search backend from grok-4.3 to grok-build-0.1. grok-build-0.1 is
already recognized by the model metadata, so no new model definition
is required; grok-4.3 remains selectable.
Follow-up to salvaged PR #41624:
- Remove stray urllib.parse import in run_agent.py (cherry-pick cruft, unused)
- Add tests: session:compress emits with correct context, no-callback is
safe, and a callback exception does not break compression
* feat(desktop): stream subagent replies into watch windows
A desktop watch window resumes a child session lazily (no full agent) and
mirrors the parent-relayed `subagent.*` events into native child-session
stream events. The child's streamed reply text was never relayed, so the
window sat blank while the subagent "talked".
- delegate_tool: forward the child's `run_conversation` stream tokens up the
progress relay as `subagent.text` (inert under CLI/TUI — their progress
handlers ignore non-tool event types; only a gateway watch window mirrors it).
- server: mirror `subagent.text` -> `message.delta` on the child sid only, and
skip the parent emit (per-token frames are meaningless on the parent session,
which shows the child via the spawn tree). Demote `subagent.start` to a
one-time goal header and drop the noisy `subagent.progress` mirror — tools
already mirror natively.
- server: guard `_start_agent_build` so a lazy watch session spectating an
in-flight child stays lazy; incidental RPCs were upgrading it to a full
agent mid-stream and silently killing the mirror.
* fix(desktop): keep watch-window chat clear of titlebar chrome
Secondary windows (new-session scratch, subagent watch, cmd-click pop-out)
hide the titlebar tool cluster + session header, so the transcript ran to the
window's top edge and streamed text slid up under the OS traffic lights.
- Gate the hidden chrome on `isSecondaryWindow()` everywhere (app-shell,
chat header, thread list) instead of the narrower new-session flag.
- Add a fixed opaque drag-strip at the top of the secondary-window transcript:
content padding alone scrolls away with the text, so the strip masks
anything behind it and keeps the window draggable like the main header.
* fix: WSL subagent window
* fix: subagent window top padding
---------
Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Follow-up to salvaged PR #41619: replace the module-global
_truncation_warnings list with a contextvars.ContextVar so concurrent
gateway-session prompt builds can't drain or clear each other's pending
warnings (cross-session leak). Adds a context-isolation test.
PROBLEM: Automatic context files such as SOUL.md and AGENTS.md were capped by a hardcoded CONTEXT_FILE_MAX_CHARS value. Amy's local fork had raised that constant from 20K to 25K so a larger SOUL.md would not be silently truncated, but the hardcoded 25K value changed upstream default behavior and made the patch less generally useful.
SOLUTION: Restore the upstream-compatible 20K default, add a context_file_max_chars config setting for users who intentionally keep larger identity/project-context files, keep chat-visible truncation warnings, and document the new setting. Tests cover the default, config override, explicit max_chars precedence, and the warning text.
Z.ai released GLM 5.2 on 2026-06-15, available on OpenRouter:
- https://openrouter.ai/z-ai/glm-5.2
GLM-5.2 is Z.ai's flagship for long-horizon tasks, shipping a 1M-token
context window (up from 200K on GLM 5.1) and tool calling. Per the
OpenRouter API: text-only, context_length 1048576, tools supported.
No separate -fast variant exists.
The 1M context length, native zai picker entry, setup wizard, and Z.ai
coding-plan auth entries for glm-5.2 already landed on main. This fills
the remaining gap: the two aggregator surfaces where glm-5.1 appears but
glm-5.2 did not.
Changes:
hermes_cli/models.py
- Add z-ai/glm-5.2 to the OpenRouter fallback snapshot (OPENROUTER_MODELS)
and the Nous Portal curated list (_PROVIDER_MODELS["nous"]), newest
flagship first. Live catalogs surface it automatically when reachable;
the fallback lists matter when the manifest fetch fails.
website/static/api/model-catalog.json
- Regenerated via scripts/build_model_catalog.py (not hand-edited) so the
manifest stays in sync with the source lists; guarded by
tests/hermes_cli/test_model_catalog.py.
The generic live+curated merge (commit 630b438) seeded the merged list
from live results, demoting curated-only models below live ones. That
regressed #46309, which deliberately surfaces the newest curated model
(kimi-k2.7-code) FIRST in the native picker even when the live /models
listing lags. Restore curated-first ordering: curated entries lead (in
catalog order), live-only entries are appended for discovery. This keeps
the #46850 fix (zai glm-5.2 now appears) without the kimi regression.
Also switch the validate_requested_model curated fallback (commit
ee7b8a4) from provider_model_ids() — which triggers a second, uncached
live /models fetch with its own 8s timeout and may resolve different
credentials than the api_key/base_url just probed — to the pure-catalog
helper _model_in_provider_catalog(). Membership is checked against the
shipped catalog only, with no extra network call.
Tests: restore the curated-first assertion in
test_kimi_coding_live_catalog_does_not_hide_curated_k2_7_code; update
the new merge tests to curated-first semantics; de-circularize the
validation fallback tests to patch _PROVIDER_MODELS (the real source)
instead of mocking the function under test.
Generalizes #32663 (@ehz0ah). The slash-skill scaffolding pollution
affected every auto-syncing memory provider — mem0, hindsight, retaindb,
byterover, honcho, supermemory all store/embed the raw user turn, so a
/skill invocation poisoned their stores with the full skill body, not just
openviking.
- Lift the contributor's parser into agent/skill_commands.py as the canonical
extract_user_instruction_from_skill_message(), co-located with the message
builders so the markers can't drift.
- Strip once in MemoryManager.{prefetch_all,queue_prefetch_all,sync_all} —
fixes the whole provider fan-out, bare /skill turns are skipped entirely.
- OpenViking's _derive_openviking_user_text() now delegates to the shared
helper as defense-in-depth (no duplicated marker literals).
- Marker-drift regression now asserts against the canonical skill_commands
constants; add manager-level coverage proving every provider gets clean text.
Rewrites the Shop personal-shopping-assistant skill to use the
@shopify/shop-cli (with a full direct-API fallback in references/),
replacing the previous curl-only shop-app skill.
- Rename optional-skills/productivity/shop-app -> shop
- Add references/: catalog-mcp.md, direct-api.md, safety.md, legal.md
- Catalog discovery via Shopify Global Catalog MCP (search / lookup /
get-product), device-authorization sign-in, UCP agent checkout with
delegated spending budget, and order tracking / returns / reorder
- One-product-per-message presentation rules + per-channel overrides
- Expanded security, safety, and legal guidance
Website docs are auto-generated from SKILL.md by CI
(website/scripts/generate-skill-docs.py), so no docs are hand-edited here.
Support linking, copying, and creating ovcli.conf during OpenViking memory setup.
Make setup cancellation write nothing and cover OpenViking/Hindsight picker cancellation paths.
Clicking a model row in the composer dropdown now commits and closes the menu
(via a close context); the hover-revealed reasoning/fast submenu stays open to
tweak. The pill shows a quiet braille loader instead of literal "No model"
until one resolves, and steer takes over the mic slot while typing into a
running agent.
A live config.set model switch already moved the next API call to the new model,
but the conversation could still restore an old sessions.system_prompt snapshot
whose Model/Provider lines named the previous runtime. That made "what model are
you?" answer from stale metadata even while inference ran on the new model.
After a live switch we now refresh the stored system prompt and append a real
system-history pivot (not a fake user turn) so the transcript itself records the
new model/provider. Restore also rejects already-stale prompt snapshots when
their Model/Provider lines disagree with the runtime, so existing bad sessions
self-heal.
The picker no longer touches the profile default. Model/effort/fast live as
plain UI state persisted in localStorage, so a pick follows across Cmd+N and
restarts instead of snapping back. New chats ship that state through
session.create as per-session overrides; live chats still scope switches to the
current session. Settings -> Model remains the only surface that writes the
profile default.
The gateway now accepts those session.create overrides, builds the agent with
them directly, reflects them in the immediate session.info payload, and writes
the chat's own model_config into the lazy DB row so reconnect/resume restores
that chat instead of the global default.
* fix(skills): guard recursive skill delete against tree-escape
Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire
working directory: a built-in-skill sentinel location resolved to the server
cwd and the skill-removal endpoint ran a recursive delete on it.
Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the
agent-facing skill_manage(action='delete') path did a bare
shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target():
refuse to rmtree a path that (1) isn't strictly inside a known skills root,
(2) is a skills root itself, or (3) is reached via a symlink/junction.
Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree
all refused). E2E verified with real symlink + file I/O.
* feat(desktop): allow /browser connect on a local gateway
/browser was hardcoded as terminal-only in the desktop slash palette, so
the chat GUI rejected it with "only available in the terminal interface."
The TUI already drives the live CDP connection via the browser.manage RPC.
Wire the same RPC into the desktop dispatcher as a /browser action handler,
gated to local-gateway connections ($connection.mode !== 'remote'). connect
mutates BROWSER_CDP_URL (and may launch Chrome) in the gateway process, so
it's only meaningful when that process runs on this machine; a remote
gateway gets a clear "local gateway only" message instead.
PROBLEM: Mattermost threads can become invalid or enormous, exposing two failure modes: internal scratch/reasoning/commentary displays could leak into persistent Mattermost threads via global display toggles, while rejected threaded user-visible replies could disappear unless every failed send fell back flat. A broad flat fallback would pollute channels with tool/status/progress noise.
SOLUTION: Require explicit Mattermost platform opt-in for scratch displays, keep using the existing notify=True metadata marker for user-visible final text/media/file replies, and allow the Mattermost plugin adapter to flat-fallback only notify-worthy sends whose threaded POST failure looks like a broken root/thread. Keep tool/status/progress and other non-notify sends thread-strict. Add regression tests for display opt-in, notify-only broken-thread fallback, generic API failure suppression, and stream notify metadata.
Verification: tests/gateway/test_mattermost.py tests/gateway/test_stream_consumer.py tests/gateway/test_stream_consumer_thread_routing.py tests/gateway/test_stream_consumer_fresh_final.py tests/gateway/test_stream_consumer_draft.py; tests/gateway/test_session_api.py tests/gateway/test_status_command.py tests/gateway/test_resume_command.py tests/hermes_cli/test_commands.py; py_compile touched gateway files; git diff --check.
Session: Mattermost thread 6qg8e9dd1pd9pkhi74xyaa1mry, 2026-06-01.
`BasePlatformAdapter.send_multiple_images` passes `metadata=metadata` to
`send_image` / `send_image_file` / `send_animation` on every send. The
WhatsApp and email `send_image` overrides stopped their signature at
`reply_to`, so any image delivered as a URL (the common case — image-gen
backends return URLs) raised:
TypeError: send_image() got an unexpected keyword argument "metadata"
and the image silently failed to send. Their sibling overrides
(`send_image_file` / `send_video` / `send_voice` / `send_document`)
already absorb it via **kwargs, which is why only plain image-URL sends
broke.
- whatsapp/email `send_image`: accept `metadata` (matches the base
signature); WhatsApp forwards it to the super() text fallback.
- Add `tests/gateway/test_media_metadata_contract.py`: asserts WhatsApp +
email accept it, plus a best-effort sweep over every adapter so the next
slip fails at test time instead of in production.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Slack caps apps at 50 slash commands and the registry is at that ceiling, so
adding /debug clamped it out of the native list and broke the telegram-parity
test (debug on Telegram, absent from Slack native slashes, in neither
exclusion set). Add 'debug' to _SLACK_VIA_HERMES_ONLY — same treatment credits
already gets. /debug stays native on CLI/TUI/Telegram/Discord and reachable via
/hermes debug on Slack.
test_session_create_no_race_keeps_worker_alive flaked on CI shard 3 with
'build thread unregistered its own notify despite no race' while passing
20/20 in isolation locally. Root cause: daemon build threads from sibling
session.create tests in the same shard process mutate the shared
server._sessions dict under _sessions_lock and can replace/pop entries
mid-run, flipping this build thread's 'replaced' check (server.py:1011) to
True and triggering a spurious unregister_gateway_notify.
Fix is test-only: snapshot + clear server._sessions before the request so
the test sees only its own session, restore siblings in finally. Also assert
agent_ready.wait() actually returned True (was silently ignoring timeout) and
bump the timeout 2s -> 10s for loaded CI runners.
Classify exhausted pool-only openai-codex credentials as quota/rate-limited instead of missing auth. This prevents auth status and runtime credential resolution from reporting missing credentials when a valid manual:device_code pool credential exists but is temporarily in a 429 usage-limit cooldown.
Adds regression coverage for pool-only Codex auth status and runtime resolution.
Add display.tool_progress_style setting to control how tool progress
messages are displayed in chat platforms:
- 'accumulate' (default): Edit a single message with all tool calls
(new v0.9.0 behavior)
- 'separate': Send each tool call as its own message, interleaved
with thinking messages (pre-v0.9 behavior, better readability)
The setting participates in the per-platform display override system
and can be set globally or per-platform.
Files: gateway/display_config.py, gateway/run.py
When display.memory_notifications is set to 'verbose', skill_manage
notifications now show meaningful change details instead of just the
generic tool message.
Before (verbose mode):
💾📝 Patched SKILL.md in skill 'gogcli' (1 replacement).
After (verbose mode):
💾📝 Skill 'gogcli' patched: "old pitfall text..." → "new pitfall text..."
Changes:
- skill_manager_tool.py: _patch_skill() now includes old/new string
previews (truncated to 200 chars) in the result via '_change' key.
_create_skill() and _edit_skill() include skill description from
frontmatter for verbose create/edit notifications.
- run_agent.py: Background review notification builder now reads the
'_change' dict from skill tool results and formats descriptive
notifications per action type (patch → old→new diff, create/edit →
description preview). Falls back to generic message when _change
data is unavailable (backwards compatible).
This is especially useful when subagents patch skills, since neither
the user nor the parent agent can see what the subagent changed.
Background memory reviews now support three notification modes,
configured via display.memory_notifications in config.yaml:
off — no chat notification (still logged to stdout/HA log)
on — generic '💾 Memory updated' (default, unchanged behavior)
verbose — content preview with action indicators:
💾 Memory ➕ Hermes Repo liegt unter /config/amy/hermes-agent/...
💾 Memory ✏️ Updated repo path from claude-code to hermes-agent...
💾 Memory ➖ old entry about claude-code path...
Previews are truncated to 120 chars for adds/replaces, 60 for removes.
Each action gets its own line in verbose mode for readability.
Files: run_agent.py, gateway/run.py
Streamed Telegram replies that finalize through editMessageText were
converted to MarkdownV2, which has no table syntax and rewrites pipe
tables into bullet lists — users saw a table while streaming that
collapsed to a list at the last moment.
Finalize now edits the existing preview IN PLACE via Bot API 10.1's
editMessageText rich_message parameter when the content has constructs
the legacy path degrades (tables, task lists, <details>, block math).
No fresh send + delete, so no duplicate-preview flicker — the reason
#46206 reverted the fresh-final re-send path. prefers_fresh_final_streaming
stays False; the in-place edit replaces it.
- _needs_rich_rendering(): rich reserved for table/task-list/details/math
(adapted from #45995, @YonganZhang); plain replies stay on MarkdownV2.
- _try_edit_rich(): editMessageText + rich_message via do_api_request,
mirroring _try_send_rich's fallback/latch/transient contract.
- edit_message finalize tries rich in place before the 4,096 overflow
pre-flight (rich cap is 32,768), falling back to legacy on rejection.
- rich_messages default flipped back to True (DEFAULT_CONFIG + adapter).
- docs (en + zh-Hans) + cli-config example updated to default-on.
Closes the root cause behind #45911 / #46009.
When live /v1/models responds but omits a model that exists in the
curated static catalog, validate_requested_model now accepts it with
a note instead of rejecting. This covers the /model slash-command path
(the picker path was already fixed in the parent commit).
Addresses review feedback from potatogim on #46857.
When a provider's live /v1/models endpoint returns a stale or incomplete
list (e.g. Z.AI missing glm-5.2), the generic profile-based code path
returned only the live results, silently dropping curated models.
Generalize the kimi-coding merge pattern to all providers: live entries
come first (provider's preferred order), then curated-only entries are
appended with case-insensitive dedup. This ensures models that the live
endpoint omits still appear in /model picker.
Fixes#46850
External providers (Claude Code) store creds outside Hermes, so the
disconnect API refuses them. The backend now hands the GUI a per-OS
`disconnect_command` that clears the credential the same way the CLI's
logout does (macOS Keychain entry + ~/.claude/.credentials.json), and
the misleading "use claude setup-token" hint is corrected.
Settings → Providers offers a Disconnect button for these: it confirms,
leaves Settings, and runs the removal command in the embedded terminal
via a new runInTerminal() (queues onto $terminalInjection; the terminal
pane flushes and clears it once its session is live). The expanded list
also gets its own "Other providers" header so it no longer reads as
grouped under "Connected". API-managed providers keep the one-click
(trash) disconnect.
Each model remembers its own reasoning effort / fast mode (localStorage,
like model-visibility): editing a model's effort/fast in the submenu
writes its preset, and selecting a model restores its preset onto the
session (capability-gated, Hermes defaults when unset). Every row shows
its own remembered settings (grayed), and the row label and edit submenu
read the same effective value so they can't disagree.
Presets are desktop-client state only — applyModelPreset() no-ops without
a live session id, so selecting a model can't fall through to the
gateway's persistent agent.reasoning_effort / agent.service_tier writes.
Inactive variant `-fast` edits stay preset-only: toggleFast() records
{ fast } on the base model and only swaps models when the row is active,
and selectFamily() honors a saved variant-fast preset by selecting the
`-fast` sibling id.
Provider catalogs surface date-pinned snapshots (`…-20251101`) that the
picker rendered as standalone rows with the date baked into the name
("Opus 4 5 20251101"). Strip the trailing date from display names, and
fold a snapshot out of the list when its rolling alias is present so the
alias stays selectable/searchable while the exact dated id isn't shown
as its own row.
Relocate the model pill to the composer, left of the mic. A new
ModelPill reuses the live ModelMenuPanel dropdown verbatim (single
click target) and the formatModelStatusLabel "Model · Fast Med" label,
anchored to its right edge so the menu doesn't drift with model-name
length. modelMenuContent now flows to ChatView instead of
useStatusbarItems, and the status-bar model-summary item is removed;
the pill subscribes to the model atoms directly and falls back to the
full picker when the gateway is closed.
On a remote gateway connection, agent-written files live on the gateway
host, not the desktop's disk, so the Artifacts view's file:// hrefs failed
("Invalid external URL") and image thumbnails broke.
Make mediaExternalUrl() remote-aware in one place: in remote mode it
rewrites gateway-local paths to GET /api/files/download (a new endpoint
that streams the file as a Content-Disposition: attachment). The artifacts
view now resolves through it, and so do the existing chat-media and
generated-image callers, for free.
The download endpoint stays auth-gated; auth_middleware additionally
accepts the session token as a ?token= query param for this one path so a
shell/browser-opened download (which can't set the session header) still
authenticates — the same query-token tradeoff as the /api/pty WebSocket.
It is NOT added to PUBLIC_API_PATHS.
Salvages #46663 (which carried ~19k lines of CRLF noise and made the
endpoint public). Reimplemented on a clean LF base with the security hole
closed and tests added.
Co-authored-by: qingshan89 <qs2816661685@gmail.com>
* fix(skills): guard recursive skill delete against tree-escape
Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire
working directory: a built-in-skill sentinel location resolved to the server
cwd and the skill-removal endpoint ran a recursive delete on it.
Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the
agent-facing skill_manage(action='delete') path did a bare
shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target():
refuse to rmtree a path that (1) isn't strictly inside a known skills root,
(2) is a skills root itself, or (3) is reached via a symlink/junction.
Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree
all refused). E2E verified with real symlink + file I/O.
* fix(delegation): forward background flag in delegate_task dispatch
delegate_task is an _AGENT_LOOP_TOOLS member, so every surface (CLI,
gateway, desktop/TUI) routes it through AIAgent._dispatch_delegate_task.
That forwarder passed every schema field except background, so
delegate_task(background=true) was silently downgraded to a synchronous
run and returned the sync results payload instead of a delegation_id.
The model sees background in the schema (the call validates), but the
value never reached the function. Add the one missing kwarg so async
background delegation actually engages.
Port from Kilo-Org/kilocode#11240. Their issue #11227 lost a user's entire
working directory: a built-in-skill sentinel location resolved to the server
cwd and the skill-removal endpoint ran a recursive delete on it.
Hermes' /skills uninstall path (skills_hub.py) is already hardened, but the
agent-facing skill_manage(action='delete') path did a bare
shutil.rmtree(skill_dir) with no last-line validation. Add _validate_delete_target():
refuse to rmtree a path that (1) isn't strictly inside a known skills root,
(2) is a skills root itself, or (3) is reached via a symlink/junction.
Tests: 4 cases (normal delete works; symlinked dir, skills-root, out-of-tree
all refused). E2E verified with real symlink + file I/O.
Collapse segmentMergeIndex + mergeTextInto + the three append helpers
into a single segment-aware appendStreamPart core plus a part-factory
table. Same behavior, DRY.
Models that interleave their reasoning_content and content token streams
(Kimi/DeepSeek/GLM-style routes) emit text -> reasoning -> text deltas
within a single tool-bounded segment. Appending each delta as its own
part shredded one sentence into "Let me" / Thinking / "verify the file",
with a Thinking disclosure wedged mid-sentence.
Coalesce streaming deltas into the most recent same-type part within the
current segment (bounded by any non-streaming part, e.g. a tool call).
The opposite streaming channel is transparent, so a reasoning burst
between two content deltas no longer opens a fresh text part, while a
real tool call still starts a new segment and preserves narration order.
Data-layer only; the renderer already groups consecutive reasoning.
* feat(skills): add optional payments skills (Stripe Link, MPP, Projects)
Adds four optional skills under optional-skills/payments/ wrapping the
Stripe Link CLI, the Machine Payments Protocol (MPP) clients, and the
Stripe Projects CLI plugin. Plus a router skill (payments) that picks
between them based on user intent.
All four are gated [linux, macos] — Stripe's Link CLI does not yet
support Windows. The other CLIs (mppx, stripe projects) are
cross-platform on paper but the payments cluster moves as a unit until
Link CLI gains Windows support.
Skills:
- stripe-link-cli - one-time virtual cards + Shared Payment Tokens
- mpp-agent - HTTP 402 payments via mppx/Tempo/Privy/AgentCash
- stripe-projects - provision SaaS services + credential sync
- payments - router/index skill for the cluster
Hard invariants encoded in every skill:
- Card PANs/wallet keys never enter agent transcripts, logs, or memory
- Spend approvals are not self-bypassable (Link app / wallet UI / CLI prompt)
- Final totals confirmed with user before any --request-approval call
- Credential output files cleaned up after one-time use
Zero core touches. Skills install via:
hermes skills install official/payments/<skill>
* chore(skills/payments): drop router skill — skills shouldn't depend on other skills
Removed optional-skills/payments/payments/ — the router skill that
existed to hand off between stripe-link-cli, mpp-agent, and
stripe-projects.
Per project convention: skills should be independently loadable; a
router is a footgun because (a) it assumes the loader will follow its
recommendation rather than just loading what the user asked for, and
(b) it duplicates the trigger logic that already lives in each
sub-skill's '## When to Use' section.
The three remaining skills declare their own triggers and routing
hints. The optional-skills catalog still groups them under '## payments',
which is the appropriate place for cluster-level discoverability.
Also drops 'payments' from each remaining skill's 'related_skills' list
and removes the corresponding entries from the docs catalog + sidebars.
* feat(skills/payments): fold in danhill-stripe review feedback
- mpp-agent: add link-cli as a client option (when Link is already set
up, or the 402 challenge advertises method="stripe")
- stripe-link-cli: reframe Link account / payment method / approval app
as first-run setup, not hard preconditions (CLI configures them on
first run)
- regenerate the two affected optional-skills docs pages
The Honcho provider page documented the per-profile peer model (user
peer / AI peer / observation) but never the gateway axis — how platform
runtime IDs map to peers. Adds the three keys to the config table and a
short Gateway identity mapping subsection that points at the Honcho page
for the resolver ladder.
Uses the corrected pinUserPeer wording (pins non-agent users, overrides
aliases) so the provider-comparison reader gets the same accurate framing
as the dedicated page.
'everyone collapses to your peer' read as a promise about all traffic.
pinUserPeer pins the user-side peer and is checked before userPeerAliases
(session.py:335), so a pin overrides every alias — including agent peers.
For a multi-agent operator that silently pools distinct agents onto one
peer, the opposite of intent.
Scopes the wording to 'every non-agent gateway user', notes the pin
overrides aliases, and points agent-mesh operators at pinUserPeer:false +
userPeerAliases instead. Same correction in the wizard menu/echo text,
the plugin README, and the website Honcho page.
* feat(delegation): async background subagents via delegate_task(background=true)
delegate_task(background=true) dispatches a subagent that runs in the
background and returns a handle immediately, so the user and model keep
working while it runs. The full result — plus the original task source —
re-enters the conversation as a new turn when the subagent finishes,
riding the same completion-queue rail as terminal background processes.
- tools/async_delegation.py: daemon-executor registry, capacity cap,
rich self-contained completion event pushed onto the shared
process_registry.completion_queue (type='async_delegation').
- delegate_tool.py: background param + single-task dispatch branch;
batch async rejected (v1).
- process_registry.py: format_process_notification renders the rich
task-source block (goal/context/toolsets/model/status/result).
- gateway/run.py: dedicated _async_delegation_watcher drains + injects
results into the originating session (idle + post-turn), session_key
routing enrichment, shutdown interrupt of dangling delegations.
- config: delegation.max_async_children (default 3).
Reuses the existing idle-drain wiring rather than mutating a running
agent loop, preserving message-role alternation and prompt-cache
invariants. 13 targeted tests; CLI + gateway paths E2E-verified.
* test(delegation): make async non-blocking tests environment-independent
CI 'test (5)' flaked on a cold, 8-worker runner: the first
delegate_task(background=true) call measured 2.27s of one-time setup
(config load + child-agent construction + imports), tripping the
elapsed < 1.0 wall-clock assertion. That assertion was testing setup
overhead, not blocking.
Replace the wall-clock thresholds with the real invariant: dispatch
returns while the child is still gated (active_count == 1, completion
queue empty), which a synchronous impl could not do. Keep only a loose
4s sanity backstop well under the runner's 5s gate.
* fix(delegation): harden async background delegation
Follow-up review fixes:
- Detach background child from parent._active_children at dispatch —
otherwise parent-turn interrupts (Ctrl+C, mid-turn steering), cache
evicts (release_clients), and session close (/new) kill/close the
detached subagent mid-run, defeating the point of background mode.
Lifecycle is owned by the async registry's interrupt_fn.
- Make the capacity check atomic with the record insert (TOCTOU: two
concurrent dispatches could both pass active_count() and exceed the cap).
- TUI dedup: key async_delegation events by delegation_id — the
fallthrough keyed them all as ("", type), suppressing every completion
after the first in the desktop/TUI status feed.
- CLI /stop now interrupts running background delegations and /agents
lists them (they live outside the process registry and were invisible).
- Drop stray unbalanced ']' line from the re-injection block and the
unused _ASYNC_DEFAULT import.
Tests: detach-at-dispatch + concurrent-capacity race added (15 total in
test_async_delegation.py); 137 delegate + 140 process-registry/notify/watch
+ 7 TUI dedup tests pass.
* fix(delegation): harden async background completion drains
A GUI app launched from Explorer inherits the environment block captured at
login, so a HERMES_HOME set via 'setx' AFTER login is invisible in process.env
even though the CLI (a fresh shell) sees it. The desktop then silently fell
back to %LOCALAPPDATA%\hermes and reported 'No inference provider configured'
despite a valid configured home (#45471).
resolveHermesHome() now consults the live HKCU\Environment registry value on
Windows before the LOCALAPPDATA default. New windows-user-env.cjs helper parses
'reg query' output, expands %VAR% refs, and fails safe (returns null off-Windows,
on spawn error, or empty value). The registry value is normalized through the
same normalizeHermesHomeRoot() path as the env var for consistency.
Co-authored-by: jeffrobodie-glitch <jeffrobodie@gmail.com>
When a desktop/dashboard session had no agent built yet and the user explicitly
picked a provider in the model picker, config.set('model', ...) would first try
to initialize the agent from the (possibly broken) config default provider —
failing before the user's explicit switch could take effect, trapping them on a
misconfigured default.
config.set now pre-parses the model flags: if an explicit --provider is present
and no agent exists yet, it skips the default-provider agent build and routes
straight through _apply_model_switch with the explicit provider. _apply_model_switch
gained a parsed_flags passthrough (avoids double-parsing) and only falls back to
resolve_runtime_provider(requested=None) when no explicit provider was given.
The desktop hook now sends config.set instead of slash.exec for active-session
model changes, so errors from the selected provider surface to the user instead
of being swallowed.
Co-authored-by: rodboev <rod.boev@gmail.com>
Verifies `hermes debug` surfaces a TERMINAL_ENV override of
terminal.backend, reports the config value when no override is present,
and emits no spurious note when env and config agree.
`terminal.backend` in config.yaml is bridged to the TERMINAL_ENV env var,
but a TERMINAL_ENV set in .env / the shell overrides config and is what
terminal_tool actually uses. The dump printed only the config value, so a
user whose agent was jailed in a docker/podman sandbox via a stale
TERMINAL_ENV still saw `terminal: local` — hiding the real cause. Report
the effective backend and flag when TERMINAL_ENV overrides config.yaml.
When a user-defined provider (e.g. litellm-proxy) and an aggregator
(e.g. openrouter) both advertise the same model name, the Desktop/TUI
model picker would show the model under both groups. Selecting it from
the aggregator row silently set model.provider to the aggregator,
breaking calls because the aggregator doesn't actually serve that model
ID.
Fix: after list_authenticated_providers() returns, collect all models
from user-defined provider rows and filter them out of aggregator rows.
Uses is_aggregator() from hermes_cli/providers.py to identify
aggregators. Case-insensitive matching.
Fixes#45954
NVIDIA NIM API uses vendor-prefixed model IDs (e.g. qwen/qwen3.5-122b-a10b,
nvidia/nemotron-3-super-120b-a12b). The doctor command incorrectly warns that
vendor-prefixed slugs belong to aggregators like openrouter when nvidia is
the configured provider.
Add 'nvidia' to the providers_accepting_vendor_slugs set so doctor no longer
raises false-positive warnings for valid NVIDIA NIM configurations.
Fixes#35425
electron-builder 26.8.x can stage an Electron.app without its
Contents/MacOS/Electron binary, then fail renaming it to Hermes:
ENOENT: no such file or directory, rename .../MacOS/Electron -> .../MacOS/Hermes
This breaks `npm run pack` and the installer desktop stage before a
launchable Hermes.app exists.
- Point build.electronDist at the already-installed Electron dist so
electron-builder reuses it instead of re-unpacking from cache.
- Add a darwin-only prebuilder patch that restores the missing main
binary from the runtime dist before the rename. Idempotent (marker
guard), soft-fails on shape mismatch, survives node_modules reinstall.
Co-authored-by: ChasLui <chaslui@outlook.com>
* fix(teams): package Microsoft Teams SDK as an installable extra
The Teams adapter imports the microsoft-teams-apps SDK, but it was never
declared as a dependency, so source/local installs hit ImportError and the
adapter silently reported the SDK as unavailable. Add a 'teams' extra
(microsoft-teams-apps==2.0.13.4 + aiohttp) and document 'uv sync --extra teams'.
Per the 2026-05-12 [all] policy, opt-in messaging-platform SDKs are NOT added
to [all] (they would break every fresh install on a quarantined release); the
teams extra is installed on demand like the other platform backends.
Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>
* chore: map rio-jeong contributor email for attribution (#43945)
* feat(teams): lazy-install the Teams SDK on demand (parity with other channels)
The teams extra alone left Teams as the only messaging platform that wouldn't
auto-install its SDK — every other channel (telegram, discord, slack, matrix,
dingtalk, feishu) lazy-installs via tools.lazy_deps on first connect. Bring
Teams to parity:
- Add 'platform.teams' to LAZY_DEPS (microsoft-teams-apps + aiohttp).
- Replace the passive 'check_teams_requirements = check_requirements' alias with
a real lazy-installer that calls ensure_and_bind('platform.teams', ...),
rebinding all Teams SDK globals on success (mirrors check_slack_requirements).
- Call check_teams_requirements() at the top of TeamsAdapter.connect() so
enabling Teams installs the SDK on demand.
- Keep the passive check_requirements() as the registry check_fn so 'gateway
status' probes never trigger a pip install.
The 'teams' extra remains for packagers / explicit 'uv sync --extra teams'.
Tests: rework the alias test into shortcircuit + lazy-install assertions, and
update test_connect_fails_without_sdk to simulate an uninstallable SDK.
---------
Co-authored-by: rio-jeong <rio.jeong@thebytesize.ai>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
* fix(dashboard): scope chat sidebar model card to selected profile
The PTY already honors ?profile= on profile switch, but the JSON-RPC
sidecar created sessions against the dashboard launch profile. Pass the
management profile through session.create and reconnect on switch.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(dashboard): sync active profile with management scope
Align the sidebar switcher with the sticky active profile on load and
when "Set as active" is clicked, so Chat and management pages match
what the Profiles page shows as active.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(dashboard): auto-reconnect chat sidebar on profile switch
Bump the sidecar connection version when profile or PTY channel changes,
matching the manual Reconnect path so gateway and events sockets come
back without clicking the error banner.
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix(dashboard): prevent model selector chevron overlapping label
Use inline flex layout instead of Button suffix, which is absolutely
positioned and overlapped truncated model names at px-0.
Co-authored-by: Cursor <cursoragent@cursor.com>
---------
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: declare websockets as a core dependency
* fix(deps): relax dev setuptools pin 82.0.1 -> 81.0.0 (torch caps setuptools<82)
torch >= 2.11 publishes Requires-Dist: setuptools<82, so any environment
that resolves the dev extra together with torch is unsatisfiable:
$ uv pip install --dry-run ".[dev]" "torch==2.12.0"
x No solution found when resolving dependencies:
... torch==2.12.0 and all versions of hermes-agent[dev] are incompatible.
81.0.0 is the latest release under the cap and stays inside the declared
build-system window (setuptools>=77.0,<83). uv.lock regenerated with
'uv lock'; diff is scoped to the setuptools entry.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
* chore: map salvaged contributor emails for attribution
Add AUTHOR_MAP entries for the two cherry-picked contributors so the
check-attribution CI gate passes:
- yehaotian@xuanshudeMac-mini.local -> ArcanePivot (#45486)
- dbeyer7@gmail.com -> benegessarit (#44693)
---------
Co-authored-by: 玄枢 <yehaotian@xuanshudeMac-mini.local>
Co-authored-by: David Beyer <dbeyer7@gmail.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
The store pin changed package-lock.json, so the workspace-wide
npmDepsHash in nix/lib.nix is stale and the Nix flake check fails on
the hash mismatch. Use the hash reported by the real fetchNpmDeps
build (the flake check's `got:`), which is authoritative — it differs
from prefetch-npm-deps' lockfile-contents hash, exactly the divergence
nix/lib.nix already documents.
The desktop build break shipped because nothing in CI runs the
apps/desktop production build. typecheck only runs `tsc`, which does
not exercise Vite/Rolldown module resolution, so an unresolvable
package export (the @assistant-ui/tap "./react-shim" split) sailed
through green checks and only failed when users built from source on
install/update.
Add a desktop-build job that runs `npm run build` (tsc -b + vite build
+ assert-dist-built) for apps/desktop. This closes the gap so the same
class of break fails in CI instead of on every user's machine.
Lockfile invariant that would have caught the desktop build break: the
single hoisted @assistant-ui/tap must satisfy every @assistant-ui/*
package's declared tap requirement (deps or non-optional peer). It is a
contract, not a snapshot -- no hardcoded versions -- so it stays green
across routine bumps but fails the moment the cluster splits its tap
requirement again.
The desktop app is built from source on every install/update
(install.ps1 -> npm ci/install -> tsc -b && vite build). The
@assistant-ui packages share an internal reactivity lib,
@assistant-ui/tap, and only interoperate when they all resolve the
SAME tap version.
@assistant-ui/react@0.12.28 and @assistant-ui/core pin tap@^0.5.x
(which exports only "." and "./react"), but the caret range
react -> store@^0.2.9 floated store up to 0.2.18, which bumped its
tap peer to ^0.9.0 and began importing "@assistant-ui/tap/react-shim"
-- an entry point that only exists in the tap 0.9.x line. With the
hoisted tap stuck on 0.5.x, vite build crashed:
"./react-shim" is not exported ... from package @assistant-ui/tap
i.e. the opaque "apps/desktop build failed (exit 1)" everyone hit when
updating today.
Pin @assistant-ui/store via root overrides to 0.2.13 -- the last
release that targets tap@^0.5.x -- so react/core/store all agree on the
hoisted tap@0.5.14 again. Verified: tsc -b and vite build both pass.
PROBLEM: The old public /status PR drifted out of the current Amy patch stack, leaving /status without the model/provider, context window, or explicit cumulative token label that Wolfram uses to monitor context pressure from chat.
SOLUTION: Re-port the feature onto the current gateway status handler. Prefer live/cached agent runtime metadata, fall back to SessionDB + SessionStore state between turns, add localized status model/context lines, and keep token totals explicitly labeled cumulative.
Verification: tests/gateway/test_status_command.py, tests/hermes_cli/test_commands.py
Previously, delegate_task in batch mode only showed '3 parallel tasks'
without revealing what the tasks actually are. Single-task mode showed
the goal via the primary_args fallback, but batch mode had no goal
extraction.
Changes:
- build_tool_preview(): Add dedicated delegate_task handler that
extracts individual task goals from both single and batch modes.
Batch shows '3 tasks: Goal A | Goal B | Goal C'.
- _get_cute_tool_message_impl(): Show individual goals in CLI cute
messages for batch delegate calls ('3x: Goal A | Goal B').
- Add 4 tests covering single goal, batch goals, missing goals,
and no-goal edge case.
Asserts ensureGatewayProfile keeps $connection in lockstep with the active
profile's backend: activating a remote pool profile flips mode to remote,
returning to default resyncs to local, a failed descriptor fetch leaves the
prior connection intact, and a same-profile activation doesn't churn it.
Regression coverage for #46651.
The renderer's $connection seeds from the PRIMARY (window) backend at boot and
otherwise only refreshes on a sleep/wake reconnect. Activating a background
profile (ensureGatewayProfile) pointed the live gateway + REST at that profile's
backend but never updated $connection, so its `mode` stayed stuck on the
primary. With a local primary and a remote pool profile active, every code path
that branches on local-vs-remote misfired: image attachments went out via the
path-based `image.attach` instead of `image.attach_bytes`, handing the remote
gateway a client-only Windows path it can't resolve ("image not found: C:\..."),
and the /api/fs/* file browser and /api/media fetches targeted the wrong
machine.
Resync $connection from the now-active profile's descriptor right after the
gateway swap, so the remote-aware paths follow the live backend. Best-effort: a
failed descriptor fetch leaves the prior connection intact for boot/reconnect to
resync. Single-profile users are unaffected (the same-profile fast path never
runs the swap).
Fixes#46651
Follow up PR #46609's api.minimax.io reasoning report by moving the behavior out of the broad run_agent host gate and into the MiniMax provider profile. Only MiniMax-M3 on the documented OpenAI-compatible /v1 route gets reasoning_split/thinking/reasoning_effort; Anthropic-format MiniMax and non-M3 models keep their existing wire shapes.
Co-authored-by: goku94123 <gooku94123@gmail.com>
Track why a background process finished and include that source in notify-on-complete messages so SIGTERM from process.kill, kill_all, backend loss, and ordinary exits are distinguishable.
Route curator rollback through the same cross-process cron job lock, make save_jobs lock for legacy direct callers without deadlocking nested mutation paths, and harden the regression test so a second _jobs_lock caller really blocks across processes.
`hermes cron pause`/`resume`/`remove` run in their own CLI process (CLI →
cronjob tool → pause_job → update_job → save_jobs), entirely separate from
the gateway process that also writes jobs.json (mark_job_run, advance_next_run,
due-fast-forward in get_due_jobs). The only synchronization was a module-level
`threading.Lock`, which serializes writers *within a single process* but does
nothing across processes — and update_job/pause_job/remove_job/create_job did
not even take it.
The result is a classic lost update: a `cron pause` issued while the gateway is
live loads jobs.json, sets enabled=False, and saves; concurrently the gateway
loads the same file and saves back its run-bookkeeping, clobbering the pause.
The CLI prints "Paused" (it succeeded against its own in-memory copy) but the
job stays enabled and keeps firing, with no error surfaced. The scheduler's
`.tick.lock` flock can't be reused for this — it is held for the entire tick,
including multi-minute agent runs, so a CLI mutation would block for minutes.
Add `_jobs_lock()`: a short-held cross-process advisory file lock (fcntl/msvcrt
flock on `<hermes_home>/cron/.jobs.lock`) layered over the existing in-process
lock, and wrap every load→modify→save critical section with it — create_job,
update_job, remove_job, mark_job_run, advance_next_run, get_due_jobs,
rewrite_skill_refs. The lock degrades to in-process-only if neither fcntl nor
msvcrt is available, preserving prior behaviour. All critical sections are short
(field edits, no agent execution), so contention resolves in milliseconds.
Adds a regression test that proves the lock excludes a second process (an
in-process threading.Lock cannot).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Upgrade the Vite/esbuild surfaces that kept web, ui-tui, and the bootstrap installer on vulnerable esbuild versions, regenerate the root lockfile, and preserve intentional package+lock dependency edits during update lockfile cleanup.
Add a parser-only routing regression that proves raw WhatsApp group JIDs bypass channel-directory resolution and home-channel fallback, include channel_aliases.json in quick state snapshots, harden malformed alias handling, and map Keiron McCammon for release attribution.
send_message(target="whatsapp:<group-jid>") silently delivered to the
configured home DM instead of the requested group. Two gaps:
1. _parse_target_ref had no WhatsApp branch. Group JIDs (<id>@g.us),
user JIDs (<id>@s.whatsapp.net), linked-identity JIDs (<id>@lid), and
broadcast/newsletter JIDs matched no pattern and fell through to
`return None, None, False`, so the caller treated them as
unresolvable and used the home channel. The bridge's /send endpoint
accepts any chatId, so only the tool-side target parsing was at fault.
Add a whatsapp branch that recognizes native JIDs as explicit targets.
The pre-existing '+'-prefixed E.164 path is preserved.
2. WhatsApp groups have no human-friendly name — the channel directory
is regenerated from session data on a timer, so a group shows up as
its raw 18-digit JID and any hand-edit to channel_directory.json is
clobbered on the next rebuild. Add a user-maintained alias overlay
(~/.hermes/channel_aliases.json) re-applied on every build AND every
load, giving durable friendly names and letting a freshly-created
group be pre-named before its first message.
Tests: TestParseTargetRefWhatsAppJID (7 cases) for the parser;
TestChannelAliases (7 cases) for the overlay, plus an autouse fixture
isolating CHANNEL_ALIASES_PATH so a real alias file can't leak into the
existing directory tests.
Keep request dump writes on the shared atomic JSON path, add regression coverage for request body/error/stdout redaction, and map the salvaged contributor email for release attribution.
dump_api_request_debug() masks the provider Authorization header but writes
the request `body` (system prompt, tool defs, context-embedded values) and the
error message raw via atomic_json_write. This path also fires unconditionally
on API errors (not only under HERMES_DUMP_REQUESTS), so any secret surfaced
into context (e.g. an integration token) lands in cleartext at
request_dump_*.json on every failed call.
Run the serialized dump through the existing redact_sensitive_text() scrubber
(already used for logs/tool output) before persisting and before the
HERMES_DUMP_REQUEST_STDOUT print; preserve atomicity via temp-file +
Path.replace. Also add the Notion internal-integration prefix (ntn_) to
_PREFIX_PATTERNS so bare values are caught.
Per SECURITY.md §3.2 this is a redaction (in-process heuristic) hardening, not
a §3.1 vulnerability. Refs #46583.
converse() and converse_stream() were added in boto3 1.34.59. When Hermes
is installed editable into system Python (e.g. Ubuntu 24.04 ships 1.34.46),
the system boto3 takes precedence and calls to converse_stream fail with
AttributeError. Add an early version check in _require_boto3() that raises
a clear RuntimeError with upgrade instructions.
On Linux, systemd spawns core services (cron, nginx, sshd) with
deterministic PIDs and jiffy start_times across reboots. A service can
land on the exact same PID and start_time as a previous gateway, causing
acquire_scoped_lock to mistake it for a live gateway and block startup.
The existing stale-detection paths only covered:
- start_times both non-None and different (clear mismatch)
- start_times both None (macOS/Windows fallback to cmdline check)
The boot-time collision falls through both: times are non-None and
equal, so neither branch fired.
Add a third check: when both start_times are known and match but the
live process fails _looks_like_gateway_process, read its cmdline. If
the cmdline is readable (non-None), we have positive evidence of an
impostor and mark the lock stale. Requiring a readable cmdline keeps the
check conservative — if cmdline is unreadable we do not evict.
Adds an observation_scopes config key (and HINDSIGHT_RETAIN_OBSERVATION_SCOPES
env var) so retained memories can opt into per_tag / all_combinations /
custom scoping instead of Hindsight's default combined pass.
Threaded through _build_retain_kwargs so all three retain paths honor it:
auto-retain and flush-on-switch already use aretain_batch; the tool retain
path is switched from aretain to aretain_batch (functionally equivalent,
aretain just wraps a single-item batch) since aretain doesn't accept the
observation_scopes parameter.
Salvaged commit in this PR is authored by capt-marbles
(andrewdmwalker@gmail.com), a bare gmail that does not auto-resolve in
the check-attribution job. Add the AUTHOR_MAP entry.
The salvaged read-side fix lets a profile resolve the xAI OAuth grant from
the global-root auth store when it has no own providers.xai-oauth block.
But _save_xai_oauth_tokens still wrote rotated tokens only to the active
profile store. Because xAI rotates the refresh_token on every refresh, a
profile that reads root's grant and refreshes it left root holding a now-
revoked refresh token — killing every other profile reading the stale root
grant with invalid_grant once its access token expired (#43589).
Detect the read-from-root case (profile lacks its own providers.xai-oauth
block) and, after the profile save, write the rotated chain back to the
global root too via a best-effort, TOCTOU-safe write-through that reuses
_save_auth_store with an explicit target path. A profile that genuinely
shadows root (has its own block) is left untouched, classic mode is a
no-op, and a failed root write never breaks the profile's own save.
Pairs with the read fallback in the preceding commit so the cross-profile
xAI grant stays coherent in both directions.
* fix(docker): skip per-profile gateway reconciliation in dashboard container
When gateway and dashboard containers share a bind-mounted HERMES_HOME,
both run the cont-init.d profile reconciliation script, which creates
s6-log processes for every persisted profile. These s6-log processes
in different containers race to flock() the same log-directory lock
files under logs/gateways/<profile>/lock, producing repeated
"s6-log: fatal: unable to lock ... Resource busy" errors and a
supervision restart storm.
Add HERMES_SKIP_PROFILE_RECONCILE env var support to container_boot.py
and set it in the official docker-compose.yml dashboard service so the
dashboard container no longer creates per-profile gateway s6 services
it never uses.
* chore(release): map salvaged contributor
* refactor(docker): autodetect dashboard container instead of env-var gate
Replace the HERMES_SKIP_PROFILE_RECONCILE env var with PID 1 argv role
detection. A dashboard-only container never spawns or supervises
per-profile gateways, so the reconcile boot hook now skips itself when
/proc/1/cmdline is the dashboard command — no operator flag to set (or
forget in a hand-written manifest, which would reintroduce the s6-log
flock storm this prevents).
- Extract _strip_container_argv_prefix() shared by the legacy-gateway
and new dashboard detectors (DRY the init/wrapper/hermes peel).
- Add _is_dashboard_container(); gate reconcile main() on it.
- Drop HERMES_SKIP_PROFILE_RECONCILE from code + docker-compose.yml.
- Tests: argv matrix for both roles + main()-level skip/reconcile proof
and a regression that the removed env var is now inert.
Co-authored-by: 895252509 <895252509@qq.com>
---------
Co-authored-by: zhouxiang <895252509@qq.com>
Co-authored-by: Ben <ben@nousresearch.com>
_configured_terminal_cwd and _registered_task_cwd_override carried a
byte-identical sentinel + expanduser + isabs validation tail. Extract it
into _sentinel_free_abs_cwd(raw) so the relative/sentinel rejection rule
lives in one place. Behaviour unchanged (the str() coercion the override
path relied on is preserved in the helper).
The session-cwd fix inserted a registered task/session cwd override step
between the live-cwd and $TERMINAL_CWD fallbacks, but three docstrings still
described the old two-step order — _resolve_base_dir's numbered list was
outright wrong. Update _authoritative_workspace_root, _resolve_base_dir, and
_path_resolution_warning to reflect the actual four-step resolution order.
No behaviour change.
The raw-key-first-then-collapsed override lookup was hand-rolled in three
places with subtly different spellings: terminal_tool's command setup, and
both file_tools._registered_task_cwd_override and _get_file_ops. Since that
exact raw-vs-collapsed invariant is what the session-cwd fix depends on,
keeping three copies invites the drift that caused the original bug.
Add terminal_tool.resolve_task_overrides(task_id) as the single source and
route all three sites through it. Behaviour is unchanged (verified
byte-equivalent across raw/collapsed/isolation/None/subagent inputs).
The supervised `gateway-default` s6 slot runs bare `hermes gateway run`
(no -p) to mean "the root HERMES_HOME profile". But `_apply_profile_override`
falls through its #22502 HERMES_HOME guard for the container root
(/opt/data, whose parent is not `profiles`) and reads the sticky
`active_profile` file. If the user set another profile active (e.g. via
the dashboard), the reserved default gateway gets redirected into that
profile — producing a duplicate gateway for the active profile and no
real default gateway. The profile page and `gateway status` then
correctly report default as "not running" because there genuinely isn't
one.
Guard step 2 (the sticky active_profile fallback) with the existing
HERMES_S6_SUPERVISED_CHILD sentinel that the container run-script already
exports. Supervised named-profile slots pass -p explicitly (step 1, never
reaches step 2); only the bare default slot was affected. Inert outside
the s6 container — the sentinel is never set elsewhere.
Reported in the 'Docker & Profiles & Dashboard' support thread.
The unified machine-dashboard reroute (cmd_dashboard) re-execs a named-profile
dashboard launch as the machine dashboard and dropped HERMES_HOME from the
child env with the comment "so the child binds the machine root". That holds
for a standard install (root == ~/.hermes) but breaks the Docker layout: the
published image sets `ENV HERMES_HOME=/opt/data`, so once HERMES_HOME is unset
the child falls back to $HOME/.hermes = /opt/data/.hermes — an empty,
auto-seeded home.
Two user-visible symptoms, one root cause (reported via support):
1. Dashboard Profiles page shows only an empty `default` — the real
default/oracle/saga profiles live under /opt/data/profiles, but the
rerouted child resolves _get_profiles_root() to /opt/data/.hermes/profiles.
2. The "Update Hermes" button runs `hermes update` inside the container
repeatedly instead of bailing with the docker-update guidance. The Docker
guard keys off detect_install_method(), which reads
$HERMES_HOME/.install_method; the image stamps that at /opt/data, but the
misresolved home has no stamp, no HERMES_MANAGED, and no .git → falls
through to "pip", so the guard never fires.
The reporter's workaround was to bind-mount the host dir at both /opt/data and
/opt/data/.hermes so the two paths converge (at the cost of a self-referential
recursion).
Fix: resolve the machine root explicitly with get_default_hermes_root() and set
it on the child env instead of popping HERMES_HOME. That helper returns the
root for both layouts — ~/.hermes for a standard install, and /opt/data for
Docker (it strips a trailing profiles/<name>). Falls back to the old pop
behaviour only if root resolution raises, so the reroute is never blocked.
Regression tests in test_dashboard_unified_launch.py: the existing standard-
install test now asserts the child carries HERMES_HOME == get_default_hermes_root()
(not absent), and a new test_reexec_pins_docker_machine_root covers the Docker
layout (HERMES_HOME=/opt/data/profiles/oracle → child gets /opt/data). Both
fail against the pre-fix pop behaviour (mutation-verified).
* fix: persist s6 gateway desired state
* chore(release): map salvaged contributor
---------
Co-authored-by: Alfred Smith <alfred@my-cloud.me>
Co-authored-by: Ben <ben@nousresearch.com>
* fix(gateway): chown logs/gateways parent so late-added profiles can log
The per-profile log service script created $HERMES_HOME/logs/gateways/
via 'mkdir -p' but only chowned the leaf logs/gateways/<profile>. When
the first log service boots in root context, the gateways/ parent stays
root:root; every profile registered later runs its log service as the
dropped hermes user, 'mkdir -p' fails with EACCES, and s6-log enters a
sub-second fatal crash-loop flooding the container log. The stage2
recursive heal does not catch it either: it is gated on needs_chown,
which is false when the top-level $HERMES_HOME is already hermes-owned.
Two complementary fixes:
- service_manager._render_log_run: chown the gateways/ parent
(non-recursively) before the leaf chown. Runs on every root-context
boot, so it also heals volumes already poisoned by older images.
- docker/stage2-hook.sh: seed logs/gateways in the as_hermes mkdir -p
block; cont-init runs before any service starts, so the parent
already exists hermes-owned when the first log/run does 'mkdir -p'.
The needs_chown repair loop needs no twin entry: it already chowns
logs/ recursively, which covers logs/gateways.
Fixes#45258
* chore(release): map salvaged contributor
---------
Co-authored-by: tangtaizhong666 <tangtaizhong792@gmail.com>
On Windows, native Python extensions such as _bcrypt.pyd are loaded as
DLLs by any running hermes process. When the installer tries to recreate
the venv (Remove-Item -Recurse -Force "venv"), Windows denies the delete
because the DLL is still mapped into the running process.
Add a taskkill /F /T /IM hermes.exe call before the Remove-Item so any
hermes process tree is stopped first, releasing the file lock. A short
sleep gives the OS time to unload the image before deletion proceeds.
This mirrors the existing force_kill_other_hermes() guard already present
in the --update flow (update.rs), applying the same pattern to the full
reinstall/repair path through install.ps1.
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
The collapsed-pane hover-reveal trigger strip (14px wide, 6px edge
gutter) overlapped the neighboring scroller's 8px .scrollbar-dt
scrollbar, which sits flush with the window edge when the rail panes
are collapsed. Hovering the scrollbar revealed the file browser over
it, and clicks on the overlapped band hit the trigger instead of the
scrollbar thumb.
Widen the edge gutter to calc(0.5rem + 2px) so the strip clears the
scrollbar (rem-coupled to the .scrollbar-dt width) while still
covering the OS window-resize grab area inset.
Part of #44140 (item 2).
Co-authored-by: AIalliAI <285906080+AIalliAI@users.noreply.github.com>
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
* fix(s6): prevent profile create from auto-starting gateway service
When hermes profile create runs inside an s6 container,
_maybe_register_gateway_service() calls register_profile_gateway()
which creates the service directory and triggers s6-svscanctl -a.
Previously the service always started immediately, causing profiles
that share the main gateway's bot token (e.g. Kanban worker profiles)
to fail with a token-lock conflict and persist gateway_state: running
— becoming zombies that resurrect on every container restart.
Wire the existing start_now parameter through the S6 implementation:
when start_now=False, write a marker file (same pattern as
container_boot.py _register_gateway_slot) so s6-supervise leaves the
service stopped until the user explicitly runs hermes -p <profile>
gateway start.
4 files, +61/-6, 4 new tests (all passing).
* test(docker): wait for gateway running state before restart
---------
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Remove the free Parallel Search MCP path and restore the keyed Parallel backend behavior from before it was introduced.
Also drops the keyless fallback registration/display labeling tests and returns the Parallel SDK pin to the prior version.
GLM-5.2 ships with a 1M (1,048,576) token context window. Without this
entry, Hermes falls through to the generic 'glm' key (202,752 tokens),
under-reporting the context bar and prematurely compressing conversations.
The 1M limit was verified empirically via needle-in-a-haystack retrieval
at 789,240 prompt tokens on api.z.ai/api/coding/paas/v4 — zero errors,
zero truncation, correct retrieval at every tested size (25K through 789K).
Changes:
- agent/model_metadata.py: add 'glm-5.2': 1_048_576 before 'glm' fallback
- hermes_cli/models.py: add glm-5.2 to zai curated models
- hermes_cli/setup.py: add glm-5.2 to setup wizard zai list
- hermes_cli/auth.py: add glm-5.2 to coding plan endpoint probes
- plugins/model-providers/zai/__init__.py: add glm-5.2 to fallback_models
- tests/agent/test_model_metadata.py: context resolution + vendor-prefix tests
The #45966 cross-process coherence guard snapshots a session's on-disk
message_count next to the cached agent and rebuilds the agent when the
count changes. But the snapshot is taken at agent-BUILD time — before
the turn writes its own user + assistant (+ tool) rows — and the cache
entry is never rewritten on a reuse. So this process's OWN turn grows
message_count, and the very next turn sees a mismatch and rebuilds the
agent. That happens every turn, for every conversation, silently
destroying the per-conversation prompt caching the cache exists to
protect (AGENTS.md: prompt caching is sacred).
Add _refresh_agent_cache_message_count(): after a turn completes and the
agent has flushed its rows to the SessionDB, re-baseline the stored count
to the now-current value. The guard then fires ONLY when a DIFFERENT
process changes the transcript — preserving the #45966 fix while keeping
the cache warm for normal single-process operation.
Tests drive the real SessionDB + the real guard condition: 5 consecutive
same-process turns now all REUSE the cached agent (0 before the fix); a
cross-process append still invalidates; and the re-baseline is fail-safe
(no DB, falsy session_id, raising probe, legacy 2-tuple, pending sentinel
all no-op).
The platform-disabled fix landed only in agent.skill_utils.get_disabled_skill_names
(the system-prompt path). Two sibling resolvers still used the old
replace-not-union semantics, so the same skill could be hidden from the
<available_skills> prompt yet reported enabled elsewhere:
- hermes_cli/skills_config.get_disabled_skills (the 'hermes skills config' UI)
returned only the platform list, so a globally-disabled skill showed as
enabled (unchecked) on any platform with a platform_disabled entry.
- tools/skills_tool._is_skill_disabled (gates whether skill_view loads a skill)
ignored the global list when a platform list existed, so a globally-disabled
skill could still be loaded on such a platform.
Both now union the global list with the platform list, matching
get_disabled_skill_names. An explicit empty platform list no longer re-enables
a globally-disabled skill — global disables hold on every platform (#46201).
Also: fix the now-stale get_disabled_skill_names docstring and drop a stray
blank line. Regression tests added for both sites (proven to fail on the old
replace semantics).
build_skills_system_prompt() already resolved _platform_hint but called
get_disabled_skill_names() with no argument, so the resolved platform never
reached the filter and the prompt cache_key varied by platform while the
disabled set did not. Pass _platform_hint or None.
get_disabled_skill_names() also fully ignored the global 'disabled' list once
a platform-specific list was found. Return the union (global | platform) so a
globally-disabled skill stays disabled on every platform.
Salvaged from #46203 by @iborazzi; the unrelated apps/shared/tsconfig.json
ES2023 bump is intentionally dropped (one concern per PR).
Three changes to prevent infinite re-execution loops when a user sends
a new message while long-running tools are executing:
1. Filter interrupted tool results in _build_gateway_agent_history:
skip tool messages whose content contains [Command interrupted] or
exit_code 130 — they represent partial execution, not valid results.
2. Don't replay auto-continue notes as user messages: detect
gateway-injected [System note: ...] / [IMPORTANT: ...] prefixes
and skip them in _build_gateway_agent_history so the LLM doesn't
see 4+ messages from 'the user' telling it to finish old work.
3. Fix the wording: the system note now instructs the model to
address the user's NEW message FIRST, IGNORE pending results,
and NOT re-execute old tool calls.
Closes#45230
Port 465 expects implicit TLS (SMTP_SSL) from the first byte. The email
adapter always used SMTP() + starttls(), which is correct for port 587
but hangs/fails on port 465 providers (e.g., Swiss ISPs).
Additionally, when the SMTP host has AAAA DNS records but IPv6 is
unreachable, socket.create_connection() tries IPv6 first and hangs
until timeout. Add an IPv4 fallback via AF_INET socket.
Extract _connect_smtp() helper to consolidate the 4 duplicate SMTP
connection sites into a single method with correct protocol selection
and IPv6 fallback logic.
The initial fix only wrote the prefix npmrc on a fresh Node install, so
pre-existing bundled-Node installs (Node already present) were not repaired
by re-running the installer — install_node/ensure_node skip when Node is
already up to date.
Extract the redirect into an idempotent helper
(configure_managed_node_npm_prefix / _nb_configure_npm_prefix) that no-ops
when there's no Hermes-managed npm, and call it unconditionally from
check_node (install.sh) and at the top of ensure_node (node-bootstrap.sh).
Re-running the install command now repairs an affected install in place,
not just brand-new ones.
Guards that install.sh and node-bootstrap.sh redirect the bundled Node's
npm global prefix to the command link dir's parent via a prefix-local
global npmrc, so `npm install -g` binaries land on PATH instead of the
off-PATH $HERMES_HOME/node/bin.
When the installer falls back to a bundled Node under $HERMES_HOME/node,
npm's default global prefix is that Node dir, so `npm install -g <pkg>`
drops the package binary in $HERMES_HOME/node/bin. Only node/npm/npx are
symlinked into the command link dir (~/.local/bin, /usr/local/bin, or
$PREFIX/bin) — so user-installed global package binaries are NOT on PATH
and can't be run, even though `npm i -g` reports success. They also get
wiped on every Node upgrade (the dir is rm -rf'd and re-extracted).
Redirect the bundled Node's npm global prefix to the command link dir's
parent, so global bins land in the link dir (already on PATH, alongside
node/npm/npx) and survive Node upgrades. Scoped to the bundled Node via
its prefix-local global npmrc ($HERMES_HOME/node/etc/npmrc), so the user's
other Node installs and their ~/.npmrc are untouched. Hermes's own global
installs (agent-browser) pass an explicit --prefix and are unaffected.
Gateway startup now queues real inbound messages until restart-interrupted auto-resume turns have completed, preventing duplicate agents for the same session after a restart.
When profile isolation activates ({HERMES_HOME}/home/ exists), child
processes receive HOME={HERMES_HOME}/home/ for tool config isolation
(git, ssh, gh). However, scripts using Path.home() to locate
~/.hermes/ would incorrectly resolve to the isolated profile home,
breaking helpers that rely on the real user home directory.
New get_real_home() helper in hermes_constants resolves the actual
user home independently of profile isolation. All four subprocess
spawners now inject HERMES_REAL_HOME alongside the profile HOME:
- tools/code_execution_tool.py (execute_code)
- tools/environments/local.py (terminal background, run_env)
- agent/copilot_acp_client.py (Copilot ACP)
Child scripts can now use:
Path(os.environ.get("HERMES_REAL_HOME", os.environ.get("HOME", "")))
to reliably find the real user home regardless of profile isolation.
Closes#25114
Registers 200 plugin commands on top of the native + COMMAND_REGISTRY set
and asserts the tree never exceeds Discord's 100-command limit, that native
high-priority commands survive the cap, and that overflow is actually
dropped. Regression guard for the recurring error 30032
("Maximum number of application commands reached") sync failures.
Discord enforces a hard cap of 100 global application commands per app.
The adapter registers ~27 native commands plus every gateway-available
entry in COMMAND_REGISTRY plus all plugin commands plus the consolidated
/skill group. On a loaded install (many plugins/quick commands) the
desired set exceeds 100, so tree.sync() / _safe_sync_slash_commands()
hits error 30032 ("Maximum number of application commands reached") and
Discord rejects the ENTIRE batch — silently breaking every slash command,
not just the overflow.
Cap registration at the 100-command limit: native commands (registered
first, highest priority) and the /skill group are always kept; lower-
priority auto-registered COMMAND_REGISTRY and plugin commands are added
only until the cap is reached, with a single concise warning telling the
user how to surface the rest. Since both sync paths read from
tree.get_commands(), bounding the tree fixes the root cause for both.
Allow file tools to edit shell startup files, user package-manager configs, and Hermes control files that the user can already modify directly. Keep hard blocks for SSH keys, .env/OAuth token stores, mcp-tokens, pairing files, and system privilege files.
Show a shimmering "Summarizing thread" label during auto-compaction, skip
the post-turn hydrate when compaction fired so the live transcript does not
collapse to the stored summary-only session.
Auto-compression rewrites history mid-turn, which made long threads look
like they reset. Re-tag the gateway lifecycle status as compacting and
surface it in the desktop thread loading indicators.
* fix(desktop): clarify enter-to-send and top-align choice radios
Match the composer keyboard contract in clarify freeform answers and align choice-row radio dots to the start of wrapped labels.
* fix(desktop): clarify loading spinner until request is ready
Hold the clarify panel on a centered Loader2 until clarify.request arrives instead of showing disabled choices or a loading-question stub.
* refactor(desktop): dedupe clarify shell and drop stale ready gates
Extract the shared clarify panel wrapper and remove disabled-state checks that loading already makes unreachable.
The HTTP-200 refusal handler (finish_reason=content_filter) and the
exception-path handler (a provider moderation error classified as
content_policy_blocked) independently built the same terminal turn result —
the same {final_response, messages, api_calls, completed:False, failed:True,
error:'content_policy_blocked: ...'} dict — and ended their user-facing
message with the same 'Try rephrasing... hermes fallback add' trailer, copied
verbatim. The two copies could drift.
Funnel both through a shared _content_policy_blocked_result() builder and a
shared _CONTENT_POLICY_RECOVERY_HINT constant. Also collapse the HTTP-200
path's two near-identical with/without-explanation templates into one (compute
the detail fragment once) and pass reason=FailoverReason.content_policy_blocked
.value to the error hook instead of a hand-written string literal, matching the
sibling hook call.
Behavior-preserving: the provider/refusal lead-in wording stays distinct (a
provider safety filter vs the model declining are genuinely different signals),
the with-text and exception messages are byte-identical to before, and the
no-explanation case only gains a paragraph break for consistency. Surfaced by
the simplify-code reuse/quality reviewers.
The efficiency reviewer's 'redundant normalize_response' flag was deliberately
NOT applied: that branch is cold (refusal-only) and pure-CPU, and reusing the
sibling-branch normalized locals would risk a NameError on the codex_responses
path (which sets finish_reason without normalizing) — re-normalizing is the
robust choice.
A chat-completions response that carries real text or tool calls *alongside*
a `message.refusal` note is a normal, usable turn — the model did work. The
prior logic flipped finish_reason to `content_filter` whenever a refusal
string was present, so the conversation loop reframed a content-bearing turn
as a *failed* safety refusal (failed=True) and buried the model's actual
output inside the "model declined" template, or dropped tool calls entirely.
Only promote to a terminal `content_filter` when the refusal is the sole
payload (no visible text AND no tool calls). The refusal explanation is still
recorded in provider_data in every case for observability. Refusal-only
responses (the bug this feature targets) are unaffected and still surface
terminally; the empty+refusal, bare content_filter passthrough, and no-refusal
common cases are byte-identical to before.
Updates the partial-content test to the corrected contract and adds a
tool_calls-alongside-refusal regression guard.
OpenRouter (and every other OpenAI-compatible provider) uses the default
chat_completions transport, so it is already covered by the refusal fix:
an upstream Claude / moderation refusal arrives as
finish_reason="content_filter" (often empty content, no message.refusal).
Add a regression test asserting the transport passes that finish reason
straight through to the loop's content_filter handler.
(cherry picked from commit 60168a513b)
A Claude refusal (HTTP 200, stop_reason="refusal", empty content) was
laundered into a generic retry loop and surfaced as a misleading
"rate limited / invalid response" or "no content after retries" error,
burning paid attempts reproducing a deterministic refusal.
This hit two distinct paths:
- Direct Anthropic (anthropic_messages): validate_response rejected the
empty-content refusal *before* normalize_response mapped refusal ->
content_filter, so it fell into the invalid-response retry loop.
- Nous Portal / OpenAI-compatible (chat_completions): the portal surfaces
a Claude refusal via message.refusal with empty content, which sailed
past validation and died in the empty-response retry loop.
Fix (one unified content_filter dispatch for all backends):
- AnthropicTransport.validate_response: accept empty content when
stop_reason == "refusal" so it flows to normalize_response.
- ChatCompletionsTransport.normalize_response: promote message.refusal to
content + a content_filter finish reason.
- conversation_loop: handle finish_reason == "content_filter" - fire the
api_request_error hook (content_policy_blocked), try a configured
fallback once, else return a clear terminal refusal message. Never retry
a deterministic refusal.
Supersedes #43084, which fixed only the direct-Anthropic path and could
not reach the chat_completions/portal path.
Tests: transport-level (validate_response refusal, message.refusal
promotion) + end-to-end loop (refusal surfaced, exactly one API call).
(cherry picked from commit 01f546f92c)
Parse provider-reported image pixel ceilings so many-image Anthropic requests can recover by shrinking Retina screenshots below the stricter limit instead of retrying the same rejected payload.
The turn-end sound is a notification concern, not an appearance one — relocate
the variant picker + preview from the Appearance tab to the Notifications tab
(its i18n keys move from settings.appearance to settings.notifications with it).
Adds a native OS notification system (Electron Notification, routed cross-OS)
distinct from the in-app toast feed. Before this, one hardcoded cue existed
(message.complete while document.hidden) with no settings or event coverage.
- Engine (store/native-notifications.ts): localStorage-backed prefs (master
switch + per-kind toggles) and a gated dispatcher over five kinds — approval,
input, turnDone, turnError, backgroundDone — with a 1s per-(kind,session)
self-evicting throttle.
- Gating: "backgrounded" = document.hidden OR !document.hasFocus(), so an
alt-tabbed window still counts as away. Completion kinds fire only when
backgrounded and for the active session (no spam from a busy gateway);
attention kinds (approval/input) also break through for off-screen sessions.
- Wired into real event sites (use-message-stream.ts): message.complete, error,
approval/clarify/sudo/secret.request; backgroundDone from composer-status at
the running -> exited transition.
- Click focuses the window and jumps to the originating session; approval
notifications carry Approve/Reject buttons that resolve in place over
approval.respond, mirroring the in-app Run/Reject bar.
- Settings: new Notifications panel (master + per-kind switches, test button
with real OS-result feedback). Full i18n (en/ja/zh/zh-hant).
* feat(desktop): add curated completion sound bank for turn completion
Replace the prior haptic-only completion cue with a curated Web Audio completion sound flow, defaulting to the minimal two-note comfort preset while keeping alternate presets available for quick iteration. Play the cue on every message completion event (including background sessions) so turn-end feedback is consistent across active and non-active chats.
* refactor(desktop): drop done1 byte sample from completion bank
Keep the curated Web Audio presets only; the embedded sample added bulk without shipping as the default cue.
* feat(desktop): expand completion sounds and add Appearance picker
Add fourteen synthesized turn-end presets with preview in settings, persisted variant selection, and softer default mixing for late-night use.
Co-authored-by: Cursor <cursoragent@cursor.com>
* refactor(desktop): dedupe completion-sound resolver, trim audio comments
Make the store the single source of truth for the variant default + range
validation and have the sound lib import it (one-way lib→store edge, no
cycle), instead of two divergent copies. Extract the shared white-noise
buffer used by the air/whoosh voices and cut the synth comments down to
why-only notes.
---------
Co-authored-by: Austin Pickett <pickett.austin@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Both the regular and execute_code dispatch paths forward task_id into
registry.dispatch via middleware _dispatch lambdas but silently dropped
session_id. Dispatch-layer hooks (e.g. set_enforcement_fn) that correlate
calls with the active session received "" for every invocation.
Pass session_id=session_id at both _dispatch call sites inside
handle_function_call, matching the existing task_id pattern. Hooks
already received session_id; this closes the registry.dispatch gap.
Rebased onto current main where dispatch is wrapped by
run_tool_execution_middleware — the old direct-dispatch sites from
#28479 no longer exist.
test(dispatch): add tests for session_id forwarding (NousResearch#28479)
Covers standard and execute_code paths through the middleware wrapper.
Verifies task_id forwarding is not broken by the change.
The previous test patched ssl.create_default_context globally with a bare
SSLContext that has zero CA certs. Both verify_ca_bundle() and the macOS
fallback got the same mocked context, so the test verified nothing useful:
both paths produced empty get_ca_certs() and the assertion that no
exception escaped was vacuously satisfied.
Only mock the fallback call (no cafile) — let the certifi call hit the
real SSL stack and fail with SSLError on the broken PEM. The mock
fallback returns a context with load_default_certs() so the test now
verifies the real scenario: broken certifi → SSLConfigurationError,
macOS system trust store → success.
Also pads the broken PEM past the 1 KB size guard so the size check
doesn't short-circuit before ssl.create_default_context(cafile=...) runs.
Reported by @liuhao1024 in PR review.
A stale certifi CA bundle after a partial `hermes update` used to crash
the agent on the first outbound HTTPS call with a raw traceback and
trap the gateway in a retry loop.
This patch:
* Adds `agent/errors.py` with a typed `SSLConfigurationError`
* Adds `agent/ssl_guard.py` with a `verify_ca_bundle()` pre-flight
that asserts the bundle exists, is non-trivial in size, and can build
a working SSLContext. On macOS, it falls back to the system trust
store when the bundle is empty but the system store is healthy
(covers corporate proxies / MDM setups).
* Wires the guard into `run_agent.py` and `gateway/run.py` right
after the `hermes_bootstrap` import, inside a try/except so a bug
in the guard itself can never prevent startup.
* Adds a `SSL / CA Certificates` section to `hermes_cli doctor` so
users can detect the failure with one command.
* Adds unit tests covering the healthy, missing, empty, skip-env, and
macOS-fallback paths.
* Adds an RCA document describing the failure mode and the recovery
path (`pip install -e .`).
When the bundle is broken the user sees:
\u26a0\ufe0f SSL certificate bundle issue detected.
Run: pip install -e .
`HERMES_SKIP_SSL_GUARD=1` disables the check for sandboxed
environments that ship their own trust store.
* fix(desktop): jump-to-approval pill for off-screen approvals
A blocked approval's only response surface is the inline Run/Reject bar on
the pending tool row. When that row is scrolled out of view the session looks
stalled with no visible action. Surface a composer-anchored "Approval needed"
pill only when an approval is pending AND its inline bar is scrolled away;
clicking scrolls the bar back into view. Preserves the deliberate inline (not
modal) approval design — the pill never duplicates the approve/reject controls.
The inline bar mirrors its own viewport visibility via IntersectionObserver
(tracks scroll/resize/layout) and registers a scroll-into-view handler the pill
fires, mirroring the existing thread-scroll jump-button bridge.
Supersedes #45828.
* fix(desktop): morph jump-to-bottom into approval prompt; drop scroll bridge
Collapse the separate "jump to approval" pill into the existing
scroll-to-bottom control: when scrolled away from the bottom while an approval
is pending, it relabels to "Approval needed". A parked approval's inline
Run/Reject bar is always the bottom-most content, so the existing
scroll-to-bottom action lands the user right on it — one control, no collision.
This also fixes the layout corruption from the first cut: the pill called
native el.scrollIntoView(), which scrolls every scrollable ancestor including
the overflow:hidden chat shell containers. Those have no scrollbar to scroll
back and don't remount on session switch, so the composer stayed shoved and
the breakage persisted across sessions. Reusing requestScrollToBottom() (the
use-stick-to-bottom path) only touches the one designated scroll container.
Removes the now-unused approval-scroll store + IntersectionObserver wiring.
Three tests covering: a stale .bak poisoning a failed update's move/restore, an orphaned .bak misread as a user deletion, and a partially written dest blocking restore-on-failure. All three fail on current main without the fix.
Refs #44942
Recover an orphaned .bak before classification (interrupted updates no longer read as user deletions), clear a stale .bak before shutil.move (replace, not nest), and clear a partial dest before restore so restore-on-failure actually runs.
Fixes#44942
tools/approval.py already denies tee/redirection writes to every
_SENSITIVE_WRITE_TARGET (~/.ssh/*, ~/.netrc/.pgpass/.npmrc/.pypirc, shell
rc files, ~/.hermes/config.yaml/.env) via the DANGEROUS_PATTERNS tee/`>`
rules, but cp/mv/install were only paired for _SYSTEM_CONFIG_PATH (/etc) and
the project-relative env/config target. So `cp evil ~/.ssh/authorized_keys`
(SSH-key implant / persistence), `cp creds ~/.netrc`, and `cp evil ~/.bashrc`
(login-time command injection) auto-approved while the equivalent tee/`>`
forms were denied — an unpaired write deny is theater (same rationale as
#14639 / commit 4e9d886d, which paired the terminal side for
~/.hermes/config.yaml writes but did not touch these cp/mv/install verbs on
the broader sensitive set).
Add one (cp|mv|install) DANGEROUS_PATTERNS entry reusing the existing
_SENSITIVE_WRITE_TARGET fragment, anchored via _COMMAND_TAIL so it fires on
the destination (last arg) only: reading OUT of a sensitive path
(`cp ~/.ssh/config /tmp/x`) stays auto-approved. Description differs from the
system-config cp entry so the two keep distinct approval keys (no silent
cross-approval). Additive — does not subsume the /etc or project-config rules.
Adds TestSensitiveCopyMovePattern: 5 positive cases (ssh authorized_keys,
ssh private key via mv, netrc via install, bashrc, ~/.hermes/config.yaml) +
2 negative guards (copy FROM ssh, unrelated copy). The ssh/netrc/bashrc
positives fail on main and pass on this branch; the negatives stay green
both ways.
Carry forward focused follow-ups from PR #45741: treat PTB's raw Bot API 10.1 response shapes safely, recognize real missing-endpoint errors, preserve link preview settings on rich sends, and lock the rich limit to Telegram's character-based cap.
Large paste and Ctrl+A → Delete froze the composer for seconds — both routed
through Chromium's contenteditable editing pipeline (~O(n²) on multiline DOM).
- insertPlainTextAtCaret: Range + text/<br> fragment (paste path)
- deleteSelectionInEditor: range.deleteContents for non-collapsed Backspace/Delete
- Shared composerSelectionRange helper; both flush via flushEditorToDraft
Profiled live (47 KB / 122 paragraphs): paste 4474 ms → 13 ms; select-delete
1304 ms → 4 ms. Collapsed-caret deletes still native.
* fix(desktop): accept slash command on space at command stage
Pressing space on a no-arg slash command (e.g. /hermes-agent) fell
through to the arg-completion stage and dead-ended on "No matches"
instead of inserting the directive. Space now mirrors Tab/Enter while
the command name is still being typed: no-arg commands commit the chip,
arg-taking commands expand to their options step.
* fix(desktop): suppress arg popover for no-arg slash commands
Committing a no-arg command (`/hermes-agent `) re-detected the chip+space
as an arg query and re-opened the popover on "No matches". The arg-stage
menu now only opens when the command actually takes args.
* fix(desktop): polish slash arg completion (space/tab/click + typed args)
Unify Enter/Tab/Space accept of the highlighted item at both the command
and arg stages: no-arg commands commit a chip, arg commands expand to
options, and an arg option commits the full `/cmd arg` chip. A fully-typed
arg (which the backend completer drops from suggestions) now commits on
Space/Tab via the verbatim text instead of dead-ending, and the "No
matches" empty state is suppressed past a command's name. Space stays
slash-only so @ mentions keep a literal space.
The salvaged fix's two regression tests mock adapter.handle_message, so
they only assert the pre-claimed sentinel is set/cleaned around a stub —
they never drive the real dispatch chain. Add a full-path test that
exercises _schedule_resume_pending_sessions -> _guarded_handle_message ->
adapter.handle_message -> _process_message_background -> _handle_message
and asserts the resumed session's agent runs EXACTLY ONCE: not zero (the
pre-claim must not self-bounce the resume into a queued no-op) and not
twice (the duplicate-agent bug #45456 the fix targets). Also assert no
leaked sentinel and no orphaned pending event after the drain settles.
Tighten the _guarded_handle_message docstring: on current main the real
sentinel is taken over inside _handle_message (not _process_message_background),
and note the `is _AGENT_PENDING_SENTINEL` guard only releases the slot we
ourselves placed, never one a live run owns.
When the gateway restarts and auto-resumes an interrupted session, an
inbound message arriving in the window between `asyncio.create_task()`
and the task's first await could spin up a second AIAgent for the same
session. Both agents would then process messages concurrently,
producing interleaved duplicate responses (#45456).
Fix: set `_AGENT_PENDING_SENTINEL` in `_running_agents` immediately
after the "already running" check, before creating the task. This
closes the race window — any inbound message sees the slot as occupied
and queues behind the auto-resume.
A `_guarded_handle_message` wrapper ensures the pre-claimed sentinel is
always released, even if `handle_message` raises before reaching
`_process_message_background` (whose `finally` block handles normal
cleanup).
(cherry picked from commit 85150c976b)
Bedrock Converse rejects non-default sampling parameters for Opus 4.7 and 4.8 with a ValidationException. Reuse the Anthropic-native sampling-param guard in the Bedrock kwargs builder so those models omit temperature/topP while older Claude and non-Claude models keep existing behavior.
Includes the stop-sequence regression from the parallel fix to ensure stopSequences still pass through for restricted Opus models.
Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
The dashboard's public /api/status liveness endpoint is in PUBLIC_API_PATHS
and bypasses dashboard auth, yet it returned absolute hermes_home,
config_path, env_path, the gateway PID, and the internal gateway health URL.
That exceeds the shape its own allowlist documents as public ("version,
gateway state, active session count, and the dashboard auth-gate shape. No
bodies, no session content, no secrets"), leaking deployment recon to any
unauthenticated caller on a network-exposed (gated) bind.
Withhold host-local detail unless the bind is loopback / --insecure, where
the dashboard is local-only and the caller is already inside the trust
envelope -- the same split should_require_auth draws. The NAS liveness probe
and the auth-gate badge are unaffected.
Adds invariant tests for both modes (gated withholds, loopback keeps).
Keep the own-policy fail-closed hardening from PR #45444, but still trust WeCom groups.<id>.allow_from because the adapter already checked that sender allowlist before dispatching to gateway auth.
Own-policy adapters (WhatsApp, WeCom, Weixin, QQBot, Yuanbao) default dm_policy/group_policy to "open", which forwards every sender. The gateway's adapter-trust shortcut in _is_user_authorized blanket-trusted those platforms when no env allowlist was set, so an operator who enabled one with only credentials authorized the entire external network -- the fail-open SECURITY.md section 2.6 forbids ("an allowlist is required for every enabled network-exposed adapter").
Trust the adapter only when its effective policy for the chat type is an actual "allowlist" restriction (the case #34515 was protecting). "open"/"pairing"/anything else falls through to default-deny, where {PLATFORM}_ALLOW_ALL_USERS / GATEWAY_ALLOW_ALL_USERS and the pairing flow remain the explicit opt-ins.
Old Office formats (.xls, .doc, .ppt) were missing from the
SUPPORTED_DOCUMENT_TYPES dict in gateway/platforms/base.py while their
newer counterparts (.xlsx, .docx, .pptx) were included.
Sending an .xls file via Telegram triggers 'Unsupported document type'
and the file is silently dropped instead of being cached and forwarded
to the agent.
Add the three legacy MIME types so these files are handled the same way
as their modern equivalents.
The identity-mapping keys never made it to the site docs. Add the three keys
to the config reference and a Gateway Identity Mapping section: when it
applies (gateway only, setup-gated), the intent tree, resolver order, the
un-pin orphan warning, and the deprecated pinPeerName alias.
Drop pinPeerName from the key table (now a deprecated-alias note), and replace
the single/multi/hybrid 'deployment shapes' section with the gateway-gated
intent tree the wizard actually presents, including the [e] raw-edit hatch and
the un-pin pooling steer.
The single/multi/hybrid 'deployment shape' was a misnomer: these keys only
affect the gateway (the one entrypoint supplying a runtime user ID), and the
three preset names stamped a lossy taxonomy onto three orthogonal knobs while
hiding which keys got written.
Replace it with an intent-led tree gated on gateway detection:
- _gateway_platforms() lazily inspects the gateway config (best-effort, no
hard dependency); the step auto-skips when no platform is connected.
- 'who talks to this?' → just me / me+others (pooled?) / only others, deriving
pinUserPeer + userPeerAliases + runtimePeerPrefix and echoing the result.
- [e] drops to a raw-knob editor for power users.
- The single→multi orphan guard survives as a pooling steer.
The setup wizard wrote the legacy pinPeerName even though pinUserPeer is
the canonical key that outranks it in the resolver — so it had to scrub
the canonical key afterward to stop it winning. Write pinUserPeer directly
and migrate any legacy pinPeerName onto it on touch (setup load + clone),
which removes the precedence-fighting entirely.
Resolver still reads pinPeerName as a back-compat alias; that's deferred.
2026-06-10 16:07:53 -04:00
692 changed files with 55991 additions and 13151 deletions
if echo "$LABELS" | grep -Fxq 'mcp-catalog-reviewed'; then
echo "MCP catalog review label present."
exit 0
fi
BODY="## ⚠️ MCP catalog security review required
This PR changes the bundled MCP catalog or MCP catalog installer code. MCP entries can define local commands that users later install into \`mcp_servers\`, so this needs explicit maintainer review before merge.
A maintainer should verify:
- any new/changed \`optional-mcps/**/manifest.yaml\` command and args are expected,
- stdio transports do not use shell+egress/exfiltration payloads,
- git install refs are pinned and bootstrap commands are minimal,
- requested env vars/secrets match the upstream MCP's documented needs.
After review, add the \`mcp-catalog-reviewed\` label and re-run this check."
gh pr comment "$PR" --body "$BODY" || echo "::warning::Could not post PR comment (expected for fork PRs)"
echo "::error::MCP catalog changes require the mcp-catalog-reviewed label."
@@ -181,16 +181,20 @@ See `hermes claw migrate --help` for all options, or use the `openclaw-migration
We welcome contributions! See the [Contributing Guide](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) for development setup, code style, and PR process.
Quick start for contributors — clone and go with `setup-hermes.sh`:
Quick start for contributors — use the standard installer, then work from the
full git checkout it creates at `$HERMES_HOME/hermes-agent` (usually
`~/.hermes/hermes-agent`). This matches the layout used by `hermes update`, the
managed venv, lazy dependencies, gateway, and docs tooling.
'Set up a WeCom self-built app, expose its callback URL, and provide the corp ID, secret, agent ID, and AES key.',
weixin:
'Sign in to the WeChat Official Account platform, copy the AppID and Token, and point the message callback URL at Hermes.',
'Run `hermes gateway setup`, select Weixin, then scan and confirm the QR code with a personal WeChat account. Hermes connects through Tencent\'s iLink Bot API and saves the credentials.',
qqbot:'Register an app on the QQ Open Platform (q.qq.com) and copy the App ID and Client Secret.',
api_server:
'Expose Hermes as an OpenAI-compatible API. Set an auth key, then point Open WebUI / LobeChat / etc. at the host:port.',
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.