The dangerous-command approval layer already blocks `hermes gateway
(stop|restart)`, `pkill/killall hermes|gateway`, and `kill ... $(pgrep ...)`.
A reporter noted on #33071 that the agent can still achieve the same
effect by driving launchd directly against the gateway's service label
(`launchctl stop ai.hermes.gateway`, `launchctl kickstart -k
system/ai.hermes.gateway`, etc.) or by substituting `pidof` for `pgrep`
in the kill-expansion form.
This widens the "Gateway lifecycle protection" block in
`tools/approval.py` to cover both vectors:
- `launchctl (stop|kickstart|bootout|unload|kill|disable|remove)`
scoped to commands that target a Hermes label (`hermes`,
`ai.hermes`). Read-only inspection (`launchctl print …`,
`launchctl list`) and operations against unrelated labels remain
unflagged.
- `kill ... $(pidof …)` and the backtick form, alongside the existing
`pgrep` expansion. `pidof` is the BSD/Linux equivalent and is
equally opaque to the `(pkill|killall) … hermes` name pattern.
Intentionally left out of scope: plain `kill -TERM <numeric_pid>` with
a PID looked up out-of-band. Catching that would require runtime PID
state and would break the existing
`TestPgrepKillExpansion::test_safe_kill_pid_not_flagged` contract,
which guarantees that a plain literal-PID `kill 12345` stays safe.
Embeds reach out to third parties on render, so default to a placeholder that
mirrors the tool-approval UX: "Load <service>" (this embed) or "Always allow
<service>" (persisted). A desktop-local store ($embedMode ask|always|off +
per-service allowlist) gates the fetch with zero gateway round-trip; an
Appearance setting controls the global default. Local renderers (mermaid, svg,
alerts) are never gated. Addresses review feedback on outbound third-party
requests.
Natural-language skill search returned a short, arbitrary list and never
surfaced NVIDIA (or OpenAI/Anthropic/HuggingFace) skills. Two causes:
1. The runtime index collapses every GitHub tap into source="github", so
there was no way to find or filter by provider at the CLI — the per-tap
identity only existed in the docs-site catalog.
2. HermesIndexSource.search matched only name/description/tags (not the
identifier or provider) and broke at the first `limit` hits in raw index
order, burying the most relevant skills. `search` also defaulted to
--limit 10 against an 86k-entry catalog.
Changes:
- GitHubSource stamps a per-tap provider label (extra.provider) on each
skill via github_provider_for(); source stays "github" so dedup/floor/
index-skip logic is untouched. Flows into the built index.
- HermesIndexSource.search now matches identifier + provider too, and
collect-then-ranks (exact > prefix > whole-word > substring) instead of
break-at-limit.
- --source nvidia|openai|anthropic|huggingface|voltagent|gstack|minimax
provider filters for browse/search (narrows merged results by provider).
- search --limit default 10 -> 25; table Source column shows the provider
label for github skills.
Tested: 181 unit tests pass; E2E against the live runtime index confirms
'nvidia'/'cuda' searches now surface NVIDIA-provider skills and
--source nvidia narrows to exactly the NVIDIA catalog.
The post-update gateway restart path relaunched the gateway with the
venv's console `python.exe` (via `get_python_path()` in
`_gateway_run_args_for_profile`). On Windows this leaves a terminal
window open permanently: uv's `venv\Scripts\python.exe` is a launcher
shim that re-execs the *base* console interpreter, which allocates its
own conhost — and `CREATE_NO_WINDOW` cannot suppress that second window.
The clean-start path (`_spawn_detached`) already dodges this by routing
through `_resolve_detached_python` to use the windowless base
`pythonw.exe`; the restart watcher did not.
Symptom (reported on Windows 11): after an in-app GUI update, a console
window for the gateway stays open and never closes. Confirmed on the
reporter's box — the running gateway was `python.exe ... gateway run
--replace` with a live conhost child and the foreground "Press Ctrl+C to
stop" banner, born exactly at the update's "Restarting Windows gateway"
log line.
Fix:
- Add `gateway_windows.windowless_gateway_restart_spec(run_argv)` which
rewrites a console-python gateway argv into the windowless `pythonw.exe`
equivalent and returns the cwd + env overlay (VIRTUAL_ENV / PYTHONPATH /
HERMES_HOME) the base interpreter needs to import `hermes_cli` without
the venv launcher's site config. No-op on POSIX.
- `_spawn_gateway_restart_watcher` now applies that rewrite on Windows and
threads cwd= / env= into the inlined respawn Popen. Covers both restart
entry points (`launch_detached_profile_gateway_restart` and
`launch_detached_gateway_restart_by_cmdline`). CREATE_NO_WINDOW |
DETACHED_PROCESS | CREATE_BREAKAWAY_FROM_JOB and the breakaway-denied
fallback are all preserved.
Verified E2E on a real Windows 11 box: drove the actual watcher against a
dummy old-pid; the respawned gateway came up as `pythonw.exe` (zero
console python, no conhost child) and booted fully (housekeeping + kanban
dispatcher started → imports resolved under the base interpreter).
Tests: TestWindowlessGatewayRestartSpec (behavior) +
TestGatewayDetachedWatcherWindowsFlags regression assert. Pre-existing
Linux-only failures on a Windows host (SIGKILL, systemd, docker-root)
confirmed identical on the bare base.
Add a deprecation banner to the top of the dedicated Nix & NixOS setup
guide and consistency notes at the Nix sections of installation, updating,
and the plugin-distribution guide. Nix is now best-effort only; the
supported install paths are the curl|bash installer, Docker, and Windows.
On macOS, terminal(background=true) silently failed: the process returned a
session_id and exit_code=0 but the command never ran (empty stdout, no side
effects). Root cause is two interacting issues:
1. _find_shell was aliased to _find_bash, which prefers `shutil.which("bash")`
→ /bin/bash (GNU bash 3.2, still shipped on macOS) over $SHELL (/bin/zsh).
2. process_registry.spawn_local runs [shell, "-lic", "set +m; <cmd>"] with
stdin=/dev/null. bash 3.2 as a login shell sources ~/.bash_profile, which on
many macOS setups contains `exec /bin/zsh -l`; that exec replaces bash but
drops the -c argument, so the command is swallowed (exit 0, no output).
Decouple _find_shell from _find_bash: _find_shell now prefers the user's
configured $SHELL on POSIX (the shell they actually log in with), falling back
to _find_bash when $SHELL is unset/missing. _find_bash is unchanged, so callers
that genuinely need bash (e.g. the _run_bash login-shell snapshot) keep bash
semantics. zsh handles -lic correctly even with redirected stdin.
Salvaged from #42219 by @liuhao1024 (authorship preserved via cherry-pick).
On top of the original (8 unit tests covering $SHELL-set/unset/missing/empty,
Windows-ignores-$SHELL, _find_bash-unchanged), added an E2E regression test
that reproduces the real bash-3.2 login-shell swallow (exit 0 / no file) and
asserts the shell _find_shell selects actually executes a -lic background
command. Mutation-verified: reverting _find_shell to the bash alias fails the
$SHELL-preference test. Bug reproduced directly: /bin/bash 3.2 -lic with a
.bash_profile->exec-zsh creates no file; zsh -lic does.
Closes#42203. Supersedes #42290.
When a Hermes-managed node/npm/npx shim exists but fails --version, redownload
the pinned nodejs.org bundle under HERMES_HOME/node and retry. Do not fall
back to system npm on PATH when a managed tree is present.
POSIX heal probes node, npm, and npx (npm can break while node still runs).
A stale or partial Hermes-managed Node tree under the active HERMES_HOME
can leave bin/npm behind while lib/cli.js is missing. File-existence checks
alone made hermes update pick that broken npm and skip healthy system npm
on PATH. Probe managed candidates with --version before preferring them.
projects.db (per-profile project store) and kanban.db were missing from
_QUICK_STATE_FILES, so the pre-update quick snapshot never backed them up.
On a desktop upgrade, when the update flow removes/replaces the file and the
post-update schema-init re-creates an empty one, all user-created projects,
folder mappings, the active-project pointer, kanban board bindings, and tasks
vanish silently — no error.
Add the per-profile user-created stores to the snapshot set:
- projects.db — project store
- response_store.db — gateway conversation history / tool payloads (WAL)
- memory_store.db — holographic memory facts/entities (WAL)
- verification_evidence.db — agent verification audit trail
- kanban.db — default board (back-compat <root>/kanban.db)
- kanban/boards — non-default boards (<root>/kanban/boards/<slug>/kanban.db
+ metadata); workspaces/ and attachments/ subtrees
are skipped as large + regenerable.
Also: the directory-branch of create_quick_snapshot now routes *.db through the
WAL-safe _safe_copy_db (SQLite backup() API), matching the top-level file path —
previously a non-default board DB with an open WAL could be copied inconsistently.
Salvaged from #52930 by @0xDevNinja (authorship preserved via cherry-pick).
On top of the original (which covered only projects.db + the default kanban.db),
this adds: non-default-board coverage, the three sibling per-profile DBs that
meet the same upgrade-wipe criteria, WAL-safe directory copies, and a
workspaces/attachments skip to avoid snapshot bloat (×20 retained). 8 tests,
all mutation-verified; E2E verified snapshot→wipe→restore preserves all six
store types on the real code path.
Closes#52889. Supersedes #52930.
The external-drain marker .drain_request.json is written under HERMES_HOME,
which on Hermes Cloud is a persistent Fly volume (/opt/data). A begin-drain
marker therefore SURVIVES the post-update machine restart. But the disruptive
lifecycle actions a drain protects (auto-update / image migrate / env edit /
profile change) all restart the machine — which is exactly the signal the drain
is over. The freshly-restarted gateway re-read the orphaned marker on its
startup reconcile and parked itself back in 'draining', refusing every new turn
indefinitely (NS-570: ~52 min until manually cleared).
Fix: stamp the marker with an identity of THIS container/VM instantiation
(kernel boot_id + PID 1 start time, read from /proc) and treat a marker whose
epoch differs from the current instantiation as absent. A deliberate restart →
new PID 1 → new epoch → stale marker ignored → gateway boots 'running'. A marker
written during the current instantiation (the live drain) still matches; an s6
respawn of just the gateway (PID 1/init unchanged) keeps the same epoch, so an
in-flight drain is still honoured (D4a reversibility preserved).
The staleness check is lenient and never fail-closed: a legacy marker with no
epoch, a corrupt/contentless marker, or an environment with no /proc (epoch
unavailable) all degrade to the original presence-only behaviour. NAS is
untouched — it only ever POSTs begin/cancel-drain over HTTP; the marker file is
purely gateway-internal IPC.
The fix is entirely within gateway/drain_control.py; the watcher and the
dashboard endpoint go through the same drain_requested()/write_drain_request()
chokepoints and need no functional change.
## Description
On macOS 26.x, `launchctl bootstrap` and `launchctl kickstart` return exit code 5 ("Input/output error"), which Hermes already anticipates and handles by spawning a detached fallback process. However, the gateway status reporting is ambiguous:
- `gateway status` says "Gateway service is loaded" (because `launchctl list` returns exit 0)
- But `launchctl print` shows `state = not running` — launchd isn't actually supervising anything
- The detached fallback PID running is invisible to the status command
- Users can't tell whether auto-start at login and auto-restart on crash are available
### Root Cause
Two problems in `hermes_cli/gateway.py`:
1. **`_probe_launchd_service_running()`** (line 1067): Determined launchd service liveness solely by `launchctl list <label>` exit code. On macOS 26, this returns 0 even when the service is only *registered* but not running (output lacks a `"PID"` field). This caused `GatewayRuntimeSnapshot.service_running = True` incorrectly, which suppressed the process/service mismatch warning.
2. **`launchd_status()`** (line 3569): Used the same binary "loaded/not loaded" check without inspecting whether launchd actually has a PID, whether a detached fallback is running, or whether auto-start/restart are available.
### Changes
**`hermes_cli/gateway.py`:**
1. **New `_parse_launchd_pid_from_list_output()` helper** — Extracts the PID from `launchctl list` output. When launchd is actively supervising, the output includes `"PID" = <number>;`. When only registered but not running, no PID field is present.
2. **Fixed `_probe_launchd_service_running()`** — Now requires a PID in the `launchctl list` output to confirm launchd is actually supervising. This correctly sets `service_running = False` when launchd has the service registered but `state = not running`, which triggers the existing process/service mismatch detection.
3. **Reworked `launchd_status()`** — Reports clearly separated information:
- LaunchAgent plist currentness (stale or current)
- Whether launchd is actively supervising (with PID)
- Whether a detached fallback PID is running
- Whether auto-start at login and auto-restart on crash are available
- When launchd supervision is known to be unavailable, explains why
4. **Persistent unsupported marker** (`~/.hermes/.gateway-launchd-unsupported`) — Written when `_launchd_fallback_to_detached()` is called (launchd exit 5/125). Allows `launchd_status()` to explain *why* launchd can't supervise even when no fallback process is currently running. Cleared automatically when a future bootstrap/kickstart succeeds (e.g., after an OS update fixes the issue).
5. **Updated `_print_gateway_process_mismatch()`** — Distinguishes the managed detached fallback from a genuinely manual `nohup hermes gateway run`, providing accurate guidance for each case.
### Status Output Examples
**Before** (macOS 26, fallback active):
```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
✓ Gateway service is loaded
{
"Label" = "ai.hermes.gateway";
"OnDemand" = true;
...
};
```
**After** (macOS 26, fallback active):
```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
⚠ Gateway service is registered but launchd is not supervising it
launchd cannot manage the gateway on this macOS version.
✓ Detached fallback process is running (PID 12345)
Cron jobs will fire. Stop with: hermes gateway stop
⚠ Auto-start at login and auto-restart on crash are NOT available.
```
**After** (normal launchd supervision):
```
Launchd plist: ~/Library/LaunchAgents/ai.hermes.gateway.plist
✓ Service definition matches the current Hermes install
✓ Gateway is supervised by launchd (PID 12345)
Auto-start at login and auto-restart on crash are available.
```
### Tests
Updated 5 existing tests and added 11 new tests in `tests/hermes_cli/test_gateway_service.py`:
- PID parsing from `launchctl list` output (with PID, without PID, empty, unquoted PID)
- `_probe_launchd_service_running()` requires PID presence
- Unsupport marker lifecycle (write, clear, persist across fallback)
- Marker cleared on successful bootstrap
- `launchd_status()` reporting: supervised, fallback-running, fallback-unavailable
- Existing fallback tests now verify marker creation
### Related Issues
- Issue #23387 (original macOS 26 launchd workaround)
- Issue #42524 (this issue)
A clarify/approval/sudo/secret prompt blocks the turn on the user, but the UI
treated it as an in-flight turn: the "thinking" timer kept ticking and Esc
interrupted the run — discarding a question you might want to come back to. Add
$activeSessionAwaitingInput (the pet's awaitingInput concept, scoped to the
active session) and use it to suppress the stall indicator and disarm Esc while a
prompt waits. Clear the session's prompts (and needsInput) on Stop and on turn
end so a resolved/aborted turn can't leave a dead panel or a stuck "needs input"
dot.
The inline clarify panel used its own card tokens, an animated ring, and
oversized spacing — out of step with every other tool row. Rebuild it on the
shared --ui-*/--conversation-* tokens: a compact panel, letter-key badges
(A/B/C…) that double as a/b/c… shortcuts, an inline content-sizing "Other" field
(CSS field-sizing — no view swap, no layout shift on focus), and a Continue
button so picking an option selects rather than auto-sends. Selection lives on
the letter badge alone (solid primary; outlined while Other is focused-but-empty).
Also settle the panel into the standard tool block once the turn stops running,
so a stopped turn no longer strands a live, unanswerable prompt.
Add a content-agnostic Zoomable primitive (useZoomPan hook + overlay viewer):
click to open full-screen, wheel-zoom toward the cursor, drag to pan, toolbar
zoom/reset, and an optional copy action. Wire Mermaid diagrams into it with
copy-as-PNG; reusable for other inline content later.
A config migration (or hand-edit) that leaves an invalid toolset name in
`platform_toolsets` — e.g. the #38798 corruption that rewrote `hermes-cli` to
the non-existent `hermes` — silently disabled all affected tools:
resolve_toolset() returns [] for an unknown name, so the agent quietly lost its
tools with no error, warning, or log entry and degraded to text-only replies.
Surface it loudly at two points:
- After migration (migrate_config): validate platform_toolsets and record/print
a warning per unknown name, with a `hermes-<platform>` suggestion when that
would have been valid (the exact #38798 shape).
- At runtime (_get_platform_tools): if a platform was explicitly configured but
every toolset name is invalid, log a warning when tools are resolved for a
session — so an ALREADY-corrupted config is caught at startup, not only on the
next `hermes update`.
Logic lives in a new pure, side-effect-free helper (toolset_validation.py) with
validate_toolset injected, so it is unit-testable without the tool registry.
Note: the original v25→v26 migration that caused the corruption no longer
exists (config format is now v30; no migration step rewrites toolset names).
This change is the durable defense against the silent-failure mode regardless
of cause, matching the issue's "Expected: log a warning".
Salvaged from #39207 by @lEWFkRAD (authorship preserved via cherry-pick).
Tests: 9 helper cases (incl. the #38798 corruption shape, mixed valid/invalid,
zero-tools state, non-dict/scalar/non-string) + a runtime caplog test — both the
helper warning and the runtime guard mutation-verified to fail without the fix.
Closes#38798. Supersedes #39581 (prevent-in-v25→v26 — that path is gone),
#41006 / #40208 (repair-migration for already-corrupted configs).
Wire the embeds module into the markdown surface: bare provider autolinks unfurl
to inline embeds, ```mermaid/```svg fences route to the rich renderers, and
`> [!NOTE]`-style blockquotes become alert callouts. Labeled links stay plain.
Per-kind renderers, each a lazy split chunk: plain-iframe video/maps (wheel
chains to the transcript; maps gate scroll behind ⌘), the in-document
blockquote-script path for X/Instagram, the dark Spotify player, and the
YouTube iframe. Adds Mermaid and DOMPurify-sanitised SVG fences and GFM alert
callouts, all sized to 33dvh and theme-matched to avoid white color-scheme
artifacts. Main-process stamps a Referer on YouTube embed requests.
Pure, synchronous URL→descriptor matchers for YouTube, Vimeo, Instagram,
Pinterest, TikTok, X, Spotify, Google Maps and OpenStreetMap, plus the shared
embed primitives (error boundary, fail card, escape-html, dark-mode hook,
sizing token). Declares the mermaid + dompurify deps used by the fenced
renderers.
The cron runtime tripwire (_scan_cron_prompt) used a 10-char invisible-unicode
set while the install-time scanner (threat_patterns.INVISIBLE_CHARS) flags 17.
The cron-local set was missing U+2062-U+2064 (invisible math operators) and
U+2066-U+2069 (directional isolates), so a directive obfuscated with one of
those codepoints (e.g. "ig<U+2063>nore all previous instructions") slipped past
the runtime cron gate while being caught at install time.
Import the canonical set so the cron tripwire and install scanner can't drift
apart again. Emoji-ZWJ protection (_zwj_has_emoji_neighbour) is unchanged.
Fixes#35075
Co-authored-by: rlaope <piyrw9754@gmail.com>
Replies on WhatsApp Cloud arrived at the agent with reply_to_id set but
reply_to_text=None, so run.py never injected the "[Replying to: ...]"
disambiguation prefix (it gates on reply_to_text). Meta's webhook context
object carries only the quoted message's id, never its text.
Index (chat_id, wamid) -> text in rich_sent_store on every inbound message
and every outbound text send -- the same store that solved the identical
Telegram rich-send problem -- then look up the quoted text in
_build_message_event_from_cloud and populate reply_to_text plus
reply_to_is_own_message, derived from context.from versus the business
number.
Tasks 2.1 + 2.2 + 2.3 of the safe-shutdown plan — the reversible
quiesce-without-restart machinery NAS drives during a lifecycle action (D4a).
These ship together because the endpoint, the control channel, and the gateway
state machine are one coherent slice.
2.2 — control channel (gateway/drain_control.py, new):
The dashboard has no HTTP path into a running gateway (guardrails: "there is NO
external control channel into a running gateway"); restart/drain is driven only
by markers the gateway reacts to. So begin/cancel-drain writes/removes a
presence-based marker .drain_request.json (HERMES_HOME-scoped, atomic write,
never-raises read; a corrupt marker reads as present-contentless → fail-safe
toward quiescing). This is Q-B option A.
2.2 — gateway state machine (gateway/run.py):
- _external_drain_active flag, DISTINCT from the shutdown _draining flag: this
one does NOT exit the process and is fully reversible.
- _enter_external_drain / _exit_external_drain: idempotent transitions that
flip gateway_state→draining / →running via _update_runtime_status (preserving
the live active_agents count). exit refuses to revert to running during a
real shutdown or after the loop stops (shutdown wins).
- _drain_control_watcher: 1s background task (modelled on _handoff_watcher)
reconciling accept-state with the marker; honours a marker that survived a
restart on its first tick. Registered alongside the other watchers in start.
- New-turn accept gate in _handle_message, placed BEFORE the session-slot
claim: when draining, refuse to START a new turn (so active_agents can only
fall → no TOCTOU race), while in-flight turns finish untouched. Internal/
system events (restart-recovery replays, bg-process completions) bypass it.
2.1 — endpoint (hermes_cli/web_server.py):
POST /api/gateway/drain {action: drain|cancel}. Authenticated by the Task-2.0a
token seam (the drain plugin registered this exact path as a token route);
attributes the request to the verified token principal. Begin writes the
marker, cancel removes it — the gateway process owns the actual transition.
Force-override (D6) is NOT here; it maps onto the existing immediate
/api/gateway/restart force path.
Tests (mocked — necessary-not-sufficient; the HARD live gate Q-B is next):
- tests/gateway/test_external_drain_control.py — marker contract (write/clear/
read/corrupt/atomic), state machine (enter/exit/idempotency/shutdown-wins/
loop-stopped), watcher reconcile-enter-then-exit, new-turn refusal, and
in-flight-not-interrupted. 15 tests.
- tests/hermes_cli/test_web_server.py — /api/gateway/drain begin/default-begin/
cancel/cancel-idempotent/bad-action-400. 6 tests.
- dashboard.drain_auth config section already added in 2.0b commit.
All touched suites green: 301 (gateway+auth) + 9 (web_server endpoints) passed.
Intentionally deferred:
- HARD live-validation gate (Q-B): real isolated `hermes gateway run`, drive a
real begin-drain marker, prove the 5-point checklist a–e.
- Spec-doc status flip + Phase-2 PR.
Build status: external-drain, restart-drain, status, dashboard-auth, drain-plugin,
token-auth, and web_server-endpoint suites green.
Task 2.0b: the concrete shared-bearer-secret auth provider, the FIRST consumer
of the generic token-auth capability (Task 2.0a). Implements decisions.md Q-A.
plugins/dashboard_auth/drain/ (bundled, discovered like dashboard_auth/basic):
- DrainSecretProvider: non-interactive provider, supports_token=True. Verifies
an inbound Authorization bearer token against a per-agent shared secret with
hmac.compare_digest (constant-time, no timing oracle) and, on a match,
vouches for the caller as the "drain-control" principal scoped to "drain".
The five interactive ABC methods raise NotImplementedError; verify_session
returns None (stacks harmlessly in the cookie-verify loop).
- assess_secret_strength(): fail-closed entropy gate. Rejects secrets shorter
than 43 url-safe-b64 chars (~256 bits), with < 16 distinct characters, or
below 128 bits Shannon entropy — so a weak/structured/repeated secret can
never be silently accepted. Enforced both at register() (friendly skip
reason) and in __init__ (raises — defence in depth).
- register(ctx): no-op + skip reason when HERMES_DASHBOARD_DRAIN_SECRET is
unset; rejects a weak secret fail-closed (drain endpoint stays gated). On a
strong secret, registers the provider AND opts /api/gateway/drain into the
generic token-auth seam via register_token_route().
Config: the secret is a CREDENTIAL → carried via HERMES_DASHBOARD_DRAIN_SECRET
(per-agent, provisioned by NAS at deploy). Behavioural knobs only
(dashboard.drain_auth.{scope,min_secret_chars}) live in config.yaml — added to
DEFAULT_CONFIG with the .env-is-for-secrets rationale documented inline.
Tests: tests/plugins/dashboard_auth/test_drain_provider.py — entropy gate
(strong pass; empty/short/repeated/few-distinct/custom-min reject), verify_token
(match → scoped principal, wrong/empty → None, custom scope), protocol
compliance, interactive-methods-raise, and register() (skip-no-secret,
fail-closed-weak-secret, strong-env-secret registers + route opt-in, config
scope + min_secret_chars). 21 new tests; drain + token-auth suites 44 passed.
Verified the plugin is discovered as dashboard_auth/drain alongside basic/nous.
Intentionally deferred:
- The begin/cancel-drain endpoint handler itself — Task 2.1.
- The dashboard→gateway control channel — Task 2.2.
Build status: dashboard-auth + drain-plugin suites green.
Task 2.0a of the safe-shutdown drain-coordination plan. Widens the dashboard
auth framework GENERICALLY to support non-interactive (service-to-service)
bearer-token auth, mirroring the existing supports_password precedent. This is
a reusable capability — any future machine-credential provider plugs in without
core changes (decisions.md Q-C). The drain bearer-secret plugin (Task 2.0b) is
the first consumer, not the definition.
- base.py: add TokenPrincipal dataclass (the token analog of Session) +
supports_token capability flag + verify_token() on the ABC (default raises
NotImplementedError so a misconfigured provider fails loud). Contract mirrors
verify_session stacking: return None for unrecognised tokens (never raise),
raise ProviderError only on a genuine backing-store outage.
- registry.py: list_token_providers() — the supports_token subset, in
registration order. Empty when none registered (token routes fail closed).
- token_auth.py (new): route-agnostic seam. Routes opt in via
register_token_route(exact path); token_auth_middleware owns the auth
decision for those routes only — authenticate via stacked providers, attach
request.state.token_principal + token_authenticated, pass through. 401 on
missing/unrecognised token, 503 when a provider was unreachable, untouched
passthrough for non-token routes. Fails closed (never open).
- web_server.py: install the seam OUTERMOST (registered last → runs first).
Both downstream gates (legacy auth_middleware + gated_auth_middleware) honour
request.state.token_authenticated and skip enforcement, so a token-authed
service request is never bounced to /login.
- audit.py: TOKEN_AUTH_SUCCESS / TOKEN_AUTH_FAILURE events.
Tests: tests/hermes_cli/test_dashboard_token_auth.py — ABC flag default,
verify_token NotImplementedError, registry filter, bearer extraction
(case-insensitive scheme, malformed/non-bearer → ""), provider stacking
(first-match-wins, unreachable-remembered, unreachable-then-valid, buggy
provider doesn't crash the gate), and the seam's passthrough/401/503/
fail-closed behaviour. 29 new tests; full dashboard-auth suite 169 passed.
Intentionally deferred:
- The concrete shared-bearer-secret provider plugin — Task 2.0b.
- The begin/cancel-drain endpoint that registers itself as a token route —
Task 2.1.
Build status: dashboard-auth + plugin-hook suites green.
The known_c2_framework threat pattern included 'praxis' in its
alternation alongside genuine offensive-security tool brands (Cobalt
Strike, Sliver, Havoc, Mythic, Metasploit, Brainworm). Unlike those
distinctive brand names, 'praxis' is a common English word (Greek for
practice/action) and a legitimate agent name, so any context file that
mentioned an agent named Praxis matched at 'context' scope and the whole
AGENTS.md / SOUL.md was replaced with a [BLOCKED] placeholder before it
reached the system prompt.
Remove 'praxis' from the alternation and add a guard comment: every
token in this list must be a distinctive tool brand, not a common word.
Real C2 brands still fire.
The new _maybe_flag_poisoned_client tests built a provider via
get_or_build_provider without an interactive stdin. Under the hermetic
test env (no TTY, no cached tokens), the non-interactive guard in
mcp_oauth_manager._make_provider raised OAuthNonInteractiveError before
the provider was built, failing 6 tests in CI parity (they passed
locally where stdin was a TTY).
Thread monkeypatch into _provider_with_token_endpoint and present an
interactive stdin, matching the sibling test_manager_builds_hermes_provider_subclass.
Fixes#36767.
Two complementary recoveries for the recurring "delete three cache files and
re-auth by hand" ritual when an MCP server's dynamically-registered OAuth
client goes dead server-side (IdP redeploy / DB wipe / rebrand):
- Auto-heal (token-endpoint subset): HermesMCPOAuthProvider now sniffs
auth-flow responses and, on a 400/401 `invalid_client` from the discovered
token endpoint, backs up + deletes `<server>.client.json` and `.meta.json`
and clears the in-memory client so the SDK re-runs RFC 7591 dynamic client
registration on the next flow. Conservative by construction: only
dynamically-registered (non config-supplied) clients, only the token
endpoint, only on a word-boundary `invalid_client` match (so RFC 7591's
`invalid_client_metadata` does not trip it); best-effort so a miss never
breaks the live flow. Covers both code-exchange and refresh when the token
endpoint was discovered. Tokens are preserved.
- `hermes mcp reauth [<name>|--all]`: the reporter's primary symptom — the
IdP's in-browser "Redirect URI Mismatch" — produces no HTTP signal (the SDK
only sees a callback timeout), so it cannot be auto-detected. The new
command re-auths one or ALL `auth: oauth` servers, serially: one browser
flow at a time, which also fixes the startup popup storm when several
servers are stale at once. Single-server reauth is factored out of
`mcp login` and shared.
Tests: +14 (poison helper x2; token-endpoint detection x5 incl. wrong-endpoint,
success-response, pre-registered, and invalid_client_metadata negative guards;
a bridge integration test driving the real async_auth_flow generator to prove
the detection hook preserves the bidirectional asend() forwarding contract;
reauth CLI x6). Verified against the pinned mcp==1.26.0: scripts/run_tests.sh
122/122 green for the touched suites; check-windows-footguns.py and ruff clean.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cut over the agent half of Shape A (D-Q1.5a/b.1/c) to front a SET of platforms on
one relay WS:
- relay_platform_identities() parses GATEWAY_RELAY_PLATFORMS (list) +
GATEWAY_RELAY_BOT_IDS (JSON keyed map {platform:{botId,username?}}). Cut over
from the scalar GATEWAY_RELAY_PLATFORM/_BOT_ID (no fallback, D-Q1.5c).
- self_provision_relay() loops one /relay/provision per platform under one
gatewayId+secret, partial-failure-tolerant.
- WebSocketRelayTransport takes the identity SET, sends one hello per identity
(connector accumulates the advertised set), and stamps the per-frame
OutboundFrame.platform + its matching advertised botId on outbound.
- RelayAdapter remembers each chat's underlying source.platform (mirroring the
existing guild/dm scope capture) and tags the reply's egress platform.
- send_relay_policy() declares one relevance policy per fronted platform (the
connector keys policy by (tenant,platform,instanceId)).
Single-platform deploys are byte-identical on the wire (1-element list, no per-frame
tag -> connector session-default fallback). typecheck/ruff clean; relay unit 221 pass
(+10 new); all 15 cross-repo E2E drivers green vs connector origin/main.
The gateway reconnect watcher (gateway/run.py) recovers a platform after a
fatal adapter error by building a fresh adapter and calling
connect(is_reconnect=True). Every BasePlatformAdapter implements
connect(*, is_reconnect: bool = False) for this — except RelayAdapter, whose
connect() was bare. So the watcher's recovery path raised:
TypeError: connect() got an unexpected keyword argument 'is_reconnect'
Observed live on a hosted staging agent: after a fatal relay adapter error the
watcher could never re-establish relay, so the shared-bot inbound never reached
the gateway and Discord DMs stopped (dashboard surfaced the TypeError).
Relay deliberately ignores the flag: the #46621 server-side-queue-preservation
concern doesn't apply, because relay's outage buffer is the connector's durable
buffer (replayed on the transport's re-handshake), not a gateway-side queue the
adapter owns. Routine WS drops are already handled by the transport's own
reconnect supervisor (WebSocketRelayTransport, reconnect=True); the watcher path
is fatal-error recovery, and the fatal handler disconnect()s the old adapter
(cancelling its supervisor) before a fresh adapter+transport is built, so there
is no double-dial.
Adds two regression tests (both proven red without the fix): connect(is_reconnect=True)
reaches the same transport-less RuntimeError instead of TypeError, and the
signature matches BasePlatformAdapter.connect.
Bring apps/desktop and ui-tui to a clean state for typecheck, eslint,
and prettier:
- Run prettier across both trees (printWidth/wrap drift; prettier is not
CI-enforced for these JS projects, so main had accumulated drift).
- Apply eslint --fix for padding-line-between-statements and perfectionist
import/export sorting.
- Manual fixes for non-auto-fixable rules:
- remove unused node:net import in electron/main.cjs (uses Electron net)
- replace inline `typeof import(...)` annotations with top-level
`import type * as EnvModule` in two ui-tui test files
- scoped eslint-disable no-control-regex on intentional sentinel/ANSI
regexes (mathUnicode.ts, text.ts)
- resolve react-hooks/exhaustive-deps per-case: correct swapped/missing
deps, collapse redundant session.* members, and justified disables on
settings mount-only data-load effects to preserve run-once behavior
No behavior changes; test pass/fail counts are unchanged from the main
baseline.
When operator config has provider=anthropic with model.base_url pointing
at a non-Anthropic host (e.g. https://openrouter.ai/api/v1 with provider=anthropic),
the auxiliary Anthropic path was unconditionally applying that override.
Main-session traffic routed correctly because the main path attaches the
right credential for the actual destination, but every side-channel call
(memory extractors, reflection, vision, title generation, janus
extractor/promise) sent ANTHROPIC_API_KEY to the foreign host and 401'd.
Gate the override on hostname == api.anthropic.com. Operators routing main
through a non-Anthropic provider must use that provider's own auxiliary
client; the Anthropic aux path now stays pointed at api.anthropic.com.
Regression tests cover openrouter, openai, anthropic-with-path, empty, and
anthropic-default-base_url cases.
Alt+wheel now scales about the pixel under the pointer instead of growing from a
corner, so the pet stays put under the cursor instead of running away. In-window
shifts its top-left; the overlay repositions its OS window (cursor-anchored on
wheel, bottom-center for slider-driven changes).
Hold Alt/Option and scroll over the mascot to resize it (same on Mac and
Windows); the modifier keeps a plain scroll passing through to the page. The
gesture drives the same `display.pet.scale` path as the settings slider.
The popped-out overlay grows its OS window to fit the pet at any scale (anchored
bottom-center) so the sprite is never clipped by the window edge, and the
in-window pet re-clamps against its actual size so growing near an edge can't
crop it. Also makes the overlay click-through per-pixel: only solid sprite
pixels (plus bubble / mail button) are interactive, transparent margins pass
clicks through.
Ensure TUI/desktop stop targets the actual conversation thread and cancels any queued next prompt, including the lazy agent-start window, so a stopped session cannot keep running or restart itself.
Wire get_reasoning_stale_timeout_floor() into both stale detectors so known
reasoning models (Nemotron 3 Ultra, OpenAI o1/o3, Opus 4.x thinking, DeepSeek
R1, Qwen QwQ, Grok reasoning) tolerate multi-minute thinking phases instead of
the upstream gateway idle-killing the socket (BrokenPipeError) before first
token. Applied as max(default, floor) — never overrides explicit user config,
never lowers an existing threshold.
The reasoning_timeouts.py allowlist module already landed on main via #52795,
so this salvage carries only the wiring + tests (the duplicate module and the
stale-base MoA reverts from the original PR branch are dropped).
Salvaged from #52238. Fixes#52217.
The salvaged #51875 added a background-review write guard in skill_manage
that refuses mutations to skills.external_dirs skills — but it only fires
when is_background_review() is true. The curator's LLM review fork ran with
the default _memory_write_origin='assistant_tool', so the guard never
triggered during the exact curation pass it exists to protect against
(GH-47688).
- Set _memory_write_origin='background_review' on the curator review fork so
turn_context binds it onto the write-origin ContextVar and the guard fires.
- Add a regression test asserting the fork runs under the background_review
origin (the invariant linking the fork to the guard).
- AUTHOR_MAP: map yu-xin-c for the salvaged commit.
Force redact_sensitive_text(force=True) on the browser_type text arg so
recognized credentials (API keys, tokens, JWTs) are masked in tool
progress, previews, callbacks, and return payloads even when the global
security.redact_secrets opt-out is set — a typed credential reaching chat
history is a security boundary, not log hygiene. Normal typed text matches
no pattern and stays fully readable for debuggability.
Tests assert the API-key-shaped secret is masked across every surface and
that normal text passes through unchanged.
Stopping a turn while the model is streaming (stop/esc to redirect) raised
InterruptedError, set final_response to the throwaway "waiting for model
response" sentinel, and persisted messages WITHOUT the assistant text that
was already streamed to the screen. The next turn then had no record of the
half-finished reply, so the model appeared to "forget" what it just said.
Recover the on-screen text from _current_streamed_assistant_text in the
InterruptedError branch and append it as the assistant turn (and surface it
as final_response). The metadata sentinel is kept only when nothing was
streamed yet, preserving the ACP/client suppression behavior.
Completes the partial-stream recovery from 397eae5d9 (which wired the same
_current_streamed_assistant_text salvage into the connection-failure twin
but missed the user-interrupt path). The lossy handler dates to c98ee9852.
Live-measure WCO width in the renderer, drop the right rail below the titlebar
band, and re-enable GPU compositing under WSLg when /dev/dxg is present.
WSLg bridges clipboard text but not images — pull host screenshots via
PowerShell. Disable titleBarOverlay on plain Linux; gate overlay width per
platform in titlebar-overlay-width.cjs.
* feat(kanban): typed block reasons + unblock-loop breaker
Stops the kanban blocked-task loop: a worker blocks a task, a cron
unblocks it, the worker re-blocks for the same reason, repeat forever.
block_task now takes a typed kind and a persistent block_recurrences
counter on the tasks table:
- kind=dependency routes to todo (parent-gated, auto-resumed), never
the human 'blocked' bucket a cron would keep unblocking.
- needs_input/capability/transient/untyped land in blocked; each
same-cause re-block after an unblock increments block_recurrences,
and at BLOCK_RECURRENCE_LIMIT (default 2) the task routes to triage
for a human instead of blocked.
- unblock_task no longer resets block_recurrences (the amnesia that
let the loop run unbounded); complete_task clears it on success.
Wired through the worker kanban_block tool (new kind arg) and the
hermes kanban block --kind CLI flag, both reporting where the task
actually landed. Docs + 11 new tests; 536 existing kanban tests green.
* test(kanban): make second-block notify test use a distinct block cause
test_notifier_second_blocked_delivers blocked the same task twice with
the same (untyped) reason, which now trips the new unblock-loop breaker
and routes the second block to triage instead of blocked — so only one
'blocked' notification fired. The test's actual intent is that TWO
distinct block cycles each notify; give the two cycles different kinds
(needs_input then capability) so they're genuinely separate blocks. The
same-cause loop→triage path is covered by test_kanban_block_kinds.py.
After a prolonged outage the in-process network-error ladder escalates to
fatal and GatewayRunner._platform_reconnect_watcher rebuilds a fresh adapter
that reconnects through the bootstrap path. That path called
start_polling(drop_pending_updates=True), discarding every update Telegram
queued during the outage — all messages sent while the bot was down were
silently lost. The in-process ladder and 409-conflict handler already passed
drop_pending_updates=False; only bootstrap did not distinguish a cold first
boot from a reconnect.
Thread an is_reconnect signal from the watcher through
_connect_adapter_with_timeout into adapter.connect(). The base
BasePlatformAdapter.connect() gains a keyword-only is_reconnect=False so every
adapter inherits a tolerant signature (no per-platform breakage when the
runner forwards the kwarg). Telegram translates is_reconnect into
drop_pending_updates=not is_reconnect on both the polling and webhook bootstrap
calls. Cold boot still drops the stale queue; a watcher reconnect preserves it.
Fixes#46621.
Co-authored-by: annguyenNous <annguyen@nousresearch.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Co-authored-by: Kewe63 <Kewe63@users.noreply.github.com>
When the primary provider raises AuthError (e.g. expired OAuth token),
_make_agent now walks the configured fallback_providers/fallback_model
chain before giving up — matching the behavior that cron/scheduler.py
and cli_agent_setup_mixin.py already have.
Fixes#47627
A readable state.db can still reject every message write through the
messages_fts* triggers when the FTS5 index is corrupt: base-table reads and
PRAGMA integrity_check pass, but INSERT INTO messages fails with 'database
disk image is malformed'. The gateway reloads conversation_history from disk
each turn, so a silently-failed write hands the next turn stale/empty history
even though the same cached AIAgent still holds the live transcript — causing
immediate same-session amnesia. (#50502)
- hermes_state.py: _db_opens_cleanly() now drives a rolled-back message write
through the FTS triggers, so write-only corruption (which the read-only
probe reported healthy) is detected. repair_state_db_schema() gains an
in-place FTS5 'rebuild' strategy (tier 0) before the dedup/drop tiers, plus
an already_healthy short-circuit. Both 'hermes sessions repair' and
'hermes doctor' route through these, so the fix covers the whole class.
- hermes_cli/doctor.py: the state.db check runs the write-health probe even on
the success (readable) path and repairs in place with --fix.
- gateway/run.py: _select_cached_agent_history() prefers the cached agent's
longer live _session_messages over a shorter persisted transcript, so an
FTS write failure can't wipe in-session context.
- tests: regressions for write-health detection, in-place repair preserving
rows + resuming writes, the already_healthy shortcut, and the gateway guard.
Combines the approaches from #50504 (@0-CYBERDYNE-SYSTEMS-0, issue author),
#52165 (@davidgut1982), and #50576 (@trevorgordon981).
The email adapter authorized senders entirely off the From: header, which is
attacker-controlled and unauthenticated by IMAP. An attacker could forge
From: an-allowlisted-address and pass both the adapter's EMAIL_ALLOWED_USERS
pre-filter and the gateway's allowlist authz (both key on the same spoofable
sender_addr), getting unauthorized commands executed by the agent.
Verify the From: domain against the trusted Authentication-Results header the
receiving mail server stamps (SPF/DKIM/DMARC) before trusting it for
authorization. Enforced only when an allowlist is in effect and allow-all is
off — fail-closed. Operators whose server does not stamp the header can opt
out via platforms.email.require_authenticated_sender: false (or
EMAIL_TRUST_FROM_HEADER=true).
The scale-to-zero idle watcher never started on a correctly-opted-in,
relay-only instance, so the gateway never ran its idle decision, never called
go_dormant(), and never sent going_idle to the connector. Fly's autostop still
suspended the machine on traffic-idle, but the connector never flipped the
instance to buffered-only — so an inbound DM took the live delivery path,
found no live session for the suspended machine, and was dropped fail-closed
with no wake poke. The machine slept and never woke.
Root cause: _scale_to_zero_should_arm() passed list(config.platforms.keys())
to messaging_is_relay_only_or_absent(). config.platforms is pre-seeded with a
DISABLED placeholder PlatformConfig for every known platform (telegram,
discord, slack, matrix, …), so the key set is always the full ~20-entry
catalog regardless of what the instance actually runs. The relay-only check
discarded "relay", saw the disabled placeholders as live direct-socket
platforms, and returned False — so should_arm() was False and the watcher was
never created. Verified live on a staging instance: config.platforms keys =
[telegram, discord, slack, mattermost, matrix, relay] with only relay
enabled=True; should_arm() = False.
Fix: filter config.platforms to ENABLED entries before the relay-only check,
mirroring the adapter-connect loop which already gates on
`if not platform_config.enabled: continue`. This arms off the same notion of
"active platform" the rest of start() already uses — no parallel concept.
Also add a one-line not-armed diagnostic: when an instance IS opted in (the
HERMES_SCALE_TO_ZERO stamp is set) but the watcher still doesn't arm, log why
(relay_only_or_absent, the enabled platforms, wake_url present/missing). A
non-opted instance stays silent. The arm path previously logged only on
success, so a failed arm was invisible.
Tests: the existing pure-helper tests passed bare names so they never
exercised the call site that feeds the placeholder-laden config. Add
behaviour-contract tests against the REAL _scale_to_zero_should_arm with a
realistic config.platforms (relay enabled + others disabled). The F25
regression test (relay-only + disabled placeholders must arm) and the
no-platform case are RED without this fix, GREEN with it; the
genuinely-enabled-direct-platform / not-opted-in / no-wake-url cases stay
correctly non-arming so the filter can't over-broaden.
Wake mechanism itself verified healthy independently (direct wakeUrl GET
resumed a suspended staging instance in 1.15s, clean resume signature).
CredentialPool._sync_device_code_entry_to_auth_store rotated single-use
OAuth refresh tokens but wrote the new chain only into the active profile
store. When a profile resolves a grant from the global-root fallback
(read_credential_pool, #18594) and the pool then refreshes it, root was
left holding a now-revoked refresh token — every other profile reading the
stale root grant subsequently died with refresh_token_reused / invalid_grant
once its access token expired.
This is the credential-pool analog of #43589 (which fixed the non-pool xAI
refresh path in _save_xai_oauth_tokens). Detect the read-from-root case
(profile lacks its own providers.<id> block) BEFORE the profile save and,
after it, write the rotated chain back to the global root via a best-effort,
seat-belted write-through. A profile that genuinely shadows root (owns the
block) is untouched; classic mode (profile == root) is a no-op; a failed root
write never breaks the profile's own save. Covers openai-codex (reported),
xai-oauth, and nous through the shared sync path.
Prevents stage2-hook.sh recursive chown from following a symlinked $HERMES_HOME/home (or profiles/cron) and destroying the host user's home directory. Also guards top-level state-file chowns and refuses first-boot seeding through symlinks. Fixes#52781.
Co-authored-by: harjoth <harjoth.khara@gmail.com>
Two-part fix:
Part 1 (classifier override at agent/error_classifier.py:720-738):
A transport disconnect on a reasoning model — even on a large session —
now routes to FailoverReason.timeout instead of context_overflow. Without
this, large-session reasoning-model disconnects route to the compression
branch and silently delete conversation history on a phantom
context-length error. The override is strictly targeted: non-reasoning
models (gpt-4o, claude-3-5-sonnet, llama-3.3-70b, etc.) still route to
context_overflow on large sessions — the existing intentional behavior
for chat models whose proxy doesn't idle-kill during prefill/generation.
Part 2 (new agent/thinking_timeout_guidance.py + integration at
agent/conversation_loop.py:3488-3567):
New is_thinking_timeout() and build_thinking_timeout_guidance() helpers.
When a known reasoning model (NVIDIA Nemotron 3 Ultra, OpenAI o1/o3,
Anthropic Opus 4.x thinking, DeepSeek R1, Qwen QwQ, xAI Grok reasoning)
hits a transport-kill on a small session (classifier says timeout
directly) or after Part 1 routes correctly (large session), the user
now sees reasoning-specific guidance with three actionable workarounds
in priority order:
1. Set providers.<provider>.models.<model>.stale_timeout_seconds: 900
in ~/.hermes/config.yaml (Hermes's built-in floor is already 600s
for known reasoning models; raise further if upstream is even
tighter).
2. Lower reasoning_budget or set reasoning_effort: medium on this
model if the provider supports it.
3. Use a smaller / faster reasoning model if the task doesn't
require deep thinking.
The new guidance takes precedence via if/elif over the existing
_is_stream_drop block, so a reasoning-model user with a transport-kill
message sees actionable advice instead of the misleading "try
execute_code with Python's open() for large files" advice (which is
correct for the unrelated large-file-write stream-drop case but
actively wrong for the thinking-timeout case).
Verified:
- 478 tests passing across 9 directly-relevant files (49 new + 429
existing, zero regressions).
- Ruff lint clean on all 4 modified/new files.
- Negative test: 6 parametrized regression guards confirm non-reasoning
models still route to context_overflow on large sessions; 4
parametrized gates confirm non-timeout classifier reasons never
trigger the guidance; 5 parametrized cases confirm non-transport
messages never trigger it.
- Regression guard: new guidance message does NOT contain
"execute_code" or "open()" — the misleading advice is fully
replaced, not appended alongside.
- Cross-vendor dual review via agy -p:
- Gemini 3.5 Flash (Medium) — passed: true, zero blockers, one
SHOULD-FIX (vprint block duplication — fixed by extracting
detection into a helper module).
- GPT-OSS 120B (Medium) — passed: true, zero blockers, two nits
(test placement — adopted at tests/agent/test_thinking_timeout_guidance.py;
primary-model capture — accepted as non-issue per Flash's nit).
Dependency note for maintainers:
This PR includes agent/reasoning_timeouts.py (the reasoning-model
allowlist module from PR #52238) because the Layer 1 override is
load-bearing on get_reasoning_stale_timeout_floor(). After PR #52238
lands on main, this PR's duplicate agent/reasoning_timeouts.py should
be rebased away. Either PR can land first; the other rebase is
mechanical.
Fixes#52271.
The #45966 cross-process coherence guard popped the stale cached agent
and then called the blocking _cleanup_agent_resources (memory-provider
shutdown, tool-resource teardown, async-client teardown) while still
holding _agent_cache_lock, on the gateway event-loop thread. While that
ran, _sweep_idle_cached_agents (driven by _session_expiry_watcher)
blocked acquiring the same lock and the asyncio loop stalled for minutes,
tripping repeated Discord 'heartbeat blocked' warnings.
Fix mirrors the cap-enforcer / idle-sweep paths: pop the stale entry
under the lock, release it, then schedule the SOFT release on a daemon
thread. The soft path (_release_evicted_agent_soft) is also more correct
here than the hard teardown the regression used — the same session
rebuilds a fresh agent immediately after invalidation, so its terminal
sandbox / browser / bg processes (keyed on task_id) must be preserved
for the rebuilt agent to inherit, not torn down.
Verified the cross-process site was the only cleanup-under-lock instance;
the other _cleanup_agent_resources call sites run outside the lock.
CI shard test_telegram_conflict.py timed out (140s) because the new
_polling_heartbeat_loop, started by connect(), busy-spun under those
tests: they monkeypatch asyncio.sleep to instant and pass a bot double
with no get_me(), so the probe raised AttributeError (swallowed) and the
loop re-entered immediately with no real pacing, starving the event loop.
Guard the loop to return when bot.get_me is not callable — a real PTB Bot
always exposes it, so this only triggers on a torn-down app or a test
double, where there is nothing to probe. Also cancel the heartbeat task in
the conflict tests that call connect() without disconnect(), matching the
production disconnect() teardown.
Verified: test_telegram_conflict.py now runs in ~4.5s; the 22
heartbeat/reconnect tests still pass; E2E confirms a hanging get_me still
fires the reconnect ladder while a missing get_me exits without spinning.
When a Telegram long-poll TCP socket enters CLOSE-WAIT (remote sent FIN
but httpx hasn't noticed), epoll still reports it readable so no
exception is raised. PTB's error_callback never fires, the reconnect
ladder never engages, and the gateway silently stops receiving messages
while the process stays alive — until a manual systemctl restart.
The existing recovery only covers two cases: error_callback-driven
reconnects (which require an exception PTB never gets) and a one-shot
_verify_polling_after_reconnect probe (which runs only right after an
explicit reconnect). A socket that wedges during steady-state operation
is never detected.
Add _polling_heartbeat_loop: a background asyncio.Task started in
connect() (polling mode only) that probes get_me() every 90s on the
general request pool (not the getUpdates pool, so healthy long-polls are
never interrupted). On asyncio.TimeoutError/OSError it hands off to the
existing _handle_polling_network_error ladder; other errors are
swallowed. disconnect() cancels and awaits the task. Worst-case
detection window ~105s.
Complementary to #51541 (general-pool keepalive limits / fd leak) — that
recycles idle pooled connections; this detects a wedged active read.
Fixes#48495
Co-authored-by: agt-user <267614622+agt-user@users.noreply.github.com>
The desktop scheduler can overwrite cron/jobs.json with its own small
set of internally-tracked crons after an update/restart, causing
partial loss of tool-created cron jobs. The previous guard only
checked for total loss (live_count == 0), missing the case where
live_count > 0 but less than the pre-update snapshot count.
Compare live_count against snap_count instead of checking for zero,
so both total loss (0 vs N) and partial loss (1 vs 19) trigger
restoration.
Salvaged from #52161 by @liuhao1024.
Closes#52144
A top-level delegate_task dispatches in the background and re-enters as a
fresh turn when done. Print a one-line dispatch-time note — no spinner,
nothing to poll — so the idle prompt doesn't read as "nothing happened."
When idle with a background subagent still in flight, append a tail status
segment spelling out that the agent resumes on its own. Width-budgeted like
every tail segment, so it drops first on a tight terminal where the ⛓ count
already carries the signal.
When idle with a top-level delegate_task still in flight, render a static,
shimmering system-note at the transcript tail instead of a spinner (which
reads as "stuck"). Reuses the shared steer / slash-status chrome (centered,
0.6875rem, muted, Codicon) so it sits in the thread like every other meta
line, and mirrors the primary child's latest stream line, falling back to
generic copy. i18n across en/ja/zh/zh-hant; markdown prose/heading rhythm
tuned so a re-entered turn breathes.
Track top-level delegate_task work that dispatches in the background and
re-enters as a fresh turn. $backgroundResume returns {count, activity} for
the active session while idle — count of parked tasks plus the primary
child's latest stream line (tool/progress/thinking) when readable.
Use the app's amber warn color for the unsaved-edits tab dot (was inheriting
the label text color) and add a tab-bg ring + soft drop shadow so it stays
legible where it overlaps the filename.
Extends the pane store with heightOverride (alongside widthOverride) and a
get/set/clear API, and wires the pane shell + desktop controller so the
bottom-row terminal pane can be resized on the Y axis with its size persisted.
Adds a CodeMirror 6 spot editor to the right-rail file preview so users can
make quick edits in-app without leaving for an IDE. Entering edit mode is a
pure in-place swap of the read view — same fixed-height header, same gutter
geometry/typography (mirrors SourceView 1:1) so nothing shifts — toggled via
the Edit button, a bare `e` when the pane is hovered/focused, or the tab.
- Save path is transport-agnostic (writeDesktopFileText): local Electron IPC
or a new hardened POST /api/fs/write-text on the dashboard server (path
validation, parent-must-exist, regular-files-only, size cap, atomic
temp-file + os.replace), behind the existing auth middleware.
- Stale-on-disk guard re-reads before writing and offers overwrite vs
discard-and-reload instead of clobbering external/agent edits.
- VS Code-style modified dot on the tab; ⌘/Ctrl+S and ⌘/Ctrl+Enter save,
Esc cancels; GitHub highlight style matched to the read view's Shiki theme.
- Typing stays render-free (draft in a ref; dirty flips once at the boundary).
The detector folds absolute home / Hermes-home prefixes into their canonical
~/ and ~/.hermes/ forms so static patterns catch /home/alice/.bashrc the same
way they catch ~/.bashrc (abd69b81). On native Windows this fold never fired,
so terminal commands writing to shell startup files, ~/.ssh/authorized_keys,
or ~/.hermes/config.yaml / .env returned "safe" and skipped the approval
prompt — and config.yaml carries the approval policy itself.
Two compounding causes:
1. The fold ran after the backslash-escape strip (r\m -> rm), which dissolves
the backslash separators in a Windows path (C:\Users\alice\.bashrc ->
C:Usersalice...) before the fold could match. It now runs before the strip.
2. The fold only recognized POSIX absolute paths and only the home prefix,
leaving multi-segment backslash suffixes (\.ssh\authorized_keys) to be
mangled by the strip.
Consolidated into _home_prefix_fold_regex / _fold_home_prefixes: match a home
prefix with either separator, capture the rest of the path token, and
normalize its separators to / so multi-segment patterns match. The
degenerate-path guard generalizes count("/") >= 2 to "at least two components
below the root" (also rejecting a bare drive root C:\). HOME is consulted
directly because Windows' expanduser ignores it; the more specific Hermes home
is folded first, longest candidate first, so neither fold clobbers the other.
POSIX behavior unchanged; the r\m -> rm anti-obfuscation strip still runs.
Adds TestWindowsAbsolutePathFolding, which monkeypatches a Windows-style
HOME/HERMES_HOME so the behavior is also exercised on the CI runner.
Follow-up to the salvage of #45035 + #48682. The two PRs touched different
functions (resolve_resume_session_id vs get_compression_tip) but #45035's
descendant walk followed ANY parent_session_id child, so a delegate/subagent
child could hijack the resume target. Apply the same _branched_from /
_delegate_from / source!='tool' exclusion the rest of hermes_state.py uses,
so the resume walk only follows genuine compression continuations.
Also updates the unrealistic delegation test fixture to carry the real
_delegate_from marker, and updates 3 list_sessions_rich test mocks for the
order_by_last_active kwarg #48682 added.
AUTHOR_MAP: map PINKIIILQWQ + ailang323 salvage authors.
After context compression, the parent session holds pre-compression messages
and a child (or deeper descendant) holds the continuation.
resolve_resume_session_id() short-circuited when the input session already
had messages (row is not None -> return session_id), causing REST API
endpoints, gateway resume, and CLI resume to serve stale parent messages.
Remove the early-return. Walk the full descendant chain, record the
deepest node that has messages (best), and return best if not None
else the original session_id (preserving the empty-chain fallback).
Callers (api_server.py, web_server.py, cli_agent_setup_mixin.py,
cli_commands_mixin.py) all use the resolved != input -> redirect pattern
and are transparent to this change.
The pre-update HERMES_HOME zip shipped on by default (DEFAULT_CONFIG +
runtime fallback both True), so every `hermes update` zipped the entire
~/.hermes — sessions DB, caches, skills — adding minutes to each update.
The shipped cli-config.yaml.example, the --backup help, and the example
config all already said "off by default," so the live default
contradicted its own documentation.
Flip the default to off everywhere: DEFAULT_CONFIG, the runtime
`.get(..., False)` fallback in _run_pre_update_backup, and the stale
--backup help string. Users who want the #48200 safety net opt in via
updates.pre_update_backup: true or --backup for a single run.
Updated test_default_enabled_creates_backup -> test_default_disabled_is_silent
to assert the new default (silent no-op, no zip).
* fix(cron): add default retention to per-run job output to bound disk usage (#52383)
Per-run cron output (cron/output/<job>/<timestamp>.md) is written once
per execution and was never pruned, so a frequently-scheduled job on
a long-running deploy accumulates one file per run indefinitely and
can fill the volume ('no space left on device').
save_job_output() now keeps the most recent N output files per job and
removes older ones. N defaults to 50 and is configurable via
cron.output_retention; a non-positive value disables pruning for
operators who manage cleanup externally.
Salvaged from #52402 by @0xDevNinja.
Closes#52383
* fix(config): add cron.output_retention to DEFAULT_CONFIG
Follow-up to #52383: the retention config key was functional via
get()-with-default but missing from DEFAULT_CONFIG, so the deep-merge
wouldn't auto-populate it for new installs. Add it explicitly.
---------
Co-authored-by: 0xDevNinja <manmit0x@gmail.com>
Regression for the salvaged #48254 fix: billing route is first-writer-wins
via update_token_counts (COALESCE), so a mid-session provider switch left
the dashboard attributing cost to the original provider. Asserts the new
update_session_billing_route() overwrites unconditionally, nulls system_prompt
so the next turn rebuilds Model:/Provider:, and preserves billing_mode when
omitted (COALESCE on None).
The session database records billing_provider and billing_base_url using
COALESCE(column, ?) in update_token_counts(), making them write-once.
When a user switches models mid-session via /model, the runtime (agent.provider,
agent.base_url) updates correctly, but the session row never reflects the new
provider. This causes the dashboard Models page to display a stale provider
badge and misattributes token usage / cost analytics.
Fix: add update_session_billing_route() that unconditionally sets
billing_provider, billing_base_url, and billing_mode (no COALESCE), and call
it from switch_model() in agent_runtime_helpers.py after the swap succeeds.
This follows the same pattern as update_session_model() which already
unconditionally updates the model column (added for the identical COALESCE
problem on the model field).
Closes#48248
A stale-index render race in assistant-ui (a just-shrunk thread rendered
at an old message index during a session switch / teardown) throws
errors like "tapClientLookup: Index N out of bounds", "Cannot read
properties of undefined (reading 'type')", or "Tried to unmount a fiber
that is already unmounted". These bubble to the root ErrorBoundary and
latch the WHOLE desktop app on the "Reload window" fallback even though
the next render against fresh state would be fine.
Teach the root boundary to treat that small set of known-transient
renderer errors as recoverable: log them and schedule a next-tick
reset() so React re-renders against current state instead of stranding
the user on the fallback.
Auto-recovery is BOUNDED -- at most MAX_RECOVERIES (3) attempts within a
5s window -- so a genuinely persistent error can't spin the boundary in
a reset -> throw -> reset loop; after the budget is spent the fallback
is left up for the user. Manual retry (the button) resets the budget.
Only the root boundary auto-recovers; scoped boundaries keep their own
fallbacks, and unrecognized errors are never swallowed.
Tests: transient race recovers (fallback never sticks), a persistent
recoverable error stops at the cap and surfaces the fallback (proving
the loop is bounded), and neither a non-root boundary nor an
unrecognized root error auto-recovers.
Closes#41693. Supersedes #41787 by @izumi0uu, reimplemented with a
bounded recovery budget so a non-transient error can't loop forever.
Co-authored-by: izumi0uu <izumi0uu@gmail.com>
resumeSession's warm-cache fast-path trusted the
storedSessionId -> runtimeId -> ClientSessionState mapping without
checking the cached state still BELONGS to the session being resumed.
A pooled profile backend that gets idle-reaped and respawned
(pruneSecondaryGateways) re-mints runtime ids, so a recycled id can
resolve to a live-but-DIFFERENT session's cache entry. The only
existing guard was a session.usage 404 -- that catches a fully-dead
runtime id, but a recycled id still 200s, so the fast-path happily
painted the wrong transcript under the current route (open chat A,
chat B loads).
Fold the belongs-to check into a single takeWarmCache() helper used at
BOTH cache reads -- the early transcript-keep decision and the fast-path
itself -- so a cross-wired entry can't even briefly flash a stale
transcript before the full resume repaints. On a mismatch the helper
purges both stale map entries and reports a miss, falling through to a
full resume that rebinds a correct runtime id. The full-resume path
already guards its final paint with isCurrentResume(), so only the
cached fast-path was missing the belongs-to check.
Pre-existing bug from the initial desktop app (#20059); not introduced
by the session-switch perf work (#49807), which left these lines
untouched.
Tests: two cases in use-session-actions.test.tsx driven through a
harness that owns the two cache maps -- a cross-wired mapping is
rejected + purged (the bug), and a correctly-wired cache still serves
from memory with no needless refetch (no perf regression).
Supersedes #50464 by @professorpalmer, reimplemented to also guard the
early transcript-keep read (whole-class fix, not just the fast-path).
Co-authored-by: professorpalmer <professorpalmer@users.noreply.github.com>
* feat(moa): expose MoA presets as selectable virtual models
Reconstructed onto current main (PR #46081's base had diverged with no common
ancestor, marking the PR dirty so CI never dispatched). MoA is now a virtual
provider: each named preset is a selectable model under provider 'moa', and the
preset's aggregator is the acting model that answers and calls tools.
Reference models fan out in parallel via a bounded ThreadPoolExecutor (the same
batch pattern delegate_task uses) — all references dispatched at once, collected
when every one finishes, then handed to the aggregator. Output order is
preserved, failures and the MoA-recursion guard stay isolated per reference.
- Removed the old mixture_of_agents model tool and moa toolset.
- Added moa as a virtual provider in the provider/model inventory.
- /moa is shortcut behavior over model selection (default preset / named preset
/ one-shot prompt).
- Dashboard + Desktop manage named presets; presets appear in model pickers.
- Parallel reference fan-out in agent/moa_loop.py with regression test.
* fix(moa): thread moa_config through _run_agent to _run_agent_inner
The reconstructed gateway MoA wiring declared moa_config on _run_agent (the
profile-scoping wrapper) and used it inside _run_agent_inner, but the wrapper
never forwarded it — _run_agent_inner had no such parameter, so the runtime hit
NameError: name 'moa_config' is not defined on the compression-failure session
sync path. Add moa_config to _run_agent_inner's signature and forward it from
both wrapper call sites (multiplex and non-multiplex). Caught by
tests/gateway/test_compression_failure_session_sync.py on CI shard test(4).
* fix(moa): classify moa as a virtual provider in the catalog
The moa virtual provider has no PROVIDER_REGISTRY/ProviderProfile entry, so
provider_catalog() fell through to the default auth_type="api_key" with no
env vars — tripping two catalog invariants:
- test_provider_catalog: api_key providers must expose a credential env var
- test_provider_parity: every hermes-model provider must be desktop-configurable
moa already declares auth_type="virtual" in HERMES_OVERLAYS; consult that
overlay as an auth_type fallback so the catalog reports moa as virtual (no real
credential, no network endpoint). Exempt virtual providers from the desktop
parity union check the same way 'custom' is exempt — derived from the catalog,
not a hardcoded slug, so future virtual providers are covered too.
Scheduled jobs delivering to Telegram/etc. started posting a literal
'⚠️ No reply: the model returned empty content…' message instead of
staying silent. Two interacting causes:
1. The turn-completion explainer (#34452) replaces an empty model turn
with a user-facing '⚠️ No reply…' string. In a cron context that is
not a silence marker, so the scheduler delivered it — a regression
from the previously-silent empty turn. run_job now detects the
explainer text deterministically (via the same formatter that
produced it) for abnormal-empty turn_exit_reasons and strips it to
empty, so the existing empty-response suppression + soft-fail guard
apply. The explainer is unchanged on CLI/gateway.
2. The cron suppression used a loose 'SILENT_MARKER in ...upper()'
substring check. It leaked bracketless near-markers the model emits
('SILENT', 'NO_REPLY', 'NO REPLY' — #51438, #46917) and wrongly
swallowed a real report that merely quoted '[SILENT]' mid-sentence.
Replaced with _is_cron_silence_response(): suppresses a canonical
token as the whole response, its own first/last line, or the
documented bracketed '[SILENT] <note>' prefix — while a token buried
mid-sentence in a genuine report is delivered. Preserves the
intentional cron trailing/prefix tolerance (existing tests unchanged).
Tests: bracketless-variant suppression, mid-sentence-quote delivery,
direct matcher contract, and explainer-strip + defensive real-report
delivery.
The trailing-whitespace expansion in _map_normalized_positions
unconditionally consumed whitespace after the matched region — including
the word-boundary space that separates the match from the next token.
This caused silent file corruption when the fuzzy matcher fell back to
the whitespace_normalized strategy.
Guard the expansion on the normalized match actually ending with
whitespace (i.e. the original had a run of spaces that were collapsed).
When the match ends with a non-space character, the first whitespace in
the original is a boundary and must not be consumed.
Fixes#52491
Assert bare tables upgrade to sendRichMessage under default/opt-out config,
DM-topic resumed sends without reply anchors, and rich finalize edits carry
forum topic routing metadata.
Pipe-only markdown tables now use sendRichMessage even when rich_messages
is off, and resumed DM-topic sends route via direct_messages_topic_id
without requiring a reply anchor. Rich finalize edits forward topic kwargs.
The salvaged context-window screen (#52392) skips fallback candidates that
are too small, and the rate-limit/403 fixes skip candidates that are at
capacity. A third hard failure remained uncovered: a fallback that builds a
client fine but returns a 400 because it structurally cannot run the model.
The canonical case is a configured openai-codex / ChatGPT-account fallback
asked to compress a glm-5.2 conversation:
400 - {'detail': "The 'glm-5.2' model is not supported when using
Codex with a ChatGPT account."}
This is a request-validation error, so should_fallback was False and the
explicit-provider gate blocked it — the auxiliary task (compression) aborted
every turn, dropping middle turns without a summary and churning the session,
which is exactly what destroys the prompt cache.
Adds _is_model_incompatible_error() (400 + capability phrasing, excluding
not-found and billing 400s which the sibling predicates own) and treats it as
a fallback-worthy capacity error in both sync and async call_llm, so the chain
skips the incapable route and continues to the next viable candidate.
The runtime auxiliary fallback chain (_try_configured_fallback_chain and
_try_main_fallback_chain) returned the first reachable candidate without
checking whether the candidate's context window was large enough for the
task. For task='compression' this meant a reachable but undersized
fallback (e.g. 32K) could be selected and then fail, even when a later
larger-context fallback was available.
This adds two small helpers:
_task_minimum_context_length(task)
Returns MINIMUM_CONTEXT_LENGTH (64K) for compression, None for
other tasks (vision, web_extract, etc.).
_candidate_context_window(provider, model, ...)
Thin wrapper around get_model_context_length that returns None on
probe failure so unknown/custom endpoints pass through unchanged
(preserves the existing fallback surface).
Both fallback loops now skip reachable candidates whose resolved context
is below the task minimum and continue iterating. The success path
(first viable candidate wins) is unchanged. Return shape and ordering
for healthy candidates are preserved.
Six regression tests cover:
L2 configured chain skips too-small candidate
L2 chain continues after skipping, returns last viable
L3 main chain skips too-small candidate
L4 unknown-context candidate passes through
L5 non-compression task is not filtered
L6 minimum constant matches MINIMUM_CONTEXT_LENGTH (64K)
3/6 fail on upstream/main without the production change (verified); all
6 pass with the fix. Full test_auxiliary_client.py suite (231 tests)
and related compression tests (130 tests) remain green.
When an explicit aux provider cannot build a client before any request is
sent (missing raw env key, exhausted/unavailable OAuth or credential-pool
auth, resolver returning (None, None)), call_llm raised a misleading
"no API key was found" error and bypassed the configured fallback_chain
entirely. A provider authenticated through Hermes auth / the credential
pool (e.g. ollama-cloud) whose pool entry is exhausted hit this path, so
compression failed instead of routing to the configured fallback.
Adds _try_configured_fallback_for_unavailable_client() and wires it into
both sync and async call_llm before the raise, and into the startup
compression feasibility check.
Salvaged from #51835 by @herbalizer404.
Rate-limit (429) errors on explicit-provider auxiliary tasks were
silently failing instead of triggering the fallback chain. The
is_capacity_error gate only checked payment and connection errors,
excluding rate limits — so when a configured provider like
openai-codex hit its rate limit, auxiliary tasks (kanban_decomposer,
vision, web_extract, approval, etc.) had zero resilience.
Add _is_rate_limit_error() to is_capacity_error at both call sites
(sync and async paths) so rate limits trigger fallback regardless
of whether the provider was auto-detected or explicitly configured.
Fixes#52228
Ollama Cloud (and similar) return 403 with bodies like "this model requires
a subscription, upgrade for access" or "you have reached your session usage
limit, upgrade for higher limits". These are capacity/billing conditions
semantically identical to credit exhaustion, but _is_payment_error() did not
recognize them (403 missing from the status set; keywords missing), so the
configured fallback_chain was never tried and compression failed outright.
Adds 403 to the status set and the subscription/session-usage keywords.
Salvaged from #49076 by @herbalizer404.
These 7 test sites assert rotation behavior (fork, child sessions, lock
contention, logging session-context follows id rotation, boundary hooks fire
on rotation). Pin each builder to in_place=False explicitly so they keep
exercising the retained rotation fallback regardless of the global default
(flipped to True in #38763). Rotation stays a working opt-out fallback and
deserves continued coverage — these are NOT deleted.
Pinned sites:
- test_compression_concurrent_fork._build_agent_with_db
- test_compression_logging_session_context._build_agent_with_db
- test_compression_rotation_state._build_agent_with_db
- test_compression_boundary_hook._make_agent (2 helpers: CompressionBoundaryHook + SessionCompressEvent)
- test_compression_concurrent_sessions._build_agent_with_db
In-place compaction (single durable session id, non-destructive soft-archive)
becomes the default. Rotation is now the opt-out fallback via
compression.in_place: false.
Prerequisite: #50098 (hygiene guard reads result flag not config flag) merged
first — without it, flipping the default causes permanent transcript loss on
gateway hygiene-compress and /compress when no session_db is available.
Blast radius (empirically measured on current main): 7 rotation-asserting
tests broke and are pinned to in_place=False in the companion test commit:
- tests/agent/test_compression_concurrent_fork.py (2)
- tests/agent/test_compression_logging_session_context.py (1)
- tests/agent/test_compression_rotation_state.py (1)
- tests/run_agent/test_compression_boundary_hook.py (2 _make_agent helpers)
- tests/gateway/test_compression_concurrent_sessions.py (2)
Rotation stays as a working fallback and deserves continued coverage.
Plan: .hermes/plans/in-place-compaction-38763.md
Salvage of #50098 by @srojk34, cherry-picked onto current main.
The hygiene auto-compress guard and the /compress slash command both read
compression_in_place (config flag — is in-place mode enabled?) instead of
_last_compaction_in_place (result flag — did in-place compaction actually
succeed?). Both agents are built without a session_db, so archive_and_compact
always fails silently and _last_compaction_in_place stays False. Reading the
config flag makes the guard think in-place succeeded, triggering
rewrite_transcript() which replaces the original messages with only the
compressed summary — permanent data loss.
Co-authored-by: srojk34 <srojk34@users.noreply.github.com>
build_turn_context() created the DB session row via _ensure_db_session()
before the system prompt was restored/built, so a fresh API/gateway agent
carrying client-managed history inserted a row with system_prompt=NULL. That
tripped the misleading 'stored system prompt is null; rebuilding from scratch
... investigate the previous turn's write path' warning and a guaranteed
first-turn prefix cache miss. Move row creation to after _cached_system_prompt
is populated.
Verified live (OpenRouter + claude-sonnet-4.5): persistent-agent turns show
cache_read jumping to the full prefix on turn 2+ (write 24411 -> read 24411),
and the persisted system_prompt is non-NULL so fresh-agent restore keeps the
prefix cache warm.
Tests: turn-context ordering regression asserting _ensure_db_session runs
after _cached_system_prompt is populated.
/learn told the agent to fill the skill `author` field, and the system
prompt environment probe surfaces the OS login name (user=$(whoami) in
prompt_builder.py), so the model wrote the host username into published
SKILL.md frontmatter — a privacy leak the user never opted into, and
inconsistent run to run as the most-salient identity changed.
The /learn authoring prompt now sets `author` to the literal value
`Hermes` and explicitly forbids deriving it from the host environment
(OS/login user, git config, or any probeable identity). The skill names
itself as the tool that wrote it.
Closes#52368.
- Replace getattr(self.session_store, '_db', None) with self._session_db
(the GatewayRunner's own SessionDB, consistent with existing usage in
slash_commands.py L240/L499).
- Remove verbose comment referencing a branch name as an issue number.
- Update stale comment in run.py that said 'today it has no session_db'.
- Add regression test verifying session_db is passed and rotated session
is persisted (adapted from #51624 by @LeonSGP43).
- Add _session_db=None to _make_runner fixtures in test_compress_command,
test_compress_focus, and test_compress_plugin_engine.
Manual /compress and session hygiene auto-compress both create temporary
AIAgent instances to run compression. These agents were created without
a session_db, so compress_context computed the compressed messages in
memory, rotated the session ID, and reported success — but never wrote
to the database. The next user message reloaded the original full
transcript, making compression appear to do nothing.
Fix: pass session_db=self.session_store._db to both temp agents so the
session rotation is properly persisted. Also set _end_session_on_close
on the /compress temp agent (already done in hygiene path) to prevent
cleanup from ending the newly rotated session.
These three assert the eager build contract — stored runtime overrides /
profile db reach _make_agent synchronously, and the agent binds to the
compression tip. Under deferred-by-default the build runs off-thread, so
they raced the timer (green in CI, flaky locally). Pin them to
eager_build; deferred coverage lives in the protocol tests.
Statusbar items declared a 'title' string (e.g. YOLO, gateway health,
agents, cron, version, context usage) that was populated by
use-statusbar-items.tsx but never forwarded to the rendered DOM in
StatusbarControls — so every statusbar button/menu/text/link had no
hover hint.
Wrap the four render branches (menu trigger, text, link, action) in
the existing 'Tip' component from components/ui/tooltip.tsx. Tip is
self-contained (carries its own Provider), instant (delayDuration=0),
themed (bg-foreground/text-background, auto-inverts per theme), and
already in use elsewhere in the desktop shell. Renders the child
untouched when label is falsy, so items without a title stay
zero-cost.
Collapse the duplicated cold-resume / lazy-watch / create scaffolding into
shared helpers: _deferred_session_record (the live-session dict minus the
agent), _lazy_resume_info (the not-yet-built session.info), _claim_or_reuse_live
(lock + double-checked register-or-reuse), and _schedule_agent_build (the
pre-warm timer). Net -12 lines, three copies of the ~30-key session dict and
the lazy-info block down to one each. No behavior change.
Per review: gating the faster path behind a `defer_build` flag that the
only caller always sends is pointless. Flip it — `session.resume` now
defers the agent build by default for every caller (desktop + Ink TUI);
a caller that needs the agent built synchronously passes `eager_build:
true` (used by the build-race test). The desktop no longer sends a flag.
While verifying the flip, fixed two real parity gaps the deferred path
had vs the old eager (`_init_session`) path:
- `_enable_gateway_prompts()` was never called on a deferred resume, so
approvals/clarify wouldn't route through the gateway prompt callbacks.
- `_start_agent_build` never wired `background_review_callback` /
`memory_notifications`, so a deferred-built session's self-improvement
"💾 …" summary leaked to stdout instead of rendering in-transcript.
Wiring it there also fixes it for `session.create` sessions, which
build through the same path.
ACP is unaffected (it uses its own session_manager, not this RPC); the
Ink TUI already consumes the same lazy `info` shape from session.create
and upgrades on the later `session.info` event.
Switching sessions in the desktop app could freeze the whole UI for
several seconds on heavy, tool-rich chats. Root causes and fixes:
- Cold `session.resume` built the AIAgent (MCP discovery, prompt/skill
build) *before* returning, and the desktop awaits that RPC before it
paints — so the entire switch blocked on the build. Add an opt-in
`defer_build` resume path (the contract `session.create` already uses):
return the full display transcript immediately, register an upgradable
live session, and pre-warm the agent on a short timer. The persisted
runtime identity (model/provider/base_url/api_mode/reasoning/tier) is
restored on the deferred build so it can't drop the provider.
- Nothing bounded how many in-memory agents accumulate; a user who
reconnects often piled up detached sessions for the full 6h TTL. Add a
soft LRU cap (`max_live_sessions`, default 16) that evicts the
least-recently-active DETACHED sessions (no live client) — never a
running, awaiting-input, mid-build, or live-transport one. Reopening
re-resumes from disk.
- On the prefetch-hit cold-resume path, skip rebuilding a throwaway
merged-message array (and its 1000-entry Map) when the prefetch already
painted the exact transcript; the downstream sameMessageList guard
already drops the publish, so it was pure main-thread cost.
The desktop opts into `defer_build` for every non-watch cold resume; the
eager path stays for CLI/TUI and existing callers.
When the gateway persists a user message after a transient provider
failure (429/timeout/auth error), subsequent retries of the same
Telegram message could stack duplicate user turns in the transcript,
causing the agent to fall behind by 1-2 messages.
Add has_platform_message_id() to SessionDB (using the existing
idx_messages_platform_msg_id partial index) and a SessionStore wrapper.
The gateway's transient-failure path checks this before
append_to_transcript -- if the platform_message_id is already
persisted, the duplicate write is skipped.
Salvaged from #47869 by @davidgut1982. Adapted to current main which
has additional append sites and an existing content-based dedupe in
the exception handler path.
Closes#47237
todo_tool crashed with `AttributeError: 'str' object has no attribute 'get'`
when the LLM emitted the `todos` param as a JSON-encoded string instead of an
array, or as a list containing non-dict items (observed intermittently on
Claude 4.5/4.6/4.7, and after a prior tool-call rejection where the model
"self-corrects" by wrapping the list in json.dumps).
Three additive guards, no behavior change for well-formed input:
- todo_tool(): if `todos` is a str, json.loads it; reject unparseable strings
and non-list values with a clear tool_error instead of crashing downstream.
- _validate(): non-dict items return a {id:"?", content:"(invalid item)"}
placeholder rather than calling .get() on a str/int/None.
- _dedupe_by_id(): non-dict items get a synthetic key so _validate handles them.
Salvaged from #14785 by @Tranquil-Flow (authorship preserved via cherry-pick).
Comprehensive tests: JSON-string coercion (parse / unparseable / non-list /
non-string), non-dict list items (str/None/int/mixed), and a well-formed-
unchanged regression class — both guards mutation-verified to fail without them.
Closes#14185. Supersedes #14187, #22505, #14350 (same fix, less/no test
coverage) and #16952 (bundled unrelated scope-creep).
During stdio MCP server startup, _run_stdio (an async method) called the
synchronous check_package_for_malware() inline. That makes a blocking
urllib HTTPS POST to api.osv.dev whose own timeout doesn't reliably cover a
stalled SSL handshake, so an intermittent network issue froze the entire
asyncio event loop for up to ~120s — blowing past the TUI/gateway's 15s
startup budget and showing "gateway startup timeout".
Run the check via asyncio.to_thread (off the loop) AND bound it with
asyncio.wait_for(timeout=_OSV_MALWARE_CHECK_TIMEOUT_S=12s). The malware check
is fail-open, so on timeout we log and proceed rather than blocking startup.
Salvaged from #29190 by @qdaszx (re-applied on current main — the call site
moved since the PR was opened), combining the to_thread approach also proposed
in #29192 by @ygd58. Two load-bearing tests: event-loop-not-blocked-during-
check and timeout-fails-open — both mutation-verified to fail against the old
inline blocking call.
Closes#29184.
Co-authored-by: ygd58 <buraysandro9@gmail.com>
atomic_yaml_write used default yaml.dump which emits indentless
sequences (list items at column 0), while atomic_roundtrip_yaml_update
(ruamel.yaml) emits 2-space-indented sequences. Cross-path writes to
the same config.yaml toggled indentation on every save, eventually
producing a mixed-indent file that js-yaml rejects with 'bad indentation
of a mapping entry', silently dropping custom_providers and breaking
model switching.
Add IndentDumper SafeDumper subclass that forces indentless=False,
route atomic_yaml_write through it. Route tui_gateway._save_cfg and
the Telegram adapter's config writer through atomic_yaml_write so all
paths emit the same 2-indent layout.
Salvaged from #32034 by @xxxigm. Adapted to current main which already
has allow_unicode=True (from #51356) but was missing IndentDumper.
Closes#31999
Replace _count_real_sudo_invocations (which called
_rewrite_real_sudo_invocations and discarded the rewritten string) with
a lightweight token scan that reuses the same tokeniser but skips string
building. Remove the agent-facing tip about nested sudo in heredocs —
the cache-cleared warning is enough.
Pipe one password line per sudo invocation in compound commands so a correct
password is not rejected on the second `sudo` in `sudo a && sudo b`. Drop the
session cache when sudo returns Authentication failed, surface sudo_auth_failed
in the tool result, and add hints for interactive sessions.
A prompt sent while a turn was in flight got rejected with 4009 "session busy",
which pushed clients (the desktop app) into a deadline-bounded busy-retry. When
turn teardown outlived that deadline — e.g. the user hits stop while a slow,
non-interruptible tool (web_search, read_file, an MCP call) is mid-flight, since
the sequential executor only checks the interrupt flag between tools — the
resubmitted message was silently dropped: "it just doesn't listen".
Wire the previously-dead display.busy_input_mode config into prompt.submit:
instead of rejecting, apply the policy and queue the message to run as the next
turn (drained in run()'s tail, ahead of goal/notification follow-ups). Modes:
interrupt (default) interrupts the live turn so it winds down promptly then runs
the queued message; queue runs it after the current turn finishes; steer injects
it into the live turn when accepted, else queues. The queued slot pins the
sender's transport and losslessly merges a second arrival. No client deadline,
no dropped sends.
#48879 closed the tool-call sequence on interrupt inside finalize_turn so a
/stop after a tool no longer persists a `tool` tail that the next user message
turns into a `tool -> user` role-alternation violation (which strict providers
like Gemini/Claude react to by hallucinating a continuation and ignoring prior
context — what users see as "lost context after stop").
But the retry-wait, error-handling, and post-error retry-wait interrupt aborts
in conversation_loop return early and never reach finalize_turn, so they still
persisted and returned a raw `tool` tail. Interrupting during provider
backoff/rate-limiting (common under heavy work) hit exactly this path.
Extract the close into a shared close_interrupted_tool_sequence helper and apply
it at every interrupt abort (finalize_turn + the three early returns) so the
whole bug class is fixed, not just the one site.
The desktop self-updater rebuilds and re-signs the .app on each user's own
machine (`hermes desktop --build-only` -> electron-builder `--dir`). With
CSC_IDENTITY_AUTO_DISCOVERY on (its default), electron-builder signs the
type=distribution, hardened-runtime bundle with whatever identity is in that
user's keychain -- typically a personal "Apple Development" cert -- which
stalls/fails the sign step (no Developer ID, no provisioning profile) or
clobbers the original notarized signature with an unusable one, tripping
Gatekeeper on every post-update launch.
Force ad-hoc signing for the local packaged rebuild instead: deterministic,
and exactly what _desktop_macos_relaunchable_fixup already finishes off.
No-op for source runs, off-macOS, when a real identity is configured
(CSC_LINK / APPLE_SIGNING_IDENTITY), or when the caller already pinned the flag.
fixes stacked PRs no-checks bug where
main < a < b
a merges into main
b is retargeted to main
but b doesn't run checks since it's not considered a new pr to main
now b will simply already have passing ci :)
Block scale-to-zero suspend while background async delegations are active, and restore runtime status to running on real inbound after a dormant wake.\n\nAdd regression coverage for both review findings.
The verify-on-stop guard (PRs #52296, #52297) defaulted ON for every
session, so on gateway messaging surfaces (Telegram, Discord, etc.) the
model complied with the nudge by writing a hermes-verify temp script and
emitting an ad-hoc verification summary, which the gateway delivered to
the end user as chat noise.
Resolve a surface-aware default instead. The DEFAULT_CONFIG value becomes
the sentinel "auto", which verify_on_stop_enabled() resolves to ON for
interactive coding surfaces (CLI, TUI, desktop) and programmatic callers,
and OFF for conversational messaging surfaces. The surface is read from
HERMES_SESSION_PLATFORM (what the gateway actually binds), with
HERMES_SESSION_SOURCE and HERMES_PLATFORM as fallbacks, matching the
sibling resolution in skill_commands.py and prompt_builder.py. An explicit
HERMES_VERIFY_ON_STOP env var or a boolean agent.verify_on_stop config
still overrides in either direction.
The passive evidence ledger and the call site are untouched.
The /learn authoring prompt taught a subset of the HARDLINE skill rules,
and stated the <=60-char description rule without making the model enforce
it — so generated descriptions overshot (up to 202 chars), which the
60-char system-prompt skill index then silently truncates.
- description: add the index-truncation rationale, a count-and-trim
self-check, and a good/bad length example so the model actually hits <=60.
- add platforms-gating rule (OS-bound primitives -> declare platforms:).
- add author-credits-human-first rule.
- round out the Hermes-tool framing with the full wrapped-tool mapping and
references/templates layout.
Closes#52367.
The host-allowlist hardening (#30611) plus the refresh heal (#49735) left
the documented NOUS_INFERENCE_BASE_URL dev/staging escape hatch unreachable
for OAuth sessions, despite three code comments asserting it still works.
Root cause — resolution precedence in resolve_nous_runtime_credentials:
inference_base_url = (
_optional_base_url(state.get("inference_base_url")) # stored — wins
or os.getenv("NOUS_INFERENCE_BASE_URL") # env — unreachable
or DEFAULT_NOUS_INFERENCE_URL
)
A staging OAuth login persists its inference_base_url, but the allowlist
rejects the staging host and the refresh heal rewrites the stored value to
the production default. The stored (now prod) value is then read BEFORE the
env var, so the override never takes effect — every request 401s against
prod or is pinned to prod, and setting the env var does nothing.
Fix: the user-set env override is the most-trusted source, so consult it
FIRST for the URL used to build the client / returned to callers — while
keeping the PERSISTED value the validated, network-provenance one (the
override is a runtime overlay, never written to auth.json, so unsetting it
cleanly reverts to prod). Applied at both chokepoints:
- resolve_nous_runtime_credentials (no-refresh read path AND refresh path)
- the nous_portal proxy adapter, which re-validates the resolver's returned
base_url against the prod allowlist as defense-in-depth and would
otherwise reject a legitimate staging override at the forward boundary.
New _nous_inference_env_override() / split of stored-vs-effective URL keep
the threat model intact: Portal-returned URLs are still allowlist-validated
at every network site, and the env path stays ungated (trusted OS user).
Also folds in the no-refresh read-path heal (supersedes the approach in
the open #50265): a poisoned stored staging host now heals to the prod
default on read even when no refresh fires.
Tests: TestEnvOverrideWins (env wins on read + refresh paths; override never
persisted; poisoned stored heals) and TestProxyAdapterEnvOverride. Verified
the 4 behavioral tests fail against pre-fix code and pass with the fix; full
inference-validation + nous-provider suites green (85 passed). E2E-validated
against a real temp HERMES_HOME exercising the real resolver + proxy adapter:
resolver→staging, persisted→prod, proxy→staging, unset→reverts to prod.
- Remove dead `chosen_base or effective_base` fallback; _select_zai_endpoint
always returns a non-empty base URL (returns current_base on cancel).
- Add .rstrip("/") to official-endpoint return for symmetry with custom-proxy
path (both now return normalized URLs).
- Replace magic index 4 with len(ZAI_ENDPOINTS) in custom-proxy tests so they
don't break if a 5th endpoint is added to ZAI_ENDPOINTS.
Z.AI now uses a curses picker instead of plain text input for base URL,
so the existing TestBaseUrlValidation tests (which used zai as their test
subject) are migrated to MiniMax, which still uses the text input path.
Add TestZaiEndpointPicker covering:
- Selecting each official endpoint (Global, China, Coding Plan Global,
Coding Plan China) saves the correct base URL to config
- Custom proxy URL entry (valid + invalid rejection)
- Cancel keeps the existing base URL
- Current endpoint is the default choice in the picker
- Non-standard URL defaults to the Custom proxy option
When provider_id == 'zai', replace the plain text Base URL input with
_select_zai_endpoint, which presents a curses picker offering Global,
China, Coding Plan Global, Coding Plan China, and custom proxy options.
Other API-key providers (MiniMax, DeepSeek, etc.) keep the text input.
Presents a curses-based picker (via _prompt_provider_choice) offering the
four official Z.AI endpoints — Global, China, Coding Plan Global, Coding
Plan China — plus a custom-proxy option. Sourced from ZAI_ENDPOINTS in
auth.py so it stays in sync with the probe list.
Not yet wired into the setup flow; that comes in the next commit.
Add a hover/focus "Remix" action on each completed draft card in the
generation grid. It re-runs generation with the chosen draft fed back in
as the reference image, keeping the same prompt and staying on step 2 so
the user can explore variations without starting over.
Because regenerating is slow and replaces the current drafts, the first
remix shows a one-time confirmation; the acknowledgement is persisted so
subsequent remixes fire immediately.
Move terminal/execute_code/read_file preview compaction into agent.display so CLI, gateway, and Ink TUI all inherit the same labels that desktop introduced in #52321.
The shared preview keeps raw args intact while trimming display-only shell plumbing (`cd`, pipe tails, banner/status echoes) and read_file line ranges. Desktop now prefers backend `context` for live rows and keeps its TypeScript fallback only for hydrated history.
The pet generation image-processing suite is deterministic but expensive enough
to blow the per-file CI timeout on Linux (140s), and it is not relevant to the
fast timeout PR's normal signal. Keep it available for manual validation, but do
not run it by default.
Set HERMES_RUN_SLOW_PET_TESTS=1 to enable the suite. The canonical test wrapper
now preserves that opt-in variable through its hermetic env.
The fixed "up to 5 minutes" wording undersells the slow quality-first path
(OpenAI image via OpenRouter), where a full hatch can run far longer. Use an
open-ended "several minutes" instead so the banner stays honest across the
fast and slow providers.
The quality-first default (OpenAI image via OpenRouter) is slow, and a full
hatch fans out ~8 rows with up to 3 retries each (300s/call) across 2 parallel
waves, so the absolute backend worst case is ~30 min. The old ceilings fired
mid-run:
- per-image HTTP call: 180s -> 300s (a single cold row can exceed 3 min)
- drafts RPC: 240s -> 420s (single wave, no retries — 7 min is ample)
- hatch RPC: 420s -> 1hr (sits above the ~30 min backend worst case)
The hatch ceiling is intentionally well above the realistic max so the frontend
never throws "request timed out" before the backend has exhausted its own
retries. The background-resumable notification path remains the real UX safety
net — the user can close the modal and get pinged on completion.
Make completed desktop tool rows read like useful activity labels instead of raw plumbing: terminal rows use a dispatch-style shell summarizer for agent wrappers, and read_file rows keep the action plus filename and requested line range.
The shell cleanup follows condensed-milk-pi's shape: split command compounds on real separators, strip pipe tails inside each segment, clean redirects/env prefixes, then classify setup/banner/status segments. Multi-command probes render as `first command + N commands`; the full command remains available in copy/detail.
Read rows now render as `Read package.json` or `Read main.ts L25-34`, using requested positive offset/limit and returned line numbers only as fallback for negative/unknown offsets.
Recurring cron jobs were prompt-cache-cold on every fire. session_id is
built as cron_<job_id>_<timestamp>, and the Codex/Responses transport used
session_id directly as prompt_cache_key — so the timestamp changed the cache
key on every run and the static prefix (agent identity + tool schemas) was
re-paid each tick.
Derive prompt_cache_key from a SHA-256 of the static prefix (instructions +
sorted tool schemas) instead. Repeated fires of the same job share one
content-addressed key (pck_<hash>) and reuse the warm prefix within the
provider's cache TTL. The key changes exactly when the prefix changes —
edit the job's prompt or toolset and it re-keys; leave it alone and it stays
stable.
session_id is left untouched for transcript isolation, log correlation, and
the Codex/xAI session-scope routing headers (session_id, x-client-request-id,
x-grok-conv-id) — those are the per-fire identity, not the cache key. Only the
prompt_cache_key body field (standard OpenAI/Codex path and the xAI extra_body
field) is content-addressed.
Closes#51395.
Co-authored-by: spiky02plateau <spiky02plateau@users.noreply.github.com>
Co-authored-by: JoaoMarcos44 <JoaoMarcos44@users.noreply.github.com>
* fix(relay): authorize relay-delivered events by delivery, not source.platform
The #52190 upstream-authz fix keyed _is_user_authorized off
source.platform via _adapter_authorization_is_upstream(source.platform).
But a relay *message* inbound carries the UNDERLYING platform
(source.platform == discord/telegram/...), NOT Platform.RELAY, because
ws_transport._event_from_wire maps the connector's wire payload
(platform="discord") straight onto SessionSource for session-keying and
egress. The relay adapter is registered only under Platform.RELAY, so
adapters.get(Platform.DISCORD) misses, the trusted-upstream branch is
skipped, and the user hits the env-allowlist default-deny:
WARNING gateway.run: Unauthorized user: <id> (<name>) on discord
(Live staging bug: alpha tester linked successfully, then every
follow-up DM was silently dropped.)
Fix: the authentic trust signal is that the event was delivered over the
per-instance-authenticated relay WS, not which platform it underlies. Add
a wire-INVISIBLE SessionSource.delivered_via_upstream_relay flag, stamped
by the relay transport in _event_from_wire, and authorize on it. The flag
is excluded from to_dict/from_dict so a peer can neither forge it across
the wire nor have it restored from persistence. The existing adapter-flag
check is retained for events whose source.platform IS Platform.RELAY
(interaction-passthrough). A direct Discord event on a multiplexing
gateway (direct + relay adapters) is unmarked and still default-denies.
* fix(relay): use identity check on delivery marker to avoid MagicMock fail-open
A MagicMock() source (used by test_signal.py and other gateway tests) auto-
vivifies source.delivered_via_upstream_relay as a truthy Mock, which a bare
truthiness check would treat as authorized — flipping
test_signal_in_allowlist_maps from False to True. The marker is a real bool on
SessionSource, so check 'is True' explicitly: refuses to authorize any non-bool
stand-in, defensive against accidental fail-open.
OpenRouter/Nous image gen now runs a quality-first model chain by default:
attempt the highest-fidelity OpenAI image model first, then fall back to
Gemini 3 Pro Image when it's access-gated/unavailable/times out. An explicit
OPENROUTER_IMAGE_MODEL / config model override pins one model with no fallback.
Atlas validation rejects malformed model output instead of shipping it: adds a
per-state collapse guard (a single sliver/fragment row no longer passes because
other rows are healthy), on top of the existing postage-stamp + multi-pose
checks.
Desktop: pet-gen native notifications are now "global" (not tied to a chat
session), so a background generation started from the command center fires an
OS notification when the user is away even with no active session. Adds a
neutral "This can take up to 5 minutes." banner on step 1, and lets the
provider picker auto-size.
Tests updated/added for the OpenRouter fallback chain, the collapse guard, and
the global notification path.
Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.
Addresses review on #51077 (kxee). The continuable-cron mirror reused
gateway.mirror.mirror_to_session, which writes role=assistant — re-
introducing the exact alternation violation #2313 (37a997945)
deliberately removed: a cron brief landing as assistant after the
agent's last turn yields assistant->assistant, which breaks strict-
alternation providers (OpenAI/OpenRouter) per issue #2221. The mirror/
mirror_source metadata is also dropped at the SQLite boundary, so the
[Delivered from cron] label is lost on replay.
This is an intentional, opt-in (default OFF) reversal of #2313's
'cron output does not belong in interactive history' for the reply-to-
cron use case — gated behind cron.mirror_delivery / attach_to_session.
Fixes:
- mirror_to_session gains a role param (default 'assistant' — interactive
send_message mirror unchanged, it IS the agent speaking). Cron paths
pass role='user' with a '[Cron delivery: <task>]' prefix so the brief
collapses via repair_message_sequence's consecutive-user merge on every
provider, and stays distinguishable on replay despite the metadata drop.
- thread_seeded: defer seeding + the flag until delivery into the new
thread actually succeeds. Previously set pre-delivery, so an open-
succeeds / deliver-fails case both stranded a seeded-but-unseen brief
AND suppressed the DM-fallback mirror.
- seed mirror now passes user_id='system:cron' to resolve the exact
thread-keyed session row it just created.
- dedupe the duplicate BasePlatformAdapter import in _deliver_result.
- trim oversized docstrings to non-obvious WHY (AGENTS.md).
- docs: document cron.mirror_delivery / attach_to_session in
website/docs/user-guide/features/cron.md.
- test: assert the cron mirror writes role='user' with the label prefix.
204 cron+mirror tests pass.
Continuable cron jobs (attach_to_session / cron.mirror_delivery, default
OFF) now prefer a dedicated thread on thread-capable platforms, falling
back to origin-DM mirroring where threads don't exist.
- Thread-capable (Telegram topics, Discord/Slack threads): open a fresh
thread for the job via the shipped adapter.create_handoff_thread,
route the brief into it, and seed the thread-keyed session so the
user's in-thread reply continues with full context. This is the
'continuable cron opens its own thread' interface.
- DM-only (WhatsApp/Signal/SMS): create_handoff_thread returns None ->
fall back to mirroring into the origin DM session (existing behaviour).
Reuses existing infrastructure end-to-end — no new adapter surface, no
provider-chain signature change:
- adapter.create_handoff_thread (already implemented per-platform,
returns None on unsupported platforms = the fallback signal)
- the live SessionStore via adapter._session_store (already set on every
adapter), reached without threading a new param through the frozen
CronScheduler.start() contract
- gateway.mirror.mirror_to_session for the seed/append
- existing per-target delivery routing carries the new thread_id for free
Mirrors GatewayRunner._process_handoff's open-thread-or-fallback +
seed pattern, standalone for the cron delivery path. thread_seeded
guards against a double-mirror after seeding. Scoped to the origin
target only; fan-out/broadcast targets are never threaded or mirrored.
Config docs updated (cron.mirror_delivery) + cronjob tool
attach_to_session description reframed around continuable/thread-preferred.
Tests: +5 (thread id returned on thread platform; None on DM platform;
None without capability/loop; seed creates thread session + mirrors;
seed no-op on empty). 22/22 in TestCronDeliveryMirror; 532 cron tests
pass (4 failures pre-existing: croniter-not-installed + TZ).
Multi-participant parity with interactive send_message, which passes
HERMES_SESSION_USER_ID to gateway.mirror.mirror_to_session so the mirror
lands in the exact participant's session.
- cronjob_tools._origin_from_env now captures user_id from the session
context at job-create time (alongside platform/chat_id/thread_id).
- _maybe_mirror_cron_delivery forwards user_id to mirror_to_session.
- _deliver_result threads origin.user_id through for the origin target.
Effect: in a per-user-isolated group chat (group_sessions_per_user=True,
the default), the mirror resolves to the member who scheduled the job
instead of conservatively no-op'ing on ambiguous candidates. DMs and
shared group/thread sessions are unaffected (single candidate). Default
still OFF.
Tests: helper forwards user_id; E2E _deliver_result forwards origin
user_id. 17/17 in TestCronDeliveryMirror; 527 cron tests pass (4 failures
pre-existing: croniter-not-installed + TZ, identical on baseline).
The cron->session mirror now fires ONLY for the delivery target that
equals the job's origin (platform+chat_id[+thread_id]). A job created
from a live gateway chat stamps that chat as origin, and that session is
guaranteed to exist (it is the conversation the user scheduled the job
in). Fan-out / broadcast / home-channel-fallback targets are never
mirrored: they are not a continuation of a conversation and may have no
session at all.
This makes the prior 'cold-start session seeding' concern a non-case by
construction: when the mirror semantically applies the session exists;
when none exists the target was never the origin, so we no-op.
Adds _target_matches_origin() + origin-scoping tests (exact match,
other-chat/other-platform/no-origin rejection, thread scoping, fan-out
mirrors only the origin target).
Adds an opt-in path so a cron job's delivered output is also appended to
the TARGET chat's gateway session transcript (as an assistant turn), so a
user reply to a recurring delivery (daily brief, reminder) is answered with
the delivery in context instead of 'what is that?' amnesia.
- Reuses the shipped gateway.mirror.mirror_to_session — the same primitive
interactive send_message mirroring already uses. No messaging-toolset
change (cron still can't call send_message; this rides delivery).
- Gated: per-job attach_to_session overrides global cron.mirror_delivery
(config.yaml). Default OFF — historical isolation preserved byte-for-byte.
- Mirrors the CLEAN agent output, not the cron header/footer wrapper.
- Alternation/cache-safe: append lands at a turn boundary, never mid-loop,
never mutates the cached system prompt. Cold-start (no target session)
is a silent no-op; mirror errors never fail a successful delivery.
- Surfaced on the cronjob tool (attach_to_session) + config schema.
Driven by enterprise cron-as-control-plane use case. 10 new tests; full
cron + cronjob-tool suites pass (600).
ToolFallback rebuilt the `part` wrapper every render, defeating the
buildToolView memo and re-running a full JSON.stringify of the result on
every ~33ms stream delta. A /learn over a large directory (many ~100KB
tool results) saturated the renderer main thread (hang/throttle) and
spiked memory until it OOMd (crash).
- Re-derive a stable `part` from the referentially-stable args/result so
the view/copy memos hold across deltas.
- Clamp every inline-painted payload (detail, stdout/stderr, rawResult,
technical trace) to MAX_TOOL_RENDER_CHARS; the row's Copy button still
reads the uncapped view.detail for the full output.
A DM reply carries no guild_id, so the connector's egress guard cannot
resolve the owning tenant from metadata.guild_id and declines the send
with "discord egress declined: target not routed to an onboarded tenant"
— the bug behind "the bot never replies in DMs". Guild replies are
unaffected (they carry guild_id), which is why the guild path worked
end-to-end while DMs looked broken.
The connector now resolves a DM reply's tenant from the recipient's
author binding (gateway-gateway #67, resolveByUser keyed on
metadata.user_id) — the outbound counterpart to inbound Phase 7a
author-first resolution. But it needs the recipient user_id ON the
outbound action, and the adapter only re-attached guild_id
(_capture_scope/_with_scope), no-op for DMs (the docstring even said so).
This extends the adapter's inbound-scope capture: for a DM (no guild_id)
remember chat_id -> the authentic author user_id we observed, and
re-attach it as metadata.user_id on outbound. Guild capture is unchanged
and wins when present; user_id is the DM-only fallback. The id is the one
the connector observed inbound (never gateway-asserted), so the trust
invariant holds.
+4 unit tests (DM reply re-attaches user_id + no guild_id; unknown chat
invents nothing; explicit user_id preserved; guild reply never carries
user_id). Proved load-bearing (reverting the re-attach fails the DM
test). 144 relay tests pass, ruff clean.
Pairs with gateway-gateway #67 (the connector-side resolver). Together
they close the DM-reply egress gap end-to-end.
terminal_tool() resolves a per-task cwd override that WINS over config["cwd"]:
cwd = overrides.get("cwd") or config["cwd"]
config["cwd"] is sanitized for container backends in _get_env_config() (host
prefixes /Users//home//C:\\/C:/ and relative paths are replaced with the
backend default /root). But the override was applied RAW — it was never run
through that guard. The gateway/TUI registers the host launch dir as a cwd
override for workspace tracking (tui_gateway/server.py _register_session_cwd
-> _terminal_task_cwd -> _session_cwd -> os.getcwd()), so on a container
backend a host path leaked straight to `docker run -w <host-path>`:
- Windows desktop: -w C:\Users\<user> -> container fails to start (exit 125)
- POSIX: -w /home/<user> -> same
The ACP adapter translates its override cwd (acp_adapter/session.py
_translate_acp_cwd), but the gateway path did neither translation nor
sanitization, so the override bypassed the one guard that would have caught it.
Fix: extract the host/relative-path predicate into a shared
_is_unusable_container_cwd() helper (so the existing _get_env_config()
sanitizer and the new guard can't drift), and re-apply it to the *resolved*
cwd at the override-resolution site. Valid in-container override paths
(RL/benchmark sandboxes that set cwd to /workspace, /root, ...) are absolute
non-host paths and pass through untouched.
Tests: unit-pin the predicate (Windows backslash/forwardslash, POSIX home,
macOS /Users, relative, valid container paths) AND an E2E call-site pin that
drives terminal_tool() with a host-path override registered and asserts the
cwd reaching _create_environment is sanitized. Mutation-verified: reverting
the call-site guard makes the two host-path E2E tests fail (showing the raw
host path leaking) while the valid-/workspace-override test stays green.
The desktop bootstrap (and curl/PowerShell/docker installs) seeded
~/.hermes/SOUL.md with a comment-only scaffold that contained no persona
text. That shadowed the runtime default (_ensure_default_soul_md ->
DEFAULT_SOUL_MD), since seeding is guarded by 'if SOUL.md doesn't exist'.
Result: every fresh installer install got the empty template instead of
the documented Hermes persona; desktop just made it visible in onboarding.
- install.sh / install.ps1 / docker/SOUL.md now write DEFAULT_SOUL_MD.
- _ensure_default_soul_md() upgrades a SOUL.md still matching the known
legacy scaffold in place; customized files (any deviation, incl. a
persona appended below the comment) are never touched.
- Detection normalizes CRLF/BOM so Windows-installer drift still matches.
The Desktop GUI (tui_gateway) slash worker subprocess has no reader for
the CLI's _pending_input queue. /learn's CLI handler prints the ack and
puts the built prompt onto that queue, so in the TUI the prompt was
silently dropped — ack shown, no LLM turn, no skill created (#51829).
command.dispatch already handles 'learn' correctly (returns
{type: send, message: build_learn_prompt(arg)}), but 'learn' was missing
from _PENDING_INPUT_COMMANDS, so slash.exec fell through to the worker
instead of routing to command.dispatch. Add it to the frozenset, matching
the existing goal/queue/steer/plan pattern.
The gateway-side BEHAVIOUR layer that consumes the relay scale-to-zero
primitives (gateway-gateway Phase 5): the gateway decides it is idle and
drives the relay transport dormant so the platform (Fly autostop:"suspend")
can suspend the now-traffic-idle machine, which wakes on the connector's
wakeUrl poke (decisions.md Q3=C', D1-D13).
- gateway/scale_to_zero.py: pure helpers — scale_to_zero_enabled (the NAS
Labs HERMES_SCALE_TO_ZERO stamp, D11/Q8=A), parse_idle_timeout_seconds
(config.yaml gateway.scale_to_zero.idle_timeout_minutes, D2),
messaging_is_relay_only_or_absent (F6/D1), should_arm (D1/D11/§3.4(1)),
is_idle (D2/D3/F7).
- gateway/run.py: _last_inbound_at clock stamped on user inbound in
_handle_message (F13); the arm-gate + idle predicate + the
_scale_to_zero_watcher dormant sequence (mark draining -> adapter
go_dormant() -> cooldown), started only when armed. Deliberately NOT the
stop path and NOT mark_resume_pending (F12/D13).
- tools/process_registry.py: has_any_active() for the bg-work guard (D3/F7).
- hermes_cli/config.py: gateway.scale_to_zero.idle_timeout_minutes default 5.
Tests: 38 pure-logic + 6 watcher (incl. bg-work regression guard proven RED).
Full relay + scale-to-zero suites: 184 passed. The 20 unrelated failures in
the broader run are PRE-EXISTING on origin/main (custom-provider/tools tests),
confirmed via a pristine baseline worktree.
Net-new WebSocketRelayTransport.go_dormant() + RelayAdapter.go_dormant() —
the third transport mode the scale-to-zero behaviour layer needs, distinct
from both disconnect() and an unexpected close (decisions.md D12/F14):
- disconnect() sets _closing=True and CANCELS the reconnect supervisor
(terminal "shutting down for good") -> a suspended machine never re-dials
on wake, stranding its buffered backlog.
- an unexpected close re-dials IMMEDIATELY -> the socket never stays down,
so the platform proxy never suspends the machine.
go_dormant(): going_idle->ack (reuse go_idle), then close the socket WITHOUT
setting _closing, so the reader's fall-through still arms the reconnect
supervisor (wake path stays live) but on the longer _dormant_redial_s
cadence so it doesn't fight the platform suspend window. A successful re-dial
clears _dormant. Honors the §3.4 wake->reconnect->drain contract.
Tests: 6 new in test_relay_going_idle.py incl. the F14 regression guard
(routing dormancy through disconnect() fails exactly the 4 wake-path tests).
Full relay suite 140 passed.
Regression for the refText crash: attachmentDisplayText and
optimisticAttachmentRef must return null (not throw) when handed an
undefined/null attachment hole, so the submit path can't reproduce
"Cannot read properties of undefined (reading 'refText')".
A session switch or draft restore can leave undefined/null holes in the
composer attachments array. AttachmentList was guarded against this in
#49624, but the sibling submit path was not: submitPromptText maps the
same array through attachmentDisplayText/optimisticAttachmentRef and
buildContextText (a.kind / a.label / a.refText), so a hole threw
"Cannot read properties of undefined (reading 'refText')" — an uncaught
renderer error that blanks the chat pane and shows "Desktop app link
offline".
Close the whole bug class:
- attachmentDisplayText / optimisticAttachmentRef no-op on a falsy
attachment (shared chokepoint, also protects thread.tsx drop handler).
- submitPromptText filters falsy entries from the source array, and
buildContextText filters its (possibly post-sync) input before reading
fields.
Stabilize the long-running-tool heartbeat test by patching stale thresholds inside the test and asserting the heartbeat exceeds the idle ceiling, which preserves intent while removing scheduler-sensitive assumptions that flake in CI.
Wire the sparkle generate button's cancel action to the same discard/reset path as step-2 cancel so abort semantics are consistent and always return to step 1 while retaining the prompt input.
PR #52151 hardened the runtime-status liveness check to trust a readable
live process command line over stale gateway_state.json argv, so a recycled
PID now owned by an s6 supervisor no longer counts as a running gateway.
That fix is correct but incomplete for the reported symptom: the web
dashboard showed a named profile's gateway green while
`hermes -p <name> gateway status` showed it stopped. Two further issues:
1. Cross-profile PID reuse. In per-profile Docker supervision, one profile's
stale `gateway_state.json` can record a PID the OS later recycled onto a
DIFFERENT profile's live gateway. That PID's command line still
`looks_like_gateway`, so the dead profile was reported running. The
recorded argv has its `-p <name>` selector stripped in-process by
`_apply_profile_override`, so it cannot disambiguate; the live `/proc`
cmdline still carries it. `get_runtime_status_running_pid` now accepts an
`expected_home` and validates the live command line belongs to THAT
profile (mirroring `hermes_cli.gateway._matches_current_profile`, the
logic the CLI scan path already uses — which is why the CLI was correct).
`_check_gateway_running` passes the enumerated profile dir.
2. The existing regression test `test_gateway_running_check_falls_back_to_
runtime_state` used the live pytest PID with a gateway-shaped record; once
the live cmdline became authoritative it no longer looked like a gateway.
Updated to mock the live cmdline to the real separate-process scenario it
describes.
The active-profile path (`get_running_pid`) is intentionally left unscoped:
it is lock-verified and any live gateway cmdline is acceptable there. Multiplex
mode is unaffected — `running` state is only ever written to a gateway's own
home, never a secondary served profile's.
Adds coverage for: cross-profile PID reuse (named + default), matching
profile cmdline (`-p`, `--profile`, explicit HERMES_HOME=), the bare default
gateway, and the unreadable-cmdline cross-platform fallback. Each new
cross-profile assertion fails without the profile scope and passes with it.
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
Remove cute/chibi-biased wording from base draft variations and explicitly preserve the requested mood across base and row prompts so scary, eerie, or other non-cute concepts are honored while keeping sprite constraints.
Two ways the update overlay read as stuck even though the update was
streaming progress underneath:
- In-app (macOS/Linux) UpdatesOverlay: runStreamedUpdate forwards every
stdout line as a progress event with percent: null, and ingestProgress
wrote that straight through — clobbering the milestone percents (10/60)
so the bar fell back to indeterminate on every log line. Keep the last
percent when a line carries null.
- Staged install/update overlay: the bar is completedCount / totalCount,
which counts only *finished* stages, so a long first stage pinned it at
"0 of 2" / 0% until the stage ended. Count the running stage as half a
unit so the bar advances during the stage (the per-stage spinner already
shows which step is live).
Both are display-only; no stage/event semantics change. (The Windows
hermes-setup Tauri progress UI in apps/bootstrap-installer has the same
counter-only-on-completion logic — parity follow-up.)
GET /api/tools/toolsets returns the full CONFIGURABLE_TOOLSETS set with no
desktop curation, so the Skills & Tools → Toolsets list shows entries that
don't belong in a flat per-user toggle: platform-coupled toolsets (discord,
discord_admin, yuanbao — which `hermes tools` already platform-restricts off
the CLI) and internal plumbing (context_engine, moa). `hermes tools` curates
these out; the desktop didn't.
Add a small documented block-list + predicate (mirroring
desktop-slash-commands.ts) and apply it in the toolset list filter. Hiding a
row is cosmetic — enabled state and runtime gating are untouched.
restartGateway, getActionStatus, getStatus, updateHermes and
checkHermesUpdate all hit window.hermesDesktop.api WITHOUT spreading
profileScoped() — unlike their siblings (getModelInfo, setModelAssignment,
grantComputerUsePermissions). _apiProfile tracks the active gateway
profile, and the Electron proxy uses request.profile to pick which pooled
/ remote backend serves the call.
So for a multi-profile or global-remote user, the System-panel "Restart
gateway" (and its status poll, plus Update / status reads) targeted the
primary/default backend instead of the one they're on: the restart hit
the wrong gateway and the poll never saw the action → it looked like
restart silently failed. Single-profile users are unaffected
(profileScoped() returns {} when no profile is active).
Add ...profileScoped() to the five backend-action helpers so they follow
the active profile like the rest of the API surface.
On macOS, the desktop updater's stage 1 (hermes update --gateway) ends by
restarting running gateways. launchd_restart() SIGTERMs the gateway and
silently waits up to agent.restart_drain_timeout (default 180s) for the
drain; the manual profile-gateway loop waits its drain budget per gateway
the same way. Neither path prints anything before the wait, so the desktop
updater's live output goes dead for minutes right after '✓ Update
complete!' — users read it as a hung update and force-kill their gateway
processes to make it move (#44515). The systemd branch already announces
its drain ('draining (up to Ns)...'); launchd and the manual loop did not.
Print the stop/drain (with PID and budget) before the wait in both paths,
mirroring the systemd branch, and assert the message in the existing
launchd drain test.
Fixes#44515
checkUpdates() ran `git rev-list HEAD..origin/<branch> --count`
unconditionally in the parallel probe batch, even on the shallow +
no-merge-base path where resolveBehindCount() ignores the result and
falls back to a SHA compare. In the #51922 failure mode that count walks
the entire remote ancestry (thousands of commits), so the work was pure
latency on every update check for the exact case the fix targets.
Split the probes into two phases: resolve --is-shallow-repository and
merge-base first, then run rev-list --count only when shouldCountCommits
says the number is meaningful (full clone, or shallow-with-merge-base).
The shallow/no-merge-base SHA fallback is preserved unchanged.
The desktop installer clones with `--depth 1`, so a public install's local
history often shares no merge-base with the freshly fetched origin tip. In
that state `git rev-list HEAD..origin/<branch> --count` enumerates the
entire remote ancestry and returns a meaningless huge number, surfacing as
e.g. "v0.17.0 (+12104)" in the update indicator (#51922).
The official-SSH branch of checkUpdates() already sidesteps this by reporting
a binary up-to-date check (`behind: currentSha === targetSha ? 0 : 1`), and
hermes_cli/banner.py guards the identical class for the CLI banner. The
passive desktop count path was the one place the shallow guard was missing.
Detect shallow + no-merge-base up front and fall back to the same SHA-based
binary check; full clones (developers / Docker dev images) keep the exact
count path unchanged. The resolution logic lives in a pure update-count.cjs
helper so it is unit-testable without booting Electron.
Ship the final pet-generation UX polish (provider picker behavior, step-2 cancel flow, banner integration, and visual consistency) and make saturated-chroma background removal C-op driven so hatch processing no longer hammers the machine during long runs.
A hosted instance fronted by the Team Gateway connector dropped EVERY relay
message as "Unauthorized user" and the agent never replied — despite the
message routing correctly through the connector to the instance.
Root cause: gateway authorization (_is_user_authorized) had no notion of
upstream-enforced authz. Platform.RELAY matches no {PLATFORM}_ALLOWED_USERS
allowlist and isn't in the HA/WEBHOOK always-authorized set, so a relay user
with no env allowlist configured hit the default-deny ("No user allowlists
configured. All unauthorized users will be denied."). The message was received,
then silently denied before reaching the agent.
This is incorrect for relay: the connector authenticates the gateway's WS with
a per-instance secret and performs owner-only author-binding resolution BEFORE
delivering. A message only reaches this gateway because the connector resolved
it to THIS instance's bound user (user_instance_binding), keyed on the author id
the connector OBSERVED off the event — never a gateway claim. The authorization
decision is already made by a trusted, authenticated upstream; there is no local
RELAY_ALLOWED_USERS allowlist to consult, and default-denying for its absence is
the bug.
Fix: add a generic BasePlatformAdapter.authorization_is_upstream capability
(default False) that the relay adapter overrides to True, plus a dedicated
trusted branch in _is_user_authorized that honors it. This is delegation to a
trusted upstream, NOT a fail-open: it fires only for an adapter that explicitly
declares the flag; every direct network-exposed adapter leaves it False and the
env-allowlist default-deny (SECURITY.md §2.6) is unchanged. Distinct from
enforces_own_access_policy, which mirrors a LOCAL config-driven allowlist —
this delegates to an authenticated upstream's decision.
Tests: behavior contract that the base defaults False, the relay adapter
declares True, a relay user (group + DM) is authorized with no env allowlist,
and crucially a non-upstream adapter with no allowlist still default-denies
(guards against the fix becoming a blanket fail-open). 6 new tests; relay +
authz + config-policy suites green (134 + 90).
Found via live staging debug of the Discord self-serve onboarding flow.
The 8-minute stream-silence watchdog only removed a stuck session from
$workingSessionIds (the sidebar dot). The composer's busy state lives in
the session-state cache and was never cleared, so a hung or looping turn
that never delivered its terminal event — including an old session
re-opened while the backend still reports it "running" — stayed wedged on
"Thinking" / Stop indefinitely.
Have the watchdog notify subscribers when it force-clears a session, and
subscribe from the session-state cache to also drop that session's
busy/awaiting/needsInput flags. updateSessionState re-syncs $busy when the
healed session is the one on screen, so the composer recovers instead of
spinning forever.
Frontend-only safety net; doesn't touch the turn lifecycle. The backend
root (a stale in-memory session["running"] surviving a dead turn thread
and re-arming busy on every resume) is a separate follow-up.
When a remote gateway dropped after a healthy boot (internet loss,
sleep/wake, VPS restart), use-gateway-boot retried with backoff forever
and never surfaced an error. The renderer sat behind the fullscreen
CONNECTING overlay with gatewayState non-open and boot.error null — no
way to reach Settings, sign in again, or switch to a local gateway. To
the user the app was simply broken on connection loss.
Raise a recoverable boot error once the reconnect loop crosses
RECONNECT_ESCALATE_AFTER (6 attempts, ≈45s), so the BootFailureOverlay
(Retry / Sign in / Use local gateway) replaces the dead-end CONNECTING
screen. The loop keeps retrying underneath; the next successful reconnect
(or a manual/wake-driven one) clears the error and dismisses the overlay.
This implements the contract already specified — but never wired up — in
use-gateway-boot.test.tsx (desktop vitest isn't in CI, so the failing
"FIX:" specs went unnoticed). All 4 hook tests + the 3 connecting-overlay
tests pass.
Three voice-mode papercuts in the desktop app:
1. Ctrl+B did nothing. The docs + `voice.record_key` advertise Ctrl+B to
talk, but the desktop never bound it (only ⌘B = sidebar existed). Add a
rebindable `composer.voice` action that toggles the voice conversation,
defaulting to ⌃B on macOS (distinct from ⌘B; off-macOS `ctrl` folds to
the sidebar chord, so it ships unbound there to avoid stealing it). The
global keybind reaches the composer through a new focus-bus event.
2. The Voice settings page rendered every provider's options at once (~30
fields). Filter to the *selected* TTS/STT provider's sub-fields; STT
provider fields hide when STT is off. Picking "edge" now shows just the
Edge voice, making it obvious voice chat also needs STT enabled.
3. Voice mode could hang "speaking" forever. Free Edge TTS sometimes returns
audio that never fires `playing`/`ended`/`error`, so the playback promise
never settled. Add a stall watchdog (rearmed on each progress tick, so
long speech is never cut off) that rejects a stuck stream, letting the
loop recover with a clear error.
CI test shard has no PyPI egress: the real 'pip install packaging==20.9'
in test_core_package_is_not_shadowed failed (the pypi.org reachability
probe passed but the actual install didn't), failing slice 2/6.
- Prove the anti-shadow invariant deterministically: synthesize a fake
'packaging' in the durable target with a sentinel and assert the import
still resolves to the core copy (TestCoreNeverShadowed). No network.
- Cover the install wire offline: stub subprocess and assert --target +
--constraint are built in durable mode and absent in venv-scoped mode
(TestInstallArgConstruction).
- Gate the genuine PyPI install behind HERMES_RUN_NETWORK_TESTS=1 (opt-in,
skipped in CI) instead of a flaky reachability probe that doesn't predict
install success.
The published Docker image seals the agent venv (root-owned, read-only
/opt/hermes) and sets HERMES_DISABLE_LAZY_INSTALLS=1 so a runtime install
can't mutate and brick the core. But opt-in backends (Firecrawl web search,
Exa, Feishu, ...) deliberately keep their SDKs in tools/lazy_deps.py and out
of [all] (pyproject policy 2026-05-12: one quarantined release must not break
every install). The two policies collided: the SDK isn't baked in AND can't
lazy-install, so the default Firecrawl web_search/web_extract fail out of the
box in Docker (#51136), as do Exa (#49445) and Feishu (#50205).
Fix the whole class instead of baking in one backend: when
HERMES_LAZY_INSTALL_TARGET is set, lazy installs are redirected to a writable
dir on the durable /opt/data volume via `pip/uv install --target`, and that
dir is APPENDED to the end of sys.path. Because the core venv always wins
name collisions, a package installed this way can only ADD new modules — it
can never shadow, downgrade, or break a module the core ships. The worst a
bad/incompatible backend package can do is fail to import and report itself
unavailable; the agent core stays healthy. That structural guarantee is what
made it safe to seal the venv, and it is preserved here even with installs
re-enabled.
- tools/lazy_deps.py: durable-target mode — `--target` install + core-pinned
`--constraint` file (shared deps resolve to core's versions, conflicts fail
loudly at install time), append-only sys.path activation, ABI/Python-version
stamp that wipes the store if an image rebuild bumps the interpreter, and a
reworked gate so HERMES_DISABLE_LAZY_INSTALLS=1 redirects (rather than hard-
blocks) when a target is set. security.allow_lazy_installs=false still
disables installs in every mode.
- hermes_bootstrap.py: activate the durable target on sys.path at first import
(before any backend imports its SDK) so packages installed on a previous run
are importable on this run.
- Dockerfile: set HERMES_LAZY_INSTALL_TARGET=/opt/data/lazy-packages.
- docker/stage2-hook.sh: seed + chown the dir on the data volume.
- tests: real-install E2E proving installs land in the target, import cleanly,
don't leak into the sealed venv, and that a core package is never shadowed;
ABI-stamp wipe/preserve; gate matrix; Dockerfile/stage2 contract test.
Fixes#51136
The status-bar "Agents" item conflated three unrelated signals — running
subagents (aggregated across all sessions), in-flight session turns, and
failed background *system* actions (gateway restarts, toolset installs,
computer-use grants via $desktopActionTasks/preview restart) — yet
clicking it opens AgentsView, which renders only subagents. A failed
gateway restart therefore showed "Agents (1 Failed)" over an empty
"No live subagents" tree. AgentsView also filtered to the active session,
so a subagent running in a background session showed "Agents N running"
with nothing in the tree (the desync reported in #49808).
Unify the scope both surfaces speak:
- AgentsView aggregates subagents across every session (salvages #49819).
- The indicator's running/failed counts come from subagents only
(aggregated), never background system actions — those keep their own
surfaces in settings / command center.
So "Agents (N …)" now always points at a populated Spawn tree.
Supersedes #49819. Fixes#49808.
When Telegram's sendRichMessage returns a FloodWait/RetryAfter error,
_try_send_rich() now extracts the server-provided retry_after value and
propagates it through SendResult.retry_after. The base _send_with_retry()
layer honors this value instead of using its default short exponential
backoff (~2s, ~4s), preventing the retry budget from being exhausted
against a server that demands a 25-37s wait.
Salvaged from #46774 by @liuhao1024. Telegram adapter path moved from
gateway/platforms/telegram.py to plugins/platforms/telegram/adapter.py
since the original PR.
Closes#46762
Closes#47707
Context engines and memory providers expose tool schemas via
get_tool_schemas(). agent_init.py wrapped each as
{"type":"function","function":_schema} without validating that
_schema carries a top-level name. A provider returning an entry already
in OpenAI tool form ({"type":"function","function":{...}}) was then
double-wrapped into a tool whose function has no name. Strict providers
(e.g. DeepSeek) reject the entire request with HTTP 400
'tools[N].function: missing field name', so one malformed schema
silently disables the whole toolset and breaks every turn. The schema
was also never added to valid_tool_names, so even lenient providers
could not call it.
Add a shared normalize_tool_schema() helper that unwraps an
already-wrapped entry and returns None for anything lacking a resolvable
string name. Wire it into the agent_init context-engine loop and all
three memory_manager surfaces (inject_memory_provider_tools,
add_provider routing index, get_all_tool_schemas), so a single bad
plugin schema is skipped with a warning instead of poisoning the
request.
Verification: 209 targeted agent/memory tests pass (incl. 9 new).
New tests assert the unwrap + skip-nameless behavior and fail without
the fix.
When tempfile.mkdtemp() raises OSError (e.g. disk full), the exception
propagated past the try/finally block, so _mark_install_failed() was
never called. The 24h backoff marker never engaged, causing unbounded
retry on every command -- each attempt leaked a tirith-install-* temp
directory, eventually filling /tmp completely.
Fix: wrap mkdtemp in its own try/except OSError, returning
(None, "no_space") so the caller's normal failure path (including
_mark_install_failed) executes.
Salvaged from #51831 by @liuhao1024.
Closes#51826
When delegate_task spawns a child agent with a different model/provider, the
child's init_agent loaded the plugin context-engine GLOBAL singleton by
reference (`_selected_engine = _candidate`) and then called update_model() on
it with the child's (smaller) context_length. Because parent and child shared
the same object, this mutated the PARENT's compressor: e.g. DeepSeek 1M ctx
silently dropped to 204800 and the compression threshold from 200K to 40K
after any delegate_task with a different model.
Deepcopy the singleton before assigning/mutating it (agent_init.py) so the
child gets its own instance and the parent's compressor is untouched.
Salvaged from #42452 by @liuhao1024 (authorship preserved). Added a
source-pin regression test that fails if the production line reverts to the
bare alias, plus an end-to-end test driving get_plugin_context_engine() and a
StubEngine.update_model() — the original PR's tests exercised copy.deepcopy in
isolation but did not guard the actual agent_init code path.
Closes#42449. Supersedes #42469, #42474 (same one-line fix, no test).
A single ddgs (DuckDuckGo) search could hang indefinitely and block the
shared agent loop — and therefore every platform (CLI, Telegram, Matrix...).
The DDGS constructor's timeout only bounds individual HTTP requests; ddgs's
multi-engine retry loop has no overall cap, so a slow/rate-limited response
could spin for 20+ minutes with no output and no error.
Run the synchronous ddgs call in a single-worker ThreadPoolExecutor and cap
it with future.result(timeout=_SEARCH_TIMEOUT_SECS=30). On timeout, return a
clear failure ("DuckDuckGo search timed out ... try a different provider")
instead of blocking; the pool is shut down with cancel_futures so a hung
worker is never awaited.
Salvaged from #37422 by @uzunkuyruk (authorship preserved). Re-applied on
current main (the PR's provider.py base had diverged). Added a load-bearing
timeout regression test (the original PR only updated the fake's constructor
and had no timeout-behavior test) — mutation-verified to fail without the cap.
Closes#36776.
_strip_blocked_tools used a hardcoded set missing 'cronjob'. Children
on gateway platforms could inherit the cronjob toolset, scheduling
persistent jobs that outlive the delegation despite DELEGATE_BLOCKED_TOOLS.
Fix: derive the strip set from DELEGATE_BLOCKED_TOOLS at runtime so the
two lists can never drift. Add 'cronjob' to DELEGATE_BLOCKED_TOOLS for
documentation consistency. Two regression tests lock the invariant.
Salvaged from #43687 by @riyas22. Adapted test to current main (no
'messaging' toolset exists -- send_message is intentionally not
registered as an agent tool).
Closes#43466
Corrupted sessions.json entries (e.g. a bare bool where a dict is
expected) caused TypeError on 'origin' in data' which escaped the
(ValueError, KeyError) inner except and aborted loading ALL remaining
sessions, not just the corrupted one.
Two-layer fix:
- Loop level: isinstance(entry_data, dict) guard before from_dict
- from_dict: isinstance(data['origin'], dict) instead of bare truthiness
- Added TypeError to the inner except as defense-in-depth
Closes#46994
The preflight-compression gate only ran the (expensive) token estimate when
the message COUNT exceeded protect_first_n + protect_last_n + 1. A session
with a handful of very large messages never tripped the count condition, so
compression was never attempted and the turn eventually hit a hard
context-overflow error.
Add _should_run_preflight_estimate() with OR semantics: run the estimate when
either the message count exceeds the protected ranges (the historical gate)
OR a cheap char-based estimate already crosses the configured threshold. The
downstream estimate_request_tokens_rough() stays authoritative — this is only
a hint that decides whether to pay for the full estimate.
Salvaged from #27435 by @texhy (authorship preserved). Re-applied on current
main: the preflight gate moved from conversation_loop.py to turn_context.py
since the PR was opened, so the helper + gate are placed there; the test
imports the real MINIMUM_CONTEXT_LENGTH instead of a hardcoded literal.
Closes#27405.
Add compression.minimum_context_floor config key that allows users
to lower the compression threshold floor below the hardcoded 64K
default, preventing infinite tool-call loops on models whose
structured output degrades well before 64K tokens.
- agent/model_metadata.py: add get_configurable_minimum_context()
helper with 16K hard safety limit
- agent/context_compressor.py: accept minimum_context_floor param,
thread it through _compute_threshold_tokens
- agent/conversation_compression.py: use compressor's floor for
aux model context validation
- agent/agent_init.py: read compression.minimum_context_floor from
config and pass to ContextCompressor
- gateway/run.py: cache-busting includes new key
Salvaged from #31686 by @Tranquil-Flow onto current main.
Resolves conflicts with in-place compaction (#38763) and max_tokens
threshold computation (#43547) that landed after the original PR.
Closes#31600
The drift guard (introduced for #26045) correctly protects replace/remove
from clobbering un-roundtrippable content, but it also fires on the add
path. Since add only appends and never overwrites, the guard is
unnecessary and causes false positives when prior add() calls in the same
session shift the byte count of the on-disk file.
Add skip_drift parameter to _reload_target() and pass True from add().
Replace/remove continue to use the drift guard unchanged.
Salvaged from #42880 by @liuhao1024.
Closes#42874
When no reference-capable image backend is configured, generating a pet is
impossible — so instead of a dead prompt + post-hoc error, the overlay now
detects it up front and offers a way out:
- pet.generate.status RPC reports whether a reference-capable provider
(OpenRouter / Nous Portal / OpenAI) is set up; the overlay probes it on
open and swaps the prompt for a friendly setup card (paw, one-line copy,
"Set up image generation" → /settings?tab=providers, key links).
- useRouteOverlayActive(): reusable hook so any portaled modal yields the
screen to a full-screen route overlay (e.g. settings) and reappears —
re-running its mount effects — on return, instead of closing. The probe
re-runs on that remount, so adding a key flips the card to the prompt.
The /new (and /reset) confirmation-button callback runs the slash-confirm
handler on the asyncio event loop (see _request_slash_confirm). That handler
calls _handle_reset_command, which invoked the SYNCHRONOUS, potentially
long-blocking _cleanup_agent_resources inline: agent.close() tears down
terminal sandboxes, browser daemons and background processes (subprocess
waits), and shutdown_memory_provider() can make a network call. A slow
teardown wedged the entire event loop, so the bot went silent and stopped
processing all messages until a manual restart.
Offload _cleanup_agent_resources via the existing contextvar-preserving
_run_in_executor_with_context helper, bounded by asyncio.wait_for with a
named _RESET_CLEANUP_TIMEOUT_S (30s). The loop is never blocked; on timeout
the reset proceeds and the worker thread is left to finish on its own (it
cannot be cancelled). The text /new path is unaffected (already off-loop).
Tests (tests/gateway/test_35994_reset_button_deadlock.py): the loop keeps
ticking while close() blocks in its worker thread; a cleanup that raises is
swallowed (warning logged) and the reset still rotates the session; a
cleanup that times out degrades gracefully. All three are mutation-verified
to fail without their respective production branch.
- pet.generate / pet.hatch (parallel rows, off the reader thread) +
cooperative pet.cancel; pet.export / pet.rename.
- pet.gallery localOnly fast path + background manifest prefetch so the
picker never blocks on petdex; rename follows the active-pet config.
- gateway request gains optional timeout + AbortSignal for real Stop.
Reference-grounded image provider over the OpenRouter-compatible
chat-completions image protocol (Gemini Flash Image et al.). Nous Portal
proxies OpenRouter, so one provider serves both — giving pet generation a
reference-capable backend beyond OpenAI gpt-image.
Turn a text prompt into a petdex-spec spritesheet (8×9 grid of 192×208
cells), grounded so every animation row stays the same creature:
- orchestrate: base drafts (distinct variation nudges) → per-row grounded
generation → atlas compose; one image call per row, rows fan out in parallel.
- atlas: frame-perfect registration in normalize_cells — 1-D cross-correlation
of each frame's column-mass profile locks the body (robust to limbs/cape),
one shared per-state scale, bottom-anchored; plus alpha-hole repair, gutter
severing, and interior-seeded chroma-pocket clearing.
- prompts: pixel-art-by-default style hints + registration constraints.
- store: local pet write (register_local_pet), slugify/unique_slug,
export_pet, slug-realigning rename_pet, createdBy provenance.
The desktop window opened at a hardcoded 1220×800 every launch, discarding
whatever size and position the user left it at (#39101) — on macOS the dock
reopen was the most visible case, but every restart reset it.
A small window-state.json under userData (same pattern as connection.json /
updates.json) records the window's normal bounds plus its maximized flag,
written debounced on resize/move/maximize and flushed on close, applied on the
next createWindow(). getNormalBounds() captures the pre-maximize size so an
un-maximize next session lands where the user actually sized it.
Restore is defensive: sanitize rejects garbage, drops off-screen positions
(window falls back to Electron centering), and caps a size saved on a
since-disconnected larger monitor to the largest current display. The geometry
math lives in a side-effect-free window-state.cjs so it unit-tests with
node --test, no Electron boot. No new dependency.
Salvages #39154 by @jeffrobodie-glitch — same userData approach and validation
intent, reimplemented tighter and folded into one module.
Co-authored-by: jeffrobodie-glitch <jeffrobodie@gmail.com>
After /stop, the next user message can hit a stale generation token and
return with api_calls=0, no failure, no interruption. _normalize_empty_agent_response
fell through to an empty string, so the gateway logged "response=0 chars"
and sent nothing — the message was silently lost while internal work
sometimes continued.
Add the api_calls==0 / not-failed / not-interrupted / not-partial branch
to the single normalization chokepoint so the user gets a short retry hint
instead of silence. Regression test asserts the hint surfaces.
Salvaged from #33851 (re-applied on current main; original was 1401 commits
behind and the function had moved).
Follow-up to the salvaged venv-recreate fix. Three changes to the
Install-Venv pre-delete sweep:
- Match the venv path with a case-insensitive StartsWith instead of the
PowerShell -like operator. A venv path containing wildcard
metacharacters ('[', ']') — legal in a Windows user name — silently
fails to match under -like, which would let the locking process slip
through and reintroduce the exact access-denied failure this fix
closes.
- Retry Remove-Item once after a short pause. A force-killed process can
take a moment to release its file handles, so the first delete may
still hit a locked .pyd; retry before failing the stage.
- Note in a comment that the gateway autostart task runs at LIMITED
integrity as the current user, so the installer always runs at
equal-or-higher integrity and can read the process executable path,
and that Get-CimInstance is preferred over Get-Process because it
returns a null path for an uninspectable process instead of throwing.
Adds a regression test asserting the recreate branch sweeps by venv path
prefix, uses StartsWith rather than -like, and runs the sweep before
Remove-Item.
Covers issues #47036, #47557, #47910.
The Windows venv-recreate guard only runs `taskkill /IM hermes.exe`, but the
gateway that a scheduled task or watchdog autostarts runs as
`pythonw.exe -m hermes_cli.main gateway run` straight out of venv\Scripts\.
Its image name is python/pythonw, so taskkill never matches it; it keeps the
venv's native extensions (e.g. tornado\speedups.pyd) loaded, and the following
Remove-Item fails with "Access to the path is denied" -- aborting boot at the
venv stage so the desktop app never loads.
Additionally stop any process whose executable lives under this venv, matched
by path so the image name is irrelevant and a global/system python outside the
venv is never touched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Let setup.runtime_check accept an optional provider, persist the selected
provider/model before the gate, and validate the provider the user just
connected instead of a stale config entry such as anthropic.
The apps/desktop workspace was bumped to 0.17.0 in apps/desktop/package.json
but package-lock.json still recorded 0.15.1, so npm install reports the lock
as out of date and rewrites it on every fresh install. Regenerate the lock
(npm install --package-lock-only) to record the current 0.17.0; one-line
change, no dependency resolution churn.
When the current provider is a custom endpoint (custom or custom:*), the model
switch pipeline must NOT auto-switch to a native provider/OpenRouter based on a
static-catalog match. The user explicitly configured their own endpoint and the
same model name may be served there; silently rewriting model.provider destroys
their config.
- detect_static_provider_for_model(): skip the static-catalog scan when the
current provider is custom/custom:*
- switch_model() Step e: extend is_custom to cover custom:* so the
detect_provider_for_model() last-resort fallback cannot fire
Salvaged from #48351 by Elshayib (authorship preserved).
Fixes#48305
The TUI model-switch persistence (_persist_model_switch) rewrote the entire
model config block via save_config(), destroying sibling keys the user set
under model: (model_slots, model_fallback, base_url, ...) on every switch.
Use targeted, atomic, comment-preserving save_config_value("model.default" /
"model.provider" / "model.base_url") writes instead, so a model switch only
touches the keys it changes.
Salvaged from #48391 by kyssta-exe (authorship preserved).
Fixes#48305
Completes the #45006 fix. PR-base commit (configured-provider routing) handles
the case where a typed model IS declared in user/custom provider config. This
commit closes the other root: when a typed model is NOT in any config and the
current provider is a soft-accepting one (openai-codex / xai-oauth), the
hidden-model soft-accept (#16172 / #19729) would accept ANY unknown name as a
hidden model — so `qwen3.5-4b` typed on a Codex-default session "succeeded" and
mislabeled the provider as "OpenAI Codex" (the exact reported symptom), then
400'd on the next turn.
Gate the soft-accept to slugs that plausibly belong to the provider's family
(openai-codex -> gpt-/codex-/o1/o3/o4; xai-oauth -> grok-). Family-shaped
unknown slugs are still soft-accepted (preserving the #16172 entitlement-gated
hidden-model intent); unrelated names are rejected with actionable guidance to
pin the right provider via `--provider <slug>` or the picker.
Adds TestCodexSoftAcceptPlausibilityGate (5 tests): unrelated names rejected on
codex/xai, family-shaped hidden slugs still accepted, real catalog models
unaffected. Verified load-bearing.
When `/model X` is the FIRST message after an idle/daily/suspended auto-reset,
the slash-command path stores a session model override but leaves
`session_entry.was_auto_reset = True` (it never passes through
`_handle_message_with_agent`, which is where the flag was consumed). On the
NEXT regular message, the auto-reset cleanup block pops the freshly-stored
model/reasoning override BEFORE the flag is consumed — so the switch is
silently lost and resolution falls back to the config default, while the
session DB still shows the switched model (a two-sources-of-truth divergence).
Consume the flag at both sites:
1. gateway/run.py — capture `was_auto_reset` into a local and set the
attribute False immediately at the top of the cleanup block, so the
cleanup can't re-fire on a later message and wipe an override stored
between turns. Downstream reads use the captured local.
2. gateway/slash_commands.py — the model path consumes the flag before
storing the override, so a /model-first-after-auto-reset isn't wiped by
the next message's cleanup.
Salvaged from #48062 by x7peeps (authorship preserved).
Tests: tests/gateway/test_48031_model_switch_after_auto_reset.py — AST
invariants pinning both consume sites (load-bearing; verified they fail when
either consume is removed). Mirrors the AST-pin approach in
test_35809_auto_reset_clean_context.py. Gateway session/reset suite: 16 passed.
Fixes#48031
Salvage of PR #48927 by @ehz0ah, which consolidates OpenViking recall
work from #41706 (@huangxun375-stack), #33260, #49975, and #32444.
Replaces stale background post-turn prefetch warming with synchronous
current-query recall. The old queue_prefetch warmed the PREVIOUS user
message while turn-start recall consumed the CURRENT one, so injected
context was always about the wrong topic.
Changes:
- prefetch() now does session-aware /api/v1/search/search with the
current query, falls back to /api/v1/search/find on failure
- Contract-safe payloads: limit, score_threshold, context_type,
session_id — no top_k, no search-body mode, no target_uri
- L2 content reads for items with level=2 or empty abstracts, capped
at full_read_limit (default 2)
- Local ranking (score + query-token overlap + leaf boost), dedup,
score threshold, and injected-char budget
- queue_prefetch() is now a no-op (background warming removed)
- Additive batched viking_read: uris param accepts up to 3 URIs
- Per-request timeout support on _VikingClient.get/post/delete
- Removes stale _prefetch_result/_prefetch_thread/_prefetch_generation
state and _invalidate_prefetch_state()
- Strengthened system_prompt_block guidance
Salvage follow-up fixes:
- Expose all 8 recall config knobs in get_config_schema() (PR #48927
had removed them; #41706 correctly exposed them). Env vars remain
as internal mechanism but are now visible in setup wizard.
- Lower default timeout 8s→4s, request_timeout 6s→3s, full_read_limit
3→2 to reduce per-turn blocking latency.
Co-authored-by: Hao Zhe <haozhe4547@gmail.com>
Co-authored-by: Eurekaxun <eurekaxun@163.com>
The Discord/Telegram /model slash command listed providers synchronously
on the gateway's async event loop. list_picker_providers /
list_authenticated_providers are blocking and can fall through to a
synchronous urllib HTTP fetch when the on-disk provider cache is stale,
freezing the loop for 120-150s -> "application did not respond" and
delayed agent starts.
Port #41304's asyncio.to_thread offload to the current handler location.
The handler moved from gateway/run.py to gateway/slash_commands.py
(_handle_model_command); wrap BOTH blocking call sites so the whole bug
class is covered:
- picker path -> list_picker_providers
- text-fallback path -> list_authenticated_providers
asyncio.to_thread is already idiomatic in this module (and asyncio is
imported), so the loop now stays responsive while the (possibly
network-bound) listing runs on a worker thread.
Adds tests/gateway/test_model_command_async_offload.py asserting the
offload contract at the real handler seam for both paths (mutation-
survivable: reverting either to_thread wrap fails the matching test).
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
The Discord gateway heartbeat stalled ('Shard ID None heartbeat blocked
for more than N seconds') because _handoff_watcher polled the synchronous,
blocking SQLite-backed SessionDB directly on the asyncio event loop every
2s. Each list_pending/claim/complete/fail call performed blocking disk I/O
on the loop thread, starving the Discord heartbeat coroutine.
Wrap every blocking SessionDB call inside the watcher loop in
asyncio.to_thread(...) so the SQLite work runs on a worker thread and the
event loop (and heartbeat) stays responsive. These four call sites are the
only synchronous self._session_db.* calls inside the watcher loop body.
Adds tests/gateway/test_handoff_watcher_async_db.py asserting the watcher
offloads its SessionDB calls via asyncio.to_thread (mutation-survivable:
reverting any to_thread wrap fails the corresponding assertion).
Fixes#40695
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
When context compaction's summary generation fails, the compressor's default
path (abort_on_summary_failure=False) drops the middle window and inserts a
static 'summary unavailable' marker — destroying the compacted turns. #29559
reported the field impact: a Connection error at the compaction moment dropped
124->15 messages (110 lost) for a long browser-automation task; #25585 is the
same failure mode (failed summary commits a destructive compaction anyway).
compress() already has an EXCEPTION to the historical drop default: auth
failures (401/403) ALWAYS abort and preserve the session, because rotating into
a placeholder-summary child on a broken credential strands the user. A transient
network/connection error is the same situation in reverse: it WILL recover, and
retrying then is strictly better than discarding context for a momentary blip.
Extend the always-abort carve-out to terminal connection/network failures:
- new _last_summary_network_failure flag, set in _generate_summary's terminal
failure branch when _is_connection_error(e) (reached only after any main-model
fallback is exhausted), reset alongside the auth flag;
- compress() aborts when it's set (returns messages unchanged,
_last_compress_aborted=True), independent of abort_on_summary_failure;
- a network-specific operator warning (distinct from the auth + config-flag
messages).
Scoped to connection errors only: a generic 500/400 still takes the historical
fallback-drop path (test_non_auth_failure_still_uses_fallback_path stays green).
Tests: network-failure detection + abort-despite-flag-false, both mutation-checked
(removing the flag-set fails detection; removing the carve-out fails the abort).
hermes-pr-review findings:
- notifyError('runtime-not-ready', msg) misused the (error, fallback) API:
the key became the notification body and the message became the title.
Switch to notify({ id, kind, title, message }) which puts content in the
right slots.
- The stable id 'runtime-not-ready' deduplicates: notify() replaces by id,
so repeated refreshOnboarding calls during an outage no longer stack
up to 4 persistent error toasts.
- Remove dead !state.manual guard from shouldPreserveConfiguredOnFallback:
refreshOnboarding already short-circuits on manual before the helper.
- Test: seed localStorage with '1' before asserting it survives (was testing
the wrong invariant — null in, null out).
- Test: use static import for spy instead of fragile await import.
- Test: add negative case for requested=true + configured=true (should
still downgrade — requested overrides preservation).
When shouldPreserveConfiguredOnFallback keeps configured=true, also call
notifyError('runtime-not-ready', ...) so the user knows the backend wasn't
verified instead of silently proceeding. Adapted from @mohamedorigami-jpg's
approach in PR #37634.
Regression coverage for the desktop gateway-restart hang: prompt_yes_no
returns its default when HERMES_NONINTERACTIVE=1 or on a bare EOFError
(closed/redirected stdin), and still exits on KeyboardInterrupt.
The dashboard/desktop spawn gateway actions with stdin=DEVNULL and
HERMES_NONINTERACTIVE=1 (hermes_cli/web_server.py), but prompt_yes_no
ignored that contract and called sys.exit(1) on the resulting EOFError.
On Windows, `gateway start` asks "Install it now so the gateway starts on
login? [Y/n]" when the scheduled task / startup entry is not yet
installed. Spawned from the desktop app there is no stdin to answer it, so
every desktop-triggered gateway restart aborted at that prompt and the
gateway never started ("Gateway service is not installed").
Fall back to the prompt's default when HERMES_NONINTERACTIVE is set, and
treat a bare EOFError as "accept default" rather than exiting. This lets
the Windows start path proceed unattended (Startup-folder fallback + direct
spawn) while interactive TTY usage is unchanged. Ctrl+C still exits.
The wire contract said hop 1 uses "the agent's existing Nous Portal
access token" but didn't name WHICH of an agent's two identities that is.
A hosted agent never holds an `agent:{instanceId}` OAuth client (that
shape is minted only by the interactive dashboard auth-code grant); its
own outbound portal calls use the bootstrap-session token (client
`hermes-cli-vps`) planted in auth.json on first boot. NAS must resolve
the instance id from either an `agent:{id}` client OR the bootstrap
session (AgentInstance.bootstrapSessionId), not gate on `agent:*` alone —
which 403'd every real hosted-agent provision in prod.
Documents the NAS-side fix (resolveAgentCronInstanceId) so the contract
and the implementation agree.
Phase 7 Unit 7d-B. When an operator opts an instance OUT of the Team Gateway
relay (Unit 7b deprovision), the connector revokes the per-gateway secret and
closes the gateway's WS with 4401. The reconnect supervisor previously treated
EVERY close as retryable, so the live process spun "retrying 4401" forever and
the dashboard showed a red error — opt-out looked like a failure.
Now a 4401 close that arrives AFTER a successful handshake is recognized as a
terminal credential revocation:
- ws_transport.py: track `_handshake_succeeded` (set when a descriptor is
received); on a 4401 close after a prior success, latch `auth_revoked` and do
NOT spawn the reconnect supervisor. A 4401 BEFORE any successful handshake
stays retryable (cold-start / not-yet-provisioned race, not a revocation).
New `auth_revoked` property + a websockets-version-safe close-code reader
(prefers `.rcvd`/`.sent` Close frames; `.code` is deprecated in websockets 13+).
- adapter.py: a revocation monitor turns `transport.auth_revoked` into a clean,
NON-retryable `relay_disabled` fatal and notifies the gateway's fatal-error
handler (so the adapter is removed and NOT queued for reconnection — the
credential is dead until the instance is recreated). Monitor is cancelled on
disconnect; only started when the transport exposes `auth_revoked` (prod WS).
- run.py: `_handle_adapter_fatal_error` maps the `relay_disabled` code to a
`disabled` platform_state (not `fatal`/`retrying`).
- web: PlatformsCard renders the `disabled` state with a neutral outline badge,
a PowerOff icon, and muted (not destructive-red) text + message. New optional
`status.disabled` i18n string ("Disabled").
Also bundles the Phase 7 contract-doc update (this doc is authoritative in
hermes-agent): docs/relay-connector-contract.md gains an "Author-first
resolution + the account-link (DM) path" section documenting the
multi-tenant-guild rule (D-7.2 — route by authenticated author binding, never by
guild; unlinked → fail-closed), the `/link <code>` DM flow, and the
connector-authoritative opt-out + terminal-4401 behavior this PR implements.
Tests: +2 ws_transport (4401-after-handshake terminal / no-reconnect;
4401-before-handshake stays retryable) and +2 adapter (revocation → non-retryable
relay_disabled fatal + handler fired; no-revocation → no fatal). 138 relay tests
pass (incl. the contract-doc conformance test); ruff clean; web tsc clean.
Phase 7 Unit 7d-B (relay-adapter solo lane). Q17 → Option 2; Option 3 (live
de-register, no recreate) + the restart-re-provision hole deferred post-alpha.
After `hermes update`, a globally-installed agent-browser's npm postinstall
(fixUnixSymlink) re-points the global symlink (e.g. /opt/homebrew/bin/agent-browser)
at our local node_modules binary. The next update wipes node_modules, leaving a
dangling symlink that `which` still reports but exec fails on with exit 127 —
silently breaking every browser tool (#48521).
Root cause is trust-on-presence: shutil.which/Path.exists accept a name that
resolves but won't run. Add hermes_constants.agent_browser_runnable() (resolves
the path + runs --version) and gate all four resolution sites on it:
_find_agent_browser now skips a dead candidate and falls through to the next
working one (extended PATH -> local .bin -> npx), self-healing the dangling link.
dep_ensure/doctor/nous_subscription validate too; doctor warns on a broken link.
Closes#48521.
* docs: stop recommending pip install hermes-agent; point to install script
The install script is the only supported install path (it provisions a
managed, isolated uv environment). Replace bare `pip install hermes-agent`
primary-install recommendations with the curl install script, and rewrite
optional-extra snippets (`pip install "hermes-agent[X]"`) to the managed-env
form `cd ~/.hermes/hermes-agent && uv pip install -e ".[X]"` that matches the
installer and the English quickstart.
Covers English docs + zh-Hans mirrors, the achievements plugin README, and
realigns the zh-Hans quickstart to the English Desktop-installer-first layout
(dropping its stale "Method A — pip (simplest)" section).
* docs: drop pip as a supported install/update method
Removes the 'pip installs' supported-method sections from updating.md and
cli-commands.md (EN + zh-Hans): the curl install script is the only supported
way to install/update the Hermes CLI. The _cmd_update_pip pip/pipx branches
remain in code as an undocumented safety net for users who already have such an
install, but the docs no longer advertise pip as a path.
Also normalizes a bare `pip install -e '.[acp]'` to the managed-env form.
Leaves python-library.md untouched: importing AIAgent as a library dependency
into your own project is a distinct use case where pip is correct.
On a launchd-managed gateway (macOS), /restart stopped the gateway but
never relaunched it: the handler's service detection checks only
INVOCATION_ID (systemd) and container markers, so under launchd it takes
the detached path and exits 0 — which KeepAlive.SuccessfulExit=false
treats as a deliberate stop. The gateway stays silently dead until a
manual launchctl kickstart.
Detect launchd via XPC_SERVICE_NAME, which launchd sets to the job label
for processes it spawns. The probe deliberately excludes the literal
"0": interactive macOS shells inherit XPC_SERVICE_NAME=0 (a truthy
string), and routing an unsupervised interactive gateway to the service
path would make it exit non-zero with nothing to revive it.
Routing through via_service=True (rather than forcing a non-zero exit
on the detached path) matters: the detached path also spawns a helper
that relaunches the gateway, so exiting non-zero there would have BOTH
the helper and launchd respawn it — two gateways racing for the same
bot tokens. The service path spawns no helper; launchd is the single
respawner.
Fixes#43475. Supersedes the run.py-era probes in #19940/#33393 (the
handler has since moved to gateway/slash_commands.py) and avoids the
double-spawn risk in the exit-code-site approaches (#43498, #43596).
The dashboard chat sidebar's tool-call activity card was disabled in the
product — both ChatPage mounts passed showTools={false} (since #49077),
so the box never rendered. The sidebar still subscribed to tool.* events
and accumulated them in state for a panel nobody saw.
Remove the tools card, the showTools prop, the tool.* event handling and
state, and the now-orphaned ToolCall component. The /api/events
subscription stays for session.info (live title) and
dashboard.new_session_requested. The sidebar is now just the model
selector box; the session list (ChatSessionList) is unchanged.
No behavior change in the live dashboard — the tools box was already
hidden.
Add two regression tests for the salvaged #48706 fix:
- login token exchange targets platform.claude.com first
- falls back to console.anthropic.com when the new host is unreachable
Also map the salvaged contributor's noreply email in release.py
AUTHOR_MAP (CI author-map gate).
Anthropic migrated the OAuth token endpoint from
console.anthropic.com/v1/oauth/token (now returns HTTP 404) to
platform.claude.com/v1/oauth/token. The token *refresh* path already
iterated both hosts, but the two initial code-exchange call sites were
hardcoded to the dead console host, so every new Claude OAuth login
failed with 'Token exchange failed: HTTP Error 404: Not Found' and saved
no credentials.
Fix the whole bug class:
- Add _OAUTH_TOKEN_URLS [platform.claude.com, console.anthropic.com] in
agent/anthropic_adapter.py; _OAUTH_TOKEN_URL now points at the live
host for backward-compat with existing imports.
- run_hermes_oauth_login_pure() (CLI flow) iterates the list, first
success wins, mirroring the refresh path.
- hermes_cli/web_server.py (desktop dashboard flow) imports the list and
iterates it too, so the GUI login path is fixed identically.
Probe: console.anthropic.com/v1/oauth/token -> HTTP 404 (gone),
platform.claude.com/v1/oauth/token -> HTTP 400 (alive). Verified a real
Claude MAX OAuth login now succeeds end-to-end.
Most Matrix clients auto-set a room name when creating a DM (e.g.
"Alice & Bot" from participant display names), so the old
`is_direct and not has_explicit_name` heuristic classified virtually
all client-created DM rooms as "room", forcing require_mention gating
in legitimate one-on-one DMs.
member_count is now the primary DM signal: <=2 members means the room
is necessarily a 1:1 conversation, regardless of m.direct or an explicit
name. A room that grew to 3+ members but is still in stale m.direct is
still classified as a room (conflict flag set). Falls back to the
m.direct + name heuristic when the count is unavailable.
Also hardens _get_room_member_count with a joined_members API fallback
when the cache-backed state_store is empty.
Salvaged from #48554 by @justemu onto the current plugin adapter path
(gateway/platforms/matrix.py -> plugins/platforms/matrix/adapter.py).
Fixes#48551
Users who inspect ~/.hermes/sessions/sessions.json see only gateway entries
(e.g. agent:main:whatsapp:dm:...) and mistake it for the session index that
hermes sessions list / /sessions read — which is actually state.db. Issue
#49361 reported CLI sessions as 'invisible' on this premise.
- gateway/session.py: write a self-documenting _README sentinel at the top of
sessions.json explaining it's the gateway routing index and that ALL sessions
(CLI/TUI/gateway) live in state.db; skip _-prefixed keys on load so the
sentinel never round-trips into a SessionEntry.
- Harden every sessions.json reader against the sentinel: mcp_serve loader,
gateway/mirror.py, gateway/channel_directory.py all skip _-prefixed keys.
- docs/user-guide/sessions.md: warning callout naming the exact symptom.
- tests: assert prune ignores metadata sentinels; add round-trip coverage.
Component button interactions (approve/deny, slash confirm, model
picker, clarify) were not checking the pairing store for authorization.
Users approved via `hermes pairing approve` could send messages and use
slash commands (which go through the gateway authz_mixin), but button
clicks were rejected because `_component_check_auth` only checked
env-var allowlists (DISCORD_ALLOWED_USERS, GATEWAY_ALLOW_ALL_USERS,
etc.) and not the pairing store.
This was a regression from commit f6f363662 which intentionally made
component auth fail-closed when no allowlist is set (security fix for
GHSA-mc26-p6fw-7pp6), but did not account for pairing-based auth.
Fix: add a `PairingStore.is_approved("discord", uid)` check to
`_component_check_auth`, mirroring `authz_mixin._check_authorization`.
The pairing store check runs after all allowlist checks, preserving the
fail-closed behavior for non-paired, non-allowed users.
Fixes#50627
Regression coverage for the synthetic-assistant close: interrupt after a
successful tool must persist an assistant tail (placeholder when no
delivered text), real delivered text is preserved, and non-interrupted
or non-tool tails are left untouched.
Asserts resolve_runtime_provider honors target_model over the stale
persisted model.default when choosing the Bedrock dual-path api_mode:
Claude target -> anthropic_messages, Nova target -> bedrock_converse.
Both fail without the #49095 fix.
The 30-slot default could not fit Hermes's ~50 built-in commands, so
every skill command (and 20 built-ins) were silently dropped from the
Telegram \`/\` menu by default — they only worked when typed manually.
Raising the default to 60 keeps all built-ins plus common skill commands
visible out of the box while staying under Telegram's ~4KB payload limit.
Users can still tune it via platforms.telegram.extra.command_menu.
Adds a configurable Telegram BotCommand menu cap and priority list via
platforms.telegram.extra.command_menu (max_commands clamped 1..100;
priority_mode prepend|append|replace). Default cap stays 30; hidden
commands remain invokable when typed and /commands lists the full set.
Salvaged from PR #42021. Cherry-picked onto current main; the original
edited gateway/platforms/telegram.py, now relocated to
plugins/platforms/telegram/adapter.py.
The desktop composer threw an uncaught "Composer is not available" at
startup and the input went unresponsive (#49903). assistant-ui's composer
mutators (setText/send/…) throw when the thread's composer core isn't bound
yet; the read path is null-safe but the writes are not. ChatBar pushes draft
text via aui.composer().setText() from mount-time effects (draft restore,
clearDraft, external inserts), and the v0.17.0 popout refactor (#49488)
widened the unbound window by moving the composer out of the contain wrapper
into a sibling of the thread — so the throw surfaced as an uncaught error
that wedged the input.
Wrap every composer mutation in a setComposerText helper that swallows the
unbound-core throw. The contentEditable DOM + draftRef already hold the text
and the draft-editor sync re-applies it once the core attaches, so the draft
is never lost — only the premature state push is skipped.
Selective --clone / --clone-from / --clone-config copied .env but not
auth.json, silently dropping the credential pool — including OAuth tokens
(Anthropic `claude /login`, Codex, xAI) that never land in .env. A profile
cloned from an OAuth-authenticated default therefore resolved a different
provider (or none) than the source under provider: auto. --clone-all already
carried auth.json via the full copytree; only the selective path missed it.
Add auth.json to _CLONE_CONFIG_FILES and tighten it to 0o600 after copy,
matching .env semantics.
On Windows, start_server() served uvicorn via a bare asyncio.run(_serve()),
which uses the default ProactorEventLoop. uvicorn's socket-serving stack
assumes a SelectorEventLoop on win32 (uvicorn/loops/asyncio.py forces it, and
uvicorn.Server.run threads config.get_loop_factory() into its runner for
exactly this reason). Driving uvicorn on the proactor loop makes
server.startup() bind a socket that never accepts: the dashboard and desktop
backend print "Skipping web UI build" then hang forever with the port
LISTENING but no TCP handshake completing.
Fix is win32-scoped to keep the blast radius minimal: POSIX keeps the exact
asyncio.run(_serve()) it had (its default loop is already a SelectorEventLoop /
uvloop, which is what uvicorn serves on). Only on Windows do we mirror
uvicorn.Server.run and run on the loop factory uvicorn picks, with a fallback
to WindowsSelectorEventLoopPolicy for uvicorn < 0.36.
Fixes hermes dashboard and hermes desktop (the Electron app spawns a
hermes dashboard backend). The gateway symptom in the report has a separate
root cause (no uvicorn) and is not addressed here.
uperLu's #50958 renamed plugins/cron → plugins/cron_providers but left
two test files patching the now-gone plugins.cron.chronos.verify path,
which would fail collection. Point them at plugins.cron_providers.*.
Add uperLu to release.py AUTHOR_MAP.
The dashboard Profiles view showed "Gateway stopped" for a gateway that
is in fact running — while the sidebar status strip and `hermes gateway
status` (CLI) both correctly showed it running. Reported on v0.17.0
running the gateway + dashboard in one Docker container.
Root cause: three liveness surfaces with three detection strengths, all
reading the same `gateway.pid`:
- `hermes gateway status` -> find_gateway_pids() (process-table scan)
- sidebar /api/status -> get_running_pid() + gateway_state.json PID
fallback + health-URL probe
- Profiles view -> _check_gateway_running() = get_running_pid()
ONLY, no fallback
`get_running_pid()` short-circuits to None the moment the runtime lock
(`gateway.lock`) doesn't register as held by the *calling* process —
which is always true when the reader is a separate process from the
gateway (the dashboard is its own s6 service in the container), and also
for any launch-service-managed gateway that left a fresh
`gateway_state.json` but no live PID file. So the Profiles view alone
reported the live gateway as stopped.
Fix: give _check_gateway_running the same fallback the sidebar already
has — after the pid-file/lock check misses, validate the PID recorded in
that profile's gateway_state.json against the live process table via the
existing get_runtime_status_running_pid(). read_runtime_status() gains an
optional path arg so a profile's state file can be read without mutating
the process-global HERMES_HOME (preserving the contextvar-based profile
isolation the dashboard relies on). Backward compatible: every existing
caller passes no argument.
Tests: a regression test that fails pre-fix (live gateway, lock check
returns None -> must still report running) and a guard test that a
'stopped' state file is never reported running even with a live PID.
The contributor PR stamped runner._exit_code=78 on non-retryable startup
errors, but start_gateway()'s clean-exit branch returned True before the
SystemExit(runner.exit_code) site, so main() exited 0. The s6 finish
script's [ "$1" = "78" ] check never matched and s6 crash-looped the
gateway anyway — the fix was dead as shipped (#51228).
Honor runner.exit_code in the clean-exit branch: raise SystemExit(code)
when set, else return True (normal /restart clean exit). Add a
start_gateway()-level test that asserts process-level SystemExit(78)
propagation — the gap the PR's object-level test missed — plus exit_code
on the existing _CleanExitRunner mocks.
Profiles without their own messaging token inherit the default
profile's token via os.getenv, hit a token collision, and exit with
startup_failed. s6 restarts them immediately, creating ~30MB tirith
sandbox dirs in /tmp each cycle — filling the disk in hours (#51228).
Changes:
- gateway/restart.py: add GATEWAY_FATAL_CONFIG_EXIT_CODE = 78
- gateway/run.py: set exit_code=78 on non-retryable startup errors
(token collision, no platforms)
- hermes_cli/service_manager.py: add _render_finish_script() that
translates exit 78 → exit 125 (s6 permanent failure)
- hermes_cli/container_boot.py: write finish script alongside run
script during profile registration
The s6 finish script pattern follows docker/s6-rc.d/dashboard/finish.
Closes#51228
A naive ISO timestamp (e.g. 2026-06-22T20:07:00) was anchored to the
server's local timezone via dt.astimezone(), but the due-check
(get_due_jobs -> _hermes_now()) runs in the CONFIGURED Hermes timezone.
When the two diverge (cloud host on UTC with a different timezone: set,
or vice-versa) the stored instant lands hours off the user's wall-clock
intent, so one-shots never become due and recurring jobs fire at the
wrong time. The ticker stays healthy (heartbeat + success markers fresh)
because every tick finds nothing due, matching the silent no-fire in #51021.
Anchor naive timestamps to _hermes_now().tzinfo so '20:07' means 20:07 on
the same clock the scheduler checks against. The legacy _ensure_aware path
still treats already-stored naive values as server-local for back-compat.
Fixes#51021
The cron ticker only runs inside the gateway (_start_cron_ticker); there
is no standalone cron daemon. When the gateway isn't running, next_run_at
passes but jobs never fire and last_run_at stays null — and manual
'hermes cron run' (which bypasses the ticker) appears to work, masking
the real cause. This is the most common cron support report (#51038).
cron list already warned; extend the same warning to cron create (the
moment the user is most likely to hit this) via a shared helper, and add
a pointer to 'hermes cron status'. Silent when a gateway is running, so
the gateway /cron path is unaffected.
Launching Hermes from a directory that ships its own top-level package with a
Hermes-internal name (utils/, proxy/, ui/) crashed the gateway/TUI child with
an ImportError (exit 1, crash loop): from utils import atomic_replace resolved
to the user's package.
tui_gateway/entry.py already stripped the relative cwd forms ('' / '.'), but
the launch dir also reaches sys.path as its own ABSOLUTE path (venv activation
or a project that adds itself to PYTHONPATH), which the strip missed and which
sat ahead of the Hermes root.
Centralize a hardened guard in hermes_bootstrap.harden_import_path(): drop the
relative forms AND force the Hermes source root to the front even when an
absolute cwd entry is present. Wire it into tui_gateway/entry.py and
acp_adapter/entry.py (both spawn into arbitrary cwds); hermes_cli/main.py and
gateway/run.py already insert the root at front. gatewayClient.ts now also
exports HERMES_PYTHON_SRC_ROOT for defense in depth.
Builds on @wgu9's runtime-tracking fix: now that find_gateway_pids() can
see a no-supervisor `gateway restart` runtime, have stop_profile_gateway()
fall back to an orphan-aware, profile-scoped reap (SIGTERM then SIGKILL)
when the pidfile/runtime record is missing or stale. Closes the duplicate-
accumulation path in #51325 — a follow-up restart now kills the prior
orphan instead of stacking another listener on :8644. Gated on
not supports_systemd_services() so a transient `gateway restart` argv on
supervised hosts is never killed.
Also adds the AUTHOR_MAP entry for the salvaged contributor.
atomic_yaml_write (and two sibling config writers) called yaml.dump
without allow_unicode=True. The default personalities shipped in cli.py
contain emoji/kaomoji, so PyYAML escaped astral-plane chars as 8-digit
\\UXXXXXXXX sequences inside multi-line double-quoted strings wrapped
with \\ line-continuations. Stricter/non-PyYAML parsers, editors, and
hand-edits break that structure into unclosed quotes, failing the whole
config parse -> silent fallback to defaults -> custom_providers lost.
Add allow_unicode=True to the canonical writer plus tui_gateway/server.py
and the telegram adapter's atomic config write so config is written as
readable UTF-8 with no escape/fold artifacts.
Fixes#51356
deliver=origin (or omitted) from a TUI or classic-CLI session produces a
job with origin=null, because those sessions never populate the
HERMES_SESSION_PLATFORM/CHAT_ID context vars that _origin_from_env reads.
The scheduler then resolves no delivery target and skips delivery — the
job runs and saves output to last_output, but nothing reaches the user
and they only find out by polling cronjob(action='list') (#51568).
This is by design (local sessions have no live-delivery channel), so the
fix surfaces it instead of silently dropping the intent:
- cronjob create now appends an informational notice to its result when
a created job resolves to zero delivery targets and the user did not
explicitly ask for deliver='local'. The check uses the scheduler's own
_resolve_delivery_targets so it accounts for origin, home channels,
'all', and explicit platform targets — no false positives.
- PLATFORM_HINTS gains a 'tui' entry (the TUI had none) and the 'cli'
hint now states that cron jobs from these sessions are local-only and
that deliver must target a gateway-connected platform to notify the
user. This stops the agent promising a delivery that never happens.
No scheduler/delivery behavior change; no new env var; cron isolation
invariant untouched.
Adds the #51579 regression test the issue asked for: run the real
docker_config_migrate.py boot path twice (host-reboot scenario under
--restart unless-stopped) and assert $HERMES_HOME/.env survives
byte-for-byte and the second boot is a no-op (no re-migration, no new
backup). Exercises real migrate_config + real file I/O via subprocess.
spectrum-ts routes stream telemetry through @photon-ai/otel's createLogger,
which sends severity>=ERROR to console.error and WARN/INFO to console.log.
The two lines the health monitor keys off land on different channels:
log.error("stream persistently failing") -> console.error (caught), but
log.warn("stream interrupted; reconnecting") -> console.log (was missed).
The original interception patched console.error only, so the recovering->
degraded escalation counter never saw the interrupt bursts that are the
primary silent-inbound symptom. Verified live against spectrum-ts 3.1.0 +
@photon-ai/otel: 3 real log.warn('stream interrupted') calls now escalate
to degraded -> process.exit(75) -> adapter reconnect.
Adds a shared classifyStreamLog() fed by both console.error and console.log,
plus a regression test asserting both channels are intercepted.
Source-level regression guard (the script only runs on Windows, so there's no
runner on Linux CI). Asserts Resolve-AvailablePythonVersion exists, that
Install-Venv re-resolves the interpreter before the venv-creation line, and
that Test-Python and the resolver share the single $PythonFallbackVersions
constant so detection and venv creation can't drift apart again.
The Windows installer runs each -Stage NAME in its own powershell.exe under
Hermes-Setup.exe. Test-Python records a detected fallback (e.g. 3.12 when 3.11
is absent) via an in-memory $script:PythonVersion = $fallbackVer mutation,
which dies with the python stage's process. The fresh venv stage starts with
$PythonVersion back at its "3.11" default, so it logged "Creating virtual
environment with Python 3.11..." and ran uv venv venv --python 3.11, failing
with exit 2 on machines that only had the fallback installed.
Add a cross-process-safe Resolve-AvailablePythonVersion helper (preferring the
requested version, then the shared $PythonFallbackVersions list, probed via
uv python find) and call it at the top of Install-Venv before creating the
venv. Test-Python's fallback loop now iterates the same shared constant so
detection and venv creation can't drift.
The contract already documents the scale-to-zero PRIMITIVES (§3.2 going-idle/
buffered-flip, §3.3 wake poke) and what's out of scope. This adds the missing
half: the contract FROM the primitives TO the behaviour layer — the guarantees
a separate scale-to-zero workstream must honour to consume them safely (register
a wakeUrl before suspend; drain+ack before teardown; keep the reconnect loop
live; treat suspended != down in the health model; don't assume exactly-once/
prompt wake; suspend only when genuinely idle, composing with the existing drain
machine). Docs-only; lets the independent scale-to-zero stream build against a
written contract instead of re-reading the connector.
The cherry-picked commit is authored by jinhyuk9714@gmail.com (GitHub
sjh9714); the check-attribution CI gate requires every PR commit author
to be present in scripts/release.py AUTHOR_MAP.
_ensure_uv_for_termux only checked resolve_uv() (the managed
$HERMES_HOME/bin/uv) before falling back to pip, so a uv installed via
`pkg install uv` lives on PATH but is invisible to the helper. Combined
with the cherry-picked wheel-only fallback, a Termux user with no managed
uv still hit `pip install uv`, which has no Android wheel and tried to
source-build the Rust crate, OOM-killing low-memory devices.
Probe shutil.which("uv") right after the Termux guard and reuse it before
pip. Add a regression test that keeps resolve_uv() returning None while a
uv exists on PATH and asserts pip is never invoked.
Register a per-instance wakeUrl and forward it to the connector at
self-provision so a suspended gateway can be poked awake when buffered
work arrives (pairs with the connector-side WakePoker).
- relay_wake_url() resolver (env GATEWAY_RELAY_WAKE_URL, then
gateway.relay_wake_url in config.yaml), mirroring relay_instance_id()
- thread wake_url through _post_provision (adds wakeUrl to the body only
when set) + self_provision_relay (resolve, forward, log)
- hermes gateway enroll --wake-url <url> persists GATEWAY_RELAY_WAKE_URL
- document the §5.2 wake poke in relay-connector-contract.md §3.3
- tests: relay_wake_url resolution (env/config/absent), provision
forwarding, body-only-when-set (6 new; 130 relay tests pass)
The actual reconnect+drain on wake is Unit B's loop; this unit only
wires the wake SIGNAL. Opt-in: absent wakeUrl => connector never pokes.
install_pet now refuses spritesheet/pet.json URLs that aren't on a petdex
host (matching thumbnail_png's existing _is_petdex_host guard), so a
spoofed manifest can't redirect a download at an arbitrary host. Slugs
are normalized to a single path segment before indexing into pets_dir(),
closing a path-traversal vector in load_pet/remove_pet/install_pet.
The gateway half of the going-idle/buffered-flip primitive (scale-to-zero
PRIMITIVE, not the behaviour). Integrates with the EXISTING drain transition:
- ws_transport: `go_idle()` sends `going_idle` + awaits the connector's
`going_idle_ack` (connector-authoritative flip-then-ack, Q-5.3c — stays
serving until the ack so nothing is lost in the flip window); acks a buffered
inbound (bufferId present) via `inbound_ack` after the handler runs
(drain-without-dup on the delivery leg); NET-NEW reconnect loop re-dials +
re-handshakes after an unexpected close (off by default, on in production).
- adapter: emits `going_idle` from its existing `disconnect()` drain seam before
tearing down the socket; best-effort + guarded (never blocks shutdown).
- transport Protocol + contract doc §3.2 document the 3 new frames.
+6 relay tests (124 pass). NOT in scope: the autonomous idle timer / machine
suspend / NAS health model (deferred behaviour). Ben's relay-adapter solo lane.
* fix(windows): harden gateway scheduled task
* fix(windows): launch gateway scheduled task via console-less wscript
The Scheduled Task ran the gateway through cmd.exe, which allocates a
console. During logon Windows broadcasts CTRL_CLOSE_EVENT to console
process groups, reaping cmd.exe and the half-initialized gateway with
STATUS_CONTROL_C_EXIT (0xC000013A) - which Task Scheduler treats as a
user cancel, so RestartOnFailure never fires and the gateway vanishes on
every reboot (issue #45599 root cause #1).
Add a console-less .vbs launcher (wscript.exe -> pythonw.exe, both
GUI-subsystem) mirroring the gateway.cmd env + argv, and point the task
action at it. The .cmd stays for the Startup-folder fallback and /Run.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Jeff <jeffrobodie@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When _auto_create_thread() creates a thread from a user message via
message.create_thread(), Discord fires a second MESSAGE_CREATE event
for the 'thread starter message'. That starter message carries
message.id == thread.id and may arrive with type=default instead of
type=21 (thread_starter_message), so the existing type filter in
on_message does not catch it — triggering a second call into
_handle_message and thus a second agent run and response.
Fix: after _auto_create_thread succeeds and returns a thread, pre-seed
the dedup cache with str(thread.id) via self._dedup.is_duplicate().
The dedup cache is the same TTL-based MessageDeduplicator that already
guards against Discord RESUME event replays. Calling is_duplicate()
marks the ID as seen; when the duplicate thread-starter MESSAGE_CREATE
arrives, on_message's guard returns True and the event is dropped.
This is a minimal, targeted fix:
- No new state: reuses the existing _dedup instance
- No timing/race: the pre-seed happens synchronously inside the async
_handle_message, before the thread-starter event can be dispatched
- Scoped: only fires when auto-threading is enabled AND thread creation
succeeds (thread object is not None)
Also adds tests in tests/gateway/test_discord_double_dispatch.py
covering the pre-seed behaviour, failure modes (thread creation fails,
auto-thread disabled), and dedup cache integrity.
Closes#51057
Required by contributor-check/check-attribution before salvaging PR #51129
(Discord thread-starter dedup, #51057). The CI step greps AUTHOR_MAP by
exact email and does not special-case noreply addresses.
_session_task_is_stale() failed to detect a stale session lock when the owner
task completed and cleaned _session_tasks (del in _process_message_background's
finally) but _active_sessions was NOT released because _release_session_guard
skipped on a guard mismatch (a concurrent reset/new command or drain handoff
swapped _active_sessions[key] to a different guard). With no owner task left to
inspect, _session_task_is_stale reported 'not stale', the orphaned guard was
never healed, and the session deadlocked permanently — later messages received
but never dispatched.
Reorder the finally cleanup to release-then-conditional-delete: release the
guard first, then drop the _session_tasks entry ONLY if the guard was actually
released (session_key no longer in _active_sessions). On a guard mismatch the
done-task entry survives, so the on-entry self-heal (_session_task_is_stale ->
_heal_stale_session_lock) detects the stale lock and clears it on the next
inbound message.
Extracted the cleanup into a callable _cleanup_finished_session_task() helper so
the regression test drives the REAL production code path rather than a copy of
its logic (the original test inlined the fixed logic and passed regardless of
the production order — mutation-verified the rewritten tests now fail on the
buggy del-first order). Added a positive-path test (guard matches -> release +
delete) so both branches are pinned.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Open-ended skill learning across every surface. /learn <free text> takes a
description of any source — a directory, a URL, the workflow you just walked
the agent through, or pasted notes — and the live agent gathers it with the
tools it already has (read_file/search_files, web_extract, the conversation,
the pasted text), then authors a SKILL.md via skill_manage following the
house authoring standards (<=60-char description, the standard section order,
Hermes-tool framing, no invented commands).
No engine, no model-tool footprint, works on any terminal backend (local,
Docker, remote): /learn builds a standards-guided prompt and hands it to the
agent as a normal turn.
- agent/learn_prompt.py: shared standards-guided prompt builder
- /learn registry entry (both surfaces) + CLI handler (inject onto input
queue) + gateway handler (rewrite turn, fall through, /blueprint pattern)
- tui_gateway command.dispatch returns a send directive -> TUI + dashboard chat
- dashboard Skills page 'Learn a skill' panel (dir + URL + open-ended text)
composes a /learn request and runs it in chat
- docs (slash-commands ref + skills feature page), 11 targeted tests
Inspired by OpenAI Codex's Record & Replay and the /learn concept from #47234
(dir-distillation engine); reworked to be open-ended and engine-free per
review.
PTB's HTTPXRequest builds its httpx.AsyncClient with
`limits = httpx.Limits(max_connections=connection_pool_size)` and no
keepalive tuning, so httpx's default keepalive_expiry=5.0 applies. Behind
an HTTP proxy (Cloudflare Warp etc.) a peer-initiated FIN can sit in
CLOSE_WAIT longer than that, leaking fds in the general request pool
(_request[1], which routes bot.send_message/set_my_commands) — the pool
_drain_polling_connections never resets. Telegram was the lone holdout
adapter not using the shared #18451 CLOSE_WAIT helper.
Wire gateway.platforms._http_client_limits.platform_httpx_limits() into
the httpx client across ALL THREE request-construction branches —
fallback-transport, proxy, and plain — via httpx_kwargs["limits"], which
PTB spreads last into its client kwargs so our tuned limits win. PTB's
connection_pool_size (max_connections) is preserved; only keepalive
behaviour is tightened (max_keepalive_connections + keepalive_expiry<5.0).
The fix is macOS-import-safe: no Linux-only socket TCP_KEEPIDLE/INTVL/CNT
constants at module scope (unlike the broken candidate which crashed on
import on the reporter's OS), and it patches the actual proxy path the
repro hits rather than TelegramFallbackTransport, which the proxy repro
never instantiates.
Adds a mutation-survivable behavior-contract test asserting every
HTTPXRequest built by connect() receives httpx_kwargs["limits"] with
keepalive_expiry < httpx's 5.0 default, across both the proxy and plain
branches. Reverting the limits wiring fails the test.
Co-authored-by: indigokarasu <mx.indigo.karasu@gmail.com>
When a session rotates id on compression, _sync_session_key_after_compress()
re-anchored the session_key, approval-notify routing, yolo state, and slash
worker — but never moved the active-session lease, which stayed keyed to the
pre-compression id. And _find_live_session_by_key() matched live sessions on
the stale session_key, not the live agent's current agent.session_id. After
compression a resume/create path failed to recognize the existing live agent
and could build a SECOND live agent against the same DB continuation -> forked
lineage / cross-session message mixing.
- active_sessions.transfer_active_session(): move a lease in place to the new
id under the exclusive file lock (no slot drop).
- gateway _transfer_active_session_slot(): call it inside
_sync_session_key_after_compress(); on the rare fallback (entry pruned)
RESERVE the new slot before releasing the old lease (reserve-before-release),
so a concurrent gateway at the session cap cannot grab the freed slot in a
release-then-reacquire window and leave this session with no lease; if the
reserve fails, keep the existing lease (review fix).
- _session_lookup_key(): make live-session lookup authoritative on
agent.session_id, wired into all stale-session_key consumers
(_find_live_session_by_key, _session_live_item, _live_session_payload) —
fixes the whole lookup class.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
/reload-mcp -> shutdown_mcp_servers -> _kill_orphaned_mcp_children(include_active=True)
-> _send_signal -> killpg(pgid, SIGTERM). When a tracked MCP stdio child shares
the gateway's OWN process group, killpg delivers SIGTERM to the gateway itself,
firing its SIGTERM handler -> os._exit(0): /reload-mcp crashes the gateway.
Pre-compute the gateway's own pgid (os.getpgrp(), None on Windows/restricted)
and, in _send_signal, skip killpg when pgid == own pgid, falling through to the
per-pid os.kill path so the child is still reaped without self-signaling.
Adds a regression test (folded in) that pins the guard: with a tracked pgid
equal to the gateway's own pgid, killpg is never called for that pgid and the
per-pid kill fallback is used. Mutation-checked.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Map Hermes xhigh→max to unlock DeepSeek V4's 'Max thinking' tier
through Ollama Cloud's OpenAI-compatible /v1/chat/completions endpoint.
low/medium/high pass through unchanged; disabled/none suppress
reasoning entirely.
Empirically confirmed: reasoning_effort:max produces ~2.5× more
thinking tokens than high on deepseek-v4-pro:cloud (1576 vs 642).
Parity with the classic CLI status bar's ⛓ indicator (PR #51441). The
Ink TUI status bar now shows ⛓ N for live background/async subagents
(delegate_task batches + background single delegations).
- tui_gateway/server.py: _get_usage() embeds active_subagents from
tools.async_delegation.active_count() — the same registry the CLI
reads — onto the existing per-update usage payload, guarded so a
raising active_count() leaves the field off without breaking usage.
- ui-tui appChrome: new 'subagents' status segment (breakpoint w>=92,
slots between bg and cost in the shed-order), renders ⛓ N from
usage.active_subagents.
- Usage / SessionUsageResponse types gain active_subagents?.
Distinct from the turn-scoped SpawnHud / /agents overlay, which mirror
live in-turn subagent.* events; this is the persistent registry count.
By default `hermes slack manifest` opts the app into Slack's AI Assistant
container (assistant_view feature + assistant:write scope +
assistant_thread_* events). Slack then renders DMs as the right-hand
Assistant split-pane, where every exchange is a thread and bare slash
commands (/help, /new, ...) are not delivered as normal command events —
they only work when the bot is @mentioned. There was no way to opt out
short of hand-editing the generated JSON.
Add --no-assistant to emit a flat-DM manifest that omits those three
pieces, so DMs render as a normal chat and slash commands dispatch
inline. The regular messaging surface (Messages tab, slash commands,
Socket Mode, channel + DM scopes/events) is preserved in both modes.
Default behaviour is unchanged (assistant mode still on).
Tests: cover both manifest modes and the argparse wiring.
The classic prompt_toolkit status bar already shows two background
indicators: ▶ N (/background agent threads) and ⚙ N (shell processes
spawned by terminal(background=true)). Background/async subagents
(delegate_task batches and background single delegations) had no
indicator despite being long-running work the user should be able to
see at a glance.
Add a third indicator ⛓ N sourced from
tools.async_delegation.active_count() — the count of delegations still
in the 'running' state. Renders in the plain-text builder and the
styled-fragment builder across the same width tiers as the other two
(omitted on the narrow <52 tier), guarded so a raising active_count()
leaves the snapshot at 0.
Adds a per-platform display.reasoning_style setting (code | blockquote |
subtext) controlling how the show_reasoning summary renders on the gateway.
Discord defaults to "subtext" (-# small grey metadata text); every other
platform keeps the fenced code block. Resolves through the existing
display.platforms.<platform>.reasoning_style override chain.
Replace the old "skips download when a system browser exists" assertions with
tests for the new behavior:
- no PATH scan for browser command names, and the "use the system browser" path
is gone;
- find_system_browser consults only an explicit AGENT_BROWSER_EXECUTABLE_PATH
override (which still skips the bundled download);
- strip_snap_browser_override runs on both install paths and a /snap/* path is
rejected, so already-affected installs auto-recover on update.
The installer scanned PATH/well-known locations for a Chrome/Chromium binary
and, when found, skipped the bundled Playwright Chromium download and wrote that
path into ~/.hermes/.env as AGENT_BROWSER_EXECUTABLE_PATH. On Snap-based systems
`command -v chromium` resolves to /snap/bin/chromium, whose sandbox blocks
agent-browser's control socket under /tmp -- so every browser_navigate hung
until the 60s timeout fired ("opening web page failed").
Drop the system-browser fallback entirely (per maintainer direction):
find_system_browser()/Find-SystemBrowser now honor ONLY an explicit, user-set
AGENT_BROWSER_EXECUTABLE_PATH override -- no PATH scan, no well-known-path scan.
A /snap/* path is rejected even when set explicitly, since its confinement is
the bug. Applied to both install.sh (Linux/macOS) and install.ps1 (Windows).
Crucially, also auto-repair already-affected installs: the bad snap path
persists in .env and is read directly by the runtime, and the installer skips
re-config when AGENT_BROWSER_EXECUTABLE_PATH is already set ("already
configured"), so a plain reinstall/update never recovered an existing user. New
strip_snap_browser_override() removes a snap-pointing AGENT_BROWSER_EXECUTABLE_PATH
(and its auto-written comment) from .env on every install/update, run from both
browser-setup paths (install_node_deps and ensure_browser), so updating is
enough to recover. A deliberately-set non-snap override is left untouched.
docker/stage2-hook.sh is intentionally untouched: it discovers the bundled
Playwright Chromium, not a system browser.
ci: centralize path-gating behind single orchestrator + all-checks-pass
gate
Replace the scattered per-workflow detect-changes pattern with a single
ci.yml orchestrator that runs the classifier once, then conditionally
calls sub-workflows via workflow_call based on lane outputs. A final
all-checks-pass job (if: always()) aggregates all results so branch
protection only needs to require one check.
Changes:
- New .github/workflows/ci.yml orchestrator (detect + conditional calls
+ all-checks-pass gate)
- Extend classify_changes.py with scan/deps/mcp_catalog lanes, absorbing
supply-chain-audit's internal changes job
- Update detect-changes/action.yml to expose the new lane outputs
- Convert all 10 PR-gated sub-workflows to workflow_call-only triggers,
removing their push/pull_request triggers and per-step detect-changes
guards (gating now happens at the orchestrator level)
- lint.yml + supply-chain-audit.yml receive event_name as a
workflow_call
input to replace github.event_name (which is "workflow_call" inside
called workflows)
- supply-chain-audit.yml: remove internal changes job + *-gate jobs
(orchestrator handles gating, booleans arrive as inputs)
- contributor-check.yml: remove internal filter step
- Update test_classify_changes.py for 6-lane output + new supply-chain
test cases
`npm ci` / `uv sync` / toolchain header fetches occasionally die on
transient network blips — e.g. node-pty's node-gyp fetching Node headers
(an undici assert) during the typecheck job's `npm ci`, which killed the job
before `tsc` ever ran. "Re-run and it goes green" is exactly what CI should
do itself.
- New reusable `.github/actions/retry` composite action wraps a command and
retries on failure (3x / 10s, command passed via env so it can't inject).
Applied to every PR-path network install: npm ci (typecheck, desktop
build, docs site), uv sync (tests, e2e), uv tool install (lint),
pip install (docs site).
- typecheck now runs `npm ci --ignore-scripts`: `tsc` needs only sources +
type defs, so skipping install scripts drops node-pty's native rebuild
(whose header fetch was the flake) and is faster. Validated locally — tsc
passes for ui-tui, apps/shared, and apps/desktop with scripts skipped.
- ripgrep download uses `curl --retry`.
Docker (main-only) and the release/windows workflows are intentionally left
for a follow-up.
The image build + smoke test + integration suite are the heaviest jobs in CI
(~9-11 min) and ran on every PR. Gate them to push-to-main and release: a
broken build surfaces on the main push, while the cheap pre-merge guards
(docker-lint hadolint/shellcheck, uv-lockfile-check) still run on PRs to
catch the common Dockerfile/lockfile breakage. Steps skip on PRs so the job
stays green; the dead PR-only arm64 cache-warm build is removed.
Heavy PR checks run on every PR because the workflows deliberately avoid
`on.paths` filters — a path-gated workflow leaves its required check pending
forever when no matching file changes, blocking merge. So a docs-only PR
still spins up the TypeScript matrix, the full Python suite, and ruff/ty.
Keep every workflow triggering on every PR (checks always report) but gate
the expensive *steps* on what the PR touches. Skipping a step (not the job)
leaves the job green, so required checks never hang — the same idiom already
proven in contributor-check.yml.
A classifier (scripts/ci/classify_changes.py) maps the PR diff to three
lanes — python, frontend, site — surfaced as step outputs by a composite
action (.github/actions/detect-changes). Fail-open: an empty diff or any
.github/ change runs everything; python is a denylist (skipped only when
every file is provably prose or a frontend-only package); skills/**/SKILL.md
counts as python-relevant since the skill-doc tests read that tree. Non-PR
events always run the full pipeline.
A Medium-integrity Hermes agent cannot drive High-integrity (admin)
windows on Windows — UIPI blocks UIA enumeration and mouse injection
(SOM returns 0 elements, clicks silently no-op, screenshots still work,
keyboard partially bypasses). OS constraint affecting every Windows
automation stack, not a cua-driver bug. Document the symptom + the
run-elevated workaround. Closes#49067.
Follow-up to the salvaged voice-clip fix: the rerouted video/mp4 branch
used {".m4a": "audio/mp4"}.get(ext, "audio/mp4"), whose sole key's value
equals the default, so it always returned "audio/mp4" regardless of the
cached extension (dead lookup + a throwaway dict per inbound voice clip).
Replace it with a module-level _SLACK_EXT_TO_AUDIO_MIME map so the reported
media_type matches the bytes we cached (e.g. a clip cached as .wav now
reports audio/wav instead of audio/mp4). STT routing already keys on the
audio/ prefix + cached filename extension, so behavior is unchanged; this
just removes the dead construct and keeps the reported mimetype coherent.
Slack in-app voice clips ("record a clip") arrive as MP4/AAC containers
(mimetype audio/mp4, filename audio_message*.mp4), and Slack sometimes
labels them video/mp4. The inbound audio handler derived the cache
extension from the mimetype and fell back to ".ogg" for anything not in
{.ogg,.mp3,.wav,.webm,.m4a} — so audio/mp4 voice messages were cached as
.ogg. OpenAI STT (whisper-1, gpt-4o-transcribe) sniffs the container from
the FILENAME extension, so it received MP4 bytes named .ogg and rejected
them. WhatsApp .ogg and uploaded .m4a worked only because their extension
happened to match the bytes.
Fix:
- _resolve_slack_audio_ext(): pick the cache extension from the real
filename first, then a mimetype map (audio/mp4 -> .m4a), defaulting to
.m4a — never the bogus .ogg fallback. Mirrors the video branch and the
audio map already in gateway/platforms/bluebubbles.py.
- _is_slack_voice_clip(): detect audio-only clips mislabeled video/mp4
via the slack_audio subtype / audio_message* filename, and route them
through the audio path (cached as audio, reported as audio/*) so they
reach STT instead of video understanding. Genuine videos (and
slack_video screen recordings) are left on the video path.
Verified end-to-end against a real audio-only MP4: old path cached it as
.ogg (ffprobe shows MP4 bytes -> container mismatch -> OpenAI rejects);
new path caches it as .mp4 (extension matches bytes -> accepted).
Adds inbound-audio tests (previously none): helper unit tests plus
_handle_slack_message E2E coverage for audio/mp4, video/mp4-mislabeled
voice clips, and a real video staying on the video path. Confirmed the
two voice-message tests fail without the fix (mutation check).
The gateway half of Phase 6 Unit ζ: project the agent's existing relevance
knobs into the connector's platform-agnostic vocabulary and declare them at boot
over the /relay/policy route, so the SAME mention-gating / free-response /
allow-bots behavior the agent applies directly also governs relay delivery (and
excluded chatter never wakes a scaled-to-zero agent).
- gateway/relay/__init__.py:
- relay_relevance_policy(): project require_mention -> requireAddress,
free_response_channels -> freeResponseScopes, {PLATFORM}_ALLOW_BOTS in
{mentions,all} -> allowOtherBots. Reads the fronted platform's config block
+ bridged top-level keys. Returns None when all-default (the connector's
quiet default already matches) or no concrete platform is fronted.
- send_relay_policy(): POST /relay/policy authenticated with the gateway's own
per-gateway upgrade token (make_upgrade_token — same bearer as the WS
upgrade), so the connector attaches it to the authenticated instance, never
a body-asserted id. Re-declares every boot (self-healing, full replace).
NEVER raises, NEVER blocks boot — relevance is an optimization layered on
the δ/ε authorization gate. Reuses the per-gateway secret + the
/relay/provision host; no new inbound surface, no new credential.
- _policy_url(): ws(s)://…/relay -> http(s)://…/relay/policy.
- gateway/run.py: call send_relay_policy() after register_relay_adapter()
succeeds (the secret is resolved by then).
- docs/relay-connector-contract.md: new §7 documenting per-instance delivery +
the management plane (/manage/* + /relay/policy) + the relevance-declaration
contract; versioning renumbered to §8. Contract conformance test stays green
(§2/§3 tables untouched).
Tests: +12 (projection mapping incl. comma-string + top-level fallback; send
auth/skip/fail-soft/non-200). Full relay suite 118 pass. The connector route is
already E2E-proven (connector repo gateway_policy_driver.py); this adds the real
gateway send-path it pairs with.
This completes Phase 6 (Team Gateway per-user isolation) end to end.
A "one-shot" is a single stateless model call that runs OUTSIDE any conversation:
it never touches session history, never breaks prompt caching, and returns plain
text. UI surfaces need this for small generative chores — a commit message from a
diff, a rename suggestion, a summary — where an agent turn would pollute the
thread and hand-rolling an LLM call at every call site would be worse.
- `agent/oneshot.py`: `run_oneshot(...)` over the existing auxiliary-client
plumbing (same path as title generation). Two call shapes: explicit
instructions/input, or a registered `template` + `variables` (templates own the
prompt engineering so it stays consistent across CLI/TUI/desktop). Ships a
`commit_message` template. Model selection inherits the live session via
`main_runtime`, else the configured aux `task` backend.
- `tui_gateway/server.py`: `llm.oneshot` RPC (long-handler) inheriting the
session's model when `session_id` resolves.
Stateless by construction — no session mutation, cache untouched.
Follow-up to the coding-context posture (#43316): that PR detects each repo's
verify loop (manifests, package manager, exact test/lint/build commands, context
files) and bakes it into the system-prompt snapshot — but only as a string, for
the model. Non-prompt consumers (the desktop verify UI) had no way to read it
without re-sniffing and drifting from the prompt.
Split detection from rendering, keeping one source of truth:
- `detect_project_facts(root) -> ProjectFacts` (frozen) holds the structured
facts; `_project_facts()` now renders it into the same snapshot lines, so the
prompt block stays byte-identical (cache-safe).
- `project_facts_for(cwd)` resolves the workspace root (git, else marker) and
returns the structured facts, or None outside a workspace.
- `project.facts` gateway RPC surfaces it to any client (desktop/TUI/ACP).
Tests assert the structured output and that the UI-facing commands never drift
from what the prompt block renders (one detector feeds both).
* Revert "fix(cron): scope job execution to its owning profile (#32091 follow-up) (#50993)"
This reverts commit 660e36f097.
* Revert "fix(cron): anchor cron storage at the default root home (not the active profile)"
This reverts commit a5c09fd176.
Register previewable artifacts from the tool row, feed a session-scoped store,
and render compact rows above the composer. Remove the inline preview card.
* feat(memory): OAuth token storage and refresh for the Honcho provider
* feat(memory): refresh the Honcho OAuth token in the client and session
* feat(memory): zero-CLI loopback OAuth authorization flow
* feat(memory): generic memory-provider OAuth connect endpoints
* feat(desktop): memory-provider OAuth connect link
* feat(memory): CLI OAuth sign-in with source-tagged authorize links
* fix(memory): IP-literal loopback redirect and consent config_path on the authorize link
* fix(memory): profile-scope the memory-provider OAuth endpoints
* refactor(desktop): generic memory-provider OAuth client functions
* docs(memory): trim OAuth module docstrings to the invariants
* docs(memory): document OAuth connect as an optional auth method
* fix(memory): send home-relative display path to consent, not the absolute path
* perf(memory): cache OAuth token expiry in memory to skip the hot-path disk read
* fix(memory): log OAuth refresh failures at warning, not debug
* feat(memory): fall back to an OS-assigned loopback port when 8765 is taken
* test(memory): cover the desktop Connect launcher, status, and provider dispatch
* fix(desktop): keep the memory-provider dropdown one size regardless of connect state
* fix(desktop): move the memory connect link to the description line, leaving the dropdown untouched
* refactor(memory): move OAuth connect routes out of web_server into a memory-layer router
* refactor(desktop): import MemoryConnect directly, drop the single-export barrel
* fix(memory): launch CLI OAuth sign-in right after the auth choice, not after the wizard
* fix(desktop): auto-clear the OAuth error state instead of leaving it sticky
* test(honcho): isolate auth-method prompt from deployment-shape wizard tests
main's wizard suite scripts the cloud prompts without the OAuth auth-method step; auto-answer it in the shared helper so the answer lists stay shape-only.
* docs(honcho): document query-adaptive reasoning level (reasoningHeuristic)
README never mentioned reasoningHeuristic and listed reasoningLevelCap as an orphaned cap with the wrong default (— vs "high"). Add the query-adaptive scaling note + the reasoningHeuristic/reasoningLevelCap rows (grouped under Dialectic & Reasoning), matching the wording already on the hosted honcho.md page, and add a pointer from the memory-providers overview.
* fix(honcho): default the CLI peer prompt to the OAuth consent name
The CLI runs the grant with apply_config=False, so the peerName the user just entered at consent was dropped and the wizard's 'Your name' prompt fell back to $USER. Surface it as a transient OAuthCredential.consent_peer_name (set even when config isn't merged) and seed the prompt default from it.
* feat(honcho): split OAuth client_id by surface (cli=hermes-agent, desktop=hermes-desktop)
resolve_endpoints now picks the client_id from the initiating surface and
threads it through authorize -> token exchange -> persisted grant -> refresh,
so the CLI and desktop register as distinct OAuth clients. Surface-specific
env overrides (HONCHO_OAUTH_CLIENT_ID_CLI/_DESKTOP) win over the generic
HONCHO_OAUTH_CLIENT_ID, which still overrides every surface.
* feat(honcho): show OAuth vs API key in status; detect existing OAuth in setup
status now prints 'Auth: OAuth (clientId, token valid Xm/expired)' instead of
masking the OAuth access token as a generic API key; setup notes an existing
OAuth grant when re-run.
* docs(honcho): drop 'shared pool' wording from unified observation mode help
* fix(honcho): cross-process lock around OAuth refresh to prevent grant revocation
The in-process threading lock can't stop a sibling process (another profile or
the desktop app sharing honcho.json) from replaying the single-use refresh
token and tripping reuse-detection, which revokes the whole grant. Guard the
read-refresh-persist section with an OS file lock on <config>.lock so only one
process rotates at a time; the others re-read the freshly-persisted token.
Best-effort: platforms without flock degrade to in-process serialization.
* refactor(honcho): one OAuth client (hermes-agent) for all surfaces
Collapse the per-surface client_id split. CLI and desktop now use a single
client_id (hermes-agent); consent branding/UI still adapt via the source query
param. One grant identity means no clientId-vs-refresh-token desync that could
get the grant revoked. HONCHO_OAUTH_CLIENT_ID still overrides for self-hosting.
* fix(honcho): per-session resolves to session_id, never remapped by title
Reorder resolve_session_name so stable identifiers win over labels: gateway
per-chat key first, then the per-session session_id, then the cwd map / title.
A (possibly auto-generated) title can no longer remap a live per-session
conversation onto a second Honcho session mid-stream — fixes the desktop, which
is per-conversation via session_id. Consequence: a gateway's per-chat key now
also wins over a title (titles never remap a stable id).
Adds a compact right-edge prompt timeline for long desktop chat sessions, with hover previews, click-to-jump, active/hover row states, and pane hover-reveal suppression so the rail can live at the hard edge without opening side panels.
The subprocess-stdin guard (TUI gateway fd-inheritance protection) flagged
the `permissions grant` call. None of the cua-driver probes/grant read
stdin, so DEVNULL is correct; apply it to the shared `_run` helper and the
grant call.
The card was macOS-only. cua-driver also runs on Windows and Linux, so
fold `cua-driver doctor` (cross-platform binary/health probes) into a
single OS-aware `ready` signal:
- macOS: ready == both TCC grants; keeps the permission rows + grant flow.
- Windows/Linux: no TCC toggles, so ready == driver health, with a
per-OS note (SmartScreen/UIAccess on Windows; X11/XWayland on Linux).
`computer_use_status()` replaces the macOS-only `permissions_status()` and
surfaces `platform`, `ready`, `can_grant`, and the doctor `checks` (non-ok
ones render as warnings). CLI `permissions status`, the REST endpoint, and
the desktop card all key off the one payload. Grant stays macOS-only (400
elsewhere — nothing to grant).
Vision mode called a `screenshot` MCP tool that cua-driver dropped in
0.5.x (full-window PNG capture was folded into `get_window_state`). The
driver replied "Unknown tool: screenshot", so `images` came back empty,
`png_b64` stayed None, and capture returned a 0x0 result with no image
on every call. `som`/`ax` were unaffected because they already use
`get_window_state`, which masked the regression.
Route vision by capability:
- driver advertises `screenshot` (older builds) -> use it (no AX walk)
- otherwise -> call `get_window_state` but discard the AX tree/elements,
returning only the PNG so vision stays free of element noise
- capabilities not yet discovered -> try `screenshot`, fall back to
`get_window_state` on an empty image, so the path self-heals
Add `_image_from_tool_result` to pull the PNG from either an MCP image
content-part or `structuredContent.screenshot_png_b64`, and use it on
the som path too so the image won't silently drop on driver builds that
deliver it via structuredContent instead of a content part.
Verified live (vision: 1568x954, 0 elements; som: image + 527 elements)
and with unit coverage of all four routing cases.
Computer Use already worked through the desktop backend (the cua-driver
toolset enables + installs via Settings -> Skills & Tools), but there was
no in-app way to see or grant the two macOS permissions it needs, so "give
a model my Mac" was tribal knowledge.
The grants attach to cua-driver's OWN TCC identity (com.trycua.driver /
the installed CuaDriver.app), not Hermes -- so no app entitlement is
involved. cua-driver 0.5+ exposes `permissions status/grant`, which we wrap:
- tools/computer_use/permissions.py: thin client over the two subcommands
- hermes computer-use permissions {status,grant}: CLI parity
- GET /api/tools/computer-use/status, POST .../permissions/grant: desktop REST
- ComputerUsePanel: live Accessibility + Screen Recording state with a
Grant button (dialog attributed to CuaDriver), shown in the expanded
Computer Use toolset row. Binary install stays in the existing provider
post-setup runner.
Follow-ups: i18n the card copy; a "Stop driver" control (cua-driver stop)
for the runaway-`serve` case.
Adds auxiliary.background_review.{provider,model} (default auto = main chat
model — unchanged). Set it to a different, cheaper model and the post-turn
self-improvement review runs there for ~3-5x lower cost.
Cache-aware by design: the main chat is warm in the prompt cache, so the
default full-history replay on the main model is cheap cache reads — left
exactly as-is. A different model can't reuse that cache (different key), so
when (and only when) routed to a different model the fork replays a compact
digest instead of the full transcript, minimising what it cold-writes on the
aux model. Same model -> full replay; different model -> digest.
Quality holds in benchmarks: memory capture identical, skill near-identical.
Nothing changes unless you opt in by naming a different model.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
The #32091 fix moved every profile's cron jobs into one shared root store,
but never wired the execution-scoping half it recommended: a job still ran
under whichever profile's ticker picked it up, not its owning profile. So a
job created under `hermes -p donna` could execute with the root profile's
.env / config.yaml / credentials.
- jobs.py: create_job auto-captures the active profile (explicit profile=
override available) and stores it on the job; resolve_profile_home() maps a
profile name to its HERMES_HOME; legacy jobs backfill to 'default'.
- scheduler.py: run_job applies the job's profile via a scoped HERMES_HOME
override (env var + in-process ContextVar) before any .env/config/script
load, restored in finally. tick() routes profile-mismatched jobs to the
single-worker sequential pool so the env mutation can't race.
- cronjob tool threads profile through (NOT exposed in the model schema, to
avoid cross-profile privilege escalation); hermes cron add gains --profile.
E2E verified against a temp HERMES_HOME with a real profile dir: a root-profile
ticker runs a profile='donna' job with HERMES_HOME=donna during execution and
restores the ticker env afterward.
File tools (read_file, write_file, patch, list_directory, etc.) used
os.path.expanduser() which reads the gateway process HOME env var.
In Docker/systemd/s6 deployments where the gateway HOME differs from
interactive sessions, tilde expanded to the wrong directory.
Add _expand_tilde() helper that delegates to get_subprocess_home() when
available, falling back to os.path.expanduser(). Replace all 9
expanduser() call sites in file_tools.py with _expand_tilde().
Follow-up to #50767, which redacted the chat-platform (_approval_notify_sync)
and SSE/API (_approval_notify) approval transports. The TUI JSON-RPC transport
is the third egress and was missed: three register_gateway_notify callbacks in
tui_gateway/server.py emitted the raw approval_data — including the unredacted
command Tirith flagged — straight to the TUI client via _emit.
Route all three registrations through a new module-level _emit_approval_request()
helper that redacts payload['command'] via the shared
gateway.run._redact_approval_command seam before emitting, matching the pattern
used for the other two transports. Completes the whole-bug-class fix for #48456.
Tests: assert the helper emits a redacted command (real credential pattern),
handles missing/None command, and a wiring guard that no registration emits the
raw payload directly (only the helper may). Both mutation-checked.
The #48456 fix series originated from @liuhao1024's #48462 — credit to them for
the original report and chat-platform fix; this completes the remaining transport.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Follow-up to the /memory approve fresh-store fix. Both the CLI fallback and
the messaging-gateway handler built a bare MemoryStore() with the hardcoded
default char limits (2200/1375), ignoring the user's configured
memory.memory_char_limit / user_char_limit. A live agent honors those
overrides (agent/agent_init.py), so an approval applied without a live agent
could accept a write the user's lower cap would reject, or vice versa.
Extract a shared tools.memory_tool.load_on_disk_store() factory that reads
the configured limits (falling back to defaults if config can't load) and
wire both the CLI and gateway handlers to it, closing the gap on both
surfaces and de-duplicating the construction block.
The CLI /memory slash handler (cli_commands_mixin._handle_memory_command)
passed self.agent._memory_store straight through, which is None when the
command runs without a live agent — e.g. /memory approve from the Desktop
GUI. The shared write-approval handler then returns "memory store
unavailable" and applies nothing, even with built-in memory enabled and
pending writes present.
Fall back to a freshly loaded on-disk MemoryStore when no live store is
available, mirroring the gateway path (gateway/slash_commands.py). It
persists to the same MEMORY/USER.md and creates MEMORY.md on the first
approved write.
Fixes#46783
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The media-delivery denylist in gateway/platforms/base.py enumerated only
.env/auth.json/credentials/config.yaml under HERMES_HOME, so other
credential stores that live at the root fell through and could be
auto-attached to chat replies. The reported case: the Google Workspace
skill's google_token.json refreshes every turn, bumping its mtime to
'now', which kept passing the strict-mode recency window and re-sent the
OAuth token on every reply.
Extend the explicit per-file denylist to mirror the canonical credential
set already enforced by the read/write guards in agent/file_safety.py:
google_token.json, google_oauth_pending.json, auth/google_oauth.json,
.anthropic_oauth.json, webhook_subscriptions.json, cache/bws_cache.json,
auth.lock, and the pairing/ token directory.
Targeted per-file additions (not a blanket ~/.hermes deny, which was
declined in #32090/#34425 because it would block skills/, logs/, and
ad-hoc agent-written deliverables). mcp-tokens/ (#37222) and
state.db/kanban.db (#41071) are left to their sibling targeted PRs.
Reported-by: xxxigm (#50912)
An unpinned cron job follows the global default provider (config.yaml
model.default + resolve_runtime_provider). If that global state is changed
after the job is created — e.g. a temporary switch to a paid provider like
nous/claude-fable-5 — the job silently inherits it on its next tick and spends
real money. This is the reported $7.73 incident: a job created under a
free/default provider later inherited a temporary paid switch.
Fix (ask #1 only) preserves the legitimate "unpinned job should follow
model.default" use case by detecting *drift* rather than freezing the model:
- create_job (cron/jobs.py): for UNPINNED, agent-backed jobs (no explicit
provider, not no_agent), snapshot the provider that resolution WOULD pick
right now into a new optional `provider_snapshot` field, resolved via the
same resolve_runtime_provider() path the ticker uses. Fail-open to None on
any resolution error so job creation never breaks.
- run_job (cron/scheduler.py): right after runtime resolution, if the job has
a provider_snapshot AND is unpinned AND the currently-resolved provider
DIFFERS from the snapshot, fail closed for that run — make no paid call and
deliver a loud, actionable alert naming both providers and telling the user
to pin explicitly (`cronjob action=update job_id=.. provider=..`).
Back-compat: jobs with no snapshot (pre-existing jobs, no_agent jobs, or any
job whose creation-time resolution failed) behave exactly as before — the
guard only engages when a snapshot exists. Explicitly-pinned jobs (job.provider
set) are unaffected since they don't drift with global state.
Tests: tests/cron/test_cron_provider_pin.py covers snapshot-matches (runs),
snapshot-differs (fail closed, no agent constructed), no-snapshot back-compat,
None-snapshot back-compat, explicitly-pinned (runs regardless), plus create_job
snapshot capture/skip/fail-open. The fail-closed case is load-bearing (fails
without the guard).
Issue #44585 asks #2-4 (hard-stop a running job, gateway-stop containment,
fail-closed on provider mutation) are out of scope for this change.
Discord enforces a hard 100-command limit per app and rejects an upsert that would push the live total over 100 (error 30032), which silently breaks ALL slash commands. The sync deleted obsolete commands AFTER creating new ones, so an app already at the cap momentarily exceeded it and the whole sync failed.
Reorder: delete no-longer-desired commands up front, then create/update. Removes the now-redundant trailing delete loop. Adapts @infinitycrew39 PR #50890 to current main (the original adapter diff no longer applied after the platform refactor); test commit cherry-picked with authorship preserved.
Add a test to verify that _safe_sync_slash_commands deletes obsolete
commands before creating new ones. This ensures we never temporarily
exceed Discord's 100-command limit during sync, which would trigger
error 30032 and break all slash commands.
This test guards against the regression where sync could fail even though
the registration cap was properly enforced.
f-trycua's #50855 test file predated the cross-platform PR (#50552) and
reintroduced two stale tests asserting Linux is unsupported
(test_*_non_macos_*, patching platform.system="Linux" and expecting a
no-op/warn). Linux + Windows are supported now, so install proceeds on
those platforms. Restore main's cross-platform-correct versions:
test_*_on_unsupported_platform_* using FreeBSD as the genuinely
unsupported case.
`hermes computer-use install` refused to install on Linux, Windows, and
macOS x86_64 because the pre-install asset probe was hitting the wrong
GitHub endpoint AND duplicating tag-resolution logic the upstream
installer already does correctly.
`_check_cua_driver_asset_for_arch()` queried
`https://api.github.com/repos/trycua/cua/releases/latest`. On trycua/cua:
- cua-driver-rs releases (the binary the installer fetches) are marked
**prerelease** on every cut. GitHub's `/releases/latest` explicitly
skips prereleases.
- The Python package releases (`cua-agent`, `cua-computer`, `cua-train`)
are non-prerelease and end up as the "latest" instead.
Live API check today:
$ curl -sf https://api.github.com/repos/trycua/cua/releases/latest \
| jq '{tag:.tag_name, asset_count: (.assets|length)}'
{ "tag": "agent-v0.8.3", "asset_count": 0 }
The probe sees zero assets, prints "Latest CUA release has no Linux
x86_64 asset", and skips install on every Linux / Windows / macOS-x86_64
host — even though the cua-driver-rs-v0.6.0 release ships 19 binary
assets covering all those platforms.
Filtering `/releases?per_page=N` for the `cua-driver-rs-v*` prefix
fixes the bug, but it duplicates tag-resolution logic the upstream
`_install-rust.sh` already does correctly via `CUA_DRIVER_RS_BAKED_VERSION`
(auto-baked by CD on every release, with a `/releases?per_page=N` API
fallback for dev checkouts). The right answer is to trust that
contract instead of mirroring it in Python where it can drift.
Two paths get the same outcome without the probe:
1. **Fresh install**: run `install.sh` directly. It has the baked
release tag, fetches the right asset, and errors with a clear
message on missing-arch downloads. No preflight needed.
2. **Upgrade path**: `cua_driver_update_check()` (separately added)
shells `cua-driver check-update --json` against the installed
binary, which returns the canonical update answer from the same
source the installer uses.
- `hermes_cli/tools_config.py`: delete `_check_cua_driver_asset_for_arch`
and its two call sites in `install_cua_driver`. Replace with an
inline comment near the top of the module explaining the rationale.
- `tests/hermes_cli/test_install_cua_driver.py`: drop the
`TestCheckCuaDriverAssetForArch` block. Add `TestArchProbeRemoval`
with three regressions:
- `test_probe_function_is_gone` — asserts the deleted helpers stay
deleted.
- `test_fresh_install_does_not_call_github_api` — asserts the
install path doesn't hit GitHub directly from Python anymore.
- `test_upgrade_with_binary_does_not_call_github_api_directly` —
same for the upgrade path.
All 9 `test_install_cua_driver` tests pass.
Reported by @teknium1 while testing on a headed Ubuntu host.
* chore: re-trigger CI (workflows did not dispatch on prior head)
* fix(image/video gen): make schema delivery instruction platform-neutral
The image_generate and video_generate tool schema descriptions hardcoded
a gateway-only delivery instruction ('display it with markdown
 and the gateway will deliver it'). That schema
is sent on every platform, so on CLI it directly contradicted the CLI
platform hint ('Do NOT emit MEDIA:/path tags ... state its absolute path
in plain text'), and on messaging platforms it was also wrong about the
mechanism (local file paths are delivered via MEDIA: tags, not markdown
image syntax — markdown ![]() only works for URLs).
The per-platform file-delivery convention is already owned correctly by
the platform hints in prompt_builder.py. The tool schema now just
describes the result shape (URL or absolute path in the image/video field)
and defers 'how to deliver' to the active platform's guidance.
Provider/model injection already works via _build_dynamic_image_schema()
(the 'Active backend: <provider> · model: <model>' line); no change there.
A typed `/model <name>` where `<name>` is declared under `providers.<slug>` or
`custom_providers` — but typed while the current provider is a soft-accepting
one (e.g. `openai-codex`) — stayed on the current provider and was swallowed as
an unknown hidden Codex model, instead of routing to the provider that actually
declares it.
Add configured-provider exact-match detection (`_configured_provider_matches`)
and a new Step d.5 in `switch_model`: if the typed model is declared in
user/custom provider config, route to that provider BEFORE
`detect_provider_for_model()` guesses from static catalogs and BEFORE the
common-path validation lets a soft-accepting current provider swallow the name.
- Matching is exact (case-insensitive) against explicitly-declared model
collections only (`models`, `model`, `default_model`) — never fuzzy/family.
- Same-provider declarer → keep current provider (canonicalize the id).
- Multiple declarers → fail clearly and ask for `--provider <slug>`.
- Single declarer → route there; for `providers.<slug>` user providers, set
`explicit_provider` so the credential block resolves base_url/key from config.
- Step e (`detect_provider_for_model`) is gated off when `config_routed`.
The deliberately-supported openai-codex / xai-oauth hidden-model soft-accept
(#16172 / #19729) is left untouched: when nothing in config matches, detection
is a no-op.
Salvaged from #45442 by harjothkhara (authorship preserved).
Tests: tests/hermes_cli/test_model_switch_configured_provider_routing.py
(7 tests). Full model_switch suite: 214 passed.
Fixes#45006
capture(app='screen'|'desktop') now resolves to the OS shell/desktop
window (Windows Progman/WorkerW desktop or Shell_TrayWnd taskbar, macOS
Finder/Dock) so 'show me my screen' and 'click the taskbar' work.
Previously capture() only matched application windows, and the schema
advertised 'or the whole screen' without any code path delivering it.
cua-driver is window-oriented (no virtual-desktop or per-monitor MCP
tool), so a single image still cannot span multiple monitors — the
schema now states this and the no-desktop-window path returns a clear
message instead of silently grabbing the frontmost app.
cua-driver 0.6.x removed the standalone screenshot MCP tool, so
capture(mode='vision') hit 'Unknown tool: screenshot' and returned a
0x0 image with no PNG while som/ax (which use get_window_state) still
worked. Route vision through get_window_state(capture_mode='vision').
Salvaged from PR #50771; same fix submitted earlier as #39262 by
@Tranquil-Flow.
Add the in-window floating pet (sprite, speech bubble, contact shadow,
profile-scoped, resize-safe) and a pop-out always-on-top overlay window
with gestures and notifications. Add the Cmd+K pet picker page plus the
appearance gallery and size slider in settings. Includes the pet stores,
electron overlay wiring, i18n strings, and store tests.
Add the Ink pet sprite pane, the interactive /pet picker overlay, and live
pet switching/rescale driven by new tui_gateway RPCs (pet state, pet.scale,
per-state frames). Wires pet flash state and the picker into the TUI layout
and slash handler. Covered by the slash-handler test.
Render the reactive pet pane in the classic CLI (steady redraw,
right-aligned) and wire the /pet command to list and switch pets, plus an
enable/disable toggle. Backed by hermes_cli/pets.py and the CLI commands
mixin, registered in the central command registry. Covered by the CLI pet
pane and toggle tests.
Add the shared pet engine under agent/pet/: spritesheet manifest loading
and in-process caching, six-state animation model, frame rendering, and
the persistent pet store. Register the display.pet config block (pet,
scale, enabled, etc.) that every surface reads from. Covered by
tests/agent/test_pet_engine.py.
2026-06-20 14:18:30 -05:00
1039 changed files with 89703 additions and 10220 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.