Add an optional creative/ableton skill for controlling Ableton Live through the
upstream AbletonMCP server. The skill documents the required MIDI Remote Script,
uses the canonical `uvx ableton-mcp` command, and disables upstream telemetry in
the Hermes MCP add command.
Ships a small preflight doctor and research notes; no core dependency or bundled
runtime is added.
Assert bare tables upgrade to sendRichMessage under default/opt-out config,
DM-topic resumed sends without reply anchors, and rich finalize edits carry
forum topic routing metadata.
Pipe-only markdown tables now use sendRichMessage even when rich_messages
is off, and resumed DM-topic sends route via direct_messages_topic_id
without requiring a reply anchor. Rich finalize edits forward topic kwargs.
The salvaged context-window screen (#52392) skips fallback candidates that
are too small, and the rate-limit/403 fixes skip candidates that are at
capacity. A third hard failure remained uncovered: a fallback that builds a
client fine but returns a 400 because it structurally cannot run the model.
The canonical case is a configured openai-codex / ChatGPT-account fallback
asked to compress a glm-5.2 conversation:
400 - {'detail': "The 'glm-5.2' model is not supported when using
Codex with a ChatGPT account."}
This is a request-validation error, so should_fallback was False and the
explicit-provider gate blocked it — the auxiliary task (compression) aborted
every turn, dropping middle turns without a summary and churning the session,
which is exactly what destroys the prompt cache.
Adds _is_model_incompatible_error() (400 + capability phrasing, excluding
not-found and billing 400s which the sibling predicates own) and treats it as
a fallback-worthy capacity error in both sync and async call_llm, so the chain
skips the incapable route and continues to the next viable candidate.
The runtime auxiliary fallback chain (_try_configured_fallback_chain and
_try_main_fallback_chain) returned the first reachable candidate without
checking whether the candidate's context window was large enough for the
task. For task='compression' this meant a reachable but undersized
fallback (e.g. 32K) could be selected and then fail, even when a later
larger-context fallback was available.
This adds two small helpers:
_task_minimum_context_length(task)
Returns MINIMUM_CONTEXT_LENGTH (64K) for compression, None for
other tasks (vision, web_extract, etc.).
_candidate_context_window(provider, model, ...)
Thin wrapper around get_model_context_length that returns None on
probe failure so unknown/custom endpoints pass through unchanged
(preserves the existing fallback surface).
Both fallback loops now skip reachable candidates whose resolved context
is below the task minimum and continue iterating. The success path
(first viable candidate wins) is unchanged. Return shape and ordering
for healthy candidates are preserved.
Six regression tests cover:
L2 configured chain skips too-small candidate
L2 chain continues after skipping, returns last viable
L3 main chain skips too-small candidate
L4 unknown-context candidate passes through
L5 non-compression task is not filtered
L6 minimum constant matches MINIMUM_CONTEXT_LENGTH (64K)
3/6 fail on upstream/main without the production change (verified); all
6 pass with the fix. Full test_auxiliary_client.py suite (231 tests)
and related compression tests (130 tests) remain green.
When an explicit aux provider cannot build a client before any request is
sent (missing raw env key, exhausted/unavailable OAuth or credential-pool
auth, resolver returning (None, None)), call_llm raised a misleading
"no API key was found" error and bypassed the configured fallback_chain
entirely. A provider authenticated through Hermes auth / the credential
pool (e.g. ollama-cloud) whose pool entry is exhausted hit this path, so
compression failed instead of routing to the configured fallback.
Adds _try_configured_fallback_for_unavailable_client() and wires it into
both sync and async call_llm before the raise, and into the startup
compression feasibility check.
Salvaged from #51835 by @herbalizer404.
Rate-limit (429) errors on explicit-provider auxiliary tasks were
silently failing instead of triggering the fallback chain. The
is_capacity_error gate only checked payment and connection errors,
excluding rate limits — so when a configured provider like
openai-codex hit its rate limit, auxiliary tasks (kanban_decomposer,
vision, web_extract, approval, etc.) had zero resilience.
Add _is_rate_limit_error() to is_capacity_error at both call sites
(sync and async paths) so rate limits trigger fallback regardless
of whether the provider was auto-detected or explicitly configured.
Fixes#52228
Ollama Cloud (and similar) return 403 with bodies like "this model requires
a subscription, upgrade for access" or "you have reached your session usage
limit, upgrade for higher limits". These are capacity/billing conditions
semantically identical to credit exhaustion, but _is_payment_error() did not
recognize them (403 missing from the status set; keywords missing), so the
configured fallback_chain was never tried and compression failed outright.
Adds 403 to the status set and the subscription/session-usage keywords.
Salvaged from #49076 by @herbalizer404.
These 7 test sites assert rotation behavior (fork, child sessions, lock
contention, logging session-context follows id rotation, boundary hooks fire
on rotation). Pin each builder to in_place=False explicitly so they keep
exercising the retained rotation fallback regardless of the global default
(flipped to True in #38763). Rotation stays a working opt-out fallback and
deserves continued coverage — these are NOT deleted.
Pinned sites:
- test_compression_concurrent_fork._build_agent_with_db
- test_compression_logging_session_context._build_agent_with_db
- test_compression_rotation_state._build_agent_with_db
- test_compression_boundary_hook._make_agent (2 helpers: CompressionBoundaryHook + SessionCompressEvent)
- test_compression_concurrent_sessions._build_agent_with_db
In-place compaction (single durable session id, non-destructive soft-archive)
becomes the default. Rotation is now the opt-out fallback via
compression.in_place: false.
Prerequisite: #50098 (hygiene guard reads result flag not config flag) merged
first — without it, flipping the default causes permanent transcript loss on
gateway hygiene-compress and /compress when no session_db is available.
Blast radius (empirically measured on current main): 7 rotation-asserting
tests broke and are pinned to in_place=False in the companion test commit:
- tests/agent/test_compression_concurrent_fork.py (2)
- tests/agent/test_compression_logging_session_context.py (1)
- tests/agent/test_compression_rotation_state.py (1)
- tests/run_agent/test_compression_boundary_hook.py (2 _make_agent helpers)
- tests/gateway/test_compression_concurrent_sessions.py (2)
Rotation stays as a working fallback and deserves continued coverage.
Plan: .hermes/plans/in-place-compaction-38763.md
Salvage of #50098 by @srojk34, cherry-picked onto current main.
The hygiene auto-compress guard and the /compress slash command both read
compression_in_place (config flag — is in-place mode enabled?) instead of
_last_compaction_in_place (result flag — did in-place compaction actually
succeed?). Both agents are built without a session_db, so archive_and_compact
always fails silently and _last_compaction_in_place stays False. Reading the
config flag makes the guard think in-place succeeded, triggering
rewrite_transcript() which replaces the original messages with only the
compressed summary — permanent data loss.
Co-authored-by: srojk34 <srojk34@users.noreply.github.com>
build_turn_context() created the DB session row via _ensure_db_session()
before the system prompt was restored/built, so a fresh API/gateway agent
carrying client-managed history inserted a row with system_prompt=NULL. That
tripped the misleading 'stored system prompt is null; rebuilding from scratch
... investigate the previous turn's write path' warning and a guaranteed
first-turn prefix cache miss. Move row creation to after _cached_system_prompt
is populated.
Verified live (OpenRouter + claude-sonnet-4.5): persistent-agent turns show
cache_read jumping to the full prefix on turn 2+ (write 24411 -> read 24411),
and the persisted system_prompt is non-NULL so fresh-agent restore keeps the
prefix cache warm.
Tests: turn-context ordering regression asserting _ensure_db_session runs
after _cached_system_prompt is populated.
/learn told the agent to fill the skill `author` field, and the system
prompt environment probe surfaces the OS login name (user=$(whoami) in
prompt_builder.py), so the model wrote the host username into published
SKILL.md frontmatter — a privacy leak the user never opted into, and
inconsistent run to run as the most-salient identity changed.
The /learn authoring prompt now sets `author` to the literal value
`Hermes` and explicitly forbids deriving it from the host environment
(OS/login user, git config, or any probeable identity). The skill names
itself as the tool that wrote it.
Closes#52368.
- Replace getattr(self.session_store, '_db', None) with self._session_db
(the GatewayRunner's own SessionDB, consistent with existing usage in
slash_commands.py L240/L499).
- Remove verbose comment referencing a branch name as an issue number.
- Update stale comment in run.py that said 'today it has no session_db'.
- Add regression test verifying session_db is passed and rotated session
is persisted (adapted from #51624 by @LeonSGP43).
- Add _session_db=None to _make_runner fixtures in test_compress_command,
test_compress_focus, and test_compress_plugin_engine.
Manual /compress and session hygiene auto-compress both create temporary
AIAgent instances to run compression. These agents were created without
a session_db, so compress_context computed the compressed messages in
memory, rotated the session ID, and reported success — but never wrote
to the database. The next user message reloaded the original full
transcript, making compression appear to do nothing.
Fix: pass session_db=self.session_store._db to both temp agents so the
session rotation is properly persisted. Also set _end_session_on_close
on the /compress temp agent (already done in hygiene path) to prevent
cleanup from ending the newly rotated session.
These three assert the eager build contract — stored runtime overrides /
profile db reach _make_agent synchronously, and the agent binds to the
compression tip. Under deferred-by-default the build runs off-thread, so
they raced the timer (green in CI, flaky locally). Pin them to
eager_build; deferred coverage lives in the protocol tests.
Statusbar items declared a 'title' string (e.g. YOLO, gateway health,
agents, cron, version, context usage) that was populated by
use-statusbar-items.tsx but never forwarded to the rendered DOM in
StatusbarControls — so every statusbar button/menu/text/link had no
hover hint.
Wrap the four render branches (menu trigger, text, link, action) in
the existing 'Tip' component from components/ui/tooltip.tsx. Tip is
self-contained (carries its own Provider), instant (delayDuration=0),
themed (bg-foreground/text-background, auto-inverts per theme), and
already in use elsewhere in the desktop shell. Renders the child
untouched when label is falsy, so items without a title stay
zero-cost.
Collapse the duplicated cold-resume / lazy-watch / create scaffolding into
shared helpers: _deferred_session_record (the live-session dict minus the
agent), _lazy_resume_info (the not-yet-built session.info), _claim_or_reuse_live
(lock + double-checked register-or-reuse), and _schedule_agent_build (the
pre-warm timer). Net -12 lines, three copies of the ~30-key session dict and
the lazy-info block down to one each. No behavior change.
Per review: gating the faster path behind a `defer_build` flag that the
only caller always sends is pointless. Flip it — `session.resume` now
defers the agent build by default for every caller (desktop + Ink TUI);
a caller that needs the agent built synchronously passes `eager_build:
true` (used by the build-race test). The desktop no longer sends a flag.
While verifying the flip, fixed two real parity gaps the deferred path
had vs the old eager (`_init_session`) path:
- `_enable_gateway_prompts()` was never called on a deferred resume, so
approvals/clarify wouldn't route through the gateway prompt callbacks.
- `_start_agent_build` never wired `background_review_callback` /
`memory_notifications`, so a deferred-built session's self-improvement
"💾 …" summary leaked to stdout instead of rendering in-transcript.
Wiring it there also fixes it for `session.create` sessions, which
build through the same path.
ACP is unaffected (it uses its own session_manager, not this RPC); the
Ink TUI already consumes the same lazy `info` shape from session.create
and upgrades on the later `session.info` event.
Switching sessions in the desktop app could freeze the whole UI for
several seconds on heavy, tool-rich chats. Root causes and fixes:
- Cold `session.resume` built the AIAgent (MCP discovery, prompt/skill
build) *before* returning, and the desktop awaits that RPC before it
paints — so the entire switch blocked on the build. Add an opt-in
`defer_build` resume path (the contract `session.create` already uses):
return the full display transcript immediately, register an upgradable
live session, and pre-warm the agent on a short timer. The persisted
runtime identity (model/provider/base_url/api_mode/reasoning/tier) is
restored on the deferred build so it can't drop the provider.
- Nothing bounded how many in-memory agents accumulate; a user who
reconnects often piled up detached sessions for the full 6h TTL. Add a
soft LRU cap (`max_live_sessions`, default 16) that evicts the
least-recently-active DETACHED sessions (no live client) — never a
running, awaiting-input, mid-build, or live-transport one. Reopening
re-resumes from disk.
- On the prefetch-hit cold-resume path, skip rebuilding a throwaway
merged-message array (and its 1000-entry Map) when the prefetch already
painted the exact transcript; the downstream sameMessageList guard
already drops the publish, so it was pure main-thread cost.
The desktop opts into `defer_build` for every non-watch cold resume; the
eager path stays for CLI/TUI and existing callers.
When the gateway persists a user message after a transient provider
failure (429/timeout/auth error), subsequent retries of the same
Telegram message could stack duplicate user turns in the transcript,
causing the agent to fall behind by 1-2 messages.
Add has_platform_message_id() to SessionDB (using the existing
idx_messages_platform_msg_id partial index) and a SessionStore wrapper.
The gateway's transient-failure path checks this before
append_to_transcript -- if the platform_message_id is already
persisted, the duplicate write is skipped.
Salvaged from #47869 by @davidgut1982. Adapted to current main which
has additional append sites and an existing content-based dedupe in
the exception handler path.
Closes#47237
todo_tool crashed with `AttributeError: 'str' object has no attribute 'get'`
when the LLM emitted the `todos` param as a JSON-encoded string instead of an
array, or as a list containing non-dict items (observed intermittently on
Claude 4.5/4.6/4.7, and after a prior tool-call rejection where the model
"self-corrects" by wrapping the list in json.dumps).
Three additive guards, no behavior change for well-formed input:
- todo_tool(): if `todos` is a str, json.loads it; reject unparseable strings
and non-list values with a clear tool_error instead of crashing downstream.
- _validate(): non-dict items return a {id:"?", content:"(invalid item)"}
placeholder rather than calling .get() on a str/int/None.
- _dedupe_by_id(): non-dict items get a synthetic key so _validate handles them.
Salvaged from #14785 by @Tranquil-Flow (authorship preserved via cherry-pick).
Comprehensive tests: JSON-string coercion (parse / unparseable / non-list /
non-string), non-dict list items (str/None/int/mixed), and a well-formed-
unchanged regression class — both guards mutation-verified to fail without them.
Closes#14185. Supersedes #14187, #22505, #14350 (same fix, less/no test
coverage) and #16952 (bundled unrelated scope-creep).
During stdio MCP server startup, _run_stdio (an async method) called the
synchronous check_package_for_malware() inline. That makes a blocking
urllib HTTPS POST to api.osv.dev whose own timeout doesn't reliably cover a
stalled SSL handshake, so an intermittent network issue froze the entire
asyncio event loop for up to ~120s — blowing past the TUI/gateway's 15s
startup budget and showing "gateway startup timeout".
Run the check via asyncio.to_thread (off the loop) AND bound it with
asyncio.wait_for(timeout=_OSV_MALWARE_CHECK_TIMEOUT_S=12s). The malware check
is fail-open, so on timeout we log and proceed rather than blocking startup.
Salvaged from #29190 by @qdaszx (re-applied on current main — the call site
moved since the PR was opened), combining the to_thread approach also proposed
in #29192 by @ygd58. Two load-bearing tests: event-loop-not-blocked-during-
check and timeout-fails-open — both mutation-verified to fail against the old
inline blocking call.
Closes#29184.
Co-authored-by: ygd58 <buraysandro9@gmail.com>
atomic_yaml_write used default yaml.dump which emits indentless
sequences (list items at column 0), while atomic_roundtrip_yaml_update
(ruamel.yaml) emits 2-space-indented sequences. Cross-path writes to
the same config.yaml toggled indentation on every save, eventually
producing a mixed-indent file that js-yaml rejects with 'bad indentation
of a mapping entry', silently dropping custom_providers and breaking
model switching.
Add IndentDumper SafeDumper subclass that forces indentless=False,
route atomic_yaml_write through it. Route tui_gateway._save_cfg and
the Telegram adapter's config writer through atomic_yaml_write so all
paths emit the same 2-indent layout.
Salvaged from #32034 by @xxxigm. Adapted to current main which already
has allow_unicode=True (from #51356) but was missing IndentDumper.
Closes#31999
Replace _count_real_sudo_invocations (which called
_rewrite_real_sudo_invocations and discarded the rewritten string) with
a lightweight token scan that reuses the same tokeniser but skips string
building. Remove the agent-facing tip about nested sudo in heredocs —
the cache-cleared warning is enough.
Pipe one password line per sudo invocation in compound commands so a correct
password is not rejected on the second `sudo` in `sudo a && sudo b`. Drop the
session cache when sudo returns Authentication failed, surface sudo_auth_failed
in the tool result, and add hints for interactive sessions.
A prompt sent while a turn was in flight got rejected with 4009 "session busy",
which pushed clients (the desktop app) into a deadline-bounded busy-retry. When
turn teardown outlived that deadline — e.g. the user hits stop while a slow,
non-interruptible tool (web_search, read_file, an MCP call) is mid-flight, since
the sequential executor only checks the interrupt flag between tools — the
resubmitted message was silently dropped: "it just doesn't listen".
Wire the previously-dead display.busy_input_mode config into prompt.submit:
instead of rejecting, apply the policy and queue the message to run as the next
turn (drained in run()'s tail, ahead of goal/notification follow-ups). Modes:
interrupt (default) interrupts the live turn so it winds down promptly then runs
the queued message; queue runs it after the current turn finishes; steer injects
it into the live turn when accepted, else queues. The queued slot pins the
sender's transport and losslessly merges a second arrival. No client deadline,
no dropped sends.
#48879 closed the tool-call sequence on interrupt inside finalize_turn so a
/stop after a tool no longer persists a `tool` tail that the next user message
turns into a `tool -> user` role-alternation violation (which strict providers
like Gemini/Claude react to by hallucinating a continuation and ignoring prior
context — what users see as "lost context after stop").
But the retry-wait, error-handling, and post-error retry-wait interrupt aborts
in conversation_loop return early and never reach finalize_turn, so they still
persisted and returned a raw `tool` tail. Interrupting during provider
backoff/rate-limiting (common under heavy work) hit exactly this path.
Extract the close into a shared close_interrupted_tool_sequence helper and apply
it at every interrupt abort (finalize_turn + the three early returns) so the
whole bug class is fixed, not just the one site.
The desktop self-updater rebuilds and re-signs the .app on each user's own
machine (`hermes desktop --build-only` -> electron-builder `--dir`). With
CSC_IDENTITY_AUTO_DISCOVERY on (its default), electron-builder signs the
type=distribution, hardened-runtime bundle with whatever identity is in that
user's keychain -- typically a personal "Apple Development" cert -- which
stalls/fails the sign step (no Developer ID, no provisioning profile) or
clobbers the original notarized signature with an unusable one, tripping
Gatekeeper on every post-update launch.
Force ad-hoc signing for the local packaged rebuild instead: deterministic,
and exactly what _desktop_macos_relaunchable_fixup already finishes off.
No-op for source runs, off-macOS, when a real identity is configured
(CSC_LINK / APPLE_SIGNING_IDENTITY), or when the caller already pinned the flag.
fixes stacked PRs no-checks bug where
main < a < b
a merges into main
b is retargeted to main
but b doesn't run checks since it's not considered a new pr to main
now b will simply already have passing ci :)
Block scale-to-zero suspend while background async delegations are active, and restore runtime status to running on real inbound after a dormant wake.\n\nAdd regression coverage for both review findings.
The /learn authoring prompt taught a subset of the HARDLINE skill rules,
and stated the <=60-char description rule without making the model enforce
it — so generated descriptions overshot (up to 202 chars), which the
60-char system-prompt skill index then silently truncates.
- description: add the index-truncation rationale, a count-and-trim
self-check, and a good/bad length example so the model actually hits <=60.
- add platforms-gating rule (OS-bound primitives -> declare platforms:).
- add author-credits-human-first rule.
- round out the Hermes-tool framing with the full wrapped-tool mapping and
references/templates layout.
Closes#52367.
The host-allowlist hardening (#30611) plus the refresh heal (#49735) left
the documented NOUS_INFERENCE_BASE_URL dev/staging escape hatch unreachable
for OAuth sessions, despite three code comments asserting it still works.
Root cause — resolution precedence in resolve_nous_runtime_credentials:
inference_base_url = (
_optional_base_url(state.get("inference_base_url")) # stored — wins
or os.getenv("NOUS_INFERENCE_BASE_URL") # env — unreachable
or DEFAULT_NOUS_INFERENCE_URL
)
A staging OAuth login persists its inference_base_url, but the allowlist
rejects the staging host and the refresh heal rewrites the stored value to
the production default. The stored (now prod) value is then read BEFORE the
env var, so the override never takes effect — every request 401s against
prod or is pinned to prod, and setting the env var does nothing.
Fix: the user-set env override is the most-trusted source, so consult it
FIRST for the URL used to build the client / returned to callers — while
keeping the PERSISTED value the validated, network-provenance one (the
override is a runtime overlay, never written to auth.json, so unsetting it
cleanly reverts to prod). Applied at both chokepoints:
- resolve_nous_runtime_credentials (no-refresh read path AND refresh path)
- the nous_portal proxy adapter, which re-validates the resolver's returned
base_url against the prod allowlist as defense-in-depth and would
otherwise reject a legitimate staging override at the forward boundary.
New _nous_inference_env_override() / split of stored-vs-effective URL keep
the threat model intact: Portal-returned URLs are still allowlist-validated
at every network site, and the env path stays ungated (trusted OS user).
Also folds in the no-refresh read-path heal (supersedes the approach in
the open #50265): a poisoned stored staging host now heals to the prod
default on read even when no refresh fires.
Tests: TestEnvOverrideWins (env wins on read + refresh paths; override never
persisted; poisoned stored heals) and TestProxyAdapterEnvOverride. Verified
the 4 behavioral tests fail against pre-fix code and pass with the fix; full
inference-validation + nous-provider suites green (85 passed). E2E-validated
against a real temp HERMES_HOME exercising the real resolver + proxy adapter:
resolver→staging, persisted→prod, proxy→staging, unset→reverts to prod.
- Remove dead `chosen_base or effective_base` fallback; _select_zai_endpoint
always returns a non-empty base URL (returns current_base on cancel).
- Add .rstrip("/") to official-endpoint return for symmetry with custom-proxy
path (both now return normalized URLs).
- Replace magic index 4 with len(ZAI_ENDPOINTS) in custom-proxy tests so they
don't break if a 5th endpoint is added to ZAI_ENDPOINTS.
Z.AI now uses a curses picker instead of plain text input for base URL,
so the existing TestBaseUrlValidation tests (which used zai as their test
subject) are migrated to MiniMax, which still uses the text input path.
Add TestZaiEndpointPicker covering:
- Selecting each official endpoint (Global, China, Coding Plan Global,
Coding Plan China) saves the correct base URL to config
- Custom proxy URL entry (valid + invalid rejection)
- Cancel keeps the existing base URL
- Current endpoint is the default choice in the picker
- Non-standard URL defaults to the Custom proxy option
When provider_id == 'zai', replace the plain text Base URL input with
_select_zai_endpoint, which presents a curses picker offering Global,
China, Coding Plan Global, Coding Plan China, and custom proxy options.
Other API-key providers (MiniMax, DeepSeek, etc.) keep the text input.
Presents a curses-based picker (via _prompt_provider_choice) offering the
four official Z.AI endpoints — Global, China, Coding Plan Global, Coding
Plan China — plus a custom-proxy option. Sourced from ZAI_ENDPOINTS in
auth.py so it stays in sync with the probe list.
Not yet wired into the setup flow; that comes in the next commit.
Add a hover/focus "Remix" action on each completed draft card in the
generation grid. It re-runs generation with the chosen draft fed back in
as the reference image, keeping the same prompt and staying on step 2 so
the user can explore variations without starting over.
Because regenerating is slow and replaces the current drafts, the first
remix shows a one-time confirmation; the acknowledgement is persisted so
subsequent remixes fire immediately.
Move terminal/execute_code/read_file preview compaction into agent.display so CLI, gateway, and Ink TUI all inherit the same labels that desktop introduced in #52321.
The shared preview keeps raw args intact while trimming display-only shell plumbing (`cd`, pipe tails, banner/status echoes) and read_file line ranges. Desktop now prefers backend `context` for live rows and keeps its TypeScript fallback only for hydrated history.
The pet generation image-processing suite is deterministic but expensive enough
to blow the per-file CI timeout on Linux (140s), and it is not relevant to the
fast timeout PR's normal signal. Keep it available for manual validation, but do
not run it by default.
Set HERMES_RUN_SLOW_PET_TESTS=1 to enable the suite. The canonical test wrapper
now preserves that opt-in variable through its hermetic env.
The fixed "up to 5 minutes" wording undersells the slow quality-first path
(OpenAI image via OpenRouter), where a full hatch can run far longer. Use an
open-ended "several minutes" instead so the banner stays honest across the
fast and slow providers.
The quality-first default (OpenAI image via OpenRouter) is slow, and a full
hatch fans out ~8 rows with up to 3 retries each (300s/call) across 2 parallel
waves, so the absolute backend worst case is ~30 min. The old ceilings fired
mid-run:
- per-image HTTP call: 180s -> 300s (a single cold row can exceed 3 min)
- drafts RPC: 240s -> 420s (single wave, no retries — 7 min is ample)
- hatch RPC: 420s -> 1hr (sits above the ~30 min backend worst case)
The hatch ceiling is intentionally well above the realistic max so the frontend
never throws "request timed out" before the backend has exhausted its own
retries. The background-resumable notification path remains the real UX safety
net — the user can close the modal and get pinged on completion.
Make completed desktop tool rows read like useful activity labels instead of raw plumbing: terminal rows use a dispatch-style shell summarizer for agent wrappers, and read_file rows keep the action plus filename and requested line range.
The shell cleanup follows condensed-milk-pi's shape: split command compounds on real separators, strip pipe tails inside each segment, clean redirects/env prefixes, then classify setup/banner/status segments. Multi-command probes render as `first command + N commands`; the full command remains available in copy/detail.
Read rows now render as `Read package.json` or `Read main.ts L25-34`, using requested positive offset/limit and returned line numbers only as fallback for negative/unknown offsets.
Recurring cron jobs were prompt-cache-cold on every fire. session_id is
built as cron_<job_id>_<timestamp>, and the Codex/Responses transport used
session_id directly as prompt_cache_key — so the timestamp changed the cache
key on every run and the static prefix (agent identity + tool schemas) was
re-paid each tick.
Derive prompt_cache_key from a SHA-256 of the static prefix (instructions +
sorted tool schemas) instead. Repeated fires of the same job share one
content-addressed key (pck_<hash>) and reuse the warm prefix within the
provider's cache TTL. The key changes exactly when the prefix changes —
edit the job's prompt or toolset and it re-keys; leave it alone and it stays
stable.
session_id is left untouched for transcript isolation, log correlation, and
the Codex/xAI session-scope routing headers (session_id, x-client-request-id,
x-grok-conv-id) — those are the per-fire identity, not the cache key. Only the
prompt_cache_key body field (standard OpenAI/Codex path and the xAI extra_body
field) is content-addressed.
Closes#51395.
Co-authored-by: spiky02plateau <spiky02plateau@users.noreply.github.com>
Co-authored-by: JoaoMarcos44 <JoaoMarcos44@users.noreply.github.com>
* fix(relay): authorize relay-delivered events by delivery, not source.platform
The #52190 upstream-authz fix keyed _is_user_authorized off
source.platform via _adapter_authorization_is_upstream(source.platform).
But a relay *message* inbound carries the UNDERLYING platform
(source.platform == discord/telegram/...), NOT Platform.RELAY, because
ws_transport._event_from_wire maps the connector's wire payload
(platform="discord") straight onto SessionSource for session-keying and
egress. The relay adapter is registered only under Platform.RELAY, so
adapters.get(Platform.DISCORD) misses, the trusted-upstream branch is
skipped, and the user hits the env-allowlist default-deny:
WARNING gateway.run: Unauthorized user: <id> (<name>) on discord
(Live staging bug: alpha tester linked successfully, then every
follow-up DM was silently dropped.)
Fix: the authentic trust signal is that the event was delivered over the
per-instance-authenticated relay WS, not which platform it underlies. Add
a wire-INVISIBLE SessionSource.delivered_via_upstream_relay flag, stamped
by the relay transport in _event_from_wire, and authorize on it. The flag
is excluded from to_dict/from_dict so a peer can neither forge it across
the wire nor have it restored from persistence. The existing adapter-flag
check is retained for events whose source.platform IS Platform.RELAY
(interaction-passthrough). A direct Discord event on a multiplexing
gateway (direct + relay adapters) is unmarked and still default-denies.
* fix(relay): use identity check on delivery marker to avoid MagicMock fail-open
A MagicMock() source (used by test_signal.py and other gateway tests) auto-
vivifies source.delivered_via_upstream_relay as a truthy Mock, which a bare
truthiness check would treat as authorized — flipping
test_signal_in_allowlist_maps from False to True. The marker is a real bool on
SessionSource, so check 'is True' explicitly: refuses to authorize any non-bool
stand-in, defensive against accidental fail-open.
OpenRouter/Nous image gen now runs a quality-first model chain by default:
attempt the highest-fidelity OpenAI image model first, then fall back to
Gemini 3 Pro Image when it's access-gated/unavailable/times out. An explicit
OPENROUTER_IMAGE_MODEL / config model override pins one model with no fallback.
Atlas validation rejects malformed model output instead of shipping it: adds a
per-state collapse guard (a single sliver/fragment row no longer passes because
other rows are healthy), on top of the existing postage-stamp + multi-pose
checks.
Desktop: pet-gen native notifications are now "global" (not tied to a chat
session), so a background generation started from the command center fires an
OS notification when the user is away even with no active session. Adds a
neutral "This can take up to 5 minutes." banner on step 1, and lets the
provider picker auto-size.
Tests updated/added for the OpenRouter fallback chain, the collapse guard, and
the global notification path.
Make verification closure the default coding behavior after landed file edits while keeping bounded retries and config/env switches for users who need to disable it.
Addresses review on #51077 (kxee). The continuable-cron mirror reused
gateway.mirror.mirror_to_session, which writes role=assistant — re-
introducing the exact alternation violation #2313 (37a997945)
deliberately removed: a cron brief landing as assistant after the
agent's last turn yields assistant->assistant, which breaks strict-
alternation providers (OpenAI/OpenRouter) per issue #2221. The mirror/
mirror_source metadata is also dropped at the SQLite boundary, so the
[Delivered from cron] label is lost on replay.
This is an intentional, opt-in (default OFF) reversal of #2313's
'cron output does not belong in interactive history' for the reply-to-
cron use case — gated behind cron.mirror_delivery / attach_to_session.
Fixes:
- mirror_to_session gains a role param (default 'assistant' — interactive
send_message mirror unchanged, it IS the agent speaking). Cron paths
pass role='user' with a '[Cron delivery: <task>]' prefix so the brief
collapses via repair_message_sequence's consecutive-user merge on every
provider, and stays distinguishable on replay despite the metadata drop.
- thread_seeded: defer seeding + the flag until delivery into the new
thread actually succeeds. Previously set pre-delivery, so an open-
succeeds / deliver-fails case both stranded a seeded-but-unseen brief
AND suppressed the DM-fallback mirror.
- seed mirror now passes user_id='system:cron' to resolve the exact
thread-keyed session row it just created.
- dedupe the duplicate BasePlatformAdapter import in _deliver_result.
- trim oversized docstrings to non-obvious WHY (AGENTS.md).
- docs: document cron.mirror_delivery / attach_to_session in
website/docs/user-guide/features/cron.md.
- test: assert the cron mirror writes role='user' with the label prefix.
204 cron+mirror tests pass.
Continuable cron jobs (attach_to_session / cron.mirror_delivery, default
OFF) now prefer a dedicated thread on thread-capable platforms, falling
back to origin-DM mirroring where threads don't exist.
- Thread-capable (Telegram topics, Discord/Slack threads): open a fresh
thread for the job via the shipped adapter.create_handoff_thread,
route the brief into it, and seed the thread-keyed session so the
user's in-thread reply continues with full context. This is the
'continuable cron opens its own thread' interface.
- DM-only (WhatsApp/Signal/SMS): create_handoff_thread returns None ->
fall back to mirroring into the origin DM session (existing behaviour).
Reuses existing infrastructure end-to-end — no new adapter surface, no
provider-chain signature change:
- adapter.create_handoff_thread (already implemented per-platform,
returns None on unsupported platforms = the fallback signal)
- the live SessionStore via adapter._session_store (already set on every
adapter), reached without threading a new param through the frozen
CronScheduler.start() contract
- gateway.mirror.mirror_to_session for the seed/append
- existing per-target delivery routing carries the new thread_id for free
Mirrors GatewayRunner._process_handoff's open-thread-or-fallback +
seed pattern, standalone for the cron delivery path. thread_seeded
guards against a double-mirror after seeding. Scoped to the origin
target only; fan-out/broadcast targets are never threaded or mirrored.
Config docs updated (cron.mirror_delivery) + cronjob tool
attach_to_session description reframed around continuable/thread-preferred.
Tests: +5 (thread id returned on thread platform; None on DM platform;
None without capability/loop; seed creates thread session + mirrors;
seed no-op on empty). 22/22 in TestCronDeliveryMirror; 532 cron tests
pass (4 failures pre-existing: croniter-not-installed + TZ).
Multi-participant parity with interactive send_message, which passes
HERMES_SESSION_USER_ID to gateway.mirror.mirror_to_session so the mirror
lands in the exact participant's session.
- cronjob_tools._origin_from_env now captures user_id from the session
context at job-create time (alongside platform/chat_id/thread_id).
- _maybe_mirror_cron_delivery forwards user_id to mirror_to_session.
- _deliver_result threads origin.user_id through for the origin target.
Effect: in a per-user-isolated group chat (group_sessions_per_user=True,
the default), the mirror resolves to the member who scheduled the job
instead of conservatively no-op'ing on ambiguous candidates. DMs and
shared group/thread sessions are unaffected (single candidate). Default
still OFF.
Tests: helper forwards user_id; E2E _deliver_result forwards origin
user_id. 17/17 in TestCronDeliveryMirror; 527 cron tests pass (4 failures
pre-existing: croniter-not-installed + TZ, identical on baseline).
The cron->session mirror now fires ONLY for the delivery target that
equals the job's origin (platform+chat_id[+thread_id]). A job created
from a live gateway chat stamps that chat as origin, and that session is
guaranteed to exist (it is the conversation the user scheduled the job
in). Fan-out / broadcast / home-channel-fallback targets are never
mirrored: they are not a continuation of a conversation and may have no
session at all.
This makes the prior 'cold-start session seeding' concern a non-case by
construction: when the mirror semantically applies the session exists;
when none exists the target was never the origin, so we no-op.
Adds _target_matches_origin() + origin-scoping tests (exact match,
other-chat/other-platform/no-origin rejection, thread scoping, fan-out
mirrors only the origin target).
Adds an opt-in path so a cron job's delivered output is also appended to
the TARGET chat's gateway session transcript (as an assistant turn), so a
user reply to a recurring delivery (daily brief, reminder) is answered with
the delivery in context instead of 'what is that?' amnesia.
- Reuses the shipped gateway.mirror.mirror_to_session — the same primitive
interactive send_message mirroring already uses. No messaging-toolset
change (cron still can't call send_message; this rides delivery).
- Gated: per-job attach_to_session overrides global cron.mirror_delivery
(config.yaml). Default OFF — historical isolation preserved byte-for-byte.
- Mirrors the CLEAN agent output, not the cron header/footer wrapper.
- Alternation/cache-safe: append lands at a turn boundary, never mid-loop,
never mutates the cached system prompt. Cold-start (no target session)
is a silent no-op; mirror errors never fail a successful delivery.
- Surfaced on the cronjob tool (attach_to_session) + config schema.
Driven by enterprise cron-as-control-plane use case. 10 new tests; full
cron + cronjob-tool suites pass (600).
ToolFallback rebuilt the `part` wrapper every render, defeating the
buildToolView memo and re-running a full JSON.stringify of the result on
every ~33ms stream delta. A /learn over a large directory (many ~100KB
tool results) saturated the renderer main thread (hang/throttle) and
spiked memory until it OOMd (crash).
- Re-derive a stable `part` from the referentially-stable args/result so
the view/copy memos hold across deltas.
- Clamp every inline-painted payload (detail, stdout/stderr, rawResult,
technical trace) to MAX_TOOL_RENDER_CHARS; the row's Copy button still
reads the uncapped view.detail for the full output.
A DM reply carries no guild_id, so the connector's egress guard cannot
resolve the owning tenant from metadata.guild_id and declines the send
with "discord egress declined: target not routed to an onboarded tenant"
— the bug behind "the bot never replies in DMs". Guild replies are
unaffected (they carry guild_id), which is why the guild path worked
end-to-end while DMs looked broken.
The connector now resolves a DM reply's tenant from the recipient's
author binding (gateway-gateway #67, resolveByUser keyed on
metadata.user_id) — the outbound counterpart to inbound Phase 7a
author-first resolution. But it needs the recipient user_id ON the
outbound action, and the adapter only re-attached guild_id
(_capture_scope/_with_scope), no-op for DMs (the docstring even said so).
This extends the adapter's inbound-scope capture: for a DM (no guild_id)
remember chat_id -> the authentic author user_id we observed, and
re-attach it as metadata.user_id on outbound. Guild capture is unchanged
and wins when present; user_id is the DM-only fallback. The id is the one
the connector observed inbound (never gateway-asserted), so the trust
invariant holds.
+4 unit tests (DM reply re-attaches user_id + no guild_id; unknown chat
invents nothing; explicit user_id preserved; guild reply never carries
user_id). Proved load-bearing (reverting the re-attach fails the DM
test). 144 relay tests pass, ruff clean.
Pairs with gateway-gateway #67 (the connector-side resolver). Together
they close the DM-reply egress gap end-to-end.
terminal_tool() resolves a per-task cwd override that WINS over config["cwd"]:
cwd = overrides.get("cwd") or config["cwd"]
config["cwd"] is sanitized for container backends in _get_env_config() (host
prefixes /Users//home//C:\\/C:/ and relative paths are replaced with the
backend default /root). But the override was applied RAW — it was never run
through that guard. The gateway/TUI registers the host launch dir as a cwd
override for workspace tracking (tui_gateway/server.py _register_session_cwd
-> _terminal_task_cwd -> _session_cwd -> os.getcwd()), so on a container
backend a host path leaked straight to `docker run -w <host-path>`:
- Windows desktop: -w C:\Users\<user> -> container fails to start (exit 125)
- POSIX: -w /home/<user> -> same
The ACP adapter translates its override cwd (acp_adapter/session.py
_translate_acp_cwd), but the gateway path did neither translation nor
sanitization, so the override bypassed the one guard that would have caught it.
Fix: extract the host/relative-path predicate into a shared
_is_unusable_container_cwd() helper (so the existing _get_env_config()
sanitizer and the new guard can't drift), and re-apply it to the *resolved*
cwd at the override-resolution site. Valid in-container override paths
(RL/benchmark sandboxes that set cwd to /workspace, /root, ...) are absolute
non-host paths and pass through untouched.
Tests: unit-pin the predicate (Windows backslash/forwardslash, POSIX home,
macOS /Users, relative, valid container paths) AND an E2E call-site pin that
drives terminal_tool() with a host-path override registered and asserts the
cwd reaching _create_environment is sanitized. Mutation-verified: reverting
the call-site guard makes the two host-path E2E tests fail (showing the raw
host path leaking) while the valid-/workspace-override test stays green.
The desktop bootstrap (and curl/PowerShell/docker installs) seeded
~/.hermes/SOUL.md with a comment-only scaffold that contained no persona
text. That shadowed the runtime default (_ensure_default_soul_md ->
DEFAULT_SOUL_MD), since seeding is guarded by 'if SOUL.md doesn't exist'.
Result: every fresh installer install got the empty template instead of
the documented Hermes persona; desktop just made it visible in onboarding.
- install.sh / install.ps1 / docker/SOUL.md now write DEFAULT_SOUL_MD.
- _ensure_default_soul_md() upgrades a SOUL.md still matching the known
legacy scaffold in place; customized files (any deviation, incl. a
persona appended below the comment) are never touched.
- Detection normalizes CRLF/BOM so Windows-installer drift still matches.
The Desktop GUI (tui_gateway) slash worker subprocess has no reader for
the CLI's _pending_input queue. /learn's CLI handler prints the ack and
puts the built prompt onto that queue, so in the TUI the prompt was
silently dropped — ack shown, no LLM turn, no skill created (#51829).
command.dispatch already handles 'learn' correctly (returns
{type: send, message: build_learn_prompt(arg)}), but 'learn' was missing
from _PENDING_INPUT_COMMANDS, so slash.exec fell through to the worker
instead of routing to command.dispatch. Add it to the frozenset, matching
the existing goal/queue/steer/plan pattern.
The gateway-side BEHAVIOUR layer that consumes the relay scale-to-zero
primitives (gateway-gateway Phase 5): the gateway decides it is idle and
drives the relay transport dormant so the platform (Fly autostop:"suspend")
can suspend the now-traffic-idle machine, which wakes on the connector's
wakeUrl poke (decisions.md Q3=C', D1-D13).
- gateway/scale_to_zero.py: pure helpers — scale_to_zero_enabled (the NAS
Labs HERMES_SCALE_TO_ZERO stamp, D11/Q8=A), parse_idle_timeout_seconds
(config.yaml gateway.scale_to_zero.idle_timeout_minutes, D2),
messaging_is_relay_only_or_absent (F6/D1), should_arm (D1/D11/§3.4(1)),
is_idle (D2/D3/F7).
- gateway/run.py: _last_inbound_at clock stamped on user inbound in
_handle_message (F13); the arm-gate + idle predicate + the
_scale_to_zero_watcher dormant sequence (mark draining -> adapter
go_dormant() -> cooldown), started only when armed. Deliberately NOT the
stop path and NOT mark_resume_pending (F12/D13).
- tools/process_registry.py: has_any_active() for the bg-work guard (D3/F7).
- hermes_cli/config.py: gateway.scale_to_zero.idle_timeout_minutes default 5.
Tests: 38 pure-logic + 6 watcher (incl. bg-work regression guard proven RED).
Full relay + scale-to-zero suites: 184 passed. The 20 unrelated failures in
the broader run are PRE-EXISTING on origin/main (custom-provider/tools tests),
confirmed via a pristine baseline worktree.
Net-new WebSocketRelayTransport.go_dormant() + RelayAdapter.go_dormant() —
the third transport mode the scale-to-zero behaviour layer needs, distinct
from both disconnect() and an unexpected close (decisions.md D12/F14):
- disconnect() sets _closing=True and CANCELS the reconnect supervisor
(terminal "shutting down for good") -> a suspended machine never re-dials
on wake, stranding its buffered backlog.
- an unexpected close re-dials IMMEDIATELY -> the socket never stays down,
so the platform proxy never suspends the machine.
go_dormant(): going_idle->ack (reuse go_idle), then close the socket WITHOUT
setting _closing, so the reader's fall-through still arms the reconnect
supervisor (wake path stays live) but on the longer _dormant_redial_s
cadence so it doesn't fight the platform suspend window. A successful re-dial
clears _dormant. Honors the §3.4 wake->reconnect->drain contract.
Tests: 6 new in test_relay_going_idle.py incl. the F14 regression guard
(routing dormancy through disconnect() fails exactly the 4 wake-path tests).
Full relay suite 140 passed.
Regression for the refText crash: attachmentDisplayText and
optimisticAttachmentRef must return null (not throw) when handed an
undefined/null attachment hole, so the submit path can't reproduce
"Cannot read properties of undefined (reading 'refText')".
A session switch or draft restore can leave undefined/null holes in the
composer attachments array. AttachmentList was guarded against this in
#49624, but the sibling submit path was not: submitPromptText maps the
same array through attachmentDisplayText/optimisticAttachmentRef and
buildContextText (a.kind / a.label / a.refText), so a hole threw
"Cannot read properties of undefined (reading 'refText')" — an uncaught
renderer error that blanks the chat pane and shows "Desktop app link
offline".
Close the whole bug class:
- attachmentDisplayText / optimisticAttachmentRef no-op on a falsy
attachment (shared chokepoint, also protects thread.tsx drop handler).
- submitPromptText filters falsy entries from the source array, and
buildContextText filters its (possibly post-sync) input before reading
fields.
Stabilize the long-running-tool heartbeat test by patching stale thresholds inside the test and asserting the heartbeat exceeds the idle ceiling, which preserves intent while removing scheduler-sensitive assumptions that flake in CI.
Wire the sparkle generate button's cancel action to the same discard/reset path as step-2 cancel so abort semantics are consistent and always return to step 1 while retaining the prompt input.
PR #52151 hardened the runtime-status liveness check to trust a readable
live process command line over stale gateway_state.json argv, so a recycled
PID now owned by an s6 supervisor no longer counts as a running gateway.
That fix is correct but incomplete for the reported symptom: the web
dashboard showed a named profile's gateway green while
`hermes -p <name> gateway status` showed it stopped. Two further issues:
1. Cross-profile PID reuse. In per-profile Docker supervision, one profile's
stale `gateway_state.json` can record a PID the OS later recycled onto a
DIFFERENT profile's live gateway. That PID's command line still
`looks_like_gateway`, so the dead profile was reported running. The
recorded argv has its `-p <name>` selector stripped in-process by
`_apply_profile_override`, so it cannot disambiguate; the live `/proc`
cmdline still carries it. `get_runtime_status_running_pid` now accepts an
`expected_home` and validates the live command line belongs to THAT
profile (mirroring `hermes_cli.gateway._matches_current_profile`, the
logic the CLI scan path already uses — which is why the CLI was correct).
`_check_gateway_running` passes the enumerated profile dir.
2. The existing regression test `test_gateway_running_check_falls_back_to_
runtime_state` used the live pytest PID with a gateway-shaped record; once
the live cmdline became authoritative it no longer looked like a gateway.
Updated to mock the live cmdline to the real separate-process scenario it
describes.
The active-profile path (`get_running_pid`) is intentionally left unscoped:
it is lock-verified and any live gateway cmdline is acceptable there. Multiplex
mode is unaffected — `running` state is only ever written to a gateway's own
home, never a secondary served profile's.
Adds coverage for: cross-profile PID reuse (named + default), matching
profile cmdline (`-p`, `--profile`, explicit HERMES_HOME=), the bare default
gateway, and the unreadable-cmdline cross-platform fallback. Each new
cross-profile assertion fails without the profile scope and passes with it.
Co-authored-by: helix4u <4317663+helix4u@users.noreply.github.com>
Remove cute/chibi-biased wording from base draft variations and explicitly preserve the requested mood across base and row prompts so scary, eerie, or other non-cute concepts are honored while keeping sprite constraints.
Two ways the update overlay read as stuck even though the update was
streaming progress underneath:
- In-app (macOS/Linux) UpdatesOverlay: runStreamedUpdate forwards every
stdout line as a progress event with percent: null, and ingestProgress
wrote that straight through — clobbering the milestone percents (10/60)
so the bar fell back to indeterminate on every log line. Keep the last
percent when a line carries null.
- Staged install/update overlay: the bar is completedCount / totalCount,
which counts only *finished* stages, so a long first stage pinned it at
"0 of 2" / 0% until the stage ended. Count the running stage as half a
unit so the bar advances during the stage (the per-stage spinner already
shows which step is live).
Both are display-only; no stage/event semantics change. (The Windows
hermes-setup Tauri progress UI in apps/bootstrap-installer has the same
counter-only-on-completion logic — parity follow-up.)
restartGateway, getActionStatus, getStatus, updateHermes and
checkHermesUpdate all hit window.hermesDesktop.api WITHOUT spreading
profileScoped() — unlike their siblings (getModelInfo, setModelAssignment,
grantComputerUsePermissions). _apiProfile tracks the active gateway
profile, and the Electron proxy uses request.profile to pick which pooled
/ remote backend serves the call.
So for a multi-profile or global-remote user, the System-panel "Restart
gateway" (and its status poll, plus Update / status reads) targeted the
primary/default backend instead of the one they're on: the restart hit
the wrong gateway and the poll never saw the action → it looked like
restart silently failed. Single-profile users are unaffected
(profileScoped() returns {} when no profile is active).
Add ...profileScoped() to the five backend-action helpers so they follow
the active profile like the rest of the API surface.
On macOS, the desktop updater's stage 1 (hermes update --gateway) ends by
restarting running gateways. launchd_restart() SIGTERMs the gateway and
silently waits up to agent.restart_drain_timeout (default 180s) for the
drain; the manual profile-gateway loop waits its drain budget per gateway
the same way. Neither path prints anything before the wait, so the desktop
updater's live output goes dead for minutes right after '✓ Update
complete!' — users read it as a hung update and force-kill their gateway
processes to make it move (#44515). The systemd branch already announces
its drain ('draining (up to Ns)...'); launchd and the manual loop did not.
Print the stop/drain (with PID and budget) before the wait in both paths,
mirroring the systemd branch, and assert the message in the existing
launchd drain test.
Fixes#44515
checkUpdates() ran `git rev-list HEAD..origin/<branch> --count`
unconditionally in the parallel probe batch, even on the shallow +
no-merge-base path where resolveBehindCount() ignores the result and
falls back to a SHA compare. In the #51922 failure mode that count walks
the entire remote ancestry (thousands of commits), so the work was pure
latency on every update check for the exact case the fix targets.
Split the probes into two phases: resolve --is-shallow-repository and
merge-base first, then run rev-list --count only when shouldCountCommits
says the number is meaningful (full clone, or shallow-with-merge-base).
The shallow/no-merge-base SHA fallback is preserved unchanged.
The desktop installer clones with `--depth 1`, so a public install's local
history often shares no merge-base with the freshly fetched origin tip. In
that state `git rev-list HEAD..origin/<branch> --count` enumerates the
entire remote ancestry and returns a meaningless huge number, surfacing as
e.g. "v0.17.0 (+12104)" in the update indicator (#51922).
The official-SSH branch of checkUpdates() already sidesteps this by reporting
a binary up-to-date check (`behind: currentSha === targetSha ? 0 : 1`), and
hermes_cli/banner.py guards the identical class for the CLI banner. The
passive desktop count path was the one place the shallow guard was missing.
Detect shallow + no-merge-base up front and fall back to the same SHA-based
binary check; full clones (developers / Docker dev images) keep the exact
count path unchanged. The resolution logic lives in a pure update-count.cjs
helper so it is unit-testable without booting Electron.
Ship the final pet-generation UX polish (provider picker behavior, step-2 cancel flow, banner integration, and visual consistency) and make saturated-chroma background removal C-op driven so hatch processing no longer hammers the machine during long runs.
A hosted instance fronted by the Team Gateway connector dropped EVERY relay
message as "Unauthorized user" and the agent never replied — despite the
message routing correctly through the connector to the instance.
Root cause: gateway authorization (_is_user_authorized) had no notion of
upstream-enforced authz. Platform.RELAY matches no {PLATFORM}_ALLOWED_USERS
allowlist and isn't in the HA/WEBHOOK always-authorized set, so a relay user
with no env allowlist configured hit the default-deny ("No user allowlists
configured. All unauthorized users will be denied."). The message was received,
then silently denied before reaching the agent.
This is incorrect for relay: the connector authenticates the gateway's WS with
a per-instance secret and performs owner-only author-binding resolution BEFORE
delivering. A message only reaches this gateway because the connector resolved
it to THIS instance's bound user (user_instance_binding), keyed on the author id
the connector OBSERVED off the event — never a gateway claim. The authorization
decision is already made by a trusted, authenticated upstream; there is no local
RELAY_ALLOWED_USERS allowlist to consult, and default-denying for its absence is
the bug.
Fix: add a generic BasePlatformAdapter.authorization_is_upstream capability
(default False) that the relay adapter overrides to True, plus a dedicated
trusted branch in _is_user_authorized that honors it. This is delegation to a
trusted upstream, NOT a fail-open: it fires only for an adapter that explicitly
declares the flag; every direct network-exposed adapter leaves it False and the
env-allowlist default-deny (SECURITY.md §2.6) is unchanged. Distinct from
enforces_own_access_policy, which mirrors a LOCAL config-driven allowlist —
this delegates to an authenticated upstream's decision.
Tests: behavior contract that the base defaults False, the relay adapter
declares True, a relay user (group + DM) is authorized with no env allowlist,
and crucially a non-upstream adapter with no allowlist still default-denies
(guards against the fix becoming a blanket fail-open). 6 new tests; relay +
authz + config-policy suites green (134 + 90).
Found via live staging debug of the Discord self-serve onboarding flow.
The 8-minute stream-silence watchdog only removed a stuck session from
$workingSessionIds (the sidebar dot). The composer's busy state lives in
the session-state cache and was never cleared, so a hung or looping turn
that never delivered its terminal event — including an old session
re-opened while the backend still reports it "running" — stayed wedged on
"Thinking" / Stop indefinitely.
Have the watchdog notify subscribers when it force-clears a session, and
subscribe from the session-state cache to also drop that session's
busy/awaiting/needsInput flags. updateSessionState re-syncs $busy when the
healed session is the one on screen, so the composer recovers instead of
spinning forever.
Frontend-only safety net; doesn't touch the turn lifecycle. The backend
root (a stale in-memory session["running"] surviving a dead turn thread
and re-arming busy on every resume) is a separate follow-up.
When a remote gateway dropped after a healthy boot (internet loss,
sleep/wake, VPS restart), use-gateway-boot retried with backoff forever
and never surfaced an error. The renderer sat behind the fullscreen
CONNECTING overlay with gatewayState non-open and boot.error null — no
way to reach Settings, sign in again, or switch to a local gateway. To
the user the app was simply broken on connection loss.
Raise a recoverable boot error once the reconnect loop crosses
RECONNECT_ESCALATE_AFTER (6 attempts, ≈45s), so the BootFailureOverlay
(Retry / Sign in / Use local gateway) replaces the dead-end CONNECTING
screen. The loop keeps retrying underneath; the next successful reconnect
(or a manual/wake-driven one) clears the error and dismisses the overlay.
This implements the contract already specified — but never wired up — in
use-gateway-boot.test.tsx (desktop vitest isn't in CI, so the failing
"FIX:" specs went unnoticed). All 4 hook tests + the 3 connecting-overlay
tests pass.
Three voice-mode papercuts in the desktop app:
1. Ctrl+B did nothing. The docs + `voice.record_key` advertise Ctrl+B to
talk, but the desktop never bound it (only ⌘B = sidebar existed). Add a
rebindable `composer.voice` action that toggles the voice conversation,
defaulting to ⌃B on macOS (distinct from ⌘B; off-macOS `ctrl` folds to
the sidebar chord, so it ships unbound there to avoid stealing it). The
global keybind reaches the composer through a new focus-bus event.
2. The Voice settings page rendered every provider's options at once (~30
fields). Filter to the *selected* TTS/STT provider's sub-fields; STT
provider fields hide when STT is off. Picking "edge" now shows just the
Edge voice, making it obvious voice chat also needs STT enabled.
3. Voice mode could hang "speaking" forever. Free Edge TTS sometimes returns
audio that never fires `playing`/`ended`/`error`, so the playback promise
never settled. Add a stall watchdog (rearmed on each progress tick, so
long speech is never cut off) that rejects a stuck stream, letting the
loop recover with a clear error.
CI test shard has no PyPI egress: the real 'pip install packaging==20.9'
in test_core_package_is_not_shadowed failed (the pypi.org reachability
probe passed but the actual install didn't), failing slice 2/6.
- Prove the anti-shadow invariant deterministically: synthesize a fake
'packaging' in the durable target with a sentinel and assert the import
still resolves to the core copy (TestCoreNeverShadowed). No network.
- Cover the install wire offline: stub subprocess and assert --target +
--constraint are built in durable mode and absent in venv-scoped mode
(TestInstallArgConstruction).
- Gate the genuine PyPI install behind HERMES_RUN_NETWORK_TESTS=1 (opt-in,
skipped in CI) instead of a flaky reachability probe that doesn't predict
install success.
The published Docker image seals the agent venv (root-owned, read-only
/opt/hermes) and sets HERMES_DISABLE_LAZY_INSTALLS=1 so a runtime install
can't mutate and brick the core. But opt-in backends (Firecrawl web search,
Exa, Feishu, ...) deliberately keep their SDKs in tools/lazy_deps.py and out
of [all] (pyproject policy 2026-05-12: one quarantined release must not break
every install). The two policies collided: the SDK isn't baked in AND can't
lazy-install, so the default Firecrawl web_search/web_extract fail out of the
box in Docker (#51136), as do Exa (#49445) and Feishu (#50205).
Fix the whole class instead of baking in one backend: when
HERMES_LAZY_INSTALL_TARGET is set, lazy installs are redirected to a writable
dir on the durable /opt/data volume via `pip/uv install --target`, and that
dir is APPENDED to the end of sys.path. Because the core venv always wins
name collisions, a package installed this way can only ADD new modules — it
can never shadow, downgrade, or break a module the core ships. The worst a
bad/incompatible backend package can do is fail to import and report itself
unavailable; the agent core stays healthy. That structural guarantee is what
made it safe to seal the venv, and it is preserved here even with installs
re-enabled.
- tools/lazy_deps.py: durable-target mode — `--target` install + core-pinned
`--constraint` file (shared deps resolve to core's versions, conflicts fail
loudly at install time), append-only sys.path activation, ABI/Python-version
stamp that wipes the store if an image rebuild bumps the interpreter, and a
reworked gate so HERMES_DISABLE_LAZY_INSTALLS=1 redirects (rather than hard-
blocks) when a target is set. security.allow_lazy_installs=false still
disables installs in every mode.
- hermes_bootstrap.py: activate the durable target on sys.path at first import
(before any backend imports its SDK) so packages installed on a previous run
are importable on this run.
- Dockerfile: set HERMES_LAZY_INSTALL_TARGET=/opt/data/lazy-packages.
- docker/stage2-hook.sh: seed + chown the dir on the data volume.
- tests: real-install E2E proving installs land in the target, import cleanly,
don't leak into the sealed venv, and that a core package is never shadowed;
ABI-stamp wipe/preserve; gate matrix; Dockerfile/stage2 contract test.
Fixes#51136
The status-bar "Agents" item conflated three unrelated signals — running
subagents (aggregated across all sessions), in-flight session turns, and
failed background *system* actions (gateway restarts, toolset installs,
computer-use grants via $desktopActionTasks/preview restart) — yet
clicking it opens AgentsView, which renders only subagents. A failed
gateway restart therefore showed "Agents (1 Failed)" over an empty
"No live subagents" tree. AgentsView also filtered to the active session,
so a subagent running in a background session showed "Agents N running"
with nothing in the tree (the desync reported in #49808).
Unify the scope both surfaces speak:
- AgentsView aggregates subagents across every session (salvages #49819).
- The indicator's running/failed counts come from subagents only
(aggregated), never background system actions — those keep their own
surfaces in settings / command center.
So "Agents (N …)" now always points at a populated Spawn tree.
Supersedes #49819. Fixes#49808.
When Telegram's sendRichMessage returns a FloodWait/RetryAfter error,
_try_send_rich() now extracts the server-provided retry_after value and
propagates it through SendResult.retry_after. The base _send_with_retry()
layer honors this value instead of using its default short exponential
backoff (~2s, ~4s), preventing the retry budget from being exhausted
against a server that demands a 25-37s wait.
Salvaged from #46774 by @liuhao1024. Telegram adapter path moved from
gateway/platforms/telegram.py to plugins/platforms/telegram/adapter.py
since the original PR.
Closes#46762
Closes#47707
Context engines and memory providers expose tool schemas via
get_tool_schemas(). agent_init.py wrapped each as
{"type":"function","function":_schema} without validating that
_schema carries a top-level name. A provider returning an entry already
in OpenAI tool form ({"type":"function","function":{...}}) was then
double-wrapped into a tool whose function has no name. Strict providers
(e.g. DeepSeek) reject the entire request with HTTP 400
'tools[N].function: missing field name', so one malformed schema
silently disables the whole toolset and breaks every turn. The schema
was also never added to valid_tool_names, so even lenient providers
could not call it.
Add a shared normalize_tool_schema() helper that unwraps an
already-wrapped entry and returns None for anything lacking a resolvable
string name. Wire it into the agent_init context-engine loop and all
three memory_manager surfaces (inject_memory_provider_tools,
add_provider routing index, get_all_tool_schemas), so a single bad
plugin schema is skipped with a warning instead of poisoning the
request.
Verification: 209 targeted agent/memory tests pass (incl. 9 new).
New tests assert the unwrap + skip-nameless behavior and fail without
the fix.
When tempfile.mkdtemp() raises OSError (e.g. disk full), the exception
propagated past the try/finally block, so _mark_install_failed() was
never called. The 24h backoff marker never engaged, causing unbounded
retry on every command -- each attempt leaked a tirith-install-* temp
directory, eventually filling /tmp completely.
Fix: wrap mkdtemp in its own try/except OSError, returning
(None, "no_space") so the caller's normal failure path (including
_mark_install_failed) executes.
Salvaged from #51831 by @liuhao1024.
Closes#51826
When delegate_task spawns a child agent with a different model/provider, the
child's init_agent loaded the plugin context-engine GLOBAL singleton by
reference (`_selected_engine = _candidate`) and then called update_model() on
it with the child's (smaller) context_length. Because parent and child shared
the same object, this mutated the PARENT's compressor: e.g. DeepSeek 1M ctx
silently dropped to 204800 and the compression threshold from 200K to 40K
after any delegate_task with a different model.
Deepcopy the singleton before assigning/mutating it (agent_init.py) so the
child gets its own instance and the parent's compressor is untouched.
Salvaged from #42452 by @liuhao1024 (authorship preserved). Added a
source-pin regression test that fails if the production line reverts to the
bare alias, plus an end-to-end test driving get_plugin_context_engine() and a
StubEngine.update_model() — the original PR's tests exercised copy.deepcopy in
isolation but did not guard the actual agent_init code path.
Closes#42449. Supersedes #42469, #42474 (same one-line fix, no test).
A single ddgs (DuckDuckGo) search could hang indefinitely and block the
shared agent loop — and therefore every platform (CLI, Telegram, Matrix...).
The DDGS constructor's timeout only bounds individual HTTP requests; ddgs's
multi-engine retry loop has no overall cap, so a slow/rate-limited response
could spin for 20+ minutes with no output and no error.
Run the synchronous ddgs call in a single-worker ThreadPoolExecutor and cap
it with future.result(timeout=_SEARCH_TIMEOUT_SECS=30). On timeout, return a
clear failure ("DuckDuckGo search timed out ... try a different provider")
instead of blocking; the pool is shut down with cancel_futures so a hung
worker is never awaited.
Salvaged from #37422 by @uzunkuyruk (authorship preserved). Re-applied on
current main (the PR's provider.py base had diverged). Added a load-bearing
timeout regression test (the original PR only updated the fake's constructor
and had no timeout-behavior test) — mutation-verified to fail without the cap.
Closes#36776.
_strip_blocked_tools used a hardcoded set missing 'cronjob'. Children
on gateway platforms could inherit the cronjob toolset, scheduling
persistent jobs that outlive the delegation despite DELEGATE_BLOCKED_TOOLS.
Fix: derive the strip set from DELEGATE_BLOCKED_TOOLS at runtime so the
two lists can never drift. Add 'cronjob' to DELEGATE_BLOCKED_TOOLS for
documentation consistency. Two regression tests lock the invariant.
Salvaged from #43687 by @riyas22. Adapted test to current main (no
'messaging' toolset exists -- send_message is intentionally not
registered as an agent tool).
Closes#43466
Corrupted sessions.json entries (e.g. a bare bool where a dict is
expected) caused TypeError on 'origin' in data' which escaped the
(ValueError, KeyError) inner except and aborted loading ALL remaining
sessions, not just the corrupted one.
Two-layer fix:
- Loop level: isinstance(entry_data, dict) guard before from_dict
- from_dict: isinstance(data['origin'], dict) instead of bare truthiness
- Added TypeError to the inner except as defense-in-depth
Closes#46994
The preflight-compression gate only ran the (expensive) token estimate when
the message COUNT exceeded protect_first_n + protect_last_n + 1. A session
with a handful of very large messages never tripped the count condition, so
compression was never attempted and the turn eventually hit a hard
context-overflow error.
Add _should_run_preflight_estimate() with OR semantics: run the estimate when
either the message count exceeds the protected ranges (the historical gate)
OR a cheap char-based estimate already crosses the configured threshold. The
downstream estimate_request_tokens_rough() stays authoritative — this is only
a hint that decides whether to pay for the full estimate.
Salvaged from #27435 by @texhy (authorship preserved). Re-applied on current
main: the preflight gate moved from conversation_loop.py to turn_context.py
since the PR was opened, so the helper + gate are placed there; the test
imports the real MINIMUM_CONTEXT_LENGTH instead of a hardcoded literal.
Closes#27405.
Add compression.minimum_context_floor config key that allows users
to lower the compression threshold floor below the hardcoded 64K
default, preventing infinite tool-call loops on models whose
structured output degrades well before 64K tokens.
- agent/model_metadata.py: add get_configurable_minimum_context()
helper with 16K hard safety limit
- agent/context_compressor.py: accept minimum_context_floor param,
thread it through _compute_threshold_tokens
- agent/conversation_compression.py: use compressor's floor for
aux model context validation
- agent/agent_init.py: read compression.minimum_context_floor from
config and pass to ContextCompressor
- gateway/run.py: cache-busting includes new key
Salvaged from #31686 by @Tranquil-Flow onto current main.
Resolves conflicts with in-place compaction (#38763) and max_tokens
threshold computation (#43547) that landed after the original PR.
Closes#31600
The drift guard (introduced for #26045) correctly protects replace/remove
from clobbering un-roundtrippable content, but it also fires on the add
path. Since add only appends and never overwrites, the guard is
unnecessary and causes false positives when prior add() calls in the same
session shift the byte count of the on-disk file.
Add skip_drift parameter to _reload_target() and pass True from add().
Replace/remove continue to use the drift guard unchanged.
Salvaged from #42880 by @liuhao1024.
Closes#42874
When no reference-capable image backend is configured, generating a pet is
impossible — so instead of a dead prompt + post-hoc error, the overlay now
detects it up front and offers a way out:
- pet.generate.status RPC reports whether a reference-capable provider
(OpenRouter / Nous Portal / OpenAI) is set up; the overlay probes it on
open and swaps the prompt for a friendly setup card (paw, one-line copy,
"Set up image generation" → /settings?tab=providers, key links).
- useRouteOverlayActive(): reusable hook so any portaled modal yields the
screen to a full-screen route overlay (e.g. settings) and reappears —
re-running its mount effects — on return, instead of closing. The probe
re-runs on that remount, so adding a key flips the card to the prompt.
The /new (and /reset) confirmation-button callback runs the slash-confirm
handler on the asyncio event loop (see _request_slash_confirm). That handler
calls _handle_reset_command, which invoked the SYNCHRONOUS, potentially
long-blocking _cleanup_agent_resources inline: agent.close() tears down
terminal sandboxes, browser daemons and background processes (subprocess
waits), and shutdown_memory_provider() can make a network call. A slow
teardown wedged the entire event loop, so the bot went silent and stopped
processing all messages until a manual restart.
Offload _cleanup_agent_resources via the existing contextvar-preserving
_run_in_executor_with_context helper, bounded by asyncio.wait_for with a
named _RESET_CLEANUP_TIMEOUT_S (30s). The loop is never blocked; on timeout
the reset proceeds and the worker thread is left to finish on its own (it
cannot be cancelled). The text /new path is unaffected (already off-loop).
Tests (tests/gateway/test_35994_reset_button_deadlock.py): the loop keeps
ticking while close() blocks in its worker thread; a cleanup that raises is
swallowed (warning logged) and the reset still rotates the session; a
cleanup that times out degrades gracefully. All three are mutation-verified
to fail without their respective production branch.
- pet.generate / pet.hatch (parallel rows, off the reader thread) +
cooperative pet.cancel; pet.export / pet.rename.
- pet.gallery localOnly fast path + background manifest prefetch so the
picker never blocks on petdex; rename follows the active-pet config.
- gateway request gains optional timeout + AbortSignal for real Stop.
Reference-grounded image provider over the OpenRouter-compatible
chat-completions image protocol (Gemini Flash Image et al.). Nous Portal
proxies OpenRouter, so one provider serves both — giving pet generation a
reference-capable backend beyond OpenAI gpt-image.
Turn a text prompt into a petdex-spec spritesheet (8×9 grid of 192×208
cells), grounded so every animation row stays the same creature:
- orchestrate: base drafts (distinct variation nudges) → per-row grounded
generation → atlas compose; one image call per row, rows fan out in parallel.
- atlas: frame-perfect registration in normalize_cells — 1-D cross-correlation
of each frame's column-mass profile locks the body (robust to limbs/cape),
one shared per-state scale, bottom-anchored; plus alpha-hole repair, gutter
severing, and interior-seeded chroma-pocket clearing.
- prompts: pixel-art-by-default style hints + registration constraints.
- store: local pet write (register_local_pet), slugify/unique_slug,
export_pet, slug-realigning rename_pet, createdBy provenance.
The desktop window opened at a hardcoded 1220×800 every launch, discarding
whatever size and position the user left it at (#39101) — on macOS the dock
reopen was the most visible case, but every restart reset it.
A small window-state.json under userData (same pattern as connection.json /
updates.json) records the window's normal bounds plus its maximized flag,
written debounced on resize/move/maximize and flushed on close, applied on the
next createWindow(). getNormalBounds() captures the pre-maximize size so an
un-maximize next session lands where the user actually sized it.
Restore is defensive: sanitize rejects garbage, drops off-screen positions
(window falls back to Electron centering), and caps a size saved on a
since-disconnected larger monitor to the largest current display. The geometry
math lives in a side-effect-free window-state.cjs so it unit-tests with
node --test, no Electron boot. No new dependency.
Salvages #39154 by @jeffrobodie-glitch — same userData approach and validation
intent, reimplemented tighter and folded into one module.
Co-authored-by: jeffrobodie-glitch <jeffrobodie@gmail.com>
After /stop, the next user message can hit a stale generation token and
return with api_calls=0, no failure, no interruption. _normalize_empty_agent_response
fell through to an empty string, so the gateway logged "response=0 chars"
and sent nothing — the message was silently lost while internal work
sometimes continued.
Add the api_calls==0 / not-failed / not-interrupted / not-partial branch
to the single normalization chokepoint so the user gets a short retry hint
instead of silence. Regression test asserts the hint surfaces.
Salvaged from #33851 (re-applied on current main; original was 1401 commits
behind and the function had moved).
Follow-up to the salvaged venv-recreate fix. Three changes to the
Install-Venv pre-delete sweep:
- Match the venv path with a case-insensitive StartsWith instead of the
PowerShell -like operator. A venv path containing wildcard
metacharacters ('[', ']') — legal in a Windows user name — silently
fails to match under -like, which would let the locking process slip
through and reintroduce the exact access-denied failure this fix
closes.
- Retry Remove-Item once after a short pause. A force-killed process can
take a moment to release its file handles, so the first delete may
still hit a locked .pyd; retry before failing the stage.
- Note in a comment that the gateway autostart task runs at LIMITED
integrity as the current user, so the installer always runs at
equal-or-higher integrity and can read the process executable path,
and that Get-CimInstance is preferred over Get-Process because it
returns a null path for an uninspectable process instead of throwing.
Adds a regression test asserting the recreate branch sweeps by venv path
prefix, uses StartsWith rather than -like, and runs the sweep before
Remove-Item.
Covers issues #47036, #47557, #47910.
The Windows venv-recreate guard only runs `taskkill /IM hermes.exe`, but the
gateway that a scheduled task or watchdog autostarts runs as
`pythonw.exe -m hermes_cli.main gateway run` straight out of venv\Scripts\.
Its image name is python/pythonw, so taskkill never matches it; it keeps the
venv's native extensions (e.g. tornado\speedups.pyd) loaded, and the following
Remove-Item fails with "Access to the path is denied" -- aborting boot at the
venv stage so the desktop app never loads.
Additionally stop any process whose executable lives under this venv, matched
by path so the image name is irrelevant and a global/system python outside the
venv is never touched.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Let setup.runtime_check accept an optional provider, persist the selected
provider/model before the gate, and validate the provider the user just
connected instead of a stale config entry such as anthropic.
The apps/desktop workspace was bumped to 0.17.0 in apps/desktop/package.json
but package-lock.json still recorded 0.15.1, so npm install reports the lock
as out of date and rewrites it on every fresh install. Regenerate the lock
(npm install --package-lock-only) to record the current 0.17.0; one-line
change, no dependency resolution churn.
When the current provider is a custom endpoint (custom or custom:*), the model
switch pipeline must NOT auto-switch to a native provider/OpenRouter based on a
static-catalog match. The user explicitly configured their own endpoint and the
same model name may be served there; silently rewriting model.provider destroys
their config.
- detect_static_provider_for_model(): skip the static-catalog scan when the
current provider is custom/custom:*
- switch_model() Step e: extend is_custom to cover custom:* so the
detect_provider_for_model() last-resort fallback cannot fire
Salvaged from #48351 by Elshayib (authorship preserved).
Fixes#48305
The TUI model-switch persistence (_persist_model_switch) rewrote the entire
model config block via save_config(), destroying sibling keys the user set
under model: (model_slots, model_fallback, base_url, ...) on every switch.
Use targeted, atomic, comment-preserving save_config_value("model.default" /
"model.provider" / "model.base_url") writes instead, so a model switch only
touches the keys it changes.
Salvaged from #48391 by kyssta-exe (authorship preserved).
Fixes#48305
Completes the #45006 fix. PR-base commit (configured-provider routing) handles
the case where a typed model IS declared in user/custom provider config. This
commit closes the other root: when a typed model is NOT in any config and the
current provider is a soft-accepting one (openai-codex / xai-oauth), the
hidden-model soft-accept (#16172 / #19729) would accept ANY unknown name as a
hidden model — so `qwen3.5-4b` typed on a Codex-default session "succeeded" and
mislabeled the provider as "OpenAI Codex" (the exact reported symptom), then
400'd on the next turn.
Gate the soft-accept to slugs that plausibly belong to the provider's family
(openai-codex -> gpt-/codex-/o1/o3/o4; xai-oauth -> grok-). Family-shaped
unknown slugs are still soft-accepted (preserving the #16172 entitlement-gated
hidden-model intent); unrelated names are rejected with actionable guidance to
pin the right provider via `--provider <slug>` or the picker.
Adds TestCodexSoftAcceptPlausibilityGate (5 tests): unrelated names rejected on
codex/xai, family-shaped hidden slugs still accepted, real catalog models
unaffected. Verified load-bearing.
When `/model X` is the FIRST message after an idle/daily/suspended auto-reset,
the slash-command path stores a session model override but leaves
`session_entry.was_auto_reset = True` (it never passes through
`_handle_message_with_agent`, which is where the flag was consumed). On the
NEXT regular message, the auto-reset cleanup block pops the freshly-stored
model/reasoning override BEFORE the flag is consumed — so the switch is
silently lost and resolution falls back to the config default, while the
session DB still shows the switched model (a two-sources-of-truth divergence).
Consume the flag at both sites:
1. gateway/run.py — capture `was_auto_reset` into a local and set the
attribute False immediately at the top of the cleanup block, so the
cleanup can't re-fire on a later message and wipe an override stored
between turns. Downstream reads use the captured local.
2. gateway/slash_commands.py — the model path consumes the flag before
storing the override, so a /model-first-after-auto-reset isn't wiped by
the next message's cleanup.
Salvaged from #48062 by x7peeps (authorship preserved).
Tests: tests/gateway/test_48031_model_switch_after_auto_reset.py — AST
invariants pinning both consume sites (load-bearing; verified they fail when
either consume is removed). Mirrors the AST-pin approach in
test_35809_auto_reset_clean_context.py. Gateway session/reset suite: 16 passed.
Fixes#48031
Salvage of PR #48927 by @ehz0ah, which consolidates OpenViking recall
work from #41706 (@huangxun375-stack), #33260, #49975, and #32444.
Replaces stale background post-turn prefetch warming with synchronous
current-query recall. The old queue_prefetch warmed the PREVIOUS user
message while turn-start recall consumed the CURRENT one, so injected
context was always about the wrong topic.
Changes:
- prefetch() now does session-aware /api/v1/search/search with the
current query, falls back to /api/v1/search/find on failure
- Contract-safe payloads: limit, score_threshold, context_type,
session_id — no top_k, no search-body mode, no target_uri
- L2 content reads for items with level=2 or empty abstracts, capped
at full_read_limit (default 2)
- Local ranking (score + query-token overlap + leaf boost), dedup,
score threshold, and injected-char budget
- queue_prefetch() is now a no-op (background warming removed)
- Additive batched viking_read: uris param accepts up to 3 URIs
- Per-request timeout support on _VikingClient.get/post/delete
- Removes stale _prefetch_result/_prefetch_thread/_prefetch_generation
state and _invalidate_prefetch_state()
- Strengthened system_prompt_block guidance
Salvage follow-up fixes:
- Expose all 8 recall config knobs in get_config_schema() (PR #48927
had removed them; #41706 correctly exposed them). Env vars remain
as internal mechanism but are now visible in setup wizard.
- Lower default timeout 8s→4s, request_timeout 6s→3s, full_read_limit
3→2 to reduce per-turn blocking latency.
Co-authored-by: Hao Zhe <haozhe4547@gmail.com>
Co-authored-by: Eurekaxun <eurekaxun@163.com>
The Discord/Telegram /model slash command listed providers synchronously
on the gateway's async event loop. list_picker_providers /
list_authenticated_providers are blocking and can fall through to a
synchronous urllib HTTP fetch when the on-disk provider cache is stale,
freezing the loop for 120-150s -> "application did not respond" and
delayed agent starts.
Port #41304's asyncio.to_thread offload to the current handler location.
The handler moved from gateway/run.py to gateway/slash_commands.py
(_handle_model_command); wrap BOTH blocking call sites so the whole bug
class is covered:
- picker path -> list_picker_providers
- text-fallback path -> list_authenticated_providers
asyncio.to_thread is already idiomatic in this module (and asyncio is
imported), so the loop now stays responsive while the (possibly
network-bound) listing runs on a worker thread.
Adds tests/gateway/test_model_command_async_offload.py asserting the
offload contract at the real handler seam for both paths (mutation-
survivable: reverting either to_thread wrap fails the matching test).
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
The Discord gateway heartbeat stalled ('Shard ID None heartbeat blocked
for more than N seconds') because _handoff_watcher polled the synchronous,
blocking SQLite-backed SessionDB directly on the asyncio event loop every
2s. Each list_pending/claim/complete/fail call performed blocking disk I/O
on the loop thread, starving the Discord heartbeat coroutine.
Wrap every blocking SessionDB call inside the watcher loop in
asyncio.to_thread(...) so the SQLite work runs on a worker thread and the
event loop (and heartbeat) stays responsive. These four call sites are the
only synchronous self._session_db.* calls inside the watcher loop body.
Adds tests/gateway/test_handoff_watcher_async_db.py asserting the watcher
offloads its SessionDB calls via asyncio.to_thread (mutation-survivable:
reverting any to_thread wrap fails the corresponding assertion).
Fixes#40695
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
When context compaction's summary generation fails, the compressor's default
path (abort_on_summary_failure=False) drops the middle window and inserts a
static 'summary unavailable' marker — destroying the compacted turns. #29559
reported the field impact: a Connection error at the compaction moment dropped
124->15 messages (110 lost) for a long browser-automation task; #25585 is the
same failure mode (failed summary commits a destructive compaction anyway).
compress() already has an EXCEPTION to the historical drop default: auth
failures (401/403) ALWAYS abort and preserve the session, because rotating into
a placeholder-summary child on a broken credential strands the user. A transient
network/connection error is the same situation in reverse: it WILL recover, and
retrying then is strictly better than discarding context for a momentary blip.
Extend the always-abort carve-out to terminal connection/network failures:
- new _last_summary_network_failure flag, set in _generate_summary's terminal
failure branch when _is_connection_error(e) (reached only after any main-model
fallback is exhausted), reset alongside the auth flag;
- compress() aborts when it's set (returns messages unchanged,
_last_compress_aborted=True), independent of abort_on_summary_failure;
- a network-specific operator warning (distinct from the auth + config-flag
messages).
Scoped to connection errors only: a generic 500/400 still takes the historical
fallback-drop path (test_non_auth_failure_still_uses_fallback_path stays green).
Tests: network-failure detection + abort-despite-flag-false, both mutation-checked
(removing the flag-set fails detection; removing the carve-out fails the abort).
hermes-pr-review findings:
- notifyError('runtime-not-ready', msg) misused the (error, fallback) API:
the key became the notification body and the message became the title.
Switch to notify({ id, kind, title, message }) which puts content in the
right slots.
- The stable id 'runtime-not-ready' deduplicates: notify() replaces by id,
so repeated refreshOnboarding calls during an outage no longer stack
up to 4 persistent error toasts.
- Remove dead !state.manual guard from shouldPreserveConfiguredOnFallback:
refreshOnboarding already short-circuits on manual before the helper.
- Test: seed localStorage with '1' before asserting it survives (was testing
the wrong invariant — null in, null out).
- Test: use static import for spy instead of fragile await import.
- Test: add negative case for requested=true + configured=true (should
still downgrade — requested overrides preservation).
When shouldPreserveConfiguredOnFallback keeps configured=true, also call
notifyError('runtime-not-ready', ...) so the user knows the backend wasn't
verified instead of silently proceeding. Adapted from @mohamedorigami-jpg's
approach in PR #37634.
Regression coverage for the desktop gateway-restart hang: prompt_yes_no
returns its default when HERMES_NONINTERACTIVE=1 or on a bare EOFError
(closed/redirected stdin), and still exits on KeyboardInterrupt.
The dashboard/desktop spawn gateway actions with stdin=DEVNULL and
HERMES_NONINTERACTIVE=1 (hermes_cli/web_server.py), but prompt_yes_no
ignored that contract and called sys.exit(1) on the resulting EOFError.
On Windows, `gateway start` asks "Install it now so the gateway starts on
login? [Y/n]" when the scheduled task / startup entry is not yet
installed. Spawned from the desktop app there is no stdin to answer it, so
every desktop-triggered gateway restart aborted at that prompt and the
gateway never started ("Gateway service is not installed").
Fall back to the prompt's default when HERMES_NONINTERACTIVE is set, and
treat a bare EOFError as "accept default" rather than exiting. This lets
the Windows start path proceed unattended (Startup-folder fallback + direct
spawn) while interactive TTY usage is unchanged. Ctrl+C still exits.
The wire contract said hop 1 uses "the agent's existing Nous Portal
access token" but didn't name WHICH of an agent's two identities that is.
A hosted agent never holds an `agent:{instanceId}` OAuth client (that
shape is minted only by the interactive dashboard auth-code grant); its
own outbound portal calls use the bootstrap-session token (client
`hermes-cli-vps`) planted in auth.json on first boot. NAS must resolve
the instance id from either an `agent:{id}` client OR the bootstrap
session (AgentInstance.bootstrapSessionId), not gate on `agent:*` alone —
which 403'd every real hosted-agent provision in prod.
Documents the NAS-side fix (resolveAgentCronInstanceId) so the contract
and the implementation agree.
Phase 7 Unit 7d-B. When an operator opts an instance OUT of the Team Gateway
relay (Unit 7b deprovision), the connector revokes the per-gateway secret and
closes the gateway's WS with 4401. The reconnect supervisor previously treated
EVERY close as retryable, so the live process spun "retrying 4401" forever and
the dashboard showed a red error — opt-out looked like a failure.
Now a 4401 close that arrives AFTER a successful handshake is recognized as a
terminal credential revocation:
- ws_transport.py: track `_handshake_succeeded` (set when a descriptor is
received); on a 4401 close after a prior success, latch `auth_revoked` and do
NOT spawn the reconnect supervisor. A 4401 BEFORE any successful handshake
stays retryable (cold-start / not-yet-provisioned race, not a revocation).
New `auth_revoked` property + a websockets-version-safe close-code reader
(prefers `.rcvd`/`.sent` Close frames; `.code` is deprecated in websockets 13+).
- adapter.py: a revocation monitor turns `transport.auth_revoked` into a clean,
NON-retryable `relay_disabled` fatal and notifies the gateway's fatal-error
handler (so the adapter is removed and NOT queued for reconnection — the
credential is dead until the instance is recreated). Monitor is cancelled on
disconnect; only started when the transport exposes `auth_revoked` (prod WS).
- run.py: `_handle_adapter_fatal_error` maps the `relay_disabled` code to a
`disabled` platform_state (not `fatal`/`retrying`).
- web: PlatformsCard renders the `disabled` state with a neutral outline badge,
a PowerOff icon, and muted (not destructive-red) text + message. New optional
`status.disabled` i18n string ("Disabled").
Also bundles the Phase 7 contract-doc update (this doc is authoritative in
hermes-agent): docs/relay-connector-contract.md gains an "Author-first
resolution + the account-link (DM) path" section documenting the
multi-tenant-guild rule (D-7.2 — route by authenticated author binding, never by
guild; unlinked → fail-closed), the `/link <code>` DM flow, and the
connector-authoritative opt-out + terminal-4401 behavior this PR implements.
Tests: +2 ws_transport (4401-after-handshake terminal / no-reconnect;
4401-before-handshake stays retryable) and +2 adapter (revocation → non-retryable
relay_disabled fatal + handler fired; no-revocation → no fatal). 138 relay tests
pass (incl. the contract-doc conformance test); ruff clean; web tsc clean.
Phase 7 Unit 7d-B (relay-adapter solo lane). Q17 → Option 2; Option 3 (live
de-register, no recreate) + the restart-re-provision hole deferred post-alpha.
After `hermes update`, a globally-installed agent-browser's npm postinstall
(fixUnixSymlink) re-points the global symlink (e.g. /opt/homebrew/bin/agent-browser)
at our local node_modules binary. The next update wipes node_modules, leaving a
dangling symlink that `which` still reports but exec fails on with exit 127 —
silently breaking every browser tool (#48521).
Root cause is trust-on-presence: shutil.which/Path.exists accept a name that
resolves but won't run. Add hermes_constants.agent_browser_runnable() (resolves
the path + runs --version) and gate all four resolution sites on it:
_find_agent_browser now skips a dead candidate and falls through to the next
working one (extended PATH -> local .bin -> npx), self-healing the dangling link.
dep_ensure/doctor/nous_subscription validate too; doctor warns on a broken link.
Closes#48521.
* docs: stop recommending pip install hermes-agent; point to install script
The install script is the only supported install path (it provisions a
managed, isolated uv environment). Replace bare `pip install hermes-agent`
primary-install recommendations with the curl install script, and rewrite
optional-extra snippets (`pip install "hermes-agent[X]"`) to the managed-env
form `cd ~/.hermes/hermes-agent && uv pip install -e ".[X]"` that matches the
installer and the English quickstart.
Covers English docs + zh-Hans mirrors, the achievements plugin README, and
realigns the zh-Hans quickstart to the English Desktop-installer-first layout
(dropping its stale "Method A — pip (simplest)" section).
* docs: drop pip as a supported install/update method
Removes the 'pip installs' supported-method sections from updating.md and
cli-commands.md (EN + zh-Hans): the curl install script is the only supported
way to install/update the Hermes CLI. The _cmd_update_pip pip/pipx branches
remain in code as an undocumented safety net for users who already have such an
install, but the docs no longer advertise pip as a path.
Also normalizes a bare `pip install -e '.[acp]'` to the managed-env form.
Leaves python-library.md untouched: importing AIAgent as a library dependency
into your own project is a distinct use case where pip is correct.
On a launchd-managed gateway (macOS), /restart stopped the gateway but
never relaunched it: the handler's service detection checks only
INVOCATION_ID (systemd) and container markers, so under launchd it takes
the detached path and exits 0 — which KeepAlive.SuccessfulExit=false
treats as a deliberate stop. The gateway stays silently dead until a
manual launchctl kickstart.
Detect launchd via XPC_SERVICE_NAME, which launchd sets to the job label
for processes it spawns. The probe deliberately excludes the literal
"0": interactive macOS shells inherit XPC_SERVICE_NAME=0 (a truthy
string), and routing an unsupervised interactive gateway to the service
path would make it exit non-zero with nothing to revive it.
Routing through via_service=True (rather than forcing a non-zero exit
on the detached path) matters: the detached path also spawns a helper
that relaunches the gateway, so exiting non-zero there would have BOTH
the helper and launchd respawn it — two gateways racing for the same
bot tokens. The service path spawns no helper; launchd is the single
respawner.
Fixes#43475. Supersedes the run.py-era probes in #19940/#33393 (the
handler has since moved to gateway/slash_commands.py) and avoids the
double-spawn risk in the exit-code-site approaches (#43498, #43596).
The dashboard chat sidebar's tool-call activity card was disabled in the
product — both ChatPage mounts passed showTools={false} (since #49077),
so the box never rendered. The sidebar still subscribed to tool.* events
and accumulated them in state for a panel nobody saw.
Remove the tools card, the showTools prop, the tool.* event handling and
state, and the now-orphaned ToolCall component. The /api/events
subscription stays for session.info (live title) and
dashboard.new_session_requested. The sidebar is now just the model
selector box; the session list (ChatSessionList) is unchanged.
No behavior change in the live dashboard — the tools box was already
hidden.
Add two regression tests for the salvaged #48706 fix:
- login token exchange targets platform.claude.com first
- falls back to console.anthropic.com when the new host is unreachable
Also map the salvaged contributor's noreply email in release.py
AUTHOR_MAP (CI author-map gate).
Anthropic migrated the OAuth token endpoint from
console.anthropic.com/v1/oauth/token (now returns HTTP 404) to
platform.claude.com/v1/oauth/token. The token *refresh* path already
iterated both hosts, but the two initial code-exchange call sites were
hardcoded to the dead console host, so every new Claude OAuth login
failed with 'Token exchange failed: HTTP Error 404: Not Found' and saved
no credentials.
Fix the whole bug class:
- Add _OAUTH_TOKEN_URLS [platform.claude.com, console.anthropic.com] in
agent/anthropic_adapter.py; _OAUTH_TOKEN_URL now points at the live
host for backward-compat with existing imports.
- run_hermes_oauth_login_pure() (CLI flow) iterates the list, first
success wins, mirroring the refresh path.
- hermes_cli/web_server.py (desktop dashboard flow) imports the list and
iterates it too, so the GUI login path is fixed identically.
Probe: console.anthropic.com/v1/oauth/token -> HTTP 404 (gone),
platform.claude.com/v1/oauth/token -> HTTP 400 (alive). Verified a real
Claude MAX OAuth login now succeeds end-to-end.
Most Matrix clients auto-set a room name when creating a DM (e.g.
"Alice & Bot" from participant display names), so the old
`is_direct and not has_explicit_name` heuristic classified virtually
all client-created DM rooms as "room", forcing require_mention gating
in legitimate one-on-one DMs.
member_count is now the primary DM signal: <=2 members means the room
is necessarily a 1:1 conversation, regardless of m.direct or an explicit
name. A room that grew to 3+ members but is still in stale m.direct is
still classified as a room (conflict flag set). Falls back to the
m.direct + name heuristic when the count is unavailable.
Also hardens _get_room_member_count with a joined_members API fallback
when the cache-backed state_store is empty.
Salvaged from #48554 by @justemu onto the current plugin adapter path
(gateway/platforms/matrix.py -> plugins/platforms/matrix/adapter.py).
Fixes#48551
Users who inspect ~/.hermes/sessions/sessions.json see only gateway entries
(e.g. agent:main:whatsapp:dm:...) and mistake it for the session index that
hermes sessions list / /sessions read — which is actually state.db. Issue
#49361 reported CLI sessions as 'invisible' on this premise.
- gateway/session.py: write a self-documenting _README sentinel at the top of
sessions.json explaining it's the gateway routing index and that ALL sessions
(CLI/TUI/gateway) live in state.db; skip _-prefixed keys on load so the
sentinel never round-trips into a SessionEntry.
- Harden every sessions.json reader against the sentinel: mcp_serve loader,
gateway/mirror.py, gateway/channel_directory.py all skip _-prefixed keys.
- docs/user-guide/sessions.md: warning callout naming the exact symptom.
- tests: assert prune ignores metadata sentinels; add round-trip coverage.
Component button interactions (approve/deny, slash confirm, model
picker, clarify) were not checking the pairing store for authorization.
Users approved via `hermes pairing approve` could send messages and use
slash commands (which go through the gateway authz_mixin), but button
clicks were rejected because `_component_check_auth` only checked
env-var allowlists (DISCORD_ALLOWED_USERS, GATEWAY_ALLOW_ALL_USERS,
etc.) and not the pairing store.
This was a regression from commit f6f363662 which intentionally made
component auth fail-closed when no allowlist is set (security fix for
GHSA-mc26-p6fw-7pp6), but did not account for pairing-based auth.
Fix: add a `PairingStore.is_approved("discord", uid)` check to
`_component_check_auth`, mirroring `authz_mixin._check_authorization`.
The pairing store check runs after all allowlist checks, preserving the
fail-closed behavior for non-paired, non-allowed users.
Fixes#50627
Regression coverage for the synthetic-assistant close: interrupt after a
successful tool must persist an assistant tail (placeholder when no
delivered text), real delivered text is preserved, and non-interrupted
or non-tool tails are left untouched.
Asserts resolve_runtime_provider honors target_model over the stale
persisted model.default when choosing the Bedrock dual-path api_mode:
Claude target -> anthropic_messages, Nova target -> bedrock_converse.
Both fail without the #49095 fix.
The 30-slot default could not fit Hermes's ~50 built-in commands, so
every skill command (and 20 built-ins) were silently dropped from the
Telegram \`/\` menu by default — they only worked when typed manually.
Raising the default to 60 keeps all built-ins plus common skill commands
visible out of the box while staying under Telegram's ~4KB payload limit.
Users can still tune it via platforms.telegram.extra.command_menu.
Adds a configurable Telegram BotCommand menu cap and priority list via
platforms.telegram.extra.command_menu (max_commands clamped 1..100;
priority_mode prepend|append|replace). Default cap stays 30; hidden
commands remain invokable when typed and /commands lists the full set.
Salvaged from PR #42021. Cherry-picked onto current main; the original
edited gateway/platforms/telegram.py, now relocated to
plugins/platforms/telegram/adapter.py.
The desktop composer threw an uncaught "Composer is not available" at
startup and the input went unresponsive (#49903). assistant-ui's composer
mutators (setText/send/…) throw when the thread's composer core isn't bound
yet; the read path is null-safe but the writes are not. ChatBar pushes draft
text via aui.composer().setText() from mount-time effects (draft restore,
clearDraft, external inserts), and the v0.17.0 popout refactor (#49488)
widened the unbound window by moving the composer out of the contain wrapper
into a sibling of the thread — so the throw surfaced as an uncaught error
that wedged the input.
Wrap every composer mutation in a setComposerText helper that swallows the
unbound-core throw. The contentEditable DOM + draftRef already hold the text
and the draft-editor sync re-applies it once the core attaches, so the draft
is never lost — only the premature state push is skipped.
Selective --clone / --clone-from / --clone-config copied .env but not
auth.json, silently dropping the credential pool — including OAuth tokens
(Anthropic `claude /login`, Codex, xAI) that never land in .env. A profile
cloned from an OAuth-authenticated default therefore resolved a different
provider (or none) than the source under provider: auto. --clone-all already
carried auth.json via the full copytree; only the selective path missed it.
Add auth.json to _CLONE_CONFIG_FILES and tighten it to 0o600 after copy,
matching .env semantics.
On Windows, start_server() served uvicorn via a bare asyncio.run(_serve()),
which uses the default ProactorEventLoop. uvicorn's socket-serving stack
assumes a SelectorEventLoop on win32 (uvicorn/loops/asyncio.py forces it, and
uvicorn.Server.run threads config.get_loop_factory() into its runner for
exactly this reason). Driving uvicorn on the proactor loop makes
server.startup() bind a socket that never accepts: the dashboard and desktop
backend print "Skipping web UI build" then hang forever with the port
LISTENING but no TCP handshake completing.
Fix is win32-scoped to keep the blast radius minimal: POSIX keeps the exact
asyncio.run(_serve()) it had (its default loop is already a SelectorEventLoop /
uvloop, which is what uvicorn serves on). Only on Windows do we mirror
uvicorn.Server.run and run on the loop factory uvicorn picks, with a fallback
to WindowsSelectorEventLoopPolicy for uvicorn < 0.36.
Fixes hermes dashboard and hermes desktop (the Electron app spawns a
hermes dashboard backend). The gateway symptom in the report has a separate
root cause (no uvicorn) and is not addressed here.
uperLu's #50958 renamed plugins/cron → plugins/cron_providers but left
two test files patching the now-gone plugins.cron.chronos.verify path,
which would fail collection. Point them at plugins.cron_providers.*.
Add uperLu to release.py AUTHOR_MAP.
The dashboard Profiles view showed "Gateway stopped" for a gateway that
is in fact running — while the sidebar status strip and `hermes gateway
status` (CLI) both correctly showed it running. Reported on v0.17.0
running the gateway + dashboard in one Docker container.
Root cause: three liveness surfaces with three detection strengths, all
reading the same `gateway.pid`:
- `hermes gateway status` -> find_gateway_pids() (process-table scan)
- sidebar /api/status -> get_running_pid() + gateway_state.json PID
fallback + health-URL probe
- Profiles view -> _check_gateway_running() = get_running_pid()
ONLY, no fallback
`get_running_pid()` short-circuits to None the moment the runtime lock
(`gateway.lock`) doesn't register as held by the *calling* process —
which is always true when the reader is a separate process from the
gateway (the dashboard is its own s6 service in the container), and also
for any launch-service-managed gateway that left a fresh
`gateway_state.json` but no live PID file. So the Profiles view alone
reported the live gateway as stopped.
Fix: give _check_gateway_running the same fallback the sidebar already
has — after the pid-file/lock check misses, validate the PID recorded in
that profile's gateway_state.json against the live process table via the
existing get_runtime_status_running_pid(). read_runtime_status() gains an
optional path arg so a profile's state file can be read without mutating
the process-global HERMES_HOME (preserving the contextvar-based profile
isolation the dashboard relies on). Backward compatible: every existing
caller passes no argument.
Tests: a regression test that fails pre-fix (live gateway, lock check
returns None -> must still report running) and a guard test that a
'stopped' state file is never reported running even with a live PID.
The contributor PR stamped runner._exit_code=78 on non-retryable startup
errors, but start_gateway()'s clean-exit branch returned True before the
SystemExit(runner.exit_code) site, so main() exited 0. The s6 finish
script's [ "$1" = "78" ] check never matched and s6 crash-looped the
gateway anyway — the fix was dead as shipped (#51228).
Honor runner.exit_code in the clean-exit branch: raise SystemExit(code)
when set, else return True (normal /restart clean exit). Add a
start_gateway()-level test that asserts process-level SystemExit(78)
propagation — the gap the PR's object-level test missed — plus exit_code
on the existing _CleanExitRunner mocks.
Profiles without their own messaging token inherit the default
profile's token via os.getenv, hit a token collision, and exit with
startup_failed. s6 restarts them immediately, creating ~30MB tirith
sandbox dirs in /tmp each cycle — filling the disk in hours (#51228).
Changes:
- gateway/restart.py: add GATEWAY_FATAL_CONFIG_EXIT_CODE = 78
- gateway/run.py: set exit_code=78 on non-retryable startup errors
(token collision, no platforms)
- hermes_cli/service_manager.py: add _render_finish_script() that
translates exit 78 → exit 125 (s6 permanent failure)
- hermes_cli/container_boot.py: write finish script alongside run
script during profile registration
The s6 finish script pattern follows docker/s6-rc.d/dashboard/finish.
Closes#51228
A naive ISO timestamp (e.g. 2026-06-22T20:07:00) was anchored to the
server's local timezone via dt.astimezone(), but the due-check
(get_due_jobs -> _hermes_now()) runs in the CONFIGURED Hermes timezone.
When the two diverge (cloud host on UTC with a different timezone: set,
or vice-versa) the stored instant lands hours off the user's wall-clock
intent, so one-shots never become due and recurring jobs fire at the
wrong time. The ticker stays healthy (heartbeat + success markers fresh)
because every tick finds nothing due, matching the silent no-fire in #51021.
Anchor naive timestamps to _hermes_now().tzinfo so '20:07' means 20:07 on
the same clock the scheduler checks against. The legacy _ensure_aware path
still treats already-stored naive values as server-local for back-compat.
Fixes#51021
The cron ticker only runs inside the gateway (_start_cron_ticker); there
is no standalone cron daemon. When the gateway isn't running, next_run_at
passes but jobs never fire and last_run_at stays null — and manual
'hermes cron run' (which bypasses the ticker) appears to work, masking
the real cause. This is the most common cron support report (#51038).
cron list already warned; extend the same warning to cron create (the
moment the user is most likely to hit this) via a shared helper, and add
a pointer to 'hermes cron status'. Silent when a gateway is running, so
the gateway /cron path is unaffected.
Launching Hermes from a directory that ships its own top-level package with a
Hermes-internal name (utils/, proxy/, ui/) crashed the gateway/TUI child with
an ImportError (exit 1, crash loop): from utils import atomic_replace resolved
to the user's package.
tui_gateway/entry.py already stripped the relative cwd forms ('' / '.'), but
the launch dir also reaches sys.path as its own ABSOLUTE path (venv activation
or a project that adds itself to PYTHONPATH), which the strip missed and which
sat ahead of the Hermes root.
Centralize a hardened guard in hermes_bootstrap.harden_import_path(): drop the
relative forms AND force the Hermes source root to the front even when an
absolute cwd entry is present. Wire it into tui_gateway/entry.py and
acp_adapter/entry.py (both spawn into arbitrary cwds); hermes_cli/main.py and
gateway/run.py already insert the root at front. gatewayClient.ts now also
exports HERMES_PYTHON_SRC_ROOT for defense in depth.
Builds on @wgu9's runtime-tracking fix: now that find_gateway_pids() can
see a no-supervisor `gateway restart` runtime, have stop_profile_gateway()
fall back to an orphan-aware, profile-scoped reap (SIGTERM then SIGKILL)
when the pidfile/runtime record is missing or stale. Closes the duplicate-
accumulation path in #51325 — a follow-up restart now kills the prior
orphan instead of stacking another listener on :8644. Gated on
not supports_systemd_services() so a transient `gateway restart` argv on
supervised hosts is never killed.
Also adds the AUTHOR_MAP entry for the salvaged contributor.
atomic_yaml_write (and two sibling config writers) called yaml.dump
without allow_unicode=True. The default personalities shipped in cli.py
contain emoji/kaomoji, so PyYAML escaped astral-plane chars as 8-digit
\\UXXXXXXXX sequences inside multi-line double-quoted strings wrapped
with \\ line-continuations. Stricter/non-PyYAML parsers, editors, and
hand-edits break that structure into unclosed quotes, failing the whole
config parse -> silent fallback to defaults -> custom_providers lost.
Add allow_unicode=True to the canonical writer plus tui_gateway/server.py
and the telegram adapter's atomic config write so config is written as
readable UTF-8 with no escape/fold artifacts.
Fixes#51356
deliver=origin (or omitted) from a TUI or classic-CLI session produces a
job with origin=null, because those sessions never populate the
HERMES_SESSION_PLATFORM/CHAT_ID context vars that _origin_from_env reads.
The scheduler then resolves no delivery target and skips delivery — the
job runs and saves output to last_output, but nothing reaches the user
and they only find out by polling cronjob(action='list') (#51568).
This is by design (local sessions have no live-delivery channel), so the
fix surfaces it instead of silently dropping the intent:
- cronjob create now appends an informational notice to its result when
a created job resolves to zero delivery targets and the user did not
explicitly ask for deliver='local'. The check uses the scheduler's own
_resolve_delivery_targets so it accounts for origin, home channels,
'all', and explicit platform targets — no false positives.
- PLATFORM_HINTS gains a 'tui' entry (the TUI had none) and the 'cli'
hint now states that cron jobs from these sessions are local-only and
that deliver must target a gateway-connected platform to notify the
user. This stops the agent promising a delivery that never happens.
No scheduler/delivery behavior change; no new env var; cron isolation
invariant untouched.
Adds the #51579 regression test the issue asked for: run the real
docker_config_migrate.py boot path twice (host-reboot scenario under
--restart unless-stopped) and assert $HERMES_HOME/.env survives
byte-for-byte and the second boot is a no-op (no re-migration, no new
backup). Exercises real migrate_config + real file I/O via subprocess.
spectrum-ts routes stream telemetry through @photon-ai/otel's createLogger,
which sends severity>=ERROR to console.error and WARN/INFO to console.log.
The two lines the health monitor keys off land on different channels:
log.error("stream persistently failing") -> console.error (caught), but
log.warn("stream interrupted; reconnecting") -> console.log (was missed).
The original interception patched console.error only, so the recovering->
degraded escalation counter never saw the interrupt bursts that are the
primary silent-inbound symptom. Verified live against spectrum-ts 3.1.0 +
@photon-ai/otel: 3 real log.warn('stream interrupted') calls now escalate
to degraded -> process.exit(75) -> adapter reconnect.
Adds a shared classifyStreamLog() fed by both console.error and console.log,
plus a regression test asserting both channels are intercepted.
Source-level regression guard (the script only runs on Windows, so there's no
runner on Linux CI). Asserts Resolve-AvailablePythonVersion exists, that
Install-Venv re-resolves the interpreter before the venv-creation line, and
that Test-Python and the resolver share the single $PythonFallbackVersions
constant so detection and venv creation can't drift apart again.
The Windows installer runs each -Stage NAME in its own powershell.exe under
Hermes-Setup.exe. Test-Python records a detected fallback (e.g. 3.12 when 3.11
is absent) via an in-memory $script:PythonVersion = $fallbackVer mutation,
which dies with the python stage's process. The fresh venv stage starts with
$PythonVersion back at its "3.11" default, so it logged "Creating virtual
environment with Python 3.11..." and ran uv venv venv --python 3.11, failing
with exit 2 on machines that only had the fallback installed.
Add a cross-process-safe Resolve-AvailablePythonVersion helper (preferring the
requested version, then the shared $PythonFallbackVersions list, probed via
uv python find) and call it at the top of Install-Venv before creating the
venv. Test-Python's fallback loop now iterates the same shared constant so
detection and venv creation can't drift.
The contract already documents the scale-to-zero PRIMITIVES (§3.2 going-idle/
buffered-flip, §3.3 wake poke) and what's out of scope. This adds the missing
half: the contract FROM the primitives TO the behaviour layer — the guarantees
a separate scale-to-zero workstream must honour to consume them safely (register
a wakeUrl before suspend; drain+ack before teardown; keep the reconnect loop
live; treat suspended != down in the health model; don't assume exactly-once/
prompt wake; suspend only when genuinely idle, composing with the existing drain
machine). Docs-only; lets the independent scale-to-zero stream build against a
written contract instead of re-reading the connector.
The cherry-picked commit is authored by jinhyuk9714@gmail.com (GitHub
sjh9714); the check-attribution CI gate requires every PR commit author
to be present in scripts/release.py AUTHOR_MAP.
_ensure_uv_for_termux only checked resolve_uv() (the managed
$HERMES_HOME/bin/uv) before falling back to pip, so a uv installed via
`pkg install uv` lives on PATH but is invisible to the helper. Combined
with the cherry-picked wheel-only fallback, a Termux user with no managed
uv still hit `pip install uv`, which has no Android wheel and tried to
source-build the Rust crate, OOM-killing low-memory devices.
Probe shutil.which("uv") right after the Termux guard and reuse it before
pip. Add a regression test that keeps resolve_uv() returning None while a
uv exists on PATH and asserts pip is never invoked.
Register a per-instance wakeUrl and forward it to the connector at
self-provision so a suspended gateway can be poked awake when buffered
work arrives (pairs with the connector-side WakePoker).
- relay_wake_url() resolver (env GATEWAY_RELAY_WAKE_URL, then
gateway.relay_wake_url in config.yaml), mirroring relay_instance_id()
- thread wake_url through _post_provision (adds wakeUrl to the body only
when set) + self_provision_relay (resolve, forward, log)
- hermes gateway enroll --wake-url <url> persists GATEWAY_RELAY_WAKE_URL
- document the §5.2 wake poke in relay-connector-contract.md §3.3
- tests: relay_wake_url resolution (env/config/absent), provision
forwarding, body-only-when-set (6 new; 130 relay tests pass)
The actual reconnect+drain on wake is Unit B's loop; this unit only
wires the wake SIGNAL. Opt-in: absent wakeUrl => connector never pokes.
install_pet now refuses spritesheet/pet.json URLs that aren't on a petdex
host (matching thumbnail_png's existing _is_petdex_host guard), so a
spoofed manifest can't redirect a download at an arbitrary host. Slugs
are normalized to a single path segment before indexing into pets_dir(),
closing a path-traversal vector in load_pet/remove_pet/install_pet.
The gateway half of the going-idle/buffered-flip primitive (scale-to-zero
PRIMITIVE, not the behaviour). Integrates with the EXISTING drain transition:
- ws_transport: `go_idle()` sends `going_idle` + awaits the connector's
`going_idle_ack` (connector-authoritative flip-then-ack, Q-5.3c — stays
serving until the ack so nothing is lost in the flip window); acks a buffered
inbound (bufferId present) via `inbound_ack` after the handler runs
(drain-without-dup on the delivery leg); NET-NEW reconnect loop re-dials +
re-handshakes after an unexpected close (off by default, on in production).
- adapter: emits `going_idle` from its existing `disconnect()` drain seam before
tearing down the socket; best-effort + guarded (never blocks shutdown).
- transport Protocol + contract doc §3.2 document the 3 new frames.
+6 relay tests (124 pass). NOT in scope: the autonomous idle timer / machine
suspend / NAS health model (deferred behaviour). Ben's relay-adapter solo lane.
* fix(windows): harden gateway scheduled task
* fix(windows): launch gateway scheduled task via console-less wscript
The Scheduled Task ran the gateway through cmd.exe, which allocates a
console. During logon Windows broadcasts CTRL_CLOSE_EVENT to console
process groups, reaping cmd.exe and the half-initialized gateway with
STATUS_CONTROL_C_EXIT (0xC000013A) - which Task Scheduler treats as a
user cancel, so RestartOnFailure never fires and the gateway vanishes on
every reboot (issue #45599 root cause #1).
Add a console-less .vbs launcher (wscript.exe -> pythonw.exe, both
GUI-subsystem) mirroring the gateway.cmd env + argv, and point the task
action at it. The .cmd stays for the Startup-folder fallback and /Run.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Jeff <jeffrobodie@gmail.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
When _auto_create_thread() creates a thread from a user message via
message.create_thread(), Discord fires a second MESSAGE_CREATE event
for the 'thread starter message'. That starter message carries
message.id == thread.id and may arrive with type=default instead of
type=21 (thread_starter_message), so the existing type filter in
on_message does not catch it — triggering a second call into
_handle_message and thus a second agent run and response.
Fix: after _auto_create_thread succeeds and returns a thread, pre-seed
the dedup cache with str(thread.id) via self._dedup.is_duplicate().
The dedup cache is the same TTL-based MessageDeduplicator that already
guards against Discord RESUME event replays. Calling is_duplicate()
marks the ID as seen; when the duplicate thread-starter MESSAGE_CREATE
arrives, on_message's guard returns True and the event is dropped.
This is a minimal, targeted fix:
- No new state: reuses the existing _dedup instance
- No timing/race: the pre-seed happens synchronously inside the async
_handle_message, before the thread-starter event can be dispatched
- Scoped: only fires when auto-threading is enabled AND thread creation
succeeds (thread object is not None)
Also adds tests in tests/gateway/test_discord_double_dispatch.py
covering the pre-seed behaviour, failure modes (thread creation fails,
auto-thread disabled), and dedup cache integrity.
Closes#51057
Required by contributor-check/check-attribution before salvaging PR #51129
(Discord thread-starter dedup, #51057). The CI step greps AUTHOR_MAP by
exact email and does not special-case noreply addresses.
_session_task_is_stale() failed to detect a stale session lock when the owner
task completed and cleaned _session_tasks (del in _process_message_background's
finally) but _active_sessions was NOT released because _release_session_guard
skipped on a guard mismatch (a concurrent reset/new command or drain handoff
swapped _active_sessions[key] to a different guard). With no owner task left to
inspect, _session_task_is_stale reported 'not stale', the orphaned guard was
never healed, and the session deadlocked permanently — later messages received
but never dispatched.
Reorder the finally cleanup to release-then-conditional-delete: release the
guard first, then drop the _session_tasks entry ONLY if the guard was actually
released (session_key no longer in _active_sessions). On a guard mismatch the
done-task entry survives, so the on-entry self-heal (_session_task_is_stale ->
_heal_stale_session_lock) detects the stale lock and clears it on the next
inbound message.
Extracted the cleanup into a callable _cleanup_finished_session_task() helper so
the regression test drives the REAL production code path rather than a copy of
its logic (the original test inlined the fixed logic and passed regardless of
the production order — mutation-verified the rewritten tests now fail on the
buggy del-first order). Added a positive-path test (guard matches -> release +
delete) so both branches are pinned.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Open-ended skill learning across every surface. /learn <free text> takes a
description of any source — a directory, a URL, the workflow you just walked
the agent through, or pasted notes — and the live agent gathers it with the
tools it already has (read_file/search_files, web_extract, the conversation,
the pasted text), then authors a SKILL.md via skill_manage following the
house authoring standards (<=60-char description, the standard section order,
Hermes-tool framing, no invented commands).
No engine, no model-tool footprint, works on any terminal backend (local,
Docker, remote): /learn builds a standards-guided prompt and hands it to the
agent as a normal turn.
- agent/learn_prompt.py: shared standards-guided prompt builder
- /learn registry entry (both surfaces) + CLI handler (inject onto input
queue) + gateway handler (rewrite turn, fall through, /blueprint pattern)
- tui_gateway command.dispatch returns a send directive -> TUI + dashboard chat
- dashboard Skills page 'Learn a skill' panel (dir + URL + open-ended text)
composes a /learn request and runs it in chat
- docs (slash-commands ref + skills feature page), 11 targeted tests
Inspired by OpenAI Codex's Record & Replay and the /learn concept from #47234
(dir-distillation engine); reworked to be open-ended and engine-free per
review.
PTB's HTTPXRequest builds its httpx.AsyncClient with
`limits = httpx.Limits(max_connections=connection_pool_size)` and no
keepalive tuning, so httpx's default keepalive_expiry=5.0 applies. Behind
an HTTP proxy (Cloudflare Warp etc.) a peer-initiated FIN can sit in
CLOSE_WAIT longer than that, leaking fds in the general request pool
(_request[1], which routes bot.send_message/set_my_commands) — the pool
_drain_polling_connections never resets. Telegram was the lone holdout
adapter not using the shared #18451 CLOSE_WAIT helper.
Wire gateway.platforms._http_client_limits.platform_httpx_limits() into
the httpx client across ALL THREE request-construction branches —
fallback-transport, proxy, and plain — via httpx_kwargs["limits"], which
PTB spreads last into its client kwargs so our tuned limits win. PTB's
connection_pool_size (max_connections) is preserved; only keepalive
behaviour is tightened (max_keepalive_connections + keepalive_expiry<5.0).
The fix is macOS-import-safe: no Linux-only socket TCP_KEEPIDLE/INTVL/CNT
constants at module scope (unlike the broken candidate which crashed on
import on the reporter's OS), and it patches the actual proxy path the
repro hits rather than TelegramFallbackTransport, which the proxy repro
never instantiates.
Adds a mutation-survivable behavior-contract test asserting every
HTTPXRequest built by connect() receives httpx_kwargs["limits"] with
keepalive_expiry < httpx's 5.0 default, across both the proxy and plain
branches. Reverting the limits wiring fails the test.
Co-authored-by: indigokarasu <mx.indigo.karasu@gmail.com>
When a session rotates id on compression, _sync_session_key_after_compress()
re-anchored the session_key, approval-notify routing, yolo state, and slash
worker — but never moved the active-session lease, which stayed keyed to the
pre-compression id. And _find_live_session_by_key() matched live sessions on
the stale session_key, not the live agent's current agent.session_id. After
compression a resume/create path failed to recognize the existing live agent
and could build a SECOND live agent against the same DB continuation -> forked
lineage / cross-session message mixing.
- active_sessions.transfer_active_session(): move a lease in place to the new
id under the exclusive file lock (no slot drop).
- gateway _transfer_active_session_slot(): call it inside
_sync_session_key_after_compress(); on the rare fallback (entry pruned)
RESERVE the new slot before releasing the old lease (reserve-before-release),
so a concurrent gateway at the session cap cannot grab the freed slot in a
release-then-reacquire window and leave this session with no lease; if the
reserve fails, keep the existing lease (review fix).
- _session_lookup_key(): make live-session lookup authoritative on
agent.session_id, wired into all stale-session_key consumers
(_find_live_session_by_key, _session_live_item, _live_session_payload) —
fixes the whole lookup class.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
/reload-mcp -> shutdown_mcp_servers -> _kill_orphaned_mcp_children(include_active=True)
-> _send_signal -> killpg(pgid, SIGTERM). When a tracked MCP stdio child shares
the gateway's OWN process group, killpg delivers SIGTERM to the gateway itself,
firing its SIGTERM handler -> os._exit(0): /reload-mcp crashes the gateway.
Pre-compute the gateway's own pgid (os.getpgrp(), None on Windows/restricted)
and, in _send_signal, skip killpg when pgid == own pgid, falling through to the
per-pid os.kill path so the child is still reaped without self-signaling.
Adds a regression test (folded in) that pins the guard: with a tracked pgid
equal to the gateway's own pgid, killpg is never called for that pgid and the
per-pid kill fallback is used. Mutation-checked.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Map Hermes xhigh→max to unlock DeepSeek V4's 'Max thinking' tier
through Ollama Cloud's OpenAI-compatible /v1/chat/completions endpoint.
low/medium/high pass through unchanged; disabled/none suppress
reasoning entirely.
Empirically confirmed: reasoning_effort:max produces ~2.5× more
thinking tokens than high on deepseek-v4-pro:cloud (1576 vs 642).
Parity with the classic CLI status bar's ⛓ indicator (PR #51441). The
Ink TUI status bar now shows ⛓ N for live background/async subagents
(delegate_task batches + background single delegations).
- tui_gateway/server.py: _get_usage() embeds active_subagents from
tools.async_delegation.active_count() — the same registry the CLI
reads — onto the existing per-update usage payload, guarded so a
raising active_count() leaves the field off without breaking usage.
- ui-tui appChrome: new 'subagents' status segment (breakpoint w>=92,
slots between bg and cost in the shed-order), renders ⛓ N from
usage.active_subagents.
- Usage / SessionUsageResponse types gain active_subagents?.
Distinct from the turn-scoped SpawnHud / /agents overlay, which mirror
live in-turn subagent.* events; this is the persistent registry count.
By default `hermes slack manifest` opts the app into Slack's AI Assistant
container (assistant_view feature + assistant:write scope +
assistant_thread_* events). Slack then renders DMs as the right-hand
Assistant split-pane, where every exchange is a thread and bare slash
commands (/help, /new, ...) are not delivered as normal command events —
they only work when the bot is @mentioned. There was no way to opt out
short of hand-editing the generated JSON.
Add --no-assistant to emit a flat-DM manifest that omits those three
pieces, so DMs render as a normal chat and slash commands dispatch
inline. The regular messaging surface (Messages tab, slash commands,
Socket Mode, channel + DM scopes/events) is preserved in both modes.
Default behaviour is unchanged (assistant mode still on).
Tests: cover both manifest modes and the argparse wiring.
The classic prompt_toolkit status bar already shows two background
indicators: ▶ N (/background agent threads) and ⚙ N (shell processes
spawned by terminal(background=true)). Background/async subagents
(delegate_task batches and background single delegations) had no
indicator despite being long-running work the user should be able to
see at a glance.
Add a third indicator ⛓ N sourced from
tools.async_delegation.active_count() — the count of delegations still
in the 'running' state. Renders in the plain-text builder and the
styled-fragment builder across the same width tiers as the other two
(omitted on the narrow <52 tier), guarded so a raising active_count()
leaves the snapshot at 0.
Adds a per-platform display.reasoning_style setting (code | blockquote |
subtext) controlling how the show_reasoning summary renders on the gateway.
Discord defaults to "subtext" (-# small grey metadata text); every other
platform keeps the fenced code block. Resolves through the existing
display.platforms.<platform>.reasoning_style override chain.
Replace the old "skips download when a system browser exists" assertions with
tests for the new behavior:
- no PATH scan for browser command names, and the "use the system browser" path
is gone;
- find_system_browser consults only an explicit AGENT_BROWSER_EXECUTABLE_PATH
override (which still skips the bundled download);
- strip_snap_browser_override runs on both install paths and a /snap/* path is
rejected, so already-affected installs auto-recover on update.
The installer scanned PATH/well-known locations for a Chrome/Chromium binary
and, when found, skipped the bundled Playwright Chromium download and wrote that
path into ~/.hermes/.env as AGENT_BROWSER_EXECUTABLE_PATH. On Snap-based systems
`command -v chromium` resolves to /snap/bin/chromium, whose sandbox blocks
agent-browser's control socket under /tmp -- so every browser_navigate hung
until the 60s timeout fired ("opening web page failed").
Drop the system-browser fallback entirely (per maintainer direction):
find_system_browser()/Find-SystemBrowser now honor ONLY an explicit, user-set
AGENT_BROWSER_EXECUTABLE_PATH override -- no PATH scan, no well-known-path scan.
A /snap/* path is rejected even when set explicitly, since its confinement is
the bug. Applied to both install.sh (Linux/macOS) and install.ps1 (Windows).
Crucially, also auto-repair already-affected installs: the bad snap path
persists in .env and is read directly by the runtime, and the installer skips
re-config when AGENT_BROWSER_EXECUTABLE_PATH is already set ("already
configured"), so a plain reinstall/update never recovered an existing user. New
strip_snap_browser_override() removes a snap-pointing AGENT_BROWSER_EXECUTABLE_PATH
(and its auto-written comment) from .env on every install/update, run from both
browser-setup paths (install_node_deps and ensure_browser), so updating is
enough to recover. A deliberately-set non-snap override is left untouched.
docker/stage2-hook.sh is intentionally untouched: it discovers the bundled
Playwright Chromium, not a system browser.
ci: centralize path-gating behind single orchestrator + all-checks-pass
gate
Replace the scattered per-workflow detect-changes pattern with a single
ci.yml orchestrator that runs the classifier once, then conditionally
calls sub-workflows via workflow_call based on lane outputs. A final
all-checks-pass job (if: always()) aggregates all results so branch
protection only needs to require one check.
Changes:
- New .github/workflows/ci.yml orchestrator (detect + conditional calls
+ all-checks-pass gate)
- Extend classify_changes.py with scan/deps/mcp_catalog lanes, absorbing
supply-chain-audit's internal changes job
- Update detect-changes/action.yml to expose the new lane outputs
- Convert all 10 PR-gated sub-workflows to workflow_call-only triggers,
removing their push/pull_request triggers and per-step detect-changes
guards (gating now happens at the orchestrator level)
- lint.yml + supply-chain-audit.yml receive event_name as a
workflow_call
input to replace github.event_name (which is "workflow_call" inside
called workflows)
- supply-chain-audit.yml: remove internal changes job + *-gate jobs
(orchestrator handles gating, booleans arrive as inputs)
- contributor-check.yml: remove internal filter step
- Update test_classify_changes.py for 6-lane output + new supply-chain
test cases
`npm ci` / `uv sync` / toolchain header fetches occasionally die on
transient network blips — e.g. node-pty's node-gyp fetching Node headers
(an undici assert) during the typecheck job's `npm ci`, which killed the job
before `tsc` ever ran. "Re-run and it goes green" is exactly what CI should
do itself.
- New reusable `.github/actions/retry` composite action wraps a command and
retries on failure (3x / 10s, command passed via env so it can't inject).
Applied to every PR-path network install: npm ci (typecheck, desktop
build, docs site), uv sync (tests, e2e), uv tool install (lint),
pip install (docs site).
- typecheck now runs `npm ci --ignore-scripts`: `tsc` needs only sources +
type defs, so skipping install scripts drops node-pty's native rebuild
(whose header fetch was the flake) and is faster. Validated locally — tsc
passes for ui-tui, apps/shared, and apps/desktop with scripts skipped.
- ripgrep download uses `curl --retry`.
Docker (main-only) and the release/windows workflows are intentionally left
for a follow-up.
The image build + smoke test + integration suite are the heaviest jobs in CI
(~9-11 min) and ran on every PR. Gate them to push-to-main and release: a
broken build surfaces on the main push, while the cheap pre-merge guards
(docker-lint hadolint/shellcheck, uv-lockfile-check) still run on PRs to
catch the common Dockerfile/lockfile breakage. Steps skip on PRs so the job
stays green; the dead PR-only arm64 cache-warm build is removed.
Heavy PR checks run on every PR because the workflows deliberately avoid
`on.paths` filters — a path-gated workflow leaves its required check pending
forever when no matching file changes, blocking merge. So a docs-only PR
still spins up the TypeScript matrix, the full Python suite, and ruff/ty.
Keep every workflow triggering on every PR (checks always report) but gate
the expensive *steps* on what the PR touches. Skipping a step (not the job)
leaves the job green, so required checks never hang — the same idiom already
proven in contributor-check.yml.
A classifier (scripts/ci/classify_changes.py) maps the PR diff to three
lanes — python, frontend, site — surfaced as step outputs by a composite
action (.github/actions/detect-changes). Fail-open: an empty diff or any
.github/ change runs everything; python is a denylist (skipped only when
every file is provably prose or a frontend-only package); skills/**/SKILL.md
counts as python-relevant since the skill-doc tests read that tree. Non-PR
events always run the full pipeline.
A Medium-integrity Hermes agent cannot drive High-integrity (admin)
windows on Windows — UIPI blocks UIA enumeration and mouse injection
(SOM returns 0 elements, clicks silently no-op, screenshots still work,
keyboard partially bypasses). OS constraint affecting every Windows
automation stack, not a cua-driver bug. Document the symptom + the
run-elevated workaround. Closes#49067.
Follow-up to the salvaged voice-clip fix: the rerouted video/mp4 branch
used {".m4a": "audio/mp4"}.get(ext, "audio/mp4"), whose sole key's value
equals the default, so it always returned "audio/mp4" regardless of the
cached extension (dead lookup + a throwaway dict per inbound voice clip).
Replace it with a module-level _SLACK_EXT_TO_AUDIO_MIME map so the reported
media_type matches the bytes we cached (e.g. a clip cached as .wav now
reports audio/wav instead of audio/mp4). STT routing already keys on the
audio/ prefix + cached filename extension, so behavior is unchanged; this
just removes the dead construct and keeps the reported mimetype coherent.
Slack in-app voice clips ("record a clip") arrive as MP4/AAC containers
(mimetype audio/mp4, filename audio_message*.mp4), and Slack sometimes
labels them video/mp4. The inbound audio handler derived the cache
extension from the mimetype and fell back to ".ogg" for anything not in
{.ogg,.mp3,.wav,.webm,.m4a} — so audio/mp4 voice messages were cached as
.ogg. OpenAI STT (whisper-1, gpt-4o-transcribe) sniffs the container from
the FILENAME extension, so it received MP4 bytes named .ogg and rejected
them. WhatsApp .ogg and uploaded .m4a worked only because their extension
happened to match the bytes.
Fix:
- _resolve_slack_audio_ext(): pick the cache extension from the real
filename first, then a mimetype map (audio/mp4 -> .m4a), defaulting to
.m4a — never the bogus .ogg fallback. Mirrors the video branch and the
audio map already in gateway/platforms/bluebubbles.py.
- _is_slack_voice_clip(): detect audio-only clips mislabeled video/mp4
via the slack_audio subtype / audio_message* filename, and route them
through the audio path (cached as audio, reported as audio/*) so they
reach STT instead of video understanding. Genuine videos (and
slack_video screen recordings) are left on the video path.
Verified end-to-end against a real audio-only MP4: old path cached it as
.ogg (ffprobe shows MP4 bytes -> container mismatch -> OpenAI rejects);
new path caches it as .mp4 (extension matches bytes -> accepted).
Adds inbound-audio tests (previously none): helper unit tests plus
_handle_slack_message E2E coverage for audio/mp4, video/mp4-mislabeled
voice clips, and a real video staying on the video path. Confirmed the
two voice-message tests fail without the fix (mutation check).
The gateway half of Phase 6 Unit ζ: project the agent's existing relevance
knobs into the connector's platform-agnostic vocabulary and declare them at boot
over the /relay/policy route, so the SAME mention-gating / free-response /
allow-bots behavior the agent applies directly also governs relay delivery (and
excluded chatter never wakes a scaled-to-zero agent).
- gateway/relay/__init__.py:
- relay_relevance_policy(): project require_mention -> requireAddress,
free_response_channels -> freeResponseScopes, {PLATFORM}_ALLOW_BOTS in
{mentions,all} -> allowOtherBots. Reads the fronted platform's config block
+ bridged top-level keys. Returns None when all-default (the connector's
quiet default already matches) or no concrete platform is fronted.
- send_relay_policy(): POST /relay/policy authenticated with the gateway's own
per-gateway upgrade token (make_upgrade_token — same bearer as the WS
upgrade), so the connector attaches it to the authenticated instance, never
a body-asserted id. Re-declares every boot (self-healing, full replace).
NEVER raises, NEVER blocks boot — relevance is an optimization layered on
the δ/ε authorization gate. Reuses the per-gateway secret + the
/relay/provision host; no new inbound surface, no new credential.
- _policy_url(): ws(s)://…/relay -> http(s)://…/relay/policy.
- gateway/run.py: call send_relay_policy() after register_relay_adapter()
succeeds (the secret is resolved by then).
- docs/relay-connector-contract.md: new §7 documenting per-instance delivery +
the management plane (/manage/* + /relay/policy) + the relevance-declaration
contract; versioning renumbered to §8. Contract conformance test stays green
(§2/§3 tables untouched).
Tests: +12 (projection mapping incl. comma-string + top-level fallback; send
auth/skip/fail-soft/non-200). Full relay suite 118 pass. The connector route is
already E2E-proven (connector repo gateway_policy_driver.py); this adds the real
gateway send-path it pairs with.
This completes Phase 6 (Team Gateway per-user isolation) end to end.
A "one-shot" is a single stateless model call that runs OUTSIDE any conversation:
it never touches session history, never breaks prompt caching, and returns plain
text. UI surfaces need this for small generative chores — a commit message from a
diff, a rename suggestion, a summary — where an agent turn would pollute the
thread and hand-rolling an LLM call at every call site would be worse.
- `agent/oneshot.py`: `run_oneshot(...)` over the existing auxiliary-client
plumbing (same path as title generation). Two call shapes: explicit
instructions/input, or a registered `template` + `variables` (templates own the
prompt engineering so it stays consistent across CLI/TUI/desktop). Ships a
`commit_message` template. Model selection inherits the live session via
`main_runtime`, else the configured aux `task` backend.
- `tui_gateway/server.py`: `llm.oneshot` RPC (long-handler) inheriting the
session's model when `session_id` resolves.
Stateless by construction — no session mutation, cache untouched.
Follow-up to the coding-context posture (#43316): that PR detects each repo's
verify loop (manifests, package manager, exact test/lint/build commands, context
files) and bakes it into the system-prompt snapshot — but only as a string, for
the model. Non-prompt consumers (the desktop verify UI) had no way to read it
without re-sniffing and drifting from the prompt.
Split detection from rendering, keeping one source of truth:
- `detect_project_facts(root) -> ProjectFacts` (frozen) holds the structured
facts; `_project_facts()` now renders it into the same snapshot lines, so the
prompt block stays byte-identical (cache-safe).
- `project_facts_for(cwd)` resolves the workspace root (git, else marker) and
returns the structured facts, or None outside a workspace.
- `project.facts` gateway RPC surfaces it to any client (desktop/TUI/ACP).
Tests assert the structured output and that the UI-facing commands never drift
from what the prompt block renders (one detector feeds both).
* Revert "fix(cron): scope job execution to its owning profile (#32091 follow-up) (#50993)"
This reverts commit 660e36f097.
* Revert "fix(cron): anchor cron storage at the default root home (not the active profile)"
This reverts commit a5c09fd176.
Register previewable artifacts from the tool row, feed a session-scoped store,
and render compact rows above the composer. Remove the inline preview card.
* feat(memory): OAuth token storage and refresh for the Honcho provider
* feat(memory): refresh the Honcho OAuth token in the client and session
* feat(memory): zero-CLI loopback OAuth authorization flow
* feat(memory): generic memory-provider OAuth connect endpoints
* feat(desktop): memory-provider OAuth connect link
* feat(memory): CLI OAuth sign-in with source-tagged authorize links
* fix(memory): IP-literal loopback redirect and consent config_path on the authorize link
* fix(memory): profile-scope the memory-provider OAuth endpoints
* refactor(desktop): generic memory-provider OAuth client functions
* docs(memory): trim OAuth module docstrings to the invariants
* docs(memory): document OAuth connect as an optional auth method
* fix(memory): send home-relative display path to consent, not the absolute path
* perf(memory): cache OAuth token expiry in memory to skip the hot-path disk read
* fix(memory): log OAuth refresh failures at warning, not debug
* feat(memory): fall back to an OS-assigned loopback port when 8765 is taken
* test(memory): cover the desktop Connect launcher, status, and provider dispatch
* fix(desktop): keep the memory-provider dropdown one size regardless of connect state
* fix(desktop): move the memory connect link to the description line, leaving the dropdown untouched
* refactor(memory): move OAuth connect routes out of web_server into a memory-layer router
* refactor(desktop): import MemoryConnect directly, drop the single-export barrel
* fix(memory): launch CLI OAuth sign-in right after the auth choice, not after the wizard
* fix(desktop): auto-clear the OAuth error state instead of leaving it sticky
* test(honcho): isolate auth-method prompt from deployment-shape wizard tests
main's wizard suite scripts the cloud prompts without the OAuth auth-method step; auto-answer it in the shared helper so the answer lists stay shape-only.
* docs(honcho): document query-adaptive reasoning level (reasoningHeuristic)
README never mentioned reasoningHeuristic and listed reasoningLevelCap as an orphaned cap with the wrong default (— vs "high"). Add the query-adaptive scaling note + the reasoningHeuristic/reasoningLevelCap rows (grouped under Dialectic & Reasoning), matching the wording already on the hosted honcho.md page, and add a pointer from the memory-providers overview.
* fix(honcho): default the CLI peer prompt to the OAuth consent name
The CLI runs the grant with apply_config=False, so the peerName the user just entered at consent was dropped and the wizard's 'Your name' prompt fell back to $USER. Surface it as a transient OAuthCredential.consent_peer_name (set even when config isn't merged) and seed the prompt default from it.
* feat(honcho): split OAuth client_id by surface (cli=hermes-agent, desktop=hermes-desktop)
resolve_endpoints now picks the client_id from the initiating surface and
threads it through authorize -> token exchange -> persisted grant -> refresh,
so the CLI and desktop register as distinct OAuth clients. Surface-specific
env overrides (HONCHO_OAUTH_CLIENT_ID_CLI/_DESKTOP) win over the generic
HONCHO_OAUTH_CLIENT_ID, which still overrides every surface.
* feat(honcho): show OAuth vs API key in status; detect existing OAuth in setup
status now prints 'Auth: OAuth (clientId, token valid Xm/expired)' instead of
masking the OAuth access token as a generic API key; setup notes an existing
OAuth grant when re-run.
* docs(honcho): drop 'shared pool' wording from unified observation mode help
* fix(honcho): cross-process lock around OAuth refresh to prevent grant revocation
The in-process threading lock can't stop a sibling process (another profile or
the desktop app sharing honcho.json) from replaying the single-use refresh
token and tripping reuse-detection, which revokes the whole grant. Guard the
read-refresh-persist section with an OS file lock on <config>.lock so only one
process rotates at a time; the others re-read the freshly-persisted token.
Best-effort: platforms without flock degrade to in-process serialization.
* refactor(honcho): one OAuth client (hermes-agent) for all surfaces
Collapse the per-surface client_id split. CLI and desktop now use a single
client_id (hermes-agent); consent branding/UI still adapt via the source query
param. One grant identity means no clientId-vs-refresh-token desync that could
get the grant revoked. HONCHO_OAUTH_CLIENT_ID still overrides for self-hosting.
* fix(honcho): per-session resolves to session_id, never remapped by title
Reorder resolve_session_name so stable identifiers win over labels: gateway
per-chat key first, then the per-session session_id, then the cwd map / title.
A (possibly auto-generated) title can no longer remap a live per-session
conversation onto a second Honcho session mid-stream — fixes the desktop, which
is per-conversation via session_id. Consequence: a gateway's per-chat key now
also wins over a title (titles never remap a stable id).
Adds a compact right-edge prompt timeline for long desktop chat sessions, with hover previews, click-to-jump, active/hover row states, and pane hover-reveal suppression so the rail can live at the hard edge without opening side panels.
The subprocess-stdin guard (TUI gateway fd-inheritance protection) flagged
the `permissions grant` call. None of the cua-driver probes/grant read
stdin, so DEVNULL is correct; apply it to the shared `_run` helper and the
grant call.
The card was macOS-only. cua-driver also runs on Windows and Linux, so
fold `cua-driver doctor` (cross-platform binary/health probes) into a
single OS-aware `ready` signal:
- macOS: ready == both TCC grants; keeps the permission rows + grant flow.
- Windows/Linux: no TCC toggles, so ready == driver health, with a
per-OS note (SmartScreen/UIAccess on Windows; X11/XWayland on Linux).
`computer_use_status()` replaces the macOS-only `permissions_status()` and
surfaces `platform`, `ready`, `can_grant`, and the doctor `checks` (non-ok
ones render as warnings). CLI `permissions status`, the REST endpoint, and
the desktop card all key off the one payload. Grant stays macOS-only (400
elsewhere — nothing to grant).
Vision mode called a `screenshot` MCP tool that cua-driver dropped in
0.5.x (full-window PNG capture was folded into `get_window_state`). The
driver replied "Unknown tool: screenshot", so `images` came back empty,
`png_b64` stayed None, and capture returned a 0x0 result with no image
on every call. `som`/`ax` were unaffected because they already use
`get_window_state`, which masked the regression.
Route vision by capability:
- driver advertises `screenshot` (older builds) -> use it (no AX walk)
- otherwise -> call `get_window_state` but discard the AX tree/elements,
returning only the PNG so vision stays free of element noise
- capabilities not yet discovered -> try `screenshot`, fall back to
`get_window_state` on an empty image, so the path self-heals
Add `_image_from_tool_result` to pull the PNG from either an MCP image
content-part or `structuredContent.screenshot_png_b64`, and use it on
the som path too so the image won't silently drop on driver builds that
deliver it via structuredContent instead of a content part.
Verified live (vision: 1568x954, 0 elements; som: image + 527 elements)
and with unit coverage of all four routing cases.
Computer Use already worked through the desktop backend (the cua-driver
toolset enables + installs via Settings -> Skills & Tools), but there was
no in-app way to see or grant the two macOS permissions it needs, so "give
a model my Mac" was tribal knowledge.
The grants attach to cua-driver's OWN TCC identity (com.trycua.driver /
the installed CuaDriver.app), not Hermes -- so no app entitlement is
involved. cua-driver 0.5+ exposes `permissions status/grant`, which we wrap:
- tools/computer_use/permissions.py: thin client over the two subcommands
- hermes computer-use permissions {status,grant}: CLI parity
- GET /api/tools/computer-use/status, POST .../permissions/grant: desktop REST
- ComputerUsePanel: live Accessibility + Screen Recording state with a
Grant button (dialog attributed to CuaDriver), shown in the expanded
Computer Use toolset row. Binary install stays in the existing provider
post-setup runner.
Follow-ups: i18n the card copy; a "Stop driver" control (cua-driver stop)
for the runaway-`serve` case.
Adds auxiliary.background_review.{provider,model} (default auto = main chat
model — unchanged). Set it to a different, cheaper model and the post-turn
self-improvement review runs there for ~3-5x lower cost.
Cache-aware by design: the main chat is warm in the prompt cache, so the
default full-history replay on the main model is cheap cache reads — left
exactly as-is. A different model can't reuse that cache (different key), so
when (and only when) routed to a different model the fork replays a compact
digest instead of the full transcript, minimising what it cold-writes on the
aux model. Same model -> full replay; different model -> digest.
Quality holds in benchmarks: memory capture identical, skill near-identical.
Nothing changes unless you opt in by naming a different model.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
The #32091 fix moved every profile's cron jobs into one shared root store,
but never wired the execution-scoping half it recommended: a job still ran
under whichever profile's ticker picked it up, not its owning profile. So a
job created under `hermes -p donna` could execute with the root profile's
.env / config.yaml / credentials.
- jobs.py: create_job auto-captures the active profile (explicit profile=
override available) and stores it on the job; resolve_profile_home() maps a
profile name to its HERMES_HOME; legacy jobs backfill to 'default'.
- scheduler.py: run_job applies the job's profile via a scoped HERMES_HOME
override (env var + in-process ContextVar) before any .env/config/script
load, restored in finally. tick() routes profile-mismatched jobs to the
single-worker sequential pool so the env mutation can't race.
- cronjob tool threads profile through (NOT exposed in the model schema, to
avoid cross-profile privilege escalation); hermes cron add gains --profile.
E2E verified against a temp HERMES_HOME with a real profile dir: a root-profile
ticker runs a profile='donna' job with HERMES_HOME=donna during execution and
restores the ticker env afterward.
File tools (read_file, write_file, patch, list_directory, etc.) used
os.path.expanduser() which reads the gateway process HOME env var.
In Docker/systemd/s6 deployments where the gateway HOME differs from
interactive sessions, tilde expanded to the wrong directory.
Add _expand_tilde() helper that delegates to get_subprocess_home() when
available, falling back to os.path.expanduser(). Replace all 9
expanduser() call sites in file_tools.py with _expand_tilde().
Follow-up to #50767, which redacted the chat-platform (_approval_notify_sync)
and SSE/API (_approval_notify) approval transports. The TUI JSON-RPC transport
is the third egress and was missed: three register_gateway_notify callbacks in
tui_gateway/server.py emitted the raw approval_data — including the unredacted
command Tirith flagged — straight to the TUI client via _emit.
Route all three registrations through a new module-level _emit_approval_request()
helper that redacts payload['command'] via the shared
gateway.run._redact_approval_command seam before emitting, matching the pattern
used for the other two transports. Completes the whole-bug-class fix for #48456.
Tests: assert the helper emits a redacted command (real credential pattern),
handles missing/None command, and a wiring guard that no registration emits the
raw payload directly (only the helper may). Both mutation-checked.
The #48456 fix series originated from @liuhao1024's #48462 — credit to them for
the original report and chat-platform fix; this completes the remaining transport.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Follow-up to the /memory approve fresh-store fix. Both the CLI fallback and
the messaging-gateway handler built a bare MemoryStore() with the hardcoded
default char limits (2200/1375), ignoring the user's configured
memory.memory_char_limit / user_char_limit. A live agent honors those
overrides (agent/agent_init.py), so an approval applied without a live agent
could accept a write the user's lower cap would reject, or vice versa.
Extract a shared tools.memory_tool.load_on_disk_store() factory that reads
the configured limits (falling back to defaults if config can't load) and
wire both the CLI and gateway handlers to it, closing the gap on both
surfaces and de-duplicating the construction block.
The CLI /memory slash handler (cli_commands_mixin._handle_memory_command)
passed self.agent._memory_store straight through, which is None when the
command runs without a live agent — e.g. /memory approve from the Desktop
GUI. The shared write-approval handler then returns "memory store
unavailable" and applies nothing, even with built-in memory enabled and
pending writes present.
Fall back to a freshly loaded on-disk MemoryStore when no live store is
available, mirroring the gateway path (gateway/slash_commands.py). It
persists to the same MEMORY/USER.md and creates MEMORY.md on the first
approved write.
Fixes#46783
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The media-delivery denylist in gateway/platforms/base.py enumerated only
.env/auth.json/credentials/config.yaml under HERMES_HOME, so other
credential stores that live at the root fell through and could be
auto-attached to chat replies. The reported case: the Google Workspace
skill's google_token.json refreshes every turn, bumping its mtime to
'now', which kept passing the strict-mode recency window and re-sent the
OAuth token on every reply.
Extend the explicit per-file denylist to mirror the canonical credential
set already enforced by the read/write guards in agent/file_safety.py:
google_token.json, google_oauth_pending.json, auth/google_oauth.json,
.anthropic_oauth.json, webhook_subscriptions.json, cache/bws_cache.json,
auth.lock, and the pairing/ token directory.
Targeted per-file additions (not a blanket ~/.hermes deny, which was
declined in #32090/#34425 because it would block skills/, logs/, and
ad-hoc agent-written deliverables). mcp-tokens/ (#37222) and
state.db/kanban.db (#41071) are left to their sibling targeted PRs.
Reported-by: xxxigm (#50912)
An unpinned cron job follows the global default provider (config.yaml
model.default + resolve_runtime_provider). If that global state is changed
after the job is created — e.g. a temporary switch to a paid provider like
nous/claude-fable-5 — the job silently inherits it on its next tick and spends
real money. This is the reported $7.73 incident: a job created under a
free/default provider later inherited a temporary paid switch.
Fix (ask #1 only) preserves the legitimate "unpinned job should follow
model.default" use case by detecting *drift* rather than freezing the model:
- create_job (cron/jobs.py): for UNPINNED, agent-backed jobs (no explicit
provider, not no_agent), snapshot the provider that resolution WOULD pick
right now into a new optional `provider_snapshot` field, resolved via the
same resolve_runtime_provider() path the ticker uses. Fail-open to None on
any resolution error so job creation never breaks.
- run_job (cron/scheduler.py): right after runtime resolution, if the job has
a provider_snapshot AND is unpinned AND the currently-resolved provider
DIFFERS from the snapshot, fail closed for that run — make no paid call and
deliver a loud, actionable alert naming both providers and telling the user
to pin explicitly (`cronjob action=update job_id=.. provider=..`).
Back-compat: jobs with no snapshot (pre-existing jobs, no_agent jobs, or any
job whose creation-time resolution failed) behave exactly as before — the
guard only engages when a snapshot exists. Explicitly-pinned jobs (job.provider
set) are unaffected since they don't drift with global state.
Tests: tests/cron/test_cron_provider_pin.py covers snapshot-matches (runs),
snapshot-differs (fail closed, no agent constructed), no-snapshot back-compat,
None-snapshot back-compat, explicitly-pinned (runs regardless), plus create_job
snapshot capture/skip/fail-open. The fail-closed case is load-bearing (fails
without the guard).
Issue #44585 asks #2-4 (hard-stop a running job, gateway-stop containment,
fail-closed on provider mutation) are out of scope for this change.
Discord enforces a hard 100-command limit per app and rejects an upsert that would push the live total over 100 (error 30032), which silently breaks ALL slash commands. The sync deleted obsolete commands AFTER creating new ones, so an app already at the cap momentarily exceeded it and the whole sync failed.
Reorder: delete no-longer-desired commands up front, then create/update. Removes the now-redundant trailing delete loop. Adapts @infinitycrew39 PR #50890 to current main (the original adapter diff no longer applied after the platform refactor); test commit cherry-picked with authorship preserved.
Add a test to verify that _safe_sync_slash_commands deletes obsolete
commands before creating new ones. This ensures we never temporarily
exceed Discord's 100-command limit during sync, which would trigger
error 30032 and break all slash commands.
This test guards against the regression where sync could fail even though
the registration cap was properly enforced.
f-trycua's #50855 test file predated the cross-platform PR (#50552) and
reintroduced two stale tests asserting Linux is unsupported
(test_*_non_macos_*, patching platform.system="Linux" and expecting a
no-op/warn). Linux + Windows are supported now, so install proceeds on
those platforms. Restore main's cross-platform-correct versions:
test_*_on_unsupported_platform_* using FreeBSD as the genuinely
unsupported case.
`hermes computer-use install` refused to install on Linux, Windows, and
macOS x86_64 because the pre-install asset probe was hitting the wrong
GitHub endpoint AND duplicating tag-resolution logic the upstream
installer already does correctly.
`_check_cua_driver_asset_for_arch()` queried
`https://api.github.com/repos/trycua/cua/releases/latest`. On trycua/cua:
- cua-driver-rs releases (the binary the installer fetches) are marked
**prerelease** on every cut. GitHub's `/releases/latest` explicitly
skips prereleases.
- The Python package releases (`cua-agent`, `cua-computer`, `cua-train`)
are non-prerelease and end up as the "latest" instead.
Live API check today:
$ curl -sf https://api.github.com/repos/trycua/cua/releases/latest \
| jq '{tag:.tag_name, asset_count: (.assets|length)}'
{ "tag": "agent-v0.8.3", "asset_count": 0 }
The probe sees zero assets, prints "Latest CUA release has no Linux
x86_64 asset", and skips install on every Linux / Windows / macOS-x86_64
host — even though the cua-driver-rs-v0.6.0 release ships 19 binary
assets covering all those platforms.
Filtering `/releases?per_page=N` for the `cua-driver-rs-v*` prefix
fixes the bug, but it duplicates tag-resolution logic the upstream
`_install-rust.sh` already does correctly via `CUA_DRIVER_RS_BAKED_VERSION`
(auto-baked by CD on every release, with a `/releases?per_page=N` API
fallback for dev checkouts). The right answer is to trust that
contract instead of mirroring it in Python where it can drift.
Two paths get the same outcome without the probe:
1. **Fresh install**: run `install.sh` directly. It has the baked
release tag, fetches the right asset, and errors with a clear
message on missing-arch downloads. No preflight needed.
2. **Upgrade path**: `cua_driver_update_check()` (separately added)
shells `cua-driver check-update --json` against the installed
binary, which returns the canonical update answer from the same
source the installer uses.
- `hermes_cli/tools_config.py`: delete `_check_cua_driver_asset_for_arch`
and its two call sites in `install_cua_driver`. Replace with an
inline comment near the top of the module explaining the rationale.
- `tests/hermes_cli/test_install_cua_driver.py`: drop the
`TestCheckCuaDriverAssetForArch` block. Add `TestArchProbeRemoval`
with three regressions:
- `test_probe_function_is_gone` — asserts the deleted helpers stay
deleted.
- `test_fresh_install_does_not_call_github_api` — asserts the
install path doesn't hit GitHub directly from Python anymore.
- `test_upgrade_with_binary_does_not_call_github_api_directly` —
same for the upgrade path.
All 9 `test_install_cua_driver` tests pass.
Reported by @teknium1 while testing on a headed Ubuntu host.
* chore: re-trigger CI (workflows did not dispatch on prior head)
* fix(image/video gen): make schema delivery instruction platform-neutral
The image_generate and video_generate tool schema descriptions hardcoded
a gateway-only delivery instruction ('display it with markdown
 and the gateway will deliver it'). That schema
is sent on every platform, so on CLI it directly contradicted the CLI
platform hint ('Do NOT emit MEDIA:/path tags ... state its absolute path
in plain text'), and on messaging platforms it was also wrong about the
mechanism (local file paths are delivered via MEDIA: tags, not markdown
image syntax — markdown ![]() only works for URLs).
The per-platform file-delivery convention is already owned correctly by
the platform hints in prompt_builder.py. The tool schema now just
describes the result shape (URL or absolute path in the image/video field)
and defers 'how to deliver' to the active platform's guidance.
Provider/model injection already works via _build_dynamic_image_schema()
(the 'Active backend: <provider> · model: <model>' line); no change there.
A typed `/model <name>` where `<name>` is declared under `providers.<slug>` or
`custom_providers` — but typed while the current provider is a soft-accepting
one (e.g. `openai-codex`) — stayed on the current provider and was swallowed as
an unknown hidden Codex model, instead of routing to the provider that actually
declares it.
Add configured-provider exact-match detection (`_configured_provider_matches`)
and a new Step d.5 in `switch_model`: if the typed model is declared in
user/custom provider config, route to that provider BEFORE
`detect_provider_for_model()` guesses from static catalogs and BEFORE the
common-path validation lets a soft-accepting current provider swallow the name.
- Matching is exact (case-insensitive) against explicitly-declared model
collections only (`models`, `model`, `default_model`) — never fuzzy/family.
- Same-provider declarer → keep current provider (canonicalize the id).
- Multiple declarers → fail clearly and ask for `--provider <slug>`.
- Single declarer → route there; for `providers.<slug>` user providers, set
`explicit_provider` so the credential block resolves base_url/key from config.
- Step e (`detect_provider_for_model`) is gated off when `config_routed`.
The deliberately-supported openai-codex / xai-oauth hidden-model soft-accept
(#16172 / #19729) is left untouched: when nothing in config matches, detection
is a no-op.
Salvaged from #45442 by harjothkhara (authorship preserved).
Tests: tests/hermes_cli/test_model_switch_configured_provider_routing.py
(7 tests). Full model_switch suite: 214 passed.
Fixes#45006
Salvages #50469 by @libre-7.
_dashboard_local_update_managed_externally() previously blocked every containerized dashboard from the local update API, even when the running install was a bind-mounted git checkout that can be updated with hermes update.
Allow the dashboard updater only for git installs inside containers, while keeping hosted /opt/data, docker, and pip installs managed externally. Pip remains blocked because its apply path mutates the running container filesystem and is not the self-managed checkout case.
Adds regression coverage for docker, git, and pip install-method handling inside containers, and maps the contributor email for release attribution.
Co-authored-by: libre-7 <libre-7@users.noreply.github.com>
Follow-up to #31501. When the send-fallback prune removes a chat's
final telegram_dm_topic_bindings row, also flip
telegram_dm_topic_mode.enabled to 0 in the same transaction.
Without this, a user who turns topics off in the Telegram client
(rather than via /topic off) leaves enabled=1 with zero lanes:
_recover_telegram_topic_thread_id keeps treating the chat as
topic-enabled and lobby messages keep hunting for bindings that no
longer exist. Clearing the flag makes recovery fully stand down once
the dead topics are gone.
Adds 3 regression tests covering the last-binding clear, the
multi-binding no-op, and the unmatched-prune no-op.
Thirteen tests across four layers:
* ``SessionDB.delete_telegram_topic_binding`` — pin the new
helper's contract: removes only the (chat_id, thread_id) row
it was asked about, leaves siblings alone, returns 0 silently
when the row never existed, and is a no-op on a pristine
database whose topic-mode tables haven't been migrated yet.
* ``TelegramAdapter._prune_stale_dm_topic_binding`` — the glue
must drop the binding when ``self._session_store._db``
exposes the helper, swallow exceptions so a failed cleanup
never breaks the user-facing send, and refuse to issue a
DELETE for ``chat_id=None`` / ``thread_id=None`` so a
bookkeeping miss can't accidentally null-match every row.
* Source-level guards on ``TelegramAdapter.send`` and
``_send_message_with_thread_fallback`` — the prune call must
sit beside the two existing "Thread X not found, retrying
without message_thread_id" warnings, before the retry runs,
so a future refactor can't silently drop the cleanup wire.
* End-to-end semantic — once a topic is pruned, the
``GatewayRunner._recover_telegram_topic_thread_id`` walk
steers future inbound messages to the surviving binding
instead of the dead one. This is the exact behaviour change
the bug report's reproduction asks for: no more landings in
the wrong topic until the operator hand-edits ``state.db``.
Refs #31501
Both fallback sites that currently log "Thread X not found,
retrying without message_thread_id" now also drop the
``telegram_dm_topic_bindings`` row keyed on
``(chat_id, thread_id)``:
* The streaming send loop (``send`` body) — fires on the
second failure, after the same-thread one-shot retry confirms
the thread really is gone (the first attempt is left alone
because Bot API has been observed to return a transient
"Thread not found" that recovers on immediate retry).
* The control-message helper ``_send_message_with_thread_fallback``
(approval prompts, model picker, update prompts) — single-shot
retry, prune unconditionally on the BadRequest match.
Without this prune, a user who deletes a Telegram DM topic in
the client keeps getting their next inbound message recovered
back to the dead thread by
``_recover_telegram_topic_thread_id`` in ``gateway/run.py``,
which walks the per-user binding list newest-first and treats
the deleted thread as authoritative. The reproduction in the
bug report is exactly this: tool progress, approvals, activity
messages and replies all land in the wrong place until the user
manually runs DELETE on state.db.
Cleanup is best-effort — we log at INFO when it succeeds, swallow
any exception from the SessionDB call, and the user-facing send
proceeds either way.
Refs #31501
Targeted ``(chat_id, thread_id)`` prune for the
``telegram_dm_topic_bindings`` table — the missing piece for
#31501, where the Telegram adapter detects a topic the user
deleted out-of-band but the binding row keeps living in
state.db. The recovery logic in
``gateway.run._recover_telegram_topic_thread_id`` then steers
every future inbound message back to the dead topic, dropping
tool progress, approvals and replies into the wrong place.
Returns the number of rows deleted; silently no-ops when the
topic-mode tables haven't been migrated yet (read-only / pristine
profile) so the helper is safe to call from a send-fallback
hot path before the schema has run.
capture(app='screen'|'desktop') now resolves to the OS shell/desktop
window (Windows Progman/WorkerW desktop or Shell_TrayWnd taskbar, macOS
Finder/Dock) so 'show me my screen' and 'click the taskbar' work.
Previously capture() only matched application windows, and the schema
advertised 'or the whole screen' without any code path delivering it.
cua-driver is window-oriented (no virtual-desktop or per-monitor MCP
tool), so a single image still cannot span multiple monitors — the
schema now states this and the no-desktop-window path returns a clear
message instead of silently grabbing the frontmost app.
cua-driver 0.6.x removed the standalone screenshot MCP tool, so
capture(mode='vision') hit 'Unknown tool: screenshot' and returned a
0x0 image with no PNG while som/ax (which use get_window_state) still
worked. Route vision through get_window_state(capture_mode='vision').
Salvaged from PR #50771; same fix submitted earlier as #39262 by
@Tranquil-Flow.
Adds an optional structured completion contract to the standing-goal loop,
adapted from OpenAI Codex's /goal guidance (a durable objective works best
when it names what done means, how to prove it, what not to break, what's in
scope, and when to stop).
A contract has five optional fields — outcome, verification, constraints,
boundaries, stop_when. When set, the continuation prompt tells the agent to
target the verification surface and respect constraints, and the judge marks
the goal done only when the verification criterion is met with concrete
evidence (command result, file excerpt, test output) instead of a loose
"looks done" claim. This tightens the most common /goal failure mode:
premature completion / endless over-continuation on an underspecified goal.
Two ways to set a contract, both backward compatible (bare /goal <text>
behaves exactly as before):
- /goal draft <objective> — expands plain text into a full contract via the
goal_judge aux model (cache-safe side call), falls back to a free-form goal
if the model is unavailable.
- /goal <text> with inline 'field: value' lines (verify:, constraints:,
boundaries:, stop when:, ...). Plain goals with an incidental colon are not
mangled — only known field prefixes are pulled out.
- /goal show prints the active contract.
Contracts persist in SessionDB.state_meta alongside the goal (survive /resume),
compose with /subgoal criteria, and old goal rows load unchanged. CLI + every
gateway platform via the shared GoalManager engine; zero new model tools.
Tests: +18 in tests/hermes_cli/test_goals.py (parse/serialize/judge-prompt/
draft/fallback), 73/73 green; 42/42 across the broader goal test surface;
live E2E roundtrip (set -> persist -> reload -> contract-aware prompts) green.
* chore: re-trigger CI (workflows did not dispatch on prior head)
* feat(skills): add cloudflare-temporary-deploy optional skill
Optional web-development skill teaching the agent to deploy a Worker to a
live workers.dev URL with no Cloudflare account via 'wrangler deploy
--temporary' (Wrangler 4.102.0+). Cloudflare provisions a throwaway,
claimable account valid for 60 minutes — ideal for an autonomous
write->deploy->verify loop with no OAuth/signup hard stop.
- SKILL.md: when/when-not, prereqs (unauth requirement, version floor),
step-by-step deploy + verify flow, product limits table, pitfalls
(hidden flag, stale global wrangler, auth-present error, rate limits,
workers.dev edge cache), verification.
- scripts/parse_deploy_output.py: stdlib-only parser extracting live URL,
claim URL, account name/state, expiry, deploy status from wrangler output.
- tests/skills/test_cloudflare_temporary_deploy_skill.py: 16 tests incl.
a real-output regression case.
Verified live end-to-end: temporary account created with no creds,
deployed to a live URL, curl confirmed body, redeploy reused the account.
Re-clamp once more on the next frame after pop-out so layout (sidebar widths,
fonts) has settled, and treat a degenerate pre-layout bounds rect as "unknown"
(fall back to the window) so we never clamp the box into a collapsed area. Net:
anyone who loads in with a stranded position is pulled back on-screen and the
fix is persisted, even if the first measure was premature.
Now that the popped-out composer is fixed to the viewport, clamping against the
window let it slide under a pinned sidebar. Confine it to the thread region
(data-slot="composer-bounds") instead — its rect already excludes a pinned
sidebar and the header — falling back to the full window before it's measured.
This subsumes the old titlebar top-margin (the thread rect starts below the
header).
Replaces the body-portal approach: render ChatBar as a sibling of the
contain:[layout paint] chat wrapper (inside the same runtime boundary) rather
than portaling the floating instance to <body>. The wrapper is a containing
block for — and clips — position:fixed descendants, which is what stranded the
popped-out composer off-screen. As a sibling it anchors to the outer relative
container: docked stays absolute (identical placement), floating resolves
against the viewport. Both states stay mounted, so dock<->float no longer
remounts the editor (the portal toggle did).
The popped-out composer is position:fixed, but the chat content wrapper sets
`contain: layout paint`, which makes it a containing block for — and clips —
fixed descendants. Inline, the floating composer was positioned/clipped relative
to the chat column (which shifts with the sidebars), not the viewport, so the
viewport-based bounds clamp from #50466 couldn't keep it reachable: users still
lost it off-screen. Portal it to <body> when popped out so fixed positioning and
the clamp finally share the viewport as their reference. Docked stays inline
(it's absolute within the chat column by design).
/simplify-code (LOW, flagged by two reviewers): the source tags 'user' /
'project' / 'bundled' were bare string literals scattered across the discovery
scrub and the two mount-time refuse guards. A typo in any one site (e.g.
'users') would SILENTLY disable a security gate with no error — the exact
failure mode this RCE boundary must not have.
Introduce a shared module-level _NON_BUNDLED_PLUGIN_SOURCES frozenset referenced
by both the discovery scrub and the (now single) mount guard, so the
auto-import policy lives in one place. The two mount guards collapse into one
gate that still emits the distinct per-source operator message via a map (no
loss of guidance). Behavior unchanged: 39 RCE-bypass tests pass, and the
constant is mutation-checked (typo'ing it fails the bypass tests).
Defence-in-depth (discovery scrub + mount refuse) is retained intentionally.
* feat(computer_use): disable cua-driver telemetry by default, add opt-in
cua-driver ships anonymous PostHog usage telemetry ENABLED by default
upstream (fires cua_driver_install / cua_driver_doctor events to
eu.i.posthog.com). Hermes now disables it for our users unless they
explicitly opt in.
- New config key `computer_use.cua_telemetry` (default false) in
DEFAULT_CONFIG.
- `cua_backend.cua_driver_child_env()` injects
`CUA_DRIVER_RS_TELEMETRY_ENABLED=0` into the child env when telemetry is
disabled (the default); leaves the var untouched on opt-in so the driver
uses its own default. Reads config fail-safe — any error defaults to
telemetry off.
- Routed every cua-driver spawn site through the policy: MCP backend
(StdioServerParameters env), `cua_driver_update_check`, doctor's
health_report Popen, the install.sh/install.ps1 runner, and the
`--version` / status probes.
- Docs: new Telemetry subsection in computer-use.md (EN).
- Tests: tests/computer_use/test_cua_telemetry.py — default disables,
explicit-false disables, opt-in leaves var untouched, config-failure
fails safe, inherited-enabled is overridden off.
Verified live on Linux against the real cua-driver-rs 0.6.0 binary: with
the var=0 the driver reports "telemetry: disabled via
CUA_DRIVER_RS_TELEMETRY_ENABLED" and sends no event; with it unset it logs
"sending event: cua_driver_doctor". 213 computer_use + install tests green.
* fix(dashboard): fold computer_use config category into agent tab
The new computer_use.cua_telemetry key created a single-field dashboard
config category, tripping test_no_single_field_categories (web_server's
invariant that categories with <2 fields must be merged to avoid tab
sprawl). Add computer_use -> agent to _CATEGORY_MERGE, matching the
existing onboarding/telegram single-field folds.
The Slack docs document `slack.mention_patterns` as custom wake words that
trigger the bot alongside `@mention`, and the config layer bridges the key into
the Slack adapter's `config.extra` — but the adapter never read it. With
`require_mention` on, a channel message containing a configured wake word (and
no literal `<@BOTUID>`) was silently ignored. Every other adapter that
documents `mention_patterns` (Telegram, DingTalk, Mattermost, WhatsApp,
BlueBubbles, Photon) implements it; Slack was the odd one out.
Add `_slack_mention_patterns()` (compiled, cached; reads `slack.mention_patterns`
as a list/string or `SLACK_MENTION_PATTERNS` as a JSON/CSV/newline list, invalid
regexes warned and skipped) and `_slack_message_matches_mention_patterns()`,
mirroring the existing adapters. Channel mention detection now also triggers on
a wake-word match, so the documented field works as described.
Adds tests for pattern compilation (list/string/env/invalid-regex) and for the
channel-trigger gating with a wake word under require_mention.
* chore: re-trigger CI (workflows did not dispatch on prior head)
* fix(delegation): emit high-concurrency cost warning once per process
_get_max_concurrent_children() runs on every get_definitions() schema
rebuild (via _build_top_level_description / _build_tasks_param_description),
not just on actual delegate_task calls. With max_concurrent_children>10 the
cost advisory fired on every turn / agent spawn across every session, spamming
the log even when delegate_task was never used. Gate it behind a module-level
_HIGH_CONCURRENCY_WARNED flag so it warns at most once per process.
The success/staged gating and op-expansion for mirroring built-in memory
writes to external providers lived in a standalone agent/memory_write_bridge.py
helper called inline from two core call sites (tool_executor.py,
agent_runtime_helpers.py). That left the mirror decision-making in the agent
loop, outside the memory-provider interface.
Fold it into a new MemoryManager.notify_memory_tool_write() entry point: the
loop now hands over the raw tool result + args and a metadata callback, and the
manager decides whether/what to mirror. Both core call sites collapse to a
single call; the orphan module is removed. No MemoryProvider ABC change.
Tests rewritten as behavior tests against the manager method.
Mirror built-in memory writes to external providers only after the native memory tool succeeds and is not staged for approval. Keep OpenViking's built-in memory mirroring add-only, since Hermes native memory entries do not yet have stable OpenViking file URIs for replace/remove.
Add a narrow viking_forget tool for exact user memory file deletion and document the current OpenViking write/delete behavior.
The install pre-flight asset probe queried trycua/cua's `releases/latest`,
which floats across the monorepo's components (agent-*, computer-*, lume-*,
train-*) — most ship zero binary assets. So the probe false-negatived and
hard-blocked `install_cua_driver` (line 770: `if not probe: return False`)
BEFORE the upstream installer ran, on Linux, Windows, and Intel macOS — even
though the installer it gates resolves the right tag and would have succeeded.
Net effect: the normal enable path (`hermes tools` → Computer Use post-setup,
and `hermes computer-use install`) refused to install on every platform this
PR claims to support.
Fix: list `/releases?per_page=100`, pick the newest `cua-driver-rs-v*` tag,
and match its assets on OS-token + arch — mirroring what the upstream
`install.sh` already does. Fail open if no driver release surfaces (installer
remains the source of truth). Adds an OS-token gate so a darwin asset can't
satisfy a Linux probe.
Tests: updated the install-probe fixtures to the list-of-releases shape with
`cua-driver-rs-v*` tags + OS-token asset names; added a regression guard
(`test_releases_latest_tag_ignored_picks_driver_rs_tag`) for the monorepo
floating-latest case. 25/25 install + 192 computer_use tests green.
Verified live: probe returns True for all six platform/arch combos against
the real GitHub releases API.
The runtime gate (check_computer_use_requirements) and the hermes tools
platform_gate both enable linux alongside darwin/win32, but several
docstrings/comments still described Linux as "alpha, gated off until it
flips upstream" — contradicting the code that ships it. Bring the prose in
line with the gate that's actually live:
- tool.py / cua_backend.py module docstrings: Linux is enabled (X11 today,
Wayland via XWayland), not gated off.
- toolsets.py description and hermes tools display name: (macOS/Windows) ->
(macOS/Windows/Linux).
No behavior change — the gate already allowed all three platforms.
Make the computer_use toolset platform-agnostic by driving cua-driver on
macOS, Windows, and Linux. Consumes the 8 cua-driver decoupling surfaces
(capability discovery, structuredContent AX tree, opaque element_token,
click button enum, explicit mimeType, machine-readable manifest,
structured list_windows, structured health_report), each degrading
gracefully on older drivers.
Adds `hermes computer-use doctor` (drives cua-driver health_report with a
per-OS check matrix and an exit 0/1/2 ok/degraded/blocked contract), full
typed wrappers for the previously-uncovered cua-driver tools plus a generic
call_tool escape hatch, per-session agent-cursor lifecycle, platform-aware
system-prompt guidance (host-deterministic, cache-safe), and honors
HERMES_CUA_DRIVER_CMD end-to-end.
Replaces the macOS-only skills/apple/macos-computer-use skill with a
cross-platform skills/computer-use skill, and refreshes the EN + zh-Hans
docs.
Supersedes #44221 (Windows-enablement salvage of #30660).
Co-authored-by: Teknium <127238744+teknium1@users.noreply.github.com>
Windows toast notifications silently no-op unless the app sets an
AppUserModelID — new Notification().show() returns without error and
nothing appears. The desktop's native-notification system (approval,
turn-done, input, etc.) was therefore dead on Windows while working on
macOS/Linux.
Set the AUMID to the build appId (com.nousresearch.hermes) on Windows
right after app.setName, so toasts route to the installed Start Menu
shortcut. No-op on macOS/Linux, which don't require it.
* feat(goals): add /goal wait <pid> barrier to park the loop on a background process
The /goal loop re-pokes the agent every turn via the post-turn judge. When a
goal is gated on a long-running background process (CI poller, build, test
matrix, deploy) that produces nothing to judge yet, this spins the agent into
'is it done?' busy-work and burns the turn budget.
/goal wait <pid> [reason] parks the loop: while the PID is alive, the judge is
skipped, no turn is consumed, no continuation fires, and /goal status shows a
parked indicator. The barrier auto-clears the moment the process exits (the
agent's notify_on_complete watcher is the natural wake signal), then the next
turn resumes normal judging. /goal unwait clears it manually; pause/resume/clear
drop it; a dead/stale PID can never wedge the loop.
Wired across CLI, gateway, and the mid-run command guard for parity. Barrier
persists in SessionDB.state_meta (survives /resume); GoalState gains
backward-compatible waiting_on_pid/waiting_reason/waiting_since fields. 12 new
tests; docs updated.
* fix(goals): use gateway.status._pid_exists for liveness, not os.kill(pid,0)
The Windows-footguns CI guard flagged os.kill(pid, 0) in _pid_alive — on
Windows that's not a no-op, it routes to CTRL_C_EVENT and hard-kills the
target's console process group (bpo-14484). Delegate to the canonical
footgun-safe gateway.status._pid_exists (psutil + ctypes/POSIX fallback)
instead, with a direct-psutil last resort.
* feat(goals): judge-driven auto-wait — the loop parks itself, no manual /goal wait
Makes the wait barrier automatic. Every turn the judge is shown the agent's
live background processes (pid, command, uptime, output tail from the
process_registry) alongside the goal + response, and can return a new 'wait'
verdict instead of continue:
{"verdict":"wait","wait_on_pid":N} → park until that process exits
{"verdict":"wait","wait_for_seconds":N} → park until the deadline passes
evaluate_after_turn acts on the directive (sets the barrier, parks the loop)
so the agent isn't re-poked into busy-work while CI/builds/deploys run. Adds a
time-based waiting_until barrier alongside the pid barrier; both auto-clear and
can never wedge the loop. Drivers (CLI, gateway, tui_gateway) feed the live
registry in via gather_background_processes(). Manual /goal wait stays as an
override. Judge verdict contract widened to (verdict, reason, parse_failed,
wait_directive); legacy {"done":bool} shape still accepted.
* test(goals): update kanban _fake_judge to the 4-tuple judge contract
CI test(3) caught it: test_kanban_goal_mode's _fake_judge still returned the
3-tuple (verdict, reason, parse_failed), but the kanban loop now unpacks the
4-tuple (+ wait_directive). Update the fake to return None for the directive
and accept the background_processes kwarg.
* feat(goals): trigger-based wait — park on a process's own signal, not just exit
Addresses two gaps in the judge-driven wait: (1) the judge could only express
'wait until PID exits' or 'wait N seconds', so a long-lived watcher/server that
fires a trigger MID-RUN (and may never exit) couldn't be waited on; (2) the
process's own watch_patterns/notify_on_complete trigger was invisible to the judge.
Adds a session-based barrier (waiting_on_session) that releases on the process's
OWN trigger via process_registry.is_session_waiting(): the session exits, OR (if
started with watch_patterns) its pattern matches — even while the process keeps
running. list_sessions() now surfaces session_id + watch_patterns/watch_hit/
notify_on_complete so the judge sees the trigger and is told to prefer
wait_on_session for trigger processes. Judge verdict gains a {wait_on_session}
directive (preferred over pid). Backward-compatible GoalState field; pid + time
barriers unchanged.
Tests: TestSessionTriggerBarrier (release on mid-run pattern match while alive,
release on exit, unknown-session, full park→trigger→resume, parse, validation,
backcompat load). 105 goal-surface + 85 process_registry tests green.
The composer model picker capped each provider's search matches at 12
(PER_PROVIDER_SEARCH). A provider serving more than 12 models (e.g.
opencode-go with 19) showed only a truncated subset when the user typed
its name to find it — exactly the models they were searching for got
cut. Edit Models showed the full list because it never applied this cap.
A search is already a narrowing action, so capping a single provider's
own matches is wrong. Remove the slice; search now lists every matching
model for the provider. The no-search default still shows the curated
top-N per provider via the visibility set.
Follow-up to #47077 (the backend dedup fix); this closes the remaining
frontend truncation users saw in the composer.
OpenCode Go (and OpenCode Zen) showed only a subset of the models they
serve in the desktop/CLI model picker — e.g. opencode-go rendered 13 of
19, silently dropping minimax-m3/m2.7/m2.5, glm-5/5.1, deepseek-v4-flash.
Root cause: the picker dedup in build_models_payload strips any model
from an aggregator row that overlaps a user-defined provider's catalog
(so a local proxy isn't shadowed by OpenRouter). It gated on
is_aggregator(), which is True for opencode-go/zen because their flat
/v1/models returns bare IDs the model-switch resolver searches. But
those are flat-namespace RESELLERS, not routing aggregators — every
model they list is first-party, so deduping them against a user proxy
that happens to serve a same-named model guts their own catalog.
Fix: add is_routing_aggregator() (True only for true routers like
OpenRouter and custom:* proxies; False for opencode-go/zen) and gate the
picker dedup on it. is_aggregator() is unchanged so model-switch flat
catalog resolution keeps working. Both desktop entry points
(model.options JSON-RPC and /api/model/options REST) and hermes model
share build_models_payload, so all surfaces get the full list.
Fixes#47077
The post-update gateway resume path (`_resume_windows_gateways_after_update`)
only relaunched gateways that were *running* when the update began — it
enumerates live PIDs in `_pause_windows_gateways_for_update` and respawns
exactly those. A gateway that had already died between updates (e.g. it was
launched attached to a terminal/TUI that later closed, taking the child with
it) was never brought back: the Startup-folder / Scheduled-Task autostart
entry only fires on the next login, not after an in-place update.
So a Desktop-GUI update (which runs `hermes update --yes --gateway`) on a box
whose gateway had quietly died would complete with no gateway running, and the
user had no indication anything should have come up.
Fix: when no gateway is running at pause time but an autostart entry is
installed (`gateway_windows.is_installed()` — an explicit "I want a gateway"
signal), return a `cold_start_if_installed` token. The resume step then does a
fresh detached spawn via `gateway_windows._spawn_detached()` — the same
windowless `pythonw` + `CREATE_BREAKAWAY_FROM_JOB` path `hermes gateway start`
uses. It re-checks liveness immediately before spawning so a concurrent start
(autostart entry firing) can't produce a duplicate.
Gateway-less users (no autostart entry) get nothing forced on them — the
pause step still returns None for them. POSIX is unaffected: enabled systemd
units already restart via `Restart=always`.
Windows-only; best-effort throughout (logs at debug and no-ops on any error).
Tests: pause returns the cold-start token only when installed, returns None
when not installed, resume cold-starts on the token, and resume skips the
cold-start when a gateway is already running.
Follow-up to ScotterMonk's cron-truncation fix:
- Remove HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var. Behavioral config
belongs in config.yaml, not a new HERMES_* env var (.env is secrets
only). The actual bug is fixed entirely by the adapter-aware skip; the
configurable cap was unneeded scope. MAX_PLATFORM_OUTPUT is a constant
again, collapsing the max_output=0 disable branch and the
audit-vs-truncation threshold divergence.
- Flag the remaining verified-chunking adapters (slack, matrix, feishu,
mattermost, teams, whatsapp, whatsapp_cloud, weixin, bluebubbles,
yuanbao) with splits_long_messages=True so the fix covers the whole
bug class, not just Discord/Telegram. Each verified to chunk in its
own send() via truncate_message().
- SMS deliberately left False: it chunks for normal replies but a
multi-segment cron blast is cost-bearing; the 4000-cap + file save is
the safer default there.
- Update tests: drop the two env-override tests, add a test asserting a
save failure during truncation (non-chunking) propagates.
Gateway-level truncation (MAX_PLATFORM_OUTPUT=4000) was pre-empting
adapter-side message splitting. Discord and Telegram both chunk long
content natively in their send() via truncate_message(), but the
delivery router truncated to 3800 chars + footer before the adapter
ever saw the full payload — so long cron output was cut short instead
of being delivered as multiple messages (issue #50126).
Changes:
- HERMES_DELIVERY_MAX_PLATFORM_OUTPUT env var makes the cap configurable
(default 4000, backward compatible). Set to 0 to disable truncation.
- TRUNCATED_VISIBLE (3800) removed — visible portion now derived
dynamically from max_output minus the actual footer length.
- New BasePlatformAdapter.splits_long_messages capability flag (default
False). Adapters that chunk in send() set True; delivery skips
truncation for them but still saves full output to disk as audit.
- Flagged Discord and Telegram (both verified to chunk in send()).
Fixes#50126
* chore: re-trigger CI (workflows did not dispatch on prior head)
* fix(update): don't count across shallow-clone boundary (bogus '12492 commits behind')
Installer checkouts are shallow (git clone --depth 1). The CLI banner and
hermes update --check both did a plain git fetch (silently unshallowing the
repo) then git rev-list --count HEAD..origin/main, which counts across the
shallow boundary and prints a huge nonsense number like '12492 commits behind'.
Detect shallow up front, fetch with --depth 1 to preserve the boundary, and
compare tip SHAs instead of counting:
- banner _check_via_local_git: returns UPDATE_AVAILABLE_NO_COUNT when behind
(renders as 'update available') instead of the bogus count.
- _cmd_update_check: reports presence-only on shallow clones.
Full clones keep the exact count path unchanged. Mirrors the desktop fix in
apps/desktop/electron/main.cjs (commit 2950c6fa2).
* fix: update to version 3 endpoints and adding update and delete tool
* chore: removing the test md file
* fix: prevent circuit breaker on client errors in Mem0 provider
* chore: add telemetry for platform version
* feat: add OSS mode support to Mem0 memory provider
* chore: bump mem0ai dependency to >=2.0.1 in memory plugin
* refactor: enhance dependency checks and embedder config in mem0 backend
* refactor: adjust fact storage message for OSS mode
* refactor: expand user paths, add collection recreation on dimension change for Qdrant
* fix(mem0): make MEM0_USER_ID override gateway-native ids and tag writes with channel
When MEM0_USER_ID was configured (env or mem0.json), the gateway-native id
from kwargs (Telegram numeric id, Discord snowflake, ...) still won, so the
same human ended up under different user_ids per channel and memories never
merged across CLI / Telegram / Slack / Discord. Mirrors openclaw's cfg.userId
pattern: configured override wins, gateway-native id is the fallback.
The legacy "hermes-user" placeholder default written by the setup wizard is
treated as unset to avoid silently bucketing every gateway user together.
Also tag every write with metadata.channel (cli/telegram/discord/...) so the
dashboard can offer per-channel filtered views without coupling identity to
the channel; document the read/write filter asymmetry as intentional
(reads scope to user_id only for cross-agent recall).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* refactor: improve Mem0 memory provider backend, pagination, config, and error handling
* refactor: update mem0 telemetry code, docs, and bump version
* fix(mem0): make get_config_schema() return unified schema with mode-aware required flag
Schema always includes api_key field so picker shows "API key / local" for
both modes. In OSS mode api_key.required=False so status won't mislead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: improve mem0 telemetry, add env var key and OSS mode detection
* chore: bump mem0ai lower bound to 2.0.4 (latest SDK release)
* refactor: set telemetry sample rate to 1.0 and update docs for opt‑out
* fix(mem0): resolve 15 correctness, thread-safety, and resource bugs
Thread safety:
- Protect circuit breaker counters with _breaker_lock (race between
prefetch/sync daemon threads and main thread)
- Wrap sync_turn thread creation in _sync_lock; skip if previous sync
is still alive after 5 s join to prevent duplicate memory ingestion
- Guard _schedule_flush timer creation under _queue_lock (TOCTOU race)
- Capture local `backend` reference in prefetch/sync closures so
shutdown() nulling self._backend cannot crash in-flight threads
Correctness:
- Fix bool("false")==True for rerank param; parse string values explicitly
- Guard page/top_k with max(1,...) and move int() inside try blocks
- Fix fact_count=0 always in OSS mode (Memory.add returns list, not dict)
- Fix prefetch() not clearing result when thread still alive after timeout
- Fix atexit.register accumulating on repeated initialize() calls
Backend / setup:
- Handle Qdrant named-vector collections in _recreate_collection_if_dims_changed
(vectors is a dict; .size access raised AttributeError, swallowed silently)
- Wrap QdrantClient and psycopg2 conn/cursor in try/finally to prevent leaks
- Resolve ollama_bin at top of _ensure_ollama; use it for ollama pull
- Fix embedder key lookup when LLM provider has no env_var (e.g. ollama)
Also: remove _telemetry_enabled cache (env var check is cheap), bump
required mem0ai to >=2.0.7, minor README wording fix.
* fix(mem0): fix brittle qdrant path test + add telemetry sample-rate docs
- Replace generator-throw lambda with a proper def in
test_qdrant_path_not_writable; use tmp_path instead of a hardcoded
/nonexistent path so the test is root-safe
- Add MEM0_TELEMETRY_SAMPLE_RATE to memory-providers.md (was only
in the plugin README, not the user-guide docs)
* revert: remove MEM0_TELEMETRY_SAMPLE_RATE from user-guide docs
* refactor: remove telemetry from mem0 plugin and update documentation
* fix(mem0): set stdin=DEVNULL on setup subprocess calls
The TUI stdin guard (scripts/check_subprocess_stdin.py) requires every
subprocess call in plugin code to set stdin= so it can't inherit the
gateway's JSON-RPC stdin fd. Muzzle the docker/ollama calls in the OSS
setup wizard with stdin=subprocess.DEVNULL (none need interactive input).
Also covers the docker-inspect call the linter's regex misses.
---------
Co-authored-by: chaithanyak42 <chaithanya.kumar42a@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Defense-in-depth for the dashboard plugin auto-import path. The web server
auto-imports and mounts the Python backend (dashboard/manifest.json -> api file)
of plugins found in ~/.hermes/plugins/ (user) and ./.hermes/plugins/ (project),
not just bundled plugins. So any plugin that reaches one of those dirs gets
arbitrary Python executed on the next dashboard start.
NOTE ON THREAT MODEL: #43719's originally-documented delivery chain (a public
--insecure dashboard + open API used to git clone a malicious repo into
~/.hermes/plugins/) is ALREADY mitigated on main — since the June 2026
hermes-0day hardening, a non-loopback bind ALWAYS requires an auth provider and
--insecure no longer bypasses the auth gate. This change is therefore NOT
closing that (now-authenticated) network path; it removes the residual
'arbitrary code executes merely because a plugin is on disk' hazard, which still
applies when a plugin arrives by other means: a socially-engineered git clone,
a supply-chain drop, an authenticated-but-malicious actor, or a future
regression in the auth gate. Untrusted on-disk code should not auto-execute.
Restrict dashboard backend Python auto-import to BUNDLED plugins only. User and
project plugins may still extend the dashboard UI via static JS/CSS, but their
api Python file is never auto-imported. Two layers: _discover_dashboard_plugins
scrubs api/_api_file for user/project sources (and bundled wins name conflicts
so a non-bundled plugin cannot shadow a trusted backend route);
_mount_plugin_api_routes re-refuses user/project at mount time. Tightens the
prior GHSA-5qr3-c538-wm9j / #29156 hardening (bundled+user) to bundled-only.
Salvaged from #44472 (@egilewski) onto current main.
The compaction trigger compared estimated input against context_length *
threshold, but the provider reserves max_tokens of OUTPUT out of the same
window. With a large max_tokens (e.g. 65536 on a custom provider) the usable
input budget is materially smaller than the raw window, so sessions hit a
provider 400 before compaction ever fired.
_compute_threshold_tokens now subtracts the output reservation
(context_length - max_tokens) before applying the percentage and the
small-window 85% guard. max_tokens is stored on the compressor (threaded from
agent.max_tokens at construction) and reused across update_model() switches;
None = provider default = no reservation (full-window behavior, unchanged).
Reimplemented on the current _compute_threshold_tokens surface (the inline
threshold calc the original PR targeted was since refactored for the
small-window #14690 fix); composes with that 85% guard on the effective budget.
Credit: @kyssta-exe (#43651) — original design for the output-token
reservation in the compaction threshold.
Closes#43547.
Add relay_instance_id() (env GATEWAY_RELAY_INSTANCE_ID first, then
gateway.relay_instance_id in config.yaml, mirroring the other relay readers) and
forward it in the /relay/provision body so the connector can bind
gatewayId -> instanceId and route inbound per-instance once Phase 6 delivery
lands.
The value is gateway-asserted but safely scoped: the org/tenant stays
NAS-token-verified at the connector, so a dishonest gateway can only bind its
OWN tenant's instance — same posture as relay_endpoint(). instanceId is only
added to the body when present, so omitting it lets the connector store null
(back-compat: self-hosted / pre-Phase-6 gateways simply have no binding yet).
For a managed (NAS-hosted) agent the id is NAS's AgentInstance.id, stamped into
the container env beside GATEWAY_RELAY_URL.
Tests: reader (env/config/absent), self_provision_relay forwards the id (set +
absent), and the real _post_provision body includes instanceId ONLY when set.
Refs: ~/nous/specs/gateway-gateway plan.md Phase 6 Unit α; decisions.md Q11.
Tirith redacts its own findings, but the approval-request callbacks built the
operator prompt from the RAW command string, so a credential-shaped value
Tirith flagged was sent verbatim to clients, undoing the redaction one layer up.
Two egress transports carried the leak; both are fixed via a shared
module-level seam _redact_approval_command() (redact_sensitive_text force=True):
1. chat platforms — _approval_notify_sync (gateway/run.py): redact before
both the button path (send_exec_approval) and the plain-text /approve
fallback.
2. SSE/API stream — _approval_notify (gateway/platforms/api_server.py):
redact event['command'] before it is enqueued to API/desktop clients.
(whole-bug-class: sibling call path on a separate transport.)
force=True so the prompt — a hard secret-egress boundary — honors redaction
even when security.redact_secrets is off. Clean commands pass through unchanged.
Tests bind the seam (synthetic credential-format fixtures, force-when-disabled) AND assert
BOTH callbacks ASSIGN the redacted result before the send/enqueue sink, via an
AST contract that rejects a discarded-result call. All mutation-checked.
After a compaction, the post-compression path parks last_prompt_tokens=-1 and
sets awaiting_real_usage_after_compression=True, but last_real_prompt_tokens
still holds the stale pre-compression value (above threshold). should_defer_
preflight_to_real_usage() hit the 'last_real_prompt_tokens >= threshold => False'
short-circuit and let preflight fire a SECOND compaction before the provider
reported real post-compaction usage. Add an early-return on the awaiting flag so
deferral holds for exactly one turn; update_from_response() clears it.
The flag-setting half (#36718) already landed on main via the in-place
compaction path (conversation_compression.py); this adds the missing
should_defer guard that consumes it.
Credit:
- @ashishpatel26 (#38133) — diagnosis + the should_defer early-return design
- @Tranquil-Flow (#36769) — same #36718 fix, identical guard placement
Closes#36718.
The tail-protection budget walks estimated an assistant message's tokens from content + function.arguments only, dropping each tool_call's id, type and function.name (plus JSON structure). Assistant turns that fan out into parallel tool calls were undercounted by 2-15x (a 4-tool-call turn measures ~73 vs ~1,090 real tokens), so the protected tail overshot tail_token_budget and compression ran far below its intended ratio — context kept growing.
Consolidate the three duplicated budget walks (_prune_old_tool_results and the two passes in _find_tail_cut_by_tokens) into a single _estimate_msg_budget_tokens() helper that counts the full tool_call envelope via len(str(tc)), consistent with how _estimate_message_chars estimates message size elsewhere.
Tested on Windows: new tests/agent/test_compressor_tool_call_budget.py plus the existing compression suite (test_context_compressor, compressor_image_tokens, cross_session_guard, infinite_compaction_loop) — 209 passed.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A cron job that sets `enabled_toolsets` to a list of *native* toolsets (e.g.
`["web", "terminal"]`) silently got ZERO MCP tools, while a job with no
per-job list got every globally-enabled MCP server. `_resolve_cron_enabled_
toolsets` returned the per-job list verbatim, bypassing the MCP-merge that the
platform-fallback branch performs via `_get_platform_tools`. So
`discover_mcp_tools()` registered the MCP tools into the registry, but
`get_tool_definitions(enabled_toolsets=...)` kept only the named native
toolsets — the agent then rejected every `mcp_*` call as "Unknown tool". (R2
of #23997.)
Fix: `_merge_mcp_into_per_job_toolsets` layers MCP membership onto a per-job
allowlist with the SAME semantics as `_get_platform_tools`:
* `no_mcp` sentinel present -> no MCP servers (sentinel stripped)
* one or more MCP server names already listed -> treat as an allowlist
* otherwise -> union in every globally-enabled MCP server
To avoid duplicating the "which MCP servers are enabled" computation (it
already existed inline in `_get_platform_tools`), this extracts a shared
`enabled_mcp_server_names(config)` helper in `hermes_cli.tools_config` and has
BOTH the gateway/CLI platform resolver and the cron per-job resolver call it —
so every path agrees on MCP membership (extend, don't duplicate).
Note: the issue's *headline* — bare MCP server names rejected, registry never
includes them — was already fixed on main (commits c10fea8d2 + 04918345e,
both before the issue was filed). This PR closes the remaining cron-specific
gap (R2). The `server:*` / `mcp:server` alias-notation rejection (R1) and the
quiet-mode silent-drop (R3) are tracked separately.
Salvaged from #32788 by sherman-yang (credited below). Reworked to reuse the
shared `enabled_mcp_server_names` helper instead of re-implementing the MCP
membership set in cron/scheduler.py.
Fixes#23997
Co-authored-by: sherman-yang <58446328+sherman-yang@users.noreply.github.com>
* feat(desktop): add Update now button to About panel
The About > Updates panel only surfaced "See what's new" when an update
was available, which just opens the changelog overlay — there was no way
to start the install directly from About. Add an "Update now" primary
button that opens the updates overlay (for apply progress) and kicks off
the install for the active target (backend in remote mode, else client).
* feat(desktop): PR-style file diffs in chat
Render write_file/edit_file/patch as a reviewable diff instead of raw
result JSON, closer to a Cursor/T3 per-edit review.
- Unified diff via FileDiffPanel: strip git file-header + @@ hunk noise,
drop the +/- gutter, color by line with a 2px gutter accent, full-bleed
to the card, transparent context lines, compact scroll height.
- Header shows filename + language icon + +N/-N stats; full path moves to
a hover tooltip (no Edited verb, no ms).
- Treat the three file-edit tools uniformly (isFileEditTool); read diff
from inline_diff or patch's diff field; suppress raw-arg detail.
- Reusable FileTypeIcon primitive sharing the code-block icon mapping
(codiconForFilename), codicon fallback.
- Per-row scaffolding fade (not the group wrapper, which trapped child
opacity); expanded edits stay full, collapsed fade; keyboard-only focus
lift. Hide diff-less rehydrated creates that read as dupes.
* style(desktop): lead --dt-font-mono with bundled JetBrains Mono
Code/diff blocks preferred a system Cascadia Code before the bundled
JetBrains Mono, so they drifted from the terminal (which leads with
JetBrains Mono) on machines where Cascadia is installed. Reorder so every
mono surface uses the face we actually ship.
* feat(desktop): syntax-highlight inline diffs via Shiki
Unify the diff renderer onto the same Shiki path as code blocks: highlight
the marker-stripped change content in the file's language, then a per-line
transformer layers the add/remove tint + gutter accent on top. Falls back
to the plain color-only renderer when the language is unknown, over budget,
or while Shiki loads.
- shikiLanguageForFilename(): extension → bundled-language id (shared
filename-token helper with codiconForFilename).
- code display:grid so full-width line tints don't double with newline
nodes; theme surface stripped so context lines stay transparent.
* style(desktop): use github-dark-dimmed for inline diffs
The vivid github-dark-default tokens read harsh behind the add/remove
tint in dark mode; switch the diff's dark theme to GitHub's lower-contrast
dimmed palette. Light mode and code blocks are unchanged.
* style(desktop): dim code-block syntax theme + share with diffs
Apply github-dark-dimmed to code blocks too (not just inline diffs) and
export one shared SHIKI_THEME so the two highlighters can't drift. Lower
contrast reads easier at our small code size in dark mode.
* style(desktop): soften shiki token contrast in dark mode
github-dark-dimmed only dims the background, which the diff/code surfaces
strip — so the bright token foregrounds were unchanged. Pull saturation +
brightness back a touch (hues preserved) on .shiki in dark mode for both
code blocks and inline diffs.
Follow-up to the salvaged preflight token-progress fix: require a material
(>5%) token reduction to count as progress, matching the overflow-handler
retry path (conversation_loop.py, #39550), so a sub-5% wobble can't keep the
3-pass preflight loop spinning. Adds boundary + zero-token regression tests.
/simplify-code QUALITY finding: the `if callable(_available_entries): ... else:
pool.select()` ladder was dead for the real CredentialPool type (`_available_entries`
is always a bound method) AND the select() fallback violated the helper's read-only
contract — select() -> _select_unlocked() runs _available_entries(clear_expired=True,
refresh=True), which persists to auth.json and triggers a network refresh. Call
_available_entries(clear_expired=False, refresh=False) directly inside the existing
try/except instead.
Also drops the now-dead `select=` stubs from the 6 pool tests (they only existed to
satisfy the removed fallback branch). Behavior unchanged; 6 pool tests pass and the
read-only / null-token contract tests were mutation-checked (flipping the flags /
removing the None-guard fails the respective test).
Rebased onto god-file Phase 1 refactor — preflight compression has moved
from agent/conversation_loop.py to agent/turn_context.py (no semantic
change in the refactor itself; the bug below was carried over verbatim).
The preflight compression loop in ``turn_context.py`` uses
``len(messages) >= _orig_len`` to decide whether a compression pass has
made progress. That conflates two different conditions: a true no-op
(transcript materially unchanged) and effective token compression that
summarises message contents but keeps the same number of rows. The
second case is misread as "Cannot compress further" — the session then
surfaces ``Context length exceeded`` and auto-resets even when the
post-compression estimate is far below the model context window.
Observed example from #39548: a Telegram session on GPT-5.5 with a 1M
context dropped from ~288k → ~183k tokens (a 36% reduction) while
preserving 220 messages. The loop treats that as exhaustion and the
gateway auto-resets the session.
Fix
---
Add ``_compression_made_progress(orig_len, new_len, orig_tokens, new_tokens)``
and call it after the post-pass ``estimate_request_tokens_rough`` (which
is moved up to run *before* the progress check instead of after it).
Either a row-count reduction OR a token-count reduction now counts as
progress; only when neither moves do we break out as "stuck".
Fixes#39548
Share one SHIKI_THEME (github-dark-dimmed) across code blocks and inline
diffs so they can't drift, and pull token saturation/brightness back via a
`.shiki` dark-mode filter. The dimmed theme alone only changes the
background — which both surfaces strip — so the bright foregrounds needed
the filter to actually calm down.
The connector half (gateway-gateway) moves the passthrough plane's post-ACK
forward off the HTTP gatewayEndpoint onto the gateway's outbound /relay WS via
a new passthrough_forward frame. This is the gateway side: the relay adapter
now RECEIVES and handles that frame, so a hosted gateway (no public IP) can
process forwarded Class-2/3 traffic (Discord interactions, Twilio) over the
socket it already holds — closing the "passthrough inbound doesn't work for
hosted gateways" gap.
- ws_transport.py: decode the passthrough_forward frame; PassthroughForward
dataclass + _passthrough_from_wire (base64 body -> exact bytes, byte parity
with the connector's toPassthroughForward); set_passthrough_handler mirrors
set_interrupt_inbound_handler.
- transport.py: PassthroughHandler type + set_passthrough_handler on the
RelayTransport protocol.
- adapter.py: connect() wires the passthrough handler; _on_passthrough decodes
the (already-sanitized, token-free) forward and, for a Discord interaction,
converts it to a MessageEvent routed through the normal agent path
(handle_message) — the reply egresses over the outbound / token-less
follow_up path, so the gateway never holds the interaction credential. Never
raises (a bad forward can't kill the read loop). Non-discord forwards (Twilio)
are logged + dropped for now.
- docs/relay-connector-contract.md: document the passthrough_forward frame +
PassthroughForward shape + §3.1.
The interaction -> MessageEvent CONVERSION semantics (slash-command vs button
UX, option rendering) are the open sub-design flagged in the spec; the TRANSPORT
+ receive mechanism (this) is settled per Ben's Gate-2 decision: "the relay
adapter handles receiving these events over the WS."
Tests (tests/gateway/relay/test_relay_passthrough.py): byte-preservation
round-trip (+ malformed-body tolerance), connect() wiring, application-command
and message-component interactions route through handle_message with correct
session source + scope capture, malformed/non-discord forwards dropped cleanly.
100 relay tests green. Pairs with the connector PR (gateway-gateway).
Unify the diff renderer onto the same Shiki path as code blocks: highlight
the marker-stripped change content in the file's language, then a per-line
transformer layers the add/remove tint + gutter accent on top. Falls back
to the plain color-only renderer when the language is unknown, over budget,
or while Shiki loads.
- shikiLanguageForFilename(): extension → bundled-language id (shared
filename-token helper with codiconForFilename).
- code display:grid so full-width line tints don't double with newline
nodes; theme surface stripped so context lines stay transparent.
Code/diff blocks preferred a system Cascadia Code before the bundled
JetBrains Mono, so they drifted from the terminal (which leads with
JetBrains Mono) on machines where Cascadia is installed. Reorder so every
mono surface uses the face we actually ship.
Render write_file/edit_file/patch as a reviewable diff instead of raw
result JSON, closer to a Cursor/T3 per-edit review.
- Unified diff via FileDiffPanel: strip git file-header + @@ hunk noise,
drop the +/- gutter, color by line with a 2px gutter accent, full-bleed
to the card, transparent context lines, compact scroll height.
- Header shows filename + language icon + +N/-N stats; full path moves to
a hover tooltip (no Edited verb, no ms).
- Treat the three file-edit tools uniformly (isFileEditTool); read diff
from inline_diff or patch's diff field; suppress raw-arg detail.
- Reusable FileTypeIcon primitive sharing the code-block icon mapping
(codiconForFilename), codicon fallback.
- Per-row scaffolding fade (not the group wrapper, which trapped child
opacity); expanded edits stay full, collapsed fade; keyboard-only focus
lift. Hide diff-less rehydrated creates that read as dupes.
Adds test_413_retries_on_token_only_compression: same message count but
materially fewer tokens after compaction must count as progress and retry,
not abort. Fails on main without the salvaged fix, passes with it.
Compression can materially reduce request size (tool-result pruning,
in-place summarization) without reducing message count. The two
compression-success checks in conversation_loop.py (413 handler and
context-overflow handler) only compared len(messages) to detect
success, missing token-only compression.
Now re-estimates tokens after compress_context() returns and treats
any >=5% reduction as a successful compression pass. Error logs
also use the post-compression token count instead of the stale
pre-compression estimate.
Fixes: #39550
display.timestamps already drove the [HH:MM] suffix on live submitted and
streamed message labels, but there was no runtime command to toggle it and
/history ignored the setting entirely. Add /timestamps [on|off|status]
(alias /ts) and render [HH:MM] in /history for turns that carry a stored
unix timestamp (resumed sessions). Live unsaved turns without a stored time
are never given a fabricated one. Uses the existing sanctioned non-wire
'timestamp' message key (stripped before the API call in chat_completions),
so message-alternation and prompt-cache invariants are untouched.
Ctrl+G already opened $EDITOR with the current draft, but used
open_in_editor(validate_and_handle=False), which only loaded the saved text
back into the input area — the user still had to press Enter. The TUI's
Ctrl+G (openEditor) submits the draft on a clean exit. Since CLI submission
is driven by the custom Enter keybinding (not the buffer accept_handler),
validate_and_handle can't route through it; instead chain a done-callback on
the editor Task that calls the new _submit_editor_buffer(), which mirrors the
Enter handler's idle/queue/slash branches and drops an empty save.
Follow-up to the accept-any-file-type change. The observe-unmentioned and
replied-media paths relied on cache_media_bytes() returning None for
unsupported document types to emit an 'unsupported, not cached' note. Now
that any file type is always cached, those docs are cached and surfaced with
a path-pointing note — consistent with the main document path. The
remaining cached-is-None branch is image-validation-failure only; its note
is reworded accordingly. Updates the group-gating test to the new contract.
Authorization to message the agent is the gate, not the file extension.
Previously the inbound-attachment allowlist (SUPPORTED_DOCUMENT_TYPES) was
opt-OUT on Discord (allow_any_attachment defaulted false) and had no bypass
at all on Telegram/Slack — so an .html (or any non-allowlisted type) was
dropped or hard-rejected before the agent saw it.
Now every authorized upload is cached and surfaced to the agent regardless
of type:
- base.cache_media_bytes(): unknown types cache as octet-stream (or the
caller-supplied MIME) instead of returning None — fixes the chokepoint
that Teams/Telegram-media route through.
- discord/telegram/slack adapters: removed the allowlist reject/skip; any
non-media attachment is typed DOCUMENT and cached. Known types keep their
precise MIME.
- Text inlining now gates on a shared _TEXT_INJECT_EXTENSIONS set (text +
code + config + markup) instead of a blind UTF-8 decode, so binary formats
(PDF/zip/docx) with ASCII headers are never inlined.
- gateway/run.py emits the path-pointing context note for every DOCUMENT,
including non text/application MIME types.
- discord.allow_any_attachment is now a documented no-op kept for config
back-compat.
Validation: 357 gateway tests pass; E2E confirms .html/.bin/custom types
cache, known types stay precise, PDFs are not inlined.
terminal.docker_extra_args passes flags verbatim to `docker run` (e.g.
--gpus=all, --shm-size=16g). It was wired into DEFAULT_CONFIG,
TERMINAL_CONFIG_ENV_MAP (so `hermes config set` bridged it),
terminal_tool._get_env_config (reads TERMINAL_DOCKER_EXTRA_ARGS), and
DockerEnvironment (applies extra_args) -- but it was MISSING from cli.py's
env_mappings and gateway/run.py's _terminal_env_map.
Consequence: a user who hand-edits config.yaml (rather than running
`hermes config set`) has docker_extra_args silently dropped on the CLI and
gateway/desktop startup paths, while docker_image / docker_volumes (which
ARE in those maps) bridge correctly -- producing the reported 'Hermes
partially reads the Docker config' symptom where --gpus=all and
--shm-size=16g never reach docker run.
This is the same bridge-coverage bug class that shipped before for
docker_run_as_host_user (cli + gateway) and docker_mount_cwd_to_workspace
(gateway). Fix by adding the key to both maps, plus a dedicated regression
pin in test_terminal_config_env_sync.py mirroring the existing
test_docker_*_is_bridged_everywhere guards.
* fix(gateway): walk /proc/*/cmdline to find main-wrapper.sh under s6-overlay v3 (#49196)
(cherry picked from commit 3a108c2df0)
* fix(container): peel s6-v3 rc.init prefix so dashboard role is detected
kyssta-exe's preceding commit (#49238) fixed _read_container_argv() to
locate the rc.init-launched main-wrapper.sh process under s6-overlay v3,
but the skip still never fired: _strip_container_argv_prefix() only peeled
a prefix when args[0] was init/main-wrapper.sh/hermes. Under s6 v3 the
matched argv is
/bin/sh -e /run/s6/basedir/scripts/rc.init top
/opt/hermes/docker/main-wrapper.sh dashboard ...
so args[0] stayed /bin/sh, _is_dashboard_container() returned False, and
the dashboard container reconciled + started its own gateway-default —
the exact dual Telegram getUpdates 409 in issue #49196.
Fix: strip everything up to and including the main-wrapper.sh token (the
stable boundary the image owns), covering both the v2 (/init ...) and v3
(/bin/sh ... rc.init top ...) shapes with one rule, instead of matching
launcher tokens positionally. This also repairs _is_legacy_gateway_run_request()
under v3, which shares the same strip helper (the issue called this out).
Tests: extend the dashboard true/false parametrize sets with the s6-v3
argv shape, and add test_main_skips_reconcile_in_dashboard_container_s6v3
exercising main() end-to-end with the v3 argv. Verified via mutation that
both new v3 assertions fail under the old positional strip and pass with
the fix.
---------
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
When `hermes dashboard --host 0.0.0.0` is run interactively with the auth
gate engaged but no DashboardAuthProvider configured, prompt to set up the
bundled username/password provider on the spot (or point at `hermes dashboard
register` for OAuth) instead of only emitting the fail-closed error.
- main.py: `_maybe_setup_dashboard_auth_interactively()` runs before
start_server. No-ops on loopback binds, when a provider is already
registered, or when stdin/stdout isn't a TTY (Docker/s6, CI, piped runs) so
the fail-closed SystemExit stays the backstop for unattended deploys. On the
password path it writes dashboard.basic_auth.{username,password_hash,secret}
to config.yaml (scrypt hash, never plaintext), then force-rediscovers
plugins so the basic provider registers before the gate check.
- web_server.py: fix the fail-closed hint — it told operators to set
`dashboard_auth.basic.username` but the provider reads `dashboard.basic_auth`.
- docs: note the interactive setup under Fail-closed semantics.
No new env vars; reuses the existing dashboard.basic_auth config surface.
* feat(cli): /prompt — compose your next prompt in $EDITOR
Adds /prompt (alias /compose): opens $VISUAL/$EDITOR on a temp markdown
file so you can hand-edit a multi-line prompt, then sends the saved buffer
as the next agent turn. Text after the command pre-seeds the buffer; an
empty save cancels. Reuses the one-shot _pending_agent_seed the interactive
loop already consumes (same mechanism as /blueprint), so no changes to the
input event loop or message pipeline. CLI-only.
* feat(tui): /prompt slash command opens $EDITOR (parity with CLI)
The TUI already opens $EDITOR via Ctrl+G (openEditor), but had no /prompt
slash command like the classic CLI. Wire openEditor into the slash handler
context and register /prompt (alias /compose) to call it; inline text after
the command is dropped into the composer first so it carries into the editor,
matching the CLI's /prompt <text>.
* feat(cli): /reasoning full to show complete thinking, not 10-line clamp
The post-response Reasoning recap box hard-clamped long thinking to the
first 10 lines, so there was no way to see the full reasoning trace after
a turn (live streaming already shows it in full). Add display.reasoning_full
(default off) plus /reasoning full|clamp to toggle it at runtime; the clamp
truncation note now points at the command. Addresses repeated user requests
to show all thinking tokens.
* test(gateway): de-snapshot /reasoning help assertion
The test froze the exact args-hint literal '/reasoning [level|show|hide]',
which the new full/clamp args change to '[level|show|hide|full|clamp]'.
Convert to an invariant: assert /reasoning is in help and carries its core
args, not the exact hint string.
* feat(tui): /reasoning full|clamp parity in tui_gateway
The classic-CLI reasoning_full toggle had no TUI equivalent — typing
/reasoning full in the TUI fell through to parse_reasoning_effort and
errored. The TUI renders thinking as an expand/collapse section (no fixed
10-line recap), so map full -> sections.thinking=expanded (raw, uncapped
via thinkingPreview mode='full') and clamp -> collapsed, persisting
display.reasoning_full for cross-surface config consistency.
Plugins shelling out to bare `hermes` via the terminal tool hit
`command not found` (exit 127) when the gateway was launched without the
hermes install dir on PATH (systemd, service managers, cron, desktop
launchers) — even though `hermes` works in the user's own interactive
terminal, which sources the shell rc that exports that dir.
The terminal tool's subshell PATH was the agent process PATH plus a
static set of system dirs (_SANE_PATH); it never included wherever the
hermes console-script actually lives (~/.local/bin, the venv bin/Scripts,
pipx, nix). Resolve that dir once (which/argv0/sys.executable) and
prepend-if-missing it so bare `hermes` resolves regardless of launch
method.
Sibling-site follow-up to the AGENTS.md token-lock fix (#50481). Platform
adapters migrated from gateway/platforms/<name>.py to
plugins/platforms/<name>/adapter.py; a handful (signal, weixin, bluebubbles,
qqbot, yuanbao, msgraph_webhook, webhook, api_server) still live in
gateway/platforms/.
- adding-platform-adapters.md: new-adapter creation path + reference-impl table
- gateway-internals.md: rewrite the adapter tree to reflect the actual split
- zh-Hans mirrors of both kept in parity
- scripts/release.py: add TutkuEroglu to AUTHOR_MAP (CI gate)
gateway/platforms/telegram.py no longer exists (adapters moved to
plugins/platforms/<name>/adapter.py) and telegram no longer uses the
scoped-lock pattern. Point the token-lock canonical-pattern reference to
plugins/platforms/irc/adapter.py, which acquires the lock in connect()
and releases it in disconnect() — and is already cited as a canonical
example in ADDING_A_PLATFORM.md.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* feat(providers): remove google-gemini-cli + google-antigravity OAuth providers
Google now actively bans accounts for third-party tools that piggyback on
Gemini CLI / Antigravity / Code Assist OAuth, and because abuse prevention
sits at a backend layer the ban can extend to the entire Google account
(Gmail/Drive), with a second violation being permanent.
Ref: https://github.com/google-gemini/gemini-cli/discussions/20632
Removes both OAuth inference providers entirely (modules, provider profiles,
auth/runtime/config/models wiring, the /gquota Code Assist quota command,
the antigravity-cli optional skill, desktop + docs surface in en + zh-Hans).
The API-key 'gemini' provider (GOOGLE_API_KEY/GEMINI_API_KEY against
generativelanguage.googleapis.com) is unaffected and stays fully supported.
* fix(skills): keep the antigravity-cli skill — only the OAuth provider is removed
The antigravity-cli optional skill orchestrates the external `agy` binary as
a coding-agent tool via the terminal tool — it does NOT wrap Hermes inference
through the banned google-antigravity OAuth provider, so it carries none of
the account-ban risk that motivated removing that provider. Restore the skill,
its docs page, the sidebar entry, and the optional-skills catalog row. The
google-antigravity / google-gemini-cli inference providers stay fully removed.
The welcome banner's 'Available Tools' merged in every toolset from the
global check_tool_availability() registry walk, regardless of whether it
was enabled for the current platform. On a Blank Slate CLI (file +
terminal only) that surfaced discord / feishu / kanban tools the agent
was never actually given — they are not in the agent's tool schema, but
the banner displayed them, making it look like they were exposed.
- Filter the unavailable-toolset merge to toolsets actually in
enabled_toolsets (a toolset that's enabled but has unmet deps still
legitimately shows as disabled/lazy).
- Gate the 'Available Skills' section on the skills toolset being
enabled — when it's off, the agent can't load any skill, so show
'Skills toolset disabled' instead of the on-disk catalog.
When enabled_toolsets is empty (older callers), behavior is unchanged.
Validation: blank-slate banner now shows only file + terminal and
'Skills toolset disabled'; a skills-enabled banner still lists the
catalog. Added regression tests; full banner suite green (15/15).
Live testing against a real SIGTERM-ignoring process TREE (parent + children,
the agent-browser daemon + renderer shape) revealed psutil.wait_procs's
gone/alive partition mis-handles a parent/child tree: it reaps via
Process.wait() and could mark targets gone/alive inconsistently across the
tree, leaving survivors un-killed (flaky — sometimes the parent lived,
sometimes a child). Replace it with: sleep out the grace window, then
directly re-probe every captured target (_proc_alive, treating zombies as
dead) and SIGKILL any that's still running. Add a multi-child-tree regression
test. 6/6 escalation tests green across repeated runs; the real-tree E2E now
kills the full tree 6/6 runs.
A daemon that ignores or stalls in its SIGTERM handler currently survives the
process-registry reap and leaks until reboot (observed as agent-browser
daemons accumulating to EMFILE on long-running gateways). _terminate_host_pid
now snapshots the tree, SIGTERMs it, waits a bounded grace window
(terminal.daemon_term_grace_seconds, default 2.0s, 0 disables), then SIGKILLs
any survivor. The recycled-PID identity guard still gates the whole path, so
escalation never reaches a stranger; Windows is unchanged (taskkill /F is
already a hard kill).
Config lives in config.yaml (terminal.daemon_term_grace_seconds), NOT an env
var, per the .env-secrets-only policy.
Implements the SIGKILL-escalation idea from @tkwong's #15008, reworked onto the
current _terminate_host_pid tree-kill path (the original predated it) and
config-gated instead of env-var-gated.
Co-authored-by: Benjamin Wong <tkwong@inspiresynergy.com>
Surface dangerous host/deployment posture at gateway startup so operators get
the 'you're exposed' signal the June 2026 MCP-config persistence campaign
victims never had. Warn-only — never blocks startup, never raises.
Checks (each independently fail-safe):
- Running as root (POSIX uid 0)
- SSH daemon with PasswordAuthentication enabled (incl. the 'yes' default)
- Running in a container with no persistent volume mount over HERMES_HOME
- Network-accessible API server with no API_SERVER_KEY
New module hermes_cli/security_audit_startup.py; invoked once per process from
start_gateway() right after setup_logging(). Cross-platform (root/SSH checks
no-op on Windows). Idea: @Cthulhu.
The s6 dashboard entrypoint and docker integration tests relied on
HERMES_DASHBOARD_INSECURE=1 to bring up a 0.0.0.0 dashboard with no auth
provider. With --insecure now a no-op (auth gate mandatory on non-loopback
binds), that path fails closed.
- s6 dashboard/run: drop --insecure derivation; warn that the env is a no-op
and point operators at HERMES_DASHBOARD_BASIC_AUTH_* / OAuth.
- docker tests: supervision tests now register the bundled basic password
provider (HERMES_DASHBOARD_BASIC_AUTH_USERNAME/_PASSWORD) so the gate has a
provider and the dashboard binds. Rewrote the insecure-opt-out test to
assert fail-closed (dashboard does NOT serve) instead of gate-bypass.
- docs (en + zh-Hans): HERMES_DASHBOARD_INSECURE documented as deprecated
no-op; basic-auth is the zero-infra way to authenticate a containerized
public dashboard.
Remove the dashboard --insecure auth-bypass, add an MCP persistence guard +
IOC blocklist, and raise the API-server key entropy floor.
Driven by the June 2026 hermes-0day campaign (r/hermesagent, live 854.media
instance): scanners find exposed Hermes dashboards/API servers, drive the
root agent to plant a 'command: bash' MCP entry that appends an attacker SSH
key to authorized_keys, which cron + startup then re-execute every tick.
- dashboard: --insecure no longer disables the auth gate. should_require_auth
returns True for every non-loopback bind; a public bind ALWAYS requires an
auth provider (bundled password provider or OAuth). --insecure kept as a
warned no-op for backward compat. Fail-closed error now points at the
password provider, not at --insecure.
- mcp_security: validate_mcp_server_entry now also rejects shell payloads that
write to OS persistence surfaces (authorized_keys/.ssh/pam.d/sudoers/cron/
rc files) and hard-rejects a hermes-0day IOC blocklist (attacker SSH key +
source IPs) anywhere in command/args/env. Runs at save AND spawn time.
- api_server: raise network-bind API_SERVER_KEY entropy floor 8->16 chars;
warn when a network-accessible API server runs an unsandboxed local backend.
Same library-code anti-pattern as the compressor fix: MiniSWERunner.__init__
called logging.basicConfig(), overriding the application's root logger config
every time a runner was instantiated. Moved the call into main() (the CLI
entry point) where it belongs; __init__ now only does getLogger(__name__).
Standalone verbose logging is preserved.
logging.basicConfig() in TrajectoryCompressor.__init__ overrides the
root logger configuration every time the class is instantiated. Library
code should use logging.getLogger(__name__) and let the application
entry point configure the root logger.
Fixes inconsistent log formatting when the compressor is used alongside
other logging configuration in the gateway.
* fix(agent): strip stale reasoning_content when falling back to a strict provider
A reasoning primary (DeepSeek/Kimi/MiMo thinking mode) pins reasoning_content
on every assistant tool-call turn (a single space " " pad). api_messages is
built once under the primary; on a mid-session fallback to a strict
OpenAI-compatible provider (Mistral, Cerebras, Groq, SambaNova), those stale
pads were replayed verbatim and rejected with HTTP 400/422:
body.messages.2.assistant.reasoning_content: Extra inputs are not
permitted (input: ' ')
reapply_reasoning_echo_for_provider() only ever ADDED pads, so it never
reconciled history built under a reasoning primary against a strict fallback.
copy_reasoning_content_for_api() also leaked empty-string and 'reasoning'-only
shapes to non-pad providers.
Fix both sites: when the active provider does not enforce echo-back, strip
reasoning_content (empty, space-pad, or non-empty) entirely. Re-padding when
switching TO a reasoning provider is preserved. Covers the Cerebras 400 from
#45655 and the DeepSeek->Mistral 422 fallback report.
Refs #45655.
* test: update reasoning-replay tests for strict-provider stripping
test_explicit_reasoning_content_beats_normalized_reasoning_on_replay was
implicitly running on the OpenRouter fixture (non-pad); pin it to a reasoning
provider so the precedence it checks is observable. Add a positive
strict-provider test asserting reasoning_content is stripped on replay.
Addresses reviewer feedback on #13377:
1. Restore all stripped docstrings (_load_config, _is_breaker_open,
sync_turn, register, _get_client, _read_filters, _write_filters,
_unwrap_results, save_config) and section dividers
2. Revert api_key to required:true in schema — self-hosted Mem0 also
requires auth by default; validation in _get_client() handles the
either/or logic separately from the schema
3. Confirm secret:true remains on api_key (already correct)
The mem0 plugin previously hardcoded api.mem0.ai as the endpoint.
This adds a `host` config key and MEM0_HOST env var so users can
point the plugin at a self-hosted Mem0 instance.
Changes:
- _load_config(): read MEM0_HOST env var
- is_available(): accept host OR api_key (self-hosted may not need a real key)
- get_config_schema(): add host field
- initialize(): read host from config
- _get_client(): pass host kwarg to MemoryClient when set
- system_prompt_block(): show target (cloud vs URL)
- README: document self-hosted setup
The PID-reuse guard (#43846) reads /proc/<pid>/stat field 22, which only
exists on Linux — on macOS/Windows it returned None and the guard silently
degraded to a bare liveness check (a no-op, safety-wise). Add a
psutil.create_time() fallback (psutil is a hard dep, cross-platform),
quantized to centiseconds for stable equality, so the recycled-PID guard
actually protects macOS/Windows too. /proc always wins first on Linux and
always misses on macOS/Windows, so the two sources never mix on one host and
same-source equality is all the guard needs.
The salvaged test spawned a listener subprocess that printed its port
immediately after bind() but BEFORE listen(), so under CI's loaded 8-worker
box the parent connected before the socket was listening -> ConnectionRefused
(flaked on test slice 2/6). Reorder the child to listen() then print the port,
and make the client connect with a short bounded retry to absorb scheduler
jitter. 15/15 green locally including direct hammering.
Follow-up to the salvaged #43846 commits: the WhatsApp adapter moved from
gateway/platforms/whatsapp.py to plugins/platforms/whatsapp/adapter.py since the
PR was authored. The cherry-pick brought _listener_pids_on_port's `re.finditer`
ss-fallback and the new test's import, but the new module location doesn't import
`re` (latent NameError on the lsof-absent fallback path) and the test imported the
old module path. Add `import re` to the adapter and repoint the test import.
This is the bug that was actually closing Firefox. `_kill_port_process`, run on
every bridge (re)start to free the port, used `lsof -ti :PORT` / `fuser PORT/tcp`
— both of which match a process whose socket merely *involves* that port number
in ANY state, including ESTABLISHED client connections. It then SIGTERMed every
match.
The bridge defaults to port 3000 — a ubiquitous local dev-server port. With a
browser tab open on localhost:3000, `lsof -ti :3000` returned Firefox's PID, so
each restart of the (crash-looping) WhatsApp bridge SIGTERMed Firefox, closing
the whole browser at irregular intervals with no crash and no coredump.
Proven live with the kernel `signal:signal_generate` tracepoint:
hermes-gateway(3396516) -> sig=15 (code=0/SI_USER) -> comm=firefox pid=3371585
captured immediately after a gateway start, while Firefox held a socket on the
bridge port. Demonstrated over-match: `lsof -ti :8080` returns the listener AND
the gateway's own client connection; `lsof -ti tcp:8080 -sTCP:LISTEN` returns
only the listener.
Fix: `_listener_pids_on_port` resolves only LISTEN-state sockets
(`lsof -ti tcp:PORT -sTCP:LISTEN`, with an `ss -ltnp` fallback) and
`_kill_port_process` signals just those. A client whose connection happens to
involve the port number is never touched — which is also more correct, since a
client never blocks the new bridge from binding. Windows already filtered
LISTENING; the broad `fuser -k` path is removed.
Adds TestKillPortProcess: real-socket tests proving a separate client process
is excluded from the listener lookup and survives port cleanup. 9 tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`_kill_stale_bridge_by_pidfile` SIGTERMed the PID recorded in `bridge.pid`
after only a bare liveness check. Once the bridge exits and is reaped the
kernel recycles that PID onto an unrelated process; because the WhatsApp bridge
crash-loops ("Bridge process died (exit code 1)" repeating), this cleanup ran
on every restart and could SIGTERM a recycled PID that had landed on the user's
browser — closing Firefox at irregular intervals with no crash and no coredump
(a clean kill of a stranger).
Same PID-recycling class as the MCP reaper (7bd1f8a2d) and the process-registry
host-PID guard (e6a99cef2); this was the third, and most actively-fired, path.
Fix: `_write_bridge_pidfile` now also records the leader's kernel start time
(line 2). `_kill_stale_bridge_by_pidfile` re-validates identity via
`_bridge_pid_is_ours` before signalling — the (pid, start time) pair must match,
or for legacy single-line pidfiles the live cmdline must name `node` + this
session's unique path. A recycled PID (different start time / cmdline) is logged
and skipped, never signalled. Legacy pidfiles stay readable.
Adds TestWhatsappBridgePidfile: real-process tests proving a genuine bridge is
reaped while a recycled PID (start-time mismatch, or non-bridge cmdline) is
spared. 7 new + 108 gateway/registry tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The background-process registry signalled host PIDs (recovery adoption,
detached-session kill, tree-kill) using a number captured at spawn, guarded
only by a bare liveness check. Once a session's process exits and is reaped the
kernel recycles that PID onto an unrelated process, so an alive-but-different
PID passed the check and got tree-killed.
Observed in the wild: a recycled background-session PID landed on Firefox's
session leader; a later kill/refresh walked its process tree and SIGTERMed
every tab — Firefox "closing" at irregular intervals with no crash/coredump.
This is the same PID/PGID-recycling class fixed for the MCP orphan reaper in
7bd1f8a2d, but the process_registry subsystem was never guarded — so the bug
persisted.
Fix: record each host process's kernel start time (/proc/<pid>/stat field 22)
at spawn, persist it in the checkpoint, and re-validate it before every signal
via `_host_pid_is_ours`. A PID whose start time no longer matches — or that is
gone — is never signalled:
- recover_from_checkpoint: a recycled PID is not adopted as a session.
- _refresh_detached_session: a recycled detached PID is marked exited.
- kill_process / _terminate_host_pid: refuse to tree-kill a stranger.
Legacy checkpoints and platforms without /proc (no baseline) degrade to the
prior best-effort liveness behaviour, so nothing else changes.
Adds TestPidReuseGuard: real-process tests proving a mismatched start time
refuses termination while a matching one still kills, plus recovery/refresh
recycling paths. 74 registry + 22 MCP-stability tests green.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The kanban-worker and kanban-orchestrator bundled skills existed only to
be force-loaded into dispatcher-spawned workers, gated by
environments:[kanban] so they wouldn't leak into normal CLI listings.
That gating was fragile (the leak that #50443 patched) and the
--skills auto-load was already best-effort — most workers ran without it
because the bundled skill isn't present in profile-scoped skills dirs.
Remove the skills entirely and promote their load-bearing content
(workspace kinds, deliverable artifacts, created-card integrity, profile
discovery) into KANBAN_GUIDANCE, which is already injected into every
kanban worker's system prompt. Net result: every worker reliably gets
the guidance, nothing can leak into a CLI/blank-slate session, and the
gating machinery is gone.
- agent/prompt_builder.py: promote the 4 load-bearing rules into KANBAN_GUIDANCE
- hermes_cli/kanban_db.py: drop --skills kanban-worker auto-injection + _kanban_worker_skill_available probe
- hermes_cli/kanban_swarm.py: drop skills=[kanban-orchestrator] on the root card
- hermes_cli/kanban.py: drop kanban-init skill seeding; fix help text
- delete skills/devops/kanban-{worker,orchestrator}
- docs: delete the two skill pages (EN+zh), fix sidebars/catalog/kanban.md/kanban-worker-lanes.md and the video-orchestrator + codex-lane references
- tests: update spawn-argv expectations; re-bound the guidance-size guard
Supersedes the skill-leak half of #50443 (credit @helix4u for flagging the area).
On a Linux source install the in-app updater ran the full backend update +
desktop rebuild successfully but never restarted the app — it hung forever on
the applying overlay with no close button. Two causes:
- applyUpdatesPosixInApp() only handled the macOS .app bundle swap;
runningAppBundle() is null off macOS, so Linux fell through to
{ ok: true, backendUpdated: true } without ever relaunching.
- The renderer store had no terminal state for that result shape, so
$updateApply stayed { applying: true } and the overlay's close button
(hidden while applying) never appeared.
Fix (new electron/update-relaunch.cjs, pure + unit-tested):
- Decide the Linux outcome from whether the *running* binary is the one we
just rebuilt (execPath under release/<plat>-unpacked, path-segment-aware so
linux-unpacked-evil can't masquerade) and whether its chrome-sandbox helper
is launchable (root:root + setuid, or an --no-sandbox / ELECTRON_DISABLE_SANDBOX
opt-out):
relaunch — detached watcher waits for this PID to exit (graceful, then
SIGKILL), self-deletes, and re-execs the rebuilt binary with the original
launch context (filtered args + HERMES_*/sandbox env + cwd) restored.
guiSkew — AppImage/.deb/.rpm/dev: backend updated but this GUI package was
NOT changed; surface an honest closeable 'reinstall the desktop app'
terminal state instead of lying that it loads next launch (#37541 skew).
manual — rebuilt binary but sandbox helper not launchable: keep the
working window, don't quit into a dead app.
- store/updates.ts lands a terminal, closeable state for EVERY resolved apply
outcome (handedOff / guiSkew / manualRestart / updated-not-relaunched / error)
so the hang is impossible regardless of platform or result.
- New DesktopUpdateStage values (update/rebuild/done/guiSkew) + GuiSkewView so
progress reads correctly and the skew state is closeable. i18n in all four
locales (en/ja/zh/zh-hant) in parity.
- electron/update-relaunch.test.cjs (16 tests) + store outcome tests.
Salvaged from #45205 onto current main. Linux quit dwell uses the shared
UPDATE_HANDOFF_DWELL_MS (2.5s) from #50448 for consistency. Four-locale i18n
parity, AUTHOR_MAP entry, and the test wiring added on top.
Closes#45205.
* fix(desktop): filter undefined entries in AttachmentList to prevent refText crash on session switch
When switching sessions, the attachments array can contain stale/undefined
entries from the previous session's state. Accessing attachment.refText on
an undefined entry throws TypeError, breaking session switching entirely.
Fix: add .filter(Boolean) before .map() to skip undefined/null entries.
Fixes#49614
* fix(desktop): update I18nConfigClient usage in attachment test
The i18n config API changed from getLocale/saveLocale to
getConfig/saveConfig. Update the test fixture to match.
CI on the salvage caught two issues the stale PR base masked:
1. The model-setup flows were extracted from main.py into
hermes_cli/model_setup_flows.py after @pmos69 forked. The cherry-pick
re-introduced a stale _model_flow_custom into main.py (duplicating the
one main.py now imports) and put _model_flow_google_antigravity there too.
Move the antigravity flow into model_setup_flows.py alongside its siblings
and drop the stale _model_flow_custom dup. Fixes the getpass/stdin OSError
in tests/cli/test_cli_provider_resolution.py.
2. google-antigravity re-exposes Claude/Gemini/GPT-OSS models, so its catalog
was hijacking bare short aliases (`sonnet` -> google-antigravity instead of
anthropic) in detect_static_provider_for_model via dict insertion order.
Add _BORROWED_MODEL_PROVIDERS and defer those providers to a last-resort
pass so a model's native vendor always wins alias/direct-catalog detection.
Fixes tests/hermes_cli/test_models.py::test_short_alias_resolves_to_static_model.
The salvaged PR wired auth.py / providers.py / runtime_provider.py for
google-antigravity but never registered a ProviderProfile, so the provider
was invisible to list_providers() / the model picker / alias resolution.
Register it in the gemini model-provider plugin (alongside gemini and
google-gemini-cli) with the antigravity-pa:// scheme and aliases. Also add
@pmos69 to release.py AUTHOR_MAP (CI gate).
Salvage follow-up on top of @pmos69's #29474. The PR resolved the
Antigravity OAuth client purely by discovering it from an installed `agy`
binary or HERMES_ANTIGRAVITY_CLIENT_ID/SECRET env vars, so users without
agy installed hit a hard 'client ID not available' error.
Antigravity's desktop OAuth client is a public, non-confidential installed-app
client (PKCE provides the security), baked into every copy of the Antigravity
CLI — same posture as the gemini-cli credentials Hermes already ships in
google_oauth.py. Bake it in as the final fallback (env -> discovery -> public
default) and add the public default Code Assist project as the discovery
fallback, matching the reference Antigravity flow. Now consumers can
authenticate directly without agy installed.
Pin the contract that ``_apply_env_overrides`` consults ``is_connected``
before the install-triggering ``check_fn``: an unconfigured platform is
skipped without calling ``check_fn`` (no lazy install), while a configured
platform still has ``check_fn`` run and is auto-enabled. The first assertion
fails on the pre-fix unconditional sweep.
For adapter plugins, ``PlatformEntry.check_fn`` doubles as a lazy installer:
calling it pip-installs the platform SDK as a side effect (see e.g.
``plugins/platforms/discord/adapter.py::check_discord_requirements``). The
enablement sweep in ``_apply_env_overrides`` called ``check_fn`` for every
registered plugin platform unconditionally, so a single
``load_gateway_config()`` — which the desktop/dashboard readiness probe
``GET /api/status`` awaits synchronously — pip-installed Discord, Telegram,
Slack, Feishu and Dingtalk even when the user configured none of them
(``platforms: none``). On a slow or restricted network the installs ran long
enough to block the event loop past the desktop's readiness timeouts, so the
app timed out, killed and re-spawned the backend, and boot-looped (stuck at
94%).
Consult the cheap ``is_connected`` credential check FIRST and only run the
install-triggering ``check_fn`` for platforms that are already enabled or
actually configured. Auto-enable-by-credentials is unchanged: a platform with
its token set still gets its SDK installed and enabled.
The pop-out position is a bottom-right corner inset; the old clamp only floored
it and capped each inset by a flat constant, so dragging left/up (or restoring a
position saved on a larger/other monitor) could push the box's width/height past
the left/top edges and strand it off-screen — unrecoverable since the bad spot
persisted to localStorage.
Now the clamp bounds the WHOLE box (accounting for its measured width/height plus
an edge margin) on all four sides. Applied on drag (measured size), on load
(clamped in readPosition), and via a mount + window-resize reclamp so a shrunk
window or stale persisted value always pulls the box back into view.
Follow-up to #50238/#50381. The restart-loop is now SAFE (marker + launch
gate), but the trigger that lured users into relaunching mid-update remained:
on the in-app update hand-off the desktop window vanished almost immediately
(app.quit() 600ms after spawning the detached updater), before the updater's
own window appeared — a blank-screen gap that looks like a crash.
- Linger on the update overlay for UPDATE_HANDOFF_DWELL_MS (2.5s, was 600ms)
before quitting, on BOTH hand-off paths (in-app update + Windows bootstrap
recovery), so the message lands and bridges to the updater window.
- Strengthen the restart-stage copy and the overlay's applyingBody/applyingClose
to explicitly tell the user the window will reopen automatically and NOT to
reopen Hermes themselves while it updates. All four locales (en/ja/zh/zh-hant)
updated in parity.
Pure UX; does not touch the #50381 marker/gate mutual-exclusion safety net.
The browser orphan reaper reads a daemon PID from a `.pid` file in a
world-writable, predictably-named temp dir (`/tmp/agent-browser-h_*`) it
does not write itself, then tree-kills that PID via `_terminate_host_pid`
after only a liveness check. A same-user actor could plant a fake socket
dir whose `.pid` points at an arbitrary victim process, and OS PID reuse
after the real daemon exits could land the recorded PID on an unrelated
process — either way an arbitrary same-user process (and its whole tree)
gets SIGTERMed. Local DoS.
Add `_verify_reapable_browser_daemon()`, gated before the kill: via psutil
(a hard dep, fine cross-platform for the same-user processes the reaper can
signal) require both (1) identity — `agent-browser` in the process
name/cmdline — and (2) binding — the live process references *this* session's
socket dir in its cmdline or `AGENT_BROWSER_SOCKET_DIR`. The binding check is
the real spoof defense: a planted/recycled PID won't embed our exact socket
path. Fail-closed on any ambiguity (unreadable cmdline, no match), leaving the
process and its socket dir untouched for a later sweep.
Builds on @sgaofen's fix in #14394 (cmdline identity check); rewritten to use
psutil instead of `/proc`+`ps` (cross-platform, Windows-covered) and to add
the session-socket-dir binding check for recycled-PID / spoof resistance.
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Follow-up to the salvaged #9560 fix:
- Replace the _TRAVERSAL_RE regex with an explicit _is_path_unsafe() helper
(drops the now-unused `import re`); catches a path separator ANYWHERE,
not just leading, so a non-leading Windows backslash can't slip through.
- Switch the per-entry skip in _ensure_loaded_locked from print() to
logger.warning to match the module's logging conventions.
- Add AUTHOR_MAP entry for the contributor.
- Add regression tests for the non-leading-separator case.
Extends the CWE-22 path traversal guard to cover Windows absolute paths
of the form C:/... and D:\... — previously only leading / and \ were
checked, which missed drive-letter prefixes. Replaces the inline
startswith check with a compiled module-level regex (_TRAVERSAL_RE) that
covers all three attack patterns: .., leading /\, and leading X: drives.
Adds two regression tests for C:/windows/system32 and D:\\path\\to\\file.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Addresses PR #9560 review comments: applies the CWE-22 fix to current main
(post-PR #458 rebase) and adds the requested regression tests.
- SessionEntry.from_dict now raises ValueError for session_key or session_id
containing '..' or starting with '/' or '\' (directory traversal guard)
- SessionStore._ensure_loaded moves per-entry validation inside the loop so
one malicious/corrupt entry is skipped with a warning instead of aborting
the entire sessions.json load
- Adds TestSessionEntryFromDictTraversalValidation (5 cases) and
TestEnsureLoadedSkipsInvalidEntries covering the skip-not-abort behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Secret redaction only matched `Authorization: Bearer <token>`. Other auth
headers passed through verbatim into logs, tool output, and transcripts:
- `Authorization: Basic <base64>` — leaks base64(user:password)
- `Authorization: token <pat>` / any non-Bearer scheme
- `Proxy-Authorization: ...`
- `x-api-key: <key>` (Anthropic and many providers) and `api-key`,
`x-goog-api-key`, `x-auth-token`, `x-access-token`, ... — opaque values with
no known vendor prefix were caught by nothing
A logged request or an echoed `curl -H "x-api-key: ..."` command therefore
leaked live credentials.
Generalize the Authorization rule to mask the credential for any scheme (and
Proxy-Authorization) while preserving the header name and scheme word for
debuggability, and add an api-key header rule for the single-opaque-value
headers. Bearer behavior is unchanged; plain prose containing the word
"authorization" (no colon-delimited value) is left untouched.
Adds regression tests for Basic/token/Proxy auth and the x-api-key/api-key
headers, including inside a curl command.
Follow-up to the salvaged #25961 fix: regression tests asserting that
scope-bearing IPv6 addresses (fe80::1%eth0, ::1%lo) are blocked by
is_safe_url after the scope is stripped, that a still-unparseable address
fails closed, and that a scoped IPv4-mapped IMDS address is caught by the
always-blocked floor.
ipaddress.ip_address() raises ValueError on IPv6 addresses with scope
IDs (e.g. 'fe80::1%eth0'). Both is_always_blocked_url() and is_safe_url()
silently skipped these via `except ValueError: continue`.
If ALL resolved addresses for a hostname carry scope IDs, every address
is skipped and the URL passes all safety checks — a potential SSRF
bypass vector against link-local or metadata endpoints.
Fix:
- Strip the scope ID (%eth0) before parsing in both functions
- is_safe_url(): fail closed (return False) with a warning log if still
unparseable after stripping
- is_always_blocked_url(): use continue (not return False) to preserve
multi-address scanning, with a warning log
Affected: tools/url_safety.py — is_always_blocked_url(), is_safe_url()
When a streamed Telegram reply finalizes, the stream consumer could take
the fresh-final path (send a new sendRichMessage + best-effort delete the
preview) purely because the time-based _should_send_fresh_final()
threshold elapsed — even though Telegram's prefers_fresh_final_streaming
returns False. The fresh Rich Message then overlapped the legacy
MarkdownV2 preview already on screen, leaving both visible (the #47048
table + bullet double-render).
Honor the adapter's decision: when prefers_fresh_final_streaming exists
on the adapter (checked on the class + instance __dict__ so MagicMock
auto-attrs don't false-positive) and declines, the time threshold no
longer overrides it. Adapters without the hook keep the time-based
fresh-final for backward compat.
Fixes#47048
Fold in the #40715 blank-env OOM fix on top of the host-resolution change:
- connect() now sets a non-retryable fatal error when required settings are
missing, so the gateway stops reconnecting against an empty host instead of
looping forever and leaking memory until the host OOM-kills.
- check_email_requirements() treats blank/whitespace-only EMAIL_* values as
missing, so an abandoned setup with empty keys no longer enables the platform.
Credits the parallel fixes by zerone0x (#40745) and liuhao1024 (#40829).
The email adapter read address/host purely from env vars and never stripped
them, so a missing or whitespace-padded EMAIL_IMAP_HOST reached
imaplib.IMAP4_SSL("") and surfaced as the misleading
"[Errno 8] nodename nor servname provided, or not known" — sending users down a
DNS rabbit hole when the real problem was an empty/dirty host string. A
config.yaml-only setup also left the host empty because __init__ ignored
PlatformConfig.extra, even though the "connected" check, the send helper, and
`hermes config show` already read address/imap_host/smtp_host from it.
Resolve address/imap_host/smtp_host from the env var first, then fall back to
config.extra, and strip surrounding whitespace — matching the send helper's
existing pattern. Validate the required settings at the start of connect() and
return False with an actionable message instead of attempting a connection with
an empty host.
Adds regression tests for whitespace stripping, config.extra fallback, and the
no-IMAP-attempt-on-missing-host path.
- Add thread-scoped regression test: interrupt on the waiting thread resolves
the approval as deny well under the 300s timeout; a foreign-thread interrupt
does NOT release the wait (interrupts are per-thread).
- Add panghuer023 to AUTHOR_MAP for the salvaged #37994 fix.
A dangerous-command gateway approval blocks the agent's execution thread
inside _await_gateway_decision() on threading.Event.wait() until the user
responds or the 5-minute approval timeout fires. The poll loop never checked
is_interrupted(), so /stop (which flags the agent's execution thread via
AIAgent.interrupt()) was silently ignored — the session stayed wedged until
timeout, even though /stop reported the session unlocked.
Check is_interrupted() at the top of the poll loop. The wait runs on the
agent's execution thread, the exact thread interrupt() flags, so the check
sees the signal and resolves the pending approval as deny — the agent loop
receives a normal denial and unwinds cleanly. Covers /stop, /new, and the
gateway inactivity-timeout interrupt through the single shared wait loop used
by both the terminal and execute_code guards.
A bare custom provider configured via `model.api_base` (the intuitive name
OpenAI-SDK / LiteLLM users reach for) was silently ignored: `hermes config set`
accepts any dotted key, so `model.api_base` got written and confirmed, but the
runtime resolver reads only `model.base_url`. Requests fell back to OpenRouter
with an empty key -> 401, zero hits to the custom endpoint (issue #8919).
Now api_base is migrated to base_url at load time (fixes existing broken
configs) and at set time (with a notice), never overriding an explicit
base_url. Closes#8919.
On Windows, _pause_windows_gateways_for_update() force-kills every running
gateway before mutating the venv. Gateways mapped to a profile (via
profile.path/gateway.pid) were respawned afterward, but gateways with NO
profile mapping — e.g. a Windows Scheduled Task running
"pythonw.exe -m hermes_cli.main gateway run" — were force-killed and only
told to restart manually. After an auto-update/bootstrap the Telegram bot
stayed dead until manual intervention.
Now we snapshot each unmapped gateway's argv (psutil, guarded by
looks_like_gateway_command_line) before the kill and replay it through the
same detached watcher used for profile gateways, so unmapped gateways come
back automatically too.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
When a /model switch resolves a valid model but the in-place agent swap
fails mid-conversation (expired key, unreachable base_url), the agent
rolls itself back to the old working model+client and re-raises. The
callers caught that re-raise, logged a warning, then committed the broken
switch anyway: wrote the failed model to the session DB, set
_session_model_overrides to the broken model/provider/key, and (gateway
direct path) evicted the working cached agent. The next message then
rebuilt a dead agent from the broken override -> permanently unusable
conversation (#50163).
Fix the whole caller class so a failed swap aborts the commit entirely:
- gateway/slash_commands.py (picker + direct /model paths): on swap
failure, early-return an error message; skip DB persist, session
override, cache eviction, and config write.
- cli.py (both /model handlers): snapshot CLI-level credential/runtime
fields before mutating, restore them on swap failure, and abort the
note + success print.
- tui_gateway/server.py: wrap the previously-unguarded swap; on failure
raise a clean error and skip worker restart, runtime persist, switch
marker, session model_override, and config persist.
The no-cached-agent path (apply-on-next-session) is unaffected.
Adds a gateway regression test that fails on the pre-fix behavior.
Per @egilewski's audit on this PR (#15544), the original fix was
correct but the file has refactored since: the four endpoint-local
empty-peer checks have been consolidated into _ws_client_is_allowed
and _ws_client_reason, but the helpers were left fail-open ('no peer
host known means allow' / 'no reason to block').
On a loopback-bound dashboard with auth disabled, an ASGI server
behind a misconfigured proxy or a unix-socket transport can deliver
ws.client == None or ws.client.host == ''. The helpers were treating
that as 'allowed', so the loopback-only peer gate could be bypassed
by anything that suppressed the client tuple in transit. All four
WebSocket endpoints (/api/pty, /api/ws, /api/pub, /api/events) route
through _ws_request_is_allowed -> _ws_client_is_allowed, so the gap
applied uniformly.
Fix:
* _ws_client_is_allowed: return False when client_host is empty
instead of True. Only reached on loopback bind with auth disabled
(auth_required=True and explicit non-loopback binds short-circuit
earlier), so the fail-closed behavior is scoped to the surface
that needs it.
* _ws_client_reason: return a 'missing_or_empty_peer bound=...'
block reason instead of None, so the dispatcher's existing
reason-based rejection path picks it up and the close gets logged
with a machine-parseable token for diagnosability.
Behavior unchanged for:
* gated mode (auth_required=True) — early-returns True before the
empty-peer check runs. The OAuth ticket is the auth at that point.
* explicit non-loopback bind (--host 0.0.0.0/::, or a specific LAN
address, always with --insecure) — early-returns True before the
empty-peer check runs. DNS-rebinding is still blocked by the
Host/Origin guard in _ws_host_origin_is_allowed.
* legitimate loopback peers (client_host == '127.0.0.1' / '::1') —
not affected by the empty-peer branch.
Regression tests added in tests/hermes_cli/test_dashboard_auth_ws_auth.py:
* test_empty_client_host_rejected_in_loopback_mode
* test_missing_client_object_rejected_in_loopback_mode
* test_empty_client_host_reason_is_block
Plus two regression guards to ensure the fix does not over-reach:
* test_empty_client_host_still_allowed_in_insecure_public_mode
* test_empty_client_host_still_allowed_in_gated_mode
All three new fail-closed tests fail without this patch (the helpers
return True / None for an empty peer) and pass with it. The 45
pre-existing tests in test_dashboard_auth_ws_auth.py continue to pass.
Baileys' jidDecode crashes ("Cannot destructure property 'user' of
jidDecode(...) as it is undefined") when handed a bare phone number, so
sending a WhatsApp message to +50766715226 / 50766715226 returned HTTP
500 and never delivered (#8637).
Add to_whatsapp_jid() to gateway/whatsapp_identity.py — the outbound
inverse of normalize_whatsapp_identifier: it builds the JID a send must
use (bare phone -> <digits>@s.whatsapp.net) and passes through already
qualified JIDs (@g.us, @lid, status@broadcast, @newsletter) unchanged.
Wire it at every outbound bridge call site in the WhatsApp adapter
(send, edit, media, typing, get_chat_info, and the standalone cron /
send_message sender).
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
When a Windows user relaunches Hermes while an in-app update is still
running (the desktop vanished with no progress and looks crashed), the
fresh instance spawns its own dashboard backend. That backend re-locks
the venv shim, the updater's straggler cleanup (force_kill_other_hermes
-> taskkill /F /T /IM hermes.exe) kills it, the launch dies with the 45s
"backend didn't come up" timeout, and the user relaunches into the same
trap -- an infinite respawn/kill loop (#50238).
Root cause: no mutual exclusion between an applying update and a fresh
desktop spawning its own local backend.
Fix: the updater publishes a HERMES_HOME/.hermes-update-in-progress
marker (pid + start time) for the whole run via an RAII drop-guard that
removes it on every exit path (success, early return, panic). A
freshly-launched desktop checks the marker before spawning its local
backend and PARKS until the update finishes -- then brings the backend
up itself (it is the surviving instance; the updater's own relaunch hits
the single-instance lock and quits). A stale marker (dead pid or past a
20-minute ceiling) is pruned so a crashed updater can never strand
future launches. No rogue backend spawns mid-update, so
force_kill_other_hermes has nothing legitimate to kill.
Marker parse/staleness logic is extracted to update-marker.cjs and
unit-tested; the Rust guard has unit tests; the Rust-write <-> JS-read
contract is E2E-verified.
In Telegram streaming, the typing indicator persisted through the slow
final rich-text/MarkdownV2 finalize edit, so the '...typing' bubble
lingered for seconds after the last streamed token. Add a one-shot
on_before_finalize hook to GatewayStreamConsumer, fired once when the
stream transitions into its finalization path, and wire it on both
Telegram streaming call sites to call pause_typing_for_chat() before
the final edit. Cover hook ordering and once-only behavior in tests.
Fixes#49712
Root cause of #49145: the Windows ZIP-update path did rmtree(dst) then
copytree(src, dst). If the copy failed partway — common on that path,
which only runs because file I/O is already flaky on the machine — the
directory was left deleted with nothing copied back. ui-tui/ vanishing
is what broke 'hermes --tui' (WinError 267), but the bug hit every
top-level directory.
_atomic_replace_dir stages the new copy into a sibling temp dir and only
swaps it in on full success, restoring the original on failure. A failed
update now leaves the live tree untouched instead of half-deleted.
The Windows update path can leave tracked ui-tui/ files deleted in the
working tree (HEAD intact). The guard now self-heals: when ui-tui/ is
missing in a git checkout, run `git restore -- ui-tui` and continue,
falling back to the printed manual-recovery steps only when git can't
recover it (no checkout / restore failed).
Builds on konsisumer's missing-workspace guard.
Answers a recurring plugin-author question: how to read the active
profile and drive Hermes from inside a hook callback when ctx._cli_ref
is None (gateway, hermes chat -q, and kanban-spawned worker sessions).
- Adds a 'Act from inside a hook' section to the plugin guide covering
ctx.profile_name and ctx.dispatch_tool as the session-agnostic APIs,
with a kanban_task_blocked example, and notes there is no in-process
slash-command bridge for headless workers (shell out via the terminal
tool instead).
- Adds the three kanban lifecycle hooks to the hook reference table with
their process semantics.
- Pins the contract with a regression test: ctx.dispatch_tool invokes a
tool handler with _cli_ref=None (worker/hook context).
Requested by @Smithangshu on Discord.
Follow-up on salvaged #50347: the event surface table was missing the
billing.step_up.verification switch case, and the File map omitted
lib/perfPane.tsx.
After a worker crash + reclaim + respawn, the board could show a task in the
Ready lane while its task_run was 'running' and the new worker was actively
executing (#36910). The dispatcher could then treat live work as available and
double-assign.
Root cause: the three reclaim paths (detect_crashed_workers,
release_stale_claims heartbeat-stale backstop, enforce_max_runtime) each
snapshot a task's worker_pid/claim_lock, do liveness work, then reset
tasks.status back to 'ready' with only a 'WHERE status=running' guard. If the
task was reclaimed AND re-claimed by a NEW worker in between (new run, new
claim_lock, live pid), the stale UPDATE clobbered the live task: status flipped
to 'ready' while the fresh run stayed 'running'. claim_task is the only writer
that sets status='running', so nothing put it back — permanent desync.
Fix: gate each reset on the snapshot's claim_lock (and worker_pid where
available) so it only fires when the task is still owned by the worker the
reclaim was computed for. A stale reclaim now no-ops (rowcount 0) instead of
desyncing a re-claimed task. Genuine crashes (lock still matches) reclaim
exactly as before.
This is the same race class the in-gateway dispatch lock (single-writer ticks)
mitigates, closed at the row level so a single dispatcher's fast
reclaim->respawn across two ticks is also safe.
Closes#36910.
Per @egilewski's audit on this PR, the security fix is behaviorally
correct but lacks focused regression coverage for the two traversal
vectors it closes. Adding tests now so the path-traversal guard
cannot silently regress.
* test_restore_rejects_snapshot_id_traversal -- exercises the
snapshot_id input guard with seven hostile values (parent
traversal, single parent, bare '.', bare '..', forward slash,
backslash, empty string). Each must return False without touching
the filesystem.
* test_restore_rejects_manifest_rel_traversal -- exercises the
manifest rel guard by injecting '../../outside.txt' into a real
snapshot's manifest.json, seeding a source payload at the escaped
path, and asserting the destination outside HERMES_HOME does not
exist after restore. This is the higher-value test of the pair --
verified locally that it fails without the fix in
restore_quick_snapshot (the escape destination gets written) and
passes with the fix in place.
The 67 pre-existing tests in test_backup.py continue to pass.
The gateway dispatcher captured kanban.auto_decompose ONCE at boot, so a user
who flipped it to false to STOP auto-decompose had no way to make that take
effect short of restarting the gateway. Reported (#49638): auto-decompose
created and launched tasks the user never intended (while they were still
typing the task description), and 'even Hermes Agent couldn't disable this
feature' — because the live config edit was silently ignored.
Auto-decompose is a safety toggle; turning it off must halt fan-out on the
next tick. The dispatcher now re-reads the flag (and auto_decompose_per_tick)
from config every tick via the extracted _resolve_auto_decompose_settings(),
which fails SAFE (disabled) on a config read error so a transient failure can
never re-enable a feature the user turned off.
Closes#49638.
connect() wrapped its entire body in an unbounded blocking flock(LOCK_EX) on
every call (_cross_process_init_lock). A single process stalled inside the
critical section — or a stale lock held by a wedged worker — blocked every
other connect(), including the long-lived gateway dispatcher's next-tick
connect, forever. No timeout, no traceback, no recovery: the board silently
stopped being worked until a manual restart (issue #36644).
Two fixes:
1. Fast-path skip: once THIS process has initialized a path, the expensive
first-open work (header validation, integrity probe, schema + additive
migrations) is already cached in _INITIALIZED_PATHS. The steady-state
connect has nothing for the cross-process lock to protect, so it now opens
the connection (WAL + pragmas) under only the cheap in-process _INIT_LOCK
and never touches the file lock. This removes the lock from the dispatcher's
hot path entirely — a stalled external 'hermes kanban list' can no longer
block ticks.
2. Bounded acquire: even on first-init, _cross_process_init_lock now retries a
non-blocking acquire up to a 10s deadline, then logs a WARNING and proceeds
WITHOUT the cross-process lock. Safe because the in-process _INIT_LOCK still
serializes same-process threads and the init work is idempotent
(CREATE TABLE IF NOT EXISTS + additive migrations) — worst case is redundant
work, not corruption. A bounded 'proceed anyway' beats an unbounded hang.
Windows path switched LK_LOCK -> LK_NBLCK (non-blocking) to match.
Closes#36644.
_default_spawn launched the worker subprocess with cwd=workspace and set
HERMES_KANBAN_WORKSPACE, but never set TERMINAL_CWD — so the worker inherited
the dispatching gateway's TERMINAL_CWD. That value takes precedence over the
process cwd in two places:
- tools/file_tools.py::_resolve_base_dir — a relative write_file path resolved
against the gateway user's home instead of the workspace, so artifacts
silently landed outside the workspace (#41312).
- agent_init's context-file loader — AGENTS.md was discovered relative to the
gateway's cwd, so under multi-profile dispatch a worker loaded whichever
gateway won the claim race's AGENTS.md, not the task's (#34619).
Both are the same root cause. Pinning TERMINAL_CWD to the workspace (where the
task's work actually happens) fixes both. Guarded on an existing absolute dir
because file_tools rejects relative/sentinel TERMINAL_CWD values — a non-dir
workspace leaves the inherited value rather than writing a meaningless one.
Closes#34619, closes#41312.
hermes -w created the worktree branch from the standalone clone's HEAD, which
lags origin when the clone isn't freshly updated (it's only refreshed by
hermes update, not per session). Every worktree branch then rooted on a stale
base, so the PR diff GitHub computes against current main ballooned with
unrelated changes and the agent had to discover the staleness at push time and
rebase.
_resolve_worktree_base() now fetches and branches from the freshest available
ref: the current branch's upstream if it tracks one (so a deliberate
feature-branch worktree tracks its own remote), else the remote's default
branch (origin/HEAD), else local HEAD as a fail-soft fallback (offline / no
remote / detached). A bogus 'origin/(unknown)' default is guarded, and worktree
creation retries from HEAD if branching off the remote ref fails — so this is
never worse than the old behavior.
Gated by worktree_sync (default true); set worktree_sync: false to keep the
old branch-from-local-HEAD behavior. The resolved base is printed in the
session banner.
This is the follow-up to the #50319 session, where the standalone clone was
213 commits behind origin and the worktree inherited that stale base.
Plugins could observe session/tool/approval lifecycle but had no way to
observe kanban task transitions. Adds three observer hooks fired by the
board's claim/complete/block transitions:
- kanban_task_claimed (dispatcher process, before worker spawn)
- kanban_task_completed (worker process, carries summary)
- kanban_task_blocked (worker process, carries reason)
Each fires AFTER the DB write txn commits, so a plugin observes durable
state and a slow/hanging callback can never hold the SQLite write lock.
All firing is best-effort: a raising hook is logged and swallowed and
never breaks a board transition. profile_name is resolved from
HERMES_HOME so dispatcher- and worker-side hooks carry the right profile.
Requested by @Smithangshu on Discord.
Plugins previously had no way to read the active profile name from the
PluginContext. The workaround in the wild — reaching into
ctx._manager._cli_ref — only works in an interactive CLI session;
_cli_ref is None in the gateway and in kanban-spawned worker sessions
(hermes -p <profile> chat -q ...), so the workaround breaks exactly
where multi-profile awareness matters most.
ctx.profile_name wraps hermes_cli.profiles.get_active_profile_name(),
which derives the name from HERMES_HOME and therefore works in every
execution context with zero dependency on _cli_ref.
After the agent's final response, the '...typing' bubble persisted ~5s.
send() re-triggers send_typing() after every delivery so the bubble
survives intermediate progress messages (Telegram clears typing on each
delivered message). But that re-trigger also fired on the FINAL send,
re-arming Telegram's ~5s timer AFTER the gateway had already torn down
its typing-refresh loop — and Telegram exposes no stop-typing API, so
nothing cancelled it.
Gate the post-send re-trigger on the absence of metadata['notify'] (set
only on the final user-visible reply via _mark_notify_metadata). Both
the rich-message and legacy send paths are covered; intermediate
progress sends still re-trigger so the bubble stays alive mid-response.
Fixes#48678
Add a platform-neutral send-failure vocabulary so consumers can branch on a
typed category instead of substring-matching the raw provider message.
- base.py: SEND_ERROR_KINDS + classify_send_error() (too_long / bad_format /
forbidden / not_found / rate_limited / transient / unknown), and an optional
SendResult.error_kind field (defaults None — fully backward compatible).
- telegram.py: populate error_kind on send() failures; message_too_long keeps
its existing error token plus error_kind='too_long'.
Purely additive: no behavioral change to the existing degrade-and-deliver
paths (MarkdownV2->plain-text fallback, overflow split, retry classification
all untouched). 22 new tests + 210 adapter regression tests green.
The port-announcement clock in waitForDashboardPort starts the instant the
backend process is spawned — before uvicorn binds its socket. On a cold
install the child first compiles and imports the whole hermes_cli.main ->
web_server -> FastAPI/uvicorn chain, and on Windows real-time AV scans every
freshly written .pyc. That pre-bind cost can exceed the old hardcoded 45s
deadline, so the desktop killed a healthy-but-still-starting backend and
respawned it, piling up orphaned processes (#50209).
Raise the default to 90s and make it overridable via
HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS, clamped to a 45s floor so a bad
override can't reintroduce the loop. Warm starts still announce in well under
a second; both call sites inherit the new default with no change. Adds
backend-ready.test.cjs (wired into test:desktop:platforms).
Three tests covering the scenarios from issue #50209 that could not be
validated with real Defender on a fresh install:
1. test_lifespan_warmup_is_nonblocking
Patches _warm_gateway_module to sleep 3 s. Measures TestClient startup
time — must complete in < 1.5 s, proving the fire-and-forget
run_in_executor does not block the event loop before port binding
(HERMES_DASHBOARD_READY timing proxy).
2. test_get_status_does_not_block_event_loop
Patches _resolve_restart_drain_timeout to sleep 3 s. Fires concurrent
GET /api/status and GET /api/version requests. /api/version must
respond in < 3 s while /api/status waits — proving the event loop
stays free during the slow import (15 s socket timeout would not fire).
3. test_concurrent_status_probes_all_respond
Three simultaneous /api/status probes with the slow patch — all must
return HTTP 200 (no connection resets, no orphan accumulation).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Fixes a regression introduced by the prior approach (synchronous import
hermes_cli.gateway inside _lifespan) that caused a new failure mode:
the blocking import stalled the asyncio event loop before uvicorn could
bind its port, pushing HERMES_DASHBOARD_READY past the desktop shell's
45 s announcement deadline and triggering a respawn loop that accumulated
orphaned backend processes.
Two-part fix:
_lifespan: replace the blocking import with a fire-and-forget
run_in_executor call (_warm_gateway_module). The import runs in a
worker thread while the server socket is already open, so
HERMES_DASHBOARD_READY fires without delay.
get_status: replace the inline lazy import with
await run_in_executor(None, _resolve_restart_drain_timeout). This is
the root fix for the original 15 s socket-timeout: the blocking
.pyc-compilation + Defender scan is offloaded to a thread, keeping the
event loop free for every /api/status probe. After the first call the
module is in sys.modules and the executor returns in microseconds.
Both helpers are extracted as module-level sync functions so they can
be unit-tested independently of FastAPI or uvicorn.
Closes#50209
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
On resource-contended hosts the embedded Hindsight daemon can exceed a
single 2s /health check; upstream then waits a grace window before
treating it as stale and killing+restarting it (hindsight-embed reads
HINDSIGHT_EMBED_PORT_HEALTH_GRACE_TIMEOUT, default 30s, into a
module-level constant at import time). Users on busy boxes had no
Hermes-side way to raise it short of hand-setting an env var.
Add a 'port_health_grace_timeout' config.json option to the Hindsight
plugin. When set, initialize() exports it to the process env BEFORE
daemon_embed_manager is imported (the import-time read is the contract).
setdefault() so an explicit operator env override always wins. Exposed
in 'hermes memory setup' for local_embedded mode.
Follow-up to #50308 / issue #13125 comment thread.
The read_file device guard now walks symlink hops before the file operation
layer, but that hop walk still interpreted relative paths against the Python
process cwd. In sessions where TERMINAL_CWD points at the task workspace, a
relative workspace symlink to a blocked alias such as /dev/../dev/stdin could
therefore miss the intermediate device target before later task-cwd resolution.
Anchor relative device checks to the task base before symlink-hop inspection so
the pre-I/O guard sees the same workspace path that read_file would otherwise
read. Absolute device paths and the existing final realpath fallback remain
unchanged.
Refs #10141
Refs #29158
Follow-up for salvaged PR #50256. Unit tests for the three behaviors:
retryable classification of Envoy/sidecar overflow strings, per-chat typing
cooldown with stop_typing reset, and the _supervise_sidecar crash-detection
path that raises a retryable fatal (and the clean-shutdown no-op).
When the Node spectrum-ts sidecar process exited mid-session (crash,
OOM, upstream overflow escalation), _supervise_sidecar returned
silently — readline hit EOF, the log-pump loop broke, and nothing
notified the gateway. _inbound_loop entered an infinite retry loop
against a dead port, _running stayed True, and the adapter remained
in self.adapters with no path to self-recovery short of a manual
gateway restart.
Add a death-detection tail to _supervise_sidecar: after the log-pump
exits (EOF or exception), guard on _inbound_running to distinguish
unexpected death from a deliberate disconnect(). On unexpected exit,
call _set_fatal_error("SIDECAR_CRASHED", retryable=True) followed by
_notify_fatal_error() so the reconnect watcher picks up the platform
within 30 s and retries with exponential backoff (30 s → 300 s cap)
until the sidecar comes back up. All other platforms remain unaffected.
The _inbound_running guard is safe against races: disconnect() sets
_inbound_running = False before _stop_sidecar() cancels the supervisor
task. CancelledError is BaseException, not Exception, so it bypasses
the except clause and propagates normally — the detection block never
runs during a clean shutdown.
Closes#50185
Two independent gaps let a transient Photon/Spectrum upstream overflow
degrade message delivery and amplify gRPC pressure:
1. _is_retryable_error did not recognise Photon- or Envoy-specific error
strings ("internal sidecar error", "upstream connect error",
"reset reason: overflow"), so _send_with_retry fell through to the
plain-text fallback immediately instead of backing off and retrying.
2. send_typing had no rate gate, so a burst of typing-indicator calls
during an overflow event kept hitting the upstream gRPC connection and
widened the failure window.
Fix:
- Add _PHOTON_RETRYABLE_PATTERNS with the three high-specificity Envoy /
sidecar substrings and override _is_retryable_error on PhotonAdapter to
check them after delegating to the base-class patterns. base.py and all
other adapters are untouched.
- Add a 5 s per-chat cooldown in send_typing backed by _typing_last_sent.
stop_typing clears the entry so the next start after a completed turn
fires immediately — only rapid consecutive starts without a stop are
suppressed.
- Reduce PhotonAdapter._send_with_retry default max_retries from 2 to 1
(single 2 s back-off check) — enough to confirm whether the Envoy
circuit-breaker has opened, without adding unnecessary latency.
All changes are scoped to plugins/platforms/photon/adapter.py.
* fix(api-server): stop silently promising async delivery on stateless HTTP path
terminal(notify_on_complete=True / watch_patterns) and delegate_task(background=True)
silently no-op'd on the API server / WebUI path (#10760): the watcher / detached
child registered, but every API-server route (OpenAI-spec /v1/chat/completions
and /v1/responses, plus the proprietary /v1/runs SSE stream) tears down its
channel when the turn ends, and APIServerAdapter.send() is a no-op stub. A
completion that fires after the response closed had nowhere to go — from the
agent side, indistinguishable from a hang.
There is no spec-compliant surface to wake the agent later on a stateless HTTP
client, so make the no-op honest instead of silent:
- Add a per-adapter capability flag supports_async_delivery (default True;
APIServerAdapter = False), propagated into a HERMES_SESSION_ASYNC_DELIVERY
contextvar via async_delivery_supported(). Toggle on the adapter, not a
hardcoded platform string — a future stateless adapter is correct-by-default.
- terminal: when delivery is unsupported, skip watcher registration, force
notify_on_complete off, and return a notify_unsupported note telling the
agent to process(action='poll').
- delegate_task: when delivery is unsupported, fall back to SYNCHRONOUS
execution (work runs and returns in the same response) with a note, instead
of handing out a handle that never resolves.
CLI (in-process completion_queue) and the real gateway platforms are unchanged.
Fixes#10760
* refactor(api-server): route session binding through a single no-delivery chokepoint
Add APIServerAdapter._bind_api_server_session() and route both agent-entry
paths (_run_agent for /v1/chat/completions + /v1/responses, and the /v1/runs
_run_sync path) through it. The helper hardwires platform="api_server" and
async_delivery=False with no async_delivery parameter to pass, so a future
route added to the API server physically cannot reintroduce the silent
no-op (#10760) by forgetting to mark the channel as non-delivering.
The binding stays request-scoped (cleared per turn), so a session resumed
later on a delivering interface (CLI / gateway platform) re-binds fresh and
is NOT blocked — the no-delivery decision tracks the interface handling the
current turn, never the session.
_collect_delegate_child_ids() walks the _delegate_from marker chain to
gather delegate subagents for cascade deletion, but started its visited
set empty. When the chain loops back onto a parent — a delegation cycle,
or a parent that is also another parent's delegate child when several ids
are deleted together — that parent was collected as one of its own
descendants and then permanently deleted, along with all of its messages,
by _delete_delegate_children().
Seed the visited set with the parent ids so they can never be re-collected,
and exclude them from the returned child set. Callers (delete_session,
bulk delete) remove the parents separately, so this only prevents the
unintended parent deletion; legitimate child collection is unchanged.
Add regression tests (in-memory sqlite) covering single/multi-level
delegate chains, the parent_session_id+marker branch, untagged children
(orphan-don't-delete contract), and the cycle case that previously leaked
the parent into the deletion set.
Fixes#49148
A shell-launched 'hermes gateway run --replace' / 'gateway restart' on a
systemd/launchd host can leave an orphan gateway whose kanban dispatcher
escapes the service cgroup, survives 'systemctl restart', and becomes a
second long-lived writer on the shared kanban.db. Two dispatchers that each
believe they own the file both pass SQLite busy_timeout and then race on WAL
frames — the documented root cause of multi-writer corruption (issue #35240).
The existing _guard_supervised_gateway_conflict startup guard blocks the
common way an orphan is born, but does nothing once a second dispatcher
already exists. This adds the defense-in-depth: dispatch_once now wraps every
tick in a non-blocking, board-scoped flock (_dispatch_tick_lock). A losing
dispatcher returns DispatchResult(skipped_locked=True) and does zero DB writes
this tick — so two dispatchers can never run a reclaim/spawn/write sequence
concurrently regardless of how the second one got there.
- Non-blocking (LOCK_NB): never stalls the gateway's async watcher.
- Board-scoped: lock file is a .dispatch.lock sibling of each board's
kanban.db, so unrelated boards tick in parallel.
- POSIX + Windows (fcntl / msvcrt LK_NBLCK), no-op degrade where neither
exists — mirrors the existing _cross_process_init_lock pattern.
Verified with a real two-process orphan repro: while a separate process holds
the lock, dispatch_once skips; after release it runs.
hermes backup only walks HERMES_HOME, so memory providers that keep
config/credentials in home-anchored dotdirs (honcho -> ~/.honcho,
hindsight -> ~/.hindsight, openviking -> ~/.openviking) lost that data
across a backup/import cycle — the peer IDs, session pairings, and API
keys never made it into the archive.
Add an optional MemoryProvider.backup_paths() hook (default []). The
active provider declares its external paths; backup resolves them from
config only (no init, no network), archives the ones under the home dir
into a reserved _external/ subtree encoded relative to home, and import
restores them to their original location with a home-anchored traversal
guard and 0600 on credential-shaped files. Paths outside home are
skipped as non-portable.
honcho, hindsight, and openviking override the hook. E2E-validated full
backup->import cycle plus 7 new tests.
Rich messages are not ready for primetime: current Telegram clients can
render Bot API 10.1 rich messages as blank/unsupported bubbles and make
them hard to copy as plain text, which is worse than the legacy
MarkdownV2 path for command snippets and mobile handoffs. Default the
rich_messages toggle to False so replies stay on the copyable legacy
path; users opt in per bot via platforms.telegram.extra.rich_messages:
true. Updates adapter, gateway config default, example config, English +
zh-Hans docs, and the default/opt-in tests.
A turn forcibly interrupted by the drain-timeout escalation never reaches
turn_finalizer.finalize_turn (the only place that flushes the turn to
state.db). Its in-flight tool rounds live only in the in-memory
_session_messages, so the immediate pre-restart turn was silently dropped
from load_transcript() on resume.
_finalize_shutdown_agents now flushes _session_messages to the SQLite
session store before teardown. The flush is idempotent (identity-tracked
in _flush_messages_to_session_db), so agents that finished gracefully
re-flush nothing. The resume_pending / fresh-tool-tail branches in
_handle_message_with_agent already expect a transcript whose tail may be a
pending tool result.
Fixes#13121.
Inbound image/audio/video payloads were buffered fully into process memory
before being written to the cache, with no size limit. A large upload
(Discord Nitro allows 500 MB) or a remote media URL in an inbound message
pointing at a huge file could spike RAM and OOM-kill the gateway.
Enforce a configurable cap in the shared cache helpers (gateway/platforms/
base.py) so the protection holds across every platform adapter, not one:
- cache_image/audio/video_from_bytes reject oversized payloads before writing
(video was the gap in the original report — now covered).
- cache_image/audio_from_url stream the body, rejecting on an oversized
Content-Length header and re-checking the running total per chunk so an
absent/lying header can't smuggle an unbounded body past the cap.
- Discord's _read_attachment_bytes checks att.size up front, so an oversized
attachment is rejected before any bytes are pulled into memory.
Configurable via gateway.max_inbound_media_bytes in config.yaml (default
128 MiB; 0 disables). No new env var — non-secret config lives in config.yaml.
Salvaged and extended from @sgaofen's PR #13341 (the original report and the
shared-helper approach). Reapplied onto current main (Discord adapter has
since moved to plugins/platforms/discord/), the configurable knob moved from
an env var to config.yaml, and the video cache helper added.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
The salvaged #19820 unifies the write_file guard under
_is_internal_file_tool_content with the message 'internal read_file
display text'. Two tests added to test_file_read_guards.py after the PR
branch point still asserted the old 'status text' wording. Update them
to match the new (correct, more general) message.
hermes config show printed the model dict raw via print(), bypassing the
logging redactor; a custom-provider api_key (e.g. Cloudflare cfut_...) was
shown in plaintext even with security.redact_secrets=true. Opaque tokens
don't match any vendor-prefix regex, so structural key-name masking is
required.
- Add redact_config_value(): recursively masks credential-shaped keys
(api_key/token/secret/... exact-match) via mask_secret.
- Wrap the show_config model dump in it.
- Mask the set_config_value echo when the leaf key is credential-shaped
(config set model.api_key routes to config.yaml, lowercase misses the
.env allowlist).
Bedrock Claude routes through the AnthropicBedrock SDK and injects
cache_control, so cached tokens are always reported — but the pricing
table had no cache cost fields for any Bedrock model, so /usage showed
"cost unknown" on every cached session. Also, cross-region inference
profiles (us./global./eu. prefixes) never matched the bare pricing keys.
- Add cache_read/cache_write rates to the four Bedrock Claude rows
(read 0.1x input, write 1.25x input per the Bedrock pricing page).
- Normalize the cross-region prefix in the Bedrock pricing lookup,
mirroring is_anthropic_bedrock_model's prefix list.
Closes#50295.
PostgreSQL's initdb refuses to run as root, so the embedded Hindsight
daemon could never initialize its data directory under root. The
daemon-start thread would fail, retry, and loop forever — each cycle
reloading embedding models (~958MB RAM, ~33% CPU) with no user-visible
error, leaving Hermes sluggish on a common VPS/cloud root setup.
initialize() now detects root (os.geteuid() == 0) before spawning the
daemon thread, disables local_embedded mode, and surfaces a clear
warning to both the log and the terminal so the user knows to run as a
non-root user or switch to cloud / local_external mode.
Closes#13125.
Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
The TUI /compress slash side-effect compressed the session, synced the
key, and emitted session.info — but returned an empty string, so the
user saw no 'Compressed: N → M messages / ~X → ~Y tokens' feedback. The
CLI (_manual_compress) and gateway (slash_commands) paths both already
call summarize_manual_compression; the TUI slash path was the lone gap.
Snapshot history + rough token estimate before and after compaction and
return the formatted summarize_manual_compression() feedback, mirroring
the session.compress RPC handler. The estimate uses the same
estimate_request_tokens_rough(system_prompt, tools) inputs as the RPC
path, re-reading the system prompt after compaction (it may be rebuilt).
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
The background-review fork (fires ~every 10 turns) pins
review_agent.session_id = agent.session_id — the parent's LIVE id — for
prefix-cache parity, then calls close(). With session finalization now in
close(), that would end the still-active parent session mid-conversation.
Set _end_session_on_close = False on the fork so the real owner (CLI close /
gateway reset / cron) finalizes the session instead.
Follow-up to the #12029 fix.
Funnel session finalization through AIAgent.close() — the single terminal
path every agent (CLI, gateway, subagent, cron) funnels through — so finished
agents stop leaving rows with ended_at IS NULL. The biggest leak source was
delegate_task subagent + background-review forks whose close() never ended
their row.
end_session() is first-reason-wins and no-ops on an already-ended row, so a
'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal
path is never clobbered. /resume already calls reopen_session(), so
finalizing-on-close does not break resumability.
Temporary helper agents that rotate/share the session forward (manual
compression, gateway session-hygiene) opt out via _end_session_on_close=False.
Also stop the long-running gateway heartbeat once the executor is done or the
session slot is rebound to a different agent, preventing a stale
'running: delegate_task' bubble from outliving its run.
Closes#12029.
The 'Session compressed N times — accuracy may degrade' warning went
through _vprint (CLI stdout only), so the Ink TUI / Telegram / Discord
never saw it — unlike the two other compression warnings in the same
module, which route through _emit_status (and store _compression_warning
for late-bound gateway status_callback replay).
Set agent._compression_warning + call agent._emit_status() for this
warning too, matching the sibling pattern. _emit_status still _vprints
for the CLI, so CLI output is unchanged; TUI / gateway surfaces now
receive it via status_callback (and replay_compression_warning can
re-deliver it once a late-bound gateway callback is wired).
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
When an OpenAI-compatible proxy (e.g. cmkey.cn, one-api Anthropic channels)
returns a well-formed HTTP 200 whose summary content is null or empty/
whitespace-only, _generate_summary coerced it to "" and stored a prefix-only
summary — silently replacing the compacted turns with nothing. The model then
lost all in-progress context after compression (#11978, #11914).
_validate_llm_response already guards None / empty-choices, so those never
reach the compressor; the gap was a well-formed response with empty *content*.
Now treat empty content as a summary failure: raise so it routes through the
existing main-model fallback then transient cooldown, dropping the turns
without a summary rather than wiping context with an empty one.
Also narrow the bare 'except RuntimeError' so only genuine 'No LLM provider
configured' errors take the 600s no-provider cooldown; empty/invalid-response
RuntimeErrors from a configured provider now correctly get the main-model
fallback instead of being misrouted into the long no-provider cooldown.
Reported by @Hung2124; area identified by @annguyenNous in #39590.
Follow-up to @de1tydev's poll-read-only fix. Removing the
_completion_consumed.add() from poll() fixes the gateway/tui watcher
suppression (#10156) but reintroduces the CLI duplicate that #8228 fixed:
a notify_on_complete process always enqueues a completion event, and the
CLI idle/post-turn drain would re-inject it as a [SYSTEM: ...] message
even though the agent already saw the exit inline in its poll result.
Add a separate _poll_observed set that poll() populates on an observed
exit. drain_notifications() (CLI only) skips poll-observed sessions; the
gateway/tui watchers keep checking only is_completion_consumed, so a
read-only poll never suppresses their autonomous delivery turn.
- _poll_observed pruned alongside _completion_consumed in _prune_if_needed
- 4 tests: CLI drain dedup after poll, gateway gate untouched, running
poll doesn't mark observed, wait/log still skip CLI drain
Security-hardening fix for the read_file device guard, not a new sandbox
boundary. The guard already rejects direct device paths and upstream now
has a resolved-path pass for workspace symlinks to blocked devices, but
its concrete-path helper still compared the expanded path before
normalization. That leaves residual alias cases where the dangerous path
is visible before final terminal-specific resolution, for example:
1. /dev/../dev/zero and /dev/./urandom should match the blocked-device
list as concrete paths, not only after final realpath;
2. /dev/stdin-style aliases can disappear once realpath follows them
to /proc/self/fd/0 and then to a tty path;
3. a user symlink to /dev/../dev/stdin exposes the dangerous
intermediate target before final resolution, but not necessarily
after it.
Normalize expanded paths before matching and inspect each symlink hop
before falling back to realpath. This preserves the existing /proc fd and
/proc pseudo-file guards while enforcing the intended security invariant:
model-supplied read paths must not reach blocking or infinite device
streams through spelling, normalization, or symlink-hop tricks.
Classification: security hardening / residual bypass fix for the
read_file device blocklist. This is defensive code at the file-tool
boundary, but it fixes a concrete denial-of-service class tracked as
security in #10141 and #29158.
Tests:
- normalized /dev/../dev/zero and /dev/./urandom aliases
- symlink to /dev/../dev/stdin blocked before realpath
- existing symlink-to-device and regular-symlink guards still pass
Fixes#10141Fixes#29158
Telegram Mac/Desktop Bot API 10.1 rich-message rendering leaves garbled
overlapping draft/overlay glyphs for CJK text (#47653), affecting every
message containing CJK characters. The legacy MarkdownV2 path renders the
same text cleanly, so skip the rich send / draft / final-edit paths up
front for content containing CJK (incl. astral-plane extensions) until
affected clients age out. Non-CJK rich rendering is preserved.
Fixes#47653
When the main provider is the Codex app-server runtime (api_mode
codex_app_server), the gateway showed no verbose 'running X' tool-progress
breadcrumbs on Telegram while every other provider did. The app-server
session processes item/started notifications (command execution, file
changes, MCP/dynamic tool calls) but never surfaced them as Hermes
tool-progress events — the session was constructed without an on_event
hook, so the agent's tool_progress_callback was never invoked on this
route.
Add _codex_note_to_tool_progress() mapping item/started → (tool_name,
preview, args) for commandExecution / fileChange / mcpToolCall /
dynamicToolCall, and wire an on_event hook into CodexAppServerSession that
forwards mapped events to agent.tool_progress_callback('tool.started',
...) — the same signature the chat_completions path uses (tool_executor.py).
Non-tool items (agentMessage/reasoning) and non-item/started methods map
to None and are ignored.
Co-authored-by: jplew <462836+jplew@users.noreply.github.com>
load_pool() is meant to be a read, but it persistently pruned env-seeded
pool entries whenever the calling process's os.environ lacked the seeding
var. A process without MINIMAX_API_KEY would delete the persisted
env:MINIMAX_API_KEY entry from auth.json for every other process, causing
auth.json to oscillate and auxiliary auto-detect to fall through to the
wrong provider.
env:* entries are persisted references re-hydrated from the environment on
each load — a missing var means "cannot re-seed right now", not "source is
gone forever". _prune_stale_seeded_entries now gates env-source removal
behind prune_env_sources (default True for explicit cleanup paths);
load_pool() passes prune_env_sources=False. File-backed singletons
(device-code OAuth, hermes_pkce) still prune when their backing file is
gone, and explicit removal via `hermes auth remove` (source suppression)
is unaffected.
Fixes#9331.
Co-authored-by: houko <suzukaze.haduki@gmail.com>
The newline normalization is the shared chokepoint for every rich send
(sendRichMessage, draft, and editMessageText). Injecting a Markdown hard
break (two trailing spaces) into a GFM table row separator corrupts the
natively-rendered table — the rich path's headline feature. Protect both
fenced code blocks AND pipe-table blocks as bare regions; only prose
between them gets hard breaks. Verified RICH_CONTENT and the existing
rich-table tests stay byte-identical.
Bot API 10.1 sendRichMessage treats a lone newline as a soft break, so
multi-line content joined with "\n".join(lines) — slash-command lists,
etc. — collapses into a single paragraph. Normalize single newlines to
Markdown hard breaks (two trailing spaces) in _rich_message_payload,
leaving paragraph breaks and fenced code blocks untouched.
Fixes#46070
The gateway pre-compression hygiene valve force-compressed any session
crossing 400 messages regardless of token usage. On large-context (1M+)
models doing many short, message-dense turns, a healthy session at ~16%
token usage could hit 400 messages and get force-compressed — and the
compression summary's stale Active Task could then bleed into the next
turn.
The valve's actual purpose is to break a death spiral: when API calls
keep disconnecting on an oversized session, no token-usage data arrives,
the token threshold never fires, and the transcript grows unbounded.
It's a count-based floor for that pathological case only. 400 was tuned
for ~200K-context models and is far too low for modern large-context
sessions. Raise the default to 5000 — still well clear of any death
spiral, but no longer firing on legitimate long conversations.
The value remains fully configurable via compression.hygiene_hard_message_limit.
The RPC-rename fallback swallowed all errors silently. Narrow it to log
the swallowed error via console.warn so a genuine session.title RPC
failure (which then surfaces a REST 404 for the runtime id) is
diagnosable instead of invisible. Behavior is unchanged: REST fallback
still runs for any session with a persisted row.
Verifies the active branched session renames via the session.title RPC
(not REST), and that REST is used for non-active rows, title clears, RPC
failures (socket mid-reconnect), and when no gateway is connected.
A freshly branched session (and any brand-new chat) lives only in the
gateway's in-memory _sessions map keyed by its runtime id — no row is
persisted to state.db until the first turn. The rename dialog hit REST
PATCH /api/sessions/{id}, which resolves against the stored sessions
table, so it 404'd with "Session not found" on these runtime-only rows.
Route the rename of the ACTIVE/selected session through the gateway's
session.title RPC (which resolves the live runtime session and persists
the row on demand), mirroring the /title slash command. Fall back to REST
for non-active rows, title clears, and when no gateway is connected.
The compaction threshold is max(context_length * threshold_percent,
MINIMUM_CONTEXT_LENGTH=64000). The floor prevents premature compression on
large models, but degenerates at small windows: a model at exactly 64000
ctx gets max(32000, 64000) = 64000 — a threshold equal to the ENTIRE
window. should_compress() can then never fire, because the provider
rejects the request before usage reaches 100%. Auto-compression silently
never triggers for any model whose context_length <= MINIMUM /
threshold_percent (e.g. 64K-per-slot local models).
Centralize the calc in _compute_threshold_tokens(). When the floor would
meet or exceed the context window, trigger at 85% of the window
(_MIN_CTX_TRIGGER_RATIO) — high enough that a minimum-context model uses
most of its budget before compacting (compacting at the 50% percentage
would waste half the small window), but below 100% so compaction actually
fires before the provider rejects the request. This mirrors the existing
gpt-5.5/Codex 85% autoraise rationale. Large-context behavior (floor at
64000) is unchanged; both call sites (__init__ and update_model) use the
shared helper.
Co-authored-by: soynchux <soynchuux@gmail.com>
Co-authored-by: LeonSGP43 <154585401+LeonSGP43@users.noreply.github.com>
Co-authored-by: Tranquil-Flow <tranquil_flow@protonmail.com>
The About > Updates panel only surfaced "See what's new" when an update
was available, which just opens the changelog overlay — there was no way
to start the install directly from About. Add an "Update now" primary
button that opens the updates overlay (for apply progress) and kicks off
the install for the active target (backend in remote mode, else client).
When terminal.backend is docker/modal/daytona/ssh/singularity, the
terminal runs in a sandboxed container with network isolation, but the
browser still runs on the host. The SSRF guard was skipped because
_is_local_backend() only checked browser.cloud_provider, not the
terminal backend.
Now _is_local_backend() also checks TERMINAL_ENV — when the terminal
is containerized, the browser is treated as non-local and SSRF
protection is enabled.
Fixes#38690
Mirror the CLI's exit-path behaviour in the TUI gateway so that
unpersisted conversation messages are flushed to state.db and the
on_session_end plugin hook fires before the session is closed.
Root cause: _finalize_session() only called db.end_session() to
mark the session row as ended, but did NOT flush in-memory messages
via _persist_session() or fire the on_session_end hook. When the
user force-quit (double Ctrl-C, terminal-close, SIGHUP) while the
agent was mid-turn, messages accumulated since the last persist
point were silently lost.
Changes
-------
tui_gateway/server.py - _finalize_session():
- Persist unflushed messages via agent._persist_session() before
db.end_session(). Prefers agent._session_messages (set by the
last _persist_session call inside run_conversation) over
session['history'] (stale when agent is mid-turn).
- Fire on_session_end(interrupted=True) plugin hook so crash-
recovery plugins can flush buffers, matching cli.py behaviour.
tui_gateway/entry.py - _log_signal():
- Explicitly call _shutdown_sessions() before sys.exit(0) in the
SIGHUP/SIGTERM handler as belt-and-suspenders over atexit.
tests/tui_gateway/test_finalize_session_persist.py (new):
- 11 tests covering: history persistence, _session_messages
priority, empty-history skip, missing-agent, double-finalize,
persist-exception resilience, hook firing, hook-exception
resilience, and db.end_session preservation.
Related
-------
Closes the TUI half of #5021 (CLI already handles this via its
atexit handler). Also addresses the session-persistence gap
discussed in #18465 and #18269.
The OpenAI-compatible API server only enforced a hardcoded cap of 10
concurrent runs on /v1/runs, leaving /v1/chat/completions and
/v1/responses unbounded — a request flood could exhaust CPU, memory,
and upstream LLM quota (#7483).
- Add gateway.api_server.max_concurrent_runs (config.yaml, default 10,
0 disables). No env var.
- Shared concurrency gate across all three agent-serving endpoints,
counting both the chat/responses in-flight counter and the /v1/runs
stream set. Returns OpenAI-style 429 + Retry-After when at the cap.
- Remove the dead hardcoded _MAX_CONCURRENT_RUNS class attribute.
Closes#7483.
When a turn hit max_iterations, finalize_turn ran three unguarded cleanup
steps after the model's summary — _save_trajectory (file I/O), _cleanup_task_resources
(remote VM/browser teardown), and _persist_session (SQLite write). Any raise
there propagated out of run_conversation, discarding the partial final_response
the caller was waiting for; subprocess wrappers saw an empty stdout with no
traceback (#8049).
Each step is now guarded independently so one failure can't skip the others.
Failures log at ERROR with a traceback and are surfaced on the result dict via
cleanup_errors; the partial response is always returned.
Closes#8049.
When truncate_message appends a (N/M) chunk indicator to a chunk that
had to close an in-progress fenced code block, the marker lands on the
closing fence line (``` \(1/2\) after MarkdownV2 escaping). Telegram
does not treat that as a clean closing fence and rejects the MarkdownV2,
falling back to plain text. Move the indicator onto its own line right
after the closing fence at all three legacy-send call sites.
Fixes#48517
ContextCompressor.update_model() recomputed context_length/threshold/budgets
but kept the cross-call calibration state (last_real_prompt_tokens,
last_rough_tokens_when_real_prompt_fit, last_compression_rough_tokens,
awaiting_real_usage_after_compression, _ineffective_compression_count) from the
PREVIOUS model.
Those fields encode 'the provider proved this prompt fit' / 'preflight can be
deferred' decisions valid only for the model that produced them. Carried across
a switch to a smaller-context model, should_defer_preflight_to_real_usage() used
the old model's 'it fit' history to SKIP a preflight compression the new model
actually needed — sending an oversized prompt the provider rejects (#23767).
update_model() now clears that state; the new model's first response repopulates
it via update_from_response(). Verified E2E: after a 200K->65,536 switch, defer
no longer suppresses and should_compress fires on an over-threshold estimate.
The tool-result persistence budget was a fixed 100K chars/result and 200K
chars/turn regardless of the active model. On a small-context model (e.g. a
65K-token local model switched into mid-session) a single large tool result
(reporter: a 279K-char search result) or a full 200K-char turn (~50K tokens)
could by itself approach or exceed the window, forcing an oversized request
that the provider rejects as "Prompt too long".
- budget_config.budget_for_context_window() scales per-result/per-turn char
caps to a fraction of the model window, clamped to the historical 100K/200K
defaults (large models unchanged) and floored so small models stay usable.
- resolve_threshold() now caps the per-tool registry value at default_result_size
so tools that register a fixed 100K cap (web/terminal/x_search) don't re-inflate
a scaled-down budget. No-op for the default budget (both 100K).
- tool_executor wires the agent's live context_length (recomputed on model
switch) into all four persist/turn-budget call sites.
read_file stays inf-pinned (no persist loop). Verified E2E: a 279K-char result
against a 65K model collapses to a ~1.6K preview; a 200K model is byte-identical
to today.
Second cleanup pass (simplify-code review of the first follow-up):
- write_runtime_status now clamps active_agents via parse_active_agents
instead of an inline max(0, int(...)). Removes the duplicated clamp the
helper's docstring acknowledged AND closes a write-side ValueError gap
(a non-numeric active_agents previously raised; now degrades to 0).
- hermes_cli/gateway.py draining-status line routes its active-agents count
through parse_active_agents too — the third coercion site of the same
persisted field, now consistent and non-raising with the two HTTP surfaces.
- web_server.py /api/status: the drain-timeout resolver fallback now catches
ImportError specifically and falls back to DEFAULT_GATEWAY_RESTART_DRAIN_TIMEOUT
(a real float) instead of a blanket 'except Exception -> None'. None would
have violated the surfaced field's int/float contract and stripped NAS's
poll-deadline hint silently.
- Dropped a redundant 'if runtime else 0' branch (parse_active_agents already
handles the empty/None case) and tightened the parse_active_agents docstring
to describe the actual single-contract role (write + both reads).
Follow-up cleanups on top of the busy/idle readout (PR #50103):
- web_server.py /api/status reused the single drain-timeout resolver
hermes_cli.gateway._get_restart_drain_timeout() (HERMES_RESTART_DRAIN_TIMEOUT
env -> agent.restart_drain_timeout config -> default) instead of inlining a
third hand-rolled copy of that precedence chain. Also fixes a subtle
divergence: the inline copy used os.environ.get() so a set-but-empty env var
was treated as a value rather than falling through to config; the shared
resolver .strip()s and falls through correctly.
- Added gateway.status.parse_active_agents() and routed BOTH HTTP surfaces
(/api/status and /health/detailed) through it, so the exposed active_agents
field is consistently clamped non-negative. Previously /api/status clamped
while /health/detailed exposed the raw file value, diverging on a corrupt
count.
- Added TestParseActiveAgents covering the shared coercion contract.
Give an external consumer (NAS) a trustworthy, always-reachable busy/idle
readout it can poll before a disruptive lifecycle action (restart,
migrate, stop, auto-update). The dashboard /api/status is the only HTTP
surface guaranteed up on a hosted agent regardless of which gateway
platforms are enabled, and it already reads gateway_state.json.
Add to /api/status (additive, non-breaking):
- active_agents — in-flight gateway-turn count (now refreshed
per-turn by the companion gateway-side commit)
- gateway_busy — running AND active_agents > 0
- gateway_drainable — running and live (a valid begin-drain target)
- restart_drain_timeout — resolved seconds, so the consumer can size its
poll deadline without out-of-band knowledge
(env HERMES_RESTART_DRAIN_TIMEOUT → config
agent.restart_drain_timeout → default)
The busy/drainable contract is defined once in gateway.status
(derive_gateway_busy / derive_gateway_drainable) and consumed by both
/api/status and /health/detailed so the two surfaces can never disagree.
Liveness keys off gateway_running (a live PID/health probe), NEVER
gateway_updated_at — a healthy idle gateway never advances that timestamp.
All derived fields degrade to safe falsy values when the gateway is down
or the status file is absent/corrupt (never a spurious "busy" that would
wedge the consumer). active_sessions (the 5-min DB recency heuristic the
SPA reads) is left exactly as-is — new signal, new fields.
Tests (behaviour contracts, not snapshots): the pure derivation contract
across every running/state/count/liveness combination; /api/status
integration for busy, idle-drainable, draining, down, stale-busy-file,
corrupt-count, and timeout surfacing; and /health/detailed parity.
The gateway only rewrote gateway_state.json on lifecycle transitions
(start/connect/drain/stop), never on turn start/end. Live-verified on a
hosted agent: a confirmed end-to-end turn ran while gateway_updated_at
stayed frozen at boot and active_agents was absent — so any active_agents
read from the file between transitions is stale. That makes it unusable
as a busy/idle signal for an external consumer (NAS deciding whether it's
safe to restart/migrate/auto-update an agent mid-turn).
Add _persist_active_agents(), called at every turn boundary:
- turn start: both running-agent sentinel-claim sites (normal inbound
message path + startup-resume path)
- turn end: the central _release_running_agent_state() choke point
(covers normal completion, /stop, /reset, sentinel cleanup,
stale-eviction — every path that ends a running turn)
It passes ONLY active_agents to write_runtime_status, leaving
gateway_state (and every other field) _UNSET so the read-merge-write
preserves the current lifecycle state. Passing gateway_state=None would
clobber it — hence a dedicated helper rather than reusing
_update_runtime_status. The write is the same cheap JSON write done on
lifecycle transitions today; best-effort (a failed status write never
disrupts a turn).
Behaviour-contract test: an active_agents-only write preserves both
running and draining gateway_state, and the count clamps non-negative.
`cron/jobs.py` resolved `HERMES_DIR`/`JOBS_FILE` from `get_hermes_home()`,
which follows the active profile override. So a job created from a
profile-scoped agent session (`hermes -p myprofile chat`, where the in-process
`cronjob` tool calls `create_job`) was written to
`~/.hermes/profiles/myprofile/cron/jobs.json`, while the profile-less gateway
(`hermes gateway run`) reads only `~/.hermes/cron/jobs.json`. The job was
silently orphaned: `cronjob action=list` from the same profile reported it
healthy (same file), but the gateway ticker never saw it and it never fired.
`last_run_at` stayed null forever. (#32091)
Fix: resolve the cron store from `get_default_hermes_root()` — the
purpose-built "profile-level operations" root that returns `<root>` even when
`HERMES_HOME` is `<root>/profiles/<name>` (and handles Docker/custom layouts).
Now the creator, the gateway scheduler, and the dashboard all agree on a
single jobs.json at the root, so a job created under any profile is visible to
the gateway.
Scope: this is the storage-location half of the fix. Making a job *execute*
under its originating profile's config/skills (a per-job `profile` field +
runtime context scoping, the #48649 sibling) is a separate, riskier change and
will follow as its own PR — keeping this layer minimal and safe.
Salvaged from #32117 by @mohamedorigami-jpg (authorship preserved). The
comprehensive #33839 (@sweetcornna) takes the same Option-A storage approach
and additionally adds the per-job profile execution scoping; this PR lands the
safe storage layer first.
Tests: `tests/cron/test_cron_profile_storage.py` — asserts the store anchors
at `<root>/cron` under a profile HERMES_HOME (not `<profile>/cron`), and is
unchanged when no profile is active. Full `tests/cron/` suite: 511 passed.
Fixes#32091
Co-authored-by: mohamedorigami-jpg <mohamed.origami@gmail.com>
When a platform-bundle name (e.g. `hermes-yuanbao`, or any `hermes-*`) lands
in `agent.disabled_toolsets`, the shared tool-assembly path
(`model_tools._compute_tool_definitions`, used by the gateway, cron, AND the
CLI) subtracted the WHOLE bundle from the enabled set. Because every platform
bundle is defined as `_HERMES_CORE_TOOLS + [platform extras]`, and core tools
are shared by every other enabled toolset, the subtraction emptied the tool
list entirely — the model received `tools: []` / `tool_choice: null` and
started replying "I cannot execute shell commands" with no error, no warning,
and `hermes tools list` / `hermes doctor` still green. For unattended cron
jobs this fails silently for days. (#33924)
(The original report framed this as gateway-only; it actually affects every
caller of `_compute_tool_definitions`, including the CLI — the reporter's
follow-up confirms this. Fixing the shared chokepoint covers all paths.)
Fix: for a `hermes-*` bundle in `disabled_toolsets`, subtract only its
*non-core delta* (its platform-specific tools plus those of any `includes`),
leaving `_HERMES_CORE_TOOLS` intact. Disabling a bundle now removes its
platform tools (e.g. the `yb_*` tools for `hermes-yuanbao`) while terminal,
read_file, web, etc. survive. A `logger.warning` notes that core tools are
preserved and that bundle names usually belong in `toolsets:`, not
`disabled_toolsets` — informative, not destructive (the subtraction still
behaves sensibly).
Salvaged from #33941 by @liuhao1024 (authorship preserved). Extracted the
inline bundle-resolution into a module-level `_bundle_non_core_tools` helper
(was re-importing `toolsets` inside the disable loop), and added the
informative warning folding in the UX intent of #34073 (@ousiaresearch)
without its hard "ignore the bundle name" behavior — which would have undone
this fix's sensible-subtraction.
Verified empirically: disabling `hermes-yuanbao` from a gateway-style enabled
set keeps all core tools (18→18) and would remove only the 5 `yb_*` tools;
disabling `hermes-discord` removes only `discord`/`discord_admin`.
Fixes#33924
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Follow-up to the salvaged preflight-compression warning:
- Replace silent `except Exception: pass` at all 5 guard call sites
(cli.py x2, gateway/slash_commands.py x2, tui_gateway/server.py) with
`logger.debug(...)` so signature drift in the guard helper isn't hidden.
- tui_gateway/server.py: set the confirm dict's `warning` field to the
merged message (was bare expensive-model text) so it matches
`confirm_message` for any future consumer reading `warning`.
- Add trailing newlines to the two new files.
Adds hermes_cli/context_switch_guard.py mirroring the model_cost_guard
pattern. When a user switches models mid-session (Herm TUI picker, CLI,
or /model on Telegram/Discord), the warning surfaces on the existing
ModelSwitchResult.warning_message path used by the expensive-model
guard if the new model's compression threshold is below the current
session size.
Partial fix for #23767 — addresses only the 'user-facing guardrail
when switching from a high-context provider to a substantially
lower-context provider' slice. The other proposed fixes from that
issue (hard preflight token guard, metadata cache invalidation on
switch, compression safety invariant, oversized tool-output handling)
are out of scope for this PR.
Two regression tests for the agentmemory reconnect-loop:
- _is_method_not_found_error matches the plain 'Unknown method: ping'
phrasing (no structural -32601 code).
- _keepalive_probe latches _ping_unsupported and falls back to list_tools
when send_ping raises 'Unknown method: ping', instead of propagating
(which would reconnect-loop).
A server that doesn't implement the optional 'ping' utility answers a
keepalive ping with JSON-RPC method-not-found. _is_method_not_found_error
latches that condition so the probe falls back to list_tools instead of
reconnect-looping.
The substring fallback only matched 'method not found' / '-32601' /
'not found: ping'. Servers that surface method-not-found as the common
'Unknown method: <name>' phrasing without a structural -32601 code (e.g.
agentmemory's MCP server) slipped through, so the fallback never latched
and the keepalive reconnect-looped every cycle.
Add 'unknown method' to the substring fallback so the ping->list_tools
keepalive fallback latches for these servers too.
Fixes#50028.
Follow-up to the salvaged #47450 fix:
- Extract expandProviderDefaults() so the curated-default expansion rule
lives in one place (was duplicated between defaultVisibleKeys and
resolveVisibleKeys).
- Drop the redundant new Set() wrap in toggleModelVisibility (resolveVisibleKeys
already returns a fresh Set; effectiveVisibleKeys already relied on this).
- Document the intentional re-enable behavior (re-enabling one model of a
hidden-all provider restores only that model, not the curated defaults) and
tighten the toggleModelVisibility JSDoc.
- Add 7 hardening tests: re-enable-restores-only-that-model, full hide/re-enable
round-trip, empty-non-null stored, single toggle-off from null defaults,
zero-model provider, and direct resolveVisibleKeys null/empty assertions.
#43496 added a per-provider hide-all sentinel ('provider::') so emptying a provider in the Edit Models dialog stopped re-expanding its defaults. That fixed the single-provider case, but the dialog's toggle handler seeds its working set from effectiveVisibleKeys(), which strips ALL sentinels before returning. So persisting after any toggle silently dropped every OTHER provider's hide-all sentinel; those providers then looked 'never customized' and re-enabled all their models on the next render.
Split resolution into two functions:
- resolveVisibleKeys(): stored keys + curated default expansion, with hide-all sentinels PRESERVED — the canonical working set the toggle handler mutates and persists.
- effectiveVisibleKeys(): resolveVisibleKeys() then strips sentinels, for display only (unchanged contract).
Move the toggle set-computation into a pure, unit-tested toggleModelVisibility() that seeds from resolveVisibleKeys(), so sibling sentinels survive the persist. Add regression tests that drive the real toggle handler across multiple providers.
Follow-up to #43496; completes the fix for #43485 (cross-provider case).
When a recurring job's execution time exceeds `interval + grace`, the
scheduler entered a perpetual "missed → fast-forward → skip" loop and the
job effectively never ran again. A real job (`hermes-upstream-contribution`)
logged 42 consecutive "missed" events over 9 hours without executing once.
Timeline (5-min interval, 150s grace, ~15-min execution):
14:00 due → advance next_run_at→14:05 → run (blocks 15 min)
14:15 finishes
14:16 tick: next_run_at=14:05, elapsed 660s > grace 150s → "missed!"
→ fast-forward to 14:21 → continue (SKIP) → does NOT run
... repeats forever for any job whose runtime > interval+grace.
The `continue` (skip execution) in `_get_due_jobs_locked` was designed to
prevent burst-catchup after *gateway downtime* — don't run 6 missed
instances of a 30-min job on restart. But it wrongly applied to a job that
missed its slot because it was *still running*, not because the gateway was
down.
Fix: keep the fast-forward (so accumulated missed slots are still collapsed
to a single next slot — no burst) but fall through to `due.append(job)` so
the job runs ONCE now. The log message is updated to be honest about the new
behavior ("Running now; next run fast-forwarded to: ...").
Behavior note: a recurring job missed during gateway downtime now also fires
once immediately on restart (rather than waiting for its next natural slot).
This is the intended trade-off — the same "run once, don't burst" rule now
applies uniformly to both downtime-misses and long-execution-misses.
Salvaged from #33318 by @liuhao1024 (authorship preserved). Also addresses
the diagnosis in #33361 (@agent-trivi), which proposed the same one-line fix.
Tests: updates `test_stale_past_due_skipped` →
`test_stale_past_due_runs_once_and_fast_forwards` (the old test encoded the
skip behavior); adds `test_long_execution_does_not_perpetually_defer` as a
direct regression for the production loop; updates the F2e timezone test that
relied on the old skip path. Full tests/cron/ suite: 510 passed.
Fixes#33315
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
PR #22410 added three-mode Telegram topic routing to the live message path
(TelegramAdapter.send via the gateway DeliveryRouter), but the cron delivery
path never got it. cron/scheduler.py::_deliver_result sent through the live
adapter with a bare ``{"thread_id": ...}`` and fell back to the standalone
_send_telegram, neither of which addresses Bot API Direct Messages topics
correctly. After Bot API 10.0 (2026-05-08), sending to a private chat with a
bare ``message_thread_id`` is rejected/mis-routed, so cron deliveries to a
private DM topic landed in the General topic instead of the requested lane.
Fix: the cron live-adapter branch now routes the text send through the
gateway's ``DeliveryRouter._deliver_to_platform`` — the same canonical path
live messages use — so it inherits all three Telegram routing modes:
1. Forum/supergroup (negative chat_id) -> message_thread_id
2. Bot API DM topics (private chat_id + numeric topic id) ->
direct_messages_topic_id (the case #22773 reported)
3. Hermes-created named private DM-topic lanes -> ensure_dm_topic +
reply anchor
For mode 2, a private-chat target with a numeric topic id is passed as
``direct_messages_topic_id`` metadata (verified end-to-end:
TelegramAdapter._thread_kwargs_for_send turns it into
``{message_thread_id: None, direct_messages_topic_id: <int>}``), instead of a
bare message_thread_id. Forum/supergroup and home-channel deliveries are
unchanged. The standalone fallback (gateway down) is preserved.
No new config knob and no duplicated routing logic — this reuses the existing
DeliveryRouter rather than reimplementing topic routing in the cron path.
Salvaged from #42051 (stepanov1975) and #23249 (devsart95), which both
diagnosed the missing three-mode routing in the cron/standalone path;
reimplemented onto the canonical DeliveryRouter that landed since those PRs
were opened.
Co-authored-by: Alex <9785479+stepanov1975@users.noreply.github.com>
Co-authored-by: devsart95 <devsart95@gmail.com>
A recurring cron job persists `next_run_at` as an absolute timestamp with a
UTC offset (e.g. `2026-05-19T21:00:00+10:00`). Cron expressions, however,
describe *local wall-clock* intent ("run at 21:00"). When Hermes/system
timezone changes after the timestamp was persisted, the stored instant is
re-interpreted in the new zone: `21:00+10:00` is the instant `13:00+02:00`,
which is `<= now` (13:02+02:00) — so the job fires HOURS EARLY, then
`compute_next_run` advances it via croniter to `21:00+02:00` the same day,
producing a SECOND fire. (#28934, recurrence of #24289.)
`_get_due_jobs_locked` now detects this precise migration case before the
due check: for a `cron` job whose converted instant looks due, whose stored
UTC offset differs from the current zone's, AND whose stored *wall-clock*
time is still in the future (distinguishing a migrated offset from a
genuinely missed run), it recomputes `next_run_at` from the schedule and
skips the early fire — preserving the local wall-clock intent.
Verified against the issue's reproducer: stored `21:00+10` under runtime
`+02:00` at wall-clock `13:02` is rescheduled to `21:00+02` instead of
firing early + again.
Salvaged from #28941 by @Tranquil-Flow (authorship preserved). Chosen over
the alternative approaches (#28951 normalize-to-UTC, #28985 rebase-and-match)
because UTC-normalization does not change the absolute-instant comparison and
so does not fix the early fire, and this guard is the tightest: it only acts
when all four conditions hold and reuses the existing `compute_next_run`.
Fixes#28934
`cronjob(action='run')` (and `hermes cron run`) only set `next_run_at = now`
and returned success, relying on the scheduler ticker to actually execute the
job on its next tick. When no gateway/ticker is running — a CLI-only setup, or
the Windows case in #41037 — the job never executed: `run` reported success,
but `last_run_at` stayed null forever, no output, no delivery.
A manual `run` should actually run. `_execute_job_now` now:
- **claims the job via `claim_job_for_fire`** — the same at-most-once CAS the
scheduler/external-provider fire path uses. This both advances `next_run_at`
for recurring jobs and blocks a concurrently-running gateway ticker from
double-firing the same job; if the claim is lost, the run is skipped (the
tool reports `execution_skipped`). This closes the double-fire race that a
bare `advance_next_run` left open (a tick whose `get_due_jobs` already
captured the job between trigger and advance would still fire it).
- **delegates firing to `run_one_job`** — the single shared
execute→save→deliver→mark body the ticker and external providers use — so
failure delivery, `[SILENT]` handling, and live-adapter delivery stay
identical across paths and can't drift. (The original salvage re-implemented
this sequence inline and had already dropped failure delivery + `[SILENT]`.)
The tool response carries `executed`, `execution_success`, and either
`execution_error` or `execution_skipped`. The `hermes cron run` CLI message no
longer claims "It will run on the next scheduler tick" — it reports the actual
"Ran now: succeeded/failed" outcome (or the skip).
Salvaged from #41130 by @kyssta-exe (authorship preserved); reworked to reuse
`claim_job_for_fire` + `run_one_job` per review rather than re-implementing the
fire sequence inline. Adds tests for the claim-then-fire path, claim-lost skip,
failure reporting, and exception capture.
Fixes#41037
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
The in-process cron ticker (cron/scheduler_provider.py) caught only
`Exception` and logged at DEBUG, so a `SystemExit`/`KeyboardInterrupt`
raised from a misbehaving provider SDK or agent retry path killed the
ticker thread silently. The gateway PROCESS stayed up, so `hermes cron
status` — which only checks `find_gateway_pids()` — kept reporting
"✓ jobs will fire automatically" while no jobs ever fired (#32612,
#32895).
This makes ticker death survivable and detectable:
- The ticker loop now catches `BaseException` and logs at ERROR with a
traceback, so a single bad tick no longer tears the thread down and
the failure is visible in the gateway log.
- The loop records a heartbeat (`cron/ticker_heartbeat`, epoch seconds)
on startup and after every tick — best-effort, never raised into the
loop. Both ticker entry points (the gateway and the desktop fallback
in web_server.py) funnel through `InProcessCronScheduler.start`, so one
heartbeat site covers both.
- `hermes cron status` now reads the heartbeat age: if the gateway is
running but the heartbeat is stale (> 200s, i.e. several missed ~60s
ticks), it reports the ticker as STALLED and suggests a restart instead
of falsely claiming jobs will fire. A missing heartbeat (older build /
never ran) is treated as "unknown", not "dead".
Adds tests for BaseException survival, per-iteration heartbeat recording,
heartbeat round-trip/age, staleness detection, and silent-write-failure.
Salvaged from #49660 (BaseException survival on current structure),
extended with the heartbeat + honest-status reporting that the earlier
(pre-refactor) watchdog PRs #35616 and #33849 proposed.
Fixes#32612Fixes#32895
Co-authored-by: banditburai <promptsiren@gmail.com>
Co-authored-by: sweetcornna <96944678+sweetcornna@users.noreply.github.com>
Consolidates three cron-delivery defects in cron/scheduler.py::_deliver_result
that all stem from how the live-adapter send result is interpreted.
#38922 — duplicate message on confirmation timeout.
future.result(timeout=60) raising TimeoutError bubbled to the outer
except handler, which left delivered=False, so `if not delivered:` re-sent
the identical message via the standalone path. future.cancel() cannot
un-send a request already in flight on the wire, so a slow confirmation
deterministically produced a duplicate. The send was already dispatched onto
the gateway loop, so a bare timeout is now treated as delivered
(assume-delivered is safer than guaranteed-duplicate) and the standalone
fallback is skipped. The live-adapter media attempt is also skipped on
timeout since the contended loop would re-block each 30s media budget.
#47056 — silent drop when the gateway has an active session.
The old check `if send_result is None or not getattr(send_result,
"success", True)` let a result object missing a `success` attribute default
to True = counted as a successful delivery, so the scheduler logged
"delivered via live adapter" while the gateway never processed the message.
Delivery is now confirmed via _confirm_adapter_delivery(): only an explicit,
truthy `success` attribute counts; None or a `success`-less object falls
through to the standalone path so the message actually arrives.
A genuine send Exception (not a slow confirmation) still falls through to
the standalone path, and is caught by run_job's outer handler — it is
recorded as the job's last_error and never crashes the cron ticker.
#43014 — deliver=origin fails to resolve in CLI sessions.
A CLI-created job has no {platform, chat_id} origin, so deliver=origin (and
auto-detect / deliver=None) was unresolvable and emitted "no delivery target
resolved" on every run. An unresolvable origin with no configured home
channel is now treated as local (output stays in last_output), matching the
documented auto-deliver contract; a concrete unresolvable platform target
still reports a real error.
Salvaged from #41007 (timeout discriminator), folding in #47127's
_confirm_adapter_delivery hardening and #38937 / #43063's origin→local
fallback. Tests rewritten as behavior contracts (timeout => no duplicate;
None / success-less result => standalone fallback; confirmed success => no
fallback; CLI origin => local, explicit platform => still errors).
Co-authored-by: Evi Nova <66773372+Tranquil-Flow@users.noreply.github.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Cron jobs created without an explicit `model` are stored as `model: null`.
At fire time `run_job` resolved `model = job.get("model") or os.getenv(
"HERMES_MODEL") or ""` and then `_model_cfg.get("default", model)`, so when
config.yaml had no `model.default` (or `model: {default: null}`) an empty
string flowed straight to the provider and surfaced as an opaque HTTP 400
("Model parameter is required" / "model: String should have at least 1
character"). The operator had to inspect jobs.json to discover the job was
stored with a null model.
This change makes cron model resolution robust and symmetric with the CLI:
- Coerce `model: null`/missing config to `{}` so a falsy default never
overwrites an already-resolved env value with `None`.
- Only overwrite `model` from `model.default` when the resolved value is
truthy; accept a `model.model` alias key, mirroring the sibling resolvers
in hermes_cli/oneshot.py, fallback_cmd.py and prompt_size.py.
- Resolve AFTER the managed-scope overlay so an administrator-pinned model
still wins.
- Fail fast with an actionable error (caught by run_job's outer handler and
recorded as the job's last_error — the cron ticker is unaffected) instead
of letting an empty model reach the API.
- The per-job model is re-read every tick, so a `cronjob action=update
model=...` after a failed run takes effect on the next tick (no cache).
Adds tests/cron/conftest.py pinning a default HERMES_MODEL so existing
run_job tests don't trip the new guard, plus regression tests covering env
fallback, config.default fallback, string-form config, the model alias key,
null-default-no-clobber, corrupt-config graceful degradation, fail-fast,
and the no-cache re-read property.
Salvaged from #24005, rebased onto current main, with additional test
coverage folded in from #45550 and the alias-key behavior from #43952.
Fixes#43899Fixes#23979Fixes#22761
Co-authored-by: szzhoujiarui-sketch <szzhoujiarui@gmail.com>
Co-authored-by: rayjun <rayjun0412@gmail.com>
protect_first_n keeps the first N non-system messages verbatim through
compaction so the original task framing survives. But it was applied on
EVERY compression pass: the same early user turns were re-copied into each
child session and never summarized away, so across a long, repeatedly-
compressed session those old messages became immortal and grew the
protected head unboundedly (#11996, P1).
Decay it: protect_first_n applies on the FIRST compaction only. Once the
session has been compressed at least once (compression_count >= 1, or a
handoff summary already exists), the early turns are captured in the
summary, so _effective_protect_first_n() returns 0 and only the system
prompt stays protected. The decay is read at compress_start computation
time, before compression_count/_previous_summary are mutated at the end of
compress(), so the first pass still protects correctly.
Co-authored-by: truenorth-lj <liliangjya@gmail.com>
Co-authored-by: davidvv <david.vv@icloud.com>
Single-op replace/remove failed with a dead-end 'old_text is required'
error when a structured-output client omitted the optional old_text field
(it can't be schema-required without a top-level if/then combinator that
OpenAI's Codex backend 400s on). The model couldn't recover.
Now a missing old_text returns the current entry inventory plus a retry
instruction (mirroring the batch path's _batch_error), so the model can
reissue the call with old_text set. Also sharpens the old_text schema
description to state it's required for replace/remove.
Fixes#49466, #43412.
The native echo recovery handles replies to most rich messages, but
messages sent before the bot's first rich send have no echo to read.
record() was only called on the fresh-send path (_try_send_rich); a
streamed final finalized via _try_edit_rich/editMessageText was never
indexed, so a reply to it had neither a native echo nor an index entry.
Mirror the fresh-send record() into the edit success path to close
that gap.
Telegram DOES echo a rich message's content back in
reply_to_message.api_kwargs['rich_message']['blocks'] when a user
replies to it. Read that native field first in _build_message_event,
keeping the local send-time index only as a fallback. Duck-type
api_kwargs via .get() since it is a mappingproxy, not a dict.
Fixes#49534
The MCP section pointed to docs/mcp.md, which does not exist. Point it
to website/docs/user-guide/features/mcp.md, matching the existing
hooks.md reference convention in the same file.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Brings the antigravity-cli skill to parity with the codex / claude-code
delegation playbooks. Additive only — auth/sandbox/plugin/settings content
is unchanged.
- New 'Delegation patterns' section: one-shot, background bounded runs,
interactive PTY+tmux, parallel worktree fan-out, and an orchestration
boundary note (agy is a worker backend / reviewer, not a coordination
primitive).
- Documents the two ways agy -p differs from claude-code: plain-text
output (no --output-format json / result envelope) and bounding via
--print-timeout rather than a nonexistent --max-turns. Mirrored into
Pitfalls.
- Bumps version 0.1.0 -> 0.2.0.
The shell-hook stdin payload's extra object contains event-specific
kwargs, but the docstring only mentioned the field without listing
what each event actually puts inside it.
Add a reference table covering post_tool_call, pre_tool_call,
on_session_start, on_session_end, and subagent_stop — the five
hook sites that emit extra keys beyond the top-level payload.
Closes#49370
The MCP discovery wait is now bounded by the config-driven mcp_discovery_timeout
(default 1.5s), not the old 0.75s flat value. Updates the _schedule_mcp_late_refresh
docstring that still cited ~0.75s after #49208 made the bound configurable.
The terminal backend onboarding step pointed at
/docs/developer-guide/environments, which no longer exists. Point it at
the live docs page /docs/user-guide/configuration#terminal-backend-configuration.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Remove the 'you only log in once per machine' claim from spotify.md
and document the ~6-month refresh token expiry with re-auth instructions
- Add test_client_wraps_invalid_grant_as_spotify_auth_required_error to
confirm SpotifyClient wraps AuthError(code=spotify_refresh_invalid_grant)
into SpotifyAuthRequiredError with a user-facing message
Refs: #28155
resolve_spotify_runtime_credentials() called _refresh_spotify_oauth_state()
without a try/except, so a terminal failure (HTTP 400/401, invalid_grant,
refresh_token_reused) raised AuthError but left the dead refresh_token in
auth.json. Every subsequent session re-read and retried the same token over
the network, failing identically each time.
Fix: wrap the refresh call and, when exc.relogin_required is True and a
refresh_token is present, clear the dead OAuth fields (access_token,
refresh_token, expires_at, expires_in, obtained_at) and write a
last_auth_error quarantine marker to auth.json before re-raising. The next
call sees no access_token and fails fast with spotify_access_token_missing —
no network retry — and the user is prompted to re-authenticate.
Mirrors the quarantine pattern already in place for Nous, xAI-OAuth,
Codex-OAuth (#28116, #28118), and MiniMax-OAuth (#28119).
The SKILL.md template in CONTRIBUTING.md was missing the Prerequisites
and How to Run sections, even though the "modern section order"
guidance immediately below it lists both as required.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds a new 'Project Context Files' section to the hermes-agent skill
explaining the priority order and discovery rules for .hermes.md,
AGENTS.md, CLAUDE.md, and .cursorrules. Specifically clarifies:
- .hermes.md walks parents up to the git root (good for monorepos)
- AGENTS.md / agents.md is cwd-only (portable to other agents)
- The 20K cap and head+tail truncation strategy
- The threat-pattern scanner behavior (blocks content, not file)
- What --ignore-rules actually skips (everything)
Also fixes an inaccurate docstring in agent/agent_init.py for
skip_context_files — the previous text only mentioned SOUL.md,
AGENTS.md, and .cursorrules, but the actual behavior (per
build_context_files_prompt and the --ignore-rules CLI flag) skips
all of them plus .hermes.md and CLAUDE.md.
Refs: https://github.com/NousResearch/hermes-agent/issues/46775
Clarify that session_search is secondary context and direct source identifiers must be inspected first when accessible. Add regression coverage for the tool description.
After context compression, the agent re-sent an already-delivered
generated image on every subsequent turn (#46627). The auto-append
fallback rescans full history when the message list shrinks (compression-
safe path), deduping against _history_media_paths — but that set was built
by scanning ONLY MEDIA: text tags in tool results. image_generate returns
its path in a JSON payload field (host_image/image/agent_visible_image),
never a MEDIA: tag, so generated-image paths never entered the dedup set
and were re-emitted after the boundary.
Extract the history-path collection into _collect_history_media_paths(),
which now covers BOTH delivery shapes: MEDIA: text tags AND image_generate
JSON-payload paths (mirroring what _collect_auto_append_media_tags
extracts). The inline block in _handle_message is replaced with a call to
the helper.
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
When LLM summarization fails, the deterministic fallback summary rendered
the latest user ask (active_task = "User asked: '<ask>'") verbatim under
THREE headings — Historical Task Snapshot, Historical In-Progress State,
and Historical Pending User Asks. Re-presenting an already-handled ask as
unresolved in-progress/pending work made the model re-answer it AND treat
the resurrected ask as the active turn, burying the genuinely-new
post-compaction user message (#49307: answer repetition + new-instruction
loss, P1).
Keep the latest ask once, under Task Snapshot, as historical context only.
The In-Progress and Pending-Asks sections now say 'Unknown / None
recoverable from deterministic fallback' (consistent with the Active
State / Key Decisions / Resolved Questions sections) and explicitly note
the ask is historical, not outstanding. The raw turn text still appears in
the verbatim 'Last Dropped Turns' transcript — that's the dropped-turn
record, not a re-labeled instruction.
Note: the separate role=assistant standalone-summary regurgitation
(#33256) is left as-is — that role choice is constrained by strict message
alternation (user collides with a user-ending head) and is already
mitigated by the summary end-marker; forcing the role would risk the
alternation invariant.
Co-authored-by: r266-tech <r2668940489@gmail.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
Context compression is atomic, but a gateway interrupt (an incoming user
message while the agent is busy) could abort the in-flight summary call.
The Codex Responses aux stream polls the thread interrupt flag and raised
InterruptedError unconditionally — so compression fell back to a degraded
static 'summary unavailable' marker, losing the real handoff (#23975).
Add a thread-local interrupt-protection flag (aux_interrupt_protection
context manager) in auxiliary_client; the Codex stream's cancellation
check honors it. The compressor wraps its summary call_llm in the context
manager. Timeouts still fire (a hung call must die) and all other aux
tasks (vision, web_extract, title_generation, …) stay interruptible.
Re-entrant, so the main-model retry recursion is safe.
Co-authored-by: konsisumer <der@konsi.org>
test_matrix_voice flaked in CI (6/7 failing on some shards, passing on
others and on main) depending on leaked MATRIX_REQUIRE_MENTION env state.
Root cause: the adapter defaults require_mention=True (falling back to the
MATRIX_REQUIRE_MENTION env var). These tests fire a group-room audio event
with no @mention, so _resolve_message_context drops it before dispatch
('No event was captured') whenever require_mention resolves True — which
happens in a clean shard, but an earlier test in another shard can leave
MATRIX_REQUIRE_MENTION=false in os.environ and mask it. The plugin
migration (#5600105478 adapter→bundled plugin) shifted shard composition
and exposed it.
Pin require_mention: False in the test adapter config so these media-TYPE
detection tests are no longer gated by the mention requirement, regardless
of ambient env. Verified: 7/7 pass with MATRIX_REQUIRE_MENTION=true (the
failing condition) AND with the env unset.
Gateway Session Hygiene auto-compression destroyed the original transcript
when the throwaway hygiene agent couldn't rotate the session (#21301, P1).
The _hyg_agent is built WITHOUT a session_db, so _compress_context cannot
end-and-fork the session (its rotate block is gated on agent._session_db).
The session_id stays unchanged, and the rewrite_transcript() call ran
UNCONDITIONALLY — replacing the full original transcript with just the
head+summary list. Permanent data loss on every hygiene compaction.
Guard the rewrite behind 'rotated OR in-place' exactly like the /compress
path already does (#44794/#39704): only overwrite when a new session id
was minted or in-place compaction succeeded; otherwise preserve the
original transcript and log a warning. The token/count bookkeeping that
followed the rewrite is moved inside the guard, with no-change values in
the preserve branch.
Co-authored-by: SandroHub013 <sandrohub013@gmail.com>
Co-authored-by: WuTianyi123 <wtyopenclaw@gmail.com>
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
#49431 corrected parents[2]->parents[3] for discord + raft only. The same
bug existed in slack, whatsapp, and telegram adapters (migrated from
gateway/platforms/ in 5600105478): each inserts parents[2] = plugins/ onto
sys.path[0], shadowing the real cron/ package with plugins/cron/ so
'import cron.scheduler_provider' raises ModuleNotFoundError on gateway start.
Fixes#49410, #49824.
When model.context_length is set in config.yaml, it blocks auto-detection
from the server's /v1/models endpoint. The skill incorrectly implied a
hard fallback to 131072. Add the resolution chain and the fix command
(hermes config set model.context_length "") to both the config table
and a new troubleshooting section.
CONTRIBUTING.md had no pre-work search step; the only duplicate-check is a
PR-template checkbox that fires at review time, after the work is already done.
Add a "Before You Start: Search First" section near the top so contributors
search open and merged PRs and issues (and the source, since the tracker can
lag the code) before building. References #38284 (the agent-side analog).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The Chinese README still told Windows users to install WSL2 and run
the Linux installer. Hermes now ships a native PowerShell install
script, so replace the outdated WSL2-only note with the direct
PowerShell one-liner.
Fixes: documentation accuracy / Windows onboarding
Closes#48835
The bundled himalaya skill and its website docs documented command
syntax that does not match Himalaya CLI v1.2.0.
Verified against pimalaya/himalaya v1.2.0 source:
- message move: MessageMoveCommand declares target_folder BEFORE
envelopes (src/email/message/command/move.rs) -> usage is
'<TARGET> <ID>...', so 'move 42 "Archive"' is wrong; correct is
'move "Archive" 42'.
- message copy: same ordering in copy.rs.
- attachment download: AttachmentDownloadCommand exposes the flag as
'-d, --downloads-dir <PATH>' (src/email/message/attachment/command/
download.rs), not '--dir'.
Fixed in all three surfaces that carried the wrong examples:
- skills/email/himalaya/SKILL.md
- website/docs/.../email-himalaya.md
- website/i18n/zh-Hans/.../email-himalaya.md
Three state-loss bugs at the compression rotation boundary, fixed together
because they all live in the same ~80-line rotation block:
- #33618: a persistent /goal did not follow the rotation. load_goal does a
flat per-session lookup with no lineage walk, so a goal silently died when
compression minted a fresh child id. Added migrate_goal_to_session() and
call it after the child session is created (move-not-copy: the parent row
is archived as cleared so exactly one active goal row exists).
- #33906/#33907: if the child create_session raised (FK constraint,
contended write), the outer handler only warned and let the agent continue
on the NEW id — which has no row in state.db — producing an orphan session.
Now the rotation rolls agent.session_id back to the still-indexed parent
(reopening it) instead of stranding the conversation on a phantom id.
- #27633: the compaction-boundary on_session_start notification omitted the
platform kwarg, so context-engine plugins saw source=unknown for every
message after the boundary. Forward platform (matching the initial
session-start call in agent_init.py).
Co-authored-by: denisqq <21260182+denisqq@users.noreply.github.com>
Co-authored-by: zccyman <16263913+zccyman@users.noreply.github.com>
Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
The subagent-demotion busy-handler test asserted the internal
merge_pending_message_event call, which the FIFO refactor replaced with
_queue_or_replace_pending_event. Assert the behavioral outcome (the
follow-up lands in the pending slot for the next turn) instead — same
fix already applied to the two steer-fallback tests.
When the agent is busy and the user sends multiple text follow-ups, the
interrupt-mode and steer-fallback path stored them via
merge_pending_message_event(merge_text=True), which newline-joins
consecutive TEXT messages into a SINGLE pending turn — collapsing two
separate user messages into one mashed-together turn and destroying the
message boundaries the user sees (#43066 sub-bug 2).
Route that storage through _queue_or_replace_pending_event (the same FIFO
infrastructure used by busy queue-mode and /queue) so each follow-up gets
its own next-turn slot in arrival order, while still preserving
photo-burst / album merge semantics for media. Pure queue-mode already
used FIFO; this brings the interrupt/steer-fallback path in line.
The sibling defect in #43066 (assistant messages lost after compaction)
was already fixed on main by the identity-tracking flush rewrite (#46053)
plus the pre-rotation flush (#47202), so this only addresses the
remaining busy-message-merge half.
Co-authored-by: KiruyaMomochi <65301509+KiruyaMomochi@users.noreply.github.com>
doctor's npm audit hardcoded PROJECT_ROOT/scripts/whatsapp-bridge. In
read-only Docker installs the bridge deps live in the writable HERMES_HOME
mirror (#49561), so node_modules was never found there and the bridge audit
silently skipped. Resolve the dir through the shared
resolve_whatsapp_bridge_dir() helper so doctor audits where deps actually
install. Falls back to the install-tree path if the helper is unavailable.
On Windows, hermes writes writer.bat (@echo off / hermes -p writer %*)
with CRLF endings instead of the POSIX writer shell script. The test
hardcoded the POSIX path and exact bytes, so it failed on Windows hosts.
Assert on stripped non-empty lines per platform, making it line-ending-
and OS-independent.
Follow-up to the salvaged worktree-materialization fix. When a worktree
task has no explicit workspace_path, resolve the anchor from the board's
default_workdir (a git repo) and materialize <repo>/.worktrees/<id> per
task, instead of silently rooting under the dispatcher's CWD (whatever
directory launched the gateway, e.g. the Hermes checkout). If no
default_workdir is configured, raise with a clear message rather than
guessing from CWD.
Adds AUTHOR_MAP entry for the salvaged commit.
The dispatcher treated workspace_kind=worktree as metadata only and never
ran 'git worktree add', so every worktree task ran in the main repo checkout
instead of an isolated worktree — concurrent tasks silently shared one tree
and contaminated each other.
This materializes a real linked worktree at <repo>/.worktrees/<task_id> on
branch wt/<task_id> when resolve_workspace() handles a worktree task, treats a
repo-root workspace_path as shorthand for that location, persists the derived
workspace/branch back onto the task row, and — on rerun/redispatch — detects an
already-materialized linked worktree (via git-common-dir) and reuses it instead
of nesting a second .worktrees/<id> inside it.
The kanban-worker skill taught kanban_complete with three full examples but
never mentioned the artifacts=[...] parameter added in #27813 — so a worker
reading the skill had no way to learn it can ship a chart/PDF/image as a
native upload to the subscriber's chat.
Adds a 'Shipping deliverables' section covering absolute-path rules, the
inline-vs-file extension behavior, and the trap that the notifier reads the
top-level artifacts list (NOT metadata.*).
Follow-up for salvaged #49654: unit tests for resolve_whatsapp_bridge_dir()
(writable passthrough, read-only mirror, existing-mirror reuse) and the
AUTHOR_MAP entry for the contributor.
In Docker the install tree (/opt/hermes) is read-only, so npm install for
the WhatsApp bridge fails with EACCES. Add resolve_whatsapp_bridge_dir() in
whatsapp_common.py: when the install dir is read-only, mirror the bridge
source into a writable HERMES_HOME location and use that. Both the
adapter and the 'hermes whatsapp' CLI resolve through the shared helper so
the install and runtime paths agree.
Fixes#49561
Add a regression test for #47868 asserting convert_messages strips the
internal per-message timestamp field, plus the identity-return path for
timestamp-free message lists. Map x7peeps for the release attribution gate.
Per-message timestamp metadata injected by _apply_persist_user_message_override
leaks into the Chat Completions payload sent to the provider. Strict OpenAI-compatible
providers (e.g. Fireworks-backed endpoints like OpenCode Go 'glm-5.2', Mistral, Kimi)
reject this schema-foreign field with HTTP 400:
Extra inputs are not permitted, field: 'messages[0].timestamp'
The ChatCompletionsTransport.convert_messages already strips known internal-only
fields (tool_name, _-prefixed scaffolding keys, codex_reasoning_items, etc.) — add
timestamp to that list.
Closes#47868
On a Windows profile whose folder name contains a space (e.g. "First Last"),
Windows can expose %TEMP%/%TMP% as an 8.3 short path
(C:\Users\FIRST~1.LAS\AppData\Local\Temp). PowerShell's FileSystem provider
mishandles the "~1.ext" component when the path reaches a provider cmdlet such
as `Tee-Object -FilePath`, throwing:
An object at the specified path C:\Users\FIRST~1.LAS does not exist.
Every Node/Electron install+build stage streams its log to %TEMP% via
Tee-Object, so they all abort with that error (browser-tools npm, Playwright,
TUI npm, and the hard-failing desktop build), while the Python/uv stages --
which never write a side log to %TEMP% through a provider cmdlet -- succeed.
Normalize %TEMP%/%TMP% to their long form once, up front, so every downstream
cmdlet and child process sees a path the provider can resolve.
Fixes#39308
Adds a ConvertTo-LongPath helper to install.ps1 that expands a Windows 8.3
short path (e.g. C:\Users\FIRST~1.LAS) back to its long form via
Scripting.FileSystemObject. Paths without a "~<digit>" component are returned
unchanged (no COM round-trip), and any COM failure falls back to the input.
Adds an AST-loaded unit test that exercises the helper without executing the
installer body (pass-through, null/empty, and graceful fallback).
Add the in-window floating pet (sprite, speech bubble, contact shadow,
profile-scoped, resize-safe) and a pop-out always-on-top overlay window
with gestures and notifications. Add the Cmd+K pet picker page plus the
appearance gallery and size slider in settings. Includes the pet stores,
electron overlay wiring, i18n strings, and store tests.
Add the Ink pet sprite pane, the interactive /pet picker overlay, and live
pet switching/rescale driven by new tui_gateway RPCs (pet state, pet.scale,
per-state frames). Wires pet flash state and the picker into the TUI layout
and slash handler. Covered by the slash-handler test.
Render the reactive pet pane in the classic CLI (steady redraw,
right-aligned) and wire the /pet command to list and switch pets, plus an
enable/disable toggle. Backed by hermes_cli/pets.py and the CLI commands
mixin, registered in the central command registry. Covered by the CLI pet
pane and toggle tests.
Add the shared pet engine under agent/pet/: spritesheet manifest loading
and in-process caching, six-state animation model, frame rendering, and
the persistent pet store. Register the display.pet config block (pet,
scale, enabled, etc.) that every surface reads from. Covered by
tests/agent/test_pet_engine.py.
When the auxiliary summary call fails with an authentication/permission
error (HTTP 401/403), context compression now ABORTS and preserves the
session unchanged instead of rotating into a child session with a
placeholder summary.
Before: a 401 (invalid/blocked key, or a token pointed at the wrong
inference host) fell through every transient-error check to 'return
None', and because compression.abort_on_summary_failure defaults False,
compress() took the static-fallback path and rotated the session anyway
(messages N->N). The user landed on a fresh-but-broken session that kept
failing the same way — paying for a full-context API call each turn with
no useful compression.
After: _generate_summary classifies 401/403 as a non-recoverable auth
failure (_last_summary_auth_failure) and compress() aborts on it
regardless of abort_on_summary_failure. A distinct auxiliary summary_model
that 401s still retries once on the main model first (its dedicated creds
may be the only broken thing); the abort only sticks when the main model
itself auth-fails or the fallback also auth-fails. The existing
_last_compress_aborted handling in conversation_compression.py already
skips rotation and emits a warning, so no session rotation occurs.
Tests: TestAuthFailureAborts — 401/403 flagging, compress() aborts despite
flag=False, non-auth failures keep the historical fallback path, and
aux-model auth failure recovers on main without aborting.
When the active provider returns a 401/403 that survives its per-provider
credential-refresh attempt (revoked OAuth, blocked/expired key, or an
account pinned to a dead/staging inference endpoint), the conversation
loop now escalates to the configured fallback chain instead of dead-ending.
Before: the generic failover dispatch fired only for {rate_limit, billing};
auth/auth_permanent fell through to 'switch providers manually' advice and
never called _try_activate_fallback(). A user whose primary credential was
broken kept thrashing on the same dead credential every turn — the main
agent appeared 'stuck in fallback mode' while never actually failing over.
This also affected auxiliary tasks (compression, vision, title-gen), since
auto-resolved aux follows the main provider.
After: a persistent auth failure with a configured fallback chain switches
to the next provider (mirroring the rate-limit/billing failover path),
guarded one-shot per attempt by TurnRetryState.auth_failover_attempted.
When no fallback is configured the behavior is unchanged — it falls through
to the existing terminal handling and provider-specific troubleshooting
guidance.
Tests: test_auth_provider_failover.py — 401/403 classify as auth, the
gating condition fires only with a chain present + guard unset, the guard
blocks repeats, and non-auth (500) errors do not trigger auth failover.
* feat(delegation): single-task delegate_task always runs in the background
The model no longer decides whether a subagent runs in the background — a
single-task delegate_task from the top-level agent is now always dispatched
async, so the parent turn returns immediately and the subagent's result
re-enters the conversation when it finishes.
- run_agent._dispatch_delegate_task (the live model path) forces
background=True for top-level single-task calls; the schema-level
`background` param is ignored.
- A batch (tasks with >1 item) stays synchronous (fan-out can't go async).
- A delegation from an orchestrator subagent (depth > 0) stays synchronous —
it needs its workers' results within its own turn.
- The function-level default is unchanged, so direct Python callers/tests keep
the historical synchronous behavior.
- On async-pool capacity rejection, single-task now falls through to a
synchronous run instead of erroring (the child stays attached for interrupt
propagation; detach happens only on a successful dispatch).
- Schema `background` param marked deprecated/ignored; tool description
updated to state the always-background single-task rule.
* feat(delegation): all delegate_task fan-out runs in the background
Extend the always-background behavior to the full fan-out. A batch is now
dispatched as N independent async subagents (one handle each), instead of
running synchronously. Single task and batch both return immediately; each
subagent's result re-enters the conversation as its own message when it
finishes.
- delegate_task: when background is set, loop over ALL built children and
dispatch each via dispatch_async_delegation; return a combined handle block
(count + per-task delegation_ids). Children the async pool rejects (at
capacity) run synchronously inline and are reported alongside the dispatched
handles, so nothing is silently dropped.
- run_agent._dispatch_delegate_task + registry handler: force background for
any top-level model delegation (single OR batch); orchestrator subagents
(depth > 0) still run synchronously since they need workers' results within
their own turn.
- Removed the v1 'batch async not supported' rejection.
- Tool description updated: BOTH MODES RUN IN THE BACKGROUND.
- Tests updated to assert batch fan-out dispatches each task async (verified
E2E: 3-task batch -> 3 independent completion-queue events).
* fix(delegation): background fan-out joins and returns one consolidated block
Correct the fan-out semantics: a backgrounded batch is dispatched as ONE
async unit (one handle, one async-pool slot), not N independent dispatches.
The unit runs all children in parallel, waits on every one, and emits a
SINGLE completion event carrying the consolidated per-task results. The chat
is never blocked; when all subagents finish, their full summaries re-enter
the conversation together as one message.
- async_delegation.dispatch_async_delegation_batch + _finalize_batch: a batch
occupies one slot; its runner returns the combined {results:[...]} dict and
one event with the full results list is pushed to the completion queue.
- delegate_tool: extract the sync execution+aggregation into
_execute_and_aggregate(); background dispatches it via the batch unit and
returns one handle; on pool-capacity rejection it runs the batch inline.
- process_registry._format_async_delegation: render a consolidated multi-task
block (TASK i/N + per-task summary) when the event carries is_batch/results.
- Tests updated; E2E verified: 3-task batch -> immediate return -> one combined
completion block with all three summaries.
Async-delegation completions (delegate_task(background=true)) and
background-process completions (terminal notify_on_complete) re-enter the
originating session as internal MessageEvents. When the session was busy,
_handle_active_session_busy_message treated them like a user TEXT message and
the default busy_input_mode='interrupt' aborted the active turn (and sent a
'Interrupting current task' ack) — the opposite of the design invariant that a
completion surfaces as a new turn only when idle.
Short-circuit internal events to return False so the base adapter queues them
silently (it already excludes internal events from debounce), cascading them as
the next turn after the current one finishes.
Review nit (yoniebans): the config.py comment still said compaction is
'lossy: the pre-compaction transcript is discarded, matching Claude Code /
Codex' — leftover from the original destructive design. The shipped behavior
is soft-archive: lossy for the LIVE context (what the model reloads), but the
pre-compaction turns are kept on disk (active=0, compacted=1), searchable via
session_search and recoverable. Comment now says so. Comment-only; no behavior
change.
Follow-up to the soft-archive durability fix. Reusing the rewind/undo active=0
flag for compaction-archived turns inherited the wrong search semantics: undo
rows are intentionally HIDDEN from session_search (the user took them back), but
compaction-archived turns must stay DISCOVERABLE — that is the whole point of
Teknium's "searchable / recoverable" requirement. As built, search_messages
defaulted to WHERE active=1, so after in-place compaction the pre-compaction
turns were in the FTS index but filtered out of the default search. (The earlier
"searchable" claim only held for a raw FTS query / include_inactive=True, not
the actual session_search tool.)
Empirically confirmed the gap: search 'HMAC' returned 2 hits before compaction,
1 after (only the summary's mention) — the originals were hidden.
Fix — a `compacted` flag distinct from `active`, giving a 3-way state:
- active=1, compacted=0 → live context (normal)
- active=0, compacted=1 → compaction-archived: OUT of live context, IN search
- active=0, compacted=0 → rewind/undo: OUT of live context, OUT of search
Changes:
- messages.compacted INTEGER NOT NULL DEFAULT 0 added to SCHEMA_SQL. Declarative
_reconcile_columns adds it on existing DBs — no version bump (plain column add).
- archive_and_compact: UPDATE … SET active=0, compacted=1 (was active=0 only).
- search_messages: default WHERE active=1 → (active=1 OR compacted=1), on BOTH
the main FTS5 path and the trigram CJK path. include_inactive=True still
returns everything. The short-CJK LIKE fallback already returns all rows
(no active filter) — unchanged.
- Docstrings on archive_and_compact + search_messages document the 3-way state.
Verified: after compaction, session_search default finds the archived originals
(ids 1 & 4); rewind/undo rows stay hidden by default (recoverable via
include_inactive); live context still excludes both. 322 in-place + hermes_state
tests and 46 session_search tests green; ruff clean. Mutation check: reverting
the search WHERE to active-only fails the new searchable test.
(Surfaced by the question "is search semantic or only FTS?" — answer: session
search is FTS5 keyword/BM25 only, no embeddings over the transcript; semantic
retrieval lives in the optional memory-provider layer. Tracing that confirmed
the active-only filter gap above.)
Teknium review: keeping one durable session id must NOT come at the cost of
destroying history. The prior in-place implementation used replace_messages,
which hard-DELETEs the pre-compaction turns (they also drop out of the FTS
index) — same id, but the original conversation is gone with no recovery path
and the summary becomes the only record. Rotation today is non-destructive
(the old session's full transcript survives under the old id); in-place must
match that durability contract, not weaken it.
Fix: compact in place by SOFT-ARCHIVING, reusing the existing messages.active
flag (the /undo soft-delete mechanic), instead of deleting:
- New SessionDB.archive_and_compact(session_id, compacted): in one atomic
write, UPDATE messages SET active=0 on the live turns, then insert the
compacted set as fresh active=1 rows. Nothing is deleted.
- The insert loop is extracted into a shared _insert_message_rows() helper so
archive_and_compact and replace_messages don't duplicate the 60-line
column/encoding block (extend-don't-duplicate).
- Agent in-place branch calls archive_and_compact instead of replace_messages.
Durability outcome (proven by test + E2E across repeated compactions):
- Live context load (get_messages_as_conversation / get_messages) filters
active=1, so a resume reloads ONLY the compacted set — compaction still
shrinks the live session.
- The pre-compaction turns stay on disk at active=0, recoverable via
get_messages(include_inactive=True) / restore_rewound.
- They remain FTS-searchable: the messages_fts* triggers index on INSERT and
remove on DELETE only — they do NOT key on active, and active=0 is a
content-preserving UPDATE. session_search still finds them.
- Verified across TWO successive compactions: the 1st compaction's originals
are still recoverable + searchable after the 2nd (answers the "no recovery
path after the next compaction" concern directly).
message_count now reflects the LIVE (active/compacted) count, matching the
live load. replace_messages keeps its DELETE semantics (still correct for
/retry, /undo) and gains a docstring note pointing compaction at the
non-destructive method.
Tests: test_in_place_keeps_same_session_id strengthened to assert the 8
seeded originals survive at active=0 alongside the 2 compacted rows AND stay
FTS-searchable. Mutation check: swapping archive_and_compact back to a hard
DELETE fails the test, so the non-destructive contract is bound. 285
hermes_state + in-place tests green; rotation/persistence/compress-command/cli
suites green; ruff clean.
Parallel 3-reviewer cleanup of the in-place compaction code. Findings applied:
- perf: in-place mode no longer pre-flushes current-turn messages. The flush
ran INSERTs that the immediately-following replace_messages(compressed)
DELETE+reinsert discarded -- pure wasted writes per compaction. The
current-turn tail survives via the compressor's compressed output
(protect_last_n), not the flush. Verified no data loss; rotation still
pre-flushes (its old session row is preserved, so the flush is real there).
- quality: hoist the two shared post-write steps (update_system_prompt +
_last_flushed_db_idx = 0) below the if/else -- they ran in both branches
against agent.session_id. Removes the easiest divergence bug.
- quality: compute the compaction-boundary locals (_old_sid, _is_boundary,
_boundary_parent) ONCE instead of recomputing locals().get('old_session_id')
and the "_old_sid or agent.session_id or ''" chain three times.
- quality: initialize compacted_in_place up front and assign
agent._last_compaction_in_place directly, dropping the fragile
locals().get('compacted_in_place') reflection.
- reuse: parse the in_place config flag with utils.is_truthy_value (the
project's canonical truthy coerce) instead of a hand-rolled
str().lower() in {...} (agent_init already imports from utils).
Dropped as false positives / out of scope: gateway getattr of agent internals
(established session_id pattern), dual result-dict carry (mirrors history_offset
etc.), stringly-typed "compression" (codebase-wide convention, no constant).
Behavior-preserving: 7 in-place tests (incl. 2 new flush-guard tests) + 26
rotation/boundary/persistence/command tests green; mutation check confirms the
durable-replace guard still binds (removing replace_messages fails the test);
ruff clean. Added test_in_place_skips_redundant_preflush /
test_rotation_still_preflushes to guard the perf change.
Review (Codex + 3-agent parallel) found the first cut of in-place mode was
incomplete: it only updated the system prompt, so the persisted transcript
stayed 'full history + summary' and the next turn/resume reloaded the full
history and immediately re-compacted (a loop), and every downstream layer
that keyed off session-id rotation silently no-op'd. The session_id was
doing double duty as the 'compaction happened' signal. This wires the whole
path so removing rotation is actually complete:
Agent (agent/conversation_compression.py):
- In-place now DURABLY replaces the transcript: replace_messages(session_id,
compressed) on the same row (the canonical store the gateway reloads from),
not just update_system_prompt. Resume reloads the compacted set; no loop.
- Reset flush identity/cursor (_last_flushed_db_idx=0, _flushed_db_message_ids
cleared) so next-turn appends diff against the compacted transcript.
- Expose a rotation-independent signal: agent._last_compaction_in_place, and
in_place=True on the session:compress event.
- Fire the compaction-boundary hooks (context-engine on_session_start, memory
manager on_session_switch, reason='compression') in BOTH modes — in-place
passes the same id as parent so DAG/buffer state still checkpoints. Without
this, memory/context plugins miss every in-place compaction.
Gateway auto-compress (gateway/run.py):
- Read agent._last_compaction_in_place; set history_offset=0 on rotation OR
in-place (both return the compacted set, so slicing past the pre-compaction
length would drop everything). Carry compacted_in_place in the result dict.
- No extra rewrite needed: the agent shares the gateway's SessionDB, so its
replace_messages already updated the canonical store load_transcript reads.
Manual /compress (gateway/slash_commands.py):
- The throwaway /compress agent has no _session_db, so rewrite_transcript is
the durable write. Previously gated behind 'if rotated:' which treated
'id unchanged' as the #44794 data-loss failure case and SKIPPED the rewrite
— making /compress a silent no-op in in-place mode. Now rewrites on rotated
OR in_place; the data-loss guard still fires only for the genuine
no-rotation-AND-not-in-place failure.
Hygiene auto-compress already writes _compressed to the same id
unconditionally (its agent has no _session_db, can't rotate) — correct for
in-place, no change.
Tests (tests/run_agent/test_in_place_compaction.py):
- Assert the DURABLE transcript IS the compacted set after reload
(get_messages_as_conversation == compacted), message_count==2, flush
identity reset, and the rotation-independent signal set on in-place /
unset on rotation. Rotation regression guard unchanged.
Verified: 64 tests green across in-place + rotation/persistence/boundary/
concurrent/failure-sync/command/cli suites; E2E both modes (durable replace,
gateway offset=0, rotation preserves old transcript); ruff clean. Still
default-off.
Context compression today rewrites the message list AND rotates the
session id — it ends the session, forks a parent_session_id child, and
renumbers the title (name -> name #2). That moving identity key is the
root cause of a whole bug cluster: /goal lost (#33618), pending response
lost at the split (#14238), orphan sessions (#33907), TUI sid desync
(#36777), FTS search gaps + duplicate sidebar entries (#45117), null
continuation cwd (#42228), and title-rename dead-ends (#48989). It also
forced a large defensive apparatus (compression lock, contextvar/env/
logging triple-sync, orphan finalization, gateway SessionEntry
re-propagation, tip projection) whose only job is surviving a
mid-conversation id change.
Add a compression.in_place config flag (default False during rollout).
When True, compaction rewrites the transcript and rebuilds the system
prompt but keeps the SAME session_id: no end_session, no child row, no
title renumber, no contextvar/logging re-sync, no memory/context-engine
session-switch. The conversation keeps one durable id for life, like
Claude Code / Codex. Compaction is lossy by design — the pre-compaction
transcript is summarized away, not archived.
The rotation path is unchanged when the flag is off (moved verbatim into
an else branch). Staged rollout: this PR ships the option behind a
default-off flag for live validation; a follow-up flips the default and
deletes the now-redundant rotation machinery, superseding the 14 open
band-aid PRs in this area.
- hermes_cli/config.py: add compression.in_place (default False), documented
- agent/agent_init.py: resolve the flag -> agent.compression_in_place
- agent/conversation_compression.py: branch compress_context() on the flag
- tests/run_agent/test_in_place_compaction.py: in-place invariants +
rotation regression guard + config default
The pre-flush of current-turn messages (#47202) runs in BOTH modes, so no
boundary data loss. Prompt-cache invariant preserved: the system-prompt
rebuild is the same single sanctioned invalidation that already happens
during compaction — no NEW invalidation. Message alternation preserved.
A nous inference_base_url that fails the host allowlist (e.g. a stale
stg-inference-api.nousresearch.com persisted before the allowlist
existed) was only replaced 'if refreshed_url:' — so when the validator
rejected the URL it left the poisoned value in place. The 'falling back
to default' warning fired but never took effect: every subsequent call,
including the auxiliary compression call, kept hitting the dead staging
endpoint and 401'd.
Reset to DEFAULT_NOUS_INFERENCE_URL when validation returns None at both
refresh sites in resolve_nous_runtime_credentials, so a poisoned
auth.json self-heals on the next refresh. The proxy adapter already did
this correctly; this brings the two auth.py sites in line.
The session-stable system prompt embeds Model:/Provider: identity lines,
but mid-turn failover (try_activate_fallback) swaps the runtime without
touching them, so a fallback model misreports itself as the primary when
asked "what model are you?".
rewrite_prompt_model_identity() rewrites the last occurrence of each line
on _cached_system_prompt when a fallback activates (and back on restore,
byte-identical so the primary's prefix cache still hits). The rewrite is
never persisted to the session DB. _sync_failover_system_message() patches
the in-flight api_messages[0] at all 8 failover sites so the current turn
ships the corrected identity. Cache-safe: the fallback's prefix cache is
cold on a model switch anyway.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
* feat(setup): Blank Slate setup mode — minimal agent, opt in to everything
Adds a third first-time setup option alongside Quick Setup and Full Setup.
Blank Slate forces ON only what an agent needs to run — provider & model,
the File Operations toolset, and the Terminal toolset — and turns
everything else OFF, then walks the user through opting each capability
back in.
What it does:
- platform_toolsets.cli = [file, terminal] (explicit, authoritative list)
- agent.disabled_toolsets = every other known toolset (web, browser,
code_execution, vision, memory, delegation, cronjob, skills, image_gen,
kanban, …). Applied last in the resolver, so it overrides the
non-configurable platform-toolset recovery that would otherwise re-add
toolsets like kanban — guaranteeing a true blank slate.
- Optional config features off: compression, memory + user-profile capture,
checkpoints, smart model routing, auto session reset.
- Bundled skills default to NONE (reuses the .no-bundled-skills marker);
offers to seed the full catalog.
- Walks through tools / plugins / MCP / messaging, all opt-in.
Proven end-to-end: with the Blank Slate config, model_tools.get_tool_definitions
emits exactly 6 schemas — patch, process, read_file, search_files, terminal,
write_file. Nothing else reaches the model.
Re-enable later via hermes tools / hermes skills opt-in --sync /
hermes setup agent.
Tests: tests/hermes_cli/test_setup_blank_slate.py (8 tests) pin the writers,
the resolver invariant ({file, terminal}), and the 6-schema end-to-end set.
Docs: getting-started/quickstart.md documents all three setup modes.
* feat(setup): Blank Slate fork — finish minimal, or walk through configs
After applying the minimal baseline (provider/model + file + terminal,
everything else off), Blank Slate now presents a choice instead of always
running the full walkthrough:
1. Start with everything disabled — finish now with the minimal agent.
2. Walk through all configurations — opt in to tools, skills, plugins, MCP,
and messaging.
Provider/model and terminal are still configured first either way (the agent
can't run without them). The finish-now path records the bundled-skill opt-out
so future `hermes update` runs don't re-inject skills. The walkthrough body
moved to a separate _blank_slate_walkthrough() helper.
Tests: TestBlankSlateFork covers both branches (finish-now applies baseline +
skill opt-out and skips the walkthrough; walkthrough path invokes it). Docs
updated to describe the fork.
test_telegram_webhook_secret reads telegram adapter source by path; point it
at plugins/platforms/telegram/adapter.py. test_windows_native_support
npm-spawn parametrization referenced gateway/platforms/whatsapp.py; point it at
plugins/platforms/whatsapp/adapter.py.
Salvage of PR #41284 onto current main. Relocates the last 9 inline messaging
adapters (+ satellites: telegram_network, feishu_comment/_rules/meeting_invite,
wecom_crypto, wecom_callback) from gateway/platforms/ into self-contained
bundled plugins under plugins/platforms/<x>/, discovered via the platform
registry. Strips the per-platform core touchpoints from gateway/run.py,
gateway/config.py, hermes_cli/gateway.py, hermes_cli/setup.py, and
tools/send_message_tool.py.
Carries forward the migration fixes (explicit enabled:false honored,
get_connected_platforms forces discovery, plugin is_connected via
gateway.get_env_value, logs --component gateway matches plugins.platforms.*,
matrix hidden on Windows).
Additionally ports config keys main added since the PR base: the matrix
plugin's _apply_yaml_config now also covers allowed_users,
ignore_user_patterns, process_notices, and session_scope (the inline
gateway/config.py matrix block gained these in the 1340 commits the PR sat
open; they would otherwise have been silently dropped on deletion).
`_sent_message_timestamps` (the reply-to-own-message quote cache) used a
`set` evicted with `set.pop()`, which removes an ARBITRARY element — so once
more than the cap (500) outbound timestamps are tracked, a still-recent
timestamp could be dropped while older ones survive, missing a genuine
reply-to-own-message. Convert it to an OrderedDict with FIFO (oldest-first)
eviction, mirroring the recently-hardened echo ring (#31250). This closes the
same bug class on the sibling cache.
Adds a regression test asserting oldest-first eviction + MRU promotion.
The hardened echo ring (#31250) changes _recent_sent_timestamps from a set
to an OrderedDict, so the reply-detection-cache regression test from the quote
salvage can no longer call .discard(); route it through the new
_consume_sent_timestamp() helper, which is the real echo-removal path.
Review follow-up on the salvaged self-mention strip (#31217): the original
only stripped the bot's rendered @<number>/@<uuid> self-mention inside the
`require_mention=true` branch, so groups with require_mention=false still
leaked it into the agent text. Hoist the strip to run for every group message
(fixing the whole bug class), and collapse the doubled space a mid-sentence
removal leaves while preserving intentional newlines.
The contributor-check CI auto-resolves only the +id form of GitHub noreply
emails; lkz-de's commits use the legacy plain form
(lkz-de@users.noreply.github.com), so add an explicit AUTHOR_MAP entry.
Widen the env_float() guard from #48735 across the whole bug class: a
non-numeric value (e.g. a stale .env "HERMES_API_TIMEOUT=abc" or a typo'd
port) raised an unhandled ValueError and crashed adapter/agent init.
Converts 22 genuinely-unguarded first-party int/float(os.getenv()) sites to
the canonical utils.env_int / utils.env_float helpers (the established house
pattern), instead of duplicating per-module helpers or inline try/except:
- gateway/config.py: WECOM_CALLBACK_PORT, BLUEBUBBLES_WEBHOOK_PORT
- gateway/platforms/email.py: EMAIL_IMAP/SMTP_PORT, EMAIL_POLL_INTERVAL
- gateway/platforms/feishu.py: dedup cache + text/media batch settings
- gateway/platforms/wecom.py, discord/adapter.py: text batch delays
- gateway/platforms/telegram.py: media batch delay, TELEGRAM_WEBHOOK_PORT
- gateway/platforms/whatsapp.py: WHATSAPP_NPM_INSTALL_TIMEOUT
- hermes_cli/auth.py: CODEX/XAI refresh timeouts
- agent/chat_completion_helpers.py: API/stream read/stale timeouts
- run_agent.py, agent/auxiliary_client.py: API + nous timeouts
Sites already guarded by try/except or local helpers are left untouched.
The HERMES_MAX_ITERATIONS sites are already guarded on main via
_current_max_iterations(), so they are not included.
Verify createLinkTitleWindow mutes audio (regression guard for #49505) and
keeps the hardened offscreen defaults, and register the new test file in the
desktop platforms test script.
Tier-2 link-title resolution loads the URL in an offscreen BrowserWindow to
read its <title> when curl can't. That window was never muted, so pages that
autoplay media (e.g. YouTube `watch` URLs) leaked ~2s of audio every time a
session containing such links was re-rendered. Move the window creation into a
dedicated helper that calls `webContents.setAudioMuted(true)` immediately after
construction, so the offscreen probe can never emit sound.
Fixes#49505
Review follow-up on the salvaged AAC + markdown changes:
- Fix an inaccurate comment claiming the STT layer has a sniff-and-remux
fallback (verified: no such fallback exists; the ffmpeg-absent path caches
raw ADTS and STT may reject it).
- Type the _markdown_to_signal wrapper as tuple[str, list[str]] to match the
shared helper instead of a bare tuple.
- Replace the hardcoded /home/pi/... test fixture with a runtime-generated
ADTS AAC sample so the remux round-trip actually runs in CI (skips only
when ffmpeg is absent) instead of always-skipping.
Mirrors the existing env_int() helper: returns the default when the
variable is unset or non-numeric instead of raising ValueError. Used by
the follow-up commit to guard malformed float env vars across the gateway.
Salvaged from #48735 (@annguyenNous). The PR's api_server.py change is
now redundant — main guards HERMES_MAX_ITERATIONS via
_current_max_iterations().
Android Signal delivers voice notes as raw ADTS AAC frames, which
share the `0xFF 0xFx` sync word with MPEG-1/2 Layer 3 (MP3). The
`_guess_extension` byte-signature test in gateway/platforms/signal.py
was matching both, so ADTS AAC was being misclassified as MP3 — saved
to disk with the wrong extension and rejected by every major STT API
(Groq, OpenAI) because their server-side format sniffers inspect the
actual codec, not the file extension.
Two changes:
1. Tighten the MP3 vs ADTS disambiguator. ADTS packs `ID`,
`layer`, and `protection_absent` into bits 3-0 of byte 1, where
`ID=0` and `layer=00` for AAC. Real MP3 has `ID=1` and
`layer` in {01, 10, 11}. The mask `0xF6` against target `0xF0`
cleanly separates them.
2. Remux raw ADTS AAC to MP4 container at the cache step via
`ffmpeg -c:a copy`. Single demux/remux, no re-encode, no quality
loss, sub-100ms on a Pi 5. The cached file is a normal `.m4a`
that all major STT providers accept. ffmpeg is a transitive
dependency of many other Hermes features (TTS, video skills) so
this isn't a new install requirement; the remux degrades
gracefully to a no-op if ffmpeg is missing.
The new helper `_remux_aac_to_m4a` is unit-tested with a real
Android voice note from the audio cache that originally triggered
the bug, plus synthetic ADTS frames for the byte-level
disambiguator and garbage-input graceful failure.
Closes the gap that broke transcription for any Android Signal user
sending voice messages to Hermes.
Route Signal send paths through shared markdown formatting helpers and render markdown bullets consistently as Unicode bullets. Add coverage for Signal formatting and send_message integration.
* feat(desktop): pop the composer out into a draggable floating window
Gesture-driven: drag the docked composer up to peel it out, drag it back to
the bottom-center dock zone (radial glow ramps with proximity) to redock, and
double-click the grab area to toggle. Floating composer is compact, grows
upward as it wraps, and can be moved by its 5px transparent grab platform
(diagonal hatch on hover). Position + popped state persist; secondary windows
always start docked. rAF-coalesced drag, persisted only on release.
* fix(desktop): keep floating composer radius consistent with docked
* fix(desktop): composer popout polish — peel-off placement, panels, chip editing
- Peel-off undock drops the floating composer under the cursor (centered
horizontally, preserving the vertical grab offset) instead of snapping to
the docked corner.
- Unify the / · @ · ? completion drawer and the attach (+) menu onto one
shared glassy panel primitive (composerPanelCard): smallest theme font,
hairline border, nous shadow; floats off the composer, inset from the left.
- Directive chips: Backspace removes the chip + its auto-inserted trailing
space atomically (no orphaned space), and a phantom trailing block left by
contenteditable no longer falsely expands the composer to two rows.
- Model picker: scroll area capped at max(150px, 30dvh); footer rows aligned
(matching icons, dropped a redundant margin).
- Composer focus shifts the border ~15% toward foreground (no fill change);
input is cursor-text; trimmed control icon/button sizes.
- Peel-off undock drops the floating composer under the cursor (centered
horizontally, preserving the vertical grab offset) instead of snapping to
the docked corner.
- Unify the / · @ · ? completion drawer and the attach (+) menu onto one
shared glassy panel primitive (composerPanelCard): smallest theme font,
hairline border, nous shadow; floats off the composer, inset from the left.
- Directive chips: Backspace removes the chip + its auto-inserted trailing
space atomically (no orphaned space), and a phantom trailing block left by
contenteditable no longer falsely expands the composer to two rows.
- Model picker: scroll area capped at max(150px, 30dvh); footer rows aligned
(matching icons, dropped a redundant margin).
- Composer focus shifts the border ~15% toward foreground (no fill change);
input is cursor-text; trimmed control icon/button sizes.
delegate_task has never exposed a per-call model parameter (removed
intentionally in fb0f579b1). The tool description gave no hint about how
subagent model is actually controlled, so users kept expecting a model
arg and filing it as a dropped/ignored param (e.g. #49332, #23467).
Add one bullet to the dynamically-built tool description stating that
children inherit the parent model + fallback chain, and that pinning all
subagents to a specific model is done via delegation.provider /
delegation.model in config.yaml. No behavior change.
Gesture-driven: drag the docked composer up to peel it out, drag it back to
the bottom-center dock zone (radial glow ramps with proximity) to redock, and
double-click the grab area to toggle. Floating composer is compact, grows
upward as it wraps, and can be moved by its 5px transparent grab platform
(diagonal hatch on hover). Position + popped state persist; secondary windows
always start docked. rAF-coalesced drag, persisted only on release.
The raft platform plugin's check_raft_requirements() logged a WARNING every
time it returned False. Since check_fn is called on every load_gateway_config()
(~every 10s during normal gateway operation), users who don't have the raft
CLI installed get their logs flooded with no way to suppress it — hermes plugins
disable doesn't work for bundled platform plugins, and platforms.raft.enabled:
false doesn't gate the check_fn call.
Fix: make check_raft_requirements() a silent predicate (return True/False
only, no logging), matching the convention documented and used by other
platform adapters (e.g. teams/adapter.py). The caller in
gateway/platform_registry.py create_adapter() already emits its own warning
when requirements aren't met and an adapter is actually requested — that's the
correct place for a user-facing warning (fires once per connect attempt, not
once per config load).
Fixes#49234
When a tool call itself restarts the gateway (docker restart, systemctl
restart, and similar), the process is terminated mid-call — before the
tool result is persisted and before the orderly drain rewind can run. The
transcript tail is left as an assistant(tool_calls) with no matching tool
answer. On resume the model re-issues the unanswered call, taking the
gateway down again — an infinite loop (#49201).
Source fix: _build_gateway_agent_history now strips a trailing
assistant(tool_calls) block that has no tool answers
(_strip_dangling_tool_call_tail), so there is nothing for the model to
re-execute. This complements _strip_interrupted_tool_tails, which only
handles the case where a tool result row exists with an interrupt marker.
Cognitive backstop: the resume-pending system note now states that any
restart command in the history already ran and must not be re-executed or
verified, and the empty-message auto-resume startup turn reports recovery
and asks for instructions instead of the nonsensical "address the user's
NEW message" (there is no new message on that turn).
Reimplements the intent of #49243 by @JoaoMarcos44 at the replay layer.
Fixes#49201
_build_fal_payload and _build_fal_edit_payload assemble the request and then
filter it down to the model's supports / edit_supports whitelist. That filter
also covers prompt (and image_urls for edits), which every FAL endpoint
requires. Today all model configs happen to list those keys, but a single
config that omits one would silently produce a request with no prompt or no
source images — a broken generation with no error.
Always keep the mandatory keys regardless of the whitelist so a missing
whitelist entry can only drop optional knobs, never the prompt or the images.
Remote displays (RDP/SSH/X11) silently disable GPU hardware acceleration with
only a console.log, leaving the user unaware that software rendering is
active. Expose the detected reason over IPC and surface a dismissible banner
in the renderer.
The god-file Phase 4 refactor (094aa85c37) moved agent construction into
CLIAgentSetupMixin, which set the atexit shutdown reference with a bare
`global _active_agent_ref`. After extraction that global binds the *mixin
module's* namespace, not cli.py's. cli._run_cleanup reads
cli._active_agent_ref to decide whether to fire the memory provider's
on_session_end hook — and it stayed None for the whole session, so the
`if _active_agent_ref:` branch was dead and on_session_end never ran on
/exit. Custom memory providers silently lost end-of-session extraction.
Fix: publish the reference onto the cli module explicitly
(`import cli as _cli; _cli._active_agent_ref = self.agent`), using the
deferred-import pattern already established in the mixin.
Regression test asserts cli._active_agent_ref is populated by the mixin's
publish line and guards against a relapse to the bare `global` form. The
existing shutdown tests passed only because they hand-assigned the ref,
which is exactly what masked this.
Makes the CLI memory-provider shutdown path observable: log when CLI
cleanup calls memory shutdown (with session id + message count), warn
instead of swallowing CLI memory-shutdown exceptions, warn on
on_session_end failures during agent shutdown, and raise the
MemoryManager provider-hook failure log from debug to warning with a
traceback.
Salvaged from PR #49287 (authored by Gille / @helix4u).
Teams overrode send_image/send_image_file but not send_video, send_voice,
or send_document — so when the gateway dispatched a video/voice/document
reply to a Teams chat it fell through to the base-class text fallback and
sent the local file path as plain text (same broken-UX class as the LINE
URL-image gap in #49298).
Extract the existing send_image attachment logic into a shared
_send_media_attachment helper (remote URL by reference, local file as a
base64 data URI, MIME guessed from the path) and route all four media
kinds through it. 5 new tests cover remote-URL, local-file base64,
no-app, and missing-file paths.
* docs: clean up three stale comments from the #32848 audit
- tools/memory_tool.py:20 — 'read' action was intentionally removed
but the docstring still listed it. Now matches the schema.
- tools/fuzzy_match.py:9 — unicode_normalized was added but the
chain-count docstring still said '8-strategy'. Now says '9'.
- run_agent.py:1485 — 'See #<TBD>.' placeholder was never filled in.
Replaced with a backfill note.
Fixes#32848 (parts 3, 4, and 12)
* docs(memory): also remove stray memory(action=read) references in lines 144 and 201
The original #32848 audit fix (in 6fd661d6) only addressed line 20
(the action list in the module docstring), but the action was
referenced in two other places:
- tools/memory_tool.py:144 — in a class docstring, claimed
'memory(action=read)' was a way to SEE poisoned entries
- tools/memory_tool.py:201 — in a user-facing warning message,
told the user to 'use memory(action=read) to inspect'
Since the schema on line 683 only allows add/replace/remove, both
references were misleading: the first claimed a way to inspect
poisoned entries that doesn't exist, the second would error out
when the user followed the warning.
This commit removes both references:
- Line 144: '...keep the original text so the user can still SEE
poisoned entries by inspecting the source files directly, and
remove them — silently dropping them would hide the attack
from the user.'
- Line 201: '...use memory(action=remove) to delete the
original. (drop the read-action reference)'
Followup to the previous commit on this branch.
---------
Co-authored-by: KeyArgo <keyargo@argobox.com>
The optional-skills copy was still the v1.0.0 constraint-dispatch skill
(SKILL.md + full-prompt-library.md only). This brings it up to the current
tool: a situation-routed library of 22 named ideation methods drawn from
working artists, scientists, designers, and writers.
SKILL.md becomes a 4-step router (extract PHASE/DOMAIN/SPECIFICITY signals
→ apply overrides → route phase-then-domain → resolve ambiguity), with
anti-slop operating rules and an anti-default check.
Adds:
- 22 method files under references/methods/ — oblique-strategies (Eno/Schmidt),
oulipo, scamper, lateral-provocations (de Bono), triz (Altshuller),
leverage-points (Meadows), pattern-languages (Alexander), compression-progress
(Schmidhuber), analogy-and-blending, pataphysics, first-principles, polya,
biomimicry, volume-generation, creative-discipline, premortem-and-inversion,
defamiliarization, derive-and-mapping, affinity-diagrams, jobs-to-be-done,
story-skeletons, chance-and-remix. Each: when/when-not, the actual
cards/principles/operators, a procedure, a worked example, anti-slop notes.
- references/method-catalog.md (index + when-to-use), heuristics.md (extended
decision tree), anti-slop.md (rules applied to every output), exercises.md
(time-boxed exercises).
- full-prompt-library.md restructured into domain-affinity sections (general /
software / physical / social / lists) so the no-direction default isn't
developer-biased.
Frontmatter: name aligned to directory slug (creative-ideation, folding in
the fix from #18084); version 2.0.0→2.1.0; platforms field preserved.
Original wttdotm-derived constraint dispatch is kept as the default path.
Supersedes #19295 (which targeted the pre-move skills/ path).
Co-authored-by: SHL0MS <SHL0MS@users.noreply.github.com>
Simplify pass on the picker-persist coverage:
- Stub list_picker_providers + resolve_display_context_length so the
tests no longer make real outbound HTTP calls (OpenRouter catalog +
Ollama /api/show) during picker setup and confirmation rendering.
Runtime drops from ~11s to ~0.4s and the tests are now deterministic.
- Collapse the two positive persist cases into one parametrize over the
config seed (nested-dict vs flat-string), asserting the nested-dict
invariant in both.
- Assert the in-memory session override is applied in the --session
case, closing a 'passes for the wrong reason' gap (config untouched
AND the switch still took effect).
- _FakePickerResult -> types.SimpleNamespace.
Mutation re-checked on the final test: both persist cases fail on
pre-fix slash_commands.py; the --session case passes on both.
Add regression coverage for the picker persist fix: drive the real
_handle_model_command with a fake picker-capable adapter that captures
the on_model_selected callback, fire a 'tap', and assert config.yaml is
written (bare /model), left untouched (--session), and that a flat-string
model: is coerced to a nested dict on a tap.
Mutation-checked: the persist and coercion assertions fail on pre-fix
slash_commands.py and pass on the fix.
#49066 made /model text and the CLI picker persist to config.yaml by
default, but the gateway (Telegram/Discord/Matrix) inline-keyboard picker
callback stayed session-only. Mirror the text path's persist block so a
tapped model survives across launches like a typed one.
Behavior-preserving cleanups on the managed-node resolver:
- Hoist _candidate_node_command_names() out of the inner dir loop in
find_hermes_node_executable (computed once, not per directory).
- Drop redundant os.environ.copy() at the two with_hermes_node_path(
os.environ.copy()) sites \u2014 the helper already copies os.environ when
called with no argument (verified env-equivalent).
- Add reciprocal keep-in-sync comments between iter_hermes_node_dirs()
(hermes_constants.py) and hermesManagedNodePathEntries() (electron
main.cjs), which mirror the same platform-ordering rule across the
Python/Node boundary.
The `hermes update` desktop-rebuild gate still used a bare
`shutil.which("npm")` presence check. On a Windows box where the only
working npm is the Hermes-managed npm.cmd (not on PATH), the gate would
skip the desktop rebuild even though _build_web_ui / cmd_gui can now find
it via find_node_executable. Route the gate through the same resolver for
full bug-class coverage.
Surfaced during review of #49239.
Auto-generated session titles already rename the Telegram forum topic via
the title_callback path, but the /title command only wrote the session
title to the database. On a Telegram topic lane the visible topic kept its
auto-assigned name, so a user who ran /title to override it saw no change.
Propagate the user-chosen title to the topic by calling the existing
_schedule_telegram_topic_title_rename helper on a successful /title set. It
already no-ops off Telegram topic lanes and when auto-rename is disabled.
MCP Streamable HTTP servers that garbage-collect idle sessions on a short
TTL (e.g. Unreal Engine's editor MCP, ~15s) were unusable: the keepalive
was hardcoded at 180s, so the session was always dead by the time it ran,
and every idle tool call then landed on an expired session and paid the
full reconnect path (observed hangs of 113-143s until interrupt, bounded
only by the 300s tool_timeout).
Two coordinated, backward-compatible changes:
- Add per-server `keepalive_interval` (config.yaml, not an env var per the
contribution rubric). Default 180s — byte-identical to the old hardcoded
value when unset — floored at 5s. Servers with short session TTLs set it
below their TTL so the session stays warm.
- Switch the keepalive probe from `list_tools()` to `ping` (the MCP base
protocol liveness primitive). On large servers `list_tools` pulled ~1 MB
every cycle (830 tools = 1,068,041 bytes); `ping` is ~55 bytes and works
uniformly across tool/prompt/resource servers. Tool-list changes still
arrive out-of-band via notifications/tools/list_changed -> _refresh_tools.
`ping` is an OPTIONAL utility, so to guarantee zero regression for a
tool-capable server that doesn't implement it: the first -32601 latches
`_ping_unsupported` and the probe falls back to the pre-ping `list_tools`
path for that connection (no reconnect loop). The latch resets on each
fresh connection (_discover_tools, all transport paths) so a server that
gains ping support after a reconnect is re-probed with the cheap path.
Non-(-32601) ping errors propagate as genuine liveness failures.
Verified end-to-end against a live Unreal MCP server (idle 22s past the
~15s TTL -> post-idle tool call returns in 0.31s, no teardown) and with a
simulated ping-less tool server driving the real keepalive loop (ping once,
list_tools thereafter, no reconnect). 25/25 unit tests pass.
Note: a separate upstream defect (modelcontextprotocol/python-sdk#2604)
still tears down the whole session when one tool-call POST returns 4xx;
that is not addressed here.
* fix(discord): hydrate channel context when replying to a message
Replying to a message in a free-response (non-mention, threads-off)
channel previously received only the 500-char "[Replying to: ...]"
snippet — the history-backfill gate fired only for mention-gated
channels and threads, so a reply got no surrounding channel context.
Replies now route through the same _fetch_channel_context hydration
that threads use. When the user replied to a specific (often older)
message, a reply-anchored window is scanned ending at that message so
the agent sees the exchange around what was pointed at, even when the
target sits before the self-message partition. The two windows are
merged chronologically and de-duplicated by message id.
Also hardens the recent-window scan to skip non-conversational status
bumps before the self-message partition check, and makes author-name
resolution defensive against partial/deleted authors.
* fix(discord): duck-type reply-target resolution instead of isinstance(discord.Message)
The e2e suite stubs the discord module, so discord.Message is a MagicMock
and isinstance(_resolved, discord.Message) raises 'isinstance() arg 2 must
be a type'. Any object with an int .id works as a scan anchor, so resolve
the reply target by duck-typing on .id and fall back to a _Snowflake from
the reference message_id.
The cron-script subprocess is now sanitized alongside shell/MCP/
code-exec children; §2.3 listed only the original three. Makes the
_run_job_script docstring's §2.3 citation fully accurate.
Follow-up to salvaged PR #49207.
Matches the env= callsite convention at the other sanitized
subprocess spawns (cua_backend dict(os.environ), gateway
os.environ.copy()). Functionally equivalent — _sanitize_subprocess_env
never mutates its input — but avoids handing the live mapping to the
helper.
Follow-up to salvaged PR #49207.
CI caught 3 ACP test failures (tests/acp/test_server.py,
tests/acp/test_mcp_e2e.py). Root cause: routing ACP's tool-surface rebuild
through the shared refresh_agent_mcp_tools helper (added in the round-2 pass)
broke a deliberate, pre-existing ACP contract:
- the ACP tests assert `agent.tools is <get_tool_definitions return>` (object
identity) and an exact get_tool_definitions(enabled_toolsets=[...],
disabled_toolsets=..., quiet_mode=True) call signature; the shared helper
list()-copies and re-derives differently, breaking identity; and
- the tests use a MagicMock agent whose _tool_snapshot_generation is a mock, so
the new `int < published_gen` generation guard raised TypeError and the whole
ACP refresh silently failed.
ACP already preserves memory-provider tools (its own inject call) and excludes
context_engine, so there was no bug to fix there — only over-reach. Reverted ACP
to its original rebuild. (Same lesson as the gateway path: leave call sites that
carry their own tested contract alone; a reviewer's "inert today, fragile" note
meant leave-it, not change-it.)
Also hardened the generation guard defensively: tolerate a non-int
_tool_snapshot_generation (mock / partially-built agent) instead of throwing
TypeError and silently failing the refresh.
Third review pass (Hermes subagent) declared convergence: no BLOCKING, the
round-2 generation-aware publish / context-engine staging / CLI reload / ACP
routing all verified correct by hand and by test.
- agent_init: capture _tool_snapshot_generation immediately before the tool
snapshot (was ~425 lines earlier); removes a harmless skew window so the
recorded generation always matches the snapshot it describes.
- gateway/run.py _execute_mcp_reload: keep preserving each cached agent's
build-time enabled_toolsets EXACTLY (do NOT merge newly-connected servers like
CLI/TUI do) and document WHY — gateway sessions can be deliberately locked
down, and test_reload_mcp_preserves_per_agent_toolset_overrides asserts this.
A reviewer suggested "parity" here; it would have violated that contract.
Second review pass (Codex + Hermes subagent). Codex reproduced a real race with
a two-thread harness; both converged on the remaining issues.
- Generation-aware publish (fixes a lost-update race): two refresh callers (the
late-refresh daemon and the between-turns prologue around turn 1) could each
compute a snapshot outside the lock; a SLOWER caller holding an OLDER registry
generation could acquire the publish lock after a newer caller and clobber it,
deleting just-landed tools. refresh_agent_mcp_tools now captures
registry._generation before computing and refuses to publish a stale set;
agent._tool_snapshot_generation tracks the published generation.
- Context-engine routing names (_context_engine_tool_names) are now staged on a
local and published atomically with the snapshot, and only claimed when this
rebuild actually appended the schema — matching agent_init's dedup so a
registry/plugin tool of the same name keeps its own dispatch. (Previously
mutated live, before the publish lock, and on no-change refreshes.)
- CLI /reload-mcp: self.enabled_toolsets is resolved once at startup, so a
server newly ENABLED in config mid-session wasn't picked up (TUI already
re-resolved). Merge now-connected MCP server names into the override (unless
the user pinned all/*), mirroring startup, and keep self.enabled_toolsets in
sync. Closes the CLI/TUI parity hole.
- ACP (acp_adapter/server.py) routed through the shared helper — it was a 5th
sibling rebuild that re-injected memory tools but NOT context-engine tools and
bypassed the atomic/name-diff path (inert today, fragile).
- mcp_startup._resolve_discovery_timeout pulls its default from DEFAULT_CONFIG
(single source of truth) instead of a stale hardcoded 5.0 literal.
- Tests: stale-generation-no-clobber, _skip_mcp_refresh honored, timeout
fallback uses DEFAULT_CONFIG.
Consolidated findings from three independent reviewers (Codex, Claude Code, a
Hermes subagent w/ the hermes-agent-dev skill):
- BLOCKING: refresh_agent_mcp_tools rebuilt only the registry subset, silently
dropping post-build-injected memory-provider (mem0/honcho/…) and context-
engine (lcm_*) tools on every refresh. Now additive-preserving: re-applies
the same injectors agent_init uses, staged on locals and published atomically.
- Re-injection now honors the #5544 enabled_toolsets gate for context-engine
tools, so a restricted-toolset platform can't get lcm_* leaked back in.
- Atomic read-diff-publish under one lock: the returned `added` set and the
(tools, valid_tool_names) pair are consistent even under concurrent callers
(no half-swap, no TOCTOU).
- background_review fork opts out (_skip_mcp_refresh) so its byte-identical
tools[] cache parity with the parent is preserved.
- CLI /reload-mcp routed through the shared helper (was a 4th divergent copy
with the same clobber bug + missing disabled_toolsets).
- Explicit reloads (TUI RPC + CLI) pass enabled_override so a server the user
just enabled in config this session is picked up; automatic paths reuse the
agent's build-time selection.
- mcp_discovery_timeout default 5.0 -> 1.5s: correctness now comes from the
between-turns refresh, so the startup wait is only a small turn-1 UX bump
rather than a heavy dead-server latency penalty.
- has_registered_mcp_tools checks registered TOOLS (not connected servers) so a
zero-tool/prompt-only server doesn't make the per-turn hook fire forever.
- Tests: rewrote the thread-safety test to actually exercise the write path
(alternating tool sets), added the #5544-gate regression, the memory/context
preservation regression, and a "callable next turn via valid_tool_names"
contract; removed a dead monkeypatch line.
A slow MCP server (HTTP/OAuth, 2-6s cold connect) that finishes connecting
after the agent's one-time tool snapshot was uncallable for the rest of the
session. The merged pre-first-turn late-refresh only helps during the dead air
before the user's first keystroke; once a turn starts it bails to protect the
prompt cache, so a user who types before the server connects never gets the
tools without a manual /reload-mcp.
Refresh the snapshot in the per-turn prologue (build_turn_context), before this
turn's first API call assembles tools=. This is cache-safe by construction: the
refresh only ever extends a fresh request prefix at a turn boundary, never
mutates the cached prefix of an in-flight turn. So late tools become callable on
the user's NEXT turn automatically, with no /reload-mcp and no cache cost.
- tools/mcp_tool.py: has_registered_mcp_tools() — cheap guard so sessions with
no MCP servers (the common case) skip the rebuild entirely.
- agent/turn_context.py: call the shared refresh_agent_mcp_tools() helper at the
top of the prologue when MCP servers are registered.
- tests: 3 contract tests through the real build_turn_context (adds late tool;
skipped when no servers; no snapshot churn when unchanged).
.hermes/plans/: SPEC + PLAN documenting the root cause, the cache-safety
constraint, and why the existing fixes (#48403/#41630/#42802) don't close it.
MCP servers that connect after the agent's one-time tool snapshot were
invisible for the whole session. Two root causes, fixed together:
1. The startup discovery wait was a flat 0.75s. HTTP/OAuth servers
commonly take 2-6s on a cold connect, so they missed the window and
their tools never entered the agent's snapshot. `thread.join(timeout)`
already returns the instant discovery completes, so raising the bound
costs ~0s for the common case (no MCP / fast servers) and only ever
blocks for a genuinely-pending server, capped so a dead server can't
freeze startup. The bound is now configurable via
`mcp_discovery_timeout` (config.yaml, default 5.0s).
2. Three call sites duplicated the agent tool-snapshot rebuild (the TUI
`reload.mcp` RPC, the gateway reload, and the TUI late-binding refresh
thread), and the late-refresh detected changes by tool COUNT — missing
an equal-size add/remove swap. Consolidated into one shared
`tools.mcp_tool.refresh_agent_mcp_tools(agent)` helper that diffs by
tool NAME, mutates the agent under a lock (thread-safe), and respects
the agent's own enabled/disabled toolsets.
The late-binding refresh keeps its pre-first-turn cache-safety guard:
it never rebuilds the tool list once a turn has started, so the cached
prompt prefix is never invalidated mid-conversation.
Tests: new tests/tools/test_refresh_agent_mcp_tools.py covers the
name-based diff, in-place mutation, agent-scoped filtering, thread
safety, and the config-driven discovery bound (incl. instant-return
when nothing is pending). 75 passed across the touched areas.
next(iter(frozenset)) picked a different blocklist var each run
(PYTHONHASHSEED-dependent), hurting reproducibility. sorted()[0]
keeps the invariant-style assertion (any real blocklisted var)
while making failures reproducible.
Follow-up to salvaged PR #49207.
Wires support for the MCP `elicitation/create` request (Python SDK 1.11+)
so MCP servers can ask the user to confirm sensitive operations
mid-tool-call (payment authorization, OAuth confirmation, etc.) instead
of failing closed or requiring out-of-band biometrics.
Behavior:
- `tools/mcp_tool.py` adds `ElicitationHandler`, attached per server task
and passed to `ClientSession` as `elicitation_callback`. Form-mode
requests route through the existing approval system; URL-mode requests
decline cleanly (out of scope for this pass).
- `tools/approval.py` adds `request_elicitation_consent()`, which dispatches
to whichever surface owns the active session — `_await_gateway_decision`
for Telegram / Slack / etc. (so the approval prompt lands on the right
platform), `prompt_dangerous_approval` for CLI / TUI. Fails closed on
timeout, missing notify_cb, or exception.
- The MCP tool wrapper snapshots `contextvars.copy_context()` into
`MCPServerTask._pending_call_context` before each `session.call_tool`
and clears it after. The recv-loop task that dispatches incoming
`elicitation/create` requests does not inherit the agent task's
contextvars (HERMES_SESSION_PLATFORM and friends), so without the
bridge `_is_gateway_approval_context()` returns False on every
gateway session and the elicitation falls through to a CLI prompt
that has no TTY → fail-closed decline. The handler now reads the
snapshot via its `owner` back-reference and replays it through
`Context.copy().run(...)` so attribution survives the task hop.
Tests (`tests/tools/test_mcp_elicitation.py`):
- form-mode accept / decline / cancel
- URL-mode declined without prompting
- exception in approval system → decline
- timeout in approval → cancel
- context-bridge regression tests (replay observed in consent call,
missing-context fallback, multiple-replay safety, owner with
cleared `_pending_call_context`)
Verified end-to-end against pay's MCP server on macOS: agent message
arrives via Telegram, agent calls `mcp_pay_curl` against a paid endpoint,
pay returns 402, ElicitationHandler routes the approval prompt back to
the originating Telegram chat, user replies in TG, the curl tool signs
and completes.
Platforms tested: macOS 14 (darwin/arm64). No Unix-only syscalls
introduced; Windows footgun checker passes on the touched files.
Cron no_agent and pre-check scripts ran with the full gateway/agent
environment, allowing scripts under HERMES_HOME/scripts/ to read provider
credentials. Apply _sanitize_subprocess_env like terminal and MCP paths
(SECURITY.md section 2.3).
Add regression test asserting blocklisted provider vars are absent in the
child process.
Sets the Telegram bot's short description (the line under its name) to
"Online" on gateway connect and "Offline" on clean disconnect, gated
behind extra.status_indicator (off by default).
Telegram bots have no presence/online dot — that's a user-account
feature the Bot API doesn't expose for bots. The short description is
the closest available surface, so this gives users a way to tell whether
the gateway is up from the bot's profile.
- New extra.status_indicator flag (+ status_online/status_offline text
overrides), read in __init__ via config.extra — no config-schema change.
- _set_status_indicator() helper: best-effort, swallows API errors so it
never blocks connect/disconnect; truncates to Telegram's 120-char cap.
- Wired Online after _mark_connected(), Offline at top of disconnect()
while the bot HTTP client is still alive.
- 9 unit tests + Telegram docs section.
Requested by @ilTrumpista, cc @Teknium.
The image-too-large reactive shrink (try_shrink_image_parts_in_messages)
conflated two independent constraints: it always rejected a resize whose
re-encoded bytes were >= the original, even when the shrink was driven by a
PIXEL-DIMENSION cap (Anthropic many-image 2000px) rather than the byte budget.
Downscaled screenshot PNGs routinely re-encode LARGER in bytes, so the
dimension-correct result was discarded and the image left oversized -> the
provider re-rejected on retry and the session wedged forever.
Fix: track which constraint triggered the shrink (bytes vs dimension) and gate
the accept on the SAME axis.
* dimension path: accept the result as long as it is now within max_dimension,
regardless of byte size (verify via Pillow; fall back to the byte gate only
when the re-encode can't be decoded).
* bytes path: still require bytes to shrink, but ALSO re-check the per-side cap
when it's active — _resize_image_for_vision returns a best-effort, possibly
over-cap blob when it exhausts its halving budget on a very-high-aspect
image, so a byte-shrink alone can leave it over the dimension cap and
re-brick on retry.
Extend the unshrinkable-oversized guard to the pixel axis so a partial shrink
doesn't burn the one-shot retry.
Single shared agent path -> fixes CLI, TUI, and gateway alike.
Adds a real-Pillow runnable proof (repro_48013_image_shrink_brick.py) that
reproduces the issue's per-image table (bricks 3/5 before, passes 5/5 after)
plus unit invariants for the dimension and bytes accept/reject paths,
partial-progress accounting, and the bytes-path still-over-cap regression
surfaced by adversarial review.
Closes#48013
The web dashboard only showed a read-only "Reasoning" capability badge
with no way to set the effort level — unlike the desktop app, which has
an effort radio in its composer model menu. This adds a picker so the two
surfaces reach parity.
- ReasoningPicker: a Select rendered in the chat sidebar, gated on the
effective model's supports_reasoning capability (from /api/model/info).
Reads/writes agent.reasoning_effort via the existing config REST
endpoints (read-modify-write, the dashboard's single-key save pattern),
so the value lands in the config the agent boots a fresh chat from.
Options mirror the desktop: Off/Minimal/Low/Medium/High/Max.
- ChatSidebar: capture supports_reasoning from the model-info fetch and
render the picker; on change, show the same 'apply on /new or reload'
notice the model switch uses.
- reasoning-effort.ts: DOM-free helpers (normalizeEffort + options) so the
node-env vitest harness can cover the resolution logic, plus tests.
The classic CLI status bar could appear twice after a horizontal terminal
resize — two bars at two widths with two different elapsed readings.
Root cause: prompt_toolkit's Application._on_resize() calls renderer.erase(),
which does cursor_up(_cursor_pos.y) + erase_down() using the _cursor_pos.y
cached from the LAST render at the OLD width (renderer.py:745). On a column
shrink the terminal reflows the already-painted full-width chrome into extra
physical rows, so the cached y undershoots: cursor_up doesn't climb past the
reflowed rows and erase_down leaves the old bar stranded ABOVE the live
origin. The next paint stacks a fresh bar below it. The existing post-resize
suppression hides the NEW bar for ~0.35s but never erases the already-reflowed
OLD one, so the ghost survives the whole window. Ctrl+L / /redraw clears it,
confirming a viewport wipe is the fix.
Fix: on a WIDTH change, _recover_after_resize now routes through the same
recovery as Ctrl+L — _clear_prompt_toolkit_screen(rebuild_scrollback=False)
(CSI 2J, visible viewport only) + _replay_output_history() — BEFORE delegating
to prompt_toolkit's resize. Banner-safe: 2J never touches scrollback history
(that's CSI 3J, which we don't send here), so the startup banner is preserved.
Rows-only resizes skip the clear (no reflow → no ghost) to avoid an extra
repaint. Tracks _last_resize_width to distinguish the two.
Tests: replace the now-obsolete 'never clears on resize' assertion with two
tests — rows-only resize delegates without clearing; width change clears the
viewport + replays and never wipes scrollback.
Resolves the 2 npm audit advisories (1 high, 1 moderate), both from
transitive undici:
- undici 6.26.0 -> 6.27.0 (high: TLS bypass / header injection /
response queue poisoning class, via node-gyp + ui-tui)
- jsdom's undici 7.27.2 -> 7.28.0 (moderate, via jsdom test dep)
Both are in-range bumps (no --force). Lockfile also reconciled two
pre-existing manifest drifts during the install: dompurify 3.4.10 ->
3.4.11 (in-range patch) and the web workspace's already-declared
vitest ^4.1.5 devDep. No package.json changes. npm audit reports 0
vulnerabilities in root, ui-tui, and apps/desktop after.
* fix(desktop): rename "Restart messaging" -> "Restart gateway"
The Command Center control restarts the whole messaging gateway, yet was
labelled "Restart messaging" while the status line above it reads "Messaging
gateway running/stopped". Rename the i18n key to match what it does, across
all 4 locales.
* feat(desktop): restart the gateway from Cmd+K, with statusbar spinner feedback
Add a shared runGatewayRestart() (store/system-actions.ts) and wire it to a
new Cmd+K "Restart gateway" action. While a restart is in flight the
statusbar "Gateway" item swaps its icon for the TUI glyph spinner and reads
"restarting…", returning to its real state on completion — driven by a
$gatewayRestarting atom, not a transient toast or the generic "Agents
running" counter. The helper owns its error handling so fire-and-forget
callers can't leak an unhandled rejection; only a failure toasts.
* fix(desktop): offer a Restart gateway action on messaging save/toggle toasts
The "setup saved" and "platform enabled/disabled" toasts told users their
change needs a gateway restart but left it a separate hunt. Attach a "Restart
gateway" action (the shared runGatewayRestart), and reword the copy to state
the pending consequence ("...takes effect after a gateway restart") now that
the button carries the verb. Updated all 4 locales.
* fix(desktop): make rendered logs selectable so they can be copied
The global body { user-select: none } left log surfaces unselectable. Opt them
back in via the existing data-selectable-text convention — at the shared
LogView primitive (boot-failure + bootstrap install overlays) plus Command
Center recent logs, toolset post-setup output, notification detail, and
subagent stream/file lines.
The global body { user-select: none } left log surfaces unselectable. Opt them
back in via the existing data-selectable-text convention — at the shared
LogView primitive (boot-failure + bootstrap install overlays) plus Command
Center recent logs, toolset post-setup output, notification detail, and
subagent stream/file lines.
The "setup saved" and "platform enabled/disabled" toasts told users their
change needs a gateway restart but left it a separate hunt. Attach a "Restart
gateway" action (the shared runGatewayRestart), and reword the copy to state
the pending consequence ("...takes effect after a gateway restart") now that
the button carries the verb. Updated all 4 locales.
Add a shared runGatewayRestart() (store/system-actions.ts) and wire it to a
new Cmd+K "Restart gateway" action. While a restart is in flight the
statusbar "Gateway" item swaps its icon for the TUI glyph spinner and reads
"restarting…", returning to its real state on completion — driven by a
$gatewayRestarting atom, not a transient toast or the generic "Agents
running" counter. The helper owns its error handling so fire-and-forget
callers can't leak an unhandled rejection; only a failure toasts.
The Command Center control restarts the whole messaging gateway, yet was
labelled "Restart messaging" while the status line above it reads "Messaging
gateway running/stopped". Rename the i18n key to match what it does, across
all 4 locales.
The Chat-tab session switcher rendered rows in the API's default
order="created" (original start time) while each row displays
last_active — so a session you just messaged in could sit below an
older one, and the list looked unsorted against its own timestamps.
Pass order="recent" from ChatSessionList so the switcher sorts by
latest activity across the compression chain (most-recently-used at
top, ChatGPT-style; long conversations that auto-compressed into a new
continuation id stay on the first page). Adds an optional, defaulted
`order` arg to api.getSessions; the paginated Sessions page keeps the
stable created order.
The classic CLI status bar could vanish for the rest of a session: any
terminal reflow (SIGWINCH from a tmux pane change, SSH window restore, font
zoom) set _status_bar_suppressed_after_resize=True, but the flag was ONLY
cleared on the next *submitted* user input. Resize then sit idle and the
bottom chrome rendered at height 0 on every repaint — even with the
refresh clock ticking — so the bar was gone until you typed and hit enter.
Fix: _recover_after_resize now schedules a debounced unsuppress timer that
clears the flag and repaints once the reflow settles (~0.35s), so the bar
returns on its own during idle. The next-submit clear stays as a fast path.
Fails open: any error in scheduling clears the flag immediately rather than
leaving the bar stuck hidden.
Satisfies the repo-wide subprocess-stdin guard
(tests/tools/test_subprocess_stdin_guard.py); the long-lived bridge
child should not inherit the gateway's stdin.
Adds a Raft platform adapter as a bundled plugin (plugins/platforms/raft/)
connecting Hermes to Raft as an external agent via a wake-channel bridge.
The adapter starts a loopback HTTP endpoint, spawns 'raft agent bridge' as a
child process, and injects content-free wake hints into the gateway session
pipeline. The agent reads/sends messages through the Raft CLI; the adapter
never touches message bodies or delivery cursors. Activity observer hooks
report tool/LLM/session lifecycle events via a bounded at-most-once queue.
Auto-enables when RAFT_PROFILE is set.
Cherry-picked from PR #47629. Authored by skyzh (@xxchan).
Manual verification surfaced a second bypass class beyond the standalone
config loaders: several code paths bridge config.yaml values into os.environ
(HERMES_TIMEZONE, HERMES_REDACT_SECRETS, HERMES_MAX_ITERATIONS, TERMINAL_*,
network.force_ipv4, ...) by reading the raw user YAML, so the env the whole
process reads carried the USER's value even when an administrator pinned it —
e.g. a managed timezone was overridden because gateway/run.py wrote the user's
timezone into HERMES_TIMEZONE, and _resolve_timezone_name() checks the env var
first.
Wired the shared apply_managed_overlay() into every config→env bridge:
- gateway/run.py module-level startup bridge (timezone, redact_secrets,
max_turns, terminal, display, gateway.strict, ...)
- gateway/run.py _reload_runtime_env_preserving_config_authority (the per-turn
re-bridge that keeps config authoritative over reloaded .env — must keep
MANAGED authoritative on every turn, not just startup)
- hermes_cli/main.py early security.redact_secrets / network.force_ipv4 bridge
(runs before load_config is usable, at import time)
- hermes_cli/send_cmd.py top-level scalar config→env bridge
Verified end-to-end against a writable managed dir (12/12 checks incl. timezone,
logging, model, skin, gateway settings, write-guard) and in a clean process the
gateway per-turn bridge writes HERMES_TIMEZONE=<managed>. Adds an
order-independent regression test for the bridge overlay.
The skin bug was one instance of a class: several subsystems build their
config dict directly from config.yaml instead of routing through
hermes_cli.config.load_config (which carries the managed merge), so they
silently ignored administrator-pinned values. Audited every config.yaml
reader and fixed the behavioral-read bypasses:
- gateway/config.py load_gateway_config (messaging gateway: session_reset,
quick_commands, stt, model, ...)
- gateway/run.py _load_gateway_config (its read_raw_config fast path also
skipped the merge — read_raw_config returns raw user YAML)
- tui_gateway/server.py _load_cfg (new TUI + desktop backend: skin,
reasoning_effort, service_tier, provider_routing)
- cron/scheduler.py (scheduled-job model/reasoning/toolsets/provider_routing)
- hermes_logging.py (logging.level/max_size_mb/backup_count)
- hermes_time.py (timezone)
- hermes_cli/doctor.py (memory-provider diagnostic reads effective config)
All route through a new shared managed_scope.apply_managed_overlay() helper
that mirrors _load_config_impl (env-only expansion so a user ${VAR} can't
shadow a managed literal, root-model-string normalization, leaf-merge) and is
fail-open. cli.py's earlier inline fix is refactored onto the same helper.
Write-back paths (slash_commands, telegram/yuanbao dm_topics, profile
distribution) are deliberately left reading raw user YAML — overlaying managed
values there would persist them into the user file. The dashboard
(web_server.py) already routes through load_config and needed no change.
TUI loader caches the RAW config so _save_cfg never writes managed values to
disk. Adds test_managed_scope_overlay.py (helper) and
test_managed_scope_loaders.py (per-surface integration); mutation-checked.
cli.py's load_cli_config() builds CLI_CONFIG independently of
hermes_cli.config._load_config_impl (it reads config.yaml directly and merges
into hardcoded defaults), so the Phase 2 managed merge never reached the
interactive CLI/TUI surface. Symptom: a managed display.skin (and any other
display/CLI pref read from CLI_CONFIG) was silently ignored by the TUI while
`hermes config`/`doctor`/write-guards — which go through load_config — correctly
honored it. Found via manual testing: the skin engine kept using 'default'.
Fix: overlay the managed config last in load_cli_config(), mirroring
_load_config_impl — expand against the process env only (so a user ${VAR} can't
shadow a managed literal), normalize the root model key so a managed
`model: x/y` string can't clobber the dict shape callers expect, then
leaf-merge. Fail-open so managed scope can never block CLI startup.
Adds tests/hermes_cli/test_managed_scope_cli_config.py locking that CLI_CONFIG
honors managed values, preserves user siblings, and is inert with no scope.
- show_config prints an administrator header naming the managed source and
lists the pinned config/env keys when a scope is active (silent otherwise).
- hermes doctor gains a managed_scope_check under Configuration Files that
reports the resolved managed dir + pinned key counts, and flags a
HERMES_MANAGED_DIR redirect (the documented foot-gun).
- set_config_value hard-rejects a managed config key (D2) and names the
source, exiting non-zero.
- save_env_value / remove_env_value refuse a managed env key.
- save_config strips managed leaves from a bulk write (mechanical safety net)
with a warning, so the unmanaged remainder still persists.
New _strip_dotted_keys helper drives the bulk-save pruning. All guards are
distinct from and layered after the existing is_managed() package-manager
write-lock.
load_hermes_dotenv now loads the managed-scope .env after user/project .env
and external secret sources, with override=True, so managed env values beat
the user .env and any pre-existing shell export. Reuses the existing dotenv
fallback + credential-sanitization path. Fail-open: no managed dir/.env is a
no-op and any error is swallowed so managed scope never blocks startup.
_load_config_impl now deep-merges the managed config.yaml on top of the
expanded user config so managed leaves win while sibling keys stay
user-controlled (leaf-level merge, D3). Managed values are expanded against
the process env only, never user-defined ${VAR}, so a user can't shadow a
managed literal. The managed file's (mtime,size) is folded into the load
cache key so editing it invalidates the cache. This inverts the usual
env-over-config precedence for pinned keys by design (see design doc §4.1).
New hermes_cli/managed_scope.py resolves a system-level managed directory
(HERMES_MANAGED_DIR override > /etc/hermes), parses managed config.yaml/.env
with fail-open semantics, and exposes is_key_managed/is_env_managed helpers.
The system default is ignored under pytest and HERMES_MANAGED_DIR is added to
the conftest env scrub so a real managed scope can't leak into the suite.
Not wired into the load paths yet (Phases 2-3).
_terminate_reclaimed_worker early-returned on ProcessLookupError with
terminated=False. The new reclaim-defer guard reads that as 'worker
survived the kill' and defers the reclaim forever, so a stale task whose
worker is already dead never lands in result.stale. ProcessLookupError
means the process is gone — that IS a successful termination. Split it
from the generic OSError branch and set terminated=True.
release_stale_claims and detect_stale_running call _terminate_reclaimed_worker
and then release the task claim unconditionally, even when the termination did
not actually kill the worker. _terminate_reclaimed_worker already reports this
via its "terminated" flag, but the callers ignore it.
When a worker is parked in uninterruptible (D) state — for example throttled by
a cgroup memory.high limit — a pending SIGTERM/SIGKILL cannot be delivered until
the throttle lifts, so the kill is a no-op. The dispatcher then frees the claim
and spawns a fresh worker beside the still-alive one. Repeated every dispatch
tick this accumulates duplicate workers without bound, deepening the memory
pressure that caused the throttle in the first place — a self-reinforcing
runaway.
Fix: gate both automatic reclaim paths on _worker_survived_termination(). When
we attempted to kill our own host-local worker and it is still alive, defer the
reclaim (_defer_reclaim_for_live_worker extends the claim a short grace and
emits a reclaim_deferred event) instead of releasing. This guarantees at most
one live worker per task and is self-correcting: not spawning a duplicate is
what relieves the pressure so the pending signal lands and the worker dies, and
the next tick reclaims cleanly. Non-host-local claims and the operator-driven
reclaim_task() path keep their existing force-release behaviour.
Related: #41448 (concurrent dispatchers amplify this by doubling reclaim
frequency); #42858 (kill the worker rather than orphan it on archive).
Tests: defer-when-worker-survives, reclaim-when-killed,
release-when-not-host-local, and the detect_stale_running path.
The lazy_deps pin (memory.hindsight -> hindsight-client==0.6.1) was newer
than the plugin's stated floor (>=0.4.22). Align _MIN_CLIENT_VERSION,
the setup wizard dep string, plugin.yaml, and the README to 0.6.1 so the
floor check, auto-upgrade target, and runtime lazy-install all agree.
Also drops the redundant local _MIN_CLIENT_VERSION redefinition in
post_setup.
Five targeted enhancements to the upstream simplify-code skill:
1. Risk-tiered application (SAFE/CAREFUL/RISKY) — safe changes auto-applied,
careful changes verified per-file, risky changes flagged for human review.
Prevents auto-applying N+1 restructures and public API renames.
2. Chesterton's Fence — before flagging anything for removal, reviewers run
'git blame' to understand why it exists. Low-confidence findings are
escalated rather than guessed.
3. AI slop detection — Quality reviewer now catches: extra comments restating
obvious code, unnecessary defensive null-checks on validated inputs, 'as any'
casts, and patterns inconsistent with the rest of the file.
4. Silent failure detection — Efficiency reviewer now catches: empty catch
blocks, ignored error returns, except:pass, .catch(()=>{}) with no handling,
and error propagation gaps.
5. Structured reviewer output with confidence+risk tags — reviewers report in
'file:line → problem → fix | confidence: H/M/L | risk: SAFE/CAREFUL/RISKY'
format, enabling the orchestrator to tier the application.
Plus 3 new pitfalls: over-trusting dead code tools, public contract awareness,
and preserving intentional error handling.
Total: +45/-8 lines. Keeps the 212-line compact spirit.
Ref: #379
ruff (unspecified-encoding) and the Windows-footgun checker both flag
open() in text mode without encoture=. Keep text mode (the Windows lock
path in _try_acquire_file_lock writes a str newline) and pass
encoding='utf-8'.
Two robustness gaps from community review (#44919):
1. Windows dead-path: replaced bespoke fcntl.flock with gateway.status
_try_acquire_file_lock / _release_file_lock — already cross-platform
(msvcrt on Windows, fcntl on POSIX). Added _release_singleton_lock
helper.
2. Lock fd never released: stored handle is now released explicitly in
both exit paths — CancelledError handler and normal while-loop exit.
Allows in-process stop/restart (tests, embedded use).
Also tightened docstrings — 'corrupt the SQLite DBs' is now specific
(wal_autocheckpoint=0 + concurrent manual WAL checkpoints can corrupt
index pages), matching the module's own concurrency claims.
The gateway's embedded dispatcher has no guard against more than one dispatcher
running concurrently. dispatch_in_gateway defaults to true, so a second gateway
for the same profile (a restart race where the old process is slow to exit) — or
any deployment that runs multiple profile gateways with the default — starts a
second dispatcher loop. As #41448 describes, concurrent dispatchers each run
release_stale_claims() against the same boards, double reclaim frequency, and
re-dispatch slow workers before they finish. In practice they also corrupt the
shared kanban SQLite DBs under concurrent write load.
Add _acquire_singleton_lock(): an exclusive, non-blocking fcntl.flock at the
machine-global kanban root (kanban_home()/kanban/.dispatcher.lock — the board is
shared across profiles by design, so this serialises every gateway, not just one
profile). The first gateway to start its dispatcher holds the lock for its
process lifetime; any other gateway finds it contended, logs, and skips
dispatching while still running for messaging. Falls back to config-only control
on non-POSIX or filesystems without flock.
This is more robust than a per-profile guard because the documented model is
"one dispatcher sweeps all boards" — the contention is across profiles, not just
within one. Closes#41448.
Test: lock is exclusive (held, then contended while held, then held again after
release).
Two small, focused fixes for the cron scheduler and checkpoint manager.
1. _summarize_cron_failure_for_delivery (cron/scheduler.py):
Replaces the raw error dump in _process_job with a compact
pattern-matched summary. Provider rate limits, timeouts, and
authentication errors now produce a short human-readable message
instead of dumping multi-KB provider JSON into the delivery channel.
2. _repair_bare_repo_dirs (tools/checkpoint_manager.py):
Recreates refs/heads/ and branches/ directories after git gc
--prune=now, which can remove empty dirs from bare repos and cause
subsequent git add -A to fail with 'fatal: not a git repository'.
Called after all four git gc call sites.
Both fixes use only standard library imports and plug into existing
call sites with no architectural changes.
PR #49056 set the default to 0, which reverts the #45592 idle-clock fix:
without a periodic invalidate, prompt_toolkit stops repainting the bottom
chrome during idle and the status bar goes stale/disappears after a turn.
Restore 1.0 as the default for everyone. The config knob stays — users on
emulators where the per-second redraw fights auto-scroll (#48309) can set
display.cli_refresh_interval: 0 to opt out.
Extend the 'Running Many Gateways at Once' user-guide page with a
'one gateway for all profiles (multiplexing)' section, kept to a single page:
- How to opt in (gateway.multiplex_profiles on the default profile) and when to
prefer it vs one-process-per-profile.
- Every contract change a user sees when the flag is on:
1. secondary-profile 'gateway start' is a hard error (--force escape hatch),
2. HTTP-inbound reached via /p/<profile>/ prefix; secondary profiles must NOT
enable a port-binding platform (webhook/api_server/msgraph_webhook/feishu/
wecom_callback/bluebubbles/sms) — config error at startup,
3. per-credential platforms still need their own token per profile,
4. session keys namespaced agent:<profile>: (default stays agent:main:),
5. single PID/lock + aggregated hermes status, per-profile runtime_status.json.
- What does NOT change: per-profile .env credential isolation (stricter, incl.
MCP/Kanban subprocess env), Kanban, profile-scoped skills/memory/SOUL, routing.
All inert when the flag is off.
- _guard_named_profile_under_multiplexer: when the default gateway is running
with gateway.multiplex_profiles=on, a named-profile 'hermes gateway run' hard
-errors (pointing at the multiplexer) instead of double-binding that
profile's platforms. Inert unless all hold: this invocation is a named
profile, a default-profile gateway is alive, and its config has multiplexing
on. --force overrides. Wired into run_gateway's guard chain.
- write_runtime_status gains served_profiles: the secondary-adapter startup
records [active] + multiplexed profiles into runtime_status.json so
'hermes status' can show per-profile coverage without a second probe. Absent
for single-profile gateways.
Tests: served_profiles round-trips and is absent by default; guard is inert for
the default profile / under --force / when no default gateway is running.
Bring up adapters for every profile the gateway serves, not just the active
one. Keeps self.adapters as the default/active profile's map (the ~93 existing
self.adapters[...] sites are untouched) and adds secondary profiles under
self._profile_adapters[profile][platform].
- _start_secondary_profile_adapters loops profiles_to_serve(multiplex=True),
skips the active profile (handled by the primary startup loop), and for each
other profile loads its gateway config and creates+connects its enabled
adapters under that profile's _profile_runtime_scope (home + secret scope).
- Each secondary adapter gets _make_profile_message_handler(profile): stamps
source.profile (when unset) before delegating to the shared _handle_message,
so the agent turn and session key resolve to that profile.
- Same-platform credential-conflict detection: _adapter_credential_fingerprint
hashes the adapter's bot token (salted, truncated — never logs the token);
two profiles claiming the same (platform, token) refuse the duplicate with a
clear error naming both, since one token can't be polled twice.
- Port-binding hard-error: a SECONDARY profile that enables a port-binding
platform (webhook, api_server, msgraph_webhook, feishu, wecom_callback,
bluebubbles, sms) is a config error and aborts startup via MultiplexConfigError
— the default profile owns the single shared HTTP listener and serves every
profile through the /p/<profile>/ prefix, so a second bind can only collide.
Distinct from a transient connect failure (which logs + stays alive to retry):
a config error writes gateway_state=startup_failed and exits cleanly with an
actionable message (names the profile, the platform, and the fix). There is no
valid reason to bind a second port once you've opted into a multiplexer.
- Shutdown tears down secondary adapters alongside the primary ones.
- Defensive getattr guards keep partial-construction unit tests (stop(),
_run_agent on bare instances) working.
No-op when multiplex_profiles is off (self._profile_adapters stays empty).
Tests: fingerprint stability/log-safety/distinctness, profile message-handler
stamping (and not overriding an already-stamped source), port-binding hard-error
raises + names the profile/platform, non-binding platform is not rejected, and
the guard set covers every TCP-binding adapter.
Serve webhook inbound for multiple profiles off the one shared listener via a
URL prefix, with no second port bound.
- SessionSource gains a 'profile' field (round-trips through to_dict/from_dict;
omitted when unset so existing serialization is unchanged). It carries which
profile an inbound message was routed to.
- WebhookAdapter registers /p/{profile}/webhooks/{route_name} alongside the
existing /webhooks/{route_name}. _resolve_request_profile validates the
prefix against profiles_to_serve(): None when absent or multiplexing is off
(ignored, handled as default — no spurious 404), the profile name when valid,
_PROFILE_REJECTED (→ 404) when the profile isn't served. The resolved profile
is stamped onto the SessionSource.
- session-key namespacing and the per-turn home/credential scope now prefer
source.profile: SessionStore._resolve_profile_for_key(source),
_session_key_for_source fallback, and _resolve_profile_home_for_source all
honor it (→ the agent turn resolves that profile's config/skills/credentials
via the Phase 2 _profile_runtime_scope).
Constraint: routing inbound needs no per-profile platform credential, but the
agent still needs the routed profile's provider key — delivered by Phase 2's
secret scope. api_server (OpenAI-compatible surface) profile routing is a
focused follow-on; its source-construction path differs from webhook's.
Tests: SessionSource.profile round-trip + namespace drive; _resolve_request_
profile accept/reject/ignore matrix.
The credential gate. When multiplexing is active, a profile's secrets resolve
from a context-local scope, never the process-global os.environ (which in a
multiplexer may hold another profile's keys, and is inherited by every
subprocess spawned with env=dict(os.environ)).
- agent/secret_scope.py: get_secret() backed by a secret-scope contextvar.
FAIL-CLOSED: when multiplex is active and no scope is installed, an unscoped
read RAISES UnscopedSecretError instead of falling back to os.environ — a
missed/new call site crashes loudly at that line rather than leaking a
cross-profile value. Genuinely-global vars (HERMES_*, PATH, kanban paths,
…) keep reading os.environ via an allowlist. load_env_file/build_profile_
secret_scope parse a profile .env into an isolated dict WITHOUT mutating
os.environ. Off by default => transparent os.getenv behavior.
- hermes_cli/runtime_provider.py: all credential/provider/base-url reads go
through _getenv -> get_secret.
- agent/credential_pool.py: env fallbacks route through get_secret (the
~/.hermes/.env-first preference is preserved and already profile-correct via
the home override).
- tools/mcp_tool.py: MCP config interpolation resolves through
get_secret, so a server's picks up the routed profile's value.
- gateway/run.py: set_multiplex_active() at GatewayRunner init; per-turn .env
reload is a no-op for credentials in multiplex mode (secrets come from the
scope, not global env); _profile_runtime_scope context manager combines the
HERMES_HOME override + secret scope; _run_agent wraps _run_agent_inner in
that scope (resolved via _resolve_profile_home_for_source) when multiplexing.
Propagates into the agent worker thread for free via the existing
copy_context() in _run_in_executor_with_context.
Tests: 13 unit (fail-closed, scope isolation, global allowlist, .env parsing
without environ mutation) + 7 E2E (runtime_provider + MCP interpolation prove
two profiles isolated, unscoped read raises, globals still read environ).
Foundations for serving multiple profiles from one gateway process, inert
when off:
- gateway.multiplex_profiles config flag (default false), round-trips through
GatewayConfig and load_gateway_config (top-level + nested gateway.* form).
- hermes_cli.profiles.profiles_to_serve(multiplex): the single chokepoint for
which (profile, HERMES_HOME) pairs the gateway serves. Lightweight dir scan;
active-profile-only when off, default + all named profiles when on.
- build_session_key gains a profile= namespace slot. Default/None reuse the
historical 'agent:main:...' literal BYTE-IDENTICALLY (no session migration,
positional parsers unaffected); a named profile becomes 'agent:<profile>:...'
so two profiles on the same platform/chat never collide.
- SessionStore._resolve_profile_for_key + _session_key_for_source fallback
resolve the namespace from the flag (legacy when off, active profile when on).
Tests: byte-identical-when-off (parametrized), namespace isolation, positional
layout preserved, config round-trip, profiles_to_serve enumeration.
The model was enumerating options inside the question string (dead prose the UI
can't render as pickable rows). Schema description now spells out: choices[] is
REQUIRED for selectable options; question holds ONLY the question.
The salvaged non_conversational marking made the home-channel startup
no-metadata branch always pass metadata= explicitly; for non-Discord
platforms _non_conversational_metadata returns None, so Telegram/etc.
went from adapter.send(chat_id, message) to adapter.send(..., metadata=None).
Behaviorally identical but broke test_restart_notification's exact
assert_called_once_with. Only attach metadata when the marker applies
(Discord), restoring the original call shape elsewhere.
Discord channel-history backfill partitions on Hermes' last self-authored
message. Asynchronous, non-conversational status sends (self-improvement
review bubbles, heartbeats, background-process notifications, update status,
gateway restart/online notices) land as ordinary bot messages, so a delayed
status bump becomes the history boundary and swallows real messages that
arrived after Hermes' actual reply.
Mark these sends at the source via metadata["non_conversational"] (Discord
only; other platforms' metadata is unchanged). The adapter no longer advances
the history-boundary cache for marked sends and persists their IDs to a
sidecar JSON so the cold-start scan can skip them by ID after a restart. A
narrow regex recognizer remains only as an upgrade bridge for status bumps
emitted by an older gateway that pre-dates the marking.
The desktop slash dispatcher dropped the `notice` field on `send` and
never handled `prefill` directives at all. `/goal <text>` returns
{type: send, notice: "⊙ Goal set …", message} from command.dispatch —
the desktop submitted the goal text as a plain prompt with no feedback,
so the goal looked like it did nothing. `/undo` returns a prefill
directive that fell through to "invalid response".
- types: add `notice?` to SendCommandDispatchResponse; add
PrefillCommandDispatchResponse to the union.
- parseCommandDispatch: keep `notice` on send, parse prefill.
- runExec dispatcher: render the notice as a system line before acting,
and handle prefill by dropping the message into the composer for
editing (mirrors the TUI's createSlashHandler).
Tests: parseCommandDispatch send-notice / prefill cases.
The xAI TTS REST endpoint (POST /v1/tts) accepts 'speed' (0.7-1.5)
and 'optimize_streaming_latency' (0/1/2) parameters, but the Hermes
built-in xAI provider was reading neither from config nor sending
either in the request body. Add them as tts.xai.speed and
tts.xai.optimize_streaming_latency config knobs (with global
tts.speed / tts.optimize_streaming_latency fallbacks).
- speed: float, clamped to 0.7-1.5. 1.0 (the API default) is omitted
from the request body to preserve the existing minimal-payload
contract.
- optimize_streaming_latency: int, clamped to 0-2. 0 (best quality,
the API default) is omitted from the request body.
Resolver order: tts.xai.<knob> overrides the global tts.<knob>.
Add a ChatGPT-style conversation list beside the embedded TUI on the
dashboard Chat tab so users can swap sessions without leaving the page.
- New ChatSessionList component: lists recent sessions for the active
profile (title/preview, last-active, message count, source), a New chat
button, and a refresh control. Best-effort like ChatSidebar.
- Selecting a row drives /chat?resume=<id>, which ChatPage already treats
as part of the PTY identity, so the terminal respawns resuming that
conversation. Active row is highlighted; New chat clears resume.
- Wired into ChatPage as a dedicated right-side column (desktop) and into
the existing slide-over panel above model/tools (narrow screens).
- i18n: new sessions.newChat key across all locales.
- Read-only switcher by design — delete/rename/export stay on Sessions.
Docs: web-dashboard.md Chat section documents the switcher.
Accounts-tab cards derived from the unified provider_catalog() carry
status_fn=None and had no hardcoded branch in _resolve_provider_status,
so any future OAuth/account provider plugin rendered permanently
logged-out. Fall through to the canonical hermes_cli.auth.get_auth_status
slug dispatcher and adapt its shape, so membership AND status both
auto-extend with the hermes model universe.
Address review feedback on the keyVar test helper: it mocks one /api/env row
(an EnvVarInfo), so type it as such and mirror the sibling provider() factory's
base-plus-Partial-override shape instead of hardcoding positional args and
fabricated fields (description='X direct API', url=''). Route the WidgetAI test
through it too, removing the inline duplicate of the same object shape.
- API-keys tab: a SearchField filters provider cards by name / env-var key /
description, with a 'no providers match' empty state. Card order stays
priority-then-name (curated PROVIDER_GROUPS priority floats recommended
providers up; equal priority falls back to alphabetical).
- Accounts tab: 'Other providers' keep sortProviders order (priority, then
name) — unchanged.
Adds searchKeys/noKeysMatch i18n strings across all four locales. Vitest covers
priority/name ordering + live filtering + empty state.
Adds the end-to-end parity contract test: every CANONICAL_PROVIDERS entry (the
`hermes model` universe) must be configurable on a desktop Providers tab —
keys(/api/env) ∪ ids(/api/providers/oauth) ⊇ canonical. Asserted as an
invariant against the live endpoints so the GUI can never silently drift from
the CLI again.
Surfacing this contract caught Bedrock: it's aws_sdk (no api-key vars), so it
had no Keys card. /api/env now tags AWS_REGION/AWS_PROFILE to the bedrock
provider card. Anthropic is whitelisted as a legitimate dual-tab provider
(direct API key + subscription OAuth).
Also refreshes the _OAUTH_PROVIDER_CATALOG docstring to describe its new role
as the override base for _build_oauth_catalog().
buildProviderKeyGroups now groups provider env vars by the backend-supplied
provider/provider_label (from the unified catalog — the same identity hermes
model uses), falling back to the desktop PROVIDER_GROUPS prefix match only when
the backend gives no hint. A provider the backend tags now always renders its
own Keys card, even with no hand-maintained PROVIDER_GROUPS prefix row —
PROVIDER_GROUPS is demoted to a presentation overlay (priority/blurb/docs).
Adds provider/provider_label to EnvVarInfo. New vitest asserts a backend-tagged
provider with no prefix row still renders a card.
/api/providers/oauth now unions the explicit hand-tuned OAuth cards
(_OAUTH_PROVIDER_CATALOG — bespoke flow/status/cli, plus the api-key Anthropic
PKCE card and synthetic claude-code row) with every accounts-tab provider in
provider_catalog(). Any OAuth/external provider in the `hermes model` universe
now appears automatically, closing the drift where google-gemini-cli and
copilot-acp had no Accounts card despite being CLI-configurable.
Adds read-only status cards for google-gemini-cli (via existing
get_gemini_oauth_auth_status) and copilot-acp (managed-by-CLI, like claude-code).
DELETE handler routes through the same _build_oauth_catalog() builder.
Parity test asserts the Accounts tab offers every accounts-tab catalog provider
as an invariant.
The Keys tab now surfaces every keys-tab provider in provider_catalog() (the
`hermes model` universe), synthesizing a card even when the env var has no hand
entry in OPTIONAL_ENV_VARS. Closes the drift where openai-api, kilocode, novita,
tencent-tokenhub, and copilot were CLI-configurable but invisible in the desktop
Providers → API keys tab.
Each provider row now carries backend-derived provider/provider_label grouping
hints so the desktop can group by the same provider identity the CLI picker
uses. Hand OPTIONAL_ENV_VARS prose still wins where present (enrichment, not a
gate). Shared non-provider credentials (e.g. tool-category GITHUB_TOKEN) are
explicitly not hijacked into a provider card — Copilot uses its provider-owned
COPILOT_GITHUB_TOKEN.
Adds hermes_cli/provider_catalog.py, deriving one descriptor per provider from
the CANONICAL_PROVIDERS universe (what `hermes model` renders, auto-extended
from provider plugins), joined with auth/env from PROVIDER_REGISTRY and display
metadata from ProviderProfile (with canonical/env fallbacks for the four
profile-less providers and the many profiles with blank display/signup fields).
Each descriptor is tagged with the desktop tab it belongs on (keys vs accounts)
by auth_type. This is the single source of truth the desktop Providers tabs will
derive membership from, so they can no longer drift from the CLI picker.
Tests assert the parity contract (catalog == hermes model universe) and tab
routing as invariants, not snapshots.
The previous xAI auto-speech-tag tests asserted on the local
pause-only fallback and only passed because call_llm silently
returns None in the test environment. They gave zero coverage of
the new auxiliary-rewrite path added in the previous commit.
Add tests that:
- mock agent.auxiliary_client.call_llm and pin down the new contract
(auxiliary rewriter output wins over the local fallback)
- verify the system prompt lists every documented inline + wrapping
tag and uses BBCode-style [/tag] closing syntax
- cover markdown-fence stripping (with and without language hint)
- exercise the local fallback on rewriter exception, empty response,
None response, and missing-choices response
- confirm call_llm is NOT invoked when the input already has
explicit speech tags, or is empty / whitespace-only
- replace the end-to-end test that asserted on the silent-fallback
output with one that mocks the rewriter and asserts the
rewriter's tagged text is what reaches the xAI TTS API
Mirrors the existing Gemini TTS audio-tag rewrite path. When the input
has no explicit user/model speech tags, ask the configured auxiliary
model to insert a richer set of xAI-supported tags (laughs, sighs,
whispers, soft/loud, slow/fast, etc.) so voice-mode replies sound more
expressive. Falls back to the local conservative [pause]-only transform
on any auxiliary-model failure.
A plain /model <name> switch only lasted for the current session — every
new session reverted to the previously-configured model, so users had to
re-switch every time (e.g. glm-5.1 -> glm-5.2 on every launch).
Persist-by-default is now the behavior across all three /model surfaces
(CLI, gateway, TUI/dashboard), gated by a new config key
model.persist_switch_by_default (default true):
/model <name> switch model (persists to config.yaml)
/model <name> --session switch for this session only
/model <name> --global switch and persist (explicit, unchanged)
The effective persistence is resolved once via resolve_persist_behavior()
in hermes_cli/model_switch.py so --session opts out, --global opts in,
and the config-gated default applies otherwise. --global remains a valid
explicit no-op alias for the new default.
Commit 6724daa2c added refresh_interval=1.0 to keep the idle clock
ticking, but unconditional 1 Hz redraws in non-fullscreen prompt_toolkit
mode cause terminal emulators (Xshell, iTerm2, Windows Terminal) to
auto-scroll to the bottom on every tick — breaking scroll-up to read
history.
Drive it from display.cli_refresh_interval (0 = disabled, the default)
so users who want the ticking clock can opt in without affecting everyone.
Fixes: #48309
Related: 6724daa2c, 8972a151a
gui.log was registered in hermes_cli/logs.py::LOG_FILES (and surfaced by
`hermes logs gui`) but was never wired into `hermes debug share`. The share
report captured agent/errors/gateway/desktop tails plus full agent/gateway/
desktop logs — but nothing from gui.log, the surface the dashboard, TUI-over-
PTY bridge, and websocket layer (hermes_cli.web_server / pty_bridge /
tui_gateway) actually write to. A user reporting a dashboard or TUI bug shared
zero breadcrumbs from the broken surface.
Wire gui.log through all three share surfaces, matching the existing pattern:
- _capture_default_log_snapshots(): capture the gui snapshot (redacted like the rest)
- collect_debug_report(): add the gui.log summary tail block
- build_debug_share(): pull gui full_text, prepend dump header + redaction banner, add to the upload loop
- run_debug_share() --local branch: same, plus the local print block
- _PRIVACY_NOTICE: name gui.log in both bullets
Redaction is inherited for free — the gui snapshot goes through the same
_capture_log_snapshot(..., redact=redact) path, so secrets are scrubbed in
both the tail and full text (verified E2E: seeded key masked by default,
passes through under --no-redact, raw token never leaks).
Tests: seed gui.log in the fixture, add test_report_includes_gui_log, and bump
the upload-count tripwire 4->5 (test_share_uploads_five_pastes).
The built-in Piper provider (tts.provider: piper, Python piper-tts
package) already constructs piper.SynthesisConfig for the advanced
tuning knobs, but did not forward speaker_id from the user config.
This wires tts.piper.speaker_id through to SynthesisConfig.speaker_id
so multi-speaker ONNX models (e.g. libritts_r) can be addressed via
config without dropping to the command-provider path.
Changes:
- Add speaker_id to the has_advanced tuple so setting it triggers
SynthesisConfig construction (same gating as the other knobs).
- Pass speaker_id=speaker_id to SynthesisConfig. Defaults to 0
(Piper's own default; single-speaker models ignore the field).
- Tolerant parse: bad input (non-int strings, lists, dicts) is
dropped to 0 instead of raising. Booleans are rejected outright
(True/False would silently coerce to 1/0 and hide a config
mistake). Mirrors the same shape as the command-provider's
_resolve_command_tts_optional_number helper.
speaker_id is applied per-call via syn_config.speaker_id, so the
PiperVoice cache key is intentionally left as just (model, cuda) --
the same loaded model serves all speakers. Tests cover the
config knob, the tolerant parse, and the no-reload invariant.
sentence_silence is intentionally not added here: the Python
piper-tts SynthesisConfig does not expose that field (CLI-only).
/update calls dieWithCode(42) which tears down the gateway and
hard-exits the Node process — the same PTY-killing path that /exit
and /quit use. In the hosted dashboard chat there is no Python
update wrapper to catch exit code 42, and the PTY death bricks the
tab until a browser refresh.
Mirror the DASHBOARD_TUI_MODE guard that #48882 added for /exit and
/quit: refuse early with an explanatory message.
Address correctness gaps found in pre-PR review of the strict matcher:
- Profile selectors can appear on EITHER side of the `gateway` token
(`_apply_profile_override` strips `--profile`/`-p` from anywhere in argv
before argparse), so `hermes gateway --profile work run` and
`python -m hermes_cli.main gateway -p work run` are valid launches the
previous matcher wrongly rejected. Strip `--profile`/`-p`/`--profile=`/`-p=`
from anywhere before locating the subcommand.
- A profile literally named `gateway` (`hermes -p gateway gateway run`) made
the old token scan stop on the profile value; stripping the selector+value
first fixes it.
- Tokenize quote-aware with `shlex` so quoted Windows paths containing spaces
(`"C:\Program Files\Hermes\hermes-gateway.exe"`) are no longer split mid-path
and the dedicated-entrypoint match survives.
Without these, the matcher could MISS a real running gateway -> the opposite
failure (restart/status reporting "down" when up). Adds regression tests for
all three shapes.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The pyproject addopts pin `--timeout-method=signal` relies on signal.SIGALRM,
which doesn't exist on Windows. pytest-timeout raised AttributeError at timer
setup and aborted the entire run before any test executed, so the suite was
unrunnable on Windows by default. Override timeout_method to "thread" on
Windows in pytest_configure; POSIX keeps the more reliable signal method.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`hermes gateway restart` on Windows could take the gateway offline with no
replacement. restart() was stop() -> sleep(1.0) -> start(), but the graceful
drain can run up to ~180s while the detached pythonw process stays alive. The
1s sleep let start() run against the still-draining old process; its
"already running" guard then no-opped, and when the old process finally exited
nothing relaunched it.
Two root causes, both fixed:
1. Loose PID detection. `_scan_gateway_pids` and the gateway.status helpers
used substring matches ("... gateway" in cmdline) for lifecycle decisions,
so they false-matched `gateway status`/`dashboard` siblings and unrelated
processes like `python -m tui_gateway`, plus stale gateway.pid records.
Add a shared strict matcher `looks_like_gateway_command_line()` in
gateway/status.py that requires the real `gateway run` subcommand (or the
dedicated entrypoints), and route `_looks_like_gateway_process`,
`_record_looks_like_gateway`, and `_scan_gateway_pids` through it.
2. restart() race. Wait until the gateway is authoritatively gone
(`get_running_pid()` + strict `_gateway_pids()`) before relaunch; force-kill
once if it lingers and raise rather than start a duplicate; verify the
relaunch produced a running gateway and raise loudly if not (no more
exit-0 silent outage).
Scoped to Windows; systemd/launchd restart paths are already drain-aware.
Adds tests/gateway/test_gateway_command_line_matcher.py.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replaces the tautological test from the original PR (which asserted a
plain assignment it performed itself in the test body) with one that
exercises the actual contracts: _init_cached_agent_for_turn leaves
max_iterations untouched, and the per-turn IterationBudget rebuild
(turn_context.py) propagates a refreshed cap.
When a gateway agent is reused from cache, it retains the max_iterations
from its initial creation. If config.yaml agent.max_turns or HERMES_MAX_ITERATIONS
changed between turns, the cached agent's budget becomes stale.
Before reusing a cached agent, refresh agent.max_iterations from the
freshly-resolved value (read from env/config at line 14585).
Fixes partial issue from PR #48127: handles fresh agent creation + cached agent reuse.
The Discord fix (previous commit) handles dict-shaped clarify choices at the
Discord adapter only. The same dict-repr leak originates upstream at
tools/clarify_tool.py's str(c).strip() normalization — the single
platform-agnostic point both the CLI and every gateway adapter flow through.
When an LLM emits [{"description": "..."}] instead of bare strings, str(c)
produced {'description': '...'} which leaked onto the CLI panel
(cli.py:13048/13081), was returned verbatim as the user's answer
(cli.py:11945), and hit Telegram's numbered list too.
Add _flatten_choice (same label->description->text->title unwrap as the
Discord adapter, name/value excluded, keyless dicts dropped) and apply it at
the normalization line. Fixes CLI + Telegram + all platforms at the root;
the Discord smart-truncation now operates on already-clean text.
Adds johnjacobkenny to AUTHOR_MAP for the salvaged commit.
Two bugs surfaced from production usage in #37134:
1. Dict choices rendered as Python repr. LLMs sometimes emit
[{"description": "..."}] instead of bare strings; the old
str(c).strip() coercion turned the whole dict into
"{'description': '...'}" on the button label.
Fix: add a _flatten_choice helper that unwraps dicts against
the canonical LLM tool-call user-facing keys (label, description,
text, title) in that order. Dicts with none of those keys are
dropped. The "name" and "value" keys are deliberately NOT in the
priority list — they're Discord-component-shaped fields that
could appear in dicts that aren't meant to be choices (a
developer-error wiring that passes a Button-shaped object);
picking them would leak raw enum values or 4-char model
identifiers onto user-facing buttons.
2. Mid-word truncation on long button labels. The old
choice[:72] + "..." cut at position 72, mid-word. Worse, the
three-char ellipsis ate into the 80-char Discord label cap,
leaving only 75 chars of body.
Fix: budget-aware cut strategy with three tiers:
a. Last space in the trailing half of the budget (word boundary).
b. Last soft boundary (- , . )) in the trailing half — used
only when no word boundary exists.
c. Hard cut at the budget limit (last resort).
Use single U+2026 (…) to fit the cap. Cut AT soft boundaries
(inclusive) so the label ends on the boundary char rather than
on the alpha char that followed it.
Tests:
- test_unwraps_dict_choices_to_description: reproduces the
screenshot in #37134, asserts the Python repr is gone.
- test_unwrap_prefers_description_over_name_in_multi_key_dict:
regression guard for the name-key order in the unwrap list.
- test_unwrap_prefers_label_over_description: regression guard
for label winning over description.
- test_unwrap_does_not_pick_value_or_name_alone: regression
guard for the "name"/"value" fields being absent.
- test_truncates_long_choice_label: 200-char input, asserts
total <= 80 and U+2026.
- test_truncates_long_choice_label_breaks_on_word_boundary:
asserts the cut is on a space, not mid-word.
- test_truncates_long_no_space_choice_on_soft_boundary:
adversarial input where position 76 is mid-word alpha, asserts
the renderer falls back to a soft boundary.
Parity: telegram clarify suite (12 tests) still passes; the
helper is a Discord adapter local, not shared with the gateway.
Follow-up: gateway/platforms/telegram.py has the same str(c).strip()
pattern in its own send_clarify and will need a similar fix
(separate PR to keep this diff reviewable).
Fixes#37134
Unit-test `storedSessionIdForNotification`: runtime ids resolve to their
stored id, unknown ids and empty maps pass through unchanged, the right
stored id is picked among several sessions, and stored ids (map keys) are
never rewritten.
Native notifications (approval / sudo / secret / clarify) are tagged with
the gateway *runtime* session id — the key under which the session lives in
the gateway's in-memory `_sessions` map and the id every event carries
(`tui_gateway/server.py` `_emit(event, sid, ...)`). The chat route, however,
is keyed by the *stored* session id (`stored_session_id`), which is a
different value: a new chat gets its runtime id immediately but its stored id
only once the first turn persists.
`onFocusSession` navigated straight to `sessionRoute(<runtime id>)`, so
clicking a notification (e.g. an approval prompt) sent the route-resume path a
runtime id where it expects a stored id. `useRouteResume` then resumed it as a
stored session -> REST `/api/sessions/<runtime id>` 404 "session not found",
and the running session was navigated away, which the user experiences as the
session being destroyed.
Translate runtime -> stored before navigating via the existing
`runtimeIdByStoredSessionId` map (new `storedSessionIdForNotification`
helper), falling back to the id as-is when no mapping is known. The
Approve/Reject notification button path is untouched: `approval.respond` is
routed by the runtime id (`_sess()` -> `_sessions[session_id]`), so it must
keep carrying the runtime id.
_is_compression_ancestor walked parent links in a 100-hop Python loop
issuing two SELECTs per hop and hand-re-encoded the compression
continuation edge a fourth time. Collapse it into a single recursive CTE
that reuses the canonical _COMPRESSION_CHILD_SQL fragment (already shared
by _ephemeral_child_sql and set_session_archived), so the edge definition
lives in exactly one place. The UNION recursion also dedups visited nodes,
making it cycle-safe without the defensive hop cap. Behavior is unchanged
(all TestSessionTitleLineage + existing title-command tests pass).
Regression tests for renaming a compression continuation back to its base
title: single- and multi-level chains transfer the title off the ended
predecessor, while unrelated sessions and non-compression children (created
while the parent was live) still raise the uniqueness conflict.
When context compression rotates a session, the original is ended and the
continuation is auto-numbered (e.g. "name" -> "name #2"). The session list
projects the ended root behind its live tip, so the user never sees the
predecessor. But set_session_title's uniqueness check compared against ALL
sessions, so renaming the visible tip back to "name" dead-ended with
"Title 'name' is already in use by session <id the user can't find>".
When the conflicting title is held by a compression ancestor of the session
being renamed, transfer the title instead of raising: clear it from the
ended predecessor and apply it to the continuation. Uniqueness is preserved
(still exactly one session carries the title) and the parent-link lineage is
untouched, so resume-by-title and tip projection keep working. Genuine
conflicts with unrelated sessions, and with non-compression children
(delegate/branch), still raise as before.
The salvaged PR added the vitest devDep + config + a unit test but never
added a "test" script to web/package.json, so "npm run test" errored with
"Missing script: test" and the new suite was unrunnable. Add the script so
"npm run test" runs the suite as the PR body claimed (4/4 pass).
The dashboard's FastAPI server and a terminal CLI are separate processes
sharing one SQLite session DB; there is no inter-process push channel.
The Sessions page polled the 50 newest sessions every 5s for the
"overview" card but only re-fetched the paginated sessions list on page
change or delete, so a session started in a terminal never appeared in
the list until the user navigated.
Reuse the existing 5s overview poll as a change signal: when the head
session id changes, silently reload the current page (no loading
spinner flicker, no scroll/reset of expanded rows or bulk selection,
which are keyed by id). The detection logic is extracted into a pure
shouldRefreshSessions() helper with unit tests. Adds a minimal vitest
setup for web/ (test script + config).
TMUX is not forwarded over SSH, so a TUI launched on a remote host from
inside local tmux only sees TERM=tmux/tmux-256color with no TMUX var --
the cursor-drift bug still applies there. Extend supportsFastEchoTerminal()
to also fall back when TERM is tmux-flavored.
Deliberately scoped to tmux* only, NOT screen*: GNU screen sets the same
screen/screen-256color TERM and has no reported drift, so widening to
screen would disable the optimization for those users with no evidence of
a bug (matching the original PR's stated out-of-scope note).
Adds tests for tmux-flavored TERM (disabled) and screen/xterm TERM
(stays enabled) to guard against accidental widening.
When /goal (and other _PENDING_INPUT_COMMANDS: retry, queue, q, steer,
plan, undo) were typed in the TUI desktop app, slash.exec returned error
4018 instructing the frontend to fall back to command.dispatch. Some
clients failed that client-side fallback, leaving the command empty and
surfacing "empty command" — the user's typed text was silently dropped.
slash.exec now routes pending-input commands to command.dispatch
internally, eliminating the fragile client-side fallback hop. The
response is exactly what command.dispatch would have produced, so the
TUI client behaves identically once the round-trip succeeds.
Salvaged from #48944 — rebased onto current main. The original PR's
source change and test_goal_command.py update are correct, but it missed
the second test surface: tests/tui_gateway/test_protocol.py's
parametrized test_slash_exec_rejects_pending_input_commands still
asserted the old 4018 rejection for retry/queue/q/steer/plan, turning CI
red (5 failures). That test is rewritten here as a behavior contract:
slash.exec for a pending-input command must yield the same payload as a
direct command.dispatch call, and must no longer emit the old
"pending-input command" fallback rejection.
Co-authored-by: kyssta-exe <kyssta-exe@users.noreply.github.com>
`hermes backup` walked every file under HERMES_HOME, excluding only
hermes-agent / node_modules / __pycache__ / backups / checkpoints. Python
dependency trees (plugin and MCP-server venvs, site-packages) and pip/uv
tool caches that live under HERMES_HOME were swept in file-by-file,
ballooning a backup to hundreds of thousands of entries that crawl for
hours — the reported "backup stuck for days / 426543 files" symptom.
Add the canonical regeneratable-dir names (.venv, venv, site-packages,
.tox, .nox, .pytest_cache, .mypy_cache, .ruff_cache — mirroring
agent.skill_utils.EXCLUDED_SKILL_DIRS) plus .cache to the backup's
exclusion set, used by both run_backup and the pre-update/pre-migration
_write_full_zip_backup. .archive is intentionally left in so the curator's
restorable archived skills still get backed up.
Tests cover each new dir name (excluded at any depth), that .archive and
cache-resembling files are kept, and an integration check that a planted
venv/site-packages/cache is pruned from the actual backup zip while
skills/config survive.
The batch tool_status values ('completed'/'error'/'pending') and the inbound
status alias sets were inline magic strings, duplicated across two checks in
_tool_result_status. Hoist them to module-level constants
(_TOOL_STATUS_* + _TOOL_STATUS_{ERROR,COMPLETED}_ALIASES) so the canonical
wire values and the alias->canonical mapping live in one place. Emitted
values are unchanged.
_messages_to_openviking_batch's pre-scan already parses and caches each
tool call's arguments into tool_calls_by_id. The pending-tool-call branch
re-parsed them via _tool_call_input(), a second parse and a second source
of truth. Reuse the cached tool_input when the id was cached (non-empty),
falling back to a parse only for the uncached empty-id case so arguments
are never dropped. No behavior change.
_OPENVIKING_RECALL_TOOL_NAMES hardcoded the three read-tool names as string
literals, which can silently desync from the *_SCHEMA["name"] constants on a
rename (the same drift the adjacent _CATEGORY_SUBDIR_MAP comment warns about).
Derive the set from SEARCH/READ/BROWSE_SCHEMA["name"] instead. Write tools
(viking_remember / viking_add_resource) remain intentionally excluded. Set
contents are unchanged.
The npm workspace pins a single npmDepsHash for fetchNpmDeps. Any change to
package-lock.json that doesn't also refresh that hash breaks the bundled
hermes-tui / hermes-desktop-renderer build for Nix flake consumers, and no
nix CI catches it — the workflow that ran fix-lockfiles was removed in
9eb0bcd6 ("change(ci): rip out nix ci for now").
Fetch the workspace deps with pkgs.importNpmLock instead. It resolves each
package from the lockfile's own integrity hashes, so package-lock.json is the
single source of truth and there is no separate hash to drift.
This also removes:
- the fix-lockfiles checker/refresher and its devShell wiring — it existed
only to keep npmDepsHash in sync, so it is dead once the hash is gone, and
its sole CI consumer was already removed in 9eb0bcd6;
- the patchPhase that normalized lockfile trailing newlines — importNpmLock's
npmConfigHook overwrites the lockfile rather than diffing it, so the
normalization is unnecessary.
npm-lockfile-fix is retained: importNpmLock requires an integrity-complete
lockfile, which that tool guarantees when the lockfile is regenerated.
Co-authored-by: ak2k <19240940+ak2k@users.noreply.github.com>
Two follow-up fixes on top of the cherry-picked structured-sync work:
- _messages_to_openviking_batch only added a recall tool result's id to
skipped_tool_ids when the id was non-empty. An empty tool_call_id (which
the canonical transcript can carry; agent_runtime_helpers defaults it to
"") poisoned the skip set with "", silently dropping any *other* tool
result that also lacked an id. Move the recall-skip add inside the
existing `if tool_id:` guard. Adds a regression test (mutation-checked:
fails on pre-fix code, passes after).
- _sync_trace_enabled() open-coded the canonical truthy-env check; reuse
utils.env_var_enabled (byte-identical {1,true,yes,on} semantics).
* feat(skills): add html-artifact skill, fold in sketch + architecture-diagram + concept-diagrams
Adds a unified `html-artifact` creative skill that produces self-contained,
single-file HTML artifacts — concept explainers, implementation plans,
status/incident reports, code-review walkthroughs, technical + educational
SVG diagrams, multi-variant design comparisons, and throwaway editors that
export their state back to the clipboard. Grounded in Anthropic's
html-effectiveness gallery (MIT); the house style (token block, serif/sans/
mono split, hand-rolled diffs, inline-SVG diagrams, graceful degradation) is
distilled from reading all 20 reference files.
Supersedes and removes three overlapping skills, folding their unique value in:
- sketch -> the fidelity dial (throwaway vs presentation) + the
multi-variant comparison layouts + the browser-vision
verify loop (references/fidelity-and-verify.md)
- architecture-diagram-> the dark "infra" token variant + double-rect masking +
semantic component palette (references/dark-tech.md,
templates/diagram.html infra mode)
- concept-diagrams -> the 9-ramp educational color system + the concept
archetype library (references/concept-archetypes.md,
the light design system in templates/diagram.html)
Structure:
- SKILL.md (description exactly 60 chars), 6 references, 3 templates
- templates verified by headless-Chrome render + vision inspection
- editor export logic (file://-safe clipboard, Promise-normalized) verified in node
Cross-references updated in claude-design (new disambiguation table row drawing
the design-taste vs information-artifact boundary), design-md, pretext, spike,
and kanban-video-orchestrator. Website skill docs + catalogs regenerated;
stale EN/zh-Hans per-skill pages pruned and i18n cross-refs fixed.
Not folded (intentionally orthogonal): excalidraw (.excalidraw JSON), p5js
(generative canvas), claude-design / popular-web-designs / design-md (visual
design taste / brand vocab / token spec).
* feat(skills): ship html-effectiveness gallery as fetched reference examples
Add scripts/fetch-examples.sh (idempotent clone/pull of Anthropic's MIT
html-effectiveness gallery) + references/examples.md mapping each of the 20
example files to a mode so the agent reads the right worked example. The clone
lands in references/examples/ and is gitignored (it's a 384KB upstream repo,
not vendored). SKILL.md workflow + reference list now point at it; falls back to
the distilled pattern references when offline.
* feat(skills): make reading a gallery example a required authoring step
Reading the matching html-effectiveness example is now workflow step 2 (was an
optional aside in step 3): fetch the gallery, read_file the file for your mode,
mirror its structure. Models skip optional steps; the examples are the ground
truth, so consulting one is mandatory. Added an 'Example' column to the
mode->build quick-reference table and a 'don't skip the example' pitfall.
Also dogfooded the skill: read 03-code-review-pr.html and 13-flowchart-diagram.html
raw and reconciled the distilled references against source — aligned diff-row tint
opacity to the source's 0.15 (was 0.18) and added the .ctx/.hunk rows in
house-style.md + base.html so they match 03-code-review-pr.html verbatim.
* docs(skills): explain the consolidation + bundled-vs-optional rationale
The supersession note only stated *what* was folded, not *why* the prune is
sound. Expand SKILL.md's intro into a 'Why this skill exists' section: the three
former skills emitted the same artifact and overlapped, so consolidating removes
which-one-do-I-load ambiguity; and the optional->bundled promotion of
concept-diagrams is footprint-safe because this skill has zero deps (only cost is
the 60-char description; everything else is progressive-disclosure). States the
bundling dividing line explicitly: zero install cost + broadly useful gets
bundled, real install cost (hyperframes: Node+FFmpeg+Chromium) stays optional.
Regenerated website per-skill page to match.
Follow-up to the salvaged hosted /exit fix. Instead of a separate 4-env-var
fingerprint (HERMES_TUI_INLINE + /opt/data HERMES_HOME + HERMES_WRITE_SAFE_ROOT
+ HERMES_DISABLE_LAZY_INSTALLS), gate /exit and /quit on the existing
DASHBOARD_TUI_MODE flag (HERMES_TUI_DASHBOARD) that the keyboard idle-exit
(useInputHandlers) and SIGINT-ignore (entry.tsx) paths already use. One hosted
detection mechanism instead of two divergent ones.
Extract the refusal text to an exported DASHBOARD_EXIT_DISABLED_MESSAGE so the
test asserts the same source of truth as production (no change-detector on the
literal). Test mocks only the DASHBOARD_TUI_MODE export via importActual so the
other env exports stay real.
- Drop empty entries before validating SLACK_ALLOWED_USERS so a trailing or
interior comma (which the gateway silently tolerates in
gateway/platforms/slack.py) is no longer rejected at the dashboard.
- Hoist the member-ID regex to a module-level _SLACK_MEMBER_ID_RE constant
and note it stays in sync with the frontend SLACK_MEMBER_ID_RE.
- Add a regression test for the trailing-comma case.
The new SLACK_ALLOWED_USERS validation rejected '*', but the Slack gateway
honors '*' as an allow-all wildcard (gateway/platforms/slack.py DM auth,
slash-confirm, and approval-button paths). Accept '*' as a valid list entry
in both the API validator and the dashboard form so a value the runtime
honors is no longer blocked at setup.
* fix(relay): enable RELAY platform + normalize dial URL so hosted gateways actually connect
Three bugs blocked a self-provisioned hosted gateway from ever establishing its
inbound relay WS (found while standing up the live staging end-to-end). Each
masked the next; all three are needed for inbound to work.
1. RELAY platform never enabled in config.platforms (gateway/config.py).
register_relay_adapter() puts the adapter in the platform_registry, but
start_gateway()'s connect loop iterates self.config.platforms — which never
contained Platform.RELAY. So the adapter was "registered" but never connected
(logs showed "relay adapter registered" then "No messaging platforms
enabled"). Fix: _apply_env_overrides now enables Platform.RELAY (mirroring
relay_url into extra for the connected-checker) when GATEWAY_RELAY_URL (env)
or gateway.relay_url (yaml) is set. Absent -> no RELAY entry (direct/
single-tenant gateways unaffected).
2. URL scheme not converted for the WS dial (gateway/relay/ws_transport.py).
The relay URL is configured once as the http(s):// base (used as-is for the
provision POST), but websockets.connect rejects http(s):// with "scheme isn't
ws or wss". Fix: _ws_dial_url converts https->wss / http->ws.
3. /relay path not appended (same helper). The connector mounts its
WebSocketServer at path "/relay" and returns HTTP 400 on an upgrade to any
other path. GATEWAY_RELAY_URL is the base (no /relay), so the dial hit "/"
-> 400. Fix: _ws_dial_url ensures the path ends in /relay. Idempotent — a URL
already carrying ws(s):// and/or /relay is unchanged, so provision's
_provision_url (which derives /relay/provision from either form) still works.
Why the cross-repo E2E missed #2/#3: the stub connector binds ws://host:port and
its websockets.serve accepts ANY path, so neither the scheme nor the /relay path
was exercised. Real connector needs both.
Verified live on staging hermes-agent-stg-automated-perception-5054: after the
fixes the gateway logs "Connecting to relay..." -> "✓ relay connected" ->
"Gateway running with 1 platform(s)" against
wss://gateway-gateway.staging-nousresearch.com/relay, stable.
Tests: added _ws_dial_url scheme+path+idempotency cases (test_ws_transport.py)
and RELAY-platform-enablement cases for env + yaml + absent (test_config.py).
Full gateway/relay + config suites green (191 passed).
Relay-adapter lane. EXPERIMENTAL.
* fix(relay): re-attach guild_id to outbound so connector egress resolves the tenant
The final bug in the hosted-relay round-trip. Inbound worked end to end (Discord
-> connector -> bus -> agent WS -> agent runs -> reply), but the reply's egress
was declined by the connector: "discord egress declined: target not routed to an
onboarded tenant".
Cause: the connector's routedEgressGuard resolves the owning tenant from the
OUTBOUND action's metadata.guild_id (Discord's routing discriminator). The
gateway's generic delivery path builds outbound metadata via
run.py _thread_metadata_for_source, which only carries thread_id (and returns
None entirely for a non-threaded message) — so guild_id never reached the
connector, tenant resolution failed, and the shared bot refused to post.
Fix (relay-adapter-local, no perturbation of the generic delivery path or other
platforms): RelayAdapter learns chat_id -> guild_id from each inbound event
(_capture_scope) and re-attaches it to the outbound action's metadata in send()
(_with_scope) when not already present. No-op for chats we never saw inbound
(e.g. DMs) and never overwrites an explicit guild_id.
Verified live on staging hermes-agent-stg-automated-perception-5054: an
@mention in #general now produces a visible bot reply — full multi-tenant relay
round-trip (real Discord -> shared connector bot -> tenant routing -> agent WS ->
reply egress -> Discord).
Tests: _capture_scope/_with_scope reattach, no-scope no-op, explicit-guild_id
preserved (test_relay_adapter.py). Full relay + config suites green (160 passed).
Relay-adapter lane. EXPERIMENTAL.
systemctl --user restart hermes-gateway run via the terminal tool is a
child of the gateway itself. When systemd delivers SIGTERM the gateway
kills this subprocess before it can complete, so the service may never
restart — reproducing issue #37453.
The hermes gateway restart/stop guard (hermes_cli/gateway.py) and the
cron-path guard (hermes_cli/cron.py) already block equivalent commands
in their respective paths but the terminal tool had no such defense.
Add a hard-block before command execution in terminal_tool: when
_HERMES_GATEWAY=1 and the command matches _contains_gateway_lifecycle_command,
return an error immediately. force=True cannot bypass it — unlike the
normal dangerous-command approval flow, here even a user-approved restart
would fail because the SIGTERM propagates to child processes.
Also extend _GATEWAY_LIFECYCLE_PATTERNS to match systemctl with flags
(e.g. systemctl --user restart) — the previous regex required the
action word immediately after systemctl with no flags in between.
Adds 9 regression tests: 6 blocked variants (parametrized), force bypass
attempt, safe systemctl passthrough, and guard-inactive-outside-gateway.
* feat(image-gen): add image-to-image / editing to image_generate
Brings image generation to parity with video generation: the unified
image_generate tool now edits/transforms a source image (image-to-image)
when given image_url / reference_image_urls, routing to each backend's
edit endpoint, exactly as video_generate routes to image-to-video.
- ImageGenProvider ABC: generate() gains keyword-only image_url +
reference_image_urls; new capabilities() declares modalities +
max_reference_images (defaults to text-only, backward compatible).
success_response gains a modality field; adds normalize_reference_images.
- image_generate tool: schema exposes image_url + reference_image_urls;
dynamic schema reflects the active model's actual edit capability so the
agent knows when image_url is honored. Handler + plugin dispatch forward
the new inputs; legacy/text-only providers get a clear modality_unsupported
error instead of silently dropping the source image.
- In-tree FAL: 7 models gain edit endpoints (flux-2-klein, flux-2-pro,
nano-banana-pro, gpt-image-1.5, gpt-image-2, ideogram/v3, qwen-image)
with per-model edit_supports whitelists + reference caps; routes to the
/edit endpoint and skips the upscaler for edits.
- Plugins: openai (images.edit, 16 refs), xai (/v1/images/edits via
grok-imagine-image-quality, JSON body per xAI docs), krea
(image_style_references, 10 refs). openai-codex stays text-only and
rejects edits with an actionable error.
- Tests: 15 new (payload, routing, dispatch forwarding, dynamic schema,
capabilities); updated 2 change-detector/lambda tests for the new schema.
- Docs: image-generation feature page, image-gen provider plugin guide,
tools reference.
* fix(image-gen): preserve legacy passthrough in fal/krea plugin tests
Two existing plugin tests asserted pre-image-to-image behavior:
- fal: forward image_url/reference_image_urls only when supplied, so a
text-to-image delegation stays byte-identical (no None kwargs).
- krea: keep dict-shaped image_style_references refs verbatim (the unified
string refs go through normalize_reference_images; legacy non-string ref
objects pass through unchanged) — fixes KeyError when callers pass the
richer Krea ref-object shape.
* fix(image-gen): clearer not-capable message for text-to-image-only models
When a text-to-image-only model (incl. gpt-image-2 on the Codex OAuth path,
which can't do editing through the Responses image_generation tool) gets a
source image, say 'this model is not capable of image-to-image / editing —
provide a text-only prompt' rather than sending the user shopping for other
backends. Applies to the openai-codex guard, the in-tree FAL no-edit-endpoint
error, and the dynamic tool-schema text-only line.
The desktop model picker had no way to force a fresh model fetch: model.options
went through the 1h-cached provider_models_cache.json, and there was no flag to
bust it. When a provider's cached list expired and its next live fetch failed,
the picker fell back to the curated static list — silently dropping live-only
models (e.g. OpenCode Zen's free tier like deepseek-v4-flash-free) the user had
been using.
- Thread refresh through model.options (RPC + REST /api/model/options) ->
build_models_payload -> list_authenticated_providers, which calls
clear_provider_models_cache() up front when set so every row re-fetches live.
- Add a 'Refresh Models' control to the desktop picker (5-locale i18n, spinning
sync icon). Normal opens leave refresh=false to stay snappy on the cache.
Verified: stale cache hides deepseek-v4-flash-free -> refresh busts it -> live
re-fetch surfaces it. refresh=false never touches the cache.
Live-test finding: the Chronos fire webhook was only on the APIServerAdapter
(aiohttp), but hosted agents expose `hermes dashboard` (the FastAPI web_server
app on :9119) as their public URL — NOT the api_server adapter. So NAS's relay
callback to {callback_url}/api/cron/fire could never reach the verifier on a
hosted agent (the exact target environment). Two layers were wrong:
1. Wrong server: /api/cron/fire didn't exist on the dashboard app. Added
cron_fire_webhook there, alongside the existing /api/cron/* dashboard routes.
It resolves the job's profile (_find_cron_job_profile) and runs fire_due via
the resolved provider under the cron-profile retarget lock
(_fire_cron_job_for_profile, mirroring _call_cron_for_profile) so the CAS
claim + run_one_job operate on the right profile's jobs.json. Runs with no
live adapters (delivery falls back to the per-platform send path, like the
desktop cron path). 202 + background so a long turn never trips NAS's
timeout; the store CAS de-dupes a NAS retry. job-not-found -> 200 "gone".
2. Auth gate: the dashboard auth middleware 401s any non-cookie request before
the handler runs. Added /api/cron/fire to the shared PUBLIC_API_PATHS so the
NAS bearer-JWT callback reaches the verifier — the JWT (purpose=cron_fire),
not the cookie, is the real gate. One shared frozenset feeds both the
loopback and OAuth middlewares, so no drift.
Kept the APIServerAdapter route too (valid self-host api_server surface).
Contract doc updated to name the dashboard app as the hosted-agent callback
surface.
Tests: test_cron_fire_dashboard (6) — route registered on the dashboard app,
in PUBLIC_API_PATHS, 401 on bad token WITH the cookie gate engaged (proves it's
reachable past the gate + JWT is the gate), 400 missing job_id, 200 gone for
unknown job, 202 + fire_due invoked for the resolved profile on a valid token.
Full hermes_cli + cron + chronos + webhook suites green (7637).
Why the original tests missed it: the api_server webhook test built an
APIServerAdapter client directly and never asserted which server the hosted
public URL exposes — green-but-wrong-integration. The new test pins the route
to the dashboard app.
* fix(dashboard): resolve chat TUI argv off event loop
Dashboard chat now resolves its TUI launch command off the
FastAPI/WebSocket event loop. The resolver can run `npm install` /
`npm run build` through `_make_tui_argv()`, and doing that synchronously
in `/api/pty` can block proxy keepalives and other dashboard WebSocket
work long enough for reverse-proxy deployments to drop the chat
connection.
This keeps the current TUI build policy intact: normal production
launches still run the correctness-first `npm run build` path, while
`HERMES_TUI_DIR` remains the prebuilt/no-build path for distros and
containers. The change only moves the potentially slow resolver work to
a worker thread for the dashboard chat path, serialized by an
`asyncio.Lock` so concurrent chat tabs preserve one-build-at-a-time
behavior. `SystemExit` (node/npm missing) and the profile `HTTPException`
path still propagate cleanly through `asyncio.to_thread()`.
Salvaged from #26124 — rebased onto current main. The async wrapper now
threads the `profile` parameter that `_resolve_chat_argv` gained on main
since the PR was opened, so cross-profile chat is preserved.
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
* chore: add 0xdany to AUTHOR_MAP
* fix(dashboard): bind chat-argv lock to app.state; cover error propagation
Self-review hardening on top of the salvaged fix:
- Move `_chat_argv_lock` from a module-level `asyncio.Lock()` onto
`app.state` (initialised in `_lifespan`, lazy fallback via
`_get_chat_argv_lock`), mirroring `event_lock`. A module-level
`asyncio.Lock()` binds to whatever event loop is active at import time,
which is the exact pattern `_get_event_state`'s docstring warns against
(breaks across TestClient instances / uvicorn reloads). This keeps the
lock on the running loop.
- Add two tests exercising the real `_resolve_chat_argv_async` →
`asyncio.to_thread` → lock → re-raise chain: `SystemExit` (node/npm
missing) and `HTTPException` (invalid profile) both propagate out of the
worker thread and are caught by `pty_ws`'s existing handlers. The prior
tests mocked `asyncio.to_thread` away and never covered this path.
* test(dashboard): dedupe pty error-propagation tests; assert close code
simplify-code cleanup pass on the salvage stack:
- Extract the shared scaffolding of the two pty_ws error-propagation tests
into `_assert_pty_propagates`, keeping the two tests as distinct contracts
for the `except SystemExit` and `except HTTPException` arms.
- Assert the stable WebSocket close code (1011) instead of relying solely on
the user-facing "Chat unavailable" notice wording — a behavior contract per
the AGENTS.md "behavior contracts over snapshots" rule, robust to notice
rewording. The detail substring ("unknown profile") is still checked for the
HTTPException case since proving the detail survives the thread hop is the
point of that test.
No production-code change; the helper exercises the same real
_resolve_chat_argv_async -> asyncio.to_thread -> lock -> re-raise chain.
---------
Co-authored-by: draihan <draihan@student.ubc.ca>
git worktree lock at creation and unlock before removal. A locked
worktree refuses 'git worktree remove' (and prune), so a second hermes
process or a stray cleanup can't silently delete an in-use isolated
worktree. Fail-soft on both paths — a lock/unlock error never blocks
the session or cleanup.
Salvaged from #47029 (Issue #46303). Unlock moved to the actual-removal
path so a preserved (unpushed-commits) worktree stays locked while in use.
When SessionDB init fails, the CLI/Desktop previously continued live with only
a buried log line. The chat looks healthy, but the transcript is never written
to state.db — so resume later shows a truncated or empty session and the user
only discovers the loss after the fact (#41386).
Emit a prominent stderr banner at startup when the store is unavailable, making
it explicit that the conversation will not be saved and cannot be resumed, with
a pointer to fix the store. Also set _session_db_unavailable so downstream code
can detect the degraded state.
- Scope 'no such tokenizer' matcher to trigram specifically (#779)
- Decouple base FTS and trigram backfill in v11 migration (#1195)
- CJK search falls back to LIKE when trigram unavailable (#3384/#3430)
- Add _trigram_available tracking across init, migration, and startup
- Add regression tests for migration backfill and CJK LIKE fallback
- Add _is_trigram_unavailable_error and _warn_trigram_unavailable helpers
_is_fts5_unavailable_error only matched 'no such module: fts5', but
SQLite builds that ship FTS5 without the optional trigram tokenizer
raise 'no such tokenizer: trigram' instead. This caused SessionDB init
to crash on those builds.
Additionally, the trigram failure path called _warn_fts5_unavailable
which set _fts_enabled = False, globally disabling full-text search
even though the base FTS5 table was created successfully.
Fix:
- Extend _is_fts5_unavailable_error to also match 'no such tokenizer'
- Add _is_tokenizer_unavailable_error to distinguish tokenizer-specific
failures from whole-module absence
- Only call _warn_fts5_unavailable for module-level failures; skip it
for tokenizer-specific failures so base FTS5 remains usable
Fixes#47002
self_provision_if_managed() gated on is_managed(), but is_managed() means
"NixOS/package-manager-managed" (it keys on HERMES_MANAGED or a ~/.hermes/.managed
marker) — NOT "NAS-hosted". A NAS-provisioned Fly agent sets NEITHER, so the gate
was always False and relay self-provision SILENTLY no-oped on exactly the hosted
agents it was built for. Caught live: a staging agent with GATEWAY_RELAY_URL
correctly stamped logged "No messaging platforms enabled" and never dialed the
connector; HERMES_MANAGED was unset on the machine. The unit tests had mocked
is_managed()->True, so they passed while the real trigger never fired (mocked-
trigger blind spot).
Fix: drop the is_managed() gate and rename self_provision_if_managed ->
self_provision_relay. The real trigger is now "relay_url() set + no pinned secret
+ a resolvable NAS token", which is both NAS-independent and self-guarding:
- NAS-hosted agent: GATEWAY_RELAY_URL + no pinned secret + bootstrapped NAS
token -> self-provisions.
- Self-hosted + `hermes gateway enroll`: pinned GATEWAY_RELAY_SECRET -> skipped
(existing secret-present guard).
- Self-hosted, unenrolled, no NAS identity: resolve_nous_access_token() fails
-> graceful no-op (existing fail-soft path).
Security: unchanged trust model. The connector still derives tenant from the
validated NAS token; this only broadens WHEN the provision attempt fires, and
every broadened case is still guarded by token-resolution + pinned-secret-skip.
Tests: replaced the (wrong) "skips when not managed" test with a regression test
proving a NAS host where is_managed()==False STILL provisions; renamed all call
sites; added a "no NAS token -> non-fatal skip" test for the self-hosted branch.
88 relay tests pass.
Relay-adapter lane. EXPERIMENTAL.
@nous-research/ui@0.18.2 Button is grid-based: size=xs is an
aspect-square icon-only box, and icons belong in prefix/suffix.
The dashboard used shadcn-style size=xs + inline <Icon/> text
children, which forced text buttons into broken tall squares
(Configure, Run setup, Select, Save keys) and split icon/label
across grid columns elsewhere (Schedule it, Prune/Delete actions).
Move leading icons to prefix and size text buttons as sm/default.
For the post-setup spinner, drive the spin from a button-level
[&_svg]:animate-spin selector since the prefix slot clones the
icon and overwrites its className.
- ToolsetConfigDrawer: Select, Save keys, Run setup
- SkillsPage: New skill, Configure
- AutomationBlueprints: Schedule it
- SessionsPage: Prune old sessions, Delete empty, Delete selected
The non-retryable abort path now computes _nonretryable_summary once and
reuses it at the emit sites and the returned error field. The
content-policy-blocked return branch still recomputed the identical
value into a separate _summary local, half-honoring the 'summarize once'
intent. _summarize_api_error is a pure staticmethod and api_error is
never reassigned in this block, so _summary was provably byte-identical
to _nonretryable_summary. Reuse the hoisted value and drop the redundant
call. Behavior-preserving.
Locks the contract that a non-retryable failure (a Cloudflare 403
"managed challenge" page) returns a short, HTML-free `error` field —
guarding the field path where the raw page was dumped to Discord as
~31 messages.
The test drives the standard chat-completions path with a concrete
model so the turn actually reaches `client.chat.completions.create`,
where the mocked 403 is raised. It asserts the create call happened
(guarding against a vacuous pass — an empty model on the Codex
Responses path would otherwise abort on a validation ValueError before
any API call) and that the summarized error includes "403" while
excluding <html> / _cf_chl_opt. The non-retryable abort path is
provider-agnostic; a Cloudflare managed-challenge 403 can surface on
any provider behind Cloudflare.
When a non-retryable client error aborts the turn (e.g. a Codex/Cloudflare
HTTP 403 "managed challenge" page), the conversation loop returned the
failure dict with `error: str(api_error)` — the entire ~60KB HTML page.
Downstream consumers deliver that field verbatim: a cron job dumped a
Cloudflare challenge page to Discord, where it was split into ~31 messages.
The sibling "max retries exhausted" path already collapses such bodies via
`_summarize_api_error` (which extracts the <title> / status from HTML error
pages). This makes the non-retryable path consistent: compute the summary
once and use it for both the status emit and the returned `error`.
Follow-up to #47663 (streaming multipart upload), fixing two issues that
landed with it.
1. Temp file leaked on client disconnect. The streaming upload endpoint's
except chain caught only HTTPException / PermissionError / OSError — all
Exception subclasses. asyncio.CancelledError, raised when a browser aborts
a large upload mid-stream (the exact NS-501 scenario), is a BaseException,
so it bypassed every except clause and reached a finally that only closed
the file handle and never unlinked the temp file. Every aborted large
upload orphaned a partial `.{name}.*.upload` file (up to ~100 MB) in the
target directory. Cleanup now lives in finally, keyed on a `renamed`
success flag, so the temp file is removed on every non-success exit
including BaseException paths. Added test_stream_upload_cleans_temp_on_cancellation,
which fails on the pre-fix code (leaks the temp file) and passes with the fix.
2. python-multipart pinned to ==0.0.27 instead of ==0.0.20. The package was
already resolved at 0.0.27 transitively (via daytona) before #47663; the
explicit ==0.0.20 pin in the [web] extra and the tool.dashboard lazy-install
set downgraded it. Bumped both to ==0.0.27 and regenerated with `uv lock`,
keeping the lockfile coherent. The base dependency stays >=0.0.9,<1.
Phase 4F (F.1 + F.2 + F.3, agent side). F.4 is the operator-run live smoke
(needs a NAS deployment); recorded in the PR, not code.
F.1 — on_jobs_changed wiring:
- cron/scheduler.py: _notify_provider_jobs_changed() — resolve the active
provider, call on_jobs_changed(), swallow errors. Lives in scheduler.py (not
jobs.py) so the store stays free of provider imports (no import cycle).
- Wired at the consumer surfaces AFTER a successful mutation: the cronjob model
tool (tools/cronjob_tools.py, create/update/remove/pause/resume) — which the
`hermes cron` CLI also routes through — and the REST handlers
(gateway/platforms/api_server.py, same five). Built-in's no-op default = zero
behavior change on the default path. Sleeping-agent direct jobs.json writes
(no tool/CLI/REST) are covered by reconcile-on-wake in start().
F.2 — config: cron.chronos.{portal_url,callback_url,expected_audience,
nas_jwks_url}. All non-secret; the agent holds no scheduler creds and the
outbound provision call reuses the existing Nous token (no token key). Additive
deep-merge key, no version literal.
F.3 — docs:
- docs/chronos-managed-cron-contract.md: authoritative agent↔NAS wire contract
(the three agent-cron endpoints + inbound /api/cron/fire + the 3-hop trust
model + at-most-once/re-arm semantics). This is what the NAS-side agent builds
against.
- cron-internals.md: "Managed cron (Chronos) for scale-to-zero" section.
- cli-commands.md: cron.provider accepts chronos + the cron.chronos.* keys.
- User docs name no scheduler vendor (QStash is a NAS-internal detail).
INVARIANT re-verified: zero qstash/upstash hits across plugins/cron, gateway,
hermes_cli, tools, website/docs (the one remaining repo hit is an unrelated
Context7 MCP comment in tools/mcp_tool.py).
Tests: test_jobs_changed_notify (5) — notify calls provider hook, swallows
errors, built-in harmless, tool create/remove notify. Full cron + chronos +
webhook + config + api_server_jobs suites green (504 in the cron+chronos+webhook
run).
Phase 4E (E.1 + E.2). The inbound side of Chronos: NAS POSTs the agent when a
one-shot fires; the agent verifies a NAS-minted JWT and runs the job.
E.1 — plugins/cron/chronos/verify.py:
- verify_nas_fire_token(token, expected_audience, jwks_or_key, issuer): verifies
signature against the NAS JWKS (RS/ES family; symmetric rejected), aud == this
agent, exp/nbf, iss, and purpose == "cron_fire" (so a general agent JWT can't
be replayed against the fire endpoint). Returns claims or None; never raises.
Crypto delegated to PyJWT[crypto] (already a declared dep) — no hand-rolled
JWT, no new dependency. No key configured → refuse (never unsigned-decode a
security boundary).
- get_fire_verifier(): pluggable indirection so the DQ-4 escape hatch
(direct per-job cron-key) can swap in with no handler change.
E.2 — gateway/platforms/api_server.py:
- POST /api/cron/fire (registered only when _CRON_AVAILABLE). Authenticated by
the NAS-JWT via get_fire_verifier() — NOT API_SERVER_KEY (NAS holds no API
key; this is the only inbound that triggers remote job execution, so it gets
its own purpose-scoped check). Verifier args come from cron.chronos.* config.
401 on bad/missing/forged token. 400 on missing job_id. On success: 202 +
fire_due runs in the background (so a long agent turn never trips NAS's HTTP
timeout); the store CAS claim inside fire_due de-dupes a scheduler retry.
Tests:
- test_chronos_verify (11): REAL RS256 signing — valid→claims, wrong-aud,
missing/wrong purpose, expired, wrong-iss, tampered-signature (attacker key),
no-key-refuse, empty-token, JWKS-URL key resolution, get_fire_verifier.
- test_cron_fire_webhook (5): valid→202+fire, invalid→401+no-fire, missing
token→401, missing job_id→400, and fire path does NOT require API_SERVER_KEY.
api_server regression suites (214) green.
E.3 (NAS endpoints) is a separate cross-repo PR; the wire contract lands next
(docs/chronos-managed-cron-contract.md).
Phase 4D. The first non-default CronScheduler: plugins/cron/chronos/. Inert
unless cron.provider=chronos; resolve_cron_scheduler falls back to the built-in
if unavailable, so cron never loses its trigger.
Files:
- chronos/__init__.py — ChronosCronScheduler + register(ctx).
* is_available(): config-only, NO network (portal_url + callback_url + a
stored Nous access token via get_provider_auth_state). Returns False →
resolver falls back to built-in.
* start(): reconcile() then RETURN — no blocking loop, no 60s wake (DQ-1:
this is what makes scale-to-zero real; the machine wakes only on a
NAS→agent fire).
* _arm_one_shot(job): POST NAS provision {job_id, fire_at, agent_callback_url,
dedup_key=job_id:fire_at}. Agent owns the time → sub-minute fires survive
(no scheduler 1-minute floor).
* reconcile(): converge NAS arms toward jobs.json — arm missing/changed-time,
cancel orphaned, skip paused. Cold process rebuilds from jobs.json +
idempotent dedup_key.
* on_jobs_changed(): reconcile (re-arm/cancel the affected one-shot).
* fire_due(): ABC default (CAS claim + run_one_job) THEN re-arm the next
one-shot. Job gone (one-shot done / repeat-N exhausted) → no re-arm.
- chronos/_nas_client.py — thin HTTP wrapper for provision/cancel/list using
the agent's existing refresh-aware Nous token (resolve_nous_access_token).
Names no scheduler vendor; holds no scheduler creds.
- chronos/plugin.yaml — discovery metadata.
INVARIANT: zero "qstash"/"upstash" hits in plugins/cron, gateway, hermes_cli,
website/docs — the external scheduler is a NAS-internal detail, never named
agent-side.
Tests (13, all NAS mocked, zero network): is_available off-without-config +
on-with-config + makes-no-network; arm payload incl. sub-minute + noop without
next_run; reconcile arms-all / cancels-orphan / skips-paused / skips-already-
armed; fire_due re-arms next / no re-arm when job gone / no re-arm when claim
lost.
Phase 4C. claim_job_for_fire(job_id, *, claim_ttl_seconds=300) in cron/jobs.py:
under the existing _jobs_lock() file lock, claim a job for a single external
fire so that across N gateway replicas exactly ONE wins. Single-machine
deployments always win (unaffected).
Semantics:
- missing / disabled / paused job → False.
- a fresh fire_claim (younger than claim_ttl_seconds) already present → False
(someone else holds it). Stale claim (crashed winner) → overwrite, so a job
is never wedged forever.
- on win: stamp fire_claim={at, by:_machine_id()}; for recurring (cron/interval)
advance next_run_at (mirrors advance_next_run's at-most-once bump so a stale
re-delivery can't re-fire); one-shots keep next_run_at but the fresh claim
blocks a duplicate retry for the same fire.
- mark_job_run now clears fire_claim on completion so a re-armed recurring job
is claimable again next fire.
_machine_id() (HERMES_MACHINE_ID env, else hostname:pid) is attribution-only;
correctness is the file lock + fresh-claim check, not the id.
This is consumed by CronScheduler.fire_due (Phase 4B). tick is untouched — it
still uses advance_next_run, so the built-in single-machine path is unaffected.
Tests (real store, temp HERMES_HOME): claim-once-then-block + next_run advance,
one-shot no-double-claim, unknown→False, paused→False, stale-claim reclaimable,
mark_job_run clears the claim (recurring re-claimable). tests/cron/ 470 passed.
Phase 4B. Three NON-abstract hooks on the CronScheduler ABC, all with
built-in-safe defaults so the built-in inherits them without overriding and
test_abc_growth_stays_additive stays green (required surface still {name,
start}):
- on_jobs_changed(): post-mutation reconcile hook. Built-in no-op.
- fire_due(job_id): claim the job via the store CAS (claim_job_for_fire,
Phase 4C) then run it through the shared run_one_job (Phase 4A). Returns
False if the claim is lost or the job vanished (repeat-N exhausted between
arm and fire). The inbound webhook (Phase 4E) routes here.
- reconcile(): converge the external registry toward jobs.json. Built-in no-op.
fire_due imports claim_job_for_fire/get_job/run_one_job INSIDE the method, so
this commits cleanly before Phase 4C lands claim_job_for_fire (import-time is
unaffected; tests monkeypatch it with raising=False).
Tests: required-surface-unchanged guard, built-in inherits no-op defaults, and
fire_due's three paths (claim+run, lost-claim→no-run, missing-job→no-run).
tests/cron/ green (20 in test_scheduler_provider.py).
Phase 4A. Factor tick's per-job closure (_process_job: execute → save →
deliver → mark) into a module-level run_one_job(job, *, adapters, loop,
verbose) so the external Chronos provider's fire_due (Phase 4D) reuses the
IDENTICAL body — no duplicated correctness. tick's _process_job is now a thin
wrapper calling run_one_job; the pool/in-flight-guard/contextvars dispatch
logic is unchanged.
run_one_job fires ONE given job; it does NOT decide due-ness, claim, or compute
next_run (tick advances next_run_at under the file lock; an external provider
claims via the store CAS in Phase 4C). Pure refactor, no behavior change.
TDD: test_run_one_job.py characterizes the sequence through tick() first
(test_tick_process_job_sequence, passed pre-extraction), then unit-tests the
helper directly: success sequence, [SILENT]→skip delivery, empty-response soft
failure (#8585), failed-job-still-delivers, exception→mark-failed.
Verified: tests/cron/ 459 passed (was 453 + 6 new); tick behavior unchanged.
Phase 3.5. cron-internals.md gateway-integration section now describes the
pluggable trigger (resolve_cron_scheduler, built-in default, plugins/cron
discovery, the never-without-a-trigger fallback, and the trigger-vs-execution
split). cli-commands.md notes cron.provider near the hermes cron entry.
Phase 3 — rebind both ticker call sites to resolve_cron_scheduler(). Default
(built-in) path is byte-identical; Phase 0 characterization tests + the full
gateway suite (6919) stay green.
Task 3.1: split gateway/run.py _start_cron_ticker into:
- _start_gateway_housekeeping() — the gateway-only chores (channel-dir
refresh, image/doc cache cleanup, paste sweep, curator poll), now on their
own loop/thread, independent of which cron provider is active.
- _start_cron_ticker() — kept as a DEPRECATED shim that runs only the
built-in InProcessCronScheduler().start(), preserving the symbol for
hermes_cli/debug.py and the Phase 0 characterization test.
Task 3.2: start_gateway() resolves the provider and runs provider.start() in
the 'cron-scheduler' thread, plus a second 'gateway-housekeeping' thread;
teardown sets the shared cron_stop, calls provider.stop(), joins both.
Task 3.3: desktop _start_desktop_cron_ticker() swapped its inline tick loop for
resolve_cron_scheduler().start() (no adapters/loop — desktop has none).
The provider owns ONLY the cron tick (so an external scale-to-zero provider
with no 60s loop fits); gateway housekeeping is decoupled from the cron
trigger. Both threads share cron_stop.
Verified: full tests/cron/ (453) + full tests/gateway/ (6919) green. Manual
gateway smoke (Task 3.4) is operator-run, pending.
Phase 2 of the pluggable cron-scheduler refactor. Still no call-site changes;
this wires up provider SELECTION with a hard safety net.
Task 2.1: cron.provider config key (hermes_cli/config.py), empty = built-in.
Additive key — deep-merge picks it up into existing configs with no version
bump (verified: load_config() yields the key on a pre-existing config.yaml).
Task 2.2: plugins/cron/__init__.py — discovery machinery cloned near-verbatim
from plugins/memory/__init__.py, retargeted at CronScheduler /
register_cron_scheduler. Bundled (plugins/cron/<name>/) + user
(/plugins/<name>/) dirs, bundled wins collisions. The built-in is
NOT discovered here — it's core, so the fallback can't be removed.
Task 2.3: resolve_cron_scheduler() in cron/scheduler_provider.py — reads
cron.provider and ALWAYS degrades to built-in (missing / unavailable / load
error / typo all fall back with a warning). cron can never be left without a
trigger.
Deviation from plan: the plan's resolver snippet used cfg_get("cron.provider")
(dotted-string form). The real cfg_get signature is cfg_get(cfg, *keys,
default=) — corrected to cfg_get(load_config(), "cron", "provider", default=""),
matching plugins/memory/__init__.py:349. Tests monkeypatch load_config (not
cfg_get) so the real traversal runs.
Tests: default key empty, discovery returns list, unknown load returns None,
and the four resolver paths (empty→builtin, no-section→builtin,
unknown→builtin, unavailable→builtin, available→used). Full tests/cron/: 453
passed; config suite green (additive key, no migration break).
Phase 1 of the pluggable cron-scheduler refactor (Axis B — the trigger).
No call-site changes; this phase only makes the abstraction exist + tested
in isolation.
Task 1.1: cron/scheduler_provider.py — the EXPERIMENTAL CronScheduler ABC.
Required surface is name + start; is_available()/stop() carry safe defaults.
is_available has a no-network invariant. Docstring marks it experimental
until the Chronos provider (Phase 4) validates the shape.
Task 1.2: InProcessCronScheduler wraps the historical 60s ticker loop, calling
cron.scheduler.tick(sync=False) exactly as the raw ticker does. Uses
stop_event.wait(interval) for responsive stop (both raw tickers already do).
Tests: ABC-is-abstract, default-is_available, the InProcess loop drives tick
and stops, stop() no-op, and test_abc_growth_stays_additive (the forward-compat
guard: required abstractmethods must stay exactly {name, start}, so the three
Phase-4 hooks land as NON-abstract additions).
tick() internals in cron/scheduler.py are byte-unchanged (only new file added).
Phase 0 characterization tests still green. Full tests/cron/: 445 passed.
¡Gracias por contribuir a Hermes Agent! Esta guía cubre todo lo que necesitas: configurar tu entorno de desarrollo, entender la arquitectura, decidir qué construir y conseguir que tu PR sea aceptado.
---
## Prioridades de Contribución
Valoramos las contribuciones en este orden:
1.**Correcciones de errores** — bloqueos, comportamiento incorrecto, pérdida de datos. Siempre la máxima prioridad.
2.**Compatibilidad entre plataformas** — macOS, diferentes distribuciones de Linux y WSL2 en Windows. Queremos que Hermes funcione en todas partes.
3.**Fortalecimiento de seguridad** — inyección de shell, inyección de prompts, traversal de rutas, escalada de privilegios. Ver [Consideraciones de Seguridad](#consideraciones-de-seguridad).
4.**Rendimiento y robustez** — lógica de reintento, manejo de errores, degradación elegante.
5.**Nuevas habilidades** — pero solo las ampliamente útiles. Ver [¿Debería ser una Habilidad o una Herramienta?](#debería-ser-una-habilidad-o-una-herramienta)
6.**Nuevas herramientas** — raramente necesarias. La mayoría de las capacidades deberían ser habilidades. Ver más abajo.
Esta es la pregunta más común para los nuevos colaboradores. La respuesta casi siempre es **habilidad**.
### Hazlo una Habilidad cuando:
- La capacidad se puede expresar como instrucciones + comandos de shell + herramientas existentes
- Envuelve una CLI externa o API que el agente puede llamar a través de `terminal` o `web_extract`
- No necesita integración personalizada de Python ni gestión de claves API integrada en el agente
- Ejemplos: búsqueda en arXiv, flujos de trabajo de git, gestión de Docker, procesamiento de PDF, email a través de herramientas CLI
### Hazlo una Herramienta cuando:
- Requiere integración de extremo a extremo con claves API, flujos de autenticación o configuración de múltiples componentes gestionada por el harness del agente
- Necesita lógica de procesamiento personalizada que debe ejecutarse con precisión en cada ocasión (no "mejor esfuerzo" de la interpretación del LLM)
- Maneja datos binarios, streaming o eventos en tiempo real que no pueden pasar por el terminal
- Ejemplos: automatización de navegador (gestión de sesiones Browserbase), TTS (codificación de audio + entrega en plataforma), análisis de visión (manejo de imágenes base64)
### ¿Debería la Habilidad estar incluida?
Las habilidades incluidas (en `skills/`) se envían con cada instalación de Hermes. Deben ser **ampliamente útiles para la mayoría de los usuarios**:
- Manejo de documentos, investigación web, flujos de trabajo de desarrollo comunes, administración de sistemas
- Usadas regularmente por una amplia gama de personas
Si tu habilidad es oficial y útil pero no universalmente necesaria (ej., una integración de servicio de pago, una dependencia pesada), ponla en **`optional-skills/`** — se envía con el repositorio pero no está activada por defecto. Los usuarios pueden descubrirla a través de `hermes skills browse` (etiquetada como "oficial") e instalarla con `hermes skills install` (sin advertencia de terceros, confianza integrada).
Si tu habilidad es especializada, contribuida por la comunidad o de nicho, es mejor para un **Skills Hub** — súbela a un registro de habilidades y compártela en el [Discord de Nous Research](https://discord.gg/NousResearch). Los usuarios pueden instalarla con `hermes skills install`.
---
## Proveedores de Memoria: Publicar como Plugin Independiente
**Ya no aceptamos nuevos proveedores de memoria en este repositorio.** El conjunto de proveedores integrados en `plugins/memory/` (honcho, mem0, supermemory, byterover, hindsight, holographic, openviking, retaindb) está cerrado. Si quieres añadir un nuevo backend de memoria, publícalo como un **repositorio de plugin independiente** que los usuarios instalen en `~/.hermes/plugins/` (o a través de un entry point de pip).
Los plugins de memoria independientes:
- Implementan el mismo ABC `MemoryProvider` (`agent/memory_provider.py`) — `sync_turn`, `prefetch`, `shutdown` y opcionalmente `post_setup(hermes_home, config)` para integración con el asistente de configuración
- Usan el mismo sistema de descubrimiento — `discover_memory_providers()` los recoge desde directorios de plugins de usuario/proyecto y entry points de pip
- Se integran con `hermes memory setup` a través de `post_setup()` — sin necesidad de tocar el código base
- Pueden registrar sus propios subcomandos CLI a través de `register_cli(subparser)` en un archivo `cli.py`
- Obtienen todos los mismos hooks de ciclo de vida y plomería de configuración que los proveedores incluidos en el árbol
Los PRs que añadan un nuevo directorio bajo `plugins/memory/` serán cerrados con un puntero para publicar el proveedor como su propio repositorio. Los proveedores en árbol existentes se mantienen; las correcciones de errores para ellos son bienvenidas.
Esto no es una barra de calidad — es una decisión de acoplamiento y mantenimiento. Los proveedores de memoria son el tipo de plugin más común y no deberían vivir todos en este árbol.
---
## Configuración del Desarrollo
### Prerequisitos
| Requisito | Notas |
|-----------|-------|
| **Git** | Con la extensión `git-lfs` instalada |
| **Python 3.11+** | uv lo instalará si falta |
| **uv** | Gestor de paquetes Python rápido ([instalar](https://docs.astral.sh/uv/)) |
| **Node.js 20+** | Opcional — necesario para herramientas de navegador y puente WhatsApp (coincide con los engines de `package.json` raíz) |
| `~/.hermes/state.db` | Base de datos de sesiones SQLite |
| `~/.hermes/sessions/` | Índice de enrutamiento del gateway (`sessions.json`), migas de pan de solicitudes, transcripciones `*.jsonl` del gateway y (opcionalmente) snapshots JSON por sesión cuando `sessions.write_json_snapshots: true` está configurado. Los snapshots por sesión están desactivados por defecto; state.db es canónica. |
| `~/.hermes/cron/` | Datos de trabajos programados |
| `~/.hermes/whatsapp/session/` | Credenciales del puente WhatsApp |
---
## Descripción General de la Arquitectura
### Bucle Central
```
Mensaje del usuario → AIAgent._run_agent_loop()
├── Construir prompt del sistema (prompt_builder.py)
├── Construir kwargs de API (modelo, mensajes, herramientas, configuración de razonamiento)
├── Llamar al LLM (API compatible con OpenAI)
├── Si tool_calls en la respuesta:
│ ├── Ejecutar cada herramienta a través del despacho del registro
│ ├── Añadir resultados de herramientas a la conversación
│ └── Volver a la llamada al LLM
├── Si respuesta de texto:
│ ├── Persistir sesión en DB
│ └── Devolver final_response
└── Compresión de contexto si se acerca al límite de tokens
```
### Patrones de Diseño Clave
- **Herramientas auto-registradas**: Cada archivo de herramienta llama a `registry.register()` en el momento de importación. `model_tools.py` activa el descubrimiento importando todos los módulos de herramientas.
- **Agrupación en toolsets**: Las herramientas se agrupan en toolsets (`web`, `terminal`, `file`, `browser`, etc.) que pueden habilitarse/deshabilitarse por plataforma.
- **Persistencia de sesión**: Todas las conversaciones se almacenan en SQLite (`hermes_state.py`) con búsqueda de texto completo y títulos de sesión únicos.
- **Inyección efímera**: Los prompts del sistema y los mensajes de relleno se inyectan en el momento de la llamada API, nunca se persisten en la base de datos ni en los logs.
- **Abstracción de proveedor**: El agente funciona con cualquier API compatible con OpenAI. La resolución del proveedor ocurre en el momento de la inicialización.
- **Enrutamiento de proveedor**: Al usar OpenRouter, `provider_routing` en config.yaml controla la selección del proveedor.
---
## Estilo de Código
- **PEP 8** con excepciones prácticas (no imponemos longitud de línea estricta)
- **Comentarios**: Solo cuando se explica la intención no obvia, compromisos o peculiaridades de API. No narres lo que hace el código
- **Manejo de errores**: Captura excepciones específicas. Registra con `logger.warning()`/`logger.error()` — usa `exc_info=True` para errores inesperados
- **Multiplataforma**: Nunca asumas Unix. Ver [Compatibilidad Multiplataforma](#compatibilidad-multiplataforma)
---
## Añadir una Nueva Herramienta
Antes de escribir una herramienta, pregúntate: [¿debería ser una habilidad en su lugar?](#debería-ser-una-habilidad-o-una-herramienta)
Las herramientas se auto-registran en el registro central. Cada archivo de herramienta co-localiza su esquema, manejador y registro:
```python
"""my_tool — Breve descripción de lo que hace esta herramienta."""
"""Manejador. Devuelve un resultado en cadena (a menudo JSON)."""
result=do_work(param1,param2)
returnjson.dumps(result)
MY_TOOL_SCHEMA={
"type":"function",
"function":{
"name":"my_tool",
"description":"Qué hace esta herramienta y cuándo debería usarla el agente.",
"parameters":{
"type":"object",
"properties":{
"param1":{"type":"string","description":"Qué es param1"},
"param2":{"type":"integer","description":"Qué es param2","default":10},
},
"required":["param1"],
},
},
}
def_check_requirements()->bool:
"""Devuelve True si las dependencias de esta herramienta están disponibles."""
returnTrue
registry.register(
name="my_tool",
toolset="my_toolset",
schema=MY_TOOL_SCHEMA,
handler=lambdaargs,**kw:my_tool(**args,**kw),
check_fn=_check_requirements,
)
```
**Conectar a un toolset (requerido):** Las herramientas integradas se auto-descubren: cualquier
archivo `tools/*.py` que contenga una llamada de nivel superior `registry.register(...)` es
importado por `discover_builtin_tools()` en `tools/registry.py` cuando `model_tools`
se carga. **No** hay una lista de importaciones manual en `model_tools.py` que mantener.
Todavía debes añadir el nombre de la herramienta a la lista apropiada en `toolsets.py`
(por ejemplo `_HERMES_CORE_TOOLS` o un toolset dedicado); de lo contrario la herramienta
se registra pero nunca se expone al agente.
Consulta `AGENTS.md` (sección **Adding New Tools**) para rutas conscientes del perfil y
orientación sobre plugins vs. núcleo.
---
## Añadir una Habilidad
Las habilidades incluidas viven en `skills/` organizadas por categoría. Las habilidades opcionales oficiales usan la misma estructura en `optional-skills/`:
```
skills/
├── research/
│ └── arxiv/
│ ├── SKILL.md # Requerido: instrucciones principales
│ └── scripts/ # Opcional: scripts auxiliares
│ └── search_arxiv.py
├── productivity/
│ └── ocr-and-documents/
│ ├── SKILL.md
│ ├── scripts/
│ └── references/
└── ...
```
### Formato de SKILL.md
```markdown
---
name: my-skill
description: Breve descripción (mostrada en los resultados de búsqueda de habilidades)
version: 1.0.0
author: Tu Nombre
license: MIT
platforms: [macos, linux] # Opcional — restringir a plataformas de SO específicas
required_environment_variables: # Opcional — metadatos de configuración segura al cargar
- name: MY_API_KEY
prompt: Clave API
help: Dónde obtenerla
required_for: funcionalidad completa
prerequisites: # Requisitos de tiempo de ejecución heredados opcionales
env_vars: [MY_API_KEY]
commands: [curl, jq]
metadata:
hermes:
tags: [Categoría, Subcategoría, Palabras clave]
related_skills: [other-skill-name]
fallback_for_toolsets: [web]
requires_toolsets: [terminal]
---
# Título de la Habilidad
Introducción breve.
## Cuándo Usar
Condiciones de activación — ¿cuándo debería el agente cargar esta habilidad?
## Referencia Rápida
Tabla de comandos o llamadas API comunes.
## Procedimiento
Instrucciones paso a paso que el agente sigue.
## Problemas Conocidos
Modos de fallo conocidos y cómo manejarlos.
## Verificación
Cómo confirma el agente que funcionó.
```
### Estándares de autoría de habilidades (OBLIGATORIOS)
Todo skill nuevo o modernizado — incluido, opcional o contribuido — debe cumplir estos estándares antes del merge:
1.**`description` ≤ 60 caracteres, una oración, termina con punto.** Las descripciones largas saturan la UI de listado de habilidades. Indica la capacidad, no la implementación. Sin palabras de marketing ("potente", "completo", "fluido", "avanzado").
2.**Las herramientas referenciadas en el cuerpo de SKILL.md deben ser herramientas nativas de Hermes o servidores MCP que la habilidad espere explícitamente.** Usa los nombres de herramientas en comillas invertidas: `` `terminal` ``, `` `web_extract` ``, `` `web_search` ``, `` `read_file` ``, `` `write_file` ``, etc.
3.**El campo `platforms:` auditado contra las importaciones reales del script.** Las habilidades que usen primitivos solo de POSIX deben declarar sus plataformas soportadas.
4.**`author` da crédito primero al colaborador humano.**
5.**El cuerpo de SKILL.md usa el orden moderno de secciones:** título, intro de 2-3 oraciones, luego: `## Cuándo Usar`, `## Prerequisitos`, `## Cómo Ejecutar`, `## Referencia Rápida`, `## Procedimiento`, `## Problemas Conocidos`, `## Verificación`.
6.**Los scripts van en `scripts/`, las referencias en `references/`, las plantillas en `templates/`.**
7.**Los tests viven en `tests/skills/test_<skill>_skill.py`** y usan solo stdlib + pytest + `unittest.mock`. Sin llamadas de red en vivo.
8.**Las adiciones a `.env.example` están aisladas en un bloque claramente delimitado.**
---
## Añadir una Skin / Tema
Hermes usa un sistema de skins basado en datos — no se necesitan cambios de código para añadir una nueva skin.
**Opción A: Skin de usuario (archivo YAML)**
Crea `~/.hermes/skins/<nombre>.yaml`:
```yaml
name:mitema
description:Breve descripción del tema
colors:
banner_border:"#HEX"
banner_title:"#HEX"
banner_accent:"#HEX"
banner_dim:"#HEX"
banner_text:"#HEX"
response_border:"#HEX"
spinner:
waiting_faces:["(⚔)","(⛨)"]
thinking_faces:["(⚔)","(⌁)"]
thinking_verbs:["forjando","planeando"]
branding:
agent_name:"Mi Agente"
welcome:"Mensaje de bienvenida"
response_label:" ⚔ Agente "
prompt_symbol:"⚔"
tool_prefix:"╎"
```
Todos los campos son opcionales — los valores faltantes se heredan de la skin predeterminada.
**Opción B: Skin integrada**
Añade al dict `_BUILTIN_SKINS` en `hermes_cli/skin_engine.py`. Usa el mismo esquema que arriba pero como dict de Python.
**Activar:**
- CLI: `/skin mitema` o establece `display.skin: mitema` en config.yaml
---
## Compatibilidad Multiplataforma
Hermes se ejecuta en Linux, macOS y Windows nativo (además de WSL2). Al escribir código
que toca el SO, asume que *cualquier* plataforma puede alcanzar tu ruta de código.
> **Antes de hacer PR:** ejecuta `scripts/check-windows-footguns.py` para detectar
> los patrones inseguros comunes de Windows en tu diff. Es basado en grep y barato;
> CI también lo ejecuta en cada PR.
### Reglas críticas
1.**Nunca llames `os.kill(pid, 0)` para comprobaciones de liveness.** En Windows **NO es una operación sin efecto**. Usa `psutil.pid_exists(pid)` en su lugar.
2.**Usa `shutil.which()` antes de hacer shell — no asumas que Windows tiene las herramientas que tiene Linux.**`ps`, `kill`, `grep`, `awk`, etc. simplemente no existen en Windows.
3.**`termios` y `fcntl` son solo de Unix.** Siempre captura tanto `ImportError` como `NotImplementedError`.
4.**Codificación de archivos.** Windows puede guardar archivos `.env` en `cp1252`. Siempre maneja errores de codificación.
5.**Gestión de procesos.**`os.setsid()`, `os.killpg()`, `os.fork()`, `os.getuid()` y el manejo de señales POSIX difieren en Windows.
6.**Señales que no existen en Windows:**`SIGALRM`, `SIGCHLD`, `SIGHUP`, `SIGUSR1`, `SIGUSR2`, etc.
7.**Separadores de ruta.** Usa `pathlib.Path` en lugar de concatenación de cadenas con `/`.
8.**Los enlaces simbólicos necesitan privilegios elevados en Windows** (a menos que el Modo Desarrollador esté activado).
9.**Los modos de archivo POSIX (0o600, 0o644, etc.) NO se aplican en NTFS** por defecto.
10.**Los daemons de fondo desacoplados en Windows necesitan `pythonw.exe`, NO `python.exe`.**
---
## Consideraciones de Seguridad
Hermes tiene acceso al terminal. La seguridad importa.
### Protecciones existentes
| Capa | Implementación |
|------|---------------|
| **Piping de contraseña sudo** | Usa `shlex.quote()` para prevenir inyección de shell |
| **Detección de comandos peligrosos** | Patrones regex en `tools/approval.py` con flujo de aprobación del usuario |
| **Inyección de prompts en cron** | Escáner en `tools/cronjob_tools.py` bloquea patrones de anulación de instrucciones |
| **Lista de denegación de escritura** | Rutas protegidas resueltas a través de `os.path.realpath()` para prevenir bypass de enlaces simbólicos |
| **Skills Guard** | Escáner de seguridad para habilidades instaladas desde el hub (`tools/skills_guard.py`) |
| **Sandbox de ejecución de código** | El proceso hijo `execute_code` se ejecuta con claves API eliminadas del entorno |
| **Fortalecimiento de contenedor** | Docker: todas las capacidades eliminadas, sin escalada de privilegios, límites de PID, tmpfs de tamaño limitado |
### Al contribuir código sensible a la seguridad
- **Siempre usa `shlex.quote()`** al interpolar entrada del usuario en comandos de shell
- **Resuelve enlaces simbólicos** con `os.path.realpath()` antes de comprobaciones de control de acceso basadas en rutas
- **No registres secretos.** Las claves API, tokens y contraseñas nunca deben aparecer en la salida de log
- **Captura excepciones amplias** alrededor de la ejecución de herramientas para que un solo fallo no bloquee el bucle del agente
- **Prueba en todas las plataformas** si tu cambio toca rutas de archivos, gestión de procesos o comandos de shell
### Política de fijación de dependencias (fortalecimiento de la cadena de suministro)
Tras el [compromiso de la cadena de suministro de litellm](https://github.com/BerriAI/litellm/issues/24512) en marzo de 2026 y la [campaña del gusano Mini Shai-Hulud](https://socket.dev/blog/tanstack-npm-packages-compromised-mini-shai-hulud-supply-chain-attack) en mayo de 2026, todas las dependencias deben seguir estas reglas:
| Tipo de fuente | Tratamiento requerido | Justificación |
|---|---|---|
| **Paquete PyPI** | `>=suelo,<siguiente_mayor` | Las versiones de PyPI son inmutables una vez publicadas, pero pueden empujarse nuevas versiones en tu rango. |
| **URL de Git** | SHA completo del commit | Las ramas y etiquetas son refs mutables; el SHA está direccionado por contenido. |
| **GitHub Actions** | SHA completo del commit + comentario de versión | Las etiquetas de acción son refs mutables. Fija como `uses: owner/action@<sha> # vX.Y.Z` |
| **Instalaciones pip solo de CI** | `==exacto` | Builds de CI herméticos; el cambio es aceptable. |
**Cada nueva dependencia de PyPI en un PR debe tener un límite superior `<siguiente_mayor`.** Los PRs que añadan especificaciones `>=X.Y.Z` sin límite superior serán rechazados.
---
## Proceso de Pull Request
### Nomenclatura de ramas
```
fix/descripcion # Correcciones de errores
feat/descripcion # Nuevas funcionalidades
docs/descripcion # Documentación
test/descripcion # Tests
refactor/descripcion # Reestructuración de código
```
### Antes de enviar
1.**Ejecutar tests**: `scripts/run_tests.sh` (recomendado; igual que CI) o `pytest tests/ -v` con el venv del proyecto activado
2.**Probar manualmente**: Ejecuta `hermes` y ejercita la ruta de código que cambiaste
3.**Verificar impacto multiplataforma**: Si tocas E/S de archivos, gestión de procesos o manejo del terminal, considera macOS, Linux y WSL2
4.**Mantén los PRs enfocados**: Un cambio lógico por PR. No mezcles una corrección de error con una refactorización con una nueva funcionalidad.
### Descripción del PR
Incluye:
- **Qué** cambió y **por qué**
- **Cómo probarlo** (pasos de reproducción para errores, ejemplos de uso para funcionalidades)
@@ -18,6 +18,24 @@ We value contributions in this order:
---
## Before You Start: Search First
A quick search before you build saves your time and keeps the PR queue clean — duplicates are common here, so it's worth a minute up front.
- **Search both open *and* merged PRs and issues** for your topic or error symptom — the duplicate-check in the PR template fires at review time, after you've already done the work:
gh search prs --repo NousResearch/hermes-agent --state all "<your terms>"
```
Or use the web UI: [issues](https://github.com/NousResearch/hermes-agent/issues?q=) · [PRs (all states)](https://github.com/NousResearch/hermes-agent/pulls?q=is%3Apr).
- **The issue tracker can lag the code.** Many requested features are already implemented in-tree, so also search the source (`search_files`, or your editor's grep) for the capability before proposing it.
- **If an open PR already addresses it**, consider reviewing or improving that one instead of opening a competing duplicate.
- **For larger work**, comment on the issue to signal you're working on it, so others don't start the same thing.
Related: #38284 covers the agent-side analog — Hermes itself checking existing issues and PRs before deep self-troubleshooting. This section is the human-contributor complement.
---
## Should it be a Skill or a Tool?
This is the most common question for new contributors. The answer is almost always **skill**.
@@ -412,6 +430,12 @@ Brief intro.
## When to Use
Trigger conditions — when should the agent load this skill?
## Prerequisites
Env vars, install steps, MCP setup, API key sourcing.
<a href="https://nousresearch.com"><img src="https://img.shields.io/badge/Creado%20por-Nous%20Research-blueviolet?style=for-the-badge" alt="Creado por Nous Research"></a>
**El agente de IA con mejora continua creado por [Nous Research](https://nousresearch.com).** Es el único agente con un bucle de aprendizaje integrado: crea habilidades a partir de la experiencia, las mejora durante el uso, se impulsa a sí mismo a persistir el conocimiento, busca en sus propias conversaciones pasadas y construye un modelo cada vez más profundo de quién eres a lo largo de las sesiones. Ejecútalo en un VPS de $5, un clúster de GPUs o infraestructura sin servidor que cuesta casi nada cuando está inactivo. No está atado a tu laptop — habla con él desde Telegram mientras trabaja en una VM en la nube.
Usa cualquier modelo que quieras — [Nous Portal](https://portal.nousresearch.com), [OpenRouter](https://openrouter.ai) (más de 200 modelos), [NovitaAI](https://novita.ai), [NVIDIA NIM](https://build.nvidia.com) (Nemotron), [Xiaomi MiMo](https://platform.xiaomimimo.com), [z.ai/GLM](https://z.ai), [Kimi/Moonshot](https://platform.moonshot.ai), [MiniMax](https://www.minimax.io), [Hugging Face](https://huggingface.co), OpenAI, o tu propio endpoint. Cambia con `hermes model` — sin cambios de código, sin dependencias.
<table>
<tr><td><b>Una interfaz de terminal real</b></td><td>TUI completa con edición multilínea, autocompletado de comandos, historial de conversaciones, interrupción y redirección, y salida de herramientas en streaming.</td></tr>
<tr><td><b>Vive donde tú vives</b></td><td>Telegram, Discord, Slack, WhatsApp, Signal y CLI — todo desde un único proceso gateway. Transcripción de notas de voz, continuidad de conversación entre plataformas.</td></tr>
<tr><td><b>Un bucle de aprendizaje cerrado</b></td><td>Memoria curada por el agente con recordatorios periódicos. Creación autónoma de habilidades tras tareas complejas. Las habilidades mejoran solas durante el uso. Búsqueda FTS5 de sesiones con resumención por LLM para recuperación entre sesiones. Modelado de usuario dialéctico <a href="https://github.com/plastic-labs/honcho">Honcho</a>. Compatible con el estándar abierto de <a href="https://agentskills.io">agentskills.io</a>.</td></tr>
<tr><td><b>Automatizaciones programadas</b></td><td>Planificador cron integrado con entrega a cualquier plataforma. Informes diarios, copias de seguridad nocturnas, auditorías semanales — todo en lenguaje natural, ejecutándose de forma autónoma.</td></tr>
<tr><td><b>Delega y paraleliza</b></td><td>Lanza subagentes aislados para flujos de trabajo paralelos. Escribe scripts de Python que llaman a herramientas vía RPC, convirtiendo pipelines de múltiples pasos en turnos de coste cero de contexto.</td></tr>
<tr><td><b>Funciona en cualquier lugar, no solo en tu laptop</b></td><td>Seis backends de terminal — local, Docker, SSH, Singularity, Modal y Daytona. Daytona y Modal ofrecen persistencia sin servidor — el entorno de tu agente hiberna cuando está inactivo y se activa bajo demanda, costando casi nada entre sesiones. Ejecútalo en un VPS de $5 o un clúster de GPUs.</td></tr>
<tr><td><b>Listo para investigación</b></td><td>Generación de trayectorias en lote, compresión de trayectorias para entrenar la próxima generación de modelos de llamadas a herramientas.</td></tr>
> **Nota:** En Windows nativo, Hermes funciona sin WSL — la CLI, el gateway, la TUI y las herramientas funcionan de forma nativa. Si prefieres usar WSL2, el comando de Linux/macOS de arriba también funciona allí. ¿Encontraste un error? Por favor [crea un issue](https://github.com/NousResearch/hermes-agent/issues).
El instalador se encarga de todo: uv, Python 3.11, Node.js, ripgrep, ffmpeg, **y un Git Bash portátil** (MinGit, descomprimido en `%LOCALAPPDATA%\hermes\git` — no requiere administrador, completamente aislado de cualquier instalación de Git del sistema). Hermes usa este Git Bash incluido para ejecutar comandos de shell.
Si ya tienes Git instalado, el instalador lo detecta y lo usa en su lugar. De lo contrario, una descarga de ~45MB de MinGit es todo lo que necesitas — no tocará ni interferirá con ningún Git del sistema.
> **Android / Termux:** La ruta manual probada está documentada en la [guía de Termux](https://hermes-agent.nousresearch.com/docs/getting-started/termux). En Termux, Hermes instala el extra `.[termux]` curado porque el extra completo `.[all]` actualmente incluye dependencias de voz incompatibles con Android.
>
> **Windows:** Windows nativo es totalmente compatible — el comando de PowerShell de arriba instala todo. Si prefieres usar WSL2, el comando de Linux también funciona allí. La instalación nativa de Windows se encuentra en `%LOCALAPPDATA%\hermes`; WSL2 instala en `~/.hermes` como en Linux.
Hermes funciona con cualquier proveedor que quieras — eso no cambiará. Pero si prefieres no recopilar cinco claves API separadas para el modelo, búsqueda web, generación de imágenes, TTS y un navegador en la nube, **[Nous Portal](https://portal.nousresearch.com)** las cubre todas bajo una sola suscripción:
- **Más de 300 modelos** — elige cualquiera con `/model <nombre>`
- **Tool Gateway** — búsqueda web (Firecrawl), generación de imágenes (FAL), texto a voz (OpenAI), navegador en la nube (Browser Use), todo enrutado a través de tu suscripción. Sin cuentas adicionales.
Un comando desde una instalación nueva:
```bash
hermes setup --portal
```
Esto te autentica vía OAuth, establece Nous como tu proveedor y activa el Tool Gateway. Comprueba qué está conectado en cualquier momento con `hermes portal info`. Detalles completos en la [página de documentación del Tool Gateway](https://hermes-agent.nousresearch.com/docs/user-guide/features/tool-gateway).
Puedes seguir usando tus propias claves por herramienta cuando quieras — el gateway es por backend, no todo o nada.
---
## Referencia rápida: CLI vs Mensajería
Hermes tiene dos puntos de entrada: inicia la interfaz de terminal con `hermes`, o ejecuta el gateway y habla con él desde Telegram, Discord, Slack, WhatsApp, Signal o Email. Una vez en una conversación, muchos comandos de barra son compartidos entre ambas interfaces.
| Reintentar o deshacer último turno | `/retry`, `/undo` | `/retry`, `/undo` |
| Comprimir contexto / ver uso | `/compress`, `/usage`, `/insights [--days N]` | `/compress`, `/usage`, `/insights [days]` |
| Explorar habilidades | `/skills` o `/<nombre-habilidad>` | `/<nombre-habilidad>` |
| Interrumpir trabajo actual | `Ctrl+C` o enviar un nuevo mensaje | `/stop` o enviar un nuevo mensaje |
| Estado específico de plataforma | `/platforms` | `/status`, `/sethome` |
Para las listas de comandos completas, consulta la [guía de CLI](https://hermes-agent.nousresearch.com/docs/user-guide/cli) y la [guía del Gateway de Mensajería](https://hermes-agent.nousresearch.com/docs/user-guide/messaging).
---
## Documentación
Toda la documentación está en **[hermes-agent.nousresearch.com/docs](https://hermes-agent.nousresearch.com/docs/)**:
| [Inicio rápido](https://hermes-agent.nousresearch.com/docs/getting-started/quickstart) | Instalar → configurar → primera conversación en 2 minutos |
| [Uso de CLI](https://hermes-agent.nousresearch.com/docs/user-guide/cli) | Comandos, atajos de teclado, personalidades, sesiones |
| [Configuración](https://hermes-agent.nousresearch.com/docs/user-guide/configuration) | Archivo de configuración, proveedores, modelos, todas las opciones |
| [Gateway de Mensajería](https://hermes-agent.nousresearch.com/docs/user-guide/messaging) | Telegram, Discord, Slack, WhatsApp, Signal, Home Assistant |
| [Seguridad](https://hermes-agent.nousresearch.com/docs/user-guide/security) | Aprobación de comandos, emparejamiento por DM, aislamiento en contenedor |
| [Herramientas y Toolsets](https://hermes-agent.nousresearch.com/docs/user-guide/features/tools) | Más de 40 herramientas, sistema de toolsets, backends de terminal |
| [Sistema de Habilidades](https://hermes-agent.nousresearch.com/docs/user-guide/features/skills) | Memoria procedimental, Skills Hub, creación de habilidades |
| [Programación Cron](https://hermes-agent.nousresearch.com/docs/user-guide/features/cron) | Tareas programadas con entrega a plataforma |
| [Archivos de Contexto](https://hermes-agent.nousresearch.com/docs/user-guide/features/context-files) | Contexto de proyecto que da forma a cada conversación |
| [Arquitectura](https://hermes-agent.nousresearch.com/docs/developer-guide/architecture) | Estructura del proyecto, bucle del agente, clases principales |
| [Contribuir](https://hermes-agent.nousresearch.com/docs/developer-guide/contributing) | Configuración de desarrollo, proceso de PR, estilo de código |
| [Referencia de CLI](https://hermes-agent.nousresearch.com/docs/reference/cli-commands) | Todos los comandos y flags |
| [Variables de Entorno](https://hermes-agent.nousresearch.com/docs/reference/environment-variables) | Referencia completa de variables de entorno |
---
## Migración desde OpenClaw
Si vienes de OpenClaw, Hermes puede importar automáticamente tu configuración, memorias, habilidades y claves API.
**Durante la configuración inicial:** El asistente de configuración (`hermes setup`) detecta automáticamente `~/.openclaw` y ofrece migrar antes de que comience la configuración.
- **Habilidades** — habilidades creadas por el usuario → `~/.hermes/skills/openclaw-imports/`
- **Lista de comandos permitidos** — patrones de aprobación
- **Configuración de mensajería** — configuración de plataformas, usuarios permitidos, directorio de trabajo
- **Claves API** — secretos en lista de permitidos (Telegram, OpenRouter, OpenAI, Anthropic, ElevenLabs)
- **Assets de TTS** — archivos de audio del espacio de trabajo
- **Instrucciones del espacio de trabajo** — AGENTS.md (con `--workspace-target`)
Consulta `hermes claw migrate --help` para todas las opciones, o usa la habilidad `openclaw-migration` para una migración guiada interactiva por el agente con vistas previas de dry-run.
---
## Contribuir
¡Las contribuciones son bienvenidas! Consulta la [Guía de Contribución](CONTRIBUTING.es.md) para la configuración del desarrollo, el estilo de código y el proceso de PR.
Inicio rápido para colaboradores — clona y comienza con `setup-hermes.sh`:
- 🔌 [computer-use-linux](https://github.com/avifenesh/computer-use-linux) — Servidor MCP de control de escritorio Linux para Hermes y otros hosts MCP, con árboles de accesibilidad AT-SPI, entrada Wayland/X11, capturas de pantalla y targeting de ventanas del compositor.
- 🔌 [HermesClaw](https://github.com/AaronWong1999/hermesclaw) — Puente WeChat comunitario: Ejecuta Hermes Agent y OpenClaw en la misma cuenta de WeChat.
---
## Licencia
MIT — ver [LICENSE](LICENSE).
Creado por [Nous Research](https://nousresearch.com).
<a href="https://nousresearch.com"><img src="https://img.shields.io/badge/Built%20by-Nous%20Research-blueviolet?style=for-the-badge" alt="Built by Nous Research"></a>
**The self-improving AI agent built by [Nous Research](https://nousresearch.com).** It's the only agent with a built-in learning loop — it creates skills from experience, improves them during use, nudges itself to persist knowledge, searches its own past conversations, and builds a deepening model of who you are across sessions. Run it on a $5 VPS, a GPU cluster, or serverless infrastructure that costs nearly nothing when idle. It's not tied to your laptop — talk to it from Telegram while it works on a cloud VM.
#### Windows Defender or antivirus flags `uv.exe` as malware
If your antivirus (Bitdefender, Windows Defender, etc.) quarantines `uv.exe` from the Hermes `bin` folder (`%LOCALAPPDATA%\hermes\bin\uv.exe`), this is a **false positive**. The file is Astral's `uv` — the Rust Python package manager Hermes bundles to manage its Python environment. ML-based antivirus engines commonly flag unsigned Rust binaries that download and install packages.
If attestation says "Verification succeeded" and the last line prints `True`, you're good.
**To whitelist Hermes:**
- **Windows Defender:** Run PowerShell as Admin → `Add-MpPreference -ExclusionPath "$env:LOCALAPPDATA\hermes\bin"`
- **Bitdefender:** Add an exception in the Bitdefender console (Protection > Antivirus > Settings > Manage Exceptions)
- Whitelist the **folder**, not the file hash — Hermes updates `uv` and the hash changes every version
For more context, see the upstream Astral reports: [astral-sh/uv#13553](https://github.com/astral-sh/uv/issues/13553), [astral-sh/uv#15011](https://github.com/astral-sh/uv/issues/15011), [astral-sh/uv#10079](https://github.com/astral-sh/uv/issues/10079).
time.sleep(2)# Brief pause between compression retries
_retry.restart_with_compressed_messages=True
break
@@ -3090,13 +3185,13 @@ def run_conversation(
agent._flush_status_buffer()
agent._vprint(f"{agent.log_prefix}❌ Context length exceeded and cannot compress further.",force=True)
agent._vprint(f"{agent.log_prefix} 💡 The conversation has accumulated too much content. Try /new to start fresh, or /compress to manually trigger compression.",force=True)
@@ -85,7 +85,7 @@ Installers are built and uploaded to GitHub Releases manually. macOS/Windows sig
### How it works
The packaged app ships only the Electron shell. On first launch it installs the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. The renderer (React, in `src/`) talks to a `hermes dashboard` backend over the standard gateway APIs and reuses the embedded TUI rather than reimplementing chat. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
The packaged app ships the Electron shell and a native React chat surface. On first launch it can install the Hermes Agent runtime into `HERMES_HOME` (`~/.hermes`, or `%LOCALAPPDATA%\hermes` on Windows) — the **same layout a CLI install uses**, so the two are interchangeable. Backend resolution first honours `HERMES_DESKTOP_HERMES_ROOT`, then a completed managed install, then a probed `hermes` on `PATH` (unless `HERMES_DESKTOP_IGNORE_EXISTING=1` is set), and finally an explicit `HERMES_DESKTOP_HERMES` command override for packagers/troubleshooting. The renderer (React, in `src/`) talks to a `hermes dashboard` backend over the `tui_gateway`/dashboard APIs and reuses the agent runtime rather than embedding `hermes --tui`. The install, backend-resolution, and self-update logic all live in `electron/main.cjs`.
emitUpdateProgress({stage:'restart',message:'Handing off to the Hermes updater…',percent:100})
emitUpdateProgress({
stage:'restart',
message:'Updating Hermes — this window will close and the updater will open. Don’t reopen Hermes yourself; it restarts automatically when the update finishes.',
percent:100
})
repairMacUpdaterHelper(updater)
constupdateRoot=resolveUpdateRoot()
@@ -1827,7 +2010,7 @@ async function applyUpdates(opts = {}) {
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.