Commit Graph

6741 Commits

oak
4e296dcdda fix(auxiliary): pass raw base_url to _maybe_wrap_anthropic for correct transport detection (#17467)
Fixes HTTP 404 errors when using Anthropic-compatible providers (Kimi Coding, MiniMax, MiniMax-CN) for auxiliary tasks.

Root cause: `_to_openai_base_url()` rewrites `/anthropic` → `/v1` so the OpenAI SDK hits the right endpoint. But the rewritten URL was then passed to `_maybe_wrap_anthropic`, whose `_endpoint_speaks_anthropic_messages` detector only fires on `/anthropic` or `api.kimi.com/coding`. Detector saw `/v1` → returned False → no Anthropic wrap → 404 on every aux call.

Fix: preserve the raw base_url before rewriting and pass it to `_maybe_wrap_anthropic` for transport detection, while still giving the rewritten URL to the OpenAI client constructor.

Closes #17705, #17413, #17086, #10469.

Co-authored-by: oak <chengoak@users.noreply.github.com>
2026-04-30 10:18:42 -07:00
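The fix above boils down to feeding the transport detector the raw URL while the SDK gets the rewritten one. A minimal sketch, with hypothetical names mirroring the helpers the commit describes:

```python
# Illustrative sketch of the #17467 fix: the Anthropic-transport detector
# must see the RAW base_url; only the OpenAI client gets the rewritten one.
# Function names are simplified stand-ins for the real private helpers.

def to_openai_base_url(base_url: str) -> str:
    # Rewrite /anthropic-style suffixes to /v1 so the OpenAI SDK hits
    # the right endpoint path.
    return base_url.replace("/anthropic", "/v1")

def endpoint_speaks_anthropic_messages(base_url: str) -> bool:
    # Fires only on the raw spellings — a rewritten /v1 URL never matches,
    # which was exactly the bug.
    return "/anthropic" in base_url or "api.kimi.com/coding" in base_url

def build_client_config(raw_base_url: str) -> dict:
    # Detect transport on the raw URL, hand the rewritten URL to the SDK.
    return {
        "openai_base_url": to_openai_base_url(raw_base_url),
        "wrap_anthropic": endpoint_speaks_anthropic_messages(raw_base_url),
    }
```

Detecting on the raw URL and constructing on the rewritten one keeps both consumers correct without a second round of URL parsing.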
brooklyn!
d954d6fbcf Merge pull request #18024 from NousResearch/bb/mouse-mode-fast-path
fix(cli): tighten terminal leak fast path
2026-04-30 10:17:59 -07:00
Brooklyn Nicholson
e30de51ee9 fix(cli): tighten terminal leak fast path 2026-04-30 12:16:04 -05:00
brooklyn!
285e9efb3f Merge pull request #17701 from NousResearch/bb/mouse-mode-self-heal
fix(cli): recover leaked mouse tracking terminal state
2026-04-30 10:09:39 -07:00
Brooklyn Nicholson
cad7944b92 fix(tui): reset extended keyboard modes 2026-04-30 12:05:15 -05:00
Siddharth Balyan
9a14540603 fix(nix): replace magic-nix-cache with Cachix (#17928)
* fix(nix): replace magic-nix-cache with Cachix

magic-nix-cache caused recurring CI failures (TwirpErrorResponse
ResourceExhausted) by hitting GitHub Actions Cache's 10 GB limit and
200 req/min rate limit. This was flagged as 'unfixable infra flake' in
#17836 but is actually a fixable architecture choice.

Switch to Cachix (dedicated binary cache, no GHA quota dependency):
- Replace DeterminateSystems/magic-nix-cache-action with cachix/cachix-action
- Add cachix-auth-token input to nix-setup composite action
- Pass CACHIX_AUTH_TOKEN secret through all three nix workflows
- continue-on-error: true so cache failures never block CI

Cache 'hermes-agent' is public at hermes-agent.cachix.org.
Devs can pull locally with: cachix use hermes-agent

* fix: correct cachix-action commit SHA pin

---------

Co-authored-by: Hermes Agent <hermes@nousresearch.com>
2026-04-30 17:38:58 +05:30
Teknium
ae8930afa5 fix(skills): also bump_use on skill_view tool invocation
Widen #17818 to cover the dominant 'agent actively used this skill' path:
when the model calls the skill_view tool, bump use_count alongside view_count.
The slash-command and --skill preload paths (covered by the cherry-picked
commit) only catch user-initiated invocation; most skill activation happens
via the agent calling skill_view to consume an indexed skill.

Curator's stale-timer keys off last_used_at (agent/curator.py:233), so
without this wire-up agent-created skills would transition to stale
simultaneously regardless of actual use.
2026-04-30 05:07:34 -07:00
Bartok9
4178ab3c07 fix(skills): wire bump_use() into skill invocation and preload paths (#17782)
bump_use() existed and was tested but had zero production call sites —
use_count stayed 0 for all skills, breaking Curator's stale-detection
logic which relies on last_used_at.

Wire bump_use() into:
1. build_skill_invocation_message() — when a user invokes /skill-name
2. build_preloaded_skills_prompt() — when a skill is preloaded at session start

Both are the canonical 'a skill is actively being used' moments, distinct
from 'browsing' (bump_view in skill_view tool call).

Closes #17782
2026-04-30 05:07:34 -07:00
Teknium
4c792865b4 test(gateway): pin cleanup invariants for #17758 in-band drain hand-off
Belt-and-suspenders on top of @briandevans' #17758 fix.  The in-band
drain hand-off (await->create_task + session-guard preservation)
changed cleanup semantics in three places that the original PR
reasoned about but didn't test directly.  Pin each invariant so a
future refactor can't silently regress them:

1. Normal single-message path still releases _active_sessions[sk] and
   _session_tasks[sk] through end-of-finally.  The #17758 follow-up
   moved _release_session_guard under
     if current_task is self._session_tasks.get(session_key)
   For the 99%-common case current_task IS the stored task, so the
   guard must still fire.  Test would fail if the conditional were
   ever tightened in a way that dropped the normal path.

2. Drain-task cancellation releases the session.  If the drain task
   spawned by the in-band hand-off is cancelled mid-handler (e.g.
   /stop fired while draining a follow-up), its own finally must
   fire _release_session_guard.  Without this a cancel would leave
   the session permanently pinned busy.

3. Late-arrival drain still spawns when no in-band drain preceded
   it.  Pre-existing path, but the #17758 follow-up added a
   re-queue branch that only fires when ownership was already
   handed off.  When no handoff happened the else branch must still
   spawn a fresh drain task — otherwise a message arriving during
   stop_typing gets silently dropped.

All three tests pass against current main.  Zero production code
changes.
2026-04-30 05:00:25 -07:00
Teknium
a845177ebe fix(skills): also exclude .archive in skills_tool + add author map entry
Widen #17639 to the fourth sibling site (tools/skills_tool.py _EXCLUDED_SKILL_DIRS)
and register leoneparise in scripts/release.py AUTHOR_MAP so CI release script
resolves the contributor.
2026-04-30 04:59:22 -07:00
Leone Parise
eda1d516dc fix(skills): exclude .archive from skill index walk
Archived skills (moved to ~/.hermes/skills/.archive/ by the curator)
were still surfaced in the <available_skills> system prompt under a
fake '.archive' category, causing the agent to load and try to use
deprecated skills. The os.walk in iter_skill_index_files() only
excluded .git/.github/.hub.

Add '.archive' to EXCLUDED_SKILL_DIRS, and to the two other places
that hardcode the same exclusion tuple (gateway/run.py and
agent/skill_commands.py).
2026-04-30 04:59:22 -07:00
Teknium
e8e5985ce6 fix(curator): seed defaults on update, create logs/curator dir, defer fire import (#17927)
Three fixes bundled for curator reliability on existing installs and
broken/partial installs:

1. run_agent.py: defer `import fire` into the __main__ block. `fire` is
   only used by `fire.Fire(main)` when running run_agent.py directly as
   a CLI — it is NOT needed for library usage. Importing it at module
   top made `from run_agent import AIAgent` from a daemon thread (e.g.
   the curator's forked review agent) crash with ModuleNotFoundError
   on broken/partial installs where `fire` isn't present.

2. hermes_cli/config.py: add version 22 → 23 migration that writes the
   `curator` + `auxiliary.curator` sections to config.yaml with their
   defaults, only filling keys the user hasn't overridden. Existing
   configs from before PR #16049 / the April 2026 `auxiliary.curator`
   unification had neither section on disk, so users couldn't see or
   edit the settings in their config.yaml (runtime deep-merge papered
   over it at read time, but the file never reflected reality).

3. hermes_cli/config.py: `ensure_hermes_home()` now pre-creates
   `~/.hermes/logs/curator/` alongside cron/sessions/logs/memories on
   every CLI launch. Managed-mode (NixOS) variant mkdir's it
   defensively after the activation-script existence checks, since the
   activation script may not know about this subpath.

4. agent/curator.py: `_reports_root()` mkdir's the dir at call time as
   belt-and-suspenders for entry paths that bypass both
   ensure_hermes_home() and the v23 migration (gateway-only installs,
   bare library use).

E2E validated in isolated HERMES_HOME: fresh install gets full defaults
seeded; partial-override config keeps user's `enabled: false` and
custom `interval_hours` while filling the missing keys; re-running the
migration is a no-op.
2026-04-30 04:52:28 -07:00
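The "only filling keys the user hasn't overridden" behavior of the v22 → v23 migration can be sketched as a recursive default-seeding merge (function name illustrative, not the real hermes_cli/config.py API):

```python
# Hedged sketch of "seed defaults without clobbering user overrides":
# copy default keys into the config only where they are missing, recursing
# into nested sections. Re-running is a no-op, matching the E2E claim above.

def seed_defaults(user_cfg: dict, defaults: dict) -> dict:
    for key, default_value in defaults.items():
        if key not in user_cfg:
            user_cfg[key] = default_value
        elif isinstance(user_cfg[key], dict) and isinstance(default_value, dict):
            # Recurse so a partially-overridden section still gains
            # its missing keys.
            seed_defaults(user_cfg[key], default_value)
    return user_cfg
```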
konsisumer
d1d0ef6dbd fix(gateway): persist user message on transient agent failures (#7100)
The #1630 fix introduced a blanket ``agent_failed_early`` transcript skip
to prevent context-overflow sessions from looping.  That guard also
triggers for unrelated transient failures (429 rate limits, read
timeouts, connection resets, provider 5xx) which have nothing to do with
session size — and it silently drops the user's message, so the agent
has no memory of the last turn on retry.

Split the failure classification in ``GatewayRunner._run_agent``:

* Context-overflow (``compression_exhausted`` flag, explicit
  context-length phrases, or generic 400 with a long history) → keep
  the existing skip, preserving the #1630/#9893 fix.
* Anything else that failed → persist just the user message so the
  conversation survives a retry.

Use specific multi-word phrases (``context length``, ``token limit``,
``prompt is too long``, etc.) to match ``run_agent.py``'s own
classifier; bare ``exceed`` false-positively flagged "rate limit
exceeded" as context overflow.

Covered by new tests in ``tests/gateway/test_7100_transient_failure_transcript.py``
and the existing #1630 suite still passes.
2026-04-30 04:32:33 -07:00
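The phrase-based classification split can be sketched as follows (the phrases are the ones quoted in the commit; the function name is illustrative):

```python
# Sketch of the context-overflow classifier: match specific multi-word
# phrases rather than a bare "exceed", which false-positived on
# "rate limit exceeded".

CONTEXT_OVERFLOW_PHRASES = (
    "context length",
    "token limit",
    "prompt is too long",
)

def is_context_overflow(error_text: str) -> bool:
    lowered = error_text.lower()
    return any(phrase in lowered for phrase in CONTEXT_OVERFLOW_PHRASES)
```

Transient failures (429s, timeouts, 5xx) fall through this check, so the caller can persist the user message instead of skipping the transcript.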
Teknium
87f5e1a25a test(ssh): update tar pipe assertion for --no-overwrite-dir
Existing test_tar_pipe_commands asserted the literal substring
'tar xf - -C /' in ssh_str, which is no longer present after the
#17767 fix adds --no-overwrite-dir between 'tar xf -' and '-C /'.

Split the one substring check into three independent assertions for
the tar stdin mode, the new --no-overwrite-dir flag (regression guard
for #17767), and the extract target.
2026-04-30 04:32:28 -07:00
Teknium
b50bc13ef9 fix(config): preserve YAML lists in hermes config set (#17876)
_set_nested unconditionally replaced any non-dict value with an empty
dict when walking the dotted path, which silently destroyed list-typed
config nodes the moment someone set a value with a numeric index
(e.g. 'hermes config set custom_providers.0.api_key NEW'). Any sibling
entries and any fields inside the targeted entry that the user didn't
write were lost.

Fix:
- _set_nested now detects list nodes and navigates by numeric index,
  and preserves both dicts AND lists at intermediate positions (scalars
  are still replaced so bare-scalar -> nested overrides keep working).
- set_config_value drops its duplicated navigation logic and calls
  _set_nested instead -- single source of truth for the rules.

Regression tests (tests/hermes_cli/test_set_config_value.py):
- test_indexed_set_preserves_sibling_list_entries -- exact #17876 repro
- test_indexed_set_preserves_non_targeted_fields -- inner-dict fields survive
- test_deeper_nesting_through_list -- dict -> list -> dict -> scalar path

35/35 existing + new tests pass.

E2E-verified with the issue's repro against a real on-disk config.yaml --
list stays a list, entry 0 updated, entry 1 intact.

Closes #17876
2026-04-30 04:32:17 -07:00
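The navigation rule the fix describes — index into lists numerically, preserve existing dicts and lists, replace only scalars — can be sketched as a standalone re-implementation (illustrative, not the real `_set_nested`):

```python
# Sketch of dotted-path assignment that preserves list-typed nodes:
# "custom_providers.0.api_key" navigates into list element 0 instead of
# replacing the whole list with an empty dict.

def set_nested(root, dotted_path, value):
    keys = dotted_path.split(".")
    node = root
    for key in keys[:-1]:
        if isinstance(node, list):
            node = node[int(key)]       # numeric index into a list
            continue
        child = node.get(key)
        if not isinstance(child, (dict, list)):
            child = node[key] = {}      # replace scalars only
        node = child
    last = keys[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value
    return root
```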
Teknium
3fc4c63d38 test(model_switch): update regression to reflect bare-custom guard 2026-04-30 04:32:11 -07:00
Teknium
61fec7689d chore(release): map Andy283 gitee email in AUTHOR_MAP 2026-04-30 04:32:11 -07:00
Andy
201f7caed8 fix: prevent bare 'custom' slug in model.provider (#17478)
When hermes model picker switches to a custom_providers entry, the slug
assignment can write the literal string 'custom' to model.provider if a
prior failed switch already left that value in config.yaml.

Two fixes:
1. model_switch.py: filter out bare 'custom' in slug assignment, always
   resolve to canonical custom:<name> form
2. providers.py: resolve_custom_provider() self-heals bare 'custom' by
   falling back to the first valid custom_providers entry

Closes #17478
2026-04-30 04:32:11 -07:00
Sanjays2402
e0fa2cf972 fix(tools): isolate get_tool_definitions quiet_mode cache + dedup LCM injection (#17335)
Long-lived Gateway processes were sending duplicate tool names to
providers that enforce uniqueness:

  - DeepSeek:        'Tool names must be unique.'
  - Xiaomi MiMo:     'tools contains duplicate names: lcm_expand'
  - Moonshot/Kimi:   'function name lcm_grep is duplicated'

TUI was unaffected because TUI runs with quiet_mode=False and skips the
cache entirely.

Root cause (two layered bugs)
- model_tools.get_tool_definitions(quiet_mode=True) memoizes its result
  in _tool_defs_cache. The cache-hit path returned list(cached) (safe),
  but the FIRST uncached call stored and returned the SAME object.
  run_agent.py mutates self.tools (memory + LCM context-engine schemas)
  in-place, so the very first agent init in a Gateway process
  poisoned the cache, and every subsequent init appended LCM schemas
  again on top of the already-polluted list.
- run_agent.py's context-engine injection (lcm_grep / lcm_describe /
  lcm_expand) had no dedup, unlike the memory-tools injection right
  above it which already skips already-present names.

Fix (defense in depth, per the issue's suggested fix)
- model_tools.get_tool_definitions: on the uncached branch, cache the
  computed list but return list(result) to the caller. Same pattern as
  the cache-hit path.
- run_agent.py: build _existing_tool_names from self.tools and skip
  schemas whose names are already present, mirroring the memory-tools
  block. This also defends against plugin paths that may register the
  same schemas via ctx.register_tool().

Tests (tests/test_get_tool_definitions_cache_isolation.py)
- test_first_uncached_call_returns_fresh_list — pins the fix; without
  it, first-call alias caused all the symptoms.
- test_cache_hit_returns_fresh_list — pre-existing behavior stays.
- test_caller_mutation_does_not_poison_cache — simulates run_agent
  appending lcm_grep / lcm_expand to the returned list and asserts the
  next call doesn't see them.
- test_repeated_caller_mutation_does_not_accumulate — reproduces the
  long-lived Gateway accumulation pattern across 5 agent inits.
- test_non_quiet_mode_does_not_use_cache — sanity, explains why TUI
  was fine.

5/5 pass on the new file; 23/23 still pass on tests/test_model_tools.py.
2026-04-30 04:32:06 -07:00
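The copy-on-return pattern the fix mandates is small enough to show whole. A minimal reproduction (the tool list and names are illustrative):

```python
# Sketch of the cache-isolation fix: cache the computed list, but hand
# every caller a fresh copy — on the uncached branch as well as the
# cache-hit branch — so caller-side mutation cannot poison the cache.

_tool_defs_cache = None

def _compute_tool_defs() -> list:
    # Stand-in for the real definition builder.
    return [{"name": "terminal"}, {"name": "skill_view"}]

def get_tool_definitions(quiet_mode: bool = True) -> list:
    global _tool_defs_cache
    if not quiet_mode:
        return _compute_tool_defs()   # TUI path: skips the cache entirely
    if _tool_defs_cache is None:
        _tool_defs_cache = _compute_tool_defs()
    return list(_tool_defs_cache)     # copy on EVERY return
```

The pre-fix bug was returning the cached object itself on the first (uncached) call, so the first caller's in-place appends became part of the cache.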
Teknium
70ae678af1 chore(release): map rob@atlas.lan to @rmoen 2026-04-30 04:31:23 -07:00
Rob Moen
0dd373ec43 fix(context): honor model.context_length for Ollama num_ctx and all display paths
When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).

Root cause: two separate context values existed independently:
  - context_compressor.context_length = config value (e.g. 65536) ✓
  - _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config

Changes:

1. Cap Ollama num_ctx to config context_length (run_agent.py)
   When model.context_length is explicitly set and no explicit
   ollama_num_ctx override exists, cap the auto-detected GGUF value
   to the user's context_length. This is the core fix — it prevents
   Ollama from allocating more VRAM than the user budgeted.

2. Pass config_context_length through all secondary call sites
   Several paths called get_model_context_length() without the config
   override, falling through to the 256K default fallback:
   - cli.py: @-reference expansion and /model switch display
   - gateway/run.py: @-reference expansion and /model switch display
   - tui_gateway/server.py: @-reference expansion
   - hermes_cli/model_switch.py: resolve_display_context_length()

3. Normalize root-level context_length in config (hermes_cli/config.py)
   _normalize_root_model_keys() now migrates root-level context_length
   into the model section, matching existing behavior for provider and
   base_url. Users who wrote `context_length: 65536` at the YAML root
   instead of under `model:` had it silently ignored.

4. Fix misleading comments (agent/model_metadata.py)
   DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
   as two comments stated.

Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).
2026-04-30 04:31:23 -07:00
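The capping rule in change 1 reduces to a three-way precedence. A sketch with illustrative names (the real logic lives in run_agent.py):

```python
# Sketch of the num_ctx resolution order: an explicit ollama_num_ctx
# override wins outright; otherwise the GGUF-detected value is capped
# to the user's configured model.context_length; with neither set, the
# GGUF value passes through unchanged.

def resolve_num_ctx(gguf_ctx: int, config_ctx, override) -> int:
    if override is not None:
        return override
    if config_ctx is not None:
        return min(gguf_ctx, config_ctx)
    return gguf_ctx
```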
Bartok9
fbb3775770 fix(gateway): enforce auth check in busy-session path to prevent unauthorized injection (#17775)
The busy-session handler (_handle_active_session_busy_message) bypassed the
authorization gate that the cold path enforces via _is_user_authorized(). In
shared-thread contexts (Slack threads, Telegram forum topics, Discord threads)
where thread_sessions_per_user=False (the default), all participants share one
session_key. An unauthorized user posting in the same thread as an authorized
user would hit the active-session branch, skip the auth check, and have their
text merged into _pending_messages or injected via agent.interrupt().

This commit adds the same _is_user_authorized() check at the top of the busy
handler, before any message queuing, steering, or interrupt logic. Unauthorized
messages are silently dropped (return True) with a warning log — matching the
cold-path behavior.

Affected platforms: Slack, Telegram, Discord, any adapter with shared-session
thread contexts.

Closes #17775
2026-04-30 04:29:15 -07:00
briandevans
cc5b9fb581 fix(transport): omit thinking_config for Gemma on the gemini provider (#17426)
The `gemini` provider also serves Gemma (e.g. `gemma-4-31b-it`) and
historically other Google models like PaLM. Those reject
`extra_body.thinking_config` with HTTP 400:

    Unknown name "thinking_config": Cannot find field

`_build_gemini_thinking_config()` was unconditionally producing a
config dict for any model on the `gemini` / `google-gemini-cli`
provider, which `ChatCompletionsTransport.build_kwargs` then dropped
into `extra_body["thinking_config"]`. The result: every chat turn for
Gemma users on the gemini provider blew up at the API edge.

The fix is the same shape Hermes already uses for the Gemini-2.5 vs
Gemini-3 family clamping: normalise the model id, strip an
`OpenRouter`-style `google/` prefix, and short-circuit early when the
result doesn't start with `gemini`. We return `None` rather than
`{"includeThoughts": False}`, because the API rejects the field name
itself — even the polite "off" form trips the same 400.

Three regression tests cover Gemma with reasoning enabled, Gemma with
reasoning disabled, and the `google/gemma-…` OpenRouter-style id; the
existing Gemini-2.5 / Gemini-3 / `google/gemini-…` cases keep passing
because the Gemini guard fires after the prefix strip.

Fixes #17426

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 04:29:04 -07:00
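The guard shape described above — normalize, strip the OpenRouter prefix, short-circuit on non-Gemini ids — as a standalone sketch (the real function is `_build_gemini_thinking_config`; this mirrors only the early exit):

```python
# Sketch of the Gemma guard: non-Gemini models on the gemini provider
# must get None, not {"includeThoughts": False}, because the API rejects
# the thinking_config field name itself with HTTP 400.

def build_thinking_config(model_id: str, include_thoughts: bool):
    normalized = model_id.strip().lower()
    if normalized.startswith("google/"):      # OpenRouter-style prefix
        normalized = normalized[len("google/"):]
    if not normalized.startswith("gemini"):
        return None                           # Gemma / PaLM: omit entirely
    return {"includeThoughts": include_thoughts}
```

The Gemini guard fires after the prefix strip, which is why the existing `google/gemini-…` cases keep passing.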
Teknium
3de8e21683 feat(gateway): native send_multiple_images for Telegram, Discord, Slack, Mattermost, Email
Ports PR #17888's send_multiple_images ABC to every gateway platform that
has a native multi-attachment API, so images arrive as a single bundled
message instead of N separate ones.

Native overrides:
- Telegram: send_media_group (10 photos per album, chunks over); animated
  GIFs peeled off and routed through send_animation (albums don't support
  animations)
- Discord: channel.send(files=[...]) (10 attachments per message, chunks
  over); URL images downloaded into BytesIO so they render inline; forum
  channels use create_thread with files=[...]
- Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over);
  respects thread_ts; records thread participation
- Mattermost: single post with file_ids list (5 per post — Mattermost cap,
  chunks over)
- Email: single SMTP message with multiple MIME attachments (no chunk cap,
  SMTP size governs); remote URLs remain linked in body (parity with
  existing send_image)

All platforms fall back to the base per-image loop on any failure, so a
single bad image in a batch never loses the rest.

Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu,
WeCom, WeChat, DingTalk) continue to use the base default loop — their
server APIs only accept one attachment per message anyway.

Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted
tests covering base default loop, chunking, animation peel-off, fallback
paths, and empty-batch no-ops across all five new overrides.

Co-authored-by: Maxence Groine <maxence@groine.fr>
2026-04-30 04:28:08 -07:00
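Every per-platform override above shares one chunking step: slice the batch into platform-capped groups (10 for Telegram/Discord/Slack, 5 for Mattermost). A sketch of that shared piece (helper name illustrative):

```python
# Sketch of the batch-chunking step common to the platform overrides:
# split an image list into groups no larger than the platform's
# per-message attachment cap.

def chunk_images(images: list, cap: int) -> list:
    return [images[i:i + cap] for i in range(0, len(images), cap)]
```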
Maxence Groine
04ea895ffb feat(gateway/signal): add support for multiple images sending
Adds a new `send_multiple_images` method to the ``BasePlatformAdapter``
that implements the default "One image per message" loop and allows for
platform-specific overriding.

Implements such an override for the Signal adapter, batching images
and trying (best-effort) to work around rate-limits for voluminous
batches using a specific scheduler.

Also implements batching + rate-limit handling in the `send_message`
tool.

New tests added for the Signal adapter, its rate-limit scheduler and the
`send_message` tool
2026-04-30 04:28:08 -07:00
Teknium
411f586c67 refactor(gateway): extract _float_env helper for env-var float casts
Follow-up to the try/except guards added in the previous commit.
Four sibling call sites all read HERMES_AGENT_TIMEOUT /
HERMES_AGENT_TIMEOUT_WARNING / HERMES_AGENT_NOTIFY_INTERVAL via the
same read-env-or-fallback pattern, so factor it into _float_env(name,
default) alongside the existing _auto_continue_freshness_window()
helper.
2026-04-30 03:32:37 -07:00
vominh1919
ca87c822ed fix(gateway): guard yaml.safe_load and float() env var casts against crash
Two defensive fixes in gateway/run.py:

1. yaml.safe_load returning None on empty config files (line 12706):
   GatewayConfig.from_dict(data) crashes with AttributeError when the YAML
   file is empty because safe_load returns None. All 6 other yaml.safe_load
   call sites already use `or {}` — this one was missed.
   Impact: gateway fails to start with empty --config file.

2. float() on env vars without ValueError guard (lines 3951, 11757, 11805,
   11807): HERMES_AGENT_TIMEOUT, HERMES_AGENT_TIMEOUT_WARNING, and
   HERMES_AGENT_NOTIFY_INTERVAL are cast via float() directly from
   os.getenv(). A typo (e.g. "abc") raises ValueError and crashes the
   agent turn or gateway startup.
   Impact: single misconfigured env var crashes the entire gateway.
2026-04-30 03:32:37 -07:00
Teknium
5af8fa5c8c chore(release): map Heltman email to username for AUTHOR_MAP 2026-04-30 03:31:16 -07:00
Heltman
19f9be1dff fix(tools): serialize concurrent hermes_tools RPC calls from execute_code
The sandbox-side `_call()` in both the UDS and file-based transports was
not thread-safe, so scripts that call tools from multiple threads (e.g.
`ThreadPoolExecutor` over `terminal()`) inside a single `execute_code`
run could silently receive each other's responses.

Root cause:

* UDS transport — a single module-level `_sock` was shared across all
  threads; the newline-framed protocol has no request-id; and the
  server-side RPC loop handles one connection serially. With concurrent
  callers, each thread would `sendall()` then race to `recv()` the next
  newline-terminated response from the shared buffer, so responses got
  delivered to the wrong caller.

* File transport — `_seq += 1` is a non-atomic read-modify-write, so
  two threads could allocate the same sequence number and clobber each
  other's request/response files.

Fix: guard `_call()` with a `threading.Lock` in the UDS case (covering
send+recv), and guard `_seq` allocation with a lock in the file case.
No protocol change.

Regression tests cover both the generated-source level (lock is present
and used) and an end-to-end concurrency test: running a sandboxed
ThreadPoolExecutor of 10 `terminal()` calls against a slow mock
dispatcher, asserting every caller sees its own tagged response. The
test fails without the fix (10/10 mismatched, matching real-world
repro) and passes with it.
2026-04-30 03:31:16 -07:00
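The UDS-side fix is one lock covering the whole send+recv pair, so a response can never be delivered to the wrong caller. A runnable model with a fake transport standing in for the shared socket (illustrative, not the real generated code):

```python
# Model of the serialization fix: because the newline-framed protocol has
# no request id, send and recv must happen atomically per caller.
import threading

class EchoTransport:
    # Stand-in for the shared socket: recv() returns whatever was last sent.
    def send(self, payload):
        self._last = payload
    def recv(self):
        return self._last

class RpcClient:
    def __init__(self, transport):
        self._transport = transport
        self._lock = threading.Lock()   # covers send + recv together

    def call(self, payload):
        with self._lock:
            self._transport.send(payload)
            return self._transport.recv()
```

Without the lock, two threads can interleave between `send` and `recv` and read each other's responses, which is the 10/10-mismatch repro the tests pin.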
Rylen Anil
3858f9419e fix: handle gateway Ctrl+C shutdown cleanly 2026-04-30 03:29:57 -07:00
Teknium
01d7c87ecc chore(release): map zicochaos to GitHub login 2026-04-30 03:29:48 -07:00
Sebastian B
362996e269 fix(runtime_provider): _get_named_custom_provider must honour transport field on v12+ providers dict
The v11→v12 migrate_config step writes the API mode for every entry
under the new transport: field (per the v12+ schema in
_normalize_custom_provider_entry).  _get_named_custom_provider
read the legacy api_mode: spelling only, so for every migrated
config the lookup returned None for the api mode.

Downstream, _resolve_named_custom_runtime then falls back through
custom_provider.get("api_mode") or _detect_api_mode_for_url(base_url)
or "chat_completions".  For loopback URLs (proxies, local servers)
or unknown hostnames, the URL detector returns None and the resolver
silently downgrades the configured codex_responses /
anthropic_messages transport to chat_completions.  Requests
get sent to /v1/chat/completions instead of /v1/responses or
/v1/messages and the provider 404s — or worse, returns a usable
chat_completions response while skipping the model's reasoning /
caching surface.

Fix: read both field names — entry.get("api_mode") or
entry.get("transport") — at the two match-by-key + match-by-name
branches in _get_named_custom_provider.  The runtime normaliser
_normalize_custom_provider_entry already accepts both spellings;
this lifts the same compat into the direct-dict reader so v12+
configs work without going through the shim.

Adds three regression tests under
tests/hermes_cli/test_user_providers_model_switch.py:
- transport field is read on the match-by-key branch
- legacy api_mode spelling still works for hand-edited configs
- transport is read on the match-by-display-name branch
2026-04-30 03:29:48 -07:00
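The two-spelling read from the commit, as a tiny standalone sketch (wrapper name illustrative):

```python
# Sketch of the dual-field read: v12+ configs write `transport:`,
# hand-edited/legacy configs may still use `api_mode:` — accept both,
# matching the commit's entry.get("api_mode") or entry.get("transport").

def provider_api_mode(entry: dict):
    return entry.get("api_mode") or entry.get("transport")
```

With neither key set the caller still falls through to URL detection and then the `chat_completions` default, but a configured `codex_responses` / `anthropic_messages` transport is no longer silently downgraded.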
briandevans
f54935738c fix(cron): surface agent run_conversation failure flags as job failure
run_job() ignored the result's `failed=True` / `completed=False` flags
that agent.run_conversation populates on API exhaustion, mid-run
interrupts, and model aborts. Because final_response on those paths is
often a non-empty error string ("API call failed after 3 retries:
Request timed out."), the existing empty-response soft-fail in
_process_job did not trip either: the error text was delivered as if it
were the agent's reply and last_status was set to "ok" with no error
notification. Detect those flags right after the dict-shape guard and
raise so the existing except handler builds the proper failure tuple,
preserving the agent's error message via result["error"].

Adds a parametrized regression covering: API-retry-exhausted with error
text in final_response, completed=False with no final_response,
completed=False without an explicit failed flag, and the partial-reply
plus failed=True case. Plus a guard that a normal completed=True success
result is still treated as success.

Fixes #17855

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:27:37 -07:00
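The flag check inserted after the dict-shape guard can be sketched like this (names illustrative; in the real code the raise is caught by run_job's existing except handler, which builds the failure tuple):

```python
# Sketch of the failure-flag surfacing: a result with failed=True or
# completed!=True is an error even when final_response carries text like
# "API call failed after 3 retries" — raise so the caller's except path
# records a failure instead of delivering the error text as a reply.

def check_run_result(result: dict) -> None:
    if result.get("failed") or not result.get("completed", False):
        raise RuntimeError(
            result.get("error")
            or result.get("final_response")
            or "agent run failed"
        )
```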
briandevans
f44f1f9615 fix(gateway): preserve session guard across in-band drain handoff
When the in-band pending-message drain spawns a fresh task and
transfers ownership via _session_tasks[session_key] = drain_task,
the original task still unwinds through the finally block.  The
drain task picks up the same interrupt_event in its own
_process_message_background entry, so an unconditional
_release_session_guard(session_key, guard=interrupt_event) at the
end of the finally matches and deletes _active_sessions[session_key]
while the drain task is still pending its first await.

A concurrent inbound message arriving in that handoff window passes
the Level-1 guard (no entry exists) and spawns a second
_process_message_background for the same session — two agents on
one session_key, duplicate responses, duplicate tool calls.

Fix: only call _release_session_guard when the current task still
owns _session_tasks[session_key].  When ownership has been
transferred to a drain task, leave _active_sessions populated; the
drain task's own lifecycle releases it.  This mirrors the
late-arrival drain path in the same finally block, which already
leaves both entries alone after handing off.

Also reorder stdlib imports in the new regression test file to
match the gateway test convention (stdlib before third-party).

Regression test: capture _active_sessions[sk] identity at every
handler entry across a 2-step in-band drain chain and assert the
guard Event identity stays the same.  Pre-fix, the original task's
finally deletes the entry, the drain task falls through to the
`or asyncio.Event()` branch, and a fresh Event is installed —
identity diverges.  Post-fix, the entry is preserved and the drain
task reuses the original Event.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:27:08 -07:00
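The ownership rule in isolation — only the task that still owns `_session_tasks[session_key]` may release the guard — can be sketched with plain dicts and sentinel objects (illustrative, detached from the real GatewayRunner state):

```python
# Sketch of the conditional release: when ownership was handed to a drain
# task, the original task's finally must leave both entries alone; the
# drain task's own lifecycle releases them later.

def maybe_release(session_tasks: dict, active_sessions: dict,
                  key, current_task, guard) -> None:
    if session_tasks.get(key) is current_task:
        session_tasks.pop(key, None)
        if active_sessions.get(key) is guard:
            active_sessions.pop(key, None)
    # else: ownership transferred — not ours to release
```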
briandevans
663ba9a58f fix(gateway): drain pending messages via fresh task, not recursion (#17758)
`_process_message_background` finished a turn, found a queued
follow-up, and drained it via `await
self._process_message_background(pending_event, session_key)`.  Each
chained follow-up added a frame to the call stack instead of starting
fresh.  Under sustained pending-queue activity (e.g. a user sending
follow-ups faster than the agent finishes turns) the C stack would
exhaust at ~2000 nested frames and SIGSEGV the process.

Mirror the late-arrival drain pattern that already exists in the same
function: spawn a new `asyncio.create_task(...)` for the pending event
and return so the current frame can unwind.  The new task takes
ownership via `_session_tasks[session_key]`.

The late-arrival drain in `finally` could now race with the in-band
drain across the `await typing_task` / `await stop_typing` window, so
add a guard: if `_session_tasks[session_key]` is no longer the current
task, an in-band drain already spawned a follow-up task — re-queue the
late-arrival event so that task picks it up after its current event,
instead of spawning a second concurrent task for the same session_key.

Regression test (`test_pending_drain_no_recursion.py`) chains 12
follow-ups and asserts the recorded
`_process_message_background` stack depth stays bounded at handler
entry.  Pre-fix: depths grow linearly `[1,2,3,…,12]`.  Post-fix: all
depths are `1`.

`test_duplicate_reply_suppression::test_stale_response_suppressed_when_interrupted`
called `_process_message_background` directly and implicitly relied on
the old recursive `await` semantic — updated to wait for the spawned
drain task before checking the sent list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 03:27:08 -07:00
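The recursion-to-task change above can be reduced to a runnable asyncio sketch (queue and depth bookkeeping are illustrative, standing in for the pending-message drain):

```python
# Sketch of the #17758 shape: drain the next pending item by spawning a
# fresh task and unwinding the current frame, instead of awaiting a
# recursive call that grows the stack by one frame per follow-up.
import asyncio

depths = []
tasks = []

async def process(queue, depth=1):
    depths.append(depth)
    await asyncio.sleep(0)   # stand-in for the agent turn
    if queue:
        queue.pop(0)
        # OLD (stack-growing, SIGSEGV at ~2000 frames):
        #     await process(queue, depth + 1)
        # NEW: hand the pending event to a fresh task and return
        tasks.append(asyncio.get_running_loop().create_task(process(queue, 1)))

async def main():
    await process(list(range(12)))   # 12 chained follow-ups
    while tasks:
        await tasks.pop(0)           # let the drain chain run to completion

asyncio.run(main())
```

Pre-fix the recorded depths grow linearly (1, 2, 3, …); post-fix every handler entry sits at depth 1, which is exactly what the regression test asserts.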
vominh1919
cb130bf776 fix(ssh): prevent tar from overwriting remote home dir permissions
tar xf - -C / extracts the staging directory tree to the remote root.
GNU tar default behavior overwrites metadata (including mode) of existing
directories. When the local umask is 002 (Ubuntu default), the staging
dirs are 0775, and tar chmod's /home/<user> to 0775 — breaking sshd
StrictModes which requires 0755 or stricter for home dirs.

Add --no-overwrite-dir to the remote tar command so existing directory
metadata is preserved.

Fixes #17767
2026-04-30 03:26:35 -07:00
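The behavior difference is easy to demonstrate locally with GNU tar (paths below are a throwaway sandbox, not Hermes' real staging layout):

```shell
# Demonstrates the #17767 fix: without --no-overwrite-dir, GNU tar would
# stamp the staged 0775 mode onto the existing "remote" home dir; with
# the flag, existing directory metadata is preserved.
set -e
workdir="$(mktemp -d)"
cd "$workdir"
# "remote" side: an existing home dir with sshd StrictModes-friendly 0755
mkdir -p remote/home/user
chmod 755 remote/home/user
# staging side: dirs created under umask 002 end up 0775
mkdir -p staging/home/user
chmod 775 staging/home/user
echo hello > staging/home/user/file.txt
# extract the staging tree over the remote root, preserving existing dirs
tar -C staging -cf - . | tar -C remote -xf - --no-overwrite-dir
stat -c %a remote/home/user
```

(`stat -c` is GNU coreutils; on BSD/macOS use `stat -f %Lp` instead.)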
Teknium
8d302e37a8 feat(tts): add Piper as a native local TTS provider (closes #8508) (#17885)
Piper (OHF-Voice/piper1-gpl) is a fast, local neural TTS engine from the
Home Assistant project that supports 44 languages with zero API keys.
Adds it as a native built-in provider alongside edge/neutts/kittentts,
installable via 'hermes tools' with one keystroke.

What ships:

- New 'piper' built-in provider in tools/tts_tool.py
  - Lazy import via _import_piper()
  - Module-level voice cache keyed on (model_path, use_cuda) so switching
    voices doesn't invalidate older cached voices
  - _resolve_piper_voice_path() accepts either an absolute .onnx path or a
    voice name (auto-downloaded on first use via 'python -m
    piper.download_voices --download-dir <cache>')
  - Voice cache at ~/.hermes/cache/piper-voices/ (profile-aware via
    get_hermes_dir)
  - Optional SynthesisConfig knobs: length_scale, noise_scale,
    noise_w_scale, volume, normalize_audio, use_cuda — passed through
    only when configured, so older piper-tts versions aren't broken
  - WAV output then ffmpeg conversion path (same as neutts/kittentts) so
    Telegram voice bubbles work when ffmpeg is present
  - Piper added to BUILTIN_TTS_PROVIDERS so a user's
    tts.providers.piper.command cannot shadow the native provider
    (regression test included)

- 'hermes tools' wizard entry
  - Piper appears under Voice and TTS as local free, with
    'pip install piper-tts' auto-install via post_setup handler
  - Prints voice-catalog URL and default-voice info after install

- config.yaml defaults
  - tts.piper.voice defaults to en_US-lessac-medium
  - Commented advanced knobs for discoverability

- Docs
  - New 'Piper (local, 44 languages)' section in features/tts.md
    explaining install path, voice switching, pre-downloaded voices,
    and advanced knobs
  - Piper listed in the ten-provider table and ffmpeg table
  - Custom-command-providers section updated to drop the Piper example
    (now native) and add a piper-custom example for users with their own
    trained .onnx models
  - overview.md bumps provider count to ten

- Tests (tests/tools/test_tts_piper.py, 16 tests)
  - Registration (BUILTIN_TTS_PROVIDERS, PROVIDER_MAX_TEXT_LENGTH)
  - _resolve_piper_voice_path across every branch: direct .onnx path,
    cached voice name, fresh download with correct CLI args, download
    failure, successful-exit-but-missing-files, empty voice to default
  - _generate_piper_tts: loads voice once, reuses cache, voice-name
    download wiring, advanced knobs flow through SynthesisConfig
  - text_to_speech_tool end-to-end dispatch and missing-package error
  - check_tts_requirements: piper availability toggles the return value
  - Regression guard: piper cannot be shadowed by a command provider
    with the same name
  - Pre-existing test_tts_mistral test broadened to mock the new
    piper/kittentts/command-provider checks (otherwise it false-passes
    when piper is installed in the test venv)

E2E verification (live):

Actual pip install piper-tts, config piper + en_US-lessac-low,
text_to_speech_tool call, voice auto-downloaded from HuggingFace,
WAV synthesized, ffmpeg-converted to Ogg/Opus. Second call hits the
cache (~60ms). Cache dir populated with .onnx and .onnx.json.

This caught a real bug during development: the first pass used '-d' as
the download-dir flag; the actual piper.download_voices CLI wants
'--download-dir'. Fixed before PR opened.
2026-04-30 02:53:20 -07:00
Teknium
2662bfb756 fix(tests): make test_update_stale_dashboard immune to hermes_cli.main reload (#17881)
Six tests in this file failed in CI (-n auto) after #17832 landed because
other tests on the same xdist worker reload hermes_cli.main:

  tests/hermes_cli/test_env_loader.py:85-86
    sys.modules.pop('hermes_cli.main', None)
    importlib.import_module('hermes_cli.main')

  tests/hermes_cli/test_skills_subparser.py:24-25
    del sys.modules['hermes_cli.main']

When either ran first on a worker, our top-of-file
'from hermes_cli.main import _kill_stale_dashboard_processes' captured a
stale function object whose __globals__ points at the old module dict.
patch('hermes_cli.main._find_stale_dashboard_pids', ...) then patched the
new module, but the stale function resolved the dependency via its stale
__globals__, so every patch became a no-op: pids=[] → early return → no
signals, no output, assertions failed.

Fix: add an autouse fixture that rebinds the three module-level names to
whatever is currently live in sys.modules['hermes_cli.main'] before each
test runs. The pollutants in the other two files are load-bearing for
their own tests, so fixing it on the consumer side is correct.

Repro: pytest tests/hermes_cli/test_env_loader.py tests/hermes_cli/test_update_stale_dashboard.py
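The stale-`__globals__` failure mode can be reproduced in miniature (module name "m" and its functions are made up for illustration; the fix mirrors the autouse rebinding fixture described above):

```python
import sys
import types

# a module is imported and a function captured at top-of-file
old = types.ModuleType("m")
exec("def dep():\n    return 'old'\n\ndef f():\n    return dep()", old.__dict__)
sys.modules["m"] = old
f = old.f   # 'from m import f' captures THIS object

# another test reloads the module: a fresh module object replaces it
new = types.ModuleType("m")
exec("def dep():\n    return 'new'\n\ndef f():\n    return dep()", new.__dict__)
sys.modules["m"] = new

new.dep = lambda: "patched"   # patch('m.dep', ...) lands on the new module...
assert f() == "old"           # ...but stale f resolves dep via old.__dict__

# the fix: rebind from whatever is currently live in sys.modules
f = sys.modules["m"].f
assert f() == "patched"
del sys.modules["m"]
```

The patched name is visible only through the live module's dict, which is why rebinding before each test makes the patches effective again.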
2026-04-30 02:46:56 -07:00
Teknium
0da968e521 fix(curator): unify under auxiliary.curator (hermes model, dashboard) (#17868)
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.

Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.

Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
  (timeout=600 since reviews run long), drop the one-off curator.auxiliary
  block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
  to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
  dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
  Models tab renders the task.

agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.

Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.

Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.

Reported by Voscko on Discord.
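The fallback order described above can be sketched as (config key names follow the commit; the function body is an assumption, not the actual agent/curator.py code):

```python
import logging

def resolve_review_model(cfg: dict) -> dict:
    # 1) canonical slot: auxiliary.curator
    canonical = cfg.get("auxiliary", {}).get("curator") or {}
    if canonical.get("model"):
        return canonical
    # 2) legacy slot: curator.auxiliary, with a migration nudge
    legacy = cfg.get("curator", {}).get("auxiliary") or {}
    if legacy.get("model"):
        logging.info("curator.auxiliary is deprecated; use auxiliary.curator")
        return legacy
    # 3) fall back to the main chat model
    return cfg.get("model", {})
```

When both slots are set, the canonical one wins ("new-wins-over-legacy" in the test list above).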
2026-04-30 02:46:01 -07:00
teknium1
658947480a fix(acp): drop dead message_id kwarg from replay chunks
UserMessageChunk and AgentMessageChunk do not have a message_id field
in the ACP schema. Passing it silently dropped the kwarg (pydantic
does not raise on unknown init kwargs here) and the subsequent test
assertions on .message_id raised AttributeError. Strip the dead
plumbing (uuid import, message_id= kwarg on both chunk types, unused
session_id/index parameters) and remove the matching .message_id
asserts from the test.
2026-04-30 02:45:54 -07:00
Henkey
d2536a72bf fix(acp): replay session history on load 2026-04-30 02:45:54 -07:00
teknium1
5d253e65b7 fix(openviking): pre-check fs/stat to route file URIs before hitting directory-only endpoints
Adds a deterministic pre-check on top of htsh's exception-based fallback:
before calling /content/abstract or /content/overview on a non-pseudo URI,
probe /api/v1/fs/stat. If the server says the URI is a file, route straight
to /content/read instead of eating a failing 500 round-trip.

This is the same idea pty819 and chennest independently landed in PRs
#12757 and #12937 — merged here on top of htsh's broader fix so we keep
pseudo-URI normalization and v0.3.3 browse-shape handling while avoiding
the slow exception path on servers that return a 500 every time.

The exception fallback from #5886 stays in place for environments where
fs/stat is unavailable or returns an unfamiliar shape.

Also credits pty819, chennest, and htsh in AUTHOR_MAP so future release
notes attribute them correctly.
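A sketch of the routing decision described above; the endpoint paths come from the commit, while the function shape and the `stat` callable are illustrative:

```python
def pick_endpoint(uri: str, stat) -> str:
    # stat: callable returning e.g. {"type": "file"} or raising when
    # /api/v1/fs/stat is unavailable (sketch, not the real client)
    is_pseudo = uri.rsplit("/", 1)[-1] in (".abstract.md", ".overview.md")
    if not is_pseudo:
        try:
            if stat(uri).get("type") == "file":
                return "/api/v1/content/read"   # file: skip the 500 round-trip
        except Exception:
            pass   # no fs/stat: keep the exception-based fallback path
    return "/api/v1/content/abstract"
```

Files route straight to content/read; directories, pseudo-URIs, and stat failures keep the pre-existing behavior.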
2026-04-30 02:35:29 -07:00
hitesh
10e43edc09 fix(openviking): fallback summary reads to content/read for file URIs
OpenViking returns 500 for /content/abstract and /content/overview when the URI points to mem_*.md files.
Add a resilient fallback to /content/read for non-pseudo summary file URIs while preserving pseudo-summary normalization.
Also add regression tests for the fallback behavior.
2026-04-30 02:35:29 -07:00
hitesh
bff8ab0311 test(openviking): add helper regression coverage 2026-04-30 02:35:29 -07:00
Hitesh Aidasani
97a851bf97 fix(openviking): normalize summary pseudo-URIs to prevent v0.3.3 500s
OpenViking v0.3.3 expects directory URIs for abstract/overview reads.
Passing pseudo-files like /.overview.md and /.abstract.md to
/api/v1/content/overview|abstract triggers HTTP 500.

This change normalizes those pseudo-URIs to their parent directory for
abstract/overview requests, preserves full reads, and hardens parsing for
wrapped/unwrapped result payloads and fs list response shapes.
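The pseudo-URI normalization can be sketched as a few lines; the pseudo-file names come from the commit, the helper name is an assumption:

```python
import posixpath

def normalize_summary_uri(uri: str) -> str:
    # pseudo-files map to their parent directory for abstract/overview
    # reads; everything else (including full reads) passes through
    if posixpath.basename(uri) in (".overview.md", ".abstract.md"):
        return posixpath.dirname(uri) or "/"
    return uri
```

Real files are untouched, so full-content reads keep working.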
2026-04-30 02:35:29 -07:00
Teknium
25caaa4a70 feat(tips): add cost-saving tips from April 30 tip-of-the-day (#17841)
Seed the tips corpus with the knobs users can turn to reduce token
spend: hermes tools / hermes skills config to trim surface area,
/reasoning low|minimal to dial thinking depth down from the medium
default, and hermes models to route auxiliary tasks (vision, compression,
title gen, session_search) to cheaper backends while the main chat model
stays intact.

Requested by @micheltamanda under Teknium's tip-of-the-day tweet.
2026-04-30 02:30:36 -07:00
Teknium
0ad4f55aa8 feat(dashboard): add --stop and --status flags (#17840)
`hermes dashboard` is a long-lived foreground server that users often
start and forget about, sometimes in a shell they've since closed.  We
didn't have a way to stop it — users had to find the PID manually.

Adds two lifecycle flags that reuse the same detection + termination
path the post-`hermes update` cleanup (PR #17832) uses:

  hermes dashboard --status
    List running hermes dashboard processes with PID + cmdline.
    Exit 0, informational.

  hermes dashboard --stop
    Terminate all running dashboards (3s grace then force-kill survivors).
    Exit 0 if none remain, 1 if any couldn't be stopped.
    Windows uses `taskkill /F` as before.

Both flags short-circuit before any fastapi/uvicorn import so they work
even on installations where the dashboard extras aren't installed —
useful when you're cleaning up after uninstalling.

The kill helper gained an optional `reason=...` param so the output
reads "(requested via --stop)" instead of the post-update-specific
"running backend no longer matches the updated frontend" wording.

E2E: `hermes dashboard --status` with nothing running prints the
empty message; with a fake `hermes dashboard ...` cmdline spawned via
`exec -a`, `--status` lists it, `--stop` terminates it (exit -15),
and a follow-up `--status` returns empty.
2026-04-30 02:30:20 -07:00
Teknium
2facea7f71 feat(tts): add command-type provider registry under tts.providers.<name> (#17843)
Reshape of PR #17211 (@versun). Lets users wire any local or external
TTS CLI into Hermes without adding engine-specific Python code. Users
declare any number of named providers in config.yaml and switch between
them with tts.provider: <name>, alongside the built-ins (edge, openai,
elevenlabs, …).

Config shape:

  tts:
    provider: piper-en
    providers:
      piper-en:
        type: command
        command: 'piper -m ~/model.onnx -f {output_path} < {input_path}'
        output_format: wav

Placeholders: {input_path}, {text_path}, {output_path}, {format},
{voice}, {model}, {speed}. Use {{ / }} for literal braces.

Key behavior:
- Built-in provider names always win — a tts.providers.openai entry
  cannot shadow the native OpenAI provider.
- type: command is the default when command: is set.
- Placeholder values are shell-quote-aware (bare / single / double
  context), so paths with spaces and shell metacharacters are safe.
- Default delivery is a regular audio attachment. voice_compatible: true
  opts in to Telegram voice-bubble delivery via ffmpeg Opus conversion.
- Command failures (non-zero exit, timeout, empty output) surface to
  the agent with stderr/stdout included so you can debug from chat.
- Process-tree kill on timeout (Unix killpg, Windows taskkill /T).
- max_text_length defaults to 5000 for command providers; override
  under tts.providers.<name>.max_text_length.

Tests: tests/tools/test_tts_command_providers.py — 42 new tests cover
provider resolution, shell-quote context, placeholder rendering with
injection payloads, timeout, non-zero exit, empty output, voice_compatible
opt-in, and end-to-end dispatch through text_to_speech_tool. All 88
pre-existing TTS tests still pass.

Docs: new "Custom command providers" section in
website/docs/user-guide/features/tts.md with three worked examples
(Piper, VoxCPM, MLX-Kokoro), placeholder reference, optional keys,
behavior notes, and security caveat.

E2E-verified live: isolated HERMES_HOME, command provider declared in
config.yaml, text_to_speech_tool dispatches through the registered
shell command and the output file is produced as expected.
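The shell-quote-aware placeholder rendering can be sketched as below. This is a deliberately simplified version — it always quotes values as if they sat in bare-shell context, whereas the real implementation distinguishes bare / single / double contexts; `{{` / `}}` escaping and the quoting of values match the commit:

```python
import shlex

def render_command(template: str, values: dict) -> str:
    # protect literal {{ / }} before substitution, restore after
    out = template.replace("{{", "\x00").replace("}}", "\x01")
    for key, val in values.items():
        # shell-quote every value so metacharacters can't inject commands
        out = out.replace("{" + key + "}", shlex.quote(str(val)))
    return out.replace("\x00", "{").replace("\x01", "}")
```

A payload like `hi; rm -rf /` comes out single-quoted, so it reaches the TTS CLI as text rather than as a command.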

Co-authored-by: Versun <me+github7604@versun.org>
2026-04-30 02:29:08 -07:00
Teknium
5b85a7d351 fix(update): kill stale dashboard processes instead of warning (#17832)
`hermes update` previously just printed a warning when it detected a
running `hermes dashboard` process from the previous version, telling
the user to kill and restart it themselves.  In practice dashboards get
started and forgotten, so the warning was routinely ignored and users
ended up with a silent frontend/backend mismatch (new JS bundle served
against the old in-memory Python backend, e.g. new auth headers the old
code doesn't recognise → every API call 401s).

The dashboard has no service manager, no PID file, and we don't record
the original launch args (--host, --port, --insecure, --tui, --no-open)
so we can't auto-restart it.  But we CAN stop it, which is what the
user wants — the failure mode when the stale process is left alive is
worse than the dashboard just being down.

- POSIX: SIGTERM, poll for ~3s, SIGKILL any survivors.
- Windows: `taskkill /PID <pid> /F`.
- Print each PID's outcome plus a one-line restart hint.
- Detection logic is unchanged (same ps / wmic scan, same guards
  against the `pgrep -f` greedy-match trap from #16872 and the
  #17049 wmic UnicodeDecodeError fix).

Also split the old monolithic `_warn_stale_dashboard_processes` into
`_find_stale_dashboard_pids` (scan) + `_kill_stale_dashboard_processes`
(kill), keeping the old name as an alias so any external callers still
work.

E2E verified: spawned a fake `hermes dashboard` cmdline via
`exec -a 'hermes dashboard …' sleep 300`, ran
`_kill_stale_dashboard_processes()`, confirmed SIGTERM exit (-15)
and that a post-scan returns an empty PID list.
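The POSIX terminate path (SIGTERM, ~3s poll, SIGKILL survivors) might look like this; helper names are illustrative, not the actual hermes_cli code, and Windows would use taskkill /F instead:

```python
import os
import signal
import time

def _alive(pid: int) -> bool:
    try:
        os.kill(pid, 0)   # signal 0 probes existence without delivering anything
        return True
    except (ProcessLookupError, PermissionError):
        return False

def kill_pids(pids: list[int], grace: float = 3.0) -> list[int]:
    # SIGTERM everything, give it `grace` seconds, force-kill what's left
    for pid in pids:
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline and any(map(_alive, pids)):
        time.sleep(0.1)
    survivors = [p for p in pids if _alive(p)]
    for pid in survivors:
        try:
            os.kill(pid, signal.SIGKILL)
        except ProcessLookupError:
            pass
    return [p for p in survivors if _alive(p)]   # anything still unkillable
```

An empty return list is the "exit 0" case; a non-empty one maps to the exit-1 path that `--stop` (PR #17840) reports.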
2026-04-30 01:34:34 -07:00
Teknium
fd0796947f fix: stabilize CI — TS widen, sys.modules restore, WS subscriber race (#17836)
Three narrow fixes targeting the remaining red checks after #17828:

1. ui-tui/src/app/slash/commands/ops.ts (Docker Build):
   /reload-mcp's local params type annotated session_id: string
   while ctx.sid is string | null. Widen to string | null —
   matches every other rpc call site and the test harness which passes
   { session_id: null }. Fixes TS2322 on line 86. The rpc signature
   itself is Record<string, unknown>, so this is purely a local
   typing fix, no behavioral change.

2. tests/plugins/test_achievements_plugin.py (13 cascading test failures):
   _install_fake_session_db did a raw sys.modules['hermes_state'] =
   fake_module without restoration, leaking the fake across xdist
   worker boundaries. Downstream tests doing from hermes_state import
   SessionDB got a module whose SessionDB was lambda: fake_db
   — 6 test_hermes_state.py tests failed with AttributeError: 'function'
   object has no attribute '_sanitize_fts5_query' / _contains_cjk,
   and 7 test_860_dedup.py tests failed with TypeError: got unexpected
   keyword argument 'db_path' (real code calls SessionDB(db_path=...)).

   Fix: stash monkeypatch on the plugin_api module object in the
   fixture, and have the helper do monkeypatch.setitem(sys.modules,
   'hermes_state', fake_module) for auto-restoration at test teardown.

3. tests/hermes_cli/test_web_server.py (WS race):
   TestPtyWebSocket::test_pub_broadcasts_to_events_subscribers hit the
   30s test timeout on CI. websocket_connect returns after
   ws.accept() — but /api/events registers the subscriber in
   _event_channels on the NEXT await (inside _event_lock). A
   publish immediately after connect could race ahead of registration
   and be dropped, and the subsequent receive_text() blocked until
   SIGALRM killed the test. Fix: poll _event_channels after the
   subscriber connects, before publishing.

Validation:
scripts/run_tests.sh tests/plugins/test_achievements_plugin.py
                     tests/run_agent/test_860_dedup.py
                     tests/test_hermes_state.py
                     tests/hermes_cli/test_web_server.py    338 passed
cd ui-tui && npm run type-check                             clean
cd ui-tui && npm run build                                  clean

Remaining red checks are pure infra (Nix ubuntu hits
TwirpErrorResponse ResourceExhausted on the GH Actions cache API; Nix
macos bounces between npm build openssl-legacy and cache rate-limits)
and cannot be fixed in the codebase.
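The registration-polling fix in item 3 amounts to a small wait-until helper along these lines (a sketch; the real test polls `_event_channels` directly):

```python
import time

def wait_until(predicate, timeout: float = 5.0, interval: float = 0.01) -> bool:
    # poll until the condition holds (e.g. the subscriber appears in the
    # channel registry) instead of racing a publish against registration
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```

Publishing only after `wait_until(lambda: subscriber_registered())` returns True closes the window where the event was dropped and `receive_text()` hung until SIGALRM.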
2026-04-30 01:34:08 -07:00