The kanban PR (#17805, c86842546) added the `kanban` toolset and
`tools/kanban_tools.py`, but didn't update three pre-existing test
assertions that hard-code the full toolset/tool inventory:
* `tests/tools/test_registry.py::test_matches_previous_manual_builtin_tool_set`
hard-codes the manual list of builtin tool modules. `tools.kanban_tools`
was missing.
* `tests/test_tui_gateway_server.py::test_load_enabled_toolsets_rejects_disabled_mcp_env`
and `test_load_enabled_toolsets_falls_back_when_tui_env_invalid` both
expect `["memory"]` from `_load_enabled_toolsets()`. With kanban now
auto-recovered by `_get_platform_tools` (its tools live in hermes-cli's
universe but are not in CONFIGURABLE_TOOLSETS), the resolver returns
`["kanban", "memory"]`.
* `tests/hermes_cli/test_tools_config.py::test_get_platform_tools_preserves_explicit_empty_selection`
asserts `set()` for an explicit empty list. The recovery loop now also
surfaces `kanban`. Reframed to assert the contract the test name
describes — no CONFIGURABLE toolset gets re-enabled when the user
explicitly saved an empty list — which stays correct as more
non-configurable platform toolsets are added.
Verified the failures reproduce on clean origin/main (180a7036b) with
`.[all,dev]`-equivalent extras (fastapi, starlette, httpx, pytest-asyncio)
and that all four pass with this commit applied. CI on main itself is
currently red on these tests; this restores green for everyone's PRs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Signal-cli sends dataMessage wrappers for profile key updates and other
metadata events that have no actual text content. These were reaching the
gateway as msg='' and triggering full agent turns for nothing.
Add an early return in _handle_envelope() when the message field is
empty, missing, or whitespace-only AND there are no attachments. Messages
with media attachments but no text still flow through.
- 12 lines added to gateway/platforms/signal.py
- 5 new tests in TestSignalContentlessEnvelope class
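A minimal sketch of the guard described above, assuming the dataMessage is a dict with `message` and `attachments` keys. `is_contentless` is a hypothetical helper name used for illustration, not the code in gateway/platforms/signal.py.

```python
from typing import Any, Mapping


def is_contentless(data_message: Mapping[str, Any]) -> bool:
    """Return True when a dataMessage has no text and no attachments.

    Hypothetical helper; the field names are assumptions for illustration.
    """
    text = data_message.get("message") or ""
    attachments = data_message.get("attachments") or []
    return not text.strip() and not attachments


# Profile-key update: no text, no media, so the agent turn can be skipped.
profile_update = {"message": None, "profileKey": "abc"}
# Photo without a caption: still has content and should flow through.
photo_only = {"message": "", "attachments": [{"id": "1"}]}
```

Checking text and attachments together is what lets caption-less media through while metadata-only wrappers are dropped.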
The Ink TUI (`hermes --tui` + dashboard `/chat`) had no wiring for the
background self-improvement review. When the review fired and patched
a skill or saved a memory entry, the change landed but the user had
no visual indication it happened — only the CLI had a print surface
for the '💾 Self-improvement review: …' line.
Changes:
- tui_gateway/server.py: in _init_session, attach
agent.background_review_callback to an _emit('review.summary',
sid, {text}) closure. Wrapped in try/except so agents with locked
attribute slots don't break session startup.
- ui-tui/src/app/createGatewayEventHandler.ts: handle 'review.summary'
by routing ev.payload.text through sys(…), matching the existing
'background.complete' pattern. Empty / whitespace payloads are
ignored so the transcript never gets a blank system line.
- ui-tui/src/gatewayTypes.ts: extend the GatewayEvent discriminated
union with { type: 'review.summary', payload?: { text?: string } }.
Gateway platforms (Telegram, Discord, Slack, …) already route the
review summary via background_review_callback → post-delivery queue
in gateway/run.py, so they pick up the new 'Self-improvement review:'
prefix from the companion run_agent change with no platform edits.
Tests:
- tests/tui_gateway/test_review_summary_callback.py (Python, 2 tests):
_init_session attaches a callback that emits the right event; the
callback path survives agents that can't accept the attribute.
- ui-tui/src/__tests__/createGatewayEventHandler.test.ts (vitest, 2
new cases): review.summary events feed sys(...) with the full text;
empty / missing payloads are no-ops.
- TypeScript type-check passes.
- tui_gateway suite: 64/64 pass.
When the self-improvement background review fires after a turn, it runs
in a bg thread and emits a ' 💾 <summary>' line to announce what it
saved to memory or skills. Two problems made this invisible to users
even when the review successfully modified a skill:
1. The print went through `_cprint` (prompt_toolkit's print_formatted_text)
on a bg thread while the CLI's PromptSession was live. Direct
print_formatted_text races with the input-area redraw and the line
can land behind/above the prompt, scrolled off without the user
seeing it.
2. The message said only '💾 Skill created.' / '💾 Memory updated'
with no indication that the self-improvement loop was the one doing
this. Users who did catch the line couldn't tell the background
review from some other agent action.
Fixes:
- `_cprint` now detects when it's called from a non-app thread with a
running prompt_toolkit Application, and routes through
`run_in_terminal` via `loop.call_soon_threadsafe`. That pauses the
input, prints the line above the prompt, and redraws — the normal
prompt_toolkit contract for bg-thread output. Direct-print fallback
preserved for the no-app / same-thread / import-error paths. Affects
every bg-thread emission, not just the review summary (curator
summaries and auxiliary failure prints benefit too).
- The summary now reads ' 💾 Self-improvement review: <summary>' in
both the CLI and the gateway `background_review_callback` path, so
the origin is unambiguous.
Tests:
- New `tests/cli/test_cprint_bg_thread.py` covers every routing
branch (no app, app-not-running, cross-thread schedule, same-thread
direct, app-loop-attribute-error, import-error).
- New case in `tests/run_agent/test_background_review.py` asserts the
attributed prefix shows up in both `_safe_print` and
`background_review_callback`.
Live E2E: exercised _cprint from a bg thread inside a real Application
event loop; confirmed get_app_or_none() sees the app, call_soon_threadsafe
schedules run_in_terminal, and the inner _pt_print runs.
Builds on #16855 (@lsdsjy) which fixed DeepSeek v4 reasoning_content
replay via model_extra fallback + capturing tool_calls at method entry.
Kimi / Moonshot thinking mode enforces the same echo-back contract and
hits the same 400 when a tool-call turn is persisted without
reasoning_content.
- _build_assistant_message: pad branch now uses _needs_thinking_reasoning_pad()
(DeepSeek OR Kimi) instead of _needs_deepseek_tool_reasoning() alone.
- Extract _needs_thinking_reasoning_pad() and reuse it in
_copy_reasoning_content_for_api so both sites share one predicate.
- tests/run_agent/test_deepseek_reasoning_content_echo.py: add
TestBuildAssistantMessagePadsStrictProviders parametrized over DeepSeek
(attr=None, attr-absent), Kimi (attr=None), Moonshot (via base_url),
and an OpenRouter negative control that must NOT pad. Proven to fail
2/5 cases on Kimi/Moonshot without this change.
- scripts/release.py: add AUTHOR_MAP entries for lsdsjy and season179.
Refs #17400.
Co-authored-by: season179 <season.saw@gmail.com>
Alongside the existing 'least recently used' section, surface two more
rankings so users can see which of their agent-created skills actually
get exercised:
- 'most used (top 5)' — sorted by use_count descending. Hidden when every
skill has use_count=0 (noise suppression on fresh installs).
- 'least used (top 5)' — sorted by use_count ascending. Always shown
when the catalog is non-empty.
use_count started tracking real agent skill activation in PR #17932
(bump_use wired into skill_view tool + slash invocation + --skill
preload), so these rankings are now meaningful.
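The two rankings can be sketched as below. This is an illustration only: `rank_by_use` and the plain name-to-count dict are hypothetical, and the tie-breaking by name is an assumption rather than the curator's documented behavior.

```python
from typing import Dict, List, Tuple


def rank_by_use(use_counts: Dict[str, int], top: int = 5) -> Tuple[List[str], List[str]]:
    """Return (most_used, least_used) skill names.

    most_used is empty when every skill has use_count == 0, which mirrors
    the noise suppression described above.
    """
    # Sort by name first so ties come out in a stable, readable order.
    names = sorted(use_counts)
    least = sorted(names, key=lambda n: use_counts[n])[:top]
    if any(use_counts.values()):
        most = sorted(names, key=lambda n: use_counts[n], reverse=True)[:top]
    else:
        most = []
    return most, least
```

On a fresh install where every count is zero, the first list comes back empty while the second still lists the catalog.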
Tests: 3 new in tests/hermes_cli/test_curator_status.py — happy path
with mixed use_counts, zero-use suppression of the most-used section,
and the no-skills clean-empty case.
Treat skill views and edits as activity when the curator reports and
applies lifecycle transitions, so recently loaded or patched skills are
not displayed or transitioned as never used.

Adds regression tests for activity derivation, automatic transitions, and
CLI status output.
restore_skill() in tools/skill_usage.py used archive_root.iterdir(), which
only walked the top level of .archive/. Skills archived under nested layouts
(e.g. .archive/openclaw-imports/<skill>/ from older archive paths or
external imports) were invisible to both the exact-match and prefix-match
candidate scans, surfacing as a misleading "skill '<name>' not found in
archive" error even though the directory existed on disk.
Switch both candidate scans to archive_root.rglob('*') so the lookup
descends into category subdirectories.
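The difference between the two scans can be reproduced in isolation. The snippet below builds a throwaway archive with one nested and one flat skill directory (the directory names are made up) and compares `iterdir()` with `rglob('*')`.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    archive_root = Path(tmp) / ".archive"
    # Nested layout, as produced by older archive paths or external imports.
    (archive_root / "openclaw-imports" / "my-skill").mkdir(parents=True)
    # Flat layout.
    (archive_root / "flat-skill").mkdir()

    # iterdir() only sees direct children of the archive root.
    top_level = sorted(p.name for p in archive_root.iterdir() if p.is_dir())
    # rglob('*') descends into every subdirectory.
    recursive = sorted(p.name for p in archive_root.rglob("*") if p.is_dir())
```

Only the recursive scan contains `my-skill`, which is why the nested skill was reported as missing before the switch.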
Fixes #17942
* fix(curator): split 'archived' into consolidated vs pruned in run reports
Users who watched a curator run saw skills like 'anthropic-api' listed
under 'Skills archived' and interpreted that as pruning — but the curator
had actually absorbed those skills into a new umbrella (e.g. 'llm-providers')
during the same run. The directory gets archived for safety (all removals
are recoverable), but the content still lives under a different name.
Users then 'restored' what they thought were deleted skills and ended up
with confusingly duplicated skillsets (old-name + absorbed-inside-umbrella).
Classify removed skills using this run's skill_manage tool calls:
- consolidated: content absorbed into a surviving/newly-created skill
(evidenced by a skill_manage write_file/patch/create/edit whose target
is a different skill AND whose file_path/content references the
removed skill's name)
- pruned: archived without consolidation evidence (truly stale)
REPORT.md now shows two distinct sections:
- 'Consolidated into umbrella skills' — with `removed → merged into umbrella`
- 'Pruned — archived for staleness' — pure staleness archives
run.json schema additions (backward compatible):
- counts.consolidated_this_run, counts.pruned_this_run
- consolidated: [{name, into, evidence}, ...]
- pruned: [names]
- archived: retained as the union for backward compat
Also: relabel the auto-transitions 'archived' counter to 'archived (no
LLM, pure time-based staleness)' so it's clearly distinct from LLM-pass
archives.
Tests: 9 new tests in test_curator_classification.py covering consolidation
evidence parsing (write_file/patch/create), hyphen/underscore name variants,
self-reference rejection, destination-must-exist, mixed runs, and
malformed-JSON fallback safety. Existing test_report_md_is_human_readable
updated to cover the new section names.
E2E: isolated HERMES_HOME, realistic 3-skill run, REPORT.md verified
end-to-end.
* feat(curator): hybrid model-declared + heuristic classification
Extend the consolidated-vs-pruned split with LLM-authored intent:
1. Curator prompt now requires a structured YAML block at the end of the
final response (consolidations / prunings with short rationale).
2. _parse_structured_summary() extracts it tolerantly — missing block,
malformed YAML, partial lists all fall back to heuristic cleanly.
3. _reconcile_classification() merges model intent with the tool-call
heuristic:
- Model wins on rationale when its umbrella exists post-run
- Model hallucination (umbrella doesn't exist) is downgraded to the
heuristic's finding, or pruned if there's no evidence either
- Heuristic catches model omission: consolidations the model carried
out via tool calls but forgot to list get surfaced with a
'(detected via tool-call audit)' tag
4. REPORT.md now shows per-row rationale alongside 'removed → umbrella'
and flags audit-only rows so the user knows why no reason is shown.
Backward compat: run.json's 'archived' field (union) is preserved.
'pruned' is now a list of dicts with {name, source, reason};
'pruned_names' is the flat-name list for legacy consumers.
Tests: 15 new covering YAML parse edge cases (malformed, empty lists,
bare-string entries, missing fields), reconciler rules (model wins,
hallucination fallback, heuristic catches omission, prune with reason),
and an end-to-end report-render test with all four paths exercised.
Belt-and-suspenders on top of @briandevans' #17758 fix. The in-band
drain hand-off (await->create_task + session-guard preservation)
changed cleanup semantics in three places that the original PR
reasoned about but didn't test directly. Pin each invariant so a
future refactor can't silently regress them:
1. Normal single-message path still releases _active_sessions[sk] and
_session_tasks[sk] through end-of-finally. The #17758 follow-up
moved _release_session_guard under
if current_task is self._session_tasks.get(session_key)
For the 99%-common case current_task IS the stored task, so the
guard must still fire. Test would fail if the conditional were
ever tightened in a way that dropped the normal path.
2. Drain-task cancellation releases the session. If the drain task
spawned by the in-band hand-off is cancelled mid-handler (e.g.
/stop fired while draining a follow-up), its own finally must
fire _release_session_guard. Without this a cancel would leave
the session permanently pinned busy.
3. Late-arrival drain still spawns when no in-band drain preceded
it. Pre-existing path, but the #17758 follow-up added a
re-queue branch that only fires when ownership was already
handed off. When no handoff happened the else branch must still
spawn a fresh drain task — otherwise a message arriving during
stop_typing gets silently dropped.
All three tests pass against current main. Zero production code
changes.
The #1630 fix introduced a blanket ``agent_failed_early`` transcript skip
to prevent context-overflow sessions from looping. That guard also
triggers for unrelated transient failures (429 rate limits, read
timeouts, connection resets, provider 5xx) which have nothing to do with
session size — and it silently drops the user's message, so the agent
has no memory of the last turn on retry.
Split the failure classification in ``GatewayRunner._run_agent``:
* Context-overflow (``compression_exhausted`` flag, explicit
context-length phrases, or generic 400 with a long history) → keep
the existing skip, preserving the #1630/#9893 fix.
* Anything else that failed → persist just the user message so the
conversation survives a retry.
Use specific multi-word phrases (``context length``, ``token limit``,
``prompt is too long``, etc.) to match ``run_agent.py``'s own
classifier; bare ``exceed`` false-positively flagged "rate limit
exceeded" as context overflow.
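A sketch of the phrase matching only, assuming a simple substring check. The tuple below holds just the three phrases named above (the full list is elided in this message), and `looks_like_context_overflow` is a hypothetical name. The real classifier also consults the ``compression_exhausted`` flag and the generic-400 heuristic.

```python
# Only the phrases named in this commit message; the real list is longer.
CONTEXT_OVERFLOW_PHRASES = ("context length", "token limit", "prompt is too long")


def looks_like_context_overflow(error_text: str) -> bool:
    """Match multi-word phrases so 'rate limit exceeded' is not flagged."""
    lowered = error_text.lower()
    return any(phrase in lowered for phrase in CONTEXT_OVERFLOW_PHRASES)
```

Because no bare ``exceed`` is in the list, a 429 message such as "Rate limit exceeded" falls through to the transient-failure branch.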
Covered by new tests in ``tests/gateway/test_7100_transient_failure_transcript.py``
and the existing #1630 suite still passes.
Existing test_tar_pipe_commands asserted the literal substring
'tar xf - -C /' in ssh_str, which is no longer present after the
#17767 fix adds --no-overwrite-dir between 'tar xf -' and '-C /'.
Split the one substring check into three independent assertions for
the tar stdin mode, the new --no-overwrite-dir flag (regression guard
for #17767), and the extract target.
_set_nested unconditionally replaced any non-dict value with an empty
dict when walking the dotted path, which silently destroyed list-typed
config nodes the moment someone set a value with a numeric index
(e.g. 'hermes config set custom_providers.0.api_key NEW'). Any sibling
entries and any fields inside the targeted entry that the user didn't
write were lost.
Fix:
- _set_nested now detects list nodes and navigates by numeric index,
and preserves both dicts AND lists at intermediate positions (scalars
are still replaced so bare-scalar -> nested overrides keep working).
- set_config_value drops its duplicated navigation logic and calls
_set_nested instead -- single source of truth for the rules.
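The navigation rules can be illustrated with a small stand-alone function. This is a sketch under assumptions, not the body of _set_nested: `set_nested` is a hypothetical name, and out-of-range or non-numeric list indices are simply left to raise.

```python
from typing import Any


def set_nested(config: dict, dotted_key: str, value: Any) -> None:
    """Set value at a dotted path, preserving dicts and lists along the way."""
    keys = dotted_key.split(".")
    node: Any = config
    for key in keys[:-1]:
        if isinstance(node, list):
            # List node: navigate by numeric index instead of replacing it.
            child = node[int(key)]
            if not isinstance(child, (dict, list)):
                child = node[int(key)] = {}
        else:
            child = node.get(key)
            # Scalars (and missing keys) are still replaced by a fresh dict.
            if not isinstance(child, (dict, list)):
                child = node[key] = {}
        node = child
    last = keys[-1]
    if isinstance(node, list):
        node[int(last)] = value
    else:
        node[last] = value
```

With this shape, setting `custom_providers.0.api_key` touches one field of one entry and leaves sibling entries and sibling fields intact.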
Regression tests (tests/hermes_cli/test_set_config_value.py):
- test_indexed_set_preserves_sibling_list_entries -- exact #17876 repro
- test_indexed_set_preserves_non_targeted_fields -- inner-dict fields survive
- test_deeper_nesting_through_list -- dict -> list -> dict -> scalar path
35/35 existing + new tests pass.
E2E-verified with the issue's repro against a real on-disk config.yaml --
list stays a list, entry 0 updated, entry 1 intact.
Closes #17876
Long-lived Gateway processes were sending duplicate tool names to
providers that enforce uniqueness:
- DeepSeek: 'Tool names must be unique.'
- Xiaomi MiMo: 'tools contains duplicate names: lcm_expand'
- Moonshot/Kimi: 'function name lcm_grep is duplicated'
TUI was unaffected because TUI runs with quiet_mode=False and skips the
cache entirely.
Root cause (two layered bugs)
- model_tools.get_tool_definitions(quiet_mode=True) memoizes its result
in _tool_defs_cache. The cache-hit path returned list(cached) (safe),
but the FIRST uncached call stored and returned the SAME object.
run_agent.py mutates self.tools (memory + LCM context-engine schemas)
in-place, so the very first agent init in a Gateway process
poisoned the cache, and every subsequent init appended LCM schemas
again on top of the already-polluted list.
- run_agent.py's context-engine injection (lcm_grep / lcm_describe /
lcm_expand) had no dedup, unlike the memory-tools injection right
above it which already skips already-present names.
Fix (defense in depth, per the issue's suggested fix)
- model_tools.get_tool_definitions: on the uncached branch, cache the
computed list but return list(result) to the caller. Same pattern as
the cache-hit path.
- run_agent.py: build _existing_tool_names from self.tools and skip
schemas whose names are already present, mirroring the memory-tools
block. This also defends against plugin paths that may register the
same schemas via ctx.register_tool().
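Both layers of the fix can be shown with a toy cache. Everything here is a stand-in: the module-level `_cache`, the two fake tool names, and `inject_schemas` are hypothetical and only mimic the shape of the real code.

```python
from typing import Dict, List, Optional

_cache: Optional[List[Dict[str, str]]] = None


def get_tool_definitions() -> List[Dict[str, str]]:
    """Return a fresh list on both the cached and uncached paths."""
    global _cache
    if _cache is None:
        # Stand-in for the real tool discovery.
        _cache = [{"name": "terminal"}, {"name": "memory"}]
    # Never hand the cached object itself to the caller.
    return list(_cache)


def inject_schemas(tools: List[Dict[str, str]], extra: List[Dict[str, str]]) -> None:
    """Append schemas whose names are not already present."""
    existing = {t["name"] for t in tools}
    for schema in extra:
        if schema["name"] not in existing:
            tools.append(schema)
            existing.add(schema["name"])


lcm = [{"name": "lcm_grep"}, {"name": "lcm_expand"}]
for _ in range(5):  # five simulated agent inits in one process
    tools = get_tool_definitions()
    inject_schemas(tools, lcm)
    inject_schemas(tools, lcm)  # a second injection is a no-op
```

Either layer alone would stop the accumulation in this toy; keeping both matches the defense-in-depth intent.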
Tests (tests/test_get_tool_definitions_cache_isolation.py)
- test_first_uncached_call_returns_fresh_list -- pins the fix; without
it, first-call alias caused all the symptoms.
- test_cache_hit_returns_fresh_list -- pre-existing behavior stays.
- test_caller_mutation_does_not_poison_cache -- simulates run_agent
appending lcm_grep / lcm_expand to the returned list and asserts the
next call doesn't see them.
- test_repeated_caller_mutation_does_not_accumulate -- reproduces the
long-lived Gateway accumulation pattern across 5 agent inits.
- test_non_quiet_mode_does_not_use_cache -- sanity, explains why TUI
was fine.
5/5 pass on the new file; 23/23 still pass on tests/test_model_tools.py.
When a user sets model.context_length in config.yaml, the value was only
used for Hermes' internal compression decisions (context_compressor) but
NOT for Ollama's num_ctx parameter. Ollama auto-detects context from GGUF
metadata (often 256K+) and allocates that much VRAM regardless of the
user's config — causing OOM on smaller GPUs like the P100 (16GB).
Root cause: two separate context values existed independently:
- context_compressor.context_length = config value (e.g. 65536) ✓
- _ollama_num_ctx = GGUF metadata value (e.g. 256000) ✗ ignored config
Changes:
1. Cap Ollama num_ctx to config context_length (run_agent.py)
When model.context_length is explicitly set and no explicit
ollama_num_ctx override exists, cap the auto-detected GGUF value
to the user's context_length. This is the core fix — it prevents
Ollama from allocating more VRAM than the user budgeted.
2. Pass config_context_length through all secondary call sites
Several paths called get_model_context_length() without the config
override, falling through to the 256K default fallback:
- cli.py: @-reference expansion and /model switch display
- gateway/run.py: @-reference expansion and /model switch display
- tui_gateway/server.py: @-reference expansion
- hermes_cli/model_switch.py: resolve_display_context_length()
3. Normalize root-level context_length in config (hermes_cli/config.py)
_normalize_root_model_keys() now migrates root-level context_length
into the model section, matching existing behavior for provider and
base_url. Users who wrote `context_length: 65536` at the YAML root
instead of under `model:` had it silently ignored.
4. Fix misleading comments (agent/model_metadata.py)
DEFAULT_FALLBACK_CONTEXT is 256K (CONTEXT_PROBE_TIERS[0]), not 128K
as two comments stated.
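The capping rule from item 1 can be sketched as a pure function. `effective_num_ctx` and its parameter names are hypothetical. The sketch assumes a detected GGUF value is always available.

```python
from typing import Optional


def effective_num_ctx(
    detected: int,
    config_context_length: Optional[int],
    explicit_num_ctx: Optional[int] = None,
) -> int:
    """Pick the num_ctx to send to Ollama.

    An explicit ollama_num_ctx override wins; otherwise the auto-detected
    value is capped to the configured context_length when one is set.
    """
    if explicit_num_ctx is not None:
        return explicit_num_ctx
    if config_context_length is not None:
        return min(detected, config_context_length)
    return detected
```

Using `min` means a model whose metadata reports less than the configured length keeps its smaller value.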
Tests: 3 new tests for root-level context_length normalization.
All existing context_length tests pass (96 tests).
The busy-session handler (_handle_active_session_busy_message) bypassed the
authorization gate that the cold path enforces via _is_user_authorized(). In
shared-thread contexts (Slack threads, Telegram forum topics, Discord threads)
where thread_sessions_per_user=False (the default), all participants share one
session_key. An unauthorized user posting in the same thread as an authorized
user would hit the active-session branch, skip the auth check, and have their
text merged into _pending_messages or injected via agent.interrupt().
This commit adds the same _is_user_authorized() check at the top of the busy
handler, before any message queuing, steering, or interrupt logic. Unauthorized
messages are silently dropped (return True) with a warning log — matching the
cold-path behavior.
Affected platforms: Slack, Telegram, Discord, any adapter with shared-session
thread contexts.
Closes #17775
The `gemini` provider also serves Gemma (e.g. `gemma-4-31b-it`) and
historically other Google models like PaLM. Those reject
`extra_body.thinking_config` with HTTP 400:
Unknown name "thinking_config": Cannot find field
`_build_gemini_thinking_config()` was unconditionally producing a
config dict for any model on the `gemini` / `google-gemini-cli`
provider, which `ChatCompletionsTransport.build_kwargs` then dropped
into `extra_body["thinking_config"]`. The result: every chat turn for
Gemma users on the gemini provider blew up at the API edge.
The fix is the same shape Hermes already uses for the Gemini-2.5 vs
Gemini-3 family clamping: normalise the model id, strip an
`OpenRouter`-style `google/` prefix, and short-circuit early when the
result doesn't start with `gemini`. We return `None` rather than
`{"includeThoughts": False}`, because the API rejects the field name
itself — even the polite "off" form trips the same 400.
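The guard can be approximated as below. `is_gemini_model` is a hypothetical helper written for illustration. In the real `_build_gemini_thinking_config()` the equivalent check decides whether to return `None`.

```python
from typing import Optional


def is_gemini_model(model_id: Optional[str]) -> bool:
    """Return True only for ids that belong to the Gemini family."""
    normalized = (model_id or "").strip().lower()
    # Strip an OpenRouter-style vendor prefix before checking the family.
    if normalized.startswith("google/"):
        normalized = normalized[len("google/"):]
    return normalized.startswith("gemini")
```

Stripping the prefix before the family check is what keeps `google/gemini-…` ids working while `google/gemma-…` ids are rejected.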
Three regression tests cover Gemma with reasoning enabled, Gemma with
reasoning disabled, and the `google/gemma-…` OpenRouter-style id; the
existing Gemini-2.5 / Gemini-3 / `google/gemini-…` cases keep passing
because the Gemini guard fires after the prefix strip.
Fixes #17426
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ports PR #17888's send_multiple_images ABC to every gateway platform that
has a native multi-attachment API, so images arrive as a single bundled
message instead of N separate ones.
Native overrides:
- Telegram: send_media_group (10 photos per album, chunks over); animated
GIFs peeled off and routed through send_animation (albums don't support
animations)
- Discord: channel.send(files=[...]) (10 attachments per message, chunks
over); URL images downloaded into BytesIO so they render inline; forum
channels use create_thread with files=[...]
- Slack: files_upload_v2(file_uploads=[...]) (10 per call, chunks over);
respects thread_ts; records thread participation
- Mattermost: single post with file_ids list (5 per post — Mattermost cap,
chunks over)
- Email: single SMTP message with multiple MIME attachments (no chunk cap,
SMTP size governs); remote URLs remain linked in body (parity with
existing send_image)
All platforms fall back to the base per-image loop on any failure, so a
single bad image in a batch never loses the rest.
Matrix, WhatsApp, and single-attachment platforms (BlueBubbles, Feishu,
WeCom, WeChat, DingTalk) continue to use the base default loop — their
server APIs only accept one attachment per message anyway.
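The "chunks over" behavior shared by the native overrides amounts to splitting a batch into groups no larger than the platform cap. A generic sketch follows; `chunked` is a hypothetical helper, not a function from the adapters.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

With a cap of 10, a batch of 23 images becomes three sends; an empty batch yields nothing, which lines up with the empty-batch no-op tests.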
Tests: adds tests/gateway/test_send_multiple_images.py with 19 targeted
tests covering base default loop, chunking, animation peel-off, fallback
paths, and empty-batch no-ops across all five new overrides.
Co-authored-by: Maxence Groine <maxence@groine.fr>
Adds a new `send_multiple_images` method to the ``BasePlatformAdapter``
that implements the default "One image per message" loop and allows for
platform-specific overriding.
Implements such an override for the Signal adapter, batching images
and trying (best-effort) to work around rate-limits for voluminous
batches using a specific scheduler.
Also implements batching + rate-limit handling in the `send_message`
tool.
New tests added for the Signal adapter, its rate-limit scheduler, and the
`send_message` tool.
Merge resolved conflicts in web/src/{i18n/{en,zh,types}.ts,lib/api.ts}
by keeping both this branch's `profiles` additions and upstream's new
`models` page additions.
Copilot review feedback:
- Implement POST /api/profiles/{name}/open-terminal endpoint (already
present); align Windows branch to `cmd.exe /c start "" <cmd>` so it
matches the new test and spawns a fresh window instead of /k reusing
the parent console.
- Move backslash escaping out of the macOS AppleScript f-string
expression (Python <3.12 disallows backslashes inside f-string
expression parts).
- Patch `_get_wrapper_dir` via monkeypatch in
test_profiles_create_creates_wrapper_alias_when_safe so the test no
longer writes to the real `~/.local/bin`.
- Extend test_dashboard_browser_safe_imports to scan `.ts` files in
addition to `.tsx`.
- Switch upstream's new ModelsPage.tsx away from the `@nous-research/ui`
root barrel onto per-component subpaths to satisfy the stricter scan.
- Fix NouiTypography `leading-1.4` -> `leading-[1.4]` so Tailwind
actually emits the line-height for the `sm` variant.
- Guard ProfilesPage.openSoulEditor against out-of-order responses by
tracking the latest requested profile via a ref.
- Replace ProfilesPage's hand-rolled setup command with a fetch to
`/api/profiles/{name}/setup-command` so the copied command always
matches what the backend would actually run (handles wrapper-alias
collisions and reserved names correctly).
- Wire SOUL.md textarea label `htmlFor` -> textarea `id` so screen
readers and clicking the label work as expected.
The sandbox-side `_call()` in both the UDS and file-based transports was
not thread-safe, so scripts that call tools from multiple threads (e.g.
`ThreadPoolExecutor` over `terminal()`) inside a single `execute_code`
run could silently receive each other's responses.
Root cause:
* UDS transport — a single module-level `_sock` was shared across all
threads; the newline-framed protocol has no request-id; and the
server-side RPC loop handles one connection serially. With concurrent
callers, each thread would `sendall()` then race to `recv()` the next
newline-terminated response from the shared buffer, so responses got
delivered to the wrong caller.
* File transport — `_seq += 1` is a non-atomic read-modify-write, so
two threads could allocate the same sequence number and clobber each
other's request/response files.
Fix: guard `_call()` with a `threading.Lock` in the UDS case (covering
send+recv), and guard `_seq` allocation with a lock in the file case.
No protocol change.
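The file-transport half of the fix is the classic locked counter. The sketch below is illustrative: `next_seq` is a hypothetical name, and the real generated sandbox source may structure this differently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_seq = 0
_seq_lock = threading.Lock()


def next_seq() -> int:
    """Allocate a unique sequence number, safe across threads."""
    global _seq
    with _seq_lock:
        # The read-modify-write happens entirely under the lock.
        _seq += 1
        return _seq


with ThreadPoolExecutor(max_workers=10) as pool:
    allocated = list(pool.map(lambda _: next_seq(), range(1000)))
```

For the UDS transport the same idea applies with a wider critical section: the lock has to cover both the send and the matching recv so a response cannot be read by another caller.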
Regression tests cover both the generated-source level (lock is present
and used) and an end-to-end concurrency test: running a sandboxed
ThreadPoolExecutor of 10 `terminal()` calls against a slow mock
dispatcher, asserting every caller sees its own tagged response. The
test fails without the fix (10/10 mismatched, matching real-world
repro) and passes with it.
The v11→v12 migrate_config step writes the API mode for every entry
under the new transport: field (per the v12+ schema in
_normalize_custom_provider_entry). _get_named_custom_provider
read the legacy api_mode: spelling only, so for every migrated
config the lookup returned None for the api mode.
Downstream, _resolve_named_custom_runtime then falls back through
custom_provider.get("api_mode") or _detect_api_mode_for_url(base_url)
or "chat_completions". For loopback URLs (proxies, local servers)
or unknown hostnames, the URL detector returns None and the resolver
silently downgrades the configured codex_responses /
anthropic_messages transport to chat_completions. Requests
get sent to /v1/chat/completions instead of /v1/responses or
/v1/messages and the provider 404s — or worse, returns a usable
chat_completions response while skipping the model's reasoning /
caching surface.
Fix: read both field names — entry.get("api_mode") or
entry.get("transport") — at the two match-by-key + match-by-name
branches in _get_named_custom_provider. The runtime normaliser
_normalize_custom_provider_entry already accepts both spellings;
this lifts the same compat into the direct-dict reader so v12+
configs work without going through the shim.
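The compat read is small enough to show in full. `read_api_mode` is a hypothetical wrapper around the expression quoted above; the string check is an added assumption for the sketch.

```python
from typing import Mapping, Optional


def read_api_mode(entry: Mapping[str, object]) -> Optional[str]:
    """Read the API mode from either the legacy or the v12+ field."""
    mode = entry.get("api_mode") or entry.get("transport")
    return mode if isinstance(mode, str) else None
```

Because `api_mode` is checked first, a hand-edited config that carries both fields keeps its legacy value.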
Adds three regression tests under
tests/hermes_cli/test_user_providers_model_switch.py:
- transport field is read on the match-by-key branch
- legacy api_mode spelling still works for hand-edited configs
- transport is read on the match-by-display-name branch
run_job() ignored the result's `failed=True` / `completed=False` flags
that agent.run_conversation populates on API exhaustion, mid-run
interrupts, and model aborts. Because final_response on those paths is
often a non-empty error string ("API call failed after 3 retries:
Request timed out."), the existing empty-response soft-fail in
_process_job did not trip either: the error text was delivered as if it
were the agent's reply and last_status was set to "ok" with no error
notification. Detect those flags right after the dict-shape guard and
raise so the existing except handler builds the proper failure tuple,
preserving the agent's error message via result["error"].
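A sketch of the flag check, assuming the result is a plain dict. `ensure_run_succeeded` is a hypothetical name, and falling back to final_response when no error key is present is an assumption of this sketch rather than confirmed behavior.

```python
def ensure_run_succeeded(result: dict) -> dict:
    """Raise when the agent result signals failure; otherwise pass it through."""
    failed = result.get("failed") is True
    incomplete = result.get("completed") is False
    if failed or incomplete:
        # Prefer the agent's own error text so the failure report stays useful.
        message = (
            result.get("error")
            or result.get("final_response")
            or "agent run did not complete"
        )
        raise RuntimeError(message)
    return result
```

Raising here, rather than returning a status, lets an existing except handler build the failure tuple without a second code path.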
Adds a parametrized regression covering: API-retry-exhausted with error
text in final_response, completed=False with no final_response,
completed=False without an explicit failed flag, and the partial-reply
plus failed=True case. Plus a guard that a normal completed=True success
result is still treated as success.
Fixes #17855
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the in-band pending-message drain spawns a fresh task and
transfers ownership via _session_tasks[session_key] = drain_task,
the original task still unwinds through the finally block. The
drain task picks up the same interrupt_event in its own
_process_message_background entry, so an unconditional
_release_session_guard(session_key, guard=interrupt_event) at the
end of the finally matches and deletes _active_sessions[session_key]
while the drain task is still pending its first await.
A concurrent inbound message arriving in that handoff window passes
the Level-1 guard (no entry exists) and spawns a second
_process_message_background for the same session — two agents on
one session_key, duplicate responses, duplicate tool calls.
Fix: only call _release_session_guard when the current task still
owns _session_tasks[session_key]. When ownership has been
transferred to a drain task, leave _active_sessions populated; the
drain task's own lifecycle releases it. This mirrors the
late-arrival drain path in the same finally block, which already
leaves both entries alone after handing off.
Also reorder stdlib imports in the new regression test file to
match the gateway test convention (stdlib before third-party).
Regression test: capture _active_sessions[sk] identity at every
handler entry across a 2-step in-band drain chain and assert the
guard Event identity stays the same. Pre-fix, the original task's
finally deletes the entry, the drain task falls through to the
`or asyncio.Event()` branch, and a fresh Event is installed —
identity diverges. Post-fix, the entry is preserved and the drain
task reuses the original Event.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`_process_message_background` finished a turn, found a queued
follow-up, and drained it via `await
self._process_message_background(pending_event, session_key)`. Each
chained follow-up added a frame to the call stack instead of starting
fresh. Under sustained pending-queue activity (e.g. a user sending
follow-ups faster than the agent finishes turns) the C stack would
exhaust at ~2000 nested frames and SIGSEGV the process.
Mirror the late-arrival drain pattern that already exists in the same
function: spawn a new `asyncio.create_task(...)` for the pending event
and return so the current frame can unwind. The new task takes
ownership via `_session_tasks[session_key]`.
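The hand-off pattern can be demonstrated with a toy handler. This is a simplified illustration: `process`, the `pending` deque, and the `tasks` list are hypothetical stand-ins, and the real function also manages session guards and typing indicators.

```python
import asyncio
from collections import deque

pending = deque(f"msg-{i}" for i in range(12))
handled = []
tasks = []


async def process(event: str) -> None:
    handled.append(event)
    await asyncio.sleep(0)  # stand-in for the agent turn
    if pending:
        # Hand the follow-up to a new task and return, so this frame unwinds
        # instead of awaiting the next turn recursively.
        tasks.append(asyncio.create_task(process(pending.popleft())))


async def main() -> None:
    tasks.append(asyncio.create_task(process("first")))
    # Wait until the whole chain has drained.
    while tasks:
        await tasks.pop(0)


asyncio.run(main())
```

Each handler returns right after scheduling its successor, so the chain length no longer shows up as call-stack depth. Keeping the tasks in a list also holds a reference to them until they finish.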
The late-arrival drain in `finally` could now race with the in-band
drain across the `await typing_task` / `await stop_typing` window, so
add a guard: if `_session_tasks[session_key]` is no longer the current
task, an in-band drain already spawned a follow-up task — re-queue the
late-arrival event so that task picks it up after its current event,
instead of spawning a second concurrent task for the same session_key.
Regression test (`test_pending_drain_no_recursion.py`) chains 12
follow-ups and asserts the recorded
`_process_message_background` stack depth stays bounded at handler
entry. Pre-fix: depths grow linearly `[1,2,3,…,12]`. Post-fix: all
depths are `1`.
`test_duplicate_reply_suppression::test_stale_response_suppressed_when_interrupted`
called `_process_message_background` directly and implicitly relied on
the old recursive `await` semantics. It now waits for the spawned
drain task before checking the sent list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Piper (OHF-Voice/piper1-gpl) is a fast, local neural TTS engine from the
Home Assistant project that supports 44 languages with zero API keys.
Adds it as a native built-in provider alongside edge/neutts/kittentts,
installable via 'hermes tools' with one keystroke.
What ships:
- New 'piper' built-in provider in tools/tts_tool.py
  - Lazy import via _import_piper()
  - Module-level voice cache keyed on (model_path, use_cuda) so switching
    voices doesn't invalidate older cached voices
  - _resolve_piper_voice_path() accepts either an absolute .onnx path or a
    voice name (auto-downloaded on first use via 'python -m
    piper.download_voices --download-dir <cache>')
  - Voice cache at ~/.hermes/cache/piper-voices/ (profile-aware via
    get_hermes_dir)
  - Optional SynthesisConfig knobs: length_scale, noise_scale,
    noise_w_scale, volume, normalize_audio, use_cuda. These are passed
    through only when configured, so older piper-tts versions aren't broken
  - WAV output then ffmpeg conversion path (same as neutts/kittentts) so
    Telegram voice bubbles work when ffmpeg is present
  - Piper added to BUILTIN_TTS_PROVIDERS so a user's
    tts.providers.piper.command cannot shadow the native provider
    (regression test included)
- 'hermes tools' wizard entry
  - Piper appears under Voice and TTS as local free, with
    'pip install piper-tts' auto-install via post_setup handler
  - Prints voice-catalog URL and default-voice info after install
- config.yaml defaults
  - tts.piper.voice defaults to en_US-lessac-medium
  - Commented advanced knobs for discoverability
- Docs
  - New 'Piper (local, 44 languages)' section in features/tts.md
    explaining install path, voice switching, pre-downloaded voices,
    and advanced knobs
  - Piper listed in the ten-provider table and ffmpeg table
  - Custom-command-providers section updated to drop the Piper example
    (now native) and add a piper-custom example for users with their own
    trained .onnx models
  - overview.md bumps provider count to ten
- Tests (tests/tools/test_tts_piper.py, 16 tests)
  - Registration (BUILTIN_TTS_PROVIDERS, PROVIDER_MAX_TEXT_LENGTH)
  - _resolve_piper_voice_path across every branch: direct .onnx path,
    cached voice name, fresh download with correct CLI args, download
    failure, successful-exit-but-missing-files, empty voice to default
  - _generate_piper_tts: loads voice once, reuses cache, voice-name
    download wiring, advanced knobs flow through SynthesisConfig
  - text_to_speech_tool end-to-end dispatch and missing-package error
  - check_tts_requirements: piper availability toggles the return value
  - Regression guard: piper cannot be shadowed by a command provider
    with the same name
  - Pre-existing test_tts_mistral test broadened to mock the new
    piper/kittentts/command-provider checks (otherwise it false-passes
    when piper is installed in the test venv)
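Two of the items above, the voice cache keyed on (model_path, use_cuda) and the "pass knobs through only when configured" rule, can be sketched as below. This is an illustration under assumptions, not the code in tools/tts_tool.py: `get_voice`, `configured_knobs`, and the injected `loader` callable are hypothetical, and no real piper API is called.

```python
_voice_cache = {}  # (model_path, use_cuda) -> loaded voice object

# Knob names taken from the list above; how they map onto SynthesisConfig
# is not shown here.
OPTIONAL_KNOBS = ("length_scale", "noise_scale", "noise_w_scale",
                  "volume", "normalize_audio")


def get_voice(model_path, use_cuda, loader):
    """Load a voice once per (model_path, use_cuda) and reuse it afterwards.

    `loader` is a hypothetical callable standing in for the real voice load.
    """
    key = (model_path, use_cuda)
    if key not in _voice_cache:
        _voice_cache[key] = loader(model_path, use_cuda)
    return _voice_cache[key]


def configured_knobs(piper_cfg):
    """Return only the knobs the user actually set, skipping missing/None."""
    return {name: piper_cfg[name]
            for name in OPTIONAL_KNOBS
            if piper_cfg.get(name) is not None}


# Usage with a fake loader that records how often it is called.
load_calls = []


def fake_loader(path, cuda):
    load_calls.append((path, cuda))
    return object()


voice_a = get_voice("a.onnx", False, fake_loader)
voice_b = get_voice("b.onnx", False, fake_loader)
voice_a_again = get_voice("a.onnx", False, fake_loader)  # served from cache
knobs = configured_knobs({"voice": "x", "volume": 0.5, "length_scale": None})
```

Keying on the tuple means loading a second voice adds a new entry instead of replacing the first, and the filtered dict stays empty when nothing is configured, so no optional keyword reaches an older library version.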
E2E verification (live):
Actual pip install piper-tts, config piper + en_US-lessac-low,
text_to_speech_tool call, voice auto-downloaded from HuggingFace,
WAV synthesized, ffmpeg-converted to Ogg/Opus. Second call hits the
cache (~60ms). Cache dir populated with .onnx and .onnx.json.
This caught a real bug during development: the first pass used '-d' as
the download-dir flag; the actual piper.download_voices CLI wants
'--download-dir'. Fixed before PR opened.
Six tests in this file failed in CI (-n auto) after #17832 landed because
other tests on the same xdist worker reload hermes_cli.main:
  tests/hermes_cli/test_env_loader.py:85-86
      sys.modules.pop('hermes_cli.main', None)
      importlib.import_module('hermes_cli.main')
  tests/hermes_cli/test_skills_subparser.py:24-25
      del sys.modules['hermes_cli.main']
When either ran first on a worker, our top-of-file
'from hermes_cli.main import _kill_stale_dashboard_processes' captured a
stale function object whose __globals__ points at the old module dict.
patch('hermes_cli.main._find_stale_dashboard_pids', ...) then patched the
new module, but the stale function resolved the dependency via its stale
__globals__, so every patch became a no-op: pids=[] → early return → no
signals, no output, assertions failed.
Fix: add an autouse fixture that rebinds the three module-level names to
whatever is currently live in sys.modules['hermes_cli.main'] before each
test runs. The pollutants in the other two files are load-bearing for
their own tests, so fixing it on the consumer side is correct.
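The stale `__globals__` effect can be reproduced in isolation. The sketch below uses a throwaway module named `demo_mod` with hypothetical functions `find_pids` and `kill_stale`; it only mirrors the shape of the problem and is not the hermes_cli code or the actual fixture.

```python
import sys
import types
from unittest.mock import patch

SOURCE = """
def find_pids():
    return []

def kill_stale():
    return find_pids()
"""


def load(name):
    """Create a fresh module object under `name`, like a reload would."""
    module = types.ModuleType(name)
    exec(SOURCE, module.__dict__)
    sys.modules[name] = module
    return module


def rebind():
    """Consumer-side fix: fetch the function from the module that is live now."""
    return sys.modules["demo_mod"].kill_stale


old_module = load("demo_mod")
stale_kill = old_module.kill_stale  # captured like a top-of-file import
load("demo_mod")                    # another test replaces the module

with patch("demo_mod.find_pids", return_value=[123]):
    # The stale function looks up find_pids in the old module dict,
    # which the patch never touched.
    stale_result = stale_kill()
    # The rebound function lives in the patched module, so the patch applies.
    rebound_result = rebind()()
```

Here the stale reference returns the unpatched empty list while the rebound one sees the patched value, which is the behavior an autouse rebinding fixture relies on.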
Repro: pytest tests/hermes_cli/test_env_loader.py tests/hermes_cli/test_update_stale_dashboard.py
Voscko reported curator.auxiliary.provider/model was advertised in the
docs but ignored — the review fork read only model.provider/default. The
narrow fix would wire the one-off key through, but that leaves curator
as a parallel system: not in `hermes model` → auxiliary picker, not in
the dashboard Models tab, missing per-task base_url/api_key/timeout/
extra_body.
Unify curator with the rest of the aux task system so `hermes model`
and the dashboard configure it like every other aux task.
Four sources of truth updated:
- hermes_cli/config.py — add 'curator' slot to DEFAULT_CONFIG.auxiliary
(timeout=600 since reviews run long), drop the one-off curator.auxiliary
block from DEFAULT_CONFIG.curator.
- hermes_cli/main.py — add ('curator', 'Curator', 'skill-usage review pass')
to _AUX_TASKS so the CLI picker offers it.
- hermes_cli/web_server.py — add 'curator' to _AUX_TASK_SLOTS so the
dashboard REST endpoint accepts it.
- web/src/pages/ModelsPage.tsx — add Curator entry so the dashboard
Models tab renders the task.
agent/curator.py _resolve_review_model() now reads auxiliary.curator
first (canonical), falls back to legacy curator.auxiliary (with an info
log asking users to migrate), then falls back to the main chat model.
Pre-unification users keep working.
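The lookup order described above might look roughly like this. It is a sketch, not agent/curator.py: `resolve_review_model` here is a simplified stand-in, the config is a plain dict, and treating a slot as "set" only when both provider and model are non-empty is an assumption made for the example.

```python
import logging

logger = logging.getLogger("curator_sketch")


def resolve_review_model(config):
    """Return (provider, model): canonical slot, then legacy, then main model."""
    canonical = (config.get("auxiliary") or {}).get("curator") or {}
    legacy = (config.get("curator") or {}).get("auxiliary") or {}
    main = config.get("model") or {}

    # Assumption: a slot only counts when both fields are filled in.
    if canonical.get("provider") and canonical.get("model"):
        return canonical["provider"], canonical["model"]

    if legacy.get("provider") and legacy.get("model"):
        logger.info("curator.auxiliary is deprecated; "
                    "please migrate to auxiliary.curator")
        return legacy["provider"], legacy["model"]

    # Nothing task-specific configured: use the main chat model.
    return main.get("provider"), main.get("default")


# Usage: each added layer takes precedence over the ones before it.
cfg = {"model": {"provider": "main-provider", "default": "main-model"}}
from_main = resolve_review_model(cfg)
cfg["curator"] = {"auxiliary": {"provider": "old-p", "model": "old-m"}}
from_legacy = resolve_review_model(cfg)
cfg["auxiliary"] = {"curator": {"provider": "new-p", "model": "new-m"}}
from_canonical = resolve_review_model(cfg)
```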
Docs updated: docs/user-guide/features/curator.md now points at
`hermes model` → auxiliary → Curator and the dashboard Models tab.
Tests: 6 unit tests on _resolve_review_model (auto default, canonical
slot honored, partial override fallback, legacy fallback with
deprecation log assertion, new-wins-over-legacy, empty-config safety)
plus a cross-registry test that curator is wired into all four sources
of truth. test_aux_tasks_keys_all_exist_in_default_config already
covers the DEFAULT_CONFIG ↔ _AUX_TASKS invariant.
Reported by Voscko on Discord.
UserMessageChunk and AgentMessageChunk do not have a message_id field
in the ACP schema. Passing it had no effect: pydantic silently drops
the unknown init kwarg here instead of raising, so the subsequent test
assertions on .message_id raised AttributeError. Strip the dead
plumbing (uuid import, message_id= kwarg on both chunk types, unused
session_id/index parameters) and remove the matching .message_id
asserts from the test.
Adds a deterministic pre-check on top of htsh's exception-based fallback:
before calling /content/abstract or /content/overview on a non-pseudo URI,
probe /api/v1/fs/stat. If the server says the URI is a file, route straight
to /content/read instead of eating a failing 500 round-trip.
This is the same idea pty819 and chennest independently landed in PRs
#12757 and #12937, merged here on top of htsh's broader fix so we keep
pseudo-URI normalization and v0.3.3 browse-shape handling while avoiding
the slow exception path on servers that return a 500 every time.
The exception fallback from #5886 stays in place for environments where
fs/stat is unavailable or returns an unfamiliar shape.
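The routing decision can be sketched as a pure function. Everything here is illustrative: `choose_endpoint` is a hypothetical helper, `stat` is an injected callable standing in for the /api/v1/fs/stat probe, and the `{"type": "file"}` response shape is an assumption, since the real shape is not shown in this message.

```python
def choose_endpoint(uri, wanted, stat, is_pseudo=False):
    """Pick which content endpoint to call for `uri`.

    `wanted` is "abstract" or "overview". `stat` probes the server and
    returns a dict, or raises when the probe is unavailable.
    """
    summary_endpoint = f"/content/{wanted}"
    if is_pseudo:
        # Pseudo URIs keep their existing normalization path.
        return summary_endpoint
    try:
        info = stat(uri)
    except Exception:
        # Probe unavailable: rely on the exception-based fallback instead.
        return summary_endpoint
    # Assumed response shape; anything unfamiliar falls through unchanged.
    if isinstance(info, dict) and info.get("type") == "file":
        return "/content/read"
    return summary_endpoint


def failing_stat(uri):
    raise RuntimeError("fs/stat not supported")


# Usage with fake probes.
file_route = choose_endpoint("mem_1.md", "abstract", lambda u: {"type": "file"})
dir_route = choose_endpoint("notes/", "overview", lambda u: {"type": "dir"})
odd_route = choose_endpoint("x", "abstract", lambda u: ["unexpected"])
error_route = choose_endpoint("x", "abstract", failing_stat)
```

Only a positive "this is a file" answer changes the route, so servers without a usable probe behave as they did before the pre-check.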
Also credits pty819, chennest, and htsh in AUTHOR_MAP so future release
notes attribute them correctly.
OpenViking returns 500 for /content/abstract and /content/overview when
the URI points to mem_*.md files. Add a resilient fallback to
/content/read for non-pseudo summary file URIs while preserving pseudo
summary normalization. Also add regression tests for the fallback
behavior.