Telegram groups emit a single bot_command entity covering the whole
/cmd@botname span with no accompanying mention entity, so the existing
mention gate in _message_mentions_bot dropped slash commands sent via
the bot-menu autocomplete whenever require_mention is enabled.
Recognise bot_command entities whose @botname suffix matches the bot
username (case-insensitive) as a direct mention, and keep rejecting
commands addressed at other bots. Fixes#15415.
Harden the Matrix adapter's sender-drop guards so bot-self events and
appservice/bridge identities never reach the gateway's pairing flow or
the agent loop.
Two filters, applied as early as possible in _on_room_message (and
_on_reaction for the self-filter):
1. _is_self_sender(sender) — case-insensitive + whitespace-trimmed
equality with self._user_id. When self._user_id is still empty
(whoami has not resolved, or login failed), returns True
defensively: an unidentified bot dropping its own events is always
preferable to falling into an echo loop. The previous byte-for-byte
equality check let differently-cased copies of the bot's MXID slip
through, and an unresolved self-ID silently disabled the guard.
2. _is_system_or_bridge_sender(sender) — drops appservice namespace
puppets (conventional @_bridge_...:server form) and malformed
senders with an empty localpart. These identities used to fall
through to the gateway's unauthorized-user path, trigger a pairing
code, and — once an operator approved the bridge — every outbound
message the bridge relayed would loop back as an authorized user
message. This was the root of the 'hall of mirrors' symptom.
Fixes#15763
Test plan
---------
scripts/run_tests.sh tests/gateway/test_matrix.py
scripts/run_tests.sh tests/gateway/test_matrix_mention.py tests/gateway/test_matrix_voice.py
All 182 tests pass. 14 new regression tests cover exact / case-insensitive
/ whitespace / unresolved-self-id matches, bridge prefix detection, empty
sender, and the full _on_room_message drop path.
Four independent session-UX bugs reported by an external user (#16294).
/save wrote hermes_conversation_<ts>.json to CWD — invisible to
'hermes sessions browse' and easy to lose. Snapshots now write under
~/.hermes/sessions/saved/ and the command prints the absolute path plus
a 'hermes --resume <id>' hint for the live DB-indexed session.
'hermes sessions browse' default --limit raised from 50 to 500. With the
old ceiling, users with moderately long histories saw only the most
recent 50 rows and assumed older sessions had been lost.
TUI session.list (`/resume` picker) switched from a hardcoded allow-list
of 13 gateway source names to a deny-list of just { 'tool' }. Sessions
tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and
any newly-added platform now surface. Default limit 20 → 200.
ollama-cloud provider setup passes force_refresh=True to
fetch_ollama_cloud_models() so a user entering their API key sees the
fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead
of waiting up to an hour for the disk cache TTL to expire.
Closes#16294.
When the gateway intercepts a pending /update prompt and the user sends
a recognized slash command (/new, /help, ...), the command now dispatches
normally AND the detached update subprocess is unblocked by writing a
blank .update_response. _gateway_prompt reads '' → strips → returns the
prompt's default (typically a safe 'n' / skip), so the update process
exits cleanly instead of blocking on stdin until the 30-minute watcher
timeout.
Also clears _update_prompt_pending[session_key] on this path so stray
future input for the same session isn't re-intercepted.
Extends PR #15849 with tests for the new cancel-write + a regression
test pinning the legacy behavior of unrecognized /foo slash commands
still being consumed as the response.
Slack Bolt posts are not editable like CLI spinners; medium-tier new still emitted a permanent line per tool start (issue #14663).
- Built-in slack default: off; other tier-2 platforms unchanged.
- Adjust /verbose isolation test for off to new cycle.
- Migration tests: read/write config.yaml as UTF-8 (Windows locale).
Extends the existing channel_skill_bindings mechanism (previously
Discord-only) to Slack, so a channel or DM can auto-load one or more
skills at session start without relying on the model's skill selector
for every short reply.
Motivation: Mats's German flashcards DM pushes a cron-driven card
5x/day; he responds with one-word guesses like 'work'. Previously each
reply required the main agent to decide whether to load german-flashcards
(full opus turn just to pick a skill). With the binding configured per
Slack channel, the skill is injected at session start and grading runs
directly.
Changes:
- Extract resolve_channel_skills() from DiscordAdapter._resolve_channel_skills
into gateway.platforms.base (now shared across adapters).
- DiscordAdapter._resolve_channel_skills delegates to the shared helper
(behavior preserved — existing test suite still passes unchanged).
- SlackAdapter: resolve channel_skill_bindings on each message and attach
auto_skill to MessageEvent. gateway/run.py already handles auto-skill
injection on new sessions; this just wires Slack through it.
- gateway/config.py: accept channel_skill_bindings in slack: block of
config.yaml (was Discord-only).
- Tests: new tests/gateway/test_slack_channel_skills.py with 11 cases
covering DM/thread/parent resolution, single-vs-list skills, dedup,
malformed entries. Discord suite unchanged.
- Docs: add 'Per-Channel Skill Bindings' section to Slack user guide.
Config example:
slack:
channel_skill_bindings:
- id: "D0ATH9TQ0G6"
skills: ["german-flashcards"]
Enter while the agent is busy can now inject the typed text via /steer —
arriving at the agent after the next tool call — instead of interrupting
(current default) or queueing for the next turn.
Changes:
- cli.py: keybinding honors busy_input_mode='steer' by calling
agent.steer(text) on the UI thread (thread-safe), with automatic
fallback to 'queue' when the agent is missing, steer() is unavailable,
images are attached, or steer() rejects the payload. /busy accepts
'steer' as a fourth argument alongside queue/interrupt/status.
- gateway/run.py: busy-message handler and the PRIORITY running-agent
path both route through running_agent.steer() when the mode is 'steer',
with the same fallback-to-queue safety net. Ack wording tells users
their message was steered into the current run. Restart-drain queueing
now also activates for 'steer' so messages aren't lost across restarts.
- agent/onboarding.py: first-touch hint has a steer branch for both
CLI and gateway.
- hermes_cli/commands.py: /busy args_hint updated to include steer,
and 'steer' is registered as a subcommand (completions).
- hermes_cli/web_server.py: dashboard select widget offers steer.
- hermes_cli/config.py, cli-config.yaml.example, hermes_cli/tips.py:
inline docs updated.
- website/docs/user-guide/cli.md + messaging/index.md: documented.
- Tests: steer set/status path for /busy; onboarding hints;
_load_busy_input_mode accepts steer; busy-session ack exercises
steer success + two fallback-to-queue branches.
Requested on X by @CodingAcct.
Default is unchanged (interrupt).
Multiple overlapping Slack attachment improvements:
1. Upload retry with backoff on transient errors (429, 5xx, connection
reset, rate_limited, service unavailable). New _is_retryable_upload_error
helper covers three upload paths: _upload_file, send_video,
send_document. Up to 3 attempts with 1.5s * attempt backoff.
2. Thread participation tracking: successful file uploads now add the
thread_ts to _bot_message_ts, mirroring how text replies are tracked.
This lets follow-up thread messages auto-trigger the bot (same
engagement rules as replied threads).
3. Thread metadata preservation in the image redirect-guard fallback
(send_image → send text fallback) and in two gateway.run.py send
paths (image + document fallback calls).
4. HTML response rejection in _download_slack_file_bytes. Parallels
the existing check in _download_slack_file. Guards against Slack
returning a sign-in / redirect page as document bytes when scopes
are missing, so the agent doesn't get HTML-as-a-PDF.
5. File lifecycle event acks (file_shared / file_created / file_change).
These events arrive around snippet uploads. Acking them silences the
slack_bolt 'Unhandled request' 404 warnings without changing behavior.
6. Post-loop message type classification so a mixed image+document upload
classifies as PHOTO (or VOICE if no image), falling back to DOCUMENT.
Previously, the per-file classification in the inbound loop could be
overwritten unpredictably.
7. Expanded text-inject whitelist in inbound document handling to cover
.csv, .json, .xml, .yaml, .yml, .toml, .ini, .cfg (up to 100KB) so
snippets and config files are directly visible to the agent, not just
cached as opaque uploads. Paired with new MIME entries in
SUPPORTED_DOCUMENT_TYPES in base.py.
Squashed from two commits in #11819 so the single commit carries the
contributor's GitHub attribution (the original commits were authored
under a local dev hostname).
Ports openclaw/openclaw#72038 to hermes-agent.
Telegram's `editMessageText` preserves the original message timestamp,
so a long-running streamed reply (reasoning models that take 60+ seconds
to finish) would keep the first-token timestamp even after completion.
Users can't tell how long a task actually took.
When a preview message has been visible for >= 60s (configurable via
`streaming.fresh_final_after_seconds`), finalize by sending a fresh
message instead of editing in place, then best-effort delete the stale
preview. Short previews still edit in place (the existing fast path).
Implementation notes adapted from OpenClaw's TypeScript original:
- `StreamConsumerConfig` gains `fresh_final_after_seconds` (default 0 =
legacy edit-in-place). Gateway-level `StreamingConfig` defaults to 60.
- `GatewayStreamConsumer` tracks `_message_created_ts` at first-send and
checks it in `_send_or_edit` on `finalize=True`. New helpers
`_should_send_fresh_final` + `_try_fresh_final`.
- `BasePlatformAdapter` gains optional `delete_message(chat_id, message_id)`
returning False by default. `TelegramAdapter` implements it via
`_bot.delete_message`.
- `gateway/run.py` only enables fresh-final for `Platform.TELEGRAM`;
other platforms ignore the setting (they don't have the stale-edit
timestamp problem or edit-then-read works cheaply).
- Fallback to normal edit on any fresh-send failure — no user-visible
regression if Telegram rate-limits a send or the message is gone.
Tests: 15 new cases in tests/gateway/test_stream_consumer_fresh_final.py
covering short/long previews, config plumbing, delete-support absent,
send-failure fallback, __no_edit__ sentinel safety, and StreamingConfig
round-trip.
Co-authored-by: Hermes Agent <agent@nousresearch.com>
Slack's modern composer sends messages with a 'blocks' array that
contains rich_text elements. When a user forwards or quotes another
message, the quoted content shows up in the rich_text_quote children
of that array — and is NOT included in the plain 'text' field. The
agent saw only the lossy plain text and was blind to forwarded /
quoted content. Same story for link unfurl previews (Notion, docs,
GitHub, etc.) which Slack puts in the 'attachments' array.
Two fixes in the inbound handler:
1. _extract_text_from_slack_blocks walks rich_text / rich_text_quote /
rich_text_list / rich_text_preformatted trees and renders readable
text ('> quoted', '• bullet', code fences), dedupes against the
plain text field, and appends the extracted content so the agent
sees everything.
2. Link unfurl / attachment preview extraction reads title, url,
body, and footer from the 'attachments' array and appends a
'📎 [title](url)\n body\n _footer_' section per preview.
Skips is_msg_unfurl to avoid echoing our own Slack replies back.
Routing is careful not to trust augmented text: mention gating
(is_mentioned) and slash-command detection both run against the
original 'text' field, so forwarded content containing '<@bot>' or
'/deploy' in a quote can't trick the bot into responding in a
channel it shouldn't or classifying a normal message as a command.
Adjustment from original PR: dropped _serialize_slack_blocks_for_agent,
which inlined a redacted JSON dump of non-rich_text blocks (section,
accessory, actions, etc.) — the agent would see the raw Block Kit
structure for UI-heavy alerts. It added up to 6000 characters to the
prompt context on every qualifying message with no opt-out. The
rich_text extraction and attachment unfurls cover the common bug-fix
case (quoted/forwarded content + link previews) without the prefill
tax. If a user needs block inspection later, it can return as a
config opt-in.
Also updates the Slack platform notes in session.py to accurately
describe what the gateway inlines.
Translate Slack attachment failures into actionable user-facing notices
instead of generic download errors. When a scope/auth/permission issue
breaks attachment processing, the user sees:
[Slack attachment notice]
- Slack attachment access failed for photo.jpg. Missing scope:
files:read. Update the Slack app scopes/settings and reinstall
the app to the workspace.
Two helpers do the translation:
_describe_slack_api_error — handles SlackApiError responses
(missing_scope, invalid_auth, file_not_found, access_denied, etc.)
_describe_slack_download_failure — handles httpx.HTTPStatusError
(401/403/404) and Slack-returns-HTML-sign-in fallbacks
Wired into three existing call sites:
- the Slack Connect files.info path (PR #11111) so scope errors
surface instead of being logged as generic "files.info failed"
- the image, audio, and document download paths so 401/403 and
HTML-body responses translate into actionable notices
Adjustment from original PR: dropped _probe_slack_file_access_issue,
the proactive pre-download files.info probe. It added one extra
Slack API call per attachment even on healthy ones, and overlapped
with the existing files.info call from PR #11111. The post-failure
translation path covers the same user-facing diagnostic value
without the per-message tax.
Also documents files:read scope more prominently in the Slack setup
guide and troubleshooting table.
Contributed back from https://github.com/xinbenlv/zn-hermes-agent.
Closes#7015.
Co-authored-by: xinbenlv <zzn+pa@zzn.im>
The Slack thread-context fetcher used to drop every message with a
bot_id, which silently erased the thread parent whenever a cron job (or
any other bot) had posted it. As a result, replies to a cron-posted
summary lost all context and the agent answered as if from a blank
thread.
Changes:
1. gateway/platforms/slack.py::_fetch_thread_context
- Keep the thread parent even when it was posted by a bot
(e.g. cron summaries, third-party integrations).
- Only skip *our own* prior bot replies to avoid circular context,
matching the per-workspace bot user id via _team_bot_user_ids so
multi-workspace deployments stay correct.
- Keep non-self bot children (useful third-party context).
2. gateway/platforms/slack.py::_handle_slack_message
- Populate MessageEvent.reply_to_text for thread replies (parity
with Telegram/Discord/Feishu/WeCom). gateway.run uses this field
to inject a [Replying to: "..."] prefix when the parent is not
already in the session history, which is exactly the scenario
triggered by cron-generated thread parents.
- New helper _fetch_thread_parent_text reuses the existing thread-
context cache (and its 60s TTL) to avoid duplicate
conversations.replies calls; falls back to a cheap limit=1 fetch
when the cache is cold.
Tests:
- Updated TestSlackThreadContext::test_skips_bot_messages to reflect
the new behaviour (self-bot child dropped, third-party bot kept).
- Added:
* test_fetch_thread_context_includes_bot_parent
* test_fetch_thread_context_excludes_self_bot_replies
* test_fetch_thread_context_multi_workspace
* test_fetch_thread_context_current_ts_excluded (regression guard)
* test_fetch_thread_parent_text_from_cache
* test_slack_reply_to_text_set_on_thread_reply
* test_slack_reply_to_text_none_for_top_level_message
Full Slack suite: 176 passed (was 169).
send_message(target='slack:<channel_id>') failed with "Could not
resolve" because _parse_target_ref had no Slack branch — Slack's
uppercase alphanumeric IDs fell through to channel-name resolution,
which only matched by name. As a fallback, the agent would retry with
bare target='slack' and post to the home channel instead.
Three fixes:
- _parse_target_ref recognizes Slack IDs (C/G/D/U/W prefix) as
explicit targets so the name-resolver is bypassed entirely.
- resolve_channel_name tries a case-sensitive raw-ID match before
the existing name match, so any platform's IDs resolve cleanly.
- _build_slack now actually calls users.conversations against each
workspace's AsyncWebClient (paginated), instead of only returning
session-history entries. This populates the directory with public
and private channels the bot has joined, so action='list' shows
them and they can also be addressed by name. Errors from one
workspace don't block others.
build_channel_directory becomes async (Slack web calls require it).
The two async-context callers in gateway/run.py are awaited; the
cron ticker thread call bridges via asyncio.run_coroutine_threadsafe.
Slack bot needs channels:read and groups:read scopes for full
enumeration; missing scopes degrade gracefully per-workspace.
addressing #15927
Extends the strict_mention feature so an @mention in strict mode no
longer persistently tags the thread as 'mentioned'. Without this, the
thread's first mention would permanently auto-trigger the bot on every
subsequent message — which is exactly what strict_mention is designed
to prevent. Closes the agent-to-agent ack loop hole hhhonzik identified
in #14117.
Co-authored-by: hhhonzik <me@janstepanovsky.cz>
Adds a strict_mention config option that, when enabled, requires an
explicit @-mention on every message in channel threads. Disables the
'once mentioned, forever in the thread' and session-presence auto-triggers.
- New _slack_strict_mention() helper (config.extra + SLACK_STRICT_MENTION env)
- Bridged top-level slack.strict_mention yaml to SLACK_STRICT_MENTION env,
matching require_mention/allow_bots bridging
- Unit tests for the helper + config bridge
Top-level channel messages arrive at _resolve_thread_ts with
metadata.thread_id set to the message's own ts, because the inbound
handler in _handle_message_event uses 'event.ts' as a session-keying
fallback when event.thread_ts is absent. That made metadata alone
insufficient to distinguish a real thread reply from a top-level
message, so reply_in_thread=false only took effect in DMs.
Use reply_to (== incoming message_id == ts for top-level messages) as
the tiebreaker: when metadata.thread_id == reply_to the 'thread' is the
synthetic session-keying fallback, not a real parent, so we reply
directly in the channel. Real thread replies (reply_to != thread_id)
still resolve to the parent thread and preserve conversation context.
Closes#9268.
Parameterize the test helpers in test_status_command.py to accept a
Platform and add two regression tests ensuring the first-run home-channel
onboarding uses '/hermes sethome' on Slack and '/sethome' everywhere else.
Co-authored-by: sgaofen <135070653+sgaofen@users.noreply.github.com>
Repeated /queue commands now each produce a full agent turn, in order,
with no merging. Previously the second /queue overwrote the first
because the handler wrote directly into the adapter's single-slot
_pending_messages dict.
- GatewayRunner grows a _queued_events overflow buffer (dict of list).
- /queue puts new items in the adapter's next-up slot when free,
otherwise appends to the overflow. After each run's drain consumes
the slot, the next overflow item is promoted so the recursive run
picks it up.
- /new and /reset clear the overflow.
- /status now reports queue depth when non-zero.
- Ack message shows the depth once it exceeds 1.
Helpers (_enqueue_fifo, _promote_queued_event, _queue_depth) use the
getattr default-fallback pattern so existing tests that build bare
GatewayRunner instances via object.__new__ keep working.
Every command in COMMAND_REGISTRY (/btw, /stop, /model, /help, /new,
/bg, /reset, ...) is now a first-class Slack slash command instead of
a /hermes <subcommand>. Users get the same autocomplete-driven slash
picker experience Slack users expect and that Discord and Telegram
already provide.
Previously Slack registered ONE native slash (/hermes) and split on
the first word, so typing /btw in Slack's composer got 'couldn't find
an app for /btw' because the workspace manifest never declared it.
Changes
- hermes_cli/commands.py: slack_native_slashes() + slack_app_manifest()
generate a Slack manifest from the registry (canonical names +
aliases + plugin commands), clamped to Slack's 50-slash cap with
/hermes reserved as the catch-all.
- gateway/platforms/slack.py: single regex matcher dispatches every
registered slash to _handle_slash_command, which dispatches on
command['command']. Legacy /hermes <subcommand> keeps working for
backward compat with older workspace manifests.
- hermes_cli/slack_cli.py + hermes_cli/main.py: new 'hermes slack
manifest' command prints/writes a full manifest (display info,
OAuth scopes, event subs, socket mode, slash commands) ready to
paste into 'Create from manifest' or Features → App Manifest.
- hermes_cli/setup.py: _setup_slack() now writes the manifest up-front
and points users at the 'From an app manifest' flow; also offers
to refresh the manifest on reconfigure for picking up new commands.
- Tests: 14 new tests covering native-slash dispatch (/btw, /stop,
/model), legacy /hermes <sub> compat, manifest structure, and
telegram<->slack parity (every Telegram command must also register
as a Slack slash). Existing /hermes-registration test updated to
assert the new regex matches /hermes, /btw, /stop, /model, /help.
- Docs: slack.md gains a 'Slash Commands' section + Option A manifest
flow in Step 1; cli-commands.md documents 'hermes slack manifest'.
Users pick up the new slashes by running 'hermes slack manifest --write'
and pasting into Features → App Manifest → Edit in their Slack app
config, then Save (Slack prompts for reinstall if scopes changed).
Address Copilot review findings:
1. Gate _last_activity_desc on interrupt_depth == 0 alongside _last_activity_ts.
Both fields are semantically paired — desc describes the activity *at* ts.
Updating desc without ts made get_activity_summary() report "starting new
turn (cached)" for 20+ minutes while the timestamp showed the true stale
duration, producing misleading diagnostic output.
2. Monkeypatch gateway.run.time.time to a fixed epoch in tests that assert
on _last_activity_ts values. Real time.time() comparisons were latently
flaky under slow CI or NTP adjustments. _FAKE_NOW = 10_000.0 is used
as the reference; assertions are now exact equality rather than >=.
3. Add test_fresh_turn_resets_desc and test_interrupt_turn_preserves_desc to
directly cover the gated desc behaviour introduced by (1).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_last_activity_ts was unconditionally reset to time.time() on every
_agent_cache hit. For interrupt-recursive _run_agent calls
(_interrupt_depth > 0) this silently reset the inactivity watchdog's
idle clock on each re-entry, preventing the 30-min timeout from ever
firing when a turn got stuck in an interrupt loop. A stuck session
would emit "Still working... iteration 0/60, starting new turn (cached)"
heartbeats indefinitely instead of timing out.
Gate the reset on _interrupt_depth == 0 only. Fresh external turns
still receive the reset so a session idle for 29 min doesn't trip the
watchdog before the new turn makes its first API call (#9051).
The per-turn reset logic is extracted into a static helper
_init_cached_agent_for_turn() to make it directly testable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up to PR #16053 (/btw as /background alias). Cleans up the
plumbing added exclusively for the old ephemeral /btw handler and
repairs a broken btw bypass that landed between my refactor and this
follow-up.
run_agent.py:
- Remove persist_session kwarg, instance attr, and _persist_session
short-circuit. Only /btw ever passed persist_session=False; with
/btw gone the default (always persist) is the only behavior anyone
ever wanted.
gateway/run.py:
- Remove the unreachable 'if _cmd_def_inner.name == "btw"' block
(PR #16059). Canonical name for a /btw message is 'background' after
alias resolution — the comparison could never be true, and it called
_handle_btw_command which no longer exists. The /background branch
above it already dispatches /btw correctly.
tests/gateway/test_running_agent_session_toggles.py:
- Fix test_btw_dispatches_mid_run to mock _handle_background_command
(the real dispatch target for /btw) instead of the deleted
_handle_btw_command.
/btw spawns a parallel ephemeral side-question task (self-guarded against
concurrent /btw on the same chat) — exactly like /background. But it was
missing from the running-agent bypass list in _handle_message(), so it
fell through to the catch-all and returned:
⏳ Agent is running — /btw can't run mid-turn. Wait for the current
response or /stop first.
That's the opposite of what /btw is for — asking a side question while
the main turn is still working. Add the bypass next to /background and a
regression test covering the mid-turn dispatch path.
Reported by @IuriiTiunov on Telegram.
Instead of a blocking first-run questionnaire, show a one-time hint the first
time the user hits each behavior fork:
1. First message while the agent is working — appends a hint to the busy-ack
explaining the /busy queue vs /busy interrupt knob, phrased to match the
mode that was just applied (don't tell a queue-mode user to switch to
queue).
2. First tool that runs for >= 30s in the noisiest progress mode
(tool_progress: all) — prints a hint about /verbose to cycle display
modes (all -> new -> off -> verbose). Gated on /verbose actually being
usable on the surface: always shown on CLI; on gateway only shown when
display.tool_progress_command is enabled.
Each hint is latched in config.yaml under onboarding.seen.<flag>, so it
fires exactly once per install across CLI, gateway, and cron, then never
again. Users can wipe the section to re-see hints.
New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
load_cli_config defaults (cli.py). No _config_version bump — deep
merge handles new keys.
Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
after building the ack. progress_callback tracks tool.completed
duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
Enter; _on_tool_progress appends the tool-progress hint on the first
>=30s tool completion. In-memory CLI_CONFIG is also updated so
subsequent fires in the same process are suppressed immediately.
All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
The base adapter's auto-TTS path fired on any voice message unless the
chat had explicitly run /voice off — it never read voice.auto_tts from
config.yaml, so users who set auto_tts: false still got audio replies.
Gate the base adapter on a three-layer decision instead:
1. chat in _auto_tts_enabled_chats (explicit /voice on|tts) → fire
2. chat in _auto_tts_disabled_chats (explicit /voice off) → suppress
3. else → voice.auto_tts global default
Runner now pushes voice.auto_tts onto the adapter as _auto_tts_default
and mirrors /voice on|tts chats into _auto_tts_enabled_chats via the
existing _sync_voice_mode_state_to_adapter path. /voice off still wins.
Closes#16007.
When the LLM response carries N parallel tool calls, the agent fires
N tool.started events back-to-back before its interrupt check runs.
A user sending /stop mid-batch would see the '⚡ Interrupting current
task' ack followed by a trail of 🔍 web_search bubbles for the remaining
events in the batch — making the interrupt feel ignored.
progress_callback and the drain loop in send_progress_messages now
check agent.is_interrupted (via agent_holder[0], the existing
cross-scope handle). Events that arrive after interrupt are dropped
at both the queueing and rendering stages. The '⚡ Interrupting'
message is sent through a separate adapter path and is unaffected.
Fixes#15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback.
## What changed
New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching.
`agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe).
Wired through five call sites that previously either duplicated the lookup or ignored it entirely:
- `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning)
- `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch
- `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg
- `gateway/run.py` /model confirmation (picker callback + text path)
- `gateway/run.py` `_format_session_info` (/info)
## Context probe tiers
`CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency.
## Repro (from #15779)
```yaml
custom_providers:
- name: my-custom-endpoint
base_url: https://example.invalid/v1
model: gpt-5.5
models:
gpt-5.5:
context_length: 1050000
```
`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000".
## Tests
- `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants
- `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver
- `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K)
- `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier
Add ability to interrupt a running agent via the runs API. Previously
/v1/runs could start a run and subscribe to events, but there was no
way to cancel it. The new endpoint stores agent and task references
during execution, calls agent.interrupt() to stop LLM calls, then
cancels the asyncio task.
Includes 15 tests covering start, events, and stop scenarios.
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.
Problems with flush_memories:
- Pre-dates the background review loop. It was the only memory-save
path when introduced; the background review now fires every 10 user
turns on CLI and gateway alike, which is far more frequent than
compression or session reset ever triggered flush.
- Blocking and synchronous. Pre-compression flush ran on the live agent
before compression, blocking the user-visible response.
- Cache-breaking. Flush built a temporary conversation prefix
(system prompt + memory-only tool list) that diverged from the live
conversation's cached prefix, invalidating prompt caching. The
gateway variant spawned a fresh AIAgent with its own clean prompt
for each finalized session — still cache-breaking, just in a
different process.
- Redundant. Background review runs in the live conversation's
session context, gets the same content, writes to the same memory
store, and doesn't break the cache. Everything flush_memories
claimed to preserve is already covered.
What this removes:
- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
(and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
_check_compression_model_feasibility (headroom was only needed
because flush dragged the full main-agent system prompt along;
the compression summariser sends a single user-role prompt so
new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
flush-specific paths
What this renames (with read-time backcompat on sessions.json):
- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
The session-expiry watcher still uses the flag to avoid re-running
finalize/eviction on the same expired session; the new name
reflects what it now actually gates. from_dict() reads
'expiry_finalized' first, falls back to the legacy 'memory_flushed'
key so existing sessions.json files upgrade seamlessly.
Supersedes #15631 and #15638.
Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites. No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
Two adjustments to make CI pass:
- In gateway/platforms/matrix.py: `DeviceID` is `NewType("DeviceID", str)`,
so passing `client.device_id` directly (already a str) works identically
at runtime. The explicit import was cosmetic and tripped CI environments
where `mautrix.types` doesn't re-export DeviceID at the expected path
("cannot import name 'DeviceID' from 'mautrix.types' (unknown location)").
- In tests/gateway/test_matrix.py: add `put_device_id` to the hand-written
`PgCryptoStore` fake so the three encryption-path tests
(test_connect_with_access_token_and_encryption,
test_connect_uses_configured_device_id_over_whoami,
test_connect_registers_encrypted_event_handler_when_encryption_on) can
exercise the new crypto-store binding without AttributeError.
Extends PR #15171 to also cover the server-side cancellation path (aiohttp
shutdown, request-level timeout) — previously only ConnectionResetError
triggered the incomplete-snapshot write, so cancellations left the store
stuck at the in_progress snapshot written on response.created.
Factors the incomplete-snapshot build into a _persist_incomplete_if_needed()
helper called from both the ConnectionResetError and CancelledError
branches; the CancelledError handler re-raises so cooperative cancellation
semantics are preserved.
Adds two regression tests that drive _write_sse_responses directly (the
TestClient disconnect path races the server handler, which makes the
end-to-end assertion flaky).
When display.busy_input_mode is 'queue', the runner-level PRIORITY block
in _handle_message was still calling running_agent.interrupt() for every
text follow-up to an active session. The adapter-level busy handler
already honors queue mode (commit 9d147f7fd), but this runner-level path
was an unconditional interrupt regardless of config.
Adds a queue-mode branch that queues the follow-up via
_queue_or_replace_pending_event() and returns without interrupting.
Salvages the useful part of #12070 (@knockyai). The config fan-out to
per-platform extra was redundant — runner already loads busy_input_mode
directly via _load_busy_input_mode().
Follow-up to the canonical-identity session-key fix: pull the
JID/LID normalize/expand/canonical helpers into gateway/whatsapp_identity.py
instead of living in two places. gateway/session.py (session-key build) and
gateway/run.py (authorisation allowlist) now both import from the shared
module, so the two resolution paths can't drift apart.
Also switches the auth path from module-level _hermes_home (cached at
import time) to dynamic get_hermes_home() lookup, which matches the
session-key path and correctly reflects HERMES_HOME env overrides. The
lone test that monkeypatched gateway.run._hermes_home for the WhatsApp
auth path is updated to set HERMES_HOME env var instead; all other
tests that monkeypatch _hermes_home for unrelated paths (update,
restart drain, shutdown marker, etc.) still work — the module-level
_hermes_home is untouched.
Hermes' WhatsApp bridge routinely surfaces the same person under either
a phone-format JID (60123456789@s.whatsapp.net) or a LID (…@lid),
and may flip between the two for a single human within the same
conversation. Before this change, build_session_key used the raw
identifier verbatim, so the bridge reshuffling an alias form produced
two distinct session keys for the same person — in two places:
1. DM chat_id — a user's DM sessions split in half, transcripts and
per-sender state diverge.
2. Group participant_id (with group_sessions_per_user enabled) — a
member's per-user session inside a group splits in half for the
same reason.
Add a canonicalizer that walks the bridge's lid-mapping-*.json files
and picks the shortest/numeric-preferred alias as the stable identity.
build_session_key now routes both the DM chat_id and the group
participant_id through this helper when the platform is WhatsApp.
All other platforms and chat types are untouched.
Expose canonical_whatsapp_identifier and normalize_whatsapp_identifier
as public helpers. Plugins that need per-sender behaviour (role-based
routing, per-contact authorization, policy gating) need the same
identity resolution Hermes uses internally; without a public helper,
each plugin would have to re-implement the walker against the bridge's
internal on-disk format. Keeping this alongside build_session_key
makes it authoritative and one refactor away if the bridge ever
changes shape.
_expand_whatsapp_aliases stays private — it's an implementation detail
of how the mapping files are walked, not a contract callers should
depend on.
Regression test for #14981. Verifies that _session_expiry_watcher fires
on_session_finalize for each session swept out of the store, matching
the contract documented for /new, /reset, CLI shutdown, and gateway stop.
Verified the test fails cleanly on pre-fix code (hook call list missing
sess-expired) and passes with the fix applied.
When the primary provider raises AuthError (expired OAuth token,
revoked API key), the error was re-raised before AIAgent was created,
so fallback_model was never consulted. Now both gateway/run.py and
cron/scheduler.py catch AuthError specifically and attempt to resolve
credentials from the fallback_providers/fallback_model config chain
before propagating the error.
Closes#7230
Make the main-branch test suite pass again. Most failures were tests
still asserting old shapes after recent refactors; two were real source
bugs.
Source fixes:
- tools/mcp_tool.py: _kill_orphaned_mcp_children() slept 2s on every
shutdown even when no tracked PIDs existed, making test_shutdown_is_parallel
measure ~3s for 3 parallel 1s shutdowns. Early-return when pids is empty.
- hermes_cli/tips.py: tip 105 was 157 chars; corpus max is 150.
Test fixes (mostly stale mock targets / missing fixture fields):
- test_zombie_process_cleanup, test_agent_cache: patch run_agent.cleanup_vm
(the local name bound at import), not tools.terminal_tool.cleanup_vm.
- test_browser_camofox: patch tools.browser_camofox.load_config, not
hermes_cli.config.load_config (the source module, not the resolved one).
- test_flush_memories_codex._chat_response_with_memory_call: add
finish_reason, tool_call.id, tool_call.type so the chat_completions
transport normalizer doesn't AttributeError.
- test_concurrent_interrupt: polling_tool signature now accepts
messages= kwarg that _invoke_tool() passes through.
- test_minimax_provider: add _fallback_chain=[] to the __new__'d agent
so switch_model() doesn't AttributeError.
- test_skills_config: SKILLS_DIR MagicMock + .rglob stopped working
after the scanner switched to agent.skill_utils.iter_skill_index_files
(os.walk-based). Point SKILLS_DIR at a real tmp_path and patch
agent.skill_utils.get_external_skills_dirs.
- test_browser_cdp_tool: browser_cdp toolset was intentionally split into
'browser-cdp' (commit 96b0f3700) so its stricter check_fn doesn't gate
the whole browser toolset; test now expects 'browser-cdp'.
- test_registry: add tools.browser_dialog_tool to the expected
builtin-discovery set (PR #14540 added it).
- test_file_tools TestPatchHints: patch_tool surfaces hints as a '_hint'
key on the JSON payload, not inline '[Hint: ...' text.
- test_write_deny test_hermes_env: resolve .env via get_hermes_home() so
the path matches the profile-aware denylist under hermetic HERMES_HOME.
- test_checkpoint_manager test_falls_back_to_parent: guard the walk-up
so a stray /tmp/pyproject.toml on the host doesn't pick up /tmp as the
project root.
- test_quick_commands: set cli.session_id in the __new__'d CLI so the
alias-args path doesn't trip AttributeError when fuzzy-matching leaks
a skill command across xdist test distribution.