The anyOf collapse in _repair_schema returned early, skipping the
nullable-strip and enum-cleanup steps. When a schema had anyOf
[{enum: [..., null, '']}, {type: null}] alongside a parent-level
'nullable: true', collapsing to the single non-null branch produced a
merged node that still had both 'nullable' and the bad enum values —
Moonshot would still 400 on it.
Fix: fall through to Rules 1/3 when the collapse produces a single
merged node; only return early for the multi-branch case (pure
anyOf preservation) or when there was no null branch to remove.
Adds a test that locks in the combined-case expectation.
When a schema node inside anyOf has enum values but no explicit 'type',
Rule 3 (enum cleanup) ran before _fill_missing_type, so node_type was
None and the enum was never cleaned. Moonshot then rejected the schema
with 'enum value (<nil>) does not match any type in [string]'.
Fix: reorder operations — fill missing type first, strip nullable,
then clean enum. This ensures enum cleanup always has a type to check.
Also fixes test expectation: empty string in enum is now correctly
stripped (Moonshot rejects it too).
Closes#16875
Add a standing-goal slash command that keeps Hermes working toward a
user-stated objective across turns until it is achieved, paused, or
the turn budget runs out. Our take on the Ralph loop — cf. Codex CLI
0.128.0's /goal.
After each turn, a lightweight auxiliary-model judge call asks 'is
this goal satisfied by the assistant's last response?'. If not, and
we're under the turn budget (default 20), Hermes feeds a continuation
prompt back into the same session as a normal user message. Any real
user message preempts the continuation loop automatically.
Judge failures fail OPEN (continue) so a flaky judge never wedges
progress — the turn budget is the real backstop.
### Commands
- `/goal <text>` — set a standing goal (kicks off the first turn)
- `/goal` or `/goal status` — show current state
- `/goal pause` — pause the continuation loop
- `/goal resume` — resume (resets turn counter)
- `/goal clear` — drop the goal
Works on both CLI and gateway platforms via the central CommandDef
registry.
### Design invariants preserved
- **Prompt cache**: continuation prompts are regular user-role
messages appended to history. No system-prompt mutation, no toolset
swap.
- **Role alternation**: continuation is a user turn, never injected
mid-tool-loop.
- **Session persistence**: goal state lives in SessionDB.state_meta
keyed by `goal:<session_id>`, so `/resume` picks it up.
- **Mid-run safety**: on the gateway, `/goal status|pause|clear` are
allowed mid-run (control-plane only); setting a new goal requires
`/stop` first so we don't race a second continuation prompt against
the current turn.
### Files
- `hermes_cli/goals.py` (new, 380 lines) — GoalManager + judge + state
- `hermes_cli/commands.py` — CommandDef entry
- `hermes_cli/config.py` — `goals.max_turns` default
- `hermes_cli/web_server.py` — dashboard category merge
- `cli.py` — /goal handler + post-turn continuation hook in
process_loop
- `gateway/run.py` — /goal handler + post-turn continuation hook
wrapping _handle_message_with_agent
- `tests/hermes_cli/test_goals.py` (new, 26 tests) — judge parsing,
fail-open semantics, lifecycle, persistence, budget exhaustion
- `website/docs/reference/slash-commands.md` — docs entry
hermes update had two interactive [Y/n] prompts with no bypass:
1. Config migration (after new env/config options are added)
2. Autostash restore (when uncommitted work was stashed before pull)
hermes uninstall already has --yes/-y; mirrors that.
Under --yes:
- Config-migrate prompt → auto-yes, migrate_config(interactive=False)
so new config fields are applied but API-key prompts are skipped
(user runs 'hermes config migrate' later for those). Matches
gateway-mode semantics.
- Stash-restore prompt → auto-yes, git stash apply runs automatically.
Closes the 'can I hermes update -y, No ! Fix' gap reported by @murelux.
Adds opt-in auto-deletion for slash-command reply messages like
"New session started!", "Restarting gateway…", "Stopped.", and
YOLO toggles. After the TTL elapses the gateway calls the adapter's
delete_message; on platforms without a delete API (everything except
Telegram today) the TTL is silently ignored and the message stays.
Requested on Twitter by @charlesmcdowell — tool-call bubbles are useful
real-time, but system notices clutter the thread once the agent finishes.
Implementation:
- EphemeralReply(str) sentinel in gateway/platforms/base.py. Subclasses
str so existing 'X' in response / response.startswith(...) checks in
tests and call sites keep working unchanged; isinstance() still
distinguishes it for the send path.
- _process_message_background and both busy-session bypass paths
(in base.py) call _unwrap_ephemeral() on the handler return, send
the unwrapped text, and schedule a detached delete task when the
TTL > 0 AND the adapter class overrides delete_message.
- display.ephemeral_system_ttl (default 0 = disabled) in DEFAULT_CONFIG.
Handler can pass ttl_seconds explicitly to override.
- Wrapped the highest-noise return sites: /new, /reset, /stop,
/yolo on/off, /restart success + "already in progress". Draining
notices and /help output left as plain strings — those are
informational and users want to read them.
Backward-compat: default TTL 0 → no scheduling, no behavior change
for existing users. Platforms without delete_message silently no-op.
When the curator consolidates skill X into umbrella Y, any cron job
that listed X in its skills field would fail to load X at run time —
the scheduler logs a warning and skips it, so the scheduled job runs
without the instructions it was scheduled to follow.
cron.jobs.rewrite_skill_refs(consolidated, pruned) now updates jobs
in-place: consolidated names route to the umbrella target (dedup
when umbrella is already present), pruned names are dropped.
agent.curator._write_run_report calls it after classification,
best-effort so a cron-side failure never breaks the curator itself.
Results are recorded in run.json (counts.cron_jobs_rewritten + full
cron_rewrites payload), a separate cron_rewrites.json for convenience
when jobs were touched, and a section in REPORT.md.
Reported by @tombielecki.
DeepSeek V4 Pro tightened thinking-mode validation and rejects empty-string
reasoning_content with HTTP 400:
The reasoning content in the thinking mode must be passed back to the API.
run_agent.py injected "" at three fallback sites — the tool-call pad in
_build_assistant_message and both injection branches of
_copy_reasoning_content_for_api (cross-provider poison guard + unconditional
thinking pad). All three now emit " " (single space), which satisfies the
non-empty check on V4 Pro without leaking fabricated reasoning.
Also upgrades stale empty-string placeholders on replay: sessions persisted
before this change have reasoning_content="" pinned at creation time; when
the active provider enforces thinking-mode echo, the replay path now rewrites
"" -> " " so existing users don't 400 on their first V4 Pro turn after
updating. Non-thinking providers still round-trip "" verbatim.
Updates 9 existing assertions + adds 2 regression tests (stale-placeholder
upgrade, non-thinking verbatim preservation).
Refs #15250, #17400.
Closes#17341.
The user-visible /compress banner and the post-compression last_prompt_tokens
writeback both counted only the raw message transcript (chars/4). With a 15KB
system prompt and 30 tool schemas (~26KB), a 4-message transcript that looks
like ~45 tokens to the transcript-only estimator is really ~10.5K tokens of
request pressure — a 234x gap.
Two user-facing consequences:
- Banner shows 'Compressing … (~45 tokens)…' while compression is actually
firing on 10K+ tokens of real pressure, confusing users about why
compression triggered (reported by @codecovenant on X; #6217).
- Post-compression last_prompt_tokens writeback omits tool schemas, so the
next should_compress() check compares real usage against a stale
underestimate — compression triggers late, potentially past the model's
context limit on small-context models (#14695).
Swap estimate_messages_tokens_rough() for estimate_request_tokens_rough()
at every user-visible banner and at the post-compression writeback.
estimate_request_tokens_rough() already existed for exactly this purpose
and includes system prompt + tool schemas.
Touched call sites:
- run_agent.py: post-compression last_prompt_tokens writeback, post-tool
call should_compress() fallback when provider usage is missing
- cli.py: /compress banner + summary
- gateway/run.py: gateway /compress banner + summary
- tui_gateway/server.py: TUI /compress status + summary
- acp_adapter/server.py: ACP /compact before/after
Left intentionally alone:
- Session-hygiene fallback and the 'no agent' /status path in gateway/run.py
— no agent instance is in scope to query for system prompt/tools, and the
existing 30-50% overestimate wobble on hygiene is safety-accepted.
- Verbose-mode 'Request size' logging — informational only, already counts
system prompt via api_messages[0].
Also relabels the feedback line from 'Rough transcript estimate' to
'Approx request size' so the metric label matches what it actually measures.
Credits: diagnoses from @devilardis (#14695) and @Jackten (#6217);
user report @codecovenant on X (2026-04-30).
Closes#14695Closes#6217
When a user types /steer <text> on an ACP session that isn't actively
running a turn (and there's no interrupted-prompt salvage available),
_cmd_steer silently appended to state.queued_prompts and replied
"No active turn — queued for the next turn". That looks identical to
/queue output even though the user never typed /queue — @EddyLeeKhane
reported this as "/steer never works, gets queued instead".
Rewrite the payload to a plain user prompt before the slash-intercept
fires, matching the gateway's idle-/steer fallthrough in
gateway/run.py ~L4898.
_process_message_background snapshotted callback_generation from the
interrupt event at the TOP of the task — before the handler ran.
_hermes_run_generation is only set on the event by
GatewayRunner._bind_adapter_run_generation during
_handle_message_with_agent, which runs DURING the handler await. The
early snapshot always captured None, which then flowed into
pop_post_delivery_callback(..., generation=None) in the finally block.
In pop_post_delivery_callback, generation=None with a tuple-registered
entry (generation, callback) bypasses the ownership check — it pops and
fires the callback regardless of which run owns it. Result: a stale run
could fire a fresher run's post-delivery callback (e.g. a
background-review notification attributed to the wrong turn).
Fix: move the snapshot into the finally block, after the handler has
run and _hermes_run_generation has been bound to the current run.
Regression test added: simulates a stale handler at generation=1 and a
fresher callback registered at generation=2. Pre-fix: snapshot=None →
pop fires the generation=2 callback under generation=1's ownership
("newer" fires). Post-fix: snapshot=1 → pop skips the mismatched
entry, callback stays in the dict for the correct run to claim.
Verified: test FAILS on current main (captures "newer" in fired list),
PASSES with this fix.
Salvaged from PR #12565 (the callback-ownership portion only; the
/status totals portion was already fixed on main in 7abc9ce4d via #17158).
Co-authored-by: Oxidane-bot <1317078257maroon@gmail.com>
Widens #16528 to two sibling sites that had the same quoted-boolean
bug: a YAML string "false" (or "0", "no", "off") silently evaluated
truthy under bool() / if-check.
- gateway/run.py _load_show_reasoning: is_truthy_value wrap
- tools/skill_manager_tool.py _guard_agent_created_enabled: is_truthy_value wrap
- regression tests for both
SELECT in get_messages_as_conversation() was missing finish_reason, so
assistant messages round-tripped through replay (including /branch copies)
silently dropped the provider's stop signal. Adds it to the SELECT, restores
it on assistant rows, and locks it in with a round-trip test.
When running on a host with sudoers NOPASSWD configured for the current
user, interactive Hermes sessions were unnecessarily entering the
password prompt path before executing sudo commands. Outside Hermes,
`sudo -n true` exits 0 for that user.
Add `_sudo_nopasswd_works()` that probes `sudo -n true` and, when it
succeeds, lets `_transform_sudo_command()` return the command unchanged
with no stdin password. The probe:
- Is scoped to the `local` terminal backend only, so Docker/SSH/Modal
and other remote backends do not inherit host sudo state.
- Re-probes every call (no process-lifetime cache) so an expired sudo
timestamp cannot silently make a later command block waiting for a
password that Hermes never prompts for.
- Is bypassed entirely when `SUDO_PASSWORD` is configured or a cached
password already exists, preserving existing explicit-password flows.
Co-authored-by: Junting Wu <juntingpublic@gmail.com>
The fix for this bug (isinstance guard) was merged via commit 3ff9e010,
but test coverage was not included. Adding 4 tests:
- dict metadata with hermes keys (normal case)
- string metadata (bug case — previously caused AttributeError)
- None metadata
- missing metadata key
Proves token A's detected capabilities do not leak to token B after the
fix in the preceding commit. Before the fix this test would have seen
both tokens return token A's cached value.
Add two operator-facing toggles for inbound Feishu admission, enabling
bot-to-bot scenarios such as A2A orchestration and inter-bot
notifications:
FEISHU_ALLOW_BOTS=none|mentions|all (default: none)
Accept messages from other bots. `mentions` requires the peer
bot to @-mention Hermes; `all` admits every peer-bot message.
FEISHU_REQUIRE_MENTION=true|false (default: true)
Whether group messages must @-mention the bot. Override per-chat
via `group_rules.<chat_id>.require_mention` in config.yaml.
Defaults preserve prior behavior. Self-echo protection is always on:
when the bot's identity is unresolved (auto-detection failed and
FEISHU_BOT_OPEN_ID unset), peer-bot messages are rejected fail-closed
to avoid feedback loops.
Admitted peer bots bypass the human-user allowlist
(FEISHU_ALLOWED_USERS) to match existing Discord behavior; humans
still need an explicit allowlist entry. yaml feishu.allow_bots is
bridged to the env var so the adapter and gateway auth layer share
one source of truth.
Resolving peer-bot display names requires the
application:bot.basic_info:read scope; without it, peers still route
but appear as their open_id.
Test: tests/gateway/test_feishu_bot_admission.py covers the admission
pipeline, group-policy bot-bypass, hydration, and event-dispatch
plumbing as a parametrized matrix.
Change-Id: I363cccb578c2a5c8b8bf0f0a890c01c89909e256
reset_session() creates a fresh SessionEntry with created_at == updated_at,
but get_or_create_session() bumps updated_at on the next inbound message,
causing _is_new_session in _handle_message_with_agent to evaluate False.
The topic/channel skill auto-load gate (group_topics, channel_skill_bindings)
silently skips the first message after a manual reset.
Add an is_fresh_reset flag on SessionEntry, set by reset_session() and
consumed once by the message handler. Kept distinct from was_auto_reset
because that flag also drives a 'session expired due to inactivity'
user-facing notice and a context-note prepend — both wrong for an
explicit /new or /reset.
Persisted through to_dict/from_dict so the flag survives gateway
restart between /reset and the next message.
Fixes#6508
Co-authored-by: warabe1122 <45554392+warabe1122@users.noreply.github.com>
Co-authored-by: willy-scr <187001140+willy-scr@users.noreply.github.com>
/status was reading session_entry.total_tokens from the in-memory
SessionStore (gateway/session.py), which the agent never writes to —
so the token count was always 0.
The agent already persists token deltas to the SQLite SessionDB
(run_agent.py:11497) for every platform with a session_id. Route
/status through that single source of truth instead of duplicating
token writes into a second store.
Fix:
- gateway/run.py: _handle_status_command now calls
self._session_db.get_session(session_id) and sums the five token
component columns (input/output/cache_read/cache_write/reasoning).
Falls back to 0 when no SessionDB is configured or no row exists.
- Two new regression tests covering the populated-row and
missing-row paths.
Co-authored-by: Hermes <127238744+teknium1@users.noreply.github.com>
sqlite3 can only bind str/bytes/int/float/None to query parameters.
Multimodal message content is a list of parts (text + image_url), which
raised 'Error binding parameter 3: type list is not supported' in
append_message and replace_messages.
In the CLI/TUI this surfaced as a visible crash when users pasted
screenshots. In the gateway it was silently swallowed by a bare except
in append_to_transcript, causing multimodal turns to be lost from the
session transcript.
Fix at the DB layer: _encode_content wraps lists/dicts as
'\\x00json:' + json.dumps(...) on write, _decode_content unwraps on
read. Plain strings are untouched, so existing FTS search, previews,
and JSONL compat are unaffected. Paired decode in get_messages,
get_messages_as_conversation, and search_messages context previews.
Regression test covers: list content round-trip, dict content
round-trip, string content stored unchanged, replace_messages with
multimodal content.
Also included: aligned fix#17522 for TUI image attachment with
paths containing spaces (see previous commit).
Remove frontend regex pre-check that truncated paths containing spaces,
quotes, or Windows drive letters. Backend _detect_file_drop correctly
handles these patterns. This fixes image attachment for common filenames
like "Screenshot 2026-04-29.png".
Add tests:
- test_input_detect_drop_path_with_spaces: attaches image with spaces in name
- test_input_detect_drop_path_with_spaces_and_remainder: remainder handling
Also restored missing in test_rollback_restore_resolves_number_and_file_path.
Scope: tui, vision, tests
When a user defines `custom_providers: [{name: kimi, ...}]` and references
`provider: kimi` from fallback_model or the main config, the built-in alias
rewriting (`kimi` → `kimi-coding`) was hijacking the request before the
named-custom lookup ran. `_get_named_custom_provider` also refused to
return a match when the raw name resolved to any built-in (including aliases),
so the custom endpoint was unreachable.
Fix at both layers of the resolution chain so every caller benefits, not
just `_try_activate_fallback`:
- hermes_cli/runtime_provider.py: narrow `_get_named_custom_provider`'s
built-in-wins guard to canonical provider names only. An alias like
`kimi` that resolves to a different canonical (`kimi-coding`) no longer
blocks the custom lookup; a canonical name like `nous` still does.
- agent/auxiliary_client.py: in `resolve_provider_client`, try the named-
custom lookup with the original (pre-alias-normalization) name before the
alias-normalized one, so aliased requests reach the user's custom entry.
Also honour `explicit_base_url` and `explicit_api_key` in the API-key
provider branch so callers that pass explicit hints (e.g. fallback
activation) can override the registered defaults.
Tests added for:
- custom `kimi` shadowing built-in alias (regression for #15743)
- custom `nous` NOT shadowing canonical built-in (behaviour preserved)
- bare `kimi` without any custom entry still routing to built-in
- explicit base_url/api_key override on the API-key provider branch
Original PR #17827 by @Feranmi10 identified the same bug class and
implemented a narrower fix in `_try_activate_fallback`; this reshapes the
fix to live in the shared resolution layer so all callers benefit.
Fixes#15743
Co-authored-by: Feranmi10 <89228157+Feranmi10@users.noreply.github.com>
Follow-up to the previous commit. Replace the post-fetch Python re-sort (which
required dropping LIMIT/OFFSET from SQL and scanning every session row) with a
recursive CTE that walks compression-continuation chains and computes
effective_last_active per root at SQL level. The outer query can then ORDER BY
+ LIMIT efficiently, and the Python projection loop no longer has to handle
ordering.
This preserves the correctness win (old compression roots whose live tip was
touched recently surface correctly) without the O(N) scan, which matters for
users with thousands of sessions.
Adds a regression test pinning the compression-tip case at limit=1 — the
stress case that any bounded-oversample shortcut would get wrong.
Co-authored-by: simbam99 <simbamax99@gmail.com>
- order session_search recent-mode results by last activity instead of session start time
- add an opt-in `order_by_last_active` path to `SessionDB.list_sessions_rich`
- add regression coverage for both the database ordering and recent-mode call path
The PR wired in a detached watcher that respawns manual profile gateways
after they exit. Pair that with a SIGUSR1 graceful drain (same path
systemd/launchd use) so in-flight agent runs finish instead of getting
SIGTERM'd. Fall back to SIGTERM if SIGUSR1 isn't wired or the gateway
doesn't exit within the drain budget — the watcher sees the exit and
relaunches either way.
Tested end-to-end against an orphaned gateway: graceful drain exits in
0.5s and the watcher fires the relaunch command.
When len(messages) <= protect_tail_count and a token budget is set, the
previous formula min(protect_tail_count, len(result) - 1) under-protected
the tail by one, allowing the oldest message to be summarized.
The test fails on the buggy formula (pruned == 1) and passes on the fix
(pruned == 0, tool content preserved verbatim).
Follow-up to #17963. The threaded branch of resolve_plugin_command_result
previously called Event.wait() with no timeout — a hung async plugin
handler would wedge the terminal indefinitely. Cap the wait at 30s and
raise TimeoutError instead. Added a regression test covering the hung
handler path.