* feat(image-input): native multimodal routing based on model vision capability
Attach user-sent images as OpenAI-style content parts on the user turn when
the active model supports native vision, so vision-capable models see real
pixels instead of a lossy text description from vision_analyze.
Routing decision (agent/image_routing.py::decide_image_input_mode):
agent.image_input_mode = auto | native | text (default: auto)
In auto mode:
- If auxiliary.vision.provider/model is explicitly configured, keep the
text pipeline (user paid for a dedicated vision backend).
- Else if models.dev reports supports_vision=True for the active
provider/model, attach natively.
- Else fall back to text (current behaviour).
Call sites updated: gateway/run.py (all messaging platforms), tui_gateway
(dashboard/Ink), cli.py (interactive /attach + drag-drop).
run_agent.py changes:
- _prepare_anthropic_messages_for_api now passes image parts through
unchanged when the model supports vision — the Anthropic adapter
translates them to native image blocks. Previous behaviour
(vision_analyze → text) only runs for non-vision Anthropic models.
- New _prepare_messages_for_non_vision_model mirrors the same contract
for chat.completions and codex_responses paths, so non-vision models
on any provider get text-fallback instead of failing at the provider.
- New _model_supports_vision() helper reads models.dev caps.
vision_analyze description rewritten: positions it as a tool for images
NOT already visible in the conversation (URLs, tool output, deeper
inspection). Prevents the model from redundantly calling it on images
already attached natively.
Config default: agent.image_input_mode = auto.
Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py),
all existing tests that reference _prepare_anthropic_messages_for_api
still pass (198 targeted + new tests green).
* feat(image-input): size-cap + resize oversized images, charge image tokens in compressor
Two follow-ups that make the native image routing safer for long / heavy
sessions:
1) Oversize handling in build_native_content_parts:
- 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES,
the most restrictive provider — Gemini inline data).
- Delegates to vision_tools._resize_image_for_vision (Pillow-based,
already battle-tested) to downscale to 5 MB first-try.
- If Pillow is missing or resize still overshoots, the image is
dropped and reported back in skipped[]; caller falls back to text
enrichment for that image.
2) Image-token accounting in context_compressor:
- New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant;
within the realistic range for Anthropic/GPT-4o/Gemini billing).
- _content_length_for_budget() helper: sums text-part lengths and
charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/
input_image part. Base64 payload inside image_url is NOT counted
as chars — dimensions don't matter, only image-presence.
- Both tail-cut sites (_prune_old_tool_results L527 and
_find_tail_cut_by_tokens L1126) now call the helper so multi-image
conversations don't slip past compression budget.
Tests: 9 new in test_image_routing.py (oversize triggers resize,
resize-fails-returns-None, oversize-skipped-reported), 11 new in
test_compressor_image_tokens.py (flat charge per image, multiple images,
Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on
raw base64, bounds-check on the constant, integration test that an
image-heavy tail actually gets trimmed).
* fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits
The previous commit imposed a hardcoded 20 MB base64 ceiling on all
providers, triggering auto-resize on anything larger. This was wrong in
both directions:
* Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400
'image exceeds 5 MB maximum' above that).
* Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without
complaint (empirically verified April 2026 with progressive PNG
sizes).
New behaviour:
* _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a
ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder).
* Providers NOT in the table get no ceiling — images attach at native
size and we trust the provider to return its own error if it
disagrees. A provider-specific 400 message is clearer than us
guessing wrong and silently degrading image quality.
* build_native_content_parts() gains a keyword-only provider arg;
gateway/CLI/TUI pass the active provider so Anthropic users get
auto-resize protection while OpenAI users don't pay it.
* Resize target dropped from 5 MB to 4 MB to slide safely under
Anthropic's boundary with header overhead.
Empirical measurements (direct API, no Hermes in the loop):
image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5
0.19 MB ✓ ✓ ✓
12.37 MB ✗ 400 5MB ✓ ✓
23.85 MB ✗ 400 5MB ✓ ✓
49.46 MB ✗ 413 ✓ ✓
Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through,
Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts
routes ceiling by provider, unknown provider gets no ceiling. All 52
targeted tests pass.
* refactor(image-input): attempt native, shrink-and-retry on provider reject
Replace proactive per-provider size ceilings with a reactive shrink path
on the provider's actual rejection. All providers now attempt native
full-size attachment first; if the provider returns an image-too-large
error, the agent silently shrinks and retries once.
Why the previous design was wrong: hardcoding provider ceilings
(anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image
paid no tax, but Anthropic users lost quality on anything >5MB even
though the empirical behaviour at provider-reject time is the same
(shrink + retry). Baking the table into the routing layer also
requires updating Hermes every time a provider's limit changes.
Reactive design:
- image_routing.py: _file_to_data_url encodes native size, no ceiling.
build_native_content_parts drops its provider kwarg.
- error_classifier.py: new FailoverReason.image_too_large + pattern
match ("image exceeds", "image too large", etc.) checked BEFORE
context_overflow so Anthropic's 5MB rejection lands in the right
bucket.
- run_agent.py: new _try_shrink_image_parts_in_messages walks api
messages in-place, re-encodes oversized data: URL image parts
through vision_tools._resize_image_for_vision to fit under 4MB,
handles both chat.completions (dict image_url) and Responses
(string image_url) shapes, ignores http URLs (provider-fetched).
New image_shrink_retry_attempted flag in the retry loop fires the
shrink exactly once per turn after credential-pool recovery but
before auth retries.
E2E verified live against Anthropic claude-sonnet-4-6:
- 17.9MB PNG (23.9MB b64) attached at native size
- Anthropic returns 400 "image exceeds 5 MB maximum"
- Agent logs '📐 Image(s) exceeded provider size limit — shrank and
retrying...'
- Retry succeeds, correct response delivered in 6.8s total.
Tests: 12 new (8 shrink-helper shapes + 4 classifier signals),
replaces 5 proactive-ceiling tests with 3 simpler 'native attach works'
tests. 181 targeted tests pass. test_enum_members_exist in
test_error_classifier.py updated for the new enum value.
* feat(plugins): google_meet — bundled plugin for join+transcribe Meet calls
v1 shipping transcribe-only. Spawns headless Chromium via Playwright,
joins an explicit https://meet.google.com/ URL, enables live captions,
and scrapes them into a transcript file the agent can read across turns.
The agent then has the meeting content in context and can do followup
work (send recap, file issues, schedule followups) with its regular tools.
Surface:
- Tools: meet_join, meet_status, meet_transcript, meet_leave, meet_say
(meet_say is a v1 stub — returns not-implemented; v2 will wire
realtime duplex audio via OpenAI Realtime / Gemini Live +
BlackHole / PulseAudio null-sink.)
- CLI: hermes meet setup | auth | join | status | transcript | stop
- Lifecycle: on_session_end auto-leaves any still-running bot.
Safety:
- URL regex rejects anything that isn't https://meet.google.com/...
- No calendar scanning, no auto-dial, no auto-consent announcement.
- Single active meeting per install; a second meet_join leaves the first.
- Platform-gated to Linux + macOS (Windows audio routing for v2 untested).
- Opt-in: standalone plugin, user must add 'google_meet' to
plugins.enabled in config.yaml.
Zero core changes. Plugin uses existing register_tool /
register_cli_command / register_hook surfaces. 21 new unit tests cover the
URL safety gate, transcript dedup + status round-trip, process-manager
refusals/start/stop paths, tool-handler JSON shape under each branch,
session-end cleanup, and platform-gated register().
* feat(plugins/google_meet): v2 realtime audio + v3 remote node host
v2 \u2014 agent speaks in-meeting
audio_bridge.py: PulseAudio null-sink (Linux) + BlackHole probe (macOS).
On Linux we load pactl module-null-sink + module-virtual-source, track
module ids for teardown; Chrome gets PULSE_SOURCE=<virt src> env so its
fake mic reads what we write to the sink. macOS just probes BlackHole
2ch and returns its device name \u2014 the plugin refuses to switch the
user's default audio input (that would surprise them).
realtime/openai_client.py: sync WebSocket client for the OpenAI Realtime
API. RealtimeSession.speak(text) sends conversation.item.create +
response.create, accumulates response.audio.delta PCM bytes, appends
them to a file. RealtimeSpeaker runs a JSONL-queue loop consuming
meet_say calls. 'websockets' is an optional dep imported lazily.
meet_bot.py: when HERMES_MEET_MODE=realtime, provisions AudioBridge,
starts RealtimeSession + speaker thread, spawns paplay to pump PCM
into the null-sink, then cleans everything up on SIGTERM. If any
realtime setup step fails, falls back cleanly to transcribe mode
with an error flagged in status.json.
process_manager.enqueue_say(): writes a JSONL line to say_queue.jsonl;
refuses when no active meeting or active meeting is transcribe-only.
tools.meet_say: real implementation; requires active mode='realtime'.
meet_join: adds mode='transcribe'|'realtime' param.
v3 \u2014 remote node host
node/protocol.py: JSON envelope (type, id, token, payload) + validate.
node/registry.py: $HERMES_HOME/workspace/meetings/nodes.json, with
resolve() auto-selecting the sole registered node when name is None.
node/server.py: NodeServer \u2014 websockets.serve, bearer-token auth,
dispatches start_bot/stop/status/transcript/say/ping onto the local
process_manager. Token auto-generated + persisted on first run.
node/client.py: NodeClient \u2014 short-lived sync WS per RPC, raises
RuntimeError on error envelopes, clean API matching the server.
node/cli.py: 'hermes meet node {run,list,approve,remove,status,ping}'
subtree; wired into the main meet CLI by cli.py so 'hermes meet node'
Just Works.
tools.py: every meet_* tool accepts node='<name>'|'auto'; when set,
routes through NodeClient to the remote bot instead of running
locally. Unknown node \u2192 clear 'no registered meet node matches ...'
error.
cli.py: 'hermes meet join --node my-mac --mode realtime' and
'hermes meet say "..." --node my-mac' route to the node; 'hermes
meet node approve <name> <url> <token>' registers one.
Tests
21 v1 tests updated (meet_say is no longer a stub; active-record now
carries mode).
20 new audio_bridge + realtime tests.
42 new node tests (protocol/registry/server/client/cli).
17 new v1/v2/v3 integration tests at the plugin level covering
enqueue_say edge cases, env var passthrough, mode validation, node
routing (known/unknown/auto/ambiguous), and argparse wiring for
`hermes meet say` + `hermes meet node` + --mode/--node flags.
Total: 100 plugin tests + 58 plugin-system tests = 158 passing.
E2E verified on Linux with fresh HERMES_HOME: plugin loads, 5 tools
register, on_session_end hook wires, 'hermes meet' CLI tree wires
including the node subtree, NodeRegistry round-trips, meet_join routes
correctly to NodeClient under node='my-mac' with mode='realtime',
enqueue_say accepts realtime/rejects transcribe, argparse parses every
new flag cleanly.
Zero changes to core. All new code lives under plugins/google_meet/.
* feat(plugins/google_meet): auto-install, admission detect, mac PCM pump, barge-in, richer status
Ready-for-live-test follow-up on PR #16364. Five additions that matter for
the first live run on a real Meet, in priority order:
1. hermes meet install [--realtime] [--yes]
pip install playwright websockets + python -m playwright install chromium
--realtime: installs platform audio deps (pulseaudio-utils on Linux via
sudo apt, blackhole-2ch + ffmpeg on macOS via brew). Prompts before
sudo/brew unless --yes. Refuses on Windows. Refuses to auto-flip the
macOS default input — user still selects BlackHole in System Settings
(deliberate; surprise audio rerouting is worse than a manual step).
2. Admission detection
_detect_admission(page): Leave-button visible OR caption region
attached OR participants list present → we're in-call.
_detect_denied(page): 'You can\'t join this video call' / 'You were
removed' / 'No one responded to your request' → bail out.
HERMES_MEET_LOBBY_TIMEOUT (default 300s) caps how long we sit in
the lobby before giving up. in_call stays False until admitted.
Status surfaces leaveReason: duration_expired | lobby_timeout |
denied | page_closed.
3. macOS PCM pump
ffmpeg reads speaker.pcm (24kHz s16le mono) and writes to the
BlackHole AVFoundation output via -f audiotoolbox
-audio_device_index <N>. _mac_audio_device_index() probes
ffmpeg -f avfoundation -list_devices true to resolve 'BlackHole 2ch'
→ numeric index. Falls back to index 0 on probe failure. Linux
paplay pump unchanged.
4. Richer status dict
_BotState now tracks realtime, realtimeReady, realtimeDevice,
audioBytesOut, lastAudioOutAt, lastBargeInAt, joinAttemptedAt,
leaveReason. RealtimeSession.audio_bytes_out / last_audio_out_at
counters fold into the status file once a second so meet_status()
can show the agent's voice activity in near-real-time.
5. Barge-in
RealtimeSession.cancel_response() sends type='response.cancel' over
the same WS (lock-guarded so it's safe to call from the caption
thread while speak() is reading frames). Handles response.cancelled
as a terminal frame type. _looks_like_human_speaker() gates triggers
so the bot's own name, 'You', 'Unknown', and blanks don't self-cancel.
Called from the caption drain loop: when a new caption arrives
attributed to a real participant while rt.session exists, we fire
cancel_response() and stamp lastBargeInAt.
Tests: 20 new unit tests across _BotState telemetry, barge-in gating,
admission/denied probe error handling, cancel_response with and without
a connected WS, and `hermes meet install` CLI wiring (flag parsing +
end-to-end subprocess.run verification + Linux-already-installed fast
path). Total 171 passing across all google_meet test files + the
plugin-system regression suite.
E2E verified on Linux: plugin loads, all 5 tools register,
`hermes meet install --realtime --yes` parses, fresh-bot status.json
has every new telemetry key, cancel_response on a disconnected session
returns False without raising, barge-in helper gates the bot's own
name correctly.
Still out of scope (for a future PR, not blocking live test):
mic → Realtime duplex (the agent listening to meeting audio via
WebRTC), node-host TLS/pairing UX, Windows audio, Meet create+Twilio.
Docs updated: SKILL.md now lists the installer subcommand, lobby
timeout, barge-in caveat, and the full status-dict reference table.
README.md quick-start uses hermes meet install.
Every 'hermes update' now runs a full backup of ~/.hermes/ first, so
users can always roll back to the exact state they had before the
update if anything goes wrong (corrupted sessions.db, broken skills,
config migrations that don't round-trip, etc.).
Changes:
- hermes_cli/backup.py: new create_pre_update_backup() helper. Writes
to <HERMES_HOME>/backups/pre-update-<stamp>.zip using the same
exclusion rules and SQLite safe-copy as 'hermes backup'. Auto-rotates
(keep last N, pre-update-*.zip only — hand-dropped zips in backups/
are untouched). Adds 'backups' to _EXCLUDED_DIRS so subsequent backups
don't nest prior ones.
- hermes_cli/main.py: _run_pre_update_backup() wired into
_cmd_update_impl before any git operation. Prints save path, restore
command, and how to disable. Swallows failures so a broken backup
never blocks the update itself. New --no-backup flag on 'hermes
update' for one-off override.
- hermes_cli/config.py: new 'updates' section in DEFAULT_CONFIG with
pre_update_backup (default true) and backup_keep (default 5).
Auto-surfaces in the dashboard config UI.
- tests/hermes_cli/test_backup.py: +11 tests covering backup location,
content parity with 'hermes backup', no-recursion, rotation, manual
file preservation, config gate, --no-backup flag, flag-wins-over-config.
The CLI renders through prompt_toolkit in non-full-screen mode, so every
repaint uses the renderer's tracked _cursor_pos.y to cursor_up() + erase
before drawing the new frame. Any time that tracked position drifts from
terminal reality, redraws stack on top of stale content instead of
overwriting it. Four user-visible bugs share this root cause.
Fixes:
- #5474 (SIGWINCH ghosts): the resize wrapper previously only handled
column-shrink reflow. Generalize it to force a full screen-clear
(erase_screen + cursor_goto(0,0)) and renderer.reset() on every resize
— covers widen, row-shrink, and multiplexer SIGWINCH-less redraws.
- #8688 (cmux/tmux tab switch): no SIGWINCH fires on focus regain, so
prompt_toolkit has no signal to recover. Add a _force_full_redraw()
helper, bound to Ctrl+L (standard bash/zsh/vim convention) and exposed
as /redraw. Users can manually clear drift without restarting Hermes.
- #14692 (DSR response leaks — ^[[53;1R): resize storms make
prompt_toolkit's CSI 6n queries race past the input parser; the
terminal's reply ends up as literal input text. Add a sibling of the
bracketed-paste sanitizer that strips \x1b[<row>;<col>R and the
caret-escape visible form from paste text, buffer text-filter, and
the input-processing loop.
The idle-redraw removal (#12641) is in the preceding commit from
@foxion37 — keeping them as separate commits preserves attribution.
On provider switches mid-session (e.g. MiniMax -> DeepSeek), the source
assistant turn carries a 'reasoning' field written by the prior provider
but no 'reasoning_content' key. _copy_reasoning_content_for_api would
promote that foreign 'reasoning' to 'reasoning_content' on the outbound
DeepSeek request, leaking a cross-provider chain of thought and in
practice causing HTTP 400.
DeepSeek's own _build_assistant_message always pins reasoning_content=''
at creation time for tool-call turns, so the shape (reasoning set,
reasoning_content absent, tool_calls present) is unreachable from
same-provider DeepSeek history — it can only come from a prior provider.
Pad with '' in that case instead of promoting.
Healthy same-provider 'reasoning' promotion (no tool_calls, or on
providers that do not require the empty-string pin) is unchanged.
Defensive: when the generator encounters a fenced code block containing
Unicode box-drawing characters, wrap it in `<!-- ascii-guard-ignore -->`
markers so the docs-site-checks lint (which scans inside code fences)
can't reject the page for a skill's own diagram.
Plain bash/python code blocks stay uncluttered — only blocks with box
chars get wrapped. Skill authors no longer have to remember to add the
ignore markers in every SKILL.md with ASCII art.
Fixes#15305.
Quick state snapshot now includes pairing JSONs (generic + legacy +
Feishu comment pairing), and `hermes update` takes a pre-update
snapshot labeled `pre-update` before pulling.
Pairing data lives outside state.db in platform-specific JSONs under
~/.hermes/pairing/, ~/.hermes/platforms/pairing/, and
~/.hermes/feishu_comment_pairing.json. The update command already
couldn't touch $HERMES_HOME, but #15733 reports lost pairing after
an update — this gives users something to restore from via
`/snapshot list` / `/snapshot restore <id>` if anything clobbers
the approved-user lists.
- Extend _QUICK_STATE_FILES with pairing paths (files + dirs)
- Snapshot walks directories recursively and records each file in the
manifest individually so restore logic is unchanged
- _cmd_update_impl calls create_quick_snapshot(label='pre-update')
after 'Found N new commits' and before 'Pulling updates'
- Snapshot failures are logged at debug and never block the update
Refs #15733.
read_file's dedup path returned a lightweight stub on re-reads of an
unchanged file, then returned early — so the consecutive-read loop
guard (hard block at count>=4) at the bottom of read_file_tool never
ran for stub-looped calls. Weaker tool-following models (local Qwen3.6
variants in the reported case) ignore the passive 'refer to earlier
result' hint and hammer the same read_file call until iteration budget
runs out.
Track per-key stub returns in task_data['dedup_hits'] and, on the
second stub for the same (path, offset, limit), return a hard BLOCKED
error mirroring the wording the real-read path already uses. A real
read, an intervening non-read tool call (notify_other_tool_call), or
reset_file_dedup (on context compression) all clear the counter so
the guard never stays engaged longer than the actual loop.
Closes#15759
Telegram groups emit a single bot_command entity covering the whole
/cmd@botname span with no accompanying mention entity, so the existing
mention gate in _message_mentions_bot dropped slash commands sent via
the bot-menu autocomplete whenever require_mention is enabled.
Recognise bot_command entities whose @botname suffix matches the bot
username (case-insensitive) as a direct mention, and keep rejecting
commands addressed at other bots. Fixes#15415.
When 'hermes model' runs against a providers: (keyed-schema) entry that
relies only on key_env, the picker resolves the env var for the live
/models request and then wrote a synthesized 'api_key: ${KEY_ENV}' back
to the providers.<key> entry. That's redundant — the runtime already
resolves from key_env directly — and it clutters configs that
intentionally keep credentials out of config.yaml.
Only persist provider_entry['api_key'] when the user originally had an
inline value (literal secret or ${VAR} template). Entries that declared
only key_env stay clean on save.
Fixes#15803.
Harden the Matrix adapter's sender-drop guards so bot-self events and
appservice/bridge identities never reach the gateway's pairing flow or
the agent loop.
Two filters, applied as early as possible in _on_room_message (and
_on_reaction for the self-filter):
1. _is_self_sender(sender) — case-insensitive + whitespace-trimmed
equality with self._user_id. When self._user_id is still empty
(whoami has not resolved, or login failed), returns True
defensively: an unidentified bot dropping its own events is always
preferable to falling into an echo loop. The previous byte-for-byte
equality check let differently-cased copies of the bot's MXID slip
through, and an unresolved self-ID silently disabled the guard.
2. _is_system_or_bridge_sender(sender) — drops appservice namespace
puppets (conventional @_bridge_...:server form) and malformed
senders with an empty localpart. These identities used to fall
through to the gateway's unauthorized-user path, trigger a pairing
code, and — once an operator approved the bridge — every outbound
message the bridge relayed would loop back as an authorized user
message. This was the root of the 'hall of mirrors' symptom.
Fixes#15763
Test plan
---------
scripts/run_tests.sh tests/gateway/test_matrix.py
scripts/run_tests.sh tests/gateway/test_matrix_mention.py tests/gateway/test_matrix_voice.py
All 182 tests pass. 14 new regression tests cover exact / case-insensitive
/ whitespace / unresolved-self-id matches, bridge prefix detection, empty
sender, and the full _on_room_message drop path.
Closes#15775.
Title generation swallowed exceptions at debug level and returned None,
so a depleted auxiliary provider (e.g. OpenRouter 402) silently left
sessions with NULL titles. Reporter observed 45 untitled sessions
accumulated over 19 days with no user-visible indication.
- agent/title_generator.py: accept optional failure_callback, bump log
to WARNING, invoke callback on call_llm exception (swallowing callback
errors so nothing can crash the fire-and-forget worker thread).
- cli.py, gateway/run.py: pass agent._emit_auxiliary_failure as the
callback so failures route through the existing user-visible warning
channel.
- tests: cover callback fires / errors are swallowed / no-callback
legacy behavior / maybe_auto_title forwards kwarg to worker.
raw_content from message["content"] can be a list that contains bare
strings, not only dicts. The previous `p.get("text", "")` call raised
AttributeError on string items, crashing context compression for any
session that had a message with mixed content.
Guard with isinstance checks: dict → .get("text"), str → len(p),
fallback → len(str(p)). Adds a regression test covering the bare-string
case that would have AttributeError'd on the pre-fix code.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
_find_tail_cut_by_tokens called len(content) to estimate message tokens.
When content is a list of blocks (multimodal: text + image_url), len()
returns block count (e.g. 2) rather than character count, so a message
with 500 chars of text was counted as ~10 tokens instead of ~135.
This caused the backward walk to exhaust all messages before hitting the
budget ceiling; the head_end safeguard then forced cut = n - min_tail,
shrinking the protected tail to the bare minimum and preventing effective
compression of long multimodal conversations.
Fix mirrors the existing pattern in _prune_old_tool_results (line 487):
sum(len(p.get("text", "")) for p in raw_content)
if isinstance(raw_content, list) else len(raw_content)
Tests: 3 new cases in TestTokenBudgetTailProtection — regression guard
(confirms the test fails with the bug), plain-string regression guard,
and image-only block edge case.
Fixes#16087.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
If the gateway's Python env loses access to 'croniter' between when a
cron job was created and when mark_job_run() fires, compute_next_run()
returns None for cron schedules. mark_job_run() treated that as terminal
completion and wrote enabled=false, state=completed — turning a missing
runtime dep into a silent, permanent job-off.
That behaviour is safe for one-shot jobs but wrong for recurring ones. A
missing dep should surface as an error the user can see, not as successful
completion of a job that is about to stop firing.
mark_job_run() now only disables the job on next_run_at=None when the
schedule is one-shot. For recurring (cron/interval) schedules it keeps
enabled=true, sets state=error, and records last_error so the user can
see why the job isn't advancing. compute_next_run() also logs a warning
the first time cron+no-croniter hits, so the underlying cause is visible
in the gateway log.
Tests cover:
- recurring cron job stays enabled with state=error when HAS_CRONITER=False
- recurring interval stays enabled when compute_next_run returns None
- one-shot jobs still flip to enabled=false, state=completed (no regression)
Fixes#16265
Azure Foundry deploys GPT-5.x, codex-*, and o1/o3/o4 reasoning models as
Responses-API-only. Calling /chat/completions against these deployments
returns 400 'The requested operation is unsupported.', which broke any
user who ran 'hermes model' on Azure, picked a gpt-5/codex deployment,
and kept the default api_mode: chat_completions. Verified in a user
debug bundle on 2026-04-26: gpt-5.3-codex failed on synopsisse.openai.azure.com
with that exact payload while gpt-4o-pure on the same endpoint worked.
Adds azure_foundry_model_api_mode(model_name) that returns
codex_responses when the model name starts with gpt-5, codex, o1, o3,
or o4 — otherwise None so chat_completions / anthropic_messages stay
untouched for gpt-4o, Llama, Claude-via-Anthropic, etc.
Resolver (both the direct Azure Foundry path and the pool-entry path)
consults it and upgrades api_mode unless the user explicitly picked
anthropic_messages. target_model (from /model mid-session switch)
takes precedence over the persisted default so switching from gpt-4o
to gpt-5.3-codex routes correctly before the next request.
Docs: correct the azure-foundry guide which previously claimed Azure
keeps gpt-5.x on chat completions — that was only true for early Azure
OpenAI, not Azure Foundry codex/o-series deployments.
Tests: 14 unit tests for azure_foundry_model_api_mode + 6 integration
tests in TestAzureFoundryResolution covering Bob's exact scenario,
target_model override, anthropic_messages guard, and o3-mini.
Parse scope from the raw callback URL before stripping the auth code so Flow.fetch_token matches user-granted scopes. Add regression test for dual-scope callbacks.
Made-with: Cursor
Two related fixes for OpenClaw-residue problems after an OpenClaw→Hermes
migration (especially migrations done via OpenClaw's own tool, which
doesn't archive the source directory).
1. optional-skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py:
rebrand_text() was rewriting ~/.openclaw/config.yaml → ~/.Hermes/config.yaml
(capital H — a directory that doesn't exist). Now case-preserving:
"OpenClaw" → "Hermes" (prose), but "openclaw" → "hermes" (so filesystem
paths land on the real Hermes home). Regex logic unchanged — replacement
function now checks if the matched text was all-lowercase and emits the
replacement in the matching case.
2. agent/onboarding.py + cli.py: one-time startup banner the first time
Hermes launches and finds ~/.openclaw/. Tells the user to run
`hermes claw cleanup` to archive it, gated on the existing onboarding
seen-flag framework (onboarding.seen.openclaw_residue_cleanup in
config.yaml). Fires once per install; re-running requires wiping that
flag or running cleanup directly.
Tests:
- 4 new TestDetectOpenclawResidue tests (present / absent / file-instead-
of-dir / default-home smoke)
- 2 TestOpenclawResidueHint tests (content check)
- 2 TestOpenclawResidueSeenFlag tests (flag isolation + round-trip)
- test_rebrand_text_preserves_filesystem_path_casing regression test
with 4 scenarios including the exact ~/.openclaw/config.yaml case
- Existing test_rebrand_text_* tests updated to the new case-preserving
contract (lowercase input → lowercase output)
Co-authored-by: teknium1 <teknium@noreply.github.com>
* feat(skills): install skills from a direct HTTP(S) URL
Adds UrlSource adapter so `hermes skills install <url-to-SKILL.md>` and
`/skills install <url>` work as first-class operations — no more
improvising with curl + patch + cp.
- Claims identifiers that start with http(s):// and end in .md
- Skips /.well-known/skills/ URLs (WellKnownSkillSource handles those)
- Skill name from YAML frontmatter, URL-slug fallback
- Single-file SKILL.md only (v1 scope — multi-file skills need a manifest)
- Trust level 'community'; full security scan still runs
- Lock file stores the URL as identifier so `hermes skills update`
re-fetches from the same URL cleanly
Scope matches real user need from @versun's docx feedback where
`https://sharethis.chat/SKILL.md` had no first-class install path.
* feat(skills): interactive name/category for URL installs + --name override
Follow-up to the UrlSource adapter. The previous commit fell back to weak
heuristics when frontmatter had no ``name:`` and could produce garbage names
like ``SKILL`` or ``unnamed-skill``. Now:
tools/skills_hub.py
- ``UrlSource._is_valid_skill_name()`` — strict identifier check
(``^[a-z][a-z0-9_-]*$``), rejects sentinel values (``SKILL``, ``README``,
``INDEX``, ``unnamed-skill``, empty, non-strings).
- ``_resolve_skill_name()`` returns ``Optional[str]`` — ``None`` when
nothing valid is resolvable. Also ignores unsafe frontmatter names
(``../evil``) and falls through to URL slug instead of returning None
immediately, so a URL with a bad frontmatter but a good path still
works.
- ``fetch()``/``inspect()`` carry an ``awaiting_name=True`` marker in
metadata/extra when resolution fails, letting ``do_install`` decide
whether to prompt, apply an override, or error out.
hermes_cli/skills_hub.py
- ``do_install`` gains a ``name_override`` parameter.
- On URL-sourced bundles with ``awaiting_name=True``:
1. If ``name_override`` is valid → use it.
2. If ``name_override`` is invalid → refuse with a clear error.
3. Else if ``skip_confirm=True`` (non-interactive: slash / TUI /
gateway / scripts) → refuse with an actionable retry hint pointing
at ``--name <your-name>`` on both CLI and slash forms.
4. Else (interactive TTY) → prompt for the name.
- Interactive TTY also prompts for a category when none is given for a
URL-sourced install, hinting existing category buckets so users can
reuse ``productivity``, ``devops``, etc. Empty input → flat install.
- ``_existing_categories()`` scans ``~/.hermes/skills/`` for subdirs that
look like category buckets (contain nested SKILL.md files); skips
top-level skills and hidden dirs.
- ``_prompt_for_skill_name()`` / ``_prompt_for_category()`` helpers
(EOF/Ctrl-C-safe, match the existing ``Confirm [y/N]`` prompt style).
hermes_cli/main.py
- ``hermes skills install`` argparse gains ``--name <name>``.
hermes_cli/skills_hub.py (slash)
- ``/skills install <url> --name <x>`` parsing added.
Tests
- tests/tools/test_skills_hub.py: updated ``UrlSource`` tests to assert
the new ``awaiting_name`` metadata; added 4 new tests for
``_is_valid_skill_name`` rejection sets and the awaiting-name marker.
- tests/hermes_cli/test_skills_hub.py: 8 new tests covering --name
override accept/reject, non-interactive error, interactive name prompt,
interactive category prompt, cancel-aborts-install, and
``_existing_categories`` scan behavior (buckets vs flat skills).
- E2E verified all four paths (no-name/no-override → error;
--name override → install; frontmatter name → install;
invalid --name → rejection).
---------
Co-authored-by: teknium1 <teknium@noreply.github.com>
`_resolve_effective_accept()` used `return bool(cfg_val)` for the
`hooks_auto_accept` config key. In Python, `bool("false")` is `True`,
so a user setting `hooks_auto_accept: "false"` (quoted YAML string)
in `config.yaml` would silently enable auto-approval of every shell
hook, bypassing the consent prompt entirely.
Replace the coercion with the same type-aware parsing already used for
the HERMES_ACCEPT_HOOKS env var three lines above: bool passthrough,
strings checked against {1,true,yes,on} case-insensitively, everything
else (including "false", None, 0, ints) rejected.
Add TestHooksAutoAcceptParsing guarding the regression across all four
value shapes (bool, string-truthy, string-falsy, missing/None).
Reported by @sprmn24 in #16244.
- remove the temporary -c MRU logic and companion test from this branch so PR #15926 stays focused on TUI perf work
- keep the resume-ordering change isolated in the dedicated follow-up PR
CPU profiling showed the built TUI loading React development modules unless NODE_ENV was set. Default CLI and dashboard TUI children to production while preserving explicit user overrides.
When _compress_context rotates session_id (compression split), fire
on_session_start(new_sid, boundary_reason="compression",
old_session_id=<old>) on the active context engine. Plugin engines
(e.g. hermes-lcm) use this to preserve DAG lineage across the rollover
instead of re-initializing fresh per-session state.
Built-in ContextCompressor.on_session_start accepts **kwargs and ignores
them — no behavior change for default users.
Closes hermes-lcm#68 symptom: after Hermes compressed and minted a new
physical session, LCM was treating the split as a fresh /new and losing
continuity (compression_count: 1, store_messages: 0, dag_nodes: 0).
Credit: @Tosko4 (PR #13370) — minimized scope to the boundary_reason
signal only; the broader session-lifecycle refactor will be taken in
separate PRs if justified by concrete plugin need.
Every working dir hermes ever touches gets its own shadow git repo under
~/.hermes/checkpoints/{sha256(abs_dir)[:16]}/. The per-repo _prune is a
no-op (comment in CheckpointManager._prune says so), so abandoned repos
from deleted/moved projects or one-off tmp dirs pile up forever. Field
reports put the typical offender at 1000+ repos / ~12 GB on active
contributor machines.
Adds an opt-in startup sweep that mirrors the sessions.auto_prune
pattern from #13861 / #16286:
- tools/checkpoint_manager.py: new prune_checkpoints() and
maybe_auto_prune_checkpoints() helpers. Deletes shadow repos that
are orphan (HERMES_WORKDIR marker points to a path that no longer
exists) or stale (newest in-repo mtime older than retention_days).
Idempotent via a CHECKPOINT_BASE/.last_prune marker file so it only
runs once per min_interval_hours regardless of how many hermes
processes start up.
- hermes_cli/config.py: new checkpoints.auto_prune /
retention_days / delete_orphans / min_interval_hours knobs.
Default auto_prune: false so users who rely on /rollback against
long-ago sessions never lose data silently.
- cli.py / gateway/run.py: startup hooks gated on checkpoints.auto_prune,
called right next to the existing state.db maintenance block.
- Docs updated with the new config knobs.
- 11 regression tests: orphan/stale deletion, precedence, byte-freed
tracking, non-shadow dir skip, interval gating, corrupt marker
recovery.
Refs #3015 (session-file disk growth was fixed in #16286; this covers
the checkpoint side noted out-of-scope there).
The write_file guard added in #16223 used strict equality against the
internal dedup status message. In practice, the model sometimes
prepends a short note or appends a trailing comment before calling
write_file, which slipped past the strict check.
Broaden the heuristic: reject writes whose stripped content equals
the status message OR contains it and is <=2x its length. Short,
status-dominated writes are always corruption; legitimate docs that
quote the message verbatim are always much longer.
Adds two tests: one for the small-wrapper corruption shape, one
confirming large legitimate files that quote the status still write.
write_file_tool and patch_tool both call _update_read_timestamp to
refresh the staleness tracker after writing, but they never invalidate
the dedup cache entries for the written path. The dedup cache keys are
(resolved_path, offset, limit) → mtime tuples populated by read_file_tool.
On filesystems where a read and write land in the same mtime second (or
when mtime granularity is 1s), the cached and current mtime are equal,
so the dedup check incorrectly returns a 'File unchanged since last
read' stub — even though the file was just overwritten.
The agent then sees stale content (or a stale 'File not found' error)
and enters expensive error-recovery loops, burning API calls.
Fix: add _invalidate_dedup_for_path(filepath, task_id) that removes all
dedup entries whose resolved path matches the written file. Called from
_update_read_timestamp so both write_file_tool and patch_tool benefit
automatically. Scoped to the writing task_id — other tasks' caches are
not affected.
6 regression tests added covering:
- read→write→read within same mtime second (core #13144 scenario)
- invalidation across all offset/limit combinations
- isolation: writing file A does not invalidate file B's cache
- isolation: writing in task A does not invalidate task B's cache
- _invalidate_dedup_for_path safety on missing task / empty dedup
All 25 tests pass (19 existing + 6 new).
Fixes#13144
Four independent session-UX bugs reported by an external user (#16294).
/save wrote hermes_conversation_<ts>.json to CWD — invisible to
'hermes sessions browse' and easy to lose. Snapshots now write under
~/.hermes/sessions/saved/ and the command prints the absolute path plus
a 'hermes --resume <id>' hint for the live DB-indexed session.
'hermes sessions browse' default --limit raised from 50 to 500. With the
old ceiling, users with moderately long histories saw only the most
recent 50 rows and assumed older sessions had been lost.
TUI session.list (`/resume` picker) switched from a hardcoded allow-list
of 13 gateway source names to a deny-list of just { 'tool' }. Sessions
tagged acp / webhook / user-defined HERMES_SESSION_SOURCE values and
any newly-added platform now surface. Default limit 20 → 200.
ollama-cloud provider setup passes force_refresh=True to
fetch_ollama_cloud_models() so a user entering their API key sees the
fresh catalog (e.g. deepseek v4 flash, kimi k2.6) immediately instead
of waiting up to an hour for the disk cache TTL to expire.
Closes#16294.
_web_ui_build_needed() in PR #14914 checked web_dir/"dist" as the
sentinel, but vite.config.ts sets outDir: "../hermes_cli/web_dist" so
the build output lands in hermes_cli/web_dist/, never in web/dist/.
The sentinel was therefore always missing → _web_ui_build_needed always
returned True → npm install + Vite build ran on every startup → OOM on
low-memory VPS persisted unchanged.
Fix: derive dist_dir as web_dir.parent / "hermes_cli" / "web_dist" so
the sentinel points to the actual build output directory.
Fixes#14898
When the gateway intercepts a pending /update prompt and the user sends
a recognized slash command (/new, /help, ...), the command now dispatches
normally AND the detached update subprocess is unblocked by writing a
blank .update_response. _gateway_prompt reads '' → strips → returns the
prompt's default (typically a safe 'n' / skip), so the update process
exits cleanly instead of blocking on stdin until the 30-minute watcher
timeout.
Also clears _update_prompt_pending[session_key] on this path so stray
future input for the same session isn't re-intercepted.
Extends PR #15849 with tests for the new cancel-write + a regression
test pinning the legacy behavior of unrecognized /foo slash commands
still being consumed as the response.
Slack Bolt posts are not editable like CLI spinners; medium-tier new still emitted a permanent line per tool start (issue #14663).
- Built-in slack default: off; other tier-2 platforms unchanged.
- Adjust /verbose isolation test for off to new cycle.
- Migration tests: read/write config.yaml as UTF-8 (Windows locale).
- TestAutoMaintenance gains 3 tests: auto-prune deletes transcript files
when sessions_dir is passed, preserves them when it isn't (backward-
compat), and never touches active-session files during prune.
- FakeDB helpers in test_sessions_delete.py accept **kwargs so they
don't break when delete_session signature gains sessions_dir.
Extends the existing channel_skill_bindings mechanism (previously
Discord-only) to Slack, so a channel or DM can auto-load one or more
skills at session start without relying on the model's skill selector
for every short reply.
Motivation: Mats's German flashcards DM pushes a cron-driven card
5x/day; he responds with one-word guesses like 'work'. Previously each
reply required the main agent to decide whether to load german-flashcards
(full opus turn just to pick a skill). With the binding configured per
Slack channel, the skill is injected at session start and grading runs
directly.
Changes:
- Extract resolve_channel_skills() from DiscordAdapter._resolve_channel_skills
into gateway.platforms.base (now shared across adapters).
- DiscordAdapter._resolve_channel_skills delegates to the shared helper
(behavior preserved — existing test suite still passes unchanged).
- SlackAdapter: resolve channel_skill_bindings on each message and attach
auto_skill to MessageEvent. gateway/run.py already handles auto-skill
injection on new sessions; this just wires Slack through it.
- gateway/config.py: accept channel_skill_bindings in slack: block of
config.yaml (was Discord-only).
- Tests: new tests/gateway/test_slack_channel_skills.py with 11 cases
covering DM/thread/parent resolution, single-vs-list skills, dedup,
malformed entries. Discord suite unchanged.
- Docs: add 'Per-Channel Skill Bindings' section to Slack user guide.
Config example:
slack:
channel_skill_bindings:
- id: "D0ATH9TQ0G6"
skills: ["german-flashcards"]