hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-28 23:11:37 +08:00

Author	SHA1	Message	Date
Teknium	731e1ef8cb	feat(azure-foundry): auto-detect transport, models, context length The azure-foundry wizard now probes the endpoint before asking the user to pick anything by hand: 1. URL path sniff: endpoints ending in /anthropic are Azure Foundry Claude routes and skip to anthropic_messages. 2. GET <base>/models probe: if the endpoint returns an OpenAI-shaped model list, we switch to chat_completions and prefill the picker with the returned deployment/model IDs. 3. Anthropic Messages probe: fallback for endpoints that don't expose /models but do speak the Anthropic Messages shape. 4. Manual fallback: private endpoints / custom routes still work; the user picks API mode + types a deployment name. Context length for the selected model is resolved through the existing agent.model_metadata.get_model_context_length chain (models.dev, provider metadata, hardcoded family fallbacks) and stored in model.context_length when a non-default value is found. Also refactors runtime_provider so Azure Foundry resolution is reused between the explicit-credentials path and the default top-level path. Previously the /v1 strip for Anthropic-style Azure only ran when the caller passed explicit_* args, which meant config-driven sessions hit a double-/v1 URL. New module hermes_cli/azure_detect.py with 19 unit tests covering: - path sniff, model ID extraction, probe fallbacks - HTTP error handling (URLError, HTTPError) - context-length lookup passthrough - DEFAULT_FALLBACK_CONTEXT rejection New runtime tests cover: - OpenAI-style Azure Foundry - Anthropic-style Azure Foundry with /v1 stripping - Missing base_url / API key raising AuthError Rationale: Microsoft confirms there's no pure-API-key endpoint to list Azure deployments (that requires ARM management auth). The v1 Azure OpenAI endpoint does expose /models with the resource's available model catalog, which is good enough for picker prefill in the common case. Users on private/gated endpoints fall through to manual entry.	2026-04-25 18:48:43 -07:00
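The probe order described in 731e1ef8cb can be sketched roughly as below. This is an illustration only, not the code in hermes_cli/azure_detect.py: the function name `detect_api_mode`, the injected `fetch_models` callable, and the `"manual"` return value are assumptions, and the Anthropic Messages probe (step 3) is left out.

```python
from typing import Callable, Optional


def detect_api_mode(
    base_url: str,
    fetch_models: Callable[[str], Optional[dict]],
) -> tuple[str, list[str]]:
    """Return (api_mode, model_ids) using sniff -> /models probe -> manual.

    `fetch_models` is a hypothetical hook that performs GET on the given
    URL and returns the decoded JSON body, or None on any HTTP error.
    """
    base = base_url.rstrip("/")

    # Step 1: URL path sniff. A path ending in /anthropic is treated as
    # an Anthropic-style route, so no network call is needed.
    if base.endswith("/anthropic"):
        return "anthropic_messages", []

    # Step 2: GET <base>/models. An OpenAI-shaped list carries its
    # entries under "data", each with an "id".
    payload = fetch_models(base + "/models")
    if isinstance(payload, dict) and isinstance(payload.get("data"), list):
        ids = [m["id"] for m in payload["data"] if isinstance(m, dict) and "id" in m]
        if ids:
            return "chat_completions", ids

    # Step 4: nothing detected, so the caller falls back to manual entry.
    return "manual", []
```

Passing the HTTP call in as a parameter keeps the sketch testable without a network, which is also why private or gated endpoints simply land in the manual branch.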
akhater	ac57114284	fix(agent): support Azure OpenAI gpt-5.x on chat/completions endpoint Azure OpenAI exposes an OpenAI-compatible endpoint at `{resource}.openai.azure.com/openai/v1` that accepts the standard `openai` Python client. Two issues prevented gpt-5.x models from working: 1. `_max_tokens_param()` only sent `max_completion_tokens` for `api.openai.com` URLs. Azure also requires `max_completion_tokens` for gpt-5.x models. 2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x to Responses API. Azure does NOT support the Responses API — it serves gpt-5.x on the regular `/chat/completions` path, causing a 404. Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs. - `_max_tokens_param()` now returns `max_completion_tokens` for Azure. - The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on `chat_completions` where Azure actually serves it. - The fallback-provider api_mode picker also recognises Azure and stays on chat_completions. - Tests cover max_tokens routing, api_mode behaviour, and URL detection. gpt-4.x models on Azure are unaffected (already used chat_completions + max_tokens, which Azure accepts for those models). Salvage of PR #10086 — rewritten against current main where the codex_responses upgrade gate gained copilot-acp / explicit-api_mode exclusions.	2026-04-25 18:48:43 -07:00
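A minimal sketch of the routing rule in ac57114284, assuming hostname-based matching. The real `_is_azure_openai_url()` and `_max_tokens_param()` may match differently and take other arguments; the signatures here are hypothetical.

```python
from urllib.parse import urlparse


def is_azure_openai_url(base_url: str) -> bool:
    """True when the host is openai.azure.com or a subdomain of it."""
    host = (urlparse(base_url).hostname or "").lower()
    return host == "openai.azure.com" or host.endswith(".openai.azure.com")


def max_tokens_param(base_url: str, value: int) -> dict:
    """Pick the token-limit keyword for a chat/completions request.

    Assumption: api.openai.com and Azure OpenAI take
    max_completion_tokens, everything else takes max_tokens.
    """
    host = (urlparse(base_url).hostname or "").lower()
    if host == "api.openai.com" or is_azure_openai_url(base_url):
        return {"max_completion_tokens": value}
    return {"max_tokens": value}
```

Note that `urlparse` only yields a hostname when the URL includes a scheme, so a bare `myres.openai.azure.com/openai/v1` would not match in this sketch.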
Teknium	125de02056	fix(context): honor custom_providers context_length on /model switch + bump probe tier to 256K (#15844 ) Fixes #15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback. ## What changed New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching. `agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe). Wired through five call sites that previously either duplicated the lookup or ignored it entirely: - `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning) - `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch - `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg - `gateway/run.py` /model confirmation (picker callback + text path) - `gateway/run.py` `_format_session_info` (/info) ## Context probe tiers `CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency. ## Repro (from #15779) ```yaml custom_providers: - name: my-custom-endpoint base_url: https://example.invalid/v1 model: gpt-5.5 models: gpt-5.5: context_length: 1050000 ``` `/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000". 
## Tests - `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants - `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver - `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K) - `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier	2026-04-25 18:47:53 -07:00
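The per-model override lookup from 125de02056 might look something like the sketch below. It mirrors the config shape in the repro above, but the function name and signature are assumptions rather than the actual `get_custom_provider_context_length()` helper.

```python
from typing import Optional


def custom_provider_context_length(
    custom_providers: Optional[list],
    base_url: str,
    model_id: str,
) -> Optional[int]:
    """Return custom_providers[].models.<id>.context_length, if set.

    Base URLs are compared with trailing slashes stripped, so
    "https://host/v1" and "https://host/v1/" hit the same entry.
    """
    target = base_url.rstrip("/")
    for entry in custom_providers or []:
        if str(entry.get("base_url", "")).rstrip("/") != target:
            continue
        model_cfg = (entry.get("models") or {}).get(model_id)
        if not isinstance(model_cfg, dict):
            continue
        value = model_cfg.get("context_length")
        # Skip invalid values (bools, strings, zero or negative numbers).
        if isinstance(value, int) and not isinstance(value, bool) and value > 0:
            return value
    return None
```

Returning `None` on a miss lets the caller continue down the rest of the resolution chain instead of treating the absence of an override as an error.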
ekko	0a15dbdc43	feat(api_server): add POST /v1/runs/{run_id}/stop endpoint Add ability to interrupt a running agent via the runs API. Previously /v1/runs could start a run and subscribe to events, but there was no way to cancel it. The new endpoint stores agent and task references during execution, calls agent.interrupt() to stop LLM calls, then cancels the asyncio task. Includes 15 tests covering start, events, and stop scenarios.	2026-04-25 18:40:35 -07:00
Wysie	1d80e92c7e	test(discord): add guild to fake e2e messages	2026-04-25 18:25:56 -07:00
voidborne-d	45e1228a8a	fix(cli): suppress OSError EIO on interrupt shutdown When the user interrupts a long-running task, prompt_toolkit tries to flush stdout during emergency shutdown. If stdout is in a broken state (redirected to /dev/null, pipe closed, terminal gone), the flush raises `OSError: [Errno 5] Input/output error` which propagates unhandled and crashes the CLI. Two defense layers: 1. `_suppress_closed_loop_errors`: add `OSError` with `errno.EIO` to the asyncio exception handler, matching the existing pattern for `RuntimeError("Event loop is closed")` and `KeyError("is not registered")`. 2. Outer `except (KeyError, OSError)` block: add `errno.EIO` check before the existing string-match guards, silently suppressing the error instead of printing a misleading stdin-related message. Fixes #13710.	2026-04-25 18:25:13 -07:00
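The first defense layer in 45e1228a8a can be approximated with a standard asyncio exception handler. The function below is a sketch under that assumption; the real `_suppress_closed_loop_errors` may be structured differently.

```python
import errno


def suppress_closed_loop_errors(loop, context):
    """asyncio exception handler that swallows known shutdown noise."""
    exc = context.get("exception")

    # stdout is gone or broken (closed pipe, vanished terminal, ...).
    if isinstance(exc, OSError) and exc.errno == errno.EIO:
        return
    # Patterns the commit message says were already suppressed.
    if isinstance(exc, RuntimeError) and "Event loop is closed" in str(exc):
        return
    if isinstance(exc, KeyError) and "is not registered" in str(exc):
        return

    # Anything else still goes through asyncio's default reporting.
    loop.default_exception_handler(context)
```

Such a handler would be installed with `loop.set_exception_handler(suppress_closed_loop_errors)`. Checking `exc.errno` rather than the message text keeps the match independent of locale and platform wording.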
nerijusas	81e01f6ee9	fix(agent): preserve Codex message items for replay	2026-04-25 18:22:06 -07:00
Teknium	8bbeaea6c7	fix(config): broaden api-key ref lookup to templated base_url The raw-template lookup added in PR #15817 went through `get_compatible_custom_providers(read_raw_config())`, which calls `_normalize_custom_provider_entry` → `urlparse(base_url)`. Any entry whose `base_url` is itself an env-ref (`${NEURALWATT_API_BASE}`) was dropped as 'not a valid URL', so `api_key_ref` stayed empty and the resolved secret was still written to `model.api_key` — the exact case the original Discord report described. Replace the normalizer-gated lookup with a direct read of `raw['custom_providers']` and `raw['providers']`, indexed by name (case-insensitive, optionally qualified by model) so the loaded (expanded) entry can be matched regardless of how `base_url` is written. Add an integration regression test driving the real `select_provider_and_model` entry point with the Discord-reported NeuralWatt config (`${VAR}` in both `base_url` and `api_key`). This test fails on the PR-only fix and passes with the broadened lookup.	2026-04-25 18:10:52 -07:00
helix4u	1fdc31b214	fix(config): preserve custom provider api key refs	2026-04-25 18:10:52 -07:00
helix4u	b2d3308f98	fix(doctor): accept bare custom provider	2026-04-25 18:01:36 -07:00
Iris Jin	25ba6a4a74	fix(gateway): make reasoning session-scoped by default	2026-04-25 18:01:31 -07:00
FocusFlow Dev	ad0ac89478	fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages Previously _copy_reasoning_content_for_api only padded reasoning_content when the assistant message had tool_calls. DeepSeek V4 thinking mode requires the field on every assistant turn, including plain text replies without tool_calls. - Remove the 'source_msg.get("tool_calls") and' guard - Update test: plain assistant turns now get padded for DeepSeek/Kimi Fixes #15213	2026-04-26 07:47:13 +08:00
Brooklyn Nicholson	c6fdf48b79	fix(tui): sync inference model after switches - keep HERMES_INFERENCE_MODEL aligned with HERMES_MODEL after in-TUI model switches - clarify static provider detection remapping docs	2026-04-25 14:17:57 -05:00
Brooklyn Nicholson	fdcbd2257b	fix(tui): resolve startup model aliases statically - expand short model aliases like sonnet/opus via static catalogs during startup runtime resolution - keep startup alias resolution network-free and add regression tests in models and tui gateway suites	2026-04-25 14:13:02 -05:00
Brooklyn Nicholson	5e52011de3	fix(tui): bind provider as model alias	2026-04-25 13:58:59 -05:00
Brooklyn Nicholson	e48a497d16	fix(tui): share static model detection	2026-04-25 13:56:16 -05:00
Brooklyn Nicholson	2dfcc8087a	fix(tui): avoid network lookup during startup	2026-04-25 13:47:18 -05:00
Brooklyn Nicholson	4db58d45d4	fix(tui): address startup provider review	2026-04-25 13:29:15 -05:00
Brooklyn Nicholson	57b43fdd4b	fix(tui): preserve provider precedence on startup	2026-04-25 13:25:43 -05:00
Brooklyn Nicholson	e9c47c7042	fix(tui): honor launch model overrides	2026-04-25 13:21:59 -05:00
brooklyn!	ee0728c6c4	Merge pull request #15351 from helix4u/fix/tui-rebuild-missing-ink-bundle fix(tui): rebuild when ink bundle is missing	2026-04-25 13:14:23 -05:00
kshitij	648b89911f	fix: use output_text for assistant message content in Codex Responses API (#15690) The Codex Responses API rejects input_text inside assistant messages; only output_text and refusal are valid content types for assistant role. _chat_content_to_responses_parts() previously hardcoded all text content to input_text regardless of the message role. When an assistant message had list-format content (multimodal or structured), this produced invalid input_text parts that the API rejected with: Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'. Fix: add a role parameter to _chat_content_to_responses_parts() that selects output_text for assistant messages and input_text for user messages. Thread this through _chat_messages_to_responses_input() and _preflight_codex_input_items(). Fixes #15687	2026-04-25 10:13:29 -07:00
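A reduced sketch of the role-aware conversion described in 648b89911f. It handles only text parts and uses an assumed signature; the real `_chat_content_to_responses_parts()` also has to deal with images and other part types.

```python
def chat_content_to_responses_parts(content, role="user"):
    """Convert chat-style content into Responses-style text parts.

    Assistant text becomes output_text; all other roles use input_text.
    Non-text parts are ignored in this sketch.
    """
    text_type = "output_text" if role == "assistant" else "input_text"

    if isinstance(content, str):
        return [{"type": text_type, "text": content}]

    parts = []
    for item in content or []:
        if isinstance(item, dict) and item.get("type") == "text":
            parts.append({"type": text_type, "text": item.get("text", "")})
    return parts
```

Defaulting `role` to `"user"` keeps existing callers on the old behavior, so only call sites that pass the assistant role see a change.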
kshitijk4poor	7c17accb29	fix: /stop now immediately aborts streaming retry loop When a user sends /stop during a streaming API call, the outer poll loop detects _interrupt_requested and closes the HTTP connection. However, the inner _call() thread catches the connection error and enters its retry loop — opening a FRESH connection without checking the interrupt flag. On slow providers like ollama-cloud, each retry attempt blocks for the full stream-read timeout (120s+). With 3 retry attempts this caused 510+ second delays between /stop and actual response — the agent appeared completely unresponsive despite the stop being acknowledged. Fix: add an _interrupt_requested check at the top of the streaming retry loop so the agent exits immediately instead of retrying. Also fix log truncation: all session key logging in gateway/run.py used [:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437' (33 chars) to 'agent:main:telegram:' — losing the identifying chat type and user ID. Replace with full keys to make logs debuggable. Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.	2026-04-25 09:51:39 -07:00
Teknium	ea01bdcebe	refactor(memory): remove flush_memories entirely (#15696 ) The AIAgent.flush_memories pre-compression save, the gateway _flush_memories_for_session, and everything feeding them are obsolete now that the background memory/skill review handles persistent memory extraction. Problems with flush_memories: - Pre-dates the background review loop. It was the only memory-save path when introduced; the background review now fires every 10 user turns on CLI and gateway alike, which is far more frequent than compression or session reset ever triggered flush. - Blocking and synchronous. Pre-compression flush ran on the live agent before compression, blocking the user-visible response. - Cache-breaking. Flush built a temporary conversation prefix (system prompt + memory-only tool list) that diverged from the live conversation's cached prefix, invalidating prompt caching. The gateway variant spawned a fresh AIAgent with its own clean prompt for each finalized session — still cache-breaking, just in a different process. - Redundant. Background review runs in the live conversation's session context, gets the same content, writes to the same memory store, and doesn't break the cache. Everything flush_memories claimed to preserve is already covered. 
What this removes: - AIAgent.flush_memories() method (~248 LOC in run_agent.py) - Pre-compression flush call in _compress_context - flush_memories call sites in cli.py (/new + exit) - GatewayRunner._flush_memories_for_session + _async_flush_memories (and the 3 call sites: session expiry watcher, /new, /resume) - 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks, hermes tools UI task list, auxiliary_client docstrings - _memory_flush_min_turns config + init - #15631's headroom-deduction math in _check_compression_model_feasibility (headroom was only needed because flush dragged the full main-agent system prompt along; the compression summariser sends a single user-role prompt so new_threshold = aux_context is safe again) - The dedicated test files and assertions that exercised flush-specific paths What this renames (with read-time backcompat on sessions.json): - SessionEntry.memory_flushed -> SessionEntry.expiry_finalized. The session-expiry watcher still uses the flag to avoid re-running finalize/eviction on the same expired session; the new name reflects what it now actually gates. from_dict() reads 'expiry_finalized' first, falls back to the legacy 'memory_flushed' key so existing sessions.json files upgrade seamlessly. Supersedes #15631 and #15638. Tested: 383 targeted tests pass across run_agent/, agent/, cli/, and gateway/ session-boundary suites. No behavior regressions — background memory review continues to handle persistent memory extraction on both CLI and gateway.	2026-04-25 08:21:14 -07:00
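The read-time backcompat for the renamed flag in ea01bdcebe could be sketched as follows. The `session_id` field is a placeholder; the real `SessionEntry` carries more fields than shown here.

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    session_id: str  # placeholder field for illustration
    expiry_finalized: bool = False

    @classmethod
    def from_dict(cls, data: dict) -> "SessionEntry":
        # Prefer the new key; fall back to the legacy 'memory_flushed'
        # key so older sessions.json files keep their state.
        flag = data.get("expiry_finalized", data.get("memory_flushed", False))
        return cls(session_id=data["session_id"], expiry_finalized=bool(flag))
```

Because the fallback happens only on read, files are upgraded lazily: the legacy key disappears the next time the entry is written back under the new name.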
kshitijk4poor	d635e2df3f	fix(compression): pass provider to context length resolver in feasibility check _check_compression_model_feasibility calls get_model_context_length without provider=, so Codex OAuth users get 1,050,000 (from models.dev for 'openai') instead of the actual 272,000 limit. This happens because _infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'), skipping the Codex-specific resolution branch entirely. Result: compression threshold set at 85% of 1.05M = 892K — conversations never trigger compression, the context grows unbounded, and when gateway hygiene eventually forces compression, the Codex endpoint drops the oversized streaming request ('peer closed connection without sending complete message body'). Fix: forward self.provider to get_model_context_length so provider- specific resolution branches (Codex OAuth 272K, Copilot live /models, Nous suffix-match) fire correctly. Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).	2026-04-25 07:09:47 -07:00
Teknium	af22421e87	feat(dashboard): page-scoped plugin slots for built-in pages (#15658 ) * fix(terminal): three-layer defense against watch_patterns notification spam Background processes that stack notify_on_complete=True with watch_patterns can flood the user with duplicate, delayed notifications — matches deliver asynchronously via the completion queue and continue arriving minutes after the process has exited. The docstring warning against this (PR #12113) has proven insufficient; agents still misuse the combination. Three layered defenses, each sufficient on its own: 1. Mutual exclusion (terminal_tool.py): When both flags are set on a background process, drop watch_patterns with a warning. notify_on_complete wins because 'let me know when it's done' is the more useful signal and fires exactly once. Extracted as _resolve_notification_flag_conflict() so the rule is testable in isolation. 2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now bails the moment session.exited is True. Post-exit chunks (buffered reads draining after the process is gone) no longer produce notifications. This is the fix flagged as future work in session 20260418_020302_79881c. 3. Global circuit breaker (process_registry.py): Per-session rate limits don't catch the sibling-flood case — N concurrent processes can each stay under 8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap trips a 30-second cooldown across ALL sessions, emits a single watch_overflow_tripped event, silently counts dropped events, and emits a watch_overflow_released summary when the cooldown ends. Also updates the tool schema + docstring to document the new behavior. Tests: 8 new tests covering all three fixes (suppress-after-exit x2, mutual-exclusion resolver x4, global breaker trip/cooldown/release x2). All 60 tests across test_watch_patterns.py, test_notify_on_complete.py, test_terminal_tool.py pass. 
Real-world trigger: self-inflicted in session 20260425_051924 — three concurrent hermes-sweeper review subprocesses each set watch_patterns= ['failed validation', 'errored'] AND notify_on_complete=True, then iterated over multiple items, producing enough matches per process to defeat the per-session cap while staying under the global cap that didn't yet exist. * fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion Per Teknium's direction, the watch_patterns rate limit is now much more aggressive and self-healing. ## New rule — per session - HARD cap: 1 watch-match notification per 15 seconds per process. - Any match arriving inside the cooldown window is dropped and counts as ONE strike for that window (many drops in the same window still = 1 strike). - After 3 consecutive strike windows, watch_patterns is permanently disabled for the session and the session is auto-promoted to notify_on_complete semantics — exactly one notification when the process actually exits. - A cooldown window that expires with zero drops resets the consecutive strike counter — healthy cadence is forgiven. ## Schema + docstring rewritten The tool schema description now gives the model explicit guidance: - notify_on_complete is 'the right choice for almost every long-running task' - watch_patterns is for RARE one-shot signals on LONG-LIVED processes - Do NOT use watch_patterns with loops/batch jobs — error patterns fire every iteration and will hit the strike limit fast - Mutual exclusion is stated on both parameter descriptions - 1/15s cooldown and 3-strike promotion are stated in the watch_patterns description so the model sees the contract every turn ## Removed - WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the new 1/15s limit subsumes both; keeping them would double-count. - _watch_window_hits / _watch_window_start / _watch_overload_since fields on ProcessSession. 
Replaced by _watch_last_emit_at / _watch_cooldown_until / _watch_strike_candidate / _watch_consecutive_strikes. ## Kept - Global circuit breaker across all sessions (15/10s → 30s cooldown) as a secondary safety net for concurrent siblings. Still valuable when 20 short-lived processes each fire once — none individually violates the per-session limit. - Suppress-after-exit guard. - Mutual exclusion resolver at the tool entry point. ## Tests - 6 new tests in TestPerSessionRateLimit covering: first match delivers, second in cooldown suppressed, multi-drop = single strike, 3 strikes disables + promotes, clean window resets counter, suppressed count carried to next emit. - Global circuit breaker tests rewritten to use fresh sessions instead of hacking removed per-window fields. - 50/50 watch_patterns + notify_on_complete tests pass. - 60/60 including test_terminal_tool.py pass. * feat(dashboard): page-scoped plugin slots for built-in pages Dashboard plugins can now inject components into specific built-in pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs, Chat) without overriding the whole route. Previously, plugins could only: 1. Add new tabs (tab.path) 2. Replace whole built-in pages (tab.override) 3. Inject into global shell slots (header-, footer-, pre-main, ...) None of those let a plugin add a banner, card, or widget to an existing page. The new <page>:top / <page>:bottom slots close that gap, reusing the existing registerSlot() API. 
Changes - web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries (sessions:top, sessions:bottom, analytics:top, ..., chat:bottom), grouped under "Shell-wide" vs "Page-scoped" in the docblock - web/src/pages/*: each built-in page now renders <PluginSlot name="<page>:top" /> as the first child of its outer wrapper and <PluginSlot name="<page>:bottom" /> as the last child -- zero visual cost when no plugin registers - plugins/example-dashboard: registers a demo banner into sessions:top via registerSlot(), with matching slots entry in the manifest -- so freshly-setup users can see what page-scoped slots look like without writing any plugin code - website/docs: new "Page-scoped slots" table in the plugin authoring guide, with a worked example - tests/hermes_cli/test_web_server.py: round-trip test for colon-bearing slot names (sessions:top, analytics:bottom, ...) Validation - npm run build: clean (tsc -b + vite build, 2761 modules) - scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass	2026-04-25 06:55:35 -07:00
Teknium	97d54f0e4d	fix(terminal): three-layer defense against watch_patterns notification spam (#15642 ) * fix(terminal): three-layer defense against watch_patterns notification spam Background processes that stack notify_on_complete=True with watch_patterns can flood the user with duplicate, delayed notifications — matches deliver asynchronously via the completion queue and continue arriving minutes after the process has exited. The docstring warning against this (PR #12113) has proven insufficient; agents still misuse the combination. Three layered defenses, each sufficient on its own: 1. Mutual exclusion (terminal_tool.py): When both flags are set on a background process, drop watch_patterns with a warning. notify_on_complete wins because 'let me know when it's done' is the more useful signal and fires exactly once. Extracted as _resolve_notification_flag_conflict() so the rule is testable in isolation. 2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now bails the moment session.exited is True. Post-exit chunks (buffered reads draining after the process is gone) no longer produce notifications. This is the fix flagged as future work in session 20260418_020302_79881c. 3. Global circuit breaker (process_registry.py): Per-session rate limits don't catch the sibling-flood case — N concurrent processes can each stay under 8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap trips a 30-second cooldown across ALL sessions, emits a single watch_overflow_tripped event, silently counts dropped events, and emits a watch_overflow_released summary when the cooldown ends. Also updates the tool schema + docstring to document the new behavior. Tests: 8 new tests covering all three fixes (suppress-after-exit x2, mutual-exclusion resolver x4, global breaker trip/cooldown/release x2). All 60 tests across test_watch_patterns.py, test_notify_on_complete.py, test_terminal_tool.py pass. 
Real-world trigger: self-inflicted in session 20260425_051924 — three concurrent hermes-sweeper review subprocesses each set watch_patterns= ['failed validation', 'errored'] AND notify_on_complete=True, then iterated over multiple items, producing enough matches per process to defeat the per-session cap while staying under the global cap that didn't yet exist. * fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion Per Teknium's direction, the watch_patterns rate limit is now much more aggressive and self-healing. ## New rule — per session - HARD cap: 1 watch-match notification per 15 seconds per process. - Any match arriving inside the cooldown window is dropped and counts as ONE strike for that window (many drops in the same window still = 1 strike). - After 3 consecutive strike windows, watch_patterns is permanently disabled for the session and the session is auto-promoted to notify_on_complete semantics — exactly one notification when the process actually exits. - A cooldown window that expires with zero drops resets the consecutive strike counter — healthy cadence is forgiven. ## Schema + docstring rewritten The tool schema description now gives the model explicit guidance: - notify_on_complete is 'the right choice for almost every long-running task' - watch_patterns is for RARE one-shot signals on LONG-LIVED processes - Do NOT use watch_patterns with loops/batch jobs — error patterns fire every iteration and will hit the strike limit fast - Mutual exclusion is stated on both parameter descriptions - 1/15s cooldown and 3-strike promotion are stated in the watch_patterns description so the model sees the contract every turn ## Removed - WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the new 1/15s limit subsumes both; keeping them would double-count. - _watch_window_hits / _watch_window_start / _watch_overload_since fields on ProcessSession. 
Replaced by _watch_last_emit_at / _watch_cooldown_until / _watch_strike_candidate / _watch_consecutive_strikes. ## Kept - Global circuit breaker across all sessions (15/10s → 30s cooldown) as a secondary safety net for concurrent siblings. Still valuable when 20 short-lived processes each fire once — none individually violates the per-session limit. - Suppress-after-exit guard. - Mutual exclusion resolver at the tool entry point. ## Tests - 6 new tests in TestPerSessionRateLimit covering: first match delivers, second in cooldown suppressed, multi-drop = single strike, 3 strikes disables + promotes, clean window resets counter, suppressed count carried to next emit. - Global circuit breaker tests rewritten to use fresh sessions instead of hacking removed per-window fields. - 50/50 watch_patterns + notify_on_complete tests pass. - 60/60 including test_terminal_tool.py pass.	2026-04-25 06:41:58 -07:00
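The per-session rule in 97d54f0e4d (one notification per 15 seconds, one strike per window with drops, promotion after three consecutive strike windows) can be modeled as a small state machine. The class and field names below are illustrative and only loosely follow the `_watch_*` fields named in the commit; the sketch counts the strike at the first drop in a window, which may differ from the real bookkeeping.

```python
from dataclasses import dataclass

COOLDOWN_SECONDS = 15.0
MAX_CONSECUTIVE_STRIKES = 3


@dataclass
class WatchLimiter:
    cooldown_until: float = 0.0
    strike_candidate: bool = False  # a drop already happened this window
    consecutive_strikes: int = 0
    disabled: bool = False  # True once promoted to notify_on_complete

    def on_match(self, now: float) -> bool:
        """Return True if this watch match should be delivered."""
        if self.disabled:
            return False

        if now < self.cooldown_until:
            # Inside the cooldown window: drop the match. Many drops in
            # the same window still count as a single strike.
            if not self.strike_candidate:
                self.strike_candidate = True
                self.consecutive_strikes += 1
                if self.consecutive_strikes >= MAX_CONSECUTIVE_STRIKES:
                    self.disabled = True
            return False

        # The previous window has expired. A window with zero drops
        # resets the streak, so a healthy cadence is forgiven.
        if not self.strike_candidate:
            self.consecutive_strikes = 0
        self.strike_candidate = False
        self.cooldown_until = now + COOLDOWN_SECONDS
        return True
```

Taking `now` as an argument instead of reading the clock inside the method makes the strike logic straightforward to test with synthetic timestamps.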
Teknium	ac05daa189	fix(tools): dedupe bundled plugin toolsets with built-in entries (#15634) `hermes tools` → "reconfigure existing" listed Spotify twice because the Apr 24 refactor that moved Spotify into plugins/spotify/ (PR #15174) left the entry in CONFIGURABLE_TOOLSETS. _get_effective_configurable_toolsets() unconditionally appended get_plugin_toolsets() on top, so the same 'spotify' key showed up from both sources. Dedupe by key: the built-in CONFIGURABLE_TOOLSETS entry wins (it has the nicer label and description). Also guards against future bundled plugins that share a toolset key with a built-in.	2026-04-25 05:53:08 -07:00
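The dedupe rule from ac05daa189 amounts to an order-preserving merge keyed on the toolset name. The sketch assumes each entry is a `(key, label, description)` tuple, which is a guess at the shape rather than something the commit states.

```python
def merge_toolsets(builtin: list, plugin: list) -> list:
    """Append plugin toolsets, skipping keys already seen.

    Built-in entries come first, so they win any key collision.
    """
    merged = list(builtin)
    seen = {entry[0] for entry in builtin}
    for entry in plugin:
        key = entry[0]
        if key in seen:
            continue  # already provided by a built-in or earlier plugin
        seen.add(key)
        merged.append(entry)
    return merged
```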
Teknium	3c1c65e754	fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633) Generalize the temperature-specific 400 retry that shipped in PR #15621 so the same reactive strategy covers any provider that rejects an arbitrary request parameter, not just temperature. - agent/auxiliary_client.py: * New _is_unsupported_parameter_error(exc, param): matches the same six phrasings the old temperature detector did plus 'unrecognized parameter' and 'invalid parameter', against any named param. * _is_unsupported_temperature_error is now a thin back-compat wrapper so existing imports and tests keep working. * The max_tokens → max_completion_tokens retry branch in call_llm and async_call_llm now (a) gates on 'max_tokens is not None' so we do not pop a key that was never set and silently substitute a None value on the retry, and (b) also matches the generic helper in addition to the legacy 'max_tokens' / 'unsupported_parameter' substring checks, picking up phrasings like 'Unknown parameter: max_tokens' that previously slipped through. - tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering the generic detector across params, the back-compat wrapper, and the two hardenings to the max_tokens retry branch (None gate + generic phrasing). Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR also proposed the reactive temperature retry which landed independently via PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages the remaining hardening ideas onto current main.	2026-04-25 05:50:34 -07:00
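A rough sketch of the generic detector and the hardened retry from 3c1c65e754. The phrase list is illustrative and is not the set of phrasings the real helper matches; `retry_kwargs_for_max_tokens` is a hypothetical helper standing in for the retry branch inside call_llm.

```python
from typing import Optional

# Illustrative phrasings only; the real detector's list is not shown
# in the commit message.
_PHRASES = (
    "unsupported parameter",
    "unknown parameter",
    "unrecognized parameter",
    "invalid parameter",
)


def is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
    """True when the error text rejects the named request parameter."""
    msg = str(exc).lower()
    return param.lower() in msg and any(p in msg for p in _PHRASES)


def is_unsupported_temperature_error(exc: Exception) -> bool:
    """Back-compat wrapper around the generic detector."""
    return is_unsupported_parameter_error(exc, "temperature")


def retry_kwargs_for_max_tokens(exc: Exception, kwargs: dict) -> Optional[dict]:
    """Return kwargs for one retry with max_completion_tokens, or None."""
    # Gate: never substitute a value that was not set in the first place.
    if kwargs.get("max_tokens") is None:
        return None
    if not is_unsupported_parameter_error(exc, "max_tokens"):
        return None
    retry = dict(kwargs)  # leave the caller's kwargs untouched
    retry["max_completion_tokens"] = retry.pop("max_tokens")
    return retry
```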
Teknium	f92006ce1c	fix(compression): reserve system+tools headroom when aux binds threshold (#15631) When the auxiliary compression model's context is smaller than the main model's compression threshold, _check_compression_model_feasibility auto-lowers the session threshold. Previously it set: new_threshold = aux_context This let the raw message list grow to exactly aux_context tokens. But compression and flush_memories actually send system_prompt + tool_schemas + messages to the aux model. With 50+ tools that overhead is 25-30K tokens, so the full request overflowed aux with HTTP 400. Subtract a headroom estimate from aux_context before setting the new threshold: the actual tool-schema token count (from estimate_request_tokens_rough) plus a 12K allowance for the system prompt (not yet built at __init__ time) and flush-instruction overhead. Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with an unusually heavy tool schema. This fixes the 'flush_memories overflow on busy toolsets' path that Teknium flagged, where main and aux can be nominally the same model but still 400 because the threshold left no room for the request overhead. Same fix also protects the normal compression summarisation request on the same binding aux. Tests: two new regression tests cover the headroom reservation and the MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new (lower) threshold values now that empty-tools still produces a 12K static headroom deduction.	2026-04-25 05:41:56 -07:00
Ash Rowan Vale 🌿	facea84559	fix(auxiliary): retry without temperature when any provider rejects it Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature' across all providers/models — not just Codex Responses. The same backend can accept temperature for some models and reject it for others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does not scale. call_llm / async_call_llm now detect the concrete 'unsupported parameter: temperature' 400 and transparently retry once without temperature. Kimi's server-managed omission and Opus 4.7+'s proactive strip stay in place — this is the safety net for everything else. Changes: - agent/auxiliary_client.py: add _is_unsupported_temperature_error helper; wire into both sync and async call_llm paths before the existing max_tokens/payment/auth retry ladder - tests/agent/test_unsupported_temperature_retry.py: 19 tests covering detector phrasings, sync + async retry, no-retry-without-temperature, and non-temperature 400s not triggering the retry Builds on PR #15620 (codex_responses fallback) which stripped temperature up front for that one api_mode. This PR closes the gap for every other provider/model combo via reactive retry. Credit: retry approach and detector originate from @BlueBirdBack's PR #15578. Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>	2026-04-25 05:27:17 -07:00
Teknium	f67a61dc93	fix(flush_memories): strip temperature from codex_responses fallback (#15620) The memory-flush fallback for api_mode='codex_responses' was unconditionally adding `temperature` to codex_kwargs before calling _run_codex_stream. The Responses API does not accept temperature on any supported backend: - chatgpt.com/backend-api/codex rejects it outright - api.openai.com + gpt-5/o-series reasoning models reject it - Copilot Responses rejects it on reasoning models. The CodexAuxiliaryClient adapter and the codex_responses transport both correctly omit temperature — the flush fallback was the only path putting it back. On errors from the primary aux path (e.g. expired OAuth token), users saw `⚠ Auxiliary memory flush failed: HTTP 400: Unsupported parameter: temperature`. Reported by Garik [NOUS] on GPT-5.5 via Codex OAuth Pro.	2026-04-25 05:01:25 -07:00
Teknium	6ed37e0f42	feat(tools): make discord/discord_admin opt-in, Discord-only Both discord (read/participate) and discord_admin (server admin) are now configurable via `hermes tools` with default-OFF. Previously the core discord tool (fetch_messages, search_members, create_thread) auto-loaded on every Discord install with DISCORD_BOT_TOKEN set — 19 tools the user never opted into. Adds a platform-scoping mechanism (_TOOLSET_PLATFORM_RESTRICTIONS) so the discord toolsets only show up in the Discord platform's checklist, not on CLI/Telegram/Slack/etc. Applied at four gates: - _prompt_toolset_checklist: checklist filter - _get_platform_tools: resolution filter (both branches) - _save_platform_tools: save-time filter (covers 'Configure all platforms' and hand-edited config.yaml) - tools_disable_enable_command: rejects `hermes tools enable discord` on non-Discord platforms with a clear error build_session_context_prompt now injects the Discord IDs block only when both conditions hold: the discord/discord_admin toolset is enabled AND DISCORD_BOT_TOKEN is set. Toolset alone isn't enough — the tool's check_fn gates on the token at registry time, so opting in without a token yields no tools and the IDs block would lie. Otherwise keep the stale-API disclaimer.	2026-04-25 04:51:11 -07:00
alt-glitch	db09477b77	feat(feishu): wire feishu doc/drive tools into hermes-feishu composite The feishu_doc and feishu_drive tools were registered in the tool registry but never added to the hermes-feishu composite toolset. The pipeline fix from the prior commit now recovers them automatically once they are in the composite.	2026-04-25 04:50:14 -07:00
alt-glitch	81987f0350	feat(discord): split discord_server into discord + discord_admin tools Split the monolithic discord_server tool (14 actions) into two: - discord: core actions (fetch_messages, search_members, create_thread) that are useful for the agent's normal operation. Auto-enabled on the discord platform via the pipeline fix. - discord_admin: server management actions (list channels/roles, pins, role assignment) that require explicit opt-in via hermes tools. Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.	2026-04-25 04:50:14 -07:00
alt-glitch	9830905dab	fix(tools): recover non-configurable toolsets from composite resolution The reverse-mapping loop in _get_platform_tools only checked CONFIGURABLE_TOOLSETS, silently dropping platform-specific toolsets like discord and feishu_doc whose tools were in the composite but had no configurable key. Add a second pass over TOOLSETS that picks up unclaimed toolsets whose tools are present in the resolved composite.	2026-04-25 04:50:14 -07:00
Teknium	0d548d1db9	fix(cron): wire context_from through the update action The tool schema promised 'On update, pass an empty array to clear' but the update branch ignored the context_from kwarg entirely — users could set the field at create time and never modify or clear it afterward. - tools/cronjob_tools.py: handle context_from in the update branch the same way script/enabled_toolsets/workdir are handled: normalize str/list to refs, validate each referenced job exists (same check the create branch does), store as list-or-None to match create_job()'s shape. Empty string or empty list clears the field. - tests/cron/test_cron_context_from.py: 6 new tests covering add/change/ clear (both shapes)/bad-ref/preserve-across-unrelated-update.	2026-04-25 04:49:28 -07:00
MorAlekss	eb92222811	fix(cron): silent skip when context_from job has no output yet	2026-04-25 04:49:28 -07:00
MorAlekss	e4a91ccb76	test(cron): add PermissionError coverage for context_from	2026-04-25 04:49:28 -07:00
MorAlekss	5ac5365923	feat(cron): add context_from field for cron job output chaining	2026-04-25 04:49:28 -07:00
alt-glitch	9d7b64b5dd	fix(tools): normalize numeric entries and clear stale no_mcp in _save_platform_tools YAML parses bare numeric toolset names (e.g. 12306:) as int, causing TypeError in sorted() since the read path normalizes to str but the save path did not. The no_mcp sentinel was preserved in existing entries even when the user re-enabled MCP servers, causing MCP to stay silently disabled.	2026-04-25 04:49:02 -07:00
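The `TypeError` fixed in 9d7b64b5dd comes from sorting a list that mixes `int` and `str`, which Python 3 refuses to compare. A minimal sketch of the normalization, assuming a hypothetical helper name `normalize_toolset_names` (the commit does not show the actual code, and the `no_mcp` handling is not covered here):

```python
def normalize_toolset_names(entries):
    """Coerce every toolset name to str before sorting.

    YAML loads a bare key such as 12306 as an int, so a saved list can mix
    ints and strings; sorted() on such a list raises TypeError.
    """
    return sorted(str(entry) for entry in entries)


if __name__ == "__main__":
    mixed = [12306, "web", "terminal"]
    try:
        sorted(mixed)
    except TypeError as exc:
        print("unsorted mix fails:", exc)
    print(normalize_toolset_names(mixed))
```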
vominh1919	5401a0080d	fix: recalculate token budgets on model switch in ContextCompressor update_model() recalculated threshold_tokens but left tail_token_budget and max_summary_tokens at their __init__ values. When switching from a 200K model to 32K, the tail budget stayed at ~20K tokens (62% of 32K) instead of the intended ~10%. Adds budget recalculation in update_model() and 2 regression tests.	2026-04-25 15:07:56 +05:30
Teknium	023b1bff11	fix(delegate): resolve subagent approval prompts without deadlocking parent TUI (#15491) Subagents run inside a ThreadPoolExecutor. The CLI's interactive approval callback lives in tools/terminal_tool.py's threading.local(), which worker threads do not inherit. When a subagent hits a dangerous-command guard, prompt_dangerous_approval() falls back to input() from the worker thread, deadlocking against the parent's prompt_toolkit TUI that owns stdin. Fix: install a non-interactive callback into every subagent worker thread via ThreadPoolExecutor(initializer=set_approval_callback, initargs=(cb,)). The callback is config-gated by delegation.subagent_auto_approve: false (default) -> _subagent_auto_deny (safe; matches leaf tool blocklist); true -> _subagent_auto_approve (opt-in YOLO for cron/batch). Both emit a logger.warning audit line. Gateway sessions are unaffected because they resolve approvals via tools/approval.py's per-session queue, not through these TLS callbacks. Diagnosis credit: @MorAlekss (#14685). - hermes_cli/config.py: DEFAULT_CONFIG.delegation.subagent_auto_approve: False - cli-config.yaml.example: documented, commented (default) - tools/delegate_tool.py: _subagent_auto_deny, _subagent_auto_approve, _get_subagent_approval_callback, wired into the child timeout executor - tests/tools/test_delegate.py: 7 tests covering defaults, truthy coercion, and TLS scoping in the worker thread	2026-04-24 22:37:22 -07:00
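The thread-local scoping problem behind 023b1bff11 can be reproduced with the standard library alone. In this sketch the function names (`set_approval_callback`, `get_approval_callback`, `auto_deny`, `worker_check`, `run_in_pool`) are illustrative stand-ins and not the repository's implementations; only the `ThreadPoolExecutor(initializer=..., initargs=...)` mechanism is taken from the commit message.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Thread-local storage: each thread sees only the attributes it set itself.
_tls = threading.local()


def set_approval_callback(cb):
    _tls.approval_callback = cb


def get_approval_callback():
    return getattr(_tls, "approval_callback", None)


def auto_deny(command: str) -> bool:
    """Non-interactive callback: refuse every dangerous command."""
    return False


def worker_check(command: str) -> str:
    cb = get_approval_callback()
    if cb is None:
        # In the scenario from the commit, this is where code would fall
        # back to input() and contend with the parent TUI for stdin.
        return "no-callback"
    return "approved" if cb(command) else "denied"


def run_in_pool(with_initializer: bool) -> str:
    kwargs = {}
    if with_initializer:
        # The initializer runs once in every worker thread as it starts.
        kwargs = {"initializer": set_approval_callback, "initargs": (auto_deny,)}
    with ThreadPoolExecutor(max_workers=2, **kwargs) as pool:
        return pool.submit(worker_check, "rm -rf /tmp/x").result()


if __name__ == "__main__":
    # A callback set on the main thread is invisible to pool workers.
    set_approval_callback(lambda command: True)
    print(run_in_pool(with_initializer=False))
    print(run_in_pool(with_initializer=True))
```

Passing the callback through `initargs` keeps the choice of deny or approve in one place, which is where a config flag such as `delegation.subagent_auto_approve` could select it.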
Clifford Garwood	2182de55bb	fix(matrix): drop needless DeviceID import + mock put_device_id in tests Two adjustments to make CI pass: - In gateway/platforms/matrix.py: `DeviceID` is `NewType("DeviceID", str)`, so passing `client.device_id` directly (already a str) works identically at runtime. The explicit import was cosmetic and tripped CI environments where `mautrix.types` doesn't re-export DeviceID at the expected path ("cannot import name 'DeviceID' from 'mautrix.types' (unknown location)"). - In tests/gateway/test_matrix.py: add `put_device_id` to the hand-written `PgCryptoStore` fake so the three encryption-path tests (test_connect_with_access_token_and_encryption, test_connect_uses_configured_device_id_over_whoami, test_connect_registers_encrypted_event_handler_when_encryption_on) can exercise the new crypto-store binding without AttributeError.	2026-04-25 07:17:03 +05:30
Teknium	05d8f11085	fix(/model): show provider-enforced context length, not raw models.dev (#15438) /model gpt-5.5 on openai-codex showed 'Context: 1,050,000 tokens' because the display block used ModelInfo.context_window directly from models.dev. Codex OAuth actually enforces 272K for the same slug, and the agent's compressor already runs at 272K via get_model_context_length() — so the banner + real context budget said 272K while /model lied with 1M. Route the display context through a new resolve_display_context_length() helper that always prefers agent.model_metadata.get_model_context_length (which knows about Codex OAuth, Copilot, Nous caps) and only falls back to models.dev when that returns nothing. Fix applied to all 3 /model display sites: cli.py _handle_model_switch; gateway/run.py picker on_model_selected callback; gateway/run.py text-fallback confirmation. Reported by @emilstridell (Telegram, April 2026).	2026-04-24 17:21:38 -07:00
Jérôme Benoit	c34d3f4807	fix(skills): factor HERMES_HOME resolution into shared _hermes_home helper The three google-workspace scripts (setup.py, google_api.py, gws_bridge.py) each had their own way of resolving HERMES_HOME: - setup.py imported hermes_constants (crashes outside Hermes process) - google_api.py used os.getenv inline (no strip, no empty handling) - gws_bridge.py defined its own local get_hermes_home() (duplicate) Extract the common logic into _hermes_home.py which: - Delegates to hermes_constants when available (profile support, etc.) - Falls back to os.getenv with .strip() + empty-as-unset handling - Provides display_hermes_home() with ~/ shortening for profiles All three scripts now import from _hermes_home instead of duplicating. 7 regression tests cover the fallback path: env var override, default ~/.hermes, empty env var, display shortening, profile paths, and custom non-home paths. Closes #12722	2026-04-24 16:45:27 -07:00
simbam99	19a3e2ce8e	fix(gateway): follow compression continuations during /resume	2026-04-24 16:42:31 -07:00
Teknium	d58b305adf	refactor(deepseek-reasoning): consolidate detection into helpers + regression tests Extracts _needs_kimi_tool_reasoning() for symmetry with the existing _needs_deepseek_tool_reasoning() helper, so _copy_reasoning_content_for_api uses the same detection logic as _build_assistant_message. Future changes to either provider's signals now only touch one function. Adds tests/run_agent/test_deepseek_reasoning_content_echo.py covering: - All 3 DeepSeek detection signals (provider, model, host) - Poisoned history replay (empty string fallback) - Plain assistant turns NOT padded - Explicit reasoning_content preserved - Reasoning field promoted to reasoning_content - Existing Kimi/Moonshot detection intact - Non-thinking providers left alone 21 tests, all pass.	2026-04-24 16:38:29 -07:00
Benjamin Sehl	f731c2c2bd	fix(gateway/bluebubbles): align iMessage delivery with non-editable UX	2026-04-24 16:04:37 -07:00
Brian D. Evans	00c3d848d8	fix(memory): skip external-provider sync on interrupted turns (#15218 ) ``run_conversation`` was calling ``memory_manager.sync_all( original_user_message, final_response)`` at the end of every turn where both args were present. That gate didn't consider the ``interrupted`` local flag, so an external memory backend received partial assistant output, aborted tool chains, or mid-stream resets as durable conversational truth. Downstream recall then treated the not-yet-real state as if the user had seen it complete, poisoning the trust boundary between "what the user took away from the turn" and "what Hermes was in the middle of producing when the interrupt hit". Extracted the inline sync block into a new private method ``AIAgent._sync_external_memory_for_turn(original_user_message, final_response, interrupted)`` so the interrupt guard is a single visible check at the top of the method instead of hidden in a boolean-and at the call site. That also gives tests a clean seam to assert on — the pre-fix layout buried the logic inside the 3,000-line ``run_conversation`` function where no focused test could reach it. The new method encodes three independent skip conditions: 1. ``interrupted`` → skip entirely (the #15218 fix). Applies even when ``final_response`` and ``original_user_message`` happen to be populated — an interrupt may have landed between a streamed reply and the next tool call, so the strings on disk are not actually the turn the user took away. 2. No memory manager / no final_response / no user message → preserve existing skip behaviour (nothing new for providerless sessions, system-initiated refreshes, tool-only turns that never resolved, etc.). 3. Sync_all / queue_prefetch_all exceptions → swallow. External memory providers are strictly best-effort; a misconfigured or offline backend must never block the user from seeing their response. 
The prefetch side-effect is gated on the same interrupt flag: the user's next message is almost certainly a retry of the same intent, and a prefetch keyed on the interrupted turn would fire against stale context. ### Tests (16 new, all passing on py3.11 venv) ``tests/run_agent/test_memory_sync_interrupted.py`` exercises the helper directly on a bare ``AIAgent`` (``__new__`` pattern that the interrupt-propagation tests already use). Coverage: - Interrupted turn with full-looking response → no sync (the fix) - Interrupted turn with long assistant output → no sync (the interrupt could have landed mid-stream; strings-on-disk lie) - Normal completed turn → sync_all + queue_prefetch_all both called with the right args (regression guard for the positive path) - No final_response / no user_message / no memory manager → existing pre-fix skip paths still apply - sync_all raises → exception swallowed, prefetch still attempted - queue_prefetch_all raises → exception swallowed after sync succeeded - 8-case parametrised matrix across (interrupted × final_response × original_user_message) asserts sync fires iff interrupted=False AND both strings are non-empty Closes #15218 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 15:30:18 -07:00
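The three skip conditions listed for 00c3d848d8 can be condensed into a small sketch. It is written as a free function with a hypothetical name and a boolean return value for easy testing; the real change is a private method on `AIAgent`, and the argument passed to `queue_prefetch_all` here is an assumption, since the commit message does not give its signature.

```python
def sync_external_memory_for_turn(memory_manager, user_message, final_response, interrupted):
    """Sync a completed turn to external memory; return True if a sync was attempted.

    Hypothetical sketch of the guard order described in the commit message.
    """
    # 1. Interrupted turns are never synced, even if both strings look complete.
    if interrupted:
        return False
    # 2. Pre-existing skip paths: no manager, no user message, or no response.
    if memory_manager is None or not user_message or not final_response:
        return False
    # 3. External providers are best-effort: swallow their failures.
    try:
        memory_manager.sync_all(user_message, final_response)
    except Exception:
        pass
    try:
        # Assumed signature: prefetch keyed on the user message.
        memory_manager.queue_prefetch_all(user_message)
    except Exception:
        pass
    return True
```

Keeping the interrupt check first means the prefetch is skipped on interrupted turns as well, which matches the note that a retry would otherwise fire against stale context.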


2543 Commits