hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-28 06:51:16 +08:00

Author	SHA1	Message	Date
Brooklyn Nicholson	648da6a8d1	feat(gui): make desktop setup flow real and testable Add a GUI-first setup gate and runtime state API so desktop onboarding is safe, iterative, and works with isolated fresh-mode installs. Scaffold and wire the desktop shell/runtime pieces so this branch runs end-to-end without disturbing existing user installs.	2026-04-25 19:48:02 -05:00
brooklyn!	edc78e258c	Merge pull request #15766 from NousResearch/bb/tui-ssh-copy fix(tui): honor client copy shortcut over ssh	2026-04-25 15:33:17 -05:00
Brooklyn Nicholson	31d7f1951a	fix(tui): clamp copied selection bounds Clamp copied selection columns to the screen width before scanning rendered cells.	2026-04-25 15:32:45 -05:00
Brooklyn Nicholson	b1c18e5a41	refactor(tui): format screen imports Keep screen.ts import ordering aligned with the ui-tui formatter.	2026-04-25 15:26:51 -05:00
Brooklyn Nicholson	bd66e55a02	fix(tui): track rendered spaces for selection copy - add a written-cell bitmap so selection can distinguish rendered spaces from blank padding - preserve code indentation without markdown-specific rendering hacks	2026-04-25 15:21:26 -05:00
Brooklyn Nicholson	1735ced93b	fix(tui): preserve code block indentation in selection Render code indentation spaces as selectable cells so copied fenced code keeps its leading whitespace.	2026-04-25 15:17:36 -05:00
Brooklyn Nicholson	bba16943f6	fix(tui): preserve rendered indentation in selections - trim only empty edge rows instead of full selected text - bound selection paint using unwritten cells so rendered indentation remains copyable	2026-04-25 15:14:26 -05:00
Brooklyn Nicholson	132620ba3d	refactor(tui): simplify remote copy hotkey hints Use an explicit conditional table instead of spread casting for SSH copy hint rows.	2026-04-25 15:09:12 -05:00
Brooklyn Nicholson	876bb60044	fix(tui): trim whitespace-only selection chrome - clamp selection highlight to real row content so blank drag margins do not render or copy - keep successful copy actions quiet while preserving usage and failure feedback	2026-04-25 15:07:29 -05:00
Brooklyn Nicholson	a68793b6c4	refactor(tui): share remote shell detection Reuse the platform helper for SSH-aware copy hints so hotkey display and input handling cannot drift.	2026-04-25 14:55:28 -05:00
Brooklyn Nicholson	bcc5362432	fix(tui): honor client copy shortcut over ssh - accept forwarded Cmd+C for selection copy in SSH sessions even when Hermes runs on Linux - keep local Linux Alt+C from acting as copy and update TUI hotkey hints for remote shells	2026-04-25 14:44:39 -05:00
brooklyn!	283c8fd6e2	Merge pull request #15755 from NousResearch/bb/tui-model-flag fix(tui): honor launch model overrides	2026-04-25 14:30:26 -05:00
Brooklyn Nicholson	919274b60e	fix(tui): align overlay q shortcut casing Keep shared overlay close behavior consistent with pager and agents overlays by binding lowercase q only.	2026-04-25 14:26:35 -05:00
Brooklyn Nicholson	6e83d90eb4	refactor(tui): tighten overlay helpers - rename overlay help text component to match its role - share picker window math across model, session, and skills overlays	2026-04-25 14:23:45 -05:00
Brooklyn Nicholson	c6fdf48b79	fix(tui): sync inference model after switches - keep HERMES_INFERENCE_MODEL aligned with HERMES_MODEL after in-TUI model switches - clarify static provider detection remapping docs	2026-04-25 14:17:57 -05:00
Brooklyn Nicholson	a046483e86	fix(tui): share overlay close controls - add reusable overlay key and help-text helpers for picker-style overlays - make model, session, skills, and pager hints consistently support Esc/q close behavior	2026-04-25 14:17:04 -05:00
Brooklyn Nicholson	fdcbd2257b	fix(tui): resolve startup model aliases statically - expand short model aliases like sonnet/opus via static catalogs during startup runtime resolution - keep startup alias resolution network-free and add regression tests in models and tui gateway suites	2026-04-25 14:13:02 -05:00
Brooklyn Nicholson	48bdd2445e	fix(tui): apply ui-tui fix pass and restore type-check - run the requested ui-tui lint+format pass and include resulting formatting updates - guard text-measure cache eviction key in hermes-ink so ui-tui type-check stays green	2026-04-25 14:08:54 -05:00
Brooklyn Nicholson	5e52011de3	fix(tui): bind provider as model alias	2026-04-25 13:58:59 -05:00
Brooklyn Nicholson	e48a497d16	fix(tui): share static model detection	2026-04-25 13:56:16 -05:00
Brooklyn Nicholson	2dfcc8087a	fix(tui): avoid network lookup during startup	2026-04-25 13:47:18 -05:00
Brooklyn Nicholson	4db58d45d4	fix(tui): address startup provider review	2026-04-25 13:29:15 -05:00
Brooklyn Nicholson	57b43fdd4b	fix(tui): preserve provider precedence on startup	2026-04-25 13:25:43 -05:00
Brooklyn Nicholson	e9c47c7042	fix(tui): honor launch model overrides	2026-04-25 13:21:59 -05:00
brooklyn!	ee0728c6c4	Merge pull request #15351 from helix4u/fix/tui-rebuild-missing-ink-bundle fix(tui): rebuild when ink bundle is missing	2026-04-25 13:14:23 -05:00
kshitij	648b89911f	fix: use output_text for assistant message content in Codex Responses API (#15690) The Codex Responses API rejects input_text inside assistant messages; only output_text and refusal are valid content types for the assistant role. _chat_content_to_responses_parts() previously hardcoded all text content to input_text regardless of the message role. When an assistant message had list-format content (multimodal or structured), this produced invalid input_text parts that the API rejected with: Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'. Fix: add a role parameter to _chat_content_to_responses_parts() that selects output_text for assistant messages and input_text for user messages. Thread this through _chat_messages_to_responses_input() and _preflight_codex_input_items(). Fixes #15687	2026-04-25 10:13:29 -07:00
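The role-dependent mapping described in 648b89911f can be sketched roughly as follows. This is an illustration only: `content_parts_for_role` and the part dictionaries are hypothetical and do not reproduce the real `_chat_content_to_responses_parts()` signature.

```python
from typing import Any, Dict, List


def content_parts_for_role(content: List[Dict[str, Any]], role: str) -> List[Dict[str, Any]]:
    """Convert chat-style text parts to Responses-style parts (hypothetical helper)."""
    # Assistant text must be tagged output_text; other roles use input_text.
    text_type = "output_text" if role == "assistant" else "input_text"
    parts: List[Dict[str, Any]] = []
    for part in content:
        if part.get("type") == "text":
            parts.append({"type": text_type, "text": part.get("text", "")})
        else:
            # Assumption: non-text parts pass through unchanged in this sketch.
            parts.append(part)
    return parts
```

Calling it with `role="assistant"` yields `output_text` parts, which is the content type the commit message says the API accepts for that role.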
kshitijk4poor	7c17accb29	fix: /stop now immediately aborts streaming retry loop When a user sends /stop during a streaming API call, the outer poll loop detects _interrupt_requested and closes the HTTP connection. However, the inner _call() thread catches the connection error and enters its retry loop — opening a FRESH connection without checking the interrupt flag. On slow providers like ollama-cloud, each retry attempt blocks for the full stream-read timeout (120s+). With 3 retry attempts this caused 510+ second delays between /stop and actual response — the agent appeared completely unresponsive despite the stop being acknowledged. Fix: add an _interrupt_requested check at the top of the streaming retry loop so the agent exits immediately instead of retrying. Also fix log truncation: all session key logging in gateway/run.py used [:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437' (33 chars) to 'agent:main:telegram:' — losing the identifying chat type and user ID. Replace with full keys to make logs debuggable. Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.	2026-04-25 09:51:39 -07:00
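A minimal sketch of the fix in 7c17accb29, assuming the interrupt flag can be modelled as a `threading.Event`. The function name, the `open_stream` callable, and the attempt count are placeholders, not the project's actual code.

```python
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    open_stream: Callable[[], T],
    interrupt: threading.Event,
    max_attempts: int = 3,
) -> Optional[T]:
    """Retry a streaming call, but stop as soon as an interrupt is requested."""
    last_error: Optional[Exception] = None
    for _attempt in range(max_attempts):
        # Check the flag at the top of every iteration so that /stop never
        # triggers a fresh connection attempt.
        if interrupt.is_set():
            return None
        try:
            return open_stream()
        except ConnectionError as exc:
            last_error = exc
    raise RuntimeError("all streaming attempts failed") from last_error
```

Without the check at the top of the loop, a connection closed by the interrupt would be treated like any other transient error and retried.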
Teknium	5006b2204b	fix(update): honor RestartSec when polling for gateway respawn (#15707) The post-graceful-drain is-active poll used a fixed 10s timeout, but systemd's hermes-gateway.service has RestartSec=30, so systemd won't respawn the unit for 30s after exit-75, and our poll gives up during the cooldown. Result: every 'hermes update' printed "⚠ hermes-gateway drained but didn't relaunch — forcing restart" followed by a redundant 'systemctl restart' that kicked the newly respawning gateway again (and restarted WhatsApp / Discord a second time in the process). Fix: read RestartUSec from the unit via 'systemctl show' and set the poll budget to max(10s, RestartSec + 10s slack). Units without RestartSec set (or value=infinity) fall back to the original 10s. Observed timeline from journalctl before fix: 08:56:22.262 old PID exits 75; 08:56:32.707 systemd logs Stopped -> Started (10.4s gap, > 10s budget). After the fix the poll covers 40s (RestartSec + 10s slack). Validation: - RestartUSec parser tested against '30s', '100ms', '1min 30s', 'infinity', '', 'garbage', '500us', '2min': all correct. - Against the live hermes-gateway.service: parses to 30.0s. - tests/hermes_cli/test_update_gateway_restart.py: 41/41 pass.	2026-04-25 09:08:27 -07:00
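The budget rule in 5006b2204b, max(10s, RestartSec + 10s slack), could look something like the sketch below. The parser is a simplified assumption that handles only the unit suffixes mentioned in the validation list plus `h`; the function names are hypothetical.

```python
import re
from typing import Optional

# Seconds per unit suffix; a subset of systemd's time-span units.
_UNIT_SECONDS = {"us": 1e-6, "ms": 1e-3, "s": 1.0, "min": 60.0, "h": 3600.0}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(us|ms|s|min|h)")


def parse_restart_usec(value: str) -> Optional[float]:
    """Parse a value such as '1min 30s' into seconds; None if unset or unparseable."""
    text = value.strip()
    if not text or text == "infinity":
        return None
    total = 0.0
    for token in text.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            return None  # e.g. 'garbage'
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    return total


def poll_budget_seconds(value: str, floor: float = 10.0, slack: float = 10.0) -> float:
    """Return max(floor, RestartSec + slack), falling back to the floor."""
    restart = parse_restart_usec(value)
    if restart is None:
        return floor
    return max(floor, restart + slack)
```

With `RestartUSec=30s` this gives a 40-second budget, matching the figure quoted in the commit message.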
Teknium	a9fa73a620	feat(oneshot): add --model / --provider / HERMES_INFERENCE_MODEL (#15704) Makes hermes -z usable by sweeper without mutating user config. - Top-level -m/--model and --provider flags that apply to -z/--oneshot (mirrors hermes chat's plumbing). - HERMES_INFERENCE_MODEL env var as the parallel to HERMES_INFERENCE_PROVIDER for CI / scripted invocations. - resolve_runtime_provider() gets the requested provider; when --model is given without --provider, detect_provider_for_model() auto-selects the provider that serves it (same semantic as /model in an interactive session). - --provider without --model errors out with exit 2: carrying a config model across to a different provider is usually wrong, and silently picking the provider's catalog default hides the mismatch. Config defaults are still used when both flags are omitted (existing behavior). Validation (all live against OpenRouter): - -z 'x' -> uses config default (opus-4.7) - -z 'x' --model haiku-4.5 -> haiku-4.5 via auto-detected openrouter - -z 'x' --model ... --provider -> pair as given - HERMES_INFERENCE_MODEL=... -z -> haiku-4.5 via env var - -z 'x' --provider anthropic -> exits 2 with error to stderr	2026-04-25 08:55:36 -07:00
Teknium	7c8c031f60	feat: add `hermes -z <prompt>` one-shot mode (#15702 ) * feat: add `hermes -z <prompt>` one-shot mode Top-level flag that runs a single prompt and prints ONLY the final response text to stdout. No banner, no spinner, no tool previews, no session_id line — stdout is machine-readable, stderr is silent. Tools, memory, rules, and AGENTS.md in the CWD are loaded as normal. Approvals are auto-bypassed (sets HERMES_YOLO_MODE=1 for the call). Bypasses cli.py entirely — goes straight to AIAgent.chat(). * feat(oneshot): handle interactive-callback gaps explicitly Document (and where needed, patch) the interactive surfaces that have no user to answer in oneshot mode: - clarify — inject a callback that tells the agent to pick the best default and continue (previously returned a generic 'not available in this execution context' error that wastes a tool call) - sudo password — terminal_tool already gates on HERMES_INTERACTIVE (we don't set it); sudo fails gracefully - shell hooks — HERMES_ACCEPT_HOOKS=1 auto-approves; also falls back to deny on non-tty stdin - dangerous cmd — HERMES_YOLO_MODE=1 short-circuits before input() - secret capture— tool returns gracefully when no callback wired Live-tested: agent asked clarify(['red','blue']) and got 'red' back, replied with only 'red'.	2026-04-25 08:44:38 -07:00
Teknium	ea01bdcebe	refactor(memory): remove flush_memories entirely (#15696 ) The AIAgent.flush_memories pre-compression save, the gateway _flush_memories_for_session, and everything feeding them are obsolete now that the background memory/skill review handles persistent memory extraction. Problems with flush_memories: - Pre-dates the background review loop. It was the only memory-save path when introduced; the background review now fires every 10 user turns on CLI and gateway alike, which is far more frequent than compression or session reset ever triggered flush. - Blocking and synchronous. Pre-compression flush ran on the live agent before compression, blocking the user-visible response. - Cache-breaking. Flush built a temporary conversation prefix (system prompt + memory-only tool list) that diverged from the live conversation's cached prefix, invalidating prompt caching. The gateway variant spawned a fresh AIAgent with its own clean prompt for each finalized session — still cache-breaking, just in a different process. - Redundant. Background review runs in the live conversation's session context, gets the same content, writes to the same memory store, and doesn't break the cache. Everything flush_memories claimed to preserve is already covered. 
What this removes: - AIAgent.flush_memories() method (~248 LOC in run_agent.py) - Pre-compression flush call in _compress_context - flush_memories call sites in cli.py (/new + exit) - GatewayRunner._flush_memories_for_session + _async_flush_memories (and the 3 call sites: session expiry watcher, /new, /resume) - 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks, hermes tools UI task list, auxiliary_client docstrings - _memory_flush_min_turns config + init - #15631's headroom-deduction math in _check_compression_model_feasibility (headroom was only needed because flush dragged the full main-agent system prompt along; the compression summariser sends a single user-role prompt so new_threshold = aux_context is safe again) - The dedicated test files and assertions that exercised flush-specific paths What this renames (with read-time backcompat on sessions.json): - SessionEntry.memory_flushed -> SessionEntry.expiry_finalized. The session-expiry watcher still uses the flag to avoid re-running finalize/eviction on the same expired session; the new name reflects what it now actually gates. from_dict() reads 'expiry_finalized' first, falls back to the legacy 'memory_flushed' key so existing sessions.json files upgrade seamlessly. Supersedes #15631 and #15638. Tested: 383 targeted tests pass across run_agent/, agent/, cli/, and gateway/ session-boundary suites. No behavior regressions — background memory review continues to handle persistent memory extraction on both CLI and gateway.	2026-04-25 08:21:14 -07:00
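The read-time backcompat for the `memory_flushed` -> `expiry_finalized` rename might be sketched as below. Only the two key names come from the commit message; the `session_id` field and the dataclass shape are placeholders for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SessionEntry:
    session_id: str  # placeholder field for this sketch
    expiry_finalized: bool = False

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SessionEntry":
        # Prefer the new key; fall back to the legacy one so that older
        # sessions.json files still load.
        if "expiry_finalized" in data:
            flag = data["expiry_finalized"]
        else:
            flag = data.get("memory_flushed", False)
        return cls(session_id=data["session_id"], expiry_finalized=bool(flag))
```

Checking for the new key's presence, rather than its truthiness, keeps an explicit `expiry_finalized: false` from being overridden by a stale legacy value.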
kshitijk4poor	d635e2df3f	fix(compression): pass provider to context length resolver in feasibility check _check_compression_model_feasibility calls get_model_context_length without provider=, so Codex OAuth users get 1,050,000 (from models.dev for 'openai') instead of the actual 272,000 limit. This happens because _infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'), skipping the Codex-specific resolution branch entirely. Result: compression threshold set at 85% of 1.05M = 892K — conversations never trigger compression, the context grows unbounded, and when gateway hygiene eventually forces compression, the Codex endpoint drops the oversized streaming request ('peer closed connection without sending complete message body'). Fix: forward self.provider to get_model_context_length so provider- specific resolution branches (Codex OAuth 272K, Copilot live /models, Nous suffix-match) fire correctly. Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).	2026-04-25 07:09:47 -07:00
Teknium	cf2fabc40f	docs(dashboard): document page-scoped plugin slots (#15662 ) Follow-up to PR #15658. The feature PR introduced page-scoped slots (<page>:top / <page>:bottom inside every built-in page) but only touched the Shell slots catalogue. Adds proper narrative coverage so plugin authors find the feature. Changes - extending-the-dashboard.md: - Frontmatter description + intro bullet now mention page-scoped slots - New TOC entry "Augmenting built-in pages (page-scoped slots)" - New dedicated subsection after "Replacing built-in pages" explaining the heavy-vs-light tradeoff, listing the pages that expose slots, and showing a worked manifest + IIFE example with tab.hidden: true - Cross-link from the tab.override section pointing readers to the lighter augmentation option - web-dashboard.md: - Bullet mentioning "page-scoped slots (inject widgets into built-in pages without overriding them)" Validation - TOC anchor "#augmenting-built-in-pages-page-scoped-slots" matches the generated heading slug - Code fences balanced (64, even) - Pre-existing docusaurus build errors (skills.json, api-server.md link) reproduce on bare main -- not introduced here	2026-04-25 06:59:24 -07:00
Teknium	af22421e87	feat(dashboard): page-scoped plugin slots for built-in pages (#15658 ) * fix(terminal): three-layer defense against watch_patterns notification spam Background processes that stack notify_on_complete=True with watch_patterns can flood the user with duplicate, delayed notifications — matches deliver asynchronously via the completion queue and continue arriving minutes after the process has exited. The docstring warning against this (PR #12113) has proven insufficient; agents still misuse the combination. Three layered defenses, each sufficient on its own: 1. Mutual exclusion (terminal_tool.py): When both flags are set on a background process, drop watch_patterns with a warning. notify_on_complete wins because 'let me know when it's done' is the more useful signal and fires exactly once. Extracted as _resolve_notification_flag_conflict() so the rule is testable in isolation. 2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now bails the moment session.exited is True. Post-exit chunks (buffered reads draining after the process is gone) no longer produce notifications. This is the fix flagged as future work in session 20260418_020302_79881c. 3. Global circuit breaker (process_registry.py): Per-session rate limits don't catch the sibling-flood case — N concurrent processes can each stay under 8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap trips a 30-second cooldown across ALL sessions, emits a single watch_overflow_tripped event, silently counts dropped events, and emits a watch_overflow_released summary when the cooldown ends. Also updates the tool schema + docstring to document the new behavior. Tests: 8 new tests covering all three fixes (suppress-after-exit x2, mutual-exclusion resolver x4, global breaker trip/cooldown/release x2). All 60 tests across test_watch_patterns.py, test_notify_on_complete.py, test_terminal_tool.py pass. 
Real-world trigger: self-inflicted in session 20260425_051924 — three concurrent hermes-sweeper review subprocesses each set watch_patterns= ['failed validation', 'errored'] AND notify_on_complete=True, then iterated over multiple items, producing enough matches per process to defeat the per-session cap while staying under the global cap that didn't yet exist. * fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion Per Teknium's direction, the watch_patterns rate limit is now much more aggressive and self-healing. ## New rule — per session - HARD cap: 1 watch-match notification per 15 seconds per process. - Any match arriving inside the cooldown window is dropped and counts as ONE strike for that window (many drops in the same window still = 1 strike). - After 3 consecutive strike windows, watch_patterns is permanently disabled for the session and the session is auto-promoted to notify_on_complete semantics — exactly one notification when the process actually exits. - A cooldown window that expires with zero drops resets the consecutive strike counter — healthy cadence is forgiven. ## Schema + docstring rewritten The tool schema description now gives the model explicit guidance: - notify_on_complete is 'the right choice for almost every long-running task' - watch_patterns is for RARE one-shot signals on LONG-LIVED processes - Do NOT use watch_patterns with loops/batch jobs — error patterns fire every iteration and will hit the strike limit fast - Mutual exclusion is stated on both parameter descriptions - 1/15s cooldown and 3-strike promotion are stated in the watch_patterns description so the model sees the contract every turn ## Removed - WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the new 1/15s limit subsumes both; keeping them would double-count. - _watch_window_hits / _watch_window_start / _watch_overload_since fields on ProcessSession. 
Replaced by _watch_last_emit_at / _watch_cooldown_until / _watch_strike_candidate / _watch_consecutive_strikes. ## Kept - Global circuit breaker across all sessions (15/10s → 30s cooldown) as a secondary safety net for concurrent siblings. Still valuable when 20 short-lived processes each fire once — none individually violates the per-session limit. - Suppress-after-exit guard. - Mutual exclusion resolver at the tool entry point. ## Tests - 6 new tests in TestPerSessionRateLimit covering: first match delivers, second in cooldown suppressed, multi-drop = single strike, 3 strikes disables + promotes, clean window resets counter, suppressed count carried to next emit. - Global circuit breaker tests rewritten to use fresh sessions instead of hacking removed per-window fields. - 50/50 watch_patterns + notify_on_complete tests pass. - 60/60 including test_terminal_tool.py pass. * feat(dashboard): page-scoped plugin slots for built-in pages Dashboard plugins can now inject components into specific built-in pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs, Chat) without overriding the whole route. Previously, plugins could only: 1. Add new tabs (tab.path) 2. Replace whole built-in pages (tab.override) 3. Inject into global shell slots (header-, footer-, pre-main, ...) None of those let a plugin add a banner, card, or widget to an existing page. The new <page>:top / <page>:bottom slots close that gap, reusing the existing registerSlot() API. 
Changes - web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries (sessions:top, sessions:bottom, analytics:top, ..., chat:bottom), grouped under "Shell-wide" vs "Page-scoped" in the docblock - web/src/pages/*: each built-in page now renders <PluginSlot name="<page>:top" /> as the first child of its outer wrapper and <PluginSlot name="<page>:bottom" /> as the last child -- zero visual cost when no plugin registers - plugins/example-dashboard: registers a demo banner into sessions:top via registerSlot(), with matching slots entry in the manifest -- so freshly-setup users can see what page-scoped slots look like without writing any plugin code - website/docs: new "Page-scoped slots" table in the plugin authoring guide, with a worked example - tests/hermes_cli/test_web_server.py: round-trip test for colon-bearing slot names (sessions:top, analytics:bottom, ...) Validation - npm run build: clean (tsc -b + vite build, 2761 modules) - scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass	2026-04-25 06:55:35 -07:00
Teknium	97d54f0e4d	fix(terminal): three-layer defense against watch_patterns notification spam (#15642 ) * fix(terminal): three-layer defense against watch_patterns notification spam Background processes that stack notify_on_complete=True with watch_patterns can flood the user with duplicate, delayed notifications — matches deliver asynchronously via the completion queue and continue arriving minutes after the process has exited. The docstring warning against this (PR #12113) has proven insufficient; agents still misuse the combination. Three layered defenses, each sufficient on its own: 1. Mutual exclusion (terminal_tool.py): When both flags are set on a background process, drop watch_patterns with a warning. notify_on_complete wins because 'let me know when it's done' is the more useful signal and fires exactly once. Extracted as _resolve_notification_flag_conflict() so the rule is testable in isolation. 2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now bails the moment session.exited is True. Post-exit chunks (buffered reads draining after the process is gone) no longer produce notifications. This is the fix flagged as future work in session 20260418_020302_79881c. 3. Global circuit breaker (process_registry.py): Per-session rate limits don't catch the sibling-flood case — N concurrent processes can each stay under 8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap trips a 30-second cooldown across ALL sessions, emits a single watch_overflow_tripped event, silently counts dropped events, and emits a watch_overflow_released summary when the cooldown ends. Also updates the tool schema + docstring to document the new behavior. Tests: 8 new tests covering all three fixes (suppress-after-exit x2, mutual-exclusion resolver x4, global breaker trip/cooldown/release x2). All 60 tests across test_watch_patterns.py, test_notify_on_complete.py, test_terminal_tool.py pass. 
Real-world trigger: self-inflicted in session 20260425_051924 — three concurrent hermes-sweeper review subprocesses each set watch_patterns= ['failed validation', 'errored'] AND notify_on_complete=True, then iterated over multiple items, producing enough matches per process to defeat the per-session cap while staying under the global cap that didn't yet exist. * fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion Per Teknium's direction, the watch_patterns rate limit is now much more aggressive and self-healing. ## New rule — per session - HARD cap: 1 watch-match notification per 15 seconds per process. - Any match arriving inside the cooldown window is dropped and counts as ONE strike for that window (many drops in the same window still = 1 strike). - After 3 consecutive strike windows, watch_patterns is permanently disabled for the session and the session is auto-promoted to notify_on_complete semantics — exactly one notification when the process actually exits. - A cooldown window that expires with zero drops resets the consecutive strike counter — healthy cadence is forgiven. ## Schema + docstring rewritten The tool schema description now gives the model explicit guidance: - notify_on_complete is 'the right choice for almost every long-running task' - watch_patterns is for RARE one-shot signals on LONG-LIVED processes - Do NOT use watch_patterns with loops/batch jobs — error patterns fire every iteration and will hit the strike limit fast - Mutual exclusion is stated on both parameter descriptions - 1/15s cooldown and 3-strike promotion are stated in the watch_patterns description so the model sees the contract every turn ## Removed - WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the new 1/15s limit subsumes both; keeping them would double-count. - _watch_window_hits / _watch_window_start / _watch_overload_since fields on ProcessSession. 
Replaced by _watch_last_emit_at / _watch_cooldown_until / _watch_strike_candidate / _watch_consecutive_strikes. ## Kept - Global circuit breaker across all sessions (15/10s → 30s cooldown) as a secondary safety net for concurrent siblings. Still valuable when 20 short-lived processes each fire once — none individually violates the per-session limit. - Suppress-after-exit guard. - Mutual exclusion resolver at the tool entry point. ## Tests - 6 new tests in TestPerSessionRateLimit covering: first match delivers, second in cooldown suppressed, multi-drop = single strike, 3 strikes disables + promotes, clean window resets counter, suppressed count carried to next emit. - Global circuit breaker tests rewritten to use fresh sessions instead of hacking removed per-window fields. - 50/50 watch_patterns + notify_on_complete tests pass. - 60/60 including test_terminal_tool.py pass.	2026-04-25 06:41:58 -07:00
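The per-session rule described in 97d54f0e4d (one notification per 15 seconds, one strike per cooldown window with drops, disable after three consecutive strike windows) can be sketched as a small state machine. The class and attribute names are hypothetical, and counting the strike at the first drop of a window is an assumption of this sketch.

```python
from typing import Optional


class WatchLimiter:
    """Illustrative per-session watch_patterns limiter (not the project's code)."""

    def __init__(self, cooldown: float = 15.0, max_strikes: int = 3) -> None:
        self.cooldown = cooldown
        self.max_strikes = max_strikes
        self.cooldown_until: Optional[float] = None
        self.window_had_drop = False
        self.consecutive_strikes = 0
        self.disabled = False

    def on_match(self, now: float) -> bool:
        """Return True if a notification should be delivered at time `now`."""
        if self.disabled:
            return False
        if self.cooldown_until is not None and now < self.cooldown_until:
            # Inside the cooldown: drop. Many drops in one window = one strike.
            if not self.window_had_drop:
                self.window_had_drop = True
                self.consecutive_strikes += 1
                if self.consecutive_strikes >= self.max_strikes:
                    self.disabled = True  # caller would promote to notify_on_complete
            return False
        # The previous window expired; a clean window resets the strike count.
        if self.cooldown_until is not None and not self.window_had_drop:
            self.consecutive_strikes = 0
        self.window_had_drop = False
        self.cooldown_until = now + self.cooldown
        return True
```

Passing the timestamp in explicitly keeps the sketch deterministic; a real caller would presumably supply a monotonic clock reading.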
Teknium	6e561ffa6d	fix(update): poll is-active instead of one-shot sleep(3) after gateway restart (#15639) The auto-restart path in `hermes update` verifies systemd unit health with `time.sleep(3)` + a single `systemctl is-active` call. The unit's Stopped -> Started transition after a graceful SIGUSR1 exit (or a hard restart) is not always complete inside that 3s window, so the verify races and reports 'drained but didn't relaunch' even though systemd is about to bring the unit back up a fraction of a second later. Users then see a spurious warning, a redundant fallback `systemctl restart` fires, and adapters (Discord, WhatsApp) get restarted twice. Replace the three sleep+oneshot sites with a small `_wait_for_service_active()` closure that polls `is-active` every 0.5s for up to 10s. Behaviour is unchanged when the unit is healthy or truly dead; only the race window around a clean restart is now handled correctly. Tests: tests/hermes_cli/test_update_gateway_restart.py (41/41).	2026-04-25 06:11:22 -07:00
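A polling helper in the spirit of `_wait_for_service_active()` might look like the sketch below. The signature is an assumption; the health check is injected as a callable so the loop can be exercised without systemd.

```python
import time
from typing import Callable


def wait_for_active(
    is_active: Callable[[], bool],
    timeout: float = 10.0,
    interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_active() every `interval` seconds until it succeeds or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if is_active():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

In practice `is_active` could wrap a call such as `subprocess.run(["systemctl", "is-active", "--quiet", unit]).returncode == 0`, where `unit` is whatever service name the caller is checking.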
Teknium	ac05daa189	fix(tools): dedupe bundled plugin toolsets with built-in entries (#15634) `hermes tools` → "reconfigure existing" listed Spotify twice because the Apr 24 refactor that moved Spotify into plugins/spotify/ (PR #15174) left the entry in CONFIGURABLE_TOOLSETS. _get_effective_configurable_toolsets() unconditionally appended get_plugin_toolsets() on top, so the same 'spotify' key showed up from both sources. Dedupe by key: the built-in CONFIGURABLE_TOOLSETS entry wins (it has the nicer label and description). Also guards against future bundled plugins that share a toolset key with a built-in.	2026-04-25 05:53:08 -07:00
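The dedupe-by-key rule from ac05daa189, where the built-in entry wins, can be illustrated as follows. The `(key, label, description)` tuple shape and the function name are assumptions made for this sketch.

```python
from typing import List, Tuple

Toolset = Tuple[str, str, str]  # (key, label, description), assumed shape


def merge_toolsets(builtin: List[Toolset], plugin: List[Toolset]) -> List[Toolset]:
    """Append plugin toolsets, skipping any whose key is already present."""
    seen = {entry[0] for entry in builtin}
    merged = list(builtin)
    for entry in plugin:
        if entry[0] not in seen:
            seen.add(entry[0])
            merged.append(entry)
    return merged
```

Because built-in entries are copied first and plugin entries are only appended when their key is new, a shared key such as `spotify` appears once, with the built-in label and description.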
Teknium	3c1c65e754	fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633) Generalize the temperature-specific 400 retry that shipped in PR #15621 so the same reactive strategy covers any provider that rejects an arbitrary request parameter, not just temperature. - agent/auxiliary_client.py: * New _is_unsupported_parameter_error(exc, param): matches the same six phrasings the old temperature detector did plus 'unrecognized parameter' and 'invalid parameter', against any named param. * _is_unsupported_temperature_error is now a thin back-compat wrapper so existing imports and tests keep working. * The max_tokens → max_completion_tokens retry branch in call_llm and async_call_llm now (a) gates on 'max_tokens is not None' so we do not pop a key that was never set and silently substitute a None value on the retry, and (b) also matches the generic helper in addition to the legacy 'max_tokens' / 'unsupported_parameter' substring checks, picking up phrasings like 'Unknown parameter: max_tokens' that previously slipped through. - tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering the generic detector across params, the back-compat wrapper, and the two hardenings to the max_tokens retry branch (None gate + generic phrasing). Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR also proposed the reactive temperature retry which landed independently via PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages the remaining hardening ideas onto current main.	2026-04-25 05:50:34 -07:00
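A rough sketch of a generic unsupported-parameter detector follows. It is not the real `_is_unsupported_parameter_error(exc, param)`: it takes the error message as a string, and its phrase list covers only the phrasings quoted in this log, not the full set the commit refers to.

```python
# Assumption: only the phrasings quoted in the commit messages are listed here.
_PHRASES = (
    "unsupported parameter",
    "unknown parameter",
    "unrecognized parameter",
    "invalid parameter",
)


def is_unsupported_parameter_error(message: str, param: str) -> bool:
    """Return True if the error text rejects the named request parameter."""
    text = message.lower()
    if param.lower() not in text:
        return False
    return any(phrase in text for phrase in _PHRASES)


def is_unsupported_temperature_error(message: str) -> bool:
    """Thin wrapper kept for callers that only care about temperature."""
    return is_unsupported_parameter_error(message, "temperature")
```

Requiring the parameter name to appear in the message keeps a rejection of one parameter from triggering a retry that strips a different one.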
Teknium	f92006ce1c	fix(compression): reserve system+tools headroom when aux binds threshold (#15631) When the auxiliary compression model's context is smaller than the main model's compression threshold, _check_compression_model_feasibility auto-lowers the session threshold. Previously it set new_threshold = aux_context, which let the raw message list grow to exactly aux_context tokens. But compression and flush_memories actually send system_prompt + tool_schemas + messages to the aux model. With 50+ tools that overhead is 25-30K tokens, so the full request overflowed aux with HTTP 400. Subtract a headroom estimate from aux_context before setting the new threshold: the actual tool-schema token count (from estimate_request_tokens_rough) plus a 12K allowance for the system prompt (not yet built at __init__ time) and flush-instruction overhead. Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with an unusually heavy tool schema. This fixes the 'flush_memories overflow on busy toolsets' path that Teknium flagged, where main and aux can be nominally the same model but still 400 because the threshold left no room for the request overhead. The same fix also protects the normal compression summarisation request on the same binding aux. Tests: two new regression tests cover the headroom reservation and the MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new (lower) threshold values now that empty-tools still produces a 12K static headroom deduction.	2026-04-25 05:41:56 -07:00
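The headroom arithmetic in f92006ce1c reduces to one clamped subtraction. In this sketch the function name is hypothetical and the floor is passed in as a parameter, because the value of MINIMUM_CONTEXT_LENGTH is not stated in the log.

```python
def lowered_threshold(
    aux_context: int,
    tool_schema_tokens: int,
    minimum_context: int,
    static_allowance: int = 12_000,
) -> int:
    """Reserve headroom for tool schemas plus a static allowance, clamped to a floor."""
    headroom = tool_schema_tokens + static_allowance
    return max(minimum_context, aux_context - headroom)
```

With an empty tool list the 12K static allowance still applies, which matches the commit's note that existing tests needed lower threshold values.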
Teknium	b35d692f45	chore(release): map ash@users.noreply.github.com to ash	2026-04-25 05:27:17 -07:00
Ash Rowan Vale 🌿	facea84559	fix(auxiliary): retry without temperature when any provider rejects it Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature' across all providers/models — not just Codex Responses. The same backend can accept temperature for some models and reject it for others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does not scale. call_llm / async_call_llm now detect the concrete 'unsupported parameter: temperature' 400 and transparently retry once without temperature. Kimi's server-managed omission and Opus 4.7+'s proactive strip stay in place — this is the safety net for everything else. Changes: - agent/auxiliary_client.py: add _is_unsupported_temperature_error helper; wire into both sync and async call_llm paths before the existing max_tokens/payment/auth retry ladder - tests/agent/test_unsupported_temperature_retry.py: 19 tests covering detector phrasings, sync + async retry, no-retry-without-temperature, and non-temperature 400s not triggering the retry Builds on PR #15620 (codex_responses fallback) which stripped temperature up front for that one api_mode. This PR closes the gap for every other provider/model combo via reactive retry. Credit: retry approach and detector originate from @BlueBirdBack's PR #15578. Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>	2026-04-25 05:27:17 -07:00
Teknium	f67a61dc93	fix(flush_memories): strip temperature from codex_responses fallback (#15620) The memory-flush fallback for api_mode='codex_responses' was unconditionally adding `temperature` to codex_kwargs before calling _run_codex_stream. The Responses API does not accept temperature on any supported backend: - chatgpt.com/backend-api/codex rejects it outright - api.openai.com + gpt-5/o-series reasoning models reject it - Copilot Responses rejects it on reasoning models The CodexAuxiliaryClient adapter and the codex_responses transport both correctly omit temperature — the flush fallback was the only path putting it back. On errors from the primary aux path (e.g. expired OAuth token), users saw `⚠ Auxiliary memory flush failed: HTTP 400: Unsupported parameter: temperature`. Reported by Garik [NOUS] on GPT-5.5 via Codex OAuth Pro.	2026-04-25 05:01:25 -07:00
Teknium	6ed37e0f42	feat(tools): make discord/discord_admin opt-in, Discord-only Both discord (read/participate) and discord_admin (server admin) are now configurable via `hermes tools` with default-OFF. Previously the core discord tool (fetch_messages, search_members, create_thread) auto-loaded on every Discord install with DISCORD_BOT_TOKEN set — 19 tools the user never opted into. Adds a platform-scoping mechanism (_TOOLSET_PLATFORM_RESTRICTIONS) so the discord toolsets only show up in the Discord platform's checklist, not on CLI/Telegram/Slack/etc. Applied at four gates: - _prompt_toolset_checklist: checklist filter - _get_platform_tools: resolution filter (both branches) - _save_platform_tools: save-time filter (covers 'Configure all platforms' and hand-edited config.yaml) - tools_disable_enable_command: rejects `hermes tools enable discord` on non-Discord platforms with a clear error build_session_context_prompt now injects the Discord IDs block only when both conditions hold: the discord/discord_admin toolset is enabled AND DISCORD_BOT_TOKEN is set. Toolset alone isn't enough — the tool's check_fn gates on the token at registry time, so opting in without a token yields no tools and the IDs block would lie. Otherwise keep the stale-API disclaimer.	2026-04-25 04:51:11 -07:00
alt-glitch	591deeb928	feat(session): inject Discord IDs block when discord tool is loaded When DISCORD_BOT_TOKEN is set — meaning the discord tool actually loads — emit a dedicated IDs block in the session context prompt so the agent can call ``fetch_messages``, ``pin_message``, etc. with real identifiers instead of probing. Currently only ``thread_id`` was exposed as a raw ID (via the ``description`` string). The agent in a Discord thread had to guess that the thread ID doubles as a channel ID for the REST API (it does), and it had no way to reference the parent channel, the guild, or the triggering message at all. The block adapts to context: - Thread: guild / parent channel / thread / message - Channel: guild / channel / message - (DM has no guild/channel IDs worth listing; only message) Discord isn't in _PII_SAFE_PLATFORMS, so IDs ship unredacted.	2026-04-25 04:51:11 -07:00
alt-glitch	5ae07e7b5c	fix(session): gate stale "no Discord APIs" note on DISCORD_BOT_TOKEN The Discord platform note in the session context prompt claimed the agent has no server-management APIs — pre-dating the discord tool. With a bot token configured the agent actually has fetch_messages, search_members, create_thread, and optionally the discord_admin tool; telling the model otherwise causes it to refuse or apologise for calls it is fully able to make. Gate the disclaimer on DISCORD_BOT_TOKEN being unset, matching the tool's own ``check_fn``. Without a token the note still appears and remains accurate; with a token the model is no longer gaslit into refusing valid tool calls.	2026-04-25 04:51:11 -07:00
alt-glitch	47b02e961c	feat(discord): populate guild_id, parent_chat_id, message_id on SessionSource Discord knows all four identifiers for every inbound message — guild, channel (or thread), parent channel when in a thread, and the triggering message. Pass them into ``SessionSource`` via the new ``build_source()`` kwargs so downstream code (context-prompt builder, delivery, logging) can use them without re-resolving from discord.py objects. For auto-threaded messages, remember the original channel as the parent before swapping ``chat_id`` to the freshly created thread. Behavioural: still a no-op — nothing consumes these fields yet.	2026-04-25 04:51:11 -07:00
alt-glitch	0702231dd8	feat(session): add guild_id/parent_chat_id/message_id to SessionSource Groundwork for injecting raw platform identifiers into the agent's system prompt. Currently only `thread_id` is exposed as a raw ID — callers in a Discord thread had to guess `channel_id == thread_id` (which happens to work because threads are channels in Discord's REST API) and had no way to reference the parent channel, guild, or the triggering message. Adds three optional fields: - `guild_id` — Discord guild / Slack workspace / Matrix server scope - `parent_chat_id` — parent channel when chat_id refers to a thread - `message_id` — ID of the triggering message (pin/reply/react) Extends `BasePlatformAdapter.build_source()` to accept + forward them and teaches `to_dict`/`from_dict` to serialize them. Behaviourally a no-op: nothing reads the fields yet and they default to None.	2026-04-25 04:51:11 -07:00
alt-glitch	db09477b77	feat(feishu): wire feishu doc/drive tools into hermes-feishu composite The feishu_doc and feishu_drive tools were registered in the tool registry but never added to the hermes-feishu composite toolset. The pipeline fix from the prior commit now recovers them automatically once they are in the composite.	2026-04-25 04:50:14 -07:00
alt-glitch	81987f0350	feat(discord): split discord_server into discord + discord_admin tools Split the monolithic discord_server tool (14 actions) into two: - discord: core actions (fetch_messages, search_members, create_thread) that are useful for the agent's normal operation. Auto-enabled on the discord platform via the pipeline fix. - discord_admin: server management actions (list channels/roles, pins, role assignment) that require explicit opt-in via hermes tools. Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.	2026-04-25 04:50:14 -07:00
alt-glitch	9830905dab	fix(tools): recover non-configurable toolsets from composite resolution The reverse-mapping loop in _get_platform_tools only checked CONFIGURABLE_TOOLSETS, silently dropping platform-specific toolsets like discord and feishu_doc whose tools were in the composite but had no configurable key. Add a second pass over TOOLSETS that picks up unclaimed toolsets whose tools are present in the resolved composite.	2026-04-25 04:50:14 -07:00
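
The headroom arithmetic described in commit f92006ce1c above can be sketched as follows. This is a minimal illustration, not the code in the repository: the function name `lowered_threshold`, the value given to `MINIMUM_CONTEXT_LENGTH`, and the exact condition under which the threshold is lowered are all assumptions. Only the 12K static allowance, the subtraction of the tool-schema token count, and the clamp to a minimum come from the commit message.

```python
# Assumed value: the commit names MINIMUM_CONTEXT_LENGTH but does not state it.
MINIMUM_CONTEXT_LENGTH = 16_000
# The "12K allowance" for the system prompt and flush-instruction overhead.
STATIC_HEADROOM_TOKENS = 12_000


def lowered_threshold(main_threshold: int, aux_context: int,
                      tool_schema_tokens: int) -> int:
    """Hypothetical helper: session compression threshold after the aux check."""
    if aux_context >= main_threshold:
        # Assumed condition: the aux model does not bind, so nothing changes.
        return main_threshold
    # Reserve room for what is sent alongside the messages.
    headroom = tool_schema_tokens + STATIC_HEADROOM_TOKENS
    # Clamp so the session can still start with a very heavy tool schema.
    return max(aux_context - headroom, MINIMUM_CONTEXT_LENGTH)


# A 64K aux model with roughly 28K tokens of tool schemas.
print(lowered_threshold(100_000, 64_000, 28_000))
```

With an empty tool list the 12K static deduction still applies, which matches the commit's note that existing tests needed lower threshold values.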

5907 Commits