Commit Graph

5907 Commits

Author SHA1 Message Date
Brooklyn Nicholson
648da6a8d1 feat(gui): make desktop setup flow real and testable
Add a GUI-first setup gate and runtime state API so desktop onboarding is safe, iterative, and works with isolated fresh-mode installs. Scaffold and wire the desktop shell/runtime pieces so this branch runs end-to-end without disturbing existing user installs.
2026-04-25 19:48:02 -05:00
brooklyn!
edc78e258c Merge pull request #15766 from NousResearch/bb/tui-ssh-copy
fix(tui): honor client copy shortcut over ssh
2026-04-25 15:33:17 -05:00
Brooklyn Nicholson
31d7f1951a fix(tui): clamp copied selection bounds
Clamp copied selection columns to the screen width before scanning rendered cells.
2026-04-25 15:32:45 -05:00
Brooklyn Nicholson
b1c18e5a41 refactor(tui): format screen imports
Keep screen.ts import ordering aligned with the ui-tui formatter.
2026-04-25 15:26:51 -05:00
Brooklyn Nicholson
bd66e55a02 fix(tui): track rendered spaces for selection copy
- add a written-cell bitmap so selection can distinguish rendered spaces from blank padding
- preserve code indentation without markdown-specific rendering hacks
2026-04-25 15:21:26 -05:00
Brooklyn Nicholson
1735ced93b fix(tui): preserve code block indentation in selection
Render code indentation spaces as selectable cells so copied fenced code keeps its leading whitespace.
2026-04-25 15:17:36 -05:00
Brooklyn Nicholson
bba16943f6 fix(tui): preserve rendered indentation in selections
- trim only empty edge rows instead of full selected text
- bound selection paint using unwritten cells so rendered indentation remains copyable
2026-04-25 15:14:26 -05:00
Brooklyn Nicholson
132620ba3d refactor(tui): simplify remote copy hotkey hints
Use an explicit conditional table instead of spread casting for SSH copy hint rows.
2026-04-25 15:09:12 -05:00
Brooklyn Nicholson
876bb60044 fix(tui): trim whitespace-only selection chrome
- clamp selection highlight to real row content so blank drag margins do not render or copy
- keep successful copy actions quiet while preserving usage and failure feedback
2026-04-25 15:07:29 -05:00
Brooklyn Nicholson
a68793b6c4 refactor(tui): share remote shell detection
Reuse the platform helper for SSH-aware copy hints so hotkey display and input handling cannot drift.
2026-04-25 14:55:28 -05:00
Brooklyn Nicholson
bcc5362432 fix(tui): honor client copy shortcut over ssh
- accept forwarded Cmd+C for selection copy in SSH sessions even when Hermes runs on Linux
- keep local Linux Alt+C from acting as copy and update TUI hotkey hints for remote shells
2026-04-25 14:44:39 -05:00
brooklyn!
283c8fd6e2 Merge pull request #15755 from NousResearch/bb/tui-model-flag
fix(tui): honor launch model overrides
2026-04-25 14:30:26 -05:00
Brooklyn Nicholson
919274b60e fix(tui): align overlay q shortcut casing
Keep shared overlay close behavior consistent with pager and agents overlays by binding lowercase q only.
2026-04-25 14:26:35 -05:00
Brooklyn Nicholson
6e83d90eb4 refactor(tui): tighten overlay helpers
- rename overlay help text component to match its role
- share picker window math across model, session, and skills overlays
2026-04-25 14:23:45 -05:00
Brooklyn Nicholson
c6fdf48b79 fix(tui): sync inference model after switches
- keep HERMES_INFERENCE_MODEL aligned with HERMES_MODEL after in-TUI model switches
- clarify static provider detection remapping docs
2026-04-25 14:17:57 -05:00
Brooklyn Nicholson
a046483e86 fix(tui): share overlay close controls
- add reusable overlay key and help-text helpers for picker-style overlays
- make model, session, skills, and pager hints consistently support Esc/q close behavior
2026-04-25 14:17:04 -05:00
Brooklyn Nicholson
fdcbd2257b fix(tui): resolve startup model aliases statically
- expand short model aliases like sonnet/opus via static catalogs during startup runtime resolution
- keep startup alias resolution network-free and add regression tests in models and tui gateway suites
2026-04-25 14:13:02 -05:00
Brooklyn Nicholson
48bdd2445e fix(tui): apply ui-tui fix pass and restore type-check
- run the requested ui-tui lint+format pass and include resulting formatting updates
- guard text-measure cache eviction key in hermes-ink so ui-tui type-check stays green
2026-04-25 14:08:54 -05:00
Brooklyn Nicholson
5e52011de3 fix(tui): bind provider as model alias 2026-04-25 13:58:59 -05:00
Brooklyn Nicholson
e48a497d16 fix(tui): share static model detection 2026-04-25 13:56:16 -05:00
Brooklyn Nicholson
2dfcc8087a fix(tui): avoid network lookup during startup 2026-04-25 13:47:18 -05:00
Brooklyn Nicholson
4db58d45d4 fix(tui): address startup provider review 2026-04-25 13:29:15 -05:00
Brooklyn Nicholson
57b43fdd4b fix(tui): preserve provider precedence on startup 2026-04-25 13:25:43 -05:00
Brooklyn Nicholson
e9c47c7042 fix(tui): honor launch model overrides 2026-04-25 13:21:59 -05:00
brooklyn!
ee0728c6c4 Merge pull request #15351 from helix4u/fix/tui-rebuild-missing-ink-bundle
fix(tui): rebuild when ink bundle is missing
2026-04-25 13:14:23 -05:00
kshitij
648b89911f fix: use output_text for assistant message content in Codex Responses API (#15690)
The Codex Responses API rejects input_text inside assistant messages —
only output_text and refusal are valid content types for assistant role.

_chat_content_to_responses_parts() previously hardcoded all text content
to input_text regardless of the message role. When an assistant message
had list-format content (multimodal or structured), this produced invalid
input_text parts that the API rejected with:

  Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'.

Fix: add a role parameter to _chat_content_to_responses_parts() that
selects output_text for assistant messages and input_text for user
messages. Thread this through _chat_messages_to_responses_input() and
_preflight_codex_input_items().

Fixes #15687
2026-04-25 10:13:29 -07:00
kshitijk4poor
7c17accb29 fix: /stop now immediately aborts streaming retry loop
When a user sends /stop during a streaming API call, the outer poll loop
detects _interrupt_requested and closes the HTTP connection. However, the
inner _call() thread catches the connection error and enters its retry
loop — opening a FRESH connection without checking the interrupt flag.

On slow providers like ollama-cloud, each retry attempt blocks for the
full stream-read timeout (120s+). With 3 retry attempts this caused
510+ second delays between /stop and actual response — the agent appeared
completely unresponsive despite the stop being acknowledged.

Fix: add an _interrupt_requested check at the top of the streaming retry
loop so the agent exits immediately instead of retrying.

Also fix log truncation: all session key logging in gateway/run.py used
[:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437'
(33 chars) to 'agent:main:telegram:' — losing the identifying chat type
and user ID. Replace with full keys to make logs debuggable.

Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.
2026-04-25 09:51:39 -07:00
Teknium
5006b2204b fix(update): honor RestartSec when polling for gateway respawn (#15707)
The post-graceful-drain is-active poll used a fixed 10s timeout, but
systemd's hermes-gateway.service has RestartSec=30 — so systemd won't
respawn the unit for 30s after exit-75, and our poll gives up during
the cooldown. Result: every 'hermes update' printed

  ⚠ hermes-gateway drained but didn't relaunch — forcing restart

followed by a redundant 'systemctl restart' that kicked the newly-
respawning gateway again (and re-started WhatsApp / Discord a second
time in the process).

Fix: read RestartUSec from the unit via 'systemctl show' and set the
poll budget to max(10s, RestartSec + 10s slack). Units without
RestartSec set (or value=infinity) fall back to the original 10s.

Observed timeline from journalctl before fix:
  08:56:22.262  old PID exits 75
  08:56:32.707  systemd logs Stopped -> Started  (10.4s gap, > 10s budget)

After fix the poll covers 40s — comfortably inside RestartSec + slack.

Validation:
- RestartUSec parser tested against '30s', '100ms', '1min 30s',
  'infinity', '', 'garbage', '500us', '2min' — all correct.
- Against the live hermes-gateway.service: parses to 30.0s.
- tests/hermes_cli/test_update_gateway_restart.py: 41/41 pass.
2026-04-25 09:08:27 -07:00
Teknium
a9fa73a620 feat(oneshot): add --model / --provider / HERMES_INFERENCE_MODEL (#15704)
Makes hermes -z usable by sweeper without mutating user config.

- Top-level -m/--model and --provider flags that apply to -z/--oneshot
  (mirrors hermes chat's plumbing).
- HERMES_INFERENCE_MODEL env var as the parallel to HERMES_INFERENCE_PROVIDER
  for CI / scripted invocations.
- resolve_runtime_provider() gets the requested provider; when --model is
  given without --provider, detect_provider_for_model() auto-selects the
  provider that serves it (same semantic as /model in an interactive session).
- --provider without --model errors out with exit 2 — carrying a config
  model across to a different provider is usually wrong, and silently
  picking the provider's catalog default hides the mismatch.

Config defaults still used when both flags are omitted (existing behavior).

Validation (all live against OpenRouter):
  -z 'x' ....................... uses config default (opus-4.7)
  -z 'x' --model haiku-4.5 ..... haiku-4.5 via auto-detected openrouter
  -z 'x' --model ... --provider  pair as given
  HERMES_INFERENCE_MODEL=... -z  haiku-4.5 via env var
  -z 'x' --provider anthropic .. exits 2 with error to stderr
2026-04-25 08:55:36 -07:00
Teknium
7c8c031f60 feat: add hermes -z <prompt> one-shot mode (#15702)
* feat: add `hermes -z <prompt>` one-shot mode

Top-level flag that runs a single prompt and prints ONLY the final
response text to stdout. No banner, no spinner, no tool previews, no
session_id line — stdout is machine-readable, stderr is silent.

Tools, memory, rules, and AGENTS.md in the CWD are loaded as normal.
Approvals are auto-bypassed (sets HERMES_YOLO_MODE=1 for the call).
Bypasses cli.py entirely — goes straight to AIAgent.chat().

* feat(oneshot): handle interactive-callback gaps explicitly

Document (and where needed, patch) the interactive surfaces that have
no user to answer in oneshot mode:

  - clarify       — inject a callback that tells the agent to pick the
                    best default and continue (previously returned a
                    generic 'not available in this execution context'
                    error that wastes a tool call)
  - sudo password — terminal_tool already gates on HERMES_INTERACTIVE
                    (we don't set it); sudo fails gracefully
  - shell hooks   — HERMES_ACCEPT_HOOKS=1 auto-approves; also falls
                    back to deny on non-tty stdin
  - dangerous cmd — HERMES_YOLO_MODE=1 short-circuits before input()
  - secret capture— tool returns gracefully when no callback wired

Live-tested: agent asked clarify(['red','blue']) and got 'red' back,
replied with only 'red'.
2026-04-25 08:44:38 -07:00
Teknium
ea01bdcebe refactor(memory): remove flush_memories entirely (#15696)
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.

Problems with flush_memories:

- Pre-dates the background review loop.  It was the only memory-save
  path when introduced; the background review now fires every 10 user
  turns on CLI and gateway alike, which is far more frequent than
  compression or session reset ever triggered flush.
- Blocking and synchronous.  Pre-compression flush ran on the live agent
  before compression, blocking the user-visible response.
- Cache-breaking.  Flush built a temporary conversation prefix
  (system prompt + memory-only tool list) that diverged from the live
  conversation's cached prefix, invalidating prompt caching.  The
  gateway variant spawned a fresh AIAgent with its own clean prompt
  for each finalized session — still cache-breaking, just in a
  different process.
- Redundant.  Background review runs in the live conversation's
  session context, gets the same content, writes to the same memory
  store, and doesn't break the cache.  Everything flush_memories
  claimed to preserve is already covered.

What this removes:

- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
  (and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
  hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
  _check_compression_model_feasibility (headroom was only needed
  because flush dragged the full main-agent system prompt along;
  the compression summariser sends a single user-role prompt so
  new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
  flush-specific paths

What this renames (with read-time backcompat on sessions.json):

- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
  The session-expiry watcher still uses the flag to avoid re-running
  finalize/eviction on the same expired session; the new name
  reflects what it now actually gates.  from_dict() reads
  'expiry_finalized' first, falls back to the legacy 'memory_flushed'
  key so existing sessions.json files upgrade seamlessly.

Supersedes #15631 and #15638.

Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites.  No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
2026-04-25 08:21:14 -07:00
kshitijk4poor
d635e2df3f fix(compression): pass provider to context length resolver in feasibility check
_check_compression_model_feasibility calls get_model_context_length
without provider=, so Codex OAuth users get 1,050,000 (from models.dev
for 'openai') instead of the actual 272,000 limit. This happens because
_infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'),
skipping the Codex-specific resolution branch entirely.

Result: compression threshold set at 85% of 1.05M = 892K — conversations
never trigger compression, the context grows unbounded, and when gateway
hygiene eventually forces compression, the Codex endpoint drops the
oversized streaming request ('peer closed connection without sending
complete message body').

Fix: forward self.provider to get_model_context_length so provider-
specific resolution branches (Codex OAuth 272K, Copilot live /models,
Nous suffix-match) fire correctly.

Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).
2026-04-25 07:09:47 -07:00
Teknium
cf2fabc40f docs(dashboard): document page-scoped plugin slots (#15662)
Follow-up to PR #15658. The feature PR introduced page-scoped slots
(<page>:top / <page>:bottom inside every built-in page) but only
touched the Shell slots catalogue. Adds proper narrative coverage so
plugin authors find the feature.

Changes
- extending-the-dashboard.md:
  - Frontmatter description + intro bullet now mention page-scoped slots
  - New TOC entry "Augmenting built-in pages (page-scoped slots)"
  - New dedicated subsection after "Replacing built-in pages"
    explaining the heavy-vs-light tradeoff, listing the pages that
    expose slots, and showing a worked manifest + IIFE example with
    tab.hidden: true
  - Cross-link from the tab.override section pointing readers to the
    lighter augmentation option
- web-dashboard.md:
  - Bullet mentioning "page-scoped slots (inject widgets into
    built-in pages without overriding them)"

Validation
- TOC anchor "#augmenting-built-in-pages-page-scoped-slots" matches
  the generated heading slug
- Code fences balanced (64, even)
- Pre-existing docusaurus build errors (skills.json, api-server.md
  link) reproduce on bare main -- not introduced here
2026-04-25 06:59:24 -07:00
Teknium
af22421e87 feat(dashboard): page-scoped plugin slots for built-in pages (#15658)
* fix(terminal): three-layer defense against watch_patterns notification spam

Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.

Three layered defenses, each sufficient on its own:

1. Mutual exclusion (terminal_tool.py): When both flags are set on a
   background process, drop watch_patterns with a warning. notify_on_complete
   wins because 'let me know when it's done' is the more useful signal and
   fires exactly once. Extracted as _resolve_notification_flag_conflict() so
   the rule is testable in isolation.

2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
   bails the moment session.exited is True. Post-exit chunks (buffered reads
   draining after the process is gone) no longer produce notifications. This
   is the fix flagged as future work in session 20260418_020302_79881c.

3. Global circuit breaker (process_registry.py): Per-session rate limits don't
   catch the sibling-flood case — N concurrent processes can each stay under
   8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
   trips a 30-second cooldown across ALL sessions, emits a single
   watch_overflow_tripped event, silently counts dropped events, and emits a
   watch_overflow_released summary when the cooldown ends.

Also updates the tool schema + docstring to document the new behavior.

Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.

Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion

Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.

## New rule — per session

- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
  ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
  for the session and the session is auto-promoted to notify_on_complete
  semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
  strike counter — healthy cadence is forgiven.

## Schema + docstring rewritten

The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
  iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
  description so the model sees the contract every turn

## Removed

- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
  new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
  on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
  / _watch_strike_candidate / _watch_consecutive_strikes.

## Kept

- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
  secondary safety net for concurrent siblings. Still valuable when 20
  short-lived processes each fire once — none individually violates the
  per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.

## Tests

- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
  second in cooldown suppressed, multi-drop = single strike, 3 strikes
  disables + promotes, clean window resets counter, suppressed count
  carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
  hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.

* feat(dashboard): page-scoped plugin slots for built-in pages

Dashboard plugins can now inject components into specific built-in
pages (Sessions, Analytics, Logs, Cron, Skills, Config, Env, Docs,
Chat) without overriding the whole route.

Previously, plugins could only:
  1. Add new tabs (tab.path)
  2. Replace whole built-in pages (tab.override)
  3. Inject into global shell slots (header-*, footer-*, pre-main, ...)

None of those let a plugin add a banner, card, or widget to an
existing page. The new <page>:top / <page>:bottom slots close that
gap, reusing the existing registerSlot() API.

Changes
- web/src/plugins/slots.ts: 18 new KNOWN_SLOT_NAMES entries
  (sessions:top, sessions:bottom, analytics:top, ..., chat:bottom),
  grouped under "Shell-wide" vs "Page-scoped" in the docblock
- web/src/pages/*: each built-in page now renders
    <PluginSlot name="<page>:top" />
  as the first child of its outer wrapper and
    <PluginSlot name="<page>:bottom" />
  as the last child -- zero visual cost when no plugin registers
- plugins/example-dashboard: registers a demo banner into
  sessions:top via registerSlot(), with matching slots entry in
  the manifest -- so freshly-setup users can see what page-scoped
  slots look like without writing any plugin code
- website/docs: new "Page-scoped slots" table in the plugin
  authoring guide, with a worked example
- tests/hermes_cli/test_web_server.py: round-trip test for
  colon-bearing slot names (sessions:top, analytics:bottom, ...)

Validation
- npm run build: clean (tsc -b + vite build, 2761 modules)
- scripts/run_tests.sh tests/hermes_cli/test_web_server.py::TestDashboardPluginManifestExtensions: 5/5 pass
2026-04-25 06:55:35 -07:00
Teknium
97d54f0e4d fix(terminal): three-layer defense against watch_patterns notification spam (#15642)
* fix(terminal): three-layer defense against watch_patterns notification spam

Background processes that stack notify_on_complete=True with watch_patterns
can flood the user with duplicate, delayed notifications — matches deliver
asynchronously via the completion queue and continue arriving minutes after
the process has exited. The docstring warning against this (PR #12113) has
proven insufficient; agents still misuse the combination.

Three layered defenses, each sufficient on its own:

1. Mutual exclusion (terminal_tool.py): When both flags are set on a
   background process, drop watch_patterns with a warning. notify_on_complete
   wins because 'let me know when it's done' is the more useful signal and
   fires exactly once. Extracted as _resolve_notification_flag_conflict() so
   the rule is testable in isolation.

2. Suppress-after-exit (process_registry.py): _check_watch_patterns() now
   bails the moment session.exited is True. Post-exit chunks (buffered reads
   draining after the process is gone) no longer produce notifications. This
   is the fix flagged as future work in session 20260418_020302_79881c.

3. Global circuit breaker (process_registry.py): Per-session rate limits don't
   catch the sibling-flood case — N concurrent processes can each stay under
   8/10s and still collectively spam. New WATCH_GLOBAL_MAX_PER_WINDOW=15 cap
   trips a 30-second cooldown across ALL sessions, emits a single
   watch_overflow_tripped event, silently counts dropped events, and emits a
   watch_overflow_released summary when the cooldown ends.

Also updates the tool schema + docstring to document the new behavior.

Tests: 8 new tests covering all three fixes (suppress-after-exit x2,
mutual-exclusion resolver x4, global breaker trip/cooldown/release x2).
All 60 tests across test_watch_patterns.py, test_notify_on_complete.py,
test_terminal_tool.py pass.

Real-world trigger: self-inflicted in session 20260425_051924 — three
concurrent hermes-sweeper review subprocesses each set watch_patterns=
['failed validation', 'errored'] AND notify_on_complete=True, then iterated
over multiple items, producing enough matches per process to defeat the
per-session cap while staying under the global cap that didn't yet exist.

* fix(terminal): aggressive 1-per-15s watch_patterns rate limit + strike-3 promotion

Per Teknium's direction, the watch_patterns rate limit is now much more
aggressive and self-healing.

## New rule — per session

- HARD cap: 1 watch-match notification per 15 seconds per process.
- Any match arriving inside the cooldown window is dropped and counts as
  ONE strike for that window (many drops in the same window still = 1 strike).
- After 3 consecutive strike windows, watch_patterns is permanently disabled
  for the session and the session is auto-promoted to notify_on_complete
  semantics — exactly one notification when the process actually exits.
- A cooldown window that expires with zero drops resets the consecutive
  strike counter — healthy cadence is forgiven.

## Schema + docstring rewritten

The tool schema description now gives the model explicit guidance:
- notify_on_complete is 'the right choice for almost every long-running task'
- watch_patterns is for RARE one-shot signals on LONG-LIVED processes
- Do NOT use watch_patterns with loops/batch jobs — error patterns fire every
  iteration and will hit the strike limit fast
- Mutual exclusion is stated on both parameter descriptions
- 1/15s cooldown and 3-strike promotion are stated in the watch_patterns
  description so the model sees the contract every turn

## Removed

- WATCH_MAX_PER_WINDOW (8/10s) and WATCH_OVERLOAD_KILL_SECONDS (45) — the
  new 1/15s limit subsumes both; keeping them would double-count.
- _watch_window_hits / _watch_window_start / _watch_overload_since fields
  on ProcessSession. Replaced by _watch_last_emit_at / _watch_cooldown_until
  / _watch_strike_candidate / _watch_consecutive_strikes.

## Kept

- Global circuit breaker across all sessions (15/10s → 30s cooldown) as a
  secondary safety net for concurrent siblings. Still valuable when 20
  short-lived processes each fire once — none individually violates the
  per-session limit.
- Suppress-after-exit guard.
- Mutual exclusion resolver at the tool entry point.

## Tests

- 6 new tests in TestPerSessionRateLimit covering: first match delivers,
  second in cooldown suppressed, multi-drop = single strike, 3 strikes
  disables + promotes, clean window resets counter, suppressed count
  carried to next emit.
- Global circuit breaker tests rewritten to use fresh sessions instead of
  hacking removed per-window fields.
- 50/50 watch_patterns + notify_on_complete tests pass.
- 60/60 including test_terminal_tool.py pass.
2026-04-25 06:41:58 -07:00
Teknium
6e561ffa6d fix(update): poll is-active instead of one-shot sleep(3) after gateway restart (#15639)
The auto-restart path in `hermes update` verifies systemd unit health with
`time.sleep(3)` + a single `systemctl is-active` call.  The unit's
Stopped -> Started transition after a graceful SIGUSR1 exit (or a hard
restart) is not always complete inside that 3s window, so the verify
races and reports 'drained but didn't relaunch' even though systemd is
about to bring the unit back up a fraction of a second later.  Users
then see a spurious warning, a redundant fallback `systemctl restart`
fires, and adapters (Discord, WhatsApp) get restarted twice.

Replace the three sleep+oneshot sites with a small `_wait_for_service_active()`
closure that polls `is-active` every 0.5s for up to 10s.  Behaviour
is unchanged when the unit is healthy or truly dead — only the race
window around a clean restart is now handled correctly.

Tests: tests/hermes_cli/test_update_gateway_restart.py (41/41).
2026-04-25 06:11:22 -07:00
Teknium
ac05daa189 fix(tools): dedupe bundled plugin toolsets with built-in entries (#15634)
`hermes tools` → "reconfigure existing" listed Spotify twice because
the Apr 24 refactor that moved Spotify into plugins/spotify/ (PR #15174)
left the entry in CONFIGURABLE_TOOLSETS. _get_effective_configurable_toolsets()
unconditionally appended get_plugin_toolsets() on top, so the same
'spotify' key showed up from both sources.

Dedupe by key — built-in CONFIGURABLE_TOOLSETS entry wins (it has the
nicer label and description). Also guards against future bundled plugins
that share a toolset key with a built-in.
2026-04-25 05:53:08 -07:00
Teknium
3c1c65e754 fix(auxiliary): generalize unsupported-parameter detector and harden max_tokens retry (#15633)
Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter —  — not just temperature.

- agent/auxiliary_client.py:
  * New _is_unsupported_parameter_error(exc, param): matches the same six
    phrasings the old temperature detector did plus 'unrecognized parameter'
    and 'invalid parameter', against any named param.
  * _is_unsupported_temperature_error is now a thin back-compat wrapper so
    existing imports and tests keep working.
  * The max_tokens → max_completion_tokens retry branch in call_llm and
    async_call_llm now (a) gates on 'max_tokens is not None' so we do not
    pop a key that was never set and silently substitute a None value on
    the retry, and (b) also matches the generic helper in addition to the
    legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
    up phrasings like 'Unknown parameter: max_tokens' that previously slipped
    through.

- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
  the generic detector across params, the back-compat wrapper, and the two
  hardenings to the max_tokens retry branch (None gate + generic phrasing).

Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
2026-04-25 05:50:34 -07:00
Teknium
f92006ce1c fix(compression): reserve system+tools headroom when aux binds threshold (#15631)
When the auxiliary compression model's context is smaller than the main
model's compression threshold, _check_compression_model_feasibility
auto-lowers the session threshold. Previously it set:

    new_threshold = aux_context

This let the raw message list grow to exactly aux_context tokens. But
compression and flush_memories actually send system_prompt + tool_schemas
+ messages to the aux model. With 50+ tools that overhead is 25-30K
tokens, so the full request overflowed aux with HTTP 400.

Subtract a headroom estimate from aux_context before setting the new
threshold: the actual tool-schema token count (from
estimate_request_tokens_rough) plus a 12K allowance for the system
prompt (not yet built at __init__ time) and flush-instruction overhead.
Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with
an unusually heavy tool schema.

This fixes the 'flush_memories overflow on busy toolsets' path that
Teknium flagged — where main and aux can be nominally the same model
but still 400 because the threshold left no room for the request
overhead. Same fix also protects the normal compression summarisation
request on the same binding aux.

Tests: two new regression tests cover the headroom reservation and the
MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new
(lower) threshold values now that empty-tools still produces a 12K
static headroom deduction.
2026-04-25 05:41:56 -07:00
Teknium
b35d692f45 chore(release): map ash@users.noreply.github.com to ash 2026-04-25 05:27:17 -07:00
Ash Rowan Vale 🌿
facea84559 fix(auxiliary): retry without temperature when any provider rejects it
Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.

The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.

call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.

Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
  wire into both sync and async call_llm paths before the existing
  max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
  detector phrasings, sync + async retry, no-retry-without-temperature,
  and non-temperature 400s not triggering the retry

Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.

Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.

Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>
2026-04-25 05:27:17 -07:00
Teknium
f67a61dc93 fix(flush_memories): strip temperature from codex_responses fallback (#15620)
The memory-flush fallback for api_mode='codex_responses' was unconditionally
adding `temperature` to codex_kwargs before calling _run_codex_stream. The
Responses API does not accept temperature on any supported backend:

- chatgpt.com/backend-api/codex rejects it outright
- api.openai.com + gpt-5/o-series reasoning models reject it
- Copilot Responses rejects it on reasoning models

The CodexAuxiliaryClient adapter and the codex_responses transport both
correctly omit temperature — the flush fallback was the only path putting
it back. On errors from the primary aux path (e.g. expired OAuth token),
users saw `⚠ Auxiliary memory flush failed: HTTP 400: Unsupported parameter:
temperature`.

Reported by Garik [NOUS] on GPT-5.5 via Codex OAuth Pro.
2026-04-25 05:01:25 -07:00
Teknium
6ed37e0f42 feat(tools): make discord/discord_admin opt-in, Discord-only
Both discord (read/participate) and discord_admin (server admin) are now
configurable via `hermes tools` with default-OFF. Previously the core
discord tool (fetch_messages, search_members, create_thread) auto-loaded
on every Discord install with DISCORD_BOT_TOKEN set — 19 tools the user
never opted into.

Adds a platform-scoping mechanism (_TOOLSET_PLATFORM_RESTRICTIONS) so
the discord toolsets only show up in the Discord platform's checklist,
not on CLI/Telegram/Slack/etc. Applied at four gates:
  - _prompt_toolset_checklist: checklist filter
  - _get_platform_tools: resolution filter (both branches)
  - _save_platform_tools: save-time filter (covers 'Configure all
    platforms' and hand-edited config.yaml)
  - tools_disable_enable_command: rejects `hermes tools enable discord`
    on non-Discord platforms with a clear error

build_session_context_prompt now injects the Discord IDs block only
when both conditions hold: the discord/discord_admin toolset is
enabled AND DISCORD_BOT_TOKEN is set. Toolset alone isn't enough —
the tool's check_fn gates on the token at registry time, so opting
in without a token yields no tools and the IDs block would lie.
Otherwise keep the stale-API disclaimer.
2026-04-25 04:51:11 -07:00
alt-glitch
591deeb928 feat(session): inject Discord IDs block when discord tool is loaded
When DISCORD_BOT_TOKEN is set — meaning the discord tool actually
loads — emit a dedicated IDs block in the session context prompt so
the agent can call ``fetch_messages``, ``pin_message``, etc. with
real identifiers instead of probing.

Currently only ``thread_id`` was exposed as a raw ID (via the
``description`` string).  The agent in a Discord thread had to guess
that the thread ID doubles as a channel ID for the REST API (it
does), and it had no way to reference the parent channel, the guild,
or the triggering message at all.

The block adapts to context:

  - Thread:     guild / parent channel / thread / message
  - Channel:    guild / channel / message
  - (DM has no guild/channel IDs worth listing; only message)

Discord isn't in _PII_SAFE_PLATFORMS, so IDs ship unredacted.
2026-04-25 04:51:11 -07:00
alt-glitch
5ae07e7b5c fix(session): gate stale "no Discord APIs" note on DISCORD_BOT_TOKEN
The Discord platform note in the session context prompt claimed the
agent has no server-management APIs — pre-dating the discord tool.
With a bot token configured the agent actually has fetch_messages,
search_members, create_thread, and optionally the discord_admin tool;
telling the model otherwise causes it to refuse or apologise for
calls it is fully able to make.

Gate the disclaimer on DISCORD_BOT_TOKEN being unset, matching the
tool's own ``check_fn``.  Without a token the note still appears and
remains accurate; with a token the model is no longer gaslit into
refusing valid tool calls.
2026-04-25 04:51:11 -07:00
alt-glitch
47b02e961c feat(discord): populate guild_id, parent_chat_id, message_id on SessionSource
Discord knows all four identifiers for every inbound message — guild,
channel (or thread), parent channel when in a thread, and the
triggering message.  Pass them into ``SessionSource`` via the new
``build_source()`` kwargs so downstream code (context-prompt builder,
delivery, logging) can use them without re-resolving from discord.py
objects.

For auto-threaded messages, remember the original channel as the
parent before swapping ``chat_id`` to the freshly created thread.

Behavioural: still a no-op — nothing consumes these fields yet.
2026-04-25 04:51:11 -07:00
alt-glitch
0702231dd8 feat(session): add guild_id/parent_chat_id/message_id to SessionSource
Groundwork for injecting raw platform identifiers into the agent's
system prompt.  Currently only `thread_id` is exposed as a raw ID —
callers in a Discord thread had to guess `channel_id == thread_id`
(which happens to work because threads are channels in Discord's REST
API) and had no way to reference the parent channel, guild, or the
triggering message.

Adds three optional fields:

- `guild_id` — Discord guild / Slack workspace / Matrix server scope
- `parent_chat_id` — parent channel when chat_id refers to a thread
- `message_id` — ID of the triggering message (pin/reply/react)

Extends `BasePlatformAdapter.build_source()` to accept + forward them
and teaches `to_dict`/`from_dict` to serialize them.  Behaviourally a
no-op: nothing reads the fields yet and they default to None.
2026-04-25 04:51:11 -07:00
alt-glitch
db09477b77 feat(feishu): wire feishu doc/drive tools into hermes-feishu composite
The feishu_doc and feishu_drive tools were registered in the tool
registry but never added to the hermes-feishu composite toolset.
The pipeline fix from the prior commit now recovers them automatically
once they are in the composite.
2026-04-25 04:50:14 -07:00
alt-glitch
81987f0350 feat(discord): split discord_server into discord + discord_admin tools
Split the monolithic discord_server tool (14 actions) into two:

- discord: core actions (fetch_messages, search_members, create_thread)
  that are useful for the agent's normal operation. Auto-enabled on
  the discord platform via the pipeline fix.

- discord_admin: server management actions (list channels/roles, pins,
  role assignment) that require explicit opt-in via hermes tools.
  Added to CONFIGURABLE_TOOLSETS and _DEFAULT_OFF_TOOLSETS.
2026-04-25 04:50:14 -07:00
alt-glitch
9830905dab fix(tools): recover non-configurable toolsets from composite resolution
The reverse-mapping loop in _get_platform_tools only checked
CONFIGURABLE_TOOLSETS, silently dropping platform-specific toolsets
like discord and feishu_doc whose tools were in the composite but
had no configurable key. Add a second pass over TOOLSETS that picks
up unclaimed toolsets whose tools are present in the resolved
composite.
2026-04-25 04:50:14 -07:00