Azure OpenAI content filters (Default/DefaultV2) treat bracketed
[SYSTEM: ...] meta-instructions as prompt-injection attempts and
reject requests with HTTP 400.
Replacing [SYSTEM: with [IMPORTANT: preserves the same semantic
meaning for the model while bypassing the Azure heuristic.
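A minimal sketch of the rewrite, assuming the meta-instruction is injected as plain message text; the function name is illustrative, not the actual patch:

```python
def soften_meta_prefix(text: str) -> str:
    # "[SYSTEM: ...]" trips Azure's prompt-injection heuristic with HTTP 400;
    # "[IMPORTANT: ...]" reads the same to the model but passes the filter.
    return text.replace("[SYSTEM:", "[IMPORTANT:")
```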
Fixes #6576
Follow-up to cherry-picked PR #15920:
- agent/credential_pool.py: hoist 'from hermes_cli.config import get_env_value'
to module top instead of inline try/except in each seed site (3 sites).
No import cycle — hermes_cli/config.py doesn't depend on agent.credential_pool.
- hermes_cli/auth.py: same hoist for the _resolve_api_key_provider_secret loop.
- tests/tools/test_credential_pool_env_fallback.py: replace smoke-only tests
with real .env file I/O. Each test writes a temp ~/.hermes/.env, verifies
_seed_from_env / _resolve_api_key_provider_secret read from it, and asserts
the full priority chain: os.environ > .env > credential_pool. Uses
'deepseek' as the test provider since 'openai' isn't in PROVIDER_REGISTRY
and _seed_from_env's generic path requires a real pconfig lookup.
_resolve_api_key_provider_secret() and _seed_from_env() only checked
os.environ for provider API keys. When keys exist in ~/.hermes/.env but
are not loaded into the process environment (e.g. ACP adapter entry
point, post-session-start .env edits, or non-CLI entry points), the
resolution returns an empty string, causing HTTP 401 failures.
Changes:
- credential_pool._seed_from_env: use get_env_value() which checks both
os.environ and ~/.hermes/.env file, preventing _prune_stale_seeded_entries
from removing valid entries whose env var isn't in os.environ
- credential_pool._seed_from_env: same fix for openrouter and
base_url_env_var resolution
- auth._resolve_api_key_provider_secret: use get_env_value() instead of
os.getenv(), and add credential_pool fallback when env resolution fails
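A minimal sketch of the fallback order get_env_value() implements (the real helper lives in hermes_cli.config; the .env parsing shown here is simplified to plain KEY=value lines):

```python
import os
from pathlib import Path

def get_env_value(name: str) -> str | None:
    # 1. Process environment wins.
    if name in os.environ:
        return os.environ[name]
    # 2. Fall back to ~/.hermes/.env (assumed KEY=value lines).
    env_file = Path.home() / ".hermes" / ".env"
    if env_file.exists():
        for line in env_file.read_text().splitlines():
            key, _, value = line.partition("=")
            if key.strip() == name:
                return value.strip()
    # 3. Callers fall back to the credential pool when this returns None.
    return None
```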
Fixes #15914
PR #16046 added /busy and /verbose hints to the classic CLI and the
gateway runner but skipped the Ink TUI (and therefore the dashboard
/chat page, which embeds the TUI via PTY). This extends the same
latch to the TUI with TUI-native wording.
The TUI's busy-input model is not the /busy knob from the CLI —
single Enter while busy auto-queues, double Enter on an empty line
interrupts. The new busy-input hint teaches THAT gesture instead of
telling the user to flip a config that does not apply.
Changes:
- agent/onboarding.py — add busy_input_hint_tui() + tool_progress_hint_tui()
- tui_gateway/server.py — onboarding.claim JSON-RPC (Ink triggers busy
hint on enqueue) + _maybe_emit_onboarding_hint helper hooked into
_on_tool_complete for the 30s/tool_progress=all path. Same
config.yaml latch so each hint fires at most once per install across
CLI, gateway, and TUI combined.
- ui-tui/src/gatewayTypes.ts — OnboardingClaimResponse + onboarding.hint event
- ui-tui/src/app/createGatewayEventHandler.ts — render the hint event as sys()
- ui-tui/src/app/useSubmission.ts — claim busy_input_prompt on first
busy enqueue
- tests/agent/test_onboarding.py — +3 cases for TUI hint shape
- tests/tui_gateway/test_protocol.py — +4 cases for onboarding.claim
- website/docs/user-guide/tui.md — new 'Interrupting and queueing'
section explaining the TUI's double-Enter model and the hints
Validation:
scripts/run_tests.sh tests/agent/test_onboarding.py \
tests/tui_gateway/test_protocol.py \
tests/gateway/test_busy_session_ack.py
-> 66 passed
npm --prefix ui-tui run type-check -> clean
npm --prefix ui-tui run lint -> clean
npm --prefix ui-tui run build -> clean
Instead of a blocking first-run questionnaire, show a one-time hint the first
time the user hits each behavior fork:
1. First message while the agent is working — appends a hint to the busy-ack
explaining the /busy queue vs /busy interrupt knob, phrased to match the
mode that was just applied (don't tell a queue-mode user to switch to
queue).
2. First tool that runs for >= 30s in the noisiest progress mode
(tool_progress: all) — prints a hint about /verbose to cycle display
modes (all -> new -> off -> verbose). Gated on /verbose actually being
usable on the surface: always shown on CLI; on gateway only shown when
display.tool_progress_command is enabled.
Each hint is latched in config.yaml under onboarding.seen.<flag>, so it
fires exactly once per install across CLI, gateway, and cron, then never
again. Users can wipe the section to re-see hints.
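A rough sketch of the latch, assuming a simple load/save pair for config.yaml; is_seen/mark_seen are the names listed under "New:" below, the YAML helpers are placeholders for the real atomic write path:

```python
import yaml
from pathlib import Path

CONFIG_PATH = Path.home() / ".hermes" / "config.yaml"

def _load_config() -> dict:
    if CONFIG_PATH.exists():
        return yaml.safe_load(CONFIG_PATH.read_text()) or {}
    return {}

def is_seen(flag: str) -> bool:
    return bool(_load_config().get("onboarding", {}).get("seen", {}).get(flag))

def mark_seen(flag: str) -> None:
    config = _load_config()
    config.setdefault("onboarding", {}).setdefault("seen", {})[flag] = True
    # The real code writes through atomic_yaml_write and wraps in try/except
    # so onboarding can never break the input/busy-ack paths.
    CONFIG_PATH.write_text(yaml.safe_dump(config))
```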
New:
- agent/onboarding.py — is_seen / mark_seen / hint strings, shared by
both CLI and gateway.
- onboarding.seen in DEFAULT_CONFIG (hermes_cli/config.py) and in
load_cli_config defaults (cli.py). No _config_version bump — deep
merge handles new keys.
Wired:
- gateway/run.py: _handle_active_session_busy_message appends the hint
after building the ack. progress_callback tracks tool.completed
duration and queues the tool-progress hint into the progress bubble.
- cli.py: CLI input loop appends the busy-input hint on the first busy
Enter; _on_tool_progress appends the tool-progress hint on the first
>=30s tool completion. In-memory CLI_CONFIG is also updated so
subsequent fires in the same process are suppressed immediately.
All writes go through atomic_yaml_write and are wrapped in try/except
so onboarding can never break the input/busy-ack paths.
`_apply_model_switch_result` (the interactive `/model` picker's
confirmation path) printed `ModelInfo.context_window` straight from
models.dev, which reports the vendor-wide value (1.05M for gpt-5.5 on
openai). ChatGPT Codex OAuth caps the same slug at 272K, so the picker
showed 1M while the runtime (compressor, gateway `/model`, typed
`/model <name>`) correctly used 272K — the classic 'sometimes 1M,
sometimes 272K' mismatch on a single model.
Both display paths now go through `resolve_display_context_length()`,
matching the fix that `_handle_model_switch` received earlier.
Also bump the stale last-resort fallback in DEFAULT_CONTEXT_LENGTHS
(`gpt-5.5: 400000 -> 1050000`) to match the real OpenAI API value; the
272K Codex cap is already enforced via the Codex-OAuth branch, so the
fallback now reflects what every non-Codex probe-miss should see.
Tests: adds `test_apply_model_switch_result_context.py` with three
scenarios (Codex cap wins, OpenRouter shows 1.05M, resolver-empty falls
back to ModelInfo). Updates the existing non-Codex fallback test to
assert 1.05M (the correct value).
## Validation
| path | before | after |
|-------------------------------|-----------|-----------|
| picker -> gpt-5.5 on Codex | 1,050,000 | 272,000 |
| picker -> gpt-5.5 on OpenAI | 1,050,000 | 1,050,000 |
| picker -> gpt-5.5 on OpenRouter | 1,050,000 | 1,050,000 |
| typed /model gpt-5.5 on Codex | 272,000 | 272,000 |
#14934 added deepseek-v4-pro / deepseek-v4-flash to the DeepSeek native
provider but the context-window lookup still falls back to the existing
"deepseek" substring entry (128K). DeepSeek V4 ships with a 1M context
window, so any caller relying on get_model_context_length() for
pre-flight token budgeting (compression, context warnings) under-counts
by ~8x.
Add explicit lowercase entries for the four DeepSeek model ids that
ship 1M context:
- deepseek-v4-pro
- deepseek-v4-flash
- deepseek-chat (legacy alias, server-side maps to v4-flash non-thinking)
- deepseek-reasoner (legacy alias, server-side maps to v4-flash thinking)
Longest-key-first substring matching means these explicit entries also
cover the vendor-prefixed forms (deepseek/deepseek-v4-pro on OpenRouter
and Nous Portal) without regressing the existing 128K fallback for
older / unknown DeepSeek model ids on custom endpoints.
Source: https://api-docs.deepseek.com/zh-cn/quick_start/pricing
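A sketch of the longest-key-first lookup described above; the real table lives in agent/model_metadata.py, and the 1M/128K values are the ones stated in this entry:

```python
DEEPSEEK_CONTEXT_LENGTHS = {
    "deepseek-v4-pro": 1_000_000,
    "deepseek-v4-flash": 1_000_000,
    "deepseek-chat": 1_000_000,       # legacy alias
    "deepseek-reasoner": 1_000_000,   # legacy alias
    "deepseek": 128_000,              # fallback for older / unknown DeepSeek ids
}

def lookup_context_length(model_id: str) -> int | None:
    model_id = model_id.lower()
    # Longest keys first, so "deepseek-v4-pro" wins over the bare "deepseek"
    # entry and vendor-prefixed ids like "deepseek/deepseek-v4-pro" still match.
    for key in sorted(DEEPSEEK_CONTEXT_LENGTHS, key=len, reverse=True):
        if key in model_id:
            return DEEPSEEK_CONTEXT_LENGTHS[key]
    return None
```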
Nous Portal multiplexes multiple upstream providers (DeepSeek, Kimi,
MiMo, Hermes) behind one endpoint. Before this fix, any 429 on any of
those models recorded a cross-session file breaker that blocked EVERY
model on Nous for the cooldown window -- even though the caller's
own RPM/RPH/TPM/TPH buckets were healthy. Users hit a DeepSeek V4 Pro
capacity error, restarted, switched to Kimi 2.6, and still got
'Nous Portal rate limit active -- resets in 46m 53s'.
Nous already emits the full x-ratelimit-* header suite on every
response (captured by rate_limit_tracker into agent._rate_limit_state).
We now gate the breaker on that data: trip it only when either the
429's own headers or the last-known-good state show a bucket with
remaining == 0 AND a reset window >= 60s. Upstream-capacity 429s
(healthy buckets everywhere, but upstream out of capacity) fall
through to normal retry/fallback and the breaker is never written.
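A hedged sketch of the gate; the exact x-ratelimit-* suffixes and the shape of the saved rate-limit state are assumptions, but the remaining == 0 / reset >= 60s rule is the one described above:

```python
def should_trip_file_breaker(headers: dict[str, str]) -> bool:
    """Trip the cross-session breaker only when a bucket is truly exhausted."""
    for bucket in ("requests", "tokens"):
        remaining = headers.get(f"x-ratelimit-remaining-{bucket}")
        reset_seconds = headers.get(f"x-ratelimit-reset-{bucket}")
        if remaining is None or reset_seconds is None:
            continue
        # remaining == 0 means the caller's own bucket is spent; a reset window
        # of >= 60s makes the cooldown meaningful. Anything else is treated as
        # an upstream-capacity 429 and falls through to retry/fallback.
        if int(remaining) == 0 and float(reset_seconds) >= 60:
            return True
    return False
```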
Note: the in-memory 'restart TUI/gateway to clear' workaround
circulated in Discord does NOT work -- the breaker is file-backed at
~/.hermes/rate_limits/nous.json. The workaround for users still
affected by a bad state file is to delete it.
Reported in Discord by CrazyDok1 and KYSIV (Apr 2026).
Azure OpenAI requires an `api-version` query parameter on every request.
When users include it in the base_url (e.g. `?api-version=2025-04-01-preview`),
the OpenAI SDK silently drops it during URL construction, causing 404 errors.
Extract query params from base_url and pass them via `default_query` so the
SDK appends them to every request. This is a generic solution that works for
any custom endpoint requiring query parameters, not just Azure.
No-op for URLs without query params — fully backward compatible.
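A sketch of the extraction, assuming the standard openai-python v1 client; the helper name is illustrative:

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit
from openai import OpenAI

def build_client(base_url: str, api_key: str) -> OpenAI:
    parts = urlsplit(base_url)
    query_params = dict(parse_qsl(parts.query))   # e.g. {"api-version": "2025-04-01-preview"}
    clean_url = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    # default_query makes the SDK append these params to every request instead
    # of dropping them during URL construction; None keeps prior behavior.
    return OpenAI(base_url=clean_url, api_key=api_key, default_query=query_params or None)
```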
Fixes #15779. Custom-provider per-model context_length (`custom_providers[].models.<id>.context_length`) is now honored across every resolution path, not just agent startup. Also adds 256K as the top probe tier and default fallback.
## What changed
New helper `hermes_cli.config.get_custom_provider_context_length()` — single source of truth for the per-model override lookup, with trailing-slash-insensitive base-url matching.
`agent.model_metadata.get_model_context_length()` gains an optional `custom_providers=` kwarg (step 0b — runs after explicit `config_context_length` but before every other probe).
Wired through five call sites that previously either duplicated the lookup or ignored it entirely:
- `run_agent.py` startup — refactored to use the new helper (dedups legacy inline loop, keeps invalid-value warning)
- `AIAgent.switch_model()` — re-reads custom_providers from live config on every /model switch
- `hermes_cli.model_switch.resolve_display_context_length()` — new `custom_providers=` kwarg
- `gateway/run.py` /model confirmation (picker callback + text path)
- `gateway/run.py` `_format_session_info` (/info)
## Context probe tiers
`CONTEXT_PROBE_TIERS = [256_000, 128_000, 64_000, 32_000, 16_000, 8_000]` — was `[128_000, ...]`. `DEFAULT_FALLBACK_CONTEXT` follows tier[0], so unknown models now default to 256K. The stale `128000` literal in the OpenRouter metadata-miss path is replaced with `DEFAULT_FALLBACK_CONTEXT` for consistency.
## Repro (from #15779)
```yaml
custom_providers:
- name: my-custom-endpoint
base_url: https://example.invalid/v1
model: gpt-5.5
models:
gpt-5.5:
context_length: 1050000
```
`/model gpt-5.5 --provider custom:my-custom-endpoint` → previously "Context: 128,000", now "Context: 1,050,000".
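A minimal sketch of the override lookup, assuming the custom_providers shape from the repro above; the real helper is `get_custom_provider_context_length()` and may differ in detail:

```python
def get_custom_provider_context_length(config: dict, base_url: str, model: str) -> int | None:
    for entry in config.get("custom_providers", []):
        # Trailing-slash-insensitive base-url match.
        if entry.get("base_url", "").rstrip("/") != base_url.rstrip("/"):
            continue
        value = entry.get("models", {}).get(model, {}).get("context_length")
        if isinstance(value, int) and value > 0:
            return value
    return None
```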
## Tests
- `tests/hermes_cli/test_custom_provider_context_length.py` — new file, 19 tests covering the helper, step-0b integration, and the 256K tier invariants
- `tests/hermes_cli/test_model_switch_context_display.py` — added regression tests for #15779 through the display resolver
- `tests/gateway/test_session_info.py` — updated default-fallback assertion (128K → 256K)
- `tests/agent/test_model_metadata.py` — updated tier assertions for the new top tier
The Codex Responses API rejects input_text inside assistant messages —
only output_text and refusal are valid content types for assistant role.
_chat_content_to_responses_parts() previously hardcoded all text content
to input_text regardless of the message role. When an assistant message
had list-format content (multimodal or structured), this produced invalid
input_text parts that the API rejected with:
Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'.
Fix: add a role parameter to _chat_content_to_responses_parts() that
selects output_text for assistant messages and input_text for user
messages. Thread this through _chat_messages_to_responses_input() and
_preflight_codex_input_items().
Fixes #15687
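A simplified sketch of the role-aware mapping; only the text branch is shown:

```python
def _chat_content_to_responses_parts(content: list[dict], role: str) -> list[dict]:
    # Assistant messages may only carry output_text/refusal parts on the Codex
    # Responses API; user (and other) roles keep input_text.
    text_type = "output_text" if role == "assistant" else "input_text"
    parts = []
    for item in content:
        if item.get("type") == "text":
            parts.append({"type": text_type, "text": item.get("text", "")})
        else:
            parts.append(item)   # non-text parts pass through untouched
    return parts
```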
The AIAgent.flush_memories pre-compression save, the gateway
_flush_memories_for_session, and everything feeding them are
obsolete now that the background memory/skill review handles
persistent memory extraction.
Problems with flush_memories:
- Pre-dates the background review loop. It was the only memory-save
path when introduced; the background review now fires every 10 user
turns on CLI and gateway alike, which is far more frequent than
compression or session reset ever triggered flush.
- Blocking and synchronous. Pre-compression flush ran on the live agent
before compression, blocking the user-visible response.
- Cache-breaking. Flush built a temporary conversation prefix
(system prompt + memory-only tool list) that diverged from the live
conversation's cached prefix, invalidating prompt caching. The
gateway variant spawned a fresh AIAgent with its own clean prompt
for each finalized session — still cache-breaking, just in a
different process.
- Redundant. Background review runs in the live conversation's
session context, gets the same content, writes to the same memory
store, and doesn't break the cache. Everything flush_memories
claimed to preserve is already covered.
What this removes:
- AIAgent.flush_memories() method (~248 LOC in run_agent.py)
- Pre-compression flush call in _compress_context
- flush_memories call sites in cli.py (/new + exit)
- GatewayRunner._flush_memories_for_session + _async_flush_memories
(and the 3 call sites: session expiry watcher, /new, /resume)
- 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks,
hermes tools UI task list, auxiliary_client docstrings
- _memory_flush_min_turns config + init
- #15631's headroom-deduction math in
_check_compression_model_feasibility (headroom was only needed
because flush dragged the full main-agent system prompt along;
the compression summariser sends a single user-role prompt so
new_threshold = aux_context is safe again)
- The dedicated test files and assertions that exercised
flush-specific paths
What this renames (with read-time backcompat on sessions.json):
- SessionEntry.memory_flushed -> SessionEntry.expiry_finalized.
The session-expiry watcher still uses the flag to avoid re-running
finalize/eviction on the same expired session; the new name
reflects what it now actually gates. from_dict() reads
'expiry_finalized' first, falls back to the legacy 'memory_flushed'
key so existing sessions.json files upgrade seamlessly.
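A sketch of the read-time backcompat; the real SessionEntry carries more fields:

```python
from dataclasses import dataclass

@dataclass
class SessionEntry:
    session_id: str
    expiry_finalized: bool = False

    @classmethod
    def from_dict(cls, data: dict) -> "SessionEntry":
        # Prefer the new key, fall back to the legacy 'memory_flushed' so
        # existing sessions.json files upgrade seamlessly.
        finalized = data.get("expiry_finalized", data.get("memory_flushed", False))
        return cls(session_id=data.get("session_id", ""), expiry_finalized=bool(finalized))
```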
Supersedes #15631 and #15638.
Tested: 383 targeted tests pass across run_agent/, agent/, cli/,
and gateway/ session-boundary suites. No behavior regressions —
background memory review continues to handle persistent memory
extraction on both CLI and gateway.
Generalize the temperature-specific 400 retry that shipped in PR #15621 so
the same reactive strategy covers any provider that rejects an arbitrary
request parameter, not just temperature.
- agent/auxiliary_client.py:
* New _is_unsupported_parameter_error(exc, param): matches the same six
phrasings the old temperature detector did plus 'unrecognized parameter'
and 'invalid parameter', against any named param.
* _is_unsupported_temperature_error is now a thin back-compat wrapper so
existing imports and tests keep working.
* The max_tokens → max_completion_tokens retry branch in call_llm and
async_call_llm now (a) gates on 'max_tokens is not None' so we do not
pop a key that was never set and silently substitute a None value on
the retry, and (b) also matches the generic helper in addition to the
legacy 'max_tokens' / 'unsupported_parameter' substring checks — picking
up phrasings like 'Unknown parameter: max_tokens' that previously slipped
through.
- tests/agent/test_unsupported_parameter_retry.py: 18 new tests covering
the generic detector across params, the back-compat wrapper, and the two
hardenings to the max_tokens retry branch (None gate + generic phrasing).
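A sketch of the generic detector and the back-compat wrapper; the exact list of matched phrasings is illustrative (this entry only names 'unrecognized parameter' and 'invalid parameter' as the additions):

```python
def _is_unsupported_parameter_error(exc: Exception, param: str) -> bool:
    text = str(exc).lower()
    if param.lower() not in text:
        return False
    phrasings = (
        "unsupported parameter",
        "unsupported_parameter",
        "unknown parameter",
        "is not supported",
        "unrecognized parameter",
        "invalid parameter",
    )
    return any(p in text for p in phrasings)

def _is_unsupported_temperature_error(exc: Exception) -> bool:
    # Thin back-compat wrapper so existing imports and tests keep working.
    return _is_unsupported_parameter_error(exc, "temperature")
```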
Credit: retry-generalization pattern from @nicholasrae's PR #15416. That PR
also proposed the reactive temperature retry which landed independently via
PR #15621 + #15623 (co-authored with @BlueBirdBack). This commit salvages
the remaining hardening ideas onto current main.
Universal reactive fix for 'HTTP 400: Unsupported parameter: temperature'
across all providers/models — not just Codex Responses.
The same backend can accept temperature for some models and reject it for
others (e.g. gpt-5.4 accepts but gpt-5.5 rejects on the same OpenAI
endpoint; similar patterns on Copilot, OpenRouter reasoning routes, and
Anthropic Opus 4.7+ via OAI-compat). An allow/deny-list by model name does
not scale.
call_llm / async_call_llm now detect the concrete 'unsupported parameter:
temperature' 400 and transparently retry once without temperature. Kimi's
server-managed omission and Opus 4.7+'s proactive strip stay in place —
this is the safety net for everything else.
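A condensed sketch of the retry, using the _is_unsupported_temperature_error helper this entry adds; the real call_llm carries more parameters and sits before the existing max_tokens/payment/auth retry ladder:

```python
def call_llm(client, **kwargs):
    try:
        return client.chat.completions.create(**kwargs)
    except Exception as exc:
        # On a concrete 'unsupported parameter: temperature' 400, retry once
        # without temperature; everything else re-raises unchanged.
        if "temperature" in kwargs and _is_unsupported_temperature_error(exc):
            retry_kwargs = {k: v for k, v in kwargs.items() if k != "temperature"}
            return client.chat.completions.create(**retry_kwargs)
        raise
```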
Changes:
- agent/auxiliary_client.py: add _is_unsupported_temperature_error helper;
wire into both sync and async call_llm paths before the existing
max_tokens/payment/auth retry ladder
- tests/agent/test_unsupported_temperature_retry.py: 19 tests covering
detector phrasings, sync + async retry, no-retry-without-temperature,
and non-temperature 400s not triggering the retry
Builds on PR #15620 (codex_responses fallback) which stripped temperature
up front for that one api_mode. This PR closes the gap for every other
provider/model combo via reactive retry.
Credit: retry approach and detector originate from @BlueBirdBack's PR #15578.
Co-authored-by: BlueBirdBack <BlueBirdBack@users.noreply.github.com>
update_model() recalculated threshold_tokens but left tail_token_budget
and max_summary_tokens at their __init__ values. When switching from a
200K model to 32K, the tail budget stayed at ~20K tokens (62% of 32K)
instead of the intended ~10%.
Adds budget recalculation in update_model() and 2 regression tests.
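A hedged sketch of the recalculation; only the ~10% tail target is stated above, so the ratio attributes and the summary budget's share are illustrative:

```python
def update_model(self, context_length: int) -> None:
    self.context_length = context_length
    self.threshold_tokens = int(context_length * self.threshold_ratio)
    # Previously missing: re-derive the dependent budgets so a 200K -> 32K
    # switch shrinks them instead of keeping stale absolute values.
    self.tail_token_budget = int(context_length * self.tail_ratio)       # ~10%
    self.max_summary_tokens = int(context_length * self.summary_ratio)
```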
gpt-5.x on the Codex Responses API sometimes degenerates and emits
Harmony-style `to=functions.<name> {json}` serialization as plain
assistant-message text instead of a structured `function_call` item.
The intent never makes it into `response.output` as a function_call,
so `tool_calls` is empty and `_normalize_codex_response()` returns
the leaked text as the final content. Downstream (e.g. delegate_task),
this surfaces as a confident-looking summary with `tool_trace: []`
because no tools actually ran — the Taiwan-embassy-email bug report.
Detect the pattern, scrub the content, and return finish_reason=
'incomplete' so the existing Codex-incomplete continuation path
(run_agent.py:11331, 3 retries) gets a chance to re-elicit a proper
function_call item. Encrypted reasoning items are preserved so the
model keeps its chain-of-thought on the retry.
Regression tests: leaked text triggers incomplete, real tool calls
alongside leak-looking text are preserved, clean responses pass
through unchanged.
Reported on Discord (gpt-5.4 / openai-codex).
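A rough sketch of the detection and scrub; the regex is an approximation of the Harmony `to=functions.<name> {json}` shape:

```python
import re

_LEAKED_TOOL_CALL_RE = re.compile(r"to=functions\.[\w.\-]+\s*\{")

def _scrub_leaked_tool_call(content: str, tool_calls: list) -> tuple[str, str]:
    """Return (content, finish_reason) after checking for leaked tool-call text."""
    if not tool_calls and content and _LEAKED_TOOL_CALL_RE.search(content):
        # No structured function_call reached response.output, so scrub the text
        # and report 'incomplete' to let the continuation path re-elicit a
        # proper function_call item (encrypted reasoning items are kept).
        return "", "incomplete"
    return content, "stop"
```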
## Problem
When a pooled HTTPS connection to the Bedrock runtime goes stale (NAT
timeout, VPN flap, server-side TCP RST, proxy idle cull), the next
Converse call surfaces as one of:
* botocore.exceptions.ConnectionClosedError / ReadTimeoutError /
EndpointConnectionError / ConnectTimeoutError
* urllib3.exceptions.ProtocolError
* A bare AssertionError raised from inside urllib3 or botocore
(internal connection-pool invariant check)
The agent loop retries the request 3x, but the cached boto3 client in
_bedrock_runtime_client_cache is reused across retries — so every
attempt hits the same dead connection pool and fails identically.
Only a process restart clears the cache and lets the user keep working.
The bare-AssertionError variant is particularly user-hostile because
str(AssertionError()) is an empty string, so the retry banner shows:
⚠️ API call failed: AssertionError
📝 Error:
with no hint of what went wrong.
## Fix
Add two helpers to agent/bedrock_adapter.py:
* is_stale_connection_error(exc) — classifies exceptions that
indicate dead-client/dead-socket state. Matches botocore
ConnectionError + HTTPClientError subtrees, urllib3
ProtocolError / NewConnectionError, and AssertionError
raised from a frame whose module name starts with urllib3.,
botocore., or boto3.. Application-level AssertionErrors are
intentionally excluded.
* invalidate_runtime_client(region) — per-region counterpart to
the existing reset_client_cache(). Evicts a single cached
client so the next call rebuilds it (and its connection pool).
Wire both into the Converse call sites:
* call_converse() / call_converse_stream() in
bedrock_adapter.py (defense-in-depth for any future caller)
* The two direct client.converse(**kwargs) /
client.converse_stream(**kwargs) call sites in run_agent.py
(the paths the agent loop actually uses)
On a stale-connection exception, the client is evicted and the
exception re-raised unchanged. The agent's existing retry loop then
builds a fresh client on the next attempt and recovers without
requiring a process restart.
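A condensed sketch of the classifier, assuming botocore/urllib3 are importable; the traceback walk mirrors the module-prefix rule described above:

```python
import botocore.exceptions
import urllib3.exceptions

_LIBRARY_PREFIXES = ("urllib3.", "botocore.", "boto3.")

def is_stale_connection_error(exc: BaseException) -> bool:
    if isinstance(exc, (botocore.exceptions.ConnectionError,
                        botocore.exceptions.HTTPClientError)):
        return True
    if isinstance(exc, (urllib3.exceptions.ProtocolError,
                        urllib3.exceptions.NewConnectionError)):
        return True
    if isinstance(exc, AssertionError):
        # Only library-internal AssertionErrors count: walk the traceback and
        # look for a frame whose module starts with urllib3./botocore./boto3.
        tb = exc.__traceback__
        while tb is not None:
            module = tb.tb_frame.f_globals.get("__name__", "")
            if module.startswith(_LIBRARY_PREFIXES):
                return True
            tb = tb.tb_next
    return False
```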
## Tests
tests/agent/test_bedrock_adapter.py gets three new classes (14 tests):
* TestInvalidateRuntimeClient — per-region eviction correctness;
non-cached region returns False.
* TestIsStaleConnectionError — classifies botocore
ConnectionClosedError / EndpointConnectionError /
ReadTimeoutError, urllib3 ProtocolError, library-internal
AssertionError (both urllib3.* and botocore.* frames), and
correctly ignores application-level AssertionError and
unrelated exceptions (ValueError, KeyError).
* TestCallConverseInvalidatesOnStaleError — end-to-end: stale
error evicts the cached client, non-stale error (validation)
leaves it alone, successful call leaves it cached.
All 116 tests in test_bedrock_adapter.py pass.
Signed-off-by: Andre Kurait <andrekurait@gmail.com>
Bedrock's aws_sdk auth_type had no matching branch in
resolve_provider_client(), causing it to fall through to the
"unhandled auth_type" warning and return (None, None). This broke
all auxiliary tasks (compression, memory, summarization) for Bedrock
users — the main conversation loop worked fine, but background
context management silently failed.
Add an aws_sdk branch that creates an AnthropicAuxiliaryClient via
build_anthropic_bedrock_client(), using boto3's default credential
chain (IAM roles, SSO, env vars, instance metadata). Default
auxiliary model is Haiku for cost efficiency.
Closes #13919
## Problem
`get_model_context_length()` in `agent/model_metadata.py` had a resolution
order bug that caused every Bedrock model to fall back to the 128K default
context length instead of reaching the static Bedrock table (200K for
Claude, etc.).
The root cause: `bedrock-runtime.<region>.amazonaws.com` is not listed in
`_URL_TO_PROVIDER`, so `_is_known_provider_base_url()` returned False.
The resolution order then ran the custom-endpoint probe (step 2) *before*
the Bedrock branch (step 4b), which:
1. Treated Bedrock as a custom endpoint (via `_is_custom_endpoint`).
2. Called `fetch_endpoint_model_metadata()` → `GET /models` on the
bedrock-runtime URL (Bedrock doesn't serve this shape).
3. Fell through to `return DEFAULT_FALLBACK_CONTEXT` (128K) at the
"probe-down" branch — never reaching the Bedrock static table.
Result: users on Bedrock saw 128K context for Claude models that
actually support 200K on Bedrock, causing premature auto-compression.
## Fix
Promote the Bedrock branch from step 4b to step 1b, so it runs *before*
the custom-endpoint probe at step 2. The static table in
`bedrock_adapter.py::get_bedrock_context_length()` is the authoritative
source for Bedrock (the ListFoundationModels API doesn't expose context
window sizes), so there's no reason to probe `/models` first.
The original step 4b is replaced with a one-line breadcrumb comment
pointing to the new location, to make the resolution-order docstring
accurate.
## Changes
- `agent/model_metadata.py`
- Add step 1b: Bedrock static-table branch (unchanged predicate, moved).
- Remove dead step 4b block, replace with breadcrumb comment.
- Update resolution-order docstring to include step 1b.
- `tests/agent/test_model_metadata.py`
- New `TestBedrockContextResolution` class (3 tests):
- `test_bedrock_provider_returns_static_table_before_probe`:
confirms `provider="bedrock"` hits the static table and does NOT
call `fetch_endpoint_model_metadata` (regression guard).
- `test_bedrock_url_without_provider_hint`: confirms the
`bedrock-runtime.*.amazonaws.com` host match works without an
explicit `provider=` hint.
- `test_non_bedrock_url_still_probes`: confirms the probe still
fires for genuinely-custom endpoints (no over-reach).
## Testing
pytest tests/agent/test_model_metadata.py -q
# 83 passed in 1.95s (3 new + 80 existing)
## Risk
Very low.
- Predicate is identical to the original step 4b — no behaviour change
for non-Bedrock paths.
- Original step 4b was dead code for the user-facing case (always hit
the 128K fallback first), so removing it cannot regress behaviour.
- Bedrock path now short-circuits before any network I/O — faster too.
- `ImportError` fall-through preserved so users without `boto3`
installed are unaffected.
## Related
- This is a prerequisite for accurate context-window accounting on
Bedrock — the fix for #14710 (stale-connection client eviction)
depends on correct context sizing to know when to compress.
Signed-off-by: Andre Kurait <andrekurait@gmail.com>
Bedrock model IDs use dots as namespace separators (anthropic.claude-opus-4-7,
us.anthropic.claude-sonnet-4-5-v1:0), not version separators.
normalize_model_name() was unconditionally converting all dots to hyphens,
producing invalid IDs that Bedrock rejects with HTTP 400/404.
This affected both the main agent loop (partially mitigated by
_anthropic_preserve_dots in run_agent.py) and all auxiliary client calls
(compression, session_search, vision, etc.) which go through
_AnthropicCompletionsAdapter and never pass preserve_dots=True.
Fix: add _is_bedrock_model_id() to detect Bedrock namespace prefixes
(anthropic., us., eu., ap., jp., global.) and skip dot-to-hyphen
conversion for these IDs regardless of the preserve_dots flag.
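A minimal sketch of the prefix check and how it plugs into normalization; the prefix list is the one given above:

```python
_BEDROCK_PREFIXES = ("anthropic.", "us.", "eu.", "ap.", "jp.", "global.")

def _is_bedrock_model_id(model_id: str) -> bool:
    return model_id.lower().startswith(_BEDROCK_PREFIXES)

def normalize_model_name(model_id: str, preserve_dots: bool = False) -> str:
    if preserve_dots or _is_bedrock_model_id(model_id):
        # Bedrock ids use dots as namespace separators
        # (us.anthropic.claude-sonnet-4-5-v1:0); hyphenating them yields ids
        # Bedrock rejects with HTTP 400/404.
        return model_id
    return model_id.replace(".", "-")
```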
Bug 3 — Stale OAuth token not detected in 'hermes model':
- _model_flow_anthropic used 'has_creds = bool(existing_key)' which treats
any non-empty token (including expired OAuth tokens) as valid.
- Added existing_is_stale_oauth check: if the only credential is an OAuth
token (sk-ant- prefix) with no valid cc_creds fallback, mark it stale
and force the re-auth menu instead of silently accepting a broken token.
Bug 4 — macOS Keychain credentials never read:
- Claude Code >=2.1.114 migrated from ~/.claude/.credentials.json to the
macOS Keychain under service 'Claude Code-credentials'.
- Added _read_claude_code_credentials_from_keychain() using the 'security'
CLI tool; read_claude_code_credentials() now tries Keychain first then
falls back to JSON file.
- Non-Darwin platforms return None from Keychain read immediately.
Tests:
- tests/agent/test_anthropic_keychain.py: 11 cases covering Darwin-only
guard, security command failures, JSON parsing, fallback priority.
- tests/hermes_cli/test_anthropic_model_flow_stale_oauth.py: 8 cases
covering stale OAuth detection, API key passthrough, cc_creds fallback.
Refs: #12905
Fixes NousResearch/hermes-agent#9813
Root cause: _is_oauth_token() only recognized sk-ant-* and eyJ* patterns,
but Claude Code OAuth tokens from CLAUDE_CODE_OAUTH_TOKEN use a cc- prefix.
Fix: add cc- prefix detection so these tokens route through Bearer auth.
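A minimal sketch of the widened check; the prefixes are those named above:

```python
def _is_oauth_token(token: str) -> bool:
    # sk-ant-* : Anthropic OAuth access tokens
    # eyJ*     : JWT-shaped tokens
    # cc-*     : Claude Code OAuth tokens from CLAUDE_CODE_OAUTH_TOKEN
    return token.startswith(("sk-ant-", "eyJ", "cc-"))
```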
Two small fixes triggered by a support report where the user saw a
cryptic 'HTTP 400 - Error 400 (Bad Request)!!1' (Google's GFE HTML
error page, not a real API error) on every gemini-2.5-pro request.
The underlying cause was an empty GOOGLE_API_KEY / GEMINI_API_KEY, but
nothing in our output made that diagnosable:
1. hermes_cli/dump.py: the api_keys section enumerated 23 providers but
omitted Google entirely, so users had no way to verify from 'hermes
dump' whether the key was set. Added GOOGLE_API_KEY and GEMINI_API_KEY
rows.
2. agent/gemini_native_adapter.py: GeminiNativeClient.__init__ accepted
an empty/whitespace api_key and stamped it into the x-goog-api-key
header, which made Google's frontend return a generic HTML 400 long
before the request reached the Generative Language backend. Now we
raise RuntimeError at construction with an actionable message
pointing at GOOGLE_API_KEY/GEMINI_API_KEY and aistudio.google.com.
Added a regression test that covers '', ' ', and None.
Concurrent Hermes processes (e.g. cron jobs) refreshing a Nous OAuth token
via resolve_nous_runtime_credentials() write the rotated tokens to auth.json.
The calling process's pool entry becomes stale, and the next refresh against
the already-rotated token triggers a 'refresh token reuse' revocation on
the Nous Portal.
_sync_nous_entry_from_auth_store() reads auth.json under the same lock used
by resolve_nous_runtime_credentials, and adopts the newer token pair before
refreshing the pool entry. This complements #15111 (which preserved the
obtained_at timestamps through seeding).
Partial salvage of #10160 by @konsisumer — only the agent/credential_pool.py
changes + the 3 Nous-specific regression tests. The PR also touched 10
unrelated files (Dockerfile, tips.py, various tool tests) which were
dropped as scope creep.
Regression tests:
- test_sync_nous_entry_from_auth_store_adopts_newer_tokens
- test_sync_nous_entry_noop_when_tokens_match
- test_nous_exhausted_entry_recovers_via_auth_store_sync
The least_used strategy selected entries via min(request_count) but
never incremented the counter. All entries stayed at count=0, so the
strategy degenerated to fill_first behavior with no actual load balancing.
Now increments request_count after each selection and persists the update.
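A sketch of the corrected selection; the entry shape is simplified and persistence is only noted in a comment:

```python
def select_least_used(entries: list[dict]) -> dict:
    entry = min(entries, key=lambda e: e.get("request_count", 0))
    # Without this increment every entry stays at count=0 and least_used
    # degenerates into fill_first.
    entry["request_count"] = entry.get("request_count", 0) + 1
    # The real pool also persists the updated count here.
    return entry
```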
The Copilot provider resolved context windows via models.dev static data,
which does not include account-specific models (e.g. claude-opus-4.6-1m
with 1M context). This adds the live Copilot /models API as a higher-
priority source for copilot/copilot-acp/github-copilot providers.
New helper get_copilot_model_context() in hermes_cli/models.py extracts
capabilities.limits.max_prompt_tokens from the cached catalog. Results
are cached in-process for 1 hour.
In agent/model_metadata.py, step 5a queries the live API before falling
through to models.dev (step 5b). This ensures account-specific models
get correct context windows while standard models still have a fallback.
Part 1 of #7731.
Refs: #7272
Raw GitHub tokens (gho_/github_pat_/ghu_) are now exchanged for
short-lived Copilot API tokens via /copilot_internal/v2/token before
being used as Bearer credentials. This is required to access
internal-only models (e.g. claude-opus-4.6-1m with 1M context).
Implementation:
- exchange_copilot_token(): calls the token exchange endpoint with
in-process caching (dict keyed by SHA-256 fingerprint), refreshed
2 minutes before expiry. No disk persistence — gateway is long-running
so in-memory cache is sufficient.
- get_copilot_api_token(): convenience wrapper with graceful fallback —
returns exchanged token on success, raw token on failure.
- Both callers (hermes_cli/auth.py and agent/credential_pool.py) now
pipe the raw token through get_copilot_api_token() before use.
12 new tests covering exchange, caching, expiry, error handling,
fingerprinting, and caller integration. All 185 existing copilot/auth
tests pass.
Part 2 of #7731.
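A hedged sketch of the exchange-and-cache flow; the api.github.com host and the token/expires_at response fields are assumptions, this entry only names the /copilot_internal/v2/token path, the SHA-256 fingerprint keying, and the 2-minute early refresh:

```python
import hashlib
import time
import requests

_token_cache: dict[str, tuple[str, float]] = {}   # fingerprint -> (token, expires_at)

def exchange_copilot_token(raw_token: str) -> str:
    fingerprint = hashlib.sha256(raw_token.encode()).hexdigest()
    cached = _token_cache.get(fingerprint)
    if cached and cached[1] - time.time() > 120:   # refresh 2 minutes before expiry
        return cached[0]
    resp = requests.get(
        "https://api.github.com/copilot_internal/v2/token",   # assumed host
        headers={"Authorization": f"token {raw_token}"},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    _token_cache[fingerprint] = (data["token"], float(data["expires_at"]))
    return data["token"]

def get_copilot_api_token(raw_token: str) -> str:
    # Graceful fallback: hand back the raw token if the exchange fails.
    try:
        return exchange_copilot_token(raw_token)
    except Exception:
        return raw_token
```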
Pass an explicit HOME into Copilot ACP child processes so delegated ACP runs do not fail when the ambient environment is missing HOME.
Prefer the per-profile subprocess home when available, then fall back to HOME, expanduser('~'), pwd.getpwuid(...), and /home/openclaw. Add regression tests for both profile-home preference and clean HOME fallback.
Refs #11068.
Two narrow fixes motivated by #15099.
1. _seed_from_singletons() was dropping obtained_at, agent_key_obtained_at,
expires_in, and friends when seeding device_code pool entries from the
providers.nous singleton. Fresh credentials showed up with
obtained_at=None, which broke downstream freshness-sensitive consumers
(self-heal hooks, pool pruning by age) — they treated just-minted
credentials as older than they actually were and evicted them.
2. When the Nous Portal OAuth 2.1 server returns invalid_grant with
'Refresh token reuse detected' in the error_description, rewrite the
message to explain the likely cause (an external process consumed the
rotated RT without persisting it back) and the mitigation. The generic
reuse message led users to report this as a Hermes persistence bug when
the actual trigger was typically a third-party monitoring script calling
/api/oauth/token directly. Non-reuse errors keep their original server
description untouched.
Closes #15099.
Regression tests:
- tests/agent/test_credential_pool.py::test_nous_seed_from_singletons_preserves_obtained_at_timestamps
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_token_reuse_detection_surfaces_actionable_message
- tests/hermes_cli/test_auth_nous_provider.py::test_refresh_non_reuse_error_keeps_original_description
Google AI Studio's free tier (<= 250 req/day for gemini-2.5-flash) is
exhausted in a handful of agent turns, so the setup wizard now refuses
to wire up Gemini when the supplied key is on the free tier, and the
runtime 429 handler appends actionable billing guidance.
Setup-time probe (hermes_cli/main.py):
- `_model_flow_api_key_provider` fires one minimal generateContent call
when provider_id == 'gemini' and classifies the response as
free/paid/unknown via x-ratelimit-limit-requests-per-day header or
429 body containing 'free_tier'.
- Free -> print block message, refuse to save the provider, return.
- Paid -> 'Tier check: paid' and proceed.
- Unknown (network/auth error) -> 'could not verify', proceed anyway.
Runtime 429 handler (agent/gemini_native_adapter.py):
- `gemini_http_error` appends billing guidance when the 429 error body
mentions 'free_tier', catching users who bypass setup by putting
GOOGLE_API_KEY directly in .env.
Tests: 21 unit tests for the probe + error path, 4 tests for the
setup-flow block. All 67 existing gemini tests still pass.
PR #14935 added a Codex-aware context resolver but only new lookups
hit the live /models probe. Users who had run Hermes on gpt-5.5 / 5.4
BEFORE that PR already had the wrong value (e.g. 1,050,000 from
models.dev) persisted in ~/.hermes/context_length_cache.yaml, and the
cache-first lookup in get_model_context_length() returns it forever.
Symptom (reported in the wild by Ludwig, min heo, Gaoge on current
main at 6051fba9d, which is AFTER #14935):
* Startup banner shows context usage against 1M
* Compression fires late and then OpenAI hard-rejects with
'context length will be reduced from 1,050,000 to 128,000'
around the real 272k boundary.
Fix: when the step-1 cache returns a value for an openai-codex lookup,
check whether it's >= 400k. Codex OAuth caps every slug at 272k (live
probe values) so anything at or above 400k is definitionally a
pre-#14935 leftover. Drop that entry from the on-disk cache and fall
through to step 5, which runs the live /models probe and repersists
the correct value (or 272k from the hardcoded fallback if the probe
fails). Non-Codex providers and legitimately-cached Codex entries at
272k are untouched.
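A condensed sketch of the step-1 gate; the 400K threshold and 272K cap are the values stated above, and the cache invalidation is noted in a comment rather than implemented:

```python
CODEX_STALE_THRESHOLD = 400_000

def _use_cached_context(provider: str, cached: int | None) -> int | None:
    if cached is None:
        return None
    if provider == "openai-codex" and cached >= CODEX_STALE_THRESHOLD:
        # Codex OAuth caps every slug at 272K, so anything >= 400K can only be
        # a pre-#14935 leftover. The real code drops the on-disk entry via
        # _invalidate_cached_context_length() and falls through to the live probe.
        return None
    return cached
```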
Changes:
- agent/model_metadata.py:
* _invalidate_cached_context_length() — drop a single entry from
context_length_cache.yaml and rewrite the file.
* Step-1 cache check in get_model_context_length() now gates
provider=='openai-codex' entries >= 400k through invalidation
instead of returning them.
Tests (3 new in TestCodexOAuthContextLength):
- stale 1.05M Codex entry is dropped from disk AND re-resolved
through the live probe to 272k; unrelated cache entries survive.
- fresh 272k Codex entry is respected (no probe call, no invalidation).
- non-Codex 1M entries (e.g. anthropic/claude-opus-4.6 on OpenRouter)
are unaffected — the guard is strictly scoped to openai-codex.
Full tests/agent/test_model_metadata.py: 88 passed.
Gemini's Schema validator requires every `enum` entry to be a string,
even when the parent `type` is integer/number/boolean. Discord's
`auto_archive_duration` parameter (`type: integer, enum: [60, 1440,
4320, 10080]`) tripped this on every request that shipped the full
tool catalog to generativelanguage.googleapis.com, surfacing as
`Gateway: Non-retryable client error: Gemini HTTP 400 (INVALID_ARGUMENT)
Invalid value ... (TYPE_STRING), 60` and aborting the turn.
Sanitize by dropping the `enum` key when the declared type is numeric
or boolean and any entry is non-string. The `type` and `description`
survive, so the model still knows the allowed values; the tool handler
keeps its own runtime validation. Other providers (OpenAI,
OpenRouter, Anthropic) are unaffected — the sanitizer only runs for
native Gemini / cloudcode adapters.
Reported by @selfhostedsoul on Discord with hermes debug share.
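A sketch of the sanitizer; the schema handling is simplified to a single flat level:

```python
def _sanitize_enum_for_gemini(schema: dict) -> dict:
    schema = dict(schema)
    enum_values = schema.get("enum")
    if enum_values and schema.get("type") in ("integer", "number", "boolean"):
        if any(not isinstance(value, str) for value in enum_values):
            # Gemini requires every enum entry to be a string. Dropping the key
            # keeps 'type' and 'description', so the model still sees the intent
            # and the tool handler keeps its own runtime validation.
            schema.pop("enum")
    return schema
```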
Keep auxiliary provider resolution aligned with the switch and persisted main-provider paths when models.dev returns github-copilot slugs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Auxiliary tasks (session_search, flush_memories, approvals, compression,
vision, etc.) that route to a named custom provider declared under
config.yaml 'providers:' with 'api_mode: anthropic_messages' were
silently building a plain OpenAI client and POSTing to
{base_url}/chat/completions, which returns 404 on Anthropic-compatible
gateways that only expose /v1/messages.
Two gaps caused this:
1. hermes_cli/runtime_provider.py::_get_named_custom_provider — the
providers-dict branch (new-style) returned only name/base_url/api_key/
model and dropped api_mode. The legacy custom_providers-list branch
already propagated it correctly. The dict branch now parses and
returns api_mode via _parse_api_mode() in both match paths.
2. agent/auxiliary_client.py::resolve_provider_client — the named
custom provider block at ~L1740 ignored custom_entry['api_mode']
and unconditionally built an OpenAI client (only wrapping for
Codex/Responses). It now mirrors _try_custom_endpoint()'s three-way
dispatch: anthropic_messages → AnthropicAuxiliaryClient (async wrapped
in AsyncAnthropicAuxiliaryClient), codex_responses → CodexAuxiliaryClient,
otherwise plain OpenAI. An explicit task-level api_mode override
still wins over the provider entry's declared api_mode.
Fixes #15033
Tests: tests/agent/test_auxiliary_named_custom_providers.py gains a
TestProvidersDictApiModeAnthropicMessages class covering
- providers-dict preserves valid api_mode
- invalid api_mode values are dropped
- missing api_mode leaves the entry unchanged (no regression)
- resolve_provider_client returns (Async)AnthropicAuxiliaryClient for
api_mode=anthropic_messages
- full chain via get_text_auxiliary_client / get_async_text_auxiliary_client
with an auxiliary.<task> override
- providers without api_mode still use the OpenAI-wire path
Follow-up to PR #14533 — applies the same _resolve_requests_verify()
treatment to the one requests.get() site the PR missed (Codex OAuth
chatgpt.com /models probe). Keeps all seven requests.get() callsites
in model_metadata.py consistent so HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE /
SSL_CERT_FILE are honored everywhere.
Co-authored-by: teknium1 <teknium@hermes-agent>
- hermes_cli/auth.py: add _default_verify() with macOS Homebrew certifi
fallback (mirrors weixin 3a0ec1d93). Extend env var chain to include
REQUESTS_CA_BUNDLE so one env var works across httpx + requests paths.
- agent/model_metadata.py: add _resolve_requests_verify() reading
HERMES_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE in priority
order. Apply explicit verify= to all 6 requests.get callsites.
- Tests: 18 new unit tests + autouse platform pin on existing
TestResolveVerifyFallback to keep its "returns True" assertions
platform-independent.
Empirically verified against self-signed HTTPS server: requests honors
REQUESTS_CA_BUNDLE only; httpx honors SSL_CERT_FILE only. Hermes now
honors all three everywhere.
Triggered by Discord reports — Nous OAuth SSL failure on macOS
Homebrew Python; custom provider self-signed cert ignored despite
REQUESTS_CA_BUNDLE set in env.
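A minimal sketch of _resolve_requests_verify(); the returned value is passed as verify= to each requests.get call:

```python
import os

def _resolve_requests_verify() -> str | bool:
    # Priority order from this entry: HERMES_CA_BUNDLE, then REQUESTS_CA_BUNDLE,
    # then SSL_CERT_FILE; True falls back to the default trust store.
    for var in ("HERMES_CA_BUNDLE", "REQUESTS_CA_BUNDLE", "SSL_CERT_FILE"):
        path = os.environ.get(var)
        if path:
            return path
    return True
```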
Manual /compress crashed with 'LCMEngine' object has no attribute
'_align_boundary_forward' when any context-engine plugin was active.
The gateway handler reached into _align_boundary_forward and
_find_tail_cut_by_tokens on tmp_agent.context_compressor, but those
are ContextCompressor-specific — not part of the generic ContextEngine
ABC — so every plugin engine (LCM, etc.) raised AttributeError.
- Add optional has_content_to_compress(messages) to ContextEngine ABC
with a safe default of True (always attempt).
- Override it in the built-in ContextCompressor using the existing
private helpers — preserves exact prior behavior for 'compressor'.
- Rewrite gateway /compress preflight to call the ABC method, deleting
the private-helper reach-in.
- Add focus_topic to the ABC compress() signature. Make _compress_context
retry without focus_topic on TypeError so older strict-sig plugins
don't crash on manual /compress <focus>.
- Regression test with a fake ContextEngine subclass that only
implements the ABC (mirrors LCM's surface).
Reported by @selfhostedsoul (Discord, Apr 22).
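A minimal sketch of the new optional hook on the ABC; compress() is reduced to its signature:

```python
from abc import ABC, abstractmethod

class ContextEngine(ABC):
    @abstractmethod
    def compress(self, messages: list, focus_topic: str | None = None) -> list:
        ...

    def has_content_to_compress(self, messages: list) -> bool:
        # Safe default: always attempt. The built-in ContextCompressor overrides
        # this with its private boundary/tail-cut helpers, so plugin engines
        # (LCM, etc.) never need to implement them.
        return True
```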
OpenRouter returns a 404 with the specific message
'No endpoints available matching your guardrail restrictions and data
policy. Configure: https://openrouter.ai/settings/privacy'
when a user's account-level privacy setting excludes the only endpoint
serving a model (e.g. DeepSeek V4 Pro, which today is hosted only by
DeepSeek's own endpoint that may log inputs).
Before this change we classified it as model_not_found, which was
misleading (the model exists) and triggered provider fallback (useless —
the same account setting applies to every OpenRouter call).
Now it classifies as a new FailoverReason.provider_policy_blocked with
retryable=False, should_fallback=False. The error body already contains
the fix URL, so the user still gets actionable guidance.
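An illustrative sketch of the classification branch; the FailoverReason enum is reduced to the two members involved:

```python
from enum import Enum, auto

class FailoverReason(Enum):
    model_not_found = auto()
    provider_policy_blocked = auto()

def classify_openrouter_404(body: str) -> tuple[FailoverReason, bool, bool]:
    """Return (reason, retryable, should_fallback) for an OpenRouter 404 body."""
    if "guardrail restrictions and data policy" in body:
        # The model exists; an account-level privacy setting excludes its only
        # endpoint, so neither retry nor provider fallback can help. The body
        # already carries the settings URL, so the user gets the fix directly.
        return FailoverReason.provider_policy_blocked, False, False
    return FailoverReason.model_not_found, False, True
```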
On ChatGPT Codex OAuth every gpt-5.x slug actually caps at 272,000 tokens,
but Hermes was resolving gpt-5.5 / gpt-5.4 to 1,050,000 (from models.dev)
because openai-codex aliases to the openai entry there. At 1.05M the
compressor never fires and requests hard-fail with 'context window
exceeded' around the real 272k boundary.
Verified live against chatgpt.com/backend-api/codex/models:
gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2-codex,
gpt-5.2, gpt-5.1-codex-max → context_window = 272000
Changes:
- agent/model_metadata.py:
* _fetch_codex_oauth_context_lengths() — probe the Codex /models
endpoint with the OAuth bearer token and read context_window per
slug (1h in-memory TTL).
* _resolve_codex_oauth_context_length() — prefer the live probe,
fall back to hardcoded _CODEX_OAUTH_CONTEXT_FALLBACK (all 272k).
* Wire into get_model_context_length() when provider=='openai-codex',
running BEFORE the models.dev lookup (which returns 1.05M). Result
persists via save_context_length() so subsequent lookups skip the
probe entirely.
* Fixed the now-wrong comment on the DEFAULT_CONTEXT_LENGTHS gpt-5.5
entry (400k was never right for Codex; it's the catch-all for
providers we can't probe live).
Tests (4 new in TestCodexOAuthContextLength):
- fallback table used when no token is available (no models.dev leakage)
- live probe overrides the fallback
- probe failure (non-200) falls back to hardcoded 272k
- non-codex providers (openrouter, direct openai) unaffected
Non-codex context resolution is unchanged — the Codex branch only fires
when provider=='openai-codex'.