hermes-agent

mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-04-28 06:51:16 +08:00

Author	SHA1	Message	Date
Erosika	4a9ac5c355	fix(memory): drop scrub from interim commentary + final response Same layering concern as the persisted-assistant scrub already removed: _emit_interim_assistant_message and the final_response return path were mutating model output broadly. Streaming scrubber covers real leaks delta-by-delta; these post-stream scrubs were redundant.	2026-04-27 12:37:33 -07:00
Erosika	e553f6f3e4	fix(memory): narrow scrub surface to known wrapper boundaries Reviewer pushback on the original boundary-hardening commits — three overreach points pulled plugin-specific policy into shared core paths: 1. gateway/run.py hardcoded a '## Honcho Context' literal split for vision-LLM output. Plugin-format heading in framework code; could truncate legitimate output naturally containing that header. Drop the literal split; keep generic sanitize_context (the wrapper strip is plugin-agnostic). Plugin-specific cleanup belongs at the provider boundary, not the shared gateway path. 2. run_agent.run_conversation scrubbed user_message and persist_user_message before the conversation loop. User text is sacred — if a user types a literal <memory-context> tag we must not silently delete it. The producer (build_memory_context_block) is the only legitimate emitter; user input should never need the reverse op. 3. _build_assistant_message scrubbed model output before persistence. Same hazard: would silently mutate legitimate documentation/code the model emits containing the literal markers. The streaming scrubber catches real leaks delta-by-delta before content is concatenated; persist-time scrub was redundant belt-and-suspenders. 4. _fire_stream_delta stripped leading newlines from every delta unless a paragraph break flag was set. Mid-stream '\n' is legitimate markdown — lists, code fences, paragraph breaks — and chunk boundaries are arbitrary. Narrow lstrip to the very first delta of the stream only (so stale provider preamble still gets cleaned on turn start, but mid-stream formatting survives). Plus: build_memory_context_block now logs a warning when its defensive sanitize_context strips something — surfaces buggy providers returning pre-wrapped text instead of silently double-fencing. Net architectural change: scrub surface collapses from 8 sites to 3 (StreamingContextScrubber on output deltas, plugin→backend send, build_memory_context_block input-validation). Plugin-specific strings stay out of shared runtime paths. User input and persisted assistant output are no longer mutated. Tests: rescoped TestMemoryContextSanitization (helper-correctness only, no source-inspection of removed call sites), updated vision tests to drop '## Honcho Context' literal-split assertions, updated _build_assistant_message persistence test to assert preservation. Added: cross-turn scrubber reset, build_memory_context_block warn-on- violation, mid-stream newline preservation (plain + code fence).	2026-04-27 12:37:33 -07:00
Erosika	3b2edb347d	fix(gateway): scrub memory-context leaks from vision auto-analysis output fixes #5719 The auxiliary vision LLM called by gateway._enrich_message_with_vision can echo its injected Honcho system prompt back into the image description. That description gets embedded verbatim into the enriched user message, so recalled memory (personal facts, dialectic output) surfaces into a user-visible bubble. Strips both forms of leak before embedding: - <memory-context>...</memory-context> fenced blocks (sanitize_context) - trailing '## Honcho Context' sections (header + everything after) Plus regression tests: - tests/agent/test_streaming_context_scrubber.py — 13 tests on the stateful scrubber (whole block, split tags, false-positive partial tags, unterminated span, reset, case-insensitivity) - tests/run_agent/test_run_agent_codex_responses.py — 2 new tests on _fire_stream_delta covering the realistic 7-chunk leak scenario and the cross-turn scrubber reset - tests/gateway/test_vision_memory_leak.py — 4 tests covering the vision auto-analysis boundary (clean pass-through, '## Honcho Context' header, fenced block, both patterns together)	2026-04-27 12:37:33 -07:00
dontcallmejames	f1ba4014e1	fix: harden memory-context leak boundaries	2026-04-27 12:37:33 -07:00
dontcallmejames	39713ba2ae	fix: strip leaked memory context from commentary	2026-04-27 12:37:33 -07:00
hermes-agent-dhabibi	aa53fb661a	fix(copilot): mark native image requests as vision Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>	2026-04-27 08:35:50 -07:00
hermes-agent-dhabibi	8402ba150e	fix(copilot): send vision header for Copilot vision requests Thread a vision-request flag through auxiliary provider resolution so Copilot clients can include Copilot-Vision-Request only for vision tasks. This preserves normal text requests while ensuring Copilot vision payloads reach the vision-capable route. Add regression coverage for Copilot vision routing and keep cached text and vision clients separate so a text client without the header is not reused for vision. Co-authored-by: dhabibi <9087935+dhabibi@users.noreply.github.com>	2026-04-27 08:35:50 -07:00
Teknium	008860a23f	fix(approval): close remaining prompt_toolkit deadlock vectors (#15216 ) PR #13734 fixed the concurrent-tool-executor vector (ThreadPoolExecutor workers didn't inherit the CLI's TLS approval callback). Two vectors remained that could still land in the deadlocking input() fallback: 1. _spawn_background_review spawns a raw threading.Thread with no approval callback installed, so any dangerous-command guard the review agent trips falls back to input() -> deadlock against the parent's prompt_toolkit TUI (same class as delegate_task subagents, fixed in `023b1bff1` / #15491). Install a _bg_review_auto_deny callback at thread start, clear on finally. 2. prompt_dangerous_approval's fallback unconditionally spawned a daemon thread calling input() when approval_callback was None. That fallback can never succeed under prompt_toolkit because the user's Enter goes to pt's raw-mode stdin capture. Detect an active pt Application via get_app_or_none() and fail closed (deny + log) instead, so future threads that forget to install a callback degrade gracefully instead of hanging 60s invisibly. Regression guards: - tests/run_agent/test_background_review.py verifies the review worker thread sees a callable auto-deny callback mid-run and that the slot is cleared in the finally block. - tests/tools/test_approval.py TestFailClosedUnderPromptToolkit verifies prompt_dangerous_approval returns 'deny' fast under a mocked pt Application, and that a real callback still wins over the guard.	2026-04-27 06:42:32 -07:00
luyao618	8ad29a938a	fix(agent): restrict background review agent to memory and skills toolsets The background skill/memory review agent was created without toolset restrictions, inheriting the full default tool set. This allowed it to use terminal, send_message, delegate_task, and other tools outside its intended scope, potentially performing unrelated side effects after skill creation. Restrict the review agent to only memory and skills toolsets by passing enabled_toolsets=['memory', 'skills'] during AIAgent construction. Fixes #15204	2026-04-27 06:41:23 -07:00
Teknium	ec671c4154	feat(image-input): native multimodal routing based on model vision capability (#16506 ) * feat(image-input): native multimodal routing based on model vision capability Attach user-sent images as OpenAI-style content parts on the user turn when the active model supports native vision, so vision-capable models see real pixels instead of a lossy text description from vision_analyze. Routing decision (agent/image_routing.py::decide_image_input_mode): agent.image_input_mode = auto \| native \| text (default: auto) In auto mode: - If auxiliary.vision.provider/model is explicitly configured, keep the text pipeline (user paid for a dedicated vision backend). - Else if models.dev reports supports_vision=True for the active provider/model, attach natively. - Else fall back to text (current behaviour). Call sites updated: gateway/run.py (all messaging platforms), tui_gateway (dashboard/Ink), cli.py (interactive /attach + drag-drop). run_agent.py changes: - _prepare_anthropic_messages_for_api now passes image parts through unchanged when the model supports vision — the Anthropic adapter translates them to native image blocks. Previous behaviour (vision_analyze → text) only runs for non-vision Anthropic models. - New _prepare_messages_for_non_vision_model mirrors the same contract for chat.completions and codex_responses paths, so non-vision models on any provider get text-fallback instead of failing at the provider. - New _model_supports_vision() helper reads models.dev caps. vision_analyze description rewritten: positions it as a tool for images NOT already visible in the conversation (URLs, tool output, deeper inspection). Prevents the model from redundantly calling it on images already attached natively. Config default: agent.image_input_mode = auto. Tests: 35 new (test_image_routing.py + test_vision_aware_preprocessing.py), all existing tests that reference _prepare_anthropic_messages_for_api still pass (198 targeted + new tests green). * feat(image-input): size-cap + resize oversized images, charge image tokens in compressor Two follow-ups that make the native image routing safer for long / heavy sessions: 1) Oversize handling in build_native_content_parts: - 20 MB ceiling per image (matches vision_tools._MAX_BASE64_BYTES, the most restrictive provider — Gemini inline data). - Delegates to vision_tools._resize_image_for_vision (Pillow-based, already battle-tested) to downscale to 5 MB first-try. - If Pillow is missing or resize still overshoots, the image is dropped and reported back in skipped[]; caller falls back to text enrichment for that image. 2) Image-token accounting in context_compressor: - New _IMAGE_TOKEN_ESTIMATE = 1600 (matches Claude Code's constant; within the realistic range for Anthropic/GPT-4o/Gemini billing). - _content_length_for_budget() helper: sums text-part lengths and charges _IMAGE_CHAR_EQUIVALENT (1600 * 4 chars) per image/image_url/ input_image part. Base64 payload inside image_url is NOT counted as chars — dimensions don't matter, only image-presence. - Both tail-cut sites (_prune_old_tool_results L527 and _find_tail_cut_by_tokens L1126) now call the helper so multi-image conversations don't slip past compression budget. Tests: 9 new in test_image_routing.py (oversize triggers resize, resize-fails-returns-None, oversize-skipped-reported), 11 new in test_compressor_image_tokens.py (flat charge per image, multiple images, Responses-API / Anthropic-native / OpenAI-chat shapes, no-inflation on raw base64, bounds-check on the constant, integration test that an image-heavy tail actually gets trimmed). * fix(image-input): replace blanket 20MB ceiling with empirically-verified per-provider limits The previous commit imposed a hardcoded 20 MB base64 ceiling on all providers, triggering auto-resize on anything larger. This was wrong in both directions: * Too loose for Anthropic — actual limit is 5 MB (returns HTTP 400 'image exceeds 5 MB maximum' above that). * Too strict for OpenAI / Codex / OpenRouter — accept 49 MB+ without complaint (empirically verified April 2026 with progressive PNG sizes). New behaviour: * _PROVIDER_BASE64_CEILING table: only anthropic and bedrock have a ceiling (5 MB, since bedrock-on-Claude shares Anthropic's decoder). * Providers NOT in the table get no ceiling — images attach at native size and we trust the provider to return its own error if it disagrees. A provider-specific 400 message is clearer than us guessing wrong and silently degrading image quality. * build_native_content_parts() gains a keyword-only provider arg; gateway/CLI/TUI pass the active provider so Anthropic users get auto-resize protection while OpenAI users don't pay it. * Resize target dropped from 5 MB to 4 MB to slide safely under Anthropic's boundary with header overhead. Empirical measurements (direct API, no Hermes in the loop): image b64 anthropic openrouter/gpt5.5 codex-oauth/gpt5.5 0.19 MB ✓ ✓ ✓ 12.37 MB ✗ 400 5MB ✓ ✓ 23.85 MB ✗ 400 5MB ✓ ✓ 49.46 MB ✗ 413 ✓ ✓ Tests: rewrote TestOversizeHandling (5 tests): no-ceiling pass-through, Anthropic resize fires, Anthropic skip on resize-fail, build_native_parts routes ceiling by provider, unknown provider gets no ceiling. All 52 targeted tests pass. * refactor(image-input): attempt native, shrink-and-retry on provider reject Replace proactive per-provider size ceilings with a reactive shrink path on the provider's actual rejection. All providers now attempt native full-size attachment first; if the provider returns an image-too-large error, the agent silently shrinks and retries once. Why the previous design was wrong: hardcoding provider ceilings (anthropic=5MB, others=unlimited) meant OpenAI users on a 10MB image paid no tax, but Anthropic users lost quality on anything >5MB even though the empirical behaviour at provider-reject time is the same (shrink + retry). Baking the table into the routing layer also requires updating Hermes every time a provider's limit changes. Reactive design: - image_routing.py: _file_to_data_url encodes native size, no ceiling. build_native_content_parts drops its provider kwarg. - error_classifier.py: new FailoverReason.image_too_large + pattern match ("image exceeds", "image too large", etc.) checked BEFORE context_overflow so Anthropic's 5MB rejection lands in the right bucket. - run_agent.py: new _try_shrink_image_parts_in_messages walks api messages in-place, re-encodes oversized data: URL image parts through vision_tools._resize_image_for_vision to fit under 4MB, handles both chat.completions (dict image_url) and Responses (string image_url) shapes, ignores http URLs (provider-fetched). New image_shrink_retry_attempted flag in the retry loop fires the shrink exactly once per turn after credential-pool recovery but before auth retries. E2E verified live against Anthropic claude-sonnet-4-6: - 17.9MB PNG (23.9MB b64) attached at native size - Anthropic returns 400 "image exceeds 5 MB maximum" - Agent logs '📐 Image(s) exceeded provider size limit — shrank and retrying...' - Retry succeeds, correct response delivered in 6.8s total. Tests: 12 new (8 shrink-helper shapes + 4 classifier signals), replaces 5 proactive-ceiling tests with 3 simpler 'native attach works' tests. 181 targeted tests pass. test_enum_members_exist in test_error_classifier.py updated for the new enum value.	2026-04-27 06:27:59 -07:00
Teknium	ee1a07f9e9	fix(agent): block cross-provider reasoning leak to DeepSeek/Kimi (#15748 ) (#16500 ) On provider switches mid-session (e.g. MiniMax -> DeepSeek), the source assistant turn carries a 'reasoning' field written by the prior provider but no 'reasoning_content' key. _copy_reasoning_content_for_api would promote that foreign 'reasoning' to 'reasoning_content' on the outbound DeepSeek request, leaking a cross-provider chain of thought and in practice causing HTTP 400. DeepSeek's own _build_assistant_message always pins reasoning_content='' at creation time for tool-call turns, so the shape (reasoning set, reasoning_content absent, tool_calls present) is unreachable from same-provider DeepSeek history — it can only come from a prior provider. Pad with '' in that case instead of promoting. Healthy same-provider 'reasoning' promotion (no tool_calls, or on providers that do not require the empty-string pin) is unchanged.	2026-04-27 04:06:23 -07:00
Tosko4	e85b752516	fix: signal compression boundary to context engine When _compress_context rotates session_id (compression split), fire on_session_start(new_sid, boundary_reason="compression", old_session_id=<old>) on the active context engine. Plugin engines (e.g. hermes-lcm) use this to preserve DAG lineage across the rollover instead of re-initializing fresh per-session state. Built-in ContextCompressor.on_session_start accepts **kwargs and ignores them — no behavior change for default users. Closes hermes-lcm#68 symptom: after Hermes compressed and minted a new physical session, LCM was treating the split as a fresh /new and losing continuity (compression_count: 1, store_messages: 0, dag_nodes: 0). Credit: @Tosko4 (PR #13370) — minimized scope to the boundary_reason signal only; the broader session-lifecycle refactor will be taken in separate PRs if justified by concrete plugin need.	2026-04-26 19:07:18 -07:00
Teknium	45bfcb9e71	test: update bare-agent helper for live-runtime attrs added by #16099 Background review fork now inherits session_id, credential_pool, and status_callback from the parent (added in #16099 after this PR was written). Extend the bare-agent helper so the regression test keeps reaching the cleanup assertions instead of failing in the runtime resolver. Signed-off-by: Teknium <8425893+teknium1@users.noreply.github.com>	2026-04-26 12:45:39 -07:00
MRHwick	2d86e97a7e	fix(run_agent): shut down background review memory providers Temporary background review agents can initialize Hindsight-backed memory clients, but close() alone skips provider teardown. Shut the memory provider down before closing so aiohttp sessions do not leak at process exit. Made-with: Cursor	2026-04-26 12:45:39 -07:00
Teknium	76042f5867	feat(review): class-first skill review prompt (#16026 ) The background skill-review prompt (spawned after N user turns) now instructs the reviewer to SURVEY existing skills first, identify the CLASS of task, and PREFER updating/generalizing an existing skill over creating a new narrow one. This reduces near-duplicate skill accumulation at the source. Catches the common failure mode where repeated tasks of the same class each spawn their own specific skill ("fix-my-tauri-error", "fix-my-electron-error") instead of a single class-level skill ("desktop-app-build-troubleshooting"). Applied to both _SKILL_REVIEW_PROMPT and the Skills half of _COMBINED_REVIEW_PROMPT. Memory-only review prompt unchanged. Groundwork for the Curator feature (issue #7816) — the creation-side fix. Curator handles the retirement/consolidation side in a follow-up PR. Tests assert the behavioral instructions are present (survey, class, update- over-create, overlap-flagging, opt-out clause) rather than snapshotting the full prompt text.	2026-04-26 05:17:10 -07:00
akhater	ac57114284	fix(agent): support Azure OpenAI gpt-5.x on chat/completions endpoint Azure OpenAI exposes an OpenAI-compatible endpoint at `{resource}.openai.azure.com/openai/v1` that accepts the standard `openai` Python client. Two issues prevented gpt-5.x models from working: 1. `_max_tokens_param()` only sent `max_completion_tokens` for `api.openai.com` URLs. Azure also requires `max_completion_tokens` for gpt-5.x models. 2. The `codex_responses` upgrade gate unconditionally upgraded gpt-5.x to Responses API. Azure does NOT support the Responses API — it serves gpt-5.x on the regular `/chat/completions` path, causing a 404. Fix: add `_is_azure_openai_url()` that matches `openai.azure.com` URLs. - `_max_tokens_param()` now returns `max_completion_tokens` for Azure. - The `codex_responses` upgrade gate skips Azure so gpt-5.x stays on `chat_completions` where Azure actually serves it. - The fallback-provider api_mode picker also recognises Azure and stays on chat_completions. - Tests cover max_tokens routing, api_mode behaviour, and URL detection. gpt-4.x models on Azure are unaffected (already used chat_completions + max_tokens, which Azure accepts for those models). Salvage of PR #10086 — rewritten against current main where the codex_responses upgrade gate gained copilot-acp / explicit-api_mode exclusions.	2026-04-25 18:48:43 -07:00
nerijusas	81e01f6ee9	fix(agent): preserve Codex message items for replay	2026-04-25 18:22:06 -07:00
FocusFlow Dev	ad0ac89478	fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages Previously _copy_reasoning_content_for_api only padded reasoning_content when the assistant message had tool_calls. DeepSeek V4 thinking mode requires the field on every assistant turn, including plain text replies without tool_calls. - Remove the 'source_msg.get("tool_calls") and' guard - Update test: plain assistant turns now get padded for DeepSeek/Kimi Fixes #15213	2026-04-26 07:47:13 +08:00
kshitij	648b89911f	fix: use output_text for assistant message content in Codex Responses API (#15690 ) The Codex Responses API rejects input_text inside assistant messages — only output_text and refusal are valid content types for assistant role. _chat_content_to_responses_parts() previously hardcoded all text content to input_text regardless of the message role. When an assistant message had list-format content (multimodal or structured), this produced invalid input_text parts that the API rejected with: Invalid value: 'input_text'. Supported values are: 'output_text' and 'refusal'. Fix: add a role parameter to _chat_content_to_responses_parts() that selects output_text for assistant messages and input_text for user messages. Thread this through _chat_messages_to_responses_input() and _preflight_codex_input_items(). Fixes #15687	2026-04-25 10:13:29 -07:00
kshitijk4poor	7c17accb29	fix: /stop now immediately aborts streaming retry loop When a user sends /stop during a streaming API call, the outer poll loop detects _interrupt_requested and closes the HTTP connection. However, the inner _call() thread catches the connection error and enters its retry loop — opening a FRESH connection without checking the interrupt flag. On slow providers like ollama-cloud, each retry attempt blocks for the full stream-read timeout (120s+). With 3 retry attempts this caused 510+ second delays between /stop and actual response — the agent appeared completely unresponsive despite the stop being acknowledged. Fix: add an _interrupt_requested check at the top of the streaming retry loop so the agent exits immediately instead of retrying. Also fix log truncation: all session key logging in gateway/run.py used [:20] or [:30] slices, which truncated 'agent:main:telegram:dm:5690190437' (33 chars) to 'agent:main:telegram:' — losing the identifying chat type and user ID. Replace with full keys to make logs debuggable. Reported by user Sidharth Pulipaka via Telegram on ollama-cloud provider.	2026-04-25 09:51:39 -07:00
Teknium	ea01bdcebe	refactor(memory): remove flush_memories entirely (#15696 ) The AIAgent.flush_memories pre-compression save, the gateway _flush_memories_for_session, and everything feeding them are obsolete now that the background memory/skill review handles persistent memory extraction. Problems with flush_memories: - Pre-dates the background review loop. It was the only memory-save path when introduced; the background review now fires every 10 user turns on CLI and gateway alike, which is far more frequent than compression or session reset ever triggered flush. - Blocking and synchronous. Pre-compression flush ran on the live agent before compression, blocking the user-visible response. - Cache-breaking. Flush built a temporary conversation prefix (system prompt + memory-only tool list) that diverged from the live conversation's cached prefix, invalidating prompt caching. The gateway variant spawned a fresh AIAgent with its own clean prompt for each finalized session — still cache-breaking, just in a different process. - Redundant. Background review runs in the live conversation's session context, gets the same content, writes to the same memory store, and doesn't break the cache. Everything flush_memories claimed to preserve is already covered. What this removes: - AIAgent.flush_memories() method (~248 LOC in run_agent.py) - Pre-compression flush call in _compress_context - flush_memories call sites in cli.py (/new + exit) - GatewayRunner._flush_memories_for_session + _async_flush_memories (and the 3 call sites: session expiry watcher, /new, /resume) - 'flush_memories' entry from DEFAULT_CONFIG auxiliary tasks, hermes tools UI task list, auxiliary_client docstrings - _memory_flush_min_turns config + init - #15631's headroom-deduction math in _check_compression_model_feasibility (headroom was only needed because flush dragged the full main-agent system prompt along; the compression summariser sends a single user-role prompt so new_threshold = aux_context is safe again) - The dedicated test files and assertions that exercised flush-specific paths What this renames (with read-time backcompat on sessions.json): - SessionEntry.memory_flushed -> SessionEntry.expiry_finalized. The session-expiry watcher still uses the flag to avoid re-running finalize/eviction on the same expired session; the new name reflects what it now actually gates. from_dict() reads 'expiry_finalized' first, falls back to the legacy 'memory_flushed' key so existing sessions.json files upgrade seamlessly. Supersedes #15631 and #15638. Tested: 383 targeted tests pass across run_agent/, agent/, cli/, and gateway/ session-boundary suites. No behavior regressions — background memory review continues to handle persistent memory extraction on both CLI and gateway.	2026-04-25 08:21:14 -07:00
kshitijk4poor	d635e2df3f	fix(compression): pass provider to context length resolver in feasibility check _check_compression_model_feasibility calls get_model_context_length without provider=, so Codex OAuth users get 1,050,000 (from models.dev for 'openai') instead of the actual 272,000 limit. This happens because _infer_provider_from_url maps chatgpt.com → 'openai' (not 'openai-codex'), skipping the Codex-specific resolution branch entirely. Result: compression threshold set at 85% of 1.05M = 892K — conversations never trigger compression, the context grows unbounded, and when gateway hygiene eventually forces compression, the Codex endpoint drops the oversized streaming request ('peer closed connection without sending complete message body'). Fix: forward self.provider to get_model_context_length so provider- specific resolution branches (Codex OAuth 272K, Copilot live /models, Nous suffix-match) fire correctly. Reported by user on GPT 5.5 via Codex OAuth Pro (paste.rs/vsra3).	2026-04-25 07:09:47 -07:00
Teknium	f92006ce1c	fix(compression): reserve system+tools headroom when aux binds threshold (#15631 ) When the auxiliary compression model's context is smaller than the main model's compression threshold, _check_compression_model_feasibility auto-lowers the session threshold. Previously it set: new_threshold = aux_context This let the raw message list grow to exactly aux_context tokens. But compression and flush_memories actually send system_prompt + tool_schemas + messages to the aux model. With 50+ tools that overhead is 25-30K tokens, so the full request overflowed aux with HTTP 400. Subtract a headroom estimate from aux_context before setting the new threshold: the actual tool-schema token count (from estimate_request_tokens_rough) plus a 12K allowance for the system prompt (not yet built at __init__ time) and flush-instruction overhead. Clamp to MINIMUM_CONTEXT_LENGTH so the session still starts even with an unusually heavy tool schema. This fixes the 'flush_memories overflow on busy toolsets' path that Teknium flagged — where main and aux can be nominally the same model but still 400 because the threshold left no room for the request overhead. Same fix also protects the normal compression summarisation request on the same binding aux. Tests: two new regression tests cover the headroom reservation and the MINIMUM_CONTEXT_LENGTH floor. Two existing tests updated for the new (lower) threshold values now that empty-tools still produces a 12K static headroom deduction.	2026-04-25 05:41:56 -07:00
Teknium	f67a61dc93	fix(flush_memories): strip temperature from codex_responses fallback (#15620 ) The memory-flush fallback for api_mode='codex_responses' was unconditionally adding `temperature` to codex_kwargs before calling _run_codex_stream. The Responses API does not accept temperature on any supported backend: - chatgpt.com/backend-api/codex rejects it outright - api.openai.com + gpt-5/o-series reasoning models reject it - Copilot Responses rejects it on reasoning models The CodexAuxiliaryClient adapter and the codex_responses transport both correctly omit temperature — the flush fallback was the only path putting it back. On errors from the primary aux path (e.g. expired OAuth token), users saw `⚠ Auxiliary memory flush failed: HTTP 400: Unsupported parameter: temperature`. Reported by Garik [NOUS] on GPT-5.5 via Codex OAuth Pro.	2026-04-25 05:01:25 -07:00
Teknium	d58b305adf	refactor(deepseek-reasoning): consolidate detection into helpers + regression tests Extracts _needs_kimi_tool_reasoning() for symmetry with the existing _needs_deepseek_tool_reasoning() helper, so _copy_reasoning_content_for_api uses the same detection logic as _build_assistant_message. Future changes to either provider's signals now only touch one function. Adds tests/run_agent/test_deepseek_reasoning_content_echo.py covering: - All 3 DeepSeek detection signals (provider, model, host) - Poisoned history replay (empty string fallback) - Plain assistant turns NOT padded - Explicit reasoning_content preserved - Reasoning field promoted to reasoning_content - Existing Kimi/Moonshot detection intact - Non-thinking providers left alone 21 tests, all pass.	2026-04-24 16:38:29 -07:00
Brian D. Evans	00c3d848d8	fix(memory): skip external-provider sync on interrupted turns (#15218 ) ``run_conversation`` was calling ``memory_manager.sync_all( original_user_message, final_response)`` at the end of every turn where both args were present. That gate didn't consider the ``interrupted`` local flag, so an external memory backend received partial assistant output, aborted tool chains, or mid-stream resets as durable conversational truth. Downstream recall then treated the not-yet-real state as if the user had seen it complete, poisoning the trust boundary between "what the user took away from the turn" and "what Hermes was in the middle of producing when the interrupt hit". Extracted the inline sync block into a new private method ``AIAgent._sync_external_memory_for_turn(original_user_message, final_response, interrupted)`` so the interrupt guard is a single visible check at the top of the method instead of hidden in a boolean-and at the call site. That also gives tests a clean seam to assert on — the pre-fix layout buried the logic inside the 3,000-line ``run_conversation`` function where no focused test could reach it. The new method encodes three independent skip conditions: 1. ``interrupted`` → skip entirely (the #15218 fix). Applies even when ``final_response`` and ``original_user_message`` happen to be populated — an interrupt may have landed between a streamed reply and the next tool call, so the strings on disk are not actually the turn the user took away. 2. No memory manager / no final_response / no user message → preserve existing skip behaviour (nothing new for providerless sessions, system-initiated refreshes, tool-only turns that never resolved, etc.). 3. Sync_all / queue_prefetch_all exceptions → swallow. External memory providers are strictly best-effort; a misconfigured or offline backend must never block the user from seeing their response. The prefetch side-effect is gated on the same interrupt flag: the user's next message is almost certainly a retry of the same intent, and a prefetch keyed on the interrupted turn would fire against stale context. ### Tests (16 new, all passing on py3.11 venv) ``tests/run_agent/test_memory_sync_interrupted.py`` exercises the helper directly on a bare ``AIAgent`` (``__new__`` pattern that the interrupt-propagation tests already use). Coverage: - Interrupted turn with full-looking response → no sync (the fix) - Interrupted turn with long assistant output → no sync (the interrupt could have landed mid-stream; strings-on-disk lie) - Normal completed turn → sync_all + queue_prefetch_all both called with the right args (regression guard for the positive path) - No final_response / no user_message / no memory manager → existing pre-fix skip paths still apply - sync_all raises → exception swallowed, prefetch still attempted - queue_prefetch_all raises → exception swallowed after sync succeeded - 8-case parametrised matrix across (interrupted × final_response × original_user_message) asserts sync fires iff interrupted=False AND both strings are non-empty Closes #15218 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 15:30:18 -07:00
Teknium	2d444fc84d	fix(run_agent): handle unescaped control chars in tool_call arguments (#15356 ) Extends _repair_tool_call_arguments() to cover the most common local-model JSON corruption pattern: llama.cpp/Ollama backends emit literal tabs and newlines inside JSON string values (memory save summaries, file contents, etc.). Previously fell through to '{}' replacement, losing the call. Adds two repair passes: - Pass 0: json.loads(strict=False) + re-serialise to canonical wire form - Pass 4: escape 0x00-0x1F control chars inside string values, then retry Ports the core utility from #12068 / PR #12093 without the larger plumbing change (that PR also replaced json.loads at 8 call sites; current main's _repair_tool_call_arguments is already the single chokepoint, so the upgrade happens transparently for every existing caller). Credit: @truenorth-lj for the original utility design. 4 new regression tests covering literal newlines, tabs, re-serialisation to strict=True-valid output, and the trailing-comma + control-char combination case.	2026-04-24 15:06:41 -07:00
AJ	17fc84c256	fix: repair malformed tool call args in streaming assembly before flagging as truncated When the streaming path (chat completions) assembled tool call deltas and detected malformed JSON arguments, it set has_truncated_tool_args=True but passed the broken args through unchanged. This triggered the truncation handler which returned a partial result and killed the session (/new required). _many_ malformations are repairable: trailing commas, unclosed brackets, Python None, empty strings. _repair_tool_call_arguments() already existed for the pre-API-request path but wasn't called during streaming assembly. Now when JSON parsing fails during streaming assembly, we attempt repair via _repair_tool_call_arguments() before flagging as truncated. If repair succeeds (returns valid JSON), the tool call proceeds normally. Only truly unrepairable args fall through to the truncation handler. This prevents the most common session-killing failure mode for models like GLM-5.1 that produce trailing commas or unclosed brackets. Tests: 12 new streaming assembly repair tests, all 29 existing repair tests still passing.	2026-04-24 15:03:07 -07:00
luyao618	7a192b124e	fix(run_agent): repair corrupted tool_call arguments before sending to provider When a session is split by context compression mid-tool-call, an assistant message may end up with truncated/invalid JSON in tool_calls[].function.arguments. On the next turn this is replayed verbatim and providers reject the entire request with HTTP 400 invalid_tool_call_format, bricking the conversation in a loop that cannot recover without manual session quarantine. This patch adds a defensive sanitizer that runs immediately before client.chat.completions.create() in AIAgent.run_conversation(): - Validates each assistant tool_calls[].function.arguments via json.loads - Replaces invalid/empty arguments with '{}' - Injects a synthetic tool response (or prepends a marker to the existing one) so downstream messages keep valid tool_call_id pairing - Logs each repair with session_id / message_index / preview for observability Defense in depth: corruption can originate from compression splits, manual edits, or plugin bugs. Sanitizing at the send chokepoint catches all sources. Adds 7 unit tests covering: truncated JSON, empty string, None, non-string args, existing matching tool response (no duplicate injection), non-assistant messages ignored, multiple repairs. Fixes #15236	2026-04-24 14:55:47 -07:00
Teknium	4093ee9c62	fix(codex): detect leaked tool-call text in assistant content (#15347 ) gpt-5.x on the Codex Responses API sometimes degenerates and emits Harmony-style `to=functions.<name> {json}` serialization as plain assistant-message text instead of a structured `function_call` item. The intent never makes it into `response.output` as a function_call, so `tool_calls` is empty and `_normalize_codex_response()` returns the leaked text as the final content. Downstream (e.g. delegate_task), this surfaces as a confident-looking summary with `tool_trace: []` because no tools actually ran — the Taiwan-embassy-email bug report. Detect the pattern, scrub the content, and return finish_reason= 'incomplete' so the existing Codex-incomplete continuation path (run_agent.py:11331, 3 retries) gets a chance to re-elicit a proper function_call item. Encrypted reasoning items are preserved so the model keeps its chain-of-thought on the retry. Regression tests: leaked text triggers incomplete, real tool calls alongside leak-looking text are preserved, clean responses pass through unchanged. Reported on Discord (gpt-5.4 / openai-codex).	2026-04-24 14:39:59 -07:00
helix4u	6a957a74bc	fix(memory): add write origin metadata	2026-04-24 14:37:55 -07:00
helix4u	8a2506af43	fix(aux): surface auxiliary failures in UI	2026-04-24 14:31:21 -07:00
vlwkaos	f7f7588893	fix(agent): only set rate-limit cooldown when leaving primary; add tests	2026-04-24 05:35:43 -07:00
LeonSGP43	a9fd8d7c88	fix(agent): default missing fallback chain on switch	2026-04-24 05:35:43 -07:00
Teknium	a1caec1088	fix(agent): repair CamelCase + _tool suffix tool-call emissions (#15124 ) Claude-style and some Anthropic-tuned models occasionally emit tool names as class-like identifiers: TodoTool_tool, Patch_tool, BrowserClick_tool, PatchTool. These failed strict-dict lookup in valid_tool_names and triggered the 'Unknown tool' self-correction loop, wasting a full turn of iteration and tokens. _repair_tool_call already handled lowercase / separator / fuzzy matches but couldn't bridge the CamelCase-to-snake_case gap or the trailing '_tool' suffix that Claude sometimes tacks on. Extend it with two bounded normalization passes: 1. CamelCase -> snake_case (via regex lookbehind). 2. Strip trailing _tool / -tool / tool suffix (case-insensitive, applied twice so TodoTool_tool reduces all the way: strip _tool -> TodoTool, snake -> todo_tool, strip 'tool' -> todo). Cheap fast-paths (lowercase / separator-normalized) still run first so the common case stays zero-cost. Fuzzy match remains the last resort unchanged. Tests: tests/run_agent/test_repair_tool_call_name.py covers the three original reports (TodoTool_tool, Patch_tool, BrowserClick_tool), plus PatchTool, WriteFileTool, ReadFile_tool, write-file_Tool, patch-tool, and edge cases (empty, None, '_tool' alone, genuinely unknown names). 18 new tests + 17 existing arg-repair tests = 35/35 pass. Closes #14784	2026-04-24 05:32:08 -07:00
Prasad Subrahmanya	1fc77f995b	fix(agent): fall back on rate limit when pool has no rotation room Extracts pool-rotation-room logic into `_pool_may_recover_from_rate_limit` so single-credential pools no longer block the eager-fallback path on 429. The existing check `pool is not None and pool.has_available()` lets fallback fire only after the pool marks every entry as exhausted. With exactly one credential in the pool (the common shape for Gemini OAuth, Vertex service accounts, and any personal-key setup), `has_available()` flips back to True as soon as the cooldown expires — Hermes retries against the same entry, hits the same daily-quota 429, and burns the retry budget in a tight loop before ever reaching the configured `fallback_model`. Observed in the wild as 4+ hours of 429 noise on a single Gemini key instead of falling through to Vertex as configured. Rotation is only meaningful with more than one credential — gate on `len(pool.entries()) > 1`. Multi-credential pools keep the current wait-for-rotation behaviour unchanged. Fixes #11314. Related to #8947, #10210, #7230. Narrower scope than open PRs #8023 (classifier change) and #11492 (503/529 credential-pool bypass) — this addresses the single-credential 429 case specifically and does not conflict with either. Tests: 6 new unit tests in tests/run_agent/test_provider_fallback.py covering (a) None pool, (b) single-cred available, (c) single-cred in cooldown, (d) 2-cred available rotates, (e) multi-cred all cooling-down falls back, (f) many-cred available rotates. All 18 tests in the file pass.	2026-04-24 05:20:05 -07:00
l0hde	2cab8129d1	feat(copilot): add 401 auth recovery with automatic token refresh and client rebuild When using GitHub Copilot as provider, HTTP 401 errors could cause Hermes to silently fall back to the next model in the chain instead of recovering. This adds a one-shot retry mechanism that: 1. Re-resolves the Copilot token via the standard priority chain (COPILOT_GITHUB_TOKEN -> GH_TOKEN -> GITHUB_TOKEN -> gh auth token) 2. Rebuilds the OpenAI client with fresh credentials and Copilot headers 3. Retries the failed request before falling back The fix handles the common case where the gho_* OAuth token remains valid but the httpx client state becomes stale (e.g. after startup race conditions or long-lived sessions). Key design decisions: - Always rebuild client even if token string unchanged (recovers stale state) - Uses _apply_client_headers_for_base_url() for canonical header management - One-shot flag guard prevents infinite 401 loops (matches existing pattern used by Codex/Nous/Anthropic providers) - No token exchange via /copilot_internal/v2/token (returns 404 for some account types; direct gho_* auth works reliably) Tests: 3 new test cases covering end-to-end 401->refresh->retry, client rebuild verification, and same-token rebuild scenarios. Docs: Updated providers.md with Copilot auth behavior section.	2026-04-24 05:09:08 -07:00
Teknium	c2b3db48f5	fix(agent): retry on json.JSONDecodeError instead of treating it as a local validation error (#15107 ) json.JSONDecodeError inherits from ValueError. The agent loop's non-retryable classifier at run_agent.py ~L10782 treated any ValueError/TypeError as a local programming bug and short-circuited retry. Without a carve-out, a transient JSONDecodeError from a provider that returned a malformed response body, a truncated stream, or a router-layer corruption would fail the turn immediately. Add JSONDecodeError to the existing UnicodeEncodeError exclusion tuple so the classified-retry logic (which already handles 429/529/ context-overflow/etc.) gets to run on bad-JSON errors. Tests (tests/run_agent/test_jsondecodeerror_retryable.py): - JSONDecodeError: NOT local validation - UnicodeEncodeError: NOT local validation (existing carve-out) - bare ValueError: IS local validation (programming bug) - bare TypeError: IS local validation (programming bug) - source-level assertion that run_agent.py still carries the carve-out (guards against accidental revert) Closes #14782	2026-04-24 05:02:58 -07:00
Teknium	18f3fc8a6f	fix(tests): resolve 17 persistent CI test failures (#15084 ) Make the main-branch test suite pass again. Most failures were tests still asserting old shapes after recent refactors; two were real source bugs. Source fixes: - tools/mcp_tool.py: _kill_orphaned_mcp_children() slept 2s on every shutdown even when no tracked PIDs existed, making test_shutdown_is_parallel measure ~3s for 3 parallel 1s shutdowns. Early-return when pids is empty. - hermes_cli/tips.py: tip 105 was 157 chars; corpus max is 150. Test fixes (mostly stale mock targets / missing fixture fields): - test_zombie_process_cleanup, test_agent_cache: patch run_agent.cleanup_vm (the local name bound at import), not tools.terminal_tool.cleanup_vm. - test_browser_camofox: patch tools.browser_camofox.load_config, not hermes_cli.config.load_config (the source module, not the resolved one). - test_flush_memories_codex._chat_response_with_memory_call: add finish_reason, tool_call.id, tool_call.type so the chat_completions transport normalizer doesn't AttributeError. - test_concurrent_interrupt: polling_tool signature now accepts messages= kwarg that _invoke_tool() passes through. - test_minimax_provider: add _fallback_chain=[] to the __new__'d agent so switch_model() doesn't AttributeError. - test_skills_config: SKILLS_DIR MagicMock + .rglob stopped working after the scanner switched to agent.skill_utils.iter_skill_index_files (os.walk-based). Point SKILLS_DIR at a real tmp_path and patch agent.skill_utils.get_external_skills_dirs. - test_browser_cdp_tool: browser_cdp toolset was intentionally split into 'browser-cdp' (commit `96b0f3700`) so its stricter check_fn doesn't gate the whole browser toolset; test now expects 'browser-cdp'. - test_registry: add tools.browser_dialog_tool to the expected builtin-discovery set (PR #14540 added it). - test_file_tools TestPatchHints: patch_tool surfaces hints as a '_hint' key on the JSON payload, not inline '[Hint: ...' text. - test_write_deny test_hermes_env: resolve .env via get_hermes_home() so the path matches the profile-aware denylist under hermetic HERMES_HOME. - test_checkpoint_manager test_falls_back_to_parent: guard the walk-up so a stray /tmp/pyproject.toml on the host doesn't pick up /tmp as the project root. - test_quick_commands: set cli.session_id in the __new__'d CLI so the alias-args path doesn't trip AttributeError when fuzzy-matching leaks a skill command across xdist test distribution.	2026-04-24 03:46:46 -07:00
WildCat Eng Manager	7626f3702e	feat: read prompt caching cache_ttl from config - Load prompt_caching.cache_ttl in AIAgent (5m default, 1h opt-in) - Document DEFAULT_CONFIG and developer guide example - Add unit tests for default, 1h, and invalid TTL fallback Made-with: Cursor	2026-04-24 03:21:29 -07:00
luyao618	bc15f526fb	fix(agent): exclude prior-history tool messages from background review summary Cherry-pick-of: `27b6a217b` (PR #14967 by @luyao618) Co-authored-by: luyao618 <364939526@qq.com>	2026-04-24 03:10:19 -07:00
Teknium	166b960fe4	test(proxy): regression tests for NO_PROXY bypass on keepalive client Pin the behaviour added in the preceding commit — `_get_proxy_for_base_url()` must return None for hosts covered by NO_PROXY and the HTTPS_PROXY otherwise, and the full `_create_openai_client()` path must NOT mount HTTPProxy for a NO_PROXY host. Refs: #14966	2026-04-24 03:04:42 -07:00
Reginaldas	3e10f339fd	fix(providers): send user agent to routermint endpoints	2026-04-24 03:02:16 -07:00
Teknium	a9a4416c7c	fix(compress): don't reach into ContextCompressor privates from /compress (#15039 ) Manual /compress crashed with 'LCMEngine' object has no attribute '_align_boundary_forward' when any context-engine plugin was active. The gateway handler reached into _align_boundary_forward and _find_tail_cut_by_tokens on tmp_agent.context_compressor, but those are ContextCompressor-specific — not part of the generic ContextEngine ABC — so every plugin engine (LCM, etc.) raised AttributeError. - Add optional has_content_to_compress(messages) to ContextEngine ABC with a safe default of True (always attempt). - Override it in the built-in ContextCompressor using the existing private helpers — preserves exact prior behavior for 'compressor'. - Rewrite gateway /compress preflight to call the ABC method, deleting the private-helper reach-in. - Add focus_topic to the ABC compress() signature. Make _compress_context retry without focus_topic on TypeError so older strict-sig plugins don't crash on manual /compress <focus>. - Regression test with a fake ContextEngine subclass that only implements the ABC (mirrors LCM's surface). Reported by @selfhostedsoul (Discord, Apr 22).	2026-04-24 02:55:43 -07:00
Teknium	6a20e187dd	test,chore: cover stringified array/object coercion + AUTHOR_MAP entry Follow-up to the cherry-picked coercion commit: adds 9 regression tests covering array/object parsing, invalid-JSON passthrough, wrong-shape preservation, and the issue #3947 gmail-mcp scenario end-to-end. Adds dan@danlynn.com -> danklynn to scripts/release.py AUTHOR_MAP so the salvage PR's contributor attribution doesn't break CI.	2026-04-23 16:38:38 -07:00
maelrx	e020f46bec	fix(agent): preserve MiniMax context length on delta-only overflow	2026-04-23 14:06:37 -07:00
helix4u	1dfcda4e3c	fix(approval): guard env and config overwrites	2026-04-23 14:05:36 -07:00
Teknium	165b2e481a	feat(agent): make API retry count configurable via agent.api_max_retries (#14730 ) Closes #11616. The agent's API retry loop hardcoded max_retries = 3, so users with fallback providers on flaky primaries burned through ~3 × provider timeout (e.g. 3 × 180s = 9 minutes) before their fallback chain got a chance to kick in. Expose a new config key: agent: api_max_retries: 3 # default unchanged Set it to 1 for fast failover when you have fallback providers, or raise it if you prefer longer tolerance on a single provider. Values < 1 are clamped to 1 (single attempt, no retry); non-integer values fall back to the default. This wraps the Hermes-level retry loop only — the OpenAI SDK's own low-level retries (max_retries=2 default) still run beneath this for transient network errors. Changes: - hermes_cli/config.py: add agent.api_max_retries default 3 with comment. - run_agent.py: read self._api_max_retries in AIAgent.__init__; replace hardcoded max_retries = 3 in the retry loop with self._api_max_retries. - cli-config.yaml.example: documented example entry. - hermes_cli/tips.py: discoverable tip line. - tests/run_agent/test_api_max_retries_config.py: 4 tests covering default, override, clamp-to-one, and invalid-value fallback.	2026-04-23 13:59:32 -07:00
kshitijk4poor	43de1ca8c2	refactor: remove _nr_to_assistant_message shim + fix flush_memories guard NormalizedResponse and ToolCall now have backward-compat properties so the agent loop can read them directly without the shim: ToolCall: .type, .function (returns self), .call_id, .response_item_id NormalizedResponse: .reasoning_content, .reasoning_details, .codex_reasoning_items This eliminates the 35-line shim and its 4 call sites in run_agent.py. Also changes flush_memories guard from hasattr(response, 'choices') to self.api_mode in ('chat_completions', 'bedrock_converse') so it works with raw boto3 dicts too. WS1 items 3+4 of Cycle 2 (#14418).	2026-04-23 02:30:05 -07:00
kshitijk4poor	f4612785a4	refactor: collapse normalize_anthropic_response to return NormalizedResponse directly 3-layer chain (transport → v2 → v1) was collapsed to 2-layer in PR 7. This collapses the remaining 2-layer (transport → v1 → NR mapping in transport) to 1-layer: v1 now returns NormalizedResponse directly. Before: adapter returns (SimpleNamespace, finish_reason) tuple, transport unpacks and maps to NormalizedResponse (22 lines). After: adapter returns NormalizedResponse, transport is a 1-line passthrough. Also updates ToolCall construction — adapter now creates ToolCall dataclass directly instead of SimpleNamespace(id, type, function). WS1 item 1 of Cycle 2 (#14418).	2026-04-23 02:30:05 -07:00

1 2 3 4

160 Commits