diff --git a/RELEASE_v0.8.0.md b/RELEASE_v0.8.0.md
new file mode 100644
index 0000000000..57c8b05aba
--- /dev/null
+++ b/RELEASE_v0.8.0.md
@@ -0,0 +1,346 @@
+# Hermes Agent v0.8.0 (v2026.4.8)
+
+**Release Date:** April 8, 2026
+
+> The intelligence release — background task auto-notifications, free MiMo v2 Pro on Nous Portal, live model switching across all platforms, self-optimized GPT/Codex guidance, native Google AI Studio, smart inactivity timeouts, approval buttons, MCP OAuth 2.1, and 209 merged PRs with 82 resolved issues.
+
+---
+
+## ✨ Highlights
+
+- **Background Process Auto-Notifications (`notify_on_complete`)** — Background tasks can now automatically notify the agent when they finish. Start a long-running process (AI model training, test suites, deployments, builds) and the agent gets notified on completion — no polling needed. The agent can keep working on other things and pick up results when they land. ([#5779](https://github.com/NousResearch/hermes-agent/pull/5779))
+
+- **Free Xiaomi MiMo v2 Pro on Nous Portal** — Nous Portal now supports the free-tier Xiaomi MiMo v2 Pro model for auxiliary tasks (compression, vision, summarization), with free-tier model gating and pricing display in model selection. ([#6018](https://github.com/NousResearch/hermes-agent/pull/6018), [#5880](https://github.com/NousResearch/hermes-agent/pull/5880))
+
+- **Live Model Switching (`/model` Command)** — Switch models and providers mid-session from CLI, Telegram, Discord, Slack, or any gateway platform. Aggregator-aware resolution keeps you on OpenRouter/Nous when possible, with automatic cross-provider fallback when needed. Interactive model pickers on Telegram and Discord with inline buttons. ([#5181](https://github.com/NousResearch/hermes-agent/pull/5181), [#5742](https://github.com/NousResearch/hermes-agent/pull/5742))
+
+- **Self-Optimized GPT/Codex Tool-Use Guidance** — The agent diagnosed and patched 5 failure modes in GPT and Codex tool calling through automated behavioral benchmarking, dramatically improving reliability on OpenAI models. Includes execution discipline guidance and thinking-only prefill continuation for structured reasoning. ([#6120](https://github.com/NousResearch/hermes-agent/pull/6120), [#5414](https://github.com/NousResearch/hermes-agent/pull/5414), [#5931](https://github.com/NousResearch/hermes-agent/pull/5931))
+
+- **Google AI Studio (Gemini) Native Provider** — Direct access to Gemini models through Google's AI Studio API. Includes automatic models.dev registry integration for real-time context length detection across any provider. ([#5577](https://github.com/NousResearch/hermes-agent/pull/5577))
+
+- **Inactivity-Based Agent Timeouts** — Gateway and cron timeouts now track actual tool activity instead of wall-clock time. Long-running tasks that are actively working will never be killed — only truly idle agents time out. ([#5389](https://github.com/NousResearch/hermes-agent/pull/5389), [#5440](https://github.com/NousResearch/hermes-agent/pull/5440))
+
+- **Approval Buttons on Slack & Telegram** — Dangerous command approval via native platform buttons instead of typing `/approve`. Slack gets thread context preservation; Telegram gets emoji reactions for approval status. ([#5890](https://github.com/NousResearch/hermes-agent/pull/5890), [#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
+
+- **MCP OAuth 2.1 PKCE + OSV Malware Scanning** — Full standards-compliant OAuth for MCP server authentication, plus automatic malware scanning of MCP extension packages via the OSV vulnerability database. ([#5420](https://github.com/NousResearch/hermes-agent/pull/5420), [#5305](https://github.com/NousResearch/hermes-agent/pull/5305))
+
+- **Centralized Logging & Config Validation** — Structured logging to `~/.hermes/logs/` (agent.log + errors.log) with the `hermes logs` command for tailing and filtering. Config structure validation catches malformed YAML at startup before it causes cryptic failures. ([#5430](https://github.com/NousResearch/hermes-agent/pull/5430), [#5426](https://github.com/NousResearch/hermes-agent/pull/5426))
+
+- **Plugin System Expansion** — Plugins can now register CLI subcommands, receive request-scoped API hooks with correlation IDs, prompt for required env vars during install, and hook into session lifecycle events (finalize/reset). ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295), [#5427](https://github.com/NousResearch/hermes-agent/pull/5427), [#5470](https://github.com/NousResearch/hermes-agent/pull/5470), [#6129](https://github.com/NousResearch/hermes-agent/pull/6129))
+
+- **Matrix Tier 1 & Platform Hardening** — Matrix gets reactions, read receipts, rich formatting, and room management. Discord adds channel controls and ignored channels. Signal gets full MEDIA: tag delivery. Mattermost gets file attachments. Comprehensive reliability fixes across all platforms. ([#5275](https://github.com/NousResearch/hermes-agent/pull/5275), [#5975](https://github.com/NousResearch/hermes-agent/pull/5975), [#5602](https://github.com/NousResearch/hermes-agent/pull/5602))
+
+- **Security Hardening Pass** — Consolidated SSRF protections, timing attack mitigations, tar traversal prevention, credential leakage guards, cron path traversal hardening, and cross-session isolation. Terminal workdir sanitization across all backends. ([#5944](https://github.com/NousResearch/hermes-agent/pull/5944), [#5613](https://github.com/NousResearch/hermes-agent/pull/5613), [#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
+
+---
+
+## 🏗️ Core Agent & Architecture
+
+### Provider & Model Support
+- **Native Google AI Studio (Gemini) provider** with models.dev integration for automatic context length detection ([#5577](https://github.com/NousResearch/hermes-agent/pull/5577))
+- **`/model` command — full provider+model system overhaul** — live switching across CLI and all gateway platforms with aggregator-aware resolution ([#5181](https://github.com/NousResearch/hermes-agent/pull/5181))
+- **Interactive model picker for Telegram and Discord** — inline button-based model selection ([#5742](https://github.com/NousResearch/hermes-agent/pull/5742))
+- **Nous Portal free-tier model gating** with pricing display in model selection ([#5880](https://github.com/NousResearch/hermes-agent/pull/5880))
+- **Model pricing display** for OpenRouter and Nous Portal providers ([#5416](https://github.com/NousResearch/hermes-agent/pull/5416))
+- **xAI (Grok) prompt caching** via `x-grok-conv-id` header ([#5604](https://github.com/NousResearch/hermes-agent/pull/5604))
+- **Grok added to tool-use enforcement models** for direct xAI usage ([#5595](https://github.com/NousResearch/hermes-agent/pull/5595))
+- **MiniMax TTS provider** (speech-2.8) ([#4963](https://github.com/NousResearch/hermes-agent/pull/4963))
+- **Non-agentic model warning** — warns users when loading Hermes LLM models not designed for tool use ([#5378](https://github.com/NousResearch/hermes-agent/pull/5378))
+- **Ollama Cloud auth, /model switch persistence**, and alias tab completion ([#5269](https://github.com/NousResearch/hermes-agent/pull/5269))
+- **Preserve dots in OpenCode Go model names** (minimax-m2.7, glm-4.5, kimi-k2.5) ([#5597](https://github.com/NousResearch/hermes-agent/pull/5597))
+- **MiniMax models 404 fix** — strip /v1 from Anthropic base URL for OpenCode Go ([#4918](https://github.com/NousResearch/hermes-agent/pull/4918))
+- **Provider credential reset windows** honored in pooled failover ([#5188](https://github.com/NousResearch/hermes-agent/pull/5188))
+- **OAuth token sync** between credential pool and credentials file ([#4981](https://github.com/NousResearch/hermes-agent/pull/4981))
+- **Stale OAuth credentials** no longer block OpenRouter users on auto-detect ([#5746](https://github.com/NousResearch/hermes-agent/pull/5746))
+- **Codex OAuth credential pool disconnect** + expired token import fix ([#5681](https://github.com/NousResearch/hermes-agent/pull/5681))
+- **Codex pool entry sync** from `~/.codex/auth.json` on exhaustion — @GratefulDave ([#5610](https://github.com/NousResearch/hermes-agent/pull/5610))
+- **Auxiliary client payment fallback** — retry with next provider on 402 ([#5599](https://github.com/NousResearch/hermes-agent/pull/5599))
+- **Auxiliary client resolves named custom providers** and 'main' alias ([#5978](https://github.com/NousResearch/hermes-agent/pull/5978))
+- **Use mimo-v2-pro** for non-vision auxiliary tasks on Nous free tier ([#6018](https://github.com/NousResearch/hermes-agent/pull/6018))
+- **Vision auto-detection** tries main provider first ([#6041](https://github.com/NousResearch/hermes-agent/pull/6041))
+- **Provider re-ordering and Quick Install** — @austinpickett ([#4664](https://github.com/NousResearch/hermes-agent/pull/4664))
+- **Nous OAuth access_token** no longer used as inference API key — @SHL0MS ([#5564](https://github.com/NousResearch/hermes-agent/pull/5564))
+- **HERMES_PORTAL_BASE_URL env var** respected during Nous login — @benbarclay ([#5745](https://github.com/NousResearch/hermes-agent/pull/5745))
+- **Env var overrides** for Nous portal/inference URLs ([#5419](https://github.com/NousResearch/hermes-agent/pull/5419))
+- **Z.AI endpoint auto-detect** via probe and cache ([#5763](https://github.com/NousResearch/hermes-agent/pull/5763))
+- **MiniMax context lengths, model catalog, thinking guard, aux model, and config base_url** corrections ([#6082](https://github.com/NousResearch/hermes-agent/pull/6082))
+- **Community provider/model resolution fixes** — salvaged 4 community PRs + MiniMax aux URL ([#5983](https://github.com/NousResearch/hermes-agent/pull/5983))
+
+### Agent Loop & Conversation
+- **Self-optimized GPT/Codex tool-use guidance** via automated behavioral benchmarking — agent self-diagnosed and patched 5 failure modes ([#6120](https://github.com/NousResearch/hermes-agent/pull/6120))
+- **GPT/Codex execution discipline guidance** in system prompts ([#5414](https://github.com/NousResearch/hermes-agent/pull/5414))
+- **Thinking-only prefill continuation** for structured reasoning responses ([#5931](https://github.com/NousResearch/hermes-agent/pull/5931))
+- **Accept reasoning-only responses** without retries — set content to "(empty)" instead of retrying indefinitely ([#5278](https://github.com/NousResearch/hermes-agent/pull/5278))
+- **Jittered retry backoff** — exponential backoff with jitter for API retries ([#6048](https://github.com/NousResearch/hermes-agent/pull/6048))
+- **Smart thinking block signature management** — preserve and manage Anthropic thinking signatures across turns ([#6112](https://github.com/NousResearch/hermes-agent/pull/6112))
+- **Coerce tool call arguments** to match JSON Schema types — fixes models that send strings instead of numbers/booleans ([#5265](https://github.com/NousResearch/hermes-agent/pull/5265))
+- **Save oversized tool results to file** instead of destructive truncation ([#5210](https://github.com/NousResearch/hermes-agent/pull/5210))
+- **Sandbox-aware tool result persistence** ([#6085](https://github.com/NousResearch/hermes-agent/pull/6085))
+- **Streaming fallback** improved after edit failures ([#6110](https://github.com/NousResearch/hermes-agent/pull/6110))
+- **Codex empty-output gaps** covered in fallback + normalizer + auxiliary client ([#5724](https://github.com/NousResearch/hermes-agent/pull/5724), [#5730](https://github.com/NousResearch/hermes-agent/pull/5730), [#5734](https://github.com/NousResearch/hermes-agent/pull/5734))
+- **Codex stream output backfill** from output_item.done events ([#5689](https://github.com/NousResearch/hermes-agent/pull/5689))
+- **Stream consumer creates new message** after tool boundaries ([#5739](https://github.com/NousResearch/hermes-agent/pull/5739))
+- **Codex validation aligned** with normalization for empty stream output ([#5940](https://github.com/NousResearch/hermes-agent/pull/5940))
+- **Bridge tool-calls** in copilot-acp adapter ([#5460](https://github.com/NousResearch/hermes-agent/pull/5460))
+- **Filter transcript-only roles** from chat-completions payload ([#4880](https://github.com/NousResearch/hermes-agent/pull/4880))
+- **Context compaction failures fixed** on temperature-restricted models — @MadKangYu ([#5608](https://github.com/NousResearch/hermes-agent/pull/5608))
+- **Sanitize tool_calls for all strict APIs** (Fireworks, Mistral, etc.) — @lumethegreat ([#5183](https://github.com/NousResearch/hermes-agent/pull/5183))
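+
+The argument coercion in #5265 is straightforward to sketch. The version below is illustrative only (`coerce_argument` is a made-up name, not the actual implementation), assuming each argument arrives alongside its JSON Schema property definition:
+
+```python
+def coerce_argument(value, schema: dict):
+    """Best-effort coercion of a tool-call argument to its JSON Schema type.
+
+    Models sometimes emit "3" for an integer or "true" for a boolean;
+    coercing instead of rejecting keeps the tool call alive.
+    """
+    t = schema.get("type")
+    if t == "integer" and isinstance(value, str):
+        try:
+            return int(value)
+        except ValueError:
+            return value
+    if t == "number" and isinstance(value, str):
+        try:
+            return float(value)
+        except ValueError:
+            return value
+    if t == "boolean" and isinstance(value, str):
+        lowered = value.strip().lower()
+        if lowered in ("true", "1", "yes"):
+            return True
+        if lowered in ("false", "0", "no"):
+            return False
+    return value
+```
+
+Values that cannot be coerced pass through unchanged, so the tool's own validation can report the real error.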
+
+### Memory & Sessions
+- **Supermemory memory provider** — new memory plugin with multi-container, search_mode, identity template, and env var override ([#5737](https://github.com/NousResearch/hermes-agent/pull/5737), [#5933](https://github.com/NousResearch/hermes-agent/pull/5933))
+- **Shared thread sessions** by default — multi-user thread support across gateway platforms ([#5391](https://github.com/NousResearch/hermes-agent/pull/5391))
+- **Subagent sessions linked to parent** and hidden from session list ([#5309](https://github.com/NousResearch/hermes-agent/pull/5309))
+- **Profile-scoped memory isolation** and clone support ([#4845](https://github.com/NousResearch/hermes-agent/pull/4845))
+- **Thread gateway user_id to memory plugins** for per-user scoping ([#5895](https://github.com/NousResearch/hermes-agent/pull/5895))
+- **Honcho plugin drift overhaul** + plugin CLI registration system ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295))
+- **Honcho holographic prompt and trust score** rendering preserved ([#4872](https://github.com/NousResearch/hermes-agent/pull/4872))
+- **Honcho doctor fix** — use recall_mode instead of memory_mode — @techguysimon ([#5645](https://github.com/NousResearch/hermes-agent/pull/5645))
+- **RetainDB** — API routes, write queue, dialectic, agent model, file tools fixes ([#5461](https://github.com/NousResearch/hermes-agent/pull/5461))
+- **Hindsight memory plugin overhaul** + memory setup wizard fixes ([#5094](https://github.com/NousResearch/hermes-agent/pull/5094))
+- **mem0 API v2 compat**, prefetch context fencing, secret redaction ([#5423](https://github.com/NousResearch/hermes-agent/pull/5423))
+- **mem0 env vars merged** with mem0.json instead of either/or ([#4939](https://github.com/NousResearch/hermes-agent/pull/4939))
+- **Clean user message** used for all memory provider operations ([#4940](https://github.com/NousResearch/hermes-agent/pull/4940))
+- **Silent memory flush failure** on /new and /resume fixed — @ryanautomated ([#5640](https://github.com/NousResearch/hermes-agent/pull/5640))
+- **OpenViking atexit safety net** for session commit ([#5664](https://github.com/NousResearch/hermes-agent/pull/5664))
+- **OpenViking tenant-scoping headers** for multi-tenant servers ([#4936](https://github.com/NousResearch/hermes-agent/pull/4936))
+- **ByteRover brv query** runs synchronously before LLM call ([#4831](https://github.com/NousResearch/hermes-agent/pull/4831))
+
+---
+
+## 📱 Messaging Platforms (Gateway)
+
+### Gateway Core
+- **Inactivity-based agent timeout** — replaces wall-clock timeout with smart activity tracking; long-running active tasks never killed ([#5389](https://github.com/NousResearch/hermes-agent/pull/5389))
+- **Approval buttons for Slack & Telegram** + Slack thread context preservation ([#5890](https://github.com/NousResearch/hermes-agent/pull/5890))
+- **Live-stream /update output** + forward interactive prompts to user ([#5180](https://github.com/NousResearch/hermes-agent/pull/5180))
+- **Infinite timeout support** + periodic notifications + actionable error messages ([#4959](https://github.com/NousResearch/hermes-agent/pull/4959))
+- **Duplicate message prevention** — gateway dedup + partial stream guard ([#4878](https://github.com/NousResearch/hermes-agent/pull/4878))
+- **Webhook delivery_info persistence** + full session id in /status ([#5942](https://github.com/NousResearch/hermes-agent/pull/5942))
+- **Tool preview truncation** respects tool_preview_length in all/new progress modes ([#5937](https://github.com/NousResearch/hermes-agent/pull/5937))
+- **Short preview truncation** restored for all/new tool progress modes ([#4935](https://github.com/NousResearch/hermes-agent/pull/4935))
+- **Update-pending state** written atomically to prevent corruption ([#4923](https://github.com/NousResearch/hermes-agent/pull/4923))
+- **Approval session key isolated** per turn ([#4884](https://github.com/NousResearch/hermes-agent/pull/4884))
+- **Active-session guard bypass** for /approve, /deny, /stop, /new ([#4926](https://github.com/NousResearch/hermes-agent/pull/4926), [#5765](https://github.com/NousResearch/hermes-agent/pull/5765))
+- **Typing indicator paused** during approval waits ([#5893](https://github.com/NousResearch/hermes-agent/pull/5893))
+- **Caption check** uses exact line-by-line match instead of substring (all platforms) ([#5939](https://github.com/NousResearch/hermes-agent/pull/5939))
+- **MEDIA: tags stripped** from streamed gateway messages ([#5152](https://github.com/NousResearch/hermes-agent/pull/5152))
+- **MEDIA: tags extracted** from cron delivery before sending ([#5598](https://github.com/NousResearch/hermes-agent/pull/5598))
+- **Profile-aware service units** + voice transcription cleanup ([#5972](https://github.com/NousResearch/hermes-agent/pull/5972))
+- **Thread-safe PairingStore** with atomic writes — @CharlieKerfoot ([#5656](https://github.com/NousResearch/hermes-agent/pull/5656))
+- **Sanitize media URLs** in base platform logs — @WAXLYY ([#5631](https://github.com/NousResearch/hermes-agent/pull/5631))
+- **Reduce Telegram fallback IP activation log noise** — @MadKangYu ([#5615](https://github.com/NousResearch/hermes-agent/pull/5615))
+- **Cron static method wrappers** to prevent self-binding ([#5299](https://github.com/NousResearch/hermes-agent/pull/5299))
+- **Stale `hermes login` replaced** with `hermes auth` + credential removal re-seeding fix ([#5670](https://github.com/NousResearch/hermes-agent/pull/5670))
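+
+The inactivity-based timeout (#5389) boils down to tracking a last-activity clock instead of a deadline; `ActivityTimeout` and its method names below are a sketch, not the gateway's real API:
+
+```python
+import time
+
+class ActivityTimeout:
+    """Expires only when no tool activity has occurred for `idle_limit`
+    seconds; an agent that keeps calling tools is never killed."""
+
+    def __init__(self, idle_limit: float, clock=time.monotonic):
+        self.idle_limit = idle_limit
+        self.clock = clock
+        self.last_activity = clock()
+
+    def touch(self):
+        """Record tool activity (called on every tool call or result)."""
+        self.last_activity = self.clock()
+
+    def expired(self) -> bool:
+        return (self.clock() - self.last_activity) > self.idle_limit
+```
+
+Compare with a wall-clock deadline, which would kill a task at a fixed elapsed time no matter how busy it is.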
+
+### Telegram
+- **Group topics skill binding** for supergroup forum topics ([#4886](https://github.com/NousResearch/hermes-agent/pull/4886))
+- **Emoji reactions** for approval status and notifications ([#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
+- **Duplicate message delivery prevented** on send timeout ([#5153](https://github.com/NousResearch/hermes-agent/pull/5153))
+- **Command names sanitized** to strip invalid characters ([#5596](https://github.com/NousResearch/hermes-agent/pull/5596))
+- **Per-platform disabled skills** respected in Telegram menu and gateway dispatch ([#4799](https://github.com/NousResearch/hermes-agent/pull/4799))
+- **/approve and /deny** routed through running-agent guard ([#4798](https://github.com/NousResearch/hermes-agent/pull/4798))
+
+### Discord
+- **Channel controls** — ignored_channels and no_thread_channels config options ([#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
+- **Skills registered as native slash commands** via shared gateway logic ([#5603](https://github.com/NousResearch/hermes-agent/pull/5603))
+- **/approve, /deny, /queue, /background, /btw** registered as native slash commands ([#4800](https://github.com/NousResearch/hermes-agent/pull/4800), [#5477](https://github.com/NousResearch/hermes-agent/pull/5477))
+- **Unnecessary members intent** removed on startup + token lock leak fix ([#5302](https://github.com/NousResearch/hermes-agent/pull/5302))
+
+### Slack
+- **Thread engagement** — auto-respond in bot-started and mentioned threads ([#5897](https://github.com/NousResearch/hermes-agent/pull/5897))
+- **mrkdwn in edit_message** + thread replies without @mentions ([#5733](https://github.com/NousResearch/hermes-agent/pull/5733))
+
+### Matrix
+- **Tier 1 feature parity** — reactions, read receipts, rich formatting, room management ([#5275](https://github.com/NousResearch/hermes-agent/pull/5275))
+- **MATRIX_REQUIRE_MENTION and MATRIX_AUTO_THREAD** support ([#5106](https://github.com/NousResearch/hermes-agent/pull/5106))
+- **Comprehensive reliability** — encrypted media, auth recovery, cron E2EE, Synapse compat ([#5271](https://github.com/NousResearch/hermes-agent/pull/5271))
+- **CJK input, E2EE, and reconnect** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
+
+### Signal
+- **Full MEDIA: tag delivery** — send_image_file, send_voice, and send_video implemented ([#5602](https://github.com/NousResearch/hermes-agent/pull/5602))
+
+### Mattermost
+- **File attachments** — set message type to DOCUMENT when post has file attachments — @nericervin ([#5609](https://github.com/NousResearch/hermes-agent/pull/5609))
+
+### Feishu
+- **Interactive card approval buttons** ([#6043](https://github.com/NousResearch/hermes-agent/pull/6043))
+- **Reconnect and ACL** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
+
+### Webhooks
+- **`{__raw__}` template token** and thread_id passthrough for forum topics ([#5662](https://github.com/NousResearch/hermes-agent/pull/5662))
+
+---
+
+## 🖥️ CLI & User Experience
+
+### Interactive CLI
+- **Defer response content** until reasoning block completes ([#5773](https://github.com/NousResearch/hermes-agent/pull/5773))
+- **Ghost status-bar lines cleared** on terminal resize ([#4960](https://github.com/NousResearch/hermes-agent/pull/4960))
+- **Normalize `\r\n` and `\r` line endings** in pasted text ([#4849](https://github.com/NousResearch/hermes-agent/pull/4849))
+- **ChatConsole errors, curses scroll, skin-aware banner, git state** banner fixes ([#5974](https://github.com/NousResearch/hermes-agent/pull/5974))
+- **Native Windows image paste** support ([#5917](https://github.com/NousResearch/hermes-agent/pull/5917))
+- **--yolo and other flags** no longer silently dropped when placed before 'chat' subcommand ([#5145](https://github.com/NousResearch/hermes-agent/pull/5145))
+
+### Setup & Configuration
+- **Config structure validation** — detect malformed YAML at startup with actionable error messages ([#5426](https://github.com/NousResearch/hermes-agent/pull/5426))
+- **Centralized logging** to `~/.hermes/logs/` — agent.log (INFO+), errors.log (WARNING+) with `hermes logs` command ([#5430](https://github.com/NousResearch/hermes-agent/pull/5430))
+- **Docs links added** to setup wizard sections ([#5283](https://github.com/NousResearch/hermes-agent/pull/5283))
+- **Doctor diagnostics** — sync provider checks, config migration, WAL and mem0 diagnostics ([#5077](https://github.com/NousResearch/hermes-agent/pull/5077))
+- **Timeout debug logging** and user-facing diagnostics improved ([#5370](https://github.com/NousResearch/hermes-agent/pull/5370))
+- **Reasoning effort unified** to config.yaml only ([#6118](https://github.com/NousResearch/hermes-agent/pull/6118))
+- **Permanent command allowlist** loaded on startup ([#5076](https://github.com/NousResearch/hermes-agent/pull/5076))
+- **`hermes auth remove`** now clears env-seeded credentials permanently ([#5285](https://github.com/NousResearch/hermes-agent/pull/5285))
+- **Bundled skills synced to all profiles** during update ([#5795](https://github.com/NousResearch/hermes-agent/pull/5795))
+- **`hermes update` no longer kills** freshly-restarted gateway service ([#5448](https://github.com/NousResearch/hermes-agent/pull/5448))
+- **`subprocess.run()` timeouts** added to all gateway CLI commands ([#5424](https://github.com/NousResearch/hermes-agent/pull/5424))
+- **Actionable error message** when Codex refresh token is reused — @tymrtn ([#5612](https://github.com/NousResearch/hermes-agent/pull/5612))
+- **Google-workspace skill scripts** can now run directly — @xinbenlv ([#5624](https://github.com/NousResearch/hermes-agent/pull/5624))
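+
+Startup config validation of the kind #5426 describes amounts to checking the parsed YAML against an expected shape before anything else runs; a minimal sketch (the schema shape and error text are illustrative, not Hermes internals):
+
+```python
+def validate_config(config, schema, path="config"):
+    """Return human-readable errors for keys whose parsed type does not
+    match the expected type. `schema` maps key -> type or nested dict."""
+    errors = []
+    if not isinstance(config, dict):
+        return [f"{path}: expected a mapping, got {type(config).__name__}"]
+    for key, expected in schema.items():
+        if key not in config:
+            continue  # missing keys fall back to defaults
+        value = config[key]
+        if isinstance(expected, dict):
+            errors.extend(validate_config(value, expected, f"{path}.{key}"))
+        elif not isinstance(value, expected):
+            errors.append(
+                f"{path}.{key}: expected {expected.__name__}, "
+                f"got {type(value).__name__}"
+            )
+    return errors
+```
+
+Reporting every mismatch at once, with a dotted path, is what turns a cryptic mid-session failure into an actionable startup message.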
+
+### Cron System
+- **Inactivity-based cron timeout** — replaces wall-clock; active tasks run indefinitely ([#5440](https://github.com/NousResearch/hermes-agent/pull/5440))
+- **Pre-run script injection** for data collection and change detection ([#5082](https://github.com/NousResearch/hermes-agent/pull/5082))
+- **Delivery failure tracking** in job status ([#6042](https://github.com/NousResearch/hermes-agent/pull/6042))
+- **Delivery guidance** in cron prompts — stops send_message thrashing ([#5444](https://github.com/NousResearch/hermes-agent/pull/5444))
+- **MEDIA files delivered** as native platform attachments ([#5921](https://github.com/NousResearch/hermes-agent/pull/5921))
+- **`[SILENT]` suppression** works anywhere in response — @auspic7 ([#5654](https://github.com/NousResearch/hermes-agent/pull/5654))
+- **Cron path traversal** hardening ([#5147](https://github.com/NousResearch/hermes-agent/pull/5147))
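+
+Path traversal hardening like #5147 typically means resolving any user-supplied path and refusing anything that escapes the allowed root; a sketch with illustrative names, not the actual cron code:
+
+```python
+from pathlib import Path
+
+def resolve_inside(root: str, user_path: str) -> Path:
+    """Resolve `user_path` relative to `root`, rejecting `..` or symlink
+    tricks that would land outside the root directory."""
+    base = Path(root).resolve()
+    candidate = (base / user_path).resolve()
+    if base != candidate and base not in candidate.parents:
+        raise ValueError(f"path escapes {base}: {user_path}")
+    return candidate
+```
+
+Resolving both sides first is the key step: a prefix check on the raw strings would wave through `jobs/../../etc`.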
+
+---
+
+## 🔧 Tool System
+
+### Terminal & Execution
+- **`execute_code` on remote backends** — code execution now works on Docker, SSH, Modal, and other remote terminal backends ([#5088](https://github.com/NousResearch/hermes-agent/pull/5088))
+- **Exit code context** for common CLI tools in terminal results — helps agent understand what went wrong ([#5144](https://github.com/NousResearch/hermes-agent/pull/5144))
+- **Progressive subdirectory hint discovery** — agent learns project structure as it navigates ([#5291](https://github.com/NousResearch/hermes-agent/pull/5291))
+- **`notify_on_complete` for background processes** — get notified when long-running tasks finish ([#5779](https://github.com/NousResearch/hermes-agent/pull/5779))
+- **Docker env config** — explicit container environment variables via docker_env config ([#4738](https://github.com/NousResearch/hermes-agent/pull/4738))
+- **Approval metadata included** in terminal tool results ([#5141](https://github.com/NousResearch/hermes-agent/pull/5141))
+- **Workdir parameter sanitized** in terminal tool across all backends ([#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
+- **Detached process crash recovery** state corrected ([#6101](https://github.com/NousResearch/hermes-agent/pull/6101))
+- **Agent-browser paths with spaces** preserved — @Vasanthdev2004 ([#6077](https://github.com/NousResearch/hermes-agent/pull/6077))
+- **Portable base64 encoding** for image reading on macOS — @CharlieKerfoot ([#5657](https://github.com/NousResearch/hermes-agent/pull/5657))
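+
+The completion-notification flow behind #5779 can be pictured as a watcher thread that pushes a callback when a background process exits, instead of having the agent poll; the sketch below is illustrative and is not the actual tool API:
+
+```python
+import subprocess
+import threading
+
+def run_in_background(cmd, on_complete):
+    """Start `cmd` without blocking; invoke `on_complete(returncode)`
+    from a watcher thread once the process exits."""
+    proc = subprocess.Popen(cmd)
+
+    def watch():
+        proc.wait()
+        on_complete(proc.returncode)  # e.g. enqueue a notification for the agent
+
+    threading.Thread(target=watch, daemon=True).start()
+    return proc
+```
+
+The caller gets the process handle back immediately and is free to do other work until the callback fires.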
+
+### Browser
+- **Switch managed browser provider** from Browserbase to Browser Use — @benbarclay ([#5750](https://github.com/NousResearch/hermes-agent/pull/5750))
+- **Firecrawl cloud browser** provider — @alt-glitch ([#5628](https://github.com/NousResearch/hermes-agent/pull/5628))
+- **JS evaluation** via browser_console expression parameter ([#5303](https://github.com/NousResearch/hermes-agent/pull/5303))
+- **Windows browser** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
+
+### MCP
+- **MCP OAuth 2.1 PKCE** — full standards-compliant OAuth client support ([#5420](https://github.com/NousResearch/hermes-agent/pull/5420))
+- **OSV malware check** for MCP extension packages ([#5305](https://github.com/NousResearch/hermes-agent/pull/5305))
+- **Prefer structuredContent over text** + no_mcp sentinel ([#5979](https://github.com/NousResearch/hermes-agent/pull/5979))
+- **Unknown toolsets warning suppressed** for MCP server names ([#5279](https://github.com/NousResearch/hermes-agent/pull/5279))
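+
+OAuth 2.1 makes PKCE mandatory. Independent of Hermes internals, the verifier/challenge pair (RFC 7636, S256 method) needs only the standard library:
+
+```python
+import base64
+import hashlib
+import secrets
+
+def make_pkce_pair() -> tuple[str, str]:
+    """Generate an RFC 7636 code_verifier and its S256 code_challenge.
+
+    The challenge goes in the authorization request and the verifier in the
+    token request, so an intercepted authorization code alone is useless.
+    """
+    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
+    digest = hashlib.sha256(verifier.encode("ascii")).digest()
+    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
+    return verifier, challenge
+```
+
+Note the unpadded URL-safe base64: RFC 7636 forbids `=` padding in both values.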
+
+### Web & Files
+- **.zip document support** + auto-mount cache dirs into remote backends ([#4846](https://github.com/NousResearch/hermes-agent/pull/4846))
+- **Redact query secrets** in send_message errors — @WAXLYY ([#5650](https://github.com/NousResearch/hermes-agent/pull/5650))
+
+### Delegation
+- **Credential pool sharing** + workspace path hints for subagents ([#5748](https://github.com/NousResearch/hermes-agent/pull/5748))
+
+### ACP (VS Code / Zed / JetBrains)
+- **Aggregate ACP improvements** — auth compat, protocol fixes, command ads, delegation, SSE events ([#5292](https://github.com/NousResearch/hermes-agent/pull/5292))
+
+---
+
+## 🧩 Skills Ecosystem
+
+### Skills System
+- **Skill config interface** — skills can declare required config.yaml settings, prompted during setup, injected at load time ([#5635](https://github.com/NousResearch/hermes-agent/pull/5635))
+- **Plugin CLI registration system** — plugins register their own CLI subcommands without touching main.py ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295))
+- **Request-scoped API hooks** with tool call correlation IDs for plugins ([#5427](https://github.com/NousResearch/hermes-agent/pull/5427))
+- **Session lifecycle hooks** — on_session_finalize and on_session_reset for CLI + gateway ([#6129](https://github.com/NousResearch/hermes-agent/pull/6129))
+- **Prompt for required env vars** during plugin install — @kshitijk4poor ([#5470](https://github.com/NousResearch/hermes-agent/pull/5470))
+- **Plugin name validation** — reject names that resolve to plugins root ([#5368](https://github.com/NousResearch/hermes-agent/pull/5368))
+- **pre_llm_call plugin context** moved to user message to preserve prompt cache ([#5146](https://github.com/NousResearch/hermes-agent/pull/5146))
+
+### New & Updated Skills
+- **popular-web-designs** — 54 production website design systems ([#5194](https://github.com/NousResearch/hermes-agent/pull/5194))
+- **p5js creative coding** — @SHL0MS ([#5600](https://github.com/NousResearch/hermes-agent/pull/5600))
+- **manim-video** — mathematical and technical animations — @SHL0MS ([#4930](https://github.com/NousResearch/hermes-agent/pull/4930))
+- **llm-wiki** — Karpathy's LLM Wiki skill ([#5635](https://github.com/NousResearch/hermes-agent/pull/5635))
+- **gitnexus-explorer** — codebase indexing and knowledge serving ([#5208](https://github.com/NousResearch/hermes-agent/pull/5208))
+- **research-paper-writing** — AI-Scientist & GPT-Researcher patterns — @SHL0MS ([#5421](https://github.com/NousResearch/hermes-agent/pull/5421))
+- **blogwatcher** updated to JulienTant's fork ([#5759](https://github.com/NousResearch/hermes-agent/pull/5759))
+- **claude-code skill** comprehensive rewrite v2.0 + v2.2 ([#5155](https://github.com/NousResearch/hermes-agent/pull/5155), [#5158](https://github.com/NousResearch/hermes-agent/pull/5158))
+- **Code verification skills** consolidated into one ([#4854](https://github.com/NousResearch/hermes-agent/pull/4854))
+- **Manim CE reference docs** expanded — geometry, animations, LaTeX — @leotrs ([#5791](https://github.com/NousResearch/hermes-agent/pull/5791))
+- **Manim-video references** — design thinking, updaters, paper explainer, decorations, production quality — @SHL0MS ([#5588](https://github.com/NousResearch/hermes-agent/pull/5588), [#5408](https://github.com/NousResearch/hermes-agent/pull/5408))
+
+---
+
+## 🔒 Security & Reliability
+
+### Security Hardening
+- **Consolidated security** — SSRF protections, timing attack mitigations, tar traversal prevention, credential leakage guards ([#5944](https://github.com/NousResearch/hermes-agent/pull/5944))
+- **Cross-session isolation** + cron path traversal hardening ([#5613](https://github.com/NousResearch/hermes-agent/pull/5613))
+- **Workdir parameter sanitized** in terminal tool across all backends ([#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
+- **Approval 'once' session escalation** prevented + cron delivery platform validation ([#5280](https://github.com/NousResearch/hermes-agent/pull/5280))
+- **Profile-scoped Google Workspace OAuth tokens** protected ([#4910](https://github.com/NousResearch/hermes-agent/pull/4910))
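+
+Tar traversal prevention of the sort consolidated in #5944 rejects archive members whose names escape the extraction directory. A minimal sketch, not the Hermes implementation (on Python 3.12+, `tarfile.extractall(filter="data")` enforces this and more natively):
+
+```python
+import os
+import tarfile
+
+def safe_extract(archive: str, dest: str) -> None:
+    """Extract `archive` into `dest`, refusing members like
+    `../../etc/cron.d/x` that would be written outside the destination."""
+    dest_real = os.path.realpath(dest)
+    with tarfile.open(archive) as tar:
+        for member in tar.getmembers():
+            target = os.path.realpath(os.path.join(dest_real, member.name))
+            if os.path.commonpath([dest_real, target]) != dest_real:
+                raise ValueError(f"blocked traversal attempt: {member.name}")
+        tar.extractall(dest_real)
+```
+
+Checking every member before extracting anything keeps a partially-written archive from leaving files behind.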
+
+### Reliability
+- **Aggressive worktree and branch cleanup** to prevent accumulation ([#6134](https://github.com/NousResearch/hermes-agent/pull/6134))
+- **O(n²) catastrophic backtracking** in redact regex fixed — 100x improvement on large outputs ([#4962](https://github.com/NousResearch/hermes-agent/pull/4962))
+- **Runtime stability fixes** across core, web, delegate, and browser tools ([#4843](https://github.com/NousResearch/hermes-agent/pull/4843))
+- **API server streaming fix** + conversation history support ([#5977](https://github.com/NousResearch/hermes-agent/pull/5977))
+- **OpenViking API endpoint paths** and response parsing corrected ([#5078](https://github.com/NousResearch/hermes-agent/pull/5078))
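The redact-regex fix in #4962 targets a well-known failure class. The actual pattern is not shown here, but the bug and its fix can be illustrated with hypothetical patterns: a nested quantifier backtracks superlinearly on inputs that almost match, while a flat rewrite scans once.

```python
import re
import time

# Hypothetical patterns for illustration only — not Hermes' redact regex.
# (\w+\s?)+ explodes on a non-matching tail: the engine retries every way
# of splitting the words between the inner and outer "+". The flat
# character class accepts a superset of the same inputs in one linear pass.
backtracking = re.compile(r"^(\w+\s?)+$")   # superlinear on failed matches
linear = re.compile(r"^[\w\s]+$")           # single pass, no nesting

text = "word " * 30 + "!"   # the "!" forces the match to fail at the end
# Deliberately NOT running `backtracking` on `text` — it would take minutes.
start = time.perf_counter()
assert linear.match(text) is None
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"linear pattern rejected the input in {elapsed_ms:.3f} ms")
```

On matching inputs both patterns are fast; the pathological cost only appears on near-misses, which is why this class of bug tends to surface on large real-world outputs rather than in unit tests.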
+
+---
+
+## 🐛 Notable Bug Fixes
+
+- **9 community bugfixes salvaged in one batch** — gateway, cron, deps, macOS launchd ([#5288](https://github.com/NousResearch/hermes-agent/pull/5288))
+- **Batch core bug fixes** — model config, session reset, alias fallback, launchctl, delegation, atomic writes ([#5630](https://github.com/NousResearch/hermes-agent/pull/5630))
+- **Batch gateway/platform fixes** — Matrix E2EE, CJK input, Windows browser, Feishu reconnect + ACL ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))
+- **Stale test skips removed**; regex backtracking, file-search bug, and test flakiness fixed ([#4969](https://github.com/NousResearch/hermes-agent/pull/4969))
+- **Nix flake** — read version, regen uv.lock, add hermes_logging — @alt-glitch ([#5651](https://github.com/NousResearch/hermes-agent/pull/5651))
+- **Lowercase variable redaction** regression tests ([#5185](https://github.com/NousResearch/hermes-agent/pull/5185))
+
+---
+
+## 🧪 Testing
+
+- **57 failing CI tests repaired** across 14 files ([#5823](https://github.com/NousResearch/hermes-agent/pull/5823))
+- **Test suite re-architecture** + CI failure fixes — @alt-glitch ([#5946](https://github.com/NousResearch/hermes-agent/pull/5946))
+- **Codebase-wide lint cleanup** — unused imports, dead code, and inefficient patterns ([#5821](https://github.com/NousResearch/hermes-agent/pull/5821))
+- **browser_close tool removed** — auto-cleanup handles it ([#5792](https://github.com/NousResearch/hermes-agent/pull/5792))
+
+---
+
+## 📚 Documentation
+
+- **Comprehensive documentation audit** — fix stale info, expand thin pages, add depth ([#5393](https://github.com/NousResearch/hermes-agent/pull/5393))
+- **40+ discrepancies fixed** between documentation and codebase ([#5818](https://github.com/NousResearch/hermes-agent/pull/5818))
+- **13 features documented** from last week's PRs ([#5815](https://github.com/NousResearch/hermes-agent/pull/5815))
+- **Guides section overhaul** — fix existing + add 3 new tutorials ([#5735](https://github.com/NousResearch/hermes-agent/pull/5735))
+- **Salvaged 4 docs PRs** — docker setup, post-update validation, local LLM guide, signal-cli install ([#5727](https://github.com/NousResearch/hermes-agent/pull/5727))
+- **Discord configuration reference** ([#5386](https://github.com/NousResearch/hermes-agent/pull/5386))
+- **Community FAQ entries** for common workflows and troubleshooting ([#4797](https://github.com/NousResearch/hermes-agent/pull/4797))
+- **WSL2 networking guide** for local model servers ([#5616](https://github.com/NousResearch/hermes-agent/pull/5616))
+- **Honcho CLI reference** + plugin CLI registration docs ([#5308](https://github.com/NousResearch/hermes-agent/pull/5308))
+- **Obsidian Headless setup** for servers in llm-wiki ([#5660](https://github.com/NousResearch/hermes-agent/pull/5660))
+- **Hermes Mod visual skin editor** added to skins page ([#6095](https://github.com/NousResearch/hermes-agent/pull/6095))
+
+---
+
+## 👥 Contributors
+
+### Core
+- **@teknium1** — 179 PRs
+
+### Top Community Contributors
+- **@SHL0MS** (7 PRs) — p5js creative coding skill, manim-video skill + 5 reference expansions, research-paper-writing, Nous OAuth fix, manim font fix
+- **@alt-glitch** (3 PRs) — Firecrawl cloud browser provider, test re-architecture + CI fixes, Nix flake fixes
+- **@benbarclay** (2 PRs) — Browser Use managed provider switch, Nous portal base URL fix
+- **@CharlieKerfoot** (2 PRs) — macOS portable base64 encoding, thread-safe PairingStore
+- **@WAXLYY** (2 PRs) — send_message secret redaction, gateway media URL sanitization
+- **@MadKangYu** (2 PRs) — Telegram log noise reduction, context compaction fix for temperature-restricted models
+
+### All Contributors
+@alt-glitch, @auspic7, @austinpickett, @benbarclay, @CharlieKerfoot, @GratefulDave, @kshitijk4poor, @leotrs, @lumethegreat, @MadKangYu, @nericervin, @ryanautomated, @SHL0MS, @techguysimon, @tymrtn, @Vasanthdev2004, @WAXLYY, @xinbenlv
+
+---
+
+**Full Changelog**: [v2026.4.3...v2026.4.8](https://github.com/NousResearch/hermes-agent/compare/v2026.4.3...v2026.4.8)
diff --git a/agent/anthropic_adapter.py b/agent/anthropic_adapter.py
index f4e8dcee65..2d6c2dd82e 100644
--- a/agent/anthropic_adapter.py
+++ b/agent/anthropic_adapter.py
@@ -1102,7 +1102,15 @@ def convert_messages_to_anthropic(
curr_content = [{"type": "text", "text": curr_content}]
fixed[-1]["content"] = prev_content + curr_content
else:
- # Consecutive assistant messages — merge text content
+ # Consecutive assistant messages — merge text content.
+ # Drop thinking blocks from the *second* message: their
+ # signature was computed against a different turn boundary
+ # and becomes invalid once merged.
+ if isinstance(m["content"], list):
+ m["content"] = [
+ b for b in m["content"]
+ if not (isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking"))
+ ]
prev_blocks = fixed[-1]["content"]
curr_blocks = m["content"]
if isinstance(prev_blocks, list) and isinstance(curr_blocks, list):
@@ -1120,6 +1128,68 @@ def convert_messages_to_anthropic(
fixed.append(m)
result = fixed
+ # ── Thinking block signature management ──────────────────────────
+ # Anthropic signs thinking blocks against the full turn content.
+ # Any upstream mutation (context compression, session truncation,
+ # orphan stripping, message merging) invalidates the signature,
+ # causing HTTP 400 "Invalid signature in thinking block".
+ #
+ # Strategy (following clawdbot/OpenClaw pattern):
+ # 1. Strip thinking/redacted_thinking from all assistant messages
+ # EXCEPT the last one — preserves reasoning continuity on the
+ # current tool-use chain while avoiding stale signature errors.
+ # 2. Downgrade unsigned thinking blocks (no signature) to text —
+ # Anthropic can't validate them and will reject them.
+ # 3. Strip cache_control from thinking/redacted_thinking blocks —
+ # cache markers can interfere with signature validation.
+ _THINKING_TYPES = frozenset(("thinking", "redacted_thinking"))
+
+ last_assistant_idx = None
+ for i in range(len(result) - 1, -1, -1):
+ if result[i].get("role") == "assistant":
+ last_assistant_idx = i
+ break
+
+ for idx, m in enumerate(result):
+ if m.get("role") != "assistant" or not isinstance(m.get("content"), list):
+ continue
+
+ if idx != last_assistant_idx:
+ # Strip ALL thinking blocks from non-latest assistant messages
+ stripped = [
+ b for b in m["content"]
+ if not (isinstance(b, dict) and b.get("type") in _THINKING_TYPES)
+ ]
+ m["content"] = stripped or [{"type": "text", "text": "(thinking elided)"}]
+ else:
+ # Latest assistant: keep signed thinking blocks for reasoning
+ # continuity; downgrade unsigned ones to plain text.
+ new_content = []
+ for b in m["content"]:
+ if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES:
+ new_content.append(b)
+ continue
+ if b.get("type") == "redacted_thinking":
+ # Redacted blocks use 'data' for the signature payload
+ if b.get("data"):
+ new_content.append(b)
+ # else: drop — no data means it can't be validated
+ elif b.get("signature"):
+ # Signed thinking block — keep it
+ new_content.append(b)
+ else:
+ # Unsigned thinking — downgrade to text so it's not lost
+ thinking_text = b.get("thinking", "")
+ if thinking_text:
+ new_content.append({"type": "text", "text": thinking_text})
+ m["content"] = new_content or [{"type": "text", "text": "(empty)"}]
+
+ # Strip cache_control from any remaining thinking/redacted_thinking
+ # blocks — cache markers interfere with signature validation.
+ for b in m["content"]:
+ if isinstance(b, dict) and b.get("type") in _THINKING_TYPES:
+ b.pop("cache_control", None)
+
return system, result
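The strategy in the comment block above can be exercised standalone. Below is a simplified mirror of the pass (an assumed shape only: it folds the redacted_thinking `data` check into the signed-block check, assumes all content blocks are dicts, and skips the cache_control cleanup):

```python
THINKING_TYPES = ("thinking", "redacted_thinking")


def manage_thinking_blocks(messages: list[dict]) -> list[dict]:
    """Strip thinking from all assistant messages except the last;
    in the last, keep signed blocks and downgrade unsigned ones to text."""
    last = max((i for i, m in enumerate(messages)
                if m.get("role") == "assistant"), default=None)
    for idx, m in enumerate(messages):
        if m.get("role") != "assistant" or not isinstance(m.get("content"), list):
            continue
        if idx != last:
            stripped = [b for b in m["content"]
                        if b.get("type") not in THINKING_TYPES]
            m["content"] = stripped or [{"type": "text", "text": "(thinking elided)"}]
        else:
            kept = []
            for b in m["content"]:
                if b.get("type") not in THINKING_TYPES:
                    kept.append(b)
                elif b.get("signature") or b.get("data"):
                    kept.append(b)  # signed (or redacted with data): keep as-is
                elif b.get("thinking"):
                    # Unsigned thinking would be rejected — keep it as text.
                    kept.append({"type": "text", "text": b["thinking"]})
            m["content"] = kept or [{"type": "text", "text": "(empty)"}]
    return messages
```

The key invariant is that only the final assistant turn can carry thinking blocks into the request, which is the turn whose signatures are still valid against the current boundary.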
@@ -1224,9 +1294,9 @@ def build_anthropic_kwargs(
# Map reasoning_config to Anthropic's thinking parameter.
# Claude 4.6 models use adaptive thinking + output_config.effort.
# Older models use manual thinking with budget_tokens.
- # Haiku models do NOT support extended thinking at all — skip entirely.
+ # Haiku and MiniMax models do NOT support extended thinking — skip entirely.
if reasoning_config and isinstance(reasoning_config, dict):
- if reasoning_config.get("enabled") is not False and "haiku" not in model.lower():
+ if reasoning_config.get("enabled") is not False and "haiku" not in model.lower() and "minimax" not in model.lower():
effort = str(reasoning_config.get("effort", "medium")).lower()
budget = THINKING_BUDGET.get(effort, 8000)
if _supports_adaptive_thinking(model):
diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py
index 49a78458d3..2b99ac0708 100644
--- a/agent/auxiliary_client.py
+++ b/agent/auxiliary_client.py
@@ -59,13 +59,48 @@ from hermes_constants import OPENROUTER_BASE_URL
logger = logging.getLogger(__name__)
+_PROVIDER_ALIASES = {
+ "google": "gemini",
+ "google-gemini": "gemini",
+ "google-ai-studio": "gemini",
+ "glm": "zai",
+ "z-ai": "zai",
+ "z.ai": "zai",
+ "zhipu": "zai",
+ "kimi": "kimi-coding",
+ "moonshot": "kimi-coding",
+ "minimax-china": "minimax-cn",
+ "minimax_cn": "minimax-cn",
+ "claude": "anthropic",
+ "claude-code": "anthropic",
+}
+
+
+def _normalize_aux_provider(provider: Optional[str], *, for_vision: bool = False) -> str:
+ normalized = (provider or "auto").strip().lower()
+ if normalized.startswith("custom:"):
+ suffix = normalized.split(":", 1)[1].strip()
+ if not suffix:
+ return "custom"
+ normalized = suffix if not for_vision else "custom"
+ if normalized == "codex":
+ return "openai-codex"
+ if normalized == "main":
+ # Resolve to the user's actual main provider so named custom providers
+ # and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
+ main_prov = _read_main_provider()
+ if main_prov and main_prov not in ("auto", "main", ""):
+ return main_prov
+ return "custom"
+ return _PROVIDER_ALIASES.get(normalized, normalized)
+
# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"gemini": "gemini-3-flash-preview",
"zai": "glm-4.5-flash",
"kimi-coding": "kimi-k2-turbo-preview",
- "minimax": "MiniMax-M2.7-highspeed",
- "minimax-cn": "MiniMax-M2.7-highspeed",
+ "minimax": "MiniMax-M2.7",
+ "minimax-cn": "MiniMax-M2.7",
"anthropic": "claude-haiku-4-5-20251001",
"ai-gateway": "google/gemini-3-flash",
"opencode-zen": "gemini-3-flash",
@@ -92,6 +127,7 @@ auxiliary_is_nous: bool = False
_OPENROUTER_MODEL = "google/gemini-3-flash-preview"
_NOUS_MODEL = "google/gemini-3-flash-preview"
_NOUS_FREE_TIER_VISION_MODEL = "xiaomi/mimo-v2-omni"
+_NOUS_FREE_TIER_AUX_MODEL = "xiaomi/mimo-v2-pro"
_NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
_ANTHROPIC_DEFAULT_BASE_URL = "https://api.anthropic.com"
_AUTH_JSON_PATH = get_hermes_home() / "auth.json"
@@ -105,6 +141,23 @@ _CODEX_AUX_MODEL = "gpt-5.2-codex"
_CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"
+def _to_openai_base_url(base_url: str) -> str:
+ """Normalize an Anthropic-style base URL to OpenAI-compatible format.
+
+ Some providers (MiniMax, MiniMax-CN) expose an ``/anthropic`` endpoint for
+ the Anthropic Messages API and a separate ``/v1`` endpoint for OpenAI chat
+ completions. The auxiliary client uses the OpenAI SDK, so it must hit the
+ ``/v1`` surface. Passing the raw ``inference_base_url`` causes requests to
+ land on ``/anthropic/chat/completions`` — a 404.
+ """
+ url = str(base_url or "").strip().rstrip("/")
+ if url.endswith("/anthropic"):
+ rewritten = url[: -len("/anthropic")] + "/v1"
+ logger.debug("Auxiliary client: rewrote base URL %s → %s", url, rewritten)
+ return rewritten
+ return url
+
+
def _select_pool_entry(provider: str) -> Tuple[bool, Optional[Any]]:
"""Return (pool_exists_for_provider, selected_entry)."""
try:
@@ -634,7 +687,9 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
if not api_key:
continue
- base_url = _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
+ base_url = _to_openai_base_url(
+ _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url
+ )
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
extra = {}
@@ -651,7 +706,9 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
if not api_key:
continue
- base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
+ base_url = _to_openai_base_url(
+ str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
+ )
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
extra = {}
@@ -713,7 +770,7 @@ def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
-def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
+def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]:
nous = _read_nous_auth()
if not nous:
return None, None
@@ -725,12 +782,13 @@ def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
else:
model = _NOUS_MODEL
# Free-tier users can't use paid auxiliary models — use the free
- # multimodal model instead so vision/browser-vision still works.
+ # models instead: mimo-v2-omni for vision, mimo-v2-pro for text tasks.
try:
from hermes_cli.models import check_nous_free_tier
if check_nous_free_tier():
- model = _NOUS_FREE_TIER_VISION_MODEL
- logger.debug("Free-tier Nous account — using %s for auxiliary/vision", model)
+ model = _NOUS_FREE_TIER_VISION_MODEL if vision else _NOUS_FREE_TIER_AUX_MODEL
+ logger.debug("Free-tier Nous account — using %s for auxiliary/%s",
+ model, "vision" if vision else "text")
except Exception:
pass
return (
@@ -776,7 +834,7 @@ def _read_main_provider() -> str:
if isinstance(model_cfg, dict):
provider = model_cfg.get("provider", "")
if isinstance(provider, str) and provider.strip():
- return provider.strip().lower()
+ return _normalize_aux_provider(provider)
except Exception:
pass
return ""
@@ -1138,17 +1196,7 @@ def resolve_provider_client(
(client, resolved_model) or (None, None) if auth is unavailable.
"""
# Normalise aliases
- provider = (provider or "auto").strip().lower()
- if provider == "codex":
- provider = "openai-codex"
- if provider == "main":
- # Resolve to the user's actual main provider so named custom providers
- # and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly.
- main_prov = _read_main_provider()
- if main_prov and main_prov not in ("auto", "main", ""):
- provider = main_prov
- else:
- provider = "custom"
+ provider = _normalize_aux_provider(provider)
# ── Auto: try all providers in priority order ────────────────────
if provider == "auto":
@@ -1298,7 +1346,9 @@ def resolve_provider_client(
provider, ", ".join(tried_sources))
return None, None
- base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
+ base_url = _to_openai_base_url(
+ str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url
+ )
default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "")
final_model = model or default_model
@@ -1375,24 +1425,11 @@ def get_async_text_auxiliary_client(task: str = ""):
_VISION_AUTO_PROVIDER_ORDER = (
"openrouter",
"nous",
- "openai-codex",
- "anthropic",
- "custom",
)
def _normalize_vision_provider(provider: Optional[str]) -> str:
- provider = (provider or "auto").strip().lower()
- if provider == "codex":
- return "openai-codex"
- if provider == "main":
- # Resolve to actual main provider — named custom providers and
- # non-aggregator providers need to pass through as their real name.
- main_prov = _read_main_provider()
- if main_prov and main_prov not in ("auto", "main", ""):
- return main_prov
- return "custom"
- return provider
+ return _normalize_aux_provider(provider, for_vision=True)
def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Optional[str]]:
@@ -1400,7 +1437,7 @@ def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Option
if provider == "openrouter":
return _try_openrouter()
if provider == "nous":
- return _try_nous()
+ return _try_nous(vision=True)
if provider == "openai-codex":
return _try_codex()
if provider == "anthropic":
@@ -1433,17 +1470,20 @@ def _preferred_main_vision_provider() -> Optional[str]:
def get_available_vision_backends() -> List[str]:
"""Return the currently available vision backends in auto-selection order.
- This is the single source of truth for setup, tool gating, and runtime
- auto-routing of vision tasks. The selected main provider is preferred when
- it is also a known-good vision backend; otherwise Hermes falls back through
- the standard conservative order.
+ Order: OpenRouter → Nous → active provider. This is the single source
+ of truth for setup, tool gating, and runtime auto-routing of vision tasks.
"""
- ordered = list(_VISION_AUTO_PROVIDER_ORDER)
- preferred = _preferred_main_vision_provider()
- if preferred in ordered:
- ordered.remove(preferred)
- ordered.insert(0, preferred)
- return [provider for provider in ordered if _strict_vision_backend_available(provider)]
+ available = [p for p in _VISION_AUTO_PROVIDER_ORDER
+ if _strict_vision_backend_available(p)]
+ # Also check the user's active provider (may be DeepSeek, Alibaba, named
+ # custom, etc.) — resolve_provider_client handles all provider types.
+ main_provider = _read_main_provider()
+ if (main_provider and main_provider not in ("auto", "")
+ and main_provider not in available):
+ client, _ = resolve_provider_client(main_provider, _read_main_model())
+ if client is not None:
+ available.append(main_provider)
+ return available
def resolve_vision_provider_client(
@@ -1488,16 +1528,30 @@ def resolve_vision_provider_client(
return "custom", client, final_model
if requested == "auto":
- ordered = list(_VISION_AUTO_PROVIDER_ORDER)
- preferred = _preferred_main_vision_provider()
- if preferred in ordered:
- ordered.remove(preferred)
- ordered.insert(0, preferred)
-
- for candidate in ordered:
+ # Vision auto-detection order:
+ # 1. OpenRouter (known vision-capable default model)
+ # 2. Nous Portal (known vision-capable default model)
+ # 3. Active provider + model (user's main chat config)
+ # 4. Stop
+ for candidate in _VISION_AUTO_PROVIDER_ORDER:
sync_client, default_model = _resolve_strict_vision_backend(candidate)
if sync_client is not None:
return _finalize(candidate, sync_client, default_model)
+
+ # Fall back to the user's active provider + model.
+ main_provider = _read_main_provider()
+ main_model = _read_main_model()
+ if main_provider and main_provider not in ("auto", ""):
+ sync_client, resolved_model = resolve_provider_client(
+ main_provider, main_model)
+ if sync_client is not None:
+ logger.info(
+ "Vision auto-detect: using active provider %s (%s)",
+ main_provider, resolved_model or main_model,
+ )
+ return _finalize(
+ main_provider, sync_client, resolved_model or main_model)
+
logger.debug("Auxiliary vision client: none available")
return None, None, None
diff --git a/agent/model_metadata.py b/agent/model_metadata.py
index 50245a7c9c..0a22711865 100644
--- a/agent/model_metadata.py
+++ b/agent/model_metadata.py
@@ -113,8 +113,15 @@ DEFAULT_CONTEXT_LENGTHS = {
"llama": 131072,
# Qwen
"qwen": 131072,
- # MiniMax
- "minimax": 204800,
+ # MiniMax (keys lowercase — the lookup lowercases model names before matching)
+ "minimax-m1-256k": 1000000,
+ "minimax-m1-128k": 1000000,
+ "minimax-m1-80k": 1000000,
+ "minimax-m1-40k": 1000000,
+ "minimax-m1": 1000000,
+ "minimax-m2.5": 1048576,
+ "minimax-m2.7": 1048576,
+ "minimax": 1048576,
# GLM
"glm": 202752,
# Kimi
@@ -127,7 +134,7 @@ DEFAULT_CONTEXT_LENGTHS = {
"deepseek-ai/DeepSeek-V3.2": 65536,
"moonshotai/Kimi-K2.5": 262144,
"moonshotai/Kimi-K2-Thinking": 262144,
- "MiniMaxAI/MiniMax-M2.5": 204800,
+ "minimaxai/minimax-m2.5": 1048576,
"XiaomiMiMo/MiMo-V2-Flash": 32768,
"mimo-v2-pro": 1048576,
"mimo-v2-omni": 1048576,
@@ -611,6 +618,59 @@ def _model_id_matches(candidate_id: str, lookup_model: str) -> bool:
return False
+def query_ollama_num_ctx(model: str, base_url: str) -> Optional[int]:
+ """Query an Ollama server for the model's context length.
+
+ Returns the model's maximum context from GGUF metadata via ``/api/show``,
+ or the explicit ``num_ctx`` from the Modelfile if set. Returns None if
+ the server is unreachable or not Ollama.
+
+ This is the value that should be passed as ``num_ctx`` in Ollama chat
+ requests to override the default 2048.
+ """
+ import httpx
+
+ bare_model = _strip_provider_prefix(model)
+ server_url = base_url.rstrip("/")
+ if server_url.endswith("/v1"):
+ server_url = server_url[:-3]
+
+ try:
+ server_type = detect_local_server_type(base_url)
+ except Exception:
+ return None
+ if server_type != "ollama":
+ return None
+
+ try:
+ with httpx.Client(timeout=3.0) as client:
+ resp = client.post(f"{server_url}/api/show", json={"model": bare_model})
+ if resp.status_code != 200:
+ return None
+ data = resp.json()
+
+ # Prefer explicit num_ctx from Modelfile parameters (user override)
+ params = data.get("parameters", "")
+ if "num_ctx" in params:
+ for line in params.split("\n"):
+ if "num_ctx" in line:
+ parts = line.strip().split()
+ if len(parts) >= 2:
+ try:
+ return int(parts[-1])
+ except ValueError:
+ pass
+
+ # Fall back to GGUF model_info context_length (training max)
+ model_info = data.get("model_info", {})
+ for key, value in model_info.items():
+ if "context_length" in key and isinstance(value, (int, float)):
+ return int(value)
+ except Exception:
+ pass
+ return None
+
+
def _query_local_context_length(model: str, base_url: str) -> Optional[int]:
"""Query a local server for the model's context length."""
import httpx
diff --git a/agent/prompt_builder.py b/agent/prompt_builder.py
index df5532e125..b1b0891f59 100644
--- a/agent/prompt_builder.py
+++ b/agent/prompt_builder.py
@@ -204,6 +204,30 @@ OPENAI_MODEL_EXECUTION_GUIDANCE = (
"the result.\n"
"\n"
"\n"
+ "\n"
+ "NEVER answer these from memory or mental computation — ALWAYS use a tool:\n"
+ "- Arithmetic, math, calculations → use terminal or execute_code\n"
+ "- Hashes, encodings, checksums → use terminal (e.g. sha256sum, base64)\n"
+ "- Current time, date, timezone → use terminal (e.g. date)\n"
+ "- System state: OS, CPU, memory, disk, ports, processes → use terminal\n"
+ "- File contents, sizes, line counts → use read_file, search_files, or terminal\n"
+ "- Git history, branches, diffs → use terminal\n"
+ "- Current facts (weather, news, versions) → use web_search\n"
+ "Your memory and user profile describe the USER, not the system you are "
+ "running on. The execution environment may differ from what the user profile "
+ "says about their personal setup.\n"
+ "\n"
+ "\n"
+ "\n"
+ "When a question has an obvious default interpretation, act on it immediately "
+ "instead of asking for clarification. Examples:\n"
+ "- 'Is port 443 open?' → check THIS machine (don't ask 'open where?')\n"
+ "- 'What OS am I running?' → check the live system (don't use user profile)\n"
+ "- 'What time is it?' → run `date` (don't guess)\n"
+ "Only ask for clarification when the ambiguity genuinely changes what tool "
+ "you would call.\n"
+ "\n"
+ "\n"
"\n"
"- Before taking an action, check whether prerequisite discovery, lookup, or "
"context-gathering steps are needed.\n"
diff --git a/agent/retry_utils.py b/agent/retry_utils.py
new file mode 100644
index 0000000000..71d6963f7b
--- /dev/null
+++ b/agent/retry_utils.py
@@ -0,0 +1,57 @@
+"""Retry utilities — jittered backoff for decorrelated retries.
+
+Replaces fixed exponential backoff with jittered delays to prevent
+thundering-herd retry spikes when multiple sessions hit the same
+rate-limited provider concurrently.
+"""
+
+import random
+import threading
+import time
+
+# Monotonic counter for jitter seed uniqueness within the same process.
+# Protected by a lock to avoid race conditions in concurrent retry paths
+# (e.g. multiple gateway sessions retrying simultaneously).
+_jitter_counter = 0
+_jitter_lock = threading.Lock()
+
+
+def jittered_backoff(
+ attempt: int,
+ *,
+ base_delay: float = 5.0,
+ max_delay: float = 120.0,
+ jitter_ratio: float = 0.5,
+) -> float:
+ """Compute a jittered exponential backoff delay.
+
+ Args:
+ attempt: 1-based retry attempt number.
+ base_delay: Base delay in seconds for attempt 1.
+ max_delay: Maximum delay cap in seconds.
+ jitter_ratio: Fraction of computed delay to use as random jitter
+ range. 0.5 means jitter is uniform in [0, 0.5 * delay].
+
+ Returns:
+ Delay in seconds: min(base * 2^(attempt-1), max_delay) + jitter.
+
+ The jitter decorrelates concurrent retries so multiple sessions
+ hitting the same provider don't all retry at the same instant.
+ """
+ global _jitter_counter
+ with _jitter_lock:
+ _jitter_counter += 1
+ tick = _jitter_counter
+
+ exponent = max(0, attempt - 1)
+ if exponent >= 63 or base_delay <= 0:
+ delay = max_delay
+ else:
+ delay = min(base_delay * (2 ** exponent), max_delay)
+
+ # Seed from time + counter for decorrelation even with coarse clocks.
+ seed = (time.time_ns() ^ (tick * 0x9E3779B9)) & 0xFFFFFFFF
+ rng = random.Random(seed)
+ jitter = rng.uniform(0, jitter_ratio * delay)
+
+ return delay + jitter
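To get a feel for the schedule, here is a standalone mirror of `jittered_backoff` with the same defaults (base 5 s, cap 120 s, up to 50 % jitter):

```python
import random


def jittered_backoff(attempt: int, *, base_delay: float = 5.0,
                     max_delay: float = 120.0,
                     jitter_ratio: float = 0.5) -> float:
    # Capped exponential, then uniform jitter in [0, jitter_ratio * delay].
    exponent = max(0, attempt - 1)
    if exponent >= 63 or base_delay <= 0:
        delay = max_delay
    else:
        delay = min(base_delay * (2 ** exponent), max_delay)
    return delay + random.uniform(0, jitter_ratio * delay)


for attempt in range(1, 7):
    print(f"attempt {attempt}: {jittered_backoff(attempt):6.1f} s")
```

Attempts 1 through 6 land near 5, 10, 20, 40, 80, and 120 s, each stretched by its own random jitter, so two sessions rate-limited at the same instant drift apart instead of retrying in lockstep.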
diff --git a/cli.py b/cli.py
index b4358a163c..f00e6b7fea 100644
--- a/cli.py
+++ b/cli.py
@@ -612,6 +612,11 @@ def _run_cleanup():
pass
# Shut down memory provider (on_session_end + shutdown_all) at actual
# session boundary — NOT per-turn inside run_conversation().
+ try:
+ from hermes_cli.plugins import invoke_hook as _invoke_hook
+ _invoke_hook("on_session_finalize", session_id=_active_agent_ref.session_id if _active_agent_ref else None, platform="cli")
+ except Exception:
+ pass
try:
if _active_agent_ref and hasattr(_active_agent_ref, 'shutdown_memory_provider'):
_active_agent_ref.shutdown_memory_provider(
@@ -755,7 +760,10 @@ def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]:
def _cleanup_worktree(info: Dict[str, str] = None) -> None:
"""Remove a worktree and its branch on exit.
- If the worktree has uncommitted changes, warn and keep it.
+ Preserves the worktree only if it has unpushed commits (real work
+ that hasn't been pushed to any remote). Uncommitted changes alone
+ (untracked files, test artifacts) are not enough to keep it — agent
+ work lives in commits/PRs, not the working tree.
"""
global _active_worktree
info = info or _active_worktree
@@ -771,23 +779,27 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
if not Path(wt_path).exists():
return
- # Check for uncommitted changes
+ # Check for unpushed commits — commits reachable from HEAD but not
+ # from any remote branch. These represent real work the agent did
+ # but didn't push.
+ has_unpushed = False
try:
- status = subprocess.run(
- ["git", "status", "--porcelain"],
+ result = subprocess.run(
+ ["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
capture_output=True, text=True, timeout=10, cwd=wt_path,
)
- has_changes = bool(status.stdout.strip())
+ has_unpushed = bool(result.stdout.strip())
except Exception:
- has_changes = True # Assume dirty on error — don't delete
+ has_unpushed = True # Assume unpushed on error — don't delete
- if has_changes:
- print(f"\n\033[33m⚠ Worktree has uncommitted changes, keeping: {wt_path}\033[0m")
- print(f" To clean up manually: git worktree remove {wt_path}")
+ if has_unpushed:
+ print(f"\n\033[33m⚠ Worktree has unpushed commits, keeping: {wt_path}\033[0m")
+ print(f" To clean up manually: git worktree remove --force {wt_path}")
_active_worktree = None
return
- # Remove worktree
+ # Remove worktree (even if working tree is dirty — uncommitted
+ # changes without unpushed commits are just artifacts)
try:
subprocess.run(
["git", "worktree", "remove", wt_path, "--force"],
@@ -796,7 +808,7 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
except Exception as e:
logger.debug("Failed to remove worktree: %s", e)
- # Delete the branch (only if it was never pushed / has no upstream)
+ # Delete the branch
try:
subprocess.run(
["git", "branch", "-D", branch],
@@ -810,19 +822,27 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None:
def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
- """Remove worktrees older than max_age_hours that have no uncommitted changes.
+ """Remove stale worktrees and orphaned branches on startup.
- Runs silently on startup to clean up after crashed/killed sessions.
+ Age-based tiers:
+ - Under max_age_hours (24h): skip — session may still be active.
+ - 24h–72h: remove if no unpushed commits.
+ - Over 72h: force remove regardless (nothing should sit this long).
+
+ Also prunes orphaned ``hermes/*`` and ``pr-*`` local branches that
+ have no corresponding worktree.
"""
import subprocess
import time
worktrees_dir = Path(repo_root) / ".worktrees"
if not worktrees_dir.exists():
+ _prune_orphaned_branches(repo_root)
return
now = time.time()
- cutoff = now - (max_age_hours * 3600)
+ soft_cutoff = now - (max_age_hours * 3600) # 24h default
+ hard_cutoff = now - (max_age_hours * 3 * 3600) # 72h default
for entry in worktrees_dir.iterdir():
if not entry.is_dir() or not entry.name.startswith("hermes-"):
@@ -831,21 +851,24 @@ def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
# Check age
try:
mtime = entry.stat().st_mtime
- if mtime > cutoff:
+ if mtime > soft_cutoff:
continue # Too recent — skip
except Exception:
continue
- # Check for uncommitted changes
- try:
- status = subprocess.run(
- ["git", "status", "--porcelain"],
- capture_output=True, text=True, timeout=5, cwd=str(entry),
- )
- if status.stdout.strip():
- continue # Has changes — skip
- except Exception:
- continue # Can't check — skip
+ force = mtime <= hard_cutoff # Over 72h — force remove
+
+ if not force:
+ # 24h–72h tier: only remove if no unpushed commits
+ try:
+ result = subprocess.run(
+ ["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
+ capture_output=True, text=True, timeout=5, cwd=str(entry),
+ )
+ if result.stdout.strip():
+ continue # Has unpushed commits — skip
+ except Exception:
+ continue # Can't check — skip
# Safe to remove
try:
@@ -864,10 +887,81 @@ def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None:
["git", "branch", "-D", branch],
capture_output=True, text=True, timeout=10, cwd=repo_root,
)
- logger.debug("Pruned stale worktree: %s", entry.name)
+ logger.debug("Pruned stale worktree: %s (force=%s)", entry.name, force)
except Exception as e:
logger.debug("Failed to prune worktree %s: %s", entry.name, e)
+ _prune_orphaned_branches(repo_root)
+
+
+def _prune_orphaned_branches(repo_root: str) -> None:
+ """Delete local ``hermes/hermes-*`` and ``pr-*`` branches with no worktree.
+
+ These are auto-generated by ``hermes -w`` sessions and PR review
+ workflows respectively. Once their worktree is gone they serve no
+ purpose and just accumulate.
+ """
+ import subprocess
+
+ try:
+ result = subprocess.run(
+ ["git", "branch", "--format=%(refname:short)"],
+ capture_output=True, text=True, timeout=10, cwd=repo_root,
+ )
+ if result.returncode != 0:
+ return
+ all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
+ except Exception:
+ return
+
+ # Collect branches that are actively checked out in a worktree
+ active_branches: set = set()
+ try:
+ wt_result = subprocess.run(
+ ["git", "worktree", "list", "--porcelain"],
+ capture_output=True, text=True, timeout=10, cwd=repo_root,
+ )
+ for line in wt_result.stdout.split("\n"):
+ if line.startswith("branch refs/heads/"):
+ active_branches.add(line.split("branch refs/heads/", 1)[-1].strip())
+ except Exception:
+ return # Can't determine active branches — bail
+
+ # Also protect the currently checked-out branch and main
+ try:
+ head_result = subprocess.run(
+ ["git", "branch", "--show-current"],
+ capture_output=True, text=True, timeout=5, cwd=repo_root,
+ )
+ current = head_result.stdout.strip()
+ if current:
+ active_branches.add(current)
+ except Exception:
+ pass
+ active_branches.add("main")
+
+ orphaned = [
+ b for b in all_branches
+ if b not in active_branches
+ and (b.startswith("hermes/hermes-") or b.startswith("pr-"))
+ ]
+
+ if not orphaned:
+ return
+
+ # Delete in batches
+ for i in range(0, len(orphaned), 50):
+ batch = orphaned[i:i + 50]
+ try:
+ subprocess.run(
+ ["git", "branch", "-D"] + batch,
+ capture_output=True, text=True, timeout=30, cwd=repo_root,
+ )
+ except Exception as e:
+ logger.debug("Failed to prune orphaned branches: %s", e)
+
+ logger.debug("Pruned %d orphaned branches", len(orphaned))
+
# ============================================================================
# ASCII Art & Branding
# ============================================================================
@@ -3314,6 +3408,22 @@ class HermesCLI:
flush_tool_summary()
print()
+ def _notify_session_boundary(self, event_type: str) -> None:
+ """Fire a session-boundary plugin hook (on_session_finalize or on_session_reset).
+
+ Non-blocking — errors are caught and logged. Safe to call from any
+ lifecycle point (shutdown, /new, /reset).
+ """
+ try:
+ from hermes_cli.plugins import invoke_hook as _invoke_hook
+ _invoke_hook(
+ event_type,
+ session_id=self.agent.session_id if self.agent else None,
+ platform=getattr(self, "platform", None) or "cli",
+ )
+ except Exception:
+ pass
+
def new_session(self, silent=False):
"""Start a fresh session with a new session ID and cleared agent state."""
if self.agent and self.conversation_history:
@@ -3321,6 +3431,10 @@ class HermesCLI:
self.agent.flush_memories(self.conversation_history)
except (Exception, KeyboardInterrupt):
pass
+ self._notify_session_boundary("on_session_finalize")
+ elif self.agent:
+ # Agent exists but history is empty — still finalize the outgoing session
+ self._notify_session_boundary("on_session_finalize")
old_session_id = self.session_id
if self._session_db and old_session_id:
@@ -3365,6 +3479,7 @@ class HermesCLI:
)
except Exception:
pass
+ self._notify_session_boundary("on_session_reset")
if not silent:
print("(^_^)v New session started!")
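The session-boundary hooks above follow a fire-and-forget pattern: every plugin callback is wrapped so a failing plugin can never break the session lifecycle. A minimal standalone sketch of that pattern (the real `invoke_hook` lives in `hermes_cli.plugins`; the registry here is a hypothetical stand-in):

```python
# Minimal sketch of the fire-and-forget hook dispatch used for session
# boundaries. The registry internals are assumptions, not the real plugin API.
from collections import defaultdict
from typing import Any, Callable

_HOOKS: dict[str, list[Callable[..., Any]]] = defaultdict(list)

def register_hook(event: str, fn: Callable[..., Any]) -> None:
    _HOOKS[event].append(fn)

def invoke_hook(event: str, **kwargs: Any) -> None:
    # Each callback is individually wrapped so one broken plugin
    # cannot prevent the others (or the caller) from proceeding.
    for fn in _HOOKS[event]:
        try:
            fn(**kwargs)
        except Exception:
            pass

seen: list[tuple[str, Any]] = []
register_hook("on_session_finalize", lambda **kw: seen.append(("finalize", kw["session_id"])))
register_hook("on_session_reset", lambda **kw: seen.append(("reset", kw["session_id"])))

invoke_hook("on_session_finalize", session_id="abc", platform="cli")
invoke_hook("on_session_reset", session_id="def", platform="cli")
```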
diff --git a/cron/jobs.py b/cron/jobs.py
index 214da521fe..4096d1fd81 100644
--- a/cron/jobs.py
+++ b/cron/jobs.py
@@ -574,12 +574,16 @@ def remove_job(job_id: str) -> bool:
return False
-def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
+def mark_job_run(job_id: str, success: bool, error: Optional[str] = None,
+ delivery_error: Optional[str] = None):
"""
Mark a job as having been run.
Updates last_run_at, last_status, increments completed count,
computes next_run_at, and auto-deletes if repeat limit reached.
+
+ ``delivery_error`` is tracked separately from the agent error — a job
+ can succeed (agent produced output) but fail delivery (platform down).
"""
jobs = load_jobs()
for i, job in enumerate(jobs):
@@ -588,6 +592,8 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
job["last_run_at"] = now
job["last_status"] = "ok" if success else "error"
job["last_error"] = error if not success else None
+ # Track delivery failures separately — cleared on successful delivery
+ job["last_delivery_error"] = delivery_error
# Increment completed count
if job.get("repeat"):
diff --git a/cron/scheduler.py b/cron/scheduler.py
index 8d71248b4e..33a9b89935 100644
--- a/cron/scheduler.py
+++ b/cron/scheduler.py
@@ -196,7 +196,7 @@ def _send_media_via_adapter(adapter, chat_id: str, media_files: list, metadata:
logger.warning("Job '%s': failed to send media %s: %s", job.get("id", "?"), media_path, e)
-def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None:
+def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Optional[str]:
"""
Deliver job output to the configured target (origin chat, specific platform, etc.).
@@ -204,16 +204,16 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None:
use the live adapter first — this supports E2EE rooms (e.g. Matrix) where
the standalone HTTP path cannot encrypt. Falls back to standalone send if
the adapter path fails or is unavailable.
+
+ Returns None on success, or an error string on failure.
"""
target = _resolve_delivery_target(job)
if not target:
if job.get("deliver", "local") != "local":
- logger.warning(
- "Job '%s' deliver=%s but no concrete delivery target could be resolved",
- job["id"],
- job.get("deliver", "local"),
- )
- return
+ msg = f"no delivery target resolved for deliver={job.get('deliver', 'local')}"
+ logger.warning("Job '%s': %s", job["id"], msg)
+ return msg
+ return None # local-only jobs don't deliver — not a failure
platform_name = target["platform"]
chat_id = target["chat_id"]
@@ -239,19 +239,22 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None:
}
platform = platform_map.get(platform_name.lower())
if not platform:
- logger.warning("Job '%s': unknown platform '%s' for delivery", job["id"], platform_name)
- return
+ msg = f"unknown platform '{platform_name}'"
+ logger.warning("Job '%s': %s", job["id"], msg)
+ return msg
try:
config = load_gateway_config()
except Exception as e:
- logger.error("Job '%s': failed to load gateway config for delivery: %s", job["id"], e)
- return
+ msg = f"failed to load gateway config: {e}"
+ logger.error("Job '%s': %s", job["id"], msg)
+ return msg
pconfig = config.platforms.get(platform)
if not pconfig or not pconfig.enabled:
- logger.warning("Job '%s': platform '%s' not configured/enabled", job["id"], platform_name)
- return
+ msg = f"platform '{platform_name}' not configured/enabled"
+ logger.warning("Job '%s': %s", job["id"], msg)
+ return msg
# Optionally wrap the content with a header/footer so the user knows this
# is a cron delivery. Wrapping is on by default; set cron.wrap_response: false
@@ -307,7 +310,7 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None:
if adapter_ok:
logger.info("Job '%s': delivered to %s:%s via live adapter", job["id"], platform_name, chat_id)
- return
+ return None
except Exception as e:
logger.warning(
"Job '%s': live adapter delivery to %s:%s failed (%s), falling back to standalone",
@@ -329,13 +332,17 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None:
future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, cleaned_delivery_content, thread_id=thread_id, media_files=media_files))
result = future.result(timeout=30)
except Exception as e:
- logger.error("Job '%s': delivery to %s:%s failed: %s", job["id"], platform_name, chat_id, e)
- return
+ msg = f"delivery to {platform_name}:{chat_id} failed: {e}"
+ logger.error("Job '%s': %s", job["id"], msg)
+ return msg
if result and result.get("error"):
- logger.error("Job '%s': delivery error: %s", job["id"], result["error"])
- else:
- logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
+ msg = f"delivery error: {result['error']}"
+ logger.error("Job '%s': %s", job["id"], msg)
+ return msg
+
+ logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
+ return None
_SCRIPT_TIMEOUT = 120 # seconds
@@ -578,11 +585,9 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
except Exception as e:
logger.warning("Job '%s': failed to load config.yaml, using defaults: %s", job_id, e)
- # Reasoning config from env or config.yaml
+ # Reasoning config from config.yaml
from hermes_constants import parse_reasoning_effort
- effort = os.getenv("HERMES_REASONING_EFFORT", "")
- if not effort:
- effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
+ effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
reasoning_config = parse_reasoning_effort(effort)
# Prefill messages from env or config.yaml
@@ -868,13 +873,15 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int:
logger.info("Job '%s': agent returned %s — skipping delivery", job["id"], SILENT_MARKER)
should_deliver = False
+ delivery_error = None
if should_deliver:
try:
- _deliver_result(job, deliver_content, adapters=adapters, loop=loop)
+ delivery_error = _deliver_result(job, deliver_content, adapters=adapters, loop=loop)
except Exception as de:
+ delivery_error = str(de)
logger.error("Delivery failed for job %s: %s", job["id"], de)
- mark_job_run(job["id"], success, error)
+ mark_job_run(job["id"], success, error, delivery_error=delivery_error)
executed += 1
except Exception as e:
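The scheduler change above establishes a simple contract: `_deliver_result` returns `None` on success and an error string on failure, and that string is threaded into `mark_job_run` as `delivery_error` so agent errors and delivery errors are tracked independently. A hedged sketch of the contract (both functions here are simplified stand-ins, not the real implementations):

```python
# Sketch of the "None on success, error string on failure" delivery contract.
# deliver() stands in for _deliver_result; the job keys mirror cron/jobs.py.
from typing import Optional

def deliver(job: dict, platform_up: bool) -> Optional[str]:
    if job.get("deliver", "local") == "local":
        return None  # local-only jobs don't deliver (not a failure)
    if not platform_up:
        return f"delivery to {job['deliver']} failed: platform down"
    return None

def mark_job_run(job: dict, success: bool, error: Optional[str] = None,
                 delivery_error: Optional[str] = None) -> None:
    job["last_status"] = "ok" if success else "error"
    job["last_error"] = error if not success else None
    # Set when delivery fails, cleared (None) on successful delivery.
    job["last_delivery_error"] = delivery_error

job = {"id": "j1", "deliver": "telegram"}
err = deliver(job, platform_up=False)
mark_job_run(job, success=True, delivery_error=err)
```

This is why `hermes cron list` can now show a job as `ok` while still surfacing a delivery warning: the agent run and the delivery are separate outcomes.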
diff --git a/gateway/platforms/feishu.py b/gateway/platforms/feishu.py
index 4bc712f29f..6012a0f1c0 100644
--- a/gateway/platforms/feishu.py
+++ b/gateway/platforms/feishu.py
@@ -20,6 +20,7 @@ from __future__ import annotations
import asyncio
import hashlib
import hmac
+import itertools
import json
import logging
import mimetypes
@@ -1052,6 +1053,9 @@ class FeishuAdapter(BasePlatformAdapter):
self._media_batch_state = FeishuBatchState()
self._pending_media_batches = self._media_batch_state.events
self._pending_media_batch_tasks = self._media_batch_state.tasks
+ # Exec approval button state (approval_id → {session_key, message_id, chat_id})
+ self._approval_state: Dict[int, Dict[str, str]] = {}
+ self._approval_counter = itertools.count(1)
self._load_seen_message_ids()
@staticmethod
@@ -1394,6 +1398,104 @@ class FeishuAdapter(BasePlatformAdapter):
logger.error("[Feishu] Failed to edit message %s: %s", message_id, exc, exc_info=True)
return SendResult(success=False, error=str(exc))
+ async def send_exec_approval(
+ self, chat_id: str, command: str, session_key: str,
+ description: str = "dangerous command",
+ metadata: Optional[Dict[str, Any]] = None,
+ ) -> SendResult:
+ """Send an interactive card with approval buttons.
+
+ The buttons carry ``hermes_action`` in their value dict so that
+ ``_handle_card_action_event`` can intercept them and call
+ ``resolve_gateway_approval()`` to unblock the waiting agent thread.
+ """
+ if not self._client:
+ return SendResult(success=False, error="Not connected")
+
+ try:
+ approval_id = next(self._approval_counter)
+ cmd_preview = command[:3000] + "..." if len(command) > 3000 else command
+
+ def _btn(label: str, action_name: str, btn_type: str = "default") -> dict:
+ return {
+ "tag": "button",
+ "text": {"tag": "plain_text", "content": label},
+ "type": btn_type,
+ "value": {"hermes_action": action_name, "approval_id": approval_id},
+ }
+
+ card = {
+ "config": {"wide_screen_mode": True},
+ "header": {
+ "title": {"content": "⚠️ Command Approval Required", "tag": "plain_text"},
+ "template": "orange",
+ },
+ "elements": [
+ {
+ "tag": "markdown",
+ "content": f"```\n{cmd_preview}\n```\n**Reason:** {description}",
+ },
+ {
+ "tag": "action",
+ "actions": [
+ _btn("✅ Allow Once", "approve_once", "primary"),
+ _btn("✅ Session", "approve_session"),
+ _btn("✅ Always", "approve_always"),
+ _btn("❌ Deny", "deny", "danger"),
+ ],
+ },
+ ],
+ }
+
+ payload = json.dumps(card, ensure_ascii=False)
+ response = await self._feishu_send_with_retry(
+ chat_id=chat_id,
+ msg_type="interactive",
+ payload=payload,
+ reply_to=None,
+ metadata=metadata,
+ )
+
+ result = self._finalize_send_result(response, "send_exec_approval failed")
+ if result.success:
+ self._approval_state[approval_id] = {
+ "session_key": session_key,
+ "message_id": result.message_id or "",
+ "chat_id": chat_id,
+ }
+ return result
+ except Exception as exc:
+ logger.warning("[Feishu] send_exec_approval failed: %s", exc)
+ return SendResult(success=False, error=str(exc))
+
+ async def _update_approval_card(
+ self, message_id: str, label: str, user_name: str, choice: str,
+ ) -> None:
+ """Replace the approval card with a resolved status card."""
+ if not self._client or not message_id:
+ return
+ icon = "❌" if choice == "deny" else "✅"
+ card = {
+ "config": {"wide_screen_mode": True},
+ "header": {
+ "title": {"content": f"{icon} {label}", "tag": "plain_text"},
+ "template": "red" if choice == "deny" else "green",
+ },
+ "elements": [
+ {
+ "tag": "markdown",
+ "content": f"{icon} **{label}** by {user_name}",
+ },
+ ],
+ }
+ try:
+ payload = json.dumps(card, ensure_ascii=False)
+ body = self._build_update_message_body(msg_type="interactive", content=payload)
+ request = self._build_update_message_request(message_id=message_id, request_body=body)
+ await asyncio.to_thread(self._client.im.v1.message.update, request)
+ except Exception as exc:
+ logger.warning("[Feishu] Failed to update approval card %s: %s", message_id, exc)
+
async def send_voice(
self,
chat_id: str,
@@ -1820,6 +1922,52 @@ class FeishuAdapter(BasePlatformAdapter):
action = getattr(event, "action", None)
action_tag = str(getattr(action, "tag", "") or "button")
action_value = getattr(action, "value", {}) or {}
+
+ # --- Exec approval button intercept ---
+ hermes_action = action_value.get("hermes_action") if isinstance(action_value, dict) else None
+ if hermes_action:
+ approval_id = action_value.get("approval_id")
+ state = self._approval_state.pop(approval_id, None)
+ if not state:
+ logger.debug("[Feishu] Approval %s already resolved or unknown", approval_id)
+ return
+
+ choice_map = {
+ "approve_once": "once",
+ "approve_session": "session",
+ "approve_always": "always",
+ "deny": "deny",
+ }
+ choice = choice_map.get(hermes_action, "deny")
+
+ label_map = {
+ "once": "Approved once",
+ "session": "Approved for session",
+ "always": "Approved permanently",
+ "deny": "Denied",
+ }
+ label = label_map.get(choice, "Resolved")
+
+ # Resolve sender name for the status card
+ sender_id = SimpleNamespace(open_id=open_id, user_id=None, union_id=None)
+ sender_profile = await self._resolve_sender_profile(sender_id)
+ user_name = sender_profile.get("user_name") or open_id
+
+ # Resolve the approval — unblocks the agent thread
+ try:
+ from tools.approval import resolve_gateway_approval
+ count = resolve_gateway_approval(state["session_key"], choice)
+ logger.info(
+ "Feishu button resolved %d approval(s) for session %s (choice=%s, user=%s)",
+ count, state["session_key"], choice, user_name,
+ )
+ except Exception as exc:
+ logger.error("Failed to resolve gateway approval from Feishu button: %s", exc)
+
+ # Update the card to show the decision
+ await self._update_approval_card(state.get("message_id", ""), label, user_name, choice)
+ return
+
synthetic_text = f"/card {action_tag}"
if action_value:
try:
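The two lookup tables in the Feishu handler above encode a deliberate fail-safe: any unrecognized `hermes_action` maps to `deny`, so a malformed or forged card payload can never silently approve a command. The mapping can be sketched as a pure function:

```python
# Sketch of the button-to-decision mapping from the card handler above.
# Unknown actions intentionally fall back to "deny".
CHOICE_MAP = {
    "approve_once": "once",
    "approve_session": "session",
    "approve_always": "always",
    "deny": "deny",
}
LABEL_MAP = {
    "once": "Approved once",
    "session": "Approved for session",
    "always": "Approved permanently",
    "deny": "Denied",
}

def resolve_action(hermes_action: str) -> tuple[str, str]:
    choice = CHOICE_MAP.get(hermes_action, "deny")
    return choice, LABEL_MAP.get(choice, "Resolved")
```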
diff --git a/gateway/run.py b/gateway/run.py
index 99c71d9156..7a551be168 100644
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -921,12 +921,11 @@ class GatewayRunner:
@staticmethod
def _load_reasoning_config() -> dict | None:
- """Load reasoning effort from config with env fallback.
+ """Load reasoning effort from config.yaml.
- Checks agent.reasoning_effort in config.yaml first, then
- HERMES_REASONING_EFFORT as a fallback. Valid: "xhigh", "high",
- "medium", "low", "minimal", "none". Returns None to use default
- (medium).
+ Reads agent.reasoning_effort from config.yaml. Valid: "xhigh",
+ "high", "medium", "low", "minimal", "none". Returns None to use
+ default (medium).
"""
from hermes_constants import parse_reasoning_effort
effort = ""
@@ -939,8 +938,6 @@ class GatewayRunner:
effort = str(cfg.get("agent", {}).get("reasoning_effort", "") or "").strip()
except Exception:
pass
- if not effort:
- effort = os.getenv("HERMES_REASONING_EFFORT", "")
result = parse_reasoning_effort(effort)
if effort and effort.strip() and result is None:
logger.warning("Unknown reasoning_effort '%s', using default (medium)", effort)
@@ -1484,6 +1481,14 @@ class GatewayRunner:
logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20])
except Exception as e:
logger.debug("Failed interrupting agent during shutdown: %s", e)
+ # Fire plugin on_session_finalize hook before memory shutdown
+ try:
+ from hermes_cli.plugins import invoke_hook as _invoke_hook
+ _invoke_hook("on_session_finalize",
+ session_id=getattr(agent, 'session_id', None),
+ platform="gateway")
+ except Exception:
+ pass
# Shut down memory provider at actual session boundary
try:
if hasattr(agent, 'shutdown_memory_provider'):
@@ -3277,6 +3282,15 @@ class GatewayRunner:
# the configured default instead of the previously switched model.
self._session_model_overrides.pop(session_key, None)
+ # Fire plugin on_session_finalize hook (session boundary)
+ try:
+ from hermes_cli.plugins import invoke_hook as _invoke_hook
+ _old_sid = old_entry.session_id if old_entry else None
+ _invoke_hook("on_session_finalize", session_id=_old_sid,
+ platform=source.platform.value if source.platform else "")
+ except Exception:
+ pass
+
# Emit session:end hook (session is ending)
await self.hooks.emit("session:end", {
"platform": source.platform.value if source.platform else "",
@@ -3290,7 +3304,7 @@ class GatewayRunner:
"user_id": source.user_id,
"session_key": session_key,
})
-
+
# Resolve session config info to surface to the user
try:
session_info = self._format_session_info()
@@ -3301,9 +3315,18 @@ class GatewayRunner:
header = "✨ Session reset! Starting fresh."
else:
# No existing session, just create one
- self.session_store.get_or_create_session(source, force_new=True)
+ new_entry = self.session_store.get_or_create_session(source, force_new=True)
header = "✨ New session started!"
+ # Fire plugin on_session_reset hook (new session guaranteed to exist)
+ try:
+ from hermes_cli.plugins import invoke_hook as _invoke_hook
+ _new_sid = new_entry.session_id if new_entry else None
+ _invoke_hook("on_session_reset", session_id=_new_sid,
+ platform=source.platform.value if source.platform else "")
+ except Exception:
+ pass
+
if session_info:
return f"{header}\n\n{session_info}"
return header
diff --git a/gateway/stream_consumer.py b/gateway/stream_consumer.py
index 2cda33642a..5522c631db 100644
--- a/gateway/stream_consumer.py
+++ b/gateway/stream_consumer.py
@@ -74,6 +74,8 @@ class GatewayStreamConsumer:
self._edit_supported = True # Disabled on first edit failure (Signal/Email/HA)
self._last_edit_time = 0.0
self._last_sent_text = "" # Track last-sent text to skip redundant edits
+ self._fallback_final_send = False
+ self._fallback_prefix = ""
@property
def already_sent(self) -> bool:
@@ -138,12 +140,19 @@ class GatewayStreamConsumer:
while (
len(self._accumulated) > _safe_limit
and self._message_id is not None
+ and self._edit_supported
):
split_at = self._accumulated.rfind("\n", 0, _safe_limit)
if split_at < _safe_limit // 2:
split_at = _safe_limit
chunk = self._accumulated[:split_at]
await self._send_or_edit(chunk)
+ if self._fallback_final_send:
+ # Edit failed while attempting to split an oversized
+ # message. Keep the full accumulated text intact so
+ # the fallback final-send path can deliver the
+ # remaining continuation without dropping content.
+ break
self._accumulated = self._accumulated[split_at:].lstrip("\n")
self._message_id = None
self._last_sent_text = ""
@@ -156,9 +165,17 @@ class GatewayStreamConsumer:
self._last_edit_time = time.monotonic()
if got_done:
- # Final edit without cursor
- if self._accumulated and self._message_id:
- await self._send_or_edit(self._accumulated)
+ # Final edit without cursor. If progressive editing failed
+ # mid-stream, send a single continuation/fallback message
+ # here instead of letting the base gateway path send the
+ # full response again.
+ if self._accumulated:
+ if self._fallback_final_send:
+ await self._send_fallback_final(self._accumulated)
+ elif self._message_id:
+ await self._send_or_edit(self._accumulated)
+ elif not self._already_sent:
+ await self._send_or_edit(self._accumulated)
return
# Tool boundary: the should_edit block above already flushed
@@ -169,6 +186,8 @@ class GatewayStreamConsumer:
self._message_id = None
self._accumulated = ""
self._last_sent_text = ""
+ self._fallback_final_send = False
+ self._fallback_prefix = ""
await asyncio.sleep(0.05) # Small yield to not busy-loop
@@ -207,6 +226,86 @@ class GatewayStreamConsumer:
# Strip trailing whitespace/newlines but preserve leading content
return cleaned.rstrip()
+ def _visible_prefix(self) -> str:
+ """Return the visible text already shown in the streamed message."""
+ prefix = self._last_sent_text or ""
+ if self.cfg.cursor and prefix.endswith(self.cfg.cursor):
+ prefix = prefix[:-len(self.cfg.cursor)]
+ return self._clean_for_display(prefix)
+
+ def _continuation_text(self, final_text: str) -> str:
+ """Return only the part of final_text the user has not already seen."""
+ prefix = self._fallback_prefix or self._visible_prefix()
+ if prefix and final_text.startswith(prefix):
+ return final_text[len(prefix):].lstrip()
+ return final_text
+
+ @staticmethod
+ def _split_text_chunks(text: str, limit: int) -> list[str]:
+ """Split text into reasonably sized chunks for fallback sends."""
+ if len(text) <= limit:
+ return [text]
+ chunks: list[str] = []
+ remaining = text
+ while len(remaining) > limit:
+ split_at = remaining.rfind("\n", 0, limit)
+ if split_at < limit // 2:
+ split_at = limit
+ chunks.append(remaining[:split_at])
+ remaining = remaining[split_at:].lstrip("\n")
+ if remaining:
+ chunks.append(remaining)
+ return chunks
+
+ async def _send_fallback_final(self, text: str) -> None:
+ """Send the final continuation after streaming edits stop working."""
+ final_text = self._clean_for_display(text)
+ continuation = self._continuation_text(final_text)
+ self._fallback_final_send = False
+ if not continuation.strip():
+ # Nothing new to send — the visible partial already matches final text.
+ self._already_sent = True
+ return
+
+ raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096)
+ safe_limit = max(500, raw_limit - 100)
+ chunks = self._split_text_chunks(continuation, safe_limit)
+
+ last_message_id: Optional[str] = None
+ last_successful_chunk = ""
+ sent_any_chunk = False
+ for chunk in chunks:
+ result = await self.adapter.send(
+ chat_id=self.chat_id,
+ content=chunk,
+ metadata=self.metadata,
+ )
+ if not result.success:
+ if sent_any_chunk:
+ # Some continuation text already reached the user. Suppress
+ # the base gateway final-send path so we don't resend the
+ # full response and create another duplicate.
+ self._already_sent = True
+ self._message_id = last_message_id
+ self._last_sent_text = last_successful_chunk
+ self._fallback_prefix = ""
+ return
+ # No fallback chunk reached the user — allow the normal gateway
+ # final-send path to try one more time.
+ self._already_sent = False
+ self._message_id = None
+ self._last_sent_text = ""
+ self._fallback_prefix = ""
+ return
+ sent_any_chunk = True
+ last_successful_chunk = chunk
+ last_message_id = result.message_id or last_message_id
+
+ self._message_id = last_message_id
+ self._already_sent = True
+ self._last_sent_text = chunks[-1]
+ self._fallback_prefix = ""
+
async def _send_or_edit(self, text: str) -> None:
"""Send or edit the streaming message."""
# Strip MEDIA: directives so they don't appear as visible text.
@@ -232,14 +331,16 @@ class GatewayStreamConsumer:
self._last_sent_text = text
else:
# If an edit fails mid-stream (especially Telegram flood control),
- # stop progressive edits and let the normal final send path deliver
- # the complete answer instead of leaving the user with a partial.
+ # stop progressive edits and send only the missing tail once the
+ # final response is available.
logger.debug("Edit failed, disabling streaming for this adapter")
+ self._fallback_prefix = self._visible_prefix()
+ self._fallback_final_send = True
self._edit_supported = False
- self._already_sent = False
+ self._already_sent = True
else:
# Editing not supported — skip intermediate updates.
- # The final response will be sent by the normal path.
+ # The final response will be sent by the fallback path.
pass
else:
# First message — send new
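The splitter added to the stream consumer prefers to break at a newline but falls back to a hard cut when the nearest newline would leave a chunk under half the limit. A standalone copy of the same logic, runnable outside the class:

```python
# Standalone copy of the newline-aware splitter from _split_text_chunks.
def split_text_chunks(text: str, limit: int) -> list[str]:
    if len(text) <= limit:
        return [text]
    chunks: list[str] = []
    remaining = text
    while len(remaining) > limit:
        split_at = remaining.rfind("\n", 0, limit)
        if split_at < limit // 2:
            split_at = limit  # no usable newline, hard cut at the limit
        chunks.append(remaining[:split_at])
        remaining = remaining[split_at:].lstrip("\n")
    if remaining:
        chunks.append(remaining)
    return chunks
```

The half-limit threshold avoids pathological splits on text with a newline near the start of the window, which would otherwise produce many tiny messages.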
diff --git a/hermes_cli/__init__.py b/hermes_cli/__init__.py
index 0873d3d29c..959332e81c 100644
--- a/hermes_cli/__init__.py
+++ b/hermes_cli/__init__.py
@@ -11,5 +11,5 @@ Provides subcommands for:
- hermes cron - Manage cron jobs
"""
-__version__ = "0.7.0"
-__release_date__ = "2026.4.3"
+__version__ = "0.8.0"
+__release_date__ = "2026.4.8"
diff --git a/hermes_cli/cron.py b/hermes_cli/cron.py
index d10513a280..e0ab6007a8 100644
--- a/hermes_cli/cron.py
+++ b/hermes_cli/cron.py
@@ -93,6 +93,21 @@ def cron_list(show_all: bool = False):
script = job.get("script")
if script:
print(f" Script: {script}")
+
+ # Execution history
+ last_status = job.get("last_status")
+ if last_status:
+ last_run = job.get("last_run_at", "?")
+ if last_status == "ok":
+ status_display = color("ok", Colors.GREEN)
+ else:
+ status_display = color(f"{last_status}: {job.get('last_error', '?')}", Colors.RED)
+ print(f" Last run: {last_run} {status_display}")
+
+ delivery_err = job.get("last_delivery_error")
+ if delivery_err:
+ print(f" {color('⚠ Delivery failed:', Colors.YELLOW)} {delivery_err}")
+
print()
from hermes_cli.gateway import find_gateway_pids
diff --git a/hermes_cli/doctor.py b/hermes_cli/doctor.py
index 876ab15d57..361e81d214 100644
--- a/hermes_cli/doctor.py
+++ b/hermes_cli/doctor.py
@@ -812,69 +812,83 @@ def run_doctor(args):
check_warn("No GITHUB_TOKEN", f"(60 req/hr rate limit — set in {_DHH}/.env for better rates)")
# =========================================================================
- # Honcho memory
+ # Memory Provider (only check the active provider, if any)
# =========================================================================
print()
- print(color("◆ Honcho Memory", Colors.CYAN, Colors.BOLD))
+ print(color("◆ Memory Provider", Colors.CYAN, Colors.BOLD))
+ _active_memory_provider = ""
try:
- from plugins.memory.honcho.client import HonchoClientConfig, resolve_config_path
- hcfg = HonchoClientConfig.from_global_config()
- _honcho_cfg_path = resolve_config_path()
+ import yaml as _yaml
+ _mem_cfg_path = HERMES_HOME / "config.yaml"
+ if _mem_cfg_path.exists():
+ with open(_mem_cfg_path) as _f:
+ _raw_cfg = _yaml.safe_load(_f) or {}
+ _active_memory_provider = (_raw_cfg.get("memory") or {}).get("provider", "")
+ except Exception:
+ pass
- if not _honcho_cfg_path.exists():
- check_warn("Honcho config not found", "run: hermes memory setup")
- elif not hcfg.enabled:
- check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)")
- elif not (hcfg.api_key or hcfg.base_url):
- check_fail("Honcho API key or base URL not set", "run: hermes memory setup")
- issues.append("No Honcho API key — run 'hermes memory setup'")
- else:
- from plugins.memory.honcho.client import get_honcho_client, reset_honcho_client
- reset_honcho_client()
- try:
- get_honcho_client(hcfg)
- check_ok(
- "Honcho connected",
- f"workspace={hcfg.workspace_id} mode={hcfg.recall_mode} freq={hcfg.write_frequency}",
- )
- except Exception as _e:
- check_fail("Honcho connection failed", str(_e))
- issues.append(f"Honcho unreachable: {_e}")
- except ImportError:
- check_warn("honcho-ai not installed", "pip install honcho-ai")
- except Exception as _e:
- check_warn("Honcho check failed", str(_e))
+ if not _active_memory_provider:
+ check_ok("Built-in memory active", "(no external provider configured — this is fine)")
+ elif _active_memory_provider == "honcho":
+ try:
+ from plugins.memory.honcho.client import HonchoClientConfig, resolve_config_path
+ hcfg = HonchoClientConfig.from_global_config()
+ _honcho_cfg_path = resolve_config_path()
- # =========================================================================
- # Mem0 memory
- # =========================================================================
- print()
- print(color("◆ Mem0 Memory", Colors.CYAN, Colors.BOLD))
-
- try:
- from plugins.memory.mem0 import _load_config as _load_mem0_config
- mem0_cfg = _load_mem0_config()
- mem0_key = mem0_cfg.get("api_key", "")
- if mem0_key:
- check_ok("Mem0 API key configured")
- check_info(f"user_id={mem0_cfg.get('user_id', '?')} agent_id={mem0_cfg.get('agent_id', '?')}")
- # Check if mem0.json exists but is missing api_key (the bug we fixed)
- mem0_json = HERMES_HOME / "mem0.json"
- if mem0_json.exists():
+ if not _honcho_cfg_path.exists():
+ check_warn("Honcho config not found", "run: hermes memory setup")
+ elif not hcfg.enabled:
+ check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)")
+ elif not (hcfg.api_key or hcfg.base_url):
+ check_fail("Honcho API key or base URL not set", "run: hermes memory setup")
+ issues.append("No Honcho API key — run 'hermes memory setup'")
+ else:
+ from plugins.memory.honcho.client import get_honcho_client, reset_honcho_client
+ reset_honcho_client()
try:
- import json as _json
- file_cfg = _json.loads(mem0_json.read_text())
- if not file_cfg.get("api_key") and mem0_key:
- check_info("api_key from .env (not in mem0.json) — this is fine")
- except Exception:
- pass
- else:
- check_warn("Mem0 not configured", "(set MEM0_API_KEY in .env or run hermes memory setup)")
- except ImportError:
- check_warn("Mem0 plugin not loadable", "(optional)")
- except Exception as _e:
- check_warn("Mem0 check failed", str(_e))
+ get_honcho_client(hcfg)
+ check_ok(
+ "Honcho connected",
+ f"workspace={hcfg.workspace_id} mode={hcfg.recall_mode} freq={hcfg.write_frequency}",
+ )
+ except Exception as _e:
+ check_fail("Honcho connection failed", str(_e))
+ issues.append(f"Honcho unreachable: {_e}")
+ except ImportError:
+ check_fail("honcho-ai not installed", "pip install honcho-ai")
+ issues.append("Honcho is set as memory provider but honcho-ai is not installed")
+ except Exception as _e:
+ check_warn("Honcho check failed", str(_e))
+ elif _active_memory_provider == "mem0":
+ try:
+ from plugins.memory.mem0 import _load_config as _load_mem0_config
+ mem0_cfg = _load_mem0_config()
+ mem0_key = mem0_cfg.get("api_key", "")
+ if mem0_key:
+ check_ok("Mem0 API key configured")
+ check_info(f"user_id={mem0_cfg.get('user_id', '?')} agent_id={mem0_cfg.get('agent_id', '?')}")
+ else:
+ check_fail("Mem0 API key not set", "(set MEM0_API_KEY in .env or run hermes memory setup)")
+ issues.append("Mem0 is set as memory provider but API key is missing")
+ except ImportError:
+ check_fail("Mem0 plugin not loadable", "pip install mem0ai")
+ issues.append("Mem0 is set as memory provider but mem0ai is not installed")
+ except Exception as _e:
+ check_warn("Mem0 check failed", str(_e))
+ else:
+ # Generic check for other memory providers (openviking, hindsight, etc.)
+ try:
+ from plugins.memory import load_memory_provider
+ _provider = load_memory_provider(_active_memory_provider)
+ if _provider and _provider.is_available():
+ check_ok(f"{_active_memory_provider} provider active")
+ elif _provider:
+ check_warn(f"{_active_memory_provider} configured but not available", "run: hermes memory status")
+ else:
+ check_warn(f"{_active_memory_provider} plugin not found", "run: hermes memory setup")
+ except Exception as _e:
+ check_warn(f"{_active_memory_provider} check failed", str(_e))
# =========================================================================
# Profiles
diff --git a/hermes_cli/model_switch.py b/hermes_cli/model_switch.py
index 988eeebdf1..07efbcf4a6 100644
--- a/hermes_cli/model_switch.py
+++ b/hermes_cli/model_switch.py
@@ -791,12 +791,12 @@ def list_authenticated_providers(
if overlay.auth_type in ("oauth_device_code", "oauth_external", "external_process"):
# These use auth stores, not env vars — check for auth.json entries
try:
- from hermes_cli.auth import _read_auth_store
- store = _read_auth_store()
- if store and pid in store:
+ from hermes_cli.auth import _load_auth_store
+ store = _load_auth_store()
+ if store and (pid in store.get("providers", {}) or pid in store.get("credential_pool", {})):
has_creds = True
- except Exception:
- pass
+ except Exception as exc:
+ logger.debug("Auth store check failed for %s: %s", pid, exc)
if not has_creds:
continue
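The corrected auth check above looks in two places rather than treating the store as a flat dict. A hedged sketch of the shape it expects (the layout is inferred from the call site and may not match the real `auth.json` schema exactly):

```python
# Sketch of the credential presence check: a provider id may appear under
# "providers" or under "credential_pool". Store layout here is an assumption.
def has_credentials(store: dict, pid: str) -> bool:
    return bool(store) and (
        pid in store.get("providers", {})
        or pid in store.get("credential_pool", {})
    )
```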
diff --git a/hermes_cli/models.py b/hermes_cli/models.py
index 4b37bc9e73..aa68f877d9 100644
--- a/hermes_cli/models.py
+++ b/hermes_cli/models.py
@@ -144,18 +144,22 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
"kimi-k2-0905-preview",
],
"minimax": [
- "MiniMax-M2.7",
- "MiniMax-M2.7-highspeed",
+ "MiniMax-M1",
+ "MiniMax-M1-40k",
+ "MiniMax-M1-80k",
+ "MiniMax-M1-128k",
+ "MiniMax-M1-256k",
"MiniMax-M2.5",
- "MiniMax-M2.5-highspeed",
- "MiniMax-M2.1",
+ "MiniMax-M2.7",
],
"minimax-cn": [
- "MiniMax-M2.7",
- "MiniMax-M2.7-highspeed",
+ "MiniMax-M1",
+ "MiniMax-M1-40k",
+ "MiniMax-M1-80k",
+ "MiniMax-M1-128k",
+ "MiniMax-M1-256k",
"MiniMax-M2.5",
- "MiniMax-M2.5-highspeed",
- "MiniMax-M2.1",
+ "MiniMax-M2.7",
],
"anthropic": [
"claude-opus-4-6",
diff --git a/hermes_cli/plugins.py b/hermes_cli/plugins.py
index 23a655aa30..7323bbd011 100644
--- a/hermes_cli/plugins.py
+++ b/hermes_cli/plugins.py
@@ -61,6 +61,8 @@ VALID_HOOKS: Set[str] = {
"post_api_request",
"on_session_start",
"on_session_end",
+ "on_session_finalize",
+ "on_session_reset",
}
ENTRY_POINTS_GROUP = "hermes_agent.plugins"
diff --git a/hermes_cli/runtime_provider.py b/hermes_cli/runtime_provider.py
index 9c82ef62af..fa9d493980 100644
--- a/hermes_cli/runtime_provider.py
+++ b/hermes_cli/runtime_provider.py
@@ -163,6 +163,16 @@ def _resolve_runtime_from_pool_entry(
api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", ""))
else:
configured_provider = str(model_cfg.get("provider") or "").strip().lower()
+ # Honour model.base_url from config.yaml when the configured provider
+ # matches this provider — same pattern as the Anthropic branch above.
+ # Only override when the pool entry has no explicit base_url (i.e. it
+ # fell back to the hardcoded default). Env var overrides win (#6039).
+ pconfig = PROVIDER_REGISTRY.get(provider)
+ pool_url_is_default = pconfig and base_url.rstrip("/") == pconfig.inference_base_url.rstrip("/")
+ if configured_provider == provider and pool_url_is_default:
+ cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/")
+ if cfg_base_url:
+ base_url = cfg_base_url
configured_mode = _parse_api_mode(model_cfg.get("api_mode"))
if configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider):
api_mode = configured_mode
@@ -724,7 +734,15 @@ def resolve_runtime_provider(
pconfig = PROVIDER_REGISTRY.get(provider)
if pconfig and pconfig.auth_type == "api_key":
creds = resolve_api_key_provider_credentials(provider)
- base_url = creds.get("base_url", "").rstrip("/")
+ # Honour model.base_url from config.yaml when the configured provider
+ # matches this provider — mirrors the Anthropic path above. Without
+ # this, users who set model.base_url to e.g. api.minimaxi.com/anthropic
+ # (China endpoint) still get the hardcoded api.minimax.io default (#6039).
+ cfg_provider = str(model_cfg.get("provider") or "").strip().lower()
+ cfg_base_url = ""
+ if cfg_provider == provider:
+ cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/")
+ base_url = cfg_base_url or creds.get("base_url", "").rstrip("/")
api_mode = "chat_completions"
if provider == "copilot":
api_mode = _copilot_runtime_api_mode(model_cfg, creds.get("api_key", ""))
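Both hunks implement the same precedence for the effective base URL. A condensed sketch of that rule (the flat arguments and function name are illustrative — the real code reads these values from `model_cfg`, the pool entry, and `PROVIDER_REGISTRY`):

```python
def effective_base_url(cfg_provider: str, provider: str, cfg_base_url: str,
                       pool_base_url: str, registry_default: str) -> str:
    """config.yaml model.base_url wins over the registry default, but only
    when the configured provider matches this provider and the pool entry
    did not set an explicit (non-default) URL of its own."""
    pool_is_default = pool_base_url.rstrip("/") == registry_default.rstrip("/")
    cfg = cfg_base_url.strip().rstrip("/")
    if cfg_provider == provider and pool_is_default and cfg:
        return cfg
    return pool_base_url.rstrip("/")
```

This is what makes the China endpoint case in the comment work: with the pool still on the hardcoded `api.minimax.io` default, a configured `api.minimaxi.com/anthropic` takes over.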
diff --git a/hermes_cli/setup.py b/hermes_cli/setup.py
index 2407ca275d..43c3b086d9 100644
--- a/hermes_cli/setup.py
+++ b/hermes_cli/setup.py
@@ -105,8 +105,8 @@ _DEFAULT_PROVIDER_MODELS = {
],
"zai": ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"],
"kimi-coding": ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"],
- "minimax": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
- "minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"],
+ "minimax": ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"],
+ "minimax-cn": ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"],
"ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"],
"kilocode": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview"],
"opencode-zen": ["gpt-5.4", "gpt-5.3-codex", "claude-sonnet-4-6", "gemini-3-flash", "glm-5", "kimi-k2.5", "minimax-m2.7"],
diff --git a/plugins/memory/hindsight/__init__.py b/plugins/memory/hindsight/__init__.py
index 51feb3cb61..199a7dd5cd 100644
--- a/plugins/memory/hindsight/__init__.py
+++ b/plugins/memory/hindsight/__init__.py
@@ -23,6 +23,8 @@ import json
import logging
import os
import threading
+
+from hermes_constants import get_hermes_home
from typing import Any, Dict, List
from agent.memory_provider import MemoryProvider
@@ -142,7 +144,6 @@ def _load_config() -> dict:
3. Environment variables
"""
from pathlib import Path
- from hermes_constants import get_hermes_home
# Profile-scoped path (preferred)
profile_path = get_hermes_home() / "hindsight" / "config.json"
diff --git a/pyproject.toml b/pyproject.toml
index c35c94e21f..8982e6e46b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "hermes-agent"
-version = "0.7.0"
+version = "0.8.0"
description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere"
readme = "README.md"
requires-python = ">=3.11"
diff --git a/run_agent.py b/run_agent.py
index a990045b06..a0ae15a162 100644
--- a/run_agent.py
+++ b/run_agent.py
@@ -76,6 +76,7 @@ from hermes_constants import OPENROUTER_BASE_URL
# Agent internals extracted to agent/ package for modularity
from agent.memory_manager import build_memory_context_block
+from agent.retry_utils import jittered_backoff
from agent.prompt_builder import (
DEFAULT_AGENT_IDENTITY, PLATFORM_HINTS,
MEMORY_GUIDANCE, SESSION_SEARCH_GUIDANCE, SKILLS_GUIDANCE,
@@ -86,6 +87,7 @@ from agent.model_metadata import (
estimate_tokens_rough, estimate_messages_tokens_rough, estimate_request_tokens_rough,
get_next_probe_tier, parse_context_limit_from_error,
save_context_length, is_local_endpoint,
+ query_ollama_num_ctx,
)
from agent.context_compressor import ContextCompressor
from agent.subdirectory_hints import SubdirectoryHintTracker
@@ -1160,6 +1162,33 @@ class AIAgent:
self.session_cost_status = "unknown"
self.session_cost_source = "none"
+ # ── Ollama num_ctx injection ──
+ # Ollama defaults to a 2048-token context window regardless of the model's capabilities.
+ # When running against an Ollama server, detect the model's max context
+ # and pass num_ctx on every chat request so the full window is used.
+ # User override: set model.ollama_num_ctx in config.yaml to cap VRAM use.
+ self._ollama_num_ctx: int | None = None
+ _ollama_num_ctx_override = None
+ if isinstance(_model_cfg, dict):
+ _ollama_num_ctx_override = _model_cfg.get("ollama_num_ctx")
+ if _ollama_num_ctx_override is not None:
+ try:
+ self._ollama_num_ctx = int(_ollama_num_ctx_override)
+ except (TypeError, ValueError):
+ logger.debug("Invalid ollama_num_ctx config value: %r", _ollama_num_ctx_override)
+ if self._ollama_num_ctx is None and self.base_url and is_local_endpoint(self.base_url):
+ try:
+ _detected = query_ollama_num_ctx(self.model, self.base_url)
+ if _detected and _detected > 0:
+ self._ollama_num_ctx = _detected
+ except Exception as exc:
+ logger.debug("Ollama num_ctx detection failed: %s", exc)
+ if self._ollama_num_ctx and not self.quiet_mode:
+ logger.info(
+ "Ollama num_ctx: will request %d tokens (model max from /api/show)",
+ self._ollama_num_ctx,
+ )
+
if not self.quiet_mode:
if compression_enabled:
print(f"📊 Context limit: {self.context_compressor.context_length:,} tokens (compress at {int(compression_threshold*100)}% = {self.context_compressor.threshold_tokens:,})")
@@ -5400,6 +5429,15 @@ class AIAgent:
if _is_nous:
extra_body["tags"] = ["product=hermes-agent"]
+ # Ollama num_ctx: override the 2048 default so the model actually
+ # uses the context window it was trained for. Passed via the OpenAI
+ # SDK's extra_body → options.num_ctx, which Ollama's OpenAI-compat
+ # endpoint forwards to the runner as --ctx-size.
+ if self._ollama_num_ctx:
+ options = extra_body.get("options", {})
+ options["num_ctx"] = self._ollama_num_ctx
+ extra_body["options"] = options
+
if extra_body:
api_kwargs["extra_body"] = extra_body
@@ -7250,6 +7288,7 @@ class AIAgent:
codex_auth_retry_attempted = False
anthropic_auth_retry_attempted = False
nous_auth_retry_attempted = False
+ thinking_sig_retry_attempted = False
has_retried_429 = False
restart_with_compressed_messages = False
restart_with_length_continuation = False
@@ -7465,7 +7504,8 @@ class AIAgent:
}
# Longer backoff for rate limiting (likely cause of None choices)
- wait_time = min(5 * (2 ** (retry_count - 1)), 120) # 5s, 10s, 20s, 40s, 80s, 120s
+ # Jittered exponential: 5s base, 120s cap + random jitter
+ wait_time = jittered_backoff(retry_count, base_delay=5.0, max_delay=120.0)
self._vprint(f"{self.log_prefix}⏳ Retrying in {wait_time}s (extended backoff for possible rate limit)...", force=True)
logging.warning(f"Invalid API response (retry {retry_count}/{max_retries}): {', '.join(error_details)} | Provider: {provider_name}")
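`agent.retry_utils.jittered_backoff` is not shown in this diff; a plausible sketch matching its call sites (base delay, exponential growth, a hard cap, plus random jitter — the specific jitter scheme here is an assumption):

```python
import random

def jittered_backoff(retry_count: int, base_delay: float = 5.0,
                     max_delay: float = 120.0) -> float:
    """Exponential backoff with jitter: grow base_delay * 2^(n-1), cap at
    max_delay, then sample uniformly below that bound so concurrent
    retries don't synchronize into a thundering herd."""
    exp = min(base_delay * (2 ** max(retry_count - 1, 0)), max_delay)
    return random.uniform(base_delay, exp) if exp > base_delay else exp
```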
@@ -7838,8 +7878,38 @@ class AIAgent:
print(f"{self.log_prefix} • Check ANTHROPIC_API_KEY in {_dhh}/.env for API keys or legacy token values")
print(f"{self.log_prefix} • For API keys: verify at https://console.anthropic.com/settings/keys")
print(f"{self.log_prefix} • For Claude Code: run 'claude /login' to refresh, then retry")
- print(f"{self.log_prefix} • Clear stale keys: hermes config set ANTHROPIC_TOKEN \"\"")
- print(f"{self.log_prefix} • Legacy cleanup: hermes config set ANTHROPIC_API_KEY \"\"")
+ print(f"{self.log_prefix} • Legacy cleanup: hermes config set ANTHROPIC_TOKEN \"\"")
+ print(f"{self.log_prefix} • Clear stale keys: hermes config set ANTHROPIC_API_KEY \"\"")
+
+ # ── Thinking block signature recovery ─────────────────
+ # Anthropic signs thinking blocks against the full turn
+ # content. Any upstream mutation (context compression,
+ # session truncation, message merging) invalidates the
+ # signature → HTTP 400. Recovery: strip reasoning_details
+ # from all messages so the next retry sends no thinking
+ # blocks at all. One-shot — don't retry infinitely.
+ if (
+ self.api_mode == "anthropic_messages"
+ and status_code == 400
+ and not thinking_sig_retry_attempted
+ ):
+ _err_msg_lower = str(api_error).lower()
+ if "signature" in _err_msg_lower and "thinking" in _err_msg_lower:
+ thinking_sig_retry_attempted = True
+ for _m in messages:
+ if isinstance(_m, dict):
+ _m.pop("reasoning_details", None)
+ self._vprint(
+ f"{self.log_prefix}⚠️ Thinking block signature invalid — "
+ f"stripped all thinking blocks, retrying...",
+ force=True,
+ )
+ logging.warning(
+ "%sThinking block signature recovery: stripped "
+ "reasoning_details from %d messages",
+ self.log_prefix, len(messages),
+ )
+ continue
retry_count += 1
elapsed_time = time.time() - api_start_time
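The recovery path above boils down to a one-shot detector plus a strip pass. A minimal sketch of those two pieces, extracted from the inline logic (the standalone names are illustrative):

```python
def is_thinking_signature_error(status_code: int, error_message: str) -> bool:
    """Anthropic rejects mutated turns with an HTTP 400 whose message
    mentions both the signature and the thinking block."""
    msg = error_message.lower()
    return status_code == 400 and "signature" in msg and "thinking" in msg

def strip_reasoning_details(messages: list) -> int:
    """Drop reasoning_details from every message so the retry sends no
    thinking blocks at all. Returns how many messages were touched."""
    stripped = 0
    for m in messages:
        if isinstance(m, dict) and m.pop("reasoning_details", None) is not None:
            stripped += 1
    return stripped
```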
@@ -8322,7 +8392,7 @@ class AIAgent:
_retry_after = min(int(_ra_raw), 120) # Cap at 2 minutes
except (TypeError, ValueError):
pass
- wait_time = _retry_after if _retry_after else min(2 ** retry_count, 60)
+ wait_time = _retry_after if _retry_after else jittered_backoff(retry_count, base_delay=2.0, max_delay=60.0)
if is_rate_limited:
self._emit_status(f"⏱️ Rate limit reached. Waiting {wait_time}s before retry (attempt {retry_count + 1}/{max_retries})...")
else:
diff --git a/tests/agent/test_anthropic_adapter.py b/tests/agent/test_anthropic_adapter.py
index 9aa8c10b17..0024fac624 100644
--- a/tests/agent/test_anthropic_adapter.py
+++ b/tests/agent/test_anthropic_adapter.py
@@ -1276,6 +1276,258 @@ class TestRoleAlternation:
assert [m["role"] for m in result] == ["user", "assistant", "user"]
+# ---------------------------------------------------------------------------
+# Thinking block signature management
+# ---------------------------------------------------------------------------
+
+
+class TestThinkingBlockSignatureManagement:
+ """Tests for the thinking block handling strategy:
+ strip from old turns, preserve latest signed, downgrade unsigned."""
+
+ def test_thinking_stripped_from_non_last_assistant(self):
+ """Thinking blocks are removed from all assistant messages except the last."""
+ messages = [
+ {
+ "role": "assistant",
+ "content": "",
+ "tool_calls": [
+ {"id": "tc_1", "function": {"name": "tool1", "arguments": "{}"}},
+ ],
+ "reasoning_details": [
+ {"type": "thinking", "thinking": "Old reasoning.", "signature": "sig_old"},
+ ],
+ },
+ {"role": "tool", "tool_call_id": "tc_1", "content": "result 1"},
+ {
+ "role": "assistant",
+ "content": "",
+ "tool_calls": [
+ {"id": "tc_2", "function": {"name": "tool2", "arguments": "{}"}},
+ ],
+ "reasoning_details": [
+ {"type": "thinking", "thinking": "Latest reasoning.", "signature": "sig_new"},
+ ],
+ },
+ {"role": "tool", "tool_call_id": "tc_2", "content": "result 2"},
+ ]
+ _, result = convert_messages_to_anthropic(messages)
+
+ # Find both assistant messages
+ assistants = [m for m in result if m["role"] == "assistant"]
+ assert len(assistants) == 2
+
+ # First (non-last) assistant: no thinking blocks
+ first_types = [b.get("type") for b in assistants[0]["content"]]
+ assert "thinking" not in first_types
+ assert "redacted_thinking" not in first_types
+ assert "tool_use" in first_types # tool_use should survive
+
+ # Last assistant: thinking block preserved with signature
+ last_blocks = assistants[1]["content"]
+ thinking_blocks = [b for b in last_blocks if b.get("type") == "thinking"]
+ assert len(thinking_blocks) == 1
+ assert thinking_blocks[0]["thinking"] == "Latest reasoning."
+ assert thinking_blocks[0]["signature"] == "sig_new"
+
+ def test_signed_thinking_preserved_on_last_turn(self):
+ """A signed thinking block on the last assistant message is kept."""
+ messages = [
+ {
+ "role": "assistant",
+ "content": "The answer is 42.",
+ "reasoning_details": [
+ {"type": "thinking", "thinking": "Deep thought.", "signature": "sig_valid"},
+ ],
+ },
+ ]
+ _, result = convert_messages_to_anthropic(messages)
+ blocks = result[0]["content"]
+ thinking = [b for b in blocks if b.get("type") == "thinking"]
+ assert len(thinking) == 1
+ assert thinking[0]["signature"] == "sig_valid"
+
+ def test_unsigned_thinking_downgraded_to_text_on_last_turn(self):
+ """Unsigned thinking blocks on the last turn become text blocks."""
+ messages = [
+ {
+ "role": "assistant",
+ "content": "Response text.",
+ "reasoning_details": [
+ {"type": "thinking", "thinking": "Unsigned reasoning."},
+ # No 'signature' field
+ ],
+ },
+ ]
+ _, result = convert_messages_to_anthropic(messages)
+ blocks = result[0]["content"]
+
+ # No thinking blocks should remain
+ assert not any(b.get("type") == "thinking" for b in blocks)
+ # The reasoning text should be preserved as a text block
+ text_contents = [b.get("text", "") for b in blocks if b.get("type") == "text"]
+ assert "Unsigned reasoning." in text_contents
+
+ def test_redacted_thinking_with_data_preserved(self):
+ """Redacted thinking with 'data' field is kept on last turn."""
+ messages = [
+ {
+ "role": "assistant",
+ "content": "Response.",
+ "reasoning_details": [
+ {"type": "redacted_thinking", "data": "opaque_signature_data"},
+ ],
+ },
+ ]
+ _, result = convert_messages_to_anthropic(messages)
+ blocks = result[0]["content"]
+ redacted = [b for b in blocks if b.get("type") == "redacted_thinking"]
+ assert len(redacted) == 1
+ assert redacted[0]["data"] == "opaque_signature_data"
+
+ def test_redacted_thinking_without_data_dropped(self):
+ """Redacted thinking without 'data' is dropped — can't be validated."""
+ messages = [
+ {
+ "role": "assistant",
+ "content": "Response.",
+ "reasoning_details": [
+ {"type": "redacted_thinking"},
+ # No 'data' field
+ ],
+ },
+ ]
+ _, result = convert_messages_to_anthropic(messages)
+ blocks = result[0]["content"]
+ assert not any(b.get("type") == "redacted_thinking" for b in blocks)
+
+ def test_cache_control_stripped_from_thinking_blocks(self):
+ """cache_control markers are removed from thinking/redacted_thinking blocks."""
+ messages = [
+ {
+ "role": "assistant",
+ "content": "",
+ "tool_calls": [
+ {"id": "tc_1", "function": {"name": "t", "arguments": "{}"}},
+ ],
+ "reasoning_details": [
+ {
+ "type": "thinking",
+ "thinking": "Reasoning.",
+ "signature": "sig_1",
+ "cache_control": {"type": "ephemeral"},
+ },
+ ],
+ },
+ {"role": "tool", "tool_call_id": "tc_1", "content": "result"},
+ ]
+ _, result = convert_messages_to_anthropic(messages)
+ assistant = next(m for m in result if m["role"] == "assistant")
+ for block in assistant["content"]:
+ if block.get("type") in ("thinking", "redacted_thinking"):
+ assert "cache_control" not in block
+
+ def test_thinking_stripped_from_merged_consecutive_assistants(self):
+ """When consecutive assistants are merged, second one's thinking is dropped."""
+ messages = [
+ {
+ "role": "assistant",
+ "content": "First response.",
+ "reasoning_details": [
+ {"type": "thinking", "thinking": "First thought.", "signature": "sig_1"},
+ ],
+ },
+ {
+ "role": "assistant",
+ "content": "Second response.",
+ "reasoning_details": [
+ {"type": "thinking", "thinking": "Second thought.", "signature": "sig_2"},
+ ],
+ },
+ ]
+ _, result = convert_messages_to_anthropic(messages)
+
+ # Should be merged into one assistant message
+ assistants = [m for m in result if m["role"] == "assistant"]
+ assert len(assistants) == 1
+
+ # Only the first thinking block should remain (signed, on the last/only assistant)
+ blocks = assistants[0]["content"]
+ thinking = [b for b in blocks if b.get("type") == "thinking"]
+ assert len(thinking) == 1
+ assert thinking[0]["thinking"] == "First thought."
+
+ def test_empty_content_after_strip_gets_placeholder(self):
+ """If stripping thinking leaves an empty message, a placeholder is added."""
+ messages = [
+ {
+ "role": "assistant",
+ "content": "",
+ "reasoning_details": [
+ {"type": "thinking", "thinking": "Only thinking, no text."},
+ # Unsigned — will be downgraded, but content was empty string
+ ],
+ },
+ {"role": "user", "content": "Next message."},
+ {"role": "assistant", "content": "Final."},
+ ]
+ _, result = convert_messages_to_anthropic(messages)
+ # First assistant is non-last, so thinking is stripped completely.
+ # The original content was empty and thinking was unsigned → placeholder
+ first_assistant = result[0]
+ assert first_assistant["role"] == "assistant"
+ assert len(first_assistant["content"]) >= 1
+
+ def test_multi_turn_conversation_preserves_only_last(self):
+ """Full multi-turn conversation: only last assistant keeps thinking."""
+ messages = [
+ {"role": "user", "content": "Question 1"},
+ {
+ "role": "assistant",
+ "content": "Answer 1",
+ "reasoning_details": [
+ {"type": "thinking", "thinking": "Thought 1", "signature": "sig_1"},
+ ],
+ },
+ {"role": "user", "content": "Question 2"},
+ {
+ "role": "assistant",
+ "content": "Answer 2",
+ "reasoning_details": [
+ {"type": "thinking", "thinking": "Thought 2", "signature": "sig_2"},
+ ],
+ },
+ {"role": "user", "content": "Question 3"},
+ {
+ "role": "assistant",
+ "content": "Answer 3",
+ "reasoning_details": [
+ {"type": "thinking", "thinking": "Thought 3", "signature": "sig_3"},
+ ],
+ },
+ ]
+ _, result = convert_messages_to_anthropic(messages)
+
+ assistants = [m for m in result if m["role"] == "assistant"]
+ assert len(assistants) == 3
+
+ # First two: no thinking blocks
+ for a in assistants[:2]:
+ assert not any(
+ b.get("type") in ("thinking", "redacted_thinking")
+ for b in a["content"]
+ if isinstance(b, dict)
+ )
+
+ # Last one: thinking preserved
+ last_thinking = [
+ b for b in assistants[2]["content"]
+ if isinstance(b, dict) and b.get("type") == "thinking"
+ ]
+ assert len(last_thinking) == 1
+ assert last_thinking[0]["signature"] == "sig_3"
+
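The test class above pins down a single policy for thinking blocks. A condensed sketch of that policy as one function — the real implementation lives inside `convert_messages_to_anthropic`; this standalone form is derived from the test expectations:

```python
def normalize_thinking_blocks(reasoning_details, is_last_assistant):
    """Anthropic content blocks produced from reasoning_details:
    - non-last turns: drop every thinking/redacted_thinking block
    - last turn: keep signed thinking, downgrade unsigned thinking to text,
      keep redacted_thinking only when its opaque 'data' survives
    - always strip cache_control from thinking-family blocks
    """
    out = []
    for block in reasoning_details or []:
        btype = block.get("type")
        if btype not in ("thinking", "redacted_thinking"):
            continue
        if not is_last_assistant:
            continue  # old turns: signatures would no longer validate
        block = {k: v for k, v in block.items() if k != "cache_control"}
        if btype == "thinking":
            if block.get("signature"):
                out.append(block)
            elif block.get("thinking"):
                out.append({"type": "text", "text": block["thinking"]})
        elif block.get("data"):
            out.append(block)
    return out
```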
+
# ---------------------------------------------------------------------------
# Tool choice
# ---------------------------------------------------------------------------
diff --git a/tests/agent/test_auxiliary_client.py b/tests/agent/test_auxiliary_client.py
index 32f481988e..c7cd12ae71 100644
--- a/tests/agent/test_auxiliary_client.py
+++ b/tests/agent/test_auxiliary_client.py
@@ -471,6 +471,23 @@ class TestExplicitProviderRouting:
client, model = resolve_provider_client("zai")
assert client is not None
+ def test_explicit_google_alias_uses_gemini_credentials(self):
+ """provider='google' should route through the gemini API-key provider."""
+ with (
+ patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={
+ "api_key": "gemini-key",
+ "base_url": "https://generativelanguage.googleapis.com/v1beta/openai",
+ }),
+ patch("agent.auxiliary_client.OpenAI") as mock_openai,
+ ):
+ mock_openai.return_value = MagicMock()
+ client, model = resolve_provider_client("google", model="gemini-3.1-pro-preview")
+
+ assert client is not None
+ assert model == "gemini-3.1-pro-preview"
+ assert mock_openai.call_args.kwargs["api_key"] == "gemini-key"
+ assert mock_openai.call_args.kwargs["base_url"] == "https://generativelanguage.googleapis.com/v1beta/openai"
+
def test_explicit_unknown_returns_none(self, monkeypatch):
"""Unknown provider should return None."""
client, model = resolve_provider_client("nonexistent-provider")
@@ -624,12 +641,15 @@ class TestVisionClientFallback:
assert client is None
assert model is None
- def test_vision_auto_includes_anthropic_when_configured(self, monkeypatch):
- monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key")
+ def test_vision_auto_includes_active_provider_when_configured(self, monkeypatch):
+ """Active provider appears in available backends when credentials exist."""
+ monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
with (
patch("agent.auxiliary_client._read_nous_auth", return_value=None),
+ patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
+ patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
- patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"),
+ patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
):
backends = get_available_vision_backends()
@@ -702,88 +722,50 @@ class TestAuxiliaryPoolAwareness:
assert call_kwargs["base_url"] == "https://api.githubcopilot.com"
assert call_kwargs["default_headers"]["Editor-Version"]
- def test_vision_auto_uses_anthropic_when_no_higher_priority_backend(self, monkeypatch):
- monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key")
+ def test_vision_auto_uses_active_provider_as_fallback(self, monkeypatch):
+ """When no OpenRouter/Nous available, vision auto falls back to active provider."""
+ monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
with (
patch("agent.auxiliary_client._read_nous_auth", return_value=None),
+ patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
+ patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
- patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"),
+ patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"),
):
client, model = get_vision_auxiliary_client()
assert client is not None
assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
- assert model == "claude-haiku-4-5-20251001"
- def test_selected_anthropic_provider_is_preferred_for_vision_auto(self, monkeypatch):
+ def test_vision_auto_prefers_openrouter_over_active_provider(self, monkeypatch):
+ """OpenRouter is tried before the active provider in vision auto."""
monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
- monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key")
-
- def fake_load_config():
- return {"model": {"provider": "anthropic", "default": "claude-sonnet-4-6"}}
+ monkeypatch.setenv("ANTHROPIC_API_KEY", "***")
with (
patch("agent.auxiliary_client._read_nous_auth", return_value=None),
- patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()),
- patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"),
+ patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"),
+ patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"),
patch("agent.auxiliary_client.OpenAI") as mock_openai,
- patch("hermes_cli.config.load_config", fake_load_config),
- ):
- client, model = get_vision_auxiliary_client()
-
- assert client is not None
- assert client.__class__.__name__ == "AnthropicAuxiliaryClient"
- assert model == "claude-haiku-4-5-20251001"
-
- def test_selected_codex_provider_short_circuits_vision_auto(self, monkeypatch):
- def fake_load_config():
- return {"model": {"provider": "openai-codex", "default": "gpt-5.2-codex"}}
-
- codex_client = MagicMock()
- with (
- patch("hermes_cli.config.load_config", fake_load_config),
- patch("agent.auxiliary_client._try_codex", return_value=(codex_client, "gpt-5.2-codex")) as mock_codex,
- patch("agent.auxiliary_client._try_openrouter") as mock_openrouter,
- patch("agent.auxiliary_client._try_nous") as mock_nous,
- patch("agent.auxiliary_client._try_anthropic") as mock_anthropic,
- patch("agent.auxiliary_client._try_custom_endpoint") as mock_custom,
):
provider, client, model = resolve_vision_provider_client()
- assert provider == "openai-codex"
- assert client is codex_client
- assert model == "gpt-5.2-codex"
- mock_codex.assert_called_once()
- mock_openrouter.assert_not_called()
- mock_nous.assert_not_called()
- mock_anthropic.assert_not_called()
- mock_custom.assert_not_called()
+ # OpenRouter should win over anthropic active provider
+ assert provider == "openrouter"
- def test_vision_auto_includes_codex(self, codex_auth_dir):
- """Codex supports vision (gpt-5.3-codex), so auto mode should use it."""
- with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
- patch("agent.auxiliary_client.OpenAI"):
- client, model = get_vision_auxiliary_client()
- from agent.auxiliary_client import CodexAuxiliaryClient
- assert isinstance(client, CodexAuxiliaryClient)
- assert model == "gpt-5.2-codex"
-
- def test_vision_auto_falls_back_to_custom_endpoint(self, monkeypatch):
- """Custom endpoint is used as fallback in vision auto mode.
-
- Many local models (Qwen-VL, LLaVA, etc.) support vision.
- When no OpenRouter/Nous/Codex is available, try the custom endpoint.
- """
+ def test_vision_auto_uses_named_custom_as_active_provider(self, monkeypatch):
+ """Named custom provider works as active provider fallback in vision auto."""
monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False)
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)), \
- patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
- patch("agent.auxiliary_client._resolve_custom_runtime",
- return_value=("http://localhost:1234/v1", "local-key")), \
- patch("agent.auxiliary_client.OpenAI") as mock_openai:
- client, model = get_vision_auxiliary_client()
- assert client is not None # Custom endpoint picked up as fallback
+ patch("agent.auxiliary_client._read_main_provider", return_value="custom:local"), \
+ patch("agent.auxiliary_client._read_main_model", return_value="my-local-model"), \
+ patch("agent.auxiliary_client.resolve_provider_client",
+ return_value=(MagicMock(), "my-local-model")) as mock_resolve:
+ provider, client, model = resolve_vision_provider_client()
+ assert client is not None
+ assert provider == "custom:local"
def test_vision_direct_endpoint_override(self, monkeypatch):
monkeypatch.setenv("OPENROUTER_API_KEY", "or-key")
@@ -822,6 +804,31 @@ class TestAuxiliaryPoolAwareness:
assert model == "google/gemini-3-flash-preview"
assert client is not None
+ def test_vision_config_google_provider_uses_gemini_credentials(self, monkeypatch):
+ config = {
+ "auxiliary": {
+ "vision": {
+ "provider": "google",
+ "model": "gemini-3.1-pro-preview",
+ }
+ }
+ }
+ monkeypatch.setattr("hermes_cli.config.load_config", lambda: config)
+ with (
+ patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={
+ "api_key": "gemini-key",
+ "base_url": "https://generativelanguage.googleapis.com/v1beta/openai",
+ }),
+ patch("agent.auxiliary_client.OpenAI") as mock_openai,
+ ):
+ resolved_provider, client, model = resolve_vision_provider_client()
+
+ assert resolved_provider == "gemini"
+ assert client is not None
+ assert model == "gemini-3.1-pro-preview"
+ assert mock_openai.call_args.kwargs["api_key"] == "gemini-key"
+ assert mock_openai.call_args.kwargs["base_url"] == "https://generativelanguage.googleapis.com/v1beta/openai"
+
def test_vision_forced_main_uses_custom_endpoint(self, monkeypatch):
"""When explicitly forced to 'main', vision CAN use custom endpoint."""
config = {
@@ -846,7 +853,14 @@ class TestAuxiliaryPoolAwareness:
monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "main")
monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
monkeypatch.delenv("OPENAI_API_KEY", raising=False)
+ # Clear client cache to avoid stale entries from previous tests
+ from agent.auxiliary_client import _client_cache
+ _client_cache.clear()
with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \
+ patch("agent.auxiliary_client._read_main_provider", return_value=""), \
+ patch("agent.auxiliary_client._read_main_model", return_value=""), \
+ patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)), \
+ patch("agent.auxiliary_client._resolve_custom_runtime", return_value=(None, None)), \
patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \
patch("agent.auxiliary_client._resolve_api_key_provider", return_value=(None, None)):
client, model = get_vision_auxiliary_client()
diff --git a/tests/agent/test_minimax_auxiliary_url.py b/tests/agent/test_minimax_auxiliary_url.py
new file mode 100644
index 0000000000..4444c3aadf
--- /dev/null
+++ b/tests/agent/test_minimax_auxiliary_url.py
@@ -0,0 +1,42 @@
+"""Tests for MiniMax auxiliary client URL normalization.
+
+MiniMax and MiniMax-CN set inference_base_url to the /anthropic path.
+The auxiliary client uses the OpenAI SDK, which needs /v1 instead.
+"""
+
+import sys
+import os
+
+sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", ".."))
+
+from agent.auxiliary_client import _to_openai_base_url
+
+
+class TestToOpenaiBaseUrl:
+ def test_minimax_global_anthropic_suffix_replaced(self):
+ assert _to_openai_base_url("https://api.minimax.io/anthropic") == "https://api.minimax.io/v1"
+
+ def test_minimax_cn_anthropic_suffix_replaced(self):
+ assert _to_openai_base_url("https://api.minimaxi.com/anthropic") == "https://api.minimaxi.com/v1"
+
+ def test_trailing_slash_stripped_before_replace(self):
+ assert _to_openai_base_url("https://api.minimax.io/anthropic/") == "https://api.minimax.io/v1"
+
+ def test_v1_url_unchanged(self):
+ assert _to_openai_base_url("https://api.openai.com/v1") == "https://api.openai.com/v1"
+
+ def test_openrouter_url_unchanged(self):
+ assert _to_openai_base_url("https://openrouter.ai/api/v1") == "https://openrouter.ai/api/v1"
+
+ def test_anthropic_domain_unchanged(self):
+ """api.anthropic.com doesn't end with /anthropic — should be untouched."""
+ assert _to_openai_base_url("https://api.anthropic.com") == "https://api.anthropic.com"
+
+ def test_anthropic_in_subpath_unchanged(self):
+ assert _to_openai_base_url("https://example.com/anthropic/extra") == "https://example.com/anthropic/extra"
+
+ def test_empty_string(self):
+ assert _to_openai_base_url("") == ""
+
+ def test_none(self):
+ assert _to_openai_base_url(None) == ""
diff --git a/tests/agent/test_minimax_provider.py b/tests/agent/test_minimax_provider.py
new file mode 100644
index 0000000000..c6819e877d
--- /dev/null
+++ b/tests/agent/test_minimax_provider.py
@@ -0,0 +1,105 @@
+"""Tests for MiniMax provider hardening — context lengths, thinking guard, catalog."""
+
+
+class TestMinimaxContextLengths:
+ """Verify per-model context length entries for MiniMax models."""
+
+ def test_m1_variants_have_1m_context(self):
+ from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
+ # Keys are lowercase because the lookup lowercases model names
+ for model in ("minimax-m1", "minimax-m1-40k", "minimax-m1-80k",
+ "minimax-m1-128k", "minimax-m1-256k"):
+ assert model in DEFAULT_CONTEXT_LENGTHS, f"{model} missing from context lengths"
+ assert DEFAULT_CONTEXT_LENGTHS[model] == 1_000_000, f"{model} expected 1M"
+
+ def test_m2_variants_have_1m_context(self):
+ from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
+ # Keys are lowercase because the lookup lowercases model names
+ for model in ("minimax-m2.5", "minimax-m2.7"):
+ assert model in DEFAULT_CONTEXT_LENGTHS, f"{model} missing from context lengths"
+ assert DEFAULT_CONTEXT_LENGTHS[model] == 1_048_576, f"{model} expected 1048576"
+
+ def test_minimax_prefix_fallback(self):
+ from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
+ # The generic "minimax" prefix entry falls back to 1048576 for unknown models
+ assert DEFAULT_CONTEXT_LENGTHS["minimax"] == 1_048_576
+
+
+
+class TestMinimaxThinkingGuard:
+ """Verify that build_anthropic_kwargs does NOT add thinking params for MiniMax models."""
+
+ def test_no_thinking_for_minimax_m27(self):
+ from agent.anthropic_adapter import build_anthropic_kwargs
+ kwargs = build_anthropic_kwargs(
+ model="MiniMax-M2.7",
+ messages=[{"role": "user", "content": "hello"}],
+ tools=None,
+ max_tokens=4096,
+ reasoning_config={"enabled": True, "effort": "medium"},
+ )
+ assert "thinking" not in kwargs
+ assert "output_config" not in kwargs
+
+ def test_no_thinking_for_minimax_m1(self):
+ from agent.anthropic_adapter import build_anthropic_kwargs
+ kwargs = build_anthropic_kwargs(
+ model="MiniMax-M1-128k",
+ messages=[{"role": "user", "content": "hello"}],
+ tools=None,
+ max_tokens=4096,
+ reasoning_config={"enabled": True, "effort": "high"},
+ )
+ assert "thinking" not in kwargs
+
+ def test_thinking_still_works_for_claude(self):
+ from agent.anthropic_adapter import build_anthropic_kwargs
+ kwargs = build_anthropic_kwargs(
+ model="claude-sonnet-4-20250514",
+ messages=[{"role": "user", "content": "hello"}],
+ tools=None,
+ max_tokens=4096,
+ reasoning_config={"enabled": True, "effort": "medium"},
+ )
+ assert "thinking" in kwargs
+
+
+class TestMinimaxAuxModel:
+ """Verify auxiliary model is standard (not highspeed)."""
+
+ def test_minimax_aux_is_standard(self):
+ from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
+ assert _API_KEY_PROVIDER_AUX_MODELS["minimax"] == "MiniMax-M2.7"
+ assert _API_KEY_PROVIDER_AUX_MODELS["minimax-cn"] == "MiniMax-M2.7"
+
+ def test_minimax_aux_not_highspeed(self):
+ from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS
+ assert "highspeed" not in _API_KEY_PROVIDER_AUX_MODELS["minimax"]
+ assert "highspeed" not in _API_KEY_PROVIDER_AUX_MODELS["minimax-cn"]
+
+
+class TestMinimaxModelCatalog:
+ """Verify the model catalog includes M1 family and excludes deprecated models."""
+
+ def test_catalog_includes_m1_family(self):
+ from hermes_cli.models import _PROVIDER_MODELS
+ for provider in ("minimax", "minimax-cn"):
+ models = _PROVIDER_MODELS[provider]
+ assert "MiniMax-M1" in models
+ assert "MiniMax-M1-40k" in models
+ assert "MiniMax-M1-80k" in models
+ assert "MiniMax-M1-128k" in models
+ assert "MiniMax-M1-256k" in models
+
+ def test_catalog_excludes_deprecated(self):
+ from hermes_cli.models import _PROVIDER_MODELS
+ for provider in ("minimax", "minimax-cn"):
+ models = _PROVIDER_MODELS[provider]
+ assert "MiniMax-M2.1" not in models
+
+ def test_catalog_excludes_highspeed(self):
+ from hermes_cli.models import _PROVIDER_MODELS
+ for provider in ("minimax", "minimax-cn"):
+ models = _PROVIDER_MODELS[provider]
+ assert "MiniMax-M2.7-highspeed" not in models
+ assert "MiniMax-M2.5-highspeed" not in models
diff --git a/tests/cli/test_session_boundary_hooks.py b/tests/cli/test_session_boundary_hooks.py
new file mode 100644
index 0000000000..19de4cd97a
--- /dev/null
+++ b/tests/cli/test_session_boundary_hooks.py
@@ -0,0 +1,62 @@
+from unittest.mock import MagicMock, patch
+from hermes_cli.plugins import VALID_HOOKS, PluginManager
+from cli import HermesCLI
+
+
+def test_session_hooks_in_valid_hooks():
+ """Verify on_session_finalize and on_session_reset are registered as valid hooks."""
+ assert "on_session_finalize" in VALID_HOOKS
+ assert "on_session_reset" in VALID_HOOKS
+
+
+@patch("hermes_cli.plugins.invoke_hook")
+def test_session_finalize_on_reset(mock_invoke_hook):
+ """Verify on_session_finalize fires when /new or /reset is used."""
+ cli = HermesCLI()
+ cli.agent = MagicMock()
+ cli.agent.session_id = "test-session-id"
+
+ # Simulate /new command which triggers on_session_finalize for the old session
+ cli.new_session(silent=True)
+
+ # Check if on_session_finalize was called for the old session
+ mock_invoke_hook.assert_any_call(
+ "on_session_finalize", session_id="test-session-id", platform="cli"
+ )
+ # Check if on_session_reset was called for the new session
+ mock_invoke_hook.assert_any_call(
+ "on_session_reset", session_id=cli.session_id, platform="cli"
+ )
+
+
+@patch("hermes_cli.plugins.invoke_hook")
+def test_session_finalize_on_cleanup(mock_invoke_hook):
+ """Verify on_session_finalize fires during CLI exit cleanup."""
+ import cli as cli_mod
+
+ mock_agent = MagicMock()
+ mock_agent.session_id = "cleanup-session-id"
+ cli_mod._active_agent_ref = mock_agent
+ cli_mod._cleanup_done = False
+
+ cli_mod._run_cleanup()
+
+ mock_invoke_hook.assert_any_call(
+ "on_session_finalize", session_id="cleanup-session-id", platform="cli"
+ )
+
+
+@patch("hermes_cli.plugins.invoke_hook")
+def test_hook_errors_are_caught(_mock_invoke_hook):
+ """Verify hook exceptions are caught and don't crash the agent."""
+ mgr = PluginManager()
+
+ # Register a hook that raises
+ def bad_callback(**kwargs):
+ raise Exception("Hook failed")
+
+ mgr._hooks["on_session_finalize"] = [bad_callback]
+
+ # This should not raise
+ results = mgr.invoke_hook("on_session_finalize", session_id="test", platform="cli")
+ assert results == []
diff --git a/tests/cli/test_worktree.py b/tests/cli/test_worktree.py
index f545baa391..fece9cf6be 100644
--- a/tests/cli/test_worktree.py
+++ b/tests/cli/test_worktree.py
@@ -33,6 +33,13 @@ def git_repo(tmp_path):
["git", "commit", "-m", "Initial commit"],
cwd=repo, capture_output=True,
)
+ # Add a fake remote ref so cleanup logic sees the initial commit as
+ # "pushed". Without this, `git log HEAD --not --remotes` treats every
+ # commit as unpushed and cleanup refuses to delete worktrees.
+ subprocess.run(
+ ["git", "update-ref", "refs/remotes/origin/main", "HEAD"],
+ cwd=repo, capture_output=True,
+ )
return repo
@@ -81,7 +88,11 @@ def _setup_worktree(repo_root):
def _cleanup_worktree(info):
- """Test version of _cleanup_worktree."""
+ """Test version of _cleanup_worktree.
+
+ Preserves the worktree only if it has unpushed commits.
+ Dirty working tree alone is not enough to keep it.
+ """
wt_path = info["path"]
branch = info["branch"]
repo_root = info["repo_root"]
@@ -89,15 +100,15 @@ def _cleanup_worktree(info):
if not Path(wt_path).exists():
return
- # Check for uncommitted changes
- status = subprocess.run(
- ["git", "status", "--porcelain"],
+ # Check for unpushed commits
+ result = subprocess.run(
+ ["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
capture_output=True, text=True, timeout=10, cwd=wt_path,
)
- has_changes = bool(status.stdout.strip())
+ has_unpushed = bool(result.stdout.strip())
- if has_changes:
- return False # Did not clean up
+ if has_unpushed:
+ return False # Did not clean up — has unpushed commits
subprocess.run(
["git", "worktree", "remove", wt_path, "--force"],
@@ -204,20 +215,45 @@ class TestWorktreeCleanup:
assert result is True
assert not Path(info["path"]).exists()
- def test_dirty_worktree_kept(self, git_repo):
+ def test_dirty_worktree_cleaned_when_no_unpushed(self, git_repo):
+ """Dirty working tree without unpushed commits is cleaned up.
+
+ Agent sessions typically leave untracked files / artifacts behind.
+ Since all real work is in pushed commits, these don't warrant
+ keeping the worktree.
+ """
info = _setup_worktree(str(git_repo))
assert info is not None
- # Make uncommitted changes
+ # Make uncommitted changes (untracked file)
(Path(info["path"]) / "new-file.txt").write_text("uncommitted")
subprocess.run(
["git", "add", "new-file.txt"],
cwd=info["path"], capture_output=True,
)
+ # The git_repo fixture already has a fake remote ref so the initial
+ # commit is seen as "pushed". No unpushed commits → cleanup proceeds.
result = _cleanup_worktree(info)
- assert result is False
- assert Path(info["path"]).exists() # Still there
+ assert result is True # Cleaned up despite dirty working tree
+ assert not Path(info["path"]).exists()
+
+ def test_worktree_with_unpushed_commits_kept(self, git_repo):
+ """Worktree with unpushed commits is preserved."""
+ info = _setup_worktree(str(git_repo))
+ assert info is not None
+
+ # Make a commit that is NOT on any remote
+ (Path(info["path"]) / "work.txt").write_text("real work")
+ subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True)
+ subprocess.run(
+ ["git", "commit", "-m", "agent work"],
+ cwd=info["path"], capture_output=True,
+ )
+
+ result = _cleanup_worktree(info)
+ assert result is False # Kept — has unpushed commits
+ assert Path(info["path"]).exists()
def test_branch_deleted_on_cleanup(self, git_repo):
info = _setup_worktree(str(git_repo))
@@ -367,7 +403,7 @@ class TestMultipleWorktrees:
lines = [l for l in result.stdout.strip().splitlines() if l.strip()]
assert len(lines) == 11
- # Cleanup all
+ # Cleanup all (git_repo fixture has a fake remote ref so cleanup works)
for info in worktrees:
# Discard changes first so cleanup works
subprocess.run(
@@ -492,33 +528,77 @@ class TestStaleWorktreePruning:
assert not pruned
assert Path(info["path"]).exists()
- def test_keeps_dirty_old_worktree(self, git_repo):
- """Old worktrees with uncommitted changes should NOT be pruned."""
+ def test_keeps_old_worktree_with_unpushed_commits(self, git_repo):
+ """Old worktrees (24-72h) with unpushed commits should NOT be pruned."""
import time
info = _setup_worktree(str(git_repo))
assert info is not None
- # Make it dirty
- (Path(info["path"]) / "dirty.txt").write_text("uncommitted")
+ # Make an unpushed commit
+ (Path(info["path"]) / "work.txt").write_text("real work")
+ subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True)
subprocess.run(
- ["git", "add", "dirty.txt"],
+ ["git", "commit", "-m", "agent work"],
cwd=info["path"], capture_output=True,
)
- # Make it old
+ # Make it old (25h — in the 24-72h soft tier)
old_time = time.time() - (25 * 3600)
os.utime(info["path"], (old_time, old_time))
- # Check if it would be pruned
- status = subprocess.run(
- ["git", "status", "--porcelain"],
+ # Check for unpushed commits (simulates prune logic)
+ result = subprocess.run(
+ ["git", "log", "--oneline", "HEAD", "--not", "--remotes"],
capture_output=True, text=True, cwd=info["path"],
)
- has_changes = bool(status.stdout.strip())
- assert has_changes # Should be dirty → not pruned
+ has_unpushed = bool(result.stdout.strip())
+ assert has_unpushed # Has unpushed commits → not pruned in soft tier
assert Path(info["path"]).exists()
+ def test_force_prunes_very_old_worktree(self, git_repo):
+ """Worktrees older than 72h are force-pruned even with unpushed commits."""
+ import time
+
+ info = _setup_worktree(str(git_repo))
+ assert info is not None
+
+ # Make an unpushed commit (would normally protect it)
+ (Path(info["path"]) / "work.txt").write_text("stale work")
+ subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True)
+ subprocess.run(
+ ["git", "commit", "-m", "old agent work"],
+ cwd=info["path"], capture_output=True,
+ )
+
+ # Make it very old (73h — beyond the 72h hard threshold)
+ old_time = time.time() - (73 * 3600)
+ os.utime(info["path"], (old_time, old_time))
+
+ # Simulate the force-prune tier check
+ hard_cutoff = time.time() - (72 * 3600)
+ mtime = Path(info["path"]).stat().st_mtime
+ assert mtime <= hard_cutoff # Should qualify for force removal
+
+ # Actually remove it (simulates _prune_stale_worktrees force path)
+ branch_result = subprocess.run(
+ ["git", "branch", "--show-current"],
+ capture_output=True, text=True, timeout=5, cwd=info["path"],
+ )
+ branch = branch_result.stdout.strip()
+
+ subprocess.run(
+ ["git", "worktree", "remove", info["path"], "--force"],
+ capture_output=True, text=True, timeout=15, cwd=str(git_repo),
+ )
+ if branch:
+ subprocess.run(
+ ["git", "branch", "-D", branch],
+ capture_output=True, text=True, timeout=10, cwd=str(git_repo),
+ )
+
+ assert not Path(info["path"]).exists()
+
class TestEdgeCases:
"""Test edge cases for robustness."""
@@ -611,6 +691,133 @@ class TestTerminalCWDIntegration:
assert result.stdout.strip() == "true"
+class TestOrphanedBranchPruning:
+ """Test cleanup of orphaned hermes/* and pr-* branches."""
+
+ def test_prunes_orphaned_hermes_branch(self, git_repo):
+ """hermes/hermes-* branches with no worktree should be deleted."""
+ # Create a branch that looks like a worktree branch but has no worktree
+ subprocess.run(
+ ["git", "branch", "hermes/hermes-deadbeef", "HEAD"],
+ cwd=str(git_repo), capture_output=True,
+ )
+
+ # Verify it exists
+ result = subprocess.run(
+ ["git", "branch", "--list", "hermes/hermes-deadbeef"],
+ capture_output=True, text=True, cwd=str(git_repo),
+ )
+ assert "hermes/hermes-deadbeef" in result.stdout
+
+ # Simulate _prune_orphaned_branches logic
+ result = subprocess.run(
+ ["git", "branch", "--format=%(refname:short)"],
+ capture_output=True, text=True, cwd=str(git_repo),
+ )
+ all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
+
+ wt_result = subprocess.run(
+ ["git", "worktree", "list", "--porcelain"],
+ capture_output=True, text=True, cwd=str(git_repo),
+ )
+ active_branches = {"main"}
+ for line in wt_result.stdout.split("\n"):
+ if line.startswith("branch refs/heads/"):
+ active_branches.add(line.split("branch refs/heads/", 1)[-1].strip())
+
+ orphaned = [
+ b for b in all_branches
+ if b not in active_branches
+ and (b.startswith("hermes/hermes-") or b.startswith("pr-"))
+ ]
+ assert "hermes/hermes-deadbeef" in orphaned
+
+ # Delete them
+ if orphaned:
+ subprocess.run(
+ ["git", "branch", "-D"] + orphaned,
+ capture_output=True, text=True, cwd=str(git_repo),
+ )
+
+ # Verify gone
+ result = subprocess.run(
+ ["git", "branch", "--list", "hermes/hermes-deadbeef"],
+ capture_output=True, text=True, cwd=str(git_repo),
+ )
+ assert "hermes/hermes-deadbeef" not in result.stdout
+
+ def test_prunes_orphaned_pr_branch(self, git_repo):
+ """pr-* branches should be deleted during pruning."""
+ subprocess.run(
+ ["git", "branch", "pr-1234", "HEAD"],
+ cwd=str(git_repo), capture_output=True,
+ )
+ subprocess.run(
+ ["git", "branch", "pr-5678", "HEAD"],
+ cwd=str(git_repo), capture_output=True,
+ )
+
+ result = subprocess.run(
+ ["git", "branch", "--format=%(refname:short)"],
+ capture_output=True, text=True, cwd=str(git_repo),
+ )
+ all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
+
+ active_branches = {"main"}
+ orphaned = [
+ b for b in all_branches
+ if b not in active_branches and b.startswith("pr-")
+ ]
+ assert "pr-1234" in orphaned
+ assert "pr-5678" in orphaned
+
+ subprocess.run(
+ ["git", "branch", "-D"] + orphaned,
+ capture_output=True, text=True, cwd=str(git_repo),
+ )
+
+ # Verify gone
+ result = subprocess.run(
+ ["git", "branch", "--format=%(refname:short)"],
+ capture_output=True, text=True, cwd=str(git_repo),
+ )
+ remaining = result.stdout.strip()
+ assert "pr-1234" not in remaining
+ assert "pr-5678" not in remaining
+
+ def test_preserves_active_worktree_branch(self, git_repo):
+ """Branches with active worktrees should NOT be pruned."""
+ info = _setup_worktree(str(git_repo))
+ assert info is not None
+
+ result = subprocess.run(
+ ["git", "worktree", "list", "--porcelain"],
+ capture_output=True, text=True, cwd=str(git_repo),
+ )
+ active_branches = set()
+ for line in result.stdout.split("\n"):
+ if line.startswith("branch refs/heads/"):
+ active_branches.add(line.split("branch refs/heads/", 1)[-1].strip())
+
+ assert info["branch"] in active_branches # Protected
+
+ def test_preserves_main_branch(self, git_repo):
+ """main branch should never be pruned."""
+ result = subprocess.run(
+ ["git", "branch", "--format=%(refname:short)"],
+ capture_output=True, text=True, cwd=str(git_repo),
+ )
+ all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()]
+ active_branches = {"main"}
+
+ orphaned = [
+ b for b in all_branches
+ if b not in active_branches
+ and (b.startswith("hermes/hermes-") or b.startswith("pr-"))
+ ]
+ assert "main" not in orphaned
+
+
class TestSystemPromptInjection:
"""Test that the agent gets worktree context in its system prompt."""
@@ -625,7 +832,7 @@ class TestSystemPromptInjection:
f"{info['path']}. Your branch is `{info['branch']}`. "
f"Changes here do not affect the main working tree or other agents. "
f"Remember to commit and push your changes, and create a PR if appropriate. "
- f"The original repo is at {info['repo_root']}.]"
+ f"The original repo is at {info['repo_root']}.]\n"
)
assert info["path"] in wt_note
diff --git a/tests/cron/test_jobs.py b/tests/cron/test_jobs.py
index cca460100a..e0f56b9612 100644
--- a/tests/cron/test_jobs.py
+++ b/tests/cron/test_jobs.py
@@ -339,6 +339,36 @@ class TestMarkJobRun:
assert updated["last_status"] == "error"
assert updated["last_error"] == "timeout"
+ def test_delivery_error_tracked_separately(self, tmp_cron_dir):
+ """Agent succeeds but delivery fails — both tracked independently."""
+ job = create_job(prompt="Report", schedule="every 1h")
+ mark_job_run(job["id"], success=True, delivery_error="platform 'telegram' not configured")
+ updated = get_job(job["id"])
+ assert updated["last_status"] == "ok"
+ assert updated["last_error"] is None
+ assert updated["last_delivery_error"] == "platform 'telegram' not configured"
+
+ def test_delivery_error_cleared_on_success(self, tmp_cron_dir):
+ """Successful delivery clears the previous delivery error."""
+ job = create_job(prompt="Report", schedule="every 1h")
+ mark_job_run(job["id"], success=True, delivery_error="network timeout")
+ updated = get_job(job["id"])
+ assert updated["last_delivery_error"] == "network timeout"
+ # Next run delivers successfully
+ mark_job_run(job["id"], success=True, delivery_error=None)
+ updated = get_job(job["id"])
+ assert updated["last_delivery_error"] is None
+
+ def test_both_agent_and_delivery_error(self, tmp_cron_dir):
+ """Agent fails AND delivery fails — both errors recorded."""
+ job = create_job(prompt="Report", schedule="every 1h")
+ mark_job_run(job["id"], success=False, error="model timeout",
+ delivery_error="platform 'discord' not enabled")
+ updated = get_job(job["id"])
+ assert updated["last_status"] == "error"
+ assert updated["last_error"] == "model timeout"
+ assert updated["last_delivery_error"] == "platform 'discord' not enabled"
+
class TestAdvanceNextRun:
"""Tests for advance_next_run() — crash-safety for recurring jobs."""
diff --git a/tests/cron/test_scheduler.py b/tests/cron/test_scheduler.py
index 4a15fa2238..c07663a37d 100644
--- a/tests/cron/test_scheduler.py
+++ b/tests/cron/test_scheduler.py
@@ -508,6 +508,90 @@ class TestDeliverResultWrapping:
assert send_mock.call_args.kwargs["thread_id"] == "17585"
+class TestDeliverResultErrorReturns:
+ """Verify _deliver_result returns error strings on failure, None on success."""
+
+ def test_returns_none_on_successful_delivery(self):
+ from gateway.config import Platform
+
+ pconfig = MagicMock()
+ pconfig.enabled = True
+ mock_cfg = MagicMock()
+ mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
+
+ with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
+ patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"success": True})):
+ job = {
+ "id": "ok-job",
+ "deliver": "origin",
+ "origin": {"platform": "telegram", "chat_id": "123"},
+ }
+ result = _deliver_result(job, "Output.")
+ assert result is None
+
+ def test_returns_none_for_local_delivery(self):
+ """local-only jobs don't deliver — not a failure."""
+ job = {"id": "local-job", "deliver": "local"}
+ result = _deliver_result(job, "Output.")
+ assert result is None
+
+ def test_returns_error_for_unknown_platform(self):
+ job = {
+ "id": "bad-platform",
+ "deliver": "origin",
+ "origin": {"platform": "fax", "chat_id": "123"},
+ }
+ with patch("gateway.config.load_gateway_config"):
+ result = _deliver_result(job, "Output.")
+ assert result is not None
+ assert "unknown platform" in result
+
+ def test_returns_error_when_platform_disabled(self):
+ from gateway.config import Platform
+
+ pconfig = MagicMock()
+ pconfig.enabled = False
+ mock_cfg = MagicMock()
+ mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
+
+ with patch("gateway.config.load_gateway_config", return_value=mock_cfg):
+ job = {
+ "id": "disabled",
+ "deliver": "origin",
+ "origin": {"platform": "telegram", "chat_id": "123"},
+ }
+ result = _deliver_result(job, "Output.")
+ assert result is not None
+ assert "not configured" in result
+
+ def test_returns_error_on_send_failure(self):
+ from gateway.config import Platform
+
+ pconfig = MagicMock()
+ pconfig.enabled = True
+ mock_cfg = MagicMock()
+ mock_cfg.platforms = {Platform.TELEGRAM: pconfig}
+
+ with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \
+ patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"error": "rate limited"})):
+ job = {
+ "id": "rate-limited",
+ "deliver": "origin",
+ "origin": {"platform": "telegram", "chat_id": "123"},
+ }
+ result = _deliver_result(job, "Output.")
+ assert result is not None
+ assert "rate limited" in result
+
+ def test_returns_error_for_unresolved_target(self, monkeypatch):
+ """Non-local delivery with no resolvable target should return an error."""
+ monkeypatch.delenv("TELEGRAM_HOME_CHANNEL", raising=False)
+ job = {"id": "no-target", "deliver": "telegram"}
+ result = _deliver_result(job, "Output.")
+ assert result is not None
+ assert "no delivery target" in result
+
+
class TestRunJobSessionPersistence:
def test_run_job_passes_session_db_and_cron_platform(self, tmp_path):
job = {
diff --git a/tests/gateway/test_feishu_approval_buttons.py b/tests/gateway/test_feishu_approval_buttons.py
new file mode 100644
index 0000000000..9c51d1ac49
--- /dev/null
+++ b/tests/gateway/test_feishu_approval_buttons.py
@@ -0,0 +1,432 @@
+"""Tests for Feishu interactive card approval buttons."""
+
+import asyncio
+import json
+import os
+import sys
+from pathlib import Path
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, MagicMock, Mock, patch
+
+import pytest
+
+# ---------------------------------------------------------------------------
+# Ensure the repo root is importable
+# ---------------------------------------------------------------------------
+_repo = str(Path(__file__).resolve().parents[2])
+if _repo not in sys.path:
+ sys.path.insert(0, _repo)
+
+
+# ---------------------------------------------------------------------------
+# Minimal Feishu mock so FeishuAdapter can be imported without lark-oapi
+# ---------------------------------------------------------------------------
+def _ensure_feishu_mocks():
+ """Provide stubs for lark-oapi / aiohttp.web so the import succeeds."""
+ if "lark_oapi" not in sys.modules:
+ mod = MagicMock()
+ for name in (
+ "lark_oapi", "lark_oapi.api.im.v1",
+ "lark_oapi.event", "lark_oapi.event.callback_type",
+ ):
+ sys.modules.setdefault(name, mod)
+ if "aiohttp" not in sys.modules:
+ aio = MagicMock()
+ sys.modules.setdefault("aiohttp", aio)
+ sys.modules.setdefault("aiohttp.web", aio.web)
+
+
+_ensure_feishu_mocks()
+
+from gateway.config import PlatformConfig
+from gateway.platforms.feishu import FeishuAdapter
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _make_adapter() -> FeishuAdapter:
+ """Create a FeishuAdapter with mocked internals."""
+ config = PlatformConfig(enabled=True)
+ adapter = FeishuAdapter(config)
+ adapter._client = MagicMock()
+ return adapter
+
+
+def _make_card_action_data(
+ action_value: dict,
+ chat_id: str = "oc_12345",
+ open_id: str = "ou_user1",
+ token: str = "tok_abc",
+) -> SimpleNamespace:
+ """Create a mock Feishu card action callback data object."""
+ return SimpleNamespace(
+ event=SimpleNamespace(
+ token=token,
+ context=SimpleNamespace(open_chat_id=chat_id),
+ operator=SimpleNamespace(open_id=open_id),
+ action=SimpleNamespace(
+ tag="button",
+ value=action_value,
+ ),
+ ),
+ )
+
+
+# ===========================================================================
+# send_exec_approval — interactive card with buttons
+# ===========================================================================
+
+class TestFeishuExecApproval:
+ """Test send_exec_approval sends an interactive card."""
+
+ @pytest.mark.asyncio
+ async def test_sends_interactive_card(self):
+ adapter = _make_adapter()
+
+ mock_response = SimpleNamespace(
+ success=lambda: True,
+ data=SimpleNamespace(message_id="msg_001"),
+ )
+ with patch.object(
+ adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
+ return_value=mock_response,
+ ) as mock_send:
+ result = await adapter.send_exec_approval(
+ chat_id="oc_12345",
+ command="rm -rf /important",
+ session_key="agent:main:feishu:group:oc_12345",
+ description="dangerous deletion",
+ )
+
+ assert result.success is True
+ assert result.message_id == "msg_001"
+
+ mock_send.assert_called_once()
+ kwargs = mock_send.call_args[1]
+ assert kwargs["chat_id"] == "oc_12345"
+ assert kwargs["msg_type"] == "interactive"
+
+ # Verify card payload contains the command and buttons
+ card = json.loads(kwargs["payload"])
+ assert card["header"]["template"] == "orange"
+ assert "rm -rf /important" in card["elements"][0]["content"]
+ assert "dangerous deletion" in card["elements"][0]["content"]
+
+ # Check buttons
+ actions = card["elements"][1]["actions"]
+ assert len(actions) == 4
+ action_names = [a["value"]["hermes_action"] for a in actions]
+ assert action_names == [
+ "approve_once", "approve_session", "approve_always", "deny"
+ ]
+
+ @pytest.mark.asyncio
+ async def test_stores_approval_state(self):
+ adapter = _make_adapter()
+
+ mock_response = SimpleNamespace(
+ success=lambda: True,
+ data=SimpleNamespace(message_id="msg_002"),
+ )
+ with patch.object(
+ adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
+ return_value=mock_response,
+ ):
+ await adapter.send_exec_approval(
+ chat_id="oc_12345",
+ command="echo test",
+ session_key="my-session-key",
+ )
+
+ assert len(adapter._approval_state) == 1
+ approval_id = list(adapter._approval_state.keys())[0]
+ state = adapter._approval_state[approval_id]
+ assert state["session_key"] == "my-session-key"
+ assert state["message_id"] == "msg_002"
+ assert state["chat_id"] == "oc_12345"
+
+ @pytest.mark.asyncio
+ async def test_not_connected(self):
+ adapter = _make_adapter()
+ adapter._client = None
+ result = await adapter.send_exec_approval(
+ chat_id="oc_12345", command="ls", session_key="s"
+ )
+ assert result.success is False
+
+ @pytest.mark.asyncio
+ async def test_truncates_long_command(self):
+ adapter = _make_adapter()
+
+ mock_response = SimpleNamespace(
+ success=lambda: True,
+ data=SimpleNamespace(message_id="msg_003"),
+ )
+ with patch.object(
+ adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
+ return_value=mock_response,
+ ) as mock_send:
+ long_cmd = "x" * 5000
+ await adapter.send_exec_approval(
+ chat_id="oc_12345", command=long_cmd, session_key="s"
+ )
+
+ card = json.loads(mock_send.call_args[1]["payload"])
+ content = card["elements"][0]["content"]
+ assert "..." in content
+ assert len(content) < 5000
+
+ @pytest.mark.asyncio
+ async def test_multiple_approvals_get_unique_ids(self):
+ adapter = _make_adapter()
+
+ mock_response = SimpleNamespace(
+ success=lambda: True,
+ data=SimpleNamespace(message_id="msg_x"),
+ )
+ with patch.object(
+ adapter, "_feishu_send_with_retry", new_callable=AsyncMock,
+ return_value=mock_response,
+ ):
+ await adapter.send_exec_approval(
+ chat_id="oc_1", command="cmd1", session_key="s1"
+ )
+ await adapter.send_exec_approval(
+ chat_id="oc_2", command="cmd2", session_key="s2"
+ )
+
+ assert len(adapter._approval_state) == 2
+ ids = list(adapter._approval_state.keys())
+ assert ids[0] != ids[1]
+
+
+# ===========================================================================
+# _handle_card_action_event — approval button clicks
+# ===========================================================================
+
+class TestFeishuApprovalCallback:
+ """Test the approval intercept in _handle_card_action_event."""
+
+ @pytest.mark.asyncio
+ async def test_resolves_approval_on_click(self):
+ adapter = _make_adapter()
+ adapter._approval_state[1] = {
+ "session_key": "agent:main:feishu:group:oc_12345",
+ "message_id": "msg_001",
+ "chat_id": "oc_12345",
+ }
+
+ data = _make_card_action_data(
+ action_value={"hermes_action": "approve_once", "approval_id": 1},
+ )
+
+ with (
+ patch.object(
+ adapter, "_resolve_sender_profile", new_callable=AsyncMock,
+ return_value={"user_id": "ou_user1", "user_name": "Norbert", "user_id_alt": None},
+ ),
+ patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update,
+ patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
+ ):
+ await adapter._handle_card_action_event(data)
+
+ mock_resolve.assert_called_once_with("agent:main:feishu:group:oc_12345", "once")
+ mock_update.assert_called_once_with("msg_001", "Approved once", "Norbert", "once")
+
+ # State should be cleaned up
+ assert 1 not in adapter._approval_state
+
+ @pytest.mark.asyncio
+ async def test_deny_button(self):
+ adapter = _make_adapter()
+ adapter._approval_state[2] = {
+ "session_key": "some-session",
+ "message_id": "msg_002",
+ "chat_id": "oc_12345",
+ }
+
+ data = _make_card_action_data(
+ action_value={"hermes_action": "deny", "approval_id": 2},
+ token="tok_deny",
+ )
+
+ with (
+ patch.object(
+ adapter, "_resolve_sender_profile", new_callable=AsyncMock,
+ return_value={"user_id": "ou_alice", "user_name": "Alice", "user_id_alt": None},
+ ),
+ patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update,
+ patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
+ ):
+ await adapter._handle_card_action_event(data)
+
+ mock_resolve.assert_called_once_with("some-session", "deny")
+ mock_update.assert_called_once_with("msg_002", "Denied", "Alice", "deny")
+
+ @pytest.mark.asyncio
+ async def test_session_approval(self):
+ adapter = _make_adapter()
+ adapter._approval_state[3] = {
+ "session_key": "sess-3",
+ "message_id": "msg_003",
+ "chat_id": "oc_99",
+ }
+
+ data = _make_card_action_data(
+ action_value={"hermes_action": "approve_session", "approval_id": 3},
+ token="tok_ses",
+ )
+
+ with (
+ patch.object(
+ adapter, "_resolve_sender_profile", new_callable=AsyncMock,
+ return_value={"user_id": "ou_u", "user_name": "Bob", "user_id_alt": None},
+ ),
+ patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update,
+ patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
+ ):
+ await adapter._handle_card_action_event(data)
+
+ mock_resolve.assert_called_once_with("sess-3", "session")
+ mock_update.assert_called_once_with("msg_003", "Approved for session", "Bob", "session")
+
+ @pytest.mark.asyncio
+ async def test_always_approval(self):
+ adapter = _make_adapter()
+ adapter._approval_state[4] = {
+ "session_key": "sess-4",
+ "message_id": "msg_004",
+ "chat_id": "oc_55",
+ }
+
+ data = _make_card_action_data(
+ action_value={"hermes_action": "approve_always", "approval_id": 4},
+ token="tok_alw",
+ )
+
+ with (
+ patch.object(
+ adapter, "_resolve_sender_profile", new_callable=AsyncMock,
+ return_value={"user_id": "ou_u", "user_name": "Carol", "user_id_alt": None},
+ ),
+ patch.object(adapter, "_update_approval_card", new_callable=AsyncMock),
+ patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve,
+ ):
+ await adapter._handle_card_action_event(data)
+
+ mock_resolve.assert_called_once_with("sess-4", "always")
+
+ @pytest.mark.asyncio
+ async def test_already_resolved_drops_silently(self):
+ adapter = _make_adapter()
+ # No state for approval_id 99 — already resolved
+
+ data = _make_card_action_data(
+ action_value={"hermes_action": "approve_once", "approval_id": 99},
+ token="tok_gone",
+ )
+
+ with patch("tools.approval.resolve_gateway_approval") as mock_resolve:
+ await adapter._handle_card_action_event(data)
+
+ # Should NOT resolve — already handled
+ mock_resolve.assert_not_called()
+
+ @pytest.mark.asyncio
+ async def test_non_approval_actions_route_normally(self):
+ """Non-approval card actions should still become synthetic commands."""
+ adapter = _make_adapter()
+
+ data = _make_card_action_data(
+ action_value={"custom_action": "something_else"},
+ token="tok_normal",
+ )
+
+ with (
+ patch.object(
+ adapter, "_resolve_sender_profile", new_callable=AsyncMock,
+ return_value={"user_id": "ou_u", "user_name": "Dave", "user_id_alt": None},
+ ),
+ patch.object(adapter, "get_chat_info", new_callable=AsyncMock, return_value={"name": "Test Chat"}),
+ patch.object(adapter, "_handle_message_with_guards", new_callable=AsyncMock) as mock_handle,
+ patch("tools.approval.resolve_gateway_approval") as mock_resolve,
+ ):
+ await adapter._handle_card_action_event(data)
+
+ # Should NOT resolve any approval
+ mock_resolve.assert_not_called()
+ # Should have routed as synthetic command
+ mock_handle.assert_called_once()
+ event = mock_handle.call_args[0][0]
+ assert "/card button" in event.text
+
+
+# ===========================================================================
+# _update_approval_card — card replacement after resolution
+# ===========================================================================
+
+class TestFeishuUpdateApprovalCard:
+ """Test the card update after approval resolution."""
+
+ @pytest.mark.asyncio
+ async def test_updates_card_on_approve(self):
+ adapter = _make_adapter()
+
+        # Give the mocked client a concrete update method for the pass-through assertion
+        adapter._client.im.v1.message.update = MagicMock()
+
+ with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
+ await adapter._update_approval_card(
+ "msg_001", "Approved once", "Norbert", "once"
+ )
+
+ mock_thread.assert_called_once()
+ # Verify the update request was built
+ call_args = mock_thread.call_args
+ assert call_args[0][0] == adapter._client.im.v1.message.update
+
+ @pytest.mark.asyncio
+ async def test_updates_card_on_deny(self):
+ adapter = _make_adapter()
+
+ with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
+ await adapter._update_approval_card(
+ "msg_002", "Denied", "Alice", "deny"
+ )
+
+ mock_thread.assert_called_once()
+
+ @pytest.mark.asyncio
+ async def test_skips_update_when_not_connected(self):
+ adapter = _make_adapter()
+ adapter._client = None
+
+ with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
+ await adapter._update_approval_card(
+ "msg_001", "Approved", "Bob", "once"
+ )
+
+ mock_thread.assert_not_called()
+
+ @pytest.mark.asyncio
+ async def test_skips_update_when_no_message_id(self):
+ adapter = _make_adapter()
+
+ with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread:
+ await adapter._update_approval_card(
+ "", "Approved", "Bob", "once"
+ )
+
+ mock_thread.assert_not_called()
+
+ @pytest.mark.asyncio
+ async def test_swallows_update_errors(self):
+ adapter = _make_adapter()
+
+ with patch("asyncio.to_thread", new_callable=AsyncMock, side_effect=Exception("API error")):
+ # Should not raise
+ await adapter._update_approval_card(
+ "msg_001", "Approved", "Bob", "once"
+ )
diff --git a/tests/gateway/test_reasoning_command.py b/tests/gateway/test_reasoning_command.py
index cb9e01f11e..e39ed1123d 100644
--- a/tests/gateway/test_reasoning_command.py
+++ b/tests/gateway/test_reasoning_command.py
@@ -87,7 +87,6 @@ class TestReasoningCommand:
)
monkeypatch.setattr(gateway_run, "_hermes_home", hermes_home)
- monkeypatch.delenv("HERMES_REASONING_EFFORT", raising=False)
runner = _make_runner()
runner._reasoning_config = {"enabled": True, "effort": "xhigh"}
@@ -108,7 +107,6 @@ class TestReasoningCommand:
config_path.write_text("agent:\n reasoning_effort: medium\n", encoding="utf-8")
monkeypatch.setattr(gateway_run, "_hermes_home", hermes_home)
- monkeypatch.delenv("HERMES_REASONING_EFFORT", raising=False)
runner = _make_runner()
runner._reasoning_config = {"enabled": True, "effort": "medium"}
@@ -138,7 +136,6 @@ class TestReasoningCommand:
"api_key": "test-key",
},
)
- monkeypatch.delenv("HERMES_REASONING_EFFORT", raising=False)
fake_run_agent = types.ModuleType("run_agent")
fake_run_agent.AIAgent = _CapturingAgent
monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
@@ -170,55 +167,6 @@ class TestReasoningCommand:
assert _CapturingAgent.last_init is not None
assert _CapturingAgent.last_init["reasoning_config"] == {"enabled": True, "effort": "low"}
- def test_run_agent_prefers_config_over_stale_reasoning_env(self, tmp_path, monkeypatch):
- hermes_home = tmp_path / "hermes"
- hermes_home.mkdir()
- (hermes_home / "config.yaml").write_text("agent:\n reasoning_effort: none\n", encoding="utf-8")
-
- monkeypatch.setattr(gateway_run, "_hermes_home", hermes_home)
- monkeypatch.setattr(gateway_run, "_env_path", hermes_home / ".env")
- monkeypatch.setattr(gateway_run, "load_dotenv", lambda *args, **kwargs: None)
- monkeypatch.setattr(
- gateway_run,
- "_resolve_runtime_agent_kwargs",
- lambda: {
- "provider": "openrouter",
- "api_mode": "chat_completions",
- "base_url": "https://openrouter.ai/api/v1",
- "api_key": "test-key",
- },
- )
- monkeypatch.setenv("HERMES_REASONING_EFFORT", "low")
- fake_run_agent = types.ModuleType("run_agent")
- fake_run_agent.AIAgent = _CapturingAgent
- monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent)
-
- _CapturingAgent.last_init = None
- runner = _make_runner()
-
- source = SessionSource(
- platform=Platform.LOCAL,
- chat_id="cli",
- chat_name="CLI",
- chat_type="dm",
- user_id="user-1",
- )
-
- result = asyncio.run(
- runner._run_agent(
- message="ping",
- context_prompt="",
- history=[],
- source=source,
- session_id="session-1",
- session_key="agent:main:local:dm",
- )
- )
-
- assert result["final_response"] == "ok"
- assert _CapturingAgent.last_init is not None
- assert _CapturingAgent.last_init["reasoning_config"] == {"enabled": False}
-
def test_run_agent_includes_enabled_mcp_servers_in_gateway_toolsets(self, tmp_path, monkeypatch):
hermes_home = tmp_path / "hermes"
hermes_home.mkdir()
diff --git a/tests/gateway/test_session_boundary_hooks.py b/tests/gateway/test_session_boundary_hooks.py
new file mode 100644
index 0000000000..31e02980a7
--- /dev/null
+++ b/tests/gateway/test_session_boundary_hooks.py
@@ -0,0 +1,158 @@
+"""Tests that on_session_finalize and on_session_reset plugin hooks fire in the gateway."""
+from datetime import datetime
+from types import SimpleNamespace
+from unittest.mock import AsyncMock, MagicMock, patch
+
+import pytest
+
+from gateway.config import GatewayConfig, Platform, PlatformConfig
+from gateway.platforms.base import MessageEvent
+from gateway.session import SessionEntry, SessionSource, build_session_key
+
+
+def _make_source() -> SessionSource:
+ return SessionSource(
+ platform=Platform.TELEGRAM,
+ user_id="u1",
+ chat_id="c1",
+ user_name="tester",
+ chat_type="dm",
+ )
+
+
+def _make_event(text: str) -> MessageEvent:
+ return MessageEvent(text=text, source=_make_source(), message_id="m1")
+
+
+def _make_runner():
+ from gateway.run import GatewayRunner
+
+ runner = object.__new__(GatewayRunner)
+ runner.config = GatewayConfig(
+ platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")}
+ )
+ adapter = MagicMock()
+ adapter.send = AsyncMock()
+ runner.adapters = {Platform.TELEGRAM: adapter}
+ runner._voice_mode = {}
+ runner.hooks = SimpleNamespace(emit=AsyncMock(), loaded_hooks=False)
+ runner._session_model_overrides = {}
+ runner._pending_model_notes = {}
+ runner._background_tasks = set()
+
+ session_key = build_session_key(_make_source())
+ session_entry = SessionEntry(
+ session_key=session_key,
+ session_id="sess-old",
+ created_at=datetime.now(),
+ updated_at=datetime.now(),
+ platform=Platform.TELEGRAM,
+ chat_type="dm",
+ )
+ new_session_entry = SessionEntry(
+ session_key=session_key,
+ session_id="sess-new",
+ created_at=datetime.now(),
+ updated_at=datetime.now(),
+ platform=Platform.TELEGRAM,
+ chat_type="dm",
+ )
+ runner.session_store = MagicMock()
+ runner.session_store.get_or_create_session.return_value = new_session_entry
+ runner.session_store.reset_session.return_value = new_session_entry
+ runner.session_store._entries = {session_key: session_entry}
+ runner.session_store._generate_session_key.return_value = session_key
+ runner._running_agents = {}
+ runner._pending_messages = {}
+ runner._pending_approvals = {}
+ runner._session_db = None
+ runner._agent_cache_lock = None
+ runner._is_user_authorized = lambda _source: True
+ runner._format_session_info = lambda: ""
+
+ return runner
+
+
+@pytest.mark.asyncio
+@patch("hermes_cli.plugins.invoke_hook")
+async def test_reset_fires_finalize_hook(mock_invoke_hook):
+ """/new must fire on_session_finalize with the OLD session id."""
+ runner = _make_runner()
+
+ await runner._handle_reset_command(_make_event("/new"))
+
+ mock_invoke_hook.assert_any_call(
+ "on_session_finalize", session_id="sess-old", platform="telegram"
+ )
+
+
+@pytest.mark.asyncio
+@patch("hermes_cli.plugins.invoke_hook")
+async def test_reset_fires_reset_hook(mock_invoke_hook):
+ """/new must fire on_session_reset with the NEW session id."""
+ runner = _make_runner()
+
+ await runner._handle_reset_command(_make_event("/new"))
+
+ mock_invoke_hook.assert_any_call(
+ "on_session_reset", session_id="sess-new", platform="telegram"
+ )
+
+
+@pytest.mark.asyncio
+@patch("hermes_cli.plugins.invoke_hook")
+async def test_finalize_before_reset(mock_invoke_hook):
+ """on_session_finalize must fire before on_session_reset."""
+ runner = _make_runner()
+
+ await runner._handle_reset_command(_make_event("/new"))
+
+ calls = [c for c in mock_invoke_hook.call_args_list
+ if c[0][0] in ("on_session_finalize", "on_session_reset")]
+ hook_names = [c[0][0] for c in calls]
+ assert hook_names == ["on_session_finalize", "on_session_reset"]
+
+
+@pytest.mark.asyncio
+@patch("hermes_cli.plugins.invoke_hook")
+async def test_shutdown_fires_finalize_for_active_agents(mock_invoke_hook):
+ """Gateway stop() must fire on_session_finalize for each active agent."""
+ from gateway.run import GatewayRunner
+
+ runner = object.__new__(GatewayRunner)
+ runner._running = True
+ runner._background_tasks = set()
+ runner._pending_messages = {}
+ runner._pending_approvals = {}
+ runner._shutdown_event = MagicMock()
+ runner.adapters = {}
+ runner._exit_reason = "test"
+
+ agent1 = MagicMock()
+ agent1.session_id = "sess-a"
+ agent2 = MagicMock()
+ agent2.session_id = "sess-b"
+ runner._running_agents = {"key-a": agent1, "key-b": agent2}
+
+ with patch("gateway.status.remove_pid_file"), \
+ patch("gateway.status.write_runtime_status"):
+ await runner.stop()
+
+ finalize_calls = [
+ c for c in mock_invoke_hook.call_args_list
+ if c[0][0] == "on_session_finalize"
+ ]
+ session_ids = {c[1]["session_id"] for c in finalize_calls}
+ assert session_ids == {"sess-a", "sess-b"}
+
+
+@pytest.mark.asyncio
+@patch("hermes_cli.plugins.invoke_hook", side_effect=Exception("boom"))
+async def test_hook_error_does_not_break_reset(mock_invoke_hook):
+ """Plugin hook errors must not prevent /new from completing."""
+ runner = _make_runner()
+
+ result = await runner._handle_reset_command(_make_event("/new"))
+
+ # Should still return a success message despite hook errors
+ assert "Session reset" in result or "New session" in result
diff --git a/tests/gateway/test_stream_consumer.py b/tests/gateway/test_stream_consumer.py
index 6c908bbe40..ddc88fc2fc 100644
--- a/tests/gateway/test_stream_consumer.py
+++ b/tests/gateway/test_stream_consumer.py
@@ -324,3 +324,91 @@ class TestSegmentBreakOnToolBoundary:
await consumer.run()
assert consumer.already_sent
+
+ @pytest.mark.asyncio
+ async def test_edit_failure_sends_only_unsent_tail_at_finish(self):
+ """If an edit fails mid-stream, send only the missing tail once at finish."""
+ adapter = MagicMock()
+ send_results = [
+ SimpleNamespace(success=True, message_id="msg_1"),
+ SimpleNamespace(success=True, message_id="msg_2"),
+ ]
+ adapter.send = AsyncMock(side_effect=send_results)
+ adapter.edit_message = AsyncMock(return_value=SimpleNamespace(success=False, error="flood_control:6"))
+ adapter.MAX_MESSAGE_LENGTH = 4096
+
+ config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5, cursor=" ▉")
+ consumer = GatewayStreamConsumer(adapter, "chat_123", config)
+
+ consumer.on_delta("Hello")
+ task = asyncio.create_task(consumer.run())
+ await asyncio.sleep(0.08)
+ consumer.on_delta(" world")
+ await asyncio.sleep(0.08)
+ consumer.finish()
+ await task
+
+ assert adapter.send.call_count == 2
+ first_text = adapter.send.call_args_list[0][1]["content"]
+ second_text = adapter.send.call_args_list[1][1]["content"]
+ assert "Hello" in first_text
+ assert second_text.strip() == "world"
+ assert consumer.already_sent
+
+ @pytest.mark.asyncio
+ async def test_segment_break_clears_failed_edit_fallback_state(self):
+ """A tool boundary after edit failure must not duplicate the next segment."""
+ adapter = MagicMock()
+ send_results = [
+ SimpleNamespace(success=True, message_id="msg_1"),
+ SimpleNamespace(success=True, message_id="msg_2"),
+ ]
+ adapter.send = AsyncMock(side_effect=send_results)
+ adapter.edit_message = AsyncMock(return_value=SimpleNamespace(success=False, error="flood_control:6"))
+ adapter.MAX_MESSAGE_LENGTH = 4096
+
+ config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5, cursor=" ▉")
+ consumer = GatewayStreamConsumer(adapter, "chat_123", config)
+
+ consumer.on_delta("Hello")
+ task = asyncio.create_task(consumer.run())
+ await asyncio.sleep(0.08)
+ consumer.on_delta(" world")
+ await asyncio.sleep(0.08)
+ consumer.on_delta(None)
+ consumer.on_delta("Next segment")
+ consumer.finish()
+ await task
+
+ sent_texts = [call[1]["content"] for call in adapter.send.call_args_list]
+ assert sent_texts == ["Hello ▉", "Next segment"]
+
+ @pytest.mark.asyncio
+ async def test_fallback_final_splits_long_continuation_without_dropping_text(self):
+        """Long continuation tails should be chunked when the fallback final send runs."""
+ adapter = MagicMock()
+ adapter.send = AsyncMock(side_effect=[
+ SimpleNamespace(success=True, message_id="msg_1"),
+ SimpleNamespace(success=True, message_id="msg_2"),
+ SimpleNamespace(success=True, message_id="msg_3"),
+ ])
+ adapter.edit_message = AsyncMock(return_value=SimpleNamespace(success=False, error="flood_control:6"))
+ adapter.MAX_MESSAGE_LENGTH = 610
+
+ config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5, cursor=" ▉")
+ consumer = GatewayStreamConsumer(adapter, "chat_123", config)
+
+ prefix = "abc"
+ tail = "x" * 620
+ consumer.on_delta(prefix)
+ task = asyncio.create_task(consumer.run())
+ await asyncio.sleep(0.08)
+ consumer.on_delta(tail)
+ await asyncio.sleep(0.08)
+ consumer.finish()
+ await task
+
+ sent_texts = [call[1]["content"] for call in adapter.send.call_args_list]
+ assert len(sent_texts) == 3
+ assert sent_texts[0].startswith(prefix)
+ assert sum(len(t) for t in sent_texts[1:]) == len(tail)
diff --git a/tests/hermes_cli/test_doctor.py b/tests/hermes_cli/test_doctor.py
index d91cf3f647..f30fb48396 100644
--- a/tests/hermes_cli/test_doctor.py
+++ b/tests/hermes_cli/test_doctor.py
@@ -136,3 +136,73 @@ def test_check_gateway_service_linger_skips_when_service_not_installed(monkeypat
out = capsys.readouterr().out
assert out == ""
assert issues == []
+
+
+# ── Memory provider section (doctor should only check the *active* provider) ──
+
+
+class TestDoctorMemoryProviderSection:
+ """The ◆ Memory Provider section should respect memory.provider config."""
+
+ def _make_hermes_home(self, tmp_path, provider=""):
+ """Create a minimal HERMES_HOME with config.yaml."""
+ home = tmp_path / ".hermes"
+ home.mkdir(parents=True, exist_ok=True)
+ import yaml
+ config = {"memory": {"provider": provider}} if provider else {"memory": {}}
+ (home / "config.yaml").write_text(yaml.dump(config))
+ return home
+
+ def _run_doctor_and_capture(self, monkeypatch, tmp_path, provider=""):
+ """Run doctor and capture stdout."""
+ home = self._make_hermes_home(tmp_path, provider)
+ monkeypatch.setattr(doctor_mod, "HERMES_HOME", home)
+ monkeypatch.setattr(doctor_mod, "PROJECT_ROOT", tmp_path / "project")
+ monkeypatch.setattr(doctor_mod, "_DHH", str(home))
+ (tmp_path / "project").mkdir(exist_ok=True)
+
+ # Stub tool availability (returns empty) so doctor runs past it
+ fake_model_tools = types.SimpleNamespace(
+ check_tool_availability=lambda *a, **kw: ([], []),
+ TOOLSET_REQUIREMENTS={},
+ )
+ monkeypatch.setitem(sys.modules, "model_tools", fake_model_tools)
+
+ # Stub auth checks to avoid real API calls
+ try:
+ from hermes_cli import auth as _auth_mod
+ monkeypatch.setattr(_auth_mod, "get_nous_auth_status", lambda: {})
+ monkeypatch.setattr(_auth_mod, "get_codex_auth_status", lambda: {})
+ except Exception:
+ pass
+
+ import io, contextlib
+ buf = io.StringIO()
+ with contextlib.redirect_stdout(buf):
+ doctor_mod.run_doctor(Namespace(fix=False))
+ return buf.getvalue()
+
+ def test_no_provider_shows_builtin_ok(self, monkeypatch, tmp_path):
+ out = self._run_doctor_and_capture(monkeypatch, tmp_path, provider="")
+ assert "Memory Provider" in out
+ assert "Built-in memory active" in out
+ # Should NOT mention Honcho or Mem0 errors
+ assert "Honcho API key" not in out
+ assert "Mem0" not in out
+
+ def test_honcho_provider_not_installed_shows_fail(self, monkeypatch, tmp_path):
+ # Make honcho import fail
+ monkeypatch.setitem(
+ sys.modules, "plugins.memory.honcho.client", None
+ )
+ out = self._run_doctor_and_capture(monkeypatch, tmp_path, provider="honcho")
+ assert "Memory Provider" in out
+ # Should show failure since honcho is set but not importable
+ assert "Built-in memory active" not in out
+
+ def test_mem0_provider_not_installed_shows_fail(self, monkeypatch, tmp_path):
+ # Make mem0 import fail
+ monkeypatch.setitem(sys.modules, "plugins.memory.mem0", None)
+ out = self._run_doctor_and_capture(monkeypatch, tmp_path, provider="mem0")
+ assert "Memory Provider" in out
+ assert "Built-in memory active" not in out
diff --git a/tests/hermes_cli/test_runtime_provider_resolution.py b/tests/hermes_cli/test_runtime_provider_resolution.py
index ded0c9202f..0abc8196f7 100644
--- a/tests/hermes_cli/test_runtime_provider_resolution.py
+++ b/tests/hermes_cli/test_runtime_provider_resolution.py
@@ -808,6 +808,55 @@ def test_minimax_explicit_api_mode_respected(monkeypatch):
assert resolved["api_mode"] == "chat_completions"
+def test_minimax_config_base_url_overrides_hardcoded_default(monkeypatch):
+ """model.base_url in config.yaml should override the hardcoded default (#6039)."""
+ monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax")
+ monkeypatch.setattr(rp, "_get_model_config", lambda: {
+ "provider": "minimax",
+ "base_url": "https://api.minimaxi.com/anthropic",
+ })
+ monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key")
+ monkeypatch.delenv("MINIMAX_BASE_URL", raising=False)
+
+ resolved = rp.resolve_runtime_provider(requested="minimax")
+
+ assert resolved["provider"] == "minimax"
+ assert resolved["base_url"] == "https://api.minimaxi.com/anthropic"
+ assert resolved["api_mode"] == "anthropic_messages"
+
+
+def test_minimax_env_base_url_still_wins_over_config(monkeypatch):
+ """MINIMAX_BASE_URL env var should take priority over config.yaml model.base_url."""
+ monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax")
+ monkeypatch.setattr(rp, "_get_model_config", lambda: {
+ "provider": "minimax",
+ "base_url": "https://api.minimaxi.com/anthropic",
+ })
+ monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key")
+ monkeypatch.setenv("MINIMAX_BASE_URL", "https://custom.example.com/v1")
+
+ resolved = rp.resolve_runtime_provider(requested="minimax")
+
+ # Env var wins because resolve_api_key_provider_credentials prefers it
+ assert resolved["base_url"] == "https://custom.example.com/v1"
+
+
+def test_minimax_config_base_url_ignored_for_different_provider(monkeypatch):
+ """model.base_url should NOT be used when model.provider doesn't match."""
+ monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax")
+ monkeypatch.setattr(rp, "_get_model_config", lambda: {
+ "provider": "openrouter",
+ "base_url": "https://some-other-endpoint.com/v1",
+ })
+ monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key")
+ monkeypatch.delenv("MINIMAX_BASE_URL", raising=False)
+
+ resolved = rp.resolve_runtime_provider(requested="minimax")
+
+ # Should use the default, NOT the config base_url from a different provider
+ assert resolved["base_url"] == "https://api.minimax.io/anthropic"
+
+
def test_alibaba_default_coding_intl_endpoint_uses_chat_completions(monkeypatch):
"""Alibaba default coding-intl /v1 URL should use chat_completions mode."""
monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "alibaba")
diff --git a/tests/hermes_cli/test_setup_model_selection.py b/tests/hermes_cli/test_setup_model_selection.py
index 3cb7056cf2..b42365da9d 100644
--- a/tests/hermes_cli/test_setup_model_selection.py
+++ b/tests/hermes_cli/test_setup_model_selection.py
@@ -34,8 +34,8 @@ class TestSetupProviderModelSelection:
@pytest.mark.parametrize("provider_id,expected_defaults", [
("zai", ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"]),
("kimi-coding", ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"]),
- ("minimax", ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"]),
- ("minimax-cn", ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"]),
+ ("minimax", ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"]),
+ ("minimax-cn", ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"]),
("opencode-zen", ["gpt-5.4", "gpt-5.3-codex", "claude-sonnet-4-6", "gemini-3-flash"]),
("opencode-go", ["glm-5", "kimi-k2.5", "minimax-m2.5", "minimax-m2.7"]),
])
diff --git a/tests/test_ollama_num_ctx.py b/tests/test_ollama_num_ctx.py
new file mode 100644
index 0000000000..fff0144d33
--- /dev/null
+++ b/tests/test_ollama_num_ctx.py
@@ -0,0 +1,135 @@
+"""Tests for Ollama num_ctx context length detection and injection.
+
+Covers:
+ agent/model_metadata.py — query_ollama_num_ctx()
+ run_agent.py — _ollama_num_ctx detection + extra_body injection
+"""
+
+from unittest.mock import patch, MagicMock
+
+import pytest
+
+from agent.model_metadata import query_ollama_num_ctx
+
+
+# ═══════════════════════════════════════════════════════════════════════
+# Level 1: query_ollama_num_ctx — Ollama API interaction
+# ═══════════════════════════════════════════════════════════════════════
+
+
+def _mock_httpx_client(show_response_data, status_code=200):
+ """Create a mock httpx.Client context manager that returns given /api/show data."""
+ mock_resp = MagicMock(status_code=status_code)
+ mock_resp.json.return_value = show_response_data
+ mock_client = MagicMock()
+ mock_client.post.return_value = mock_resp
+ mock_ctx = MagicMock()
+ mock_ctx.__enter__ = MagicMock(return_value=mock_client)
+ mock_ctx.__exit__ = MagicMock(return_value=False)
+ return mock_ctx, mock_client
+
+
+class TestQueryOllamaNumCtx:
+ """Test the Ollama /api/show context length query."""
+
+ def test_returns_context_from_model_info(self):
+ """Should extract context_length from GGUF model_info metadata."""
+ show_data = {
+ "model_info": {"llama.context_length": 131072},
+ "parameters": "",
+ }
+ mock_ctx, _ = _mock_httpx_client(show_data)
+
+ with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"):
+ # httpx is imported inside the function — patch the module import
+ import httpx
+ with patch.object(httpx, "Client", return_value=mock_ctx):
+ result = query_ollama_num_ctx("llama3.1:8b", "http://localhost:11434/v1")
+
+ assert result == 131072
+
+ def test_prefers_explicit_num_ctx_from_modelfile(self):
+ """If the Modelfile sets num_ctx explicitly, that should take priority."""
+ show_data = {
+ "model_info": {"llama.context_length": 131072},
+ "parameters": "num_ctx 32768\ntemperature 0.7",
+ }
+ mock_ctx, _ = _mock_httpx_client(show_data)
+
+ with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"):
+ import httpx
+ with patch.object(httpx, "Client", return_value=mock_ctx):
+ result = query_ollama_num_ctx("custom-model", "http://localhost:11434")
+
+ assert result == 32768
+
+ def test_returns_none_for_non_ollama_server(self):
+ """Should return None if the server is not Ollama."""
+ with patch("agent.model_metadata.detect_local_server_type", return_value="lm-studio"):
+ result = query_ollama_num_ctx("model", "http://localhost:1234")
+ assert result is None
+
+ def test_returns_none_on_connection_error(self):
+ """Should return None if the server is unreachable."""
+ with patch("agent.model_metadata.detect_local_server_type", side_effect=Exception("timeout")):
+ result = query_ollama_num_ctx("model", "http://localhost:11434")
+ assert result is None
+
+ def test_returns_none_on_404(self):
+ """Should return None if the model is not found."""
+ mock_ctx, _ = _mock_httpx_client({}, status_code=404)
+
+ with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"):
+ import httpx
+ with patch.object(httpx, "Client", return_value=mock_ctx):
+ result = query_ollama_num_ctx("nonexistent", "http://localhost:11434")
+
+ assert result is None
+
+ def test_strips_provider_prefix(self):
+ """Should strip 'local:' prefix from model name before querying."""
+ show_data = {
+ "model_info": {"qwen2.context_length": 32768},
+ "parameters": "",
+ }
+ mock_ctx, mock_client = _mock_httpx_client(show_data)
+
+ with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"):
+ import httpx
+ with patch.object(httpx, "Client", return_value=mock_ctx):
+ result = query_ollama_num_ctx("local:qwen2.5:7b", "http://localhost:11434/v1")
+
+        # Verify the request payload used the stripped name (no "local:" prefix)
+        post_call = mock_client.post.call_args
+        assert (post_call.kwargs.get("json") or post_call.args[-1])["name"] == "qwen2.5:7b"
+ assert result == 32768
+
+ def test_handles_qwen2_architecture_key(self):
+ """Different model architectures use different key prefixes in model_info."""
+ show_data = {
+ "model_info": {"qwen2.context_length": 65536},
+ "parameters": "",
+ }
+ mock_ctx, _ = _mock_httpx_client(show_data)
+
+ with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"):
+ import httpx
+ with patch.object(httpx, "Client", return_value=mock_ctx):
+ result = query_ollama_num_ctx("qwen2.5:32b", "http://localhost:11434")
+
+ assert result == 65536
+
+ def test_returns_none_when_model_info_empty(self):
+ """Should return None if model_info has no context_length key."""
+ show_data = {
+ "model_info": {"llama.embedding_length": 4096},
+ "parameters": "",
+ }
+ mock_ctx, _ = _mock_httpx_client(show_data)
+
+ with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"):
+ import httpx
+ with patch.object(httpx, "Client", return_value=mock_ctx):
+ result = query_ollama_num_ctx("model", "http://localhost:11434")
+
+ assert result is None
diff --git a/tests/test_retry_utils.py b/tests/test_retry_utils.py
new file mode 100644
index 0000000000..f39c3142d9
--- /dev/null
+++ b/tests/test_retry_utils.py
@@ -0,0 +1,117 @@
+"""Tests for agent.retry_utils jittered backoff."""
+
+import threading
+
+import agent.retry_utils as retry_utils
+from agent.retry_utils import jittered_backoff
+
+
+def test_backoff_is_exponential():
+ """Base delay should double each attempt (before jitter)."""
+ for attempt in (1, 2, 3, 4):
+ delays = [jittered_backoff(attempt, base_delay=5.0, max_delay=120.0, jitter_ratio=0.0) for _ in range(100)]
+ expected = min(5.0 * (2 ** (attempt - 1)), 120.0)
+ mean = sum(delays) / len(delays)
+ assert abs(mean - expected) < 0.01, f"attempt {attempt}: expected {expected}, got {mean}"
+
+
+def test_backoff_respects_max_delay():
+ """Even with high attempt numbers, delay should not exceed max_delay."""
+ for attempt in (10, 20, 100):
+ delay = jittered_backoff(attempt, base_delay=5.0, max_delay=60.0, jitter_ratio=0.0)
+ assert delay <= 60.0, f"attempt {attempt}: delay {delay} exceeds max 60s"
+
+
+def test_backoff_adds_jitter():
+ """With jitter enabled, delays should vary across calls."""
+ delays = [jittered_backoff(1, base_delay=10.0, max_delay=120.0, jitter_ratio=0.5) for _ in range(50)]
+ assert min(delays) != max(delays), "jitter should produce varying delays"
+ assert all(d >= 10.0 for d in delays), "jittered delay should be >= base delay"
+ assert all(d <= 15.0 for d in delays), "jittered delay should be bounded"
+
+
+def test_backoff_attempt_1_is_base():
+ """First attempt delay should equal base_delay (with no jitter)."""
+ delay = jittered_backoff(1, base_delay=3.0, max_delay=120.0, jitter_ratio=0.0)
+ assert delay == 3.0
+
+
+def test_backoff_with_zero_base_delay_returns_max():
+ """base_delay=0 should return max_delay (guard against busy-wait)."""
+ delay = jittered_backoff(1, base_delay=0.0, max_delay=60.0, jitter_ratio=0.0)
+ assert delay == 60.0
+
+
+def test_backoff_with_extreme_attempt_returns_max():
+ """Very large attempt numbers should not overflow and should return max_delay."""
+ delay = jittered_backoff(999, base_delay=5.0, max_delay=120.0, jitter_ratio=0.0)
+ assert delay == 120.0
+
+
+def test_backoff_negative_attempt_treated_as_one():
+    """Negative attempt values should not crash and should behave like attempt=1."""
+ delay = jittered_backoff(-5, base_delay=10.0, max_delay=120.0, jitter_ratio=0.0)
+ assert delay == 10.0
+
+
+def test_backoff_thread_safety():
+ """Concurrent calls should generally produce different delays."""
+ results = []
+ barrier = threading.Barrier(8)
+
+ def _call_backoff():
+ barrier.wait()
+ results.append(jittered_backoff(1, base_delay=10.0, max_delay=120.0, jitter_ratio=0.5))
+
+ threads = [threading.Thread(target=_call_backoff) for _ in range(8)]
+ for t in threads:
+ t.start()
+ for t in threads:
+ t.join(timeout=5)
+
+ assert len(results) == 8
+ unique = len(set(results))
+ assert unique >= 6, f"Expected mostly unique delays, got {unique}/8 unique"
+
+
+def test_backoff_uses_locked_tick_for_seed(monkeypatch):
+    """Seed derivation should use a per-call tick captured under the lock."""
+ import time
+
+ monkeypatch.setattr(retry_utils, "_jitter_counter", 0)
+
+ recorded_seeds = []
+
+ class _RecordingRandom:
+ def __init__(self, seed):
+ recorded_seeds.append(seed)
+
+ def uniform(self, a, b):
+ return 0.0
+
+ monkeypatch.setattr(retry_utils.random, "Random", _RecordingRandom)
+
+ fixed_time_ns = 123456789
+
+ def _time_ns_wait_for_two_ticks():
+ deadline = time.time() + 2.0
+ while retry_utils._jitter_counter < 2 and time.time() < deadline:
+ time.sleep(0.001)
+ return fixed_time_ns
+
+ monkeypatch.setattr(retry_utils.time, "time_ns", _time_ns_wait_for_two_ticks)
+
+ barrier = threading.Barrier(2)
+
+ def _call():
+ barrier.wait()
+ jittered_backoff(1, base_delay=10.0, max_delay=120.0, jitter_ratio=0.5)
+
+ threads = [threading.Thread(target=_call) for _ in range(2)]
+ for t in threads:
+ t.start()
+ for t in threads:
+ t.join(timeout=5)
+
+ assert len(recorded_seeds) == 2
+ assert len(set(recorded_seeds)) == 2, f"Expected unique seeds, got {recorded_seeds}"
diff --git a/tests/tools/test_browser_camofox_persistence.py b/tests/tools/test_browser_camofox_persistence.py
index 0fa5723c67..0e9c863727 100644
--- a/tests/tools/test_browser_camofox_persistence.py
+++ b/tests/tools/test_browser_camofox_persistence.py
@@ -16,6 +16,7 @@ from tools.browser_camofox import (
_managed_persistence_enabled,
camofox_close,
camofox_navigate,
+ camofox_soft_cleanup,
check_camofox_available,
cleanup_all_camofox_sessions,
get_vnc_url,
@@ -240,3 +241,50 @@ class TestVncUrlDiscovery:
assert result["vnc_url"] == "http://localhost:6080"
assert "vnc_hint" in result
+
+
+class TestCamofoxSoftCleanup:
+ """camofox_soft_cleanup drops local state only when managed persistence is on."""
+
+ def test_returns_true_and_drops_session_when_enabled(self, tmp_path, monkeypatch):
+ monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+ monkeypatch.setenv("CAMOFOX_URL", "http://localhost:9377")
+
+ with _enable_persistence():
+ _get_session("task-1")
+ result = camofox_soft_cleanup("task-1")
+
+ assert result is True
+ # Session should have been dropped from in-memory store
+ import tools.browser_camofox as mod
+ with mod._sessions_lock:
+ assert "task-1" not in mod._sessions
+
+ def test_returns_false_when_disabled(self, tmp_path, monkeypatch):
+ monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+ monkeypatch.setenv("CAMOFOX_URL", "http://localhost:9377")
+
+ _get_session("task-1")
+ config = {"browser": {"camofox": {"managed_persistence": False}}}
+ with patch("tools.browser_camofox.load_config", return_value=config):
+ result = camofox_soft_cleanup("task-1")
+
+ assert result is False
+ # Session should still be present — not dropped
+ import tools.browser_camofox as mod
+ with mod._sessions_lock:
+ assert "task-1" in mod._sessions
+
+ def test_does_not_call_server_delete(self, tmp_path, monkeypatch):
+ """Soft cleanup must never hit the Camofox /sessions DELETE endpoint."""
+ monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+ monkeypatch.setenv("CAMOFOX_URL", "http://localhost:9377")
+
+ with (
+ _enable_persistence(),
+ patch("tools.browser_camofox.requests.delete") as mock_delete,
+ ):
+ _get_session("task-1")
+ camofox_soft_cleanup("task-1")
+
+ mock_delete.assert_not_called()
diff --git a/tests/tools/test_browser_cleanup.py b/tests/tools/test_browser_cleanup.py
index df21f3a0ea..817927903e 100644
--- a/tests/tools/test_browser_cleanup.py
+++ b/tests/tools/test_browser_cleanup.py
@@ -65,6 +65,62 @@ class TestBrowserCleanup:
mock_stop.assert_called_once_with("task-1")
mock_run.assert_called_once_with("task-1", "close", [], timeout=10)
+ def test_cleanup_camofox_managed_persistence_skips_close(self):
+ """When camofox mode + managed persistence, soft_cleanup fires instead of close."""
+ browser_tool = self.browser_tool
+ browser_tool._active_sessions["task-1"] = {
+ "session_name": "sess-1",
+ "bb_session_id": None,
+ }
+ browser_tool._session_last_activity["task-1"] = 123.0
+
+ with (
+ patch("tools.browser_tool._is_camofox_mode", return_value=True),
+ patch("tools.browser_tool._maybe_stop_recording") as mock_stop,
+ patch(
+ "tools.browser_tool._run_browser_command",
+ return_value={"success": True},
+ ),
+ patch("tools.browser_tool.os.path.exists", return_value=False),
+ patch(
+ "tools.browser_camofox.camofox_soft_cleanup",
+ return_value=True,
+ ) as mock_soft,
+ patch("tools.browser_camofox.camofox_close") as mock_close,
+ ):
+ browser_tool.cleanup_browser("task-1")
+
+ mock_soft.assert_called_once_with("task-1")
+ mock_close.assert_not_called()
+
+ def test_cleanup_camofox_no_persistence_calls_close(self):
+ """When camofox mode but managed persistence is off, camofox_close fires."""
+ browser_tool = self.browser_tool
+ browser_tool._active_sessions["task-1"] = {
+ "session_name": "sess-1",
+ "bb_session_id": None,
+ }
+ browser_tool._session_last_activity["task-1"] = 123.0
+
+ with (
+ patch("tools.browser_tool._is_camofox_mode", return_value=True),
+ patch("tools.browser_tool._maybe_stop_recording") as mock_stop,
+ patch(
+ "tools.browser_tool._run_browser_command",
+ return_value={"success": True},
+ ),
+ patch("tools.browser_tool.os.path.exists", return_value=False),
+ patch(
+ "tools.browser_camofox.camofox_soft_cleanup",
+ return_value=False,
+ ) as mock_soft,
+ patch("tools.browser_camofox.camofox_close") as mock_close,
+ ):
+ browser_tool.cleanup_browser("task-1")
+
+ mock_soft.assert_called_once_with("task-1")
+ mock_close.assert_called_once_with("task-1")
+
def test_emergency_cleanup_clears_all_tracking_state(self):
browser_tool = self.browser_tool
browser_tool._cleanup_done = False
diff --git a/tests/tools/test_browser_homebrew_paths.py b/tests/tools/test_browser_homebrew_paths.py
index 3e2e766694..33b725604c 100644
--- a/tests/tools/test_browser_homebrew_paths.py
+++ b/tests/tools/test_browser_homebrew_paths.py
@@ -152,6 +152,109 @@ class TestFindAgentBrowser:
class TestRunBrowserCommandPathConstruction:
"""Verify _run_browser_command() includes Homebrew node dirs in subprocess PATH."""
+ def test_subprocess_preserves_executable_path_with_spaces(self, tmp_path):
+ """A local agent-browser path containing spaces must stay one argv entry."""
+ captured_cmd = None
+
+ mock_proc = MagicMock()
+ mock_proc.returncode = 0
+ mock_proc.wait.return_value = 0
+
+ def capture_popen(cmd, **kwargs):
+ nonlocal captured_cmd
+ captured_cmd = cmd
+ return mock_proc
+
+ fake_session = {
+ "session_name": "test-session",
+ "session_id": "test-id",
+ "cdp_url": None,
+ }
+ fake_json = json.dumps({"success": True})
+ browser_path = "/Users/test/Library/Application Support/hermes/node_modules/.bin/agent-browser"
+ hermes_home = str(tmp_path / "hermes-home")
+
+ with patch("tools.browser_tool._find_agent_browser", return_value=browser_path), \
+ patch("tools.browser_tool._get_session_info", return_value=fake_session), \
+ patch("tools.browser_tool._socket_safe_tmpdir", return_value=str(tmp_path)), \
+ patch("tools.browser_tool._discover_homebrew_node_dirs", return_value=[]), \
+ patch("hermes_constants.Path.home", return_value=tmp_path), \
+ patch("subprocess.Popen", side_effect=capture_popen), \
+ patch("os.open", return_value=99), \
+ patch("os.close"), \
+ patch("tools.interrupt.is_interrupted", return_value=False), \
+ patch.dict(
+ os.environ,
+ {
+ "PATH": "/usr/bin:/bin",
+ "HOME": "/home/test",
+ "HERMES_HOME": hermes_home,
+ },
+ clear=True,
+ ):
+ with patch("builtins.open", mock_open(read_data=fake_json)):
+ _run_browser_command("test-task", "navigate", ["https://example.com"])
+
+ assert captured_cmd is not None
+ assert captured_cmd[0] == browser_path
+ assert captured_cmd[1:5] == [
+ "--session",
+ "test-session",
+ "--json",
+ "navigate",
+ ]
+
+ def test_subprocess_splits_npx_fallback_into_command_and_package(self, tmp_path):
+ """The synthetic npx fallback should still expand into separate argv items."""
+ captured_cmd = None
+
+ mock_proc = MagicMock()
+ mock_proc.returncode = 0
+ mock_proc.wait.return_value = 0
+
+ def capture_popen(cmd, **kwargs):
+ nonlocal captured_cmd
+ captured_cmd = cmd
+ return mock_proc
+
+ fake_session = {
+ "session_name": "test-session",
+ "session_id": "test-id",
+ "cdp_url": None,
+ }
+ fake_json = json.dumps({"success": True})
+ hermes_home = str(tmp_path / "hermes-home")
+
+ with patch("tools.browser_tool._find_agent_browser", return_value="npx agent-browser"), \
+ patch("tools.browser_tool._get_session_info", return_value=fake_session), \
+ patch("tools.browser_tool._socket_safe_tmpdir", return_value=str(tmp_path)), \
+ patch("tools.browser_tool._discover_homebrew_node_dirs", return_value=[]), \
+ patch("hermes_constants.Path.home", return_value=tmp_path), \
+ patch("subprocess.Popen", side_effect=capture_popen), \
+ patch("os.open", return_value=99), \
+ patch("os.close"), \
+ patch("tools.interrupt.is_interrupted", return_value=False), \
+ patch.dict(
+ os.environ,
+ {
+ "PATH": "/usr/bin:/bin",
+ "HOME": "/home/test",
+ "HERMES_HOME": hermes_home,
+ },
+ clear=True,
+ ):
+ with patch("builtins.open", mock_open(read_data=fake_json)):
+ _run_browser_command("test-task", "navigate", ["https://example.com"])
+
+ assert captured_cmd is not None
+ assert captured_cmd[:2] == ["npx", "agent-browser"]
+ assert captured_cmd[2:6] == [
+ "--session",
+ "test-session",
+ "--json",
+ "navigate",
+ ]
+
def test_subprocess_path_includes_homebrew_node_dirs(self, tmp_path):
"""When _discover_homebrew_node_dirs returns dirs, they should appear
in the subprocess env PATH passed to Popen."""
diff --git a/tests/tools/test_notify_on_complete.py b/tests/tools/test_notify_on_complete.py
index 888721906d..8cf17bfbf6 100644
--- a/tests/tools/test_notify_on_complete.py
+++ b/tests/tools/test_notify_on_complete.py
@@ -197,6 +197,26 @@ class TestCheckpointNotify:
s = registry.get("proc_live")
assert s.notify_on_complete is True
+ def test_recover_requeues_notify_watchers(self, registry, tmp_path):
+ checkpoint = tmp_path / "procs.json"
+ checkpoint.write_text(json.dumps([{
+ "session_id": "proc_live",
+ "command": "sleep 999",
+ "pid": os.getpid(),
+ "task_id": "t1",
+ "session_key": "sk1",
+ "watcher_platform": "telegram",
+ "watcher_chat_id": "123",
+ "watcher_thread_id": "42",
+ "watcher_interval": 5,
+ "notify_on_complete": True,
+ }]))
+ with patch("tools.process_registry.CHECKPOINT_PATH", checkpoint):
+ recovered = registry.recover_from_checkpoint()
+ assert recovered == 1
+ assert len(registry.pending_watchers) == 1
+ assert registry.pending_watchers[0]["notify_on_complete"] is True
+
def test_recover_defaults_false(self, registry, tmp_path):
"""Old checkpoint entries without the field default to False."""
checkpoint = tmp_path / "procs.json"
diff --git a/tests/tools/test_process_registry.py b/tests/tools/test_process_registry.py
index e6cfa40e77..44e3a1bd32 100644
--- a/tests/tools/test_process_registry.py
+++ b/tests/tools/test_process_registry.py
@@ -2,6 +2,9 @@
import json
import os
+import signal
+import subprocess
+import sys
import time
import pytest
from pathlib import Path
@@ -45,6 +48,23 @@ def _make_session(
return s
+def _spawn_python_sleep(seconds: float) -> subprocess.Popen:
+ """Spawn a portable short-lived Python sleep process."""
+ return subprocess.Popen(
+ [sys.executable, "-c", f"import time; time.sleep({seconds})"],
+ )
+
+
+def _wait_until(predicate, timeout: float = 5.0, interval: float = 0.05) -> bool:
+ """Poll a predicate until it returns truthy or the timeout elapses."""
+ deadline = time.monotonic() + timeout
+ while time.monotonic() < deadline:
+ if predicate():
+ return True
+ time.sleep(interval)
+ return False
+
+
# =========================================================================
# Get / Poll
# =========================================================================
@@ -349,6 +369,88 @@ class TestCheckpoint:
assert recovered == 1
assert len(registry.pending_watchers) == 0
+ def test_recovery_keeps_live_checkpoint_entries(self, registry, tmp_path):
+ checkpoint = tmp_path / "procs.json"
+ checkpoint.write_text(json.dumps([{
+ "session_id": "proc_live",
+ "command": "sleep 999",
+ "pid": os.getpid(),
+ "task_id": "t1",
+ "session_key": "sk1",
+ }]))
+
+ with patch("tools.process_registry.CHECKPOINT_PATH", checkpoint):
+ recovered = registry.recover_from_checkpoint()
+ assert recovered == 1
+ assert registry.get("proc_live") is not None
+
+ data = json.loads(checkpoint.read_text())
+ assert len(data) == 1
+ assert data[0]["session_id"] == "proc_live"
+ assert data[0]["pid"] == os.getpid()
+ assert data != []
+
+ def test_recovery_skips_explicit_sandbox_backed_entries(self, registry, tmp_path):
+ checkpoint = tmp_path / "procs.json"
+ original = [{
+ "session_id": "proc_remote",
+ "command": "sleep 999",
+ "pid": os.getpid(),
+ "task_id": "t1",
+ "pid_scope": "sandbox",
+ }]
+ checkpoint.write_text(json.dumps(original))
+
+ with patch("tools.process_registry.CHECKPOINT_PATH", checkpoint):
+ recovered = registry.recover_from_checkpoint()
+ assert recovered == 0
+ assert registry.get("proc_remote") is None
+
+ data = json.loads(checkpoint.read_text())
+ assert data == []
+
+ def test_detached_recovered_process_eventually_exits(self, registry, tmp_path):
+ proc = _spawn_python_sleep(0.4)
+ checkpoint = tmp_path / "procs.json"
+ checkpoint.write_text(json.dumps([{
+ "session_id": "proc_live",
+ "command": "python -c 'import time; time.sleep(0.4)'",
+ "pid": proc.pid,
+ "task_id": "t1",
+ "session_key": "sk1",
+ }]))
+
+ try:
+ with patch("tools.process_registry.CHECKPOINT_PATH", checkpoint):
+ recovered = registry.recover_from_checkpoint()
+ assert recovered == 1
+
+ session = registry.get("proc_live")
+ assert session is not None
+ assert session.detached is True
+
+ proc.wait(timeout=5)
+
+ assert _wait_until(
+ lambda: registry.get("proc_live") is not None
+ and registry.get("proc_live").exited,
+ timeout=5,
+ )
+
+ poll_result = registry.poll("proc_live")
+ assert poll_result["status"] == "exited"
+
+ wait_result = registry.wait("proc_live", timeout=1)
+ assert wait_result["status"] == "exited"
+ finally:
+ if proc.poll() is None:
+ proc.terminate()
+ try:
+ proc.wait(timeout=5)
+ except Exception:
+ proc.kill()
+ proc.wait(timeout=5)
+
# =========================================================================
# Kill process
@@ -365,6 +467,27 @@ class TestKillProcess:
result = registry.kill_process(s.id)
assert result["status"] == "already_exited"
+ def test_kill_detached_session_uses_host_pid(self, registry):
+ s = _make_session(sid="proc_detached", command="sleep 999")
+ s.pid = 424242
+ s.detached = True
+ registry._running[s.id] = s
+
+ calls = []
+
+ def fake_kill(pid, sig):
+ calls.append((pid, sig))
+
+ try:
+ with patch("tools.process_registry.os.kill", side_effect=fake_kill):
+ result = registry.kill_process(s.id)
+
+ assert result["status"] == "killed"
+ assert (424242, 0) in calls
+ assert (424242, signal.SIGTERM) in calls
+ finally:
+ registry._running.pop(s.id, None)
+
# =========================================================================
# Tool handler
diff --git a/tests/tools/test_tool_result_storage.py b/tests/tools/test_tool_result_storage.py
index 96b904a576..4e51fe7bb7 100644
--- a/tests/tools/test_tool_result_storage.py
+++ b/tests/tools/test_tool_result_storage.py
@@ -395,7 +395,7 @@ class TestEnforceTurnBudget:
assert PERSISTED_OUTPUT_TAG in msgs[1]["content"]
def test_medium_result_regression(self):
- """6 results of 42K chars each (252K total) — each under 50K default
+ """6 results of 42K chars each (252K total) — each under 100K default
threshold but aggregate exceeds 200K budget. L3 should persist."""
env = MagicMock()
env.execute.return_value = {"output": "", "returncode": 0}
@@ -449,7 +449,7 @@ class TestPerToolThresholds:
try:
import tools.terminal_tool # noqa: F401
val = registry.get_max_result_size("terminal")
- assert val == 30_000
+ assert val == 100_000
except ImportError:
pytest.skip("terminal_tool not importable in test env")
@@ -467,6 +467,6 @@ class TestPerToolThresholds:
try:
import tools.file_tools # noqa: F401
val = registry.get_max_result_size("search_files")
- assert val == 20_000
+ assert val == 100_000
except ImportError:
pytest.skip("file_tools not importable in test env")
diff --git a/tools/browser_camofox.py b/tools/browser_camofox.py
index 226e99b56b..3a305bbcb1 100644
--- a/tools/browser_camofox.py
+++ b/tools/browser_camofox.py
@@ -101,7 +101,8 @@ def _managed_persistence_enabled() -> bool:
"""
try:
camofox_cfg = load_config().get("browser", {}).get("camofox", {})
- except Exception:
+ except Exception as exc:
+ logger.warning("managed_persistence check failed, defaulting to disabled: %s", exc)
return False
return bool(camofox_cfg.get("managed_persistence"))
@@ -172,6 +173,22 @@ def _drop_session(task_id: Optional[str]) -> Optional[Dict[str, Any]]:
return _sessions.pop(task_id, None)
+def camofox_soft_cleanup(task_id: Optional[str] = None) -> bool:
+ """Release the in-memory session without destroying the server-side context.
+
+ When managed persistence is enabled the browser profile (and its cookies)
+ must survive across agent tasks. This helper drops only the local tracking
+ entry and returns ``True``. When managed persistence is *not* enabled it
+ does nothing and returns ``False`` so the caller can fall back to
+ :func:`camofox_close`.
+ """
+ if _managed_persistence_enabled():
+ _drop_session(task_id)
+ logger.debug("Camofox soft cleanup for task %s (managed persistence)", task_id)
+ return True
+ return False
+
+
# ---------------------------------------------------------------------------
# HTTP helpers
# ---------------------------------------------------------------------------
diff --git a/tools/browser_tool.py b/tools/browser_tool.py
index 7e52ed78d9..e62a586c11 100644
--- a/tools/browser_tool.py
+++ b/tools/browser_tool.py
@@ -877,7 +877,11 @@ def _run_browser_command(
# Local mode — launch a headless Chromium instance
backend_args = ["--session", session_info["session_name"]]
- cmd_parts = browser_cmd.split() + backend_args + [
+ # Keep concrete executable paths intact, even when they contain spaces.
+ # Only the synthetic npx fallback needs to expand into multiple argv items.
+ cmd_prefix = ["npx", "agent-browser"] if browser_cmd == "npx agent-browser" else [browser_cmd]
+
+ cmd_parts = cmd_prefix + backend_args + [
"--json",
command
] + args
@@ -1931,11 +1935,15 @@ def cleanup_browser(task_id: Optional[str] = None) -> None:
if task_id is None:
task_id = "default"
- # Also clean up Camofox session if running in Camofox mode
+ # Also clean up Camofox session if running in Camofox mode.
+ # Skip full close when managed persistence is enabled — the browser
+ # profile (and its session cookies) must survive across agent tasks.
+ # The inactivity reaper still frees idle resources.
if _is_camofox_mode():
try:
- from tools.browser_camofox import camofox_close
- camofox_close(task_id)
+ from tools.browser_camofox import camofox_close, camofox_soft_cleanup
+ if not camofox_soft_cleanup(task_id):
+ camofox_close(task_id)
except Exception as e:
logger.debug("Camofox cleanup for task %s: %s", task_id, e)
diff --git a/tools/budget_config.py b/tools/budget_config.py
index 52204cdf8e..577e59442e 100644
--- a/tools/budget_config.py
+++ b/tools/budget_config.py
@@ -15,9 +15,9 @@ PINNED_THRESHOLDS: Dict[str, float] = {
# Defaults matching the current hardcoded values in tool_result_storage.py.
# Kept here as the single source of truth; tool_result_storage.py imports these.
-DEFAULT_RESULT_SIZE_CHARS: int = 50_000
+DEFAULT_RESULT_SIZE_CHARS: int = 100_000
DEFAULT_TURN_BUDGET_CHARS: int = 200_000
-DEFAULT_PREVIEW_SIZE_CHARS: int = 2_000
+DEFAULT_PREVIEW_SIZE_CHARS: int = 1_500
@dataclass(frozen=True)
diff --git a/tools/code_execution_tool.py b/tools/code_execution_tool.py
index 08cbf15b1f..f0d61210ff 100644
--- a/tools/code_execution_tool.py
+++ b/tools/code_execution_tool.py
@@ -1343,5 +1343,5 @@ registry.register(
enabled_tools=kw.get("enabled_tools")),
check_fn=check_sandbox_requirements,
emoji="🐍",
- max_result_size_chars=30_000,
+ max_result_size_chars=100_000,
)
diff --git a/tools/cronjob_tools.py b/tools/cronjob_tools.py
index 82d43c588b..595ad8bc71 100644
--- a/tools/cronjob_tools.py
+++ b/tools/cronjob_tools.py
@@ -195,6 +195,7 @@ def _format_job(job: Dict[str, Any]) -> Dict[str, Any]:
"next_run_at": job.get("next_run_at"),
"last_run_at": job.get("last_run_at"),
"last_status": job.get("last_status"),
+ "last_delivery_error": job.get("last_delivery_error"),
"enabled": job.get("enabled", True),
"state": job.get("state", "scheduled" if job.get("enabled", True) else "paused"),
"paused_at": job.get("paused_at"),
diff --git a/tools/file_tools.py b/tools/file_tools.py
index 4ca10b2dcf..05376dfc67 100644
--- a/tools/file_tools.py
+++ b/tools/file_tools.py
@@ -856,4 +856,4 @@ def _handle_search_files(args, **kw):
registry.register(name="read_file", toolset="file", schema=READ_FILE_SCHEMA, handler=_handle_read_file, check_fn=_check_file_reqs, emoji="📖", max_result_size_chars=float('inf'))
registry.register(name="write_file", toolset="file", schema=WRITE_FILE_SCHEMA, handler=_handle_write_file, check_fn=_check_file_reqs, emoji="✍️", max_result_size_chars=100_000)
registry.register(name="patch", toolset="file", schema=PATCH_SCHEMA, handler=_handle_patch, check_fn=_check_file_reqs, emoji="🔧", max_result_size_chars=100_000)
-registry.register(name="search_files", toolset="file", schema=SEARCH_FILES_SCHEMA, handler=_handle_search_files, check_fn=_check_file_reqs, emoji="🔎", max_result_size_chars=20_000)
+registry.register(name="search_files", toolset="file", schema=SEARCH_FILES_SCHEMA, handler=_handle_search_files, check_fn=_check_file_reqs, emoji="🔎", max_result_size_chars=100_000)
diff --git a/tools/process_registry.py b/tools/process_registry.py
index 948f073abb..b935f49c33 100644
--- a/tools/process_registry.py
+++ b/tools/process_registry.py
@@ -76,6 +76,7 @@ class ProcessSession:
output_buffer: str = "" # Rolling output (last MAX_OUTPUT_CHARS)
max_output_chars: int = MAX_OUTPUT_CHARS
detached: bool = False # True if recovered from crash (no pipe)
+ pid_scope: str = "host" # "host" for local/PTY PIDs, "sandbox" for env-local PIDs
# Watcher/notification metadata (persisted for crash recovery)
watcher_platform: str = ""
watcher_chat_id: str = ""
@@ -127,6 +128,48 @@ class ProcessRegistry:
lines.pop(0)
return "\n".join(lines)
+ @staticmethod
+ def _is_host_pid_alive(pid: Optional[int]) -> bool:
+ """Best-effort liveness check for host-visible PIDs."""
+ if not pid:
+ return False
+ try:
+ os.kill(pid, 0)
+ return True
+ except (ProcessLookupError, PermissionError):
+ return False
+
+ def _refresh_detached_session(self, session: Optional[ProcessSession]) -> Optional[ProcessSession]:
+ """Update recovered host-PID sessions when the underlying process has exited."""
+ if session is None or session.exited or not session.detached or session.pid_scope != "host":
+ return session
+
+ if self._is_host_pid_alive(session.pid):
+ return session
+
+ with session._lock:
+ if session.exited:
+ return session
+ session.exited = True
+ # Recovered sessions no longer have a waitable handle, so the real
+ # exit code is unavailable once the original process object is gone.
+ session.exit_code = None
+
+ self._move_to_finished(session)
+ return session
+
+ @staticmethod
+ def _terminate_host_pid(pid: int) -> None:
+ """Terminate a host-visible PID without requiring the original process handle."""
+ if _IS_WINDOWS:
+ os.kill(pid, signal.SIGTERM)
+ return
+
+ try:
+ os.killpg(os.getpgid(pid), signal.SIGTERM)
+ except (OSError, ProcessLookupError, PermissionError):
+ os.kill(pid, signal.SIGTERM)
+
# ----- Spawn -----
def spawn_local(
@@ -269,6 +312,7 @@ class ProcessRegistry:
cwd=cwd,
started_at=time.time(),
env_ref=env,
+ pid_scope="sandbox",
)
# Run the command in the sandbox with output capture
@@ -439,7 +483,8 @@ class ProcessRegistry:
def get(self, session_id: str) -> Optional[ProcessSession]:
"""Get a session by ID (running or finished)."""
with self._lock:
- return self._running.get(session_id) or self._finished.get(session_id)
+ session = self._running.get(session_id) or self._finished.get(session_id)
+ return self._refresh_detached_session(session)
def poll(self, session_id: str) -> dict:
"""Check status and get new output for a background process."""
@@ -531,6 +576,7 @@ class ProcessRegistry:
deadline = time.monotonic() + effective_timeout
while time.monotonic() < deadline:
+ session = self._refresh_detached_session(session)
if session.exited:
result = {
"status": "exited",
@@ -596,6 +642,25 @@ class ProcessRegistry:
elif session.env_ref and session.pid:
# Non-local -- kill inside sandbox
session.env_ref.execute(f"kill {session.pid} 2>/dev/null", timeout=5)
+ elif session.detached and session.pid_scope == "host" and session.pid:
+ if not self._is_host_pid_alive(session.pid):
+ with session._lock:
+ session.exited = True
+ session.exit_code = None
+ self._move_to_finished(session)
+ return {
+ "status": "already_exited",
+ "exit_code": session.exit_code,
+ }
+ self._terminate_host_pid(session.pid)
+ else:
+ return {
+ "status": "error",
+ "error": (
+ "Recovered process cannot be killed after restart because "
+ "its original runtime handle is no longer available"
+ ),
+ }
session.exited = True
session.exit_code = -15 # SIGTERM
self._move_to_finished(session)
@@ -640,6 +705,8 @@ class ProcessRegistry:
with self._lock:
all_sessions = list(self._running.values()) + list(self._finished.values())
+ all_sessions = [self._refresh_detached_session(s) for s in all_sessions]
+
if task_id:
all_sessions = [s for s in all_sessions if s.task_id == task_id]
@@ -666,6 +733,12 @@ class ProcessRegistry:
def has_active_processes(self, task_id: str) -> bool:
"""Check if there are active (running) processes for a task_id."""
+ with self._lock:
+ sessions = list(self._running.values())
+
+ for session in sessions:
+ self._refresh_detached_session(session)
+
with self._lock:
return any(
s.task_id == task_id and not s.exited
@@ -674,6 +747,12 @@ class ProcessRegistry:
def has_active_for_session(self, session_key: str) -> bool:
"""Check if there are active processes for a gateway session key."""
+ with self._lock:
+ sessions = list(self._running.values())
+
+ for session in sessions:
+ self._refresh_detached_session(session)
+
with self._lock:
return any(
s.session_key == session_key and not s.exited
@@ -727,6 +806,7 @@ class ProcessRegistry:
"session_id": s.id,
"command": s.command,
"pid": s.pid,
+ "pid_scope": s.pid_scope,
"cwd": s.cwd,
"started_at": s.started_at,
"task_id": s.task_id,
@@ -764,13 +844,21 @@ class ProcessRegistry:
if not pid:
continue
+ pid_scope = entry.get("pid_scope", "host")
+ if pid_scope != "host":
+ # Sandbox-backed processes keep only in-sandbox PIDs in the
+ # checkpoint, which are not meaningful to the restarted host
+ # process once the original environment handle is gone.
+ logger.info(
+ "Skipping recovery for non-host process: %s (pid=%s, scope=%s)",
+ entry.get("command", "unknown")[:60],
+ pid,
+ pid_scope,
+ )
+ continue
+
# Check if PID is still alive
- alive = False
- try:
- os.kill(pid, 0)
- alive = True
- except (ProcessLookupError, PermissionError):
- pass
+ alive = self._is_host_pid_alive(pid)
if alive:
session = ProcessSession(
@@ -779,6 +867,7 @@ class ProcessRegistry:
task_id=entry.get("task_id", ""),
session_key=entry.get("session_key", ""),
pid=pid,
+ pid_scope=pid_scope,
cwd=entry.get("cwd"),
started_at=entry.get("started_at", time.time()),
detached=True, # Can't read output, but can report status + kill
@@ -802,14 +891,10 @@ class ProcessRegistry:
"platform": session.watcher_platform,
"chat_id": session.watcher_chat_id,
"thread_id": session.watcher_thread_id,
+ "notify_on_complete": session.notify_on_complete,
})
- # Clear the checkpoint (will be rewritten as processes finish)
- try:
- from utils import atomic_json_write
- atomic_json_write(CHECKPOINT_PATH, [])
- except Exception as e:
- logger.debug("Could not clear checkpoint file: %s", e, exc_info=True)
+ self._write_checkpoint()
return recovered
diff --git a/tools/terminal_tool.py b/tools/terminal_tool.py
index 520de31998..243127a295 100644
--- a/tools/terminal_tool.py
+++ b/tools/terminal_tool.py
@@ -1620,5 +1620,5 @@ registry.register(
handler=_handle_terminal,
check_fn=check_terminal_requirements,
emoji="💻",
- max_result_size_chars=30_000,
+ max_result_size_chars=100_000,
)
diff --git a/trajectory_compressor.py b/trajectory_compressor.py
index e4faf97a3d..24c1f722af 100644
--- a/trajectory_compressor.py
+++ b/trajectory_compressor.py
@@ -44,6 +44,7 @@ import fire
from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn, TimeElapsedColumn, TimeRemainingColumn
from rich.console import Console
from hermes_constants import OPENROUTER_BASE_URL
+from agent.retry_utils import jittered_backoff
# Load environment variables
from dotenv import load_dotenv
@@ -585,7 +586,7 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
self.logger.warning(f"Summarization attempt {attempt + 1} failed: {e}")
if attempt < self.config.max_retries - 1:
- time.sleep(self.config.retry_delay * (attempt + 1))
+ time.sleep(jittered_backoff(attempt + 1, base_delay=self.config.retry_delay, max_delay=30.0))
else:
# Fallback: create a basic summary
return "[CONTEXT SUMMARY]: [Summary generation failed - previous turns contained tool calls and responses that have been compressed to save context space.]"
@@ -647,7 +648,7 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
self.logger.warning(f"Summarization attempt {attempt + 1} failed: {e}")
if attempt < self.config.max_retries - 1:
- await asyncio.sleep(self.config.retry_delay * (attempt + 1))
+ await asyncio.sleep(jittered_backoff(attempt + 1, base_delay=self.config.retry_delay, max_delay=30.0))
else:
# Fallback: create a basic summary
return "[CONTEXT SUMMARY]: [Summary generation failed - previous turns contained tool calls and responses that have been compressed to save context space.]"
diff --git a/website/docs/user-guide/features/skins.md b/website/docs/user-guide/features/skins.md
index 5aec20cdf1..e093a763b5 100644
--- a/website/docs/user-guide/features/skins.md
+++ b/website/docs/user-guide/features/skins.md
@@ -196,6 +196,55 @@ branding:
tool_prefix: "▏"
```
+## Hermes Mod — Visual Skin Editor
+
+[Hermes Mod](https://github.com/cocktailpeanut/hermes-mod) is a community-built web UI for creating and managing skins visually. Instead of writing YAML by hand, you get a point-and-click editor with live preview.
+
+
+**What it does:**
+
+- Lists all built-in and custom skins
+- Opens any skin into a visual editor with all Hermes skin fields (colors, spinner, branding, tool prefix, tool emojis)
+- Generates `banner_logo` text art from a text prompt
+- Converts uploaded images (PNG, JPG, GIF, WEBP) into `banner_hero` ASCII art with multiple render styles (braille, ASCII ramp, blocks, dots)
+- Saves directly to `~/.hermes/skins/`
+- Activates a skin by updating `~/.hermes/config.yaml`
+- Shows the generated YAML and a live preview
+
+### Install
+
+**Option 1 — Pinokio (1-click):**
+
+Find it on [pinokio.computer](https://pinokio.computer) and install with one click.
+
+**Option 2 — npx (quickest from terminal):**
+
+```bash
+npx -y hermes-mod
+```
+
+**Option 3 — Manual:**
+
+```bash
+git clone https://github.com/cocktailpeanut/hermes-mod.git
+cd hermes-mod/app
+npm install
+npm start
+```
+
+### Usage
+
+1. Start the app (via Pinokio or terminal).
+2. Open **Skin Studio**.
+3. Choose a built-in or custom skin to edit.
+4. Generate a logo from text and/or upload an image for hero art. Pick a render style and width.
+5. Edit colors, spinner, branding, and other fields.
+6. Click **Save** to write the skin YAML to `~/.hermes/skins/`.
+7. Click **Activate** to set it as the current skin (updates `display.skin` in `config.yaml`).
+
+Hermes Mod respects the `HERMES_HOME` environment variable, so it works with [profiles](/docs/user-guide/profiles) too.
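+
+For example, to edit skins under an alternate profile home (the path here is illustrative, not a required location):
+
+```bash
+HERMES_HOME=~/.hermes-work npx -y hermes-mod
+```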
+
## Operational notes
- Built-in skins load from `hermes_cli/skin_engine.py`.
diff --git a/website/docs/user-guide/messaging/telegram.md b/website/docs/user-guide/messaging/telegram.md
index a59b73ca5a..4e4495ad28 100644
--- a/website/docs/user-guide/messaging/telegram.md
+++ b/website/docs/user-guide/messaging/telegram.md
@@ -463,6 +463,40 @@ platforms:
You usually don't need to configure this manually. The auto-discovery via DoH handles most restricted-network scenarios. The `TELEGRAM_FALLBACK_IPS` env var is only needed if DoH is also blocked on your network.
:::
+## Proxy Support
+
+If your network requires an HTTP proxy to reach the internet (common in corporate environments), the Telegram adapter automatically reads standard proxy environment variables and routes all connections through the proxy.
+
+### Supported variables
+
+The adapter checks these environment variables in order, using the first one that is set:
+
+1. `HTTPS_PROXY`
+2. `HTTP_PROXY`
+3. `ALL_PROXY`
+4. `https_proxy` / `http_proxy` / `all_proxy` (lowercase variants)
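+
+A rough sketch of that precedence ("first set variable wins"); the `resolve_proxy_url` helper below is illustrative, not the adapter's actual code:
+
+```python
+import os
+
+def resolve_proxy_url():
+    # Uppercase variants take priority; the first non-empty value wins.
+    for var in ("HTTPS_PROXY", "HTTP_PROXY", "ALL_PROXY",
+                "https_proxy", "http_proxy", "all_proxy"):
+        value = os.environ.get(var)
+        if value:
+            return value
+    return None
+```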
+
+### Configuration
+
+Set the proxy in your environment before starting the gateway:
+
+```bash
+export HTTPS_PROXY=http://proxy.example.com:8080
+hermes gateway
+```
+
+Or add it to `~/.hermes/.env`:
+
+```bash
+HTTPS_PROXY=http://proxy.example.com:8080
+```
+
+The proxy applies to both the primary transport and all fallback IP transports. No additional Hermes configuration is needed — if the environment variable is set, it's used automatically.
+
+:::note
+This setting covers the custom fallback transport layer that Hermes uses for Telegram connections. The standard `httpx` client used elsewhere already respects proxy environment variables natively.
+:::
+
## Message Reactions
The bot can add emoji reactions to messages as visual processing feedback: