# Hermes Agent v0.8.0 (v2026.4.8)

**Release Date:** April 8, 2026

> The intelligence release — background task auto-notifications, free MiMo v2 Pro on Nous Portal, live model switching across all platforms, self-optimized GPT/Codex guidance, native Google AI Studio, smart inactivity timeouts, approval buttons, MCP OAuth 2.1, and 209 merged PRs with 82 resolved issues.

---

## ✨ Highlights

- **Background Process Auto-Notifications (`notify_on_complete`)** — Background tasks can now automatically notify the agent when they finish. Start a long-running process (AI model training, test suites, deployments, builds) and the agent gets notified on completion — no polling needed. The agent can keep working on other things and pick up results when they land. ([#5779](https://github.com/NousResearch/hermes-agent/pull/5779))

- **Free Xiaomi MiMo v2 Pro on Nous Portal** — Nous Portal now supports the free-tier Xiaomi MiMo v2 Pro model for auxiliary tasks (compression, vision, summarization), with free-tier model gating and pricing display in model selection. ([#6018](https://github.com/NousResearch/hermes-agent/pull/6018), [#5880](https://github.com/NousResearch/hermes-agent/pull/5880))

- **Live Model Switching (`/model` Command)** — Switch models and providers mid-session from the CLI, Telegram, Discord, Slack, or any gateway platform. Aggregator-aware resolution keeps you on OpenRouter/Nous when possible, with automatic cross-provider fallback when needed. Interactive model pickers on Telegram and Discord with inline buttons.
([#5181](https://github.com/NousResearch/hermes-agent/pull/5181), [#5742](https://github.com/NousResearch/hermes-agent/pull/5742))

- **Self-Optimized GPT/Codex Tool-Use Guidance** — The agent diagnosed and patched 5 failure modes in GPT and Codex tool calling through automated behavioral benchmarking, dramatically improving reliability on OpenAI models. Includes execution discipline guidance and thinking-only prefill continuation for structured reasoning. ([#6120](https://github.com/NousResearch/hermes-agent/pull/6120), [#5414](https://github.com/NousResearch/hermes-agent/pull/5414), [#5931](https://github.com/NousResearch/hermes-agent/pull/5931))

- **Google AI Studio (Gemini) Native Provider** — Direct access to Gemini models through Google's AI Studio API. Includes automatic models.dev registry integration for real-time context length detection across any provider. ([#5577](https://github.com/NousResearch/hermes-agent/pull/5577))

- **Inactivity-Based Agent Timeouts** — Gateway and cron timeouts now track actual tool activity instead of wall-clock time. Long-running tasks that are actively working are never killed — only truly idle agents time out. ([#5389](https://github.com/NousResearch/hermes-agent/pull/5389), [#5440](https://github.com/NousResearch/hermes-agent/pull/5440))

- **Approval Buttons on Slack & Telegram** — Approve dangerous commands via native platform buttons instead of typing `/approve`. Slack gets thread context preservation; Telegram gets emoji reactions for approval status. ([#5890](https://github.com/NousResearch/hermes-agent/pull/5890), [#5975](https://github.com/NousResearch/hermes-agent/pull/5975))

- **MCP OAuth 2.1 PKCE + OSV Malware Scanning** — Full standards-compliant OAuth for MCP server authentication, plus automatic malware scanning of MCP extension packages via the OSV vulnerability database.
([#5420](https://github.com/NousResearch/hermes-agent/pull/5420), [#5305](https://github.com/NousResearch/hermes-agent/pull/5305))

- **Centralized Logging & Config Validation** — Structured logging to `~/.hermes/logs/` (agent.log + errors.log) with the `hermes logs` command for tailing and filtering. Config structure validation catches malformed YAML at startup before it causes cryptic failures. ([#5430](https://github.com/NousResearch/hermes-agent/pull/5430), [#5426](https://github.com/NousResearch/hermes-agent/pull/5426))

- **Plugin System Expansion** — Plugins can now register CLI subcommands, receive request-scoped API hooks with correlation IDs, prompt for required env vars during install, and hook into session lifecycle events (finalize/reset). ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295), [#5427](https://github.com/NousResearch/hermes-agent/pull/5427), [#5470](https://github.com/NousResearch/hermes-agent/pull/5470), [#6129](https://github.com/NousResearch/hermes-agent/pull/6129))

- **Matrix Tier 1 & Platform Hardening** — Matrix gets reactions, read receipts, rich formatting, and room management. Discord adds channel controls and ignored channels. Signal gets full MEDIA: tag delivery. Mattermost gets file attachments. Comprehensive reliability fixes across all platforms. ([#5275](https://github.com/NousResearch/hermes-agent/pull/5275), [#5975](https://github.com/NousResearch/hermes-agent/pull/5975), [#5602](https://github.com/NousResearch/hermes-agent/pull/5602))

- **Security Hardening Pass** — Consolidated SSRF protections, timing attack mitigations, tar traversal prevention, credential leakage guards, cron path traversal hardening, and cross-session isolation. Terminal workdir sanitization across all backends.
([#5944](https://github.com/NousResearch/hermes-agent/pull/5944), [#5613](https://github.com/NousResearch/hermes-agent/pull/5613), [#5629](https://github.com/NousResearch/hermes-agent/pull/5629))

---

## 🏗️ Core Agent & Architecture

### Provider & Model Support
- **Native Google AI Studio (Gemini) provider** with models.dev integration for automatic context length detection ([#5577](https://github.com/NousResearch/hermes-agent/pull/5577))
- **`/model` command — full provider+model system overhaul** — live switching across the CLI and all gateway platforms with aggregator-aware resolution ([#5181](https://github.com/NousResearch/hermes-agent/pull/5181))
- **Interactive model picker for Telegram and Discord** — inline button-based model selection ([#5742](https://github.com/NousResearch/hermes-agent/pull/5742))
- **Nous Portal free-tier model gating** with pricing display in model selection ([#5880](https://github.com/NousResearch/hermes-agent/pull/5880))
- **Model pricing display** for OpenRouter and Nous Portal providers ([#5416](https://github.com/NousResearch/hermes-agent/pull/5416))
- **xAI (Grok) prompt caching** via `x-grok-conv-id` header ([#5604](https://github.com/NousResearch/hermes-agent/pull/5604))
- **Grok added to tool-use enforcement models** for direct xAI usage ([#5595](https://github.com/NousResearch/hermes-agent/pull/5595))
- **MiniMax TTS provider** (speech-2.8) ([#4963](https://github.com/NousResearch/hermes-agent/pull/4963))
- **Non-agentic model warning** — warns users when loading Hermes LLM models not designed for tool use ([#5378](https://github.com/NousResearch/hermes-agent/pull/5378))
- **Ollama Cloud auth, /model switch persistence**, and alias tab completion ([#5269](https://github.com/NousResearch/hermes-agent/pull/5269))
- **Preserve dots in OpenCode Go model names** (minimax-m2.7, glm-4.5, kimi-k2.5) ([#5597](https://github.com/NousResearch/hermes-agent/pull/5597))
- **MiniMax models 404 fix** — strip /v1 from
Anthropic base URL for OpenCode Go ([#4918](https://github.com/NousResearch/hermes-agent/pull/4918))
- **Provider credential reset windows** honored in pooled failover ([#5188](https://github.com/NousResearch/hermes-agent/pull/5188))
- **OAuth token sync** between credential pool and credentials file ([#4981](https://github.com/NousResearch/hermes-agent/pull/4981))
- **Stale OAuth credentials** no longer block OpenRouter users on auto-detect ([#5746](https://github.com/NousResearch/hermes-agent/pull/5746))
- **Codex OAuth credential pool disconnect** + expired token import fix ([#5681](https://github.com/NousResearch/hermes-agent/pull/5681))
- **Codex pool entry sync** from `~/.codex/auth.json` on exhaustion — @GratefulDave ([#5610](https://github.com/NousResearch/hermes-agent/pull/5610))
- **Auxiliary client payment fallback** — retry with next provider on 402 ([#5599](https://github.com/NousResearch/hermes-agent/pull/5599))
- **Auxiliary client resolves named custom providers** and 'main' alias ([#5978](https://github.com/NousResearch/hermes-agent/pull/5978))
- **Use mimo-v2-pro** for non-vision auxiliary tasks on Nous free tier ([#6018](https://github.com/NousResearch/hermes-agent/pull/6018))
- **Vision auto-detection** tries main provider first ([#6041](https://github.com/NousResearch/hermes-agent/pull/6041))
- **Provider re-ordering and Quick Install** — @austinpickett ([#4664](https://github.com/NousResearch/hermes-agent/pull/4664))
- **Nous OAuth access_token** no longer used as inference API key — @SHL0MS ([#5564](https://github.com/NousResearch/hermes-agent/pull/5564))
- **HERMES_PORTAL_BASE_URL env var** respected during Nous login — @benbarclay ([#5745](https://github.com/NousResearch/hermes-agent/pull/5745))
- **Env var overrides** for Nous portal/inference URLs ([#5419](https://github.com/NousResearch/hermes-agent/pull/5419))
- **Z.AI endpoint auto-detect** via probe and cache
([#5763](https://github.com/NousResearch/hermes-agent/pull/5763))
- **MiniMax context lengths, model catalog, thinking guard, aux model, and config base_url** corrections ([#6082](https://github.com/NousResearch/hermes-agent/pull/6082))
- **Community provider/model resolution fixes** — salvaged 4 community PRs + MiniMax aux URL ([#5983](https://github.com/NousResearch/hermes-agent/pull/5983))

### Agent Loop & Conversation
- **Self-optimized GPT/Codex tool-use guidance** via automated behavioral benchmarking — the agent self-diagnosed and patched 5 failure modes ([#6120](https://github.com/NousResearch/hermes-agent/pull/6120))
- **GPT/Codex execution discipline guidance** in system prompts ([#5414](https://github.com/NousResearch/hermes-agent/pull/5414))
- **Thinking-only prefill continuation** for structured reasoning responses ([#5931](https://github.com/NousResearch/hermes-agent/pull/5931))
- **Accept reasoning-only responses** without retries — set content to "(empty)" instead of retrying infinitely ([#5278](https://github.com/NousResearch/hermes-agent/pull/5278))
- **Jittered retry backoff** — exponential backoff with jitter for API retries ([#6048](https://github.com/NousResearch/hermes-agent/pull/6048))
- **Smart thinking block signature management** — preserve and manage Anthropic thinking signatures across turns ([#6112](https://github.com/NousResearch/hermes-agent/pull/6112))
- **Coerce tool call arguments** to match JSON Schema types — fixes models that send strings instead of numbers/booleans ([#5265](https://github.com/NousResearch/hermes-agent/pull/5265))
- **Save oversized tool results to file** instead of destructive truncation ([#5210](https://github.com/NousResearch/hermes-agent/pull/5210))
- **Sandbox-aware tool result persistence** ([#6085](https://github.com/NousResearch/hermes-agent/pull/6085))
- **Streaming fallback** improved after edit failures ([#6110](https://github.com/NousResearch/hermes-agent/pull/6110))
- **Codex empty-output
gaps** covered in fallback + normalizer + auxiliary client ([#5724](https://github.com/NousResearch/hermes-agent/pull/5724), [#5730](https://github.com/NousResearch/hermes-agent/pull/5730), [#5734](https://github.com/NousResearch/hermes-agent/pull/5734))
- **Codex stream output backfill** from output_item.done events ([#5689](https://github.com/NousResearch/hermes-agent/pull/5689))
- **Stream consumer creates new message** after tool boundaries ([#5739](https://github.com/NousResearch/hermes-agent/pull/5739))
- **Codex validation aligned** with normalization for empty stream output ([#5940](https://github.com/NousResearch/hermes-agent/pull/5940))
- **Bridge tool-calls** in copilot-acp adapter ([#5460](https://github.com/NousResearch/hermes-agent/pull/5460))
- **Filter transcript-only roles** from chat-completions payload ([#4880](https://github.com/NousResearch/hermes-agent/pull/4880))
- **Context compaction failures fixed** on temperature-restricted models — @MadKangYu ([#5608](https://github.com/NousResearch/hermes-agent/pull/5608))
- **Sanitize tool_calls for all strict APIs** (Fireworks, Mistral, etc.)
— @lumethegreat ([#5183](https://github.com/NousResearch/hermes-agent/pull/5183))

### Memory & Sessions
- **Supermemory memory provider** — new memory plugin with multi-container, search_mode, identity template, and env var override ([#5737](https://github.com/NousResearch/hermes-agent/pull/5737), [#5933](https://github.com/NousResearch/hermes-agent/pull/5933))
- **Shared thread sessions** by default — multi-user thread support across gateway platforms ([#5391](https://github.com/NousResearch/hermes-agent/pull/5391))
- **Subagent sessions linked to parent** and hidden from session list ([#5309](https://github.com/NousResearch/hermes-agent/pull/5309))
- **Profile-scoped memory isolation** and clone support ([#4845](https://github.com/NousResearch/hermes-agent/pull/4845))
- **Thread gateway user_id to memory plugins** for per-user scoping ([#5895](https://github.com/NousResearch/hermes-agent/pull/5895))
- **Honcho plugin drift overhaul** + plugin CLI registration system ([#5295](https://github.com/NousResearch/hermes-agent/pull/5295))
- **Honcho holographic prompt and trust score** rendering preserved ([#4872](https://github.com/NousResearch/hermes-agent/pull/4872))
- **Honcho doctor fix** — use recall_mode instead of memory_mode — @techguysimon ([#5645](https://github.com/NousResearch/hermes-agent/pull/5645))
- **RetainDB** — API routes, write queue, dialectic, agent model, file tools fixes ([#5461](https://github.com/NousResearch/hermes-agent/pull/5461))
- **Hindsight memory plugin overhaul** + memory setup wizard fixes ([#5094](https://github.com/NousResearch/hermes-agent/pull/5094))
- **mem0 API v2 compat**, prefetch context fencing, secret redaction ([#5423](https://github.com/NousResearch/hermes-agent/pull/5423))
- **mem0 env vars merged** with mem0.json instead of either/or ([#4939](https://github.com/NousResearch/hermes-agent/pull/4939))
- **Clean user message** used for all memory provider operations
([#4940](https://github.com/NousResearch/hermes-agent/pull/4940))
- **Silent memory flush failure** on /new and /resume fixed — @ryanautomated ([#5640](https://github.com/NousResearch/hermes-agent/pull/5640))
- **OpenViking atexit safety net** for session commit ([#5664](https://github.com/NousResearch/hermes-agent/pull/5664))
- **OpenViking tenant-scoping headers** for multi-tenant servers ([#4936](https://github.com/NousResearch/hermes-agent/pull/4936))
- **ByteRover brv query** runs synchronously before LLM call ([#4831](https://github.com/NousResearch/hermes-agent/pull/4831))

---

## 📱 Messaging Platforms (Gateway)

### Gateway Core
- **Inactivity-based agent timeout** — replaces wall-clock timeout with smart activity tracking; long-running active tasks are never killed ([#5389](https://github.com/NousResearch/hermes-agent/pull/5389))
- **Approval buttons for Slack & Telegram** + Slack thread context preservation ([#5890](https://github.com/NousResearch/hermes-agent/pull/5890))
- **Live-stream /update output** + forward interactive prompts to user ([#5180](https://github.com/NousResearch/hermes-agent/pull/5180))
- **Infinite timeout support** + periodic notifications + actionable error messages ([#4959](https://github.com/NousResearch/hermes-agent/pull/4959))
- **Duplicate message prevention** — gateway dedup + partial stream guard ([#4878](https://github.com/NousResearch/hermes-agent/pull/4878))
- **Webhook delivery_info persistence** + full session id in /status ([#5942](https://github.com/NousResearch/hermes-agent/pull/5942))
- **Tool preview truncation** respects tool_preview_length in all/new progress modes ([#5937](https://github.com/NousResearch/hermes-agent/pull/5937))
- **Short preview truncation** restored for all/new tool progress modes ([#4935](https://github.com/NousResearch/hermes-agent/pull/4935))
- **Update-pending state** written atomically to prevent corruption ([#4923](https://github.com/NousResearch/hermes-agent/pull/4923))
- **Approval session key isolated** per turn ([#4884](https://github.com/NousResearch/hermes-agent/pull/4884))
- **Active-session guard bypass** for /approve, /deny, /stop, /new ([#4926](https://github.com/NousResearch/hermes-agent/pull/4926), [#5765](https://github.com/NousResearch/hermes-agent/pull/5765))
- **Typing indicator paused** during approval waits ([#5893](https://github.com/NousResearch/hermes-agent/pull/5893))
- **Caption check** uses exact line-by-line match instead of substring (all platforms) ([#5939](https://github.com/NousResearch/hermes-agent/pull/5939))
- **MEDIA: tags stripped** from streamed gateway messages ([#5152](https://github.com/NousResearch/hermes-agent/pull/5152))
- **MEDIA: tags extracted** from cron delivery before sending ([#5598](https://github.com/NousResearch/hermes-agent/pull/5598))
- **Profile-aware service units** + voice transcription cleanup ([#5972](https://github.com/NousResearch/hermes-agent/pull/5972))
- **Thread-safe PairingStore** with atomic writes — @CharlieKerfoot ([#5656](https://github.com/NousResearch/hermes-agent/pull/5656))
- **Sanitize media URLs** in base platform logs — @WAXLYY ([#5631](https://github.com/NousResearch/hermes-agent/pull/5631))
- **Reduce Telegram fallback IP activation log noise** — @MadKangYu ([#5615](https://github.com/NousResearch/hermes-agent/pull/5615))
- **Cron static method wrappers** to prevent self-binding ([#5299](https://github.com/NousResearch/hermes-agent/pull/5299))
- **Stale `hermes login` replaced** with `hermes auth` + credential removal re-seeding fix ([#5670](https://github.com/NousResearch/hermes-agent/pull/5670))

### Telegram
- **Group topics skill binding** for supergroup forum topics ([#4886](https://github.com/NousResearch/hermes-agent/pull/4886))
- **Emoji reactions** for approval status and notifications ([#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
- **Duplicate message delivery prevented** on send timeout
([#5153](https://github.com/NousResearch/hermes-agent/pull/5153))
- **Command names sanitized** to strip invalid characters ([#5596](https://github.com/NousResearch/hermes-agent/pull/5596))
- **Per-platform disabled skills** respected in Telegram menu and gateway dispatch ([#4799](https://github.com/NousResearch/hermes-agent/pull/4799))
- **/approve and /deny** routed through running-agent guard ([#4798](https://github.com/NousResearch/hermes-agent/pull/4798))

### Discord
- **Channel controls** — ignored_channels and no_thread_channels config options ([#5975](https://github.com/NousResearch/hermes-agent/pull/5975))
- **Skills registered as native slash commands** via shared gateway logic ([#5603](https://github.com/NousResearch/hermes-agent/pull/5603))
- **/approve, /deny, /queue, /background, /btw** registered as native slash commands ([#4800](https://github.com/NousResearch/hermes-agent/pull/4800), [#5477](https://github.com/NousResearch/hermes-agent/pull/5477))
- **Unnecessary members intent** removed on startup + token lock leak fix ([#5302](https://github.com/NousResearch/hermes-agent/pull/5302))

### Slack
- **Thread engagement** — auto-respond in bot-started and mentioned threads ([#5897](https://github.com/NousResearch/hermes-agent/pull/5897))
- **mrkdwn in edit_message** + thread replies without @mentions ([#5733](https://github.com/NousResearch/hermes-agent/pull/5733))

### Matrix
- **Tier 1 feature parity** — reactions, read receipts, rich formatting, room management ([#5275](https://github.com/NousResearch/hermes-agent/pull/5275))
- **MATRIX_REQUIRE_MENTION and MATRIX_AUTO_THREAD** support ([#5106](https://github.com/NousResearch/hermes-agent/pull/5106))
- **Comprehensive reliability** — encrypted media, auth recovery, cron E2EE, Synapse compat ([#5271](https://github.com/NousResearch/hermes-agent/pull/5271))
- **CJK input, E2EE, and reconnect** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))

### Signal
- **Full MEDIA: tag delivery** — send_image_file, send_voice, and send_video implemented ([#5602](https://github.com/NousResearch/hermes-agent/pull/5602))

### Mattermost
- **File attachments** — set message type to DOCUMENT when a post has file attachments — @nericervin ([#5609](https://github.com/NousResearch/hermes-agent/pull/5609))

### Feishu
- **Interactive card approval buttons** ([#6043](https://github.com/NousResearch/hermes-agent/pull/6043))
- **Reconnect and ACL** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))

### Webhooks
- **`{__raw__}` template token** and thread_id passthrough for forum topics ([#5662](https://github.com/NousResearch/hermes-agent/pull/5662))

---

## 🖥️ CLI & User Experience

### Interactive CLI
- **Defer response content** until the reasoning block completes ([#5773](https://github.com/NousResearch/hermes-agent/pull/5773))
- **Ghost status-bar lines cleared** on terminal resize ([#4960](https://github.com/NousResearch/hermes-agent/pull/4960))
- **Normalise `\r\n` and `\r` line endings** in pasted text ([#4849](https://github.com/NousResearch/hermes-agent/pull/4849))
- **ChatConsole errors, curses scroll, skin-aware banner, git state** banner fixes ([#5974](https://github.com/NousResearch/hermes-agent/pull/5974))
- **Native Windows image paste** support ([#5917](https://github.com/NousResearch/hermes-agent/pull/5917))
- **--yolo and other flags** no longer silently dropped when placed before the `chat` subcommand ([#5145](https://github.com/NousResearch/hermes-agent/pull/5145))

### Setup & Configuration
- **Config structure validation** — detect malformed YAML at startup with actionable error messages ([#5426](https://github.com/NousResearch/hermes-agent/pull/5426))
- **Centralized logging** to `~/.hermes/logs/` — agent.log (INFO+), errors.log (WARNING+) with the `hermes logs` command ([#5430](https://github.com/NousResearch/hermes-agent/pull/5430))
- **Docs links added** to setup wizard sections
([#5283](https://github.com/NousResearch/hermes-agent/pull/5283))
- **Doctor diagnostics** — sync provider checks, config migration, WAL and mem0 diagnostics ([#5077](https://github.com/NousResearch/hermes-agent/pull/5077))
- **Timeout debug logging** and user-facing diagnostics improved ([#5370](https://github.com/NousResearch/hermes-agent/pull/5370))
- **Reasoning effort unified** to config.yaml only ([#6118](https://github.com/NousResearch/hermes-agent/pull/6118))
- **Permanent command allowlist** loaded on startup ([#5076](https://github.com/NousResearch/hermes-agent/pull/5076))
- **`hermes auth remove`** now clears env-seeded credentials permanently ([#5285](https://github.com/NousResearch/hermes-agent/pull/5285))
- **Bundled skills synced to all profiles** during update ([#5795](https://github.com/NousResearch/hermes-agent/pull/5795))
- **`hermes update` no longer kills** a freshly-restarted gateway service ([#5448](https://github.com/NousResearch/hermes-agent/pull/5448))
- **`subprocess.run()` timeouts** added to all gateway CLI commands ([#5424](https://github.com/NousResearch/hermes-agent/pull/5424))
- **Actionable error message** when a Codex refresh token is reused — @tymrtn ([#5612](https://github.com/NousResearch/hermes-agent/pull/5612))
- **Google-workspace skill scripts** can now run directly — @xinbenlv ([#5624](https://github.com/NousResearch/hermes-agent/pull/5624))

### Cron System
- **Inactivity-based cron timeout** — replaces wall-clock; active tasks run indefinitely ([#5440](https://github.com/NousResearch/hermes-agent/pull/5440))
- **Pre-run script injection** for data collection and change detection ([#5082](https://github.com/NousResearch/hermes-agent/pull/5082))
- **Delivery failure tracking** in job status ([#6042](https://github.com/NousResearch/hermes-agent/pull/6042))
- **Delivery guidance** in cron prompts — stops send_message thrashing ([#5444](https://github.com/NousResearch/hermes-agent/pull/5444))
- **MEDIA files
delivered** as native platform attachments ([#5921](https://github.com/NousResearch/hermes-agent/pull/5921))
- **[SILENT] suppression** works anywhere in the response — @auspic7 ([#5654](https://github.com/NousResearch/hermes-agent/pull/5654))
- **Cron path traversal** hardening ([#5147](https://github.com/NousResearch/hermes-agent/pull/5147))

---

## 🔧 Tool System

### Terminal & Execution
- **`execute_code` on remote backends** — code execution now works on Docker, SSH, Modal, and other remote terminal backends ([#5088](https://github.com/NousResearch/hermes-agent/pull/5088))
- **Exit code context** for common CLI tools in terminal results — helps the agent understand what went wrong ([#5144](https://github.com/NousResearch/hermes-agent/pull/5144))
- **Progressive subdirectory hint discovery** — the agent learns project structure as it navigates ([#5291](https://github.com/NousResearch/hermes-agent/pull/5291))
- **`notify_on_complete` for background processes** — get notified when long-running tasks finish ([#5779](https://github.com/NousResearch/hermes-agent/pull/5779))
- **Docker env config** — explicit container environment variables via docker_env config ([#4738](https://github.com/NousResearch/hermes-agent/pull/4738))
- **Approval metadata included** in terminal tool results ([#5141](https://github.com/NousResearch/hermes-agent/pull/5141))
- **Workdir parameter sanitized** in terminal tool across all backends ([#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
- **Detached process crash recovery** state corrected ([#6101](https://github.com/NousResearch/hermes-agent/pull/6101))
- **Agent-browser paths with spaces** preserved — @Vasanthdev2004 ([#6077](https://github.com/NousResearch/hermes-agent/pull/6077))
- **Portable base64 encoding** for image reading on macOS — @CharlieKerfoot ([#5657](https://github.com/NousResearch/hermes-agent/pull/5657))

### Browser
- **Switch managed browser provider** from Browserbase to Browser Use —
@benbarclay ([#5750](https://github.com/NousResearch/hermes-agent/pull/5750))
- **Firecrawl cloud browser** provider — @alt-glitch ([#5628](https://github.com/NousResearch/hermes-agent/pull/5628))
- **JS evaluation** via browser_console expression parameter ([#5303](https://github.com/NousResearch/hermes-agent/pull/5303))
- **Windows browser** fixes ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665))

### MCP
- **MCP OAuth 2.1 PKCE** — full standards-compliant OAuth client support ([#5420](https://github.com/NousResearch/hermes-agent/pull/5420))
- **OSV malware check** for MCP extension packages ([#5305](https://github.com/NousResearch/hermes-agent/pull/5305))
- **Prefer structuredContent over text** + no_mcp sentinel ([#5979](https://github.com/NousResearch/hermes-agent/pull/5979))
- **Unknown toolsets warning suppressed** for MCP server names ([#5279](https://github.com/NousResearch/hermes-agent/pull/5279))

### Web & Files
- **.zip document support** + auto-mount cache dirs into remote backends ([#4846](https://github.com/NousResearch/hermes-agent/pull/4846))
- **Redact query secrets** in send_message errors — @WAXLYY ([#5650](https://github.com/NousResearch/hermes-agent/pull/5650))

### Delegation
- **Credential pool sharing** + workspace path hints for subagents ([#5748](https://github.com/NousResearch/hermes-agent/pull/5748))

### ACP (VS Code / Zed / JetBrains)
- **Aggregate ACP improvements** — auth compat, protocol fixes, command ads, delegation, SSE events ([#5292](https://github.com/NousResearch/hermes-agent/pull/5292))

---

## 🧩 Skills Ecosystem

### Skills System
- **Skill config interface** — skills can declare required config.yaml settings, prompted during setup, injected at load time ([#5635](https://github.com/NousResearch/hermes-agent/pull/5635))
- **Plugin CLI registration system** — plugins register their own CLI subcommands without touching main.py
([#5295](https://github.com/NousResearch/hermes-agent/pull/5295))
- **Request-scoped API hooks** with tool call correlation IDs for plugins ([#5427](https://github.com/NousResearch/hermes-agent/pull/5427))
- **Session lifecycle hooks** — on_session_finalize and on_session_reset for CLI + gateway ([#6129](https://github.com/NousResearch/hermes-agent/pull/6129))
- **Prompt for required env vars** during plugin install — @kshitijk4poor ([#5470](https://github.com/NousResearch/hermes-agent/pull/5470))
- **Plugin name validation** — reject names that resolve to the plugins root ([#5368](https://github.com/NousResearch/hermes-agent/pull/5368))
- **pre_llm_call plugin context** moved to the user message to preserve prompt cache ([#5146](https://github.com/NousResearch/hermes-agent/pull/5146))

### New & Updated Skills
- **popular-web-designs** — 54 production website design systems ([#5194](https://github.com/NousResearch/hermes-agent/pull/5194))
- **p5js creative coding** — @SHL0MS ([#5600](https://github.com/NousResearch/hermes-agent/pull/5600))
- **manim-video** — mathematical and technical animations — @SHL0MS ([#4930](https://github.com/NousResearch/hermes-agent/pull/4930))
- **llm-wiki** — Karpathy's LLM Wiki skill ([#5635](https://github.com/NousResearch/hermes-agent/pull/5635))
- **gitnexus-explorer** — codebase indexing and knowledge serving ([#5208](https://github.com/NousResearch/hermes-agent/pull/5208))
- **research-paper-writing** — AI-Scientist & GPT-Researcher patterns — @SHL0MS ([#5421](https://github.com/NousResearch/hermes-agent/pull/5421))
- **blogwatcher** updated to JulienTant's fork ([#5759](https://github.com/NousResearch/hermes-agent/pull/5759))
- **claude-code skill** comprehensive rewrite v2.0 + v2.2 ([#5155](https://github.com/NousResearch/hermes-agent/pull/5155), [#5158](https://github.com/NousResearch/hermes-agent/pull/5158))
- **Code verification skills** consolidated into one
([#4854](https://github.com/NousResearch/hermes-agent/pull/4854))
- **Manim CE reference docs** expanded — geometry, animations, LaTeX — @leotrs ([#5791](https://github.com/NousResearch/hermes-agent/pull/5791))
- **Manim-video references** — design thinking, updaters, paper explainer, decorations, production quality — @SHL0MS ([#5588](https://github.com/NousResearch/hermes-agent/pull/5588), [#5408](https://github.com/NousResearch/hermes-agent/pull/5408))

---

## 🔒 Security & Reliability

### Security Hardening
- **Consolidated security** — SSRF protections, timing attack mitigations, tar traversal prevention, credential leakage guards ([#5944](https://github.com/NousResearch/hermes-agent/pull/5944))
- **Cross-session isolation** + cron path traversal hardening ([#5613](https://github.com/NousResearch/hermes-agent/pull/5613))
- **Workdir parameter sanitized** in terminal tool across all backends ([#5629](https://github.com/NousResearch/hermes-agent/pull/5629))
- **Approval 'once' session escalation** prevented + cron delivery platform validation ([#5280](https://github.com/NousResearch/hermes-agent/pull/5280))
- **Profile-scoped Google Workspace OAuth tokens** protected ([#4910](https://github.com/NousResearch/hermes-agent/pull/4910))

### Reliability
- **Aggressive worktree and branch cleanup** to prevent accumulation ([#6134](https://github.com/NousResearch/hermes-agent/pull/6134))
- **O(n²) catastrophic backtracking** in redact regex fixed — 100x improvement on large outputs ([#4962](https://github.com/NousResearch/hermes-agent/pull/4962))
- **Runtime stability fixes** across core, web, delegate, and browser tools ([#4843](https://github.com/NousResearch/hermes-agent/pull/4843))
- **API server streaming fix** + conversation history support ([#5977](https://github.com/NousResearch/hermes-agent/pull/5977))
- **OpenViking API endpoint paths** and response parsing corrected ([#5078](https://github.com/NousResearch/hermes-agent/pull/5078))

---
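For context on the redaction-regex backtracking fix ([#4962](https://github.com/NousResearch/hermes-agent/pull/4962)): patterns with nested quantifiers can backtrack superlinearly on long non-matching output, which is the class of bug that fix addresses. The sketch below is illustrative only (the pattern, key names, and `redact` helper are assumptions, not the actual Hermes code); it shows the safe shape — alternation of literal key names, a single unnested `\S+` for the value — that keeps matching linear in input length.

```python
import re

# Illustrative sketch (NOT the actual Hermes patch): redact credential
# values from tool output. Each piece repeats at most once and nothing
# nests a quantifier inside another, so the regex cannot blow up on
# long non-matching lines the way a pattern like (\s*\S+)+ can.
SECRET_RE = re.compile(
    r"""(?ix)                                 # case-insensitive, verbose
    \b(api[_-]?key|token|secret|password)     # common credential key names
    (\s*[:=]\s*)                              # key/value separator
    (\S+)                                     # value: one flat run, no nesting
    """
)

def redact(text: str) -> str:
    """Replace credential values with a placeholder, keeping the key visible."""
    return SECRET_RE.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)

print(redact("export API_KEY=sk-abc123 and password: hunter2"))
# -> export API_KEY=[REDACTED] and password: [REDACTED]
```

On Python 3.11+, possessive quantifiers (`\S++`) offer another way to forbid backtracking outright; the structural rule — avoid repeating a group that itself contains a repeat — applies on any version.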
+ +## 🐛 Notable Bug Fixes + +- **9 community bugfixes salvaged** — gateway, cron, deps, macOS launchd in one batch ([#5288](https://github.com/NousResearch/hermes-agent/pull/5288)) +- **Batch core bug fixes** — model config, session reset, alias fallback, launchctl, delegation, atomic writes ([#5630](https://github.com/NousResearch/hermes-agent/pull/5630)) +- **Batch gateway/platform fixes** — Matrix E2EE, CJK input, Windows browser, Feishu reconnect + ACL ([#5665](https://github.com/NousResearch/hermes-agent/pull/5665)) +- **Stale test skips removed** — plus fixes for regex backtracking, a file search bug, and test flakiness ([#4969](https://github.com/NousResearch/hermes-agent/pull/4969)) +- **Nix flake** — read version, regen uv.lock, add hermes_logging — @alt-glitch ([#5651](https://github.com/NousResearch/hermes-agent/pull/5651)) +- **Lowercase variable redaction** regression tests ([#5185](https://github.com/NousResearch/hermes-agent/pull/5185)) + +--- + +## 🧪 Testing + +- **57 failing CI tests repaired** across 14 files ([#5823](https://github.com/NousResearch/hermes-agent/pull/5823)) +- **Test suite re-architecture** + CI failure fixes — @alt-glitch ([#5946](https://github.com/NousResearch/hermes-agent/pull/5946)) +- **Codebase-wide lint cleanup** — unused imports, dead code, and inefficient patterns ([#5821](https://github.com/NousResearch/hermes-agent/pull/5821)) +- **browser_close tool removed** — auto-cleanup handles it ([#5792](https://github.com/NousResearch/hermes-agent/pull/5792)) + +--- + +## 📚 Documentation + +- **Comprehensive documentation audit** — fix stale info, expand thin pages, add depth ([#5393](https://github.com/NousResearch/hermes-agent/pull/5393)) +- **40+ discrepancies fixed** between documentation and codebase ([#5818](https://github.com/NousResearch/hermes-agent/pull/5818)) +- **13 features documented** from last week's PRs ([#5815](https://github.com/NousResearch/hermes-agent/pull/5815)) +- **Guides section overhaul** — fix existing + add 3 new
tutorials ([#5735](https://github.com/NousResearch/hermes-agent/pull/5735)) +- **Salvaged 4 docs PRs** — docker setup, post-update validation, local LLM guide, signal-cli install ([#5727](https://github.com/NousResearch/hermes-agent/pull/5727)) +- **Discord configuration reference** ([#5386](https://github.com/NousResearch/hermes-agent/pull/5386)) +- **Community FAQ entries** for common workflows and troubleshooting ([#4797](https://github.com/NousResearch/hermes-agent/pull/4797)) +- **WSL2 networking guide** for local model servers ([#5616](https://github.com/NousResearch/hermes-agent/pull/5616)) +- **Honcho CLI reference** + plugin CLI registration docs ([#5308](https://github.com/NousResearch/hermes-agent/pull/5308)) +- **Obsidian Headless setup** for servers in llm-wiki ([#5660](https://github.com/NousResearch/hermes-agent/pull/5660)) +- **Hermes Mod visual skin editor** added to skins page ([#6095](https://github.com/NousResearch/hermes-agent/pull/6095)) + +--- + +## 👥 Contributors + +### Core +- **@teknium1** — 179 PRs + +### Top Community Contributors +- **@SHL0MS** (7 PRs) — p5js creative coding skill, manim-video skill + 5 reference expansions, research-paper-writing, Nous OAuth fix, manim font fix +- **@alt-glitch** (3 PRs) — Firecrawl cloud browser provider, test re-architecture + CI fixes, Nix flake fixes +- **@benbarclay** (2 PRs) — Browser Use managed provider switch, Nous portal base URL fix +- **@CharlieKerfoot** (2 PRs) — macOS portable base64 encoding, thread-safe PairingStore +- **@WAXLYY** (2 PRs) — send_message secret redaction, gateway media URL sanitization +- **@MadKangYu** (2 PRs) — Telegram log noise reduction, context compaction fix for temperature-restricted models + +### All Contributors +@alt-glitch, @austinpickett, @auspic7, @benbarclay, @CharlieKerfoot, @GratefulDave, @kshitijk4poor, @leotrs, @lumethegreat, @MadKangYu, @nericervin, @ryanautomated, @SHL0MS, @techguysimon, @tymrtn, @Vasanthdev2004, @WAXLYY, @xinbenlv + +--- + +**Full 
Changelog**: [v2026.4.3...v2026.4.8](https://github.com/NousResearch/hermes-agent/compare/v2026.4.3...v2026.4.8) diff --git a/agent/anthropic_adapter.py b/agent/anthropic_adapter.py index f4e8dcee65..2d6c2dd82e 100644 --- a/agent/anthropic_adapter.py +++ b/agent/anthropic_adapter.py @@ -1102,7 +1102,15 @@ def convert_messages_to_anthropic( curr_content = [{"type": "text", "text": curr_content}] fixed[-1]["content"] = prev_content + curr_content else: - # Consecutive assistant messages — merge text content + # Consecutive assistant messages — merge text content. + # Drop thinking blocks from the *second* message: their + # signature was computed against a different turn boundary + # and becomes invalid once merged. + if isinstance(m["content"], list): + m["content"] = [ + b for b in m["content"] + if not (isinstance(b, dict) and b.get("type") in ("thinking", "redacted_thinking")) + ] prev_blocks = fixed[-1]["content"] curr_blocks = m["content"] if isinstance(prev_blocks, list) and isinstance(curr_blocks, list): @@ -1120,6 +1128,68 @@ def convert_messages_to_anthropic( fixed.append(m) result = fixed + # ── Thinking block signature management ────────────────────────── + # Anthropic signs thinking blocks against the full turn content. + # Any upstream mutation (context compression, session truncation, + # orphan stripping, message merging) invalidates the signature, + # causing HTTP 400 "Invalid signature in thinking block". + # + # Strategy (following clawdbot/OpenClaw pattern): + # 1. Strip thinking/redacted_thinking from all assistant messages + # EXCEPT the last one — preserves reasoning continuity on the + # current tool-use chain while avoiding stale signature errors. + # 2. Downgrade unsigned thinking blocks (no signature) to text — + # Anthropic can't validate them and will reject them. + # 3. Strip cache_control from thinking/redacted_thinking blocks — + # cache markers can interfere with signature validation. 
+ _THINKING_TYPES = frozenset(("thinking", "redacted_thinking")) + + last_assistant_idx = None + for i in range(len(result) - 1, -1, -1): + if result[i].get("role") == "assistant": + last_assistant_idx = i + break + + for idx, m in enumerate(result): + if m.get("role") != "assistant" or not isinstance(m.get("content"), list): + continue + + if idx != last_assistant_idx: + # Strip ALL thinking blocks from non-latest assistant messages + stripped = [ + b for b in m["content"] + if not (isinstance(b, dict) and b.get("type") in _THINKING_TYPES) + ] + m["content"] = stripped or [{"type": "text", "text": "(thinking elided)"}] + else: + # Latest assistant: keep signed thinking blocks for reasoning + # continuity; downgrade unsigned ones to plain text. + new_content = [] + for b in m["content"]: + if not isinstance(b, dict) or b.get("type") not in _THINKING_TYPES: + new_content.append(b) + continue + if b.get("type") == "redacted_thinking": + # Redacted blocks use 'data' for the signature payload + if b.get("data"): + new_content.append(b) + # else: drop — no data means it can't be validated + elif b.get("signature"): + # Signed thinking block — keep it + new_content.append(b) + else: + # Unsigned thinking — downgrade to text so it's not lost + thinking_text = b.get("thinking", "") + if thinking_text: + new_content.append({"type": "text", "text": thinking_text}) + m["content"] = new_content or [{"type": "text", "text": "(empty)"}] + + # Strip cache_control from any remaining thinking/redacted_thinking + # blocks — cache markers interfere with signature validation. + for b in m["content"]: + if isinstance(b, dict) and b.get("type") in _THINKING_TYPES: + b.pop("cache_control", None) + return system, result @@ -1224,9 +1294,9 @@ def build_anthropic_kwargs( # Map reasoning_config to Anthropic's thinking parameter. # Claude 4.6 models use adaptive thinking + output_config.effort. # Older models use manual thinking with budget_tokens. 
- # Haiku models do NOT support extended thinking at all — skip entirely. + # Haiku and MiniMax models do NOT support extended thinking — skip entirely. if reasoning_config and isinstance(reasoning_config, dict): - if reasoning_config.get("enabled") is not False and "haiku" not in model.lower(): + if reasoning_config.get("enabled") is not False and "haiku" not in model.lower() and "minimax" not in model.lower(): effort = str(reasoning_config.get("effort", "medium")).lower() budget = THINKING_BUDGET.get(effort, 8000) if _supports_adaptive_thinking(model): diff --git a/agent/auxiliary_client.py b/agent/auxiliary_client.py index 49a78458d3..2b99ac0708 100644 --- a/agent/auxiliary_client.py +++ b/agent/auxiliary_client.py @@ -59,13 +59,48 @@ from hermes_constants import OPENROUTER_BASE_URL logger = logging.getLogger(__name__) +_PROVIDER_ALIASES = { + "google": "gemini", + "google-gemini": "gemini", + "google-ai-studio": "gemini", + "glm": "zai", + "z-ai": "zai", + "z.ai": "zai", + "zhipu": "zai", + "kimi": "kimi-coding", + "moonshot": "kimi-coding", + "minimax-china": "minimax-cn", + "minimax_cn": "minimax-cn", + "claude": "anthropic", + "claude-code": "anthropic", +} + + +def _normalize_aux_provider(provider: Optional[str], *, for_vision: bool = False) -> str: + normalized = (provider or "auto").strip().lower() + if normalized.startswith("custom:"): + suffix = normalized.split(":", 1)[1].strip() + if not suffix: + return "custom" + normalized = suffix if not for_vision else "custom" + if normalized == "codex": + return "openai-codex" + if normalized == "main": + # Resolve to the user's actual main provider so named custom providers + # and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly. 
+ main_prov = _read_main_provider() + if main_prov and main_prov not in ("auto", "main", ""): + return main_prov + return "custom" + return _PROVIDER_ALIASES.get(normalized, normalized) + # Default auxiliary models for direct API-key providers (cheap/fast for side tasks) _API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = { "gemini": "gemini-3-flash-preview", "zai": "glm-4.5-flash", "kimi-coding": "kimi-k2-turbo-preview", - "minimax": "MiniMax-M2.7-highspeed", - "minimax-cn": "MiniMax-M2.7-highspeed", + "minimax": "MiniMax-M2.7", + "minimax-cn": "MiniMax-M2.7", "anthropic": "claude-haiku-4-5-20251001", "ai-gateway": "google/gemini-3-flash", "opencode-zen": "gemini-3-flash", @@ -92,6 +127,7 @@ auxiliary_is_nous: bool = False _OPENROUTER_MODEL = "google/gemini-3-flash-preview" _NOUS_MODEL = "google/gemini-3-flash-preview" _NOUS_FREE_TIER_VISION_MODEL = "xiaomi/mimo-v2-omni" +_NOUS_FREE_TIER_AUX_MODEL = "xiaomi/mimo-v2-pro" _NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1" _ANTHROPIC_DEFAULT_BASE_URL = "https://api.anthropic.com" _AUTH_JSON_PATH = get_hermes_home() / "auth.json" @@ -105,6 +141,23 @@ _CODEX_AUX_MODEL = "gpt-5.2-codex" _CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex" +def _to_openai_base_url(base_url: str) -> str: + """Normalize an Anthropic-style base URL to OpenAI-compatible format. + + Some providers (MiniMax, MiniMax-CN) expose an ``/anthropic`` endpoint for + the Anthropic Messages API and a separate ``/v1`` endpoint for OpenAI chat + completions. The auxiliary client uses the OpenAI SDK, so it must hit the + ``/v1`` surface. Passing the raw ``inference_base_url`` causes requests to + land on ``/anthropic/chat/completions`` — a 404. 
+ """ + url = str(base_url or "").strip().rstrip("/") + if url.endswith("/anthropic"): + rewritten = url[: -len("/anthropic")] + "/v1" + logger.debug("Auxiliary client: rewrote base URL %s → %s", url, rewritten) + return rewritten + return url + + def _select_pool_entry(provider: str) -> Tuple[bool, Optional[Any]]: """Return (pool_exists_for_provider, selected_entry).""" try: @@ -634,7 +687,9 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]: if not api_key: continue - base_url = _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url + base_url = _to_openai_base_url( + _pool_runtime_base_url(entry, pconfig.inference_base_url) or pconfig.inference_base_url + ) model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default") logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model) extra = {} @@ -651,7 +706,9 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]: if not api_key: continue - base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url + base_url = _to_openai_base_url( + str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url + ) model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default") logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model) extra = {} @@ -713,7 +770,7 @@ def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]: default_headers=_OR_HEADERS), _OPENROUTER_MODEL -def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]: +def _try_nous(vision: bool = False) -> Tuple[Optional[OpenAI], Optional[str]]: nous = _read_nous_auth() if not nous: return None, None @@ -725,12 +782,13 @@ def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]: else: model = _NOUS_MODEL # Free-tier users can't use paid auxiliary models — use the free - # multimodal model instead so vision/browser-vision still works. 
+ # models instead: mimo-v2-omni for vision, mimo-v2-pro for text tasks. try: from hermes_cli.models import check_nous_free_tier if check_nous_free_tier(): - model = _NOUS_FREE_TIER_VISION_MODEL - logger.debug("Free-tier Nous account — using %s for auxiliary/vision", model) + model = _NOUS_FREE_TIER_VISION_MODEL if vision else _NOUS_FREE_TIER_AUX_MODEL + logger.debug("Free-tier Nous account — using %s for auxiliary/%s", + model, "vision" if vision else "text") except Exception: pass return ( @@ -776,7 +834,7 @@ def _read_main_provider() -> str: if isinstance(model_cfg, dict): provider = model_cfg.get("provider", "") if isinstance(provider, str) and provider.strip(): - return provider.strip().lower() + return _normalize_aux_provider(provider) except Exception: pass return "" @@ -1138,17 +1196,7 @@ def resolve_provider_client( (client, resolved_model) or (None, None) if auth is unavailable. """ # Normalise aliases - provider = (provider or "auto").strip().lower() - if provider == "codex": - provider = "openai-codex" - if provider == "main": - # Resolve to the user's actual main provider so named custom providers - # and non-aggregator providers (DeepSeek, Alibaba, etc.) work correctly. 
- main_prov = _read_main_provider() - if main_prov and main_prov not in ("auto", "main", ""): - provider = main_prov - else: - provider = "custom" + provider = _normalize_aux_provider(provider) # ── Auto: try all providers in priority order ──────────────────── if provider == "auto": @@ -1298,7 +1346,9 @@ def resolve_provider_client( provider, ", ".join(tried_sources)) return None, None - base_url = str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url + base_url = _to_openai_base_url( + str(creds.get("base_url", "")).strip().rstrip("/") or pconfig.inference_base_url + ) default_model = _API_KEY_PROVIDER_AUX_MODELS.get(provider, "") final_model = model or default_model @@ -1375,24 +1425,11 @@ def get_async_text_auxiliary_client(task: str = ""): _VISION_AUTO_PROVIDER_ORDER = ( "openrouter", "nous", - "openai-codex", - "anthropic", - "custom", ) def _normalize_vision_provider(provider: Optional[str]) -> str: - provider = (provider or "auto").strip().lower() - if provider == "codex": - return "openai-codex" - if provider == "main": - # Resolve to actual main provider — named custom providers and - # non-aggregator providers need to pass through as their real name. 
- main_prov = _read_main_provider() - if main_prov and main_prov not in ("auto", "main", ""): - return main_prov - return "custom" - return provider + return _normalize_aux_provider(provider, for_vision=True) def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Optional[str]]: @@ -1400,7 +1437,7 @@ def _resolve_strict_vision_backend(provider: str) -> Tuple[Optional[Any], Option if provider == "openrouter": return _try_openrouter() if provider == "nous": - return _try_nous() + return _try_nous(vision=True) if provider == "openai-codex": return _try_codex() if provider == "anthropic": @@ -1433,17 +1470,20 @@ def _preferred_main_vision_provider() -> Optional[str]: def get_available_vision_backends() -> List[str]: """Return the currently available vision backends in auto-selection order. - This is the single source of truth for setup, tool gating, and runtime - auto-routing of vision tasks. The selected main provider is preferred when - it is also a known-good vision backend; otherwise Hermes falls back through - the standard conservative order. + Order: OpenRouter → Nous → active provider. This is the single source + of truth for setup, tool gating, and runtime auto-routing of vision tasks. """ - ordered = list(_VISION_AUTO_PROVIDER_ORDER) - preferred = _preferred_main_vision_provider() - if preferred in ordered: - ordered.remove(preferred) - ordered.insert(0, preferred) - return [provider for provider in ordered if _strict_vision_backend_available(provider)] + available = [p for p in _VISION_AUTO_PROVIDER_ORDER + if _strict_vision_backend_available(p)] + # Also check the user's active provider (may be DeepSeek, Alibaba, named + # custom, etc.) — resolve_provider_client handles all provider types. 
+ main_provider = _read_main_provider() + if (main_provider and main_provider not in ("auto", "") + and main_provider not in available): + client, _ = resolve_provider_client(main_provider, _read_main_model()) + if client is not None: + available.append(main_provider) + return available def resolve_vision_provider_client( @@ -1488,16 +1528,30 @@ def resolve_vision_provider_client( return "custom", client, final_model if requested == "auto": - ordered = list(_VISION_AUTO_PROVIDER_ORDER) - preferred = _preferred_main_vision_provider() - if preferred in ordered: - ordered.remove(preferred) - ordered.insert(0, preferred) - - for candidate in ordered: + # Vision auto-detection order: + # 1. OpenRouter (known vision-capable default model) + # 2. Nous Portal (known vision-capable default model) + # 3. Active provider + model (user's main chat config) + # 4. Stop + for candidate in _VISION_AUTO_PROVIDER_ORDER: sync_client, default_model = _resolve_strict_vision_backend(candidate) if sync_client is not None: return _finalize(candidate, sync_client, default_model) + + # Fall back to the user's active provider + model. 
+ main_provider = _read_main_provider() + main_model = _read_main_model() + if main_provider and main_provider not in ("auto", ""): + sync_client, resolved_model = resolve_provider_client( + main_provider, main_model) + if sync_client is not None: + logger.info( + "Vision auto-detect: using active provider %s (%s)", + main_provider, resolved_model or main_model, + ) + return _finalize( + main_provider, sync_client, resolved_model or main_model) + logger.debug("Auxiliary vision client: none available") return None, None, None diff --git a/agent/model_metadata.py b/agent/model_metadata.py index 50245a7c9c..0a22711865 100644 --- a/agent/model_metadata.py +++ b/agent/model_metadata.py @@ -113,8 +113,15 @@ DEFAULT_CONTEXT_LENGTHS = { "llama": 131072, # Qwen "qwen": 131072, - # MiniMax - "minimax": 204800, + # MiniMax (lowercase — lookup lowercases model names at line 973) + "minimax-m1-256k": 1000000, + "minimax-m1-128k": 1000000, + "minimax-m1-80k": 1000000, + "minimax-m1-40k": 1000000, + "minimax-m1": 1000000, + "minimax-m2.5": 1048576, + "minimax-m2.7": 1048576, + "minimax": 1048576, # GLM "glm": 202752, # Kimi @@ -127,7 +134,7 @@ DEFAULT_CONTEXT_LENGTHS = { "deepseek-ai/DeepSeek-V3.2": 65536, "moonshotai/Kimi-K2.5": 262144, "moonshotai/Kimi-K2-Thinking": 262144, - "MiniMaxAI/MiniMax-M2.5": 204800, + "minimaxai/minimax-m2.5": 1048576, "XiaomiMiMo/MiMo-V2-Flash": 32768, "mimo-v2-pro": 1048576, "mimo-v2-omni": 1048576, @@ -611,6 +618,59 @@ def _model_id_matches(candidate_id: str, lookup_model: str) -> bool: return False +def query_ollama_num_ctx(model: str, base_url: str) -> Optional[int]: + """Query an Ollama server for the model's context length. + + Returns the model's maximum context from GGUF metadata via ``/api/show``, + or the explicit ``num_ctx`` from the Modelfile if set. Returns None if + the server is unreachable or not Ollama. + + This is the value that should be passed as ``num_ctx`` in Ollama chat + requests to override the default 2048. 
+ """ + import httpx + + bare_model = _strip_provider_prefix(model) + server_url = base_url.rstrip("/") + if server_url.endswith("/v1"): + server_url = server_url[:-3] + + try: + server_type = detect_local_server_type(base_url) + except Exception: + return None + if server_type != "ollama": + return None + + try: + with httpx.Client(timeout=3.0) as client: + resp = client.post(f"{server_url}/api/show", json={"name": bare_model}) + if resp.status_code != 200: + return None + data = resp.json() + + # Prefer explicit num_ctx from Modelfile parameters (user override) + params = data.get("parameters", "") + if "num_ctx" in params: + for line in params.split("\n"): + if "num_ctx" in line: + parts = line.strip().split() + if len(parts) >= 2: + try: + return int(parts[-1]) + except ValueError: + pass + + # Fall back to GGUF model_info context_length (training max) + model_info = data.get("model_info", {}) + for key, value in model_info.items(): + if "context_length" in key and isinstance(value, (int, float)): + return int(value) + except Exception: + pass + return None + + def _query_local_context_length(model: str, base_url: str) -> Optional[int]: """Query a local server for the model's context length.""" import httpx diff --git a/agent/prompt_builder.py b/agent/prompt_builder.py index df5532e125..b1b0891f59 100644 --- a/agent/prompt_builder.py +++ b/agent/prompt_builder.py @@ -204,6 +204,30 @@ OPENAI_MODEL_EXECUTION_GUIDANCE = ( "the result.\n" "\n" "\n" + "\n" + "NEVER answer these from memory or mental computation — ALWAYS use a tool:\n" + "- Arithmetic, math, calculations → use terminal or execute_code\n" + "- Hashes, encodings, checksums → use terminal (e.g. sha256sum, base64)\n" + "- Current time, date, timezone → use terminal (e.g. 
date)\n" + "- System state: OS, CPU, memory, disk, ports, processes → use terminal\n" + "- File contents, sizes, line counts → use read_file, search_files, or terminal\n" + "- Git history, branches, diffs → use terminal\n" + "- Current facts (weather, news, versions) → use web_search\n" + "Your memory and user profile describe the USER, not the system you are " + "running on. The execution environment may differ from what the user profile " + "says about their personal setup.\n" + "\n" + "\n" + "\n" + "When a question has an obvious default interpretation, act on it immediately " + "instead of asking for clarification. Examples:\n" + "- 'Is port 443 open?' → check THIS machine (don't ask 'open where?')\n" + "- 'What OS am I running?' → check the live system (don't use user profile)\n" + "- 'What time is it?' → run `date` (don't guess)\n" + "Only ask for clarification when the ambiguity genuinely changes what tool " + "you would call.\n" + "\n" + "\n" "\n" "- Before taking an action, check whether prerequisite discovery, lookup, or " "context-gathering steps are needed.\n" diff --git a/agent/retry_utils.py b/agent/retry_utils.py new file mode 100644 index 0000000000..71d6963f7b --- /dev/null +++ b/agent/retry_utils.py @@ -0,0 +1,57 @@ +"""Retry utilities — jittered backoff for decorrelated retries. + +Replaces fixed exponential backoff with jittered delays to prevent +thundering-herd retry spikes when multiple sessions hit the same +rate-limited provider concurrently. +""" + +import random +import threading +import time + +# Monotonic counter for jitter seed uniqueness within the same process. +# Protected by a lock to avoid race conditions in concurrent retry paths +# (e.g. multiple gateway sessions retrying simultaneously). 
+_jitter_counter = 0 +_jitter_lock = threading.Lock() + + +def jittered_backoff( + attempt: int, + *, + base_delay: float = 5.0, + max_delay: float = 120.0, + jitter_ratio: float = 0.5, +) -> float: + """Compute a jittered exponential backoff delay. + + Args: + attempt: 1-based retry attempt number. + base_delay: Base delay in seconds for attempt 1. + max_delay: Maximum delay cap in seconds. + jitter_ratio: Fraction of computed delay to use as random jitter + range. 0.5 means jitter is uniform in [0, 0.5 * delay]. + + Returns: + Delay in seconds: min(base * 2^(attempt-1), max_delay) + jitter. + + The jitter decorrelates concurrent retries so multiple sessions + hitting the same provider don't all retry at the same instant. + """ + global _jitter_counter + with _jitter_lock: + _jitter_counter += 1 + tick = _jitter_counter + + exponent = max(0, attempt - 1) + if exponent >= 63 or base_delay <= 0: + delay = max_delay + else: + delay = min(base_delay * (2 ** exponent), max_delay) + + # Seed from time + counter for decorrelation even with coarse clocks. + seed = (time.time_ns() ^ (tick * 0x9E3779B9)) & 0xFFFFFFFF + rng = random.Random(seed) + jitter = rng.uniform(0, jitter_ratio * delay) + + return delay + jitter diff --git a/cli.py b/cli.py index b4358a163c..f00e6b7fea 100644 --- a/cli.py +++ b/cli.py @@ -612,6 +612,11 @@ def _run_cleanup(): pass # Shut down memory provider (on_session_end + shutdown_all) at actual # session boundary — NOT per-turn inside run_conversation(). 
+ try: + from hermes_cli.plugins import invoke_hook as _invoke_hook + _invoke_hook("on_session_finalize", session_id=_active_agent_ref.session_id if _active_agent_ref else None, platform="cli") + except Exception: + pass try: if _active_agent_ref and hasattr(_active_agent_ref, 'shutdown_memory_provider'): _active_agent_ref.shutdown_memory_provider( @@ -755,7 +760,10 @@ def _setup_worktree(repo_root: str = None) -> Optional[Dict[str, str]]: def _cleanup_worktree(info: Dict[str, str] = None) -> None: """Remove a worktree and its branch on exit. - If the worktree has uncommitted changes, warn and keep it. + Preserves the worktree only if it has unpushed commits (real work + that hasn't been pushed to any remote). Uncommitted changes alone + (untracked files, test artifacts) are not enough to keep it — agent + work lives in commits/PRs, not the working tree. """ global _active_worktree info = info or _active_worktree @@ -771,23 +779,27 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None: if not Path(wt_path).exists(): return - # Check for uncommitted changes + # Check for unpushed commits — commits reachable from HEAD but not + # from any remote branch. These represent real work the agent did + # but didn't push. 
+ has_unpushed = False try: - status = subprocess.run( - ["git", "status", "--porcelain"], + result = subprocess.run( + ["git", "log", "--oneline", "HEAD", "--not", "--remotes"], capture_output=True, text=True, timeout=10, cwd=wt_path, ) - has_changes = bool(status.stdout.strip()) + has_unpushed = bool(result.stdout.strip()) except Exception: - has_changes = True # Assume dirty on error — don't delete + has_unpushed = True # Assume unpushed on error — don't delete - if has_changes: - print(f"\n\033[33m⚠ Worktree has uncommitted changes, keeping: {wt_path}\033[0m") - print(f" To clean up manually: git worktree remove {wt_path}") + if has_unpushed: + print(f"\n\033[33m⚠ Worktree has unpushed commits, keeping: {wt_path}\033[0m") + print(f" To clean up manually: git worktree remove --force {wt_path}") _active_worktree = None return - # Remove worktree + # Remove worktree (even if working tree is dirty — uncommitted + # changes without unpushed commits are just artifacts) try: subprocess.run( ["git", "worktree", "remove", wt_path, "--force"], @@ -796,7 +808,7 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None: except Exception as e: logger.debug("Failed to remove worktree: %s", e) - # Delete the branch (only if it was never pushed / has no upstream) + # Delete the branch try: subprocess.run( ["git", "branch", "-D", branch], @@ -810,19 +822,27 @@ def _cleanup_worktree(info: Dict[str, str] = None) -> None: def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None: - """Remove worktrees older than max_age_hours that have no uncommitted changes. + """Remove stale worktrees and orphaned branches on startup. - Runs silently on startup to clean up after crashed/killed sessions. + Age-based tiers: + - Under max_age_hours (24h): skip — session may still be active. + - 24h–72h: remove if no unpushed commits. + - Over 72h: force remove regardless (nothing should sit this long). 
+ + Also prunes orphaned ``hermes/*`` and ``pr-*`` local branches that + have no corresponding worktree. """ import subprocess import time worktrees_dir = Path(repo_root) / ".worktrees" if not worktrees_dir.exists(): + _prune_orphaned_branches(repo_root) return now = time.time() - cutoff = now - (max_age_hours * 3600) + soft_cutoff = now - (max_age_hours * 3600) # 24h default + hard_cutoff = now - (max_age_hours * 3 * 3600) # 72h default for entry in worktrees_dir.iterdir(): if not entry.is_dir() or not entry.name.startswith("hermes-"): @@ -831,21 +851,24 @@ def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None: # Check age try: mtime = entry.stat().st_mtime - if mtime > cutoff: + if mtime > soft_cutoff: continue # Too recent — skip except Exception: continue - # Check for uncommitted changes - try: - status = subprocess.run( - ["git", "status", "--porcelain"], - capture_output=True, text=True, timeout=5, cwd=str(entry), - ) - if status.stdout.strip(): - continue # Has changes — skip - except Exception: - continue # Can't check — skip + force = mtime <= hard_cutoff # Over 72h — force remove + + if not force: + # 24h–72h tier: only remove if no unpushed commits + try: + result = subprocess.run( + ["git", "log", "--oneline", "HEAD", "--not", "--remotes"], + capture_output=True, text=True, timeout=5, cwd=str(entry), + ) + if result.stdout.strip(): + continue # Has unpushed commits — skip + except Exception: + continue # Can't check — skip # Safe to remove try: @@ -864,10 +887,81 @@ def _prune_stale_worktrees(repo_root: str, max_age_hours: int = 24) -> None: ["git", "branch", "-D", branch], capture_output=True, text=True, timeout=10, cwd=repo_root, ) - logger.debug("Pruned stale worktree: %s", entry.name) + logger.debug("Pruned stale worktree: %s (force=%s)", entry.name, force) except Exception as e: logger.debug("Failed to prune worktree %s: %s", entry.name, e) + _prune_orphaned_branches(repo_root) + + +def _prune_orphaned_branches(repo_root: 
str) -> None: + """Delete local ``hermes/hermes-*`` and ``pr-*`` branches with no worktree. + + These are auto-generated by ``hermes -w`` sessions and PR review + workflows respectively. Once their worktree is gone they serve no + purpose and just accumulate. + """ + import subprocess + + try: + result = subprocess.run( + ["git", "branch", "--format=%(refname:short)"], + capture_output=True, text=True, timeout=10, cwd=repo_root, + ) + if result.returncode != 0: + return + all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()] + except Exception: + return + + # Collect branches that are actively checked out in a worktree + active_branches: set = set() + try: + wt_result = subprocess.run( + ["git", "worktree", "list", "--porcelain"], + capture_output=True, text=True, timeout=10, cwd=repo_root, + ) + for line in wt_result.stdout.split("\n"): + if line.startswith("branch refs/heads/"): + active_branches.add(line.split("branch refs/heads/", 1)[-1].strip()) + except Exception: + return # Can't determine active branches — bail + + # Also protect the currently checked-out branch and main + try: + head_result = subprocess.run( + ["git", "branch", "--show-current"], + capture_output=True, text=True, timeout=5, cwd=repo_root, + ) + current = head_result.stdout.strip() + if current: + active_branches.add(current) + except Exception: + pass + active_branches.add("main") + + orphaned = [ + b for b in all_branches + if b not in active_branches + and (b.startswith("hermes/hermes-") or b.startswith("pr-")) + ] + + if not orphaned: + return + + # Delete in batches + for i in range(0, len(orphaned), 50): + batch = orphaned[i:i + 50] + try: + subprocess.run( + ["git", "branch", "-D"] + batch, + capture_output=True, text=True, timeout=30, cwd=repo_root, + ) + except Exception as e: + logger.debug("Failed to prune orphaned branches: %s", e) + + logger.debug("Pruned %d orphaned branches", len(orphaned)) + # 
============================================================================ # ASCII Art & Branding # ============================================================================ @@ -3314,6 +3408,22 @@ class HermesCLI: flush_tool_summary() print() + def _notify_session_boundary(self, event_type: str) -> None: + """Fire a session-boundary plugin hook (on_session_finalize or on_session_reset). + + Non-blocking — errors are caught and logged. Safe to call from any + lifecycle point (shutdown, /new, /reset). + """ + try: + from hermes_cli.plugins import invoke_hook as _invoke_hook + _invoke_hook( + event_type, + session_id=self.agent.session_id if self.agent else None, + platform=getattr(self, "platform", None) or "cli", + ) + except Exception: + pass + def new_session(self, silent=False): """Start a fresh session with a new session ID and cleared agent state.""" if self.agent and self.conversation_history: @@ -3321,6 +3431,10 @@ class HermesCLI: self.agent.flush_memories(self.conversation_history) except (Exception, KeyboardInterrupt): pass + self._notify_session_boundary("on_session_finalize") + elif self.agent: + # First session or empty history — still finalize the old session + self._notify_session_boundary("on_session_finalize") old_session_id = self.session_id if self._session_db and old_session_id: @@ -3365,6 +3479,7 @@ class HermesCLI: ) except Exception: pass + self._notify_session_boundary("on_session_reset") if not silent: print("(^_^)v New session started!") diff --git a/cron/jobs.py b/cron/jobs.py index 214da521fe..4096d1fd81 100644 --- a/cron/jobs.py +++ b/cron/jobs.py @@ -574,12 +574,16 @@ def remove_job(job_id: str) -> bool: return False -def mark_job_run(job_id: str, success: bool, error: Optional[str] = None): +def mark_job_run(job_id: str, success: bool, error: Optional[str] = None, + delivery_error: Optional[str] = None): """ Mark a job as having been run. 
Updates last_run_at, last_status, increments completed count, computes next_run_at, and auto-deletes if repeat limit reached. + + ``delivery_error`` is tracked separately from the agent error — a job + can succeed (agent produced output) but fail delivery (platform down). """ jobs = load_jobs() for i, job in enumerate(jobs): @@ -588,6 +592,8 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None): job["last_run_at"] = now job["last_status"] = "ok" if success else "error" job["last_error"] = error if not success else None + # Track delivery failures separately — cleared on successful delivery + job["last_delivery_error"] = delivery_error # Increment completed count if job.get("repeat"): diff --git a/cron/scheduler.py b/cron/scheduler.py index 8d71248b4e..33a9b89935 100644 --- a/cron/scheduler.py +++ b/cron/scheduler.py @@ -196,7 +196,7 @@ def _send_media_via_adapter(adapter, chat_id: str, media_files: list, metadata: logger.warning("Job '%s': failed to send media %s: %s", job.get("id", "?"), media_path, e) -def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None: +def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> Optional[str]: """ Deliver job output to the configured target (origin chat, specific platform, etc.). @@ -204,16 +204,16 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None: use the live adapter first — this supports E2EE rooms (e.g. Matrix) where the standalone HTTP path cannot encrypt. Falls back to standalone send if the adapter path fails or is unavailable. + + Returns None on success, or an error string on failure. 
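A sketch of the contract this gives callers — hypothetical simplified names, mirroring how `tick()` threads the return value of `_deliver_result` into `mark_job_run`:

```python
from typing import Optional

def deliver(content: str, platform_configured: bool) -> Optional[str]:
    # _deliver_result contract: None on success, error string on failure.
    if not platform_configured:
        return "platform 'telegram' not configured/enabled"
    return None

def record_run(job: dict, success: bool,
               delivery_error: Optional[str]) -> dict:
    # Agent success and delivery success are independent: a job can
    # succeed (agent produced output) yet fail delivery.
    job["last_status"] = "ok" if success else "error"
    job["last_delivery_error"] = delivery_error  # None clears it
    return job

job = record_run({}, success=True,
                 delivery_error=deliver("report ready", platform_configured=False))
```

With delivery tracked this way, `hermes cron list` can surface `last_delivery_error` even when `last_status` is `ok`.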
""" target = _resolve_delivery_target(job) if not target: if job.get("deliver", "local") != "local": - logger.warning( - "Job '%s' deliver=%s but no concrete delivery target could be resolved", - job["id"], - job.get("deliver", "local"), - ) - return + msg = f"no delivery target resolved for deliver={job.get('deliver', 'local')}" + logger.warning("Job '%s': %s", job["id"], msg) + return msg + return None # local-only jobs don't deliver — not a failure platform_name = target["platform"] chat_id = target["chat_id"] @@ -239,19 +239,22 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None: } platform = platform_map.get(platform_name.lower()) if not platform: - logger.warning("Job '%s': unknown platform '%s' for delivery", job["id"], platform_name) - return + msg = f"unknown platform '{platform_name}'" + logger.warning("Job '%s': %s", job["id"], msg) + return msg try: config = load_gateway_config() except Exception as e: - logger.error("Job '%s': failed to load gateway config for delivery: %s", job["id"], e) - return + msg = f"failed to load gateway config: {e}" + logger.error("Job '%s': %s", job["id"], msg) + return msg pconfig = config.platforms.get(platform) if not pconfig or not pconfig.enabled: - logger.warning("Job '%s': platform '%s' not configured/enabled", job["id"], platform_name) - return + msg = f"platform '{platform_name}' not configured/enabled" + logger.warning("Job '%s': %s", job["id"], msg) + return msg # Optionally wrap the content with a header/footer so the user knows this # is a cron delivery. 
Wrapping is on by default; set cron.wrap_response: false @@ -307,7 +310,7 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None: if adapter_ok: logger.info("Job '%s': delivered to %s:%s via live adapter", job["id"], platform_name, chat_id) - return + return None except Exception as e: logger.warning( "Job '%s': live adapter delivery to %s:%s failed (%s), falling back to standalone", @@ -329,13 +332,17 @@ def _deliver_result(job: dict, content: str, adapters=None, loop=None) -> None: future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, cleaned_delivery_content, thread_id=thread_id, media_files=media_files)) result = future.result(timeout=30) except Exception as e: - logger.error("Job '%s': delivery to %s:%s failed: %s", job["id"], platform_name, chat_id, e) - return + msg = f"delivery to {platform_name}:{chat_id} failed: {e}" + logger.error("Job '%s': %s", job["id"], msg) + return msg if result and result.get("error"): - logger.error("Job '%s': delivery error: %s", job["id"], result["error"]) - else: - logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id) + msg = f"delivery error: {result['error']}" + logger.error("Job '%s': %s", job["id"], msg) + return msg + + logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id) + return None _SCRIPT_TIMEOUT = 120 # seconds @@ -578,11 +585,9 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]: except Exception as e: logger.warning("Job '%s': failed to load config.yaml, using defaults: %s", job_id, e) - # Reasoning config from env or config.yaml + # Reasoning config from config.yaml from hermes_constants import parse_reasoning_effort - effort = os.getenv("HERMES_REASONING_EFFORT", "") - if not effort: - effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip() + effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip() reasoning_config = parse_reasoning_effort(effort) # Prefill messages 
from env or config.yaml @@ -868,13 +873,15 @@ def tick(verbose: bool = True, adapters=None, loop=None) -> int: logger.info("Job '%s': agent returned %s — skipping delivery", job["id"], SILENT_MARKER) should_deliver = False + delivery_error = None if should_deliver: try: - _deliver_result(job, deliver_content, adapters=adapters, loop=loop) + delivery_error = _deliver_result(job, deliver_content, adapters=adapters, loop=loop) except Exception as de: + delivery_error = str(de) logger.error("Delivery failed for job %s: %s", job["id"], de) - mark_job_run(job["id"], success, error) + mark_job_run(job["id"], success, error, delivery_error=delivery_error) executed += 1 except Exception as e: diff --git a/gateway/platforms/feishu.py b/gateway/platforms/feishu.py index 4bc712f29f..6012a0f1c0 100644 --- a/gateway/platforms/feishu.py +++ b/gateway/platforms/feishu.py @@ -20,6 +20,7 @@ from __future__ import annotations import asyncio import hashlib import hmac +import itertools import json import logging import mimetypes @@ -1052,6 +1053,9 @@ class FeishuAdapter(BasePlatformAdapter): self._media_batch_state = FeishuBatchState() self._pending_media_batches = self._media_batch_state.events self._pending_media_batch_tasks = self._media_batch_state.tasks + # Exec approval button state (approval_id → {session_key, message_id, chat_id}) + self._approval_state: Dict[int, Dict[str, str]] = {} + self._approval_counter = itertools.count(1) self._load_seen_message_ids() @staticmethod @@ -1394,6 +1398,104 @@ class FeishuAdapter(BasePlatformAdapter): logger.error("[Feishu] Failed to edit message %s: %s", message_id, exc, exc_info=True) return SendResult(success=False, error=str(exc)) + async def send_exec_approval( + self, chat_id: str, command: str, session_key: str, + description: str = "dangerous command", + metadata: Optional[Dict[str, Any]] = None, + ) -> SendResult: + """Send an interactive card with approval buttons. 
+ + The buttons carry ``hermes_action`` in their value dict so that + ``_handle_card_action_event`` can intercept them and call + ``resolve_gateway_approval()`` to unblock the waiting agent thread. + """ + if not self._client: + return SendResult(success=False, error="Not connected") + + try: + approval_id = next(self._approval_counter) + cmd_preview = command[:3000] + "..." if len(command) > 3000 else command + + def _btn(label: str, action_name: str, btn_type: str = "default") -> dict: + return { + "tag": "button", + "text": {"tag": "plain_text", "content": label}, + "type": btn_type, + "value": {"hermes_action": action_name, "approval_id": approval_id}, + } + + card = { + "config": {"wide_screen_mode": True}, + "header": { + "title": {"content": "⚠️ Command Approval Required", "tag": "plain_text"}, + "template": "orange", + }, + "elements": [ + { + "tag": "markdown", + "content": f"```\n{cmd_preview}\n```\n**Reason:** {description}", + }, + { + "tag": "action", + "actions": [ + _btn("✅ Allow Once", "approve_once", "primary"), + _btn("✅ Session", "approve_session"), + _btn("✅ Always", "approve_always"), + _btn("❌ Deny", "deny", "danger"), + ], + }, + ], + } + + payload = json.dumps(card, ensure_ascii=False) + response = await self._feishu_send_with_retry( + chat_id=chat_id, + msg_type="interactive", + payload=payload, + reply_to=None, + metadata=metadata, + ) + + result = self._finalize_send_result(response, "send_exec_approval failed") + if result.success: + self._approval_state[approval_id] = { + "session_key": session_key, + "message_id": result.message_id or "", + "chat_id": chat_id, + } + return result + except Exception as exc: + logger.warning("[Feishu] send_exec_approval failed: %s", exc) + return SendResult(success=False, error=str(exc)) + + async def _update_approval_card( + self, message_id: str, label: str, user_name: str, choice: str, + ) -> None: + """Replace the approval card with a resolved status card.""" + if not self._client or not message_id: 
+ return + icon = "❌" if choice == "deny" else "✅" + card = { + "config": {"wide_screen_mode": True}, + "header": { + "title": {"content": f"{icon} {label}", "tag": "plain_text"}, + "template": "red" if choice == "deny" else "green", + }, + "elements": [ + { + "tag": "markdown", + "content": f"{icon} **{label}** by {user_name}", + }, + ], + } + try: + payload = json.dumps(card, ensure_ascii=False) + body = self._build_update_message_body(msg_type="interactive", content=payload) + request = self._build_update_message_request(message_id=message_id, request_body=body) + await asyncio.to_thread(self._client.im.v1.message.update, request) + except Exception as exc: + logger.warning("[Feishu] Failed to update approval card %s: %s", message_id, exc) + async def send_voice( self, chat_id: str, @@ -1820,6 +1922,52 @@ class FeishuAdapter(BasePlatformAdapter): action = getattr(event, "action", None) action_tag = str(getattr(action, "tag", "") or "button") action_value = getattr(action, "value", {}) or {} + + # --- Exec approval button intercept --- + hermes_action = action_value.get("hermes_action") if isinstance(action_value, dict) else None + if hermes_action: + approval_id = action_value.get("approval_id") + state = self._approval_state.pop(approval_id, None) + if not state: + logger.debug("[Feishu] Approval %s already resolved or unknown", approval_id) + return + + choice_map = { + "approve_once": "once", + "approve_session": "session", + "approve_always": "always", + "deny": "deny", + } + choice = choice_map.get(hermes_action, "deny") + + label_map = { + "once": "Approved once", + "session": "Approved for session", + "always": "Approved permanently", + "deny": "Denied", + } + label = label_map.get(choice, "Resolved") + + # Resolve sender name for the status card + sender_id = SimpleNamespace(open_id=open_id, user_id=None, union_id=None) + sender_profile = await self._resolve_sender_profile(sender_id) + user_name = sender_profile.get("user_name") or open_id + + # Resolve 
the approval — unblocks the agent thread + try: + from tools.approval import resolve_gateway_approval + count = resolve_gateway_approval(state["session_key"], choice) + logger.info( + "Feishu button resolved %d approval(s) for session %s (choice=%s, user=%s)", + count, state["session_key"], choice, user_name, + ) + except Exception as exc: + logger.error("Failed to resolve gateway approval from Feishu button: %s", exc) + + # Update the card to show the decision + await self._update_approval_card(state.get("message_id", ""), label, user_name, choice) + return + synthetic_text = f"/card {action_tag}" if action_value: try: diff --git a/gateway/run.py b/gateway/run.py index 99c71d9156..7a551be168 100644 --- a/gateway/run.py +++ b/gateway/run.py @@ -921,12 +921,11 @@ class GatewayRunner: @staticmethod def _load_reasoning_config() -> dict | None: - """Load reasoning effort from config with env fallback. + """Load reasoning effort from config.yaml. - Checks agent.reasoning_effort in config.yaml first, then - HERMES_REASONING_EFFORT as a fallback. Valid: "xhigh", "high", - "medium", "low", "minimal", "none". Returns None to use default - (medium). + Reads agent.reasoning_effort from config.yaml. Valid: "xhigh", + "high", "medium", "low", "minimal", "none". Returns None to use + default (medium). 
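The resolution rule described in the docstring above can be sketched as follows — a hypothetical body, since the real `parse_reasoning_effort` lives in `hermes_constants`:

```python
VALID_EFFORTS = {"xhigh", "high", "medium", "low", "minimal", "none"}

def parse_reasoning_effort(effort: str):
    # Empty or unrecognized values return None, which callers treat
    # as "use the default (medium)".
    effort = (effort or "").strip().lower()
    return effort if effort in VALID_EFFORTS else None
```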
""" from hermes_constants import parse_reasoning_effort effort = "" @@ -939,8 +938,6 @@ class GatewayRunner: effort = str(cfg.get("agent", {}).get("reasoning_effort", "") or "").strip() except Exception: pass - if not effort: - effort = os.getenv("HERMES_REASONING_EFFORT", "") result = parse_reasoning_effort(effort) if effort and effort.strip() and result is None: logger.warning("Unknown reasoning_effort '%s', using default (medium)", effort) @@ -1484,6 +1481,14 @@ class GatewayRunner: logger.debug("Interrupted running agent for session %s during shutdown", session_key[:20]) except Exception as e: logger.debug("Failed interrupting agent during shutdown: %s", e) + # Fire plugin on_session_finalize hook before memory shutdown + try: + from hermes_cli.plugins import invoke_hook as _invoke_hook + _invoke_hook("on_session_finalize", + session_id=getattr(agent, 'session_id', None), + platform="gateway") + except Exception: + pass # Shut down memory provider at actual session boundary try: if hasattr(agent, 'shutdown_memory_provider'): @@ -3277,6 +3282,15 @@ class GatewayRunner: # the configured default instead of the previously switched model. self._session_model_overrides.pop(session_key, None) + # Fire plugin on_session_finalize hook (session boundary) + try: + from hermes_cli.plugins import invoke_hook as _invoke_hook + _old_sid = old_entry.session_id if old_entry else None + _invoke_hook("on_session_finalize", session_id=_old_sid, + platform=source.platform.value if source.platform else "") + except Exception: + pass + # Emit session:end hook (session is ending) await self.hooks.emit("session:end", { "platform": source.platform.value if source.platform else "", @@ -3290,7 +3304,7 @@ class GatewayRunner: "user_id": source.user_id, "session_key": session_key, }) - + # Resolve session config info to surface to the user try: session_info = self._format_session_info() @@ -3301,9 +3315,18 @@ class GatewayRunner: header = "✨ Session reset! Starting fresh." 
else: # No existing session, just create one - self.session_store.get_or_create_session(source, force_new=True) + new_entry = self.session_store.get_or_create_session(source, force_new=True) header = "✨ New session started!" + # Fire plugin on_session_reset hook (new session guaranteed to exist) + try: + from hermes_cli.plugins import invoke_hook as _invoke_hook + _new_sid = new_entry.session_id if new_entry else None + _invoke_hook("on_session_reset", session_id=_new_sid, + platform=source.platform.value if source.platform else "") + except Exception: + pass + if session_info: return f"{header}\n\n{session_info}" return header diff --git a/gateway/stream_consumer.py b/gateway/stream_consumer.py index 2cda33642a..5522c631db 100644 --- a/gateway/stream_consumer.py +++ b/gateway/stream_consumer.py @@ -74,6 +74,8 @@ class GatewayStreamConsumer: self._edit_supported = True # Disabled on first edit failure (Signal/Email/HA) self._last_edit_time = 0.0 self._last_sent_text = "" # Track last-sent text to skip redundant edits + self._fallback_final_send = False + self._fallback_prefix = "" @property def already_sent(self) -> bool: @@ -138,12 +140,19 @@ class GatewayStreamConsumer: while ( len(self._accumulated) > _safe_limit and self._message_id is not None + and self._edit_supported ): split_at = self._accumulated.rfind("\n", 0, _safe_limit) if split_at < _safe_limit // 2: split_at = _safe_limit chunk = self._accumulated[:split_at] await self._send_or_edit(chunk) + if self._fallback_final_send: + # Edit failed while attempting to split an oversized + # message. Keep the full accumulated text intact so + # the fallback final-send path can deliver the + # remaining continuation without dropping content. 
+ break self._accumulated = self._accumulated[split_at:].lstrip("\n") self._message_id = None self._last_sent_text = "" @@ -156,9 +165,17 @@ class GatewayStreamConsumer: self._last_edit_time = time.monotonic() if got_done: - # Final edit without cursor - if self._accumulated and self._message_id: - await self._send_or_edit(self._accumulated) + # Final edit without cursor. If progressive editing failed + # mid-stream, send a single continuation/fallback message + # here instead of letting the base gateway path send the + # full response again. + if self._accumulated: + if self._fallback_final_send: + await self._send_fallback_final(self._accumulated) + elif self._message_id: + await self._send_or_edit(self._accumulated) + elif not self._already_sent: + await self._send_or_edit(self._accumulated) return # Tool boundary: the should_edit block above already flushed @@ -169,6 +186,8 @@ class GatewayStreamConsumer: self._message_id = None self._accumulated = "" self._last_sent_text = "" + self._fallback_final_send = False + self._fallback_prefix = "" await asyncio.sleep(0.05) # Small yield to not busy-loop @@ -207,6 +226,86 @@ class GatewayStreamConsumer: # Strip trailing whitespace/newlines but preserve leading content return cleaned.rstrip() + def _visible_prefix(self) -> str: + """Return the visible text already shown in the streamed message.""" + prefix = self._last_sent_text or "" + if self.cfg.cursor and prefix.endswith(self.cfg.cursor): + prefix = prefix[:-len(self.cfg.cursor)] + return self._clean_for_display(prefix) + + def _continuation_text(self, final_text: str) -> str: + """Return only the part of final_text the user has not already seen.""" + prefix = self._fallback_prefix or self._visible_prefix() + if prefix and final_text.startswith(prefix): + return final_text[len(prefix):].lstrip() + return final_text + + @staticmethod + def _split_text_chunks(text: str, limit: int) -> list[str]: + """Split text into reasonably sized chunks for fallback sends.""" + if 
len(text) <= limit: + return [text] + chunks: list[str] = [] + remaining = text + while len(remaining) > limit: + split_at = remaining.rfind("\n", 0, limit) + if split_at < limit // 2: + split_at = limit + chunks.append(remaining[:split_at]) + remaining = remaining[split_at:].lstrip("\n") + if remaining: + chunks.append(remaining) + return chunks + + async def _send_fallback_final(self, text: str) -> None: + """Send the final continuation after streaming edits stop working.""" + final_text = self._clean_for_display(text) + continuation = self._continuation_text(final_text) + self._fallback_final_send = False + if not continuation.strip(): + # Nothing new to send — the visible partial already matches final text. + self._already_sent = True + return + + raw_limit = getattr(self.adapter, "MAX_MESSAGE_LENGTH", 4096) + safe_limit = max(500, raw_limit - 100) + chunks = self._split_text_chunks(continuation, safe_limit) + + last_message_id: Optional[str] = None + last_successful_chunk = "" + sent_any_chunk = False + for chunk in chunks: + result = await self.adapter.send( + chat_id=self.chat_id, + content=chunk, + metadata=self.metadata, + ) + if not result.success: + if sent_any_chunk: + # Some continuation text already reached the user. Suppress + # the base gateway final-send path so we don't resend the + # full response and create another duplicate. + self._already_sent = True + self._message_id = last_message_id + self._last_sent_text = last_successful_chunk + self._fallback_prefix = "" + return + # No fallback chunk reached the user — allow the normal gateway + # final-send path to try one more time. 
+ self._already_sent = False + self._message_id = None + self._last_sent_text = "" + self._fallback_prefix = "" + return + sent_any_chunk = True + last_successful_chunk = chunk + last_message_id = result.message_id or last_message_id + + self._message_id = last_message_id + self._already_sent = True + self._last_sent_text = chunks[-1] + self._fallback_prefix = "" + async def _send_or_edit(self, text: str) -> None: """Send or edit the streaming message.""" # Strip MEDIA: directives so they don't appear as visible text. @@ -232,14 +331,16 @@ class GatewayStreamConsumer: self._last_sent_text = text else: # If an edit fails mid-stream (especially Telegram flood control), - # stop progressive edits and let the normal final send path deliver - # the complete answer instead of leaving the user with a partial. + # stop progressive edits and send only the missing tail once the + # final response is available. logger.debug("Edit failed, disabling streaming for this adapter") + self._fallback_prefix = self._visible_prefix() + self._fallback_final_send = True self._edit_supported = False - self._already_sent = False + self._already_sent = True else: # Editing not supported — skip intermediate updates. - # The final response will be sent by the normal path. + # The final response will be sent by the fallback path. 
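The fallback path above sends only the unseen tail, split at newlines near the platform limit. A standalone sketch of the logic behind `_continuation_text` and `_split_text_chunks`:

```python
def continuation(final_text: str, visible_prefix: str) -> str:
    # Only the part of the final answer the user has not already seen.
    if visible_prefix and final_text.startswith(visible_prefix):
        return final_text[len(visible_prefix):].lstrip()
    return final_text

def split_chunks(text: str, limit: int) -> list[str]:
    # Prefer breaking at a newline in the upper half of the window;
    # otherwise hard-split at the limit.
    chunks: list[str] = []
    while len(text) > limit:
        split_at = text.rfind("\n", 0, limit)
        if split_at < limit // 2:
            split_at = limit
        chunks.append(text[:split_at])
        text = text[split_at:].lstrip("\n")
    if text:
        chunks.append(text)
    return chunks
```

If the visible prefix matches, the user receives only the continuation chunks; if edits mangled the prefix, the full final text is re-sent rather than risking a gap.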
pass else: # First message — send new diff --git a/hermes_cli/__init__.py b/hermes_cli/__init__.py index 0873d3d29c..959332e81c 100644 --- a/hermes_cli/__init__.py +++ b/hermes_cli/__init__.py @@ -11,5 +11,5 @@ Provides subcommands for: - hermes cron - Manage cron jobs """ -__version__ = "0.7.0" -__release_date__ = "2026.4.3" +__version__ = "0.8.0" +__release_date__ = "2026.4.8" diff --git a/hermes_cli/cron.py b/hermes_cli/cron.py index d10513a280..e0ab6007a8 100644 --- a/hermes_cli/cron.py +++ b/hermes_cli/cron.py @@ -93,6 +93,21 @@ def cron_list(show_all: bool = False): script = job.get("script") if script: print(f" Script: {script}") + + # Execution history + last_status = job.get("last_status") + if last_status: + last_run = job.get("last_run_at", "?") + if last_status == "ok": + status_display = color("ok", Colors.GREEN) + else: + status_display = color(f"{last_status}: {job.get('last_error', '?')}", Colors.RED) + print(f" Last run: {last_run} {status_display}") + + delivery_err = job.get("last_delivery_error") + if delivery_err: + print(f" {color('⚠ Delivery failed:', Colors.YELLOW)} {delivery_err}") + print() from hermes_cli.gateway import find_gateway_pids diff --git a/hermes_cli/doctor.py b/hermes_cli/doctor.py index 876ab15d57..361e81d214 100644 --- a/hermes_cli/doctor.py +++ b/hermes_cli/doctor.py @@ -812,69 +812,83 @@ def run_doctor(args): check_warn("No GITHUB_TOKEN", f"(60 req/hr rate limit — set in {_DHH}/.env for better rates)") # ========================================================================= - # Honcho memory + # Memory Provider (only check the active provider, if any) # ========================================================================= print() - print(color("◆ Honcho Memory", Colors.CYAN, Colors.BOLD)) + print(color("◆ Memory Provider", Colors.CYAN, Colors.BOLD)) + _active_memory_provider = "" try: - from plugins.memory.honcho.client import HonchoClientConfig, resolve_config_path - hcfg = HonchoClientConfig.from_global_config() - 
_honcho_cfg_path = resolve_config_path() + import yaml as _yaml + _mem_cfg_path = HERMES_HOME / "config.yaml" + if _mem_cfg_path.exists(): + with open(_mem_cfg_path) as _f: + _raw_cfg = _yaml.safe_load(_f) or {} + _active_memory_provider = (_raw_cfg.get("memory") or {}).get("provider", "") + except Exception: + pass - if not _honcho_cfg_path.exists(): - check_warn("Honcho config not found", "run: hermes memory setup") - elif not hcfg.enabled: - check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)") - elif not (hcfg.api_key or hcfg.base_url): - check_fail("Honcho API key or base URL not set", "run: hermes memory setup") - issues.append("No Honcho API key — run 'hermes memory setup'") - else: - from plugins.memory.honcho.client import get_honcho_client, reset_honcho_client - reset_honcho_client() - try: - get_honcho_client(hcfg) - check_ok( - "Honcho connected", - f"workspace={hcfg.workspace_id} mode={hcfg.recall_mode} freq={hcfg.write_frequency}", - ) - except Exception as _e: - check_fail("Honcho connection failed", str(_e)) - issues.append(f"Honcho unreachable: {_e}") - except ImportError: - check_warn("honcho-ai not installed", "pip install honcho-ai") - except Exception as _e: - check_warn("Honcho check failed", str(_e)) + if not _active_memory_provider: + check_ok("Built-in memory active", "(no external provider configured — this is fine)") + elif _active_memory_provider == "honcho": + try: + from plugins.memory.honcho.client import HonchoClientConfig, resolve_config_path + hcfg = HonchoClientConfig.from_global_config() + _honcho_cfg_path = resolve_config_path() - # ========================================================================= - # Mem0 memory - # ========================================================================= - print() - print(color("◆ Mem0 Memory", Colors.CYAN, Colors.BOLD)) - - try: - from plugins.memory.mem0 import _load_config as _load_mem0_config - mem0_cfg = _load_mem0_config() - mem0_key = 
mem0_cfg.get("api_key", "") - if mem0_key: - check_ok("Mem0 API key configured") - check_info(f"user_id={mem0_cfg.get('user_id', '?')} agent_id={mem0_cfg.get('agent_id', '?')}") - # Check if mem0.json exists but is missing api_key (the bug we fixed) - mem0_json = HERMES_HOME / "mem0.json" - if mem0_json.exists(): + if not _honcho_cfg_path.exists(): + check_warn("Honcho config not found", "run: hermes memory setup") + elif not hcfg.enabled: + check_info(f"Honcho disabled (set enabled: true in {_honcho_cfg_path} to activate)") + elif not (hcfg.api_key or hcfg.base_url): + check_fail("Honcho API key or base URL not set", "run: hermes memory setup") + issues.append("No Honcho API key — run 'hermes memory setup'") + else: + from plugins.memory.honcho.client import get_honcho_client, reset_honcho_client + reset_honcho_client() try: - import json as _json - file_cfg = _json.loads(mem0_json.read_text()) - if not file_cfg.get("api_key") and mem0_key: - check_info("api_key from .env (not in mem0.json) — this is fine") - except Exception: - pass - else: - check_warn("Mem0 not configured", "(set MEM0_API_KEY in .env or run hermes memory setup)") - except ImportError: - check_warn("Mem0 plugin not loadable", "(optional)") - except Exception as _e: - check_warn("Mem0 check failed", str(_e)) + get_honcho_client(hcfg) + check_ok( + "Honcho connected", + f"workspace={hcfg.workspace_id} mode={hcfg.recall_mode} freq={hcfg.write_frequency}", + ) + except Exception as _e: + check_fail("Honcho connection failed", str(_e)) + issues.append(f"Honcho unreachable: {_e}") + except ImportError: + check_fail("honcho-ai not installed", "pip install honcho-ai") + issues.append("Honcho is set as memory provider but honcho-ai is not installed") + except Exception as _e: + check_warn("Honcho check failed", str(_e)) + elif _active_memory_provider == "mem0": + try: + from plugins.memory.mem0 import _load_config as _load_mem0_config + mem0_cfg = _load_mem0_config() + mem0_key = mem0_cfg.get("api_key", 
"") + if mem0_key: + check_ok("Mem0 API key configured") + check_info(f"user_id={mem0_cfg.get('user_id', '?')} agent_id={mem0_cfg.get('agent_id', '?')}") + else: + check_fail("Mem0 API key not set", "(set MEM0_API_KEY in .env or run hermes memory setup)") + issues.append("Mem0 is set as memory provider but API key is missing") + except ImportError: + check_fail("Mem0 plugin not loadable", "pip install mem0ai") + issues.append("Mem0 is set as memory provider but mem0ai is not installed") + except Exception as _e: + check_warn("Mem0 check failed", str(_e)) + else: + # Generic check for other memory providers (openviking, hindsight, etc.) + try: + from plugins.memory import load_memory_provider + _provider = load_memory_provider(_active_memory_provider) + if _provider and _provider.is_available(): + check_ok(f"{_active_memory_provider} provider active") + elif _provider: + check_warn(f"{_active_memory_provider} configured but not available", "run: hermes memory status") + else: + check_warn(f"{_active_memory_provider} plugin not found", "run: hermes memory setup") + except Exception as _e: + check_warn(f"{_active_memory_provider} check failed", str(_e)) # ========================================================================= # Profiles diff --git a/hermes_cli/model_switch.py b/hermes_cli/model_switch.py index 988eeebdf1..07efbcf4a6 100644 --- a/hermes_cli/model_switch.py +++ b/hermes_cli/model_switch.py @@ -791,12 +791,12 @@ def list_authenticated_providers( if overlay.auth_type in ("oauth_device_code", "oauth_external", "external_process"): # These use auth stores, not env vars — check for auth.json entries try: - from hermes_cli.auth import _read_auth_store - store = _read_auth_store() - if store and pid in store: + from hermes_cli.auth import _load_auth_store + store = _load_auth_store() + if store and (pid in store.get("providers", {}) or pid in store.get("credential_pool", {})): has_creds = True - except Exception: - pass + except Exception as exc: + 
logger.debug("Auth store check failed for %s: %s", pid, exc) if not has_creds: continue diff --git a/hermes_cli/models.py b/hermes_cli/models.py index 4b37bc9e73..aa68f877d9 100644 --- a/hermes_cli/models.py +++ b/hermes_cli/models.py @@ -144,18 +144,22 @@ _PROVIDER_MODELS: dict[str, list[str]] = { "kimi-k2-0905-preview", ], "minimax": [ - "MiniMax-M2.7", - "MiniMax-M2.7-highspeed", + "MiniMax-M1", + "MiniMax-M1-40k", + "MiniMax-M1-80k", + "MiniMax-M1-128k", + "MiniMax-M1-256k", "MiniMax-M2.5", - "MiniMax-M2.5-highspeed", - "MiniMax-M2.1", + "MiniMax-M2.7", ], "minimax-cn": [ - "MiniMax-M2.7", - "MiniMax-M2.7-highspeed", + "MiniMax-M1", + "MiniMax-M1-40k", + "MiniMax-M1-80k", + "MiniMax-M1-128k", + "MiniMax-M1-256k", "MiniMax-M2.5", - "MiniMax-M2.5-highspeed", - "MiniMax-M2.1", + "MiniMax-M2.7", ], "anthropic": [ "claude-opus-4-6", diff --git a/hermes_cli/plugins.py b/hermes_cli/plugins.py index 23a655aa30..7323bbd011 100644 --- a/hermes_cli/plugins.py +++ b/hermes_cli/plugins.py @@ -61,6 +61,8 @@ VALID_HOOKS: Set[str] = { "post_api_request", "on_session_start", "on_session_end", + "on_session_finalize", + "on_session_reset", } ENTRY_POINTS_GROUP = "hermes_agent.plugins" diff --git a/hermes_cli/runtime_provider.py b/hermes_cli/runtime_provider.py index 9c82ef62af..fa9d493980 100644 --- a/hermes_cli/runtime_provider.py +++ b/hermes_cli/runtime_provider.py @@ -163,6 +163,16 @@ def _resolve_runtime_from_pool_entry( api_mode = _copilot_runtime_api_mode(model_cfg, getattr(entry, "runtime_api_key", "")) else: configured_provider = str(model_cfg.get("provider") or "").strip().lower() + # Honour model.base_url from config.yaml when the configured provider + # matches this provider — same pattern as the Anthropic branch above. + # Only override when the pool entry has no explicit base_url (i.e. it + # fell back to the hardcoded default). Env var overrides win (#6039). 
+ pconfig = PROVIDER_REGISTRY.get(provider) + pool_url_is_default = pconfig and base_url.rstrip("/") == pconfig.inference_base_url.rstrip("/") + if configured_provider == provider and pool_url_is_default: + cfg_base_url = str(model_cfg.get("base_url") or "").strip().rstrip("/") + if cfg_base_url: + base_url = cfg_base_url configured_mode = _parse_api_mode(model_cfg.get("api_mode")) if configured_mode and _provider_supports_explicit_api_mode(provider, configured_provider): api_mode = configured_mode @@ -724,7 +734,15 @@ def resolve_runtime_provider( pconfig = PROVIDER_REGISTRY.get(provider) if pconfig and pconfig.auth_type == "api_key": creds = resolve_api_key_provider_credentials(provider) - base_url = creds.get("base_url", "").rstrip("/") + # Honour model.base_url from config.yaml when the configured provider + # matches this provider — mirrors the Anthropic path above. Without + # this, users who set model.base_url to e.g. api.minimaxi.com/anthropic + # (China endpoint) still get the hardcoded api.minimax.io default (#6039). 
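The base_url precedence the two hunks above implement can be sketched standalone — a minimal sketch, assuming the behavior described in the comments; `resolve_base_url`, `DEFAULT_URL`, and the example endpoints are illustrative names for this note, not APIs from the codebase:

```python
def resolve_base_url(provider: str, configured_provider: str,
                     cfg_base_url: str, pool_base_url: str,
                     default_url: str) -> str:
    """config.yaml model.base_url wins only when it targets this provider
    and the pool entry fell back to the hardcoded default (env var
    overrides, which land in pool_base_url, therefore still win)."""
    pool_is_default = pool_base_url.rstrip("/") == default_url.rstrip("/")
    if configured_provider == provider and pool_is_default and cfg_base_url:
        return cfg_base_url.rstrip("/")
    return pool_base_url.rstrip("/")

# The #6039 scenario: user sets the China endpoint in config.yaml while
# the pool entry still carries the hardcoded api.minimax.io default.
url = resolve_base_url(
    "minimax", "minimax",
    "https://api.minimaxi.com/anthropic",   # model.base_url from config.yaml
    "https://api.minimax.io/anthropic",     # pool entry (default)
    "https://api.minimax.io/anthropic",     # registry inference_base_url
)
# → "https://api.minimaxi.com/anthropic"
```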
+ cfg_provider = str(model_cfg.get("provider") or "").strip().lower() + cfg_base_url = "" + if cfg_provider == provider: + cfg_base_url = (model_cfg.get("base_url") or "").strip().rstrip("/") + base_url = cfg_base_url or creds.get("base_url", "").rstrip("/") api_mode = "chat_completions" if provider == "copilot": api_mode = _copilot_runtime_api_mode(model_cfg, creds.get("api_key", "")) diff --git a/hermes_cli/setup.py b/hermes_cli/setup.py index 2407ca275d..43c3b086d9 100644 --- a/hermes_cli/setup.py +++ b/hermes_cli/setup.py @@ -105,8 +105,8 @@ _DEFAULT_PROVIDER_MODELS = { ], "zai": ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"], "kimi-coding": ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"], - "minimax": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"], - "minimax-cn": ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"], + "minimax": ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"], + "minimax-cn": ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"], "ai-gateway": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5", "google/gemini-3-flash"], "kilocode": ["anthropic/claude-opus-4.6", "anthropic/claude-sonnet-4.6", "openai/gpt-5.4", "google/gemini-3-pro-preview", "google/gemini-3-flash-preview"], "opencode-zen": ["gpt-5.4", "gpt-5.3-codex", "claude-sonnet-4-6", "gemini-3-flash", "glm-5", "kimi-k2.5", "minimax-m2.7"], diff --git a/plugins/memory/hindsight/__init__.py b/plugins/memory/hindsight/__init__.py index 51feb3cb61..199a7dd5cd 100644 --- a/plugins/memory/hindsight/__init__.py +++ b/plugins/memory/hindsight/__init__.py @@ -23,6 +23,8 @@ import json import logging import os import threading + +from hermes_constants import get_hermes_home from typing import Any, Dict, List from 
agent.memory_provider import MemoryProvider @@ -142,7 +144,6 @@ def _load_config() -> dict: 3. Environment variables """ from pathlib import Path - from hermes_constants import get_hermes_home # Profile-scoped path (preferred) profile_path = get_hermes_home() / "hindsight" / "config.json" diff --git a/pyproject.toml b/pyproject.toml index c35c94e21f..8982e6e46b 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" [project] name = "hermes-agent" -version = "0.7.0" +version = "0.8.0" description = "The self-improving AI agent — creates skills from experience, improves them during use, and runs anywhere" readme = "README.md" requires-python = ">=3.11" diff --git a/run_agent.py b/run_agent.py index a990045b06..a0ae15a162 100644 --- a/run_agent.py +++ b/run_agent.py @@ -76,6 +76,7 @@ from hermes_constants import OPENROUTER_BASE_URL # Agent internals extracted to agent/ package for modularity from agent.memory_manager import build_memory_context_block +from agent.retry_utils import jittered_backoff from agent.prompt_builder import ( DEFAULT_AGENT_IDENTITY, PLATFORM_HINTS, MEMORY_GUIDANCE, SESSION_SEARCH_GUIDANCE, SKILLS_GUIDANCE, @@ -86,6 +87,7 @@ from agent.model_metadata import ( estimate_tokens_rough, estimate_messages_tokens_rough, estimate_request_tokens_rough, get_next_probe_tier, parse_context_limit_from_error, save_context_length, is_local_endpoint, + query_ollama_num_ctx, ) from agent.context_compressor import ContextCompressor from agent.subdirectory_hints import SubdirectoryHintTracker @@ -1160,6 +1162,33 @@ class AIAgent: self.session_cost_status = "unknown" self.session_cost_source = "none" + # ── Ollama num_ctx injection ── + # Ollama defaults to 2048 context regardless of the model's capabilities. + # When running against an Ollama server, detect the model's max context + # and pass num_ctx on every chat request so the full window is used. 
+ # User override: set model.ollama_num_ctx in config.yaml to cap VRAM use. + self._ollama_num_ctx: int | None = None + _ollama_num_ctx_override = None + if isinstance(_model_cfg, dict): + _ollama_num_ctx_override = _model_cfg.get("ollama_num_ctx") + if _ollama_num_ctx_override is not None: + try: + self._ollama_num_ctx = int(_ollama_num_ctx_override) + except (TypeError, ValueError): + logger.debug("Invalid ollama_num_ctx config value: %r", _ollama_num_ctx_override) + if self._ollama_num_ctx is None and self.base_url and is_local_endpoint(self.base_url): + try: + _detected = query_ollama_num_ctx(self.model, self.base_url) + if _detected and _detected > 0: + self._ollama_num_ctx = _detected + except Exception as exc: + logger.debug("Ollama num_ctx detection failed: %s", exc) + if self._ollama_num_ctx and not self.quiet_mode: + logger.info( + "Ollama num_ctx: will request %d tokens (model max from /api/show)", + self._ollama_num_ctx, + ) + if not self.quiet_mode: if compression_enabled: print(f"📊 Context limit: {self.context_compressor.context_length:,} tokens (compress at {int(compression_threshold*100)}% = {self.context_compressor.threshold_tokens:,})") @@ -5400,6 +5429,15 @@ class AIAgent: if _is_nous: extra_body["tags"] = ["product=hermes-agent"] + # Ollama num_ctx: override the 2048 default so the model actually + # uses the context window it was trained for. Passed via the OpenAI + # SDK's extra_body → options.num_ctx, which Ollama's OpenAI-compat + # endpoint forwards to the runner as --ctx-size. 
+ if self._ollama_num_ctx: + options = extra_body.get("options", {}) + options["num_ctx"] = self._ollama_num_ctx + extra_body["options"] = options + if extra_body: api_kwargs["extra_body"] = extra_body @@ -7250,6 +7288,7 @@ class AIAgent: codex_auth_retry_attempted=False anthropic_auth_retry_attempted=False nous_auth_retry_attempted=False + thinking_sig_retry_attempted = False has_retried_429 = False restart_with_compressed_messages = False restart_with_length_continuation = False @@ -7465,7 +7504,8 @@ class AIAgent: } # Longer backoff for rate limiting (likely cause of None choices) - wait_time = min(5 * (2 ** (retry_count - 1)), 120) # 5s, 10s, 20s, 40s, 80s, 120s + # Jittered exponential: 5s base, 120s cap + random jitter + wait_time = jittered_backoff(retry_count, base_delay=5.0, max_delay=120.0) self._vprint(f"{self.log_prefix}⏳ Retrying in {wait_time}s (extended backoff for possible rate limit)...", force=True) logging.warning(f"Invalid API response (retry {retry_count}/{max_retries}): {', '.join(error_details)} | Provider: {provider_name}") @@ -7838,8 +7878,38 @@ class AIAgent: print(f"{self.log_prefix} • Check ANTHROPIC_API_KEY in {_dhh}/.env for API keys or legacy token values") print(f"{self.log_prefix} • For API keys: verify at https://console.anthropic.com/settings/keys") print(f"{self.log_prefix} • For Claude Code: run 'claude /login' to refresh, then retry") - print(f"{self.log_prefix} • Clear stale keys: hermes config set ANTHROPIC_TOKEN \"\"") - print(f"{self.log_prefix} • Legacy cleanup: hermes config set ANTHROPIC_API_KEY \"\"") + print(f"{self.log_prefix} • Legacy cleanup: hermes config set ANTHROPIC_TOKEN \"\"") + print(f"{self.log_prefix} • Clear stale keys: hermes config set ANTHROPIC_API_KEY \"\"") + + # ── Thinking block signature recovery ───────────────── + # Anthropic signs thinking blocks against the full turn + # content. 
Any upstream mutation (context compression, + # session truncation, message merging) invalidates the + # signature → HTTP 400. Recovery: strip reasoning_details + # from all messages so the next retry sends no thinking + # blocks at all. One-shot — don't retry infinitely. + if ( + self.api_mode == "anthropic_messages" + and status_code == 400 + and not thinking_sig_retry_attempted + ): + _err_msg_lower = str(api_error).lower() + if "signature" in _err_msg_lower and "thinking" in _err_msg_lower: + thinking_sig_retry_attempted = True + for _m in messages: + if isinstance(_m, dict): + _m.pop("reasoning_details", None) + self._vprint( + f"{self.log_prefix}⚠️ Thinking block signature invalid — " + f"stripped all thinking blocks, retrying...", + force=True, + ) + logging.warning( + "%sThinking block signature recovery: stripped " + "reasoning_details from %d messages", + self.log_prefix, len(messages), + ) + continue retry_count += 1 elapsed_time = time.time() - api_start_time @@ -8322,7 +8392,7 @@ class AIAgent: _retry_after = min(int(_ra_raw), 120) # Cap at 2 minutes except (TypeError, ValueError): pass - wait_time = _retry_after if _retry_after else min(2 ** retry_count, 60) + wait_time = _retry_after if _retry_after else jittered_backoff(retry_count, base_delay=2.0, max_delay=60.0) if is_rate_limited: self._emit_status(f"⏱️ Rate limit reached. 
Waiting {wait_time}s before retry (attempt {retry_count + 1}/{max_retries})...") else: diff --git a/tests/agent/test_anthropic_adapter.py b/tests/agent/test_anthropic_adapter.py index 9aa8c10b17..0024fac624 100644 --- a/tests/agent/test_anthropic_adapter.py +++ b/tests/agent/test_anthropic_adapter.py @@ -1276,6 +1276,258 @@ class TestRoleAlternation: assert [m["role"] for m in result] == ["user", "assistant", "user"] +# --------------------------------------------------------------------------- +# Thinking block signature management +# --------------------------------------------------------------------------- + + +class TestThinkingBlockSignatureManagement: + """Tests for the thinking block handling strategy: + strip from old turns, preserve latest signed, downgrade unsigned.""" + + def test_thinking_stripped_from_non_last_assistant(self): + """Thinking blocks are removed from all assistant messages except the last.""" + messages = [ + { + "role": "assistant", + "content": "", + "tool_calls": [ + {"id": "tc_1", "function": {"name": "tool1", "arguments": "{}"}}, + ], + "reasoning_details": [ + {"type": "thinking", "thinking": "Old reasoning.", "signature": "sig_old"}, + ], + }, + {"role": "tool", "tool_call_id": "tc_1", "content": "result 1"}, + { + "role": "assistant", + "content": "", + "tool_calls": [ + {"id": "tc_2", "function": {"name": "tool2", "arguments": "{}"}}, + ], + "reasoning_details": [ + {"type": "thinking", "thinking": "Latest reasoning.", "signature": "sig_new"}, + ], + }, + {"role": "tool", "tool_call_id": "tc_2", "content": "result 2"}, + ] + _, result = convert_messages_to_anthropic(messages) + + # Find both assistant messages + assistants = [m for m in result if m["role"] == "assistant"] + assert len(assistants) == 2 + + # First (non-last) assistant: no thinking blocks + first_types = [b.get("type") for b in assistants[0]["content"]] + assert "thinking" not in first_types + assert "redacted_thinking" not in first_types + assert "tool_use" in 
first_types # tool_use should survive + + # Last assistant: thinking block preserved with signature + last_blocks = assistants[1]["content"] + thinking_blocks = [b for b in last_blocks if b.get("type") == "thinking"] + assert len(thinking_blocks) == 1 + assert thinking_blocks[0]["thinking"] == "Latest reasoning." + assert thinking_blocks[0]["signature"] == "sig_new" + + def test_signed_thinking_preserved_on_last_turn(self): + """A signed thinking block on the last assistant message is kept.""" + messages = [ + { + "role": "assistant", + "content": "The answer is 42.", + "reasoning_details": [ + {"type": "thinking", "thinking": "Deep thought.", "signature": "sig_valid"}, + ], + }, + ] + _, result = convert_messages_to_anthropic(messages) + blocks = result[0]["content"] + thinking = [b for b in blocks if b.get("type") == "thinking"] + assert len(thinking) == 1 + assert thinking[0]["signature"] == "sig_valid" + + def test_unsigned_thinking_downgraded_to_text_on_last_turn(self): + """Unsigned thinking blocks on the last turn become text blocks.""" + messages = [ + { + "role": "assistant", + "content": "Response text.", + "reasoning_details": [ + {"type": "thinking", "thinking": "Unsigned reasoning."}, + # No 'signature' field + ], + }, + ] + _, result = convert_messages_to_anthropic(messages) + blocks = result[0]["content"] + + # No thinking blocks should remain + assert not any(b.get("type") == "thinking" for b in blocks) + # The reasoning text should be preserved as a text block + text_contents = [b.get("text", "") for b in blocks if b.get("type") == "text"] + assert "Unsigned reasoning." 
in text_contents + + def test_redacted_thinking_with_data_preserved(self): + """Redacted thinking with 'data' field is kept on last turn.""" + messages = [ + { + "role": "assistant", + "content": "Response.", + "reasoning_details": [ + {"type": "redacted_thinking", "data": "opaque_signature_data"}, + ], + }, + ] + _, result = convert_messages_to_anthropic(messages) + blocks = result[0]["content"] + redacted = [b for b in blocks if b.get("type") == "redacted_thinking"] + assert len(redacted) == 1 + assert redacted[0]["data"] == "opaque_signature_data" + + def test_redacted_thinking_without_data_dropped(self): + """Redacted thinking without 'data' is dropped — can't be validated.""" + messages = [ + { + "role": "assistant", + "content": "Response.", + "reasoning_details": [ + {"type": "redacted_thinking"}, + # No 'data' field + ], + }, + ] + _, result = convert_messages_to_anthropic(messages) + blocks = result[0]["content"] + assert not any(b.get("type") == "redacted_thinking" for b in blocks) + + def test_cache_control_stripped_from_thinking_blocks(self): + """cache_control markers are removed from thinking/redacted_thinking blocks.""" + messages = [ + { + "role": "assistant", + "content": "", + "tool_calls": [ + {"id": "tc_1", "function": {"name": "t", "arguments": "{}"}}, + ], + "reasoning_details": [ + { + "type": "thinking", + "thinking": "Reasoning.", + "signature": "sig_1", + "cache_control": {"type": "ephemeral"}, + }, + ], + }, + {"role": "tool", "tool_call_id": "tc_1", "content": "result"}, + ] + _, result = convert_messages_to_anthropic(messages) + assistant = next(m for m in result if m["role"] == "assistant") + for block in assistant["content"]: + if block.get("type") in ("thinking", "redacted_thinking"): + assert "cache_control" not in block + + def test_thinking_stripped_from_merged_consecutive_assistants(self): + """When consecutive assistants are merged, second one's thinking is dropped.""" + messages = [ + { + "role": "assistant", + "content": 
"First response.", + "reasoning_details": [ + {"type": "thinking", "thinking": "First thought.", "signature": "sig_1"}, + ], + }, + { + "role": "assistant", + "content": "Second response.", + "reasoning_details": [ + {"type": "thinking", "thinking": "Second thought.", "signature": "sig_2"}, + ], + }, + ] + _, result = convert_messages_to_anthropic(messages) + + # Should be merged into one assistant message + assistants = [m for m in result if m["role"] == "assistant"] + assert len(assistants) == 1 + + # Only the first thinking block should remain (signed, on the last/only assistant) + blocks = assistants[0]["content"] + thinking = [b for b in blocks if b.get("type") == "thinking"] + assert len(thinking) == 1 + assert thinking[0]["thinking"] == "First thought." + + def test_empty_content_after_strip_gets_placeholder(self): + """If stripping thinking leaves an empty message, a placeholder is added.""" + messages = [ + { + "role": "assistant", + "content": "", + "reasoning_details": [ + {"type": "thinking", "thinking": "Only thinking, no text."}, + # Unsigned — will be downgraded, but content was empty string + ], + }, + {"role": "user", "content": "Next message."}, + {"role": "assistant", "content": "Final."}, + ] + _, result = convert_messages_to_anthropic(messages) + # First assistant is non-last, so thinking is stripped completely. 
+ # The original content was empty and thinking was unsigned → placeholder + first_assistant = result[0] + assert first_assistant["role"] == "assistant" + assert len(first_assistant["content"]) >= 1 + + def test_multi_turn_conversation_preserves_only_last(self): + """Full multi-turn conversation: only last assistant keeps thinking.""" + messages = [ + {"role": "user", "content": "Question 1"}, + { + "role": "assistant", + "content": "Answer 1", + "reasoning_details": [ + {"type": "thinking", "thinking": "Thought 1", "signature": "sig_1"}, + ], + }, + {"role": "user", "content": "Question 2"}, + { + "role": "assistant", + "content": "Answer 2", + "reasoning_details": [ + {"type": "thinking", "thinking": "Thought 2", "signature": "sig_2"}, + ], + }, + {"role": "user", "content": "Question 3"}, + { + "role": "assistant", + "content": "Answer 3", + "reasoning_details": [ + {"type": "thinking", "thinking": "Thought 3", "signature": "sig_3"}, + ], + }, + ] + _, result = convert_messages_to_anthropic(messages) + + assistants = [m for m in result if m["role"] == "assistant"] + assert len(assistants) == 3 + + # First two: no thinking blocks + for a in assistants[:2]: + assert not any( + b.get("type") in ("thinking", "redacted_thinking") + for b in a["content"] + if isinstance(b, dict) + ) + + # Last one: thinking preserved + last_thinking = [ + b for b in assistants[2]["content"] + if isinstance(b, dict) and b.get("type") == "thinking" + ] + assert len(last_thinking) == 1 + assert last_thinking[0]["signature"] == "sig_3" + + # --------------------------------------------------------------------------- # Tool choice # --------------------------------------------------------------------------- diff --git a/tests/agent/test_auxiliary_client.py b/tests/agent/test_auxiliary_client.py index 32f481988e..c7cd12ae71 100644 --- a/tests/agent/test_auxiliary_client.py +++ b/tests/agent/test_auxiliary_client.py @@ -471,6 +471,23 @@ class TestExplicitProviderRouting: client, model = 
resolve_provider_client("zai") assert client is not None + def test_explicit_google_alias_uses_gemini_credentials(self): + """provider='google' should route through the gemini API-key provider.""" + with ( + patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={ + "api_key": "gemini-key", + "base_url": "https://generativelanguage.googleapis.com/v1beta/openai", + }), + patch("agent.auxiliary_client.OpenAI") as mock_openai, + ): + mock_openai.return_value = MagicMock() + client, model = resolve_provider_client("google", model="gemini-3.1-pro-preview") + + assert client is not None + assert model == "gemini-3.1-pro-preview" + assert mock_openai.call_args.kwargs["api_key"] == "gemini-key" + assert mock_openai.call_args.kwargs["base_url"] == "https://generativelanguage.googleapis.com/v1beta/openai" + def test_explicit_unknown_returns_none(self, monkeypatch): """Unknown provider should return None.""" client, model = resolve_provider_client("nonexistent-provider") @@ -624,12 +641,15 @@ class TestVisionClientFallback: assert client is None assert model is None - def test_vision_auto_includes_anthropic_when_configured(self, monkeypatch): - monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key") + def test_vision_auto_includes_active_provider_when_configured(self, monkeypatch): + """Active provider appears in available backends when credentials exist.""" + monkeypatch.setenv("ANTHROPIC_API_KEY", "***") with ( patch("agent.auxiliary_client._read_nous_auth", return_value=None), + patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"), + patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"), patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()), - patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"), + patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"), ): backends = get_available_vision_backends() @@ -702,88 
+722,50 @@ class TestAuxiliaryPoolAwareness: assert call_kwargs["base_url"] == "https://api.githubcopilot.com" assert call_kwargs["default_headers"]["Editor-Version"] - def test_vision_auto_uses_anthropic_when_no_higher_priority_backend(self, monkeypatch): - monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key") + def test_vision_auto_uses_active_provider_as_fallback(self, monkeypatch): + """When no OpenRouter/Nous available, vision auto falls back to active provider.""" + monkeypatch.setenv("ANTHROPIC_API_KEY", "***") with ( patch("agent.auxiliary_client._read_nous_auth", return_value=None), + patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"), + patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"), patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()), - patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"), + patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="***"), ): client, model = get_vision_auxiliary_client() assert client is not None assert client.__class__.__name__ == "AnthropicAuxiliaryClient" - assert model == "claude-haiku-4-5-20251001" - def test_selected_anthropic_provider_is_preferred_for_vision_auto(self, monkeypatch): + def test_vision_auto_prefers_openrouter_over_active_provider(self, monkeypatch): + """OpenRouter is tried before the active provider in vision auto.""" monkeypatch.setenv("OPENROUTER_API_KEY", "or-key") - monkeypatch.setenv("ANTHROPIC_API_KEY", "sk-ant-api03-key") - - def fake_load_config(): - return {"model": {"provider": "anthropic", "default": "claude-sonnet-4-6"}} + monkeypatch.setenv("ANTHROPIC_API_KEY", "***") with ( patch("agent.auxiliary_client._read_nous_auth", return_value=None), - patch("agent.anthropic_adapter.build_anthropic_client", return_value=MagicMock()), - patch("agent.anthropic_adapter.resolve_anthropic_token", return_value="sk-ant-api03-key"), + 
patch("agent.auxiliary_client._read_main_provider", return_value="anthropic"), + patch("agent.auxiliary_client._read_main_model", return_value="claude-sonnet-4"), patch("agent.auxiliary_client.OpenAI") as mock_openai, - patch("hermes_cli.config.load_config", fake_load_config), - ): - client, model = get_vision_auxiliary_client() - - assert client is not None - assert client.__class__.__name__ == "AnthropicAuxiliaryClient" - assert model == "claude-haiku-4-5-20251001" - - def test_selected_codex_provider_short_circuits_vision_auto(self, monkeypatch): - def fake_load_config(): - return {"model": {"provider": "openai-codex", "default": "gpt-5.2-codex"}} - - codex_client = MagicMock() - with ( - patch("hermes_cli.config.load_config", fake_load_config), - patch("agent.auxiliary_client._try_codex", return_value=(codex_client, "gpt-5.2-codex")) as mock_codex, - patch("agent.auxiliary_client._try_openrouter") as mock_openrouter, - patch("agent.auxiliary_client._try_nous") as mock_nous, - patch("agent.auxiliary_client._try_anthropic") as mock_anthropic, - patch("agent.auxiliary_client._try_custom_endpoint") as mock_custom, ): provider, client, model = resolve_vision_provider_client() - assert provider == "openai-codex" - assert client is codex_client - assert model == "gpt-5.2-codex" - mock_codex.assert_called_once() - mock_openrouter.assert_not_called() - mock_nous.assert_not_called() - mock_anthropic.assert_not_called() - mock_custom.assert_not_called() + # OpenRouter should win over anthropic active provider + assert provider == "openrouter" - def test_vision_auto_includes_codex(self, codex_auth_dir): - """Codex supports vision (gpt-5.3-codex), so auto mode should use it.""" - with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \ - patch("agent.auxiliary_client.OpenAI"): - client, model = get_vision_auxiliary_client() - from agent.auxiliary_client import CodexAuxiliaryClient - assert isinstance(client, CodexAuxiliaryClient) - assert model == 
"gpt-5.2-codex" - - def test_vision_auto_falls_back_to_custom_endpoint(self, monkeypatch): - """Custom endpoint is used as fallback in vision auto mode. - - Many local models (Qwen-VL, LLaVA, etc.) support vision. - When no OpenRouter/Nous/Codex is available, try the custom endpoint. - """ + def test_vision_auto_uses_named_custom_as_active_provider(self, monkeypatch): + """Named custom provider works as active provider fallback in vision auto.""" monkeypatch.delenv("OPENROUTER_API_KEY", raising=False) monkeypatch.delenv("ANTHROPIC_API_KEY", raising=False) with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \ patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)), \ - patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \ - patch("agent.auxiliary_client._resolve_custom_runtime", - return_value=("http://localhost:1234/v1", "local-key")), \ - patch("agent.auxiliary_client.OpenAI") as mock_openai: - client, model = get_vision_auxiliary_client() - assert client is not None # Custom endpoint picked up as fallback + patch("agent.auxiliary_client._read_main_provider", return_value="custom:local"), \ + patch("agent.auxiliary_client._read_main_model", return_value="my-local-model"), \ + patch("agent.auxiliary_client.resolve_provider_client", + return_value=(MagicMock(), "my-local-model")) as mock_resolve: + provider, client, model = resolve_vision_provider_client() + assert client is not None + assert provider == "custom:local" def test_vision_direct_endpoint_override(self, monkeypatch): monkeypatch.setenv("OPENROUTER_API_KEY", "or-key") @@ -822,6 +804,31 @@ class TestAuxiliaryPoolAwareness: assert model == "google/gemini-3-flash-preview" assert client is not None + def test_vision_config_google_provider_uses_gemini_credentials(self, monkeypatch): + config = { + "auxiliary": { + "vision": { + "provider": "google", + "model": "gemini-3.1-pro-preview", + } + } + } + 
monkeypatch.setattr("hermes_cli.config.load_config", lambda: config) + with ( + patch("hermes_cli.auth.resolve_api_key_provider_credentials", return_value={ + "api_key": "gemini-key", + "base_url": "https://generativelanguage.googleapis.com/v1beta/openai", + }), + patch("agent.auxiliary_client.OpenAI") as mock_openai, + ): + resolved_provider, client, model = resolve_vision_provider_client() + + assert resolved_provider == "gemini" + assert client is not None + assert model == "gemini-3.1-pro-preview" + assert mock_openai.call_args.kwargs["api_key"] == "gemini-key" + assert mock_openai.call_args.kwargs["base_url"] == "https://generativelanguage.googleapis.com/v1beta/openai" + def test_vision_forced_main_uses_custom_endpoint(self, monkeypatch): """When explicitly forced to 'main', vision CAN use custom endpoint.""" config = { @@ -846,7 +853,14 @@ class TestAuxiliaryPoolAwareness: monkeypatch.setenv("AUXILIARY_VISION_PROVIDER", "main") monkeypatch.delenv("OPENAI_BASE_URL", raising=False) monkeypatch.delenv("OPENAI_API_KEY", raising=False) + # Clear client cache to avoid stale entries from previous tests + from agent.auxiliary_client import _client_cache + _client_cache.clear() with patch("agent.auxiliary_client._read_nous_auth", return_value=None), \ + patch("agent.auxiliary_client._read_main_provider", return_value=""), \ + patch("agent.auxiliary_client._read_main_model", return_value=""), \ + patch("agent.auxiliary_client._select_pool_entry", return_value=(False, None)), \ + patch("agent.auxiliary_client._resolve_custom_runtime", return_value=(None, None)), \ patch("agent.auxiliary_client._read_codex_access_token", return_value=None), \ patch("agent.auxiliary_client._resolve_api_key_provider", return_value=(None, None)): client, model = get_vision_auxiliary_client() diff --git a/tests/agent/test_minimax_auxiliary_url.py b/tests/agent/test_minimax_auxiliary_url.py new file mode 100644 index 0000000000..4444c3aadf --- /dev/null +++ 
b/tests/agent/test_minimax_auxiliary_url.py @@ -0,0 +1,42 @@ +"""Tests for MiniMax auxiliary client URL normalization. + +MiniMax and MiniMax-CN set inference_base_url to the /anthropic path. +The auxiliary client uses the OpenAI SDK, which needs /v1 instead. +""" + +import sys +import os + +sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "..")) + +from agent.auxiliary_client import _to_openai_base_url + + +class TestToOpenaiBaseUrl: + def test_minimax_global_anthropic_suffix_replaced(self): + assert _to_openai_base_url("https://api.minimax.io/anthropic") == "https://api.minimax.io/v1" + + def test_minimax_cn_anthropic_suffix_replaced(self): + assert _to_openai_base_url("https://api.minimaxi.com/anthropic") == "https://api.minimaxi.com/v1" + + def test_trailing_slash_stripped_before_replace(self): + assert _to_openai_base_url("https://api.minimax.io/anthropic/") == "https://api.minimax.io/v1" + + def test_v1_url_unchanged(self): + assert _to_openai_base_url("https://api.openai.com/v1") == "https://api.openai.com/v1" + + def test_openrouter_url_unchanged(self): + assert _to_openai_base_url("https://openrouter.ai/api/v1") == "https://openrouter.ai/api/v1" + + def test_anthropic_domain_unchanged(self): + """api.anthropic.com doesn't end with /anthropic — should be untouched.""" + assert _to_openai_base_url("https://api.anthropic.com") == "https://api.anthropic.com" + + def test_anthropic_in_subpath_unchanged(self): + assert _to_openai_base_url("https://example.com/anthropic/extra") == "https://example.com/anthropic/extra" + + def test_empty_string(self): + assert _to_openai_base_url("") == "" + + def test_none(self): + assert _to_openai_base_url(None) == "" diff --git a/tests/agent/test_minimax_provider.py b/tests/agent/test_minimax_provider.py new file mode 100644 index 0000000000..c6819e877d --- /dev/null +++ b/tests/agent/test_minimax_provider.py @@ -0,0 +1,105 @@ +"""Tests for MiniMax provider hardening — context lengths, thinking guard, 
catalog.""" + + +class TestMinimaxContextLengths: + """Verify per-model context length entries for MiniMax models.""" + + def test_m1_variants_have_1m_context(self): + from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS + # Keys are lowercase because the lookup lowercases model names + for model in ("minimax-m1", "minimax-m1-40k", "minimax-m1-80k", + "minimax-m1-128k", "minimax-m1-256k"): + assert model in DEFAULT_CONTEXT_LENGTHS, f"{model} missing from context lengths" + assert DEFAULT_CONTEXT_LENGTHS[model] == 1_000_000, f"{model} expected 1M" + + def test_m2_variants_have_1m_context(self): + from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS + # Keys are lowercase because the lookup lowercases model names + for model in ("minimax-m2.5", "minimax-m2.7"): + assert model in DEFAULT_CONTEXT_LENGTHS, f"{model} missing from context lengths" + assert DEFAULT_CONTEXT_LENGTHS[model] == 1_048_576, f"{model} expected 1048576" + + def test_minimax_prefix_fallback(self): + from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS + # The generic "minimax" prefix entry should be 1M for unknown models + assert DEFAULT_CONTEXT_LENGTHS["minimax"] == 1_048_576 + + + +class TestMinimaxThinkingGuard: + """Verify that build_anthropic_kwargs does NOT add thinking params for MiniMax models.""" + + def test_no_thinking_for_minimax_m27(self): + from agent.anthropic_adapter import build_anthropic_kwargs + kwargs = build_anthropic_kwargs( + model="MiniMax-M2.7", + messages=[{"role": "user", "content": "hello"}], + tools=None, + max_tokens=4096, + reasoning_config={"enabled": True, "effort": "medium"}, + ) + assert "thinking" not in kwargs + assert "output_config" not in kwargs + + def test_no_thinking_for_minimax_m1(self): + from agent.anthropic_adapter import build_anthropic_kwargs + kwargs = build_anthropic_kwargs( + model="MiniMax-M1-128k", + messages=[{"role": "user", "content": "hello"}], + tools=None, + max_tokens=4096, + reasoning_config={"enabled": True, "effort": 
"high"}, + ) + assert "thinking" not in kwargs + + def test_thinking_still_works_for_claude(self): + from agent.anthropic_adapter import build_anthropic_kwargs + kwargs = build_anthropic_kwargs( + model="claude-sonnet-4-20250514", + messages=[{"role": "user", "content": "hello"}], + tools=None, + max_tokens=4096, + reasoning_config={"enabled": True, "effort": "medium"}, + ) + assert "thinking" in kwargs + + +class TestMinimaxAuxModel: + """Verify auxiliary model is standard (not highspeed).""" + + def test_minimax_aux_is_standard(self): + from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS + assert _API_KEY_PROVIDER_AUX_MODELS["minimax"] == "MiniMax-M2.7" + assert _API_KEY_PROVIDER_AUX_MODELS["minimax-cn"] == "MiniMax-M2.7" + + def test_minimax_aux_not_highspeed(self): + from agent.auxiliary_client import _API_KEY_PROVIDER_AUX_MODELS + assert "highspeed" not in _API_KEY_PROVIDER_AUX_MODELS["minimax"] + assert "highspeed" not in _API_KEY_PROVIDER_AUX_MODELS["minimax-cn"] + + +class TestMinimaxModelCatalog: + """Verify the model catalog includes M1 family and excludes deprecated models.""" + + def test_catalog_includes_m1_family(self): + from hermes_cli.models import _PROVIDER_MODELS + for provider in ("minimax", "minimax-cn"): + models = _PROVIDER_MODELS[provider] + assert "MiniMax-M1" in models + assert "MiniMax-M1-40k" in models + assert "MiniMax-M1-80k" in models + assert "MiniMax-M1-128k" in models + assert "MiniMax-M1-256k" in models + + def test_catalog_excludes_deprecated(self): + from hermes_cli.models import _PROVIDER_MODELS + for provider in ("minimax", "minimax-cn"): + models = _PROVIDER_MODELS[provider] + assert "MiniMax-M2.1" not in models + + def test_catalog_excludes_highspeed(self): + from hermes_cli.models import _PROVIDER_MODELS + for provider in ("minimax", "minimax-cn"): + models = _PROVIDER_MODELS[provider] + assert "MiniMax-M2.7-highspeed" not in models + assert "MiniMax-M2.5-highspeed" not in models diff --git 
a/tests/cli/test_session_boundary_hooks.py b/tests/cli/test_session_boundary_hooks.py new file mode 100644 index 0000000000..19de4cd97a --- /dev/null +++ b/tests/cli/test_session_boundary_hooks.py @@ -0,0 +1,66 @@ +import pytest +from unittest.mock import MagicMock, patch +from hermes_cli.plugins import VALID_HOOKS, PluginManager +import os +import shutil +import tempfile +from cli import HermesCLI + + +def test_session_hooks_in_valid_hooks(): + """Verify on_session_finalize and on_session_reset are registered as valid hooks.""" + assert "on_session_finalize" in VALID_HOOKS + assert "on_session_reset" in VALID_HOOKS + + +@patch("hermes_cli.plugins.invoke_hook") +def test_session_finalize_on_reset(mock_invoke_hook): + """Verify on_session_finalize fires when /new or /reset is used.""" + cli = HermesCLI() + cli.agent = MagicMock() + cli.agent.session_id = "test-session-id" + + # Simulate /new command which triggers on_session_finalize for the old session + cli.new_session(silent=True) + + # Check if on_session_finalize was called for the old session + mock_invoke_hook.assert_any_call( + "on_session_finalize", session_id="test-session-id", platform="cli" + ) + # Check if on_session_reset was called for the new session + mock_invoke_hook.assert_any_call( + "on_session_reset", session_id=cli.session_id, platform="cli" + ) + + +@patch("hermes_cli.plugins.invoke_hook") +def test_session_finalize_on_cleanup(mock_invoke_hook): + """Verify on_session_finalize fires during CLI exit cleanup.""" + import cli as cli_mod + + mock_agent = MagicMock() + mock_agent.session_id = "cleanup-session-id" + cli_mod._active_agent_ref = mock_agent + cli_mod._cleanup_done = False + + cli_mod._run_cleanup() + + mock_invoke_hook.assert_any_call( + "on_session_finalize", session_id="cleanup-session-id", platform="cli" + ) + + +@patch("hermes_cli.plugins.invoke_hook") +def test_hook_errors_are_caught(mock_invoke_hook): + """Verify hook exceptions are caught and don't crash the agent.""" + mgr = 
PluginManager() + + # Register a hook that raises + def bad_callback(**kwargs): + raise Exception("Hook failed") + + mgr._hooks["on_session_finalize"] = [bad_callback] + + # This should not raise + results = mgr.invoke_hook("on_session_finalize", session_id="test", platform="cli") + assert results == [] diff --git a/tests/cli/test_worktree.py b/tests/cli/test_worktree.py index f545baa391..fece9cf6be 100644 --- a/tests/cli/test_worktree.py +++ b/tests/cli/test_worktree.py @@ -33,6 +33,13 @@ def git_repo(tmp_path): ["git", "commit", "-m", "Initial commit"], cwd=repo, capture_output=True, ) + # Add a fake remote ref so cleanup logic sees the initial commit as + # "pushed". Without this, `git log HEAD --not --remotes` treats every + # commit as unpushed and cleanup refuses to delete worktrees. + subprocess.run( + ["git", "update-ref", "refs/remotes/origin/main", "HEAD"], + cwd=repo, capture_output=True, + ) return repo @@ -81,7 +88,11 @@ def _setup_worktree(repo_root): def _cleanup_worktree(info): - """Test version of _cleanup_worktree.""" + """Test version of _cleanup_worktree. + + Preserves the worktree only if it has unpushed commits. + Dirty working tree alone is not enough to keep it. 
+ """ wt_path = info["path"] branch = info["branch"] repo_root = info["repo_root"] @@ -89,15 +100,15 @@ def _cleanup_worktree(info): if not Path(wt_path).exists(): return - # Check for uncommitted changes - status = subprocess.run( - ["git", "status", "--porcelain"], + # Check for unpushed commits + result = subprocess.run( + ["git", "log", "--oneline", "HEAD", "--not", "--remotes"], capture_output=True, text=True, timeout=10, cwd=wt_path, ) - has_changes = bool(status.stdout.strip()) + has_unpushed = bool(result.stdout.strip()) - if has_changes: - return False # Did not clean up + if has_unpushed: + return False # Did not clean up — has unpushed commits subprocess.run( ["git", "worktree", "remove", wt_path, "--force"], @@ -204,20 +215,45 @@ class TestWorktreeCleanup: assert result is True assert not Path(info["path"]).exists() - def test_dirty_worktree_kept(self, git_repo): + def test_dirty_worktree_cleaned_when_no_unpushed(self, git_repo): + """Dirty working tree without unpushed commits is cleaned up. + + Agent sessions typically leave untracked files / artifacts behind. + Since all real work is in pushed commits, these don't warrant + keeping the worktree. + """ info = _setup_worktree(str(git_repo)) assert info is not None - # Make uncommitted changes + # Make uncommitted changes (untracked file) (Path(info["path"]) / "new-file.txt").write_text("uncommitted") subprocess.run( ["git", "add", "new-file.txt"], cwd=info["path"], capture_output=True, ) + # The git_repo fixture already has a fake remote ref so the initial + # commit is seen as "pushed". No unpushed commits → cleanup proceeds. 
result = _cleanup_worktree(info) - assert result is False - assert Path(info["path"]).exists() # Still there + assert result is True # Cleaned up despite dirty working tree + assert not Path(info["path"]).exists() + + def test_worktree_with_unpushed_commits_kept(self, git_repo): + """Worktree with unpushed commits is preserved.""" + info = _setup_worktree(str(git_repo)) + assert info is not None + + # Make a commit that is NOT on any remote + (Path(info["path"]) / "work.txt").write_text("real work") + subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True) + subprocess.run( + ["git", "commit", "-m", "agent work"], + cwd=info["path"], capture_output=True, + ) + + result = _cleanup_worktree(info) + assert result is False # Kept — has unpushed commits + assert Path(info["path"]).exists() def test_branch_deleted_on_cleanup(self, git_repo): info = _setup_worktree(str(git_repo)) @@ -367,7 +403,7 @@ class TestMultipleWorktrees: lines = [l for l in result.stdout.strip().splitlines() if l.strip()] assert len(lines) == 11 - # Cleanup all + # Cleanup all (git_repo fixture has a fake remote ref so cleanup works) for info in worktrees: # Discard changes first so cleanup works subprocess.run( @@ -492,33 +528,77 @@ class TestStaleWorktreePruning: assert not pruned assert Path(info["path"]).exists() - def test_keeps_dirty_old_worktree(self, git_repo): - """Old worktrees with uncommitted changes should NOT be pruned.""" + def test_keeps_old_worktree_with_unpushed_commits(self, git_repo): + """Old worktrees (24-72h) with unpushed commits should NOT be pruned.""" import time info = _setup_worktree(str(git_repo)) assert info is not None - # Make it dirty - (Path(info["path"]) / "dirty.txt").write_text("uncommitted") + # Make an unpushed commit + (Path(info["path"]) / "work.txt").write_text("real work") + subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True) subprocess.run( - ["git", "add", "dirty.txt"], + ["git", "commit", "-m", 
"agent work"], cwd=info["path"], capture_output=True, ) - # Make it old + # Make it old (25h — in the 24-72h soft tier) old_time = time.time() - (25 * 3600) os.utime(info["path"], (old_time, old_time)) - # Check if it would be pruned - status = subprocess.run( - ["git", "status", "--porcelain"], + # Check for unpushed commits (simulates prune logic) + result = subprocess.run( + ["git", "log", "--oneline", "HEAD", "--not", "--remotes"], capture_output=True, text=True, cwd=info["path"], ) - has_changes = bool(status.stdout.strip()) - assert has_changes # Should be dirty → not pruned + has_unpushed = bool(result.stdout.strip()) + assert has_unpushed # Has unpushed commits → not pruned in soft tier assert Path(info["path"]).exists() + def test_force_prunes_very_old_worktree(self, git_repo): + """Worktrees older than 72h should be force-pruned regardless.""" + import time + + info = _setup_worktree(str(git_repo)) + assert info is not None + + # Make an unpushed commit (would normally protect it) + (Path(info["path"]) / "work.txt").write_text("stale work") + subprocess.run(["git", "add", "work.txt"], cwd=info["path"], capture_output=True) + subprocess.run( + ["git", "commit", "-m", "old agent work"], + cwd=info["path"], capture_output=True, + ) + + # Make it very old (73h — beyond the 72h hard threshold) + old_time = time.time() - (73 * 3600) + os.utime(info["path"], (old_time, old_time)) + + # Simulate the force-prune tier check + hard_cutoff = time.time() - (72 * 3600) + mtime = Path(info["path"]).stat().st_mtime + assert mtime <= hard_cutoff # Should qualify for force removal + + # Actually remove it (simulates _prune_stale_worktrees force path) + branch_result = subprocess.run( + ["git", "branch", "--show-current"], + capture_output=True, text=True, timeout=5, cwd=info["path"], + ) + branch = branch_result.stdout.strip() + + subprocess.run( + ["git", "worktree", "remove", info["path"], "--force"], + capture_output=True, text=True, timeout=15, cwd=str(git_repo), + ) + 
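The two stale-worktree tests above encode a two-tier age policy: between 24 and 72 hours a worktree survives only if it has unpushed commits, and past 72 hours it is removed unconditionally. A minimal sketch of that decision, using hypothetical helper names (the real `_prune_stale_worktrees` shells out to git rather than taking booleans):

```python
import time

def prune_decision(mtime: float, has_unpushed: bool,
                   soft_hours: float = 24, hard_hours: float = 72) -> str:
    """Classify a worktree as 'keep', 'prune', or 'force'.

    Hypothetical sketch of the tier logic the tests exercise:
    - younger than 24h: always kept
    - 24-72h (soft tier): unpushed commits protect the worktree
    - older than 72h (hard tier): force-pruned regardless
    """
    age_hours = (time.time() - mtime) / 3600
    if age_hours < soft_hours:
        return "keep"
    if age_hours < hard_hours:
        return "keep" if has_unpushed else "prune"
    return "force"
```

The hard tier exists so abandoned sessions cannot accumulate worktrees forever; the soft tier mirrors the cleanup rule above, where only unpushed commits (not a dirty working tree) count as work worth preserving.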
if branch: + subprocess.run( + ["git", "branch", "-D", branch], + capture_output=True, text=True, timeout=10, cwd=str(git_repo), + ) + + assert not Path(info["path"]).exists() + class TestEdgeCases: """Test edge cases for robustness.""" @@ -611,6 +691,133 @@ class TestTerminalCWDIntegration: assert result.stdout.strip() == "true" +class TestOrphanedBranchPruning: + """Test cleanup of orphaned hermes/* and pr-* branches.""" + + def test_prunes_orphaned_hermes_branch(self, git_repo): + """hermes/hermes-* branches with no worktree should be deleted.""" + # Create a branch that looks like a worktree branch but has no worktree + subprocess.run( + ["git", "branch", "hermes/hermes-deadbeef", "HEAD"], + cwd=str(git_repo), capture_output=True, + ) + + # Verify it exists + result = subprocess.run( + ["git", "branch", "--list", "hermes/hermes-deadbeef"], + capture_output=True, text=True, cwd=str(git_repo), + ) + assert "hermes/hermes-deadbeef" in result.stdout + + # Simulate _prune_orphaned_branches logic + result = subprocess.run( + ["git", "branch", "--format=%(refname:short)"], + capture_output=True, text=True, cwd=str(git_repo), + ) + all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()] + + wt_result = subprocess.run( + ["git", "worktree", "list", "--porcelain"], + capture_output=True, text=True, cwd=str(git_repo), + ) + active_branches = {"main"} + for line in wt_result.stdout.split("\n"): + if line.startswith("branch refs/heads/"): + active_branches.add(line.split("branch refs/heads/", 1)[-1].strip()) + + orphaned = [ + b for b in all_branches + if b not in active_branches + and (b.startswith("hermes/hermes-") or b.startswith("pr-")) + ] + assert "hermes/hermes-deadbeef" in orphaned + + # Delete them + if orphaned: + subprocess.run( + ["git", "branch", "-D"] + orphaned, + capture_output=True, text=True, cwd=str(git_repo), + ) + + # Verify gone + result = subprocess.run( + ["git", "branch", "--list", "hermes/hermes-deadbeef"], + 
capture_output=True, text=True, cwd=str(git_repo), + ) + assert "hermes/hermes-deadbeef" not in result.stdout + + def test_prunes_orphaned_pr_branch(self, git_repo): + """pr-* branches should be deleted during pruning.""" + subprocess.run( + ["git", "branch", "pr-1234", "HEAD"], + cwd=str(git_repo), capture_output=True, + ) + subprocess.run( + ["git", "branch", "pr-5678", "HEAD"], + cwd=str(git_repo), capture_output=True, + ) + + result = subprocess.run( + ["git", "branch", "--format=%(refname:short)"], + capture_output=True, text=True, cwd=str(git_repo), + ) + all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()] + + active_branches = {"main"} + orphaned = [ + b for b in all_branches + if b not in active_branches and b.startswith("pr-") + ] + assert "pr-1234" in orphaned + assert "pr-5678" in orphaned + + subprocess.run( + ["git", "branch", "-D"] + orphaned, + capture_output=True, text=True, cwd=str(git_repo), + ) + + # Verify gone + result = subprocess.run( + ["git", "branch", "--format=%(refname:short)"], + capture_output=True, text=True, cwd=str(git_repo), + ) + remaining = result.stdout.strip() + assert "pr-1234" not in remaining + assert "pr-5678" not in remaining + + def test_preserves_active_worktree_branch(self, git_repo): + """Branches with active worktrees should NOT be pruned.""" + info = _setup_worktree(str(git_repo)) + assert info is not None + + result = subprocess.run( + ["git", "worktree", "list", "--porcelain"], + capture_output=True, text=True, cwd=str(git_repo), + ) + active_branches = set() + for line in result.stdout.split("\n"): + if line.startswith("branch refs/heads/"): + active_branches.add(line.split("branch refs/heads/", 1)[-1].strip()) + + assert info["branch"] in active_branches # Protected + + def test_preserves_main_branch(self, git_repo): + """main branch should never be pruned.""" + result = subprocess.run( + ["git", "branch", "--format=%(refname:short)"], + capture_output=True, text=True, 
cwd=str(git_repo), + ) + all_branches = [b.strip() for b in result.stdout.strip().split("\n") if b.strip()] + active_branches = {"main"} + + orphaned = [ + b for b in all_branches + if b not in active_branches + and (b.startswith("hermes/hermes-") or b.startswith("pr-")) + ] + assert "main" not in orphaned + + class TestSystemPromptInjection: """Test that the agent gets worktree context in its system prompt.""" @@ -625,7 +832,7 @@ class TestSystemPromptInjection: f"{info['path']}. Your branch is `{info['branch']}`. " f"Changes here do not affect the main working tree or other agents. " f"Remember to commit and push your changes, and create a PR if appropriate. " - f"The original repo is at {info['repo_root']}.]" + f"The original repo is at {info['repo_root']}.]\n" ) assert info["path"] in wt_note diff --git a/tests/cron/test_jobs.py b/tests/cron/test_jobs.py index cca460100a..e0f56b9612 100644 --- a/tests/cron/test_jobs.py +++ b/tests/cron/test_jobs.py @@ -339,6 +339,36 @@ class TestMarkJobRun: assert updated["last_status"] == "error" assert updated["last_error"] == "timeout" + def test_delivery_error_tracked_separately(self, tmp_cron_dir): + """Agent succeeds but delivery fails — both tracked independently.""" + job = create_job(prompt="Report", schedule="every 1h") + mark_job_run(job["id"], success=True, delivery_error="platform 'telegram' not configured") + updated = get_job(job["id"]) + assert updated["last_status"] == "ok" + assert updated["last_error"] is None + assert updated["last_delivery_error"] == "platform 'telegram' not configured" + + def test_delivery_error_cleared_on_success(self, tmp_cron_dir): + """Successful delivery clears the previous delivery error.""" + job = create_job(prompt="Report", schedule="every 1h") + mark_job_run(job["id"], success=True, delivery_error="network timeout") + updated = get_job(job["id"]) + assert updated["last_delivery_error"] == "network timeout" + # Next run delivers successfully + mark_job_run(job["id"], 
success=True, delivery_error=None) + updated = get_job(job["id"]) + assert updated["last_delivery_error"] is None + + def test_both_agent_and_delivery_error(self, tmp_cron_dir): + """Agent fails AND delivery fails — both errors recorded.""" + job = create_job(prompt="Report", schedule="every 1h") + mark_job_run(job["id"], success=False, error="model timeout", + delivery_error="platform 'discord' not enabled") + updated = get_job(job["id"]) + assert updated["last_status"] == "error" + assert updated["last_error"] == "model timeout" + assert updated["last_delivery_error"] == "platform 'discord' not enabled" + class TestAdvanceNextRun: """Tests for advance_next_run() — crash-safety for recurring jobs.""" diff --git a/tests/cron/test_scheduler.py b/tests/cron/test_scheduler.py index 4a15fa2238..c07663a37d 100644 --- a/tests/cron/test_scheduler.py +++ b/tests/cron/test_scheduler.py @@ -508,6 +508,90 @@ class TestDeliverResultWrapping: assert send_mock.call_args.kwargs["thread_id"] == "17585" +class TestDeliverResultErrorReturns: + """Verify _deliver_result returns error strings on failure, None on success.""" + + def test_returns_none_on_successful_delivery(self): + from gateway.config import Platform + + pconfig = MagicMock() + pconfig.enabled = True + mock_cfg = MagicMock() + mock_cfg.platforms = {Platform.TELEGRAM: pconfig} + + with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \ + patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"success": True})): + job = { + "id": "ok-job", + "deliver": "origin", + "origin": {"platform": "telegram", "chat_id": "123"}, + } + result = _deliver_result(job, "Output.") + assert result is None + + def test_returns_none_for_local_delivery(self): + """local-only jobs don't deliver — not a failure.""" + job = {"id": "local-job", "deliver": "local"} + result = _deliver_result(job, "Output.") + assert result is None + + def test_returns_error_for_unknown_platform(self): + job = { + "id": 
"bad-platform", + "deliver": "origin", + "origin": {"platform": "fax", "chat_id": "123"}, + } + with patch("gateway.config.load_gateway_config"): + result = _deliver_result(job, "Output.") + assert result is not None + assert "unknown platform" in result + + def test_returns_error_when_platform_disabled(self): + from gateway.config import Platform + + pconfig = MagicMock() + pconfig.enabled = False + mock_cfg = MagicMock() + mock_cfg.platforms = {Platform.TELEGRAM: pconfig} + + with patch("gateway.config.load_gateway_config", return_value=mock_cfg): + job = { + "id": "disabled", + "deliver": "origin", + "origin": {"platform": "telegram", "chat_id": "123"}, + } + result = _deliver_result(job, "Output.") + assert result is not None + assert "not configured" in result + + def test_returns_error_on_send_failure(self): + from gateway.config import Platform + + pconfig = MagicMock() + pconfig.enabled = True + mock_cfg = MagicMock() + mock_cfg.platforms = {Platform.TELEGRAM: pconfig} + + with patch("gateway.config.load_gateway_config", return_value=mock_cfg), \ + patch("tools.send_message_tool._send_to_platform", new=AsyncMock(return_value={"error": "rate limited"})): + job = { + "id": "rate-limited", + "deliver": "origin", + "origin": {"platform": "telegram", "chat_id": "123"}, + } + result = _deliver_result(job, "Output.") + assert result is not None + assert "rate limited" in result + + def test_returns_error_for_unresolved_target(self, monkeypatch): + """Non-local delivery with no resolvable target should return an error.""" + monkeypatch.delenv("TELEGRAM_HOME_CHANNEL", raising=False) + job = {"id": "no-target", "deliver": "telegram"} + result = _deliver_result(job, "Output.") + assert result is not None + assert "no delivery target" in result + + class TestRunJobSessionPersistence: def test_run_job_passes_session_db_and_cron_platform(self, tmp_path): job = { diff --git a/tests/gateway/test_feishu_approval_buttons.py b/tests/gateway/test_feishu_approval_buttons.py new 
file mode 100644 index 0000000000..9c51d1ac49 --- /dev/null +++ b/tests/gateway/test_feishu_approval_buttons.py @@ -0,0 +1,432 @@ +"""Tests for Feishu interactive card approval buttons.""" + +import asyncio +import json +import os +import sys +from pathlib import Path +from types import SimpleNamespace +from unittest.mock import AsyncMock, MagicMock, Mock, patch + +import pytest + +# --------------------------------------------------------------------------- +# Ensure the repo root is importable +# --------------------------------------------------------------------------- +_repo = str(Path(__file__).resolve().parents[2]) +if _repo not in sys.path: + sys.path.insert(0, _repo) + + +# --------------------------------------------------------------------------- +# Minimal Feishu mock so FeishuAdapter can be imported without lark-oapi +# --------------------------------------------------------------------------- +def _ensure_feishu_mocks(): + """Provide stubs for lark-oapi / aiohttp.web so the import succeeds.""" + if "lark_oapi" not in sys.modules: + mod = MagicMock() + for name in ( + "lark_oapi", "lark_oapi.api.im.v1", + "lark_oapi.event", "lark_oapi.event.callback_type", + ): + sys.modules.setdefault(name, mod) + if "aiohttp" not in sys.modules: + aio = MagicMock() + sys.modules.setdefault("aiohttp", aio) + sys.modules.setdefault("aiohttp.web", aio.web) + + +_ensure_feishu_mocks() + +from gateway.config import PlatformConfig +from gateway.platforms.feishu import FeishuAdapter + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def _make_adapter() -> FeishuAdapter: + """Create a FeishuAdapter with mocked internals.""" + config = PlatformConfig(enabled=True) + adapter = FeishuAdapter(config) + adapter._client = MagicMock() + return adapter + + +def _make_card_action_data( + action_value: dict, + chat_id: str = "oc_12345", + open_id: str = 
"ou_user1", + token: str = "tok_abc", +) -> SimpleNamespace: + """Create a mock Feishu card action callback data object.""" + return SimpleNamespace( + event=SimpleNamespace( + token=token, + context=SimpleNamespace(open_chat_id=chat_id), + operator=SimpleNamespace(open_id=open_id), + action=SimpleNamespace( + tag="button", + value=action_value, + ), + ), + ) + + +# =========================================================================== +# send_exec_approval — interactive card with buttons +# =========================================================================== + +class TestFeishuExecApproval: + """Test send_exec_approval sends an interactive card.""" + + @pytest.mark.asyncio + async def test_sends_interactive_card(self): + adapter = _make_adapter() + + mock_response = SimpleNamespace( + success=lambda: True, + data=SimpleNamespace(message_id="msg_001"), + ) + with patch.object( + adapter, "_feishu_send_with_retry", new_callable=AsyncMock, + return_value=mock_response, + ) as mock_send: + result = await adapter.send_exec_approval( + chat_id="oc_12345", + command="rm -rf /important", + session_key="agent:main:feishu:group:oc_12345", + description="dangerous deletion", + ) + + assert result.success is True + assert result.message_id == "msg_001" + + mock_send.assert_called_once() + kwargs = mock_send.call_args[1] + assert kwargs["chat_id"] == "oc_12345" + assert kwargs["msg_type"] == "interactive" + + # Verify card payload contains the command and buttons + card = json.loads(kwargs["payload"]) + assert card["header"]["template"] == "orange" + assert "rm -rf /important" in card["elements"][0]["content"] + assert "dangerous deletion" in card["elements"][0]["content"] + + # Check buttons + actions = card["elements"][1]["actions"] + assert len(actions) == 4 + action_names = [a["value"]["hermes_action"] for a in actions] + assert action_names == [ + "approve_once", "approve_session", "approve_always", "deny" + ] + + @pytest.mark.asyncio + async def 
test_stores_approval_state(self): + adapter = _make_adapter() + + mock_response = SimpleNamespace( + success=lambda: True, + data=SimpleNamespace(message_id="msg_002"), + ) + with patch.object( + adapter, "_feishu_send_with_retry", new_callable=AsyncMock, + return_value=mock_response, + ): + await adapter.send_exec_approval( + chat_id="oc_12345", + command="echo test", + session_key="my-session-key", + ) + + assert len(adapter._approval_state) == 1 + approval_id = list(adapter._approval_state.keys())[0] + state = adapter._approval_state[approval_id] + assert state["session_key"] == "my-session-key" + assert state["message_id"] == "msg_002" + assert state["chat_id"] == "oc_12345" + + @pytest.mark.asyncio + async def test_not_connected(self): + adapter = _make_adapter() + adapter._client = None + result = await adapter.send_exec_approval( + chat_id="oc_12345", command="ls", session_key="s" + ) + assert result.success is False + + @pytest.mark.asyncio + async def test_truncates_long_command(self): + adapter = _make_adapter() + + mock_response = SimpleNamespace( + success=lambda: True, + data=SimpleNamespace(message_id="msg_003"), + ) + with patch.object( + adapter, "_feishu_send_with_retry", new_callable=AsyncMock, + return_value=mock_response, + ) as mock_send: + long_cmd = "x" * 5000 + await adapter.send_exec_approval( + chat_id="oc_12345", command=long_cmd, session_key="s" + ) + + card = json.loads(mock_send.call_args[1]["payload"]) + content = card["elements"][0]["content"] + assert "..." 
in content + assert len(content) < 5000 + + @pytest.mark.asyncio + async def test_multiple_approvals_get_unique_ids(self): + adapter = _make_adapter() + + mock_response = SimpleNamespace( + success=lambda: True, + data=SimpleNamespace(message_id="msg_x"), + ) + with patch.object( + adapter, "_feishu_send_with_retry", new_callable=AsyncMock, + return_value=mock_response, + ): + await adapter.send_exec_approval( + chat_id="oc_1", command="cmd1", session_key="s1" + ) + await adapter.send_exec_approval( + chat_id="oc_2", command="cmd2", session_key="s2" + ) + + assert len(adapter._approval_state) == 2 + ids = list(adapter._approval_state.keys()) + assert ids[0] != ids[1] + + +# =========================================================================== +# _handle_card_action_event — approval button clicks +# =========================================================================== + +class TestFeishuApprovalCallback: + """Test the approval intercept in _handle_card_action_event.""" + + @pytest.mark.asyncio + async def test_resolves_approval_on_click(self): + adapter = _make_adapter() + adapter._approval_state[1] = { + "session_key": "agent:main:feishu:group:oc_12345", + "message_id": "msg_001", + "chat_id": "oc_12345", + } + + data = _make_card_action_data( + action_value={"hermes_action": "approve_once", "approval_id": 1}, + ) + + with ( + patch.object( + adapter, "_resolve_sender_profile", new_callable=AsyncMock, + return_value={"user_id": "ou_user1", "user_name": "Norbert", "user_id_alt": None}, + ), + patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update, + patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve, + ): + await adapter._handle_card_action_event(data) + + mock_resolve.assert_called_once_with("agent:main:feishu:group:oc_12345", "once") + mock_update.assert_called_once_with("msg_001", "Approved once", "Norbert", "once") + + # State should be cleaned up + assert 1 not in adapter._approval_state + 
+ @pytest.mark.asyncio + async def test_deny_button(self): + adapter = _make_adapter() + adapter._approval_state[2] = { + "session_key": "some-session", + "message_id": "msg_002", + "chat_id": "oc_12345", + } + + data = _make_card_action_data( + action_value={"hermes_action": "deny", "approval_id": 2}, + token="tok_deny", + ) + + with ( + patch.object( + adapter, "_resolve_sender_profile", new_callable=AsyncMock, + return_value={"user_id": "ou_alice", "user_name": "Alice", "user_id_alt": None}, + ), + patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update, + patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve, + ): + await adapter._handle_card_action_event(data) + + mock_resolve.assert_called_once_with("some-session", "deny") + mock_update.assert_called_once_with("msg_002", "Denied", "Alice", "deny") + + @pytest.mark.asyncio + async def test_session_approval(self): + adapter = _make_adapter() + adapter._approval_state[3] = { + "session_key": "sess-3", + "message_id": "msg_003", + "chat_id": "oc_99", + } + + data = _make_card_action_data( + action_value={"hermes_action": "approve_session", "approval_id": 3}, + token="tok_ses", + ) + + with ( + patch.object( + adapter, "_resolve_sender_profile", new_callable=AsyncMock, + return_value={"user_id": "ou_u", "user_name": "Bob", "user_id_alt": None}, + ), + patch.object(adapter, "_update_approval_card", new_callable=AsyncMock) as mock_update, + patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve, + ): + await adapter._handle_card_action_event(data) + + mock_resolve.assert_called_once_with("sess-3", "session") + mock_update.assert_called_once_with("msg_003", "Approved for session", "Bob", "session") + + @pytest.mark.asyncio + async def test_always_approval(self): + adapter = _make_adapter() + adapter._approval_state[4] = { + "session_key": "sess-4", + "message_id": "msg_004", + "chat_id": "oc_55", + } + + data = _make_card_action_data( 
+ action_value={"hermes_action": "approve_always", "approval_id": 4}, + token="tok_alw", + ) + + with ( + patch.object( + adapter, "_resolve_sender_profile", new_callable=AsyncMock, + return_value={"user_id": "ou_u", "user_name": "Carol", "user_id_alt": None}, + ), + patch.object(adapter, "_update_approval_card", new_callable=AsyncMock), + patch("tools.approval.resolve_gateway_approval", return_value=1) as mock_resolve, + ): + await adapter._handle_card_action_event(data) + + mock_resolve.assert_called_once_with("sess-4", "always") + + @pytest.mark.asyncio + async def test_already_resolved_drops_silently(self): + adapter = _make_adapter() + # No state for approval_id 99 — already resolved + + data = _make_card_action_data( + action_value={"hermes_action": "approve_once", "approval_id": 99}, + token="tok_gone", + ) + + with patch("tools.approval.resolve_gateway_approval") as mock_resolve: + await adapter._handle_card_action_event(data) + + # Should NOT resolve — already handled + mock_resolve.assert_not_called() + + @pytest.mark.asyncio + async def test_non_approval_actions_route_normally(self): + """Non-approval card actions should still become synthetic commands.""" + adapter = _make_adapter() + + data = _make_card_action_data( + action_value={"custom_action": "something_else"}, + token="tok_normal", + ) + + with ( + patch.object( + adapter, "_resolve_sender_profile", new_callable=AsyncMock, + return_value={"user_id": "ou_u", "user_name": "Dave", "user_id_alt": None}, + ), + patch.object(adapter, "get_chat_info", new_callable=AsyncMock, return_value={"name": "Test Chat"}), + patch.object(adapter, "_handle_message_with_guards", new_callable=AsyncMock) as mock_handle, + patch("tools.approval.resolve_gateway_approval") as mock_resolve, + ): + await adapter._handle_card_action_event(data) + + # Should NOT resolve any approval + mock_resolve.assert_not_called() + # Should have routed as synthetic command + mock_handle.assert_called_once() + event = 
mock_handle.call_args[0][0] + assert "/card button" in event.text + + +# =========================================================================== +# _update_approval_card — card replacement after resolution +# =========================================================================== + +class TestFeishuUpdateApprovalCard: + """Test the card update after approval resolution.""" + + @pytest.mark.asyncio + async def test_updates_card_on_approve(self): + adapter = _make_adapter() + + mock_update = AsyncMock() + adapter._client.im.v1.message.update = MagicMock() + + with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread: + await adapter._update_approval_card( + "msg_001", "Approved once", "Norbert", "once" + ) + + mock_thread.assert_called_once() + # Verify the update request was built + call_args = mock_thread.call_args + assert call_args[0][0] == adapter._client.im.v1.message.update + + @pytest.mark.asyncio + async def test_updates_card_on_deny(self): + adapter = _make_adapter() + + with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread: + await adapter._update_approval_card( + "msg_002", "Denied", "Alice", "deny" + ) + + mock_thread.assert_called_once() + + @pytest.mark.asyncio + async def test_skips_update_when_not_connected(self): + adapter = _make_adapter() + adapter._client = None + + with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread: + await adapter._update_approval_card( + "msg_001", "Approved", "Bob", "once" + ) + + mock_thread.assert_not_called() + + @pytest.mark.asyncio + async def test_skips_update_when_no_message_id(self): + adapter = _make_adapter() + + with patch("asyncio.to_thread", new_callable=AsyncMock) as mock_thread: + await adapter._update_approval_card( + "", "Approved", "Bob", "once" + ) + + mock_thread.assert_not_called() + + @pytest.mark.asyncio + async def test_swallows_update_errors(self): + adapter = _make_adapter() + + with patch("asyncio.to_thread", new_callable=AsyncMock, 
side_effect=Exception("API error")): + # Should not raise + await adapter._update_approval_card( + "msg_001", "Approved", "Bob", "once" + ) diff --git a/tests/gateway/test_reasoning_command.py b/tests/gateway/test_reasoning_command.py index cb9e01f11e..e39ed1123d 100644 --- a/tests/gateway/test_reasoning_command.py +++ b/tests/gateway/test_reasoning_command.py @@ -87,7 +87,6 @@ class TestReasoningCommand: ) monkeypatch.setattr(gateway_run, "_hermes_home", hermes_home) - monkeypatch.delenv("HERMES_REASONING_EFFORT", raising=False) runner = _make_runner() runner._reasoning_config = {"enabled": True, "effort": "xhigh"} @@ -108,7 +107,6 @@ class TestReasoningCommand: config_path.write_text("agent:\n reasoning_effort: medium\n", encoding="utf-8") monkeypatch.setattr(gateway_run, "_hermes_home", hermes_home) - monkeypatch.delenv("HERMES_REASONING_EFFORT", raising=False) runner = _make_runner() runner._reasoning_config = {"enabled": True, "effort": "medium"} @@ -138,7 +136,6 @@ class TestReasoningCommand: "api_key": "test-key", }, ) - monkeypatch.delenv("HERMES_REASONING_EFFORT", raising=False) fake_run_agent = types.ModuleType("run_agent") fake_run_agent.AIAgent = _CapturingAgent monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent) @@ -170,55 +167,6 @@ class TestReasoningCommand: assert _CapturingAgent.last_init is not None assert _CapturingAgent.last_init["reasoning_config"] == {"enabled": True, "effort": "low"} - def test_run_agent_prefers_config_over_stale_reasoning_env(self, tmp_path, monkeypatch): - hermes_home = tmp_path / "hermes" - hermes_home.mkdir() - (hermes_home / "config.yaml").write_text("agent:\n reasoning_effort: none\n", encoding="utf-8") - - monkeypatch.setattr(gateway_run, "_hermes_home", hermes_home) - monkeypatch.setattr(gateway_run, "_env_path", hermes_home / ".env") - monkeypatch.setattr(gateway_run, "load_dotenv", lambda *args, **kwargs: None) - monkeypatch.setattr( - gateway_run, - "_resolve_runtime_agent_kwargs", - lambda: { - 
"provider": "openrouter", - "api_mode": "chat_completions", - "base_url": "https://openrouter.ai/api/v1", - "api_key": "test-key", - }, - ) - monkeypatch.setenv("HERMES_REASONING_EFFORT", "low") - fake_run_agent = types.ModuleType("run_agent") - fake_run_agent.AIAgent = _CapturingAgent - monkeypatch.setitem(sys.modules, "run_agent", fake_run_agent) - - _CapturingAgent.last_init = None - runner = _make_runner() - - source = SessionSource( - platform=Platform.LOCAL, - chat_id="cli", - chat_name="CLI", - chat_type="dm", - user_id="user-1", - ) - - result = asyncio.run( - runner._run_agent( - message="ping", - context_prompt="", - history=[], - source=source, - session_id="session-1", - session_key="agent:main:local:dm", - ) - ) - - assert result["final_response"] == "ok" - assert _CapturingAgent.last_init is not None - assert _CapturingAgent.last_init["reasoning_config"] == {"enabled": False} - def test_run_agent_includes_enabled_mcp_servers_in_gateway_toolsets(self, tmp_path, monkeypatch): hermes_home = tmp_path / "hermes" hermes_home.mkdir() diff --git a/tests/gateway/test_session_boundary_hooks.py b/tests/gateway/test_session_boundary_hooks.py new file mode 100644 index 0000000000..31e02980a7 --- /dev/null +++ b/tests/gateway/test_session_boundary_hooks.py @@ -0,0 +1,158 @@ +"""Tests that on_session_finalize and on_session_reset plugin hooks fire in the gateway.""" +from datetime import datetime +from types import SimpleNamespace +from unittest.mock import AsyncMock, MagicMock, patch + +import pytest + +from gateway.config import GatewayConfig, Platform, PlatformConfig +from gateway.platforms.base import MessageEvent +from gateway.session import SessionEntry, SessionSource, build_session_key + + +def _make_source() -> SessionSource: + return SessionSource( + platform=Platform.TELEGRAM, + user_id="u1", + chat_id="c1", + user_name="tester", + chat_type="dm", + ) + + +def _make_event(text: str) -> MessageEvent: + return MessageEvent(text=text, source=_make_source(), 
message_id="m1") + + +def _make_runner(): + from gateway.run import GatewayRunner + + runner = object.__new__(GatewayRunner) + runner.config = GatewayConfig( + platforms={Platform.TELEGRAM: PlatformConfig(enabled=True, token="***")} + ) + adapter = MagicMock() + adapter.send = AsyncMock() + runner.adapters = {Platform.TELEGRAM: adapter} + runner._voice_mode = {} + runner.hooks = SimpleNamespace(emit=AsyncMock(), loaded_hooks=False) + runner._session_model_overrides = {} + runner._pending_model_notes = {} + runner._background_tasks = set() + + session_key = build_session_key(_make_source()) + session_entry = SessionEntry( + session_key=session_key, + session_id="sess-old", + created_at=datetime.now(), + updated_at=datetime.now(), + platform=Platform.TELEGRAM, + chat_type="dm", + ) + new_session_entry = SessionEntry( + session_key=session_key, + session_id="sess-new", + created_at=datetime.now(), + updated_at=datetime.now(), + platform=Platform.TELEGRAM, + chat_type="dm", + ) + runner.session_store = MagicMock() + runner.session_store.get_or_create_session.return_value = new_session_entry + runner.session_store.reset_session.return_value = new_session_entry + runner.session_store._entries = {session_key: session_entry} + runner.session_store._generate_session_key.return_value = session_key + runner._running_agents = {} + runner._pending_messages = {} + runner._pending_approvals = {} + runner._session_db = None + runner._agent_cache_lock = None + runner._is_user_authorized = lambda _source: True + runner._format_session_info = lambda: "" + + return runner + + +@pytest.mark.asyncio +@patch("hermes_cli.plugins.invoke_hook") +async def test_reset_fires_finalize_hook(mock_invoke_hook): + """/new must fire on_session_finalize with the OLD session id.""" + runner = _make_runner() + + await runner._handle_reset_command(_make_event("/new")) + + mock_invoke_hook.assert_any_call( + "on_session_finalize", session_id="sess-old", platform="telegram" + ) + + +@pytest.mark.asyncio 
+@patch("hermes_cli.plugins.invoke_hook") +async def test_reset_fires_reset_hook(mock_invoke_hook): + """/new must fire on_session_reset with the NEW session id.""" + runner = _make_runner() + + await runner._handle_reset_command(_make_event("/new")) + + mock_invoke_hook.assert_any_call( + "on_session_reset", session_id="sess-new", platform="telegram" + ) + + +@pytest.mark.asyncio +@patch("hermes_cli.plugins.invoke_hook") +async def test_finalize_before_reset(mock_invoke_hook): + """on_session_finalize must fire before on_session_reset.""" + runner = _make_runner() + + await runner._handle_reset_command(_make_event("/new")) + + calls = [c for c in mock_invoke_hook.call_args_list + if c[0][0] in ("on_session_finalize", "on_session_reset")] + hook_names = [c[0][0] for c in calls] + assert hook_names == ["on_session_finalize", "on_session_reset"] + + +@pytest.mark.asyncio +@patch("hermes_cli.plugins.invoke_hook") +async def test_shutdown_fires_finalize_for_active_agents(mock_invoke_hook): + """Gateway stop() must fire on_session_finalize for each active agent.""" + from gateway.run import GatewayRunner + + runner = object.__new__(GatewayRunner) + runner._running = True + runner._background_tasks = set() + runner._pending_messages = {} + runner._pending_approvals = {} + runner._shutdown_event = MagicMock() + runner.adapters = {} + runner._exit_reason = "test" + + agent1 = MagicMock() + agent1.session_id = "sess-a" + agent2 = MagicMock() + agent2.session_id = "sess-b" + runner._running_agents = {"key-a": agent1, "key-b": agent2} + + with patch("gateway.status.remove_pid_file"), \ + patch("gateway.status.write_runtime_status"): + await runner.stop() + + finalize_calls = [ + c for c in mock_invoke_hook.call_args_list + if c[0][0] == "on_session_finalize" + ] + session_ids = {c[1]["session_id"] for c in finalize_calls} + assert session_ids == {"sess-a", "sess-b"} + + +@pytest.mark.asyncio +@patch("hermes_cli.plugins.invoke_hook", side_effect=Exception("boom")) +async def 
test_hook_error_does_not_break_reset(mock_invoke_hook): + """Plugin hook errors must not prevent /new from completing.""" + runner = _make_runner() + + result = await runner._handle_reset_command(_make_event("/new")) + + # Should still return a success message despite hook errors + assert "Session reset" in result or "New session" in result diff --git a/tests/gateway/test_stream_consumer.py b/tests/gateway/test_stream_consumer.py index 6c908bbe40..ddc88fc2fc 100644 --- a/tests/gateway/test_stream_consumer.py +++ b/tests/gateway/test_stream_consumer.py @@ -324,3 +324,91 @@ class TestSegmentBreakOnToolBoundary: await consumer.run() assert consumer.already_sent + + @pytest.mark.asyncio + async def test_edit_failure_sends_only_unsent_tail_at_finish(self): + """If an edit fails mid-stream, send only the missing tail once at finish.""" + adapter = MagicMock() + send_results = [ + SimpleNamespace(success=True, message_id="msg_1"), + SimpleNamespace(success=True, message_id="msg_2"), + ] + adapter.send = AsyncMock(side_effect=send_results) + adapter.edit_message = AsyncMock(return_value=SimpleNamespace(success=False, error="flood_control:6")) + adapter.MAX_MESSAGE_LENGTH = 4096 + + config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5, cursor=" ▉") + consumer = GatewayStreamConsumer(adapter, "chat_123", config) + + consumer.on_delta("Hello") + task = asyncio.create_task(consumer.run()) + await asyncio.sleep(0.08) + consumer.on_delta(" world") + await asyncio.sleep(0.08) + consumer.finish() + await task + + assert adapter.send.call_count == 2 + first_text = adapter.send.call_args_list[0][1]["content"] + second_text = adapter.send.call_args_list[1][1]["content"] + assert "Hello" in first_text + assert second_text.strip() == "world" + assert consumer.already_sent + + @pytest.mark.asyncio + async def test_segment_break_clears_failed_edit_fallback_state(self): + """A tool boundary after edit failure must not duplicate the next segment.""" + adapter = MagicMock() 
+ send_results = [ + SimpleNamespace(success=True, message_id="msg_1"), + SimpleNamespace(success=True, message_id="msg_2"), + ] + adapter.send = AsyncMock(side_effect=send_results) + adapter.edit_message = AsyncMock(return_value=SimpleNamespace(success=False, error="flood_control:6")) + adapter.MAX_MESSAGE_LENGTH = 4096 + + config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5, cursor=" ▉") + consumer = GatewayStreamConsumer(adapter, "chat_123", config) + + consumer.on_delta("Hello") + task = asyncio.create_task(consumer.run()) + await asyncio.sleep(0.08) + consumer.on_delta(" world") + await asyncio.sleep(0.08) + consumer.on_delta(None) + consumer.on_delta("Next segment") + consumer.finish() + await task + + sent_texts = [call[1]["content"] for call in adapter.send.call_args_list] + assert sent_texts == ["Hello ▉", "Next segment"] + + @pytest.mark.asyncio + async def test_fallback_final_splits_long_continuation_without_dropping_text(self): + """Long continuation tails should be chunked when fallback final-send runs.""" + adapter = MagicMock() + adapter.send = AsyncMock(side_effect=[ + SimpleNamespace(success=True, message_id="msg_1"), + SimpleNamespace(success=True, message_id="msg_2"), + SimpleNamespace(success=True, message_id="msg_3"), + ]) + adapter.edit_message = AsyncMock(return_value=SimpleNamespace(success=False, error="flood_control:6")) + adapter.MAX_MESSAGE_LENGTH = 610 + + config = StreamConsumerConfig(edit_interval=0.01, buffer_threshold=5, cursor=" ▉") + consumer = GatewayStreamConsumer(adapter, "chat_123", config) + + prefix = "abc" + tail = "x" * 620 + consumer.on_delta(prefix) + task = asyncio.create_task(consumer.run()) + await asyncio.sleep(0.08) + consumer.on_delta(tail) + await asyncio.sleep(0.08) + consumer.finish() + await task + + sent_texts = [call[1]["content"] for call in adapter.send.call_args_list] + assert len(sent_texts) == 3 + assert sent_texts[0].startswith(prefix) + assert sum(len(t) for t in sent_texts[1:]) == 
len(tail) diff --git a/tests/hermes_cli/test_doctor.py b/tests/hermes_cli/test_doctor.py index d91cf3f647..f30fb48396 100644 --- a/tests/hermes_cli/test_doctor.py +++ b/tests/hermes_cli/test_doctor.py @@ -136,3 +136,73 @@ def test_check_gateway_service_linger_skips_when_service_not_installed(monkeypat out = capsys.readouterr().out assert out == "" assert issues == [] + + +# ── Memory provider section (doctor should only check the *active* provider) ── + + +class TestDoctorMemoryProviderSection: + """The ◆ Memory Provider section should respect memory.provider config.""" + + def _make_hermes_home(self, tmp_path, provider=""): + """Create a minimal HERMES_HOME with config.yaml.""" + home = tmp_path / ".hermes" + home.mkdir(parents=True, exist_ok=True) + import yaml + config = {"memory": {"provider": provider}} if provider else {"memory": {}} + (home / "config.yaml").write_text(yaml.dump(config)) + return home + + def _run_doctor_and_capture(self, monkeypatch, tmp_path, provider=""): + """Run doctor and capture stdout.""" + home = self._make_hermes_home(tmp_path, provider) + monkeypatch.setattr(doctor_mod, "HERMES_HOME", home) + monkeypatch.setattr(doctor_mod, "PROJECT_ROOT", tmp_path / "project") + monkeypatch.setattr(doctor_mod, "_DHH", str(home)) + (tmp_path / "project").mkdir(exist_ok=True) + + # Stub tool availability (returns empty) so doctor runs past it + fake_model_tools = types.SimpleNamespace( + check_tool_availability=lambda *a, **kw: ([], []), + TOOLSET_REQUIREMENTS={}, + ) + monkeypatch.setitem(sys.modules, "model_tools", fake_model_tools) + + # Stub auth checks to avoid real API calls + try: + from hermes_cli import auth as _auth_mod + monkeypatch.setattr(_auth_mod, "get_nous_auth_status", lambda: {}) + monkeypatch.setattr(_auth_mod, "get_codex_auth_status", lambda: {}) + except Exception: + pass + + import io, contextlib + buf = io.StringIO() + with contextlib.redirect_stdout(buf): + doctor_mod.run_doctor(Namespace(fix=False)) + return buf.getvalue() + 
+ def test_no_provider_shows_builtin_ok(self, monkeypatch, tmp_path): + out = self._run_doctor_and_capture(monkeypatch, tmp_path, provider="") + assert "Memory Provider" in out + assert "Built-in memory active" in out + # Should NOT mention Honcho or Mem0 errors + assert "Honcho API key" not in out + assert "Mem0" not in out + + def test_honcho_provider_not_installed_shows_fail(self, monkeypatch, tmp_path): + # Make honcho import fail + monkeypatch.setitem( + sys.modules, "plugins.memory.honcho.client", None + ) + out = self._run_doctor_and_capture(monkeypatch, tmp_path, provider="honcho") + assert "Memory Provider" in out + # Should show failure since honcho is set but not importable + assert "Built-in memory active" not in out + + def test_mem0_provider_not_installed_shows_fail(self, monkeypatch, tmp_path): + # Make mem0 import fail + monkeypatch.setitem(sys.modules, "plugins.memory.mem0", None) + out = self._run_doctor_and_capture(monkeypatch, tmp_path, provider="mem0") + assert "Memory Provider" in out + assert "Built-in memory active" not in out diff --git a/tests/hermes_cli/test_runtime_provider_resolution.py b/tests/hermes_cli/test_runtime_provider_resolution.py index ded0c9202f..0abc8196f7 100644 --- a/tests/hermes_cli/test_runtime_provider_resolution.py +++ b/tests/hermes_cli/test_runtime_provider_resolution.py @@ -808,6 +808,55 @@ def test_minimax_explicit_api_mode_respected(monkeypatch): assert resolved["api_mode"] == "chat_completions" +def test_minimax_config_base_url_overrides_hardcoded_default(monkeypatch): + """model.base_url in config.yaml should override the hardcoded default (#6039).""" + monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax") + monkeypatch.setattr(rp, "_get_model_config", lambda: { + "provider": "minimax", + "base_url": "https://api.minimaxi.com/anthropic", + }) + monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key") + monkeypatch.delenv("MINIMAX_BASE_URL", raising=False) + + resolved = 
rp.resolve_runtime_provider(requested="minimax") + + assert resolved["provider"] == "minimax" + assert resolved["base_url"] == "https://api.minimaxi.com/anthropic" + assert resolved["api_mode"] == "anthropic_messages" + + +def test_minimax_env_base_url_still_wins_over_config(monkeypatch): + """MINIMAX_BASE_URL env var should take priority over config.yaml model.base_url.""" + monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax") + monkeypatch.setattr(rp, "_get_model_config", lambda: { + "provider": "minimax", + "base_url": "https://api.minimaxi.com/anthropic", + }) + monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key") + monkeypatch.setenv("MINIMAX_BASE_URL", "https://custom.example.com/v1") + + resolved = rp.resolve_runtime_provider(requested="minimax") + + # Env var wins because resolve_api_key_provider_credentials prefers it + assert resolved["base_url"] == "https://custom.example.com/v1" + + +def test_minimax_config_base_url_ignored_for_different_provider(monkeypatch): + """model.base_url should NOT be used when model.provider doesn't match.""" + monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "minimax") + monkeypatch.setattr(rp, "_get_model_config", lambda: { + "provider": "openrouter", + "base_url": "https://some-other-endpoint.com/v1", + }) + monkeypatch.setenv("MINIMAX_API_KEY", "test-minimax-key") + monkeypatch.delenv("MINIMAX_BASE_URL", raising=False) + + resolved = rp.resolve_runtime_provider(requested="minimax") + + # Should use the default, NOT the config base_url from a different provider + assert resolved["base_url"] == "https://api.minimax.io/anthropic" + + def test_alibaba_default_coding_intl_endpoint_uses_chat_completions(monkeypatch): """Alibaba default coding-intl /v1 URL should use chat_completions mode.""" monkeypatch.setattr(rp, "resolve_provider", lambda *a, **k: "alibaba") diff --git a/tests/hermes_cli/test_setup_model_selection.py b/tests/hermes_cli/test_setup_model_selection.py index 
3cb7056cf2..b42365da9d 100644 --- a/tests/hermes_cli/test_setup_model_selection.py +++ b/tests/hermes_cli/test_setup_model_selection.py @@ -34,8 +34,8 @@ class TestSetupProviderModelSelection: @pytest.mark.parametrize("provider_id,expected_defaults", [ ("zai", ["glm-5", "glm-4.7", "glm-4.5", "glm-4.5-flash"]), ("kimi-coding", ["kimi-k2.5", "kimi-k2-thinking", "kimi-k2-turbo-preview"]), - ("minimax", ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"]), - ("minimax-cn", ["MiniMax-M2.7", "MiniMax-M2.7-highspeed", "MiniMax-M2.5", "MiniMax-M2.5-highspeed", "MiniMax-M2.1"]), + ("minimax", ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"]), + ("minimax-cn", ["MiniMax-M1", "MiniMax-M1-40k", "MiniMax-M1-80k", "MiniMax-M1-128k", "MiniMax-M1-256k", "MiniMax-M2.5", "MiniMax-M2.7"]), ("opencode-zen", ["gpt-5.4", "gpt-5.3-codex", "claude-sonnet-4-6", "gemini-3-flash"]), ("opencode-go", ["glm-5", "kimi-k2.5", "minimax-m2.5", "minimax-m2.7"]), ]) diff --git a/tests/test_ollama_num_ctx.py b/tests/test_ollama_num_ctx.py new file mode 100644 index 0000000000..fff0144d33 --- /dev/null +++ b/tests/test_ollama_num_ctx.py @@ -0,0 +1,135 @@ +"""Tests for Ollama num_ctx context length detection and injection. 
+ +Covers: + agent/model_metadata.py — query_ollama_num_ctx() + run_agent.py — _ollama_num_ctx detection + extra_body injection +""" + +from unittest.mock import patch, MagicMock + +import pytest + +from agent.model_metadata import query_ollama_num_ctx + + +# ═══════════════════════════════════════════════════════════════════════ +# Level 1: query_ollama_num_ctx — Ollama API interaction +# ═══════════════════════════════════════════════════════════════════════ + + +def _mock_httpx_client(show_response_data, status_code=200): + """Create a mock httpx.Client context manager that returns given /api/show data.""" + mock_resp = MagicMock(status_code=status_code) + mock_resp.json.return_value = show_response_data + mock_client = MagicMock() + mock_client.post.return_value = mock_resp + mock_ctx = MagicMock() + mock_ctx.__enter__ = MagicMock(return_value=mock_client) + mock_ctx.__exit__ = MagicMock(return_value=False) + return mock_ctx, mock_client + + +class TestQueryOllamaNumCtx: + """Test the Ollama /api/show context length query.""" + + def test_returns_context_from_model_info(self): + """Should extract context_length from GGUF model_info metadata.""" + show_data = { + "model_info": {"llama.context_length": 131072}, + "parameters": "", + } + mock_ctx, _ = _mock_httpx_client(show_data) + + with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"): + # httpx is imported inside the function — patch the module import + import httpx + with patch.object(httpx, "Client", return_value=mock_ctx): + result = query_ollama_num_ctx("llama3.1:8b", "http://localhost:11434/v1") + + assert result == 131072 + + def test_prefers_explicit_num_ctx_from_modelfile(self): + """If the Modelfile sets num_ctx explicitly, that should take priority.""" + show_data = { + "model_info": {"llama.context_length": 131072}, + "parameters": "num_ctx 32768\ntemperature 0.7", + } + mock_ctx, _ = _mock_httpx_client(show_data) + + with 
patch("agent.model_metadata.detect_local_server_type", return_value="ollama"):
+            import httpx
+            with patch.object(httpx, "Client", return_value=mock_ctx):
+                result = query_ollama_num_ctx("custom-model", "http://localhost:11434")
+
+        assert result == 32768
+
+    def test_returns_none_for_non_ollama_server(self):
+        """Should return None if the server is not Ollama."""
+        with patch("agent.model_metadata.detect_local_server_type", return_value="lm-studio"):
+            result = query_ollama_num_ctx("model", "http://localhost:1234")
+        assert result is None
+
+    def test_returns_none_on_connection_error(self):
+        """Should return None if the server is unreachable."""
+        with patch("agent.model_metadata.detect_local_server_type", side_effect=Exception("timeout")):
+            result = query_ollama_num_ctx("model", "http://localhost:11434")
+        assert result is None
+
+    def test_returns_none_on_404(self):
+        """Should return None if the model is not found."""
+        mock_ctx, _ = _mock_httpx_client({}, status_code=404)
+
+        with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"):
+            import httpx
+            with patch.object(httpx, "Client", return_value=mock_ctx):
+                result = query_ollama_num_ctx("nonexistent", "http://localhost:11434")
+
+        assert result is None
+
+    def test_strips_provider_prefix(self):
+        """Should strip 'local:' prefix from model name before querying."""
+        show_data = {
+            "model_info": {"qwen2.context_length": 32768},
+            "parameters": "",
+        }
+        mock_ctx, mock_client = _mock_httpx_client(show_data)
+
+        with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"):
+            import httpx
+            with patch.object(httpx, "Client", return_value=mock_ctx):
+                result = query_ollama_num_ctx("local:qwen2.5:7b", "http://localhost:11434/v1")
+
+        # Verify the post was called with stripped name (no "local:" prefix)
+        call_args = mock_client.post.call_args
+        assert call_args[1]["json"]["name"] == "qwen2.5:7b"
+        assert result == 32768
+
+    def 
test_handles_qwen2_architecture_key(self): + """Different model architectures use different key prefixes in model_info.""" + show_data = { + "model_info": {"qwen2.context_length": 65536}, + "parameters": "", + } + mock_ctx, _ = _mock_httpx_client(show_data) + + with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"): + import httpx + with patch.object(httpx, "Client", return_value=mock_ctx): + result = query_ollama_num_ctx("qwen2.5:32b", "http://localhost:11434") + + assert result == 65536 + + def test_returns_none_when_model_info_empty(self): + """Should return None if model_info has no context_length key.""" + show_data = { + "model_info": {"llama.embedding_length": 4096}, + "parameters": "", + } + mock_ctx, _ = _mock_httpx_client(show_data) + + with patch("agent.model_metadata.detect_local_server_type", return_value="ollama"): + import httpx + with patch.object(httpx, "Client", return_value=mock_ctx): + result = query_ollama_num_ctx("model", "http://localhost:11434") + + assert result is None diff --git a/tests/test_retry_utils.py b/tests/test_retry_utils.py new file mode 100644 index 0000000000..f39c3142d9 --- /dev/null +++ b/tests/test_retry_utils.py @@ -0,0 +1,117 @@ +"""Tests for agent.retry_utils jittered backoff.""" + +import threading + +import agent.retry_utils as retry_utils +from agent.retry_utils import jittered_backoff + + +def test_backoff_is_exponential(): + """Base delay should double each attempt (before jitter).""" + for attempt in (1, 2, 3, 4): + delays = [jittered_backoff(attempt, base_delay=5.0, max_delay=120.0, jitter_ratio=0.0) for _ in range(100)] + expected = min(5.0 * (2 ** (attempt - 1)), 120.0) + mean = sum(delays) / len(delays) + assert abs(mean - expected) < 0.01, f"attempt {attempt}: expected {expected}, got {mean}" + + +def test_backoff_respects_max_delay(): + """Even with high attempt numbers, delay should not exceed max_delay.""" + for attempt in (10, 20, 100): + delay = jittered_backoff(attempt, 
base_delay=5.0, max_delay=60.0, jitter_ratio=0.0) + assert delay <= 60.0, f"attempt {attempt}: delay {delay} exceeds max 60s" + + +def test_backoff_adds_jitter(): + """With jitter enabled, delays should vary across calls.""" + delays = [jittered_backoff(1, base_delay=10.0, max_delay=120.0, jitter_ratio=0.5) for _ in range(50)] + assert min(delays) != max(delays), "jitter should produce varying delays" + assert all(d >= 10.0 for d in delays), "jittered delay should be >= base delay" + assert all(d <= 15.0 for d in delays), "jittered delay should be bounded" + + +def test_backoff_attempt_1_is_base(): + """First attempt delay should equal base_delay (with no jitter).""" + delay = jittered_backoff(1, base_delay=3.0, max_delay=120.0, jitter_ratio=0.0) + assert delay == 3.0 + + +def test_backoff_with_zero_base_delay_returns_max(): + """base_delay=0 should return max_delay (guard against busy-wait).""" + delay = jittered_backoff(1, base_delay=0.0, max_delay=60.0, jitter_ratio=0.0) + assert delay == 60.0 + + +def test_backoff_with_extreme_attempt_returns_max(): + """Very large attempt numbers should not overflow and should return max_delay.""" + delay = jittered_backoff(999, base_delay=5.0, max_delay=120.0, jitter_ratio=0.0) + assert delay == 120.0 + + +def test_backoff_negative_attempt_treated_as_one(): + """Negative attempt should not crash and behaves like attempt=1.""" + delay = jittered_backoff(-5, base_delay=10.0, max_delay=120.0, jitter_ratio=0.0) + assert delay == 10.0 + + +def test_backoff_thread_safety(): + """Concurrent calls should generally produce different delays.""" + results = [] + barrier = threading.Barrier(8) + + def _call_backoff(): + barrier.wait() + results.append(jittered_backoff(1, base_delay=10.0, max_delay=120.0, jitter_ratio=0.5)) + + threads = [threading.Thread(target=_call_backoff) for _ in range(8)] + for t in threads: + t.start() + for t in threads: + t.join(timeout=5) + + assert len(results) == 8 + unique = len(set(results)) + assert unique 
>= 6, f"Expected mostly unique delays, got {unique}/8 unique" + + +def test_backoff_uses_locked_tick_for_seed(monkeypatch): + """Seed derivation should use per-call tick captured under lock.""" + import time + + monkeypatch.setattr(retry_utils, "_jitter_counter", 0) + + recorded_seeds = [] + + class _RecordingRandom: + def __init__(self, seed): + recorded_seeds.append(seed) + + def uniform(self, a, b): + return 0.0 + + monkeypatch.setattr(retry_utils.random, "Random", _RecordingRandom) + + fixed_time_ns = 123456789 + + def _time_ns_wait_for_two_ticks(): + deadline = time.time() + 2.0 + while retry_utils._jitter_counter < 2 and time.time() < deadline: + time.sleep(0.001) + return fixed_time_ns + + monkeypatch.setattr(retry_utils.time, "time_ns", _time_ns_wait_for_two_ticks) + + barrier = threading.Barrier(2) + + def _call(): + barrier.wait() + jittered_backoff(1, base_delay=10.0, max_delay=120.0, jitter_ratio=0.5) + + threads = [threading.Thread(target=_call) for _ in range(2)] + for t in threads: + t.start() + for t in threads: + t.join(timeout=5) + + assert len(recorded_seeds) == 2 + assert len(set(recorded_seeds)) == 2, f"Expected unique seeds, got {recorded_seeds}" diff --git a/tests/tools/test_browser_camofox_persistence.py b/tests/tools/test_browser_camofox_persistence.py index 0fa5723c67..0e9c863727 100644 --- a/tests/tools/test_browser_camofox_persistence.py +++ b/tests/tools/test_browser_camofox_persistence.py @@ -16,6 +16,7 @@ from tools.browser_camofox import ( _managed_persistence_enabled, camofox_close, camofox_navigate, + camofox_soft_cleanup, check_camofox_available, cleanup_all_camofox_sessions, get_vnc_url, @@ -240,3 +241,50 @@ class TestVncUrlDiscovery: assert result["vnc_url"] == "http://localhost:6080" assert "vnc_hint" in result + + +class TestCamofoxSoftCleanup: + """camofox_soft_cleanup drops local state only when managed persistence is on.""" + + def test_returns_true_and_drops_session_when_enabled(self, tmp_path, monkeypatch): + 
monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + monkeypatch.setenv("CAMOFOX_URL", "http://localhost:9377") + + with _enable_persistence(): + _get_session("task-1") + result = camofox_soft_cleanup("task-1") + + assert result is True + # Session should have been dropped from in-memory store + import tools.browser_camofox as mod + with mod._sessions_lock: + assert "task-1" not in mod._sessions + + def test_returns_false_when_disabled(self, tmp_path, monkeypatch): + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + monkeypatch.setenv("CAMOFOX_URL", "http://localhost:9377") + + _get_session("task-1") + config = {"browser": {"camofox": {"managed_persistence": False}}} + with patch("tools.browser_camofox.load_config", return_value=config): + result = camofox_soft_cleanup("task-1") + + assert result is False + # Session should still be present — not dropped + import tools.browser_camofox as mod + with mod._sessions_lock: + assert "task-1" in mod._sessions + + def test_does_not_call_server_delete(self, tmp_path, monkeypatch): + """Soft cleanup must never hit the Camofox /sessions DELETE endpoint.""" + monkeypatch.setenv("HERMES_HOME", str(tmp_path)) + monkeypatch.setenv("CAMOFOX_URL", "http://localhost:9377") + + with ( + _enable_persistence(), + patch("tools.browser_camofox.requests.delete") as mock_delete, + ): + _get_session("task-1") + camofox_soft_cleanup("task-1") + + mock_delete.assert_not_called() diff --git a/tests/tools/test_browser_cleanup.py b/tests/tools/test_browser_cleanup.py index df21f3a0ea..817927903e 100644 --- a/tests/tools/test_browser_cleanup.py +++ b/tests/tools/test_browser_cleanup.py @@ -65,6 +65,62 @@ class TestBrowserCleanup: mock_stop.assert_called_once_with("task-1") mock_run.assert_called_once_with("task-1", "close", [], timeout=10) + def test_cleanup_camofox_managed_persistence_skips_close(self): + """When camofox mode + managed persistence, soft_cleanup fires instead of close.""" + browser_tool = self.browser_tool + 
browser_tool._active_sessions["task-1"] = { + "session_name": "sess-1", + "bb_session_id": None, + } + browser_tool._session_last_activity["task-1"] = 123.0 + + with ( + patch("tools.browser_tool._is_camofox_mode", return_value=True), + patch("tools.browser_tool._maybe_stop_recording") as mock_stop, + patch( + "tools.browser_tool._run_browser_command", + return_value={"success": True}, + ), + patch("tools.browser_tool.os.path.exists", return_value=False), + patch( + "tools.browser_camofox.camofox_soft_cleanup", + return_value=True, + ) as mock_soft, + patch("tools.browser_camofox.camofox_close") as mock_close, + ): + browser_tool.cleanup_browser("task-1") + + mock_soft.assert_called_once_with("task-1") + mock_close.assert_not_called() + + def test_cleanup_camofox_no_persistence_calls_close(self): + """When camofox mode but managed persistence is off, camofox_close fires.""" + browser_tool = self.browser_tool + browser_tool._active_sessions["task-1"] = { + "session_name": "sess-1", + "bb_session_id": None, + } + browser_tool._session_last_activity["task-1"] = 123.0 + + with ( + patch("tools.browser_tool._is_camofox_mode", return_value=True), + patch("tools.browser_tool._maybe_stop_recording") as mock_stop, + patch( + "tools.browser_tool._run_browser_command", + return_value={"success": True}, + ), + patch("tools.browser_tool.os.path.exists", return_value=False), + patch( + "tools.browser_camofox.camofox_soft_cleanup", + return_value=False, + ) as mock_soft, + patch("tools.browser_camofox.camofox_close") as mock_close, + ): + browser_tool.cleanup_browser("task-1") + + mock_soft.assert_called_once_with("task-1") + mock_close.assert_called_once_with("task-1") + def test_emergency_cleanup_clears_all_tracking_state(self): browser_tool = self.browser_tool browser_tool._cleanup_done = False diff --git a/tests/tools/test_browser_homebrew_paths.py b/tests/tools/test_browser_homebrew_paths.py index 3e2e766694..33b725604c 100644 --- a/tests/tools/test_browser_homebrew_paths.py 
+++ b/tests/tools/test_browser_homebrew_paths.py @@ -152,6 +152,109 @@ class TestFindAgentBrowser: class TestRunBrowserCommandPathConstruction: """Verify _run_browser_command() includes Homebrew node dirs in subprocess PATH.""" + def test_subprocess_preserves_executable_path_with_spaces(self, tmp_path): + """A local agent-browser path containing spaces must stay one argv entry.""" + captured_cmd = None + + mock_proc = MagicMock() + mock_proc.returncode = 0 + mock_proc.wait.return_value = 0 + + def capture_popen(cmd, **kwargs): + nonlocal captured_cmd + captured_cmd = cmd + return mock_proc + + fake_session = { + "session_name": "test-session", + "session_id": "test-id", + "cdp_url": None, + } + fake_json = json.dumps({"success": True}) + browser_path = "/Users/test/Library/Application Support/hermes/node_modules/.bin/agent-browser" + hermes_home = str(tmp_path / "hermes-home") + + with patch("tools.browser_tool._find_agent_browser", return_value=browser_path), \ + patch("tools.browser_tool._get_session_info", return_value=fake_session), \ + patch("tools.browser_tool._socket_safe_tmpdir", return_value=str(tmp_path)), \ + patch("tools.browser_tool._discover_homebrew_node_dirs", return_value=[]), \ + patch("hermes_constants.Path.home", return_value=tmp_path), \ + patch("subprocess.Popen", side_effect=capture_popen), \ + patch("os.open", return_value=99), \ + patch("os.close"), \ + patch("tools.interrupt.is_interrupted", return_value=False), \ + patch.dict( + os.environ, + { + "PATH": "/usr/bin:/bin", + "HOME": "/home/test", + "HERMES_HOME": hermes_home, + }, + clear=True, + ): + with patch("builtins.open", mock_open(read_data=fake_json)): + _run_browser_command("test-task", "navigate", ["https://example.com"]) + + assert captured_cmd is not None + assert captured_cmd[0] == browser_path + assert captured_cmd[1:5] == [ + "--session", + "test-session", + "--json", + "navigate", + ] + + def test_subprocess_splits_npx_fallback_into_command_and_package(self, tmp_path): + 
"""The synthetic npx fallback should still expand into separate argv items.""" + captured_cmd = None + + mock_proc = MagicMock() + mock_proc.returncode = 0 + mock_proc.wait.return_value = 0 + + def capture_popen(cmd, **kwargs): + nonlocal captured_cmd + captured_cmd = cmd + return mock_proc + + fake_session = { + "session_name": "test-session", + "session_id": "test-id", + "cdp_url": None, + } + fake_json = json.dumps({"success": True}) + hermes_home = str(tmp_path / "hermes-home") + + with patch("tools.browser_tool._find_agent_browser", return_value="npx agent-browser"), \ + patch("tools.browser_tool._get_session_info", return_value=fake_session), \ + patch("tools.browser_tool._socket_safe_tmpdir", return_value=str(tmp_path)), \ + patch("tools.browser_tool._discover_homebrew_node_dirs", return_value=[]), \ + patch("hermes_constants.Path.home", return_value=tmp_path), \ + patch("subprocess.Popen", side_effect=capture_popen), \ + patch("os.open", return_value=99), \ + patch("os.close"), \ + patch("tools.interrupt.is_interrupted", return_value=False), \ + patch.dict( + os.environ, + { + "PATH": "/usr/bin:/bin", + "HOME": "/home/test", + "HERMES_HOME": hermes_home, + }, + clear=True, + ): + with patch("builtins.open", mock_open(read_data=fake_json)): + _run_browser_command("test-task", "navigate", ["https://example.com"]) + + assert captured_cmd is not None + assert captured_cmd[:2] == ["npx", "agent-browser"] + assert captured_cmd[2:6] == [ + "--session", + "test-session", + "--json", + "navigate", + ] + def test_subprocess_path_includes_homebrew_node_dirs(self, tmp_path): """When _discover_homebrew_node_dirs returns dirs, they should appear in the subprocess env PATH passed to Popen.""" diff --git a/tests/tools/test_notify_on_complete.py b/tests/tools/test_notify_on_complete.py index 888721906d..8cf17bfbf6 100644 --- a/tests/tools/test_notify_on_complete.py +++ b/tests/tools/test_notify_on_complete.py @@ -197,6 +197,26 @@ class TestCheckpointNotify: s = 
registry.get("proc_live") assert s.notify_on_complete is True + def test_recover_requeues_notify_watchers(self, registry, tmp_path): + checkpoint = tmp_path / "procs.json" + checkpoint.write_text(json.dumps([{ + "session_id": "proc_live", + "command": "sleep 999", + "pid": os.getpid(), + "task_id": "t1", + "session_key": "sk1", + "watcher_platform": "telegram", + "watcher_chat_id": "123", + "watcher_thread_id": "42", + "watcher_interval": 5, + "notify_on_complete": True, + }])) + with patch("tools.process_registry.CHECKPOINT_PATH", checkpoint): + recovered = registry.recover_from_checkpoint() + assert recovered == 1 + assert len(registry.pending_watchers) == 1 + assert registry.pending_watchers[0]["notify_on_complete"] is True + def test_recover_defaults_false(self, registry, tmp_path): """Old checkpoint entries without the field default to False.""" checkpoint = tmp_path / "procs.json" diff --git a/tests/tools/test_process_registry.py b/tests/tools/test_process_registry.py index e6cfa40e77..44e3a1bd32 100644 --- a/tests/tools/test_process_registry.py +++ b/tests/tools/test_process_registry.py @@ -2,6 +2,9 @@ import json import os +import signal +import subprocess +import sys import time import pytest from pathlib import Path @@ -45,6 +48,23 @@ def _make_session( return s +def _spawn_python_sleep(seconds: float) -> subprocess.Popen: + """Spawn a portable short-lived Python sleep process.""" + return subprocess.Popen( + [sys.executable, "-c", f"import time; time.sleep({seconds})"], + ) + + +def _wait_until(predicate, timeout: float = 5.0, interval: float = 0.05) -> bool: + """Poll a predicate until it returns truthy or the timeout elapses.""" + deadline = time.monotonic() + timeout + while time.monotonic() < deadline: + if predicate(): + return True + time.sleep(interval) + return False + + # ========================================================================= # Get / Poll # ========================================================================= @@ -349,6 
+369,88 @@ class TestCheckpoint: assert recovered == 1 assert len(registry.pending_watchers) == 0 + def test_recovery_keeps_live_checkpoint_entries(self, registry, tmp_path): + checkpoint = tmp_path / "procs.json" + checkpoint.write_text(json.dumps([{ + "session_id": "proc_live", + "command": "sleep 999", + "pid": os.getpid(), + "task_id": "t1", + "session_key": "sk1", + }])) + + with patch("tools.process_registry.CHECKPOINT_PATH", checkpoint): + recovered = registry.recover_from_checkpoint() + assert recovered == 1 + assert registry.get("proc_live") is not None + + data = json.loads(checkpoint.read_text()) + assert len(data) == 1 + assert data[0]["session_id"] == "proc_live" + assert data[0]["pid"] == os.getpid() + assert data != [] + + def test_recovery_skips_explicit_sandbox_backed_entries(self, registry, tmp_path): + checkpoint = tmp_path / "procs.json" + original = [{ + "session_id": "proc_remote", + "command": "sleep 999", + "pid": os.getpid(), + "task_id": "t1", + "pid_scope": "sandbox", + }] + checkpoint.write_text(json.dumps(original)) + + with patch("tools.process_registry.CHECKPOINT_PATH", checkpoint): + recovered = registry.recover_from_checkpoint() + assert recovered == 0 + assert registry.get("proc_remote") is None + + data = json.loads(checkpoint.read_text()) + assert data == [] + + def test_detached_recovered_process_eventually_exits(self, registry, tmp_path): + proc = _spawn_python_sleep(0.4) + checkpoint = tmp_path / "procs.json" + checkpoint.write_text(json.dumps([{ + "session_id": "proc_live", + "command": "python -c 'import time; time.sleep(0.4)'", + "pid": proc.pid, + "task_id": "t1", + "session_key": "sk1", + }])) + + try: + with patch("tools.process_registry.CHECKPOINT_PATH", checkpoint): + recovered = registry.recover_from_checkpoint() + assert recovered == 1 + + session = registry.get("proc_live") + assert session is not None + assert session.detached is True + + proc.wait(timeout=5) + + assert _wait_until( + lambda: 
registry.get("proc_live") is not None + and registry.get("proc_live").exited, + timeout=5, + ) + + poll_result = registry.poll("proc_live") + assert poll_result["status"] == "exited" + + wait_result = registry.wait("proc_live", timeout=1) + assert wait_result["status"] == "exited" + finally: + if proc.poll() is None: + proc.terminate() + try: + proc.wait(timeout=5) + except Exception: + proc.kill() + proc.wait(timeout=5) + # ========================================================================= # Kill process @@ -365,6 +467,27 @@ class TestKillProcess: result = registry.kill_process(s.id) assert result["status"] == "already_exited" + def test_kill_detached_session_uses_host_pid(self, registry): + s = _make_session(sid="proc_detached", command="sleep 999") + s.pid = 424242 + s.detached = True + registry._running[s.id] = s + + calls = [] + + def fake_kill(pid, sig): + calls.append((pid, sig)) + + try: + with patch("tools.process_registry.os.kill", side_effect=fake_kill): + result = registry.kill_process(s.id) + + assert result["status"] == "killed" + assert (424242, 0) in calls + assert (424242, signal.SIGTERM) in calls + finally: + registry._running.pop(s.id, None) + # ========================================================================= # Tool handler diff --git a/tests/tools/test_tool_result_storage.py b/tests/tools/test_tool_result_storage.py index 96b904a576..4e51fe7bb7 100644 --- a/tests/tools/test_tool_result_storage.py +++ b/tests/tools/test_tool_result_storage.py @@ -395,7 +395,7 @@ class TestEnforceTurnBudget: assert PERSISTED_OUTPUT_TAG in msgs[1]["content"] def test_medium_result_regression(self): - """6 results of 42K chars each (252K total) — each under 50K default + """6 results of 42K chars each (252K total) — each under 100K default threshold but aggregate exceeds 200K budget. 
L3 should persist.""" env = MagicMock() env.execute.return_value = {"output": "", "returncode": 0} @@ -449,7 +449,7 @@ class TestPerToolThresholds: try: import tools.terminal_tool # noqa: F401 val = registry.get_max_result_size("terminal") - assert val == 30_000 + assert val == 100_000 except ImportError: pytest.skip("terminal_tool not importable in test env") @@ -467,6 +467,6 @@ class TestPerToolThresholds: try: import tools.file_tools # noqa: F401 val = registry.get_max_result_size("search_files") - assert val == 20_000 + assert val == 100_000 except ImportError: pytest.skip("file_tools not importable in test env") diff --git a/tools/browser_camofox.py b/tools/browser_camofox.py index 226e99b56b..3a305bbcb1 100644 --- a/tools/browser_camofox.py +++ b/tools/browser_camofox.py @@ -101,7 +101,8 @@ def _managed_persistence_enabled() -> bool: """ try: camofox_cfg = load_config().get("browser", {}).get("camofox", {}) - except Exception: + except Exception as exc: + logger.warning("managed_persistence check failed, defaulting to disabled: %s", exc) return False return bool(camofox_cfg.get("managed_persistence")) @@ -172,6 +173,22 @@ def _drop_session(task_id: Optional[str]) -> Optional[Dict[str, Any]]: return _sessions.pop(task_id, None) +def camofox_soft_cleanup(task_id: Optional[str] = None) -> bool: + """Release the in-memory session without destroying the server-side context. + + When managed persistence is enabled the browser profile (and its cookies) + must survive across agent tasks. This helper drops only the local tracking + entry and returns ``True``. When managed persistence is *not* enabled it + does nothing and returns ``False`` so the caller can fall back to + :func:`camofox_close`. 
+ """ + if _managed_persistence_enabled(): + _drop_session(task_id) + logger.debug("Camofox soft cleanup for task %s (managed persistence)", task_id) + return True + return False + + # --------------------------------------------------------------------------- # HTTP helpers # --------------------------------------------------------------------------- diff --git a/tools/browser_tool.py b/tools/browser_tool.py index 7e52ed78d9..e62a586c11 100644 --- a/tools/browser_tool.py +++ b/tools/browser_tool.py @@ -877,7 +877,11 @@ def _run_browser_command( # Local mode — launch a headless Chromium instance backend_args = ["--session", session_info["session_name"]] - cmd_parts = browser_cmd.split() + backend_args + [ + # Keep concrete executable paths intact, even when they contain spaces. + # Only the synthetic npx fallback needs to expand into multiple argv items. + cmd_prefix = ["npx", "agent-browser"] if browser_cmd == "npx agent-browser" else [browser_cmd] + + cmd_parts = cmd_prefix + backend_args + [ "--json", command ] + args @@ -1931,11 +1935,15 @@ def cleanup_browser(task_id: Optional[str] = None) -> None: if task_id is None: task_id = "default" - # Also clean up Camofox session if running in Camofox mode + # Also clean up Camofox session if running in Camofox mode. + # Skip full close when managed persistence is enabled — the browser + # profile (and its session cookies) must survive across agent tasks. + # The inactivity reaper still frees idle resources. 
if _is_camofox_mode(): try: - from tools.browser_camofox import camofox_close - camofox_close(task_id) + from tools.browser_camofox import camofox_close, camofox_soft_cleanup + if not camofox_soft_cleanup(task_id): + camofox_close(task_id) except Exception as e: logger.debug("Camofox cleanup for task %s: %s", task_id, e) diff --git a/tools/budget_config.py b/tools/budget_config.py index 52204cdf8e..577e59442e 100644 --- a/tools/budget_config.py +++ b/tools/budget_config.py @@ -15,9 +15,9 @@ PINNED_THRESHOLDS: Dict[str, float] = { # Defaults matching the current hardcoded values in tool_result_storage.py. # Kept here as the single source of truth; tool_result_storage.py imports these. -DEFAULT_RESULT_SIZE_CHARS: int = 50_000 +DEFAULT_RESULT_SIZE_CHARS: int = 100_000 DEFAULT_TURN_BUDGET_CHARS: int = 200_000 -DEFAULT_PREVIEW_SIZE_CHARS: int = 2_000 +DEFAULT_PREVIEW_SIZE_CHARS: int = 1_500 @dataclass(frozen=True) diff --git a/tools/code_execution_tool.py b/tools/code_execution_tool.py index 08cbf15b1f..f0d61210ff 100644 --- a/tools/code_execution_tool.py +++ b/tools/code_execution_tool.py @@ -1343,5 +1343,5 @@ registry.register( enabled_tools=kw.get("enabled_tools")), check_fn=check_sandbox_requirements, emoji="🐍", - max_result_size_chars=30_000, + max_result_size_chars=100_000, ) diff --git a/tools/cronjob_tools.py b/tools/cronjob_tools.py index 82d43c588b..595ad8bc71 100644 --- a/tools/cronjob_tools.py +++ b/tools/cronjob_tools.py @@ -195,6 +195,7 @@ def _format_job(job: Dict[str, Any]) -> Dict[str, Any]: "next_run_at": job.get("next_run_at"), "last_run_at": job.get("last_run_at"), "last_status": job.get("last_status"), + "last_delivery_error": job.get("last_delivery_error"), "enabled": job.get("enabled", True), "state": job.get("state", "scheduled" if job.get("enabled", True) else "paused"), "paused_at": job.get("paused_at"), diff --git a/tools/file_tools.py b/tools/file_tools.py index 4ca10b2dcf..05376dfc67 100644 --- a/tools/file_tools.py +++ b/tools/file_tools.py 
@@ -856,4 +856,4 @@ def _handle_search_files(args, **kw): registry.register(name="read_file", toolset="file", schema=READ_FILE_SCHEMA, handler=_handle_read_file, check_fn=_check_file_reqs, emoji="📖", max_result_size_chars=float('inf')) registry.register(name="write_file", toolset="file", schema=WRITE_FILE_SCHEMA, handler=_handle_write_file, check_fn=_check_file_reqs, emoji="✍️", max_result_size_chars=100_000) registry.register(name="patch", toolset="file", schema=PATCH_SCHEMA, handler=_handle_patch, check_fn=_check_file_reqs, emoji="🔧", max_result_size_chars=100_000) -registry.register(name="search_files", toolset="file", schema=SEARCH_FILES_SCHEMA, handler=_handle_search_files, check_fn=_check_file_reqs, emoji="🔎", max_result_size_chars=20_000) +registry.register(name="search_files", toolset="file", schema=SEARCH_FILES_SCHEMA, handler=_handle_search_files, check_fn=_check_file_reqs, emoji="🔎", max_result_size_chars=100_000) diff --git a/tools/process_registry.py b/tools/process_registry.py index 948f073abb..b935f49c33 100644 --- a/tools/process_registry.py +++ b/tools/process_registry.py @@ -76,6 +76,7 @@ class ProcessSession: output_buffer: str = "" # Rolling output (last MAX_OUTPUT_CHARS) max_output_chars: int = MAX_OUTPUT_CHARS detached: bool = False # True if recovered from crash (no pipe) + pid_scope: str = "host" # "host" for local/PTY PIDs, "sandbox" for env-local PIDs # Watcher/notification metadata (persisted for crash recovery) watcher_platform: str = "" watcher_chat_id: str = "" @@ -127,6 +128,48 @@ class ProcessRegistry: lines.pop(0) return "\n".join(lines) + @staticmethod + def _is_host_pid_alive(pid: Optional[int]) -> bool: + """Best-effort liveness check for host-visible PIDs.""" + if not pid: + return False + try: + os.kill(pid, 0) + return True + except (ProcessLookupError, PermissionError): + return False + + def _refresh_detached_session(self, session: Optional[ProcessSession]) -> Optional[ProcessSession]: + """Update recovered host-PID sessions 
when the underlying process has exited.""" + if session is None or session.exited or not session.detached or session.pid_scope != "host": + return session + + if self._is_host_pid_alive(session.pid): + return session + + with session._lock: + if session.exited: + return session + session.exited = True + # Recovered sessions no longer have a waitable handle, so the real + # exit code is unavailable once the original process object is gone. + session.exit_code = None + + self._move_to_finished(session) + return session + + @staticmethod + def _terminate_host_pid(pid: int) -> None: + """Terminate a host-visible PID without requiring the original process handle.""" + if _IS_WINDOWS: + os.kill(pid, signal.SIGTERM) + return + + try: + os.killpg(os.getpgid(pid), signal.SIGTERM) + except (OSError, ProcessLookupError, PermissionError): + os.kill(pid, signal.SIGTERM) + # ----- Spawn ----- def spawn_local( @@ -269,6 +312,7 @@ class ProcessRegistry: cwd=cwd, started_at=time.time(), env_ref=env, + pid_scope="sandbox", ) # Run the command in the sandbox with output capture @@ -439,7 +483,8 @@ class ProcessRegistry: def get(self, session_id: str) -> Optional[ProcessSession]: """Get a session by ID (running or finished).""" with self._lock: - return self._running.get(session_id) or self._finished.get(session_id) + session = self._running.get(session_id) or self._finished.get(session_id) + return self._refresh_detached_session(session) def poll(self, session_id: str) -> dict: """Check status and get new output for a background process.""" @@ -531,6 +576,7 @@ class ProcessRegistry: deadline = time.monotonic() + effective_timeout while time.monotonic() < deadline: + session = self._refresh_detached_session(session) if session.exited: result = { "status": "exited", @@ -596,6 +642,25 @@ class ProcessRegistry: elif session.env_ref and session.pid: # Non-local -- kill inside sandbox session.env_ref.execute(f"kill {session.pid} 2>/dev/null", timeout=5) + elif session.detached and 
session.pid_scope == "host" and session.pid: + if not self._is_host_pid_alive(session.pid): + with session._lock: + session.exited = True + session.exit_code = None + self._move_to_finished(session) + return { + "status": "already_exited", + "exit_code": session.exit_code, + } + self._terminate_host_pid(session.pid) + else: + return { + "status": "error", + "error": ( + "Recovered process cannot be killed after restart because " + "its original runtime handle is no longer available" + ), + } session.exited = True session.exit_code = -15 # SIGTERM self._move_to_finished(session) @@ -640,6 +705,8 @@ class ProcessRegistry: with self._lock: all_sessions = list(self._running.values()) + list(self._finished.values()) + all_sessions = [self._refresh_detached_session(s) for s in all_sessions] + if task_id: all_sessions = [s for s in all_sessions if s.task_id == task_id] @@ -666,6 +733,12 @@ class ProcessRegistry: def has_active_processes(self, task_id: str) -> bool: """Check if there are active (running) processes for a task_id.""" + with self._lock: + sessions = list(self._running.values()) + + for session in sessions: + self._refresh_detached_session(session) + with self._lock: return any( s.task_id == task_id and not s.exited @@ -674,6 +747,12 @@ class ProcessRegistry: def has_active_for_session(self, session_key: str) -> bool: """Check if there are active processes for a gateway session key.""" + with self._lock: + sessions = list(self._running.values()) + + for session in sessions: + self._refresh_detached_session(session) + with self._lock: return any( s.session_key == session_key and not s.exited @@ -727,6 +806,7 @@ class ProcessRegistry: "session_id": s.id, "command": s.command, "pid": s.pid, + "pid_scope": s.pid_scope, "cwd": s.cwd, "started_at": s.started_at, "task_id": s.task_id, @@ -764,13 +844,21 @@ class ProcessRegistry: if not pid: continue + pid_scope = entry.get("pid_scope", "host") + if pid_scope != "host": + # Sandbox-backed processes keep only 
in-sandbox PIDs in the + # checkpoint, which are not meaningful to the restarted host + # process once the original environment handle is gone. + logger.info( + "Skipping recovery for non-host process: %s (pid=%s, scope=%s)", + entry.get("command", "unknown")[:60], + pid, + pid_scope, + ) + continue + # Check if PID is still alive - alive = False - try: - os.kill(pid, 0) - alive = True - except (ProcessLookupError, PermissionError): - pass + alive = self._is_host_pid_alive(pid) if alive: session = ProcessSession( @@ -779,6 +867,7 @@ class ProcessRegistry: task_id=entry.get("task_id", ""), session_key=entry.get("session_key", ""), pid=pid, + pid_scope=pid_scope, cwd=entry.get("cwd"), started_at=entry.get("started_at", time.time()), detached=True, # Can't read output, but can report status + kill @@ -802,14 +891,10 @@ class ProcessRegistry: "platform": session.watcher_platform, "chat_id": session.watcher_chat_id, "thread_id": session.watcher_thread_id, + "notify_on_complete": session.notify_on_complete, }) - # Clear the checkpoint (will be rewritten as processes finish) - try: - from utils import atomic_json_write - atomic_json_write(CHECKPOINT_PATH, []) - except Exception as e: - logger.debug("Could not clear checkpoint file: %s", e, exc_info=True) + self._write_checkpoint() return recovered diff --git a/tools/terminal_tool.py b/tools/terminal_tool.py index 520de31998..243127a295 100644 --- a/tools/terminal_tool.py +++ b/tools/terminal_tool.py @@ -1620,5 +1620,5 @@ registry.register( handler=_handle_terminal, check_fn=check_terminal_requirements, emoji="💻", - max_result_size_chars=30_000, + max_result_size_chars=100_000, ) diff --git a/trajectory_compressor.py b/trajectory_compressor.py index e4faf97a3d..24c1f722af 100644 --- a/trajectory_compressor.py +++ b/trajectory_compressor.py @@ -44,6 +44,7 @@ import fire from rich.progress import Progress, SpinnerColumn, TextColumn, BarColumn, TaskProgressColumn, TimeElapsedColumn, TimeRemainingColumn from rich.console 
import Console from hermes_constants import OPENROUTER_BASE_URL +from agent.retry_utils import jittered_backoff # Load environment variables from dotenv import load_dotenv @@ -585,7 +586,7 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix.""" self.logger.warning(f"Summarization attempt {attempt + 1} failed: {e}") if attempt < self.config.max_retries - 1: - time.sleep(self.config.retry_delay * (attempt + 1)) + time.sleep(jittered_backoff(attempt + 1, base_delay=self.config.retry_delay, max_delay=30.0)) else: # Fallback: create a basic summary return "[CONTEXT SUMMARY]: [Summary generation failed - previous turns contained tool calls and responses that have been compressed to save context space.]" @@ -647,7 +648,7 @@ Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix.""" self.logger.warning(f"Summarization attempt {attempt + 1} failed: {e}") if attempt < self.config.max_retries - 1: - await asyncio.sleep(self.config.retry_delay * (attempt + 1)) + await asyncio.sleep(jittered_backoff(attempt + 1, base_delay=self.config.retry_delay, max_delay=30.0)) else: # Fallback: create a basic summary return "[CONTEXT SUMMARY]: [Summary generation failed - previous turns contained tool calls and responses that have been compressed to save context space.]" diff --git a/website/docs/user-guide/features/skins.md b/website/docs/user-guide/features/skins.md index 5aec20cdf1..e093a763b5 100644 --- a/website/docs/user-guide/features/skins.md +++ b/website/docs/user-guide/features/skins.md @@ -196,6 +196,55 @@ branding: tool_prefix: "▏" ``` +## Hermes Mod — Visual Skin Editor + +[Hermes Mod](https://github.com/cocktailpeanut/hermes-mod) is a community-built web UI for creating and managing skins visually. Instead of writing YAML by hand, you get a point-and-click editor with live preview. 
+ +![Hermes Mod skin editor](https://raw.githubusercontent.com/cocktailpeanut/hermes-mod/master/nous.png) + +**What it does:** + +- Lists all built-in and custom skins +- Opens any skin into a visual editor with all Hermes skin fields (colors, spinner, branding, tool prefix, tool emojis) +- Generates `banner_logo` text art from a text prompt +- Converts uploaded images (PNG, JPG, GIF, WEBP) into `banner_hero` ASCII art with multiple render styles (braille, ASCII ramp, blocks, dots) +- Saves directly to `~/.hermes/skins/` +- Activates a skin by updating `~/.hermes/config.yaml` +- Shows the generated YAML and a live preview + +### Install + +**Option 1 — Pinokio (1-click):** + +Find it on [pinokio.computer](https://pinokio.computer) and install with one click. + +**Option 2 — npx (quickest from terminal):** + +```bash +npx -y hermes-mod +``` + +**Option 3 — Manual:** + +```bash +git clone https://github.com/cocktailpeanut/hermes-mod.git +cd hermes-mod/app +npm install +npm start +``` + +### Usage + +1. Start the app (via Pinokio or terminal). +2. Open **Skin Studio**. +3. Choose a built-in or custom skin to edit. +4. Generate a logo from text and/or upload an image for hero art. Pick a render style and width. +5. Edit colors, spinner, branding, and other fields. +6. Click **Save** to write the skin YAML to `~/.hermes/skins/`. +7. Click **Activate** to set it as the current skin (updates `display.skin` in `config.yaml`). + +Hermes Mod respects the `HERMES_HOME` environment variable, so it works with [profiles](/docs/user-guide/profiles) too. + ## Operational notes - Built-in skins load from `hermes_cli/skin_engine.py`. diff --git a/website/docs/user-guide/messaging/telegram.md b/website/docs/user-guide/messaging/telegram.md index a59b73ca5a..4e4495ad28 100644 --- a/website/docs/user-guide/messaging/telegram.md +++ b/website/docs/user-guide/messaging/telegram.md @@ -463,6 +463,40 @@ platforms: You usually don't need to configure this manually. 
The auto-discovery via DoH handles most restricted-network scenarios. The `TELEGRAM_FALLBACK_IPS` env var is only needed if DoH is also blocked on your network. ::: +## Proxy Support + +If your network requires an HTTP proxy to reach the internet (common in corporate environments), the Telegram adapter automatically reads standard proxy environment variables and routes all connections through the proxy. + +### Supported variables + +The adapter checks these environment variables in order, using the first one that is set: + +1. `HTTPS_PROXY` +2. `HTTP_PROXY` +3. `ALL_PROXY` +4. `https_proxy` / `http_proxy` / `all_proxy` (lowercase variants) + +### Configuration + +Set the proxy in your environment before starting the gateway: + +```bash +export HTTPS_PROXY=http://proxy.example.com:8080 +hermes gateway +``` + +Or add it to `~/.hermes/.env`: + +```bash +HTTPS_PROXY=http://proxy.example.com:8080 +``` + +The proxy applies to both the primary transport and all fallback IP transports. No additional Hermes configuration is needed — if the environment variable is set, it's used automatically. + +:::note +This covers the custom fallback transport layer that Hermes uses for Telegram connections. The standard `httpx` client used elsewhere already respects proxy env vars natively. +::: + ## Message Reactions The bot can add emoji reactions to messages as visual processing feedback: