Mirror of https://github.com/NousResearch/hermes-agent.git (synced 2026-05-02 08:47:26 +08:00)
Salvaged from PR #3816 by 0xbyt4. Stripped unrelated changes (telegram thread retry, cache logging in quiet_mode), preserved existing beta headers (interleaved-thinking, fine-grained-tool-streaming), and rebased onto current main.

New computer_use toolset:
- Screenshot capture via macOS native screencapture + sips
- Mouse: click, double/triple/right/middle click, drag, move
- Keyboard: type text (clipboard paste for Unicode), key combos
- Zoom for inspecting small screen regions at full resolution
- Auto-screenshot after destructive actions (saves API round-trips)

Architecture:
- Dual-schema: stub (OpenAI format) for dispatch + native (computer_20251124) injected into Anthropic API calls
- Provider gating: stripped from non-Anthropic providers at init
- Beta API routing: messages.create → beta.messages.create when native tools are present (both streaming and non-streaming)
- Multimodal results: _anthropic_content_blocks on tool messages; content stays a string for session DB / trajectory compatibility

Token optimization:
- Server-side context editing (context-management-2025-06-27 beta)
- Client-side screenshot-aware pruning in the context compressor
- Image eviction: keeps only the 3 most recent screenshots
- Image-aware token estimation (flat 1500 tokens per image)

Safety:
- Hard-blocked key combos (empty trash, force delete, lock screen)
- Blocked type patterns (curl|bash, sudo -S -p '' rm -rf, privilege escalation)
- Anti-injection system prompt guidance
- Approval callback wired (disabled during beta)

Includes: 102 tests, a 657-line macOS workflow skill (auto-loaded), a feature docs page, and reference catalog updates.
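The dual-schema and provider-gating items above can be sketched as follows. This is a minimal illustration, not the PR's code: the stub name `computer`, the helpers `gate_tools` and `to_anthropic_tools`, and the `display_width_px`/`display_height_px` fields on the native tool are assumptions; only the `computer_20251124` type string comes from the commit message.

```python
from typing import Any

# Hypothetical OpenAI-format stub: only the local dispatcher sees this shape.
COMPUTER_STUB: dict[str, Any] = {
    "type": "function",
    "function": {
        "name": "computer",
        "description": "Control the macOS desktop: screenshots, mouse, keyboard.",
        "parameters": {
            "type": "object",
            "properties": {"action": {"type": "string"}},
            "required": ["action"],
        },
    },
}


def gate_tools(tools: list[dict[str, Any]], provider: str) -> list[dict[str, Any]]:
    """Drop the computer-use stub for any provider other than Anthropic."""
    if provider == "anthropic":
        return list(tools)
    return [t for t in tools if t["function"]["name"] != "computer"]


def to_anthropic_tools(
    tools: list[dict[str, Any]], width: int, height: int
) -> list[dict[str, Any]]:
    """Convert stubs to Anthropic tool definitions, swapping in the native schema."""
    converted = []
    for tool in tools:
        fn = tool["function"]
        if fn["name"] == "computer":
            # Assumption: field names follow Anthropic's computer-use tool docs.
            converted.append({
                "type": "computer_20251124",
                "name": "computer",
                "display_width_px": width,
                "display_height_px": height,
            })
        else:
            # Ordinary tools map to Anthropic's name/description/input_schema shape.
            converted.append({
                "name": fn["name"],
                "description": fn.get("description", ""),
                "input_schema": fn["parameters"],
            })
    return converted
```

Keeping the stub in OpenAI format lets the dispatcher treat computer use like any other tool, while the native definition is substituted only when the Anthropic request is built.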
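The beta API routing rule above, as a minimal sketch assuming the Anthropic Python SDK, where `client.beta.messages.create` accepts a `betas` list. `create_message` and the `SimpleNamespace` stand-in client are hypothetical; extra keyword arguments (for example `stream=True`) are forwarded unchanged, so one check could serve both streaming and non-streaming calls.

```python
from types import SimpleNamespace
from typing import Any

# Native tool type named in the commit message.
NATIVE_TOOL_TYPES = {"computer_20251124"}


def has_native_tools(tools: list[dict[str, Any]]) -> bool:
    return any(t.get("type") in NATIVE_TOOL_TYPES for t in tools)


def create_message(
    client: Any, *, tools: list[dict[str, Any]], betas: list[str], **kwargs: Any
) -> Any:
    """Use the beta endpoint only when a native computer-use tool is present."""
    if has_native_tools(tools):
        return client.beta.messages.create(tools=tools, betas=betas, **kwargs)
    return client.messages.create(tools=tools, **kwargs)


# Stand-in client for illustration; a real one would be anthropic.Anthropic().
fake = SimpleNamespace(
    messages=SimpleNamespace(create=lambda **kw: ("stable", kw)),
    beta=SimpleNamespace(messages=SimpleNamespace(create=lambda **kw: ("beta", kw))),
)
```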
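One way the multimodal-result split could look, assuming tool messages use an OpenAI-style `role: tool` shape. Only the `_anthropic_content_blocks` key comes from the commit message; `make_tool_message` and `to_anthropic_tool_result` are hypothetical helpers. `content` stays a plain string for anything that persists messages, while the Anthropic adapter prefers the richer block list when it exists.

```python
import base64
from typing import Any, Optional


def make_tool_message(
    tool_call_id: str, summary: str, png_bytes: Optional[bytes] = None
) -> dict[str, Any]:
    """Build a tool message whose `content` is always a plain string."""
    message: dict[str, Any] = {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "content": summary,  # string: safe for session DB / trajectory logs
    }
    if png_bytes is not None:
        # Side-channel blocks consumed only by the Anthropic adapter.
        message["_anthropic_content_blocks"] = [
            {"type": "text", "text": summary},
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                },
            },
        ]
    return message


def to_anthropic_tool_result(message: dict[str, Any]) -> dict[str, Any]:
    """Prefer the multimodal blocks when present, else fall back to the text."""
    return {
        "type": "tool_result",
        "tool_use_id": message["tool_call_id"],
        "content": message.get("_anthropic_content_blocks", message["content"]),
    }
```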
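The image-eviction rule above (keep only the 3 most recent screenshots) as a minimal sketch. It assumes Anthropic-style content blocks in which screenshots sit inside `tool_result` blocks; `evict_old_images` and the placeholder text are hypothetical.

```python
from copy import deepcopy
from typing import Any

PLACEHOLDER_TEXT = "[older screenshot removed to save context]"


def evict_old_images(messages: list[dict[str, Any]], keep: int = 3) -> list[dict[str, Any]]:
    """Return a copy of `messages` keeping only the `keep` most recent images."""
    result = deepcopy(messages)  # never mutate the caller's history
    seen = 0

    def scrub(blocks: list[dict[str, Any]]) -> None:
        nonlocal seen
        # Walk newest-to-oldest so the most recent images are counted first.
        for i in range(len(blocks) - 1, -1, -1):
            block = blocks[i]
            if block.get("type") == "image":
                seen += 1
                if seen > keep:
                    blocks[i] = {"type": "text", "text": PLACEHOLDER_TEXT}
            elif block.get("type") == "tool_result" and isinstance(block.get("content"), list):
                scrub(block["content"])

    for message in reversed(result):
        if isinstance(message.get("content"), list):
            scrub(message["content"])
    return result
```

Replacing old images with a short text block, rather than deleting the block, keeps each `tool_result` non-empty and tells the model that a screenshot used to be there.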
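A sketch of image-aware token estimation using the flat 1500-token per-image cost from the commit message. The 4-characters-per-token text heuristic and the `estimate_tokens` name are assumptions for illustration.

```python
from typing import Any

IMAGE_TOKENS = 1500   # flat per-image cost, as stated in the commit message
CHARS_PER_TOKEN = 4   # assumption: rough heuristic for text


def _estimate_content(content: Any) -> int:
    """Estimate tokens for a string or a list of content blocks."""
    if isinstance(content, str):
        return len(content) // CHARS_PER_TOKEN
    tokens = 0
    for block in content or []:
        kind = block.get("type")
        if kind == "image":
            tokens += IMAGE_TOKENS  # the base64 payload size is ignored entirely
        elif kind == "text":
            tokens += len(block.get("text", "")) // CHARS_PER_TOKEN
        elif kind == "tool_result":
            tokens += _estimate_content(block.get("content", ""))
        # Other block types are ignored in this sketch.
    return tokens


def estimate_tokens(messages: list[dict[str, Any]]) -> int:
    return sum(_estimate_content(m.get("content", "")) for m in messages)
```

Charging a constant per image matters because the base64 payload would otherwise be measured as text: a 1,000,000-character payload at 4 characters per token would look like roughly 250,000 tokens.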
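An illustrative version of the key-combo and typed-text blocklists. The specific macOS shortcuts, key aliases, and regexes below are assumptions chosen to match the categories the commit message names (empty trash, force delete, lock screen, curl|bash, sudo, rm -rf); the PR's actual lists may differ.

```python
import re

# Illustrative macOS shortcuts (assumption: the real blocklist may differ).
BLOCKED_COMBOS = {
    frozenset({"cmd", "shift", "delete"}),         # Empty Trash
    frozenset({"cmd", "alt", "shift", "delete"}),  # Empty Trash, no confirmation
    frozenset({"cmd", "alt", "delete"}),           # Delete Immediately (skip Trash)
    frozenset({"ctrl", "cmd", "q"}),               # Lock Screen
}
ALIASES = {"command": "cmd", "option": "alt", "opt": "alt",
           "control": "ctrl", "backspace": "delete"}

BLOCKED_TYPE_PATTERNS = [
    re.compile(r"\bcurl\b[^|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b"),  # curl ... | bash
    re.compile(r"\bsudo\b"),                                      # privilege escalation
    re.compile(r"\brm\s+-(?=[a-z]*r)(?=[a-z]*f)[a-z]+", re.IGNORECASE),  # rm -rf / -Rf
]


def normalize_combo(combo: str) -> frozenset[str]:
    """'Command+Shift+Backspace' -> {'cmd', 'shift', 'delete'} (order-insensitive)."""
    keys = (k.strip().lower() for k in combo.split("+"))
    return frozenset(ALIASES.get(k, k) for k in keys if k)


def is_blocked_combo(combo: str) -> bool:
    return normalize_combo(combo) in BLOCKED_COMBOS


def is_blocked_text(text: str) -> bool:
    return any(p.search(text) for p in BLOCKED_TYPE_PATTERNS)
```

Normalizing combos to a set of canonical key names means spelling and ordering variants of the same shortcut all hit the same blocklist entry.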