hermes-agent/website/docs/reference
Teknium 8e3803f3ce feat: Computer Use Tool — macOS desktop control via Anthropic native API
Salvaged from PR #3816 by 0xbyt4. Stripped unrelated changes (telegram
thread retry, cache logging in quiet_mode), preserved existing beta
headers (interleaved-thinking, fine-grained-tool-streaming), and
rebased onto current main.

New computer_use toolset:
- Screenshot capture via macOS native screencapture + sips
- Mouse: click, double/triple/right/middle click, drag, move
- Keyboard: type text (clipboard paste for Unicode), key combos
- Zoom for inspecting small screen regions at full resolution
- Auto-screenshot after destructive actions (saves API round-trips)
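
A minimal sketch of the capture step listed above, assuming the screenshot is taken with `screencapture` and then downscaled in place with `sips`. The function name `build_capture_commands` and the 1280-pixel default are hypothetical and not taken from the commit. The function only builds argument lists, so it can be inspected on any platform.

```python
from typing import List


def build_capture_commands(path: str, max_dim: int = 1280) -> List[List[str]]:
    """Build argv lists for capturing and then downscaling a screenshot.

    Hypothetical helper: nothing is executed here, the caller decides
    whether to run the commands (macOS only).
    """
    # screencapture -x: capture silently; -t png: write PNG output.
    capture = ["screencapture", "-x", "-t", "png", path]
    # sips -Z N: resample in place so neither side exceeds N pixels,
    # preserving the aspect ratio.
    resize = ["sips", "-Z", str(max_dim), path]
    return [capture, resize]
```

On macOS each list could be passed to `subprocess.run(cmd, check=True)` in order.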

Architecture:
- Dual-schema: stub (OpenAI format) for dispatch + native
  (computer_20251124) injected into Anthropic API calls
- Provider gating: stripped from non-Anthropic providers at init
- Beta API routing: messages.create → beta.messages.create when
  native tools present (both streaming and non-streaming)
- Multimodal results: _anthropic_content_blocks on tool messages,
  content stays string for session DB / trajectory compatibility

Token optimization:
- Server-side context editing (context-management-2025-06-27 beta)
- Client-side screenshot-aware pruning in context compressor
- Image eviction: keeps only 3 most recent screenshots
- Image-aware token estimation (flat 1500 tokens per image)

Safety:
- Hard-blocked key combos (empty trash, force delete, lock screen)
- Blocked type patterns (curl|bash, sudo -S -p '' rm -rf, privilege escalation)
- Anti-injection system prompt guidance
- Approval callback wired (disabled during beta)

Includes: 102 tests, 657-line macOS workflow skill (auto-loaded),
feature docs page, reference catalog updates.
2026-04-02 01:59:32 -07:00