feat: Computer Use Tool — macOS desktop control via Anthropic native API

Salvaged from PR #3816 by 0xbyt4. Stripped unrelated changes (Telegram
thread retry, cache logging in quiet_mode), preserved existing beta
headers (interleaved-thinking, fine-grained-tool-streaming), and
rebased onto current main.

New computer_use toolset:
- Screenshot capture via macOS native screencapture + sips
- Mouse: click, double/triple/right/middle click, drag, move, scroll
- Keyboard: type text (clipboard paste for Unicode), key combos
- Zoom for inspecting small screen regions at full resolution
- Auto-screenshot after destructive actions (saves API round-trips)
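
The capture path above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: the helper names and the 1280 px cap are assumptions, and only the screencapture and sips flags shown (-x, -t, -Z) are standard macOS options.

```python
import subprocess
import sys
from pathlib import Path


def screencapture_cmd(out_path: str) -> list[str]:
    # -x suppresses the shutter sound, -t selects the image format.
    return ["screencapture", "-x", "-t", "png", out_path]


def sips_resize_cmd(path: str, max_dim: int) -> list[str]:
    # -Z resamples in place so that neither side exceeds max_dim,
    # preserving the aspect ratio.
    return ["sips", "-Z", str(max_dim), path]


def capture_screenshot(out_path: str, max_dim: int = 1280) -> Path:
    """Capture the screen, then downscale it (hypothetical helper)."""
    if sys.platform != "darwin":
        raise RuntimeError("screencapture and sips are macOS-only")
    subprocess.run(screencapture_cmd(out_path), check=True)
    subprocess.run(sips_resize_cmd(out_path, max_dim), check=True,
                   capture_output=True)
    return Path(out_path)
```

Keeping the command construction separate from the subprocess calls makes the flags easy to unit-test on machines that are not running macOS.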

Architecture:
- Dual-schema: a stub in OpenAI function format for dispatch, plus the
  native tool (computer_20251124) injected into Anthropic API calls
- Provider gating: stripped from non-Anthropic providers at init
- Beta API routing: messages.create → beta.messages.create when
  native tools are present (both streaming and non-streaming)
- Multimodal results: _anthropic_content_blocks on tool messages,
  content stays string for session DB / trajectory compatibility
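
A rough sketch of the gating and routing described above, under stated assumptions: the function names, the "anthropic" provider string, and the stub layout are hypothetical. Only the computer_20251124 type and the messages.create / beta.messages.create split come from this commit message.

```python
from typing import Any

NATIVE_TOOL_TYPE = "computer_20251124"  # tool type named in this commit
STUB_NAME = "computer"                  # assumed name of the dispatch stub

Tool = dict[str, Any]


def is_stub(tool: Tool) -> bool:
    # OpenAI-format stub: {"type": "function", "function": {"name": ...}}
    return tool.get("function", {}).get("name") == STUB_NAME


def strip_for_provider(tools: list[Tool], provider: str) -> list[Tool]:
    """Drop the computer stub for providers without native support."""
    if provider == "anthropic":
        return list(tools)
    return [t for t in tools if not is_stub(t)]


def inject_native(tools: list[Tool], width: int, height: int) -> list[Tool]:
    """Swap the stub for the native schema before an Anthropic call."""
    native = {
        "type": NATIVE_TOOL_TYPE,
        "name": STUB_NAME,
        "display_width_px": width,
        "display_height_px": height,
    }
    return [native if is_stub(t) else t for t in tools]


def endpoint_for(tools: list[Tool]) -> str:
    """Pick the beta endpoint only when a native tool is present."""
    has_native = any(t.get("type") == NATIVE_TOOL_TYPE for t in tools)
    return "beta.messages.create" if has_native else "messages.create"
```

With this split, the dispatcher only ever sees the stub, and the native schema exists only in the outgoing request payload.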

Token optimization:
- Server-side context editing (context-management-2025-06-27 beta)
- Client-side screenshot-aware pruning in context compressor
- Image eviction: keeps only 3 most recent screenshots
- Image-aware token estimation (flat 1500 tokens per image)
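
Image eviction and the flat per-image estimate could look something like the sketch below. The message layout, the placeholder text, and the 4-characters-per-token heuristic for text are assumptions. The limit of 3 images, the 1500-token figure, and the _anthropic_content_blocks key are taken from this commit message.

```python
from typing import Any

KEEP_IMAGES = 3      # most recent screenshots to keep
IMAGE_TOKENS = 1500  # flat estimate per image

Message = dict[str, Any]


def evict_old_images(messages: list[Message],
                     keep: int = KEEP_IMAGES) -> list[Message]:
    """Replace all but the `keep` newest image blocks with placeholders."""
    remaining = keep
    out: list[Message] = []
    # Walk newest to oldest so the most recent images are kept.
    for msg in reversed(messages):
        blocks = msg.get("_anthropic_content_blocks")
        if blocks:
            kept = []
            for block in reversed(blocks):
                if block.get("type") == "image" and remaining > 0:
                    remaining -= 1
                    kept.append(block)
                elif block.get("type") == "image":
                    kept.append({"type": "text",
                                 "text": "[screenshot evicted]"})
                else:
                    kept.append(block)
            # Copy the message instead of mutating the caller's history.
            msg = {**msg, "_anthropic_content_blocks": kept[::-1]}
        out.append(msg)
    return out[::-1]


def estimate_tokens(messages: list[Message]) -> int:
    """Text at roughly 4 chars per token (assumed), images at a flat cost."""
    total = 0
    for msg in messages:
        total += len(msg.get("content") or "") // 4
        for block in msg.get("_anthropic_content_blocks") or []:
            if block.get("type") == "image":
                total += IMAGE_TOKENS
    return total
```

Because content stays a plain string, the session DB and trajectory code never see the image blocks. Only the request builder and the estimator need to look at them.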

Safety:
- Hard-blocked key combos (empty trash, force delete, lock screen)
- Blocked type patterns (curl|bash, sudo -S -p '' rm -rf, privilege escalation)
- Anti-injection system prompt guidance
- Approval callback wired (disabled during beta)
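
For illustration, the hard blocks could be checked along these lines. The specific combos, aliases, and regexes here are assumptions chosen to match the categories listed above, not the actual blocklist shipped in this commit.

```python
import re

# Modifier spellings normalised to one canonical name (assumed aliases).
ALIASES = {"command": "cmd", "control": "ctrl", "alt": "option",
           "opt": "option"}

# Standard macOS shortcuts for the three categories named above.
BLOCKED_COMBOS = {
    frozenset({"cmd", "shift", "delete"}),   # Finder: empty trash
    frozenset({"cmd", "option", "delete"}),  # Finder: delete immediately
    frozenset({"cmd", "ctrl", "q"}),         # lock screen
}

# Illustrative patterns only; a real list would be broader.
BLOCKED_TYPE_PATTERNS = [
    re.compile(r"curl\b[^|\n]*\|\s*(?:ba|z)?sh\b", re.IGNORECASE),
    re.compile(r"\bsudo\b[^\n]*\brm\s+-rf\b", re.IGNORECASE),
]


def normalize_combo(combo: str) -> frozenset[str]:
    """Turn 'Command+Shift+Delete' into an order-independent key set."""
    keys = (k.strip().lower() for k in combo.split("+"))
    return frozenset(ALIASES.get(k, k) for k in keys if k)


def is_blocked_combo(combo: str) -> bool:
    return normalize_combo(combo) in BLOCKED_COMBOS


def is_blocked_text(text: str) -> bool:
    return any(p.search(text) for p in BLOCKED_TYPE_PATTERNS)
```

Comparing key sets rather than strings means "shift+cmd+delete" and "cmd+shift+delete" are treated as the same combo.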

Includes: 102 tests, 657-line macOS workflow skill (auto-loaded),
feature docs page, reference catalog updates.
This commit is contained in:
Teknium
2026-04-02 01:59:32 -07:00
parent 64584a931f
commit 8e3803f3ce
22 changed files with 3775 additions and 78 deletions


@@ -36,6 +36,12 @@ This page documents the built-in Hermes tool registry as it exists in code. Avai
|------|-------------|----------------------|
| `execute_code` | Run a Python script that can call Hermes tools programmatically. Use this when you need 3+ tool calls with processing logic between them, need to filter/reduce large tool outputs before they enter your context, need conditional branching (… | — |

## `computer_use` toolset

| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `computer` | Control the macOS desktop — take screenshots, click, type, scroll, drag, and use keyboard shortcuts. Uses Anthropic's Computer Use API (`computer_20251124`). Actions: `screenshot`, `left_click`, `right_click`, `double_click`, `triple_click`, `middle_click`, `mouse_move`, `left_click_drag`, `left_mouse_down`, `left_mouse_up`, `type`, `key`, `hold_key`, `scroll`, `zoom`, `wait`. Requires macOS, pyautogui, Quartz, and Anthropic native API. | macOS + Anthropic provider |

## `cronjob` toolset

| Tool | Description | Requires environment |