hermes-agent/tools/computer_use_tool.py
Francesco Bonacci 15cffcc522 feat(computer_use): enable Windows via cua-driver-rs
The cua-driver backend was gated to macOS only:

    # tools/computer_use/tool.py
    def check_computer_use_requirements() -> bool:
        if sys.platform != "darwin":
            return False
        ...

But cua-driver itself has been Windows-feature-complete since cua-driver-rs
(the cross-platform Rust port) shipped its Windows backend. Every action
tool — click, type_text, hotkey, drag, scroll, screenshot, launch_app,
list_apps, list_windows, get_window_state, move_cursor, wait — is marked
VERIFIED on Windows in the cross-platform PARITY matrix:
https://github.com/trycua/cua/blob/main/libs/cua-driver-rs/PARITY.md

This PR widens the gate to `sys.platform in ("darwin", "win32")`. No new
code paths — the existing MCP stdio integration in cua_backend.py works
identically against cua-driver on Windows because cua-driver's tool
surface is uniform across OSes.

Linux is not in scope. cua-driver-rs Linux support exists in tree but is
alpha (most Linux rows in PARITY are OPEN, not VERIFIED), so it stays gated
off here until upstream flips those rows to VERIFIED. The plumbing is
OS-agnostic, so flipping the gate later is a one-line change.
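
The widened gate, and the one-line flip that would later enable Linux, can be sketched roughly as below. This is an illustration only, not the actual contents of tools/computer_use/tool.py: `SUPPORTED_PLATFORMS`, `platform_gate_passes`, and the `platform` parameter are hypothetical names, introduced so the check can be exercised without patching `sys.platform`. The real `check_computer_use_requirements()` performs further checks that are elided in the snippet above.

```python
import sys

# Hypothetical constant: platforms where the cua-driver backend is enabled.
# Enabling Linux later would mean adding "linux" to this tuple.
SUPPORTED_PLATFORMS = ("darwin", "win32")


def platform_gate_passes(platform: str = sys.platform) -> bool:
    """Return True when `platform` is on the allowlist."""
    return platform in SUPPORTED_PLATFORMS
```

Keeping the allowlist in a single tuple is one way to let `check_computer_use_requirements()` and `is_available()` share the same condition, assuming both can import it.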

Empirical verification on Windows 11 24H2 (2026-05-22 dogfood):

  - Built-in Administrator (RID 500) at High IL via cua-driver-rs
    RunLevel=Highest autostart task:
      `cua-driver call get_window_state` for Calculator UWP
      → element_count: 41

  - Regular admin (UAC-split, Medium IL primary token) running
    `cua-driver call` directly from PowerShell:
      `cua-driver call get_window_state` for Calculator UWP
      → element_count: 41

UWP / AppContainer UIA works at any IL for any user. No EV cert, no
uiAccess="true" manifest, no Program Files install requirement.

## Changes

- tools/computer_use/tool.py: replace `sys.platform != "darwin"`
  early-return with `sys.platform not in ("darwin", "win32")`. Update
  top-of-file docstring + vision-prompt phrasing ("macOS application" →
  "desktop application") so the model isn't told to expect a Mac UI when
  it's looking at a Windows screen.
- tools/computer_use/cua_backend.py: rewrite top-of-file docstring to
  cover macOS + Windows + the Linux-alpha caveat. `is_available()`
  matches the same `darwin/win32` allowlist. `cua_driver_install_hint()`
  returns the Windows installer (irm | iex) on Windows, the bash
  installer on macOS.
- tools/computer_use_tool.py: update registry description from "macOS
  desktop control" to "desktop control (macOS, Windows; Linux alpha)".

The macOS-specific bits in `cua_backend.py` (the `_is_arm_mac` helper, the
"macOS reports localized app names" warning) stay as-is — they're macOS
runtime details that are conditionally taken when running on macOS, not
gates that block other OSes.

## Install

Same one-liner story, OS-specific installer:

  macOS:
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"

  Windows (PowerShell):
    irm https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.ps1 | iex

After install, `cua-driver` is on the PATH and Hermes's check_fn
(`check_computer_use_requirements`) sees it.
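
A rough sketch of how a per-OS install hint and a PATH check could look, under stated assumptions: `INSTALL_COMMANDS`, `install_hint`, and `driver_on_path` are hypothetical names, not the actual `cua_driver_install_hint()` or `is_available()` code. Returning `None` for an unsupported platform is also an assumption, as is the use of `shutil.which` for discovery. The two command strings are copied from the Install section above.

```python
import shutil
import sys
from typing import Optional

# Installer one-liners, keyed by sys.platform value.
INSTALL_COMMANDS = {
    "darwin": (
        '/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/'
        'trycua/cua/main/libs/cua-driver/scripts/install.sh)"'
    ),
    "win32": (
        "irm https://raw.githubusercontent.com/"
        "trycua/cua/main/libs/cua-driver/scripts/install.ps1 | iex"
    ),
}


def install_hint(platform: str = sys.platform) -> Optional[str]:
    """Return the installer command for `platform`, or None if unsupported."""
    return INSTALL_COMMANDS.get(platform)


def driver_on_path() -> bool:
    """Return True when a `cua-driver` executable is found on PATH."""
    return shutil.which("cua-driver") is not None
```

On Windows, `shutil.which` consults `PATHEXT`, so a `cua-driver.exe` on the PATH should be found without spelling out the extension.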

## Related

Replies to @teknium1's question on #20660 about whether cua-driver-rs
ships Windows + Linux backends and whether @Abd0r's per-OS Python work
should be absorbed into cua-driver as a starting point. Short answer:
the cua-driver-rs Rust impl is months ahead of a fresh Python port on
Windows. Linux is alpha and will get there. Several pieces of #20660
(kill-switch, JSONL audit log, screenshot redact_regions, the per-OS
SKILL.md docs) are worth absorbing into cua-driver as follow-up work —
separate from this PR.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-11 03:56:03 -07:00


"""Shim for tool discovery. Registers `computer_use` with tools.registry.
The real implementation lives in the `tools/computer_use/` package to keep
the file structure clean. This shim exists because tools.registry auto-imports
`tools/*.py` — we need a top-level module to trigger the registration.
"""
from __future__ import annotations

from tools.computer_use.schema import COMPUTER_USE_SCHEMA
from tools.computer_use.tool import (
    check_computer_use_requirements,
    handle_computer_use,
    set_approval_callback,
)
from tools.registry import registry

registry.register(
    name="computer_use",
    toolset="computer_use",
    schema=COMPUTER_USE_SCHEMA,
    handler=lambda args, **kw: handle_computer_use(args, **kw),
    check_fn=check_computer_use_requirements,
    requires_env=[],
    description=(
        "Universal desktop control via cua-driver (macOS, Windows; Linux alpha). Works with any "
        "tool-capable model (Anthropic, OpenAI, OpenRouter, local vLLM, "
        "etc.). Background computer-use: does NOT steal the user's cursor "
        "or keyboard focus."
    ),
)

__all__ = [
    "handle_computer_use",
    "set_approval_callback",
    "check_computer_use_requirements",
]