mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-06-19 00:20:51 +08:00
The cua-driver backend was gated to macOS only:
    # tools/computer_use/tool.py
    def check_computer_use_requirements() -> bool:
        if sys.platform != "darwin":
            return False
        ...
But cua-driver itself has been Windows-feature-complete since cua-driver-rs
(the cross-platform Rust port) shipped its Windows backend. Every action
tool — click, type_text, hotkey, drag, scroll, screenshot, launch_app,
list_apps, list_windows, get_window_state, move_cursor, wait — is marked
VERIFIED on Windows in the cross-platform PARITY matrix:
https://github.com/trycua/cua/blob/main/libs/cua-driver-rs/PARITY.md
This PR widens the gate to `sys.platform in ("darwin", "win32")`. No new
code paths — the existing MCP stdio integration in cua_backend.py works
identically against cua-driver on Windows because cua-driver's tool
surface is uniform across OSes.
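
A minimal sketch of the widened gate, for illustration only. The names `SUPPORTED_PLATFORMS` and `platform_is_supported` are hypothetical and do not appear in tools/computer_use/tool.py; the rest of `check_computer_use_requirements()` is elided above and is not reproduced here.

```python
import sys

# Platforms admitted by the widened gate. Linux stays out until the
# upstream PARITY rows flip from OPEN to VERIFIED.
SUPPORTED_PLATFORMS = ("darwin", "win32")


def platform_is_supported(platform: str = sys.platform) -> bool:
    """Hypothetical helper: True when `platform` passes the widened gate."""
    return platform in SUPPORTED_PLATFORMS
```

Taking the platform as a parameter keeps the check testable on any host; admitting Linux later would mean adding `"linux"` to the tuple.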
Linux is not in scope. cua-driver-rs Linux support exists in tree but is
alpha (most Linux rows in PARITY are OPEN, not VERIFIED), so it stays gated
off here until upstream flips those to VERIFIED. The plumbing is
OS-agnostic, so flipping the gate later is a one-line change.
Empirical verification on Windows 11 24H2 (2026-05-22 dogfood):
- Built-in Administrator (RID 500) at High IL via cua-driver-rs
  RunLevel=Highest autostart task:
    `cua-driver call get_window_state` for Calculator UWP
    → element_count: 41
- Regular admin (UAC-split, Medium IL primary token) running
  `cua-driver call` directly from PowerShell:
    `cua-driver call get_window_state` for Calculator UWP
    → element_count: 41
UWP / AppContainer UIA works at any IL for any user. No EV cert, no
uiAccess="true" manifest, no Program Files install requirement.
## Changes
- tools/computer_use/tool.py: replace `sys.platform != "darwin"`
early-return with `sys.platform not in ("darwin", "win32")`. Update
top-of-file docstring + vision-prompt phrasing ("macOS application" →
"desktop application") so the model isn't told to expect a Mac UI when
it's looking at a Windows screen.
- tools/computer_use/cua_backend.py: rewrite top-of-file docstring to
cover macOS + Windows + the Linux-alpha caveat. `is_available()`
matches the same `darwin/win32` allowlist. `cua_driver_install_hint()`
returns the Windows installer (irm | iex) on Windows, the bash
installer on macOS.
- tools/computer_use_tool.py: update registry description from "macOS
desktop control" to "desktop control (macOS, Windows; Linux alpha)".
The macOS-specific bits in `cua_backend.py` (the `_is_arm_mac` helper, the
"macOS reports localized app names" warning) stay as-is — they're macOS
runtime details that are conditionally taken when running on macOS, not
gates that block other OSes.
## Install
Same one-liner story, OS-specific installer:
macOS:
    /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.sh)"
Windows (PowerShell):
    irm https://raw.githubusercontent.com/trycua/cua/main/libs/cua-driver/scripts/install.ps1 | iex
After install, `cua-driver` is on PATH and Hermes's check_fn
(`check_computer_use_requirements`) sees it.
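
A rough sketch of how a per-OS install hint and a PATH probe could fit together. The function names `install_hint` and `driver_on_path` are hypothetical; the real `cua_driver_install_hint()` and `is_available()` in cua_backend.py are not shown in this message and may differ. Only the two installer commands are taken from the text above.

```python
import shutil
import sys
from typing import Optional

# Installer one-liners copied from the Install section above.
MACOS_INSTALL = (
    '/bin/bash -c "$(curl -fsSL '
    "https://raw.githubusercontent.com/trycua/cua/main/"
    'libs/cua-driver/scripts/install.sh)"'
)
WINDOWS_INSTALL = (
    "irm https://raw.githubusercontent.com/trycua/cua/main/"
    "libs/cua-driver/scripts/install.ps1 | iex"
)


def install_hint(platform: str = sys.platform) -> Optional[str]:
    """Hypothetical: return the installer command for `platform`, or None."""
    if platform == "win32":
        return WINDOWS_INSTALL
    if platform == "darwin":
        return MACOS_INSTALL
    return None  # Linux and others are gated off for now.


def driver_on_path(binary: str = "cua-driver") -> bool:
    """Hypothetical: True when `binary` resolves on the current PATH."""
    return shutil.which(binary) is not None
```

`shutil.which` honours PATHEXT on Windows, so the same probe can find `cua-driver.exe` without a separate code path.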
## Related
Replies to @teknium1's question on #20660 about whether cua-driver-rs
ships Windows + Linux backends and whether @Abd0r's per-OS Python work
should be absorbed into cua-driver as a starting point. Short answer:
the cua-driver-rs Rust impl is months ahead of a fresh Python port on
Windows. Linux is alpha and will get there. Several pieces of #20660
(kill-switch, JSONL audit log, screenshot redact_regions, the per-OS
SKILL.md docs) are worth absorbing into cua-driver as follow-up work —
separate from this PR.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
40 lines
1.2 KiB
Python
"""Shim for tool discovery. Registers `computer_use` with tools.registry.

The real implementation lives in the `tools/computer_use/` package to keep
the file structure clean. This shim exists because tools.registry auto-imports
`tools/*.py`, so we need a top-level module to trigger the registration.
"""

from __future__ import annotations

from tools.computer_use.schema import COMPUTER_USE_SCHEMA
from tools.computer_use.tool import (
    check_computer_use_requirements,
    handle_computer_use,
    set_approval_callback,
)
from tools.registry import registry


registry.register(
    name="computer_use",
    toolset="computer_use",
    schema=COMPUTER_USE_SCHEMA,
    handler=lambda args, **kw: handle_computer_use(args, **kw),
    check_fn=check_computer_use_requirements,
    requires_env=[],
    description=(
        "Universal desktop control via cua-driver (macOS, Windows; Linux alpha). Works with any "
        "tool-capable model (Anthropic, OpenAI, OpenRouter, local vLLM, "
        "etc.). Background computer-use: does NOT steal the user's cursor "
        "or keyboard focus."
    ),
)


__all__ = [
    "handle_computer_use",
    "set_approval_callback",
    "check_computer_use_requirements",
]
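
For readers unfamiliar with the pattern, here is a toy sketch of how a registry might use `check_fn` to hide tools whose requirements are not met. `ToyRegistry` and `ToolEntry` are hypothetical; the internals of `tools.registry` are not shown on this page and may work differently.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class ToolEntry:
    """One registered tool: its name, handler, and availability check."""
    name: str
    handler: Callable[..., Any]
    check_fn: Callable[[], bool]


class ToyRegistry:
    """Hypothetical stand-in for a tool registry, not the real one."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolEntry] = {}

    def register(
        self,
        name: str,
        handler: Callable[..., Any],
        check_fn: Callable[[], bool],
    ) -> None:
        self._tools[name] = ToolEntry(name, handler, check_fn)

    def available(self) -> List[str]:
        # check_fn runs at query time, so a tool appears only when its
        # requirements (for example, platform and binary) are satisfied.
        return sorted(n for n, t in self._tools.items() if t.check_fn())


reg = ToyRegistry()
reg.register("computer_use", lambda args, **kw: args, lambda: True)
reg.register("gated_off", lambda args, **kw: args, lambda: False)
```

With this shape, widening a platform gate inside the check function changes which tools are offered without touching the registration call.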