hermes-agent/tools/computer_use_tool.py
Jeff f76f382a65 feat(tools): add Windows UIA backend for computer_use
Brings desktop control to Windows hosts: UI Automation element
discovery with SOM overlays, SendInput mouse/keyboard (virtual-
desktop-normalized absolute coords, Unicode typing), and focus-free
set_value via UIA value/selection/range patterns. Backend selection
is platform-aware (HERMES_COMPUTER_USE_BACKEND still overrides) and
check_computer_use_requirements() now gates per platform. Windows
session-killing key combos (win+l, ctrl+alt+del, alt+f4) are
hard-blocked alongside the macOS list.
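
A minimal sketch of how the hard block on session-killing combos could work, assuming combos arrive as "+"-separated strings. The names BLOCKED_COMBOS, normalize_combo and is_blocked, and the alias table, are hypothetical and not taken from the actual backend; only the three Windows combos come from the message above.

```python
from __future__ import annotations

# The three Windows combos named in the commit message. Stored as frozensets
# so that key order ("l+win" vs "win+l") does not matter.
BLOCKED_COMBOS = {
    frozenset({"win", "l"}),
    frozenset({"ctrl", "alt", "del"}),
    frozenset({"alt", "f4"}),
}

# Assumed spelling variants mapped to one canonical key name (illustrative).
_ALIASES = {"control": "ctrl", "delete": "del", "super": "win", "windows": "win"}


def normalize_combo(combo: str) -> frozenset[str]:
    """Lowercase, trim, and de-alias each key of a '+'-separated combo."""
    keys = (part.strip().lower() for part in combo.split("+"))
    return frozenset(_ALIASES.get(key, key) for key in keys if key)


def is_blocked(combo: str) -> bool:
    """Return True when the combo matches a hard-blocked key set."""
    return normalize_combo(combo) in BLOCKED_COMBOS
```

Comparing normalized sets rather than raw strings means "Ctrl+Alt+Delete" and "del+alt+ctrl" are rejected alike, which is one plausible way to keep the block from being bypassed by spelling or ordering.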

Unlike cua-driver on macOS there is no background input injection on
Windows: pointer/keyboard actions briefly foreground the target
window, and the platform-aware tool schema tells the model so.

Requires uiautomation (+comtypes) in the venv; windows_backend
degrades to unavailable when imports fail. 118 computer_use tests
pass incl. 21 new dependency-free Windows tests; verified live
against Notepad (capture/SOM/type/set_value/key).
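
The platform-aware selection and the "degrades to unavailable" behaviour can be sketched roughly as below. This is an illustration under stated assumptions: the backend names "cua" and "windows", both lookup tables, and the functions select_backend and backend_available are hypothetical. The sketch probes for modules with importlib.util.find_spec instead of importing them, which may differ from how windows_backend actually detects failed imports.

```python
from __future__ import annotations

import importlib.util
import os
import sys

# Hypothetical backend identifiers keyed by sys.platform value.
_DEFAULT_BACKENDS = {"darwin": "cua", "win32": "windows"}

# Modules each backend is assumed to need. uiautomation and comtypes come from
# the commit message; the empty tuple for "cua" is only a placeholder.
_REQUIRED_MODULES = {"windows": ("uiautomation", "comtypes"), "cua": ()}


def select_backend(platform=None, env=None) -> str | None:
    """Pick a backend name; HERMES_COMPUTER_USE_BACKEND overrides the default."""
    platform = sys.platform if platform is None else platform
    env = os.environ if env is None else env
    override = env.get("HERMES_COMPUTER_USE_BACKEND", "").strip().lower()
    if override:
        return override
    return _DEFAULT_BACKENDS.get(platform)


def backend_available(name, find_spec=importlib.util.find_spec) -> bool:
    """Report whether every module assumed necessary for `name` can be found."""
    if name not in _REQUIRED_MODULES:
        return False
    return all(find_spec(module) is not None for module in _REQUIRED_MODULES[name])
```

Passing platform, env and find_spec as parameters keeps the sketch testable on any host, in the same spirit as the dependency-free Windows tests mentioned above.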

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
2026-06-15 06:19:20 -07:00

41 lines, 1.3 KiB, Python

"""Shim for tool discovery. Registers `computer_use` with tools.registry.
The real implementation lives in the `tools/computer_use/` package to keep
the file structure clean. This shim exists because tools.registry auto-imports
`tools/*.py` — we need a top-level module to trigger the registration.
"""
from __future__ import annotations

from tools.computer_use.schema import COMPUTER_USE_SCHEMA
from tools.computer_use.tool import (
    check_computer_use_requirements,
    handle_computer_use,
    set_approval_callback,
)
from tools.registry import registry

registry.register(
    name="computer_use",
    toolset="computer_use",
    schema=COMPUTER_USE_SCHEMA,
    handler=lambda args, **kw: handle_computer_use(args, **kw),
    check_fn=check_computer_use_requirements,
    requires_env=[],
    description=(
        "Universal desktop control. Works with any tool-capable model "
        "(Anthropic, OpenAI, OpenRouter, local vLLM, etc.). macOS: "
        "background computer-use via cua-driver (does NOT steal the user's "
        "cursor or keyboard focus). Windows: UI Automation + SendInput "
        "(actions briefly foreground the target window)."
    ),
)

__all__ = [
    "handle_computer_use",
    "set_approval_callback",
    "check_computer_use_requirements",
]
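
For readers unfamiliar with the registration pattern used by this shim, the following toy registry is a rough sketch of how a register() call with a handler and a check_fn might be consumed. ToyRegistry, ToolEntry, available and dispatch are hypothetical names; the real tools.registry API may differ, and the assumption that check_fn returns a plain bool is not confirmed by the source.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolEntry:
    """One registered tool (hypothetical structure)."""
    name: str
    toolset: str
    schema: dict[str, Any]
    handler: Callable[..., Any]
    check_fn: Callable[[], bool]


class ToyRegistry:
    """Illustrative stand-in for a tool registry, not the real one."""

    def __init__(self) -> None:
        self._tools: dict[str, ToolEntry] = {}

    def register(self, *, name, toolset, schema, handler, check_fn, **_ignored):
        # Extra keywords (requires_env, description, ...) are accepted and dropped.
        self._tools[name] = ToolEntry(name, toolset, schema, handler, check_fn)

    def available(self) -> list[str]:
        """Names of tools whose requirement check currently passes."""
        return sorted(n for n, t in self._tools.items() if t.check_fn())

    def dispatch(self, name: str, args: dict[str, Any], **kw: Any) -> Any:
        entry = self._tools[name]
        if not entry.check_fn():
            raise RuntimeError(f"tool {name!r} is unavailable on this host")
        return entry.handler(args, **kw)


# Usage: a module-level register() call, as the shim above does on import.
toy = ToyRegistry()
toy.register(
    name="echo",
    toolset="demo",
    schema={"type": "object"},
    handler=lambda args, **kw: {"echo": args, "extra": kw},
    check_fn=lambda: True,
    requires_env=[],
)
toy.register(
    name="gated",
    toolset="demo",
    schema={"type": "object"},
    handler=lambda args, **kw: None,
    check_fn=lambda: False,
)
```

Because registration happens as an import side effect, a discovery step only has to import each top-level module under tools/ for its tools to appear, which is why the shim needs to exist as a top-level file.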