docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)
- add code-derived reference pages for slash commands, tools, toolsets,
bundled skills, and official optional skills
- document the skin system and link visual theming separately from
conversational personality
- refresh quickstart, configuration, environment variable, and messaging
docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
2026-03-13 21:34:41 -07:00
---
sidebar_position: 3
title: "Built-in Tools Reference"
description: "Authoritative reference for Hermes built-in tools, grouped by toolset"
---
# Built-in Tools Reference
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
This page documents all 55 built-in tools in the Hermes tool registry, grouped by toolset. Availability varies by platform, credentials, and enabled toolsets.
docs: comprehensive documentation audit — fix stale info, expand thin pages, add depth (#5393)
Major changes across 20 documentation pages:
Staleness fixes:
- Fix FAQ: wrong import path (hermes.agent → run_agent)
- Fix FAQ: stale Gemini 2.0 model → Gemini 3 Flash
- Fix integrations/index: missing MiniMax TTS provider
- Fix integrations/index: web_crawl is not a registered tool
- Fix sessions: add all 19 session sources (was only 5)
- Fix cron: add all 18 delivery targets (was only telegram/discord)
- Fix webhooks: add all delivery targets
- Fix overview: add missing MCP, memory providers, credential pools
- Fix all line-number references → use function name searches instead
- Update file size estimates (run_agent ~9200, gateway ~7200, cli ~8500)
Expanded thin pages (< 150 lines → substantial depth):
- honcho.md: 43 → 108 lines — added feature comparison, tools, config, CLI
- overview.md: 49 → 55 lines — added MCP, memory providers, credential pools
- toolsets-reference.md: 57 → 175 lines — added explanations, config examples,
custom toolsets, wildcards, platform differences table
- optional-skills-catalog.md: 74 → 153 lines — added 25+ missing skills across
communication, devops, mlops (18!), productivity, research categories
- integrations/index.md: 82 → 115 lines — added messaging, HA, plugins sections
- cron-internals.md: 90 → 195 lines — added job JSON example, lifecycle states,
tick cycle, delivery targets, script-backed jobs, CLI interface
- gateway-internals.md: 111 → 250 lines — added architecture diagram, message
flow, two-level guard, platform adapters, token locks, process management
- agent-loop.md: 112 → 235 lines — added entry points, API mode resolution,
turn lifecycle detail, message alternation rules, tool execution flow,
callback table, budget tracking, compression details
- architecture.md: 152 → 295 lines — added system overview diagram, data flow
diagrams, design principles table, dependency chain
Other depth additions:
- context-references.md: added platform availability, compression interaction,
common patterns sections
- slash-commands.md: added quick commands config example, alias resolution
- image-generation.md: added platform delivery table
- tools-reference.md: added tool counts, MCP tools note
- index.md: updated platform count (5 → 14+), tool count (40+ → 47)
2026-04-05 19:45:50 -07:00
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
**Quick counts:** 12 browser tools, 4 file tools, 10 RL tools, 4 Home Assistant tools, 2 terminal tools, 2 web tools, 5 Feishu tools, and 15 standalone tools across other toolsets.
docs: comprehensive documentation audit — fix stale info, expand thin pages, add depth (#5393)
Major changes across 20 documentation pages:
Staleness fixes:
- Fix FAQ: wrong import path (hermes.agent → run_agent)
- Fix FAQ: stale Gemini 2.0 model → Gemini 3 Flash
- Fix integrations/index: missing MiniMax TTS provider
- Fix integrations/index: web_crawl is not a registered tool
- Fix sessions: add all 19 session sources (was only 5)
- Fix cron: add all 18 delivery targets (was only telegram/discord)
- Fix webhooks: add all delivery targets
- Fix overview: add missing MCP, memory providers, credential pools
- Fix all line-number references → use function name searches instead
- Update file size estimates (run_agent ~9200, gateway ~7200, cli ~8500)
Expanded thin pages (< 150 lines → substantial depth):
- honcho.md: 43 → 108 lines — added feature comparison, tools, config, CLI
- overview.md: 49 → 55 lines — added MCP, memory providers, credential pools
- toolsets-reference.md: 57 → 175 lines — added explanations, config examples,
custom toolsets, wildcards, platform differences table
- optional-skills-catalog.md: 74 → 153 lines — added 25+ missing skills across
communication, devops, mlops (18!), productivity, research categories
- integrations/index.md: 82 → 115 lines — added messaging, HA, plugins sections
- cron-internals.md: 90 → 195 lines — added job JSON example, lifecycle states,
tick cycle, delivery targets, script-backed jobs, CLI interface
- gateway-internals.md: 111 → 250 lines — added architecture diagram, message
flow, two-level guard, platform adapters, token locks, process management
- agent-loop.md: 112 → 235 lines — added entry points, API mode resolution,
turn lifecycle detail, message alternation rules, tool execution flow,
callback table, budget tracking, compression details
- architecture.md: 152 → 295 lines — added system overview diagram, data flow
diagrams, design principles table, dependency chain
Other depth additions:
- context-references.md: added platform availability, compression interaction,
common patterns sections
- slash-commands.md: added quick commands config example, alias resolution
- image-generation.md: added platform delivery table
- tools-reference.md: added tool counts, MCP tools note
- index.md: updated platform count (5 → 14+), tool count (40+ → 47)
2026-04-05 19:45:50 -07:00
:::tip MCP Tools
In addition to built-in tools, Hermes can load tools dynamically from MCP servers. MCP tools appear with a server-name prefix (e.g., `github_create_issue` for the `github` MCP server). See [MCP Integration ](/docs/user-guide/features/mcp ) for configuration.
:::
docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)
- add code-derived reference pages for slash commands, tools, toolsets,
bundled skills, and official optional skills
- document the skin system and link visual theming separately from
conversational personality
- refresh quickstart, configuration, environment variable, and messaging
docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
2026-03-13 21:34:41 -07:00
## `browser` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `browser_back` | Navigate back to the previous page in browser history. Requires browser_navigate to be called first. | — |
feat(browser): add browser_cdp raw DevTools Protocol passthrough (#12369)
Agents can now send arbitrary CDP commands to the browser. The tool is
gated on a reachable CDP endpoint at session start — it only appears in
the toolset when BROWSER_CDP_URL is set (from '/browser connect') or
'browser.cdp_url' is configured in config.yaml. Backends that don't
currently expose CDP to the Python side (Camofox, default local
agent-browser, cloud providers whose per-session cdp_url is not yet
surfaced) do not see the tool at all.
Tool schema description links to the CDP method reference at
https://chromedevtools.github.io/devtools-protocol/ so the agent can
web_extract specific method docs on demand.
Stateless per call. Browser-level methods (Target.*, Browser.*,
Storage.*) omit target_id. Page-level methods attach to the target
with flatten=true and dispatch the method on the returned sessionId.
Clean errors when the endpoint becomes unreachable mid-session or
the URL isn't a WebSocket.
Tests: 19 unit (mock CDP server + gate checks) + E2E against real
headless Chrome (Target.getTargets, Browser.getVersion,
Runtime.evaluate with target_id, Page.navigate + re-eval, bogus
method, bogus target_id, missing endpoint) + E2E of the check_fn
gate (tool hidden without CDP URL, visible with it, hidden again
after unset).
2026-04-19 00:03:10 -07:00
| `browser_cdp` | Send a raw Chrome DevTools Protocol (CDP) command. Escape hatch for browser operations not covered by browser_navigate, browser_click, browser_console, etc. Only available when a CDP endpoint is reachable at session start — via `/browser connect` or `browser.cdp_url` config. See https://chromedevtools.github.io/devtools-protocol/ | — |
feat(browser): CDP supervisor — dialog detection + response + cross-origin iframe eval (#14540)
* docs: browser CDP supervisor design (for upcoming PR)
Design doc ahead of implementation — dialog + iframe detection/interaction
via a persistent CDP supervisor. Covers backend capability matrix (verified
live 2026-04-23), architecture, lifecycle, policy, agent surface, PR split,
non-goals, and test plan.
Supersedes #12550.
No code changes in this commit.
* feat(browser): add persistent CDP supervisor for dialog + frame detection
Single persistent CDP WebSocket per Hermes task_id that subscribes to
Page/Runtime/Target events and maintains thread-safe state for pending
dialogs, frame tree, and console errors.
Supervisor lives in its own daemon thread running an asyncio loop;
external callers use sync API (snapshot(), respond_to_dialog()) that
bridges onto the loop.
Auto-attaches to OOPIF child targets via Target.setAutoAttach{flatten:true}
and enables Page+Runtime on each so iframe-origin dialogs surface through
the same supervisor.
Dialog policies: must_respond (default, 300s safety timeout),
auto_dismiss, auto_accept.
Frame tree capped at 30 entries + OOPIF depth 2 to keep snapshot
payloads bounded on ad-heavy pages.
E2E verified against real Chrome via smoke test — detects + responds
to main-frame alerts, iframe-contentWindow alerts, preserves frame
tree, graceful no-dialog error path, clean shutdown.
No agent-facing tool wiring in this commit (comes next).
* feat(browser): add browser_dialog tool wired to CDP supervisor
Agent-facing response-only tool. Schema:
action: 'accept' | 'dismiss' (required)
prompt_text: response for prompt() dialogs (optional)
dialog_id: disambiguate when multiple dialogs queued (optional)
Handler:
SUPERVISOR_REGISTRY.get(task_id).respond_to_dialog(...)
check_fn shares _browser_cdp_check with browser_cdp so both surface and
hide together. When no supervisor is attached (Camofox, default
Playwright, or no browser session started yet), tool is hidden; if
somehow invoked it returns a clear error pointing the agent to
browser_navigate / /browser connect.
Registered in _HERMES_CORE_TOOLS and the browser / hermes-acp /
hermes-api-server toolsets alongside browser_cdp.
* feat(browser): wire CDP supervisor into session lifecycle + browser_snapshot
Supervisor lifecycle:
* _get_session_info lazy-starts the supervisor after a session row is
materialized — covers every backend code path (Browserbase, cdp_url
override, /browser connect, future providers) with one hook.
* cleanup_browser(task_id) stops the supervisor for that task first
(before the backend tears down CDP).
* cleanup_all_browsers() calls SUPERVISOR_REGISTRY.stop_all().
* /browser connect eagerly starts the supervisor for task 'default'
so the first snapshot already shows pending_dialogs.
* /browser disconnect stops the supervisor.
CDP URL resolution for the supervisor:
1. BROWSER_CDP_URL / browser.cdp_url override.
2. Fallback: session_info['cdp_url'] from cloud providers (Browserbase).
browser_snapshot merges supervisor state (pending_dialogs + frame_tree)
into its JSON output when a supervisor is active — the agent reads
pending_dialogs from the snapshot it already requests, then calls
browser_dialog to respond. No extra tool surface.
Config defaults:
* browser.dialog_policy: 'must_respond' (new)
* browser.dialog_timeout_s: 300 (new)
No version bump — new keys deep-merge into existing browser section.
Deadlock fix in supervisor event dispatch:
* _on_dialog_opening and _on_target_attached used to await CDP calls
while the reader was still processing an event — but only the reader
can set the response Future, so the call timed out.
* Both now fire asyncio.create_task(...) so the reader stays pumping.
* auto_dismiss/auto_accept now actually close the dialog immediately.
Tests (tests/tools/test_browser_supervisor.py, 11 tests, real Chrome):
* supervisor start/snapshot
* main-frame alert detection + dismiss
* iframe.contentWindow alert
* prompt() with prompt_text reply
* respond with no pending dialog -> clean error
* auto_dismiss clears on event
* registry idempotency
* registry stop -> snapshot reports inactive
* browser_dialog tool no-supervisor error
* browser_dialog invalid action
* browser_dialog end-to-end via tool handler
xdist-safe: chrome_cdp fixture uses a per-worker port.
Skipped when google-chrome/chromium isn't installed.
* docs(browser): document browser_dialog tool + CDP supervisor
- user-guide/features/browser.md: new browser_dialog section with
workflow, availability gate, and dialog_policy table
- reference/tools-reference.md: row for browser_dialog, tool count
bumped 53 -> 54, browser tools count 11 -> 12
- reference/toolsets-reference.md: browser_dialog added to browser
toolset row with note on pending_dialogs / frame_tree snapshot fields
Full design doc lives at
developer-guide/browser-supervisor.md (committed earlier).
* fix(browser): reconnect loop + recent_dialogs for Browserbase visibility
Found via Browserbase E2E test that revealed two production-critical issues:
1. **Supervisor WebSocket drops when other clients disconnect.** Browserbase's
CDP proxy tears down our long-lived WebSocket whenever a short-lived
client (e.g. agent-browser CLI's per-command CDP connection) disconnects.
Fixed with a reconnecting _run loop that re-attaches with exponential
backoff on drops. _page_session_id and _child_sessions are reset on each
reconnect; pending_dialogs and frames are preserved across reconnects.
2. **Browserbase auto-dismisses dialogs server-side within ~10ms.** Their
Playwright-based CDP proxy dismisses alert/confirm/prompt before our
Page.handleJavaScriptDialog call can respond. So pending_dialogs is
empty by the time the agent reads a snapshot on Browserbase.
Added a recent_dialogs ring buffer (capacity 20) that retains a
DialogRecord for every dialog that opened, with a closed_by tag:
* 'agent' — agent called browser_dialog
* 'auto_policy' — local auto_dismiss/auto_accept fired
* 'watchdog' — must_respond timeout auto-dismissed (300s default)
* 'remote' — browser/backend closed it on us (Browserbase)
Agents on Browserbase now see the dialog history with closed_by='remote'
so they at least know a dialog fired, even though they couldn't respond.
3. **Page.javascriptDialogClosed matching bug.** The event doesn't include a
'message' field (CDP spec has only 'result' and 'userInput') but our
_on_dialog_closed was matching on message. Fixed to match by session_id
+ oldest-first, with a safety assumption that only one dialog is in
flight per session (the JS thread is blocked while a dialog is up).
Docs + tests updated:
* browser.md: new availability matrix showing the three backends and
which mode (pending / recent / response) each supports
* developer-guide/browser-supervisor.md: three-field snapshot schema
with closed_by semantics
* test_browser_supervisor.py: +test_recent_dialogs_ring_buffer (12/12
passing against real Chrome)
E2E verified both backends:
* Local Chrome via /browser connect: detect + respond full workflow
(smoke_supervisor.py all 7 scenarios pass)
* Browserbase: detect via recent_dialogs with closed_by='remote'
(smoke_supervisor_browserbase_v2.py passes)
Camofox remains out of scope (REST-only, no CDP) — tracked for
upstream PR 3.
* feat(browser): XHR bridge for dialog response on Browserbase (FIXED)
Browserbase's CDP proxy auto-dismisses native JS dialogs within ~10ms, so
Page.handleJavaScriptDialog calls lose the race. Solution: bypass native
dialogs entirely.
The supervisor now injects Page.addScriptToEvaluateOnNewDocument with a
JavaScript override for window.alert/confirm/prompt. Those overrides
perform a synchronous XMLHttpRequest to a magic host
('hermes-dialog-bridge.invalid'). We intercept those XHRs via Fetch.enable
with a requestStage=Request pattern.
Flow when a page calls alert('hi'):
1. window.alert override intercepts, builds XHR GET to
http://hermes-dialog-bridge.invalid/?kind=alert&message=hi
2. Sync XHR blocks the page's JS thread (mirrors real dialog semantics)
3. Fetch.requestPaused fires on our WebSocket; supervisor surfaces
it as a pending dialog with bridge_request_id set
4. Agent reads pending_dialogs from browser_snapshot, calls browser_dialog
5. Supervisor calls Fetch.fulfillRequest with JSON body:
{accept: true|false, prompt_text: '...', dialog_id: 'd-N'}
6. The injected script parses the body, returns the appropriate value
from the override (undefined for alert, bool for confirm, string|null
for prompt)
This works identically on Browserbase AND local Chrome — no native dialog
ever fires, so Browserbase's auto-dismiss has nothing to race. Dialog
policies (must_respond / auto_dismiss / auto_accept) all still work.
Bridge is installed on every attached session (main page + OOPIF child
sessions) so iframe dialogs are captured too.
Native-dialog path kept as a fallback for backends that don't auto-dismiss
(so a page that somehow bypasses our override — e.g. iframes that load
after Fetch.enable but before the init-script runs — still gets observed
via Page.javascriptDialogOpening).
E2E VERIFIED:
* Local Chrome: 13/13 pytest tests green (12 original + new
test_bridge_captures_prompt_and_returns_reply_text that asserts
window.__ret === 'AGENT-SUPPLIED-REPLY' after agent responds)
* Browserbase: smoke_bb_bridge_v2.py runs 4/4 PASS:
- alert('BB-ALERT-MSG') dismiss → page.alert_ret = undefined ✓
- prompt('BB-PROMPT-MSG', 'default-xyz') accept with 'AGENT-REPLY'
→ page.prompt_ret === 'AGENT-REPLY' ✓
- confirm('BB-CONFIRM-MSG') accept → page.confirm_ret === true ✓
- confirm('BB-CONFIRM-MSG') dismiss → page.confirm_ret === false ✓
Docs updated in browser.md and developer-guide/browser-supervisor.md —
availability matrix now shows Browserbase at full parity with local
Chrome for both detection and response.
* feat(browser): cross-origin iframe interaction via browser_cdp(frame_id=...)
Adds iframe interaction to the CDP supervisor PR (was queued as PR 2).
Design: browser_cdp gets an optional frame_id parameter. When set, the
tool looks up the frame in the supervisor's frame_tree, grabs its child
cdp_session_id (OOPIF session), and dispatches the CDP call through the
supervisor's already-connected WebSocket via run_coroutine_threadsafe.
Why not stateless: on Browserbase, each fresh browser_cdp WebSocket
must re-negotiate against a signed connectUrl. The session info carries
a specific URL that can expire while the supervisor's long-lived
connection stays valid. Routing via the supervisor sidesteps this.
Agent workflow:
1. browser_snapshot → frame_tree.children[] shows OOPIFs with is_oopif=true
2. browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF frame_id>,
params={'expression': 'document.title', 'returnByValue': True})
3. Supervisor dispatches the call on the OOPIF's child session
Supervisor state fixes needed along the way:
* _on_frame_detached now skips reason='swap' (frame migrating processes)
* _on_frame_detached also skips when the frame is an OOPIF with a live
child session — Browserbase fires spurious remove events when a
same-origin iframe gets promoted to OOPIF
* _on_target_detached clears cdp_session_id but KEEPS the frame record
so the agent still sees the OOPIF in frame_tree during transient
session flaps
E2E VERIFIED on Browserbase (smoke_bb_iframe_agent_path.py):
browser_cdp(method='Runtime.evaluate',
params={'expression': 'document.title', 'returnByValue': True},
frame_id=<OOPIF>)
→ {'success': True, 'result': {'value': 'Example Domain'}}
The iframe is <iframe src='https://example.com/'> inside a top-level
data: URL page on a real Browserbase session. The agent Runtime.evaluates
INSIDE the cross-origin iframe and gets example.com's title back.
Tests (tests/tools/test_browser_supervisor.py — 16 pass total):
* test_browser_cdp_frame_id_routes_via_supervisor — injects fake OOPIF,
verifies routing via supervisor, Runtime.evaluate returns 1+1=2
* test_browser_cdp_frame_id_missing_supervisor — clean error when no
supervisor attached
* test_browser_cdp_frame_id_not_in_frame_tree — clean error on bad
frame_id
Docs (browser.md and developer-guide/browser-supervisor.md) updated with
the iframe workflow, availability matrix now shows OOPIF eval as shipped
for local Chrome + Browserbase.
* test(browser): real-OOPIF E2E verified manually + chrome_cdp uses --site-per-process
When asked 'did you test the iframe stuff' I had only done a mocked
pytest (fake injected OOPIF) plus a Browserbase E2E. Closed the
local-Chrome real-OOPIF gap by writing /tmp/dialog-iframe-test/
smoke_local_oopif.py:
* 2 http servers on different hostnames (localhost:18905 + 127.0.0.1:18906)
* Chrome with --site-per-process so the cross-origin iframe becomes a
real OOPIF in its own process
* Navigate, find OOPIF in supervisor.frame_tree, call
browser_cdp(method='Runtime.evaluate', frame_id=<OOPIF>) which routes
through the supervisor's child session
* Asserts iframe document.title === 'INNER-FRAME-XYZ' (from the
inner page, retrieved via OOPIF eval)
PASSED on 2026-04-23.
Tried to embed this as a pytest but hit an asyncio version quirk between
venv (3.11) and the system python (3.13) — Page.navigate hangs in the
pytest harness but works in standalone. Left a self-documenting skip
test that points to the smoke script + describes the verification.
chrome_cdp fixture now passes --site-per-process so future iframe tests
can rely on OOPIF behavior.
Result: 16 pass + 1 documented-skip = 17 tests in
tests/tools/test_browser_supervisor.py.
* docs(browser): add dialog_policy + dialog_timeout_s to configuration.md, fix tool count
Pre-merge docs audit revealed two gaps:
1. user-guide/configuration.md browser config example was missing the
two new dialog_* knobs. Added with a short table explaining
must_respond / auto_dismiss / auto_accept semantics and a link to
the feature page for the full workflow.
2. reference/tools-reference.md header said '54 built-in tools' — real
count on main is 54, this branch adds browser_dialog so it's 55.
Fixed the header. (browser count was already correctly bumped
11 -> 12 in the earlier docs commit.)
No code changes.
2026-04-23 22:23:37 -07:00
| `browser_dialog` | Respond to a native JavaScript dialog (alert / confirm / prompt / beforeunload). Call `browser_snapshot` first — pending dialogs appear in its `pending_dialogs` field. Then call `browser_dialog(action='accept'|'dismiss')` . Same availability as `browser_cdp` (Browserbase or `/browser connect` ). | — |
docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)
- add code-derived reference pages for slash commands, tools, toolsets,
bundled skills, and official optional skills
- document the skin system and link visual theming separately from
conversational personality
- refresh quickstart, configuration, environment variable, and messaging
docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
2026-03-13 21:34:41 -07:00
| `browser_click` | Click on an element identified by its ref ID from the snapshot (e.g., '@e5 '). The ref IDs are shown in square brackets in the snapshot output. Requires browser_navigate and browser_snapshot to be called first. | — |
| `browser_console` | Get browser console output and JavaScript errors from the current page. Returns console.log/warn/error/info messages and uncaught JS exceptions. Use this to detect silent JavaScript errors, failed API calls, and application warnings. Requi… | — |
| `browser_get_images` | Get a list of all images on the current page with their URLs and alt text. Useful for finding images to analyze with the vision tool. Requires browser_navigate to be called first. | — |
| `browser_navigate` | Navigate to a URL in the browser. Initializes the session and loads the page. Must be called before other browser tools. For simple information retrieval, prefer web_search or web_extract (faster, cheaper). Use browser tools when you need… | — |
| `browser_press` | Press a keyboard key. Useful for submitting forms (Enter), navigating (Tab), or keyboard shortcuts. Requires browser_navigate to be called first. | — |
| `browser_scroll` | Scroll the page in a direction. Use this to reveal more content that may be below or above the current viewport. Requires browser_navigate to be called first. | — |
| `browser_snapshot` | Get a text-based snapshot of the current page's accessibility tree. Returns interactive elements with ref IDs (like @e1 , @e2 ) for browser_click and browser_type. full=false (default): compact view with interactive elements. full=true: comp… | — |
| `browser_type` | Type text into an input field identified by its ref ID. Clears the field first, then types the new text. Requires browser_navigate and browser_snapshot to be called first. | — |
| `browser_vision` | Take a screenshot of the current page and analyze it with vision AI. Use this when you need to visually understand what's on the page - especially useful for CAPTCHAs, visual verification challenges, complex layouts, or when the text snaps… | — |
## `clarify` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `clarify` | Ask the user a question when you need clarification, feedback, or a decision before proceeding. Supports two modes: 1. **Multiple choice ** — provide up to 4 choices. The user picks one or types their own answer via a 5th 'Other' option. 2.… | — |
## `code_execution` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `execute_code` | Run a Python script that can call Hermes tools programmatically. Use this when you need 3+ tool calls with processing logic between them, need to filter/reduce large tool outputs before they enter your context, need conditional branching (… | — |
## `cronjob` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
2026-03-14 19:18:10 -07:00
| `cronjob` | Unified scheduled-task manager. Use `action="create"` , `"list"` , `"update"` , `"pause"` , `"resume"` , `"run"` , or `"remove"` to manage jobs. Supports skill-backed jobs with one or more attached skills, and `skills=[]` on update clears attached skills. Cron runs happen in fresh sessions with no current-chat context. | — |
docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)
- add code-derived reference pages for slash commands, tools, toolsets,
bundled skills, and official optional skills
- document the skin system and link visual theming separately from
conversational personality
- refresh quickstart, configuration, environment variable, and messaging
docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
2026-03-13 21:34:41 -07:00
## `delegation` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `delegate_task` | Spawn one or more subagents to work on tasks in isolated contexts. Each subagent gets its own conversation, terminal session, and toolset. Only the final summary is returned -- intermediate tool results never enter your context window. TWO… | — |
docs: correctness audit — fix wrong values, add missing coverage (#11972)
Comprehensive audit of every reference/messaging/feature doc page against the
live code registries (PROVIDER_REGISTRY, OPTIONAL_ENV_VARS, COMMAND_REGISTRY,
TOOLSETS, tool registry, on-disk skills). Every fix was verified against code
before writing.
### Wrong values fixed (users would paste-and-fail)
- reference/environment-variables.md:
- DASHSCOPE_BASE_URL default was `coding-intl.dashscope.aliyuncs.com/v1` \u2192
actual `dashscope-intl.aliyuncs.com/compatible-mode/v1`.
- MINIMAX_BASE_URL and MINIMAX_CN_BASE_URL defaults were `/v1` \u2192 actual
`/anthropic` (Hermes calls MiniMax via its Anthropic Messages endpoint).
- reference/toolsets-reference.md MCP example used the non-existent nested
`mcp: servers:` key \u2192 real key is the flat `mcp_servers:`.
- reference/skills-catalog.md listed ~20 bundled skills that no longer exist
on disk (all moved to `optional-skills/`). Regenerated the whole bundled
section from `skills/**/SKILL.md` \u2014 79 skills, accurate paths and names.
- messaging/slack.md ":::info" callout claimed Slack has no
`free_response_channels` equivalent; both the env var and the yaml key are
in fact read.
- messaging/qqbot.md documented `QQ_MARKDOWN_SUPPORT` as an env var, but the
adapter only reads `extra.markdown_support` from config.yaml. Removed the
env var row and noted config-only nature.
- messaging/qqbot.md `hermes setup gateway` \u2192 `hermes gateway setup`.
### Missing coverage added
- Providers: AWS Bedrock and Qwen Portal (qwen-oauth) \u2014 both in
PROVIDER_REGISTRY but undocumented everywhere. Added sections to
integrations/providers.md, rows to quickstart.md and fallback-providers.md.
- integrations/providers.md "Fallback Model" provider list now includes
gemini, google-gemini-cli, qwen-oauth, xai, nvidia, ollama-cloud, bedrock.
- reference/cli-commands.md `--provider` enum and HERMES_INFERENCE_PROVIDER
enum in env-vars now include the same set.
- reference/slash-commands.md: added `/agents` (alias `/tasks`) and `/copy`.
Removed duplicate rows for `/snapshot`, `/fast` (\u00d72), `/debug`.
- reference/tools-reference.md: fixed "47 built-in tools" \u2192 52. Added
`feishu_doc` and `feishu_drive` toolset sections.
- reference/toolsets-reference.md: added `feishu_doc` / `feishu_drive` core
rows + all missing `hermes-<platform>` toolsets in the platform table
(bluebubbles, dingtalk, feishu, qqbot, wecom, wecom-callback, weixin,
homeassistant, webhook, gateway). Fixed the `debugging` composite to
describe the actual `includes=[...]` mechanism.
- reference/optional-skills-catalog.md: added `fitness-nutrition`.
- reference/environment-variables.md: added NOUS_BASE_URL,
NOUS_INFERENCE_BASE_URL, NVIDIA_API_KEY/BASE_URL, OLLAMA_API_KEY/BASE_URL,
XAI_API_KEY/BASE_URL, MISTRAL_API_KEY, AWS_REGION/AWS_PROFILE,
BEDROCK_BASE_URL, HERMES_QWEN_BASE_URL, DISCORD_ALLOWED_CHANNELS,
DISCORD_PROXY, TELEGRAM_REPLY_TO_MODE, MATRIX_DEVICE_ID, MATRIX_REACTIONS,
QQBOT_HOME_CHANNEL_NAME, QQ_SANDBOX.
- messaging/discord.md: documented DISCORD_ALLOWED_CHANNELS, DISCORD_PROXY,
HERMES_DISCORD_TEXT_BATCH_DELAY_SECONDS and HERMES_DISCORD_TEXT_BATCH_SPLIT
_DELAY_SECONDS (all actively read by the adapter).
- messaging/matrix.md: documented MATRIX_REACTIONS (default true).
- messaging/telegram.md: removed the redundant second Webhook Mode section
that invented a `telegram.webhook_mode: true` yaml key the adapter does
not read.
- user-guide/features/hooks.md: added `on_session_finalize` and
`on_session_reset` (both emitted via invoke_hook but undocumented).
- user-guide/features/api-server.md: documented GET /health/detailed, the
`/api/jobs/*` CRUD surface, POST /v1/runs, and GET /v1/runs/{id}/events
(10 routes that were live but undocumented).
- user-guide/features/fallback-providers.md: added `approval` and
`title_generation` auxiliary-task rows; added gemini, bedrock, qwen-oauth
to the supported-providers table.
- user-guide/features/tts.md: "seven providers" \u2192 "eight" (post-xAI add
oversight in #11942).
- user-guide/configuration.md: TTS provider enum gains `xai` and `gemini`;
yaml example block gains `mistral:`, `gemini:`, `xai:` subsections.
Auxiliary-provider enum now enumerates all real registry entries.
- reference/faq.md: stale AIAgent/config examples bumped from
`nous/hermes-3-llama-3.1-70b` and `claude-sonnet-4.6` to
`claude-opus-4.7`.
### Docs-site integrity
- guides/build-a-hermes-plugin.md referenced two nonexistent hooks
(`pre_api_request`, `post_api_request`). Replaced with the real
`on_session_finalize` / `on_session_reset` entries.
- messaging/open-webui.md and features/api-server.md had pre-existing
broken links to `/docs/user-guide/features/profiles` (actual path is
`/docs/user-guide/profiles`). Fixed.
- reference/skills-catalog.md had one `<1%` literal that MDX parsed as a
JSX tag. Escaped to `<1%`.
### False positives filtered out (not changed, verified correct)
- `/set-home` is a registered alias of `/sethome` \u2014 docs were fine.
- `hermes setup gateway` is valid syntax (`hermes setup \<section\>`);
changed in qqbot.md for cross-doc consistency, not as a bug fix.
- Telegram reactions "disabled by default" matches code (default `"false"`).
- Matrix encryption "opt-in" matches code (empty env default \u2192 disabled).
- `pre_api_request` / `post_api_request` hooks do NOT exist in current code;
documented instead the real `on_session_finalize` / `on_session_reset`.
- SIGNAL_IGNORE_STORIES is already in env-vars.md (subagent missed it).
Validation:
- `docusaurus build` \u2014 passes (only pre-existing nix-setup anchor warning).
- `ascii-guard lint docs` \u2014 124 files, 0 errors.
- 22 files changed, +317 / \u2212158.
2026-04-18 01:45:48 -07:00
## `feishu_doc` toolset
Scoped to the Feishu document-comment intelligent-reply handler (`gateway/platforms/feishu_comment.py` ). Not exposed on `hermes-cli` or the regular Feishu chat adapter.
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `feishu_doc_read` | Read the full text content of a Feishu/Lark document (Docx, Doc, or Sheet) given its file_type and token. | Feishu app credentials |
## `feishu_drive` toolset
Scoped to the Feishu document-comment handler. Drives comment read/write operations on drive files.
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `feishu_drive_add_comment` | Add a top-level comment on a Feishu/Lark document or file. | Feishu app credentials |
| `feishu_drive_list_comments` | List whole-document comments on a Feishu/Lark file, most recent first. | Feishu app credentials |
| `feishu_drive_list_comment_replies` | List replies on a specific Feishu comment thread (whole-doc or local-selection). | Feishu app credentials |
| `feishu_drive_reply_comment` | Post a reply on a Feishu comment thread, with optional `@` -mention. | Feishu app credentials |
docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)
- add code-derived reference pages for slash commands, tools, toolsets,
bundled skills, and official optional skills
- document the skin system and link visual theming separately from
conversational personality
- refresh quickstart, configuration, environment variable, and messaging
docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
2026-03-13 21:34:41 -07:00
## `file` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `patch` | Targeted find-and-replace edits in files. Use this instead of sed/awk in terminal. Uses fuzzy matching (9 strategies) so minor whitespace/indentation differences won't break it. Returns a unified diff. Auto-runs syntax checks after editing… | — |
| `read_file` | Read a text file with line numbers and pagination. Use this instead of cat/head/tail in terminal. Output format: 'LINE_NUM\|CONTENT'. Suggests similar filenames if not found. Use offset and limit for large files. NOTE: Cannot read images o… | — |
| `search_files` | Search file contents or find files by name. Use this instead of grep/rg/find/ls in terminal. Ripgrep-backed, faster than shell equivalents. Content search (target='content'): Regex search inside files. Output modes: full matches with line… | — |
| `write_file` | Write content to a file, completely replacing existing content. Use this instead of echo/cat heredoc in terminal. Creates parent directories automatically. OVERWRITES the entire file — use 'patch' for targeted edits. | — |
## `homeassistant` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `ha_call_service` | Call a Home Assistant service to control a device. Use ha_list_services to discover available services and their parameters for each domain. | — |
| `ha_get_state` | Get the detailed state of a single Home Assistant entity, including all attributes (brightness, color, temperature setpoint, sensor readings, etc.). | — |
| `ha_list_entities` | List Home Assistant entities. Optionally filter by domain (light, switch, climate, sensor, binary_sensor, cover, fan, etc.) or by area name (living room, kitchen, bedroom, etc.). | — |
| `ha_list_services` | List available Home Assistant services (actions) for device control. Shows what actions can be performed on each device type and what parameters they accept. Use this to discover how to control devices found via ha_list_entities. | — |
2026-04-03 23:30:19 -07:00
:::note
feat(honcho): context injection overhaul, 5-tool surface, cost safety, session isolation (#10619)
Salvaged from PR #9884 by erosika. Cherry-picked plugin changes onto
current main with minimal core modifications.
Plugin changes (plugins/memory/honcho/):
- New honcho_reasoning tool (5th tool, splits LLM calls from honcho_context)
- Two-layer context injection: base context (summary + representation + card)
on contextCadence, dialectic supplement on dialecticCadence
- Multi-pass dialectic depth (1-3 passes) with early bail-out on strong signal
- Cold/warm prompt selection based on session state
- dialecticCadence defaults to 3 (was 1) — ~66% fewer Honcho LLM calls
- Session summary injection for conversational continuity
- Bidirectional peer targeting on all 5 tools
- Correctness fixes: peer param fallback, None guard on set_peer_card,
schema validation, signal_sufficient anchored regex, mid->medium level fix
Core changes (~20 lines across 3 files):
- agent/memory_manager.py: Enhanced sanitize_context() to strip full
<memory-context> blocks and system notes (prevents leak from saveMessages)
- run_agent.py: gateway_session_key param for stable per-chat Honcho sessions,
on_turn_start() call before prefetch_all() for cadence tracking,
sanitize_context() on user messages to strip leaked memory blocks
- gateway/run.py: skip_memory=True on 2 temp agents (prevents orphan sessions),
gateway_session_key threading to main agent
Tests: 509 passed (3 skipped — honcho SDK not installed locally)
Docs: Updated honcho.md, memory-providers.md, tools-reference.md, SKILL.md
Co-authored-by: erosika <erosika@users.noreply.github.com>
2026-04-15 19:12:19 -07:00
**Honcho tools** (`honcho_profile` , `honcho_search` , `honcho_context` , `honcho_reasoning` , `honcho_conclude` ) are no longer built-in. They are available via the Honcho memory provider plugin at `plugins/memory/honcho/` . See [Memory Providers ](../user-guide/features/memory-providers.md ) for installation and usage.
2026-04-03 23:30:19 -07:00
:::
docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)
- add code-derived reference pages for slash commands, tools, toolsets,
bundled skills, and official optional skills
- document the skin system and link visual theming separately from
conversational personality
- refresh quickstart, configuration, environment variable, and messaging
docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
2026-03-13 21:34:41 -07:00
## `image_gen` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
feat(image_gen): multi-model FAL support with picker in hermes tools (#11265)
* feat(image_gen): multi-model FAL support with picker in hermes tools
Adds 8 FAL text-to-image models selectable via `hermes tools` →
Image Generation → (FAL.ai | Nous Subscription) → model picker.
Models supported:
- fal-ai/flux-2/klein/9b (new default, <1s, $0.006/MP)
- fal-ai/flux-2-pro (previous default, kept backward-compat upscaling)
- fal-ai/z-image/turbo (Tongyi-MAI, bilingual EN/CN)
- fal-ai/nano-banana (Gemini 2.5 Flash Image)
- fal-ai/gpt-image-1.5 (with quality tier: low/medium/high)
- fal-ai/ideogram/v3 (best typography)
- fal-ai/recraft-v3 (vector, brand styles)
- fal-ai/qwen-image (LLM-based)
Architecture:
- FAL_MODELS catalog declares per-model size family, defaults, supports
whitelist, and upscale flag. Three size families handled uniformly:
image_size_preset (flux family), aspect_ratio (nano-banana), and
gpt_literal (gpt-image-1.5).
- _build_fal_payload() translates unified inputs (prompt + aspect_ratio)
into model-specific payloads, merges defaults, applies caller overrides,
wires GPT quality_setting, then filters to the supports whitelist — so
models never receive rejected keys.
- IMAGEGEN_BACKENDS registry in tools_config prepares for future imagegen
providers (Replicate, Stability, etc.); each provider entry tags itself
with imagegen_backend: 'fal' to select the right catalog.
- Upscaler (Clarity) defaults off for new models (preserves <1s value
prop), on for flux-2-pro (backward-compat). Per-model via FAL_MODELS.
Config:
image_gen.model = fal-ai/flux-2/klein/9b (new)
image_gen.quality_setting = medium (new, GPT only)
image_gen.use_gateway = bool (existing)
Agent-facing schema unchanged (prompt + aspect_ratio only) — model
choice is a user-level config decision, not an agent-level arg.
Picker uses curses_radiolist (arrow keys, auto numbered-fallback on
non-TTY). Column-aligned: Model / Speed / Strengths / Price.
Docs: image-generation.md rewritten with the model table and picker
walkthrough. tools-reference, tool-gateway, overview updated to drop
the stale "FLUX 2 Pro" wording.
Tests: 42 new in tests/tools/test_image_generation.py covering catalog
integrity, all 3 size families, supports filter, default merging, GPT
quality wiring, model resolution fallback. 8 new in
tests/hermes_cli/test_tools_config.py for picker wiring (registry,
config writes, GPT quality follow-up prompt, corrupt-config repair).
* feat(image_gen): translate managed-gateway 4xx to actionable error
When the Nous Subscription managed FAL proxy rejects a model with 4xx
(likely portal-side allowlist miss or billing gate), surface a clear
message explaining:
1. The rejected model ID + HTTP status
2. Two remediation paths: set FAL_KEY for direct access, or
pick a different model via `hermes tools`
5xx, connection errors, and direct-FAL errors pass through unchanged
(those have different root causes and reasonable native messages).
Motivation: new FAL models added to this release (flux-2-klein-9b,
z-image-turbo, nano-banana, gpt-image-1.5, ideogram-v3, recraft-v3,
qwen-image) are untested against the Nous Portal proxy. If the portal
allowlists model IDs, users on Nous Subscription will hit cryptic
4xx errors without guidance on how to work around it.
Tests: 8 new cases covering status extraction across httpx/fal error
shapes and 4xx-vs-5xx-vs-ConnectionError translation policy.
Docs: brief note in image-generation.md for Nous subscribers.
Operator action (Nous Portal side): verify that fal-queue-gateway
passes through these 7 new FAL model IDs. If the proxy has an
allowlist, add them; otherwise Nous Subscription users will see the
new translated error and fall back to direct FAL.
* feat(image_gen): pin GPT-Image quality to medium (no user choice)
Previously the tools picker asked a follow-up question for GPT-Image
quality tier (low / medium / high) and persisted the answer to
`image_gen.quality_setting`. This created two problems:
1. Nous Portal billing complexity — the 22x cost spread between tiers
($0.009 low / $0.20 high) forces the gateway to meter per-tier per
user, which the portal team can't easily support at launch.
2. User footgun — anyone picking `high` by mistake burns through
credit ~6x faster than `medium`.
This commit pins quality at medium by baking it into FAL_MODELS
defaults for gpt-image-1.5 and removes all user-facing override paths:
- Removed `_resolve_gpt_quality()` runtime lookup
- Removed `honors_quality_setting` flag on the model entry
- Removed `_configure_gpt_quality_setting()` picker helper
- Removed `_GPT_QUALITY_CHOICES` constant
- Removed the follow-up prompt call in `_configure_imagegen_model()`
- Even if a user manually edits `image_gen.quality_setting` in
config.yaml, no code path reads it — always sends medium.
Tests:
- Replaced TestGptQualitySetting (6 tests) with TestGptQualityPinnedToMedium
(5 tests) — proves medium is baked in, config is ignored, flag is
removed, helper is removed, non-gpt models never get quality.
- Replaced test_picker_with_gpt_image_also_prompts_quality with
test_picker_with_gpt_image_does_not_prompt_quality — proves only 1
picker call fires when gpt-image is selected (no quality follow-up).
Docs updated: image-generation.md replaces the quality-tier table
with a short note explaining the pinning decision.
* docs(image_gen): drop stale 'wires GPT quality tier' line from internals section
Caught in a cleanup sweep after pinning quality to medium. The
"How It Works Internally" walkthrough still described the removed
quality-wiring step.
2026-04-16 20:19:53 -07:00
| `image_generate` | Generate high-quality images from text prompts using FAL.ai. The underlying model is user-configured (default: FLUX 2 Klein 9B, sub-1s generation) and is not selectable by the agent. Returns a single image URL. Display it using… | FAL_KEY |
docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)
- add code-derived reference pages for slash commands, tools, toolsets,
bundled skills, and official optional skills
- document the skin system and link visual theming separately from
conversational personality
- refresh quickstart, configuration, environment variable, and messaging
docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
2026-03-13 21:34:41 -07:00
## `memory` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `memory` | Save important information to persistent memory that survives across sessions. Your memory appears in your system prompt at session start -- it's how you remember things about the user and your environment between conversations. WHEN TO SA… | — |
## `messaging` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `send_message` | Send a message to a connected messaging platform, or list available targets. IMPORTANT: When the user asks to send to a specific channel or person (not just a bare platform name), call send_message(action='list') FIRST to see available tar… | — |
## `moa` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `mixture_of_agents` | Route a hard problem through multiple frontier LLMs collaboratively. Makes 5 API calls (4 reference models + 1 aggregator) with maximum reasoning effort — use sparingly for genuinely difficult problems. Best for: complex math, advanced alg… | OPENROUTER_API_KEY |
## `rl` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `rl_check_status` | Get status and metrics for a training run. RATE LIMITED: enforces 30-minute minimum between checks for the same run. Returns WandB metrics: step, state, reward_mean, loss, percent_correct. | TINKER_API_KEY, WANDB_API_KEY |
| `rl_edit_config` | Update a configuration field. Use rl_get_current_config() first to see all available fields for the selected environment. Each environment has different configurable options. Infrastructure settings (tokenizer, URLs, lora_rank, learning_ra… | TINKER_API_KEY, WANDB_API_KEY |
| `rl_get_current_config` | Get the current environment configuration. Returns only fields that can be modified: group_size, max_token_length, total_steps, steps_per_eval, use_wandb, wandb_name, max_num_workers. | TINKER_API_KEY, WANDB_API_KEY |
| `rl_get_results` | Get final results and metrics for a completed training run. Returns final metrics and path to trained weights. | TINKER_API_KEY, WANDB_API_KEY |
| `rl_list_environments` | List all available RL environments. Returns environment names, paths, and descriptions. TIP: Read the file_path with file tools to understand how each environment works (verifiers, data loading, rewards). | TINKER_API_KEY, WANDB_API_KEY |
| `rl_list_runs` | List all training runs (active and completed) with their status. | TINKER_API_KEY, WANDB_API_KEY |
| `rl_select_environment` | Select an RL environment for training. Loads the environment's default configuration. After selecting, use rl_get_current_config() to see settings and rl_edit_config() to modify them. | TINKER_API_KEY, WANDB_API_KEY |
| `rl_start_training` | Start a new RL training run with the current environment and config. Most training parameters (lora_rank, learning_rate, etc.) are fixed. Use rl_edit_config() to set group_size, batch_size, wandb_project before starting. WARNING: Training… | TINKER_API_KEY, WANDB_API_KEY |
| `rl_stop_training` | Stop a running training job. Use if metrics look bad, training is stagnant, or you want to try different settings. | TINKER_API_KEY, WANDB_API_KEY |
| `rl_test_inference` | Quick inference test for any environment. Runs a few steps of inference + scoring using OpenRouter. Default: 3 steps x 16 completions = 48 rollouts per model, testing 3 models = 144 total. Tests environment loading, prompt construction, in… | TINKER_API_KEY, WANDB_API_KEY |
## `session_search` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `session_search` | Search your long-term memory of past conversations. This is your recall -- every past session is searchable, and this tool summarizes what happened. USE THIS PROACTIVELY when: - The user says 'we did this before', 'remember when', 'last ti… | — |
## `skills` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `skill_manage` | Manage skills (create, update, delete). Skills are your procedural memory — reusable approaches for recurring task types. New skills go to ~/.hermes/skills/; existing skills can be modified wherever they live. Actions: create (full SKILL.m… | — |
| `skill_view` | Skills allow for loading information about specific tasks and workflows, as well as scripts and templates. Load a skill's full content or access its linked files (references, templates, scripts). First call returns SKILL.md content plus a… | — |
| `skills_list` | List available skills (name + description). Use skill_view(name) to load full content. | — |
## `terminal` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `process` | Manage background processes started with terminal(background=true). Actions: 'list' (show all), 'poll' (check status + new output), 'log' (full output with pagination), 'wait' (block until done or timeout), 'kill' (terminate), 'write' (sen… | — |
2026-04-07 10:21:03 -07:00
| `terminal` | Execute shell commands on a Linux environment. Filesystem persists between calls. Set `background=true` for long-running servers. Set `notify_on_complete=true` (with `background=true` ) to get an automatic notification when the process finishes — no polling needed. Do NOT use cat/head/tail — use read_file. Do NOT use grep/rg/find — use search_files. | — |
docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)
- add code-derived reference pages for slash commands, tools, toolsets,
bundled skills, and official optional skills
- document the skin system and link visual theming separately from
conversational personality
- refresh quickstart, configuration, environment variable, and messaging
docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
2026-03-13 21:34:41 -07:00
## `todo` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `todo` | Manage your task list for the current session. Use for complex tasks with 3+ steps or when the user provides multiple tasks. Call with no parameters to read the current list. Writing: - Provide 'todos' array to create/update items - merge=… | — |
docs: fix documentation inconsistencies across reference and user guides
- toolsets-reference: add browser_console to browser + all platform toolsets,
add missing hermes-acp, hermes-sms, messaging toolsets, correct hermes-gateway
as composite, deduplicate platform toolset listings
- tools-reference: add missing vision and web toolset sections
- slash-commands: fix /new+/reset as alias (not separate commands), add /stop to
CLI section (available in both CLI and gateway), add /plugins command, fix Notes
section about messaging-only vs CLI-only
- environment-variables: fix HERMES_MAX_ITERATIONS default (90 not 60), add
DEEPSEEK_API_KEY/BASE_URL, OPENCODE_ZEN/GO keys, TAVILY_API_KEY,
GITHUB_TOKEN, HERMES_EPHEMERAL_SYSTEM_PROMPT
- configuration: remove duplicate Alibaba Cloud row, add OpenCode Zen/Go providers
- cli-commands: add missing providers to --provider list (opencode-zen,
opencode-go, ai-gateway, kilocode, alibaba)
- quickstart: add OpenCode Zen and OpenCode Go to provider table
Co-authored-by: Test <test@test.com>
2026-03-18 16:26:27 -07:00
## `vision` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `vision_analyze` | Analyze images using AI vision. Provides a comprehensive description and answers a specific question about the image content. | — |
## `web` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
docs: comprehensive documentation audit — fix 9 HIGH, 20+ MEDIUM gaps (#4087)
Reference docs fixes:
- cli-commands.md: remove non-existent --provider alibaba, add hermes
profile/completion/plugins/mcp to top-level table, add --profile/-p
global flag, add --source chat option
- slash-commands.md: add /yolo and /commands, fix /q alias conflict
(resolves to /queue not /quit), add missing aliases (/bg, /set-home,
/reload_mcp, /gateway)
- toolsets-reference.md: fix hermes-api-server (not same as hermes-cli,
omits clarify/send_message/text_to_speech)
- profile-commands.md: fix show name required not optional, --clone-from
not --from, add --remove/--name to alias, fix alias path, fix export/
import arg types, remove non-existent fish completion
- tools-reference.md: add EXA_API_KEY to web tools requires_env
- mcp-config-reference.md: add auth key for OAuth, tool name sanitization
- environment-variables.md: add EXA_API_KEY, update provider values
- plugins.md: remove non-existent ctx.register_command(), add
ctx.inject_message()
Feature docs additions:
- security.md: add /yolo mode, approval modes (manual/smart/off),
configurable timeout, expanded dangerous patterns table
- cron.md: add wrap_response config, [SILENT] suppression
- mcp.md: add dynamic tool discovery, MCP sampling support
- cli.md: add Ctrl+Z suspend, busy_input_mode, tool_preview_length
- docker.md: add skills/credential file mounting
Messaging platform docs:
- telegram.md: add webhook mode, DoH fallback IPs
- slack.md: add multi-workspace OAuth support
- discord.md: add DISCORD_IGNORE_NO_MENTION
- matrix.md: add MSC3245 native voice messages
- feishu.md: expand from 129 to 365 lines (encrypt key, verification
token, group policy, card actions, media, rate limiting, markdown,
troubleshooting)
- wecom.md: expand from 86 to 264 lines (per-group allowlists, media,
AES decryption, stream replies, reconnection, troubleshooting)
Configuration docs:
- quickstart.md: add DeepSeek, Copilot, Copilot ACP providers
- configuration.md: add DeepSeek provider, Exa web backend, terminal
env_passthrough/images, browser.command_timeout, compression params,
discord config, security/tirith config, timezone, auxiliary models
21 files changed, ~1000 lines added
2026-03-30 17:15:21 -07:00
| `web_search` | Search the web for information on any topic. Returns up to 5 relevant results with titles, URLs, and descriptions. | EXA_API_KEY or PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY |
| `web_extract` | Extract content from web page URLs. Returns page content in markdown format. Also works with PDF URLs — pass the PDF link directly and it converts to markdown text. Pages under 5000 chars return full markdown; larger pages are LLM-summarized. | EXA_API_KEY or PARALLEL_API_KEY or FIRECRAWL_API_KEY or TAVILY_API_KEY |
docs: fix documentation inconsistencies across reference and user guides
- toolsets-reference: add browser_console to browser + all platform toolsets,
add missing hermes-acp, hermes-sms, messaging toolsets, correct hermes-gateway
as composite, deduplicate platform toolset listings
- tools-reference: add missing vision and web toolset sections
- slash-commands: fix /new+/reset as alias (not separate commands), add /stop to
CLI section (available in both CLI and gateway), add /plugins command, fix Notes
section about messaging-only vs CLI-only
- environment-variables: fix HERMES_MAX_ITERATIONS default (90 not 60), add
DEEPSEEK_API_KEY/BASE_URL, OPENCODE_ZEN/GO keys, TAVILY_API_KEY,
GITHUB_TOKEN, HERMES_EPHEMERAL_SYSTEM_PROMPT
- configuration: remove duplicate Alibaba Cloud row, add OpenCode Zen/Go providers
- cli-commands: add missing providers to --provider list (opencode-zen,
opencode-go, ai-gateway, kilocode, alibaba)
- quickstart: add OpenCode Zen and OpenCode Go to provider table
Co-authored-by: Test <test@test.com>
2026-03-18 16:26:27 -07:00
docs: expand Docusaurus coverage across CLI, tools, skills, and skins (#1232)
- add code-derived reference pages for slash commands, tools, toolsets,
bundled skills, and official optional skills
- document the skin system and link visual theming separately from
conversational personality
- refresh quickstart, configuration, environment variable, and messaging
docs to match current provider, gateway, and browser behavior
- fix stale command, session, and Home Assistant configuration guidance
2026-03-13 21:34:41 -07:00
## `tts` toolset
| Tool | Description | Requires environment |
|------|-------------|----------------------|
| `text_to_speech` | Convert text to speech audio. Returns a MEDIA: path that the platform delivers as a voice message. On Telegram it plays as a voice bubble, on Discord/WhatsApp as an audio attachment. In CLI mode, saves to ~/voice-memos/. Voice and provider… | — |