mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-01 16:31:56 +08:00
docs: two-week gap sweep — platforms, CLI, config, TUI, hooks, providers (#17727)
Covers ~60 merged PRs from Apr 15–29 that shipped user-visible behavior without docs coverage. No functional code changes; docs + static manifest regeneration only. Highlights:

Stale / incorrect:
- configuration.md: auxiliary auto-routing line was wrong since #11900; now correctly states auto routes to the main model, with a note on the cost trade-off and per-task override pattern.
- integrations/providers.md + configuration.md compression intro: removed stale 'Gemini Flash via OpenRouter' claim.
- website/static/api/model-catalog.json: rebuilt from hermes_cli/models.py so the live manifest picks up tencent/hy3-preview (and remains in sync for future model-catalog PRs).

Platform messaging (#17417 #16997 #16193 #14315 #13151 #11794 #10610 #10283 #10246 #11564 #13178):
- Signal: native formatting (bodyRanges), reply quotes, reactions.
- Telegram: table rendering (bullets + code-block fallback), disable_link_previews, group_allowed_chats.
- Slack: strict_mention config.
- Discord: slash_commands disable, send_animation GIF, send_message native media attachments.
- DingTalk: require_mention + allowed_users.

CLI (#16052 #16539 #16566 #15841 #14798 #10043):
- New 'hermes fallback' interactive manager.
- New 'hermes update --check', '--backup' flag, and pre-update pairing snapshot behavior.
- 'hermes gateway start/restart --all' multi-profile flag.
- cron.md: 'hermes tools' as a platform, per-job enabled_toolsets, wakeAgent gate, context_from chaining.

Config keys / env vars (#17305 #17026 #17000 #15077 #14557 #14227 #14166 #14730 #17008):
- terminal.docker_run_as_host_user, display.runtime_metadata_footer, compression.hygiene_hard_message_limit, HINDSIGHT_TIMEOUT, skills.guard_agent_created, TAVILY_BASE_URL, security.allow_private_urls, agent.api_max_retries, gateway hot-reload of compression/context_length config edits.

TUI / CLI UX (#17130 #17113 #17175 #17150 #16707 #12312 #12305 #12934 #14810 #14045 #17286 #17126):
- HERMES_TUI_RESUME, HERMES_TUI_THEME, LaTeX rendering, busy-indicator styles, ctrl-x queued-message delete, git branch in status bar, per-prompt elapsed stopwatch, external-editor keybind, markdown stripping, TUI voice-mode parity, /agents overlay, /reload + /mouse.

Gateway features (#16506 #15027 #13428 #12116):
- Native multimodal image routing based on vision capability.
- /usage account-limits section.
- /steer slash command (added to reference + explanation in CLI).

Plugins / hooks (#12929 #12972 #10763 #16364):
- transform_tool_result, transform_terminal_output plugin hooks.
- PluginContext.dispatch_tool() documented with slash-command example.
- google_meet bundled plugin entry under built-in-plugins.md.

Other (#16576 #16572 #16383 #15878 #15608 #15606 #14809 #14767 #14231 #14232 #14307 #13683 #12373 #11891 #11291 #10066):
- hermes backup exclusions (WAL/SHM/journal + checkpoints/).
- security.md hardline blocklist (floor below --yolo).
- FHS install layout for root installs.
- openssh-client + docker-cli baked into the Docker image.
- MEDIA: tag supported extensions table (docs/office/archives/pdf).
- Remote-to-host file sync on SSH/Modal/Daytona teardown.
- 'hermes model' -> Configure Auxiliary Models interactive picker.
- Podman support via HERMES_DOCKER_BINARY.

Providers / STT / one-shot (#15045 #14473 #15704):
- alibaba-coding-plan first-class provider entry.
- xAI Grok STT as a 6th transcription option.
- 'hermes -z' scripted one-shot mode + HERMES_INFERENCE_MODEL.

Build: 'docusaurus build' succeeds. No new broken links/anchors; pre-existing warnings unchanged.
@@ -162,6 +162,36 @@ Hermes-prefixed and standard SDK env vars (`LANGFUSE_PUBLIC_KEY`, `LANGFUSE_SECR

**Disabling:** `hermes plugins disable observability/langfuse`. The plugin module is still discovered, but no module code runs until you re-enable.

### google_meet

Lets the agent **join, transcribe, and participate in Google Meet calls** — take notes on a meeting, summarize the back-and-forth after, follow up on specific points, and (optionally) speak replies back into the call via TTS.

**What it adds:**

- A headless virtual participant that joins a Meet URL using browser automation
- Live transcription of the meeting audio via the configured STT provider
- A `meet_summarize` / `meet_speak` / `meet_followup` toolset the agent invokes to act on what it heard
- Post-meeting artifacts (transcript, speaker-attributed notes, action items) saved under `~/.hermes/cache/google_meet/<meeting_id>/`

**Setup:**

```bash
hermes plugins enable google_meet
# Prompts you to sign in via the plugin's OAuth flow on first use —
# needs a Google account with Meet access. Host approval may be required
# if the meeting enforces "only invited participants can join".
```

Usage from chat:

> "Join meet.google.com/abc-defg-hij and take notes. After the call, send me a summary with action items."

The agent kicks off the meeting join, streams the transcription back into its context as the call proceeds, and produces a structured summary when the meeting ends (or when you tell it to stop).

**When to use it:** recurring standups where you want a bot to transcribe + summarize for async attendees; deposition-style interviews where you want structured notes; any case where you'd otherwise need Fireflies / Otter / Grain. When you'd rather not have an AI listening in — don't enable it.

**Disabling:** `hermes plugins disable google_meet`. Any cached transcripts and recordings stay in `~/.hermes/cache/google_meet/` until you remove them.

## Adding a bundled plugin

Bundled plugins are written exactly like any other Hermes plugin — see [Build a Hermes Plugin](/docs/guides/build-a-hermes-plugin). The only differences are:
@@ -366,6 +366,64 @@ cronjob(action="remove", job_id="...")

For `update`, pass `skills=[]` to remove all attached skills.

## Toolsets available to cron jobs

Cron runs each job in a fresh agent session with no chat platform attached. By default the cron agent gets **the toolset you configured for the `cron` platform in `hermes tools`** — not the CLI default, not everything under the sun.

```bash
hermes tools
# → pick the "cron" platform in the curses UI
# → toggle toolsets on/off just like you would for Telegram/Discord/etc.
```
Tighter per-job control is available via the `enabled_toolsets` field on `cronjob.create` (or on an existing job via `cronjob.update`):

```text
cronjob(action="create", name="weekly-news-summary",
        schedule="every sunday 9am",
        enabled_toolsets=["web", "file"],  # just web + file, no terminal/browser/etc.
        prompt="Summarize this week's AI news: ...")
```

When `enabled_toolsets` is set on a job it wins; otherwise the `hermes tools` cron-platform config wins; otherwise Hermes falls back to the built-in defaults. This matters for cost control: carrying `moa`, `browser`, `delegation` into every tiny "fetch news" job bloats the tool-schema prompt on every LLM call.
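The three-way precedence can be sketched as a small resolver. This is an illustrative helper, not Hermes source; `None` stands for "not configured" at that level:

```python
# Hypothetical sketch of the documented precedence -- not Hermes source.
def resolve_toolsets(job_toolsets, cron_platform_toolsets, builtin_defaults):
    """Per-job enabled_toolsets wins, then the cron-platform config, then defaults."""
    if job_toolsets is not None:            # enabled_toolsets set on the job
        return job_toolsets
    if cron_platform_toolsets is not None:  # configured via `hermes tools`
        return cron_platform_toolsets
    return builtin_defaults

print(resolve_toolsets(["web", "file"], ["web", "terminal"], ["web"]))  # → ['web', 'file']
print(resolve_toolsets(None, ["web", "terminal"], ["web"]))             # → ['web', 'terminal']
```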
### Skipping the agent entirely: `wakeAgent`

If your cron job attaches a pre-check script (via `script=`), the script can decide at runtime whether Hermes should even invoke the agent. Emit a final stdout line of the form:

```text
{"wakeAgent": false}
```

…and cron skips the agent run entirely for this tick. Useful for frequent polls (every 1–5 min) that only need to wake the LLM when state actually changed — otherwise you pay for zero-content agent turns over and over.

```python
# pre-check script — fetch_latest_issue_count, read_state, and write_state
# stand in for your own polling/state helpers
import json, sys

latest = fetch_latest_issue_count()
prev = read_state("issue_count")
if latest == prev:
    print(json.dumps({"wakeAgent": False}))  # skip this tick
    sys.exit(0)
write_state("issue_count", latest)
print(json.dumps({"wakeAgent": True, "context": {"new_issues": latest - prev}}))
```

When `wakeAgent` is omitted, the default is `true` (wake the agent as usual).
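On the scheduler side, the gate amounts to parsing the script's final non-empty stdout line and defaulting to waking the agent. A minimal sketch of that logic, illustrative only and not the Hermes implementation:

```python
import json

# Illustrative sketch of the wakeAgent gate -- not the Hermes scheduler itself.
def should_wake_agent(script_stdout: str) -> bool:
    """Parse the pre-check script's final stdout line; default is to wake."""
    lines = [ln for ln in script_stdout.splitlines() if ln.strip()]
    if not lines:
        return True
    try:
        payload = json.loads(lines[-1])
    except json.JSONDecodeError:
        return True  # final line isn't JSON: wake the agent as usual
    if not isinstance(payload, dict):
        return True
    return bool(payload.get("wakeAgent", True))

print(should_wake_agent('checking...\n{"wakeAgent": false}'))  # → False
print(should_wake_agent("no json here"))                       # → True
```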
### Chaining jobs: `context_from`

A cron job can consume the most recent successful output of one or more other jobs by listing their names (or IDs) in `context_from`:

```text
cronjob(action="create", name="daily-digest",
        schedule="every day 7am",
        context_from=["ai-news-fetch", "github-prs-fetch"],
        prompt="Write the daily digest using the outputs above.")
```

The referenced jobs' most recent completed outputs are injected above the prompt as context for this run. Each upstream entry must be a valid job ID or name (see `cronjob action="list"`). Note: chaining reads the *most recent completed* output — it does not wait for upstream jobs that are running in the same tick.
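Given the layout from the "Job storage" section (`~/.hermes/cron/output/{job_id}/{timestamp}.md`), the injection step can be pictured roughly like this. The function name and exact formatting are assumptions for illustration, not Hermes source:

```python
from pathlib import Path

# Illustrative sketch of context_from injection -- not Hermes source.
def build_prompt_with_context(cron_dir: Path, context_from: list[str], prompt: str) -> str:
    parts = []
    for job_id in context_from:
        # Timestamped filenames sort chronologically; take the newest.
        runs = sorted((cron_dir / "output" / job_id).glob("*.md"))
        if runs:  # most recent completed output only; never waits on a running job
            parts.append(f"## Output of {job_id}\n{runs[-1].read_text()}")
    parts.append(prompt)
    return "\n\n".join(parts)
```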
## Job storage

Jobs are stored in `~/.hermes/cron/jobs.json`. Output from job runs is saved to `~/.hermes/cron/output/{job_id}/{timestamp}.md`.
@@ -173,6 +173,32 @@ delegate_task(
)
```

## Child Timeout

Subagents are killed as stuck if they go quiet for more than `delegation.child_timeout_seconds` wall-clock seconds. The default is **600** (10 minutes) — bumped up from 300s in earlier releases because high-reasoning models on non-trivial research tasks were getting killed mid-think. Tune it per-install:

```yaml
delegation:
  child_timeout_seconds: 600  # default
```

Lower it for fast local models; raise it for slow reasoning models on hard problems. The timer resets every time the child makes an API call or tool call — only genuinely idle workers trigger the kill.
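The reset-on-activity behavior is essentially an idle watchdog: the deadline moves forward on every API or tool call. A minimal sketch, illustrative only and not Hermes source:

```python
import time

# Illustrative idle-watchdog sketch of the documented behavior -- not Hermes source.
class IdleWatchdog:
    def __init__(self, timeout_seconds: float = 600.0):
        self.timeout = timeout_seconds
        self.last_activity = time.monotonic()

    def record_activity(self) -> None:
        """Called on every API call or tool call the child makes."""
        self.last_activity = time.monotonic()

    def is_stuck(self) -> bool:
        """True only when the child has been genuinely idle past the timeout."""
        return time.monotonic() - self.last_activity > self.timeout
```

The parent would poll `is_stuck()` and kill the child only when it returns true; any activity pushes the deadline forward, so a slow but working subagent is never killed mid-stream.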
:::tip Diagnostic dump on zero-call timeout
If a subagent times out having made **zero** API calls (usually: provider unreachable, auth failure, or tool-schema rejection), `delegate_task` writes a structured diagnostic to `~/.hermes/logs/subagent-timeout-<session>-<timestamp>.log` containing the subagent's config snapshot, credential-resolution trace, and any early error messages. Much easier to root-cause than the previous silent-timeout behavior.
:::

## Monitoring Running Subagents (`/agents`)

The TUI ships a `/agents` overlay (alias `/tasks`) that turns recursive `delegate_task` fan-out into a first-class audit surface:

- Live tree view of running and recently-finished subagents, grouped by parent
- Per-branch cost, token, and file-touched rollups
- Kill and pause controls — cancel a specific subagent mid-flight without interrupting its siblings
- Post-hoc review: step through each subagent's turn-by-turn history even after they've returned to the parent

The classic CLI just prints `/agents` as a text summary; the TUI is where the overlay shines. See [TUI — Slash commands](/docs/user-guide/tui#slash-commands).

## Depth Limit and Nested Orchestration

By default, delegation is **flat**: a parent (depth 0) spawns children (depth 1), and those children cannot delegate further. This prevents runaway recursive delegation.
@@ -21,7 +21,15 @@ When your main LLM provider encounters errors — rate limits, server overload,

### Configuration

The easiest path is the interactive manager:

```bash
hermes fallback
```

`hermes fallback` reuses the provider picker from `hermes model` — same provider list, same credential prompts, same validation. Press `a` to add a fallback, `↑`/`↓` to reorder, `d` to remove, `q` to save and exit. Changes persist under `model.fallback_providers` in `config.yaml`.

If you'd rather edit the YAML directly, add a `fallback_model` section to `~/.hermes/config.yaml`:

```yaml
fallback_model:
@@ -31,6 +39,10 @@ fallback_model:

Both `provider` and `model` are **required**. If either is missing, the fallback is disabled.

:::note `fallback_model` vs `fallback_providers`
`fallback_model` (singular) is the legacy single-fallback key — Hermes still honors it for back-compat. `fallback_providers` (plural, list) supports multiple fallbacks tried in order; `hermes fallback` writes to this key. When both are set, Hermes merges them with `fallback_providers` taking priority.
:::
### Supported Providers

| Provider | Value | Requirements |
@@ -385,6 +385,8 @@ def register(ctx):
| [`pre_gateway_dispatch`](#pre_gateway_dispatch) | Gateway received a user message, before auth + dispatch | `{"action": "skip" \| "rewrite" \| "allow", ...}` to influence flow |
| [`pre_approval_request`](#pre_approval_request) | Dangerous command needs user approval, before the prompt/notification is sent | ignored |
| [`post_approval_response`](#post_approval_response) | User responded to an approval prompt (or it timed out) | ignored |
| [`transform_tool_result`](#transform_tool_result) | After any tool returns, before the result is handed back to the model | `str` to replace the result, `None` to leave unchanged |
| [`transform_terminal_output`](#transform_terminal_output) | Inside the `terminal` tool, before truncation/ANSI-strip/redact | `str` to replace the raw output, `None` to leave unchanged |

---
@@ -1003,6 +1005,94 @@ def register(ctx):

---

### `transform_tool_result`

Fires **after** a tool returns and **before** the result is appended to the conversation. Lets a plugin rewrite ANY tool's result string — not just terminal output — before the model sees it.

**Callback signature:**

```python
def my_callback(
    tool_name: str,
    arguments: dict,
    result: str,
    task_id: str | None,
    **kwargs,
) -> str | None:
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `tool_name` | `str` | Tool that produced the result (`read_file`, `web_extract`, `delegate_task`, …). |
| `arguments` | `dict` | Arguments the model called the tool with. |
| `result` | `str` | The tool's raw result string, post-truncation and post-ANSI-strip. |
| `task_id` | `str \| None` | Task/session ID when running inside RL/benchmark environments. |

**Return value:** `str` to replace the result (the returned string is what the model sees), `None` to leave it unchanged.

**Use cases:** Redact organization-specific PII from `web_extract` output, wrap long JSON tool responses in a summary header, inject retrieval-augmented hints into `read_file` results, rewrite `delegate_task` subagent reports into a project-specific schema.
```python
import re

SECRET = re.compile(r"sk-[A-Za-z0-9]{32,}")

def redact_secrets(tool_name, result, **kwargs):
    if SECRET.search(result):
        return SECRET.sub("[REDACTED]", result)
    return None

def register(ctx):
    ctx.register_hook("transform_tool_result", redact_secrets)
```

Applies to every tool. For terminal-only rewriting see `transform_terminal_output` below — it's narrower and runs earlier in the pipeline (pre-truncation, pre-redaction).
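You can exercise a hook function like the redaction example on its own, outside the plugin system, to confirm the rewrite before wiring it up. The key below is fake and the call is direct rather than through Hermes:

```python
import re

# Same shape as the example hook above; called directly here for a quick check.
SECRET = re.compile(r"sk-[A-Za-z0-9]{32,}")

def redact_secrets(tool_name, result, **kwargs):
    if SECRET.search(result):
        return SECRET.sub("[REDACTED]", result)
    return None  # None means "leave the tool result unchanged"

sample = "token: sk-" + "a" * 32   # fake key, 32 chars after "sk-"
print(redact_secrets("web_extract", sample))   # → token: [REDACTED]
print(redact_secrets("web_extract", "clean"))  # → None
```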
---
### `transform_terminal_output`

Fires inside the `terminal` tool's foreground-output pipeline, **before** the default 50 KB truncation, ANSI strip, and secret redaction. Lets plugins rewrite the raw stdout/stderr of a shell command before any downstream processing touches it.

**Callback signature:**

```python
def my_callback(
    command: str,
    output: str,
    exit_code: int,
    cwd: str,
    task_id: str | None,
    **kwargs,
) -> str | None:
```

| Parameter | Type | Description |
|-----------|------|-------------|
| `command` | `str` | The shell command that produced the output. |
| `output` | `str` | Raw combined stdout/stderr (may be very large — truncation happens after the hook). |
| `exit_code` | `int` | Process exit code. |
| `cwd` | `str` | Working directory the command ran in. |
| `task_id` | `str \| None` | Task/session ID when running inside RL/benchmark environments. |

**Return value:** `str` to replace the output, `None` to leave it unchanged.

**Use cases:** Inject summaries for commands that produce massive output (`du -ah`, `find`, `tree`), tag output with a project-specific marker so downstream hooks know how to handle it, strip timing noise that flaps between runs and defeats prompt caching.
```python
def summarize_find(command, output, **kwargs):
    if command.startswith("find ") and len(output) > 50_000:
        lines = output.count("\n")
        head = "\n".join(output.splitlines()[:40])
        return f"{head}\n\n[summary: {lines} paths total, showing first 40]"
    return None

def register(ctx):
    ctx.register_hook("transform_terminal_output", summarize_find)
```
Pairs well with `transform_tool_result` (which covers every other tool).

---
## Shell Hooks

Declare shell-script hooks in your `cli-config.yaml` and Hermes will run them as subprocesses whenever the corresponding plugin-hook event fires — in both CLI and gateway sessions. No Python plugin authoring required.
@@ -135,13 +135,15 @@ Local transcription works out of the box when `faster-whisper` is installed. If

```yaml
# In ~/.hermes/config.yaml
stt:
  provider: "local"  # "local" | "groq" | "openai" | "mistral" | "xai"
  local:
    model: "base"  # tiny, base, small, medium, large-v3
  openai:
    model: "whisper-1"  # whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe
  mistral:
    model: "voxtral-mini-latest"  # voxtral-mini-latest, voxtral-mini-2602
  xai:
    model: "grok-stt"  # xAI Grok STT
```

### Provider Details
@@ -162,6 +164,8 @@ stt:

**Mistral API (Voxtral Transcribe)** — Requires `MISTRAL_API_KEY`. Uses Mistral's [Voxtral Transcribe](https://docs.mistral.ai/capabilities/audio/speech_to_text/) models. Supports 13 languages, speaker diarization, and word-level timestamps. Install with `pip install hermes-agent[mistral]`.

**xAI Grok STT** — Requires `XAI_API_KEY`. Posts to `https://api.x.ai/v1/stt` as multipart/form-data. Good choice if you're already using xAI for chat or TTS and want one API key for everything. Auto-detection order puts it after Groq — explicitly set `stt.provider: xai` to force it.

**Custom local CLI fallback** — Set `HERMES_LOCAL_STT_COMMAND` if you want Hermes to call a local transcription command directly. The command template supports `{input_path}`, `{output_dir}`, `{language}`, and `{model}` placeholders.

### Fallback Behavior
||||
@@ -189,3 +189,16 @@ Image paste works with any vision-capable model. The image is sent as a base64-e
|
||||
```
|
||||
|
||||
Most modern models support this format, including GPT-4 Vision, Claude (with vision), Gemini, and open-source multimodal models served through OpenRouter.
|
||||
|
||||
## Image Routing (Vision-Capable vs Text-Only Models)
|
||||
|
||||
When a user attaches an image — from the CLI clipboard, the gateway (Telegram/Discord photo), or any other entry point — Hermes routes it based on whether your current model actually supports vision:
|
||||
|
||||
| Your model | What happens to the image |
|
||||
|---|---|
|
||||
| **Vision-capable** (GPT-4V, Claude with vision, Gemini, Qwen-VL, MiMo-VL, etc.) | Sent as **real pixels** using the provider's native image content format above. No text summary layer. |
|
||||
| **Text-only** (DeepSeek V3, smaller open-source models, older chat-only endpoints) | Routed through the `vision_analyze` auxiliary tool — an auxiliary vision model describes the image, and the text description is injected into the conversation. |
|
||||
|
||||
You don't configure this — Hermes looks up your current model's capability in the provider metadata and picks the right path automatically. The practical effect: you can switch between vision and non-vision models mid-session and image handling "just works" without changing your workflow. Text-only models get coherent context about the image rather than a broken multimodal payload they'd have to reject.
|
||||
|
||||
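The decision reduces to a capability lookup. A sketch of the routing choice, where the metadata key `vision` is an assumption for illustration rather than the actual provider-metadata schema:

```python
# Hypothetical sketch of the routing decision -- metadata shape is assumed.
def route_image(model_metadata: dict) -> str:
    """Return which path an attached image takes for the current model."""
    if model_metadata.get("vision", False):
        return "native"          # real pixels in the provider's image format
    return "vision_analyze"      # auxiliary model describes the image as text

print(route_image({"name": "qwen-vl", "vision": True}))       # → native
print(route_image({"name": "deepseek-v3", "vision": False}))  # → vision_analyze
```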
Which auxiliary model handles the text-description path is configurable under `auxiliary.vision` — see [Auxiliary Models](/docs/user-guide/configuration#auxiliary-models).
@@ -105,6 +105,8 @@ If `faster-whisper` is installed, voice mode works with **zero API keys** for ST

## CLI Voice Mode

Voice mode is available in both the **classic CLI** (`hermes chat`) and the **TUI** (`hermes --tui`). Behavior is identical across both — same slash commands, same VAD silence detection, same streaming TTS, same hallucination filter. The TUI additionally forwards crash-forensic logs to `~/.hermes/logs/` so push-to-talk failures on exotic audio backends can be reported with a full stack trace rather than disappearing silently.

### Quick Start

Start the CLI and enable voice mode: