Compare commits

...

8 Commits

Author SHA1 Message Date
Teknium
e102222828 fix: skip KawaiiSpinner when TUI handles tool progress
In the interactive CLI, the agent runs with quiet_mode=True and
tool_progress_callback set. The quiet_mode condition triggered
KawaiiSpinner for every tool call, but the TUI was already handling
progress display via the spinner widget.

The KawaiiSpinner writes carriage-return animation through StdoutProxy,
triggering run_in_terminal() erase/redraw cycles on every flush. These
redundant cycles cause the status bar to ghost into terminal scrollback.

The thinking spinner already had this guard (checks thinking_callback).
This extends the same pattern to the three tool spinner creation sites:
concurrent tools, delegate_task, and single tool execution.
2026-03-25 08:19:42 -07:00
Teknium
0f3c191ef1 fix(cli): enhance real-time reasoning output by forcing flush of long partial lines
Updated the reasoning output mechanism to emit complete lines and force-flush long partial lines, ensuring reasoning is visible in real-time even without newlines. This improves user experience during reasoning sessions.
2026-03-24 19:56:30 -07:00
Teknium
7cdf4efe05 fix(skills): agent-created skills were incorrectly treated as untrusted community content
_resolve_trust_level() didn't handle 'agent-created' source, so it
fell through to 'community' trust level. Community policy blocks on
any caution or dangerous findings, which meant common patterns like
curl with env vars, systemctl, crontab, cloudflared references etc.
would block skill creation/patching.

The agent-created policy row already existed in INSTALL_POLICY with
permissive settings (allow caution, ask on dangerous) but was never
reached. Now it is.

Fixes reports of skill_manage being blocked by security scanner.
2026-03-24 19:56:30 -07:00
Teknium
adee8d1b5f fix: browser_vision ignores auxiliary.vision.timeout config (#2901)
* docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event

The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.

Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
  signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
  on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)

* fix: browser_vision ignores auxiliary.vision.timeout config

browser_vision called call_llm() without passing a timeout parameter,
so it always used the 30-second default in auxiliary_client.py. This
made vision analysis with local models (llama.cpp, ollama) impossible
since they typically need more than 30s for screenshot analysis.

Now browser_vision reads auxiliary.vision.timeout from config.yaml
(same config key that vision_analyze already uses) and passes it
through to call_llm().

Also bumped the default vision timeout from 30s to 120s in both
browser_vision and vision_analyze — 30s is too aggressive for local
models and the previous default silently failed for anyone running
vision locally.

Fixes user report from GamerGB1988.
2026-03-24 19:56:30 -07:00
Teknium
f5b84dddfd fix(compression): restore sane defaults and cap summary at 12K tokens
- threshold: 0.80 → 0.50 (compress at 50%, not 80%)
- target_ratio: 0.40 → 0.20, now relative to threshold not total context
  (20% of 50% = 10% of context as tail budget)
- summary ceiling: 32K → 12K (Gemini can't output more than ~12K)
- Updated DEFAULT_CONFIG, config display, example config, and tests
2026-03-24 19:56:30 -07:00
Teknium
4549a2f51a docs: clarify two-mode behavior in session_search schema description 2026-03-24 19:56:30 -07:00
Teknium
466720c2f3 feat(session_search): add recent sessions mode when query is omitted
When session_search is called without a query (or with an empty query),
it now returns metadata for the most recent sessions instead of erroring.
This lets the agent quickly see what was worked on recently without
needing specific keywords.

Returns for each session: session_id, title, source, started_at,
last_active, message_count, preview (first user message).
Zero LLM cost — pure DB query. Current session lineage and child
delegation sessions are excluded.

The agent can then keyword-search specific sessions if it needs
deeper context from any of them.
2026-03-24 19:56:30 -07:00
Teknium
fccd7a2ab4 docs: unify hooks documentation — add plugin hooks to hooks page, add session:end event
The hooks page only documented gateway event hooks (HOOK.yaml system).
The plugins page listed plugin hooks (pre_tool_call, etc.) that weren't
referenced from the hooks page, which was confusing.

Changes:
- hooks.md: Add overview table showing both hook systems
- hooks.md: Add Plugin Hooks section with available hooks, callback
  signatures, and example
- hooks.md: Add missing session:end gateway event (emitted but undocumented)
- hooks.md: Mark pre_llm_call, post_llm_call, on_session_start,
  on_session_end as planned (defined in VALID_HOOKS but not yet invoked)
- hooks.md: Update info box to cross-reference plugin hooks
- hooks.md: Fix heading hierarchy (gateway content as subsections)
- plugins.md: Add cross-reference to hooks page for full details
- plugins.md: Mark planned hooks as (planned)
2026-03-24 18:34:14 -07:00
12 changed files with 217 additions and 65 deletions

View File

@@ -40,7 +40,7 @@ _MIN_SUMMARY_TOKENS = 2000
# Proportion of compressed content to allocate for summary
_SUMMARY_RATIO = 0.20
# Absolute ceiling for summary tokens (even on very large context windows)
_SUMMARY_TOKENS_CEILING = 32_000
_SUMMARY_TOKENS_CEILING = 12_000
# Placeholder used when pruning old tool results
_PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
@@ -63,10 +63,10 @@ class ContextCompressor:
def __init__(
self,
model: str,
threshold_percent: float = 0.80,
threshold_percent: float = 0.50,
protect_first_n: int = 3,
protect_last_n: int = 20,
summary_target_ratio: float = 0.40,
summary_target_ratio: float = 0.20,
quiet_mode: bool = False,
summary_model_override: str = None,
base_url: str = "",
@@ -92,8 +92,8 @@ class ContextCompressor:
self.threshold_tokens = int(self.context_length * threshold_percent)
self.compression_count = 0
# Derive token budgets from the target ratio and context length
target_tokens = int(self.context_length * self.summary_target_ratio)
# Derive token budgets: ratio is relative to the threshold, not total context
target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
self.tail_token_budget = target_tokens
self.max_summary_tokens = min(
int(self.context_length * 0.05), _SUMMARY_TOKENS_CEILING,

View File

@@ -236,23 +236,24 @@ browser:
# 5. Summarizes middle turns using a fast/cheap model
# 6. Inserts summary as a user message, continues conversation seamlessly
#
# Post-compression size scales with the model's context window via target_ratio:
# MiniMax 200K context → ~80K post-compression (at 0.40 ratio)
# GPT-5 1M context → ~400K post-compression (at 0.40 ratio)
# Post-compression tail budget is target_ratio × threshold × context_length:
# 200K context, threshold 0.50, ratio 0.20 → 20K tokens of recent tail preserved
# 1M context, threshold 0.50, ratio 0.20 → 100K tokens of recent tail preserved
#
compression:
# Enable automatic context compression (default: true)
# Set to false if you prefer to manage context manually or want errors on overflow
enabled: true
# Trigger compression at this % of model's context limit (default: 0.80 = 80%)
# Trigger compression at this % of model's context limit (default: 0.50 = 50%)
# Lower values = more aggressive compression, higher values = compress later
threshold: 0.80
threshold: 0.50
# Target post-compression size as a fraction of context window (default: 0.40 = 40%)
# Controls how much context survives compression. Tail token budget and summary
# cap scale with this value. Range: 0.10 - 0.80
target_ratio: 0.40
# Fraction of the threshold to preserve as recent tail (default: 0.20 = 20%)
# e.g. 20% of 50% threshold = 10% of total context kept as recent messages.
# Summary output is separately capped at 12K tokens (Gemini output limit).
# Range: 0.10 - 0.80
target_ratio: 0.20
# Number of most-recent messages to always preserve (default: 20 ≈ 10 full turns)
# Higher values keep more recent conversation intact at the cost of more aggressive

6
cli.py
View File

@@ -1509,10 +1509,14 @@ class HermesCLI:
self._reasoning_buf = getattr(self, "_reasoning_buf", "") + text
# Emit complete lines
# Emit complete lines, and force-flush long partial lines so
# reasoning is visible in real-time even without newlines.
while "\n" in self._reasoning_buf:
line, self._reasoning_buf = self._reasoning_buf.split("\n", 1)
_cprint(f"{_DIM}{line}{_RST}")
if len(self._reasoning_buf) > 80:
_cprint(f"{_DIM}{self._reasoning_buf}{_RST}")
self._reasoning_buf = ""
def _close_reasoning_box(self) -> None:
"""Close the live reasoning box if it's open."""

View File

@@ -163,8 +163,8 @@ DEFAULT_CONFIG = {
"compression": {
"enabled": True,
"threshold": 0.80, # compress when context usage exceeds this ratio
"target_ratio": 0.40, # fraction of context to preserve as recent tail
"threshold": 0.50, # compress when context usage exceeds this ratio
"target_ratio": 0.20, # fraction of threshold to preserve as recent tail
"protect_last_n": 20, # minimum recent messages to keep uncompressed
"summary_model": "", # empty = use main configured model
"summary_provider": "auto",
@@ -1686,8 +1686,8 @@ def show_config():
enabled = compression.get('enabled', True)
print(f" Enabled: {'yes' if enabled else 'no'}")
if enabled:
print(f" Threshold: {compression.get('threshold', 0.80) * 100:.0f}%")
print(f" Target ratio: {compression.get('target_ratio', 0.40) * 100:.0f}% of context preserved")
print(f" Threshold: {compression.get('threshold', 0.50) * 100:.0f}%")
print(f" Target ratio: {compression.get('target_ratio', 0.20) * 100:.0f}% of threshold preserved")
print(f" Protect last: {compression.get('protect_last_n', 20)} messages")
_sm = compression.get('summary_model', '') or '(main model)'
print(f" Model: {_sm}")

View File

@@ -1009,10 +1009,10 @@ class AIAgent:
_compression_cfg = _agent_cfg.get("compression", {})
if not isinstance(_compression_cfg, dict):
_compression_cfg = {}
compression_threshold = float(_compression_cfg.get("threshold", 0.80))
compression_threshold = float(_compression_cfg.get("threshold", 0.50))
compression_enabled = str(_compression_cfg.get("enabled", True)).lower() in ("true", "1", "yes")
compression_summary_model = _compression_cfg.get("summary_model") or None
compression_target_ratio = float(_compression_cfg.get("target_ratio", 0.40))
compression_target_ratio = float(_compression_cfg.get("target_ratio", 0.20))
compression_protect_last = int(_compression_cfg.get("protect_last_n", 20))
# Read explicit context_length override from model config
@@ -4816,9 +4816,9 @@ class AIAgent:
is_error, _ = _detect_tool_failure(function_name, result)
results[index] = (function_name, function_args, result, duration, is_error)
# Start spinner for CLI mode
# Start spinner for CLI mode (skip when TUI handles tool progress)
spinner = None
if self.quiet_mode:
if self.quiet_mode and not self.tool_progress_callback:
face = random.choice(KawaiiSpinner.KAWAII_WAITING)
spinner = KawaiiSpinner(f"{face} ⚡ running {num_tools} tools concurrently", spinner_type='dots')
spinner.start()
@@ -5044,7 +5044,7 @@ class AIAgent:
goal_preview = (function_args.get("goal") or "")[:30]
spinner_label = f"🔀 {goal_preview}" if goal_preview else "🔀 delegating"
spinner = None
if self.quiet_mode:
if self.quiet_mode and not self.tool_progress_callback:
face = random.choice(KawaiiSpinner.KAWAII_WAITING)
spinner = KawaiiSpinner(f"{face} {spinner_label}", spinner_type='dots')
spinner.start()
@@ -5069,13 +5069,15 @@ class AIAgent:
elif self.quiet_mode:
self._vprint(f" {cute_msg}")
elif self.quiet_mode:
face = random.choice(KawaiiSpinner.KAWAII_WAITING)
emoji = _get_tool_emoji(function_name)
preview = _build_tool_preview(function_name, function_args) or function_name
if len(preview) > 30:
preview = preview[:27] + "..."
spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots')
spinner.start()
spinner = None
if not self.tool_progress_callback:
face = random.choice(KawaiiSpinner.KAWAII_WAITING)
emoji = _get_tool_emoji(function_name)
preview = _build_tool_preview(function_name, function_args) or function_name
if len(preview) > 30:
preview = preview[:27] + "..."
spinner = KawaiiSpinner(f"{face} {emoji} {preview}", spinner_type='dots')
spinner.start()
_spinner_result = None
try:
function_result = handle_function_call(
@@ -5091,7 +5093,10 @@ class AIAgent:
finally:
tool_duration = time.time() - tool_start_time
cute_msg = _get_cute_tool_message_impl(function_name, function_args, tool_duration, result=_spinner_result)
spinner.stop(cute_msg)
if spinner:
spinner.stop(cute_msg)
else:
self._vprint(f" {cute_msg}")
else:
try:
function_result = handle_function_call(

View File

@@ -519,24 +519,26 @@ class TestSummaryTargetRatio:
"""Verify that summary_target_ratio properly scales budgets with context window."""
def test_tail_budget_scales_with_context(self):
"""Tail token budget should be context_length * summary_target_ratio."""
"""Tail token budget should be threshold_tokens * summary_target_ratio."""
with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.40)
assert c.tail_token_budget == 80_000
# 200K * 0.50 threshold * 0.40 ratio = 40K
assert c.tail_token_budget == 40_000
with patch("agent.context_compressor.get_model_context_length", return_value=1_000_000):
c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.40)
assert c.tail_token_budget == 400_000
# 1M * 0.50 threshold * 0.40 ratio = 200K
assert c.tail_token_budget == 200_000
def test_summary_cap_scales_with_context(self):
"""Max summary tokens should be 5% of context, capped at 32K."""
"""Max summary tokens should be 5% of context, capped at 12K."""
with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
c = ContextCompressor(model="test", quiet_mode=True)
assert c.max_summary_tokens == 10_000 # 200K * 0.05
with patch("agent.context_compressor.get_model_context_length", return_value=1_000_000):
c = ContextCompressor(model="test", quiet_mode=True)
assert c.max_summary_tokens == 32_000 # capped at ceiling
assert c.max_summary_tokens == 12_000 # capped at 12K ceiling
def test_ratio_clamped(self):
"""Ratio should be clamped to [0.10, 0.80]."""
@@ -548,12 +550,12 @@ class TestSummaryTargetRatio:
c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.95)
assert c.summary_target_ratio == 0.80
def test_default_threshold_is_80_percent(self):
"""Default compression threshold should be 80%."""
def test_default_threshold_is_50_percent(self):
"""Default compression threshold should be 50%."""
with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
c = ContextCompressor(model="test", quiet_mode=True)
assert c.threshold_percent == 0.80
assert c.threshold_tokens == 80_000
assert c.threshold_percent == 0.50
assert c.threshold_tokens == 50_000
def test_default_protect_last_n_is_20(self):
"""Default protect_last_n should be 20."""

View File

@@ -1567,6 +1567,20 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
vision_model = _get_vision_model()
logger.debug("browser_vision: analysing screenshot (%d bytes)",
len(image_data))
# Read vision timeout from config (auxiliary.vision.timeout), default 120s.
# Local vision models (llama.cpp, ollama) can take well over 30s for
# screenshot analysis, so the default must be generous.
vision_timeout = 120.0
try:
from hermes_cli.config import load_config
_cfg = load_config()
_vt = _cfg.get("auxiliary", {}).get("vision", {}).get("timeout")
if _vt is not None:
vision_timeout = float(_vt)
except Exception:
pass
call_kwargs = {
"task": "vision",
"messages": [
@@ -1580,6 +1594,7 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
],
"max_tokens": 2000,
"temperature": 0.1,
"timeout": vision_timeout,
}
if vision_model:
call_kwargs["model"] = vision_model

View File

@@ -179,6 +179,58 @@ async def _summarize_session(
return None
def _list_recent_sessions(db, limit: int, current_session_id: str = None) -> str:
"""Return metadata for the most recent sessions (no LLM calls)."""
try:
sessions = db.list_sessions_rich(limit=limit + 5) # fetch extra to skip current
# Resolve current session lineage to exclude it
current_root = None
if current_session_id:
try:
sid = current_session_id
visited = set()
while sid and sid not in visited:
visited.add(sid)
s = db.get_session(sid)
parent = s.get("parent_session_id") if s else None
sid = parent if parent else None
current_root = max(visited, key=len) if visited else current_session_id
except Exception:
current_root = current_session_id
results = []
for s in sessions:
sid = s.get("id", "")
if current_root and (sid == current_root or sid == current_session_id):
continue
# Skip child/delegation sessions (they have parent_session_id)
if s.get("parent_session_id"):
continue
results.append({
"session_id": sid,
"title": s.get("title") or None,
"source": s.get("source", ""),
"started_at": s.get("started_at", ""),
"last_active": s.get("last_active", ""),
"message_count": s.get("message_count", 0),
"preview": s.get("preview", ""),
})
if len(results) >= limit:
break
return json.dumps({
"success": True,
"mode": "recent",
"results": results,
"count": len(results),
"message": f"Showing {len(results)} most recent sessions. Use a keyword query to search specific topics.",
}, ensure_ascii=False)
except Exception as e:
logging.error("Error listing recent sessions: %s", e, exc_info=True)
return json.dumps({"success": False, "error": f"Failed to list recent sessions: {e}"}, ensure_ascii=False)
def session_search(
query: str,
role_filter: str = None,
@@ -195,11 +247,14 @@ def session_search(
if db is None:
return json.dumps({"success": False, "error": "Session database not available."}, ensure_ascii=False)
limit = min(limit, 5) # Cap at 5 sessions to avoid excessive LLM calls
# Recent sessions mode: when query is empty, return metadata for recent sessions.
# No LLM calls — just DB queries for titles, previews, timestamps.
if not query or not query.strip():
return json.dumps({"success": False, "error": "Query cannot be empty."}, ensure_ascii=False)
return _list_recent_sessions(db, limit, current_session_id)
query = query.strip()
limit = min(limit, 5) # Cap at 5 sessions to avoid excessive LLM calls
try:
# Parse role filter
@@ -364,8 +419,14 @@ def check_session_search_requirements() -> bool:
SESSION_SEARCH_SCHEMA = {
"name": "session_search",
"description": (
"Search your long-term memory of past conversations. This is your recall -- "
"Search your long-term memory of past conversations, or browse recent sessions. This is your recall -- "
"every past session is searchable, and this tool summarizes what happened.\n\n"
"TWO MODES:\n"
"1. Recent sessions (no query): Call with no arguments to see what was worked on recently. "
"Returns titles, previews, and timestamps. Zero LLM cost, instant. "
"Start here when the user asks what were we working on or what did we do recently.\n"
"2. Keyword search (with query): Search for specific topics across all past sessions. "
"Returns LLM-generated summaries of matching sessions.\n\n"
"USE THIS PROACTIVELY when:\n"
"- The user says 'we did this before', 'remember when', 'last time', 'as I mentioned'\n"
"- The user asks about a topic you worked on before but don't have in current context\n"
@@ -385,7 +446,7 @@ SESSION_SEARCH_SCHEMA = {
"properties": {
"query": {
"type": "string",
"description": "Search query — keywords, phrases, or boolean expressions to find in past sessions.",
"description": "Search query — keywords, phrases, or boolean expressions to find in past sessions. Omit this parameter entirely to browse recent sessions instead (returns titles, previews, timestamps with no LLM cost).",
},
"role_filter": {
"type": "string",
@@ -397,7 +458,7 @@ SESSION_SEARCH_SCHEMA = {
"default": 3,
},
},
"required": ["query"],
"required": [],
},
}
@@ -410,7 +471,7 @@ registry.register(
toolset="session_search",
schema=SESSION_SEARCH_SCHEMA,
handler=lambda args, **kw: session_search(
query=args.get("query", ""),
query=args.get("query") or "",
role_filter=args.get("role_filter"),
limit=args.get("limit", 3),
db=kw.get("db"),

View File

@@ -1050,6 +1050,9 @@ def _get_configured_model() -> str:
def _resolve_trust_level(source: str) -> str:
"""Map a source identifier to a trust level."""
# Agent-created skills get their own permissive trust level
if source == "agent-created":
return "agent-created"
# Official optional skills shipped with the repo
if source.startswith("official/") or source == "official":
return "builtin"

View File

@@ -325,8 +325,9 @@ async def vision_analyze_tool(
logger.info("Processing image with vision model...")
# Call the vision API via centralized router.
# Read timeout from config.yaml (auxiliary.vision.timeout), default 30s.
vision_timeout = 30.0
# Read timeout from config.yaml (auxiliary.vision.timeout), default 120s.
# Local vision models (llama.cpp, ollama) can take well over 30s.
vision_timeout = 120.0
try:
from hermes_cli.config import load_config
_cfg = load_config()

View File

@@ -6,9 +6,20 @@ description: "Run custom code at key lifecycle points — log activity, send ale
# Event Hooks
The hooks system lets you run custom code at key points in the agent lifecycle — session creation, slash commands, each tool-calling step, and more. Hooks fire automatically during gateway operation without blocking the main agent pipeline.
Hermes has two hook systems that run custom code at key lifecycle points:
## Creating a Hook
| System | Registered via | Runs in | Use case |
|--------|---------------|---------|----------|
| **[Gateway hooks](#gateway-event-hooks)** | `HOOK.yaml` + `handler.py` in `~/.hermes/hooks/` | Gateway only | Logging, alerts, webhooks |
| **[Plugin hooks](#plugin-hooks)** | `ctx.register_hook()` in a [plugin](/docs/user-guide/features/plugins) | CLI + Gateway | Tool interception, metrics, guardrails |
Both systems are non-blocking — errors in any hook are caught and logged, never crashing the agent.
## Gateway Event Hooks
Gateway hooks fire automatically during gateway operation (Telegram, Discord, Slack, WhatsApp) without blocking the main agent pipeline.
### Creating a Hook
Each hook is a directory under `~/.hermes/hooks/` containing two files:
@@ -19,7 +30,7 @@ Each hook is a directory under `~/.hermes/hooks/` containing two files:
└── handler.py # Python handler function
```
### HOOK.yaml
#### HOOK.yaml
```yaml
name: my-hook
@@ -32,7 +43,7 @@ events:
The `events` list determines which events trigger your handler. You can subscribe to any combination of events, including wildcards like `command:*`.
### handler.py
#### handler.py
```python
import json
@@ -58,25 +69,26 @@ async def handle(event_type: str, context: dict):
- Can be `async def` or regular `def` — both work
- Errors are caught and logged, never crashing the agent
## Available Events
### Available Events
| Event | When it fires | Context keys |
|-------|---------------|--------------|
| `gateway:startup` | Gateway process starts | `platforms` (list of active platform names) |
| `session:start` | New messaging session created | `platform`, `user_id`, `session_id`, `session_key` |
| `session:end` | Session ended (before reset) | `platform`, `user_id`, `session_key` |
| `session:reset` | User ran `/new` or `/reset` | `platform`, `user_id`, `session_key` |
| `agent:start` | Agent begins processing a message | `platform`, `user_id`, `session_id`, `message` |
| `agent:step` | Each iteration of the tool-calling loop | `platform`, `user_id`, `session_id`, `iteration`, `tool_names` |
| `agent:end` | Agent finishes processing | `platform`, `user_id`, `session_id`, `message`, `response` |
| `command:*` | Any slash command executed | `platform`, `user_id`, `command`, `args` |
### Wildcard Matching
#### Wildcard Matching
Handlers registered for `command:*` fire for any `command:` event (`command:model`, `command:reset`, etc.). Monitor all slash commands with a single subscription.
## Examples
### Examples
### Telegram Alert on Long Tasks
#### Telegram Alert on Long Tasks
Send yourself a message when the agent takes more than 10 steps:
@@ -109,7 +121,7 @@ async def handle(event_type: str, context: dict):
)
```
### Command Usage Logger
#### Command Usage Logger
Track which slash commands are used:
@@ -142,7 +154,7 @@ def handle(event_type: str, context: dict):
f.write(json.dumps(entry) + "\n")
```
### Session Start Webhook
#### Session Start Webhook
POST to an external service on new sessions:
@@ -169,7 +181,7 @@ async def handle(event_type: str, context: dict):
}, timeout=5)
```
## How It Works
### How It Works
1. On gateway startup, `HookRegistry.discover_and_load()` scans `~/.hermes/hooks/`
2. Each subdirectory with `HOOK.yaml` + `handler.py` is loaded dynamically
@@ -178,5 +190,51 @@ async def handle(event_type: str, context: dict):
5. Errors in any handler are caught and logged — a broken hook never crashes the agent
:::info
Hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not currently load hooks.
Gateway hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not load gateway hooks. For hooks that work everywhere, use [plugin hooks](#plugin-hooks).
:::
## Plugin Hooks
[Plugins](/docs/user-guide/features/plugins) can register hooks that fire in **both CLI and gateway** sessions. These are registered programmatically via `ctx.register_hook()` in your plugin's `register()` function.
```python
def register(ctx):
ctx.register_hook("pre_tool_call", my_callback)
ctx.register_hook("post_tool_call", my_callback)
```
### Available Plugin Hooks
| Hook | Fires when | Callback receives |
|------|-----------|-------------------|
| `pre_tool_call` | Before any tool executes | `tool_name`, `args`, `task_id` |
| `post_tool_call` | After any tool returns | `tool_name`, `args`, `result`, `task_id` |
| `pre_llm_call` | Before LLM API request | *(planned — not yet wired)* |
| `post_llm_call` | After LLM API response | *(planned — not yet wired)* |
| `on_session_start` | Session begins | *(planned — not yet wired)* |
| `on_session_end` | Session ends | *(planned — not yet wired)* |
Callbacks receive keyword arguments matching the columns above:
```python
def my_callback(**kwargs):
tool = kwargs["tool_name"]
args = kwargs["args"]
# ...
```
### Example: Block Dangerous Tools
```python
# ~/.hermes/plugins/tool-guard/__init__.py
BLOCKED = {"terminal", "write_file"}
def guard(**kwargs):
if kwargs["tool_name"] in BLOCKED:
print(f"⚠ Blocked tool call: {kwargs['tool_name']}")
def register(ctx):
ctx.register_hook("pre_tool_call", guard)
```
See the **[Plugins guide](/docs/user-guide/features/plugins)** for full details on creating plugins.

View File

@@ -46,14 +46,16 @@ Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable
## Available hooks
Plugins can register callbacks for these lifecycle events. See the **[Event Hooks page](/docs/user-guide/features/hooks#plugin-hooks)** for full details, callback signatures, and examples.
| Hook | Fires when |
|------|-----------|
| `pre_tool_call` | Before any tool executes |
| `post_tool_call` | After any tool returns |
| `pre_llm_call` | Before LLM API request |
| `post_llm_call` | After LLM API response |
| `on_session_start` | Session begins |
| `on_session_end` | Session ends |
| `pre_llm_call` | Before LLM API request *(planned)* |
| `post_llm_call` | After LLM API response *(planned)* |
| `on_session_start` | Session begins *(planned)* |
| `on_session_end` | Session ends *(planned)* |
## Slash commands