Mirror of https://github.com/NousResearch/hermes-agent.git
Synced 2026-04-29 23:41:35 +08:00

Compare commits: hermes/her ... docs/hooks (22 commits)
| SHA1 |
|---|
| a18884a3d0 |
| 29d3f1216b |
| fe37a53b75 |
| b6ef1deafd |
| 0f3c191ef1 |
| 7cdf4efe05 |
| adee8d1b5f |
| f5b84dddfd |
| 4549a2f51a |
| 466720c2f3 |
| fccd7a2ab4 |
| 27c023e071 |
| 9231a335d4 |
| 7efaa5968d |
| 8ee4f32819 |
| 689344430c |
| 618f15dda9 |
| 481915587e |
| 0b993c1e07 |
| 9718334962 |
| ebcb81b649 |
| ac5b8a478a |
.github/workflows/supply-chain-audit.yml (vendored, new file, 192 additions) @@ -0,0 +1,192 @@

```yaml
name: Supply Chain Audit

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  pull-requests: write
  contents: read

jobs:
  scan:
    name: Scan PR for supply chain risks
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Scan diff for suspicious patterns
        id: scan
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          set -euo pipefail

          BASE="${{ github.event.pull_request.base.sha }}"
          HEAD="${{ github.event.pull_request.head.sha }}"

          # Get the full diff (added lines only)
          DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)

          FINDINGS=""
          CRITICAL=false

          # --- .pth files (auto-execute on Python startup) ---
          PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
          if [ -n "$PTH_FILES" ]; then
            CRITICAL=true
            FINDINGS="${FINDINGS}
          ### 🚨 CRITICAL: .pth file added or modified
          Python \`.pth\` files in \`site-packages/\` execute automatically when the interpreter starts — no import required. This is the exact mechanism used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512).

          **Files:**
          \`\`\`
          ${PTH_FILES}
          \`\`\`
          "
          fi

          # --- base64 + exec/eval combo (the litellm attack pattern) ---
          B64_EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|decodebytes|urlsafe_b64decode)' | grep -iE 'exec\(|eval\(' | head -10 || true)
          if [ -n "$B64_EXEC_HITS" ]; then
            CRITICAL=true
            FINDINGS="${FINDINGS}
          ### 🚨 CRITICAL: base64 decode + exec/eval combo
          This is the exact pattern used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512) — base64-decoded strings passed to exec/eval to hide credential-stealing payloads.

          **Matches:**
          \`\`\`
          ${B64_EXEC_HITS}
          \`\`\`
          "
          fi

          # --- base64 decode/encode (alone — legitimate uses exist) ---
          B64_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|b64encode|decodebytes|encodebytes|urlsafe_b64decode)|atob\(|btoa\(|Buffer\.from\(.*base64' | head -20 || true)
          if [ -n "$B64_HITS" ]; then
            FINDINGS="${FINDINGS}
          ### ⚠️ WARNING: base64 encoding/decoding detected
          Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.

          **Matches (first 20):**
          \`\`\`
          ${B64_HITS}
          \`\`\`
          "
          fi

          # --- exec/eval with string arguments ---
          EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E '(exec|eval)\s*\(' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert\|# ' | head -20 || true)
          if [ -n "$EXEC_HITS" ]; then
            FINDINGS="${FINDINGS}
          ### ⚠️ WARNING: exec() or eval() usage
          Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.

          **Matches (first 20):**
          \`\`\`
          ${EXEC_HITS}
          \`\`\`
          "
          fi

          # --- subprocess with encoded/obfuscated commands ---
          PROC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E 'subprocess\.(Popen|call|run)\s*\(' | grep -iE 'base64|decode|encode|\\x|chr\(' | head -10 || true)
          if [ -n "$PROC_HITS" ]; then
            CRITICAL=true
            FINDINGS="${FINDINGS}
          ### 🚨 CRITICAL: subprocess with encoded/obfuscated command
          Subprocess calls with encoded arguments are a strong indicator of payload execution.

          **Matches:**
          \`\`\`
          ${PROC_HITS}
          \`\`\`
          "
          fi

          # --- Network calls to non-standard domains ---
          EXFIL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'requests\.(post|put)\(|httpx\.(post|put)\(|urllib\.request\.urlopen' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert' | head -10 || true)
          if [ -n "$EXFIL_HITS" ]; then
            FINDINGS="${FINDINGS}
          ### ⚠️ WARNING: Outbound network calls (POST/PUT)
          Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

          **Matches (first 10):**
          \`\`\`
          ${EXFIL_HITS}
          \`\`\`
          "
          fi

          # --- setup.py / setup.cfg install hooks ---
          SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(setup\.py|setup\.cfg|__init__\.pth|sitecustomize\.py|usercustomize\.py)$' || true)
          if [ -n "$SETUP_HITS" ]; then
            FINDINGS="${FINDINGS}
          ### ⚠️ WARNING: Install hook files modified
          These files can execute code during package installation or interpreter startup.

          **Files:**
          \`\`\`
          ${SETUP_HITS}
          \`\`\`
          "
          fi

          # --- Compile/marshal/pickle (code object injection) ---
          MARSHAL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'marshal\.loads|pickle\.loads|compile\(' | grep -v '^\+\s*#' | grep -v 'test_\|re\.compile\|ast\.compile' | head -10 || true)
          if [ -n "$MARSHAL_HITS" ]; then
            FINDINGS="${FINDINGS}
          ### ⚠️ WARNING: marshal/pickle/compile usage
          These can deserialize or construct executable code objects.

          **Matches:**
          \`\`\`
          ${MARSHAL_HITS}
          \`\`\`
          "
          fi

          # --- Output results ---
          if [ -n "$FINDINGS" ]; then
            echo "found=true" >> "$GITHUB_OUTPUT"
            if [ "$CRITICAL" = true ]; then
              echo "critical=true" >> "$GITHUB_OUTPUT"
            else
              echo "critical=false" >> "$GITHUB_OUTPUT"
            fi
            # Write findings to a file (multiline env vars are fragile)
            echo "$FINDINGS" > /tmp/findings.md
          else
            echo "found=false" >> "$GITHUB_OUTPUT"
            echo "critical=false" >> "$GITHUB_OUTPUT"
          fi

      - name: Post warning comment
        if: steps.scan.outputs.found == 'true'
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: |
          SEVERITY="⚠️ Supply Chain Risk Detected"
          if [ "${{ steps.scan.outputs.critical }}" = "true" ]; then
            SEVERITY="🚨 CRITICAL Supply Chain Risk Detected"
          fi

          BODY="## ${SEVERITY}

          This PR contains patterns commonly associated with supply chain attacks. This does **not** mean the PR is malicious — but these patterns require careful human review before merging.

          $(cat /tmp/findings.md)

          ---
          *Automated scan triggered by [supply-chain-audit](/.github/workflows/supply-chain-audit.yml). If this is a false positive, a maintainer can approve after manual review.*"

          gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY"

      - name: Fail on critical findings
        if: steps.scan.outputs.critical == 'true'
        run: |
          echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
          exit 1
```
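The critical "base64 + exec/eval" rule above is two piped greps over added diff lines. A minimal Python sketch of the same check (the helper name and sample diff lines are hypothetical, for illustration only):

```python
import re

# A line is flagged only when BOTH a base64 decode call and a dynamic
# execution call appear on the same added ('+') line, mirroring the
# workflow's two chained greps.
B64_RE = re.compile(r"base64\.(b64decode|decodebytes|urlsafe_b64decode)", re.I)
EXEC_RE = re.compile(r"exec\(|eval\(", re.I)

def flag_added_lines(diff_lines):
    """Return added lines matching both patterns."""
    return [
        line for line in diff_lines
        if line.startswith("+") and B64_RE.search(line) and EXEC_RE.search(line)
    ]

# Hypothetical diff lines:
sample = [
    "+import base64",
    "+exec(base64.b64decode(payload))",  # flagged: decode feeding exec
    "+data = base64.b64decode(blob)",    # decode alone: not flagged by this rule
    "-exec(base64.b64decode(old))",      # removed line: ignored
]
print(flag_added_lines(sample))  # ['+exec(base64.b64decode(payload))']
```

Note the real workflow applies the two regexes to the whole line independently, so a decode and an exec in unrelated positions on one line still match.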
.gitignore (vendored, 1 addition)

```diff
@@ -53,3 +53,4 @@ environments/benchmarks/evals/
 
 # Release script temp files
 .release_notes.md
+mini-swe-agent/
```
Context compressor module (filename not captured in this mirror):

```diff
@@ -35,14 +35,12 @@ SUMMARY_PREFIX = (
 )
 LEGACY_SUMMARY_PREFIX = "[CONTEXT SUMMARY]:"
 
-# Minimum / maximum tokens for the summary output
+# Minimum tokens for the summary output
 _MIN_SUMMARY_TOKENS = 2000
-_MAX_SUMMARY_TOKENS = 8000
-# Proportion of compressed content to allocate for summary
 _SUMMARY_RATIO = 0.20
 
-# Token budget for tail protection (keep most-recent context)
-_DEFAULT_TAIL_TOKEN_BUDGET = 20_000
+# Absolute ceiling for summary tokens (even on very large context windows)
+_SUMMARY_TOKENS_CEILING = 12_000
 
 # Placeholder used when pruning old tool results
 _PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
```
```diff
@@ -67,8 +65,8 @@ class ContextCompressor:
         model: str,
         threshold_percent: float = 0.50,
         protect_first_n: int = 3,
-        protect_last_n: int = 4,
-        summary_target_tokens: int = 2500,
+        protect_last_n: int = 20,
+        summary_target_ratio: float = 0.20,
         quiet_mode: bool = False,
         summary_model_override: str = None,
         base_url: str = "",
@@ -83,7 +81,7 @@ class ContextCompressor:
         self.threshold_percent = threshold_percent
         self.protect_first_n = protect_first_n
         self.protect_last_n = protect_last_n
-        self.summary_target_tokens = summary_target_tokens
+        self.summary_target_ratio = max(0.10, min(summary_target_ratio, 0.80))
         self.quiet_mode = quiet_mode
 
         self.context_length = get_model_context_length(
```
```diff
@@ -94,12 +92,22 @@ class ContextCompressor:
         self.threshold_tokens = int(self.context_length * threshold_percent)
         self.compression_count = 0
 
+        # Derive token budgets: ratio is relative to the threshold, not total context
+        target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
+        self.tail_token_budget = target_tokens
+        self.max_summary_tokens = min(
+            int(self.context_length * 0.05), _SUMMARY_TOKENS_CEILING,
+        )
+
         if not quiet_mode:
             logger.info(
                 "Context compressor initialized: model=%s context_length=%d "
-                "threshold=%d (%.0f%%) provider=%s base_url=%s",
+                "threshold=%d (%.0f%%) target_ratio=%.0f%% tail_budget=%d "
+                "provider=%s base_url=%s",
                 model, self.context_length, self.threshold_tokens,
-                threshold_percent * 100, provider or "none", base_url or "none",
+                threshold_percent * 100, self.summary_target_ratio * 100,
+                self.tail_token_budget,
+                provider or "none", base_url or "none",
             )
         self._context_probed = False  # True after a step-down from context error
 
```
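The budget derivation above can be checked numerically. A small sketch using the same constants and formulas as the hunk (`derive_budgets` is a hypothetical helper, not part of the codebase):

```python
_SUMMARY_TOKENS_CEILING = 12_000

def derive_budgets(context_length, threshold_percent=0.50, target_ratio=0.20):
    # Ratio is applied to the compaction threshold, not the full window.
    threshold_tokens = int(context_length * threshold_percent)
    tail_token_budget = int(threshold_tokens * target_ratio)
    # Summary cap scales with the window (5%) up to an absolute ceiling.
    max_summary_tokens = min(int(context_length * 0.05), _SUMMARY_TOKENS_CEILING)
    return tail_token_budget, max_summary_tokens

print(derive_budgets(200_000))    # (20000, 10000): 5% of 200K stays under the 12K ceiling
print(derive_budgets(1_000_000))  # (100000, 12000): the ceiling kicks in at 1M context
```

These match the worked examples in the config comments later in the diff (20K tail at 200K context, 100K tail at 1M context).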
```diff
@@ -179,10 +187,15 @@ class ContextCompressor:
     # ------------------------------------------------------------------
 
     def _compute_summary_budget(self, turns_to_summarize: List[Dict[str, Any]]) -> int:
-        """Scale summary token budget with the amount of content being compressed."""
+        """Scale summary token budget with the amount of content being compressed.
+
+        The maximum scales with the model's context window (5% of context,
+        capped at ``_SUMMARY_TOKENS_CEILING``) so large-context models get
+        richer summaries instead of being hard-capped at 8K tokens.
+        """
         content_tokens = estimate_messages_tokens_rough(turns_to_summarize)
         budget = int(content_tokens * _SUMMARY_RATIO)
-        return max(_MIN_SUMMARY_TOKENS, min(budget, _MAX_SUMMARY_TOKENS))
+        return max(_MIN_SUMMARY_TOKENS, min(budget, self.max_summary_tokens))
 
     def _serialize_for_summary(self, turns: List[Dict[str, Any]]) -> str:
         """Serialize conversation turns into labeled text for the summarizer.
```
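The clamp in `_compute_summary_budget` is a plain floor/ratio/cap chain. A standalone restatement of just that arithmetic, with the instance cap passed as a parameter (illustrative values):

```python
_MIN_SUMMARY_TOKENS = 2000
_SUMMARY_RATIO = 0.20

def compute_summary_budget(content_tokens, max_summary_tokens):
    # 20% of the content being compressed, clamped to
    # [_MIN_SUMMARY_TOKENS, max_summary_tokens].
    budget = int(content_tokens * _SUMMARY_RATIO)
    return max(_MIN_SUMMARY_TOKENS, min(budget, max_summary_tokens))

print(compute_summary_budget(5_000, 10_000))    # 2000  (floor wins)
print(compute_summary_budget(30_000, 10_000))   # 6000  (ratio in range)
print(compute_summary_budget(200_000, 10_000))  # 10000 (cap wins)
```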
```diff
@@ -477,14 +490,20 @@ Write only the summary body. Do not include any preamble or prefix."""
 
     def _find_tail_cut_by_tokens(
         self, messages: List[Dict[str, Any]], head_end: int,
-        token_budget: int = _DEFAULT_TAIL_TOKEN_BUDGET,
+        token_budget: int | None = None,
     ) -> int:
         """Walk backward from the end of messages, accumulating tokens until
         the budget is reached. Returns the index where the tail starts.
 
+        ``token_budget`` defaults to ``self.tail_token_budget`` which is
+        derived from ``summary_target_ratio * context_length``, so it
+        scales automatically with the model's context window.
+
         Never cuts inside a tool_call/result group. Falls back to the old
         ``protect_last_n`` if the budget would protect fewer messages.
         """
+        if token_budget is None:
+            token_budget = self.tail_token_budget
         n = len(messages)
         min_tail = self.protect_last_n
         accumulated = 0
```
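The backward walk can be sketched as a pure function over per-message token counts. This is a simplified reconstruction, assuming the stated behavior only: it ignores the tool_call/result grouping the real method enforces, and `find_tail_cut` is a hypothetical name:

```python
def find_tail_cut(token_counts, head_end, token_budget, protect_last_n):
    """Walk backward accumulating tokens until the budget would be
    exceeded; never protect fewer than protect_last_n messages."""
    n = len(token_counts)
    accumulated = 0
    cut = n
    for i in range(n - 1, head_end - 1, -1):
        if accumulated + token_counts[i] > token_budget:
            break
        accumulated += token_counts[i]
        cut = i
    # protect_last_n is a floor on the tail size, so the cut can only move earlier
    return min(cut, n - protect_last_n)

# 6 messages of 100 tokens, budget 250: the budget trips after 2 messages,
# but protect_last_n=3 widens the tail to the last 3 messages.
print(find_tail_cut([100] * 6, head_end=0, token_budget=250, protect_last_n=3))  # 3
```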
```diff
@@ -657,10 +657,6 @@ def format_context_pressure(
     The bar and percentage show progress toward the compaction threshold,
     NOT the raw context window. 100% = compaction fires.
 
-    Uses ANSI colors:
-    - cyan at ~60% to compaction = informational
-    - bold yellow at ~85% to compaction = warning
-
     Args:
         compaction_progress: How close to compaction (0.0–1.0, 1.0 = fires).
         threshold_tokens: Compaction threshold in tokens.
```
```diff
@@ -674,18 +670,12 @@ def format_context_pressure(
     threshold_k = f"{threshold_tokens // 1000}k" if threshold_tokens >= 1000 else str(threshold_tokens)
     threshold_pct_int = int(threshold_percent * 100)
 
-    # Tier styling
-    if compaction_progress >= 0.85:
-        color = f"{_BOLD}{_YELLOW}"
-        icon = "⚠"
-        if compression_enabled:
-            hint = "compaction imminent"
-        else:
-            hint = "no auto-compaction"
-    else:
-        color = _CYAN
-        icon = "◐"
-        hint = "approaching compaction"
+    color = f"{_BOLD}{_YELLOW}"
+    icon = "⚠"
+    if compression_enabled:
+        hint = "compaction approaching"
+    else:
+        hint = "no auto-compaction"
 
     return (
         f" {color}{icon} context {bar} {pct_int}% to compaction{_ANSI_RESET}"
```
```diff
@@ -709,14 +699,10 @@ def format_context_pressure_gateway(
 
     threshold_pct_int = int(threshold_percent * 100)
 
-    if compaction_progress >= 0.85:
-        icon = "⚠️"
-        if compression_enabled:
-            hint = f"Context compaction is imminent (threshold: {threshold_pct_int}% of window)."
-        else:
-            hint = "Auto-compaction is disabled — context may be truncated."
-    else:
-        icon = "ℹ️"
-        hint = f"Compaction threshold is at {threshold_pct_int}% of context window."
+    icon = "⚠️"
+    if compression_enabled:
+        hint = f"Context compaction approaching (threshold: {threshold_pct_int}% of window)."
+    else:
+        hint = "Auto-compaction is disabled — context may be truncated."
 
     return f"{icon} Context: {bar} {pct_int}% to compaction\n{hint}"
```
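Both formatters report progress toward the compaction threshold, not toward the raw context window. The bar rendering itself is not shown in these hunks, so the sketch below is an illustrative reconstruction; the glyphs, width, and function name are assumptions:

```python
def render_pressure(used_tokens, threshold_tokens, width=10):
    # 100% means compaction fires: progress is measured against the
    # threshold, not the model's full context window.
    progress = min(used_tokens / threshold_tokens, 1.0)
    filled = int(progress * width)
    bar = "█" * filled + "░" * (width - filled)
    return f"{bar} {int(progress * 100)}% to compaction"

print(render_pressure(50_000, 100_000))  # █████░░░░░ 50% to compaction
```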
Default configuration template (filename not captured in this mirror):

```diff
@@ -232,19 +232,34 @@ browser:
 # 1. Tracks actual token usage from API responses (not estimates)
 # 2. When prompt_tokens >= threshold% of model's context_length, triggers compression
 # 3. Protects first 3 turns (system prompt, initial request, first response)
-# 4. Protects last 4 turns (recent context is most relevant)
+# 4. Protects last N turns (default 20 messages = ~10 full turns of recent context)
 # 5. Summarizes middle turns using a fast/cheap model
 # 6. Inserts summary as a user message, continues conversation seamlessly
 #
+# Post-compression tail budget is target_ratio × threshold × context_length:
+#   200K context, threshold 0.50, ratio 0.20 → 20K tokens of recent tail preserved
+#   1M context, threshold 0.50, ratio 0.20 → 100K tokens of recent tail preserved
+#
 compression:
   # Enable automatic context compression (default: true)
   # Set to false if you prefer to manage context manually or want errors on overflow
   enabled: true
 
-  # Trigger compression at this % of model's context limit (default: 0.85 = 85%)
+  # Trigger compression at this % of model's context limit (default: 0.50 = 50%)
   # Lower values = more aggressive compression, higher values = compress later
-  threshold: 0.85
+  threshold: 0.50
 
+  # Fraction of the threshold to preserve as recent tail (default: 0.20 = 20%)
+  # e.g. 20% of 50% threshold = 10% of total context kept as recent messages.
+  # Summary output is separately capped at 12K tokens (Gemini output limit).
+  # Range: 0.10 - 0.80
+  target_ratio: 0.20
+
+  # Number of most-recent messages to always preserve (default: 20 ≈ 10 full turns)
+  # Higher values keep more recent conversation intact at the cost of more aggressive
+  # compression of older turns.
+  protect_last_n: 20
+
   # Model to use for generating summaries (fast/cheap recommended)
   # This model compresses the middle turns into a concise summary.
   # IMPORTANT: it receives the full middle section of the conversation, so it
```
cli.py (6 changes)

```diff
@@ -1509,10 +1509,14 @@ class HermesCLI:
 
         self._reasoning_buf = getattr(self, "_reasoning_buf", "") + text
 
-        # Emit complete lines
+        # Emit complete lines, and force-flush long partial lines so
+        # reasoning is visible in real-time even without newlines.
         while "\n" in self._reasoning_buf:
             line, self._reasoning_buf = self._reasoning_buf.split("\n", 1)
             _cprint(f"{_DIM}{line}{_RST}")
+        if len(self._reasoning_buf) > 80:
+            _cprint(f"{_DIM}{self._reasoning_buf}{_RST}")
+            self._reasoning_buf = ""
 
     def _close_reasoning_box(self) -> None:
         """Close the live reasoning box if it's open."""
```
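The flush logic above is easiest to see as a pure function: complete lines are emitted as they arrive, and a long partial line (over 80 characters, with no newline yet) is force-flushed so streamed reasoning stays visible. A minimal sketch, with `emitted` standing in for the terminal printer:

```python
def flush_reasoning(buf, text, emitted, flush_threshold=80):
    """Append text to the buffer, emit complete lines, then force-flush
    an over-long partial line. Returns the remaining buffer."""
    buf += text
    while "\n" in buf:
        line, buf = buf.split("\n", 1)
        emitted.append(line)
    if len(buf) > flush_threshold:
        emitted.append(buf)
        buf = ""
    return buf

out = []
rest = flush_reasoning("", "first line\nsecond ", out)
print(out, repr(rest))            # ['first line'] 'second '
rest = flush_reasoning(rest, "x" * 90, out)
print(len(out), repr(rest))       # 2 '' — the long partial line was force-flushed
```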
CLI config module (filename not captured in this mirror):

```diff
@@ -163,8 +163,10 @@ DEFAULT_CONFIG = {
 
     "compression": {
         "enabled": True,
-        "threshold": 0.50,
-        "summary_model": "",  # empty = use main configured model
+        "threshold": 0.50,  # compress when context usage exceeds this ratio
+        "target_ratio": 0.20,  # fraction of threshold to preserve as recent tail
+        "protect_last_n": 20,  # minimum recent messages to keep uncompressed
+        "summary_model": "",  # empty = use main configured model
         "summary_provider": "auto",
         "summary_base_url": None,
     },
```
```diff
@@ -1685,6 +1687,8 @@ def show_config():
     print(f"  Enabled: {'yes' if enabled else 'no'}")
     if enabled:
         print(f"  Threshold: {compression.get('threshold', 0.50) * 100:.0f}%")
+        print(f"  Target ratio: {compression.get('target_ratio', 0.20) * 100:.0f}% of threshold preserved")
+        print(f"  Protect last: {compression.get('protect_last_n', 20)} messages")
         _sm = compression.get('summary_model', '') or '(main model)'
         print(f"  Model: {_sm}")
         comp_provider = compression.get('summary_provider', 'auto')
```
```diff
@@ -873,9 +873,9 @@ def setup_model_provider(config: dict):
         keep_label = None  # No provider configured — don't show "Keep current"
 
     provider_choices = [
+        "OpenRouter API key (100+ models, pay-per-use)",
         "Login with Nous Portal (Nous Research subscription — OAuth)",
         "Login with OpenAI Codex",
-        "OpenRouter API key (100+ models, pay-per-use)",
         "Custom OpenAI-compatible endpoint (self-hosted / VLLM / etc.)",
         "Z.AI / GLM (Zhipu AI models)",
         "Kimi / Moonshot (Kimi coding models)",
@@ -894,7 +894,7 @@ def setup_model_provider(config: dict):
         provider_choices.append(keep_label)
 
     # Default to "Keep current" if a provider exists, otherwise OpenRouter (most common)
-    default_provider = len(provider_choices) - 1 if has_any_provider else 2
+    default_provider = len(provider_choices) - 1 if has_any_provider else 0
 
     if not has_any_provider:
         print_warning("An inference provider is required for Hermes to work.")
```
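The reorder moves OpenRouter to the top of the menu, so a fresh install with no configured provider defaults to index 0 instead of 2. A small sketch of the index logic (labels abridged; `default_index` is a hypothetical name for the expression in the diff):

```python
provider_choices = [
    "OpenRouter API key",                  # new index 0: the default for fresh installs
    "Login with Nous Portal",
    "Login with OpenAI Codex",
    "Custom OpenAI-compatible endpoint",
]

def default_index(has_any_provider, choices):
    # "Keep current" is appended last only when a provider already exists,
    # and then it becomes the default.
    return len(choices) - 1 if has_any_provider else 0

print(default_index(False, provider_choices))                    # 0 → OpenRouter
print(default_index(True, provider_choices + ["Keep current"]))  # 4 → Keep current
```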
```diff
@@ -911,81 +911,7 @@ def setup_model_provider(config: dict):
     selected_base_url = None  # deferred until after model selection
     nous_models = []  # populated if Nous login succeeds
 
-    if provider_idx == 0:  # Nous Portal (OAuth)
-        selected_provider = "nous"
-        print()
-        print_header("Nous Portal Login")
-        print_info("This will open your browser to authenticate with Nous Portal.")
-        print_info("You'll need a Nous Research account with an active subscription.")
-        print()
-
-        try:
-            from hermes_cli.auth import _login_nous, ProviderConfig
-            import argparse
-
-            mock_args = argparse.Namespace(
-                portal_url=None,
-                inference_url=None,
-                client_id=None,
-                scope=None,
-                no_browser=False,
-                timeout=15.0,
-                ca_bundle=None,
-                insecure=False,
-            )
-            pconfig = PROVIDER_REGISTRY["nous"]
-            _login_nous(mock_args, pconfig)
-            _sync_model_from_disk(config)
-
-            # Fetch models for the selection step
-            try:
-                creds = resolve_nous_runtime_credentials(
-                    min_key_ttl_seconds=5 * 60,
-                    timeout_seconds=15.0,
-                )
-                nous_models = fetch_nous_models(
-                    inference_base_url=creds.get("base_url", ""),
-                    api_key=creds.get("api_key", ""),
-                )
-            except Exception as e:
-                logger.debug("Could not fetch Nous models after login: %s", e)
-
-        except SystemExit:
-            print_warning("Nous Portal login was cancelled or failed.")
-            print_info("You can try again later with: hermes model")
-            selected_provider = None
-        except Exception as e:
-            print_error(f"Login failed: {e}")
-            print_info("You can try again later with: hermes model")
-            selected_provider = None
-
-    elif provider_idx == 1:  # OpenAI Codex
-        selected_provider = "openai-codex"
-        print()
-        print_header("OpenAI Codex Login")
-        print()
-
-        try:
-            import argparse
-
-            mock_args = argparse.Namespace()
-            _login_openai_codex(mock_args, PROVIDER_REGISTRY["openai-codex"])
-            # Clear custom endpoint vars that would override provider routing.
-            if existing_custom:
-                save_env_value("OPENAI_BASE_URL", "")
-                save_env_value("OPENAI_API_KEY", "")
-            _update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
-            _set_model_provider(config, "openai-codex", DEFAULT_CODEX_BASE_URL)
-        except SystemExit:
-            print_warning("OpenAI Codex login was cancelled or failed.")
-            print_info("You can try again later with: hermes model")
-            selected_provider = None
-        except Exception as e:
-            print_error(f"Login failed: {e}")
-            print_info("You can try again later with: hermes model")
-            selected_provider = None
-
-    elif provider_idx == 2:  # OpenRouter
+    if provider_idx == 0:  # OpenRouter
         selected_provider = "openrouter"
         print()
         print_header("OpenRouter API Key")
```
```diff
@@ -1040,6 +966,80 @@ def setup_model_provider(config: dict):
         except Exception as e:
             logger.debug("Could not save provider to config.yaml: %s", e)
 
+    elif provider_idx == 1:  # Nous Portal (OAuth)
+        selected_provider = "nous"
+        print()
+        print_header("Nous Portal Login")
+        print_info("This will open your browser to authenticate with Nous Portal.")
+        print_info("You'll need a Nous Research account with an active subscription.")
+        print()
+
+        try:
+            from hermes_cli.auth import _login_nous, ProviderConfig
+            import argparse
+
+            mock_args = argparse.Namespace(
+                portal_url=None,
+                inference_url=None,
+                client_id=None,
+                scope=None,
+                no_browser=False,
+                timeout=15.0,
+                ca_bundle=None,
+                insecure=False,
+            )
+            pconfig = PROVIDER_REGISTRY["nous"]
+            _login_nous(mock_args, pconfig)
+            _sync_model_from_disk(config)
+
+            # Fetch models for the selection step
+            try:
+                creds = resolve_nous_runtime_credentials(
+                    min_key_ttl_seconds=5 * 60,
+                    timeout_seconds=15.0,
+                )
+                nous_models = fetch_nous_models(
+                    inference_base_url=creds.get("base_url", ""),
+                    api_key=creds.get("api_key", ""),
+                )
+            except Exception as e:
+                logger.debug("Could not fetch Nous models after login: %s", e)
+
+        except SystemExit:
+            print_warning("Nous Portal login was cancelled or failed.")
+            print_info("You can try again later with: hermes model")
+            selected_provider = None
+        except Exception as e:
+            print_error(f"Login failed: {e}")
+            print_info("You can try again later with: hermes model")
+            selected_provider = None
+
+    elif provider_idx == 2:  # OpenAI Codex
+        selected_provider = "openai-codex"
+        print()
+        print_header("OpenAI Codex Login")
+        print()
+
+        try:
+            import argparse
+
+            mock_args = argparse.Namespace()
+            _login_openai_codex(mock_args, PROVIDER_REGISTRY["openai-codex"])
+            # Clear custom endpoint vars that would override provider routing.
+            if existing_custom:
+                save_env_value("OPENAI_BASE_URL", "")
+                save_env_value("OPENAI_API_KEY", "")
+            _update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
+            _set_model_provider(config, "openai-codex", DEFAULT_CODEX_BASE_URL)
+        except SystemExit:
+            print_warning("OpenAI Codex login was cancelled or failed.")
+            print_info("You can try again later with: hermes model")
+            selected_provider = None
+        except Exception as e:
+            print_error(f"Login failed: {e}")
+            print_info("You can try again later with: hermes model")
+            selected_provider = None
+
 
     elif provider_idx == 3:  # Custom endpoint
         selected_provider = "custom"
         print()
```
run_agent.py (153 changes)

```diff
@@ -585,8 +585,7 @@ class AIAgent:
         # Context pressure warnings: notify the USER (not the LLM) as context
         # fills up. Purely informational — displayed in CLI output and sent via
         # status_callback for gateway platforms. Does NOT inject into messages.
-        self._context_50_warned = False
-        self._context_70_warned = False
+        self._context_pressure_warned = False
 
         # Persistent error log -- always writes WARNING+ to ~/.hermes/logs/errors.log
         # so tool failures, API errors, etc. are inspectable after the fact.
@@ -1013,6 +1012,8 @@ class AIAgent:
         compression_threshold = float(_compression_cfg.get("threshold", 0.50))
         compression_enabled = str(_compression_cfg.get("enabled", True)).lower() in ("true", "1", "yes")
         compression_summary_model = _compression_cfg.get("summary_model") or None
+        compression_target_ratio = float(_compression_cfg.get("target_ratio", 0.20))
+        compression_protect_last = int(_compression_cfg.get("protect_last_n", 20))
 
         # Read explicit context_length override from model config
         _model_cfg = _agent_cfg.get("model", {})
@@ -1051,8 +1052,8 @@ class AIAgent:
             model=self.model,
             threshold_percent=compression_threshold,
             protect_first_n=3,
-            protect_last_n=4,
-            summary_target_tokens=500,
+            protect_last_n=compression_protect_last,
+            summary_target_ratio=compression_target_ratio,
             summary_model_override=compression_summary_model,
             quiet_mode=self.quiet_mode,
             base_url=self.base_url,
```
```diff
@@ -2362,7 +2363,13 @@ class AIAgent:
             prompt_parts.append(skills_prompt)
 
         if not self.skip_context_files:
-            context_files_prompt = build_context_files_prompt(skip_soul=_soul_loaded)
+            # Use TERMINAL_CWD for context file discovery when set (gateway
+            # mode). The gateway process runs from the hermes-agent install
+            # dir, so os.getcwd() would pick up the repo's AGENTS.md and
+            # other dev files — inflating token usage by ~10k for no benefit.
+            _context_cwd = os.getenv("TERMINAL_CWD") or None
+            context_files_prompt = build_context_files_prompt(
+                cwd=_context_cwd, skip_soul=_soul_loaded)
             if context_files_prompt:
                 prompt_parts.append(context_files_prompt)
 
```
```diff
@@ -3578,7 +3585,20 @@ class AIAgent:
 
         def _call_chat_completions():
             """Stream a chat completions response."""
-            stream_kwargs = {**api_kwargs, "stream": True, "stream_options": {"include_usage": True}}
+            import httpx as _httpx
+            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 900.0))
+            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 60.0))
+            stream_kwargs = {
+                **api_kwargs,
+                "stream": True,
+                "stream_options": {"include_usage": True},
+                "timeout": _httpx.Timeout(
+                    connect=30.0,
+                    read=_stream_read_timeout,
+                    write=_base_timeout,
+                    pool=30.0,
+                ),
+            }
             request_client_holder["client"] = self._create_request_openai_client(
                 reason="chat_completion_stream_request"
             )
```
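The hunk above makes the per-phase timeouts tunable via two environment variables. A dependency-free sketch of just the env parsing, with a plain dict standing in for `httpx.Timeout` so it runs without httpx installed (only the env variable names come from the diff):

```python
import os

def stream_timeouts(env=None):
    # Read timeout is short (streams should produce bytes regularly);
    # write timeout inherits the long overall API timeout.
    env = os.environ if env is None else env
    base = float(env.get("HERMES_API_TIMEOUT", 900.0))
    read = float(env.get("HERMES_STREAM_READ_TIMEOUT", 60.0))
    return {"connect": 30.0, "read": read, "write": base, "pool": 30.0}

print(stream_timeouts({}))  # defaults: read 60s, write 900s
print(stream_timeouts({"HERMES_STREAM_READ_TIMEOUT": "5"})["read"])  # 5.0
```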
```diff
@@ -3646,6 +3666,7 @@ class AIAgent:
                         name = entry["function"]["name"]
                         if name and idx not in tool_gen_notified:
                             tool_gen_notified.add(idx)
+                            _fire_first_delta()
                             self._fire_tool_gen_started(name)
 
                 if chunk.choices[0].finish_reason:
@@ -3714,6 +3735,7 @@ class AIAgent:
                     has_tool_use = True
                     tool_name = getattr(block, "name", None)
                     if tool_name:
+                        _fire_first_delta()
                         self._fire_tool_gen_started(tool_name)
 
                 elif event_type == "content_block_delta":
```
@@ -3735,29 +3757,84 @@ class AIAgent:
|
||||
return stream.get_final_message()
|
||||
|
||||
def _call():
|
||||
import httpx as _httpx
|
||||
|
||||
_max_stream_retries = int(os.getenv("HERMES_STREAM_RETRIES", 2))
|
-                try:
-                    if self.api_mode == "anthropic_messages":
-                        self._try_refresh_anthropic_client_credentials()
-                        result["response"] = _call_anthropic()
-                    else:
-                        result["response"] = _call_chat_completions()
-                except Exception as e:
-                    if deltas_were_sent["yes"]:
-                        # Streaming failed AFTER some tokens were already delivered
-                        # to consumers. Don't fall back — that would cause
-                        # double-delivery (partial streamed + full non-streamed).
-                        # Let the error propagate; the partial content already
-                        # reached the user via the stream.
-                        logger.warning("Streaming failed after partial delivery, not falling back: %s", e)
-                        result["error"] = e
-                    else:
-                        # Streaming failed before any tokens reached consumers.
-                        # Safe to fall back to the standard non-streaming path.
-                        logger.info("Streaming failed before delivery, falling back to non-streaming: %s", e)
-                        try:
-                            result["response"] = self._interruptible_api_call(api_kwargs)
-                        except Exception as fallback_err:
-                            result["error"] = fallback_err
+                for _stream_attempt in range(_max_stream_retries + 1):
+                    try:
+                        if self.api_mode == "anthropic_messages":
+                            self._try_refresh_anthropic_client_credentials()
+                            result["response"] = _call_anthropic()
+                        else:
+                            result["response"] = _call_chat_completions()
+                        return  # success
+                    except Exception as e:
+                        if deltas_were_sent["yes"]:
+                            # Streaming failed AFTER some tokens were already
+                            # delivered. Don't retry or fall back — partial
+                            # content already reached the user.
+                            logger.warning(
+                                "Streaming failed after partial delivery, not retrying: %s", e
+                            )
+                            result["error"] = e
+                            return
+
+                        _is_timeout = isinstance(
+                            e, (_httpx.ReadTimeout, _httpx.ConnectTimeout, _httpx.PoolTimeout)
+                        )
+                        _is_conn_err = isinstance(
+                            e, (_httpx.ConnectError, _httpx.RemoteProtocolError, ConnectionError)
+                        )
+
+                        if _is_timeout or _is_conn_err:
+                            # Transient network / timeout error. Retry the
+                            # streaming request with a fresh connection rather
+                            # than falling back to non-streaming (which would
+                            # hang for up to 15 min on the same dead server).
+                            if _stream_attempt < _max_stream_retries:
+                                logger.info(
+                                    "Streaming attempt %s/%s failed (%s: %s), "
+                                    "retrying with fresh connection...",
+                                    _stream_attempt + 1,
+                                    _max_stream_retries + 1,
+                                    type(e).__name__,
+                                    e,
+                                )
+                                # Close the stale request client before retry
+                                stale = request_client_holder.get("client")
+                                if stale is not None:
+                                    self._close_request_openai_client(
+                                        stale, reason="stream_retry_cleanup"
+                                    )
+                                request_client_holder["client"] = None
+                                continue
+                            # Exhausted retries — propagate to outer loop
+                            logger.warning(
+                                "Streaming exhausted %s retries on transient error: %s",
+                                _max_stream_retries + 1,
+                                e,
+                            )
+                            result["error"] = e
+                            return
+
+                        # Non-transient error (e.g. "streaming not supported",
+                        # auth error, 4xx). Fall back to non-streaming once.
+                        err_msg = str(e).lower()
+                        if "stream" in err_msg and "not supported" in err_msg:
+                            logger.info(
+                                "Streaming not supported, falling back to non-streaming: %s", e
+                            )
+                            try:
+                                result["response"] = self._interruptible_api_call(api_kwargs)
+                            except Exception as fallback_err:
+                                result["error"] = fallback_err
+                            return
+
+                        # Unknown error — propagate to outer retry loop
+                        result["error"] = e
+                        return
                 finally:
                     request_client = request_client_holder.get("client")
                     if request_client is not None:
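The retry policy in the hunk above can be sketched independently of the agent internals. Everything below (names, signatures, exception classes) is illustrative, not the Hermes API:

```python
def run_with_stream_retries(call_stream, call_blocking, max_retries=2,
                            transient=(TimeoutError, ConnectionError)):
    """Retry a streaming call on transient errors; once any partial
    output has been delivered, never retry or fall back (that would
    double-deliver). Non-transient errors fall back to blocking mode."""
    delivered = {"yes": False}

    def on_delta(_chunk):
        delivered["yes"] = True

    for attempt in range(max_retries + 1):
        try:
            return call_stream(on_delta)
        except transient:
            if delivered["yes"] or attempt == max_retries:
                raise                # partial output, or retries exhausted
            continue                 # fresh attempt on a new connection
        except Exception:
            if delivered["yes"]:
                raise                # partial output already reached the user
            return call_blocking()   # non-transient: fall back once
```

The key invariant matches the comments in the diff: once a delta has been delivered, the only safe option is to propagate the error.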
@@ -4609,9 +4686,17 @@ class AIAgent:
         except Exception as e:
             logger.debug("Session DB compression split failed: %s", e)

-        # Reset context pressure warnings — usage drops after compaction
-        self._context_50_warned = False
-        self._context_70_warned = False
+        # Reset context pressure warning and token estimate — usage drops
+        # after compaction. Without this, the stale last_prompt_tokens from
+        # the previous API call causes the pressure calculation to stay at
+        # >1000% and spam warnings / re-trigger compression in a loop.
+        self._context_pressure_warned = False
+        _compressed_est = (
+            estimate_tokens_rough(new_system_prompt)
+            + estimate_messages_tokens_rough(compressed)
+        )
+        self.context_compressor.last_prompt_tokens = _compressed_est
+        self.context_compressor.last_completion_tokens = 0

         return compressed, new_system_prompt
@@ -6844,12 +6929,8 @@ class AIAgent:
         # and fires status_callback for gateway platforms.
         if _compressor.threshold_tokens > 0:
             _compaction_progress = _estimated_next_prompt / _compressor.threshold_tokens
-            if _compaction_progress >= 0.85 and not self._context_70_warned:
-                self._context_70_warned = True
-                self._context_50_warned = True  # skip first tier if we jumped past it
-                self._emit_context_pressure(_compaction_progress, _compressor)
-            elif _compaction_progress >= 0.60 and not self._context_50_warned:
-                self._context_50_warned = True
+            if _compaction_progress >= 0.85 and not self._context_pressure_warned:
+                self._context_pressure_warned = True
                 self._emit_context_pressure(_compaction_progress, _compressor)

         if self.compression_enabled and _compressor.should_compress(_estimated_next_prompt):
@@ -217,7 +217,7 @@ class TestCompressWithClient:
         mock_client.chat.completions.create.return_value = mock_response

         with patch("agent.context_compressor.get_model_context_length", return_value=100000):
-            c = ContextCompressor(model="test", quiet_mode=True)
+            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)

             msgs = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"} for i in range(10)]
             with patch("agent.context_compressor.call_llm", return_value=mock_response):
@@ -513,3 +513,52 @@ class TestCompressWithClient:
         for msg in result:
             if msg.get("role") == "tool" and msg.get("tool_call_id"):
                 assert msg["tool_call_id"] in called_ids
+
+
+class TestSummaryTargetRatio:
+    """Verify that summary_target_ratio properly scales budgets with context window."""
+
+    def test_tail_budget_scales_with_context(self):
+        """Tail token budget should be threshold_tokens * summary_target_ratio."""
+        with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
+            c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.40)
+            # 200K * 0.50 threshold * 0.40 ratio = 40K
+            assert c.tail_token_budget == 40_000
+
+        with patch("agent.context_compressor.get_model_context_length", return_value=1_000_000):
+            c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.40)
+            # 1M * 0.50 threshold * 0.40 ratio = 200K
+            assert c.tail_token_budget == 200_000
+
+    def test_summary_cap_scales_with_context(self):
+        """Max summary tokens should be 5% of context, capped at 12K."""
+        with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
+            c = ContextCompressor(model="test", quiet_mode=True)
+            assert c.max_summary_tokens == 10_000  # 200K * 0.05
+
+        with patch("agent.context_compressor.get_model_context_length", return_value=1_000_000):
+            c = ContextCompressor(model="test", quiet_mode=True)
+            assert c.max_summary_tokens == 12_000  # capped at 12K ceiling
+
+    def test_ratio_clamped(self):
+        """Ratio should be clamped to [0.10, 0.80]."""
+        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
+            c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.05)
+            assert c.summary_target_ratio == 0.10
+
+        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
+            c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.95)
+            assert c.summary_target_ratio == 0.80
+
+    def test_default_threshold_is_50_percent(self):
+        """Default compression threshold should be 50%."""
+        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
+            c = ContextCompressor(model="test", quiet_mode=True)
+            assert c.threshold_percent == 0.50
+            assert c.threshold_tokens == 50_000
+
+    def test_default_protect_last_n_is_20(self):
+        """Default protect_last_n should be 20."""
+        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
+            c = ContextCompressor(model="test", quiet_mode=True)
+            assert c.protect_last_n == 20
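The budget arithmetic these new tests pin down fits in a few lines. This is a sketch inferred from the assertions above; the actual ContextCompressor constructor takes more parameters:

```python
def compression_budgets(context_length, threshold_percent=0.50,
                        summary_target_ratio=0.40):
    """Compute (threshold_tokens, tail_token_budget, max_summary_tokens)."""
    # The ratio is clamped to [0.10, 0.80] before use.
    ratio = min(max(summary_target_ratio, 0.10), 0.80)
    threshold_tokens = int(context_length * threshold_percent)
    # Tail budget scales with the compaction threshold.
    tail_token_budget = int(threshold_tokens * ratio)
    # Summary cap: 5% of the context window, with a 12K ceiling.
    max_summary_tokens = min(int(context_length * 0.05), 12_000)
    return threshold_tokens, tail_token_budget, max_summary_tokens
```

With a 200K-context model this yields a 100K threshold, a 40K tail budget, and a 10K summary cap, matching the test expectations.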
@@ -34,7 +34,7 @@ def test_nous_oauth_setup_keeps_current_model_when_syncing_disk_provider(

     def fake_prompt_choice(question, choices, default=0):
         if question == "Select your inference provider:":
-            return 0
+            return 1  # Nous Portal
         if question == "Configure vision:":
             return len(choices) - 1
         if question == "Select default model:":
@@ -135,7 +135,7 @@ def test_codex_setup_uses_runtime_access_token_for_live_model_list(tmp_path, mon

     def fake_prompt_choice(question, choices, default=0):
         if question == "Select your inference provider:":
-            return 1
+            return 2  # OpenAI Codex
         if question == "Select default model:":
             return 0
         tts_idx = _maybe_keep_current_tts(question, choices)
@@ -401,7 +401,7 @@ def test_setup_switch_custom_to_codex_clears_custom_endpoint_and_updates_config(

     def fake_prompt_choice(question, choices, default=0):
         if question == "Select your inference provider:":
-            return 1
+            return 2  # OpenAI Codex
         if question == "Select default model:":
             return 0
         tts_idx = _maybe_keep_current_tts(question, choices)
@@ -29,40 +29,36 @@ class TestFormatContextPressure:
     raw context window. 60% = 60% of the way to compaction.
     """

-    def test_60_percent_uses_info_icon(self):
-        line = format_context_pressure(0.60, 100_000, 0.50)
-        assert "◐" in line
-        assert "60% to compaction" in line
-
-    def test_85_percent_uses_warning_icon(self):
-        line = format_context_pressure(0.85, 100_000, 0.50)
+    def test_80_percent_uses_warning_icon(self):
+        line = format_context_pressure(0.80, 100_000, 0.50)
         assert "⚠" in line
-        assert "85% to compaction" in line
+        assert "80% to compaction" in line

     def test_90_percent_uses_warning_icon(self):
         line = format_context_pressure(0.90, 100_000, 0.50)
         assert "⚠" in line
         assert "90% to compaction" in line

     def test_bar_length_scales_with_progress(self):
-        line_60 = format_context_pressure(0.60, 100_000, 0.50)
-        line_85 = format_context_pressure(0.85, 100_000, 0.50)
-        assert line_85.count("▰") > line_60.count("▰")
+        line_80 = format_context_pressure(0.80, 100_000, 0.50)
+        line_95 = format_context_pressure(0.95, 100_000, 0.50)
+        assert line_95.count("▰") > line_80.count("▰")

     def test_shows_threshold_tokens(self):
-        line = format_context_pressure(0.60, 100_000, 0.50)
+        line = format_context_pressure(0.80, 100_000, 0.50)
         assert "100k" in line

     def test_small_threshold(self):
-        line = format_context_pressure(0.60, 500, 0.50)
+        line = format_context_pressure(0.80, 500, 0.50)
         assert "500" in line

     def test_shows_threshold_percent(self):
-        line = format_context_pressure(0.85, 100_000, 0.50)
-        assert "50%" in line  # threshold percent shown
+        line = format_context_pressure(0.80, 100_000, 0.50)
+        assert "50%" in line

-    def test_imminent_hint_at_85(self):
-        line = format_context_pressure(0.85, 100_000, 0.50)
-        assert "compaction imminent" in line
-
-    def test_approaching_hint_below_85(self):
-        line = format_context_pressure(0.60, 100_000, 0.80)
-        assert "approaching compaction" in line
+    def test_approaching_hint(self):
+        line = format_context_pressure(0.80, 100_000, 0.50)
+        assert "compaction approaching" in line

     def test_no_compaction_when_disabled(self):
         line = format_context_pressure(0.85, 100_000, 0.50, compression_enabled=False)
@@ -82,26 +78,26 @@ class TestFormatContextPressure:

 class TestFormatContextPressureGateway:
     """Gateway (plain text) context pressure display."""

-    def test_60_percent_informational(self):
-        msg = format_context_pressure_gateway(0.60, 0.50)
-        assert "60% to compaction" in msg
-        assert "50%" in msg  # threshold shown
+    def test_80_percent_warning(self):
+        msg = format_context_pressure_gateway(0.80, 0.50)
+        assert "80% to compaction" in msg
+        assert "50%" in msg

-    def test_85_percent_warning(self):
-        msg = format_context_pressure_gateway(0.85, 0.50)
-        assert "85% to compaction" in msg
-        assert "imminent" in msg
+    def test_90_percent_warning(self):
+        msg = format_context_pressure_gateway(0.90, 0.50)
+        assert "90% to compaction" in msg
+        assert "approaching" in msg

     def test_no_compaction_warning(self):
         msg = format_context_pressure_gateway(0.85, 0.50, compression_enabled=False)
         assert "disabled" in msg

     def test_no_ansi_codes(self):
-        msg = format_context_pressure_gateway(0.85, 0.50)
+        msg = format_context_pressure_gateway(0.80, 0.50)
         assert "\033[" not in msg

     def test_has_progress_bar(self):
-        msg = format_context_pressure_gateway(0.85, 0.50)
+        msg = format_context_pressure_gateway(0.80, 0.50)
         assert "▰" in msg
@@ -145,9 +141,8 @@ def agent():
 class TestContextPressureFlags:
     """Context pressure warning flag tracking on AIAgent."""

-    def test_flags_initialized_false(self, agent):
-        assert agent._context_50_warned is False
-        assert agent._context_70_warned is False
+    def test_flag_initialized_false(self, agent):
+        assert agent._context_pressure_warned is False

     def test_emit_calls_status_callback(self, agent):
         """status_callback should be invoked with event type and message."""
@@ -204,13 +199,11 @@ class TestContextPressureFlags:
         captured = capsys.readouterr()
         assert "▰" not in captured.out

-    def test_flags_reset_on_compression(self, agent):
-        """After _compress_context, context pressure flags should reset."""
-        agent._context_50_warned = True
-        agent._context_70_warned = True
+    def test_flag_reset_on_compression(self, agent):
+        """After _compress_context, context pressure flag should reset."""
+        agent._context_pressure_warned = True
         agent.compression_enabled = True

         # Mock the compressor's compress method to return minimal valid output
         agent.context_compressor = MagicMock()
         agent.context_compressor.compress.return_value = [
             {"role": "user", "content": "Summary of conversation so far."}
@@ -218,11 +211,9 @@ class TestContextPressureFlags:
         agent.context_compressor.context_length = 200_000
         agent.context_compressor.threshold_tokens = 100_000

         # Mock _todo_store
         agent._todo_store = MagicMock()
         agent._todo_store.format_for_injection.return_value = None

         # Mock _build_system_prompt
         agent._build_system_prompt = MagicMock(return_value="system prompt")
         agent._cached_system_prompt = "old system prompt"
         agent._session_db = None
@@ -233,8 +224,7 @@ class TestContextPressureFlags:
         ]
         agent._compress_context(messages, "system prompt")

-        assert agent._context_50_warned is False
-        assert agent._context_70_warned is False
+        assert agent._context_pressure_warned is False

     def test_emit_callback_error_handled(self, agent):
         """If status_callback raises, it should be caught gracefully."""
@@ -1567,6 +1567,20 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
     vision_model = _get_vision_model()
     logger.debug("browser_vision: analysing screenshot (%d bytes)",
                  len(image_data))

+    # Read vision timeout from config (auxiliary.vision.timeout), default 120s.
+    # Local vision models (llama.cpp, ollama) can take well over 30s for
+    # screenshot analysis, so the default must be generous.
+    vision_timeout = 120.0
+    try:
+        from hermes_cli.config import load_config
+        _cfg = load_config()
+        _vt = _cfg.get("auxiliary", {}).get("vision", {}).get("timeout")
+        if _vt is not None:
+            vision_timeout = float(_vt)
+    except Exception:
+        pass
+
     call_kwargs = {
         "task": "vision",
         "messages": [
@@ -1580,6 +1594,7 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
         ],
         "max_tokens": 2000,
         "temperature": 0.1,
+        "timeout": vision_timeout,
     }
     if vision_model:
         call_kwargs["model"] = vision_model
@@ -179,6 +179,58 @@ async def _summarize_session(
     return None


+def _list_recent_sessions(db, limit: int, current_session_id: str = None) -> str:
+    """Return metadata for the most recent sessions (no LLM calls)."""
+    try:
+        sessions = db.list_sessions_rich(limit=limit + 5)  # fetch extra to skip current
+
+        # Resolve current session lineage to exclude it
+        current_root = None
+        if current_session_id:
+            try:
+                sid = current_session_id
+                visited = set()
+                while sid and sid not in visited:
+                    visited.add(sid)
+                    s = db.get_session(sid)
+                    parent = s.get("parent_session_id") if s else None
+                    sid = parent if parent else None
+                current_root = max(visited, key=len) if visited else current_session_id
+            except Exception:
+                current_root = current_session_id
+
+        results = []
+        for s in sessions:
+            sid = s.get("id", "")
+            if current_root and (sid == current_root or sid == current_session_id):
+                continue
+            # Skip child/delegation sessions (they have parent_session_id)
+            if s.get("parent_session_id"):
+                continue
+            results.append({
+                "session_id": sid,
+                "title": s.get("title") or None,
+                "source": s.get("source", ""),
+                "started_at": s.get("started_at", ""),
+                "last_active": s.get("last_active", ""),
+                "message_count": s.get("message_count", 0),
+                "preview": s.get("preview", ""),
+            })
+            if len(results) >= limit:
+                break
+
+        return json.dumps({
+            "success": True,
+            "mode": "recent",
+            "results": results,
+            "count": len(results),
+            "message": f"Showing {len(results)} most recent sessions. Use a keyword query to search specific topics.",
+        }, ensure_ascii=False)
+    except Exception as e:
+        logging.error("Error listing recent sessions: %s", e, exc_info=True)
+        return json.dumps({"success": False, "error": f"Failed to list recent sessions: {e}"}, ensure_ascii=False)
+
+
 def session_search(
     query: str,
     role_filter: str = None,
@@ -195,11 +247,14 @@ def session_search(
     if db is None:
         return json.dumps({"success": False, "error": "Session database not available."}, ensure_ascii=False)

-    limit = min(limit, 5)  # Cap at 5 sessions to avoid excessive LLM calls
-
+    # Recent sessions mode: when query is empty, return metadata for recent sessions.
+    # No LLM calls — just DB queries for titles, previews, timestamps.
     if not query or not query.strip():
-        return json.dumps({"success": False, "error": "Query cannot be empty."}, ensure_ascii=False)
+        return _list_recent_sessions(db, limit, current_session_id)

     query = query.strip()
+    limit = min(limit, 5)  # Cap at 5 sessions to avoid excessive LLM calls

     try:
         # Parse role filter
@@ -364,8 +419,14 @@ def check_session_search_requirements() -> bool:
 SESSION_SEARCH_SCHEMA = {
     "name": "session_search",
     "description": (
-        "Search your long-term memory of past conversations. This is your recall -- "
+        "Search your long-term memory of past conversations, or browse recent sessions. This is your recall -- "
         "every past session is searchable, and this tool summarizes what happened.\n\n"
+        "TWO MODES:\n"
+        "1. Recent sessions (no query): Call with no arguments to see what was worked on recently. "
+        "Returns titles, previews, and timestamps. Zero LLM cost, instant. "
+        "Start here when the user asks what were we working on or what did we do recently.\n"
+        "2. Keyword search (with query): Search for specific topics across all past sessions. "
+        "Returns LLM-generated summaries of matching sessions.\n\n"
         "USE THIS PROACTIVELY when:\n"
         "- The user says 'we did this before', 'remember when', 'last time', 'as I mentioned'\n"
         "- The user asks about a topic you worked on before but don't have in current context\n"
@@ -385,7 +446,7 @@ SESSION_SEARCH_SCHEMA = {
         "properties": {
             "query": {
                 "type": "string",
-                "description": "Search query — keywords, phrases, or boolean expressions to find in past sessions.",
+                "description": "Search query — keywords, phrases, or boolean expressions to find in past sessions. Omit this parameter entirely to browse recent sessions instead (returns titles, previews, timestamps with no LLM cost).",
             },
             "role_filter": {
                 "type": "string",
@@ -397,7 +458,7 @@ SESSION_SEARCH_SCHEMA = {
             "default": 3,
         },
     },
-    "required": ["query"],
+    "required": [],
     },
 }
@@ -410,7 +471,7 @@ registry.register(
     toolset="session_search",
     schema=SESSION_SEARCH_SCHEMA,
     handler=lambda args, **kw: session_search(
-        query=args.get("query", ""),
+        query=args.get("query") or "",
         role_filter=args.get("role_filter"),
         limit=args.get("limit", 3),
         db=kw.get("db"),
@@ -1050,6 +1050,9 @@ def _get_configured_model() -> str:

 def _resolve_trust_level(source: str) -> str:
     """Map a source identifier to a trust level."""
+    # Agent-created skills get their own permissive trust level
+    if source == "agent-created":
+        return "agent-created"
     # Official optional skills shipped with the repo
     if source.startswith("official/") or source == "official":
         return "builtin"
@@ -325,8 +325,9 @@ async def vision_analyze_tool(
     logger.info("Processing image with vision model...")

     # Call the vision API via centralized router.
-    # Read timeout from config.yaml (auxiliary.vision.timeout), default 30s.
-    vision_timeout = 30.0
+    # Read timeout from config.yaml (auxiliary.vision.timeout), default 120s.
+    # Local vision models (llama.cpp, ollama) can take well over 30s.
+    vision_timeout = 120.0
     try:
         from hermes_cli.config import load_config
         _cfg = load_config()
@@ -57,6 +57,15 @@ metadata:
 hermes:
   tags: [Category, Subcategory, Keywords]
   related_skills: [other-skill-name]
+  requires_toolsets: [web]  # Optional — only show when these toolsets are active
+  requires_tools: [web_search]  # Optional — only show when these tools are available
+  fallback_for_toolsets: [browser]  # Optional — hide when these toolsets are active
+  fallback_for_tools: [browser_navigate]  # Optional — hide when these tools exist
+  required_environment_variables:  # Optional — env vars the skill needs
+    - name: MY_API_KEY
+      prompt: "Enter your API key"
+      help: "Get one at https://example.com"
+      required_for: "API access"
 ---

 # Skill Title
@@ -91,6 +100,57 @@ platforms: [windows]  # Windows only

 When set, the skill is automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms. If omitted or empty, the skill loads on all platforms (backward compatible).

+### Conditional Skill Activation
+
+Skills can declare dependencies on specific tools or toolsets. This controls whether the skill appears in the system prompt for a given session.
+
+```yaml
+metadata:
+  hermes:
+    requires_toolsets: [web]  # Hide if the web toolset is NOT active
+    requires_tools: [web_search]  # Hide if web_search tool is NOT available
+    fallback_for_toolsets: [browser]  # Hide if the browser toolset IS active
+    fallback_for_tools: [browser_navigate]  # Hide if browser_navigate IS available
+```
+
+| Field | Behavior |
+|-------|----------|
+| `requires_toolsets` | Skill is **hidden** when ANY listed toolset is **not** available |
+| `requires_tools` | Skill is **hidden** when ANY listed tool is **not** available |
+| `fallback_for_toolsets` | Skill is **hidden** when ANY listed toolset **is** available |
+| `fallback_for_tools` | Skill is **hidden** when ANY listed tool **is** available |
+
+**Use case for `fallback_for_*`:** Create a skill that serves as a workaround when a primary tool isn't available. For example, a `duckduckgo-search` skill with `fallback_for_tools: [web_search]` only shows when the web search tool (which requires an API key) is not configured.
+
+**Use case for `requires_*`:** Create a skill that only makes sense when certain tools are present. For example, a web scraping workflow skill with `requires_toolsets: [web]` won't clutter the prompt when web tools are disabled.
+
+### Environment Variable Requirements
+
+Skills can declare environment variables they need. When a skill is loaded via `skill_view`, its required vars are automatically registered for passthrough into sandboxed execution environments (terminal, execute_code).
+
+```yaml
+required_environment_variables:
+  - name: TENOR_API_KEY
+    prompt: "Tenor API key"  # Shown when prompting user
+    help: "Get your key at https://tenor.com"  # Help text or URL
+    required_for: "GIF search functionality"  # What needs this var
+```
+
+Each entry supports:
+- `name` (required) — the environment variable name
+- `prompt` (optional) — prompt text when asking the user for the value
+- `help` (optional) — help text or URL for obtaining the value
+- `required_for` (optional) — describes which feature needs this variable
+
+Users can also manually configure passthrough variables in `config.yaml`:
+
+```yaml
+terminal:
+  env_passthrough:
+    - MY_CUSTOM_VAR
+    - ANOTHER_VAR
+```
+
 See `skills/apple/` for examples of macOS-only skills.
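The four `requires_*` / `fallback_for_*` fields in the table above reduce to two set-membership checks. A minimal sketch of the visibility rule (hypothetical helper, not the actual skill loader):

```python
def skill_visible(meta, active_toolsets, available_tools):
    """Return True when a skill should appear in the system prompt."""
    active_toolsets = set(active_toolsets)
    available_tools = set(available_tools)
    # requires_*: hide if ANY listed dependency is missing.
    if any(t not in active_toolsets for t in meta.get("requires_toolsets", [])):
        return False
    if any(t not in available_tools for t in meta.get("requires_tools", [])):
        return False
    # fallback_for_*: hide if ANY listed primary IS present.
    if any(t in active_toolsets for t in meta.get("fallback_for_toolsets", [])):
        return False
    if any(t in available_tools for t in meta.get("fallback_for_tools", [])):
        return False
    return True
```

Applied to the documented example, a `duckduckgo-search` skill with `fallback_for_tools: [web_search]` is visible only while `web_search` is absent.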
 ## Secure Setup on Load

@@ -139,7 +139,7 @@ hermes gateway setup  # Interactive platform configuration
 Want microphone input in the CLI or spoken replies in messaging?

 ```bash
-pip install hermes-agent[voice]
+pip install "hermes-agent[voice]"

 # Optional but recommended for free local speech-to-text
 pip install faster-whisper
@@ -57,19 +57,19 @@ If that is not solid yet, fix text mode first.
 ### CLI microphone + playback

 ```bash
-pip install hermes-agent[voice]
+pip install "hermes-agent[voice]"
 ```

 ### Messaging platforms

 ```bash
-pip install hermes-agent[messaging]
+pip install "hermes-agent[messaging]"
 ```

 ### Premium ElevenLabs TTS

 ```bash
-pip install hermes-agent[tts-premium]
+pip install "hermes-agent[tts-premium]"
 ```

 ### Local NeuTTS (optional)
@@ -81,7 +81,7 @@ python -m pip install -U neutts[all]
 ### Everything

 ```bash
-pip install hermes-agent[all]
+pip install "hermes-agent[all]"
 ```

 ## Step 3: install system dependencies

@@ -348,7 +348,7 @@ Configure in `~/.hermes/config.yaml` under your gateway's settings. See the [Mes
 **Solution:**
 ```bash
 # Install messaging dependencies
-pip install hermes-agent[telegram]  # or [discord], [slack], [whatsapp]
+pip install "hermes-agent[telegram]"  # or [discord], [slack], [whatsapp]

 # Check for port conflicts
 lsof -i :8080
@@ -55,6 +55,22 @@ Settings are resolved in this order (highest priority first):
 Secrets (API keys, bot tokens, passwords) go in `.env`. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in `config.yaml`. When both are set, `config.yaml` wins for non-secret settings.
 :::

+## Environment Variable Substitution
+
+You can reference environment variables in `config.yaml` using `${VAR_NAME}` syntax:
+
+```yaml
+auxiliary:
+  vision:
+    api_key: ${GOOGLE_API_KEY}
+    base_url: ${CUSTOM_VISION_URL}
+
+delegation:
+  api_key: ${DELEGATION_KEY}
+```
+
+Multiple references in a single value work: `url: "${HOST}:${PORT}"`. If a referenced variable is not set, the placeholder is kept verbatim (`${UNDEFINED_VAR}` stays as-is). Only the `${VAR}` syntax is supported — bare `$VAR` is not expanded.
+
 ## Inference Providers

 You need at least one way to connect to an LLM. Use `hermes model` to switch providers and models interactively, or configure directly:
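The substitution rule documented in the hunk above (expand `${VAR}`, keep unresolved placeholders verbatim, ignore bare `$VAR`) can be sketched with a small regex helper. This is illustrative, not the actual config loader:

```python
import os
import re

# Matches ${VAR_NAME} only; a bare $VAR never matches this pattern.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def expand_env(value, env=None):
    """Expand ${VAR} references; unset variables keep their placeholder."""
    env = os.environ if env is None else env
    return _VAR.sub(lambda m: env.get(m.group(1), m.group(0)), value)
```

Multiple references in one string expand independently, which is how `url: "${HOST}:${PORT}"` works.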
@@ -320,7 +336,7 @@ vLLM supports tool calling, structured output, and multi-modal models. Use `--en

 ```bash
 # Start SGLang server
-pip install sglang[all]
+pip install "sglang[all]"
 python -m sglang.launch_server \
   --model meta-llama/Llama-3.1-70B-Instruct \
   --port 8000 \
@@ -363,7 +379,7 @@ Download GGUF models from [Hugging Face](https://huggingface.co/models?library=g

 ```bash
 # Install and start
-pip install litellm[proxy]
+pip install "litellm[proxy]"
 litellm --model anthropic/claude-sonnet-4 --port 4000

 # Or with a config file for multiple models:
@@ -1329,6 +1345,23 @@ Usage: type `/status`, `/disk`, `/update`, or `/gpu` in the CLI or any messaging
|
||||
- **Type** — only `exec` is supported (runs a shell command); other types show an error
|
||||
- **Works everywhere** — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant
|
||||
|
||||
## Gateway Streaming
|
||||
|
||||
Enable progressive token delivery on messaging platforms. When streaming is enabled, responses appear character-by-character in Telegram, Discord, and Slack via message editing, rather than waiting for the full response.
|
||||
|
||||
```yaml
|
||||
streaming:
|
||||
enabled: false # Enable streaming token delivery (default: off)
|
||||
transport: edit # "edit" (progressive message editing) or "off"
|
||||
edit_interval: 0.3 # Min seconds between message edits
|
||||
buffer_threshold: 40 # Characters accumulated before forcing an edit
|
||||
cursor: " ▉" # Cursor character shown during streaming
|
||||
```
|
||||
|
||||
**Platform support:** Telegram, Discord, and Slack support edit-based streaming. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt — streaming is gracefully disabled for that session with no flood of messages.
|
||||
|
||||
**Overflow handling:** If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.
|
||||
|
||||
## Human Delay
|
||||
|
||||
Simulate human-like response pacing in messaging platforms:
|
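The interaction of `edit_interval` and `buffer_threshold` in the streaming config above amounts to a small rate limiter: flush the buffer once the minimum interval between edits has elapsed, or early when enough characters pile up. A sketch under those assumptions (the real gateway logic may differ):

```python
import time

class EditThrottle:
    """Decide when a streamed text buffer should be flushed via message edit."""

    def __init__(self, edit_interval=0.3, buffer_threshold=40):
        self.edit_interval = edit_interval
        self.buffer_threshold = buffer_threshold
        self.last_edit = float("-inf")
        self.pending = ""

    def feed(self, delta, now=None):
        """Accumulate a token delta; return text to apply as an edit, or None."""
        now = time.monotonic() if now is None else now
        self.pending += delta
        due = (now - self.last_edit) >= self.edit_interval
        forced = len(self.pending) >= self.buffer_threshold
        if self.pending and (due or forced):
            self.last_edit = now
            out, self.pending = self.pending, ""
            return out
        return None
```

Small deltas coalesce between edits, which keeps the platform's edit-rate limits happy while the message still grows visibly.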
@@ -1350,6 +1383,27 @@ code_execution:
   max_tool_calls: 50  # Max tool calls within code execution
 ```

+## Web Search Backends
+
+The `web_search`, `web_extract`, and `web_crawl` tools support three backend providers. Configure the backend in `config.yaml` or via `hermes tools`:
+
+```yaml
+web:
+  backend: firecrawl  # firecrawl | parallel | tavily
+```
+
+| Backend | Env Var | Search | Extract | Crawl |
+|---------|---------|--------|---------|-------|
+| **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ |
+| **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — |
+| **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ |
+
+**Backend selection:** If `web.backend` is not set, the backend is auto-detected from available API keys. If only `TAVILY_API_KEY` is set, Tavily is used. If only `PARALLEL_API_KEY` is set, Parallel is used. Otherwise Firecrawl is the default.
+
+**Self-hosted Firecrawl:** Set `FIRECRAWL_API_URL` to point at your own instance. When a custom URL is set, the API key becomes optional (set `USE_DB_AUTHENTICATION=false` on the server to disable auth).
+
+**Parallel search modes:** Set `PARALLEL_SEARCH_MODE` to control search behavior — `fast`, `one-shot`, or `agentic` (default: `agentic`).
+
 ## Browser

 Configure browser automation behavior:
||||
@@ -231,6 +231,6 @@ Any frontend that supports the OpenAI API format works. Tested/documented integr
|
||||
|
||||
## Limitations
|
||||
|
||||
- **Response storage is in-memory** — stored responses (for `previous_response_id`) are lost on gateway restart. Max 100 stored responses (LRU eviction).
|
||||
- **Response storage** — stored responses (for `previous_response_id`) are persisted in SQLite and survive gateway restarts. Max 100 stored responses (LRU eviction).
|
||||
- **No file upload** — vision/document analysis via uploaded files is not yet supported through the API.
|
||||
- **Model field is cosmetic** — the `model` field in requests is accepted but the actual LLM model used is configured server-side in config.yaml.
|
||||
|
||||
109
website/docs/user-guide/features/context-references.md
Normal file
109
website/docs/user-guide/features/context-references.md
Normal file
@@ -0,0 +1,109 @@

---
sidebar_position: 9
title: "Context References"
description: "Inline @-syntax for attaching files, folders, git diffs, and URLs directly into your messages"
---

# Context References

Type `@` followed by a reference to inject content directly into your message. Hermes expands the reference inline and appends the content under an `--- Attached Context ---` section.

## Supported References

| Syntax | Description |
|--------|-------------|
| `@file:path/to/file.py` | Inject file contents |
| `@file:path/to/file.py:10-25` | Inject specific line range (1-indexed, inclusive) |
| `@folder:path/to/dir` | Inject directory tree listing with file metadata |
| `@diff` | Inject `git diff` (unstaged working tree changes) |
| `@staged` | Inject `git diff --staged` (staged changes) |
| `@git:5` | Inject last N commits with patches (max 10) |
| `@url:https://example.com` | Fetch and inject web page content |

## Usage Examples

```text
Review @file:src/main.py and suggest improvements

What changed? @diff

Compare @file:old_config.yaml and @file:new_config.yaml

What's in @folder:src/components?

Summarize this article @url:https://arxiv.org/abs/2301.00001
```

Multiple references work in a single message:

```text
Check @file:main.py, and also @file:test.py.
```

Trailing punctuation (`,`, `.`, `;`, `!`, `?`) is automatically stripped from reference values.
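Reference tokenization with punctuation stripping can be sketched like this. The regex and function are hypothetical, a simplified model of the documented behavior rather than the real Hermes parser:

```python
import re

# Hypothetical pattern: valued refs (@file:, @folder:, @git:, @url:) or bare refs.
REF_PATTERN = re.compile(r"@(file|folder|git|url):(\S+)|@(diff|staged)\b")
TRAILING_PUNCT = ",.;!?"

def extract_refs(message):
    """Return (kind, value) pairs for every @-reference in a message."""
    refs = []
    for m in REF_PATTERN.finditer(message):
        if m.group(3):                                 # bare @diff / @staged
            refs.append((m.group(3), None))
        else:
            value = m.group(2).rstrip(TRAILING_PUNCT)  # strip trailing punctuation
            refs.append((m.group(1), value))
    return refs
```

For example, `extract_refs("Check @file:main.py, and also @file:test.py.")` returns both file references with the comma and period stripped.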
## CLI Tab Completion

In the interactive CLI, typing `@` triggers autocomplete:

- `@` shows all reference types (`@diff`, `@staged`, `@file:`, `@folder:`, `@git:`, `@url:`)
- `@file:` and `@folder:` trigger filesystem path completion with file size metadata
- Bare `@` followed by partial text shows matching files and folders from the current directory

## Line Ranges

The `@file:` reference supports line ranges for precise content injection:

```text
@file:src/main.py:42       # Single line 42
@file:src/main.py:10-25    # Lines 10 through 25 (inclusive)
```

Lines are 1-indexed. Invalid ranges are silently ignored (full file is returned).
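The range semantics above can be sketched as follows (an illustrative helper, assuming the documented fallback behavior):

```python
def apply_line_range(content, start, end=None):
    """Return lines start..end (1-indexed, inclusive); fall back to the full
    content when the range is invalid, mirroring the documented behavior."""
    lines = content.splitlines()
    if end is None:
        end = start                    # "@file:path:42" means a single line
    if start < 1 or end < start or start > len(lines):
        return content                 # invalid range: silently ignored
    return "\n".join(lines[start - 1:end])
```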
## Size Limits

Context references are bounded to prevent overwhelming the model's context window:

| Threshold | Value | Behavior |
|-----------|-------|----------|
| Soft limit | 25% of context length | Warning appended, expansion proceeds |
| Hard limit | 50% of context length | Expansion refused, original message returned unchanged |
| Folder entries | 200 files max | Excess entries replaced with `- ...` |
| Git commits | 10 max | `@git:N` clamped to range [1, 10] |
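The soft/hard threshold logic reduces to a simple check (a sketch of the table above; the real implementation likely measures tokens, not characters):

```python
def check_expansion_size(expanded_chars, context_length_chars):
    """Apply the soft (25%) and hard (50%) context-budget thresholds."""
    if expanded_chars > context_length_chars * 0.50:
        return "refuse"   # hard limit: original message returned unchanged
    if expanded_chars > context_length_chars * 0.25:
        return "warn"     # soft limit: warning appended, expansion proceeds
    return "ok"
```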
## Security

### Sensitive Path Blocking

These paths are always blocked from `@file:` references to prevent credential exposure:

- SSH keys and config: `~/.ssh/id_rsa`, `~/.ssh/id_ed25519`, `~/.ssh/authorized_keys`, `~/.ssh/config`
- Shell profiles: `~/.bashrc`, `~/.zshrc`, `~/.profile`, `~/.bash_profile`, `~/.zprofile`
- Credential files: `~/.netrc`, `~/.pgpass`, `~/.npmrc`, `~/.pypirc`
- Hermes env: `$HERMES_HOME/.env`

These directories are fully blocked (any file inside):

- `~/.ssh/`, `~/.aws/`, `~/.gnupg/`, `~/.kube/`, `$HERMES_HOME/skills/.hub/`

### Path Traversal Protection

All paths are resolved relative to the working directory. References that resolve outside the allowed workspace root are rejected.
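A containment check of this kind is typically built on resolved absolute paths. A minimal sketch (illustrative function name; the real check also needs to consider symlinks, which `resolve()` follows):

```python
from pathlib import Path

def resolve_in_workspace(workspace, ref):
    """Resolve ref relative to the workspace root, rejecting escapes."""
    root = Path(workspace).resolve()
    target = (root / ref).resolve()
    if root != target and root not in target.parents:
        raise ValueError("path is outside the allowed workspace")
    return target
```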
### Binary File Detection

Binary files are detected via MIME type and null-byte scanning. Known text extensions (`.py`, `.md`, `.json`, `.yaml`, `.toml`, `.js`, `.ts`, etc.) bypass MIME-based detection. Binary files are rejected with a warning.
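The null-byte heuristic with an extension fast path can be sketched as (a simplified model; the real detector also consults MIME types):

```python
import os

TEXT_EXTENSIONS = {".py", ".md", ".json", ".yaml", ".toml", ".js", ".ts"}

def looks_binary(path, sample):
    """Return True when a file's leading bytes suggest binary content."""
    if os.path.splitext(path)[1].lower() in TEXT_EXTENSIONS:
        return False                  # known text extensions bypass detection
    return b"\x00" in sample[:8192]   # null byte in the head => binary
```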
## Error Handling

Invalid references produce inline warnings rather than failures:

| Condition | Behavior |
|-----------|----------|
| File not found | Warning: "file not found" |
| Binary file | Warning: "binary files are not supported" |
| Folder not found | Warning: "folder not found" |
| Git command fails | Warning with git stderr |
| URL returns no content | Warning: "no content extracted" |
| Sensitive path | Warning: "path is a sensitive credential file" |
| Path outside workspace | Warning: "path is outside the allowed workspace" |
@@ -6,9 +6,20 @@ description: "Run custom code at key lifecycle points — log activity, send ale

# Event Hooks

-The hooks system lets you run custom code at key points in the agent lifecycle — session creation, slash commands, each tool-calling step, and more. Hooks fire automatically during gateway operation without blocking the main agent pipeline.
+Hermes has two hook systems that run custom code at key lifecycle points:
+
+| System | Registered via | Runs in | Use case |
+|--------|---------------|---------|----------|
+| **[Gateway hooks](#gateway-event-hooks)** | `HOOK.yaml` + `handler.py` in `~/.hermes/hooks/` | Gateway only | Logging, alerts, webhooks |
+| **[Plugin hooks](#plugin-hooks)** | `ctx.register_hook()` in a [plugin](/docs/user-guide/features/plugins) | CLI + Gateway | Tool interception, metrics, guardrails |
+
+Both systems are non-blocking — errors in any hook are caught and logged, never crashing the agent.
+
+## Gateway Event Hooks
+
+Gateway hooks fire automatically during gateway operation (Telegram, Discord, Slack, WhatsApp) without blocking the main agent pipeline.

-## Creating a Hook
+### Creating a Hook

Each hook is a directory under `~/.hermes/hooks/` containing two files:

@@ -19,7 +30,7 @@ Each hook is a directory under `~/.hermes/hooks/` containing two files:

└── handler.py    # Python handler function
```
-### HOOK.yaml
+#### HOOK.yaml

```yaml
name: my-hook

@@ -32,7 +43,7 @@ events:

The `events` list determines which events trigger your handler. You can subscribe to any combination of events, including wildcards like `command:*`.

-### handler.py
+#### handler.py

```python
import json

@@ -58,25 +69,26 @@ async def handle(event_type: str, context: dict):

- Can be `async def` or regular `def` — both work
- Errors are caught and logged, never crashing the agent
-## Available Events
+### Available Events

| Event | When it fires | Context keys |
|-------|---------------|--------------|
| `gateway:startup` | Gateway process starts | `platforms` (list of active platform names) |
| `session:start` | New messaging session created | `platform`, `user_id`, `session_id`, `session_key` |
| `session:end` | Session ended (before reset) | `platform`, `user_id`, `session_key` |
| `session:reset` | User ran `/new` or `/reset` | `platform`, `user_id`, `session_key` |
| `agent:start` | Agent begins processing a message | `platform`, `user_id`, `session_id`, `message` |
| `agent:step` | Each iteration of the tool-calling loop | `platform`, `user_id`, `session_id`, `iteration`, `tool_names` |
| `agent:end` | Agent finishes processing | `platform`, `user_id`, `session_id`, `message`, `response` |
| `command:*` | Any slash command executed | `platform`, `user_id`, `command`, `args` |

-### Wildcard Matching
+#### Wildcard Matching

Handlers registered for `command:*` fire for any `command:` event (`command:model`, `command:reset`, etc.). Monitor all slash commands with a single subscription.
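This kind of glob matching maps directly onto the standard library's `fnmatch` (a sketch of the matching rule, not necessarily how Hermes implements it):

```python
from fnmatch import fnmatch

def matches(subscription, event_type):
    """True when a subscription pattern covers an event type."""
    return fnmatch(event_type, subscription)
```

A handler subscribed to `command:*` thus fires for `command:model`, `command:reset`, and any other `command:` event, but not for `session:start`.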
-## Examples
+### Examples

-### Telegram Alert on Long Tasks
+#### Telegram Alert on Long Tasks

Send yourself a message when the agent takes more than 10 steps:

@@ -109,7 +121,7 @@ async def handle(event_type: str, context: dict):

    )
```

-### Command Usage Logger
+#### Command Usage Logger

Track which slash commands are used:

@@ -142,7 +154,7 @@ def handle(event_type: str, context: dict):

        f.write(json.dumps(entry) + "\n")
```

-### Session Start Webhook
+#### Session Start Webhook

POST to an external service on new sessions:

@@ -169,7 +181,7 @@ async def handle(event_type: str, context: dict):

    }, timeout=5)
```

-## How It Works
+### How It Works

1. On gateway startup, `HookRegistry.discover_and_load()` scans `~/.hermes/hooks/`
2. Each subdirectory with `HOOK.yaml` + `handler.py` is loaded dynamically

@@ -178,5 +190,51 @@ async def handle(event_type: str, context: dict):

5. Errors in any handler are caught and logged — a broken hook never crashes the agent
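The dispatch side of the steps above can be sketched as follows. This is a hypothetical shape, not the real `HookRegistry` API; it shows the documented behavior of supporting both sync and async handlers and swallowing handler errors:

```python
import asyncio
import inspect
from fnmatch import fnmatch

class HookRegistry:
    def __init__(self):
        self.handlers = {}  # event pattern -> list of callables

    def register(self, pattern, handler):
        self.handlers.setdefault(pattern, []).append(handler)

    async def emit(self, event_type, context):
        for pattern, handlers in self.handlers.items():
            if not fnmatch(event_type, pattern):
                continue
            for handler in handlers:
                try:
                    result = handler(event_type, context)
                    if inspect.isawaitable(result):     # async def and def both work
                        await result
                except Exception as exc:                # a broken hook never crashes
                    print(f"hook error (ignored): {exc}")
```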
:::info
-Hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not currently load hooks.
+Gateway hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not load gateway hooks. For hooks that work everywhere, use [plugin hooks](#plugin-hooks).
:::
## Plugin Hooks

[Plugins](/docs/user-guide/features/plugins) can register hooks that fire in **both CLI and gateway** sessions. These are registered programmatically via `ctx.register_hook()` in your plugin's `register()` function.

```python
def register(ctx):
    ctx.register_hook("pre_tool_call", my_callback)
    ctx.register_hook("post_tool_call", my_callback)
```

### Available Plugin Hooks

| Hook | Fires when | Callback receives |
|------|-----------|-------------------|
| `pre_tool_call` | Before any tool executes | `tool_name`, `args`, `task_id` |
| `post_tool_call` | After any tool returns | `tool_name`, `args`, `result`, `task_id` |
| `pre_llm_call` | Before LLM API request | *(planned — not yet wired)* |
| `post_llm_call` | After LLM API response | *(planned — not yet wired)* |
| `on_session_start` | Session begins | *(planned — not yet wired)* |
| `on_session_end` | Session ends | *(planned — not yet wired)* |

Callbacks receive keyword arguments matching the columns above:

```python
def my_callback(**kwargs):
    tool = kwargs["tool_name"]
    args = kwargs["args"]
    # ...
```
### Example: Block Dangerous Tools

```python
# ~/.hermes/plugins/tool-guard/__init__.py
BLOCKED = {"terminal", "write_file"}

def guard(**kwargs):
    if kwargs["tool_name"] in BLOCKED:
        print(f"⚠ Blocked tool call: {kwargs['tool_name']}")

def register(ctx):
    ctx.register_hook("pre_tool_call", guard)
```

See the **[Plugins guide](/docs/user-guide/features/plugins)** for full details on creating plugins.
@@ -46,14 +46,16 @@ Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable

## Available hooks

Plugins can register callbacks for these lifecycle events. See the **[Event Hooks page](/docs/user-guide/features/hooks#plugin-hooks)** for full details, callback signatures, and examples.

| Hook | Fires when |
|------|-----------|
| `pre_tool_call` | Before any tool executes |
| `post_tool_call` | After any tool returns |
-| `pre_llm_call` | Before LLM API request |
-| `post_llm_call` | After LLM API response |
-| `on_session_start` | Session begins |
-| `on_session_end` | Session ends |
+| `pre_llm_call` | Before LLM API request *(planned)* |
+| `post_llm_call` | After LLM API response *(planned)* |
+| `on_session_start` | Session begins *(planned)* |
+| `on_session_end` | Session ends *(planned)* |

## Slash commands
@@ -36,19 +36,19 @@ The `~/.hermes/` directory and default `config.yaml` are created automatically t

```bash
# CLI voice mode (microphone + audio playback)
-pip install hermes-agent[voice]
+pip install "hermes-agent[voice]"

# Discord + Telegram messaging (includes discord.py[voice] for VC support)
-pip install hermes-agent[messaging]
+pip install "hermes-agent[messaging]"

# Premium TTS (ElevenLabs)
-pip install hermes-agent[tts-premium]
+pip install "hermes-agent[tts-premium]"

# Local TTS (NeuTTS, optional)
python -m pip install -U neutts[all]

# Everything at once
-pip install hermes-agent[all]
+pip install "hermes-agent[all]"
```

| Extra | Packages | Required For |
@@ -358,6 +358,42 @@ When a blocked URL is requested, the tool returns an error explaining the domain

See [Website Blocklist](/docs/user-guide/configuration#website-blocklist) in the configuration guide for full details.

### SSRF Protection

All URL-capable tools (web search, web extract, vision, browser) validate URLs before fetching them to prevent Server-Side Request Forgery (SSRF) attacks. Blocked addresses include:

- **Private networks** (RFC 1918): `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`
- **Loopback**: `127.0.0.0/8`, `::1`
- **Link-local**: `169.254.0.0/16` (includes cloud metadata at `169.254.169.254`)
- **CGNAT / shared address space** (RFC 6598): `100.64.0.0/10` (Tailscale, WireGuard VPNs)
- **Cloud metadata hostnames**: `metadata.google.internal`, `metadata.goog`
- **Reserved, multicast, and unspecified addresses**

SSRF protection is always active and cannot be disabled. DNS failures are treated as blocked (fail-closed). Redirect chains are re-validated at each hop to prevent redirect-based bypasses.
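The fail-closed check described above can be sketched with the standard library's `ipaddress` and `socket` modules (an illustration, not the actual Hermes validator):

```python
import ipaddress
import socket

BLOCKED_HOSTNAMES = {"metadata.google.internal", "metadata.goog"}

def is_blocked_address(host):
    """Fail-closed SSRF check: resolve the host and reject any private,
    loopback, link-local, CGNAT, reserved, or multicast address."""
    if host.lower() in BLOCKED_HOSTNAMES:
        return True
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True                     # DNS failure: treat as blocked
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_multicast or ip.is_unspecified
                or ip in ipaddress.ip_network("100.64.0.0/10")):
            return True
    return False
```

Note that a real implementation must run this check again at every redirect hop, as the document describes.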
### Tirith Pre-Exec Security Scanning

Hermes integrates [tirith](https://github.com/sheeki03/tirith) for content-level command scanning before execution. Tirith detects threats that pattern matching alone misses:

- Homograph URL spoofing (internationalized domain attacks)
- Pipe-to-interpreter patterns (`curl | bash`, `wget | sh`)
- Terminal injection attacks

Tirith auto-installs from GitHub releases on first use with SHA-256 checksum verification (and cosign provenance verification if cosign is available).

```yaml
# In ~/.hermes/config.yaml
security:
  tirith_enabled: true     # Enable/disable tirith scanning (default: true)
  tirith_path: "tirith"    # Path to tirith binary (default: PATH lookup)
  tirith_timeout: 5        # Subprocess timeout in seconds
  tirith_fail_open: true   # Allow execution when tirith is unavailable (default: true)
```

When `tirith_fail_open` is `true` (default), commands proceed if tirith is not installed or times out. Set to `false` in high-security environments to block commands when tirith is unavailable.

Tirith's verdict integrates with the approval flow: safe commands pass through, suspicious commands trigger user approval, and dangerous commands are blocked.
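That verdict-to-action mapping, combined with the fail-open setting, reduces to a small decision table. The verdict strings and function here are illustrative, not tirith's actual output format:

```python
def action_for_verdict(verdict, fail_open=True):
    """Map a scan verdict onto the approval flow described above."""
    table = {
        "safe": "allow",          # passes straight through
        "suspicious": "approve",  # prompt the user for approval
        "dangerous": "block",     # refused outright
    }
    if verdict not in table:      # scanner unavailable or unknown verdict
        return "allow" if fail_open else "block"
    return table[verdict]
```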
### Context File Injection Protection

Context files (AGENTS.md, .cursorrules, SOUL.md) are scanned for prompt injection before being included in the system prompt. The scanner checks for:

@@ -114,7 +114,13 @@ Session IDs follow the format `YYYYMMDD_HHMMSS_<8-char-hex>`, e.g. `20250305_091

Give sessions human-readable titles so you can find and resume them easily.

-### Setting a Title
+### Auto-Generated Titles
+
+Hermes automatically generates a short descriptive title (3–7 words) for each session after the first exchange. This runs in a background thread using a fast auxiliary model, so it adds no latency. You'll see auto-generated titles when browsing sessions with `hermes sessions list` or `hermes sessions browse`.
+
+Auto-titling only fires once per session and is skipped if you've already set a title manually.
+
+### Setting a Title Manually

Use the `/title` slash command inside any chat session (CLI or gateway):