- Add `shopt -s expand_aliases` to snapshot so aliases captured by
`alias -p` actually work under `bash -c` (review comment #2)
- Pass threshold=0 in enforce_turn_budget() so L3 can force-persist
results below the 50K default when aggregate budget is exceeded
(review comment #3)
- Add regression test: 6x42K results (each under 50K) exceeding 200K
budget are now correctly persisted
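A minimal sketch of why threshold=0 matters for the aggregate budget. The function body, PREVIEW_CHARS, and the largest-first ordering are assumptions for illustration, not the code in tool_result_storage.py:

```python
from typing import List

PREVIEW_CHARS = 2_000  # assumption: size of the inline preview kept after persisting


def enforce_turn_budget(results: List[str], budget: int = 200_000,
                        threshold: int = 0) -> List[str]:
    """Shrink the largest results to previews until the turn fits the budget."""
    out = list(results)
    # Visit results largest-first so as few results as possible are persisted.
    order = sorted(range(len(out)), key=lambda i: len(out[i]), reverse=True)
    for i in order:
        if sum(len(r) for r in out) <= budget:
            break
        if len(out[i]) <= threshold:
            continue  # not eligible under this threshold
        out[i] = out[i][:PREVIEW_CHARS]  # stand-in for "write to disk, keep preview"
    return out
```

With a 50K threshold, six 42K results are never eligible and the turn stays over budget; with threshold=0 the loop can persist results until the total fits.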
- Daytona: skip refresh_data() API call unless sandbox was interrupted/errored
- Docker: cache _build_forward_env_args() to avoid re-reading .env every command
- All remote backends: TTL-based sync skip (5s) to avoid redundant dir walks
Expanded tool_result_storage.py module docstring to document the
three-level architecture. Replaced opaque L2/L3 labels at call
sites with self-describing comments.
Eliminates ~50 lines of duplicated pipe+thread+poll boilerplate between
_ModalProcessHandle and _DaytonaProcessHandle. Both now use closures
passed to the shared _ThreadedProcessHandle in base.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SSH _before_execute() ran rsync unconditionally before every command,
adding ~2.3s overhead even when zero bytes were transferred. This was
80% of per-command latency (actual execution: ~0.6s).
Add (mtime, size) caching — matching the pattern Modal and Daytona
already use — to skip rsync when local files haven't changed:
- Per-file mtime+size check for credential files
- Directory fingerprint (set of relpath/mtime/size tuples) for skills
- --delete flag on skills rsync to prune uninstalled skills
- Track created remote dirs to avoid redundant mkdir -p calls
- Cache invalidation on rsync failure (remote may have been wiped)
- force=True parameter as escape hatch for debugging
Before: ~3s per SSH command (2.3s rsync + 0.6s execution)
After: ~0.6s per SSH command (mtime check + execution)
SSH test suite: 134s → 50s
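The directory fingerprint check described above could look roughly like this. SyncCache and its method names are hypothetical; only the (relpath, mtime, size) idea comes from this commit:

```python
import os
from typing import Dict, FrozenSet, Tuple

Fingerprint = FrozenSet[Tuple[str, int, int]]


def dir_fingerprint(root: str) -> Fingerprint:
    """Collect (relpath, mtime_ns, size) for every file under root."""
    entries = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            entries.add((os.path.relpath(full, root), st.st_mtime_ns, st.st_size))
    return frozenset(entries)


class SyncCache:
    """Remembers the last fingerprint per directory to decide whether to rsync."""

    def __init__(self) -> None:
        self._last: Dict[str, Fingerprint] = {}

    def needs_sync(self, root: str, force: bool = False) -> bool:
        current = dir_fingerprint(root)
        if not force and self._last.get(root) == current:
            return False  # nothing changed locally, skip rsync
        self._last[root] = current
        return True

    def invalidate(self, root: str) -> None:
        """Call after a failed rsync: the remote may have been wiped."""
        self._last.pop(root, None)
```

The caller is expected to call invalidate() when rsync fails, so the next command retries the sync instead of trusting a stale fingerprint.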
Previously, _wrap_command() wrote pwd to a file on the remote (container,
sandbox, SSH host), then _update_cwd_from_file() read it back via another
_run_bash() call. On Modal/Daytona this was a full API round-trip just to
read 20 bytes.
Now the wrapping template echoes the cwd to stdout with markers:
    printf '\n__HERMES_CWD__%s__HERMES_CWD__\n' "$(pwd -P)"
_extract_cwd_from_output() parses it from the output already in memory.
Zero extra round-trips on any backend. The cwdfile, _read_file_in_env(),
and per-backend overrides are all deleted.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
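A sketch of the marker parsing, assuming the printf format shown above. The regex and the return shape are illustrative and may differ from the real _extract_cwd_from_output():

```python
import re
from typing import Optional, Tuple

_CWD_RE = re.compile(r"\n?__HERMES_CWD__(.*?)__HERMES_CWD__\n?", re.DOTALL)


def extract_cwd_from_output(output: str) -> Tuple[str, Optional[str]]:
    """Return (output without the marker, cwd) using the last marker found."""
    matches = list(_CWD_RE.finditer(output))
    if not matches:
        return output, None  # command died before the trailer was printed
    last = matches[-1]
    cleaned = output[:last.start()] + output[last.end():]
    return cleaned, last.group(1)
```

Taking the last match means a command that happens to print the marker text itself does not override the trailer appended by the wrapper.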
Declares max_result_size_chars on each tool registration so the persistence
layer can apply per-tool limits instead of the global 50K default. Adds a
Layer 1 output cap inside search_tool() to prevent context overflow, and
adds a schema maximum of 10000 to the search_files limit parameter.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add tools/tool_result_storage.py implementing Layer 2 (per-result) and
Layer 3 (per-turn budget) persistence for large tool outputs. Results
exceeding thresholds are written to disk with a <persisted-output>
preview block replacing the inline content. Extend ToolEntry and
ToolRegistry with max_result_size_chars for per-tool threshold control.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update execute tests to account for init_session during __init__
- Fix CWD resolution tests for cwdfile reads
- Patch is_interrupted at base module level (where _wait_for_process uses it)
- Update stdin heredoc test for new call pattern
- 27/27 Daytona unit tests passing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Delete persistent_shell.py entirely (277 lines removed)
- Remove _SHELL_NOISE_SUBSTRINGS, _clean_shell_noise, _extract_fenced_output
from local.py (unused after fence marker removal)
- Adapt ManagedModalEnvironment to use BaseEnvironment + _wrap_command()
while keeping its own HTTP-based execute()
- Remove _OUTPUT_FENCE constant
42/42 tests passing across all testable backends.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:
1. Replace all stale 'hermes login' references with 'hermes auth' across
auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py,
and documentation. The 'hermes login' command was deprecated; 'hermes auth'
now handles OAuth credential management.
2. Fix credential removal not persisting for singleton-sourced credentials
(device_code for openai-codex/nous, hermes_pkce for anthropic).
auth_remove_command already cleared env vars for env-sourced credentials,
but singleton credentials stored in the auth store were re-seeded by
_seed_from_singletons() on the next load_pool() call. Now clears the
underlying auth store entry when removing singleton-sourced credentials.
* feat(tools): add Firecrawl cloud browser provider
Adds Firecrawl (https://firecrawl.dev) as a cloud browser provider
alongside Browserbase and Browser Use. All browser tools route through
Firecrawl's cloud browser via CDP when selected.
- tools/browser_providers/firecrawl.py — FirecrawlProvider
- tools/browser_tool.py — register in _PROVIDER_REGISTRY
- hermes_cli/tools_config.py — add to onboarding provider picker
- hermes_cli/setup.py — add to setup summary
- hermes_cli/config.py — add FIRECRAWL_BROWSER_TTL config
- website/docs/ — browser docs and env var reference
Based on #4490 by @developersdigest.
Co-Authored-By: Developers Digest <124798203+developersdigest@users.noreply.github.com>
* refactor: simplify FirecrawlProvider.emergency_cleanup
Use self._headers() and self._api_url() instead of duplicating
env-var reads and header construction.
* fix: recognize Firecrawl in subscription browser detection
_resolve_browser_feature_state() now handles "firecrawl" as a direct
browser provider (same pattern as "browser-use"), so hermes setup
summary correctly shows "Browser Automation (Firecrawl)" instead of
misreporting as "Local browser".
Also fixes test_config_version_unchanged assertion (11 → 12).
---------
Co-authored-by: Developers Digest <124798203+developersdigest@users.noreply.github.com>
When parent_agent.enabled_toolsets is None (the default, meaning all tools
are enabled), subagents incorrectly fell back to DEFAULT_TOOLSETS
(['terminal', 'file', 'web']) instead of inheriting the parent's full
toolset.
Root cause:
- Line 188 used 'or' fallback: None or DEFAULT_TOOLSETS evaluates to
DEFAULT_TOOLSETS
- Line 192 checked truthiness: None is falsy, falling through to else
Fix:
- Use 'is not None' checks instead of truthiness
- When enabled_toolsets is None, derive effective toolsets from
parent_agent.valid_tool_names via the tool registry
Fixes the bug introduced in f75b1d21b and repeated in e5d14445e (PR #3269).
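The difference between the truthiness fallback and the explicit None check, as a small sketch. resolve_child_toolsets and parent_all_toolsets are hypothetical names; the real code derives the full set from parent_agent.valid_tool_names:

```python
from typing import List, Optional

DEFAULT_TOOLSETS = ["terminal", "file", "web"]


def resolve_child_toolsets(requested: Optional[List[str]],
                           parent_enabled: Optional[List[str]],
                           parent_all_toolsets: List[str]) -> List[str]:
    """Pick the child's toolsets; None means "not specified", not "empty"."""
    if requested is not None:
        return list(requested)
    if parent_enabled is not None:
        return list(parent_enabled)
    # Parent has everything enabled: inherit the full set, not the defaults.
    return list(parent_all_toolsets)
```

The old pattern, `parent_enabled or DEFAULT_TOOLSETS`, silently narrows the child to the defaults whenever the parent value is None.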
Fix shell injection via unquoted workdir interpolation in the docker,
singularity, and SSH backends. When workdir contained shell metacharacters
(e.g. ~/;id), arbitrary commands could execute.
Changes:
- Add shlex.quote() at each interpolation point in docker.py,
singularity.py, and ssh.py with tilde-aware quoting (keep ~
unquoted for shell expansion, quote only the subpath)
- Add _validate_workdir() allowlist in terminal_tool.py as
defense-in-depth before workdir reaches any backend
Original work by Mariano A. Nicolini (PR #5620). Salvaged with fixes
for tilde expansion (shlex.quote breaks cd ~/path); the incomplete
deny-list is replaced with a strict character allowlist.
Co-authored-by: Mariano A. Nicolini <entropidelic@users.noreply.github.com>
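Tilde-aware quoting can be sketched as follows. quote_workdir is a hypothetical helper name; it handles only a bare ~ and the ~/ prefix, and ~user forms are quoted whole and therefore not expanded:

```python
import shlex


def quote_workdir(workdir: str) -> str:
    """Quote a workdir for shell interpolation while keeping ~ expandable."""
    if workdir == "~":
        return "~"
    if workdir.startswith("~/"):
        # Leave "~/" unquoted so the shell expands it; quote only the subpath.
        return "~/" + shlex.quote(workdir[2:])
    return shlex.quote(workdir)
```

For the example above, `~/;id` becomes `~/';id'`, so the shell sees a directory literally named `;id` under the home directory instead of a second command.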
The ContextVar migration removed 'from pathlib import Path' but Path
is still used in _load_config_passthrough(). Without this import,
config-based env passthrough would raise NameError.
Subagent sessions spawned by delegate_task were created with
parent_session_id=NULL and source=cli, making them indistinguishable
from user sessions in hermes sessions list and /resume.
Changes:
- delegate_tool.py: pass parent_agent.session_id to child agent
- run_agent.py: accept parent_session_id param, pass to create_session
- hermes_state.py list_sessions_rich: filter parent_session_id IS NULL
by default (opt-in include_children=True for callers that need them)
- hermes_state.py delete_session: delete child sessions first (FK)
- hermes_state.py prune_sessions: delete children before parents (FK)
session_search already handles parent_session_id correctly — child
sessions are filtered from recent list and resolved to parent root
in full-text search results.
Fixes #5122
Before launching an MCP server via npx/uvx, queries the OSV (Open Source
Vulnerabilities) API to check if the package has known malware advisories
(MAL-* IDs). Regular CVEs are ignored — only confirmed malware is blocked.
- Free, public API (Google-maintained), ~300ms per query
- Runs once per MCP server launch, inside _run_stdio() before subprocess spawn
- Parallel with other MCP servers (asyncio.gather already in place)
- Fail-open: network errors, timeouts, unrecognized commands → allow
- Parses npm (scoped @scope/pkg@version) and PyPI (name[extras]==version)
Inspired by Block/goose extension malware check.
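A sketch of the package-spec parsing and the MAL-* filter. The function names and regex are assumptions, and the OSV query itself is left out:

```python
import re
from typing import Iterable, Optional, Tuple


def parse_npm_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split "pkg@1.0" or "@scope/pkg@1.0" into (name, version)."""
    at = spec.rfind("@")
    if at > 0:  # an "@" at index 0 is the scope marker, not a version separator
        return spec[:at], spec[at + 1:] or None
    return spec, None


_PYPI_RE = re.compile(r"^([A-Za-z0-9._-]+)(?:\[[^\]]*\])?(?:==(.+))?$")


def parse_pypi_spec(spec: str) -> Tuple[str, Optional[str]]:
    """Split "name[extras]==1.0" into (name, version), dropping extras."""
    m = _PYPI_RE.match(spec.strip())
    if not m:
        return spec, None
    return m.group(1), m.group(2)


def has_malware_advisory(advisory_ids: Iterable[str]) -> bool:
    """Only MAL-* advisories block a launch; CVE/GHSA IDs are ignored."""
    return any(a.startswith("MAL-") for a in advisory_ids)
```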
Add optional 'expression' parameter to browser_console that evaluates
JavaScript in the page context (like DevTools console). Returns structured
results with auto-JSON parsing.
No new tool — extends the existing browser_console schema with ~20 tokens
of overhead instead of adding a 12th browser tool.
Both backends supported:
- Browserbase: uses agent-browser 'eval' command via CDP
- Camofox: uses /tabs/{tab_id}/eval endpoint with graceful degradation
E2E verified: string eval, number eval, structured JSON, DOM manipulation,
error handling, and original console-output mode all working.
Add POST /v1/runs to start async agent runs and GET /v1/runs/{run_id}/events
for SSE streaming of typed lifecycle events (tool.started, tool.completed,
message.delta, reasoning.available, run.completed, run.failed).
Changes the internal tool_progress_callback signature from positional
(tool_name, preview, args) to event-type-first
(event_type, tool_name, preview, args, **kwargs). Existing consumers
filter on event_type and remain backward-compatible.
Adds concurrency limit (_MAX_CONCURRENT_RUNS=10) and orphaned run sweep.
Fixes logic inversion in cli.py _on_tool_progress where the original PR
would have displayed internal tools instead of non-internal ones.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Allow delegate_task to specify custom ACP transport per-task, so a parent
running via CLI/Discord/Telegram can spawn child agents over ACP
(e.g. claude --acp --stdio). Follows the existing override_provider pattern.
Supports per-task granularity in batch mode.
Co-authored-by: Mibayy <mibayy@users.noreply.github.com>
Route AIAgent print output to stderr via _print_fn for ACP stdio sessions.
Gate quiet-mode spinner startup on _should_start_quiet_spinner() so JSON-RPC
on stdout isn't corrupted. Child agents inherit the redirect.
Co-authored-by: Git-on-my-level <Git-on-my-level@users.noreply.github.com>
- Firecrawl scrape: 60s timeout via asyncio.wait_for + to_thread
(previously could hang indefinitely)
- Summarizer attempts: 6 → 2 (one retry), reads timeout from
auxiliary.web_extract.timeout config (default 360s / 6min)
- Summarizer failure: falls back to truncated raw content (~5000 chars)
instead of useless error message, with guidance about config/model
- Config default: auxiliary.web_extract.timeout bumped 30 → 360s
for local model compatibility
Addresses Discord reports of agent hanging during web_extract.
Salvaged from PRs #3767 (chalkers), #5236 (ygd58), #2641 (buntingszn).
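The timeout and fallback pattern, sketched with hypothetical helper names. The 5000-character limit mirrors the description above; the truncation note text is an assumption:

```python
import asyncio
from typing import Callable, Optional


async def call_with_timeout(fn: Callable[[], str], timeout: float) -> Optional[str]:
    """Run a blocking call in a worker thread; return None if it takes too long."""
    try:
        return await asyncio.wait_for(asyncio.to_thread(fn), timeout=timeout)
    except asyncio.TimeoutError:
        return None  # the worker thread is abandoned, not killed


def fallback_content(raw: str, limit: int = 5000) -> str:
    """Truncated raw content to return when summarization fails."""
    return raw if len(raw) <= limit else raw[:limit] + "\n[truncated]"
```

Note that wait_for only stops waiting; the blocking call keeps running in its thread until it returns on its own.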
Three improvements to Matrix cron delivery:
1. Live adapter path: when the gateway is running, cron delivery now uses
the connected MatrixAdapter via run_coroutine_threadsafe instead of
the standalone HTTP PUT. This enables delivery to E2EE rooms where
the raw HTTP path cannot encrypt. Falls back to standalone on failure.
Threads adapters + event loop from gateway -> cron ticker -> tick() ->
_deliver_result(). (from #3767)
2. HTML formatted_body: _send_matrix() now converts markdown to HTML
using the optional markdown library, with h1-h6 to bold conversion
for Element X compatibility. Falls back to plain text if markdown
is not installed. Also adds random bytes to txn_id to prevent
collisions. (from #5236)
3. Origin fallback: when deliver="origin" but origin is null (jobs
created via API/scripts), falls back to HOME_CHANNEL env vars
in order: matrix -> telegram -> discord -> slack. (from #2641)
When commands like grep, diff, test, or find return non-zero exit codes
that aren't actual errors (grep 1 = no matches, diff 1 = files differ),
the model wastes turns investigating non-problems. This adds an
exit_code_meaning field to the terminal JSON result that explains
informational exit codes, so the agent can move on instead of debugging.
Covers grep/rg/ag/ack (no matches), diff (files differ), find (partial
access), test/[ (condition false), curl (timeouts, DNS, HTTP errors),
and git (context-dependent). Correctly extracts the last command from
pipelines and chains, strips full paths and env var assignments.
The exit_code field itself is unchanged — this is purely additive context.
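A simplified sketch of the lookup. The table is an illustrative subset, the message strings are made up, and the operator split is naive (it does not respect operators inside quotes):

```python
import os
import re
import shlex
from typing import Optional

# Illustrative subset; the real table covers more commands and codes.
_MEANINGS = {
    ("grep", 1): "no matches found (not an error)",
    ("rg", 1): "no matches found (not an error)",
    ("diff", 1): "files differ (not an error)",
    ("test", 1): "condition evaluated to false (not an error)",
    ("[", 1): "condition evaluated to false (not an error)",
}
_ASSIGNMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*=.*")


def last_command_name(command: str) -> Optional[str]:
    """Name of the last command in a pipeline/chain, without path or env vars."""
    segment = re.split(r"\|\||&&|[|;]", command)[-1]
    try:
        tokens = shlex.split(segment)
    except ValueError:
        return None  # unbalanced quotes, e.g. an operator inside a quoted string
    for tok in tokens:
        if _ASSIGNMENT.fullmatch(tok):
            continue  # skip leading VAR=value assignments
        return os.path.basename(tok)
    return None


def exit_code_meaning(command: str, exit_code: int) -> Optional[str]:
    if exit_code == 0:
        return None
    return _MEANINGS.get((last_command_name(command), exit_code))
```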
When a dangerous command is approved (gateway, CLI, or smart approval),
the terminal tool now includes an 'approval' field in the result JSON
so the model knows approval was requested and granted. Previously the
model only saw normal command output with no indication that approval
happened, causing it to hallucinate that the approval system didn't fire.
Changes:
- approval.py: Return user_approved/description in all 3 approval paths
(gateway blocking, CLI interactive, smart approval)
- terminal_tool.py: Capture approval metadata and inject into both
foreground and background command results
* feat: execute_code runs on remote terminal backends (Docker/SSH/Modal/Daytona/Singularity)
When TERMINAL_ENV is not 'local', execute_code now ships the script to
the remote environment and runs it there via the terminal backend --
the same container/sandbox/SSH session used by terminal() and file tools.
Architecture:
- Local backend: unchanged (UDS RPC, subprocess.Popen)
- Remote backends: file-based RPC via execute_oneshot() polling
- Script writes request files, parent polls and dispatches tool calls
- Responses written atomically (tmp + rename) via base64/stdin
- execute_oneshot() bypasses persistent shell lock for concurrency
Changes:
- tools/environments/base.py: add execute_oneshot() (delegates to execute())
- tools/environments/persistent_shell.py: override execute_oneshot() to
bypass _shell_lock via _execute_oneshot(), enabling concurrent polling
- tools/code_execution_tool.py: add file-based transport to
generate_hermes_tools_module(), _execute_remote() with full env
get-or-create, file shipping, RPC poll loop, output post-processing
* fix: use _get_env_config() instead of raw TERMINAL_ENV env var
Read terminal backend type through the canonical config resolution
path (terminal_tool._get_env_config) instead of os.getenv directly.
* fix: use echo piping instead of stdin_data for base64 writes
Modal doesn't reliably deliver stdin_data to chained commands
(base64 -d > file && mv), producing 0-byte files. Switch to
echo 'base64' | base64 -d which works on all backends.
Verified E2E on both Docker and Modal.
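The atomic response write could be assembled like this. build_atomic_write_command and the .tmp suffix are assumptions; the echo | base64 -d and tmp + rename steps come from the description above:

```python
import base64
import shlex


def build_atomic_write_command(remote_path: str, payload: bytes) -> str:
    """Shell command that writes payload to remote_path via tmp file + rename."""
    encoded = base64.b64encode(payload).decode("ascii")
    tmp = remote_path + ".tmp"
    return (
        f"echo {shlex.quote(encoded)} | base64 -d > {shlex.quote(tmp)}"
        f" && mv {shlex.quote(tmp)} {shlex.quote(remote_path)}"
    )
```

Because mv within one filesystem is a rename, a polling reader sees either no file or the complete file, never a partial write.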
Add an optional 'script' parameter to cron jobs that references a Python
script. The script runs before each agent turn, and its stdout is injected
into the prompt as context. This enables stateful monitoring — the script
handles data collection and change detection, the LLM analyzes and reports.
- cron/jobs.py: add script field to create_job(), stored in job dict
- cron/scheduler.py: add _run_job_script() executor with timeout handling,
inject script output/errors into _build_job_prompt()
- tools/cronjob_tools.py: add script to tool schema, create/update handlers,
_format_job display
- hermes_cli/cron.py: add --script to create/edit, display in list/edit output
- hermes_cli/main.py: add --script argparse for cron create/edit subcommands
- tests/cron/test_cron_script.py: 20 tests covering job CRUD, script
execution, path resolution, error handling, prompt injection, tool API
Script paths can be absolute or relative (resolved against ~/.hermes/scripts/).
Scripts run with a 120s timeout. Failures are injected as error context so
the LLM can report the problem. Empty string clears an attached script.
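Path resolution and the timeout handling might be sketched as below. The function names and the error-string format are hypothetical and need not match _run_job_script():

```python
import subprocess
import sys
from pathlib import Path

SCRIPTS_DIR = Path.home() / ".hermes" / "scripts"


def resolve_script_path(script: str, scripts_dir: Path = SCRIPTS_DIR) -> Path:
    """Absolute paths are used as-is; relative ones resolve against scripts_dir."""
    path = Path(script).expanduser()
    return path if path.is_absolute() else scripts_dir / path


def run_job_script(path: Path, timeout: float = 120.0) -> str:
    """Return stdout on success, or an error string to inject into the prompt."""
    try:
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"[script error] timed out after {timeout:.0f}s"
    if proc.returncode != 0:
        return f"[script error] exit code {proc.returncode}: {proc.stderr.strip()}"
    return proc.stdout
```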
Bug fixes:
- agent/redact.py: catastrophic regex backtracking in _ENV_ASSIGN_RE — removed
re.IGNORECASE and changed [A-Z_]* to [A-Z0-9_]* to restrict matching to actual
env var name chars. Without this, the pattern backtracks exponentially on large
strings (e.g. 100K tool output), causing test_file_read_guards to time out.
- tools/file_operations.py: over-escaped newline in find -printf format string
produced literal backslash-n instead of a real newline, breaking file search
result parsing (total_count always 1, paths concatenated).
Test fixes:
- Remove stale pytestmark.skip from 4 test modules that were blanket-skipped as
'Hangs in non-interactive environments' but actually run fine:
- test_413_compression.py (12 tests, 25s)
- test_file_tools_live.py (71 tests, 24s)
- test_code_execution.py (61 tests, 99s)
- test_agent_loop_tool_calling.py (has proper OPENROUTER_API_KEY skip already)
- test_413_compression.py: fix threshold values in 2 preflight compression tests
where context_length was too small for the compressed output to fit in one pass.
- test_mcp_probe.py: add missing _MCP_AVAILABLE mock so tests work without MCP SDK.
- test_mcp_tool_issue_948.py: inject MCP symbols (StdioServerParameters etc.) when
SDK is not installed so patch() targets exist.
- test_approve_deny_commands.py: replace time.sleep(0.3) with deterministic polling
of _gateway_queues — fixes race condition where resolve fires before threads
register their approval entries, causing the test to hang indefinitely.
Net effect: +256 tests recovered from skip, 8 real failures fixed.
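The sleep-to-polling change in test_approve_deny_commands.py follows a common pattern, sketched here with a hypothetical wait_until helper:

```python
import time
from typing import Callable


def wait_until(predicate: Callable[[], bool], timeout: float = 5.0,
               interval: float = 0.01) -> bool:
    """Poll predicate until it is true or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return predicate()  # one final check at the deadline
```

A test would wait for the expected number of registered entries before resolving, rather than hoping a fixed sleep was long enough.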
Add docker_env option to terminal config — a dict of key-value pairs that
get set inside Docker containers via -e flags at both container creation
(docker run) and per-command execution (docker exec) time.
This complements docker_forward_env (which reads values dynamically from
the host process environment). docker_env is useful when Hermes runs as a
systemd service without access to the user's shell environment — e.g.
setting SSH_AUTH_SOCK or GNUPGHOME to known stable paths for SSH/GPG
agent socket forwarding.
Precedence: docker_env provides baseline values; docker_forward_env
overrides for the same key.
Config example:
    terminal:
      docker_env:
        SSH_AUTH_SOCK: /run/user/1000/ssh-agent.sock
        GNUPGHOME: /root/.gnupg
      docker_volumes:
        - /run/user/1000/ssh-agent.sock:/run/user/1000/ssh-agent.sock
        - /run/user/1000/gnupg/S.gpg-agent:/root/.gnupg/S.gpg-agent
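The stated precedence can be sketched as a merge that builds the -e flags. build_env_args and its parameters are hypothetical names, not the signature of _build_forward_env_args():

```python
from typing import Dict, List


def build_env_args(docker_env: Dict[str, str], forward_keys: List[str],
                   host_env: Dict[str, str]) -> List[str]:
    """docker_env gives baseline values; forwarded host values override them."""
    merged = dict(docker_env)
    for key in forward_keys:
        if key in host_env:
            merged[key] = host_env[key]
    args: List[str] = []
    for key, value in merged.items():
        args += ["-e", f"{key}={value}"]
    return args
```

Under systemd the host environment usually lacks SSH_AUTH_SOCK, so the docker_env baseline is what ends up in the container.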
Two pre-existing issues causing test_file_read_guards timeouts on CI:
1. agent/redact.py: _ENV_ASSIGN_RE used unbounded [A-Z_]* with
IGNORECASE, matching any letter/underscore to end-of-string at
each position → O(n²) backtracking on 100K+ char inputs.
Bounded to {0,50} since env var names are never that long.
2. tools/file_tools.py: redact_sensitive_text() ran BEFORE the
character-count guard, so oversized content (that would be rejected
anyway) went through the expensive regex first. Reordered to check
size limit before redaction.
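Both fixes in one sketch: a bounded name pattern and a size guard that runs before redaction. The regex, the KEY/TOKEN/SECRET suffix rule, and MAX_CHARS are assumptions, not the actual _ENV_ASSIGN_RE or the file_tools.py limit:

```python
import re

MAX_CHARS = 100_000  # assumed limit, for illustration only

# Bounded quantifier: a name is at most 51 chars before the suffix, so the
# engine cannot scan an arbitrarily long run of letters at each position.
_SECRET_ASSIGN_RE = re.compile(r"\b([A-Z][A-Z0-9_]{0,50}(?:KEY|TOKEN|SECRET))=(\S+)")


def read_guarded(text: str, max_chars: int = MAX_CHARS) -> str:
    """Reject oversized content first, then run the (costlier) redaction."""
    if len(text) > max_chars:
        raise ValueError(f"content too large: {len(text)} > {max_chars} chars")
    return _SECRET_ASSIGN_RE.sub(r"\1=***", text)
```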
The config key skills.external_dirs and core resolution (get_all_skills_dirs,
get_external_skills_dirs in agent/skill_utils.py) already existed but several
code paths still only scanned SKILLS_DIR. Now external dirs are respected
everywhere:
- skills_categories(): scan all dirs for category discovery
- _get_category_from_path(): resolve categories against any skills root
- skill_manager_tool._find_skill(): search all dirs for edit/patch/delete
- credential_files.get_skills_directory_mount(): mount all dirs into
Docker/Singularity containers (external dirs at external_skills/<idx>)
- credential_files.iter_skills_files(): list files from all dirs for
Modal/Daytona upload
- tools/environments/ssh.py: rsync all skill dirs to remote hosts
- gateway _check_unavailable_skill(): check disabled skills across all dirs
Usage in config.yaml:
    skills:
      external_dirs:
        - ~/repos/agent-skills/hermes
        - /shared/team-skills
- Add contextvars.Token[str] type hints to set/reset_current_session_key
- Use get_current_session_key(default='') in terminal_tool.py for background
process session tracking, fixing the same env var race for concurrent
gateway sessions spawning background processes
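A sketch of the ContextVar accessors. The function names come from the bullets above; the bodies and the variable name are assumptions:

```python
import contextvars

_session_key = contextvars.ContextVar("session_key", default="")


def set_current_session_key(key: str) -> "contextvars.Token[str]":
    """Bind the session key for the current context; keep the token to undo it."""
    return _session_key.set(key)


def reset_current_session_key(token: "contextvars.Token[str]") -> None:
    _session_key.reset(token)


def get_current_session_key(default: str = "") -> str:
    return _session_key.get() or default
```

Each asyncio task runs in its own copy of the context, so concurrent gateway sessions no longer overwrite each other the way a shared environment variable would.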
- Add .zip to SUPPORTED_DOCUMENT_TYPES so gateway platforms (Telegram,
Slack, Discord) cache uploaded zip files instead of rejecting them.
- Add get_cache_directory_mounts() and iter_cache_files() to
credential_files.py for host-side cache directory passthrough
(documents, images, audio, screenshots).
- Docker: bind-mount cache dirs read-only alongside credentials/skills.
Changes are live (bind mount semantics).
- Modal: mount cache files at sandbox creation + resync before each
command via _sync_files() with mtime+size change detection.
- Handles backward-compat with legacy dir names (document_cache,
image_cache, audio_cache, browser_screenshots) via get_hermes_dir().
- Container paths always use the new cache/<subdir> layout regardless
of host layout.
This replaces the need for a dedicated extract_archive tool (PR #4819)
— the agent can now use standard terminal commands (unzip, tar) on
uploaded files inside remote containers.
Related to PR #4819 by kshitijk4poor.
Three fixes for memory+profile isolation bugs:
1. memory_tool.py: Replace module-level MEMORY_DIR constant with
get_memory_dir() function that calls get_hermes_home() dynamically.
The old constant was cached at import time and could go stale if
HERMES_HOME changed after import. Internal MemoryStore methods now
call get_memory_dir() directly. MEMORY_DIR kept as backward-compat
alias.
2. profiles.py: profile create --clone now copies MEMORY.md and USER.md
from the source profile. These curated memory files are part of the
agent's identity (same as SOUL.md) and should carry over on clone.
3. holographic plugin: initialize() now expands $HERMES_HOME and
${HERMES_HOME} in the db_path config value, so users can write
'db_path: $HERMES_HOME/memory_store.db' and it resolves to the
active profile directory, not the default home.
Tests updated to mock get_memory_dir() alongside the legacy MEMORY_DIR.
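Fixes 1 and 3 both come down to resolving HERMES_HOME at call time. In this sketch the "memories" subdirectory and the default home are assumptions, and plain string replacement is used, so a longer variable name that starts with HERMES_HOME would also be touched:

```python
import os
from pathlib import Path


def get_hermes_home() -> Path:
    """Read HERMES_HOME on every call instead of caching it at import time."""
    return Path(os.environ.get("HERMES_HOME", str(Path.home() / ".hermes")))


def get_memory_dir() -> Path:
    return get_hermes_home() / "memories"  # hypothetical subdirectory name


def expand_hermes_home(value: str) -> str:
    """Expand ${HERMES_HOME} and $HERMES_HOME in a config value."""
    home = str(get_hermes_home())
    return value.replace("${HERMES_HOME}", home).replace("$HERMES_HOME", home)
```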