Compare commits

..

246 Commits

Author SHA1 Message Date
Aum08Desai
e6b325cc24 feat(cli): add /reasoning slash command to manage reasoning effort
Cherry-picked from PR #789 by Aum08Desai, rebased onto current main
with conflict resolution and improvements:

- Added /reasoning command: view current level or set to none|low|medium|high|xhigh
- Persists to config via save_config_value, forces agent re-init
- Resolved conflict with COMMANDS_BY_CATEGORY refactor (added to Configuration category)
- Restricted valid levels to none, low, medium, high, xhigh (removed 'minimal')
- Updated _parse_reasoning_config in cli.py and _load_reasoning_config in gateway/run.py
- Improved display messages (show all valid options, clearer defaults/disabled state)
- Added EXPECTED_COMMANDS entry for regression guard
- Expanded test suite: 16 tests covering all levels, rejection, display, case insensitivity,
  config save failure

Co-authored-by: Aum08Desai <145567217+Aum08Desai@users.noreply.github.com>
2026-03-11 06:11:53 -07:00
teknium1
c837ef949d fix: replace debug print() with logger.error() in file_tools
Stray print() in write_file_tool exception handler leaked debug output
to stdout. Replaced with logger.error() which is already set up in
the file.

Authored by memosr.

Co-authored-by: memosr <memosr@users.noreply.github.com>
2026-03-11 04:38:07 -07:00
teknium1
a71736ea73 Merge PR #910: fix: add missing Responses API parameters for Codex provider
Adds tool_choice=auto, parallel_tool_calls=true, and prompt_cache_key
to Codex Responses API requests, matching the official Codex CLI.
Root cause fix for #747 (agent claiming no shell access).
2026-03-11 04:28:52 -07:00
teknium1
a82ce60294 fix: add missing Responses API parameters for Codex provider
Adds tool_choice, parallel_tool_calls, and prompt_cache_key to the
Codex Responses API request kwargs — matching what the official Codex
CLI sends.

- tool_choice: 'auto' — enables the model to proactively call tools.
  Without this, the model may default to not using tools, which explains
  reports of the agent claiming it lacks shell access (#747).
- parallel_tool_calls: True — allows the model to issue multiple tool
  calls in a single turn for efficiency.
- prompt_cache_key: session_id — enables server-side prompt caching
  across turns in the same session, reducing latency and cost.

Refs #747
2026-03-11 04:28:31 -07:00
teknium1
69090d6da1 fix: add **kwargs to base/telegram media send methods for metadata routing
The MEDIA routing in _process_message_background passes
metadata=_thread_metadata to send_video, send_document, and
send_image_file — but none accepted it, causing TypeError silently
caught by the except handler. Files just failed to send.

Fix: add **kwargs to all four base class media methods and their
Telegram overrides.
2026-03-11 03:24:39 -07:00
teknium1
322ffbed61 Merge PR #779: feat: Telegram native file attachment support (send_document + send_video)
Adds send_document() and send_video() overrides to TelegramAdapter.
Requested by TigerHix.
2026-03-11 03:23:11 -07:00
Teknium
fe9da5280f Merge pull request #766 from spanishflu-est1918/codex/telegram-topic-session-pr
Isolate Telegram forum topic sessions — each topic gets its own independent session key, history, and interrupt tracking. Progress, hygiene, and cron messages all route to the correct topic.
2026-03-11 03:14:43 -07:00
teknium1
4864a5684a refactor: extract shared curses checklist, fix skill discovery perf
Four cleanups to code merged today:

1. New hermes_cli/curses_ui.py — shared curses_checklist() used by both
   hermes tools and hermes skills. Eliminates ~140 lines of near-identical
   curses code (scrolling, key handling, color setup, numbered fallback).

2. Fix _find_all_skills() perf — was calling load_config() per skill
   (~100+ YAML parses). Now loads disabled set once via
   _get_disabled_skill_names() and does a set lookup.

3. Eliminate _list_all_skills_unfiltered() duplication — _find_all_skills()
   now accepts skip_disabled=True for the config UI, removing 30 lines
   of copy-pasted discovery logic from skills_config.py.

4. Fix fragile label round-trip in skills_command — was building label
   strings, passing to checklist, then mapping labels back to skill names
   (collision-prone). Now works with indices directly, like tools_config.
2026-03-11 03:06:15 -07:00
alireza78a
f1510ec33e test(terminal): add tests for env var validation in _get_env_config 2026-03-11 02:59:12 -07:00
alireza78a
4523cc09cf fix(terminal): validate env var types with clear error messages 2026-03-11 02:59:12 -07:00
teknium1
f524aed23e fix: clean up empty file after failed wl-paste clipboard extraction
When wl-paste produces empty output, the destination file was left as
a 0-byte orphan. Added dest.unlink() before returning False, matching
the existing cleanup pattern in the exception handler.

Authored by 0xbyt4.

Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>
2026-03-11 02:56:19 -07:00
teknium1
925f378baa Merge PR #773: feat(cli,gateway): add /personality none and custom personality support
Authored by teyrebaz33. Closes #643.

- /personality none/default/neutral clears system prompt overlay
- Dict format personalities with description, tone, style fields
- Works in both CLI and gateway
- 18 tests
2026-03-11 02:54:27 -07:00
teknium1
fe29594716 fix: replace blocking time.sleep with await asyncio.sleep in WhatsApp connect
time.sleep(1) inside async def connect() blocks the entire event loop.
Replaced with await asyncio.sleep(1) to properly yield control.

Authored by 0xbyt4. Fixes blocking sleep in WhatsApp bridge startup.

Co-authored-by: 0xbyt4 <0xbyt4@users.noreply.github.com>
2026-03-11 02:51:49 -07:00
teknium1
7721518591 Merge PR #770: fix: off-by-one in setup toggle selection error message
Authored by 0xbyt4. Error message showed 'between 1 and N+1' instead
of 'between 1 and N' for N items.
2026-03-11 02:50:52 -07:00
teknium1
6e303def12 Merge PR #757: security: enforce 0600/0700 file permissions on sensitive files
Enforces owner-only permissions on files containing secrets:
- config.yaml, .env → 0600
- ~/.hermes/, cron dirs → 0700
- cron jobs.json, output files → 0600

Windows-safe (all chmod calls wrapped in try/except).
Inspired by openclaw v2026.3.7.
2026-03-11 02:48:56 -07:00
teknium1
ad1fbd88b2 Merge feature/background-command: add /background slash command
Adds /background <prompt> command to both CLI and gateway platforms.
Spawns a new agent session in the background — users can keep chatting
while the task runs, and results are delivered when done.

CLI: threaded execution with rich Panel output
Gateway: asyncio task with platform adapter delivery (text, images, media)

Includes 15 new tests and updates to command registry.
2026-03-11 02:46:37 -07:00
teknium1
b8067ac27e feat: add /background command to gateway and CLI commands registry
Add /background <prompt> to the gateway, allowing users on Telegram,
Discord, Slack, etc. to fire off a prompt in a separate agent session.
The result is delivered back to the same chat when done, without
modifying the active conversation history.

Implementation:
- _handle_background_command: validates input, spawns asyncio task
- _run_background_task: creates AIAgent in executor thread, delivers
  result (text, images, media files) back via the platform adapter
- Inherits model, toolsets, provider routing from gateway config
- Error handling with user-visible failure messages

Also adds /background to hermes_cli/commands.py registry so it
appears in /help and autocomplete.

Tests: 15 new tests covering usage, task creation, uniqueness,
multi-platform, error paths, and help/autocomplete integration.
2026-03-11 02:46:31 -07:00
teknium1
bd2606a576 fix: initialize self.config in HermesCLI to fix AttributeError on slash commands
HermesCLI.__init__ never assigned self.config, causing an
AttributeError ('HermesCLI' object has no attribute 'config')
whenever an unrecognized slash command fell through to the
quick_commands check on line 2838. This affected skill slash
commands like /x-thread-creation since the quick_commands lookup
runs before the skill command check.

Set self.config = CLI_CONFIG in __init__ to match the pattern used
by the gateway (run.py:199).
2026-03-11 02:33:31 -07:00
teknium1
f5324f9aa5 fix: initialize self.config in HermesCLI to fix AttributeError on slash commands
HermesCLI.__init__ never assigned self.config, causing an
AttributeError ('HermesCLI object has no attribute config')
whenever an unrecognized slash command fell through to the
quick_commands check (line 2832). This broke skill slash commands
like /x-thread-creation since the quick_commands lookup runs
before the skill command check.

Set self.config = CLI_CONFIG in __init__, matching the pattern
used by the gateway (run.py:199).
2026-03-11 02:33:25 -07:00
SPANISH FLU
de2b881886 test(cron): cover topic thread delivery metadata 2026-03-11 09:22:32 +01:00
SPANISH FLU
0d6b25274c fix(gateway): isolate telegram forum topic sessions 2026-03-11 09:15:34 +01:00
teknium1
fbfdde496b docs: update AGENTS.md with new files and test count
- Add hermes_cli/ files: skills_config, tools_config, skills_hub, models, auth
- Add acp_adapter/ directory
- Update test count: ~2500 → ~3000 (~3 min runtime)
2026-03-11 00:54:49 -07:00
Bartok Moltbot
ae1c11c5a5 fix(cli): resolve duplicate 'skills' subparser crash on Python 3.11+
Fixes #898 — Python 3.11 changed argparse to raise an exception on
duplicate subparser names (CPython #94331). The 'skills' name was
registered twice: once for Skills Hub and once for skills config.

Changes:
- Remove duplicate 'skills' subparser registration
- Add 'config' as a sub-action under the existing 'hermes skills' command
- Route 'hermes skills config' to skills_config module
- Add regression test to catch future duplicates

Migration: 'hermes skills' (config) is now 'hermes skills config'
2026-03-11 00:50:39 -07:00
Teknium
5abee4fb23 Merge pull request #769 from 0xbyt4/fix/codex-models-visibility-mismatch
Minor defensive fix — accept both 'hide' and 'hidden' visibility values in codex model filtering.
2026-03-11 00:49:59 -07:00
teknium1
331af8df23 fix: clean up tools --summary output and type annotations
- Use Optional[List[str]] instead of List[str] | None (consistency)
- Add header, per-platform counts, and checkmark list format
- Matches the visual style of the interactive configurator
2026-03-11 00:47:26 -07:00
teknium1
3a2fd1a5c9 Merge PR #767: feat: add --summary flag to hermes tools
Authored by luisv-1. Adds hermes tools --summary for a quick
non-interactive view of enabled tools per platform.
2026-03-11 00:46:32 -07:00
teknium1
2e1aa1b424 docs: add iteration budget pressure section to configuration guide
Documents the two-tier budget warning system from PR #762:
- Explains caution (70%) and warning (90%) thresholds
- Table showing what the model sees at each tier
- Notes on how injection preserves prompt caching
- Links to max_turns config
2026-03-11 00:40:44 -07:00
teknium1
aead9c8ead chore: remove unnecessary pragma comments from Telegram adapter
Strip 18 '# pragma: no cover - defensive logging' annotations — these
are real code paths, not worth excluding from coverage.
2026-03-11 00:37:45 -07:00
teknium1
93230af7bd Merge PR #763: improve Telegram gateway error handling and logging
Authored by aydnOktay. Replaces print() statements with structured
logging calls (error/warning/info/debug) throughout the Telegram
adapter. Adds exc_info=True for stack traces on failures.
2026-03-11 00:37:28 -07:00
teknium1
21ff0d39ad feat: iteration budget pressure via tool result injection
Two-tier warning system that nudges the LLM as it approaches
max_iterations, injected into the last tool result JSON rather
than as a separate system message:

- Caution (70%): {"_budget_warning": "[BUDGET: 42/60...]"}
- Warning (90%): {"_budget_warning": "[BUDGET WARNING: 54/60...]"}

For JSON tool results, adds a _budget_warning field to the existing
dict. For plain text results, appends the warning as text.

Key properties:
- No system messages injected mid-conversation
- No changes to message structure
- Prompt cache stays valid
- Configurable thresholds (0.7 / 0.9)
- Can be disabled: _budget_pressure_enabled = False

Inspired by PR #421 (@Bartok9) and issue #414.
8 tests covering thresholds, edge cases, JSON and text injection.
2026-03-11 00:37:24 -07:00
teknium1
4b619c9672 Merge PR #761: Improve Discord gateway error handling and logging
Authored by aydnOktay. Replaces bare print statements with structured
logger calls (error/warning/info) and adds exc_info=True for stack
traces on failure paths.
2026-03-11 00:35:31 -07:00
teknium1
c5321298ce docs: add quick commands documentation
Documents the quick_commands config feature from PR #746:
- configuration.md: full section with examples (server status, disk,
  gpu, update), behavior notes (timeout, priority, works everywhere)
- cli.md: brief section with config example + link to config guide
2026-03-11 00:28:52 -07:00
teknium1
359352b947 Merge PR #755: fix: head+tail truncation for execute_code stdout
Replaces head-only stdout capture with 40/60 head/tail split so final
print() output is never lost. 3 new tests.
2026-03-11 00:26:26 -07:00
teknium1
a9241f3e3e fix: head+tail truncation for execute_code stdout
Replaces head-only stdout capture with a two-buffer approach (40% head,
60% tail rolling window) so scripts that print() their final results
at the end never lose them. Adds truncation notice between sections.

Cherry-picked from PR #755, conflict resolved (test file additions).

3 new tests for short output, head+tail preservation, and notice format.
2026-03-11 00:26:13 -07:00
teknium1
ea0a263434 Merge PR #758: feat(discord): add DISCORD_ALLOW_BOTS config for bot message filtering
Adds configurable bot message filtering via DISCORD_ALLOW_BOTS env var:
- 'none' (default): ignore all bot messages
- 'mentions': accept bots only when they @mention us
- 'all': accept all bot messages

Includes 8 tests.
2026-03-11 00:25:51 -07:00
teknium1
3be6e8a5f2 Merge PR #746: feat(cli,gateway): add user-defined quick commands that bypass agent loop
Authored by teyrebaz33. Adds config-driven quick commands that execute
shell commands without invoking the LLM — zero token usage, works from
Telegram/Discord/Slack/etc. Closes #744.
2026-03-11 00:24:34 -07:00
teknium1
2b244762e1 feat: add missing commands to categorized /help
Post-merge follow-up to PR #752 — adds 10 commands that were added
since the PR was submitted:

Session: /title, /compress, /rollback
Configuration: /provider, /verbose, /skin
Tools & Skills: /reload-mcp (+ full /skills description)
Info: /usage, /insights, /paste

Also preserved existing color formatting (_cprint, _GOLD, _BOLD, _DIM)
and skill commands section from main.
2026-03-10 23:49:03 -07:00
teknium1
a169a656b4 Merge PR #743: feat: hermes skills — enable/disable individual skills and categories
Authored by teyrebaz33. Fixes #642.
2026-03-10 23:46:42 -07:00
teknium1
a9fdd8dc3c Merge PR #752: feat(ux): improve /help formatting with command categories
Authored by Bartok9. Organizes /help output into categories (Session,
Configuration, Tools & Skills, Info, Exit) for better readability.
Fixes #640.
2026-03-10 23:45:41 -07:00
Bartok Moltbot
8eb9eed074 feat(ux): improve /help formatting with command categories (#640)
- Organize COMMANDS into COMMANDS_BY_CATEGORY dict
- Group commands: Session, Configuration, Tools & Skills, Info, Exit
- Add visual category headers with spacing
- Maintain backwards compat via flat COMMANDS dict
- Better visual hierarchy and scannability

Before:
  /help           - Show this help message
  /tools          - List available tools
  ... (dense list)

After:
  ── Session ──
    /new           Start a new conversation
    /reset         Reset conversation only
    ...

  ── Configuration ──
    /config        Show current configuration
    ...

Closes #640
2026-03-10 23:45:36 -07:00
teknium1
909e048ad4 fix: integration hardening for gateway token tracking
Follow-up to 58dbd81 — ensures smooth transition for existing users:

- Backward compat: old session files without last_prompt_tokens
  default to 0 via data.get('last_prompt_tokens', 0)
- /compress, /undo, /retry: reset last_prompt_tokens to 0 after
  rewriting transcripts (stale token counts would under-report)
- Auto-compression hygiene: reset last_prompt_tokens after rewriting
- update_session: use None sentinel (not 0) as default so callers
  can explicitly reset to 0 while normal calls don't clobber
- 6 new tests covering: default value, serialization roundtrip,
  old-format migration, set/reset/no-change semantics
- /reset: new SessionEntry naturally gets last_prompt_tokens=0

2942 tests pass.
2026-03-10 23:40:24 -07:00
teyrebaz33
5eb62ef423 test(gateway): add regression test for /retry response fix
Adds two tests for _handle_retry_command: verifies /retry returns the
agent response (not None), and verifies graceful handling when no
previous message exists.

Cherry-picked from PR #731 by teyrebaz33. Regression coverage for
the fix merged in PR #441.

Co-authored-by: teyrebaz33 <teyrebaz33@users.noreply.github.com>
2026-03-10 23:34:52 -07:00
teknium1
58dbd81f03 fix: use actual API token counts for gateway compression pre-check
Root cause of aggressive gateway compression vs CLI:
- CLI: single AIAgent persists across conversation, uses real API-reported
  prompt_tokens for compression decisions — accurate
- Gateway: each message creates fresh AIAgent, token count discarded after,
  next message pre-check falls back to rough str(msg)//4 estimate which
  overestimates 30-50% on tool-heavy conversations

Fix:
- Add last_prompt_tokens field to SessionEntry — stores the actual
  API-reported prompt token count from the most recent agent turn
- After run_conversation(), extract context_compressor.last_prompt_tokens
  and persist it via update_session()
- Gateway pre-check now uses stored actual tokens when available (exact
  same accuracy as CLI), falling back to rough estimate with 1.4x safety
  factor only for the first message of a session

This makes gateway compression behave identically to CLI compression
for all turns after the first. Reported by TigerHix.
2026-03-10 23:28:23 -07:00
Teknium
a35c37a2f9 Merge pull request #891 from NousResearch/hermes/hermes-b0162f8d
fix: sort Nous Portal model list (opus first, sonnet lower)
2026-03-10 23:21:01 -07:00
teknium1
1518734e59 fix: sort Nous Portal model list (opus first, sonnet lower)
fetch_nous_models() returned models in whatever order the API gave
them, which put sonnet near the top. Add a priority sort so users
see the best models first: opus > pro > other > sonnet.
2026-03-10 23:20:46 -07:00
teknium1
67b9470207 fix: reduce premature gateway compression on tool-heavy sessions
The gateway's session hygiene pre-check uses a rough char-based token
estimate (total_chars / 4) to decide whether to compress before the
agent starts. This significantly overestimates for tool-heavy and
code-heavy conversations because:

1. str(msg) on dicts includes Python repr overhead (keys, brackets, etc.)
2. Code/JSON tokenizes at 5-7+ chars/token, not the assumed 4

This caused users with 200k context to see compression trigger at
~100-113k actual tokens instead of the expected 170k (85% threshold).
Reported by TigerHix on Twitter.

Fix: apply a 1.4x safety factor to the gateway pre-check threshold.
This pre-check is only meant to catch pathologically large transcripts
— the agent's own compression uses actual API-reported token counts
for precise threshold management.
2026-03-10 23:16:49 -07:00
teknium1
586fe5d62d Merge PR #724: feat: --yolo flag to bypass all approval prompts
Authored by dmahan93. Adds HERMES_YOLO_MODE env var and --yolo CLI flag
to auto-approve all dangerous command prompts.

Post-merge: renamed --fuck-it-ship-it to --yolo for brevity,
resolved conflict with --checkpoints flag.
2026-03-10 20:56:30 -07:00
teknium1
2d80ef7872 fix: _init_agent returns bool, not agent — fix quiet mode crash 2026-03-10 20:49:03 -07:00
Teknium
b76cae94d4 Merge pull request #889 from NousResearch/hermes/hermes-b0162f8d
fix: Docker backend fails when docker is not in PATH (macOS gateway)
2026-03-10 20:45:34 -07:00
teknium1
23270d41b9 feat: add --quiet/-Q flag for programmatic single-query mode
Adds -Q/--quiet to `hermes chat` for use by external orchestrators
(Paperclip, scripts, CI). When combined with -q, suppresses:
- Banner and ASCII art
- Spinner animations
- Tool preview lines (┊ prefix)

Only outputs:
- The agent's final response text
- A parseable 'session_id: <id>' line for session resumption

Usage: hermes chat -q 'Do something' -Q
Used by: Paperclip adapter (@nousresearch/paperclip-adapter-hermes)
2026-03-10 20:45:28 -07:00
teknium1
24479625a2 fix: Docker backend fails when docker is not in PATH (macOS gateway)
On macOS, Docker Desktop installs the CLI to /usr/local/bin/docker, but
when Hermes runs as a gateway service (launchd) or in other non-login
contexts, /usr/local/bin is often not in PATH. This causes the Docker
requirements check to fail with 'No such file or directory: docker' even
though docker works fine from the user's terminal.

Add find_docker() helper that uses shutil.which() first, then probes
common Docker Desktop install paths on macOS (/usr/local/bin,
/opt/homebrew/bin, Docker.app bundle). The resolved path is cached and
passed to mini-swe-agent via its 'executable' parameter.

- tools/environments/docker.py: add find_docker(), use it in
  _storage_opt_supported() and pass to _Docker(executable=...)
- tools/terminal_tool.py: use find_docker() in requirements check
- tests/tools/test_docker_find.py: 4 tests (PATH, fallback, not found, cache)

2877 tests pass.
2026-03-10 20:45:13 -07:00
vilkasdev
d502952bac fix(cli): add loading indicators for slow slash commands
Shows an immediate status message and braille spinner for slow slash
commands (/skills search|browse|inspect|install, /reload-mcp). Makes
input read-only while the command runs so the CLI doesn't appear frozen.

Cherry-picked from PR #714 by vilkasdev, rebased onto current main
with conflict resolution and bug fix (get_hint_text duplicate return).

Fixes #636

Co-authored-by: vilkasdev <vilkasdev@users.noreply.github.com>
2026-03-10 17:31:00 -07:00
Teknium
ac53bf1d71 Merge pull request #881 from NousResearch/hermes/hermes-b0162f8d
fix: provider selection not persisting when switching via hermes model
2026-03-10 17:13:26 -07:00
teknium1
145c57fc01 fix: provider selection not persisting when switching via hermes model
Two related bugs prevented users from reliably switching providers:

1. OPENAI_BASE_URL poisoning OpenRouter resolution: When a user with a
   custom endpoint ran /model openrouter:model, _resolve_openrouter_runtime
   picked up OPENAI_BASE_URL instead of the OpenRouter URL, causing model
   validation to probe the wrong API and reject valid models.

   Fix: skip OPENAI_BASE_URL when requested_provider is explicitly
   'openrouter'.

2. Provider never saved to config: _save_model_choice() could save
   config.model as a plain string. All five _model_flow_* functions then
   checked isinstance(model, dict) before writing the provider — which
   silently failed on strings. With no provider in config, auto-detection
   would pick up stale credentials (e.g. Codex desktop app) instead of
   the user's explicit choice.

   Fix: _save_model_choice() now always saves as dict format. All flow
   functions also normalize string->dict as a safety net before writing
   provider.

Adds 4 regression tests. 2873 tests pass.
2026-03-10 17:12:34 -07:00
teknium1
2dddfce08c fix: log prefill parse errors + clean up cron scheduler tests
Follow-up to PR #716 (0xbyt4):
- Log the third remaining silent except-pass in scheduler (prefill
  messages JSON parse failure)
- Fix test mock: run → run_conversation (matches actual agent API)
- Remove unused imports (asyncio, AsyncMock)
- Add test for prefill_messages parse failure logging
2026-03-10 17:10:01 -07:00
teknium1
03a4f184e6 fix: call _stop_training_run on early-return failure paths
The 4 early-return paths in _spawn_training_run (API exit, trainer
exit, env not found, env exit) were doing manual process.terminate()
or returning without cleanup, leaking open log file handles. Now all
paths call _stop_training_run() which handles both process termination
and file handle closure.

Also adds 12 tests for _stop_training_run covering file handle
cleanup, process termination, status transitions, and edge cases.

Inspired by PR #715 (0xbyt4) which identified the early-return issue.
Core file handle fix was already on main via e28dc13 (memosr.eth).
2026-03-10 17:09:51 -07:00
teknium1
be2e259596 Merge PR #716: fix: log exceptions instead of silently swallowing in cron scheduler
Authored by 0xbyt4. Replaces two except-Exception-pass blocks with
logger.warning() calls and adds tests for both paths.
2026-03-10 17:05:59 -07:00
teknium1
05bc8b19fe Merge PR #713: docs: clarify Telegram token regex constraint
Authored by VolodymyrBg.
2026-03-10 16:59:54 -07:00
teknium1
cb6b70bbfb Merge PR #709: fix: close log file handles to prevent resource leaks
Authored by memosr. Fixes bare open() calls in browser_tool.py and
unclosed log file handles in rl_training_tool.py.
2026-03-10 16:26:29 -07:00
teknium1
a458b535c9 fix: improve read-loop detection — consecutive-only, correct thresholds, fix bugs
Follow-up to PR #705 (merged from 0xbyt4). Addresses several issues:

1. CONSECUTIVE-ONLY TRACKING: Redesigned the read/search tracker to only
   warn/block on truly consecutive identical calls. Any other tool call
   in between (write, patch, terminal, etc.) resets the counter via
   notify_other_tool_call(), called from handle_function_call() in
   model_tools.py. This prevents false blocks in read→edit→verify flows.

2. THRESHOLD ADJUSTMENT: Warn on 3rd consecutive (was 2nd), block on
   4th+ consecutive (was 3rd+). Gives the model more room before
   intervening.

3. TUPLE UNPACKING BUG: Fixed get_read_files_summary() which crashed on
   search keys (5-tuple) when trying to unpack as 3-tuple. Now uses a
   separate read_history set that only tracks file reads.

4. WEB_EXTRACT DOCSTRING: Reverted incorrect removal of 'title' from
   web_extract return docs in code_execution_tool.py — the field IS
   returned by web_tools.py.

5. TESTS: Rewrote test_read_loop_detection.py (35 tests) to cover
   consecutive-only behavior, notify_other_tool_call, interleaved
   read/search, and summary-unaffected-by-searches.
2026-03-10 16:25:41 -07:00
teknium1
b53d5dad67 Merge PR #705: fix: detect, warn, and block file re-read/search loops after context compression
Authored by 0xbyt4. Adds read/search loop detection, file history injection after compression, and todo filtering for active items only.
2026-03-10 16:17:03 -07:00
teknium1
ad7a16dca6 fix: remove left/right borders from response box for easier copy-paste
Use rich_box.HORIZONTALS instead of the default ROUNDED box style
for the agent response panel. This keeps the top/bottom horizontal
rules (with title) but removes the vertical │ borders on left and
right, making it much easier to copy-paste response text from the
terminal.
2026-03-10 15:59:08 -07:00
teknium1
6e851a1f6a Merge PR #873: fix: eliminate 3x SQLite message duplication in gateway sessions
Fixes #860.
2026-03-10 15:29:24 -07:00
teknium1
c1171fe666 fix: eliminate 3x SQLite message duplication in gateway sessions (#860)
Three separate code paths all wrote to the same SQLite state.db with
no deduplication, inflating session transcripts by 3-4x:

1. _log_msg_to_db() — wrote each message individually after append
2. _flush_messages_to_session_db() — re-wrote ALL new messages at
   every _persist_session() call (~18 exit points), with no tracking
   of what was already written
3. gateway append_to_transcript() — wrote everything a third time
   after the agent returned

Since load_transcript() prefers SQLite over JSONL, the inflated data
was loaded on every session resume, causing proportional token waste.

Fix:
- Remove _log_msg_to_db() and all 16 call sites (redundant with flush)
- Add _last_flushed_db_idx tracking in _flush_messages_to_session_db()
  so repeated _persist_session() calls only write truly new messages
- Reset flush cursor on compression (new session ID)
- Add skip_db parameter to SessionStore.append_to_transcript() so the
  gateway skips SQLite writes when the agent already persisted them
- Gateway now passes skip_db=True for agent-managed messages, still
  writes to JSONL as backup

Verified: a 12-message CLI session with tool calls produces exactly
12 SQLite rows with zero duplicates (previously would be 36-48).

Tests: 9 new tests covering flush deduplication, skip_db behavior,
compression reset, and initialization. Full suite passes (2869 tests).
2026-03-10 15:22:44 -07:00
teknium1
2210068f5b Merge: fix(signal) align send() signature with base class 2026-03-10 15:18:31 -07:00
teknium1
d6ab35c1a3 fix(signal): align send() signature with base class (content, reply_to, metadata)
Signal's send() used 'text' instead of 'content' and 'reply_to_message_id'
instead of 'reply_to', mismatching BasePlatformAdapter.send(). Callers in
gateway/run.py use keyword args matching the base interface, so Signal's
send() was missing its required 'text' positional arg.

Fixes: 'SignalAdapter.send() missing 1 required positional argument: text'
2026-03-10 15:18:26 -07:00
teknium1
5fc751e543 Merge: fix(gateway) add metadata param to _keep_typing and base send_typing 2026-03-10 15:08:45 -07:00
teknium1
cea78c5e27 fix(gateway): add metadata param to _keep_typing and base send_typing
_keep_typing() was called with metadata= for thread-aware typing
indicators, but neither it nor the base send_typing() accepted
that parameter. Most adapter overrides (Slack, Discord, Telegram,
WhatsApp, HA) already accept metadata=None, but the base class
and Signal adapter did not.

- Add metadata=None to BasePlatformAdapter.send_typing()
- Add metadata=None to BasePlatformAdapter._keep_typing(), pass through
- Add metadata=None to SignalAdapter.send_typing()

Fixes TypeError in _process_message_background for Signal.
2026-03-10 15:08:40 -07:00
teknium1
53be6afe92 Merge PR #871: fix(signal): use media_urls/media_types in MessageEvent construction 2026-03-10 15:00:08 -07:00
teknium1
d04b9f4dc5 fix(signal): use media_urls/media_types instead of non-existent image_paths/audio_path/document_paths
The Signal adapter was passing image_paths, audio_path, and document_paths
to MessageEvent.__init__(), but those fields don't exist on the dataclass.
MessageEvent uses media_urls (List[str]) and media_types (List[str]).

Changes:
- Replace separate image_paths/audio_path/document_paths with unified
  media_urls and media_types lists (matching Discord, Slack, etc.)
- Add _ext_to_mime() helper to map file extensions to MIME types
- Use Signal's contentType from attachment metadata when available,
  falling back to extension-based mapping
- Update message type detection to check media_types prefixes

Fixes TypeError: MessageEvent.__init__() got an unexpected keyword
argument 'image_paths'
2026-03-10 14:58:16 -07:00
SHL0MS
149516f365 Merge pull request #854 from NousResearch/add-ascii-video-skill
Add ASCII video skill to creative category
2026-03-10 16:34:57 -04:00
SHL0MS
0229e6b407 Fix test_analysis_error_logs_exc_info: mock _aux_async_client so download path is reached 2026-03-10 16:03:19 -04:00
SHL0MS
c358af7861 Add ASCII video skill to creative category 2026-03-10 15:54:38 -04:00
teknium1
8eefbef91c fix: replace ANSI response box with Rich Panel + reduce widget flashing
Major UX improvements:

1. Response box now uses a Rich Panel rendered through ChatConsole
   instead of hand-rolled ANSI box-drawing borders. Rich Panels
   adapt to terminal width at render time, wrap content inside
   the borders properly, and use skin colors natively.

2. ChatConsole now reads terminal width at render time via
   shutil.get_terminal_size() instead of defaulting to 80 cols.
   All Rich output adapts to the current terminal size.

3. User-input separator reduced to fixed 40-char width so it
   never wraps regardless of terminal resize.

4. Approval and clarify countdown repaints throttled to every 5s
   (was 1s), dramatically reducing flicker in Kitty/ghostty.
   Selection changes still trigger instant repaints via key bindings.

5. Sudo widget now uses dynamic _panel_box_width() instead of
   hardcoded border strings.

Tests: 2860 passed.
2026-03-10 07:04:02 -07:00
teknium1
e590caf8d8 Revert "Merge PR #702: feat: configurable embedding infrastructure — local (fastembed) + API (OpenAI)"
This reverts commit 46b95ee694, reversing
changes made to 0fdeffe6c4.
2026-03-10 07:00:54 -07:00
teknium1
46b95ee694 Merge PR #702: feat: configurable embedding infrastructure — local (fastembed) + API (OpenAI)
Authored by teyrebaz33. Adds agent/embeddings.py with Embedder protocol,
FastEmbedEmbedder (local, 384d), OpenAIEmbedder (API, 1536d), factory,
and cosine similarity utilities. 30 tests. Optional fastembed dependency.
Infrastructure for #509 (cognitive memory) and #489 (semantic search).
Closes #675.
2026-03-10 06:59:22 -07:00
teknium1
0fdeffe6c4 fix: replace silent exception swallowing with debug logging across tools
Add logger.debug() calls to 27 bare 'except: pass' blocks across 7 core
files, giving visibility into errors that were previously silently
swallowed. This makes it much easier to diagnose user-reported issues
from debug logs.

Files changed:
- tools/terminal_tool.py: 5 catches (stat, termios, fd close, cleanup)
- tools/delegate_tool.py: 7 catches + added logger (spinner, callbacks)
- tools/browser_tool.py: 5 catches (screenshot/recording cleanup, daemon kill)
- tools/code_execution_tool.py: 2 remaining catches (socket, server close)
- gateway/session.py: 2 catches (platform enum parse, temp file cleanup)
- agent/display.py: 2 catches + added logger (JSON parse in failure detect)
- agent/prompt_builder.py: 1 catch (skill description read)

Deliberately kept bare pass for:
- ImportError checks for optional dependencies (terminal_tool.py)
- SystemExit/KeyboardInterrupt handlers
- Spinner _write catch (would spam on every frame when stdout closed)
- process_registry PID-alive check (canonical os.kill(pid,0) pattern)

Extends the pattern from PR #686 (@aydnOktay).
2026-03-10 06:59:20 -07:00
teyrebaz33
cc4ead999a feat: configurable embedding infrastructure — local (fastembed) + API (OpenAI) (#675)
- Add agent/embeddings.py with Embedder protocol, FastEmbedEmbedder, OpenAIEmbedder
- Factory function get_embedder() reads provider from config.yaml embeddings section
- Lazy initialization — no startup impact, model loaded on first embed call
- cosine_similarity() and cosine_similarity_matrix() utility functions included
- Add fastembed as optional dependency in pyproject.toml
- 30 unit tests, all passing

Closes #675
2026-03-10 06:56:18 -07:00
teknium1
60cba55d82 Merge PR #701: fix: tool call repair — auto-lowercase, fuzzy match, helpful error on unknown tool
Authored by teyrebaz33. Adds _repair_tool_call() method: tries lowercase,
normalize (hyphens/spaces → underscores), then fuzzy match (difflib, 0.7
cutoff). Replaces hard abort after 3 retries with graceful error message
sent back to model for self-correction. Fixed bug where valid tool calls
in a mixed batch would get no results (now all get results).
Fixes #520.
2026-03-10 06:54:17 -07:00
teyrebaz33
1caee06b22 fix: tool call repair — auto-lowercase, fuzzy match, helpful error on unknown tool (#520)
- Add _repair_tool_call(): tries lowercase, normalize, then fuzzy match (difflib 0.7)
- Replace 3-retry-then-abort with graceful error: model receives helpful message and self-corrects
- Conversation stays alive instead of dying on hallucinated tool names

Closes #520
2026-03-10 06:54:11 -07:00
teknium1
a6eaf0f41f Merge PR #700: fix(config): atomic write for config.yaml to prevent data loss on crash
Authored by alireza78a. Adds atomic_yaml_write() to utils.py (mirrors
existing atomic_json_write pattern), replaces bare open('w') in
save_config(). Integrated with max_turns normalization and commented
sections via extra_content param. 3 new tests for crash safety.
2026-03-10 06:48:43 -07:00
alireza78a
fadad820dd fix(config): atomic write for config.yaml to prevent data loss on crash 2026-03-10 06:48:37 -07:00
teknium1
e8b19b5826 fix: cap user-input separator at 120 cols (matches response box) 2026-03-10 06:47:26 -07:00
teknium1
9ea2209a43 fix: reduce approval/clarify widget flashing + dynamic border widths
Three UI improvements:

1. Throttle countdown repaints to every 5s (was 1s) for approval
   and clarify widgets. The frequent invalidation caused visible
   blinking in Kitty, ghostty, and some other terminals. Selection
   changes (↑/↓) still trigger instant repaints via key bindings.

2. Make echo Link2them00n. | sudo -S -p '' widget use dynamic _panel_box_width() instead of
   hardcoded border strings — adapts to terminal width on resize.

3. Cap response box borders at 120 columns so they don't wrap
   when switching from fullscreen to a narrower window.

Tests: 2857 passed.
2026-03-10 06:44:13 -07:00
teknium1
87af622df4 Merge PR #686: improve error handling and logging in code execution tool
Authored by @aydnOktay. Adds exc_info=True to exception logging, replaces
silent pass statements with logger.debug calls, fixes variable shadowing
in _kill_process_group nested except blocks.
2026-03-10 06:43:11 -07:00
teknium1
2c21c4b897 Merge PR #698: fix(security): pipe sudo password via stdin instead of shell cmdline
Authored by johnh4098. Fixes CWE-214: SUDO_PASSWORD was visible in
/proc/PID/cmdline via echo pipe. Now passed through subprocess stdin.
All 6 backends updated: local, ssh, docker, singularity pipe via stdin;
modal and daytona use printf fallback (remote sandbox, documented).
2026-03-10 06:38:44 -07:00
teknium1
771969f747 fix: wire up enabled_tools in agent loop + simplify sandbox tool selection
Completes the fix started in 8318a51 — handle_function_call() accepted
enabled_tools but run_agent.py never passed it. Now both call sites in
_execute_tool_calls() pass self.valid_tool_names, so each agent session
uses its own tool list instead of the process-global
_last_resolved_tool_names (which subagents can overwrite).

Also simplifies the redundant ternary in code_execution_tool.py:
sandbox_tools is already computed correctly (intersection with session
tools, or full SANDBOX_ALLOWED_TOOLS as fallback), so the conditional
was dead logic.

Inspired by PR #663 (JasonOA888). Closes #662.
Tests: 2857 passed.
2026-03-10 06:35:28 -07:00
johnh4098
e9742e202f fix(security): pipe sudo password via stdin instead of shell cmdline 2026-03-10 06:34:59 -07:00
teknium1
a2ea85924a Merge PR #687: fix(file_tools): pass docker_volumes to sandbox container config
Authored by manuelschipper. Adds missing docker_volumes key to
container_config in file_tools.py, matching terminal_tool.py.
Without this, Docker sandbox containers created by file operations
lack user volume mounts when file tools run before terminal.
2026-03-10 06:33:30 -07:00
teknium1
8318a519e6 fix: pass enabled_tools through handle_function_call to avoid global race
The process-global _last_resolved_tool_names gets overwritten when
subagents resolve their own toolsets, causing execute_code in the
parent agent to generate imports for the wrong set of tools.

Fix: handle_function_call() now accepts an enabled_tools parameter.
run_agent.py already passes self.valid_tool_names at both call sites.
This change makes model_tools.py actually use it, falling back to the
global only when the caller doesn't provide a list (backward compat).
2026-03-10 06:32:08 -07:00
teknium1
8ef3c815e7 Merge PR #680: feat: add Nous Portal API key provider
Authored by Indelwin. Adds 'nous-api' provider for direct API key
access to Nous Portal inference, mirroring how OpenRouter and other
API-key providers work. Includes PROVIDER_REGISTRY entry, setup wizard
option, OPTIONAL_ENV_VARS, provider aliases, and test.
Fixes #644.
2026-03-10 06:31:03 -07:00
Indelwin
de07aa7c40 feat: add Nous Portal API key provider (#644)
Add support for using Nous Portal via a direct API key, mirroring
how OpenRouter and other API-key providers work. This gives users a
simpler alternative to the OAuth device-code flow when they already
have a Nous API key.

Changes:
- Add 'nous-api' to PROVIDER_REGISTRY as an api_key provider
  pointing to https://inference-api.nousresearch.com/v1
- Add NOUS_API_KEY and NOUS_BASE_URL to OPTIONAL_ENV_VARS
- Add NOUS_API_BASE_URL / NOUS_API_CHAT_URL to hermes_constants
- Add 'Nous Portal API key' as first option in setup wizard
- Add provider aliases (nous_api, nousapi, nous-portal-api)
- Add test for nous-api runtime provider resolution

Closes #644
2026-03-10 06:28:00 -07:00
teknium1
928bb16da1 fix: forward thread_id to Telegram adapter + update send_typing signatures
Part 2 of thread_id forum topic fix: add metadata param to
send_voice, send_image, send_animation, send_typing in Telegram
adapter and pass message_thread_id to all Bot API calls. Update
send_typing signature in Discord, Slack, WhatsApp, HomeAssistant
for compatibility.

Based on the fix proposed by @Bitstreamono in PR #656.
2026-03-10 06:26:32 -07:00
teknium1
441f498d6f Merge PR #679: fix(code_execution): handle empty enabled_sandbox_tools in schema description
Authored by 0xbyt4. Fixes broken 'from hermes_tools import , ...'
syntax in schema description when no sandbox tools are enabled.
Adds 29 new tests for schema generation, env var filtering,
edge cases, and interrupt handling.
2026-03-10 06:22:56 -07:00
teknium1
a630ca15de fix: forward thread_id metadata for Telegram forum topic routing
Replies in Telegram forum topics (supergroups with topics) now land in
the correct topic thread instead of 'General'.

- base.py: build thread_id metadata from event.source, pass to all
  send/media calls; add metadata param to send_typing, send_image,
  send_animation, send_voice, send_video, send_document, send_image_file,
  _keep_typing
- telegram.py: extract thread_id from metadata and pass as
  message_thread_id to all Bot API calls (send_photo, send_voice,
  send_audio, send_animation, send_chat_action)
- run.py: pass thread_id metadata to progress/streaming send calls
- discord/slack/whatsapp/homeassistant: update send_typing signature

Based on the fix proposed by @Bitstreamono in PR #656.
2026-03-10 06:21:15 -07:00
0xbyt4
52e3580cd4 refactor: merge new tests into test_code_execution.py
Move all new tests (schema, env filtering, edge cases, interrupt) into
the existing test_code_execution.py instead of a separate file.
Delete the now-redundant test_code_execution_schema.py.
2026-03-10 06:18:27 -07:00
0xbyt4
694a3ebdd5 fix(code_execution): handle empty enabled_sandbox_tools in schema description
build_execute_code_schema(set()) produced "from hermes_tools import , ..."
in the code property description — invalid Python syntax shown to the model.

This triggers when a user enables only the code_execution toolset without
any of the sandbox-allowed tools (e.g. `hermes tools code_execution`),
because SANDBOX_ALLOWED_TOOLS & {"execute_code"} = empty set.

Also adds 29 unit tests covering build_execute_code_schema, environment
variable filtering, execute_code edge cases, and interrupt handling.
2026-03-10 06:18:27 -07:00
teknium1
2a062e2f45 Merge PR #840: background process notification modes + fix spinner line spam
- feat(gateway): configurable background_process_notifications (off/result/error/all)
- fix(display): rate-limit spinner flushes to prevent line spam under patch_stdout

Background notifications inspired by @PeterFile (PR #593).
2026-03-10 06:17:18 -07:00
teknium1
49ec1c9e8f Merge PR #655: fix: normalize max turns config path
Authored by stablegenius49. Rebased onto current main, resolved 3
conflicts (load_config encoding, save_config commented sections, setup
default value), fixed missing MagicMock import, aligned DEFAULT_CONFIG
default to 90 (matching cli.py).

Migrates legacy root-level max_turns to agent.max_turns across all
config loaders (load_config, load_cli_config, save_config, setup).
Adds _normalize_max_turns_config() for consistent migration.
Fixes #634.
2026-03-10 06:05:20 -07:00
stablegenius49
4bd579f915 fix: normalize max turns config path 2026-03-10 06:05:02 -07:00
teknium1
e4adb67ed8 fix(display): rate-limit spinner flushes to prevent line spam under patch_stdout
The KawaiiSpinner animation would occasionally spam dozens of duplicate
lines instead of overwriting in-place with \r. This happened because
prompt_toolkit's StdoutProxy processes each flush() as a separate
run_in_terminal() call — when the write thread is slow (busy event loop
during long tool executions), each \r frame gets its own call, and the
terminal layout save/restore between calls breaks the \r overwrite
semantics.

Fix: rate-limit flush() calls to at most every 0.4s. Between flushes,
\r-frame writes accumulate in StdoutProxy's buffer. When flushed, they
concatenate into one string (e.g. \r frame1 \r frame2 \r frame3) and
are written in a single run_in_terminal() call where \r works correctly.

The spinner still animates (flush ~2.5x/sec) but each flush batches
~3 frames, guaranteeing the \r collapse always works. Most visible
with execute_code and terminal tools (3+ second executions).
2026-03-10 06:02:07 -07:00
teknium1
ff09cad879 Merge PR #621: fix: limit concurrent Modal sandbox creations to avoid deadlocks
Authored by voteblake.

- Semaphore limits concurrent Modal sandbox creations to 8 (configurable)
  to prevent thread pool deadlocks when 86+ tasks fire simultaneously
- Modal cleanup guard for failed init (prevents AttributeError)
- CWD override to /app for TB2 containers
- Add /home/ to host path validation for container backends
2026-03-10 05:57:54 -07:00
teknium1
580e6ba2ff feat: add proper favicon and logo for landing page and docs site
Generated favicon files (ico, 16x16, 32x32, 180x180, 192x192, 512x512)
from the Hermes Agent logo. Replaces the inline SVG caduceus emoji with
real favicon files so Google's favicon service can pick up the logo.

Landing page: updated <link> tags to reference favicon.ico, favicon PNGs,
and apple-touch-icon.
Docusaurus: updated config to use favicon.ico and logo.png instead of
favicon.svg.
2026-03-10 05:51:45 -07:00
0xbyt4
ca23875575 fix: unify visibility filter in codex model discovery
_fetch_models_from_api checked for "hide" while _read_cache_models
checked for "hidden", causing models hidden by the API to still
appear when loaded from cache. Both now accept either value.
2026-03-10 15:15:33 +03:00
teknium1
d6d5a43d3a Merge PR #627: fix: continue non-tool replies after output-length truncation
Authored by tripledoublev (vincent). Rebased onto current main and
conflict-resolved.

When finish_reason='length' on a non-tool chat-completions response,
instead of rolling back and returning None, the agent now:
- Appends the truncated text and a continuation prompt
- Retries up to 3 times, accumulating partial chunks
- Concatenates all chunks into the final response
- Preserves existing rollback behavior for tool-call truncations
2026-03-10 04:33:14 -07:00
teknium1
d723208b1b Merge PR #617: Improve skills tool error handling
Authored by aydnOktay. Adds logging to skills_tool.py with specific
exception handling for file read errors (UnicodeDecodeError, PermissionError)
vs unexpected exceptions, replacing bare except-and-continue blocks.
2026-03-10 04:32:26 -07:00
vincent
b0a5fe8974 fix: continue after output-length truncation 2026-03-10 04:30:19 -07:00
teknium1
899dfdcfb9 Merge PR #616: fix: retry with rebuilt payload after compression
Authored by tripledoublev.

After context compression on 413/400 errors, the inner retry loop was
reusing the stale pre-compression api_messages payload. Fix breaks out
of the inner retry loop so the outer loop rebuilds api_messages from
the now-compressed messages list. Adds regression test verifying the
second request actually contains the compressed payload.
2026-03-10 04:22:42 -07:00
teknium1
8f0b07ed29 Merge PR #611: fix(session): atomic write for sessions.json to prevent data loss on crash
Authored by alireza78a.

Replaces open('w') + json.dump with tempfile.mkstemp + os.replace atomic
write pattern, matching the existing pattern in cron/jobs.py. Prevents
silent session loss if the process crashes or gets OOM-killed mid-write.

Resolved conflict: kept encoding='utf-8' from HEAD in the new fdopen call.
2026-03-10 04:18:53 -07:00
teknium1
f16f2912cf Merge PR #607: fix: reset all retry counters at start of run_conversation()
Authored by 0xbyt4. Adds missing resets for _incomplete_scratchpad_retries and _codex_incomplete_retries to prevent stale counters carrying over between CLI conversations.
2026-03-10 04:17:47 -07:00
teknium1
af748539f8 Merge PR #608: fix: remove unused imports and unnecessary f-strings
Authored by JackTheGit.

- Remove unused 'random' import from agent/display.py
- Remove unused 'Optional' import from agent/redact.py
- Remove unnecessary f-string prefixes in batch_runner.py
2026-03-10 04:16:23 -07:00
teknium1
695c017411 Merge PR #603: fix: return deny on approval callback timeout instead of None
Authored by 0xbyt4.

_approval_callback() had no return statement after the timeout break,
causing it to return None instead of 'deny'. Callers in approval.py
expect one of 'once', 'session', 'always', or 'deny'. This matches
the existing timeout behavior in approval.py:209.
2026-03-10 04:15:31 -07:00
teknium1
5e6c7bc205 Merge PR #602: fix: prevent data loss in clipboard PNG conversion when ImageMagick fails
Authored by 0xbyt4. Only deletes temp .bmp after confirmed successful conversion, restores original on failure. Adds 3 tests.
2026-03-10 04:15:05 -07:00
teknium1
e8cec55fad feat(gateway): configurable background process watcher notifications
Add display.background_process_notifications config option to control
how chatty the gateway process watcher is when using
terminal(background=true, check_interval=...) from messaging platforms.

Modes:
  - all:    running-output updates + final message (default, current behavior)
  - result: only the final completion message
  - error:  only the final message when exit code != 0
  - off:    no watcher messages at all

Also supports HERMES_BACKGROUND_NOTIFICATIONS env var override.

Includes 12 tests (5 config loading + 7 watcher behavior).

Inspired by @PeterFile's PR #593. Closes #592.
2026-03-10 04:12:39 -07:00
teknium1
67fc6bc4e9 Merge PR #600: fix(security): use in-memory set for permanent allowlist save
Authored by alireza78a. Uses _permanent_approved directly instead of re-reading from disk, preventing potential data loss if a previous save failed.
2026-03-10 04:12:11 -07:00
teknium1
cbca0225f6 Merge PR #599: fix: strip MarkdownV2 italic markers in Telegram plaintext fallback
Authored by 0xbyt4.
2026-03-10 04:09:33 -07:00
teknium1
36ac91c902 Merge PR #598: feat(skill): expand duckduckgo-search with DDGS Python API coverage
Authored by areu01or00. Adds Python DDGS library examples for text, news, images, and video search with structured return field docs.
2026-03-10 04:08:53 -07:00
teknium1
a2902fbad5 Merge PR #594: Improve TTS error handling and logging
Authored by aydnOktay. Adds specific exception handlers, ffmpeg return code checking, and exc_info logging to tts_tool.py.
2026-03-10 04:04:17 -07:00
teknium1
d03de749a1 fix: add themed hero art for all skins, fix triple-quote syntax
Each themed skin (ares, poseidon, sisyphus, charizard) now has custom
banner_hero art that replaces the default Hermes caduceus. The hero art
uses braille-dot patterns themed to each skin:
- Ares: shield/spear emblem in crimson/bronze
- Poseidon: trident with wave patterns in blue/seafoam
- Sisyphus: boulder on slope in grayscale
- Charizard: dragon silhouette in orange/ember

Also fixes triple-quote string termination that caused a syntax error
in the previous commit.
2026-03-10 03:54:12 -07:00
Dev User
c3dec1dcda fix(file_tools): pass docker_volumes to sandbox container config
file_tools.py creates its own Docker sandbox when read_file/search_files
runs before any terminal command. The container_config was missing
docker_volumes, so the sandbox had no user volume mounts — breaking
access to heartbeat state, cron output, and all other mounted data.

Matches the existing pattern in terminal_tool.py:872.

Missed in original PR #158 (feat: add docker_volumes config).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-10 10:18:33 +01:00
teknium1
4945240fc3 feat: add poseidon/sisyphus/charizard skins + banner logo support
Adds 3 new built-in skins (poseidon, sisyphus, charizard) with full
customization — colors, spinner faces/verbs/wings, branding text, and
custom ASCII art banner logos. Total: 7 built-in skins.

Also adds banner_logo and banner_hero fields to SkinConfig, allowing
any skin to replace the HERMES-AGENT ASCII art logo and the caduceus
hero art with custom artwork. The CLI now renders the skin's logo when
available, falling back to the default Hermes logo.

Skins with custom logos: ares, poseidon, sisyphus, charizard
Skins using default logo: default, mono, slate
2026-03-10 02:11:50 -07:00
teknium1
f6bc620d39 fix: apply skin colors to local build_welcome_banner in cli.py
cli.py had a local copy of build_welcome_banner() that shadowed the
imported one from banner.py. This local copy had all colors hardcoded,
so /skin changes had no visible effect on the banner.

Now the local copy resolves skin colors at render time using
get_active_skin(), matching the banner.py behavior. All hardcoded
#FFD700/#CD7F32/#FFBF00/#B8860B/#FFF8DC/#8B8682 values in the local
function are replaced with skin-aware lookups.
2026-03-10 00:58:42 -07:00
teknium1
b4b46d1b67 docs: comprehensive skin/theme system documentation
- AGENTS.md: add Skin/Theme System section with architecture, skinnable
  elements table, built-in skins list, adding built-in/user skins guide,
  YAML example; add skin_engine.py to project structure; mention skin
  engine in CLI Architecture section
- CONTRIBUTING.md: add skin_engine.py to project structure; add 'Adding
  a Skin/Theme' section with YAML schema, activation instructions
- cli-config.yaml.example: add full skin config documentation with
  schema reference, built-in skins list, all color/spinner/branding keys
- docs/skins/example-skin.yaml: complete annotated skin template with
  all available fields and inline documentation
- hermes_cli/skin_engine.py: expand module docstring to full schema
  reference with all fields documented, usage examples, built-in skins
  list
2026-03-10 00:51:27 -07:00
teknium1
c1775de56f feat: filesystem checkpoints and /rollback command
Automatic filesystem snapshots before destructive file operations,
with user-facing rollback.  Inspired by PR #559 (by @alireza78a).

Architecture:
- Shadow git repos at ~/.hermes/checkpoints/{hash}/ via GIT_DIR
- CheckpointManager: take/list/restore, turn-scoped dedup, pruning
- Transparent — the LLM never sees it, no tool schema, no tokens
- Once per turn — only first write_file/patch triggers a snapshot

Integration:
- Config: checkpoints.enabled + checkpoints.max_snapshots
- CLI flag: hermes --checkpoints
- Trigger: run_agent.py _execute_tool_calls() before write_file/patch
- /rollback slash command in CLI + gateway (list, restore by number)
- Pre-rollback snapshot auto-created on restore (undo the undo)

Safety:
- Never blocks file operations — all errors silently logged
- Skips root dir, home dir, dirs >50K files
- Disables gracefully when git not installed
- Shadow repo completely isolated from project git

Tests: 35 new tests, all passing (2798 total suite)
Docs: feature page, config reference, CLI commands reference
2026-03-10 00:49:15 -07:00
teknium1
de6750ed23 feat: add data-driven skin/theme engine for CLI customization
Adds a skin system that lets users customize the CLI's visual appearance
through data files (YAML) rather than code changes. Skins define: color
palette, spinner faces/verbs/wings, branding text, and tool output prefix.

New files:
- hermes_cli/skin_engine.py — SkinConfig dataclass, built-in skins
  (default, ares, mono, slate), YAML loader for user skins from
  ~/.hermes/skins/, skin management API
- tests/hermes_cli/test_skin_engine.py — 26 tests covering config,
  built-in skins, user YAML skins, display integration

Modified files:
- agent/display.py — skin-aware spinner wings, faces, verbs, tool prefix
- hermes_cli/banner.py — skin-aware banner colors (title, border, accent,
  dim, text, session) via _skin_color()/_skin_branding() helpers
- cli.py — /skin command handler, skin init from config, skin-aware
  response box label and welcome message
- hermes_cli/config.py — add display.skin default
- hermes_cli/commands.py — add /skin to slash commands

Built-in skins:
- default: classic Hermes gold/kawaii
- ares: crimson/bronze war-god theme (from community PRs #579/#725)
- mono: clean grayscale
- slate: cool blue developer theme

User skins: drop a YAML file in ~/.hermes/skins/ with name, colors,
spinner, branding, and tool_prefix fields. Missing values inherit from
the default skin.
2026-03-10 00:37:28 -07:00
teknium1
c0ffd6b704 feat: expand OpenClaw migration to cover all platform channels, provider keys, model/TTS config, shared skills, and daily memory
Adds 9 new migration categories to the OpenClaw-to-Hermes migration script:

Platform channels (non-secret, in user-data preset):
- discord-settings: bot token + allowlist → .env
- slack-settings: bot/app tokens + allowlist → .env
- whatsapp-settings: allowlist → .env
- signal-settings: account, HTTP URL, allowlist → .env

Configuration:
- model-config: default model → config.yaml
- tts-config: TTS provider/voice settings → config.yaml tts.*

Data:
- shared-skills: ~/.openclaw/skills/ → ~/.hermes/skills/openclaw-imports/
- daily-memory: workspace/memory/*.md entries → merged into MEMORY.md

Secrets (full preset only, requires --migrate-secrets):
- provider-keys: OpenRouter/OpenAI/Anthropic API keys, ElevenLabs/OpenAI TTS keys

Bug fix: workspace-agents now records 'skipped' status when source is
missing instead of silently returning (invisible failure in reports).

Total migration options: 10 → 19
Tests: 14 → 24 (10 new tests covering all new categories)
Full suite: 2798 passed, 0 failures
2026-03-10 00:35:14 -07:00
teknium1
8b9de366f2 Merge PR #570: feat: OpenClaw migration skill + CLI panel width improvements
Authored by unmodeled-tyler. Adds openclaw-migration skill to optional-skills/
with migration script, SKILL.md, and 7 tests. Also improves clarify/approval
panel rendering with dynamic width calculation.
2026-03-10 00:06:40 -07:00
teknium1
60d3f79c72 Merge PR #565: fix: sanitize FTS5 queries and close mirror DB connections
Authored by 0xbyt4. Fixes #N/A (no linked issue).

- Sanitize user input before FTS5 MATCH to prevent OperationalError on
  special characters (C++, unbalanced quotes, dangling operators, etc.)
- Close SessionDB connection in mirror._append_to_sqlite() via finally block
- Added tests for both fixes
2026-03-09 23:59:26 -07:00
teknium1
6f3a673aba fix: restore success-path server_sock.close() before rpc_thread.join()
PR #568 moved the close entirely to the finally block, but the success-path
close is needed to break the RPC thread out of accept() immediately. Without
it, rpc_thread.join(3) may block for up to 3 seconds if the child process
never connected. The finally-block close remains as a safety net for the
exception/error path (the actual fd leak fix).
2026-03-09 23:40:20 -07:00
teknium1
ab6a6338c4 Merge PR #568: fix(code-execution): close server socket in finally block to prevent fd leak
Authored by alireza78a. Moves server_sock.close() into the finally block so
the socket fd is always cleaned up, even if an exception occurs between socket
creation and the success-path close.
2026-03-09 23:39:13 -07:00
teknium1
1ec8c1fcaa Merge PR #564: fix: count actual tool calls instead of tool-related messages
Authored by 0xbyt4. Fixes tool_call_count double-counting tool responses
and under-counting parallel tool calls.
2026-03-09 23:32:54 -07:00
teknium1
739eb6702e Merge PR #551: Make skill file writes atomic
Authored by aydnOktay. Adds _atomic_write_text() helper using tempfile.mkstemp()
+ os.replace() to prevent skill file corruption on crash/interrupt. All 7
write_text() calls in skill_manager_tool.py converted, including rollback writes
during security scans.
2026-03-09 23:31:43 -07:00
teknium1
1aa7badb3c fix: add missing Platform.SIGNAL to toolset mappings, update test + config docs
Platform.SIGNAL was missing from default_toolset_map and platform_config_key
in gateway/run.py, causing Signal to silently fall back to hermes-telegram
toolset (same bug as HomeAssistant, fixed in PR #538).

Also updates:
- tests/test_toolsets.py: include hermes-signal and hermes-homeassistant in
  the platform core-tools consistency check
- cli-config.yaml.example: document signal and homeassistant platform keys
2026-03-09 23:27:19 -07:00
teknium1
ee4008431a fix: stop terminal border flashing with steady cursor and TUI spinner widget
Cherry-picked and improved from PR #470 (fixes #464).

Problem: On Ubuntu 24.04 with ghostty + tmux, the prompt input box
border lines flash due to cursor blink and raw spinner terminal writes
conflicting with prompt_toolkit's rendering.

Changes:
- cli.py: Add CursorShape.BLOCK to Application() to disable cursor blink
- cli.py: Add thinking_callback + spinner_widget in TUI layout so
  thinking status displays as a proper prompt_toolkit widget instead of
  raw terminal writes that conflict with the TUI renderer
- run_agent.py: Add thinking_callback parameter to AIAgent; when set,
  uses the callback instead of KawaiiSpinner for thinking display

What was NOT changed (preserving existing behavior):
- agent/display.py: Untouched. KawaiiSpinner _write() stdout capture,
  _animate() logic, and 0.12s frame interval all preserved. This
  protects subagent stdout redirection and keeps smooth animations
  for non-CLI contexts (gateway, batch runner).
- Original emoji spinner types (brain/sparkle/pulse/moon/star) preserved
  for all non-CLI contexts.

Fixes from original PR #470:
- CursorShape.STEADY_BLOCK -> CursorShape.BLOCK (STEADY_BLOCK doesn't
  exist in prompt_toolkit 3.0.52)
- Removed duplicate self._spinner_text = '' line
- Removed redundant nested if-checks

Tested: 2706 tests pass, interactive CLI verified via tmux.
2026-03-09 23:26:43 -07:00
teknium1
88f8bcde38 Merge PR #538: fix cron HERMES_HOME path mismatch, missing HomeAssistant toolset mapping, Daytona timeout drift
Authored by Himess. Three independent fixes:
- cron/jobs.py: respect HERMES_HOME env var (consistent with scheduler.py)
- gateway/run.py: add Platform.HOMEASSISTANT to toolset mappings
- tools/environments/daytona.py: use time.monotonic() for timeout deadline
2026-03-09 23:20:52 -07:00
teknium1
2285615010 Merge PR #533: fix: use regex for search output parsing to handle Windows drive-letter paths
Authored by Himess. Replaces split(':', 2) with regex that optionally
captures Windows drive-letter prefix in rg/grep output parsing. Fixes
search_files returning zero results on Windows where paths like
C:\path\file.py:42:content were misparsed by naive colon splitting.
No behavior change on Unix/Mac.
2026-03-09 23:18:42 -07:00
teknium1
805ce8177b Merge PR #529: fix: restrict .env file permissions to owner-only
Authored by Himess. Adds 0600 chmod on ~/.hermes/.env after writing API keys,
matching the existing pattern in auth.py for auth.json.
2026-03-09 23:10:59 -07:00
teknium1
bdce33e239 Merge PR #810: fix(cli): handle unquoted multi-word session names in -c/--continue and -r/--resume 2026-03-09 23:08:45 -07:00
Teknium
9be8d88ccc Merge pull request #815 from NousResearch/hermes/hermes-5ab2a29e
Add hermes-atropos-environments bundled skill
2026-03-09 23:06:19 -07:00
teknium1
6ab3ebf195 Add hermes-atropos-environments skill (bundled)
Add comprehensive skill for building, testing, and debugging Hermes Agent
RL environments for Atropos training. Includes:

- SKILL.md: Full guide covering HermesAgentBaseEnv interface, required
  methods, config class, CLI modes (serve/process/evaluate), reward
  function patterns, common pitfalls, and minimum implementation checklist
- New 'Inference Setup' section: instructs the agent to always ask the
  user for their inference provider (OpenRouter + model choice, self-hosted
  VLLM endpoint, or other OpenAI-compatible API) before running tests
- references/agentresult-fields.md: AgentResult dataclass field reference
- references/atropos-base-env.md: Atropos BaseEnv API reference
- references/usage-patterns.md: Step-by-step patterns for process,
  evaluate, serve, and smoke test modes

Will be auto-synced to ~/.hermes/skills/ via skills_sync.
2026-03-09 23:04:17 -07:00
teknium1
0a628c1aef fix(cli): handle unquoted multi-word session names in -c/--continue and -r/--resume
When a user runs `hermes -w -c Pokemon Agent Dev` without quoting the
session name, argparse would fail with:
  error: argument command: invalid choice: 'Agent'

This is because argparse parses `-c Pokemon` (consuming one token via
nargs='?'), then sees 'Agent' and tries to match it as a subcommand.

Fix: add _coalesce_session_name_args() that pre-processes sys.argv before
argparse, joining consecutive non-flag, non-subcommand tokens after -c or
-r into a single argument. This makes both quoted and unquoted multi-word
session names work transparently.

Includes 17 tests covering all edge cases: multi-word names, single-word,
bare flags, flag ordering, subcommand boundaries, and passthrough.
2026-03-09 21:36:29 -07:00
teknium1
36328a996f Merge PR #458: Add explicit UTF-8 encoding to config/data file I/O
Authored by shitcoinsherpa. Adds encoding='utf-8' to all text-mode
open() calls in gateway/run.py, gateway/config.py, hermes_cli/config.py,
hermes_cli/main.py, and hermes_cli/status.py. Prevents encoding errors
on Windows where the default locale is not UTF-8.

Also fixed 4 additional open() calls in gateway/run.py that were added
after the PR branch was created.
2026-03-09 21:19:20 -07:00
shitcoinsherpa
4bc32dc0f1 Fix password reader for Windows using msvcrt.getwch()
The existing password prompt uses /dev/tty and termios to read input
with echo disabled. Neither exists on Windows.

On Windows, msvcrt.getwch() reads a single character from the console
without echoing it. This adds a Windows code path that uses getwch()
in a loop, collecting characters until Enter is pressed.

The Unix path using termios and /dev/tty is unchanged.
2026-03-09 21:15:59 -07:00
teknium1
4de5e017f1 Merge PR #457: Use pywinpty for PTY support on Windows
Authored by shitcoinsherpa. Imports winpty.PtyProcess on Windows instead
of ptyprocess.PtyProcess, and adds platform markers to the [pty] extra
so the correct package is installed automatically.
2026-03-09 21:09:56 -07:00
teknium1
3e352f8a0d fix: add upstream guard for non-dict function_args + tests for build_tool_preview
Complements PR #453 by 0xbyt4. Adds isinstance(dict) guard in
run_agent.py to catch cases where json.loads returns non-dict
(e.g. null, list, string) before they reach downstream code.

Also adds 15 tests for build_tool_preview covering None args,
empty dicts, known/unknown tools, fallback keys, truncation,
and all special-cased tools (process, todo, memory, session_search).
2026-03-09 21:01:40 -07:00
teknium1
28ae5db9b0 Merge PR #453: fix: handle None args in build_tool_preview
Authored by 0xbyt4. Adds defensive guard for None/empty args in
build_tool_preview() to prevent crashes when a model returns null
tool call arguments.
2026-03-09 20:58:34 -07:00
teknium1
d5811c887a Merge: fix double judge call + eval buffer pollution in WebResearchEnv 2026-03-09 20:57:54 -07:00
teknium1
975fd86dc4 fix: eliminate double LLM judge call and eval buffer pollution
evaluate() was calling _llm_judge twice per item (once via
compute_reward, once directly) — double the API cost for no benefit.
Now extracts correctness from compute_reward's buffer instead.

Also: compute_reward appends to training metric buffers during eval,
which would pollute wandb training charts. Now rolls back buffer
entries added during eval so training metrics stay clean.
2026-03-09 20:57:46 -07:00
teknium1
0ff7fe3ee2 Merge PR #439: docs: fix spelling of 'publicly'
Authored by JackTheGit. Simple typo fix: publically → publicly in axolotl reference docs.
2026-03-09 20:55:37 -07:00
teknium1
b9d55d5719 feat: add pokemon-player skill with battle-tested gameplay tips
Comprehensive skill for playing Pokemon Red/Blue via the pokemon-agent
package (NousResearch/pokemon-agent). Includes:

- Full startup procedure (uv venv, server, localhost.run dashboard tunnel)
- Save/load lifecycle and naming conventions
- Gameplay loop with emphasis on frequent vision checks
- Hard-learned navigation tips:
  - Use vision every 2-4 steps (RAM state is blind to obstacles)
  - Wait 2-3 seconds after door/stair warps for map transitions
  - Sidestep after exiting buildings to avoid re-entering
  - Hold B to speed Gen 1's slow text scrolling
  - Ledges are one-way — use vision to find gaps
- Battle strategy, type chart, Gen 1 quirks
- Memory conventions with PKM: prefix
- Progression milestones through all 8 gyms + Elite Four
2026-03-09 20:29:38 -07:00
teknium1
ab7dc22984 Merge: WebResearchEnv evaluate() with full agent loop + tools 2026-03-09 19:53:36 -07:00
teknium1
bf8350ac18 fix: evaluate() uses full agent loop with tools, not single-turn
The evaluate method was doing single-turn chat_completion (no tools),
which defeats the purpose of an agentic research benchmark. Fixed to
run the full HermesAgentLoop with web_search/web_extract tools.

Results comparison (Claude Sonnet 4.5, FRAMES benchmark):
  Without tools (broken): 0.56 mean correctness
  With agent loop + tools: 1.00 mean correctness, 0.994 reward

New eval metrics: mean_correctness, mean_reward, mean_tool_calls,
tool_usage_rate — all logged via evaluate_log() in lighteval format.
2026-03-09 19:53:28 -07:00
teknium1
a5c6348d41 Merge: WebResearchEnv compute_reward fix (verified with live test) 2026-03-09 19:29:19 -07:00
teknium1
320f881e0b fix: WebResearchEnv compute_reward extracts from AgentResult.messages
AgentResult has .messages (list of dicts), not .final_response or
.tool_calls. Fixed compute_reward to extract the final response
and tool names from the message history.

Verified with live process mode test:
  - Agent used 7 tool calls (web_search, web_extract)
  - Produced a 1106-char researched response about Winter Olympics
  - Reward: 0.384 (partial correctness via LLM judge)
  - JSONL output contains valid tokens, masks, scores, messages
2026-03-09 19:29:12 -07:00
teknium1
172a38c344 fix: Docker persistent bind mounts fail with Permission denied
cap-drop ALL removes DAC_OVERRIDE, which root needs to write to
bind-mounted directories owned by the host user (uid 1000). This
broke persistent Docker sandboxes — the container couldn't write
to /workspace or /root.

Add back the minimum capabilities needed:
- DAC_OVERRIDE: root can write to bind-mounted dirs owned by host user
- CHOWN: package managers (pip, npm, apt) need to set file ownership
- FOWNER: needed for operations on files owned by other users

Still drops all other capabilities (NET_RAW, SYS_ADMIN, etc.) and
keeps no-new-privileges. Security boundary is the container itself.

Verified end-to-end: create files → destroy container → new container
with same task_id → files persist on host and are accessible in the
new container.
2026-03-09 17:52:33 -07:00
teknium1
8bc0d4f77d Merge: WebResearchEnv Atropos standards compliance 2026-03-09 17:45:57 -07:00
teknium1
8eabdefa8a fix: bring WebResearchEnv up to Atropos environment standards
The environment was merged missing several standard components.
Updated to match the patterns established by 82 Atropos environments
and our own HermesAgentBaseEnv contract.

Added:
- WebResearchEnvConfig — custom Pydantic config with reward weights,
  efficiency thresholds, eval settings, dataset config (all tunable
  via CLI/YAML without code changes)
- config_init() classmethod — default server config (OpenRouter +
  Claude) so the env works out of the box
- wandb_log() override — logs reward breakdown metrics (correctness,
  tool_usage, efficiency, diversity, correct_rate, tool_usage_rate)
  with proper buffer management and super() call
- evaluate() — uses server.chat_completion instead of broken stub
  _run_agent_on_item(). Logs via evaluate_log() for lighteval-
  compatible output.

Fixed:
- Removed broken _run_agent_on_item() stub that returned empty results
- evaluate() now uses server.chat_completion (same pattern as
  TerminalTestEnv) for actual model evaluation
- compute_reward reads tool calls from AgentResult properly
- LLM judge uses self.server.chat_completion instead of ctx

Reward config is now tunable without code changes:
  --env.correctness_weight 0.6
  --env.tool_usage_weight 0.2
  --env.efficiency_weight 0.2
  --env.diversity_bonus 0.1
  --env.efficient_max_calls 5
2026-03-09 17:45:50 -07:00
teknium1
f658af45c2 Merge PR #446: fix(cli): use correct visibility filter string in codex API model fetch
Authored by PercyDikec. Fixes #445.
Changes 'hide' to 'hidden' in _fetch_models_from_api to match
_read_cache_models and the actual API response format.
2026-03-09 17:42:39 -07:00
teknium1
5212644861 fix(security): prevent shell injection in tilde-username path expansion
Validate that the username portion of ~username paths contains only
valid characters (alphanumeric, dot, hyphen, underscore) before passing
to shell echo for expansion. Previously, paths like '~; rm -rf /'
would be passed unquoted to self._exec(f'echo {path}'), allowing
arbitrary command execution.

The approach validates the username rather than using shlex.quote(),
which would prevent tilde expansion from working at all since
echo '~user' outputs the literal string instead of expanding it.

Added tests for injection blocking and valid ~username/path expansion.

Credit to @alireza78a for reporting (PR #442, issue #442).
2026-03-09 17:33:19 -07:00
teknium1
1151f84351 Merge PR #434: feat: add WebResearchEnv RL environment for multi-step web research
Authored by jackx707. Adds web_research_env.py (Atropos RL environment for
multi-step web research using FRAMES benchmark) and batch generation config.
2026-03-09 17:24:20 -07:00
teknium1
9abd6bf342 fix: gateway missing docker_volumes config bridge + list serialization bug
The gateway's config.yaml → env var bridge was missing docker_volumes,
so Docker volume mounts configured in config.yaml were ignored for
gateway sessions (Telegram, Discord, etc.) while working in CLI.

Also fixes list serialization: str() produces Python repr with single
quotes which json.loads() in terminal_tool.py can't parse. Now uses
json.dumps() for list values.

Based on PR #431 by @manuelschipper (applied manually due to stale branch).
2026-03-09 17:24:00 -07:00
Teknium
d2c7ef6b41 Merge pull request #792 from NousResearch/hermes/hermes-d2f5523a
Merge PR #428: Improve type hints and error diagnostics in vision_tools + add 42 tests
2026-03-09 17:21:44 -07:00
teknium1
a34102049b Merge: vision auto-detection fallback to local endpoints 2026-03-09 15:36:27 -07:00
teknium1
ef5d811aba fix: vision auto-detection now falls back to custom/local endpoints
Vision auto-mode previously only tried OpenRouter, Nous, and Codex
for multimodal — deliberately skipping custom endpoints with the
assumption they 'may not handle vision input.' This caused silent
failures for users running local multimodal models (Qwen-VL, LLaVA,
Pixtral, etc.) without any cloud API keys.

Now custom endpoints are tried as a last resort in auto mode. If the
model doesn't support vision, the API call fails gracefully — but
users with local vision models no longer need to manually set
auxiliary.vision.provider: main in config.yaml.

Reported by @Spadav and @kotyKD.
2026-03-09 15:36:19 -07:00
teknium1
2d44ed1c5b test: add comprehensive tests for vision_tools (42 tests)
Covers PR #428 changes and existing vision_tools functionality:
- _validate_image_url: 20 tests for urlparse-based validation
- _determine_mime_type: 6 tests for MIME type detection
- _image_to_base64_data_url: 3 tests for base64 conversion
- _handle_vision_analyze: 5 tests for type hints, prompt building,
  AUXILIARY_VISION_MODEL env var override
- Error logging exc_info: 3 async tests verifying stack traces are
  logged on download failure, analysis error, and cleanup error
- check_vision_requirements & get_debug_session_info: 2 basic tests
- Registry integration: 3 tests for tool registration
2026-03-09 15:32:02 -07:00
teknium1
fa2e72ae9c docs: document docker_volumes config for shared host directories
The Docker backend already supports user-configured volume mounts via
docker_volumes, but it was undocumented — missing from DEFAULT_CONFIG,
cli.py defaults, and configuration docs.

Changes:
- hermes_cli/config.py: Add docker_volumes to DEFAULT_CONFIG with
  inline documentation and examples
- cli.py: Add docker_volumes to load_cli_config defaults
- configuration.md: Full Docker Volume Mounts section with YAML
  examples, use cases (providing files, receiving outputs, shared
  workspaces), and env var alternative
2026-03-09 15:29:34 -07:00
teknium1
5bfc4ed53b Merge PR #428: Improve type hints and error diagnostics in vision_tools
Authored by aydnOktay. Improves URL validation with urlparse, adds exc_info
to error logs for full stack traces, and tightens type hints.

Resolved merge conflict in _handle_vision_analyze: kept PR's string formatting
with our AUXILIARY_VISION_MODEL env var logic.
2026-03-09 15:27:54 -07:00
teknium1
520aec20e0 fix: add mcp to dev dependencies for test suite
MCP tests import from mcp.types but mcp wasn't in the dev optional
dependencies. Fresh 'pip install -e .[dev]' setups failed 3 tests.

Based on PR #427 by @teyrebaz33 (applied manually due to stale branch).
2026-03-09 15:12:54 -07:00
teknium1
64bec1d060 fix: Slack gateway setup missing event subscriptions and scopes
The 'hermes gateway setup' instructions for Slack were missing:
- The 'Subscribe to Events' step entirely (message.im, message.channels,
  app_mention, message.groups)
- Several required scopes (app_mentions:read, groups:history, users:read,
  files:write)
- Warning about bot only working in DMs without message.channels
- Step to invite the bot to channels

The 'hermes setup' flow (setup.py) and the website docs (slack.md)
already had the correct information — only gateway.py was outdated.

Reported by JordanB on Slack.
2026-03-09 14:31:19 -07:00
teknium1
ac58309dbd docs: improve Slack setup guide with channel event subscriptions and scopes
The #1 support issue with Slack is 'bot works in DMs but not channels'.
This is almost always caused by missing event subscriptions (message.channels,
message.groups) or missing OAuth scopes (channels:history, groups:history).

Changes:
- slack.md: Move channels:history and groups:history from optional to required
  scopes. Move message.channels and message.groups to required events. Add new
  'How the Bot Responds' section explaining DM vs channel behavior. Add Step 8
  for inviting bot to channels. Expand troubleshooting table with specific
  'works in DMs not channels' entry. Add quick checklist for channel debugging.
- setup.py: Expand Slack setup wizard with all required scopes, event
  subscriptions, and a warning that without message.channels/message.groups
  the bot only works in DMs. Add link to full docs. Improve Member ID
  discovery instructions.
- config.py: Update SLACK_BOT_TOKEN and SLACK_APP_TOKEN descriptions to list
  required scopes and event subscriptions inline.
2026-03-09 14:00:11 -07:00
teknium1
5eaf4a3f32 feat: Telegram send_document and send_video for native file attachments
Implement send_document() and send_video() overrides in TelegramAdapter
so the agent can deliver files (PDFs, CSVs, docs, etc.) and videos as
native Telegram attachments instead of just printing the file path as
text.

The base adapter already routes MEDIA:<path> tags by extension — audio
goes to send_voice(), images to send_image_file(), and everything else
falls through to send_document(). But TelegramAdapter didn't override
send_document() or send_video(), so those fell back to plain text.

Now when the agent includes MEDIA:/path/to/report.pdf in its response,
users get a proper downloadable file attachment in Telegram.

Features:
- send_document: sends files via bot.send_document with display name,
  caption (truncated to 1024), and reply_to support
- send_video: sends videos via bot.send_video with inline playback
- Both fall back to base class text if the Telegram API call fails
- 10 new tests covering success, custom filename, file-not-found,
  not-connected, caption truncation, API error fallback, and reply_to

Requested by @TigerHixTang on Twitter.
2026-03-09 13:07:10 -07:00
Teknium
a5a5d82a21 Merge pull request #784 from NousResearch/feat/slack-app-mention-and-documents
feat(slack): fix app_mention 404 + add document/video support
2026-03-09 13:04:50 -07:00
teknium1
34e8d088c2 feat(slack): fix app_mention 404 + add document/video support
- Register no-op app_mention event handler to suppress Bolt 404 errors.
  The 'message' handler already processes @mentions in channels, so
  app_mention is acknowledged without duplicate processing.

- Add send_document() for native file attachments (PDFs, CSVs, etc.)
  via files_upload_v2, matching the pattern from Telegram PR #779.

- Add send_video() for native video uploads via files_upload_v2.

- Handle incoming document attachments from users: download, cache,
  and inject text content for .txt/.md files (capped at 100KB),
  following the same pattern as the Telegram adapter.

- Add _download_slack_file_bytes() helper for raw byte downloads.

- Add 24 new tests covering all new functionality.

Fixes the unhandled app_mention events reported in gateway logs.
2026-03-09 13:02:59 -07:00
memosr.eth
b78b605ba9 fix: replace print() with logger.error() in file_tools 2026-03-09 22:29:16 +03:00
teyrebaz33
c3cf88b202 feat(cli,gateway): add /personality none and custom personality support
Closes #643

Changes:
- /personality none|default|neutral — clears system prompt overlay
- Custom personalities in config.yaml support dict format with:
  name, description, system_prompt, tone, style directives
- Backwards compatible — existing string format still works
- CLI + gateway both updated
- 18 tests covering none/default/neutral, dict format, string format,
  list display, save to config
2026-03-09 17:31:54 +03:00
0xbyt4
58b756f04c fix: clean up empty file after failed wl-paste clipboard extraction
When wl-paste produces empty output, the destination file was left
on disk as a 0-byte orphan. Now explicitly removed before returning
False.
2026-03-09 17:17:10 +03:00
0xbyt4
34f8ac2d85 fix: replace blocking time.sleep with await asyncio.sleep in WhatsApp connect
time.sleep(1) inside async def connect() blocks the entire event
loop for 1 second. Replaced with await asyncio.sleep(1) to yield
control back to the event loop while waiting for the killed port
process to release.
2026-03-09 17:16:26 +03:00
0xbyt4
1a10eb8cd9 fix: off-by-one in setup toggle selection error message
Error message said "between 1 and N+1" for N items, showing a
max value that would itself be rejected. Now correctly says
"between 1 and N".
2026-03-09 17:15:23 +03:00
luisv-1
59705b80cd Add tools summary flag to Hermes CLI
Made-with: Cursor
2026-03-09 16:50:53 +03:00
aydnOktay
46a7d6aeb2 Improve Telegram gateway error handling and logging 2026-03-09 15:58:01 +03:00
teknium1
c754135965 fix: banner wraps in narrow terminals (Kitty, small windows)
The full HERMES-AGENT ASCII logo needs ~95 columns, and the
side-by-side caduceus + tools panel needs ~80. In narrow terminals
(Kitty default, resized windows) everything wraps into visual garbage.

Fixes:
- show_banner() auto-detects terminal width and falls back to compact
  banner when < 80 columns
- build_welcome_banner() skips the ASCII logo when < 95 columns
- Compact banner now dynamically sized via _build_compact_banner()
  instead of a hardcoded 64-char box that also wrapped in narrow terms
- Same width checks applied to /clear command's banner refresh

The up/down arrow key issue in Kitty terminal for multiline input is
a known Kitty keyboard protocol (CSI u) vs prompt_toolkit compatibility
gap — arrow keys work correctly in standard terminals and tmux. Users
can work around it by running in tmux or setting TERM=xterm-256color.
2026-03-09 05:57:36 -07:00
teknium1
c6b75baad0 feat: find-nearby skill and Telegram location support
Adds a 'find-nearby' skill for discovering nearby places using
OpenStreetMap (Overpass + Nominatim). No API keys needed. Works with:
- Coordinates (from Telegram location pins)
- Addresses, cities, zip codes, landmarks (auto-geocoded)
- Multiple place types (restaurant, cafe, bar, pharmacy, etc.)

Returns names, distances, cuisine, hours, addresses, and Google Maps
links (pin + directions). 184-line stdlib-only script.

Also adds Telegram location message handling:
- New MessageType.LOCATION in gateway base
- Telegram adapter handles LOCATION and VENUE messages
- Injects lat/lon coordinates into conversation context
- Prompts agent to ask what the user wants nearby

Inspired by PR #422 (reimplemented with simpler script and broader
skill scope — addresses/cities/zips, not just Telegram coordinates).
2026-03-09 05:31:10 -07:00
teknium1
a7ad6f6d28 Merge: custom providers instant activation + model persistence 2026-03-09 05:08:01 -07:00
teknium1
1a2141d04d fix: custom providers activate immediately, save model name
Selecting a saved custom provider now switches instantly without
probing /models — the model name is stored in the config entry
as a complete profile (name + url + key + model).

Changes:
- custom_providers entries now include 'model' field
- Selecting a saved provider with a model just activates it
- Only probes /models if no model is saved (first-time setup)
- Menu shows saved model name: 'Local (localhost:8000) — llama-70b'
- Dedup on re-entry: still activates the model, just doesn't add
  a duplicate config entry (updates model name if changed)
2026-03-09 05:07:53 -07:00
teknium1
ff3f3169b2 Merge: auto-save custom endpoints + removal option 2026-03-09 04:58:27 -07:00
teknium1
f4580b6010 feat: auto-save custom endpoints + removal option
When a user adds a custom endpoint via 'hermes model' → 'Custom
endpoint', it now automatically saves to custom_providers in
config.yaml so it persists and appears in the provider menu on
subsequent runs. Deduplicates by base_url.

Auto-generated names based on URL:
  http://localhost:8000/v1 → 'Local (localhost:8000)'
  https://xyz.runpod.ai/v1 → 'RunPod (xyz.runpod.ai)'
  https://api.example.com/v1 → 'Api.example.com'

Also adds 'Remove a saved custom provider' option to the menu
(only shown when custom providers exist) with a selection UI
to pick which one to remove.

Users can also manually edit custom_providers in config.yaml
for full control over names and settings.
2026-03-09 04:58:20 -07:00
aydnOktay
d82fcef91b Improve Discord gateway error handling and logging 2026-03-09 14:33:21 +03:00
teknium1
7b63a787b3 Merge: named custom providers in hermes model 2026-03-09 03:45:26 -07:00
teknium1
069570d103 feat: support multiple named custom providers in hermes model
Users with multiple local servers or custom endpoints can now define
them all in config.yaml and switch between them from the model
selection menu:

  custom_providers:
    - name: 'Local Llama 70B'
      base_url: 'http://localhost:8000/v1'
      api_key: 'not-needed'
    - name: 'RunPod vLLM'
      base_url: 'https://xyz.runpod.ai/v1'
      api_key: 'rp_xxxxx'

These appear in `hermes model` provider selection alongside the
built-in providers. When selected, the endpoint's /models API is
probed to show available models in a selection menu.

Previously only a single 'Custom endpoint' option existed, requiring
manual URL entry each time you wanted to switch between local servers.

Requested by @ZiarnoBobu on Twitter.
2026-03-09 03:45:17 -07:00
teknium1
0dafdcab86 Merge: skill reorganization + sub-category support
- Sub-category support in prompt_builder.py (backwards-compatible)
- Split mlops (40 skills) into 7 logical sub-categories
- Merged 8 singleton categories into logical parents
- Fixed 2 misplaced skills (code-review, ml-paper-writing)
2026-03-09 03:40:11 -07:00
Teknium
654e16187e feat(mcp): add sampling support — server-initiated LLM requests (#753)
Add MCP sampling/createMessage capability via SamplingHandler class.

Text-only sampling + tool use in sampling with governance (rate limits,
model whitelist, token caps, tool loop limits). Per-server audit metrics.

Based on concept from PR #366 by eren-karakus0. Restructured as class-based
design with bug fixes and tests using real MCP SDK types.

50 new tests, 2600 total passing.
2026-03-09 03:37:38 -07:00
teknium1
732c66b0f3 refactor: reorganize skills into sub-categories
The skills directory was getting disorganized — mlops alone had 40
skills in a flat list, and 12 categories were singletons with just
one skill each.

Code change:
- prompt_builder.py: Support sub-categories in skill scanner.
  skills/mlops/training/axolotl/SKILL.md now shows as category
  'mlops/training' instead of just 'mlops'. Backwards-compatible
  with existing flat structure.

Split mlops (40 skills) into 7 sub-categories:
- mlops/training (12): accelerate, axolotl, flash-attention,
  grpo-rl-training, peft, pytorch-fsdp, pytorch-lightning,
  simpo, slime, torchtitan, trl-fine-tuning, unsloth
- mlops/inference (8): gguf, guidance, instructor, llama-cpp,
  obliteratus, outlines, tensorrt-llm, vllm
- mlops/models (6): audiocraft, clip, llava, segment-anything,
  stable-diffusion, whisper
- mlops/vector-databases (4): chroma, faiss, pinecone, qdrant
- mlops/evaluation (5): huggingface-tokenizers,
  lm-evaluation-harness, nemo-curator, saelens, weights-and-biases
- mlops/cloud (2): lambda-labs, modal
- mlops/research (1): dspy

Merged singleton categories:
- gifs → media (gif-search joins youtube-content)
- music-creation → media (heartmula, songsee)
- diagramming → creative (excalidraw joins ascii-art)
- ocr-and-documents → productivity
- domain → research (domain-intel)
- feeds → research (blogwatcher)
- market-data → research (polymarket)

Fixed misplaced skills:
- mlops/code-review → software-development (not ML-specific)
- mlops/ml-paper-writing → research (academic writing)

Added DESCRIPTION.md files for all new/updated categories.
2026-03-09 03:35:53 -07:00
teknium1
1f0944de21 fix: handle non-string content from OpenAI-compatible servers (#759)
Some local LLM servers (llama-server, etc.) return message.content as
a dict or list instead of a plain string. This caused AttributeError
'dict object has no attribute strip' on every API call.

Normalizes content to string immediately after receiving the response:
- dict: extracts 'text' or 'content' field, falls back to json.dumps
- list: extracts text parts (OpenAI multimodal content format)
- other: str() conversion

Applied at the single point where response.choices[0].message is read
in the main agent loop, so all downstream .strip()/.startswith()/[:100]
operations work regardless of server implementation.

Closes #759
2026-03-09 03:32:32 -07:00
0xbyt4
912efe11b5 fix(tests): add content attribute to fake result objects
_FakeReadResult and _FakeSearchResult now expose the attributes
that read_file_tool/search_tool access after the redact_sensitive_text
integration from main.
2026-03-09 13:25:52 +03:00
0xbyt4
4684aaffdc merge: resolve file_tools.py conflict with origin/main
Combine read/search loop detection with main's redact_sensitive_text
and truncation hint features. Add tracker reset to TestSearchHints
to prevent cross-test state leakage.
2026-03-09 13:21:46 +03:00
teknium1
f1a1b58319 fix: hermes setup doesn't update provider when switching to OpenRouter
When switching FROM Codex/Nous/custom TO OpenRouter via 'hermes setup',
the old provider stayed active because setup only saved the API key but
never updated config.yaml or auth.json. This caused resolve_provider()
to keep returning the old provider (e.g. openai-codex) even after the
user selected OpenRouter.

Fix: the OpenRouter path in setup now deactivates any OAuth provider
in auth.json and writes model.provider='openrouter' to config.yaml,
matching what all other provider paths already do.
2026-03-09 03:14:22 -07:00
teknium1
c21d77ca08 Merge: OBLITERATUS skill v2.0 + unified gateway compression
OBLITERATUS skill (PR #408 updated):
- 9 CLI methods, 28 analysis modules, 116 model presets
- Default method: advanced (multi-direction SVD, norm-preserving)
- Live-tested: Qwen2.5-3B 75%→0% refusal, Qwen2.5-0.5B 60%→20%
- References, templates, and real-world pitfalls included

Gateway compression fix (PR #739):
- Unified session hygiene with agent compression config
- Uses model context length × compression.threshold from config.yaml
- Removed hardcoded 100k/200-msg thresholds
2026-03-09 02:59:41 -07:00
teknium1
d6c710706f docs: add real-world testing findings to OBLITERATUS skill
Added pitfalls discovered during live abliteration testing:
- Models < 1B have fragmented refusal, respond poorly (0.5B: 60%→20%)
- Models 3B+ work much better (3B: 75%→0% with advanced defaults)
- aggressive method can backfire on small models (made it worse)
- Spectral certification RED is common even when refusal rate is 0%
- Fixed torch property: total_mem → total_memory
2026-03-09 02:52:54 -07:00
teknium1
a6d3becd6a feat: update OBLITERATUS skill to v2.0 — match current repo state
Major updates to reflect the current OBLITERATUS codebase:

- Change default recommendation from 'informed' (experimental) to
  'advanced' (reliable, well-tested multi-direction SVD)
- Add new CLI commands: tourney, recommend, strategies, report,
  aggregate, abliterate (alias)
- Add --direction-method flag (diff_means, svd, leace)
- Add strategies module (embedding/FFN ablation, head pruning,
  layer removal)
- Add evaluation module with LM Eval Harness integration
- Expand analysis modules from 15 to 28
- Add Apple Silicon (MLX) support
- Add study presets (quick, jailbreak, knowledge, etc.)
- Add --contribute, --verify-sample-size, --preset flags
- Add complete CLI command reference table
- Fix torch property name: total_mem -> total_memory (caught
  during live testing)

Tested: Successfully abliterated Qwen2.5-0.5B-Instruct using
'advanced' method — refusal rate 0.4%, coherence 1.0, model
responds without refusal to test prompts.
2026-03-09 02:39:03 -07:00
teknium1
3b67606c42 fix: custom endpoint provider shows as openrouter in gateway
Three issues caused the gateway to display 'openrouter' instead of
'Custom endpoint' when users configured a custom OAI-compatible endpoint:

1. hermes setup: custom endpoint path saved OPENAI_BASE_URL and
   OPENAI_API_KEY to .env but never wrote model.provider to config.yaml.
   All other providers (Codex, z.ai, Kimi, etc.) call
   _update_config_for_provider() which sets this — custom was the only
   path that skipped it. Now writes model.provider='custom' and
   model.base_url to config.yaml.

2. hermes model: custom endpoint set model.provider='auto' in config.yaml.
   The CLI display had a hack to detect OPENAI_BASE_URL and override to
   'custom', but the gateway didn't. Now sets model.provider='custom'
   directly.

3. gateway /model and /provider commands: defaulted to 'openrouter' and
   read config.yaml — which had no provider set. Added OPENAI_BASE_URL
   detection fallback (same pattern the CLI uses) as a defensive catch
   for existing users who set up before this fix.
2026-03-09 02:38:34 -07:00
teknium1
f8240143b6 feat(discord): add DISCORD_ALLOW_BOTS config for bot message filtering (inspired by openclaw)
Add configurable bot message filtering via DISCORD_ALLOW_BOTS env var:

- 'none' (default): Ignore all other bot messages — matches previous
  behavior where only our own bot was filtered, but now ALL bots are
  filtered by default for cleaner channels
- 'mentions': Accept bot messages only when they @mention our bot —
  useful for bot-to-bot workflows triggered by mentions
- 'all': Accept all bot messages — for setups where bots need to
  interact freely

Previously, we only ignored our own bot's messages, allowing all other
bots through. This could cause noisy loops in channels with multiple bots.

8 new tests covering all filter modes and edge cases.

Inspired by openclaw v2026.3.7 Discord allowBots: 'mentions' config.
2026-03-09 02:20:57 -07:00
teknium1
0ce190be0d security: enforce 0600/0700 file permissions on sensitive files (inspired by openclaw)
Enforce owner-only permissions on files and directories that contain
secrets or sensitive data:

- cron/jobs.py: jobs.json (0600), cron dirs (0700), job output files (0600)
- hermes_cli/config.py: config.yaml (0600), .env (0600), ~/.hermes/* dirs (0700)
- cli.py: config.yaml via save_config_value (0600)

All chmod calls use try/except for Windows compatibility.

Includes _secure_file() and _secure_dir() helpers with graceful fallback.
8 new tests verify permissions on all file types.

Inspired by openclaw v2026.3.7 file permission enforcement.
2026-03-09 02:19:32 -07:00
teyrebaz33
1404f846a7 feat(cli,gateway): add user-defined quick commands that bypass agent loop
Implements config-driven quick commands for both CLI and gateway that
execute locally without invoking the LLM.

Config example (~/.hermes/config.yaml):
  quick_commands:
    limits:
      type: exec
      command: /home/user/.local/bin/hermes-limits
    dn:
      type: exec
      command: echo daily-note

Changes:
- hermes_cli/config.py: add quick_commands: {} default
- cli.py: check quick_commands before skill commands in process_command()
- gateway/run.py: check quick_commands before skill commands in _handle_message()
- tests/test_quick_commands.py: 11 tests covering exec, timeout, unsupported type, missing command, priority over skills

Closes #744
2026-03-09 07:38:06 +03:00
teyrebaz33
7241e8784a feat: hermes skills — enable/disable individual skills and categories (#642)
Add interactive skill configuration via `hermes skills` command,
mirroring the existing `hermes tools` pattern.

Changes:
- hermes_cli/skills_config.py (new): skills_command() entry point with
  curses checklist UI + numbered fallback. Supports global and
  per-platform disable lists, individual skill toggle, and category toggle.
- hermes_cli/main.py: register `hermes skills` subcommand
- tools/skills_tool.py: add _is_skill_disabled() and filter disabled
  skills in _find_all_skills(). Resolves platform from argument,
  HERMES_PLATFORM env var, then falls back to global disabled list.

Config schema (config.yaml):
  skills:
    disabled: [skill-a]                 # global
    platform_disabled:
      telegram: [skill-b]               # per-platform override

22 unit tests, 2489 passed, 0 failed.

Closes #642
2026-03-09 07:02:06 +03:00
teknium1
763c6d104d fix: unify gateway session hygiene with agent compression config
The gateway had a SEPARATE compression system ('session hygiene')
with hardcoded thresholds (100k tokens / 200 messages) that were
completely disconnected from the model's context length and the
user's compression config in config.yaml. This caused premature
auto-compression on Telegram/Discord — triggering at ~60k tokens
(from the 200-message threshold) or inconsistent token counts.

Changes:
- Gateway hygiene now reads model name from config.yaml and uses
  get_model_context_length() to derive the actual context limit
- Compression threshold comes from compression.threshold in
  config.yaml (default 0.85), same as the agent's ContextCompressor
- Removed the message-count-based trigger (was redundant and caused
  false positives in tool-heavy sessions)
- Removed the undocumented session_hygiene config section — the
  standard compression.* config now controls everything
- Env var overrides (CONTEXT_COMPRESSION_THRESHOLD,
  CONTEXT_COMPRESSION_ENABLED) are respected
- Warn threshold is now 95% of model context (was hardcoded 200k)
- Updated tests to verify model-aware thresholds, scaling across
  models, and that message count alone no longer triggers compression

For claude-opus-4.6 (200k context) at 85% threshold: gateway
hygiene now triggers at 170k tokens instead of the old 100k.
2026-03-08 20:08:02 -07:00
teknium1
37752ff1ac feat: bell_on_complete — terminal bell when agent finishes
Adds a simple config option to play the terminal bell (\a) when the
agent finishes a response. Useful for long-running tasks — switch to
another window and your terminal will ding when done.

Works over SSH since the bell character propagates through the
connection. Most terminal emulators can be configured to flash the
taskbar, play a sound, or show a visual indicator on bell.

Config (default: off):
  display:
    bell_on_complete: true

Closes #318
2026-03-08 19:41:17 -07:00
0xbyt4
d8df91dfa8 fix: resolve merge conflict with main in clipboard.py 2026-03-09 03:50:29 +03:00
dmahan93
7791174ced feat: add --fuck-it-ship-it flag to bypass dangerous command approvals
Adds a fun alias for skipping all dangerous command approval prompts.
When passed, sets HERMES_YOLO_MODE=1 which causes check_dangerous_command()
to auto-approve everything.

Available on both top-level and chat subcommand:
  hermes --fuck-it-ship-it
  hermes chat --fuck-it-ship-it

Includes 5 tests covering normal blocking, yolo bypass, all patterns,
and edge cases (empty string env var).
2026-03-08 18:36:37 -05:00
0xbyt4
0c3253a485 fix: mock asyncio.run in mirror test to prevent event loop destruction
asyncio.run() closes the event loop after execution, which breaks
subsequent tests using asyncio.get_event_loop() (test_send_image_file).
2026-03-09 00:20:19 +03:00
0xbyt4
d0f84c0964 fix: log exceptions instead of silently swallowing in cron scheduler
Two 'except Exception: pass' blocks silently hide failures:
- mirror_to_session failure: user's message never gets mirrored, no trace
- config.yaml parse failure: wrong model used silently

Replace with logger.warning so failures are visible in logs.
2026-03-09 00:06:34 +03:00
VolodymyrBg
ceefe36756 docs: clarify Telegram token regex constraint 2026-03-08 22:33:06 +02:00
0xbyt4
67421ed74f fix: update test_non_empty_has_markers to match todo filtering behavior
Completed/cancelled items are now filtered from format_for_injection()
output. Update the existing test to verify active items appear and
completed items are excluded.
2026-03-08 23:07:38 +03:00
0xbyt4
e2fe1373f3 fix: escalate read/search blocking, track search loops, filter completed todos
- Block file reads after 3+ re-reads of same region (no content returned)
- Track search_files calls and block repeated identical searches
- Filter completed/cancelled todos from post-compression injection
  to prevent agent from re-doing finished work
- Add 10 new tests covering all three fixes
2026-03-08 23:01:21 +03:00
memosr.eth
7891050e06 fix: use Path.read_text() instead of open() in browser_tool 2026-03-08 22:39:17 +03:00
memosr.eth
e28dc13cd5 fix: store and close log file handles in rl_training_tool 2026-03-08 22:38:02 +03:00
0xbyt4
9eee529a7f fix: detect and warn on file re-read loops after context compression
When context compression summarizes conversation history, the agent
loses track of which files it already read and re-reads them in a loop.
Users report the agent reading the same files endlessly without writing.

Root cause: context compression is lossy — file contents and read history
are lost in the summary. After compression, the model thinks it hasn't
examined the files yet and reads them again.

Fix (two-part):
1. Track file reads per task in file_tools.py. When the same file region
   is read again, include a _warning in the response telling the model
   to stop re-reading and use existing information.
2. After context compression, inject a structured message listing all
   files already read in the session with explicit "do NOT re-read"
   instruction, preserving read history across compression boundaries.

Adds 16 tests covering warning detection, task isolation, summary
accuracy, tracker cleanup, and compression history injection.
2026-03-08 20:44:42 +03:00
aydnOktay
7b1f40dd00 Improve error handling and logging in code execution tool 2026-03-08 14:50:23 +03:00
vincent
86eed141af fix: rebuild compressed payload before retry 2026-03-07 18:55:01 -05:00
Blake Johnson
c6df39955c fix: limit concurrent Modal sandbox creations to avoid deadlocks
- Add max_concurrent_tasks config (default 8) with semaphore in TB2 eval
- Pass cwd: /app via register_task_env_overrides for TB2 tasks
- Add /home/ to host path prefixes as safety net for container backends

When all 86 TerminalBench2 tasks fire simultaneously, each creates a Modal sandbox
via asyncio.run() inside a thread pool worker. Modal's blocking calls deadlock
when too many are created at once. The semaphore ensures max 8 concurrent creations.

Co-Authored-By: hermes-agent[bot] <hermes-agent[bot]@users.noreply.github.com>
2026-03-07 14:02:34 -08:00
aydnOktay
19459b7623 Improve skills tool error handling 2026-03-08 00:30:49 +03:00
alireza78a
b0b19fdeb1 fix(session): atomic write for sessions.json to prevent data loss on crash 2026-03-07 20:57:00 +03:30
0xbyt4
8c26a057a3 fix: reset all retry counters at start of run_conversation()
_incomplete_scratchpad_retries and _codex_incomplete_retries were not
reset at the start of run_conversation(). In CLI mode, where the same
AIAgent instance is reused across conversations, stale counters from
a previous conversation could carry over, causing premature retry
exhaustion and partial responses.
2026-03-07 20:12:08 +03:00
JackTheGit
ae4644f495 Fix Ruff lint warnings (unused imports and unnecessary f-strings) 2026-03-07 17:08:09 +00:00
0xbyt4
70cffa4d3b fix: return "deny" on approval callback timeout instead of None
_approval_callback() had no return statement after the timeout break,
causing it to return None. Callers expect a string ("once", "session",
"always", or "deny"), so None could lead to undefined behavior when
approving dangerous commands.
2026-03-07 20:02:13 +03:00
0xbyt4
ee7d8c56c7 fix: prevent data loss in clipboard PNG conversion when ImageMagick fails
_convert_to_png() renamed the original file to .bmp before calling
ImageMagick convert, then unconditionally deleted the .bmp regardless
of whether convert succeeded. If convert failed, both files were gone.

- Only delete .bmp after confirmed successful conversion
- Restore original file on convert failure, timeout, or missing binary
- Add 3 tests covering failure, not-installed, and timeout scenarios
2026-03-07 20:02:12 +03:00
alireza78a
40bc7216e1 fix(security): use in-memory set for permanent allowlist save 2026-03-07 19:33:30 +03:30
0xbyt4
5cdcb9e26f fix: strip MarkdownV2 italic markers in Telegram plaintext fallback
When MarkdownV2 parsing fails, _strip_mdv2() removes escape backslashes
and bold markers (*text*) but missed italic markers (_text_). Users saw
raw underscores around italic text in the plaintext fallback.

- Add regex to strip _text_ italic markers in _strip_mdv2()
- Use word boundary lookaround to preserve snake_case identifiers
- Add tests for _strip_mdv2 covering italic, bold, snake_case, and edge cases
2026-03-07 18:55:25 +03:00
areu01or00
ce7e7fef30 docs(skill): expand duckduckgo-search with DDGS Python API coverage
Add Python DDGS library examples for all 4 search types (text, news,
images, videos) with return field documentation, quick reference table,
and validated gotchas. Reorganize to put Python API primary, CLI secondary.
Soften Firecrawl-fallback framing. All examples validated on ddgs==9.11.2.
2026-03-07 21:15:29 +05:30
aydnOktay
86caa8539c Improve TTS error handling and logging 2026-03-07 16:53:30 +03:00
Tyler
53b4b7651a Add official OpenClaw migration skill for Hermes Agent
Introduces a new OpenClaw-to-Hermes migration skill with a Python
helper script that handles importing SOUL.md, memories, user profiles,
messaging settings, command allowlists, skills, TTS assets, and
workspace instructions.

Supports two migration presets (user-data / full), three skill conflict
modes (skip / overwrite / rename), overflow file export for entries that
exceed character limits, and granular include/exclude option filtering.

Includes detailed SKILL.md agent instructions covering the clarify-tool
interaction protocol, decision-to-command mapping, post-run reporting
rules, and path resolution guidance.

Adds dynamic panel width calculation to CLI clarify/approval widgets so
panels adapt to content and terminal size.

Includes 7 new tests covering presets, include/exclude, conflict modes,
overflow exports, and skills_guard integration.
2026-03-06 18:57:12 -08:00
alireza78a
a857321463 fix(code-execution): close server socket in finally block to prevent fd leak 2026-03-07 05:49:48 +03:30
0xbyt4
33cfe1515d fix: sanitize FTS5 queries and close mirror DB connections
Two bugs fixed:

1. search_messages() crashes with OperationalError when user queries
   contain FTS5 special characters (+, ", (, {, dangling AND/OR, etc).
   Added _sanitize_fts5_query() to strip dangerous operators and a
   fallback try-except for edge cases.

2. _append_to_sqlite() in mirror.py creates a new SessionDB per call
   but never closes it, leaking SQLite connections. Added finally block
   to ensure db.close() is always called.
2026-03-07 04:24:45 +03:00
0xbyt4
3b43f7267a fix: count actual tool calls instead of tool-related messages
tool_call_count was inaccurate in two ways:

1. Under-counting: an assistant message with N parallel tool calls
   (e.g. "kill the light and shut off the fan" = 2 ha_call_service)
   only incremented tool_call_count by 1 instead of N.

2. Over-counting: tool response messages (role=tool) also incremented
   tool_call_count, double-counting every tool interaction.

Combined: 2 parallel tool calls produced tool_call_count=3 (1 from
assistant + 2 from tool responses) instead of the correct value of 2.

Fix: only count from assistant messages with tool_calls, incrementing
by len(tool_calls) to handle parallel calls correctly. Tool response
messages no longer affect tool_call_count.

This impacts /insights and /usage accuracy for sessions with tool use.
2026-03-07 04:07:52 +03:00
unmodeled-tyler
1755a9e38a Design agent migration skill for Hermes Agent from OpenClaw | Run
successful dry tests with reports
2026-03-06 15:12:45 -08:00
aydnOktay
566aeaeefa Make skill file writes atomic 2026-03-07 00:49:10 +03:00
Himess
7a0544ab57 fix: three small inconsistencies across cron, gateway, and daytona
1. cron/jobs.py: respect HERMES_HOME env var for job storage path.
   scheduler.py already uses os.getenv("HERMES_HOME", ...) but jobs.py
   hardcodes Path.home() / ".hermes", causing path mismatch when
   HERMES_HOME is set.

2. gateway/run.py: add Platform.HOMEASSISTANT to default_toolset_map
   and platform_config_key. The adapter and hermes-homeassistant
   toolset both exist but the mapping dicts omit it, so HomeAssistant
   events silently fall back to the Telegram toolset.

3. tools/environments/daytona.py: use time.monotonic() for deadline
   instead of float subtraction. All other backends (docker, ssh,
   singularity, local) use monotonic clock for timeout tracking.
   The accumulator pattern (deadline -= 0.2) drifts because
   t.join(0.2) + interrupt checks take longer than 0.2s per iteration.
2026-03-06 16:52:17 +03:00
Himess
453e0677d6 fix: use regex for search output parsing to handle Windows drive-letter paths
The ripgrep/grep output parser uses `split(':', 2)` to extract
file:lineno:content from match lines. On Windows, absolute paths
contain a drive letter colon (e.g. `C:\Users\foo\bar.py:42:content`),
so `split(':', 2)` produces `["C", "\Users\...", "42:content"]`.
`int(parts[1])` then raises ValueError and the match is silently
dropped. All search results are lost on Windows.

Same category as #390 — string-based path parsing that fails on
Windows. Replace `split()` with a regex that optionally captures
the drive letter prefix: `^([A-Za-z]:)?(.*?):(\d+):(.*)$`.

Applied to both `_search_with_rg` and `_search_with_grep`.
2026-03-06 15:54:33 +03:00
Himess
32dbd31b9a fix: restrict .env file permissions to owner-only
save_env_value() writes API keys to ~/.hermes/.env but never sets file
permissions, leaving the file world-readable (0644). auth.py already
restricts auth.json to 0600 — apply the same treatment to .env.

Skipped on Windows where chmod is not effective.
2026-03-06 15:14:26 +03:00
shitcoinsherpa
81986022b7 Add explicit encoding="utf-8" to all config/data file open() calls
On Windows, open() defaults to the system locale encoding (cp1252,
cp1254, etc.) rather than UTF-8. This breaks any file containing
non-ASCII characters, and also causes crashes when writing JSON with
ensure_ascii=False.

This adds encoding="utf-8" to open() calls in:
- gateway/run.py (config.yaml reads/writes throughout)
- gateway/config.py (gateway.json and config.yaml)
- hermes_cli/config.py (config.yaml load/save)
- hermes_cli/main.py (session export with ensure_ascii=False)
- hermes_cli/status.py (jobs.json and sessions.json)
2026-03-05 17:16:04 -05:00
shitcoinsherpa
dcba291d45 Use pywinpty instead of ptyprocess on Windows for PTY support
ptyprocess depends on Unix-only APIs (fork, openpty) and cannot work
on Windows at all. pywinpty provides a compatible PtyProcess interface
using the Windows ConPTY API.

This conditionally imports winpty.PtyProcess on Windows and
ptyprocess.PtyProcess on Unix. The pyproject.toml pty extra now uses
platform markers so the correct package is installed automatically.
2026-03-05 17:16:04 -05:00
shitcoinsherpa
48e65631f6 Fix auth store file lock for Windows (msvcrt) with reentrancy support
fcntl is not available on Windows. This adds msvcrt.locking as a
fallback for cross-process advisory locking on Windows.

msvcrt.locking is not reentrant within the same thread, unlike fcntl.flock.
This matters because resolve_codex_runtime_credentials holds the lock and
then calls _save_codex_tokens, which tries to acquire it again. Without
reentrancy tracking, this deadlocks on Windows after a 15-second timeout.

Uses threading.local() to track lock depth per thread, allowing nested
acquisitions to pass through without re-acquiring the underlying lock.

Also handles msvcrt-specific requirements: file must be opened in r+ mode
(not a+), must have at least 1 byte of content, and the file pointer must
be at position 0 before locking.
2026-03-05 17:16:03 -05:00
0xbyt4
14a11d24b4 fix: handle None args in build_tool_preview
When an LLM returns null/empty tool call arguments, json.loads()
produces None. build_tool_preview then crashes with
"argument of type 'NoneType' is not iterable" on the `in` check.
Return None early when args is falsy.
2026-03-05 23:09:11 +03:00
PercyDikec
36214d14db fix(cli): use correct visibility filter string in codex API model fetch 2026-03-05 21:12:53 +03:00
JackTheGit
71c0cd00e5 docs: fix spelling of 'publicly' 2026-03-05 16:46:21 +00:00
jackx707
15561ec425 feat: add WebResearchEnv RL environment for multi-step web research 2026-03-05 14:34:36 +00:00
aydnOktay
7d79ce92ac Improve type hints and error diagnostics in vision_tools 2026-03-05 16:11:59 +03:00
391 changed files with 24859 additions and 1662 deletions

291
.plans/openai-api-server.md Normal file
View File

@@ -0,0 +1,291 @@
# OpenAI-Compatible API Server for Hermes Agent
## Motivation
Every major chat frontend (Open WebUI 126k★, LobeChat 73k★, LibreChat 34k★,
AnythingLLM 56k★, NextChat 87k★, ChatBox 39k★, Jan 26k★, HF Chat-UI 8k★,
big-AGI 7k★) connects to backends via the OpenAI-compatible REST API with
SSE streaming. By exposing this endpoint, hermes-agent becomes instantly
usable as a backend for all of them — no custom adapters needed.
## What It Enables
```
┌──────────────────┐
│ Open WebUI │──┐
│ LobeChat │ │ POST /v1/chat/completions
│ LibreChat │ ├──► Authorization: Bearer <key> ┌─────────────────┐
│ AnythingLLM │ │ {"messages": [...]} │ hermes-agent │
│ NextChat │ │ │ gateway │
│ Any OAI client │──┘ ◄── SSE streaming response │ (API server) │
└──────────────────┘ └─────────────────┘
```
A user would:
1. Set `API_SERVER_ENABLED=true` in `~/.hermes/.env`
2. Run `hermes gateway` (API server starts alongside Telegram/Discord/etc.)
3. Point Open WebUI (or any frontend) at `http://localhost:8642/v1`
4. Chat with hermes-agent through any OpenAI-compatible UI
## Endpoints
| Method | Path | Purpose |
|--------|------|---------|
| POST | `/v1/chat/completions` | Chat with the agent (streaming + non-streaming) |
| GET | `/v1/models` | List available "models" (returns hermes-agent as a model) |
| GET | `/health` | Health check |
## Architecture
### Option A: Gateway Platform Adapter (recommended)
Create `gateway/platforms/api_server.py` as a new platform adapter that
extends `BasePlatformAdapter`. This is the cleanest approach because:
- Reuses all gateway infrastructure (session management, auth, context building)
- Runs in the same async loop as other adapters
- Gets message handling, interrupt support, and session persistence for free
- Follows the established pattern (like Telegram, Discord, etc.)
- Uses `aiohttp.web` (already a dependency) for the HTTP server
The adapter would start an `aiohttp.web.Application` server in `connect()`
and route incoming HTTP requests through the standard `handle_message()` pipeline.
### Option B: Standalone Component
A separate HTTP server class in `gateway/api_server.py` that creates its own
AIAgent instances directly. Simpler but duplicates session/auth logic.
**Recommendation: Option A** — fits the existing architecture, less code to
maintain, gets all gateway features for free.
## Request/Response Format
### Chat Completions (non-streaming)
```
POST /v1/chat/completions
Authorization: Bearer hermes-api-key-here
Content-Type: application/json
{
"model": "hermes-agent",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "What files are in the current directory?"}
],
"stream": false,
"temperature": 0.7
}
```
Response:
```json
{
"id": "chatcmpl-abc123",
"object": "chat.completion",
"created": 1710000000,
"model": "hermes-agent",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": "Here are the files in the current directory:\n..."
},
"finish_reason": "stop"
}],
"usage": {
"prompt_tokens": 50,
"completion_tokens": 200,
"total_tokens": 250
}
}
```
### Chat Completions (streaming)
Same request with `"stream": true`. Response is SSE:
```
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Here "},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"are "},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
```
### Models List
```
GET /v1/models
Authorization: Bearer hermes-api-key-here
```
Response:
```json
{
"object": "list",
"data": [{
"id": "hermes-agent",
"object": "model",
"created": 1710000000,
"owned_by": "hermes-agent"
}]
}
```
## Key Design Decisions
### 1. Session Management
The OpenAI API is stateless — each request includes the full conversation.
But hermes-agent sessions have persistent state (memory, skills, tool context).
**Approach: Hybrid**
- Default: Stateless. Each request is independent. The `messages` array IS
the conversation. No session persistence between requests.
- Opt-in persistent sessions via `X-Session-ID` header. When provided, the
server maintains session state across requests (conversation history,
memory context, tool state). This enables richer agent behavior.
- The session ID also enables interrupt support — a subsequent request with
the same session ID while one is running triggers an interrupt.
### 2. Streaming
The agent's `run_conversation()` is synchronous and returns the full response.
For real SSE streaming, we need to emit chunks as they're generated.
**Phase 1 (MVP):** Run agent in a thread, return the complete response as
a single SSE chunk + `[DONE]`. This works with all frontends — they just see
a fast single-chunk response. Not true streaming but functional.
**Phase 2:** Add a response callback to AIAgent that emits text chunks as the
LLM generates them. The API server captures these via a queue and streams them
as SSE events. This gives real token-by-token streaming.
**Phase 3:** Stream tool execution progress too — emit tool call/result events
as the agent works, giving frontends visibility into what the agent is doing.
### 3. Tool Transparency
Two modes:
- **Opaque (default):** Frontends see only the final response. Tool calls
happen server-side and are invisible. Best for general-purpose UIs.
- **Transparent (opt-in via header):** Tool calls are emitted as OpenAI-format
tool_call/tool_result messages in the stream. Useful for agent-aware frontends.
### 4. Authentication
- Bearer token via `Authorization: Bearer <key>` header
- Token configured via `API_SERVER_KEY` env var
- Optional: allow unauthenticated local-only access (127.0.0.1 bind)
- Follows the same pattern as other platform adapters
### 5. Model Mapping
Frontends send `"model": "hermes-agent"` (or whatever). The actual LLM model
used is configured server-side in config.yaml. The API server maps any
requested model name to the configured hermes-agent model.
Optionally, allow model passthrough: if the frontend sends
`"model": "anthropic/claude-sonnet-4"`, the agent uses that model. Controlled
by a config flag.
## Configuration
```yaml
# In config.yaml
api_server:
enabled: true
port: 8642
host: "127.0.0.1" # localhost only by default
key: "your-secret-key" # or via API_SERVER_KEY env var
allow_model_override: false # let clients choose the model
max_concurrent: 5 # max simultaneous requests
```
Environment variables:
```bash
API_SERVER_ENABLED=true
API_SERVER_PORT=8642
API_SERVER_HOST=127.0.0.1
API_SERVER_KEY=your-secret-key
```
## Implementation Plan
### Phase 1: MVP (non-streaming) — PR
1. `gateway/platforms/api_server.py` — new adapter
- aiohttp.web server with endpoints:
- `POST /v1/chat/completions` — Chat Completions API (universal compat)
- `POST /v1/responses` — Responses API (server-side state, tool preservation)
- `GET /v1/models` — list available models
- `GET /health` — health check
- Bearer token auth middleware
- Non-streaming responses (run agent, return full result)
- Chat Completions: stateless, messages array is the conversation
- Responses API: server-side conversation storage via previous_response_id
- Store full internal conversation (including tool calls) keyed by response ID
- On subsequent requests, reconstruct full context from stored chain
- Frontend system prompt layered on top of hermes-agent's core prompt
2. `gateway/config.py` — add `Platform.API_SERVER` enum + config
3. `gateway/run.py` — register adapter in `_create_adapter()`
4. Tests in `tests/gateway/test_api_server.py`
### Phase 2: SSE Streaming
1. Add response streaming to both endpoints
- Chat Completions: `choices[0].delta.content` SSE format
- Responses API: semantic events (response.output_text.delta, etc.)
- Run agent in thread, collect output via callback queue
- Handle client disconnect (cancel agent)
2. Add `stream_callback` parameter to `AIAgent.run_conversation()`
### Phase 3: Enhanced Features
1. Tool call transparency mode (opt-in)
2. Model passthrough/override
3. Concurrent request limiting
4. Usage tracking / rate limiting
5. CORS headers for browser-based frontends
6. GET /v1/responses/{id} — retrieve stored response
7. DELETE /v1/responses/{id} — delete stored response
## Files Changed
| File | Change |
|------|--------|
| `gateway/platforms/api_server.py` | NEW — main adapter (~300 lines) |
| `gateway/config.py` | Add Platform.API_SERVER + config (~20 lines) |
| `gateway/run.py` | Register adapter in _create_adapter() (~10 lines) |
| `tests/gateway/test_api_server.py` | NEW — tests (~200 lines) |
| `cli-config.yaml.example` | Add api_server section |
| `README.md` | Mention API server in platform list |
## Compatibility Matrix
Once implemented, hermes-agent works as a drop-in backend for:
| Frontend | Stars | How to Connect |
|----------|-------|---------------|
| Open WebUI | 126k | Settings → Connections → Add OpenAI API, URL: `http://localhost:8642/v1` |
| NextChat | 87k | BASE_URL env var |
| LobeChat | 73k | Custom provider endpoint |
| AnythingLLM | 56k | LLM Provider → Generic OpenAI |
| Oobabooga | 42k | Already a backend, not a frontend |
| ChatBox | 39k | API Host setting |
| LibreChat | 34k | librechat.yaml custom endpoint |
| Chatbot UI | 29k | Custom API endpoint |
| Jan | 26k | Remote model config |
| AionUI | 18k | Custom API endpoint |
| HF Chat-UI | 8k | OPENAI_BASE_URL env var |
| big-AGI | 7k | Custom endpoint |

705
.plans/streaming-support.md Normal file
View File

@@ -0,0 +1,705 @@
# Streaming LLM Response Support for Hermes Agent
## Overview
Add token-by-token streaming of LLM responses across all platforms. When enabled,
users see the response typing out live instead of waiting for the full generation.
Streaming is opt-in via config, defaults to off, and all existing non-streaming
code paths remain intact as the default.
## Design Principles
1. **Feature-flagged**: `streaming.enabled: true` in config.yaml. Off by default.
When off, all existing code paths are unchanged — zero risk to current behavior.
2. **Callback-based**: A simple `stream_callback(text_delta: str)` function injected
into AIAgent. The agent doesn't know or care what the consumer does with tokens.
3. **Graceful degradation**: If the provider doesn't support streaming, or streaming
fails for any reason, silently fall back to the non-streaming path.
4. **Platform-agnostic core**: The streaming mechanism in AIAgent works the same
regardless of whether the consumer is CLI, Telegram, Discord, or the API server.
---
## Architecture
```
stream_callback(delta)
┌─────────────┐ ┌─────────────▼──────────────┐
│ LLM API │ │ queue.Queue() │
│ (stream) │───►│ thread-safe bridge between │
│ │ │ agent thread & consumer │
└─────────────┘ └─────────────┬──────────────┘
┌──────────────┼──────────────┐
│ │ │
┌─────▼─────┐ ┌─────▼─────┐ ┌─────▼─────┐
│ CLI │ │ Gateway │ │ API Server│
│ print to │ │ edit msg │ │ SSE event │
│ terminal │ │ on Tg/Dc │ │ to client │
└───────────┘ └───────────┘ └───────────┘
```
The agent runs in a thread. The callback puts tokens into a thread-safe queue.
Each consumer reads the queue in its own context (async task, main thread, etc.).
---
## Configuration
### config.yaml
```yaml
streaming:
enabled: false # Master switch. Default off.
# Per-platform overrides (optional):
# cli: true # Override for CLI only
# telegram: true # Override for Telegram only
# discord: false # Keep Discord non-streaming
# api_server: true # Override for API server
```
### Environment variables
```
HERMES_STREAMING_ENABLED=true # Master switch via env
```
### How the flag is read
- **CLI**: `load_cli_config()` reads `streaming.enabled`, sets env var. AIAgent
checks at init time.
- **Gateway**: `_run_agent()` reads config, decides whether to pass
`stream_callback` to the AIAgent constructor.
- **API server**: For Chat Completions `stream=true` requests, always uses streaming
regardless of config (the client is explicitly requesting it). For non-stream
requests, uses config.
### Precedence
1. API server: client's `stream` field overrides everything
2. Per-platform config override (e.g., `streaming.telegram: true`)
3. Master `streaming.enabled` flag
4. Default: off
---
## Implementation Plan
### Phase 1: Core streaming infrastructure in AIAgent
**File: run_agent.py**
#### 1a. Add stream_callback parameter to __init__ (~5 lines)
```python
def __init__(self, ..., stream_callback: callable = None, ...):
self.stream_callback = stream_callback
```
No other init changes. The callback is optional — when None, everything
works exactly as before.
#### 1b. Add _run_streaming_chat_completion() method (~65 lines)
New method for Chat Completions API streaming:
```python
def _run_streaming_chat_completion(self, api_kwargs: dict):
"""Stream a chat completion, emitting text tokens via stream_callback.
Returns a fake response object compatible with the non-streaming code path.
Falls back to non-streaming on any error.
"""
stream_kwargs = dict(api_kwargs)
stream_kwargs["stream"] = True
stream_kwargs["stream_options"] = {"include_usage": True}
accumulated_content = []
accumulated_tool_calls = {} # index -> {id, name, arguments}
final_usage = None
try:
stream = self.client.chat.completions.create(**stream_kwargs)
for chunk in stream:
if not chunk.choices:
# Usage-only chunk (final)
if chunk.usage:
final_usage = chunk.usage
continue
delta = chunk.choices[0].delta
# Text content — emit via callback
if delta.content:
accumulated_content.append(delta.content)
if self.stream_callback:
try:
self.stream_callback(delta.content)
except Exception:
pass
# Tool call deltas — accumulate silently
if delta.tool_calls:
for tc_delta in delta.tool_calls:
idx = tc_delta.index
if idx not in accumulated_tool_calls:
accumulated_tool_calls[idx] = {
"id": tc_delta.id or "",
"name": "", "arguments": ""
}
if tc_delta.function:
if tc_delta.function.name:
accumulated_tool_calls[idx]["name"] = tc_delta.function.name
if tc_delta.function.arguments:
accumulated_tool_calls[idx]["arguments"] += tc_delta.function.arguments
# Build fake response compatible with existing code
tool_calls = []
for idx in sorted(accumulated_tool_calls):
tc = accumulated_tool_calls[idx]
if tc["name"]:
tool_calls.append(SimpleNamespace(
id=tc["id"], type="function",
function=SimpleNamespace(name=tc["name"], arguments=tc["arguments"]),
))
return SimpleNamespace(
choices=[SimpleNamespace(
message=SimpleNamespace(
content="".join(accumulated_content) or "",
tool_calls=tool_calls or None,
role="assistant",
),
finish_reason="tool_calls" if tool_calls else "stop",
)],
usage=final_usage,
model=self.model,
)
except Exception as e:
logger.debug("Streaming failed, falling back to non-streaming: %s", e)
return self.client.chat.completions.create(**api_kwargs)
```
#### 1c. Modify _run_codex_stream() for Responses API (~10 lines)
The method already iterates the stream. Add callback emission:
```python
def _run_codex_stream(self, api_kwargs: dict):
with self.client.responses.stream(**api_kwargs) as stream:
for event in stream:
# Emit text deltas if streaming callback is set
if self.stream_callback and hasattr(event, 'type'):
if event.type == 'response.output_text.delta':
try:
self.stream_callback(event.delta)
except Exception:
pass
return stream.get_final_response()
```
#### 1d. Modify _interruptible_api_call() (~5 lines)
Add the streaming branch:
```python
def _call():
try:
if self.api_mode == "codex_responses":
result["response"] = self._run_codex_stream(api_kwargs)
elif self.stream_callback is not None:
result["response"] = self._run_streaming_chat_completion(api_kwargs)
else:
result["response"] = self.client.chat.completions.create(**api_kwargs)
except Exception as e:
result["error"] = e
```
#### 1e. Signal end-of-stream to consumers (~5 lines)
After the API call returns, signal the callback that streaming is done
so consumers can finalize (remove cursor, close SSE, etc.):
```python
# In run_conversation(), after _interruptible_api_call returns:
if self.stream_callback:
try:
self.stream_callback(None) # None = end of stream signal
except Exception:
pass
```
Consumers check: `if delta is None: finalize()`
**Tests for Phase 1:** (~150 lines)
- Test _run_streaming_chat_completion with mocked stream
- Test fallback to non-streaming on error
- Test tool_call accumulation during streaming
- Test stream_callback receives correct deltas
- Test None signal at end of stream
- Test streaming disabled when callback is None
---
### Phase 2: Gateway consumers (Telegram, Discord, etc.)
**File: gateway/run.py**
#### 2a. Read streaming config (~15 lines)
In `_run_agent()`, before creating the AIAgent:
```python
# Read streaming config
_streaming_enabled = False
try:
# Check per-platform override first
platform_key = source.platform.value if source.platform else ""
_stream_cfg = {} # loaded from config.yaml streaming section
if _stream_cfg.get(platform_key) is not None:
_streaming_enabled = bool(_stream_cfg[platform_key])
else:
_streaming_enabled = bool(_stream_cfg.get("enabled", False))
except Exception:
pass
# Env var override
if os.getenv("HERMES_STREAMING_ENABLED", "").lower() in ("true", "1", "yes"):
_streaming_enabled = True
```
#### 2b. Set up queue + callback (~15 lines)
```python
_stream_q = None
_stream_done = None
_stream_msg_id = [None] # mutable ref for the async task
if _streaming_enabled:
import queue as _q
_stream_q = _q.Queue()
_stream_done = threading.Event()
def _on_token(delta):
if delta is None:
_stream_done.set()
else:
_stream_q.put(delta)
```
Pass `stream_callback=_on_token` to the AIAgent constructor.
#### 2c. Telegram/Discord stream preview task (~50 lines)
```python
async def stream_preview():
"""Progressively edit a message with streaming tokens."""
if not _stream_q:
return
adapter = self.adapters.get(source.platform)
if not adapter:
return
accumulated = []
token_count = 0
last_edit = 0.0
MIN_TOKENS = 20 # Don't show until enough context
EDIT_INTERVAL = 1.5 # Respect Telegram rate limits
try:
while not _stream_done.is_set():
try:
chunk = _stream_q.get(timeout=0.1)
accumulated.append(chunk)
token_count += 1
except queue.Empty:
continue
now = time.monotonic()
if token_count >= MIN_TOKENS and (now - last_edit) >= EDIT_INTERVAL:
preview = "".join(accumulated) + ""
if _stream_msg_id[0] is None:
r = await adapter.send(
chat_id=source.chat_id,
content=preview,
metadata=_thread_metadata,
)
if r.success and r.message_id:
_stream_msg_id[0] = r.message_id
else:
await adapter.edit_message(
chat_id=source.chat_id,
message_id=_stream_msg_id[0],
content=preview,
)
last_edit = now
# Drain remaining tokens
while not _stream_q.empty():
accumulated.append(_stream_q.get_nowait())
# Final edit — remove cursor, show complete text
if _stream_msg_id[0] and accumulated:
await adapter.edit_message(
chat_id=source.chat_id,
message_id=_stream_msg_id[0],
content="".join(accumulated),
)
except asyncio.CancelledError:
# Clean up on cancel
if _stream_msg_id[0] and accumulated:
try:
await adapter.edit_message(
chat_id=source.chat_id,
message_id=_stream_msg_id[0],
content="".join(accumulated),
)
except Exception:
pass
except Exception as e:
logger.debug("stream_preview error: %s", e)
```
#### 2d. Skip final send if already streamed (~10 lines)
In `_process_message_background()` (base.py), after getting the response,
if streaming was active and `_stream_msg_id[0]` is set, the final response
was already delivered via progressive edits. Skip the normal `self.send()`
call to avoid duplicating the message.
This is the most delicate integration point — we need to communicate from
the gateway's `_run_agent` back to the base adapter's response sender that
the response was already delivered. Options:
- **Option A**: Return a special marker in the result dict:
`result["_streamed_msg_id"] = _stream_msg_id[0]`
The base adapter checks this and skips `send()`.
- **Option B**: Edit the already-sent message with the final response
(which may differ slightly from accumulated tokens due to think-block
stripping, etc.) and don't send a new one.
- **Option C**: The stream preview task handles the FULL final response
(including any post-processing), and the handler returns None to skip
the normal send path.
Recommended: **Option A** — cleanest separation. The result dict already
carries metadata; adding one more field is low-risk.
**Platform-specific considerations:**
| Platform | Edit support | Rate limits | Streaming approach |
|----------|-------------|-------------|-------------------|
| Telegram | ✅ edit_message_text | ~20 edits/min | Edit every 1.5s |
| Discord | ✅ message.edit | 5 edits/5s per message | Edit every 1.2s |
| Slack | ✅ chat.update | Tier 3 (~50/min) | Edit every 1.5s |
| WhatsApp | ❌ no edit support | N/A | Skip streaming, use normal path |
| HomeAssistant | ❌ no edit | N/A | Skip streaming |
| API Server | ✅ SSE native | No limit | Real SSE events |
WhatsApp and HomeAssistant fall back to non-streaming automatically because
they don't support message editing.
**Tests for Phase 2:** (~100 lines)
- Test stream_preview sends/edits correctly
- Test skip-final-send when streaming delivered
- Test WhatsApp/HA graceful fallback
- Test streaming disabled per-platform config
- Test thread_id metadata forwarded in stream messages
---
### Phase 3: CLI streaming
**File: cli.py**
#### 3a. Set up callback in the CLI chat loop (~20 lines)
In `_chat_once()` or wherever the agent is invoked:
```python
if streaming_enabled:
_stream_q = queue.Queue()
_stream_done = threading.Event()
def _cli_stream_callback(delta):
if delta is None:
_stream_done.set()
else:
_stream_q.put(delta)
agent.stream_callback = _cli_stream_callback
```
#### 3b. Token display thread/task (~30 lines)
Start a thread that reads the queue and prints tokens:
```python
def _stream_display():
"""Print tokens to terminal as they arrive."""
first_token = True
while not _stream_done.is_set():
try:
delta = _stream_q.get(timeout=0.1)
except queue.Empty:
continue
if first_token:
# Print response box top border
_cprint(f"\n{top}")
first_token = False
sys.stdout.write(delta)
sys.stdout.flush()
# Drain remaining
while not _stream_q.empty():
sys.stdout.write(_stream_q.get_nowait())
sys.stdout.flush()
# Print bottom border
_cprint(f"\n\n{bot}")
```
**Integration challenge: prompt_toolkit**
The CLI uses prompt_toolkit which controls the terminal. Writing directly
to stdout while prompt_toolkit is active can cause display corruption.
The existing KawaiiSpinner already solves this by using prompt_toolkit's
`patch_stdout` context. The streaming display would need to do the same.
Alternative: use `_cprint()` for each token chunk (routes through
prompt_toolkit's renderer). But this might be slow for individual tokens.
Recommended approach: accumulate tokens in small batches (e.g., every 50ms)
and `_cprint()` the batch. This balances display responsiveness with
prompt_toolkit compatibility.
**Tests for Phase 3:** (~50 lines)
- Test CLI streaming callback setup
- Test response box borders with streaming
- Test fallback when streaming disabled
---
### Phase 4: API Server real streaming
**File: gateway/platforms/api_server.py**
Replace the pseudo-streaming `_write_sse_chat_completion()` with real
token-by-token SSE when the agent supports it.
#### 4a. Wire streaming callback for stream=true requests (~20 lines)
```python
if stream:
_stream_q = queue.Queue()
def _api_stream_callback(delta):
_stream_q.put(delta) # None = done
# Pass callback to _run_agent
result, usage = await self._run_agent(
..., stream_callback=_api_stream_callback,
)
```
#### 4b. Real SSE writer (~40 lines)
```python
async def _write_real_sse(self, request, completion_id, model, stream_q):
response = web.StreamResponse(
headers={"Content-Type": "text/event-stream", "Cache-Control": "no-cache"},
)
await response.prepare(request)
# Role chunk
await response.write(...)
# Stream content chunks as they arrive
while True:
try:
delta = await asyncio.get_event_loop().run_in_executor(
None, lambda: stream_q.get(timeout=0.1)
)
except queue.Empty:
continue
if delta is None: # End of stream
break
chunk = {"id": completion_id, "object": "chat.completion.chunk", ...
"choices": [{"delta": {"content": delta}, ...}]}
await response.write(f"data: {json.dumps(chunk)}\n\n".encode())
# Finish + [DONE]
await response.write(...)
await response.write(b"data: [DONE]\n\n")
return response
```
**Challenge: concurrent execution**
The agent runs in a thread executor. SSE writing happens in the async event
loop. The queue bridges them. But `_run_agent()` currently awaits the full
result before returning. For real streaming, we need to start the agent in
the background and stream tokens while it runs:
```python
# Start agent in background
agent_task = asyncio.create_task(self._run_agent_async(...))
# Stream tokens while agent runs
await self._write_real_sse(request, ..., stream_q)
# Agent is done by now (stream_q received None)
result, usage = await agent_task
```
This requires splitting `_run_agent` into an async version that doesn't
block waiting for the result, or running it in a separate task.
**Responses API SSE format:**
For `/v1/responses` with `stream=true`, the SSE events are different:
```
event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"Hello"}
event: response.completed
data: {"type":"response.completed","response":{...}}
```
This needs a separate SSE writer that emits Responses API format events.
**Tests for Phase 4:** (~80 lines)
- Test real SSE streaming with mocked agent
- Test SSE event format (Chat Completions vs Responses)
- Test client disconnect during streaming
- Test fallback to pseudo-streaming when callback not available
---
## Integration Issues & Edge Cases
### 1. Tool calls during streaming
When the model returns tool calls instead of text, no text tokens are emitted.
The stream_callback is simply never called with text. After tools execute, the
next API call may produce the final text response — streaming picks up again.
The stream preview task needs to handle this: if no tokens arrive during a
tool-call round, don't send/edit any message. The tool progress messages
continue working as before.
### 2. Duplicate messages
The biggest risk: the agent sends the final response normally (via the
existing send path) AND the stream preview already showed it. The user
sees the response twice.
Prevention: when streaming is active and tokens were delivered, the final
response send must be suppressed. The `result["_streamed_msg_id"]` marker
tells the base adapter to skip its normal send.
### 3. Response post-processing
The final response may differ from the accumulated streamed tokens:
- Think block stripping (`<think>...</think>` removed)
- Trailing whitespace cleanup
- Tool result media tag appending
The stream preview shows raw tokens. The final edit should use the
post-processed version. This means the final edit (removing the cursor)
should use the post-processed `final_response`, not just the accumulated
stream text.
### 4. Context compression during streaming
If the agent triggers context compression mid-conversation, the streaming
tokens from BEFORE compression are from a different context than those
after. This isn't a problem in practice — compression happens between
API calls, not during streaming.
### 5. Interrupt during streaming
User sends a new message while streaming → interrupt. The stream is killed
(HTTP connection closed), accumulated tokens are shown as-is (no cursor),
and the interrupt message is processed normally. This is already handled by
`_interruptible_api_call` closing the client.
### 6. Multi-model / fallback
If the primary model fails and the agent falls back to a different model,
streaming state resets. The fallback call may or may not support streaming.
The graceful fallback in `_run_streaming_chat_completion` handles this.
### 7. Rate limiting on edits
Telegram: ~20 edits/minute (~1 every 3 seconds to be safe)
Discord: 5 edits per 5 seconds per message
Slack: ~50 API calls/minute
The 1.5s edit interval is conservative enough for all platforms. If we get
429 rate limit errors on edits, just skip that edit cycle and try next time.
---
## Files Changed Summary
| File | Phase | Changes |
|------|-------|---------|
| `run_agent.py` | 1 | +stream_callback param, +_run_streaming_chat_completion(), modify _run_codex_stream(), modify _interruptible_api_call() |
| `gateway/run.py` | 2 | +streaming config reader, +queue/callback setup, +stream_preview task, +skip-final-send logic |
| `gateway/platforms/base.py` | 2 | +check for _streamed_msg_id in response handler |
| `cli.py` | 3 | +streaming setup, +token display, +response box integration |
| `gateway/platforms/api_server.py` | 4 | +real SSE writer, +streaming callback wiring |
| `hermes_cli/config.py` | 1 | +streaming config defaults |
| `cli-config.yaml.example` | 1 | +streaming section |
| `tests/test_streaming.py` | 1-4 | NEW — ~380 lines of tests |
**Total new code**: ~500 lines across all phases
**Total test code**: ~380 lines
---
## Rollout Plan
1. **Phase 1** (core): Merge to main. Streaming disabled by default.
Zero impact on existing behavior. Can be tested with env var.
2. **Phase 2** (gateway): Merge to main. Test on Telegram manually.
Enable per-platform: `streaming.telegram: true` in config.
3. **Phase 3** (CLI): Merge to main. Test in terminal.
Enable: `streaming.cli: true` or `streaming.enabled: true`.
4. **Phase 4** (API server): Merge to main. Test with Open WebUI.
Auto-enabled when client sends `stream: true`.
Each phase is independently mergeable and testable. Streaming stays
off by default throughout. Once all phases are stable, consider
changing the default to enabled.
---
## Config Reference (final state)
```yaml
# config.yaml
streaming:
enabled: false # Master switch (default: off)
cli: true # Per-platform override
telegram: true
discord: true
slack: true
api_server: true # API server always streams when client requests it
edit_interval: 1.5 # Seconds between message edits (default: 1.5)
min_tokens: 20 # Tokens before first display (default: 20)
```
```bash
# Environment variable override
HERMES_STREAMING_ENABLED=true
```

113
AGENTS.md
View File

@@ -31,7 +31,13 @@ hermes-agent/
│ ├── config.py # DEFAULT_CONFIG, OPTIONAL_ENV_VARS, migration
│ ├── commands.py # Slash command definitions + SlashCommandCompleter
│ ├── callbacks.py # Terminal callbacks (clarify, sudo, approval)
── setup.py # Interactive setup wizard
── setup.py # Interactive setup wizard
│ ├── skin_engine.py # Skin/theme engine — CLI visual customization
│ ├── skills_config.py # `hermes skills` — enable/disable skills per platform
│ ├── tools_config.py # `hermes tools` — enable/disable tools per platform
│ ├── skills_hub.py # `/skills` slash command (search, browse, install)
│ ├── models.py # Model catalog, provider model lists
│ └── auth.py # Provider credential resolution
├── tools/ # Tool implementations (one file per tool)
│ ├── registry.py # Central tool registry (schemas, handlers, dispatch)
│ ├── approval.py # Dangerous command detection
@@ -48,9 +54,10 @@ hermes-agent/
│ ├── run.py # Main loop, slash commands, message dispatch
│ ├── session.py # SessionStore — conversation persistence
│ └── platforms/ # Adapters: telegram, discord, slack, whatsapp, homeassistant, signal
├── acp_adapter/ # ACP server (VS Code / Zed / JetBrains integration)
├── cron/ # Scheduler (jobs.py, scheduler.py)
├── environments/ # RL training environments (Atropos)
├── tests/ # Pytest suite (~2500+ tests)
├── tests/ # Pytest suite (~3000 tests)
└── batch_runner.py # Parallel batch processing
```
@@ -121,6 +128,7 @@ Messages follow OpenAI format: `{"role": "system/user/assistant/tool", ...}`. Re
- **Rich** for banner/panels, **prompt_toolkit** for input with autocomplete
- **KawaiiSpinner** (`agent/display.py`) — animated faces during API calls, `┊` activity feed for tool results
- `load_cli_config()` in cli.py merges hardcoded defaults + user config YAML
- **Skin engine** (`hermes_cli/skin_engine.py`) — data-driven CLI theming; initialized from `display.skin` config key at startup; skins customize banner colors, spinner faces/verbs/wings, tool prefix, response box, branding text
- `process_command()` is a method on `HermesCLI` (not in commands.py)
- Skill slash commands: `agent/skill_commands.py` scans `~/.hermes/skills/`, injects as **user message** (not system prompt) to preserve prompt caching
@@ -195,6 +203,94 @@ The registry handles schema collection, dispatch, availability checking, and err
---
## Skin/Theme System
The skin engine (`hermes_cli/skin_engine.py`) provides data-driven CLI visual customization. Skins are **pure data** — no code changes needed to add a new skin.
### Architecture
```
hermes_cli/skin_engine.py # SkinConfig dataclass, built-in skins, YAML loader
~/.hermes/skins/*.yaml # User-installed custom skins (drop-in)
```
- `init_skin_from_config()` — called at CLI startup, reads `display.skin` from config
- `get_active_skin()` — returns cached `SkinConfig` for the current skin
- `set_active_skin(name)` — switches skin at runtime (used by `/skin` command)
- `load_skin(name)` — loads from user skins first, then built-ins, then falls back to default
- Missing skin values inherit from the `default` skin automatically
### What skins customize
| Element | Skin Key | Used By |
|---------|----------|---------|
| Banner panel border | `colors.banner_border` | `banner.py` |
| Banner panel title | `colors.banner_title` | `banner.py` |
| Banner section headers | `colors.banner_accent` | `banner.py` |
| Banner dim text | `colors.banner_dim` | `banner.py` |
| Banner body text | `colors.banner_text` | `banner.py` |
| Response box border | `colors.response_border` | `cli.py` |
| Spinner faces (waiting) | `spinner.waiting_faces` | `display.py` |
| Spinner faces (thinking) | `spinner.thinking_faces` | `display.py` |
| Spinner verbs | `spinner.thinking_verbs` | `display.py` |
| Spinner wings (optional) | `spinner.wings` | `display.py` |
| Tool output prefix | `tool_prefix` | `display.py` |
| Agent name | `branding.agent_name` | `banner.py`, `cli.py` |
| Welcome message | `branding.welcome` | `cli.py` |
| Response box label | `branding.response_label` | `cli.py` |
| Prompt symbol | `branding.prompt_symbol` | `cli.py` |
### Built-in skins
- `default` — Classic Hermes gold/kawaii (the current look)
- `ares` — Crimson/bronze war-god theme with custom spinner wings
- `mono` — Clean grayscale monochrome
- `slate` — Cool blue developer-focused theme
### Adding a built-in skin
Add to `_BUILTIN_SKINS` dict in `hermes_cli/skin_engine.py`:
```python
"mytheme": {
"name": "mytheme",
"description": "Short description",
"colors": { ... },
"spinner": { ... },
"branding": { ... },
"tool_prefix": "",
},
```
### User skins (YAML)
Users create `~/.hermes/skins/<name>.yaml`:
```yaml
name: cyberpunk
description: Neon-soaked terminal theme
colors:
banner_border: "#FF00FF"
banner_title: "#00FFFF"
banner_accent: "#FF1493"
spinner:
thinking_verbs: ["jacking in", "decrypting", "uploading"]
wings:
- ["⟨⚡", "⚡⟩"]
branding:
agent_name: "Cyber Agent"
response_label: " ⚡ Cyber "
tool_prefix: "▏"
```
Activate with `/skin cyberpunk` or `display.skin: cyberpunk` in config.yaml.
---
## Important Policies
### Prompt Caching Must Not Break
@@ -210,6 +306,17 @@ Cache-breaking forces dramatically higher costs. The ONLY time we alter context
- **CLI**: Uses current directory (`.``os.getcwd()`)
- **Messaging**: Uses `MESSAGING_CWD` env var (default: home directory)
### Background Process Notifications (Gateway)
When `terminal(background=true, check_interval=...)` is used, the gateway runs a watcher that
pushes status updates to the user's chat. Control verbosity with `display.background_process_notifications`
in config.yaml (or `HERMES_BACKGROUND_NOTIFICATIONS` env var):
- `all` — running-output updates + final message (default)
- `result` — only the final completion message
- `error` — only the final message when exit code != 0
- `off` — no watcher messages at all
---
## Known Pitfalls
@@ -232,7 +339,7 @@ The `_isolate_hermes_home` autouse fixture in `tests/conftest.py` redirects `HER
```bash
source .venv/bin/activate
python -m pytest tests/ -q # Full suite (~2500 tests, ~2 min)
python -m pytest tests/ -q # Full suite (~3000 tests, ~3 min)
python -m pytest tests/test_model_tools.py -q # Toolset resolution
python -m pytest tests/test_cli_init.py -q # CLI config loading
python -m pytest tests/gateway/ -q # Gateway tests

View File

@@ -139,7 +139,8 @@ hermes-agent/
│ ├── commands.py # Slash command definitions + autocomplete
│ ├── callbacks.py # Interactive callbacks (clarify, sudo, approval)
│ ├── doctor.py # Diagnostics
── skills_hub.py # Skills Hub CLI + /skills slash command
── skills_hub.py # Skills Hub CLI + /skills slash command
│ └── skin_engine.py # Skin/theme engine — data-driven CLI visual customization
├── tools/ # Tool implementations (self-registering)
│ ├── registry.py # Central tool registry (schemas, handlers, dispatch)
@@ -375,6 +376,56 @@ If the field is omitted or empty, the skill loads on all platforms (backward com
---
## Adding a Skin / Theme
Hermes uses a data-driven skin system — no code changes needed to add a new skin.
**Option A: User skin (YAML file)**
Create `~/.hermes/skins/<name>.yaml`:
```yaml
name: mytheme
description: Short description of the theme
colors:
banner_border: "#HEX" # Panel border color
banner_title: "#HEX" # Panel title color
banner_accent: "#HEX" # Section header color
banner_dim: "#HEX" # Muted/dim text color
banner_text: "#HEX" # Body text color
response_border: "#HEX" # Response box border
spinner:
waiting_faces: ["(⚔)", "(⛨)"]
thinking_faces: ["(⚔)", "(⌁)"]
thinking_verbs: ["forging", "plotting"]
wings: # Optional left/right decorations
- ["⟪⚔", "⚔⟫"]
branding:
agent_name: "My Agent"
welcome: "Welcome message"
response_label: " ⚔ Agent "
prompt_symbol: "⚔ "
tool_prefix: "╎" # Tool output line prefix
```
All fields are optional — missing values inherit from the default skin.
**Option B: Built-in skin**
Add to `_BUILTIN_SKINS` dict in `hermes_cli/skin_engine.py`. Use the same schema as above but as a Python dict. Built-in skins ship with the package and are always available.
**Activating:**
- CLI: `/skin mytheme` or set `display.skin: mytheme` in config.yaml
- Config: `display: { skin: mytheme }`
See `hermes_cli/skin_engine.py` for the full schema and existing skins as examples.
---
## Cross-Platform Compatibility
Hermes runs on Linux, macOS, and Windows. When writing code that touches the OS:

View File

@@ -560,12 +560,16 @@ def get_vision_auxiliary_client() -> Tuple[Optional[OpenAI], Optional[str]]:
forced = _get_auxiliary_provider("vision")
if forced != "auto":
return _resolve_forced_provider(forced)
# Auto: only multimodal-capable providers
for try_fn in (_try_openrouter, _try_nous, _try_codex):
# Auto: try providers known to support multimodal first, then fall
# back to the user's custom endpoint. Many local models (Qwen-VL,
# LLaVA, Pixtral, etc.) support vision — skipping them entirely
# caused silent failures for local-only users.
for try_fn in (_try_openrouter, _try_nous, _try_codex,
_try_custom_endpoint):
client, model = try_fn()
if client is not None:
return client, model
logger.debug("Auxiliary vision client: none available (auto only tries OpenRouter/Nous/Codex)")
logger.debug("Auxiliary vision client: none available")
return None, None

View File

@@ -5,8 +5,8 @@ Used by AIAgent._execute_tool_calls for CLI feedback.
"""
import json
import logging
import os
import random
import sys
import threading
import time
@@ -15,6 +15,49 @@ import time
_RED = "\033[31m"
_RESET = "\033[0m"
logger = logging.getLogger(__name__)
# =========================================================================
# Skin-aware helpers (lazy import to avoid circular deps)
# =========================================================================
def _get_skin():
"""Get the active skin config, or None if not available."""
try:
from hermes_cli.skin_engine import get_active_skin
return get_active_skin()
except Exception:
return None
def get_skin_faces(key: str, default: list) -> list:
"""Get spinner face list from active skin, falling back to default."""
skin = _get_skin()
if skin:
faces = skin.get_spinner_list(key)
if faces:
return faces
return default
def get_skin_verbs() -> list:
"""Get thinking verbs from active skin."""
skin = _get_skin()
if skin:
verbs = skin.get_spinner_list("thinking_verbs")
if verbs:
return verbs
return KawaiiSpinner.THINKING_VERBS
def get_skin_tool_prefix() -> str:
"""Get tool output prefix character from active skin."""
skin = _get_skin()
if skin:
return skin.tool_prefix
return ""
# =========================================================================
# Tool preview (one-line summary of a tool call's primary argument)
@@ -22,6 +65,8 @@ _RESET = "\033[0m"
def build_tool_preview(tool_name: str, args: dict, max_len: int = 40) -> str:
"""Build a short preview of a tool call's primary argument for display."""
if not args:
return None
primary_args = {
"terminal": "command", "web_search": "query", "web_extract": "urls",
"read_file": "path", "write_file": "path", "patch": "path",
@@ -163,6 +208,7 @@ class KawaiiSpinner:
self.frame_idx = 0
self.start_time = None
self.last_line_len = 0
self._last_flush_time = 0.0 # Rate-limit flushes for patch_stdout compat
# Capture stdout NOW, before any redirect_stdout(devnull) from
# child agents can replace sys.stdout with a black hole.
self._out = sys.stdout
@@ -177,15 +223,34 @@ class KawaiiSpinner:
pass
def _animate(self):
# Cache skin wings at start (avoid per-frame imports)
skin = _get_skin()
wings = skin.get_spinner_wings() if skin else []
while self.running:
if os.getenv("HERMES_SPINNER_PAUSE"):
time.sleep(0.1)
continue
frame = self.spinner_frames[self.frame_idx % len(self.spinner_frames)]
elapsed = time.time() - self.start_time
line = f" {frame} {self.message} ({elapsed:.1f}s)"
if wings:
left, right = wings[self.frame_idx % len(wings)]
line = f" {left} {frame} {self.message} {right} ({elapsed:.1f}s)"
else:
line = f" {frame} {self.message} ({elapsed:.1f}s)"
pad = max(self.last_line_len - len(line), 0)
self._write(f"\r{line}{' ' * pad}", end='', flush=True)
# Rate-limit flush() calls to avoid spinner spam under
# prompt_toolkit's patch_stdout. Each flush() pushes a queue
# item that may trigger a separate run_in_terminal() call; if
# items are processed one-at-a-time the \r overwrite is lost
# and every frame appears on its own line. By flushing at
# most every 0.4s we guarantee multiple \r-frames are batched
# into a single write, so the terminal collapses them correctly.
now = time.time()
should_flush = (now - self._last_flush_time) >= 0.4
self._write(f"\r{line}{' ' * pad}", end='', flush=should_flush)
if should_flush:
self._last_flush_time = now
self.last_line_len = len(line)
self.frame_idx += 1
time.sleep(0.12)
@@ -300,7 +365,7 @@ def _detect_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str]
if exit_code is not None and exit_code != 0:
return True, f" [exit {exit_code}]"
except (json.JSONDecodeError, TypeError, AttributeError):
pass
logger.debug("Could not parse terminal result as JSON for exit code check")
return False, ""
# Memory-specific: distinguish "full" from real errors
@@ -310,7 +375,7 @@ def _detect_tool_failure(tool_name: str, result: str | None) -> tuple[bool, str]
if data.get("success") is False and "exceed the limit" in data.get("error", ""):
return True, " [full]"
except (json.JSONDecodeError, TypeError, AttributeError):
pass
logger.debug("Could not parse memory result as JSON for capacity check")
# Generic heuristic for non-terminal tools
lower = result[:500].lower()
@@ -332,6 +397,7 @@ def get_cute_tool_message(
"""
dur = f"{duration:.1f}s"
is_failure, failure_suffix = _detect_tool_failure(tool_name, result)
skin_prefix = get_skin_tool_prefix()
def _trunc(s, n=40):
s = str(s)
@@ -342,7 +408,9 @@ def get_cute_tool_message(
return ("..." + p[-(n-3):]) if len(p) > n else p
def _wrap(line: str) -> str:
"""Append failure suffix when the tool failed."""
"""Apply skin tool prefix and failure suffix."""
if skin_prefix != "":
line = line.replace("", skin_prefix, 1)
if not is_failure:
return line
return f"{line}{failure_suffix}"

View File

@@ -159,8 +159,8 @@ def _read_skill_description(skill_file: Path, max_chars: int = 60) -> str:
if len(desc) > max_chars:
desc = desc[:max_chars - 3] + "..."
return desc
except Exception:
pass
except Exception as e:
logger.debug("Failed to read skill description from %s: %s", skill_file, e)
return ""
@@ -195,6 +195,8 @@ def build_skills_system_prompt() -> str:
# Collect skills with descriptions, grouped by category
# Each entry: (skill_name, description)
# Supports sub-categories: skills/mlops/training/axolotl/SKILL.md
# → category "mlops/training", skill "axolotl"
skills_by_category: dict[str, list[tuple[str, str]]] = {}
for skill_file in skills_dir.rglob("SKILL.md"):
# Skip skills incompatible with the current OS platform
@@ -203,8 +205,13 @@ def build_skills_system_prompt() -> str:
rel_path = skill_file.relative_to(skills_dir)
parts = rel_path.parts
if len(parts) >= 2:
category = parts[0]
# Category is everything between skills_dir and the skill folder
# e.g. parts = ("mlops", "training", "axolotl", "SKILL.md")
# → category = "mlops/training", skill_name = "axolotl"
# e.g. parts = ("github", "github-auth", "SKILL.md")
# → category = "github", skill_name = "github-auth"
skill_name = parts[-2]
category = "/".join(parts[:-2]) if len(parts) > 2 else parts[0]
else:
category = "general"
skill_name = skill_file.parent.name
@@ -215,9 +222,11 @@ def build_skills_system_prompt() -> str:
return ""
# Read category-level descriptions from DESCRIPTION.md
# Checks both the exact category path and parent directories
category_descriptions = {}
for category in skills_by_category:
desc_file = skills_dir / category / "DESCRIPTION.md"
cat_path = Path(category)
desc_file = skills_dir / cat_path / "DESCRIPTION.md"
if desc_file.exists():
try:
content = desc_file.read_text(encoding="utf-8")

View File

@@ -10,7 +10,6 @@ the first 6 and last 4 characters for debuggability.
import logging
import os
import re
from typing import Optional
logger = logging.getLogger(__name__)
@@ -60,7 +59,8 @@ _AUTH_HEADER_RE = re.compile(
re.IGNORECASE,
)
# Telegram bot tokens: bot<digits>:<token> or <digits>:<alphanum>
# Telegram bot tokens: bot<digits>:<token> or <digits>:<token>,
# where token part is restricted to [-A-Za-z0-9_] and length >= 30
_TELEGRAM_RE = re.compile(
r"(bot)?(\d{8,}):([-A-Za-z0-9_]{30,})",
)

View File

@@ -606,7 +606,7 @@ class BatchRunner:
# Create batches
self.batches = self._create_batches()
print(f"📊 Batch Runner Initialized")
print("📊 Batch Runner Initialized")
print(f" Dataset: {self.dataset_file} ({len(self.dataset)} prompts)")
print(f" Batch size: {self.batch_size}")
print(f" Total batches: {len(self.batches)}")
@@ -826,7 +826,7 @@ class BatchRunner:
print("=" * 70)
print(f" Original dataset size: {len(self.dataset):,} prompts")
print(f" Already completed: {len(skipped_indices):,} prompts")
print(f" ─────────────────────────────────────────")
print(" ─────────────────────────────────────────")
print(f" 🎯 RESUMING WITH: {len(filtered_entries):,} prompts")
print(f" New batches created: {len(batches_to_process)}")
print("=" * 70 + "\n")
@@ -888,7 +888,7 @@ class BatchRunner:
]
print(f"✅ Created {len(tasks)} batch tasks")
print(f"🚀 Starting parallel batch processing...\n")
print("🚀 Starting parallel batch processing...\n")
# Use rich Progress for better visual tracking with persistent bottom bar
# redirect_stdout/stderr lets rich manage all output so progress bar stays clean
@@ -1057,7 +1057,7 @@ class BatchRunner:
print(f"✅ Total trajectories in merged file: {total_entries - filtered_entries}")
print(f"✅ Total batch files merged: {batch_files_found}")
print(f"⏱️ Total duration: {round(time.time() - start_time, 2)}s")
print(f"\n📈 Tool Usage Statistics:")
print("\n📈 Tool Usage Statistics:")
print("-" * 70)
if total_tool_stats:
@@ -1084,7 +1084,7 @@ class BatchRunner:
# Print reasoning coverage stats
total_discarded = sum(r.get("discarded_no_reasoning", 0) for r in results)
print(f"\n🧠 Reasoning Coverage:")
print("\n🧠 Reasoning Coverage:")
print("-" * 70)
total_turns = total_reasoning_stats["total_assistant_turns"]
with_reasoning = total_reasoning_stats["turns_with_reasoning"]
@@ -1101,8 +1101,8 @@ class BatchRunner:
print(f" 🚫 Samples discarded (zero reasoning): {total_discarded:,}")
print(f"\n💾 Results saved to: {self.output_dir}")
print(f" - Trajectories: trajectories.jsonl (combined)")
print(f" - Individual batches: batch_*.jsonl (for debugging)")
print(" - Trajectories: trajectories.jsonl (combined)")
print(" - Individual batches: batch_*.jsonl (for debugging)")
print(f" - Statistics: {self.stats_file.name}")
print(f" - Checkpoint: {self.checkpoint_file.name}")
@@ -1238,7 +1238,7 @@ def main(
with open(prefill_messages_file, 'r', encoding='utf-8') as f:
prefill_messages = json.load(f)
if not isinstance(prefill_messages, list):
print(f"❌ Error: prefill_messages_file must contain a JSON array of messages")
print("❌ Error: prefill_messages_file must contain a JSON array of messages")
return
print(f"💬 Loaded {len(prefill_messages)} prefill messages from {prefill_messages_file}")
except Exception as e:

View File

@@ -11,6 +11,7 @@ model:
# Inference provider selection:
# "auto" - Use Nous Portal if logged in, otherwise OpenRouter/env vars (default)
# "nous-api" - Use Nous Portal via API key (requires: NOUS_API_KEY)
# "openrouter" - Always use OpenRouter API key from OPENROUTER_API_KEY
# "nous" - Always use Nous Portal (requires: hermes login)
# "zai" - Use z.ai / ZhipuAI GLM models (requires: GLM_API_KEY)
@@ -402,11 +403,13 @@ agent:
# discord: [web, vision, skills, todo]
#
# If not set, defaults are:
# cli: hermes-cli (everything + cronjob management)
# telegram: hermes-telegram (terminal, file, web, vision, image, tts, browser, skills, todo, cronjob, messaging)
# discord: hermes-discord (same as telegram)
# whatsapp: hermes-whatsapp (same as telegram)
# slack: hermes-slack (same as telegram)
# cli: hermes-cli (everything + cronjob management)
# telegram: hermes-telegram (terminal, file, web, vision, image, tts, browser, skills, todo, cronjob, messaging)
# discord: hermes-discord (same as telegram)
# whatsapp: hermes-whatsapp (same as telegram)
# slack: hermes-slack (same as telegram)
# signal: hermes-signal (same as telegram)
# homeassistant: hermes-homeassistant (same as telegram)
#
platform_toolsets:
cli: [hermes-cli]
@@ -414,6 +417,8 @@ platform_toolsets:
discord: [hermes-discord]
whatsapp: [hermes-whatsapp]
slack: [hermes-slack]
signal: [hermes-signal]
homeassistant: [hermes-homeassistant]
# ─────────────────────────────────────────────────────────────────────────────
# Available toolsets (use these names in platform_toolsets or the toolsets list)
@@ -555,6 +560,21 @@ toolsets:
# args: ["-y", "@modelcontextprotocol/server-github"]
# env:
# GITHUB_PERSONAL_ACCESS_TOKEN: "ghp_..."
#
# Sampling (server-initiated LLM requests) — enabled by default.
# Per-server config under the 'sampling' key:
# analysis:
# command: npx
# args: ["-y", "analysis-server"]
# sampling:
# enabled: true # default: true
# model: "gemini-3-flash" # override model (optional)
# max_tokens_cap: 4096 # max tokens per request
# timeout: 30 # LLM call timeout (seconds)
# max_rpm: 10 # max requests per minute
# allowed_models: [] # model whitelist (empty = all)
# max_tool_rounds: 5 # tool loop limit (0 = disable)
# log_level: "info" # audit verbosity
# =============================================================================
# Voice Transcription (Speech-to-Text)
@@ -636,7 +656,57 @@ display:
# Toggle at runtime with /verbose in the CLI
tool_progress: all
# Background process notifications (gateway/messaging only).
# Controls how chatty the process watcher is when you use
# terminal(background=true, check_interval=...) from Telegram/Discord/etc.
# off: No watcher messages at all
# result: Only the final completion message
# error: Only the final message when exit code != 0
# all: Running output updates + final message (default)
background_process_notifications: all
# Play terminal bell when agent finishes a response.
# Useful for long-running tasks — your terminal will ding when the agent is done.
# Works over SSH. Most terminals can be configured to flash the taskbar or play a sound.
bell_on_complete: false
# ───────────────────────────────────────────────────────────────────────────
# Skin / Theme
# ───────────────────────────────────────────────────────────────────────────
# Customize CLI visual appearance — banner colors, spinner faces, tool prefix,
# response box label, and branding text. Change at runtime with /skin <name>.
#
# Built-in skins:
# default — Classic Hermes gold/kawaii
# ares — Crimson/bronze war-god theme with spinner wings
# mono — Clean grayscale monochrome
# slate — Cool blue developer-focused
#
# Custom skins: drop a YAML file in ~/.hermes/skins/<name>.yaml
# Schema (all fields optional, missing values inherit from default):
#
# name: my-theme
# description: Short description
# colors:
# banner_border: "#HEX" # Panel border
# banner_title: "#HEX" # Panel title
# banner_accent: "#HEX" # Section headers (Available Tools, etc.)
# banner_dim: "#HEX" # Dim/muted text
# banner_text: "#HEX" # Body text (tool names, skill names)
# ui_accent: "#HEX" # UI accent color
# response_border: "#HEX" # Response box border color
# spinner:
# waiting_faces: ["(⚔)", "(⛨)"] # Faces shown while waiting
# thinking_faces: ["(⚔)", "(⌁)"] # Faces shown while thinking
# thinking_verbs: ["forging", "plotting"] # Verbs for spinner messages
# wings: # Optional left/right spinner decorations
# - ["⟪⚔", "⚔⟫"]
# - ["⟪▲", "▲⟫"]
# branding:
# agent_name: "My Agent" # Banner title and branding
# welcome: "Welcome message" # Shown at CLI startup
# response_label: " ⚔ Agent " # Response box header label
# prompt_symbol: "⚔ " # Prompt symbol
# tool_prefix: "╎" # Tool output line prefix (default: ┊)
#
skin: default

804
cli.py

File diff suppressed because it is too large Load Diff

View File

@@ -26,16 +26,35 @@ except ImportError:
# Configuration
# =============================================================================
HERMES_DIR = Path.home() / ".hermes"
HERMES_DIR = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
CRON_DIR = HERMES_DIR / "cron"
JOBS_FILE = CRON_DIR / "jobs.json"
OUTPUT_DIR = CRON_DIR / "output"
def _secure_dir(path: Path):
"""Set directory to owner-only access (0700). No-op on Windows."""
try:
os.chmod(path, 0o700)
except (OSError, NotImplementedError):
pass # Windows or other platforms where chmod is not supported
def _secure_file(path: Path):
"""Set file to owner-only read/write (0600). No-op on Windows."""
try:
if path.exists():
os.chmod(path, 0o600)
except (OSError, NotImplementedError):
pass
def ensure_dirs():
"""Ensure cron directories exist."""
"""Ensure cron directories exist with secure permissions."""
CRON_DIR.mkdir(parents=True, exist_ok=True)
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
_secure_dir(CRON_DIR)
_secure_dir(OUTPUT_DIR)
# =============================================================================
@@ -223,6 +242,7 @@ def save_jobs(jobs: List[Dict[str, Any]]):
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, JOBS_FILE)
_secure_file(JOBS_FILE)
except BaseException:
try:
os.unlink(tmp_path)
@@ -400,11 +420,13 @@ def save_job_output(job_id: str, output: str):
ensure_dirs()
job_output_dir = OUTPUT_DIR / job_id
job_output_dir.mkdir(parents=True, exist_ok=True)
_secure_dir(job_output_dir)
timestamp = _hermes_now().strftime("%Y-%m-%d_%H-%M-%S")
output_file = job_output_dir / f"{timestamp}.md"
with open(output_file, 'w', encoding='utf-8') as f:
f.write(output)
_secure_file(output_file)
return output_file

View File

@@ -45,7 +45,7 @@ _LOCK_FILE = _LOCK_DIR / ".tick.lock"
def _resolve_origin(job: dict) -> Optional[dict]:
"""Extract origin info from a job, returning {platform, chat_id, chat_name} or None."""
"""Extract origin info from a job, preserving any extra routing metadata."""
origin = job.get("origin")
if not origin:
return None
@@ -69,6 +69,8 @@ def _deliver_result(job: dict, content: str) -> None:
if deliver == "local":
return
thread_id = None
# Resolve target platform + chat_id
if deliver == "origin":
if not origin:
@@ -76,6 +78,7 @@ def _deliver_result(job: dict, content: str) -> None:
return
platform_name = origin["platform"]
chat_id = origin["chat_id"]
thread_id = origin.get("thread_id")
elif ":" in deliver:
platform_name, chat_id = deliver.split(":", 1)
else:
@@ -83,6 +86,7 @@ def _deliver_result(job: dict, content: str) -> None:
platform_name = deliver
if origin and origin.get("platform") == platform_name:
chat_id = origin["chat_id"]
thread_id = origin.get("thread_id")
else:
# Fall back to home channel
chat_id = os.getenv(f"{platform_name.upper()}_HOME_CHANNEL", "")
@@ -118,13 +122,13 @@ def _deliver_result(job: dict, content: str) -> None:
# Run the async send in a fresh event loop (safe from any thread)
try:
result = asyncio.run(_send_to_platform(platform, pconfig, chat_id, content))
result = asyncio.run(_send_to_platform(platform, pconfig, chat_id, content, thread_id=thread_id))
except RuntimeError:
# asyncio.run() fails if there's already a running loop in this thread;
# spin up a new thread to avoid that.
import concurrent.futures
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, content))
future = pool.submit(asyncio.run, _send_to_platform(platform, pconfig, chat_id, content, thread_id=thread_id))
result = future.result(timeout=30)
except Exception as e:
logger.error("Job '%s': delivery to %s:%s failed: %s", job["id"], platform_name, chat_id, e)
@@ -137,9 +141,9 @@ def _deliver_result(job: dict, content: str) -> None:
# Mirror the delivered content into the target's gateway session
try:
from gateway.mirror import mirror_to_session
mirror_to_session(platform_name, chat_id, content, source_label="cron")
except Exception:
pass
mirror_to_session(platform_name, chat_id, content, source_label="cron", thread_id=thread_id)
except Exception as e:
logger.warning("Job '%s': mirror_to_session failed: %s", job["id"], e)
def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
@@ -190,8 +194,8 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
model = _model_cfg
elif isinstance(_model_cfg, dict):
model = _model_cfg.get("default", model)
except Exception:
pass
except Exception as e:
logger.warning("Job '%s': failed to load config.yaml, using defaults: %s", job_id, e)
# Reasoning config from env or config.yaml
reasoning_config = None
@@ -219,7 +223,8 @@ def run_job(job: dict) -> tuple[bool, str, str, Optional[str]]:
prefill_messages = _json.load(_pf)
if not isinstance(prefill_messages, list):
prefill_messages = None
except Exception:
except Exception as e:
logger.warning("Job '%s': failed to parse prefill messages file '%s': %s", job_id, pfpath, e)
prefill_messages = None
# Max iterations

View File

@@ -0,0 +1,46 @@
# datagen-config-examples/web_research.yaml
#
# Batch data generation config for WebResearchEnv.
# Generates tool-calling trajectories for multi-step web research tasks.
#
# Usage:
# python batch_runner.py \
# --config datagen-config-examples/web_research.yaml \
# --run_name web_research_v1
environment: web-research
# Toolsets available to the agent during data generation
toolsets:
- web
- file
# How many parallel workers to use
num_workers: 4
# Questions per batch
batch_size: 20
# Total trajectories to generate (comment out to run full dataset)
max_items: 500
# Model to use for generation (override with --model flag)
model: openrouter/nousresearch/hermes-3-llama-3.1-405b
# System prompt additions (ephemeral — not saved to trajectories)
ephemeral_system_prompt: |
You are a highly capable research agent. When asked a factual question,
always use web_search to find current, accurate information before answering.
Cite at least 2 sources. Be concise and accurate.
# Output directory
output_dir: data/web_research_v1
# Trajectory compression settings (for fitting into training token budgets)
compression:
enabled: true
target_max_tokens: 16000
# Eval settings
eval_every: 100 # Run eval every N trajectories
eval_size: 25 # Number of held-out questions per eval run

View File

@@ -0,0 +1,89 @@
# ============================================================================
# Hermes Agent — Example Skin Template
# ============================================================================
#
# Copy this file to ~/.hermes/skins/<name>.yaml to create a custom skin.
# All fields are optional — missing values inherit from the default skin.
# Activate with: /skin <name> or display.skin: <name> in config.yaml
#
# See hermes_cli/skin_engine.py for the full schema reference.
# ============================================================================
# Required: unique skin name (used in /skin command and config)
name: example
description: An example custom skin — copy and modify this template
# ── Colors ──────────────────────────────────────────────────────────────────
# Hex color values for Rich markup. These control the CLI's visual palette.
colors:
# Banner panel (the startup welcome box)
banner_border: "#CD7F32" # Panel border
banner_title: "#FFD700" # Panel title text
banner_accent: "#FFBF00" # Section headers (Available Tools, Skills, etc.)
banner_dim: "#B8860B" # Dim/muted text (separators, model info)
banner_text: "#FFF8DC" # Body text (tool names, skill names)
# UI elements
ui_accent: "#FFBF00" # General accent color
ui_label: "#4dd0e1" # Labels
ui_ok: "#4caf50" # Success indicators
ui_error: "#ef5350" # Error indicators
ui_warn: "#ffa726" # Warning indicators
# Input area
prompt: "#FFF8DC" # Prompt text color
input_rule: "#CD7F32" # Horizontal rule around input
# Response box
response_border: "#FFD700" # Response box border (ANSI color)
# Session display
session_label: "#DAA520" # Session label
session_border: "#8B8682" # Session ID dim color
# ── Spinner ─────────────────────────────────────────────────────────────────
# Customize the animated spinner shown during API calls and tool execution.
spinner:
# Faces shown while waiting for the API response
waiting_faces:
- "(。◕‿◕。)"
- "(◕‿◕✿)"
- "٩(◕‿◕。)۶"
# Faces shown during extended thinking/reasoning
thinking_faces:
- "(。•́︿•̀。)"
- "(◔_◔)"
- "(¬‿¬)"
# Verbs used in spinner messages (e.g., "pondering your request...")
thinking_verbs:
- "pondering"
- "contemplating"
- "musing"
- "ruminating"
# Optional: left/right decorations around the spinner
# Each entry is a [left, right] pair. Omit entirely for no wings.
# wings:
# - ["⟪⚔", "⚔⟫"]
# - ["⟪▲", "▲⟫"]
# ── Branding ────────────────────────────────────────────────────────────────
# Text strings used throughout the CLI interface.
branding:
agent_name: "Hermes Agent" # Banner title, about display
welcome: "Welcome! Type your message or /help for commands."
goodbye: "Goodbye! ⚕" # Exit message
response_label: " ⚕ Hermes " # Response box header label
prompt_symbol: " " # Input prompt symbol
help_header: "(^_^)? Available Commands" # /help header text
# ── Tool Output ─────────────────────────────────────────────────────────────
# Character used as the prefix for tool output lines.
# Default is "┊" (thin dotted vertical line). Some alternatives:
# "╎" (light triple dash vertical)
# "▏" (left one-eighth block)
# "│" (box drawing light vertical)
# "┃" (box drawing heavy vertical)
tool_prefix: "┊"

View File

@@ -29,6 +29,10 @@ env:
wandb_name: "terminal-bench-2"
ensure_scores_are_not_same: false
data_dir_to_save_evals: "environments/benchmarks/evals/terminal-bench-2"
# CRITICAL: Limit concurrent Modal sandbox creations to avoid deadlocks.
# Modal's blocking calls (App.lookup, etc.) deadlock when too many sandboxes
# are created simultaneously inside thread pool workers via asyncio.run().
max_concurrent_tasks: 8
openai:
base_url: "https://openrouter.ai/api/v1"

View File

@@ -118,6 +118,15 @@ class TerminalBench2EvalConfig(HermesAgentEnvConfig):
"Tasks exceeding this are scored as FAIL. Default 30 minutes.",
)
# --- Concurrency control ---
max_concurrent_tasks: int = Field(
default=8,
description="Maximum number of tasks to run concurrently. "
"Limits concurrent Modal sandbox creations to avoid async/threading deadlocks. "
"Modal has internal limits and creating too many sandboxes simultaneously "
"causes blocking calls to deadlock inside the thread pool.",
)
# Tasks that cannot run properly on Modal and are excluded from scoring.
MODAL_INCOMPATIBLE_TASKS = {
@@ -430,7 +439,7 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
}
# --- 2. Register per-task Modal image override ---
register_task_env_overrides(task_id, {"modal_image": modal_image})
register_task_env_overrides(task_id, {"modal_image": modal_image, "cwd": "/app"})
logger.info(
"Task %s: registered image override for task_id %s",
task_name, task_id[:8],
@@ -733,12 +742,23 @@ class TerminalBench2EvalEnv(HermesAgentBaseEnv):
print(f" Tool thread pool: {self.config.tool_pool_size}")
print(f" Terminal timeout: {self.config.terminal_timeout}s/cmd")
print(f" Terminal lifetime: {self.config.terminal_lifetime}s (auto: task_timeout + 120)")
print(f" Max concurrent tasks: {self.config.max_concurrent_tasks}")
print(f"{'='*60}\n")
# Semaphore to limit concurrent Modal sandbox creations.
# Without this, all 86 tasks fire simultaneously, each creating a Modal
# sandbox via asyncio.run() inside a thread pool worker. Modal's blocking
# calls (App.lookup, etc.) deadlock when too many are created at once.
semaphore = asyncio.Semaphore(self.config.max_concurrent_tasks)
async def _eval_with_semaphore(item):
async with semaphore:
return await self._eval_with_timeout(item)
# Fire all tasks with wall-clock timeout, track live accuracy on the bar
total_tasks = len(self.all_eval_items)
eval_tasks = [
asyncio.ensure_future(self._eval_with_timeout(item))
asyncio.ensure_future(_eval_with_semaphore(item))
for item in self.all_eval_items
]

View File

@@ -0,0 +1,718 @@
"""
WebResearchEnv — RL Environment for Multi-Step Web Research
============================================================
Trains models to do accurate, efficient, multi-source web research.
Reward signals:
- Answer correctness (LLM judge, 0.01.0)
- Source diversity (used ≥2 distinct domains)
- Efficiency (penalizes excessive tool calls)
- Tool usage (bonus for actually using web tools)
Dataset: FRAMES benchmark (Google, 2024) — multi-hop factual questions
HuggingFace: google/frames-benchmark
Fallback: built-in sample questions (no HF token needed)
Usage:
# Phase 1 (OpenAI-compatible server)
python environments/web_research_env.py serve \\
--openai.base_url http://localhost:8000/v1 \\
--openai.model_name YourModel \\
--openai.server_type openai
# Process mode (offline data generation)
python environments/web_research_env.py process \\
--env.data_path_to_save_groups data/web_research.jsonl
# Standalone eval
python environments/web_research_env.py evaluate \\
--openai.base_url http://localhost:8000/v1 \\
--openai.model_name YourModel
Built by: github.com/jackx707
Inspired by: GroceryMind — production Hermes agent doing live web research
across German grocery stores (firecrawl + hermes-agent)
"""
from __future__ import annotations
import asyncio
import json
import logging
import os
import random
import re
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse
from pydantic import Field
# Ensure hermes-agent root is on path
_repo_root = Path(__file__).resolve().parent.parent
if str(_repo_root) not in sys.path:
sys.path.insert(0, str(_repo_root))
# ---------------------------------------------------------------------------
# Optional HuggingFace datasets import
# ---------------------------------------------------------------------------
try:
from datasets import load_dataset
HF_AVAILABLE = True
except ImportError:
HF_AVAILABLE = False
from atroposlib.envs.base import ScoredDataGroup
from atroposlib.envs.server_handling.server_manager import APIServerConfig
from atroposlib.type_definitions import Item
from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
from environments.agent_loop import AgentResult
from environments.tool_context import ToolContext
logger = logging.getLogger(__name__)
# ---------------------------------------------------------------------------
# Fallback sample dataset (used when HuggingFace is unavailable)
# Multi-hop questions requiring real web search to answer.
# ---------------------------------------------------------------------------
SAMPLE_QUESTIONS = [
{
"question": "What is the current population of the capital city of the country that won the 2022 FIFA World Cup?",
"answer": "Buenos Aires has approximately 3 million people in the city proper, or around 15 million in the greater metro area.",
"difficulty": "medium",
"hops": 2,
},
{
"question": "Who is the CEO of the company that makes the most widely used open-source container orchestration platform?",
"answer": "The Linux Foundation oversees Kubernetes. CNCF (Cloud Native Computing Foundation) is the specific body — it does not have a traditional CEO but has an executive director.",
"difficulty": "medium",
"hops": 2,
},
{
"question": "What programming language was used to write the original version of the web framework used by Instagram?",
"answer": "Django, which Instagram was built on, is written in Python.",
"difficulty": "easy",
"hops": 2,
},
{
"question": "In what year was the university founded where the inventor of the World Wide Web currently holds a professorship?",
"answer": "Tim Berners-Lee holds a professorship at MIT (founded 1861) and the University of Southampton (founded 1952).",
"difficulty": "hard",
"hops": 3,
},
{
"question": "What is the latest stable version of the programming language that ranks #1 on the TIOBE index as of this year?",
"answer": "Python is currently #1 on TIOBE. The latest stable version should be verified via the official python.org site.",
"difficulty": "medium",
"hops": 2,
},
{
"question": "How many employees does the parent company of Instagram have?",
"answer": "Meta Platforms (parent of Instagram) employs approximately 70,000+ people as of recent reports.",
"difficulty": "medium",
"hops": 2,
},
{
"question": "What is the current interest rate set by the central bank of the country where the Eiffel Tower is located?",
"answer": "The European Central Bank sets rates for France/eurozone. The current rate should be verified — it has changed frequently in 2023-2025.",
"difficulty": "hard",
"hops": 2,
},
{
"question": "Which company acquired the startup founded by the creator of Oculus VR?",
"answer": "Palmer Luckey founded Oculus VR, which was acquired by Facebook (now Meta). He later founded Anduril Industries.",
"difficulty": "medium",
"hops": 2,
},
{
"question": "What is the market cap of the company that owns the most popular search engine in Russia?",
"answer": "Yandex (now split into separate entities after 2024 restructuring). Current market cap should be verified via financial sources.",
"difficulty": "hard",
"hops": 2,
},
{
"question": "What was the GDP growth rate of the country that hosted the most recent Summer Olympics?",
"answer": "Paris, France hosted the 2024 Summer Olympics. France's recent GDP growth should be verified via World Bank or IMF data.",
"difficulty": "hard",
"hops": 2,
},
]
# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------
class WebResearchEnvConfig(HermesAgentEnvConfig):
"""Configuration for the web research RL environment."""
# Reward weights
correctness_weight: float = Field(
default=0.6,
description="Weight for answer correctness in reward (LLM judge score).",
)
tool_usage_weight: float = Field(
default=0.2,
description="Weight for tool usage signal (did the model actually use web tools?).",
)
efficiency_weight: float = Field(
default=0.2,
description="Weight for efficiency signal (penalizes excessive tool calls).",
)
diversity_bonus: float = Field(
default=0.1,
description="Bonus reward for citing ≥2 distinct domains.",
)
# Efficiency thresholds
efficient_max_calls: int = Field(
default=5,
description="Maximum tool calls before efficiency penalty begins.",
)
heavy_penalty_calls: int = Field(
default=10,
description="Tool call count where efficiency penalty steepens.",
)
# Eval
eval_size: int = Field(
default=20,
description="Number of held-out items for evaluation.",
)
eval_split_ratio: float = Field(
default=0.1,
description="Fraction of dataset to hold out for evaluation (0.01.0).",
)
# Dataset
dataset_name: str = Field(
default="google/frames-benchmark",
description="HuggingFace dataset name for research questions.",
)
# ---------------------------------------------------------------------------
# Environment
# ---------------------------------------------------------------------------
class WebResearchEnv(HermesAgentBaseEnv):
"""
RL environment for training multi-step web research skills.
The model is given a factual question requiring 2-3 hops of web research
and must use web_search / web_extract tools to find and synthesize the answer.
Reward is multi-signal:
60% — answer correctness (LLM judge)
20% — tool usage (did the model actually search the web?)
20% — efficiency (penalizes >5 tool calls)
Bonus +0.1 for source diversity (≥2 distinct domains cited).
"""
name = "web-research"
env_config_cls = WebResearchEnvConfig
# Default toolsets for this environment — web + file for saving notes
default_toolsets = ["web", "file"]
@classmethod
def config_init(cls) -> Tuple[WebResearchEnvConfig, List[APIServerConfig]]:
"""Default configuration for the web research environment."""
env_config = WebResearchEnvConfig(
enabled_toolsets=["web", "file"],
max_agent_turns=15,
agent_temperature=1.0,
system_prompt=(
"You are a highly capable research agent. When asked a factual question, "
"always use web_search to find current, accurate information before answering. "
"Cite at least 2 sources. Be concise and accurate."
),
group_size=4,
total_steps=1000,
steps_per_eval=100,
use_wandb=True,
wandb_name="web-research",
)
server_configs = [
APIServerConfig(
base_url="https://openrouter.ai/api/v1",
model_name="anthropic/claude-sonnet-4.5",
server_type="openai",
api_key=os.getenv("OPENROUTER_API_KEY", ""),
health_check=False,
)
]
return env_config, server_configs
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._items: list[dict] = []
self._eval_items: list[dict] = []
self._index: int = 0
# Metrics tracking for wandb
self._reward_buffer: list[float] = []
self._correctness_buffer: list[float] = []
self._tool_usage_buffer: list[float] = []
self._efficiency_buffer: list[float] = []
self._diversity_buffer: list[float] = []
# ------------------------------------------------------------------
# 1. Setup — load dataset
# ------------------------------------------------------------------
async def setup(self) -> None:
"""Load the FRAMES benchmark or fall back to built-in samples."""
if HF_AVAILABLE:
try:
logger.info("Loading FRAMES benchmark from HuggingFace...")
ds = load_dataset(self.config.dataset_name, split="test")
self._items = [
{
"question": row["Prompt"],
"answer": row["Answer"],
"difficulty": row.get("reasoning_types", "unknown"),
"hops": 2,
}
for row in ds
]
# Hold out for eval
eval_size = max(
self.config.eval_size,
int(len(self._items) * self.config.eval_split_ratio),
)
random.shuffle(self._items)
self._eval_items = self._items[:eval_size]
self._items = self._items[eval_size:]
logger.info(
f"Loaded {len(self._items)} train / {len(self._eval_items)} eval items "
f"from FRAMES benchmark."
)
return
except Exception as e:
logger.warning(f"Could not load FRAMES from HuggingFace: {e}. Using built-in samples.")
# Fallback
random.shuffle(SAMPLE_QUESTIONS)
split = max(1, len(SAMPLE_QUESTIONS) * 8 // 10)
self._items = SAMPLE_QUESTIONS[:split]
self._eval_items = SAMPLE_QUESTIONS[split:]
logger.info(
f"Using built-in sample dataset: {len(self._items)} train / "
f"{len(self._eval_items)} eval items."
)
# ------------------------------------------------------------------
# 2. get_next_item — return the next question
# ------------------------------------------------------------------
async def get_next_item(self) -> dict:
"""Return the next item, cycling through the dataset."""
if not self._items:
raise RuntimeError("Dataset is empty. Did you call setup()?")
item = self._items[self._index % len(self._items)]
self._index += 1
return item
# ------------------------------------------------------------------
# 3. format_prompt — build the user-facing prompt
# ------------------------------------------------------------------
def format_prompt(self, item: dict) -> str:
"""Format the research question as a task prompt."""
return (
f"Research the following question thoroughly using web search. "
f"You MUST search the web to find current, accurate information — "
f"do not rely solely on your training data.\n\n"
f"Question: {item['question']}\n\n"
f"Requirements:\n"
f"- Use web_search and/or web_extract tools to find information\n"
f"- Search at least 2 different sources\n"
f"- Provide a concise, accurate answer (2-4 sentences)\n"
f"- Cite the sources you used"
)
# ------------------------------------------------------------------
# 4. compute_reward — multi-signal scoring
# ------------------------------------------------------------------
async def compute_reward(
self,
item: dict,
result: AgentResult,
ctx: ToolContext,
) -> float:
"""
Multi-signal reward function:
correctness_weight * correctness — LLM judge comparing answer to ground truth
tool_usage_weight * tool_used — binary: did the model use web tools?
efficiency_weight * efficiency — penalizes wasteful tool usage
+ diversity_bonus — source diversity (≥2 distinct domains)
"""
# Extract final response from messages (last assistant message with content)
final_response = ""
tools_used: list[str] = []
for msg in reversed(result.messages):
if msg.get("role") == "assistant" and msg.get("content") and not final_response:
final_response = msg["content"]
# Collect tool names from tool call messages
if msg.get("role") == "assistant" and msg.get("tool_calls"):
for tc in msg["tool_calls"]:
fn = tc.get("function", {}) if isinstance(tc, dict) else {}
name = fn.get("name", "")
if name:
tools_used.append(name)
tool_call_count: int = result.turns_used or len(tools_used)
cfg = self.config
# ---- Signal 1: Answer correctness (LLM judge) ----------------
correctness = await self._llm_judge(
question=item["question"],
expected=item["answer"],
model_answer=final_response,
)
# ---- Signal 2: Web tool usage --------------------------------
web_tools = {"web_search", "web_extract", "search", "firecrawl"}
tool_used = 1.0 if any(t in web_tools for t in tools_used) else 0.0
# ---- Signal 3: Efficiency ------------------------------------
if tool_call_count <= cfg.efficient_max_calls:
efficiency = 1.0
elif tool_call_count <= cfg.heavy_penalty_calls:
efficiency = 1.0 - (tool_call_count - cfg.efficient_max_calls) * 0.08
else:
efficiency = max(0.0, 1.0 - (tool_call_count - cfg.efficient_max_calls) * 0.12)
# ---- Bonus: Source diversity ---------------------------------
domains = self._extract_domains(final_response)
diversity = cfg.diversity_bonus if len(domains) >= 2 else 0.0
# ---- Combine ------------------------------------------------
reward = (
cfg.correctness_weight * correctness
+ cfg.tool_usage_weight * tool_used
+ cfg.efficiency_weight * efficiency
+ diversity
)
reward = min(1.0, max(0.0, reward)) # clamp to [0, 1]
# Track for wandb
self._reward_buffer.append(reward)
self._correctness_buffer.append(correctness)
self._tool_usage_buffer.append(tool_used)
self._efficiency_buffer.append(efficiency)
self._diversity_buffer.append(diversity)
logger.debug(
f"Reward breakdown — correctness={correctness:.2f}, "
f"tool_used={tool_used:.1f}, efficiency={efficiency:.2f}, "
f"diversity={diversity:.1f} → total={reward:.3f}"
)
return reward
# ------------------------------------------------------------------
# 5. evaluate — run on held-out eval split
# ------------------------------------------------------------------
async def evaluate(self, *args, **kwargs) -> None:
"""Run evaluation on the held-out split using the full agent loop with tools.
Each eval item runs through the same agent loop as training —
the model can use web_search, web_extract, etc. to research answers.
This measures actual agentic research capability, not just knowledge.
"""
import time
import uuid
from environments.agent_loop import HermesAgentLoop
from environments.tool_context import ToolContext
items = self._eval_items
if not items:
logger.warning("No eval items available.")
return
eval_size = min(self.config.eval_size, len(items))
eval_items = items[:eval_size]
logger.info(f"Running eval on {len(eval_items)} questions (with agent loop + tools)...")
start_time = time.time()
samples = []
# Resolve tools once for all eval items
tools, valid_names = self._resolve_tools_for_group()
for i, item in enumerate(eval_items):
task_id = str(uuid.uuid4())
logger.info(f"Eval [{i+1}/{len(eval_items)}]: {item['question'][:80]}...")
try:
# Build messages
messages: List[Dict[str, Any]] = []
if self.config.system_prompt:
messages.append({"role": "system", "content": self.config.system_prompt})
messages.append({"role": "user", "content": self.format_prompt(item)})
# Run the full agent loop with tools
agent = HermesAgentLoop(
server=self.server,
tool_schemas=tools,
valid_tool_names=valid_names,
max_turns=self.config.max_agent_turns,
task_id=task_id,
temperature=0.0, # Deterministic for eval
max_tokens=self.config.max_token_length,
extra_body=self.config.extra_body,
)
result = await agent.run(messages)
# Extract final response and tool usage from messages
final_response = ""
tool_call_count = 0
for msg in reversed(result.messages):
if msg.get("role") == "assistant" and msg.get("content") and not final_response:
final_response = msg["content"]
if msg.get("role") == "assistant" and msg.get("tool_calls"):
tool_call_count += len(msg["tool_calls"])
# Compute reward (includes LLM judge for correctness)
# Temporarily save buffer lengths so we can extract the
# correctness score without calling judge twice, and avoid
# polluting training metric buffers with eval data.
buf_len = len(self._correctness_buffer)
ctx = ToolContext(task_id)
try:
reward = await self.compute_reward(item, result, ctx)
finally:
ctx.cleanup()
# Extract correctness from the buffer (compute_reward appended it)
# then remove eval entries from training buffers
correctness = (
self._correctness_buffer[buf_len]
if len(self._correctness_buffer) > buf_len
else 0.0
)
# Roll back buffers to avoid polluting training metrics
for buf in (
self._reward_buffer, self._correctness_buffer,
self._tool_usage_buffer, self._efficiency_buffer,
self._diversity_buffer,
):
if len(buf) > buf_len:
buf.pop()
samples.append({
"prompt": item["question"],
"response": final_response[:500],
"expected": item["answer"],
"correctness": correctness,
"reward": reward,
"tool_calls": tool_call_count,
"turns": result.turns_used,
})
logger.info(
f" → correctness={correctness:.2f}, reward={reward:.3f}, "
f"tools={tool_call_count}, turns={result.turns_used}"
)
except Exception as e:
logger.error(f"Eval error on item: {e}")
samples.append({
"prompt": item["question"],
"response": f"ERROR: {e}",
"expected": item["answer"],
"correctness": 0.0,
"reward": 0.0,
"tool_calls": 0,
"turns": 0,
})
end_time = time.time()
# Compute aggregate metrics
correctness_scores = [s["correctness"] for s in samples]
rewards = [s["reward"] for s in samples]
tool_counts = [s["tool_calls"] for s in samples]
n = len(samples)
eval_metrics = {
"eval/mean_correctness": sum(correctness_scores) / n if n else 0.0,
"eval/mean_reward": sum(rewards) / n if n else 0.0,
"eval/mean_tool_calls": sum(tool_counts) / n if n else 0.0,
"eval/tool_usage_rate": sum(1 for t in tool_counts if t > 0) / n if n else 0.0,
"eval/n_items": n,
}
logger.info(
f"Eval complete — correctness={eval_metrics['eval/mean_correctness']:.3f}, "
f"reward={eval_metrics['eval/mean_reward']:.3f}, "
f"tool_usage={eval_metrics['eval/tool_usage_rate']:.0%}"
)
await self.evaluate_log(
metrics=eval_metrics,
samples=samples,
start_time=start_time,
end_time=end_time,
)
# ------------------------------------------------------------------
# 6. wandb_log — custom metrics
# ------------------------------------------------------------------
async def wandb_log(self, wandb_metrics: Optional[Dict] = None) -> None:
"""Log reward breakdown metrics to wandb."""
if wandb_metrics is None:
wandb_metrics = {}
if self._reward_buffer:
n = len(self._reward_buffer)
wandb_metrics["train/mean_reward"] = sum(self._reward_buffer) / n
wandb_metrics["train/mean_correctness"] = sum(self._correctness_buffer) / n
wandb_metrics["train/mean_tool_usage"] = sum(self._tool_usage_buffer) / n
wandb_metrics["train/mean_efficiency"] = sum(self._efficiency_buffer) / n
wandb_metrics["train/mean_diversity"] = sum(self._diversity_buffer) / n
wandb_metrics["train/total_rollouts"] = n
# Accuracy buckets
wandb_metrics["train/correct_rate"] = (
sum(1 for c in self._correctness_buffer if c >= 0.7) / n
)
wandb_metrics["train/tool_usage_rate"] = (
sum(1 for t in self._tool_usage_buffer if t > 0) / n
)
# Clear buffers
self._reward_buffer.clear()
self._correctness_buffer.clear()
self._tool_usage_buffer.clear()
self._efficiency_buffer.clear()
self._diversity_buffer.clear()
await super().wandb_log(wandb_metrics)
# ------------------------------------------------------------------
# Private helpers
# ------------------------------------------------------------------
async def _llm_judge(
self,
question: str,
expected: str,
model_answer: str,
) -> float:
"""
Use the server's LLM to judge answer correctness.
Falls back to keyword heuristic if LLM call fails.
"""
if not model_answer or not model_answer.strip():
return 0.0
judge_prompt = (
"You are an impartial judge evaluating the quality of an AI research answer.\n\n"
f"Question: {question}\n\n"
f"Reference answer: {expected}\n\n"
f"Model answer: {model_answer}\n\n"
"Score the model answer on a scale from 0.0 to 1.0 where:\n"
" 1.0 = fully correct and complete\n"
" 0.7 = mostly correct with minor gaps\n"
" 0.4 = partially correct\n"
" 0.1 = mentions relevant topic but wrong or very incomplete\n"
" 0.0 = completely wrong or no answer\n\n"
"Consider: factual accuracy, completeness, and relevance.\n"
'Respond with ONLY a JSON object: {"score": <float>, "reason": "<one sentence>"}'
)
try:
response = await self.server.chat_completion(
messages=[{"role": "user", "content": judge_prompt}],
n=1,
max_tokens=150,
temperature=0.0,
split="eval",
)
text = response.choices[0].message.content if response.choices else ""
parsed = self._parse_judge_json(text)
if parsed is not None:
return float(parsed)
except Exception as e:
logger.debug(f"LLM judge failed: {e}. Using heuristic.")
return self._heuristic_score(expected, model_answer)
@staticmethod
def _parse_judge_json(text: str) -> Optional[float]:
"""Extract the score float from LLM judge JSON response."""
try:
clean = re.sub(r"```(?:json)?|```", "", text).strip()
data = json.loads(clean)
score = float(data.get("score", -1))
if 0.0 <= score <= 1.0:
return score
except Exception:
match = re.search(r'"score"\s*:\s*([0-9.]+)', text)
if match:
score = float(match.group(1))
if 0.0 <= score <= 1.0:
return score
return None
@staticmethod
def _heuristic_score(expected: str, model_answer: str) -> float:
"""Lightweight keyword overlap score as fallback."""
stopwords = {
"the", "a", "an", "is", "are", "was", "were", "of", "in", "on",
"at", "to", "for", "with", "and", "or", "but", "it", "its",
"this", "that", "as", "by", "from", "be", "has", "have", "had",
}
def tokenize(text: str) -> set:
tokens = re.findall(r'\b\w+\b', text.lower())
return {t for t in tokens if t not in stopwords and len(t) > 2}
expected_tokens = tokenize(expected)
answer_tokens = tokenize(model_answer)
if not expected_tokens:
return 0.5
overlap = len(expected_tokens & answer_tokens)
union = len(expected_tokens | answer_tokens)
jaccard = overlap / union if union > 0 else 0.0
recall = overlap / len(expected_tokens)
return min(1.0, 0.4 * jaccard + 0.6 * recall)
@staticmethod
def _extract_domains(text: str) -> set:
"""Extract unique domains from URLs cited in the response."""
urls = re.findall(r'https?://[^\s\)>\]"\']+', text)
domains = set()
for url in urls:
try:
parsed = urlparse(url)
domain = parsed.netloc.lower().lstrip("www.")
if domain:
domains.add(domain)
except Exception:
pass
return domains
# ---------------------------------------------------------------------------
# Entry point
# ---------------------------------------------------------------------------
if __name__ == "__main__":
WebResearchEnv.cli()

View File

@@ -17,6 +17,26 @@ logger = logging.getLogger(__name__)
DIRECTORY_PATH = Path.home() / ".hermes" / "channel_directory.json"
def _session_entry_id(origin: Dict[str, Any]) -> Optional[str]:
chat_id = origin.get("chat_id")
if not chat_id:
return None
thread_id = origin.get("thread_id")
if thread_id:
return f"{chat_id}:{thread_id}"
return str(chat_id)
def _session_entry_name(origin: Dict[str, Any]) -> str:
base_name = origin.get("chat_name") or origin.get("user_name") or str(origin.get("chat_id"))
thread_id = origin.get("thread_id")
if not thread_id:
return base_name
topic_label = origin.get("chat_topic") or f"topic {thread_id}"
return f"{base_name} / {topic_label}"
# ---------------------------------------------------------------------------
# Build / refresh
# ---------------------------------------------------------------------------
@@ -123,14 +143,15 @@ def _build_from_sessions(platform_name: str) -> List[Dict[str, str]]:
origin = session.get("origin") or {}
if origin.get("platform") != platform_name:
continue
chat_id = origin.get("chat_id")
if not chat_id or chat_id in seen_ids:
entry_id = _session_entry_id(origin)
if not entry_id or entry_id in seen_ids:
continue
seen_ids.add(chat_id)
seen_ids.add(entry_id)
entries.append({
"id": str(chat_id),
"name": origin.get("chat_name") or origin.get("user_name") or str(chat_id),
"id": entry_id,
"name": _session_entry_name(origin),
"type": session.get("chat_type", "dm"),
"thread_id": origin.get("thread_id"),
})
except Exception as e:
logger.debug("Channel directory: failed to read sessions for %s: %s", platform_name, e)

View File

@@ -270,7 +270,7 @@ def load_gateway_config() -> GatewayConfig:
gateway_config_path = Path.home() / ".hermes" / "gateway.json"
if gateway_config_path.exists():
try:
with open(gateway_config_path, "r") as f:
with open(gateway_config_path, "r", encoding="utf-8") as f:
data = json.load(f)
config = GatewayConfig.from_dict(data)
except Exception as e:
@@ -283,7 +283,7 @@ def load_gateway_config() -> GatewayConfig:
import yaml
config_yaml_path = Path.home() / ".hermes" / "config.yaml"
if config_yaml_path.exists():
with open(config_yaml_path) as f:
with open(config_yaml_path, encoding="utf-8") as f:
yaml_cfg = yaml.safe_load(f) or {}
sr = yaml_cfg.get("session_reset")
if sr and isinstance(sr, dict):
@@ -441,5 +441,5 @@ def save_gateway_config(config: GatewayConfig) -> None:
gateway_config_path = Path.home() / ".hermes" / "gateway.json"
gateway_config_path.parent.mkdir(parents=True, exist_ok=True)
with open(gateway_config_path, "w") as f:
with open(gateway_config_path, "w", encoding="utf-8") as f:
json.dump(config.to_dict(), f, indent=2)

View File

@@ -37,6 +37,7 @@ class DeliveryTarget:
"""
platform: Platform
chat_id: Optional[str] = None # None means use home channel
thread_id: Optional[str] = None
is_origin: bool = False
is_explicit: bool = False # True if chat_id was explicitly specified
@@ -58,6 +59,7 @@ class DeliveryTarget:
return cls(
platform=origin.platform,
chat_id=origin.chat_id,
thread_id=origin.thread_id,
is_origin=True,
)
else:
@@ -150,7 +152,7 @@ class DeliveryRouter:
continue
# Deduplicate
key = (target.platform, target.chat_id)
key = (target.platform, target.chat_id, target.thread_id)
if key not in seen_platforms:
seen_platforms.add(key)
targets.append(target)
@@ -285,7 +287,10 @@ class DeliveryRouter:
+ f"\n\n... [truncated, full output saved to {saved_path}]"
)
return await adapter.send(target.chat_id, content, metadata=metadata)
send_metadata = dict(metadata or {})
if target.thread_id and "thread_id" not in send_metadata:
send_metadata["thread_id"] = target.thread_id
return await adapter.send(target.chat_id, content, metadata=send_metadata or None)
def parse_deliver_spec(

View File

@@ -26,6 +26,7 @@ def mirror_to_session(
chat_id: str,
message_text: str,
source_label: str = "cli",
thread_id: Optional[str] = None,
) -> bool:
"""
Append a delivery-mirror message to the target session's transcript.
@@ -37,9 +38,9 @@ def mirror_to_session(
All errors are caught -- this is never fatal.
"""
try:
session_id = _find_session_id(platform, str(chat_id))
session_id = _find_session_id(platform, str(chat_id), thread_id=thread_id)
if not session_id:
logger.debug("Mirror: no session found for %s:%s", platform, chat_id)
logger.debug("Mirror: no session found for %s:%s:%s", platform, chat_id, thread_id)
return False
mirror_msg = {
@@ -57,11 +58,11 @@ def mirror_to_session(
return True
except Exception as e:
logger.debug("Mirror failed for %s:%s: %s", platform, chat_id, e)
logger.debug("Mirror failed for %s:%s:%s: %s", platform, chat_id, thread_id, e)
return False
def _find_session_id(platform: str, chat_id: str) -> Optional[str]:
def _find_session_id(platform: str, chat_id: str, thread_id: Optional[str] = None) -> Optional[str]:
"""
Find the active session_id for a platform + chat_id pair.
@@ -91,6 +92,9 @@ def _find_session_id(platform: str, chat_id: str) -> Optional[str]:
origin_chat_id = str(origin.get("chat_id", ""))
if origin_chat_id == str(chat_id):
origin_thread_id = origin.get("thread_id")
if thread_id is not None and str(origin_thread_id or "") != str(thread_id):
continue
updated = entry.get("updated_at", "")
if updated > best_updated:
best_updated = updated
@@ -111,6 +115,7 @@ def _append_to_jsonl(session_id: str, message: dict) -> None:
def _append_to_sqlite(session_id: str, message: dict) -> None:
"""Append a message to the SQLite session database."""
db = None
try:
from hermes_state import SessionDB
db = SessionDB()
@@ -121,3 +126,6 @@ def _append_to_sqlite(session_id: str, message: dict) -> None:
)
except Exception as e:
logger.debug("Mirror SQLite write failed: %s", e)
finally:
if db is not None:
db.close()

View File

@@ -24,7 +24,7 @@ from pathlib import Path as _Path
sys.path.insert(0, str(_Path(__file__).resolve().parents[2]))
from gateway.config import Platform, PlatformConfig
from gateway.session import SessionSource
from gateway.session import SessionSource, build_session_key
# ---------------------------------------------------------------------------
@@ -252,6 +252,7 @@ def cleanup_document_cache(max_age_hours: int = 24) -> int:
class MessageType(Enum):
"""Types of incoming messages."""
TEXT = "text"
LOCATION = "location"
PHOTO = "photo"
VIDEO = "video"
AUDIO = "audio"
@@ -412,11 +413,12 @@ class BasePlatformAdapter(ABC):
"""
return SendResult(success=False, error="Not supported")
async def send_typing(self, chat_id: str) -> None:
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""
Send a typing indicator.
Override in subclasses if the platform supports it.
metadata: optional dict with platform-specific context (e.g. thread_id for Slack).
"""
pass
@@ -514,6 +516,7 @@ class BasePlatformAdapter(ABC):
audio_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""
Send an audio file as a native voice message via the platform API.
@@ -533,6 +536,7 @@ class BasePlatformAdapter(ABC):
video_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""
Send a video natively via the platform API.
@@ -552,6 +556,7 @@ class BasePlatformAdapter(ABC):
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""
Send a document/file natively via the platform API.
@@ -570,6 +575,7 @@ class BasePlatformAdapter(ABC):
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""
Send a local image file natively via the platform API.
@@ -619,7 +625,7 @@ class BasePlatformAdapter(ABC):
return media, cleaned
async def _keep_typing(self, chat_id: str, interval: float = 2.0) -> None:
async def _keep_typing(self, chat_id: str, interval: float = 2.0, metadata=None) -> None:
"""
Continuously send typing indicator until cancelled.
@@ -628,7 +634,7 @@ class BasePlatformAdapter(ABC):
"""
try:
while True:
await self.send_typing(chat_id)
await self.send_typing(chat_id, metadata=metadata)
await asyncio.sleep(interval)
except asyncio.CancelledError:
pass # Normal cancellation when handler completes
@@ -644,7 +650,7 @@ class BasePlatformAdapter(ABC):
if not self._message_handler:
return
session_key = event.source.chat_id
session_key = build_session_key(event.source)
# Check if there's already an active handler for this session
if session_key in self._active_sessions:
@@ -686,7 +692,8 @@ class BasePlatformAdapter(ABC):
self._active_sessions[session_key] = interrupt_event
# Start continuous typing indicator (refreshes every 2 seconds)
typing_task = asyncio.create_task(self._keep_typing(event.source.chat_id))
_thread_metadata = {"thread_id": event.source.thread_id} if event.source.thread_id else None
typing_task = asyncio.create_task(self._keep_typing(event.source.chat_id, metadata=_thread_metadata))
try:
# Call the handler (this can take a while with tool calls)
@@ -710,7 +717,8 @@ class BasePlatformAdapter(ABC):
result = await self.send(
chat_id=event.source.chat_id,
content=text_content,
reply_to=event.message_id
reply_to=event.message_id,
metadata=_thread_metadata,
)
# Log send failures (don't raise - user already saw tool progress)
@@ -720,7 +728,8 @@ class BasePlatformAdapter(ABC):
fallback_result = await self.send(
chat_id=event.source.chat_id,
content=f"(Response formatting failed, plain text:)\n\n{text_content[:3500]}",
reply_to=event.message_id
reply_to=event.message_id,
metadata=_thread_metadata,
)
if not fallback_result.success:
print(f"[{self.name}] Fallback send also failed: {fallback_result.error}")
@@ -742,12 +751,14 @@ class BasePlatformAdapter(ABC):
chat_id=event.source.chat_id,
animation_url=image_url,
caption=alt_text if alt_text else None,
metadata=_thread_metadata,
)
else:
img_result = await self.send_image(
chat_id=event.source.chat_id,
image_url=image_url,
caption=alt_text if alt_text else None,
metadata=_thread_metadata,
)
if not img_result.success:
logger.error("[%s] Failed to send image: %s", self.name, img_result.error)
@@ -768,21 +779,25 @@ class BasePlatformAdapter(ABC):
media_result = await self.send_voice(
chat_id=event.source.chat_id,
audio_path=media_path,
metadata=_thread_metadata,
)
elif ext in _VIDEO_EXTS:
media_result = await self.send_video(
chat_id=event.source.chat_id,
video_path=media_path,
metadata=_thread_metadata,
)
elif ext in _IMAGE_EXTS:
media_result = await self.send_image_file(
chat_id=event.source.chat_id,
image_path=media_path,
metadata=_thread_metadata,
)
else:
media_result = await self.send_document(
chat_id=event.source.chat_id,
file_path=media_path,
metadata=_thread_metadata,
)
if not media_result.success:

View File

@@ -72,11 +72,11 @@ class DiscordAdapter(BasePlatformAdapter):
async def connect(self) -> bool:
"""Connect to Discord and start receiving events."""
if not DISCORD_AVAILABLE:
print(f"[{self.name}] discord.py not installed. Run: pip install discord.py")
logger.error("[%s] discord.py not installed. Run: pip install discord.py", self.name)
return False
if not self.config.token:
print(f"[{self.name}] No bot token configured")
logger.error("[%s] No bot token configured", self.name)
return False
try:
@@ -105,7 +105,7 @@ class DiscordAdapter(BasePlatformAdapter):
# Register event handlers
@self._client.event
async def on_ready():
print(f"[{adapter_self.name}] Connected as {adapter_self._client.user}")
logger.info("[%s] Connected as %s", adapter_self.name, adapter_self._client.user)
# Resolve any usernames in the allowed list to numeric IDs
await adapter_self._resolve_allowed_usernames()
@@ -113,16 +113,30 @@ class DiscordAdapter(BasePlatformAdapter):
# Sync slash commands with Discord
try:
synced = await adapter_self._client.tree.sync()
print(f"[{adapter_self.name}] Synced {len(synced)} slash command(s)")
except Exception as e:
print(f"[{adapter_self.name}] Slash command sync failed: {e}")
logger.info("[%s] Synced %d slash command(s)", adapter_self.name, len(synced))
except Exception as e: # pragma: no cover - defensive logging
logger.warning("[%s] Slash command sync failed: %s", adapter_self.name, e, exc_info=True)
adapter_self._ready_event.set()
@self._client.event
async def on_message(message: DiscordMessage):
# Ignore bot's own messages
# Always ignore our own messages
if message.author == self._client.user:
return
# Bot message filtering (DISCORD_ALLOW_BOTS):
# "none" — ignore all other bots (default)
# "mentions" — accept bot messages only when they @mention us
# "all" — accept all bot messages
if getattr(message.author, "bot", False):
allow_bots = os.getenv("DISCORD_ALLOW_BOTS", "none").lower().strip()
if allow_bots == "none":
return
elif allow_bots == "mentions":
if not self._client.user or self._client.user not in message.mentions:
return
# "all" falls through to handle_message
await self._handle_message(message)
# Register slash commands
@@ -138,10 +152,10 @@ class DiscordAdapter(BasePlatformAdapter):
return True
except asyncio.TimeoutError:
print(f"[{self.name}] Timeout waiting for connection")
logger.error("[%s] Timeout waiting for connection to Discord", self.name, exc_info=True)
return False
except Exception as e:
print(f"[{self.name}] Failed to connect: {e}")
except Exception as e: # pragma: no cover - defensive logging
logger.error("[%s] Failed to connect to Discord: %s", self.name, e, exc_info=True)
return False
async def disconnect(self) -> None:
@@ -149,13 +163,13 @@ class DiscordAdapter(BasePlatformAdapter):
if self._client:
try:
await self._client.close()
except Exception as e:
print(f"[{self.name}] Error during disconnect: {e}")
except Exception as e: # pragma: no cover - defensive logging
logger.warning("[%s] Error during disconnect: %s", self.name, e, exc_info=True)
self._running = False
self._client = None
self._ready_event.clear()
print(f"[{self.name}] Disconnected")
logger.info("[%s] Disconnected", self.name)
async def send(
self,
@@ -204,7 +218,8 @@ class DiscordAdapter(BasePlatformAdapter):
raw_response={"message_ids": message_ids}
)
except Exception as e:
except Exception as e: # pragma: no cover - defensive logging
logger.error("[%s] Failed to send Discord message: %s", self.name, e, exc_info=True)
return SendResult(success=False, error=str(e))
async def edit_message(
@@ -226,7 +241,8 @@ class DiscordAdapter(BasePlatformAdapter):
formatted = formatted[:self.MAX_MESSAGE_LENGTH - 3] + "..."
await msg.edit(content=formatted)
return SendResult(success=True, message_id=message_id)
except Exception as e:
except Exception as e: # pragma: no cover - defensive logging
logger.error("[%s] Failed to edit Discord message %s: %s", self.name, message_id, e, exc_info=True)
return SendResult(success=False, error=str(e))
async def send_voice(
@@ -263,8 +279,8 @@ class DiscordAdapter(BasePlatformAdapter):
)
return SendResult(success=True, message_id=str(msg.id))
except Exception as e:
print(f"[{self.name}] Failed to send audio: {e}")
except Exception as e: # pragma: no cover - defensive logging
logger.error("[%s] Failed to send audio, falling back to base adapter: %s", self.name, e, exc_info=True)
return await super().send_voice(chat_id, audio_path, caption, reply_to)
async def send_image_file(
@@ -300,8 +316,8 @@ class DiscordAdapter(BasePlatformAdapter):
)
return SendResult(success=True, message_id=str(msg.id))
except Exception as e:
print(f"[{self.name}] Failed to send local image: {e}")
except Exception as e: # pragma: no cover - defensive logging
logger.error("[%s] Failed to send local image, falling back to base adapter: %s", self.name, e, exc_info=True)
return await super().send_image_file(chat_id, image_path, caption, reply_to)
async def send_image(
@@ -353,13 +369,22 @@ class DiscordAdapter(BasePlatformAdapter):
return SendResult(success=True, message_id=str(msg.id))
except ImportError:
print(f"[{self.name}] aiohttp not installed, falling back to URL. Run: pip install aiohttp")
logger.warning(
"[%s] aiohttp not installed, falling back to URL. Run: pip install aiohttp",
self.name,
exc_info=True,
)
return await super().send_image(chat_id, image_url, caption, reply_to)
except Exception as e:
print(f"[{self.name}] Failed to send image attachment, falling back to URL: {e}")
except Exception as e: # pragma: no cover - defensive logging
logger.error(
"[%s] Failed to send image attachment, falling back to URL: %s",
self.name,
e,
exc_info=True,
)
return await super().send_image(chat_id, image_url, caption, reply_to)
async def send_typing(self, chat_id: str) -> None:
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""Send typing indicator."""
if self._client:
try:
@@ -404,7 +429,8 @@ class DiscordAdapter(BasePlatformAdapter):
"guild_id": str(channel.guild.id) if hasattr(channel, "guild") and channel.guild else None,
"guild_name": channel.guild.name if hasattr(channel, "guild") and channel.guild else None,
}
except Exception as e:
except Exception as e: # pragma: no cover - defensive logging
logger.error("[%s] Failed to get chat info for %s: %s", self.name, chat_id, e, exc_info=True)
return {"name": str(chat_id), "type": "dm", "error": str(e)}
async def _resolve_allowed_usernames(self) -> None:

View File

@@ -419,7 +419,7 @@ class HomeAssistantAdapter(BasePlatformAdapter):
except Exception as e:
return SendResult(success=False, error=str(e))
async def send_typing(self, chat_id: str) -> None:
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""No typing indicator for Home Assistant."""
pass

View File

@@ -104,6 +104,20 @@ def _is_audio_ext(ext: str) -> bool:
return ext.lower() in (".mp3", ".wav", ".ogg", ".m4a", ".aac")
_EXT_TO_MIME = {
".jpg": "image/jpeg", ".jpeg": "image/jpeg", ".png": "image/png",
".gif": "image/gif", ".webp": "image/webp",
".ogg": "audio/ogg", ".mp3": "audio/mpeg", ".wav": "audio/wav",
".m4a": "audio/mp4", ".aac": "audio/aac",
".mp4": "video/mp4", ".pdf": "application/pdf", ".zip": "application/zip",
}
def _ext_to_mime(ext: str) -> str:
"""Map file extension to MIME type."""
return _EXT_TO_MIME.get(ext.lower(), "application/octet-stream")
def _render_mentions(text: str, mentions: list) -> str:
"""Replace Signal mention placeholders (\\uFFFC) with readable @identifiers.
@@ -404,9 +418,8 @@ class SignalAdapter(BasePlatformAdapter):
# Process attachments
attachments_data = data_message.get("attachments", [])
image_paths = []
audio_path = None
document_paths = []
media_urls = []
media_types = []
if attachments_data and not getattr(self, "ignore_attachments", False):
for att in attachments_data:
@@ -420,12 +433,10 @@ class SignalAdapter(BasePlatformAdapter):
try:
cached_path, ext = await self._fetch_attachment(att_id)
if cached_path:
if _is_image_ext(ext):
image_paths.append(cached_path)
elif _is_audio_ext(ext):
audio_path = cached_path
else:
document_paths.append(cached_path)
# Use contentType from Signal if available, else map from extension
content_type = att.get("contentType") or _ext_to_mime(ext)
media_urls.append(cached_path)
media_types.append(content_type)
except Exception:
logger.exception("Signal: failed to fetch attachment %s", att_id)
@@ -440,12 +451,13 @@ class SignalAdapter(BasePlatformAdapter):
chat_id_alt=group_id if is_group else None,
)
# Determine message type
# Determine message type from media
msg_type = MessageType.TEXT
if audio_path:
msg_type = MessageType.VOICE
elif image_paths:
msg_type = MessageType.IMAGE
if media_types:
if any(mt.startswith("audio/") for mt in media_types):
msg_type = MessageType.VOICE
elif any(mt.startswith("image/") for mt in media_types):
msg_type = MessageType.IMAGE
# Parse timestamp from envelope data (milliseconds since epoch)
ts_ms = envelope_data.get("timestamp", 0)
@@ -462,9 +474,8 @@ class SignalAdapter(BasePlatformAdapter):
source=source,
text=text or "",
message_type=msg_type,
image_paths=image_paths,
audio_path=audio_path,
document_paths=document_paths,
media_urls=media_urls,
media_types=media_types,
timestamp=timestamp,
)
@@ -546,16 +557,16 @@ class SignalAdapter(BasePlatformAdapter):
async def send(
self,
chat_id: str,
text: str,
reply_to_message_id: Optional[str] = None,
**kwargs,
content: str,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send a text message."""
await self._stop_typing_indicator(chat_id)
params: Dict[str, Any] = {
"account": self.account,
"message": text,
"message": content,
}
if chat_id.startswith("group:"):
@@ -569,7 +580,7 @@ class SignalAdapter(BasePlatformAdapter):
return SendResult(success=True)
return SendResult(success=False, error="RPC send failed")
async def send_typing(self, chat_id: str) -> None:
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""Send a typing indicator."""
params: Dict[str, Any] = {
"account": self.account,

View File

@@ -10,6 +10,7 @@ Uses slack-bolt (Python) with Socket Mode for:
import asyncio
import os
import re
from typing import Dict, List, Optional, Any
try:
@@ -33,6 +34,8 @@ from gateway.platforms.base import (
MessageEvent,
MessageType,
SendResult,
SUPPORTED_DOCUMENT_TYPES,
cache_document_from_bytes,
cache_image_from_url,
cache_audio_from_url,
)
@@ -96,6 +99,13 @@ class SlackAdapter(BasePlatformAdapter):
async def handle_message_event(event, say):
await self._handle_slack_message(event)
# Acknowledge app_mention events to prevent Bolt 404 errors.
# The "message" handler above already processes @mentions in
# channels, so this is intentionally a no-op to avoid duplicates.
@self._app.event("app_mention")
async def handle_app_mention(event, say):
pass
# Register slash command handler
@self._app.command("/hermes")
async def handle_hermes_command(ack, command):
@@ -175,7 +185,7 @@ class SlackAdapter(BasePlatformAdapter):
except Exception as e:
return SendResult(success=False, error=str(e))
async def send_typing(self, chat_id: str) -> None:
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""Slack doesn't have a direct typing indicator API for bots."""
pass
@@ -266,6 +276,65 @@ class SlackAdapter(BasePlatformAdapter):
except Exception as e:
return SendResult(success=False, error=str(e))
async def send_video(
self,
chat_id: str,
video_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send a video file to Slack."""
if not self._app:
return SendResult(success=False, error="Not connected")
if not os.path.exists(video_path):
return SendResult(success=False, error=f"Video file not found: {video_path}")
try:
result = await self._app.client.files_upload_v2(
channel=chat_id,
file=video_path,
filename=os.path.basename(video_path),
initial_comment=caption or "",
thread_ts=reply_to,
)
return SendResult(success=True, raw_response=result)
except Exception as e:
print(f"[{self.name}] Failed to send video: {e}")
return await super().send_video(chat_id, video_path, caption, reply_to)
async def send_document(
self,
chat_id: str,
file_path: str,
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
) -> SendResult:
"""Send a document/file attachment to Slack."""
if not self._app:
return SendResult(success=False, error="Not connected")
if not os.path.exists(file_path):
return SendResult(success=False, error=f"File not found: {file_path}")
display_name = file_name or os.path.basename(file_path)
try:
result = await self._app.client.files_upload_v2(
channel=chat_id,
file=file_path,
filename=display_name,
initial_comment=caption or "",
thread_ts=reply_to,
)
return SendResult(success=True, raw_response=result)
except Exception as e:
print(f"[{self.name}] Failed to send document: {e}")
return await super().send_document(chat_id, file_path, caption, file_name, reply_to)
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Get information about a Slack channel."""
if not self._app:
@@ -347,6 +416,58 @@ class SlackAdapter(BasePlatformAdapter):
msg_type = MessageType.VOICE
except Exception as e:
print(f"[Slack] Failed to cache audio: {e}", flush=True)
elif url:
# Try to handle as a document attachment
try:
original_filename = f.get("name", "")
ext = ""
if original_filename:
_, ext = os.path.splitext(original_filename)
ext = ext.lower()
# Fallback: reverse-lookup from MIME type
if not ext and mimetype:
mime_to_ext = {v: k for k, v in SUPPORTED_DOCUMENT_TYPES.items()}
ext = mime_to_ext.get(mimetype, "")
if ext not in SUPPORTED_DOCUMENT_TYPES:
continue # Skip unsupported file types silently
# Check file size (Slack limit: 20 MB for bots)
file_size = f.get("size", 0)
MAX_DOC_BYTES = 20 * 1024 * 1024
if not file_size or file_size > MAX_DOC_BYTES:
print(f"[Slack] Document too large or unknown size: {file_size}", flush=True)
continue
# Download and cache
raw_bytes = await self._download_slack_file_bytes(url)
cached_path = cache_document_from_bytes(
raw_bytes, original_filename or f"document{ext}"
)
doc_mime = SUPPORTED_DOCUMENT_TYPES[ext]
media_urls.append(cached_path)
media_types.append(doc_mime)
msg_type = MessageType.DOCUMENT
print(f"[Slack] Cached user document: {cached_path}", flush=True)
# Inject text content for .txt/.md files (capped at 100 KB)
MAX_TEXT_INJECT_BYTES = 100 * 1024
if ext in (".md", ".txt") and len(raw_bytes) <= MAX_TEXT_INJECT_BYTES:
try:
text_content = raw_bytes.decode("utf-8")
display_name = original_filename or f"document{ext}"
display_name = re.sub(r'[^\w.\- ]', '_', display_name)
injection = f"[Content of {display_name}]:\n{text_content}"
if text:
text = f"{injection}\n\n{text}"
else:
text = injection
except UnicodeDecodeError:
pass # Binary content, skip injection
except Exception as e:
print(f"[Slack] Failed to cache document: {e}", flush=True)
# Build source
source = self.build_source(
@@ -427,3 +548,16 @@ class SlackAdapter(BasePlatformAdapter):
else:
from gateway.platforms.base import cache_image_from_bytes
return cache_image_from_bytes(response.content, ext)
async def _download_slack_file_bytes(self, url: str) -> bytes:
"""Download a Slack file and return raw bytes."""
import httpx
bot_token = self.config.token
async with httpx.AsyncClient(timeout=30.0, follow_redirects=True) as client:
response = await client.get(
url,
headers={"Authorization": f"Bearer {bot_token}"},
)
response.raise_for_status()
return response.content

View File

@@ -86,6 +86,9 @@ def _strip_mdv2(text: str) -> str:
cleaned = re.sub(r'\\([_*\[\]()~`>#\+\-=|{}.!\\])', r'\1', text)
# Remove MarkdownV2 bold markers that format_message converted from **bold**
cleaned = re.sub(r'\*([^*]+)\*', r'\1', cleaned)
# Remove MarkdownV2 italic markers that format_message converted from *italic*
# Use word boundary (\b) to avoid breaking snake_case like my_variable_name
cleaned = re.sub(r'(?<!\w)_([^_]+)_(?!\w)', r'\1', cleaned)
return cleaned
@@ -111,11 +114,14 @@ class TelegramAdapter(BasePlatformAdapter):
async def connect(self) -> bool:
"""Connect to Telegram and start polling for updates."""
if not TELEGRAM_AVAILABLE:
print(f"[{self.name}] python-telegram-bot not installed. Run: pip install python-telegram-bot")
logger.error(
"[%s] python-telegram-bot not installed. Run: pip install python-telegram-bot",
self.name,
)
return False
if not self.config.token:
print(f"[{self.name}] No bot token configured")
logger.error("[%s] No bot token configured", self.name)
return False
try:
@@ -132,6 +138,10 @@ class TelegramAdapter(BasePlatformAdapter):
filters.COMMAND,
self._handle_command
))
self._app.add_handler(TelegramMessageHandler(
filters.LOCATION | getattr(filters, "VENUE", filters.LOCATION),
self._handle_location_message
))
self._app.add_handler(TelegramMessageHandler(
filters.PHOTO | filters.VIDEO | filters.AUDIO | filters.VOICE | filters.Document.ALL | filters.Sticker.ALL,
self._handle_media_message
@@ -166,14 +176,19 @@ class TelegramAdapter(BasePlatformAdapter):
BotCommand("help", "Show available commands"),
])
except Exception as e:
print(f"[{self.name}] Could not register command menu: {e}")
logger.warning(
"[%s] Could not register Telegram command menu: %s",
self.name,
e,
exc_info=True,
)
self._running = True
print(f"[{self.name}] Connected and polling for updates")
logger.info("[%s] Connected and polling for Telegram updates", self.name)
return True
except Exception as e:
print(f"[{self.name}] Failed to connect: {e}")
logger.error("[%s] Failed to connect to Telegram: %s", self.name, e, exc_info=True)
return False
async def disconnect(self) -> None:
@@ -184,12 +199,12 @@ class TelegramAdapter(BasePlatformAdapter):
await self._app.stop()
await self._app.shutdown()
except Exception as e:
print(f"[{self.name}] Error during disconnect: {e}")
logger.warning("[%s] Error during Telegram disconnect: %s", self.name, e, exc_info=True)
self._running = False
self._app = None
self._bot = None
print(f"[{self.name}] Disconnected")
logger.info("[%s] Disconnected from Telegram", self.name)
async def send(
self,
@@ -245,6 +260,7 @@ class TelegramAdapter(BasePlatformAdapter):
)
except Exception as e:
logger.error("[%s] Failed to send Telegram message: %s", self.name, e, exc_info=True)
return SendResult(success=False, error=str(e))
async def edit_message(
@@ -274,6 +290,13 @@ class TelegramAdapter(BasePlatformAdapter):
)
return SendResult(success=True, message_id=message_id)
except Exception as e:
logger.error(
"[%s] Failed to edit Telegram message %s: %s",
self.name,
message_id,
e,
exc_info=True,
)
return SendResult(success=False, error=str(e))
async def send_voice(
@@ -282,6 +305,7 @@ class TelegramAdapter(BasePlatformAdapter):
audio_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send audio as a native Telegram voice message or audio file."""
if not self._bot:
@@ -295,23 +319,32 @@ class TelegramAdapter(BasePlatformAdapter):
with open(audio_path, "rb") as audio_file:
# .ogg files -> send as voice (round playable bubble)
if audio_path.endswith(".ogg") or audio_path.endswith(".opus"):
_voice_thread = metadata.get("thread_id") if metadata else None
msg = await self._bot.send_voice(
chat_id=int(chat_id),
voice=audio_file,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=int(_voice_thread) if _voice_thread else None,
)
else:
# .mp3 and others -> send as audio file
_audio_thread = metadata.get("thread_id") if metadata else None
msg = await self._bot.send_audio(
chat_id=int(chat_id),
audio=audio_file,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=int(_audio_thread) if _audio_thread else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send voice/audio: {e}")
logger.error(
"[%s] Failed to send Telegram voice/audio, falling back to base adapter: %s",
self.name,
e,
exc_info=True,
)
return await super().send_voice(chat_id, audio_path, caption, reply_to)
async def send_image_file(
@@ -320,6 +353,7 @@ class TelegramAdapter(BasePlatformAdapter):
image_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a local image file natively as a Telegram photo."""
if not self._bot:
@@ -339,15 +373,81 @@ class TelegramAdapter(BasePlatformAdapter):
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send local image: {e}")
logger.error(
"[%s] Failed to send Telegram local image, falling back to base adapter: %s",
self.name,
e,
exc_info=True,
)
return await super().send_image_file(chat_id, image_path, caption, reply_to)
async def send_document(
self,
chat_id: str,
file_path: str,
caption: Optional[str] = None,
file_name: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a document/file natively as a Telegram file attachment."""
if not self._bot:
return SendResult(success=False, error="Not connected")
try:
if not os.path.exists(file_path):
return SendResult(success=False, error=f"File not found: {file_path}")
display_name = file_name or os.path.basename(file_path)
with open(file_path, "rb") as f:
msg = await self._bot.send_document(
chat_id=int(chat_id),
document=f,
filename=display_name,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send document: {e}")
return await super().send_document(chat_id, file_path, caption, file_name, reply_to)
async def send_video(
self,
chat_id: str,
video_path: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
**kwargs,
) -> SendResult:
"""Send a video natively as a Telegram video message."""
if not self._bot:
return SendResult(success=False, error="Not connected")
try:
if not os.path.exists(video_path):
return SendResult(success=False, error=f"Video file not found: {video_path}")
with open(video_path, "rb") as f:
msg = await self._bot.send_video(
chat_id=int(chat_id),
video=f,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send video: {e}")
return await super().send_video(chat_id, video_path, caption, reply_to)
async def send_image(
self,
chat_id: str,
image_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send an image natively as a Telegram photo.
@@ -359,15 +459,22 @@ class TelegramAdapter(BasePlatformAdapter):
try:
# Telegram can send photos directly from URLs (up to ~5MB)
_photo_thread = metadata.get("thread_id") if metadata else None
msg = await self._bot.send_photo(
chat_id=int(chat_id),
photo=image_url,
caption=caption[:1024] if caption else None, # Telegram caption limit
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=int(_photo_thread) if _photo_thread else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
logger.warning("[%s] URL-based send_photo failed (%s), trying file upload", self.name, e)
logger.warning(
"[%s] URL-based send_photo failed, trying file upload: %s",
self.name,
e,
exc_info=True,
)
# Fallback: download and upload as file (supports up to 10MB)
try:
import httpx
@@ -384,7 +491,12 @@ class TelegramAdapter(BasePlatformAdapter):
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e2:
logger.error("[%s] File upload send_photo also failed: %s", self.name, e2)
logger.error(
"[%s] File upload send_photo also failed: %s",
self.name,
e2,
exc_info=True,
)
# Final fallback: send URL as text
return await super().send_image(chat_id, image_url, caption, reply_to)
@@ -394,34 +506,50 @@ class TelegramAdapter(BasePlatformAdapter):
animation_url: str,
caption: Optional[str] = None,
reply_to: Optional[str] = None,
metadata: Optional[Dict[str, Any]] = None,
) -> SendResult:
"""Send an animated GIF natively as a Telegram animation (auto-plays inline)."""
if not self._bot:
return SendResult(success=False, error="Not connected")
try:
_anim_thread = metadata.get("thread_id") if metadata else None
msg = await self._bot.send_animation(
chat_id=int(chat_id),
animation=animation_url,
caption=caption[:1024] if caption else None,
reply_to_message_id=int(reply_to) if reply_to else None,
message_thread_id=int(_anim_thread) if _anim_thread else None,
)
return SendResult(success=True, message_id=str(msg.message_id))
except Exception as e:
print(f"[{self.name}] Failed to send animation, falling back to photo: {e}")
logger.error(
"[%s] Failed to send Telegram animation, falling back to photo: %s",
self.name,
e,
exc_info=True,
)
# Fallback: try as a regular photo
return await self.send_image(chat_id, animation_url, caption, reply_to)
async def send_typing(self, chat_id: str) -> None:
async def send_typing(self, chat_id: str, metadata: Optional[Dict[str, Any]] = None) -> None:
"""Send typing indicator."""
if self._bot:
try:
_typing_thread = metadata.get("thread_id") if metadata else None
await self._bot.send_chat_action(
chat_id=int(chat_id),
action="typing"
action="typing",
message_thread_id=int(_typing_thread) if _typing_thread else None,
)
except Exception as e:
# Typing failures are non-fatal; log at debug level only.
logger.debug(
"[%s] Failed to send Telegram typing indicator: %s",
self.name,
e,
exc_info=True,
)
except Exception:
pass # Ignore typing indicator failures
async def get_chat_info(self, chat_id: str) -> Dict[str, Any]:
"""Get information about a Telegram chat."""
@@ -448,6 +576,13 @@ class TelegramAdapter(BasePlatformAdapter):
"is_forum": getattr(chat, "is_forum", False),
}
except Exception as e:
logger.error(
"[%s] Failed to get Telegram chat info for %s: %s",
self.name,
chat_id,
e,
exc_info=True,
)
return {"name": str(chat_id), "type": "dm", "error": str(e)}
def format_message(self, content: str) -> str:
@@ -546,6 +681,41 @@ class TelegramAdapter(BasePlatformAdapter):
event = self._build_message_event(update.message, MessageType.COMMAND)
await self.handle_message(event)
async def _handle_location_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""Handle incoming location/venue pin messages."""
if not update.message:
return
msg = update.message
venue = getattr(msg, "venue", None)
location = getattr(venue, "location", None) if venue else getattr(msg, "location", None)
if not location:
return
lat = getattr(location, "latitude", None)
lon = getattr(location, "longitude", None)
if lat is None or lon is None:
return
# Build a text message with coordinates and context
parts = ["[The user shared a location pin.]"]
if venue:
title = getattr(venue, "title", None)
address = getattr(venue, "address", None)
if title:
parts.append(f"Venue: {title}")
if address:
parts.append(f"Address: {address}")
parts.append(f"latitude: {lat}")
parts.append(f"longitude: {lon}")
parts.append(f"Map: https://www.google.com/maps/search/?api=1&query={lat},{lon}")
parts.append("Ask what they'd like to find nearby (restaurants, cafes, etc.) and any preferences.")
event = self._build_message_event(msg, MessageType.LOCATION)
event.text = "\n".join(parts)
await self.handle_message(event)
async def _handle_media_message(self, update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
"""Handle incoming media messages, downloading images to local cache."""
if not update.message:
@@ -601,9 +771,9 @@ class TelegramAdapter(BasePlatformAdapter):
cached_path = cache_image_from_bytes(bytes(image_bytes), ext=ext)
event.media_urls = [cached_path]
event.media_types = [f"image/{ext.lstrip('.')}"]
print(f"[Telegram] Cached user photo: {cached_path}", flush=True)
logger.info("[Telegram] Cached user photo at %s", cached_path)
except Exception as e:
print(f"[Telegram] Failed to cache photo: {e}", flush=True)
logger.warning("[Telegram] Failed to cache photo: %s", e, exc_info=True)
# Download voice/audio messages to cache for STT transcription
if msg.voice:
@@ -613,9 +783,9 @@ class TelegramAdapter(BasePlatformAdapter):
cached_path = cache_audio_from_bytes(bytes(audio_bytes), ext=".ogg")
event.media_urls = [cached_path]
event.media_types = ["audio/ogg"]
print(f"[Telegram] Cached user voice: {cached_path}", flush=True)
logger.info("[Telegram] Cached user voice at %s", cached_path)
except Exception as e:
print(f"[Telegram] Failed to cache voice: {e}", flush=True)
logger.warning("[Telegram] Failed to cache voice: %s", e, exc_info=True)
elif msg.audio:
try:
file_obj = await msg.audio.get_file()
@@ -623,9 +793,9 @@ class TelegramAdapter(BasePlatformAdapter):
cached_path = cache_audio_from_bytes(bytes(audio_bytes), ext=".mp3")
event.media_urls = [cached_path]
event.media_types = ["audio/mp3"]
print(f"[Telegram] Cached user audio: {cached_path}", flush=True)
logger.info("[Telegram] Cached user audio at %s", cached_path)
except Exception as e:
print(f"[Telegram] Failed to cache audio: {e}", flush=True)
logger.warning("[Telegram] Failed to cache audio: %s", e, exc_info=True)
# Download document files to cache for agent processing
elif msg.document:
@@ -650,7 +820,7 @@ class TelegramAdapter(BasePlatformAdapter):
f"Unsupported document type '{ext or 'unknown'}'. "
f"Supported types: {supported_list}"
)
print(f"[Telegram] Unsupported document type: {ext or 'unknown'}", flush=True)
logger.info("[Telegram] Unsupported document type: %s", ext or "unknown")
await self.handle_message(event)
return
@@ -661,7 +831,7 @@ class TelegramAdapter(BasePlatformAdapter):
"The document is too large or its size could not be verified. "
"Maximum: 20 MB."
)
print(f"[Telegram] Document too large: {doc.file_size} bytes", flush=True)
logger.info("[Telegram] Document too large: %s bytes", doc.file_size)
await self.handle_message(event)
return
@@ -673,7 +843,7 @@ class TelegramAdapter(BasePlatformAdapter):
mime_type = SUPPORTED_DOCUMENT_TYPES[ext]
event.media_urls = [cached_path]
event.media_types = [mime_type]
print(f"[Telegram] Cached user document: {cached_path}", flush=True)
logger.info("[Telegram] Cached user document at %s", cached_path)
# For text files, inject content into event.text (capped at 100 KB)
MAX_TEXT_INJECT_BYTES = 100 * 1024
@@ -688,10 +858,13 @@ class TelegramAdapter(BasePlatformAdapter):
else:
event.text = injection
except UnicodeDecodeError:
print(f"[Telegram] Could not decode text file as UTF-8, skipping content injection", flush=True)
logger.warning(
"[Telegram] Could not decode text file as UTF-8, skipping content injection",
exc_info=True,
)
except Exception as e:
print(f"[Telegram] Failed to cache document: {e}", flush=True)
logger.warning("[Telegram] Failed to cache document: %s", e, exc_info=True)
await self.handle_message(event)
@@ -726,7 +899,7 @@ class TelegramAdapter(BasePlatformAdapter):
event.text = build_sticker_injection(
cached["description"], cached.get("emoji", emoji), cached.get("set_name", set_name)
)
print(f"[Telegram] Sticker cache hit: {sticker.file_unique_id}", flush=True)
logger.info("[Telegram] Sticker cache hit: %s", sticker.file_unique_id)
return
# Cache miss -- download and analyze
@@ -734,7 +907,7 @@ class TelegramAdapter(BasePlatformAdapter):
file_obj = await sticker.get_file()
image_bytes = await file_obj.download_as_bytearray()
cached_path = cache_image_from_bytes(bytes(image_bytes), ext=".webp")
print(f"[Telegram] Analyzing sticker: {cached_path}", flush=True)
logger.info("[Telegram] Analyzing sticker at %s", cached_path)
from tools.vision_tools import vision_analyze_tool
import json as _json
@@ -756,7 +929,7 @@ class TelegramAdapter(BasePlatformAdapter):
emoji, set_name,
)
except Exception as e:
print(f"[Telegram] Sticker analysis error: {e}", flush=True)
logger.warning("[Telegram] Sticker analysis error: %s", e, exc_info=True)
event.text = build_sticker_injection(
f"a sticker with emoji {emoji}" if emoji else "a sticker",
emoji, set_name,

View File

@@ -181,8 +181,8 @@ class WhatsAppAdapter(BasePlatformAdapter):
# Kill any orphaned bridge from a previous gateway run
_kill_port_process(self._bridge_port)
import time
time.sleep(1)
import asyncio
await asyncio.sleep(1)
# Start the bridge process in its own process group.
# Route output to a log file so QR codes, errors, and reconnection
@@ -493,7 +493,7 @@ class WhatsAppAdapter(BasePlatformAdapter):
file_name or os.path.basename(file_path),
)
async def send_typing(self, chat_id: str) -> None:
async def send_typing(self, chat_id: str, metadata=None) -> None:
"""Send typing indicator via bridge."""
if not self._running:
return

View File

@@ -48,7 +48,7 @@ _config_path = _hermes_home / 'config.yaml'
if _config_path.exists():
try:
import yaml as _yaml
with open(_config_path) as _f:
with open(_config_path, encoding="utf-8") as _f:
_cfg = _yaml.safe_load(_f) or {}
# Top-level simple values (fallback only — don't override .env)
for _key, _val in _cfg.items():
@@ -75,11 +75,16 @@ if _config_path.exists():
"container_memory": "TERMINAL_CONTAINER_MEMORY",
"container_disk": "TERMINAL_CONTAINER_DISK",
"container_persistent": "TERMINAL_CONTAINER_PERSISTENT",
"docker_volumes": "TERMINAL_DOCKER_VOLUMES",
"sandbox_dir": "TERMINAL_SANDBOX_DIR",
}
for _cfg_key, _env_var in _terminal_env_map.items():
if _cfg_key in _terminal_cfg:
os.environ[_env_var] = str(_terminal_cfg[_cfg_key])
_val = _terminal_cfg[_cfg_key]
if isinstance(_val, list):
os.environ[_env_var] = json.dumps(_val)
else:
os.environ[_env_var] = str(_val)
_compression_cfg = _cfg.get("compression", {})
if _compression_cfg and isinstance(_compression_cfg, dict):
_compression_env_map = {
@@ -311,7 +316,7 @@ class GatewayRunner:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path) as _f:
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
file_path = cfg.get("prefill_messages_file", "")
except Exception:
@@ -349,7 +354,7 @@ class GatewayRunner:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path) as _f:
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
return (cfg.get("agent", {}).get("system_prompt", "") or "").strip()
except Exception:
@@ -361,7 +366,7 @@ class GatewayRunner:
"""Load reasoning effort from config or env var.
Checks HERMES_REASONING_EFFORT env var first, then agent.reasoning_effort
in config.yaml. Valid: "xhigh", "high", "medium", "low", "minimal", "none".
in config.yaml. Valid: "none", "low", "medium", "high", "xhigh".
Returns None to use default (medium).
"""
effort = os.getenv("HERMES_REASONING_EFFORT", "")
@@ -370,7 +375,7 @@ class GatewayRunner:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path) as _f:
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
effort = str(cfg.get("agent", {}).get("reasoning_effort", "") or "").strip()
except Exception:
@@ -380,12 +385,47 @@ class GatewayRunner:
effort = effort.lower().strip()
if effort == "none":
return {"enabled": False}
valid = ("xhigh", "high", "medium", "low", "minimal")
valid = ("low", "medium", "high", "xhigh")
if effort in valid:
return {"enabled": True, "effort": effort}
logger.warning("Unknown reasoning_effort '%s', using default (medium)", effort)
return None
@staticmethod
def _load_background_notifications_mode() -> str:
"""Load background process notification mode from config or env var.
Modes:
- ``all`` — push running-output updates *and* the final message (default)
- ``result`` — only the final completion message (regardless of exit code)
- ``error`` — only the final message when exit code is non-zero
- ``off`` — no watcher messages at all
"""
mode = os.getenv("HERMES_BACKGROUND_NOTIFICATIONS", "")
if not mode:
try:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
raw = cfg.get("display", {}).get("background_process_notifications")
if raw is False:
mode = "off"
elif raw not in (None, ""):
mode = str(raw)
except Exception:
pass
mode = (mode or "all").strip().lower()
valid = {"all", "result", "error", "off"}
if mode not in valid:
logger.warning(
"Unknown background_process_notifications '%s', defaulting to 'all'",
mode,
)
return "all"
return mode
@staticmethod
def _load_provider_routing() -> dict:
"""Load OpenRouter provider routing preferences from config.yaml."""
@@ -393,7 +433,7 @@ class GatewayRunner:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path) as _f:
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
return cfg.get("provider_routing", {}) or {}
except Exception:
@@ -411,7 +451,7 @@ class GatewayRunner:
import yaml as _y
cfg_path = _hermes_home / "config.yaml"
if cfg_path.exists():
with open(cfg_path) as _f:
with open(cfg_path, encoding="utf-8") as _f:
cfg = _y.safe_load(_f) or {}
fb = cfg.get("fallback_model", {}) or {}
if fb.get("provider") and fb.get("model"):
@@ -766,7 +806,8 @@ class GatewayRunner:
_known_commands = {"new", "reset", "help", "status", "stop", "model",
"personality", "retry", "undo", "sethome", "set-home",
"compress", "usage", "insights", "reload-mcp", "reload_mcp",
"update", "title", "resume", "provider"}
"update", "title", "resume", "provider", "rollback",
"background"}
if command and command in _known_commands:
await self.hooks.emit(f"command:{command}", {
"platform": source.platform.value if source.platform else "",
@@ -825,7 +866,39 @@ class GatewayRunner:
if command == "resume":
return await self._handle_resume_command(event)
if command == "rollback":
return await self._handle_rollback_command(event)
if command == "background":
return await self._handle_background_command(event)
# User-defined quick commands (bypass agent loop, no LLM call)
if command:
quick_commands = self.config.get("quick_commands", {})
if command in quick_commands:
qcmd = quick_commands[command]
if qcmd.get("type") == "exec":
exec_cmd = qcmd.get("command", "")
if exec_cmd:
try:
proc = await asyncio.create_subprocess_shell(
exec_cmd,
stdout=asyncio.subprocess.PIPE,
stderr=asyncio.subprocess.PIPE,
)
stdout, stderr = await asyncio.wait_for(proc.communicate(), timeout=30)
output = (stdout or stderr).decode().strip()
return output if output else "Command returned no output."
except asyncio.TimeoutError:
return "Quick command timed out (30s)."
except Exception as e:
return f"Quick command error: {e}"
else:
return f"Quick command '/{command}' has no command defined."
else:
return f"Quick command '/{command}' has unsupported type (only 'exec' is supported)."
# Skill slash commands: /skill-name loads the skill and sends to agent
if command:
try:
@@ -907,9 +980,12 @@ class GatewayRunner:
# repeated truncation/context failures. Detect this early and
# compress proactively — before the agent even starts. (#628)
#
# Thresholds are derived from the SAME compression config the
# agent uses (compression.threshold × model context length) so
# CLI and messaging platforms behave identically.
# Token source priority:
# 1. Actual API-reported prompt_tokens from the last turn
# (stored in session_entry.last_prompt_tokens)
# 2. Rough char-based estimate (str(msg)//4) with a 1.4x
# safety factor to account for overestimation on tool-heavy
# conversations (code/JSON tokenizes at 5-7+ chars/token).
# -----------------------------------------------------------------
if history and len(history) >= 4:
from agent.model_metadata import (
@@ -926,7 +1002,7 @@ class GatewayRunner:
_hyg_cfg_path = _hermes_home / "config.yaml"
if _hyg_cfg_path.exists():
import yaml as _hyg_yaml
with open(_hyg_cfg_path) as _hyg_f:
with open(_hyg_cfg_path, encoding="utf-8") as _hyg_f:
_hyg_data = _hyg_yaml.safe_load(_hyg_f) or {}
# Resolve model name (same logic as run_sync)
@@ -960,31 +1036,48 @@ class GatewayRunner:
_compress_token_threshold = int(
_hyg_context_length * _hyg_threshold_pct
)
# Warn if still huge after compression (95% of context)
_warn_token_threshold = int(_hyg_context_length * 0.95)
_msg_count = len(history)
_approx_tokens = estimate_messages_tokens_rough(history)
# Prefer actual API-reported tokens from the last turn
# (stored in session entry) over the rough char-based estimate.
# The rough estimate (str(msg)//4) overestimates by 30-50% on
# tool-heavy/code-heavy conversations, causing premature compression.
_stored_tokens = session_entry.last_prompt_tokens
if _stored_tokens > 0:
_approx_tokens = _stored_tokens
_token_source = "actual"
else:
_approx_tokens = estimate_messages_tokens_rough(history)
# Apply safety factor only for rough estimates
_compress_token_threshold = int(
_compress_token_threshold * 1.4
)
_warn_token_threshold = int(_warn_token_threshold * 1.4)
_token_source = "estimated"
_needs_compress = _approx_tokens >= _compress_token_threshold
if _needs_compress:
logger.info(
"Session hygiene: %s messages, ~%s tokens — auto-compressing "
"Session hygiene: %s messages, ~%s tokens (%s) — auto-compressing "
"(threshold: %s%% of %s = %s tokens)",
_msg_count, f"{_approx_tokens:,}",
_msg_count, f"{_approx_tokens:,}", _token_source,
int(_hyg_threshold_pct * 100),
f"{_hyg_context_length:,}",
f"{_compress_token_threshold:,}",
)
_hyg_adapter = self.adapters.get(source.platform)
_hyg_meta = {"thread_id": source.thread_id} if source.thread_id else None
if _hyg_adapter:
try:
await _hyg_adapter.send(
source.chat_id,
f"🗜️ Session is large ({_msg_count} messages, "
f"~{_approx_tokens:,} tokens). Auto-compressing..."
f"~{_approx_tokens:,} tokens). Auto-compressing...",
metadata=_hyg_meta,
)
except Exception:
pass
@@ -1022,6 +1115,8 @@ class GatewayRunner:
self.session_store.rewrite_transcript(
session_entry.session_id, _compressed
)
# Reset stored token count — transcript was rewritten
session_entry.last_prompt_tokens = 0
history = _compressed
_new_count = len(_compressed)
_new_tokens = estimate_messages_tokens_rough(
@@ -1042,7 +1137,8 @@ class GatewayRunner:
f"🗜️ Compressed: {_msg_count}"
f"{_new_count} messages, "
f"~{_approx_tokens:,}"
f"~{_new_tokens:,} tokens"
f"~{_new_tokens:,} tokens",
metadata=_hyg_meta,
)
except Exception:
pass
@@ -1062,7 +1158,8 @@ class GatewayRunner:
"after compression "
f"(~{_new_tokens:,} tokens). "
"Consider using /reset to start "
"fresh if you experience issues."
"fresh if you experience issues.",
metadata=_hyg_meta,
)
except Exception:
pass
@@ -1074,6 +1171,7 @@ class GatewayRunner:
# Compression failed and session is dangerously large
if _approx_tokens >= _warn_token_threshold:
_hyg_adapter = self.adapters.get(source.platform)
_hyg_meta = {"thread_id": source.thread_id} if source.thread_id else None
if _hyg_adapter:
try:
await _hyg_adapter.send(
@@ -1083,7 +1181,8 @@ class GatewayRunner:
f"~{_approx_tokens:,} tokens) and "
"auto-compression failed. Consider "
"using /compress or /reset to avoid "
"issues."
"issues.",
metadata=_hyg_meta,
)
except Exception:
pass
@@ -1279,6 +1378,11 @@ class GatewayRunner:
{"role": "assistant", "content": response, "timestamp": ts}
)
else:
# The agent already persisted these messages to SQLite via
# _flush_messages_to_session_db(), so skip the DB write here
# to prevent the duplicate-write bug (#860). We still write
# to JSONL for backward compatibility and as a backup.
agent_persisted = self._session_db is not None
for msg in new_messages:
# Skip system messages (they're rebuilt each run)
if msg.get("role") == "system":
@@ -1286,11 +1390,15 @@ class GatewayRunner:
# Add timestamp to each message for debugging
entry = {**msg, "timestamp": ts}
self.session_store.append_to_transcript(
session_entry.session_id, entry
session_entry.session_id, entry,
skip_db=agent_persisted,
)
# Update session
self.session_store.update_session(session_entry.session_key)
# Update session with actual prompt token count from the agent
self.session_store.update_session(
session_entry.session_key,
last_prompt_tokens=agent_result.get("last_prompt_tokens", 0),
)
return response
@@ -1395,6 +1503,8 @@ class GatewayRunner:
"`/resume [name]` — Resume a previously-named session",
"`/usage` — Show token usage for this session",
"`/insights [days]` — Show usage insights and analytics",
"`/rollback [number]` — List or restore filesystem checkpoints",
"`/background <prompt>` — Run a prompt in a separate background session",
"`/reload-mcp` — Reload MCP servers from config",
"`/update` — Update Hermes Agent to the latest version",
"`/help` — Show this message",
@@ -1429,7 +1539,7 @@ class GatewayRunner:
current_provider = "openrouter"
try:
if config_path.exists():
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
cfg = yaml.safe_load(f) or {}
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, str):
@@ -1449,6 +1559,11 @@ class GatewayRunner:
except Exception:
current_provider = "openrouter"
# Detect custom endpoint: provider resolved to openrouter but a custom
# base URL is configured — the user set up a custom endpoint.
if current_provider == "openrouter" and os.getenv("OPENAI_BASE_URL", "").strip():
current_provider = "custom"
if not args:
provider_label = _PROVIDER_LABELS.get(current_provider, current_provider)
lines = [
@@ -1515,14 +1630,14 @@ class GatewayRunner:
try:
user_config = {}
if config_path.exists():
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
user_config = yaml.safe_load(f) or {}
if "model" not in user_config or not isinstance(user_config["model"], dict):
user_config["model"] = {}
user_config["model"]["default"] = new_model
if provider_changed:
user_config["model"]["provider"] = target_provider
with open(config_path, 'w') as f:
with open(config_path, 'w', encoding="utf-8") as f:
yaml.dump(user_config, f, default_flow_style=False, sort_keys=False)
except Exception as e:
return f"⚠️ Failed to save model change: {e}"
@@ -1559,7 +1674,7 @@ class GatewayRunner:
config_path = _hermes_home / 'config.yaml'
try:
if config_path.exists():
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
cfg = yaml.safe_load(f) or {}
model_cfg = cfg.get("model", {})
if isinstance(model_cfg, dict):
@@ -1575,6 +1690,10 @@ class GatewayRunner:
except Exception:
current_provider = "openrouter"
# Detect custom endpoint
if current_provider == "openrouter" and os.getenv("OPENAI_BASE_URL", "").strip():
current_provider = "custom"
current_label = _PROVIDER_LABELS.get(current_provider, current_provider)
lines = [
@@ -1604,7 +1723,7 @@ class GatewayRunner:
try:
if config_path.exists():
with open(config_path, 'r') as f:
with open(config_path, 'r', encoding="utf-8") as f:
config = yaml.safe_load(f) or {}
personalities = config.get("agent", {}).get("personalities", {})
else:
@@ -1619,21 +1738,46 @@ class GatewayRunner:
if not args:
lines = ["🎭 **Available Personalities**\n"]
lines.append("• `none` — (no personality overlay)")
for name, prompt in personalities.items():
preview = prompt[:50] + "..." if len(prompt) > 50 else prompt
if isinstance(prompt, dict):
preview = prompt.get("description") or prompt.get("system_prompt", "")[:50]
else:
preview = prompt[:50] + "..." if len(prompt) > 50 else prompt
lines.append(f"• `{name}` — {preview}")
lines.append(f"\nUsage: `/personality <name>`")
return "\n".join(lines)
if args in personalities:
new_prompt = personalities[args]
def _resolve_prompt(value):
if isinstance(value, dict):
parts = [value.get("system_prompt", "")]
if value.get("tone"):
parts.append(f'Tone: {value["tone"]}')
if value.get("style"):
parts.append(f'Style: {value["style"]}')
return "\n".join(p for p in parts if p)
return str(value)
if args in ("none", "default", "neutral"):
try:
if "agent" not in config or not isinstance(config.get("agent"), dict):
config["agent"] = {}
config["agent"]["system_prompt"] = ""
with open(config_path, "w") as f:
yaml.dump(config, f, default_flow_style=False, sort_keys=False)
except Exception as e:
return f"⚠️ Failed to save personality change: {e}"
self._ephemeral_system_prompt = ""
return "🎭 Personality cleared — using base agent behavior.\n_(takes effect on next message)_"
elif args in personalities:
new_prompt = _resolve_prompt(personalities[args])
# Write to config.yaml, same pattern as CLI save_config_value.
try:
if "agent" not in config or not isinstance(config.get("agent"), dict):
config["agent"] = {}
config["agent"]["system_prompt"] = new_prompt
with open(config_path, 'w') as f:
with open(config_path, 'w', encoding="utf-8") as f:
yaml.dump(config, f, default_flow_style=False, sort_keys=False)
except Exception as e:
return f"⚠️ Failed to save personality change: {e}"
@@ -1643,7 +1787,7 @@ class GatewayRunner:
return f"🎭 Personality set to **{args}**\n_(takes effect on next message)_"
available = ", ".join(f"`{n}`" for n in personalities.keys())
available = "`none`, " + ", ".join(f"`{n}`" for n in personalities.keys())
return f"Unknown personality: `{args}`\n\nAvailable: {available}"
async def _handle_retry_command(self, event: MessageEvent) -> str:
@@ -1667,6 +1811,8 @@ class GatewayRunner:
# Truncate history to before the last user message and persist
truncated = history[:last_user_idx]
self.session_store.rewrite_transcript(session_entry.session_id, truncated)
# Reset stored token count — transcript was truncated
session_entry.last_prompt_tokens = 0
# Re-send by creating a fake text event with the old message
retry_event = MessageEvent(
@@ -1698,6 +1844,8 @@ class GatewayRunner:
removed_msg = history[last_user_idx].get("content", "")
removed_count = len(history) - last_user_idx
self.session_store.rewrite_transcript(session_entry.session_id, history[:last_user_idx])
# Reset stored token count — transcript was truncated
session_entry.last_prompt_tokens = 0
preview = removed_msg[:40] + "..." if len(removed_msg) > 40 else removed_msg
return f"↩️ Undid {removed_count} message(s).\nRemoved: \"{preview}\""
@@ -1717,10 +1865,10 @@ class GatewayRunner:
config_path = _hermes_home / 'config.yaml'
user_config = {}
if config_path.exists():
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
user_config = yaml.safe_load(f) or {}
user_config[env_key] = chat_id
with open(config_path, 'w') as f:
with open(config_path, 'w', encoding="utf-8") as f:
yaml.dump(user_config, f, default_flow_style=False)
# Also set in the current environment so it takes effect immediately
os.environ[env_key] = str(chat_id)
@@ -1732,6 +1880,267 @@ class GatewayRunner:
f"Cron jobs and cross-platform messages will be delivered here."
)
async def _handle_rollback_command(self, event: MessageEvent) -> str:
"""Handle /rollback command — list or restore filesystem checkpoints."""
from tools.checkpoint_manager import CheckpointManager, format_checkpoint_list
# Read checkpoint config from config.yaml
cp_cfg = {}
try:
import yaml as _y
_cfg_path = _hermes_home / "config.yaml"
if _cfg_path.exists():
with open(_cfg_path, encoding="utf-8") as _f:
_data = _y.safe_load(_f) or {}
cp_cfg = _data.get("checkpoints", {})
if isinstance(cp_cfg, bool):
cp_cfg = {"enabled": cp_cfg}
except Exception:
pass
if not cp_cfg.get("enabled", False):
return (
"Checkpoints are not enabled.\n"
"Enable in config.yaml:\n```\ncheckpoints:\n enabled: true\n```"
)
mgr = CheckpointManager(
enabled=True,
max_snapshots=cp_cfg.get("max_snapshots", 50),
)
cwd = os.getenv("MESSAGING_CWD", str(Path.home()))
arg = event.get_command_args().strip()
if not arg:
checkpoints = mgr.list_checkpoints(cwd)
return format_checkpoint_list(checkpoints, cwd)
# Restore by number or hash
checkpoints = mgr.list_checkpoints(cwd)
if not checkpoints:
return f"No checkpoints found for {cwd}"
target_hash = None
try:
idx = int(arg) - 1
if 0 <= idx < len(checkpoints):
target_hash = checkpoints[idx]["hash"]
else:
return f"Invalid checkpoint number. Use 1-{len(checkpoints)}."
except ValueError:
target_hash = arg
result = mgr.restore(cwd, target_hash)
if result["success"]:
return (
f"✅ Restored to checkpoint {result['restored_to']}: {result['reason']}\n"
f"A pre-rollback snapshot was saved automatically."
)
return f"{result['error']}"
async def _handle_background_command(self, event: MessageEvent) -> str:
"""Handle /background <prompt> — run a prompt in a separate background session.
Spawns a new AIAgent in a background thread with its own session.
When it completes, sends the result back to the same chat without
modifying the active session's conversation history.
"""
prompt = event.get_command_args().strip()
if not prompt:
return (
"Usage: /background <prompt>\n"
"Example: /background Summarize the top HN stories today\n\n"
"Runs the prompt in a separate session. "
"You can keep chatting — the result will appear here when done."
)
source = event.source
task_id = f"bg_{datetime.now().strftime('%H%M%S')}_{os.urandom(3).hex()}"
# Fire-and-forget the background task
asyncio.create_task(
self._run_background_task(prompt, source, task_id)
)
preview = prompt[:60] + ("..." if len(prompt) > 60 else "")
return f'🔄 Background task started: "{preview}"\nTask ID: {task_id}\nYou can keep chatting — results will appear when done.'
async def _run_background_task(
self, prompt: str, source: "SessionSource", task_id: str
) -> None:
"""Execute a background agent task and deliver the result to the chat."""
from run_agent import AIAgent
adapter = self.adapters.get(source.platform)
if not adapter:
logger.warning("No adapter for platform %s in background task %s", source.platform, task_id)
return
_thread_metadata = {"thread_id": source.thread_id} if source.thread_id else None
try:
runtime_kwargs = _resolve_runtime_agent_kwargs()
if not runtime_kwargs.get("api_key"):
await adapter.send(
source.chat_id,
f"❌ Background task {task_id} failed: no provider credentials configured.",
metadata=_thread_metadata,
)
return
# Read model from config (same as _run_agent)
model = os.getenv("HERMES_MODEL") or os.getenv("LLM_MODEL") or "anthropic/claude-opus-4.6"
try:
import yaml as _y
_cfg_path = _hermes_home / "config.yaml"
if _cfg_path.exists():
with open(_cfg_path, encoding="utf-8") as _f:
_cfg = _y.safe_load(_f) or {}
_model_cfg = _cfg.get("model", {})
if isinstance(_model_cfg, str):
model = _model_cfg
elif isinstance(_model_cfg, dict):
model = _model_cfg.get("default", model)
except Exception:
pass
# Determine toolset (same logic as _run_agent)
default_toolset_map = {
Platform.LOCAL: "hermes-cli",
Platform.TELEGRAM: "hermes-telegram",
Platform.DISCORD: "hermes-discord",
Platform.WHATSAPP: "hermes-whatsapp",
Platform.SLACK: "hermes-slack",
Platform.SIGNAL: "hermes-signal",
Platform.HOMEASSISTANT: "hermes-homeassistant",
}
platform_toolsets_config = {}
try:
config_path = _hermes_home / 'config.yaml'
if config_path.exists():
import yaml
with open(config_path, 'r', encoding="utf-8") as f:
user_config = yaml.safe_load(f) or {}
platform_toolsets_config = user_config.get("platform_toolsets", {})
except Exception:
pass
platform_config_key = {
Platform.LOCAL: "cli",
Platform.TELEGRAM: "telegram",
Platform.DISCORD: "discord",
Platform.WHATSAPP: "whatsapp",
Platform.SLACK: "slack",
Platform.SIGNAL: "signal",
Platform.HOMEASSISTANT: "homeassistant",
}.get(source.platform, "telegram")
config_toolsets = platform_toolsets_config.get(platform_config_key)
if config_toolsets and isinstance(config_toolsets, list):
enabled_toolsets = config_toolsets
else:
default_toolset = default_toolset_map.get(source.platform, "hermes-telegram")
enabled_toolsets = [default_toolset]
platform_key = "cli" if source.platform == Platform.LOCAL else source.platform.value
pr = self._provider_routing
max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
def run_sync():
agent = AIAgent(
model=model,
**runtime_kwargs,
max_iterations=max_iterations,
quiet_mode=True,
verbose_logging=False,
enabled_toolsets=enabled_toolsets,
reasoning_config=self._reasoning_config,
providers_allowed=pr.get("only"),
providers_ignored=pr.get("ignore"),
providers_order=pr.get("order"),
provider_sort=pr.get("sort"),
provider_require_parameters=pr.get("require_parameters", False),
provider_data_collection=pr.get("data_collection"),
session_id=task_id,
platform=platform_key,
session_db=self._session_db,
fallback_model=self._fallback_model,
)
return agent.run_conversation(
user_message=prompt,
task_id=task_id,
)
loop = asyncio.get_event_loop()
result = await loop.run_in_executor(None, run_sync)
response = result.get("final_response", "") if result else ""
if not response and result and result.get("error"):
response = f"Error: {result['error']}"
# Extract media files from the response
if response:
media_files, response = adapter.extract_media(response)
images, text_content = adapter.extract_images(response)
preview = prompt[:60] + ("..." if len(prompt) > 60 else "")
header = f'✅ Background task complete\nPrompt: "{preview}"\n\n'
if text_content:
await adapter.send(
chat_id=source.chat_id,
content=header + text_content,
metadata=_thread_metadata,
)
elif not images and not media_files:
await adapter.send(
chat_id=source.chat_id,
content=header + "(No response generated)",
metadata=_thread_metadata,
)
# Send extracted images
for image_url, alt_text in (images or []):
try:
await adapter.send_image(
chat_id=source.chat_id,
image_url=image_url,
caption=alt_text,
)
except Exception:
pass
# Send media files
for media_path in (media_files or []):
try:
await adapter.send_file(
chat_id=source.chat_id,
file_path=media_path,
)
except Exception:
pass
else:
preview = prompt[:60] + ("..." if len(prompt) > 60 else "")
await adapter.send(
chat_id=source.chat_id,
content=f'✅ Background task complete\nPrompt: "{preview}"\n\n(No response generated)',
metadata=_thread_metadata,
)
except Exception as e:
logger.exception("Background task %s failed", task_id)
try:
await adapter.send(
chat_id=source.chat_id,
content=f"❌ Background task {task_id} failed: {e}",
metadata=_thread_metadata,
)
except Exception:
pass
async def _handle_compress_command(self, event: MessageEvent) -> str:
"""Handle /compress command -- manually compress conversation context."""
source = event.source
@@ -1772,6 +2181,10 @@ class GatewayRunner:
)
self.session_store.rewrite_transcript(session_entry.session_id, compressed)
# Reset stored token count — transcript changed, old value is stale
self.session_store.update_session(
session_entry.session_key, last_prompt_tokens=0,
)
new_count = len(compressed)
new_tokens = estimate_messages_tokens_rough(compressed)
@@ -2293,6 +2706,12 @@ class GatewayRunner:
Runs as an asyncio task. Stays silent when nothing changed.
Auto-removes when the process exits or is killed.
Notification mode (from ``display.background_process_notifications``):
- ``all`` — running-output updates + final message
- ``result`` — final completion message only
- ``error`` — final message only when exit code != 0
- ``off`` — no messages at all
"""
from tools.process_registry import process_registry
@@ -2301,8 +2720,21 @@ class GatewayRunner:
session_key = watcher.get("session_key", "")
platform_name = watcher.get("platform", "")
chat_id = watcher.get("chat_id", "")
notify_mode = self._load_background_notifications_mode()
logger.debug("Process watcher started: %s (every %ss)", session_id, interval)
logger.debug("Process watcher started: %s (every %ss, notify=%s)",
session_id, interval, notify_mode)
if notify_mode == "off":
# Still wait for the process to exit so we can log it, but don't
# push any messages to the user.
while True:
await asyncio.sleep(interval)
session = process_registry.get(session_id)
if session is None or session.exited:
break
logger.debug("Process watcher ended (silent): %s", session_id)
return
last_output_len = 0
while True:
@@ -2317,27 +2749,31 @@ class GatewayRunner:
last_output_len = current_output_len
if session.exited:
# Process finished -- deliver final update
new_output = session.output_buffer[-1000:] if session.output_buffer else ""
message_text = (
f"[Background process {session_id} finished with exit code {session.exit_code}~ "
f"Here's the final output:\n{new_output}]"
# Decide whether to notify based on mode
should_notify = (
notify_mode in ("all", "result")
or (notify_mode == "error" and session.exit_code not in (0, None))
)
# Try to deliver to the originating platform
adapter = None
for p, a in self.adapters.items():
if p.value == platform_name:
adapter = a
break
if adapter and chat_id:
try:
await adapter.send(chat_id, message_text)
except Exception as e:
logger.error("Watcher delivery error: %s", e)
if should_notify:
new_output = session.output_buffer[-1000:] if session.output_buffer else ""
message_text = (
f"[Background process {session_id} finished with exit code {session.exit_code}~ "
f"Here's the final output:\n{new_output}]"
)
adapter = None
for p, a in self.adapters.items():
if p.value == platform_name:
adapter = a
break
if adapter and chat_id:
try:
await adapter.send(chat_id, message_text)
except Exception as e:
logger.error("Watcher delivery error: %s", e)
break
elif has_new_output:
# New output available -- deliver status update
elif has_new_output and notify_mode == "all":
# New output available -- deliver status update (only in "all" mode)
new_output = session.output_buffer[-500:] if session.output_buffer else ""
message_text = (
f"[Background process {session_id} is still running~ "
@@ -2388,6 +2824,8 @@ class GatewayRunner:
Platform.DISCORD: "hermes-discord",
Platform.WHATSAPP: "hermes-whatsapp",
Platform.SLACK: "hermes-slack",
Platform.SIGNAL: "hermes-signal",
Platform.HOMEASSISTANT: "hermes-homeassistant",
}
# Try to load platform_toolsets from config
@@ -2396,7 +2834,7 @@ class GatewayRunner:
config_path = _hermes_home / 'config.yaml'
if config_path.exists():
import yaml
with open(config_path, 'r') as f:
with open(config_path, 'r', encoding="utf-8") as f:
user_config = yaml.safe_load(f) or {}
platform_toolsets_config = user_config.get("platform_toolsets", {})
except Exception as e:
@@ -2409,6 +2847,8 @@ class GatewayRunner:
Platform.DISCORD: "discord",
Platform.WHATSAPP: "whatsapp",
Platform.SLACK: "slack",
Platform.SIGNAL: "signal",
Platform.HOMEASSISTANT: "homeassistant",
}.get(source.platform, "telegram")
# Use config override if present (list of toolsets), otherwise hardcoded default
@@ -2426,7 +2866,7 @@ class GatewayRunner:
_tp_cfg_path = _hermes_home / "config.yaml"
if _tp_cfg_path.exists():
import yaml as _tp_yaml
with open(_tp_cfg_path) as _tp_f:
with open(_tp_cfg_path, encoding="utf-8") as _tp_f:
_tp_data = _tp_yaml.safe_load(_tp_f) or {}
_progress_cfg = _tp_data.get("display", {})
except Exception:
@@ -2517,6 +2957,8 @@ class GatewayRunner:
# Background task to send progress messages
# Accumulates tool lines into a single message that gets edited
_progress_metadata = {"thread_id": source.thread_id} if source.thread_id else None
async def send_progress_messages():
if not progress_queue:
return
@@ -2546,21 +2988,21 @@ class GatewayRunner:
# Platform doesn't support editing — stop trying,
# send just this new line as a separate message
can_edit = False
await adapter.send(chat_id=source.chat_id, content=msg)
await adapter.send(chat_id=source.chat_id, content=msg, metadata=_progress_metadata)
else:
if can_edit:
# First tool: send all accumulated text as new message
full_text = "\n".join(progress_lines)
result = await adapter.send(chat_id=source.chat_id, content=full_text)
result = await adapter.send(chat_id=source.chat_id, content=full_text, metadata=_progress_metadata)
else:
# Editing unsupported: send just this line
result = await adapter.send(chat_id=source.chat_id, content=msg)
result = await adapter.send(chat_id=source.chat_id, content=msg, metadata=_progress_metadata)
if result.success and result.message_id:
progress_msg_id = result.message_id
# Restore typing indicator
await asyncio.sleep(0.3)
await adapter.send_typing(source.chat_id)
await adapter.send_typing(source.chat_id, metadata=_progress_metadata)
except queue.Empty:
await asyncio.sleep(0.3)
@@ -2644,7 +3086,7 @@ class GatewayRunner:
import yaml as _y
_cfg_path = _hermes_home / "config.yaml"
if _cfg_path.exists():
with open(_cfg_path) as _f:
with open(_cfg_path, encoding="utf-8") as _f:
_cfg = _y.safe_load(_f) or {}
_model_cfg = _cfg.get("model", {})
if isinstance(_model_cfg, str):
@@ -2755,6 +3197,13 @@ class GatewayRunner:
# Return final response, or a message if something went wrong
final_response = result.get("final_response")
# Extract last actual prompt token count from the agent's compressor
_last_prompt_toks = 0
_agent = agent_holder[0]
if _agent and hasattr(_agent, "context_compressor"):
_last_prompt_toks = getattr(_agent.context_compressor, "last_prompt_tokens", 0)
if not final_response:
error_msg = f"⚠️ {result['error']}" if result.get("error") else "(No response generated)"
return {
@@ -2763,6 +3212,7 @@ class GatewayRunner:
"api_calls": result.get("api_calls", 0),
"tools": tools_holder[0] or [],
"history_offset": len(agent_history),
"last_prompt_tokens": _last_prompt_toks,
}
# Scan tool results for MEDIA:<path> tags that need to be delivered
@@ -2806,6 +3256,7 @@ class GatewayRunner:
"api_calls": result_holder[0].get("api_calls", 0) if result_holder[0] else 0,
"tools": tools_holder[0] or [],
"history_offset": len(agent_history),
"last_prompt_tokens": _last_prompt_toks,
}
# Start progress message sender if enabled
@@ -3126,7 +3577,7 @@ def main():
config = None
if args.config:
import json
with open(args.config) as f:
with open(args.config, encoding="utf-8") as f:
data = json.load(f)
config = GatewayConfig.from_dict(data)

View File

@@ -241,6 +241,9 @@ class SessionEntry:
output_tokens: int = 0
total_tokens: int = 0
# Last API-reported prompt tokens (for accurate compression pre-check)
last_prompt_tokens: int = 0
# Set when a session was created because the previous one expired;
# consumed once by the message handler to inject a notice into context
was_auto_reset: bool = False
@@ -257,6 +260,7 @@ class SessionEntry:
"input_tokens": self.input_tokens,
"output_tokens": self.output_tokens,
"total_tokens": self.total_tokens,
"last_prompt_tokens": self.last_prompt_tokens,
}
if self.origin:
result["origin"] = self.origin.to_dict()
@@ -272,8 +276,8 @@ class SessionEntry:
if data.get("platform"):
try:
platform = Platform(data["platform"])
except ValueError:
pass
except ValueError as e:
logger.debug("Unknown platform value %r: %s", data["platform"], e)
return cls(
session_key=data["session_key"],
@@ -287,6 +291,7 @@ class SessionEntry:
input_tokens=data.get("input_tokens", 0),
output_tokens=data.get("output_tokens", 0),
total_tokens=data.get("total_tokens", 0),
last_prompt_tokens=data.get("last_prompt_tokens", 0),
)
@@ -301,6 +306,8 @@ def build_session_key(source: SessionSource) -> str:
if platform == "whatsapp" and source.chat_id:
return f"agent:main:{platform}:dm:{source.chat_id}"
return f"agent:main:{platform}:dm"
if source.thread_id:
return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}:{source.thread_id}"
return f"agent:main:{platform}:{source.chat_type}:{source.chat_id}"
@@ -353,12 +360,26 @@ class SessionStore:
def _save(self) -> None:
"""Save sessions index to disk (kept for session key -> ID mapping)."""
import tempfile
self.sessions_dir.mkdir(parents=True, exist_ok=True)
sessions_file = self.sessions_dir / "sessions.json"
data = {key: entry.to_dict() for key, entry in self._entries.items()}
with open(sessions_file, "w", encoding="utf-8") as f:
json.dump(data, f, indent=2)
fd, tmp_path = tempfile.mkstemp(
dir=str(self.sessions_dir), suffix=".tmp", prefix=".sessions_"
)
try:
with os.fdopen(fd, "w", encoding="utf-8") as f:
json.dump(data, f, indent=2)
f.flush()
os.fsync(f.fileno())
os.replace(tmp_path, sessions_file)
except BaseException:
try:
os.unlink(tmp_path)
except OSError as e:
logger.debug("Could not remove temp file %s: %s", tmp_path, e)
raise
def _generate_session_key(self, source: SessionSource) -> str:
"""Generate a session key from a source."""
@@ -536,7 +557,8 @@ class SessionStore:
self,
session_key: str,
input_tokens: int = 0,
output_tokens: int = 0
output_tokens: int = 0,
last_prompt_tokens: int = None,
) -> None:
"""Update a session's metadata after an interaction."""
self._ensure_loaded()
@@ -546,6 +568,8 @@ class SessionStore:
entry.updated_at = datetime.now()
entry.input_tokens += input_tokens
entry.output_tokens += output_tokens
if last_prompt_tokens is not None:
entry.last_prompt_tokens = last_prompt_tokens
entry.total_tokens = entry.input_tokens + entry.output_tokens
self._save()
@@ -663,10 +687,17 @@ class SessionStore:
"""Get the path to a session's legacy transcript file."""
return self.sessions_dir / f"{session_id}.jsonl"
def append_to_transcript(self, session_id: str, message: Dict[str, Any]) -> None:
"""Append a message to a session's transcript (SQLite + legacy JSONL)."""
# Write to SQLite
if self._db:
def append_to_transcript(self, session_id: str, message: Dict[str, Any], skip_db: bool = False) -> None:
"""Append a message to a session's transcript (SQLite + legacy JSONL).
Args:
skip_db: When True, only write to JSONL and skip the SQLite write.
Used when the agent already persisted messages to SQLite
via its own _flush_messages_to_session_db(), preventing
the duplicate-write bug (#860).
"""
# Write to SQLite (unless the agent already handled it)
if self._db and not skip_db:
try:
self._db.append_message(
session_id=session_id,

View File

@@ -23,6 +23,7 @@ import stat
import base64
import hashlib
import subprocess
import threading
import time
import uuid
import webbrowser
@@ -44,6 +45,10 @@ try:
import fcntl
except Exception:
fcntl = None
try:
import msvcrt
except Exception:
msvcrt = None
# =============================================================================
# Constants
@@ -103,6 +108,14 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
auth_type="oauth_external",
inference_base_url=DEFAULT_CODEX_BASE_URL,
),
"nous-api": ProviderConfig(
id="nous-api",
name="Nous Portal (API Key)",
auth_type="api_key",
inference_base_url="https://inference-api.nousresearch.com/v1",
api_key_env_vars=("NOUS_API_KEY",),
base_url_env_var="NOUS_BASE_URL",
),
"zai": ProviderConfig(
id="zai",
name="Z.AI / GLM",
@@ -299,31 +312,64 @@ def _auth_lock_path() -> Path:
return _auth_file_path().with_suffix(".lock")
_auth_lock_holder = threading.local()
@contextmanager
def _auth_store_lock(timeout_seconds: float = AUTH_LOCK_TIMEOUT_SECONDS):
"""Cross-process advisory lock for auth.json reads+writes."""
"""Cross-process advisory lock for auth.json reads+writes. Reentrant."""
# Reentrant: if this thread already holds the lock, just yield.
if getattr(_auth_lock_holder, "depth", 0) > 0:
_auth_lock_holder.depth += 1
try:
yield
finally:
_auth_lock_holder.depth -= 1
return
lock_path = _auth_lock_path()
lock_path.parent.mkdir(parents=True, exist_ok=True)
with lock_path.open("a+") as lock_file:
if fcntl is None:
if fcntl is None and msvcrt is None:
_auth_lock_holder.depth = 1
try:
yield
return
finally:
_auth_lock_holder.depth = 0
return
# On Windows, msvcrt.locking needs the file to have content and the
# file pointer at position 0. Ensure the lock file has at least 1 byte.
if msvcrt and (not lock_path.exists() or lock_path.stat().st_size == 0):
lock_path.write_text(" ", encoding="utf-8")
with lock_path.open("r+" if msvcrt else "a+") as lock_file:
deadline = time.time() + max(1.0, timeout_seconds)
while True:
try:
fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
if fcntl:
fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
else:
lock_file.seek(0)
msvcrt.locking(lock_file.fileno(), msvcrt.LK_NBLCK, 1)
break
except BlockingIOError:
except (BlockingIOError, OSError, PermissionError):
if time.time() >= deadline:
raise TimeoutError("Timed out waiting for auth store lock")
time.sleep(0.05)
_auth_lock_holder.depth = 1
try:
yield
finally:
fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
_auth_lock_holder.depth = 0
if fcntl:
fcntl.flock(lock_file.fileno(), fcntl.LOCK_UN)
elif msvcrt:
try:
lock_file.seek(0)
msvcrt.locking(lock_file.fileno(), msvcrt.LK_UNLCK, 1)
except (OSError, IOError):
pass
def _load_auth_store(auth_file: Optional[Path] = None) -> Dict[str, Any]:
@@ -475,6 +521,7 @@ def resolve_provider(
# Normalize provider aliases
_PROVIDER_ALIASES = {
"nous_api": "nous-api", "nousapi": "nous-api", "nous-portal-api": "nous-api",
"glm": "zai", "z-ai": "zai", "z.ai": "zai", "zhipu": "zai",
"kimi": "kimi-coding", "moonshot": "kimi-coding",
"minimax-china": "minimax-cn", "minimax_cn": "minimax-cn",
@@ -1056,6 +1103,19 @@ def fetch_nous_models(
continue
model_ids.append(mid)
# Sort: prefer opus > pro > haiku/flash > sonnet (sonnet is cheap/fast,
# users who want the best model should see opus first).
def _model_priority(mid: str) -> tuple:
low = mid.lower()
if "opus" in low:
return (0, mid)
if "pro" in low and "sonnet" not in low:
return (1, mid)
if "sonnet" in low:
return (3, mid)
return (2, mid)
model_ids.sort(key=_model_priority)
return list(dict.fromkeys(model_ids))
@@ -1624,11 +1684,11 @@ def _save_model_choice(model_id: str) -> None:
from hermes_cli.config import save_config, load_config, save_env_value
config = load_config()
# Handle both string and dict model formats
# Always use dict format so provider/base_url can be stored alongside
if isinstance(config.get("model"), dict):
config["model"]["default"] = model_id
else:
config["model"] = model_id
config["model"] = {"default": model_id}
save_config(config)
save_env_value("LLM_MODEL", model_id)

View File

@@ -36,6 +36,28 @@ def cprint(text: str):
_pt_print(_PT_ANSI(text))
# =========================================================================
# Skin-aware color helpers
# =========================================================================
def _skin_color(key: str, fallback: str) -> str:
"""Get a color from the active skin, or return fallback."""
try:
from hermes_cli.skin_engine import get_active_skin
return get_active_skin().get_color(key, fallback)
except Exception:
return fallback
def _skin_branding(key: str, fallback: str) -> str:
"""Get a branding string from the active skin, or return fallback."""
try:
from hermes_cli.skin_engine import get_active_skin
return get_active_skin().get_branding(key, fallback)
except Exception:
return fallback
# =========================================================================
# ASCII Art & Branding
# =========================================================================
@@ -217,18 +239,24 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
layout_table.add_column("left", justify="center")
layout_table.add_column("right", justify="left")
# Resolve skin colors once for the entire banner
accent = _skin_color("banner_accent", "#FFBF00")
dim = _skin_color("banner_dim", "#B8860B")
text = _skin_color("banner_text", "#FFF8DC")
session_color = _skin_color("session_border", "#8B8682")
left_lines = ["", HERMES_CADUCEUS, ""]
model_short = model.split("/")[-1] if "/" in model else model
if len(model_short) > 28:
model_short = model_short[:25] + "..."
ctx_str = f" [dim #B8860B]·[/] [dim #B8860B]{_format_context_length(context_length)} context[/]" if context_length else ""
left_lines.append(f"[#FFBF00]{model_short}[/]{ctx_str} [dim #B8860B]·[/] [dim #B8860B]Nous Research[/]")
left_lines.append(f"[dim #B8860B]{cwd}[/]")
ctx_str = f" [dim {dim}]·[/] [dim {dim}]{_format_context_length(context_length)} context[/]" if context_length else ""
left_lines.append(f"[{accent}]{model_short}[/]{ctx_str} [dim {dim}]·[/] [dim {dim}]Nous Research[/]")
left_lines.append(f"[dim {dim}]{cwd}[/]")
if session_id:
left_lines.append(f"[dim #8B8682]Session: {session_id}[/]")
left_lines.append(f"[dim {session_color}]Session: {session_id}[/]")
left_content = "\n".join(left_lines)
right_lines = ["[bold #FFBF00]Available Tools[/]"]
right_lines = [f"[bold {accent}]Available Tools[/]"]
toolsets_dict: Dict[str, list] = {}
for tool in tools:
@@ -256,7 +284,7 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
if name in disabled_tools:
colored_names.append(f"[red]{name}[/]")
else:
colored_names.append(f"[#FFF8DC]{name}[/]")
colored_names.append(f"[{text}]{name}[/]")
tools_str = ", ".join(colored_names)
if len(", ".join(sorted(tool_names))) > 45:
@@ -275,7 +303,7 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
elif name in disabled_tools:
colored_names.append(f"[red]{name}[/]")
else:
colored_names.append(f"[#FFF8DC]{name}[/]")
colored_names.append(f"[{text}]{name}[/]")
tools_str = ", ".join(colored_names)
right_lines.append(f"[dim #B8860B]{toolset}:[/] {tools_str}")
@@ -306,7 +334,7 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
)
right_lines.append("")
right_lines.append("[bold #FFBF00]Available Skills[/]")
right_lines.append(f"[bold {accent}]Available Skills[/]")
skills_by_category = get_available_skills()
total_skills = sum(len(s) for s in skills_by_category.values())
@@ -320,9 +348,9 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
skills_str = ", ".join(skill_names)
if len(skills_str) > 50:
skills_str = skills_str[:47] + "..."
right_lines.append(f"[dim #B8860B]{category}:[/] [#FFF8DC]{skills_str}[/]")
right_lines.append(f"[dim {dim}]{category}:[/] [{text}]{skills_str}[/]")
else:
right_lines.append("[dim #B8860B]No skills installed[/]")
right_lines.append(f"[dim {dim}]No skills installed[/]")
right_lines.append("")
mcp_connected = sum(1 for s in mcp_status if s["connected"]) if mcp_status else 0
@@ -330,7 +358,7 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
if mcp_connected:
summary_parts.append(f"{mcp_connected} MCP servers")
summary_parts.append("/help for commands")
right_lines.append(f"[dim #B8860B]{' · '.join(summary_parts)}[/]")
right_lines.append(f"[dim {dim}]{' · '.join(summary_parts)}[/]")
# Update check — show if behind origin/main
try:
@@ -347,10 +375,13 @@ def build_welcome_banner(console: Console, model: str, cwd: str,
right_content = "\n".join(right_lines)
layout_table.add_row(left_content, right_content)
agent_name = _skin_branding("agent_name", "Hermes Agent")
title_color = _skin_color("banner_title", "#FFD700")
border_color = _skin_color("banner_border", "#CD7F32")
outer_panel = Panel(
layout_table,
title=f"[bold #FFD700]Hermes Agent {VERSION}[/]",
border_style="#CD7F32",
title=f"[bold {title_color}]{agent_name} {VERSION}[/]",
border_style=border_color,
padding=(0, 2),
)

View File

@@ -254,6 +254,7 @@ def _wayland_save(dest: Path) -> bool:
)
if not dest.exists() or dest.stat().st_size == 0:
dest.unlink(missing_ok=True)
return False
# BMP needs conversion to PNG (common in WSLg where only BMP
@@ -292,9 +293,12 @@ def _convert_to_png(path: Path) -> bool:
["convert", str(tmp), "png:" + str(path)],
capture_output=True, timeout=5,
)
tmp.unlink(missing_ok=True)
if r.returncode == 0 and path.exists() and path.stat().st_size > 0:
tmp.unlink(missing_ok=True)
return True
else:
# Convert failed — restore the original file
tmp.rename(path)
except FileNotFoundError:
logger.debug("ImageMagick not installed — cannot convert BMP to PNG")
if tmp.exists() and not path.exists():

View File

@@ -47,7 +47,7 @@ def _fetch_models_from_api(access_token: str) -> List[str]:
if item.get("supported_in_api") is False:
continue
visibility = item.get("visibility", "")
if isinstance(visibility, str) and visibility.strip().lower() == "hide":
if isinstance(visibility, str) and visibility.strip().lower() in ("hide", "hidden"):
continue
priority = item.get("priority")
rank = int(priority) if isinstance(priority, (int, float)) else 10_000
@@ -97,7 +97,7 @@ def _read_cache_models(codex_home: Path) -> List[str]:
if item.get("supported_in_api") is False:
continue
visibility = item.get("visibility")
if isinstance(visibility, str) and visibility.strip().lower() == "hidden":
if isinstance(visibility, str) and visibility.strip().lower() in ("hide", "hidden"):
continue
priority = item.get("priority")
rank = int(priority) if isinstance(priority, (int, float)) else 10_000

View File

@@ -13,35 +13,55 @@ from typing import Any
from prompt_toolkit.completion import Completer, Completion
COMMANDS = {
"/help": "Show this help message",
"/tools": "List available tools",
"/toolsets": "List available toolsets",
"/model": "Show or change the current model",
"/provider": "Show available providers and current provider",
"/prompt": "View/set custom system prompt",
"/personality": "Set a predefined personality",
"/clear": "Clear screen and reset conversation (fresh start)",
"/history": "Show conversation history",
"/new": "Start a new conversation (reset history)",
"/reset": "Reset conversation only (keep screen)",
"/retry": "Retry the last message (resend to agent)",
"/undo": "Remove the last user/assistant exchange",
"/save": "Save the current conversation",
"/config": "Show current configuration",
"/cron": "Manage scheduled tasks (list, add, remove)",
"/skills": "Search, install, inspect, or manage skills from online registries",
"/platforms": "Show gateway/messaging platform status",
"/verbose": "Cycle tool progress display: off → new → all → verbose",
"/compress": "Manually compress conversation context (flush memories + summarize)",
"/title": "Set a title for the current session (usage: /title My Session Name)",
"/usage": "Show token usage for the current session",
"/insights": "Show usage insights and analytics (last 30 days)",
"/paste": "Check clipboard for an image and attach it",
"/reload-mcp": "Reload MCP servers from config.yaml",
"/quit": "Exit the CLI (also: /exit, /q)",
# Commands organized by category for better help display
COMMANDS_BY_CATEGORY = {
"Session": {
"/new": "Start a new conversation (reset history)",
"/reset": "Reset conversation only (keep screen)",
"/clear": "Clear screen and reset conversation (fresh start)",
"/history": "Show conversation history",
"/save": "Save the current conversation",
"/retry": "Retry the last message (resend to agent)",
"/undo": "Remove the last user/assistant exchange",
"/title": "Set a title for the current session (usage: /title My Session Name)",
"/compress": "Manually compress conversation context (flush memories + summarize)",
"/rollback": "List or restore filesystem checkpoints (usage: /rollback [number])",
"/background": "Run a prompt in the background (usage: /background <prompt>)",
},
"Configuration": {
"/config": "Show current configuration",
"/model": "Show or change the current model",
"/provider": "Show available providers and current provider",
"/prompt": "View/set custom system prompt",
"/personality": "Set a predefined personality",
"/reasoning": "Show or change reasoning effort (none|low|medium|high|xhigh)",
"/verbose": "Cycle tool progress display: off → new → all → verbose",
"/skin": "Show or change the display skin/theme",
},
"Tools & Skills": {
"/tools": "List available tools",
"/toolsets": "List available toolsets",
"/skills": "Search, install, inspect, or manage skills from online registries",
"/cron": "Manage scheduled tasks (list, add, remove)",
"/reload-mcp": "Reload MCP servers from config.yaml",
},
"Info": {
"/help": "Show this help message",
"/usage": "Show token usage for the current session",
"/insights": "Show usage insights and analytics (last 30 days)",
"/platforms": "Show gateway/messaging platform status",
"/paste": "Check clipboard for an image and attach it",
},
"Exit": {
"/quit": "Exit the CLI (also: /exit, /q)",
},
}
# Flat dict for backwards compatibility and autocomplete
COMMANDS = {}
for category_commands in COMMANDS_BY_CATEGORY.values():
COMMANDS.update(category_commands)
class SlashCommandCompleter(Completer):
"""Autocomplete for built-in slash commands and optional skill commands."""

View File

@@ -14,8 +14,9 @@ This module provides:
import os
import platform
import sys
import stat
import subprocess
import sys
from pathlib import Path
from typing import Dict, Any, Optional, List, Tuple
@@ -46,13 +47,32 @@ def get_project_root() -> Path:
"""Get the project installation directory."""
return Path(__file__).parent.parent.resolve()
def _secure_dir(path):
"""Set directory to owner-only access (0700). No-op on Windows."""
try:
os.chmod(path, 0o700)
except (OSError, NotImplementedError):
pass
def _secure_file(path):
"""Set file to owner-only read/write (0600). No-op on Windows."""
try:
if os.path.exists(str(path)):
os.chmod(path, 0o600)
except (OSError, NotImplementedError):
pass
def ensure_hermes_home():
"""Ensure ~/.hermes directory structure exists."""
"""Ensure ~/.hermes directory structure exists with secure permissions."""
home = get_hermes_home()
(home / "cron").mkdir(parents=True, exist_ok=True)
(home / "sessions").mkdir(parents=True, exist_ok=True)
(home / "logs").mkdir(parents=True, exist_ok=True)
(home / "memories").mkdir(parents=True, exist_ok=True)
home.mkdir(parents=True, exist_ok=True)
_secure_dir(home)
for subdir in ("cron", "sessions", "logs", "memories"):
d = home / subdir
d.mkdir(parents=True, exist_ok=True)
_secure_dir(d)
# =============================================================================
@@ -62,7 +82,9 @@ def ensure_hermes_home():
DEFAULT_CONFIG = {
"model": "anthropic/claude-opus-4.6",
"toolsets": ["hermes-cli"],
"max_turns": 100,
"agent": {
"max_turns": 90,
},
"terminal": {
"backend": "local",
@@ -77,6 +99,10 @@ DEFAULT_CONFIG = {
"container_memory": 5120, # MB (default 5GB)
"container_disk": 51200, # MB (default 50GB)
"container_persistent": True, # Persist filesystem across sessions
# Docker volume mounts — share host directories with the container.
# Each entry is "host_path:container_path" (standard Docker -v syntax).
# Example: ["/home/user/projects:/workspace/projects", "/data:/data"]
"docker_volumes": [],
},
"browser": {
@@ -84,6 +110,14 @@ DEFAULT_CONFIG = {
"record_sessions": False, # Auto-record browser sessions as WebM videos
},
# Filesystem checkpoints — automatic snapshots before destructive file ops.
# When enabled, the agent takes a snapshot of the working directory once per
# conversation turn (on first write_file/patch call). Use /rollback to restore.
"checkpoints": {
"enabled": False,
"max_snapshots": 50, # Max checkpoints to keep per directory
},
"compression": {
"enabled": True,
"threshold": 0.85,
@@ -107,8 +141,9 @@ DEFAULT_CONFIG = {
"display": {
"compact": False,
"personality": "kawaii",
"resume_display": "full", # "full" (show previous messages) | "minimal" (one-liner only)
"bell_on_complete": False, # Play terminal bell (\a) when agent finishes a response
"resume_display": "full",
"bell_on_complete": False,
"skin": "default",
},
# Text-to-speech configuration
@@ -164,9 +199,15 @@ DEFAULT_CONFIG = {
# Permanently allowed dangerous command patterns (added via "always" approval)
"command_allowlist": [],
# User-defined quick commands that bypass the agent loop (type: exec only)
"quick_commands": {},
# Custom personalities — add your own entries here
# Supports string format: {"name": "system prompt"}
# Or dict format: {"name": {"description": "...", "system_prompt": "...", "tone": "...", "style": "..."}}
"personalities": {},
# Config schema version - bump this when adding new required fields
"_config_version": 5,
"_config_version": 6,
}
# =============================================================================
@@ -191,6 +232,22 @@ REQUIRED_ENV_VARS = {}
# Optional environment variables that enhance functionality
OPTIONAL_ENV_VARS = {
# ── Provider (handled in provider selection, not shown in checklists) ──
"NOUS_API_KEY": {
"description": "Nous Portal API key (direct API key access to Nous inference)",
"prompt": "Nous Portal API key",
"url": "https://portal.nousresearch.com",
"password": True,
"category": "provider",
"advanced": True,
},
"NOUS_BASE_URL": {
"description": "Nous Portal base URL override",
"prompt": "Nous Portal base URL (leave empty for default)",
"url": None,
"password": False,
"category": "provider",
"advanced": True,
},
"OPENROUTER_API_KEY": {
"description": "OpenRouter API key (for vision, web scraping helpers, and MoA)",
"prompt": "OpenRouter API key",
@@ -401,14 +458,18 @@ OPTIONAL_ENV_VARS = {
"category": "messaging",
},
"SLACK_BOT_TOKEN": {
"description": "Slack bot integration",
"description": "Slack bot token (xoxb-). Get from OAuth & Permissions after installing your app. "
"Required scopes: chat:write, app_mentions:read, channels:history, groups:history, "
"im:history, im:read, im:write, users:read, files:write",
"prompt": "Slack Bot Token (xoxb-...)",
"url": "https://api.slack.com/apps",
"password": True,
"category": "messaging",
},
"SLACK_APP_TOKEN": {
"description": "Slack Socket Mode connection",
"description": "Slack app-level token (xapp-) for Socket Mode. Get from Basic Information → "
"App-Level Tokens. Also ensure Event Subscriptions include: message.im, "
"message.channels, message.groups, app_mention",
"prompt": "Slack App Token (xapp-...)",
"url": "https://api.slack.com/apps",
"password": True,
@@ -740,6 +801,23 @@ def _deep_merge(base: dict, override: dict) -> dict:
return result
def _normalize_max_turns_config(config: Dict[str, Any]) -> Dict[str, Any]:
"""Normalize legacy root-level max_turns into agent.max_turns."""
config = dict(config)
agent_config = dict(config.get("agent") or {})
if "max_turns" in config and "max_turns" not in agent_config:
agent_config["max_turns"] = config["max_turns"]
if "max_turns" not in agent_config:
agent_config["max_turns"] = DEFAULT_CONFIG["agent"]["max_turns"]
config["agent"] = agent_config
config.pop("max_turns", None)
return config
def load_config() -> Dict[str, Any]:
"""Load configuration from ~/.hermes/config.yaml."""
import copy
@@ -749,14 +827,21 @@ def load_config() -> Dict[str, Any]:
if config_path.exists():
try:
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
user_config = yaml.safe_load(f) or {}
if "max_turns" in user_config:
agent_user_config = dict(user_config.get("agent") or {})
if agent_user_config.get("max_turns") is None:
agent_user_config["max_turns"] = user_config["max_turns"]
user_config["agent"] = agent_user_config
user_config.pop("max_turns", None)
config = _deep_merge(config, user_config)
except Exception as e:
print(f"Warning: Failed to load config: {e}")
return config
return _normalize_max_turns_config(config)
_COMMENTED_SECTIONS = """
@@ -791,23 +876,28 @@ _COMMENTED_SECTIONS = """
def save_config(config: Dict[str, Any]):
"""Save configuration to ~/.hermes/config.yaml."""
from utils import atomic_yaml_write
ensure_hermes_home()
config_path = get_config_path()
with open(config_path, 'w') as f:
yaml.dump(config, f, default_flow_style=False, sort_keys=False)
# Append commented-out sections for features that are off by default
# or only relevant when explicitly configured. Skip sections the
# user has already uncommented and configured.
sections = []
sec = config.get("security", {})
if not sec or sec.get("redact_secrets") is None:
sections.append("security")
fb = config.get("fallback_model", {})
if not fb or not (fb.get("provider") and fb.get("model")):
sections.append("fallback")
if sections:
f.write(_COMMENTED_SECTIONS)
normalized = _normalize_max_turns_config(config)
# Build optional commented-out sections for features that are off by
# default or only relevant when explicitly configured.
sections = []
sec = normalized.get("security", {})
if not sec or sec.get("redact_secrets") is None:
sections.append("security")
fb = normalized.get("fallback_model", {})
if not fb or not (fb.get("provider") and fb.get("model")):
sections.append("fallback")
atomic_yaml_write(
config_path,
normalized,
extra_content=_COMMENTED_SECTIONS if sections else None,
)
_secure_file(config_path)
def load_env() -> Dict[str, str]:
@@ -860,6 +950,14 @@ def save_env_value(key: str, value: str):
with open(env_path, 'w', **write_kw) as f:
f.writelines(lines)
_secure_file(env_path)
# Restrict .env permissions to owner-only (contains API keys)
if not _IS_WINDOWS:
try:
os.chmod(env_path, stat.S_IRUSR | stat.S_IWUSR)
except OSError:
pass
def get_env_value(key: str) -> Optional[str]:
@@ -924,7 +1022,7 @@ def show_config():
print()
print(color("◆ Model", Colors.CYAN, Colors.BOLD))
print(f" Model: {config.get('model', 'not set')}")
print(f" Max turns: {config.get('max_turns', 100)}")
print(f" Max turns: {config.get('agent', {}).get('max_turns', DEFAULT_CONFIG['agent']['max_turns'])}")
print(f" Toolsets: {', '.join(config.get('toolsets', ['all']))}")
# Terminal
@@ -1069,7 +1167,7 @@ def set_config_value(key: str, value: str):
user_config = {}
if config_path.exists():
try:
with open(config_path) as f:
with open(config_path, encoding="utf-8") as f:
user_config = yaml.safe_load(f) or {}
except Exception:
user_config = {}
@@ -1097,7 +1195,7 @@ def set_config_value(key: str, value: str):
# Write only user config back (not the full merged defaults)
ensure_hermes_home()
with open(config_path, 'w') as f:
with open(config_path, 'w', encoding="utf-8") as f:
yaml.dump(user_config, f, default_flow_style=False, sort_keys=False)
# Keep .env in sync for keys that terminal_tool reads directly from env vars.

140
hermes_cli/curses_ui.py Normal file
View File

@@ -0,0 +1,140 @@
"""Shared curses-based UI components for Hermes CLI.
Used by `hermes tools` and `hermes skills` for interactive checklists.
Provides a curses multi-select with keyboard navigation, plus a
text-based numbered fallback for terminals without curses support.
"""
from typing import List, Set
from hermes_cli.colors import Colors, color
def curses_checklist(
title: str,
items: List[str],
selected: Set[int],
*,
cancel_returns: Set[int] | None = None,
) -> Set[int]:
"""Curses multi-select checklist. Returns set of selected indices.
Args:
title: Header line displayed above the checklist.
items: Display labels for each row.
selected: Indices that start checked (pre-selected).
cancel_returns: Returned on ESC/q. Defaults to the original *selected*.
"""
if cancel_returns is None:
cancel_returns = set(selected)
try:
import curses
chosen = set(selected)
result_holder: list = [None]
def _draw(stdscr):
curses.curs_set(0)
if curses.has_colors():
curses.start_color()
curses.use_default_colors()
curses.init_pair(1, curses.COLOR_GREEN, -1)
curses.init_pair(2, curses.COLOR_YELLOW, -1)
curses.init_pair(3, 8, -1) # dim gray
cursor = 0
scroll_offset = 0
while True:
stdscr.clear()
max_y, max_x = stdscr.getmaxyx()
# Header
try:
hattr = curses.A_BOLD
if curses.has_colors():
hattr |= curses.color_pair(2)
stdscr.addnstr(0, 0, title, max_x - 1, hattr)
stdscr.addnstr(
1, 0,
" ↑↓ navigate SPACE toggle ENTER confirm ESC cancel",
max_x - 1, curses.A_DIM,
)
except curses.error:
pass
# Scrollable item list
visible_rows = max_y - 3
if cursor < scroll_offset:
scroll_offset = cursor
elif cursor >= scroll_offset + visible_rows:
scroll_offset = cursor - visible_rows + 1
for draw_i, i in enumerate(
range(scroll_offset, min(len(items), scroll_offset + visible_rows))
):
y = draw_i + 3
if y >= max_y - 1:
break
check = "" if i in chosen else " "
arrow = "" if i == cursor else " "
line = f" {arrow} [{check}] {items[i]}"
attr = curses.A_NORMAL
if i == cursor:
attr = curses.A_BOLD
if curses.has_colors():
attr |= curses.color_pair(1)
try:
stdscr.addnstr(y, 0, line, max_x - 1, attr)
except curses.error:
pass
stdscr.refresh()
key = stdscr.getch()
if key in (curses.KEY_UP, ord("k")):
cursor = (cursor - 1) % len(items)
elif key in (curses.KEY_DOWN, ord("j")):
cursor = (cursor + 1) % len(items)
elif key == ord(" "):
chosen.symmetric_difference_update({cursor})
elif key in (curses.KEY_ENTER, 10, 13):
result_holder[0] = set(chosen)
return
elif key in (27, ord("q")):
result_holder[0] = cancel_returns
return
curses.wrapper(_draw)
return result_holder[0] if result_holder[0] is not None else cancel_returns
except Exception:
return _numbered_fallback(title, items, selected, cancel_returns)
def _numbered_fallback(
title: str,
items: List[str],
selected: Set[int],
cancel_returns: Set[int],
) -> Set[int]:
"""Text-based toggle fallback for terminals without curses."""
chosen = set(selected)
print(color(f"\n {title}", Colors.YELLOW))
print(color(" Toggle by number, Enter to confirm.\n", Colors.DIM))
while True:
for i, label in enumerate(items):
marker = color("[✓]", Colors.GREEN) if i in chosen else "[ ]"
print(f" {marker} {i + 1:>2}. {label}")
print()
try:
val = input(color(" Toggle # (or Enter to confirm): ", Colors.DIM)).strip()
if not val:
break
idx = int(val) - 1
if 0 <= idx < len(items):
chosen.symmetric_difference_update({idx})
except (ValueError, KeyboardInterrupt, EOFError):
return cancel_returns
print()
return chosen

View File

@@ -482,14 +482,19 @@ _PLATFORMS = [
"token_var": "SLACK_BOT_TOKEN",
"setup_instructions": [
"1. Go to https://api.slack.com/apps → Create New App → From Scratch",
"2. Enable Socket Mode: App Settings → Socket Mode → Enable",
"3. Get Bot Token: OAuth & Permissions → Install to Workspace → copy xoxb-... token",
"4. Get App Token: Basic Information → App-Level Tokens → Generate",
" Name it anything, add scope: connections:write → copy xapp-... token",
"5. Add bot scopes: OAuth & Permissions → Scopes → chat:write, im:history,",
" im:read, im:write, channels:history, channels:read",
"6. Reinstall the app to your workspace after adding scopes",
"2. Enable Socket Mode: Settings → Socket Mode → Enable",
" Create an App-Level Token with scope: connections:write → copy xapp-... token",
"3. Add Bot Token Scopes: Features → OAuth & Permissions → Scopes",
" Required: chat:write, app_mentions:read, channels:history, channels:read,",
" groups:history, im:history, im:read, im:write, users:read, files:write",
"4. Subscribe to Events: Features → Event Subscriptions → Enable",
" Required events: message.im, message.channels, app_mention",
" Optional: message.groups (for private channels)",
" ⚠ Without message.channels the bot will ONLY work in DMs!",
"5. Install to Workspace: Settings → Install App → copy xoxb-... token",
"6. Reinstall the app after any scope or event changes",
"7. Find your user ID: click your profile → three dots → Copy member ID",
"8. Invite the bot to channels: /invite @YourBot",
],
"vars": [
{"name": "SLACK_BOT_TOKEN", "prompt": "Bot Token (xoxb-...)", "password": True,

View File

@@ -477,6 +477,10 @@ def cmd_chat(args):
except Exception:
pass
# --yolo: bypass all dangerous command approvals
if getattr(args, "yolo", False):
os.environ["HERMES_YOLO_MODE"] = "1"
# Import and run the CLI
from cli import main as cli_main
@@ -486,9 +490,11 @@ def cmd_chat(args):
"provider": getattr(args, "provider", None),
"toolsets": args.toolsets,
"verbose": args.verbose,
"quiet": getattr(args, "quiet", False),
"query": args.query,
"resume": getattr(args, "resume", None),
"worktree": getattr(args, "worktree", False),
"checkpoints": getattr(args, "checkpoints", False),
}
# Filter out None values
kwargs = {k: v for k, v in kwargs.items() if v is not None}
@@ -761,9 +767,39 @@ def cmd_model(args):
("kimi-coding", "Kimi / Moonshot (Moonshot AI direct API)"),
("minimax", "MiniMax (global direct API)"),
("minimax-cn", "MiniMax China (domestic direct API)"),
("custom", "Custom endpoint (self-hosted / VLLM / etc.)"),
]
# Add user-defined custom providers from config.yaml
custom_providers_cfg = config.get("custom_providers") or []
_custom_provider_map = {} # key → {name, base_url, api_key}
if isinstance(custom_providers_cfg, list):
for entry in custom_providers_cfg:
if not isinstance(entry, dict):
continue
name = entry.get("name", "").strip()
base_url = entry.get("base_url", "").strip()
if not name or not base_url:
continue
# Generate a stable key from the name
key = "custom:" + name.lower().replace(" ", "-")
short_url = base_url.replace("https://", "").replace("http://", "").rstrip("/")
saved_model = entry.get("model", "")
model_hint = f"{saved_model}" if saved_model else ""
providers.append((key, f"{name} ({short_url}){model_hint}"))
_custom_provider_map[key] = {
"name": name,
"base_url": base_url,
"api_key": entry.get("api_key", ""),
"model": saved_model,
}
# Always add the manual custom endpoint option last
providers.append(("custom", "Custom endpoint (enter URL manually)"))
# Add removal option if there are saved custom providers
if _custom_provider_map:
providers.append(("remove-custom", "Remove a saved custom provider"))
# Reorder so the active provider is at the top
known_keys = {k for k, _ in providers}
active_key = active if active in known_keys else "custom"
@@ -791,6 +827,10 @@ def cmd_model(args):
_model_flow_openai_codex(config, current_model)
elif selected_provider == "custom":
_model_flow_custom(config)
elif selected_provider.startswith("custom:") and selected_provider in _custom_provider_map:
_model_flow_named_custom(config, _custom_provider_map[selected_provider])
elif selected_provider == "remove-custom":
_remove_custom_provider(config)
elif selected_provider in ("zai", "kimi-coding", "minimax", "minimax-cn"):
_model_flow_api_key_provider(config, selected_provider, current_model)
@@ -871,9 +911,11 @@ def _model_flow_openrouter(config, current_model=""):
from hermes_cli.config import load_config, save_config
cfg = load_config()
model = cfg.get("model")
if isinstance(model, dict):
model["provider"] = "openrouter"
model["base_url"] = OPENROUTER_BASE_URL
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "openrouter"
model["base_url"] = OPENROUTER_BASE_URL
save_config(cfg)
deactivate_provider()
print(f"Default model set to: {selected} (via OpenRouter)")
@@ -1006,7 +1048,11 @@ def _model_flow_openai_codex(config, current_model=""):
def _model_flow_custom(config):
"""Custom endpoint: collect URL, API key, and model name."""
"""Custom endpoint: collect URL, API key, and model name.
Automatically saves the endpoint to ``custom_providers`` in config.yaml
so it appears in the provider menu on subsequent runs.
"""
from hermes_cli.auth import _save_model_choice, deactivate_provider
from hermes_cli.config import get_env_value, save_env_value, load_config, save_config
@@ -1038,6 +1084,8 @@ def _model_flow_custom(config):
print(f"Invalid URL: {effective_url} (must start with http:// or https://)")
return
effective_key = api_key or current_key
if base_url:
save_env_value("OPENAI_BASE_URL", base_url)
if api_key:
@@ -1049,9 +1097,11 @@ def _model_flow_custom(config):
# Update config and deactivate any OAuth provider
cfg = load_config()
model = cfg.get("model")
if isinstance(model, dict):
model["provider"] = "auto"
model["base_url"] = effective_url
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "custom"
model["base_url"] = effective_url
save_config(cfg)
deactivate_provider()
@@ -1061,6 +1111,227 @@ def _model_flow_custom(config):
deactivate_provider()
print("Endpoint saved. Use `/model` in chat or `hermes model` to set a model.")
# Auto-save to custom_providers so it appears in the menu next time
_save_custom_provider(effective_url, effective_key, model_name or "")
def _save_custom_provider(base_url, api_key="", model=""):
"""Save a custom endpoint to custom_providers in config.yaml.
Deduplicates by base_url — if the URL already exists, updates the
model name but doesn't add a duplicate entry.
Auto-generates a display name from the URL hostname.
"""
from hermes_cli.config import load_config, save_config
cfg = load_config()
providers = cfg.get("custom_providers") or []
if not isinstance(providers, list):
providers = []
# Check if this URL is already saved — update model if so
for entry in providers:
if isinstance(entry, dict) and entry.get("base_url", "").rstrip("/") == base_url.rstrip("/"):
if model and entry.get("model") != model:
entry["model"] = model
cfg["custom_providers"] = providers
save_config(cfg)
return # already saved, updated model if needed
# Auto-generate a name from the URL
import re
clean = base_url.replace("https://", "").replace("http://", "").rstrip("/")
# Remove /v1 suffix for cleaner names
clean = re.sub(r"/v1/?$", "", clean)
# Use hostname:port as the name
name = clean.split("/")[0]
# Capitalize for readability
if "localhost" in name or "127.0.0.1" in name:
name = f"Local ({name})"
elif "runpod" in name.lower():
name = f"RunPod ({name})"
else:
name = name.capitalize()
entry = {"name": name, "base_url": base_url}
if api_key:
entry["api_key"] = api_key
if model:
entry["model"] = model
providers.append(entry)
cfg["custom_providers"] = providers
save_config(cfg)
print(f" 💾 Saved to custom providers as \"{name}\" (edit in config.yaml)")
def _remove_custom_provider(config):
"""Let the user remove a saved custom provider from config.yaml."""
from hermes_cli.config import load_config, save_config
cfg = load_config()
providers = cfg.get("custom_providers") or []
if not isinstance(providers, list) or not providers:
print("No custom providers configured.")
return
print("Remove a custom provider:\n")
choices = []
for entry in providers:
if isinstance(entry, dict):
name = entry.get("name", "unnamed")
url = entry.get("base_url", "")
short_url = url.replace("https://", "").replace("http://", "").rstrip("/")
choices.append(f"{name} ({short_url})")
else:
choices.append(str(entry))
choices.append("Cancel")
try:
from simple_term_menu import TerminalMenu
menu = TerminalMenu(
[f" {c}" for c in choices], cursor_index=0,
menu_cursor="-> ", menu_cursor_style=("fg_red", "bold"),
menu_highlight_style=("fg_red",),
cycle_cursor=True, clear_screen=False,
title="Select provider to remove:",
)
idx = menu.show()
print()
except (ImportError, NotImplementedError):
for i, c in enumerate(choices, 1):
print(f" {i}. {c}")
print()
try:
val = input(f"Choice [1-{len(choices)}]: ").strip()
idx = int(val) - 1 if val else None
except (ValueError, KeyboardInterrupt, EOFError):
idx = None
if idx is None or idx >= len(providers):
print("No change.")
return
removed = providers.pop(idx)
cfg["custom_providers"] = providers
save_config(cfg)
removed_name = removed.get("name", "unnamed") if isinstance(removed, dict) else str(removed)
print(f"✅ Removed \"{removed_name}\" from custom providers.")
def _model_flow_named_custom(config, provider_info):
"""Handle a named custom provider from config.yaml custom_providers list.
If the entry has a saved model name, activates it immediately.
Otherwise probes the endpoint's /models API to let the user pick one.
"""
from hermes_cli.auth import _save_model_choice, deactivate_provider
from hermes_cli.config import save_env_value, load_config, save_config
from hermes_cli.models import fetch_api_models
name = provider_info["name"]
base_url = provider_info["base_url"]
api_key = provider_info.get("api_key", "")
saved_model = provider_info.get("model", "")
# If a model is saved, just activate immediately — no probing needed
if saved_model:
save_env_value("OPENAI_BASE_URL", base_url)
if api_key:
save_env_value("OPENAI_API_KEY", api_key)
_save_model_choice(saved_model)
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "custom"
model["base_url"] = base_url
save_config(cfg)
deactivate_provider()
print(f"✅ Switched to: {saved_model}")
print(f" Provider: {name} ({base_url})")
return
# No saved model — probe endpoint and let user pick
print(f" Provider: {name}")
print(f" URL: {base_url}")
print()
print("No model saved for this provider. Fetching available models...")
models = fetch_api_models(api_key, base_url, timeout=8.0)
if models:
print(f"Found {len(models)} model(s):\n")
try:
from simple_term_menu import TerminalMenu
menu_items = [f" {m}" for m in models] + [" Cancel"]
menu = TerminalMenu(
menu_items, cursor_index=0,
menu_cursor="-> ", menu_cursor_style=("fg_green", "bold"),
menu_highlight_style=("fg_green",),
cycle_cursor=True, clear_screen=False,
title=f"Select model from {name}:",
)
idx = menu.show()
print()
if idx is None or idx >= len(models):
print("Cancelled.")
return
model_name = models[idx]
except (ImportError, NotImplementedError):
for i, m in enumerate(models, 1):
print(f" {i}. {m}")
print(f" {len(models) + 1}. Cancel")
print()
try:
val = input(f"Choice [1-{len(models) + 1}]: ").strip()
if not val:
print("Cancelled.")
return
idx = int(val) - 1
if idx < 0 or idx >= len(models):
print("Cancelled.")
return
model_name = models[idx]
except (ValueError, KeyboardInterrupt, EOFError):
print("\nCancelled.")
return
else:
print("Could not fetch models from endpoint. Enter model name manually.")
try:
model_name = input("Model name: ").strip()
except (KeyboardInterrupt, EOFError):
print("\nCancelled.")
return
if not model_name:
print("No model specified. Cancelled.")
return
# Activate and save the model to the custom_providers entry
save_env_value("OPENAI_BASE_URL", base_url)
if api_key:
save_env_value("OPENAI_API_KEY", api_key)
_save_model_choice(model_name)
cfg = load_config()
model = cfg.get("model")
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = "custom"
model["base_url"] = base_url
save_config(cfg)
deactivate_provider()
# Save model name to the custom_providers entry for next time
_save_custom_provider(base_url, api_key, model_name)
print(f"\n✅ Model set to: {model_name}")
print(f" Provider: {name} ({base_url})")
# Curated model lists for direct API-key providers
_PROVIDER_MODELS = {
@@ -1162,9 +1433,11 @@ def _model_flow_api_key_provider(config, provider_id, current_model=""):
# Update config with provider and base URL
cfg = load_config()
model = cfg.get("model")
if isinstance(model, dict):
model["provider"] = provider_id
model["base_url"] = effective_base
if not isinstance(model, dict):
model = {"default": model} if model else {}
cfg["model"] = model
model["provider"] = provider_id
model["base_url"] = effective_base
save_config(cfg)
deactivate_provider()
@@ -1520,6 +1793,44 @@ def cmd_update(args):
sys.exit(1)
def _coalesce_session_name_args(argv: list) -> list:
"""Join unquoted multi-word session names after -c/--continue and -r/--resume.
When a user types ``hermes -c Pokemon Agent Dev`` without quoting the
session name, argparse sees three separate tokens. This function merges
them into a single argument so argparse receives
``['-c', 'Pokemon Agent Dev']`` instead.
Tokens are collected after the flag until we hit another flag (``-*``)
or a known top-level subcommand.
"""
_SUBCOMMANDS = {
"chat", "model", "gateway", "setup", "whatsapp", "login", "logout",
"status", "cron", "doctor", "config", "pairing", "skills", "tools",
"sessions", "insights", "version", "update", "uninstall",
}
_SESSION_FLAGS = {"-c", "--continue", "-r", "--resume"}
result = []
i = 0
while i < len(argv):
token = argv[i]
if token in _SESSION_FLAGS:
result.append(token)
i += 1
# Collect subsequent non-flag, non-subcommand tokens as one name
parts: list = []
while i < len(argv) and not argv[i].startswith("-") and argv[i] not in _SUBCOMMANDS:
parts.append(argv[i])
i += 1
if parts:
result.append(" ".join(parts))
else:
result.append(token)
i += 1
return result
def main():
"""Main entry point for hermes CLI."""
parser = argparse.ArgumentParser(
@@ -1578,6 +1889,12 @@ For more help on a command:
default=False,
help="Run in an isolated git worktree (for parallel agents)"
)
parser.add_argument(
"--yolo",
action="store_true",
default=False,
help="Bypass all dangerous command approval prompts (use at your own risk)"
)
subparsers = parser.add_subparsers(dest="command", help="Command to run")
@@ -1612,6 +1929,11 @@ For more help on a command:
action="store_true",
help="Verbose output"
)
chat_parser.add_argument(
"-Q", "--quiet",
action="store_true",
help="Quiet mode for programmatic use: suppress banner, spinner, and tool previews. Only output the final response and session info."
)
chat_parser.add_argument(
"--resume", "-r",
metavar="SESSION_ID",
@@ -1632,6 +1954,18 @@ For more help on a command:
default=False,
help="Run in an isolated git worktree (for parallel agents on the same repo)"
)
chat_parser.add_argument(
"--checkpoints",
action="store_true",
default=False,
help="Enable filesystem checkpoints before destructive file operations (use /rollback to restore)"
)
chat_parser.add_argument(
"--yolo",
action="store_true",
default=False,
help="Bypass all dangerous command approval prompts (use at your own risk)"
)
chat_parser.set_defaults(func=cmd_chat)
# =========================================================================
@@ -1918,8 +2252,8 @@ For more help on a command:
# =========================================================================
skills_parser = subparsers.add_parser(
"skills",
help="Skills Hub — search, install, and manage skills from online registries",
description="Search, install, inspect, audit, and manage skills from GitHub, ClawHub, and other registries."
help="Search, install, configure, and manage skills",
description="Search, install, inspect, audit, configure, and manage skills from GitHub, ClawHub, and other registries."
)
skills_subparsers = skills_parser.add_subparsers(dest="skills_action")
@@ -1973,9 +2307,17 @@ For more help on a command:
tap_rm = tap_subparsers.add_parser("remove", help="Remove a tap")
tap_rm.add_argument("name", help="Tap name to remove")
# config sub-action: interactive enable/disable
skills_subparsers.add_parser("config", help="Interactive skill configuration — enable/disable individual skills")
def cmd_skills(args):
from hermes_cli.skills_hub import skills_command
skills_command(args)
# Route 'config' action to skills_config module
if getattr(args, 'skills_action', None) == 'config':
from hermes_cli.skills_config import skills_command as skills_config_command
skills_config_command(args)
else:
from hermes_cli.skills_hub import skills_command
skills_command(args)
skills_parser.set_defaults(func=cmd_skills)
@@ -1987,13 +2329,17 @@ For more help on a command:
help="Configure which tools are enabled per platform",
description="Interactive tool configuration — enable/disable tools for CLI, Telegram, Discord, etc."
)
tools_parser.add_argument(
"--summary",
action="store_true",
help="Print a summary of enabled tools per platform and exit"
)
def cmd_tools(args):
from hermes_cli.tools_config import tools_command
tools_command(args)
tools_parser.set_defaults(func=cmd_tools)
# =========================================================================
# sessions command
# =========================================================================
@@ -2099,12 +2445,12 @@ For more help on a command:
if not data:
print(f"Session '{args.session_id}' not found.")
return
with open(args.output, "w") as f:
with open(args.output, "w", encoding="utf-8") as f:
f.write(_json.dumps(data, ensure_ascii=False) + "\n")
print(f"Exported 1 session to {args.output}")
else:
sessions = db.export_all(source=args.source)
with open(args.output, "w") as f:
with open(args.output, "w", encoding="utf-8") as f:
for s in sessions:
f.write(_json.dumps(s, ensure_ascii=False) + "\n")
print(f"Exported {len(sessions)} sessions to {args.output}")
@@ -2258,7 +2604,11 @@ For more help on a command:
# =========================================================================
# Parse and execute
# =========================================================================
args = parser.parse_args()
# Pre-process argv so unquoted multi-word session names after -c / -r
# are merged into a single token before argparse sees them.
# e.g. ``hermes -c Pokemon Agent Dev`` → ``hermes -c 'Pokemon Agent Dev'``
_processed_argv = _coalesce_session_name_args(sys.argv[1:])
args = parser.parse_args(_processed_argv)
# Handle --version flag
if args.version:

View File

@@ -63,7 +63,7 @@ _PROVIDER_LABELS = {
"kimi-coding": "Kimi / Moonshot",
"minimax": "MiniMax",
"minimax-cn": "MiniMax (China)",
"custom": "custom endpoint",
"custom": "Custom endpoint",
}
_PROVIDER_ALIASES = {

View File

@@ -66,9 +66,14 @@ def _resolve_openrouter_runtime(
if not cfg_provider or cfg_provider == "auto":
use_config_base_url = True
# When the user explicitly requested the openrouter provider, skip
# OPENAI_BASE_URL — it typically points to a custom / non-OpenRouter
# endpoint and would prevent switching back to OpenRouter (#874).
skip_openai_base = requested_norm == "openrouter"
base_url = (
(explicit_base_url or "").strip()
or env_openai_base_url
or ("" if skip_openai_base else env_openai_base_url)
or (cfg_base_url.strip() if use_config_base_url else "")
or env_openrouter_base_url
or OPENROUTER_BASE_URL

View File

@@ -243,7 +243,7 @@ def prompt_checklist(title: str, items: list, pre_selected: list = None) -> list
else:
selected.add(idx)
else:
print_error(f"Enter a number between 1 and {len(items) + 1}")
print_error(f"Enter a number between 1 and {len(items)}")
except ValueError:
print_error("Enter a number")
except (KeyboardInterrupt, EOFError):
@@ -516,7 +516,8 @@ def setup_model_provider(config: dict):
keep_label = None # No provider configured — don't show "Keep current"
provider_choices = [
"Login with Nous Portal (Nous Research subscription)",
"Nous Portal API key (direct API key access)",
"Login with Nous Portal (Nous Research subscription — OAuth)",
"Login with OpenAI Codex",
"OpenRouter API key (100+ models, pay-per-use)",
"Custom OpenAI-compatible endpoint (self-hosted / VLLM / etc.)",
@@ -529,7 +530,7 @@ def setup_model_provider(config: dict):
provider_choices.append(keep_label)
# Default to "Keep current" if a provider exists, otherwise OpenRouter (most common)
default_provider = len(provider_choices) - 1 if has_any_provider else 2
default_provider = len(provider_choices) - 1 if has_any_provider else 3
if not has_any_provider:
print_warning("An inference provider is required for Hermes to work.")
@@ -541,7 +542,37 @@ def setup_model_provider(config: dict):
selected_provider = None # "nous", "openai-codex", "openrouter", "custom", or None (keep)
nous_models = [] # populated if Nous login succeeds
if provider_idx == 0: # Nous Portal
if provider_idx == 0: # Nous Portal API Key (direct)
selected_provider = "nous-api"
print()
print_header("Nous Portal API Key")
print_info("Use a Nous Portal API key for direct access to Nous inference.")
print_info("Get your API key at: https://portal.nousresearch.com")
print()
existing_key = get_env_value("NOUS_API_KEY")
if existing_key:
print_info(f"Current: {existing_key[:8]}... (configured)")
if prompt_yes_no("Update Nous API key?", False):
api_key = prompt(" Nous API key", password=True)
if api_key:
save_env_value("NOUS_API_KEY", api_key)
print_success("Nous API key updated")
else:
api_key = prompt(" Nous API key", password=True)
if api_key:
save_env_value("NOUS_API_KEY", api_key)
print_success("Nous API key saved")
else:
print_warning("Skipped - agent won't work without an API key")
# Clear custom endpoint vars if switching
if existing_custom:
save_env_value("OPENAI_BASE_URL", "")
save_env_value("OPENAI_API_KEY", "")
_update_config_for_provider("nous-api", "https://inference-api.nousresearch.com/v1")
elif provider_idx == 1: # Nous Portal
selected_provider = "nous"
print()
print_header("Nous Portal Login")
@@ -581,7 +612,7 @@ def setup_model_provider(config: dict):
print_info("You can try again later with: hermes model")
selected_provider = None
elif provider_idx == 1: # OpenAI Codex
elif provider_idx == 2: # OpenAI Codex
selected_provider = "openai-codex"
print()
print_header("OpenAI Codex Login")
@@ -605,7 +636,7 @@ def setup_model_provider(config: dict):
print_info("You can try again later with: hermes model")
selected_provider = None
elif provider_idx == 2: # OpenRouter
elif provider_idx == 3: # OpenRouter
selected_provider = "openrouter"
print()
print_header("OpenRouter API Key")
@@ -632,7 +663,30 @@ def setup_model_provider(config: dict):
save_env_value("OPENAI_BASE_URL", "")
save_env_value("OPENAI_API_KEY", "")
elif provider_idx == 3: # Custom endpoint
# Update config.yaml and deactivate any OAuth provider so the
# resolver doesn't keep returning the old provider (e.g. Codex).
try:
from hermes_cli.auth import deactivate_provider
deactivate_provider()
except Exception:
pass
import yaml
config_path = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes")) / "config.yaml"
try:
disk_cfg = {}
if config_path.exists():
disk_cfg = yaml.safe_load(config_path.read_text()) or {}
model_section = disk_cfg.get("model", {})
if isinstance(model_section, str):
model_section = {"default": model_section}
model_section["provider"] = "openrouter"
model_section.pop("base_url", None) # OpenRouter uses default URL
disk_cfg["model"] = model_section
config_path.write_text(yaml.safe_dump(disk_cfg, sort_keys=False))
except Exception as e:
logger.debug("Could not save provider to config.yaml: %s", e)
elif provider_idx == 4: # Custom endpoint
selected_provider = "custom"
print()
print_header("Custom OpenAI-Compatible Endpoint")
@@ -659,9 +713,31 @@ def setup_model_provider(config: dict):
if model_name:
config['model'] = model_name
save_env_value("LLM_MODEL", model_name)
# Save provider and base_url to config.yaml so the gateway and CLI
# both resolve the correct provider without relying on env-var heuristics.
if base_url:
import yaml
config_path = Path(os.environ.get("HERMES_HOME", Path.home() / ".hermes")) / "config.yaml"
try:
disk_cfg = {}
if config_path.exists():
disk_cfg = yaml.safe_load(config_path.read_text()) or {}
model_section = disk_cfg.get("model", {})
if isinstance(model_section, str):
model_section = {"default": model_section}
model_section["provider"] = "custom"
model_section["base_url"] = base_url.rstrip("/")
if model_name:
model_section["default"] = model_name
disk_cfg["model"] = model_section
config_path.write_text(yaml.safe_dump(disk_cfg, sort_keys=False))
except Exception as e:
logger.debug("Could not save provider to config.yaml: %s", e)
print_success("Custom endpoint configured")
elif provider_idx == 4: # Z.AI / GLM
elif provider_idx == 5: # Z.AI / GLM
selected_provider = "zai"
print()
print_header("Z.AI / GLM API Key")
@@ -715,7 +791,7 @@ def setup_model_provider(config: dict):
save_env_value("OPENAI_API_KEY", "")
_update_config_for_provider("zai", zai_base_url)
elif provider_idx == 5: # Kimi / Moonshot
elif provider_idx == 6: # Kimi / Moonshot
selected_provider = "kimi-coding"
print()
print_header("Kimi / Moonshot API Key")
@@ -747,7 +823,7 @@ def setup_model_provider(config: dict):
save_env_value("OPENAI_API_KEY", "")
_update_config_for_provider("kimi-coding", pconfig.inference_base_url)
elif provider_idx == 6: # MiniMax
elif provider_idx == 7: # MiniMax
selected_provider = "minimax"
print()
print_header("MiniMax API Key")
@@ -779,7 +855,7 @@ def setup_model_provider(config: dict):
save_env_value("OPENAI_API_KEY", "")
_update_config_for_provider("minimax", pconfig.inference_base_url)
elif provider_idx == 7: # MiniMax China
elif provider_idx == 8: # MiniMax China
selected_provider = "minimax-cn"
print()
print_header("MiniMax China API Key")
@@ -811,12 +887,12 @@ def setup_model_provider(config: dict):
save_env_value("OPENAI_API_KEY", "")
_update_config_for_provider("minimax-cn", pconfig.inference_base_url)
# else: provider_idx == 8 (Keep current) — only shown when a provider already exists
# else: provider_idx == 9 (Keep current) — only shown when a provider already exists
# ── OpenRouter API Key for tools (if not already set) ──
# Tools (vision, web, MoA) use OpenRouter independently of the main provider.
# Prompt for OpenRouter key if not set and a non-OpenRouter provider was chosen.
if selected_provider in ("nous", "openai-codex", "custom", "zai", "kimi-coding", "minimax", "minimax-cn") and not get_env_value("OPENROUTER_API_KEY"):
if selected_provider in ("nous", "nous-api", "openai-codex", "custom", "zai", "kimi-coding", "minimax", "minimax-cn") and not get_env_value("OPENROUTER_API_KEY"):
print()
print_header("OpenRouter API Key (for tools)")
print_info("Tools like vision analysis, web search, and MoA use OpenRouter")
@@ -869,6 +945,14 @@ def setup_model_provider(config: dict):
if custom:
config['model'] = custom
save_env_value("LLM_MODEL", custom)
elif selected_provider == "nous-api":
# Nous API key provider — prompt for model manually
print_info("Enter a model name available on Nous inference API.")
print_info("Examples: anthropic/claude-opus-4.6, deepseek/deepseek-r1")
custom = prompt(f" Model name (Enter to keep '{current_model}')")
if custom:
config['model'] = custom
save_env_value("LLM_MODEL", custom)
elif selected_provider == "openai-codex":
from hermes_cli.codex_models import get_codex_model_ids
codex_models = get_codex_model_ids()
@@ -1264,7 +1348,7 @@ def setup_agent_settings(config: dict):
# ── Max Iterations ──
print_header("Agent Settings")
current_max = get_env_value('HERMES_MAX_ITERATIONS') or '90'
current_max = get_env_value('HERMES_MAX_ITERATIONS') or str(config.get('agent', {}).get('max_turns', 90))
print_info("Maximum tool-calling iterations per conversation.")
print_info("Higher = more complex tasks, but costs more tokens.")
print_info("Recommended: 30-60 for most tasks, 100+ for open exploration.")
@@ -1274,7 +1358,8 @@ def setup_agent_settings(config: dict):
max_iter = int(max_iter_str)
if max_iter > 0:
save_env_value("HERMES_MAX_ITERATIONS", str(max_iter))
config['max_turns'] = max_iter
config.setdefault('agent', {})['max_turns'] = max_iter
config.pop('max_turns', None)
print_success(f"Max iterations set to {max_iter}")
except ValueError:
print_warning("Invalid number, keeping current value")
@@ -1527,10 +1612,22 @@ def setup_gateway(config: dict):
if not existing_slack and prompt_yes_no("Set up Slack bot?", False):
print_info("Steps to create a Slack app:")
print_info(" 1. Go to https://api.slack.com/apps → Create New App")
print_info(" 2. Enable Socket Mode: App Settings → Socket Mode → Enable")
print_info(" 3. Bot Token: OAuth & Permissions → Install to Workspace")
print_info(" 4. App Token: Basic Information → App-Level Tokens → Generate")
print_info(" 1. Go to https://api.slack.com/apps → Create New App (from scratch)")
print_info(" 2. Enable Socket Mode: Settings → Socket Mode → Enable")
print_info(" • Create an App-Level Token with 'connections:write' scope")
print_info(" 3. Add Bot Token Scopes: Features → OAuth & Permissions")
print_info(" Required scopes: chat:write, app_mentions:read,")
print_info(" channels:history, channels:read, groups:history,")
print_info(" im:history, im:read, im:write, users:read, files:write")
print_info(" 4. Subscribe to Events: Features → Event Subscriptions → Enable")
print_info(" Required events: message.im, message.channels,")
print_info(" message.groups, app_mention")
print_warning(" ⚠ Without message.channels/message.groups events,")
print_warning(" the bot will ONLY work in DMs, not channels!")
print_info(" 5. Install to Workspace: Settings → Install App")
print_info(" 6. After installing, invite the bot to channels: /invite @YourBot")
print()
print_info(" Full guide: https://hermes-agent.ai/docs/user-guide/messaging/slack")
print()
bot_token = prompt("Slack Bot Token (xoxb-...)", password=True)
if bot_token:
@@ -1542,7 +1639,7 @@ def setup_gateway(config: dict):
print()
print_info("🔒 Security: Restrict who can use your bot")
print_info(" Find Slack user IDs in your profile or via the Slack API")
print_info(" To find a Member ID: click a user's name → View full profile → ⋮ → Copy member ID")
print()
allowed_users = prompt("Allowed user IDs (comma-separated, leave empty for open access)")
if allowed_users:

179
hermes_cli/skills_config.py Normal file
View File

@@ -0,0 +1,179 @@
"""
Skills configuration for Hermes Agent.
`hermes skills` enters this module.
Toggle individual skills or categories on/off, globally or per-platform.
Config stored in ~/.hermes/config.yaml under:
skills:
disabled: [skill-a, skill-b] # global disabled list
platform_disabled: # per-platform overrides
telegram: [skill-c]
cli: []
"""
from typing import Dict, List, Optional, Set
from hermes_cli.config import load_config, save_config
from hermes_cli.colors import Colors, color
PLATFORMS = {
"cli": "🖥️ CLI",
"telegram": "📱 Telegram",
"discord": "💬 Discord",
"slack": "💼 Slack",
"whatsapp": "📱 WhatsApp",
}
# ─── Config Helpers ───────────────────────────────────────────────────────────
def get_disabled_skills(config: dict, platform: Optional[str] = None) -> Set[str]:
"""Return disabled skill names. Platform-specific list falls back to global."""
skills_cfg = config.get("skills", {})
global_disabled = set(skills_cfg.get("disabled", []))
if platform is None:
return global_disabled
platform_disabled = skills_cfg.get("platform_disabled", {}).get(platform)
if platform_disabled is None:
return global_disabled
return set(platform_disabled)
def save_disabled_skills(config: dict, disabled: Set[str], platform: Optional[str] = None):
"""Persist disabled skill names to config."""
config.setdefault("skills", {})
if platform is None:
config["skills"]["disabled"] = sorted(disabled)
else:
config["skills"].setdefault("platform_disabled", {})
config["skills"]["platform_disabled"][platform] = sorted(disabled)
save_config(config)
# ─── Skill Discovery ─────────────────────────────────────────────────────────
def _list_all_skills() -> List[dict]:
"""Return all installed skills (ignoring disabled state)."""
try:
from tools.skills_tool import _find_all_skills
return _find_all_skills(skip_disabled=True)
except Exception:
return []
def _get_categories(skills: List[dict]) -> List[str]:
"""Return sorted unique category names (None -> 'uncategorized')."""
return sorted({s["category"] or "uncategorized" for s in skills})
# ─── Platform Selection ──────────────────────────────────────────────────────
def _select_platform() -> Optional[str]:
"""Ask user which platform to configure, or global."""
options = [("global", "All platforms (global default)")] + list(PLATFORMS.items())
print()
print(color(" Configure skills for:", Colors.BOLD))
for i, (key, label) in enumerate(options, 1):
print(f" {i}. {label}")
print()
try:
raw = input(color(" Select [1]: ", Colors.YELLOW)).strip()
except (KeyboardInterrupt, EOFError):
return None
if not raw:
return None # global
try:
idx = int(raw) - 1
if 0 <= idx < len(options):
key = options[idx][0]
return None if key == "global" else key
except ValueError:
pass
return None
# ─── Category Toggle ─────────────────────────────────────────────────────────
def _toggle_by_category(skills: List[dict], disabled: Set[str]) -> Set[str]:
"""Toggle all skills in a category at once."""
from hermes_cli.curses_ui import curses_checklist
categories = _get_categories(skills)
cat_labels = []
# A category is "enabled" (checked) when NOT all its skills are disabled
pre_selected = set()
for i, cat in enumerate(categories):
cat_skills = [s["name"] for s in skills if (s["category"] or "uncategorized") == cat]
cat_labels.append(f"{cat} ({len(cat_skills)} skills)")
if not all(s in disabled for s in cat_skills):
pre_selected.add(i)
chosen = curses_checklist(
"Categories — toggle entire categories",
cat_labels, pre_selected, cancel_returns=pre_selected,
)
new_disabled = set(disabled)
for i, cat in enumerate(categories):
cat_skills = {s["name"] for s in skills if (s["category"] or "uncategorized") == cat}
if i in chosen:
new_disabled -= cat_skills # category enabled → remove from disabled
else:
new_disabled |= cat_skills # category disabled → add to disabled
return new_disabled
# ─── Entry Point ──────────────────────────────────────────────────────────────
def skills_command(args=None):
"""Entry point for `hermes skills`."""
from hermes_cli.curses_ui import curses_checklist
config = load_config()
skills = _list_all_skills()
if not skills:
print(color(" No skills installed.", Colors.DIM))
return
# Step 1: Select platform
platform = _select_platform()
platform_label = PLATFORMS.get(platform, "All platforms") if platform else "All platforms"
# Step 2: Select mode — individual or by category
print()
print(color(f" Configure for: {platform_label}", Colors.DIM))
print()
print(" 1. Toggle individual skills")
print(" 2. Toggle by category")
print()
try:
mode = input(color(" Select [1]: ", Colors.YELLOW)).strip() or "1"
except (KeyboardInterrupt, EOFError):
return
disabled = get_disabled_skills(config, platform)
if mode == "2":
new_disabled = _toggle_by_category(skills, disabled)
else:
# Build labels and map indices → skill names
labels = [
f"{s['name']} ({s['category'] or 'uncategorized'}) — {s['description'][:55]}"
for s in skills
]
# "selected" = enabled (not disabled) — matches the [✓] convention
pre_selected = {i for i, s in enumerate(skills) if s["name"] not in disabled}
chosen = curses_checklist(
f"Skills for {platform_label}",
labels, pre_selected, cancel_returns=pre_selected,
)
# Anything NOT chosen is disabled
new_disabled = {skills[i]["name"] for i in range(len(skills)) if i not in chosen}
if new_disabled == disabled:
print(color(" No changes.", Colors.DIM))
return
save_disabled_skills(config, new_disabled, platform)
enabled_count = len(skills) - len(new_disabled)
print(color(f"✓ Saved: {enabled_count} enabled, {len(new_disabled)} disabled ({platform_label}).", Colors.GREEN))

630
hermes_cli/skin_engine.py Normal file
View File

@@ -0,0 +1,630 @@
"""Hermes CLI skin/theme engine.
A data-driven skin system that lets users customize the CLI's visual appearance.
Skins are defined as YAML files in ~/.hermes/skins/ or as built-in presets.
No code changes are needed to add a new skin.
SKIN YAML SCHEMA
================
All fields are optional. Missing values inherit from the ``default`` skin.
.. code-block:: yaml
# Required: skin identity
name: mytheme # Unique skin name (lowercase, hyphens ok)
description: Short description # Shown in /skin listing
# Colors: hex values for Rich markup (banner, UI, response box)
colors:
banner_border: "#CD7F32" # Panel border color
banner_title: "#FFD700" # Panel title text color
banner_accent: "#FFBF00" # Section headers (Available Tools, etc.)
banner_dim: "#B8860B" # Dim/muted text (separators, labels)
banner_text: "#FFF8DC" # Body text (tool names, skill names)
ui_accent: "#FFBF00" # General UI accent
ui_label: "#4dd0e1" # UI labels
ui_ok: "#4caf50" # Success indicators
ui_error: "#ef5350" # Error indicators
ui_warn: "#ffa726" # Warning indicators
prompt: "#FFF8DC" # Prompt text color
input_rule: "#CD7F32" # Input area horizontal rule
response_border: "#FFD700" # Response box border (ANSI)
session_label: "#DAA520" # Session label color
session_border: "#8B8682" # Session ID dim color
# Spinner: customize the animated spinner during API calls
spinner:
waiting_faces: # Faces shown while waiting for API
- "(⚔)"
- "(⛨)"
thinking_faces: # Faces shown during reasoning
- "(⌁)"
- "(<>)"
thinking_verbs: # Verbs for spinner messages
- "forging"
- "plotting"
wings: # Optional left/right spinner decorations
- ["⟪⚔", "⚔⟫"] # Each entry is [left, right] pair
- ["⟪▲", "▲⟫"]
# Branding: text strings used throughout the CLI
branding:
agent_name: "Hermes Agent" # Banner title, status display
welcome: "Welcome message" # Shown at CLI startup
goodbye: "Goodbye! ⚕" # Shown on exit
response_label: " ⚕ Hermes " # Response box header label
prompt_symbol: " " # Input prompt symbol
help_header: "(^_^)? Commands" # /help header text
# Tool prefix: character for tool output lines (default: ┊)
tool_prefix: ""
USAGE
=====
.. code-block:: python
from hermes_cli.skin_engine import get_active_skin, list_skins, set_active_skin
skin = get_active_skin()
print(skin.colors["banner_title"]) # "#FFD700"
print(skin.get_branding("agent_name")) # "Hermes Agent"
set_active_skin("ares") # Switch to built-in ares skin
set_active_skin("mytheme") # Switch to user skin from ~/.hermes/skins/
BUILT-IN SKINS
==============
- ``default`` — Classic Hermes gold/kawaii (the current look)
- ``ares`` — Crimson/bronze war-god theme with custom spinner wings
- ``mono`` — Clean grayscale monochrome
- ``slate`` — Cool blue developer-focused theme
USER SKINS
==========
Drop a YAML file in ``~/.hermes/skins/<name>.yaml`` following the schema above.
Activate with ``/skin <name>`` in the CLI or ``display.skin: <name>`` in config.yaml.
"""
import logging
import os
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple
logger = logging.getLogger(__name__)
# =============================================================================
# Skin data structure
# =============================================================================
@dataclass
class SkinConfig:
"""Complete skin configuration."""
name: str
description: str = ""
colors: Dict[str, str] = field(default_factory=dict)
spinner: Dict[str, Any] = field(default_factory=dict)
branding: Dict[str, str] = field(default_factory=dict)
tool_prefix: str = ""
banner_logo: str = "" # Rich-markup ASCII art logo (replaces HERMES_AGENT_LOGO)
banner_hero: str = "" # Rich-markup hero art (replaces HERMES_CADUCEUS)
def get_color(self, key: str, fallback: str = "") -> str:
"""Get a color value with fallback."""
return self.colors.get(key, fallback)
def get_spinner_list(self, key: str) -> List[str]:
"""Get a spinner list (faces, verbs, etc.)."""
return self.spinner.get(key, [])
def get_spinner_wings(self) -> List[Tuple[str, str]]:
"""Get spinner wing pairs, or empty list if none."""
raw = self.spinner.get("wings", [])
result = []
for pair in raw:
if isinstance(pair, (list, tuple)) and len(pair) == 2:
result.append((str(pair[0]), str(pair[1])))
return result
def get_branding(self, key: str, fallback: str = "") -> str:
"""Get a branding value with fallback."""
return self.branding.get(key, fallback)
# =============================================================================
# Built-in skin definitions
# =============================================================================
_BUILTIN_SKINS: Dict[str, Dict[str, Any]] = {
"default": {
"name": "default",
"description": "Classic Hermes — gold and kawaii",
"colors": {
"banner_border": "#CD7F32",
"banner_title": "#FFD700",
"banner_accent": "#FFBF00",
"banner_dim": "#B8860B",
"banner_text": "#FFF8DC",
"ui_accent": "#FFBF00",
"ui_label": "#4dd0e1",
"ui_ok": "#4caf50",
"ui_error": "#ef5350",
"ui_warn": "#ffa726",
"prompt": "#FFF8DC",
"input_rule": "#CD7F32",
"response_border": "#FFD700",
"session_label": "#DAA520",
"session_border": "#8B8682",
},
"spinner": {
# Empty = use hardcoded defaults in display.py
},
"branding": {
"agent_name": "Hermes Agent",
"welcome": "Welcome to Hermes Agent! Type your message or /help for commands.",
"goodbye": "Goodbye! ⚕",
"response_label": " ⚕ Hermes ",
"prompt_symbol": " ",
"help_header": "(^_^)? Available Commands",
},
"tool_prefix": "",
},
"ares": {
"name": "ares",
"description": "War-god theme — crimson and bronze",
"colors": {
"banner_border": "#9F1C1C",
"banner_title": "#C7A96B",
"banner_accent": "#DD4A3A",
"banner_dim": "#6B1717",
"banner_text": "#F1E6CF",
"ui_accent": "#DD4A3A",
"ui_label": "#C7A96B",
"ui_ok": "#4caf50",
"ui_error": "#ef5350",
"ui_warn": "#ffa726",
"prompt": "#F1E6CF",
"input_rule": "#9F1C1C",
"response_border": "#C7A96B",
"session_label": "#C7A96B",
"session_border": "#6E584B",
},
"spinner": {
"waiting_faces": ["(⚔)", "(⛨)", "(▲)", "(<>)", "(/)"],
"thinking_faces": ["(⚔)", "(⛨)", "(▲)", "(⌁)", "(<>)"],
"thinking_verbs": [
"forging", "marching", "sizing the field", "holding the line",
"hammering plans", "tempering steel", "plotting impact", "raising the shield",
],
"wings": [
["⟪⚔", "⚔⟫"],
["⟪▲", "▲⟫"],
["⟪╸", "╺⟫"],
["⟪⛨", "⛨⟫"],
],
},
"branding": {
"agent_name": "Ares Agent",
"welcome": "Welcome to Ares Agent! Type your message or /help for commands.",
"goodbye": "Farewell, warrior! ⚔",
"response_label": " ⚔ Ares ",
"prompt_symbol": " ",
"help_header": "(⚔) Available Commands",
},
"tool_prefix": "",
"banner_logo": """[bold #A3261F] █████╗ ██████╗ ███████╗███████╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗[/]
[bold #B73122]██╔══██╗██╔══██╗██╔════╝██╔════╝ ██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝[/]
[#C93C24]███████║██████╔╝█████╗ ███████╗█████╗███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║[/]
[#D84A28]██╔══██║██╔══██╗██╔══╝ ╚════██║╚════╝██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║[/]
[#E15A2D]██║ ██║██║ ██║███████╗███████║ ██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║[/]
[#EB6C32]╚═╝ ╚═╝╚═╝ ╚═╝╚══════╝╚══════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝[/]""",
"banner_hero": """[#9F1C1C]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣤⣤⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#9F1C1C]⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣴⣿⠟⠻⣿⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#C7A96B]⠀⠀⠀⠀⠀⠀⠀⣠⣾⡿⠋⠀⠀⠀⠙⢿⣷⣄⠀⠀⠀⠀⠀⠀⠀[/]
[#C7A96B]⠀⠀⠀⠀⠀⢀⣾⡿⠋⠀⠀⢠⡄⠀⠀⠙⢿⣷⡀⠀⠀⠀⠀⠀[/]
[#DD4A3A]⠀⠀⠀⠀⣰⣿⠟⠀⠀⠀⣰⣿⣿⣆⠀⠀⠀⠻⣿⣆⠀⠀⠀⠀[/]
[#DD4A3A]⠀⠀⠀⢰⣿⠏⠀⠀⢀⣾⡿⠉⢿⣷⡀⠀⠀⠹⣿⡆⠀⠀⠀[/]
[#9F1C1C]⠀⠀⠀⣿⡟⠀⠀⣠⣿⠟⠀⠀⠀⠻⣿⣄⠀⠀⢻⣿⠀⠀⠀[/]
[#9F1C1C]⠀⠀⠀⣿⡇⠀⠀⠙⠋⠀⠀⚔⠀⠀⠙⠋⠀⠀⢸⣿⠀⠀⠀[/]
[#6B1717]⠀⠀⠀⢿⣧⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣼⡿⠀⠀⠀[/]
[#6B1717]⠀⠀⠀⠘⢿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⣾⡿⠃⠀⠀⠀[/]
[#C7A96B]⠀⠀⠀⠀⠈⠻⣿⣷⣦⣤⣀⣀⣤⣤⣶⣿⠿⠋⠀⠀⠀⠀[/]
[#C7A96B]⠀⠀⠀⠀⠀⠀⠀⠉⠛⠿⠿⠿⠿⠛⠉⠀⠀⠀⠀⠀⠀⠀[/]
[#DD4A3A]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⚔⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[dim #6B1717]war god online[/]""",
},
"mono": {
"name": "mono",
"description": "Monochrome — clean grayscale",
"colors": {
"banner_border": "#555555",
"banner_title": "#e6edf3",
"banner_accent": "#aaaaaa",
"banner_dim": "#444444",
"banner_text": "#c9d1d9",
"ui_accent": "#aaaaaa",
"ui_label": "#888888",
"ui_ok": "#888888",
"ui_error": "#cccccc",
"ui_warn": "#999999",
"prompt": "#c9d1d9",
"input_rule": "#444444",
"response_border": "#aaaaaa",
"session_label": "#888888",
"session_border": "#555555",
},
"spinner": {},
"branding": {
"agent_name": "Hermes Agent",
"welcome": "Welcome to Hermes Agent! Type your message or /help for commands.",
"goodbye": "Goodbye! ⚕",
"response_label": " ⚕ Hermes ",
"prompt_symbol": " ",
"help_header": "[?] Available Commands",
},
"tool_prefix": "",
},
"slate": {
"name": "slate",
"description": "Cool blue — developer-focused",
"colors": {
"banner_border": "#4169e1",
"banner_title": "#7eb8f6",
"banner_accent": "#8EA8FF",
"banner_dim": "#4b5563",
"banner_text": "#c9d1d9",
"ui_accent": "#7eb8f6",
"ui_label": "#8EA8FF",
"ui_ok": "#63D0A6",
"ui_error": "#F7A072",
"ui_warn": "#e6a855",
"prompt": "#c9d1d9",
"input_rule": "#4169e1",
"response_border": "#7eb8f6",
"session_label": "#7eb8f6",
"session_border": "#4b5563",
},
"spinner": {},
"branding": {
"agent_name": "Hermes Agent",
"welcome": "Welcome to Hermes Agent! Type your message or /help for commands.",
"goodbye": "Goodbye! ⚕",
"response_label": " ⚕ Hermes ",
"prompt_symbol": " ",
"help_header": "(^_^)? Available Commands",
},
"tool_prefix": "",
},
"poseidon": {
"name": "poseidon",
"description": "Ocean-god theme — deep blue and seafoam",
"colors": {
"banner_border": "#2A6FB9",
"banner_title": "#A9DFFF",
"banner_accent": "#5DB8F5",
"banner_dim": "#153C73",
"banner_text": "#EAF7FF",
"ui_accent": "#5DB8F5",
"ui_label": "#A9DFFF",
"ui_ok": "#4caf50",
"ui_error": "#ef5350",
"ui_warn": "#ffa726",
"prompt": "#EAF7FF",
"input_rule": "#2A6FB9",
"response_border": "#5DB8F5",
"session_label": "#A9DFFF",
"session_border": "#496884",
},
"spinner": {
"waiting_faces": ["(≈)", "(Ψ)", "(∿)", "(◌)", "(◠)"],
"thinking_faces": ["(Ψ)", "(∿)", "(≈)", "(⌁)", "(◌)"],
"thinking_verbs": [
"charting currents", "sounding the depth", "reading foam lines",
"steering the trident", "tracking undertow", "plotting sea lanes",
"calling the swell", "measuring pressure",
],
"wings": [
["⟪≈", "≈⟫"],
["⟪Ψ", "Ψ⟫"],
["⟪∿", "∿⟫"],
["⟪◌", "◌⟫"],
],
},
"branding": {
"agent_name": "Poseidon Agent",
"welcome": "Welcome to Poseidon Agent! Type your message or /help for commands.",
"goodbye": "Fair winds! Ψ",
"response_label": " Ψ Poseidon ",
"prompt_symbol": "Ψ ",
"help_header": "(Ψ) Available Commands",
},
"tool_prefix": "",
"banner_logo": """[bold #B8E8FF]██████╗ ██████╗ ███████╗██╗██████╗ ███████╗ ██████╗ ███╗ ██╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗[/]
[bold #97D6FF]██╔══██╗██╔═══██╗██╔════╝██║██╔══██╗██╔════╝██╔═══██╗████╗ ██║ ██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝[/]
[#75C1F6]██████╔╝██║ ██║███████╗██║██║ ██║█████╗ ██║ ██║██╔██╗ ██║█████╗███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║[/]
[#4FA2E0]██╔═══╝ ██║ ██║╚════██║██║██║ ██║██╔══╝ ██║ ██║██║╚██╗██║╚════╝██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║[/]
[#2E7CC7]██║ ╚██████╔╝███████║██║██████╔╝███████╗╚██████╔╝██║ ╚████║ ██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║[/]
[#1B4F95]╚═╝ ╚═════╝ ╚══════╝╚═╝╚═════╝ ╚══════╝ ╚═════╝ ╚═╝ ╚═══╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝[/]""",
"banner_hero": """[#2A6FB9]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#5DB8F5]⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⣾⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#5DB8F5]⠀⠀⠀⠀⠀⠀⠀⢠⣿⠏⠀Ψ⠀⠹⣿⡄⠀⠀⠀⠀⠀⠀⠀[/]
[#A9DFFF]⠀⠀⠀⠀⠀⠀⠀⣿⡟⠀⠀⠀⠀⠀⢻⣿⠀⠀⠀⠀⠀⠀⠀[/]
[#A9DFFF]⠀⠀⠀≈≈≈≈≈⣿⡇⠀⠀⠀⠀⠀⢸⣿≈≈≈≈≈⠀⠀⠀[/]
[#5DB8F5]⠀⠀⠀⠀⠀⠀⠀⣿⡇⠀⠀⠀⠀⠀⢸⣿⠀⠀⠀⠀⠀⠀⠀[/]
[#2A6FB9]⠀⠀⠀⠀⠀⠀⠀⢿⣧⠀⠀⠀⠀⠀⣼⡿⠀⠀⠀⠀⠀⠀⠀[/]
[#2A6FB9]⠀⠀⠀⠀⠀⠀⠀⠘⢿⣷⣄⣀⣠⣾⡿⠃⠀⠀⠀⠀⠀⠀⠀[/]
[#153C73]⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣿⣿⡿⠟⠁⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#153C73]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#5DB8F5]⠀⠀⠀⠀⠀≈≈≈≈≈≈≈≈≈≈≈≈≈≈≈⠀⠀⠀⠀⠀[/]
[#A9DFFF]⠀⠀⠀⠀⠀⠀≈≈≈≈≈≈≈≈≈≈≈≈≈⠀⠀⠀⠀⠀⠀[/]
[dim #153C73]deep waters hold[/]""",
},
"sisyphus": {
"name": "sisyphus",
"description": "Sisyphean theme — austere grayscale with persistence",
"colors": {
"banner_border": "#B7B7B7",
"banner_title": "#F5F5F5",
"banner_accent": "#E7E7E7",
"banner_dim": "#4A4A4A",
"banner_text": "#D3D3D3",
"ui_accent": "#E7E7E7",
"ui_label": "#D3D3D3",
"ui_ok": "#919191",
"ui_error": "#E7E7E7",
"ui_warn": "#B7B7B7",
"prompt": "#F5F5F5",
"input_rule": "#656565",
"response_border": "#B7B7B7",
"session_label": "#919191",
"session_border": "#656565",
},
"spinner": {
"waiting_faces": ["(◉)", "(◌)", "(◬)", "(⬤)", "(::)"],
"thinking_faces": ["(◉)", "(◬)", "(◌)", "(○)", "(●)"],
"thinking_verbs": [
"finding traction", "measuring the grade", "resetting the boulder",
"counting the ascent", "testing leverage", "setting the shoulder",
"pushing uphill", "enduring the loop",
],
"wings": [
["⟪◉", "◉⟫"],
["⟪◬", "◬⟫"],
["⟪◌", "◌⟫"],
["⟪⬤", "⬤⟫"],
],
},
"branding": {
"agent_name": "Sisyphus Agent",
"welcome": "Welcome to Sisyphus Agent! Type your message or /help for commands.",
"goodbye": "The boulder waits. ◉",
"response_label": " ◉ Sisyphus ",
"prompt_symbol": " ",
"help_header": "(◉) Available Commands",
},
"tool_prefix": "",
"banner_logo": """[bold #F5F5F5]███████╗██╗███████╗██╗ ██╗██████╗ ██╗ ██╗██╗ ██╗███████╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗[/]
[bold #E7E7E7]██╔════╝██║██╔════╝╚██╗ ██╔╝██╔══██╗██║ ██║██║ ██║██╔════╝ ██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝[/]
[#D7D7D7]███████╗██║███████╗ ╚████╔╝ ██████╔╝███████║██║ ██║███████╗█████╗███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║[/]
[#BFBFBF]╚════██║██║╚════██║ ╚██╔╝ ██╔═══╝ ██╔══██║██║ ██║╚════██║╚════╝██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║[/]
[#8F8F8F]███████║██║███████║ ██║ ██║ ██║ ██║╚██████╔╝███████║ ██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║[/]
[#626262]╚══════╝╚═╝╚══════╝ ╚═╝ ╚═╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝[/]""",
"banner_hero": """[#B7B7B7]⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣀⣀⣀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#D3D3D3]⠀⠀⠀⠀⠀⠀⠀⣠⣾⣿⣿⣿⣿⣷⣄⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#E7E7E7]⠀⠀⠀⠀⠀⠀⣾⣿⣿⣿⣿⣿⣿⣿⣷⠀⠀⠀⠀⠀⠀⠀[/]
[#F5F5F5]⠀⠀⠀⠀⠀⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⡇⠀⠀⠀⠀⠀⠀[/]
[#E7E7E7]⠀⠀⠀⠀⠀⠀⣿⣿⣿⣿⣿⣿⣿⣿⣿⠀⠀⠀⠀⠀⠀⠀[/]
[#D3D3D3]⠀⠀⠀⠀⠀⠀⠘⢿⣿⣿⣿⣿⣿⡿⠃⠀⠀⠀⠀⠀⠀⠀[/]
[#B7B7B7]⠀⠀⠀⠀⠀⠀⠀⠀⠙⠿⣿⠿⠋⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#919191][/]
[#656565]⠀⠀⠀⠀⠀⠀⠀⠀⠀⣰⡄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#656565]⠀⠀⠀⠀⠀⠀⠀⠀⣰⣿⣿⣆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#4A4A4A]⠀⠀⠀⠀⠀⠀⠀⣰⣿⣿⣿⣿⣆⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#4A4A4A]⠀⠀⠀⠀⠀⣀⣴⣿⣿⣿⣿⣿⣿⣦⣀⠀⠀⠀⠀⠀⠀[/]
[#656565]⠀⠀⠀━━━━━━━━━━━━━━━━━━━━━━━⠀⠀⠀[/]
[dim #4A4A4A]the boulder[/]""",
},
"charizard": {
"name": "charizard",
"description": "Volcanic theme — burnt orange and ember",
"colors": {
"banner_border": "#C75B1D",
"banner_title": "#FFD39A",
"banner_accent": "#F29C38",
"banner_dim": "#7A3511",
"banner_text": "#FFF0D4",
"ui_accent": "#F29C38",
"ui_label": "#FFD39A",
"ui_ok": "#4caf50",
"ui_error": "#ef5350",
"ui_warn": "#ffa726",
"prompt": "#FFF0D4",
"input_rule": "#C75B1D",
"response_border": "#F29C38",
"session_label": "#FFD39A",
"session_border": "#6C4724",
},
"spinner": {
"waiting_faces": ["(✦)", "(▲)", "(◇)", "(<>)", "(🔥)"],
"thinking_faces": ["(✦)", "(▲)", "(◇)", "(⌁)", "(🔥)"],
"thinking_verbs": [
"banking into the draft", "measuring burn", "reading the updraft",
"tracking ember fall", "setting wing angle", "holding the flame core",
"plotting a hot landing", "coiling for lift",
],
"wings": [
["⟪✦", "✦⟫"],
["⟪▲", "▲⟫"],
["⟪◌", "◌⟫"],
["⟪◇", "◇⟫"],
],
},
"branding": {
"agent_name": "Charizard Agent",
"welcome": "Welcome to Charizard Agent! Type your message or /help for commands.",
"goodbye": "Flame out! ✦",
"response_label": " ✦ Charizard ",
"prompt_symbol": " ",
"help_header": "(✦) Available Commands",
},
"tool_prefix": "",
"banner_logo": """[bold #FFF0D4] ██████╗██╗ ██╗ █████╗ ██████╗ ██╗███████╗ █████╗ ██████╗ ██████╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗[/]
[bold #FFD39A]██╔════╝██║ ██║██╔══██╗██╔══██╗██║╚══███╔╝██╔══██╗██╔══██╗██╔══██╗ ██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝[/]
[#F29C38]██║ ███████║███████║██████╔╝██║ ███╔╝ ███████║██████╔╝██║ ██║█████╗███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║[/]
[#E2832B]██║ ██╔══██║██╔══██║██╔══██╗██║ ███╔╝ ██╔══██║██╔══██╗██║ ██║╚════╝██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║[/]
[#C75B1D]╚██████╗██║ ██║██║ ██║██║ ██║██║███████╗██║ ██║██║ ██║██████╔╝ ██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║[/]
[#7A3511] ╚═════╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝ ╚═╝╚═╝╚══════╝╚═╝ ╚═╝╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝[/]""",
"banner_hero": """[#FFD39A]⠀⠀⠀⠀⠀⠀⠀⠀⣀⣤⠶⠶⠶⣤⣀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#F29C38]⠀⠀⠀⠀⠀⠀⣴⠟⠁⠀⠀⠀⠀⠈⠻⣦⠀⠀⠀⠀⠀⠀[/]
[#F29C38]⠀⠀⠀⠀⠀⣼⠏⠀⠀⠀✦⠀⠀⠀⠀⠹⣧⠀⠀⠀⠀⠀[/]
[#E2832B]⠀⠀⠀⠀⢰⡟⠀⠀⣀⣤⣤⣤⣀⠀⠀⠀⢻⡆⠀⠀⠀⠀[/]
[#E2832B]⠀⠀⣠⡾⠛⠁⣠⣾⠟⠉⠀⠉⠻⣷⣄⠀⠈⠛⢷⣄⠀⠀[/]
[#C75B1D]⠀⣼⠟⠀⢀⣾⠟⠁⠀⠀⠀⠀⠀⠈⠻⣷⡀⠀⠻⣧⠀[/]
[#C75B1D]⢸⡟⠀⠀⣿⡟⠀⠀⠀🔥⠀⠀⠀⠀⢻⣿⠀⠀⢻⡇[/]
[#7A3511]⠀⠻⣦⡀⠘⢿⣧⡀⠀⠀⠀⠀⠀⢀⣼⡿⠃⢀⣴⠟⠀[/]
[#7A3511]⠀⠀⠈⠻⣦⣀⠙⢿⣷⣤⣤⣤⣾⡿⠋⣀⣴⠟⠁⠀⠀[/]
[#C75B1D]⠀⠀⠀⠀⠈⠙⠛⠶⠤⠭⠭⠤⠶⠛⠋⠁⠀⠀⠀⠀[/]
[#F29C38]⠀⠀⠀⠀⠀⠀⠀⠀⣰⡿⢿⣆⠀⠀⠀⠀⠀⠀⠀⠀⠀[/]
[#F29C38]⠀⠀⠀⠀⠀⠀⠀⣼⡟⠀⠀⢻⣧⠀⠀⠀⠀⠀⠀⠀⠀[/]
[dim #7A3511]tail flame lit[/]""",
},
}
# =============================================================================
# Skin loading and management
# =============================================================================
_active_skin: Optional[SkinConfig] = None
_active_skin_name: str = "default"
def _skins_dir() -> Path:
"""User skins directory."""
home = Path(os.getenv("HERMES_HOME", Path.home() / ".hermes"))
return home / "skins"
def _load_skin_from_yaml(path: Path) -> Optional[Dict[str, Any]]:
"""Load a skin definition from a YAML file."""
try:
import yaml
with open(path, "r", encoding="utf-8") as f:
data = yaml.safe_load(f)
if isinstance(data, dict) and "name" in data:
return data
except Exception as e:
logger.debug("Failed to load skin from %s: %s", path, e)
return None
def _build_skin_config(data: Dict[str, Any]) -> SkinConfig:
"""Build a SkinConfig from a raw dict (built-in or loaded from YAML)."""
# Start with default values as base for missing keys
default = _BUILTIN_SKINS["default"]
colors = dict(default.get("colors", {}))
colors.update(data.get("colors", {}))
spinner = dict(default.get("spinner", {}))
spinner.update(data.get("spinner", {}))
branding = dict(default.get("branding", {}))
branding.update(data.get("branding", {}))
return SkinConfig(
name=data.get("name", "unknown"),
description=data.get("description", ""),
colors=colors,
spinner=spinner,
branding=branding,
tool_prefix=data.get("tool_prefix", default.get("tool_prefix", "")),
banner_logo=data.get("banner_logo", ""),
banner_hero=data.get("banner_hero", ""),
)
def list_skins() -> List[Dict[str, str]]:
"""List all available skins (built-in + user-installed).
Returns list of {"name": ..., "description": ..., "source": "builtin"|"user"}.
"""
result = []
for name, data in _BUILTIN_SKINS.items():
result.append({
"name": name,
"description": data.get("description", ""),
"source": "builtin",
})
skins_path = _skins_dir()
if skins_path.is_dir():
for f in sorted(skins_path.glob("*.yaml")):
data = _load_skin_from_yaml(f)
if data:
skin_name = data.get("name", f.stem)
# Skip if it shadows a built-in
if any(s["name"] == skin_name for s in result):
continue
result.append({
"name": skin_name,
"description": data.get("description", ""),
"source": "user",
})
return result
def load_skin(name: str) -> SkinConfig:
"""Load a skin by name. Checks user skins first, then built-in."""
# Check user skins directory
skins_path = _skins_dir()
user_file = skins_path / f"{name}.yaml"
if user_file.is_file():
data = _load_skin_from_yaml(user_file)
if data:
return _build_skin_config(data)
# Check built-in skins
if name in _BUILTIN_SKINS:
return _build_skin_config(_BUILTIN_SKINS[name])
# Fallback to default
logger.warning("Skin '%s' not found, using default", name)
return _build_skin_config(_BUILTIN_SKINS["default"])
def get_active_skin() -> SkinConfig:
"""Get the currently active skin config (cached)."""
global _active_skin
if _active_skin is None:
_active_skin = load_skin(_active_skin_name)
return _active_skin
def set_active_skin(name: str) -> SkinConfig:
"""Switch the active skin. Returns the new SkinConfig."""
global _active_skin, _active_skin_name
_active_skin_name = name
_active_skin = load_skin(name)
return _active_skin
def get_active_skin_name() -> str:
"""Get the name of the currently active skin."""
return _active_skin_name
def init_skin_from_config(config: dict) -> None:
"""Initialize the active skin from CLI config at startup.
Call this once during CLI init with the loaded config dict.
"""
display = config.get("display", {})
skin_name = display.get("skin", "default")
if isinstance(skin_name, str) and skin_name.strip():
set_active_skin(skin_name.strip())
else:
set_active_skin("default")

View File

@@ -263,7 +263,7 @@ def show_status(args):
if jobs_file.exists():
import json
try:
with open(jobs_file) as f:
with open(jobs_file, encoding="utf-8") as f:
data = json.load(f)
jobs = data.get("jobs", [])
enabled_jobs = [j for j in jobs if j.get("enabled", True)]
@@ -283,7 +283,7 @@ def show_status(args):
if sessions_file.exists():
import json
try:
with open(sessions_file) as f:
with open(sessions_file, encoding="utf-8") as f:
data = json.load(f)
print(f" Active: {len(data)} session(s)")
except Exception:

View File

@@ -11,7 +11,7 @@ the `platform_toolsets` key.
import sys
from pathlib import Path
from typing import Dict, List, Set
from typing import Dict, List, Optional, Set
import os
@@ -308,6 +308,22 @@ def _get_enabled_platforms() -> List[str]:
return enabled
def _platform_toolset_summary(config: dict, platforms: Optional[List[str]] = None) -> Dict[str, Set[str]]:
"""Return a summary of enabled toolsets per platform.
When ``platforms`` is None, this uses ``_get_enabled_platforms`` to
auto-detect platforms. Tests can pass an explicit list to avoid relying
on environment variables.
"""
if platforms is None:
platforms = _get_enabled_platforms()
summary: Dict[str, Set[str]] = {}
for pkey in platforms:
summary[pkey] = _get_platform_tools(config, pkey)
return summary
def _get_platform_tools(config: dict, platform: str) -> Set[str]:
"""Resolve which individual toolset names are enabled for a platform."""
from toolsets import resolve_toolset, TOOLSETS
@@ -447,6 +463,7 @@ def _prompt_choice(question: str, choices: list, default: int = 0) -> int:
def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str]:
"""Multi-select checklist of toolsets. Returns set of selected toolset keys."""
from hermes_cli.curses_ui import curses_checklist
labels = []
for ts_key, ts_label, ts_desc in CONFIGURABLE_TOOLSETS:
@@ -455,112 +472,18 @@ def _prompt_toolset_checklist(platform_label: str, enabled: Set[str]) -> Set[str
suffix = " [no API key]"
labels.append(f"{ts_label} ({ts_desc}){suffix}")
pre_selected_indices = [
pre_selected = {
i for i, (ts_key, _, _) in enumerate(CONFIGURABLE_TOOLSETS)
if ts_key in enabled
]
}
# Curses-based multi-select — arrow keys + space to toggle + enter to confirm.
# simple_term_menu has rendering bugs in tmux, iTerm, and other terminals.
try:
import curses
selected = set(pre_selected_indices)
result_holder = [None]
def _curses_checklist(stdscr):
curses.curs_set(0)
if curses.has_colors():
curses.start_color()
curses.use_default_colors()
curses.init_pair(1, curses.COLOR_GREEN, -1)
curses.init_pair(2, curses.COLOR_YELLOW, -1)
curses.init_pair(3, 8, -1) # dim gray
cursor = 0
scroll_offset = 0
while True:
stdscr.clear()
max_y, max_x = stdscr.getmaxyx()
header = f"Tools for {platform_label} — ↑↓ navigate, SPACE toggle, ENTER confirm"
try:
stdscr.addnstr(0, 0, header, max_x - 1, curses.A_BOLD | curses.color_pair(2) if curses.has_colors() else curses.A_BOLD)
except curses.error:
pass
visible_rows = max_y - 3
if cursor < scroll_offset:
scroll_offset = cursor
elif cursor >= scroll_offset + visible_rows:
scroll_offset = cursor - visible_rows + 1
for draw_i, i in enumerate(range(scroll_offset, min(len(labels), scroll_offset + visible_rows))):
y = draw_i + 2
if y >= max_y - 1:
break
check = "" if i in selected else " "
arrow = "" if i == cursor else " "
line = f" {arrow} [{check}] {labels[i]}"
attr = curses.A_NORMAL
if i == cursor:
attr = curses.A_BOLD
if curses.has_colors():
attr |= curses.color_pair(1)
try:
stdscr.addnstr(y, 0, line, max_x - 1, attr)
except curses.error:
pass
stdscr.refresh()
key = stdscr.getch()
if key in (curses.KEY_UP, ord('k')):
cursor = (cursor - 1) % len(labels)
elif key in (curses.KEY_DOWN, ord('j')):
cursor = (cursor + 1) % len(labels)
elif key == ord(' '):
if cursor in selected:
selected.discard(cursor)
else:
selected.add(cursor)
elif key in (curses.KEY_ENTER, 10, 13):
result_holder[0] = {CONFIGURABLE_TOOLSETS[i][0] for i in selected}
return
elif key in (27, ord('q')): # ESC or q
result_holder[0] = enabled
return
curses.wrapper(_curses_checklist)
return result_holder[0] if result_holder[0] is not None else enabled
except Exception:
pass # fall through to numbered toggle
# Final fallback: numbered toggle (Windows without curses, etc.)
selected = set(pre_selected_indices)
print(color(f"\n Tools for {platform_label}", Colors.YELLOW))
print(color(" Toggle by number, Enter to confirm.\n", Colors.DIM))
while True:
for i, label in enumerate(labels):
marker = color("[✓]", Colors.GREEN) if i in selected else "[ ]"
print(f" {marker} {i + 1:>2}. {label}")
print()
try:
val = input(color(" Toggle # (or Enter to confirm): ", Colors.DIM)).strip()
if not val:
break
idx = int(val) - 1
if 0 <= idx < len(labels):
if idx in selected:
selected.discard(idx)
else:
selected.add(idx)
except (ValueError, KeyboardInterrupt, EOFError):
return enabled
print()
return {CONFIGURABLE_TOOLSETS[i][0] for i in selected}
chosen = curses_checklist(
f"Tools for {platform_label}",
labels,
pre_selected,
cancel_returns=pre_selected,
)
return {CONFIGURABLE_TOOLSETS[i][0] for i in chosen}
# ─── Provider-Aware Configuration ────────────────────────────────────────────
@@ -874,6 +797,26 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
enabled_platforms = _get_enabled_platforms()
print()
# Non-interactive summary mode for CLI usage
if getattr(args, "summary", False):
total = len(CONFIGURABLE_TOOLSETS)
print(color("⚕ Tool Summary", Colors.CYAN, Colors.BOLD))
print()
summary = _platform_toolset_summary(config, enabled_platforms)
for pkey in enabled_platforms:
pinfo = PLATFORMS[pkey]
enabled = summary.get(pkey, set())
count = len(enabled)
print(color(f" {pinfo['label']}", Colors.BOLD) + color(f" ({count}/{total})", Colors.DIM))
if enabled:
for ts_key in sorted(enabled):
label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts_key), ts_key)
print(color(f"{label}", Colors.GREEN))
else:
print(color(" (none enabled)", Colors.DIM))
print()
return
print(color("⚕ Hermes Tool Configuration", Colors.CYAN, Colors.BOLD))
print(color(" Enable or disable tools per platform.", Colors.DIM))
print(color(" Tools that need API keys will be configured when enabled.", Colors.DIM))
@@ -941,22 +884,68 @@ def tools_command(args=None, first_install: bool = False, config: dict = None):
platform_choices.append(f"Configure {pinfo['label']} ({count}/{total} enabled)")
platform_keys.append(pkey)
if len(platform_keys) > 1:
platform_choices.append("Configure all platforms (global)")
platform_choices.append("Reconfigure an existing tool's provider or API key")
platform_choices.append("Done")
# Index offsets for the extra options after per-platform entries
_global_idx = len(platform_keys) if len(platform_keys) > 1 else -1
_reconfig_idx = len(platform_keys) + (1 if len(platform_keys) > 1 else 0)
_done_idx = _reconfig_idx + 1
while True:
idx = _prompt_choice("Select an option:", platform_choices, default=0)
# "Done" selected
if idx == len(platform_keys) + 1:
if idx == _done_idx:
break
# "Reconfigure" selected
if idx == len(platform_keys):
if idx == _reconfig_idx:
_reconfigure_tool(config)
print()
continue
# "Configure all platforms (global)" selected
if idx == _global_idx:
# Use the union of all platforms' current tools as the starting state
all_current = set()
for pk in platform_keys:
all_current |= _get_platform_tools(config, pk)
new_enabled = _prompt_toolset_checklist("All platforms", all_current)
if new_enabled != all_current:
for pk in platform_keys:
prev = _get_platform_tools(config, pk)
added = new_enabled - prev
removed = prev - new_enabled
pinfo_inner = PLATFORMS[pk]
if added or removed:
print(color(f" {pinfo_inner['label']}:", Colors.DIM))
for ts in sorted(added):
label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts), ts)
print(color(f" + {label}", Colors.GREEN))
for ts in sorted(removed):
label = next((l for k, l, _ in CONFIGURABLE_TOOLSETS if k == ts), ts)
print(color(f" - {label}", Colors.RED))
# Configure API keys for newly enabled tools
for ts_key in sorted(added):
if (TOOL_CATEGORIES.get(ts_key) or TOOLSET_ENV_REQUIREMENTS.get(ts_key)):
if not _toolset_has_keys(ts_key):
_configure_toolset(ts_key, config)
_save_platform_tools(config, pk, new_enabled)
save_config(config)
print(color(" ✓ Saved configuration for all platforms", Colors.GREEN))
# Update choice labels
for ci, pk in enumerate(platform_keys):
new_count = len(_get_platform_tools(config, pk))
total = len(CONFIGURABLE_TOOLSETS)
platform_choices[ci] = f"Configure {PLATFORMS[pk]['label']} ({new_count}/{total} enabled)"
else:
print(color(" No changes", Colors.DIM))
print()
continue
pkey = platform_keys[idx]
pinfo = PLATFORMS[pkey]

View File

@@ -7,3 +7,6 @@ without risk of circular imports.
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"
OPENROUTER_MODELS_URL = f"{OPENROUTER_BASE_URL}/models"
OPENROUTER_CHAT_URL = f"{OPENROUTER_BASE_URL}/chat/completions"
NOUS_API_BASE_URL = "https://inference-api.nousresearch.com/v1"
NOUS_API_CHAT_URL = f"{NOUS_API_BASE_URL}/chat/completions"

View File

@@ -16,6 +16,7 @@ Key design decisions:
import json
import os
import re
import sqlite3
import time
from pathlib import Path
@@ -490,12 +491,16 @@ class SessionDB:
msg_id = cursor.lastrowid
# Update counters
is_tool_related = role == "tool" or tool_calls is not None
if is_tool_related:
# Count actual tool calls from the tool_calls list (not from tool responses).
# A single assistant message can contain multiple parallel tool calls.
num_tool_calls = 0
if tool_calls is not None:
num_tool_calls = len(tool_calls) if isinstance(tool_calls, list) else 1
if num_tool_calls > 0:
self._conn.execute(
"""UPDATE sessions SET message_count = message_count + 1,
tool_call_count = tool_call_count + 1 WHERE id = ?""",
(session_id,),
tool_call_count = tool_call_count + ? WHERE id = ?""",
(num_tool_calls, session_id),
)
else:
self._conn.execute(
@@ -553,6 +558,32 @@ class SessionDB:
# Search
# =========================================================================
@staticmethod
def _sanitize_fts5_query(query: str) -> str:
"""Sanitize user input for safe use in FTS5 MATCH queries.
FTS5 has its own query syntax where characters like ``"``, ``(``, ``)``,
``+``, ``*``, ``{``, ``}`` and bare boolean operators (``AND``, ``OR``,
``NOT``) have special meaning. Passing raw user input directly to
MATCH can cause ``sqlite3.OperationalError``.
Strategy: strip characters that are only meaningful as FTS5 operators
and would otherwise cause syntax errors. This preserves normal keyword
search while preventing crashes on inputs like ``C++``, ``"unterminated``,
or ``hello AND``.
"""
# Remove FTS5-special characters that are not useful in keyword search
sanitized = re.sub(r'[+{}()"^]', " ", query)
# Collapse repeated * (e.g. "***") into a single one, and remove
# leading * (prefix-only matching requires at least one char before *)
sanitized = re.sub(r"\*+", "*", sanitized)
sanitized = re.sub(r"(^|\s)\*", r"\1", sanitized)
# Remove dangling boolean operators at start/end that would cause
# syntax errors (e.g. "hello AND" or "OR world")
sanitized = re.sub(r"(?i)^(AND|OR|NOT)\b\s*", "", sanitized.strip())
sanitized = re.sub(r"(?i)\s+(AND|OR|NOT)\s*$", "", sanitized.strip())
return sanitized.strip()
def search_messages(
self,
query: str,
@@ -576,6 +607,10 @@ class SessionDB:
if not query or not query.strip():
return []
query = self._sanitize_fts5_query(query)
if not query:
return []
if source_filter is None:
source_filter = ["cli", "telegram", "discord", "whatsapp", "slack"]
@@ -615,7 +650,11 @@ class SessionDB:
LIMIT ? OFFSET ?
"""
cursor = self._conn.execute(sql, params)
try:
cursor = self._conn.execute(sql, params)
except sqlite3.OperationalError:
# FTS5 query syntax error despite sanitization — return empty
return []
matches = [dict(row) for row in cursor.fetchall()]
# Add surrounding context (1 message before + after each match)

Binary file not shown.

After

Width:  |  Height:  |  Size: 28 KiB

Binary file not shown.

After

Width:  |  Height:  |  Size: 870 B

Binary file not shown.

After

Width:  |  Height:  |  Size: 2.5 KiB

BIN
landingpage/favicon.ico Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 7.9 KiB

BIN
landingpage/icon-192.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 29 KiB

BIN
landingpage/icon-512.png Normal file

Binary file not shown.

After

Width:  |  Height:  |  Size: 134 KiB

View File

@@ -19,7 +19,10 @@
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=JetBrains+Mono:wght@400;500&display=swap" rel="stylesheet">
<link rel="stylesheet" href="style.css">
<link rel="icon" href="data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 100 100'><text y='.9em' font-size='90'>⚕</text></svg>">
<link rel="icon" type="image/x-icon" href="favicon.ico">
<link rel="icon" type="image/png" sizes="32x32" href="favicon-32x32.png">
<link rel="icon" type="image/png" sizes="16x16" href="favicon-16x16.png">
<link rel="apple-touch-icon" sizes="180x180" href="apple-touch-icon.png">
</head>
<body>
<!-- Ambient glow effects -->

View File

@@ -266,6 +266,7 @@ def handle_function_call(
function_args: Dict[str, Any],
task_id: Optional[str] = None,
user_task: Optional[str] = None,
enabled_tools: Optional[List[str]] = None,
) -> str:
"""
Main function call dispatcher that routes calls to the tool registry.
@@ -275,19 +276,36 @@ def handle_function_call(
function_args: Arguments for the function.
task_id: Unique identifier for terminal/browser session isolation.
user_task: The user's original task (for browser_snapshot context).
enabled_tools: Tool names enabled for this session. When provided,
execute_code uses this list to determine which sandbox
tools to generate. Falls back to the process-global
``_last_resolved_tool_names`` for backward compat.
Returns:
Function result as a JSON string.
"""
# Notify the read-loop tracker when a non-read/search tool runs,
# so the *consecutive* counter resets (reads after other work are fine).
_READ_SEARCH_TOOLS = {"read_file", "search_files"}
if function_name not in _READ_SEARCH_TOOLS:
try:
from tools.file_tools import notify_other_tool_call
notify_other_tool_call(task_id or "default")
except Exception:
pass # file_tools may not be loaded yet
try:
if function_name in _AGENT_LOOP_TOOLS:
return json.dumps({"error": f"{function_name} must be handled by the agent loop"})
if function_name == "execute_code":
# Prefer the caller-provided list so subagents can't overwrite
# the parent's tool set via the process-global.
sandbox_enabled = enabled_tools if enabled_tools is not None else _last_resolved_tool_names
return registry.dispatch(
function_name, function_args,
task_id=task_id,
enabled_tools=_last_resolved_tool_names,
enabled_tools=sandbox_enabled,
)
return registry.dispatch(

View File

@@ -0,0 +1,2 @@
Optional migration workflows for importing user state and customizations from
other agent systems into Hermes Agent.

View File

@@ -0,0 +1,281 @@
---
name: openclaw-migration
description: Migrate a user's OpenClaw customization footprint into Hermes Agent. Imports Hermes-compatible memories, SOUL.md, command allowlists, user skills, and selected workspace assets from ~/.openclaw, then reports exactly what could not be migrated and why.
version: 1.0.0
author: Hermes Agent (Nous Research)
license: MIT
metadata:
hermes:
tags: [Migration, OpenClaw, Hermes, Memory, Persona, Import]
related_skills: [hermes-agent]
---
# OpenClaw -> Hermes Migration
Use this skill when a user wants to move their OpenClaw setup into Hermes Agent with minimal manual cleanup.
## What this skill does
It uses `scripts/openclaw_to_hermes.py` to:
- import `SOUL.md` into the Hermes home directory as `SOUL.md`
- transform OpenClaw `MEMORY.md` and `USER.md` into Hermes memory entries
- merge OpenClaw command approval patterns into Hermes `command_allowlist`
- migrate Hermes-compatible messaging settings such as `TELEGRAM_ALLOWED_USERS` and `MESSAGING_CWD`
- copy OpenClaw skills into `~/.hermes/skills/openclaw-imports/`
- optionally copy the OpenClaw workspace instructions file into a chosen Hermes workspace
- mirror compatible workspace assets such as `workspace/tts/` into `~/.hermes/tts/`
- archive non-secret docs that do not have a direct Hermes destination
- produce a structured report listing migrated items, conflicts, skipped items, and reasons
## Path resolution
The helper script lives in this skill directory at:
- `scripts/openclaw_to_hermes.py`
When this skill is installed from the Skills Hub, the normal location is:
- `~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py`
Do not guess a shorter path like `~/.hermes/skills/openclaw-migration/...`.
Before running the helper:
1. Prefer the installed path under `~/.hermes/skills/migration/openclaw-migration/`.
2. If that path fails, inspect the installed skill directory and resolve the script relative to the installed `SKILL.md`.
3. Only use `find` as a fallback if the installed location is missing or the skill was moved manually.
4. When calling the terminal tool, do not pass `workdir: "~"`. Use an absolute directory such as the user's home directory, or omit `workdir` entirely.
With `--migrate-secrets`, it will also import a small allowlisted set of Hermes-compatible secrets, currently:
- `TELEGRAM_BOT_TOKEN`
## Default workflow
1. Inspect first with a dry run.
2. Present a simple summary of what can be migrated, what cannot be migrated, and what would be archived.
3. If the `clarify` tool is available, use it for user decisions instead of asking for a free-form prose reply.
4. If the dry run finds imported skill directory conflicts, ask how those should be handled before executing.
5. Ask the user to choose between the two supported migration modes before executing.
6. Ask for a target workspace path only if the user wants the workspace instructions file brought over.
7. Execute the migration with the matching preset and flags.
8. Summarize the results, especially:
- what was migrated
- what was archived for manual review
- what was skipped and why
## User interaction protocol
Hermes CLI supports the `clarify` tool for interactive prompts, but it is limited to:
- one choice at a time
- up to 4 predefined choices
- an automatic `Other` free-text option
It does **not** support true multi-select checkboxes in a single prompt.
For every `clarify` call:
- always include a non-empty `question`
- include `choices` only for real selectable prompts
- keep `choices` to 2-4 plain string options
- never emit placeholder or truncated options such as `...`
- never pad or stylize choices with extra whitespace
- never include fake form fields in the question such as `enter directory here`, blank lines to fill in, or underscores like `_____`
- for open-ended path questions, ask only the plain sentence; the user types in the normal CLI prompt below the panel
If a `clarify` call returns an error, inspect the error text, correct the payload, and retry once with a valid `question` and clean choices.
When `clarify` is available and the dry run reveals any required user decision, your **next action must be a `clarify` tool call**.
Do not end the turn with a normal assistant message such as:
- "Let me present the choices"
- "What would you like to do?"
- "Here are the options"
If a user decision is required, collect it via `clarify` before producing more prose.
If multiple unresolved decisions remain, do not insert an explanatory assistant message between them. After one `clarify` response is received, your next action should usually be the next required `clarify` call.
Treat `workspace-agents` as an unresolved decision whenever the dry run reports:
- `kind="workspace-agents"`
- `status="skipped"`
- reason containing `No workspace target was provided`
In that case, you must ask about workspace instructions before execution. Do not silently treat that as a decision to skip.
Because of that limitation, use this simplified decision flow:
1. For `SOUL.md` conflicts, use `clarify` with choices such as:
- `keep existing`
- `overwrite with backup`
- `review first`
2. If the dry run shows one or more `kind="skill"` items with `status="conflict"`, use `clarify` with choices such as:
- `keep existing skills`
- `overwrite conflicting skills with backup`
- `import conflicting skills under renamed folders`
3. For workspace instructions, use `clarify` with choices such as:
- `skip workspace instructions`
- `copy to a workspace path`
- `decide later`
4. If the user chooses to copy workspace instructions, ask a follow-up open-ended `clarify` question requesting an **absolute path**.
5. If the user chooses `skip workspace instructions` or `decide later`, proceed without `--workspace-target`.
5. For migration mode, use `clarify` with these 3 choices:
- `user-data only`
- `full compatible migration`
- `cancel`
6. `user-data only` means: migrate user data and compatible config, but do **not** import allowlisted secrets.
7. `full compatible migration` means: migrate the same compatible user data plus the allowlisted secrets when present.
8. If `clarify` is not available, ask the same question in normal text, but still constrain the answer to `user-data only`, `full compatible migration`, or `cancel`.
Execution gate:
- Do not execute while a `workspace-agents` skip caused by `No workspace target was provided` remains unresolved.
- The only valid ways to resolve it are:
- user explicitly chooses `skip workspace instructions`
- user explicitly chooses `decide later`
- user provides a workspace path after choosing `copy to a workspace path`
- Absence of a workspace target in the dry run is not itself permission to execute.
- Do not execute while any required `clarify` decision remains unresolved.
Use these exact `clarify` payload shapes as the default pattern:
- `{"question":"Your existing SOUL.md conflicts with the imported one. What should I do?","choices":["keep existing","overwrite with backup","review first"]}`
- `{"question":"One or more imported OpenClaw skills already exist in Hermes. How should I handle those skill conflicts?","choices":["keep existing skills","overwrite conflicting skills with backup","import conflicting skills under renamed folders"]}`
- `{"question":"Choose migration mode: migrate only user data, or run the full compatible migration including allowlisted secrets?","choices":["user-data only","full compatible migration","cancel"]}`
- `{"question":"Do you want to copy the OpenClaw workspace instructions file into a Hermes workspace?","choices":["skip workspace instructions","copy to a workspace path","decide later"]}`
- `{"question":"Please provide an absolute path where the workspace instructions should be copied."}`
## Decision-to-command mapping
Map user decisions to command flags exactly:
- If the user chooses `keep existing` for `SOUL.md`, do **not** add `--overwrite`.
- If the user chooses `overwrite with backup`, add `--overwrite`.
- If the user chooses `review first`, stop before execution and review the relevant files.
- If the user chooses `keep existing skills`, add `--skill-conflict skip`.
- If the user chooses `overwrite conflicting skills with backup`, add `--skill-conflict overwrite`.
- If the user chooses `import conflicting skills under renamed folders`, add `--skill-conflict rename`.
- If the user chooses `user-data only`, execute with `--preset user-data` and do **not** add `--migrate-secrets`.
- If the user chooses `full compatible migration`, execute with `--preset full --migrate-secrets`.
- Only add `--workspace-target` if the user explicitly provided an absolute workspace path.
- If the user chooses `skip workspace instructions` or `decide later`, do not add `--workspace-target`.
Before executing, restate the exact command plan in plain language and make sure it matches the user's choices.
## Post-run reporting rules
After execution, treat the script's JSON output as the source of truth.
1. Base all counts on `report.summary`.
2. Only list an item under "Successfully Migrated" if its `status` is exactly `migrated`.
3. Do not claim a conflict was resolved unless the report shows that item as `migrated`.
4. Do not say `SOUL.md` was overwritten unless the report item for `kind="soul"` has `status="migrated"`.
5. If `report.summary.conflict > 0`, include a conflict section instead of silently implying success.
6. If counts and listed items disagree, fix the list to match the report before responding.
7. Include the `output_dir` path from the report when available so the user can inspect `report.json`, `summary.md`, backups, and archived files.
8. For memory or user-profile overflow, do not say the entries were archived unless the report explicitly shows an archive path. If `details.overflow_file` exists, say the full overflow list was exported there.
9. If a skill was imported under a renamed folder, report the final destination and mention `details.renamed_from`.
10. If `report.skill_conflict_mode` is present, use it as the source of truth for the selected imported-skill conflict policy.
11. If an item has `status="skipped"`, do not describe it as overwritten, backed up, migrated, or resolved.
12. If `kind="soul"` has `status="skipped"` with reason `Target already matches source`, say it was left unchanged and do not mention a backup.
13. If a renamed imported skill has an empty `details.backup`, do not imply the existing Hermes skill was renamed or backed up. Say only that the imported copy was placed in the new destination and reference `details.renamed_from` as the pre-existing folder that remained in place.
## Migration presets
Prefer these two presets in normal use:
- `user-data`
- `full`
`user-data` includes:
- `soul`
- `workspace-agents`
- `memory`
- `user-profile`
- `messaging-settings`
- `command-allowlist`
- `skills`
- `tts-assets`
- `archive`
`full` includes everything in `user-data` plus:
- `secret-settings`
The helper script still supports category-level `--include` / `--exclude`, but treat that as an advanced fallback rather than the default UX.
## Commands
Dry run with full discovery:
```bash
python3 ~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py
```
When using the terminal tool, prefer an absolute invocation pattern such as:
```json
{"command":"python3 /home/USER/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py","workdir":"/home/USER"}
```
Dry run with the user-data preset:
```bash
python3 ~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py --preset user-data
```
Execute a user-data migration:
```bash
python3 ~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py --execute --preset user-data --skill-conflict skip
```
Execute a full compatible migration:
```bash
python3 ~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py --execute --preset full --migrate-secrets --skill-conflict skip
```
Execute with workspace instructions included:
```bash
python3 ~/.hermes/skills/migration/openclaw-migration/scripts/openclaw_to_hermes.py --execute --preset user-data --skill-conflict rename --workspace-target "/absolute/workspace/path"
```
Do not use `$PWD` or the home directory as the workspace target by default. Ask for an explicit workspace path first.
## Important rules
1. Run a dry run before writing unless the user explicitly says to proceed immediately.
2. Do not migrate secrets by default. Tokens, auth blobs, device credentials, and raw gateway config should stay out of Hermes unless the user explicitly asks for secret migration.
3. Do not silently overwrite non-empty Hermes targets unless the user explicitly wants that. The helper script will preserve backups when overwriting is enabled.
4. Always give the user the skipped-items report. That report is part of the migration, not an optional extra.
5. Prefer the primary OpenClaw workspace (`~/.openclaw/workspace/`) over `workspace.default/`. Only use the default workspace as fallback when the primary files are missing.
6. Even in secret-migration mode, only migrate secrets with a clean Hermes destination. Unsupported auth blobs must still be reported as skipped.
7. If the dry run shows a large asset copy, a conflicting `SOUL.md`, or overflowed memory entries, call those out separately before execution.
8. Default to `user-data only` if the user is unsure.
9. Only include `workspace-agents` when the user has explicitly provided a destination workspace path.
10. Treat category-level `--include` / `--exclude` as an advanced escape hatch, not the normal flow.
11. Do not end the dry-run summary with a vague “What would you like to do?” if `clarify` is available. Use structured follow-up prompts instead.
12. Do not use an open-ended `clarify` prompt when a real choice prompt would work. Prefer selectable choices first, then free text only for absolute paths or file review requests.
13. After a dry run, never stop after summarizing if there is still an unresolved decision. Use `clarify` immediately for the highest-priority blocking decision.
14. Priority order for follow-up questions:
- `SOUL.md` conflict
- imported skill conflicts
- migration mode
- workspace instructions destination
15. Do not promise to present choices later in the same message. Present them by actually calling `clarify`.
16. After the migration-mode answer, explicitly check whether `workspace-agents` is still unresolved. If it is, your next action must be the workspace-instructions `clarify` call.
17. After any `clarify` answer, if another required decision remains, do not narrate what was just decided. Ask the next required question immediately.
## Expected result
After a successful run, the user should have:
- Hermes persona state imported
- Hermes memory files populated with converted OpenClaw knowledge
- OpenClaw skills available under `~/.hermes/skills/openclaw-imports/`
- a migration report showing any conflicts, omissions, or unsupported data

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,218 @@
# Checkpoint & Rollback — Implementation Plan
## Goal
Automatic filesystem snapshots before destructive file operations, with user-facing rollback. The agent never sees or interacts with this — it's transparent infrastructure.
## Design Principles
1. **Not a tool** — the LLM never knows about it. Zero prompt tokens, zero tool schema overhead.
2. **Once per turn** — checkpoint at most once per conversation turn (user message → agent response cycle), triggered lazily on the first file-mutating operation. Not on every write.
3. **Opt-in via config** — disabled by default, enabled with `checkpoints: true` in config.yaml.
4. **Works on any directory** — uses a shadow git repo completely separate from the user's project git. Works on git repos, non-git directories, anything.
5. **User-facing rollback**`/rollback` slash command (CLI + gateway) to list and restore checkpoints. Also `hermes rollback` CLI subcommand.
## Architecture
```
~/.hermes/checkpoints/
{sha256(abs_dir)[:16]}/ # Shadow git repo per working directory
HEAD, refs/, objects/... # Standard git internals
HERMES_WORKDIR # Original dir path (for display)
info/exclude # Default excludes (node_modules, .env, etc.)
```
### Core: CheckpointManager (new file: tools/checkpoint_manager.py)
Adapted from PR #559's CheckpointStore. Key changes from the PR:
- **Not a tool** — no schema, no registry entry, no handler
- **Turn-scoped deduplication** — tracks `_checkpointed_dirs: Set[str]` per turn
- **Configurable** — reads `checkpoints` config key
- **Pruning** — keeps last N snapshots per directory (default 50), prunes on take
```python
class CheckpointManager:
def __init__(self, enabled: bool = False, max_snapshots: int = 50):
self.enabled = enabled
self.max_snapshots = max_snapshots
self._checkpointed_dirs: Set[str] = set() # reset each turn
def new_turn(self):
"""Call at start of each conversation turn to reset dedup."""
self._checkpointed_dirs.clear()
def ensure_checkpoint(self, working_dir: str, reason: str = "auto") -> None:
"""Take a checkpoint if enabled and not already done this turn."""
if not self.enabled:
return
abs_dir = str(Path(working_dir).resolve())
if abs_dir in self._checkpointed_dirs:
return
self._checkpointed_dirs.add(abs_dir)
try:
self._take(abs_dir, reason)
except Exception as e:
logger.debug("Checkpoint failed (non-fatal): %s", e)
def list_checkpoints(self, working_dir: str) -> List[dict]:
"""List available checkpoints for a directory."""
...
def restore(self, working_dir: str, commit_hash: str) -> dict:
"""Restore files to a checkpoint state."""
...
def _take(self, working_dir: str, reason: str):
"""Shadow git: add -A + commit. Prune if over max_snapshots."""
...
def _prune(self, shadow_repo: Path):
"""Keep only last max_snapshots commits."""
...
```
### Integration Point: run_agent.py
The AIAgent already owns the conversation loop. Add CheckpointManager as an instance attribute:
```python
class AIAgent:
def __init__(self, ...):
...
# Checkpoint manager — reads config to determine if enabled
self._checkpoint_mgr = CheckpointManager(
enabled=config.get("checkpoints", False),
max_snapshots=config.get("checkpoint_max_snapshots", 50),
)
```
**Turn boundary** — in `run_conversation()`, call `new_turn()` at the start of each agent iteration (before processing tool calls):
```python
# Inside the main loop, before _execute_tool_calls():
self._checkpoint_mgr.new_turn()
```
**Trigger point** — in `_execute_tool_calls()`, before dispatching file-mutating tools:
```python
# Before the handle_function_call dispatch:
if function_name in ("write_file", "patch"):
# Determine working dir from the file path in the args
file_path = function_args.get("path", "") or function_args.get("old_string", "")
if file_path:
work_dir = str(Path(file_path).parent.resolve())
self._checkpoint_mgr.ensure_checkpoint(work_dir, f"before {function_name}")
```
This means:
- First `write_file` in a turn → checkpoint (fast, one `git add -A && git commit`)
- Subsequent writes in the same turn → no-op (already checkpointed)
- Next turn (new user message) → fresh checkpoint eligibility
### Config
Add to `DEFAULT_CONFIG` in `hermes_cli/config.py`:
```python
"checkpoints": False, # Enable filesystem checkpoints before destructive ops
"checkpoint_max_snapshots": 50, # Max snapshots to keep per directory
```
User enables with:
```yaml
# ~/.hermes/config.yaml
checkpoints: true
```
### User-Facing Rollback
**CLI slash command** — add `/rollback` to `process_command()` in `cli.py`:
```
/rollback — List recent checkpoints for the current directory
/rollback <hash> — Restore files to that checkpoint
```
Shows a numbered list:
```
📸 Checkpoints for /home/user/project:
1. abc1234 2026-03-09 21:15 before write_file (3 files changed)
2. def5678 2026-03-09 20:42 before patch (1 file changed)
3. ghi9012 2026-03-09 20:30 before write_file (2 files changed)
Use /rollback <number> to restore, e.g. /rollback 1
```
**Gateway slash command** — add `/rollback` to gateway/run.py with the same behavior.
**CLI subcommand**`hermes rollback` (optional, lower priority).
### What Gets Excluded (not checkpointed)
Same as the PR's defaults — written to the shadow repo's `info/exclude`:
```
node_modules/
dist/
build/
.env
.env.*
__pycache__/
*.pyc
.DS_Store
*.log
.cache/
.venv/
.git/
```
Also respects the project's `.gitignore` if present (shadow repo can read it via `core.excludesFile`).
### Safety
- `ensure_checkpoint()` wraps everything in try/except — a checkpoint failure never blocks the actual file operation
- Shadow repo is completely isolated — GIT_DIR + GIT_WORK_TREE env vars, never touches user's .git
- If git isn't installed, checkpoints silently disable
- Large directories: add a file count check — skip checkpoint if >50K files to avoid slowdowns
## Files to Create/Modify
| File | Change |
|------|--------|
| `tools/checkpoint_manager.py` | **NEW** — CheckpointManager class (adapted from PR #559) |
| `run_agent.py` | Add CheckpointManager init + trigger in `_execute_tool_calls()` |
| `hermes_cli/config.py` | Add `checkpoints` + `checkpoint_max_snapshots` to DEFAULT_CONFIG |
| `cli.py` | Add `/rollback` slash command handler |
| `gateway/run.py` | Add `/rollback` slash command handler |
| `tests/tools/test_checkpoint_manager.py` | **NEW** — tests (adapted from PR #559's tests) |
## What We Take From PR #559
- `_shadow_repo_path()` — deterministic path hashing ✅
- `_git_env()` — GIT_DIR/GIT_WORK_TREE isolation ✅
- `_run_git()` — subprocess wrapper with timeout ✅
- `_init_shadow_repo()` — shadow repo initialization ✅
- `DEFAULT_EXCLUDES` list ✅
- Test structure and patterns ✅
## What We Change From PR #559
- **Remove tool schema/registry** — not a tool
- **Remove injection into file_operations.py and patch_parser.py** — trigger from run_agent.py instead
- **Add turn-scoped deduplication** — one checkpoint per turn, not per operation
- **Add pruning** — keep last N snapshots
- **Add config flag** — opt-in, not mandatory
- **Add /rollback command** — user-facing restore UI
- **Add file count guard** — skip huge directories
## Implementation Order
1. `tools/checkpoint_manager.py` — core class with take/list/restore/prune
2. `tests/tools/test_checkpoint_manager.py` — tests
3. `hermes_cli/config.py` — config keys
4. `run_agent.py` — integration (init + trigger)
5. `cli.py``/rollback` slash command
6. `gateway/run.py``/rollback` slash command
7. Full test suite run + manual smoke test

View File

@@ -40,13 +40,16 @@ dependencies = [
[project.optional-dependencies]
modal = ["swe-rex[modal]>=1.4.0"]
daytona = ["daytona>=0.148.0"]
dev = ["pytest", "pytest-asyncio"]
dev = ["pytest", "pytest-asyncio", "mcp>=1.2.0"]
messaging = ["python-telegram-bot>=20.0", "discord.py>=2.0", "aiohttp>=3.9.0", "slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
cron = ["croniter"]
slack = ["slack-bolt>=1.18.0", "slack-sdk>=3.27.0"]
cli = ["simple-term-menu"]
tts-premium = ["elevenlabs"]
pty = ["ptyprocess>=0.7.0"]
pty = [
"ptyprocess>=0.7.0; sys_platform != 'win32'",
"pywinpty>=2.0.0; sys_platform == 'win32'",
]
honcho = ["honcho-ai>=2.0.1"]
mcp = ["mcp>=1.2.0"]
homeassistant = ["aiohttp>=3.9.0"]

View File

@@ -172,6 +172,7 @@ class AIAgent:
provider_data_collection: str = None,
session_id: str = None,
tool_progress_callback: callable = None,
thinking_callback: callable = None,
clarify_callback: callable = None,
step_callback: callable = None,
max_tokens: int = None,
@@ -184,6 +185,8 @@ class AIAgent:
honcho_session_key: str = None,
iteration_budget: "IterationBudget" = None,
fallback_model: Dict[str, Any] = None,
checkpoints_enabled: bool = False,
checkpoint_max_snapshots: int = 50,
):
"""
Initialize the AI Agent.
@@ -256,6 +259,7 @@ class AIAgent:
self.api_mode = "chat_completions"
self.tool_progress_callback = tool_progress_callback
self.thinking_callback = thinking_callback
self.clarify_callback = clarify_callback
self.step_callback = step_callback
self._last_reported_tool = None # Track for "new tool" mode
@@ -293,6 +297,13 @@ class AIAgent:
self._use_prompt_caching = is_openrouter and is_claude
self._cache_ttl = "5m" # Default 5-minute TTL (1.25x write cost)
# Iteration budget pressure: warn the LLM as it approaches max_iterations.
# Warnings are injected into the last tool result JSON (not as separate
# messages) so they don't break message structure or invalidate caching.
self._budget_caution_threshold = 0.7 # 70% — nudge to start wrapping up
self._budget_warning_threshold = 0.9 # 90% — urgent, respond now
self._budget_pressure_enabled = True
# Persistent error log -- always writes WARNING+ to ~/.hermes/logs/errors.log
# so tool failures, API errors, etc. are inspectable after the fact.
from agent.redact import RedactingFormatter
@@ -484,8 +495,16 @@ class AIAgent:
# Cached system prompt -- built once per session, only rebuilt on compression
self._cached_system_prompt: Optional[str] = None
# Filesystem checkpoint manager (transparent — not a tool)
from tools.checkpoint_manager import CheckpointManager
self._checkpoint_mgr = CheckpointManager(
enabled=checkpoints_enabled,
max_snapshots=checkpoint_max_snapshots,
)
# SQLite session store (optional -- provided by CLI or gateway)
self._session_db = session_db
self._last_flushed_db_idx = 0 # tracks DB-write cursor to prevent duplicate writes
if self._session_db:
try:
self._session_db.create_session(
@@ -791,45 +810,19 @@ class AIAgent:
self._save_session_log(messages)
self._flush_messages_to_session_db(messages, conversation_history)
def _log_msg_to_db(self, msg: Dict):
"""Log a single message to SQLite immediately. Called after each messages.append()."""
if not self._session_db:
return
try:
role = msg.get("role", "unknown")
content = msg.get("content")
tool_calls_data = None
if hasattr(msg, "tool_calls") and msg.tool_calls:
tool_calls_data = [
{"name": tc.function.name, "arguments": tc.function.arguments}
for tc in msg.tool_calls
]
elif isinstance(msg.get("tool_calls"), list):
tool_calls_data = msg["tool_calls"]
self._session_db.append_message(
session_id=self.session_id,
role=role,
content=content,
tool_name=msg.get("tool_name"),
tool_calls=tool_calls_data,
tool_call_id=msg.get("tool_call_id"),
finish_reason=msg.get("finish_reason"),
)
except Exception as e:
logger.debug("Session DB log_msg failed: %s", e)
def _flush_messages_to_session_db(self, messages: List[Dict], conversation_history: List[Dict] = None):
"""Persist any un-logged messages to the SQLite session store.
"""Persist any un-flushed messages to the SQLite session store.
Called both at the normal end of run_conversation and from every early-
return path so that tool calls, tool responses, and assistant messages
are never lost even when the conversation errors out.
Uses _last_flushed_db_idx to track which messages have already been
written, so repeated calls (from multiple exit paths) only write
truly new messages — preventing the duplicate-write bug (#860).
"""
if not self._session_db:
return
try:
start_idx = len(conversation_history) if conversation_history else 0
for msg in messages[start_idx:]:
flush_from = max(start_idx, self._last_flushed_db_idx)
for msg in messages[flush_from:]:
role = msg.get("role", "unknown")
content = msg.get("content")
tool_calls_data = None
@@ -849,6 +842,7 @@ class AIAgent:
tool_call_id=msg.get("tool_call_id"),
finish_reason=msg.get("finish_reason"),
)
self._last_flushed_db_idx = len(messages)
except Exception as e:
logger.debug("Session DB append_message failed: %s", e)
@@ -1431,6 +1425,34 @@ class AIAgent:
return "\n\n".join(prompt_parts)
def _repair_tool_call(self, tool_name: str) -> str | None:
"""Attempt to repair a mismatched tool name before aborting.
1. Try lowercase
2. Try normalized (lowercase + hyphens/spaces -> underscores)
3. Try fuzzy match (difflib, cutoff=0.7)
Returns the repaired name if found in valid_tool_names, else None.
"""
from difflib import get_close_matches
# 1. Lowercase
lowered = tool_name.lower()
if lowered in self.valid_tool_names:
return lowered
# 2. Normalize
normalized = lowered.replace("-", "_").replace(" ", "_")
if normalized in self.valid_tool_names:
return normalized
# 3. Fuzzy match
matches = get_close_matches(lowered, self.valid_tool_names, n=1, cutoff=0.7)
if matches:
return matches[0]
return None
def _invalidate_system_prompt(self):
"""
Invalidate the cached system prompt, forcing a rebuild on the next turn.
@@ -2318,7 +2340,10 @@ class AIAgent:
"instructions": instructions,
"input": self._chat_messages_to_responses_input(payload_messages),
"tools": self._responses_tools(),
"tool_choice": "auto",
"parallel_tool_calls": True,
"store": False,
"prompt_cache_key": self.session_id,
}
if reasoning_enabled:
@@ -2610,7 +2635,7 @@ class AIAgent:
if messages and messages[-1].get("_flush_sentinel") == _sentinel:
messages.pop()
def _compress_context(self, messages: list, system_message: str, *, approx_tokens: int = None) -> tuple:
def _compress_context(self, messages: list, system_message: str, *, approx_tokens: int = None, task_id: str = "default") -> tuple:
"""Compress conversation context and split the session in SQLite.
Returns:
@@ -2625,6 +2650,25 @@ class AIAgent:
if todo_snapshot:
compressed.append({"role": "user", "content": todo_snapshot})
# Preserve file-read history so the model doesn't re-read files
# it already examined before compression.
try:
from tools.file_tools import get_read_files_summary
read_files = get_read_files_summary(task_id)
if read_files:
file_list = "\n".join(
f" - {f['path']} ({', '.join(f['regions'])})"
for f in read_files
)
compressed.append({"role": "user", "content": (
"[Files already read in this session — do NOT re-read these]\n"
f"{file_list}\n"
"Use the information from the context summary above. "
"Proceed with writing, editing, or responding."
)})
except Exception:
pass # Don't break compression if file tracking fails
self._invalidate_system_prompt()
new_system_prompt = self._build_system_prompt(system_message)
self._cached_system_prompt = new_system_prompt
@@ -2650,12 +2694,14 @@ class AIAgent:
except (ValueError, Exception) as e:
logger.debug("Could not propagate title on compression: %s", e)
self._session_db.update_system_prompt(self.session_id, new_system_prompt)
# Reset flush cursor — new session starts with no messages written
self._last_flushed_db_idx = 0
except Exception as e:
logger.debug("Session DB compression split failed: %s", e)
return compressed, new_system_prompt
def _execute_tool_calls(self, assistant_message, messages: list, effective_task_id: str) -> None:
def _execute_tool_calls(self, assistant_message, messages: list, effective_task_id: str, api_call_count: int = 0) -> None:
"""Execute tool calls from the assistant message and append results to messages."""
for i, tool_call in enumerate(assistant_message.tool_calls, 1):
# SAFETY: check interrupt BEFORE starting each tool.
@@ -2673,7 +2719,6 @@ class AIAgent:
"tool_call_id": skipped_tc.id,
}
messages.append(skip_msg)
self._log_msg_to_db(skip_msg)
break
function_name = tool_call.function.name
@@ -2689,6 +2734,8 @@ class AIAgent:
except json.JSONDecodeError as e:
logging.warning(f"Unexpected JSON error after validation: {e}")
function_args = {}
if not isinstance(function_args, dict):
function_args = {}
if not self.quiet_mode:
args_str = json.dumps(function_args, ensure_ascii=False)
@@ -2702,6 +2749,18 @@ class AIAgent:
except Exception as cb_err:
logging.debug(f"Tool progress callback error: {cb_err}")
# Checkpoint: snapshot working dir before file-mutating tools
if function_name in ("write_file", "patch") and self._checkpoint_mgr.enabled:
try:
file_path = function_args.get("path", "")
if file_path:
work_dir = self._checkpoint_mgr.get_working_dir_for_path(file_path)
self._checkpoint_mgr.ensure_checkpoint(
work_dir, f"before {function_name}"
)
except Exception:
pass # never block tool execution
tool_start_time = time.time()
if function_name == "todo":
@@ -2814,7 +2873,10 @@ class AIAgent:
spinner.start()
_spinner_result = None
try:
function_result = handle_function_call(function_name, function_args, effective_task_id)
function_result = handle_function_call(
function_name, function_args, effective_task_id,
enabled_tools=list(self.valid_tool_names) if self.valid_tool_names else None,
)
_spinner_result = function_result
except Exception as tool_error:
function_result = f"Error executing tool '{function_name}': {tool_error}"
@@ -2825,7 +2887,10 @@ class AIAgent:
spinner.stop(cute_msg)
else:
try:
function_result = handle_function_call(function_name, function_args, effective_task_id)
function_result = handle_function_call(
function_name, function_args, effective_task_id,
enabled_tools=list(self.valid_tool_names) if self.valid_tool_names else None,
)
except Exception as tool_error:
function_result = f"Error executing tool '{function_name}': {tool_error}"
logger.error("handle_function_call raised for %s: %s", function_name, tool_error, exc_info=True)
@@ -2862,7 +2927,6 @@ class AIAgent:
"tool_call_id": tool_call.id
}
messages.append(tool_msg)
self._log_msg_to_db(tool_msg)
if not self.quiet_mode:
response_preview = function_result[:self.log_prefix_chars] + "..." if len(function_result) > self.log_prefix_chars else function_result
@@ -2879,12 +2943,56 @@ class AIAgent:
"tool_call_id": skipped_tc.id
}
messages.append(skip_msg)
self._log_msg_to_db(skip_msg)
break
if self.tool_delay > 0 and i < len(assistant_message.tool_calls):
time.sleep(self.tool_delay)
# ── Budget pressure injection ─────────────────────────────────
# After all tool calls in this turn are processed, check if we're
# approaching max_iterations. If so, inject a warning into the LAST
# tool result's JSON so the LLM sees it naturally when reading results.
budget_warning = self._get_budget_warning(api_call_count)
if budget_warning and messages and messages[-1].get("role") == "tool":
last_content = messages[-1]["content"]
try:
parsed = json.loads(last_content)
if isinstance(parsed, dict):
parsed["_budget_warning"] = budget_warning
messages[-1]["content"] = json.dumps(parsed, ensure_ascii=False)
else:
messages[-1]["content"] = last_content + f"\n\n{budget_warning}"
except (json.JSONDecodeError, TypeError):
messages[-1]["content"] = last_content + f"\n\n{budget_warning}"
if not self.quiet_mode:
remaining = self.max_iterations - api_call_count
tier = "⚠️ WARNING" if remaining <= self.max_iterations * 0.1 else "💡 CAUTION"
print(f"{self.log_prefix}{tier}: {remaining} iterations remaining")
def _get_budget_warning(self, api_call_count: int) -> Optional[str]:
"""Return a budget pressure string, or None if not yet needed.
Two-tier system:
- Caution (70%): nudge to consolidate work
- Warning (90%): urgent, must respond now
"""
if not self._budget_pressure_enabled or self.max_iterations <= 0:
return None
progress = api_call_count / self.max_iterations
remaining = self.max_iterations - api_call_count
if progress >= self._budget_warning_threshold:
return (
f"[BUDGET WARNING: Iteration {api_call_count}/{self.max_iterations}. "
f"Only {remaining} iteration(s) left. "
"Provide your final response NOW. No more tool calls unless absolutely critical.]"
)
if progress >= self._budget_caution_threshold:
return (
f"[BUDGET: Iteration {api_call_count}/{self.max_iterations}. "
f"{remaining} iterations left. Start consolidating your work.]"
)
return None
def _handle_max_iterations(self, messages: list, api_call_count: int) -> str:
"""Request a summary when max iterations are reached. Returns the final response text."""
print(f"⚠️ Reached maximum iterations ({self.max_iterations}). Requesting summary...")
@@ -3042,6 +3150,8 @@ class AIAgent:
self._invalid_tool_retries = 0
self._invalid_json_retries = 0
self._empty_content_retries = 0
self._incomplete_scratchpad_retries = 0
self._codex_incomplete_retries = 0
self._last_content_with_tools = None
self._turns_since_memory = 0
self._iters_since_skill = 0
@@ -3108,7 +3218,6 @@ class AIAgent:
# Add user message
user_msg = {"role": "user", "content": user_message}
messages.append(user_msg)
self._log_msg_to_db(user_msg)
if not self.quiet_mode:
print(f"💬 Starting conversation: '{user_message[:60]}{'...' if len(user_message) > 60 else ''}'")
@@ -3190,7 +3299,8 @@ class AIAgent:
for _pass in range(3):
_orig_len = len(messages)
messages, active_system_prompt = self._compress_context(
messages, system_message, approx_tokens=_preflight_tokens
messages, system_message, approx_tokens=_preflight_tokens,
task_id=effective_task_id,
)
if len(messages) >= _orig_len:
break # Cannot compress further
@@ -3206,11 +3316,16 @@ class AIAgent:
final_response = None
interrupted = False
codex_ack_continuations = 0
length_continue_retries = 0
truncated_response_prefix = ""
# Clear any stale interrupt state at start
self.clear_interrupt()
while api_call_count < self.max_iterations and self.iteration_budget.remaining > 0:
# Reset per-turn checkpoint dedup so each iteration can take one snapshot
self._checkpoint_mgr.new_turn()
# Check for interrupt request (e.g., user sent new message)
if self._interrupt_requested:
interrupted = True
@@ -3254,7 +3369,7 @@ class AIAgent:
api_messages = []
for msg in messages:
api_msg = msg.copy()
# For ALL assistant messages, pass reasoning back to the API
# This ensures multi-turn reasoning context is preserved
if msg.get("role") == "assistant":
@@ -3262,7 +3377,7 @@ class AIAgent:
if reasoning_text:
# Add reasoning_content for API compatibility (Moonshot AI, Novita, OpenRouter)
api_msg["reasoning_content"] = reasoning_text
# Remove 'reasoning' field - it's for trajectory storage only
# We've copied it to 'reasoning_content' for the API above
if "reasoning" in api_msg:
@@ -3273,7 +3388,7 @@ class AIAgent:
# Keep 'reasoning_details' - OpenRouter uses this for multi-turn reasoning context
# The signature field helps maintain reasoning continuity
api_messages.append(api_msg)
# Build the final system message: cached prompt + ephemeral system prompt.
# The ephemeral part is appended here (not baked into the cached prompt)
# so it stays out of the session DB and logs.
@@ -3286,21 +3401,21 @@ class AIAgent:
effective_system = (effective_system + "\n\n" + self.ephemeral_system_prompt).strip()
if effective_system:
api_messages = [{"role": "system", "content": effective_system}] + api_messages
# Inject ephemeral prefill messages right after the system prompt
# but before conversation history. Same API-call-time-only pattern.
if self.prefill_messages:
sys_offset = 1 if effective_system else 0
for idx, pfm in enumerate(self.prefill_messages):
api_messages.insert(sys_offset + idx, pfm.copy())
# Apply Anthropic prompt caching for Claude models via OpenRouter.
# Auto-detected: if model name contains "claude" and base_url is OpenRouter,
# inject cache_control breakpoints (system + last 3 messages) to reduce
# input token costs by ~75% on multi-turn conversations.
if self._use_prompt_caching:
api_messages = apply_anthropic_cache_control(api_messages, cache_ttl=self._cache_ttl)
# Safety net: strip orphaned tool results / add stubs for missing
# results before sending to the API. The compressor handles this
# during compression, but orphans can also sneak in from session
@@ -3323,9 +3438,13 @@ class AIAgent:
# Animated thinking spinner in quiet mode
face = random.choice(KawaiiSpinner.KAWAII_THINKING)
verb = random.choice(KawaiiSpinner.THINKING_VERBS)
spinner_type = random.choice(['brain', 'sparkle', 'pulse', 'moon', 'star'])
thinking_spinner = KawaiiSpinner(f"{face} {verb}...", spinner_type=spinner_type)
thinking_spinner.start()
if self.thinking_callback:
# CLI TUI mode: use prompt_toolkit widget instead of raw spinner
self.thinking_callback(f"{face} {verb}...")
else:
spinner_type = random.choice(['brain', 'sparkle', 'pulse', 'moon', 'star'])
thinking_spinner = KawaiiSpinner(f"{face} {verb}...", spinner_type=spinner_type)
thinking_spinner.start()
# Log request details if verbose
if self.verbose_logging:
@@ -3340,6 +3459,8 @@ class AIAgent:
max_compression_attempts = 3
codex_auth_retry_attempted = False
nous_auth_retry_attempted = False
restart_with_compressed_messages = False
restart_with_length_continuation = False
finish_reason = "stop"
response = None # Guard against UnboundLocalError if all retries fail
@@ -3362,6 +3483,8 @@ class AIAgent:
if thinking_spinner:
thinking_spinner.stop("")
thinking_spinner = None
if self.thinking_callback:
self.thinking_callback("")
if not self.quiet_mode:
print(f"{self.log_prefix}⏱️ API call completed in {api_duration:.2f}s")
@@ -3402,6 +3525,8 @@ class AIAgent:
if thinking_spinner:
thinking_spinner.stop(f"(´;ω;`) oops, retrying...")
thinking_spinner = None
if self.thinking_callback:
self.thinking_callback("")
# This is often rate limiting or provider returning malformed response
retry_count += 1
@@ -3486,19 +3611,58 @@ class AIAgent:
finish_reason = "stop"
else:
finish_reason = response.choices[0].finish_reason
# Handle "length" finish_reason - response was truncated
if finish_reason == "length":
print(f"{self.log_prefix}⚠️ Response truncated (finish_reason='length') - model hit max output tokens")
if self.api_mode == "chat_completions":
assistant_message = response.choices[0].message
if not assistant_message.tool_calls:
length_continue_retries += 1
interim_msg = self._build_assistant_message(assistant_message, finish_reason)
messages.append(interim_msg)
if assistant_message.content:
truncated_response_prefix += assistant_message.content
if length_continue_retries < 3:
print(
f"{self.log_prefix}↻ Requesting continuation "
f"({length_continue_retries}/3)..."
)
continue_msg = {
"role": "user",
"content": (
"[System: Your previous response was truncated by the output "
"length limit. Continue exactly where you left off. Do not "
"restart or repeat prior text. Finish the answer directly.]"
),
}
messages.append(continue_msg)
self._session_messages = messages
self._save_session_log(messages)
restart_with_length_continuation = True
break
partial_response = self._strip_think_blocks(truncated_response_prefix).strip()
self._cleanup_task_resources(effective_task_id)
self._persist_session(messages, conversation_history)
return {
"final_response": partial_response or None,
"messages": messages,
"api_calls": api_call_count,
"completed": False,
"partial": True,
"error": "Response remained truncated after 3 continuation attempts",
}
# If we have prior messages, roll back to last complete state
if len(messages) > 1:
print(f"{self.log_prefix} ⏪ Rolling back to last complete assistant turn")
rolled_back_messages = self._get_messages_up_to_last_assistant(messages)
self._cleanup_task_resources(effective_task_id)
self._persist_session(messages, conversation_history)
return {
"final_response": None,
"messages": rolled_back_messages,
@@ -3571,6 +3735,8 @@ class AIAgent:
if thinking_spinner:
thinking_spinner.stop("")
thinking_spinner = None
if self.thinking_callback:
self.thinking_callback("")
api_elapsed = time.time() - api_start_time
print(f"{self.log_prefix}⚡ Interrupted during API call.")
self._persist_session(messages, conversation_history)
@@ -3583,6 +3749,8 @@ class AIAgent:
if thinking_spinner:
thinking_spinner.stop(f"(╥_╥) error, retrying...")
thinking_spinner = None
if self.thinking_callback:
self.thinking_callback("")
status_code = getattr(api_error, "status_code", None)
if (
@@ -3659,13 +3827,15 @@ class AIAgent:
original_len = len(messages)
messages, active_system_prompt = self._compress_context(
messages, system_message, approx_tokens=approx_tokens
messages, system_message, approx_tokens=approx_tokens,
task_id=effective_task_id,
)
if len(messages) < original_len:
print(f"{self.log_prefix} 🗜️ Compressed {original_len}{len(messages)} messages, retrying...")
time.sleep(2) # Brief pause between compression retries
continue # Retry with compressed messages
restart_with_compressed_messages = True
break
else:
print(f"{self.log_prefix}❌ Payload too large and cannot compress further.")
logging.error(f"{self.log_prefix}413 payload too large. Cannot compress further.")
@@ -3726,14 +3896,16 @@ class AIAgent:
original_len = len(messages)
messages, active_system_prompt = self._compress_context(
messages, system_message, approx_tokens=approx_tokens
messages, system_message, approx_tokens=approx_tokens,
task_id=effective_task_id,
)
if len(messages) < original_len or new_ctx and new_ctx < old_ctx:
if len(messages) < original_len:
print(f"{self.log_prefix} 🗜️ Compressed {original_len}{len(messages)} messages, retrying...")
time.sleep(2) # Brief pause between compression retries
continue # Retry with compressed messages or new tier
restart_with_compressed_messages = True
break
else:
# Can't compress further and already at minimum tier
print(f"{self.log_prefix}❌ Context length exceeded and cannot compress further.")
@@ -3820,6 +3992,14 @@ class AIAgent:
if interrupted:
break
if restart_with_compressed_messages:
api_call_count -= 1
self.iteration_budget.refund()
continue
if restart_with_length_continuation:
continue
# Guard: if all retries exhausted without a successful response
# (e.g. repeated context-length errors that exhausted retry_count),
# the `response` variable is still None. Break out cleanly.
@@ -3834,6 +4014,27 @@ class AIAgent:
else:
assistant_message = response.choices[0].message
# Normalize content to string — some OpenAI-compatible servers
# (llama-server, etc.) return content as a dict or list instead
# of a plain string, which crashes downstream .strip() calls.
if assistant_message.content is not None and not isinstance(assistant_message.content, str):
raw = assistant_message.content
if isinstance(raw, dict):
assistant_message.content = raw.get("text", "") or raw.get("content", "") or json.dumps(raw)
elif isinstance(raw, list):
# Multimodal content list — extract text parts
parts = []
for part in raw:
if isinstance(part, str):
parts.append(part)
elif isinstance(part, dict) and part.get("type") == "text":
parts.append(part.get("text", ""))
elif isinstance(part, dict) and "text" in part:
parts.append(str(part["text"]))
assistant_message.content = "\n".join(parts)
else:
assistant_message.content = str(raw)
# Handle assistant response
if assistant_message.content and not self.quiet_mode:
print(f"{self.log_prefix}🤖 Assistant: {assistant_message.content[:100]}{'...' if len(assistant_message.content) > 100 else ''}")
@@ -3911,7 +4112,6 @@ class AIAgent:
)
if not duplicate_interim:
messages.append(interim_msg)
self._log_msg_to_db(interim_msg)
if self._codex_incomplete_retries < 3:
if not self.quiet_mode:
@@ -3943,39 +4143,36 @@ class AIAgent:
logging.debug(f"Tool call: {tc.function.name} with args: {tc.function.arguments[:200]}...")
# Validate tool call names - detect model hallucinations
# Repair mismatched tool names before validating
for tc in assistant_message.tool_calls:
if tc.function.name not in self.valid_tool_names:
repaired = self._repair_tool_call(tc.function.name)
if repaired:
print(f"{self.log_prefix}🔧 Auto-repaired tool name: '{tc.function.name}' -> '{repaired}'")
tc.function.name = repaired
invalid_tool_calls = [
tc.function.name for tc in assistant_message.tool_calls
tc.function.name for tc in assistant_message.tool_calls
if tc.function.name not in self.valid_tool_names
]
if invalid_tool_calls:
# Track retries for invalid tool calls
if not hasattr(self, '_invalid_tool_retries'):
self._invalid_tool_retries = 0
self._invalid_tool_retries += 1
invalid_preview = invalid_tool_calls[0][:80] + "..." if len(invalid_tool_calls[0]) > 80 else invalid_tool_calls[0]
print(f"{self.log_prefix}⚠️ Invalid tool call detected: '{invalid_preview}'")
print(f"{self.log_prefix} Valid tools: {sorted(self.valid_tool_names)}")
if self._invalid_tool_retries < 3:
print(f"{self.log_prefix}🔄 Retrying API call ({self._invalid_tool_retries}/3)...")
# Don't add anything to messages, just retry the API call
continue
else:
print(f"{self.log_prefix}❌ Max retries (3) for invalid tool calls exceeded. Stopping as partial.")
# Return partial result - don't include the bad tool call in messages
self._invalid_tool_retries = 0
self._persist_session(messages, conversation_history)
return {
"final_response": None,
"messages": messages,
"api_calls": api_call_count,
"completed": False,
"partial": True,
"error": f"Model generated invalid tool call: {invalid_preview}"
}
# Return helpful error to model — model can self-correct next turn
available = ", ".join(sorted(self.valid_tool_names))
invalid_name = invalid_tool_calls[0]
invalid_preview = invalid_name[:80] + "..." if len(invalid_name) > 80 else invalid_name
print(f"{self.log_prefix}⚠️ Unknown tool '{invalid_preview}' — sending error to model for self-correction")
assistant_msg = self._build_assistant_message(assistant_message, finish_reason)
messages.append(assistant_msg)
for tc in assistant_message.tool_calls:
if tc.function.name not in self.valid_tool_names:
content = f"Tool '{tc.function.name}' does not exist. Available tools: {available}"
else:
content = f"Skipped: another tool call in this turn used an invalid name. Please retry this tool call."
messages.append({
"role": "tool",
"tool_call_id": tc.id,
"content": content,
})
continue
# Reset retry counter on successful tool call validation
if hasattr(self, '_invalid_tool_retries'):
self._invalid_tool_retries = 0
@@ -4019,7 +4216,6 @@ class AIAgent:
)
recovery_dict = {"role": "user", "content": recovery_msg}
messages.append(recovery_dict)
self._log_msg_to_db(recovery_dict)
continue
# Reset retry counter on successful JSON validation
@@ -4041,9 +4237,8 @@ class AIAgent:
print(f" ┊ 💬 {clean}")
messages.append(assistant_msg)
self._log_msg_to_db(assistant_msg)
self._execute_tool_calls(assistant_message, messages, effective_task_id)
self._execute_tool_calls(assistant_message, messages, effective_task_id, api_call_count)
# Refund the iteration if the ONLY tool(s) called were
# execute_code (programmatic tool calling). These are
@@ -4055,7 +4250,8 @@ class AIAgent:
if self.compression_enabled and self.context_compressor.should_compress():
messages, active_system_prompt = self._compress_context(
messages, system_message,
approx_tokens=self.context_compressor.last_prompt_tokens
approx_tokens=self.context_compressor.last_prompt_tokens,
task_id=effective_task_id,
)
# Save session log incrementally (so progress is visible even if interrupted)
@@ -4142,7 +4338,6 @@ class AIAgent:
"finish_reason": finish_reason,
}
messages.append(empty_msg)
self._log_msg_to_db(empty_msg)
self._cleanup_task_resources(effective_task_id)
self._persist_session(messages, conversation_history)
@@ -4173,7 +4368,6 @@ class AIAgent:
codex_ack_continuations += 1
interim_msg = self._build_assistant_message(assistant_message, "incomplete")
messages.append(interim_msg)
self._log_msg_to_db(interim_msg)
continue_msg = {
"role": "user",
@@ -4183,12 +4377,14 @@ class AIAgent:
),
}
messages.append(continue_msg)
self._log_msg_to_db(continue_msg)
self._session_messages = messages
self._save_session_log(messages)
continue
codex_ack_continuations = 0
if truncated_response_prefix:
final_response = truncated_response_prefix + final_response
# Strip <think> blocks from user-facing response (keep raw in messages for trajectory)
final_response = self._strip_think_blocks(final_response).strip()
@@ -4196,7 +4392,6 @@ class AIAgent:
final_msg = self._build_assistant_message(assistant_message, finish_reason)
messages.append(final_msg)
self._log_msg_to_db(final_msg)
if not self.quiet_mode:
print(f"🎉 Conversation completed after {api_call_count} OpenAI-compatible API call(s)")
@@ -4233,7 +4428,6 @@ class AIAgent:
"content": f"Error executing tool: {error_msg}",
}
messages.append(err_msg)
self._log_msg_to_db(err_msg)
pending_handled = True
break
@@ -4246,7 +4440,6 @@ class AIAgent:
"content": f"[System error during processing: {error_msg}]",
}
messages.append(sys_err_msg)
self._log_msg_to_db(sys_err_msg)
# If we're near the limit, break to avoid infinite loops
if api_call_count >= self.max_iterations - 1:

View File

@@ -0,0 +1,3 @@
---
description: Creative content generation — ASCII art, hand-drawn style diagrams, and visual design tools.
---

View File

@@ -0,0 +1,250 @@
---
name: ascii-video
description: "Production pipeline for ASCII art video — any format. Converts video/audio/images/generative input into colored ASCII character video output (MP4, GIF, image sequence). Covers: video-to-ASCII conversion, audio-reactive music visualizers, generative ASCII art animations, hybrid video+audio reactive, text/lyrics overlays, real-time terminal rendering. Use when users request: ASCII video, text art video, terminal-style video, character art animation, retro text visualization, audio visualizer in ASCII, converting video to ASCII art, matrix-style effects, or any animated ASCII output."
---
# ASCII Video Production Pipeline
Full production pipeline for rendering any content as colored ASCII character video.
## Modes
| Mode | Input | Output | Read |
|------|-------|--------|------|
| **Video-to-ASCII** | Video file | ASCII recreation of source footage | `references/inputs.md` § Video Sampling |
| **Audio-reactive** | Audio file | Generative visuals driven by audio features | `references/inputs.md` § Audio Analysis |
| **Generative** | None (or seed params) | Procedural ASCII animation | `references/effects.md` |
| **Hybrid** | Video + audio | ASCII video with audio-reactive overlays | Both input refs |
| **Lyrics/text** | Audio + text/SRT | Timed text with visual effects | `references/inputs.md` § Text/Lyrics |
| **TTS narration** | Text quotes + TTS API | Narrated testimonial/quote video with typed text | `references/inputs.md` § TTS Integration |
## Stack
Single self-contained Python script per project. No GPU.
| Layer | Tool | Purpose |
|-------|------|---------|
| Core | Python 3.10+, NumPy | Math, array ops, vectorized effects |
| Signal | SciPy | FFT, peak detection (audio modes only) |
| Imaging | Pillow (PIL) | Font rasterization, video frame decoding, image I/O |
| Video I/O | ffmpeg (CLI) | Decode input, encode output segments, mux audio, mix tracks |
| Parallel | concurrent.futures / multiprocessing | N workers for batch/clip rendering |
| TTS | ElevenLabs API (or similar) | Generate narration clips for quote/testimonial videos |
| Optional | OpenCV | Video frame sampling, edge detection, optical flow |
## Pipeline Architecture (v2)
Every mode follows the same 6-stage pipeline. See `references/architecture.md` for implementation details, `references/scenes.md` for scene protocol, and `references/composition.md` for multi-grid composition and tonemap.
```
┌─────────┐ ┌──────────┐ ┌───────────┐ ┌──────────┐ ┌─────────┐ ┌────────┐
│ 1.INPUT │→│ 2.ANALYZE │→│ 3.SCENE_FN │→│ 4.TONEMAP │→│ 5.SHADE │→│ 6.ENCODE│
│ load src │ │ features │ │ → canvas │ │ normalize │ │ post-fx │ │ → video │
└─────────┘ └──────────┘ └───────────┘ └──────────┘ └─────────┘ └────────┘
```
1. **INPUT** — Load/decode source material (video frames, audio samples, images, or nothing)
2. **ANALYZE** — Extract per-frame features (audio bands, video luminance/edges, motion vectors)
3. **SCENE_FN** — Scene function renders directly to pixel canvas (`uint8 H,W,3`). May internally compose multiple character grids via `_render_vf()` + pixel blend modes. See `references/composition.md`
4. **TONEMAP** — Percentile-based adaptive brightness normalization with per-scene gamma. Replaces linear brightness multipliers. See `references/composition.md` § Adaptive Tonemap
5. **SHADE** — Apply post-processing `ShaderChain` + `FeedbackBuffer`. See `references/shaders.md`
6. **ENCODE** — Pipe raw RGB frames to ffmpeg for H.264/GIF encoding
## Creative Direction
**Every project should look and feel different.** The references provide a vocabulary of building blocks — don't copy them verbatim. Combine, modify, and invent.
### Aesthetic Dimensions to Vary
| Dimension | Options | Reference |
|-----------|---------|-----------|
| **Character palette** | Density ramps, block elements, symbols, scripts (katakana, Greek, runes, braille), dots, project-specific | `architecture.md` § Character Palettes |
| **Color strategy** | HSV (angle/distance/time/value mapped), discrete RGB palettes, monochrome, complementary, triadic, temperature | `architecture.md` § Color System |
| **Color tint** | Warm, cool, amber, matrix green, neon pink, sepia, ice, blood, void, sunset | `shaders.md` § Color Grade |
| **Background texture** | Sine fields, noise, smooth noise, cellular/voronoi, video source | `effects.md` § Background Fills |
| **Primary effects** | Rings, spirals, tunnel, vortex, waves, interference, aurora, ripple, fire | `effects.md` § Radial / Wave / Fire |
| **Particles** | Energy sparks, snow, rain, bubbles, runes, binary data, orbits, gravity wells | `effects.md` § Particle Systems |
| **Shader mood** | Retro CRT, clean modern, glitch art, cinematic, dreamy, harsh industrial, psychedelic | `shaders.md` § Design Philosophy |
| **Grid density** | xs(8px) through xxl(40px), mixed per layer | `architecture.md` § Grid System |
| **Font** | Menlo, Monaco, Courier, SF Mono, JetBrains Mono, Fira Code, IBM Plex | `architecture.md` § Font Selection |
| **Mirror mode** | None, horizontal, vertical, quad, diagonal, kaleidoscope | `shaders.md` § Mirror Effects |
| **Transition style** | Crossfade, wipe (directional/radial), dissolve, glitch cut | `shaders.md` § Transitions |
### Per-Section Variation
Never use the same config for the entire video. For each section/scene/quote:
- Choose a **different background effect** (or compose 2-3)
- Choose a **different character palette** (match the mood)
- Choose a **different color strategy** (or at minimum a different hue)
- Vary **shader intensity** (more bloom during peaks, more grain during quiet)
- Use **different particle types** if particles are active
### Project-Specific Invention
For every project, invent at least one of:
- A custom character palette matching the theme
- A custom background effect (combine/modify existing ones)
- A custom color palette (discrete RGB set matching the brand/mood)
- A custom particle character set
## Workflow
### Step 1: Determine Mode and Gather Requirements
Establish with user:
- **Input source** — file path, format, duration
- **Mode** — which of the 6 modes above
- **Sections** — time-mapped style changes (timestamps → effect names)
- **Resolution** — default 1920x1080 @ 24fps; GIFs typically 640x360 @ 15fps
- **Style direction** — dense/sparse, bright/dark, chaotic/minimal, color palette
- **Text/branding** — easter eggs, overlays, credits, themed character sets
- **Output format** — MP4 (default), GIF, PNG sequence
### Step 2: Detect Hardware and Set Quality
Before building the script, detect the user's hardware and set appropriate defaults. See `references/optimization.md` § Hardware Detection.
```python
hw = detect_hardware()
profile = quality_profile(hw, target_duration, user_quality_pref)
log(f"Hardware: {hw['cpu_count']} cores, {hw['mem_gb']:.1f}GB RAM")
log(f"Render: {profile['vw']}x{profile['vh']} @{profile['fps']}fps, {profile['workers']} workers")
```
Never hardcode worker counts, resolution, or CRF. Always detect and adapt.
### Step 3: Build the Script
Write as a single Python file. Major components:
1. **Hardware detection + quality profile** — see `references/optimization.md`
2. **Input loader** — mode-dependent; see `references/inputs.md`
3. **Feature analyzer** — audio FFT, video luminance, or pass-through
4. **Grid + renderer** — multi-density character grids with bitmap cache; `_render_vf()` helper for value/hue field → canvas
5. **Character palettes** — multiple palettes chosen per project theme; see `references/architecture.md`
6. **Color system** — HSV + discrete RGB palettes as needed; see `references/architecture.md`
7. **Scene functions** — each returns `canvas (uint8 H,W,3)` directly. May compose multiple grids internally via pixel blend modes. See `references/scenes.md` + `references/composition.md`
8. **Tonemap** — adaptive brightness normalization with per-scene gamma; see `references/composition.md`
9. **Shader pipeline**`ShaderChain` + `FeedbackBuffer` per-section config; see `references/shaders.md`
10. **Scene table + dispatcher** — maps time ranges to scene functions + shader/feedback configs; see `references/scenes.md`
11. **Parallel encoder** — N-worker batch clip rendering with ffmpeg pipes
12. **Main** — orchestrate full pipeline
### Step 4: Handle Critical Bugs
#### Font Cell Height (macOS Pillow)
`textbbox()` returns wrong height. Use `font.getmetrics()`:
```python
ascent, descent = font.getmetrics()
cell_height = ascent + descent # correct
```
#### ffmpeg Pipe Deadlock
Never use `stderr=subprocess.PIPE` with long-running ffmpeg. Redirect to file:
```python
stderr_fh = open(err_path, "w")
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, stderr=stderr_fh)
```
#### Brightness — Use `tonemap()`, Not Linear Multipliers
ASCII on black is inherently dark. This is the #1 visual issue. **Do NOT use linear `* N` brightness multipliers** — they clip highlights and wash out the image. Instead, use the **adaptive tonemap** function from `references/composition.md`:
```python
def tonemap(canvas, gamma=0.75):
"""Percentile-based adaptive normalization + gamma. Replaces all brightness multipliers."""
f = canvas.astype(np.float32)
lo = np.percentile(f, 1) # black point (1st percentile)
hi = np.percentile(f, 99.5) # white point (99.5th percentile)
if hi - lo < 1: hi = lo + 1
f = (f - lo) / (hi - lo)
f = np.clip(f, 0, 1) ** gamma # gamma < 1 = brighter mids
return (f * 255).astype(np.uint8)
```
Pipeline ordering: `scene_fn() → tonemap() → FeedbackBuffer → ShaderChain → ffmpeg`
Per-scene gamma overrides for destructive effects:
- Default: `gamma=0.75`
- Solarize scenes: `gamma=0.55` (solarize darkens above-threshold pixels)
- Posterize scenes: `gamma=0.50` (quantization loses brightness range)
- Already-bright scenes: `gamma=0.85`
Additional brightness best practices:
- Dense animated backgrounds — never flat black, always fill the grid
- Vignette minimum clamped to 0.15 (not 0.12)
- Bloom threshold lowered to 130 (not 170) so more pixels contribute to glow
- Use `screen` blend mode (not `overlay`) when compositing dark ASCII layers — overlay squares dark values: `2 * 0.12 * 0.12 = 0.03`
#### Font Compatibility
Not all Unicode characters render in all fonts. Validate palettes at init:
```python
for c in palette:
img = Image.new("L", (20, 20), 0)
ImageDraw.Draw(img).text((0, 0), c, fill=255, font=font)
if np.array(img).max() == 0:
log(f"WARNING: char '{c}' (U+{ord(c):04X}) not in font, removing from palette")
```
### Step 4b: Per-Clip Architecture (for segmented videos)
When the video has discrete segments (quotes, scenes, chapters), render each as a separate clip file. This enables:
- Re-rendering individual clips without touching the rest (`--clip q05`)
- Faster iteration on specific sections
- Easy reordering or trimming in post
```python
segments = [
{"id": "intro", "start": 0.0, "end": 5.0, "type": "intro"},
{"id": "q00", "start": 5.0, "end": 12.0, "type": "quote", "qi": 0, ...},
{"id": "t00", "start": 12.0, "end": 13.5, "type": "transition", ...},
{"id": "outro", "start": 208.0, "end": 211.6, "type": "outro"},
]
from concurrent.futures import ProcessPoolExecutor, as_completed
with ProcessPoolExecutor(max_workers=hw["workers"]) as pool:
futures = {pool.submit(render_clip, seg, features, path): seg["id"]
for seg, path in clip_args}
for fut in as_completed(futures):
fut.result()
```
CLI: `--clip q00 t00 q01` to re-render specific clips, `--list` to show segments, `--skip-render` to re-stitch only.
### Step 5: Render and Iterate
Performance targets per frame:
| Component | Budget |
|-----------|--------|
| Feature extraction | 1-5ms |
| Effect function | 2-15ms |
| Character render | 80-150ms (bottleneck) |
| Shader pipeline | 5-25ms |
| **Total** | ~100-200ms/frame |
**Fast iteration**: render single test frames to check brightness/layout before full render:
```python
canvas = render_single_frame(frame_index, features, renderer)
Image.fromarray(canvas).save("test.png")
```
**Brightness verification**: sample 5-10 frames across video, check `mean > 8` for ASCII content.
## References
| File | Contents |
|------|----------|
| `references/architecture.md` | Grid system, font selection, character palettes (library of 20+), color system (HSV + discrete RGB), `_render_vf()` helper, compositing, v2 effect function contract |
| `references/inputs.md` | All input sources: audio analysis, video sampling, image conversion, text/lyrics, TTS integration (ElevenLabs, voice assignment, audio mixing) |
| `references/effects.md` | Effect building blocks: 12 value field generators (`vf_sinefield` through `vf_noise_static`), 8 hue field generators (`hf_fixed` through `hf_plasma`), radial/wave/fire effects, particles, composing guide |
| `references/shaders.md` | 38 shader implementations (geometry, channel, color, glow, noise, pattern, tone, glitch, mirror), `ShaderChain` class, full `_apply_shader_step()` dispatch, audio-reactive scaling, transitions, tint presets |
| `references/composition.md` | **v2 core**: pixel blend modes (20 modes with implementations), multi-grid composition, `_render_vf()` helper, adaptive `tonemap()`, per-scene gamma, `FeedbackBuffer` with spatial transforms, `PixelBlendStack` |
| `references/scenes.md` | **v2 scene protocol**: scene function contract, `Renderer` class, `SCENES` table structure, `render_clip()` loop, beat-synced cutting, parallel rendering + pickling constraints, 4 complete scene examples, scene design checklist |
| `references/troubleshooting.md` | NumPy broadcasting traps, blend mode pitfalls, multiprocessing/pickling issues, brightness diagnostics, ffmpeg deadlocks, font issues, performance bottlenecks, common mistakes |
| `references/optimization.md` | Hardware detection, adaptive quality profiles (draft/preview/production/max), CLI integration, vectorized effect patterns, parallel rendering, memory management |

View File

@@ -0,0 +1,528 @@
# Architecture Reference
## Grid System
### Multi-Density Grids
Pre-initialize multiple grid sizes. Switch per section for visual variety.
| Key | Font Size | Grid (1920x1080) | Use |
|-----|-----------|-------------------|-----|
| xs | 8 | 400x108 | Ultra-dense data fields |
| sm | 10 | 320x83 | Dense detail, rain, starfields |
| md | 16 | 192x56 | Default balanced, transitions |
| lg | 20 | 160x45 | Quote/lyric text (readable at 1080p) |
| xl | 24 | 137x37 | Short quotes, large titles |
| xxl | 40 | 80x22 | Giant text, minimal |
**Grid sizing for text-heavy content**: When displaying readable text (quotes, lyrics, testimonials), use 20px (`lg`) as the primary grid. This gives 160 columns -- plenty for lines up to ~50 chars centered. For very short quotes (< 60 chars, <= 3 lines), 24px (`xl`) makes them more impactful. Only init the grids you actually use -- each grid pre-rasterizes all characters which costs ~0.3-0.5s.
Grid dimensions: `cols = VW // cell_width`, `rows = VH // cell_height`.
### Font Selection
Don't hardcode a single font. Choose fonts to match the project's mood. Monospace fonts are required for grid alignment but vary widely in personality:
| Font | Personality | Platform |
|------|-------------|----------|
| Menlo | Clean, neutral, Apple-native | macOS |
| Monaco | Retro terminal, compact | macOS |
| Courier New | Classic typewriter, wide | Cross-platform |
| SF Mono | Modern, tight spacing | macOS |
| Consolas | Windows native, clean | Windows |
| JetBrains Mono | Developer, ligature-ready | Install |
| Fira Code | Geometric, modern | Install |
| IBM Plex Mono | Corporate, authoritative | Install |
| Source Code Pro | Adobe, balanced | Install |
**Font detection at init**: probe available fonts and fall back gracefully:
```python
import platform
def find_font(preferences):
"""Try fonts in order, return first that exists."""
for name, path in preferences:
if os.path.exists(path):
return path
raise FileNotFoundError(f"No monospace font found. Tried: {[p for _,p in preferences]}")
FONT_PREFS_MACOS = [
("Menlo", "/System/Library/Fonts/Menlo.ttc"),
("Monaco", "/System/Library/Fonts/Monaco.ttf"),
("SF Mono", "/System/Library/Fonts/SFNSMono.ttf"),
("Courier", "/System/Library/Fonts/Courier.ttc"),
]
FONT_PREFS_LINUX = [
("DejaVu Sans Mono", "/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf"),
("Liberation Mono", "/usr/share/fonts/truetype/liberation/LiberationMono-Regular.ttf"),
("Noto Sans Mono", "/usr/share/fonts/truetype/noto/NotoSansMono-Regular.ttf"),
("Ubuntu Mono", "/usr/share/fonts/truetype/ubuntu/UbuntuMono-R.ttf"),
]
FONT_PREFS = FONT_PREFS_MACOS if platform.system() == "Darwin" else FONT_PREFS_LINUX
```
**Multi-font rendering**: use different fonts for different layers (e.g., monospace for background, a bolder variant for overlay text). Each GridLayer owns its own font:
```python
grid_bg = GridLayer(find_font(FONT_PREFS), 16) # background
grid_text = GridLayer(find_font(BOLD_PREFS), 20) # readable text
```
### Collecting All Characters
Before initializing grids, gather all characters that need bitmap pre-rasterization:
```python
all_chars = set()
for pal in [PAL_DEFAULT, PAL_DENSE, PAL_BLOCKS, PAL_RUNE, PAL_KATA,
PAL_GREEK, PAL_MATH, PAL_DOTS, PAL_BRAILLE, PAL_STARS,
PAL_BINARY, PAL_MUSIC, PAL_BOX, PAL_CIRCUIT, PAL_ARROWS,
PAL_HERMES]: # ... all palettes used in project
all_chars.update(pal)
# Add any overlay text characters
all_chars.update("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 .,-:;!?/|")
all_chars.discard(" ") # space is never rendered
```
### GridLayer Initialization
Each grid pre-computes coordinate arrays for vectorized effect math:
```python
class GridLayer:
def __init__(self, font_path, font_size):
self.font = ImageFont.truetype(font_path, font_size)
asc, desc = self.font.getmetrics()
bbox = self.font.getbbox("M")
self.cw = bbox[2] - bbox[0] # character cell width
self.ch = asc + desc # CRITICAL: not textbbox height
self.cols = VW // self.cw
self.rows = VH // self.ch
self.ox = (VW - self.cols * self.cw) // 2 # centering
self.oy = (VH - self.rows * self.ch) // 2
# Index arrays
self.rr = np.arange(self.rows, dtype=np.float32)[:, None]
self.cc = np.arange(self.cols, dtype=np.float32)[None, :]
# Polar coordinates (aspect-corrected)
cx, cy = self.cols / 2.0, self.rows / 2.0
asp = self.cw / self.ch
self.dx = self.cc - cx
self.dy = (self.rr - cy) * asp
self.dist = np.sqrt(self.dx**2 + self.dy**2)
self.angle = np.arctan2(self.dy, self.dx)
# Normalized (0-1 range) -- for distance falloff
self.dx_n = (self.cc - cx) / max(self.cols, 1)
self.dy_n = (self.rr - cy) / max(self.rows, 1) * asp
self.dist_n = np.sqrt(self.dx_n**2 + self.dy_n**2)
# Pre-rasterize all characters to float32 bitmaps
self.bm = {}
for c in all_chars:
img = Image.new("L", (self.cw, self.ch), 0)
ImageDraw.Draw(img).text((0, 0), c, fill=255, font=self.font)
self.bm[c] = np.array(img, dtype=np.float32) / 255.0
```
### Character Render Loop
The bottleneck. Composites pre-rasterized bitmaps onto pixel canvas:
```python
def render(self, chars, colors, canvas=None):
if canvas is None:
canvas = np.zeros((VH, VW, 3), dtype=np.uint8)
for row in range(self.rows):
y = self.oy + row * self.ch
if y + self.ch > VH: break
for col in range(self.cols):
c = chars[row, col]
if c == " ": continue
x = self.ox + col * self.cw
if x + self.cw > VW: break
a = self.bm[c] # float32 bitmap
canvas[y:y+self.ch, x:x+self.cw] = np.maximum(
canvas[y:y+self.ch, x:x+self.cw],
(a[:, :, None] * colors[row, col]).astype(np.uint8))
return canvas
```
Use `np.maximum` for additive blending (brighter chars overwrite dimmer ones, never darken).
### Multi-Layer Rendering
Render multiple grids onto the same canvas for depth:
```python
canvas = np.zeros((VH, VW, 3), dtype=np.uint8)
canvas = grid_lg.render(bg_chars, bg_colors, canvas) # background layer
canvas = grid_md.render(main_chars, main_colors, canvas) # main layer
canvas = grid_sm.render(detail_chars, detail_colors, canvas) # detail overlay
```
---
## Character Palettes
### Design Principles
Character palettes are the primary visual texture of ASCII video. They control not just brightness mapping but the entire visual feel. Design palettes intentionally:
- **Visual weight**: characters sorted by the amount of ink/pixels they fill. Space is always index 0.
- **Coherence**: characters within a palette should belong to the same visual family.
- **Density curve**: the brightness-to-character mapping is nonlinear. Dense palettes (many chars) give smoother gradients; sparse palettes (5-8 chars) give posterized/graphic looks.
- **Rendering compatibility**: every character in the palette must exist in the font. Test at init and remove missing glyphs.
### Palette Library
Organized by visual family. Mix and match per project -- don't default to PAL_DEFAULT for everything.
#### Density / Brightness Palettes
```python
PAL_DEFAULT = " .`'-:;!><=+*^~?/|(){}[]#&$@%" # classic ASCII art
PAL_DENSE = " .:;+=xX$#@\u2588" # simple 11-level ramp
PAL_MINIMAL = " .:-=+#@" # 8-level, graphic
PAL_BINARY = " \u2588" # 2-level, extreme contrast
PAL_GRADIENT = " \u2591\u2592\u2593\u2588" # 4-level block gradient
```
#### Unicode Block Elements
```python
PAL_BLOCKS = " \u2591\u2592\u2593\u2588\u2584\u2580\u2590\u258c" # standard blocks
PAL_BLOCKS_EXT = " \u2596\u2597\u2598\u2599\u259a\u259b\u259c\u259d\u259e\u259f\u2591\u2592\u2593\u2588" # quadrant blocks (more detail)
PAL_SHADE = " \u2591\u2592\u2593\u2588\u2587\u2586\u2585\u2584\u2583\u2582\u2581" # vertical fill progression
```
#### Symbolic / Thematic
```python
PAL_MATH = " \u00b7\u2218\u2219\u2022\u00b0\u00b1\u2213\u00d7\u00f7\u2248\u2260\u2261\u2264\u2265\u221e\u222b\u2211\u220f\u221a\u2207\u2202\u2206\u03a9" # math symbols
PAL_BOX = " \u2500\u2502\u250c\u2510\u2514\u2518\u251c\u2524\u252c\u2534\u253c\u2550\u2551\u2554\u2557\u255a\u255d\u2560\u2563\u2566\u2569\u256c" # box drawing
PAL_CIRCUIT = " .\u00b7\u2500\u2502\u250c\u2510\u2514\u2518\u253c\u25cb\u25cf\u25a1\u25a0\u2206\u2207\u2261" # circuit board
PAL_RUNE = " .\u16a0\u16a2\u16a6\u16b1\u16b7\u16c1\u16c7\u16d2\u16d6\u16da\u16de\u16df" # elder futhark runes
PAL_ALCHEMIC = " \u2609\u263d\u2640\u2642\u2643\u2644\u2645\u2646\u2647\u2648\u2649\u264a\u264b" # planetary/alchemical symbols
PAL_ZODIAC = " \u2648\u2649\u264a\u264b\u264c\u264d\u264e\u264f\u2650\u2651\u2652\u2653" # zodiac
PAL_ARROWS = " \u2190\u2191\u2192\u2193\u2194\u2195\u2196\u2197\u2198\u2199\u21a9\u21aa\u21bb\u27a1" # directional arrows
PAL_MUSIC = " \u266a\u266b\u266c\u2669\u266d\u266e\u266f\u25cb\u25cf" # musical notation
```
#### Script / Writing System
```python
PAL_KATA = " \u00b7\uff66\uff67\uff68\uff69\uff6a\uff6b\uff6c\uff6d\uff6e\uff6f\uff70\uff71\uff72\uff73\uff74\uff75\uff76\uff77" # katakana halfwidth (matrix rain)
PAL_GREEK = " \u03b1\u03b2\u03b3\u03b4\u03b5\u03b6\u03b7\u03b8\u03b9\u03ba\u03bb\u03bc\u03bd\u03be\u03c0\u03c1\u03c3\u03c4\u03c6\u03c8\u03c9" # Greek lowercase
PAL_CYRILLIC = " \u0430\u0431\u0432\u0433\u0434\u0435\u0436\u0437\u0438\u043a\u043b\u043c\u043d\u043e\u043f\u0440\u0441\u0442\u0443\u0444\u0445\u0446\u0447\u0448" # Cyrillic lowercase
PAL_ARABIC = " \u0627\u0628\u062a\u062b\u062c\u062d\u062e\u062f\u0630\u0631\u0632\u0633\u0634\u0635\u0636\u0637" # Arabic letters (isolated forms)
```
#### Dot / Point Progressions
```python
PAL_DOTS = " \u22c5\u2218\u2219\u25cf\u25c9\u25ce\u25c6\u2726\u2605" # dot size progression
PAL_BRAILLE = " \u2801\u2802\u2803\u2804\u2805\u2806\u2807\u2808\u2809\u280a\u280b\u280c\u280d\u280e\u280f\u2810\u2811\u2812\u2813\u2814\u2815\u2816\u2817\u2818\u2819\u281a\u281b\u281c\u281d\u281e\u281f\u283f" # braille patterns
PAL_STARS = " \u00b7\u2727\u2726\u2729\u2728\u2605\u2736\u2733\u2738" # star progression
```
#### Project-Specific (examples -- invent new ones per project)
```python
PAL_HERMES = " .\u00b7~=\u2248\u221e\u26a1\u263f\u2726\u2605\u2295\u25ca\u25c6\u25b2\u25bc\u25cf\u25a0" # mythology/tech blend
PAL_OCEAN = " ~\u2248\u2248\u2248\u223c\u2307\u2248\u224b\u224c\u2248" # water/wave characters
PAL_ORGANIC = " .\u00b0\u2218\u2022\u25e6\u25c9\u2742\u273f\u2741\u2743" # growing/botanical
PAL_MACHINE = " _\u2500\u2502\u250c\u2510\u253c\u2261\u25a0\u2588\u2593\u2592\u2591" # mechanical/industrial
```
### Creating Custom Palettes
When designing for a project, build palettes from the content's theme:
1. **Choose a visual family** (dots, blocks, symbols, script)
2. **Sort by visual weight** -- render each char at target font size, count lit pixels, sort ascending
3. **Test at target grid size** -- some chars collapse to blobs at small sizes
4. **Validate in font** -- remove chars the font can't render:
```python
def validate_palette(pal, font):
"""Remove characters the font can't render."""
valid = []
for c in pal:
if c == " ":
valid.append(c)
continue
img = Image.new("L", (20, 20), 0)
ImageDraw.Draw(img).text((0, 0), c, fill=255, font=font)
if np.array(img).max() > 0: # char actually rendered something
valid.append(c)
return "".join(valid)
```
### Mapping Values to Characters
```python
def val2char(v, mask, pal=PAL_DEFAULT):
"""Map float array (0-1) to character array using palette."""
n = len(pal)
idx = np.clip((v * n).astype(int), 0, n - 1)
out = np.full(v.shape, " ", dtype="U1")
for i, ch in enumerate(pal):
out[mask & (idx == i)] = ch
return out
```
**Nonlinear mapping** for different visual curves:
```python
def val2char_gamma(v, mask, pal, gamma=1.0):
"""Gamma-corrected palette mapping. gamma<1 = brighter, gamma>1 = darker."""
v_adj = np.power(np.clip(v, 0, 1), gamma)
return val2char(v_adj, mask, pal)
def val2char_step(v, mask, pal, thresholds):
"""Custom threshold mapping. thresholds = list of float breakpoints."""
out = np.full(v.shape, pal[0], dtype="U1")
for i, thr in enumerate(thresholds):
out[mask & (v > thr)] = pal[min(i + 1, len(pal) - 1)]
return out
```
---
## Color System
### HSV->RGB (Vectorized)
All color computation in HSV for intuitive control, converted at render time:
```python
def hsv2rgb(h, s, v):
"""Vectorized HSV->RGB. h,s,v are numpy arrays. Returns (R,G,B) uint8 arrays."""
h = h % 1.0
c = v * s; x = c * (1 - np.abs((h*6) % 2 - 1)); m = v - c
# ... 6 sector assignment ...
return (np.clip((r+m)*255, 0, 255).astype(np.uint8),
np.clip((g+m)*255, 0, 255).astype(np.uint8),
np.clip((b+m)*255, 0, 255).astype(np.uint8))
```
### Color Mapping Strategies
Don't default to a single strategy. Choose based on the visual intent:
| Strategy | Hue source | Effect | Good for |
|----------|------------|--------|----------|
| Angle-mapped | `g.angle / (2*pi)` | Rainbow around center | Radial effects, kaleidoscopes |
| Distance-mapped | `g.dist_n * 0.3` | Gradient from center | Tunnels, depth effects |
| Frequency-mapped | `f["cent"] * 0.2` | Timbral color shifting | Audio-reactive |
| Value-mapped | `val * 0.15` | Brightness-dependent hue | Fire, heat maps |
| Time-cycled | `t * rate` | Slow color rotation | Ambient, chill |
| Source-sampled | Video frame pixel colors | Preserve original color | Video-to-ASCII |
| Palette-indexed | Discrete color lookup | Flat graphic style | Retro, pixel art |
| Temperature | Blend between warm/cool | Emotional tone | Mood-driven scenes |
| Complementary | `hue` and `hue + 0.5` | High contrast | Bold, dramatic |
| Triadic | `hue`, `hue + 0.33`, `hue + 0.66` | Vibrant, balanced | Psychedelic |
| Analogous | `hue +/- 0.08` | Harmonious, subtle | Elegant, cohesive |
| Monochrome | Fixed hue, vary S and V | Restrained, focused | Noir, minimal |
### Color Palettes (Discrete RGB)
For non-HSV workflows -- direct RGB color sets for graphic/retro looks:
```python
# Named color palettes -- use for flat/graphic styles or per-character coloring
COLORS_NEON = [(255,0,102), (0,255,153), (102,0,255), (255,255,0), (0,204,255)]
COLORS_PASTEL = [(255,179,186), (255,223,186), (255,255,186), (186,255,201), (186,225,255)]
COLORS_MONO_GREEN = [(0,40,0), (0,80,0), (0,140,0), (0,200,0), (0,255,0)]
COLORS_MONO_AMBER = [(40,20,0), (80,50,0), (140,90,0), (200,140,0), (255,191,0)]
COLORS_CYBERPUNK = [(255,0,60), (0,255,200), (180,0,255), (255,200,0)]
COLORS_VAPORWAVE = [(255,113,206), (1,205,254), (185,103,255), (5,255,161)]
COLORS_EARTH = [(86,58,26), (139,90,43), (189,154,91), (222,193,136), (245,230,193)]
COLORS_ICE = [(200,230,255), (150,200,240), (100,170,230), (60,130,210), (30,80,180)]
COLORS_BLOOD = [(80,0,0), (140,10,10), (200,20,20), (255,50,30), (255,100,80)]
COLORS_FOREST = [(10,30,10), (20,60,15), (30,100,20), (50,150,30), (80,200,50)]
def rgb_palette_map(val, mask, palette):
"""Map float array (0-1) to RGB colors from a discrete palette."""
n = len(palette)
idx = np.clip((val * n).astype(int), 0, n - 1)
R = np.zeros(val.shape, dtype=np.uint8)
G = np.zeros(val.shape, dtype=np.uint8)
B = np.zeros(val.shape, dtype=np.uint8)
for i, (r, g, b) in enumerate(palette):
m = mask & (idx == i)
R[m] = r; G[m] = g; B[m] = b
return R, G, B
```
### Compositing Helpers
```python
def mkc(R, G, B, rows, cols):
"""Pack 3 uint8 arrays into (rows, cols, 3) color array."""
o = np.zeros((rows, cols, 3), dtype=np.uint8)
o[:,:,0] = R; o[:,:,1] = G; o[:,:,2] = B
return o
def layer_over(base_ch, base_co, top_ch, top_co):
"""Composite top layer onto base. Non-space chars overwrite."""
m = top_ch != " "
base_ch[m] = top_ch[m]; base_co[m] = top_co[m]
return base_ch, base_co
def layer_blend(base_co, top_co, alpha):
"""Alpha-blend top color layer onto base. alpha is float array (0-1) or scalar."""
if isinstance(alpha, (int, float)):
alpha = np.full(base_co.shape[:2], alpha, dtype=np.float32)
a = alpha[:,:,None]
return np.clip(base_co * (1 - a) + top_co * a, 0, 255).astype(np.uint8)
def stamp(ch, co, text, row, col, color=(255,255,255)):
"""Write text string at position."""
for i, c in enumerate(text):
cc = col + i
if 0 <= row < ch.shape[0] and 0 <= cc < ch.shape[1]:
ch[row, cc] = c; co[row, cc] = color
```
---
## Section System
Map time ranges to effect functions + shader configs + grid sizes:
```python
SECTIONS = [
(0.0, "void"), (3.94, "starfield"), (21.0, "matrix"),
(46.0, "drop"), (130.0, "glitch"), (187.0, "outro"),
]
FX_DISPATCH = {"void": fx_void, "starfield": fx_starfield, ...}
SECTION_FX = {"void": {"vignette": 0.3, "bloom": 170}, ...}
SECTION_GRID = {"void": "md", "starfield": "sm", "drop": "lg", ...}
SECTION_MIRROR = {"drop": "h", "bass_rings": "quad"}
def get_section(t):
sec = SECTIONS[0][1]
for ts, name in SECTIONS:
if t >= ts: sec = name
return sec
```
---
## Parallel Encoding
Split frames across N workers. Each pipes raw RGB to its own ffmpeg subprocess:
```python
def render_batch(batch_id, frame_start, frame_end, features, seg_path):
r = Renderer()
cmd = ["ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "rgb24",
"-s", f"{VW}x{VH}", "-r", str(FPS), "-i", "pipe:0",
"-c:v", "libx264", "-preset", "fast", "-crf", "18",
"-pix_fmt", "yuv420p", seg_path]
# CRITICAL: stderr to file, not pipe
stderr_fh = open(os.path.join(workdir, f"err_{batch_id:02d}.log"), "w")
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE,
stdout=subprocess.DEVNULL, stderr=stderr_fh)
for fi in range(frame_start, frame_end):
t = fi / FPS
sec = get_section(t)
f = {k: float(features[k][fi]) for k in features}
ch, co = FX_DISPATCH[sec](r, f, t)
canvas = r.render(ch, co)
canvas = apply_mirror(canvas, sec, f)
canvas = apply_shaders(canvas, sec, f, t)
pipe.stdin.write(canvas.tobytes())
pipe.stdin.close()
pipe.wait()
stderr_fh.close()
```
Concatenate segments + mux audio:
```python
# Write concat file
with open(concat_path, "w") as cf:
for seg in segments:
cf.write(f"file '{seg}'\n")
subprocess.run(["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", concat_path,
"-i", audio_path, "-c:v", "copy", "-c:a", "aac", "-b:a", "192k",
"-shortest", output_path])
```
## Effect Function Contract
### v2 Protocol (Current)
Every scene function: `(renderer, features_dict, time_float, state_dict) -> canvas_uint8`
```python
def fx_example(r, f, t, S):
"""Scene function returns a full pixel canvas (uint8 H,W,3).
Scenes have full control over multi-grid rendering and pixel-level composition.
"""
# Render multiple layers at different grid densities
canvas_a = _render_vf(r, "md", vf_plasma, hf_angle(0.0), PAL_DENSE, f, t, S)
canvas_b = _render_vf(r, "sm", vf_vortex, hf_time_cycle(0.1), PAL_RUNE, f, t, S)
# Pixel-level blend
result = blend_canvas(canvas_a, canvas_b, "screen", 0.8)
return result
```
See `references/scenes.md` for the full scene protocol, the Renderer class, `_render_vf()` helper, and complete scene examples.
See `references/composition.md` for blend modes, tone mapping, feedback buffers, and multi-grid composition.
### v1 Protocol (Legacy)
Simple scenes that use a single grid can still return `(chars, colors)` and let the caller handle rendering, but the v2 canvas protocol is preferred for all new code.
```python
def fx_simple(r, f, t, S):
g = r.get_grid("md")
val = np.sin(g.dist * 0.1 - t * 3) * f.get("bass", 0.3) * 2
val = np.clip(val, 0, 1); mask = val > 0.03
ch = val2char(val, mask, PAL_DEFAULT)
R, G, B = hsv2rgb(np.full_like(val, 0.6), np.full_like(val, 0.7), val)
co = mkc(R, G, B, g.rows, g.cols)
return g.render(ch, co) # returns canvas directly
```
### Persistent State
Effects that need state across frames (particles, rain columns) use the `S` dict parameter (which is `r.S` — same object, but passed explicitly for clarity):
```python
def fx_with_state(r, f, t, S):
if "particles" not in S:
S["particles"] = initialize_particles()
update_particles(S["particles"])
# ...
```
State persists across frames within a single scene/clip. Each worker process (and each scene) gets its own independent state.
### Helper Functions
```python
def hsv2rgb_scalar(h, s, v):
"""Single-value HSV to RGB. Returns (R, G, B) tuple of ints 0-255."""
h = h % 1.0
c = v * s; x = c * (1 - abs((h * 6) % 2 - 1)); m = v - c
if h * 6 < 1: r, g, b = c, x, 0
elif h * 6 < 2: r, g, b = x, c, 0
elif h * 6 < 3: r, g, b = 0, c, x
elif h * 6 < 4: r, g, b = 0, x, c
elif h * 6 < 5: r, g, b = x, 0, c
else: r, g, b = c, 0, x
return (int((r+m)*255), int((g+m)*255), int((b+m)*255))
def log(msg):
"""Print timestamped log message."""
print(msg, flush=True)
```

View File

@@ -0,0 +1,476 @@
# Composition & Brightness Reference
The composable system is the core of visual complexity. It operates at three levels: pixel-level blend modes, multi-grid composition, and adaptive brightness management. This document covers all three.
## Pixel-Level Blend Modes
### The `blend_canvas()` Function
All blending operates on full pixel canvases (`uint8 H,W,3`). Internally converts to float32 [0,1] for precision, blends, lerps by opacity, converts back.
```python
def blend_canvas(base, top, mode="normal", opacity=1.0):
af = base.astype(np.float32) / 255.0
bf = top.astype(np.float32) / 255.0
fn = BLEND_MODES.get(mode, BLEND_MODES["normal"])
result = fn(af, bf)
if opacity < 1.0:
result = af * (1 - opacity) + result * opacity
return np.clip(result * 255, 0, 255).astype(np.uint8)
```
### 20 Blend Modes
```python
BLEND_MODES = {
# Basic arithmetic
"normal": lambda a, b: b,
"add": lambda a, b: np.clip(a + b, 0, 1),
"subtract": lambda a, b: np.clip(a - b, 0, 1),
"multiply": lambda a, b: a * b,
"screen": lambda a, b: 1 - (1 - a) * (1 - b),
# Contrast
"overlay": lambda a, b: np.where(a < 0.5, 2*a*b, 1 - 2*(1-a)*(1-b)),
"softlight": lambda a, b: (1 - 2*b)*a*a + 2*b*a,
"hardlight": lambda a, b: np.where(b < 0.5, 2*a*b, 1 - 2*(1-a)*(1-b)),
# Difference
"difference": lambda a, b: np.abs(a - b),
"exclusion": lambda a, b: a + b - 2*a*b,
# Dodge / burn
"colordodge": lambda a, b: np.clip(a / (1 - b + 1e-6), 0, 1),
"colorburn": lambda a, b: np.clip(1 - (1 - a) / (b + 1e-6), 0, 1),
# Light
"linearlight": lambda a, b: np.clip(a + 2*b - 1, 0, 1),
"vividlight": lambda a, b: np.where(b < 0.5,
np.clip(1 - (1-a)/(2*b + 1e-6), 0, 1),
np.clip(a / (2*(1-b) + 1e-6), 0, 1)),
"pin_light": lambda a, b: np.where(b < 0.5,
np.minimum(a, 2*b), np.maximum(a, 2*b - 1)),
"hard_mix": lambda a, b: np.where(a + b >= 1.0, 1.0, 0.0),
# Compare
"lighten": lambda a, b: np.maximum(a, b),
"darken": lambda a, b: np.minimum(a, b),
# Grain
"grain_extract": lambda a, b: np.clip(a - b + 0.5, 0, 1),
"grain_merge": lambda a, b: np.clip(a + b - 0.5, 0, 1),
}
```
### Blend Mode Selection Guide
**Modes that brighten** (safe for dark inputs):
- `screen` — always brightens. Two 50% gray layers screen to 75%. The go-to safe blend.
- `add` — simple addition, clips at white. Good for sparkles, glows, particle overlays.
- `colordodge` — extreme brightening at overlap zones. Can blow out. Use low opacity (0.3-0.5).
- `linearlight` — aggressive brightening. Similar to add but with offset.
**Modes that darken** (avoid with dark inputs):
- `multiply` — darkens everything. Only use when both layers are already bright.
- `overlay` — darkens when base < 0.5, brightens when base > 0.5. Crushes dark inputs: `2 * 0.12 * 0.12 = 0.03`. Use `screen` instead for dark material.
- `colorburn` — extreme darkening at overlap zones.
**Modes that create contrast**:
- `softlight` — gentle contrast. Good for subtle texture overlay.
- `hardlight` — strong contrast. Like overlay but keyed on the top layer.
- `vividlight` — very aggressive contrast. Use sparingly.
**Modes that create color effects**:
- `difference` — XOR-like patterns. Two identical layers difference to black; offset layers create wild colors. Great for psychedelic looks.
- `exclusion` — softer version of difference. Creates complementary color patterns.
- `hard_mix` — posterizes to pure black/white/saturated color at intersections.
**Modes for texture blending**:
- `grain_extract` / `grain_merge` — extract a texture from one layer, apply it to another.
### Multi-Layer Chaining
```python
# Pattern: render layers -> blend sequentially
canvas_a = _render_vf(r, "md", vf_plasma, hf_angle(0.0), PAL_DENSE, f, t, S)
canvas_b = _render_vf(r, "sm", vf_vortex, hf_time_cycle(0.1), PAL_RUNE, f, t, S)
canvas_c = _render_vf(r, "lg", vf_rings, hf_distance(), PAL_BLOCKS, f, t, S)
result = blend_canvas(canvas_a, canvas_b, "screen", 0.8)
result = blend_canvas(result, canvas_c, "difference", 0.6)
```
Order matters: `screen(A, B)` is commutative, but `difference(screen(A,B), C)` differs from `difference(A, screen(B,C))`.
---
## Multi-Grid Composition
This is the core visual technique. Rendering the same conceptual scene at different grid densities (character sizes) creates natural texture interference, because characters at different scales overlap at different spatial frequencies.
### Why It Works
- `sm` grid (10pt font): 320x83 characters. Fine detail, dense texture.
- `md` grid (16pt): 192x56 characters. Medium density.
- `lg` grid (20pt): 160x45 characters. Coarse, chunky characters.
When you render a plasma field on `sm` and a vortex on `lg`, then screen-blend them, the fine plasma texture shows through the gaps in the coarse vortex characters. The result has more visual complexity than either layer alone.
### The `_render_vf()` Helper
This is the workhorse function. It takes a value field + hue field + palette + grid, renders to a complete pixel canvas:
```python
def _render_vf(r, grid_key, val_fn, hue_fn, pal, f, t, S, sat=0.8, threshold=0.03):
"""Render a value field + hue field to a pixel canvas via a named grid.
Args:
r: Renderer instance (has .get_grid())
grid_key: "xs", "sm", "md", "lg", "xl", "xxl"
val_fn: (g, f, t, S) -> float32 [0,1] array (rows, cols)
hue_fn: callable (g, f, t, S) -> float32 hue array, OR float scalar
pal: character palette string
f: feature dict
t: time in seconds
S: persistent state dict
sat: HSV saturation (0-1)
threshold: minimum value to render (below = space)
Returns:
uint8 array (VH, VW, 3) — full pixel canvas
"""
g = r.get_grid(grid_key)
val = np.clip(val_fn(g, f, t, S), 0, 1)
mask = val > threshold
ch = val2char(val, mask, pal)
# Hue: either a callable or a fixed float
if callable(hue_fn):
h = hue_fn(g, f, t, S) % 1.0
else:
h = np.full((g.rows, g.cols), float(hue_fn), dtype=np.float32)
# CRITICAL: broadcast to full shape and copy (see Troubleshooting)
h = np.broadcast_to(h, (g.rows, g.cols)).copy()
R, G, B = hsv2rgb(h, np.full_like(val, sat), val)
co = mkc(R, G, B, g.rows, g.cols)
return g.render(ch, co)
```
### Grid Combination Strategies
| Combination | Effect | Good For |
|-------------|--------|----------|
| `sm` + `lg` | Maximum contrast between fine detail and chunky blocks | Bold, graphic looks |
| `sm` + `md` | Subtle texture layering, similar scales | Organic, flowing looks |
| `md` + `lg` + `xs` | Three-scale interference, maximum complexity | Psychedelic, dense |
| `sm` + `sm` (different effects) | Same scale, pattern interference only | Moire, interference |
### Complete Multi-Grid Scene Example
```python
def fx_psychedelic(r, f, t, S):
"""Three-layer multi-grid scene with beat-reactive kaleidoscope."""
# Layer A: plasma on medium grid with rainbow hue
canvas_a = _render_vf(r, "md",
lambda g, f, t, S: vf_plasma(g, f, t, S) * 1.3,
hf_angle(0.0), PAL_DENSE, f, t, S, sat=0.8)
# Layer B: vortex on small grid with cycling hue
canvas_b = _render_vf(r, "sm",
lambda g, f, t, S: vf_vortex(g, f, t, S, twist=5.0) * 1.2,
hf_time_cycle(0.1), PAL_RUNE, f, t, S, sat=0.7)
# Layer C: rings on large grid with distance hue
canvas_c = _render_vf(r, "lg",
lambda g, f, t, S: vf_rings(g, f, t, S, n_base=8, spacing_base=3) * 1.4,
hf_distance(0.3, 0.02), PAL_BLOCKS, f, t, S, sat=0.9)
# Blend: A screened with B, then difference with C
result = blend_canvas(canvas_a, canvas_b, "screen", 0.8)
result = blend_canvas(result, canvas_c, "difference", 0.6)
# Beat-triggered kaleidoscope
if f.get("bdecay", 0) > 0.3:
result = sh_kaleidoscope(result.copy(), folds=6)
return result
```
---
## Adaptive Tone Mapping
### The Brightness Problem
ASCII characters are small bright dots on a black background. Most pixels in any frame are background (black). This means:
- Mean frame brightness is inherently low (often 5-30 out of 255)
- Different effect combinations produce wildly different brightness levels
- A spiral scene might be 50 mean, while a fire scene is 9 mean
- Linear multipliers (e.g., `canvas * 2.0`) either leave dark scenes dark or blow out bright scenes
### The `tonemap()` Function
Replaces linear brightness multipliers with adaptive per-frame normalization + gamma correction:
```python
def tonemap(canvas, target_mean=90, gamma=0.75, black_point=2, white_point=253):
"""Adaptive tone-mapping: normalizes + gamma-corrects so no frame is
fully dark or washed out.
1. Compute 1st and 99.5th percentile (ignores outlier pixels)
2. Stretch that range to [0, 1]
3. Apply gamma curve (< 1 lifts shadows, > 1 darkens)
4. Rescale to [black_point, white_point]
"""
f = canvas.astype(np.float32)
lo = np.percentile(f, 1)
hi = np.percentile(f, 99.5)
if hi - lo < 10:
hi = max(hi, lo + 10) # near-uniform frame fallback
f = np.clip((f - lo) / (hi - lo), 0.0, 1.0)
f = np.power(f, gamma)
f = f * (white_point - black_point) + black_point
return np.clip(f, 0, 255).astype(np.uint8)
```
### Why Gamma, Not Linear
Linear multiplier `* 2.0`:
```
input 10 -> output 20 (still dark)
input 100 -> output 200 (ok)
input 200 -> output 255 (clipped, lost detail)
```
Gamma 0.75 after normalization:
```
input 0.04 -> output 0.08 (lifted from invisible to visible)
input 0.39 -> output 0.50 (moderate lift)
input 0.78 -> output 0.84 (gentle lift, no clipping)
```
Gamma < 1 compresses the highlights and expands the shadows. This is exactly what we need: lift dark ASCII content into visibility without blowing out the bright parts.
### Pipeline Ordering
The pipeline in `render_clip()` is:
```
scene_fn(r, f, t, S) -> canvas
|
tonemap(canvas, gamma=scene_gamma)
|
FeedbackBuffer.apply(canvas, ...)
|
ShaderChain.apply(canvas, f=f, t=t)
|
ffmpeg pipe
```
Tonemap runs BEFORE feedback and shaders. This means:
- Feedback operates on normalized data (consistent behavior regardless of scene brightness)
- Shaders like solarize, posterize, contrast operate on properly-ranged data
- The brightness shader in the chain is no longer needed (tonemap handles it)
### Per-Scene Gamma Tuning
Default gamma is 0.75. Scenes that apply destructive post-processing need more aggressive lift because the destruction happens after tonemap:
| Scene Type | Recommended Gamma | Why |
|------------|-------------------|-----|
| Standard effects | 0.75 | Default, works for most scenes |
| Solarize post-process | 0.50-0.60 | Solarize inverts bright pixels, reducing overall brightness |
| Posterize post-process | 0.50-0.55 | Posterize quantizes, often crushing mid-values to black |
| Heavy difference blending | 0.60-0.70 | Difference mode creates many near-zero pixels |
| Already bright scenes | 0.85-1.0 | Don't over-boost scenes that are naturally bright |
Configure via the scene table:
```python
SCENES = [
{"start": 9.17, "end": 11.25, "name": "fire", "gamma": 0.55,
"fx": fx_fire, "shaders": [("solarize", {"threshold": 200}), ...]},
{"start": 25.96, "end": 27.29, "name": "diamond", "gamma": 0.5,
"fx": fx_diamond, "shaders": [("bloom", {"thr": 90}), ...]},
]
```
### Brightness Verification
After rendering, spot-check frame brightness:
```python
# In test-frame mode
canvas = scene["fx"](r, feat, t, r.S)
canvas = tonemap(canvas, gamma=scene.get("gamma", 0.75))
chain = ShaderChain()
for sn, kw in scene.get("shaders", []):
chain.add(sn, **kw)
canvas = chain.apply(canvas, f=feat, t=t)
print(f"Mean brightness: {canvas.astype(float).mean():.1f}, max: {canvas.max()}")
```
Target ranges after tonemap + shaders:
- Quiet/ambient scenes: mean 30-60
- Active scenes: mean 40-100
- Climax/peak scenes: mean 60-150
- If mean < 20: gamma is too high or a shader is destroying brightness
- If mean > 180: gamma is too low or add is stacking too much
---
## FeedbackBuffer Spatial Transforms
The feedback buffer stores the previous frame and blends it into the current frame with decay. Spatial transforms applied to the buffer before blending create the illusion of motion in the feedback trail.
### Implementation
```python
class FeedbackBuffer:
def __init__(self):
self.buf = None
def apply(self, canvas, decay=0.85, blend="screen", opacity=0.5,
transform=None, transform_amt=0.02, hue_shift=0.0):
if self.buf is None:
self.buf = canvas.astype(np.float32) / 255.0
return canvas
# Decay old buffer
self.buf *= decay
# Spatial transform
if transform:
self.buf = self._transform(self.buf, transform, transform_amt)
# Hue shift the feedback for rainbow trails
if hue_shift > 0:
self.buf = self._hue_shift(self.buf, hue_shift)
# Blend feedback into current frame
result = blend_canvas(canvas,
np.clip(self.buf * 255, 0, 255).astype(np.uint8),
blend, opacity)
# Update buffer with current frame
self.buf = result.astype(np.float32) / 255.0
return result
def _transform(self, buf, transform, amt):
h, w = buf.shape[:2]
if transform == "zoom":
# Zoom in: sample from slightly inside (creates expanding tunnel)
m = int(h * amt); n = int(w * amt)
if m > 0 and n > 0:
cropped = buf[m:-m or None, n:-n or None]
# Resize back to full (nearest-neighbor for speed)
buf = np.array(Image.fromarray(
np.clip(cropped * 255, 0, 255).astype(np.uint8)
).resize((w, h), Image.NEAREST)).astype(np.float32) / 255.0
elif transform == "shrink":
# Zoom out: pad edges, shrink center
m = int(h * amt); n = int(w * amt)
small = np.array(Image.fromarray(
np.clip(buf * 255, 0, 255).astype(np.uint8)
).resize((w - 2*n, h - 2*m), Image.NEAREST))
new = np.zeros((h, w, 3), dtype=np.uint8)
new[m:m+small.shape[0], n:n+small.shape[1]] = small
buf = new.astype(np.float32) / 255.0
elif transform == "rotate_cw":
# Small clockwise rotation via affine
angle = amt * 10 # amt=0.005 -> 0.05 degrees per frame
cy, cx = h / 2, w / 2
Y = np.arange(h, dtype=np.float32)[:, None]
X = np.arange(w, dtype=np.float32)[None, :]
cos_a, sin_a = np.cos(angle), np.sin(angle)
sx = (X - cx) * cos_a + (Y - cy) * sin_a + cx
sy = -(X - cx) * sin_a + (Y - cy) * cos_a + cy
sx = np.clip(sx.astype(int), 0, w - 1)
sy = np.clip(sy.astype(int), 0, h - 1)
buf = buf[sy, sx]
elif transform == "rotate_ccw":
angle = -amt * 10
cy, cx = h / 2, w / 2
Y = np.arange(h, dtype=np.float32)[:, None]
X = np.arange(w, dtype=np.float32)[None, :]
cos_a, sin_a = np.cos(angle), np.sin(angle)
sx = (X - cx) * cos_a + (Y - cy) * sin_a + cx
sy = -(X - cx) * sin_a + (Y - cy) * cos_a + cy
sx = np.clip(sx.astype(int), 0, w - 1)
sy = np.clip(sy.astype(int), 0, h - 1)
buf = buf[sy, sx]
elif transform == "shift_up":
pixels = max(1, int(h * amt))
buf = np.roll(buf, -pixels, axis=0)
buf[-pixels:] = 0 # black fill at bottom
elif transform == "shift_down":
pixels = max(1, int(h * amt))
buf = np.roll(buf, pixels, axis=0)
buf[:pixels] = 0
elif transform == "mirror_h":
buf = buf[:, ::-1]
return buf
def _hue_shift(self, buf, amount):
"""Rotate hues of the feedback buffer. Operates on float32 [0,1]."""
rgb = np.clip(buf * 255, 0, 255).astype(np.uint8)
hsv = np.zeros_like(buf)
# Simple approximate RGB->HSV->shift->RGB
r, g, b = buf[:,:,0], buf[:,:,1], buf[:,:,2]
mx = np.maximum(np.maximum(r, g), b)
mn = np.minimum(np.minimum(r, g), b)
delta = mx - mn + 1e-10
# Hue
h = np.where(mx == r, ((g - b) / delta) % 6,
np.where(mx == g, (b - r) / delta + 2, (r - g) / delta + 4))
h = (h / 6 + amount) % 1.0
# Reconstruct with shifted hue (simplified)
s = delta / (mx + 1e-10)
v = mx
c = v * s; x = c * (1 - np.abs((h * 6) % 2 - 1)); m = v - c
ro = np.zeros_like(h); go = np.zeros_like(h); bo = np.zeros_like(h)
for lo, hi, rv, gv, bv in [(0,1,c,x,0),(1,2,x,c,0),(2,3,0,c,x),
(3,4,0,x,c),(4,5,x,0,c),(5,6,c,0,x)]:
mask = ((h*6) >= lo) & ((h*6) < hi)
ro[mask] = rv[mask] if not isinstance(rv, (int,float)) else rv
go[mask] = gv[mask] if not isinstance(gv, (int,float)) else gv
bo[mask] = bv[mask] if not isinstance(bv, (int,float)) else bv
return np.stack([ro+m, go+m, bo+m], axis=2)
```
### Feedback Presets
| Preset | Config | Visual Effect |
|--------|--------|---------------|
| Infinite zoom tunnel | `decay=0.8, blend="screen", transform="zoom", transform_amt=0.015` | Expanding ring patterns |
| Rainbow trails | `decay=0.7, blend="screen", transform="zoom", transform_amt=0.01, hue_shift=0.02` | Psychedelic color trails |
| Ghostly echo | `decay=0.9, blend="add", opacity=0.15, transform="shift_up", transform_amt=0.01` | Faint upward smearing |
| Kaleidoscopic recursion | `decay=0.75, blend="screen", transform="rotate_cw", transform_amt=0.005, hue_shift=0.01` | Rotating mandala feedback |
| Color evolution | `decay=0.8, blend="difference", opacity=0.4, hue_shift=0.03` | Frame-to-frame color XOR |
| Rising heat haze | `decay=0.5, blend="add", opacity=0.2, transform="shift_up", transform_amt=0.02` | Hot air shimmer |
---
## PixelBlendStack
Higher-level wrapper for multi-layer compositing:
```python
class PixelBlendStack:
def __init__(self):
self.layers = []
def add(self, canvas, mode="normal", opacity=1.0):
self.layers.append((canvas, mode, opacity))
return self
def composite(self):
if not self.layers:
return np.zeros((VH, VW, 3), dtype=np.uint8)
result = self.layers[0][0]
for canvas, mode, opacity in self.layers[1:]:
result = blend_canvas(result, canvas, mode, opacity)
return result
```

View File

@@ -0,0 +1,893 @@
# Effect Catalog
Effect building blocks that produce visual patterns. In v2, these are used **inside scene functions** that return a pixel canvas directly. The building blocks below operate on grid coordinate arrays and produce `(chars, colors)` or value/hue fields that the scene function renders to canvas via `_render_vf()`. See `composition.md` for the v2 rendering pattern and `scenes.md` for scene function examples.
## Design Philosophy
Effects are the creative core. Don't copy these verbatim for every project -- use them as **building blocks** and **combine, modify, and invent** new ones. Every project should feel distinct.
Key principles:
- **Layer multiple effects** rather than using a single monolithic function
- **Parameterize everything** -- hue, speed, density, amplitude should all be arguments
- **React to features** -- audio/video features should modulate at least 2-3 parameters per effect
- **Vary per section** -- never use the same effect config for the entire video
- **Invent project-specific effects** -- the catalog below is a starting vocabulary, not a fixed set
---
## Background Fills
Every effect should start with a background. Never leave flat black.
### Animated Sine Field (General Purpose)
```python
def bg_sinefield(g, f, t, hue=0.6, bri=0.5, pal=PAL_DEFAULT,
freq=(0.13, 0.17, 0.07, 0.09), speed=(0.5, -0.4, -0.3, 0.2)):
"""Layered sine field. Adjust freq/speed tuples for different textures."""
v1 = np.sin(g.cc*freq[0] + t*speed[0]) * np.sin(g.rr*freq[1] - t*speed[1]) * 0.5 + 0.5
v2 = np.sin(g.cc*freq[2] - t*speed[2] + g.rr*freq[3]) * 0.4 + 0.5
v3 = np.sin(g.dist_n*5 + t*0.2) * 0.3 + 0.4
v4 = np.cos(g.angle*3 - t*0.6) * 0.15 + 0.5
val = np.clip((v1*0.3 + v2*0.25 + v3*0.25 + v4*0.2) * bri * (0.6 + f["rms"]*0.6), 0.06, 1)
mask = val > 0.03
ch = val2char(val, mask, pal)
h = np.full_like(val, hue) + f.get("cent", 0.5)*0.1 + val*0.08
R, G, B = hsv2rgb(h, np.clip(0.35+f.get("flat",0.4)*0.4, 0, 1) * np.ones_like(val), val)
return ch, mkc(R, G, B, g.rows, g.cols)
```
### Video-Source Background
```python
def bg_video(g, frame_rgb, pal=PAL_DEFAULT, brightness=0.5):
small = np.array(Image.fromarray(frame_rgb).resize((g.cols, g.rows)))
lum = np.mean(small, axis=2) / 255.0 * brightness
mask = lum > 0.02
ch = val2char(lum, mask, pal)
co = np.clip(small * np.clip(lum[:,:,None]*1.5+0.3, 0.3, 1), 0, 255).astype(np.uint8)
return ch, co
```
### Noise / Static Field
```python
def bg_noise(g, f, t, pal=PAL_BLOCKS, density=0.3, hue_drift=0.02):
val = np.random.random((g.rows, g.cols)).astype(np.float32) * density * (0.5 + f["rms"]*0.5)
val = np.clip(val, 0, 1); mask = val > 0.02
ch = val2char(val, mask, pal)
R, G, B = hsv2rgb(np.full_like(val, t*hue_drift % 1), np.full_like(val, 0.3), val)
return ch, mkc(R, G, B, g.rows, g.cols)
```
### Perlin-Like Smooth Noise
```python
def bg_smooth_noise(g, f, t, hue=0.5, bri=0.5, pal=PAL_DOTS, octaves=3):
"""Layered sine approximation of Perlin noise. Cheap, smooth, organic."""
val = np.zeros((g.rows, g.cols), dtype=np.float32)
for i in range(octaves):
freq = 0.05 * (2 ** i)
amp = 0.5 / (i + 1)
phase = t * (0.3 + i * 0.2)
val += np.sin(g.cc * freq + phase) * np.cos(g.rr * freq * 0.7 - phase * 0.5) * amp
val = np.clip(val * 0.5 + 0.5, 0, 1) * bri
mask = val > 0.03
ch = val2char(val, mask, pal)
h = np.full_like(val, hue) + val * 0.1
R, G, B = hsv2rgb(h, np.full_like(val, 0.5), val)
return ch, mkc(R, G, B, g.rows, g.cols)
```
### Cellular / Voronoi Approximation
```python
def bg_cellular(g, f, t, n_centers=12, hue=0.5, bri=0.6, pal=PAL_BLOCKS):
"""Voronoi-like cells using distance to nearest of N moving centers."""
rng = np.random.RandomState(42) # deterministic centers
cx = (rng.rand(n_centers) * g.cols).astype(np.float32)
cy = (rng.rand(n_centers) * g.rows).astype(np.float32)
# Animate centers
cx_t = cx + np.sin(t * 0.5 + np.arange(n_centers) * 0.7) * 5
cy_t = cy + np.cos(t * 0.4 + np.arange(n_centers) * 0.9) * 3
# Min distance to any center
min_d = np.full((g.rows, g.cols), 999.0, dtype=np.float32)
for i in range(n_centers):
d = np.sqrt((g.cc - cx_t[i])**2 + (g.rr - cy_t[i])**2)
min_d = np.minimum(min_d, d)
val = np.clip(1.0 - min_d / (g.cols * 0.3), 0, 1) * bri
# Cell edges (where distance is near-equal between two centers)
# ... second-nearest trick for edge highlighting
mask = val > 0.03
ch = val2char(val, mask, pal)
R, G, B = hsv2rgb(np.full_like(val, hue) + min_d * 0.005, np.full_like(val, 0.5), val)
return ch, mkc(R, G, B, g.rows, g.cols)
```
---
## Radial Effects
### Concentric Rings
Bass/sub-driven pulsing rings from center. Scale ring count and thickness with bass energy.
```python
def eff_rings(g, f, t, hue=0.5, n_base=6, pal=PAL_DEFAULT):
n_rings = int(n_base + f["sub_r"] * 25 + f["bass"] * 10)
spacing = 2 + f["bass_r"] * 7 + f["rms"] * 3
ring_cv = np.zeros((g.rows, g.cols), dtype=np.float32)
for ri in range(n_rings):
rad = (ri+1) * spacing + f["bdecay"] * 15
wobble = f["mid_r"]*5*np.sin(g.angle*3 + t*4) + f["hi_r"]*3*np.sin(g.angle*7 - t*6)
rd = np.abs(g.dist - rad - wobble)
th = 1 + f["sub"] * 3
ring_cv = np.maximum(ring_cv, np.clip((1 - rd/th) * (0.4 + f["bass"]*0.8), 0, 1))
# Color by angle + distance for rainbow rings
h = g.angle/(2*np.pi) + g.dist*0.005 + f["sub_r"]*0.2
return ring_cv, h
```
### Radial Rays
```python
def eff_rays(g, f, t, n_base=8, hue=0.5):
n_rays = int(n_base + f["hi_r"] * 25)
ray = np.clip(np.cos(g.angle*n_rays + t*3) * f["bdecay"]*0.6 * (1-g.dist_n), 0, 0.7)
return ray
```
### Spiral Arms (Logarithmic)
```python
def eff_spiral(g, f, t, n_arms=3, tightness=2.5, hue=0.5):
arm_cv = np.zeros((g.rows, g.cols), dtype=np.float32)
for ai in range(n_arms):
offset = ai * 2*np.pi / n_arms
log_r = np.log(g.dist + 1) * tightness
arm_phase = g.angle + offset - log_r + t * 0.8
arm_val = np.clip(np.cos(arm_phase * n_arms) * 0.6 + 0.2, 0, 1)
arm_val *= (0.4 + f["rms"]*0.6) * np.clip(1 - g.dist_n*0.5, 0.2, 1)
arm_cv = np.maximum(arm_cv, arm_val)
return arm_cv
```
### Center Glow / Pulse
```python
def eff_glow(g, f, t, intensity=0.6, spread=2.0):
return np.clip(intensity * np.exp(-g.dist_n * spread) * (0.5 + f["rms"]*2 + np.sin(t*1.2)*0.2), 0, 0.9)
```
### Tunnel / Depth
```python
def eff_tunnel(g, f, t, speed=3.0, complexity=6):
tunnel_d = 1.0 / (g.dist_n + 0.1)
v1 = np.sin(tunnel_d*2 - t*speed) * 0.45 + 0.55
v2 = np.sin(g.angle*complexity + tunnel_d*1.5 - t*2) * 0.35 + 0.55
return v1 * 0.5 + v2 * 0.5
```
### Vortex (Rotating Distortion)
```python
def eff_vortex(g, f, t, twist=3.0, pulse=True):
"""Twisting radial pattern -- distance modulates angle."""
twisted = g.angle + g.dist_n * twist * np.sin(t * 0.5)
val = np.sin(twisted * 4 - t * 2) * 0.5 + 0.5
if pulse:
val *= 0.5 + f.get("bass", 0.3) * 0.8
return np.clip(val, 0, 1)
```
---
## Wave Effects
### Multi-Band Frequency Waves
Each frequency band draws its own wave at different spatial/temporal frequencies:
```python
def eff_freq_waves(g, f, t, bands=None):
if bands is None:
bands = [("sub",0.06,1.2,0.0), ("bass",0.10,2.0,0.08), ("lomid",0.15,3.0,0.16),
("mid",0.22,4.5,0.25), ("himid",0.32,6.5,0.4), ("hi",0.45,8.5,0.55)]
mid = g.rows / 2.0
composite = np.zeros((g.rows, g.cols), dtype=np.float32)
for band_key, sf, tf, hue_base in bands:
amp = f.get(band_key, 0.3) * g.rows * 0.4
y_wave = mid - np.sin(g.cc*sf + t*tf) * amp
y_wave += np.sin(g.cc*sf*2.3 + t*tf*1.7) * amp * 0.2 # harmonic
dist = np.abs(g.rr - y_wave)
thickness = 2 + f.get(band_key, 0.3) * 5
intensity = np.clip((1 - dist/thickness) * f.get(band_key, 0.3) * 1.5, 0, 1)
composite = np.maximum(composite, intensity)
return composite
```
### Interference Pattern
6-8 overlapping sine waves creating moire-like patterns:
```python
def eff_interference(g, f, t, n_waves=5):
"""Parametric interference -- vary n_waves for complexity."""
# Each wave has different orientation, frequency, and feature driver
drivers = ["mid_r", "himid_r", "bass_r", "lomid_r", "hi_r"]
vals = np.zeros((g.rows, g.cols), dtype=np.float32)
for i in range(min(n_waves, len(drivers))):
angle = i * np.pi / n_waves # spread orientations
freq = 0.06 + i * 0.03
sp = 0.5 + i * 0.3
proj = g.cc * np.cos(angle) + g.rr * np.sin(angle)
vals += np.sin(proj * freq + t * sp) * f.get(drivers[i], 0.3) * 2.5
return np.clip(vals * 0.12 + 0.45, 0.1, 1)
```
### Aurora / Horizontal Bands
```python
def eff_aurora(g, f, t, hue=0.4, n_bands=3):
val = np.zeros((g.rows, g.cols), dtype=np.float32)
for i in range(n_bands):
freq_r = 0.08 + i * 0.04
freq_c = 0.012 + i * 0.008
sp_r = 0.7 + i * 0.3
sp_c = 0.18 + i * 0.12
val += np.sin(g.rr*freq_r + t*sp_r) * np.sin(g.cc*freq_c + t*sp_c) * (0.6 / n_bands)
return np.clip(val * (f.get("lomid_r", 0.3)*3 + 0.2), 0, 0.7)
```
### Ripple (Point-Source Waves)
```python
def eff_ripple(g, f, t, sources=None, freq=0.3, damping=0.02):
"""Concentric ripples from point sources. Sources = [(row_frac, col_frac), ...]"""
if sources is None:
sources = [(0.5, 0.5)] # center
val = np.zeros((g.rows, g.cols), dtype=np.float32)
for ry, rx in sources:
dy = g.rr - g.rows * ry
dx = g.cc - g.cols * rx
d = np.sqrt(dy**2 + dx**2)
val += np.sin(d * freq - t * 4) * np.exp(-d * damping) * 0.5
return np.clip(val + 0.5, 0, 1)
```
---
## Particle Systems
### General Pattern
All particle systems use persistent state:
```python
S = state # dict persisted across frames
if "px" not in S:
S["px"]=[]; S["py"]=[]; S["vx"]=[]; S["vy"]=[]; S["life"]=[]; S["char"]=[]
# Emit new particles (on beat, continuously, or on trigger)
# Update: position += velocity, apply forces, decay life
# Draw: map to grid, set char/color based on life
# Cull: remove dead, cap total count
```
### Particle Character Sets
Don't hardcode particle chars. Choose per project/mood:
```python
# Energy / explosive
PART_ENERGY = list("*+#@\u26a1\u2726\u2605\u2588\u2593")
PART_SPARK = list("\u00b7\u2022\u25cf\u2605\u2736*+")
# Organic / natural
PART_LEAF = list("\u2740\u2741\u2742\u2743\u273f\u2618\u2022")
PART_SNOW = list("\u2744\u2745\u2746\u00b7\u2022*\u25cb")
PART_RAIN = list("|\u2502\u2503\u2551/\\")
PART_BUBBLE = list("\u25cb\u25ce\u25c9\u25cf\u2218\u2219\u00b0")
# Data / tech
PART_DATA = list("01{}[]<>|/\\")
PART_HEX = list("0123456789ABCDEF")
PART_BINARY = list("01")
# Mystical
PART_RUNE = list("\u16a0\u16a2\u16a6\u16b1\u16b7\u16c1\u16c7\u16d2\u16d6\u16da\u16de\u16df\u2726\u2605")
PART_ZODIAC = list("\u2648\u2649\u264a\u264b\u264c\u264d\u264e\u264f\u2650\u2651\u2652\u2653")
# Minimal
PART_DOT = list("\u00b7\u2022\u25cf")
PART_DASH = list("-=~\u2500\u2550")
```
### Explosion (Beat-Triggered)
```python
def emit_explosion(S, f, center_r, center_c, char_set=PART_ENERGY, count_base=80):
if f.get("beat", 0) > 0:
for _ in range(int(count_base + f["rms"]*150)):
ang = random.uniform(0, 2*math.pi)
sp = random.uniform(1, 9) * (0.5 + f.get("sub_r", 0.3)*2)
S["px"].append(float(center_c))
S["py"].append(float(center_r))
S["vx"].append(math.cos(ang)*sp*2.5)
S["vy"].append(math.sin(ang)*sp)
S["life"].append(1.0)
S["char"].append(random.choice(char_set))
# Update: gravity on vy += 0.03, life -= 0.015
# Color: life * 255 for brightness, hue fade controlled by caller
```
### Rising Embers
```python
# Emit: sy = rows-1, vy = -random.uniform(1,5), vx = random.uniform(-1.5,1.5)
# Update: vx += random jitter * 0.3, life -= 0.01
# Cap at ~1500 particles
```
### Dissolving Cloud
```python
# Init: N=600 particles spread across screen
# Update: slow upward drift, fade life progressively
# life -= 0.002 * (1 + elapsed * 0.05) # accelerating fade
```
### Starfield (3D Projection)
```python
# N stars with (sx, sy, sz) in normalized coords
# Move: sz -= speed (stars approach camera)
# Project: px = cx + sx/sz * cx, py = cy + sy/sz * cy
# Reset stars that pass camera (sz <= 0.01)
# Brightness = (1 - sz), draw streaks behind bright stars
```
### Orbit (Circular/Elliptical Motion)
```python
def emit_orbit(S, n=20, radius=15, speed=1.0, char_set=PART_DOT):
"""Particles orbiting a center point."""
for i in range(n):
angle = i * 2 * math.pi / n
S["px"].append(0.0); S["py"].append(0.0) # will be computed from angle
S["vx"].append(angle) # store angle as "vx" for orbit
S["vy"].append(radius + random.uniform(-2, 2)) # store radius
S["life"].append(1.0)
S["char"].append(random.choice(char_set))
# Update: angle += speed * dt, px = cx + radius * cos(angle), py = cy + radius * sin(angle)
```
### Gravity Well
```python
# Particles attracted toward one or more gravity points
# Update: compute force vector toward each well, apply as acceleration
# Particles that reach well center respawn at edges
```
---
## Rain / Matrix Effects
### Column Rain (Vectorized)
```python
def eff_matrix_rain(g, f, t, state, hue=0.33, bri=0.6, pal=PAL_KATA,
speed_base=0.5, speed_beat=3.0):
"""Vectorized matrix rain. state dict persists column positions."""
if "ry" not in state or len(state["ry"]) != g.cols:
state["ry"] = np.random.uniform(-g.rows, g.rows, g.cols).astype(np.float32)
state["rsp"] = np.random.uniform(0.3, 2.0, g.cols).astype(np.float32)
state["rln"] = np.random.randint(8, 40, g.cols)
state["rch"] = np.random.randint(0, len(pal), (g.rows, g.cols)) # pre-assign chars
speed_mult = speed_base + f.get("bass", 0.3)*speed_beat + f.get("sub_r", 0.3)*3
if f.get("beat", 0) > 0: speed_mult *= 2.5
state["ry"] += state["rsp"] * speed_mult
# Reset columns that fall past bottom
rst = (state["ry"] - state["rln"]) > g.rows
state["ry"][rst] = np.random.uniform(-25, -2, rst.sum())
# Vectorized draw using fancy indexing
ch = np.full((g.rows, g.cols), " ", dtype="U1")
co = np.zeros((g.rows, g.cols, 3), dtype=np.uint8)
heads = state["ry"].astype(int)
for c in range(g.cols):
head = heads[c]
trail_len = state["rln"][c]
for i in range(trail_len):
row = head - i
if 0 <= row < g.rows:
fade = 1.0 - i / trail_len
ci = state["rch"][row, c] % len(pal)
ch[row, c] = pal[ci]
v = fade * bri * 255
if i == 0: # head is bright white-ish
co[row, c] = (int(v*0.9), int(min(255, v*1.1)), int(v*0.9))
else:
R, G, B = hsv2rgb_single(hue, 0.7, fade * bri)
co[row, c] = (R, G, B)
return ch, co, state
```
---
## Glitch / Data Effects
### Horizontal Band Displacement
```python
def eff_glitch_displace(ch, co, f, intensity=1.0):
n_bands = int(8 + f.get("flux", 0.3)*25 + f.get("bdecay", 0)*15) * intensity
for _ in range(int(n_bands)):
y = random.randint(0, ch.shape[0]-1)
h = random.randint(1, int(3 + f.get("sub", 0.3)*8))
shift = int((random.random()-0.5) * f.get("rms", 0.3)*40 + f.get("bdecay", 0)*20*(random.random()-0.5))
if shift != 0:
for row in range(h):
rr = y + row
if 0 <= rr < ch.shape[0]:
ch[rr] = np.roll(ch[rr], shift)
co[rr] = np.roll(co[rr], shift, axis=0)
return ch, co
```
### Block Corruption
```python
def eff_block_corrupt(ch, co, f, char_pool=None, count_base=20):
if char_pool is None:
char_pool = list(PAL_BLOCKS[4:] + PAL_KATA[2:8])
for _ in range(int(count_base + f.get("flux", 0.3)*60 + f.get("bdecay", 0)*40)):
bx = random.randint(0, max(1, ch.shape[1]-6))
by = random.randint(0, max(1, ch.shape[0]-4))
bw, bh = random.randint(2,6), random.randint(1,4)
block_char = random.choice(char_pool)
# Fill rectangle with single char and random color
for r in range(bh):
for c in range(bw):
rr, cc = by+r, bx+c
if 0 <= rr < ch.shape[0] and 0 <= cc < ch.shape[1]:
ch[rr, cc] = block_char
co[rr, cc] = (random.randint(100,255), random.randint(0,100), random.randint(0,80))
return ch, co
```
### Scan Bars (Vertical)
```python
def eff_scanbars(ch, co, f, t, n_base=4, chars="|\u2551|!1l"):
for bi in range(int(n_base + f.get("himid_r", 0.3)*12)):
sx = int((t*50*(1+bi*0.3) + bi*37) % ch.shape[1])
for rr in range(ch.shape[0]):
if random.random() < 0.7:
ch[rr, sx] = random.choice(chars)
return ch, co
```
### Error Messages
```python
# Parameterize the error vocabulary per project:
ERRORS_TECH = ["SEGFAULT","0xDEADBEEF","BUFFER_OVERRUN","PANIC!","NULL_PTR",
"CORRUPT","SIGSEGV","ERR_OVERFLOW","STACK_SMASH","BAD_ALLOC"]
ERRORS_COSMIC = ["VOID_BREACH","ENTROPY_MAX","SINGULARITY","DIMENSION_FAULT",
"REALITY_ERR","TIME_PARADOX","DARK_MATTER_LEAK","QUANTUM_DECOHERE"]
ERRORS_ORGANIC = ["CELL_DIVISION_ERR","DNA_MISMATCH","MUTATION_OVERFLOW",
"NEURAL_DEADLOCK","SYNAPSE_TIMEOUT","MEMBRANE_BREACH"]
```
### Hex Data Stream
```python
hex_str = "".join(random.choice("0123456789ABCDEF") for _ in range(random.randint(8,20)))
stamp(ch, co, hex_str, rand_row, rand_col, (0, 160, 80))
```
---
## Spectrum / Visualization
### Mirrored Spectrum Bars
```python
def eff_spectrum(g, f, t, n_bars=64, pal=PAL_BLOCKS, mirror=True):
bar_w = max(1, g.cols // n_bars); mid = g.rows // 2
band_vals = np.array([f.get("sub",0.3), f.get("bass",0.3), f.get("lomid",0.3),
f.get("mid",0.3), f.get("himid",0.3), f.get("hi",0.3)])
ch = np.full((g.rows, g.cols), " ", dtype="U1")
co = np.zeros((g.rows, g.cols, 3), dtype=np.uint8)
for b in range(n_bars):
frac = b / n_bars
fi = frac * 5; lo_i = int(fi); hi_i = min(lo_i+1, 5)
bval = min(1, (band_vals[lo_i]*(1-fi%1) + band_vals[hi_i]*(fi%1)) * 1.8)
height = int(bval * (g.rows//2 - 2))
for dy in range(height):
hue = (f.get("cent",0.5)*0.3 + frac*0.3 + dy/max(height,1)*0.15) % 1.0
ci = pal[min(int(dy/max(height,1)*len(pal)*0.7+len(pal)*0.2), len(pal)-1)]
for dc in range(bar_w - (1 if bar_w > 2 else 0)):
cc = b*bar_w + dc
if 0 <= cc < g.cols:
rows_to_draw = [mid - dy, mid + dy] if mirror else [g.rows - 1 - dy]
for row in rows_to_draw:
if 0 <= row < g.rows:
ch[row, cc] = ci
co[row, cc] = hsv_to_rgb_single(hue, 0.85, 0.5+dy/max(height,1)*0.5)
return ch, co
```
### Waveform
```python
def eff_waveform(g, f, t, row_offset=-5, hue=0.1):
ch = np.full((g.rows, g.cols), " ", dtype="U1")
co = np.zeros((g.rows, g.cols, 3), dtype=np.uint8)
for c in range(g.cols):
wv = (math.sin(c*0.15+t*5)*f.get("bass",0.3)*0.5
+ math.sin(c*0.3+t*8)*f.get("mid",0.3)*0.3
+ math.sin(c*0.6+t*12)*f.get("hi",0.3)*0.15)
wr = g.rows + row_offset + int(wv * 4)
if 0 <= wr < g.rows:
ch[wr, c] = "~"
v = int(120 + f.get("rms",0.3)*135)
co[wr, c] = [v, int(v*0.7), int(v*0.4)]
return ch, co
```
---
## Fire / Lava
### Fire Columns
```python
def eff_fire(g, f, t, n_base=20, hue_base=0.02, hue_range=0.12, pal=PAL_BLOCKS):
n_cols = int(n_base + f.get("bass",0.3)*30 + f.get("sub_r",0.3)*20)
ch = np.full((g.rows, g.cols), " ", dtype="U1")
co = np.zeros((g.rows, g.cols, 3), dtype=np.uint8)
for fi in range(n_cols):
fx_c = int((fi*g.cols/n_cols + np.sin(t*2+fi*0.7)*3) % g.cols)
height = int((f.get("bass",0.3)*0.4 + f.get("sub_r",0.3)*0.3 + f.get("rms",0.3)*0.3) * g.rows * 0.7)
for dy in range(min(height, g.rows)):
fr = g.rows - 1 - dy
frac = dy / max(height, 1)
bri = max(0.1, (1 - frac*0.6) * (0.5 + f.get("rms",0.3)*0.5))
hue = hue_base + frac * hue_range
ci = "\u2588" if frac<0.2 else ("\u2593" if frac<0.4 else ("\u2592" if frac<0.6 else "\u2591"))
ch[fr, fx_c] = ci
R, G, B = hsv2rgb_single(hue, 0.9, bri)
co[fr, fx_c] = (R, G, B)
return ch, co
```
### Ice / Cold Fire (same structure, different hue range)
```python
# hue_base=0.55, hue_range=0.15 -- blue to cyan
# Lower intensity, slower movement
```
---
## Text Overlays
### Scrolling Ticker
```python
def eff_ticker(ch, co, t, text, row, speed=15, color=(80, 100, 140)):
off = int(t * speed) % max(len(text), 1)
doubled = text + " " + text
stamp(ch, co, doubled[off:off+ch.shape[1]], row, 0, color)
```
### Beat-Triggered Words
```python
def eff_beat_words(ch, co, f, words, row_center=None, color=(255,240,220)):
if f.get("beat", 0) > 0:
w = random.choice(words)
r = (row_center or ch.shape[0]//2) + random.randint(-5,5)
stamp(ch, co, w, r, (ch.shape[1]-len(w))//2, color)
```
### Fading Message Sequence
```python
def eff_fading_messages(ch, co, t, elapsed, messages, period=4.0, color_base=(220,220,220)):
msg_idx = int(elapsed / period) % len(messages)
phase = elapsed % period
fade = max(0, min(1.0, phase) * min(1.0, period - phase))
if fade > 0.05:
v = fade
msg = messages[msg_idx]
cr, cg, cb = [int(c * v) for c in color_base]
stamp(ch, co, msg, ch.shape[0]//2, (ch.shape[1]-len(msg))//2, (cr, cg, cb))
```
---
## Screen Shake
Shift entire char/color arrays on beat:
```python
def eff_shake(ch, co, f, x_amp=6, y_amp=3):
shake_x = int(f.get("sub",0.3)*x_amp*(random.random()-0.5)*2 + f.get("bdecay",0)*4*(random.random()-0.5)*2)
shake_y = int(f.get("bass",0.3)*y_amp*(random.random()-0.5)*2)
if abs(shake_x) > 0:
ch = np.roll(ch, shake_x, axis=1)
co = np.roll(co, shake_x, axis=1)
if abs(shake_y) > 0:
ch = np.roll(ch, shake_y, axis=0)
co = np.roll(co, shake_y, axis=0)
return ch, co
```
---
## Composable Effect System
The real creative power comes from **composition**. There are three levels:
### Level 1: Character-Level Layering
Stack multiple effects as `(chars, colors)` layers:
```python
class LayerStack(EffectNode):
"""Render effects bottom-to-top with character-level compositing."""
def add(self, effect, alpha=1.0):
"""alpha < 1.0 = probabilistic override (sparse overlay)."""
self.layers.append((effect, alpha))
# Usage:
stack = LayerStack()
stack.add(bg_effect) # base — fills screen
stack.add(main_effect) # overlay on top (space chars = transparent)
stack.add(particle_effect) # sparse overlay on top of that
ch, co = stack.render(g, f, t, S)
```
### Level 2: Pixel-Level Blending
After rendering to canvases, blend with Photoshop-style modes:
```python
class PixelBlendStack:
"""Stack canvases with blend modes for complex compositing."""
def add(self, canvas, mode="normal", opacity=1.0)
def composite(self) -> canvas
# Usage:
pbs = PixelBlendStack()
pbs.add(canvas_a) # base
pbs.add(canvas_b, "screen", 0.7) # additive glow
pbs.add(canvas_c, "difference", 0.5) # psychedelic interference
result = pbs.composite()
```
### Level 3: Temporal Feedback
Feed previous frame back into current frame for recursive effects:
```python
fb = FeedbackBuffer()
for each frame:
canvas = render_current()
canvas = fb.apply(canvas, decay=0.8, blend="screen",
transform="zoom", transform_amt=0.015, hue_shift=0.02)
```
### Effect Nodes — Uniform Interface
In the v2 protocol, effect nodes are used **inside** scene functions. The scene function itself returns a canvas. Effect nodes produce intermediate `(chars, colors)` that are rendered to canvas via the grid's `.render()` method or `_render_vf()`.
```python
class EffectNode:
def render(self, g, f, t, S) -> (chars, colors)
# Concrete implementations:
class ValueFieldEffect(EffectNode):
"""Wraps a value field function + hue field function + palette."""
def __init__(self, val_fn, hue_fn, pal=PAL_DEFAULT, sat=0.7)
class LambdaEffect(EffectNode):
"""Wrap any (g,f,t,S) -> (ch,co) function."""
def __init__(self, fn)
class ConditionalEffect(EffectNode):
"""Switch effects based on audio features."""
def __init__(self, condition, if_true, if_false=None)
```
### Value Field Generators (Atomic Building Blocks)
These produce float32 arrays `(rows, cols)` in range [0,1]. They are the raw visual patterns. All have signature `(g, f, t, S, **params) -> float32 array`.
```python
def vf_sinefield(g, f, t, S, bri=0.5,
freq=(0.13, 0.17, 0.07, 0.09), speed=(0.5, -0.4, -0.3, 0.2)):
"""Layered sine field. General purpose background/texture."""
v1 = np.sin(g.cc*freq[0] + t*speed[0]) * np.sin(g.rr*freq[1] - t*speed[1]) * 0.5 + 0.5
v2 = np.sin(g.cc*freq[2] - t*speed[2] + g.rr*freq[3]) * 0.4 + 0.5
v3 = np.sin(g.dist_n*5 + t*0.2) * 0.3 + 0.4
return np.clip((v1*0.35 + v2*0.35 + v3*0.3) * bri * (0.6 + f.get("rms",0.3)*0.6), 0, 1)
def vf_smooth_noise(g, f, t, S, octaves=3, bri=0.5):
"""Multi-octave sine approximation of Perlin noise."""
val = np.zeros((g.rows, g.cols), dtype=np.float32)
for i in range(octaves):
freq = 0.05 * (2 ** i); amp = 0.5 / (i + 1)
phase = t * (0.3 + i * 0.2)
val = val + np.sin(g.cc*freq + phase) * np.cos(g.rr*freq*0.7 - phase*0.5) * amp
return np.clip(val * 0.5 + 0.5, 0, 1) * bri
def vf_rings(g, f, t, S, n_base=6, spacing_base=4):
"""Concentric rings, bass-driven count and wobble."""
n = int(n_base + f.get("sub_r",0.3)*25 + f.get("bass",0.3)*10)
sp = spacing_base + f.get("bass_r",0.3)*7 + f.get("rms",0.3)*3
val = np.zeros((g.rows, g.cols), dtype=np.float32)
for ri in range(n):
rad = (ri+1)*sp + f.get("bdecay",0)*15
wobble = f.get("mid_r",0.3)*5*np.sin(g.angle*3+t*4)
rd = np.abs(g.dist - rad - wobble)
th = 1 + f.get("sub",0.3)*3
val = np.maximum(val, np.clip((1 - rd/th) * (0.4 + f.get("bass",0.3)*0.8), 0, 1))
return val
def vf_spiral(g, f, t, S, n_arms=3, tightness=2.5):
"""Logarithmic spiral arms."""
val = np.zeros((g.rows, g.cols), dtype=np.float32)
for ai in range(n_arms):
offset = ai * 2*np.pi / n_arms
log_r = np.log(g.dist + 1) * tightness
arm_phase = g.angle + offset - log_r + t * 0.8
arm_val = np.clip(np.cos(arm_phase * n_arms) * 0.6 + 0.2, 0, 1)
arm_val *= (0.4 + f.get("rms",0.3)*0.6) * np.clip(1 - g.dist_n*0.5, 0.2, 1)
val = np.maximum(val, arm_val)
return val
def vf_tunnel(g, f, t, S, speed=3.0, complexity=6):
"""Tunnel depth effect — infinite zoom feeling."""
tunnel_d = 1.0 / (g.dist_n + 0.1)
v1 = np.sin(tunnel_d*2 - t*speed) * 0.45 + 0.55
v2 = np.sin(g.angle*complexity + tunnel_d*1.5 - t*2) * 0.35 + 0.55
return np.clip(v1*0.5 + v2*0.5, 0, 1)
def vf_vortex(g, f, t, S, twist=3.0):
"""Twisting radial pattern — distance modulates angle."""
twisted = g.angle + g.dist_n * twist * np.sin(t * 0.5)
val = np.sin(twisted * 4 - t * 2) * 0.5 + 0.5
return np.clip(val * (0.5 + f.get("bass",0.3)*0.8), 0, 1)
def vf_interference(g, f, t, S, n_waves=6):
"""Overlapping sine waves creating moire patterns."""
drivers = ["mid_r", "himid_r", "bass_r", "lomid_r", "hi_r", "sub_r"]
vals = np.zeros((g.rows, g.cols), dtype=np.float32)
for i in range(min(n_waves, len(drivers))):
angle = i * np.pi / n_waves
freq = 0.06 + i * 0.03; sp = 0.5 + i * 0.3
proj = g.cc * np.cos(angle) + g.rr * np.sin(angle)
vals = vals + np.sin(proj*freq + t*sp) * f.get(drivers[i], 0.3) * 2.5
return np.clip(vals * 0.12 + 0.45, 0.1, 1)
def vf_aurora(g, f, t, S, n_bands=3):
"""Horizontal aurora bands."""
val = np.zeros((g.rows, g.cols), dtype=np.float32)
for i in range(n_bands):
fr = 0.08 + i*0.04; fc = 0.012 + i*0.008
sr = 0.7 + i*0.3; sc = 0.18 + i*0.12
val = val + np.sin(g.rr*fr + t*sr) * np.sin(g.cc*fc + t*sc) * (0.6/n_bands)
return np.clip(val * (f.get("lomid_r",0.3)*3 + 0.2), 0, 0.7)
def vf_ripple(g, f, t, S, sources=None, freq=0.3, damping=0.02):
"""Concentric ripples from point sources."""
if sources is None: sources = [(0.5, 0.5)]
val = np.zeros((g.rows, g.cols), dtype=np.float32)
for ry, rx in sources:
dy = g.rr - g.rows*ry; dx = g.cc - g.cols*rx
d = np.sqrt(dy**2 + dx**2)
val = val + np.sin(d*freq - t*4) * np.exp(-d*damping) * 0.5
return np.clip(val + 0.5, 0, 1)
def vf_plasma(g, f, t, S):
"""Classic plasma: sum of sines at different orientations and speeds."""
v = np.sin(g.cc * 0.03 + t * 0.7) * 0.5
v = v + np.sin(g.rr * 0.04 - t * 0.5) * 0.4
v = v + np.sin((g.cc * 0.02 + g.rr * 0.03) + t * 0.3) * 0.3
v = v + np.sin(g.dist_n * 4 - t * 0.8) * 0.3
return np.clip(v * 0.5 + 0.5, 0, 1)
def vf_diamond(g, f, t, S, freq=0.15):
"""Diamond/checkerboard pattern."""
val = np.abs(np.sin(g.cc * freq + t * 0.5)) * np.abs(np.sin(g.rr * freq * 1.2 - t * 0.3))
return np.clip(val * (0.6 + f.get("rms",0.3)*0.8), 0, 1)
def vf_noise_static(g, f, t, S, density=0.4):
"""Random noise — different each frame. Non-deterministic."""
return np.random.random((g.rows, g.cols)).astype(np.float32) * density * (0.5 + f.get("rms",0.3)*0.5)
```
### Hue Field Generators (Color Mapping)
These produce float32 hue arrays [0,1]. Independently combinable with any value field. Each is a factory returning a closure with signature `(g, f, t, S) -> float32 array`. Can also be a plain float for fixed hue.
```python
def hf_fixed(hue):
"""Single hue everywhere."""
def fn(g, f, t, S):
return np.full((g.rows, g.cols), hue, dtype=np.float32)
return fn
def hf_angle(offset=0.0):
"""Hue mapped to angle from center — rainbow wheel."""
def fn(g, f, t, S):
return (g.angle / (2 * np.pi) + offset + t * 0.05) % 1.0
return fn
def hf_distance(base=0.5, scale=0.02):
"""Hue mapped to distance from center."""
def fn(g, f, t, S):
return (base + g.dist * scale + t * 0.03) % 1.0
return fn
def hf_time_cycle(speed=0.1):
"""Hue cycles uniformly over time."""
def fn(g, f, t, S):
return np.full((g.rows, g.cols), (t * speed) % 1.0, dtype=np.float32)
return fn
def hf_audio_cent():
"""Hue follows spectral centroid — timbral color shifting."""
def fn(g, f, t, S):
return np.full((g.rows, g.cols), f.get("cent", 0.5) * 0.3, dtype=np.float32)
return fn
def hf_gradient_h(start=0.0, end=1.0):
"""Left-to-right hue gradient."""
def fn(g, f, t, S):
h = np.broadcast_to(
start + (g.cc / g.cols) * (end - start),
(g.rows, g.cols)
).copy() # .copy() is CRITICAL — see troubleshooting.md
return h % 1.0
return fn
def hf_gradient_v(start=0.0, end=1.0):
"""Top-to-bottom hue gradient."""
def fn(g, f, t, S):
h = np.broadcast_to(
start + (g.rr / g.rows) * (end - start),
(g.rows, g.cols)
).copy()
return h % 1.0
return fn
def hf_plasma(speed=0.3):
"""Plasma-style hue field — organic color variation."""
def fn(g, f, t, S):
return (np.sin(g.cc*0.02 + t*speed)*0.5 + np.sin(g.rr*0.015 + t*speed*0.7)*0.5) % 1.0
return fn
```
### Combining Value Fields
The combinatorial explosion comes from mixing value fields with math:
```python
# Multiplication = intersection (only shows where both have brightness)
combined = vf_plasma(g,f,t,S) * vf_vortex(g,f,t,S)
# Addition = union (shows both, clips at 1.0)
combined = np.clip(vf_rings(g,f,t,S) + vf_spiral(g,f,t,S), 0, 1)
# Interference = beat pattern (shows XOR-like patterns)
combined = np.abs(vf_plasma(g,f,t,S) - vf_tunnel(g,f,t,S))
# Modulation = one effect shapes the other
combined = vf_rings(g,f,t,S) * (0.3 + 0.7 * vf_plasma(g,f,t,S))
# Maximum = shows the brightest of two effects
combined = np.maximum(vf_spiral(g,f,t,S), vf_aurora(g,f,t,S))
```
### Full Scene Example (v2 — Canvas Return)
A v2 scene function composes effects internally and returns a pixel canvas:
```python
def scene_complex(r, f, t, S):
"""v2 scene function: returns canvas (uint8 H,W,3).
r = Renderer, f = audio features, t = time, S = persistent state dict."""
g = r.grids["md"]
rows, cols = g.rows, g.cols
# 1. Value field composition
plasma = vf_plasma(g, f, t, S)
vortex = vf_vortex(g, f, t, S, twist=4.0)
combined = np.clip(plasma * 0.6 + vortex * 0.5 + plasma * vortex * 0.4, 0, 1)
# 2. Color from hue field
h = (hf_angle(0.3)(g,f,t,S) * 0.5 + hf_time_cycle(0.08)(g,f,t,S) * 0.5) % 1.0
# 3. Render to canvas via _render_vf helper
canvas = _render_vf(g, combined, h, sat=0.75, pal=PAL_DENSE)
# 4. Optional: blend a second layer
overlay = _render_vf(r.grids["sm"], vf_rings(r.grids["sm"],f,t,S),
hf_fixed(0.6)(r.grids["sm"],f,t,S), pal=PAL_BLOCK)
canvas = blend_canvas(canvas, overlay, "screen", 0.4)
return canvas
# In the render_clip() loop (handled by the framework):
# canvas = scene_fn(r, f, t, S)
# canvas = tonemap(canvas, gamma=scene_gamma)
# canvas = feedback.apply(canvas, ...)
# canvas = shader_chain.apply(canvas, f=f, t=t)
# pipe.stdin.write(canvas.tobytes())
```
Vary the **value field combo**, **hue field**, **palette**, **blend modes**, **feedback config**, and **shader chain** per section for maximum visual variety. With 12 value fields × 8 hue fields × 14 palettes × 20 blend modes × 7 feedback transforms × 38 shaders, the combinations are effectively infinite.

View File

@@ -0,0 +1,407 @@
# Input Sources
## Audio Analysis
### Loading
```python
tmp = tempfile.mktemp(suffix=".wav")
subprocess.run(["ffmpeg", "-y", "-i", input_path, "-ac", "1", "-ar", "22050",
"-sample_fmt", "s16", tmp], capture_output=True, check=True)
with wave.open(tmp) as wf:
sr = wf.getframerate()
raw = wf.readframes(wf.getnframes())
samples = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0
```
### Per-Frame FFT
```python
hop = sr // fps # samples per frame
win = hop * 2 # analysis window (2x hop for overlap)
window = np.hanning(win)
freqs = rfftfreq(win, 1.0 / sr)
bands = {
"sub": (freqs >= 20) & (freqs < 80),
"bass": (freqs >= 80) & (freqs < 250),
"lomid": (freqs >= 250) & (freqs < 500),
"mid": (freqs >= 500) & (freqs < 2000),
"himid": (freqs >= 2000)& (freqs < 6000),
"hi": (freqs >= 6000),
}
```
For each frame: extract chunk, apply window, FFT, compute band energies.
### Feature Set
| Feature | Formula | Controls |
|---------|---------|----------|
| `rms` | `sqrt(mean(chunk²))` | Overall loudness/energy |
| `sub`..`hi` | `sqrt(mean(band_magnitudes²))` | Per-band energy |
| `centroid` | `sum(freq*mag) / sum(mag)` | Brightness/timbre |
| `flatness` | `geomean(mag) / mean(mag)` | Noise vs tone |
| `flux` | `sum(max(0, mag - prev_mag))` | Transient strength |
| `sub_r`..`hi_r` | `band / sum(all_bands)` | Spectral shape (volume-independent) |
| `cent_d` | `abs(gradient(centroid))` | Timbral change rate |
| `beat` | Flux peak detection | Binary beat onset |
| `bdecay` | Exponential decay from beats | Smooth beat pulse (0→1→0) |
**Band ratios are critical** — they decouple spectral shape from volume, so a quiet bass section and a loud bass section both read as "bassy" rather than just "loud" vs "quiet".
### Smoothing
EMA prevents visual jitter:
```python
def ema(arr, alpha):
out = np.empty_like(arr); out[0] = arr[0]
for i in range(1, len(arr)):
out[i] = alpha * arr[i] + (1 - alpha) * out[i-1]
return out
# Slow-moving features (alpha=0.12): centroid, flatness, band ratios, cent_d
# Fast-moving features (alpha=0.3): rms, flux, raw bands
```
### Beat Detection
```python
flux_smooth = np.convolve(flux, np.ones(5)/5, mode="same")
peaks, _ = signal.find_peaks(flux_smooth, height=0.15, distance=fps//5, prominence=0.05)
beat = np.zeros(n_frames)
bdecay = np.zeros(n_frames, dtype=np.float32)
for p in peaks:
beat[p] = 1.0
for d in range(fps // 2):
if p + d < n_frames:
bdecay[p + d] = max(bdecay[p + d], math.exp(-d * 2.5 / (fps // 2)))
```
`bdecay` gives smooth 0→1→0 pulse per beat, decaying over ~0.5s. Use for flash/glitch/mirror triggers.
### Normalization
After computing all frames, normalize each feature to 0-1:
```python
for k in features:
a = features[k]
lo, hi = a.min(), a.max()
features[k] = (a - lo) / (hi - lo + 1e-10)
```
## Video Sampling
### Frame Extraction
```python
# Method 1: ffmpeg pipe (memory efficient)
cmd = ["ffmpeg", "-i", input_video, "-f", "rawvideo", "-pix_fmt", "rgb24",
"-s", f"{target_w}x{target_h}", "-r", str(fps), "-"]
pipe = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
frame_size = target_w * target_h * 3
for fi in range(n_frames):
raw = pipe.stdout.read(frame_size)
if len(raw) < frame_size: break
frame = np.frombuffer(raw, dtype=np.uint8).reshape(target_h, target_w, 3)
# process frame...
# Method 2: OpenCV (if available)
cap = cv2.VideoCapture(input_video)
```
### Luminance-to-Character Mapping
Convert video pixels to ASCII characters based on brightness:
```python
def frame_to_ascii(frame_rgb, grid, pal=PAL_DEFAULT):
"""Convert video frame to character + color arrays."""
rows, cols = grid.rows, grid.cols
# Resize frame to grid dimensions
small = np.array(Image.fromarray(frame_rgb).resize((cols, rows), Image.LANCZOS))
# Luminance
lum = (0.299 * small[:,:,0] + 0.587 * small[:,:,1] + 0.114 * small[:,:,2]) / 255.0
# Map to chars
chars = val2char(lum, lum > 0.02, pal)
# Colors: use source pixel colors, scaled by luminance for visibility
colors = np.clip(small * np.clip(lum[:,:,None] * 1.5 + 0.3, 0.3, 1), 0, 255).astype(np.uint8)
return chars, colors
```
### Edge-Weighted Character Mapping
Use edge detection for more detail in contour regions:
```python
def frame_to_ascii_edges(frame_rgb, grid, pal=PAL_DEFAULT, edge_pal=PAL_BOX):
gray = np.mean(frame_rgb, axis=2)
small_gray = resize(gray, (grid.rows, grid.cols))
lum = small_gray / 255.0
# Sobel edge detection
gx = np.abs(small_gray[:, 2:] - small_gray[:, :-2])
gy = np.abs(small_gray[2:, :] - small_gray[:-2, :])
edge = np.zeros_like(small_gray)
edge[:, 1:-1] += gx; edge[1:-1, :] += gy
edge = np.clip(edge / edge.max(), 0, 1)
# Edge regions get box drawing chars, flat regions get brightness chars
is_edge = edge > 0.15
chars = val2char(lum, lum > 0.02, pal)
edge_chars = val2char(edge, is_edge, edge_pal)
chars[is_edge] = edge_chars[is_edge]
return chars, colors
```
### Motion Detection
Detect pixel changes between frames for motion-reactive effects:
```python
prev_frame = None
def compute_motion(frame):
global prev_frame
if prev_frame is None:
prev_frame = frame.astype(np.float32)
return np.zeros(frame.shape[:2])
diff = np.abs(frame.astype(np.float32) - prev_frame).mean(axis=2)
prev_frame = frame.astype(np.float32) * 0.7 + prev_frame * 0.3 # smoothed
return np.clip(diff / 30.0, 0, 1) # normalized motion map
```
Use motion map to drive particle emission, glitch intensity, or character density.
### Video Feature Extraction
Per-frame features analogous to audio features, for driving effects:
```python
def analyze_video_frame(frame_rgb):
gray = np.mean(frame_rgb, axis=2)
return {
"brightness": gray.mean() / 255.0,
"contrast": gray.std() / 128.0,
"edge_density": compute_edge_density(gray),
"motion": compute_motion(frame_rgb).mean(),
"dominant_hue": compute_dominant_hue(frame_rgb),
"color_variance": compute_color_variance(frame_rgb),
}
```
## Image Sequence
### Static Image to ASCII
Same as single video frame conversion. For animated sequences:
```python
import glob
frames = sorted(glob.glob("frames/*.png"))
for fi, path in enumerate(frames):
img = np.array(Image.open(path).resize((VW, VH)))
chars, colors = frame_to_ascii(img, grid, pal)
```
### Image as Texture Source
Use an image as a background texture that effects modulate:
```python
def load_texture(path, grid):
img = np.array(Image.open(path).resize((grid.cols, grid.rows)))
lum = np.mean(img, axis=2) / 255.0
return lum, img # luminance for char mapping, RGB for colors
```
## Text / Lyrics
### SRT Parsing
```python
import re
def parse_srt(path):
"""Returns [(start_sec, end_sec, text), ...]"""
entries = []
with open(path) as f:
content = f.read()
blocks = content.strip().split("\n\n")
for block in blocks:
lines = block.strip().split("\n")
if len(lines) >= 3:
times = lines[1]
m = re.match(r"(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)", times)
if m:
g = [int(x) for x in m.groups()]
start = g[0]*3600 + g[1]*60 + g[2] + g[3]/1000
end = g[4]*3600 + g[5]*60 + g[6] + g[7]/1000
text = " ".join(lines[2:])
entries.append((start, end, text))
return entries
```
### Lyrics Display Modes
- **Typewriter**: characters appear left-to-right over the time window
- **Fade-in**: whole line fades from dark to bright
- **Flash**: appear instantly on beat, fade out
- **Scatter**: characters start at random positions, converge to final position
- **Wave**: text follows a sine wave path
```python
def lyrics_typewriter(ch, co, text, row, col, t, t_start, t_end, color):
"""Reveal characters progressively over time window."""
progress = np.clip((t - t_start) / (t_end - t_start), 0, 1)
n_visible = int(len(text) * progress)
stamp(ch, co, text[:n_visible], row, col, color)
```
## Generative (No Input)
For pure generative ASCII art, the "features" dict is synthesized from time:
```python
def synthetic_features(t, bpm=120):
"""Generate audio-like features from time alone."""
beat_period = 60.0 / bpm
beat_phase = (t % beat_period) / beat_period
return {
"rms": 0.5 + 0.3 * math.sin(t * 0.5),
"bass": 0.5 + 0.4 * math.sin(t * 2 * math.pi / beat_period),
"sub": 0.3 + 0.3 * math.sin(t * 0.8),
"mid": 0.4 + 0.3 * math.sin(t * 1.3),
"hi": 0.3 + 0.2 * math.sin(t * 2.1),
"cent": 0.5 + 0.2 * math.sin(t * 0.3),
"flat": 0.4,
"flux": 0.3 + 0.2 * math.sin(t * 3),
"beat": 1.0 if beat_phase < 0.05 else 0.0,
"bdecay": max(0, 1.0 - beat_phase * 4),
# ratios
"sub_r": 0.2, "bass_r": 0.25, "lomid_r": 0.15,
"mid_r": 0.2, "himid_r": 0.12, "hi_r": 0.08,
"cent_d": 0.1,
}
```
## TTS Integration
For narrated videos (testimonials, quotes, storytelling), generate speech audio per segment and mix with background music.
### ElevenLabs Voice Generation
```python
import requests
def generate_tts(text, voice_id, api_key, output_path, model="eleven_multilingual_v2"):
"""Generate TTS audio via ElevenLabs API."""
url = f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"
headers = {"xi-api-key": api_key, "Content-Type": "application/json"}
data = {"text": text, "model_id": model,
"voice_settings": {"stability": 0.5, "similarity_boost": 0.75}}
resp = requests.post(url, json=data, headers=headers, timeout=30)
resp.raise_for_status()
with open(output_path, "wb") as f:
f.write(resp.content)
```
### Voice Assignment
Use multiple voices for variety. Shuffle deterministically so re-runs are consistent:
```python
import random as _rng
def assign_voices(n_quotes, voice_pool, seed=42):
"""Assign a different voice to each quote, cycling if needed."""
r = _rng.Random(seed)
shuffled = list(voice_pool)
r.shuffle(shuffled)
return [shuffled[i % len(shuffled)] for i in range(n_quotes)]
```
### Pronunciation Control
TTS text should be separate from display text. Common fixes:
- Brand names: spell phonetically ("Nous" -> "Noose", "nginx" -> "engine-x")
- Abbreviations: expand ("API" -> "A P I", "CLI" -> "C L I")
- Technical terms: add phonetic hints
```python
QUOTES = [("Display text here", "Author")]
QUOTES_TTS = ["TTS text with phonetic spelling here"]
# Keep both arrays in sync -- same indices
```
### Audio Pipeline
1. Generate individual TTS clips (MP3/WAV per quote)
2. Get duration of each clip
3. Calculate timing: speech start/end per quote with gaps
4. Concatenate into single TTS track with silence padding
5. Mix with background music
```python
def build_tts_track(tts_clips, target_duration, gap_seconds=2.0):
"""Concatenate TTS clips with gaps, pad to target duration."""
# Get durations
durations = []
for clip in tts_clips:
result = subprocess.run(
["ffprobe", "-v", "error", "-show_entries", "format=duration",
"-of", "csv=p=0", clip],
capture_output=True, text=True)
durations.append(float(result.stdout.strip()))
# Calculate timing
total_speech = sum(durations)
total_gaps = target_duration - total_speech
gap = max(0.5, total_gaps / (len(tts_clips) + 1))
timing = [] # (start, end, quote_index)
t = gap # start after initial gap
for i, dur in enumerate(durations):
timing.append((t, t + dur, i))
t += dur + gap
# Concatenate with ffmpeg
# ... silence padding + concat filter
return timing
```
### Audio Mixing
Mix TTS (center) with background music (wide stereo, low volume):
```python
def mix_audio(tts_path, bgm_path, output_path, bgm_volume=0.15):
"""Mix TTS centered with BGM panned wide stereo."""
cmd = [
"ffmpeg", "-y",
"-i", tts_path, # mono TTS
"-i", bgm_path, # stereo BGM
"-filter_complex",
f"[0:a]aformat=sample_fmts=fltp:sample_rates=44100:channel_layouts=mono,"
f"pan=stereo|c0=c0|c1=c0[tts];" # TTS center
f"[1:a]loudnorm=I=-16:TP=-1.5:LRA=11,"
f"volume={bgm_volume},"
f"extrastereo=2.5[bgm];" # BGM wide stereo
f"[tts][bgm]amix=inputs=2:duration=longest[out]",
"-map", "[out]", "-c:a", "pcm_s16le", output_path
]
subprocess.run(cmd, capture_output=True, check=True)
```
### Feature Analysis on Mixed Audio
Run the standard audio analysis (FFT, beat detection) on the final mixed track so visual effects react to both TTS and music:
```python
# Analyze mixed_final.wav (not individual tracks)
features = analyze_audio("mixed_final.wav", fps=24)
```
This means visuals will pulse with both the music beats and the speech energy -- creating natural synchronization.

View File

@@ -0,0 +1,435 @@
# Optimization Reference
## Hardware Detection
Detect the user's hardware at script startup and adapt rendering parameters automatically. Never hardcode worker counts or resolution.
### CPU and Memory Detection
```python
import multiprocessing
import platform
import shutil
import os
def detect_hardware():
"""Detect hardware capabilities and return render config."""
cpu_count = multiprocessing.cpu_count()
# Leave 1-2 cores free for OS + ffmpeg encoding
if cpu_count >= 16:
workers = cpu_count - 2
elif cpu_count >= 8:
workers = cpu_count - 1
elif cpu_count >= 4:
workers = cpu_count - 1
else:
workers = max(1, cpu_count)
# Memory detection (platform-specific)
try:
if platform.system() == "Darwin":
import subprocess
mem_bytes = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]).strip())
elif platform.system() == "Linux":
with open("/proc/meminfo") as f:
for line in f:
if line.startswith("MemTotal"):
mem_bytes = int(line.split()[1]) * 1024
break
else:
mem_bytes = 8 * 1024**3 # assume 8GB on unknown
except Exception:
mem_bytes = 8 * 1024**3
mem_gb = mem_bytes / (1024**3)
# Each worker uses ~50-150MB depending on grid sizes
# Cap workers if memory is tight
mem_per_worker_mb = 150
max_workers_by_mem = int(mem_gb * 1024 * 0.6 / mem_per_worker_mb) # use 60% of RAM
workers = min(workers, max_workers_by_mem)
# ffmpeg availability and codec support
has_ffmpeg = shutil.which("ffmpeg") is not None
return {
"cpu_count": cpu_count,
"workers": workers,
"mem_gb": mem_gb,
"platform": platform.system(),
"arch": platform.machine(),
"has_ffmpeg": has_ffmpeg,
}
```
### Adaptive Quality Profiles
Scale resolution, FPS, CRF, and grid density based on hardware:
```python
def quality_profile(hw, target_duration_s, user_preference="auto"):
"""
Returns render settings adapted to hardware.
user_preference: "auto", "draft", "preview", "production", "max"
"""
if user_preference == "draft":
return {"vw": 960, "vh": 540, "fps": 12, "crf": 28, "workers": min(4, hw["workers"]),
"grid_scale": 0.5, "shaders": "minimal", "particles_max": 200}
if user_preference == "preview":
return {"vw": 1280, "vh": 720, "fps": 15, "crf": 25, "workers": hw["workers"],
"grid_scale": 0.75, "shaders": "standard", "particles_max": 500}
if user_preference == "max":
return {"vw": 3840, "vh": 2160, "fps": 30, "crf": 15, "workers": hw["workers"],
"grid_scale": 2.0, "shaders": "full", "particles_max": 3000}
# "production" or "auto"
# Auto-detect: estimate render time, downgrade if it would take too long
n_frames = int(target_duration_s * 24)
est_seconds_per_frame = 0.18 # ~180ms at 1080p
est_total_s = n_frames * est_seconds_per_frame / max(1, hw["workers"])
if hw["mem_gb"] < 4 or hw["cpu_count"] <= 2:
# Low-end: 720p, 15fps
return {"vw": 1280, "vh": 720, "fps": 15, "crf": 23, "workers": hw["workers"],
"grid_scale": 0.75, "shaders": "standard", "particles_max": 500}
if est_total_s > 3600: # would take over an hour
# Downgrade to 720p to speed up
return {"vw": 1280, "vh": 720, "fps": 24, "crf": 20, "workers": hw["workers"],
"grid_scale": 0.75, "shaders": "standard", "particles_max": 800}
# Standard production: 1080p 24fps
return {"vw": 1920, "vh": 1080, "fps": 24, "crf": 20, "workers": hw["workers"],
"grid_scale": 1.0, "shaders": "full", "particles_max": 1200}
def apply_quality_profile(profile):
"""Set globals from quality profile."""
global VW, VH, FPS, N_WORKERS
VW = profile["vw"]
VH = profile["vh"]
FPS = profile["fps"]
N_WORKERS = profile["workers"]
# Grid sizes scale with resolution
# CRF passed to ffmpeg encoder
# Shader set determines which post-processing is active
```
### CLI Integration
```python
parser = argparse.ArgumentParser()
parser.add_argument("--quality", choices=["draft", "preview", "production", "max", "auto"],
default="auto", help="Render quality preset")
parser.add_argument("--workers", type=int, default=0, help="Override worker count (0=auto)")
parser.add_argument("--resolution", type=str, default="", help="Override resolution e.g. 1280x720")
args = parser.parse_args()
hw = detect_hardware()
if args.workers > 0:
hw["workers"] = args.workers
profile = quality_profile(hw, target_duration, args.quality)
if args.resolution:
w, h = args.resolution.split("x")
profile["vw"], profile["vh"] = int(w), int(h)
apply_quality_profile(profile)
log(f"Hardware: {hw['cpu_count']} cores, {hw['mem_gb']:.1f}GB RAM, {hw['platform']}")
log(f"Render: {profile['vw']}x{profile['vh']} @{profile['fps']}fps, "
f"CRF {profile['crf']}, {profile['workers']} workers")
```
## Performance Budget
Target: 100-200ms per frame (5-10 fps single-threaded, 40-80 fps across 8 workers).
| Component | Time | Notes |
|-----------|------|-------|
| Feature extraction | 1-5ms | Pre-computed for all frames before render |
| Effect function | 2-15ms | Vectorized numpy, avoid Python loops |
| Character render | 80-150ms | **Bottleneck** -- per-cell Python loop |
| Shader pipeline | 5-25ms | Depends on active shaders |
| ffmpeg encode | ~5ms | Amortized by pipe buffering |
## Bitmap Pre-Rasterization
Rasterize every character at init, not per-frame:
```python
# At init time -- done once
for c in all_characters:
img = Image.new("L", (cell_w, cell_h), 0)
ImageDraw.Draw(img).text((0, 0), c, fill=255, font=font)
bitmaps[c] = np.array(img, dtype=np.float32) / 255.0 # float32 for fast multiply
# At render time -- fast lookup
bitmap = bitmaps[char]
canvas[y:y+ch, x:x+cw] = np.maximum(canvas[y:y+ch, x:x+cw],
(bitmap[:,:,None] * color).astype(np.uint8))
```
Collect all characters from all palettes + overlay text into the init set. Lazy-init for any missed characters.
## Coordinate Array Caching
Pre-compute all grid-relative coordinate arrays at init, not per-frame:
```python
# These are O(rows*cols) and used in every effect
self.rr = np.arange(rows)[:, None] # row indices
self.cc = np.arange(cols)[None, :] # col indices
self.dist = np.sqrt(dx**2 + dy**2) # distance from center
self.angle = np.arctan2(dy, dx) # angle from center
self.dist_n = ... # normalized distance
```
## Vectorized Effect Patterns
### Avoid Per-Cell Python Loops in Effects
The render loop (compositing bitmaps) is unavoidably per-cell. But effect functions must be fully vectorized numpy -- never iterate over rows/cols in Python.
Bad (O(rows*cols) Python loop):
```python
for r in range(rows):
for c in range(cols):
val[r, c] = math.sin(c * 0.1 + t) * math.cos(r * 0.1 - t)
```
Good (vectorized):
```python
val = np.sin(g.cc * 0.1 + t) * np.cos(g.rr * 0.1 - t)
```
### Vectorized Matrix Rain
The naive per-column per-trail-pixel loop is the second biggest bottleneck after the render loop. Use numpy fancy indexing:
```python
# Instead of nested Python loops over columns and trail pixels:
# Build row index arrays for all active trail pixels at once
all_rows = []
all_cols = []
all_fades = []
for c in range(cols):
head = int(state["ry"][c])
trail_len = state["rln"][c]
for i in range(trail_len):
row = head - i
if 0 <= row < rows:
all_rows.append(row)
all_cols.append(c)
all_fades.append(1.0 - i / trail_len)
# Vectorized assignment
ar = np.array(all_rows)
ac = np.array(all_cols)
af = np.array(all_fades, dtype=np.float32)
# Assign chars and colors in bulk using fancy indexing
ch[ar, ac] = ... # vectorized char assignment
co[ar, ac, 1] = (af * bri * 255).astype(np.uint8) # green channel
```
### Vectorized Fire Columns
Same pattern -- accumulate index arrays, assign in bulk:
```python
fire_val = np.zeros((rows, cols), dtype=np.float32)
for fi in range(n_cols):
fx_c = int((fi * cols / n_cols + np.sin(t * 2 + fi * 0.7) * 3) % cols)
height = int(energy * rows * 0.7)
dy = np.arange(min(height, rows))
fr = rows - 1 - dy
frac = dy / max(height, 1)
# Width spread: base columns wider at bottom
for dx in range(-1, 2): # 3-wide columns
c = fx_c + dx
if 0 <= c < cols:
fire_val[fr, c] = np.maximum(fire_val[fr, c],
(1 - frac * 0.6) * (0.5 + rms * 0.5))
# Now map fire_val to chars and colors in one vectorized pass
```
## Bloom Optimization
**Do NOT use `scipy.ndimage.uniform_filter`** -- measured at 424ms/frame.
Use 4x downsample + manual box blur instead -- 84ms/frame (5x faster):
```python
sm = canvas[::4, ::4].astype(np.float32) # 4x downsample
br = np.where(sm > threshold, sm, 0)
for _ in range(3): # 3-pass manual box blur
p = np.pad(br, ((1,1),(1,1),(0,0)), mode='edge')
br = (p[:-2,:-2] + p[:-2,1:-1] + p[:-2,2:] +
p[1:-1,:-2] + p[1:-1,1:-1] + p[1:-1,2:] +
p[2:,:-2] + p[2:,1:-1] + p[2:,2:]) / 9.0
bl = np.repeat(np.repeat(br, 4, axis=0), 4, axis=1)[:H, :W]
```
## Vignette Caching
Distance field is resolution- and strength-dependent, never changes per frame:
```python
_vig_cache = {}
def sh_vignette(canvas, strength):
key = (canvas.shape[0], canvas.shape[1], round(strength, 2))
if key not in _vig_cache:
Y = np.linspace(-1, 1, H)[:, None]
X = np.linspace(-1, 1, W)[None, :]
_vig_cache[key] = np.clip(1.0 - np.sqrt(X**2+Y**2) * strength, 0.15, 1).astype(np.float32)
return np.clip(canvas * _vig_cache[key][:,:,None], 0, 255).astype(np.uint8)
```
Same pattern for CRT barrel distortion (cache remap coordinates).
## Film Grain Optimization
Generate noise at half resolution, tile up:
```python
noise = np.random.randint(-amt, amt+1, (H//2, W//2, 1), dtype=np.int16)
noise = np.repeat(np.repeat(noise, 2, axis=0), 2, axis=1)[:H, :W]
```
2x blocky grain looks like film grain and costs 1/4 the random generation.
## Parallel Rendering
### Worker Architecture
```python
hw = detect_hardware()
N_WORKERS = hw["workers"]
# Batch splitting (for non-clip architectures)
batch_size = (n_frames + N_WORKERS - 1) // N_WORKERS
batches = [(i, i*batch_size, min((i+1)*batch_size, n_frames), features, seg_path) ...]
with multiprocessing.Pool(N_WORKERS) as pool:
segments = pool.starmap(render_batch, batches)
```
### Per-Clip Parallelism (Preferred for Segmented Videos)
```python
from concurrent.futures import ProcessPoolExecutor, as_completed
with ProcessPoolExecutor(max_workers=N_WORKERS) as pool:
futures = {pool.submit(render_clip, seg, features, path): seg["id"]
for seg, path in clip_args}
for fut in as_completed(futures):
clip_id = futures[fut]
try:
fut.result()
log(f" {clip_id} done")
except Exception as e:
log(f" {clip_id} FAILED: {e}")
```
### Worker Isolation
Each worker:
- Creates its own `Renderer` instance (with full grid + bitmap init)
- Opens its own ffmpeg subprocess
- Has independent random seed (`random.seed(batch_id * 10000)`)
- Writes to its own segment file and stderr log
### ffmpeg Pipe Safety
**CRITICAL**: Never `stderr=subprocess.PIPE` with long-running ffmpeg. The stderr buffer fills at ~64KB and deadlocks:
```python
# WRONG -- will deadlock
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
# RIGHT -- stderr to file
stderr_fh = open(err_path, "w")
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.DEVNULL, stderr=stderr_fh)
# ... write all frames ...
pipe.stdin.close()
pipe.wait()
stderr_fh.close()
```
### Concatenation
```python
with open(concat_file, "w") as cf:
for seg in segments:
cf.write(f"file '{seg}'\n")
cmd = ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", concat_file]
if audio_path:
cmd += ["-i", audio_path, "-c:v", "copy", "-c:a", "aac", "-b:a", "192k", "-shortest"]
else:
cmd += ["-c:v", "copy"]
cmd.append(output_path)
subprocess.run(cmd, capture_output=True, check=True)
```
## Particle System Performance
Cap particle counts based on quality profile:
| System | Low | Standard | High |
|--------|-----|----------|------|
| Explosion | 300 | 1000 | 2500 |
| Embers | 500 | 1500 | 3000 |
| Starfield | 300 | 800 | 1500 |
| Dissolve | 200 | 600 | 1200 |
Cull by truncating lists:
```python
MAX_PARTICLES = profile.get("particles_max", 1200)
if len(S["px"]) > MAX_PARTICLES:
for k in ("px", "py", "vx", "vy", "life", "char"):
S[k] = S[k][-MAX_PARTICLES:] # keep newest
```
## Memory Management
- Feature arrays: pre-computed for all frames, shared across workers via fork semantics (COW)
- Canvas: allocated once per worker, reused (`np.zeros(...)`)
- Character arrays: allocated per frame (cheap -- rows*cols U1 strings)
- Bitmap cache: ~500KB per grid size, initialized once per worker
Total memory per worker: ~50-150MB. Total: ~400-800MB for 8 workers.
For low-memory systems (< 4GB), reduce worker count and use smaller grids.
## Brightness Verification
After render, spot-check brightness at sample timestamps:
```python
for t in [2, 30, 60, 120, 180]:
cmd = ["ffmpeg", "-ss", str(t), "-i", output_path,
"-frames:v", "1", "-f", "rawvideo", "-pix_fmt", "rgb24", "-"]
r = subprocess.run(cmd, capture_output=True)
arr = np.frombuffer(r.stdout, dtype=np.uint8)
print(f"t={t}s mean={arr.mean():.1f} max={arr.max()}")
```
Target: mean > 5 for quiet sections, mean > 15 for active sections. If consistently below, increase brightness floor in effects and/or global boost multiplier.
## Render Time Estimates
Scale with hardware. Baseline: 1080p, 24fps, ~180ms/frame/worker.
| Duration | Frames | 4 workers | 8 workers | 16 workers |
|----------|--------|-----------|-----------|------------|
| 30s | 720 | ~3 min | ~2 min | ~1 min |
| 2 min | 2,880 | ~13 min | ~7 min | ~4 min |
| 3.5 min | 5,040 | ~23 min | ~12 min | ~6 min |
| 5 min | 7,200 | ~33 min | ~17 min | ~9 min |
| 10 min | 14,400 | ~65 min | ~33 min | ~17 min |
At 720p: multiply times by ~0.5. At 4K: multiply by ~4.
Heavier effects (many particles, dense grids, extra shader passes) add ~20-50%.

View File

@@ -0,0 +1,382 @@
# Scene System Reference
Scenes are the top-level creative unit. Each scene is a time-bounded segment with its own effect function, shader chain, feedback configuration, and tone-mapping gamma.
## Scene Protocol (v2)
### Function Signature
```python
def fx_scene_name(r, f, t, S) -> canvas:
"""
Args:
r: Renderer instance — access multiple grids via r.get_grid("sm")
f: dict of audio/video features, all values normalized to [0, 1]
t: time in seconds (global, not local to scene)
S: dict for persistent state (particles, rain columns, etc.)
Returns:
canvas: numpy uint8 array, shape (VH, VW, 3) — full pixel frame
"""
```
This replaces the v1 protocol where scenes returned `(chars, colors)` tuples. The v2 protocol gives scenes full control over multi-grid rendering and pixel-level composition internally.
### The Renderer Class
```python
class Renderer:
def __init__(self):
self.grids = {} # lazy-initialized grid cache
self.g = None # "active" grid (for backward compat)
self.S = {} # persistent state dict
def get_grid(self, key):
"""Get or create a GridLayer by size key."""
if key not in self.grids:
sizes = {"xs": 8, "sm": 10, "md": 16, "lg": 20, "xl": 24, "xxl": 40}
self.grids[key] = GridLayer(FONT_PATH, sizes[key])
return self.grids[key]
def set_grid(self, key):
"""Set active grid (legacy). Prefer get_grid() for multi-grid scenes."""
self.g = self.get_grid(key)
return self.g
```
**Key difference from v1**: scenes call `r.get_grid("sm")`, `r.get_grid("lg")`, etc. to access multiple grids. Each grid is lazy-initialized and cached. The `set_grid()` method still works for single-grid scenes.
### Minimal Scene (Single Grid)
```python
def fx_simple_rings(r, f, t, S):
"""Single-grid scene: rings with distance-mapped hue."""
canvas = _render_vf(r, "md",
lambda g, f, t, S: vf_rings(g, f, t, S, n_base=8, spacing_base=3),
hf_distance(0.3, 0.02), PAL_STARS, f, t, S, sat=0.85)
return canvas
```
### Standard Scene (Two Grids + Blend)
```python
def fx_tunnel_ripple(r, f, t, S):
"""Two-grid scene: tunnel depth exclusion-blended with ripple."""
canvas_a = _render_vf(r, "md",
lambda g, f, t, S: vf_tunnel(g, f, t, S, speed=5.0, complexity=10) * 1.3,
hf_distance(0.55, 0.02), PAL_GREEK, f, t, S, sat=0.7)
canvas_b = _render_vf(r, "sm",
lambda g, f, t, S: vf_ripple(g, f, t, S,
sources=[(0.3,0.3), (0.7,0.7), (0.5,0.2)], freq=0.5, damping=0.012) * 1.4,
hf_angle(0.1), PAL_STARS, f, t, S, sat=0.8)
return blend_canvas(canvas_a, canvas_b, "exclusion", 0.8)
```
### Complex Scene (Three Grids + Conditional + Custom Rendering)
```python
def fx_rings_explosion(r, f, t, S):
"""Three-grid scene with particles and conditional kaleidoscope."""
# Layer 1: rings
canvas_a = _render_vf(r, "sm",
lambda g, f, t, S: vf_rings(g, f, t, S, n_base=10, spacing_base=2) * 1.4,
lambda g, f, t, S: (g.angle / (2*np.pi) + t * 0.15) % 1.0,
PAL_STARS, f, t, S, sat=0.9)
# Layer 2: vortex on different grid
canvas_b = _render_vf(r, "md",
lambda g, f, t, S: vf_vortex(g, f, t, S, twist=6.0) * 1.2,
hf_time_cycle(0.15), PAL_BLOCKS, f, t, S, sat=0.8)
result = blend_canvas(canvas_b, canvas_a, "screen", 0.7)
# Layer 3: particles (custom rendering, not _render_vf)
g = r.get_grid("sm")
if "px" not in S:
S["px"], S["py"], S["vx"], S["vy"], S["life"], S["pch"] = (
[], [], [], [], [], [])
if f.get("beat", 0) > 0.5:
chars = list("\u2605\u2736\u2733\u2738\u2726\u2728*+")
for _ in range(int(80 + f.get("rms", 0.3) * 120)):
ang = random.uniform(0, 2 * math.pi)
sp = random.uniform(1, 10) * (0.5 + f.get("sub_r", 0.3) * 2)
S["px"].append(float(g.cols // 2))
S["py"].append(float(g.rows // 2))
S["vx"].append(math.cos(ang) * sp * 2.5)
S["vy"].append(math.sin(ang) * sp)
S["life"].append(1.0)
S["pch"].append(random.choice(chars))
# Update + draw particles
ch_p = np.full((g.rows, g.cols), " ", dtype="U1")
co_p = np.zeros((g.rows, g.cols, 3), dtype=np.uint8)
i = 0
while i < len(S["px"]):
S["px"][i] += S["vx"][i]; S["py"][i] += S["vy"][i]
S["vy"][i] += 0.03; S["life"][i] -= 0.02
if S["life"][i] <= 0:
for k in ("px","py","vx","vy","life","pch"): S[k].pop(i)
else:
pr, pc = int(S["py"][i]), int(S["px"][i])
if 0 <= pr < g.rows and 0 <= pc < g.cols:
ch_p[pr, pc] = S["pch"][i]
co_p[pr, pc] = hsv2rgb_scalar(
0.08 + (1-S["life"][i])*0.15, 0.95, S["life"][i])
i += 1
canvas_p = g.render(ch_p, co_p)
result = blend_canvas(result, canvas_p, "add", 0.8)
# Conditional kaleidoscope on strong beats
if f.get("bdecay", 0) > 0.4:
result = sh_kaleidoscope(result.copy(), folds=6)
return result
```
### Scene with Custom Character Rendering (Matrix Rain)
When you need per-cell control beyond what `_render_vf()` provides:
```python
def fx_matrix_layered(r, f, t, S):
"""Matrix rain blended with tunnel — two grids, screen blend."""
# Layer 1: Matrix rain (custom per-column rendering)
g = r.get_grid("md")
rows, cols = g.rows, g.cols
pal = PAL_KATA
if "ry" not in S or len(S["ry"]) != cols:
S["ry"] = np.random.uniform(-rows, rows, cols).astype(np.float32)
S["rsp"] = np.random.uniform(0.3, 2.0, cols).astype(np.float32)
S["rln"] = np.random.randint(8, 35, cols)
S["rch"] = np.random.randint(1, len(pal), (rows, cols))
speed = 0.6 + f.get("bass", 0.3) * 3
if f.get("beat", 0) > 0.5: speed *= 2.5
S["ry"] += S["rsp"] * speed
ch = np.full((rows, cols), " ", dtype="U1")
co = np.zeros((rows, cols, 3), dtype=np.uint8)
heads = S["ry"].astype(int)
for c in range(cols):
head = heads[c]
for i in range(S["rln"][c]):
row = head - i
if 0 <= row < rows:
fade = 1.0 - i / S["rln"][c]
ch[row, c] = pal[S["rch"][row, c] % len(pal)]
if i == 0:
v = int(min(255, fade * 300))
co[row, c] = (int(v*0.9), v, int(v*0.9))
else:
v = int(fade * 240)
co[row, c] = (int(v*0.1), v, int(v*0.4))
canvas_a = g.render(ch, co)
# Layer 2: Tunnel on sm grid for depth texture
canvas_b = _render_vf(r, "sm",
lambda g, f, t, S: vf_tunnel(g, f, t, S, speed=5.0, complexity=10),
hf_distance(0.3, 0.02), PAL_BLOCKS, f, t, S, sat=0.6)
return blend_canvas(canvas_a, canvas_b, "screen", 0.5)
```
---
## Scene Table
The scene table defines the timeline: which scene plays when, with what configuration.
### Structure
```python
SCENES = [
{
"start": 0.0, # start time in seconds
"end": 3.96, # end time in seconds
"name": "starfield", # identifier (used for clip filenames)
"grid": "sm", # default grid (for render_clip setup)
"fx": fx_starfield, # scene function reference (must be module-level)
"gamma": 0.75, # tonemap gamma override (default 0.75)
"shaders": [ # shader chain (applied after tonemap + feedback)
("bloom", {"thr": 120}),
("vignette", {"s": 0.2}),
("grain", {"amt": 8}),
],
"feedback": None, # feedback buffer config (None = disabled)
# "feedback": {"decay": 0.8, "blend": "screen", "opacity": 0.3,
# "transform": "zoom", "transform_amt": 0.02, "hue_shift": 0.02},
},
{
"start": 3.96,
"end": 6.58,
"name": "matrix_layered",
"grid": "md",
"fx": fx_matrix_layered,
"shaders": [
("crt", {"strength": 0.05}),
("scanlines", {"intensity": 0.12}),
("color_grade", {"tint": (0.7, 1.2, 0.7)}),
("bloom", {"thr": 100}),
],
"feedback": {"decay": 0.5, "blend": "add", "opacity": 0.2},
},
# ... more scenes ...
]
```
### Beat-Synced Scene Cutting
Derive cut points from audio analysis:
```python
# Get beat timestamps
beats = [fi / FPS for fi in range(N_FRAMES) if features["beat"][fi] > 0.5]
# Group beats into phrase boundaries (every 4-8 beats)
cuts = [0.0]
for i in range(0, len(beats), 4): # cut every 4 beats
cuts.append(beats[i])
cuts.append(DURATION)
# Or use the music's structure: silence gaps, energy changes
energy = features["rms"]
# Find timestamps where energy drops significantly -> natural break points
```
### `render_clip()` — The Render Loop
This function renders one scene to a clip file:
```python
def render_clip(seg, features, clip_path):
r = Renderer()
r.set_grid(seg["grid"])
S = r.S
random.seed(hash(seg["id"]) + 42) # deterministic per scene
# Build shader chain from config
chain = ShaderChain()
for shader_name, kwargs in seg.get("shaders", []):
chain.add(shader_name, **kwargs)
# Setup feedback buffer
fb = None
fb_cfg = seg.get("feedback", None)
if fb_cfg:
fb = FeedbackBuffer()
fx_fn = seg["fx"]
# Open ffmpeg pipe
cmd = ["ffmpeg", "-y", "-f", "rawvideo", "-pix_fmt", "rgb24",
"-s", f"{VW}x{VH}", "-r", str(FPS), "-i", "pipe:0",
"-c:v", "libx264", "-preset", "fast", "-crf", "20",
"-pix_fmt", "yuv420p", clip_path]
stderr_fh = open(clip_path.replace(".mp4", ".log"), "w")
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE,
stdout=subprocess.DEVNULL, stderr=stderr_fh)
for fi in range(seg["frame_start"], seg["frame_end"]):
t = fi / FPS
feat = {k: float(features[k][fi]) for k in features}
# 1. Scene renders canvas
canvas = fx_fn(r, feat, t, S)
# 2. Tonemap normalizes brightness
canvas = tonemap(canvas, gamma=seg.get("gamma", 0.75))
# 3. Feedback adds temporal recursion
if fb and fb_cfg:
canvas = fb.apply(canvas, **{k: fb_cfg[k] for k in fb_cfg})
# 4. Shader chain adds post-processing
canvas = chain.apply(canvas, f=feat, t=t)
pipe.stdin.write(canvas.tobytes())
pipe.stdin.close(); pipe.wait(); stderr_fh.close()
```
### Building Segments from Scene Table
```python
segments = []
for i, scene in enumerate(SCENES):
segments.append({
"id": f"s{i:02d}_{scene['name']}",
"name": scene["name"],
"grid": scene["grid"],
"fx": scene["fx"],
"shaders": scene.get("shaders", []),
"feedback": scene.get("feedback", None),
"gamma": scene.get("gamma", 0.75),
"frame_start": int(scene["start"] * FPS),
"frame_end": int(scene["end"] * FPS),
})
```
### Parallel Rendering
Scenes are independent units dispatched to a process pool:
```python
from concurrent.futures import ProcessPoolExecutor, as_completed
with ProcessPoolExecutor(max_workers=N_WORKERS) as pool:
futures = {
pool.submit(render_clip, seg, features, clip_path): seg["id"]
for seg, clip_path in zip(segments, clip_paths)
}
for fut in as_completed(futures):
try:
fut.result()
except Exception as e:
log(f"ERROR {futures[fut]}: {e}")
```
**Pickling constraint**: `ProcessPoolExecutor` serializes arguments via pickle. Module-level functions can be pickled; lambdas and closures cannot. All `fx_*` scene functions MUST be defined at module level, not as closures or class methods.
### Test-Frame Mode
Render a single frame at a specific timestamp to verify visuals without a full render:
```python
if args.test_frame >= 0:
fi = min(int(args.test_frame * FPS), N_FRAMES - 1)
t = fi / FPS
feat = {k: float(features[k][fi]) for k in features}
scene = next(sc for sc in reversed(SCENES) if t >= sc["start"])
r = Renderer()
r.set_grid(scene["grid"])
canvas = scene["fx"](r, feat, t, r.S)
canvas = tonemap(canvas, gamma=scene.get("gamma", 0.75))
chain = ShaderChain()
for sn, kw in scene.get("shaders", []):
chain.add(sn, **kw)
canvas = chain.apply(canvas, f=feat, t=t)
Image.fromarray(canvas).save(f"test_{args.test_frame:.1f}s.png")
print(f"Mean brightness: {canvas.astype(float).mean():.1f}")
```
CLI: `python reel.py --test-frame 10.0`
---
## Scene Design Checklist
For each scene:
1. **Choose 2-3 grid sizes** — different scales create interference
2. **Choose different value fields** per layer — don't use the same effect on every grid
3. **Choose different hue fields** per layer — or at minimum different hue offsets
4. **Choose different palettes** per layer — mixing PAL_RUNE with PAL_BLOCKS looks different from PAL_RUNE with PAL_DENSE
5. **Choose a blend mode** that matches the energy — screen for bright, difference for psychedelic, exclusion for subtle
6. **Add conditional effects** on beat — kaleidoscope, mirror, glitch
7. **Configure feedback** for trailing/recursive looks — or None for clean cuts
8. **Set gamma** if using destructive shaders (solarize, posterize)
9. **Test with --test-frame** at the scene's midpoint before full render

File diff suppressed because it is too large Load Diff

View File

@@ -0,0 +1,331 @@
# Troubleshooting Reference
Common bugs, gotchas, and platform-specific issues encountered during ASCII video development.
## NumPy Broadcasting
### The `broadcast_to().copy()` Trap
Hue field generators often return arrays that are broadcast views — they have shape `(1, cols)` or `(rows, 1)` that numpy broadcasts to `(rows, cols)`. These views are **read-only**. If any downstream code tries to modify them in-place (e.g., `h %= 1.0`), numpy raises:
```
ValueError: output array is read-only
```
**Fix**: Always `.copy()` after `broadcast_to()`:
```python
h = np.broadcast_to(h, (g.rows, g.cols)).copy()
```
This is especially important in `_render_vf()` where hue arrays flow through `hsv2rgb()`.
### The `+=` vs `+` Trap
Broadcasting also fails with in-place operators when operand shapes don't match exactly:
```python
# FAILS if result is (rows,1) and operand is (rows, cols)
val += np.sin(g.cc * 0.02 + t * 0.3) * 0.5
# WORKS — creates a new array
val = val + np.sin(g.cc * 0.02 + t * 0.3) * 0.5
```
The `vf_plasma()` function had this bug. Use `+` instead of `+=` when mixing different-shaped arrays.
### Shape Mismatch in `hsv2rgb()`
`hsv2rgb(h, s, v)` requires all three arrays to have identical shapes. If `h` is `(1, cols)` and `s` is `(rows, cols)`, the function crashes or produces wrong output.
**Fix**: Ensure all inputs are broadcast and copied to `(rows, cols)` before calling.
---
## Blend Mode Pitfalls
### Overlay Crushes Dark Inputs
`overlay(a, b) = 2*a*b` when `a < 0.5`. Two values of 0.12 produce `2 * 0.12 * 0.12 = 0.03`. The result is darker than either input.
**Impact**: If both layers are dark (which ASCII art usually is), overlay produces near-black output.
**Fix**: Use `screen` for dark source material. Screen always brightens: `1 - (1-a)*(1-b)`.
### Colordodge Division by Zero
`colordodge(a, b) = a / (1 - b)`. When `b = 1.0` (pure white pixels), this divides by zero.
**Fix**: Add epsilon: `a / (1 - b + 1e-6)`. The implementation in `BLEND_MODES` should include this.
### Colorburn Division by Zero
`colorburn(a, b) = 1 - (1-a) / b`. When `b = 0` (pure black pixels), this divides by zero.
**Fix**: Add epsilon: `1 - (1-a) / (b + 1e-6)`.
### Multiply Always Darkens
`multiply(a, b) = a * b`. Since both operands are [0,1], the result is always <= min(a,b). Never use multiply as a feedback blend mode — the frame goes black within a few frames.
**Fix**: Use `screen` for feedback, or `add` with low opacity.
---
## Multiprocessing
### Pickling Constraints
`ProcessPoolExecutor` serializes function arguments via pickle. This constrains what you can pass to workers:
| Can Pickle | Cannot Pickle |
|-----------|---------------|
| Module-level functions (`def fx_foo():`) | Lambdas (`lambda x: x + 1`) |
| Dicts, lists, numpy arrays | Closures (functions defined inside functions) |
| Class instances (with `__reduce__`) | Instance methods |
| Strings, numbers | File handles, sockets |
**Impact**: All scene functions referenced in the SCENES table must be defined at module level with `def`. If you use a lambda or closure, you get:
```
_pickle.PicklingError: Can't pickle <function <lambda> at 0x...>
```
**Fix**: Define all scene functions at module top level. Lambdas used inside `_render_vf()` as val_fn/hue_fn are fine because they execute within the worker process — they're not pickled across process boundaries.
### macOS spawn vs Linux fork
On macOS, `multiprocessing` defaults to `spawn` (full serialization). On Linux, it defaults to `fork` (copy-on-write). This means:
- **macOS**: Feature arrays are serialized per worker (~57KB for 30s video, but scales with duration). Each worker re-imports the entire module.
- **Linux**: Feature arrays are shared via COW. Workers inherit the parent's memory.
**Impact**: On macOS, module-level code (like `detect_hardware()`) runs in every worker process. If it has side effects (e.g., subprocess calls), those happen N+1 times.
### Per-Worker State Isolation
Each worker creates its own:
- `Renderer` instance (with fresh grid cache)
- `FeedbackBuffer` (feedback doesn't cross scene boundaries)
- Random seed (`random.seed(hash(seg_id) + 42)`)
This means:
- Particle state doesn't carry between scenes (expected)
- Feedback trails reset at scene cuts (expected)
- `np.random` state is NOT seeded by `random.seed()` — they use separate RNGs
**Fix for deterministic noise**: Use `np.random.RandomState(seed)` explicitly:
```python
rng = np.random.RandomState(hash(seg_id) + 42)
noise = rng.random((rows, cols))
```
---
## Brightness Issues
### Dark Scenes After Tonemap
If a scene is still dark after tonemap, check:
1. **Gamma too high**: Lower gamma (0.5-0.6) for scenes with destructive post-processing
2. **Shader destroying brightness**: Solarize, posterize, or contrast adjustments in the shader chain can undo tonemap's work. Move destructive shaders earlier in the chain, or increase gamma to compensate.
3. **Feedback with multiply**: Multiply feedback darkens every frame. Switch to screen or add.
4. **Overlay blend in scene**: If the scene function uses `blend_canvas(..., "overlay", ...)` with dark layers, switch to screen.
### Diagnostic: Test-Frame Brightness
```bash
python reel.py --test-frame 10.0
# Output: Mean brightness: 44.3, max: 255
```
If mean < 20, the scene needs attention. Common fixes:
- Lower gamma in the SCENES entry
- Change internal blend modes from overlay/multiply to screen/add
- Increase value field multipliers (e.g., `vf_plasma(...) * 1.5`)
- Check that the shader chain doesn't have an aggressive solarize or threshold
### v1 Brightness Pattern (Deprecated)
The old pattern used a linear multiplier:
```python
# OLD — don't use
canvas = np.clip(canvas.astype(np.float32) * 2.0, 0, 255).astype(np.uint8)
```
This fails because:
- Dark scenes (mean 8): `8 * 2.0 = 16` — still dark
- Bright scenes (mean 130): `130 * 2.0 = 255` — clipped, lost detail
Use `tonemap()` instead. See `composition.md` § Adaptive Tone Mapping.
---
## ffmpeg Issues
### Pipe Deadlock
The #1 production bug. If you use `stderr=subprocess.PIPE`:
```python
# DEADLOCK — stderr buffer fills at 64KB, blocks ffmpeg, blocks your writes
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE, stderr=subprocess.PIPE)
```
**Fix**: Always redirect stderr to a file:
```python
stderr_fh = open(err_path, "w")
pipe = subprocess.Popen(cmd, stdin=subprocess.PIPE,
stdout=subprocess.DEVNULL, stderr=stderr_fh)
```
### Frame Count Mismatch
If the number of frames written to the pipe doesn't match what ffmpeg expects (based on `-r` and duration), the output may have:
- Missing frames at the end
- Incorrect duration
- Audio-video desync
**Fix**: Calculate frame count explicitly: `n_frames = int(duration * FPS)`. Don't use `range(int(start*FPS), int(end*FPS))` without verifying the total matches.
### Concat Fails with "unsafe file name"
```
[concat @ ...] Unsafe file name
```
**Fix**: Always use `-safe 0`:
```python
["ffmpeg", "-f", "concat", "-safe", "0", "-i", concat_path, ...]
```
---
## Font Issues
### Cell Height (macOS Pillow)
`textbbox()` and `getbbox()` return incorrect heights on some macOS Pillow versions. Use `getmetrics()`:
```python
ascent, descent = font.getmetrics()
cell_height = ascent + descent # correct
# NOT: font.getbbox("M")[3] # wrong on some versions
```
### Missing Unicode Glyphs
Not all fonts render all Unicode characters. If a palette character isn't in the font, the glyph renders as a blank or tofu box, appearing as a dark hole in the output.
**Fix**: Validate at init:
```python
all_chars = set()
for pal in [PAL_DEFAULT, PAL_DENSE, PAL_RUNE, ...]:
all_chars.update(pal)
valid_chars = set()
for c in all_chars:
if c == " ":
valid_chars.add(c)
continue
img = Image.new("L", (20, 20), 0)
ImageDraw.Draw(img).text((0, 0), c, fill=255, font=font)
if np.array(img).max() > 0:
valid_chars.add(c)
else:
log(f"WARNING: '{c}' (U+{ord(c):04X}) missing from font")
```
### Platform Font Paths
| Platform | Common Paths |
|----------|-------------|
| macOS | `/System/Library/Fonts/Menlo.ttc`, `/System/Library/Fonts/Monaco.ttf` |
| Linux | `/usr/share/fonts/truetype/dejavu/DejaVuSansMono.ttf` |
| Windows | `C:\Windows\Fonts\consola.ttf` (Consolas) |
Always probe multiple paths and fall back gracefully. See `architecture.md` § Font Selection.
---
## Performance
### Slow Shaders
Some shaders use Python loops and are very slow at 1080p:
| Shader | Issue | Fix |
|--------|-------|-----|
| `wave_distort` | Per-row Python loop | Use vectorized fancy indexing |
| `halftone` | Triple-nested loop | Vectorize with block reduction |
| `matrix rain` | Per-column per-trail loop | Accumulate index arrays, bulk assign |
### Render Time Scaling
If render is taking much longer than expected:
1. Check grid count — each extra grid adds ~100-150ms/frame for init
2. Check particle count — cap at quality-appropriate limits
3. Check shader count — each shader adds 2-25ms
4. Check for accidental Python loops in effects (should be numpy only)
---
## Common Mistakes
### Using `r.S` vs the `S` Parameter
The v2 scene protocol passes `S` (the state dict) as an explicit parameter. But `S` IS `r.S` — they're the same object. Both work:
```python
def fx_scene(r, f, t, S):
S["counter"] = S.get("counter", 0) + 1 # via parameter (preferred)
r.S["counter"] = r.S.get("counter", 0) + 1 # via renderer (also works)
```
Use the `S` parameter for clarity. The explicit parameter makes it obvious that the function has persistent state.
### Forgetting to Handle Empty Feature Values
Audio features default to 0.0 if the audio is silent. Use `.get()` with sensible defaults:
```python
energy = f.get("bass", 0.3) # default to 0.3, not 0
```
If you default to 0, effects go blank during silence.
### Writing New Files Instead of Editing Existing State
A common bug in particle systems: creating new arrays every frame instead of updating persistent state.
```python
# WRONG — particles reset every frame
S["px"] = []
for _ in range(100):
S["px"].append(random.random())
# RIGHT — only initialize once, update each frame
if "px" not in S:
S["px"] = []
# ... emit new particles based on beats
# ... update existing particles
```
### Not Clipping Value Fields
Value fields should be [0, 1]. If they exceed this range, `val2char()` produces index errors:
```python
# WRONG — vf_plasma() * 1.5 can exceed 1.0
val = vf_plasma(g, f, t, S) * 1.5
# RIGHT — clip after scaling
val = np.clip(vf_plasma(g, f, t, S) * 1.5, 0, 1)
```
The `_render_vf()` helper clips automatically, but if you're building custom scenes, clip explicitly.

View File

@@ -0,0 +1,215 @@
---
name: pokemon-player
description: Play Pokemon games autonomously via headless emulation. Starts a game server, reads structured game state from RAM, makes strategic decisions, and sends button inputs — all from the terminal.
tags: [gaming, pokemon, emulator, pyboy, gameplay, gameboy]
---
# Pokemon Player
Play Pokemon games via headless emulation using the `pokemon-agent` package.
## When to Use
- User says "play pokemon", "start pokemon", "pokemon game"
- User asks about Pokemon Red, Blue, Yellow, FireRed, etc.
- User wants to watch an AI play Pokemon
- User references a ROM file (.gb, .gbc, .gba)
## Startup Procedure
### 1. First-time setup (clone, venv, install)
The repo is NousResearch/pokemon-agent on GitHub. Clone it, then
set up a Python 3.10+ virtual environment. Use uv (preferred for speed)
to create the venv and install the package in editable mode with the
pyboy extra. If uv is not available, fall back to python3 -m venv + pip.
On this machine it is already set up at /home/teknium/pokemon-agent
with a venv ready — just cd there and source .venv/bin/activate.
You also need a ROM file. Ask the user for theirs. On this machine
one exists at roms/pokemon_red.gb inside that directory.
NEVER download or provide ROM files — always ask the user.
### 2. Start the game server
From inside the pokemon-agent directory with the venv activated, run
pokemon-agent serve with --rom pointing to the ROM and --port 9876.
Run it in the background with &.
To resume from a saved game, add --load-state with the save name.
Wait 4 seconds for startup, then verify with GET /health.
### 3. Set up live dashboard for user to watch
Use an SSH reverse tunnel via localhost.run so the user can view
the dashboard in their browser. Connect with ssh, forwarding local
port 9876 to remote port 80 on nokey@localhost.run. Redirect output
to a log file, wait 10 seconds, then grep the log for the .lhr.life
URL. Give the user the URL with /dashboard/ appended.
The tunnel URL changes each time — give the user the new one if restarted.
## Save and Load
### When to save
- Every 15-20 turns of gameplay
- ALWAYS before gym battles, rival encounters, or risky fights
- Before entering a new town or dungeon
- Before any action you are unsure about
### How to save
POST /save with a descriptive name. Good examples:
before_brock, route1_start, mt_moon_entrance, got_cut
### How to load
POST /load with the save name.
### List available saves
GET /saves returns all saved states.
### Loading on server startup
Use --load-state flag when starting the server to auto-load a save.
This is faster than loading via the API after startup.
## The Gameplay Loop
### Step 1: OBSERVE — check state AND take a screenshot
GET /state for position, HP, battle, dialog.
GET /screenshot and save to /tmp/pokemon.png, then use vision_analyze.
Always do BOTH — RAM state gives numbers, vision gives spatial awareness.
### Step 2: ORIENT
- Dialog/text on screen → advance it
- In battle → fight or run
- Party hurt → head to Pokemon Center
- Near objective → navigate carefully
### Step 3: DECIDE
Priority: dialog > battle > heal > story objective > training > explore
### Step 4: ACT — move 2-4 steps max, then re-check
POST /action with a SHORT action list (2-4 actions, not 10-15).
### Step 5: VERIFY — screenshot after every move sequence
Take a screenshot and use vision_analyze to confirm you moved where
intended. This is the MOST IMPORTANT step. Without vision you WILL get lost.
### Step 6: RECORD progress to memory with PKM: prefix
### Step 7: SAVE periodically
## Action Reference
- press_a — confirm, talk, select
- press_b — cancel, close menu
- press_start — open game menu
- walk_up/down/left/right — move one tile
- hold_b_N — hold B for N frames (use for speeding through text)
- wait_60 — wait about 1 second (60 frames)
- a_until_dialog_end — press A repeatedly until dialog clears
## Critical Tips from Experience
### USE VISION CONSTANTLY
- Take a screenshot every 2-4 movement steps
- The RAM state tells you position and HP but NOT what is around you
- Ledges, fences, signs, building doors, NPCs — only visible via screenshot
- Ask the vision model specific questions: "what is one tile north of me?"
- When stuck, always screenshot before trying random directions
### Warp Transitions Need Extra Wait Time
When walking through a door or stairs, the screen fades to black during
the map transition. You MUST wait for it to complete. Add 2-3 wait_60
actions after any door/stair warp. Without waiting, the position reads
as stale and you will think you are still in the old map.
### Building Exit Trap
When you exit a building, you appear directly IN FRONT of the door.
If you walk north, you go right back inside. ALWAYS sidestep first
by walking left or right 2 tiles, then proceed in your intended direction.
### Dialog Handling
Gen 1 text scrolls slowly letter-by-letter. To speed through dialog,
hold B for 120 frames then press A. Repeat as needed. Holding B makes
text display at max speed. Then press A to advance to the next line.
The a_until_dialog_end action checks the RAM dialog flag, but this flag
does not catch ALL text states. If dialog seems stuck, use the manual
hold_b + press_a pattern instead and verify via screenshot.
### Ledges Are One-Way
Ledges (small cliff edges) can only be jumped DOWN (south), never climbed
UP (north). If blocked by a ledge going north, you must go left or right
to find the gap around it. Use vision to identify which direction the
gap is. Ask the vision model explicitly.
### Navigation Strategy
- Move 2-4 steps at a time, then screenshot to check position
- When entering a new area, screenshot immediately to orient
- Ask the vision model "which direction to [destination]?"
- If stuck for 3+ attempts, screenshot and re-evaluate completely
- Do not spam 10-15 movements — you will overshoot or get stuck
### Running from Wild Battles
On the battle menu, RUN is bottom-right. To reach it from the default
cursor position (FIGHT, top-left): press down then right to move cursor
to RUN, then press A. Wrap with hold_b to speed through text/animations.
### Battling (FIGHT)
On the battle menu FIGHT is top-left (default cursor position).
Press A to enter move selection, A again to use the first move.
Then hold B to speed through attack animations and text.
## Battle Strategy
### Decision Tree
1. Want to catch? → Weaken then throw Poke Ball
2. Wild you don't need? → RUN
3. Type advantage? → Use super-effective move
4. No advantage? → Use strongest STAB move
5. Low HP? → Switch or use Potion
### Gen 1 Type Chart (key matchups)
- Water beats Fire, Ground, Rock
- Fire beats Grass, Bug, Ice
- Grass beats Water, Ground, Rock
- Electric beats Water, Flying
- Ground beats Fire, Electric, Rock, Poison
- Psychic beats Fighting, Poison (dominant in Gen 1!)
### Gen 1 Quirks
- Special stat = both offense AND defense for special moves
- Psychic type is overpowered (Ghost moves bugged)
- Critical hits based on Speed stat
- Wrap/Bind prevent opponent from acting
- Focus Energy bug: REDUCES crit rate instead of raising it
## Memory Conventions
| Prefix | Purpose | Example |
|--------|---------|---------|
| PKM:OBJECTIVE | Current goal | Get Parcel from Viridian Mart |
| PKM:MAP | Navigation knowledge | Viridian: mart is northeast |
| PKM:STRATEGY | Battle/team plans | Need Grass type before Misty |
| PKM:PROGRESS | Milestone tracker | Beat rival, heading to Viridian |
| PKM:STUCK | Stuck situations | Ledge at y=28 go right to bypass |
| PKM:TEAM | Team notes | Squirtle Lv6, Tackle + Tail Whip |
## Progression Milestones
- Choose starter
- Deliver Parcel from Viridian Mart, receive Pokedex
- Boulder Badge — Brock (Rock) → use Water/Grass
- Cascade Badge — Misty (Water) → use Grass/Electric
- Thunder Badge — Lt. Surge (Electric) → use Ground
- Rainbow Badge — Erika (Grass) → use Fire/Ice/Flying
- Soul Badge — Koga (Poison) → use Ground/Psychic
- Marsh Badge — Sabrina (Psychic) → hardest gym
- Volcano Badge — Blaine (Fire) → use Water/Ground
- Earth Badge — Giovanni (Ground) → use Water/Grass/Ice
- Elite Four → Champion!
## Stopping Play
1. Save the game with a descriptive name via POST /save
2. Update memory with PKM:PROGRESS
3. Tell user: "Game saved as [name]! Say 'play pokemon' to resume."
4. Kill the server and tunnel background processes
## Pitfalls
- NEVER download or provide ROM files
- Do NOT send more than 4-5 actions without checking vision
- Always sidestep after exiting buildings before going north
- Always add wait_60 x2-3 after door/stair warps
- Dialog detection via RAM is unreliable — verify with screenshots
- Save BEFORE risky encounters
- The tunnel URL changes each time you restart it

View File

@@ -0,0 +1,69 @@
---
name: find-nearby
description: Find nearby places (restaurants, cafes, bars, pharmacies, etc.) using OpenStreetMap. Works with coordinates, addresses, cities, zip codes, or Telegram location pins. No API keys needed.
version: 1.0.0
metadata:
hermes:
tags: [location, maps, nearby, places, restaurants, local]
related_skills: []
---
# Find Nearby — Local Place Discovery
Find restaurants, cafes, bars, pharmacies, and other places near any location. Uses OpenStreetMap (free, no API keys). Works with:
- **Coordinates** from Telegram location pins (latitude/longitude in conversation)
- **Addresses** ("near 123 Main St, Springfield")
- **Cities** ("restaurants in downtown Austin")
- **Zip codes** ("pharmacies near 90210")
- **Landmarks** ("cafes near Times Square")
## Quick Reference
```bash
# By coordinates (from Telegram location pin or user-provided)
python3 SKILL_DIR/scripts/find_nearby.py --lat <LAT> --lon <LON> --type restaurant --radius 1500
# By address, city, or landmark (auto-geocoded)
python3 SKILL_DIR/scripts/find_nearby.py --near "Times Square, New York" --type cafe
# Multiple place types
python3 SKILL_DIR/scripts/find_nearby.py --near "downtown austin" --type restaurant --type bar --limit 10
# JSON output
python3 SKILL_DIR/scripts/find_nearby.py --near "90210" --type pharmacy --json
```
### Parameters
| Flag | Description | Default |
|------|-------------|---------|
| `--lat`, `--lon` | Exact coordinates | — |
| `--near` | Address, city, zip, or landmark (geocoded) | — |
| `--type` | Place type (repeatable for multiple) | restaurant |
| `--radius` | Search radius in meters | 1500 |
| `--limit` | Max results | 15 |
| `--json` | Machine-readable JSON output | off |
### Common Place Types
`restaurant`, `cafe`, `bar`, `pub`, `fast_food`, `pharmacy`, `hospital`, `bank`, `atm`, `fuel`, `parking`, `supermarket`, `convenience`, `hotel`
## Workflow
1. **Get the location.** Look for coordinates (`latitude: ... / longitude: ...`) from a Telegram pin, or ask the user for an address/city/zip.
2. **Ask for preferences** (only if not already stated): place type, how far they're willing to go, any specifics (cuisine, "open now", etc.).
3. **Run the script** with appropriate flags. Use `--json` if you need to process results programmatically.
4. **Present results** with names, distances, and Google Maps links. If the user asked about hours or "open now," check the `hours` field in results — if missing or unclear, verify with `web_search`.
5. **For directions**, use the `directions_url` from results, or construct: `https://www.google.com/maps/dir/?api=1&origin=<LAT>,<LON>&destination=<LAT>,<LON>`
## Tips
- If results are sparse, widen the radius (1500 → 3000m)
- For "open now" requests: check the `hours` field in results, cross-reference with `web_search` for accuracy since OSM hours aren't always complete
- Zip codes alone can be ambiguous globally — prompt the user for country/state if results look wrong
- The script uses OpenStreetMap data which is community-maintained; coverage varies by region

View File

@@ -0,0 +1,184 @@
#!/usr/bin/env python3
"""Find nearby places using OpenStreetMap (Overpass + Nominatim). No API keys needed.
Usage:
# By coordinates
python find_nearby.py --lat 36.17 --lon -115.14 --type restaurant --radius 1500
# By address/city/zip (auto-geocoded)
python find_nearby.py --near "Times Square, New York" --type cafe --radius 1000
python find_nearby.py --near "90210" --type pharmacy
# Multiple types
python find_nearby.py --lat 36.17 --lon -115.14 --type restaurant --type bar
# JSON output for programmatic use
python find_nearby.py --near "downtown las vegas" --type restaurant --json
"""
import argparse
import json
import math
import sys
import urllib.parse
import urllib.request
from typing import Any
OVERPASS_URLS = [
"https://overpass-api.de/api/interpreter",
"https://overpass.kumi.systems/api/interpreter",
]
NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
USER_AGENT = "HermesAgent/1.0 (find-nearby skill)"
TIMEOUT = 15
def _http_get(url: str) -> Any:
req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
with urllib.request.urlopen(req, timeout=TIMEOUT) as r:
return json.loads(r.read())
def _http_post(url: str, data: str) -> Any:
req = urllib.request.Request(
url, data=data.encode(), headers={"User-Agent": USER_AGENT}
)
with urllib.request.urlopen(req, timeout=TIMEOUT) as r:
return json.loads(r.read())
def haversine(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
"""Distance in meters between two coordinates."""
R = 6_371_000
rlat1, rlat2 = math.radians(lat1), math.radians(lat2)
dlat = math.radians(lat2 - lat1)
dlon = math.radians(lon2 - lon1)
a = math.sin(dlat / 2) ** 2 + math.cos(rlat1) * math.cos(rlat2) * math.sin(dlon / 2) ** 2
return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
def geocode(query: str) -> tuple[float, float]:
"""Convert address/city/zip to coordinates via Nominatim."""
params = urllib.parse.urlencode({"q": query, "format": "json", "limit": 1})
results = _http_get(f"{NOMINATIM_URL}?{params}")
if not results:
print(f"Error: Could not geocode '{query}'. Try a more specific address.", file=sys.stderr)
sys.exit(1)
return float(results[0]["lat"]), float(results[0]["lon"])
def find_nearby(lat: float, lon: float, types: list[str], radius: int = 1500, limit: int = 15) -> list[dict]:
"""Query Overpass for nearby amenities."""
# Build Overpass QL query
type_filters = "".join(
f'nwr["amenity"="{t}"](around:{radius},{lat},{lon});' for t in types
)
query = f"[out:json][timeout:{TIMEOUT}];({type_filters});out center tags;"
# Try each Overpass server
data = None
for url in OVERPASS_URLS:
try:
data = _http_post(url, f"data={urllib.parse.quote(query)}")
break
except Exception:
continue
if not data:
return []
# Parse results
places = []
for el in data.get("elements", []):
tags = el.get("tags", {})
name = tags.get("name")
if not name:
continue
# Get coordinates (nodes have lat/lon directly, ways/relations use center)
plat = el.get("lat") or (el.get("center", {}) or {}).get("lat")
plon = el.get("lon") or (el.get("center", {}) or {}).get("lon")
if not plat or not plon:
continue
dist = haversine(lat, lon, plat, plon)
place = {
"name": name,
"type": tags.get("amenity", ""),
"distance_m": round(dist),
"lat": plat,
"lon": plon,
"maps_url": f"https://www.google.com/maps/search/?api=1&query={plat},{plon}",
"directions_url": f"https://www.google.com/maps/dir/?api=1&origin={lat},{lon}&destination={plat},{plon}",
}
# Add useful optional fields
if tags.get("cuisine"):
place["cuisine"] = tags["cuisine"]
if tags.get("opening_hours"):
place["hours"] = tags["opening_hours"]
if tags.get("phone"):
place["phone"] = tags["phone"]
if tags.get("website"):
place["website"] = tags["website"]
if tags.get("addr:street"):
addr_parts = [tags.get("addr:housenumber", ""), tags.get("addr:street", "")]
if tags.get("addr:city"):
addr_parts.append(tags["addr:city"])
place["address"] = " ".join(p for p in addr_parts if p)
places.append(place)
# Sort by distance, limit results
places.sort(key=lambda p: p["distance_m"])
return places[:limit]
def main():
parser = argparse.ArgumentParser(description="Find nearby places via OpenStreetMap")
parser.add_argument("--lat", type=float, help="Latitude")
parser.add_argument("--lon", type=float, help="Longitude")
parser.add_argument("--near", type=str, help="Address, city, or zip code (geocoded automatically)")
parser.add_argument("--type", action="append", dest="types", default=[], help="Place type (restaurant, cafe, bar, pharmacy, etc.)")
parser.add_argument("--radius", type=int, default=1500, help="Search radius in meters (default: 1500)")
parser.add_argument("--limit", type=int, default=15, help="Max results (default: 15)")
parser.add_argument("--json", action="store_true", dest="json_output", help="Output as JSON")
args = parser.parse_args()
# Resolve coordinates
if args.near:
lat, lon = geocode(args.near)
elif args.lat is not None and args.lon is not None:
lat, lon = args.lat, args.lon
else:
print("Error: Provide --lat/--lon or --near", file=sys.stderr)
sys.exit(1)
if not args.types:
args.types = ["restaurant"]
places = find_nearby(lat, lon, args.types, args.radius, args.limit)
if args.json_output:
print(json.dumps({"origin": {"lat": lat, "lon": lon}, "results": places, "count": len(places)}, indent=2))
else:
if not places:
print(f"No {'/'.join(args.types)} found within {args.radius}m")
return
print(f"Found {len(places)} places within {args.radius}m:\n")
for i, p in enumerate(places, 1):
dist_str = f"{p['distance_m']}m" if p["distance_m"] < 1000 else f"{p['distance_m']/1000:.1f}km"
print(f" {i}. {p['name']} ({p['type']}) — {dist_str}")
if p.get("cuisine"):
print(f" Cuisine: {p['cuisine']}")
if p.get("hours"):
print(f" Hours: {p['hours']}")
if p.get("address"):
print(f" Address: {p['address']}")
print(f" Map: {p['maps_url']}")
print()
if __name__ == "__main__":
main()

View File

@@ -321,6 +321,32 @@ mcp_servers:
All tools from all servers are registered and available simultaneously. Each server's tools are prefixed with its name to avoid collisions.
## Sampling (Server-Initiated LLM Requests)
Hermes supports MCP's `sampling/createMessage` capability — MCP servers can request LLM completions through the agent during tool execution. This enables agent-in-the-loop workflows (data analysis, content generation, decision-making).
Sampling is **enabled by default**. Configure per server:
```yaml
mcp_servers:
my_server:
command: "npx"
args: ["-y", "my-mcp-server"]
sampling:
enabled: true # default: true
model: "gemini-3-flash" # model override (optional)
max_tokens_cap: 4096 # max tokens per request
timeout: 30 # LLM call timeout (seconds)
max_rpm: 10 # max requests per minute
allowed_models: [] # model whitelist (empty = all)
max_tool_rounds: 5 # tool loop limit (0 = disable)
log_level: "info" # audit verbosity
```
Servers can also include `tools` in sampling requests for multi-turn tool-augmented workflows. The `max_tool_rounds` config prevents infinite tool loops. Per-server audit metrics (requests, errors, tokens, tool use count) are tracked via `get_mcp_status()`.
Disable sampling for untrusted servers with `sampling: { enabled: false }`.
## Notes
- MCP tools are called synchronously from the agent's perspective but run asynchronously on a dedicated background event loop

View File

@@ -1 +1,3 @@
Media content extraction and transformation tools — YouTube transcripts, audio, video processing.
---
description: Skills for working with media content — YouTube transcripts, GIF search, music generation, and audio visualization.
---

View File

@@ -0,0 +1,3 @@
---
description: GPU cloud providers and serverless compute platforms for ML workloads.
---

View File

@@ -0,0 +1,3 @@
---
description: Model evaluation benchmarks, experiment tracking, data curation, tokenizers, and interpretability tools.
---

Some files were not shown because too many files have changed in this diff Show More