- Add `shopt -s expand_aliases` to snapshot so aliases captured by
  `alias -p` actually work under `bash -c` (review comment #2; see the
  sketch after this list)
- Pass threshold=0 in enforce_turn_budget() so Layer 3 can force-persist
  results below the 50K default when the aggregate budget is exceeded
  (review comment #3)
- Add regression test: 6×42K results (each under 50K) that exceed the
  200K budget are now correctly persisted
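A sketch of the snapshot prelude, assuming the snapshot replays `alias -p`
output under `bash -c`; the helper name is illustrative:

```python
def build_alias_snapshot(alias_dump: str) -> str:
    # Non-interactive bash (`bash -c`) leaves alias expansion off, so the
    # snapshot must enable it before the replayed `alias -p` definitions
    # can take effect.
    return "shopt -s expand_aliases\n" + alias_dump
```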
- Daytona: skip refresh_data() API call unless sandbox was interrupted/errored
- Docker: cache _build_forward_env_args() to avoid re-reading .env every command
- All remote backends: TTL-based sync skip (5s) to avoid redundant dir walks
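A minimal sketch of the TTL skip, with illustrative names (the real
backends wire this into their sync paths):

```python
import time

_SYNC_TTL_SECONDS = 5.0

class _SyncThrottle:
    def __init__(self) -> None:
        self._last_sync = 0.0

    def maybe_sync(self, sync_fn) -> None:
        now = time.monotonic()
        # Skip the directory walk entirely if we synced within the TTL.
        if now - self._last_sync < _SYNC_TTL_SECONDS:
            return
        sync_fn()
        self._last_sync = now
```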
Expanded tool_result_storage.py module docstring to document the
three-level architecture. Replaced opaque L2/L3 labels at call
sites with self-describing comments.
Eliminates ~50 lines of duplicated pipe+thread+poll boilerplate between
_ModalProcessHandle and _DaytonaProcessHandle. Both now use closures
passed to the shared _ThreadedProcessHandle in base.py.
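An illustrative sketch of the closure handoff; the constructor signature
is an assumption, not the actual base.py API:

```python
from typing import Callable, Optional

class _ThreadedProcessHandle:
    """Shared pipe+thread+poll plumbing; backends supply only closures."""

    def __init__(self,
                 read_chunk: Callable[[], bytes],
                 poll: Callable[[], Optional[int]]) -> None:
        self._read_chunk = read_chunk  # backend-specific stream read
        self._poll = poll              # backend-specific exit-status check

# A Modal-style backend then only needs:
# handle = _ThreadedProcessHandle(
#     read_chunk=lambda: stream.read(4096),
#     poll=lambda: proc.returncode,
# )
```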
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lockfile had drifted from main (debugpy, exa-py, version bump)
causing atomicwrites to fail building in the nix sandbox due to
missing setuptools in pyproject-build-systems.
SSH _before_execute() ran rsync unconditionally before every command,
adding ~2.3s overhead even when zero bytes were transferred. This was
80% of per-command latency (actual execution: ~0.6s).
Add (mtime, size) caching — matching the pattern Modal and Daytona
already use — to skip rsync when local files haven't changed:
- Per-file mtime+size check for credential files
- Directory fingerprint (set of relpath/mtime/size tuples) for skills
- --delete flag on skills rsync to prune uninstalled skills
- Track created remote dirs to avoid redundant mkdir -p calls
- Cache invalidation on rsync failure (remote may have been wiped)
- force=True parameter as escape hatch for debugging
Before: ~3s per SSH command (2.3s rsync + 0.6s execution)
After: ~0.6s per SSH command (mtime check + execution)
SSH test suite: 134s → 50s
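A sketch of the fingerprinting pattern described above, with illustrative
structure (the actual cache lives in the SSH backend):

```python
import os

class _SyncCache:
    def __init__(self) -> None:
        self._file_sigs: dict[str, tuple[float, int]] = {}

    def file_changed(self, path: str) -> bool:
        st = os.stat(path)
        sig = (st.st_mtime, st.st_size)
        if self._file_sigs.get(path) == sig:
            return False  # unchanged: skip the rsync for this file
        self._file_sigs[path] = sig
        return True

def dir_fingerprint(root: str) -> frozenset[tuple[str, float, int]]:
    # Set of (relpath, mtime, size) tuples; any add/remove/edit changes it.
    entries = set()
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            entries.add((os.path.relpath(full, root), st.st_mtime, st.st_size))
    return frozenset(entries)
```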
Previously, _wrap_command() wrote pwd to a file on the remote (container,
sandbox, SSH host), then _update_cwd_from_file() read it back via another
_run_bash() call. On Modal/Daytona this was a full API round-trip just to
read 20 bytes.
Now the wrapping template echoes the cwd to stdout with markers:
printf '\n__HERMES_CWD__%s__HERMES_CWD__\n' "$(pwd -P)"
_extract_cwd_from_output() parses it from the output already in memory.
Zero extra round-trips on any backend. The cwdfile, _read_file_in_env(),
and per-backend overrides are all deleted.
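A minimal sketch of the parser, assuming the marker format shown above
(the function body is illustrative):

```python
import re

_CWD_RE = re.compile(r"__HERMES_CWD__(.*?)__HERMES_CWD__")

def _extract_cwd_from_output(output: str) -> str | None:
    # Take the last match in case the command itself printed the marker.
    matches = _CWD_RE.findall(output)
    return matches[-1] if matches else None
```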
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add _tool_result_storage_dir to HermesAgentLoop.__init__
- Apply maybe_persist_tool_result() before tool message append
- Add enforce_turn_budget() after all tool calls in a turn
- Both wrapped in try/except (best-effort in eval path)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Layer 2: Replace destructive head-truncation with maybe_persist_tool_result()
in both concurrent and sequential tool execution paths. Large results are
now written to ~/.hermes/sessions/{id}/tool-results/ with a 2KB preview
in context. Model can read_file the persisted path for full content.
Layer 3: Add enforce_turn_budget() after all tool results in a turn.
If aggregate exceeds 200K chars, persist largest results first until
under budget. Runs after concurrent futures.wait() (single-threaded).
Callbacks still receive full untruncated results before persistence.
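A sketch of the budget pass, assuming a persist() callable that writes to
disk and returns the preview block; names are illustrative:

```python
TURN_BUDGET_CHARS = 200_000

def enforce_turn_budget(results: list[dict], persist) -> None:
    total = sum(len(r["content"]) for r in results)
    # Persist the largest results first until the aggregate fits the budget.
    for r in sorted(results, key=lambda r: len(r["content"]), reverse=True):
        if total <= TURN_BUDGET_CHARS:
            break
        preview = persist(r["content"])  # writes to disk, returns preview block
        total -= len(r["content"]) - len(preview)
        r["content"] = preview
```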
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Declares max_result_size_chars on each tool registration so the persistence
layer can apply per-tool limits instead of the global 50K default. Adds a
Layer 1 output cap inside search_tool() to prevent context overflow, and
adds a schema maximum of 10000 to the search_files limit parameter.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add tools/tool_result_storage.py implementing Layer 2 (per-result) and
Layer 3 (per-turn budget) persistence for large tool outputs. Results
exceeding thresholds are written to disk with a <persisted-output>
preview block replacing the inline content. Extend ToolEntry and
ToolRegistry with max_result_size_chars for per-tool threshold control.
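A minimal sketch of the Layer 2 path, assuming the thresholds and preview
format described above; the code itself is illustrative:

```python
import os
import uuid

def maybe_persist_tool_result(content: str, session_id: str,
                              max_chars: int = 50_000,
                              preview_chars: int = 2_000) -> str:
    if len(content) <= max_chars:
        return content
    out_dir = os.path.expanduser(f"~/.hermes/sessions/{session_id}/tool-results")
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, f"{uuid.uuid4().hex}.txt")
    with open(path, "w") as f:
        f.write(content)
    # Replace the inline content with a preview plus the on-disk path.
    return (f"<persisted-output path={path!r}>\n"
            f"{content[:preview_chars]}\n</persisted-output>")
```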
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Update execute tests to account for init_session during __init__
- Fix CWD resolution tests for cwdfile reads
- Patch is_interrupted at base module level (where _wait_for_process uses it)
- Update stdin heredoc test for new call pattern
- 27/27 Daytona unit tests passing
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- DELETE persistent_shell.py entirely (277 lines removed)
- Remove _SHELL_NOISE_SUBSTRINGS, _clean_shell_noise, _extract_fenced_output
from local.py (unused after fence marker removal)
- Adapt ManagedModalEnvironment to use BaseEnvironment + _wrap_command()
while keeping its own HTTP-based execute()
- Remove _OUTPUT_FENCE constant
42/42 tests passing across all testable backends.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Salvages the core fix from PR #5673 (egerev) onto current main.
The chatgpt.com/backend-api/codex endpoint streams valid output items
via response.output_item.done events, but the OpenAI SDK's
get_final_response() returns an empty output list. This caused every
Codex response to be rejected as invalid.
Fix: collect output_item.done events during streaming and backfill
response.output when get_final_response() returns empty. Falls back
to synthesizing from text deltas when no done events were received.
Also moves the synthesis logic from the validation loop (where it ran
too late; introduced in #5681) into _run_codex_stream(), before the
response leaves the streaming function, and simplifies the validation to
just log diagnostics since recovery now happens upstream.
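A sketch of the recovery, assuming the Responses-API streaming event shape
(event.type / event.item / event.delta); not the actual _run_codex_stream()
code:

```python
def synthesize_from_text(text: str) -> list[dict]:
    # Shape is illustrative; the real code builds proper SDK output items.
    return [{"type": "message",
             "content": [{"type": "output_text", "text": text}]}]

def collect_with_backfill(stream):
    done_items, text_parts = [], []
    for event in stream:
        if event.type == "response.output_item.done":
            done_items.append(event.item)
        elif event.type == "response.output_text.delta":
            text_parts.append(event.delta)
    response = stream.get_final_response()
    if not response.output:
        # Backfill from done events; fall back to synthesizing from the
        # accumulated text deltas when no done events arrived.
        response.output = done_items or synthesize_from_text("".join(text_parts))
    return response
```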
Co-authored-by: Egor <egerev@users.noreply.github.com>
OpenAI OAuth refresh tokens are single-use and rotate on every refresh.
When the Codex CLI (or another Hermes profile) refreshes its token, the
pool entry's refresh_token becomes stale. Subsequent refresh attempts
fail with invalid_grant, and the entry enters a 24-hour exhaustion
cooldown with no recovery path.
This mirrors the existing _sync_anthropic_entry_from_credentials_file()
pattern: when an openai-codex entry is exhausted, compare its
refresh_token against ~/.codex/auth.json and sync the fresh pair if
they differ.
Fixes the common scenario where users run 'codex login' to refresh
their token externally and Hermes never picks it up.
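A sketch of the resync, assuming the Codex CLI's auth.json keeps the pair
under a top-level "tokens" key; the entry layout and helper name are
illustrative:

```python
import json
import os

def sync_codex_entry_from_cli(entry: dict) -> bool:
    auth_path = os.path.expanduser("~/.codex/auth.json")
    if not os.path.exists(auth_path):
        return False
    with open(auth_path) as f:
        tokens = json.load(f).get("tokens", {})
    fresh = tokens.get("refresh_token")
    if fresh and fresh != entry.get("refresh_token"):
        # The CLI rotated the pair; adopt it and clear the exhaustion cooldown.
        entry["refresh_token"] = fresh
        entry["access_token"] = tokens.get("access_token")
        entry.pop("exhausted_until", None)
        return True
    return False
```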
Co-authored-by: David Andrews (LexGenius.ai) <david@lexgenius.ai>
Three bugs causing OpenAI Codex sessions to fail silently:
1. Credential pool vs legacy store disconnect: hermes auth and hermes
model store device_code tokens in the credential pool, but
get_codex_auth_status(), resolve_codex_runtime_credentials(), and
_model_flow_openai_codex() only read from the legacy provider state.
Fresh pool tokens were invisible to the auth status checks and model
selection flow.
2. _import_codex_cli_tokens() imported expired tokens from ~/.codex/
without checking JWT expiry. Combined with _login_openai_codex()
saying 'Login successful!' for expired credentials, users got stuck
in a loop of dead tokens being recycled.
3. _login_openai_codex() accepted expired tokens from
resolve_codex_runtime_credentials() without validating expiry before
telling the user login succeeded.
Fixes:
- get_codex_auth_status() now checks credential pool first, falls back
to legacy provider state
- _model_flow_openai_codex() uses pool-aware auth status for token
retrieval when fetching model lists
- _import_codex_cli_tokens() validates the JWT exp claim, rejects expired
  tokens (see the sketch after this list)
- _login_openai_codex() verifies resolved token isn't expiring before
accepting existing credentials
- _run_codex_stream() logs response.incomplete/failed terminal events
with status and incomplete_details for diagnostics
- Codex empty output recovery: captures streamed text during streaming
and synthesizes a response when get_final_response() returns empty
output (handles chatgpt.com backend-api edge cases)
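A sketch of the expiry check referenced above; decoding without signature
verification is enough to read the exp claim (names are illustrative):

```python
import base64
import json
import time

def jwt_is_expired(token: str, skew_seconds: int = 60) -> bool:
    try:
        payload_b64 = token.split(".")[1]
        payload_b64 += "=" * (-len(payload_b64) % 4)  # restore base64 padding
        claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    except (IndexError, ValueError):
        return True  # malformed tokens are treated as expired
    exp = claims.get("exp")
    return exp is None or exp <= time.time() + skew_seconds
```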
Two fixes:
1. Replace all stale 'hermes login' references with 'hermes auth' across
auth.py, auxiliary_client.py, delegate_tool.py, config.py, run_agent.py,
and documentation. The 'hermes login' command was deprecated; 'hermes auth'
now handles OAuth credential management.
2. Fix credential removal not persisting for singleton-sourced credentials
(device_code for openai-codex/nous, hermes_pkce for anthropic).
auth_remove_command already cleared env vars for env-sourced credentials,
but singleton credentials stored in the auth store were re-seeded by
_seed_from_singletons() on the next load_pool() call. Now clears the
underlying auth store entry when removing singleton-sourced credentials.
Add fine-grained authorization policies per Feishu group chat via
platforms.feishu.extra configuration.
- Add global bot-level admins that bypass all group restrictions
- Add per-group policies: open, allowlist, blacklist, admin_only, disabled
- Add default_group_policy fallback for chats without explicit rules
- Thread chat_id through group message gate for per-chat rule selection
- Match both open_id and user_id for backward compatibility
- Preserve existing FEISHU_ALLOWED_USERS / FEISHU_GROUP_POLICY behavior
- Add focused regression tests for all policy modes
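A hypothetical shape for the extra config as the adapter might receive it
after YAML parsing; the policy names come from the commit, the key layout
is an assumption:

```python
# Illustrative only: key names other than default_group_policy are assumed.
feishu_extra = {
    "admins": ["ou_global_admin"],        # bot-level admins bypass group rules
    "default_group_policy": "allowlist",  # fallback for unlisted chats
    "group_policies": {
        "oc_chat_a": {"policy": "open"},
        "oc_chat_b": {"policy": "allowlist", "users": ["ou_alice", "ou_bob"]},
        "oc_chat_c": {"policy": "disabled"},
    },
}
```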
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Consolidate coercion functions, extract loop readiness check, and deduplicate test mock setup to improve maintainability without changing behavior.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reapply local reconnect and ping settings after the Feishu SDK refreshes its client config so user-provided websocket tuning actually takes effect.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allow Feishu websocket keepalive timing to be configured via platform
extra config so disconnects can be detected faster in unstable networks.
New optional extra settings:
- ws_ping_interval
- ws_ping_timeout
These values are applied only when explicitly configured. Invalid values
fall back to the websocket library defaults by leaving the options unset.
This complements the reconnect timing settings added previously and helps
reduce total recovery time after network interruptions.
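A sketch of the apply-only-when-valid rule; the coercion helper and the
client attribute names are assumptions:

```python
def _positive_int(value) -> int | None:
    try:
        n = int(value)
    except (TypeError, ValueError):
        return None
    return n if n > 0 else None

def apply_ping_settings(client, extra: dict) -> None:
    for key in ("ws_ping_interval", "ws_ping_timeout"):
        n = _positive_int(extra.get(key))
        if n is not None:  # invalid or missing: leave the SDK default in place
            setattr(client, f"_{key.removeprefix('ws_')}", n)
```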
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Allow users to configure websocket reconnect behavior via platform extra
config to reduce reconnect latency in production environments.
The official Feishu SDK defaults to:
- First reconnect: random jitter 0-30 seconds
- Subsequent retries: 120 second intervals
This can cause 20-30 second delays before reconnection after network
interruptions. This commit makes these values configurable while keeping
the SDK defaults for backward compatibility.
Configuration via ~/.hermes/config.yaml:
```yaml
platforms:
  feishu:
    extra:
      ws_reconnect_nonce: 0     # Disable first-reconnect jitter (default: 30)
      ws_reconnect_interval: 3  # Retry every 3 seconds (default: 120)
```
Invalid values (negative numbers, non-integers) fall back to SDK defaults.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit fixes two critical bugs in the Feishu adapter that affect
message reliability and process lifecycle.
**Bug Fix 1: Intermittent Message Drops**
Root cause: Event handler was created once in __init__ and reused across
reconnects, causing callbacks to capture stale loop references. When the
adapter disconnected and reconnected, old callbacks continued firing with
invalid loop references, resulting in dropped messages with warnings:
"[Feishu] Dropping inbound message before adapter loop is ready"
Fix:
- Rebuild event handler on each connect (websocket/webhook)
- Clear handler on disconnect
- Ensure callbacks always capture current valid loop
- Add defensive loop.is_closed() checks with getattr for test compatibility
- Unify webhook dispatch path to use same loop checks as websocket mode
**Bug Fix 2: Process Hangs on Ctrl+C / SIGTERM**
Root cause: Feishu SDK's websocket client runs in a background thread with
an infinite _select() loop that never exits naturally. The thread was never
properly joined on disconnect, causing processes to hang indefinitely after
Ctrl+C or gateway stop commands.
Fix:
- Store reference to thread-local event loop (_ws_thread_loop)
- On disconnect, cancel all tasks in thread loop and stop it gracefully
via call_soon_threadsafe()
- Await thread future with 10s timeout
- Clean up pending tasks in thread's finally block before closing loop
- Add detailed debug logging for disconnect flow
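A sketch of the cross-thread stop, assuming the saved _ws_thread_loop
reference; the cancellation dance is standard asyncio:

```python
import asyncio

def stop_thread_loop(loop: asyncio.AbstractEventLoop) -> None:
    def _shutdown() -> None:
        # Runs inside the websocket thread's own loop: cancel everything,
        # then stop run_forever(); the thread's finally block cleans up.
        for task in asyncio.all_tasks(loop):
            task.cancel()
        loop.stop()
    loop.call_soon_threadsafe(_shutdown)
```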
**Additional Improvements:**
- Add regression tests for disconnect cleanup and webhook dispatch
- Ensure all event callbacks check loop readiness before dispatching
Tested on Linux with websocket mode. All Feishu tests pass.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two issues caused Matrix E2EE to fail silently in encrypted rooms:
1. When matrix-nio is installed without the [e2e] extra (no python-olm /
libolm), nio.crypto.ENCRYPTION_ENABLED is False and client.olm is
never initialized. The adapter logged warnings but returned True from
connect(), so the bot appeared online but could never decrypt messages.
Now: check_matrix_requirements() and connect() both hard-fail with a
clear error message when MATRIX_ENCRYPTION=true but E2EE deps are
missing.
2. Without a stable device_id, the bot gets a new device identity on each
restart. Other clients see it as "unknown device" and refuse to share
Megolm session keys. Now: MATRIX_DEVICE_ID env var lets users pin a
stable device identity that persists across restarts and is passed to
nio.AsyncClient constructor + restore_login().
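A sketch of the hard-fail check; nio.crypto.ENCRYPTION_ENABLED comes from
the commit text, the surrounding function is illustrative:

```python
def _check_e2ee_deps() -> None:
    try:
        from nio.crypto import ENCRYPTION_ENABLED
    except ImportError:
        ENCRYPTION_ENABLED = False
    if not ENCRYPTION_ENABLED:
        raise RuntimeError(
            "MATRIX_ENCRYPTION=true but matrix-nio was installed without the "
            "[e2e] extra (python-olm/libolm missing). "
            "Install with: pip install 'matrix-nio[e2e]'"
        )
```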
Changes:
- gateway/platforms/matrix.py: add _check_e2ee_deps(), hard-fail in
connect() and check_matrix_requirements(), MATRIX_DEVICE_ID support
in constructor + restore_login
- gateway/config.py: plumb MATRIX_DEVICE_ID into platform extras
- hermes_cli/config.py: add MATRIX_DEVICE_ID to OPTIONAL_ENV_VARS
Closes #3521
When the Codex CLI (or VS Code extension) consumes a refresh token before
Hermes can use it, Hermes previously surfaced a generic 401 error with no
actionable guidance.
- In `refresh_codex_oauth_pure`: detect `refresh_token_reused` from the
OAuth endpoint and raise an AuthError explaining the cause and the exact
steps to recover (run `codex` to refresh, then `hermes login`).
- In `run_agent.py`: when provider is `openai-codex` and HTTP 401 is
received, show Codex-specific recovery steps instead of the generic
"check your API key" message.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- avoid hard-coded ~/.hermes paths in the setup and API shorthands
- prefer HERMES_HOME with a sane default over the hard-coded /Users/peteradams/.hermes
- keep the examples aligned with profile-aware Hermes installs
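The shorthand pattern, assuming only that HERMES_HOME points at the
install root when set:

```python
import os

# Falls back to the conventional location when HERMES_HOME is unset.
HERMES_HOME = os.environ.get("HERMES_HOME", os.path.expanduser("~/.hermes"))
```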
- fall back to adding the repo root to sys.path when hermes_constants is not importable
- fixes direct execution of setup.py and google_api.py from the repo checkout
- keeps the upstream PR scoped to the google-workspace compatibility fix
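A sketch of the fallback, assuming the script sits one level below the
repo root:

```python
import os
import sys

try:
    import hermes_constants
except ImportError:
    # Direct execution from the checkout: put the repo root on sys.path.
    sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
    import hermes_constants
```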
The Mattermost adapter downloads file attachments correctly but
never updates msg_type from TEXT to DOCUMENT. This means the
document enrichment block in gateway/run.py (which requires
MessageType.DOCUMENT) never executes — text files are not
inlined, and the agent is never notified about attached files.
The user sends a file, the adapter downloads it to the local
cache, but the agent sees an empty message and responds with
'I didn't receive any file'.
Set msg_type to DOCUMENT when file_ids is non-empty, matching
the behavior of the Telegram and Discord adapters.
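The one-line fix in sketch form; the import path and attribute names are
assumptions about the adapter's message model:

```python
from gateway.run import MessageType  # illustrative import path

def _finalize_msg_type(msg) -> None:
    # msg.file_ids / msg.msg_type follow the commit text; a non-empty
    # file_ids list marks the message as a DOCUMENT so the enrichment
    # block in gateway/run.py actually runs.
    if msg.file_ids:
        msg.msg_type = MessageType.DOCUMENT
```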
The _async_flush_memories() helper accepts (session_id) but both the
/new and /resume handlers passed two arguments (session_id, session_key).
The TypeError was silently swallowed at DEBUG level, so memory extraction
never ran when users typed /new or /resume.
One call site (the session expiry watcher) was already fixed in 9c96f669,
but /new and /resume were missed.
- gateway/run.py:3247 — remove stray session_key from /new handler
- gateway/run.py:4989 — remove stray session_key from /resume handler
- tests/gateway/test_resume_command.py:222 — update test assertion
Previously the scheduler checked startswith('[SILENT]'), so agents that
appended [SILENT] after an explanation (e.g. 'N items filtered.\n\n[SILENT]')
would still trigger delivery.
Change the check to 'in' so the marker is caught regardless of position.
Add test_silent_trailing_suppresses_delivery to cover this case.
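The check in sketch form; the function name is illustrative:

```python
def is_silent(response: str) -> bool:
    # Containment catches the marker anywhere in the text,
    # e.g. "N items filtered.\n\n[SILENT]".
    return "[SILENT]" in response
```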
Ensures pending sessions are committed on process exit even if
shutdown_memory_provider is never called (gateway crash, SIGKILL,
or exception in _async_flush_memories preventing shutdown).
Also reorders on_session_end to wait for the pending sync thread
before checking turn_count, so the last turn's messages are flushed.
Based on PR #4919 by dagbs.
Updates the plugin build guide and features page to reflect the
interactive env var prompting added in PR #5470. Documents the rich
manifest format (name/description/url/secret) alongside the simple
string format.