feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
"""Base class for all Hermes execution environment backends.
|
2026-02-21 22:31:43 -08:00
|
|
|
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
Unified spawn-per-call model: every command spawns a fresh ``bash -c`` process.
|
|
|
|
|
A session snapshot (env vars, functions, aliases) is captured once at init and
|
|
|
|
|
re-sourced before each command. CWD persists via in-band stdout markers (remote)
|
|
|
|
|
or a temp file (local).
|
|
|
|
|
"""
|
|
|
|
|
|
fix(environments): use incremental UTF-8 decoder in select-based drain
The first draft of the fix called `chunk.decode("utf-8")` directly on
each 4096-byte `os.read()` result, which corrupts output whenever a
multi-byte UTF-8 character straddles a read boundary:
* `UnicodeDecodeError` fires on the valid-but-truncated byte sequence.
* The except handler clears ALL previously-decoded output and replaces
the whole buffer with `[binary output detected ...]`.
Empirically: 10000 '日' chars (30001 bytes) through the wrapper loses
all 10000 characters on the first draft; the baseline TextIOWrapper
drain (which uses `encoding='utf-8', errors='replace'` on Popen)
preserves them all. This regression affects any command emitting
non-ASCII output larger than one chunk — CJK/Arabic/emoji in
`npm install`, `pip install`, `docker logs`, `kubectl logs`, etc.
Fix: swap to `codecs.getincrementaldecoder('utf-8')(errors='replace')`,
which buffers partial multi-byte sequences across chunks and substitutes
U+FFFD for genuinely invalid bytes. Flush on drain exit via
`decoder.decode(b'', final=True)` to emit any trailing replacement
character for a dangling partial sequence.
Adds two regression tests:
* test_utf8_multibyte_across_read_boundary — 10000 U+65E5 chars,
verifies count round-trips and no fallback fires.
* test_invalid_utf8_uses_replacement_not_fallback — deliberate
\xff\xfe between valid ASCII, verifies surrounding text survives.
2026-04-19 11:26:02 -07:00
|
|
|
import codecs
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
import json
|
|
|
|
|
import logging
|
2026-02-23 21:15:35 -08:00
|
|
|
import os
|
fix(environments): prevent terminal hang when commands background children (#8340)
When a user's command backgrounds a child (`cmd &`, `setsid cmd & disown`,
etc.), the backgrounded grandchild inherits the write-end of our stdout
pipe via fork(). The old `for line in proc.stdout` drain never EOF'd
until the grandchild closed the pipe — so for a uvicorn server, the
terminal tool hung indefinitely (users reported the whole session
deadlocking when asking the agent to restart a backend).
Fix: switch _drain() to select()-based non-blocking reads and stop
draining shortly after bash exits even if the pipe hasn't EOF'd. Any
output the grandchild writes after that point goes to an orphaned pipe,
which is exactly what the user asked for when they said '&'.
Adds regression tests covering the issue's exact repro and 5 related
patterns (plain bg, setsid+disown, streaming output, high volume,
timeout, UTF-8).
2026-04-19 11:12:39 -07:00
|
|
|
import select
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
import shlex
|
2026-02-21 22:31:43 -08:00
|
|
|
import subprocess
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
import threading
|
|
|
|
|
import time
|
|
|
|
|
import uuid
|
|
|
|
|
from abc import ABC, abstractmethod
|
2026-02-23 21:15:35 -08:00
|
|
|
from pathlib import Path
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
from typing import IO, Callable, Protocol
|
2026-02-23 21:15:35 -08:00
|
|
|
|
2026-03-31 08:48:54 +09:00
|
|
|
from hermes_constants import get_hermes_home
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
from tools.interrupt import is_interrupted
|
|
|
|
|
|
|
|
|
|
logger = logging.getLogger(__name__)
|
fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths
Several files resolved paths via Path.home() / ".hermes" or
os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME
environment variable. This broke isolation when running multiple
Hermes instances with distinct HERMES_HOME directories.
Replace all hardcoded paths with calls to get_hermes_home() from
hermes_cli.config, consistent with the rest of the codebase.
Files fixed:
- tools/process_registry.py (processes.json)
- gateway/pairing.py (pairing/)
- gateway/sticker_cache.py (sticker_cache.json)
- gateway/channel_directory.py (channel_directory.json, sessions.json)
- gateway/config.py (gateway.json, config.yaml, sessions_dir)
- gateway/mirror.py (sessions/)
- gateway/hooks.py (hooks/)
- gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/)
- gateway/platforms/whatsapp.py (whatsapp/session)
- gateway/delivery.py (cron/output)
- agent/auxiliary_client.py (auth.json)
- agent/prompt_builder.py (SOUL.md)
- cli.py (config.yaml, images/, pastes/, history)
- run_agent.py (logs/)
- tools/environments/base.py (sandboxes/)
- tools/environments/modal.py (modal_snapshots.json)
- tools/environments/singularity.py (singularity_snapshots.json)
- tools/tts_tool.py (audio_cache)
- hermes_cli/status.py (cron/jobs.json, sessions.json)
- hermes_cli/gateway.py (logs/, whatsapp session)
- hermes_cli/main.py (whatsapp/session)
Tests updated to use HERMES_HOME env var instead of patching Path.home().
Closes #892
(cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)
2026-03-11 07:31:41 +01:00
|
|
|
|
fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907)
* fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace
interrupt() previously only flagged the agent's _execution_thread_id.
Tools running inside _execute_tool_calls_concurrent execute on
ThreadPoolExecutor worker threads whose tids are distinct from the
agent's, so is_interrupted() inside those tools returned False no matter
how many times the gateway called .interrupt() — hung ssh / curl / long
make-builds ran to their own timeout.
Changes:
- run_agent.py: track concurrent-tool worker tids in a per-agent set,
fan interrupt()/clear_interrupt() out to them, and handle the
register-after-interrupt race at _run_tool entry. getattr fallback
for the tracker so test stubs built via object.__new__ keep working.
- tools/environments/base.py: opt-in _wait_for_process trace (ENTER,
per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT
DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1.
- tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target
tid, set snapshot) behind the same env flag.
- tests: new regression test runs a polling tool on a concurrent worker
and asserts is_interrupted() flips to True within ~1s of interrupt().
Second new test guards clear_interrupt() clearing tracked worker bits.
Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env
subset 216 pass.
* fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log
AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when
quiet_mode=True (the CLI default). This would silently swallow every
INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation
added in the parent commit — confirmed by running hermes chat -q with
the flag and finding zero trace lines in agent.log even though
_wait_for_process was clearly executing (subprocess pid existed).
Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets
its own logger level to INFO at import time, overriding the 'tools'
parent-level filter. Scoped to the opt-in case only, so production
(quiet_mode default) logs stay quiet as designed.
Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes
'_wait_for_process ENTER/EXIT' lines to agent.log as expected.
* fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses
Tool subprocesses spawned by the local environment backend use
os.setsid so they run in their own process group. Before this fix,
SIGTERM/SIGHUP to the hermes CLI killed the main thread via
KeyboardInterrupt but the worker thread running _wait_for_process
never got a chance to call _kill_process — Python exited, the child
was reparented to init (PPID=1), and the subprocess ran to its
natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM
to the agent until manual cleanup).
Changes:
- cli.py _signal_handler (interactive) + _signal_handler_q (-q mode):
route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll
loop sees the per-thread interrupt flag and calls _kill_process
(os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default
1.5s) gives the worker time to complete its SIGTERM+SIGKILL
escalation before KeyboardInterrupt unwinds main.
- tools/environments/base.py _wait_for_process: wrap the poll loop in
try/except (KeyboardInterrupt, SystemExit) so the cleanup fires
even on paths the signal handlers don't cover (direct sys.exit,
unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace
line when HERMES_DEBUG_INTERRUPT=1.
- New regression test: injects KeyboardInterrupt into a running
_wait_for_process via PyThreadState_SetAsyncExc, verifies the
subprocess process group is dead within 3s of the exception and
that KeyboardInterrupt re-raises cleanly afterward.
Validation:
| Before | After |
|---------------------------------------------------------|--------------------|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s |
| No INTERRUPT DETECTED in trace | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup | 1/1 pass |
| tests/run_agent/test_concurrent_interrupt | 4/4 pass |
2026-04-17 20:39:25 -07:00
|
|
|
# Opt-in debug tracing for the interrupt/activity/poll machinery. Set
|
|
|
|
|
# HERMES_DEBUG_INTERRUPT=1 to log loop entry/exit, periodic heartbeats, and
|
|
|
|
|
# every is_interrupted() state change from _wait_for_process. Off by default
|
|
|
|
|
# to avoid flooding production gateway logs.
|
|
|
|
|
_DEBUG_INTERRUPT = bool(os.getenv("HERMES_DEBUG_INTERRUPT"))
|
|
|
|
|
|
|
|
|
|
if _DEBUG_INTERRUPT:
|
|
|
|
|
# AIAgent's quiet_mode path (run_agent.py) forces the `tools` logger to
|
|
|
|
|
# ERROR on CLI startup, which would silently swallow every trace we emit.
|
|
|
|
|
# Force this module's own logger back to INFO so the trace is visible in
|
|
|
|
|
# agent.log regardless of quiet-mode. Scoped to the opt-in case only.
|
|
|
|
|
logger.setLevel(logging.INFO)
|
|
|
|
|
|
fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking
Three root causes of the 'agent stops mid-task' gateway bug:
1. Compression threshold floor (64K tokens minimum)
- The 50% threshold on a 100K-context model fired at 50K tokens,
causing premature compression that made models lose track of
multi-step plans. Now threshold_tokens = max(50% * context, 64K).
- Models with <64K context are rejected at startup with a clear error.
2. Budget warning removal — grace call instead
- Removed the 70%/90% iteration budget warnings entirely. These
injected '[BUDGET WARNING: Provide your final response NOW]' into
tool results, causing models to abandon complex tasks prematurely.
- Now: no warnings during normal execution. When the budget is
actually exhausted (90/90), inject a user message asking the model
to summarise, allow one grace API call, and only then fall back
to _handle_max_iterations.
3. Activity touches during long terminal execution
- _wait_for_process polls every 0.2s but never reported activity.
The gateway's inactivity timeout (default 1800s) would fire during
long-running commands that appeared 'idle.'
- Now: thread-local activity callback fires every 10s during the
poll loop, keeping the gateway's activity tracker alive.
- Agent wires _touch_activity into the callback before each tool call.
Also: docs update noting 64K minimum context requirement.
Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).
2026-04-11 16:18:57 -07:00
|
|
|
# Thread-local activity callback. The agent sets this before a tool call so
|
|
|
|
|
# long-running _wait_for_process loops can report liveness to the gateway.
|
|
|
|
|
_activity_callback_local = threading.local()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def set_activity_callback(cb: Callable[[str], None] | None) -> None:
|
|
|
|
|
"""Register a callback that _wait_for_process fires periodically."""
|
|
|
|
|
_activity_callback_local.callback = cb
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _get_activity_callback() -> Callable[[str], None] | None:
|
|
|
|
|
return getattr(_activity_callback_local, "callback", None)
|
|
|
|
|
|
2026-02-23 21:15:35 -08:00
|
|
|
|
2026-04-16 19:01:56 +05:30
|
|
|
def touch_activity_if_due(
|
|
|
|
|
state: dict,
|
|
|
|
|
label: str,
|
|
|
|
|
) -> None:
|
|
|
|
|
"""Fire the activity callback at most once every ``state['interval']`` seconds.
|
|
|
|
|
|
|
|
|
|
*state* must contain ``last_touch`` (monotonic timestamp) and ``start``
|
|
|
|
|
(monotonic timestamp of the operation start). An optional ``interval``
|
|
|
|
|
key overrides the default 10 s cadence.
|
|
|
|
|
|
|
|
|
|
Swallows all exceptions so callers don't need their own try/except.
|
|
|
|
|
"""
|
|
|
|
|
now = time.monotonic()
|
|
|
|
|
interval = state.get("interval", 10.0)
|
|
|
|
|
if now - state["last_touch"] < interval:
|
|
|
|
|
return
|
|
|
|
|
state["last_touch"] = now
|
|
|
|
|
try:
|
|
|
|
|
cb = _get_activity_callback()
|
|
|
|
|
if cb:
|
|
|
|
|
elapsed = int(now - state["start"])
|
|
|
|
|
cb(f"{label} ({elapsed}s elapsed)")
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
|
|
|
|
|
|
2026-02-23 21:15:35 -08:00
|
|
|
def get_sandbox_dir() -> Path:
|
|
|
|
|
"""Return the host-side root for all sandbox storage (Docker workspaces,
|
|
|
|
|
Singularity overlays/SIF cache, etc.).
|
|
|
|
|
|
fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths
Several files resolved paths via Path.home() / ".hermes" or
os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME
environment variable. This broke isolation when running multiple
Hermes instances with distinct HERMES_HOME directories.
Replace all hardcoded paths with calls to get_hermes_home() from
hermes_cli.config, consistent with the rest of the codebase.
Files fixed:
- tools/process_registry.py (processes.json)
- gateway/pairing.py (pairing/)
- gateway/sticker_cache.py (sticker_cache.json)
- gateway/channel_directory.py (channel_directory.json, sessions.json)
- gateway/config.py (gateway.json, config.yaml, sessions_dir)
- gateway/mirror.py (sessions/)
- gateway/hooks.py (hooks/)
- gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/)
- gateway/platforms/whatsapp.py (whatsapp/session)
- gateway/delivery.py (cron/output)
- agent/auxiliary_client.py (auth.json)
- agent/prompt_builder.py (SOUL.md)
- cli.py (config.yaml, images/, pastes/, history)
- run_agent.py (logs/)
- tools/environments/base.py (sandboxes/)
- tools/environments/modal.py (modal_snapshots.json)
- tools/environments/singularity.py (singularity_snapshots.json)
- tools/tts_tool.py (audio_cache)
- hermes_cli/status.py (cron/jobs.json, sessions.json)
- hermes_cli/gateway.py (logs/, whatsapp session)
- hermes_cli/main.py (whatsapp/session)
Tests updated to use HERMES_HOME env var instead of patching Path.home().
Closes #892
(cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)
2026-03-11 07:31:41 +01:00
|
|
|
Configurable via TERMINAL_SANDBOX_DIR. Defaults to {HERMES_HOME}/sandboxes/.
|
2026-02-23 21:15:35 -08:00
|
|
|
"""
|
|
|
|
|
custom = os.getenv("TERMINAL_SANDBOX_DIR")
|
|
|
|
|
if custom:
|
|
|
|
|
p = Path(custom)
|
|
|
|
|
else:
|
fix(cli): respect HERMES_HOME in all remaining hardcoded ~/.hermes paths
Several files resolved paths via Path.home() / ".hermes" or
os.path.expanduser("~/.hermes/..."), bypassing the HERMES_HOME
environment variable. This broke isolation when running multiple
Hermes instances with distinct HERMES_HOME directories.
Replace all hardcoded paths with calls to get_hermes_home() from
hermes_cli.config, consistent with the rest of the codebase.
Files fixed:
- tools/process_registry.py (processes.json)
- gateway/pairing.py (pairing/)
- gateway/sticker_cache.py (sticker_cache.json)
- gateway/channel_directory.py (channel_directory.json, sessions.json)
- gateway/config.py (gateway.json, config.yaml, sessions_dir)
- gateway/mirror.py (sessions/)
- gateway/hooks.py (hooks/)
- gateway/platforms/base.py (image_cache/, audio_cache/, document_cache/)
- gateway/platforms/whatsapp.py (whatsapp/session)
- gateway/delivery.py (cron/output)
- agent/auxiliary_client.py (auth.json)
- agent/prompt_builder.py (SOUL.md)
- cli.py (config.yaml, images/, pastes/, history)
- run_agent.py (logs/)
- tools/environments/base.py (sandboxes/)
- tools/environments/modal.py (modal_snapshots.json)
- tools/environments/singularity.py (singularity_snapshots.json)
- tools/tts_tool.py (audio_cache)
- hermes_cli/status.py (cron/jobs.json, sessions.json)
- hermes_cli/gateway.py (logs/, whatsapp session)
- hermes_cli/main.py (whatsapp/session)
Tests updated to use HERMES_HOME env var instead of patching Path.home().
Closes #892
(cherry picked from commit 78ac1bba43b8b74a934c6172f2c29bb4d03164b9)
2026-03-11 07:31:41 +01:00
|
|
|
p = get_hermes_home() / "sandboxes"
|
2026-02-23 21:15:35 -08:00
|
|
|
p.mkdir(parents=True, exist_ok=True)
|
|
|
|
|
return p
|
2026-02-21 22:31:43 -08:00
|
|
|
|
|
|
|
|
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# Shared constants and utilities
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _pipe_stdin(proc: subprocess.Popen, data: str) -> None:
|
|
|
|
|
"""Write *data* to proc.stdin on a daemon thread to avoid pipe-buffer deadlocks."""
|
|
|
|
|
|
|
|
|
|
def _write():
|
|
|
|
|
try:
|
|
|
|
|
proc.stdin.write(data)
|
|
|
|
|
proc.stdin.close()
|
|
|
|
|
except (BrokenPipeError, OSError):
|
|
|
|
|
pass
|
|
|
|
|
|
|
|
|
|
threading.Thread(target=_write, daemon=True).start()
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _popen_bash(
|
|
|
|
|
cmd: list[str], stdin_data: str | None = None, **kwargs
|
|
|
|
|
) -> subprocess.Popen:
|
|
|
|
|
"""Spawn a subprocess with standard stdout/stderr/stdin setup.
|
|
|
|
|
|
|
|
|
|
If *stdin_data* is provided, writes it asynchronously via :func:`_pipe_stdin`.
|
|
|
|
|
Backends with special Popen needs (e.g. local's ``preexec_fn``) can bypass
|
|
|
|
|
this and call :func:`_pipe_stdin` directly.
|
|
|
|
|
"""
|
|
|
|
|
proc = subprocess.Popen(
|
|
|
|
|
cmd,
|
|
|
|
|
stdout=subprocess.PIPE,
|
|
|
|
|
stderr=subprocess.STDOUT,
|
|
|
|
|
stdin=subprocess.PIPE if stdin_data is not None else subprocess.DEVNULL,
|
|
|
|
|
text=True,
|
|
|
|
|
**kwargs,
|
|
|
|
|
)
|
|
|
|
|
if stdin_data is not None:
|
|
|
|
|
_pipe_stdin(proc, stdin_data)
|
|
|
|
|
return proc
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _load_json_store(path: Path) -> dict:
|
|
|
|
|
"""Load a JSON file as a dict, returning ``{}`` on any error."""
|
|
|
|
|
if path.exists():
|
|
|
|
|
try:
|
|
|
|
|
return json.loads(path.read_text())
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
return {}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _save_json_store(path: Path, data: dict) -> None:
|
|
|
|
|
"""Write *data* as pretty-printed JSON to *path*."""
|
|
|
|
|
path.parent.mkdir(parents=True, exist_ok=True)
|
|
|
|
|
path.write_text(json.dumps(data, indent=2))
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _file_mtime_key(host_path: str) -> tuple[float, int] | None:
|
|
|
|
|
"""Return ``(mtime, size)`` for cache comparison, or ``None`` if unreadable."""
|
|
|
|
|
try:
|
|
|
|
|
st = Path(host_path).stat()
|
|
|
|
|
return (st.st_mtime, st.st_size)
|
|
|
|
|
except OSError:
|
|
|
|
|
return None
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# ProcessHandle protocol
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class ProcessHandle(Protocol):
|
|
|
|
|
"""Duck type that every backend's _run_bash() must return.
|
|
|
|
|
|
|
|
|
|
subprocess.Popen satisfies this natively. SDK backends (Modal, Daytona)
|
|
|
|
|
return _ThreadedProcessHandle which adapts their blocking calls.
|
|
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
def poll(self) -> int | None: ...
|
|
|
|
|
def kill(self) -> None: ...
|
|
|
|
|
def wait(self, timeout: float | None = None) -> int: ...
|
|
|
|
|
|
|
|
|
|
@property
|
|
|
|
|
def stdout(self) -> IO[str] | None: ...
|
|
|
|
|
|
|
|
|
|
@property
|
|
|
|
|
def returncode(self) -> int | None: ...
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
class _ThreadedProcessHandle:
|
|
|
|
|
"""Adapter for SDK backends (Modal, Daytona) that have no real subprocess.
|
|
|
|
|
|
|
|
|
|
Wraps a blocking ``exec_fn() -> (output_str, exit_code)`` in a background
|
|
|
|
|
thread and exposes a ProcessHandle-compatible interface. An optional
|
|
|
|
|
``cancel_fn`` is invoked on ``kill()`` for backend-specific cancellation
|
|
|
|
|
(e.g. Modal sandbox.terminate, Daytona sandbox.stop).
|
|
|
|
|
"""
|
|
|
|
|
|
|
|
|
|
def __init__(
|
|
|
|
|
self,
|
|
|
|
|
exec_fn: Callable[[], tuple[str, int]],
|
|
|
|
|
cancel_fn: Callable[[], None] | None = None,
|
|
|
|
|
):
|
|
|
|
|
self._cancel_fn = cancel_fn
|
|
|
|
|
self._done = threading.Event()
|
|
|
|
|
self._returncode: int | None = None
|
|
|
|
|
self._error: Exception | None = None
|
|
|
|
|
|
|
|
|
|
# Pipe for stdout — drain thread in _wait_for_process reads the read end.
|
|
|
|
|
read_fd, write_fd = os.pipe()
|
|
|
|
|
self._stdout = os.fdopen(read_fd, "r", encoding="utf-8", errors="replace")
|
|
|
|
|
self._write_fd = write_fd
|
|
|
|
|
|
|
|
|
|
def _worker():
|
|
|
|
|
try:
|
|
|
|
|
output, exit_code = exec_fn()
|
|
|
|
|
self._returncode = exit_code
|
|
|
|
|
# Write output into the pipe so drain thread picks it up.
|
|
|
|
|
try:
|
|
|
|
|
os.write(self._write_fd, output.encode("utf-8", errors="replace"))
|
|
|
|
|
except OSError:
|
|
|
|
|
pass
|
|
|
|
|
except Exception as exc:
|
|
|
|
|
self._error = exc
|
|
|
|
|
self._returncode = 1
|
|
|
|
|
finally:
|
|
|
|
|
try:
|
|
|
|
|
os.close(self._write_fd)
|
|
|
|
|
except OSError:
|
|
|
|
|
pass
|
|
|
|
|
self._done.set()
|
|
|
|
|
|
|
|
|
|
t = threading.Thread(target=_worker, daemon=True)
|
|
|
|
|
t.start()
|
|
|
|
|
|
|
|
|
|
@property
|
|
|
|
|
def stdout(self):
|
|
|
|
|
return self._stdout
|
|
|
|
|
|
|
|
|
|
@property
|
|
|
|
|
def returncode(self) -> int | None:
|
|
|
|
|
return self._returncode
|
|
|
|
|
|
|
|
|
|
def poll(self) -> int | None:
|
|
|
|
|
return self._returncode if self._done.is_set() else None
|
|
|
|
|
|
|
|
|
|
def kill(self):
|
|
|
|
|
if self._cancel_fn:
|
|
|
|
|
try:
|
|
|
|
|
self._cancel_fn()
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
|
|
|
|
|
def wait(self, timeout: float | None = None) -> int:
|
|
|
|
|
self._done.wait(timeout=timeout)
|
|
|
|
|
return self._returncode
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# CWD marker for remote backends
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
def _cwd_marker(session_id: str) -> str:
|
|
|
|
|
return f"__HERMES_CWD_{session_id}__"
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
# BaseEnvironment
|
|
|
|
|
# ---------------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
|
2026-02-21 22:31:43 -08:00
|
|
|
class BaseEnvironment(ABC):
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
"""Common interface and unified execution flow for all Hermes backends.
|
2026-02-21 22:31:43 -08:00
|
|
|
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
Subclasses implement ``_run_bash()`` and ``cleanup()``. The base class
|
|
|
|
|
provides ``execute()`` with session snapshot sourcing, CWD tracking,
|
|
|
|
|
interrupt handling, and timeout enforcement.
|
2026-02-21 22:31:43 -08:00
|
|
|
"""
|
|
|
|
|
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
# Subclasses that embed stdin as a heredoc (Modal, Daytona) set this.
|
|
|
|
|
_stdin_mode: str = "pipe" # "pipe" or "heredoc"
|
|
|
|
|
|
|
|
|
|
# Snapshot creation timeout (override for slow cold-starts).
|
|
|
|
|
_snapshot_timeout: int = 30
|
|
|
|
|
|
2026-04-09 08:22:55 +02:00
|
|
|
def get_temp_dir(self) -> str:
|
|
|
|
|
"""Return the backend temp directory used for session artifacts.
|
|
|
|
|
|
|
|
|
|
Most sandboxed backends use ``/tmp`` inside the target environment.
|
|
|
|
|
LocalEnvironment overrides this on platforms like Termux where ``/tmp``
|
|
|
|
|
may be missing and ``TMPDIR`` is the portable writable location.
|
|
|
|
|
"""
|
|
|
|
|
return "/tmp"
|
|
|
|
|
|
2026-02-21 22:31:43 -08:00
|
|
|
def __init__(self, cwd: str, timeout: int, env: dict = None):
|
|
|
|
|
self.cwd = cwd
|
|
|
|
|
self.timeout = timeout
|
|
|
|
|
self.env = env or {}
|
|
|
|
|
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
self._session_id = uuid.uuid4().hex[:12]
|
2026-04-09 08:22:55 +02:00
|
|
|
temp_dir = self.get_temp_dir().rstrip("/") or "/"
|
|
|
|
|
self._snapshot_path = f"{temp_dir}/hermes-snap-{self._session_id}.sh"
|
|
|
|
|
self._cwd_file = f"{temp_dir}/hermes-cwd-{self._session_id}.txt"
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
self._cwd_marker = _cwd_marker(self._session_id)
|
|
|
|
|
self._snapshot_ready = False
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Abstract methods
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _run_bash(
|
|
|
|
|
self,
|
|
|
|
|
cmd_string: str,
|
|
|
|
|
*,
|
|
|
|
|
login: bool = False,
|
|
|
|
|
timeout: int = 120,
|
|
|
|
|
stdin_data: str | None = None,
|
|
|
|
|
) -> ProcessHandle:
|
|
|
|
|
"""Spawn a bash process to run *cmd_string*.
|
|
|
|
|
|
|
|
|
|
Returns a ProcessHandle (subprocess.Popen or _ThreadedProcessHandle).
|
|
|
|
|
Must be overridden by every backend.
|
|
|
|
|
"""
|
|
|
|
|
raise NotImplementedError(f"{type(self).__name__} must implement _run_bash()")
|
2026-02-21 22:31:43 -08:00
|
|
|
|
|
|
|
|
@abstractmethod
|
|
|
|
|
def cleanup(self):
|
|
|
|
|
"""Release backend resources (container, instance, connection)."""
|
|
|
|
|
...
|
|
|
|
|
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Session snapshot (init_session)
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def init_session(self):
|
|
|
|
|
"""Capture login shell environment into a snapshot file.
|
|
|
|
|
|
|
|
|
|
Called once after backend construction. On success, sets
|
|
|
|
|
``_snapshot_ready = True`` so subsequent commands source the snapshot
|
|
|
|
|
instead of running with ``bash -l``.
|
|
|
|
|
"""
|
|
|
|
|
# Full capture: env vars, functions (filtered), aliases, shell options.
|
|
|
|
|
bootstrap = (
|
|
|
|
|
f"export -p > {self._snapshot_path}\n"
|
|
|
|
|
f"declare -f | grep -vE '^_[^_]' >> {self._snapshot_path}\n"
|
|
|
|
|
f"alias -p >> {self._snapshot_path}\n"
|
|
|
|
|
f"echo 'shopt -s expand_aliases' >> {self._snapshot_path}\n"
|
|
|
|
|
f"echo 'set +e' >> {self._snapshot_path}\n"
|
|
|
|
|
f"echo 'set +u' >> {self._snapshot_path}\n"
|
|
|
|
|
f"pwd -P > {self._cwd_file} 2>/dev/null || true\n"
|
|
|
|
|
f"printf '\\n{self._cwd_marker}%s{self._cwd_marker}\\n' \"$(pwd -P)\"\n"
|
|
|
|
|
)
|
|
|
|
|
try:
|
|
|
|
|
proc = self._run_bash(bootstrap, login=True, timeout=self._snapshot_timeout)
|
|
|
|
|
result = self._wait_for_process(proc, timeout=self._snapshot_timeout)
|
|
|
|
|
self._snapshot_ready = True
|
|
|
|
|
self._update_cwd(result)
|
|
|
|
|
logger.info(
|
|
|
|
|
"Session snapshot created (session=%s, cwd=%s)",
|
|
|
|
|
self._session_id,
|
|
|
|
|
self.cwd,
|
|
|
|
|
)
|
|
|
|
|
except Exception as exc:
|
|
|
|
|
logger.warning(
|
|
|
|
|
"init_session failed (session=%s): %s — "
|
|
|
|
|
"falling back to bash -l per command",
|
|
|
|
|
self._session_id,
|
|
|
|
|
exc,
|
|
|
|
|
)
|
|
|
|
|
self._snapshot_ready = False
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Command wrapping
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
2026-04-24 14:38:00 +03:00
|
|
|
@staticmethod
|
|
|
|
|
def _quote_cwd_for_cd(cwd: str) -> str:
|
|
|
|
|
"""Quote a ``cd`` target while preserving ``~`` expansion."""
|
|
|
|
|
if cwd == "~":
|
|
|
|
|
return cwd
|
|
|
|
|
if cwd == "~/":
|
|
|
|
|
return "$HOME"
|
|
|
|
|
if cwd.startswith("~/"):
|
|
|
|
|
return f"$HOME/{shlex.quote(cwd[2:])}"
|
|
|
|
|
return shlex.quote(cwd)
|
|
|
|
|
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
def _wrap_command(self, command: str, cwd: str) -> str:
|
|
|
|
|
"""Build the full bash script that sources snapshot, cd's, runs command,
|
|
|
|
|
re-dumps env vars, and emits CWD markers."""
|
|
|
|
|
escaped = command.replace("'", "'\\''")
|
|
|
|
|
|
|
|
|
|
parts = []
|
|
|
|
|
|
|
|
|
|
# Source snapshot (env vars from previous commands)
|
|
|
|
|
if self._snapshot_ready:
|
|
|
|
|
parts.append(f"source {self._snapshot_path} 2>/dev/null || true")
|
|
|
|
|
|
2026-04-24 14:38:00 +03:00
|
|
|
# Preserve bare ``~`` expansion, but rewrite ``~/...`` through
|
|
|
|
|
# ``$HOME`` so suffixes with spaces remain a single shell word.
|
|
|
|
|
quoted_cwd = self._quote_cwd_for_cd(cwd)
|
2026-04-14 10:18:11 +02:00
|
|
|
parts.append(f"builtin cd {quoted_cwd} || exit 126")
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
|
|
|
|
|
# Run the actual command
|
|
|
|
|
parts.append(f"eval '{escaped}'")
|
|
|
|
|
parts.append("__hermes_ec=$?")
|
|
|
|
|
|
|
|
|
|
# Re-dump env vars to snapshot (last-writer-wins for concurrent calls)
|
|
|
|
|
if self._snapshot_ready:
|
|
|
|
|
parts.append(f"export -p > {self._snapshot_path} 2>/dev/null || true")
|
|
|
|
|
|
|
|
|
|
# Write CWD to file (local reads this) and stdout marker (remote parses this)
|
|
|
|
|
parts.append(f"pwd -P > {self._cwd_file} 2>/dev/null || true")
|
|
|
|
|
# Use a distinct line for the marker. The leading \n ensures
|
|
|
|
|
# the marker starts on its own line even if the command doesn't
|
|
|
|
|
# end with a newline (e.g. printf 'exact'). We'll strip this
|
|
|
|
|
# injected newline in _extract_cwd_from_output.
|
|
|
|
|
parts.append(
|
|
|
|
|
f"printf '\\n{self._cwd_marker}%s{self._cwd_marker}\\n' \"$(pwd -P)\""
|
|
|
|
|
)
|
|
|
|
|
parts.append("exit $__hermes_ec")
|
|
|
|
|
|
|
|
|
|
return "\n".join(parts)
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Stdin heredoc embedding (for SDK backends)
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
@staticmethod
|
|
|
|
|
def _embed_stdin_heredoc(command: str, stdin_data: str) -> str:
|
|
|
|
|
"""Append stdin_data as a shell heredoc to the command string."""
|
|
|
|
|
delimiter = f"HERMES_STDIN_{uuid.uuid4().hex[:12]}"
|
|
|
|
|
return f"{command} << '{delimiter}'\n{stdin_data}\n{delimiter}"
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Process lifecycle
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _wait_for_process(self, proc: ProcessHandle, timeout: int = 120) -> dict:
|
|
|
|
|
"""Poll-based wait with interrupt checking and stdout draining.
|
|
|
|
|
|
|
|
|
|
Shared across all backends — not overridden.
|
fix: prevent agent from stopping mid-task — compression floor, budget overhaul, activity tracking
Three root causes of the 'agent stops mid-task' gateway bug:
1. Compression threshold floor (64K tokens minimum)
- The 50% threshold on a 100K-context model fired at 50K tokens,
causing premature compression that made models lose track of
multi-step plans. Now threshold_tokens = max(50% * context, 64K).
- Models with <64K context are rejected at startup with a clear error.
2. Budget warning removal — grace call instead
- Removed the 70%/90% iteration budget warnings entirely. These
injected '[BUDGET WARNING: Provide your final response NOW]' into
tool results, causing models to abandon complex tasks prematurely.
- Now: no warnings during normal execution. When the budget is
actually exhausted (90/90), inject a user message asking the model
to summarise, allow one grace API call, and only then fall back
to _handle_max_iterations.
3. Activity touches during long terminal execution
- _wait_for_process polls every 0.2s but never reported activity.
The gateway's inactivity timeout (default 1800s) would fire during
long-running commands that appeared 'idle.'
- Now: thread-local activity callback fires every 10s during the
poll loop, keeping the gateway's activity tracker alive.
- Agent wires _touch_activity into the callback before each tool call.
Also: docs update noting 64K minimum context requirement.
Closes #7915 (root cause was agent-loop termination, not Weixin delivery limits).
2026-04-11 16:18:57 -07:00
|
|
|
|
|
|
|
|
Fires the ``activity_callback`` (if set on this instance) every 10s
|
|
|
|
|
while the process is running so the gateway's inactivity timeout
|
|
|
|
|
doesn't kill long-running commands.
|
fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907)
* fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace
interrupt() previously only flagged the agent's _execution_thread_id.
Tools running inside _execute_tool_calls_concurrent execute on
ThreadPoolExecutor worker threads whose tids are distinct from the
agent's, so is_interrupted() inside those tools returned False no matter
how many times the gateway called .interrupt() — hung ssh / curl / long
make-builds ran to their own timeout.
Changes:
- run_agent.py: track concurrent-tool worker tids in a per-agent set,
fan interrupt()/clear_interrupt() out to them, and handle the
register-after-interrupt race at _run_tool entry. getattr fallback
for the tracker so test stubs built via object.__new__ keep working.
- tools/environments/base.py: opt-in _wait_for_process trace (ENTER,
per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT
DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1.
- tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target
tid, set snapshot) behind the same env flag.
- tests: new regression test runs a polling tool on a concurrent worker
and asserts is_interrupted() flips to True within ~1s of interrupt().
Second new test guards clear_interrupt() clearing tracked worker bits.
Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env
subset 216 pass.
* fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log
AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when
quiet_mode=True (the CLI default). This would silently swallow every
INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation
added in the parent commit — confirmed by running hermes chat -q with
the flag and finding zero trace lines in agent.log even though
_wait_for_process was clearly executing (subprocess pid existed).
Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets
its own logger level to INFO at import time, overriding the 'tools'
parent-level filter. Scoped to the opt-in case only, so production
(quiet_mode default) logs stay quiet as designed.
Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes
'_wait_for_process ENTER/EXIT' lines to agent.log as expected.
* fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses
Tool subprocesses spawned by the local environment backend use
os.setsid so they run in their own process group. Before this fix,
SIGTERM/SIGHUP to the hermes CLI killed the main thread via
KeyboardInterrupt but the worker thread running _wait_for_process
never got a chance to call _kill_process — Python exited, the child
was reparented to init (PPID=1), and the subprocess ran to its
natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM
to the agent until manual cleanup).
Changes:
- cli.py _signal_handler (interactive) + _signal_handler_q (-q mode):
route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll
loop sees the per-thread interrupt flag and calls _kill_process
(os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default
1.5s) gives the worker time to complete its SIGTERM+SIGKILL
escalation before KeyboardInterrupt unwinds main.
- tools/environments/base.py _wait_for_process: wrap the poll loop in
try/except (KeyboardInterrupt, SystemExit) so the cleanup fires
even on paths the signal handlers don't cover (direct sys.exit,
unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace
line when HERMES_DEBUG_INTERRUPT=1.
- New regression test: injects KeyboardInterrupt into a running
_wait_for_process via PyThreadState_SetAsyncExc, verifies the
subprocess process group is dead within 3s of the exception and
that KeyboardInterrupt re-raises cleanly afterward.
Validation:
| Before | After |
|---------------------------------------------------------|--------------------|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s |
| No INTERRUPT DETECTED in trace | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup | 1/1 pass |
| tests/run_agent/test_concurrent_interrupt | 4/4 pass |
2026-04-17 20:39:25 -07:00
|
|
|
|
|
|
|
|
Also wraps the poll loop in a ``try/finally`` that guarantees we
|
|
|
|
|
call ``self._kill_process(proc)`` if we exit via ``KeyboardInterrupt``
|
|
|
|
|
or ``SystemExit``. Without this, the local backend (which spawns
|
|
|
|
|
subprocesses with ``os.setsid`` into their own process group) leaves
|
|
|
|
|
an orphan with ``PPID=1`` when python is shut down mid-tool — the
|
|
|
|
|
``sleep 300``-survives-30-min bug Physikal and I both hit.
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
"""
|
|
|
|
|
output_chunks: list[str] = []
|
|
|
|
|
|
fix(environments): prevent terminal hang when commands background children (#8340)
When a user's command backgrounds a child (`cmd &`, `setsid cmd & disown`,
etc.), the backgrounded grandchild inherits the write-end of our stdout
pipe via fork(). The old `for line in proc.stdout` drain never EOF'd
until the grandchild closed the pipe — so for a uvicorn server, the
terminal tool hung indefinitely (users reported the whole session
deadlocking when asking the agent to restart a backend).
Fix: switch _drain() to select()-based non-blocking reads and stop
draining shortly after bash exits even if the pipe hasn't EOF'd. Any
output the grandchild writes after that point goes to an orphaned pipe,
which is exactly what the user asked for when they said '&'.
Adds regression tests covering the issue's exact repro and 5 related
patterns (plain bg, setsid+disown, streaming output, high volume,
timeout, UTF-8).
2026-04-19 11:12:39 -07:00
|
|
|
# Non-blocking drain via select().
|
|
|
|
|
#
|
|
|
|
|
# The old pattern — ``for line in proc.stdout`` — blocks on
|
|
|
|
|
# ``readline()`` until the pipe reaches EOF. When the user's command
|
|
|
|
|
# backgrounds a process (``cmd &``, ``setsid cmd & disown``, etc.),
|
|
|
|
|
# that backgrounded grandchild inherits the write-end of our stdout
|
|
|
|
|
# pipe via ``fork()``. Even after ``bash`` itself exits, the pipe
|
|
|
|
|
# stays open because the grandchild still holds it — so the drain
|
|
|
|
|
# thread never returns and the tool hangs for the full lifetime of
|
|
|
|
|
# the grandchild (issue #8340: users reported indefinite hangs when
|
|
|
|
|
# restarting uvicorn with ``setsid ... & disown``).
|
|
|
|
|
#
|
|
|
|
|
# The fix: select() with a short poll interval, and stop draining
|
|
|
|
|
# shortly after ``bash`` exits even if the pipe hasn't EOF'd yet.
|
|
|
|
|
# Any output the grandchild writes after that point goes to an
|
|
|
|
|
# orphaned pipe (harmless — the kernel reaps it when our end closes).
|
fix(environments): use incremental UTF-8 decoder in select-based drain
The first draft of the fix called `chunk.decode("utf-8")` directly on
each 4096-byte `os.read()` result, which corrupts output whenever a
multi-byte UTF-8 character straddles a read boundary:
* `UnicodeDecodeError` fires on the valid-but-truncated byte sequence.
* The except handler clears ALL previously-decoded output and replaces
the whole buffer with `[binary output detected ...]`.
Empirically: 10000 '日' chars (30001 bytes) through the wrapper loses
all 10000 characters on the first draft; the baseline TextIOWrapper
drain (which uses `encoding='utf-8', errors='replace'` on Popen)
preserves them all. This regression affects any command emitting
non-ASCII output larger than one chunk — CJK/Arabic/emoji in
`npm install`, `pip install`, `docker logs`, `kubectl logs`, etc.
Fix: swap to `codecs.getincrementaldecoder('utf-8')(errors='replace')`,
which buffers partial multi-byte sequences across chunks and substitutes
U+FFFD for genuinely invalid bytes. Flush on drain exit via
`decoder.decode(b'', final=True)` to emit any trailing replacement
character for a dangling partial sequence.
Adds two regression tests:
* test_utf8_multibyte_across_read_boundary — 10000 U+65E5 chars,
verifies count round-trips and no fallback fires.
* test_invalid_utf8_uses_replacement_not_fallback — deliberate
\xff\xfe between valid ASCII, verifies surrounding text survives.
2026-04-19 11:26:02 -07:00
|
|
|
#
|
|
|
|
|
# Decoding: we ``os.read()`` raw bytes in fixed-size chunks (4096)
|
|
|
|
|
# so a single multibyte UTF-8 character can split across reads. An
|
|
|
|
|
# incremental decoder buffers partial sequences across chunks, and
|
|
|
|
|
# ``errors="replace"`` mirrors the baseline ``TextIOWrapper`` (which
|
|
|
|
|
# was constructed with ``encoding="utf-8", errors="replace"`` on
|
|
|
|
|
# ``Popen``) so binary or mis-encoded output is preserved with
|
|
|
|
|
# U+FFFD substitution rather than clobbering the whole buffer.
|
|
|
|
|
decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
|
|
|
|
|
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
def _drain():
|
fix(environments): prevent terminal hang when commands background children (#8340)
When a user's command backgrounds a child (`cmd &`, `setsid cmd & disown`,
etc.), the backgrounded grandchild inherits the write-end of our stdout
pipe via fork(). The old `for line in proc.stdout` drain never EOF'd
until the grandchild closed the pipe — so for a uvicorn server, the
terminal tool hung indefinitely (users reported the whole session
deadlocking when asking the agent to restart a backend).
Fix: switch _drain() to select()-based non-blocking reads and stop
draining shortly after bash exits even if the pipe hasn't EOF'd. Any
output the grandchild writes after that point goes to an orphaned pipe,
which is exactly what the user asked for when they said '&'.
Adds regression tests covering the issue's exact repro and 5 related
patterns (plain bg, setsid+disown, streaming output, high volume,
timeout, UTF-8).
2026-04-19 11:12:39 -07:00
|
|
|
fd = proc.stdout.fileno()
|
|
|
|
|
idle_after_exit = 0
|
fix(environments): use incremental UTF-8 decoder in select-based drain
The first draft of the fix called `chunk.decode("utf-8")` directly on
each 4096-byte `os.read()` result, which corrupts output whenever a
multi-byte UTF-8 character straddles a read boundary:
* `UnicodeDecodeError` fires on the valid-but-truncated byte sequence.
* The except handler clears ALL previously-decoded output and replaces
the whole buffer with `[binary output detected ...]`.
Empirically: 10000 '日' chars (30001 bytes) through the wrapper loses
all 10000 characters on the first draft; the baseline TextIOWrapper
drain (which uses `encoding='utf-8', errors='replace'` on Popen)
preserves them all. This regression affects any command emitting
non-ASCII output larger than one chunk — CJK/Arabic/emoji in
`npm install`, `pip install`, `docker logs`, `kubectl logs`, etc.
Fix: swap to `codecs.getincrementaldecoder('utf-8')(errors='replace')`,
which buffers partial multi-byte sequences across chunks and substitutes
U+FFFD for genuinely invalid bytes. Flush on drain exit via
`decoder.decode(b'', final=True)` to emit any trailing replacement
character for a dangling partial sequence.
Adds two regression tests:
* test_utf8_multibyte_across_read_boundary — 10000 U+65E5 chars,
verifies count round-trips and no fallback fires.
* test_invalid_utf8_uses_replacement_not_fallback — deliberate
\xff\xfe between valid ASCII, verifies surrounding text survives.
2026-04-19 11:26:02 -07:00
|
|
|
try:
|
|
|
|
|
while True:
|
fix(environments): prevent terminal hang when commands background children (#8340)
When a user's command backgrounds a child (`cmd &`, `setsid cmd & disown`,
etc.), the backgrounded grandchild inherits the write-end of our stdout
pipe via fork(). The old `for line in proc.stdout` drain never EOF'd
until the grandchild closed the pipe — so for a uvicorn server, the
terminal tool hung indefinitely (users reported the whole session
deadlocking when asking the agent to restart a backend).
Fix: switch _drain() to select()-based non-blocking reads and stop
draining shortly after bash exits even if the pipe hasn't EOF'd. Any
output the grandchild writes after that point goes to an orphaned pipe,
which is exactly what the user asked for when they said '&'.
Adds regression tests covering the issue's exact repro and 5 related
patterns (plain bg, setsid+disown, streaming output, high volume,
timeout, UTF-8).
2026-04-19 11:12:39 -07:00
|
|
|
try:
|
fix(environments): use incremental UTF-8 decoder in select-based drain
The first draft of the fix called `chunk.decode("utf-8")` directly on
each 4096-byte `os.read()` result, which corrupts output whenever a
multi-byte UTF-8 character straddles a read boundary:
* `UnicodeDecodeError` fires on the valid-but-truncated byte sequence.
* The except handler clears ALL previously-decoded output and replaces
the whole buffer with `[binary output detected ...]`.
Empirically: 10000 '日' chars (30001 bytes) through the wrapper loses
all 10000 characters on the first draft; the baseline TextIOWrapper
drain (which uses `encoding='utf-8', errors='replace'` on Popen)
preserves them all. This regression affects any command emitting
non-ASCII output larger than one chunk — CJK/Arabic/emoji in
`npm install`, `pip install`, `docker logs`, `kubectl logs`, etc.
Fix: swap to `codecs.getincrementaldecoder('utf-8')(errors='replace')`,
which buffers partial multi-byte sequences across chunks and substitutes
U+FFFD for genuinely invalid bytes. Flush on drain exit via
`decoder.decode(b'', final=True)` to emit any trailing replacement
character for a dangling partial sequence.
Adds two regression tests:
* test_utf8_multibyte_across_read_boundary — 10000 U+65E5 chars,
verifies count round-trips and no fallback fires.
* test_invalid_utf8_uses_replacement_not_fallback — deliberate
\xff\xfe between valid ASCII, verifies surrounding text survives.
2026-04-19 11:26:02 -07:00
|
|
|
ready, _, _ = select.select([fd], [], [], 0.1)
|
fix(environments): prevent terminal hang when commands background children (#8340)
When a user's command backgrounds a child (`cmd &`, `setsid cmd & disown`,
etc.), the backgrounded grandchild inherits the write-end of our stdout
pipe via fork(). The old `for line in proc.stdout` drain never EOF'd
until the grandchild closed the pipe — so for a uvicorn server, the
terminal tool hung indefinitely (users reported the whole session
deadlocking when asking the agent to restart a backend).
Fix: switch _drain() to select()-based non-blocking reads and stop
draining shortly after bash exits even if the pipe hasn't EOF'd. Any
output the grandchild writes after that point goes to an orphaned pipe,
which is exactly what the user asked for when they said '&'.
Adds regression tests covering the issue's exact repro and 5 related
patterns (plain bg, setsid+disown, streaming output, high volume,
timeout, UTF-8).
2026-04-19 11:12:39 -07:00
|
|
|
except (ValueError, OSError):
|
fix(environments): use incremental UTF-8 decoder in select-based drain
The first draft of the fix called `chunk.decode("utf-8")` directly on
each 4096-byte `os.read()` result, which corrupts output whenever a
multi-byte UTF-8 character straddles a read boundary:
* `UnicodeDecodeError` fires on the valid-but-truncated byte sequence.
* The except handler clears ALL previously-decoded output and replaces
the whole buffer with `[binary output detected ...]`.
Empirically: 10000 '日' chars (30001 bytes) through the wrapper loses
all 10000 characters on the first draft; the baseline TextIOWrapper
drain (which uses `encoding='utf-8', errors='replace'` on Popen)
preserves them all. This regression affects any command emitting
non-ASCII output larger than one chunk — CJK/Arabic/emoji in
`npm install`, `pip install`, `docker logs`, `kubectl logs`, etc.
Fix: swap to `codecs.getincrementaldecoder('utf-8')(errors='replace')`,
which buffers partial multi-byte sequences across chunks and substitutes
U+FFFD for genuinely invalid bytes. Flush on drain exit via
`decoder.decode(b'', final=True)` to emit any trailing replacement
character for a dangling partial sequence.
Adds two regression tests:
* test_utf8_multibyte_across_read_boundary — 10000 U+65E5 chars,
verifies count round-trips and no fallback fires.
* test_invalid_utf8_uses_replacement_not_fallback — deliberate
\xff\xfe between valid ASCII, verifies surrounding text survives.
2026-04-19 11:26:02 -07:00
|
|
|
break # fd already closed
|
|
|
|
|
if ready:
|
|
|
|
|
try:
|
|
|
|
|
chunk = os.read(fd, 4096)
|
|
|
|
|
except (ValueError, OSError):
|
|
|
|
|
break
|
|
|
|
|
if not chunk:
|
|
|
|
|
break # true EOF — all writers closed
|
|
|
|
|
output_chunks.append(decoder.decode(chunk))
|
|
|
|
|
idle_after_exit = 0
|
|
|
|
|
elif proc.poll() is not None:
|
|
|
|
|
# bash is gone and the pipe was idle for ~100ms. Give
|
|
|
|
|
# it two more cycles to catch any buffered tail, then
|
|
|
|
|
# stop — otherwise we wait forever on a grandchild pipe.
|
|
|
|
|
idle_after_exit += 1
|
|
|
|
|
if idle_after_exit >= 3:
|
|
|
|
|
break
|
|
|
|
|
finally:
|
|
|
|
|
# Flush any bytes buffered mid-sequence. With ``errors="replace"``
|
|
|
|
|
# this emits U+FFFD for any final incomplete sequence rather than
|
|
|
|
|
# raising.
|
|
|
|
|
try:
|
|
|
|
|
tail = decoder.decode(b"", final=True)
|
|
|
|
|
if tail:
|
|
|
|
|
output_chunks.append(tail)
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
|
|
|
|
|
drain_thread = threading.Thread(target=_drain, daemon=True)
|
|
|
|
|
drain_thread.start()
|
|
|
|
|
deadline = time.monotonic() + timeout
|
2026-04-16 19:01:56 +05:30
|
|
|
_now = time.monotonic()
|
|
|
|
|
_activity_state = {
|
|
|
|
|
"last_touch": _now,
|
|
|
|
|
"start": _now,
|
|
|
|
|
}
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
|
fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907)
* fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace
interrupt() previously only flagged the agent's _execution_thread_id.
Tools running inside _execute_tool_calls_concurrent execute on
ThreadPoolExecutor worker threads whose tids are distinct from the
agent's, so is_interrupted() inside those tools returned False no matter
how many times the gateway called .interrupt() — hung ssh / curl / long
make-builds ran to their own timeout.
Changes:
- run_agent.py: track concurrent-tool worker tids in a per-agent set,
fan interrupt()/clear_interrupt() out to them, and handle the
register-after-interrupt race at _run_tool entry. getattr fallback
for the tracker so test stubs built via object.__new__ keep working.
- tools/environments/base.py: opt-in _wait_for_process trace (ENTER,
per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT
DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1.
- tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target
tid, set snapshot) behind the same env flag.
- tests: new regression test runs a polling tool on a concurrent worker
and asserts is_interrupted() flips to True within ~1s of interrupt().
Second new test guards clear_interrupt() clearing tracked worker bits.
Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env
subset 216 pass.
* fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log
AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when
quiet_mode=True (the CLI default). This would silently swallow every
INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation
added in the parent commit — confirmed by running hermes chat -q with
the flag and finding zero trace lines in agent.log even though
_wait_for_process was clearly executing (subprocess pid existed).
Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets
its own logger level to INFO at import time, overriding the 'tools'
parent-level filter. Scoped to the opt-in case only, so production
(quiet_mode default) logs stay quiet as designed.
Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes
'_wait_for_process ENTER/EXIT' lines to agent.log as expected.
* fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses
Tool subprocesses spawned by the local environment backend use
os.setsid so they run in their own process group. Before this fix,
SIGTERM/SIGHUP to the hermes CLI killed the main thread via
KeyboardInterrupt but the worker thread running _wait_for_process
never got a chance to call _kill_process — Python exited, the child
was reparented to init (PPID=1), and the subprocess ran to its
natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM
to the agent until manual cleanup).
Changes:
- cli.py _signal_handler (interactive) + _signal_handler_q (-q mode):
route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll
loop sees the per-thread interrupt flag and calls _kill_process
(os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default
1.5s) gives the worker time to complete its SIGTERM+SIGKILL
escalation before KeyboardInterrupt unwinds main.
- tools/environments/base.py _wait_for_process: wrap the poll loop in
try/except (KeyboardInterrupt, SystemExit) so the cleanup fires
even on paths the signal handlers don't cover (direct sys.exit,
unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace
line when HERMES_DEBUG_INTERRUPT=1.
- New regression test: injects KeyboardInterrupt into a running
_wait_for_process via PyThreadState_SetAsyncExc, verifies the
subprocess process group is dead within 3s of the exception and
that KeyboardInterrupt re-raises cleanly afterward.
Validation:
| Before | After |
|---------------------------------------------------------|--------------------|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s |
| No INTERRUPT DETECTED in trace | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup | 1/1 pass |
| tests/run_agent/test_concurrent_interrupt | 4/4 pass |
2026-04-17 20:39:25 -07:00
|
|
|
# --- Debug tracing (opt-in via HERMES_DEBUG_INTERRUPT=1) -------------
|
|
|
|
|
# Captures loop entry/exit, interrupt state changes, and periodic
|
|
|
|
|
# heartbeats so we can diagnose "agent never sees the interrupt"
|
|
|
|
|
# reports without reproducing locally.
|
|
|
|
|
_tid = threading.current_thread().ident
|
|
|
|
|
_pid = getattr(proc, "pid", None)
|
|
|
|
|
_iter_count = 0
|
|
|
|
|
_last_heartbeat = _now
|
|
|
|
|
_last_interrupt_state = False
|
|
|
|
|
_cb_was_none = _get_activity_callback() is None
|
|
|
|
|
if _DEBUG_INTERRUPT:
|
|
|
|
|
logger.info(
|
|
|
|
|
"[interrupt-debug] _wait_for_process ENTER tid=%s pid=%s "
|
|
|
|
|
"timeout=%ss activity_cb=%s initial_interrupt=%s",
|
|
|
|
|
_tid, _pid, timeout,
|
|
|
|
|
"set" if not _cb_was_none else "MISSING",
|
|
|
|
|
is_interrupted(),
|
|
|
|
|
)
|
|
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
while proc.poll() is None:
|
|
|
|
|
_iter_count += 1
|
|
|
|
|
if is_interrupted():
|
|
|
|
|
if _DEBUG_INTERRUPT:
|
|
|
|
|
logger.info(
|
|
|
|
|
"[interrupt-debug] _wait_for_process INTERRUPT DETECTED "
|
|
|
|
|
"tid=%s pid=%s iter=%d elapsed=%.1fs — killing process group",
|
|
|
|
|
_tid, _pid, _iter_count, time.monotonic() - _activity_state["start"],
|
|
|
|
|
)
|
|
|
|
|
self._kill_process(proc)
|
|
|
|
|
drain_thread.join(timeout=2)
|
|
|
|
|
return {
|
|
|
|
|
"output": "".join(output_chunks) + "\n[Command interrupted]",
|
|
|
|
|
"returncode": 130,
|
|
|
|
|
}
|
|
|
|
|
if time.monotonic() > deadline:
|
|
|
|
|
if _DEBUG_INTERRUPT:
|
|
|
|
|
logger.info(
|
|
|
|
|
"[interrupt-debug] _wait_for_process TIMEOUT "
|
|
|
|
|
"tid=%s pid=%s iter=%d timeout=%ss",
|
|
|
|
|
_tid, _pid, _iter_count, timeout,
|
|
|
|
|
)
|
|
|
|
|
self._kill_process(proc)
|
|
|
|
|
drain_thread.join(timeout=2)
|
|
|
|
|
partial = "".join(output_chunks)
|
|
|
|
|
timeout_msg = f"\n[Command timed out after {timeout}s]"
|
|
|
|
|
return {
|
|
|
|
|
"output": partial + timeout_msg
|
|
|
|
|
if partial
|
|
|
|
|
else timeout_msg.lstrip(),
|
|
|
|
|
"returncode": 124,
|
|
|
|
|
}
|
|
|
|
|
# Periodic activity touch so the gateway knows we're alive
|
|
|
|
|
touch_activity_if_due(_activity_state, "terminal command running")
|
|
|
|
|
|
|
|
|
|
# Heartbeat every ~30s: proves the loop is alive and reports
|
|
|
|
|
# the activity-callback state (thread-local, can get clobbered
|
|
|
|
|
# by nested tool calls or executor thread reuse).
|
|
|
|
|
if _DEBUG_INTERRUPT and time.monotonic() - _last_heartbeat >= 30.0:
|
|
|
|
|
_cb_now_none = _get_activity_callback() is None
|
|
|
|
|
logger.info(
|
|
|
|
|
"[interrupt-debug] _wait_for_process HEARTBEAT "
|
|
|
|
|
"tid=%s pid=%s iter=%d elapsed=%.0fs "
|
|
|
|
|
"interrupt=%s activity_cb=%s%s",
|
|
|
|
|
_tid, _pid, _iter_count,
|
|
|
|
|
time.monotonic() - _activity_state["start"],
|
|
|
|
|
is_interrupted(),
|
|
|
|
|
"set" if not _cb_now_none else "MISSING",
|
|
|
|
|
" (LOST during run)" if _cb_now_none and not _cb_was_none else "",
|
|
|
|
|
)
|
|
|
|
|
_last_heartbeat = time.monotonic()
|
|
|
|
|
_cb_was_none = _cb_now_none
|
|
|
|
|
|
|
|
|
|
time.sleep(0.2)
|
|
|
|
|
except (KeyboardInterrupt, SystemExit):
|
|
|
|
|
# Signal arrived (SIGTERM/SIGHUP/SIGINT) or sys.exit() was called
|
|
|
|
|
# while we were polling. The local backend spawns subprocesses
|
|
|
|
|
# with os.setsid, which puts them in their own process group — so
|
|
|
|
|
# if we let the interrupt propagate without killing the child,
|
|
|
|
|
# python exits and the child is reparented to init (PPID=1) and
|
|
|
|
|
# keeps running as an orphan. Killing the process group here
|
|
|
|
|
# guarantees the tool's side effects stop when the agent stops.
|
|
|
|
|
if _DEBUG_INTERRUPT:
|
|
|
|
|
logger.info(
|
|
|
|
|
"[interrupt-debug] _wait_for_process EXCEPTION_EXIT "
|
|
|
|
|
"tid=%s pid=%s iter=%d elapsed=%.1fs — killing subprocess group before re-raise",
|
|
|
|
|
_tid, _pid, _iter_count,
|
|
|
|
|
time.monotonic() - _activity_state["start"],
|
|
|
|
|
)
|
|
|
|
|
try:
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
self._kill_process(proc)
|
|
|
|
|
drain_thread.join(timeout=2)
|
fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907)
* fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace
interrupt() previously only flagged the agent's _execution_thread_id.
Tools running inside _execute_tool_calls_concurrent execute on
ThreadPoolExecutor worker threads whose tids are distinct from the
agent's, so is_interrupted() inside those tools returned False no matter
how many times the gateway called .interrupt() — hung ssh / curl / long
make-builds ran to their own timeout.
Changes:
- run_agent.py: track concurrent-tool worker tids in a per-agent set,
fan interrupt()/clear_interrupt() out to them, and handle the
register-after-interrupt race at _run_tool entry. getattr fallback
for the tracker so test stubs built via object.__new__ keep working.
- tools/environments/base.py: opt-in _wait_for_process trace (ENTER,
per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT
DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1.
- tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target
tid, set snapshot) behind the same env flag.
- tests: new regression test runs a polling tool on a concurrent worker
and asserts is_interrupted() flips to True within ~1s of interrupt().
Second new test guards clear_interrupt() clearing tracked worker bits.
Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env
subset 216 pass.
* fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log
AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when
quiet_mode=True (the CLI default). This would silently swallow every
INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation
added in the parent commit — confirmed by running hermes chat -q with
the flag and finding zero trace lines in agent.log even though
_wait_for_process was clearly executing (subprocess pid existed).
Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets
its own logger level to INFO at import time, overriding the 'tools'
parent-level filter. Scoped to the opt-in case only, so production
(quiet_mode default) logs stay quiet as designed.
Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes
'_wait_for_process ENTER/EXIT' lines to agent.log as expected.
* fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses
Tool subprocesses spawned by the local environment backend use
os.setsid so they run in their own process group. Before this fix,
SIGTERM/SIGHUP to the hermes CLI killed the main thread via
KeyboardInterrupt but the worker thread running _wait_for_process
never got a chance to call _kill_process — Python exited, the child
was reparented to init (PPID=1), and the subprocess ran to its
natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM
to the agent until manual cleanup).
Changes:
- cli.py _signal_handler (interactive) + _signal_handler_q (-q mode):
route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll
loop sees the per-thread interrupt flag and calls _kill_process
(os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default
1.5s) gives the worker time to complete its SIGTERM+SIGKILL
escalation before KeyboardInterrupt unwinds main.
- tools/environments/base.py _wait_for_process: wrap the poll loop in
try/except (KeyboardInterrupt, SystemExit) so the cleanup fires
even on paths the signal handlers don't cover (direct sys.exit,
unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace
line when HERMES_DEBUG_INTERRUPT=1.
- New regression test: injects KeyboardInterrupt into a running
_wait_for_process via PyThreadState_SetAsyncExc, verifies the
subprocess process group is dead within 3s of the exception and
that KeyboardInterrupt re-raises cleanly afterward.
Validation:
| Before | After |
|---------------------------------------------------------|--------------------|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s |
| No INTERRUPT DETECTED in trace | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup | 1/1 pass |
| tests/run_agent/test_concurrent_interrupt | 4/4 pass |
2026-04-17 20:39:25 -07:00
|
|
|
except Exception:
|
|
|
|
|
pass # cleanup is best-effort
|
|
|
|
|
raise
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
|
fix(environments): prevent terminal hang when commands background children (#8340)
When a user's command backgrounds a child (`cmd &`, `setsid cmd & disown`,
etc.), the backgrounded grandchild inherits the write-end of our stdout
pipe via fork(). The old `for line in proc.stdout` drain never EOF'd
until the grandchild closed the pipe — so for a uvicorn server, the
terminal tool hung indefinitely (users reported the whole session
deadlocking when asking the agent to restart a backend).
Fix: switch _drain() to select()-based non-blocking reads and stop
draining shortly after bash exits even if the pipe hasn't EOF'd. Any
output the grandchild writes after that point goes to an orphaned pipe,
which is exactly what the user asked for when they said '&'.
Adds regression tests covering the issue's exact repro and 5 related
patterns (plain bg, setsid+disown, streaming output, high volume,
timeout, UTF-8).
2026-04-19 11:12:39 -07:00
|
|
|
# Drain thread now exits promptly after bash does (~300ms idle
|
|
|
|
|
# check). A short join is enough; a long one would be a bug since
|
|
|
|
|
# it means the non-blocking loop itself stopped cooperating.
|
|
|
|
|
drain_thread.join(timeout=2)
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
|
|
|
|
|
try:
|
|
|
|
|
proc.stdout.close()
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
|
fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace (#11907)
* fix(interrupt): propagate to concurrent-tool workers + opt-in debug trace
interrupt() previously only flagged the agent's _execution_thread_id.
Tools running inside _execute_tool_calls_concurrent execute on
ThreadPoolExecutor worker threads whose tids are distinct from the
agent's, so is_interrupted() inside those tools returned False no matter
how many times the gateway called .interrupt() — hung ssh / curl / long
make-builds ran to their own timeout.
Changes:
- run_agent.py: track concurrent-tool worker tids in a per-agent set,
fan interrupt()/clear_interrupt() out to them, and handle the
register-after-interrupt race at _run_tool entry. getattr fallback
for the tracker so test stubs built via object.__new__ keep working.
- tools/environments/base.py: opt-in _wait_for_process trace (ENTER,
per-30s HEARTBEAT with interrupt+activity-cb state, INTERRUPT
DETECTED, TIMEOUT, EXIT) behind HERMES_DEBUG_INTERRUPT=1.
- tools/interrupt.py: opt-in set_interrupt() trace (caller tid, target
tid, set snapshot) behind the same env flag.
- tests: new regression test runs a polling tool on a concurrent worker
and asserts is_interrupted() flips to True within ~1s of interrupt().
Second new test guards clear_interrupt() clearing tracked worker bits.
Validation: tests/run_agent/ all 762 pass; tests/tools/ interrupt+env
subset 216 pass.
* fix(interrupt-debug): bypass quiet_mode logger filter so trace reaches agent.log
AIAgent.__init__ sets logging.getLogger('tools').setLevel(ERROR) when
quiet_mode=True (the CLI default). This would silently swallow every
INFO-level trace line from the HERMES_DEBUG_INTERRUPT=1 instrumentation
added in the parent commit — confirmed by running hermes chat -q with
the flag and finding zero trace lines in agent.log even though
_wait_for_process was clearly executing (subprocess pid existed).
Fix: when HERMES_DEBUG_INTERRUPT=1, each traced module explicitly sets
its own logger level to INFO at import time, overriding the 'tools'
parent-level filter. Scoped to the opt-in case only, so production
(quiet_mode default) logs stay quiet as designed.
Validation: hermes chat -q with HERMES_DEBUG_INTERRUPT=1 now writes
'_wait_for_process ENTER/EXIT' lines to agent.log as expected.
* fix(cli): SIGTERM/SIGHUP no longer orphans tool subprocesses
Tool subprocesses spawned by the local environment backend use
os.setsid so they run in their own process group. Before this fix,
SIGTERM/SIGHUP to the hermes CLI killed the main thread via
KeyboardInterrupt but the worker thread running _wait_for_process
never got a chance to call _kill_process — Python exited, the child
was reparented to init (PPID=1), and the subprocess ran to its
natural end (confirmed live: sleep 300 survived 4+ min after SIGTERM
to the agent until manual cleanup).
Changes:
- cli.py _signal_handler (interactive) + _signal_handler_q (-q mode):
route SIGTERM/SIGHUP through agent.interrupt() so the worker's poll
loop sees the per-thread interrupt flag and calls _kill_process
(os.killpg) on the subprocess group. HERMES_SIGTERM_GRACE (default
1.5s) gives the worker time to complete its SIGTERM+SIGKILL
escalation before KeyboardInterrupt unwinds main.
- tools/environments/base.py _wait_for_process: wrap the poll loop in
try/except (KeyboardInterrupt, SystemExit) so the cleanup fires
even on paths the signal handlers don't cover (direct sys.exit,
unhandled KI from nested code, etc.). Emits EXCEPTION_EXIT trace
line when HERMES_DEBUG_INTERRUPT=1.
- New regression test: injects KeyboardInterrupt into a running
_wait_for_process via PyThreadState_SetAsyncExc, verifies the
subprocess process group is dead within 3s of the exception and
that KeyboardInterrupt re-raises cleanly afterward.
Validation:
| Before | After |
|---------------------------------------------------------|--------------------|
| sleep 300 survives 4+ min as PPID=1 orphan after SIGTERM | dies within 2 s |
| No INTERRUPT DETECTED in trace | INTERRUPT DETECTED fires + killing process group |
| tests/tools/test_local_interrupt_cleanup | 1/1 pass |
| tests/run_agent/test_concurrent_interrupt | 4/4 pass |
2026-04-17 20:39:25 -07:00
|
|
|
if _DEBUG_INTERRUPT:
|
|
|
|
|
logger.info(
|
|
|
|
|
"[interrupt-debug] _wait_for_process EXIT (natural) "
|
|
|
|
|
"tid=%s pid=%s iter=%d elapsed=%.1fs returncode=%s",
|
|
|
|
|
_tid, _pid, _iter_count,
|
|
|
|
|
time.monotonic() - _activity_state["start"],
|
|
|
|
|
proc.returncode,
|
|
|
|
|
)
|
|
|
|
|
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
return {"output": "".join(output_chunks), "returncode": proc.returncode}
|
|
|
|
|
|
|
|
|
|
def _kill_process(self, proc: ProcessHandle):
|
|
|
|
|
"""Terminate a process. Subclasses may override for process-group kill."""
|
|
|
|
|
try:
|
|
|
|
|
proc.kill()
|
|
|
|
|
except (ProcessLookupError, PermissionError, OSError):
|
|
|
|
|
pass
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# CWD extraction
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def _update_cwd(self, result: dict):
|
|
|
|
|
"""Extract CWD from command output. Override for local file-based read."""
|
|
|
|
|
self._extract_cwd_from_output(result)
|
|
|
|
|
|
|
|
|
|
def _extract_cwd_from_output(self, result: dict):
|
|
|
|
|
"""Parse the __HERMES_CWD_{session}__ marker from stdout output.
|
|
|
|
|
|
|
|
|
|
Updates self.cwd and strips the marker from result["output"].
|
|
|
|
|
Used by remote backends (Docker, SSH, Modal, Daytona, Singularity).
|
|
|
|
|
"""
|
|
|
|
|
output = result.get("output", "")
|
|
|
|
|
marker = self._cwd_marker
|
|
|
|
|
last = output.rfind(marker)
|
|
|
|
|
if last == -1:
|
|
|
|
|
return
|
|
|
|
|
|
|
|
|
|
# Find the opening marker before this closing one
|
|
|
|
|
search_start = max(0, last - 4096) # CWD path won't be >4KB
|
|
|
|
|
first = output.rfind(marker, search_start, last)
|
|
|
|
|
if first == -1 or first == last:
|
|
|
|
|
return
|
|
|
|
|
|
|
|
|
|
cwd_path = output[first + len(marker) : last].strip()
|
|
|
|
|
if cwd_path:
|
|
|
|
|
self.cwd = cwd_path
|
|
|
|
|
|
|
|
|
|
# Strip the marker line AND the \n we injected before it.
|
|
|
|
|
# The wrapper emits: printf '\n__MARKER__%s__MARKER__\n'
|
|
|
|
|
# So the output looks like: <cmd output>\n__MARKER__path__MARKER__\n
|
|
|
|
|
# We want to remove everything from the injected \n onwards.
|
|
|
|
|
line_start = output.rfind("\n", 0, first)
|
|
|
|
|
if line_start == -1:
|
|
|
|
|
line_start = first
|
|
|
|
|
line_end = output.find("\n", last + len(marker))
|
|
|
|
|
line_end = line_end + 1 if line_end != -1 else len(output)
|
|
|
|
|
|
|
|
|
|
result["output"] = output[:line_start] + output[line_end:]
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Hooks
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
2026-04-08 14:56:44 -07:00
|
|
|
def _before_execute(self) -> None:
|
|
|
|
|
"""Hook called before each command execution.
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
|
2026-04-08 14:56:44 -07:00
|
|
|
Remote backends (SSH, Modal, Daytona) override this to trigger
|
|
|
|
|
their FileSyncManager. Bind-mount backends (Docker, Singularity)
|
|
|
|
|
and Local don't need file sync — the host filesystem is directly
|
|
|
|
|
visible inside the container/process.
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
"""
|
|
|
|
|
pass
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Unified execute()
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
|
|
|
|
def execute(
|
|
|
|
|
self,
|
|
|
|
|
command: str,
|
|
|
|
|
cwd: str = "",
|
|
|
|
|
*,
|
|
|
|
|
timeout: int | None = None,
|
|
|
|
|
stdin_data: str | None = None,
|
|
|
|
|
) -> dict:
|
|
|
|
|
"""Execute a command, return {"output": str, "returncode": int}."""
|
|
|
|
|
self._before_execute()
|
|
|
|
|
|
|
|
|
|
exec_command, sudo_stdin = self._prepare_command(command)
|
fix(terminal): rewrite `A && B &` to `A && { B & }` to prevent subshell leak
bash parses `A && B &` with `&&` tighter than `&`, so it forks a subshell
for the compound and backgrounds the subshell. Inside the subshell, B
runs foreground, so the subshell waits for B. When B is a process that
doesn't naturally exit (`python3 -m http.server`, `yes > /dev/null`, a
long-running daemon), the subshell is stuck in `wait4` forever and leaks
as an orphan reparented to init.
Observed in production: agents running `cd X && python3 -m http.server
8000 &>/dev/null & sleep 1 && curl ...` as a "start a local server, then
verify it" one-liner. Outer bash exits cleanly; the subshell never does.
Across ~3 days of use, 8 unique stuck-terminal events and 7 leaked
bash+server pairs accumulated on the fleet, with some sessions appearing
hung from the user's perspective because the subshell's open stdout pipe
kept the terminal tool's drain thread blocked.
This is distinct from the `set +m` fix in 933fbd8f (which addressed
interactive-shell job-control waiting at exit). `set +m` doesn't help
here because `bash -c` is non-interactive and job control is already
off; the problem is the subshell's own internal wait for its foreground
B, not the outer shell's job-tracking.
The fix: walk the command shell-aware (respecting quotes, parens, brace
groups, `&>`/`>&` redirects), find `A && B &` / `A || B &` at depth 0
and rewrite the tail to `A && { B & }`. Brace groups don't fork a
subshell — they run in the current shell. `B &` inside the group is a
simple background (no subshell wait). The outer `&` is absorbed into
the group, so the compound no longer needs an explicit subshell.
`&&` error-propagation is preserved exactly: if A fails, `&&`
short-circuits and B never runs.
- Skips quoted strings, comment lines, and `(…)` subshells
- Handles `&>/dev/null`, `2>&1`, `>&2` without mistaking them for `&`
- Resets chain state at `;`, `|`, and newlines
- Tracks brace depth so already-rewritten output is idempotent
- Walks using the existing `_read_shell_token` tokenizer, matching the
pattern of `_rewrite_real_sudo_invocations`
Called once from `BaseEnvironment.execute` right after
`_prepare_command`, so it runs for every backend (local, ssh, docker,
modal, etc.) with no per-backend plumbing.
34 new tests covering rewrite cases, preservation cases, redirect
edge-cases, quoting/parens/backticks, idempotency, and empty/edge
inputs. End-to-end verified on a test VM: the exact vela-incident
command now returns in ~1.3s with no leaked bash, only the intentional
backgrounded server.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 12:04:03 -04:00
|
|
|
# Guard against the `A && B &` subshell-wait trap: bash forks a
|
|
|
|
|
# subshell for the compound that then waits for an infinite B (a
|
|
|
|
|
# server, `yes > /dev/null`, etc.), leaking the subshell forever.
|
|
|
|
|
# Rewriting to `A && { B & }` runs B as a plain background in the
|
|
|
|
|
# current shell — no subshell wait.
|
|
|
|
|
from tools.terminal_tool import _rewrite_compound_background
|
|
|
|
|
exec_command = _rewrite_compound_background(exec_command)
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
effective_timeout = timeout or self.timeout
|
|
|
|
|
effective_cwd = cwd or self.cwd
|
|
|
|
|
|
|
|
|
|
# Merge sudo stdin with caller stdin
|
|
|
|
|
if sudo_stdin is not None and stdin_data is not None:
|
|
|
|
|
effective_stdin = sudo_stdin + stdin_data
|
|
|
|
|
elif sudo_stdin is not None:
|
|
|
|
|
effective_stdin = sudo_stdin
|
|
|
|
|
else:
|
|
|
|
|
effective_stdin = stdin_data
|
|
|
|
|
|
|
|
|
|
# Embed stdin as heredoc for backends that need it
|
|
|
|
|
if effective_stdin and self._stdin_mode == "heredoc":
|
|
|
|
|
exec_command = self._embed_stdin_heredoc(exec_command, effective_stdin)
|
|
|
|
|
effective_stdin = None
|
|
|
|
|
|
|
|
|
|
wrapped = self._wrap_command(exec_command, effective_cwd)
|
|
|
|
|
|
|
|
|
|
# Use login shell if snapshot failed (so user's profile still loads)
|
|
|
|
|
login = not self._snapshot_ready
|
|
|
|
|
|
|
|
|
|
proc = self._run_bash(
|
|
|
|
|
wrapped, login=login, timeout=effective_timeout, stdin_data=effective_stdin
|
|
|
|
|
)
|
|
|
|
|
result = self._wait_for_process(proc, timeout=effective_timeout)
|
|
|
|
|
self._update_cwd(result)
|
|
|
|
|
|
|
|
|
|
return result
|
|
|
|
|
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
# Shared helpers
|
|
|
|
|
# ------------------------------------------------------------------
|
|
|
|
|
|
2026-02-21 22:31:43 -08:00
|
|
|
def stop(self):
|
|
|
|
|
"""Alias for cleanup (compat with older callers)."""
|
|
|
|
|
self.cleanup()
|
|
|
|
|
|
|
|
|
|
def __del__(self):
|
|
|
|
|
try:
|
|
|
|
|
self.cleanup()
|
|
|
|
|
except Exception:
|
|
|
|
|
pass
|
|
|
|
|
|
2026-03-08 17:46:11 +03:30
|
|
|
def _prepare_command(self, command: str) -> tuple[str, str | None]:
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
"""Transform sudo commands if SUDO_PASSWORD is available."""
|
2026-02-21 22:31:43 -08:00
|
|
|
from tools.terminal_tool import _transform_sudo_command
|
|
|
|
|
|
feat(environments): unified spawn-per-call execution layer
Replace dual execution model (PersistentShellMixin + per-backend oneshot)
with spawn-per-call + session snapshot for all backends except ManagedModal.
Core changes:
- Every command spawns a fresh bash process; session snapshot (env vars,
functions, aliases) captured at init and re-sourced before each command
- CWD persists via file-based read (local) or in-band stdout markers (remote)
- ProcessHandle protocol + _ThreadedProcessHandle adapter for SDK backends
- cancel_fn wired for Modal (sandbox.terminate) and Daytona (sandbox.stop)
- Shared utilities extracted: _pipe_stdin, _popen_bash, _load_json_store,
_save_json_store, _file_mtime_key, _SYNC_INTERVAL_SECONDS
- Rate-limited file sync unified in base _before_execute() with _sync_files() hook
- execute_oneshot() removed; all 11 call sites in code_execution_tool.py
migrated to execute()
- Daytona timeout wrapper replaced with SDK-native timeout parameter
- persistent_shell.py deleted (291 lines)
Backend-specific:
- Local: process-group kill via os.killpg, file-based CWD read
- Docker: -e env flags only on init_session, not per-command
- SSH: shlex.quote transport, ControlMaster connection reuse
- Singularity: apptainer exec with instance://, no forced --pwd
- Modal: _AsyncWorker + _ThreadedProcessHandle, cancel_fn -> sandbox.terminate
- Daytona: SDK-level timeout (not shell wrapper), cancel_fn -> sandbox.stop
- ManagedModal: unchanged (gateway owns execution); docstring added explaining why
2026-04-08 13:38:04 -07:00
|
|
|
return _transform_sudo_command(command)
|
2026-04-04 12:57:49 -07:00
|
|
|
|