Compare commits

...

5 Commits

Author SHA1 Message Date
dependabot[bot]
c887b8ba36 chore(deps): bump pytest from 9.0.2 to 9.0.3
Bumps [pytest](https://github.com/pytest-dev/pytest) from 9.0.2 to 9.0.3.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.2...9.0.3)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-06-12 17:45:13 +00:00
ethernet
4d68984ec7 fix(tests): remove no-longer-needed forensics 2026-06-12 13:42:42 -04:00
ethernet
6ff39c31ad fix(tests): guard against real 'hermes update' subprocess spawns in conftest
Extends _live_system_guard in tests/conftest.py to block any subprocess
call that would run 'hermes update' (or 'python -m hermes_cli.main update')
against the real checkout.

These commands run git fetch origin + git pull, overwriting repo files
like pyproject.toml mid-test-run and corrupting every subsequent
subprocess that reads them. The spawned process uses setsid /
start_new_session=True so it's invisible to pytest's process tree
(PPid=1) — the corruption was essentially undetectable without
explicit inotify/SHA watchdogs.

Root cause of #43703 CI failures: tests in TestUpdateCommandPlatformGate
called _handle_update_command() with HERMES_MANAGED='' and no Popen mock,
causing the code to fall through and spawn a real 'hermes update --gateway'
that overwrote pyproject.toml with origin/main's content (which still
had '--timeout=30 --timeout-method=thread' in addopts while the PR had
already removed pytest-timeout).

The guard covers all three invocation patterns:
- 'hermes update' / 'hermes update --gateway' (direct or via setsid bash -c)
- 'python -m hermes_cli.main update --gateway'
- '.venv/bin/hermes update' (absolute path variant)

Does not false-positive on: git update-index, apt-get update,
pip install --upgrade, or any command lacking 'hermes'/'hermes_cli'.
2026-06-12 13:42:42 -04:00
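
The matching rules listed in the commit message above (three invocation patterns blocked, unrelated `update` commands allowed) can be sketched as a small predicate. This is an illustrative sketch only, not the code in tests/conftest.py: the function names `_tokens` and `looks_like_hermes_update` are hypothetical, and the tokenization (strip quotes, split on whitespace) is an assumption made for the example.

```python
import os


def _tokens(cmd):
    """Flatten a command (str or argv list) into lowercase whitespace tokens.

    Quotes are stripped so that `bash -c "hermes update"` yields the inner
    words as separate tokens. (Assumption made for this sketch.)
    """
    if isinstance(cmd, (list, tuple)):
        text = " ".join(str(part) for part in cmd)
    else:
        text = str(cmd)
    return text.lower().replace('"', " ").replace("'", " ").split()


def looks_like_hermes_update(cmd):
    """Return True if a hermes entry point is followed by an `update` token."""
    toks = _tokens(cmd)
    for i, tok in enumerate(toks):
        # Matches `hermes`, `/path/.venv/bin/hermes`, and `hermes_cli.main`.
        is_hermes = os.path.basename(tok) == "hermes" or tok.startswith("hermes_cli")
        if is_hermes and "update" in toks[i + 1:]:
            return True
    return False
```

Requiring `update` to be a whole token after the entry point is what keeps `git update-index` and `apt-get update` from matching in this sketch.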
ethernet
c41a6534cf fix(tests): mock subprocess.Popen in all _handle_update_command tests 2026-06-12 13:42:42 -04:00
ethernet
2f9d18711f fix(ci): remove pytest-timeout, use per-file timeout only
fix(ci): write a new cache for test durations every time
change(ci): rip out error 4 retries because we found the real bug
2026-06-12 13:42:42 -04:00
11 changed files with 128 additions and 299 deletions

View File

@@ -90,7 +90,7 @@ jobs:
# (see `_SKIP_PARTS` in scripts/run_tests_parallel.py) because each
# shard would otherwise reach the session-scoped ``built_image``
# fixture in ``tests/docker/conftest.py`` and start a 3-7min
# ``docker build`` under a 180s pytest-timeout cap — guaranteed to
# ``docker build`` — guaranteed to
# die in fixture setup.
#
# Piggybacking here avoids a second image build: the smoke test
@@ -114,7 +114,7 @@ jobs:
run: |
uv venv .venv --python 3.11
source .venv/bin/activate
# ``dev`` extra pulls in pytest, pytest-asyncio, pytest-timeout
# ``dev`` extra pulls in pytest, pytest-asyncio —
# everything tests/docker/ needs. We deliberately avoid ``all``
# here because the docker tests only drive the container via
# subprocess and don't import hermes_agent's optional deps.

View File

@@ -4,13 +4,13 @@ on:
push:
branches: [main]
paths-ignore:
- '**/*.md'
- 'docs/**'
- "**/*.md"
- "docs/**"
pull_request:
branches: [main]
paths-ignore:
- '**/*.md'
- 'docs/**'
- "**/*.md"
- "docs/**"
permissions:
contents: read
@@ -30,13 +30,17 @@ jobs:
slice: [1, 2, 3, 4, 5, 6]
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Restore duration cache
uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: test_durations.json
# Single stable key. main always overwrites, PRs always find it.
# main always saves under a new key suffix; restoring jobs pick the latest cache that shares this prefix.
# Quoted from https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching#cache-hits-and-misses:
# If you provide restore-keys, the cache action sequentially searches for any caches that match the list of restore-keys.
# If there are no exact matches, the action searches for partial matches of the restore keys.
# When the action finds a partial match, the most recent cache is restored to the path directory.
key: test-durations
- name: Install ripgrep (prebuilt binary)
@@ -54,7 +58,7 @@ jobs:
rg --version
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
with:
# Persist uv's download/wheel cache (~/.cache/uv) across runs.
# Keyed on the dependency manifests, so the cache is reused until
@@ -115,7 +119,7 @@ jobs:
NOUS_API_KEY: ""
- name: Upload per-slice durations
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
with:
name: test-durations-slice-${{ matrix.slice }}
path: test_durations.json
@@ -129,7 +133,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Download all slice durations
uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
with:
pattern: test-durations-slice-*
path: durations
@@ -149,17 +153,17 @@ jobs:
"
- name: Save merged duration cache
uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
with:
path: test_durations.json
key: test-durations
key: test-durations-${{ github.run_id }}
e2e:
runs-on: ubuntu-latest
timeout-minutes: 15
steps:
- name: Checkout code
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
- name: Install ripgrep (prebuilt binary)
run: |
@@ -176,7 +180,7 @@ jobs:
rg --version
- name: Install uv
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
with:
# Persist uv's download/wheel cache (~/.cache/uv) across runs.
# Keyed on the dependency manifests, so the cache is reused until
@@ -215,4 +219,4 @@ jobs:
env:
OPENROUTER_API_KEY: ""
OPENAI_API_KEY: ""
NOUS_API_KEY: ""
NOUS_API_KEY: ""
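
The duration-cache change in this workflow (save under `test-durations-${{ github.run_id }}`, restore with the bare `test-durations` key) relies on the lookup described in the workflow comment: no exact hit, so the most recent cache sharing the prefix is restored. The toy model below only illustrates that selection rule under that assumption. It is not GitHub's implementation, and `CacheEntry` and `restore` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CacheEntry:
    key: str
    created_at: int  # any increasing timestamp; larger means newer


def restore(requested_key: str, entries: Sequence[CacheEntry]) -> Optional[CacheEntry]:
    """Pick an exact match if one exists, else the newest prefix match."""
    exact = [e for e in entries if e.key == requested_key]
    if exact:
        return max(exact, key=lambda e: e.created_at)
    partial = [e for e in entries if e.key.startswith(requested_key)]
    if partial:
        return max(partial, key=lambda e: e.created_at)
    return None  # cache miss


# Each run on main saves a new entry; a later job asks for the bare prefix.
entries = [
    CacheEntry("test-durations-100", created_at=1),
    CacheEntry("test-durations-200", created_at=2),
    CacheEntry("unrelated-300", created_at=3),
]
latest = restore("test-durations", entries)
```

Under this model a unique suffix per run means every save succeeds as a new entry, while readers still converge on the newest one.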

View File

@@ -131,7 +131,7 @@ edge-tts = ["edge-tts==7.2.7"]
modal = ["modal==1.3.4"]
daytona = ["daytona==0.155.0"]
hindsight = ["hindsight-client==0.6.1"]
dev = ["debugpy==1.8.20", "pytest==9.0.2", "pytest-asyncio==1.3.0", "pytest-timeout==2.4.0", "mcp==1.26.0", "starlette==1.0.1", "ty==0.0.21", "ruff==0.15.10", "setuptools==82.0.1"] # starlette: CVE-2026-48710
dev = ["debugpy==1.8.20", "pytest==9.0.3", "pytest-asyncio==1.3.0", "mcp==1.26.0", "starlette==1.0.1", "ty==0.0.21", "ruff==0.15.10", "setuptools==82.0.1"] # starlette: CVE-2026-48710
messaging = ["python-telegram-bot[webhooks]==22.6", "discord.py[voice]==2.7.1", "aiohttp==3.13.4", "brotlicffi==1.2.0.1", "slack-bolt==1.27.0", "slack-sdk==3.40.1", "qrcode==7.4.2"] # aiohttp: CVE-2026-34513/34518/34519/34520/34525
cron = [] # croniter is now a core dependency; this extra kept for back-compat
slack = ["slack-bolt==1.27.0", "slack-sdk==3.40.1", "aiohttp==3.13.4"]
@@ -327,12 +327,8 @@ markers = [
"integration: marks tests requiring external services (API keys, Modal, etc.)",
"real_concurrent_gate: opt out of the autouse stub that disables _detect_concurrent_hermes_instances",
]
# pytest-timeout: per-test 30s hard cap with cross-platform thread method.
# This is the fallback inside each per-file pytest subprocess (see
# scripts/run_tests_parallel.py). Per-file isolation gives every test
# file a fresh Python interpreter; pytest-timeout catches Python-level
# hangs within a file.
addopts = "-m 'not integration' --timeout=30 --timeout-method=thread"
# integration tests take way too long to run in the normal CI environments
addopts = "-m 'not integration'"
[tool.ty.environment]
python-version = "3.13"

View File

@@ -73,6 +73,7 @@ exec env -i \
LANG=C.UTF-8 \
LC_ALL=C.UTF-8 \
PYTHONHASHSEED=0 \
PYTHONDONTWRITEBYTECODE=1 \
${EXTRA_PYTHONPATH:+PYTHONPATH="$EXTRA_PYTHONPATH"} \
${EXTRA_PYTEST_PLUGINS:+PYTEST_PLUGINS="$EXTRA_PYTEST_PLUGINS"} \
"$PYTHON" "$SCRIPT_DIR/run_tests_parallel.py" "$@"

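The `PYTHONDONTWRITEBYTECODE=1` entry added above makes every Python process started under that environment skip writing `.pyc` files. A minimal sketch of the effect, using only the standard library (the helper name `child_skips_bytecode` is hypothetical):

```python
import os
import subprocess
import sys


def child_skips_bytecode(env_value):
    """Spawn a child interpreter and report its sys.dont_write_bytecode flag.

    env_value=None removes the variable from the child's environment.
    """
    env = {k: v for k, v in os.environ.items() if k != "PYTHONDONTWRITEBYTECODE"}
    if env_value is not None:
        env["PYTHONDONTWRITEBYTECODE"] = env_value
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip() == "True"
```

With the variable set to any non-empty string the child reports the flag as enabled; without it (and without `-B`) the flag stays off.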
View File

@@ -65,17 +65,14 @@ _DEFAULT_ROOTS = ["tests"]
# rebuild). The full pytest-shard runner can't
# host these because the session-scoped
# ``built_image`` fixture would do a 3-7min
# ``docker build`` inside a 180s per-test
# pytest-timeout cap (set by tests/docker/conftest.py),
# ``docker build``,
# so the build is guaranteed to die in fixture
# setup. The dedicated job sidesteps both costs.
_SKIP_PARTS = {"integration", "e2e", "docker"}
# Per-file wall-clock cap. Generous default — pytest-timeout still
# enforces per-test caps inside each subprocess; this is just an outer
# safety net so a single hung file can't stall the whole suite. Override
# Per-file wall-clock cap. Override
# via --file-timeout or HERMES_TEST_FILE_TIMEOUT.
_DEFAULT_FILE_TIMEOUT_SECONDS = 600.0 # 10 minutes
_DEFAULT_FILE_TIMEOUT_SECONDS = 140.0 # the slowest file took ~100s in CI at commit time; 140s adds some leeway
# Duration cache: maps relative file paths to last-observed subprocess
# wall-clock seconds. Used by ``--slice`` to distribute files across
@@ -246,27 +243,49 @@ def _kill_tree(proc: "subprocess.Popen", pgid: int | None = None) -> None:
pass
def _spawn_pytest_once(
cmd: List[str],
def _run_one_file(
file: Path,
pytest_args: List[str],
repo_root: Path,
file_timeout: float,
*,
timeout_note: str = "per-file timeout",
) -> Tuple[int, str]:
"""Run one ``pytest`` subprocess to completion and return ``(rc, output)``.
) -> Tuple[Path, int, str, dict[str, int], float]:
"""Run ``python -m pytest <file> <pytest_args>`` in a fresh subprocess.
Spawns the child in its own process group / session so a hung file and
its grandchildren (uvicorn servers, async runtimes, etc.) can be SIGKILL'd
as a tree on timeout rather than orphaning onto PID 1. Shared by the
primary per-file run and the exit-4 retry loop so the lifecycle/cleanup
logic lives in exactly one place.
Returns (file, returncode, captured_combined_output, summary_counts, subprocess_wall_seconds).
``summary_counts`` is the result of ``_parse_pytest_summary(output)`` —
pytest exit codes (https://docs.pytest.org/en/stable/reference/exit-codes.html):
0 = all tests passed
1 = some tests failed
2 = test execution interrupted
3 = internal error
4 = pytest CLI usage error
5 = no tests collected
We treat exit 5 as a pass: it just means every test in the file was
skipped or filtered by a marker (e.g. ``-m 'not integration'`` skips
files where every test is marked integration). That's intentional and
not a failure mode.
On per-file timeout (``file_timeout`` seconds) or any other exception
during ``communicate()``, we kill the whole process group / process
tree so grandchildren (uvicorn servers, async runtimes, etc.) do not
orphan onto PID 1. This outer timeout exists only to
bound a pathologically slow or hung file as a whole.
"""
cmd = [sys.executable, "-m", "pytest", str(file), *pytest_args]
subproc_start = time.monotonic()
# launch the pytest process
proc = subprocess.Popen(
cmd,
cwd=repo_root,
stdout=subprocess.PIPE,
stderr=subprocess.STDOUT,
text=True,
# Skip writing bytecode: many parallel Python processes run against the same source tree.
env={**os.environ, 'PYTHONDONTWRITEBYTECODE': '1'},
# POSIX: place the child at the head of its own process group so
# _kill_tree can SIGKILL the group atomically.
# Windows: this maps to CREATE_NEW_PROCESS_GROUP in CPython 3.12+;
@@ -296,7 +315,7 @@ def _spawn_pytest_once(
output = "(file timeout exceeded; output unavailable)"
rc = 124 # de facto convention for "killed by timeout".
output = (
f"({timeout_note}: {file_timeout:.0f}s exceeded; "
f"({file_timeout:.0f}s exceeded; "
f"process tree SIGKILL'd)\n{output}"
)
except BaseException:
@@ -309,123 +328,7 @@ def _spawn_pytest_once(
# case it left grandchildren behind; already-dead is a no-op.
_kill_tree(proc, pgid=pgid)
return rc, output
# How many times to re-run a file that exits 4 ("file or directory not found")
# while the file demonstrably exists on disk. On loaded shared CI runners the
# planner can enumerate a file (tests counted via --collect-only) but the
# per-file subprocess fail to stat it moments later — and a SINGLE immediate
# retry can land in the same brief high-load window and fail again. We retry a
# few times with a short backoff so transient I/O pressure has time to settle.
_EXIT4_RETRY_ATTEMPTS = 3
_EXIT4_RETRY_BACKOFF_SECONDS = 0.5
def _file_present(file: Path, *, attempts: int = 3, delay: float = 0.2) -> bool:
"""Return True if ``file`` exists, re-checking a few times.
``Path.exists()`` itself issues a ``stat`` that can transiently fail under
the same load that makes pytest report "file or directory not found", so a
single negative check is not authoritative. Only conclude the file is
genuinely missing if it's absent across several spaced checks.
"""
for i in range(attempts):
if file.exists():
return True
if i < attempts - 1:
time.sleep(delay)
return False
def _run_one_file(
file: Path,
pytest_args: List[str],
repo_root: Path,
file_timeout: float,
) -> Tuple[Path, int, str, dict[str, int], float]:
"""Run ``python -m pytest <file> <pytest_args>`` in a fresh subprocess.
Returns (file, returncode, captured_combined_output, summary_counts, subprocess_wall_seconds).
``summary_counts`` is the result of ``_parse_pytest_summary(output)`` —
pytest exit codes (https://docs.pytest.org/en/stable/reference/exit-codes.html):
0 = all tests passed
1 = some tests failed
2 = test execution interrupted
3 = internal error
4 = pytest CLI usage error
5 = no tests collected
We treat exit 5 as a pass: it just means every test in the file was
skipped or filtered by a marker (e.g. ``-m 'not integration'`` skips
files where every test is marked integration). That's intentional and
not a failure mode.
On per-file timeout (``file_timeout`` seconds) or any other exception
during ``communicate()``, we kill the whole process group / process
tree so grandchildren (uvicorn servers, async runtimes, etc.) do not
orphan onto PID 1. The pytest-timeout plugin enforces per-test
timeouts inside the subprocess; this outer timeout exists only to
bound a pathologically slow or hung file as a whole.
"""
cmd = [sys.executable, "-m", "pytest", str(file), *pytest_args]
subproc_start = time.monotonic()
rc, output = _spawn_pytest_once(cmd, repo_root, file_timeout)
# pytest exit 4 = "file or directory not found" at exec time. On loaded
# shared CI runners we have seen the planner enumerate a file (its tests
# counted via --collect-only) but the per-file subprocess fail to stat it
# moments later — a transient the deterministic LPT slicer otherwise
# reproduces on every rerun (same file set → same shard). Re-run the file a
# few times with a short backoff so the I/O pressure has time to settle,
# but ONLY while the file demonstrably exists on disk. A single immediate
# retry (the old behaviour) could land in the same brief high-load window
# and fail again; a single Path.exists() check could itself be a flaky stat
# under that load, so we re-check existence across spaced attempts.
# We do NOT widen the exit-5 rule: exit 4 on a file that genuinely does not
# exist must still fail.
attempt = 0
while rc == 4 and attempt < _EXIT4_RETRY_ATTEMPTS and _file_present(file):
attempt += 1
time.sleep(_EXIT4_RETRY_BACKOFF_SECONDS * attempt)
rc, output = _spawn_pytest_once(
cmd, repo_root, file_timeout,
timeout_note=f"per-file timeout on exit-4 retry {attempt}",
)
if rc == 4:
# Exit-4 survived the retries (or the file was judged absent).
# Capture filesystem forensics so a CI-only "file not found" can
# be diagnosed from the log instead of guessed at: does the file
# exist NOW, what does the parent dir hold, and is the git tree
# clean? (June 2026: a PR-added test file repeatedly hit exit 4
# on one CI shard while passing locally — these lines exist so
# the next occurrence is attributable.)
forensics = [f"--- exit-4 forensics for {file} ---"]
try:
forensics.append(f"exists={file.exists()} retries_used={attempt}")
parent = file.parent
if parent.exists():
names = sorted(p.name for p in parent.iterdir())
sibling_hint = [n for n in names if file.stem[:12] in n]
forensics.append(
f"parent={parent} entries={len(names)} "
f"similar={sibling_hint[:5]}"
)
else:
forensics.append(f"parent={parent} MISSING")
git_st = subprocess.run(
["git", "status", "--porcelain"],
cwd=repo_root, capture_output=True, text=True, timeout=10,
)
dirty = git_st.stdout.strip().splitlines()
forensics.append(f"git_dirty_entries={len(dirty)}")
forensics.extend(f" {line}" for line in dirty[:10])
except Exception as exc: # noqa: BLE001 — forensics must never mask rc=4
forensics.append(f"(forensics error: {exc})")
output = output + "\n" + "\n".join(forensics)
output += "\n"
if rc == 5:
# No tests collected — every test in the file was filtered out.
@@ -721,7 +624,7 @@ def main() -> int:
help=(
"Per-file wall-clock cap in seconds. On timeout, the pytest "
"subprocess and its full process tree are SIGKILL'd. "
"Default: 600 (10 min), env: HERMES_TEST_FILE_TIMEOUT."
f"Default: {_DEFAULT_FILE_TIMEOUT_SECONDS:.0f}s (~{_DEFAULT_FILE_TIMEOUT_SECONDS / 60:.1f} min), env: HERMES_TEST_FILE_TIMEOUT."
),
)
parser.add_argument(
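
The per-file timeout described in this file (child in its own session, whole process group SIGKILL'd on timeout, exit code 124 reported) can be sketched as follows. This is a simplified, POSIX-only illustration under stated assumptions, not the runner's `_run_one_file`: the name `run_with_group_timeout` is hypothetical and the Windows handling mentioned in the diff is omitted.

```python
import os
import signal
import subprocess
import sys


def run_with_group_timeout(cmd, timeout):
    """Run cmd in a new session; on timeout, SIGKILL its whole process group."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        # The child becomes a session leader, so its pgid equals its pid.
        start_new_session=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
        return proc.returncode, output
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # kills grandchildren too
        except ProcessLookupError:
            pass  # group already gone
        output, _ = proc.communicate()  # reap the child, collect buffered output
        # 124 follows the `timeout(1)` convention for "killed by timeout".
        return 124, f"({timeout:.0f}s exceeded; process tree SIGKILL'd)\n{output}"


if __name__ == "__main__":
    rc, out = run_with_group_timeout(
        [sys.executable, "-c", "import time; time.sleep(30)"], timeout=1
    )
    print(rc)
```

Killing by process group rather than by pid is the design choice that keeps helper processes spawned by a hung test file from outliving it.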

View File

@@ -731,6 +731,41 @@ def _live_system_guard(request, monkeypatch):
"Mark with @pytest.mark.live_system_guard_bypass if "
"intentional."
)
# Block any subprocess that would run `hermes update` (or the
# equivalent `python -m hermes_cli.main update`). These commands
# run `git fetch origin + git pull` against the REAL checkout,
# overwriting files like pyproject.toml mid-test-run and corrupting
# every subsequent subprocess that reads them. The corruption is
# especially insidious because the spawned process uses setsid/
# start_new_session=True, making it invisible to pytest's process
# tree (PPid=1) and nearly impossible to trace without explicit
# inotify/SHA watchdogs. Any test that legitimately needs to exercise
# the update-spawn path must mock subprocess.Popen explicitly.
cmd_str = _cmd_to_string(cmd)
low = cmd_str.lower()
if "update" in low and (
# hermes update / hermes update --gateway / setsid bash -c ... hermes update
("hermes" in low and "update" in low.split())
or
# python -m hermes_cli.main update --gateway
("hermes_cli" in low and "update" in low.split())
or
# venv/bin/hermes update (absolute path variant used in tests)
(".venv/bin/hermes" in low and "update" in low)
):
raise RuntimeError(
f"tests/conftest.py live-system guard: blocked "
f"subprocess.{name}({cmd!r}) — this command would run "
"`hermes update` against the real checkout, fetching "
"from origin and overwriting repo files (e.g. "
"pyproject.toml) mid-test-run. This corrupts every "
"subsequent subprocess in the same runner. "
"Mock subprocess.Popen (and subprocess.run if used) "
"in the test instead, or mark with "
"@pytest.mark.live_system_guard_bypass if genuinely "
"needed (e.g. an integration test testing the update "
"flow against a dedicated throwaway repo)."
)
def _wrap_subprocess(name, real):
def _guarded(cmd, *args, **kwargs):
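
A minimal sketch of how a guard of this kind can wrap a subprocess entry point: the real function is captured, every call is checked first, and blocked commands raise instead of spawning. The names `make_guarded` and `is_update` are hypothetical, and the predicate is deliberately simpler than the one in the diff.

```python
import subprocess
import sys
from unittest.mock import patch


def is_update(cmd):
    """Toy predicate (assumption): an `update` token plus any hermes-like token."""
    text = " ".join(map(str, cmd)) if isinstance(cmd, (list, tuple)) else str(cmd)
    toks = text.lower().split()
    return "update" in toks and any("hermes" in t for t in toks)


def make_guarded(real, is_blocked):
    """Return a wrapper that refuses blocked commands and delegates the rest."""
    def _guarded(cmd, *args, **kwargs):
        if is_blocked(cmd):
            raise RuntimeError(f"blocked subprocess call: {cmd!r}")
        return real(cmd, *args, **kwargs)
    return _guarded


def demo():
    """Install the guard on subprocess.run and exercise both branches."""
    guarded = make_guarded(subprocess.run, is_update)  # capture the real function first
    with patch.object(subprocess, "run", guarded):
        harmless = subprocess.run([sys.executable, "-c", "pass"])
        try:
            subprocess.run(["hermes", "update", "--gateway"])
        except RuntimeError:
            blocked = True
        else:
            blocked = False
    return harmless.returncode, blocked
```

Because the check runs before the real function is reached, a blocked command never starts a process at all.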

View File

@@ -51,10 +51,16 @@ class TestHandleUpdateCommand:
event = _make_event()
monkeypatch.setenv("HERMES_MANAGED", "homebrew")
result = await runner._handle_update_command(event)
# Guard: prevent any accidental fall-through from spawning a real
# `hermes update --gateway` against the CI checkout. The managed-install
# guard should return before Popen is ever reached, but mock it as
# belt-and-suspenders so an accidental fall-through doesn't corrupt the repo.
with patch("subprocess.Popen") as mock_popen:
result = await runner._handle_update_command(event)
assert "managed by Homebrew" in result
assert "brew upgrade hermes-agent" in result
mock_popen.assert_not_called() # must return before reaching Popen
@pytest.mark.asyncio
async def test_no_git_directory(self, tmp_path):
@@ -388,16 +394,16 @@ class TestUpdateCommandPlatformGate:
blocked by the allowlist gate before any side effects fire."""
runner = _make_runner()
event = _make_event(platform=Platform.WEBHOOK)
# Stop _handle_update_command from progressing further if the gate
# somehow lets the event through — the assertion on the returned
# string is the real test.
monkeypatch.setenv("HERMES_MANAGED", "")
result = await runner._handle_update_command(event)
# Guard: platform gate must fire before any real subprocess spawn.
with patch("subprocess.Popen") as mock_popen:
result = await runner._handle_update_command(event)
# The exact rejection message comes from
# ``gateway.update.platform_not_messaging`` translation key.
assert "only available from messaging platforms" in result
mock_popen.assert_not_called()
@pytest.mark.asyncio
async def test_blocks_api_server_platform(self, monkeypatch):
@@ -408,9 +414,11 @@ class TestUpdateCommandPlatformGate:
event = _make_event(platform=Platform.API_SERVER)
monkeypatch.setenv("HERMES_MANAGED", "")
result = await runner._handle_update_command(event)
with patch("subprocess.Popen") as mock_popen:
result = await runner._handle_update_command(event)
assert "only available from messaging platforms" in result
mock_popen.assert_not_called()
@pytest.mark.asyncio
async def test_allows_plugin_platform_via_registry_fallback(self, monkeypatch):
@@ -439,7 +447,8 @@ class TestUpdateCommandPlatformGate:
event = _make_event(platform=Platform.DISCORD)
monkeypatch.setenv("HERMES_MANAGED", "")
result = await runner._handle_update_command(event)
with patch("subprocess.Popen"):
result = await runner._handle_update_command(event)
# The gate must NOT have rejected us — anything other than the
# ``platform_not_messaging`` rejection string is acceptable here.
@@ -467,7 +476,8 @@ class TestUpdateCommandPlatformGate:
event = _make_event(platform=Platform.MATTERMOST)
monkeypatch.setenv("HERMES_MANAGED", "")
result = await runner._handle_update_command(event)
with patch("subprocess.Popen"):
result = await runner._handle_update_command(event)
assert "only available from messaging platforms" not in result
@@ -492,7 +502,8 @@ class TestUpdateCommandPlatformGate:
event = _make_event(platform=Platform.HOMEASSISTANT)
monkeypatch.setenv("HERMES_MANAGED", "")
result = await runner._handle_update_command(event)
with patch("subprocess.Popen"):
result = await runner._handle_update_command(event)
assert "only available from messaging platforms" not in result
@@ -509,7 +520,8 @@ class TestUpdateCommandPlatformGate:
event = _make_event(platform=Platform.TELEGRAM)
monkeypatch.setenv("HERMES_MANAGED", "")
result = await runner._handle_update_command(event)
with patch("subprocess.Popen"):
result = await runner._handle_update_command(event)
assert "only available from messaging platforms" not in result

View File

@@ -126,8 +126,8 @@ def test_cmd_update_on_git_install_does_not_print_docker_message(
``subprocess.run`` is mocked because the git path will otherwise shell
out to ``git fetch upstream`` / ``git fetch origin`` — on CI runners
with no ``upstream`` remote configured this can hang past the 30s
pytest-timeout depending on git's network behaviour. The stub
with no ``upstream`` remote configured this can hang past a timeout
depending on git's network behaviour. The stub
returns a successful CompletedProcess-shaped object with ``"0\\n"``
stdout, which both keeps the flow shell-free AND parses cleanly as
the "0 commits behind" rev-list output the check path later parses

View File

@@ -4749,7 +4749,7 @@ class TestPtyWebSocket:
while time.monotonic() < deadline:
# receive_bytes() blocks; once the child prints its winsize and
# exits, the PTY closes and further reads raise. Without this
# guard a missed-marker run blocks until the 30s pytest-timeout
# guard a missed-marker run blocks until a test timeout
# (flaky failure) instead of failing fast on the assert below.
try:
frame = conn.receive_bytes()

View File

@@ -185,111 +185,3 @@ def test_grandchild_leak_is_killed_by_runner(tmp_path: Path) -> None:
f"diag={diag!r} test_pid={test_pid} test_pgid={test_pgid}; "
f"runner output:\n{proc.stdout}"
)
# ---------------------------------------------------------------------------
# exit-4 retry loop (transient "file or directory not found" on loaded runners)
# ---------------------------------------------------------------------------
import importlib.util as _importlib_util # noqa: E402
def _load_runner_module():
"""Import scripts/run_tests_parallel.py as a module for in-process tests."""
repo_root = Path(__file__).resolve().parent.parent
path = repo_root / "scripts" / "run_tests_parallel.py"
spec = _importlib_util.spec_from_file_location("_rtp_under_test", path)
mod = _importlib_util.module_from_spec(spec)
spec.loader.exec_module(mod)
return mod
def test_exit4_retry_recovers_when_file_exists(tmp_path, monkeypatch):
"""A file that exits 4 transiently then passes must be retried and recover.
Simulates the loaded-CI transient: the per-file pytest subprocess reports
"file or directory not found" (exit 4) on the first attempts even though
the file is on disk, then succeeds. The runner must retry and report pass.
"""
rtp = _load_runner_module()
f = tmp_path / "test_transient.py"
f.write_text("def test_ok():\n assert True\n")
calls = {"n": 0}
def fake_spawn(cmd, repo_root, file_timeout, *, timeout_note="per-file timeout"):
calls["n"] += 1
# First two attempts: transient exit-4. Third: success.
if calls["n"] < 3:
return 4, "ERROR: file or directory not found\nno tests ran in 0.00s"
return 0, "1 passed"
monkeypatch.setattr(rtp, "_spawn_pytest_once", fake_spawn)
monkeypatch.setattr(rtp, "_EXIT4_RETRY_BACKOFF_SECONDS", 0.0) # no real sleep
file, rc, output, summary, _wall = rtp._run_one_file(f, [], tmp_path, 30.0)
assert rc == 0, f"expected recovery to pass, got rc={rc}, output={output!r}"
assert calls["n"] == 3, f"expected 3 attempts (1 + 2 retries), got {calls['n']}"
def test_exit4_no_retry_when_file_genuinely_missing(tmp_path, monkeypatch):
"""Exit 4 on a file that does NOT exist must fail fast without retrying.
Guards the narrowing: we only retry while the file is present on disk, so a
real typo / deleted file surfaces immediately instead of looping.
"""
rtp = _load_runner_module()
missing = tmp_path / "test_does_not_exist.py" # never created
calls = {"n": 0}
def fake_spawn(cmd, repo_root, file_timeout, *, timeout_note="per-file timeout"):
calls["n"] += 1
return 4, "ERROR: file or directory not found"
monkeypatch.setattr(rtp, "_spawn_pytest_once", fake_spawn)
monkeypatch.setattr(rtp, "_EXIT4_RETRY_BACKOFF_SECONDS", 0.0)
file, rc, output, summary, _wall = rtp._run_one_file(missing, [], tmp_path, 30.0)
assert rc == 4, f"genuinely-missing file should keep rc=4, got {rc}"
assert calls["n"] == 1, f"missing file must NOT be retried, got {calls['n']} calls"
def test_exit4_retry_gives_up_after_max_attempts(tmp_path, monkeypatch):
"""If the transient never clears, we stop after the bounded attempt count."""
rtp = _load_runner_module()
f = tmp_path / "test_persistent_transient.py"
f.write_text("def test_ok():\n assert True\n")
calls = {"n": 0}
def fake_spawn(cmd, repo_root, file_timeout, *, timeout_note="per-file timeout"):
calls["n"] += 1
return 4, "ERROR: file or directory not found"
monkeypatch.setattr(rtp, "_spawn_pytest_once", fake_spawn)
monkeypatch.setattr(rtp, "_EXIT4_RETRY_BACKOFF_SECONDS", 0.0)
file, rc, output, summary, _wall = rtp._run_one_file(f, [], tmp_path, 30.0)
assert rc == 4
# 1 initial + _EXIT4_RETRY_ATTEMPTS retries.
assert calls["n"] == 1 + rtp._EXIT4_RETRY_ATTEMPTS
def test_file_present_tolerates_transient_negative(tmp_path, monkeypatch):
"""_file_present must not conclude 'missing' on a single flaky stat."""
rtp = _load_runner_module()
f = tmp_path / "test_flaky_stat.py"
f.write_text("x = 1\n")
seq = iter([False, False, True]) # first two stats flake, third succeeds
monkeypatch.setattr(rtp.Path, "exists", lambda self: next(seq))
assert rtp._file_present(f, attempts=3, delay=0.0) is True
def test_file_present_reports_truly_missing(tmp_path, monkeypatch):
"""_file_present returns False when the file is absent across all checks."""
rtp = _load_runner_module()
f = tmp_path / "nope.py"
monkeypatch.setattr(rtp.Path, "exists", lambda self: False)
assert rtp._file_present(f, attempts=3, delay=0.0) is False

22
uv.lock generated
View File

@@ -1461,7 +1461,6 @@ dev = [
{ name = "mcp" },
{ name = "pytest" },
{ name = "pytest-asyncio" },
{ name = "pytest-timeout" },
{ name = "ruff" },
{ name = "setuptools" },
{ name = "starlette" },
@@ -1661,9 +1660,8 @@ requires-dist = [
{ name = "ptyprocess", marker = "sys_platform != 'win32'", specifier = ">=0.7.0,<1" },
{ name = "pydantic", specifier = "==2.13.4" },
{ name = "pyjwt", extras = ["crypto"], specifier = "==2.13.0" },
{ name = "pytest", marker = "extra == 'dev'", specifier = "==9.0.2" },
{ name = "pytest", marker = "extra == 'dev'", specifier = "==9.0.3" },
{ name = "pytest-asyncio", marker = "extra == 'dev'", specifier = "==1.3.0" },
{ name = "pytest-timeout", marker = "extra == 'dev'", specifier = "==2.4.0" },
{ name = "python-dotenv", specifier = "==1.2.2" },
{ name = "python-telegram-bot", extras = ["webhooks"], marker = "extra == 'messaging'", specifier = "==22.6" },
{ name = "python-telegram-bot", extras = ["webhooks"], marker = "extra == 'termux'", specifier = "==22.6" },
@@ -3148,7 +3146,7 @@ wheels = [
[[package]]
name = "pytest"
version = "9.0.2"
version = "9.0.3"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "colorama", marker = "sys_platform == 'win32'" },
@@ -3157,9 +3155,9 @@ dependencies = [
{ name = "pluggy" },
{ name = "pygments" },
]
sdist = { url = "https://files.pythonhosted.org/packages/d1/db/7ef3487e0fb0049ddb5ce41d3a49c235bf9ad299b6a25d5780a89f19230f/pytest-9.0.2.tar.gz", hash = "sha256:75186651a92bd89611d1d9fc20f0b4345fd827c41ccd5c299a868a05d70edf11", size = 1568901, upload-time = "2025-12-06T21:30:51.014Z" }
sdist = { url = "https://files.pythonhosted.org/packages/7d/0d/549bd94f1a0a402dc8cf64563a117c0f3765662e2e668477624baeec44d5/pytest-9.0.3.tar.gz", hash = "sha256:b86ada508af81d19edeb213c681b1d48246c1a91d304c6c81a427674c17eb91c", size = 1572165, upload-time = "2026-04-07T17:16:18.027Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/3b/ab/b3226f0bd7cdcf710fbede2b3548584366da3b19b5021e74f5bde2a8fa3f/pytest-9.0.2-py3-none-any.whl", hash = "sha256:711ffd45bf766d5264d487b917733b453d917afd2b0ad65223959f59089f875b", size = 374801, upload-time = "2025-12-06T21:30:49.154Z" },
{ url = "https://files.pythonhosted.org/packages/d4/24/a372aaf5c9b7208e7112038812994107bc65a84cd00e0354a88c2c77a617/pytest-9.0.3-py3-none-any.whl", hash = "sha256:2c5efc453d45394fdd706ade797c0a81091eccd1d6e4bccfcd476e2b8e0ab5d9", size = 375249, upload-time = "2026-04-07T17:16:16.13Z" },
]
[[package]]
@@ -3175,18 +3173,6 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/e5/35/f8b19922b6a25bc0880171a2f1a003eaeb93657475193ab516fd87cac9da/pytest_asyncio-1.3.0-py3-none-any.whl", hash = "sha256:611e26147c7f77640e6d0a92a38ed17c3e9848063698d5c93d5aa7aa11cebff5", size = 15075, upload-time = "2025-11-10T16:07:45.537Z" },
]
[[package]]
name = "pytest-timeout"
version = "2.4.0"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "pytest" },
]
sdist = { url = "https://files.pythonhosted.org/packages/ac/82/4c9ecabab13363e72d880f2fb504c5f750433b2b6f16e99f4ec21ada284c/pytest_timeout-2.4.0.tar.gz", hash = "sha256:7e68e90b01f9eff71332b25001f85c75495fc4e3a836701876183c4bcfd0540a", size = 17973, upload-time = "2025-05-05T19:44:34.99Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/fa/b6/3127540ecdf1464a00e5a01ee60a1b09175f6913f0644ac748494d9c4b21/pytest_timeout-2.4.0-py3-none-any.whl", hash = "sha256:c42667e5cdadb151aeb5b26d114aff6bdf5a907f176a007a30b940d3d865b5c2", size = 14382, upload-time = "2025-05-05T19:44:33.502Z" },
]
[[package]]
name = "python-dateutil"
version = "2.9.0.post0"