chore(deps): bump pytest from 9.0.2 to 9.0.3

Bumps [pytest](https://github.com/pytest-dev/pytest) from 9.0.2 to 9.0.3.
- [Release notes](https://github.com/pytest-dev/pytest/releases)
- [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst)
- [Commits](https://github.com/pytest-dev/pytest/compare/9.0.2...9.0.3)

---
updated-dependencies:
- dependency-name: pytest
  dependency-version: 9.0.3
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
fix(tests): remove no-longer-needed forensics
2026-06-14 22:29:09 +08:00 · 2026-06-12 17:45:13 +00:00 · 2026-06-12 13:42:42 -04:00 · 2026-06-12 13:42:42 -04:00 · 2026-06-12 13:42:42 -04:00 · 2026-06-12 13:42:42 -04:00
11 changed files with 128 additions and 299 deletions
--- a/.github/workflows/docker-publish.yml
+++ b/.github/workflows/docker-publish.yml
@@ -90,7 +90,7 @@ jobs:
      # (see `_SKIP_PARTS` in scripts/run_tests_parallel.py) because each
      # shard would otherwise reach the session-scoped ``built_image``
      # fixture in ``tests/docker/conftest.py`` and start a 3-7min
-      # ``docker build`` under a 180s pytest-timeout cap — guaranteed to
+      # ``docker build`` — guaranteed to
      # die in fixture setup.
      #
      # Piggybacking here avoids a second image build: the smoke test
@@ -114,7 +114,7 @@ jobs:
        run: |
          uv venv .venv --python 3.11
          source .venv/bin/activate
-          # ``dev`` extra pulls in pytest, pytest-asyncio, pytest-timeout —
+          # ``dev`` extra pulls in pytest, pytest-asyncio —
          # everything tests/docker/ needs.  We deliberately avoid ``all``
          # here because the docker tests only drive the container via
          # subprocess and don't import hermes_agent's optional deps.
--- a/.github/workflows/tests.yml
+++ b/.github/workflows/tests.yml
@@ -4,13 +4,13 @@ on:
  push:
    branches: [main]
    paths-ignore:
-      - '**/*.md'
-      - 'docs/**'
+      - "**/*.md"
+      - "docs/**"
  pull_request:
    branches: [main]
    paths-ignore:
-      - '**/*.md'
-      - 'docs/**'
+      - "**/*.md"
+      - "docs/**"

 permissions:
  contents: read
@@ -30,13 +30,17 @@ jobs:
        slice: [1, 2, 3, 4, 5, 6]
    steps:
      - name: Checkout code
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Restore duration cache
-        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae  # v5.0.5
+        uses: actions/cache/restore@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
        with:
          path: test_durations.json
-          # Single stable key. main always overwrites, PRs always find it.
+          # main always saves under a new run-specific suffix; restore jobs pick the most recent cache sharing this prefix.
+          # Quoted from https://docs.github.com/en/actions/reference/workflows-and-actions/dependency-caching#cache-hits-and-misses:
+          # "If you provide restore-keys, the cache action sequentially searches for any caches that match the list of restore-keys.
+          # If there are no exact matches, the action searches for partial matches of the restore keys.
+          # When the action finds a partial match, the most recent cache is restored to the path directory."
          key: test-durations

      - name: Install ripgrep (prebuilt binary)
@@ -54,7 +58,7 @@ jobs:
          rg --version

      - name: Install uv
-        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86  # v5
+        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
        with:
          # Persist uv's download/wheel cache (~/.cache/uv) across runs.
          # Keyed on the dependency manifests, so the cache is reused until
@@ -115,7 +119,7 @@ jobs:
          NOUS_API_KEY: ""

      - name: Upload per-slice durations
-        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a  # v7.0.1
+        uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
        with:
          name: test-durations-slice-${{ matrix.slice }}
          path: test_durations.json
@@ -129,7 +133,7 @@ jobs:
    runs-on: ubuntu-latest
    steps:
      - name: Download all slice durations
-        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c  # v8.0.1
+        uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
        with:
          pattern: test-durations-slice-*
          path: durations
@@ -149,17 +153,17 @@ jobs:
          "

      - name: Save merged duration cache
-        uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae  # v5.0.5
+        uses: actions/cache/save@27d5ce7f107fe9357f9df03efb73ab90386fccae # v5.0.5
        with:
          path: test_durations.json
-          key: test-durations
+          key: test-durations-${{ github.run_id }}

  e2e:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    steps:
      - name: Checkout code
-        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2

      - name: Install ripgrep (prebuilt binary)
        run: |
@@ -176,7 +180,7 @@ jobs:
          rg --version

      - name: Install uv
-        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86  # v5
+        uses: astral-sh/setup-uv@d4b2f3b6ecc6e67c4457f6d3e41ec42d3d0fcb86 # v5
        with:
          # Persist uv's download/wheel cache (~/.cache/uv) across runs.
          # Keyed on the dependency manifests, so the cache is reused until
@@ -215,4 +219,4 @@ jobs:
        env:
          OPENROUTER_API_KEY: ""
          OPENAI_API_KEY: ""
-          NOUS_API_KEY: ""
+          NOUS_API_KEY: ""
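
The cache change above relies on prefix matching: the save step writes `test-durations-<run_id>`, and the restore step asks for `test-durations`. A toy model of that lookup, assuming only what the quoted documentation states (exact match first, otherwise the most recent partial match). `CacheEntry`, `created_at`, and `pick_cache` are hypothetical names for illustration; this is not GitHub's implementation and it ignores branch scoping and cache versions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CacheEntry:
    key: str
    created_at: int  # hypothetical creation stamp; larger means newer


def pick_cache(entries: Sequence[CacheEntry], key: str) -> Optional[CacheEntry]:
    """Exact key match first, else the newest entry whose key starts with ``key``."""
    exact = [e for e in entries if e.key == key]
    if exact:
        return max(exact, key=lambda e: e.created_at)
    partial = [e for e in entries if e.key.startswith(key)]
    if not partial:
        return None
    return max(partial, key=lambda e: e.created_at)


entries = [
    CacheEntry("test-durations-1001", created_at=1),
    CacheEntry("test-durations-1002", created_at=2),
    CacheEntry("uv-cache-abc", created_at=3),
]
chosen = pick_cache(entries, "test-durations")
print(chosen.key if chosen else None)
```

Under this model a PR run restores whichever `test-durations-*` cache was saved last, which is the behaviour the workflow comment describes.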
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -131,7 +131,7 @@ edge-tts = ["edge-tts==7.2.7"]
 modal = ["modal==1.3.4"]
 daytona = ["daytona==0.155.0"]
 hindsight = ["hindsight-client==0.6.1"]
-dev = ["debugpy==1.8.20", "pytest==9.0.2", "pytest-asyncio==1.3.0", "pytest-timeout==2.4.0", "mcp==1.26.0", "starlette==1.0.1", "ty==0.0.21", "ruff==0.15.10", "setuptools==82.0.1"]  # starlette: CVE-2026-48710
+dev = ["debugpy==1.8.20", "pytest==9.0.3", "pytest-asyncio==1.3.0", "mcp==1.26.0", "starlette==1.0.1", "ty==0.0.21", "ruff==0.15.10", "setuptools==82.0.1"]  # starlette: CVE-2026-48710
 messaging = ["python-telegram-bot[webhooks]==22.6", "discord.py[voice]==2.7.1", "aiohttp==3.13.4", "brotlicffi==1.2.0.1", "slack-bolt==1.27.0", "slack-sdk==3.40.1", "qrcode==7.4.2"]  # aiohttp: CVE-2026-34513/34518/34519/34520/34525
 cron = []  # croniter is now a core dependency; this extra kept for back-compat
 slack = ["slack-bolt==1.27.0", "slack-sdk==3.40.1", "aiohttp==3.13.4"]
@@ -327,12 +327,8 @@ markers = [
    "integration: marks tests requiring external services (API keys, Modal, etc.)",
    "real_concurrent_gate: opt out of the autouse stub that disables _detect_concurrent_hermes_instances",
 ]
-# pytest-timeout: per-test 30s hard cap with cross-platform thread method.
-# This is the fallback inside each per-file pytest subprocess (see
-# scripts/run_tests_parallel.py). Per-file isolation gives every test
-# file a fresh Python interpreter; pytest-timeout catches Python-level
-# hangs within a file.
-addopts = "-m 'not integration' --timeout=30 --timeout-method=thread"
+# Integration tests are too slow for the regular CI jobs, so they are excluded by default.
+addopts = "-m 'not integration'"

 [tool.ty.environment]
 python-version = "3.13"
--- a/scripts/run_tests.sh
+++ b/scripts/run_tests.sh
@@ -73,6 +73,7 @@ exec env -i \
  LANG=C.UTF-8 \
  LC_ALL=C.UTF-8 \
  PYTHONHASHSEED=0 \
+  PYTHONDONTWRITEBYTECODE=1 \
  ${EXTRA_PYTHONPATH:+PYTHONPATH="$EXTRA_PYTHONPATH"} \
  ${EXTRA_PYTEST_PLUGINS:+PYTEST_PLUGINS="$EXTRA_PYTEST_PLUGINS"} \
  "$PYTHON" "$SCRIPT_DIR/run_tests_parallel.py" "$@"
--- a/scripts/run_tests_parallel.py
+++ b/scripts/run_tests_parallel.py
@@ -65,17 +65,14 @@ _DEFAULT_ROOTS = ["tests"]
 #                        rebuild). The full pytest-shard runner can't
 #                        host these because the session-scoped
 #                        ``built_image`` fixture would do a 3-7min
-#                        ``docker build`` inside a 180s per-test
-#                        pytest-timeout cap (set by tests/docker/conftest.py),
+#                        ``docker build``,
 #                        so the build is guaranteed to die in fixture
 #                        setup. The dedicated job sidesteps both costs.
 _SKIP_PARTS = {"integration", "e2e", "docker"}

-# Per-file wall-clock cap. Generous default — pytest-timeout still
-# enforces per-test caps inside each subprocess; this is just an outer
-# safety net so a single hung file can't stall the whole suite. Override
+# Per-file wall-clock cap. Override
 # via --file-timeout or HERMES_TEST_FILE_TIMEOUT.
-_DEFAULT_FILE_TIMEOUT_SECONDS = 600.0  # 10 minutes
+_DEFAULT_FILE_TIMEOUT_SECONDS = 140.0  # slowest file observed in CI at commit time took ~100s; 140s adds some leeway

 # Duration cache: maps relative file paths to last-observed subprocess
 # wall-clock seconds. Used by ``--slice`` to distribute files across
@@ -246,27 +243,49 @@ def _kill_tree(proc: "subprocess.Popen", pgid: int | None = None) -> None:
        pass


-def _spawn_pytest_once(
-    cmd: List[str],
+def _run_one_file(
+    file: Path,
+    pytest_args: List[str],
    repo_root: Path,
    file_timeout: float,
-    *,
-    timeout_note: str = "per-file timeout",
-) -> Tuple[int, str]:
-    """Run one ``pytest`` subprocess to completion and return ``(rc, output)``.
+) -> Tuple[Path, int, str, dict[str, int], float]:
+    """Run ``python -m pytest <file> <pytest_args>`` in a fresh subprocess.

-    Spawns the child in its own process group / session so a hung file and
-    its grandchildren (uvicorn servers, async runtimes, etc.) can be SIGKILL'd
-    as a tree on timeout rather than orphaning onto PID 1. Shared by the
-    primary per-file run and the exit-4 retry loop so the lifecycle/cleanup
-    logic lives in exactly one place.
+    Returns (file, returncode, captured_combined_output, summary_counts, subprocess_wall_seconds).
+
+    ``summary_counts`` is the result of ``_parse_pytest_summary(output)``.
+
+    pytest exit codes (https://docs.pytest.org/en/stable/reference/exit-codes.html):
+        0 = all tests passed
+        1 = some tests failed
+        2 = test execution interrupted
+        3 = internal error
+        4 = pytest CLI usage error
+        5 = no tests collected
+
+    We treat exit 5 as a pass: it just means every test in the file was
+    skipped or filtered by a marker (e.g. ``-m 'not integration'`` skips
+    files where every test is marked integration). That's intentional and
+    not a failure mode.
+
+    On per-file timeout (``file_timeout`` seconds) or any other exception
+    during ``communicate()``, we kill the whole process group / process
+    tree so grandchildren (uvicorn servers, async runtimes, etc.) do not
+    orphan onto PID 1. This timeout exists only to bound a
+    pathologically slow or hung file as a whole.
    """
+    cmd = [sys.executable, "-m", "pytest", str(file), *pytest_args]
+
+    subproc_start = time.monotonic()
+    # launch the pytest process
    proc = subprocess.Popen(
        cmd,
        cwd=repo_root,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
+        # Skip writing bytecode: many parallel Python processes run against the same source tree.
+        env={**os.environ, 'PYTHONDONTWRITEBYTECODE': '1'},
        # POSIX: place the child at the head of its own process group so
        # _kill_tree can SIGKILL the group atomically.
        # Windows: this maps to CREATE_NEW_PROCESS_GROUP in CPython 3.12+;
@@ -296,7 +315,7 @@ def _spawn_pytest_once(
            output = "(file timeout exceeded; output unavailable)"
        rc = 124  # de facto convention for "killed by timeout".
        output = (
-            f"({timeout_note}: {file_timeout:.0f}s exceeded; "
+            f"({file_timeout:.0f}s exceeded; "
            f"process tree SIGKILL'd)\n{output}"
        )
    except BaseException:
@@ -309,123 +328,7 @@ def _spawn_pytest_once(
        # case it left grandchildren behind; already-dead is a no-op.
        _kill_tree(proc, pgid=pgid)

-    return rc, output
-
-
-# How many times to re-run a file that exits 4 ("file or directory not found")
-# while the file demonstrably exists on disk. On loaded shared CI runners the
-# planner can enumerate a file (tests counted via --collect-only) but the
-# per-file subprocess fail to stat it moments later — and a SINGLE immediate
-# retry can land in the same brief high-load window and fail again. We retry a
-# few times with a short backoff so transient I/O pressure has time to settle.
-_EXIT4_RETRY_ATTEMPTS = 3
-_EXIT4_RETRY_BACKOFF_SECONDS = 0.5
-
-
-def _file_present(file: Path, *, attempts: int = 3, delay: float = 0.2) -> bool:
-    """Return True if ``file`` exists, re-checking a few times.
-
-    ``Path.exists()`` itself issues a ``stat`` that can transiently fail under
-    the same load that makes pytest report "file or directory not found", so a
-    single negative check is not authoritative. Only conclude the file is
-    genuinely missing if it's absent across several spaced checks.
-    """
-    for i in range(attempts):
-        if file.exists():
-            return True
-        if i < attempts - 1:
-            time.sleep(delay)
-    return False
-
-
-def _run_one_file(
-    file: Path,
-    pytest_args: List[str],
-    repo_root: Path,
-    file_timeout: float,
-) -> Tuple[Path, int, str, dict[str, int], float]:
-    """Run ``python -m pytest <file> <pytest_args>`` in a fresh subprocess.
-
-    Returns (file, returncode, captured_combined_output, summary_counts, subprocess_wall_seconds).
-
-    ``summary_counts`` is the result of ``_parse_pytest_summary(output)`` —
-
-    pytest exit codes (https://docs.pytest.org/en/stable/reference/exit-codes.html):
-        0 = all tests passed
-        1 = some tests failed
-        2 = test execution interrupted
-        3 = internal error
-        4 = pytest CLI usage error
-        5 = no tests collected
-
-    We treat exit 5 as a pass: it just means every test in the file was
-    skipped or filtered by a marker (e.g. ``-m 'not integration'`` skips
-    files where every test is marked integration). That's intentional and
-    not a failure mode.
-
-    On per-file timeout (``file_timeout`` seconds) or any other exception
-    during ``communicate()``, we kill the whole process group / process
-    tree so grandchildren (uvicorn servers, async runtimes, etc.) do not
-    orphan onto PID 1. The pytest-timeout plugin enforces per-test
-    timeouts inside the subprocess; this outer timeout exists only to
-    bound a pathologically slow or hung file as a whole.
-    """
-    cmd = [sys.executable, "-m", "pytest", str(file), *pytest_args]
-    subproc_start = time.monotonic()
-    rc, output = _spawn_pytest_once(cmd, repo_root, file_timeout)
-
-    # pytest exit 4 = "file or directory not found" at exec time. On loaded
-    # shared CI runners we have seen the planner enumerate a file (its tests
-    # counted via --collect-only) but the per-file subprocess fail to stat it
-    # moments later — a transient the deterministic LPT slicer otherwise
-    # reproduces on every rerun (same file set → same shard). Re-run the file a
-    # few times with a short backoff so the I/O pressure has time to settle,
-    # but ONLY while the file demonstrably exists on disk. A single immediate
-    # retry (the old behaviour) could land in the same brief high-load window
-    # and fail again; a single Path.exists() check could itself be a flaky stat
-    # under that load, so we re-check existence across spaced attempts.
-    # We do NOT widen the exit-5 rule: exit 4 on a file that genuinely does not
-    # exist must still fail.
-    attempt = 0
-    while rc == 4 and attempt < _EXIT4_RETRY_ATTEMPTS and _file_present(file):
-        attempt += 1
-        time.sleep(_EXIT4_RETRY_BACKOFF_SECONDS * attempt)
-        rc, output = _spawn_pytest_once(
-            cmd, repo_root, file_timeout,
-            timeout_note=f"per-file timeout on exit-4 retry {attempt}",
-        )
-
-    if rc == 4:
-        # Exit-4 survived the retries (or the file was judged absent).
-        # Capture filesystem forensics so a CI-only "file not found" can
-        # be diagnosed from the log instead of guessed at: does the file
-        # exist NOW, what does the parent dir hold, and is the git tree
-        # clean?  (June 2026: a PR-added test file repeatedly hit exit 4
-        # on one CI shard while passing locally — these lines exist so
-        # the next occurrence is attributable.)
-        forensics = [f"--- exit-4 forensics for {file} ---"]
-        try:
-            forensics.append(f"exists={file.exists()} retries_used={attempt}")
-            parent = file.parent
-            if parent.exists():
-                names = sorted(p.name for p in parent.iterdir())
-                sibling_hint = [n for n in names if file.stem[:12] in n]
-                forensics.append(
-                    f"parent={parent} entries={len(names)} "
-                    f"similar={sibling_hint[:5]}"
-                )
-            else:
-                forensics.append(f"parent={parent} MISSING")
-            git_st = subprocess.run(
-                ["git", "status", "--porcelain"],
-                cwd=repo_root, capture_output=True, text=True, timeout=10,
-            )
-            dirty = git_st.stdout.strip().splitlines()
-            forensics.append(f"git_dirty_entries={len(dirty)}")
-            forensics.extend(f"  {line}" for line in dirty[:10])
-        except Exception as exc:  # noqa: BLE001 — forensics must never mask rc=4
-            forensics.append(f"(forensics error: {exc})")
-        output = output + "\n" + "\n".join(forensics)
+        output += "\n"

    if rc == 5:
        # No tests collected — every test in the file was filtered out.
@@ -721,7 +624,7 @@ def main() -> int:
        help=(
            "Per-file wall-clock cap in seconds. On timeout, the pytest "
            "subprocess and its full process tree are SIGKILL'd. "
-            "Default: 600 (10 min), env: HERMES_TEST_FILE_TIMEOUT."
+            f"Default: {_DEFAULT_FILE_TIMEOUT_SECONDS:.0f}s, env: HERMES_TEST_FILE_TIMEOUT."
        ),
    )
    parser.add_argument(
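
With pytest-timeout removed, the per-file wall-clock cap is the only timeout left, so the kill-on-timeout path carries more weight. A minimal POSIX-only sketch of that pattern, assuming a hypothetical `run_with_timeout` helper; it is not the runner's `_run_one_file` or `_kill_tree`, and it omits the Windows handling the real script mentions.

```python
import os
import signal
import subprocess
import sys
from typing import List, Tuple


def run_with_timeout(cmd: List[str], timeout: float) -> Tuple[int, str]:
    """Run ``cmd``; on timeout SIGKILL its whole process group and return rc 124."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        # Same idea as the diff: parallel children should not race on .pyc files.
        env={**os.environ, "PYTHONDONTWRITEBYTECODE": "1"},
        # POSIX: the child becomes leader of a new session and process group.
        start_new_session=True,
    )
    try:
        output, _ = proc.communicate(timeout=timeout)
        return proc.returncode, output
    except subprocess.TimeoutExpired:
        try:
            # pgid equals the child's pid because of start_new_session=True.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited
        output, _ = proc.communicate()  # reap the child and drain the pipe
        return 124, f"({timeout:.0f}s exceeded; process tree SIGKILL'd)\n{output}"


rc, out = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout=30.0)
print(rc, out.strip())
```

Killing the group rather than the single pid is what keeps grandchildren from outliving the timed-out file.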
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -731,6 +731,41 @@ def _live_system_guard(request, monkeypatch):
                "Mark with @pytest.mark.live_system_guard_bypass if "
                "intentional."
            )
+        # Block any subprocess that would run `hermes update` (or the
+        # equivalent `python -m hermes_cli.main update`).  These commands
+        # run `git fetch origin + git pull` against the REAL checkout,
+        # overwriting files like pyproject.toml mid-test-run and corrupting
+        # every subsequent subprocess that reads them.  The corruption is
+        # especially insidious because the spawned process uses setsid/
+        # start_new_session=True, making it invisible to pytest's process
+        # tree (PPid=1) and nearly impossible to trace without explicit
+        # inotify/SHA watchdogs.  Any test that legitimately needs to exercise
+        # the update-spawn path must mock subprocess.Popen explicitly.
+        cmd_str = _cmd_to_string(cmd)
+        low = cmd_str.lower()
+        if "update" in low and (
+            # hermes update / hermes update --gateway / setsid bash -c ... hermes update
+            ("hermes" in low and "update" in low.split())
+            or
+            # python -m hermes_cli.main update --gateway
+            ("hermes_cli" in low and "update" in low.split())
+            or
+            # venv/bin/hermes update  (absolute path variant used in tests)
+            (".venv/bin/hermes" in low and "update" in low)
+        ):
+            raise RuntimeError(
+                f"tests/conftest.py live-system guard: blocked "
+                f"subprocess.{name}({cmd!r}) — this command would run "
+                "`hermes update` against the real checkout, fetching "
+                "from origin and overwriting repo files (e.g. "
+                "pyproject.toml) mid-test-run. This corrupts every "
+                "subsequent subprocess in the same runner. "
+                "Mock subprocess.Popen (and subprocess.run if used) "
+                "in the test instead, or mark with "
+                "@pytest.mark.live_system_guard_bypass if genuinely "
+                "needed (e.g. an integration test testing the update "
+                "flow against a dedicated throwaway repo)."
+            )

    def _wrap_subprocess(name, real):
        def _guarded(cmd, *args, **kwargs):
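
The matching condition in the new guard can be read as a standalone predicate. The sketch below mirrors the diff's condition on a lowercased command string; `cmd_to_string` is a hypothetical stand-in for the repo's `_cmd_to_string` (whose actual behaviour is not shown here), and `looks_like_hermes_update` is an illustrative name.

```python
from typing import Sequence, Union


def cmd_to_string(cmd: Union[str, Sequence[str]]) -> str:
    """Hypothetical stand-in for the diff's ``_cmd_to_string`` helper."""
    return cmd if isinstance(cmd, str) else " ".join(str(part) for part in cmd)


def looks_like_hermes_update(cmd: Union[str, Sequence[str]]) -> bool:
    """Mirror of the guard's condition: a hermes invocation carrying ``update``."""
    low = cmd_to_string(cmd).lower()
    if "update" not in low:
        return False
    tokens = low.split()
    return (
        ("hermes" in low and "update" in tokens)
        or ("hermes_cli" in low and "update" in tokens)
        or (".venv/bin/hermes" in low and "update" in low)
    )


print(looks_like_hermes_update(["hermes", "update", "--gateway"]))
print(looks_like_hermes_update(["git", "status", "--porcelain"]))
```

One property of this sketch worth noting: a plain whitespace split leaves shell quotes attached, so a single string such as `setsid bash -c 'hermes update'` yields the token `update'` rather than `update` and is not matched by the first two clauses.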
--- a/tests/gateway/test_update_command.py
+++ b/tests/gateway/test_update_command.py
@@ -51,10 +51,16 @@ class TestHandleUpdateCommand:
        event = _make_event()
        monkeypatch.setenv("HERMES_MANAGED", "homebrew")

-        result = await runner._handle_update_command(event)
+        # Guard: prevent any accidental fall-through from spawning a real
+        # `hermes update --gateway` against the CI checkout. The managed-install
+        # guard should return before Popen is ever reached, but mock it as
+        # belt-and-suspenders so a premature return doesn't corrupt the repo.
+        with patch("subprocess.Popen") as mock_popen:
+            result = await runner._handle_update_command(event)

        assert "managed by Homebrew" in result
        assert "brew upgrade hermes-agent" in result
+        mock_popen.assert_not_called()  # must return before reaching Popen

    @pytest.mark.asyncio
    async def test_no_git_directory(self, tmp_path):
@@ -388,16 +394,16 @@ class TestUpdateCommandPlatformGate:
        blocked by the allowlist gate before any side effects fire."""
        runner = _make_runner()
        event = _make_event(platform=Platform.WEBHOOK)
-        # Stop _handle_update_command from progressing further if the gate
-        # somehow lets the event through — the assertion on the returned
-        # string is the real test.
        monkeypatch.setenv("HERMES_MANAGED", "")

-        result = await runner._handle_update_command(event)
+        # Guard: platform gate must fire before any real subprocess spawn.
+        with patch("subprocess.Popen") as mock_popen:
+            result = await runner._handle_update_command(event)

        # The exact rejection message comes from
        # ``gateway.update.platform_not_messaging`` translation key.
        assert "only available from messaging platforms" in result
+        mock_popen.assert_not_called()

    @pytest.mark.asyncio
    async def test_blocks_api_server_platform(self, monkeypatch):
@@ -408,9 +414,11 @@ class TestUpdateCommandPlatformGate:
        event = _make_event(platform=Platform.API_SERVER)
        monkeypatch.setenv("HERMES_MANAGED", "")

-        result = await runner._handle_update_command(event)
+        with patch("subprocess.Popen") as mock_popen:
+            result = await runner._handle_update_command(event)

        assert "only available from messaging platforms" in result
+        mock_popen.assert_not_called()

    @pytest.mark.asyncio
    async def test_allows_plugin_platform_via_registry_fallback(self, monkeypatch):
@@ -439,7 +447,8 @@ class TestUpdateCommandPlatformGate:
        event = _make_event(platform=Platform.DISCORD)
        monkeypatch.setenv("HERMES_MANAGED", "")

-        result = await runner._handle_update_command(event)
+        with patch("subprocess.Popen"):
+            result = await runner._handle_update_command(event)

        # The gate must NOT have rejected us — anything other than the
        # ``platform_not_messaging`` rejection string is acceptable here.
@@ -467,7 +476,8 @@ class TestUpdateCommandPlatformGate:
        event = _make_event(platform=Platform.MATTERMOST)
        monkeypatch.setenv("HERMES_MANAGED", "")

-        result = await runner._handle_update_command(event)
+        with patch("subprocess.Popen"):
+            result = await runner._handle_update_command(event)

        assert "only available from messaging platforms" not in result

@@ -492,7 +502,8 @@ class TestUpdateCommandPlatformGate:
        event = _make_event(platform=Platform.HOMEASSISTANT)
        monkeypatch.setenv("HERMES_MANAGED", "")

-        result = await runner._handle_update_command(event)
+        with patch("subprocess.Popen"):
+            result = await runner._handle_update_command(event)

        assert "only available from messaging platforms" not in result

@@ -509,7 +520,8 @@ class TestUpdateCommandPlatformGate:
        event = _make_event(platform=Platform.TELEGRAM)
        monkeypatch.setenv("HERMES_MANAGED", "")

-        result = await runner._handle_update_command(event)
+        with patch("subprocess.Popen"):
+            result = await runner._handle_update_command(event)

        assert "only available from messaging platforms" not in result

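
The tests above all follow one pattern: patch `subprocess.Popen`, call the handler, then assert on whether the mock was reached. A self-contained synchronous sketch of that pattern, assuming a hypothetical `handle_update` function in place of the async `runner._handle_update_command`:

```python
import subprocess
from unittest.mock import patch


def handle_update(managed: bool) -> str:
    """Hypothetical handler: returns early for managed installs, otherwise spawns."""
    if managed:
        return "managed by Homebrew"
    subprocess.Popen(["hermes", "update", "--gateway"])
    return "update started"


with patch("subprocess.Popen") as mock_popen:
    early = handle_update(managed=True)
    mock_popen.assert_not_called()  # the early return never reaches Popen

with patch("subprocess.Popen") as mock_popen:
    spawned = handle_update(managed=False)
    mock_popen.assert_called_once()  # the mock absorbed the spawn; nothing real ran

print(early, "|", spawned)
```

Patching works here because the handler looks up `Popen` on the `subprocess` module at call time; code that did `from subprocess import Popen` would need the patch applied to its own module namespace instead.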
--- a/tests/hermes_cli/test_cmd_update_docker.py
+++ b/tests/hermes_cli/test_cmd_update_docker.py
@@ -126,8 +126,8 @@ def test_cmd_update_on_git_install_does_not_print_docker_message(

    ``subprocess.run`` is mocked because the git path will otherwise shell
    out to ``git fetch upstream`` / ``git fetch origin`` — on CI runners
-    with no ``upstream`` remote configured this can hang past the 30s
-    pytest-timeout depending on git's network behaviour.  The stub
+    with no ``upstream`` remote configured this can hang past a timeout
+    depending on git's network behaviour.  The stub
    returns a successful CompletedProcess-shaped object with ``"0\\n"``
    stdout, which both keeps the flow shell-free AND parses cleanly as
    the "0 commits behind" rev-list output the check path later parses
--- a/tests/hermes_cli/test_web_server.py
+++ b/tests/hermes_cli/test_web_server.py
@@ -4749,7 +4749,7 @@ class TestPtyWebSocket:
            while time.monotonic() < deadline:
                # receive_bytes() blocks; once the child prints its winsize and
                # exits, the PTY closes and further reads raise. Without this
-                # guard a missed-marker run blocks until the 30s pytest-timeout
+                # guard a missed-marker run blocks until a test timeout
                # (flaky failure) instead of failing fast on the assert below.
                try:
                    frame = conn.receive_bytes()
--- a/tests/test_run_tests_parallel.py
+++ b/tests/test_run_tests_parallel.py
@@ -185,111 +185,3 @@ def test_grandchild_leak_is_killed_by_runner(tmp_path: Path) -> None:
            f"diag={diag!r} test_pid={test_pid} test_pgid={test_pgid}; "
            f"runner output:\n{proc.stdout}"
        )
-
-
-# ---------------------------------------------------------------------------
-# exit-4 retry loop (transient "file or directory not found" on loaded runners)
-# ---------------------------------------------------------------------------
-
-import importlib.util as _importlib_util  # noqa: E402
-
-
-def _load_runner_module():
-    """Import scripts/run_tests_parallel.py as a module for in-process tests."""
-    repo_root = Path(__file__).resolve().parent.parent
-    path = repo_root / "scripts" / "run_tests_parallel.py"
-    spec = _importlib_util.spec_from_file_location("_rtp_under_test", path)
-    mod = _importlib_util.module_from_spec(spec)
-    spec.loader.exec_module(mod)
-    return mod
-
-
-def test_exit4_retry_recovers_when_file_exists(tmp_path, monkeypatch):
-    """A file that exits 4 transiently then passes must be retried and recover.
-
-    Simulates the loaded-CI transient: the per-file pytest subprocess reports
-    "file or directory not found" (exit 4) on the first attempts even though
-    the file is on disk, then succeeds. The runner must retry and report pass.
-    """
-    rtp = _load_runner_module()
-    f = tmp_path / "test_transient.py"
-    f.write_text("def test_ok():\n    assert True\n")
-
-    calls = {"n": 0}
-
-    def fake_spawn(cmd, repo_root, file_timeout, *, timeout_note="per-file timeout"):
-        calls["n"] += 1
-        # First two attempts: transient exit-4. Third: success.
-        if calls["n"] < 3:
-            return 4, "ERROR: file or directory not found\nno tests ran in 0.00s"
-        return 0, "1 passed"
-
-    monkeypatch.setattr(rtp, "_spawn_pytest_once", fake_spawn)
-    monkeypatch.setattr(rtp, "_EXIT4_RETRY_BACKOFF_SECONDS", 0.0)  # no real sleep
-
-    file, rc, output, summary, _wall = rtp._run_one_file(f, [], tmp_path, 30.0)
-    assert rc == 0, f"expected recovery to pass, got rc={rc}, output={output!r}"
-    assert calls["n"] == 3, f"expected 3 attempts (1 + 2 retries), got {calls['n']}"
-
-
-def test_exit4_no_retry_when_file_genuinely_missing(tmp_path, monkeypatch):
-    """Exit 4 on a file that does NOT exist must fail fast without retrying.
-
-    Guards the narrowing: we only retry while the file is present on disk, so a
-    real typo / deleted file surfaces immediately instead of looping.
-    """
-    rtp = _load_runner_module()
-    missing = tmp_path / "test_does_not_exist.py"  # never created
-
-    calls = {"n": 0}
-
-    def fake_spawn(cmd, repo_root, file_timeout, *, timeout_note="per-file timeout"):
-        calls["n"] += 1
-        return 4, "ERROR: file or directory not found"
-
-    monkeypatch.setattr(rtp, "_spawn_pytest_once", fake_spawn)
-    monkeypatch.setattr(rtp, "_EXIT4_RETRY_BACKOFF_SECONDS", 0.0)
-
-    file, rc, output, summary, _wall = rtp._run_one_file(missing, [], tmp_path, 30.0)
-    assert rc == 4, f"genuinely-missing file should keep rc=4, got {rc}"
-    assert calls["n"] == 1, f"missing file must NOT be retried, got {calls['n']} calls"
-
-
-def test_exit4_retry_gives_up_after_max_attempts(tmp_path, monkeypatch):
-    """If the transient never clears, we stop after the bounded attempt count."""
-    rtp = _load_runner_module()
-    f = tmp_path / "test_persistent_transient.py"
-    f.write_text("def test_ok():\n    assert True\n")
-
-    calls = {"n": 0}
-
-    def fake_spawn(cmd, repo_root, file_timeout, *, timeout_note="per-file timeout"):
-        calls["n"] += 1
-        return 4, "ERROR: file or directory not found"
-
-    monkeypatch.setattr(rtp, "_spawn_pytest_once", fake_spawn)
-    monkeypatch.setattr(rtp, "_EXIT4_RETRY_BACKOFF_SECONDS", 0.0)
-
-    file, rc, output, summary, _wall = rtp._run_one_file(f, [], tmp_path, 30.0)
-    assert rc == 4
-    # 1 initial + _EXIT4_RETRY_ATTEMPTS retries.
-    assert calls["n"] == 1 + rtp._EXIT4_RETRY_ATTEMPTS
-
-
-def test_file_present_tolerates_transient_negative(tmp_path, monkeypatch):
-    """_file_present must not conclude 'missing' on a single flaky stat."""
-    rtp = _load_runner_module()
-    f = tmp_path / "test_flaky_stat.py"
-    f.write_text("x = 1\n")
-
-    seq = iter([False, False, True])  # first two stats flake, third succeeds
-    monkeypatch.setattr(rtp.Path, "exists", lambda self: next(seq))
-    assert rtp._file_present(f, attempts=3, delay=0.0) is True
-
-
-def test_file_present_reports_truly_missing(tmp_path, monkeypatch):
-    """_file_present returns False when the file is absent across all checks."""
-    rtp = _load_runner_module()
-    f = tmp_path / "nope.py"
-    monkeypatch.setattr(rtp.Path, "exists", lambda self: False)
-    assert rtp._file_present(f, attempts=3, delay=0.0) is False
--- a/uv.lock
+++ b/uv.lock
@@ -1461,7 +1461,6 @@ dev = [
    { name = "mcp" },
    { name = "pytest" },
    { name = "pytest-asyncio" },
-    { name = "pytest-timeout" },
    { name = "ruff" },
    { name = "setuptools" },
    { name = "starlette" },
@@ -1661,9 +1660,8 @@ requires-dist = [
    { name = "ptyprocess", marker = "sys_platform != 'win32'", specifier = ">=0.7.0,<1" },
    { name = "pydantic", specifier = "==2.13.4" },
    { name = "pyjwt", extras = ["crypto"], specifier = "==2.13.0" },
-    { name = "pytest", marker = "extra == 'dev'", specifier = "==9.0.2" },
+    { name = "pytest", marker = "extra == 'dev'", specifier = "==9.0.3" },
    { name = "pytest-asyncio", marker = "extra == 'dev'", specifier = "==1.3.0" },
-    { name = "pytest-timeout", marker = "extra == 'dev'", specifier = "==2.4.0" },
    { name = "python-dotenv", specifier = "==1.2.2" },
    { name = "python-telegram-bot", extras = ["webhooks"], marker = "extra == 'messaging'", specifier = "==22.6" },
    { name = "python-telegram-bot", extras = ["webhooks"], marker = "extra == 'termux'", specifier = "==22.6" },
@@ -3148,7 +3146,7 @@ wheels = [

 [[package]]
 name = "pytest"
-version = "9.0.2"
+version = "9.0.3"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "colorama", marker = "sys_platform == 'win32'" },
@@ -3157,9 +3155,9 @@ dependencies = [
    { name = "pluggy" },
    { name = "pygments" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/d1/db/7ef3487e0fb0049ddb5ce41d3a49c235bf9ad299b6a25d5780a89f19230f/pytest-9.0.2.tar.gz", hash = "sha256:75186651a92bd89611d1d9fc20f0b4345fd827c41ccd5c299a868a05d70edf11", size = 1568901, upload-time = "2025-12-06T21:30:51.014Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/7d/0d/549bd94f1a0a402dc8cf64563a117c0f3765662e2e668477624baeec44d5/pytest-9.0.3.tar.gz", hash = "sha256:b86ada508af81d19edeb213c681b1d48246c1a91d304c6c81a427674c17eb91c", size = 1572165, upload-time = "2026-04-07T17:16:18.027Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/3b/ab/b3226f0bd7cdcf710fbede2b3548584366da3b19b5021e74f5bde2a8fa3f/pytest-9.0.2-py3-none-any.whl", hash = "sha256:711ffd45bf766d5264d487b917733b453d917afd2b0ad65223959f59089f875b", size = 374801, upload-time = "2025-12-06T21:30:49.154Z" },
+    { url = "https://files.pythonhosted.org/packages/d4/24/a372aaf5c9b7208e7112038812994107bc65a84cd00e0354a88c2c77a617/pytest-9.0.3-py3-none-any.whl", hash = "sha256:2c5efc453d45394fdd706ade797c0a81091eccd1d6e4bccfcd476e2b8e0ab5d9", size = 375249, upload-time = "2026-04-07T17:16:16.13Z" },
 ]

 [[package]]
@@ -3175,18 +3173,6 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/e5/35/f8b19922b6a25bc0880171a2f1a003eaeb93657475193ab516fd87cac9da/pytest_asyncio-1.3.0-py3-none-any.whl", hash = "sha256:611e26147c7f77640e6d0a92a38ed17c3e9848063698d5c93d5aa7aa11cebff5", size = 15075, upload-time = "2025-11-10T16:07:45.537Z" },
 ]

-[[package]]
-name = "pytest-timeout"
-version = "2.4.0"
-source = { registry = "https://pypi.org/simple" }
-dependencies = [
-    { name = "pytest" },
-]
-sdist = { url = "https://files.pythonhosted.org/packages/ac/82/4c9ecabab13363e72d880f2fb504c5f750433b2b6f16e99f4ec21ada284c/pytest_timeout-2.4.0.tar.gz", hash = "sha256:7e68e90b01f9eff71332b25001f85c75495fc4e3a836701876183c4bcfd0540a", size = 17973, upload-time = "2025-05-05T19:44:34.99Z" }
-wheels = [
-    { url = "https://files.pythonhosted.org/packages/fa/b6/3127540ecdf1464a00e5a01ee60a1b09175f6913f0644ac748494d9c4b21/pytest_timeout-2.4.0-py3-none-any.whl", hash = "sha256:c42667e5cdadb151aeb5b26d114aff6bdf5a907f176a007a30b940d3d865b5c2", size = 14382, upload-time = "2025-05-05T19:44:33.502Z" },
-]
-
 [[package]]
 name = "python-dateutil"
 version = "2.9.0.post0"
Author	SHA1	Message	Date
dependabot[bot]	c887b8ba36	chore(deps): bump pytest from 9.0.2 to 9.0.3 Bumps [pytest](https://github.com/pytest-dev/pytest) from 9.0.2 to 9.0.3. - [Release notes](https://github.com/pytest-dev/pytest/releases) - [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst) - [Commits](https://github.com/pytest-dev/pytest/compare/9.0.2...9.0.3) --- updated-dependencies: - dependency-name: pytest dependency-version: 9.0.3 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com>	2026-06-12 17:45:13 +00:00
ethernet	4d68984ec7	fix(tests): remove no-longer-needed forensics	2026-06-12 13:42:42 -04:00
ethernet	6ff39c31ad	fix(tests): guard against real 'hermes update' subprocess spawns in conftest. Extends _live_system_guard in tests/conftest.py to block any subprocess call that would run 'hermes update' (or 'python -m hermes_cli.main update') against the real checkout. These commands run git fetch origin + git pull, overwriting repo files like pyproject.toml mid-test-run and corrupting every subsequent subprocess that reads them. The spawned process uses setsid / start_new_session=True, so it is invisible to pytest's process tree (PPid=1); the corruption was essentially undetectable without explicit inotify/SHA watchdogs. Root cause of the #43703 CI failures: tests in TestUpdateCommandPlatformGate called _handle_update_command() with HERMES_MANAGED='' and no Popen mock, causing the code to fall through and spawn a real 'hermes update --gateway' that overwrote pyproject.toml with origin/main's content (which still had '--timeout=30 --timeout-method=thread' in addopts while the PR had already removed pytest-timeout). The guard covers all three invocation patterns: (1) 'hermes update' / 'hermes update --gateway' (direct or via setsid bash -c); (2) 'python -m hermes_cli.main update --gateway'; (3) '.venv/bin/hermes update' (absolute path variant). It does not false-positive on git update-index, apt-get update, pip install --upgrade, or any command lacking 'hermes'/'hermes_cli'.	2026-06-12 13:42:42 -04:00
ethernet	c41a6534cf	fix(tests): mock subprocess.Popen in all _handle_update_command tests	2026-06-12 13:42:42 -04:00
ethernet	2f9d18711f	fix(ci): remove pytest-timeout, use per-file timeout only; fix(ci): write a new cache for test durations every time; change(ci): rip out error 4 retries because we found the real bug	2026-06-12 13:42:42 -04:00
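
The message for commit 6ff39c31ad describes which commands the guard blocks and which it leaves alone. As a rough illustration only, that matching rule could be sketched as below. The real `_live_system_guard` in tests/conftest.py is not shown on this page, so the function names `is_hermes_update` and `_tokens` and the token-based approach are assumptions, not the project's code.

```python
import os
import shlex
from typing import List, Sequence, Union


def _tokens(cmd: Union[str, Sequence[str]]) -> List[str]:
    """Flatten a command into tokens, expanding nested `bash -c "..."` scripts."""
    parts = shlex.split(cmd) if isinstance(cmd, str) else [str(p) for p in cmd]
    out: List[str] = []
    for part in parts:
        # An argument containing whitespace is treated as a nested shell script.
        if any(ch.isspace() for ch in part):
            try:
                out.extend(shlex.split(part))
            except ValueError:
                # Unbalanced quotes: keep the raw argument rather than guess.
                out.append(part)
        else:
            out.append(part)
    return out


def is_hermes_update(cmd: Union[str, Sequence[str]]) -> bool:
    """Hypothetical check: would this command run `hermes update`?"""
    toks = _tokens(cmd)
    for i in range(len(toks) - 1):
        tok, nxt = toks[i], toks[i + 1]
        if nxt != "update":
            continue
        # Covers `hermes update` and path variants such as `.venv/bin/hermes update`.
        if os.path.basename(tok) == "hermes":
            return True
        # Covers `python -m hermes_cli.main update`.
        if tok == "hermes_cli.main" and i > 0 and toks[i - 1] == "-m":
            return True
    return False
```

A guard built on such a check would presumably wrap `subprocess.Popen` in an autouse fixture and raise when the check returns true. Matching on adjacent tokens rather than substrings is what keeps `git update-index` and `apt-get update` from being flagged.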