Compare commits

...

1 Commits

Author SHA1 Message Date
Teknium
742732fc12 fix(windows): respawn gateway after Desktop GUI update by breaking the watcher away from Electron's job object
`hermes update --gateway` (which the Tauri / Electron Desktop updater calls
to relaunch the user's gateway after an update) spawns a tiny watcher
subprocess that polls the old gateway PID, SIGTERMs it, and respawns the
new one. The watcher and its respawned gateway both need to outlive the
Electron Desktop process that triggered the update — Electron exits
mid-update so the venv shim is unlocked.

On Windows that was failing in a subtle way: the watcher was spawned
with `DETACHED_PROCESS | CREATE_NEW_PROCESS_GROUP | CREATE_NO_WINDOW`
but NOT `CREATE_BREAKAWAY_FROM_JOB`. Electron wraps its child processes
in a Win32 job object; when Electron quits, the OS tears the job down
and reaps every descendant that didn't explicitly break away — DETACHED
or not. The watcher died before it could relaunch the gateway. End
result: every Desktop-GUI-driven update left the user without a gateway
until they ran `hermes gateway start` manually.

`hermes gateway start` itself already does the right thing (its
`_spawn_detached` in `gateway_windows.py` includes the breakaway bit
plus a retry without it for restrictive job objects). The fix is to
mirror that pattern in the two spawn sites the update flow uses:

1. `windows_detach_flags()` in `_subprocess_compat.py` now includes
   `CREATE_BREAKAWAY_FROM_JOB` (0x01000000) in its default bundle.
   `windows_detach_popen_kwargs()` inherits the new flag automatically.
   A new helper `windows_detach_flags_without_breakaway()` exposes the
   pre-breakaway bundle as the documented retry payload.

2. `launch_detached_profile_gateway_restart()` in `gateway.py`:
   - The inlined respawn script (the watcher's payload — a stringified
     Python program passed to `python -c`) now sets the breakaway bit
     on the respawned gateway, with the same access-denied fallback.
   - The outer Popen that launches the watcher itself wraps in a
     try/except OSError and retries without the breakaway bit using
     `windows_detach_flags_without_breakaway()`, mirroring
     `_spawn_detached` in `gateway_windows.py`.

POSIX paths are unchanged: `windows_detach_flags()` returns 0 off
Windows and `windows_detach_popen_kwargs()` still yields
`{"start_new_session": True}`.

Tests:

- `test_windows_detach_flags_includes_breakaway_from_job` — regression
  guard so the breakaway bit isn't accidentally dropped from the
  default flag bundle.
- `test_windows_detach_flags_without_breakaway_drops_only_that_bit`
  — fallback payload keeps the other three detach bits intact.
- `test_launch_detached_profile_gateway_restart_inlined_watcher_uses_breakaway`
  — static check that the stringified watcher payload contains
  `CREATE_BREAKAWAY_FROM_JOB` symbolically and as the hex literal, so
  the intent is greppable and future refactors don't silently drop it.
- The existing `test_windows_detach_flags_has_expected_win32_bits` now
  also asserts the breakaway bit is present.
2026-06-06 19:39:49 -07:00
3 changed files with 182 additions and 17 deletions

View File

@@ -35,6 +35,7 @@ __all__ = [
"IS_WINDOWS",
"resolve_node_command",
"windows_detach_flags",
"windows_detach_flags_without_breakaway",
"windows_hide_flags",
"windows_detach_popen_kwargs",
]
@@ -97,6 +98,7 @@ def resolve_node_command(name: str, argv: Sequence[str]) -> list[str]:
_CREATE_NEW_PROCESS_GROUP = 0x00000200
_DETACHED_PROCESS = 0x00000008
_CREATE_NO_WINDOW = 0x08000000
_CREATE_BREAKAWAY_FROM_JOB = 0x01000000
def windows_detach_flags() -> int:
@@ -116,6 +118,39 @@ def windows_detach_flags() -> int:
- ``CREATE_NO_WINDOW`` — suppress the brief cmd flash that would
otherwise appear when launching a console app. Redundant with
DETACHED_PROCESS but explicit for clarity.
- ``CREATE_BREAKAWAY_FROM_JOB`` — escape any Win32 job object the
parent process is wrapped in. Electron (Desktop GUI) and some
Windows Terminal configs wrap their children in a job object;
when the parent exits, the OS tears down its job, which would
otherwise reap *detached* grandchildren (e.g. the post-update
gateway-respawn watcher) along with it. Without this flag, the
Desktop GUI's "update + relaunch gateway" handoff fails on
Windows: Electron quits while handing off to hermes-setup.exe,
its job is torn down, and the gateway-respawn watcher dies
before it can relaunch the gateway.
Some restrictive job-object configurations refuse breakaway with
ERROR_ACCESS_DENIED. Callers spawning a Popen with these flags
should be prepared to retry without ``CREATE_BREAKAWAY_FROM_JOB``
if the first attempt raises ``OSError`` — see
``windows_detach_flags_without_breakaway`` below.
"""
if not IS_WINDOWS:
return 0
return (
_CREATE_NEW_PROCESS_GROUP
| _DETACHED_PROCESS
| _CREATE_NO_WINDOW
| _CREATE_BREAKAWAY_FROM_JOB
)
def windows_detach_flags_without_breakaway() -> int:
"""Same as :func:`windows_detach_flags` minus ``CREATE_BREAKAWAY_FROM_JOB``.
Use as the retry payload when a Popen with the full detach bundle
raises ``OSError`` (typically ERROR_ACCESS_DENIED from a job object
that doesn't permit breakaway). 0 on non-Windows.
"""
if not IS_WINDOWS:
return 0

View File

@@ -637,11 +637,24 @@ def launch_detached_profile_gateway_restart(profile: str, old_pid: int) -> bool:
# when the user closes the terminal, leaving ``hermes update`` users
# with no running gateway until they re-invoke ``hermes gateway``
# manually. The Win32 equivalent is the ``CREATE_NEW_PROCESS_GROUP |
# DETACHED_PROCESS | CREATE_NO_WINDOW`` creationflags bundle.
# DETACHED_PROCESS | CREATE_NO_WINDOW | CREATE_BREAKAWAY_FROM_JOB``
# creationflags bundle.
#
# ``CREATE_BREAKAWAY_FROM_JOB`` is load-bearing on the Desktop GUI
# update path: Electron (which spawned hermes-setup.exe → which
# spawned `hermes update` → which spawned the watcher) wraps its
# children in a Win32 job object, and quits while handing off the
# update. When the OS tears down Electron's job, any descendant
# without the breakaway bit gets reaped along with it — including
# this watcher, before it gets a chance to relaunch the gateway.
# Result: the gateway dies during the update and never comes back.
#
# ``windows_detach_popen_kwargs()`` returns the right kwargs for the
# host platform and is a no-op on POSIX (just ``start_new_session=True``).
from hermes_cli._subprocess_compat import windows_detach_popen_kwargs
from hermes_cli._subprocess_compat import (
windows_detach_flags_without_breakaway,
windows_detach_popen_kwargs,
)
watcher = textwrap.dedent(
"""
@@ -664,6 +677,9 @@ def launch_detached_profile_gateway_restart(profile: str, old_pid: int) -> bool:
# Platform-appropriate detach for the respawned gateway. On POSIX
# start_new_session=True maps to os.setsid; on Windows we need
# explicit creationflags because start_new_session is a no-op there.
# CREATE_BREAKAWAY_FROM_JOB matters here for the same reason it
# mattered when this watcher itself was spawned — without it, the
# respawned gateway can be reaped by the parent's job teardown.
_popen_kwargs = {
"stdout": subprocess.DEVNULL,
"stderr": subprocess.DEVNULL,
@@ -672,32 +688,68 @@ def launch_detached_profile_gateway_restart(profile: str, old_pid: int) -> bool:
_CREATE_NEW_PROCESS_GROUP = 0x00000200
_DETACHED_PROCESS = 0x00000008
_CREATE_NO_WINDOW = 0x08000000
_popen_kwargs["creationflags"] = (
_CREATE_NEW_PROCESS_GROUP | _DETACHED_PROCESS | _CREATE_NO_WINDOW
_CREATE_BREAKAWAY_FROM_JOB = 0x01000000
_flags = (
_CREATE_NEW_PROCESS_GROUP
| _DETACHED_PROCESS
| _CREATE_NO_WINDOW
| _CREATE_BREAKAWAY_FROM_JOB
)
try:
_popen_kwargs["creationflags"] = _flags
subprocess.Popen(cmd, **_popen_kwargs)
except OSError:
# CREATE_BREAKAWAY_FROM_JOB can fail with ERROR_ACCESS_DENIED
# when the parent's job object doesn't permit breakaway.
# Retry without it — in most setups DETACHED_PROCESS is
# enough on its own.
_popen_kwargs["creationflags"] = _flags & ~_CREATE_BREAKAWAY_FROM_JOB
subprocess.Popen(cmd, **_popen_kwargs)
else:
_popen_kwargs["start_new_session"] = True
subprocess.Popen(cmd, **_popen_kwargs)
subprocess.Popen(cmd, **_popen_kwargs)
"""
).strip()
watcher_argv = [
sys.executable,
"-c",
watcher,
str(old_pid),
*_gateway_run_args_for_profile(profile),
]
# Same platform-aware detach for the watcher process itself — so
# closing the user's terminal (or Electron exiting after handing
# off the update) doesn't kill the watcher before it can respawn
# the gateway.
try:
# Same platform-aware detach for the watcher process itself — so
# closing the user's terminal doesn't kill the watcher.
subprocess.Popen(
[
sys.executable,
"-c",
watcher,
str(old_pid),
*_gateway_run_args_for_profile(profile),
],
watcher_argv,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
**windows_detach_popen_kwargs(),
)
except OSError:
return False
# CREATE_BREAKAWAY_FROM_JOB rejected by the parent job object —
# retry without it. Detach is still best-effort via
# DETACHED_PROCESS + CREATE_NEW_PROCESS_GROUP + CREATE_NO_WINDOW.
# On POSIX this branch is unreachable (start_new_session always
# succeeds), so the fallback path is Windows-only in practice.
try:
fallback_kwargs: dict = (
{"creationflags": windows_detach_flags_without_breakaway()}
if sys.platform == "win32"
else {"start_new_session": True}
)
subprocess.Popen(
watcher_argv,
stdout=subprocess.DEVNULL,
stderr=subprocess.DEVNULL,
**fallback_kwargs,
)
except OSError:
return False
return True

View File

@@ -541,10 +541,12 @@ class TestSubprocessCompatHelpers:
def test_windows_flags_zero_on_posix(self):
from hermes_cli._subprocess_compat import (
windows_detach_flags,
windows_detach_flags_without_breakaway,
windows_hide_flags,
)
if sys.platform != "win32":
assert windows_detach_flags() == 0
assert windows_detach_flags_without_breakaway() == 0
assert windows_hide_flags() == 0
def test_windows_detach_popen_kwargs_is_posix_equivalent_on_posix(self):
@@ -556,7 +558,9 @@ class TestSubprocessCompatHelpers:
# branch behaviour. Do NOT break Linux/macOS here.
assert kwargs == {"start_new_session": True}
else:
# Windows path must include creationflags with all 3 bits set.
# Windows path must include creationflags with all 4 bits set
# (including CREATE_BREAKAWAY_FROM_JOB — see the dedicated
# breakaway test below for the rationale).
assert "creationflags" in kwargs
assert kwargs["creationflags"] != 0
# No start_new_session on Windows (silently no-op there).
@@ -567,10 +571,84 @@ class TestSubprocessCompatHelpers:
from hermes_cli import _subprocess_compat as sc
monkeypatch.setattr(sc, "IS_WINDOWS", True)
flags = sc.windows_detach_flags()
# CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW
# CREATE_NEW_PROCESS_GROUP | DETACHED_PROCESS | CREATE_NO_WINDOW |
# CREATE_BREAKAWAY_FROM_JOB
assert flags & 0x00000200, "missing CREATE_NEW_PROCESS_GROUP"
assert flags & 0x00000008, "missing DETACHED_PROCESS"
assert flags & 0x08000000, "missing CREATE_NO_WINDOW"
assert flags & 0x01000000, "missing CREATE_BREAKAWAY_FROM_JOB"
def test_windows_detach_flags_includes_breakaway_from_job(self, monkeypatch):
"""CREATE_BREAKAWAY_FROM_JOB is load-bearing for the GUI-driven update path.
Without it, the gateway-respawn watcher spawned by ``hermes update``
(which runs under hermes-setup.exe, itself a grandchild of the
Electron Desktop app) gets reaped when Electron exits and its
Win32 job object is torn down by the OS. Result: gateway dies
during update and never comes back.
Regression guard against accidentally dropping the breakaway bit.
"""
from hermes_cli import _subprocess_compat as sc
monkeypatch.setattr(sc, "IS_WINDOWS", True)
assert sc.windows_detach_flags() & 0x01000000, (
"CREATE_BREAKAWAY_FROM_JOB (0x01000000) must remain in the "
"default detach flag bundle so the Desktop GUI update flow "
"can respawn the gateway after Electron exits."
)
def test_windows_detach_flags_without_breakaway_drops_only_that_bit(
self, monkeypatch
):
"""Fallback retry payload for restrictive job objects.
Some Windows Terminal / container configurations refuse
CREATE_BREAKAWAY_FROM_JOB with ERROR_ACCESS_DENIED. Callers
catch ``OSError`` and retry with this payload. It must drop ONLY
the breakaway bit — DETACHED_PROCESS et al. are still required
for the child to survive the parent's exit.
"""
from hermes_cli import _subprocess_compat as sc
monkeypatch.setattr(sc, "IS_WINDOWS", True)
full = sc.windows_detach_flags()
fallback = sc.windows_detach_flags_without_breakaway()
# Fallback equals full minus the breakaway bit, nothing else changed.
assert fallback == full & ~0x01000000
# And the three "detach" bits we still need are present.
assert fallback & 0x00000200, "fallback missing CREATE_NEW_PROCESS_GROUP"
assert fallback & 0x00000008, "fallback missing DETACHED_PROCESS"
assert fallback & 0x08000000, "fallback missing CREATE_NO_WINDOW"
def test_launch_detached_profile_gateway_restart_inlined_watcher_uses_breakaway(self):
"""The inlined respawn script (stringified Python passed to ``python -c``)
must include CREATE_BREAKAWAY_FROM_JOB so the *respawned gateway* also
breaks away from any job-object the watcher itself inherits.
Static check — the watcher source is built at import time and embedded
verbatim in the module text. Parsing it for an exact AST node would be
brittle; the textual presence of the hex flag plus the symbolic name is
a sufficient regression guard.
"""
from pathlib import Path
src = Path(__file__).resolve().parents[2] / "hermes_cli" / "gateway.py"
text = src.read_text(encoding="utf-8")
# Find the watcher block and assert the breakaway flag appears inside it.
marker = "watcher = textwrap.dedent("
idx = text.find(marker)
assert idx != -1, "watcher block not found in gateway.py"
end = text.find(").strip()", idx)
assert end != -1, "watcher block end not found"
block = text[idx:end]
assert "0x01000000" in block, (
"Inlined respawn watcher must set CREATE_BREAKAWAY_FROM_JOB "
"(0x01000000) on the respawned gateway — without it, the new "
"gateway is reaped when the parent job is torn down."
)
assert "_CREATE_BREAKAWAY_FROM_JOB" in block, (
"Inlined respawn watcher must name CREATE_BREAKAWAY_FROM_JOB "
"symbolically so the intent is greppable."
)
# ---------------------------------------------------------------------------