fix(agent): route structured-reasoning empties to prefill, not nudge

Post-tool empty-response nudge fired before the prefill branch for thinking models that emit reasoning via structured API fields (OpenRouter reasoning / reasoning_details, e.g. qwen3-vl-8b-thinking). The nudge guard only checked _has_inline_thinking (<think> tags in content), so every tool-using turn on these models hit the nudge path: one wasted LLM round-trip (~3-5s, ~400 tokens) and a spurious warning, before self-recovering.

Hoist the _has_structured computation above the nudge guard and widen the guard from 'not _has_inline_thinking' to 'not _has_structured'. Nudge and prefill are now disjoint on _has_structured; the empty-retry branch's existing _prefill_exhausted guard already handles always-reasoning models falling through after prefill.

Closes #34655. Reported by @sawtdakhili.
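The routing change can be sketched as a small decision function. This is a minimal illustration, not the code in `agent/conversation_loop.py`: `route_empty_response`, its keyword parameters, the `AssistantMessage` stand-in and the `"empty_retry"` fallback label are hypothetical; the `max_prefill=2` default mirrors the `_thinking_prefill_retries < 2` check in the diff.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AssistantMessage:
    # Hypothetical stand-in for the API message object; only the fields
    # the guard inspects are modeled.
    content: str = ""
    reasoning: Optional[str] = None
    reasoning_content: Optional[str] = None
    reasoning_details: Optional[list] = None


def has_structured_reasoning(msg: Any, has_inline_thinking: bool) -> bool:
    # Same definition as _has_structured: any structured reasoning field,
    # or inline <think> tags already detected in content.
    return bool(
        getattr(msg, "reasoning", None)
        or getattr(msg, "reasoning_content", None)
        or getattr(msg, "reasoning_details", None)
        or has_inline_thinking
    )


def route_empty_response(
    msg: Any,
    *,
    prior_was_tool: bool,
    already_nudged: bool,
    has_inline_thinking: bool,
    prefill_retries: int,
    max_prefill: int = 2,
) -> str:
    # Computed once, before the nudge guard, so both branches share it.
    structured = has_structured_reasoning(msg, has_inline_thinking)
    # Nudge only genuine empties; this keeps nudge and prefill disjoint.
    if prior_was_tool and not already_nudged and not structured:
        return "nudge"
    if structured and prefill_retries < max_prefill:
        return "prefill"
    # Hypothetical label for whatever handles the remaining cases.
    return "empty_retry"
```

For a qwen3-vl-8b-thinking style reply (empty content, populated `reasoning_details`) after a tool call, this returns `"prefill"`, whereas the old `not _has_inline_thinking` guard would have chosen the nudge.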
fix(gateway): only fire planned-stop watcher for self-targeting markers + fix Windows consume (#34749)
2026-06-23 18:33:19 +08:00 · 2026-05-29 12:23:21 -07:00 · 2026-05-29 17:36:58 +00:00 · 2026-05-29 22:26:24 +05:30
33 changed files with 2146 additions and 109 deletions
--- a/.github/workflows/contributor-check.yml
+++ b/.github/workflows/contributor-check.yml
@@ -3,11 +3,9 @@ name: Contributor Attribution Check
 on:
  pull_request:
    branches: [main]
-    paths:
-      # Only run when code files change (not docs-only PRs)
-      - '*.py'
-      - '**/*.py'
-      - '.github/workflows/contributor-check.yml'
+  # No paths filter — the job must always run so the required check
+  # reports a status (path-gated workflows leave checks "pending" forever
+  # when no matching files change, which blocks merge).

 permissions:
  contents: read
@@ -20,7 +18,21 @@ jobs:
        with:
          fetch-depth: 0  # Full history needed for git log

+      - name: Check if relevant files changed
+        id: filter
+        run: |
+          BASE="${{ github.event.pull_request.base.sha }}"
+          HEAD="${{ github.event.pull_request.head.sha }}"
+          CHANGED=$(git diff --name-only "$BASE"..."$HEAD" -- '*.py' '**/*.py' '.github/workflows/contributor-check.yml' || true)
+          if [ -n "$CHANGED" ]; then
+            echo "run=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "run=false" >> "$GITHUB_OUTPUT"
+            echo "No Python files changed, skipping attribution check."
+          fi
+
      - name: Check for unmapped contributor emails
+        if: steps.filter.outputs.run == 'true'
        run: |
          # Get the merge base between this PR and main
          MERGE_BASE=$(git merge-base origin/main HEAD)
--- a/.github/workflows/supply-chain-audit.yml
+++ b/.github/workflows/supply-chain-audit.yml
@@ -3,15 +3,9 @@ name: Supply Chain Audit
 on:
  pull_request:
    types: [opened, synchronize, reopened]
-    paths:
-      - '**/*.py'
-      - '**/*.pth'
-      - '**/setup.py'
-      - '**/setup.cfg'
-      - '**/sitecustomize.py'
-      - '**/usercustomize.py'
-      - '**/__init__.pth'
-      - 'pyproject.toml'
+  # No paths filter — the jobs must always run so required checks
+  # report a status (path-gated workflows leave checks "pending" forever
+  # when no matching files change, which blocks merge).

 permissions:
  pull-requests: write
@@ -27,8 +21,44 @@ permissions:
 # advisory-only workflow instead.

 jobs:
+  # ── Path filter (shared by both scan and dep-bounds) ───────────────
+  changes:
+    runs-on: ubuntu-latest
+    outputs:
+      # True when any file the scanner cares about changed in this PR
+      scan: ${{ steps.filter.outputs.scan }}
+      # True when pyproject.toml changed in this PR
+      deps: ${{ steps.filter.outputs.deps }}
+    steps:
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
+        with:
+          fetch-depth: 0
+      - name: Check for relevant file changes
+        id: filter
+        run: |
+          BASE="${{ github.event.pull_request.base.sha }}"
+          HEAD="${{ github.event.pull_request.head.sha }}"
+          SCAN_FILES=$(git diff --name-only "$BASE"..."$HEAD" -- \
+            '*.py' '**/*.py' '*.pth' '**/*.pth' \
+            'setup.py' 'setup.cfg' \
+            'sitecustomize.py' 'usercustomize.py' '__init__.pth' \
+            'pyproject.toml' || true)
+          if [ -n "$SCAN_FILES" ]; then
+            echo "scan=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "scan=false" >> "$GITHUB_OUTPUT"
+          fi
+          DEPS_FILES=$(git diff --name-only "$BASE"..."$HEAD" -- 'pyproject.toml' || true)
+          if [ -n "$DEPS_FILES" ]; then
+            echo "deps=true" >> "$GITHUB_OUTPUT"
+          else
+            echo "deps=false" >> "$GITHUB_OUTPUT"
+          fi
+
  scan:
    name: Scan PR for critical supply chain risks
+    needs: changes
+    if: needs.changes.outputs.scan == 'true'
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
@@ -147,10 +177,24 @@ jobs:
          echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
          exit 1

+  # Gate: reports success when scan was skipped (no relevant files changed).
+  # This ensures the required check always gets a status.
+  scan-gate:
+    name: Scan PR for critical supply chain risks
+    needs: changes
+    # always() so the gate still reports SUCCESS even if `changes` fails/is
+    # skipped — without it, a failed dependency would leave the required
+    # check unreported (i.e. "pending"), the exact failure mode this fixes.
+    if: always() && needs.changes.outputs.scan != 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - run: echo "No supply-chain-relevant files changed, skipping scan."
+
  dep-bounds:
    name: Check PyPI dependency upper bounds
+    needs: changes
+    if: needs.changes.outputs.deps == 'true'
    runs-on: ubuntu-latest
-    if: contains(github.event.pull_request.changed_files_url, 'pyproject.toml') || true
    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd  # v6.0.2
@@ -211,3 +255,16 @@ jobs:
        run: |
          echo "::error::PyPI dependencies without upper bounds detected. Add <next_major ceiling per CONTRIBUTING.md policy."
          exit 1
+
+  # Gate: reports success when dep-bounds was skipped (no pyproject.toml changed).
+  # This ensures the required check always gets a status.
+  dep-bounds-gate:
+    name: Check PyPI dependency upper bounds
+    needs: changes
+    # always() so the gate still reports SUCCESS even if `changes` fails/is
+    # skipped — without it, a failed dependency would leave the required
+    # check unreported (i.e. "pending"), the exact failure mode this fixes.
+    if: always() && needs.changes.outputs.deps != 'true'
+    runs-on: ubuntu-latest
+    steps:
+      - run: echo "No pyproject.toml changes, skipping dependency bounds check."
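The `changes` job and its gates can be modeled in Python to check which job runs under each outcome. This is a sketch under stated assumptions: `fnmatch.fnmatchcase` only approximates git pathspec matching (in both, `*` can cross `/`, which is why `'*.py'` already covers nested files), and `filter_outputs` / `reporting_job` are hypothetical helpers, not part of the workflow.

```python
import fnmatch
from typing import Dict, Iterable

# Pathspecs copied from the `changes` job; fnmatch approximates git's matching.
SCAN_PATTERNS = (
    "*.py", "**/*.py", "*.pth", "**/*.pth",
    "setup.py", "setup.cfg",
    "sitecustomize.py", "usercustomize.py", "__init__.pth",
    "pyproject.toml",
)
DEPS_PATTERNS = ("pyproject.toml",)


def _any_match(paths: Iterable[str], patterns: Iterable[str]) -> bool:
    pats = tuple(patterns)
    return any(fnmatch.fnmatchcase(p, pat) for p in paths for pat in pats)


def filter_outputs(changed_paths: Iterable[str]) -> Dict[str, str]:
    paths = list(changed_paths)
    # GitHub step outputs are strings, hence "true"/"false".
    return {
        "scan": "true" if _any_match(paths, SCAN_PATTERNS) else "false",
        "deps": "true" if _any_match(paths, DEPS_PATTERNS) else "false",
    }


def reporting_job(output: str) -> str:
    # `scan` runs when output == 'true'; `scan-gate` runs on
    # always() && output != 'true', so exactly one of the two runs.
    # A failed `changes` job leaves the output empty, which routes to the gate.
    return "scan" if output == "true" else "scan-gate"
```

The same shape applies to `dep-bounds` / `dep-bounds-gate` keyed on the `deps` output.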
--- a/agent/conversation_loop.py
+++ b/agent/conversation_loop.py
@@ -3981,10 +3981,25 @@ def run_conversation(
                            re.IGNORECASE,
                        )
                    )
+                    # Detect structured reasoning emitted via API fields
+                    # (OpenRouter `reasoning` / `reasoning_details`, or the
+                    # streaming-accumulated `reasoning_content`).  Thinking
+                    # models like qwen3-vl-8b-thinking return reasoning here
+                    # with empty content after tool calls — that's the model
+                    # still working, not a genuine empty response.  Compute
+                    # this BEFORE the nudge guard so those turns route to the
+                    # prefill branch below instead of wasting an LLM round-trip
+                    # on a nudge.
+                    _has_structured = bool(
+                        getattr(assistant_message, "reasoning", None)
+                        or getattr(assistant_message, "reasoning_content", None)
+                        or getattr(assistant_message, "reasoning_details", None)
+                        or _has_inline_thinking
+                    )
                    if (
                        _prior_was_tool
                        and not getattr(agent, "_post_tool_empty_retried", False)
-                        and not _has_inline_thinking  # thinking model still working — let prefill handle
+                        and not _has_structured  # thinking model still working — let prefill handle
                    ):
                        agent._post_tool_empty_retried = True
                        # Clear stale narration so it doesn't resurface
@@ -4028,12 +4043,8 @@ def run_conversation(
                    # Inspired by clawdbot's "incomplete-text" recovery.
                    # Also covers Qwen3/Ollama in-content <think> blocks
                    # (detected above as _has_inline_thinking).
-                    _has_structured = bool(
-                        getattr(assistant_message, "reasoning", None)
-                        or getattr(assistant_message, "reasoning_content", None)
-                        or getattr(assistant_message, "reasoning_details", None)
-                        or _has_inline_thinking
-                    )
+                    # _has_structured was computed above the nudge guard so
+                    # both branches share the same definition.
                    if _has_structured and agent._thinking_prefill_retries < 2:
                        agent._thinking_prefill_retries += 1
                        logger.info(
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -18442,7 +18442,10 @@ def _run_planned_stop_watcher(
        poll_interval: seconds between marker checks. 0.5s gives a
            responsive shutdown without burning CPU.
    """
-    from gateway.status import _get_planned_stop_marker_path
+    from gateway.status import (
+        _get_planned_stop_marker_path,
+        planned_stop_marker_targets_self,
+    )
    marker_path = _get_planned_stop_marker_path()
    while not stop_event.is_set():
        try:
@@ -18451,6 +18454,26 @@ def _run_planned_stop_watcher(
                and not getattr(runner, "_draining", False)
                and getattr(runner, "_running", False)
            ):
+                # A marker existing is NOT sufficient — it may have been
+                # written for a PREVIOUS gateway instance (different PID)
+                # and left behind because that process exited before the
+                # CLI's stop() could clean it up. Firing the handler on a
+                # stale/foreign marker drives the gateway into shutdown,
+                # then consume_planned_stop_marker_for_self() correctly
+                # reports a PID mismatch — but by then we're already
+                # stopping, so it's logged as an unexpected "UNKNOWN" exit
+                # and the watchdog crash-loops the gateway (issue #34597,
+                # a regression from PR #33798 which added this watcher
+                # without the PID check).
+                #
+                # Only fire when the marker actually targets us. The probe
+                # is non-destructive on a match (the handler does the
+                # authoritative consume on the loop thread) and self-heals
+                # by unlinking stale/malformed markers so they cannot wedge
+                # a freshly booted gateway.
+                if not planned_stop_marker_targets_self():
+                    stop_event.wait(poll_interval)
+                    continue
                # Drive the same path as a real signal handler.
                # Pass signal=None — the handler tolerates that and consumes
                # the marker via consume_planned_stop_marker_for_self,
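The watcher's new control flow reduces to a polling loop with an injected probe. A minimal sketch, assuming the callables `marker_exists`, `targets_self` and `fire` stand in for the marker-path check, `planned_stop_marker_targets_self()` and the signal-handler call (all parameter names are hypothetical); the `_draining` / `_running` checks are omitted, and returning right after `fire()` is an assumption.

```python
import threading
from typing import Callable


def run_planned_stop_watcher(
    stop_event: threading.Event,
    marker_exists: Callable[[], bool],
    targets_self: Callable[[], bool],
    fire: Callable[[], None],
    poll_interval: float = 0.5,
) -> None:
    while not stop_event.is_set():
        if marker_exists():
            if not targets_self():
                # Foreign or stale marker: keep polling instead of shutting down.
                stop_event.wait(poll_interval)
                continue
            # Marker names this process: hand off to the shutdown path.
            fire()
            return
        stop_event.wait(poll_interval)
```

A marker that never targets this process therefore only costs one probe per poll interval, instead of driving the gateway into an "UNKNOWN" shutdown.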
--- a/gateway/status.py
+++ b/gateway/status.py
@@ -816,12 +816,24 @@ def _consume_pid_marker_for_self(

    our_pid = os.getpid()
    our_start_time = _get_process_start_time(our_pid)
-    matches = (
-        target_pid == our_pid
-        and target_start_time is not None
-        and our_start_time is not None
-        and target_start_time == our_start_time
-    )
+    # Start-time is a PID-reuse guard. It is only meaningful when both
+    # sides actually have it: ``_get_process_start_time`` returns None on
+    # platforms without ``/proc`` (macOS, native Windows — the very
+    # platform the planned-stop watcher exists for). Requiring a non-None
+    # match there would make every consume return False, so a legitimate
+    # ``hermes gateway stop`` on Windows would be misclassified as an
+    # unexpected ``UNKNOWN`` exit (exit 1) and revived by the service
+    # manager. So: when both start_times are known they must match; when
+    # either is unknown, fall back to PID equality alone (bounded by the
+    # marker's short TTL). This mirrors ``planned_stop_marker_targets_self``
+    # so the watcher's non-destructive probe and this authoritative
+    # consume agree on every platform (issue #34597).
+    if target_pid != our_pid:
+        matches = False
+    elif target_start_time is not None and our_start_time is not None:
+        matches = target_start_time == our_start_time
+    else:
+        matches = True

    try:
        path.unlink(missing_ok=True)
@@ -914,6 +926,68 @@ def consume_planned_stop_marker_for_self() -> bool:
    )


+def planned_stop_marker_targets_self() -> bool:
+    """Return True only when a live planned-stop marker names the current process.
+
+    This is a **non-destructive** probe used by the watcher thread
+    (``gateway/run.py:_run_planned_stop_watcher``) to decide whether to
+    trigger shutdown. Unlike :func:`consume_planned_stop_marker_for_self`,
+    it never unlinks a marker that matches us — the shutdown handler does
+    the authoritative consume on its own thread.
+
+    It *does* clean up markers that can never apply to this process:
+    malformed markers and markers older than the TTL are unlinked so a
+    stale file left behind by a previous gateway instance cannot wedge
+    the new one. Markers naming a different PID/start_time are left in
+    place (they may still be consumed legitimately by the process they
+    name) but report False here.
+
+    Returns False (without raising) on any read/parse error.
+    """
+    path = _get_planned_stop_marker_path()
+    record = _read_json_file(path)
+    if not record:
+        return False
+
+    try:
+        target_pid = int(record["target_pid"])
+        target_start_time = record.get("target_start_time")
+        written_at = record.get("written_at") or ""
+    except (KeyError, TypeError, ValueError):
+        # Malformed marker can never match anyone — drop it.
+        try:
+            path.unlink(missing_ok=True)
+        except OSError:
+            pass
+        return False
+
+    if _marker_is_stale(written_at, _PLANNED_STOP_MARKER_TTL_S):
+        # A marker this old is past its useful life regardless of target —
+        # clean it up so it cannot crash-loop a freshly booted gateway.
+        try:
+            path.unlink(missing_ok=True)
+        except OSError:
+            pass
+        return False
+
+    our_pid = os.getpid()
+    if target_pid != our_pid:
+        return False
+
+    # Start-time is a PID-reuse guard. It is only meaningful when both
+    # sides actually have it: ``_get_process_start_time`` returns None on
+    # platforms without ``/proc`` (macOS, native Windows — the very
+    # platform this watcher exists for). Requiring a non-None match there
+    # would make the watcher never fire and re-break the #33778 Windows
+    # session-resume path. So: when both start_times are known they must
+    # match; when either is unknown, fall back to PID equality alone
+    # (the marker is short-lived under a 60s TTL, bounding reuse risk).
+    our_start_time = _get_process_start_time(our_pid)
+    if target_start_time is not None and our_start_time is not None:
+        return target_start_time == our_start_time
+    return True
+
+
 def clear_planned_stop_marker() -> None:
    """Remove the planned-stop marker unconditionally."""
    try:
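`_consume_pid_marker_for_self` and `planned_stop_marker_targets_self` now apply the same decision. Isolated as pure functions (hypothetical names; the real code reads `os.getpid()` and `_get_process_start_time()` itself, and the start-time type is not shown), the old and new rules differ exactly in the macOS / native Windows case where the start time is unknown:

```python
from typing import Optional


def old_marker_matches(target_pid: int, target_start: Optional[object],
                       our_pid: int, our_start: Optional[object]) -> bool:
    # Pre-fix rule: a None start time on either side never matches.
    return (
        target_pid == our_pid
        and target_start is not None
        and our_start is not None
        and target_start == our_start
    )


def marker_matches(target_pid: int, target_start: Optional[object],
                   our_pid: int, our_start: Optional[object]) -> bool:
    # Post-fix rule shared by the non-destructive probe and the consume.
    if target_pid != our_pid:
        return False
    if target_start is not None and our_start is not None:
        # Both known: start time guards against PID reuse.
        return target_start == our_start
    # Start time unavailable (no /proc): fall back to PID equality.
    return True
```

Under the old rule a legitimate stop on Windows (`None` start times, matching PIDs) returned False; under the new rule it matches, while a known start-time mismatch still rejects a reused PID.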
--- a/hermes_cli/main.py
+++ b/hermes_cli/main.py
@@ -13390,6 +13390,11 @@ Examples:
        "--yes", "-y", action="store_true", help="Skip confirmation"
    )

+    sessions_subparsers.add_parser(
+        "optimize",
+        help="Reclaim disk space: merge FTS5 segments + VACUUM (no data change)",
+    )
+
    sessions_subparsers.add_parser("stats", help="Show session store statistics")

    sessions_rename = sessions_subparsers.add_parser(
@@ -13562,6 +13567,34 @@ Examples:
            relaunch(["--resume", selected_id])
            return  # won't reach here after execvp

+        elif action == "optimize":
+            db_path = db.db_path
+            before_mb = (
+                os.path.getsize(db_path) / (1024 * 1024)
+                if db_path.exists()
+                else 0.0
+            )
+            print("Optimizing session store (FTS merge + VACUUM)…")
+            try:
+                # vacuum() merges FTS5 segments (optimize_fts) then VACUUMs,
+                # and returns the number of indexes it merged.
+                n = db.vacuum()
+            except Exception as e:
+                print(f"Error: optimization failed: {e}")
+                db.close()
+                return
+            after_mb = (
+                os.path.getsize(db_path) / (1024 * 1024)
+                if db_path.exists()
+                else 0.0
+            )
+            saved = before_mb - after_mb
+            print(f"Optimized {n} FTS index(es).")
+            print(
+                f"Database size: {before_mb:.1f} MB -> {after_mb:.1f} MB "
+                f"(reclaimed {saved:.1f} MB)"
+            )
+
        elif action == "stats":
            total = db.session_count()
            msgs = db.message_count()
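`SessionDB.vacuum()` itself is not shown in this hunk, so the following is only a sketch of what an FTS5 segment merge plus `VACUUM` looks like with the standard-library `sqlite3` module. `optimize_store` and its `fts_tables` argument are hypothetical; `INSERT INTO ft(ft) VALUES('optimize')` is SQLite's FTS5 special command, and table names are interpolated, so they must be trusted identifiers.

```python
import os
import sqlite3
from typing import List, Tuple


def optimize_store(db_path: str, fts_tables: List[str]) -> Tuple[int, float, float]:
    def size_mb() -> float:
        return os.path.getsize(db_path) / (1024 * 1024) if os.path.exists(db_path) else 0.0

    before = size_mb()
    # Autocommit mode: otherwise the INSERT opens an implicit transaction
    # and SQLite refuses to VACUUM inside it.
    con = sqlite3.connect(db_path, isolation_level=None)
    try:
        for table in fts_tables:
            # Merge all FTS5 index segments into one.
            con.execute(f'INSERT INTO "{table}"("{table}") VALUES (\'optimize\')')
        # Rebuild the file, returning freed pages to the filesystem.
        con.execute("VACUUM")
    finally:
        con.close()
    return len(fts_tables), before, size_mb()
```

Neither step changes row data, which matches the "no data change" promise in the `optimize` subcommand's help text.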
--- a/hermes_cli/models.py
+++ b/hermes_cli/models.py
@@ -49,7 +49,7 @@ OPENROUTER_MODELS: list[tuple[str, str]] = [
    ("xiaomi/mimo-v2.5-pro",                   ""),
    ("tencent/hy3-preview",                    ""),
    ("google/gemini-3-pro-image-preview",      ""),
-    ("google/gemini-3-flash-preview",          ""),
+    ("google/gemini-3.5-flash",                ""),
    ("google/gemini-3.1-pro-preview",          ""),
    ("google/gemini-3.1-flash-lite-preview",   ""),
    ("qwen/qwen3.6-35b-a3b",                   ""),
@@ -156,7 +156,7 @@ _PROVIDER_MODELS: dict[str, list[str]] = {
        "xiaomi/mimo-v2.5-pro",
        "tencent/hy3-preview",
        "google/gemini-3-pro-preview",
-        "google/gemini-3-flash-preview",
+        "google/gemini-3.5-flash",
        "google/gemini-3.1-pro-preview",
        "google/gemini-3.1-flash-lite-preview",
        "qwen/qwen3.6-35b-a3b",
--- a/hermes_cli/nous_subscription.py
+++ b/hermes_cli/nous_subscription.py
@@ -71,12 +71,16 @@ class NousSubscriptionFeatures:
    def browser(self) -> NousFeatureState:
        return self.features["browser"]

+    @property
+    def video_gen(self) -> NousFeatureState:
+        return self.features["video_gen"]
+
    @property
    def modal(self) -> NousFeatureState:
        return self.features["modal"]

    def items(self) -> Iterable[NousFeatureState]:
-        ordered = ("web", "image_gen", "tts", "browser", "modal")
+        ordered = ("web", "image_gen", "video_gen", "tts", "browser", "modal")
        for key in ordered:
            yield self.features[key]

@@ -255,6 +259,7 @@ def get_nous_subscription_features(

    web_tool_enabled = _toolset_enabled(config, "web")
    image_tool_enabled = _toolset_enabled(config, "image_gen")
+    video_tool_enabled = _toolset_enabled(config, "video_gen")
    tts_tool_enabled = _toolset_enabled(config, "tts")
    browser_tool_enabled = _toolset_enabled(config, "browser")
    modal_tool_enabled = _toolset_enabled(config, "terminal")
@@ -289,6 +294,8 @@ def get_nous_subscription_features(
    browser_use_gateway = _uses_gateway(browser_cfg)
    image_gen_cfg = config.get("image_gen") if isinstance(config.get("image_gen"), dict) else {}
    image_use_gateway = _uses_gateway(image_gen_cfg)
+    video_gen_cfg = config.get("video_gen") if isinstance(config.get("video_gen"), dict) else {}
+    video_use_gateway = _uses_gateway(video_gen_cfg)

    direct_exa = bool(get_env_value("EXA_API_KEY"))
    direct_firecrawl = bool(get_env_value("FIRECRAWL_API_KEY") or get_env_value("FIRECRAWL_API_URL"))
@@ -296,6 +303,7 @@ def get_nous_subscription_features(
    direct_tavily = bool(get_env_value("TAVILY_API_KEY"))
    direct_searxng = bool(get_env_value("SEARXNG_URL"))
    direct_fal = fal_key_is_configured()
+    direct_fal_video = direct_fal  # same FAL_KEY; separate var so use_gateway is independent
    direct_openai_tts = bool(resolve_openai_audio_api_key())
    direct_elevenlabs = bool(get_env_value("ELEVENLABS_API_KEY"))
    direct_camofox = bool(get_env_value("CAMOFOX_URL"))
@@ -311,6 +319,8 @@ def get_nous_subscription_features(
        direct_tavily = False
    if image_use_gateway:
        direct_fal = False
+    if video_use_gateway:
+        direct_fal_video = False
    if tts_use_gateway:
        direct_openai_tts = False
        direct_elevenlabs = False
@@ -320,6 +330,8 @@ def get_nous_subscription_features(

    managed_web_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("firecrawl")
    managed_image_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("fal-queue")
+    # Video gen uses the same fal-queue gateway as image gen.
+    managed_video_available = managed_image_available
    managed_tts_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("openai-audio")
    managed_browser_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("browser-use")
    managed_modal_available = managed_tools_flag and nous_auth_present and is_managed_tool_gateway_ready("modal")
@@ -357,6 +369,10 @@ def get_nous_subscription_features(
    image_active = bool(image_tool_enabled and (image_managed or direct_fal))
    image_available = bool(managed_image_available or direct_fal)

+    video_managed = video_tool_enabled and managed_video_available and not direct_fal_video
+    video_active = bool(video_tool_enabled and (video_managed or direct_fal_video))
+    video_available = bool(managed_video_available or direct_fal_video)
+
    tts_current_provider = tts_provider or "edge"
    tts_managed = (
        tts_tool_enabled
@@ -451,6 +467,18 @@ def get_nous_subscription_features(
            current_provider="FAL" if direct_fal else ("Nous Subscription" if image_managed else ""),
            explicit_configured=direct_fal,
        ),
+        "video_gen": NousFeatureState(
+            key="video_gen",
+            label="Video generation",
+            included_by_default=False,
+            available=video_available,
+            active=video_active,
+            managed_by_nous=video_managed,
+            direct_override=video_active and not video_managed,
+            toolset_enabled=video_tool_enabled,
+            current_provider="FAL" if direct_fal_video else ("Nous Subscription" if video_managed else ""),
+            explicit_configured=direct_fal_video,
+        ),
        "tts": NousFeatureState(
            key="tts",
            label="OpenAI TTS",
@@ -561,6 +589,9 @@ def apply_nous_managed_defaults(
    if "image_gen" in selected_toolsets and not fal_key_is_configured():
        changed.add("image_gen")

+    if "video_gen" in selected_toolsets and not fal_key_is_configured():
+        changed.add("video_gen")
+
    return changed


@@ -571,6 +602,7 @@ def apply_nous_managed_defaults(
 _GATEWAY_TOOL_LABELS = {
    "web": "Web search & extract (Firecrawl)",
    "image_gen": "Image generation (FAL)",
+    "video_gen": "Video generation (FAL)",
    "tts": "Text-to-speech (OpenAI TTS)",
    "browser": "Browser automation (Browser Use)",
 }
@@ -578,6 +610,7 @@ _GATEWAY_TOOL_LABELS = {

 def _get_gateway_direct_credentials() -> Dict[str, bool]:
    """Return a dict of tool_key -> has_direct_credentials."""
+    fal_direct = fal_key_is_configured()
    return {
        "web": bool(
            get_env_value("FIRECRAWL_API_KEY")
@@ -586,7 +619,8 @@ def _get_gateway_direct_credentials() -> Dict[str, bool]:
            or get_env_value("TAVILY_API_KEY")
            or get_env_value("EXA_API_KEY")
        ),
-        "image_gen": fal_key_is_configured(),
+        "image_gen": fal_direct,
+        "video_gen": fal_direct,
        "tts": bool(
            resolve_openai_audio_api_key()
            or get_env_value("ELEVENLABS_API_KEY")
@@ -601,11 +635,12 @@ def _get_gateway_direct_credentials() -> Dict[str, bool]:
 _GATEWAY_DIRECT_LABELS = {
    "web": "Firecrawl/Exa/Parallel/Tavily key",
    "image_gen": "FAL key",
+    "video_gen": "FAL key",
    "tts": "OpenAI/ElevenLabs key",
    "browser": "Browser Use/Browserbase key",
 }

-_ALL_GATEWAY_KEYS = ("web", "image_gen", "tts", "browser")
+_ALL_GATEWAY_KEYS = ("web", "image_gen", "video_gen", "tts", "browser")


 def get_gateway_eligible_tools(
@@ -646,6 +681,7 @@ def get_gateway_eligible_tools(
    opted_in = {
        "web": _uses_gateway(config.get("web")),
        "image_gen": _uses_gateway(config.get("image_gen")),
+        "video_gen": _uses_gateway(config.get("video_gen")),
        "tts": _uses_gateway(config.get("tts")),
        "browser": _uses_gateway(config.get("browser")),
    }
@@ -714,6 +750,15 @@ def apply_gateway_defaults(
        image_cfg["use_gateway"] = True
        changed.add("image_gen")

+    if "video_gen" in tool_keys:
+        video_cfg = config.get("video_gen")
+        if not isinstance(video_cfg, dict):
+            video_cfg = {}
+            config["video_gen"] = video_cfg
+        video_cfg["provider"] = "fal"
+        video_cfg["use_gateway"] = True
+        changed.add("video_gen")
+
    return changed


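The new `video_gen` feature state follows the image_gen derivation one-for-one. Reduced to booleans (a hypothetical `video_gen_state` helper; `gateway_ready` stands for `managed_video_available`, which reuses the `fal-queue` readiness check), the precedence is: a direct FAL key wins unless `video_gen.use_gateway` is set.

```python
def video_gen_state(
    tool_enabled: bool,
    fal_key_configured: bool,
    use_gateway: bool,
    gateway_ready: bool,
) -> dict:
    # A FAL key counts as a direct override unless use_gateway is set.
    direct = fal_key_configured and not use_gateway
    # Mirrors video_managed / video_active / video_available in the diff.
    managed = tool_enabled and gateway_ready and not direct
    active = tool_enabled and (managed or direct)
    available = gateway_ready or direct
    provider = "FAL" if direct else ("Nous Subscription" if managed else "")
    return {"managed": managed, "active": active, "available": available, "provider": provider}
```

Keeping `direct_fal_video` separate from `direct_fal` is what lets `video_gen.use_gateway` flip this result without affecting image generation.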
--- a/hermes_cli/setup.py
+++ b/hermes_cli/setup.py
@@ -454,22 +454,25 @@ def _print_setup_summary(config: dict, hermes_home):
    # Video generation — opt-in via `hermes tools` → Video Generation.
    # Only show the row when a plugin reports available so we don't badger
    # users who don't care about video gen with a "missing" status line.
-    try:
-        from agent.video_gen_registry import list_providers as _list_video_providers
-        from hermes_cli.plugins import _ensure_plugins_discovered as _ensure_plugins
-        _ensure_plugins()
-        _video_backend = None
-        for _vp in _list_video_providers():
-            try:
-                if _vp.is_available():
-                    _video_backend = _vp.display_name
-                    break
-            except Exception:
-                continue
-    except Exception:
-        _video_backend = None
-    if _video_backend:
-        tool_status.append((f"Video Generation ({_video_backend})", True, None))
+    if subscription_features.video_gen.managed_by_nous:
+        tool_status.append(("Video Generation (FAL via Nous subscription)", True, None))
+    else:
+        try:
+            from agent.video_gen_registry import list_providers as _list_video_providers
+            from hermes_cli.plugins import _ensure_plugins_discovered as _ensure_plugins
+            _ensure_plugins()
+            _video_backend = None
+            for _vp in _list_video_providers():
+                try:
+                    if _vp.is_available():
+                        _video_backend = _vp.display_name
+                        break
+                except Exception:
+                    continue
+        except Exception:
+            _video_backend = None
+        if _video_backend:
+            tool_status.append((f"Video Generation ({_video_backend})", True, None))

    # TTS — show configured provider
    tts_provider = cfg_get(config, "tts", "provider", default="edge")
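The setup summary now short-circuits on the managed subscription before scanning plugins. A hypothetical pure-function version (`video_status_row`; providers are any objects exposing `is_available()` and `display_name`, as the registry entries in the hunk do) makes the precedence easy to check:

```python
from typing import Iterable, Optional, Tuple

Row = Tuple[str, bool, Optional[str]]


def video_status_row(managed_by_nous: bool, providers: Iterable) -> Optional[Row]:
    if managed_by_nous:
        return ("Video Generation (FAL via Nous subscription)", True, None)
    for provider in providers:
        try:
            if provider.is_available():
                return (f"Video Generation ({provider.display_name})", True, None)
        except Exception:
            # A provider whose availability probe raises is skipped, not fatal.
            continue
    # No row at all when nothing is available, so users are not nagged.
    return None
```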
--- a/hermes_cli/tools_config.py
+++ b/hermes_cli/tools_config.py
@@ -339,11 +339,26 @@ TOOL_CATEGORIES = {
    "video_gen": {
        "name": "Video Generation",
        "icon": "🎬",
-        # Providers list is intentionally empty — every video gen backend
-        # is a plugin, surfaced by ``_plugin_video_gen_providers()`` and
-        # injected by ``_visible_providers``. Mirrors the design we'll
-        # converge image_gen toward.
-        "providers": [],
+        # "Nous Subscription" row mirrors the image_gen pattern — managed
+        # FAL video generation billed via the Nous Portal.  Plugin-backed
+        # provider rows (FAL BYOK, xAI, …) are injected at runtime by
+        # ``_plugin_video_gen_providers()`` in ``_visible_providers``.
+        "providers": [
+            {
+                "name": "Nous Subscription",
+                "badge": "subscription",
+                "tag": "Managed FAL video generation billed to your subscription",
+                "env_vars": [],
+                "requires_nous_auth": True,
+                "managed_nous_feature": "video_gen",
+                "override_env_vars": ["FAL_KEY"],
+                # The underlying plugin backend — when the user picks
+                # "Nous Subscription" we set video_gen.provider = "fal"
+                # and video_gen.use_gateway = True so the FAL plugin
+                # routes through the managed queue gateway.
+                "video_gen_plugin_name": "fal",
+            },
+        ],
    },
    "x_search": {
        "name": "X (Twitter) Search",
@@ -1438,7 +1453,7 @@ def _toolset_has_keys(
        except Exception:
            return False

-    if ts_key in {"web", "image_gen", "tts", "browser"}:
+    if ts_key in {"web", "image_gen", "video_gen", "tts", "browser"}:
        features = get_nous_subscription_features(config, force_fresh=force_fresh)
        feature = features.features.get(ts_key)
        if feature and (feature.available or feature.managed_by_nous):
@@ -2153,7 +2168,7 @@ def _is_provider_active(
        return isinstance(image_cfg, dict) and image_cfg.get("provider") == plugin_name

    video_plugin_name = provider.get("video_gen_plugin_name")
-    if video_plugin_name:
+    if video_plugin_name and not provider.get("managed_nous_feature"):
        video_cfg = config.get("video_gen", {})
        return isinstance(video_cfg, dict) and video_cfg.get("provider") == video_plugin_name

@@ -2172,6 +2187,15 @@ def _is_provider_active(
                if image_cfg.get("use_gateway") is not None and not is_truthy_value(image_cfg.get("use_gateway"), default=False):
                    return False
            return feature.managed_by_nous
+        if managed_feature == "video_gen":
+            video_cfg = config.get("video_gen", {})
+            if isinstance(video_cfg, dict):
+                configured_provider = video_cfg.get("provider")
+                if configured_provider not in {None, "", "fal"}:
+                    return False
+                if video_cfg.get("use_gateway") is not None and not is_truthy_value(video_cfg.get("use_gateway"), default=False):
+                    return False
+            return feature.managed_by_nous
        if provider.get("tts_provider"):
            return (
                feature.managed_by_nous
@@ -2505,14 +2529,14 @@ def _configure_videogen_model_for_plugin(plugin_name: str, config: dict) -> None
    _print_success(f"  Model set to: {chosen}")


-def _select_plugin_video_gen_provider(plugin_name: str, config: dict) -> None:
+def _select_plugin_video_gen_provider(plugin_name: str, config: dict, *, use_gateway: bool = False) -> None:
    """Persist a plugin-backed video generation provider selection."""
    vid_cfg = config.setdefault("video_gen", {})
    if not isinstance(vid_cfg, dict):
        vid_cfg = {}
        config["video_gen"] = vid_cfg
    vid_cfg["provider"] = plugin_name
-    vid_cfg["use_gateway"] = False
+    vid_cfg["use_gateway"] = use_gateway
    _print_success(f"  video_gen.provider set to: {plugin_name}")
    _configure_videogen_model_for_plugin(plugin_name, config)

@@ -2597,7 +2621,7 @@ def _configure_provider(
        # registry.
        video_plugin = provider.get("video_gen_plugin_name")
        if video_plugin:
-            _select_plugin_video_gen_provider(video_plugin, config)
+            _select_plugin_video_gen_provider(video_plugin, config, use_gateway=bool(managed_feature))
            return
        # Imagegen backends prompt for model selection after backend pick.
        backend = provider.get("imagegen_backend")
@@ -2676,7 +2700,7 @@ def _configure_provider(
            return
        video_plugin = provider.get("video_gen_plugin_name")
        if video_plugin:
-            _select_plugin_video_gen_provider(video_plugin, config)
+            _select_plugin_video_gen_provider(video_plugin, config, use_gateway=bool(managed_feature))
            return
        # Imagegen backends prompt for model selection after env vars are in.
        backend = provider.get("imagegen_backend")
@@ -2957,7 +2981,7 @@ def _reconfigure_provider(
        # Plugin-registered video_gen provider — same flow, different registry.
        video_plugin = provider.get("video_gen_plugin_name")
        if video_plugin:
-            _select_plugin_video_gen_provider(video_plugin, config)
+            _select_plugin_video_gen_provider(video_plugin, config, use_gateway=bool(managed_feature))
            return
        # Imagegen backends prompt for model selection on reconfig too.
        backend = provider.get("imagegen_backend")
@@ -2997,7 +3021,7 @@ def _reconfigure_provider(
    # Plugin-registered video_gen provider — same flow, different registry.
    video_plugin = provider.get("video_gen_plugin_name")
    if video_plugin:
-        _select_plugin_video_gen_provider(video_plugin, config)
+        _select_plugin_video_gen_provider(video_plugin, config, use_gateway=bool(managed_feature))
        return

    backend = provider.get("imagegen_backend")
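The managed `video_gen` check added to `_is_provider_active` reduces to a small predicate. The following is a standalone sketch of that rule, not code from this diff. `managed_video_gen_active` and `is_truthy` are hypothetical names, and `is_truthy` is a simplified stand-in for `is_truthy_value` whose set of accepted strings is an assumption.

```python
from typing import Any, Mapping


def is_truthy(value: Any) -> bool:
    """Simplified stand-in for the CLI's is_truthy_value helper (assumption)."""
    if isinstance(value, str):
        return value.strip().lower() in {"1", "true", "yes", "on"}
    return bool(value)


def managed_video_gen_active(config: Mapping[str, Any], managed_by_nous: bool) -> bool:
    """Sketch of the video_gen branch: the managed FAL gateway counts as active
    only when no other provider is configured and use_gateway is not explicitly off."""
    video_cfg = config.get("video_gen", {})
    if isinstance(video_cfg, dict):
        if video_cfg.get("provider") not in {None, "", "fal"}:
            return False  # another plugin owns video_gen
        use_gateway = video_cfg.get("use_gateway")
        if use_gateway is not None and not is_truthy(use_gateway):
            return False  # user explicitly opted out of the gateway
    # Otherwise defer to the subscription state.
    return managed_by_nous
```

In this sketch, an explicit `use_gateway: false` or a non-FAL `provider` keeps the Nous-managed entry inactive even when `managed_by_nous` is true.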
--- a/hermes_state.py
+++ b/hermes_state.py
@@ -3251,7 +3251,59 @@ class SessionDB:

    # ── Space reclamation ──

-    def vacuum(self) -> None:
+    # FTS5 virtual tables whose b-tree segments we merge on optimize. The
+    # trigram table is created lazily / may be disabled, so we probe before
+    # touching it (see optimize_fts).
+    _FTS_TABLES = ("messages_fts", "messages_fts_trigram")
+
+    def _fts_table_exists(self, name: str) -> bool:
+        """True if an FTS5 virtual table is queryable in this DB."""
+        try:
+            self._conn.execute(f"SELECT 1 FROM {name} LIMIT 0")
+            return True
+        except sqlite3.OperationalError:
+            return False
+
+    def optimize_fts(self) -> int:
+        """Merge fragmented FTS5 b-tree segments into one per index.
+
+        FTS5 indexes grow as a series of incremental segments — one per
+        ``INSERT`` batch driven by the message triggers. Over tens of
+        thousands of messages these segments accumulate, which both bloats
+        the ``*_data`` shadow tables and slows ``MATCH`` queries that must
+        scan every segment. The special ``'optimize'`` command rewrites each
+        index as a single merged segment.
+
+        This is purely a maintenance operation — it changes neither search
+        results nor ``snippet()`` output, only on-disk layout and query
+        speed. It is complementary to VACUUM: ``optimize`` compacts the FTS
+        index internally, then VACUUM returns the freed pages to the OS.
+
+        Skips any FTS table that does not exist (e.g. the trigram index when
+        disabled via ``HERMES_DISABLE_FTS_TRIGRAM`` or not yet created), so
+        it is safe to call unconditionally.
+
+        Returns the number of FTS indexes that were optimized.
+        """
+        optimized = 0
+        with self._lock:
+            for tbl in self._FTS_TABLES:
+                if not self._fts_table_exists(tbl):
+                    continue
+                try:
+                    # The column name in the INSERT must match the table name
+                    # for FTS5 special commands.
+                    self._conn.execute(
+                        f"INSERT INTO {tbl}({tbl}) VALUES('optimize')"
+                    )
+                    optimized += 1
+                except sqlite3.OperationalError as exc:
+                    logger.warning(
+                        "FTS optimize failed for %s: %s", tbl, exc
+                    )
+        return optimized
+
+    def vacuum(self) -> int:
        """Run VACUUM to reclaim disk space after large deletes.

        SQLite does not shrink the database file when rows are deleted —
@@ -3264,7 +3316,21 @@ class SessionDB:
        exclusive lock, so callers must ensure no other writers are
        active. Safe to call at startup before the gateway/CLI starts
        serving traffic.
+
+        FTS5 segments are merged first via :meth:`optimize_fts` so the
+        subsequent VACUUM reclaims the pages freed by the merge. This is a
+        layout-only optimization — search results are unchanged.
+
+        Returns the number of FTS indexes that were optimized (0 if the
+        merge step failed or no FTS tables exist).
        """
+        # Merge FTS5 segments before VACUUM so the freed pages are returned
+        # to the OS in the same pass. optimize_fts() manages its own lock.
+        optimized = 0
+        try:
+            optimized = self.optimize_fts()
+        except Exception as exc:
+            logger.warning("FTS optimize before VACUUM failed: %s", exc)
        # VACUUM cannot be executed inside a transaction.
        with self._lock:
            # Best-effort WAL checkpoint first, then VACUUM.
@@ -3273,6 +3339,7 @@ class SessionDB:
            except Exception:
                pass
            self._conn.execute("VACUUM")
+        return optimized

    def maybe_auto_prune_and_vacuum(
        self,
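The FTS5 `optimize` step can be tried outside `SessionDB` using only the standard library. This is a minimal sketch, assuming Python's bundled SQLite was compiled with FTS5 (true for most builds, but not guaranteed). The helper names mirror the diff, but they are standalone functions, not the `SessionDB` methods.

```python
import sqlite3
from typing import Iterable


def fts_table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Probe a table by selecting zero rows; a missing table raises OperationalError."""
    try:
        conn.execute(f"SELECT 1 FROM {name} LIMIT 0")
        return True
    except sqlite3.OperationalError:
        return False


def optimize_fts(conn: sqlite3.Connection, tables: Iterable[str]) -> int:
    """Run the FTS5 'optimize' command on each existing table; return how many ran."""
    optimized = 0
    for tbl in tables:
        if not fts_table_exists(conn, tbl):
            continue  # e.g. a trigram index that was never created
        # FTS5 special commands are INSERTs into a column named after the table.
        conn.execute(f"INSERT INTO {tbl}({tbl}) VALUES('optimize')")
        optimized += 1
    return optimized


if __name__ == "__main__":
    # isolation_level=None means autocommit, so the later VACUUM is not inside a transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(content)")
    for i in range(50):
        # Each autocommitted INSERT is its own transaction, so FTS5 writes a new
        # small segment (subject to its automerge setting).
        conn.execute("INSERT INTO messages_fts(content) VALUES (?)", (f"message {i}",))
    print(optimize_fts(conn, ("messages_fts", "messages_fts_trigram")))  # 1
    conn.execute("VACUUM")
```

The sketch uses autocommit deliberately. Under Python's default transaction handling, `sqlite3` implicitly opens a transaction before an `INSERT`, and SQLite refuses to run `VACUUM` inside an open transaction, so any caller that chains `optimize` and `VACUUM` on one connection must commit (or run in autocommit mode) between the two.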
--- a/nix/packages.nix
+++ b/nix/packages.nix
@@ -43,7 +43,6 @@
            "modal"
            "parallel-web"
            "tts-premium"
-            "vercel"
            "voice"
          ] ++ lib.optionals pkgs.stdenv.isLinux [ "matrix" ];
        };
--- a/optional-skills/autonomous-ai-agents/antigravity-cli/SKILL.md
+++ b/optional-skills/autonomous-ai-agents/antigravity-cli/SKILL.md
@@ -0,0 +1,177 @@
+---
+name: antigravity-cli
+description: "Operate the Antigravity CLI (agy): plugins, auth, sandbox."
+version: 0.1.0
+author: Tony Simons (asimons81), Hermes Agent
+license: MIT
+platforms: [linux, macos, windows]
+metadata:
+  hermes:
+    tags: [Coding-Agent, Antigravity, CLI, Auth, Plugins, Sandbox]
+    related_skills: [grok, codex, claude-code, hermes-agent]
+---
+
+# Antigravity CLI (`agy`)
+
+Operator guide for the Antigravity CLI, invoked as `agy`. Run all `agy`
+commands through the Hermes `terminal` tool; inspect its config and logs with
+`read_file`. This skill is reference + procedure — it does not wrap a network
+API, so there is nothing to authenticate from Hermes itself.
+
+## When to Use
+
+- Installing, updating, or smoke-testing the `agy` binary
+- Driving non-interactive `agy --print` / `agy -p` one-shots
+- Debugging Antigravity auth, sandbox, permissions, or plugin state
+- Reading Antigravity settings, keybindings, conversations, or logs
+
+## Mental model
+
+Antigravity has two layers — keep them distinct or the guidance will be wrong:
+
+1. **Shell wrapper commands** — `agy help`, `agy install`, `agy plugin`,
+   `agy update`, `agy changelog`. Run these through the `terminal` tool.
+2. **Interactive in-session slash commands** — `/config`, `/permissions`,
+   `/skills`, `/agents`, etc. These only exist inside a running `agy` TUI
+   session, not on the shell wrapper.
+
+`agy help` shows the shell wrapper surface, NOT the in-session slash commands.
+
+## Prerequisites
+
+- The `agy` binary on PATH. Verify through the `terminal` tool:
+  `command -v agy && agy --version`.
+- No env vars or API keys required by this skill — Antigravity manages its own
+  auth via the OS keyring / browser sign-in (see Authentication below).
+
+## How to Run
+
+Invoke every `agy` command through the `terminal` tool. Examples:
+
+```
+terminal(command="agy --version")
+terminal(command="agy help")
+terminal(command="agy plugin list")
+terminal(command="agy --print 'Summarize the repo in 3 bullets'", workdir="/path/to/project")
+```
+
+For an interactive multi-turn TUI session, launch `agy` with `pty=true` (and
+tmux for capture/monitoring), the same pattern the `codex` / `claude-code`
+skills use. For one-shot smoke tests and scripted prompts, prefer
+`agy --print` (non-interactive).
+
+To inspect Antigravity's own files, use `read_file` on the paths listed in the
+Core paths section below; do not `cat` them through the terminal.
+
+## Core paths
+
+- Binary / entrypoint: `agy`
+- App data dir: `~/.gemini/antigravity-cli/`
+- Settings file: `~/.gemini/antigravity-cli/settings.json`
+- Keybindings file: `~/.gemini/antigravity-cli/keybindings.json`
+- Logs: `~/.gemini/antigravity-cli/log/cli-*.log`
+- Conversations: `~/.gemini/antigravity-cli/conversations/`
+- Brain artifacts: `~/.gemini/antigravity-cli/brain/`
+- History: `~/.gemini/antigravity-cli/history.jsonl`
+- Plugin staging: `~/.gemini/antigravity-cli/plugins/<plugin_name>/`
+
+## Quick Reference
+
+### Wrapper commands
+- `agy changelog`
+- `agy help`
+- `agy install`
+- `agy plugin` / `agy plugins`
+- `agy update`
+
+### Useful flags
+- `--add-dir`
+- `--continue` / `-c`
+- `--conversation`
+- `--dangerously-skip-permissions`
+- `--print` / `-p`
+- `--print-timeout`
+- `--prompt`
+- `--prompt-interactive` / `-i`
+- `--sandbox`
+- `--log-file`
+- `--version`
+
+### Plugin subcommands (`agy plugin --help`)
+- `list`, `import [source]`, `install <target>`, `uninstall <name>`,
+  `enable <name>`, `disable <name>`, `validate [path]`, `link <mp> <target>`,
+  `help`
+
+### Install flags (`agy install --help`)
+- `--dir`, `--skip-aliases`, `--skip-path`
+
+### In-session slash commands
+- **Conversation control:** `/resume` (`/switch`), `/rewind` (`/undo`),
+  `/rename <name>`, `/clear`, `/fork`, `/reset`, `/new`
+- **Settings & tools:** `/config`, `/settings`, `/permissions`, `/model`,
+  `/keybindings`, `/statusline`, `/tasks`, `/skills`, `/mcp`, `/open <path>`,
+  `/usage`, `/logout`, `/agents`
+- **Prompt helpers:** `@` path autocomplete, `esc esc` clears the prompt (when
+  not streaming), `!` runs a terminal command directly, `?` opens help
+
+## Settings and permissions
+
+### Common settings keys (`settings.json`)
+- `allowNonWorkspaceAccess`
+- `colorScheme`
+- `permissions.allow`
+- `trustedWorkspaces`
+
+### Permission modes
+`request-review`, `always-proceed`, `strict`, `proceed-in-sandbox`.
+
+### Sandbox behavior
+- `enableTerminalSandbox` is a boolean in `settings.json`; default `false`.
+- Launch-time overrides (`--sandbox`, `--dangerously-skip-permissions`) can
+  supersede persistent settings for the current session.
+
+## Authentication behavior
+
+- The CLI tries the OS secure keyring first.
+- With no saved session, it falls back to browser-based Google sign-in.
+- Locally it opens the default browser; over SSH it prints an authorization URL
+  and expects the auth code pasted back.
+- `/logout` removes saved credentials.
+
+## Plugins
+
+- Plugins stage under `~/.gemini/antigravity-cli/plugins/<plugin_name>/`.
+- They can bundle skills, agents, rules, MCP servers, and hooks.
+- `agy plugin list` returning no imported plugins is a valid empty state.
+
+## Pitfalls
+
+- `agy help` shows wrapper commands, not interactive slash commands.
+- `agy --version` is the safe non-interactive version check; `agy version` is
+  interactive and can fail without a real TTY.
+- First place to look for failures: `~/.gemini/antigravity-cli/log/cli-*.log`
+  (read with `read_file`).
+- Don't confuse persistent JSON settings with launch-time overrides.
+- `~/.gemini/antigravity-cli/bin/agentapi` is a thin wrapper around `agy agentapi`.
+- On WSL, token storage is file-based, so auth issues are usually local-file /
+  session-state problems, not browser-only problems.
+- Workspace identity can depend on launch directory and the `.antigravitycli`
+  project marker.
+
+## Verification
+
+Confirm the install is real and usable. Run commands through the `terminal`
+tool and read files with `read_file`:
+
+1. `terminal(command="command -v agy")`
+2. `terminal(command="agy --version")`
+3. `terminal(command="agy help")`
+4. `terminal(command="agy plugin list")`
+5. `read_file` on `~/.gemini/antigravity-cli/settings.json`
+6. `read_file` on the latest `~/.gemini/antigravity-cli/log/cli-*.log`
+7. If needed, `read_file` on `~/.gemini/antigravity-cli/keybindings.json`
+
+## Support files
+
+- `references/cli-docs.md` — condensed notes from the getting-started, usage,
+  and features docs.
--- a/optional-skills/autonomous-ai-agents/antigravity-cli/references/cli-docs.md
+++ b/optional-skills/autonomous-ai-agents/antigravity-cli/references/cli-docs.md
@@ -0,0 +1,64 @@
+# Antigravity CLI docs, condensed
+
+Source pages reviewed:
+- `/docs/cli-getting-started`
+- `/docs/cli-using`
+- `/docs/cli-features`
+
+## Install
+- macOS/Linux: `curl -fsSL https://antigravity.google/cli/install.sh | bash`
+- Windows PowerShell: `irm https://antigravity.google/cli/install.ps1 | iex`
+- Windows CMD: `curl -fsSL https://antigravity.google/cli/install.cmd -o install.cmd && install.cmd && del install.cmd`
+
+## Authentication
+- Tries secure keyring first.
+- If no saved session exists, falls back to browser-based Google sign-in.
+- Local machine: opens the default browser.
+- SSH/remote: prints a secure authorization URL, then expects the auth code to be pasted back.
+- `/logout` removes saved credentials.
+
+## Config and files
+- Settings: `~/.gemini/antigravity-cli/settings.json`
+- Keybindings: `~/.gemini/antigravity-cli/keybindings.json`
+- Plugins: `~/.gemini/antigravity-cli/plugins/<plugin_name>/`
+
+## Useful slash commands
+- `/config`, `/settings`
+- `/permissions`
+- `/resume` / `/switch`
+- `/rewind` / `/undo`
+- `/rename <name>`
+- `/model`
+- `/keybindings`
+- `/statusline`
+- `/tasks`
+- `/skills`
+- `/mcp`
+- `/open <path>`
+- `/usage`
+- `/logout`
+- `/agents`
+
+## Prompt helpers
+- `@` path autocomplete
+- `esc esc` clears prompt when not streaming
+- `!` runs a terminal command
+- `?` opens help / slash command list
+
+## Permissions and sandbox
+- Permission modes: `request-review`, `always-proceed`, `strict`, `proceed-in-sandbox`
+- Launch overrides: `--sandbox`, `--dangerously-skip-permissions`
+- Sandbox setting: `enableTerminalSandbox` in `settings.json` (default `false`)
+
+## Plugins
+- Plugins can bundle skills, agents, rules, MCP servers, and hooks.
+- They are staged locally and auto-discovered once installed.
+
+## Subagents
+- `/agents` opens the panel for active/completed subagents.
+- Subagents can run in parallel and request approvals.
+
+## Keybindings
+- `~/.gemini/antigravity-cli/keybindings.json`
+- Malformed JSON falls back to defaults for broken actions.
+- Docs list default bindings for clear, submit, cancel, exit, suspend, editor, approval yes/no, navigation, clipboard, undo/redo, and newline insertion.
--- a/optional-skills/autonomous-ai-agents/grok/SKILL.md
+++ b/optional-skills/autonomous-ai-agents/grok/SKILL.md
@@ -0,0 +1,301 @@
+---
+name: grok
+description: "Delegate coding to xAI Grok Build CLI (features, PRs)."
+version: 0.1.0
+author: Matt Maximo (MattMaximo), Hermes Agent
+license: MIT
+platforms: [linux, macos, windows]
+metadata:
+  hermes:
+    tags: [Coding-Agent, Grok, xAI, Code-Review, Refactoring, Automation]
+    related_skills: [codex, claude-code, hermes-agent]
+---
+
+# Grok Build CLI — Hermes Orchestration Guide
+
+Delegate coding tasks to [Grok Build](https://docs.x.ai/build/overview) (xAI's
+autonomous coding agent CLI, the `grok` command) via the Hermes terminal. Grok
+can read files, write code, run shell commands, spawn subagents, and manage git
+workflows. It runs three ways: an interactive TUI, **headless** (`-p`), and as
+an **ACP agent** over JSON-RPC.
+
+This is the third sibling to `codex` and `claude-code`. The orchestration
+pattern is nearly identical — **prefer headless `-p` for one-shots**, use a PTY
+for interactive sessions.
+
+## When to use
+
+- Building features
+- Refactoring
+- PR reviews
+- Batch issue fixing
+- Any task where you'd otherwise reach for Codex / Claude Code but want Grok
+
+## Prerequisites
+
+- **Install (preferred):** `npm install -g @xai-official/grok`
+  - The official installer `curl -fsSL https://x.ai/cli/install.sh | bash` also
+    works, but the `x.ai` host is Cloudflare-walled in some environments. The
+    npm path avoids that dependency entirely.
+- **Auth — SuperGrok / X Premium+ subscription (primary path):**
+  - Run `grok login` once → opens a browser for OAuth → token cached in
+    `~/.grok/auth.json`. This uses your **SuperGrok or X Premium+** subscription
+    (no per-token API billing).
+  - Check sign-in state by looking for `~/.grok/auth.json`, or run a cheap
+    headless smoke test: `grok --no-auto-update -p "Say ok."`
+  - In the TUI, `/logout` signs out and `/login` (or relaunching) signs back in.
+- **No git repo required** — unlike Codex, Grok runs fine outside a git
+  directory (good for scratch/throwaway tasks).
+- **Claude Code / AGENTS.md compatible with zero config** — Grok auto-reads
+  `CLAUDE.md`, `.claude/` (skills, agents, MCPs, hooks, rules), and the
+  `AGENTS.md` family. Existing project context just works.
+
+> **API-key fallback (not the default setup):** Grok also supports setting the
+> `XAI_API_KEY` environment variable for pay-as-you-go billing via `api.x.ai`.
+> Use this only if `grok login` / SuperGrok auth is unavailable. The
+> subscription path (`grok login`) is the intended setup here.
+
+## Two Orchestration Modes
+
+### Mode 1: Headless (`-p`) — Non-Interactive (PREFERRED)
+
+Runs a one-shot task, prints the result, and exits. No PTY, no interactive
+dialogs to navigate. This is the cleanest integration path — the analog of
+`claude -p` and `codex exec`.
+
+```
+terminal(command="grok --no-auto-update -p 'Add a dark mode toggle to settings'", workdir="/path/to/project", timeout=180)
+```
+
+Always pass `--no-auto-update` in automation to skip background update checks.
+
+**When to use headless:**
+- One-shot coding tasks (fix a bug, add a feature, refactor)
+- CI/CD automation and scripting
+- Structured output parsing with `--output-format json`
+- Any task that doesn't need multi-turn conversation
+
+### Mode 2: Interactive PTY — Multi-Turn TUI Sessions
+
+The TUI is a fullscreen, mouse-interactive app. Drive it with `pty=true`. For
+robust monitoring/input use tmux (same pattern as the `claude-code` skill).
+
+```
+# Launch in a tmux session for capture-pane monitoring
+terminal(command="tmux new-session -d -s grok-work -x 140 -y 40")
+terminal(command="tmux send-keys -t grok-work 'cd /path/to/project && grok' Enter")
+
+# Wait for startup, then send a task
+terminal(command="sleep 5 && tmux send-keys -t grok-work 'Refactor the auth module to use JWT' Enter")
+
+# Monitor progress
+terminal(command="sleep 15 && tmux capture-pane -t grok-work -p -S -50")
+
+# Exit when done
+terminal(command="tmux send-keys -t grok-work '/quit' Enter && sleep 1 && tmux kill-session -t grok-work")
+```
+
+**Tip for inline TUI output:** if you want TUI-style output without the
+fullscreen alt-screen takeover (e.g. for cleaner logs), add `--no-alt-screen`.
+For pure automation, headless `-p` is still cleaner than the TUI.
+
+## Headless Deep Dive
+
+### Common Flags
+
+| Flag | Effect |
+|------|--------|
+| `-p, --single <PROMPT>` | Send one prompt, run headless, exit |
+| `-m, --model <MODEL>` | Choose a model |
+| `-s, --session-id <ID>` | Create or resume a named headless session |
+| `-r, --resume <ID>` | Resume an existing session |
+| `-c, --continue` | Continue the most recent session in the current directory |
+| `--cwd <PATH>` | Set the working directory |
+| `--output-format <FMT>` | `plain` (default), `json`, or `streaming-json` |
+| `--always-approve` | Auto-approve all tool executions (the `--full-auto` / `--yolo` equivalent) |
+| `--no-alt-screen` | Run inline, no fullscreen TUI takeover |
+| `--no-auto-update` | Skip background update checks (use in all automation) |
+
+### Output Formats
+
+- `plain` — human-readable text (default)
+- `json` — one JSON object at the end of the run (parse the result cleanly)
+- `streaming-json` — newline-delimited JSON events as they arrive
+
+```
+# Structured result for parsing
+terminal(command="grok --no-auto-update -p 'List all TODO comments in src/' --output-format json", workdir="/project", timeout=120)
+
+# Auto-approve for autonomous building
+terminal(command="grok --no-auto-update --always-approve -p 'Refactor the database layer and run the tests'", workdir="/project", timeout=300)
+```
+
+### Background Mode (Long Tasks)
+
+```
+# Start headless in background
+terminal(command="grok --no-auto-update --always-approve -p 'Refactor the auth module'", workdir="/project", background=true, notify_on_complete=true)
+# Returns session_id
+
+# Monitor
+process(action="poll", session_id="<id>")
+process(action="log", session_id="<id>")
+
+# Kill if needed
+process(action="kill", session_id="<id>")
+```
+
+For an interactive (TUI) background session, use `pty=true` + tmux and monitor
+with `tmux capture-pane`, exactly like the `claude-code` / `codex` skills.
+
+### Session Continuation
+
+```
+# Start a named session
+terminal(command="grok --no-auto-update -s refactor-db -p 'Start refactoring the database layer' --always-approve", workdir="/project", timeout=240)
+
+# Resume it later
+terminal(command="grok --no-auto-update -r refactor-db -p 'Now add connection pooling' --always-approve", workdir="/project", timeout=180)
+
+# Or continue the most recent session in this directory
+terminal(command="grok --no-auto-update -c -p 'What did you change last time?'", workdir="/project", timeout=60)
+```
+
+## Read-Only Audit → Markdown Note Pattern
+
+To have Grok review local artifacts and return a clean markdown note (for
+Obsidian or a repo) without mutating anything:
+
+1. Prepare stable input files first with Hermes tools (`read_file`,
+   `write_file`). Snapshot only the relevant context into a temp file rather
+   than dumping raw paths.
+2. Run Grok headless **without** `--always-approve` so it cannot auto-write, and
+   demand `markdown only, no preamble`.
+3. Save Grok's stdout straight into the destination note with `write_file()`.
+
+```
+grok --no-auto-update -p "Read /tmp/current.md and /tmp/inventory.md. Produce markdown only, no preamble. Output a clean note titled 'Cleanup Review'." --output-format plain
+```
+
+**Pitfall (same as Claude Code):** for document rewrites, a loose "rewrite this"
+prompt may return a change summary instead of the full file. To avoid this, pipe
+the file in and demand `Return ONLY the full revised markdown document. No intro,
+no explanation, no code fences. Start immediately with '# Title'.` Verify the
+first lines with `read_file()` before overwriting the destination.
+
+## PR Review Patterns
+
+### Quick Review (Headless)
+
+```
+terminal(command="cd /path/to/repo && git diff main...feature-branch | grok --no-auto-update -p 'Review this diff for bugs, security issues, and style problems. Be thorough.'", timeout=120)
+```
+
+### Clone-to-temp Review (safe, no repo mutation)
+
+```
+terminal(command="REVIEW=$(mktemp -d) && git clone https://github.com/user/repo.git $REVIEW && cd $REVIEW && gh pr checkout 42 && grok --no-auto-update -p 'Review the changes vs origin/main. Check bugs, security, race conditions, missing tests.'", pty=true, timeout=300)
+```
+
+### Post the review
+
+```
+terminal(command="gh pr comment 42 --body '<review text>'", workdir="/path/to/repo")
+```
+
+## Parallel Issue Fixing with Worktrees
+
+```
+# Create worktrees
+terminal(command="git worktree add -b fix/issue-78 /tmp/issue-78 main", workdir="~/project")
+terminal(command="git worktree add -b fix/issue-99 /tmp/issue-99 main", workdir="~/project")
+
+# Launch Grok headless in each (background)
+terminal(command="grok --no-auto-update --always-approve -p 'Fix issue #78: <description>. Commit when done.'", workdir="/tmp/issue-78", background=true, notify_on_complete=true)
+terminal(command="grok --no-auto-update --always-approve -p 'Fix issue #99: <description>. Commit when done.'", workdir="/tmp/issue-99", background=true, notify_on_complete=true)
+
+# Monitor
+process(action="list")
+
+# After completion: push and open PRs
+terminal(command="cd /tmp/issue-78 && git push -u origin fix/issue-78")
+terminal(command="gh pr create --repo user/repo --head fix/issue-78 --title 'fix: ...' --body '...'")
+
+# Cleanup
+terminal(command="git worktree remove /tmp/issue-78", workdir="~/project")
+```
+
+## Useful Subcommands & TUI Commands
+
+| Command | Purpose |
+|---------|---------|
+| `grok` | Start the interactive TUI |
+| `grok -p "query"` | Headless one-shot |
+| `grok login` / `grok logout` | Sign in / out (SuperGrok / X Premium+ OAuth) |
+| `grok inspect` | Show what Grok discovered in cwd: config sources, instructions, skills, plugins, hooks, MCP servers |
+| `grok agent stdio` | Run as an ACP agent over JSON-RPC (for IDE/tool integration) |
+| `grok update` | Update the CLI (needs the `x.ai` host; skip in automation) |
+
+TUI slash commands (interactive only): `/model <name>`, `/always-approve`,
+`/plan`, `/context`, `/compact`, `/resume`, `/sessions`, `/fork`, `/usage`,
+`/quit`. `Shift+Tab` cycles session modes (including Plan mode, which blocks
+write tools except the session plan file).
+
+## Config (`~/.grok/config.toml`)
+
+```toml
+[cli]
+auto_update = false          # skip background update checks persistently
+
+[ui]
+permission_mode = "ask"      # or "always-approve" to skip tool prompts by default
+
+[models]
+default = "grok-build-0.1"
+```
+
+Put global preferences in `~/.grok/config.toml` (not project-scoped
+`.grok/config.toml`). `permission_mode` supersedes the legacy `approval_mode` /
+`yolo = true` keys.
+
+## Pitfalls & Gotchas
+
+1. **Auth is subscription-gated.** `grok login` requires a SuperGrok or X
+   Premium+ subscription. If login fails or there's no `~/.grok/auth.json`,
+   confirm the subscription is active before falling back to `XAI_API_KEY`.
+2. **Don't conflate Hermes' xAI auth with the `grok` CLI's auth.** Hermes'
+   `x_search` runs on its own xAI OAuth; the standalone `grok` CLI has a
+   separate token in `~/.grok/auth.json`. A working `x_search` does NOT mean
+   `grok` is logged in.
+3. **Always pass `--no-auto-update` in automation** — otherwise Grok phones home
+   for update checks (and `x.ai`/`storage.googleapis.com` may be unreachable).
+4. **Prefer npm install over the curl installer** — `npm install -g
+   @xai-official/grok` avoids the Cloudflare-walled `x.ai` host.
+5. **`--always-approve` is the autonomous-build switch.** Without it, headless
+   runs may stall waiting on tool-approval prompts. Omit it deliberately for
+   read-only review/audit work so Grok can't mutate files.
+6. **Headless `-p` skips TUI dialogs**; the TUI needs `pty=true` (+ tmux for
+   monitoring), just like Claude Code.
+7. **Use `--no-alt-screen`** if you run the TUI inline and the fullscreen
+   alt-screen takeover garbles captured output.
+8. **No git repo needed**, but PR/commit workflows still need one. For scratch
+   commit tasks, use `cd "$(mktemp -d)" && git init`.
+9. **Clean up tmux sessions** with `tmux kill-session -t <name>` when done.
+
+## Rules for Hermes Agents
+
+1. **Prefer headless `-p`** for single tasks — cleanest integration, structured
+   output via `--output-format json`.
+2. **Always set `workdir`** (or `--cwd`) so Grok targets the right project.
+3. **Pass `--no-auto-update`** in every automated invocation.
+4. **Use `--always-approve` only when Grok should write autonomously**; omit it
+   for read-only reviews and audits.
+5. **Background long tasks** with `background=true, notify_on_complete=true` and
+   monitor via the `process` tool.
+6. **Use tmux for multi-turn interactive work** and monitor with
+   `tmux capture-pane -t <session> -p -S -50`.
+7. **Verify auth before relying on it** — check `~/.grok/auth.json` or run a
+   cheap `grok --no-auto-update -p "Say ok."` smoke test; don't assume Hermes'
+   xAI auth carries over.
+8. **Report results to the user** — summarize what Grok changed and what's left.
--- a/plugins/video_gen/fal/init.py
+++ b/plugins/video_gen/fal/init.py
@@ -17,7 +17,7 @@ Model families (each with t2v + i2v endpoints):
    veo3.1        fal-ai/veo3.1                                  /  fal-ai/veo3.1/image-to-video
    seedance-2.0  bytedance/seedance-2.0/text-to-video           /  bytedance/seedance-2.0/image-to-video
    kling-v3-4k   fal-ai/kling-video/v3/4k/text-to-video         /  fal-ai/kling-video/v3/4k/image-to-video
-    happy-horse   fal-ai/happy-horse/text-to-video               /  fal-ai/happy-horse/image-to-video
+    happy-horse   alibaba/happy-horse/text-to-video              /  alibaba/happy-horse/image-to-video

 Selection precedence for the active family:
    1. ``model=`` arg from the tool call
@@ -26,14 +26,16 @@ Selection precedence for the active family:
    4. ``video_gen.model`` in ``config.yaml`` (when it's one of our family IDs)
    5. ``DEFAULT_MODEL``

-Authentication via ``FAL_KEY``. Output is an HTTPS URL from FAL's CDN; the
-gateway downloads and delivers it.
+Authentication via ``FAL_KEY`` or the managed Nous gateway. Output is an
+HTTPS URL from FAL's CDN; the gateway downloads and delivers it.
 """

 from __future__ import annotations

 import logging
 import os
+import threading
+import uuid
 from typing import Any, Dict, List, Optional, Tuple

 from agent.video_gen_provider import (
@@ -104,8 +106,9 @@ FAL_FAMILIES: Dict[str, Dict[str, Any]] = {
        "text_endpoint": "fal-ai/veo3.1",
        "image_endpoint": "fal-ai/veo3.1/image-to-video",
        "aspect_ratios": ("16:9", "9:16"),
-        "resolutions": ("720p", "1080p"),
+        "resolutions": ("720p", "1080p", "4k"),
        "durations": (4, 6, 8),
+        "duration_suffix": "s",  # FAL veo3.1 wants "4s" not "4"
        "audio": True,
        "negative": True,
    },
@@ -148,8 +151,8 @@ FAL_FAMILIES: Dict[str, Dict[str, Any]] = {
        "price": "premium",
        "strengths": "Alibaba. New model, sparse public docs — conservative defaults.",
        "tier": "premium",
-        "text_endpoint": "fal-ai/happy-horse/text-to-video",
-        "image_endpoint": "fal-ai/happy-horse/image-to-video",
+        "text_endpoint": "alibaba/happy-horse/text-to-video",
+        "image_endpoint": "alibaba/happy-horse/image-to-video",
        # Docs don't expose duration/aspect/resolution — let the endpoint
        # apply its own defaults.
        "aspect_ratios": None,
@@ -270,7 +273,9 @@ def _build_payload(
    clamped = _clamp_duration(family, duration)
    if clamped is not None and family.get("durations"):
        # FAL exposes duration as a string in the queue API ("8" not 8).
-        payload["duration"] = str(clamped)
+        # Some families (e.g. veo3.1) require a unit suffix ("4s" not "4").
+        suffix = family.get("duration_suffix", "")
+        payload["duration"] = f"{clamped}{suffix}"

    if family.get("audio") and audio is not None:
        payload["generate_audio"] = bool(audio)
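The duration change above can be read as a single formatting rule. This is a minimal sketch, assuming the value has already been clamped by `_clamp_duration`. `format_fal_duration` is a hypothetical name, not a function in the plugin, and the `(5, 10)` family in the tests is illustrative.

```python
from typing import Any, Dict, Optional


def format_fal_duration(family: Dict[str, Any], clamped: Optional[int]) -> Optional[str]:
    """Render an already-clamped duration the way the FAL queue API expects it."""
    if clamped is None or not family.get("durations"):
        return None  # family takes no duration; let the endpoint apply its default
    # Durations are strings ("8", not 8); some families also need a unit ("4s").
    return f"{clamped}{family.get('duration_suffix', '')}"
```

Keeping the suffix as per-family data means new endpoints with unit requirements only need a table entry, not another branch in `_build_payload`.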
@@ -302,6 +307,92 @@ def _load_fal_client() -> Any:
    return _fal_client


+# ---------------------------------------------------------------------------
+# Managed FAL gateway (Nous Subscription)
+# ---------------------------------------------------------------------------
+
+_managed_fal_video_client: Any = None
+_managed_fal_video_client_config: Any = None
+_managed_fal_video_client_lock = threading.Lock()
+
+
+def _resolve_managed_fal_video_gateway():
+    """Return managed fal-queue gateway config when the user prefers the gateway
+    or direct FAL credentials are absent."""
+    from tools.tool_backend_helpers import fal_key_is_configured, prefers_gateway
+
+    if fal_key_is_configured() and not prefers_gateway("video_gen"):
+        return None
+    from tools.managed_tool_gateway import resolve_managed_tool_gateway
+
+    return resolve_managed_tool_gateway("fal-queue")
+
+
+def _get_managed_fal_video_client(managed_gateway):
+    """Reuse the managed FAL client so its internal httpx.Client is not leaked per call."""
+    global _managed_fal_video_client, _managed_fal_video_client_config
+    from tools.fal_common import _ManagedFalSyncClient
+
+    client_config = (
+        managed_gateway.gateway_origin.rstrip("/"),
+        managed_gateway.nous_user_token,
+    )
+    with _managed_fal_video_client_lock:
+        if _managed_fal_video_client is not None and _managed_fal_video_client_config == client_config:
+            return _managed_fal_video_client
+
+        _load_fal_client()
+        _managed_fal_video_client = _ManagedFalSyncClient(
+            _fal_client,
+            key=managed_gateway.nous_user_token,
+            queue_run_origin=managed_gateway.gateway_origin,
+        )
+        _managed_fal_video_client_config = client_config
+        return _managed_fal_video_client
+
+
+def _submit_fal_video_request(endpoint: str, arguments: Dict[str, Any]):
+    """Submit a FAL video request using direct credentials or the managed queue gateway.
+
+    Returns a request handle whose ``.get()`` blocks until the result is ready.
+    """
+    _load_fal_client()
+    request_headers = {"x-idempotency-key": str(uuid.uuid4())}
+    managed_gateway = _resolve_managed_fal_video_gateway()
+    if managed_gateway is None:
+        return _fal_client.submit(endpoint, arguments=arguments, headers=request_headers)
+
+    managed_client = _get_managed_fal_video_client(managed_gateway)
+    try:
+        return managed_client.submit(
+            endpoint,
+            arguments=arguments,
+            headers=request_headers,
+        )
+    except Exception as exc:
+        from tools.fal_common import _extract_http_status
+
+        status = _extract_http_status(exc)
+        if status is not None and 400 <= status < 500:
+            raise ValueError(
+                f"Nous Subscription gateway rejected endpoint '{endpoint}' "
+                f"(HTTP {status}). This model may not yet be enabled on "
+                f"the Nous Portal's FAL proxy. Either:\n"
+                f"  • Set FAL_KEY in your environment to use FAL.ai directly, or\n"
+                f"  • Pick a different model via `hermes tools` → Video Generation."
+            ) from exc
+        raise
+
+
+def _check_fal_video_available() -> bool:
+    """True if the FAL.ai video backend is reachable (direct key or managed gateway)."""
+    from tools.tool_backend_helpers import fal_key_is_configured
+
+    if fal_key_is_configured():
+        return True
+    return _resolve_managed_fal_video_gateway() is not None
+
+
 # ---------------------------------------------------------------------------
 # Provider
 # ---------------------------------------------------------------------------
@@ -323,13 +414,10 @@ class FALVideoGenProvider(VideoGenProvider):
        return "FAL"

    def is_available(self) -> bool:
-        if not os.environ.get("FAL_KEY", "").strip():
-            return False
        try:
-            import fal_client  # noqa: F401
-        except ImportError:
+            return _check_fal_video_available()
+        except Exception:  # noqa: BLE001 — never break the picker
            return False
-        return True

    def list_models(self) -> List[Dict[str, Any]]:
        out: List[Dict[str, Any]] = []
@@ -394,11 +482,12 @@ class FALVideoGenProvider(VideoGenProvider):
        seed: Optional[int] = None,
        **kwargs: Any,
    ) -> Dict[str, Any]:
-        if not os.environ.get("FAL_KEY", "").strip():
+        if not _check_fal_video_available():
            return error_response(
                error=(
-                    "FAL_KEY not set. Run `hermes tools` → Video Generation "
-                    "→ FAL to configure."
+                    "No FAL backend available. Either set FAL_KEY "
+                    "(run `hermes tools` → Video Generation → FAL to configure) "
+                    "or sign in to Nous (`hermes setup`) for managed gateway access."
                ),
                error_type="auth_required",
                provider="fal",
@@ -406,7 +495,7 @@ class FALVideoGenProvider(VideoGenProvider):
            )

        try:
-            fal_client = _load_fal_client()
+            _load_fal_client()
        except ImportError:
            return error_response(
                error="fal_client Python package not installed (pip install fal-client)",
@@ -467,11 +556,8 @@ class FALVideoGenProvider(VideoGenProvider):
        )

        try:
-            result = fal_client.subscribe(
-                endpoint,
-                arguments=payload,
-                with_logs=False,
-            )
+            handle = _submit_fal_video_request(endpoint, payload)
+            result = handle.get()
        except Exception as exc:
            logger.warning(
                "FAL video gen failed (family=%s, endpoint=%s): %s",
@@ -511,7 +597,7 @@ class FALVideoGenProvider(VideoGenProvider):
            prompt=prompt,
            modality=modality_used,
            aspect_ratio=aspect_ratio if "aspect_ratio" in payload else "",
-            duration=int(payload["duration"]) if "duration" in payload else 0,
+            duration=int("".join(c for c in payload["duration"] if c.isdigit()) or "0") if "duration" in payload else 0,
            provider="fal",
            extra=extra,
        )
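The payload builder now appends an optional per-family unit suffix to the clamped duration. The result builder then keeps only the digits to recover an integer. Below is a minimal sketch of that round trip. It assumes a trimmed-down family table: `FAMILIES`, `format_duration` and `parse_duration` are hypothetical names, not part of the plugin.

```python
from typing import Any, Dict, Optional

# Hypothetical family table: only the keys the duration logic reads.
FAMILIES: Dict[str, Dict[str, Any]] = {
    "veo3.1": {"durations": (4, 6, 8), "duration_suffix": "s"},  # wants "8s"
    "plain": {"durations": (5, 10)},  # no suffix: bare "5"
    "free": {"durations": None},  # endpoint picks its own duration
}


def format_duration(family: Dict[str, Any], clamped: Optional[int]) -> Optional[str]:
    """Render a clamped duration as the queue API string, or None to omit it."""
    if clamped is None or not family.get("durations"):
        return None
    suffix = family.get("duration_suffix", "")
    return f"{clamped}{suffix}"


def parse_duration(value: Optional[str]) -> int:
    """Recover whole seconds from "8" or "8s"; 0 when the field is absent."""
    if not value:
        return 0
    digits = "".join(c for c in value if c.isdigit())
    return int(digits or "0")


if __name__ == "__main__":
    print(format_duration(FAMILIES["veo3.1"], 8))  # 8s
    print(format_duration(FAMILIES["plain"], 5))  # 5
    print(format_duration(FAMILIES["free"], 5))  # None
    print(parse_duration("8s"), parse_duration("5"), parse_duration(None))  # 8 5 0
```

Because parsing keeps only the digits, it accepts both the bare "8" form and the suffixed "8s" form. That is why `int(payload["duration"])` had to change once suffixes were introduced.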
--- a/tests/agent/test_prompt_builder.py
+++ b/tests/agent/test_prompt_builder.py
@@ -440,6 +440,7 @@ class TestBuildNousSubscriptionPrompt:
                features={
                    "web": NousFeatureState("web", "Web tools", True, True, True, True, False, True, "firecrawl"),
                    "image_gen": NousFeatureState("image_gen", "Image generation", True, True, True, True, False, True, "Nous Subscription"),
+                    "video_gen": NousFeatureState("video_gen", "Video generation", False, False, False, False, False, False, ""),
                    "tts": NousFeatureState("tts", "OpenAI TTS", True, True, True, True, False, True, "OpenAI TTS"),
                    "browser": NousFeatureState("browser", "Browser automation", True, True, True, True, False, True, "Browser Use"),
                    "modal": NousFeatureState("modal", "Modal execution", False, True, False, False, False, True, "local"),
@@ -464,6 +465,7 @@ class TestBuildNousSubscriptionPrompt:
                features={
                    "web": NousFeatureState("web", "Web tools", True, False, False, False, False, True, ""),
                    "image_gen": NousFeatureState("image_gen", "Image generation", True, False, False, False, False, True, ""),
+                    "video_gen": NousFeatureState("video_gen", "Video generation", False, False, False, False, False, False, ""),
                    "tts": NousFeatureState("tts", "OpenAI TTS", True, False, False, False, False, True, ""),
                    "browser": NousFeatureState("browser", "Browser automation", True, False, False, False, False, True, ""),
                    "modal": NousFeatureState("modal", "Modal execution", False, False, False, False, False, True, ""),
--- a/tests/gateway/test_planned_stop_watcher.py
+++ b/tests/gateway/test_planned_stop_watcher.py
@@ -12,12 +12,33 @@ See issue #33778 for the original Windows session-loss bug report.
 """

 import asyncio
+import json
+import os
 import threading
 import time
 from unittest.mock import MagicMock


 from gateway.run import _run_planned_stop_watcher
+from gateway import status as status_mod
+
+
+def _write_self_marker(marker, *, stale: bool = False):
+    """Write a planned-stop marker that targets the CURRENT process.
+
+    The watcher only fires for markers naming our PID + start_time (the
+    fix for issue #34597), so tests that expect a fire must write a
+    self-targeting marker. Pass ``stale=True`` to backdate ``written_at``
+    past the TTL.
+    """
+    written_at = "2000-01-01T00:00:00+00:00" if stale else status_mod._utc_now_iso()
+    record = {
+        "target_pid": os.getpid(),
+        "target_start_time": status_mod._get_process_start_time(os.getpid()),
+        "stopper_pid": os.getpid(),
+        "written_at": written_at,
+    }
+    marker.write_text(json.dumps(record), encoding="utf-8")


 class _FakeRunner:
@@ -41,11 +62,10 @@ def _make_loop_capturing_calls():


 def test_watcher_fires_shutdown_when_marker_appears(tmp_path, monkeypatch):
-    """When the marker file exists, the watcher must call the shutdown handler."""
+    """When a marker targeting THIS process exists, fire the shutdown handler."""
    marker = tmp_path / ".gateway-planned-stop.json"

    # Patch the marker-path resolver so the watcher polls our temp location.
-    from gateway import status as status_mod
    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)

    runner = _FakeRunner(running=True, draining=False)
@@ -53,8 +73,8 @@ def test_watcher_fires_shutdown_when_marker_appears(tmp_path, monkeypatch):
    shutdown_handler = MagicMock(name="shutdown_signal_handler")
    stop_event = threading.Event()

-    # Drop the marker before the thread starts.
-    marker.write_text('{"target_pid": 1234}', encoding="utf-8")
+    # Drop a self-targeting marker before the thread starts.
+    _write_self_marker(marker)

    watcher = threading.Thread(
        target=_run_planned_stop_watcher,
@@ -114,9 +134,8 @@ def test_watcher_skips_when_runner_already_draining(tmp_path, monkeypatch):
    so the watcher backs off once any shutdown is in flight.
    """
    marker = tmp_path / ".gateway-planned-stop.json"
-    marker.write_text('{"target_pid": 1234}', encoding="utf-8")
+    _write_self_marker(marker)

-    from gateway import status as status_mod
    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)

    # Already draining — watcher should be a no-op.
@@ -204,9 +223,8 @@ def test_watcher_fires_only_once_when_marker_persists(tmp_path, monkeypatch):
    times before the gateway actually shuts down.
    """
    marker = tmp_path / ".gateway-planned-stop.json"
-    marker.write_text('{"target_pid": 1234}', encoding="utf-8")
+    _write_self_marker(marker)

-    from gateway import status as status_mod
    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)

    runner = _FakeRunner(running=True, draining=False)
@@ -263,3 +281,113 @@ def test_watcher_tolerates_marker_path_resolution_errors(tmp_path, monkeypatch,
    assert not watcher.is_alive(), "Watcher should still honour stop_event after errors"
    # No shutdown fired because the marker never reported existence.
    assert loop._captured == []
+
+
+# ---------------------------------------------------------------------------
+# Regression coverage for issue #34597:
+# A marker left behind by a PREVIOUS gateway instance (different PID, or
+# past its TTL) must NOT crash the freshly booted gateway. The watcher
+# only fires when the marker targets the current process, and self-heals
+# by cleaning up stale/malformed markers.
+# ---------------------------------------------------------------------------
+
+
+def test_watcher_does_not_fire_for_foreign_pid_marker(tmp_path, monkeypatch):
+    """A marker naming a DIFFERENT process must not trigger our shutdown.
+
+    This is the core #34597 regression: a stale marker from a prior
+    gateway instance was firing the handler, driving the new gateway into
+    a false "Received UNKNOWN" shutdown and a watchdog crash loop.
+    """
+    marker = tmp_path / ".gateway-planned-stop.json"
+    # Foreign PID + a start_time that cannot match ours, freshly written
+    # so the TTL does NOT remove it — the watcher must still decline.
+    record = {
+        "target_pid": os.getpid() + 1,
+        "target_start_time": -1,
+        "stopper_pid": os.getpid() + 1,
+        "written_at": status_mod._utc_now_iso(),
+    }
+    marker.write_text(json.dumps(record), encoding="utf-8")
+
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
+
+    runner = _FakeRunner(running=True, draining=False)
+    loop = _make_loop_capturing_calls()
+    shutdown_handler = MagicMock(name="shutdown_signal_handler")
+    stop_event = threading.Event()
+
+    watcher = threading.Thread(
+        target=_run_planned_stop_watcher,
+        args=(stop_event, runner, loop, shutdown_handler),
+        kwargs={"poll_interval": 0.05},
+        daemon=True,
+    )
+    watcher.start()
+    time.sleep(0.3)  # several poll cycles
+    stop_event.set()
+    watcher.join(timeout=2.0)
+
+    assert not watcher.is_alive()
+    assert loop._captured == [], (
+        f"Watcher fired on a foreign-PID marker (#34597 regression): {loop._captured}"
+    )
+    shutdown_handler.assert_not_called()
+    # Foreign (but live) marker is left in place — it may still belong to
+    # the process it names.
+    assert marker.exists()
+
+
+def test_watcher_cleans_up_stale_marker_and_keeps_running(tmp_path, monkeypatch):
+    """A marker older than the TTL is unlinked and never fires shutdown."""
+    marker = tmp_path / ".gateway-planned-stop.json"
+    # Self-targeting but backdated past the TTL: must be treated as dead.
+    _write_self_marker(marker, stale=True)
+
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
+
+    runner = _FakeRunner(running=True, draining=False)
+    loop = _make_loop_capturing_calls()
+    shutdown_handler = MagicMock(name="shutdown_signal_handler")
+    stop_event = threading.Event()
+
+    watcher = threading.Thread(
+        target=_run_planned_stop_watcher,
+        args=(stop_event, runner, loop, shutdown_handler),
+        kwargs={"poll_interval": 0.05},
+        daemon=True,
+    )
+    watcher.start()
+    time.sleep(0.3)
+    stop_event.set()
+    watcher.join(timeout=2.0)
+
+    assert not watcher.is_alive()
+    assert loop._captured == [], "Stale marker must not fire shutdown"
+    shutdown_handler.assert_not_called()
+    assert not marker.exists(), "Stale marker should have been cleaned up"
+
+
+def test_planned_stop_marker_targets_self_probe_is_non_destructive(tmp_path, monkeypatch):
+    """The probe returns True for a self-marker WITHOUT unlinking it.
+
+    The shutdown handler performs the authoritative consume on its own
+    thread, so the watcher's probe must leave a matching marker intact.
+    """
+    marker = tmp_path / ".gateway-planned-stop.json"
+    _write_self_marker(marker)
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
+
+    assert status_mod.planned_stop_marker_targets_self() is True
+    assert marker.exists(), "Probe must not consume a matching marker"
+    # Idempotent: still True on a second call.
+    assert status_mod.planned_stop_marker_targets_self() is True
+
+
+def test_planned_stop_marker_targets_self_drops_malformed(tmp_path, monkeypatch):
+    """A malformed marker reports False and is cleaned up."""
+    marker = tmp_path / ".gateway-planned-stop.json"
+    marker.write_text("{not valid json", encoding="utf-8")
+    monkeypatch.setattr(status_mod, "_get_planned_stop_marker_path", lambda: marker)
+
+    assert status_mod.planned_stop_marker_targets_self() is False
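The regression tests above fix four outcomes for the watcher's probe:

- A fresh self-targeting marker returns True and stays on disk.
- A fresh foreign-PID marker returns False and stays on disk.
- Stale markers return False and are unlinked.
- Malformed markers return False and are unlinked.

Below is a sketch of a probe with that behaviour. It assumes a hypothetical 5-minute TTL and matches on PID only. The real `planned_stop_marker_targets_self` also compares process start times, and its TTL is not shown in this diff. `marker_targets_self` is an illustrative name.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta, timezone
from pathlib import Path
from typing import Optional

# Hypothetical TTL; the real value used by gateway.status is not in this diff.
MARKER_TTL = timedelta(minutes=5)


def marker_targets_self(marker: Path, now: Optional[datetime] = None) -> bool:
    """Non-destructive probe: True only for a fresh marker naming this PID.

    Malformed and stale markers are unlinked so the next poll starts clean.
    A fresh marker naming another PID is kept, since it may belong to a
    live process. A matching marker is never consumed here.
    """
    try:
        record = json.loads(marker.read_text(encoding="utf-8"))
        target_pid = int(record["target_pid"])
        written_at = datetime.fromisoformat(record["written_at"])
        if written_at.tzinfo is None:
            raise ValueError("written_at must carry a UTC offset")
    except FileNotFoundError:
        return False  # no marker: nothing to do
    except (ValueError, KeyError, TypeError):
        marker.unlink(missing_ok=True)  # malformed: self-heal
        return False

    now = now or datetime.now(timezone.utc)
    if now - written_at > MARKER_TTL:
        marker.unlink(missing_ok=True)  # stale: left by an earlier instance
        return False
    return target_pid == os.getpid()  # foreign PID: decline, keep the file


if __name__ == "__main__":
    marker = Path(tempfile.mkdtemp()) / ".gateway-planned-stop.json"
    fresh = datetime.now(timezone.utc).isoformat()
    marker.write_text(json.dumps({"target_pid": os.getpid(), "written_at": fresh}))
    print(marker_targets_self(marker), marker.exists())  # True True
```

The probe leaves a matching marker alone. The shutdown handler remains the only place that consumes it.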
--- a/tests/gateway/test_status.py
+++ b/tests/gateway/test_status.py
@@ -707,6 +707,33 @@ class TestTakeoverMarker:

        assert result is False

+    def test_consume_returns_true_on_windows_when_start_time_unavailable(
+        self, tmp_path, monkeypatch
+    ):
+        """Takeover consume must also recognise a self-marker on platforms
+        without ``/proc`` (macOS / native Windows).
+
+        ``consume_takeover_marker_for_self`` shares ``_consume_pid_marker_for_self``
+        with the planned-stop path, so the same start_time fallback applies:
+        a ``--replace`` SIGTERM on Windows (where start_time is None on both
+        sides) must be recognised as a planned takeover and exit 0, not be
+        misclassified as an unexpected UNKNOWN exit. With start_time
+        unavailable we fall back to PID equality alone, bounded by the TTL.
+        """
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        # Simulate Windows: no start_time available for any PID.
+        monkeypatch.setattr(status, "_get_process_start_time", lambda pid: None)
+
+        ok = status.write_takeover_marker(target_pid=os.getpid())
+        assert ok is True
+        payload = json.loads((tmp_path / ".gateway-takeover.json").read_text())
+        assert payload["target_start_time"] is None
+
+        result = status.consume_takeover_marker_for_self()
+
+        assert result is True
+        assert not (tmp_path / ".gateway-takeover.json").exists()
+
    def test_consume_returns_false_when_marker_missing(self, tmp_path, monkeypatch):
        monkeypatch.setenv("HERMES_HOME", str(tmp_path))

@@ -899,6 +926,74 @@ class TestPlannedStopMarker:

        assert ok is False

+    def test_consume_returns_true_on_windows_when_start_time_unavailable(
+        self, tmp_path, monkeypatch
+    ):
+        """Regression for #34597: a legitimate stop must be recognised on
+        platforms without ``/proc``.
+
+        ``_get_process_start_time`` returns None on macOS / native Windows
+        (no ``/proc/<pid>/stat``). The planned-stop watcher only runs there,
+        so if the authoritative consume required a non-None start_time match
+        it would always return False — and ``hermes gateway stop`` would be
+        misclassified as an unexpected ``UNKNOWN`` exit, exit 1, and revived
+        by the service manager (the very crash loop #34597 set out to fix).
+        With start_time unavailable on BOTH sides we fall back to PID
+        equality alone, bounded by the marker TTL.
+        """
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        # Simulate Windows: no start_time available for any PID.
+        monkeypatch.setattr(status, "_get_process_start_time", lambda pid: None)
+
+        ok = status.write_planned_stop_marker(target_pid=os.getpid())
+        assert ok is True
+        # Marker carries a null start_time, exactly as written on Windows.
+        payload = json.loads((tmp_path / ".gateway-planned-stop.json").read_text())
+        assert payload["target_start_time"] is None
+
+        result = status.consume_planned_stop_marker_for_self()
+
+        assert result is True
+        assert not (tmp_path / ".gateway-planned-stop.json").exists()
+
+    def test_consume_still_rejects_foreign_pid_when_start_time_unavailable(
+        self, tmp_path, monkeypatch
+    ):
+        """The PID-only fallback must NOT match a marker naming another PID.
+
+        Falling back to PID equality when start_time is unknown must remain
+        a PID check — a marker for a different process is never ours.
+        """
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        monkeypatch.setattr(status, "_get_process_start_time", lambda pid: None)
+
+        ok = status.write_planned_stop_marker(target_pid=os.getpid() + 9999)
+        assert ok is True
+
+        result = status.consume_planned_stop_marker_for_self()
+
+        assert result is False
+
+    def test_consume_still_rejects_start_time_mismatch_when_both_known(
+        self, tmp_path, monkeypatch
+    ):
+        """PID-reuse defence is preserved when BOTH start_times are present.
+
+        The Windows fallback only relaxes matching when a start_time is
+        unavailable. When both sides report one (Linux), a mismatch must
+        still reject — otherwise PID reuse could resurrect a stale marker.
+        """
+        monkeypatch.setenv("HERMES_HOME", str(tmp_path))
+        monkeypatch.setattr(status, "_get_process_start_time", lambda pid: 100)
+        status.write_planned_stop_marker(target_pid=os.getpid())
+
+        # Simulate PID reuse: same PID, different start_time.
+        monkeypatch.setattr(status, "_get_process_start_time", lambda pid: 9999)
+
+        result = status.consume_planned_stop_marker_for_self()
+
+        assert result is False
+

 class TestReadProcessCmdlinePsFallback:
    """Tests for _read_process_cmdline falling back to ps on non-Linux."""
--- a/tests/hermes_cli/test_nous_subscription.py
+++ b/tests/hermes_cli/test_nous_subscription.py
@@ -218,7 +218,7 @@ def test_get_gateway_eligible_tools_ignores_quoted_false_opt_in(monkeypatch):
    monkeypatch.setattr(
        ns,
        "_get_gateway_direct_credentials",
-        lambda: {"web": True, "image_gen": False, "tts": False, "browser": False},
+        lambda: {"web": True, "image_gen": False, "video_gen": False, "tts": False, "browser": False},
    )

    unconfigured, has_direct, already_managed = ns.get_gateway_eligible_tools(
@@ -230,4 +230,4 @@ def test_get_gateway_eligible_tools_ignores_quoted_false_opt_in(monkeypatch):

    assert "web" in has_direct
    assert "web" not in already_managed
-    assert set(unconfigured) == {"image_gen", "tts", "browser"}
+    assert set(unconfigured) == {"image_gen", "video_gen", "tts", "browser"}
--- a/tests/hermes_cli/test_setup_model_provider.py
+++ b/tests/hermes_cli/test_setup_model_provider.py
@@ -498,6 +498,7 @@ def test_setup_summary_shows_camofox_when_browser_feature_is_camofox(tmp_path, m
            features={
                "web": NousFeatureState("web", "Web tools", True, False, False, False, False, True, ""),
                "image_gen": NousFeatureState("image_gen", "Image generation", True, False, False, False, False, True, ""),
+                "video_gen": NousFeatureState("video_gen", "Video generation", False, False, False, False, False, False, ""),
                "tts": NousFeatureState("tts", "OpenAI TTS", True, False, False, False, False, True, ""),
                "browser": NousFeatureState("browser", "Browser automation", True, True, True, False, True, True, "Camofox"),
                "modal": NousFeatureState("modal", "Modal execution", False, False, False, False, False, True, "local"),
@@ -525,6 +526,7 @@ def test_setup_summary_does_not_mark_incomplete_browserbase_as_available(tmp_pat
            features={
                "web": NousFeatureState("web", "Web tools", True, False, False, False, False, True, ""),
                "image_gen": NousFeatureState("image_gen", "Image generation", True, False, False, False, False, True, ""),
+                "video_gen": NousFeatureState("video_gen", "Video generation", False, False, False, False, False, False, ""),
                "tts": NousFeatureState("tts", "OpenAI TTS", True, False, False, False, False, True, ""),
                "browser": NousFeatureState("browser", "Browser automation", True, False, False, False, False, True, "Browserbase"),
                "modal": NousFeatureState("modal", "Modal execution", False, False, False, False, False, True, "local"),
--- a/tests/hermes_cli/test_status_model_provider.py
+++ b/tests/hermes_cli/test_status_model_provider.py
@@ -88,6 +88,7 @@ def test_show_status_reports_managed_nous_features(monkeypatch, capsys, tmp_path
            features={
                "web": NousFeatureState("web", "Web tools", True, True, True, True, False, True, "firecrawl"),
                "image_gen": NousFeatureState("image_gen", "Image generation", True, True, True, True, False, True, "Nous Subscription"),
+                "video_gen": NousFeatureState("video_gen", "Video generation", False, False, False, False, False, False, ""),
                "tts": NousFeatureState("tts", "OpenAI TTS", True, True, True, True, False, True, "OpenAI TTS"),
                "browser": NousFeatureState("browser", "Browser automation", True, True, True, True, False, True, "Browser Use"),
                "modal": NousFeatureState("modal", "Modal execution", False, True, False, False, False, True, "local"),
--- a/tests/plugins/video_gen/test_fal_plugin.py
+++ b/tests/plugins/video_gen/test_fal_plugin.py
@@ -85,44 +85,72 @@ def test_fal_list_models_advertises_both_modalities():

 def test_fal_unavailable_without_key(monkeypatch):
    from plugins.video_gen.fal import FALVideoGenProvider
+    from plugins.video_gen import fal as fal_plugin

    monkeypatch.delenv("FAL_KEY", raising=False)
+    # Also ensure managed gateway is unavailable
+    monkeypatch.setattr(fal_plugin, "_resolve_managed_fal_video_gateway", lambda: None)
    assert FALVideoGenProvider().is_available() is False


 def test_fal_generate_requires_fal_key(monkeypatch):
    from plugins.video_gen.fal import FALVideoGenProvider
+    from plugins.video_gen import fal as fal_plugin

    monkeypatch.delenv("FAL_KEY", raising=False)
+    # Also ensure managed gateway is unavailable
+    monkeypatch.setattr(fal_plugin, "_resolve_managed_fal_video_gateway", lambda: None)
    result = FALVideoGenProvider().generate("a happy dog")
    assert result["success"] is False
    assert result["error_type"] == "auth_required"


+def test_fal_available_via_gateway(monkeypatch):
+    from plugins.video_gen.fal import FALVideoGenProvider
+    from plugins.video_gen import fal as fal_plugin
+
+    monkeypatch.delenv("FAL_KEY", raising=False)
+    monkeypatch.setattr(
+        fal_plugin,
+        "_resolve_managed_fal_video_gateway",
+        lambda: object(),  # truthy sentinel — gateway is available
+    )
+    assert FALVideoGenProvider().is_available() is True
+
+
 class TestFamilyRouting:
    """The headline behavior: image_url presence picks the endpoint."""

    @pytest.fixture
    def with_fake_fal(self, monkeypatch):
-        """Stub fal_client.subscribe to capture which endpoint we hit."""
+        """Stub fal_client.submit to capture which endpoint we hit."""
        import sys
        import types

        captured = {"endpoint": None, "arguments": None}

+        class FakeHandle:
+            def get(self):
+                return {"video": {"url": "https://fake/out.mp4"}}
+
        fake = types.ModuleType("fal_client")
-        def _subscribe(endpoint, arguments=None, with_logs=False):
+        def _submit(endpoint, arguments=None, headers=None):
            captured["endpoint"] = endpoint
            captured["arguments"] = arguments
-            return {"video": {"url": "https://fake/out.mp4"}}
-        fake.subscribe = _subscribe  # type: ignore
+            return FakeHandle()
+        fake.submit = _submit  # type: ignore
        monkeypatch.setitem(sys.modules, "fal_client", fake)

        # Reset the lazy global so it picks up our stub
        from plugins.video_gen import fal as fal_plugin
        fal_plugin._fal_client = None
+        # Also reset the managed client cache
+        fal_plugin._managed_fal_video_client = None
+        fal_plugin._managed_fal_video_client_config = None

        monkeypatch.setenv("FAL_KEY", "test")
+        # Force direct mode — no managed gateway
+        monkeypatch.setattr(fal_plugin, "_resolve_managed_fal_video_gateway", lambda: None)
        return captured

    def test_text_to_video_routes_to_text_endpoint(self, with_fake_fal):
@@ -229,7 +257,7 @@ class TestPayloadBuilder:
            seed=42,
        )
        assert p["prompt"] == "x"
-        assert p["duration"] == "8"  # FAL queue API uses strings
+        assert p["duration"] == "8s"  # veo3.1 uses "Ns" format per FAL API
        assert p["aspect_ratio"] == "16:9"
        assert p["resolution"] == "720p"
        assert p["generate_audio"] is True
--- a/tests/test_hermes_state.py
+++ b/tests/test_hermes_state.py
@@ -2676,6 +2676,64 @@ class TestVacuum:
        db.vacuum()


+class TestOptimizeFts:
+    def test_optimize_returns_index_count(self, db):
+        """A fresh DB has both FTS indexes; optimize merges both."""
+        db.create_session(session_id="s1", source="cli")
+        db.append_message(session_id="s1", role="user", content="hello world")
+        assert db.optimize_fts() == 2
+
+    def test_optimize_preserves_search_and_snippet(self, db):
+        """Optimize is layout-only: MATCH results + snippets are unchanged."""
+        db.create_session(session_id="s1", source="cli")
+        for i in range(50):
+            db.append_message(
+                session_id="s1",
+                role="user",
+                content=f"needle alpha bravo charlie message {i}",
+            )
+        before = db.search_messages("needle")
+        n = db.optimize_fts()
+        assert n == 2
+        after = db.search_messages("needle")
+        assert len(after) == len(before)
+        assert len(after) > 0
+        # Snippet must still be populated (would be empty/None if the FTS
+        # content shadow were lost during optimize).
+        assert all(row.get("snippet") for row in after)
+        # IDs and snippets are identical before/after — pure layout change.
+        assert [r["id"] for r in after] == [r["id"] for r in before]
+        assert [r["snippet"] for r in after] == [r["snippet"] for r in before]
+
+    def test_optimize_skips_missing_trigram_table(self, db):
+        """When the trigram index is absent, optimize handles only the porter
+        index and does not raise."""
+        db.create_session(session_id="s1", source="cli")
+        db.append_message(session_id="s1", role="user", content="hello")
+        # Drop the trigram table + triggers to simulate a disabled/absent index.
+        with db._lock:
+            for trig in (
+                "messages_fts_trigram_insert",
+                "messages_fts_trigram_delete",
+                "messages_fts_trigram_update",
+            ):
+                db._conn.execute(f"DROP TRIGGER IF EXISTS {trig}")
+            db._conn.execute("DROP TABLE IF EXISTS messages_fts_trigram")
+        assert db._fts_table_exists("messages_fts_trigram") is False
+        assert db._fts_table_exists("messages_fts") is True
+        # Only the porter index remains -> 1 optimized, no error.
+        assert db.optimize_fts() == 1
+
+    def test_optimize_idempotent(self, db):
+        """Running optimize twice is safe (second pass is a no-op merge)."""
+        db.create_session(session_id="s1", source="cli")
+        db.append_message(session_id="s1", role="user", content="repeat me")
+        assert db.optimize_fts() == 2
+        assert db.optimize_fts() == 2
+        # Search still works after repeated optimization.
+        assert len(db.search_messages("repeat")) == 1
+
+
 class TestAutoMaintenance:
    def _make_old_ended(self, db, sid: str, days_old: int = 100):
        """Create a session that is ended and was started `days_old` days ago."""
--- a/tests/tools/test_managed_media_gateways.py
+++ b/tests/tools/test_managed_media_gateways.py
@@ -305,3 +305,214 @@ def test_transcription_uses_model_specific_response_formats(monkeypatch, tmp_pat
    assert json_result["transcript"] == "hello from gpt-4o"
    assert json_capture["transcription_kwargs"]["response_format"] == "json"
    assert json_capture["close_calls"] == 1
+
+
+PLUGINS_DIR = Path(__file__).resolve().parents[2] / "plugins"
+
+
+def _load_video_gen_plugin(monkeypatch):
+    """Load the FAL video gen plugin in isolation."""
+    _install_fake_tools_package()
+
+    # The plugin imports the agent.video_gen_provider ABC, so load it first.
+    agent_dir = Path(__file__).resolve().parents[2] / "agent"
+    spec = spec_from_file_location(
+        "agent.video_gen_provider",
+        agent_dir / "video_gen_provider.py",
+    )
+    assert spec and spec.loader
+    mod = module_from_spec(spec)
+    sys.modules["agent.video_gen_provider"] = mod
+    spec.loader.exec_module(mod)
+
+    # Load the plugin
+    plugin_init = PLUGINS_DIR / "video_gen" / "fal" / "__init__.py"
+    spec = spec_from_file_location("plugins.video_gen.fal", plugin_init)
+    assert spec and spec.loader
+    plugin_mod = module_from_spec(spec)
+    sys.modules["plugins.video_gen.fal"] = plugin_mod
+    spec.loader.exec_module(plugin_mod)
+    return plugin_mod
+
+
+def test_video_gen_managed_fal_submit_uses_gateway(monkeypatch):
+    """Video gen routes through the managed gateway when FAL_KEY is absent."""
+    captured = {}
+    _install_fake_fal_client(captured)
+    monkeypatch.delenv("FAL_KEY", raising=False)
+    monkeypatch.setenv("FAL_QUEUE_GATEWAY_URL", "http://127.0.0.1:3009")
+    monkeypatch.setenv("TOOL_GATEWAY_USER_TOKEN", "nous-video-token")
+
+    plugin = _load_video_gen_plugin(monkeypatch)
+
+    # Patch uuid for deterministic idempotency key
+    monkeypatch.setattr(plugin.uuid, "uuid4", lambda: "video-submit-456")
+
+    plugin._submit_fal_video_request(
+        "fal-ai/pixverse/v6/text-to-video",
+        {"prompt": "a cat riding a bicycle", "duration": "5"},
+    )
+
+    assert captured["submit_via"] == "managed_client"
+    assert captured["client_key"] == "nous-video-token"
+    assert captured["submit_url"] == "http://127.0.0.1:3009/fal-ai/pixverse/v6/text-to-video"
+    assert captured["method"] == "POST"
+    assert captured["arguments"] == {"prompt": "a cat riding a bicycle", "duration": "5"}
+    assert captured["headers"] == {"x-idempotency-key": "video-submit-456"}
+    assert captured["sync_client_inits"] == 1
+
+
+def test_video_gen_managed_client_reused_across_calls(monkeypatch):
+    """The managed video client is cached and reused across requests."""
+    captured = {}
+    _install_fake_fal_client(captured)
+    monkeypatch.delenv("FAL_KEY", raising=False)
+    monkeypatch.setenv("FAL_QUEUE_GATEWAY_URL", "http://127.0.0.1:3009")
+    monkeypatch.setenv("TOOL_GATEWAY_USER_TOKEN", "nous-video-token")
+
+    plugin = _load_video_gen_plugin(monkeypatch)
+
+    plugin._submit_fal_video_request("fal-ai/pixverse/v6/text-to-video", {"prompt": "first"})
+    first_client = captured["http_client"]
+    plugin._submit_fal_video_request("fal-ai/pixverse/v6/text-to-video", {"prompt": "second"})
+
+    assert captured["sync_client_inits"] == 1
+    assert captured["http_client"] is first_client
+
+
+def test_video_gen_direct_mode_when_fal_key_set(monkeypatch):
+    """When FAL_KEY is set and gateway not preferred, uses direct fal_client.submit."""
+    captured = {}
+    _install_fake_fal_client(captured)
+    monkeypatch.setenv("FAL_KEY", "direct-fal-key-123")
+    monkeypatch.delenv("FAL_QUEUE_GATEWAY_URL", raising=False)
+    monkeypatch.delenv("TOOL_GATEWAY_USER_TOKEN", raising=False)
+
+    plugin = _load_video_gen_plugin(monkeypatch)
+    monkeypatch.setattr(plugin.uuid, "uuid4", lambda: "direct-456")
+
+    # Trigger the lazy load so _fal_client is populated from our fake
+    plugin._load_fal_client()
+
+    # In direct mode the plugin calls the module-level fal_client.submit.
+    # Our fake raises AssertionError on that path (it models the managed
+    # flow), so replace submit with a recorder that captures the direct call.
+    direct_captured = {}
+
+    def direct_submit(endpoint, arguments=None, headers=None):
+        direct_captured["endpoint"] = endpoint
+        direct_captured["arguments"] = arguments
+        direct_captured["headers"] = headers
+        # Return a mock handle
+        class FakeHandle:
+            def get(self):
+                return {"video": {"url": "https://fal.media/result.mp4"}}
+        return FakeHandle()
+
+    plugin._fal_client.submit = direct_submit
+
+    plugin._submit_fal_video_request(
+        "fal-ai/pixverse/v6/text-to-video",
+        {"prompt": "test direct"},
+    )
+
+    assert direct_captured["endpoint"] == "fal-ai/pixverse/v6/text-to-video"
+    assert direct_captured["arguments"] == {"prompt": "test direct"}
+    assert direct_captured["headers"] == {"x-idempotency-key": "direct-456"}
+    # Managed client should NOT have been initialized
+    assert "submit_via" not in captured
+
+
+def test_video_gen_gateway_4xx_raises_actionable_valueerror(monkeypatch):
+    """A 4xx from the managed gateway surfaces a clear ValueError with remediation hints."""
+    captured = {}
+    _install_fake_fal_client(captured)
+    monkeypatch.delenv("FAL_KEY", raising=False)
+    monkeypatch.setenv("FAL_QUEUE_GATEWAY_URL", "http://127.0.0.1:3009")
+    monkeypatch.setenv("TOOL_GATEWAY_USER_TOKEN", "nous-video-token")
+
+    plugin = _load_video_gen_plugin(monkeypatch)
+
+    # Make _maybe_retry_request raise an exception with a 403 status
+    class FakeResponse:
+        status_code = 403
+
+    class GatewayRejectError(Exception):
+        def __init__(self):
+            super().__init__("forbidden")
+            self.response = FakeResponse()
+
+    def raising_retry(client, method, url, json=None, timeout=None, headers=None):
+        raise GatewayRejectError()
+
+    # monkeypatch reverts the stub automatically when the test ends.
+    monkeypatch.setattr(sys.modules["fal_client"].client, "_maybe_retry_request", raising_retry)
+
+    with pytest.raises(ValueError, match=r"gateway rejected endpoint.*HTTP 403"):
+        plugin._submit_fal_video_request(
+            "fal-ai/pixverse/v6/text-to-video",
+            {"prompt": "test 4xx"},
+        )
+
+
+def test_video_gen_is_available_true_via_gateway(monkeypatch):
+    """is_available() returns True when FAL_KEY is absent but managed gateway is configured."""
+    _install_fake_fal_client({})
+    monkeypatch.delenv("FAL_KEY", raising=False)
+    monkeypatch.setenv("FAL_QUEUE_GATEWAY_URL", "http://127.0.0.1:3009")
+    monkeypatch.setenv("TOOL_GATEWAY_USER_TOKEN", "nous-video-token")
+
+    plugin = _load_video_gen_plugin(monkeypatch)
+    provider = plugin.FALVideoGenProvider()
+    assert provider.is_available() is True
+
+
+def test_video_gen_prefers_gateway_overrides_direct_key(monkeypatch):
+    """When FAL_KEY is set but prefers_gateway('video_gen') is True, routes through gateway."""
+    captured = {}
+    _install_fake_fal_client(captured)
+    monkeypatch.setenv("FAL_KEY", "direct-key-present")
+    monkeypatch.setenv("FAL_QUEUE_GATEWAY_URL", "http://127.0.0.1:3009")
+    monkeypatch.setenv("TOOL_GATEWAY_USER_TOKEN", "nous-video-token")
+
+    plugin = _load_video_gen_plugin(monkeypatch)
+
+    # Patch prefers_gateway to return True for video_gen
+    tb_helpers = sys.modules["tools.tool_backend_helpers"]
+    monkeypatch.setattr(tb_helpers, "prefers_gateway", lambda section: section == "video_gen")
+
+    plugin._submit_fal_video_request(
+        "fal-ai/pixverse/v6/text-to-video",
+        {"prompt": "gateway preferred"},
+    )
+
+    assert captured["submit_via"] == "managed_client"
+    assert captured["client_key"] == "nous-video-token"
+
+
+def test_video_gen_happy_horse_uses_alibaba_namespace(monkeypatch):
+    """Verify the happy-horse family uses alibaba/ not fal-ai/ endpoints."""
+    # Only the plugin's model catalog is inspected; no fal client is needed.
+    plugin_mod = _load_video_gen_plugin(monkeypatch)
+
+    hh = plugin_mod.FAL_FAMILIES["happy-horse"]
+    assert hh["text_endpoint"] == "alibaba/happy-horse/text-to-video"
+    assert hh["image_endpoint"] == "alibaba/happy-horse/image-to-video"
--- a/tests/tools/test_video_generation_tool_surface_matrix.py
+++ b/tests/tools/test_video_generation_tool_surface_matrix.py
@@ -46,6 +46,18 @@ def matrix_env(tmp_path, monkeypatch):
        fal_calls.append({"endpoint": endpoint, "arguments": arguments})
        return {"video": {"url": f"https://fake-fal/{endpoint.replace('/','_')}.mp4"}}
    fake_fal.subscribe = _subscribe  # type: ignore
+
+    class _FalHandle:
+        def __init__(self, result):
+            self._result = result
+        def get(self):
+            return self._result
+
+    def _submit(endpoint, arguments=None, headers=None):
+        fal_calls.append({"endpoint": endpoint, "arguments": arguments})
+        return _FalHandle({"video": {"url": f"https://fake-fal/{endpoint.replace('/','_')}.mp4"}})
+    fake_fal.submit = _submit  # type: ignore
+
    monkeypatch.setitem(__import__("sys").modules, "fal_client", fake_fal)

    # httpx stub for xAI
--- a/website/docs/reference/optional-skills-catalog.md
+++ b/website/docs/reference/optional-skills-catalog.md
@@ -31,7 +31,9 @@ hermes skills uninstall <skill-name>

 | Skill | Description |
 |-------|-------------|
+| [**antigravity-cli**](/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-antigravity-cli) | Operate the Antigravity CLI (agy): plugins, auth, sandbox. |
 | [**blackbox**](/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-blackbox) | Delegate coding tasks to Blackbox AI CLI agent. Multi-model agent with built-in judge that runs tasks through multiple LLMs and picks the best result. Requires the blackbox CLI and a Blackbox AI API key. |
+| [**grok**](/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-grok) | Delegate coding to xAI Grok Build CLI (features, PRs). |
 | [**honcho**](/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-honcho) | Configure and use Honcho memory with Hermes -- cross-session user modeling, multi-profile peer isolation, observation config, dialectic reasoning, session summaries, and context budget enforcement. Use when setting up Honcho, troubleshoo... |
 | [**openhands**](/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-openhands) | Delegate coding to OpenHands CLI (model-agnostic, LiteLLM). |

--- a/website/docs/user-guide/features/credential-pools.md
+++ b/website/docs/user-guide/features/credential-pools.md
@@ -22,8 +22,11 @@ Your request
  → Pick key from pool (round_robin / least_used / fill_first / random)
  → Send to provider
  → 429 rate limit?
-      → Retry same key once (transient blip)
-      → Second 429 → rotate to next pool key
+      → Plan/usage limit reached (e.g. ChatGPT/Codex "usage limit reached")?
+          → Rotate to next pool key immediately (no retry — the cap won't clear on retry)
+      → Generic / transient 429?
+          → Retry same key once (transient blip)
+          → Second 429 → rotate to next pool key
      → All keys exhausted → fallback_model (different provider)
  → 402 billing error?
      → Immediately rotate to next pool key (24h cooldown)
--- a/website/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-antigravity-cli.md
+++ b/website/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-antigravity-cli.md
@@ -0,0 +1,195 @@
+---
+title: "Antigravity Cli — Operate the Antigravity CLI (agy): plugins, auth, sandbox"
+sidebar_label: "Antigravity Cli"
+description: "Operate the Antigravity CLI (agy): plugins, auth, sandbox"
+---
+
+{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
+
+# Antigravity Cli
+
+Operate the Antigravity CLI (agy): plugins, auth, sandbox.
+
+## Skill metadata
+
+| | |
+|---|---|
+| Source | Optional — install with `hermes skills install official/autonomous-ai-agents/antigravity-cli` |
+| Path | `optional-skills/autonomous-ai-agents/antigravity-cli` |
+| Version | `0.1.0` |
+| Author | Tony Simons (asimons81), Hermes Agent |
+| License | MIT |
+| Platforms | linux, macos, windows |
+| Tags | `Coding-Agent`, `Antigravity`, `CLI`, `Auth`, `Plugins`, `Sandbox` |
+| Related skills | [`grok`](/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-grok), [`codex`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-codex), [`claude-code`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-claude-code), [`hermes-agent`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent) |
+
+## Reference: full SKILL.md
+
+:::info
+The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
+:::
+
+# Antigravity CLI (`agy`)
+
+Operator guide for the Antigravity CLI, invoked as `agy`. Run all `agy`
+commands through the Hermes `terminal` tool; inspect its config and logs with
+`read_file`. This skill is reference + procedure — it does not wrap a network
+API, so there is nothing to authenticate from Hermes itself.
+
+## When to Use
+
+- Installing, updating, or smoke-testing the `agy` binary
+- Driving non-interactive `agy --print` / `agy -p` one-shots
+- Debugging Antigravity auth, sandbox, permissions, or plugin state
+- Reading Antigravity settings, keybindings, conversations, or logs
+
+## Mental model
+
+Antigravity has two layers — keep them distinct or the guidance will be wrong:
+
+1. **Shell wrapper commands** — `agy help`, `agy install`, `agy plugin`,
+   `agy update`, `agy changelog`. Run these through the `terminal` tool.
+2. **Interactive in-session slash commands** — `/config`, `/permissions`,
+   `/skills`, `/agents`, etc. These only exist inside a running `agy` TUI
+   session, not on the shell wrapper.
+
+`agy help` shows the shell wrapper surface, NOT the in-session slash commands.
+
+## Prerequisites
+
+- The `agy` binary on PATH. Verify through the `terminal` tool:
+  `command -v agy && agy --version`.
+- No env vars or API keys required by this skill — Antigravity manages its own
+  auth via the OS keyring / browser sign-in (see Authentication below).
+
+## How to Run
+
+Invoke every `agy` command through the `terminal` tool. Examples:
+
+```
+terminal(command="agy --version")
+terminal(command="agy help")
+terminal(command="agy plugin list")
+terminal(command="agy --print 'Summarize the repo in 3 bullets'", workdir="/path/to/project")
+```
+
+For an interactive multi-turn TUI session, launch `agy` with `pty=true` (and
+tmux for capture/monitoring), the same pattern the `codex` / `claude-code`
+skills use. For one-shot smoke tests and scripted prompts, prefer
+`agy --print` (non-interactive).
+
+To inspect Antigravity's own files, use `read_file` on the paths under Core
+paths below — do not `cat` them through the terminal.
+
+## Core paths
+
+- Binary / entrypoint: `agy`
+- App data dir: `~/.gemini/antigravity-cli/`
+- Settings file: `~/.gemini/antigravity-cli/settings.json`
+- Keybindings file: `~/.gemini/antigravity-cli/keybindings.json`
+- Logs: `~/.gemini/antigravity-cli/log/cli-*.log`
+- Conversations: `~/.gemini/antigravity-cli/conversations/`
+- Brain artifacts: `~/.gemini/antigravity-cli/brain/`
+- History: `~/.gemini/antigravity-cli/history.jsonl`
+- Plugin staging: `~/.gemini/antigravity-cli/plugins/<plugin_name>/`
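+
+To pick the newest log file programmatically (for example in a helper script
+that then hands the path to `read_file`), a minimal Python sketch follows. It
+assumes the default app data dir listed above and treats the most recently
+modified `cli-*.log` as the current one; `latest_cli_log` is a hypothetical
+helper, not part of `agy` or Hermes.
+
+```python
+from pathlib import Path
+from typing import Optional
+
+# Assumed default location, taken from the Core paths list above.
+DEFAULT_APP_DIR = Path.home() / ".gemini" / "antigravity-cli"
+
+
+def latest_cli_log(app_dir: Path = DEFAULT_APP_DIR) -> Optional[Path]:
+    """Return the most recently modified cli-*.log under <app_dir>/log, or None."""
+    log_dir = app_dir / "log"
+    if not log_dir.is_dir():
+        return None  # no logs written yet, or the app dir lives elsewhere
+    logs = list(log_dir.glob("cli-*.log"))
+    if not logs:
+        return None
+    # Choose by modification time, since the filename scheme is not documented here.
+    return max(logs, key=lambda p: p.stat().st_mtime)
+```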
+
+## Quick Reference
+
+### Wrapper commands
+- `agy changelog`
+- `agy help`
+- `agy install`
+- `agy plugin` / `agy plugins`
+- `agy update`
+
+### Useful flags
+- `--add-dir`
+- `--continue` / `-c`
+- `--conversation`
+- `--dangerously-skip-permissions`
+- `--print` / `-p`
+- `--print-timeout`
+- `--prompt`
+- `--prompt-interactive` / `-i`
+- `--sandbox`
+- `--log-file`
+- `--version`
+
+### Plugin subcommands (`agy plugin --help`)
+- `list`, `import [source]`, `install <target>`, `uninstall <name>`,
+  `enable <name>`, `disable <name>`, `validate [path]`, `link <mp> <target>`,
+  `help`
+
+### Install flags (`agy install --help`)
+- `--dir`, `--skip-aliases`, `--skip-path`
+
+### In-session slash commands
+- **Conversation control:** `/resume` (`/switch`), `/rewind` (`/undo`),
+  `/rename <name>`, `/clear`, `/fork`, `/reset`, `/new`
+- **Settings & tools:** `/config`, `/settings`, `/permissions`, `/model`,
+  `/keybindings`, `/statusline`, `/tasks`, `/skills`, `/mcp`, `/open <path>`,
+  `/usage`, `/logout`, `/agents`
+- **Prompt helpers:** `@` path autocomplete, `esc esc` clears the prompt (when
+  not streaming), `!` runs a terminal command directly, `?` opens help
+
+## Settings and permissions
+
+### Common settings keys (`settings.json`)
+- `allowNonWorkspaceAccess`
+- `colorScheme`
+- `permissions.allow`
+- `trustedWorkspaces`
+
+### Permission modes
+`request-review`, `always-proceed`, `strict`, `proceed-in-sandbox`.
+
+### Sandbox behavior
+- `enableTerminalSandbox` is a boolean in `settings.json`; default `false`.
+- Launch-time overrides (`--sandbox`, `--dangerously-skip-permissions`) can
+  supersede persistent settings for the current session.
+
+## Authentication behavior
+
+- The CLI tries the OS secure keyring first.
+- With no saved session, it falls back to browser-based Google sign-in.
+- Locally it opens the default browser; over SSH it prints an authorization URL
+  and expects the auth code pasted back.
+- `/logout` removes saved credentials.
+
+## Plugins
+
+- Plugins stage under `~/.gemini/antigravity-cli/plugins/<plugin_name>/`.
+- They can bundle skills, agents, rules, MCP servers, and hooks.
+- `agy plugin list` returning no imported plugins is a valid empty state.
+
+## Pitfalls
+
+- `agy help` shows wrapper commands, not interactive slash commands.
+- `agy --version` is the safe non-interactive version check; `agy version` is
+  interactive and can fail without a real TTY.
+- First place to look for failures: `~/.gemini/antigravity-cli/log/cli-*.log`
+  (read with `read_file`).
+- Don't confuse persistent JSON settings with launch-time overrides.
+- `~/.gemini/antigravity-cli/bin/agentapi` is a thin wrapper around `agy agentapi`.
+- On WSL, token storage is file-based, so auth issues are usually local-file /
+  session-state problems, not browser-only problems.
+- Workspace identity can depend on launch directory and the `.antigravitycli`
+  project marker.
+
+## Verification
+
+Confirm the install is real and usable, all through the `terminal` tool (read
+files with `read_file`):
+
+1. `terminal(command="command -v agy")`
+2. `terminal(command="agy --version")`
+3. `terminal(command="agy help")`
+4. `terminal(command="agy plugin list")`
+5. `read_file` on `~/.gemini/antigravity-cli/settings.json`
+6. `read_file` on the latest `~/.gemini/antigravity-cli/log/cli-*.log`
+7. If needed, `read_file` on `~/.gemini/antigravity-cli/keybindings.json`
+
+## Support files
+
+- `references/cli-docs.md` — condensed notes from the getting-started, usage,
+  and features docs.
--- a/website/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-grok.md
+++ b/website/docs/user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-grok.md
@@ -0,0 +1,319 @@
+---
+title: "Grok — Delegate coding to xAI Grok Build CLI (features, PRs)"
+sidebar_label: "Grok"
+description: "Delegate coding to xAI Grok Build CLI (features, PRs)"
+---
+
+{/* This page is auto-generated from the skill's SKILL.md by website/scripts/generate-skill-docs.py. Edit the source SKILL.md, not this page. */}
+
+# Grok
+
+Delegate coding to xAI Grok Build CLI (features, PRs).
+
+## Skill metadata
+
+| | |
+|---|---|
+| Source | Optional — install with `hermes skills install official/autonomous-ai-agents/grok` |
+| Path | `optional-skills/autonomous-ai-agents/grok` |
+| Version | `0.1.0` |
+| Author | Matt Maximo (MattMaximo), Hermes Agent |
+| License | MIT |
+| Platforms | linux, macos, windows |
+| Tags | `Coding-Agent`, `Grok`, `xAI`, `Code-Review`, `Refactoring`, `Automation` |
+| Related skills | [`codex`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-codex), [`claude-code`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-claude-code), [`hermes-agent`](/docs/user-guide/skills/bundled/autonomous-ai-agents/autonomous-ai-agents-hermes-agent) |
+
+## Reference: full SKILL.md
+
+:::info
+The following is the complete skill definition that Hermes loads when this skill is triggered. This is what the agent sees as instructions when the skill is active.
+:::
+
+# Grok Build CLI — Hermes Orchestration Guide
+
+Delegate coding tasks to [Grok Build](https://docs.x.ai/build/overview) (xAI's
+autonomous coding agent CLI, the `grok` command) via the Hermes terminal. Grok
+can read files, write code, run shell commands, spawn subagents, and manage git
+workflows. It runs three ways: an interactive TUI, **headless** (`-p`), and as
+an **ACP agent** over JSON-RPC.
+
+This skill is a sibling of the `codex` and `claude-code` skills, and the
+orchestration pattern is nearly identical: **prefer headless `-p` for
+one-shots**, and use a PTY for interactive sessions.
+
+## When to use
+
+- Building features
+- Refactoring
+- PR reviews
+- Batch issue fixing
+- Any task where you'd otherwise reach for Codex / Claude Code but want Grok
+
+## Prerequisites
+
+- **Install (preferred):** `npm install -g @xai-official/grok`
+  - The official installer `curl -fsSL https://x.ai/cli/install.sh | bash` also
+    works, but the `x.ai` host is Cloudflare-walled in some environments. The
+    npm path avoids that dependency entirely.
+- **Auth — SuperGrok / X Premium+ subscription (primary path):**
+  - Run `grok login` once → opens a browser for OAuth → token cached in
+    `~/.grok/auth.json`. This uses your **SuperGrok or X Premium+** subscription
+    (no per-token API billing).
+  - Check sign-in state by looking for `~/.grok/auth.json`, or run a cheap
+    headless smoke test: `grok --no-auto-update -p "Say ok."`
+  - In the TUI, `/logout` signs out and `/login` (or relaunching) signs back in.
+- **No git repo required** — unlike Codex, Grok runs fine outside a git
+  directory (good for scratch/throwaway tasks).
+- **Claude Code / AGENTS.md compatible with zero config** — Grok auto-reads
+  `CLAUDE.md`, `.claude/` (skills, agents, MCPs, hooks, rules), and the
+  `AGENTS.md` family. Existing project context just works.
+
+> **API-key fallback (not the default for this user):** Grok also supports
+> pay-as-you-go billing via `api.x.ai` by setting the `XAI_API_KEY`
+> environment variable. Use this only if `grok login` / SuperGrok auth is
+> unavailable; the subscription path (`grok login`) is the intended setup here.
+
+## Two Orchestration Modes
+
+### Mode 1: Headless (`-p`) — Non-Interactive (PREFERRED)
+
+Runs a one-shot task, prints the result, and exits. No PTY, no interactive
+dialogs to navigate. This is the cleanest integration path — the analog of
+`claude -p` and `codex exec`.
+
+```
+terminal(command="grok --no-auto-update -p 'Add a dark mode toggle to settings'", workdir="/path/to/project", timeout=180)
+```
+
+Always pass `--no-auto-update` in automation to skip background update checks.
+
+**When to use headless:**
+- One-shot coding tasks (fix a bug, add a feature, refactor)
+- CI/CD automation and scripting
+- Structured output parsing with `--output-format json`
+- Any task that doesn't need multi-turn conversation
+
+### Mode 2: Interactive PTY — Multi-Turn TUI Sessions
+
+The TUI is a fullscreen, mouse-interactive app. Drive it with `pty=true`. For
+robust monitoring/input use tmux (same pattern as the `claude-code` skill).
+
+```
+# Launch in a tmux session for capture-pane monitoring
+terminal(command="tmux new-session -d -s grok-work -x 140 -y 40")
+terminal(command="tmux send-keys -t grok-work 'cd /path/to/project && grok' Enter")
+
+# Wait for startup, then send a task
+terminal(command="sleep 5 && tmux send-keys -t grok-work 'Refactor the auth module to use JWT' Enter")
+
+# Monitor progress
+terminal(command="sleep 15 && tmux capture-pane -t grok-work -p -S -50")
+
+# Exit when done
+terminal(command="tmux send-keys -t grok-work '/quit' Enter && sleep 1 && tmux kill-session -t grok-work")
+```
+
+**Tip for inline TUI output:** if you want TUI-style output without the
+fullscreen alt-screen takeover (e.g. for cleaner logs), add `--no-alt-screen`.
+For pure automation, headless `-p` is still cleaner than the TUI.
+
+## Headless Deep Dive
+
+### Common Flags
+
+| Flag | Effect |
+|------|--------|
+| `-p, --single <PROMPT>` | Send one prompt, run headless, exit |
+| `-m, --model <MODEL>` | Choose a model |
+| `-s, --session-id <ID>` | Create or resume a named headless session |
+| `-r, --resume <ID>` | Resume an existing session |
+| `-c, --continue` | Continue the most recent session in the current directory |
+| `--cwd <PATH>` | Set the working directory |
+| `--output-format <FMT>` | `plain` (default), `json`, or `streaming-json` |
+| `--always-approve` | Auto-approve all tool executions (the `--full-auto` / `--yolo` equivalent) |
+| `--no-alt-screen` | Run inline, no fullscreen TUI takeover |
+| `--no-auto-update` | Skip background update checks (use in all automation) |
+
+### Output Formats
+
+- `plain` — human-readable text (default)
+- `json` — one JSON object at the end of the run (parse the result cleanly)
+- `streaming-json` — newline-delimited JSON events as they arrive
+
+```
+# Structured result for parsing
+terminal(command="grok --no-auto-update -p 'List all TODO comments in src/' --output-format json", workdir="/project", timeout=120)
+
+# Auto-approve for autonomous building
+terminal(command="grok --no-auto-update --always-approve -p 'Refactor the database layer and run the tests'", workdir="/project", timeout=300)
+```
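+
+With `streaming-json`, each line of stdout is one JSON event, so a consumer can
+decode it line by line. The Python sketch below assumes only that framing: the
+event fields are not documented here, so events stay as plain decoded values.
+`iter_grok_events` is a hypothetical helper name.
+
+```python
+import json
+from typing import Any, Iterable, Iterator
+
+
+def iter_grok_events(lines: Iterable[str]) -> Iterator[Any]:
+    """Decode newline-delimited JSON events, skipping blank lines."""
+    for line in lines:
+        line = line.strip()
+        if line:
+            # Raises json.JSONDecodeError if a line is not valid JSON.
+            yield json.loads(line)
+```
+
+In practice the lines would come from Grok's stdout, for example `proc.stdout`
+of a `subprocess.Popen(..., stdout=subprocess.PIPE, text=True)` running a
+`grok --no-auto-update -p '...' --output-format streaming-json` command.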
+
+### Background Mode (Long Tasks)
+
+```
+# Start headless in background
+terminal(command="grok --no-auto-update --always-approve -p 'Refactor the auth module'", workdir="/project", background=true, notify_on_complete=true)
+# Returns session_id
+
+# Monitor
+process(action="poll", session_id="<id>")
+process(action="log", session_id="<id>")
+
+# Kill if needed
+process(action="kill", session_id="<id>")
+```
+
+For an interactive (TUI) background session, use `pty=true` + tmux and monitor
+with `tmux capture-pane`, exactly like the `claude-code` / `codex` skills.
+
+### Session Continuation
+
+```
+# Start a named session
+terminal(command="grok --no-auto-update -s refactor-db -p 'Start refactoring the database layer' --always-approve", workdir="/project", timeout=240)
+
+# Resume it later
+terminal(command="grok --no-auto-update -r refactor-db -p 'Now add connection pooling' --always-approve", workdir="/project", timeout=180)
+
+# Or continue the most recent session in this directory
+terminal(command="grok --no-auto-update -c -p 'What did you change last time?'", workdir="/project", timeout=60)
+```
+
+## Read-Only Audit → Markdown Note Pattern
+
+To have Grok review local artifacts and return a clean markdown note (for
+Obsidian or a repo) without mutating anything:
+
+1. Prepare stable input files first with Hermes tools (`read_file`,
+   `write_file`). Snapshot only the relevant context into a temp file rather
+   than dumping raw paths.
+2. Run Grok headless **without** `--always-approve` so it cannot auto-write, and
+   demand `markdown only, no preamble`.
+3. Save Grok's stdout straight into the destination note with `write_file()`.
+
+```
+grok --no-auto-update -p "Read /tmp/current.md and /tmp/inventory.md. Produce markdown only, no preamble. Output a clean note titled 'Cleanup Review'." --output-format plain
+```
+
+**Pitfall (same as Claude Code):** for document rewrites, a loose "rewrite this"
+prompt may return a change summary instead of the full file. Instead: pipe the
+file in, and demand `Return ONLY the full revised markdown document. No intro,
+no explanation, no code fences. Start immediately with '# Title'.` Verify the
+first lines with `read_file()` before overwriting the destination.
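+
+The "verify before overwriting" check can be approximated with a heuristic like
+the Python sketch below. It assumes the prompt demanded output that starts with
+a `# ` heading and that a change summary is much shorter than the original;
+`looks_like_full_document` and the `0.5` ratio are illustrative assumptions,
+not part of Grok or Hermes.
+
+```python
+def looks_like_full_document(output: str, original: str, min_ratio: float = 0.5) -> bool:
+    """Heuristic: True if `output` plausibly is the full revised document."""
+    text = output.lstrip()
+    # The prompt required starting immediately with a level-1 markdown heading,
+    # so a preamble or a wrapping code fence fails this check.
+    if not text.startswith("# "):
+        return False
+    # A summary of changes is typically far shorter than the document it describes.
+    return len(text) >= min_ratio * len(original)
+```
+
+Only write Grok's output to the destination when this returns `True`; otherwise
+re-prompt or fall back to a manual review.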
+
+## PR Review Patterns
+
+### Quick Review (Headless)
+
+```
+terminal(command="cd /path/to/repo && git diff main...feature-branch | grok --no-auto-update -p 'Review this diff for bugs, security issues, and style problems. Be thorough.'", timeout=120)
+```
+
+### Clone-to-temp Review (safe, no repo mutation)
+
+```
+terminal(command="REVIEW=$(mktemp -d) && git clone https://github.com/user/repo.git $REVIEW && cd $REVIEW && gh pr checkout 42 && grok --no-auto-update -p 'Review the changes vs origin/main. Check bugs, security, race conditions, missing tests.'", pty=true, timeout=300)
+```
+
+### Post the review
+
+```
+terminal(command="gh pr comment 42 --body '<review text>'", workdir="/path/to/repo")
+```
+
+## Parallel Issue Fixing with Worktrees
+
+```
+# Create worktrees
+terminal(command="git worktree add -b fix/issue-78 /tmp/issue-78 main", workdir="~/project")
+terminal(command="git worktree add -b fix/issue-99 /tmp/issue-99 main", workdir="~/project")
+
+# Launch Grok headless in each (background)
+terminal(command="grok --no-auto-update --always-approve -p 'Fix issue #78: <description>. Commit when done.'", workdir="/tmp/issue-78", background=true, notify_on_complete=true)
+terminal(command="grok --no-auto-update --always-approve -p 'Fix issue #99: <description>. Commit when done.'", workdir="/tmp/issue-99", background=true, notify_on_complete=true)
+
+# Monitor
+process(action="list")
+
+# After completion: push and open PRs
+terminal(command="cd /tmp/issue-78 && git push -u origin fix/issue-78")
+terminal(command="gh pr create --repo user/repo --head fix/issue-78 --title 'fix: ...' --body '...'")
+
+# Cleanup
+terminal(command="git worktree remove /tmp/issue-78", workdir="~/project")
+terminal(command="git worktree remove /tmp/issue-99", workdir="~/project")
+```
+
+## Useful Subcommands & TUI Commands
+
+| Command | Purpose |
+|---------|---------|
+| `grok` | Start the interactive TUI |
+| `grok -p "query"` | Headless one-shot |
+| `grok login` / `grok logout` | Sign in / out (SuperGrok / X Premium+ OAuth) |
+| `grok inspect` | Show what Grok discovered in cwd: config sources, instructions, skills, plugins, hooks, MCP servers |
+| `grok agent stdio` | Run as an ACP agent over JSON-RPC (for IDE/tool integration) |
+| `grok update` | Update the CLI (needs the `x.ai` host; skip in automation) |
+
+TUI slash commands (interactive only): `/model <name>`, `/always-approve`,
+`/plan`, `/context`, `/compact`, `/resume`, `/sessions`, `/fork`, `/usage`,
+`/quit`. `Shift+Tab` cycles session modes (including Plan mode, which blocks
+write tools except the session plan file).
+
+## Config (`~/.grok/config.toml`)
+
+```toml
+[cli]
+auto_update = false          # skip background update checks persistently
+
+[ui]
+permission_mode = "ask"      # or "always-approve" to skip tool prompts by default
+
+[models]
+default = "grok-build-0.1"
+```
+
+Put global preferences in `~/.grok/config.toml` (not project-scoped
+`.grok/config.toml`). `permission_mode` supersedes the legacy `approval_mode` /
+`yolo = true` keys.
+
+## Pitfalls & Gotchas
+
+1. **Auth is subscription-gated.** `grok login` requires a SuperGrok or X
+   Premium+ subscription. If login fails or there's no `~/.grok/auth.json`,
+   confirm the subscription is active before falling back to `XAI_API_KEY`.
+2. **Don't conflate Hermes' xAI auth with the `grok` CLI's auth.** Hermes'
+   `x_search` runs on its own xAI OAuth; the standalone `grok` CLI has a
+   separate token in `~/.grok/auth.json`. A working `x_search` does NOT mean
+   `grok` is logged in.
+3. **Always pass `--no-auto-update` in automation** — otherwise Grok phones home
+   for update checks (and `x.ai`/`storage.googleapis.com` may be unreachable).
+4. **Prefer npm install over the curl installer** — `npm install -g
+   @xai-official/grok` avoids the Cloudflare-walled `x.ai` host.
+5. **`--always-approve` is the autonomous-build switch.** Without it, headless
+   runs may stall waiting on tool-approval prompts. Omit it deliberately for
+   read-only review/audit work so Grok can't mutate files.
+6. **Headless `-p` skips TUI dialogs**; the TUI needs `pty=true` (+ tmux for
+   monitoring), just like Claude Code.
+7. **Use `--no-alt-screen`** if you run the TUI inline and the fullscreen
+   alt-screen takeover garbles captured output.
+8. **No git repo needed**, but PR/commit workflows still want one. For scratch
+   commit tasks, use `cd "$(mktemp -d)" && git init` (a bare `mktemp -d && git init`
+   initializes the current directory, not the new temp directory).
+9. **Clean up tmux sessions** with `tmux kill-session -t <name>` when done.
+
+## Rules for Hermes Agents
+
+1. **Prefer headless `-p`** for single tasks — cleanest integration, structured
+   output via `--output-format json`.
+2. **Always set `workdir`** (or `--cwd`) so Grok targets the right project.
+3. **Pass `--no-auto-update`** in every automated invocation.
+4. **Use `--always-approve` only when Grok should write autonomously**; omit it
+   for read-only reviews and audits.
+5. **Background long tasks** with `background=true, notify_on_complete=true` and
+   monitor via the `process` tool.
+6. **Use tmux for multi-turn interactive work** and monitor with
+   `tmux capture-pane -t <session> -p -S -50`.
+7. **Verify auth before relying on it** — check `~/.grok/auth.json` or run a
+   cheap `grok -p "Say ok."` smoke test; don't assume Hermes' xAI auth carries
+   over.
+8. **Report results to the user** — summarize what Grok changed and what's left.
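The Rules for Hermes Agents above can be condensed into a small argv builder plus an auth pre-check. This is a minimal Python sketch, not part of Hermes or the `grok` CLI: `build_grok_argv` and `grok_auth_present` are hypothetical names, the flags are the ones this skill documents, and passing the directory to `--cwd` as a separate argument is an assumption. It only assembles the command and checks for `~/.grok/auth.json`; it never launches Grok.

```python
from __future__ import annotations

from pathlib import Path


def build_grok_argv(prompt: str, workdir: str, *, write: bool = False) -> list[str]:
    """Assemble a headless `grok` command line following the rules above.

    Hypothetical helper: the flags come from this skill; `--cwd <dir>` taking a
    separate path argument is an assumption. Nothing is executed here.
    """
    argv = [
        "grok",
        "-p", prompt,                # headless one-shot (rule 1)
        "--cwd", workdir,            # target the right project (rule 2)
        "--no-auto-update",          # no update checks in automation (rule 3)
        "--output-format", "json",   # structured output (rule 1)
    ]
    if write:
        # Autonomous writes only (rule 4); reviews and audits omit it.
        argv.append("--always-approve")
    return argv


def grok_auth_present(home: Path | None = None) -> bool:
    """Cheap pre-check for the standalone CLI's token file (rule 7).

    A present file is a hint, not proof the token works; the
    `grok -p "Say ok."` smoke test is the stronger check.
    """
    base = Path.home() if home is None else home
    return (base / ".grok" / "auth.json").is_file()


review = build_grok_argv("Review the staged diff for bugs", "/path/to/project")
print(" ".join(review))
```

The list can be handed to whatever runs the process (for example `subprocess.run(argv, capture_output=True, text=True)`, or Hermes' terminal tool with `workdir` set). This skill does not describe the JSON output's schema, so parse it defensively.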
--- a/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/user-guide/features/credential-pools.md
+++ b/website/i18n/zh-Hans/docusaurus-plugin-content-docs/current/user-guide/features/credential-pools.md
@@ -18,8 +18,11 @@ Your request
  → Pick key from pool (round_robin / least_used / fill_first / random)
  → Send to provider
  → 429 rate limit?
-      → Retry same key once (transient blip)
-      → Second 429 → rotate to next pool key
+      → Plan/usage limit reached (e.g. ChatGPT/Codex "usage limit reached")?
+          → Rotate to next pool key immediately (no retry — the cap won't clear on retry)
+      → Generic / transient 429?
+          → Retry same key once (transient blip)
+          → Second 429 → rotate to next pool key
      → All keys exhausted → fallback_model (different provider)
  → 402 billing error?
      → Immediately rotate to next pool key (24h cooldown)
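The revised flowchart can be read as a small decision function. Below is a minimal Python sketch under stated assumptions: `next_action`, `Action`, and the string kinds are hypothetical names, and `usage_limit_reached` is taken from commit 2159d2a729 as the reason a plan-limit 429 carries. It covers only the 429 and 402 branches shown in this hunk; falling back to `fallback_model` once every key is exhausted is left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str                          # "retry_same", "rotate", or "unhandled"
    cooldown_hours: int | None = None  # set only when the key should cool down


def next_action(status: int, reason: str | None, retried_same_key: bool) -> Action:
    """Decide what to do after a failed request, per the rotation flowchart.

    Hypothetical helper that mirrors the documented rules, not Hermes' code.
    """
    if status == 429:
        if reason == "usage_limit_reached":
            # Plan/usage cap: a retry on the same key cannot clear it, rotate now.
            return Action("rotate")
        if not retried_same_key:
            # Generic 429: treat it as a transient blip and retry once.
            return Action("retry_same")
        # Second 429 on the same key: rotate to the next pool key.
        return Action("rotate")
    if status == 402:
        # Billing error: rotate immediately and park this key for 24h.
        return Action("rotate", cooldown_hours=24)
    # Other statuses are outside the part of the flowchart shown here.
    return Action("unhandled")
```

Keeping the usage-limit check ahead of the retry check is the whole point of the docs change: a plan cap must never consume the one "transient blip" retry.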
--- a/website/sidebars.ts
+++ b/website/sidebars.ts
@@ -389,7 +389,9 @@ const sidebars: SidebarsConfig = {
                  key: 'skills-optional-autonomous-ai-agents',
                  collapsed: true,
                  items: [
+                    'user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-antigravity-cli',
                    'user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-blackbox',
+                    'user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-grok',
                    'user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-honcho',
                    'user-guide/skills/optional/autonomous-ai-agents/autonomous-ai-agents-openhands',
                  ],
--- a/website/static/api/model-catalog.json
+++ b/website/static/api/model-catalog.json
@@ -1,6 +1,6 @@
 {
  "version": 1,
-  "updated_at": "2026-05-29T06:55:44Z",
+  "updated_at": "2026-05-29T11:20:16Z",
  "metadata": {
    "source": "hermes-agent repo",
    "docs": "https://hermes-agent.nousresearch.com/docs/reference/model-catalog"
@@ -81,7 +81,7 @@
          "description": ""
        },
        {
-          "id": "google/gemini-3-flash-preview",
+          "id": "google/gemini-3.5-flash",
          "description": ""
        },
        {
@@ -198,7 +198,7 @@
          "id": "google/gemini-3-pro-preview"
        },
        {
-          "id": "google/gemini-3-flash-preview"
+          "id": "google/gemini-3.5-flash"
        },
        {
          "id": "google/gemini-3.1-pro-preview"
Author	SHA1	Message	Date
teknium1	daa0a938e4	fix(agent): route structured-reasoning empties to prefill, not nudge Post-tool empty-response nudge fired before the prefill branch for thinking models that emit reasoning via structured API fields (OpenRouter reasoning / reasoning_details, e.g. qwen3-vl-8b-thinking). The nudge guard only checked _has_inline_thinking (<think> tags in content), so every tool-using turn on these models hit the nudge path — one wasted LLM round-trip (~3-5s, ~400 tokens) and a spurious warning, before self-recovering. Hoist the _has_structured computation above the nudge guard and widen the guard from 'not _has_inline_thinking' to 'not _has_structured'. Nudge and prefill are now disjoint on _has_structured; the empty-retry branch's existing _prefill_exhausted guard already handles always-reasoning models falling through after prefill. Closes #34655. Reported by @sawtdakhili.	2026-05-29 12:23:21 -07:00
kshitij	7379f17556	fix(gateway): only fire planned-stop watcher for self-targeting markers + fix Windows consume (#34749) * fix(gateway): only fire planned-stop watcher for markers targeting self Salvaged from #34599 — rebased onto current main. The planned-stop watcher now only fires shutdown for a marker that targets the current process, instead of any marker that exists on disk. Fixes the Windows crash loop (#34597) where a stale marker from a previous Gateway instance kills a freshly booted Gateway ~400ms after start with a false "Received UNKNOWN — initiating shutdown". Co-authored-by: Bartok9 <danielrpike9@gmail.com> * fix(gateway): match planned-stop/takeover markers by PID alone when start_time is unavailable Follow-up to the #34599 salvage. The watcher's non-destructive probe (planned_stop_marker_targets_self) already falls back to PID equality when a process start_time is unavailable, but the authoritative consume it gates (_consume_pid_marker_for_self) still required a non-None start_time match. _get_process_start_time reads /proc/<pid>/stat and returns None on macOS and native Windows — the only platform the planned-stop watcher exists for. So on Windows the probe would fire the shutdown handler (PID matches) but the handler's consume_planned_stop_marker_for_self() would return False, and a legitimate 'hermes gateway stop' was still misclassified as an unexpected UNKNOWN exit (exit 1) and revived by the service manager — a residual half of the #34597 crash loop on the legitimate-stop path. Align the consume with the probe: when both start_times are known they must match (PID-reuse guard preserved on Linux); when either is unavailable, fall back to PID equality alone, bounded by the existing short marker TTL. This also fixes the parallel --replace takeover consume on Windows, which shares the same helper. Adds regression tests for the Windows (None start_time) path, the foreign-PID rejection under that fallback, and confirmation the start_time-mismatch guard still rejects when both are known. --------- Co-authored-by: Bartok9 <danielrpike9@gmail.com>	2026-05-29 17:36:58 +00:00
alt-glitch	0563ab0652	fix(test): add fal_client.submit stub to surface matrix test The plugin switched from fal_client.subscribe() to submit()+handle.get(). The test mock only had subscribe, causing CI failures.	2026-05-29 22:26:24 +05:30
alt-glitch	e46e4bcf47	fix(video_gen): parse duration suffix in success_response int(payload["duration"]) blows up on "4s" (veo3.1 format). Strip non-digit chars before int conversion in the response builder.	2026-05-29 22:26:24 +05:30
alt-glitch	3183b2e28c	fix(video_gen): veo3.1 duration format and 4k resolution FAL veo3.1 API expects duration as "4s"/"6s"/"8s" (with unit suffix), not bare "4"/"6"/"8" like other families. Add per-family duration_suffix field and apply it in _build_payload. Also add "4k" to veo3.1 resolutions per FAL API docs. Note: the managed gateway currently rejects the "4s" format (expects integer duration). Gateway-side fix needed for veo3.1 to work through the Nous subscription path.	2026-05-29 22:26:24 +05:30
alt-glitch	a4c18f65d4	feat(video_gen): wire Nous subscription override into hermes tools UX Add the same managed-gateway UX that image_gen already has: - TOOL_CATEGORIES['video_gen'] gets a 'Nous Subscription' provider row with managed_nous_feature='video_gen' + video_gen_plugin_name='fal' - NousSubscriptionFeatures gains a video_gen property + feature state computation (managed/active/available using the fal-queue gateway) - _GATEWAY_TOOL_LABELS, _GATEWAY_DIRECT_LABELS, _ALL_GATEWAY_KEYS, _get_gateway_direct_credentials, opted_in all include video_gen - apply_nous_managed_defaults and apply_gateway_defaults handle video_gen - _is_toolset_satisfied checks Nous features for video_gen - _is_provider_active detects managed video_gen (use_gateway + fal provider) - _select_plugin_video_gen_provider accepts use_gateway kwarg, propagated from all 4 call sites in _configure_provider when managed_feature is set - hermes setup status shows 'Video Generation (FAL via Nous subscription)' Users on a Nous subscription can now pick 'Nous Subscription' under hermes tools → Video Generation, which sets video_gen.provider=fal + video_gen.use_gateway=true. The FAL plugin's _resolve_managed_fal_video_gateway then routes through the managed queue gateway — no FAL_KEY needed.	2026-05-29 22:26:24 +05:30
alt-glitch	b6294ea9f1	test(video_gen): cover gateway decision matrix gaps and 4xx error path - Add test for 4xx ValueError with actionable remediation message - Add test for is_available() returning True via managed gateway - Add test for prefers_gateway overriding direct FAL_KEY - Add test for is_available() via gateway in plugin test file	2026-05-29 22:26:24 +05:30
alt-glitch	d04b3c193e	feat(video_gen): route FAL video gen through managed Nous gateway Wire plugins/video_gen/fal/__init__.py to use the same _ManagedFalSyncClient pattern that image gen already uses. Changes: - Add managed gateway resolution, client caching, and _submit_fal_video_request() that routes between direct FAL_KEY and Nous gateway modes - Update is_available() to return True when either FAL_KEY or the managed gateway is reachable - Update generate() to use submit+get handle pattern instead of fal_client.subscribe() directly - Fix happy-horse endpoint namespace: fal-ai/ → alibaba/ (matches the tool-gateway allowlist from fal-video-gen branch) - Surface actionable error on 4xx gateway rejections Tests: - 4 new tests in test_managed_media_gateways.py (gateway routing, client reuse, direct mode fallback, alibaba namespace) - Updated existing test_fal_plugin.py fixture to use submit/handle pattern and patch _resolve_managed_fal_video_gateway for isolation	2026-05-29 22:26:24 +05:30
kshitijk4poor	5cd0673217	ci: harden supply-chain gate jobs against changes-job failure The scan-gate / dep-bounds-gate jobs use needs.changes; if the changes job itself fails, its dependents would be skipped via a failed dependency (not a conditional skip), leaving the required check unreported — the same "pending forever" failure this PR fixes. Add always() and switch the gate condition from == 'false' to != 'true' so the gate still fires (and reports SUCCESS) when changes fails and its output is empty.	2026-05-29 09:17:01 -07:00
ethernet	6bc309baf2	ci: ensure required checks always report status Remove paths filters from contributor-check and supply-chain-audit workflows. When no matching files changed, the workflows never ran and the required checks (check-attribution, supply chain scan, dep bounds) stayed "pending" forever, blocking merge. Now both workflows always trigger. A path-check step/job determines whether the real work should run; gate jobs with matching names report success when the real job was skipped, so branch protection always gets a check status. Also fixes dep-bounds: the old condition if: contains(github.event.pull_request.changed_files_url, 'pyproject.toml') || true was always true (the || true made it unconditional). Now uses the proper changes.deps output from the shared filter job.	2026-05-29 09:17:01 -07:00
ethernet	6928692cec	Merge pull request #33773 from dvir-pashut/fix/nix-full-drop-stale-vercel-group fix(nix): drop stale "vercel" group from #full variant	2026-05-29 11:16:25 -04:00
teknium1	75cd420b3b	docs(skills): move antigravity-cli to autonomous-ai-agents in catalog + sidebar	2026-05-29 05:21:48 -07:00
teknium1	78d7fa1b5c	refactor(skills/antigravity-cli): move to autonomous-ai-agents (it's an AI agent CLI)	2026-05-29 05:21:48 -07:00
teknium1	904c0b479b	refactor(state): return FTS index count from vacuum() Have vacuum() return optimize_fts()'s count so the CLI 'sessions optimize' summary uses the real merged-index count instead of probing the private _FTS_TABLES / _fts_table_exists() members.	2026-05-29 05:09:56 -07:00
kshitijk4poor	38695254f8	perf(state): merge FTS5 segments on VACUUM + add 'hermes sessions optimize' The FTS5 indexes (messages_fts, messages_fts_trigram) grow as a series of incremental b-tree segments — one per trigger-driven insert batch. SQLite's automerge caps at ~16 segments, so a long-lived store keeps scanning many segments per MATCH and never collapses them unless the special 'optimize' command runs. Nothing in the codebase ever ran it: vacuum() only fired after a prune that deleted rows, and even then never merged FTS segments. Changes: - SessionDB.optimize_fts(): merges each FTS5 index to a single segment, probing for the (optional/lazy) trigram table first so it is safe to call unconditionally. Layout-only — search results and snippet() are unchanged. - vacuum() now calls optimize_fts() before VACUUM so freed index pages are returned to the OS in the same pass. - 'hermes sessions optimize' CLI subcommand for on-demand reclamation + segment compaction (previously there was no way to compact the store without a prune deleting rows), with before/after size reporting. Benchmark (8000 msgs, fragmented to 8 segments/index): - segments 8 -> 1 on both indexes - porter MATCH 5.5x faster (0.449 -> 0.081 ms/q) - trigram MATCH 3.0x faster (0.632 -> 0.207 ms/q) - 8000 matches before == 8000 after, identical row ids (no functional change) Orthogonal to the structural FTS-size PRs (#20239 external-content, #27770 optional trigram) — segment merge helps regardless of those. Tests: TestOptimizeFts covers index count, search+snippet preservation, missing-trigram path, and idempotency. Full test_hermes_state.py green (227).	2026-05-29 05:09:56 -07:00
Teknium	2159d2a729	docs(credential-pools): document immediate rotation on usage-limit 429 (#34580) The rotation flowchart only described the generic 'retry once, rotate on second 429' path. ChatGPT/Codex plan-limit 429s carry a usage_limit_reached reason and rotate to the next pool key immediately (no retry, since the cap won't clear on retry). Document that case so the docs match the code.	2026-05-29 04:50:14 -07:00
teknium1	0dba60f73b	docs(skills): regen catalog + sidebar for optional antigravity-cli skill	2026-05-29 04:49:42 -07:00
teknium1	632a7088a3	chore(skills/antigravity-cli): make optional, frame through Hermes tools, tighten frontmatter	2026-05-29 04:49:42 -07:00
Tony Simons	1bba5f27ab	feat(skills): add antigravity-cli operator skill	2026-05-29 04:49:42 -07:00
teknium1	d6f2bdabda	docs(skills): regen catalog + sidebar for optional grok skill	2026-05-29 04:49:38 -07:00
teknium1	99ddba94ed	chore(skills/grok): make optional + tighten SKILL.md to modern format	2026-05-29 04:49:38 -07:00
Matt Maximo	10cd4138cc	feat(skills): add grok skill for xAI Grok Build CLI Adds a `grok` skill under `skills/autonomous-ai-agents/`, a third coding-agent orchestration guide alongside `codex` and `claude-code`. It teaches Hermes to delegate coding tasks to Grok Build (xAI's `grok` CLI). - Headless `-p` one-shots (preferred) - Interactive TUI via pty + tmux - Session resume, background tasks, structured JSON output - PR review and parallel worktree patterns - Auth via SuperGrok / X Premium+ (`grok login`) - Full pitfalls and config notes	2026-05-29 04:49:38 -07:00
Teknium	5e7c2ffa9f	chore(models): gemini-3.5-flash replaces gemini-3-flash-preview in OpenRouter + Nous lists (#34581) * chore(models): swap gemini-3-flash-preview for gemini-3.5-flash in OpenRouter + Nous lists * chore(models): regenerate model-catalog.json for gemini-3.5-flash swap	2026-05-29 04:27:58 -07:00
dvir pashut	66265a0571	fix(nix): drop stale "vercel" group from #full variant The `vercel` optional-dependency was removed from pyproject.toml in #33067, but `nix/packages.nix` (added a few hours later in #33108) still references `"vercel"` in the `#full` variant's `extraDependencyGroups`. uv2nix fails evaluation with: error: Extra/group name 'vercel' does not match either extra or dependency group Because `nix/devShell.nix` does `inputsFrom = builtins.attrValues self'.packages`, the broken `#full` derivation is pulled into the dev shell too, so `nix develop` / direnv breaks on a fresh clone — not just `nix build .#full`.	2026-05-28 11:52:31 +03:00
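The consume fix in 7379f17556 reduces to one matching rule: reject stale or foreign markers, require equal start times when both are known, and fall back to PID equality when either is missing. A minimal Python sketch, where `marker_targets_self` and `read_start_time` are hypothetical stand-ins for `_consume_pid_marker_for_self` and `_get_process_start_time`. Reading field 22 (`starttime`) of `/proc/<pid>/stat` and applying the TTL inside the check are assumptions; the commit only says the helper reads that file and that the fallback is bounded by the marker TTL.

```python
from __future__ import annotations


def read_start_time(pid: int) -> int | None:
    """Return the process start time (clock ticks since boot), or None.

    Sketch assumption: uses field 22 of /proc/<pid>/stat, so it only works
    on Linux; on macOS and native Windows the file is missing -> None.
    """
    try:
        with open(f"/proc/{pid}/stat", encoding="utf-8") as fh:
            data = fh.read()
    except OSError:
        return None
    # comm (field 2) may contain spaces or ')', so split after the last ')'.
    fields = data.rpartition(")")[2].split()
    try:
        return int(fields[19])  # fields[0] is field 3 (state), so field 22 is index 19
    except (IndexError, ValueError):
        return None


def marker_targets_self(
    marker_pid: int,
    marker_start: int | None,
    my_pid: int,
    my_start: int | None,
    marker_age_s: float,
    ttl_s: float,
) -> bool:
    """Hypothetical matching rule for planned-stop / takeover markers."""
    if marker_age_s > ttl_s:
        return False  # stale marker, e.g. left by an earlier Gateway instance
    if marker_pid != my_pid:
        return False  # marker was written for a different process
    if marker_start is not None and my_start is not None:
        return marker_start == my_start  # PID-reuse guard when both are known
    return True  # start_time unavailable: PID equality alone, bounded by the TTL
```

Because `read_start_time` returns `None` wherever `/proc` is absent, the same rule keeps the PID-reuse guard on Linux while still letting a legitimate stop match on macOS and Windows.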
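Commits 3183b2e28c and e46e4bcf47 pair a per-family `duration_suffix` on the request side with digit-stripping on the response side. A minimal Python sketch of both halves, assuming a hypothetical `DURATION_SUFFIX` table keyed by family name; only the veo3.1 `"s"` suffix comes from the commit messages.

```python
from __future__ import annotations

import re

# Hypothetical per-family table: families not listed send a bare integer.
DURATION_SUFFIX = {"veo3.1": "s"}


def format_duration(family: str, seconds: int) -> str:
    """Render a duration for the request payload: veo3.1 wants "4s", others "4"."""
    return f"{seconds}{DURATION_SUFFIX.get(family, '')}"


def parse_duration(value: str | int) -> int:
    """Parse a duration from a response payload, tolerating a unit suffix."""
    digits = re.sub(r"\D", "", str(value))  # "4s" -> "4", 6 -> "6"
    if not digits:
        raise ValueError(f"no digits in duration {value!r}")
    return int(digits)
```

Stripping every non-digit character is only safe for whole-second values: `"4.5s"` would parse as `45`. The sketch keeps the commit's approach rather than guessing at fractional durations.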
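Commits 38695254f8 and 904c0b479b rely on SQLite's FTS5 `'optimize'` special command, issued as `INSERT INTO <table>(<table>) VALUES('optimize')`, which merges an index's segments into one without changing query results. A minimal Python sketch, assuming the `sqlite3` module was built with FTS5 (as most distributions are); `optimize_fts` and `vacuum` here are hypothetical standalone simplifications of the `SessionDB` methods the commits name, taking a connection and an explicit list of table names.

```python
from __future__ import annotations

import sqlite3


def optimize_fts(conn: sqlite3.Connection, tables: list[str]) -> int:
    """Merge each existing FTS5 index into a single segment.

    Missing tables (e.g. an optional trigram index) are skipped, so the call
    is safe unconditionally. Returns how many indexes were optimized.
    Table names are interpolated, so they must come from trusted code.
    """
    count = 0
    for name in tables:
        exists = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?",
            (name,),
        ).fetchone()
        if exists is None:
            continue
        # FTS5 special command: layout-only, MATCH results are unchanged.
        conn.execute(f"INSERT INTO {name}({name}) VALUES('optimize')")
        count += 1
    conn.commit()  # VACUUM cannot run inside an open transaction
    return count


def vacuum(conn: sqlite3.Connection, tables: list[str]) -> int:
    """Optimize FTS indexes first, then VACUUM so freed pages are reclaimed."""
    merged = optimize_fts(conn, tables)
    conn.execute("VACUUM")
    return merged
```

Running the merge before `VACUUM` matters: the pages freed by collapsing segments are only returned to the OS by a later `VACUUM`, so doing both in one pass avoids a second full rewrite.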