docs(bluebubbles): fix pairing instructions to use existing approve flow

The docs incorrectly referenced 'hermes pairing generate bluebubbles', a command that doesn't exist. The existing reactive pairing flow already covers this case: when an unknown user messages the bot, it automatically sends them a code, and the owner approves it with 'hermes pairing approve'.
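
For illustration only, the reactive flow described above can be sketched as follows. This is a minimal sketch, not the Hermes implementation: `PairingStore`, `handle_message`, and `approve` are hypothetical names, and the in-memory storage and 8-character code format are assumptions.

```python
import secrets
from typing import Dict, Optional, Set


class PairingStore:
    """Hypothetical in-memory pairing store (assumption, not Hermes code)."""

    def __init__(self) -> None:
        self.pending: Dict[str, str] = {}  # pairing code -> user id
        self.approved: Set[str] = set()    # user ids the owner has approved

    def handle_message(self, user_id: str) -> Optional[str]:
        """Return a code to send to an unknown user, or None if already paired."""
        if user_id in self.approved:
            return None
        # Reuse an outstanding code so repeated messages don't create new ones.
        for code, pending_user in self.pending.items():
            if pending_user == user_id:
                return code
        code = secrets.token_hex(4).upper()  # 8 hex characters (assumed format)
        self.pending[code] = user_id
        return code

    def approve(self, code: str) -> bool:
        """Owner-side approval: consume the code and mark its user as paired."""
        user_id = self.pending.pop(code.upper(), None)
        if user_id is None:
            return False
        self.approved.add(user_id)
        return True
```

Here `approve` stands in for whatever 'hermes pairing approve' does on the owner's side; the command's real arguments are not shown in this commit. The point of the sketch is that no separate "generate" step is needed, because the code is created the first time an unknown user messages the bot.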
2026-06-23 02:13:14 +08:00 · 2026-04-09 03:55:12 -07:00
46 changed files with 298 additions and 3176 deletions
--- a/.github/workflows/docs-site-checks.yml
+++ b/.github/workflows/docs-site-checks.yml
@@ -27,8 +27,8 @@ jobs:
        with:
          python-version: '3.11'

-      - name: Install ascii-guard
-        run: python -m pip install ascii-guard==2.3.0 pyyaml==6.0.3
+      - name: Install Python dependencies
+        run: python -m pip install ascii-guard pyyaml

      - name: Extract skill metadata for dashboard
        run: python3 website/scripts/extract-skills.py
--- a/.github/workflows/nix.yml
+++ b/.github/workflows/nix.yml
@@ -27,8 +27,8 @@ jobs:
    timeout-minutes: 30
    steps:
      - uses: actions/checkout@v4
-      - uses: DeterminateSystems/nix-installer-action@ef8a148080ab6020fd15196c2084a2eea5ff2d25  # v22
-      - uses: DeterminateSystems/magic-nix-cache-action@565684385bcd71bad329742eefe8d12f2e765b39  # v13
+      - uses: DeterminateSystems/nix-installer-action@main
+      - uses: DeterminateSystems/magic-nix-cache-action@main
      - name: Check flake
        if: runner.os == 'Linux'
        run: nix flake check --print-build-logs
--- a/3
+++ b/3
@@ -1,8 +1,5 @@
 FROM debian:13.4

-# Disable Python stdout buffering to ensure logs are printed immediately
-ENV PYTHONUNBUFFERED=1
-
 # Install system dependencies in one layer, clear APT cache
 RUN apt-get update && \
    apt-get install -y --no-install-recommends \
--- a/agent/anthropic_adapter.py
+++ b/agent/anthropic_adapter.py
@@ -1238,27 +1238,10 @@ def build_anthropic_kwargs(
 ) -> Dict[str, Any]:
    """Build kwargs for anthropic.messages.create().

-    Naming note — two distinct concepts, easily confused:
-      max_tokens     = OUTPUT token cap for a single response.
-                       Anthropic's API calls this "max_tokens" but it only
-                       limits the *output*.  Anthropic's own native SDK
-                       renamed it "max_output_tokens" for clarity.
-      context_length = TOTAL context window (input tokens + output tokens).
-                       The API enforces: input_tokens + max_tokens ≤ context_length.
-                       Stored on the ContextCompressor; reduced on overflow errors.
-
-    When *max_tokens* is None the model's native output ceiling is used
-    (e.g. 128K for Opus 4.6, 64K for Sonnet 4.6).
-
-    When *context_length* is provided and the model's native output ceiling
-    exceeds it (e.g. a local endpoint with an 8K window), the output cap is
-    clamped to context_length − 1.  This only kicks in for unusually small
-    context windows; for full-size models the native output cap is always
-    smaller than the context window so no clamping happens.
-    NOTE: this clamping does not account for prompt size — if the prompt is
-    large, Anthropic may still reject the request.  The caller must detect
-    "max_tokens too large given prompt" errors and retry with a smaller cap
-    (see parse_available_output_tokens_from_error + _ephemeral_max_output_tokens).
+    When *max_tokens* is None, the model's native output limit is used
+    (e.g. 128K for Opus 4.6, 64K for Sonnet 4.6).  If *context_length*
+    is provided, the effective limit is clamped so it doesn't exceed
+    the context window.

    When *is_oauth* is True, applies Claude Code compatibility transforms:
    system prompt prefix, tool name prefixing, and prompt sanitization.
@@ -1273,14 +1256,10 @@ def build_anthropic_kwargs(
    anthropic_tools = convert_tools_to_anthropic(tools) if tools else []

    model = normalize_model_name(model, preserve_dots=preserve_dots)
-    # effective_max_tokens = output cap for this call (≠ total context window)
    effective_max_tokens = max_tokens or _get_anthropic_max_output(model)

-    # Clamp output cap to fit inside the total context window.
-    # Only matters for small custom endpoints where context_length < native
-    # output ceiling.  For standard Anthropic models context_length (e.g.
-    # 200K) is always larger than the output ceiling (e.g. 128K), so this
-    # branch is not taken.
+    # Clamp to context window if the user set a lower context_length
+    # (e.g. custom endpoint with limited capacity).
    if context_length and effective_max_tokens > context_length:
        effective_max_tokens = max(context_length - 1, 1)

--- a/agent/auxiliary_client.py
+++ b/agent/auxiliary_client.py
@@ -702,7 +702,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
            logger.debug("Auxiliary text client: %s (%s) via pool", pconfig.name, model)
            extra = {}
            if "api.kimi.com" in base_url.lower():
-                extra["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
+                extra["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
            elif "api.githubcopilot.com" in base_url.lower():
                from hermes_cli.models import copilot_default_headers

@@ -721,7 +721,7 @@ def _resolve_api_key_provider() -> Tuple[Optional[OpenAI], Optional[str]]:
        logger.debug("Auxiliary text client: %s (%s)", pconfig.name, model)
        extra = {}
        if "api.kimi.com" in base_url.lower():
-            extra["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
+            extra["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
        elif "api.githubcopilot.com" in base_url.lower():
            from hermes_cli.models import copilot_default_headers

@@ -1047,32 +1047,6 @@ def _is_payment_error(exc: Exception) -> bool:
    return False


-def _is_connection_error(exc: Exception) -> bool:
-    """Detect connection/network errors that warrant provider fallback.
-
-    Returns True for errors indicating the provider endpoint is unreachable
-    (DNS failure, connection refused, TLS errors, timeouts).  These are
-    distinct from API errors (4xx/5xx) which indicate the provider IS
-    reachable but returned an error.
-    """
-    from openai import APIConnectionError, APITimeoutError
-
-    if isinstance(exc, (APIConnectionError, APITimeoutError)):
-        return True
-    # urllib3 / httpx / httpcore connection errors
-    err_type = type(exc).__name__
-    if any(kw in err_type for kw in ("Connection", "Timeout", "DNS", "SSL")):
-        return True
-    err_lower = str(exc).lower()
-    if any(kw in err_lower for kw in (
-        "connection refused", "name or service not known",
-        "no route to host", "network is unreachable",
-        "timed out", "connection reset",
-    )):
-        return True
-    return False
-
-
 def _try_payment_fallback(
    failed_provider: str,
    task: str = None,
@@ -1137,7 +1111,7 @@ def _resolve_auto() -> Tuple[Optional[OpenAI], Optional[str]]:
    main_model = _read_main_model()
    if (main_provider and main_model
            and main_provider not in _AGGREGATOR_PROVIDERS
-            and main_provider not in ("auto", "")):
+            and main_provider not in ("auto", "custom", "")):
        client, resolved = resolve_provider_client(main_provider, main_model)
        if client is not None:
            logger.info("Auxiliary auto-detect: using main provider %s (%s)",
@@ -1195,7 +1169,7 @@ def _to_async_client(sync_client, model: str):

        async_kwargs["default_headers"] = copilot_default_headers()
    elif "api.kimi.com" in base_lower:
-        async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
+        async_kwargs["default_headers"] = {"User-Agent": "KimiCLI/1.0"}
    return AsyncOpenAI(**async_kwargs), model


@@ -1315,13 +1289,7 @@ def resolve_provider_client(
                )
                return None, None
            final_model = model or _read_main_model() or "gpt-4o-mini"
-            extra = {}
-            if "api.kimi.com" in custom_base.lower():
-                extra["default_headers"] = {"User-Agent": "KimiCLI/1.3"}
-            elif "api.githubcopilot.com" in custom_base.lower():
-                from hermes_cli.models import copilot_default_headers
-                extra["default_headers"] = copilot_default_headers()
-            client = OpenAI(api_key=custom_key, base_url=custom_base, **extra)
+            client = OpenAI(api_key=custom_key, base_url=custom_base)
            return (_to_async_client(client, final_model) if async_mode
                    else (client, final_model))
        # Try custom first, then codex, then API-key providers
@@ -1400,7 +1368,7 @@ def resolve_provider_client(
        # Provider-specific headers
        headers = {}
        if "api.kimi.com" in base_url.lower():
-            headers["User-Agent"] = "KimiCLI/1.3"
+            headers["User-Agent"] = "KimiCLI/1.0"
        elif "api.githubcopilot.com" in base_url.lower():
            from hermes_cli.models import copilot_default_headers

@@ -2125,18 +2093,7 @@ def call_llm(
        # try alternative providers instead of giving up.  This handles the
        # common case where a user runs out of OpenRouter credits but has
        # Codex OAuth or another provider available.
-        #
-        # ── Connection error fallback ────────────────────────────────
-        # When a provider endpoint is unreachable (DNS failure, connection
-        # refused, timeout), try alternative providers.  This handles stale
-        # Codex/OAuth tokens that authenticate but whose endpoint is down,
-        # and providers the user never configured that got picked up by
-        # the auto-detection chain.
-        should_fallback = _is_payment_error(first_err) or _is_connection_error(first_err)
-        if should_fallback:
-            reason = "payment error" if _is_payment_error(first_err) else "connection error"
-            logger.info("Auxiliary %s: %s on %s (%s), trying fallback",
-                        task or "call", reason, resolved_provider, first_err)
+        if _is_payment_error(first_err):
            fb_client, fb_model, fb_label = _try_payment_fallback(
                resolved_provider, task)
            if fb_client is not None:
--- a/agent/credential_pool.py
+++ b/agent/credential_pool.py
@@ -18,14 +18,12 @@ import hermes_cli.auth as auth_mod
 from hermes_cli.auth import (
    CODEX_ACCESS_TOKEN_REFRESH_SKEW_SECONDS,
    DEFAULT_AGENT_KEY_MIN_TTL_SECONDS,
-    KIMI_CODE_BASE_URL,
    PROVIDER_REGISTRY,
    _codex_access_token_is_expiring,
    _decode_jwt_claims,
    _import_codex_cli_tokens,
    _load_auth_store,
    _load_provider_state,
-    _resolve_kimi_base_url,
    _resolve_zai_base_url,
    read_credential_pool,
    write_credential_pool,
@@ -513,13 +511,6 @@ class CredentialPool:
                    except Exception as wexc:
                        logger.debug("Failed to write refreshed token to credentials file: %s", wexc)
            elif self.provider == "openai-codex":
-                # Proactively sync from ~/.codex/auth.json before refresh.
-                # The Codex CLI (or another Hermes profile) may have already
-                # consumed our refresh_token.  Syncing first avoids a
-                # "refresh_token_reused" error when the CLI has a newer pair.
-                synced = self._sync_codex_entry_from_cli(entry)
-                if synced is not entry:
-                    entry = synced
                refreshed = auth_mod.refresh_codex_oauth_pure(
                    entry.access_token,
                    entry.refresh_token,
@@ -605,35 +596,6 @@ class CredentialPool:
                    # Credentials file had a valid (non-expired) token — use it directly
                    logger.debug("Credentials file has valid token, using without refresh")
                    return synced
-            # For openai-codex: the refresh_token may have been consumed by
-            # the Codex CLI between our proactive sync and the refresh call.
-            # Re-sync and retry once.
-            if self.provider == "openai-codex":
-                synced = self._sync_codex_entry_from_cli(entry)
-                if synced.refresh_token != entry.refresh_token:
-                    logger.debug("Retrying Codex refresh with synced token from ~/.codex/auth.json")
-                    try:
-                        refreshed = auth_mod.refresh_codex_oauth_pure(
-                            synced.access_token,
-                            synced.refresh_token,
-                        )
-                        updated = replace(
-                            synced,
-                            access_token=refreshed["access_token"],
-                            refresh_token=refreshed["refresh_token"],
-                            last_refresh=refreshed.get("last_refresh"),
-                            last_status=STATUS_OK,
-                            last_status_at=None,
-                            last_error_code=None,
-                        )
-                        self._replace_entry(synced, updated)
-                        self._persist()
-                        return updated
-                    except Exception as retry_exc:
-                        logger.debug("Codex retry refresh also failed: %s", retry_exc)
-                elif not self._entry_needs_refresh(synced):
-                    logger.debug("Codex CLI has valid token, using without refresh")
-                    return synced
            self._mark_exhausted(entry, None)
            return None

@@ -1122,9 +1084,7 @@ def _seed_from_env(provider: str, entries: List[PooledCredential]) -> Tuple[bool
        active_sources.add(source)
        auth_type = AUTH_TYPE_OAUTH if provider == "anthropic" and not token.startswith("sk-ant-api") else AUTH_TYPE_API_KEY
        base_url = env_url or pconfig.inference_base_url
-        if provider == "kimi-coding":
-            base_url = _resolve_kimi_base_url(token, pconfig.inference_base_url, env_url)
-        elif provider == "zai":
+        if provider == "zai":
            base_url = _resolve_zai_base_url(token, pconfig.inference_base_url, env_url)
        changed |= _upsert_entry(
            entries,
--- a/agent/error_classifier.py
+++ b/agent/error_classifier.py
@@ -1,792 +0,0 @@
-"""API error classification for smart failover and recovery.
-
-Provides a structured taxonomy of API errors and a priority-ordered
-classification pipeline that determines the correct recovery action
-(retry, rotate credential, fallback to another provider, compress
-context, or abort).
-
-Replaces scattered inline string-matching with a centralized classifier
-that the main retry loop in run_agent.py consults for every API failure.
-"""
-
-from __future__ import annotations
-
-import enum
-import logging
-import re
-from dataclasses import dataclass, field
-from typing import Any, Dict, Optional
-
-logger = logging.getLogger(__name__)
-
-
-# ── Error taxonomy ──────────────────────────────────────────────────────
-
-class FailoverReason(enum.Enum):
-    """Why an API call failed — determines recovery strategy."""
-
-    # Authentication / authorization
-    auth = "auth"                        # Transient auth (401/403) — refresh/rotate
-    auth_permanent = "auth_permanent"    # Auth failed after refresh — abort
-
-    # Billing / quota
-    billing = "billing"                  # 402 or confirmed credit exhaustion — rotate immediately
-    rate_limit = "rate_limit"            # 429 or quota-based throttling — backoff then rotate
-
-    # Server-side
-    overloaded = "overloaded"            # 503/529 — provider overloaded, backoff
-    server_error = "server_error"        # 500/502 — internal server error, retry
-
-    # Transport
-    timeout = "timeout"                  # Connection/read timeout — rebuild client + retry
-
-    # Context / payload
-    context_overflow = "context_overflow"  # Context too large — compress, not failover
-    payload_too_large = "payload_too_large"  # 413 — compress payload
-
-    # Model
-    model_not_found = "model_not_found"  # 404 or invalid model — fallback to different model
-
-    # Request format
-    format_error = "format_error"        # 400 bad request — abort or strip + retry
-
-    # Provider-specific
-    thinking_signature = "thinking_signature"  # Anthropic thinking block sig invalid
-    long_context_tier = "long_context_tier"    # Anthropic "extra usage" tier gate
-
-    # Catch-all
-    unknown = "unknown"                  # Unclassifiable — retry with backoff
-
-
-# ── Classification result ───────────────────────────────────────────────
-
-@dataclass
-class ClassifiedError:
-    """Structured classification of an API error with recovery hints."""
-
-    reason: FailoverReason
-    status_code: Optional[int] = None
-    provider: Optional[str] = None
-    model: Optional[str] = None
-    message: str = ""
-    error_context: Dict[str, Any] = field(default_factory=dict)
-
-    # Recovery action hints — the retry loop checks these instead of
-    # re-classifying the error itself.
-    retryable: bool = True
-    should_compress: bool = False
-    should_rotate_credential: bool = False
-    should_fallback: bool = False
-
-    @property
-    def is_auth(self) -> bool:
-        return self.reason in (FailoverReason.auth, FailoverReason.auth_permanent)
-
-    @property
-    def is_transient(self) -> bool:
-        """Error is expected to resolve on retry (with or without backoff)."""
-        return self.reason in (
-            FailoverReason.rate_limit,
-            FailoverReason.overloaded,
-            FailoverReason.server_error,
-            FailoverReason.timeout,
-            FailoverReason.unknown,
-        )
-
-
-# ── Provider-specific patterns ──────────────────────────────────────────
-
-# Patterns that indicate billing exhaustion (not transient rate limit)
-_BILLING_PATTERNS = [
-    "insufficient credits",
-    "insufficient_quota",
-    "credit balance",
-    "credits have been exhausted",
-    "top up your credits",
-    "payment required",
-    "billing hard limit",
-    "exceeded your current quota",
-    "account is deactivated",
-    "plan does not include",
-]
-
-# Patterns that indicate rate limiting (transient, will resolve)
-_RATE_LIMIT_PATTERNS = [
-    "rate limit",
-    "rate_limit",
-    "too many requests",
-    "throttled",
-    "requests per minute",
-    "tokens per minute",
-    "requests per day",
-    "try again in",
-    "please retry after",
-    "resource_exhausted",
-]
-
-# Usage-limit patterns that need disambiguation (could be billing OR rate_limit)
-_USAGE_LIMIT_PATTERNS = [
-    "usage limit",
-    "quota",
-    "limit exceeded",
-    "key limit exceeded",
-]
-
-# Patterns confirming usage limit is transient (not billing)
-_USAGE_LIMIT_TRANSIENT_SIGNALS = [
-    "try again",
-    "retry",
-    "resets at",
-    "reset in",
-    "wait",
-    "requests remaining",
-    "periodic",
-    "window",
-]
-
-# Payload-too-large patterns detected from message text (no status_code attr).
-# Proxies and some backends embed the HTTP status in the error message.
-_PAYLOAD_TOO_LARGE_PATTERNS = [
-    "request entity too large",
-    "payload too large",
-    "error code: 413",
-]
-
-# Context overflow patterns
-_CONTEXT_OVERFLOW_PATTERNS = [
-    "context length",
-    "context size",
-    "maximum context",
-    "token limit",
-    "too many tokens",
-    "reduce the length",
-    "exceeds the limit",
-    "context window",
-    "prompt is too long",
-    "prompt exceeds max length",
-    "max_tokens",
-    "maximum number of tokens",
-    # Chinese error messages (some providers return these)
-    "超过最大长度",
-    "上下文长度",
-]
-
-# Model not found patterns
-_MODEL_NOT_FOUND_PATTERNS = [
-    "is not a valid model",
-    "invalid model",
-    "model not found",
-    "model_not_found",
-    "does not exist",
-    "no such model",
-    "unknown model",
-    "unsupported model",
-]
-
-# Auth patterns (non-status-code signals)
-_AUTH_PATTERNS = [
-    "invalid api key",
-    "invalid_api_key",
-    "authentication",
-    "unauthorized",
-    "forbidden",
-    "invalid token",
-    "token expired",
-    "token revoked",
-    "access denied",
-]
-
-# Anthropic thinking block signature patterns
-_THINKING_SIG_PATTERNS = [
-    "signature",  # Combined with "thinking" check
-]
-
-# Transport error type names
-_TRANSPORT_ERROR_TYPES = frozenset({
-    "ReadTimeout", "ConnectTimeout", "PoolTimeout",
-    "ConnectError", "RemoteProtocolError",
-    "ConnectionError", "ConnectionResetError",
-    "ConnectionAbortedError", "BrokenPipeError",
-    "TimeoutError", "ReadError",
-    "ServerDisconnectedError",
-    # OpenAI SDK errors (not subclasses of Python builtins)
-    "APIConnectionError",
-    "APITimeoutError",
-})
-
-# Server disconnect patterns (no status code, but transport-level)
-_SERVER_DISCONNECT_PATTERNS = [
-    "server disconnected",
-    "peer closed connection",
-    "connection reset by peer",
-    "connection was closed",
-    "network connection lost",
-    "unexpected eof",
-    "incomplete chunked read",
-]
-
-
-# ── Classification pipeline ─────────────────────────────────────────────
-
-def classify_api_error(
-    error: Exception,
-    *,
-    provider: str = "",
-    model: str = "",
-    approx_tokens: int = 0,
-    context_length: int = 200000,
-    num_messages: int = 0,
-) -> ClassifiedError:
-    """Classify an API error into a structured recovery recommendation.
-
-    Priority-ordered pipeline:
-      1. Special-case provider-specific patterns (thinking sigs, tier gates)
-      2. HTTP status code + message-aware refinement
-      3. Error code classification (from body)
-      4. Message pattern matching (billing vs rate_limit vs context vs auth)
-      5. Transport error heuristics
-      6. Server disconnect + large session → context overflow
-      7. Fallback: unknown (retryable with backoff)
-
-    Args:
-        error: The exception from the API call.
-        provider: Current provider name (e.g. "openrouter", "anthropic").
-        model: Current model slug.
-        approx_tokens: Approximate token count of the current context.
-        context_length: Maximum context length for the current model.
-
-    Returns:
-        ClassifiedError with reason and recovery action hints.
-    """
-    status_code = _extract_status_code(error)
-    error_type = type(error).__name__
-    body = _extract_error_body(error)
-    error_code = _extract_error_code(body)
-
-    # Build a comprehensive error message string for pattern matching.
-    # str(error) alone may not include the body message (e.g. OpenAI SDK's
-    # APIStatusError.__str__ returns the first arg, not the body).  Append
-    # the body message so patterns like "try again" in 402 disambiguation
-    # are detected even when only present in the structured body.
-    #
-    # Also extract metadata.raw — OpenRouter wraps upstream provider errors
-    # inside {"error": {"message": "Provider returned error", "metadata":
-    # {"raw": "<actual error JSON>"}}} and the real error message (e.g.
-    # "context length exceeded") is only in the inner JSON.
-    _raw_msg = str(error).lower()
-    _body_msg = ""
-    _metadata_msg = ""
-    if isinstance(body, dict):
-        _err_obj = body.get("error", {})
-        if isinstance(_err_obj, dict):
-            _body_msg = (_err_obj.get("message") or "").lower()
-            # Parse metadata.raw for wrapped provider errors
-            _metadata = _err_obj.get("metadata", {})
-            if isinstance(_metadata, dict):
-                _raw_json = _metadata.get("raw") or ""
-                if isinstance(_raw_json, str) and _raw_json.strip():
-                    try:
-                        import json
-                        _inner = json.loads(_raw_json)
-                        if isinstance(_inner, dict):
-                            _inner_err = _inner.get("error", {})
-                            if isinstance(_inner_err, dict):
-                                _metadata_msg = (_inner_err.get("message") or "").lower()
-                    except (json.JSONDecodeError, TypeError):
-                        pass
-        if not _body_msg:
-            _body_msg = (body.get("message") or "").lower()
-    # Combine all message sources for pattern matching
-    parts = [_raw_msg]
-    if _body_msg and _body_msg not in _raw_msg:
-        parts.append(_body_msg)
-    if _metadata_msg and _metadata_msg not in _raw_msg and _metadata_msg not in _body_msg:
-        parts.append(_metadata_msg)
-    error_msg = " ".join(parts)
-    provider_lower = (provider or "").strip().lower()
-    model_lower = (model or "").strip().lower()
-
-    def _result(reason: FailoverReason, **overrides) -> ClassifiedError:
-        defaults = {
-            "reason": reason,
-            "status_code": status_code,
-            "provider": provider,
-            "model": model,
-            "message": _extract_message(error, body),
-        }
-        defaults.update(overrides)
-        return ClassifiedError(**defaults)
-
-    # ── 1. Provider-specific patterns (highest priority) ────────────
-
-    # Anthropic thinking block signature invalid (400).
-    # Don't gate on provider — OpenRouter proxies Anthropic errors, so the
-    # provider may be "openrouter" even though the error is Anthropic-specific.
-    # The message pattern ("signature" + "thinking") is unique enough.
-    if (
-        status_code == 400
-        and "signature" in error_msg
-        and "thinking" in error_msg
-    ):
-        return _result(
-            FailoverReason.thinking_signature,
-            retryable=True,
-            should_compress=False,
-        )
-
-    # Anthropic long-context tier gate (429 "extra usage" + "long context")
-    if (
-        status_code == 429
-        and "extra usage" in error_msg
-        and "long context" in error_msg
-    ):
-        return _result(
-            FailoverReason.long_context_tier,
-            retryable=True,
-            should_compress=True,
-        )
-
-    # ── 2. HTTP status code classification ──────────────────────────
-
-    if status_code is not None:
-        classified = _classify_by_status(
-            status_code, error_msg, error_code, body,
-            provider=provider_lower, model=model_lower,
-            approx_tokens=approx_tokens, context_length=context_length,
-            num_messages=num_messages,
-            result_fn=_result,
-        )
-        if classified is not None:
-            return classified
-
-    # ── 3. Error code classification ────────────────────────────────
-
-    if error_code:
-        classified = _classify_by_error_code(error_code, error_msg, _result)
-        if classified is not None:
-            return classified
-
-    # ── 4. Message pattern matching (no status code) ────────────────
-
-    classified = _classify_by_message(
-        error_msg, error_type,
-        approx_tokens=approx_tokens,
-        context_length=context_length,
-        result_fn=_result,
-    )
-    if classified is not None:
-        return classified
-
-    # ── 5. Server disconnect + large session → context overflow ─────
-    # Must come BEFORE generic transport error catch — a disconnect on
-    # a large session is more likely context overflow than a transient
-    # transport hiccup.  Without this ordering, RemoteProtocolError
-    # always maps to timeout regardless of session size.
-
-    is_disconnect = any(p in error_msg for p in _SERVER_DISCONNECT_PATTERNS)
-    if is_disconnect and not status_code:
-        is_large = approx_tokens > context_length * 0.6 or approx_tokens > 120000 or num_messages > 200
-        if is_large:
-            return _result(
-                FailoverReason.context_overflow,
-                retryable=True,
-                should_compress=True,
-            )
-        return _result(FailoverReason.timeout, retryable=True)
-
-    # ── 6. Transport / timeout heuristics ───────────────────────────
-
-    if error_type in _TRANSPORT_ERROR_TYPES or isinstance(error, (TimeoutError, ConnectionError, OSError)):
-        return _result(FailoverReason.timeout, retryable=True)
-
-    # ── 7. Fallback: unknown ────────────────────────────────────────
-
-    return _result(FailoverReason.unknown, retryable=True)
-
-
-# ── Status code classification ──────────────────────────────────────────
-
-def _classify_by_status(
-    status_code: int,
-    error_msg: str,
-    error_code: str,
-    body: dict,
-    *,
-    provider: str,
-    model: str,
-    approx_tokens: int,
-    context_length: int,
-    num_messages: int = 0,
-    result_fn,
-) -> Optional[ClassifiedError]:
-    """Classify based on HTTP status code with message-aware refinement."""
-
-    if status_code == 401:
-        # Not retryable on its own — credential pool rotation and
-        # provider-specific refresh (Codex, Anthropic, Nous) run before
-        # the retryability check in run_agent.py.  If those succeed, the
-        # loop `continue`s.  If they fail, retryable=False ensures we
-        # hit the client-error abort path (which tries fallback first).
-        return result_fn(
-            FailoverReason.auth,
-            retryable=False,
-            should_rotate_credential=True,
-            should_fallback=True,
-        )
-
-    if status_code == 403:
-        # OpenRouter 403 "key limit exceeded" is actually billing
-        if "key limit exceeded" in error_msg or "spending limit" in error_msg:
-            return result_fn(
-                FailoverReason.billing,
-                retryable=False,
-                should_rotate_credential=True,
-                should_fallback=True,
-            )
-        return result_fn(
-            FailoverReason.auth,
-            retryable=False,
-            should_fallback=True,
-        )
-
-    if status_code == 402:
-        return _classify_402(error_msg, result_fn)
-
-    if status_code == 404:
-        if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
-            return result_fn(
-                FailoverReason.model_not_found,
-                retryable=False,
-                should_fallback=True,
-            )
-        # Generic 404 — could be model or endpoint
-        return result_fn(
-            FailoverReason.model_not_found,
-            retryable=False,
-            should_fallback=True,
-        )
-
-    if status_code == 413:
-        return result_fn(
-            FailoverReason.payload_too_large,
-            retryable=True,
-            should_compress=True,
-        )
-
-    if status_code == 429:
-        # Already checked long_context_tier above; this is a normal rate limit
-        return result_fn(
-            FailoverReason.rate_limit,
-            retryable=True,
-            should_rotate_credential=True,
-            should_fallback=True,
-        )
-
-    if status_code == 400:
-        return _classify_400(
-            error_msg, error_code, body,
-            provider=provider, model=model,
-            approx_tokens=approx_tokens,
-            context_length=context_length,
-            num_messages=num_messages,
-            result_fn=result_fn,
-        )
-
-    if status_code in (500, 502):
-        return result_fn(FailoverReason.server_error, retryable=True)
-
-    if status_code in (503, 529):
-        return result_fn(FailoverReason.overloaded, retryable=True)
-
-    # Other 4xx — non-retryable
-    if 400 <= status_code < 500:
-        return result_fn(
-            FailoverReason.format_error,
-            retryable=False,
-            should_fallback=True,
-        )
-
-    # Other 5xx — retryable
-    if 500 <= status_code < 600:
-        return result_fn(FailoverReason.server_error, retryable=True)
-
-    return None
-
-
-def _classify_402(error_msg: str, result_fn) -> ClassifiedError:
-    """Disambiguate 402: billing exhaustion vs transient usage limit.
-
-    The key insight from OpenClaw: some 402s are transient rate limits
-    disguised as payment errors.  "Usage limit, try again in 5 minutes"
-    is NOT a billing problem — it's a periodic quota that resets.
-    """
-    # Check for transient usage-limit signals first
-    has_usage_limit = any(p in error_msg for p in _USAGE_LIMIT_PATTERNS)
-    has_transient_signal = any(p in error_msg for p in _USAGE_LIMIT_TRANSIENT_SIGNALS)
-
-    if has_usage_limit and has_transient_signal:
-        # Transient quota — treat as rate limit, not billing
-        return result_fn(
-            FailoverReason.rate_limit,
-            retryable=True,
-            should_rotate_credential=True,
-            should_fallback=True,
-        )
-
-    # Confirmed billing exhaustion
-    return result_fn(
-        FailoverReason.billing,
-        retryable=False,
-        should_rotate_credential=True,
-        should_fallback=True,
-    )
-
-
-def _classify_400(
-    error_msg: str,
-    error_code: str,
-    body: dict,
-    *,
-    provider: str,
-    model: str,
-    approx_tokens: int,
-    context_length: int,
-    num_messages: int = 0,
-    result_fn,
-) -> ClassifiedError:
-    """Classify 400 Bad Request — context overflow, format error, or generic."""
-
-    # Context overflow from 400
-    if any(p in error_msg for p in _CONTEXT_OVERFLOW_PATTERNS):
-        return result_fn(
-            FailoverReason.context_overflow,
-            retryable=True,
-            should_compress=True,
-        )
-
-    # Some providers return model-not-found as 400 instead of 404 (e.g. OpenRouter).
-    if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
-        return result_fn(
-            FailoverReason.model_not_found,
-            retryable=False,
-            should_fallback=True,
-        )
-
-    # Some providers return rate limit / billing errors as 400 instead of 429/402.
-    # Check these patterns before falling through to format_error.
-    if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
-        return result_fn(
-            FailoverReason.rate_limit,
-            retryable=True,
-            should_rotate_credential=True,
-            should_fallback=True,
-        )
-    if any(p in error_msg for p in _BILLING_PATTERNS):
-        return result_fn(
-            FailoverReason.billing,
-            retryable=False,
-            should_rotate_credential=True,
-            should_fallback=True,
-        )
-
-    # Generic 400 + large session → probable context overflow
-    # Anthropic sometimes returns a bare "Error" message when context is too large
-    err_body_msg = ""
-    if isinstance(body, dict):
-        err_obj = body.get("error", {})
-        if isinstance(err_obj, dict):
-            err_body_msg = (err_obj.get("message") or "").strip().lower()
-        # Responses API (and some providers) use flat body: {"message": "..."}
-        if not err_body_msg:
-            err_body_msg = (body.get("message") or "").strip().lower()
-    is_generic = len(err_body_msg) < 30 or err_body_msg in ("error", "")
-    is_large = approx_tokens > context_length * 0.4 or approx_tokens > 80000 or num_messages > 80
-
-    if is_generic and is_large:
-        return result_fn(
-            FailoverReason.context_overflow,
-            retryable=True,
-            should_compress=True,
-        )
-
-    # Non-retryable format error
-    return result_fn(
-        FailoverReason.format_error,
-        retryable=False,
-        should_fallback=True,
-    )
-
-
-# ── Error code classification ───────────────────────────────────────────
-
-def _classify_by_error_code(
-    error_code: str, error_msg: str, result_fn,
-) -> Optional[ClassifiedError]:
-    """Classify by structured error codes from the response body."""
-    code_lower = error_code.lower()
-
-    if code_lower in ("resource_exhausted", "throttled", "rate_limit_exceeded"):
-        return result_fn(
-            FailoverReason.rate_limit,
-            retryable=True,
-            should_rotate_credential=True,
-        )
-
-    if code_lower in ("insufficient_quota", "billing_not_active", "payment_required"):
-        return result_fn(
-            FailoverReason.billing,
-            retryable=False,
-            should_rotate_credential=True,
-            should_fallback=True,
-        )
-
-    if code_lower in ("model_not_found", "model_not_available", "invalid_model"):
-        return result_fn(
-            FailoverReason.model_not_found,
-            retryable=False,
-            should_fallback=True,
-        )
-
-    if code_lower in ("context_length_exceeded", "max_tokens_exceeded"):
-        return result_fn(
-            FailoverReason.context_overflow,
-            retryable=True,
-            should_compress=True,
-        )
-
-    return None
-
-
-# ── Message pattern classification ──────────────────────────────────────
-
-def _classify_by_message(
-    error_msg: str,
-    error_type: str,
-    *,
-    approx_tokens: int,
-    context_length: int,
-    result_fn,
-) -> Optional[ClassifiedError]:
-    """Classify based on error message patterns when no status code is available."""
-
-    # Payload-too-large patterns (from message text when no status_code)
-    if any(p in error_msg for p in _PAYLOAD_TOO_LARGE_PATTERNS):
-        return result_fn(
-            FailoverReason.payload_too_large,
-            retryable=True,
-            should_compress=True,
-        )
-
-    # Billing patterns
-    if any(p in error_msg for p in _BILLING_PATTERNS):
-        return result_fn(
-            FailoverReason.billing,
-            retryable=False,
-            should_rotate_credential=True,
-            should_fallback=True,
-        )
-
-    # Rate limit patterns
-    if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
-        return result_fn(
-            FailoverReason.rate_limit,
-            retryable=True,
-            should_rotate_credential=True,
-            should_fallback=True,
-        )
-
-    # Context overflow patterns
-    if any(p in error_msg for p in _CONTEXT_OVERFLOW_PATTERNS):
-        return result_fn(
-            FailoverReason.context_overflow,
-            retryable=True,
-            should_compress=True,
-        )
-
-    # Auth patterns
-    if any(p in error_msg for p in _AUTH_PATTERNS):
-        return result_fn(
-            FailoverReason.auth,
-            retryable=True,
-            should_rotate_credential=True,
-        )
-
-    # Model not found patterns
-    if any(p in error_msg for p in _MODEL_NOT_FOUND_PATTERNS):
-        return result_fn(
-            FailoverReason.model_not_found,
-            retryable=False,
-            should_fallback=True,
-        )
-
-    return None
-
-
-# ── Helpers ─────────────────────────────────────────────────────────────
-
-def _extract_status_code(error: Exception) -> Optional[int]:
-    """Walk the error and its cause chain to find an HTTP status code."""
-    current = error
-    for _ in range(5):  # Max depth to prevent infinite loops
-        code = getattr(current, "status_code", None)
-        if isinstance(code, int):
-            return code
-        # Some SDKs use .status instead of .status_code
-        code = getattr(current, "status", None)
-        if isinstance(code, int) and 100 <= code < 600:
-            return code
-        # Walk cause chain
-        cause = getattr(current, "__cause__", None) or getattr(current, "__context__", None)
-        if cause is None or cause is current:
-            break
-        current = cause
-    return None
-
-
-def _extract_error_body(error: Exception) -> dict:
-    """Extract the structured error body from an SDK exception."""
-    body = getattr(error, "body", None)
-    if isinstance(body, dict):
-        return body
-    # Some errors have .response.json()
-    response = getattr(error, "response", None)
-    if response is not None:
-        try:
-            json_body = response.json()
-            if isinstance(json_body, dict):
-                return json_body
-        except Exception:
-            pass
-    return {}
-
-
-def _extract_error_code(body: dict) -> str:
-    """Extract an error code string from the response body."""
-    if not body:
-        return ""
-    error_obj = body.get("error", {})
-    if isinstance(error_obj, dict):
-        code = error_obj.get("code") or error_obj.get("type") or ""
-        if isinstance(code, str) and code.strip():
-            return code.strip()
-    # Top-level code
-    code = body.get("code") or body.get("error_code") or ""
-    if isinstance(code, (str, int)):
-        return str(code).strip()
-    return ""
-
-
-def _extract_message(error: Exception, body: dict) -> str:
-    """Extract the most informative error message."""
-    # Try structured body first
-    if body:
-        error_obj = body.get("error", {})
-        if isinstance(error_obj, dict):
-            msg = error_obj.get("message", "")
-            if isinstance(msg, str) and msg.strip():
-                return msg.strip()[:500]
-        msg = body.get("message", "")
-        if isinstance(msg, str) and msg.strip():
-            return msg.strip()[:500]
-    # Fallback to str(error)
-    return str(error)[:500]
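
Review note on the removed `_classify_402` helper: it treats a 402 as a transient rate limit only when the message contains both a usage-limit phrase and a retry hint, and otherwise falls back to billing exhaustion. The sketch below restates that rule in isolation. The pattern tuples and the `Verdict` type are hypothetical stand-ins, because the real `_USAGE_LIMIT_PATTERNS`, `_USAGE_LIMIT_TRANSIENT_SIGNALS`, and `ClassifiedError` definitions are outside this part of the diff. Lowercasing the message inside the function is also an assumption of the sketch.

```python
from dataclasses import dataclass

# Hypothetical pattern lists; the real ones are defined outside this hunk.
USAGE_LIMIT_PATTERNS = ("usage limit", "quota")
TRANSIENT_SIGNALS = ("try again", "resets", "retry")


@dataclass(frozen=True)
class Verdict:
    """Illustrative stand-in for the classifier's result object."""
    reason: str
    retryable: bool


def classify_402(error_msg: str) -> Verdict:
    """Treat a 402 as a rate limit only when it names a limit AND a retry hint."""
    msg = error_msg.lower()
    has_limit = any(p in msg for p in USAGE_LIMIT_PATTERNS)
    has_transient = any(p in msg for p in TRANSIENT_SIGNALS)
    if has_limit and has_transient:
        # Periodic quota that resets: retry later or rotate credentials.
        return Verdict("rate_limit", retryable=True)
    # No retry hint: assume the account is genuinely out of credit.
    return Verdict("billing", retryable=False)
```

Requiring both signals keeps a bare "usage limit reached" message on the billing path, which matches the conservative default in the removed code.
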
--- a/agent/model_metadata.py
+++ b/agent/model_metadata.py
@@ -603,49 +603,6 @@ def parse_context_limit_from_error(error_msg: str) -> Optional[int]:
    return None


-def parse_available_output_tokens_from_error(error_msg: str) -> Optional[int]:
-    """Detect an "output cap too large" error and return how many output tokens are available.
-
-    Background — two distinct context errors exist:
-      1. "Prompt too long"  — the INPUT itself exceeds the context window.
-           Fix: compress history and/or halve context_length.
-      2. "max_tokens too large" — input is fine, but input + requested_output > window.
-           Fix: reduce max_tokens (the output cap) for this call.
-           Do NOT touch context_length — the window hasn't shrunk.
-
-    Anthropic's API returns errors like:
-      "max_tokens: 32768 > context_window: 200000 - input_tokens: 190000 = available_tokens: 10000"
-
-    Returns the number of output tokens that would fit (e.g. 10000 above), or None if
-    the error does not look like a max_tokens-too-large error.
-    """
-    error_lower = error_msg.lower()
-
-    # Must look like an output-cap error, not a prompt-length error.
-    is_output_cap_error = (
-        "max_tokens" in error_lower
-        and ("available_tokens" in error_lower or "available tokens" in error_lower)
-    )
-    if not is_output_cap_error:
-        return None
-
-    # Extract the available_tokens figure.
-    # Anthropic format: "… = available_tokens: 10000"
-    patterns = [
-        r'available_tokens[:\s]+(\d+)',
-        r'available\s+tokens[:\s]+(\d+)',
-        # fallback: last number after "=" in expressions like "200000 - 190000 = 10000"
-        r'=\s*(\d+)\s*$',
-    ]
-    for pattern in patterns:
-        match = re.search(pattern, error_lower)
-        if match:
-            tokens = int(match.group(1))
-            if tokens >= 1:
-                return tokens
-    return None
-
-
 def _model_id_matches(candidate_id: str, lookup_model: str) -> bool:
    """Return True if *candidate_id* (from server) matches *lookup_model* (configured).

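
Review note on the removed `parse_available_output_tokens_from_error`: its docstring separates "prompt too long" errors from "max_tokens too large" errors, where only the output cap should shrink. The sketch below shows how a caller might use the parsed figure, assuming the Anthropic-style message format quoted in that docstring. `available_output_tokens` is a simplified restatement that omits the `=` fallback pattern, and `clamp_max_tokens` is a hypothetical caller that does not appear in the patch.

```python
import re
from typing import Optional

# Matches "available_tokens: 10000" or "available tokens: 10000".
_AVAILABLE_RE = re.compile(r"available[_\s]tokens[:\s]+(\d+)")


def available_output_tokens(error_msg: str) -> Optional[int]:
    """Return the available output-token count from an output-cap error, else None."""
    lowered = error_msg.lower()
    if "max_tokens" not in lowered:
        return None
    match = _AVAILABLE_RE.search(lowered)
    if match is None:
        return None
    tokens = int(match.group(1))
    return tokens if tokens >= 1 else None


def clamp_max_tokens(requested: int, error_msg: str) -> int:
    """Hypothetical caller: shrink the output cap and leave context_length alone."""
    available = available_output_tokens(error_msg)
    return requested if available is None else min(requested, available)
```

With the docstring's example message, a request for 32768 output tokens would be clamped to 10000, while an unrelated error leaves the requested value unchanged.
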
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -48,25 +48,6 @@ model:
  # api_key: "your-key-here"  # Uncomment to set here instead of .env
  base_url: "https://openrouter.ai/api/v1"

-  # ── Token limits — two settings, easy to confuse ──────────────────────────
-  #
-  # context_length: TOTAL context window (input + output tokens combined).
-  #   Controls when Hermes compresses history and validates requests.
-  #   Leave unset — Hermes auto-detects the correct value from the provider.
-  #   Set manually only when auto-detection is wrong (e.g. a local server with
-  #   a custom num_ctx, or a proxy that doesn't expose /v1/models).
-  #
-  # context_length: 131072
-  #
-  # max_tokens: OUTPUT cap — maximum tokens the model may generate per response.
-  #   Unrelated to how long your conversation history can be.
-  #   The OpenAI-standard name "max_tokens" is a misnomer; Anthropic's native
-  #   API has since renamed it "max_output_tokens" for clarity.
-  #   Leave unset to use the model's native output ceiling (recommended).
-  #   Set only if you want to deliberately limit individual response length.
-  #
-  # max_tokens: 8192
-
 # =============================================================================
 # OpenRouter Provider Routing (only applies when using OpenRouter)
 # =============================================================================
--- a/cli.py
+++ b/cli.py
@@ -1603,12 +1603,7 @@ class HermesCLI:
        return f"[{('█' * filled) + ('░' * max(0, width - filled))}]"

    def _get_status_bar_snapshot(self) -> Dict[str, Any]:
-        # Prefer the agent's model name — it updates on fallback.
-        # self.model reflects the originally configured model and never
-        # changes mid-session, so the TUI would show a stale name after
-        # _try_activate_fallback() switches provider/model.
-        agent = getattr(self, "agent", None)
-        model_name = (getattr(agent, "model", None) or self.model or "unknown")
+        model_name = self.model or "unknown"
        model_short = model_name.split("/")[-1] if "/" in model_name else model_name
        if model_short.endswith(".gguf"):
            model_short = model_short[:-5]
@@ -1634,6 +1629,7 @@ class HermesCLI:
            "compressions": 0,
        }

+        agent = getattr(self, "agent", None)
        if not agent:
            return snapshot

@@ -4008,7 +4004,59 @@ class HermesCLI:

        print("  To change model or provider, use: hermes model")

-
+    def _handle_prompt_command(self, cmd: str):
+        """Handle the /prompt command to view or set system prompt."""
+        parts = cmd.split(maxsplit=1)
+
+        if len(parts) > 1:
+            # Set new prompt
+            new_prompt = parts[1].strip()
+
+            if new_prompt.lower() == "clear":
+                self.system_prompt = ""
+                self.agent = None  # Force re-init
+                if save_config_value("agent.system_prompt", ""):
+                    print("(^_^)b System prompt cleared (saved to config)")
+                else:
+                    print("(^_^) System prompt cleared (session only)")
+            else:
+                self.system_prompt = new_prompt
+                self.agent = None  # Force re-init
+                if save_config_value("agent.system_prompt", new_prompt):
+                    print("(^_^)b System prompt set (saved to config)")
+                else:
+                    print("(^_^) System prompt set (session only)")
+                print(f"  \"{new_prompt[:60]}{'...' if len(new_prompt) > 60 else ''}\"")
+        else:
+            # Show current prompt
+            print()
+            print("+" + "-" * 50 + "+")
+            print("|" + " " * 15 + "(^_^) System Prompt" + " " * 15 + "|")
+            print("+" + "-" * 50 + "+")
+            print()
+            if self.system_prompt:
+                # Word wrap the prompt for display
+                words = self.system_prompt.split()
+                lines = []
+                current_line = ""
+                for word in words:
+                    if not current_line or len(current_line) + len(word) + 1 <= 50:
+                        current_line += (" " if current_line else "") + word
+                    else:
+                        lines.append(current_line)
+                        current_line = word
+                if current_line:
+                    lines.append(current_line)
+                for line in lines:
+                    print(f"  {line}")
+            else:
+                print("  (no custom prompt set - using default)")
+            print()
+            print("  Usage:")
+            print("    /prompt <text>  - Set a custom system prompt")
+            print("    /prompt clear   - Remove custom prompt")
+            print("    /personality    - Use a predefined personality")
+            print()
    

    @staticmethod
@@ -4508,7 +4556,9 @@ class HermesCLI:
            self._handle_model_switch(cmd_original)
        elif canonical == "provider":
            self._show_model_and_providers()
-
+        elif canonical == "prompt":
+            # Use original case so prompt text isn't lowercased
+            self._handle_prompt_command(cmd_original)
        elif canonical == "personality":
            # Use original case (handler lowercases the personality name itself)
            self._handle_personality_command(cmd_original)
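
Review note on the word-wrap loop added in `_handle_prompt_command`: the same display logic could probably be expressed with the standard library's `textwrap.wrap`. The sketch below is a suggestion and not part of the patch. `format_prompt_lines` is a hypothetical helper name, and the width of 50 and two-space indent mirror the values used in the added code.

```python
import textwrap


def format_prompt_lines(prompt: str, width: int = 50, indent: str = "  ") -> list[str]:
    """Wrap a prompt at whitespace boundaries and indent each line for display."""
    wrapped = textwrap.wrap(
        prompt,
        width=width,
        break_long_words=False,  # keep over-long words whole, like the manual loop
        break_on_hyphens=False,  # split on whitespace only, like str.split()
    )
    return [f"{indent}{line}" for line in wrapped]
```

A caller would print each returned line in turn. An empty prompt yields an empty list, so the "no custom prompt set" branch would still need its own check.
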
--- a/flake.lock
+++ b/flake.lock
@@ -22,16 +22,16 @@
    },
    "nixpkgs": {
      "locked": {
-        "lastModified": 1775036866,
-        "narHash": "sha256-ZojAnPuCdy657PbTq5V0Y+AHKhZAIwSIT2cb8UgAz/U=",
+        "lastModified": 1751274312,
+        "narHash": "sha256-/bVBlRpECLVzjV19t5KMdMFWSwKLtb5RyXdjz3LJT+g=",
        "owner": "NixOS",
        "repo": "nixpkgs",
-        "rev": "6201e203d09599479a3b3450ed24fa81537ebc4e",
+        "rev": "50ab793786d9de88ee30ec4e4c24fb4236fc2674",
        "type": "github"
      },
      "original": {
        "owner": "NixOS",
-        "ref": "nixos-unstable",
+        "ref": "nixos-24.11",
        "repo": "nixpkgs",
        "type": "github"
      }
--- a/flake.nix
+++ b/flake.nix
@@ -2,7 +2,7 @@
  description = "Hermes Agent - AI agent framework by Nous Research";

  inputs = {
-    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
+    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
    flake-parts = {
      url = "github:hercules-ci/flake-parts";
      inputs.nixpkgs-lib.follows = "nixpkgs";
--- a/hermes_cli/auth.py
+++ b/hermes_cli/auth.py
@@ -250,7 +250,7 @@ PROVIDER_REGISTRY: Dict[str, ProviderConfig] = {
 # Kimi Code Endpoint Detection
 # =============================================================================

-# Kimi Code (kimi.com/code) issues keys prefixed "sk-kimi-" that only work
+# Kimi Code (platform.kimi.ai) issues keys prefixed "sk-kimi-" that only work
 # on api.kimi.com/coding/v1.  Legacy keys from platform.moonshot.ai work on
 # api.moonshot.ai/v1 (the default).  Auto-detect when user hasn't set
 # KIMI_BASE_URL explicitly.
@@ -3017,15 +3017,12 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
            _save_provider_state(auth_store, "nous", auth_state)
            saved_to = _save_auth_store(auth_store)

+        config_path = _update_config_for_provider("nous", inference_base_url)
        print()
        print("Login successful!")
        print(f"  Auth state: {saved_to}")
+        print(f"  Config updated: {config_path} (model.provider=nous)")

-        # Resolve model BEFORE writing provider to config.yaml so we never
-        # leave the config in a half-updated state (provider=nous but model
-        # still set to the previous provider's model, e.g. opus from
-        # OpenRouter).  The auth.json active_provider was already set above.
-        selected_model = None
        try:
            runtime_key = auth_state.get("agent_key") or auth_state.get("access_token")
            if not isinstance(runtime_key, str) or not runtime_key:
@@ -3059,6 +3056,9 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
                    unavailable_models=unavailable_models,
                    portal_url=_portal,
                )
+                if selected_model:
+                    _save_model_choice(selected_model)
+                    print(f"Default model set to: {selected_model}")
            elif unavailable_models:
                _url = (_portal or DEFAULT_NOUS_PORTAL_URL).rstrip("/")
                print("No free models currently available.")
@@ -3070,15 +3070,6 @@ def _login_nous(args, pconfig: ProviderConfig) -> None:
            print()
            print(f"Login succeeded, but could not fetch available models. Reason: {message}")

-        # Write provider + model atomically so config is never mismatched.
-        config_path = _update_config_for_provider(
-            "nous", inference_base_url, default_model=selected_model,
-        )
-        if selected_model:
-            _save_model_choice(selected_model)
-            print(f"Default model set to: {selected_model}")
-        print(f"  Config updated: {config_path} (model.provider=nous)")
-
    except KeyboardInterrupt:
        print("\nLogin cancelled.")
        raise SystemExit(130)
--- a/hermes_cli/commands.py
+++ b/hermes_cli/commands.py
@@ -87,7 +87,8 @@ COMMAND_REGISTRY: list[CommandDef] = [
    CommandDef("model", "Switch model for this session", "Configuration", args_hint="[model] [--global]"),
    CommandDef("provider", "Show available providers and current provider",
               "Configuration"),
-
+    CommandDef("prompt", "View/set custom system prompt", "Configuration",
+               cli_only=True, args_hint="[text]", subcommands=("clear",)),
    CommandDef("personality", "Set a predefined personality", "Configuration",
               args_hint="[name]"),
    CommandDef("statusbar", "Toggle the context/model status bar", "Configuration",
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -569,7 +569,7 @@ DEFAULT_CONFIG = {
    },

    # Config schema version - bump this when adding new required fields
-    "_config_version": 13,
+    "_config_version": 12,
 }

 # =============================================================================
@@ -1701,21 +1701,6 @@ def migrate_config(interactive: bool = True, quiet: bool = False) -> Dict[str, A
                        ep = providers_dict[key]
                        print(f"    → {key}: {ep.get('api', '')}")

-    # ── Version 12 → 13: clear dead LLM_MODEL / OPENAI_MODEL from .env ──
-    # These env vars were written by the old setup wizard but nothing reads
-    # them anymore (config.yaml is the sole source of truth since March 2026).
-    # Stale entries cause user confusion — see issue report.
-    if current_ver < 13:
-        for dead_var in ("LLM_MODEL", "OPENAI_MODEL"):
-            try:
-                old_val = get_env_value(dead_var)
-                if old_val:
-                    save_env_value(dead_var, "")
-                    if not quiet:
-                        print(f"  ✓ Cleared {dead_var} from .env (no longer used — config.yaml is source of truth)")
-            except Exception:
-                pass
-
    if current_ver < latest_ver and not quiet:
        print(f"Config version: {current_ver} → {latest_ver}")
    
--- a/hermes_cli/dump.py
+++ b/hermes_cli/dump.py
@@ -1,337 +0,0 @@
-"""
-Dump command for hermes CLI.
-
-Outputs a compact, plain-text summary of the user's Hermes setup
-that can be copy-pasted into Discord/GitHub/Telegram for support context.
-No ANSI colors, no checkmarks — just data.
-"""
-
-import json
-import os
-import platform
-import subprocess
-import sys
-from pathlib import Path
-
-from hermes_cli.config import get_hermes_home, get_env_path, get_project_root, load_config
-from hermes_constants import display_hermes_home
-
-
-def _get_git_commit(project_root: Path) -> str:
-    """Return short git commit hash, or '(unknown)'."""
-    try:
-        result = subprocess.run(
-            ["git", "rev-parse", "--short=8", "HEAD"],
-            capture_output=True, text=True, timeout=5,
-            cwd=str(project_root),
-        )
-        if result.returncode == 0:
-            return result.stdout.strip()
-    except Exception:
-        pass
-    return "(unknown)"
-
-
-def _key_present(name: str) -> str:
-    """Return 'set' or 'not set' for an env var."""
-    return "set" if os.getenv(name) else "not set"
-
-
-def _redact(value: str) -> str:
-    """Redact all but first 4 and last 4 chars."""
-    if not value:
-        return ""
-    if len(value) < 12:
-        return "***"
-    return value[:4] + "..." + value[-4:]
-
-
-def _gateway_status() -> str:
-    """Return a short gateway status string."""
-    if sys.platform.startswith("linux"):
-        try:
-            from hermes_cli.gateway import get_service_name
-            svc = get_service_name()
-        except Exception:
-            svc = "hermes-gateway"
-        try:
-            r = subprocess.run(
-                ["systemctl", "--user", "is-active", svc],
-                capture_output=True, text=True, timeout=5,
-            )
-            return "running (systemd)" if r.stdout.strip() == "active" else "stopped"
-        except Exception:
-            return "unknown"
-    elif sys.platform == "darwin":
-        try:
-            from hermes_cli.gateway import get_launchd_label
-            r = subprocess.run(
-                ["launchctl", "list", get_launchd_label()],
-                capture_output=True, text=True, timeout=5,
-            )
-            return "loaded (launchd)" if r.returncode == 0 else "not loaded"
-        except Exception:
-            return "unknown"
-    return "N/A"
-
-
-def _count_skills(hermes_home: Path) -> int:
-    """Count installed skills."""
-    skills_dir = hermes_home / "skills"
-    if not skills_dir.is_dir():
-        return 0
-    count = 0
-    for item in skills_dir.rglob("SKILL.md"):
-        count += 1
-    return count
-
-
-def _count_mcp_servers(config: dict) -> int:
-    """Count configured MCP servers."""
-    mcp = config.get("mcp", {})
-    servers = mcp.get("servers", {})
-    return len(servers)
-
-
-def _cron_summary(hermes_home: Path) -> str:
-    """Return cron jobs summary."""
-    jobs_file = hermes_home / "cron" / "jobs.json"
-    if not jobs_file.exists():
-        return "0"
-    try:
-        with open(jobs_file, encoding="utf-8") as f:
-            data = json.load(f)
-        jobs = data.get("jobs", [])
-        active = sum(1 for j in jobs if j.get("enabled", True))
-        return f"{active} active / {len(jobs)} total"
-    except Exception:
-        return "(error reading)"
-
-
-def _configured_platforms() -> list[str]:
-    """Return list of configured messaging platform names."""
-    checks = {
-        "telegram": "TELEGRAM_BOT_TOKEN",
-        "discord": "DISCORD_BOT_TOKEN",
-        "slack": "SLACK_BOT_TOKEN",
-        "whatsapp": "WHATSAPP_ENABLED",
-        "signal": "SIGNAL_HTTP_URL",
-        "email": "EMAIL_ADDRESS",
-        "sms": "TWILIO_ACCOUNT_SID",
-        "matrix": "MATRIX_HOMESERVER_URL",
-        "mattermost": "MATTERMOST_URL",
-        "homeassistant": "HASS_TOKEN",
-        "dingtalk": "DINGTALK_CLIENT_ID",
-        "feishu": "FEISHU_APP_ID",
-        "wecom": "WECOM_BOT_ID",
-    }
-    return [name for name, env in checks.items() if os.getenv(env)]
-
-
-def _memory_provider(config: dict) -> str:
-    """Return the active memory provider name."""
-    mem = config.get("memory", {})
-    provider = mem.get("provider", "")
-    return provider if provider else "built-in"
-
-
-def _get_model_and_provider(config: dict) -> tuple[str, str]:
-    """Extract model and provider from config."""
-    model_cfg = config.get("model", "")
-    if isinstance(model_cfg, dict):
-        model = model_cfg.get("default") or model_cfg.get("model") or model_cfg.get("name") or "(not set)"
-        provider = model_cfg.get("provider") or "(auto)"
-    elif isinstance(model_cfg, str):
-        model = model_cfg or "(not set)"
-        provider = "(auto)"
-    else:
-        model = "(not set)"
-        provider = "(auto)"
-    return model, provider
-
-
-def _config_overrides(config: dict) -> dict[str, str]:
-    """Find non-default config values worth reporting.
-    
-    Returns a flat dict of dotpath -> value for interesting overrides.
-    """
-    from hermes_cli.config import DEFAULT_CONFIG
-
-    overrides = {}
-
-    # Sections with interesting user-facing overrides
-    interesting_paths = [
-        ("agent", "max_turns"),
-        ("agent", "gateway_timeout"),
-        ("agent", "tool_use_enforcement"),
-        ("terminal", "backend"),
-        ("terminal", "docker_image"),
-        ("terminal", "persistent_shell"),
-        ("browser", "allow_private_urls"),
-        ("compression", "enabled"),
-        ("compression", "threshold"),
-        ("display", "streaming"),
-        ("display", "skin"),
-        ("display", "show_reasoning"),
-        ("smart_model_routing", "enabled"),
-        ("privacy", "redact_pii"),
-        ("tts", "provider"),
-    ]
-
-    for section, key in interesting_paths:
-        default_section = DEFAULT_CONFIG.get(section, {})
-        user_section = config.get(section, {})
-        if not isinstance(default_section, dict) or not isinstance(user_section, dict):
-            continue
-        default_val = default_section.get(key)
-        user_val = user_section.get(key)
-        if user_val is not None and user_val != default_val:
-            overrides[f"{section}.{key}"] = str(user_val)
-
-    # Toolsets (if different from default)
-    default_toolsets = DEFAULT_CONFIG.get("toolsets", [])
-    user_toolsets = config.get("toolsets", [])
-    if user_toolsets != default_toolsets:
-        overrides["toolsets"] = str(user_toolsets)
-
-    # Fallback providers
-    fallbacks = config.get("fallback_providers", [])
-    if fallbacks:
-        overrides["fallback_providers"] = str(fallbacks)
-
-    return overrides
-
-
-def run_dump(args):
-    """Output a compact, copy-pasteable setup summary."""
-    show_keys = getattr(args, "show_keys", False)
-
-    # Load env from .env file so key checks work
-    from dotenv import load_dotenv
-    env_path = get_env_path()
-    if env_path.exists():
-        try:
-            load_dotenv(env_path, encoding="utf-8")
-        except UnicodeDecodeError:
-            load_dotenv(env_path, encoding="latin-1")
-    # Also try project .env as dev fallback
-    load_dotenv(get_project_root() / ".env", override=False, encoding="utf-8")
-
-    project_root = get_project_root()
-    hermes_home = get_hermes_home()
-
-    try:
-        from hermes_cli import __version__, __release_date__
-    except ImportError:
-        __version__ = "(unknown)"
-        __release_date__ = ""
-
-    commit = _get_git_commit(project_root)
-
-    try:
-        config = load_config()
-    except Exception:
-        config = {}
-
-    model, provider = _get_model_and_provider(config)
-
-    # Profile
-    try:
-        from hermes_cli.profiles import get_active_profile_name
-        profile = get_active_profile_name() or "(default)"
-    except Exception:
-        profile = "(default)"
-
-    # Terminal backend
-    terminal_cfg = config.get("terminal", {})
-    backend = terminal_cfg.get("backend", "local")
-
-    # OpenAI SDK version
-    try:
-        import openai
-        openai_ver = openai.__version__
-    except ImportError:
-        openai_ver = "not installed"
-
-    # OS info
-    os_info = f"{platform.system()} {platform.release()} {platform.machine()}"
-
-    lines = []
-    lines.append("--- hermes dump ---")
-    ver_str = f"{__version__}"
-    if __release_date__:
-        ver_str += f" ({__release_date__})"
-    ver_str += f" [{commit}]"
-    lines.append(f"version:          {ver_str}")
-    lines.append(f"os:               {os_info}")
-    lines.append(f"python:           {sys.version.split()[0]}")
-    lines.append(f"openai_sdk:       {openai_ver}")
-    lines.append(f"profile:          {profile}")
-    lines.append(f"hermes_home:      {display_hermes_home()}")
-    lines.append(f"model:            {model}")
-    lines.append(f"provider:         {provider}")
-    lines.append(f"terminal:         {backend}")
-
-    # API keys
-    lines.append("")
-    lines.append("api_keys:")
-    api_keys = [
-        ("OPENROUTER_API_KEY", "openrouter"),
-        ("OPENAI_API_KEY", "openai"),
-        ("ANTHROPIC_API_KEY", "anthropic"),
-        ("ANTHROPIC_TOKEN", "anthropic_token"),
-        ("NOUS_API_KEY", "nous"),
-        ("GLM_API_KEY", "glm/zai"),
-        ("ZAI_API_KEY", "zai"),
-        ("KIMI_API_KEY", "kimi"),
-        ("MINIMAX_API_KEY", "minimax"),
-        ("DEEPSEEK_API_KEY", "deepseek"),
-        ("DASHSCOPE_API_KEY", "dashscope"),
-        ("HF_TOKEN", "huggingface"),
-        ("AI_GATEWAY_API_KEY", "ai_gateway"),
-        ("OPENCODE_ZEN_API_KEY", "opencode_zen"),
-        ("OPENCODE_GO_API_KEY", "opencode_go"),
-        ("KILOCODE_API_KEY", "kilocode"),
-        ("FIRECRAWL_API_KEY", "firecrawl"),
-        ("TAVILY_API_KEY", "tavily"),
-        ("BROWSERBASE_API_KEY", "browserbase"),
-        ("FAL_KEY", "fal"),
-        ("ELEVENLABS_API_KEY", "elevenlabs"),
-        ("GITHUB_TOKEN", "github"),
-    ]
-
-    for env_var, label in api_keys:
-        val = os.getenv(env_var, "")
-        if show_keys and val:
-            display = _redact(val)
-        else:
-            display = "set" if val else "not set"
-        lines.append(f"  {label:<20} {display}")
-
-    # Features summary
-    lines.append("")
-    lines.append("features:")
-
-    toolsets = config.get("toolsets", ["hermes-cli"])
-    lines.append(f"  toolsets:           {', '.join(toolsets) if toolsets else '(default)'}")
-    lines.append(f"  mcp_servers:        {_count_mcp_servers(config)}")
-    lines.append(f"  memory_provider:    {_memory_provider(config)}")
-    lines.append(f"  gateway:            {_gateway_status()}")
-
-    platforms = _configured_platforms()
-    lines.append(f"  platforms:          {', '.join(platforms) if platforms else 'none'}")
-    lines.append(f"  cron_jobs:          {_cron_summary(hermes_home)}")
-    lines.append(f"  skills:             {_count_skills(hermes_home)}")
-
-    # Config overrides (non-default values)
-    overrides = _config_overrides(config)
-    if overrides:
-        lines.append("")
-        lines.append("config_overrides:")
-        for key, val in overrides.items():
-            lines.append(f"  {key}: {val}")
-
-    lines.append("--- end dump ---")
-
-    output = "\n".join(lines)
-    print(output)
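
Review note on the removed dump module: the `--show-keys` help text describes showing the first and last four characters of each key, but the `_redact` helper itself is not visible in this hunk. The sketch below is a hypothetical stand-in (the names `redact` and `key_status` are assumptions, not the project's code) showing how the removed per-key loop chose between a redacted value and a plain set/not set marker.

```python
def redact(value: str, keep: int = 4) -> str:
    """Hypothetical stand-in for _redact: keep only the first/last `keep` chars."""
    if not value:
        return ""
    if len(value) <= keep * 2:
        # Too short to show both ends without revealing the whole secret.
        return "*" * len(value)
    return f"{value[:keep]}...{value[-keep:]}"


def key_status(value: str, show_keys: bool) -> str:
    """Mirror the removed loop: redacted value if requested, else set/not set."""
    if show_keys and value:
        return redact(value)
    return "set" if value else "not set"
```

Masking short values entirely is a design choice of this sketch; the real helper may treat them differently.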
--- a/hermes_cli/main.py
+++ b/hermes_cli/main.py
@@ -2643,12 +2643,6 @@ def cmd_doctor(args):
    run_doctor(args)


-def cmd_dump(args):
-    """Dump setup summary for support/debugging."""
-    from hermes_cli.dump import run_dump
-    run_dump(args)
-
-
 def cmd_config(args):
    """Configuration management."""
    from hermes_cli.config import config_command
@@ -4730,22 +4724,6 @@ For more help on a command:
        help="Attempt to fix issues automatically"
    )
    doctor_parser.set_defaults(func=cmd_doctor)
-
-    # =========================================================================
-    # dump command
-    # =========================================================================
-    dump_parser = subparsers.add_parser(
-        "dump",
-        help="Dump setup summary for support/debugging",
-        description="Output a compact, plain-text summary of your Hermes setup "
-                    "that can be copy-pasted into Discord/GitHub for support context"
-    )
-    dump_parser.add_argument(
-        "--show-keys",
-        action="store_true",
-        help="Show redacted API key prefixes (first/last 4 chars) instead of just set/not set"
-    )
-    dump_parser.set_defaults(func=cmd_dump)
    
    # =========================================================================
    # config command
--- a/hermes_cli/model_switch.py
+++ b/hermes_cli/model_switch.py
@@ -733,7 +733,6 @@ def list_authenticated_providers(
        fetch_models_dev,
        get_provider_info as _mdev_pinfo,
    )
-    from hermes_cli.auth import PROVIDER_REGISTRY
    from hermes_cli.models import OPENROUTER_MODELS, _PROVIDER_MODELS

    results: List[dict] = []
@@ -754,16 +753,9 @@ def list_authenticated_providers(
        if not isinstance(pdata, dict):
            continue

-        # Prefer auth.py PROVIDER_REGISTRY for env var names — it's our
-        # source of truth.  models.dev can have wrong mappings (e.g.
-        # minimax-cn → MINIMAX_API_KEY instead of MINIMAX_CN_API_KEY).
-        pconfig = PROVIDER_REGISTRY.get(hermes_id)
-        if pconfig and pconfig.api_key_env_vars:
-            env_vars = list(pconfig.api_key_env_vars)
-        else:
-            env_vars = pdata.get("env", [])
-            if not isinstance(env_vars, list):
-                continue
+        env_vars = pdata.get("env", [])
+        if not isinstance(env_vars, list):
+            continue

        # Check if any env var is set
        has_creds = any(os.environ.get(ev) for ev in env_vars)
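
Review note on the model_switch.py change: the removed comment says models.dev can map a provider to the wrong variable (its example is minimax-cn pointing at MINIMAX_API_KEY instead of MINIMAX_CN_API_KEY). The sketch below contrasts the two lookups under assumed names (`resolve_env_vars`, `has_credentials`, and the plain-dict inputs are illustrative, not the project's API).

```python
from typing import List, Mapping, Optional, Sequence


def resolve_env_vars(
    pdata: Mapping[str, object],
    registry_vars: Optional[Sequence[str]] = None,
) -> Optional[List[str]]:
    """Return the env var names to probe, or None if the entry should be skipped."""
    if registry_vars:
        # Removed behaviour: a local registry entry takes precedence.
        return list(registry_vars)
    env_vars = pdata.get("env", [])
    if not isinstance(env_vars, list):
        return None
    return env_vars


def has_credentials(env_vars: Sequence[str], environ: Mapping[str, str]) -> bool:
    """True if any of the candidate variables is set to a non-empty value."""
    return any(environ.get(ev) for ev in env_vars)
```

With only `MINIMAX_API_KEY` set, the models.dev mapping alone would report credentials for minimax-cn, whereas a registry entry naming `MINIMAX_CN_API_KEY` would not.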
--- a/hermes_cli/profiles.py
+++ b/hermes_cli/profiles.py
@@ -102,7 +102,7 @@ _RESERVED_NAMES = frozenset({
 # Hermes subcommands that cannot be used as profile names/aliases
 _HERMES_SUBCOMMANDS = frozenset({
    "chat", "model", "gateway", "setup", "whatsapp", "login", "logout",
-    "status", "cron", "doctor", "dump", "config", "pairing", "skills", "tools",
+    "status", "cron", "doctor", "config", "pairing", "skills", "tools",
    "mcp", "sessions", "insights", "version", "update", "uninstall",
    "profile", "plugins", "honcho", "acp",
 })
@@ -1007,7 +1007,7 @@ _hermes_completion() {

    # Top-level subcommands
    if [[ "$COMP_CWORD" == 1 ]]; then
-        local commands="chat model gateway setup status cron doctor dump config skills tools mcp sessions profile update version"
+        local commands="chat model gateway setup status cron doctor config skills tools mcp sessions profile update version"
        COMPREPLY=($(compgen -W "$commands" -- "$cur"))
    fi
 }
@@ -1032,7 +1032,7 @@ _hermes() {
    _arguments \\
        '-p[Profile name]:profile:($profiles)' \\
        '--profile[Profile name]:profile:($profiles)' \\
-        '1:command:(chat model gateway setup status cron doctor dump config skills tools mcp sessions profile update version)' \\
+        '1:command:(chat model gateway setup status cron doctor config skills tools mcp sessions profile update version)' \\
        '*::arg:->args'

    case $words[1] in
--- a/hermes_cli/setup.py
+++ b/hermes_cli/setup.py
@@ -2572,120 +2572,9 @@ _OPENCLAW_SCRIPT = (
 )


-def _load_openclaw_migration_module():
-    """Load the openclaw_to_hermes migration script as a module.
-
-    Returns the loaded module, or None if the script can't be loaded.
-    """
-    if not _OPENCLAW_SCRIPT.exists():
-        return None
-
-    spec = importlib.util.spec_from_file_location(
-        "openclaw_to_hermes", _OPENCLAW_SCRIPT
-    )
-    if spec is None or spec.loader is None:
-        return None
-
-    mod = importlib.util.module_from_spec(spec)
-    # Register in sys.modules so @dataclass can resolve the module
-    # (Python 3.11+ requires this for dynamically loaded modules)
-    import sys as _sys
-    _sys.modules[spec.name] = mod
-    try:
-        spec.loader.exec_module(mod)
-    except Exception:
-        _sys.modules.pop(spec.name, None)
-        raise
-    return mod
-
-
-# Item kinds that represent high-impact changes warranting explicit warnings.
-# Gateway tokens/channels can hijack messaging platforms from the old agent.
-# Config values may have different semantics between OpenClaw and Hermes.
-# Instruction/context files (.md) can contain incompatible setup procedures.
-_HIGH_IMPACT_KIND_KEYWORDS = {
-    "gateway": "⚠ Gateway/messaging — this will configure Hermes to use your OpenClaw messaging channels",
-    "telegram": "⚠ Telegram — this will point Hermes at your OpenClaw Telegram bot",
-    "slack": "⚠ Slack — this will point Hermes at your OpenClaw Slack workspace",
-    "discord": "⚠ Discord — this will point Hermes at your OpenClaw Discord bot",
-    "whatsapp": "⚠ WhatsApp — this will point Hermes at your OpenClaw WhatsApp connection",
-    "config": "⚠ Config values — OpenClaw settings may not map 1:1 to Hermes equivalents",
-    "soul": "⚠ Instruction file — may contain OpenClaw-specific setup/restart procedures",
-    "memory": "⚠ Memory/context file — may reference OpenClaw-specific infrastructure",
-    "context": "⚠ Context file — may contain OpenClaw-specific instructions",
-}
-
-
-def _print_migration_preview(report: dict):
-    """Print a detailed dry-run preview of what migration would do.
-
-    Groups items by category and adds explicit warnings for high-impact
-    changes like gateway token takeover and config value differences.
-    """
-    items = report.get("items", [])
-    if not items:
-        print_info("Nothing to migrate.")
-        return
-
-    migrated_items = [i for i in items if i.get("status") == "migrated"]
-    conflict_items = [i for i in items if i.get("status") == "conflict"]
-    skipped_items = [i for i in items if i.get("status") == "skipped"]
-
-    warnings_shown = set()
-
-    if migrated_items:
-        print(color("  Would import:", Colors.GREEN))
-        for item in migrated_items:
-            kind = item.get("kind", "unknown")
-            dest = item.get("destination", "")
-            if dest:
-                dest_short = str(dest).replace(str(Path.home()), "~")
-                print(f"      {kind:<22s} → {dest_short}")
-            else:
-                print(f"      {kind}")
-
-            # Check for high-impact items and collect warnings
-            kind_lower = kind.lower()
-            dest_lower = str(dest).lower()
-            for keyword, warning in _HIGH_IMPACT_KIND_KEYWORDS.items():
-                if keyword in kind_lower or keyword in dest_lower:
-                    warnings_shown.add(warning)
-        print()
-
-    if conflict_items:
-        print(color("  Would overwrite (conflicts with existing Hermes config):", Colors.YELLOW))
-        for item in conflict_items:
-            kind = item.get("kind", "unknown")
-            reason = item.get("reason", "already exists")
-            print(f"      {kind:<22s}  {reason}")
-        print()
-
-    if skipped_items:
-        print(color("  Would skip:", Colors.DIM))
-        for item in skipped_items:
-            kind = item.get("kind", "unknown")
-            reason = item.get("reason", "")
-            print(f"      {kind:<22s}  {reason}")
-        print()
-
-    # Print collected warnings
-    if warnings_shown:
-        print(color("  ── Warnings ──", Colors.YELLOW))
-        for warning in sorted(warnings_shown):
-            print(color(f"    {warning}", Colors.YELLOW))
-        print()
-        print(color("  Note: OpenClaw config values may have different semantics in Hermes.", Colors.YELLOW))
-        print(color("  For example, OpenClaw's tool_call_execution: \"auto\" ≠ Hermes's yolo mode.", Colors.YELLOW))
-        print(color("  Instruction files (.md) from OpenClaw may contain incompatible procedures.", Colors.YELLOW))
-        print()
-
-
 def _offer_openclaw_migration(hermes_home: Path) -> bool:
    """Detect ~/.openclaw and offer to migrate during first-time setup.

-    Runs a dry-run first to show the user exactly what would be imported,
-    overwritten, or taken over. Only executes after explicit confirmation.
-
    Returns True if migration ran successfully, False otherwise.
    """
    openclaw_dir = Path.home() / ".openclaw"
@@ -2698,12 +2587,12 @@ def _offer_openclaw_migration(hermes_home: Path) -> bool:
    print()
    print_header("OpenClaw Installation Detected")
    print_info(f"Found OpenClaw data at {openclaw_dir}")
-    print_info("Hermes can preview what would be imported before making any changes.")
+    print_info("Hermes can import your settings, memories, skills, and API keys.")
    print()

-    if not prompt_yes_no("Would you like to see what can be imported?", default=True):
+    if not prompt_yes_no("Would you like to import from OpenClaw?", default=True):
        print_info(
-            "Skipping migration. You can run it later with: hermes claw migrate --dry-run"
+            "Skipping migration. You can run it later via the openclaw-migration skill."
        )
        return False

@@ -2712,71 +2601,34 @@ def _offer_openclaw_migration(hermes_home: Path) -> bool:
    if not config_path.exists():
        save_config(load_config())

-    # Load the migration module
+    # Dynamically load the migration script
    try:
-        mod = _load_openclaw_migration_module()
-        if mod is None:
+        spec = importlib.util.spec_from_file_location(
+            "openclaw_to_hermes", _OPENCLAW_SCRIPT
+        )
+        if spec is None or spec.loader is None:
            print_warning("Could not load migration script.")
            return False
-    except Exception as e:
-        print_warning(f"Could not load migration script: {e}")
-        logger.debug("OpenClaw migration module load error", exc_info=True)
-        return False

-    # ── Phase 1: Dry-run preview ──
-    try:
+        mod = importlib.util.module_from_spec(spec)
+        # Register in sys.modules so @dataclass can resolve the module
+        # (Python 3.11+ requires this for dynamically loaded modules)
+        import sys as _sys
+        _sys.modules[spec.name] = mod
+        try:
+            spec.loader.exec_module(mod)
+        except Exception:
+            _sys.modules.pop(spec.name, None)
+            raise
+
+        # Run migration with the "full" preset, execute mode, no overwrite
        selected = mod.resolve_selected_options(None, None, preset="full")
-        dry_migrator = mod.Migrator(
-            source_root=openclaw_dir.resolve(),
-            target_root=hermes_home.resolve(),
-            execute=False,  # dry-run — no files modified
-            workspace_target=None,
-            overwrite=True,  # show everything including conflicts
-            migrate_secrets=True,
-            output_dir=None,
-            selected_options=selected,
-            preset_name="full",
-        )
-        preview_report = dry_migrator.migrate()
-    except Exception as e:
-        print_warning(f"Migration preview failed: {e}")
-        logger.debug("OpenClaw migration preview error", exc_info=True)
-        return False
-
-    # Display the full preview
-    preview_summary = preview_report.get("summary", {})
-    preview_count = preview_summary.get("migrated", 0)
-
-    if preview_count == 0:
-        print()
-        print_info("Nothing to import from OpenClaw.")
-        return False
-
-    print()
-    print_header(f"Migration Preview — {preview_count} item(s) would be imported")
-    print_info("No changes have been made yet. Review the list below:")
-    print()
-    _print_migration_preview(preview_report)
-
-    # ── Phase 2: Confirm and execute ──
-    if not prompt_yes_no("Proceed with migration?", default=False):
-        print_info(
-            "Migration cancelled. You can run it later with: hermes claw migrate"
-        )
-        print_info(
-            "Use --dry-run to preview again, or --preset minimal for a lighter import."
-        )
-        return False
-
-    # Execute the migration — overwrite=False so existing Hermes configs are
-    # preserved. The user saw the preview; conflicts are skipped by default.
-    try:
        migrator = mod.Migrator(
            source_root=openclaw_dir.resolve(),
            target_root=hermes_home.resolve(),
            execute=True,
            workspace_target=None,
-            overwrite=False,  # preserve existing Hermes config
+            overwrite=True,
            migrate_secrets=True,
            output_dir=None,
            selected_options=selected,
@@ -2788,7 +2640,7 @@ def _offer_openclaw_migration(hermes_home: Path) -> bool:
        logger.debug("OpenClaw migration error", exc_info=True)
        return False

-    # Print final summary
+    # Print summary
    summary = report.get("summary", {})
    migrated = summary.get("migrated", 0)
    skipped = summary.get("skipped", 0)
@@ -2799,7 +2651,7 @@ def _offer_openclaw_migration(hermes_home: Path) -> bool:
    if migrated:
        print_success(f"Imported {migrated} item(s) from OpenClaw.")
    if conflicts:
-        print_info(f"Skipped {conflicts} item(s) that already exist in Hermes (use hermes claw migrate --overwrite to force).")
+        print_info(f"Skipped {conflicts} item(s) that already exist in Hermes.")
    if skipped:
        print_info(f"Skipped {skipped} item(s) (not found or unchanged).")
    if errors:
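
Review note on the setup.py change: the patch inlines the loader that previously lived in `_load_openclaw_migration_module`. The pattern is shown in isolation below as a minimal sketch; `load_module_from_path` is an assumed name, and the reason for the `sys.modules` registration is taken from the comment in the hunk rather than verified here.

```python
import importlib.util
import sys
from pathlib import Path
from types import ModuleType
from typing import Optional


def load_module_from_path(name: str, path: Path) -> Optional[ModuleType]:
    """Load a standalone script as a module; return None if it cannot be located."""
    if not path.exists():
        return None
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        return None
    module = importlib.util.module_from_spec(spec)
    # Register before executing so code in the script that looks itself up
    # via sys.modules (the hunk cites @dataclass) can resolve the module.
    sys.modules[spec.name] = module
    try:
        spec.loader.exec_module(module)
    except Exception:
        # Do not leave a half-initialised module behind on failure.
        sys.modules.pop(spec.name, None)
        raise
    return module
```

Keeping this in a helper, as the pre-patch code did, lets callers tell "script missing" (None) apart from "script raised" (exception).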
--- a/nix/nixosModules.nix
+++ b/nix/nixosModules.nix
@@ -569,7 +569,7 @@

      # ── Activation: link config + auth + documents ────────────────────
      {
-        system.activationScripts."hermes-agent-setup" = lib.stringAfter ([ "users" ] ++ lib.optional (config.system.activationScripts ? setupSecrets) "setupSecrets") ''
+        system.activationScripts."hermes-agent-setup" = lib.stringAfter [ "users" "setupSecrets" ] ''
          # Ensure directories exist (activation runs before tmpfiles)
          mkdir -p ${cfg.stateDir}/.hermes
          mkdir -p ${cfg.stateDir}/home
--- a/nix/packages.nix
+++ b/nix/packages.nix
@@ -14,7 +14,7 @@
      };

      runtimeDeps = with pkgs; [
-        nodejs_20 ripgrep git openssh ffmpeg tirith
+        nodejs_20 ripgrep git openssh ffmpeg
      ];

      runtimePath = pkgs.lib.makeBinPath runtimeDeps;
--- a/run_agent.py
+++ b/run_agent.py
@@ -77,7 +77,6 @@ from hermes_constants import OPENROUTER_BASE_URL
 # Agent internals extracted to agent/ package for modularity
 from agent.memory_manager import build_memory_context_block
 from agent.retry_utils import jittered_backoff
-from agent.error_classifier import classify_api_error, FailoverReason
 from agent.prompt_builder import (
    DEFAULT_AGENT_IDENTITY, PLATFORM_HINTS,
    MEMORY_GUIDANCE, SESSION_SEARCH_GUIDANCE, SKILLS_GUIDANCE,
@@ -87,7 +86,6 @@ from agent.model_metadata import (
    fetch_model_metadata,
    estimate_tokens_rough, estimate_messages_tokens_rough, estimate_request_tokens_rough,
    get_next_probe_tier, parse_context_limit_from_error,
-    parse_available_output_tokens_from_error,
    save_context_length, is_local_endpoint,
    query_ollama_num_ctx,
 )
@@ -4969,21 +4967,9 @@ class AIAgent:
                # Swap OpenAI client and config in-place
                self.api_key = fb_client.api_key
                self.client = fb_client
-                # Preserve provider-specific headers that
-                # resolve_provider_client() may have baked into
-                # fb_client via the default_headers kwarg.  The OpenAI
-                # SDK stores these in _custom_headers.  Without this,
-                # subsequent request-client rebuilds (via
-                # _create_request_openai_client) drop the headers,
-                # causing 403s from providers like Kimi Coding that
-                # require a User-Agent sentinel.
-                fb_headers = getattr(fb_client, "_custom_headers", None)
-                if not fb_headers:
-                    fb_headers = getattr(fb_client, "default_headers", None)
                self._client_kwargs = {
                    "api_key": fb_client.api_key,
                    "base_url": fb_base_url,
-                    **({"default_headers": dict(fb_headers)} if fb_headers else {}),
                }

            # Re-evaluate prompt caching for the new provider/model
@@ -5398,22 +5384,15 @@ class AIAgent:
        if self.api_mode == "anthropic_messages":
            from agent.anthropic_adapter import build_anthropic_kwargs
            anthropic_messages = self._prepare_anthropic_messages_for_api(api_messages)
-            # Pass context_length (total input+output window) so the adapter can
-            # clamp max_tokens (output cap) when the user configured a smaller
-            # context window than the model's native output limit.
+            # Pass context_length so the adapter can clamp max_tokens if the
+            # user configured a smaller context window than the model's output limit.
            ctx_len = getattr(self, "context_compressor", None)
            ctx_len = ctx_len.context_length if ctx_len else None
-            # _ephemeral_max_output_tokens is set for one call when the API
-            # returns "max_tokens too large given prompt" — it caps output to
-            # the available window space without touching context_length.
-            ephemeral_out = getattr(self, "_ephemeral_max_output_tokens", None)
-            if ephemeral_out is not None:
-                self._ephemeral_max_output_tokens = None  # consume immediately
            return build_anthropic_kwargs(
                model=self.model,
                messages=anthropic_messages,
                tools=self.tools,
-                max_tokens=ephemeral_out if ephemeral_out is not None else self.max_tokens,
+                max_tokens=self.max_tokens,
                reasoning_config=self.reasoning_config,
                is_oauth=self._is_anthropic_oauth,
                preserve_dots=self._anthropic_preserve_dots(),
@@ -7302,7 +7281,6 @@ class AIAgent:
        length_continue_retries = 0
        truncated_response_prefix = ""
        compression_attempts = 0
-        _turn_exit_reason = "unknown"  # Diagnostic: why the loop ended
        
        # Clear any stale interrupt state at start
        self.clear_interrupt()
@@ -7327,7 +7305,6 @@ class AIAgent:
            # Check for interrupt request (e.g., user sent new message)
            if self._interrupt_requested:
                interrupted = True
-                _turn_exit_reason = "interrupted_by_user"
                if not self.quiet_mode:
                    self._safe_print("\n⚡ Breaking out of tool loop due to interrupt...")
                break
@@ -7336,7 +7313,6 @@ class AIAgent:
            self._api_call_count = api_call_count
            self._touch_activity(f"starting API call #{api_call_count}")
            if not self.iteration_budget.consume():
-                _turn_exit_reason = "budget_exhausted"
                if not self.quiet_mode:
                    self._safe_print(f"\n⚠️  Iteration budget exhausted ({self.iteration_budget.used}/{self.iteration_budget.max_total} iterations used)")
                break
@@ -8041,25 +8017,6 @@ class AIAgent:

                    status_code = getattr(api_error, "status_code", None)
                    error_context = self._extract_api_error_context(api_error)
-
-                    # ── Classify the error for structured recovery decisions ──
-                    _compressor = getattr(self, "context_compressor", None)
-                    _ctx_len = getattr(_compressor, "context_length", 200000) if _compressor else 200000
-                    classified = classify_api_error(
-                        api_error,
-                        provider=getattr(self, "provider", "") or "",
-                        model=getattr(self, "model", "") or "",
-                        approx_tokens=approx_tokens,
-                        context_length=_ctx_len,
-                        num_messages=len(api_messages) if api_messages else 0,
-                    )
-                    logger.debug(
-                        "Error classified: reason=%s status=%s retryable=%s compress=%s rotate=%s fallback=%s",
-                        classified.reason.value, classified.status_code,
-                        classified.retryable, classified.should_compress,
-                        classified.should_rotate_credential, classified.should_fallback,
-                    )
-
                    recovered_with_pool, has_retried_429 = self._recover_with_credential_pool(
                        status_code=status_code,
                        has_retried_429=has_retried_429,
@@ -8122,24 +8079,27 @@ class AIAgent:
                    # from all messages so the next retry sends no thinking
                    # blocks at all.  One-shot — don't retry infinitely.
                    if (
-                        classified.reason == FailoverReason.thinking_signature
+                        self.api_mode == "anthropic_messages"
+                        and status_code == 400
                        and not thinking_sig_retry_attempted
                    ):
-                        thinking_sig_retry_attempted = True
-                        for _m in messages:
-                            if isinstance(_m, dict):
-                                _m.pop("reasoning_details", None)
-                        self._vprint(
-                            f"{self.log_prefix}⚠️  Thinking block signature invalid — "
-                            f"stripped all thinking blocks, retrying...",
-                            force=True,
-                        )
-                        logging.warning(
-                            "%sThinking block signature recovery: stripped "
-                            "reasoning_details from %d messages",
-                            self.log_prefix, len(messages),
-                        )
-                        continue
+                        _err_msg_lower = str(api_error).lower()
+                        if "signature" in _err_msg_lower and "thinking" in _err_msg_lower:
+                            thinking_sig_retry_attempted = True
+                            for _m in messages:
+                                if isinstance(_m, dict):
+                                    _m.pop("reasoning_details", None)
+                            self._vprint(
+                                f"{self.log_prefix}⚠️  Thinking block signature invalid — "
+                                f"stripped all thinking blocks, retrying...",
+                                force=True,
+                            )
+                            logging.warning(
+                                "%sThinking block signature recovery: stripped "
+                                "reasoning_details from %d messages",
+                                self.log_prefix, len(messages),
+                            )
+                            continue

                    retry_count += 1
                    elapsed_time = time.time() - api_start_time
@@ -8196,7 +8156,14 @@ class AIAgent:
                    # is NOT a transient rate limit — retrying or switching
                    # credentials won't help.  Reduce context to 200k (the
                    # standard tier) and compress.
-                    if classified.reason == FailoverReason.long_context_tier:
+                    # Only applies to Sonnet — Opus 1M is general access.
+                    _is_long_context_tier_error = (
+                        status_code == 429
+                        and "extra usage" in error_msg
+                        and "long context" in error_msg
+                        and "sonnet" in self.model.lower()
+                    )
+                    if _is_long_context_tier_error:
                        _reduced_ctx = 200000
                        compressor = self.context_compressor
                        old_ctx = compressor.context_length
@@ -8241,9 +8208,13 @@ class AIAgent:
                    # When a fallback model is configured, switch immediately instead
                    # of burning through retries with exponential backoff -- the
                    # primary provider won't recover within the retry window.
-                    is_rate_limited = classified.reason in (
-                        FailoverReason.rate_limit,
-                        FailoverReason.billing,
+                    is_rate_limited = (
+                        status_code == 429
+                        or "rate limit" in error_msg
+                        or "too many requests" in error_msg
+                        or "rate_limit" in error_msg
+                        or "usage limit" in error_msg
+                        or "quota" in error_msg
                    )
                    if is_rate_limited and self._fallback_index < len(self._fallback_chain):
                        # Don't eagerly fallback if credential pool rotation may
@@ -8259,7 +8230,10 @@ class AIAgent:
                                continue

                    is_payload_too_large = (
-                        classified.reason == FailoverReason.payload_too_large
+                        status_code == 413
+                        or 'request entity too large' in error_msg
+                        or 'payload too large' in error_msg
+                        or 'error code: 413' in error_msg
                    )

                    if is_payload_too_large:
@@ -8303,59 +8277,69 @@ class AIAgent:
                            }

                    # Check for context-length errors BEFORE generic 4xx handler.
-                    # The classifier detects context overflow from: explicit error
-                    # messages, generic 400 + large session heuristic (#1630), and
-                    # server disconnect + large session pattern (#2153).
-                    is_context_length_error = (
-                        classified.reason == FailoverReason.context_overflow
-                    )
+                    # Local backends (LM Studio, Ollama, llama.cpp) often return
+                    # HTTP 400 with messages like "Context size has been exceeded"
+                    # which must trigger compression, not an immediate abort.
+                    is_context_length_error = any(phrase in error_msg for phrase in [
+                        'context length', 'context size', 'maximum context',
+                        'token limit', 'too many tokens', 'reduce the length',
+                        'exceeds the limit', 'context window',
+                        'request entity too large',  # OpenRouter/Nous 413 safety net
+                        'prompt is too long',  # Anthropic: "prompt is too long: N tokens > M maximum"
+                        'prompt exceeds max length',  # Z.AI / GLM: generic 400 overflow wording
+                    ])
+
+                    # Fallback heuristic: Anthropic sometimes returns a generic
+                    # 400 invalid_request_error with just "Error" as the message
+                    # when the context is too large.  If the error message is very
+                    # short/generic AND the session is large, treat it as a
+                    # probable context-length error and attempt compression rather
+                    # than aborting.  This prevents an infinite failure loop where
+                    # each failed message gets persisted, making the session even
+                    # larger. (#1630)
+                    if not is_context_length_error and status_code == 400:
+                        ctx_len = getattr(getattr(self, 'context_compressor', None), 'context_length', 200000)
+                        is_large_session = approx_tokens > ctx_len * 0.4 or len(api_messages) > 80
+                        is_generic_error = len(error_msg.strip()) < 30  # e.g. just "error"
+                        if is_large_session and is_generic_error:
+                            is_context_length_error = True
+                            self._vprint(
+                                f"{self.log_prefix}⚠️  Generic 400 with large session "
+                                f"(~{approx_tokens:,} tokens, {len(api_messages)} msgs) — "
+                                f"treating as probable context overflow.",
+                                force=True,
+                            )
+
+                    # Server disconnects on large sessions are often caused by
+                    # the request exceeding the provider's context/payload limit
+                    # without a proper HTTP error response.  Treat these as
+                    # context-length errors to trigger compression rather than
+                    # burning through retries that will all fail the same way.
+                    # This breaks the death spiral: disconnect → no token data
+                    # → no compression → bigger session → more disconnects.
+                    # (#2153)
+                    if not is_context_length_error and not status_code:
+                        _is_server_disconnect = (
+                            'server disconnected' in error_msg
+                            or 'peer closed connection' in error_msg
+                            or error_type in ('ReadError', 'RemoteProtocolError', 'ServerDisconnectedError')
+                        )
+                        if _is_server_disconnect:
+                            ctx_len = getattr(getattr(self, 'context_compressor', None), 'context_length', 200000)
+                            _is_large = approx_tokens > ctx_len * 0.6 or len(api_messages) > 200
+                            if _is_large:
+                                is_context_length_error = True
+                                self._vprint(
+                                    f"{self.log_prefix}⚠️  Server disconnected with large session "
+                                    f"(~{approx_tokens:,} tokens, {len(api_messages)} msgs) — "
+                                    f"treating as context-length error, attempting compression.",
+                                    force=True,
+                                )

                    if is_context_length_error:
                        compressor = self.context_compressor
                        old_ctx = compressor.context_length

-                        # ── Distinguish two very different errors ───────────
-                        # 1. "Prompt too long": the INPUT exceeds the context window.
-                        #    Fix: reduce context_length + compress history.
-                        # 2. "max_tokens too large": input is fine, but
-                        #    input_tokens + requested max_tokens > context_window.
-                        #    Fix: reduce max_tokens (the OUTPUT cap) for this call.
-                        #    Do NOT shrink context_length — the window is unchanged.
-                        #
-                        # Note: max_tokens = output token cap (one response).
-                        #       context_length = total window (input + output combined).
-                        available_out = parse_available_output_tokens_from_error(error_msg)
-                        if available_out is not None:
-                            # Error is purely about the output cap being too large.
-                            # Cap output to the available space and retry without
-                            # touching context_length or triggering compression.
-                            safe_out = max(1, available_out - 64)  # small safety margin
-                            self._ephemeral_max_output_tokens = safe_out
-                            self._vprint(
-                                f"{self.log_prefix}⚠️  Output cap too large for current prompt — "
-                                f"retrying with max_tokens={safe_out:,} "
-                                f"(available_tokens={available_out:,}; context_length unchanged at {old_ctx:,})",
-                                force=True,
-                            )
-                            # Still count against compression_attempts so we don't
-                            # loop forever if the error keeps recurring.
-                            compression_attempts += 1
-                            if compression_attempts > max_compression_attempts:
-                                self._vprint(f"{self.log_prefix}❌ Max compression attempts ({max_compression_attempts}) reached.", force=True)
-                                self._vprint(f"{self.log_prefix}   💡 Try /new to start a fresh conversation, or /compress to retry compression.", force=True)
-                                logging.error(f"{self.log_prefix}Context compression failed after {max_compression_attempts} attempts.")
-                                self._persist_session(messages, conversation_history)
-                                return {
-                                    "messages": messages,
-                                    "completed": False,
-                                    "api_calls": api_call_count,
-                                    "error": f"Context length exceeded: max compression attempts ({max_compression_attempts}) reached.",
-                                    "partial": True
-                                }
-                            restart_with_compressed_messages = True
-                            break
-
-                        # Error is about the INPUT being too large — reduce context_length.
                        # Try to parse the actual limit from the error message
                        parsed_limit = parse_context_limit_from_error(error_msg)
                        if parsed_limit and parsed_limit < old_ctx:
@@ -8422,30 +8406,35 @@ class AIAgent:
                                "partial": True
                            }

-                    # Check for non-retryable client errors.  The classifier
-                    # already accounts for 413, 429, 529 (transient), context
-                    # overflow, and generic-400 heuristics.  Local validation
-                    # errors (ValueError, TypeError) are programming bugs.
+                    # Check for non-retryable client errors (4xx HTTP status codes).
+                    # These indicate a problem with the request itself (bad model ID,
+                    # invalid API key, forbidden, etc.) and will never succeed on retry.
+                    # Note: 413 and context-length errors are excluded — handled above.
+                    # 429 (rate limit) is transient and MUST be retried with backoff.
+                    # 529 (Anthropic overloaded) is also transient.
+                    # Also catch local validation errors (ValueError, TypeError) — these
+                    # are programming bugs, not transient failures.
+                    # Exclude UnicodeEncodeError — it's a ValueError subclass but is
+                    # handled separately by the surrogate sanitization path above.
+                    _RETRYABLE_STATUS_CODES = {413, 429, 529}
                    is_local_validation_error = (
                        isinstance(api_error, (ValueError, TypeError))
                        and not isinstance(api_error, UnicodeEncodeError)
                    )
-                    is_client_error = (
-                        is_local_validation_error
-                        or (
-                            not classified.retryable
-                            and not classified.should_compress
-                            and classified.reason not in (
-                                FailoverReason.rate_limit,
-                                FailoverReason.billing,
-                                FailoverReason.overloaded,
-                                FailoverReason.context_overflow,
-                                FailoverReason.payload_too_large,
-                                FailoverReason.long_context_tier,
-                                FailoverReason.thinking_signature,
-                            )
-                        )
-                    ) and not is_context_length_error
+                    # Detect generic 400s from Anthropic OAuth (transient server-side failures).
+                    # Real invalid_request_error responses include a descriptive message;
+                    # transient ones contain only "Error" or are empty. (ref: issue #1608)
+                    _err_body = getattr(api_error, "body", None) or {}
+                    _err_message = ((_err_body.get("error") or {}).get("message") or "") if isinstance(_err_body, dict) and isinstance(_err_body.get("error") or {}, dict) else ""
+                    _is_generic_400 = (status_code == 400 and _err_message.strip().lower() in ("error", ""))
+                    is_client_status_error = isinstance(status_code, int) and 400 <= status_code < 500 and status_code not in _RETRYABLE_STATUS_CODES and not _is_generic_400
+                    is_client_error = (is_local_validation_error or is_client_status_error or any(phrase in error_msg for phrase in [
+                        'error code: 401', 'error code: 403',
+                        'error code: 404', 'error code: 422',
+                        'is not a valid model', 'invalid model', 'model not found',
+                        'invalid api key', 'invalid_api_key', 'authentication',
+                        'unauthorized', 'forbidden', 'not found',
+                    ])) and not is_context_length_error

                    if is_client_error:
                        # Try fallback before aborting — a different provider
@@ -8465,7 +8454,7 @@ class AIAgent:
                        self._vprint(f"{self.log_prefix}   🔌 Provider: {_provider}  Model: {_model}", force=True)
                        self._vprint(f"{self.log_prefix}   🌐 Endpoint: {_base}", force=True)
                        # Actionable guidance for common auth errors
-                        if classified.is_auth or classified.reason == FailoverReason.billing:
+                        if status_code in (401, 403) or "unauthorized" in error_msg or "forbidden" in error_msg or "permission" in error_msg:
                            if _provider == "openai-codex" and status_code == 401:
                                self._vprint(f"{self.log_prefix}   💡 Codex OAuth token was rejected (HTTP 401). Your token may have been", force=True)
                                self._vprint(f"{self.log_prefix}      refreshed by another client (Codex CLI, VS Code). To fix:", force=True)
@@ -8625,7 +8614,6 @@ class AIAgent:
            
            # If the API call was interrupted, skip response processing
            if interrupted:
-                _turn_exit_reason = "interrupted_during_api_call"
                break

            if restart_with_compressed_messages:
@@ -8645,7 +8633,6 @@ class AIAgent:
            # (e.g. repeated context-length errors that exhausted retry_count),
            # the `response` variable is still None. Break out cleanly.
            if response is None:
-                _turn_exit_reason = "all_retries_exhausted_no_response"
                print(f"{self.log_prefix}❌ All API retries exhausted with no successful response.")
                self._persist_session(messages, conversation_history)
                break
@@ -9109,7 +9096,6 @@ class AIAgent:
                        # instead of wasting API calls on retries that won't help.
                        fallback = getattr(self, '_last_content_with_tools', None)
                        if fallback:
-                            _turn_exit_reason = "fallback_prior_turn_content"
                            logger.debug("Empty follow-up after tool calls — using prior turn content as final response")
                            self._last_content_with_tools = None
                            self._empty_content_retries = 0
@@ -9176,7 +9162,6 @@ class AIAgent:
                        # Exhausted prefill attempts, empty retries, or
                        # structured reasoning with no content —
                        # fall through to "(empty)" terminal.
-                        _turn_exit_reason = "empty_response_exhausted"
                        reasoning_text = self._extract_reasoning(assistant_message)
                        assistant_msg = self._build_assistant_message(assistant_message, finish_reason)
                        assistant_msg["content"] = "(empty)"
@@ -9248,7 +9233,6 @@ class AIAgent:

                    messages.append(final_msg)
                    
-                    _turn_exit_reason = f"text_response(finish_reason={finish_reason})"
                    if not self.quiet_mode:
                        self._safe_print(f"🎉 Conversation completed after {api_call_count} OpenAI-compatible API call(s)")
                    break
@@ -9298,7 +9282,6 @@ class AIAgent:

                # If we're near the limit, break to avoid infinite loops
                if api_call_count >= self.max_iterations - 1:
-                    _turn_exit_reason = f"error_near_max_iterations({error_msg[:80]})"
                    final_response = f"I apologize, but I encountered repeated errors: {error_msg}"
                    # Append as assistant so the history stays valid for
                    # session resume (avoids consecutive user messages).
@@ -9309,7 +9292,6 @@ class AIAgent:
            api_call_count >= self.max_iterations
            or self.iteration_budget.remaining <= 0
        ):
-            _turn_exit_reason = f"max_iterations_reached({api_call_count}/{self.max_iterations})"
            if self.iteration_budget.remaining <= 0 and not self.quiet_mode:
                print(f"\n⚠️  Iteration budget exhausted ({self.iteration_budget.used}/{self.iteration_budget.max_total} iterations used)")
            final_response = self._handle_max_iterations(messages, api_call_count)
@@ -9326,49 +9308,6 @@ class AIAgent:
        # Persist session to both JSON log and SQLite
        self._persist_session(messages, conversation_history)

-        # ── Turn-exit diagnostic log ─────────────────────────────────────
-        # Always logged at INFO so agent.log captures WHY every turn ended.
-        # When the last message is a tool result (agent was mid-work), log
-        # at WARNING — this is the "just stops" scenario users report.
-        _last_msg_role = messages[-1].get("role") if messages else None
-        _last_tool_name = None
-        if _last_msg_role == "tool":
-            # Walk back to find the assistant message with the tool call
-            for _m in reversed(messages):
-                if _m.get("role") == "assistant" and _m.get("tool_calls"):
-                    _tcs = _m["tool_calls"]
-                    if _tcs and isinstance(_tcs[0], dict):
-                        _last_tool_name = _tcs[-1].get("function", {}).get("name")
-                    break
-
-        _turn_tool_count = sum(
-            1 for m in messages
-            if isinstance(m, dict) and m.get("role") == "assistant" and m.get("tool_calls")
-        )
-        _resp_len = len(final_response) if final_response else 0
-        _budget_used = self.iteration_budget.used if self.iteration_budget else 0
-        _budget_max = self.iteration_budget.max_total if self.iteration_budget else 0
-
-        _diag_msg = (
-            "Turn ended: reason=%s model=%s api_calls=%d/%d budget=%d/%d "
-            "tool_turns=%d last_msg_role=%s response_len=%d session=%s"
-        )
-        _diag_args = (
-            _turn_exit_reason, self.model, api_call_count, self.max_iterations,
-            _budget_used, _budget_max,
-            _turn_tool_count, _last_msg_role, _resp_len,
-            self.session_id or "none",
-        )
-
-        if _last_msg_role == "tool" and not interrupted:
-            # Agent was mid-work — this is the "just stops" case.
-            logger.warning(
-                "Turn ended with pending tool result (agent may appear stuck). "
-                + _diag_msg + " last_tool=%s",
-                *_diag_args, _last_tool_name,
-            )
-        else:
-            logger.info(_diag_msg, *_diag_args)

        # Plugin hook: post_llm_call
        # Fired once per turn after the tool-calling loop completes.
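
The two overflow heuristics restored in the run_agent.py hunks above (a terse HTTP 400 on a large session, and a server disconnect on a very large session) can be read as one predicate. The following is a minimal sketch under stated assumptions, not the code in the patch: the function name `looks_like_context_overflow`, the constant `DISCONNECT_ERROR_TYPES`, and the `n_messages` parameter are hypothetical, the patch keeps this logic inline in `AIAgent`, and the sketch assumes the error message is matched in lowercase. The thresholds (0.4 and 80 messages, 0.6 and 200 messages, a 200000-token default) are copied from the added lines.

```python
from typing import Optional

# Hypothetical constant: the exception type names the patch checks for.
DISCONNECT_ERROR_TYPES = ("ReadError", "RemoteProtocolError", "ServerDisconnectedError")


def looks_like_context_overflow(
    status_code: Optional[int],
    error_msg: str,
    error_type: str,
    approx_tokens: int,
    n_messages: int,
    ctx_len: int = 200_000,
) -> bool:
    """Return True when a failure is probably a context overflow in disguise."""
    msg = error_msg.lower()  # assumption: matching is done on lowercased text

    # Heuristic 1: a terse HTTP 400 (e.g. just "error") on a large session.
    if status_code == 400:
        is_large = approx_tokens > ctx_len * 0.4 or n_messages > 80
        is_generic = len(msg.strip()) < 30
        return is_large and is_generic

    # Heuristic 2: a dropped connection (no status code) on a very large session.
    if not status_code:
        is_disconnect = (
            "server disconnected" in msg
            or "peer closed connection" in msg
            or error_type in DISCONNECT_ERROR_TYPES
        )
        is_large = approx_tokens > ctx_len * 0.6 or n_messages > 200
        return is_disconnect and is_large

    return False
```

The disconnect path uses stricter thresholds than the 400 path, presumably because a dropped connection is weaker evidence of an oversized request than an explicit 400. A descriptive 400 message of 30 characters or more does not trigger the first heuristic; in the patch such messages are left to the explicit context-length checks.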
--- a/skills/autonomous-ai-agents/hermes-agent/SKILL.md
+++ b/skills/autonomous-ai-agents/hermes-agent/SKILL.md
@@ -249,6 +249,7 @@ Type these during an interactive chat session.
 /config              Show config (CLI)
 /model [name]        Show or change model
 /provider            Show provider info
+/prompt [text]       View/set system prompt (CLI)
 /personality [name]  Set personality
 /reasoning [level]   Set reasoning (none|low|medium|high|xhigh|show|hide)
 /verbose             Cycle: off → new → all → verbose
--- a/tests/agent/test_error_classifier.py
+++ b/tests/agent/test_error_classifier.py
@@ -1,782 +0,0 @@
-"""Tests for agent.error_classifier — structured API error classification."""
-
-import pytest
-from agent.error_classifier import (
-    ClassifiedError,
-    FailoverReason,
-    classify_api_error,
-    _extract_status_code,
-    _extract_error_body,
-    _extract_error_code,
-    _classify_402,
-)
-
-
-# ── Helper: mock API errors ────────────────────────────────────────────
-
-class MockAPIError(Exception):
-    """Simulates an OpenAI SDK APIStatusError."""
-    def __init__(self, message, status_code=None, body=None):
-        super().__init__(message)
-        self.status_code = status_code
-        self.body = body or {}
-
-
-class MockTransportError(Exception):
-    """Simulates a transport-level error with a specific type name."""
-    pass
-
-
-class ReadTimeout(MockTransportError):
-    pass
-
-
-class ConnectError(MockTransportError):
-    pass
-
-
-class RemoteProtocolError(MockTransportError):
-    pass
-
-
-class ServerDisconnectedError(MockTransportError):
-    pass
-
-
-# ── Test: FailoverReason enum ──────────────────────────────────────────
-
-class TestFailoverReason:
-    def test_all_reasons_have_string_values(self):
-        for reason in FailoverReason:
-            assert isinstance(reason.value, str)
-
-    def test_enum_members_exist(self):
-        expected = {
-            "auth", "auth_permanent", "billing", "rate_limit",
-            "overloaded", "server_error", "timeout",
-            "context_overflow", "payload_too_large",
-            "model_not_found", "format_error",
-            "thinking_signature", "long_context_tier", "unknown",
-        }
-        actual = {r.value for r in FailoverReason}
-        assert expected == actual
-
-
-# ── Test: ClassifiedError ──────────────────────────────────────────────
-
-class TestClassifiedError:
-    def test_is_auth_property(self):
-        e1 = ClassifiedError(reason=FailoverReason.auth)
-        assert e1.is_auth is True
-
-        e2 = ClassifiedError(reason=FailoverReason.auth_permanent)
-        assert e2.is_auth is True
-
-        e3 = ClassifiedError(reason=FailoverReason.billing)
-        assert e3.is_auth is False
-
-    def test_is_transient_property(self):
-        transient_reasons = [
-            FailoverReason.rate_limit,
-            FailoverReason.overloaded,
-            FailoverReason.server_error,
-            FailoverReason.timeout,
-            FailoverReason.unknown,
-        ]
-        for reason in transient_reasons:
-            e = ClassifiedError(reason=reason)
-            assert e.is_transient is True, f"{reason} should be transient"
-
-        non_transient = [
-            FailoverReason.auth,
-            FailoverReason.billing,
-            FailoverReason.model_not_found,
-            FailoverReason.format_error,
-        ]
-        for reason in non_transient:
-            e = ClassifiedError(reason=reason)
-            assert e.is_transient is False, f"{reason} should NOT be transient"
-
-    def test_defaults(self):
-        e = ClassifiedError(reason=FailoverReason.unknown)
-        assert e.retryable is True
-        assert e.should_compress is False
-        assert e.should_rotate_credential is False
-        assert e.should_fallback is False
-        assert e.status_code is None
-        assert e.message == ""
-
-
-# ── Test: Status code extraction ───────────────────────────────────────
-
-class TestExtractStatusCode:
-    def test_from_status_code_attr(self):
-        e = MockAPIError("fail", status_code=429)
-        assert _extract_status_code(e) == 429
-
-    def test_from_status_attr(self):
-        class ErrWithStatus(Exception):
-            status = 503
-        assert _extract_status_code(ErrWithStatus()) == 503
-
-    def test_from_cause_chain(self):
-        inner = MockAPIError("inner", status_code=401)
-        outer = Exception("outer")
-        outer.__cause__ = inner
-        assert _extract_status_code(outer) == 401
-
-    def test_none_when_missing(self):
-        assert _extract_status_code(Exception("generic")) is None
-
-    def test_rejects_non_http_status(self):
-        """Integers outside 100-599 on .status should be ignored."""
-        class ErrWeirdStatus(Exception):
-            status = 42
-        assert _extract_status_code(ErrWeirdStatus()) is None
-
-
-# ── Test: Error body extraction ────────────────────────────────────────
-
-class TestExtractErrorBody:
-    def test_from_body_attr(self):
-        e = MockAPIError("fail", body={"error": {"message": "bad"}})
-        assert _extract_error_body(e) == {"error": {"message": "bad"}}
-
-    def test_empty_when_no_body(self):
-        assert _extract_error_body(Exception("generic")) == {}
-
-
-# ── Test: Error code extraction ────────────────────────────────────────
-
-class TestExtractErrorCode:
-    def test_from_nested_error_code(self):
-        body = {"error": {"code": "rate_limit_exceeded"}}
-        assert _extract_error_code(body) == "rate_limit_exceeded"
-
-    def test_from_nested_error_type(self):
-        body = {"error": {"type": "invalid_request_error"}}
-        assert _extract_error_code(body) == "invalid_request_error"
-
-    def test_from_top_level_code(self):
-        body = {"code": "model_not_found"}
-        assert _extract_error_code(body) == "model_not_found"
-
-    def test_empty_when_no_code(self):
-        assert _extract_error_code({}) == ""
-        assert _extract_error_code({"error": {"message": "oops"}}) == ""
-
-
-# ── Test: 402 disambiguation ───────────────────────────────────────────
-
-class TestClassify402:
-    """The critical 402 billing vs rate_limit disambiguation."""
-
-    def test_billing_exhaustion(self):
-        """Plain 402 = billing."""
-        result = _classify_402(
-            "payment required",
-            lambda reason, **kw: ClassifiedError(reason=reason, **kw),
-        )
-        assert result.reason == FailoverReason.billing
-        assert result.should_rotate_credential is True
-
-    def test_transient_usage_limit(self):
-        """402 with 'usage limit' + 'try again' = rate limit, not billing."""
-        result = _classify_402(
-            "usage limit exceeded. try again in 5 minutes",
-            lambda reason, **kw: ClassifiedError(reason=reason, **kw),
-        )
-        assert result.reason == FailoverReason.rate_limit
-        assert result.should_rotate_credential is True
-
-    def test_quota_with_retry(self):
-        """402 with 'quota' + 'retry' = rate limit."""
-        result = _classify_402(
-            "quota exceeded, please retry after the window resets",
-            lambda reason, **kw: ClassifiedError(reason=reason, **kw),
-        )
-        assert result.reason == FailoverReason.rate_limit
-
-    def test_quota_without_retry(self):
-        """402 with just 'quota' but no transient signal = billing."""
-        result = _classify_402(
-            "quota exceeded",
-            lambda reason, **kw: ClassifiedError(reason=reason, **kw),
-        )
-        assert result.reason == FailoverReason.billing
-
-    def test_insufficient_credits(self):
-        result = _classify_402(
-            "insufficient credits to complete request",
-            lambda reason, **kw: ClassifiedError(reason=reason, **kw),
-        )
-        assert result.reason == FailoverReason.billing
-
-
-# ── Test: Full classification pipeline ─────────────────────────────────
-
-class TestClassifyApiError:
-    """End-to-end classification tests."""
-
-    # ── Auth errors ──
-
-    def test_401_classified_as_auth(self):
-        e = MockAPIError("Unauthorized", status_code=401)
-        result = classify_api_error(e, provider="openrouter")
-        assert result.reason == FailoverReason.auth
-        assert result.should_rotate_credential is True
-        # 401 is non-retryable on its own — credential rotation runs
-        # before the retryability check in the agent loop.
-        assert result.retryable is False
-        assert result.should_fallback is True
-
-    def test_403_classified_as_auth(self):
-        e = MockAPIError("Forbidden", status_code=403)
-        result = classify_api_error(e, provider="anthropic")
-        assert result.reason == FailoverReason.auth
-        assert result.should_fallback is True
-
-    def test_403_key_limit_classified_as_billing(self):
-        """OpenRouter 403 'key limit exceeded' is billing, not auth."""
-        e = MockAPIError("Key limit exceeded for this key", status_code=403)
-        result = classify_api_error(e, provider="openrouter")
-        assert result.reason == FailoverReason.billing
-        assert result.should_rotate_credential is True
-        assert result.should_fallback is True
-
-    def test_403_spending_limit_classified_as_billing(self):
-        e = MockAPIError("spending limit reached", status_code=403)
-        result = classify_api_error(e, provider="openrouter")
-        assert result.reason == FailoverReason.billing
-
-    # ── Billing ──
-
-    def test_402_plain_billing(self):
-        e = MockAPIError("Payment Required", status_code=402)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.billing
-        assert result.retryable is False
-
-    def test_402_transient_usage_limit(self):
-        e = MockAPIError("usage limit exceeded, try again later", status_code=402)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.rate_limit
-        assert result.retryable is True
-
-    # ── Rate limit ──
-
-    def test_429_rate_limit(self):
-        e = MockAPIError("Too Many Requests", status_code=429)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.rate_limit
-        assert result.should_fallback is True
-
-    # ── Server errors ──
-
-    def test_500_server_error(self):
-        e = MockAPIError("Internal Server Error", status_code=500)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.server_error
-        assert result.retryable is True
-
-    def test_502_server_error(self):
-        e = MockAPIError("Bad Gateway", status_code=502)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.server_error
-
-    def test_503_overloaded(self):
-        e = MockAPIError("Service Unavailable", status_code=503)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.overloaded
-
-    def test_529_anthropic_overloaded(self):
-        e = MockAPIError("Overloaded", status_code=529)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.overloaded
-
-    # ── Model not found ──
-
-    def test_404_model_not_found(self):
-        e = MockAPIError("model not found", status_code=404)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.model_not_found
-        assert result.should_fallback is True
-        assert result.retryable is False
-
-    def test_404_generic(self):
-        e = MockAPIError("Not Found", status_code=404)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.model_not_found
-
-    # ── Payload too large ──
-
-    def test_413_payload_too_large(self):
-        e = MockAPIError("Request Entity Too Large", status_code=413)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.payload_too_large
-        assert result.should_compress is True
-
-    # ── Context overflow ──
-
-    def test_400_context_length(self):
-        e = MockAPIError("context length exceeded: 250000 > 200000", status_code=400)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.context_overflow
-        assert result.should_compress is True
-
-    def test_400_too_many_tokens(self):
-        e = MockAPIError("This model's maximum context is 128000 tokens, too many tokens", status_code=400)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.context_overflow
-
-    def test_400_prompt_too_long(self):
-        e = MockAPIError("prompt is too long: 300000 tokens > 200000 maximum", status_code=400)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.context_overflow
-
-    def test_400_generic_large_session(self):
-        """Generic 400 with large session → context overflow heuristic."""
-        e = MockAPIError(
-            "Error",
-            status_code=400,
-            body={"error": {"message": "Error"}},
-        )
-        result = classify_api_error(e, approx_tokens=100000, context_length=200000)
-        assert result.reason == FailoverReason.context_overflow
-
-    def test_400_generic_small_session_is_format_error(self):
-        """Generic 400 with small session → format error, not context overflow."""
-        e = MockAPIError(
-            "Error",
-            status_code=400,
-            body={"error": {"message": "Error"}},
-        )
-        result = classify_api_error(e, approx_tokens=1000, context_length=200000)
-        assert result.reason == FailoverReason.format_error
-
-    # ── Server disconnect + large session ──
-
-    def test_disconnect_large_session_context_overflow(self):
-        """Server disconnect with large session → context overflow."""
-        e = Exception("server disconnected without sending complete message")
-        result = classify_api_error(e, approx_tokens=150000, context_length=200000)
-        assert result.reason == FailoverReason.context_overflow
-        assert result.should_compress is True
-
-    def test_disconnect_small_session_timeout(self):
-        """Server disconnect with small session → timeout."""
-        e = Exception("server disconnected without sending complete message")
-        result = classify_api_error(e, approx_tokens=5000, context_length=200000)
-        assert result.reason == FailoverReason.timeout
-
-    # ── Provider-specific: Anthropic thinking signature ──
-
-    def test_anthropic_thinking_signature(self):
-        e = MockAPIError(
-            "thinking block has invalid signature",
-            status_code=400,
-        )
-        result = classify_api_error(e, provider="anthropic")
-        assert result.reason == FailoverReason.thinking_signature
-        assert result.retryable is True
-
-    def test_non_anthropic_400_with_signature_not_classified_as_thinking(self):
-        """400 with 'signature' but from non-Anthropic → format error."""
-        e = MockAPIError("invalid signature", status_code=400)
-        result = classify_api_error(e, provider="openrouter", approx_tokens=0)
-        # Without "thinking" in the message, it shouldn't be thinking_signature
-        assert result.reason != FailoverReason.thinking_signature
-
-    # ── Provider-specific: Anthropic long-context tier ──
-
-    def test_anthropic_long_context_tier(self):
-        e = MockAPIError(
-            "Extra usage is required for long context requests over 200k tokens",
-            status_code=429,
-        )
-        result = classify_api_error(e, provider="anthropic", model="claude-sonnet-4")
-        assert result.reason == FailoverReason.long_context_tier
-        assert result.should_compress is True
-
-    def test_normal_429_not_long_context(self):
-        """Normal 429 without 'extra usage' + 'long context' → rate_limit."""
-        e = MockAPIError("Too Many Requests", status_code=429)
-        result = classify_api_error(e, provider="anthropic")
-        assert result.reason == FailoverReason.rate_limit
-
-    # ── Transport errors ──
-
-    def test_read_timeout(self):
-        e = ReadTimeout("Read timed out")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.timeout
-        assert result.retryable is True
-
-    def test_connect_error(self):
-        e = ConnectError("Connection refused")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.timeout
-
-    def test_connection_error_builtin(self):
-        e = ConnectionError("Connection reset by peer")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.timeout
-
-    def test_timeout_error_builtin(self):
-        e = TimeoutError("timed out")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.timeout
-
-    # ── Error code classification ──
-
-    def test_error_code_resource_exhausted(self):
-        e = MockAPIError(
-            "Resource exhausted",
-            body={"error": {"code": "resource_exhausted", "message": "Too many requests"}},
-        )
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.rate_limit
-
-    def test_error_code_model_not_found(self):
-        e = MockAPIError(
-            "Model not available",
-            body={"error": {"code": "model_not_found"}},
-        )
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.model_not_found
-
-    def test_error_code_context_length_exceeded(self):
-        e = MockAPIError(
-            "Context too large",
-            body={"error": {"code": "context_length_exceeded"}},
-        )
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.context_overflow
-
-    # ── Message-only patterns (no status code) ──
-
-    def test_message_billing_pattern(self):
-        e = Exception("insufficient credits to complete this request")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.billing
-
-    def test_message_rate_limit_pattern(self):
-        e = Exception("rate limit reached for this model")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.rate_limit
-
-    def test_message_auth_pattern(self):
-        e = Exception("invalid api key provided")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.auth
-
-    def test_message_model_not_found_pattern(self):
-        e = Exception("gpt-99 is not a valid model")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.model_not_found
-
-    def test_message_context_overflow_pattern(self):
-        e = Exception("maximum context length exceeded")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.context_overflow
-
-    # ── Unknown / fallback ──
-
-    def test_generic_exception_is_unknown(self):
-        e = Exception("something weird happened")
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.unknown
-        assert result.retryable is True
-
-    # ── Format error ──
-
-    def test_400_descriptive_format_error(self):
-        """400 with descriptive message (not context overflow) → format error."""
-        e = MockAPIError(
-            "Invalid value for parameter 'temperature': must be between 0 and 2",
-            status_code=400,
-            body={"error": {"message": "Invalid value for parameter 'temperature': must be between 0 and 2"}},
-        )
-        result = classify_api_error(e, approx_tokens=1000)
-        assert result.reason == FailoverReason.format_error
-        assert result.retryable is False
-
-    def test_422_format_error(self):
-        e = MockAPIError("Unprocessable Entity", status_code=422)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.format_error
-        assert result.retryable is False
-
-    def test_400_flat_body_descriptive_not_context_overflow(self):
-        """Responses API flat body with descriptive error + large session → format error.
-
-        The Codex Responses API returns errors in flat body format:
-        {"message": "...", "type": "..."} without an "error" wrapper.
-        A descriptive 400 must NOT be misclassified as context overflow
-        just because the session is large.
-        """
-        e = MockAPIError(
-            "Invalid 'input[index].name': string does not match pattern.",
-            status_code=400,
-            body={"message": "Invalid 'input[index].name': string does not match pattern.",
-                  "type": "invalid_request_error"},
-        )
-        result = classify_api_error(e, approx_tokens=200000, context_length=400000, num_messages=500)
-        assert result.reason == FailoverReason.format_error
-        assert result.retryable is False
-
-    def test_400_flat_body_generic_large_session_still_context_overflow(self):
-        """Flat body with generic 'Error' message + large session → context overflow.
-
-        Regression: the flat-body fallback must not break the existing heuristic
-        for genuinely generic errors from providers that use flat bodies.
-        """
-        e = MockAPIError(
-            "Error",
-            status_code=400,
-            body={"message": "Error"},
-        )
-        result = classify_api_error(e, approx_tokens=100000, context_length=200000)
-        assert result.reason == FailoverReason.context_overflow
-
-    # ── Peer closed + large session ──
-
-    def test_peer_closed_large_session(self):
-        e = Exception("peer closed connection without sending complete message")
-        result = classify_api_error(e, approx_tokens=130000, context_length=200000)
-        assert result.reason == FailoverReason.context_overflow
-
-    # ── Chinese error messages ──
-
-    def test_chinese_context_overflow(self):
-        e = MockAPIError("超过最大长度限制", status_code=400)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.context_overflow
-
-    # ── Result metadata ──
-
-    def test_provider_and_model_in_result(self):
-        e = MockAPIError("fail", status_code=500)
-        result = classify_api_error(e, provider="openrouter", model="gpt-5")
-        assert result.provider == "openrouter"
-        assert result.model == "gpt-5"
-        assert result.status_code == 500
-
-    def test_message_extracted(self):
-        e = MockAPIError(
-            "outer",
-            status_code=500,
-            body={"error": {"message": "Internal server error occurred"}},
-        )
-        result = classify_api_error(e)
-        assert result.message == "Internal server error occurred"
-
-
-# ── Test: Adversarial / edge cases (from live testing) ─────────────────
-
-class TestAdversarialEdgeCases:
-    """Edge cases discovered during live testing with real SDK objects."""
-
-    def test_empty_exception_message(self):
-        result = classify_api_error(Exception(""))
-        assert result.reason == FailoverReason.unknown
-        assert result.retryable is True
-
-    def test_500_with_none_body(self):
-        e = MockAPIError("fail", status_code=500, body=None)
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.server_error
-
-    def test_non_dict_body(self):
-        """Some providers return strings instead of JSON."""
-        class StringBodyError(Exception):
-            status_code = 400
-            body = "just a string"
-        result = classify_api_error(StringBodyError("bad"))
-        assert result.reason == FailoverReason.format_error
-
-    def test_list_body(self):
-        class ListBodyError(Exception):
-            status_code = 500
-            body = [{"error": "something"}]
-        result = classify_api_error(ListBodyError("server error"))
-        assert result.reason == FailoverReason.server_error
-
-    def test_circular_cause_chain(self):
-        """Must not infinite-loop on circular __cause__."""
-        e = Exception("circular")
-        e.__cause__ = e
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.unknown
-
-    def test_three_level_cause_chain(self):
-        inner = MockAPIError("inner", status_code=429)
-        middle = Exception("middle")
-        middle.__cause__ = inner
-        outer = RuntimeError("outer")
-        outer.__cause__ = middle
-        result = classify_api_error(outer)
-        assert result.status_code == 429
-        assert result.reason == FailoverReason.rate_limit
-
-    def test_400_with_rate_limit_text(self):
-        """Some providers send rate limits as 400 instead of 429."""
-        e = MockAPIError(
-            "rate limit policy",
-            status_code=400,
-            body={"error": {"message": "rate limit exceeded on this model"}},
-        )
-        result = classify_api_error(e, provider="openrouter")
-        assert result.reason == FailoverReason.rate_limit
-
-    def test_400_with_billing_text(self):
-        """Some providers send billing errors as 400."""
-        e = MockAPIError(
-            "billing",
-            status_code=400,
-            body={"error": {"message": "insufficient credits for this request"}},
-        )
-        result = classify_api_error(e)
-        assert result.reason == FailoverReason.billing
-
-    def test_200_with_error_body(self):
-        """200 status with error in body — should be unknown, not crash."""
-        class WeirdSuccess(Exception):
-            status_code = 200
-            body = {"error": {"message": "loading"}}
-        result = classify_api_error(WeirdSuccess("model loading"))
-        assert result.reason == FailoverReason.unknown
-
-    def test_ollama_context_size_exceeded(self):
-        e = MockAPIError(
-            "Error",
-            status_code=400,
-            body={"error": {"message": "context size has been exceeded"}},
-        )
-        result = classify_api_error(e, provider="ollama")
-        assert result.reason == FailoverReason.context_overflow
-
-    def test_connection_refused_error(self):
-        e = ConnectionRefusedError("Connection refused: localhost:11434")
-        result = classify_api_error(e, provider="ollama")
-        assert result.reason == FailoverReason.timeout
-
-    def test_body_message_enrichment(self):
-        """Body message must be included in pattern matching even when
-        str(error) doesn't contain it (OpenAI SDK APIStatusError)."""
-        e = MockAPIError(
-            "Usage limit",  # str(e) = "usage limit"
-            status_code=402,
-            body={"error": {"message": "Usage limit reached, try again in 5 minutes"}},
-        )
-        result = classify_api_error(e)
-        # "try again" is only in body, not in str(e)
-        assert result.reason == FailoverReason.rate_limit
-
-    def test_disconnect_pattern_ordering(self):
-        """Disconnect + large session must beat generic transport catch."""
-        class FakeRemoteProtocol(Exception):
-            pass
-        # Type name isn't in _TRANSPORT_ERROR_TYPES but message has disconnect pattern
-        e = Exception("peer closed connection without sending complete message")
-        result = classify_api_error(e, approx_tokens=150000, context_length=200000)
-        assert result.reason == FailoverReason.context_overflow
-        assert result.should_compress is True
-
-    def test_credit_balance_too_low(self):
-        e = MockAPIError(
-            "Credits low",
-            status_code=402,
-            body={"error": {"message": "Your credit balance is too low"}},
-        )
-        result = classify_api_error(e, provider="anthropic")
-        assert result.reason == FailoverReason.billing
-
-    def test_deepseek_402_chinese(self):
-        """Chinese billing message on a 402 should still classify as billing."""
-        # "余额不足" doesn't match English billing patterns, but 402 defaults to billing
-        e = MockAPIError("余额不足", status_code=402)
-        result = classify_api_error(e, provider="deepseek")
-        assert result.reason == FailoverReason.billing
-
-    def test_openrouter_wrapped_context_overflow_in_metadata_raw(self):
-        """OpenRouter wraps provider errors in metadata.raw JSON string."""
-        e = MockAPIError(
-            "Provider returned error",
-            status_code=400,
-            body={
-                "error": {
-                    "message": "Provider returned error",
-                    "code": 400,
-                    "metadata": {
-                        "raw": '{"error":{"message":"context length exceeded: 50000 > 32768"}}'
-                    }
-                }
-            },
-        )
-        result = classify_api_error(e, provider="openrouter", approx_tokens=10000)
-        assert result.reason == FailoverReason.context_overflow
-        assert result.should_compress is True
-
-    def test_openrouter_wrapped_rate_limit_in_metadata_raw(self):
-        e = MockAPIError(
-            "Provider returned error",
-            status_code=400,
-            body={
-                "error": {
-                    "message": "Provider returned error",
-                    "metadata": {
-                        "raw": '{"error":{"message":"Rate limit exceeded. Please retry after 30s."}}'
-                    }
-                }
-            },
-        )
-        result = classify_api_error(e, provider="openrouter")
-        assert result.reason == FailoverReason.rate_limit
-
-    def test_thinking_signature_via_openrouter(self):
-        """Thinking signature errors proxied through OpenRouter must be caught."""
-        e = MockAPIError(
-            "thinking block has invalid signature",
-            status_code=400,
-        )
-        # provider is openrouter, not anthropic — old code missed this
-        result = classify_api_error(e, provider="openrouter", model="anthropic/claude-sonnet-4")
-        assert result.reason == FailoverReason.thinking_signature
-
-    def test_generic_400_large_by_message_count(self):
-        """Many small messages (>80) should trigger context overflow heuristic."""
-        e = MockAPIError(
-            "Error",
-            status_code=400,
-            body={"error": {"message": "Error"}},
-        )
-        # Low token count but high message count
-        result = classify_api_error(
-            e, approx_tokens=5000, context_length=200000, num_messages=100,
-        )
-        assert result.reason == FailoverReason.context_overflow
-
-    def test_disconnect_large_by_message_count(self):
-        """Server disconnect with 200+ messages should trigger context overflow."""
-        e = Exception("server disconnected without sending complete message")
-        result = classify_api_error(
-            e, approx_tokens=5000, context_length=200000, num_messages=250,
-        )
-        assert result.reason == FailoverReason.context_overflow
-
-    def test_openrouter_wrapped_model_not_found_in_metadata_raw(self):
-        e = MockAPIError(
-            "Provider returned error",
-            status_code=400,
-            body={
-                "error": {
-                    "message": "Provider returned error",
-                    "metadata": {
-                        "raw": '{"error":{"message":"The model gpt-99 does not exist"}}'
-                    }
-                }
-            },
-        )
-        result = classify_api_error(e, provider="openrouter")
-        assert result.reason == FailoverReason.model_not_found
--- a/tests/cli/test_cli_status_bar.py
+++ b/tests/cli/test_cli_status_bar.py
@@ -41,7 +41,6 @@ def _attach_agent(
        session_completion_tokens=completion_tokens,
        session_total_tokens=total_tokens,
        session_api_calls=api_calls,
-        get_rate_limit_state=lambda: None,
        context_compressor=SimpleNamespace(
            last_prompt_tokens=context_tokens,
            context_length=context_length,
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -38,8 +38,6 @@ def _isolate_hermes_home(tmp_path, monkeypatch):
    monkeypatch.delenv("HERMES_SESSION_CHAT_ID", raising=False)
    monkeypatch.delenv("HERMES_SESSION_CHAT_NAME", raising=False)
    monkeypatch.delenv("HERMES_GATEWAY_SESSION", raising=False)
-    # Avoid making real calls during tests if this key is set in the env files
-    monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)


@pytest.fixture()
--- a/tests/gateway/test_media_download_retry.py
+++ b/tests/gateway/test_media_download_retry.py
@@ -38,11 +38,10 @@ def _make_timeout_error() -> httpx.TimeoutException:
 # cache_image_from_url (base.py)
 # ---------------------------------------------------------------------------

-@patch("tools.url_safety.is_safe_url", return_value=True)
 class TestCacheImageFromUrl:
    """Tests for gateway.platforms.base.cache_image_from_url"""

-    def test_success_on_first_attempt(self, _mock_safe, tmp_path, monkeypatch):
+    def test_success_on_first_attempt(self, tmp_path, monkeypatch):
        """A clean 200 response caches the image and returns a path."""
        monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")

@@ -66,7 +65,7 @@ class TestCacheImageFromUrl:
        assert path.endswith(".jpg")
        mock_client.get.assert_called_once()

-    def test_retries_on_timeout_then_succeeds(self, _mock_safe, tmp_path, monkeypatch):
+    def test_retries_on_timeout_then_succeeds(self, tmp_path, monkeypatch):
        """A timeout on the first attempt is retried; second attempt succeeds."""
        monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")

@@ -96,7 +95,7 @@ class TestCacheImageFromUrl:
        assert mock_client.get.call_count == 2
        mock_sleep.assert_called_once()

-    def test_retries_on_429_then_succeeds(self, _mock_safe, tmp_path, monkeypatch):
+    def test_retries_on_429_then_succeeds(self, tmp_path, monkeypatch):
        """A 429 response on the first attempt is retried; second attempt succeeds."""
        monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")

@@ -123,7 +122,7 @@ class TestCacheImageFromUrl:
        assert path.endswith(".jpg")
        assert mock_client.get.call_count == 2

-    def test_raises_after_max_retries_exhausted(self, _mock_safe, tmp_path, monkeypatch):
+    def test_raises_after_max_retries_exhausted(self, tmp_path, monkeypatch):
        """Timeout on every attempt raises after all retries are consumed."""
        monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")

@@ -146,7 +145,7 @@ class TestCacheImageFromUrl:
        # 3 total calls: initial + 2 retries
        assert mock_client.get.call_count == 3

-    def test_non_retryable_4xx_raises_immediately(self, _mock_safe, tmp_path, monkeypatch):
+    def test_non_retryable_4xx_raises_immediately(self, tmp_path, monkeypatch):
        """A 404 (non-retryable) is raised immediately without any retry."""
        monkeypatch.setattr("gateway.platforms.base.IMAGE_CACHE_DIR", tmp_path / "img")

@@ -176,11 +175,10 @@ class TestCacheImageFromUrl:
 # cache_audio_from_url (base.py)
 # ---------------------------------------------------------------------------

-@patch("tools.url_safety.is_safe_url", return_value=True)
 class TestCacheAudioFromUrl:
    """Tests for gateway.platforms.base.cache_audio_from_url"""

-    def test_success_on_first_attempt(self, _mock_safe, tmp_path, monkeypatch):
+    def test_success_on_first_attempt(self, tmp_path, monkeypatch):
        """A clean 200 response caches the audio and returns a path."""
        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")

@@ -204,7 +202,7 @@ class TestCacheAudioFromUrl:
        assert path.endswith(".ogg")
        mock_client.get.assert_called_once()

-    def test_retries_on_timeout_then_succeeds(self, _mock_safe, tmp_path, monkeypatch):
+    def test_retries_on_timeout_then_succeeds(self, tmp_path, monkeypatch):
        """A timeout on the first attempt is retried; second attempt succeeds."""
        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")

@@ -234,7 +232,7 @@ class TestCacheAudioFromUrl:
        assert mock_client.get.call_count == 2
        mock_sleep.assert_called_once()

-    def test_retries_on_429_then_succeeds(self, _mock_safe, tmp_path, monkeypatch):
+    def test_retries_on_429_then_succeeds(self, tmp_path, monkeypatch):
        """A 429 response on the first attempt is retried; second attempt succeeds."""
        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")

@@ -261,7 +259,7 @@ class TestCacheAudioFromUrl:
        assert path.endswith(".ogg")
        assert mock_client.get.call_count == 2

-    def test_retries_on_500_then_succeeds(self, _mock_safe, tmp_path, monkeypatch):
+    def test_retries_on_500_then_succeeds(self, tmp_path, monkeypatch):
        """A 500 response on the first attempt is retried; second attempt succeeds."""
        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")

@@ -288,7 +286,7 @@ class TestCacheAudioFromUrl:
        assert path.endswith(".ogg")
        assert mock_client.get.call_count == 2

-    def test_raises_after_max_retries_exhausted(self, _mock_safe, tmp_path, monkeypatch):
+    def test_raises_after_max_retries_exhausted(self, tmp_path, monkeypatch):
        """Timeout on every attempt raises after all retries are consumed."""
        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")

@@ -311,7 +309,7 @@ class TestCacheAudioFromUrl:
        # 3 total calls: initial + 2 retries
        assert mock_client.get.call_count == 3

-    def test_non_retryable_4xx_raises_immediately(self, _mock_safe, tmp_path, monkeypatch):
+    def test_non_retryable_4xx_raises_immediately(self, tmp_path, monkeypatch):
        """A 404 (non-retryable) is raised immediately without any retry."""
        monkeypatch.setattr("gateway.platforms.base.AUDIO_CACHE_DIR", tmp_path / "audio")

--- a/tests/gateway/test_wecom.py
+++ b/tests/gateway/test_wecom.py
@@ -4,7 +4,7 @@ import base64
 import os
 from pathlib import Path
 from types import SimpleNamespace
-from unittest.mock import AsyncMock, patch
+from unittest.mock import AsyncMock

 import pytest

@@ -355,8 +355,7 @@ class TestMediaUpload:
        assert calls[3][1]["chunk_index"] == 2

    @pytest.mark.asyncio
-    @patch("tools.url_safety.is_safe_url", return_value=True)
-    async def test_download_remote_bytes_rejects_large_content_length(self, _mock_safe):
+    async def test_download_remote_bytes_rejects_large_content_length(self):
        from gateway.platforms.wecom import WeComAdapter

        class FakeResponse:
--- a/tests/hermes_cli/test_api_key_providers.py
+++ b/tests/hermes_cli/test_api_key_providers.py
@@ -628,21 +628,14 @@ class TestHasAnyProviderConfigured:
    def test_claude_code_creds_ignored_on_fresh_install(self, monkeypatch, tmp_path):
        """Claude Code credentials should NOT skip the wizard when Hermes is unconfigured."""
        from hermes_cli import config as config_module
-        from hermes_cli.auth import PROVIDER_REGISTRY
        hermes_home = tmp_path / ".hermes"
        hermes_home.mkdir()
        monkeypatch.setattr(config_module, "get_env_path", lambda: hermes_home / ".env")
        monkeypatch.setattr(config_module, "get_hermes_home", lambda: hermes_home)
        # Clear all provider env vars so earlier checks don't short-circuit
-        _all_vars = {"OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
-                      "ANTHROPIC_TOKEN", "OPENAI_BASE_URL"}
-        for pconfig in PROVIDER_REGISTRY.values():
-            if pconfig.auth_type == "api_key":
-                _all_vars.update(pconfig.api_key_env_vars)
-        for var in _all_vars:
+        for var in ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
+                     "ANTHROPIC_TOKEN", "OPENAI_BASE_URL"):
            monkeypatch.delenv(var, raising=False)
-        # Prevent gh-cli / copilot auth fallback from leaking in
-        monkeypatch.setattr("hermes_cli.auth.get_auth_status", lambda _pid: {})
        # Simulate valid Claude Code credentials
        monkeypatch.setattr(
            "agent.anthropic_adapter.read_claude_code_credentials",
@@ -717,7 +710,6 @@ class TestHasAnyProviderConfigured:
        """config.yaml model dict with empty default and no creds stays false."""
        import yaml
        from hermes_cli import config as config_module
-        from hermes_cli.auth import PROVIDER_REGISTRY
        hermes_home = tmp_path / ".hermes"
        hermes_home.mkdir()
        config_file = hermes_home / "config.yaml"
@@ -727,15 +719,9 @@ class TestHasAnyProviderConfigured:
        monkeypatch.setattr(config_module, "get_env_path", lambda: hermes_home / ".env")
        monkeypatch.setattr(config_module, "get_hermes_home", lambda: hermes_home)
        monkeypatch.setenv("HERMES_HOME", str(hermes_home))
-        _all_vars = {"OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
-                      "ANTHROPIC_TOKEN", "OPENAI_BASE_URL"}
-        for pconfig in PROVIDER_REGISTRY.values():
-            if pconfig.auth_type == "api_key":
-                _all_vars.update(pconfig.api_key_env_vars)
-        for var in _all_vars:
+        for var in ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
+                     "ANTHROPIC_TOKEN", "OPENAI_BASE_URL"):
            monkeypatch.delenv(var, raising=False)
-        # Prevent gh-cli / copilot auth fallback from leaking in
-        monkeypatch.setattr("hermes_cli.auth.get_auth_status", lambda _pid: {})
        from hermes_cli.main import _has_any_provider_configured
        assert _has_any_provider_configured() is False

@@ -955,10 +941,9 @@ class TestHuggingFaceModels:
        """Every HF model should have a context length entry."""
        from hermes_cli.models import _PROVIDER_MODELS
        from agent.model_metadata import DEFAULT_CONTEXT_LENGTHS
-        lower_keys = {k.lower() for k in DEFAULT_CONTEXT_LENGTHS}
        hf_models = _PROVIDER_MODELS["huggingface"]
        for model in hf_models:
-            assert model.lower() in lower_keys, (
+            assert model in DEFAULT_CONTEXT_LENGTHS, (
                f"HF model {model!r} missing from DEFAULT_CONTEXT_LENGTHS"
            )

--- a/tests/hermes_cli/test_commands.py
+++ b/tests/hermes_cli/test_commands.py
@@ -425,8 +425,8 @@ class TestSlashCommandCompleter:
 class TestSubcommands:
    def test_explicit_subcommands_extracted(self):
        """Commands with explicit subcommands on CommandDef are extracted."""
-        assert "/skills" in SUBCOMMANDS
-        assert "install" in SUBCOMMANDS["/skills"]
+        assert "/prompt" in SUBCOMMANDS
+        assert "clear" in SUBCOMMANDS["/prompt"]

    def test_reasoning_has_subcommands(self):
        assert "/reasoning" in SUBCOMMANDS
--- a/tests/hermes_cli/test_setup_openclaw_migration.py
+++ b/tests/hermes_cli/test_setup_openclaw_migration.py
@@ -44,7 +44,7 @@ class TestOfferOpenclawMigration:
            assert setup_mod._offer_openclaw_migration(tmp_path / ".hermes") is False

    def test_runs_migration_when_user_accepts(self, tmp_path):
-        """Should run dry-run preview first, then execute after confirmation."""
+        """Should dynamically load the script and run the Migrator."""
        openclaw_dir = tmp_path / ".openclaw"
        openclaw_dir.mkdir()

@@ -60,7 +60,6 @@ class TestOfferOpenclawMigration:
        fake_migrator = MagicMock()
        fake_migrator.migrate.return_value = {
            "summary": {"migrated": 3, "skipped": 1, "conflict": 0, "error": 0},
-            "items": [{"kind": "config", "status": "migrated", "destination": "/tmp/x"}],
            "output_dir": str(hermes_home / "migration"),
        }
        fake_mod.Migrator = MagicMock(return_value=fake_migrator)
@@ -71,7 +70,6 @@ class TestOfferOpenclawMigration:
        with (
            patch("hermes_cli.setup.Path.home", return_value=tmp_path),
            patch.object(setup_mod, "_OPENCLAW_SCRIPT", script),
-            # Both prompts answered Yes: preview offer + proceed confirmation
            patch.object(setup_mod, "prompt_yes_no", return_value=True),
            patch.object(setup_mod, "get_config_path", return_value=config_path),
            patch("importlib.util.spec_from_file_location") as mock_spec_fn,
@@ -93,75 +91,13 @@ class TestOfferOpenclawMigration:
        fake_mod.resolve_selected_options.assert_called_once_with(
            None, None, preset="full"
        )
-        # Migrator called twice: once for dry-run preview, once for execution
-        assert fake_mod.Migrator.call_count == 2
-
-        # First call: dry-run preview (execute=False, overwrite=True to show all)
-        preview_kwargs = fake_mod.Migrator.call_args_list[0][1]
-        assert preview_kwargs["execute"] is False
-        assert preview_kwargs["overwrite"] is True
-        assert preview_kwargs["migrate_secrets"] is True
-        assert preview_kwargs["preset_name"] == "full"
-
-        # Second call: actual execution (execute=True, overwrite=False to preserve)
-        exec_kwargs = fake_mod.Migrator.call_args_list[1][1]
-        assert exec_kwargs["execute"] is True
-        assert exec_kwargs["overwrite"] is False
-        assert exec_kwargs["migrate_secrets"] is True
-        assert exec_kwargs["preset_name"] == "full"
-
-        # migrate() called twice (once per Migrator instance)
-        assert fake_migrator.migrate.call_count == 2
-
-    def test_user_declines_after_preview(self, tmp_path):
-        """Should return False when user sees preview but declines to proceed."""
-        openclaw_dir = tmp_path / ".openclaw"
-        openclaw_dir.mkdir()
-
-        hermes_home = tmp_path / ".hermes"
-        hermes_home.mkdir()
-        config_path = hermes_home / "config.yaml"
-        config_path.write_text("agent:\n  max_turns: 90\n")
-
-        fake_mod = ModuleType("openclaw_to_hermes")
-        fake_mod.resolve_selected_options = MagicMock(return_value={"soul", "memory"})
-        fake_migrator = MagicMock()
-        fake_migrator.migrate.return_value = {
-            "summary": {"migrated": 3, "skipped": 0, "conflict": 0, "error": 0},
-            "items": [{"kind": "config", "status": "migrated", "destination": "/tmp/x"}],
-        }
-        fake_mod.Migrator = MagicMock(return_value=fake_migrator)
-
-        script = tmp_path / "openclaw_to_hermes.py"
-        script.write_text("# placeholder")
-
-        # First prompt (preview): Yes, Second prompt (proceed): No
-        prompt_responses = iter([True, False])
-
-        with (
-            patch("hermes_cli.setup.Path.home", return_value=tmp_path),
-            patch.object(setup_mod, "_OPENCLAW_SCRIPT", script),
-            patch.object(setup_mod, "prompt_yes_no", side_effect=prompt_responses),
-            patch.object(setup_mod, "get_config_path", return_value=config_path),
-            patch("importlib.util.spec_from_file_location") as mock_spec_fn,
-        ):
-            mock_spec = MagicMock()
-            mock_spec.loader = MagicMock()
-            mock_spec_fn.return_value = mock_spec
-
-            def exec_module(mod):
-                mod.resolve_selected_options = fake_mod.resolve_selected_options
-                mod.Migrator = fake_mod.Migrator
-
-            mock_spec.loader.exec_module = exec_module
-
-            result = setup_mod._offer_openclaw_migration(hermes_home)
-
-        assert result is False
-        # Only dry-run Migrator was created, not the execute one
-        assert fake_mod.Migrator.call_count == 1
-        preview_kwargs = fake_mod.Migrator.call_args[1]
-        assert preview_kwargs["execute"] is False
+        fake_mod.Migrator.assert_called_once()
+        call_kwargs = fake_mod.Migrator.call_args[1]
+        assert call_kwargs["execute"] is True
+        assert call_kwargs["overwrite"] is True
+        assert call_kwargs["migrate_secrets"] is True
+        assert call_kwargs["preset_name"] == "full"
+        fake_migrator.migrate.assert_called_once()

    def test_handles_migration_error_gracefully(self, tmp_path):
        """Should catch exceptions and return False."""
--- a/tests/hermes_cli/test_tools_config.py
+++ b/tests/hermes_cli/test_tools_config.py
@@ -354,14 +354,6 @@ def test_first_install_nous_auto_configures_managed_defaults(monkeypatch):
        lambda *args, **kwargs: {"web", "image_gen", "tts", "browser"},
    )
    monkeypatch.setattr("hermes_cli.tools_config.save_config", lambda config: None)
-    # Prevent leaked platform tokens (e.g. DISCORD_BOT_TOKEN from gateway.run
-    # import) from adding extra platforms. The loop in tools_command runs
-    # apply_nous_managed_defaults per platform; a second iteration sees values
-    # set by the first as "explicit" and skips them.
-    monkeypatch.setattr(
-        "hermes_cli.tools_config._get_enabled_platforms",
-        lambda: ["cli"],
-    )
    monkeypatch.setattr(
        "hermes_cli.nous_subscription.get_nous_auth_status",
        lambda: {"logged_in": True},
--- a/tests/hermes_cli/test_update_gateway_restart.py
+++ b/tests/hermes_cli/test_update_gateway_restart.py
@@ -368,9 +368,6 @@ class TestCmdUpdateLaunchdRestart:
        monkeypatch.setattr(
            gateway_cli, "is_macos", lambda: False,
        )
-        monkeypatch.setattr(
-            gateway_cli, "is_linux", lambda: True,
-        )

        mock_run.side_effect = _make_run_side_effect(
            commit_count="3",
--- a/tests/test_ctx_halving_fix.py
+++ b/tests/test_ctx_halving_fix.py
@@ -1,319 +0,0 @@
-"""Tests for the context-halving bugfix.
-
-Background
-----------
-When the API returns "max_tokens too large given prompt" (input is fine,
-but input_tokens + requested max_tokens > context_window), the old code
-incorrectly halved context_length via get_next_probe_tier().
-
-The fix introduces:
-  * parse_available_output_tokens_from_error() — detects this specific
-    error class and returns the available output token budget.
-  * _ephemeral_max_output_tokens on AIAgent — a one-shot override that
-    caps the output for one retry without touching context_length.
-
-Naming note
-----------
-  max_tokens     = OUTPUT token cap (a single response).
-  context_length = TOTAL context window (input + output combined).
-These are different and the old code conflated them; the fix keeps them
-separate.
-"""
-
-import sys
-import os
-from unittest.mock import MagicMock, patch, PropertyMock
-
-sys.path.insert(0, os.path.join(os.path.dirname(__file__), ".."))
-
-import pytest
-
-
-# ---------------------------------------------------------------------------
-# parse_available_output_tokens_from_error — unit tests
-# ---------------------------------------------------------------------------
-
-class TestParseAvailableOutputTokens:
-    """Pure-function tests; no I/O required."""
-
-    def _parse(self, msg):
-        from agent.model_metadata import parse_available_output_tokens_from_error
-        return parse_available_output_tokens_from_error(msg)
-
-    # ── Should detect and extract ────────────────────────────────────────
-
-    def test_anthropic_canonical_format(self):
-        """Canonical Anthropic error: max_tokens: X > context_window: Y - input_tokens: Z = available_tokens: W"""
-        msg = (
-            "max_tokens: 32768 > context_window: 200000 "
-            "- input_tokens: 190000 = available_tokens: 10000"
-        )
-        assert self._parse(msg) == 10000
-
-    def test_anthropic_format_large_numbers(self):
-        msg = (
-            "max_tokens: 128000 > context_window: 200000 "
-            "- input_tokens: 180000 = available_tokens: 20000"
-        )
-        assert self._parse(msg) == 20000
-
-    def test_available_tokens_variant_spacing(self):
-        """Handles extra spaces around the colon."""
-        msg = "max_tokens: 32768 > 200000 available_tokens : 5000"
-        assert self._parse(msg) == 5000
-
-    def test_available_tokens_natural_language(self):
-        """'available tokens: N' wording (no underscore)."""
-        msg = "max_tokens must be at most 10000 given your prompt (available tokens: 10000)"
-        assert self._parse(msg) == 10000
-
-    def test_single_token_available(self):
-        """Edge case: only 1 token left."""
-        msg = "max_tokens: 9999 > context_window: 10000 - input_tokens: 9999 = available_tokens: 1"
-        assert self._parse(msg) == 1
-
-    # ── Should NOT detect (returns None) ─────────────────────────────────
-
-    def test_prompt_too_long_is_not_output_cap_error(self):
-        """'prompt is too long' errors must NOT be caught — they need context halving."""
-        msg = "prompt is too long: 205000 tokens > 200000 maximum"
-        assert self._parse(msg) is None
-
-    def test_generic_context_window_exceeded(self):
-        """Generic context window errors without available_tokens should not match."""
-        msg = "context window exceeded: maximum is 32768 tokens"
-        assert self._parse(msg) is None
-
-    def test_context_length_exceeded(self):
-        msg = "context_length_exceeded: prompt has 131073 tokens, limit is 131072"
-        assert self._parse(msg) is None
-
-    def test_no_max_tokens_keyword(self):
-        """Error not related to max_tokens at all."""
-        msg = "invalid_api_key: the API key is invalid"
-        assert self._parse(msg) is None
-
-    def test_empty_string(self):
-        assert self._parse("") is None
-
-    def test_rate_limit_error(self):
-        msg = "rate_limit_error: too many requests per minute"
-        assert self._parse(msg) is None
-
-
-# ---------------------------------------------------------------------------
-# build_anthropic_kwargs — output cap clamping
-# ---------------------------------------------------------------------------
-
-class TestBuildAnthropicKwargsClamping:
-    """The context_length clamp only fires when output ceiling > window.
-    For standard Anthropic models (output ceiling < window) it must not fire.
-    """
-
-    def _build(self, model, max_tokens=None, context_length=None):
-        from agent.anthropic_adapter import build_anthropic_kwargs
-        return build_anthropic_kwargs(
-            model=model,
-            messages=[{"role": "user", "content": "hi"}],
-            tools=None,
-            max_tokens=max_tokens,
-            reasoning_config=None,
-            context_length=context_length,
-        )
-
-    def test_no_clamping_when_output_ceiling_fits_in_window(self):
-        """Opus 4.6 native output (128K) < context window (200K) — no clamping."""
-        kwargs = self._build("claude-opus-4-6", context_length=200_000)
-        assert kwargs["max_tokens"] == 128_000
-
-    def test_clamping_fires_for_tiny_custom_window(self):
-        """When context_length is 8K (local model), output cap is clamped to 7999."""
-        kwargs = self._build("claude-opus-4-6", context_length=8_000)
-        assert kwargs["max_tokens"] == 7_999
-
-    def test_explicit_max_tokens_respected_when_within_window(self):
-        """Explicit max_tokens smaller than window passes through unchanged."""
-        kwargs = self._build("claude-opus-4-6", max_tokens=4096, context_length=200_000)
-        assert kwargs["max_tokens"] == 4096
-
-    def test_explicit_max_tokens_clamped_when_exceeds_window(self):
-        """Explicit max_tokens larger than a small window is clamped."""
-        kwargs = self._build("claude-opus-4-6", max_tokens=32_768, context_length=16_000)
-        assert kwargs["max_tokens"] == 15_999
-
-    def test_no_context_length_uses_native_ceiling(self):
-        """Without context_length the native output ceiling is used directly."""
-        kwargs = self._build("claude-sonnet-4-6")
-        assert kwargs["max_tokens"] == 64_000
-
-
-# ---------------------------------------------------------------------------
-# Ephemeral max_tokens mechanism — _build_api_kwargs
-# ---------------------------------------------------------------------------
-
-class TestEphemeralMaxOutputTokens:
-    """_build_api_kwargs consumes _ephemeral_max_output_tokens exactly once
-    and falls back to self.max_tokens on subsequent calls.
-    """
-
-    def _make_agent(self):
-        """Return a minimal AIAgent with api_mode='anthropic_messages' and
-        a stubbed context_compressor, bypassing full __init__ cost."""
-        from run_agent import AIAgent
-        agent = object.__new__(AIAgent)
-        # Minimal attributes used by _build_api_kwargs
-        agent.api_mode = "anthropic_messages"
-        agent.model = "claude-opus-4-6"
-        agent.tools = []
-        agent.max_tokens = None
-        agent.reasoning_config = None
-        agent._is_anthropic_oauth = False
-        agent._ephemeral_max_output_tokens = None
-
-        compressor = MagicMock()
-        compressor.context_length = 200_000
-        agent.context_compressor = compressor
-
-        # Stub out the internal message-preparation helper
-        agent._prepare_anthropic_messages_for_api = MagicMock(
-            return_value=[{"role": "user", "content": "hi"}]
-        )
-        agent._anthropic_preserve_dots = MagicMock(return_value=False)
-        return agent
-
-    def test_ephemeral_override_is_used_on_first_call(self):
-        """When _ephemeral_max_output_tokens is set, it overrides self.max_tokens."""
-        agent = self._make_agent()
-        agent._ephemeral_max_output_tokens = 5_000
-
-        kwargs = agent._build_api_kwargs([{"role": "user", "content": "hi"}])
-        assert kwargs["max_tokens"] == 5_000
-
-    def test_ephemeral_override_is_consumed_after_one_call(self):
-        """After one call the ephemeral override is cleared to None."""
-        agent = self._make_agent()
-        agent._ephemeral_max_output_tokens = 5_000
-
-        agent._build_api_kwargs([{"role": "user", "content": "hi"}])
-        assert agent._ephemeral_max_output_tokens is None
-
-    def test_subsequent_call_uses_self_max_tokens(self):
-        """A second _build_api_kwargs call uses the normal max_tokens path."""
-        agent = self._make_agent()
-        agent._ephemeral_max_output_tokens = 5_000
-        agent.max_tokens = None  # will resolve to native ceiling (128K for Opus 4.6)
-
-        agent._build_api_kwargs([{"role": "user", "content": "hi"}])
-        # Second call — ephemeral is gone
-        kwargs2 = agent._build_api_kwargs([{"role": "user", "content": "hi"}])
-        assert kwargs2["max_tokens"] == 128_000  # Opus 4.6 native ceiling
-
-    def test_no_ephemeral_uses_self_max_tokens_directly(self):
-        """Without an ephemeral override, self.max_tokens is used normally."""
-        agent = self._make_agent()
-        agent.max_tokens = 8_192
-
-        kwargs = agent._build_api_kwargs([{"role": "user", "content": "hi"}])
-        assert kwargs["max_tokens"] == 8_192
-
-
-# ---------------------------------------------------------------------------
-# Integration: error handler does NOT halve context_length for output-cap errors
-# ---------------------------------------------------------------------------
-
-class TestContextNotHalvedOnOutputCapError:
-    """When the API returns 'max_tokens too large given prompt', the handler
-    must set _ephemeral_max_output_tokens and NOT modify context_length.
-    """
-
-    def _make_agent_with_compressor(self, context_length=200_000):
-        from run_agent import AIAgent
-        from agent.context_compressor import ContextCompressor
-
-        agent = object.__new__(AIAgent)
-        agent.api_mode = "anthropic_messages"
-        agent.model = "claude-opus-4-6"
-        agent.base_url = "https://api.anthropic.com"
-        agent.tools = []
-        agent.max_tokens = None
-        agent.reasoning_config = None
-        agent._is_anthropic_oauth = False
-        agent._ephemeral_max_output_tokens = None
-        agent.log_prefix = ""
-        agent.quiet_mode = True
-        agent.verbose_logging = False
-
-        compressor = MagicMock(spec=ContextCompressor)
-        compressor.context_length = context_length
-        compressor.threshold_percent = 0.75
-        agent.context_compressor = compressor
-
-        agent._prepare_anthropic_messages_for_api = MagicMock(
-            return_value=[{"role": "user", "content": "hi"}]
-        )
-        agent._anthropic_preserve_dots = MagicMock(return_value=False)
-        agent._vprint = MagicMock()
-        return agent
-
-    def test_output_cap_error_sets_ephemeral_not_context_length(self):
-        """On 'max_tokens too large' error, _ephemeral_max_output_tokens is set
-        and compressor.context_length is left unchanged."""
-        from agent.model_metadata import parse_available_output_tokens_from_error
-        from agent.model_metadata import get_next_probe_tier
-
-        error_msg = (
-            "max_tokens: 128000 > context_window: 200000 "
-            "- input_tokens: 180000 = available_tokens: 20000"
-        )
-
-        # Simulate the handler logic from run_agent.py
-        agent = self._make_agent_with_compressor(context_length=200_000)
-        old_ctx = agent.context_compressor.context_length
-
-        available_out = parse_available_output_tokens_from_error(error_msg)
-        assert available_out == 20_000, "parser must detect the error"
-
-        # The fix: set ephemeral, skip context_length modification
-        agent._ephemeral_max_output_tokens = max(1, available_out - 64)
-
-        # context_length must be untouched
-        assert agent.context_compressor.context_length == old_ctx
-        assert agent._ephemeral_max_output_tokens == 19_936
-
-    def test_prompt_too_long_still_triggers_probe_tier(self):
-        """Genuine prompt-too-long errors must still use get_next_probe_tier."""
-        from agent.model_metadata import parse_available_output_tokens_from_error
-        from agent.model_metadata import get_next_probe_tier
-
-        error_msg = "prompt is too long: 205000 tokens > 200000 maximum"
-
-        available_out = parse_available_output_tokens_from_error(error_msg)
-        assert available_out is None, "prompt-too-long must not be caught by output-cap parser"
-
-        # The old halving path is still used for this class of error
-        new_ctx = get_next_probe_tier(200_000)
-        assert new_ctx == 128_000
-
-    def test_output_cap_error_safety_margin(self):
-        """The ephemeral value includes a 64-token safety margin below available_out."""
-        from agent.model_metadata import parse_available_output_tokens_from_error
-
-        error_msg = (
-            "max_tokens: 32768 > context_window: 200000 "
-            "- input_tokens: 190000 = available_tokens: 10000"
-        )
-        available_out = parse_available_output_tokens_from_error(error_msg)
-        safe_out = max(1, available_out - 64)
-        assert safe_out == 9_936
-
-    def test_safety_margin_never_goes_below_one(self):
-        """When available_out is very small, safe_out must be at least 1."""
-        from agent.model_metadata import parse_available_output_tokens_from_error
-
-        error_msg = (
-            "max_tokens: 10 > context_window: 200000 "
-            "- input_tokens: 199990 = available_tokens: 1"
-        )
-        available_out = parse_available_output_tokens_from_error(error_msg)
-        safe_out = max(1, available_out - 64)
-        assert safe_out == 1
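
Reviewer note: the deleted tests above describe a parser that pulls the remaining output budget out of a "max_tokens too large" error, plus a 64-token safety margin applied before the retry. The sketch below shows one way that behavior could look. It is a hypothetical stand-in written only from the test expectations, not the actual `agent.model_metadata` implementation; the names `parse_available_output_tokens`, `safe_output_cap`, and the regex are assumptions.

```python
import re
from typing import Optional

# Assumed pattern: accepts "available_tokens: N" and "available tokens: N",
# and tolerates spaces on either side of the colon.
_AVAILABLE_TOKENS_RE = re.compile(r"available[_ ]tokens\s*:\s*(\d+)", re.IGNORECASE)

# Safety margin taken from the deleted tests (available_out - 64).
SAFETY_MARGIN = 64


def parse_available_output_tokens(error_msg: str) -> Optional[int]:
    """Return the output budget named in an output-cap error, else None.

    Errors that do not mention max_tokens (for example "prompt is too long")
    return None so the caller can fall back to its context-shrinking path.
    """
    if "max_tokens" not in error_msg.lower():
        return None
    match = _AVAILABLE_TOKENS_RE.search(error_msg)
    return int(match.group(1)) if match else None


def safe_output_cap(available: int, margin: int = SAFETY_MARGIN) -> int:
    """Subtract the safety margin, but never go below one token."""
    return max(1, available - margin)
```

In this sketch the caller would use `safe_output_cap(...)` as a one-shot output cap for the retry and leave `context_length` unchanged, which is the distinction the deleted module docstring draws between the two settings.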
--- a/tests/tools/test_browser_camofox_state.py
+++ b/tests/tools/test_browser_camofox_state.py
@@ -63,4 +63,4 @@ class TestCamofoxConfigDefaults:
        from hermes_cli.config import DEFAULT_CONFIG

        # managed_persistence is auto-merged by _deep_merge, no version bump needed
-        assert DEFAULT_CONFIG["_config_version"] == 13
+        assert DEFAULT_CONFIG["_config_version"] == 12
--- a/tests/tools/test_docker_environment.py
+++ b/tests/tools/test_docker_environment.py
@@ -258,30 +258,28 @@ def _make_execute_only_env(forward_env=None):

 def test_init_env_args_uses_hermes_dotenv_for_allowlisted_env(monkeypatch):
    """_build_init_env_args picks up forwarded env vars from .env file at init time."""
-    # Use a var that is NOT in _HERMES_PROVIDER_ENV_BLOCKLIST (GITHUB_TOKEN
-    # is in the copilot provider's api_key_env_vars and gets stripped).
-    env = _make_execute_only_env(["DATABASE_URL"])
+    env = _make_execute_only_env(["GITHUB_TOKEN"])

-    monkeypatch.delenv("DATABASE_URL", raising=False)
-    monkeypatch.setattr(docker_env, "_load_hermes_env_vars", lambda: {"DATABASE_URL": "value_from_dotenv"})
+    monkeypatch.delenv("GITHUB_TOKEN", raising=False)
+    monkeypatch.setattr(docker_env, "_load_hermes_env_vars", lambda: {"GITHUB_TOKEN": "value_from_dotenv"})

    args = env._build_init_env_args()
    args_str = " ".join(args)

-    assert "DATABASE_URL=value_from_dotenv" in args_str
+    assert "GITHUB_TOKEN=value_from_dotenv" in args_str


 def test_init_env_args_prefers_shell_env_over_hermes_dotenv(monkeypatch):
    """Shell env vars take priority over .env file values in init env args."""
-    env = _make_execute_only_env(["DATABASE_URL"])
+    env = _make_execute_only_env(["GITHUB_TOKEN"])

-    monkeypatch.setenv("DATABASE_URL", "value_from_shell")
-    monkeypatch.setattr(docker_env, "_load_hermes_env_vars", lambda: {"DATABASE_URL": "value_from_dotenv"})
+    monkeypatch.setenv("GITHUB_TOKEN", "value_from_shell")
+    monkeypatch.setattr(docker_env, "_load_hermes_env_vars", lambda: {"GITHUB_TOKEN": "value_from_dotenv"})

    args = env._build_init_env_args()
    args_str = " ".join(args)

-    assert "DATABASE_URL=value_from_shell" in args_str
+    assert "GITHUB_TOKEN=value_from_shell" in args_str
    assert "value_from_dotenv" not in args_str


--- a/tests/tools/test_managed_server_tool_support.py
+++ b/tests/tools/test_managed_server_tool_support.py
@@ -147,7 +147,7 @@ class TestBaseEnvCompatibility:
        """Hermes wires parser selection through ServerManager.tool_parser."""
        import ast

-        base_env_path = Path(__file__).parent.parent.parent / "environments" / "hermes_base_env.py"
+        base_env_path = Path(__file__).parent.parent / "environments" / "hermes_base_env.py"
        source = base_env_path.read_text()
        tree = ast.parse(source)

@@ -171,7 +171,7 @@ class TestBaseEnvCompatibility:

    def test_hermes_base_env_uses_config_tool_call_parser(self):
        """Verify hermes_base_env uses the config field rather than a local parser instance."""
-        base_env_path = Path(__file__).parent.parent.parent / "environments" / "hermes_base_env.py"
+        base_env_path = Path(__file__).parent.parent / "environments" / "hermes_base_env.py"
        source = base_env_path.read_text()

        assert 'tool_call_parser: str = Field(' in source
--- a/tests/tools/test_send_message_missing_platforms.py
+++ b/tests/tools/test_send_message_missing_platforms.py
@@ -125,9 +125,7 @@ class TestSendMatrix:
        url = call_kwargs[0][0]
        assert url.startswith("https://matrix.example.com/_matrix/client/v3/rooms/!room:example.com/send/m.room.message/")
        assert call_kwargs[1]["headers"]["Authorization"] == "Bearer syt_tok"
-        payload = call_kwargs[1]["json"]
-        assert payload["msgtype"] == "m.text"
-        assert payload["body"] == "hello matrix"
+        assert call_kwargs[1]["json"] == {"msgtype": "m.text", "body": "hello matrix"}

    def test_http_error(self):
        resp = _make_aiohttp_resp(403, text_data="Forbidden")
--- a/tests/tools/test_vision_tools.py
+++ b/tests/tools/test_vision_tools.py
@@ -30,10 +30,7 @@ class TestValidateImageUrl:
    """Tests for URL validation, including urlparse-based netloc check."""

    def test_valid_https_url(self):
-        with patch("tools.url_safety.socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("93.184.216.34", 0)),
-        ]):
-            assert _validate_image_url("https://example.com/image.jpg") is True
+        assert _validate_image_url("https://example.com/image.jpg") is True

    def test_valid_http_url(self):
        with patch("tools.url_safety.socket.getaddrinfo", return_value=[
@@ -59,16 +56,10 @@ class TestValidateImageUrl:
        assert _validate_image_url("http://localhost:8080/image.png") is False

    def test_valid_url_with_port(self):
-        with patch("tools.url_safety.socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("93.184.216.34", 0)),
-        ]):
-            assert _validate_image_url("http://example.com:8080/image.png") is True
+        assert _validate_image_url("http://example.com:8080/image.png") is True

    def test_valid_url_with_path_only(self):
-        with patch("tools.url_safety.socket.getaddrinfo", return_value=[
-            (2, 1, 6, "", ("93.184.216.34", 0)),
-        ]):
-            assert _validate_image_url("https://example.com/") is True
+        assert _validate_image_url("https://example.com/") is True

    def test_rejects_empty_string(self):
        assert _validate_image_url("") is False
@@ -450,11 +441,6 @@ class TestVisionRequirements:
        (tmp_path / "auth.json").write_text(
            '{"active_provider":"openai-codex","providers":{"openai-codex":{"tokens":{"access_token":"codex-access-token","refresh_token":"codex-refresh-token"}}}}'
        )
-        # config.yaml must reference the codex provider so vision auto-detect
-        # falls back to the active provider via _read_main_provider().
-        (tmp_path / "config.yaml").write_text(
-            'model:\n  default: gpt-4o\n  provider: openai-codex\n'
-        )
        monkeypatch.delenv("OPENROUTER_API_KEY", raising=False)
        monkeypatch.delenv("OPENAI_BASE_URL", raising=False)
        monkeypatch.delenv("OPENAI_API_KEY", raising=False)
--- a/tests/tools/test_web_tools_tavily.py
+++ b/tests/tools/test_web_tools_tavily.py
@@ -225,7 +225,6 @@ class TestWebCrawlTavily:
             patch.dict(os.environ, {"TAVILY_API_KEY": "tvly-test"}), \
             patch("tools.web_tools.httpx.post", return_value=mock_response), \
             patch("tools.web_tools.check_website_access", return_value=None), \
-             patch("tools.web_tools.is_safe_url", return_value=True), \
             patch("tools.interrupt.is_interrupted", return_value=False):
            from tools.web_tools import web_crawl_tool
            result = json.loads(asyncio.get_event_loop().run_until_complete(
@@ -245,7 +244,6 @@ class TestWebCrawlTavily:
             patch.dict(os.environ, {"TAVILY_API_KEY": "tvly-test"}), \
             patch("tools.web_tools.httpx.post", return_value=mock_response) as mock_post, \
             patch("tools.web_tools.check_website_access", return_value=None), \
-             patch("tools.web_tools.is_safe_url", return_value=True), \
             patch("tools.interrupt.is_interrupted", return_value=False):
            from tools.web_tools import web_crawl_tool
            asyncio.get_event_loop().run_until_complete(
--- a/website/docs/getting-started/nix-setup.md
+++ b/website/docs/getting-started/nix-setup.md
@@ -74,7 +74,7 @@ This module requires NixOS. For non-NixOS systems (macOS, other Linux distros),
 # /etc/nixos/flake.nix (or your system flake)
 {
  inputs = {
-    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
+    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";
    hermes-agent.url = "github:NousResearch/hermes-agent";
  };

--- a/website/docs/integrations/providers.md
+++ b/website/docs/integrations/providers.md
@@ -230,7 +230,7 @@ model:
 ```

 :::warning Legacy env vars
-`OPENAI_BASE_URL` and `LLM_MODEL` in `.env` are **removed**. Neither is read by any part of Hermes — `config.yaml` is the single source of truth for model and endpoint configuration. If you have stale entries in your `.env`, they are automatically cleared on the next `hermes setup` or config migration. Use `hermes model` or edit `config.yaml` directly.
+`OPENAI_BASE_URL` and `LLM_MODEL` in `.env` are **deprecated**. `OPENAI_BASE_URL` is no longer consulted for endpoint resolution — `config.yaml` is the single source of truth. The CLI ignores `LLM_MODEL` entirely (only the gateway reads it as a fallback). Use `hermes model` or edit `config.yaml` directly — both persist correctly across restarts and Docker containers.
 :::

 Both approaches persist to `config.yaml`, which is the source of truth for model, provider, and base URL.
@@ -657,8 +657,8 @@ model:
 #### Responses get cut off mid-sentence

 **Possible causes:**
-1. **Low output cap (`max_tokens`) on the server** — SGLang defaults to 128 tokens per response. Set `--default-max-tokens` on the server or configure Hermes with `model.max_tokens` in config.yaml. Note: `max_tokens` controls response length only — it is unrelated to how long your conversation history can be (that is `context_length`).
-2. **Context exhaustion** — The model filled its context window. Increase `model.context_length` or enable [context compression](/docs/user-guide/configuration#context-compression) in Hermes.
+1. **Low `max_tokens` on the server** — SGLang defaults to 128 tokens per response. Set `--default-max-tokens` on the server or configure Hermes with `model.max_tokens` in config.yaml.
+2. **Context exhaustion** — The model filled its context window. Increase context length or enable [context compression](/docs/user-guide/configuration#context-compression) in Hermes.

 ---

@@ -751,15 +751,6 @@ model:

 ### Context Length Detection

-:::note Two settings, easy to confuse
-**`context_length`** is the **total context window** — the combined budget for input *and* output tokens (e.g. 200,000 for Claude Opus 4.6). Hermes uses this to decide when to compress history and to validate API requests.
-
-**`model.max_tokens`** is the **output cap** — the maximum number of tokens the model may generate in a *single response*. It has nothing to do with how long your conversation history can be. The industry-standard name `max_tokens` is a common source of confusion; Anthropic's native API has since renamed it `max_output_tokens` for clarity.
-
-Set `context_length` when auto-detection gets the window size wrong.
-Set `model.max_tokens` only when you need to limit how long individual responses can be.
-:::
-
 Hermes uses a multi-source resolution chain to detect the correct context window for your model and provider:

 1. **Config override** — `model.context_length` in config.yaml (highest priority)
--- a/website/docs/reference/cli-commands.md
+++ b/website/docs/reference/cli-commands.md
@@ -43,8 +43,6 @@ hermes [global-options] <command> [subcommand/options]
 | `hermes cron` | Inspect and tick the cron scheduler. |
 | `hermes webhook` | Manage dynamic webhook subscriptions for event-driven activation. |
 | `hermes doctor` | Diagnose config and dependency issues. |
-| `hermes dump` | Copy-pasteable setup summary for support/debugging. |
-| `hermes logs` | View, tail, and filter agent/gateway/error log files. |
 | `hermes config` | Show, edit, migrate, and query configuration files. |
 | `hermes pairing` | Approve or revoke messaging pairing codes. |
 | `hermes skills` | Browse, install, publish, audit, and configure skills. |
@@ -274,149 +272,6 @@ hermes doctor [--fix]
 |--------|-------------|
 | `--fix` | Attempt automatic repairs where possible. |

-## `hermes dump`
-
-```bash
-hermes dump [--show-keys]
-```
-
-Outputs a compact, plain-text summary of your entire Hermes setup. Designed to be copy-pasted into Discord, GitHub issues, or Telegram when asking for support — no ANSI colors, no special formatting, just data.
-
-| Option | Description |
-|--------|-------------|
-| `--show-keys` | Show redacted API key prefixes (first and last 4 characters) instead of just `set`/`not set`. |
-
-### What it includes
-
-| Section | Details |
-|---------|---------|
-| **Header** | Hermes version, release date, git commit hash |
-| **Environment** | OS, Python version, OpenAI SDK version |
-| **Identity** | Active profile name, HERMES_HOME path |
-| **Model** | Configured default model and provider |
-| **Terminal** | Backend type (local, docker, ssh, etc.) |
-| **API keys** | Presence check for all 22 provider/tool API keys |
-| **Features** | Enabled toolsets, MCP server count, memory provider |
-| **Services** | Gateway status, configured messaging platforms |
-| **Workload** | Cron job counts, installed skill count |
-| **Config overrides** | Any config values that differ from defaults |
-
-### Example output
-
-```
---- hermes dump ---
-version:          0.8.0 (2026.4.8) [af4abd2f]
-os:               Linux 6.14.0-37-generic x86_64
-python:           3.11.14
-openai_sdk:       2.24.0
-profile:          default
-hermes_home:      ~/.hermes
-model:            anthropic/claude-opus-4.6
-provider:         openrouter
-terminal:         local
-
-api_keys:
-  openrouter           set
-  openai               not set
-  anthropic            set
-  nous                 not set
-  firecrawl            set
-  ...
-
-features:
-  toolsets:           all
-  mcp_servers:        0
-  memory_provider:    built-in
-  gateway:            running (systemd)
-  platforms:          telegram, discord
-  cron_jobs:          3 active / 5 total
-  skills:             42
-
-config_overrides:
-  agent.max_turns: 250
-  compression.threshold: 0.85
-  display.streaming: True
---- end dump ---
-```
-
-### When to use
-
-- Reporting a bug on GitHub — paste the dump into your issue
-- Asking for help in Discord — share it in a code block
-- Comparing your setup to someone else's
-- Quick sanity check when something isn't working
-
-:::tip
-`hermes dump` is specifically designed for sharing. For interactive diagnostics, use `hermes doctor`. For a visual overview, use `hermes status`.
-:::
-
-## `hermes logs`
-
-```bash
-hermes logs [log_name] [options]
-```
-
-View, tail, and filter Hermes log files. All logs are stored in `~/.hermes/logs/` (or `<profile>/logs/` for non-default profiles).
-
-### Log files
-
-| Name | File | What it captures |
-|------|------|-----------------|
-| `agent` (default) | `agent.log` | All agent activity — API calls, tool dispatch, session lifecycle (INFO and above) |
-| `errors` | `errors.log` | Warnings and errors only — a filtered subset of agent.log |
-| `gateway` | `gateway.log` | Messaging gateway activity — platform connections, message dispatch, webhook events |
-
-### Options
-
-| Option | Description |
-|--------|-------------|
-| `log_name` | Which log to view: `agent` (default), `errors`, `gateway`, or `list` to show available files with sizes. |
-| `-n`, `--lines <N>` | Number of lines to show (default: 50). |
-| `-f`, `--follow` | Follow the log in real time, like `tail -f`. Press Ctrl+C to stop. |
-| `--level <LEVEL>` | Minimum log level to show: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`. |
-| `--session <ID>` | Filter lines containing a session ID substring. |
-| `--since <TIME>` | Show lines from a relative time ago: `30m`, `1h`, `2d`, etc. Supports `s` (seconds), `m` (minutes), `h` (hours), `d` (days). |
-
-### Examples
-
-```bash
-# View the last 50 lines of agent.log (default)
-hermes logs
-
-# Follow agent.log in real time
-hermes logs -f
-
-# View the last 100 lines of gateway.log
-hermes logs gateway -n 100
-
-# Show only warnings and errors from the last hour
-hermes logs --level WARNING --since 1h
-
-# Filter by a specific session
-hermes logs --session abc123
-
-# Follow errors.log, starting from 30 minutes ago
-hermes logs errors --since 30m -f
-
-# List all log files with their sizes
-hermes logs list
-```
-
-### Filtering
-
-Filters can be combined. When multiple filters are active, a log line must pass **all** of them to be shown:
-
-```bash
-# WARNING+ lines from the last 2 hours containing session "tg-12345"
-hermes logs --level WARNING --since 2h --session tg-12345
-```
-
-Lines without a parseable timestamp are included when `--since` is active (they may be continuation lines from a multi-line log entry). Lines without a detectable level are included when `--level` is active.
-
-### Log rotation
-
-Hermes uses Python's `RotatingFileHandler`. Old logs are rotated automatically — look for `agent.log.1`, `agent.log.2`, etc. The `hermes logs list` subcommand shows all log files including rotated ones.
-
 ## `hermes config`

 ```bash
--- a/website/docs/reference/environment-variables.md
+++ b/website/docs/reference/environment-variables.md
@@ -53,7 +53,8 @@ All variables go in `~/.hermes/.env`. You can also set them with `hermes config
 | `OPENCODE_GO_API_KEY` | OpenCode Go API key — $10/month subscription for open models ([opencode.ai](https://opencode.ai/auth)) |
 | `OPENCODE_GO_BASE_URL` | Override OpenCode Go base URL |
 | `CLAUDE_CODE_OAUTH_TOKEN` | Explicit Claude Code token override if you export one manually |
-| `HERMES_MODEL` | Override model name at process level (used by cron scheduler; prefer `config.yaml` for normal use) |
+| `HERMES_MODEL` | Preferred model name (checked before `LLM_MODEL`, used by gateway) |
+| `LLM_MODEL` | Default model name (fallback when not set in config.yaml) |
 | `VOICE_TOOLS_OPENAI_KEY` | Preferred OpenAI key for OpenAI speech-to-text and text-to-speech providers |
 | `HERMES_LOCAL_STT_COMMAND` | Optional local speech-to-text command template. Supports `{input_path}`, `{output_dir}`, `{language}`, and `{model}` placeholders |
 | `HERMES_LOCAL_STT_LANGUAGE` | Default language passed to `HERMES_LOCAL_STT_COMMAND` or auto-detected local `whisper` CLI fallback (default: `en`) |
--- a/website/docs/reference/slash-commands.md
+++ b/website/docs/reference/slash-commands.md
@@ -46,6 +46,7 @@ Type `/` in the CLI to open the autocomplete menu. Built-in commands are case-in
 | `/config` | Show current configuration |
 | `/model [model-name]` | Show or change the current model. Supports: `/model claude-sonnet-4`, `/model provider:model` (switch providers), `/model custom:model` (custom endpoint), `/model custom:name:model` (named custom provider), `/model custom` (auto-detect from endpoint) |
 | `/provider` | Show available providers and current provider |
+| `/prompt` | View/set custom system prompt |
 | `/personality` | Set a predefined personality |
 | `/verbose` | Cycle tool progress display: off → new → all → verbose. Can be [enabled for messaging](#notes) via config. |
 | `/reasoning` | Manage reasoning effort and display (usage: /reasoning [level\|show\|hide]) |
@@ -143,7 +144,7 @@ The messaging gateway supports the following built-in commands inside Telegram,

 ## Notes

-- `/skin`, `/tools`, `/toolsets`, `/browser`, `/config`, `/cron`, `/skills`, `/platforms`, `/paste`, `/statusbar`, and `/plugins` are **CLI-only** commands.
+- `/skin`, `/tools`, `/toolsets`, `/browser`, `/config`, `/prompt`, `/cron`, `/skills`, `/platforms`, `/paste`, `/statusbar`, and `/plugins` are **CLI-only** commands.
 - `/verbose` is **CLI-only by default**, but can be enabled for messaging platforms by setting `display.tool_progress_command: true` in `config.yaml`. When enabled, it cycles the `display.tool_progress` mode and saves to config.
 - `/status`, `/sethome`, `/update`, `/approve`, `/deny`, and `/commands` are **messaging-only** commands.
 - `/background`, `/voice`, `/reload-mcp`, `/rollback`, and `/yolo` work in **both** the CLI and the messaging gateway.