docs: quote pip install extras to fix zsh glob errors

zsh interprets square brackets as glob patterns, so an unquoted `pip install hermes-agent[voice]` fails with 'no matches found'. Quote every pip install command that uses extras across 5 docs pages (12 instances). Reported by OFumik0OP.
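
A minimal sketch of the quoting fix described above. The `show_args` helper is hypothetical, a stand-in for pip so the snippet runs without installing anything; only the package spec comes from this message.

```shell
# Hypothetical stand-in for pip: prints each argument it receives on its own line.
show_args() { printf '%s\n' "$@"; }

# Unquoted, zsh would try to expand [voice] as a glob before the command runs:
#   pip install hermes-agent[voice]

# Single quotes pass the brackets through literally in zsh, bash, and sh.
show_args install 'hermes-agent[voice]'

# Double quotes work as well.
show_args install "hermes-agent[voice]"
```

The same quoting applies to the real command, e.g. `pip install 'hermes-agent[voice]'`.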
2026-07-02 16:26:34 +08:00 · 2026-03-24 08:53:11 -07:00
25 changed files with 228 additions and 974 deletions
--- a/.github/workflows/supply-chain-audit.yml
+++ b/.github/workflows/supply-chain-audit.yml
@@ -1,192 +0,0 @@
-name: Supply Chain Audit
-
-on:
-  pull_request:
-    types: [opened, synchronize, reopened]
-
-permissions:
-  pull-requests: write
-  contents: read
-
-jobs:
-  scan:
-    name: Scan PR for supply chain risks
-    runs-on: ubuntu-latest
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v4
-        with:
-          fetch-depth: 0
-
-      - name: Scan diff for suspicious patterns
-        id: scan
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          set -euo pipefail
-
-          BASE="${{ github.event.pull_request.base.sha }}"
-          HEAD="${{ github.event.pull_request.head.sha }}"
-
-          # Get the full diff (added lines only)
-          DIFF=$(git diff "$BASE".."$HEAD" -- . ':!uv.lock' ':!*.lock' ':!package-lock.json' ':!yarn.lock' || true)
-
-          FINDINGS=""
-          CRITICAL=false
-
-          # --- .pth files (auto-execute on Python startup) ---
-          PTH_FILES=$(git diff --name-only "$BASE".."$HEAD" | grep '\.pth$' || true)
-          if [ -n "$PTH_FILES" ]; then
-            CRITICAL=true
-            FINDINGS="${FINDINGS}
-          ### 🚨 CRITICAL: .pth file added or modified
-          Python \`.pth\` files in \`site-packages/\` execute automatically when the interpreter starts — no import required. This is the exact mechanism used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512).
-
-          **Files:**
-          \`\`\`
-          ${PTH_FILES}
-          \`\`\`
-          "
-          fi
-
-          # --- base64 + exec/eval combo (the litellm attack pattern) ---
-          B64_EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|decodebytes|urlsafe_b64decode)' | grep -iE 'exec\(|eval\(' | head -10 || true)
-          if [ -n "$B64_EXEC_HITS" ]; then
-            CRITICAL=true
-            FINDINGS="${FINDINGS}
-          ### 🚨 CRITICAL: base64 decode + exec/eval combo
-          This is the exact pattern used in the [litellm supply chain attack](https://github.com/BerriAI/litellm/issues/24512) — base64-decoded strings passed to exec/eval to hide credential-stealing payloads.
-
-          **Matches:**
-          \`\`\`
-          ${B64_EXEC_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- base64 decode/encode (alone — legitimate uses exist) ---
-          B64_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'base64\.(b64decode|b64encode|decodebytes|encodebytes|urlsafe_b64decode)|atob\(|btoa\(|Buffer\.from\(.*base64' | head -20 || true)
-          if [ -n "$B64_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: base64 encoding/decoding detected
-          Base64 has legitimate uses (images, JWT, etc.) but is also commonly used to obfuscate malicious payloads. Verify the usage is appropriate.
-
-          **Matches (first 20):**
-          \`\`\`
-          ${B64_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- exec/eval with string arguments ---
-          EXEC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E '(exec|eval)\s*\(' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert\|# ' | head -20 || true)
-          if [ -n "$EXEC_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: exec() or eval() usage
-          Dynamic code execution can hide malicious behavior, especially when combined with base64 or network fetches.
-
-          **Matches (first 20):**
-          \`\`\`
-          ${EXEC_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- subprocess with encoded/obfuscated commands ---
-          PROC_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -E 'subprocess\.(Popen|call|run)\s*\(' | grep -iE 'base64|decode|encode|\\x|chr\(' | head -10 || true)
-          if [ -n "$PROC_HITS" ]; then
-            CRITICAL=true
-            FINDINGS="${FINDINGS}
-          ### 🚨 CRITICAL: subprocess with encoded/obfuscated command
-          Subprocess calls with encoded arguments are a strong indicator of payload execution.
-
-          **Matches:**
-          \`\`\`
-          ${PROC_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- Network calls to non-standard domains ---
-          EXFIL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'requests\.(post|put)\(|httpx\.(post|put)\(|urllib\.request\.urlopen' | grep -v '^\+\s*#' | grep -v 'test_\|mock\|assert' | head -10 || true)
-          if [ -n "$EXFIL_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: Outbound network calls (POST/PUT)
-          Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.
-
-          **Matches (first 10):**
-          \`\`\`
-          ${EXFIL_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- setup.py / setup.cfg install hooks ---
-          SETUP_HITS=$(git diff --name-only "$BASE".."$HEAD" | grep -E '(setup\.py|setup\.cfg|__init__\.pth|sitecustomize\.py|usercustomize\.py)$' || true)
-          if [ -n "$SETUP_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: Install hook files modified
-          These files can execute code during package installation or interpreter startup.
-
-          **Files:**
-          \`\`\`
-          ${SETUP_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- Compile/marshal/pickle (code object injection) ---
-          MARSHAL_HITS=$(echo "$DIFF" | grep -n '^\+' | grep -iE 'marshal\.loads|pickle\.loads|compile\(' | grep -v '^\+\s*#' | grep -v 'test_\|re\.compile\|ast\.compile' | head -10 || true)
-          if [ -n "$MARSHAL_HITS" ]; then
-            FINDINGS="${FINDINGS}
-          ### ⚠️ WARNING: marshal/pickle/compile usage
-          These can deserialize or construct executable code objects.
-
-          **Matches:**
-          \`\`\`
-          ${MARSHAL_HITS}
-          \`\`\`
-          "
-          fi
-
-          # --- Output results ---
-          if [ -n "$FINDINGS" ]; then
-            echo "found=true" >> "$GITHUB_OUTPUT"
-            if [ "$CRITICAL" = true ]; then
-              echo "critical=true" >> "$GITHUB_OUTPUT"
-            else
-              echo "critical=false" >> "$GITHUB_OUTPUT"
-            fi
-            # Write findings to a file (multiline env vars are fragile)
-            echo "$FINDINGS" > /tmp/findings.md
-          else
-            echo "found=false" >> "$GITHUB_OUTPUT"
-            echo "critical=false" >> "$GITHUB_OUTPUT"
-          fi
-
-      - name: Post warning comment
-        if: steps.scan.outputs.found == 'true'
-        env:
-          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        run: |
-          SEVERITY="⚠️ Supply Chain Risk Detected"
-          if [ "${{ steps.scan.outputs.critical }}" = "true" ]; then
-            SEVERITY="🚨 CRITICAL Supply Chain Risk Detected"
-          fi
-
-          BODY="## ${SEVERITY}
-
-          This PR contains patterns commonly associated with supply chain attacks. This does **not** mean the PR is malicious — but these patterns require careful human review before merging.
-
-          $(cat /tmp/findings.md)
-
-          ---
-          *Automated scan triggered by [supply-chain-audit](/.github/workflows/supply-chain-audit.yml). If this is a false positive, a maintainer can approve after manual review.*"
-
-          gh pr comment "${{ github.event.pull_request.number }}" --body "$BODY"
-
-      - name: Fail on critical findings
-        if: steps.scan.outputs.critical == 'true'
-        run: |
-          echo "::error::CRITICAL supply chain risk patterns detected in this PR. See the PR comment for details."
-          exit 1
--- a/.gitignore
+++ b/.gitignore
@@ -53,4 +53,3 @@ environments/benchmarks/evals/

 # Release script temp files
 .release_notes.md
-mini-swe-agent/
--- a/agent/context_compressor.py
+++ b/agent/context_compressor.py
@@ -35,12 +35,14 @@ SUMMARY_PREFIX = (
 )
 LEGACY_SUMMARY_PREFIX = "[CONTEXT SUMMARY]:"

-# Minimum tokens for the summary output
+# Minimum / maximum tokens for the summary output
 _MIN_SUMMARY_TOKENS = 2000
+_MAX_SUMMARY_TOKENS = 8000
 # Proportion of compressed content to allocate for summary
 _SUMMARY_RATIO = 0.20
-# Absolute ceiling for summary tokens (even on very large context windows)
-_SUMMARY_TOKENS_CEILING = 12_000
+
+# Token budget for tail protection (keep most-recent context)
+_DEFAULT_TAIL_TOKEN_BUDGET = 20_000

 # Placeholder used when pruning old tool results
 _PRUNED_TOOL_PLACEHOLDER = "[Old tool output cleared to save context space]"
@@ -65,8 +67,8 @@ class ContextCompressor:
        model: str,
        threshold_percent: float = 0.50,
        protect_first_n: int = 3,
-        protect_last_n: int = 20,
-        summary_target_ratio: float = 0.20,
+        protect_last_n: int = 4,
+        summary_target_tokens: int = 2500,
        quiet_mode: bool = False,
        summary_model_override: str = None,
        base_url: str = "",
@@ -81,7 +83,7 @@ class ContextCompressor:
        self.threshold_percent = threshold_percent
        self.protect_first_n = protect_first_n
        self.protect_last_n = protect_last_n
-        self.summary_target_ratio = max(0.10, min(summary_target_ratio, 0.80))
+        self.summary_target_tokens = summary_target_tokens
        self.quiet_mode = quiet_mode

        self.context_length = get_model_context_length(
@@ -92,22 +94,12 @@ class ContextCompressor:
        self.threshold_tokens = int(self.context_length * threshold_percent)
        self.compression_count = 0

-        # Derive token budgets: ratio is relative to the threshold, not total context
-        target_tokens = int(self.threshold_tokens * self.summary_target_ratio)
-        self.tail_token_budget = target_tokens
-        self.max_summary_tokens = min(
-            int(self.context_length * 0.05), _SUMMARY_TOKENS_CEILING,
-        )
-
        if not quiet_mode:
            logger.info(
                "Context compressor initialized: model=%s context_length=%d "
-                "threshold=%d (%.0f%%) target_ratio=%.0f%% tail_budget=%d "
-                "provider=%s base_url=%s",
+                "threshold=%d (%.0f%%) provider=%s base_url=%s",
                model, self.context_length, self.threshold_tokens,
-                threshold_percent * 100, self.summary_target_ratio * 100,
-                self.tail_token_budget,
-                provider or "none", base_url or "none",
+                threshold_percent * 100, provider or "none", base_url or "none",
            )
        self._context_probed = False  # True after a step-down from context error

@@ -187,15 +179,10 @@ class ContextCompressor:
    # ------------------------------------------------------------------

    def _compute_summary_budget(self, turns_to_summarize: List[Dict[str, Any]]) -> int:
-        """Scale summary token budget with the amount of content being compressed.
-
-        The maximum scales with the model's context window (5% of context,
-        capped at ``_SUMMARY_TOKENS_CEILING``) so large-context models get
-        richer summaries instead of being hard-capped at 8K tokens.
-        """
+        """Scale summary token budget with the amount of content being compressed."""
        content_tokens = estimate_messages_tokens_rough(turns_to_summarize)
        budget = int(content_tokens * _SUMMARY_RATIO)
-        return max(_MIN_SUMMARY_TOKENS, min(budget, self.max_summary_tokens))
+        return max(_MIN_SUMMARY_TOKENS, min(budget, _MAX_SUMMARY_TOKENS))

    def _serialize_for_summary(self, turns: List[Dict[str, Any]]) -> str:
        """Serialize conversation turns into labeled text for the summarizer.
@@ -490,20 +477,14 @@ Write only the summary body. Do not include any preamble or prefix."""

    def _find_tail_cut_by_tokens(
        self, messages: List[Dict[str, Any]], head_end: int,
-        token_budget: int | None = None,
+        token_budget: int = _DEFAULT_TAIL_TOKEN_BUDGET,
    ) -> int:
        """Walk backward from the end of messages, accumulating tokens until
        the budget is reached. Returns the index where the tail starts.

-        ``token_budget`` defaults to ``self.tail_token_budget`` which is
-        derived from ``summary_target_ratio * context_length``, so it
-        scales automatically with the model's context window.
-
        Never cuts inside a tool_call/result group. Falls back to the old
        ``protect_last_n`` if the budget would protect fewer messages.
        """
-        if token_budget is None:
-            token_budget = self.tail_token_budget
        n = len(messages)
        min_tail = self.protect_last_n
        accumulated = 0
--- a/agent/display.py
+++ b/agent/display.py
@@ -657,6 +657,10 @@ def format_context_pressure(
    The bar and percentage show progress toward the compaction threshold,
    NOT the raw context window.  100% = compaction fires.

+    Uses ANSI colors:
+      - cyan at ~60% to compaction = informational
+      - bold yellow at ~85% to compaction = warning
+
    Args:
        compaction_progress: How close to compaction (0.0–1.0, 1.0 = fires).
        threshold_tokens: Compaction threshold in tokens.
@@ -670,12 +674,18 @@ def format_context_pressure(
    threshold_k = f"{threshold_tokens // 1000}k" if threshold_tokens >= 1000 else str(threshold_tokens)
    threshold_pct_int = int(threshold_percent * 100)

-    color = f"{_BOLD}{_YELLOW}"
-    icon = "⚠"
-    if compression_enabled:
-        hint = "compaction approaching"
+    # Tier styling
+    if compaction_progress >= 0.85:
+        color = f"{_BOLD}{_YELLOW}"
+        icon = "⚠"
+        if compression_enabled:
+            hint = "compaction imminent"
+        else:
+            hint = "no auto-compaction"
    else:
-        hint = "no auto-compaction"
+        color = _CYAN
+        icon = "◐"
+        hint = "approaching compaction"

    return (
        f"  {color}{icon} context {bar} {pct_int}% to compaction{_ANSI_RESET}"
@@ -699,10 +709,14 @@ def format_context_pressure_gateway(

    threshold_pct_int = int(threshold_percent * 100)

-    icon = "⚠️"
-    if compression_enabled:
-        hint = f"Context compaction approaching (threshold: {threshold_pct_int}% of window)."
+    if compaction_progress >= 0.85:
+        icon = "⚠️"
+        if compression_enabled:
+            hint = f"Context compaction is imminent (threshold: {threshold_pct_int}% of window)."
+        else:
+            hint = "Auto-compaction is disabled — context may be truncated."
    else:
-        hint = "Auto-compaction is disabled — context may be truncated."
+        icon = "ℹ️"
+        hint = f"Compaction threshold is at {threshold_pct_int}% of context window."

    return f"{icon} Context: {bar} {pct_int}% to compaction\n{hint}"
--- a/cli-config.yaml.example
+++ b/cli-config.yaml.example
@@ -232,34 +232,19 @@ browser:
 # 1. Tracks actual token usage from API responses (not estimates)
 # 2. When prompt_tokens >= threshold% of model's context_length, triggers compression
 # 3. Protects first 3 turns (system prompt, initial request, first response)
-# 4. Protects last N turns (default 20 messages = ~10 full turns of recent context)
+# 4. Protects last 4 turns (recent context is most relevant)
 # 5. Summarizes middle turns using a fast/cheap model
 # 6. Inserts summary as a user message, continues conversation seamlessly
 #
-# Post-compression tail budget is target_ratio × threshold × context_length:
-#   200K context, threshold 0.50, ratio 0.20 → 20K tokens of recent tail preserved
-#   1M   context, threshold 0.50, ratio 0.20 → 100K tokens of recent tail preserved
-#
 compression:
  # Enable automatic context compression (default: true)
  # Set to false if you prefer to manage context manually or want errors on overflow
  enabled: true
  
-  # Trigger compression at this % of model's context limit (default: 0.50 = 50%)
+  # Trigger compression at this % of model's context limit (default: 0.85 = 85%)
  # Lower values = more aggressive compression, higher values = compress later
-  threshold: 0.50
+  threshold: 0.85
  
-  # Fraction of the threshold to preserve as recent tail (default: 0.20 = 20%)
-  # e.g. 20% of 50% threshold = 10% of total context kept as recent messages.
-  # Summary output is separately capped at 12K tokens (Gemini output limit).
-  # Range: 0.10 - 0.80
-  target_ratio: 0.20
-
-  # Number of most-recent messages to always preserve (default: 20 ≈ 10 full turns)
-  # Higher values keep more recent conversation intact at the cost of more aggressive
-  # compression of older turns.
-  protect_last_n: 20
-
  # Model to use for generating summaries (fast/cheap recommended)
  # This model compresses the middle turns into a concise summary.
  # IMPORTANT: it receives the full middle section of the conversation, so it
--- a/cli.py
+++ b/cli.py
@@ -1509,14 +1509,10 @@ class HermesCLI:

        self._reasoning_buf = getattr(self, "_reasoning_buf", "") + text

-        # Emit complete lines, and force-flush long partial lines so
-        # reasoning is visible in real-time even without newlines.
+        # Emit complete lines
        while "\n" in self._reasoning_buf:
            line, self._reasoning_buf = self._reasoning_buf.split("\n", 1)
            _cprint(f"{_DIM}{line}{_RST}")
-        if len(self._reasoning_buf) > 80:
-            _cprint(f"{_DIM}{self._reasoning_buf}{_RST}")
-            self._reasoning_buf = ""

    def _close_reasoning_box(self) -> None:
        """Close the live reasoning box if it's open."""
--- a/hermes_cli/config.py
+++ b/hermes_cli/config.py
@@ -163,10 +163,8 @@ DEFAULT_CONFIG = {
    
    "compression": {
        "enabled": True,
-        "threshold": 0.50,            # compress when context usage exceeds this ratio
-        "target_ratio": 0.20,         # fraction of threshold to preserve as recent tail
-        "protect_last_n": 20,         # minimum recent messages to keep uncompressed
-        "summary_model": "",          # empty = use main configured model
+        "threshold": 0.50,
+        "summary_model": "",  # empty = use main configured model
        "summary_provider": "auto",
        "summary_base_url": None,
    },
@@ -1687,8 +1685,6 @@ def show_config():
    print(f"  Enabled:      {'yes' if enabled else 'no'}")
    if enabled:
        print(f"  Threshold:    {compression.get('threshold', 0.50) * 100:.0f}%")
-        print(f"  Target ratio: {compression.get('target_ratio', 0.20) * 100:.0f}% of threshold preserved")
-        print(f"  Protect last: {compression.get('protect_last_n', 20)} messages")
        _sm = compression.get('summary_model', '') or '(main model)'
        print(f"  Model:        {_sm}")
        comp_provider = compression.get('summary_provider', 'auto')
--- a/hermes_cli/setup.py
+++ b/hermes_cli/setup.py
@@ -873,9 +873,9 @@ def setup_model_provider(config: dict):
        keep_label = None  # No provider configured — don't show "Keep current"

    provider_choices = [
-        "OpenRouter API key (100+ models, pay-per-use)",
        "Login with Nous Portal (Nous Research subscription — OAuth)",
        "Login with OpenAI Codex",
+        "OpenRouter API key (100+ models, pay-per-use)",
        "Custom OpenAI-compatible endpoint (self-hosted / VLLM / etc.)",
        "Z.AI / GLM (Zhipu AI models)",
        "Kimi / Moonshot (Kimi coding models)",
@@ -894,7 +894,7 @@ def setup_model_provider(config: dict):
        provider_choices.append(keep_label)

    # Default to "Keep current" if a provider exists, otherwise OpenRouter (most common)
-    default_provider = len(provider_choices) - 1 if has_any_provider else 0
+    default_provider = len(provider_choices) - 1 if has_any_provider else 2

    if not has_any_provider:
        print_warning("An inference provider is required for Hermes to work.")
@@ -911,7 +911,81 @@ def setup_model_provider(config: dict):
    selected_base_url = None  # deferred until after model selection
    nous_models = []  # populated if Nous login succeeds

-    if provider_idx == 0:  # OpenRouter
+    if provider_idx == 0:  # Nous Portal (OAuth)
+        selected_provider = "nous"
+        print()
+        print_header("Nous Portal Login")
+        print_info("This will open your browser to authenticate with Nous Portal.")
+        print_info("You'll need a Nous Research account with an active subscription.")
+        print()
+
+        try:
+            from hermes_cli.auth import _login_nous, ProviderConfig
+            import argparse
+
+            mock_args = argparse.Namespace(
+                portal_url=None,
+                inference_url=None,
+                client_id=None,
+                scope=None,
+                no_browser=False,
+                timeout=15.0,
+                ca_bundle=None,
+                insecure=False,
+            )
+            pconfig = PROVIDER_REGISTRY["nous"]
+            _login_nous(mock_args, pconfig)
+            _sync_model_from_disk(config)
+
+            # Fetch models for the selection step
+            try:
+                creds = resolve_nous_runtime_credentials(
+                    min_key_ttl_seconds=5 * 60,
+                    timeout_seconds=15.0,
+                )
+                nous_models = fetch_nous_models(
+                    inference_base_url=creds.get("base_url", ""),
+                    api_key=creds.get("api_key", ""),
+                )
+            except Exception as e:
+                logger.debug("Could not fetch Nous models after login: %s", e)
+
+        except SystemExit:
+            print_warning("Nous Portal login was cancelled or failed.")
+            print_info("You can try again later with: hermes model")
+            selected_provider = None
+        except Exception as e:
+            print_error(f"Login failed: {e}")
+            print_info("You can try again later with: hermes model")
+            selected_provider = None
+
+    elif provider_idx == 1:  # OpenAI Codex
+        selected_provider = "openai-codex"
+        print()
+        print_header("OpenAI Codex Login")
+        print()
+
+        try:
+            import argparse
+
+            mock_args = argparse.Namespace()
+            _login_openai_codex(mock_args, PROVIDER_REGISTRY["openai-codex"])
+            # Clear custom endpoint vars that would override provider routing.
+            if existing_custom:
+                save_env_value("OPENAI_BASE_URL", "")
+                save_env_value("OPENAI_API_KEY", "")
+            _update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
+            _set_model_provider(config, "openai-codex", DEFAULT_CODEX_BASE_URL)
+        except SystemExit:
+            print_warning("OpenAI Codex login was cancelled or failed.")
+            print_info("You can try again later with: hermes model")
+            selected_provider = None
+        except Exception as e:
+            print_error(f"Login failed: {e}")
+            print_info("You can try again later with: hermes model")
+            selected_provider = None
+
+    elif provider_idx == 2:  # OpenRouter
        selected_provider = "openrouter"
        print()
        print_header("OpenRouter API Key")
@@ -966,80 +1040,6 @@ def setup_model_provider(config: dict):
        except Exception as e:
            logger.debug("Could not save provider to config.yaml: %s", e)

-    elif provider_idx == 1:  # Nous Portal (OAuth)
-        selected_provider = "nous"
-        print()
-        print_header("Nous Portal Login")
-        print_info("This will open your browser to authenticate with Nous Portal.")
-        print_info("You'll need a Nous Research account with an active subscription.")
-        print()
-
-        try:
-            from hermes_cli.auth import _login_nous, ProviderConfig
-            import argparse
-
-            mock_args = argparse.Namespace(
-                portal_url=None,
-                inference_url=None,
-                client_id=None,
-                scope=None,
-                no_browser=False,
-                timeout=15.0,
-                ca_bundle=None,
-                insecure=False,
-            )
-            pconfig = PROVIDER_REGISTRY["nous"]
-            _login_nous(mock_args, pconfig)
-            _sync_model_from_disk(config)
-
-            # Fetch models for the selection step
-            try:
-                creds = resolve_nous_runtime_credentials(
-                    min_key_ttl_seconds=5 * 60,
-                    timeout_seconds=15.0,
-                )
-                nous_models = fetch_nous_models(
-                    inference_base_url=creds.get("base_url", ""),
-                    api_key=creds.get("api_key", ""),
-                )
-            except Exception as e:
-                logger.debug("Could not fetch Nous models after login: %s", e)
-
-        except SystemExit:
-            print_warning("Nous Portal login was cancelled or failed.")
-            print_info("You can try again later with: hermes model")
-            selected_provider = None
-        except Exception as e:
-            print_error(f"Login failed: {e}")
-            print_info("You can try again later with: hermes model")
-            selected_provider = None
-
-    elif provider_idx == 2:  # OpenAI Codex
-        selected_provider = "openai-codex"
-        print()
-        print_header("OpenAI Codex Login")
-        print()
-
-        try:
-            import argparse
-
-            mock_args = argparse.Namespace()
-            _login_openai_codex(mock_args, PROVIDER_REGISTRY["openai-codex"])
-            # Clear custom endpoint vars that would override provider routing.
-            if existing_custom:
-                save_env_value("OPENAI_BASE_URL", "")
-                save_env_value("OPENAI_API_KEY", "")
-            _update_config_for_provider("openai-codex", DEFAULT_CODEX_BASE_URL)
-            _set_model_provider(config, "openai-codex", DEFAULT_CODEX_BASE_URL)
-        except SystemExit:
-            print_warning("OpenAI Codex login was cancelled or failed.")
-            print_info("You can try again later with: hermes model")
-            selected_provider = None
-        except Exception as e:
-            print_error(f"Login failed: {e}")
-            print_info("You can try again later with: hermes model")
-            selected_provider = None
-
    elif provider_idx == 3:  # Custom endpoint
        selected_provider = "custom"
        print()
--- a/run_agent.py
+++ b/run_agent.py
@@ -585,7 +585,8 @@ class AIAgent:
        # Context pressure warnings: notify the USER (not the LLM) as context
        # fills up.  Purely informational — displayed in CLI output and sent via
        # status_callback for gateway platforms.  Does NOT inject into messages.
-        self._context_pressure_warned = False
+        self._context_50_warned = False
+        self._context_70_warned = False

        # Persistent error log -- always writes WARNING+ to ~/.hermes/logs/errors.log
        # so tool failures, API errors, etc. are inspectable after the fact.
@@ -1012,8 +1013,6 @@ class AIAgent:
        compression_threshold = float(_compression_cfg.get("threshold", 0.50))
        compression_enabled = str(_compression_cfg.get("enabled", True)).lower() in ("true", "1", "yes")
        compression_summary_model = _compression_cfg.get("summary_model") or None
-        compression_target_ratio = float(_compression_cfg.get("target_ratio", 0.20))
-        compression_protect_last = int(_compression_cfg.get("protect_last_n", 20))

        # Read explicit context_length override from model config
        _model_cfg = _agent_cfg.get("model", {})
@@ -1052,8 +1051,8 @@ class AIAgent:
            model=self.model,
            threshold_percent=compression_threshold,
            protect_first_n=3,
-            protect_last_n=compression_protect_last,
-            summary_target_ratio=compression_target_ratio,
+            protect_last_n=4,
+            summary_target_tokens=500,
            summary_model_override=compression_summary_model,
            quiet_mode=self.quiet_mode,
            base_url=self.base_url,
@@ -2363,13 +2362,7 @@ class AIAgent:
            prompt_parts.append(skills_prompt)

        if not self.skip_context_files:
-            # Use TERMINAL_CWD for context file discovery when set (gateway
-            # mode).  The gateway process runs from the hermes-agent install
-            # dir, so os.getcwd() would pick up the repo's AGENTS.md and
-            # other dev files — inflating token usage by ~10k for no benefit.
-            _context_cwd = os.getenv("TERMINAL_CWD") or None
-            context_files_prompt = build_context_files_prompt(
-                cwd=_context_cwd, skip_soul=_soul_loaded)
+            context_files_prompt = build_context_files_prompt(skip_soul=_soul_loaded)
            if context_files_prompt:
                prompt_parts.append(context_files_prompt)

@@ -3585,20 +3578,7 @@ class AIAgent:

        def _call_chat_completions():
            """Stream a chat completions response."""
-            import httpx as _httpx
-            _base_timeout = float(os.getenv("HERMES_API_TIMEOUT", 900.0))
-            _stream_read_timeout = float(os.getenv("HERMES_STREAM_READ_TIMEOUT", 60.0))
-            stream_kwargs = {
-                **api_kwargs,
-                "stream": True,
-                "stream_options": {"include_usage": True},
-                "timeout": _httpx.Timeout(
-                    connect=30.0,
-                    read=_stream_read_timeout,
-                    write=_base_timeout,
-                    pool=30.0,
-                ),
-            }
+            stream_kwargs = {**api_kwargs, "stream": True, "stream_options": {"include_usage": True}}
            request_client_holder["client"] = self._create_request_openai_client(
                reason="chat_completion_stream_request"
            )
@@ -3666,7 +3646,6 @@ class AIAgent:
                        name = entry["function"]["name"]
                        if name and idx not in tool_gen_notified:
                            tool_gen_notified.add(idx)
-                            _fire_first_delta()
                            self._fire_tool_gen_started(name)

                if chunk.choices[0].finish_reason:
@@ -3735,7 +3714,6 @@ class AIAgent:
                            has_tool_use = True
                            tool_name = getattr(block, "name", None)
                            if tool_name:
-                                _fire_first_delta()
                                self._fire_tool_gen_started(tool_name)

                    elif event_type == "content_block_delta":
@@ -3757,84 +3735,29 @@ class AIAgent:
                return stream.get_final_message()

        def _call():
-            import httpx as _httpx
-
-            _max_stream_retries = int(os.getenv("HERMES_STREAM_RETRIES", 2))
-
            try:
-                for _stream_attempt in range(_max_stream_retries + 1):
+                if self.api_mode == "anthropic_messages":
+                    self._try_refresh_anthropic_client_credentials()
+                    result["response"] = _call_anthropic()
+                else:
+                    result["response"] = _call_chat_completions()
+            except Exception as e:
+                if deltas_were_sent["yes"]:
+                    # Streaming failed AFTER some tokens were already delivered
+                    # to consumers. Don't fall back — that would cause
+                    # double-delivery (partial streamed + full non-streamed).
+                    # Let the error propagate; the partial content already
+                    # reached the user via the stream.
+                    logger.warning("Streaming failed after partial delivery, not falling back: %s", e)
+                    result["error"] = e
+                else:
+                    # Streaming failed before any tokens reached consumers.
+                    # Safe to fall back to the standard non-streaming path.
+                    logger.info("Streaming failed before delivery, falling back to non-streaming: %s", e)
                    try:
-                        if self.api_mode == "anthropic_messages":
-                            self._try_refresh_anthropic_client_credentials()
-                            result["response"] = _call_anthropic()
-                        else:
-                            result["response"] = _call_chat_completions()
-                        return  # success
-                    except Exception as e:
-                        if deltas_were_sent["yes"]:
-                            # Streaming failed AFTER some tokens were already
-                            # delivered.  Don't retry or fall back — partial
-                            # content already reached the user.
-                            logger.warning(
-                                "Streaming failed after partial delivery, not retrying: %s", e
-                            )
-                            result["error"] = e
-                            return
-
-                        _is_timeout = isinstance(
-                            e, (_httpx.ReadTimeout, _httpx.ConnectTimeout, _httpx.PoolTimeout)
-                        )
-                        _is_conn_err = isinstance(
-                            e, (_httpx.ConnectError, _httpx.RemoteProtocolError, ConnectionError)
-                        )
-
-                        if _is_timeout or _is_conn_err:
-                            # Transient network / timeout error.  Retry the
-                            # streaming request with a fresh connection rather
-                            # than falling back to non-streaming (which would
-                            # hang for up to 15 min on the same dead server).
-                            if _stream_attempt < _max_stream_retries:
-                                logger.info(
-                                    "Streaming attempt %s/%s failed (%s: %s), "
-                                    "retrying with fresh connection...",
-                                    _stream_attempt + 1,
-                                    _max_stream_retries + 1,
-                                    type(e).__name__,
-                                    e,
-                                )
-                                # Close the stale request client before retry
-                                stale = request_client_holder.get("client")
-                                if stale is not None:
-                                    self._close_request_openai_client(
-                                        stale, reason="stream_retry_cleanup"
-                                    )
-                                    request_client_holder["client"] = None
-                                continue
-                            # Exhausted retries — propagate to outer loop
-                            logger.warning(
-                                "Streaming exhausted %s retries on transient error: %s",
-                                _max_stream_retries + 1,
-                                e,
-                            )
-                            result["error"] = e
-                            return
-
-                        # Non-transient error (e.g. "streaming not supported",
-                        # auth error, 4xx).  Fall back to non-streaming once.
-                        err_msg = str(e).lower()
-                        if "stream" in err_msg and "not supported" in err_msg:
-                            logger.info(
-                                "Streaming not supported, falling back to non-streaming: %s", e
-                            )
-                            try:
-                                result["response"] = self._interruptible_api_call(api_kwargs)
-                            except Exception as fallback_err:
-                                result["error"] = fallback_err
-                            return
-
-                        # Unknown error — propagate to outer retry loop
-                        result["error"] = e
-                        return
+                        result["response"] = self._interruptible_api_call(api_kwargs)
+                    except Exception as fallback_err:
+                        result["error"] = fallback_err
            finally:
                request_client = request_client_holder.get("client")
                if request_client is not None:
@@ -4686,17 +4609,9 @@ class AIAgent:
            except Exception as e:
                logger.debug("Session DB compression split failed: %s", e)

-        # Reset context pressure warning and token estimate — usage drops
-        # after compaction.  Without this, the stale last_prompt_tokens from
-        # the previous API call causes the pressure calculation to stay at
-        # >1000% and spam warnings / re-trigger compression in a loop.
-        self._context_pressure_warned = False
-        _compressed_est = (
-            estimate_tokens_rough(new_system_prompt)
-            + estimate_messages_tokens_rough(compressed)
-        )
-        self.context_compressor.last_prompt_tokens = _compressed_est
-        self.context_compressor.last_completion_tokens = 0
+        # Reset context pressure warnings — usage drops after compaction
+        self._context_50_warned = False
+        self._context_70_warned = False

        return compressed, new_system_prompt

@@ -6929,8 +6844,12 @@ class AIAgent:
                    # and fires status_callback for gateway platforms.
                    if _compressor.threshold_tokens > 0:
                        _compaction_progress = _estimated_next_prompt / _compressor.threshold_tokens
-                        if _compaction_progress >= 0.85 and not self._context_pressure_warned:
-                            self._context_pressure_warned = True
+                        if _compaction_progress >= 0.85 and not self._context_70_warned:
+                            self._context_70_warned = True
+                            self._context_50_warned = True  # skip first tier if we jumped past it
+                            self._emit_context_pressure(_compaction_progress, _compressor)
+                        elif _compaction_progress >= 0.60 and not self._context_50_warned:
+                            self._context_50_warned = True
                            self._emit_context_pressure(_compaction_progress, _compressor)

                    if self.compression_enabled and _compressor.should_compress(_estimated_next_prompt):
--- a/tests/agent/test_context_compressor.py
+++ b/tests/agent/test_context_compressor.py
@@ -217,7 +217,7 @@ class TestCompressWithClient:
        mock_client.chat.completions.create.return_value = mock_response

        with patch("agent.context_compressor.get_model_context_length", return_value=100000):
-            c = ContextCompressor(model="test", quiet_mode=True, protect_first_n=2, protect_last_n=2)
+            c = ContextCompressor(model="test", quiet_mode=True)

        msgs = [{"role": "user" if i % 2 == 0 else "assistant", "content": f"msg {i}"} for i in range(10)]
        with patch("agent.context_compressor.call_llm", return_value=mock_response):
@@ -513,52 +513,3 @@ class TestCompressWithClient:
        for msg in result:
            if msg.get("role") == "tool" and msg.get("tool_call_id"):
                assert msg["tool_call_id"] in called_ids
-
-
-class TestSummaryTargetRatio:
-    """Verify that summary_target_ratio properly scales budgets with context window."""
-
-    def test_tail_budget_scales_with_context(self):
-        """Tail token budget should be threshold_tokens * summary_target_ratio."""
-        with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
-            c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.40)
-        # 200K * 0.50 threshold * 0.40 ratio = 40K
-        assert c.tail_token_budget == 40_000
-
-        with patch("agent.context_compressor.get_model_context_length", return_value=1_000_000):
-            c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.40)
-        # 1M * 0.50 threshold * 0.40 ratio = 200K
-        assert c.tail_token_budget == 200_000
-
-    def test_summary_cap_scales_with_context(self):
-        """Max summary tokens should be 5% of context, capped at 12K."""
-        with patch("agent.context_compressor.get_model_context_length", return_value=200_000):
-            c = ContextCompressor(model="test", quiet_mode=True)
-        assert c.max_summary_tokens == 10_000  # 200K * 0.05
-
-        with patch("agent.context_compressor.get_model_context_length", return_value=1_000_000):
-            c = ContextCompressor(model="test", quiet_mode=True)
-        assert c.max_summary_tokens == 12_000  # capped at 12K ceiling
-
-    def test_ratio_clamped(self):
-        """Ratio should be clamped to [0.10, 0.80]."""
-        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
-            c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.05)
-        assert c.summary_target_ratio == 0.10
-
-        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
-            c = ContextCompressor(model="test", quiet_mode=True, summary_target_ratio=0.95)
-        assert c.summary_target_ratio == 0.80
-
-    def test_default_threshold_is_50_percent(self):
-        """Default compression threshold should be 50%."""
-        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
-            c = ContextCompressor(model="test", quiet_mode=True)
-        assert c.threshold_percent == 0.50
-        assert c.threshold_tokens == 50_000
-
-    def test_default_protect_last_n_is_20(self):
-        """Default protect_last_n should be 20."""
-        with patch("agent.context_compressor.get_model_context_length", return_value=100_000):
-            c = ContextCompressor(model="test", quiet_mode=True)
-        assert c.protect_last_n == 20
--- a/tests/hermes_cli/test_setup.py
+++ b/tests/hermes_cli/test_setup.py
@@ -34,7 +34,7 @@ def test_nous_oauth_setup_keeps_current_model_when_syncing_disk_provider(

    def fake_prompt_choice(question, choices, default=0):
        if question == "Select your inference provider:":
-            return 1  # Nous Portal
+            return 0
        if question == "Configure vision:":
            return len(choices) - 1
        if question == "Select default model:":
@@ -135,7 +135,7 @@ def test_codex_setup_uses_runtime_access_token_for_live_model_list(tmp_path, mon

    def fake_prompt_choice(question, choices, default=0):
        if question == "Select your inference provider:":
-            return 2  # OpenAI Codex
+            return 1
        if question == "Select default model:":
            return 0
        tts_idx = _maybe_keep_current_tts(question, choices)
--- a/tests/hermes_cli/test_setup_model_provider.py
+++ b/tests/hermes_cli/test_setup_model_provider.py
@@ -401,7 +401,7 @@ def test_setup_switch_custom_to_codex_clears_custom_endpoint_and_updates_config(

    def fake_prompt_choice(question, choices, default=0):
        if question == "Select your inference provider:":
-            return 2  # OpenAI Codex
+            return 1
        if question == "Select default model:":
            return 0
        tts_idx = _maybe_keep_current_tts(question, choices)
--- a/tests/test_context_pressure.py
+++ b/tests/test_context_pressure.py
@@ -29,36 +29,40 @@ class TestFormatContextPressure:
    raw context window.  60% = 60% of the way to compaction.
    """

-    def test_80_percent_uses_warning_icon(self):
-        line = format_context_pressure(0.80, 100_000, 0.50)
-        assert "⚠" in line
-        assert "80% to compaction" in line
+    def test_60_percent_uses_info_icon(self):
+        line = format_context_pressure(0.60, 100_000, 0.50)
+        assert "◐" in line
+        assert "60% to compaction" in line

-    def test_90_percent_uses_warning_icon(self):
-        line = format_context_pressure(0.90, 100_000, 0.50)
+    def test_85_percent_uses_warning_icon(self):
+        line = format_context_pressure(0.85, 100_000, 0.50)
        assert "⚠" in line
-        assert "90% to compaction" in line
+        assert "85% to compaction" in line

    def test_bar_length_scales_with_progress(self):
-        line_80 = format_context_pressure(0.80, 100_000, 0.50)
-        line_95 = format_context_pressure(0.95, 100_000, 0.50)
-        assert line_95.count("▰") > line_80.count("▰")
+        line_60 = format_context_pressure(0.60, 100_000, 0.50)
+        line_85 = format_context_pressure(0.85, 100_000, 0.50)
+        assert line_85.count("▰") > line_60.count("▰")

    def test_shows_threshold_tokens(self):
-        line = format_context_pressure(0.80, 100_000, 0.50)
+        line = format_context_pressure(0.60, 100_000, 0.50)
        assert "100k" in line

    def test_small_threshold(self):
-        line = format_context_pressure(0.80, 500, 0.50)
+        line = format_context_pressure(0.60, 500, 0.50)
        assert "500" in line

    def test_shows_threshold_percent(self):
-        line = format_context_pressure(0.80, 100_000, 0.50)
-        assert "50%" in line
+        line = format_context_pressure(0.85, 100_000, 0.50)
+        assert "50%" in line  # threshold percent shown

-    def test_approaching_hint(self):
-        line = format_context_pressure(0.80, 100_000, 0.50)
-        assert "compaction approaching" in line
+    def test_imminent_hint_at_85(self):
+        line = format_context_pressure(0.85, 100_000, 0.50)
+        assert "compaction imminent" in line
+
+    def test_approaching_hint_below_85(self):
+        line = format_context_pressure(0.60, 100_000, 0.80)
+        assert "approaching compaction" in line

    def test_no_compaction_when_disabled(self):
        line = format_context_pressure(0.85, 100_000, 0.50, compression_enabled=False)
@@ -78,26 +82,26 @@ class TestFormatContextPressure:
 class TestFormatContextPressureGateway:
    """Gateway (plain text) context pressure display."""

-    def test_80_percent_warning(self):
-        msg = format_context_pressure_gateway(0.80, 0.50)
-        assert "80% to compaction" in msg
-        assert "50%" in msg
+    def test_60_percent_informational(self):
+        msg = format_context_pressure_gateway(0.60, 0.50)
+        assert "60% to compaction" in msg
+        assert "50%" in msg  # threshold shown

-    def test_90_percent_warning(self):
-        msg = format_context_pressure_gateway(0.90, 0.50)
-        assert "90% to compaction" in msg
-        assert "approaching" in msg
+    def test_85_percent_warning(self):
+        msg = format_context_pressure_gateway(0.85, 0.50)
+        assert "85% to compaction" in msg
+        assert "imminent" in msg

    def test_no_compaction_warning(self):
        msg = format_context_pressure_gateway(0.85, 0.50, compression_enabled=False)
        assert "disabled" in msg

    def test_no_ansi_codes(self):
-        msg = format_context_pressure_gateway(0.80, 0.50)
+        msg = format_context_pressure_gateway(0.85, 0.50)
        assert "\033[" not in msg

    def test_has_progress_bar(self):
-        msg = format_context_pressure_gateway(0.80, 0.50)
+        msg = format_context_pressure_gateway(0.85, 0.50)
        assert "▰" in msg


@@ -141,8 +145,9 @@ def agent():
 class TestContextPressureFlags:
    """Context pressure warning flag tracking on AIAgent."""

-    def test_flag_initialized_false(self, agent):
-        assert agent._context_pressure_warned is False
+    def test_flags_initialized_false(self, agent):
+        assert agent._context_50_warned is False
+        assert agent._context_70_warned is False

    def test_emit_calls_status_callback(self, agent):
        """status_callback should be invoked with event type and message."""
@@ -199,11 +204,13 @@ class TestContextPressureFlags:
        captured = capsys.readouterr()
        assert "▰" not in captured.out

-    def test_flag_reset_on_compression(self, agent):
-        """After _compress_context, context pressure flag should reset."""
-        agent._context_pressure_warned = True
+    def test_flags_reset_on_compression(self, agent):
+        """After _compress_context, context pressure flags should reset."""
+        agent._context_50_warned = True
+        agent._context_70_warned = True
        agent.compression_enabled = True

+        # Mock the compressor's compress method to return minimal valid output
        agent.context_compressor = MagicMock()
        agent.context_compressor.compress.return_value = [
            {"role": "user", "content": "Summary of conversation so far."}
@@ -211,9 +218,11 @@ class TestContextPressureFlags:
        agent.context_compressor.context_length = 200_000
        agent.context_compressor.threshold_tokens = 100_000

+        # Mock _todo_store
        agent._todo_store = MagicMock()
        agent._todo_store.format_for_injection.return_value = None

+        # Mock _build_system_prompt
        agent._build_system_prompt = MagicMock(return_value="system prompt")
        agent._cached_system_prompt = "old system prompt"
        agent._session_db = None
@@ -224,7 +233,8 @@ class TestContextPressureFlags:
        ]
        agent._compress_context(messages, "system prompt")

-        assert agent._context_pressure_warned is False
+        assert agent._context_50_warned is False
+        assert agent._context_70_warned is False

    def test_emit_callback_error_handled(self, agent):
        """If status_callback raises, it should be caught gracefully."""
--- a/tools/browser_tool.py
+++ b/tools/browser_tool.py
@@ -1567,20 +1567,6 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
        vision_model = _get_vision_model()
        logger.debug("browser_vision: analysing screenshot (%d bytes)",
                     len(image_data))
-
-        # Read vision timeout from config (auxiliary.vision.timeout), default 120s.
-        # Local vision models (llama.cpp, ollama) can take well over 30s for
-        # screenshot analysis, so the default must be generous.
-        vision_timeout = 120.0
-        try:
-            from hermes_cli.config import load_config
-            _cfg = load_config()
-            _vt = _cfg.get("auxiliary", {}).get("vision", {}).get("timeout")
-            if _vt is not None:
-                vision_timeout = float(_vt)
-        except Exception:
-            pass
-
        call_kwargs = {
            "task": "vision",
            "messages": [
@@ -1594,7 +1580,6 @@ def browser_vision(question: str, annotate: bool = False, task_id: Optional[str]
            ],
            "max_tokens": 2000,
            "temperature": 0.1,
-            "timeout": vision_timeout,
        }
        if vision_model:
            call_kwargs["model"] = vision_model
--- a/tools/session_search_tool.py
+++ b/tools/session_search_tool.py
@@ -179,58 +179,6 @@ async def _summarize_session(
                return None


-def _list_recent_sessions(db, limit: int, current_session_id: str = None) -> str:
-    """Return metadata for the most recent sessions (no LLM calls)."""
-    try:
-        sessions = db.list_sessions_rich(limit=limit + 5)  # fetch extra to skip current
-
-        # Resolve current session lineage to exclude it
-        current_root = None
-        if current_session_id:
-            try:
-                sid = current_session_id
-                visited = set()
-                while sid and sid not in visited:
-                    visited.add(sid)
-                    s = db.get_session(sid)
-                    parent = s.get("parent_session_id") if s else None
-                    sid = parent if parent else None
-                current_root = max(visited, key=len) if visited else current_session_id
-            except Exception:
-                current_root = current_session_id
-
-        results = []
-        for s in sessions:
-            sid = s.get("id", "")
-            if current_root and (sid == current_root or sid == current_session_id):
-                continue
-            # Skip child/delegation sessions (they have parent_session_id)
-            if s.get("parent_session_id"):
-                continue
-            results.append({
-                "session_id": sid,
-                "title": s.get("title") or None,
-                "source": s.get("source", ""),
-                "started_at": s.get("started_at", ""),
-                "last_active": s.get("last_active", ""),
-                "message_count": s.get("message_count", 0),
-                "preview": s.get("preview", ""),
-            })
-            if len(results) >= limit:
-                break
-
-        return json.dumps({
-            "success": True,
-            "mode": "recent",
-            "results": results,
-            "count": len(results),
-            "message": f"Showing {len(results)} most recent sessions. Use a keyword query to search specific topics.",
-        }, ensure_ascii=False)
-    except Exception as e:
-        logging.error("Error listing recent sessions: %s", e, exc_info=True)
-        return json.dumps({"success": False, "error": f"Failed to list recent sessions: {e}"}, ensure_ascii=False)
-
-
 def session_search(
    query: str,
    role_filter: str = None,
@@ -247,14 +195,11 @@ def session_search(
    if db is None:
        return json.dumps({"success": False, "error": "Session database not available."}, ensure_ascii=False)

-    limit = min(limit, 5)  # Cap at 5 sessions to avoid excessive LLM calls
-
-    # Recent sessions mode: when query is empty, return metadata for recent sessions.
-    # No LLM calls — just DB queries for titles, previews, timestamps.
    if not query or not query.strip():
-        return _list_recent_sessions(db, limit, current_session_id)
+        return json.dumps({"success": False, "error": "Query cannot be empty."}, ensure_ascii=False)

    query = query.strip()
+    limit = min(limit, 5)  # Cap at 5 sessions to avoid excessive LLM calls

    try:
        # Parse role filter
@@ -419,14 +364,8 @@ def check_session_search_requirements() -> bool:
 SESSION_SEARCH_SCHEMA = {
    "name": "session_search",
    "description": (
-        "Search your long-term memory of past conversations, or browse recent sessions. This is your recall -- "
+        "Search your long-term memory of past conversations. This is your recall -- "
        "every past session is searchable, and this tool summarizes what happened.\n\n"
-        "TWO MODES:\n"
-        "1. Recent sessions (no query): Call with no arguments to see what was worked on recently. "
-        "Returns titles, previews, and timestamps. Zero LLM cost, instant. "
-        "Start here when the user asks what were we working on or what did we do recently.\n"
-        "2. Keyword search (with query): Search for specific topics across all past sessions. "
-        "Returns LLM-generated summaries of matching sessions.\n\n"
        "USE THIS PROACTIVELY when:\n"
        "- The user says 'we did this before', 'remember when', 'last time', 'as I mentioned'\n"
        "- The user asks about a topic you worked on before but don't have in current context\n"
@@ -446,7 +385,7 @@ SESSION_SEARCH_SCHEMA = {
        "properties": {
            "query": {
                "type": "string",
-                "description": "Search query — keywords, phrases, or boolean expressions to find in past sessions. Omit this parameter entirely to browse recent sessions instead (returns titles, previews, timestamps with no LLM cost).",
+                "description": "Search query — keywords, phrases, or boolean expressions to find in past sessions.",
            },
            "role_filter": {
                "type": "string",
@@ -458,7 +397,7 @@ SESSION_SEARCH_SCHEMA = {
                "default": 3,
            },
        },
-        "required": [],
+        "required": ["query"],
    },
 }

@@ -471,7 +410,7 @@ registry.register(
    toolset="session_search",
    schema=SESSION_SEARCH_SCHEMA,
    handler=lambda args, **kw: session_search(
-        query=args.get("query") or "",
+        query=args.get("query", ""),
        role_filter=args.get("role_filter"),
        limit=args.get("limit", 3),
        db=kw.get("db"),
--- a/tools/skills_guard.py
+++ b/tools/skills_guard.py
@@ -1050,9 +1050,6 @@ def _get_configured_model() -> str:

 def _resolve_trust_level(source: str) -> str:
    """Map a source identifier to a trust level."""
-    # Agent-created skills get their own permissive trust level
-    if source == "agent-created":
-        return "agent-created"
    # Official optional skills shipped with the repo
    if source.startswith("official/") or source == "official":
        return "builtin"
--- a/tools/vision_tools.py
+++ b/tools/vision_tools.py
@@ -325,9 +325,8 @@ async def vision_analyze_tool(
        logger.info("Processing image with vision model...")
        
        # Call the vision API via centralized router.
-        # Read timeout from config.yaml (auxiliary.vision.timeout), default 120s.
-        # Local vision models (llama.cpp, ollama) can take well over 30s.
-        vision_timeout = 120.0
+        # Read timeout from config.yaml (auxiliary.vision.timeout), default 30s.
+        vision_timeout = 30.0
        try:
            from hermes_cli.config import load_config
            _cfg = load_config()
--- a/website/docs/developer-guide/creating-skills.md
+++ b/website/docs/developer-guide/creating-skills.md
@@ -57,15 +57,6 @@ metadata:
  hermes:
    tags: [Category, Subcategory, Keywords]
    related_skills: [other-skill-name]
-    requires_toolsets: [web]            # Optional — only show when these toolsets are active
-    requires_tools: [web_search]        # Optional — only show when these tools are available
-    fallback_for_toolsets: [browser]    # Optional — hide when these toolsets are active
-    fallback_for_tools: [browser_navigate]  # Optional — hide when these tools exist
-required_environment_variables:          # Optional — env vars the skill needs
-  - name: MY_API_KEY
-    prompt: "Enter your API key"
-    help: "Get one at https://example.com"
-    required_for: "API access"
 ---

 # Skill Title
@@ -100,57 +91,6 @@ platforms: [windows]          # Windows only

 When set, the skill is automatically hidden from the system prompt, `skills_list()`, and slash commands on incompatible platforms. If omitted or empty, the skill loads on all platforms (backward compatible).

-### Conditional Skill Activation
-
-Skills can declare dependencies on specific tools or toolsets. This controls whether the skill appears in the system prompt for a given session.
-
-```yaml
-metadata:
-  hermes:
-    requires_toolsets: [web]           # Hide if the web toolset is NOT active
-    requires_tools: [web_search]       # Hide if web_search tool is NOT available
-    fallback_for_toolsets: [browser]   # Hide if the browser toolset IS active
-    fallback_for_tools: [browser_navigate]  # Hide if browser_navigate IS available
-```
-
-| Field | Behavior |
-|-------|----------|
-| `requires_toolsets` | Skill is **hidden** when ANY listed toolset is **not** available |
-| `requires_tools` | Skill is **hidden** when ANY listed tool is **not** available |
-| `fallback_for_toolsets` | Skill is **hidden** when ANY listed toolset **is** available |
-| `fallback_for_tools` | Skill is **hidden** when ANY listed tool **is** available |
-
-**Use case for `fallback_for_*`:** Create a skill that serves as a workaround when a primary tool isn't available. For example, a `duckduckgo-search` skill with `fallback_for_tools: [web_search]` only shows when the web search tool (which requires an API key) is not configured.
-
-**Use case for `requires_*`:** Create a skill that only makes sense when certain tools are present. For example, a web scraping workflow skill with `requires_toolsets: [web]` won't clutter the prompt when web tools are disabled.
-
-### Environment Variable Requirements
-
-Skills can declare environment variables they need. When a skill is loaded via `skill_view`, its required vars are automatically registered for passthrough into sandboxed execution environments (terminal, execute_code).
-
-```yaml
-required_environment_variables:
-  - name: TENOR_API_KEY
-    prompt: "Tenor API key"               # Shown when prompting user
-    help: "Get your key at https://tenor.com"  # Help text or URL
-    required_for: "GIF search functionality"   # What needs this var
-```
-
-Each entry supports:
- `name` (required) — the environment variable name
- `prompt` (optional) — prompt text when asking the user for the value
- `help` (optional) — help text or URL for obtaining the value
- `required_for` (optional) — describes which feature needs this variable
-
-Users can also manually configure passthrough variables in `config.yaml`:
-
-```yaml
-terminal:
-  env_passthrough:
-    - MY_CUSTOM_VAR
-    - ANOTHER_VAR
-```
-
 See `skills/apple/` for examples of macOS-only skills.

 ## Secure Setup on Load
--- a/website/docs/user-guide/configuration.md
+++ b/website/docs/user-guide/configuration.md
@@ -55,22 +55,6 @@ Settings are resolved in this order (highest priority first):
 Secrets (API keys, bot tokens, passwords) go in `.env`. Everything else (model, terminal backend, compression settings, memory limits, toolsets) goes in `config.yaml`. When both are set, `config.yaml` wins for non-secret settings.
 :::

-## Environment Variable Substitution
-
-You can reference environment variables in `config.yaml` using `${VAR_NAME}` syntax:
-
-```yaml
-auxiliary:
-  vision:
-    api_key: ${GOOGLE_API_KEY}
-    base_url: ${CUSTOM_VISION_URL}
-
-delegation:
-  api_key: ${DELEGATION_KEY}
-```
-
-Multiple references in a single value work: `url: "${HOST}:${PORT}"`. If a referenced variable is not set, the placeholder is kept verbatim (`${UNDEFINED_VAR}` stays as-is). Only the `${VAR}` syntax is supported — bare `$VAR` is not expanded.
-
 ## Inference Providers

 You need at least one way to connect to an LLM. Use `hermes model` to switch providers and models interactively, or configure directly:
@@ -1345,23 +1329,6 @@ Usage: type `/status`, `/disk`, `/update`, or `/gpu` in the CLI or any messaging
 - **Type** — only `exec` is supported (runs a shell command); other types show an error
 - **Works everywhere** — CLI, Telegram, Discord, Slack, WhatsApp, Signal, Email, Home Assistant

-## Gateway Streaming
-
-Enable progressive token delivery on messaging platforms. When streaming is enabled, responses appear character-by-character in Telegram, Discord, and Slack via message editing, rather than waiting for the full response.
-
-```yaml
-streaming:
-  enabled: false              # Enable streaming token delivery (default: off)
-  transport: edit             # "edit" (progressive message editing) or "off"
-  edit_interval: 0.3          # Min seconds between message edits
-  buffer_threshold: 40        # Characters accumulated before forcing an edit
-  cursor: " ▉"               # Cursor character shown during streaming
-```
-
-**Platform support:** Telegram, Discord, and Slack support edit-based streaming. Platforms that don't support message editing (Signal, Email, Home Assistant) are auto-detected on the first attempt — streaming is gracefully disabled for that session with no flood of messages.
-
-**Overflow handling:** If the streamed text exceeds the platform's message length limit (~4096 chars), the current message is finalized and a new one starts automatically.
-
 ## Human Delay

 Simulate human-like response pacing in messaging platforms:
@@ -1383,27 +1350,6 @@ code_execution:
  max_tool_calls: 50           # Max tool calls within code execution
 ```

-## Web Search Backends
-
-The `web_search`, `web_extract`, and `web_crawl` tools support three backend providers. Configure the backend in `config.yaml` or via `hermes tools`:
-
-```yaml
-web:
-  backend: firecrawl    # firecrawl | parallel | tavily
-```
-
-| Backend | Env Var | Search | Extract | Crawl |
-|---------|---------|--------|---------|-------|
-| **Firecrawl** (default) | `FIRECRAWL_API_KEY` | ✔ | ✔ | ✔ |
-| **Parallel** | `PARALLEL_API_KEY` | ✔ | ✔ | — |
-| **Tavily** | `TAVILY_API_KEY` | ✔ | ✔ | ✔ |
-
-**Backend selection:** If `web.backend` is not set, the backend is auto-detected from available API keys. If only `TAVILY_API_KEY` is set, Tavily is used. If only `PARALLEL_API_KEY` is set, Parallel is used. Otherwise Firecrawl is the default.
-
-**Self-hosted Firecrawl:** Set `FIRECRAWL_API_URL` to point at your own instance. When a custom URL is set, the API key becomes optional (set `USE_DB_AUTHENTICATION=false` on the server to disable auth).
-
-**Parallel search modes:** Set `PARALLEL_SEARCH_MODE` to control search behavior — `fast`, `one-shot`, or `agentic` (default: `agentic`).
-
 ## Browser

 Configure browser automation behavior:
--- a/website/docs/user-guide/features/api-server.md
+++ b/website/docs/user-guide/features/api-server.md
@@ -231,6 +231,6 @@ Any frontend that supports the OpenAI API format works. Tested/documented integr

 ## Limitations

-- **Response storage** — stored responses (for `previous_response_id`) are persisted in SQLite and survive gateway restarts. Max 100 stored responses (LRU eviction).
+- **Response storage is in-memory** — stored responses (for `previous_response_id`) are lost on gateway restart. Max 100 stored responses (LRU eviction).
 - **No file upload** — vision/document analysis via uploaded files is not yet supported through the API.
 - **Model field is cosmetic** — the `model` field in requests is accepted but the actual LLM model used is configured server-side in config.yaml.
--- a/website/docs/user-guide/features/context-references.md
+++ b/website/docs/user-guide/features/context-references.md
@@ -1,109 +0,0 @@
----
-sidebar_position: 9
-title: "Context References"
-description: "Inline @-syntax for attaching files, folders, git diffs, and URLs directly into your messages"
----
-
-# Context References
-
-Type `@` followed by a reference to inject content directly into your message. Hermes expands the reference inline and appends the content under an `--- Attached Context ---` section.
-
-## Supported References
-
-| Syntax | Description |
-|--------|-------------|
-| `@file:path/to/file.py` | Inject file contents |
-| `@file:path/to/file.py:10-25` | Inject specific line range (1-indexed, inclusive) |
-| `@folder:path/to/dir` | Inject directory tree listing with file metadata |
-| `@diff` | Inject `git diff` (unstaged working tree changes) |
-| `@staged` | Inject `git diff --staged` (staged changes) |
-| `@git:5` | Inject last N commits with patches (max 10) |
-| `@url:https://example.com` | Fetch and inject web page content |
-
-## Usage Examples
-
-```text
-Review @file:src/main.py and suggest improvements
-
-What changed? @diff
-
-Compare @file:old_config.yaml and @file:new_config.yaml
-
-What's in @folder:src/components?
-
-Summarize this article @url:https://arxiv.org/abs/2301.00001
-```
-
-Multiple references work in a single message:
-
-```text
-Check @file:main.py, and also @file:test.py.
-```
-
-Trailing punctuation (`,`, `.`, `;`, `!`, `?`) is automatically stripped from reference values.
-
-## CLI Tab Completion
-
-In the interactive CLI, typing `@` triggers autocomplete:
-
-- `@` shows all reference types (`@diff`, `@staged`, `@file:`, `@folder:`, `@git:`, `@url:`)
-- `@file:` and `@folder:` trigger filesystem path completion with file size metadata
-- Bare `@` followed by partial text shows matching files and folders from the current directory
-
-## Line Ranges
-
-The `@file:` reference supports line ranges for precise content injection:
-
-```text
-@file:src/main.py:42        # Single line 42
-@file:src/main.py:10-25     # Lines 10 through 25 (inclusive)
-```
-
-Lines are 1-indexed. Invalid ranges are silently ignored (full file is returned).
-
-## Size Limits
-
-Context references are bounded to prevent overwhelming the model's context window:
-
-| Threshold | Value | Behavior |
-|-----------|-------|----------|
-| Soft limit | 25% of context length | Warning appended, expansion proceeds |
-| Hard limit | 50% of context length | Expansion refused, original message returned unchanged |
-| Folder entries | 200 files max | Excess entries replaced with `- ...` |
-| Git commits | 10 max | `@git:N` clamped to range [1, 10] |
-
-## Security
-
-### Sensitive Path Blocking
-
-These paths are always blocked from `@file:` references to prevent credential exposure:
-
-- SSH keys and config: `~/.ssh/id_rsa`, `~/.ssh/id_ed25519`, `~/.ssh/authorized_keys`, `~/.ssh/config`
-- Shell profiles: `~/.bashrc`, `~/.zshrc`, `~/.profile`, `~/.bash_profile`, `~/.zprofile`
-- Credential files: `~/.netrc`, `~/.pgpass`, `~/.npmrc`, `~/.pypirc`
-- Hermes env: `$HERMES_HOME/.env`
-
-These directories are fully blocked (any file inside):
-- `~/.ssh/`, `~/.aws/`, `~/.gnupg/`, `~/.kube/`, `$HERMES_HOME/skills/.hub/`
-
-### Path Traversal Protection
-
-All paths are resolved relative to the working directory. References that resolve outside the allowed workspace root are rejected.
-
-### Binary File Detection
-
-Binary files are detected via MIME type and null-byte scanning. Known text extensions (`.py`, `.md`, `.json`, `.yaml`, `.toml`, `.js`, `.ts`, etc.) bypass MIME-based detection. Binary files are rejected with a warning.
-
-## Error Handling
-
-Invalid references produce inline warnings rather than failures:
-
-| Condition | Behavior |
-|-----------|----------|
-| File not found | Warning: "file not found" |
-| Binary file | Warning: "binary files are not supported" |
-| Folder not found | Warning: "folder not found" |
-| Git command fails | Warning with git stderr |
-| URL returns no content | Warning: "no content extracted" |
-| Sensitive path | Warning: "path is a sensitive credential file" |
-| Path outside workspace | Warning: "path is outside the allowed workspace" |
--- a/website/docs/user-guide/features/hooks.md
+++ b/website/docs/user-guide/features/hooks.md
@@ -6,20 +6,9 @@ description: "Run custom code at key lifecycle points — log activity, send ale

 # Event Hooks

-Hermes has two hook systems that run custom code at key lifecycle points:
+The hooks system lets you run custom code at key points in the agent lifecycle — session creation, slash commands, each tool-calling step, and more. Hooks fire automatically during gateway operation without blocking the main agent pipeline.

-| System | Registered via | Runs in | Use case |
-|--------|---------------|---------|----------|
-| **[Gateway hooks](#gateway-event-hooks)** | `HOOK.yaml` + `handler.py` in `~/.hermes/hooks/` | Gateway only | Logging, alerts, webhooks |
-| **[Plugin hooks](#plugin-hooks)** | `ctx.register_hook()` in a [plugin](/docs/user-guide/features/plugins) | CLI + Gateway | Tool interception, metrics, guardrails |
-
-Both systems are non-blocking — errors in any hook are caught and logged, never crashing the agent.
-
-## Gateway Event Hooks
-
-Gateway hooks fire automatically during gateway operation (Telegram, Discord, Slack, WhatsApp) without blocking the main agent pipeline.
-
-### Creating a Hook
+## Creating a Hook

 Each hook is a directory under `~/.hermes/hooks/` containing two files:

@@ -30,7 +19,7 @@ Each hook is a directory under `~/.hermes/hooks/` containing two files:
    └── handler.py     # Python handler function
 ```

-#### HOOK.yaml
+### HOOK.yaml

 ```yaml
 name: my-hook
@@ -43,7 +32,7 @@ events:

 The `events` list determines which events trigger your handler. You can subscribe to any combination of events, including wildcards like `command:*`.

-#### handler.py
+### handler.py

 ```python
 import json
@@ -69,26 +58,25 @@ async def handle(event_type: str, context: dict):
 - Can be `async def` or regular `def` — both work
 - Errors are caught and logged, never crashing the agent

-### Available Events
+## Available Events

 | Event | When it fires | Context keys |
 |-------|---------------|--------------|
 | `gateway:startup` | Gateway process starts | `platforms` (list of active platform names) |
 | `session:start` | New messaging session created | `platform`, `user_id`, `session_id`, `session_key` |
-| `session:end` | Session ended (before reset) | `platform`, `user_id`, `session_key` |
 | `session:reset` | User ran `/new` or `/reset` | `platform`, `user_id`, `session_key` |
 | `agent:start` | Agent begins processing a message | `platform`, `user_id`, `session_id`, `message` |
 | `agent:step` | Each iteration of the tool-calling loop | `platform`, `user_id`, `session_id`, `iteration`, `tool_names` |
 | `agent:end` | Agent finishes processing | `platform`, `user_id`, `session_id`, `message`, `response` |
 | `command:*` | Any slash command executed | `platform`, `user_id`, `command`, `args` |

-#### Wildcard Matching
+### Wildcard Matching

 Handlers registered for `command:*` fire for any `command:` event (`command:model`, `command:reset`, etc.). Monitor all slash commands with a single subscription.

-### Examples
+## Examples

-#### Telegram Alert on Long Tasks
+### Telegram Alert on Long Tasks

 Send yourself a message when the agent takes more than 10 steps:

@@ -121,7 +109,7 @@ async def handle(event_type: str, context: dict):
            )
 ```

-#### Command Usage Logger
+### Command Usage Logger

 Track which slash commands are used:

@@ -154,7 +142,7 @@ def handle(event_type: str, context: dict):
        f.write(json.dumps(entry) + "\n")
 ```

-#### Session Start Webhook
+### Session Start Webhook

 POST to an external service on new sessions:

@@ -181,7 +169,7 @@ async def handle(event_type: str, context: dict):
        }, timeout=5)
 ```

-### How It Works
+## How It Works

 1. On gateway startup, `HookRegistry.discover_and_load()` scans `~/.hermes/hooks/`
 2. Each subdirectory with `HOOK.yaml` + `handler.py` is loaded dynamically
@@ -190,51 +178,5 @@ async def handle(event_type: str, context: dict):
 5. Errors in any handler are caught and logged — a broken hook never crashes the agent

 :::info
-Gateway hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not load gateway hooks. For hooks that work everywhere, use [plugin hooks](#plugin-hooks).
+Hooks only fire in the **gateway** (Telegram, Discord, Slack, WhatsApp). The CLI does not currently load hooks.
 :::
-
-## Plugin Hooks
-
-[Plugins](/docs/user-guide/features/plugins) can register hooks that fire in **both CLI and gateway** sessions. These are registered programmatically via `ctx.register_hook()` in your plugin's `register()` function.
-
-```python
-def register(ctx):
-    ctx.register_hook("pre_tool_call", my_callback)
-    ctx.register_hook("post_tool_call", my_callback)
-```
-
-### Available Plugin Hooks
-
-| Hook | Fires when | Callback receives |
-|------|-----------|-------------------|
-| `pre_tool_call` | Before any tool executes | `tool_name`, `args`, `task_id` |
-| `post_tool_call` | After any tool returns | `tool_name`, `args`, `result`, `task_id` |
-| `pre_llm_call` | Before LLM API request | *(planned — not yet wired)* |
-| `post_llm_call` | After LLM API response | *(planned — not yet wired)* |
-| `on_session_start` | Session begins | *(planned — not yet wired)* |
-| `on_session_end` | Session ends | *(planned — not yet wired)* |
-
-Callbacks receive keyword arguments matching the columns above:
-
-```python
-def my_callback(**kwargs):
-    tool = kwargs["tool_name"]
-    args = kwargs["args"]
-    # ...
-```
-
-### Example: Block Dangerous Tools
-
-```python
-# ~/.hermes/plugins/tool-guard/__init__.py
-BLOCKED = {"terminal", "write_file"}
-
-def guard(**kwargs):
-    if kwargs["tool_name"] in BLOCKED:
-        print(f"⚠ Blocked tool call: {kwargs['tool_name']}")
-
-def register(ctx):
-    ctx.register_hook("pre_tool_call", guard)
-```
-
-See the **[Plugins guide](/docs/user-guide/features/plugins)** for full details on creating plugins.
--- a/website/docs/user-guide/features/plugins.md
+++ b/website/docs/user-guide/features/plugins.md
@@ -46,16 +46,14 @@ Project-local plugins under `./.hermes/plugins/` are disabled by default. Enable

 ## Available hooks

-Plugins can register callbacks for these lifecycle events. See the **[Event Hooks page](/docs/user-guide/features/hooks#plugin-hooks)** for full details, callback signatures, and examples.
-
 | Hook | Fires when |
 |------|-----------|
 | `pre_tool_call` | Before any tool executes |
 | `post_tool_call` | After any tool returns |
-| `pre_llm_call` | Before LLM API request *(planned)* |
-| `post_llm_call` | After LLM API response *(planned)* |
-| `on_session_start` | Session begins *(planned)* |
-| `on_session_end` | Session ends *(planned)* |
+| `pre_llm_call` | Before LLM API request |
+| `post_llm_call` | After LLM API response |
+| `on_session_start` | Session begins |
+| `on_session_end` | Session ends |

 ## Slash commands

--- a/website/docs/user-guide/security.md
+++ b/website/docs/user-guide/security.md
@@ -358,42 +358,6 @@ When a blocked URL is requested, the tool returns an error explaining the domain

 See [Website Blocklist](/docs/user-guide/configuration#website-blocklist) in the configuration guide for full details.

-### SSRF Protection
-
-All URL-capable tools (web search, web extract, vision, browser) validate URLs before fetching them to prevent Server-Side Request Forgery (SSRF) attacks. Blocked addresses include:
-
-- **Private networks** (RFC 1918): `10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`
-- **Loopback**: `127.0.0.0/8`, `::1`
-- **Link-local**: `169.254.0.0/16` (includes cloud metadata at `169.254.169.254`)
-- **CGNAT / shared address space** (RFC 6598): `100.64.0.0/10` (Tailscale, WireGuard VPNs)
-- **Cloud metadata hostnames**: `metadata.google.internal`, `metadata.goog`
-- **Reserved, multicast, and unspecified addresses**
-
-SSRF protection is always active and cannot be disabled. DNS failures are treated as blocked (fail-closed). Redirect chains are re-validated at each hop to prevent redirect-based bypasses.
-
-### Tirith Pre-Exec Security Scanning
-
-Hermes integrates [tirith](https://github.com/sheeki03/tirith) for content-level command scanning before execution. Tirith detects threats that pattern matching alone misses:
-
-- Homograph URL spoofing (internationalized domain attacks)
-- Pipe-to-interpreter patterns (`curl | bash`, `wget | sh`)
-- Terminal injection attacks
-
-Tirith auto-installs from GitHub releases on first use with SHA-256 checksum verification (and cosign provenance verification if cosign is available).
-
-```yaml
-# In ~/.hermes/config.yaml
-security:
-  tirith_enabled: true       # Enable/disable tirith scanning (default: true)
-  tirith_path: "tirith"      # Path to tirith binary (default: PATH lookup)
-  tirith_timeout: 5          # Subprocess timeout in seconds
-  tirith_fail_open: true     # Allow execution when tirith is unavailable (default: true)
-```
-
-When `tirith_fail_open` is `true` (default), commands proceed if tirith is not installed or times out. Set to `false` in high-security environments to block commands when tirith is unavailable.
-
-Tirith's verdict integrates with the approval flow: safe commands pass through, suspicious commands trigger user approval, and dangerous commands are blocked.
-
 ### Context File Injection Protection

 Context files (AGENTS.md, .cursorrules, SOUL.md) are scanned for prompt injection before being included in the system prompt. The scanner checks for:
--- a/website/docs/user-guide/sessions.md
+++ b/website/docs/user-guide/sessions.md
@@ -114,13 +114,7 @@ Session IDs follow the format `YYYYMMDD_HHMMSS_<8-char-hex>`, e.g. `20250305_091

 Give sessions human-readable titles so you can find and resume them easily.

-### Auto-Generated Titles
-
-Hermes automatically generates a short descriptive title (3–7 words) for each session after the first exchange. This runs in a background thread using a fast auxiliary model, so it adds no latency. You'll see auto-generated titles when browsing sessions with `hermes sessions list` or `hermes sessions browse`.
-
-Auto-titling only fires once per session and is skipped if you've already set a title manually.
-
-### Setting a Title Manually
+### Setting a Title

 Use the `/title` slash command inside any chat session (CLI or gateway):