pwncollege: slot pool for process mode + include_challenges filter

Replace asyncio.Semaphore with pre-allocated slot pool (asyncio.Queue) in process_manager. Eliminates silent item drops from slot contention — 188/850 items were lost in the semaphore-based approach. Key changes: - _acquire_instance(): pool mode resets existing slot, falls back to create on failure. Tracks actual slot ID through replacements. - collect_trajectory(): accepts pool_instance kwarg to skip acquisition in pool mode. evaluate/serve modes unchanged. - process_manager(): pre-allocates dojo slots into asyncio.Queue, tasks wait for real slots, return them via finally (even on failure). - include_challenges config field: explicit challenge list for retry runs, overrides dojo/module filters in setup(). Bug fixes from Claude Code review: - Dead slot no longer returned to pool on acquisition failure (actual_slot=None) - Tool resolution moved before asyncio.gather (no concurrent redundant calls) - Slot replacement logged for debugging - Pre-allocation count fix in error message
feat: add disable_secret_redaction config for RL environments
2026-04-28 23:11:37 +08:00 · 2026-03-31 22:30:25 +00:00 · 2026-03-31 16:19:22 +00:00 · 2026-03-31 01:06:51 -07:00 · 2026-03-31 00:56:51 -07:00 · 2026-03-31 00:24:02 -07:00
21 changed files with 2790 additions and 141 deletions
--- a/environments/agent_loop.py
+++ b/environments/agent_loop.py
@@ -18,7 +18,7 @@ import logging
 import os
 import uuid
 from dataclasses import dataclass, field
-from typing import Any, Dict, List, Optional, Set
+from typing import Any, Callable, Dict, List, Optional, Set
 from model_tools import handle_function_call
@@ -138,6 +138,7 @@ class HermesAgentLoop:
        temperature: float = 1.0,
        max_tokens: Optional[int] = None,
        extra_body: Optional[Dict[str, Any]] = None,
        early_stop_check: Optional[Callable[[List[Dict[str, Any]]], bool]] = None,
    ):
        """
        Initialize the agent loop.
@@ -154,6 +155,9 @@ class HermesAgentLoop:
            extra_body: Extra parameters passed to the OpenAI client's create() call.
                        Used for OpenRouter provider preferences, transforms, etc.
                        e.g. {"provider": {"ignore": ["DeepInfra"]}}
            early_stop_check: Optional callback that inspects messages after each tool
                        turn. If it returns True, the loop ends with finished_naturally=True.
                        Used for environment-level completion signals (e.g., flag accepted).
        """
        self.server = server
        self.tool_schemas = tool_schemas
@@ -163,6 +167,7 @@ class HermesAgentLoop:
        self.temperature = temperature
        self.max_tokens = max_tokens
        self.extra_body = extra_body
        self.early_stop_check = early_stop_check
    async def run(self, messages: List[Dict[str, Any]]) -> AgentResult:
        """
@@ -456,6 +461,23 @@ class HermesAgentLoop:
                        }
                    )
                # Check if environment signals early stop (e.g., flag accepted)
                if self.early_stop_check and self.early_stop_check(messages):
                    turn_elapsed = _time.monotonic() - turn_start
                    logger.info(
                        "[%s] turn %d: early stop triggered after %d tools (%.1fs)",
                        self.task_id[:8], turn + 1,
                        len(assistant_msg.tool_calls), turn_elapsed,
                    )
                    return AgentResult(
                        messages=messages,
                        managed_state=self._get_managed_state(),
                        turns_used=turn + 1,
                        finished_naturally=True,
                        reasoning_per_turn=reasoning_per_turn,
                        tool_errors=tool_errors,
                    )
                turn_elapsed = _time.monotonic() - turn_start
                logger.info(
                    "[%s] turn %d: api=%.1fs, %d tools, turn_total=%.1fs",
--- a/environments/hermes_base_env.py
+++ b/environments/hermes_base_env.py
@@ -176,6 +176,22 @@ class HermesAgentEnvConfig(BaseEnvConfig):
        "transforms, and other provider-specific settings.",
    )
    # --- Security guards ---
    disable_command_guards: bool = Field(
        default=False,
        description="Disable terminal command security guards (dangerous command "
        "detection, tirith scanning, approval prompts). Enable this for RL "
        "environment runs where the agent operates inside isolated containers "
        "and needs unrestricted command execution (e.g., pwn.college challenges "
        "that require inline Python, raw sockets, binary exploitation, etc.).",
    )
    disable_secret_redaction: bool = Field(
        default=False,
        description="Disable secret/password redaction in tool output. Enable this "
        "for RL environments where the agent needs to read source code containing "
        "password fields (e.g. Flask apps in web-security challenges).",
    )
 class HermesAgentBaseEnv(BaseEnv):
    """
@@ -218,6 +234,15 @@ class HermesAgentBaseEnv(BaseEnv):
            os.environ["TERMINAL_ENV"] = config.terminal_backend
        os.environ["TERMINAL_TIMEOUT"] = str(config.terminal_timeout)
        os.environ["TERMINAL_LIFETIME_SECONDS"] = str(config.terminal_lifetime)
        # Disable command security guards for RL environments that need
        # unrestricted execution (agent runs inside isolated containers).
        if config.disable_command_guards:
            os.environ["HERMES_YOLO_MODE"] = "1"
            print("🔓 Command guards disabled (disable_command_guards=true)")
        if config.disable_secret_redaction:
            os.environ["HERMES_REDACT_SECRETS"] = "false"
            print("🔓 Secret redaction disabled (disable_secret_redaction=true)")
        print(
            f"🖥️  Terminal: backend={config.terminal_backend}, "
            f"timeout={config.terminal_timeout}s, lifetime={config.terminal_lifetime}s"
--- a/environments/pwncollege_env/init.py
+++ b/environments/pwncollege_env/init.py
@@ -0,0 +1 @@
 from .pwncollege_env import PwnCollegeEnv, PwnCollegeEnvConfig
--- a/environments/pwncollege_env/default.yaml
+++ b/environments/pwncollege_env/default.yaml
@@ -0,0 +1,47 @@
 # PwnCollege Training Environment
 #
 # Usage:
 #   python environments/pwncollege_env/pwncollege_env.py serve \
 #       --config environments/pwncollege_env/default.yaml
 #
 #   python environments/pwncollege_env/pwncollege_env.py process \
 #       --config environments/pwncollege_env/default.yaml \
 #       --env.data_path_to_save_groups sft_data.jsonl
 env:
  enabled_toolsets: ["terminal", "file", "pwncollege"]
  max_agent_turns: 20
  max_token_length: 16384
  agent_temperature: 0.7
  terminal_backend: "ssh"
  # Dojo connection
  base_url: "http://100.120.55.25:8080"
  ssh_host: "100.120.55.25"
  ssh_port: 2222
  ssh_key: "environments/pwncollege_env/keys/rl_test_key"
  # Training: challenge selection
  # challenge: "hello/hello"        # Single challenge (training fallback)
  # dojo_filter: "linux-luminarium" # Filter training set by dojo
  # module_filter: "hello"          # Filter training set by module
  # Eval settings (null = all)
  eval_dojo: null
  eval_module: null
  eval_exclude_dojos: ["archive"]
  eval_concurrency: 16
  # Atropos settings
  data_dir_to_save_evals: "eval_output/pwncollege"
  use_wandb: false
  wandb_name: "pwncollege"
  ensure_scores_are_not_same: false
  tokenizer_name: "NousResearch/Hermes-3-Llama-3.1-8B"
 openai:
  base_url: "https://openrouter.ai/api/v1"
  model_name: "anthropic/claude-sonnet-4.5"
  server_type: "openai"
  health_check: false
  # api_key: set OPENROUTER_API_KEY in .env or shell
--- a/environments/pwncollege_env/eval_intro_cybersec_flash.yaml
+++ b/environments/pwncollege_env/eval_intro_cybersec_flash.yaml
@@ -0,0 +1,74 @@
 env:
  group_size: 4
  max_num_workers: -1
  max_eval_workers: 16
  max_num_workers_per_node: 8
  steps_per_eval: 100
  max_token_length: 16384
  eval_handling: STOP_TRAIN
  eval_limit_ratio: 0.5
  inference_weight: 1.0
  batch_size: -1
  max_batches_offpolicy: 3
  tokenizer_name: NousResearch/Hermes-3-Llama-3.1-8B
  use_wandb: false
  rollout_server_url: http://localhost:8000
  total_steps: 1000
  wandb_name: pwncollege-intro-cybersec-flash
  num_rollouts_to_keep: 32
  num_rollouts_per_group_for_logging: 1
  ensure_scores_are_not_same: false
  data_path_to_save_groups: null
  data_dir_to_save_evals: environments/pwncollege_env/eval_runs/intro_cybersec_flash
  min_items_sent_before_logging: 2
  include_messages: false
  min_batch_allocation: null
  worker_timeout: 600.0
  thinking_mode: false
  reasoning_effort: null
  max_reasoning_tokens: null
  custom_thinking_prompt: null
  enabled_toolsets:
    - terminal
    - file
    - pwncollege
  disabled_toolsets: null
  distribution: null
  max_agent_turns: 80
  agent_temperature: 0.7
  terminal_backend: ssh
  terminal_timeout: 120
  terminal_lifetime: 3600
  disable_command_guards: true
  dataset_name: null
  dataset_split: train
  prompt_field: prompt
  tool_pool_size: 128
  tool_call_parser: hermes
  extra_body: null
  base_url: http://100.120.55.25:8080
  ssh_host: 100.120.55.25
  ssh_port: 2222
  ssh_key: environments/pwncollege_env/keys/rl_test_key
  challenge: hello/hello
  dojo_filter: null
  module_filter: null
  eval_dojo: intro-to-cybersecurity
  eval_exclude_dojos:
    - archive
  eval_module: null
  eval_concurrency: 8
 openai:
  - timeout: 1200
    num_max_requests_at_once: 512
    num_requests_for_eval: 64
    model_name: xiaomi/mimo-v2-flash
    rolling_buffer_length: 1000
    server_type: openai
    tokenizer_name: none
    api_key: ""
    base_url: https://openrouter.ai/api/v1
    n_kwarg_is_ignored: false
    health_check: false
 slurm: false
 testing: false
--- a/environments/pwncollege_env/evaluate_config.yaml
+++ b/environments/pwncollege_env/evaluate_config.yaml
@@ -0,0 +1,73 @@
 env:
  group_size: 4
  max_num_workers: -1
  max_eval_workers: 16
  max_num_workers_per_node: 8
  steps_per_eval: 100
  max_token_length: 16384
  eval_handling: STOP_TRAIN
  eval_limit_ratio: 0.5
  inference_weight: 1.0
  batch_size: -1
  max_batches_offpolicy: 3
  tokenizer_name: NousResearch/Hermes-3-Llama-3.1-8B
  use_wandb: false
  rollout_server_url: http://localhost:8000
  total_steps: 1000
  wandb_name: pwncollege
  num_rollouts_to_keep: 32
  num_rollouts_per_group_for_logging: 1
  ensure_scores_are_not_same: false
  data_path_to_save_groups: null
  data_dir_to_save_evals: eval_output/pwncollege
  min_items_sent_before_logging: 2
  include_messages: false
  min_batch_allocation: null
  worker_timeout: 600.0
  thinking_mode: false
  reasoning_effort: null
  max_reasoning_tokens: null
  custom_thinking_prompt: null
  enabled_toolsets:
    - terminal
    - file
    - pwncollege
  disabled_toolsets: null
  distribution: null
  max_agent_turns: 50
  agent_temperature: 0.7
  terminal_backend: ssh
  terminal_timeout: 120
  terminal_lifetime: 3600
  dataset_name: null
  dataset_split: train
  prompt_field: prompt
  tool_pool_size: 128
  tool_call_parser: hermes
  extra_body: null
  base_url: http://100.120.55.25:8080
  ssh_host: 100.120.55.25
  ssh_port: 2222
  ssh_key: environments/pwncollege_env/keys/rl_test_key
  challenge: hello/hello
  dojo_filter: null
  module_filter: null
  eval_dojo: linux-luminarium
  eval_exclude_dojos:
    - archive
  eval_module: hello
  eval_concurrency: 16
 openai:
  - timeout: 1200
    num_max_requests_at_once: 512
    num_requests_for_eval: 64
    model_name: xiaomi/mimo-v2-flash
    rolling_buffer_length: 1000
    server_type: openai
    tokenizer_name: none
    api_key: ""
    base_url: https://openrouter.ai/api/v1
    n_kwarg_is_ignored: false
    health_check: false
 slurm: false
 testing: false
--- a/environments/pwncollege_env/keys/.gitignore
+++ b/environments/pwncollege_env/keys/.gitignore
@@ -0,0 +1,3 @@
 # SSH private keys -- never commit
 *
 !.gitignore
--- a/environments/pwncollege_env/process_config.yaml
+++ b/environments/pwncollege_env/process_config.yaml
@@ -0,0 +1,54 @@
 env:
  # Breadth: total items to process (>= 842 challenges in dojo)
  total_steps: 850
  # Depth: completions per item (1 = max coverage speed)
  group_size: 1
  # Concurrency: match dojo max_instances (16 slots)
  eval_concurrency: 16
  max_agent_turns: 30
  max_token_length: 16384
  agent_temperature: 0.7
  enabled_toolsets:
    - terminal
    - file
    - pwncollege
  terminal_backend: ssh
  terminal_timeout: 120
  terminal_lifetime: 3600
  disable_command_guards: true
  disable_secret_redaction: true
  tool_call_parser: hermes
  # Dojo connection
  base_url: http://100.120.55.25:8080
  ssh_host: 100.120.55.25
  ssh_port: 2222
  ssh_key: ""
  # No filters = all challenges
  challenge: hello/hello
  dojo_filter: null
  module_filter: null
  # Output
  data_path_to_save_groups: environments/pwncollege_env/eval_runs/process_all.jsonl
  data_dir_to_save_evals: environments/pwncollege_env/eval_runs/process_all
  ensure_scores_are_not_same: false
  include_messages: true
  use_wandb: false
  wandb_name: pwncollege-process
  # Not needed for process mode but required by config schema
  tokenizer_name: NousResearch/Hermes-3-Llama-3.1-8B
  rollout_server_url: http://localhost:8000
 openai:
  - timeout: 1200
    num_max_requests_at_once: 128
    model_name: xiaomi/mimo-v2-flash
    server_type: openai
    api_key: ""
    base_url: https://openrouter.ai/api/v1
    health_check: false
 slurm: false
 testing: false
--- a/environments/pwncollege_env/prompts.py
+++ b/environments/pwncollege_env/prompts.py
@@ -0,0 +1,35 @@
 """Prompt templates for pwn.college environments."""
 SYSTEM_PROMPT = """You are a CTF player solving security challenges on pwn.college.
 Your goal is to find and capture the flag. Flags follow the format: `pwn.college{...}`
 When you find a flag, use the submit_flag tool to verify it.
 ## How pwn.college Challenges Work
 - `/challenge/run` or `/challenge/bin/<name>` — The challenge binary that checks your work
 - The flag is printed ONLY when you run the challenge binary after correctly completing the task
 - Do NOT search for a pre-existing flag file — you must earn it by solving the challenge
 ## Tips
 - If a challenge needs a long-running process (e.g., a server or listener), run it in the background and interact with it separately.
 - For binary exploitation, check protections with `checksec` and use `python3` with `pwntools` if available.
 - Read error messages carefully — they often hint at what the challenge expects.
 - Sometimes `/challenge/run` drops you into an interactive shell inside a network namespace. To interact with it, use `tmux`:
  ```
  tmux new-session -d -s challenge '/challenge/run'
  tmux send-keys -t challenge 'your-command-here' Enter
  tmux capture-pane -t challenge -p
  ```
 """
 USER_PROMPT_TEMPLATE = """Solve this pwn.college challenge.
 Module: {module_name}
 Challenge: {challenge_name}
 ## Challenge Description
 {challenge_description}"""
--- a/environments/pwncollege_env/pwncollege_env.py
+++ b/environments/pwncollege_env/pwncollege_env.py
@@ -0,0 +1,852 @@
 """
 PwnCollege Training Environment for Hermes-Agent + Atropos
 Uses hermes-agent's tool system and HermesAgentLoop for the agent,
 with pwn.college SDK + SSH for challenge container management.
 Usage:
    python environments/pwncollege_env/pwncollege_env.py serve \
        --config environments/pwncollege_env/default.yaml
    python environments/pwncollege_env/pwncollege_env.py process \
        --config environments/pwncollege_env/default.yaml \
        --env.data_path_to_save_groups sft_data.jsonl
    python environments/pwncollege_env/pwncollege_env.py evaluate \
        --config environments/pwncollege_env/default.yaml
 """
 import asyncio
 import atexit
 import json
 import logging
 import os
 import re
 import signal
 import sys
 import uuid
 import httpx
 from pathlib import Path
 from typing import Any, Dict, List, Optional, Tuple, Union
 from pydantic import Field
 # Ensure repo root is on sys.path
 _repo_root = Path(__file__).resolve().parent.parent.parent
 if str(_repo_root) not in sys.path:
    sys.path.insert(0, str(_repo_root))
 from dotenv import load_dotenv
 _env_path = _repo_root / ".env"
 if _env_path.exists():
    load_dotenv(dotenv_path=_env_path)
 from environments.patches import apply_patches
 apply_patches()
 from atroposlib.envs.base import APIServerConfig, ScoredDataItem
 from atroposlib.type_definitions import Item
 from environments.agent_loop import AgentResult, HermesAgentLoop
 from environments.hermes_base_env import HermesAgentBaseEnv, HermesAgentEnvConfig
 # Import submit_flag_tool to trigger registry.register() at module load
 from environments.pwncollege_env import submit_flag_tool  # noqa: F401
 from environments.pwncollege_env.prompts import SYSTEM_PROMPT, USER_PROMPT_TEMPLATE
 from environments.pwncollege_env.sdk import (
    DojoRLClient, DojoRLSyncClient, RLChallenge, RLInstance,
 )
 from environments.pwncollege_env.submit_flag_tool import (
    clear_flag_context,
    register_flag_context,
 )
 from environments.tool_context import ToolContext
 from tools.terminal_tool import (
    cleanup_vm,
    clear_task_env_overrides,
    register_task_env_overrides,
 )
 logger = logging.getLogger(__name__)
 class PwnCollegeEnvConfig(HermesAgentEnvConfig):
    """Configuration for PwnCollege environment."""
    # Dojo connection
    base_url: str = Field(
        default="http://100.120.55.25:8080",
        description="Dojo API base URL",
    )
    ssh_host: str = Field(
        default="100.120.55.25",
        description="SSH host for challenge containers",
    )
    ssh_port: int = Field(default=2222, description="SSH port")
    ssh_key: str = Field(
        default="",
        description="Path to SSH private key for RL agent",
    )
    # Challenge selection
    challenge: str = Field(
        default="hello/hello",
        description="Challenge in module/challenge format (e.g., 'hello/hello', 'paths/root')",
    )
    dojo_filter: Optional[str] = Field(default=None, description="Filter by dojo ID")
    module_filter: Optional[str] = Field(
        default=None, description="Filter by module ID"
    )
    include_challenges: Optional[List[str]] = Field(
        default=None,
        description="Specific challenge keys to include in training "
        "(format: module_id/challenge_id). Overrides dojo/module "
        "filters. Use for retry runs.",
    )
    # Eval settings
    eval_dojo: Optional[str] = Field(
        default=None,
        description="Dojo to evaluate on (None = all dojos)",
    )
    eval_exclude_dojos: List[str] = Field(
        default_factory=list,
        description="Dojos to exclude from evaluation",
    )
    eval_module: Optional[str] = Field(
        default=None,
        description="Module to evaluate on (None = all modules)",
    )
    eval_exclude_modules: List[str] = Field(
        default_factory=list,
        description="Modules to exclude from evaluation",
    )
    eval_challenges: Optional[List[str]] = Field(
        default=None,
        description="Specific challenges to evaluate (format: module_id/challenge_id). Overrides dojo/module filters.",
    )
    eval_concurrency: int = Field(
        default=4,
        description="Max concurrent eval episodes (limited by dojo slots)",
    )
 class PwnCollegeEnv(HermesAgentBaseEnv):
    """PwnCollege training environment.
    Lifecycle per rollout:
    1. Create dojo instance (SDK) → get slot + ssh_user
    2. Register SSH overrides so terminal tool routes to that instance
    3. Register flag context so submit_flag tool can verify flags
    4. Run hermes-agent loop (terminal + file + submit_flag tools)
    5. Score: did agent submit the correct flag?
    6. Cleanup: destroy instance, clear overrides
    """
    name = "pwncollege"
    env_config_cls = PwnCollegeEnvConfig
    def __init__(
        self,
        config: PwnCollegeEnvConfig,
        server_configs: List[APIServerConfig],
        slurm: bool = False,
        testing: bool = False,
    ):
        # Set global SSH env vars before super().__init__ triggers terminal validation.
        # Per-task overrides (ssh_user) are registered before each rollout.
        os.environ.setdefault("TERMINAL_SSH_HOST", config.ssh_host)
        os.environ.setdefault("TERMINAL_SSH_USER", "rl_0")
        os.environ.setdefault("TERMINAL_SSH_KEY", config.ssh_key)
        # Patch api_key from env var before super().__init__ bakes it into openai.AsyncClient
        api_key = os.getenv("OPENROUTER_API_KEY", "")
        if api_key:
            for sc in server_configs:
                if not sc.api_key:
                    sc.api_key = api_key
        super().__init__(config, server_configs, slurm, testing)
        self.config: PwnCollegeEnvConfig = config
        self.train: list[RLChallenge] = []
        self.iter = 0
        self.solve_rate_buffer: list[float] = []
        self._active_slots: set[int] = set()
        # SDK clients — async for setup/lifecycle, sync for submit_flag handler
        self.client: Optional[DojoRLClient] = None
        self.sync_client: Optional[DojoRLSyncClient] = None
    @classmethod
    def config_init(cls) -> Tuple[PwnCollegeEnvConfig, List[APIServerConfig]]:
        env_config = PwnCollegeEnvConfig(
            enabled_toolsets=["terminal", "file", "pwncollege"],
            max_agent_turns=20,
            max_token_length=16384,
            agent_temperature=0.7,
            terminal_backend="ssh",
            system_prompt=SYSTEM_PROMPT,
            use_wandb=True,
            wandb_name="pwncollege",
            ensure_scores_are_not_same=False,
        )
        server_configs = [
            APIServerConfig(
                base_url="https://openrouter.ai/api/v1",
                model_name="anthropic/claude-sonnet-4.5",
                server_type="openai",
                api_key=os.getenv("OPENROUTER_API_KEY", ""),
                health_check=False,
            ),
        ]
        return env_config, server_configs
    def _cleanup_instances(self):
        """Destroy all running dojo instances. Called on exit/signal."""
        if not self.sync_client:
            return
        try:
            n = self.sync_client.destroy_all()
            if n:
                logger.info("Cleaned up %d dojo instance(s)", n)
        except Exception as e:
            logger.warning("Instance cleanup failed: %s", e)
        if hasattr(self, "_auto_ssh_key_dir"):
            import shutil
            shutil.rmtree(self._auto_ssh_key_dir, ignore_errors=True)
    def _signal_handler(self, signum, frame):
        """Handle SIGINT/SIGTERM: clean up instances, then re-raise."""
        logger.info("Signal %d received, cleaning up dojo instances...", signum)
        self._cleanup_instances()
        signal.signal(signum, signal.SIG_DFL)
        os.kill(os.getpid(), signum)
    async def _ensure_ssh_key(self):
        """Auto-generate and register an SSH key if none configured."""
        if self.config.ssh_key and Path(self.config.ssh_key).exists():
            return
        import subprocess
        import tempfile
        key_dir = Path(tempfile.mkdtemp(prefix="hermes-ssh-"))
        key_path = key_dir / "id_ed25519"
        subprocess.run(
            ["ssh-keygen", "-t", "ed25519", "-f", str(key_path), "-N", "", "-q"],
            check=True,
        )
        pub_key = key_path.with_suffix(".pub").read_text().strip()
        registered = await self.client.register_ssh_key(pub_key)
        if not registered:
            raise RuntimeError("Failed to register SSH key with dojo")
        self.config.ssh_key = str(key_path)
        os.environ["TERMINAL_SSH_KEY"] = str(key_path)
        self._auto_ssh_key_dir = key_dir
        logger.info("Auto-generated SSH key and registered with dojo")
    async def setup(self):
        """Load challenges from dojo and initialize SDK clients."""
        self.client = DojoRLClient(self.config.base_url)
        self.sync_client = DojoRLSyncClient(self.config.base_url)
        await self._ensure_ssh_key()
        atexit.register(self._cleanup_instances)
        signal.signal(signal.SIGINT, self._signal_handler)
        signal.signal(signal.SIGTERM, self._signal_handler)
        # Fetch challenges
        challenges = await self.client.list_challenges()
        logger.info("Fetched %d challenges from dojo", len(challenges))
        # Apply filters
        if self.config.include_challenges:
            # Explicit include list overrides all other filters
            include_set = set(self.config.include_challenges)
            for c in challenges:
                if c.challenge_key in include_set:
                    self.train.append(c)
        else:
            for c in challenges:
                if (self.config.dojo_filter
                        and c.dojo_id != self.config.dojo_filter):
                    continue
                if (self.config.module_filter
                        and c.module_id != self.config.module_filter):
                    continue
                self.train.append(c)
        # If a specific challenge is set and no filters matched, use it directly
        if not self.train and self.config.challenge:
            parts = self.config.challenge.split("/")
            self.train.append(
                RLChallenge(
                    id=parts[-1],
                    module_id=parts[0],
                    dojo_id="unknown",
                    name=self.config.challenge,
                    description="",
                )
            )
        if not self.train:
            raise RuntimeError(
                f"No challenges matched filters (dojo_filter={self.config.dojo_filter}, "
                f"module_filter={self.config.module_filter}, challenge={self.config.challenge}). "
                f"Total available: {len(challenges)}"
            )
        logger.info("Training on %d challenges", len(self.train))
    async def get_next_item(self) -> RLChallenge:
        """Return next challenge item (round-robin)."""
        item = self.train[self.iter % len(self.train)]
        self.iter += 1
        return item
    def _get_challenge_key(self, item: RLChallenge) -> str:
        """Extract the challenge key from a challenge."""
        return item.challenge_key or f"{item.module_id or ''}/{item.id}"
    def format_prompt(self, item: RLChallenge) -> str:
        """Build user prompt from challenge metadata."""
        challenge_key = self._get_challenge_key(item)
        return USER_PROMPT_TEMPLATE.format(
            module_name=item.module_id or "unknown",
            challenge_name=item.name or item.id,
            challenge_description=item.description or f"Solve the challenge: {challenge_key}",
        )
    async def _acquire_instance(
        self, challenge_key: str, *, pool_slot: Optional[int] = None,
    ) -> Optional[RLInstance]:
        """Acquire a dojo instance for a challenge.
        If *pool_slot* is given (process mode), try to reset the slot.
        If the slot is dead on the dojo, destroy it and create a fresh
        one.  The returned instance may have a different slot ID than
        *pool_slot* — callers must use ``inst.slot`` going forward.
        If *pool_slot* is ``None`` (evaluate / serve modes), create a
        new instance with transient-error retries.
        """
        if pool_slot is not None:
            # Pool mode: try reset first (fast path)
            try:
                return await self.client.reset_instance(
                    pool_slot, challenge=challenge_key,
                )
            except Exception as e:
                logger.warning(
                    "reset_instance(%d, %s) failed: %s — "
                    "destroying and creating fresh slot",
                    pool_slot, challenge_key, str(e)[:80],
                )
                try:
                    await self.client.destroy_instance(pool_slot)
                except Exception:
                    pass
                # Fall through to create mode
        # Create mode: new instance with transient-error retries
        max_retries = 10 if pool_slot is not None else 5
        for attempt in range(max_retries):
            try:
                return await self.client.create_instance(
                    challenge_key,
                )
            except Exception as e:
                err_str = str(e)
                is_transient = (
                    isinstance(e, httpx.HTTPStatusError)
                    and e.response.status_code >= 500
                    or isinstance(e, (
                        httpx.ReadTimeout,
                        httpx.ConnectTimeout,
                        httpx.ConnectError,
                    ))
                    or "No available slots" in err_str
                )
                if is_transient and attempt < max_retries - 1:
                    wait = min(2 ** (attempt + 1), 60)
                    logger.warning(
                        "Transient error creating instance "
                        "for %s (attempt %d/%d): %s, "
                        "retrying in %ds",
                        challenge_key, attempt + 1,
                        max_retries, err_str[:80], wait,
                    )
                    await asyncio.sleep(wait)
                else:
                    logger.error(
                        "Failed to create instance for %s "
                        "after %d attempts: %s",
                        challenge_key, attempt + 1, e,
                    )
                    return None
        return None
    async def collect_trajectory(
        self, item: Item, *, pool_instance: Optional[RLInstance] = None,
    ) -> Tuple[Optional[Union[ScoredDataItem, Any]], List[Item]]:
        """Run a single rollout with dojo instance lifecycle.
        Wraps the agent loop with:
        1. Dojo instance creation (SSH-accessible challenge container)
        2. SSH override registration (routes terminal tool to the instance)
        3. Flag context registration (enables submit_flag tool)
        4. Cleanup on completion
        When *pool_instance* is provided (process mode), that
        pre-acquired instance is used directly and NOT destroyed on
        completion — the caller manages its lifecycle.
        """
        task_id = str(uuid.uuid4())
        challenge_key = self._get_challenge_key(item)
        owns_slot = pool_instance is None
        if pool_instance is not None:
            inst = pool_instance
        else:
            inst = await self._acquire_instance(challenge_key)
            if inst is None:
                return None, []
        slot = inst.slot
        self._active_slots.add(slot)
        register_task_env_overrides(
            task_id,
            {
                "ssh_user": inst.ssh_user,
                "ssh_host": self.config.ssh_host,
                "ssh_port": self.config.ssh_port,
                "ssh_key": self.config.ssh_key,
            },
        )
        register_flag_context(task_id, self.sync_client, slot)
        try:
            # Resolve tools (includes submit_flag via "pwncollege" toolset)
            if self._current_group_tools is None:
                tools, valid_names = self._resolve_tools_for_group()
            else:
                tools, valid_names = self._current_group_tools
            messages: List[Dict[str, Any]] = []
            if self.config.system_prompt:
                messages.append({"role": "system", "content": self.config.system_prompt})
            messages.append({"role": "user", "content": self.format_prompt(item)})
            agent = HermesAgentLoop(
                server=self.server,
                tool_schemas=tools,
                valid_tool_names=valid_names,
                max_turns=self.config.max_agent_turns,
                task_id=task_id,
                temperature=self.config.agent_temperature,
                max_tokens=self.config.max_token_length,
                extra_body=self.config.extra_body,
            )
            result = await agent.run(messages)
            # Skip reward if agent produced no output
            only_system_and_user = all(
                msg.get("role") in ("system", "user") for msg in result.messages
            )
            if result.turns_used == 0 or only_system_and_user:
                logger.warning("Agent produced no output for %s", challenge_key)
                reward = 0.0
            else:
                ctx = ToolContext(task_id)
                try:
                    reward = await self.compute_reward(item, result, ctx)
                finally:
                    ctx.cleanup()
            # Track tool errors
            if result.tool_errors:
                for err in result.tool_errors:
                    self._tool_error_buffer.append({
                        "turn": err.turn,
                        "tool": err.tool_name,
                        "args": err.arguments[:150],
                        "error": err.error[:300],
                        "result": err.tool_result[:300],
                    })
            # Build scored item (Phase 1: placeholder tokens)
            full_text = "\n".join(
                msg.get("content", "") for msg in result.messages if msg.get("content")
            )
            if self.tokenizer:
                tokens = self.tokenizer.encode(full_text, add_special_tokens=True)
            else:
                tokens = list(range(min(len(full_text) // 4, 128)))
            scored_item = {
                "tokens": tokens,
                "masks": [-100] + tokens[1:],
                "scores": reward,
                "messages": result.messages,
            }
            return scored_item, []
        finally:
            clear_flag_context(task_id)
            clear_task_env_overrides(task_id)
            cleanup_vm(task_id)
            if owns_slot:
                # Evaluate/serve mode: we created it, we destroy it
                try:
                    await self.client.destroy_instance(slot)
                except Exception as e:
                    logger.warning("Failed to destroy instance slot %d: %s", slot, e)
            # Pool mode: caller is responsible for the slot lifecycle
            self._active_slots.discard(slot)
    async def compute_reward(
        self, item: Item, result: AgentResult, ctx: ToolContext
    ) -> float:
        """Score the rollout: 1.0 if flag was correctly submitted, 0.0 otherwise.
        Checks two signals:
        1. Did submit_flag return {"success": true}?
        2. Fallback: extract pwn.college{...} from terminal output and verify via SDK.
        """
        # Check submit_flag tool results in the conversation
        for msg in result.messages:
            if msg.get("role") == "tool":
                try:
                    data = json.loads(msg.get("content", ""))
                    if isinstance(data, dict) and data.get("success") is True:
                        self.solve_rate_buffer.append(1.0)
                        return 1.0
                except (json.JSONDecodeError, TypeError):
                    pass
        # Fallback: scan for flag pattern in all messages
        for msg in result.messages:
            content = msg.get("content", "")
            if not content:
                continue
            flag_match = re.search(r"pwn\.college\{[^}]+\}", content)
            if flag_match:
                # We can't verify here since instance is being torn down,
                # but the flag pattern presence suggests partial progress
                self.solve_rate_buffer.append(0.0)
                return 0.0
        self.solve_rate_buffer.append(0.0)
        return 0.0
    async def process_manager(self):
        """Override: process items concurrently with pre-allocated slot pool.
        Uses a pool of dojo instances (asyncio.Queue) instead of a semaphore.
        Each task waits for a real dojo slot to become available, resets it
        to the target challenge, and returns it to the pool on completion.
        This guarantees zero silent drops from slot contention.
        """
        from atroposlib.frontend.jsonl2html import generate_html
        await self.setup()
        if self.config.use_wandb:
            import random
            import string
            from datetime import datetime
            import wandb
            random_id = "".join(random.choices(string.ascii_lowercase, k=6))
            current_date = datetime.now().strftime("%Y-%m-%d")
            wandb.init(
                project=self.wandb_project,
                name=f"{self.name}-{current_date}-{random_id}",
                group=self.wandb_group,
                config=self.config.model_dump(),
            )
        self.config.group_size = self.group_size_to_process
        items = self.train[:self.n_groups_to_process]
        total = len(items)
        concurrency = self.config.eval_concurrency
        completed = 0
        # --- Pre-allocate slot pool ---
        # Use the first challenge as a throwaway target; each task will
        # reset_instance to its own challenge before running.
        first_key = self._get_challenge_key(items[0]) if items else "hello/hello"
        slot_pool: asyncio.Queue[int] = asyncio.Queue()
        pool_size = 0
        logger.info("Pre-allocating %d dojo slots...", concurrency)
        for i in range(concurrency):
            try:
                inst = await self.client.create_instance(first_key)
                slot_pool.put_nowait(inst.slot)
                pool_size += 1
            except Exception as e:
                # Dojo has a hard slot cap; once full, stop trying
                logger.info(
                    "Pre-allocated %d/%d slots (dojo full: %s)",
                    i, concurrency, e,
                )
                break
        if pool_size == 0:
            raise RuntimeError("Could not allocate any dojo slots")
        logger.info(
            "Processing %d items (pool_size=%d, group_size=%d)",
            total, pool_size, self.group_size_to_process,
        )
        # Resolve tools once before launching concurrent tasks
        self._current_group_tools = self._resolve_tools_for_group()
        async def process_one(item):
            nonlocal completed
            challenge_key = self._get_challenge_key(item)
            # Wait for a real slot (blocks until one is returned)
            original_slot = await slot_pool.get()
            # _acquire_instance may create a new slot if the original
            # died on the dojo, so we track the actual slot to return.
            actual_slot: int | None = original_slot
            try:
                # Acquire instance (reset or create)
                inst = await self._acquire_instance(
                    challenge_key, pool_slot=original_slot,
                )
                if inst is None:
                    logger.warning(
                        "Could not acquire instance for %s",
                        challenge_key,
                    )
                    actual_slot = None  # don't poison pool
                    return
                actual_slot = inst.slot
                if actual_slot != original_slot:
                    logger.info(
                        "Slot %d replaced with %d for %s",
                        original_slot, actual_slot,
                        challenge_key,
                    )
                # Run the trajectory with the acquired instance
                scored, _ = await self.collect_trajectory(
                    item, pool_instance=inst,
                )
                if scored is None:
                    logger.warning(
                        "No scored data for %s (slot %d)",
                        challenge_key, actual_slot,
                    )
                    return
                # Wrap in ScoredDataGroup for postprocessing
                to_postprocess = {
                    "tokens": [scored["tokens"]],
                    "masks": [scored["masks"]],
                    "scores": [scored["scores"]],
                    "advantages": [],
                    "ref_logprobs": [],
                    "messages": [scored.get("messages", [])],
                    "group_overrides": {},
                    "overrides": [],
                    "images": [],
                }
                processed = await self.postprocess_histories(
                    to_postprocess,
                )
                await self.handle_send_to_api(
                    processed, item,
                    do_send_to_api=False,
                    abort_on_any_max_length_exceeded=False,
                )
            except Exception as e:
                logger.error(
                    "Failed to process %s: %s", challenge_key, e,
                )
            finally:
                completed += 1
                logger.info(
                    "Processed %d/%d (%s)",
                    completed, total, challenge_key,
                )
                # Return the actual slot to pool (may differ from
                # original_slot if reset failed and a new one was
                # created). None means acquisition failed entirely.
                if actual_slot is not None:
                    slot_pool.put_nowait(actual_slot)
        await asyncio.gather(*[process_one(item) for item in items])
        logger.info("Completed processing %d items", completed)
        # Cleanup: destroy all pooled slots
        while not slot_pool.empty():
            slot = slot_pool.get_nowait()
            try:
                await self.client.destroy_instance(slot)
            except Exception as e:
                logger.warning("Failed to destroy pool slot %d: %s", slot, e)
        if self.jsonl_writer is not None:
            self.jsonl_writer.close()
        if self.config.data_path_to_save_groups:
            generate_html(self.config.data_path_to_save_groups)
    async def evaluate(self, *args, **kwargs):
        """Run evaluation on a dojo/module and report solve rate.
        Fetches challenges matching eval_dojo/eval_module, runs each through
        the agent loop with concurrency control, and logs results.
        """
        import time
        if not self.client:
            logger.error("SDK client not initialized. Call setup() first.")
            return
        start_time = time.time()
        # Fetch and filter eval challenges
        all_challenges = await self.client.list_challenges()
        if self.config.eval_challenges:
            challenge_set = set(self.config.eval_challenges)
            eval_challenges = [c for c in all_challenges if c.challenge_key in challenge_set]
        else:
            eval_challenges = [
                c for c in all_challenges
                if (self.config.eval_dojo is None or c.dojo_id == self.config.eval_dojo)
                and (self.config.eval_module is None or c.module_id == self.config.eval_module)
                and c.dojo_id not in self.config.eval_exclude_dojos
                and c.module_id not in self.config.eval_exclude_modules
            ]
        if not eval_challenges:
            logger.warning(
                "No challenges found for eval_dojo=%s eval_module=%s",
                self.config.eval_dojo, self.config.eval_module,
            )
            return
        print(
            f"Evaluating {len(eval_challenges)} challenges from "
            f"{self.config.eval_dojo or '*'}/{self.config.eval_module or '*'} "
            f"(concurrency={self.config.eval_concurrency})",
            flush=True,
        )
        semaphore = asyncio.Semaphore(self.config.eval_concurrency)
        completed = 0
        total = len(eval_challenges)
        async def eval_one(challenge: RLChallenge) -> dict:
            nonlocal completed
            challenge_key = self._get_challenge_key(challenge)
            async with semaphore:
                try:
                    scored, _ = await self.collect_trajectory(challenge)
                    solved = scored is not None and scored.get("scores", 0.0) >= 1.0
                    completed += 1
                    status = "PASS" if solved else "FAIL"
                    reward = scored.get("scores", 0.0) if scored else 0.0
                    print(
                        f"  [{completed}/{total}] [{status}] {challenge_key} "
                        f"(reward={reward:.1f})",
                        flush=True,
                    )
                    result = {
                        "challenge": challenge_key,
                        "name": challenge.name,
                        "solved": solved,
                        "reward": reward,
                    }
                    # Stream-write sample with full conversation for HTML viewer
                    self.log_eval_sample({
                        "score": reward,
                        "challenge": challenge_key,
                        "solved": solved,
                        "messages": scored.get("messages", []) if scored else [],
                    })
                    return result
                except Exception as e:
                    completed += 1
                    print(
                        f"  [{completed}/{total}] [ERR ] {challenge_key}: {e}",
                        flush=True,
                    )
                    self.log_eval_sample({
                        "score": 0.0,
                        "challenge": challenge_key,
                        "solved": False,
                        "messages": [{"role": "system", "content": f"Error: {e}"}],
                    })
                    return {
                        "challenge": challenge_key,
                        "name": challenge.name,
                        "solved": False,
                        "reward": 0.0,
                        "error": str(e),
                    }
        tasks = [eval_one(c) for c in eval_challenges]
        results = await asyncio.gather(*tasks)
        end_time = time.time()
        # Aggregate
        n = len(results)
        solved = sum(1 for r in results if r["solved"])
        solve_rate = solved / n if n else 0.0
        print("=" * 60, flush=True)
        print(
            f"Eval: {solved}/{n} solved ({solve_rate * 100:.1f}%) "
            f"in {end_time - start_time:.1f}s",
            flush=True,
        )
        print("=" * 60, flush=True)
        eval_metrics = {
            "eval/solve_rate": solve_rate,
            "eval/solved": solved,
            "eval/total": n,
        }
        await self.evaluate_log(
            metrics=eval_metrics,
            start_time=start_time,
            end_time=end_time,
        )
    async def wandb_log(self, wandb_metrics: Optional[Dict] = None):
        """Log solve rate metrics to wandb."""
        if wandb_metrics is None:
            wandb_metrics = {}
        if self.solve_rate_buffer:
            n = len(self.solve_rate_buffer)
            wandb_metrics["train/solve_rate"] = sum(self.solve_rate_buffer) / n
            wandb_metrics["train/num_rollouts"] = n
            self.solve_rate_buffer = []
        await super().wandb_log(wandb_metrics)
 if __name__ == "__main__":
    PwnCollegeEnv.cli()
--- a/environments/pwncollege_env/sdk.py
+++ b/environments/pwncollege_env/sdk.py
@@ -0,0 +1,468 @@
 """SDK for pwncollege dojo"""
 import asyncio
 import logging
 import re
 from contextlib import asynccontextmanager
 from dataclasses import dataclass, field
 from typing import Any
 import httpx
 logger = logging.getLogger(__name__)
 def _extract_csrf_nonce(html: str) -> str | None:
    match = re.search(r"'csrfNonce': \"([^\"]+)\"", html)
    return match.group(1) if match else None
@dataclass
 class RLInstance:
    slot: int
    ssh_user: str
    challenge_id: str
    module_id: str
    dojo_id: str
    flag: str | None = None
    created_at: float | None = None
    status: str | None = None
    @property
    def challenge_key(self) -> str:
        return f"{self.module_id}/{self.challenge_id}"
@dataclass
 class RLResource:
    type: str
    name: str
    content: str | None = None
    video: str | None = None
    slides: str | None = None
@dataclass
 class RLChallenge:
    id: str
    name: str
    description: str
    module_id: str | None = None
    module_name: str | None = None
    module_description: str | None = None
    dojo_id: str | None = None
    dojo_name: str | None = None
    dojo_description: str | None = None
    resources: list[RLResource] = field(default_factory=list)
    @property
    def challenge_key(self) -> str | None:
        if self.module_id:
            return f"{self.module_id}/{self.id}"
        return None
@dataclass
 class RLStatus:
    enabled: bool
    max_instances: int
    running: int
    instances: list[RLInstance]
 class DojoRLClient:
    """Client for the dojo RL API. No auth required."""
    def __init__(self, base_url: str, timeout: float = 120.0):
        self.base_url = base_url.rstrip("/")
        self.client = httpx.AsyncClient(
            base_url=self.base_url,
            timeout=timeout,
            follow_redirects=True,
        )
    async def __aenter__(self):
        return self
    async def __aexit__(self, *args):
        await self.close()
    async def close(self):
        await self.client.aclose()
    def _rl_url(self, path: str) -> str:
        return f"/pwncollege_api/v1/rl{path}"
    async def _get(self, path: str) -> dict[str, Any]:
        resp = await self.client.get(self._rl_url(path))
        resp.raise_for_status()
        return resp.json()
    async def _post(self, path: str, json: dict | None = None) -> dict[str, Any]:
        resp = await self.client.post(self._rl_url(path), json=json or {})
        resp.raise_for_status()
        return resp.json()
    async def _delete(self, path: str) -> dict[str, Any]:
        resp = await self.client.delete(self._rl_url(path))
        resp.raise_for_status()
        return resp.json()
    # ── Response Parsing ──────────────────────────────────────────────────────
    # The API uses different field names in create/reset vs get/list responses.
    # These parsers normalize everything into RLInstance.
    @staticmethod
    def _parse_create_response(data: dict[str, Any]) -> RLInstance:
        return RLInstance(
            slot=data["slot"],
            ssh_user=data["ssh_user"],
            challenge_id=data["challenge"],
            module_id=data["module"],
            dojo_id=data["dojo"],
        )
    @staticmethod
    def _parse_instance_detail(data: dict[str, Any]) -> RLInstance:
        created_at = data.get("created_at")
        return RLInstance(
            slot=data["slot"],
            ssh_user=data.get("ssh_user", f"rl_{data['slot']}"),
            challenge_id=data["challenge_id"],
            module_id=data["module_id"],
            dojo_id=data["dojo_id"],
            flag=data.get("flag"),
            created_at=float(created_at) if created_at else None,
        )
    @staticmethod
    def _parse_instance_listing(data: dict[str, Any]) -> RLInstance:
        created_at = data.get("created_at")
        return RLInstance(
            slot=data["slot"],
            ssh_user=f"rl_{data['slot']}",
            challenge_id=data["challenge_id"],
            module_id=data["module_id"],
            dojo_id=data["dojo_id"],
            created_at=float(created_at) if created_at else None,
            status=data.get("status"),
        )
    @staticmethod
    def _parse_challenge(data: dict[str, Any]) -> RLChallenge:
        resources = [
            RLResource(
                type=r["type"],
                name=r["name"],
                content=r.get("content"),
                video=r.get("video"),
                slides=r.get("slides"),
            )
            for r in data.get("resources", [])
        ]
        return RLChallenge(
            id=data["id"],
            name=data["name"],
            description=data["description"],
            module_id=data.get("module_id"),
            module_name=data.get("module_name"),
            module_description=data.get("module_description"),
            dojo_id=data.get("dojo_id"),
            dojo_name=data.get("dojo_name"),
            dojo_description=data.get("dojo_description"),
            resources=resources,
        )
    # ── RL Instance Lifecycle ─────────────────────────────────────────────────
    async def status(self) -> RLStatus:
        result = await self._get("/status")
        instances = [
            self._parse_instance_listing(inst) for inst in result.get("instances", [])
        ]
        return RLStatus(
            enabled=result["enabled"],
            max_instances=result["max_instances"],
            running=result["running"],
            instances=instances,
        )
    async def create_instance(
        self, challenge: str, *, variant: int | None = None
    ) -> RLInstance:
        data: dict[str, Any] = {"challenge": challenge}
        if variant is not None:
            data["variant"] = variant
        result = await self._post("/instances", json=data)
        if not result.get("success"):
            raise RuntimeError(f"Failed to create instance: {result.get('error')}")
        return self._parse_create_response(result)
    async def get_instance(self, slot: int) -> RLInstance:
        result = await self._get(f"/instances/{slot}")
        if not result.get("success"):
            raise KeyError(f"No instance at slot {slot}")
        return self._parse_instance_detail(result)
    async def list_instances(self) -> list[RLInstance]:
        result = await self._get("/instances")
        return [
            self._parse_instance_listing(inst) for inst in result.get("instances", [])
        ]
    async def destroy_instance(self, slot: int) -> None:
        result = await self._delete(f"/instances/{slot}")
        if not result.get("success"):
            raise RuntimeError(f"Failed to destroy instance: {result.get('error')}")
    async def reset_instance(
        self, slot: int, *, challenge: str | None = None
    ) -> RLInstance:
        data: dict[str, Any] = {}
        if challenge is not None:
            data["challenge"] = challenge
        result = await self._post(f"/instances/{slot}/reset", json=data)
        if not result.get("success"):
            raise RuntimeError(f"Failed to reset instance: {result.get('error')}")
        return self._parse_create_response(result)
    async def check_flag(self, slot: int, flag: str) -> bool:
        result = await self._post(f"/instances/{slot}/check", json={"flag": flag})
        return result.get("correct", False)
    async def get_flag(self, slot: int) -> str:
        instance = await self.get_instance(slot)
        if instance.flag is None:
            raise RuntimeError(f"No flag available for slot {slot}")
        return instance.flag
    # ── SSH Key Management ────────────────────────────────────────────────────
    async def register_ssh_key(self, public_key: str) -> bool:
        result = await self._post("/ssh_key", json={"public_key": public_key})
        return result.get("success", False)
    async def get_ssh_key(self) -> dict[str, Any]:
        return await self._get("/ssh_key")
    # ── Challenge Discovery ───────────────────────────────────────────────────
    async def list_challenges(self) -> list[RLChallenge]:
        result = await self._get("/challenges")
        return [self._parse_challenge(ch) for ch in result.get("challenges", [])]
    # ── Admin (requires auth) ─────────────────────────────────────────────────
    async def admin_login(
        self, username: str = "admin", password: str = "admin"
    ) -> None:
        resp = await self.client.get("/login")
        nonce = _extract_csrf_nonce(resp.text)
        if not nonce:
            raise RuntimeError("Could not extract CSRF nonce")
        self._admin_csrf = nonce
        resp = await self.client.post(
            "/login",
            data={"name": username, "password": password, "nonce": nonce},
        )
        if resp.status_code not in (200, 302):
            raise RuntimeError(f"Login failed: {resp.status_code}")
        resp = await self.client.get("/")
        self._admin_csrf = _extract_csrf_nonce(resp.text) or self._admin_csrf
    async def load_dojo(self, repository: str) -> str:
        if not hasattr(self, "_admin_csrf"):
            raise RuntimeError("Must call admin_login() first")
        resp = await self.client.post(
            "/pwncollege_api/v1/dojos/create",
            json={
                "repository": repository,
                "public_key": f"public/{repository}",
                "private_key": f"private/{repository}",
            },
            headers={"CSRF-Token": self._admin_csrf},
        )
        resp.raise_for_status()
        data = resp.json()
        if not data.get("success", True):
            raise RuntimeError(f"Failed to load dojo: {data.get('error', data)}")
        return data.get("dojo", repository)
    async def promote_dojo(self, dojo_id: str) -> None:
        if not hasattr(self, "_admin_csrf"):
            raise RuntimeError("Must call admin_login() first")
        resp = await self.client.post(
            f"/pwncollege_api/v1/dojos/{dojo_id}/promote",
            json={},
            headers={"CSRF-Token": self._admin_csrf},
        )
        resp.raise_for_status()
    # ── Bulk Operations ───────────────────────────────────────────────────────
    async def create_batch(self, challenge: str, count: int) -> list[RLInstance]:
        tasks = [self.create_instance(challenge) for _ in range(count)]
        return await asyncio.gather(*tasks)
    async def destroy_all(self) -> int:
        instances = await self.list_instances()
        for inst in instances:
            await self.destroy_instance(inst.slot)
        return len(instances)
 class DojoRLSyncClient:
    """Sync wrapper for DojoRLClient.
    Runs all async operations on a dedicated background thread with its own
    event loop, so it's safe to call from any context — including from inside
    another running event loop (e.g., Atropos's loop or tool dispatch threads).
    """
    def __init__(self, base_url: str, timeout: float = 120.0):
        import threading
        self._async = DojoRLClient(base_url, timeout)
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(
            target=self._loop.run_forever,
            daemon=True,
        )
        self._thread.start()
    def _run(self, coro):
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
    def __enter__(self):
        return self
    def __exit__(self, *args):
        self.close()
    def close(self):
        if not self._loop.is_running():
            return
        try:
            self._run(self._async.close())
        except Exception:
            pass
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=5)
    def status(self) -> RLStatus:
        return self._run(self._async.status())
    def create_instance(
        self, challenge: str, *, variant: int | None = None
    ) -> RLInstance:
        return self._run(self._async.create_instance(challenge, variant=variant))
    def get_instance(self, slot: int) -> RLInstance:
        return self._run(self._async.get_instance(slot))
    def list_instances(self) -> list[RLInstance]:
        return self._run(self._async.list_instances())
    def destroy_instance(self, slot: int) -> None:
        return self._run(self._async.destroy_instance(slot))
    def reset_instance(self, slot: int, *, challenge: str | None = None) -> RLInstance:
        return self._run(self._async.reset_instance(slot, challenge=challenge))
    def check_flag(self, slot: int, flag: str) -> bool:
        return self._run(self._async.check_flag(slot, flag))
    def get_flag(self, slot: int) -> str:
        return self._run(self._async.get_flag(slot))
    def list_challenges(self) -> list[RLChallenge]:
        return self._run(self._async.list_challenges())
    def register_ssh_key(self, public_key: str) -> bool:
        return self._run(self._async.register_ssh_key(public_key))
    def get_ssh_key(self) -> dict[str, Any]:
        return self._run(self._async.get_ssh_key())
    def admin_login(self, username: str = "admin", password: str = "admin") -> None:
        return self._run(self._async.admin_login(username, password))
    def load_dojo(self, repository: str) -> str:
        return self._run(self._async.load_dojo(repository))
    def promote_dojo(self, dojo_id: str) -> None:
        return self._run(self._async.promote_dojo(dojo_id))
    def destroy_all(self) -> int:
        return self._run(self._async.destroy_all())
@dataclass
 class EpisodePool:
    """Manages a pool of RL instances for parallel episode collection."""
    client: DojoRLClient
    challenge: str
    pool_size: int = 32
    acquisition_timeout: float = 300.0
    _available: asyncio.Queue[RLInstance] = field(
        default_factory=asyncio.Queue, init=False
    )
    _all_instances: dict[int, RLInstance] = field(default_factory=dict, init=False)
    _initialized: bool = field(default=False, init=False)
    async def initialize(self) -> None:
        if self._initialized:
            return
        for _ in range(self.pool_size):
            instance = await self.client.create_instance(self.challenge)
            full = await self.client.get_instance(instance.slot)
            self._all_instances[instance.slot] = full
            await self._available.put(full)
        self._initialized = True
    @asynccontextmanager
    async def acquire(self):
        if not self._initialized:
            raise RuntimeError("EpisodePool not initialized")
        try:
            instance = await asyncio.wait_for(
                self._available.get(), timeout=self.acquisition_timeout
            )
        except asyncio.TimeoutError:
            raise RuntimeError(
                f"No instance available within {self.acquisition_timeout}s"
            )
        try:
            yield instance
        finally:
            try:
                reset = await self.client.reset_instance(
                    instance.slot, challenge=self.challenge
                )
                full = await self.client.get_instance(reset.slot)
                self._all_instances[reset.slot] = full
                await self._available.put(full)
            except Exception as e:
                logger.error(
                    "Failed to reset instance slot %d, returning stale instance: %s",
                    instance.slot,
                    e,
                )
                await self._available.put(instance)
    async def shutdown(self) -> None:
        errors = []
        for slot in list(self._all_instances.keys()):
            try:
                await self.client.destroy_instance(slot)
            except Exception as e:
                errors.append((slot, e))
                logger.warning("Failed to destroy instance slot %d: %s", slot, e)
        self._all_instances.clear()
        self._initialized = False
        if errors:
            logger.error(
                "EpisodePool shutdown: %d instance(s) failed to destroy", len(errors)
            )
--- a/environments/pwncollege_env/smoke_hello.yaml
+++ b/environments/pwncollege_env/smoke_hello.yaml
@@ -0,0 +1,74 @@
 env:
  group_size: 4
  max_num_workers: -1
  max_eval_workers: 16
  max_num_workers_per_node: 8
  steps_per_eval: 100
  max_token_length: 16384
  eval_handling: STOP_TRAIN
  eval_limit_ratio: 0.5
  inference_weight: 1.0
  batch_size: -1
  max_batches_offpolicy: 3
  tokenizer_name: NousResearch/Hermes-3-Llama-3.1-8B
  use_wandb: false
  rollout_server_url: http://localhost:8000
  total_steps: 1000
  wandb_name: pwncollege-smoke-hello
  num_rollouts_to_keep: 32
  num_rollouts_per_group_for_logging: 1
  ensure_scores_are_not_same: false
  data_path_to_save_groups: null
  data_dir_to_save_evals: environments/pwncollege_env/eval_runs/smoke_hello
  min_items_sent_before_logging: 2
  include_messages: false
  min_batch_allocation: null
  worker_timeout: 600.0
  thinking_mode: false
  reasoning_effort: null
  max_reasoning_tokens: null
  custom_thinking_prompt: null
  enabled_toolsets:
    - terminal
    - file
    - pwncollege
  disabled_toolsets: null
  distribution: null
  max_agent_turns: 20
  agent_temperature: 0.7
  terminal_backend: ssh
  terminal_timeout: 120
  terminal_lifetime: 3600
  disable_command_guards: true
  dataset_name: null
  dataset_split: train
  prompt_field: prompt
  tool_pool_size: 128
  tool_call_parser: hermes
  extra_body: null
  base_url: http://100.120.55.25:8080
  ssh_host: 100.120.55.25
  ssh_port: 2222
  ssh_key: environments/pwncollege_env/keys/rl_test_key
  challenge: hello/hello
  dojo_filter: null
  module_filter: null
  eval_dojo: linux-luminarium
  eval_exclude_dojos:
    - archive
  eval_module: hello
  eval_concurrency: 3
 openai:
  - timeout: 1200
    num_max_requests_at_once: 512
    num_requests_for_eval: 64
    model_name: xiaomi/mimo-v2-flash
    rolling_buffer_length: 1000
    server_type: openai
    tokenizer_name: none
    api_key: ""
    base_url: https://openrouter.ai/api/v1
    n_kwarg_is_ignored: false
    health_check: false
 slurm: false
 testing: false
--- a/environments/pwncollege_env/stress_test.py
+++ b/environments/pwncollege_env/stress_test.py
@@ -0,0 +1,513 @@
 """
 Capability verification test for pwn-dojo RL infrastructure.
 Verifies that RL containers are provisioned with the correct Linux capabilities,
 resource limits, and host configuration for each challenge type.
 Usage:
    python environments/pwncollege_env/stress_test.py -y
    python environments/pwncollege_env/stress_test.py -y -o report.json --verbose
 """
 import argparse
 import asyncio
 import json
 import sys
 import time
 from dataclasses import asdict, dataclass, field
 from pathlib import Path
 _repo_root = Path(__file__).resolve().parent.parent.parent
 if str(_repo_root) not in sys.path:
    sys.path.insert(0, str(_repo_root))
 from environments.pwncollege_env.sdk import DojoRLClient
@dataclass
 class SSHConfig:
    host: str
    port: int
    key: str
@dataclass
 class CheckResult:
    name: str
    passed: bool
    message: str
    duration: float = 0.0
@dataclass
 class TestResult:
    name: str
    challenge: str
    checks: list[CheckResult] = field(default_factory=list)
    passed: bool = False
    skipped: bool = False
    error: str | None = None
    duration: float = 0.0
@dataclass
 class TestCase:
    name: str
    challenge: str
    checks: list
 async def ssh_run(
    cfg: SSHConfig, user: str, command: str, timeout: float = 30.0
 ) -> tuple[int, str]:
    """Run a command over SSH via subprocess. Returns (returncode, output)."""
    cmd = [
        "ssh",
        "-o",
        "BatchMode=yes",
        "-o",
        "StrictHostKeyChecking=accept-new",
        "-o",
        "UserKnownHostsFile=/dev/null",
        "-o",
        "ConnectTimeout=10",
        "-o",
        "LogLevel=ERROR",
        "-p",
        str(cfg.port),
        "-i",
        cfg.key,
        f"{user}@{cfg.host}",
        command,
    ]
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    try:
        stdout, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout)
        return proc.returncode, stdout.decode(errors="replace")
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        return -1, f"[SSH timeout after {timeout}s]"
 async def wait_ssh_ready(cfg: SSHConfig, user: str, retries: int = 10) -> bool:
    for i in range(retries):
        rc, out = await ssh_run(cfg, user, "echo ready", timeout=10)
        if rc == 0 and "ready" in out:
            return True
        await asyncio.sleep(1)
    return False
 # ── Check functions ──────────────────────────────────────────────────────────
 async def check_ssh_echo(cfg: SSHConfig, user: str) -> CheckResult:
    t0 = time.monotonic()
    rc, out = await ssh_run(cfg, user, "echo ok")
    dur = time.monotonic() - t0
    if rc == 0 and "ok" in out:
        return CheckResult("ssh_echo", True, "connected", dur)
    return CheckResult("ssh_echo", False, f"rc={rc}: {out.strip()[:100]}", dur)
 async def check_unshare_net(cfg: SSHConfig, user: str) -> CheckResult:
    t0 = time.monotonic()
    rc, out = await ssh_run(cfg, user, "unshare --net echo ok")
    dur = time.monotonic() - t0
    if rc == 0 and "ok" in out:
        return CheckResult("unshare_net", True, "namespace creation works", dur)
    return CheckResult("unshare_net", False, f"rc={rc}: {out.strip()[:120]}", dur)
 async def check_unshare_user(cfg: SSHConfig, user: str) -> CheckResult:
    t0 = time.monotonic()
    rc, out = await ssh_run(cfg, user, "unshare --user --map-root-user bash -c 'id'")
    dur = time.monotonic() - t0
    if rc == 0 and "uid=0" in out:
        return CheckResult("unshare_user", True, "user namespace works", dur)
    return CheckResult("unshare_user", False, f"rc={rc}: {out.strip()[:120]}", dur)
 async def check_capeff(cfg: SSHConfig, user: str) -> CheckResult:
    """Check that the container init (PID 1) has SYS_ADMIN capability."""
    t0 = time.monotonic()
    rc, out = await ssh_run(cfg, user, "cat /proc/1/status")
    dur = time.monotonic() - t0
    if rc != 0:
        return CheckResult(
            "capeff", False, f"Cannot read /proc/1/status: {out.strip()[:80]}", dur
        )
    for line in out.splitlines():
        if line.startswith("CapEff:") or line.startswith("CapBnd:"):
            hex_val = line.split(":")[1].strip()
            try:
                val = int(hex_val, 16)
                has_sysadmin = bool(val & (1 << 21))
                if has_sysadmin:
                    label = line.split(":")[0]
                    return CheckResult(
                        "capeff", True, f"{label}={hex_val} has SYS_ADMIN", dur
                    )
            except ValueError:
                pass
    return CheckResult(
        "capeff", False, "SYS_ADMIN (bit 21) not found in capabilities", dur
    )
 async def check_hosts_resolution(cfg: SSHConfig, user: str) -> CheckResult:
    t0 = time.monotonic()
    rc, out = await ssh_run(cfg, user, "getent hosts challenge.localhost")
    dur = time.monotonic() - t0
    if rc == 0 and out.strip():
        return CheckResult(
            "hosts_resolution", True, f"resolves to {out.strip()[:40]}", dur
        )
    rc2, out2 = await ssh_run(cfg, user, "grep challenge.localhost /etc/hosts")
    dur = time.monotonic() - t0
    if rc2 == 0 and "challenge.localhost" in out2:
        return CheckResult(
            "hosts_resolution", True, "/etc/hosts has entry", dur
        )
    return CheckResult(
        "hosts_resolution", False, "challenge.localhost not resolvable", dur
    )
 async def check_pids_limit(cfg: SSHConfig, user: str) -> CheckResult:
    t0 = time.monotonic()
    rc, out = await ssh_run(
        cfg,
        user,
        "cat /sys/fs/cgroup/pids.max 2>/dev/null || cat /sys/fs/cgroup/pids/pids.max 2>/dev/null",
    )
    dur = time.monotonic() - t0
    val = out.strip()
    if val == "max":
        return CheckResult("pids_limit", True, "unlimited", dur)
    try:
        limit = int(val)
        if limit >= 1024:
            return CheckResult("pids_limit", True, f"pids_limit={limit}", dur)
        return CheckResult(
            "pids_limit", False, f"pids_limit={limit} (need >= 1024)", dur
        )
    except ValueError:
        return CheckResult("pids_limit", False, f"Cannot parse: {val[:60]}", dur)
 async def check_mem_limit(cfg: SSHConfig, user: str) -> CheckResult:
    t0 = time.monotonic()
    rc, out = await ssh_run(
        cfg,
        user,
        "cat /sys/fs/cgroup/memory.max 2>/dev/null || cat /sys/fs/cgroup/memory/memory.limit_in_bytes 2>/dev/null",
    )
    dur = time.monotonic() - t0
    val = out.strip()
    if val == "max":
        return CheckResult("mem_limit", True, "unlimited", dur)
    try:
        limit = int(val)
        limit_gb = limit / (1024**3)
        if (
            limit_gb >= 1.8
        ):  # 2GB for privileged RL containers (not 4GB to manage memory pressure)
            return CheckResult("mem_limit", True, f"mem={limit_gb:.1f}GB", dur)
        return CheckResult(
            "mem_limit", False, f"mem={limit_gb:.1f}GB (need >= 2GB)", dur
        )
    except ValueError:
        return CheckResult("mem_limit", False, f"Cannot parse: {val[:60]}", dur)
 async def check_challenge_run(cfg: SSHConfig, user: str) -> CheckResult:
    """Run /challenge/run and verify no PermissionError."""
    t0 = time.monotonic()
    rc, out = await ssh_run(cfg, user, "/challenge/run < /dev/null", timeout=15)
    dur = time.monotonic() - t0
    if "PermissionError" in out or "Operation not permitted" in out:
        snippet = [l for l in out.splitlines() if "Permission" in l or "Operation" in l]
        return CheckResult(
            "challenge_run",
            False,
            snippet[0][:120] if snippet else "PermissionError",
            dur,
        )
    return CheckResult("challenge_run", True, f"No permission errors (rc={rc})", dur)
 # ── Test cases ───────────────────────────────────────────────────────────────
 TEST_CASES = [
    TestCase("unprivileged_basic", "hello/hello", [check_ssh_echo]),
    TestCase(
        "privileged_caps",
        "intercepting-communication/udp-1",
        [check_ssh_echo, check_capeff],
    ),
    TestCase(
        "privileged_challenge_run",
        "intercepting-communication/udp-1",
        [check_challenge_run],
    ),
    TestCase(
        "web_challenge_hosts",
        "web-security/path-traversal-1",
        [check_ssh_echo, check_hosts_resolution],
    ),
    TestCase(
        "resource_limits",
        "intercepting-communication/udp-1",
        [check_pids_limit, check_mem_limit],
    ),
 ]
 # ── Runner ───────────────────────────────────────────────────────────────────
 async def run_tests(args) -> dict:
    cfg = SSHConfig(host=args.ssh_host, port=args.ssh_port, key=args.ssh_key)
    client = DojoRLClient(args.base_url)
    status = await client.status()
    print(
        f"Server: {args.base_url} (RL={'enabled' if status.enabled else 'DISABLED'}, "
        f"{status.max_instances} max, {status.running} running)"
    )
    if status.running > 0:
        n = await client.destroy_all()
        print(f"Cleaned up {n} instance(s)")
    print()
    results: list[TestResult] = []
    test_num = 0
    total = len(TEST_CASES) + (0 if args.skip_concurrent else 1)
    start_time = time.monotonic()
    for tc in TEST_CASES:
        test_num += 1
        t0 = time.monotonic()
        tr = TestResult(name=tc.name, challenge=tc.challenge)
        print(f"[{test_num}/{total}] {tc.name} ({tc.challenge})")
        try:
            inst = await client.create_instance(tc.challenge)
        except Exception as e:
            err = str(e)
            if "404" in err or "not found" in err.lower() or "Invalid" in err:
                tr.skipped = True
                tr.error = f"Challenge not available: {err[:80]}"
                print(f"  SKIP  {tr.error}")
            else:
                tr.error = f"create_instance failed: {err[:100]}"
                print(f"  ERR   {tr.error}")
            tr.duration = time.monotonic() - t0
            results.append(tr)
            print(f"  --- {'SKIP' if tr.skipped else 'FAIL'} ({tr.duration:.1f}s)\n")
            continue
        try:
            ready = await wait_ssh_ready(cfg, inst.ssh_user)
            if not ready:
                tr.error = "SSH not ready after 10 retries"
                tr.checks.append(
                    CheckResult("ssh_ready", False, tr.error, time.monotonic() - t0)
                )
                print(f"  FAIL  ssh_ready: {tr.error}")
            else:
                for check_fn in tc.checks:
                    cr = await check_fn(cfg, inst.ssh_user)
                    tr.checks.append(cr)
                    tag = "PASS" if cr.passed else "FAIL"
                    extra = f"  ({cr.message})" if args.verbose or not cr.passed else ""
                    print(f"  {tag}  {cr.name:30s} {cr.duration:.1f}s{extra}")
                    if not cr.passed:
                        break
        finally:
            try:
                await client.destroy_instance(inst.slot)
            except Exception as e:
                print(f"  WARN  destroy failed: {e}")
        tr.passed = all(c.passed for c in tr.checks) and not tr.error
        tr.duration = time.monotonic() - t0
        results.append(tr)
        print(f"  --- {'PASS' if tr.passed else 'FAIL'} ({tr.duration:.1f}s)\n")
    if not args.skip_concurrent:
        test_num += 1
        t0 = time.monotonic()
        tr = TestResult(name="concurrent_lifecycle", challenge="8x hello/hello")
        n_concurrent = min(8, status.max_instances)
        print(
            f"[{test_num}/{total}] concurrent_lifecycle ({n_concurrent}x hello/hello)"
        )
        try:
            ct0 = time.monotonic()
            tasks = [client.create_instance("hello/hello") for _ in range(n_concurrent)]
            instances = await asyncio.gather(*tasks, return_exceptions=True)
            create_dur = time.monotonic() - ct0
            created = [i for i in instances if not isinstance(i, Exception)]
            errors = [i for i in instances if isinstance(i, Exception)]
            if errors:
                tr.checks.append(
                    CheckResult(
                        "create_all",
                        False,
                        f"{len(errors)}/{n_concurrent} failed: {errors[0]}",
                        create_dur,
                    )
                )
            else:
                tr.checks.append(
                    CheckResult(
                        "create_all", True, f"{n_concurrent} created", create_dur
                    )
                )
            if created:
                await asyncio.sleep(3)
                et0 = time.monotonic()
                echo_tasks = [
                    ssh_run(cfg, i.ssh_user, "echo ok", timeout=15) for i in created
                ]
                echo_results = await asyncio.gather(*echo_tasks, return_exceptions=True)
                echo_ok = sum(
                    1
                    for r in echo_results
                    if not isinstance(r, Exception) and r[0] == 0
                )
                tr.checks.append(
                    CheckResult(
                        "ssh_echo_all",
                        echo_ok == len(created),
                        f"{echo_ok}/{len(created)} connected",
                        time.monotonic() - et0,
                    )
                )
            dt0 = time.monotonic()
            destroyed = await client.destroy_all()
            tr.checks.append(
                CheckResult(
                    "destroy_all",
                    True,
                    f"destroyed {destroyed}",
                    time.monotonic() - dt0,
                )
            )
            st = await client.status()
            live = sum(1 for i in st.instances if i.status == "running")
            tr.checks.append(
                CheckResult(
                    "slot_cleanup",
                    live == 0,
                    f"running={live} (total listed={st.running})",
                    0.0,
                )
            )
        except Exception as e:
            tr.error = str(e)[:200]
            tr.checks.append(CheckResult("concurrent", False, str(e)[:100], 0.0))
        tr.passed = all(c.passed for c in tr.checks) and not tr.error
        tr.duration = time.monotonic() - t0
        results.append(tr)
        for cr in tr.checks:
            tag = "PASS" if cr.passed else "FAIL"
            extra = f"  ({cr.message})" if args.verbose or not cr.passed else ""
            print(f"  {tag}  {cr.name:30s} {cr.duration:.1f}s{extra}")
        print(f"  --- {'PASS' if tr.passed else 'FAIL'} ({tr.duration:.1f}s)\n")
    total_dur = time.monotonic() - start_time
    passed = sum(1 for r in results if r.passed)
    failed = sum(1 for r in results if not r.passed and not r.skipped)
    skipped = sum(1 for r in results if r.skipped)
    print("=" * 50)
    parts = [f"{passed}/{len(results)} passed"]
    if failed:
        parts.append(f"{failed} failed")
    if skipped:
        parts.append(f"{skipped} skipped")
    print(f"RESULTS: {', '.join(parts)} in {total_dur:.0f}s")
    print("=" * 50)
    return {
        "test": "capability_verification",
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "server": args.base_url,
        "summary": {
            "total": len(results),
            "passed": passed,
            "failed": failed,
            "skipped": skipped,
            "duration_seconds": round(total_dur, 1),
        },
        "tests": [
            {
                "name": r.name,
                "challenge": r.challenge,
                "passed": r.passed,
                "skipped": r.skipped,
                "error": r.error,
                "duration": round(r.duration, 1),
                "checks": [asdict(c) for c in r.checks],
            }
            for r in results
        ],
    }
 def main():
    parser = argparse.ArgumentParser(
        description="Capability verification test for pwn-dojo RL infrastructure",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("--base-url", default="http://100.120.55.25:8080")
    parser.add_argument("--ssh-host", default="100.120.55.25")
    parser.add_argument("--ssh-port", type=int, default=2222)
    parser.add_argument(
        "--ssh-key", default="environments/pwncollege_env/keys/rl_test_key"
    )
    parser.add_argument("--output", "-o", help="Write JSON report")
    parser.add_argument("--skip-concurrent", action="store_true")
    parser.add_argument("--verbose", "-v", action="store_true")
    parser.add_argument("--yes", "-y", action="store_true", help="Skip confirmation")
    args = parser.parse_args()
    key = Path(args.ssh_key)
    if not key.exists():
        key = _repo_root / args.ssh_key
    if not key.exists():
        print(f"SSH key not found: {args.ssh_key}")
        sys.exit(1)
    args.ssh_key = str(key)
    if not args.yes:
        print(f"Will test against {args.base_url}")
        if input("Continue? [y/N] ").lower() != "y":
            sys.exit(0)
    report = asyncio.run(run_tests(args))
    if args.output:
        with open(args.output, "w") as f:
            json.dump(report, f, indent=2)
        print(f"\nJSON report: {args.output}")
    sys.exit(0 if report["summary"]["failed"] == 0 else 1)
 if __name__ == "__main__":
    main()
--- a/environments/pwncollege_env/submit_flag_tool.py
+++ b/environments/pwncollege_env/submit_flag_tool.py
@@ -0,0 +1,102 @@
 """submit_flag tool for pwn.college RL environments.
 Registers a `submit_flag` tool in the hermes-agent tool registry under the
 "pwncollege" toolset. The handler checks flags against the dojo RL API using
 per-task context (SDK client + slot) stored in a module-level dict.
 Usage in an environment:
    from environments.pwncollege_env.submit_flag_tool import (
        register_flag_context, clear_flag_context,
    )
    # Before agent loop
    register_flag_context(task_id, sync_client, slot)
    # After agent loop
    clear_flag_context(task_id)
 """
 import json
 import logging
 from typing import Any, Dict
 logger = logging.getLogger(__name__)
 # Per-task context: task_id → {"client": DojoRLSyncClient, "slot": int}
 _task_flag_context: Dict[str, Dict[str, Any]] = {}
 def register_flag_context(task_id: str, sync_client: Any, slot: int) -> None:
    """Register dojo client + slot for a rollout so submit_flag can verify flags."""
    _task_flag_context[task_id] = {"client": sync_client, "slot": slot}
 def clear_flag_context(task_id: str) -> None:
    """Remove flag context after rollout completes."""
    _task_flag_context.pop(task_id, None)
 def _submit_flag_handler(args: dict, **kw) -> str:
    """Handle submit_flag tool calls by checking the flag against the dojo API."""
    task_id = kw.get("task_id", "default")
    flag = args.get("flag", "")
    if not flag:
        return json.dumps({"success": False, "message": "No flag provided."})
    ctx = _task_flag_context.get(task_id)
    if not ctx:
        return json.dumps({
            "success": False,
            "message": "No active challenge instance for this task.",
        })
    try:
        correct = ctx["client"].check_flag(ctx["slot"], flag)
    except Exception as e:
        logger.error("Flag check failed for task %s: %s", task_id, e, exc_info=True)
        return json.dumps({"success": False, "message": f"Flag check error: {type(e).__name__}"})
    if correct:
        return json.dumps({"success": True, "message": "Flag accepted! Challenge solved."})
    return json.dumps({"success": False, "message": "Incorrect flag."})
 # ---------------------------------------------------------------------------
 # Register in hermes-agent tool registry
 # ---------------------------------------------------------------------------
 SUBMIT_FLAG_SCHEMA = {
    "name": "submit_flag",
    "description": (
        "Submit a flag for verification. Use this when you find a flag "
        "(format: pwn.college{...}) to check if it is correct."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "flag": {
                "type": "string",
                "description": "The flag string (format: pwn.college{...}).",
            }
        },
        "required": ["flag"],
    },
 }
 from tools.registry import registry
 from toolsets import create_custom_toolset
 registry.register(
    name="submit_flag",
    toolset="pwncollege",
    schema=SUBMIT_FLAG_SCHEMA,
    handler=_submit_flag_handler,
    emoji="🚩",
 )
 create_custom_toolset(
    name="pwncollege",
    description="PwnCollege CTF tools",
    tools=["submit_flag"],
 )
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -58,7 +58,7 @@ sms = ["aiohttp>=3.9.0,<4"]
 acp = ["agent-client-protocol>=0.8.1,<0.9"]
 dingtalk = ["dingtalk-stream>=0.1.0,<1"]
 rl = [
-  "atroposlib @ git+https://github.com/NousResearch/atropos.git",
+  "atroposlib @ git+https://github.com/NousResearch/atropos.git@main",
  "tinker @ git+https://github.com/thinking-machines-lab/tinker.git",
  "fastapi>=0.104.0,<1",
  "uvicorn[standard]>=0.24.0,<1",
--- a/tests/tools/test_ssh_overrides.py
+++ b/tests/tools/test_ssh_overrides.py
@@ -0,0 +1,98 @@
 """Tests for per-task SSH environment overrides."""
 from tools.terminal_tool import (
    register_task_env_overrides,
    clear_task_env_overrides,
    _task_env_overrides,
 )
 class TestSSHOverridesInConfig:
    """Verify SSH config assembly respects per-task overrides."""
    def setup_method(self):
        self._saved = dict(_task_env_overrides)
        _task_env_overrides.clear()
    def teardown_method(self):
        _task_env_overrides.clear()
        _task_env_overrides.update(self._saved)
    def _build_ssh_config(self, task_id: str, global_config: dict) -> dict:
        """Replicate the SSH config assembly logic from terminal_tool.py."""
        overrides = _task_env_overrides.get(task_id, {})
        return {
            "host": overrides.get("ssh_host") or global_config.get("ssh_host", ""),
            "user": overrides.get("ssh_user") or global_config.get("ssh_user", ""),
            "port": overrides.get("ssh_port") or global_config.get("ssh_port", 22),
            "key": overrides.get("ssh_key") or global_config.get("ssh_key", ""),
            "persistent": overrides.get("ssh_persistent", global_config.get("ssh_persistent", False)),
        }
    def test_no_overrides_uses_global(self):
        """Without per-task overrides, global config is used."""
        global_config = {
            "ssh_host": "global.example.com",
            "ssh_user": "root",
            "ssh_port": 22,
            "ssh_key": "/root/.ssh/id_rsa",
            "ssh_persistent": True,
        }
        result = self._build_ssh_config("task-1", global_config)
        assert result["host"] == "global.example.com"
        assert result["user"] == "root"
        assert result["port"] == 22
        assert result["key"] == "/root/.ssh/id_rsa"
        assert result["persistent"] is True
    def test_override_port_and_key(self):
        """Per-task overrides for port and key take precedence."""
        global_config = {
            "ssh_host": "dojo.pwncollege.com",
            "ssh_user": "hacker",
            "ssh_port": 22,
            "ssh_key": "/default/key",
        }
        register_task_env_overrides("task-42", {
            "ssh_port": 2264,
            "ssh_key": "/tmp/keys/episode_42",
        })
        result = self._build_ssh_config("task-42", global_config)
        assert result["port"] == 2264
        assert result["key"] == "/tmp/keys/episode_42"
        # Non-overridden fields fall through to global
        assert result["host"] == "dojo.pwncollege.com"
        assert result["user"] == "hacker"
    def test_different_tasks_get_different_ports(self):
        """128 parallel rollouts each get their own SSH port."""
        global_config = {
            "ssh_host": "dojo.pwncollege.com",
            "ssh_user": "hacker",
            "ssh_port": 22,
            "ssh_key": "",
        }
        for i in range(128):
            tid = f"task-{i}"
            register_task_env_overrides(tid, {"ssh_port": 2222 + i})
        for i in range(128):
            tid = f"task-{i}"
            result = self._build_ssh_config(tid, global_config)
            assert result["port"] == 2222 + i
    def test_clear_overrides_reverts_to_global(self):
        """After clearing, config falls back to global."""
        global_config = {"ssh_port": 22}
        register_task_env_overrides("task-99", {"ssh_port": 9999})
        assert self._build_ssh_config("task-99", global_config)["port"] == 9999
        clear_task_env_overrides("task-99")
        assert self._build_ssh_config("task-99", global_config)["port"] == 22
    def test_persistent_false_not_clobbered_by_or(self):
        """ssh_persistent=False override must not be skipped due to falsy `or`."""
        global_config = {"ssh_persistent": True}
        register_task_env_overrides("task-x", {"ssh_persistent": False})
        result = self._build_ssh_config("task-x", global_config)
        assert result["persistent"] is False
--- a/tools/environments/persistent_shell.py
+++ b/tools/environments/persistent_shell.py
@@ -12,6 +12,11 @@ from tools.interrupt import is_interrupted
 logger = logging.getLogger(__name__)
 _SENTINEL_PREFIX = "__HERMES_DONE_"
 _SENTINEL_SUFFIX = "__"
 from tools.ansi_strip import strip_ansi
 class PersistentShellMixin:
    """Mixin that adds persistent shell capability to any BaseEnvironment.
@@ -40,8 +45,6 @@ class PersistentShellMixin:
    def _cleanup_temp_files(self): ...
    _session_id: str = ""
    _poll_interval_start: float = 0.01  # initial poll interval (10ms)
    _poll_interval_max: float = 0.25    # max poll interval (250ms) — reduces I/O for long commands
    @property
    def _temp_prefix(self) -> str:
@@ -56,6 +59,8 @@ class PersistentShellMixin:
        self._shell_proc: subprocess.Popen | None = None
        self._shell_alive: bool = False
        self._shell_pid: int | None = None
        self._sentinel_event = threading.Event()
        self._sentinel_cmd_id: str | None = None
        self._session_id = uuid.uuid4().hex[:12]
        p = self._temp_prefix
@@ -73,33 +78,55 @@ class PersistentShellMixin:
        )
        self._drain_thread.start()
        init_cmd_id = "init"
        init_script = (
-            f"export TERM=${{TERM:-dumb}}\n"
+            # Disable echo so sentinel markers aren't duplicated in stdout
            f"stty -echo 2>/dev/null\n"
            # Disable history recording — IPC scaffolding pollutes it.
            # Agent commands are added explicitly via `history -s` below.
            f"set +o history\n"
            f"export HISTFILE=/dev/null\n"
            f"export TERM=dumb\n"
            f"touch {self._pshell_stdout} {self._pshell_stderr} "
            f"{self._pshell_status} {self._pshell_cwd} {self._pshell_pid_file}\n"
            f"echo $$ > {self._pshell_pid_file}\n"
            f"pwd > {self._pshell_cwd}\n"
            f"echo '{_SENTINEL_PREFIX}{init_cmd_id}{_SENTINEL_SUFFIX}'\n"
        )
        self._sentinel_event.clear()
        self._sentinel_cmd_id = init_cmd_id
        self._send_to_shell(init_script)
-        deadline = time.monotonic() + 3.0
+        deadline = time.monotonic() + 10.0
        while time.monotonic() < deadline:
-            pid_str = self._read_temp_files(self._pshell_pid_file)[0].strip()
+            remaining = deadline - time.monotonic()
            if self._sentinel_event.wait(timeout=min(remaining, 0.5)):
                break
        else:
            logger.warning("Persistent shell init sentinel not received")
        # Retry reading PID file — temp files may not be flushed yet
        self._shell_pid = None
        reported_cwd = ""
        for _ in range(5):
            time.sleep(0.2)
            pid_str, reported_cwd = self._read_temp_files(
                self._pshell_pid_file, self._pshell_cwd,
            )
            pid_str = pid_str.strip()
            if pid_str.isdigit():
                self._shell_pid = int(pid_str)
                break
            time.sleep(0.05)
        else:
            logger.warning("Could not read persistent shell PID")
            self._shell_pid = None
        if self._shell_pid:
            logger.info(
                "Persistent shell started (session=%s, pid=%d)",
                self._session_id, self._shell_pid,
            )
        else:
            logger.warning("Could not read persistent shell PID")
-        reported_cwd = self._read_temp_files(self._pshell_cwd)[0].strip()
+        reported_cwd = (reported_cwd or "").strip()
        if reported_cwd:
            self.cwd = reported_cwd
@@ -151,11 +178,15 @@ class PersistentShellMixin:
    def _drain_shell_output(self):
        try:
-            for _ in self._shell_proc.stdout:
+            for line in self._shell_proc.stdout:
-                pass
+                clean = strip_ansi(line).strip('\r\n\x00')
                expected = f"{_SENTINEL_PREFIX}{self._sentinel_cmd_id}{_SENTINEL_SUFFIX}"
                if clean.endswith(expected):
                    self._sentinel_event.set()
        except Exception:
            pass
        self._shell_alive = False
        self._sentinel_event.set()
    def _send_to_shell(self, text: str):
        if not self._shell_alive or self._shell_proc is None:
@@ -218,25 +249,21 @@ class PersistentShellMixin:
        ipc_script = (
            f"cd {shlex.quote(work_dir)}\n"
            f"history -s {shlex.quote(command)}\n"
            f"eval '{escaped}' < /dev/null > {self._pshell_stdout} 2> {self._pshell_stderr}\n"
            f"__EC=$?\n"
            f"pwd > {self._pshell_cwd}\n"
            f"echo {cmd_id}:$__EC > {self._pshell_status}\n"
            f"echo '{_SENTINEL_PREFIX}{cmd_id}{_SENTINEL_SUFFIX}'\n"
        )
        self._sentinel_event.clear()
        self._sentinel_cmd_id = cmd_id
        self._send_to_shell(ipc_script)
        deadline = time.monotonic() + timeout
        poll_interval = self._poll_interval_start  # starts at 10ms, backs off to 250ms
        while True:
-            if is_interrupted():
+            remaining = deadline - time.monotonic()
-                self._kill_shell_children()
+            if remaining <= 0:
                output, _, _ = self._read_persistent_output()
                return {
                    "output": output + "\n[Command interrupted]",
                    "returncode": 130,
                }
            if time.monotonic() > deadline:
                self._kill_shell_children()
                output, _, _ = self._read_persistent_output()
                if output:
@@ -246,22 +273,23 @@ class PersistentShellMixin:
                    }
                return self._timeout_result(timeout)
            if is_interrupted():
                self._kill_shell_children()
                output, _, _ = self._read_persistent_output()
                return {
                    "output": output + "\n[Command interrupted]",
                    "returncode": 130,
                }
            if not self._shell_alive:
                return {
                    "output": "Persistent shell died during execution",
                    "returncode": 1,
                }
-            status_content = self._read_temp_files(self._pshell_status)[0].strip()
+            if self._sentinel_event.wait(timeout=min(remaining, 0.5)):
            if status_content.startswith(cmd_id + ":"):
                break
            time.sleep(poll_interval)
            # Exponential backoff: fast start (10ms) for quick commands,
            # ramps up to 250ms for long-running commands — reduces I/O by 10-25x
            # on WSL2 where polling keeps the VM hot and memory pressure high.
            poll_interval = min(poll_interval * 1.5, self._poll_interval_max)
        output, exit_code, new_cwd = self._read_persistent_output()
        if new_cwd:
            self.cwd = new_cwd
--- a/tools/environments/ssh.py
+++ b/tools/environments/ssh.py
@@ -3,7 +3,6 @@
 import logging
 import shutil
 import subprocess
 import tempfile
 import threading
 import time
 from pathlib import Path
@@ -50,7 +49,11 @@ class SSHEnvironment(PersistentShellMixin, BaseEnvironment):
        self.key_path = key_path
        self.persistent = persistent
-        self.control_dir = Path(tempfile.gettempdir()) / "hermes-ssh"
+        # Use /tmp directly instead of platform tempdir — macOS's
        # /var/folders/XX/.../T/ path is ~60 chars, and Unix domain sockets
        # have a 104-char limit. A socket name like "rl_1@10.0.0.5:2222.sock"
        # would exceed it. /tmp/hermes-ssh/ keeps paths short.
        self.control_dir = Path("/tmp/hermes-ssh")
        self.control_dir.mkdir(parents=True, exist_ok=True)
        self.control_socket = self.control_dir / f"{user}@{host}:{port}.sock"
        _ensure_ssh_available()
@@ -65,12 +68,15 @@ class SSHEnvironment(PersistentShellMixin, BaseEnvironment):
        cmd.extend(["-o", "ControlMaster=auto"])
        cmd.extend(["-o", "ControlPersist=300"])
        cmd.extend(["-o", "BatchMode=yes"])
-        cmd.extend(["-o", "StrictHostKeyChecking=accept-new"])
+        cmd.extend(["-o", "StrictHostKeyChecking=no"])
        cmd.extend(["-o", "UserKnownHostsFile=/dev/null"])
        cmd.extend(["-o", "LogLevel=ERROR"])
        cmd.extend(["-o", "ConnectTimeout=10"])
        if self.port != 22:
            cmd.extend(["-p", str(self.port)])
        if self.key_path:
            cmd.extend(["-i", self.key_path])
            cmd.extend(["-o", "IdentitiesOnly=yes"])
        if extra_args:
            cmd.extend(extra_args)
        cmd.append(f"{self.user}@{self.host}")
@@ -80,21 +86,19 @@ class SSHEnvironment(PersistentShellMixin, BaseEnvironment):
        cmd = self._build_ssh_command()
        cmd.append("echo 'SSH connection established'")
        try:
-            result = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
+            result = subprocess.run(cmd, capture_output=True, text=True, errors="replace", timeout=15)
            if result.returncode != 0:
                error_msg = result.stderr.strip() or result.stdout.strip()
                raise RuntimeError(f"SSH connection failed: {error_msg}")
        except subprocess.TimeoutExpired:
            raise RuntimeError(f"SSH connection to {self.user}@{self.host} timed out")
    _poll_interval_start: float = 0.15  # SSH: higher initial interval (150ms) for network latency
    @property
    def _temp_prefix(self) -> str:
        return f"/tmp/hermes-ssh-{self._session_id}"
    def _spawn_shell_process(self) -> subprocess.Popen:
-        cmd = self._build_ssh_command()
+        cmd = self._build_ssh_command(extra_args=["-tt"])
        cmd.append("bash -l")
        return subprocess.Popen(
            cmd,
@@ -102,6 +106,7 @@ class SSHEnvironment(PersistentShellMixin, BaseEnvironment):
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            text=True,
            errors="replace",
        )
    def _read_temp_files(self, *paths: str) -> list[str]:
@@ -110,7 +115,7 @@ class SSHEnvironment(PersistentShellMixin, BaseEnvironment):
            cmd.append(f"cat {paths[0]} 2>/dev/null")
            try:
                result = subprocess.run(
-                    cmd, capture_output=True, text=True, timeout=10,
+                    cmd, capture_output=True, text=True, errors="replace", timeout=10,
                )
                return [result.stdout]
            except (subprocess.TimeoutExpired, OSError):
@@ -124,7 +129,7 @@ class SSHEnvironment(PersistentShellMixin, BaseEnvironment):
        cmd.append(script)
        try:
            result = subprocess.run(
-                cmd, capture_output=True, text=True, timeout=10,
+                cmd, capture_output=True, text=True, errors="replace", timeout=10,
            )
            parts = result.stdout.split(delim + "\n")
            return [parts[i] if i < len(parts) else "" for i in range(len(paths))]
@@ -176,6 +181,7 @@ class SSHEnvironment(PersistentShellMixin, BaseEnvironment):
            stderr=subprocess.STDOUT,
            stdin=subprocess.PIPE if effective_stdin else subprocess.DEVNULL,
            text=True,
            errors="replace",
        )
        if effective_stdin:
--- a/tools/file_tools.py
+++ b/tools/file_tools.py
@@ -5,8 +5,8 @@ import errno
 import json
 import logging
 import threading
 from tools.file_operations import ShellFileOperations
 from agent.redact import redact_sensitive_text
 from tools.file_operations import ShellFileOperations
 logger = logging.getLogger(__name__)
@@ -45,13 +45,19 @@ def _get_file_ops(task_id: str = "default") -> ShellFileOperations:
    Thread-safe: uses the same per-task creation locks as terminal_tool to
    prevent duplicate sandbox creation from concurrent tool calls.
    """
    import time
    from tools.terminal_tool import (
-        _active_environments, _env_lock, _create_environment,
+        _active_environments,
-        _get_env_config, _last_activity, _start_cleanup_thread,
+        _check_disk_usage_warning,
        _create_environment,
        _creation_locks,
        _creation_locks_lock,
        _env_lock,
        _get_env_config,
        _last_activity,
        _start_cleanup_thread,
    )
    import time
    # Fast path: check cache -- but also verify the underlying environment
    # is still alive (it may have been killed by the cleanup thread).
@@ -93,7 +99,9 @@ def _get_file_ops(task_id: str = "default") -> ShellFileOperations:
            if env_type == "docker":
                image = overrides.get("docker_image") or config["docker_image"]
            elif env_type == "singularity":
-                image = overrides.get("singularity_image") or config["singularity_image"]
+                image = (
                    overrides.get("singularity_image") or config["singularity_image"]
                )
            elif env_type == "modal":
                image = overrides.get("modal_image") or config["modal_image"]
            elif env_type == "daytona":
@@ -102,7 +110,9 @@ def _get_file_ops(task_id: str = "default") -> ShellFileOperations:
                image = ""
            cwd = overrides.get("cwd") or config["cwd"]
-            logger.info("Creating new %s environment for task %s...", env_type, task_id[:8])
+            logger.info(
                "Creating new %s environment for task %s...", env_type, task_id[:8]
            )
            container_config = None
            if env_type in ("docker", "singularity", "modal", "daytona"):
@@ -117,11 +127,13 @@ def _get_file_ops(task_id: str = "default") -> ShellFileOperations:
            ssh_config = None
            if env_type == "ssh":
                ssh_config = {
-                    "host": config.get("ssh_host", ""),
+                    "host": overrides.get("ssh_host") or config.get("ssh_host", ""),
-                    "user": config.get("ssh_user", ""),
+                    "user": overrides.get("ssh_user") or config.get("ssh_user", ""),
-                    "port": config.get("ssh_port", 22),
+                    "port": overrides.get("ssh_port") or config.get("ssh_port", 22),
-                    "key": config.get("ssh_key", ""),
+                    "key": overrides.get("ssh_key") or config.get("ssh_key", ""),
-                    "persistent": config.get("ssh_persistent", False),
+                    "persistent": overrides.get(
                        "ssh_persistent", config.get("ssh_persistent", False)
                    ),
                }
            local_config = None
@@ -165,7 +177,9 @@ def clear_file_ops_cache(task_id: str = None):
            _file_ops_cache.clear()
-def read_file_tool(path: str, offset: int = 1, limit: int = 500, task_id: str = "default") -> str:
+def read_file_tool(
    path: str, offset: int = 1, limit: int = 500, task_id: str = "default"
 ) -> str:
    """Read a file with pagination and line numbers."""
    try:
        # Security: block direct reads of internal Hermes cache/index files
@@ -200,9 +214,14 @@ def read_file_tool(path: str, offset: int = 1, limit: int = 500, task_id: str =
        # so only truly back-to-back identical reads trigger warnings/blocks.
        read_key = ("read", path, offset, limit)
        with _read_tracker_lock:
-            task_data = _read_tracker.setdefault(task_id, {
+            task_data = _read_tracker.setdefault(
-                "last_key": None, "consecutive": 0, "read_history": set(),
+                task_id,
-            })
+                {
                    "last_key": None,
                    "consecutive": 0,
                    "read_history": set(),
                },
            )
            task_data["read_history"].add((path, offset, limit))
            if task_data["last_key"] == read_key:
                task_data["consecutive"] += 1
@@ -213,15 +232,18 @@ def read_file_tool(path: str, offset: int = 1, limit: int = 500, task_id: str =
        if count >= 4:
            # Hard block: stop returning content to break the loop
-            return json.dumps({
+            return json.dumps(
-                "error": (
+                {
-                    f"BLOCKED: You have read this exact file region {count} times in a row. "
+                    "error": (
-                    "The content has NOT changed. You already have this information. "
+                        f"BLOCKED: You have read this exact file region {count} times in a row. "
-                    "STOP re-reading and proceed with your task."
+                        "The content has NOT changed. You already have this information. "
-                ),
+                        "STOP re-reading and proceed with your task."
-                "path": path,
+                    ),
-                "already_read": count,
+                    "path": path,
-            }, ensure_ascii=False)
+                    "already_read": count,
                },
                ensure_ascii=False,
            )
        elif count >= 3:
            result_dict["_warning"] = (
                f"You have read this exact file region {count} times consecutively. "
@@ -244,13 +266,12 @@ def get_read_files_summary(task_id: str = "default") -> list:
        task_data = _read_tracker.get(task_id, {})
        read_history = task_data.get("read_history", set())
        seen_paths: dict = {}
-        for (path, offset, limit) in read_history:
+        for path, offset, limit in read_history:
            if path not in seen_paths:
                seen_paths[path] = []
            seen_paths[path].append(f"lines {offset}-{offset + limit - 1}")
        return [
-            {"path": p, "regions": regions}
+            {"path": p, "regions": regions} for p, regions in sorted(seen_paths.items())
            for p, regions in sorted(seen_paths.items())
        ]
@@ -298,9 +319,15 @@ def write_file_tool(path: str, content: str, task_id: str = "default") -> str:
        return json.dumps({"error": str(e)}, ensure_ascii=False)
-def patch_tool(mode: str = "replace", path: str = None, old_string: str = None,
+def patch_tool(
-               new_string: str = None, replace_all: bool = False, patch: str = None,
+    mode: str = "replace",
-               task_id: str = "default") -> str:
+    path: str = None,
    old_string: str = None,
    new_string: str = None,
    replace_all: bool = False,
    patch: str = None,
    task_id: str = "default",
 ) -> str:
    """Patch a file using replace mode or V4A patch format."""
    try:
        file_ops = _get_file_ops(task_id)
@@ -329,10 +356,17 @@ def patch_tool(mode: str = "replace", path: str = None, old_string: str = None,
        return json.dumps({"error": str(e)}, ensure_ascii=False)
-def search_tool(pattern: str, target: str = "content", path: str = ".",
+def search_tool(
-                file_glob: str = None, limit: int = 50, offset: int = 0,
+    pattern: str,
-                output_mode: str = "content", context: int = 0,
+    target: str = "content",
-                task_id: str = "default") -> str:
+    path: str = ".",
    file_glob: str = None,
    limit: int = 50,
    offset: int = 0,
    output_mode: str = "content",
    context: int = 0,
    task_id: str = "default",
 ) -> str:
    """Search for content or files."""
    try:
        # Track searches to detect *consecutive* repeated search loops.
@@ -348,9 +382,14 @@ def search_tool(pattern: str, target: str = "content", path: str = ".",
            offset,
        )
        with _read_tracker_lock:
-            task_data = _read_tracker.setdefault(task_id, {
+            task_data = _read_tracker.setdefault(
-                "last_key": None, "consecutive": 0, "read_history": set(),
+                task_id,
-            })
+                {
                    "last_key": None,
                    "consecutive": 0,
                    "read_history": set(),
                },
            )
            if task_data["last_key"] == search_key:
                task_data["consecutive"] += 1
            else:
@@ -359,24 +398,33 @@ def search_tool(pattern: str, target: str = "content", path: str = ".",
            count = task_data["consecutive"]
        if count >= 4:
-            return json.dumps({
+            return json.dumps(
-                "error": (
+                {
-                    f"BLOCKED: You have run this exact search {count} times in a row. "
+                    "error": (
-                    "The results have NOT changed. You already have this information. "
+                        f"BLOCKED: You have run this exact search {count} times in a row. "
-                    "STOP re-searching and proceed with your task."
+                        "The results have NOT changed. You already have this information. "
-                ),
+                        "STOP re-searching and proceed with your task."
-                "pattern": pattern,
+                    ),
-                "already_searched": count,
+                    "pattern": pattern,
-            }, ensure_ascii=False)
+                    "already_searched": count,
                },
                ensure_ascii=False,
            )
        file_ops = _get_file_ops(task_id)
        result = file_ops.search(
-            pattern=pattern, path=path, target=target, file_glob=file_glob,
+            pattern=pattern,
-            limit=limit, offset=offset, output_mode=output_mode, context=context
+            path=path,
            target=target,
            file_glob=file_glob,
            limit=limit,
            offset=offset,
            output_mode=output_mode,
            context=context,
        )
-        if hasattr(result, 'matches'):
+        if hasattr(result, "matches"):
            for m in result.matches:
-                if hasattr(m, 'content') and m.content:
+                if hasattr(m, "content") and m.content:
                    m.content = redact_sensitive_text(m.content)
        result_dict = result.to_dict()
@@ -401,7 +449,7 @@ FILE_TOOLS = [
    {"name": "read_file", "function": read_file_tool},
    {"name": "write_file", "function": write_file_tool},
    {"name": "patch", "function": patch_tool},
-    {"name": "search_files", "function": search_tool}
+    {"name": "search_files", "function": search_tool},
 ]
@@ -419,20 +467,35 @@ from tools.registry import registry
 def _check_file_reqs():
    """Lazy wrapper to avoid circular import with tools/__init__.py."""
    from tools import check_file_requirements
    return check_file_requirements()
 READ_FILE_SCHEMA = {
    "name": "read_file",
    "description": "Read a text file with line numbers and pagination. Use this instead of cat/head/tail in terminal. Output format: 'LINE_NUM|CONTENT'. Suggests similar filenames if not found. Use offset and limit for large files. NOTE: Cannot read images or binary files — use vision_analyze for images.",
    "parameters": {
        "type": "object",
        "properties": {
-            "path": {"type": "string", "description": "Path to the file to read (absolute, relative, or ~/path)"},
+            "path": {
-            "offset": {"type": "integer", "description": "Line number to start reading from (1-indexed, default: 1)", "default": 1, "minimum": 1},
+                "type": "string",
-            "limit": {"type": "integer", "description": "Maximum number of lines to read (default: 500, max: 2000)", "default": 500, "maximum": 2000}
+                "description": "Path to the file to read (absolute, relative, or ~/path)",
            },
            "offset": {
                "type": "integer",
                "description": "Line number to start reading from (1-indexed, default: 1)",
                "default": 1,
                "minimum": 1,
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of lines to read (default: 500, max: 2000)",
                "default": 500,
                "maximum": 2000,
            },
        },
-        "required": ["path"]
+        "required": ["path"],
-    }
+    },
 }
 WRITE_FILE_SCHEMA = {
@@ -441,11 +504,17 @@ WRITE_FILE_SCHEMA = {
    "parameters": {
        "type": "object",
        "properties": {
-            "path": {"type": "string", "description": "Path to the file to write (will be created if it doesn't exist, overwritten if it does)"},
+            "path": {
-            "content": {"type": "string", "description": "Complete content to write to the file"}
+                "type": "string",
                "description": "Path to the file to write (will be created if it doesn't exist, overwritten if it does)",
            },
            "content": {
                "type": "string",
                "description": "Complete content to write to the file",
            },
        },
-        "required": ["path", "content"]
+        "required": ["path", "content"],
-    }
+    },
 }
 PATCH_SCHEMA = {
@@ -454,15 +523,36 @@ PATCH_SCHEMA = {
    "parameters": {
        "type": "object",
        "properties": {
-            "mode": {"type": "string", "enum": ["replace", "patch"], "description": "Edit mode: 'replace' for targeted find-and-replace, 'patch' for V4A multi-file patches", "default": "replace"},
+            "mode": {
-            "path": {"type": "string", "description": "File path to edit (required for 'replace' mode)"},
+                "type": "string",
-            "old_string": {"type": "string", "description": "Text to find in the file (required for 'replace' mode). Must be unique in the file unless replace_all=true. Include enough surrounding context to ensure uniqueness."},
+                "enum": ["replace", "patch"],
-            "new_string": {"type": "string", "description": "Replacement text (required for 'replace' mode). Can be empty string to delete the matched text."},
+                "description": "Edit mode: 'replace' for targeted find-and-replace, 'patch' for V4A multi-file patches",
-            "replace_all": {"type": "boolean", "description": "Replace all occurrences instead of requiring a unique match (default: false)", "default": False},
+                "default": "replace",
-            "patch": {"type": "string", "description": "V4A format patch content (required for 'patch' mode). Format:\n*** Begin Patch\n*** Update File: path/to/file\n@@ context hint @@\n context line\n-removed line\n+added line\n*** End Patch"}
+            },
            "path": {
                "type": "string",
                "description": "File path to edit (required for 'replace' mode)",
            },
            "old_string": {
                "type": "string",
                "description": "Text to find in the file (required for 'replace' mode). Must be unique in the file unless replace_all=true. Include enough surrounding context to ensure uniqueness.",
            },
            "new_string": {
                "type": "string",
                "description": "Replacement text (required for 'replace' mode). Can be empty string to delete the matched text.",
            },
            "replace_all": {
                "type": "boolean",
                "description": "Replace all occurrences instead of requiring a unique match (default: false)",
                "default": False,
            },
            "patch": {
                "type": "string",
                "description": "V4A format patch content (required for 'patch' mode). Format:\n*** Begin Patch\n*** Update File: path/to/file\n@@ context hint @@\n context line\n-removed line\n+added line\n*** End Patch",
            },
        },
-        "required": ["mode"]
+        "required": ["mode"],
-    }
+    },
 }
 SEARCH_FILES_SCHEMA = {
@@ -471,36 +561,80 @@ SEARCH_FILES_SCHEMA = {
    "parameters": {
        "type": "object",
        "properties": {
-            "pattern": {"type": "string", "description": "Regex pattern for content search, or glob pattern (e.g., '*.py') for file search"},
+            "pattern": {
-            "target": {"type": "string", "enum": ["content", "files"], "description": "'content' searches inside file contents, 'files' searches for files by name", "default": "content"},
+                "type": "string",
-            "path": {"type": "string", "description": "Directory or file to search in (default: current working directory)", "default": "."},
+                "description": "Regex pattern for content search, or glob pattern (e.g., '*.py') for file search",
-            "file_glob": {"type": "string", "description": "Filter files by pattern in grep mode (e.g., '*.py' to only search Python files)"},
+            },
-            "limit": {"type": "integer", "description": "Maximum number of results to return (default: 50)", "default": 50},
+            "target": {
-            "offset": {"type": "integer", "description": "Skip first N results for pagination (default: 0)", "default": 0},
+                "type": "string",
-            "output_mode": {"type": "string", "enum": ["content", "files_only", "count"], "description": "Output format for grep mode: 'content' shows matching lines with line numbers, 'files_only' lists file paths, 'count' shows match counts per file", "default": "content"},
+                "enum": ["content", "files"],
-            "context": {"type": "integer", "description": "Number of context lines before and after each match (grep mode only)", "default": 0}
+                "description": "'content' searches inside file contents, 'files' searches for files by name",
                "default": "content",
            },
            "path": {
                "type": "string",
                "description": "Directory or file to search in (default: current working directory)",
                "default": ".",
            },
            "file_glob": {
                "type": "string",
                "description": "Filter files by pattern in grep mode (e.g., '*.py' to only search Python files)",
            },
            "limit": {
                "type": "integer",
                "description": "Maximum number of results to return (default: 50)",
                "default": 50,
            },
            "offset": {
                "type": "integer",
                "description": "Skip first N results for pagination (default: 0)",
                "default": 0,
            },
            "output_mode": {
                "type": "string",
                "enum": ["content", "files_only", "count"],
                "description": "Output format for grep mode: 'content' shows matching lines with line numbers, 'files_only' lists file paths, 'count' shows match counts per file",
                "default": "content",
            },
            "context": {
                "type": "integer",
                "description": "Number of context lines before and after each match (grep mode only)",
                "default": 0,
            },
        },
-        "required": ["pattern"]
+        "required": ["pattern"],
-    }
+    },
 }
 def _handle_read_file(args, **kw):
    tid = kw.get("task_id") or "default"
-    return read_file_tool(path=args.get("path", ""), offset=args.get("offset", 1), limit=args.get("limit", 500), task_id=tid)
+    return read_file_tool(
        path=args.get("path", ""),
        offset=args.get("offset", 1),
        limit=args.get("limit", 500),
        task_id=tid,
    )
 def _handle_write_file(args, **kw):
    tid = kw.get("task_id") or "default"
-    return write_file_tool(path=args.get("path", ""), content=args.get("content", ""), task_id=tid)
+    return write_file_tool(
        path=args.get("path", ""), content=args.get("content", ""), task_id=tid
    )
 def _handle_patch(args, **kw):
    tid = kw.get("task_id") or "default"
    return patch_tool(
-        mode=args.get("mode", "replace"), path=args.get("path"),
+        mode=args.get("mode", "replace"),
-        old_string=args.get("old_string"), new_string=args.get("new_string"),
+        path=args.get("path"),
-        replace_all=args.get("replace_all", False), patch=args.get("patch"), task_id=tid)
+        old_string=args.get("old_string"),
        new_string=args.get("new_string"),
        replace_all=args.get("replace_all", False),
        patch=args.get("patch"),
        task_id=tid,
    )
 def _handle_search_files(args, **kw):
@@ -509,12 +643,47 @@ def _handle_search_files(args, **kw):
    raw_target = args.get("target", "content")
    target = target_map.get(raw_target, raw_target)
    return search_tool(
-        pattern=args.get("pattern", ""), target=target, path=args.get("path", "."),
+        pattern=args.get("pattern", ""),
-        file_glob=args.get("file_glob"), limit=args.get("limit", 50), offset=args.get("offset", 0),
+        target=target,
-        output_mode=args.get("output_mode", "content"), context=args.get("context", 0), task_id=tid)
+        path=args.get("path", "."),
        file_glob=args.get("file_glob"),
        limit=args.get("limit", 50),
        offset=args.get("offset", 0),
        output_mode=args.get("output_mode", "content"),
        context=args.get("context", 0),
        task_id=tid,
    )
-registry.register(name="read_file", toolset="file", schema=READ_FILE_SCHEMA, handler=_handle_read_file, check_fn=_check_file_reqs, emoji="📖")
+registry.register(
-registry.register(name="write_file", toolset="file", schema=WRITE_FILE_SCHEMA, handler=_handle_write_file, check_fn=_check_file_reqs, emoji="✍️")
+    name="read_file",
-registry.register(name="patch", toolset="file", schema=PATCH_SCHEMA, handler=_handle_patch, check_fn=_check_file_reqs, emoji="🔧")
+    toolset="file",
-registry.register(name="search_files", toolset="file", schema=SEARCH_FILES_SCHEMA, handler=_handle_search_files, check_fn=_check_file_reqs, emoji="🔎")
+    schema=READ_FILE_SCHEMA,
    handler=_handle_read_file,
    check_fn=_check_file_reqs,
    emoji="📖",
 )
 registry.register(
    name="write_file",
    toolset="file",
    schema=WRITE_FILE_SCHEMA,
    handler=_handle_write_file,
    check_fn=_check_file_reqs,
    emoji="✍️",
 )
 registry.register(
    name="patch",
    toolset="file",
    schema=PATCH_SCHEMA,
    handler=_handle_patch,
    check_fn=_check_file_reqs,
    emoji="🔧",
 )
 registry.register(
    name="search_files",
    toolset="file",
    schema=SEARCH_FILES_SCHEMA,
    handler=_handle_search_files,
    check_fn=_check_file_reqs,
    emoji="🔎",
 )
--- a/tools/terminal_tool.py
+++ b/tools/terminal_tool.py
@@ -413,6 +413,11 @@ def register_task_env_overrides(task_id: str, overrides: Dict[str, Any]):
        - modal_image: str -- Path to Dockerfile or Docker Hub image name
        - docker_image: str -- Docker image name
        - cwd: str -- Working directory inside the sandbox
        - ssh_host: str -- SSH hostname (overrides TERMINAL_SSH_HOST)
        - ssh_user: str -- SSH username (overrides TERMINAL_SSH_USER)
        - ssh_port: int -- SSH port (overrides TERMINAL_SSH_PORT)
        - ssh_key: str -- Path to SSH private key (overrides TERMINAL_SSH_KEY)
        - ssh_persistent: bool -- Persistent shell mode (overrides TERMINAL_SSH_PERSISTENT)
    Args:
        task_id: The rollout's unique task identifier
@@ -942,11 +947,11 @@ def terminal_tool(
                        ssh_config = None
                        if env_type == "ssh":
                            ssh_config = {
-                                "host": config.get("ssh_host", ""),
+                                "host": overrides.get("ssh_host") or config.get("ssh_host", ""),
-                                "user": config.get("ssh_user", ""),
+                                "user": overrides.get("ssh_user") or config.get("ssh_user", ""),
-                                "port": config.get("ssh_port", 22),
+                                "port": overrides.get("ssh_port") or config.get("ssh_port", 22),
-                                "key": config.get("ssh_key", ""),
+                                "key": overrides.get("ssh_key") or config.get("ssh_key", ""),
-                                "persistent": config.get("ssh_persistent", False),
+                                "persistent": overrides.get("ssh_persistent", config.get("ssh_persistent", False)),
                            }
                        container_config = None
--- a/uv.lock
+++ b/uv.lock
@@ -262,7 +262,7 @@ sdist = { url = "https://files.pythonhosted.org/packages/87/c6/53da25344e3e3a9c0
 [[package]]
 name = "atroposlib"
 version = "0.4.0"
-source = { git = "https://github.com/NousResearch/atropos.git#c421582b6f7ce8a32f751aab3117d3824ac8f709" }
+source = { git = "https://github.com/NousResearch/atropos.git?rev=main#c20c85256e5a45ad31edf8b7276e9c5ee1995a30" }
 dependencies = [
    { name = "aiofiles" },
    { name = "aiohttp" },
@@ -1758,12 +1758,12 @@ yc-bench = [
 [package.metadata]
 requires-dist = [
-    { name = "agent-client-protocol", marker = "extra == 'acp'", specifier = ">=0.8.1,<1.0" },
+    { name = "agent-client-protocol", marker = "extra == 'acp'", specifier = ">=0.8.1,<0.9" },
    { name = "aiohttp", marker = "extra == 'homeassistant'", specifier = ">=3.9.0,<4" },
    { name = "aiohttp", marker = "extra == 'messaging'", specifier = ">=3.13.3,<4" },
    { name = "aiohttp", marker = "extra == 'sms'", specifier = ">=3.9.0,<4" },
    { name = "anthropic", specifier = ">=0.39.0,<1" },
-    { name = "atroposlib", marker = "extra == 'rl'", git = "https://github.com/NousResearch/atropos.git" },
+    { name = "atroposlib", marker = "extra == 'rl'", git = "https://github.com/NousResearch/atropos.git?rev=main" },
    { name = "croniter", marker = "extra == 'cron'", specifier = ">=6.0.0,<7" },
    { name = "daytona", marker = "extra == 'daytona'", specifier = ">=0.148.0,<1" },
    { name = "dingtalk-stream", marker = "extra == 'dingtalk'", specifier = ">=0.1.0,<1" },
Author	SHA1	Message	Date
Hermes Agent	b4d4fee6fe	pwncollege: slot pool for process mode + include_challenges filter Replace asyncio.Semaphore with pre-allocated slot pool (asyncio.Queue) in process_manager. Eliminates silent item drops from slot contention — 188/850 items were lost in the semaphore-based approach. Key changes: - _acquire_instance(): pool mode resets existing slot, falls back to create on failure. Tracks actual slot ID through replacements. - collect_trajectory(): accepts pool_instance kwarg to skip acquisition in pool mode. evaluate/serve modes unchanged. - process_manager(): pre-allocates dojo slots into asyncio.Queue, tasks wait for real slots, return them via finally (even on failure). - include_challenges config field: explicit challenge list for retry runs, overrides dojo/module filters in setup(). Bug fixes from Claude Code review: - Dead slot no longer returned to pool on acquisition failure (actual_slot=None) - Tool resolution moved before asyncio.gather (no concurrent redundant calls) - Slot replacement logged for debugging - Pre-allocation count fix in error message	2026-03-31 22:30:25 +00:00
Hermes Agent	a1f9961f51	feat: add disable_secret_redaction config for RL environments Adds a new disable_secret_redaction field to HermesAgentEnvConfig that sets HERMES_REDACT_SECRETS=false, preventing the secret redactor from munging source code containing password fields (e.g. Flask apps in web-security challenges). Follows same pattern as disable_command_guards -> HERMES_YOLO_MODE.	2026-03-31 16:19:22 +00:00
alt-glitch	3741ee08d2	pwncollege: auto-generate and register SSH key when not configured If ssh_key is empty or the file doesn't exist, setup() now generates an ed25519 keypair to a temp dir and registers it with the dojo via the SDK. Temp keys are cleaned up on exit.	2026-03-31 01:06:51 -07:00
alt-glitch	9cd3050a08	pwncollege: concurrent process mode for full-dojo trajectory collection Override process_manager() to process items concurrently instead of Atropos's default sequential loop. Uses asyncio.Semaphore gated by eval_concurrency to saturate all dojo slots (16) across different challenges simultaneously. Add process_config.yaml for running all 842 challenges with optimal concurrency settings.	2026-03-31 00:56:51 -07:00
alt-glitch	4670f66a33	pwncollege: early stop callback, SSH key SDK, shell robustness - Add early_stop_check callback to HermesAgentLoop for environment-level completion signals (e.g. flag accepted) - Add SSH key management endpoints to DojoRLClient - Harden persistent shell: ANSI stripping via existing strip_ansi(), PID file retry loop, history isolation, sentinel detection cleanup - SSH: disable host key checking, add IdentitiesOnly for key-based auth - Point atroposlib dependency at main branch	2026-03-31 00:24:02 -07:00
Hermes Agent	c9479c6c6f	feat: add disable_command_guards config for RL environments Adds disable_command_guards field to HermesAgentEnvConfig. When enabled, sets HERMES_YOLO_MODE=1 to bypass terminal command security guards (dangerous command detection, tirith scanning, approval prompts). Needed for RL environment runs where agents operate inside isolated containers and need unrestricted command execution (e.g., pwn.college challenges requiring inline Python, raw sockets, binary exploitation). Also adds eval configs for intro-to-cybersecurity and smoke test, and .gitignore for SSH keys directory.	2026-03-29 17:44:51 +00:00
alt-glitch	5a5d7ec2a2	pwncollege: sentinel-based shell completion, eval improvements, retry hardening - Replace polling-based command completion with sentinel event detection in persistent shell (eliminates I/O polling, immediate completion signaling) - Add SSH PTY allocation (-tt) and safe UTF-8 decoding (errors=replace) - Add retry with exponential backoff for transient instance creation failures - Support eval_challenges list and eval_exclude_modules for flexible eval filtering - Stream eval samples via log_eval_sample() for real-time HTML viewer - Add tmux hint for interactive challenge shells - Add capability verification stress test for pwn-dojo infrastructure - Fix atroposlib dependency to resolve from git (not local path)	2026-03-28 17:20:03 -07:00
alt-glitch	87995cd9c5	pwncollege: add full eval mode with graceful cleanup and richer SDK types	2026-03-27 11:03:02 -07:00
alt-glitch	8fd8def544	update prompts and update SDK w/ types	2026-03-27 11:03:02 -07:00
alt-glitch	1d6a92103a	Clean up formatting and improve error handling in pwncollege environment	2026-03-27 11:03:02 -07:00
alt-glitch	a692859ddb	feat(environments): add pwncollege RL environment with per-task SSH overrides	2026-03-27 11:02:28 -07:00
		`@@ -0,0 +1 @@`
							`from .pwncollege_env import PwnCollegeEnv, PwnCollegeEnvConfig`