mirror of
https://github.com/NousResearch/hermes-agent.git
synced 2026-05-08 03:37:13 +08:00
Compare commits (17 commits): `add-morph-...` → `fix-termin...`

Commit SHAs: a6ec79730c, faecbddd9b, de9c0edc51, 8d256779d8, d36790de91, a398d320b7, 22b6d5866c, 0e2e69a71d, bc5f0e62d9, 6fac6fecde, c42d9055ed, a7ff4d49e9, 0411ca1880, c5386ed7e6, 2082c7caa3, 17608c1142, c7fa4447b8
.cursorrules (new file, 23 lines)
@@ -0,0 +1,23 @@
Hermes-Agent is an agent harness for LLMs.

When building, the tool functionality lives in the tools/ directory, where each specific tool (or, in some cases, a group of tools built for the same execution category or API) gets its own script.

Each tool is then consolidated in the model_tools.py file in the repo root.

Sets of tools can also be consolidated in toolsets.py for the agent to use.

The primary agent runner code is in run_agent, but other runners could be developed using the tools and framework.

Always keep the tools, model_tools.py, and toolsets.py consistent when changing any of them; otherwise they can become desynced in a way that is detrimental to functionality.

The expected pathway for using API keys is to set them up in a .env file in the repo root.

Test scripts are placed in tests/.

The run_agent loop is set up to:
- Process the enabled toolsets to provide to the model,
- Pipe a prompt or problem from the input to the agent,
- Loop the LLM each time it calls a tool, until the model decides no more tools are needed and provides a natural language response,
- Return that response.

There is an additional caveat for logging: we restructure the "tools" as a system prompt for storage, in a format that can be used and handled properly later.
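The run_agent loop described above can be sketched as a minimal tool-calling cycle. This is illustrative only; `call_llm` and `execute_tool` are hypothetical stand-ins for the real runner internals in run_agent:

```python
def run_agent_loop(prompt, tools, call_llm, execute_tool, max_turns=10):
    """Minimal sketch of a tool-calling loop: keep invoking the LLM and
    executing any requested tools until it answers in natural language."""
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        reply = call_llm(messages, tools)     # model sees tools + history
        messages.append(reply)
        if not reply.get("tool_calls"):       # plain answer -> done
            return reply["content"]
        for call in reply["tool_calls"]:      # run each requested tool
            result = execute_tool(call)
            messages.append({"role": "tool",
                             "tool_call_id": call["id"],
                             "content": result})
    return None  # turn budget exhausted
```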
.env.example (new file, 49 lines)
@@ -0,0 +1,49 @@
# Hermes Agent Environment Configuration
# Copy this file to .env and fill in your API keys
# Get API keys from the URLs listed below

# =============================================================================
# REQUIRED API KEYS
# =============================================================================

# Anthropic API Key - Main agent model
# Get at: https://console.anthropic.com/
ANTHROPIC_API_KEY=

# Firecrawl API Key - Web search, extract, and crawl
# Get at: https://firecrawl.dev/
FIRECRAWL_API_KEY=

# Nous Research API Key - Vision analysis and multi-model reasoning
# Get at: https://inference-api.nousresearch.com/
NOUS_API_KEY=

# Morph API Key - Terminal/command execution tools
# Get at: https://morph.so/
MORPH_API_KEY=

# FAL.ai API Key - Image generation
# Get at: https://fal.ai/
FAL_KEY=

# =============================================================================
# OPTIONAL API KEYS
# =============================================================================

# OpenAI API Key - Optional, for enhanced Hecate features
# Get at: https://platform.openai.com/
OPENAI_API_KEY=

# =============================================================================
# OPTIONAL CONFIGURATION
# =============================================================================

# Terminal Tool Settings
HECATE_VM_LIFETIME_SECONDS=300
HECATE_DEFAULT_SNAPSHOT_ID=snapshot_p5294qxt

# Debug Logging (set to "true" to enable, logs saved to ./logs/)
WEB_TOOLS_DEBUG=false
VISION_TOOLS_DEBUG=false
MOA_TOOLS_DEBUG=false
IMAGE_TOOLS_DEBUG=false
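The KEY=VALUE format above can be consumed with a stdlib-only loader sketch. (The repo may well rely on a library such as python-dotenv instead; this just illustrates the format and the "existing environment wins" convention.)

```python
import os

def load_env_file(path=".env"):
    """Minimal .env loader sketch: KEY=VALUE lines, '#' comments and
    blank lines ignored; variables already in the environment win."""
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```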
.gitignore (vendored, 5 lines added)
@@ -16,3 +16,8 @@ __pycache__/
export*
__pycache__/model_tools.cpython-310.pyc
__pycache__/web_tools.cpython-310.pyc
logs/
data/
.pytest_cache/
tmp/
temp_vision_images/
README.md (230 lines changed)
@@ -1,13 +1,99 @@
# Hermes Agent

An AI agent with advanced tool-calling capabilities, featuring a flexible toolsets system for organizing and managing tools.

## Features

- **Web Tools**: Search, extract content, and crawl websites
- **Terminal Tools**: Execute commands with interactive session support
- **Vision Tools**: Analyze images from URLs
- **Reasoning Tools**: Advanced multi-model reasoning (Mixture of Agents)
- **Creative Tools**: Generate images from text prompts
- **Toolsets System**: Organize tools into logical groups for different scenarios
- **Batch Processing**: Process datasets in parallel with checkpointing and statistics tracking
- **Ephemeral System Prompts**: Guide model behavior without polluting training datasets

## Setup
### 1. Install Dependencies
```bash
# Create and activate virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install required packages
pip install -r requirements.txt

# Install Hecate for terminal tools
git clone git@github.com:NousResearch/hecate.git
cd hecate
pip install -e .
cd ..
```

### 2. Configure Environment Variables
```bash
# Copy the example environment file
cp .env.example .env

# Edit .env and add your API keys
nano .env  # or use your preferred editor
```

**Required API Keys:**
- `ANTHROPIC_API_KEY` - Main agent model (get at: https://console.anthropic.com/)
- `FIRECRAWL_API_KEY` - Web tools (get at: https://firecrawl.dev/)
- `NOUS_API_KEY` - Vision & reasoning tools (get at: https://inference-api.nousresearch.com/)
- `MORPH_API_KEY` - Terminal tools (get at: https://morph.so/)
- `FAL_KEY` - Image generation (get at: https://fal.ai/)
- `OPENAI_API_KEY` - Optional, for some Hecate features

See `.env.example` for all available configuration options, including debug settings and terminal tool configuration.
## Toolsets System

The agent uses a toolsets system for organizing and managing tools. All tools must be part of a toolset to be accessible; individual tool selection is not supported. This ensures consistent and logical grouping of capabilities.

### Key Concepts

- **Toolsets**: Logical groups of tools for specific use cases (e.g., "research", "development", "debugging")
- **Composition**: Toolsets can include other toolsets for powerful combinations
- **Custom Toolsets**: Create your own toolsets at runtime or by editing `toolsets.py`
- **Toolset-Only Access**: Tools are only accessible through toolsets, not individually

### Available Toolsets

See `toolsets.py` for the complete list of predefined toolsets, including:
- Basic toolsets (web, terminal, vision, creative, reasoning)
- Composite toolsets (research, development, analysis, etc.)
- Scenario-specific toolsets (debugging, documentation, API testing, etc.)
- Special toolsets (safe mode without terminal, minimal, offline)

### Using Toolsets

```bash
# Use a predefined toolset
python run_agent.py --enabled_toolsets=research --query "Find latest AI papers"

# Combine multiple toolsets
python run_agent.py --enabled_toolsets=web,vision --query "Analyze this website"

# Enable all toolsets explicitly (same as omitting the flag)
python run_agent.py --enabled_toolsets=all --query "Do web research and run commands if helpful"

# Safe mode (no terminal access)
python run_agent.py --enabled_toolsets=safe --query "Help without running commands"

# List all available toolsets and tools
python run_agent.py --list_tools
```

For detailed documentation on toolsets, see `TOOLSETS_README.md`.
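Toolset composition can be pictured as recursive expansion of a toolset's includes. The sketch below illustrates the concept only; the actual resolution logic and registry shape live in `toolsets.py`, and the `registry` layout here is an assumption:

```python
def resolve_toolset(name, registry, seen=None):
    """Expand a toolset into its full tool list, following nested
    includes and guarding against cycles. `registry` maps toolset
    name -> {"tools": [...], "includes": [...]} (hypothetical shape)."""
    seen = set() if seen is None else seen
    if name in seen:
        return []          # cycle guard: already expanded
    seen.add(name)
    entry = registry[name]
    tools = list(entry.get("tools", []))
    for included in entry.get("includes", []):
        tools.extend(resolve_toolset(included, registry, seen))
    # de-duplicate while preserving first-seen order
    return list(dict.fromkeys(tools))
```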
## Basic Usage

### Default (all tools enabled)
```bash
python run_agent.py \
    --query "search up the latest docs on jit in python 3.13 and write me basic example that's not in their docs. profile its perf" \
    --max_turns 20 \
@@ -15,3 +101,143 @@ python run_agent.py \
    --base_url https://api.anthropic.com/v1/ \
    --api_key $ANTHROPIC_API_KEY
```

### With specific toolset
```bash
python run_agent.py \
    --query "Debug this Python error" \
    --enabled_toolsets=debugging \
    --model claude-sonnet-4-20250514 \
    --api_key $ANTHROPIC_API_KEY
```

### Python API
```python
from run_agent import AIAgent

# Use a specific toolset
agent = AIAgent(
    model="claude-opus-4-20250514",
    enabled_toolsets=["research"]
)
response = agent.chat("Find information about quantum computing")

# Create custom toolset at runtime
from toolsets import create_custom_toolset

create_custom_toolset(
    name="my_tools",
    description="My custom toolkit",
    tools=["web_search"],
    includes=["terminal", "vision"]
)

agent = AIAgent(enabled_toolsets=["my_tools"])
```
## Batch Processing

Process multiple prompts from a dataset in parallel with automatic checkpointing and statistics tracking:

```bash
# Basic batch processing
python batch_runner.py \
    --dataset_file=prompts.jsonl \
    --batch_size=20 \
    --run_name=my_run

# With specific distribution
python batch_runner.py \
    --dataset_file=prompts.jsonl \
    --batch_size=20 \
    --run_name=image_run \
    --distribution=image_gen \
    --num_workers=4
```

**Key Features:**
- Parallel processing with configurable workers
- Toolset distributions for varied data generation
- Automatic checkpointing and resume capability
- Combined output in `data/<run_name>/trajectories.jsonl`
- Tool usage statistics and success rates

**Quick Start:** See [QUICKSTART_BATCH.md](QUICKSTART_BATCH.md) for a 5-minute getting started guide.
**Full Documentation:** See [BATCH_PROCESSING.md](BATCH_PROCESSING.md) for comprehensive documentation.

### Ephemeral System Prompts

The ephemeral system prompt feature allows you to guide the model's behavior during batch processing **without** saving that prompt to the training dataset trajectories. This is useful for:

- Guiding model behavior during data collection
- Adding task-specific instructions
- Keeping saved trajectories clean and focused on tool-calling format

**Example:**
```bash
python batch_runner.py \
    --dataset_file=prompts.jsonl \
    --batch_size=10 \
    --run_name=my_run \
    --ephemeral_system_prompt="You are a helpful assistant focused on image generation."
```

The ephemeral prompt will influence the model's behavior during execution, but **only the standard tool-calling system prompt** will be saved in the trajectory files.

**Documentation:** See [docs/ephemeral_system_prompt.md](docs/ephemeral_system_prompt.md) for complete details.
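The batch runner expects each line of the dataset file to be a JSON object with a `prompt` field (lines without one are skipped). A minimal `prompts.jsonl` can be produced like this (the prompt strings are just illustrative):

```python
import json

prompts = [
    {"prompt": "Summarize the latest Python 3.13 release notes."},
    {"prompt": "Generate an image of a sunset over mountains."},
]

# One JSON object per line: the JSONL format batch_runner.py loads
with open("prompts.jsonl", "w", encoding="utf-8") as f:
    for entry in prompts:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
```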
## Command Line Arguments

**Single Agent (`run_agent.py`):**
- `--query`: The question or task for the agent
- `--model`: Model to use (default: claude-opus-4-20250514)
- `--api_key`: API key for authentication
- `--base_url`: API endpoint URL
- `--max_turns`: Maximum number of tool-calling iterations
- `--enabled_toolsets`: Comma-separated list of toolsets to enable. Use `all` (or `*`) to enable everything. If omitted, all toolsets are enabled by default.
- `--disabled_toolsets`: Comma-separated list of toolsets to disable
- `--list_tools`: List all available toolsets and tools
- `--save_trajectories`: Save conversation trajectories to JSONL files

**Batch Processing (`batch_runner.py`):**
- `--dataset_file`: Path to JSONL file with prompts
- `--batch_size`: Number of prompts per batch
- `--run_name`: Name for this run (for output/checkpointing)
- `--distribution`: Toolset distribution to use (default: "default")
- `--num_workers`: Number of parallel workers (default: 4)
- `--resume`: Resume from checkpoint if interrupted
- `--ephemeral_system_prompt`: System prompt used during execution but NOT saved to trajectories
- `--list_distributions`: List available toolset distributions

## Environment Variables

All environment variables can be configured in the `.env` file (copy from `.env.example`).

**Core API Keys:**
- `ANTHROPIC_API_KEY`: Main agent model
- `FIRECRAWL_API_KEY`: Web tools (search, extract, crawl)
- `NOUS_API_KEY`: Vision and reasoning tools
- `MORPH_API_KEY`: Terminal tools
- `FAL_KEY`: Image generation tools
- `OPENAI_API_KEY`: Optional, for some Hecate features

**Configuration Options:**
- `HECATE_VM_LIFETIME_SECONDS`: VM lifetime (default: 300)
- `HECATE_DEFAULT_SNAPSHOT_ID`: Default snapshot (default: snapshot_p5294qxt)
- `WEB_TOOLS_DEBUG`, `VISION_TOOLS_DEBUG`, `MOA_TOOLS_DEBUG`, `IMAGE_TOOLS_DEBUG`: Enable debug logging

## Documentation

**Single Agent Usage:**
- `TOOLSETS_README.md`: Comprehensive guide to the toolsets system
- `toolsets.py`: View and modify available toolsets
- `model_tools.py`: Core tool definitions and handlers

**Batch Processing:**
- `QUICKSTART_BATCH.md`: 5-minute quick start guide
- `BATCH_PROCESSING.md`: Complete batch processing documentation
- `toolset_distributions.py`: Toolset distributions for data generation

## Examples

See `TOOLSETS_README.md` for extensive examples of using different toolsets for various scenarios.
batch_runner.py (new file, 746 lines)
@@ -0,0 +1,746 @@
#!/usr/bin/env python3
"""
Batch Agent Runner

This module provides parallel batch processing capabilities for running the agent
across multiple prompts from a dataset. It includes:
- Dataset loading and batching
- Parallel batch processing with multiprocessing
- Checkpointing for fault tolerance and resumption
- Trajectory saving in the proper format (from/value pairs)
- Tool usage statistics aggregation across all batches

Usage:
    python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run

    # Resume an interrupted run
    python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run --resume

    # Use a specific toolset distribution
    python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run --distribution=image_gen
"""

import json
import logging
import os
import time
from pathlib import Path
from typing import List, Dict, Any, Optional, Tuple
from datetime import datetime
from multiprocessing import Pool
from multiprocessing.synchronize import Lock
import traceback

import fire

from run_agent import AIAgent
from toolset_distributions import (
    get_distribution,
    list_distributions,
    sample_toolsets_from_distribution,
    validate_distribution
)


# Global configuration for worker processes
_WORKER_CONFIG = {}
def _extract_tool_stats(messages: List[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """
    Extract tool usage statistics from message history.

    Args:
        messages (List[Dict]): Message history

    Returns:
        Dict: Tool statistics with counts and success/failure rates
    """
    tool_stats = {}

    # Track tool calls and their results
    tool_calls_map = {}  # Map tool_call_id to tool name

    for msg in messages:
        # Track tool calls from assistant messages
        if msg["role"] == "assistant" and "tool_calls" in msg and msg["tool_calls"]:
            for tool_call in msg["tool_calls"]:
                tool_name = tool_call["function"]["name"]
                tool_call_id = tool_call["id"]

                # Initialize stats for this tool if not exists
                if tool_name not in tool_stats:
                    tool_stats[tool_name] = {
                        "count": 0,
                        "success": 0,
                        "failure": 0
                    }

                tool_stats[tool_name]["count"] += 1
                tool_calls_map[tool_call_id] = tool_name

        # Track tool responses
        elif msg["role"] == "tool":
            tool_call_id = msg.get("tool_call_id", "")
            content = msg.get("content", "")

            # Determine if tool call was successful
            is_success = True
            try:
                # Try to parse as JSON and check for actual error values
                content_json = json.loads(content) if isinstance(content, str) else content

                if isinstance(content_json, dict):
                    # Check if error field exists AND has a non-null value
                    if "error" in content_json and content_json["error"] is not None:
                        is_success = False

                    # Special handling for terminal tool responses:
                    # the terminal wraps its response in a "content" field
                    if "content" in content_json and isinstance(content_json["content"], dict):
                        inner_content = content_json["content"]
                        # Check for actual error (non-null error field or non-zero exit code)
                        has_error = (inner_content.get("error") is not None or
                                     inner_content.get("exit_code", 0) != 0)
                        if has_error:
                            is_success = False

                    # Check for "success": false pattern used by some tools
                    if content_json.get("success") is False:
                        is_success = False

            except (json.JSONDecodeError, TypeError):
                # If not JSON, check if content is empty or explicitly states an error.
                # Note: we avoid simple substring matching to prevent false positives.
                if not content:
                    is_success = False
                # Only mark as failure if it explicitly starts with "Error:" or "ERROR:"
                elif content.strip().lower().startswith("error:"):
                    is_success = False

            # Update success/failure count
            if tool_call_id in tool_calls_map:
                tool_name = tool_calls_map[tool_call_id]
                if is_success:
                    tool_stats[tool_name]["success"] += 1
                else:
                    tool_stats[tool_name]["failure"] += 1

    return tool_stats
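The success-detection rules above can be condensed into a standalone predicate. This is a sketch mirroring the logic for illustration, not part of the module's API:

```python
import json

def looks_successful(content):
    """Condensed sketch of the success heuristic: JSON payloads fail on a
    non-null "error", a "success": false flag, or terminal-style inner
    content with an error / non-zero exit code; plain text fails only
    when empty or explicitly starting with "error:"."""
    try:
        payload = json.loads(content) if isinstance(content, str) else content
        if isinstance(payload, dict):
            if payload.get("error") is not None:
                return False
            inner = payload.get("content")
            if isinstance(inner, dict) and (
                inner.get("error") is not None or inner.get("exit_code", 0) != 0
            ):
                return False
            if payload.get("success") is False:
                return False
        return True
    except (json.JSONDecodeError, TypeError):
        return bool(content) and not content.strip().lower().startswith("error:")
```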
def _process_single_prompt(
    prompt_index: int,
    prompt_data: Dict[str, Any],
    batch_num: int,
    config: Dict[str, Any]
) -> Dict[str, Any]:
    """
    Process a single prompt with the agent.

    Args:
        prompt_index (int): Index of prompt in dataset
        prompt_data (Dict): Prompt data containing 'prompt' field
        batch_num (int): Batch number
        config (Dict): Configuration dict with agent parameters

    Returns:
        Dict: Result containing trajectory, stats, and metadata
    """
    prompt = prompt_data["prompt"]

    try:
        # Sample toolsets from distribution for this prompt
        selected_toolsets = sample_toolsets_from_distribution(config["distribution"])

        if config.get("verbose"):
            print(f"  Prompt {prompt_index}: Using toolsets {selected_toolsets}")

        # Initialize agent with sampled toolsets
        agent = AIAgent(
            base_url=config.get("base_url"),
            api_key=config.get("api_key"),
            model=config["model"],
            max_iterations=config["max_iterations"],
            enabled_toolsets=selected_toolsets,
            save_trajectories=False,  # We handle saving ourselves
            verbose_logging=config.get("verbose", False),
            ephemeral_system_prompt=config.get("ephemeral_system_prompt")
        )

        # Run the agent
        result = agent.run_conversation(prompt)

        # Extract tool usage statistics
        tool_stats = _extract_tool_stats(result["messages"])

        # Convert to trajectory format (using existing method)
        trajectory = agent._convert_to_trajectory_format(
            result["messages"],
            prompt,
            result["completed"]
        )

        return {
            "success": True,
            "prompt_index": prompt_index,
            "trajectory": trajectory,
            "tool_stats": tool_stats,
            "completed": result["completed"],
            "api_calls": result["api_calls"],
            "toolsets_used": selected_toolsets,
            "metadata": {
                "batch_num": batch_num,
                "timestamp": datetime.now().isoformat(),
                "model": config["model"]
            }
        }

    except Exception as e:
        print(f"❌ Error processing prompt {prompt_index}: {e}")
        if config.get("verbose"):
            traceback.print_exc()

        return {
            "success": False,
            "prompt_index": prompt_index,
            "error": str(e),
            "trajectory": None,
            "tool_stats": {},
            "toolsets_used": [],
            "metadata": {
                "batch_num": batch_num,
                "timestamp": datetime.now().isoformat()
            }
        }
def _process_batch_worker(args: Tuple) -> Dict[str, Any]:
    """
    Worker function to process a single batch of prompts.

    Args:
        args (Tuple): (batch_num, batch_data, output_dir, completed_prompts, config)

    Returns:
        Dict: Batch results with statistics
    """
    batch_num, batch_data, output_dir, completed_prompts_set, config = args

    output_dir = Path(output_dir)
    print(f"\n🔄 Batch {batch_num}: Starting ({len(batch_data)} prompts)")

    # Output file for this batch
    batch_output_file = output_dir / f"batch_{batch_num}.jsonl"

    # Filter out already completed prompts
    prompts_to_process = [
        (idx, data) for idx, data in batch_data
        if idx not in completed_prompts_set
    ]

    if not prompts_to_process:
        print(f"✅ Batch {batch_num}: Already completed (skipping)")
        return {
            "batch_num": batch_num,
            "processed": 0,
            "skipped": len(batch_data),
            "tool_stats": {},
            "completed_prompts": []
        }

    print(f"   Processing {len(prompts_to_process)} prompts (skipping {len(batch_data) - len(prompts_to_process)} already completed)")

    # Initialize aggregated stats for this batch
    batch_tool_stats = {}
    completed_in_batch = []

    # Process each prompt sequentially in this batch
    for prompt_index, prompt_data in prompts_to_process:
        # Process the prompt
        result = _process_single_prompt(
            prompt_index,
            prompt_data,
            batch_num,
            config
        )

        # Save trajectory if successful
        if result["success"] and result["trajectory"]:
            trajectory_entry = {
                "prompt_index": prompt_index,
                "conversations": result["trajectory"],
                "metadata": result["metadata"],
                "completed": result["completed"],
                "api_calls": result["api_calls"],
                "toolsets_used": result["toolsets_used"]
            }

            # Append to batch output file
            with open(batch_output_file, 'a', encoding='utf-8') as f:
                f.write(json.dumps(trajectory_entry, ensure_ascii=False) + "\n")

            # Aggregate tool statistics
            for tool_name, stats in result.get("tool_stats", {}).items():
                if tool_name not in batch_tool_stats:
                    batch_tool_stats[tool_name] = {
                        "count": 0,
                        "success": 0,
                        "failure": 0
                    }

                batch_tool_stats[tool_name]["count"] += stats["count"]
                batch_tool_stats[tool_name]["success"] += stats["success"]
                batch_tool_stats[tool_name]["failure"] += stats["failure"]

            completed_in_batch.append(prompt_index)
            print(f"   ✅ Prompt {prompt_index} completed")

    print(f"✅ Batch {batch_num}: Completed ({len(prompts_to_process)} prompts processed)")

    return {
        "batch_num": batch_num,
        "processed": len(prompts_to_process),
        "skipped": len(batch_data) - len(prompts_to_process),
        "tool_stats": batch_tool_stats,
        "completed_prompts": completed_in_batch
    }
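The count/success/failure aggregation used here (and again when combining batch results in `run()`) follows one pattern, which can be factored out as a sketch:

```python
def merge_tool_stats(total, delta):
    """Fold per-prompt (or per-batch) tool stats into a running total.
    Both arguments map tool name -> {"count", "success", "failure"}."""
    for tool, stats in delta.items():
        bucket = total.setdefault(tool, {"count": 0, "success": 0, "failure": 0})
        for key in ("count", "success", "failure"):
            bucket[key] += stats[key]
    return total
```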
class BatchRunner:
    """
    Manages batch processing of agent prompts with checkpointing and statistics.
    """

    def __init__(
        self,
        dataset_file: str,
        batch_size: int,
        run_name: str,
        distribution: str = "default",
        max_iterations: int = 10,
        base_url: str = None,
        api_key: str = None,
        model: str = "claude-opus-4-20250514",
        num_workers: int = 4,
        verbose: bool = False,
        ephemeral_system_prompt: str = None
    ):
        """
        Initialize the batch runner.

        Args:
            dataset_file (str): Path to the dataset JSONL file with 'prompt' field
            batch_size (int): Number of prompts per batch
            run_name (str): Name for this run (used for checkpointing and output)
            distribution (str): Toolset distribution to use (default: "default")
            max_iterations (int): Max iterations per agent run
            base_url (str): Base URL for model API
            api_key (str): API key for model
            model (str): Model name to use
            num_workers (int): Number of parallel workers
            verbose (bool): Enable verbose logging
            ephemeral_system_prompt (str): System prompt used during agent execution but NOT saved to trajectories (optional)
        """
        self.dataset_file = Path(dataset_file)
        self.batch_size = batch_size
        self.run_name = run_name
        self.distribution = distribution
        self.max_iterations = max_iterations
        self.base_url = base_url
        self.api_key = api_key
        self.model = model
        self.num_workers = num_workers
        self.verbose = verbose
        self.ephemeral_system_prompt = ephemeral_system_prompt

        # Validate distribution
        if not validate_distribution(distribution):
            raise ValueError(f"Unknown distribution: {distribution}. Available: {list(list_distributions().keys())}")

        # Setup output directory
        self.output_dir = Path("data") / run_name
        self.output_dir.mkdir(parents=True, exist_ok=True)

        # Checkpoint file
        self.checkpoint_file = self.output_dir / "checkpoint.json"

        # Statistics file
        self.stats_file = self.output_dir / "statistics.json"

        # Load dataset
        self.dataset = self._load_dataset()

        # Create batches
        self.batches = self._create_batches()

        print(f"📊 Batch Runner Initialized")
        print(f"   Dataset: {self.dataset_file} ({len(self.dataset)} prompts)")
        print(f"   Batch size: {self.batch_size}")
        print(f"   Total batches: {len(self.batches)}")
        print(f"   Run name: {self.run_name}")
        print(f"   Distribution: {self.distribution}")
        print(f"   Output directory: {self.output_dir}")
        print(f"   Workers: {self.num_workers}")
        if self.ephemeral_system_prompt:
            prompt_preview = self.ephemeral_system_prompt[:60] + "..." if len(self.ephemeral_system_prompt) > 60 else self.ephemeral_system_prompt
            print(f"   🔒 Ephemeral system prompt: '{prompt_preview}'")
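The initializer above pins down the on-disk layout for a run. A sketch of the resulting paths (the `run_name` value and temp base are hypothetical; the real runner anchors `data/` at the working directory):

```python
from pathlib import Path
import tempfile

base = Path(tempfile.mkdtemp())      # stand-in for the repo root
run_name = "my_run"                  # hypothetical run name

output_dir = base / "data" / run_name
output_dir.mkdir(parents=True, exist_ok=True)

checkpoint_file = output_dir / "checkpoint.json"   # resume/progress state
stats_file = output_dir / "statistics.json"        # aggregated tool stats
# workers append trajectories to data/<run_name>/batch_<n>.jsonl
```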
    def _load_dataset(self) -> List[Dict[str, Any]]:
        """
        Load dataset from JSONL file.

        Returns:
            List[Dict]: List of dataset entries
        """
        if not self.dataset_file.exists():
            raise FileNotFoundError(f"Dataset file not found: {self.dataset_file}")

        dataset = []
        with open(self.dataset_file, 'r', encoding='utf-8') as f:
            for line_num, line in enumerate(f, 1):
                line = line.strip()
                if not line:
                    continue

                try:
                    entry = json.loads(line)
                    if 'prompt' not in entry:
                        print(f"⚠️  Warning: Line {line_num} missing 'prompt' field, skipping")
                        continue
                    dataset.append(entry)
                except json.JSONDecodeError as e:
                    print(f"⚠️  Warning: Invalid JSON on line {line_num}: {e}")
                    continue

        if not dataset:
            raise ValueError(f"No valid entries found in dataset file: {self.dataset_file}")

        return dataset

    def _create_batches(self) -> List[List[Tuple[int, Dict[str, Any]]]]:
        """
        Split dataset into batches with indices.

        Returns:
            List of batches, where each batch is a list of (index, entry) tuples
        """
        batches = []
        for i in range(0, len(self.dataset), self.batch_size):
            batch = [(idx, entry) for idx, entry in enumerate(self.dataset[i:i + self.batch_size], start=i)]
            batches.append(batch)

        return batches
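The indexed batching above (global dataset indices via `enumerate(..., start=i)`, so checkpointing can refer to absolute prompt positions) can be exercised standalone:

```python
def create_batches(items, batch_size):
    """Split items into batches of (global_index, item) tuples,
    mirroring the enumerate(..., start=i) pattern above."""
    batches = []
    for i in range(0, len(items), batch_size):
        batches.append(list(enumerate(items[i:i + batch_size], start=i)))
    return batches
```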
    def _load_checkpoint(self) -> Dict[str, Any]:
        """
        Load checkpoint data if it exists.

        Returns:
            Dict: Checkpoint data with completed prompt indices
        """
        if not self.checkpoint_file.exists():
            return {
                "run_name": self.run_name,
                "completed_prompts": [],
                "batch_stats": {},
                "last_updated": None
            }

        try:
            with open(self.checkpoint_file, 'r', encoding='utf-8') as f:
                return json.load(f)
        except Exception as e:
            print(f"⚠️  Warning: Failed to load checkpoint: {e}")
            return {
                "run_name": self.run_name,
                "completed_prompts": [],
                "batch_stats": {},
                "last_updated": None
            }

    def _save_checkpoint(self, checkpoint_data: Dict[str, Any], lock: Optional[Lock] = None):
        """
        Save checkpoint data.

        Args:
            checkpoint_data (Dict): Checkpoint data to save
            lock (Lock): Optional lock for thread-safe access
        """
        checkpoint_data["last_updated"] = datetime.now().isoformat()

        if lock:
            with lock:
                with open(self.checkpoint_file, 'w', encoding='utf-8') as f:
                    json.dump(checkpoint_data, f, indent=2)
        else:
            with open(self.checkpoint_file, 'w', encoding='utf-8') as f:
                json.dump(checkpoint_data, f, indent=2)
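Note that the checkpoint write above is not atomic: a crash mid-write can leave a truncated `checkpoint.json`, which the loader then discards. A common hardening (not part of this module, shown only as a sketch) is write-then-rename:

```python
import json
import os
import tempfile

def save_json_atomic(path, data):
    """Write JSON to a temp file in the same directory, then atomically
    replace the target, so readers never see a half-written file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f, indent=2)
        os.replace(tmp, path)  # atomic on POSIX and Windows
    except BaseException:
        os.unlink(tmp)
        raise
```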
    def run(self, resume: bool = False):
        """
        Run the batch processing pipeline.

        Args:
            resume (bool): Whether to resume from checkpoint
        """
        print("\n" + "=" * 70)
        print("🚀 Starting Batch Processing")
        print("=" * 70)

        # Load checkpoint
        checkpoint_data = self._load_checkpoint() if resume else {
            "run_name": self.run_name,
            "completed_prompts": [],
            "batch_stats": {},
            "last_updated": None
        }

        if resume and checkpoint_data.get("completed_prompts"):
            print(f"📂 Resuming from checkpoint ({len(checkpoint_data['completed_prompts'])} prompts already completed)")

        # Prepare configuration for workers
        config = {
            "distribution": self.distribution,
            "model": self.model,
            "max_iterations": self.max_iterations,
            "base_url": self.base_url,
            "api_key": self.api_key,
            "verbose": self.verbose,
            "ephemeral_system_prompt": self.ephemeral_system_prompt
        }

        # Get completed prompts set
        completed_prompts_set = set(checkpoint_data.get("completed_prompts", []))

        # Aggregate statistics across all batches
        total_tool_stats = {}

        start_time = time.time()

        # Process batches in parallel
        with Pool(processes=self.num_workers) as pool:
            # Create tasks for each batch
            tasks = [
                (
                    batch_num,
                    batch_data,
                    str(self.output_dir),  # Convert Path to string for pickling
                    completed_prompts_set,
                    config
                )
                for batch_num, batch_data in enumerate(self.batches)
            ]

            # Use map to process batches in parallel
            results = pool.map(_process_batch_worker, tasks)

        # Aggregate all batch statistics and update checkpoint
        all_completed_prompts = list(completed_prompts_set)
        for batch_result in results:
            # Add newly completed prompts
            all_completed_prompts.extend(batch_result.get("completed_prompts", []))

            # Aggregate tool stats
            for tool_name, stats in batch_result.get("tool_stats", {}).items():
                if tool_name not in total_tool_stats:
                    total_tool_stats[tool_name] = {
                        "count": 0,
                        "success": 0,
                        "failure": 0
                    }

                total_tool_stats[tool_name]["count"] += stats["count"]
                total_tool_stats[tool_name]["success"] += stats["success"]
                total_tool_stats[tool_name]["failure"] += stats["failure"]

        # Save final checkpoint
        checkpoint_data["completed_prompts"] = all_completed_prompts
        self._save_checkpoint(checkpoint_data)

        # Calculate success rates
        for tool_name in total_tool_stats:
            stats = total_tool_stats[tool_name]
            total_calls = stats["success"] + stats["failure"]
            if total_calls > 0:
                stats["success_rate"] = round(stats["success"] / total_calls * 100, 2)
                stats["failure_rate"] = round(stats["failure"] / total_calls * 100, 2)
            else:
                stats["success_rate"] = 0.0
                stats["failure_rate"] = 0.0

        # Combine all batch files into a single trajectories.jsonl file
        combined_file = self.output_dir / "trajectories.jsonl"
        print(f"\n📦 Combining batch files into {combined_file.name}...")

        with open(combined_file, 'w', encoding='utf-8') as outfile:
            for batch_num in range(len(self.batches)):
                batch_file = self.output_dir / f"batch_{batch_num}.jsonl"
                if batch_file.exists():
                    with open(batch_file, 'r', encoding='utf-8') as infile:
                        for line in infile:
                            outfile.write(line)

        print(f"✅ Combined {len(self.batches)} batch files into trajectories.jsonl")

        # Save final statistics
        final_stats = {
            "run_name": self.run_name,
            "distribution": self.distribution,
            "total_prompts": len(self.dataset),
            "total_batches": len(self.batches),
            "batch_size": self.batch_size,
            "model": self.model,
            "completed_at": datetime.now().isoformat(),
            "duration_seconds": round(time.time() - start_time, 2),
            "tool_statistics": total_tool_stats
        }

        with open(self.stats_file, 'w', encoding='utf-8') as f:
            json.dump(final_stats, f, indent=2)

        # Print summary
        print("\n" + "=" * 70)
        print("📊 BATCH PROCESSING COMPLETE")
        print("=" * 70)
        print(f"✅ Total prompts processed: {len(self.dataset)}")
        print(f"✅ Total batches: {len(self.batches)}")
        print(f"⏱️ Total duration: {round(time.time() - start_time, 2)}s")
        print(f"\n📈 Tool Usage Statistics:")
        print("-" * 70)

        if total_tool_stats:
            # Sort by count descending
            sorted_tools = sorted(
                total_tool_stats.items(),
                key=lambda x: x[1]["count"],
                reverse=True
            )

            print(f"{'Tool Name':<25} {'Count':<10} {'Success':<10} {'Failure':<10} {'Success Rate':<12}")
            print("-" * 70)
            for tool_name, stats in sorted_tools:
                print(
                    f"{tool_name:<25} "
                    f"{stats['count']:<10} "
                    f"{stats['success']:<10} "
                    f"{stats['failure']:<10} "
                    f"{stats['success_rate']:.1f}%"
                )
        else:
            print("No tool calls were made during this run.")

        print(f"\n💾 Results saved to: {self.output_dir}")
        print(f"   - Trajectories: trajectories.jsonl (combined)")
        print(f"   - Individual batches: batch_*.jsonl (for debugging)")
        print(f"   - Statistics: {self.stats_file.name}")
        print(f"   - Checkpoint: {self.checkpoint_file.name}")


def main(
    dataset_file: str = None,
    batch_size: int = None,
    run_name: str = None,
    distribution: str = "default",
    model: str = "claude-opus-4-20250514",
    api_key: str = None,
    base_url: str = "https://api.anthropic.com/v1/",
    max_turns: int = 10,
    num_workers: int = 4,
    resume: bool = False,
    verbose: bool = False,
    list_distributions: bool = False,
    ephemeral_system_prompt: str = None
):
    """
    Run batch processing of agent prompts from a dataset.

    Args:
        dataset_file (str): Path to JSONL file with 'prompt' field in each entry
        batch_size (int): Number of prompts per batch
        run_name (str): Name for this run (used for output and checkpointing)
        distribution (str): Toolset distribution to use (default: "default")
        model (str): Model name to use (default: "claude-opus-4-20250514")
        api_key (str): API key for model authentication
        base_url (str): Base URL for model API
        max_turns (int): Maximum number of tool calling iterations per prompt (default: 10)
        num_workers (int): Number of parallel worker processes (default: 4)
        resume (bool): Resume from checkpoint if run was interrupted (default: False)
        verbose (bool): Enable verbose logging (default: False)
        list_distributions (bool): List available toolset distributions and exit
        ephemeral_system_prompt (str): System prompt used during agent execution but NOT saved to trajectories (optional)

    Examples:
        # Basic usage
        python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run

        # Resume interrupted run
        python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run --resume

        # Use specific distribution
        python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=image_test --distribution=image_gen

        # With ephemeral system prompt (not saved to dataset)
        python batch_runner.py --dataset_file=data.jsonl --batch_size=10 --run_name=my_run \\
            --ephemeral_system_prompt="You are a helpful assistant focused on image generation."

        # List available distributions
        python batch_runner.py --list_distributions
    """
    # Handle list distributions
    if list_distributions:
        from toolset_distributions import list_distributions as get_all_dists, print_distribution_info

        print("📊 Available Toolset Distributions")
        print("=" * 70)

        all_dists = get_all_dists()
        for dist_name in sorted(all_dists.keys()):
            print_distribution_info(dist_name)

        print("\n💡 Usage:")
        print("   python batch_runner.py --dataset_file=data.jsonl --batch_size=10 \\")
        print("       --run_name=my_run --distribution=<name>")
        return

    # Validate required arguments
    if not dataset_file:
        print("❌ Error: --dataset_file is required")
        return

    if not batch_size or batch_size < 1:
        print("❌ Error: --batch_size must be a positive integer")
        return

    if not run_name:
        print("❌ Error: --run_name is required")
        return

    # Initialize and run batch runner
    try:
        runner = BatchRunner(
            dataset_file=dataset_file,
            batch_size=batch_size,
            run_name=run_name,
            distribution=distribution,
            max_iterations=max_turns,
            base_url=base_url,
            api_key=api_key,
            model=model,
            num_workers=num_workers,
            verbose=verbose,
            ephemeral_system_prompt=ephemeral_system_prompt
        )

        runner.run(resume=resume)

    except Exception as e:
        print(f"\n❌ Fatal error: {e}")
        if verbose:
            traceback.print_exc()
        return 1


if __name__ == "__main__":
    fire.Fire(main)

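The checkpoint helpers above round-trip a small JSON document between runs. Below is a minimal standalone sketch of that format, using the field names taken from `_load_checkpoint` and `_save_checkpoint`; the `checkpoint.json` path is a hypothetical stand-in for `self.checkpoint_file`, and the prompt values are illustrative only:

```python
import json
from datetime import datetime
from pathlib import Path

# Hypothetical stand-in for self.checkpoint_file
checkpoint_file = Path("checkpoint.json")

# Same shape that _save_checkpoint writes and _load_checkpoint reads back
checkpoint = {
    "run_name": "my_run",
    "completed_prompts": ["prompt-0", "prompt-1"],
    "batch_stats": {},
    "last_updated": datetime.now().isoformat(),
}
checkpoint_file.write_text(json.dumps(checkpoint, indent=2), encoding="utf-8")

# On resume, completed prompts seed the skip set
loaded = json.loads(checkpoint_file.read_text(encoding="utf-8"))
completed = set(loaded["completed_prompts"])
print(sorted(completed))
```

On `--resume`, `run()` seeds `completed_prompts_set` from exactly this list, so prompts already recorded here are skipped by the workers.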
model_tools.py
@@ -8,27 +8,38 @@ for defining tools and executing function calls.

 Currently supports:
 - Web tools (search, extract, crawl) from web_tools.py
 - Terminal tools (command execution with interactive sessions) from terminal_tool.py
 - Vision tools (image analysis) from vision_tools.py
 - Mixture of Agents tools (collaborative multi-model reasoning) from mixture_of_agents_tool.py
 - Image generation tools (text-to-image with upscaling) from image_generation_tool.py

 Usage:
     from model_tools import get_tool_definitions, handle_function_call

-    # Get tool definitions for model API
+    # Get all available tool definitions for model API
     tools = get_tool_definitions()

+    # Get specific toolsets
+    web_tools = get_tool_definitions(enabled_toolsets=['web_tools'])
+
     # Handle function calls from model
-    result = handle_function_call("web_search_tool", {"query": "Python", "limit": 3})
+    result = handle_function_call("web_search", {"query": "Python"})
 """

 import json
 import asyncio
 from typing import Dict, Any, List

 # Import toolsets
-from web_tools import web_search_tool, web_extract_tool, web_crawl_tool, check_firecrawl_api_key
-from terminal_tool import terminal_tool, check_hecate_requirements, TERMINAL_TOOL_DESCRIPTION
-from vision_tools import vision_analyze_tool, check_vision_requirements
-from mixture_of_agents_tool import mixture_of_agents_tool, check_moa_requirements
-from image_generation_tool import image_generate_tool, check_image_generation_requirements
+from tools.web_tools import web_search_tool, web_extract_tool, web_crawl_tool, check_firecrawl_api_key
+from tools.terminal_tool import terminal_tool, check_hecate_requirements, TERMINAL_TOOL_DESCRIPTION
+from tools.vision_tools import vision_analyze_tool, check_vision_requirements
+from tools.mixture_of_agents_tool import mixture_of_agents_tool, check_moa_requirements
+from tools.image_generation_tool import image_generate_tool, check_image_generation_requirements
+from toolsets import (
+    get_toolset, resolve_toolset, resolve_multiple_toolsets,
+    get_all_toolsets, get_toolset_names, validate_toolset,
+    get_toolset_info, print_toolset_tree
+)

 def get_web_tool_definitions() -> List[Dict[str, Any]]:
     """
@@ -42,20 +53,13 @@ def get_web_tool_definitions() -> List[Dict[str, Any]]:
             "type": "function",
             "function": {
                 "name": "web_search",
-                "description": "Search the web for information on any topic. Returns relevant results with titles and URLs. Uses advanced search depth for comprehensive results.",
+                "description": "Search the web for information on any topic. Returns up to 5 relevant results with titles and URLs. Uses advanced search depth for comprehensive results.",
                 "parameters": {
                     "type": "object",
                     "properties": {
                         "query": {
                             "type": "string",
                             "description": "The search query to look up on the web"
-                        },
-                        "limit": {
-                            "type": "integer",
-                            "description": "Maximum number of results to return (default: 5, max: 10)",
-                            "default": 5,
-                            "minimum": 1,
-                            "maximum": 10
                         }
                     },
                     "required": ["query"]
@@ -75,11 +79,6 @@ def get_web_tool_definitions() -> List[Dict[str, Any]]:
                             "items": {"type": "string"},
                             "description": "List of URLs to extract content from (max 5 URLs per call)",
                             "maxItems": 5
-                        },
-                        "format": {
-                            "type": "string",
-                            "enum": ["markdown", "html"],
-                            "description": "Desired output format for extracted content (optional)"
                         }
                     },
                     "required": ["urls"]
@@ -101,12 +100,6 @@ def get_web_tool_definitions() -> List[Dict[str, Any]]:
                         "instructions": {
                             "type": "string",
                             "description": "Specific instructions for what to crawl/extract using AI intelligence (e.g., 'Find pricing information', 'Get documentation pages', 'Extract contact details')"
-                        },
-                        "depth": {
-                            "type": "string",
-                            "enum": ["basic", "advanced"],
-                            "description": "Depth of extraction - 'basic' for surface content, 'advanced' for deeper analysis (default: basic)",
-                            "default": "basic"
                         }
                     },
                     "required": ["url"]
@@ -185,12 +178,7 @@ def get_vision_tool_definitions() -> List[Dict[str, Any]]:
                         },
                         "question": {
                             "type": "string",
-                            "description": "Your specific question or request about the image to resolve. The AI will automatically provide a complete image description AND answer your specific question. Examples: 'What text can you read?', 'What architectural style is this?', 'Describe the mood and emotions', 'What safety hazards do you see?'"
-                        },
-                        "model": {
-                            "type": "string",
-                            "description": "The vision model to use for analysis (optional, default: gemini-2.5-flash)",
-                            "default": "gemini-2.5-flash"
+                            "description": "Your specific question or request about the image to resolve. The AI will automatically provide a complete image description AND answer your specific question."
                         }
                     },
                     "required": ["image_url", "question"]
@@ -212,7 +200,7 @@ def get_moa_tool_definitions() -> List[Dict[str, Any]]:
             "type": "function",
             "function": {
                 "name": "mixture_of_agents",
-                "description": "Process extremely difficult problems requiring intense reasoning using the Mixture-of-Agents methodology. This tool leverages multiple frontier language models to collaboratively solve complex tasks that single models struggle with. Uses a fixed 2-layer architecture: reference models (claude-opus-4, gemini-2.5-pro, o4-mini, deepseek-r1) generate diverse responses, then an aggregator synthesizes the best solution. Best for: complex mathematical proofs, advanced coding problems, multi-step analytical reasoning, precise and complex STEM problems, algorithm design, and problems requiring diverse domain expertise.",
+                "description": "Process extremely difficult problems requiring intense reasoning using a Mixture-of-Agents. This tool leverages multiple frontier language models to collaboratively solve complex tasks that single models struggle with. Uses a fixed 2-layer architecture: reference models generate diverse responses, then an aggregator synthesizes the best solution. Best for: complex mathematical proofs, advanced coding problems, multi-step analytical reasoning, precise and complex STEM problems, algorithm design, and problems requiring diverse domain expertise.",
                 "parameters": {
                     "type": "object",
                     "properties": {
@@ -240,13 +228,13 @@ def get_image_tool_definitions() -> List[Dict[str, Any]]:
             "type": "function",
             "function": {
                 "name": "image_generate",
-                "description": "Generate high-quality images from text prompts using FAL.ai's FLUX.1 Krea model with automatic 2x upscaling. Creates detailed, artistic images that are automatically enhanced for superior quality. Returns a single upscaled image URL that can be displayed using <img src=\"{URL}\"></img> tags.",
+                "description": "Generate high-quality images from text prompts using FLUX Krea model with automatic 2x upscaling. Creates detailed, artistic images that are automatically enhanced for superior quality. Returns a single upscaled image URL that can be displayed using <img src=\"{URL}\"></img> tags.",
                 "parameters": {
                     "type": "object",
                     "properties": {
                         "prompt": {
                             "type": "string",
-                            "description": "The text prompt describing the desired image. Be detailed and descriptive for best results."
+                            "description": "The text prompt describing the desired image. Be detailed and descriptive."
                         },
                         "image_size": {
                             "type": "string",
@@ -291,10 +279,6 @@ def get_all_tool_names() -> List[str]:
     if check_image_generation_requirements():
         tool_names.extend(["image_generate"])

-    # Future toolsets can be added here:
-    # if check_file_tools():
-    #     tool_names.extend(["file_read", "file_write"])
-
     return tool_names


@@ -316,154 +300,152 @@ def get_toolset_for_tool(tool_name: str) -> str:
         "vision_analyze": "vision_tools",
         "mixture_of_agents": "moa_tools",
         "image_generate": "image_tools"
         # Future tools can be added here
     }

     return toolset_mapping.get(tool_name, "unknown")


 def get_tool_definitions(
-    enabled_tools: List[str] = None,
-    disabled_tools: List[str] = None,
     enabled_toolsets: List[str] = None,
     disabled_toolsets: List[str] = None
 ) -> List[Dict[str, Any]]:
     """
-    Get tool definitions for model API calls with optional filtering.
+    Get tool definitions for model API calls with toolset-based filtering.

-    This function aggregates tool definitions from all available toolsets
-    and applies filtering based on the provided parameters.
-
-    Filter Priority (higher priority overrides lower):
-    1. enabled_tools (highest priority - only these tools, overrides everything)
-    2. disabled_tools (applied after toolset filtering)
-    3. enabled_toolsets (only tools from these toolsets)
-    4. disabled_toolsets (exclude tools from these toolsets)
+    This function aggregates tool definitions from available toolsets.
+    All tools must be part of a toolset to be accessible. Individual tool
+    selection is not supported - use toolsets to organize and select tools.

     Args:
-        enabled_tools (List[str]): Only include these specific tools. If provided,
-            ONLY these tools will be included (overrides all other filters)
-        disabled_tools (List[str]): Exclude these specific tools (applied after toolset filtering)
-        enabled_toolsets (List[str]): Only include tools from these toolsets
-        disabled_toolsets (List[str]): Exclude tools from these toolsets
+        enabled_toolsets (List[str]): Only include tools from these toolsets.
+            If None, all available tools are included.
+        disabled_toolsets (List[str]): Exclude tools from these toolsets.
+            Applied only if enabled_toolsets is None.

     Returns:
         List[Dict]: Filtered list of tool definitions

     Examples:
-        # Only web tools
-        tools = get_tool_definitions(enabled_toolsets=["web_tools"])
+        # Use predefined toolsets
+        tools = get_tool_definitions(enabled_toolsets=["research"])
+        tools = get_tool_definitions(enabled_toolsets=["development"])

-        # All tools except terminal
-        tools = get_tool_definitions(disabled_tools=["terminal"])
+        # Combine multiple toolsets
+        tools = get_tool_definitions(enabled_toolsets=["web", "vision"])

-        # Only specific tools (overrides toolset filters)
-        tools = get_tool_definitions(enabled_tools=["web_search", "web_extract"])
+        # All tools except those in terminal toolset
+        tools = get_tool_definitions(disabled_toolsets=["terminal"])

-        # Conflicting filters (enabled_tools wins)
-        tools = get_tool_definitions(enabled_toolsets=["web_tools"], enabled_tools=["terminal"])
-        # Result: Only terminal tool (enabled_tools overrides enabled_toolsets)
+        # Default - all available tools
+        tools = get_tool_definitions()
     """
-    # Detect and warn about potential conflicts
-    conflicts_detected = False
+    # Collect all available tool definitions
+    all_available_tools_map = {}

-    if enabled_tools and (enabled_toolsets or disabled_toolsets or disabled_tools):
-        print("⚠️ enabled_tools overrides all other filters")
-        conflicts_detected = True
+    # Map tool names to their definitions
+    if check_firecrawl_api_key():
+        for tool in get_web_tool_definitions():
+            all_available_tools_map[tool["function"]["name"]] = tool

-    if enabled_toolsets and disabled_toolsets:
-        # Check for overlap
-        enabled_set = set(enabled_toolsets)
-        disabled_set = set(disabled_toolsets)
-        overlap = enabled_set & disabled_set
-        if overlap:
-            print(f"⚠️ Conflicting toolsets: {overlap} in both enabled and disabled")
-            print(f"   → enabled_toolsets takes priority")
-            conflicts_detected = True
+    if check_hecate_requirements():
+        for tool in get_terminal_tool_definitions():
+            all_available_tools_map[tool["function"]["name"]] = tool

-    if enabled_tools and disabled_tools:
-        # Check for overlap
-        enabled_set = set(enabled_tools)
-        disabled_set = set(disabled_tools)
-        overlap = enabled_set & disabled_set
-        if overlap:
-            print(f"⚠️ Conflicting tools: {overlap} in both enabled and disabled")
-            print(f"   → enabled_tools takes priority")
-            conflicts_detected = True
+    if check_vision_requirements():
+        for tool in get_vision_tool_definitions():
+            all_available_tools_map[tool["function"]["name"]] = tool

-    all_tools = []
+    if check_moa_requirements():
+        for tool in get_moa_tool_definitions():
+            all_available_tools_map[tool["function"]["name"]] = tool

-    # Collect all available tools from each toolset
-    toolset_tools = {
-        "web_tools": get_web_tool_definitions() if check_firecrawl_api_key() else [],
-        "terminal_tools": get_terminal_tool_definitions() if check_hecate_requirements() else [],
-        "vision_tools": get_vision_tool_definitions() if check_vision_requirements() else [],
-        "moa_tools": get_moa_tool_definitions() if check_moa_requirements() else [],
-        "image_tools": get_image_tool_definitions() if check_image_generation_requirements() else []
-        # Future toolsets can be added here:
-        # "file_tools": get_file_tool_definitions() if check_file_tools() else [],
-    }
+    if check_image_generation_requirements():
+        for tool in get_image_tool_definitions():
+            all_available_tools_map[tool["function"]["name"]] = tool

-    # HIGHEST PRIORITY: enabled_tools (overrides everything)
-    if enabled_tools:
-        if conflicts_detected:
-            print(f"🎯 Using only enabled_tools: {enabled_tools}")
-
-        # Collect all available tools first
-        all_available_tools = []
-        for tools in toolset_tools.values():
-            all_available_tools.extend(tools)
-
-        # Only include specifically enabled tools
-        tool_names_to_include = set(enabled_tools)
-        filtered_tools = [
-            tool for tool in all_available_tools
-            if tool["function"]["name"] in tool_names_to_include
-        ]
-
-        # Warn about requested tools that aren't available
-        found_tools = {tool["function"]["name"] for tool in filtered_tools}
-        missing_tools = tool_names_to_include - found_tools
-        if missing_tools:
-            print(f"⚠️ Requested tools not available: {missing_tools}")
-
-        return filtered_tools
+    # Determine which tools to include based on toolsets
+    tools_to_include = set()

-    # Apply toolset-level filtering first
     if enabled_toolsets:
         # Only include tools from enabled toolsets
         for toolset_name in enabled_toolsets:
-            if toolset_name in toolset_tools:
-                all_tools.extend(toolset_tools[toolset_name])
+            if validate_toolset(toolset_name):
+                resolved_tools = resolve_toolset(toolset_name)
+                tools_to_include.update(resolved_tools)
+                print(f"✅ Enabled toolset '{toolset_name}': {', '.join(resolved_tools) if resolved_tools else 'no tools'}")
             else:
-                print(f"⚠️ Unknown toolset: {toolset_name}")
+                # Try legacy compatibility
+                if toolset_name in ["web_tools", "terminal_tools", "vision_tools", "moa_tools", "image_tools"]:
+                    # Map legacy names to new system
+                    legacy_map = {
+                        "web_tools": ["web_search", "web_extract", "web_crawl"],
+                        "terminal_tools": ["terminal"],
+                        "vision_tools": ["vision_analyze"],
+                        "moa_tools": ["mixture_of_agents"],
+                        "image_tools": ["image_generate"]
+                    }
+                    legacy_tools = legacy_map.get(toolset_name, [])
+                    tools_to_include.update(legacy_tools)
+                    print(f"✅ Enabled legacy toolset '{toolset_name}': {', '.join(legacy_tools)}")
+                else:
+                    print(f"⚠️ Unknown toolset: {toolset_name}")
     elif disabled_toolsets:
-        # Include all tools except from disabled toolsets
-        for toolset_name, tools in toolset_tools.items():
-            if toolset_name not in disabled_toolsets:
-                all_tools.extend(tools)
+        # Start with all tools from all toolsets, then remove disabled ones
+        # Note: Only tools that are part of toolsets are accessible
+        # We need to get all tools from all defined toolsets
+        from toolsets import get_all_toolsets
+        all_toolset_tools = set()
+        for toolset_name in get_all_toolsets():
+            resolved_tools = resolve_toolset(toolset_name)
+            all_toolset_tools.update(resolved_tools)
+
+        # Start with all tools from toolsets
+        tools_to_include = all_toolset_tools
+
+        # Remove tools from disabled toolsets
+        for toolset_name in disabled_toolsets:
+            if validate_toolset(toolset_name):
+                resolved_tools = resolve_toolset(toolset_name)
+                tools_to_include.difference_update(resolved_tools)
+                print(f"🚫 Disabled toolset '{toolset_name}': {', '.join(resolved_tools) if resolved_tools else 'no tools'}")
+            else:
+                # Try legacy compatibility
+                if toolset_name in ["web_tools", "terminal_tools", "vision_tools", "moa_tools", "image_tools"]:
+                    legacy_map = {
+                        "web_tools": ["web_search", "web_extract", "web_crawl"],
+                        "terminal_tools": ["terminal"],
+                        "vision_tools": ["vision_analyze"],
+                        "moa_tools": ["mixture_of_agents"],
+                        "image_tools": ["image_generate"]
+                    }
+                    legacy_tools = legacy_map.get(toolset_name, [])
+                    tools_to_include.difference_update(legacy_tools)
+                    print(f"🚫 Disabled legacy toolset '{toolset_name}': {', '.join(legacy_tools)}")
+                else:
+                    print(f"⚠️ Unknown toolset: {toolset_name}")
     else:
-        # Include all available tools
-        for tools in toolset_tools.values():
-            all_tools.extend(tools)
+        # No filtering - include all tools from all defined toolsets
+        from toolsets import get_all_toolsets
+        for toolset_name in get_all_toolsets():
+            resolved_tools = resolve_toolset(toolset_name)
+            tools_to_include.update(resolved_tools)

-    # Apply tool-level filtering (disabled_tools)
-    if disabled_tools:
-        tool_names_to_exclude = set(disabled_tools)
-        original_tools = [tool["function"]["name"] for tool in all_tools]
-
-        all_tools = [
-            tool for tool in all_tools
-            if tool["function"]["name"] not in tool_names_to_exclude
-        ]
-
-        # Show what was actually filtered out
-        remaining_tools = {tool["function"]["name"] for tool in all_tools}
-        actually_excluded = set(original_tools) & tool_names_to_exclude
-        if actually_excluded:
-            print(f"🚫 Excluded tools: {actually_excluded}")
+    # Build final tool list (only include tools that are available)
+    filtered_tools = []
+    for tool_name in tools_to_include:
+        if tool_name in all_available_tools_map:
+            filtered_tools.append(all_available_tools_map[tool_name])

-    return all_tools
+    # Sort tools for consistent ordering
+    filtered_tools.sort(key=lambda t: t["function"]["name"])
+
+    if filtered_tools:
+        tool_names = [t["function"]["name"] for t in filtered_tools]
+        print(f"🛠️ Final tool selection ({len(filtered_tools)} tools): {', '.join(tool_names)}")
+    else:
+        print("🛠️ No tools selected (all filtered out or unavailable)")
+
+    return filtered_tools

 def handle_web_function_call(function_name: str, function_args: Dict[str, Any]) -> str:
     """
@@ -478,25 +460,22 @@ def handle_web_function_call(function_name: str, function_args: Dict[str, Any])
     """
     if function_name == "web_search":
         query = function_args.get("query", "")
-        limit = function_args.get("limit", 5)
-        # Ensure limit is within bounds
-        limit = max(1, min(10, limit))
+        # Always use fixed limit of 5
+        limit = 5
        return web_search_tool(query, limit)

     elif function_name == "web_extract":
         urls = function_args.get("urls", [])
         # Limit URLs to prevent abuse
         urls = urls[:5] if isinstance(urls, list) else []
-        format = function_args.get("format")
         # Run async function in event loop
-        return asyncio.run(web_extract_tool(urls, format))
+        return asyncio.run(web_extract_tool(urls, "markdown"))

     elif function_name == "web_crawl":
         url = function_args.get("url", "")
         instructions = function_args.get("instructions")
-        depth = function_args.get("depth", "basic")
         # Run async function in event loop
-        return asyncio.run(web_crawl_tool(url, instructions, depth))
+        return asyncio.run(web_crawl_tool(url, instructions, "basic"))

     else:
         return json.dumps({"error": f"Unknown web function: {function_name}"})
@@ -518,9 +497,8 @@ def handle_terminal_function_call(function_name: str, function_args: Dict[str, A
         background = function_args.get("background", False)
         idle_threshold = function_args.get("idle_threshold", 5.0)
         timeout = function_args.get("timeout")
-        snapshot_id = function_args.get("snapshot_id")
         # Session management is handled internally - don't pass session_id from model
-        return terminal_tool(command, input_keys, None, background, idle_threshold, timeout, snapshot_id=snapshot_id)
+        return terminal_tool(command, input_keys, None, background, idle_threshold, timeout)

     else:
         return json.dumps({"error": f"Unknown terminal function: {function_name}"})
@@ -540,13 +518,11 @@ def handle_vision_function_call(function_name: str, function_args: Dict[str, Any
     if function_name == "vision_analyze":
         image_url = function_args.get("image_url", "")
         question = function_args.get("question", "")
-        model = function_args.get("model", "gemini-2.5-flash")

         # Automatically prepend full description request to user's question
-        full_prompt = f"Fully describe and explain everything about this image\n\n{question}"
-
+        full_prompt = f"Fully describe and explain everything about this image, then answer the following question:\n\n{question}"
         # Run async function in event loop
-        return asyncio.run(vision_analyze_tool(image_url, full_prompt, model))
+        return asyncio.run(vision_analyze_tool(image_url, full_prompt, "gemini-2.5-flash"))

     else:
         return json.dumps({"error": f"Unknown vision function: {function_name}"})
@@ -593,7 +569,6 @@ def handle_image_function_call(function_name: str, function_args: Dict[str, Any]
|
||||
if not prompt:
|
||||
return json.dumps({"success": False, "image": None})
|
||||
|
||||
# Extract only the exposed parameters
|
||||
image_size = function_args.get("image_size", "landscape_16_9")
|
||||
|
||||
# Use fixed internal defaults for all other parameters (not exposed to model)
|
||||
@@ -606,8 +581,21 @@ def handle_image_function_call(function_name: str, function_args: Dict[str, Any]
|
||||
allow_nsfw_images = True
|
||||
seed = None
|
||||
|
||||
# Run async function in event loop
|
||||
return asyncio.run(image_generate_tool(
|
||||
# Run async function in event loop with proper handling for multiprocessing
|
||||
try:
|
||||
# Try to get existing event loop
|
||||
loop = asyncio.get_event_loop()
|
||||
if loop.is_closed():
|
||||
# If closed, create a new one
|
||||
loop = asyncio.new_event_loop()
|
||||
asyncio.set_event_loop(loop)
|
||||
except RuntimeError:
|
||||
# No event loop in current thread, create one
|
||||
loop = asyncio.new_event_loop()
|
||||
asyncio.set_event_loop(loop)
|
||||
|
||||
# Run the coroutine in the event loop
|
||||
result = loop.run_until_complete(image_generate_tool(
|
||||
prompt=prompt,
|
||||
image_size=image_size,
|
||||
num_inference_steps=num_inference_steps,
|
||||
@@ -619,6 +607,8 @@ def handle_image_function_call(function_name: str, function_args: Dict[str, Any]
|
||||
allow_nsfw_images=allow_nsfw_images,
|
||||
seed=seed
|
||||
))
|
||||
|
||||
return result
|
||||
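The get-or-create event-loop logic above can be factored into a reusable helper. A minimal sketch (the helper name `run_coro_in_loop` is illustrative, not part of the codebase; note that `asyncio.get_event_loop()` is deprecated outside a running loop in newer Python versions, which is why the `RuntimeError` fallback matters):

```python
import asyncio

def run_coro_in_loop(coro):
    # Reuse the thread's event loop when one exists and is open;
    # otherwise create a fresh loop, as the handler above does.
    try:
        loop = asyncio.get_event_loop()
        if loop.is_closed():
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
    except RuntimeError:
        # No event loop in the current thread
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
    return loop.run_until_complete(coro)

async def _demo():
    return "done"

print(run_coro_in_loop(_demo()))
```

Unlike `asyncio.run`, this pattern does not close the loop afterward, so repeated tool calls in the same worker process keep working.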

    else:
        return json.dumps({"error": f"Unknown image generation function: {function_name}"})

@@ -663,12 +653,6 @@ def handle_function_call(function_name: str, function_args: Dict[str, Any]) -> s
    elif function_name in ["image_generate"]:
        return handle_image_function_call(function_name, function_args)

    # Future toolsets can be routed here:
    # elif function_name in ["file_read_tool", "file_write_tool"]:
    #     return handle_file_function_call(function_name, function_args)
    # elif function_name in ["code_execute_tool", "code_analyze_tool"]:
    #     return handle_code_function_call(function_name, function_args)

    else:
        error_msg = f"Unknown function: {function_name}"
        print(f"❌ {error_msg}")
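The routing in `handle_function_call` is name-based dispatch with a JSON error envelope as the fallback. A minimal sketch of the pattern (the handler bodies here are placeholders):

```python
import json

def dispatch(function_name: str, function_args: dict) -> str:
    # Route by tool name, falling back to a JSON error envelope,
    # mirroring handle_function_call's structure.
    handlers = {
        "web_search": lambda args: json.dumps({"results": []}),
        "vision_analyze": lambda args: json.dumps({"success": True}),
    }
    handler = handlers.get(function_name)
    if handler is None:
        return json.dumps({"error": f"Unknown function: {function_name}"})
    return handler(function_args)

print(dispatch("nope", {}))  # {"error": "Unknown function: nope"}
```

Because every branch returns a JSON string, the agent loop can always feed the result straight back to the model as a tool message.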
@@ -717,7 +701,6 @@ def get_available_toolsets() -> Dict[str, Dict[str, Any]]:
            "description": "Generate high-quality images from text prompts using FAL.ai's FLUX.1 Krea model with automatic 2x upscaling for enhanced quality",
            "requirements": ["FAL_KEY environment variable", "fal-client package"]
        }
        # Future toolsets can be added here
    }

    return toolsets

pyproject.toml (new file, 28 lines)
@@ -0,0 +1,28 @@
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "hermes-agent"
version = "0.1.0"
description = "AI agent with advanced tool-calling and toolsets"
readme = "README.md"
requires-python = ">=3.10"
authors = [{ name = "Hermes Agent" }]
license = { text = "MIT" }
dependencies = [
    "firecrawl-py",
    "openai",
    "fal-client",
    "python-dotenv",
    "fire"
]

[project.scripts]
hermes-agent = "run_agent:main"

[tool.setuptools]
py-modules = ["run_agent", "model_tools", "toolsets"]

[tool.setuptools.packages.find]
include = ["tools"]

@@ -1,3 +1,6 @@
firecrawl-py
openai
fal-client
fal-client
python-dotenv
fire
requests

run_agent.py (1369 lines changed; diff suppressed because it is too large)

run_datagen_images.sh (new file, 12 lines)
@@ -0,0 +1,12 @@
python batch_runner.py \
    --dataset_file="hermes-agent-imagen-data/hermes_agent_imagen_eval.jsonl" \
    --batch_size=10 \
    --run_name="imagen_eval_gpt5" \
    --distribution="image_gen" \
    --model="gpt-5" \
    --base_url="https://api.openai.com/v1" \
    --api_key="${OPENAI_API_KEY}" \
    --num_workers=4 \
    --max_turns=5 \
    --verbose \
    --ephemeral_system_prompt="When generating an image for the user view the image by using the vision_analyze tool to ensure it is what the user wanted. If it isn't feel free to retry a few times. If none are perfect, choose the best option that is the closest match, and explain its imperfections. If the image generation tool fails, try again a few times. If the vision analyze tool fails, provide the image to the user and explain it is your best effort attempt."

test_run.sh (25 lines changed, Normal file → Executable file)
@@ -1,14 +1,23 @@
#!/bin/bash

# Check if a prompt argument was provided
if [ $# -eq 0 ]; then
    echo "Error: Please provide a prompt as an argument"
    echo "Usage: $0 \"your prompt here\""
    exit 1
fi

# Get the prompt from the first argument
PROMPT="$1"

# Set debug mode for web tools
export WEB_TOOLS_DEBUG=true

# Run the agent with the provided prompt
python run_agent.py \
    --query "Tell me about this animal pictured: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcQi1nkrYXY-ijQv5aCxkwooyg2roNFxj0ewJA&s" \
    --query "$PROMPT" \
    --max_turns 30 \
    --model claude-sonnet-4-20250514 \
    --model claude-sonnet-4-5-20250929 \
    --base_url https://api.anthropic.com/v1/ \
    --api_key $ANTHROPIC_API_KEY \
    --enabled_toolsets=vision_tools

#Possible Toolsets:
#web_tools
#vision_tools
#terminal_tools
    --save_trajectories

tests/__init__.py (new file, empty)

tests/test_batch_runner.py (new file, 129 lines)
@@ -0,0 +1,129 @@
#!/usr/bin/env python3
"""
Test script for batch runner

This script tests the batch runner with a small sample dataset
to verify functionality before running large batches.
"""

import json
import shutil
from pathlib import Path


def create_test_dataset():
    """Create a small test dataset."""
    test_file = Path("tests/test_dataset.jsonl")
    test_file.parent.mkdir(exist_ok=True)

    prompts = [
        {"prompt": "What is 2 + 2?"},
        {"prompt": "What is the capital of France?"},
        {"prompt": "Explain what Python is in one sentence."},
    ]

    with open(test_file, 'w') as f:
        for prompt in prompts:
            f.write(json.dumps(prompt) + "\n")

    print(f"✅ Created test dataset: {test_file}")
    return test_file
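The dataset creation above is a standard JSONL write: one JSON object per line. A self-contained sketch of the same pattern using a temporary directory (the helper name `write_jsonl` is illustrative):

```python
import json
import tempfile
from pathlib import Path

def write_jsonl(path: Path, rows: list) -> Path:
    # One JSON object per line, as create_test_dataset() does.
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    return path

with tempfile.TemporaryDirectory() as d:
    p = write_jsonl(Path(d) / "dataset.jsonl", [{"prompt": "What is 2 + 2?"}])
    lines = p.read_text().splitlines()
    print(len(lines))  # 1
```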

def cleanup_test_run(run_name):
    """Clean up test run output."""
    output_dir = Path("data") / run_name
    if output_dir.exists():
        shutil.rmtree(output_dir)
        print(f"🗑️ Cleaned up test output: {output_dir}")


def verify_output(run_name):
    """Verify that output files were created correctly."""
    output_dir = Path("data") / run_name

    # Check directory exists
    if not output_dir.exists():
        print(f"❌ Output directory not found: {output_dir}")
        return False

    # Check for checkpoint
    checkpoint_file = output_dir / "checkpoint.json"
    if not checkpoint_file.exists():
        print(f"❌ Checkpoint file not found: {checkpoint_file}")
        return False

    # Check for statistics
    stats_file = output_dir / "statistics.json"
    if not stats_file.exists():
        print(f"❌ Statistics file not found: {stats_file}")
        return False

    # Check for batch files
    batch_files = list(output_dir.glob("batch_*.jsonl"))
    if not batch_files:
        print(f"❌ No batch files found in: {output_dir}")
        return False

    print(f"✅ Output verification passed:")
    print(f"  - Checkpoint: {checkpoint_file}")
    print(f"  - Statistics: {stats_file}")
    print(f"  - Batch files: {len(batch_files)}")

    # Load and display statistics
    with open(stats_file) as f:
        stats = json.load(f)

    print(f"\n📊 Statistics Summary:")
    print(f"  - Total prompts: {stats['total_prompts']}")
    print(f"  - Total batches: {stats['total_batches']}")
    print(f"  - Duration: {stats['duration_seconds']}s")

    if stats.get('tool_statistics'):
        print(f"  - Tool calls:")
        for tool, tool_stats in stats['tool_statistics'].items():
            print(f"    • {tool}: {tool_stats['count']} calls, {tool_stats['success_rate']:.1f}% success")

    return True


def main():
    """Run the test."""
    print("🧪 Batch Runner Test")
    print("=" * 60)

    run_name = "test_run"

    # Clean up any previous test run
    cleanup_test_run(run_name)

    # Create test dataset
    test_file = create_test_dataset()

    print(f"\n📝 To run the test manually:")
    print(f"  python batch_runner.py \\")
    print(f"    --dataset_file={test_file} \\")
    print(f"    --batch_size=2 \\")
    print(f"    --run_name={run_name} \\")
    print(f"    --distribution=minimal \\")
    print(f"    --num_workers=2")

    print(f"\n💡 Or test with different distributions:")
    print(f"  python batch_runner.py --list_distributions")

    print(f"\n🔍 After running, you can verify output with:")
    print(f"  python tests/test_batch_runner.py --verify")

    # Note: We don't actually run the batch runner here to avoid API calls during testing
    # Users should run it manually with their API keys configured


if __name__ == "__main__":
    import sys

    if "--verify" in sys.argv:
        run_name = "test_run"
        verify_output(run_name)
    else:
        main()

@@ -23,8 +23,8 @@ import argparse
from datetime import datetime
from typing import List, Dict, Any

# Import the web tools to test
from web_tools import (
# Import the web tools to test (updated path after moving tools/)
from tools.web_tools import (
    web_search_tool,
    web_extract_tool,
    web_crawl_tool,

tools/__init__.py (new file, 67 lines)
@@ -0,0 +1,67 @@
#!/usr/bin/env python3
"""
Tools Package

This package contains all the specific tool implementations for the Hermes Agent.
Each module provides specialized functionality for different capabilities:

- web_tools: Web search, content extraction, and crawling
- terminal_tool: Command execution on virtual machines
- vision_tools: Image analysis and understanding
- mixture_of_agents_tool: Multi-model collaborative reasoning
- image_generation_tool: Text-to-image generation with upscaling

The tools are imported into model_tools.py which provides a unified interface
for the AI agent to access all capabilities.
"""

# Export all tools for easy importing
from .web_tools import (
    web_search_tool,
    web_extract_tool,
    web_crawl_tool,
    check_firecrawl_api_key
)

from .terminal_tool import (
    terminal_tool,
    check_hecate_requirements,
    TERMINAL_TOOL_DESCRIPTION
)

from .vision_tools import (
    vision_analyze_tool,
    check_vision_requirements
)

from .mixture_of_agents_tool import (
    mixture_of_agents_tool,
    check_moa_requirements
)

from .image_generation_tool import (
    image_generate_tool,
    check_image_generation_requirements
)

__all__ = [
    # Web tools
    'web_search_tool',
    'web_extract_tool',
    'web_crawl_tool',
    'check_firecrawl_api_key',
    # Terminal tools
    'terminal_tool',
    'check_hecate_requirements',
    'TERMINAL_TOOL_DESCRIPTION',
    # Vision tools
    'vision_analyze_tool',
    'check_vision_requirements',
    # MoA tools
    'mixture_of_agents_tool',
    'check_moa_requirements',
    # Image generation tools
    'image_generate_tool',
    'check_image_generation_requirements',
]

@@ -319,9 +319,6 @@ async def image_generate_tool(
    if not prompt or not isinstance(prompt, str) or len(prompt.strip()) == 0:
        raise ValueError("Prompt is required and must be a non-empty string")

    if len(prompt) > 1000:
        raise ValueError("Prompt must be 1000 characters or less")

    # Check API key availability
    if not os.getenv("FAL_KEY"):
        raise ValueError("FAL_KEY environment variable not set")

(File diff suppressed because it is too large)

@@ -4,26 +4,27 @@ Terminal Tool Module

This module provides a single terminal tool using Hecate's VM infrastructure.
It wraps Hecate's functionality to provide a simple interface for executing commands
on Morph VMs with automatic lifecycle management.
on Morph VMs with automatic lifecycle management. VMs live for 5 minutes after last use.
Timer resets with each use.

Available tool:
- terminal_tool: Execute commands with optional interactive session support

Usage:
    from terminal_tool import terminal_tool

    # Execute a single command
    result = terminal_tool("ls -la")

    # Execute in an interactive session
    result = terminal_tool("python", input_keys="print('hello')\\nexit()\\n")
"""
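Every call to `terminal_tool` returns a JSON string, so callers should parse it and inspect the envelope fields before using the output. A sketch of consuming the documented shape (the payload below is fabricated for illustration; it only mirrors the fields the module returns):

```python
import json

# Fields match the formatted_result built by terminal_tool (sketch data).
raw = json.dumps({
    "output": "total 0\n",
    "screen": "",
    "session_id": None,
    "exit_code": 0,
    "error": None,
    "status": "ended",
})

result = json.loads(raw)
if result["error"]:
    print("tool failed:", result["error"])
else:
    print(result["status"], result["exit_code"])
```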

import json
import os
import uuid
import threading
from typing import Optional, Dict, Any
from hecate import run_tool_with_lifecycle_management
from morphcloud._llm import ToolCall

# Detailed description for the terminal tool based on Hermes Terminal system prompt
TERMINAL_TOOL_DESCRIPTION = """Execute commands on a secure, persistent Linux VM environment with full interactive application support.
@@ -72,14 +73,19 @@ When commands enter interactive mode (vim, nano, less, git prompts, package mana
- Test components incrementally with mock inputs
- Install whatever tools needed - full system access provided"""

# Global state for VM lifecycle management
# These persist across tool calls to enable session continuity
_active_instance = None
_active_context = None
_instance_lock = threading.Lock()

def terminal_tool(
    command: Optional[str] = None,
    input_keys: Optional[str] = None,
    session_id: Optional[str] = None,
    background: bool = False,
    idle_threshold: float = 5.0,
    timeout: Optional[int] = None,
    snapshot_id: str | None = None,
    timeout: Optional[int] = None
) -> str:
    """
    Execute a command on a Morph VM with optional interactive session support.
@@ -114,10 +120,60 @@ def terminal_tool(
    # Run a background task
    >>> result = terminal_tool(command="sleep 60", background=True)
    """
    global _active_instance, _active_context

    try:
        # Import required modules lazily so this module can be imported
        # even when hecate is not installed
        try:
            from morphcloud._llm import ToolCall
            from morphcloud.api import MorphCloudClient
            from hecate.cli import run_tool, ExecutionContext
            from rich.console import Console
            import io
        except ImportError as import_error:
            return json.dumps({
                "output": "",
                "screen": "",
                "session_id": None,
                "exit_code": -1,
                "error": f"Terminal tool is disabled due to import error: {import_error}",
                "status": "disabled"
            })

        # Get configuration from environment
        vm_lifetime_seconds = int(os.getenv("HECATE_VM_LIFETIME_SECONDS", "300"))
        snapshot_id = os.getenv("HECATE_DEFAULT_SNAPSHOT_ID", "python-2025-10-31")

        # Check API key
        morph_api_key = os.getenv("MORPH_API_KEY")
        if not morph_api_key:
            return json.dumps({
                "output": "",
                "screen": "",
                "session_id": None,
                "exit_code": -1,
                "error": "MORPH_API_KEY environment variable not set",
                "status": "disabled"
            })

        # Get or create VM instance and execution context
        # This is critical for interactive session support - the context must persist!
        with _instance_lock:
            if _active_instance is None:
                morph_client = MorphCloudClient(api_key=morph_api_key)
                _active_instance = morph_client.instances.start(snapshot_id=snapshot_id)

            # Get or create persistent execution context
            if _active_context is None:
                _active_context = ExecutionContext()

            instance = _active_instance
            ctx = _active_context

        # Build tool input based on provided parameters
        tool_input = {}

        if command:
            tool_input["command"] = command
        if input_keys:
@@ -130,15 +186,28 @@ def terminal_tool(
            tool_input["idle_threshold"] = idle_threshold
        if timeout is not None:
            tool_input["timeout"] = timeout

        tool_call = ToolCall(
            name="run_command",
            input=tool_input
        )

        # Execute with lifecycle management
        result = run_tool_with_lifecycle_management(tool_call, snapshot_id=snapshot_id)

        # Create a console for output (redirect to string buffer to avoid printing)
        console_output = io.StringIO()
        console = Console(file=console_output, force_terminal=False, legacy_windows=False)

        # Generate unique tool block ID
        tool_block_id = f"tool_{uuid.uuid4().hex[:8]}"

        # Execute the tool with hecate
        result = run_tool(
            tool_call=tool_call,
            instance=instance,
            console=console,
            tool_block_id=tool_block_id,
            ctx=ctx
        )

        # Format the result with all possible fields
        # Map hecate's "stdout" to "output" for compatibility
        formatted_result = {
@@ -149,9 +218,9 @@ def terminal_tool(
            "error": result.get("error"),
            "status": "active" if result.get("session_id") else "ended"
        }

        return json.dumps(formatted_result)

    except Exception as e:
        return json.dumps({
            "output": "",
@@ -184,12 +253,16 @@ def check_hecate_requirements() -> bool:
        print(f"Warning: Missing optional environment variables: {', '.join(missing_optional)}")
        print("  (Some Hecate features may be limited)")

    # Check if Hecate is importable
    # Check if Hecate and required modules are importable
    try:
        import hecate
        from morphcloud._llm import ToolCall
        from morphcloud.api import MorphCloudClient
        from hecate.cli import run_tool, ExecutionContext
        from rich.console import Console
        return True
    except ImportError:
        print("Hecate is not installed. Please install it with: pip install hecate")
    except Exception as e:
        print(f"Hecate not available: {e}")
        print(f"Make sure hecate is installed and MORPH_API_KEY is set.")
        return False

# Module-level initialization check
@@ -1,346 +1,471 @@
#!/usr/bin/env python3
"""
Vision Tools Module

This module provides vision analysis tools that work with image URLs.
Uses Gemini Flash via Nous Research API for intelligent image understanding.

Available tools:
- vision_analyze_tool: Analyze images from URLs with custom prompts

Features:
- Comprehensive image description
- Context-aware analysis based on user queries
- Proper error handling and validation
- Debug logging support

Usage:
    from vision_tools import vision_analyze_tool
    import asyncio

    # Analyze an image
    result = await vision_analyze_tool(
        image_url="https://example.com/image.jpg",
        user_prompt="What architectural style is this building?"
    )
"""

import json
import os
import asyncio
import uuid
import datetime
from pathlib import Path
from typing import Dict, Any, Optional
from openai import AsyncOpenAI

# Initialize Nous Research API client for vision processing
nous_client = AsyncOpenAI(
    api_key=os.getenv("NOUS_API_KEY"),
    base_url="https://inference-api.nousresearch.com/v1"
)

# Configuration for vision processing
DEFAULT_VISION_MODEL = "gemini-2.5-flash"

# Debug mode configuration
DEBUG_MODE = os.getenv("VISION_TOOLS_DEBUG", "false").lower() == "true"
DEBUG_SESSION_ID = str(uuid.uuid4())
DEBUG_LOG_PATH = Path("./logs")
DEBUG_DATA = {
    "session_id": DEBUG_SESSION_ID,
    "start_time": datetime.datetime.now().isoformat(),
    "debug_enabled": DEBUG_MODE,
    "tool_calls": []
} if DEBUG_MODE else None

# Create logs directory if debug mode is enabled
if DEBUG_MODE:
    DEBUG_LOG_PATH.mkdir(exist_ok=True)
    print(f"🐛 Vision debug mode enabled - Session ID: {DEBUG_SESSION_ID}")


def _log_debug_call(tool_name: str, call_data: Dict[str, Any]) -> None:
    """
    Log a debug call entry to the global debug data structure.

    Args:
        tool_name (str): Name of the tool being called
        call_data (Dict[str, Any]): Data about the call including parameters and results
    """
    if not DEBUG_MODE or not DEBUG_DATA:
        return

    call_entry = {
        "timestamp": datetime.datetime.now().isoformat(),
        "tool_name": tool_name,
        **call_data
    }

    DEBUG_DATA["tool_calls"].append(call_entry)


def _save_debug_log() -> None:
    """
    Save the current debug data to a JSON file in the logs directory.
    """
    if not DEBUG_MODE or not DEBUG_DATA:
        return

    try:
        debug_filename = f"vision_tools_debug_{DEBUG_SESSION_ID}.json"
        debug_filepath = DEBUG_LOG_PATH / debug_filename

        # Update end time
        DEBUG_DATA["end_time"] = datetime.datetime.now().isoformat()
        DEBUG_DATA["total_calls"] = len(DEBUG_DATA["tool_calls"])

        with open(debug_filepath, 'w', encoding='utf-8') as f:
            json.dump(DEBUG_DATA, f, indent=2, ensure_ascii=False)

        print(f"🐛 Vision debug log saved: {debug_filepath}")

    except Exception as e:
        print(f"❌ Error saving vision debug log: {str(e)}")


def _validate_image_url(url: str) -> bool:
    """
    Basic validation of image URL format.

    Args:
        url (str): The URL to validate

    Returns:
        bool: True if URL appears to be valid, False otherwise
    """
    if not url or not isinstance(url, str):
        return False

    # Check if it's a valid URL format
    if not (url.startswith('http://') or url.startswith('https://')):
        return False

    # Check for common image extensions (optional, as URLs may not have extensions)
    image_extensions = ['.jpg', '.jpeg', '.png', '.gif', '.bmp', '.webp', '.svg']

    return True  # Allow all HTTP/HTTPS URLs for flexibility
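Note that `_validate_image_url` only enforces the scheme; the extension list is collected but intentionally unused. A standalone equivalent of the check:

```python
def validate_image_url(url) -> bool:
    # Same behavior as _validate_image_url: require a string
    # with an http:// or https:// scheme, allow any path.
    if not url or not isinstance(url, str):
        return False
    return url.startswith(("http://", "https://"))

print(validate_image_url("https://example.com/image.jpg"))  # True
```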

async def vision_analyze_tool(
    image_url: str,
    user_prompt: str,
    model: str = DEFAULT_VISION_MODEL
) -> str:
    """
    Analyze an image from a URL using vision AI.

    This tool processes images using Gemini Flash via Nous Research API.
    The user_prompt parameter is expected to be pre-formatted by the calling
    function (typically model_tools.py) to include both full description
    requests and specific questions.

    Args:
        image_url (str): The URL of the image to analyze
        user_prompt (str): The pre-formatted prompt for the vision model
        model (str): The vision model to use (default: gemini-2.5-flash)

    Returns:
        str: JSON string containing the analysis results with the following structure:
        {
            "success": bool,
            "analysis": str (defaults to error message if None)
        }

    Raises:
        Exception: If analysis fails or API key is not set
    """
    debug_call_data = {
        "parameters": {
            "image_url": image_url,
            "user_prompt": user_prompt,
            "model": model
        },
        "error": None,
        "success": False,
        "analysis_length": 0,
        "model_used": model
    }

    try:
        print(f"🔍 Analyzing image from URL: {image_url[:60]}{'...' if len(image_url) > 60 else ''}")
        print(f"📝 User prompt: {user_prompt[:100]}{'...' if len(user_prompt) > 100 else ''}")

        # Validate image URL
        if not _validate_image_url(image_url):
            raise ValueError("Invalid image URL format. Must start with http:// or https://")

        # Check API key availability
        if not os.getenv("NOUS_API_KEY"):
            raise ValueError("NOUS_API_KEY environment variable not set")

        # Use the prompt as provided (model_tools.py now handles full description formatting)
        comprehensive_prompt = user_prompt

        # Prepare the message with image URL format
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": comprehensive_prompt
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_url
                        }
                    }
                ]
            }
        ]
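The `messages` structure above is the OpenAI-style multimodal chat format: one user turn whose content is a text part plus an `image_url` part. A small builder sketch (the function name is illustrative):

```python
def build_vision_messages(prompt: str, image_url: str) -> list:
    # One user turn containing a text part and an image_url part,
    # matching the shape passed to chat.completions.create above.
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]

msgs = build_vision_messages("Describe this image.", "https://example.com/a.jpg")
print(msgs[0]["content"][1]["image_url"]["url"])
```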
|
||||
print(f"🧠 Processing image with {model}...")
|
||||
|
||||
# Call the vision API
|
||||
response = await nous_client.chat.completions.create(
|
||||
model=model,
|
||||
messages=messages,
|
||||
temperature=0.1, # Low temperature for consistent analysis
|
||||
max_tokens=2000 # Generous limit for detailed analysis
|
||||
)
|
||||
|
||||
# Extract the analysis
|
||||
analysis = response.choices[0].message.content.strip()
|
||||
analysis_length = len(analysis)
|
||||
|
||||
print(f"✅ Image analysis completed ({analysis_length} characters)")
|
||||
|
||||
# Prepare successful response
|
||||
result = {
|
||||
"success": True,
|
||||
"analysis": analysis or "There was a problem with the request and the image could not be analyzed."
|
||||
}
|
||||
|
||||
debug_call_data["success"] = True
|
||||
debug_call_data["analysis_length"] = analysis_length
|
||||
|
||||
# Log debug information
|
||||
_log_debug_call("vision_analyze_tool", debug_call_data)
|
||||
_save_debug_log()
|
||||
|
||||
return json.dumps(result, indent=2)
|
||||
|
||||
except Exception as e:
|
||||
error_msg = f"Error analyzing image: {str(e)}"
|
||||
print(f"❌ {error_msg}")
|
||||
|
||||
# Prepare error response
|
||||
result = {
|
||||
"success": False,
|
||||
"analysis": "There was a problem with the request and the image could not be analyzed."
|
||||
}
|
||||
|
||||
debug_call_data["error"] = error_msg
|
||||
_log_debug_call("vision_analyze_tool", debug_call_data)
|
||||
_save_debug_log()
|
||||
|
||||
return json.dumps(result, indent=2)
|
||||
|
||||
|
||||
def check_nous_api_key() -> bool:
|
||||
"""
|
||||
Check if the Nous Research API key is available in environment variables.
|
||||
|
||||
Returns:
|
||||
bool: True if API key is set, False otherwise
|
||||
"""
|
||||
return bool(os.getenv("NOUS_API_KEY"))
|
||||
|
||||
|
||||
def check_vision_requirements() -> bool:
|
||||
"""
|
||||
Check if all requirements for vision tools are met.
|
||||
|
||||
Returns:
|
||||
bool: True if requirements are met, False otherwise
|
||||
"""
|
||||
return check_nous_api_key()
|
||||
|
||||
|
||||
def get_debug_session_info() -> Dict[str, Any]:
|
||||
"""
|
||||
Get information about the current debug session.
|
||||
|
||||
Returns:
|
||||
Dict[str, Any]: Dictionary containing debug session information
|
||||
"""
|
||||
if not DEBUG_MODE or not DEBUG_DATA:
|
||||
return {
|
||||
"enabled": False,
|
||||
"session_id": None,
|
||||
"log_path": None,
|
||||
"total_calls": 0
|
||||
}
|
||||
|
||||
return {
|
||||
"enabled": True,
|
||||
"session_id": DEBUG_SESSION_ID,
|
||||
"log_path": str(DEBUG_LOG_PATH / f"vision_tools_debug_{DEBUG_SESSION_ID}.json"),
|
||||
"total_calls": len(DEBUG_DATA["tool_calls"])
|
||||
}
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
"""
|
||||
Simple test/demo when run directly
|
||||
"""
|
||||
print("👁️ Vision Tools Module")
|
||||
print("=" * 40)
|
||||
|
||||
# Check if API key is available
|
||||
api_available = check_nous_api_key()
|
||||
|
||||
if not api_available:
|
||||
print("❌ NOUS_API_KEY environment variable not set")
|
||||
print("Please set your API key: export NOUS_API_KEY='your-key-here'")
|
||||
print("Get API key at: https://inference-api.nousresearch.com/")
|
||||
exit(1)
|
||||
else:
|
||||
print("✅ Nous Research API key found")
|
||||
|
||||
print("🛠️ Vision tools ready for use!")
|
||||
print(f"🧠 Using model: {DEFAULT_VISION_MODEL}")
|
||||
|
||||
# Show debug mode status
|
||||
if DEBUG_MODE:
|
||||
print(f"🐛 Debug mode ENABLED - Session ID: {DEBUG_SESSION_ID}")
|
||||
print(f" Debug logs will be saved to: ./logs/vision_tools_debug_{DEBUG_SESSION_ID}.json")
|
||||
else:
|
||||
print("🐛 Debug mode disabled (set VISION_TOOLS_DEBUG=true to enable)")
|
||||
|
||||
print("\nBasic usage:")
|
||||
print(" from vision_tools import vision_analyze_tool")
|
||||
print(" import asyncio")
|
||||
print("")
|
||||
print(" async def main():")
|
||||
print(" result = await vision_analyze_tool(")
|
||||
print(" image_url='https://example.com/image.jpg',")
|
||||
print(" user_prompt='What do you see in this image?'")
|
||||
print(" )")
|
||||
print(" print(result)")
|
||||
print(" asyncio.run(main())")
|
||||
|
||||
print("\nExample prompts:")
|
||||
print(" - 'What architectural style is this building?'")
|
||||
print(" - 'Describe the emotions and mood in this image'")
|
||||
print(" - 'What text can you read in this image?'")
|
||||
print(" - 'Identify any safety hazards visible'")
|
||||
print(" - 'What products or brands are shown?'")
|
||||
|
||||
print("\nDebug mode:")
|
||||
print(" # Enable debug logging")
|
||||
print(" export VISION_TOOLS_DEBUG=true")
|
||||
print(" # Debug logs capture all vision analysis calls and results")
|
||||
print(" # Logs saved to: ./logs/vision_tools_debug_UUID.json")
|
||||
#!/usr/bin/env python3
"""
Vision Tools Module

This module provides vision analysis tools that work with image URLs.
Uses Gemini Flash via Nous Research API for intelligent image understanding.

Available tools:
- vision_analyze_tool: Analyze images from URLs with custom prompts

Features:
- Downloads images from URLs and converts to base64 for API compatibility
- Comprehensive image description
- Context-aware analysis based on user queries
- Automatic temporary file cleanup
- Proper error handling and validation
- Debug logging support

Usage:
    from vision_tools import vision_analyze_tool
    import asyncio

    # Analyze an image
    result = await vision_analyze_tool(
        image_url="https://example.com/image.jpg",
        user_prompt="What architectural style is this building?"
    )
"""

import json
import os
import asyncio
import uuid
import datetime
import base64
from pathlib import Path
from typing import Dict, Any, Optional
from openai import AsyncOpenAI
import httpx  # Use httpx for async HTTP requests

# Initialize Nous Research API client for vision processing
nous_client = AsyncOpenAI(
    api_key=os.getenv("NOUS_API_KEY"),
    base_url="https://inference-api.nousresearch.com/v1"
)

# Configuration for vision processing
DEFAULT_VISION_MODEL = "gemini-2.5-flash"

# Debug mode configuration
DEBUG_MODE = os.getenv("VISION_TOOLS_DEBUG", "false").lower() == "true"
DEBUG_SESSION_ID = str(uuid.uuid4())
DEBUG_LOG_PATH = Path("./logs")
DEBUG_DATA = {
    "session_id": DEBUG_SESSION_ID,
    "start_time": datetime.datetime.now().isoformat(),
    "debug_enabled": DEBUG_MODE,
    "tool_calls": []
} if DEBUG_MODE else None

# Create logs directory if debug mode is enabled
if DEBUG_MODE:
    DEBUG_LOG_PATH.mkdir(exist_ok=True)
    print(f"🐛 Vision debug mode enabled - Session ID: {DEBUG_SESSION_ID}")


def _log_debug_call(tool_name: str, call_data: Dict[str, Any]) -> None:
    """
    Log a debug call entry to the global debug data structure.

    Args:
        tool_name (str): Name of the tool being called
        call_data (Dict[str, Any]): Data about the call including parameters and results
    """
    if not DEBUG_MODE or not DEBUG_DATA:
        return

    call_entry = {
        "timestamp": datetime.datetime.now().isoformat(),
        "tool_name": tool_name,
        **call_data
    }

    DEBUG_DATA["tool_calls"].append(call_entry)


def _save_debug_log() -> None:
    """
    Save the current debug data to a JSON file in the logs directory.
    """
    if not DEBUG_MODE or not DEBUG_DATA:
        return

    try:
        debug_filename = f"vision_tools_debug_{DEBUG_SESSION_ID}.json"
        debug_filepath = DEBUG_LOG_PATH / debug_filename

        # Update end time
        DEBUG_DATA["end_time"] = datetime.datetime.now().isoformat()
        DEBUG_DATA["total_calls"] = len(DEBUG_DATA["tool_calls"])

        with open(debug_filepath, 'w', encoding='utf-8') as f:
            json.dump(DEBUG_DATA, f, indent=2, ensure_ascii=False)

        print(f"🐛 Vision debug log saved: {debug_filepath}")

    except Exception as e:
        print(f"❌ Error saving vision debug log: {str(e)}")


def _validate_image_url(url: str) -> bool:
    """
    Basic validation of image URL format.

    Args:
        url (str): The URL to validate

    Returns:
        bool: True if URL appears to be valid, False otherwise
    """
    if not url or not isinstance(url, str):
        return False

    # Accept any http:// or https:// URL; file extensions are deliberately
    # not checked, since many valid image URLs do not include one.
    return url.startswith('http://') or url.startswith('https://')


async def _download_image(image_url: str, destination: Path) -> Path:
    """
    Download an image from a URL to a local destination (async).

    Args:
        image_url (str): The URL of the image to download
        destination (Path): The path where the image should be saved

    Returns:
        Path: The path to the downloaded image

    Raises:
        Exception: If download fails or response is invalid
    """
    # Create parent directories if they don't exist
    destination.parent.mkdir(parents=True, exist_ok=True)

    # Download the image with appropriate headers using async httpx
    async with httpx.AsyncClient(timeout=30.0) as client:
        response = await client.get(
            image_url,
            headers={"User-Agent": "hermes-agent-vision/1.0"},
        )
        response.raise_for_status()

    # Save the image content
    destination.write_bytes(response.content)

    return destination


def _determine_mime_type(image_path: Path) -> str:
    """
    Determine the MIME type of an image based on its file extension.

    Args:
        image_path (Path): Path to the image file

    Returns:
        str: The MIME type (defaults to image/jpeg if unknown)
    """
    extension = image_path.suffix.lower()
    mime_types = {
        '.jpg': 'image/jpeg',
        '.jpeg': 'image/jpeg',
        '.png': 'image/png',
        '.gif': 'image/gif',
        '.bmp': 'image/bmp',
        '.webp': 'image/webp',
        '.svg': 'image/svg+xml'
    }
    return mime_types.get(extension, 'image/jpeg')


def _image_to_base64_data_url(image_path: Path, mime_type: Optional[str] = None) -> str:
    """
    Convert an image file to a base64-encoded data URL.

    Args:
        image_path (Path): Path to the image file
        mime_type (Optional[str]): MIME type of the image (auto-detected if None)

    Returns:
        str: Base64-encoded data URL (e.g., "data:image/jpeg;base64,...")
    """
    # Read the image as bytes
    data = image_path.read_bytes()

    # Encode to base64
    encoded = base64.b64encode(data).decode("ascii")

    # Determine MIME type
    mime = mime_type or _determine_mime_type(image_path)

    # Create data URL
    data_url = f"data:{mime};base64,{encoded}"

    return data_url
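The data-URL construction above can be sketched standalone with only the standard library. A minimal example (the byte payload here is a hypothetical stand-in for real image data, not something from this repo):

```python
import base64

def to_data_url(data: bytes, mime: str = "image/jpeg") -> str:
    # Same shape as _image_to_base64_data_url: "data:<mime>;base64,<payload>"
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"

# Tiny illustrative payload (the 4-byte JPEG magic prefix, not a full image).
url = to_data_url(b"\xff\xd8\xff\xe0", "image/jpeg")
print(url)  # data:image/jpeg;base64,/9j/4A==
```

The payload round-trips: splitting on the first comma and base64-decoding recovers the original bytes, which is what the OpenAI-style `image_url` content part expects.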


async def vision_analyze_tool(
    image_url: str,
    user_prompt: str,
    model: str = DEFAULT_VISION_MODEL
) -> str:
    """
    Analyze an image from a URL using vision AI.

    This tool downloads images from URLs, converts them to base64, and processes
    them using Gemini Flash via Nous Research API. The image is downloaded to a
    temporary location and automatically cleaned up after processing.

    The user_prompt parameter is expected to be pre-formatted by the calling
    function (typically model_tools.py) to include both full description
    requests and specific questions.

    Args:
        image_url (str): The URL of the image to analyze (must be http:// or https://)
        user_prompt (str): The pre-formatted prompt for the vision model
        model (str): The vision model to use (default: gemini-2.5-flash)

    Returns:
        str: JSON string containing the analysis results with the following structure:
            {
                "success": bool,
                "analysis": str (defaults to error message if None)
            }

    Raises:
        Exception: If download fails, analysis fails, or API key is not set

    Note:
        - Temporary images are stored in ./temp_vision_images/
        - Images are automatically deleted after processing
        - Supports common image formats (JPEG, PNG, GIF, WebP, etc.)
    """
    debug_call_data = {
        "parameters": {
            "image_url": image_url,
            "user_prompt": user_prompt[:200] + "..." if len(user_prompt) > 200 else user_prompt,
            "model": model
        },
        "error": None,
        "success": False,
        "analysis_length": 0,
        "model_used": model,
        "image_size_bytes": 0
    }

    temp_image_path = None

    try:
        print(f"🔍 Analyzing image from URL: {image_url[:60]}{'...' if len(image_url) > 60 else ''}", flush=True)
        print(f"📝 User prompt: {user_prompt[:100]}{'...' if len(user_prompt) > 100 else ''}", flush=True)

        # Validate image URL
        if not _validate_image_url(image_url):
            raise ValueError("Invalid image URL format. Must start with http:// or https://")

        # Check API key availability
        if not os.getenv("NOUS_API_KEY"):
            raise ValueError("NOUS_API_KEY environment variable not set")

        # Download the image to a temporary location
        print("⬇️ Downloading image from URL...", flush=True)
        temp_dir = Path("./temp_vision_images")
        temp_image_path = temp_dir / f"temp_image_{uuid.uuid4()}.jpg"

        await _download_image(image_url, temp_image_path)

        # Get image file size for logging
        image_size_bytes = temp_image_path.stat().st_size
        image_size_kb = image_size_bytes / 1024
        print(f"✅ Image downloaded successfully ({image_size_kb:.1f} KB)", flush=True)

        # Convert image to base64 data URL
        print("🔄 Converting image to base64...", flush=True)
        image_data_url = _image_to_base64_data_url(temp_image_path)
        # Calculate size in KB for better readability
        data_size_kb = len(image_data_url) / 1024
        print(f"✅ Image converted to base64 ({data_size_kb:.1f} KB)", flush=True)

        debug_call_data["image_size_bytes"] = image_size_bytes

        # Use the prompt as provided (model_tools.py now handles full description formatting)
        comprehensive_prompt = user_prompt

        # Prepare the message with base64-encoded image
        messages = [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": comprehensive_prompt
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": image_data_url
                        }
                    }
                ]
            }
        ]

        print(f"🧠 Processing image with {model}...", flush=True)

        # Call the vision API
        response = await nous_client.chat.completions.create(
            model=model,
            messages=messages,
            temperature=0.1,  # Low temperature for consistent analysis
            max_tokens=2000   # Generous limit for detailed analysis
        )

        # Extract the analysis
        analysis = response.choices[0].message.content.strip()
        analysis_length = len(analysis)

        print(f"✅ Image analysis completed ({analysis_length} characters)", flush=True)

        # Prepare successful response
        result = {
            "success": True,
            "analysis": analysis or "There was a problem with the request and the image could not be analyzed."
        }

        debug_call_data["success"] = True
        debug_call_data["analysis_length"] = analysis_length

        # Log debug information
        _log_debug_call("vision_analyze_tool", debug_call_data)
        _save_debug_log()

        return json.dumps(result, indent=2)

    except Exception as e:
        error_msg = f"Error analyzing image: {str(e)}"
        print(f"❌ {error_msg}", flush=True)

        # Prepare error response
        result = {
            "success": False,
            "analysis": "There was a problem with the request and the image could not be analyzed."
        }

        debug_call_data["error"] = error_msg
        _log_debug_call("vision_analyze_tool", debug_call_data)
        _save_debug_log()

        return json.dumps(result, indent=2)

    finally:
        # Clean up temporary image file
        if temp_image_path and temp_image_path.exists():
            try:
                temp_image_path.unlink()
                print("🧹 Cleaned up temporary image file", flush=True)
            except Exception as cleanup_error:
                print(f"⚠️ Warning: Could not delete temporary file: {cleanup_error}", flush=True)


def check_nous_api_key() -> bool:
    """
    Check if the Nous Research API key is available in environment variables.

    Returns:
        bool: True if API key is set, False otherwise
    """
    return bool(os.getenv("NOUS_API_KEY"))


def check_vision_requirements() -> bool:
    """
    Check if all requirements for vision tools are met.

    Returns:
        bool: True if requirements are met, False otherwise
    """
    return check_nous_api_key()


def get_debug_session_info() -> Dict[str, Any]:
    """
    Get information about the current debug session.

    Returns:
        Dict[str, Any]: Dictionary containing debug session information
    """
    if not DEBUG_MODE or not DEBUG_DATA:
        return {
            "enabled": False,
            "session_id": None,
            "log_path": None,
            "total_calls": 0
        }

    return {
        "enabled": True,
        "session_id": DEBUG_SESSION_ID,
        "log_path": str(DEBUG_LOG_PATH / f"vision_tools_debug_{DEBUG_SESSION_ID}.json"),
        "total_calls": len(DEBUG_DATA["tool_calls"])
    }


if __name__ == "__main__":
    """
    Simple test/demo when run directly
    """
    print("👁️ Vision Tools Module")
    print("=" * 40)

    # Check if API key is available
    api_available = check_nous_api_key()

    if not api_available:
        print("❌ NOUS_API_KEY environment variable not set")
        print("Please set your API key: export NOUS_API_KEY='your-key-here'")
        print("Get API key at: https://inference-api.nousresearch.com/")
        exit(1)
    else:
        print("✅ Nous Research API key found")

    print("🛠️ Vision tools ready for use!")
    print(f"🧠 Using model: {DEFAULT_VISION_MODEL}")

    # Show debug mode status
    if DEBUG_MODE:
        print(f"🐛 Debug mode ENABLED - Session ID: {DEBUG_SESSION_ID}")
        print(f"   Debug logs will be saved to: ./logs/vision_tools_debug_{DEBUG_SESSION_ID}.json")
    else:
        print("🐛 Debug mode disabled (set VISION_TOOLS_DEBUG=true to enable)")

    print("\nBasic usage:")
    print("  from vision_tools import vision_analyze_tool")
    print("  import asyncio")
    print("")
    print("  async def main():")
    print("      result = await vision_analyze_tool(")
    print("          image_url='https://example.com/image.jpg',")
    print("          user_prompt='What do you see in this image?'")
    print("      )")
    print("      print(result)")
    print("  asyncio.run(main())")

    print("\nExample prompts:")
    print("  - 'What architectural style is this building?'")
    print("  - 'Describe the emotions and mood in this image'")
    print("  - 'What text can you read in this image?'")
    print("  - 'Identify any safety hazards visible'")
    print("  - 'What products or brands are shown?'")

    print("\nDebug mode:")
    print("  # Enable debug logging")
    print("  export VISION_TOOLS_DEBUG=true")
    print("  # Debug logs capture all vision analysis calls and results")
    print("  # Logs saved to: ./logs/vision_tools_debug_UUID.json")

File diff suppressed because it is too large.

toolset_distributions.py (new file, +270 lines)

#!/usr/bin/env python3
"""
Toolset Distributions Module

This module defines distributions of toolsets for data generation runs.
Each distribution specifies which toolsets should be used and their probability
of being selected for any given prompt during batch processing.

A distribution is a dictionary mapping toolset names to their selection probability (%).
Each toolset is sampled independently, so probabilities do not need to sum to 100.

Usage:
    from toolset_distributions import get_distribution, list_distributions

    # Get a specific distribution
    dist = get_distribution("image_gen")

    # List all available distributions
    all_dists = list_distributions()
"""

from typing import Dict, List, Optional
import random
from toolsets import validate_toolset


# Distribution definitions
# Each key is a distribution name, and the value is a dict of toolset_name: probability_percentage
DISTRIBUTIONS = {
    # Default: All tools available 100% of the time
    "default": {
        "description": "All available tools, all the time",
        "toolsets": {
            "web": 100,
            "vision": 100,
            "image_gen": 100,
            "terminal": 100,
            "moa": 100
        }
    },

    # Image generation focused distribution
    "image_gen": {
        "description": "Heavy focus on image generation with vision and web support",
        "toolsets": {
            "image_gen": 90,  # 90% chance of image generation tools
            "vision": 90,     # 90% chance of vision tools
            "web": 55,        # 55% chance of web tools
            "terminal": 45,   # 45% chance of terminal tools
            "moa": 10         # 10% chance of reasoning tools
        }
    },

    # Research-focused distribution
    "research": {
        "description": "Web research with vision analysis and reasoning",
        "toolsets": {
            "web": 90,      # 90% chance of web tools
            "vision": 50,   # 50% chance of vision tools
            "moa": 40,      # 40% chance of reasoning tools
            "terminal": 10  # 10% chance of terminal tools
        }
    },

    # Development-focused distribution
    "development": {
        "description": "Terminal and reasoning with occasional web lookup",
        "toolsets": {
            "terminal": 80,  # 80% chance of terminal tools
            "moa": 60,       # 60% chance of reasoning tools
            "web": 30,       # 30% chance of web tools
            "vision": 10     # 10% chance of vision tools
        }
    },

    # Safe mode (no terminal)
    "safe": {
        "description": "All tools except terminal for safety",
        "toolsets": {
            "web": 80,
            "vision": 60,
            "image_gen": 60,
            "moa": 50
        }
    },

    # Balanced distribution
    "balanced": {
        "description": "Equal probability of all toolsets",
        "toolsets": {
            "web": 50,
            "vision": 50,
            "image_gen": 50,
            "terminal": 50,
            "moa": 50
        }
    },

    # Minimal (web only)
    "minimal": {
        "description": "Only web tools for basic research",
        "toolsets": {
            "web": 100
        }
    },

    # Creative (vision + image generation)
    "creative": {
        "description": "Image generation and vision analysis focus",
        "toolsets": {
            "image_gen": 90,
            "vision": 90,
            "web": 30
        }
    },

    # Reasoning heavy
    "reasoning": {
        "description": "Heavy mixture of agents usage with minimal other tools",
        "toolsets": {
            "moa": 90,
            "web": 30,
            "terminal": 20
        }
    }
}


def get_distribution(name: str) -> Optional[Dict]:
    """
    Get a toolset distribution by name.

    Args:
        name (str): Name of the distribution

    Returns:
        Dict: Distribution definition with description and toolsets
        None: If distribution not found
    """
    return DISTRIBUTIONS.get(name)


def list_distributions() -> Dict[str, Dict]:
    """
    List all available distributions.

    Returns:
        Dict: All distribution definitions
    """
    return DISTRIBUTIONS.copy()


def sample_toolsets_from_distribution(distribution_name: str) -> List[str]:
    """
    Sample toolsets based on a distribution's probabilities.

    Each toolset in the distribution has a % chance of being included.
    This allows multiple toolsets to be active simultaneously.

    Args:
        distribution_name (str): Name of the distribution to sample from

    Returns:
        List[str]: List of sampled toolset names

    Raises:
        ValueError: If distribution name is not found
    """
    dist = get_distribution(distribution_name)
    if not dist:
        raise ValueError(f"Unknown distribution: {distribution_name}")

    # Sample each toolset independently based on its probability
    selected_toolsets = []

    for toolset_name, probability in dist["toolsets"].items():
        # Validate toolset exists
        if not validate_toolset(toolset_name):
            print(f"⚠️ Warning: Toolset '{toolset_name}' in distribution '{distribution_name}' is not valid")
            continue

        # Roll the dice - if random value is less than probability, include this toolset
        if random.random() * 100 < probability:
            selected_toolsets.append(toolset_name)

    # If no toolsets were selected (can happen with low probabilities),
    # ensure at least one toolset is selected by picking the highest probability one
    if not selected_toolsets and dist["toolsets"]:
        # Find toolset with highest probability
        highest_prob_toolset = max(dist["toolsets"].items(), key=lambda x: x[1])[0]
        if validate_toolset(highest_prob_toolset):
            selected_toolsets.append(highest_prob_toolset)

    return selected_toolsets
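The independent per-toolset sampling above can be checked with a quick standalone simulation. A minimal sketch, using a hypothetical distribution (the names and probabilities here are illustrative, not the module's real definitions):

```python
import random

# Hypothetical distribution: toolset name -> percent chance of inclusion
PROBS = {"web": 90, "vision": 50, "moa": 10}

def sample(probs, rng):
    """Include each toolset independently with its percent chance."""
    chosen = [name for name, p in probs.items() if rng.random() * 100 < p]
    if not chosen:
        # Fallback mirrors the module: keep the highest-probability toolset
        chosen = [max(probs, key=probs.get)]
    return chosen

rng = random.Random(0)
trials = 10_000
counts = {name: 0 for name in PROBS}
for _ in range(trials):
    for name in sample(PROBS, rng):
        counts[name] += 1

# Empirical inclusion rates land near the configured percentages; the
# empty-sample fallback nudges the top toolset slightly above its nominal rate.
for name in PROBS:
    print(name, round(100 * counts[name] / trials, 1))
```

Because each toolset is an independent Bernoulli draw, several can be active at once, and the percentages are per-toolset rates rather than shares of a single choice.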


def validate_distribution(distribution_name: str) -> bool:
    """
    Check if a distribution name is valid.

    Args:
        distribution_name (str): Distribution name to validate

    Returns:
        bool: True if valid, False otherwise
    """
    return distribution_name in DISTRIBUTIONS


def print_distribution_info(distribution_name: str) -> None:
    """
    Print detailed information about a distribution.

    Args:
        distribution_name (str): Distribution name
    """
    dist = get_distribution(distribution_name)
    if not dist:
        print(f"❌ Unknown distribution: {distribution_name}")
        return

    print(f"\n📊 Distribution: {distribution_name}")
    print(f"   Description: {dist['description']}")
    print("   Toolsets:")
    for toolset, prob in sorted(dist["toolsets"].items(), key=lambda x: x[1], reverse=True):
        print(f"     • {toolset:15} : {prob:3}% chance")


if __name__ == "__main__":
    """
    Demo and testing of the distributions system
    """
    print("📊 Toolset Distributions Demo")
    print("=" * 60)

    # List all distributions
    print("\n📋 Available Distributions:")
    print("-" * 40)
    for name, dist in list_distributions().items():
        print(f"\n  {name}:")
        print(f"    {dist['description']}")
        toolset_list = ", ".join([f"{ts}({p}%)" for ts, p in dist["toolsets"].items()])
        print(f"    Toolsets: {toolset_list}")

    # Demo sampling
    print("\n\n🎲 Sampling Examples:")
    print("-" * 40)

    test_distributions = ["image_gen", "research", "balanced", "default"]

    for dist_name in test_distributions:
        print(f"\n{dist_name}:")
        # Sample 5 times to show variability
        for i in range(1, 6):
            sampled = sample_toolsets_from_distribution(dist_name)
            print(f"  Sample {i}: {sorted(sampled)}")

    # Show detailed info
    print("\n\n📊 Detailed Distribution Info:")
    print("-" * 40)
    print_distribution_info("image_gen")
    print_distribution_info("research")

toolsets.py (new file, +339 lines)

#!/usr/bin/env python3
"""
Toolsets Module

This module provides a flexible system for defining and managing tool aliases/toolsets.
Toolsets allow you to group tools together for specific scenarios and can be composed
from individual tools or other toolsets.

Features:
- Define custom toolsets with specific tools
- Compose toolsets from other toolsets
- Built-in common toolsets for typical use cases
- Easy extension for new toolsets
- Support for dynamic toolset resolution

Usage:
    from toolsets import get_toolset, resolve_toolset, get_all_toolsets

    # Get the definition of a specific toolset
    toolset = get_toolset("web")

    # Resolve a toolset to get all tool names (including from composed toolsets)
    all_tools = resolve_toolset("safe")
"""

from typing import List, Dict, Any, Set, Optional


# Core toolset definitions
# These can include individual tools or reference other toolsets
TOOLSETS = {
    # Basic toolsets - individual tool categories
    "web": {
        "description": "Web research and content extraction tools",
        "tools": ["web_search", "web_extract", "web_crawl"],
        "includes": []  # No other toolsets included
    },

    "vision": {
        "description": "Image analysis and vision tools",
        "tools": ["vision_analyze"],
        "includes": []
    },

    "image_gen": {
        "description": "Creative generation tools (images)",
        "tools": ["image_generate"],
        "includes": []
    },

    "terminal": {
        "description": "Terminal/command execution tools",
        "tools": ["terminal"],
        "includes": []
    },

    "moa": {
        "description": "Advanced reasoning and problem-solving tools",
        "tools": ["mixture_of_agents"],
        "includes": []
    },

    # Scenario-specific toolsets

    "debugging": {
        "description": "Debugging and troubleshooting toolkit",
        "tools": ["terminal"],
        "includes": ["web"]  # For searching error messages and solutions
    },

    "safe": {
        "description": "Safe toolkit without terminal access",
        "tools": ["mixture_of_agents"],
        "includes": ["web", "vision", "image_gen"]
    }
}


def get_toolset(name: str) -> Optional[Dict[str, Any]]:
    """
    Get a toolset definition by name.

    Args:
        name (str): Name of the toolset

    Returns:
        Dict: Toolset definition with description, tools, and includes
        None: If toolset not found
    """
    return TOOLSETS.get(name)


def resolve_toolset(name: str, visited: Set[str] = None) -> List[str]:
    """
    Recursively resolve a toolset to get all tool names.

    This function handles toolset composition by recursively resolving
    included toolsets and combining all tools.

    Args:
        name (str): Name of the toolset to resolve
        visited (Set[str]): Set of already visited toolsets (for cycle detection)

    Returns:
        List[str]: List of all tool names in the toolset
    """
    if visited is None:
        visited = set()

    # Special aliases that represent all tools across every toolset.
    # This ensures future toolsets are automatically included without changes.
    if name in {"all", "*"}:
        all_tools: Set[str] = set()
        for toolset_name in get_toolset_names():
            # Use a fresh visited set per branch to avoid cross-branch contamination
            resolved = resolve_toolset(toolset_name, visited.copy())
            all_tools.update(resolved)
        return list(all_tools)

    # Check for cycles
    if name in visited:
        print(f"⚠️ Circular dependency detected in toolset '{name}'")
        return []

    visited.add(name)

    # Get toolset definition
    toolset = TOOLSETS.get(name)
    if not toolset:
        return []

    # Collect direct tools
    tools = set(toolset.get("tools", []))

    # Recursively resolve included toolsets
    for included_name in toolset.get("includes", []):
        included_tools = resolve_toolset(included_name, visited.copy())
        tools.update(included_tools)

    return list(tools)
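The recursive resolution with cycle detection can be illustrated on a toy registry. A minimal self-contained sketch (the entries below are illustrative, not the module's real TOOLSETS):

```python
from typing import Dict, List, Set

# Toy registry showing composition and a deliberate cycle.
REGISTRY: Dict[str, dict] = {
    "web": {"tools": ["web_search"], "includes": []},
    "debugging": {"tools": ["terminal"], "includes": ["web"]},
    "loop_a": {"tools": [], "includes": ["loop_b"]},
    "loop_b": {"tools": [], "includes": ["loop_a"]},
}

def resolve(name: str, visited: Set[str] = None) -> List[str]:
    visited = set() if visited is None else visited
    if name in visited:
        return []  # cycle detected: stop recursing instead of looping forever
    visited.add(name)
    entry = REGISTRY.get(name)
    if not entry:
        return []  # unknown toolset resolves to nothing
    tools = set(entry["tools"])
    for included in entry["includes"]:
        # Copy visited per branch so siblings do not block each other
        tools.update(resolve(included, visited.copy()))
    return sorted(tools)

print(resolve("debugging"))  # ['terminal', 'web_search']
print(resolve("loop_a"))     # []
```

Passing `visited.copy()` down each branch is the key design choice: it prevents infinite recursion on cycles while still allowing the same toolset to be reached through two different include paths.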


def resolve_multiple_toolsets(toolset_names: List[str]) -> List[str]:
    """
    Resolve multiple toolsets and combine their tools.

    Args:
        toolset_names (List[str]): List of toolset names to resolve

    Returns:
        List[str]: Combined list of all tool names (deduplicated)
    """
    all_tools = set()

    for name in toolset_names:
        tools = resolve_toolset(name)
        all_tools.update(tools)

    return list(all_tools)


def get_all_toolsets() -> Dict[str, Dict[str, Any]]:
    """
    Get all available toolsets with their definitions.

    Returns:
        Dict: All toolset definitions
    """
    return TOOLSETS.copy()


def get_toolset_names() -> List[str]:
    """
    Get names of all available toolsets (excluding aliases).

    Returns:
        List[str]: List of toolset names
    """
    return list(TOOLSETS.keys())


def validate_toolset(name: str) -> bool:
    """
    Check if a toolset name is valid.

    Args:
        name (str): Toolset name to validate

    Returns:
        bool: True if valid, False otherwise
    """
    # Accept special alias names for convenience
    if name in {"all", "*"}:
        return True
    return name in TOOLSETS


def create_custom_toolset(
    name: str,
    description: str,
    tools: List[str] = None,
    includes: List[str] = None
) -> None:
    """
    Create a custom toolset at runtime.

    Args:
        name (str): Name for the new toolset
        description (str): Description of the toolset
        tools (List[str]): Direct tools to include
        includes (List[str]): Other toolsets to include
    """
    TOOLSETS[name] = {
        "description": description,
        "tools": tools or [],
        "includes": includes or []
    }


def get_toolset_info(name: str) -> Optional[Dict[str, Any]]:
    """
    Get detailed information about a toolset including resolved tools.

    Args:
        name (str): Toolset name

    Returns:
        Dict: Detailed toolset information
        None: If toolset not found
    """
    toolset = get_toolset(name)
    if not toolset:
        return None

    resolved_tools = resolve_toolset(name)

    return {
        "name": name,
        "description": toolset["description"],
        "direct_tools": toolset["tools"],
        "includes": toolset["includes"],
        "resolved_tools": resolved_tools,
        "tool_count": len(resolved_tools),
        "is_composite": len(toolset["includes"]) > 0
    }


def print_toolset_tree(name: str, indent: int = 0) -> None:
    """
    Print a tree view of a toolset and its composition.

    Args:
        name (str): Toolset name
        indent (int): Current indentation level
    """
    prefix = "  " * indent
    toolset = get_toolset(name)

    if not toolset:
        print(f"{prefix}❌ Unknown toolset: {name}")
        return

    # Print toolset name and description
    print(f"{prefix}📦 {name}: {toolset['description']}")

    # Print direct tools
    if toolset["tools"]:
        print(f"{prefix}  🔧 Tools: {', '.join(toolset['tools'])}")

    # Print included toolsets
    if toolset["includes"]:
        print(f"{prefix}  📂 Includes:")
        for included in toolset["includes"]:
            print_toolset_tree(included, indent + 2)


if __name__ == "__main__":
    """
    Demo and testing of the toolsets system
    """
    print("🎯 Toolsets System Demo")
    print("=" * 60)

    # Show all available toolsets
    print("\n📦 Available Toolsets:")
    print("-" * 40)
    for name, toolset in get_all_toolsets().items():
        info = get_toolset_info(name)
        composite = "📂" if info["is_composite"] else "🔧"
        print(f"{composite} {name:20} - {toolset['description']}")
        print(f"   Tools: {len(info['resolved_tools'])} total")

    # Demo toolset resolution
    print("\n🔍 Toolset Resolution Examples:")
    print("-" * 40)

    examples = ["web", "terminal", "debugging", "safe", "all"]
    for name in examples:
        tools = resolve_toolset(name)
        print(f"\n{name}:")
        print(f"  Resolved to {len(tools)} tools: {', '.join(sorted(tools))}")

    # Show toolset composition tree
    print("\n🌳 Toolset Composition Tree:")
    print("-" * 40)
    print("\nExample: 'debugging' toolset:")
    print_toolset_tree("debugging")

    print("\nExample: 'safe' toolset:")
    print_toolset_tree("safe")

    # Demo multiple toolset resolution
    print("\n🔗 Multiple Toolset Resolution:")
    print("-" * 40)
    combined = resolve_multiple_toolsets(["web", "vision", "moa"])
    print("Combining ['web', 'vision', 'moa']:")
    print(f"  Result: {', '.join(sorted(combined))}")

    # Demo custom toolset creation
    print("\n➕ Custom Toolset Creation:")
    print("-" * 40)
    create_custom_toolset(
        name="my_custom",
        description="My custom toolset for specific tasks",
        tools=["web_search"],
        includes=["terminal", "vision"]
    )

    custom_info = get_toolset_info("my_custom")
    print("Created 'my_custom' toolset:")
    print(f"  Description: {custom_info['description']}")
    print(f"  Resolved tools: {', '.join(custom_info['resolved_tools'])}")
|
||||