Compare commits

1 Commits

Author SHA1 Message Date
kshitijk4poor
0f05b18413 feat(skills): add comfyui-mcp optional skill for generative image/video workflows
New optional skill under optional-skills/creative/comfyui-mcp/ with:
- SKILL.md: Setup guide (local, cloud, remote), Python helper functions
  for execute_code, common workflow patterns (txt2img, parameterized gen),
  queue management, and MCP server integration option.
- references/api.md: ComfyUI REST API reference (endpoints, JSON formats).
- references/recipes.md: Ready-to-use workflow templates (SDXL txt2img,
  img2img, Flux).

Zero core code changes — skill-only PR. Uses the established
skill-as-prompt pattern (like blender-mcp): teaches the agent to
interact with ComfyUI's REST API via execute_code.
2026-04-18 17:30:32 +05:30
3 changed files with 564 additions and 0 deletions

@@ -0,0 +1,346 @@
---
name: comfyui-mcp
description: Control a running ComfyUI instance from Hermes — queue workflows, generate images/video, upload inputs, manage models. Use when the user wants to create or modify anything with ComfyUI's node-based generative pipeline.
version: 1.0.0
requires: ComfyUI running locally, remotely, or via Comfy Cloud (default http://127.0.0.1:8188)
author: kshitijk4poor
license: MIT
metadata:
  hermes:
    tags: [comfyui, image-generation, stable-diffusion, flux, creative, generative-ai]
    related_skills: [hermes-blender, stable-diffusion-image-generation, image_gen]
    category: creative
---
# ComfyUI
Control a running ComfyUI instance from Hermes via its REST API. Queue workflow prompts, generate images and video, upload inputs, check progress, and retrieve outputs — all through `execute_code`.
## When to Use
- User asks to generate images with Stable Diffusion, SDXL, Flux, or other diffusion models
- User wants to run a specific ComfyUI workflow
- User wants to chain generative steps (txt2img → upscale → face restore)
- User needs ControlNet, inpainting, img2img, or other advanced pipelines
- User asks to manage ComfyUI queue or check generation progress
## Setup
ComfyUI must be running and reachable. Three options:
### Option A: Local
**Requires Python 3.10+.**

    git clone https://github.com/comfyanonymous/ComfyUI.git
    cd ComfyUI
    python3 -m venv venv && source venv/bin/activate
    pip install torch torchvision torchaudio
    pip install -r requirements.txt
    python main.py --listen 127.0.0.1 --port 8188

GPU acceleration is auto-detected (CUDA on NVIDIA, MPS on Apple Silicon).
### Option B: Comfy Cloud
1. Sign up at https://platform.comfy.org
2. Generate an API key at https://platform.comfy.org/profile/api-keys (**requires paid plan**)
3. Set in `~/.hermes/.env`:
```
COMFYUI_URL=https://cloud.comfy.org/api
COMFYUI_API_KEY=<your-key>
```
### Option C: Remote instance
Point `COMFYUI_URL` at any reachable ComfyUI server:
```
COMFYUI_URL=http://192.168.1.100:8188
```
### Verify connection
```python
from hermes_tools import terminal
r = terminal("curl -s ${COMFYUI_URL:-http://127.0.0.1:8188}/system_stats | python3 -m json.tool | head -5")
print(r["output"])
```
## Core Pattern — ComfyUI Helper
Use this helper inside `execute_code` for all ComfyUI interactions:
```python
import json, time, urllib.request, urllib.parse, urllib.error, uuid, os

COMFY_URL = os.getenv("COMFYUI_URL", "http://127.0.0.1:8188")
COMFY_API_KEY = os.getenv("COMFYUI_API_KEY", "")

def comfy_api(method, path, data=None, timeout=30):
    """Send a request to the ComfyUI API. Returns parsed JSON, or None for an empty body."""
    url = f"{COMFY_URL}{path}"
    # Encode any payload that was passed, including an empty dict.
    body = json.dumps(data).encode() if data is not None else None
    req = urllib.request.Request(url, data=body, method=method)
    if body:
        req.add_header("Content-Type", "application/json")
    if COMFY_API_KEY:
        req.add_header("X-API-Key", COMFY_API_KEY)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        raw = resp.read()
    # Some endpoints (e.g. POST /interrupt) may reply with an empty body.
    return json.loads(raw) if raw else None

def queue_prompt(workflow, client_id=None):
    """Queue a workflow for execution. Returns prompt_id."""
    client_id = client_id or str(uuid.uuid4())
    result = comfy_api("POST", "/prompt", {
        "prompt": workflow,
        "client_id": client_id,
    })
    return result["prompt_id"]

def wait_for_completion(prompt_id, timeout=300, poll_interval=2):
    """Poll /history until the prompt completes. Returns output dict."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        history = comfy_api("GET", f"/history/{prompt_id}")
        if prompt_id in history:
            return history[prompt_id]
        time.sleep(poll_interval)
    raise TimeoutError(f"Prompt {prompt_id} did not complete in {timeout}s")

def get_image(filename, subfolder="", img_type="output"):
    """Download a generated image. Returns bytes."""
    params = urllib.parse.urlencode({
        "filename": filename, "subfolder": subfolder, "type": img_type
    })
    url = f"{COMFY_URL}/view?{params}"
    req = urllib.request.Request(url)
    if COMFY_API_KEY:
        req.add_header("X-API-Key", COMFY_API_KEY)
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def upload_image(filepath, img_type="input", overwrite=True):
    """Upload an image to ComfyUI. Returns the server's response dict (name, subfolder, type)."""
    import mimetypes
    boundary = uuid.uuid4().hex
    filename = os.path.basename(filepath)
    mime = mimetypes.guess_type(filepath)[0] or "image/png"
    with open(filepath, "rb") as f:
        file_data = f.read()
    # Hand-built multipart/form-data body: file part, then the two text fields.
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode() + file_data + (
        f"\r\n--{boundary}\r\n"
        f'Content-Disposition: form-data; name="type"\r\n\r\n'
        f"{img_type}\r\n"
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="overwrite"\r\n\r\n'
        f"{'true' if overwrite else 'false'}\r\n"
        f"--{boundary}--\r\n"
    ).encode()
    req = urllib.request.Request(
        f"{COMFY_URL}/upload/image", data=body, method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    if COMFY_API_KEY:
        req.add_header("X-API-Key", COMFY_API_KEY)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def list_models(folder="checkpoints"):
    """List available models in a folder (checkpoints, loras, vae, etc.)."""
    return comfy_api("GET", f"/models/{folder}")

def get_queue_status():
    """Get current queue (running + pending)."""
    return comfy_api("GET", "/queue")

def interrupt():
    """Interrupt the currently running generation."""
    return comfy_api("POST", "/interrupt")
```
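When `POST /prompt` rejects a workflow, `urllib` raises `HTTPError` and the useful detail sits in the response body. The sketch below turns such a body into readable lines. It assumes the payload carries `error` and `node_errors` keys shaped like the sample in the comments; that shape is an assumption, and `summarize_prompt_error` is a hypothetical helper, not part of the skill.

```python
import json

def summarize_prompt_error(raw_body):
    """Turn a /prompt error body (bytes or str) into human-readable lines."""
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError):
        # Not JSON: return the raw text so nothing is silently lost.
        if isinstance(raw_body, bytes):
            return [raw_body.decode(errors="replace")]
        return [str(raw_body)]
    if not isinstance(payload, dict):
        return [str(payload)]
    lines = []
    # Assumed shape: {"error": {"type": ..., "message": ...}, "node_errors": {...}}
    error = payload.get("error")
    if isinstance(error, dict):
        lines.append(f"{error.get('type', 'error')}: {error.get('message', '')}".strip())
    elif error:
        lines.append(str(error))
    # Assumed shape: {"<node_id>": {"class_type": ..., "errors": [{"message", "details"}]}}
    for node_id, info in (payload.get("node_errors") or {}).items():
        class_type = info.get("class_type", "?")
        for err in info.get("errors", []):
            detail = f"{err.get('message', '')} {err.get('details', '')}".strip()
            lines.append(f"node {node_id} ({class_type}): {detail}")
    return lines
```

One way to use it is to wrap `queue_prompt(workflow)` in `try` / `except urllib.error.HTTPError as e` and print `summarize_prompt_error(e.read())`, so the agent sees which node failed validation instead of a bare status code.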
## Common Workflows
### Text-to-Image (Minimal)
Always call `list_models("checkpoints")` first to get the exact filename.
```python
# Discover which checkpoint is installed
models = list_models("checkpoints")
ckpt = models[0]  # use first available
workflow = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42,
            "steps": 20,
            "cfg": 7.0,
            "sampler_name": "euler",
            "scheduler": "normal",
            "denoise": 1.0,
            "model": ["4", 0],
            "positive": ["6", 0],
            "negative": ["7", 0],
            "latent_image": ["5", 0],
        },
    },
    "4": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": ckpt},
    },
    "5": {
        "class_type": "EmptyLatentImage",
        "inputs": {"width": 512, "height": 512, "batch_size": 1},
    },
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {
            "text": "a beautiful sunset over mountains, photorealistic",
            "clip": ["4", 1],
        },
    },
    "7": {
        "class_type": "CLIPTextEncode",
        "inputs": {
            "text": "ugly, blurry, low quality",
            "clip": ["4", 1],
        },
    },
    "8": {
        "class_type": "VAEDecode",
        "inputs": {"samples": ["3", 0], "vae": ["4", 2]},
    },
    "9": {
        "class_type": "SaveImage",
        "inputs": {"filename_prefix": "hermes", "images": ["8", 0]},
    },
}
pid = queue_prompt(workflow)
result = wait_for_completion(pid)
# Download every output image and save it locally
for node_id, node_output in result["outputs"].items():
    if "images" in node_output:
        for img in node_output["images"]:
            img_data = get_image(img["filename"], img["subfolder"], img["type"])
            with open(f"/tmp/{img['filename']}", "wb") as f:
                f.write(img_data)
            print(f"Saved: /tmp/{img['filename']}")
```
### Parameterized Generation
When the user asks to generate an image, build the workflow by modifying the template:
- **Prompt**: Set node "6" inputs.text to the user's positive prompt
- **Negative**: Set node "7" inputs.text (default: "ugly, blurry, low quality")
- **Model**: Set node "4" inputs.ckpt_name (use `list_models()` to find available ones)
- **Size**: Set node "5" inputs.width/height (SD 1.5: 512, SDXL: 1024, Flux: 1024)
- **Steps/CFG**: Set node "3" inputs.steps and inputs.cfg
- **Seed**: Set node "3" inputs.seed (random for variation, fixed for reproducibility)
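
The list above can be sketched as one function that patches a copy of the minimal txt2img template. `customize_txt2img` is a hypothetical name, and it assumes the node IDs of the minimal template ("3" sampler, "4" loader, "5" latent, "6"/"7" prompts); other workflows will need different IDs.

```python
import copy
import random

def customize_txt2img(template, prompt, negative="ugly, blurry, low quality",
                      ckpt_name=None, width=512, height=512,
                      steps=20, cfg=7.0, seed=None):
    """Return a modified copy of the minimal txt2img template."""
    wf = copy.deepcopy(template)  # never mutate the shared template
    wf["6"]["inputs"]["text"] = prompt
    wf["7"]["inputs"]["text"] = negative
    if ckpt_name is not None:
        wf["4"]["inputs"]["ckpt_name"] = ckpt_name
    wf["5"]["inputs"].update(width=width, height=height)
    # No seed given means the caller wants variation: draw a fresh one.
    if seed is None:
        seed = random.randint(0, 2**32 - 1)
    wf["3"]["inputs"].update(steps=steps, cfg=cfg, seed=seed)
    return wf
```

Copying first keeps the template reusable across requests; pass a fixed `seed` when the user wants to reproduce an earlier image.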
### Loading User Workflows
Users often have saved workflow JSON files. Two formats exist:
1. **API format**: a flat node dict, directly usable with `queue_prompt()`:
```python
with open("workflow_api.json") as f:
    workflow = json.load(f)
pid = queue_prompt(workflow)
```
2. **UI format**: includes visual layout and is NOT directly usable with
`queue_prompt()`. If the exported data contains a `"prompt"` key, use its value;
a plain UI export (top-level `nodes` and `links`) usually has none, so otherwise
ask the user to export as API format from ComfyUI's menu: Save (API Format).
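
A quick heuristic can tell the two formats apart before queueing. This is a sketch only: it assumes API format is a dict whose values all carry `class_type` and `inputs`, and that UI exports have a top-level `nodes` list plus `links`. Both function names are hypothetical.

```python
def looks_like_api_format(data):
    """True if every top-level value is a node dict with class_type and inputs."""
    if not isinstance(data, dict) or not data:
        return False
    return all(
        isinstance(node, dict)
        and "class_type" in node
        and isinstance(node.get("inputs"), dict)
        for node in data.values()
    )

def looks_like_ui_format(data):
    """True if the data has the top-level keys assumed for a UI export."""
    return (
        isinstance(data, dict)
        and isinstance(data.get("nodes"), list)
        and "links" in data
    )
```

If neither check passes, it is safer to ask the user how the file was exported than to guess.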
### Checking Available Nodes
```python
# List all available node types
info = comfy_api("GET", "/object_info")
print(f"Total node types: {len(info)}")
# Get info for a specific node
ksampler_info = comfy_api("GET", "/object_info/KSampler")
print(json.dumps(ksampler_info, indent=2)[:500])
```
## Queue Management
```python
# Check what's running/pending
status = get_queue_status()
running = status.get("queue_running", [])
pending = status.get("queue_pending", [])
print(f"Running: {len(running)}, Pending: {len(pending)}")
# Clear all pending prompts (does not stop the one currently running)
if pending:
    comfy_api("POST", "/queue", {"clear": True})
# Interrupt current generation
interrupt()
```
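To cancel a single pending prompt instead of clearing the whole queue, `POST /queue` also accepts a `delete` list. The sketch below extracts prompt IDs from a queue response; it assumes each queue item is a list whose second element is the prompt ID, which is an assumption about the response layout. Both helper names are hypothetical.

```python
def pending_prompt_ids(status):
    """Extract prompt IDs from the queue_pending list of a GET /queue response."""
    ids = []
    for item in status.get("queue_pending", []):
        # Assumed item layout: [number, prompt_id, ...]; skip anything else.
        if isinstance(item, (list, tuple)) and len(item) >= 2:
            ids.append(item[1])
    return ids

def delete_payload(prompt_ids):
    """Body for POST /queue that removes only the given pending prompts."""
    return {"delete": list(prompt_ids)}
```

With the helpers from the Core Pattern section this becomes `comfy_api("POST", "/queue", delete_payload([some_id]))`, where `some_id` comes from `pending_prompt_ids(get_queue_status())`.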
## Advanced: Native MCP Server Integration
For deeper integration with dedicated MCP tools, configure an external ComfyUI
MCP server in `~/.hermes/config.yaml`:
```yaml
mcp_servers:
comfyui:
command: "npx"
args: ["-y", "comfyui-mcp-server"]
env:
COMFYUI_URL: "http://127.0.0.1:8188"
```
This registers ComfyUI operations as native Hermes tools (prefixed `mcp_comfyui_*`).
See the `native-mcp` skill for MCP server configuration details.
## Pitfalls
1. **Python 3.10+ required**: ComfyUI's dependencies require Python 3.10+.
2. **API format vs UI format**: ComfyUI's regular Save produces UI format (with layout info).
Only API format works with POST /prompt. Use "Save (API Format)", or use the value of
the `"prompt"` key when the exported JSON has one (plain UI exports usually do not).
3. **Node IDs are strings**: Always use `"3"` not `3` in workflow dicts. Links
between nodes use `["source_node_id", output_index]` arrays.
4. **Model names must be exact**: Use `list_models("checkpoints")` to get the
exact filename including extension. Names are case-sensitive.
5. **Long generations**: Complex workflows (high steps, large images, video) can
take minutes. Set `wait_for_completion(timeout=600)` for heavy workloads.
6. **VRAM/memory exhaustion**: Large models + high resolution can OOM. Use
`comfy_api("POST", "/free", {"unload_models": True})` to free memory between
generations, or start ComfyUI with `--lowvram` / `--cpu` flags.
7. **Custom nodes**: Many workflows require custom nodes (ControlNet, IPAdapter,
AnimateDiff, etc.). If a workflow fails with "class_type not found", the user
needs to install the missing node pack via ComfyUI Manager or manually.
8. **Output path**: Generated images are saved in ComfyUI's `output/` directory.
Use `get_image()` to download them to a local path the user can access.
9. **Concurrent generations**: ComfyUI queues prompts sequentially by default.
Multiple `queue_prompt()` calls will queue, not parallelize.
10. **Sampler/scheduler compatibility**: Not all combinations work with all models.
Safe defaults — SD 1.5/SDXL: `euler` + `normal`, CFG 7.0.
Flux: `euler` + `simple`, CFG 1.0. SD3: `euler` + `sgm_uniform`, CFG 4.5.
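
The safe defaults in pitfall 10 can be kept in a small lookup table so the agent does not retype them. A minimal sketch: the family keys (`sd15`, `sdxl`, `flux`, `sd3`) and the function name are hypothetical labels chosen for this example, and the values are copied from the list above.

```python
SAMPLER_DEFAULTS = {
    "sd15": {"sampler_name": "euler", "scheduler": "normal", "cfg": 7.0},
    "sdxl": {"sampler_name": "euler", "scheduler": "normal", "cfg": 7.0},
    "flux": {"sampler_name": "euler", "scheduler": "simple", "cfg": 1.0},
    "sd3": {"sampler_name": "euler", "scheduler": "sgm_uniform", "cfg": 4.5},
}

def apply_sampler_defaults(workflow, family, sampler_node_id):
    """Copy the family's safe defaults into the given KSampler node, in place."""
    defaults = SAMPLER_DEFAULTS[family]  # KeyError on an unknown family is intentional
    workflow[sampler_node_id]["inputs"].update(defaults)
    return workflow
```

Other inputs on the sampler node (seed, steps, links) are left untouched, so this can run right before `queue_prompt()`.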

@@ -0,0 +1,97 @@
# ComfyUI REST API Reference
Default: `http://127.0.0.1:8188`. Cloud: `https://cloud.comfy.org/api`.
## Workflow Execution
### POST /prompt — Queue a workflow
```json
{
"prompt": { "<workflow nodes dict>" },
"client_id": "optional-uuid"
}
```
Response: `{"prompt_id": "uuid", "number": 1, "node_errors": {}}`
### GET /history/{prompt_id} — Single prompt history
Returns: `{ "prompt_id": { "prompt": [...], "outputs": {...}, "status": {...} } }`
Empty dict `{}` if not yet complete.
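A history entry appearing does not by itself mean the run succeeded. The sketch below classifies a response; it assumes the `status` object carries a `status_str` field with values such as `"success"` or `"error"`, which is an assumption about the payload. `prompt_state` is a hypothetical helper.

```python
def prompt_state(history, prompt_id):
    """Classify a GET /history/{prompt_id} response as pending, success, or error."""
    entry = history.get(prompt_id)
    if entry is None:
        return "pending"  # empty dict: the prompt has not finished yet
    status = entry.get("status") or {}
    # Assumed field: status_str is "error" when execution failed.
    if status.get("status_str") == "error":
        return "error"
    return "success"
```

A polling loop can keep waiting on `"pending"` and stop with a clear message on `"error"` instead of looking for outputs that were never produced.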
### GET /history — All execution history
Query params: `?max_items=200&offset=0`
### POST /interrupt — Stop current generation
### GET /queue — Queue status
Returns: `{"queue_running": [...], "queue_pending": [...]}`
### POST /queue — Manage queue
Body: `{"clear": true}` to clear all, or `{"delete": ["prompt_id1", ...]}`.
## Images
### GET /view — Download image
Query params: `filename` (required), `type` (`output`|`input`|`temp`), `subfolder`.
### POST /upload/image — Upload image
Multipart form: `image` (file), `type` (`input`), `subfolder`, `overwrite` (`true`|`false`).
Response: `{"name": "filename.png", "subfolder": "", "type": "input"}`
## Node/Model Information
### GET /object_info — All node types
Returns every registered node with inputs, outputs, types, defaults, category.
### GET /object_info/{class_type} — Single node info
### GET /models/{folder} — List models
Folders: `checkpoints`, `loras`, `vae`, `controlnet`, `clip`, `clip_vision`,
`upscale_models`, `embeddings`, `unet`, `diffusion_models`.
Returns: array of filename strings.
## System
### GET /system_stats — System information
Returns: OS, Python version, PyTorch version, VRAM per device, RAM total/free.
### POST /free — Free memory
Body: `{"unload_models": true, "free_memory": true}`
## Workflow JSON Format (API Format)
```json
{
  "node_id_string": {
    "class_type": "NodeClassName",
    "inputs": {
      "param_name": "value",
      "linked_input": ["source_node_id", output_index]
    }
  }
}
```
- Node IDs are **strings** (`"3"`, not `3`)
- Links use `["node_id", output_index]` arrays (0-based int)
- `class_type` must match a registered node exactly (case-sensitive)
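
The three rules above can be checked locally before a workflow is sent. This is a rough sketch with a hypothetical `validate_workflow` helper: it treats any two-item list ending in an integer as a link, so a literal list input of that shape would be misread, and it does not check `class_type` against the server.

```python
def validate_workflow(workflow):
    """Return a list of structural problems; an empty list means the shape looks fine."""
    problems = []
    for node_id, node in workflow.items():
        if not isinstance(node_id, str):
            problems.append(f"node id {node_id!r} is not a string")
        if not isinstance(node, dict) or "class_type" not in node:
            problems.append(f"node {node_id!r} has no class_type")
            continue
        for name, value in node.get("inputs", {}).items():
            # A link is assumed to be [source_node_id, output_index].
            if isinstance(value, list) and len(value) == 2 and isinstance(value[1], int):
                src = value[0]
                if not isinstance(src, str) or src not in workflow:
                    problems.append(
                        f"node {node_id!r} input {name!r} links to missing node {src!r}"
                    )
    return problems
```

Running this first catches integer node IDs and dangling links without a round trip to the server.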
## WebSocket (real-time progress)
Connect to: `ws://host:8188/ws?clientId={uuid}`
Key events: `execution_start`, `executing` (null = done), `progress`, `execution_success`, `execution_error`.
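Each text frame on the socket is a JSON object. The sketch below maps one frame to a short status string without depending on any WebSocket library. It assumes messages have `type` and `data` keys, that `progress` carries `value` and `max`, and that `execution_error` carries `exception_message`; these field names are assumptions, and `interpret_ws_message` is a hypothetical helper.

```python
import json

def interpret_ws_message(raw, prompt_id):
    """Map one WebSocket frame to a short status string, or None to ignore it."""
    if isinstance(raw, (bytes, bytearray)):
        return None  # binary frames (assumed to be preview images) are skipped
    msg = json.loads(raw)
    kind = msg.get("type")
    data = msg.get("data") or {}
    if data.get("prompt_id") not in (None, prompt_id):
        return None  # event belongs to another prompt
    if kind == "progress":
        return f"step {data.get('value')}/{data.get('max')}"
    if kind == "executing":
        # node == null signals that the prompt has finished executing
        return "done" if data.get("node") is None else f"running node {data['node']}"
    if kind == "execution_success":
        return "done"
    if kind == "execution_error":
        return f"error: {data.get('exception_message', 'unknown')}"
    return None
```

Feed it every frame received after connecting with the same `client_id` that was passed to `queue_prompt()`, and stop reading once it returns `"done"` or an error string.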

@@ -0,0 +1,121 @@
# ComfyUI Workflow Recipes
Ready-to-use workflow templates. Always call `list_models("checkpoints")` first
to discover the exact checkpoint filename on the user's system.
## SDXL Text-to-Image
```python
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "SDXL_CHECKPOINT_HERE"},
    },
    "2": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "POSITIVE PROMPT", "clip": ["1", 1]},
    },
    "3": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "ugly, blurry, low quality, deformed", "clip": ["1", 1]},
    },
    "4": {
        "class_type": "EmptyLatentImage",
        "inputs": {"width": 1024, "height": 1024, "batch_size": 1},
    },
    "5": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 0, "steps": 25, "cfg": 7.0,
            "sampler_name": "euler_ancestral", "scheduler": "normal",
            "denoise": 1.0,
            "model": ["1", 0], "positive": ["2", 0],
            "negative": ["3", 0], "latent_image": ["4", 0],
        },
    },
    "6": {
        "class_type": "VAEDecode",
        "inputs": {"samples": ["5", 0], "vae": ["1", 2]},
    },
    "7": {
        "class_type": "SaveImage",
        "inputs": {"filename_prefix": "hermes_sdxl", "images": ["6", 0]},
    },
}
```
SDXL sizes: 1024×1024, 1152×896, 896×1152. Steps 20-30. CFG 5-9.
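When the user asks for an arbitrary size, one option is to snap the request to the nearest listed SDXL size by aspect ratio. A small sketch; `closest_sdxl_size` is a hypothetical helper and only knows the three sizes named above.

```python
SDXL_SIZES = [(1024, 1024), (1152, 896), (896, 1152)]

def closest_sdxl_size(width, height):
    """Pick the listed SDXL size whose aspect ratio is nearest the requested one."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    target = width / height
    return min(SDXL_SIZES, key=lambda wh: abs(wh[0] / wh[1] - target))
```

The returned pair can be written straight into node "4" (`EmptyLatentImage`) of the recipe.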
## Image-to-Image
```python
# Upload the input image first
result = upload_image("/path/to/input.png")
input_name = result["name"]
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "CHECKPOINT"}},
    "2": {"class_type": "LoadImage", "inputs": {"image": input_name}},
    "3": {"class_type": "VAEEncode", "inputs": {"pixels": ["2", 0], "vae": ["1", 2]}},
    "4": {"class_type": "CLIPTextEncode", "inputs": {"text": "POSITIVE", "clip": ["1", 1]}},
    "5": {"class_type": "CLIPTextEncode", "inputs": {"text": "ugly, blurry", "clip": ["1", 1]}},
    "6": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 0, "steps": 20, "cfg": 7.0,
            "sampler_name": "euler", "scheduler": "normal",
            "denoise": 0.6,
            "model": ["1", 0], "positive": ["4", 0],
            "negative": ["5", 0], "latent_image": ["3", 0],
        },
    },
    "7": {"class_type": "VAEDecode", "inputs": {"samples": ["6", 0], "vae": ["1", 2]}},
    "8": {"class_type": "SaveImage", "inputs": {"filename_prefix": "hermes_img2img", "images": ["7", 0]}},
}
```
Key: **denoise** (0.3 = subtle, 0.6 = moderate, 0.9 = heavy changes).
## Flux Text-to-Image
Flux uses separate UNET/CLIP/VAE loaders (not CheckpointLoaderSimple).
```python
workflow = {
    "1": {"class_type": "UNETLoader", "inputs": {"unet_name": "FLUX_UNET", "weight_dtype": "default"}},
    "2": {"class_type": "DualCLIPLoader", "inputs": {"clip_name1": "T5_CLIP", "clip_name2": "CLIP_L", "type": "flux"}},
    "3": {"class_type": "VAELoader", "inputs": {"vae_name": "VAE_NAME"}},
    "4": {"class_type": "CLIPTextEncode", "inputs": {"text": "PROMPT", "clip": ["2", 0]}},
    "5": {"class_type": "EmptySD3LatentImage", "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "6": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 0, "steps": 20, "cfg": 1.0,
            "sampler_name": "euler", "scheduler": "simple",
            "denoise": 1.0,
            "model": ["1", 0], "positive": ["4", 0],
            "negative": ["4", 0], "latent_image": ["5", 0],
        },
    },
    "7": {"class_type": "VAEDecode", "inputs": {"samples": ["6", 0], "vae": ["3", 0]}},
    "8": {"class_type": "SaveImage", "inputs": {"filename_prefix": "hermes_flux", "images": ["7", 0]}},
}
```
Flux: CFG 1.0, `euler` + `simple`. Negative prompt has minimal effect.
## Execution Pattern (all recipes)
```python
pid = queue_prompt(workflow)
result = wait_for_completion(pid, timeout=300)
for node_id, node_output in result["outputs"].items():
    if "images" in node_output:
        for img_info in node_output["images"]:
            data = get_image(img_info["filename"], img_info["subfolder"], img_info["type"])
            local_path = f"/tmp/{img_info['filename']}"
            with open(local_path, "wb") as f:
                f.write(data)
            print(f"Saved: {local_path} ({len(data)} bytes)")
```
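The same pattern can be wrapped in a reusable function that returns the saved paths. A sketch under stated assumptions: `save_outputs` is a hypothetical helper, and the download function is passed in as `fetch` (in practice `get_image`) so the logic can be exercised without a running server.

```python
import os
import tempfile

def save_outputs(result, fetch, out_dir=None):
    """Write every image listed in a history entry to out_dir; return the local paths."""
    out_dir = out_dir or tempfile.mkdtemp(prefix="comfy_")
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for node_output in result.get("outputs", {}).values():
        for img in node_output.get("images", []):
            data = fetch(img["filename"], img.get("subfolder", ""), img.get("type", "output"))
            # basename() guards against a filename that tries to escape out_dir.
            local_path = os.path.join(out_dir, os.path.basename(img["filename"]))
            with open(local_path, "wb") as f:
                f.write(data)
            saved.append(local_path)
    return saved
```

Typical use would be `paths = save_outputs(wait_for_completion(pid), get_image)`, after which the agent can report the paths back to the user.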