Mirror of https://github.com/NousResearch/hermes-agent.git, synced
2026-05-03 17:27:37 +08:00
The audit of v4.1 surfaced ~70 issues across the five scripts and three
reference docs — most of them user-visible (silent file overwrites, error
status misclassified as success, X-API-Key leaked to S3 on /api/view
redirect, Cloud endpoints that 404 because they were renamed). v5.0.0 fixes
those and fills the gaps that previously forced users to write their own
glue (WebSocket monitoring, batch/sweep, img2img upload helper, dependency
auto-fix, log fetch, health check, example workflows).
Critical fixes
- run_workflow.py: poll_status now checks status_str==error BEFORE
completed:true, so a failed run no longer reports success
- run_workflow.py: download_output streams to disk via safe_path_join,
preserves server subfolder structure (no silent overwrites), and
retries with exponential backoff
- run_workflow.py: refuses to overwrite a link with a literal in
inject_params (would silently break wiring)
- _common.py: _StripSensitiveOnRedirectSession (a requests.Session subclass
  that overrides rebuild_auth) drops X-API-Key/Cookie on cross-host
  redirects — fixes a real key-leak path through Cloud's signed-URL
  download flow; covered by tests
- Cloud routing (verified live): /history → /history_v2,
/models/<f> → /experiment/models/<f>, plus folder aliases for the
unet ↔ diffusion_models and clip ↔ text_encoders rename
- check_deps.py: distinguishes 200/empty vs 404 folder_not_found vs
403 free-tier; emits concrete fix_command per missing dep
- extract_schema.py: prompt vs negative_prompt determined by tracing
KSampler.{positive,negative} connections (incl. through Reroute /
Primitive nodes) instead of meta-title heuristic; symmetric
duplicate-name resolution; cycle-safe trace_to_node
- hardware_check.py: multi-GPU pick-best, Apple variant detection,
Rosetta detection, WSL2, ROCm --json, disk-space check, optional
PyTorch probe; powershell preferred over deprecated wmic
- comfyui_setup.sh: prefers pipx → uvx → pip --user (with PEP-668
fallback); idempotent — skips relaunch if server already up;
configurable port/workspace; persistent log; SIGINT trap
New scripts
- run_batch.py — count or sweep (cartesian product), parallel up to
cloud tier limit
- ws_monitor.py — real-time WebSocket viewer; saves preview frames
- auto_fix_deps.py — runs comfy node install / model download for
whatever check_deps reports missing (with --dry-run)
- health_check.py — single command that runs the verification checklist
(comfy-cli + server + checkpoints + optional smoke test that cancels
itself to avoid burning compute)
- fetch_logs.py — pull traceback / status messages for a prompt_id
Coverage expansion
- Param patterns now cover Flux (BasicScheduler, BasicGuider,
RandomNoise, ModelSamplingFlux), SD3, Wan/Hunyuan/LTX video,
IPAdapter, rgthree, easy-use, AnimateDiff
- Embedding refs in CLIPTextEncode strings extracted as model deps
- ckpt_name / vae_name / lora_name / unet_name now controllable so
workflows can be retargeted per run
Examples
- workflows/{sd15,sdxl,flux_dev}_txt2img.json
- workflows/sdxl_{img2img,inpaint}.json
- workflows/upscale_4x.json
- workflows/{animatediff_video,wan_video_t2v}.json + README
Tests
- 117 tests (105 unit + 8 cloud integration + 4 cross-host security)
- Cloud tests auto-skip without COMFY_CLOUD_API_KEY; verified end-to-end
against live cloud API
Backwards compatibility
- All existing CLI flags continue to work; new behavior is opt-in
(--ws, --input-image, --randomize-seed, --flat-output, etc.)
227 lines · 7.8 KiB · Markdown
# ComfyUI Workflow JSON Format

## Two Formats — Only API Format Is Executable

**API format** is required for `/api/prompt` and every script in this skill.
The web UI also produces an "editor format" used for visual editing, which
**cannot** be submitted directly.

### API Format

Top-level keys are string node IDs. Each node has `class_type` and `inputs`:

```json
{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "_meta": {"title": "KSampler"}
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
  }
}
```

**Detection:** every top-level value has `class_type`. The skill's
`_common.is_api_format()` does this check.

### Editor Format (not directly executable)

Has `nodes[]` and `links[]` arrays — the visual graph. To convert: open in
ComfyUI's web UI and use **Workflow → Export (API)** (newer UI) or the
"Save (API Format)" button (older UI).

**Detection:** top-level has `"nodes"` and `"links"` keys.
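
The two detection rules can be sketched as follows. This is illustrative
only; the skill's real check is `_common.is_api_format()`, and the function
names here are hypothetical:

```python
def looks_like_api_format(obj) -> bool:
    """API format: a non-empty dict whose values all carry class_type."""
    return (
        isinstance(obj, dict)
        and bool(obj)
        and all(isinstance(v, dict) and "class_type" in v for v in obj.values())
    )


def looks_like_editor_format(obj) -> bool:
    """Editor exports carry the visual graph as nodes[] / links[] arrays."""
    return isinstance(obj, dict) and "nodes" in obj and "links" in obj
```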

## Inputs: Literals vs Links

```json
"inputs": {
  "text": "a cat",      // literal — modifiable
  "seed": 42,           // literal — modifiable
  "clip": ["4", 1]      // link — wiring; do NOT overwrite
}
```

Links are length-2 arrays of `[upstream_node_id, output_slot]`. The skill's
parameter injector refuses to overwrite a link with a literal (logs a
warning and skips).

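The link guard can be sketched like this. The function name is hypothetical;
the real injector lives in the skill's scripts:

```python
import logging


def inject_param(workflow: dict, node_id: str, field: str, value) -> bool:
    """Set a literal input, but never clobber a link ([node_id, slot] pair)."""
    current = workflow[node_id]["inputs"].get(field)
    # A link is a length-2 list whose first element is an upstream node ID string
    is_link = (
        isinstance(current, list)
        and len(current) == 2
        and isinstance(current[0], str)
    )
    if is_link:
        logging.warning("refusing to overwrite link %s.%s = %r", node_id, field, current)
        return False
    workflow[node_id]["inputs"][field] = value
    return True
```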
## Common Node Types and Their Controllable Parameters

The full catalog lives in `scripts/_common.py` (`PARAM_PATTERNS` and
`MODEL_LOADERS`). Highlights:

### Text Prompts

| Node Class | Key Fields |
|------------|------------|
| `CLIPTextEncode` | `text` |
| `CLIPTextEncodeSDXL` | `text_g`, `text_l`, `width`, `height` |
| `CLIPTextEncodeFlux` | `clip_l`, `t5xxl`, `guidance` |

To distinguish positive from negative, the skill traces `KSampler.negative`
back through Reroute / Primitive nodes to the source CLIPTextEncode, falling
back to `_meta.title` heuristics ("negative", "neg", "anti").

### Sampling

| Node Class | Key Fields |
|------------|------------|
| `KSampler` | `seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise` |
| `KSamplerAdvanced` | `noise_seed`, `steps`, `cfg`, `start_at_step`, `end_at_step` |
| `SamplerCustom` | `noise_seed`, `cfg`, `sampler`, `sigmas` |
| `SamplerCustomAdvanced` | `noise_seed` (via RandomNoise input) |
| `RandomNoise` | `noise_seed` |
| `BasicScheduler` | `steps`, `scheduler`, `denoise` |
| `KSamplerSelect` | `sampler_name` |
| `BasicGuider` / `CFGGuider` | `cfg` |
| `ModelSamplingFlux` | `max_shift`, `base_shift`, `width`, `height` |
| `SDTurboScheduler` | `steps`, `denoise` |

### Latent / Dimensions

| Node Class | Key Fields |
|------------|------------|
| `EmptyLatentImage` | `width`, `height`, `batch_size` |
| `EmptySD3LatentImage` | `width`, `height`, `batch_size` |
| `EmptyHunyuanLatentVideo` | `width`, `height`, `length`, `batch_size` |
| `EmptyMochiLatentVideo` | `width`, `height`, `length`, `batch_size` |
| `EmptyLTXVLatentVideo` | `width`, `height`, `length`, `batch_size` |

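Since latent width/height must be multiples of 8 (see the heuristics later in
this doc), a pre-submit guard can catch bad values before the server does. A
hypothetical helper, not part of the skill:

```python
def check_latent_dims(width: int, height: int) -> None:
    """Reject latent dimensions that are not multiples of 8."""
    for name, value in (("width", width), ("height", height)):
        if value % 8 != 0:
            raise ValueError(f"{name}={value} is not a multiple of 8")
```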
### Model Loading

| Node Class | Key Fields | Folder |
|------------|------------|--------|
| `CheckpointLoaderSimple` | `ckpt_name` | `checkpoints` |
| `LoraLoader` | `lora_name`, `strength_model`, `strength_clip` | `loras` |
| `LoraLoaderModelOnly` | `lora_name`, `strength_model` | `loras` |
| `VAELoader` | `vae_name` | `vae` |
| `ControlNetLoader` | `control_net_name` | `controlnet` |
| `CLIPLoader` | `clip_name` | `clip` |
| `DualCLIPLoader` | `clip_name1`, `clip_name2` | `clip` |
| `TripleCLIPLoader` | `clip_name1/2/3` | `clip` |
| `UNETLoader` | `unet_name` | `unet` |
| `DiffusionModelLoader` | `model_name` | `diffusion_models` |
| `UpscaleModelLoader` | `model_name` | `upscale_models` |
| `IPAdapterModelLoader` | `ipadapter_file` | `ipadapter` |
| `ADE_AnimateDiffLoaderWithContext` | `model_name`, `motion_scale` | `animatediff_models` |

### Image Input/Output

| Node Class | Key Fields |
|------------|------------|
| `LoadImage` | `image` (server-side filename, after upload) |
| `LoadImageMask` | `image`, `channel` (`red` / `green` / `blue` / `alpha`) |
| `VAEEncode` / `VAEDecode` | (no controllable fields) |
| `VAEEncodeForInpaint` | `grow_mask_by` |
| `SaveImage` | `filename_prefix` |
| `VHS_VideoCombine` | `frame_rate`, `format`, `filename_prefix`, `loop_count`, `pingpong` |

### ControlNet

| Node Class | Key Fields |
|------------|------------|
| `ControlNetApply` | `strength` |
| `ControlNetApplyAdvanced` | `strength`, `start_percent`, `end_percent` |

### IPAdapter (community pack `comfyui_ipadapter_plus`)

| Node Class | Key Fields |
|------------|------------|
| `IPAdapterAdvanced` | `weight`, `start_at`, `end_at` |
| `IPAdapter` | `weight` |

### Embeddings (referenced inside prompt strings)

ComfyUI scans prompt text for `embedding:NAME` syntax. The skill's
`_common.iter_embedding_refs()` extracts these as model dependencies.

```text
"a beautiful cat, embedding:goodvibes:1.2, embedding:art-style"
```

`extract_schema.py` and `check_deps.py` surface these in
`embedding_dependencies` / `missing_embeddings`.

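The extraction can be sketched with a regex approximating the
`embedding:NAME[:weight]` syntax. The regex is an assumption for
illustration; the real `_common.iter_embedding_refs()` may differ:

```python
import re

# Assumed pattern: name chars, then an optional :weight suffix, then a
# delimiter (comma, whitespace, or end of string)
EMBEDDING_RE = re.compile(r"embedding:([A-Za-z0-9_\-.]+?)(?::[\d.]+)?(?=[,\s]|$)")


def iter_embedding_refs(text: str):
    """Yield embedding names referenced in a prompt string."""
    for match in EMBEDDING_RE.finditer(text):
        yield match.group(1)
```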
## Parameter Injection Pattern

```python
import copy
import json

# Load the exported API-format workflow
with open("workflow_api.json") as f:
    workflow = json.load(f)

# Mutate a deep copy so the loaded template stays pristine
wf = copy.deepcopy(workflow)
wf["6"]["inputs"]["text"] = "a beautiful sunset"  # positive prompt
wf["7"]["inputs"]["text"] = "ugly, blurry"        # negative prompt
wf["3"]["inputs"]["seed"] = 42
wf["3"]["inputs"]["steps"] = 30
wf["5"]["inputs"]["width"] = 1024
wf["5"]["inputs"]["height"] = 1024
```

`scripts/extract_schema.py` automates discovering which node IDs/fields
correspond to which user-facing parameters. It returns a `parameters` dict
that `run_workflow.py` reads to inject values from `--args`.

## Identifying Controllable Parameters (Heuristics)

For unknown workflows:

1. **Prompt text** — any `CLIPTextEncode.text`. Use connection tracing back
   from `KSampler.positive` / `.negative` to disambiguate (don't trust
   meta-title alone).
2. **Seed** — `KSampler.seed` / `KSamplerAdvanced.noise_seed` / `RandomNoise.noise_seed`.
3. **Dimensions** — `Empty*LatentImage.width/height` (must be multiples of 8).
4. **Steps / CFG** — `KSampler.steps`, `KSampler.cfg`. Steps 20–50 typical.
   CFG 5–15 typical (Flux uses guidance, not CFG).
5. **Model / checkpoint** — `CheckpointLoaderSimple.ckpt_name`. Filename must
   match an installed file *exactly*.
6. **LoRA** — `LoraLoader.lora_name`, `.strength_model`.
7. **Images for img2img / inpaint** — `LoadImage.image`. Server-side filename
   after upload.
8. **Denoise** — `KSampler.denoise`. 0.0–1.0; 1.0 = ignore input image,
   0.0 = pass through. Sweet spot for img2img: 0.4–0.7.

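The seed heuristic can be applied mechanically. A hypothetical mini-scanner,
with the class-to-field map taken from the sampling table earlier in this doc:

```python
# Seed field per node class, per the sampling table
SEED_FIELDS = {
    "KSampler": "seed",
    "KSamplerAdvanced": "noise_seed",
    "RandomNoise": "noise_seed",
}


def find_seed_params(workflow: dict):
    """Return (node_id, field) pairs whose value a caller may randomize."""
    hits = []
    for node_id, node in workflow.items():
        field = SEED_FIELDS.get(node.get("class_type", ""))
        if field and field in node.get("inputs", {}):
            hits.append((node_id, field))
    return hits
```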
## Output Nodes

Output is produced by these node types. The skill's `OUTPUT_NODES` set
extends to common community packs.

| Node | Output Key | Content |
|------|-----------|---------|
| `SaveImage` | `images` | List of `{filename, subfolder, type}` |
| `PreviewImage` | `images` | Temporary preview (not saved) |
| `VHS_VideoCombine` | `gifs` (older) or `videos`/`video` (newer cloud) | Video file refs |
| `SaveAudio` | `audio` | Audio file refs |
| `SaveAnimatedWEBP` / `SaveAnimatedPNG` | `images` | Animated images |
| `Save3D` | `3d` | 3D asset refs |

After execution, fetch outputs from `/history/{prompt_id}` (local) or
`/api/jobs/{prompt_id}` (cloud) → `outputs` → `{node_id}` → `{key}`.

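A minimal sketch of walking the outputs structure, assuming the local
`/history/{prompt_id}` response is keyed by prompt_id (the function name and
the subset of output keys checked are illustrative):

```python
def list_output_files(history: dict, prompt_id: str):
    """Collect (node_id, filename, subfolder) triples from a history payload."""
    outputs = history.get(prompt_id, {}).get("outputs", {})
    files = []
    for node_id, node_out in outputs.items():
        # Check a subset of the output keys from the table above
        for key in ("images", "gifs", "videos", "audio"):
            for ref in node_out.get(key, []):
                files.append((node_id, ref.get("filename"), ref.get("subfolder", "")))
    return files
```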
## Wrapper Variants

Some saved JSON files wrap the workflow under a `"prompt"` key (matching
the `/api/prompt` payload shape). The skill's `_common.unwrap_workflow()`
handles this — pass any of:

- raw API format: `{"3": {...}, "4": {...}}`
- wrapped: `{"prompt": {"3": {...}}, "client_id": "..."}`

It rejects editor format with a clear error and a re-export instruction.
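
The unwrap behavior can be sketched as follows; this is illustrative, and
`_common.unwrap_workflow()` is the real implementation:

```python
def unwrap_workflow(data: dict) -> dict:
    """Accept raw API format or a {"prompt": ...} wrapper; reject editor exports."""
    if "nodes" in data and "links" in data:
        # Editor format: cannot be executed, must be re-exported from the web UI
        raise ValueError(
            "Editor-format workflow: re-export via Workflow -> Export (API)"
        )
    if isinstance(data.get("prompt"), dict):
        return data["prompt"]
    return data
```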