hermes-agent/skills/creative/comfyui/references/workflow-format.md
SHL0MS a7780fe05f fix(skills/comfyui): bug fixes, cloud parity, expanded coverage, examples, tests
The audit of v4.1 surfaced ~70 issues across the five scripts and three
reference docs — most user-visible (silent file overwrites, status-error
misclassified as success, X-API-Key leaked to S3 on /api/view redirect,
Cloud endpoints that 404 because they were renamed). v5.0.0 fixes those
and fills the gaps that previously forced users to write their own glue
(WebSocket monitoring, batch/sweep, img2img upload helper, dep auto-fix,
log fetch, health check, example workflows).

Critical fixes
- run_workflow.py: poll_status now checks status_str==error BEFORE
  completed:true, so a failed run no longer reports success
- run_workflow.py: download_output streams to disk via safe_path_join,
  preserves server subfolder structure (no silent overwrites), and
  retries with exponential backoff
- run_workflow.py: refuses to overwrite a link with a literal in
  inject_params (would silently break wiring)
- _common.py: _StripSensitiveOnRedirectSession (subclasses
  requests.Session.rebuild_auth) drops X-API-Key/Cookie on cross-host
  redirects — fixes a real key-leak path through Cloud's signed-URL
  download flow. Tested
- Cloud routing (verified live): /history → /history_v2,
  /models/<f> → /experiment/models/<f>, plus folder aliases for the
  unet ↔ diffusion_models and clip ↔ text_encoders rename
- check_deps.py: distinguishes 200/empty vs 404 folder_not_found vs
  403 free-tier; emits concrete fix_command per missing dep
- extract_schema.py: prompt vs negative_prompt determined by tracing
  KSampler.{positive,negative} connections (incl. through Reroute /
  Primitive nodes) instead of meta-title heuristic; symmetric
  duplicate-name resolution; cycle-safe trace_to_node
- hardware_check.py: multi-GPU pick-best, Apple variant detection,
  Rosetta detection, WSL2, ROCm --json, disk-space check, optional
  PyTorch probe; powershell preferred over deprecated wmic
- comfyui_setup.sh: prefers pipx → uvx → pip --user (with PEP-668
  fallback); idempotent — skips relaunch if server already up;
  configurable port/workspace; persistent log; SIGINT trap

New scripts
- run_batch.py — count or sweep (cartesian product), parallel up to
  cloud tier limit
- ws_monitor.py — real-time WebSocket viewer; saves preview frames
- auto_fix_deps.py — runs comfy node install / model download for
  whatever check_deps reports missing (with --dry-run)
- health_check.py — single command that runs the verification checklist
  (comfy-cli + server + checkpoints + optional smoke test that cancels
  itself to avoid burning compute)
- fetch_logs.py — pull traceback / status messages for a prompt_id

Coverage expansion
- Param patterns now cover Flux (BasicScheduler, BasicGuider,
  RandomNoise, ModelSamplingFlux), SD3, Wan/Hunyuan/LTX video,
  IPAdapter, rgthree, easy-use, AnimateDiff
- Embedding refs in CLIPTextEncode strings extracted as model deps
- ckpt_name / vae_name / lora_name / unet_name now controllable so
  workflows can be retargeted per run

Examples
- workflows/{sd15,sdxl,flux_dev}_txt2img.json
- workflows/sdxl_{img2img,inpaint}.json
- workflows/upscale_4x.json
- workflows/{animatediff_video,wan_video_t2v}.json + README

Tests
- 117 tests (105 unit + 8 cloud integration + 4 cross-host security)
- Cloud tests auto-skip without COMFY_CLOUD_API_KEY; verified end-to-end
  against live cloud API

Backwards compatibility
- All existing CLI flags continue to work; new behavior is opt-in
  (--ws, --input-image, --randomize-seed, --flat-output, etc.)
2026-04-29 20:48:01 -07:00


ComfyUI Workflow JSON Format

Two Formats — Only API Format Is Executable

API format is required for /api/prompt and every script in this skill. The web UI also produces an "editor format" used for visual editing, which cannot be submitted directly.

API Format

Top-level keys are string node IDs. Each node has class_type and inputs:

{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "_meta": {"title": "KSampler"}
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
  }
}

Detection: every top-level value has class_type. The skill's _common.is_api_format() does this check.
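The detection rule can be sketched as follows. This is an illustrative reimplementation, not the skill's actual `_common.is_api_format()`; the function name `detect_format` and its return labels are hypothetical. It also recognizes editor format by its top-level `nodes` and `links` keys.

```python
def detect_format(data):
    """Classify parsed workflow JSON as 'api', 'editor', or 'unknown'.

    Hypothetical helper for illustration only.
    """
    if not isinstance(data, dict) or not data:
        return "unknown"
    # Editor format: the visual graph, stored as nodes[] and links[] arrays.
    if "nodes" in data and "links" in data:
        return "editor"
    # API format: every top-level value is a node dict carrying class_type.
    if all(isinstance(v, dict) and "class_type" in v for v in data.values()):
        return "api"
    return "unknown"


api_example = {
    "4": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"},
    }
}
editor_example = {"nodes": [], "links": []}

print(detect_format(api_example))     # api
print(detect_format(editor_example))  # editor
```

Checking for editor format first keeps the error message specific: a file with `nodes` and `links` is a known, convertible shape rather than simply "not API format".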

Editor Format (not directly executable)

Has nodes[] and links[] arrays — the visual graph. To convert: open in ComfyUI's web UI and use Workflow → Export (API) (newer UI) or the "Save (API Format)" button (older UI).

Detection: top-level has "nodes" and "links" keys.

Inside a node's inputs, each value is either a literal or a link to another node's output:

"inputs": {
  "text": "a cat",         // literal — modifiable
  "seed": 42,              // literal — modifiable
  "clip": ["4", 1]         // link — wiring; do NOT overwrite
}

Links are length-2 arrays of [upstream_node_id, output_slot]. The skill's parameter injector refuses to overwrite a link with a literal (logs a warning and skips).
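A minimal sketch of that guard, assuming node IDs in links are strings and output slots are integers (as in the examples above). The names `is_link` and `set_literal` are hypothetical and are not the skill's own functions.

```python
def is_link(value):
    """True for a [upstream_node_id, output_slot] pair."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and isinstance(value[1], int)
        and not isinstance(value[1], bool)
    )


def set_literal(workflow, node_id, field, value):
    """Set a literal input; refuse (and warn) if the field is currently a link."""
    inputs = workflow[node_id]["inputs"]
    if is_link(inputs.get(field)):
        print(f"warning: {node_id}.{field} is a link; skipping")
        return False
    inputs[field] = value
    return True


wf = {
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "a cat", "clip": ["4", 1]},
    }
}
set_literal(wf, "6", "text", "a dog")      # applied
set_literal(wf, "6", "clip", "oops")       # skipped with a warning
```

Skipping rather than raising matches the behaviour described above; a stricter caller could raise on the `False` return instead.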

Common Node Types and Their Controllable Parameters

The full catalog lives in scripts/_common.py (PARAM_PATTERNS and MODEL_LOADERS). Highlights:

Text Prompts

| Node Class | Key Fields |
|---|---|
| CLIPTextEncode | text |
| CLIPTextEncodeSDXL | text_g, text_l, width, height |
| CLIPTextEncodeFlux | clip_l, t5xxl, guidance |

To distinguish positive from negative prompts, the skill traces KSampler.positive and KSampler.negative back through Reroute / Primitive nodes to the source CLIPTextEncode. If tracing fails, it falls back to _meta.title heuristics ("negative", "neg", "anti").
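The tracing step might look like the sketch below. It is an illustration under stated assumptions, not the skill's `trace_to_node`: the function name `trace_to_source` is hypothetical, the pass-through class names in `PASS_THROUGH` are assumed, and a pass-through node is assumed to forward its first linked input.

```python
# Assumed class names for nodes that only forward their input.
PASS_THROUGH = {"Reroute", "PrimitiveNode"}


def trace_to_source(workflow, link, target_class="CLIPTextEncode"):
    """Follow a link upstream until a target_class node is found.

    Returns the node ID, or None if the chain ends, hits an unknown
    node type, or loops back on itself.
    """
    seen = set()
    while isinstance(link, list) and len(link) == 2:
        node_id = str(link[0])
        if node_id in seen or node_id not in workflow:
            return None  # cycle or dangling reference
        seen.add(node_id)
        node = workflow[node_id]
        if node.get("class_type") == target_class:
            return node_id
        if node.get("class_type") not in PASS_THROUGH:
            return None
        # Continue through the first linked input of the pass-through node.
        link = next(
            (v for v in node.get("inputs", {}).values()
             if isinstance(v, list) and len(v) == 2),
            None,
        )
    return None


wf = {
    "3": {"class_type": "KSampler",
          "inputs": {"positive": ["6", 0], "negative": ["10", 0]}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a cat"}},
    "7": {"class_type": "CLIPTextEncode", "inputs": {"text": "blurry"}},
    "10": {"class_type": "Reroute", "inputs": {"": ["7", 0]}},
}
sampler = wf["3"]["inputs"]
print(trace_to_source(wf, sampler["positive"]))  # 6
print(trace_to_source(wf, sampler["negative"]))  # 7
```

The `seen` set is what makes the walk cycle-safe: a malformed graph with two reroutes pointing at each other returns `None` instead of looping forever.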

Sampling

| Node Class | Key Fields |
|---|---|
| KSampler | seed, steps, cfg, sampler_name, scheduler, denoise |
| KSamplerAdvanced | noise_seed, steps, cfg, start_at_step, end_at_step |
| SamplerCustom | noise_seed, cfg, sampler, sigmas |
| SamplerCustomAdvanced | noise_seed (via RandomNoise input) |
| RandomNoise | noise_seed |
| BasicScheduler | steps, scheduler, denoise |
| KSamplerSelect | sampler_name |
| BasicGuider / CFGGuider | cfg |
| ModelSamplingFlux | max_shift, base_shift, width, height |
| SDTurboScheduler | steps, denoise |

Latent / Dimensions

| Node Class | Key Fields |
|---|---|
| EmptyLatentImage | width, height, batch_size |
| EmptySD3LatentImage | width, height, batch_size |
| EmptyHunyuanLatentVideo | width, height, length, batch_size |
| EmptyMochiLatentVideo | width, height, length, batch_size |
| EmptyLTXVLatentVideo | width, height, length, batch_size |

Model Loading

| Node Class | Key Fields | Folder |
|---|---|---|
| CheckpointLoaderSimple | ckpt_name | checkpoints |
| LoraLoader | lora_name, strength_model, strength_clip | loras |
| LoraLoaderModelOnly | lora_name, strength_model | loras |
| VAELoader | vae_name | vae |
| ControlNetLoader | control_net_name | controlnet |
| CLIPLoader | clip_name | clip |
| DualCLIPLoader | clip_name1, clip_name2 | clip |
| TripleCLIPLoader | clip_name1/2/3 | clip |
| UNETLoader | unet_name | unet |
| DiffusionModelLoader | model_name | diffusion_models |
| UpscaleModelLoader | model_name | upscale_models |
| IPAdapterModelLoader | ipadapter_file | ipadapter |
| ADE_AnimateDiffLoaderWithContext | model_name, motion_scale | animatediff_models |

Image Input/Output

| Node Class | Key Fields |
|---|---|
| LoadImage | image (server-side filename, after upload) |
| LoadImageMask | image, channel (red / green / blue / alpha) |
| VAEEncode / VAEDecode | (no controllable fields) |
| VAEEncodeForInpaint | grow_mask_by |
| SaveImage | filename_prefix |
| VHS_VideoCombine | frame_rate, format, filename_prefix, loop_count, pingpong |

ControlNet

| Node Class | Key Fields |
|---|---|
| ControlNetApply | strength |
| ControlNetApplyAdvanced | strength, start_percent, end_percent |

IPAdapter (community pack comfyui_ipadapter_plus)

| Node Class | Key Fields |
|---|---|
| IPAdapterAdvanced | weight, start_at, end_at |
| IPAdapter | weight |

Embeddings (referenced inside prompt strings)

ComfyUI scans prompt text for embedding:NAME syntax. The skill's _common.iter_embedding_refs() extracts these as model dependencies.

"a beautiful cat, embedding:goodvibes:1.2, embedding:art-style"

extract_schema.py and check_deps.py surface these in embedding_dependencies / missing_embeddings.
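A rough sketch of the extraction, assuming embedding names consist of letters, digits, underscores, hyphens, and dots, and that an optional `:weight` suffix follows the name. The pattern and the function name `find_embeddings` are illustrative and are not taken from `_common.iter_embedding_refs()`.

```python
import re

# Assumed name alphabet; the ":" before a weight ends the name.
EMBEDDING_RE = re.compile(r"embedding:([\w.\-]+)")


def find_embeddings(text):
    """Return embedding names referenced in a prompt, in order, without duplicates."""
    names = []
    for raw in EMBEDDING_RE.findall(text):
        name = raw.rstrip(".")  # drop sentence punctuation caught by the pattern
        if name and name not in names:
            names.append(name)
    return names


prompt = "a beautiful cat, embedding:goodvibes:1.2, embedding:art-style"
print(find_embeddings(prompt))  # ['goodvibes', 'art-style']
```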

Parameter Injection Pattern

import json, copy

with open("workflow_api.json") as f:
    workflow = json.load(f)

wf = copy.deepcopy(workflow)
wf["6"]["inputs"]["text"] = "a beautiful sunset"
wf["7"]["inputs"]["text"] = "ugly, blurry"
wf["3"]["inputs"]["seed"] = 42
wf["3"]["inputs"]["steps"] = 30
wf["5"]["inputs"]["width"] = 1024
wf["5"]["inputs"]["height"] = 1024

scripts/extract_schema.py automates discovering which node IDs/fields correspond to which user-facing parameters. It returns a parameters dict that run_workflow.py reads to inject values from --args.
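To illustrate the idea of schema-driven injection, the sketch below maps user-facing names to node IDs and fields. The shape of the `parameters` dict shown here (`node_id` and `field` keys) is an assumption made for the example; the actual output of extract_schema.py may differ. `apply_args` is a hypothetical name.

```python
import copy


def apply_args(workflow, parameters, args):
    """Return a copy of workflow with args written to the mapped node inputs."""
    wf = copy.deepcopy(workflow)  # never mutate the loaded template
    for name, value in args.items():
        if name not in parameters:
            raise KeyError(f"unknown parameter: {name}")
        target = parameters[name]
        wf[target["node_id"]]["inputs"][target["field"]] = value
    return wf


workflow = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 1, "steps": 20}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a cat"}},
}
# Hypothetical schema shape: user-facing name -> where the value lives.
parameters = {
    "prompt": {"node_id": "6", "field": "text"},
    "seed": {"node_id": "3", "field": "seed"},
}

wf = apply_args(workflow, parameters, {"prompt": "a beautiful sunset", "seed": 42})
print(wf["6"]["inputs"]["text"], wf["3"]["inputs"]["seed"])
```

Keeping the mapping separate from the workflow means the same template can be reused across runs with different arguments.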

Identifying Controllable Parameters (Heuristics)

For unknown workflows:

  1. Prompt text: any CLIPTextEncode.text. Use connection tracing back from KSampler.positive / .negative to disambiguate (don't trust meta-title alone).
  2. Seed: KSampler.seed / KSamplerAdvanced.noise_seed / RandomNoise.noise_seed.
  3. Dimensions: Empty*LatentImage.width/height (must be multiples of 8).
  4. Steps / CFG: KSampler.steps, KSampler.cfg. Steps 20-50 typical. CFG 5-15 typical (Flux uses guidance, not CFG).
  5. Model / checkpoint: CheckpointLoaderSimple.ckpt_name. Filename must match an installed file exactly.
  6. LoRA: LoraLoader.lora_name, .strength_model.
  7. Images for img2img / inpaint: LoadImage.image. Server-side filename after upload.
  8. Denoise: KSampler.denoise. Range 0.0-1.0; 1.0 = ignore input image, 0.0 = pass through. Sweet spot for img2img: 0.4-0.7.
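Two of these heuristics (seed fields and the multiple-of-8 rule) can be sketched in a few lines. The helper names and the `SEED_FIELDS` table are hypothetical, and the 32-bit seed range is a conservative assumption chosen for the example.

```python
import random

# Seed field per node class, as listed in the heuristics above.
SEED_FIELDS = {
    "KSampler": "seed",
    "KSamplerAdvanced": "noise_seed",
    "RandomNoise": "noise_seed",
}


def randomize_seeds(workflow, rng=None):
    """Replace literal seeds in place; return the (node_id, field) pairs changed."""
    rng = rng or random.Random()
    changed = []
    for node_id, node in workflow.items():
        field = SEED_FIELDS.get(node.get("class_type"))
        inputs = node.get("inputs", {})
        # Only touch integer literals; a list here is a link and must stay wired.
        if field and isinstance(inputs.get(field), int):
            inputs[field] = rng.randrange(2**32)
            changed.append((node_id, field))
    return changed


def snap_to_multiple_of_8(value):
    """Round a dimension down to the nearest multiple of 8 (minimum 8)."""
    return max(8, (int(value) // 8) * 8)


wf = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 42}},
    "8": {"class_type": "RandomNoise", "inputs": {"noise_seed": ["9", 0]}},
}
print(randomize_seeds(wf, random.Random(0)))  # [('3', 'seed')]
print(snap_to_multiple_of_8(1023))            # 1016
```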

Output Nodes

Output is produced by these node types. The skill's OUTPUT_NODES set extends to common community packs.

| Node | Output Key | Content |
|---|---|---|
| SaveImage | images | List of {filename, subfolder, type} |
| PreviewImage | images | Temporary preview (not saved) |
| VHS_VideoCombine | gifs (older) or videos/video (newer cloud) | Video file refs |
| SaveAudio | audio | Audio file refs |
| SaveAnimatedWEBP / SaveAnimatedPNG | images | Animated images |
| Save3D | 3d | 3D asset refs |

After execution, fetch outputs from /history/{prompt_id} (local) or /api/jobs/{prompt_id} (cloud). The file references are under outputs, keyed first by node ID and then by output key.
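Walking that structure could look like the sketch below. It works on an already-fetched entry and makes no network calls; the function name `collect_outputs` is hypothetical, and it assumes the entry passed in is the object that directly contains `outputs` (a response keyed by prompt ID would need one more lookup first).

```python
def collect_outputs(entry):
    """Flatten outputs[node_id][key] into a list of file references."""
    refs = []
    for node_id, node_outputs in entry.get("outputs", {}).items():
        for key, items in node_outputs.items():
            if not isinstance(items, list):
                continue  # skip non-list values such as flags
            for item in items:
                if isinstance(item, dict) and "filename" in item:
                    refs.append({"node_id": node_id, "key": key, **item})
    return refs


entry = {
    "outputs": {
        "9": {
            "images": [
                {"filename": "ComfyUI_00001_.png", "subfolder": "", "type": "output"}
            ]
        }
    }
}
for ref in collect_outputs(entry):
    print(ref["node_id"], ref["key"], ref["filename"])
```

Iterating over every key, rather than only `images`, picks up `gifs`, `videos`, `audio`, and similar keys without listing them explicitly.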

Wrapper Variants

Some saved JSON files wrap the workflow under a "prompt" key (matching the /api/prompt payload shape). The skill's _common.unwrap_workflow() handles this — pass any of:

  • raw API format: {"3": {...}, "4": {...}}
  • wrapped: {"prompt": {"3": {...}}, "client_id": "..."}

It rejects editor format with a clear error and a re-export instruction.
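
The unwrapping logic can be approximated as follows. This is a sketch of the behaviour described above, not the source of `_common.unwrap_workflow()`; the name `unwrap` and the error wording are illustrative.

```python
def unwrap(data):
    """Return the API-format node dict from raw or wrapped workflow JSON."""
    if not isinstance(data, dict):
        raise ValueError("workflow JSON must be an object")
    # Editor format cannot be executed; ask for a re-export instead.
    if "nodes" in data and "links" in data:
        raise ValueError(
            "editor-format workflow; re-export it in API format from the web UI"
        )
    # Wrapped variant: same shape as the /api/prompt payload.
    if isinstance(data.get("prompt"), dict):
        return data["prompt"]
    return data


raw = {"3": {"class_type": "KSampler", "inputs": {}}}
wrapped = {"prompt": raw, "client_id": "abc"}

print(unwrap(raw) == unwrap(wrapped))  # True
```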