mirror of https://github.com/NousResearch/hermes-agent.git synced 2026-05-03 17:27:37 +08:00

SHL0MS a7780fe05f fix(skills/comfyui): bug fixes, cloud parity, expanded coverage, examples, tests

The audit of v4.1 surfaced ~70 issues across the five scripts and three
reference docs — most user-visible (silent file overwrites, status-error
misclassified as success, X-API-Key leaked to S3 on /api/view redirect,
Cloud endpoints that 404 because they were renamed). v5.0.0 fixes those
and fills the gaps that previously forced users to write their own glue
(WebSocket monitoring, batch/sweep, img2img upload helper, dep auto-fix,
log fetch, health check, example workflows).

Critical fixes
- run_workflow.py: poll_status now checks status_str==error BEFORE
  completed:true, so a failed run no longer reports success
- run_workflow.py: download_output streams to disk via safe_path_join,
  preserves server subfolder structure (no silent overwrites), and
  retries with exponential backoff
- run_workflow.py: refuses to overwrite a link with a literal in
  inject_params (would silently break wiring)
- _common.py: _StripSensitiveOnRedirectSession (subclasses
  requests.Session.rebuild_auth) drops X-API-Key/Cookie on cross-host
  redirects — fixes a real key-leak path through Cloud's signed-URL
  download flow. Tested
- Cloud routing (verified live): /history → /history_v2,
  /models/<f> → /experiment/models/<f>, plus folder aliases for the
  unet ↔ diffusion_models and clip ↔ text_encoders rename
- check_deps.py: distinguishes 200/empty vs 404 folder_not_found vs
  403 free-tier; emits concrete fix_command per missing dep
- extract_schema.py: prompt vs negative_prompt determined by tracing
  KSampler.{positive,negative} connections (incl. through Reroute /
  Primitive nodes) instead of meta-title heuristic; symmetric
  duplicate-name resolution; cycle-safe trace_to_node
- hardware_check.py: multi-GPU pick-best, Apple variant detection,
  Rosetta detection, WSL2, ROCm --json, disk-space check, optional
  PyTorch probe; powershell preferred over deprecated wmic
- comfyui_setup.sh: prefers pipx → uvx → pip --user (with PEP-668
  fallback); idempotent — skips relaunch if server already up;
  configurable port/workspace; persistent log; SIGINT trap

New scripts
- run_batch.py — count or sweep (cartesian product), parallel up to
  cloud tier limit
- ws_monitor.py — real-time WebSocket viewer; saves preview frames
- auto_fix_deps.py — runs comfy node install / model download for
  whatever check_deps reports missing (with --dry-run)
- health_check.py — single command that runs the verification checklist
  (comfy-cli + server + checkpoints + optional smoke test that cancels
  itself to avoid burning compute)
- fetch_logs.py — pull traceback / status messages for a prompt_id

Coverage expansion
- Param patterns now cover Flux (BasicScheduler, BasicGuider,
  RandomNoise, ModelSamplingFlux), SD3, Wan/Hunyuan/LTX video,
  IPAdapter, rgthree, easy-use, AnimateDiff
- Embedding refs in CLIPTextEncode strings extracted as model deps
- ckpt_name / vae_name / lora_name / unet_name now controllable so
  workflows can be retargeted per run

Examples
- workflows/{sd15,sdxl,flux_dev}_txt2img.json
- workflows/sdxl_{img2img,inpaint}.json
- workflows/upscale_4x.json
- workflows/{animatediff_video,wan_video_t2v}.json + README

Tests
- 117 tests (105 unit + 8 cloud integration + 4 cross-host security)
- Cloud tests auto-skip without COMFY_CLOUD_API_KEY; verified end-to-end
  against live cloud API

Backwards compatibility
- All existing CLI flags continue to work; new behavior is opt-in
  (--ws, --input-image, --randomize-seed, --flat-output, etc.)

2026-04-29 20:48:01 -07:00

7.8 KiB

ComfyUI Workflow JSON Format

Two Formats — Only API Format Is Executable

API format is required for /api/prompt and every script in this skill. The web UI also produces an "editor format" used for visual editing, which cannot be submitted directly.

API Format

Top-level keys are string node IDs. Each node has class_type and inputs:

{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "_meta": {"title": "KSampler"}
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
  }
}

Detection: every top-level value is an object that contains a class_type key. The skill's _common.is_api_format() performs this check.

Editor Format (not directly executable)

Has nodes[] and links[] arrays — the visual graph. To convert: open in ComfyUI's web UI and use Workflow → Export (API) (newer UI) or the "Save (API Format)" button (older UI).

Detection: top-level has "nodes" and "links" keys.
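The two detection rules can be combined into one classifier. This is a minimal sketch, not the skill's own implementation: `classify_workflow` is a hypothetical name, and the sample dictionaries are hand-written for illustration.

```python
from typing import Any


def classify_workflow(data: Any) -> str:
    """Return "api", "editor", or "unknown" for a parsed workflow JSON value."""
    if not isinstance(data, dict) or not data:
        return "unknown"
    # Editor format: the visual graph, stored as top-level arrays.
    if isinstance(data.get("nodes"), list) and isinstance(data.get("links"), list):
        return "editor"
    # API format: every top-level value is a node object with a class_type.
    if all(isinstance(node, dict) and "class_type" in node for node in data.values()):
        return "api"
    return "unknown"


api_example = {
    "4": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"},
    }
}
editor_example = {"nodes": [], "links": []}

print(classify_workflow(api_example))     # api
print(classify_workflow(editor_example))  # editor
```

Checking for the editor keys first keeps a file with "nodes" and "links" from being tested against the API rule at all.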

Inputs: Literals vs Links

"inputs": {
  "text": "a cat",         // literal — modifiable
  "seed": 42,              // literal — modifiable
  "clip": ["4", 1]         // link — wiring; do NOT overwrite
}

Links are length-2 arrays of [upstream_node_id, output_slot]. The skill's parameter injector refuses to overwrite a link with a literal (logs a warning and skips).
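A guard of this kind might look like the sketch below. The names `is_link` and `set_literal` are hypothetical and do not come from the skill. The sketch also assumes upstream node IDs are strings and output slots are integers, as in the examples in this document.

```python
import logging
from typing import Any

log = logging.getLogger("inject_sketch")


def is_link(value: Any) -> bool:
    """True for a length-2 [upstream_node_id, output_slot] array."""
    return (
        isinstance(value, list)
        and len(value) == 2
        and isinstance(value[0], str)
        and isinstance(value[1], int)
        and not isinstance(value[1], bool)
    )


def set_literal(workflow: dict, node_id: str, field: str, value: Any) -> bool:
    """Set a literal input; warn and skip if the field currently holds a link."""
    inputs = workflow[node_id]["inputs"]
    if is_link(inputs.get(field)):
        log.warning("skipping %s.%s: it is a link, not a literal", node_id, field)
        return False
    inputs[field] = value
    return True


wf = {
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {"text": "a cat", "clip": ["4", 1]},
    }
}
text_applied = set_literal(wf, "6", "text", "a dog")  # literal: replaced
clip_applied = set_literal(wf, "6", "clip", "oops")   # link: refused, wiring kept
```

Returning a boolean lets the caller report which requested parameters were actually applied.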

Common Node Types and Their Controllable Parameters

The full catalog lives in scripts/_common.py (PARAM_PATTERNS and MODEL_LOADERS). Highlights:

Text Prompts

Node Class	Key Fields
`CLIPTextEncode`	`text`
`CLIPTextEncodeSDXL`	`text_g`, `text_l`, `width`, `height`
`CLIPTextEncodeFlux`	`clip_l`, `t5xxl`, `guidance`

To distinguish positive from negative prompts, the skill traces KSampler.positive and KSampler.negative back through Reroute / Primitive nodes to the source CLIPTextEncode. If tracing finds no source node, it falls back to _meta.title heuristics ("negative", "neg", "anti").
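The connection tracing can be illustrated with a small cycle-safe walk. This is a sketch under stated assumptions, not the skill's code: `trace_to_text_encoder` and `PASS_THROUGH` are hypothetical names, only `Reroute` is treated as a pass-through node, and the Reroute input key in the sample graph is made up.

```python
from typing import Optional

# Assumption: node classes that simply forward their single linked input.
PASS_THROUGH = {"Reroute"}


def trace_to_text_encoder(workflow: dict, link: list) -> Optional[str]:
    """Follow a link upstream until a CLIPTextEncode node is found."""
    seen = set()
    node_id = link[0]
    while node_id in workflow and node_id not in seen:
        seen.add(node_id)  # guards against cycles in malformed graphs
        node = workflow[node_id]
        if node.get("class_type") == "CLIPTextEncode":
            return node_id
        if node.get("class_type") not in PASS_THROUGH:
            return None
        # Take the first input that is itself a link and keep walking.
        upstream = [
            value
            for value in node.get("inputs", {}).values()
            if isinstance(value, list) and len(value) == 2
        ]
        if not upstream:
            return None
        node_id = upstream[0][0]
    return None


wf = {
    "3": {"class_type": "KSampler",
          "inputs": {"positive": ["6", 0], "negative": ["9", 0]}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a cat", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry", "clip": ["4", 1]}},
    "9": {"class_type": "Reroute", "inputs": {"value": ["7", 0]}},
}
sampler_inputs = wf["3"]["inputs"]
roles = {
    "prompt": trace_to_text_encoder(wf, sampler_inputs["positive"]),
    "negative_prompt": trace_to_text_encoder(wf, sampler_inputs["negative"]),
}
print(roles)  # {'prompt': '6', 'negative_prompt': '7'}
```

Because the roles come from the wiring, a node titled "Positive" that is actually connected to KSampler.negative is still classified correctly.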

Sampling

Node Class	Key Fields
`KSampler`	`seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise`
`KSamplerAdvanced`	`noise_seed`, `steps`, `cfg`, `start_at_step`, `end_at_step`
`SamplerCustom`	`noise_seed`, `cfg`, `sampler`, `sigmas`
`SamplerCustomAdvanced`	`noise_seed` (via RandomNoise input)
`RandomNoise`	`noise_seed`
`BasicScheduler`	`steps`, `scheduler`, `denoise`
`KSamplerSelect`	`sampler_name`
`BasicGuider` / `CFGGuider`	`cfg`
`ModelSamplingFlux`	`max_shift`, `base_shift`, `width`, `height`
`SDTurboScheduler`	`steps`, `denoise`

Latent / Dimensions

Node Class	Key Fields
`EmptyLatentImage`	`width`, `height`, `batch_size`
`EmptySD3LatentImage`	`width`, `height`, `batch_size`
`EmptyHunyuanLatentVideo`	`width`, `height`, `length`, `batch_size`
`EmptyMochiLatentVideo`	`width`, `height`, `length`, `batch_size`
`EmptyLTXVLatentVideo`	`width`, `height`, `length`, `batch_size`

Model Loading

Node Class	Key Fields	Folder
`CheckpointLoaderSimple`	`ckpt_name`	`checkpoints`
`LoraLoader`	`lora_name`, `strength_model`, `strength_clip`	`loras`
`LoraLoaderModelOnly`	`lora_name`, `strength_model`	`loras`
`VAELoader`	`vae_name`	`vae`
`ControlNetLoader`	`control_net_name`	`controlnet`
`CLIPLoader`	`clip_name`	`clip`
`DualCLIPLoader`	`clip_name1`, `clip_name2`	`clip`
`TripleCLIPLoader`	`clip_name1/2/3`	`clip`
`UNETLoader`	`unet_name`	`unet`
`DiffusionModelLoader`	`model_name`	`diffusion_models`
`UpscaleModelLoader`	`model_name`	`upscale_models`
`IPAdapterModelLoader`	`ipadapter_file`	`ipadapter`
`ADE_AnimateDiffLoaderWithContext`	`model_name`, `motion_scale`	`animatediff_models`

Image Input/Output

Node Class	Key Fields
`LoadImage`	`image` (server-side filename, after upload)
`LoadImageMask`	`image`, `channel` (`red` / `green` / `blue` / `alpha`)
`VAEEncode` / `VAEDecode`	(no controllable fields)
`VAEEncodeForInpaint`	`grow_mask_by`
`SaveImage`	`filename_prefix`
`VHS_VideoCombine`	`frame_rate`, `format`, `filename_prefix`, `loop_count`, `pingpong`

ControlNet

Node Class	Key Fields
`ControlNetApply`	`strength`
`ControlNetApplyAdvanced`	`strength`, `start_percent`, `end_percent`

IPAdapter (community pack `comfyui_ipadapter_plus`)

Node Class	Key Fields
`IPAdapterAdvanced`	`weight`, `start_at`, `end_at`
`IPAdapter`	`weight`

Embeddings (referenced inside prompt strings)

ComfyUI scans prompt text for embedding:NAME syntax. The skill's _common.iter_embedding_refs() extracts these as model dependencies.

"a beautiful cat, embedding:goodvibes:1.2, embedding:art-style"

extract_schema.py and check_deps.py surface these in embedding_dependencies / missing_embeddings.
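Extraction of embedding references could be sketched with a regular expression. The pattern and the name `find_embedding_refs` are assumptions for illustration; the skill's `_common.iter_embedding_refs()` may parse names differently, for example around file extensions or weights.

```python
import re
from typing import Iterator, Tuple

# Assumption: a name runs until whitespace, a comma, a colon (weight suffix),
# or a parenthesis.
EMBEDDING_RE = re.compile(r"embedding:([^\s,:()]+)", re.IGNORECASE)


def find_embedding_refs(workflow: dict) -> Iterator[Tuple[str, str]]:
    """Yield (node_id, embedding_name) for every string input that names one."""
    for node_id, node in workflow.items():
        for value in node.get("inputs", {}).values():
            if isinstance(value, str):
                for match in EMBEDDING_RE.finditer(value):
                    yield node_id, match.group(1)


wf = {
    "6": {
        "class_type": "CLIPTextEncode",
        "inputs": {
            "text": "a beautiful cat, embedding:goodvibes:1.2, embedding:art-style",
            "clip": ["4", 1],
        },
    }
}
refs = list(find_embedding_refs(wf))
print(refs)  # [('6', 'goodvibes'), ('6', 'art-style')]
```

Links such as `["4", 1]` are lists, so the string check skips them without extra handling.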

Parameter Injection Pattern

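import copy
import json

# Load an API-format workflow exported from the ComfyUI web UI.
with open("workflow_api.json") as f:
    workflow = json.load(f)

# Work on a deep copy so the loaded template stays reusable across runs.
wf = copy.deepcopy(workflow)

# Node IDs ("6", "7", "3", "5") are specific to this workflow file.
wf["6"]["inputs"]["text"] = "a beautiful sunset"   # positive prompt
wf["7"]["inputs"]["text"] = "ugly, blurry"         # negative prompt
wf["3"]["inputs"]["seed"] = 42                     # sampler settings
wf["3"]["inputs"]["steps"] = 30
wf["5"]["inputs"]["width"] = 1024                  # latent size
wf["5"]["inputs"]["height"] = 1024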

scripts/extract_schema.py automates discovering which node IDs/fields correspond to which user-facing parameters. It returns a parameters dict that run_workflow.py reads to inject values from --args.
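The idea of a schema-driven injector can be sketched as follows. The shape of the `parameters` dict here (friendly name mapped to `node_id` and `field`) is a hypothetical stand-in, since this document does not show the real output of extract_schema.py, and `apply_args` is likewise an illustrative name.

```python
import copy

# Hypothetical schema shape: friendly name -> where the value lives.
parameters = {
    "prompt": {"node_id": "6", "field": "text"},
    "seed": {"node_id": "3", "field": "seed"},
}


def apply_args(workflow: dict, parameters: dict, args: dict) -> dict:
    """Return a copy of the workflow with args written to their mapped fields."""
    wf = copy.deepcopy(workflow)
    for name, value in args.items():
        if name not in parameters:
            raise KeyError(f"unknown parameter: {name}")
        target = parameters[name]
        wf[target["node_id"]]["inputs"][target["field"]] = value
    return wf


workflow = {
    "3": {"class_type": "KSampler", "inputs": {"seed": 1, "steps": 20}},
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "a cat"}},
}
patched = apply_args(workflow, parameters, {"prompt": "a beautiful sunset", "seed": 42})
print(patched["6"]["inputs"]["text"])   # a beautiful sunset
print(workflow["3"]["inputs"]["seed"])  # 1 (the template is untouched)
```

Failing on unknown names, rather than ignoring them, surfaces typos in user-supplied arguments early.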

Identifying Controllable Parameters (Heuristics)

For unknown workflows:

- Prompt text: any CLIPTextEncode.text. Use connection tracing back from KSampler.positive / .negative to disambiguate (don't trust meta-title alone).
- Seed: KSampler.seed / KSamplerAdvanced.noise_seed / RandomNoise.noise_seed.
- Dimensions: Empty*LatentImage.width/height (must be multiples of 8).
- Steps / CFG: KSampler.steps, KSampler.cfg. Steps 20–50 typical. CFG 5–15 typical (Flux uses guidance, not CFG).
- Model / checkpoint: CheckpointLoaderSimple.ckpt_name. Filename must match an installed file exactly.
- LoRA: LoraLoader.lora_name, .strength_model.
- Images for img2img / inpaint: LoadImage.image. Server-side filename after upload.
- Denoise: KSampler.denoise. 0.0–1.0; 1.0 = ignore input image, 0.0 = pass through. Sweet spot for img2img: 0.4–0.7.
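The heuristics above can be turned into a simple scan over node classes. This is a minimal sketch: `PATTERNS` is a small illustrative subset rather than the skill's PARAM_PATTERNS, and `find_controllable` and `snap_to_multiple_of_8` are hypothetical helper names.

```python
# A small, illustrative subset of class_type -> controllable fields.
PATTERNS = {
    "CLIPTextEncode": ["text"],
    "KSampler": ["seed", "steps", "cfg", "denoise"],
    "EmptyLatentImage": ["width", "height"],
    "CheckpointLoaderSimple": ["ckpt_name"],
    "LoraLoader": ["lora_name", "strength_model"],
    "LoadImage": ["image"],
}


def find_controllable(workflow: dict) -> list:
    """List (node_id, field, current_value) for known fields that hold literals."""
    found = []
    for node_id, node in workflow.items():
        inputs = node.get("inputs", {})
        for field in PATTERNS.get(node.get("class_type"), []):
            value = inputs.get(field)
            # Lists are links (wiring), so only literal values are reported.
            if field in inputs and not isinstance(value, list):
                found.append((node_id, field, value))
    return found


def snap_to_multiple_of_8(size: int) -> int:
    """Round a requested dimension down to a multiple of 8 (minimum 8)."""
    return max(8, (size // 8) * 8)


wf = {
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 42, "steps": 20, "cfg": 8, "denoise": 1.0,
                     "model": ["4", 0]}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 512, "height": 512, "batch_size": 1}},
}
controllable = find_controllable(wf)
print(controllable)
print(snap_to_multiple_of_8(1020))  # 1016
```

Rounding down is one possible policy for the multiples-of-8 rule; rejecting invalid sizes outright would be an equally reasonable choice.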

Output Nodes

The node types below produce output. The skill's OUTPUT_NODES set also covers common community packs.

Node	Output Key	Content
`SaveImage`	`images`	List of `{filename, subfolder, type}`
`PreviewImage`	`images`	Temporary preview (not saved)
`VHS_VideoCombine`	`gifs` (older) or `videos`/`video` (newer cloud)	Video file refs
`SaveAudio`	`audio`	Audio file refs
`SaveAnimatedWEBP` / `SaveAnimatedPNG`	`images`	Animated images
`Save3D`	`3d`	3D asset refs

After execution, fetch outputs from /history/{prompt_id} (local) or /api/jobs/{prompt_id} (cloud) → outputs → {node_id} → {key}.
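Walking the outputs → {node_id} → {key} structure might look like the sketch below. It operates on an already-parsed `outputs` dictionary, so no network call is shown. The sample data is hand-written, `collect_outputs` is a hypothetical name, and the key list is taken from the table above.

```python
# Output keys from the table above.
OUTPUT_KEYS = ("images", "gifs", "videos", "video", "audio", "3d")


def collect_outputs(outputs: dict) -> list:
    """Flatten outputs -> {node_id} -> {key} into a list of file references."""
    files = []
    for node_id, node_output in outputs.items():
        for key in OUTPUT_KEYS:
            refs = node_output.get(key, [])
            if isinstance(refs, dict):  # tolerate a single ref instead of a list
                refs = [refs]
            for ref in refs:
                if isinstance(ref, dict) and "filename" in ref:
                    files.append({"node_id": node_id, "key": key, **ref})
    return files


# Hand-written sample shaped like the SaveImage row in the table.
sample_outputs = {
    "9": {
        "images": [
            {"filename": "ComfyUI_00001_.png", "subfolder": "", "type": "output"}
        ]
    }
}
files = collect_outputs(sample_outputs)
print(files[0]["filename"])  # ComfyUI_00001_.png
```

Keeping `subfolder` and `type` alongside `filename` preserves what a later download step needs to avoid name collisions between folders.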

Wrapper Variants

Some saved JSON files wrap the workflow under a "prompt" key (matching the /api/prompt payload shape). The skill's _common.unwrap_workflow() handles this — pass any of:

- raw API format: {"3": {...}, "4": {...}}
- wrapped: {"prompt": {"3": {...}}, "client_id": "..."}

It rejects editor format with a clear error and a re-export instruction.
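The unwrapping behavior described here could be approximated as follows. This is a sketch, not the body of `_common.unwrap_workflow()`: the function name `unwrap` and the error wording are assumptions.

```python
def unwrap(data: dict) -> dict:
    """Return the bare API-format graph from either accepted shape."""
    if "nodes" in data and "links" in data:
        raise ValueError(
            "editor-format workflow: re-export it from the ComfyUI web UI "
            "in API format"
        )
    # Wrapped variant: the graph sits under "prompt", next to keys like client_id.
    if isinstance(data.get("prompt"), dict):
        return data["prompt"]
    return data


raw = {"3": {"class_type": "KSampler", "inputs": {}}}
wrapped = {"prompt": raw, "client_id": "abc"}

print(unwrap(raw) is raw)      # True
print(unwrap(wrapped) is raw)  # True
```

Both shapes resolve to the same graph object, so downstream code only has to handle the bare API format.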
