Mirror of https://github.com/NousResearch/hermes-agent.git, synced
2026-05-03 17:27:37 +08:00
The audit of v4.1 surfaced ~70 issues across the five scripts and three
reference docs — most of them user-visible (silent file overwrites, error
status misclassified as success, X-API-Key leaked to S3 on /api/view
redirect, Cloud endpoints that 404 because they were renamed). v5.0.0 fixes
those and fills the gaps that previously forced users to write their own
glue (WebSocket monitoring, batch/sweep, img2img upload helper, dependency
auto-fix, log fetch, health check, example workflows).
Critical fixes
- run_workflow.py: poll_status now checks status_str==error BEFORE
completed:true, so a failed run no longer reports success
- run_workflow.py: download_output streams to disk via safe_path_join,
preserves server subfolder structure (no silent overwrites), and
retries with exponential backoff
- run_workflow.py: refuses to overwrite a link with a literal in
inject_params (would silently break wiring)
- _common.py: _StripSensitiveOnRedirectSession (a requests.Session subclass
  that overrides rebuild_auth) drops X-API-Key/Cookie on cross-host
  redirects — fixes a real key-leak path through Cloud's signed-URL
  download flow; covered by tests
- Cloud routing (verified live): /history → /history_v2,
/models/<f> → /experiment/models/<f>, plus folder aliases for the
unet ↔ diffusion_models and clip ↔ text_encoders rename
- check_deps.py: distinguishes 200/empty vs 404 folder_not_found vs
403 free-tier; emits concrete fix_command per missing dep
- extract_schema.py: prompt vs negative_prompt determined by tracing
KSampler.{positive,negative} connections (incl. through Reroute /
Primitive nodes) instead of meta-title heuristic; symmetric
duplicate-name resolution; cycle-safe trace_to_node
- hardware_check.py: multi-GPU pick-best, Apple variant detection,
Rosetta detection, WSL2, ROCm --json, disk-space check, optional
PyTorch probe; powershell preferred over deprecated wmic
- comfyui_setup.sh: prefers pipx → uvx → pip --user (with PEP-668
fallback); idempotent — skips relaunch if server already up;
configurable port/workspace; persistent log; SIGINT trap
New scripts
- run_batch.py — count or sweep (cartesian product), parallel up to
cloud tier limit
- ws_monitor.py — real-time WebSocket viewer; saves preview frames
- auto_fix_deps.py — runs comfy node install / model download for
whatever check_deps reports missing (with --dry-run)
- health_check.py — single command that runs the verification checklist
(comfy-cli + server + checkpoints + optional smoke test that cancels
itself to avoid burning compute)
- fetch_logs.py — pull traceback / status messages for a prompt_id
Coverage expansion
- Param patterns now cover Flux (BasicScheduler, BasicGuider,
RandomNoise, ModelSamplingFlux), SD3, Wan/Hunyuan/LTX video,
IPAdapter, rgthree, easy-use, AnimateDiff
- Embedding refs in CLIPTextEncode strings extracted as model deps
- ckpt_name / vae_name / lora_name / unet_name now controllable so
workflows can be retargeted per run
Examples
- workflows/{sd15,sdxl,flux_dev}_txt2img.json
- workflows/sdxl_{img2img,inpaint}.json
- workflows/upscale_4x.json
- workflows/{animatediff_video,wan_video_t2v}.json + README
Tests
- 117 tests (105 unit + 8 cloud integration + 4 cross-host security)
- Cloud tests auto-skip without COMFY_CLOUD_API_KEY; verified end-to-end
against live cloud API
Backwards compatibility
- All existing CLI flags continue to work; new behavior is opt-in
(--ws, --input-image, --randomize-seed, --flat-output, etc.)
227 lines · 7.8 KiB · Markdown
# ComfyUI Workflow JSON Format

## Two Formats — Only API Format Is Executable

**API format** is required for `/api/prompt` and every script in this skill.
The web UI also produces an "editor format" used for visual editing, which
**cannot** be submitted directly.

### API Format

Top-level keys are string node IDs. Each node has `class_type` and `inputs`:

```json
{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 156680208700286,
      "steps": 20,
      "cfg": 8,
      "sampler_name": "euler",
      "scheduler": "normal",
      "denoise": 1.0,
      "model": ["4", 0],
      "positive": ["6", 0],
      "negative": ["7", 0],
      "latent_image": ["5", 0]
    },
    "_meta": {"title": "KSampler"}
  },
  "4": {
    "class_type": "CheckpointLoaderSimple",
    "inputs": {"ckpt_name": "v1-5-pruned-emaonly.safetensors"}
  }
}
```

**Detection:** every top-level value has `class_type`. The skill's
`_common.is_api_format()` does this check.

### Editor Format (not directly executable)

Has `nodes[]` and `links[]` arrays — the visual graph. To convert: open in
ComfyUI's web UI and use **Workflow → Export (API)** (newer UI) or the
"Save (API Format)" button (older UI).

**Detection:** top-level has `"nodes"` and `"links"` keys.
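
The two detection rules can be sketched as follows. This is illustrative
only; the skill's real check is `_common.is_api_format()`, and the function
names here are hypothetical:

```python
def looks_like_api_format(obj) -> bool:
    """API format: a non-empty dict whose values all carry class_type."""
    return (
        isinstance(obj, dict)
        and bool(obj)
        and all(isinstance(v, dict) and "class_type" in v for v in obj.values())
    )


def looks_like_editor_format(obj) -> bool:
    """Editor exports carry the visual graph as nodes[] / links[] arrays."""
    return isinstance(obj, dict) and "nodes" in obj and "links" in obj
```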

## Inputs: Literals vs Links

```json
"inputs": {
  "text": "a cat",      // literal — modifiable
  "seed": 42,           // literal — modifiable
  "clip": ["4", 1]      // link — wiring; do NOT overwrite
}
```

Links are length-2 arrays of `[upstream_node_id, output_slot]`. The skill's
parameter injector refuses to overwrite a link with a literal (logs a
warning and skips).

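The link guard can be sketched like this. The function name is hypothetical;
the real injector lives in the skill's scripts:

```python
import logging


def inject_param(workflow: dict, node_id: str, field: str, value) -> bool:
    """Set a literal input, but never clobber a link ([node_id, slot] pair)."""
    current = workflow[node_id]["inputs"].get(field)
    # A link is a length-2 list whose first element is an upstream node ID string
    is_link = (
        isinstance(current, list)
        and len(current) == 2
        and isinstance(current[0], str)
    )
    if is_link:
        logging.warning("refusing to overwrite link %s.%s = %r", node_id, field, current)
        return False
    workflow[node_id]["inputs"][field] = value
    return True
```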
## Common Node Types and Their Controllable Parameters

The full catalog lives in `scripts/_common.py` (`PARAM_PATTERNS` and
`MODEL_LOADERS`). Highlights:

### Text Prompts

| Node Class | Key Fields |
|------------|------------|
| `CLIPTextEncode` | `text` |
| `CLIPTextEncodeSDXL` | `text_g`, `text_l`, `width`, `height` |
| `CLIPTextEncodeFlux` | `clip_l`, `t5xxl`, `guidance` |

To distinguish positive from negative, the skill traces `KSampler.negative`
back through Reroute / Primitive nodes to the source CLIPTextEncode, falling
back to `_meta.title` heuristics ("negative", "neg", "anti").

### Sampling

| Node Class | Key Fields |
|------------|------------|
| `KSampler` | `seed`, `steps`, `cfg`, `sampler_name`, `scheduler`, `denoise` |
| `KSamplerAdvanced` | `noise_seed`, `steps`, `cfg`, `start_at_step`, `end_at_step` |
| `SamplerCustom` | `noise_seed`, `cfg`, `sampler`, `sigmas` |
| `SamplerCustomAdvanced` | `noise_seed` (via RandomNoise input) |
| `RandomNoise` | `noise_seed` |
| `BasicScheduler` | `steps`, `scheduler`, `denoise` |
| `KSamplerSelect` | `sampler_name` |
| `BasicGuider` / `CFGGuider` | `cfg` |
| `ModelSamplingFlux` | `max_shift`, `base_shift`, `width`, `height` |
| `SDTurboScheduler` | `steps`, `denoise` |

### Latent / Dimensions

| Node Class | Key Fields |
|------------|------------|
| `EmptyLatentImage` | `width`, `height`, `batch_size` |
| `EmptySD3LatentImage` | `width`, `height`, `batch_size` |
| `EmptyHunyuanLatentVideo` | `width`, `height`, `length`, `batch_size` |
| `EmptyMochiLatentVideo` | `width`, `height`, `length`, `batch_size` |
| `EmptyLTXVLatentVideo` | `width`, `height`, `length`, `batch_size` |

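Since latent width/height must be multiples of 8 (see the heuristics later in
this doc), a pre-submit guard can catch bad values before the server does. A
hypothetical helper, not part of the skill:

```python
def check_latent_dims(width: int, height: int) -> None:
    """Reject latent dimensions that are not multiples of 8."""
    for name, value in (("width", width), ("height", height)):
        if value % 8 != 0:
            raise ValueError(f"{name}={value} is not a multiple of 8")
```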
### Model Loading

| Node Class | Key Fields | Folder |
|------------|------------|--------|
| `CheckpointLoaderSimple` | `ckpt_name` | `checkpoints` |
| `LoraLoader` | `lora_name`, `strength_model`, `strength_clip` | `loras` |
| `LoraLoaderModelOnly` | `lora_name`, `strength_model` | `loras` |
| `VAELoader` | `vae_name` | `vae` |
| `ControlNetLoader` | `control_net_name` | `controlnet` |
| `CLIPLoader` | `clip_name` | `clip` |
| `DualCLIPLoader` | `clip_name1`, `clip_name2` | `clip` |
| `TripleCLIPLoader` | `clip_name1/2/3` | `clip` |
| `UNETLoader` | `unet_name` | `unet` |
| `DiffusionModelLoader` | `model_name` | `diffusion_models` |
| `UpscaleModelLoader` | `model_name` | `upscale_models` |
| `IPAdapterModelLoader` | `ipadapter_file` | `ipadapter` |
| `ADE_AnimateDiffLoaderWithContext` | `model_name`, `motion_scale` | `animatediff_models` |

### Image Input/Output

| Node Class | Key Fields |
|------------|------------|
| `LoadImage` | `image` (server-side filename, after upload) |
| `LoadImageMask` | `image`, `channel` (`red` / `green` / `blue` / `alpha`) |
| `VAEEncode` / `VAEDecode` | (no controllable fields) |
| `VAEEncodeForInpaint` | `grow_mask_by` |
| `SaveImage` | `filename_prefix` |
| `VHS_VideoCombine` | `frame_rate`, `format`, `filename_prefix`, `loop_count`, `pingpong` |

### ControlNet

| Node Class | Key Fields |
|------------|------------|
| `ControlNetApply` | `strength` |
| `ControlNetApplyAdvanced` | `strength`, `start_percent`, `end_percent` |

### IPAdapter (community pack `comfyui_ipadapter_plus`)

| Node Class | Key Fields |
|------------|------------|
| `IPAdapterAdvanced` | `weight`, `start_at`, `end_at` |
| `IPAdapter` | `weight` |

### Embeddings (referenced inside prompt strings)

ComfyUI scans prompt text for `embedding:NAME` syntax. The skill's
`_common.iter_embedding_refs()` extracts these as model dependencies.

```text
"a beautiful cat, embedding:goodvibes:1.2, embedding:art-style"
```

`extract_schema.py` and `check_deps.py` surface these in
`embedding_dependencies` / `missing_embeddings`.

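The extraction can be sketched with a regex approximating the
`embedding:NAME[:weight]` syntax. The regex is an assumption for
illustration; the real `_common.iter_embedding_refs()` may differ:

```python
import re

# Assumed pattern: name chars, then an optional :weight suffix, then a
# delimiter (comma, whitespace, or end of string)
EMBEDDING_RE = re.compile(r"embedding:([A-Za-z0-9_\-.]+?)(?::[\d.]+)?(?=[,\s]|$)")


def iter_embedding_refs(text: str):
    """Yield embedding names referenced in a prompt string."""
    for match in EMBEDDING_RE.finditer(text):
        yield match.group(1)
```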
## Parameter Injection Pattern

```python
import copy
import json

# Load the exported API-format workflow
with open("workflow_api.json") as f:
    workflow = json.load(f)

# Mutate a deep copy so the loaded template stays pristine
wf = copy.deepcopy(workflow)
wf["6"]["inputs"]["text"] = "a beautiful sunset"  # positive prompt
wf["7"]["inputs"]["text"] = "ugly, blurry"        # negative prompt
wf["3"]["inputs"]["seed"] = 42
wf["3"]["inputs"]["steps"] = 30
wf["5"]["inputs"]["width"] = 1024
wf["5"]["inputs"]["height"] = 1024
```

`scripts/extract_schema.py` automates discovering which node IDs/fields
correspond to which user-facing parameters. It returns a `parameters` dict
that `run_workflow.py` reads to inject values from `--args`.

## Identifying Controllable Parameters (Heuristics)

For unknown workflows:

1. **Prompt text** — any `CLIPTextEncode.text`. Use connection tracing back
   from `KSampler.positive` / `.negative` to disambiguate (don't trust
   meta-title alone).
2. **Seed** — `KSampler.seed` / `KSamplerAdvanced.noise_seed` / `RandomNoise.noise_seed`.
3. **Dimensions** — `Empty*LatentImage.width/height` (must be multiples of 8).
4. **Steps / CFG** — `KSampler.steps`, `KSampler.cfg`. Steps 20–50 typical.
   CFG 5–15 typical (Flux uses guidance, not CFG).
5. **Model / checkpoint** — `CheckpointLoaderSimple.ckpt_name`. Filename must
   match an installed file *exactly*.
6. **LoRA** — `LoraLoader.lora_name`, `.strength_model`.
7. **Images for img2img / inpaint** — `LoadImage.image`. Server-side filename
   after upload.
8. **Denoise** — `KSampler.denoise`. 0.0–1.0; 1.0 = ignore input image,
   0.0 = pass through. Sweet spot for img2img: 0.4–0.7.

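The seed heuristic can be applied mechanically. A hypothetical mini-scanner,
with the class-to-field map taken from the sampling table earlier in this doc:

```python
# Seed field per node class, per the sampling table
SEED_FIELDS = {
    "KSampler": "seed",
    "KSamplerAdvanced": "noise_seed",
    "RandomNoise": "noise_seed",
}


def find_seed_params(workflow: dict):
    """Return (node_id, field) pairs whose value a caller may randomize."""
    hits = []
    for node_id, node in workflow.items():
        field = SEED_FIELDS.get(node.get("class_type", ""))
        if field and field in node.get("inputs", {}):
            hits.append((node_id, field))
    return hits
```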
## Output Nodes

Output is produced by these node types. The skill's `OUTPUT_NODES` set
extends to common community packs.

| Node | Output Key | Content |
|------|-----------|---------|
| `SaveImage` | `images` | List of `{filename, subfolder, type}` |
| `PreviewImage` | `images` | Temporary preview (not saved) |
| `VHS_VideoCombine` | `gifs` (older) or `videos`/`video` (newer cloud) | Video file refs |
| `SaveAudio` | `audio` | Audio file refs |
| `SaveAnimatedWEBP` / `SaveAnimatedPNG` | `images` | Animated images |
| `Save3D` | `3d` | 3D asset refs |

After execution, fetch outputs from `/history/{prompt_id}` (local) or
`/api/jobs/{prompt_id}` (cloud) → `outputs` → `{node_id}` → `{key}`.

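A minimal sketch of walking the outputs structure, assuming the local
`/history/{prompt_id}` response is keyed by prompt_id (the function name and
the subset of output keys checked are illustrative):

```python
def list_output_files(history: dict, prompt_id: str):
    """Collect (node_id, filename, subfolder) triples from a history payload."""
    outputs = history.get(prompt_id, {}).get("outputs", {})
    files = []
    for node_id, node_out in outputs.items():
        # Check a subset of the output keys from the table above
        for key in ("images", "gifs", "videos", "audio"):
            for ref in node_out.get(key, []):
                files.append((node_id, ref.get("filename"), ref.get("subfolder", "")))
    return files
```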
## Wrapper Variants

Some saved JSON files wrap the workflow under a `"prompt"` key (matching
the `/api/prompt` payload shape). The skill's `_common.unwrap_workflow()`
handles this — pass any of:

- raw API format: `{"3": {...}, "4": {...}}`
- wrapped: `{"prompt": {"3": {...}}, "client_id": "..."}`

It rejects editor format with a clear error and a re-export instruction.
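
The unwrap behavior can be sketched as follows; this is illustrative, and
`_common.unwrap_workflow()` is the real implementation:

```python
def unwrap_workflow(data: dict) -> dict:
    """Accept raw API format or a {"prompt": ...} wrapper; reject editor exports."""
    if "nodes" in data and "links" in data:
        # Editor format: cannot be executed, must be re-exported from the web UI
        raise ValueError(
            "Editor-format workflow: re-export via Workflow -> Export (API)"
        )
    if isinstance(data.get("prompt"), dict):
        return data["prompt"]
    return data
```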