See /blog/comfyui-local-stable-diffusion-guide for the full cornerstone guide.

Your First ComfyUI Workflow for Local SDXL | WikiWayne

Your first ComfyUI workflow for local SDXL is just six kinds of node: a checkpoint loader, two text-encode nodes for your prompts, an empty latent, a KSampler, a VAE decode, and a save-image node. Load an SDXL checkpoint, type a prompt, set the latent to 1024x1024, and hit Queue Prompt. ComfyUI runs the whole graph on your local GPU and drops a PNG (with the full workflow embedded) into your output folder. That is the entire loop, and once you've wired it by hand you understand the pipeline that every Stable Diffusion UI runs under the hood.

This is a cluster guide under the ComfyUI local Stable Diffusion guide pillar — read that for install, model management, and the bigger picture. Here we're zooming in on the single thing beginners get stuck on: building the default graph from scratch so the nodes stop looking like spaghetti.

What is a ComfyUI workflow?

A ComfyUI workflow is a node graph where each box does one job and you connect their inputs and outputs by hand, so the whole image-generation pipeline is visible and editable instead of hidden behind sliders. Where Automatic1111 gives you a form, ComfyUI gives you the wiring diagram. That's intimidating for about ten minutes and liberating forever after, because you can see exactly where latents, conditioning, and pixels flow.

The graph runs from left to right: a model goes in, conditioning (your prompts) gets attached, noise gets denoised over N steps, and the result gets decoded from latent space into an actual image.
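To make the wiring-diagram idea concrete, here is a toy sketch of how a node graph can be evaluated in dependency order. It is not ComfyUI's actual executor: the node names and the string-returning functions are made up for illustration, and the only real library call is `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Toy stand-ins for real nodes: each function just returns a descriptive string.
# Every entry is (function, list of upstream node names passed as arguments).
GRAPH = {
    "checkpoint": (lambda: "sdxl", []),
    "positive": (lambda ckpt: f"cond+[{ckpt}]", ["checkpoint"]),
    "negative": (lambda ckpt: f"cond-[{ckpt}]", ["checkpoint"]),
    "latent": (lambda: "noise", []),
    "sampler": (
        lambda ckpt, pos, neg, lat: f"denoised({lat} | {pos}, {neg})",
        ["checkpoint", "positive", "negative", "latent"],
    ),
    "decode": (lambda ckpt, lat: f"pixels({lat})", ["checkpoint", "sampler"]),
}


def run_graph(graph):
    """Evaluate every node only after the nodes it depends on."""
    order = TopologicalSorter({name: deps for name, (_, deps) in graph.items()})
    results = {}
    for name in order.static_order():
        func, deps = graph[name]
        results[name] = func(*(results[d] for d in deps))
    return results


results = run_graph(GRAPH)
print(results["decode"])
```

The point of the sketch is only the ordering: the sampler cannot run until the checkpoint, both prompts, and the latent exist, which is exactly what the wires on the canvas express.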

What do I need to run SDXL locally?

SDXL (Stable Diffusion XL) is the open-weight 1024x1024 base model from Stability AI — bigger and sharper than SD 1.5, and the sensible default for new local setups in 2026. Here's the honest hardware picture. Verify exact numbers on your own stack, but these are realistic ballparks:

Setup	SDXL feasible?	Notes
NVIDIA 8GB VRAM (3060 Ti, 4060)	Yes	Fine at 1024x1024; enable `--lowvram` if you hit OOM
NVIDIA 12-16GB (4070, 4080)	Comfortably	Room for refiner, ControlNet, upscaling
NVIDIA 24GB (3090, 4090)	Easily	Batch generation, multiple models loaded
AMD (ROCm on Linux)	Yes	Works, but expect more setup friction than NVIDIA
Apple Silicon (M-series, MPS)	Yes	Slower per image than a comparable NVIDIA card, but totally usable on 16GB+ unified memory

If you're still picking hardware, my best GPU for local AI in 2026 and best used GPU on a budget breakdowns apply directly — the VRAM that runs a 12B LLM comfortably runs SDXL comfortably too.

Generation speed depends heavily on steps, sampler, and card. A mid-range NVIDIA GPU does a 1024x1024 image in single-digit seconds to low double digits at 20-30 steps; Apple Silicon and older cards take longer. Don't trust any fixed seconds-per-image or iterations-per-second number you read online, including mine. Time it yourself.
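If you want to time it yourself in a repeatable way, a small harness like the sketch below is enough. The `fake_generate` function is a placeholder that just sleeps; swapping in a call that queues a prompt and waits for the image is left as an assumption about your setup. The first real run often includes model loading, so it may be worth discarding.

```python
import time


def time_generation(generate, steps, runs=3):
    """Call generate() `runs` times; return (seconds per image, iterations per second)."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        durations.append(time.perf_counter() - start)
    seconds = sum(durations) / len(durations)
    return seconds, steps / seconds


def fake_generate():
    # Placeholder workload, not a real image generation call.
    time.sleep(0.05)


seconds, its = time_generation(fake_generate, steps=25)
print(f"{seconds:.2f} s/image, {its:.1f} it/s")
```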

How do I install ComfyUI?

Quickest path is the standalone portable build (Windows) or a git clone with a virtual environment. The git-clone approach for a CUDA machine:

git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python -m venv venv
source venv/bin/activate   # Windows: venv\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
python main.py

On Apple Silicon, skip the CUDA index URL and let pip install the MPS-enabled PyTorch build:

pip install torch torchvision
python main.py --force-fp16

Then open http://127.0.0.1:8188 in a browser. Install ComfyUI-Manager early — it's the extension that lets you install missing nodes and models without touching the command line again. The pillar guide covers Manager setup in depth.
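Once the server is up, you can also check from a script what ComfyUI sees on your machine. The sketch below assumes the server exposes a `/system_stats` route returning a `devices` list with `name`, `vram_free`, and `vram_total` fields in bytes; treat those names as assumptions and compare against what your own install returns. The sample reply is hand-written so the snippet runs without a server.

```python
import json
from urllib.request import urlopen


def summarize_devices(stats):
    """Return (name, free GiB, total GiB) for each device in a stats reply."""
    gib = 1024 ** 3
    return [
        (d.get("name", "unknown"), d.get("vram_free", 0) / gib, d.get("vram_total", 0) / gib)
        for d in stats.get("devices", [])
    ]


def fetch_stats(base_url="http://127.0.0.1:8188"):
    """Ask a running ComfyUI server for its stats (assumed route, not called here)."""
    with urlopen(f"{base_url}/system_stats", timeout=5) as response:
        return json.load(response)


# Hand-written example reply; the real one comes from fetch_stats().
sample = {"devices": [{"name": "cuda:0", "vram_total": 12 * 1024**3, "vram_free": 9 * 1024**3}]}
for name, free, total in summarize_devices(sample):
    print(f"{name}: {free:.1f} / {total:.1f} GiB free")
```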

Where do I put the SDXL checkpoint?

Drop the checkpoint file (a .safetensors, usually 6-7GB for SDXL base) into ComfyUI/models/checkpoints/. Grab sd_xl_base_1.0.safetensors from Hugging Face, or any community SDXL fine-tune — Juggernaut, RealVisXL, and the like all load identically because they share SDXL's architecture.

# from the ComfyUI root
cd models/checkpoints
# download your SDXL base .safetensors here, e.g. with huggingface-cli or wget

Always prefer .safetensors over old .ckpt pickle files. Safetensors can't execute arbitrary code on load — same security logic as preferring vetted GGUF quants for LLMs. If you're new to open-weight file formats generally, the LLM-side explainer in what is GGUF carries over: the format is a container, the weights are the model, and the source matters.
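The reason safetensors is safe to open is visible in its layout: an 8-byte little-endian length, a JSON header describing each tensor, then raw bytes. Nothing in there is code. The sketch below reads only the header, so it works on a multi-gigabyte checkpoint without loading weights. The demo file it writes is a tiny stand-in with a made-up tensor name, not a real model.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading any weights."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian length
        return json.loads(f.read(header_len))


def write_demo_file(path):
    """Write a tiny file in the same layout (demo data, hypothetical tensor name)."""
    header = {"demo.weight": {"dtype": "F16", "shape": [2, 2], "data_offsets": [0, 8]}}
    blob = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(blob)) + blob + bytes(8))


demo_path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
write_demo_file(demo_path)
header = read_safetensors_header(demo_path)
for name, info in header.items():
    if name != "__metadata__":  # optional free-form metadata entry
        print(name, info["dtype"], info["shape"])
```

Pointing `read_safetensors_header` at a real checkpoint in models/checkpoints lists its tensors the same way.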

How do I wire the default workflow node by node?

If you load ComfyUI fresh you get this graph by default — but build it once yourself so you understand it. Right-click the canvas, choose Add Node, and place these six:

Load Checkpoint — pick your SDXL .safetensors from the dropdown. It outputs three things: MODEL, CLIP, and VAE.
CLIP Text Encode (Prompt) x2 — one for your positive prompt, one for negative. Connect the checkpoint's CLIP output into both. Type your description in the positive ("a tabby cat astronaut, cinematic lighting") and what to avoid in the negative ("blurry, lowres, extra limbs").
Empty Latent Image — set width and height to 1024 and 1024 for SDXL, batch size 1 to start. This is your blank canvas in latent space.
KSampler — the engine. Connect MODEL from the checkpoint, your positive CLIP encode into positive, your negative into negative, and the empty latent into latent_image.
VAE Decode — connect the KSampler's LATENT output and the checkpoint's VAE output. This turns the denoised latent into actual pixels.
Save Image — connect the VAE Decode's IMAGE output. Hit Queue Prompt.
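The Empty Latent Image step is easier to picture with the arithmetic written out. The sketch below assumes the usual SDXL latent layout of 4 channels at one eighth of the pixel resolution; the function name is made up for this example.

```python
def sdxl_latent_shape(width, height, batch_size=1, channels=4, downscale=8):
    """Shape of the latent tensor for a pixel size: (batch, channels, h/8, w/8)."""
    if width % downscale or height % downscale:
        raise ValueError("width and height should be multiples of 8")
    return (batch_size, channels, height // downscale, width // downscale)


shape = sdxl_latent_shape(1024, 1024)
pixel_values = 1024 * 1024 * 3  # RGB image
latent_values = shape[1] * shape[2] * shape[3]
print(shape, pixel_values // latent_values)
```

So a 1024x1024 image is denoised as a 128x128 grid with 4 channels, roughly 48 times fewer values than the final RGB pixels, which is why the VAE Decode step is needed at the end.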

The mental model: checkpoint splits into model + prompt-encoder + decoder; prompts become conditioning; KSampler denoises the empty latent guided by that conditioning; VAE decodes latent to image. Every other ComfyUI graph you ever build is this skeleton plus extra nodes spliced in.
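The same skeleton can be written down as data. The sketch below builds the graph in the JSON shape ComfyUI uses for API-format exports, where each link is a `[node_id, output_index]` pair. The class and input names are as I recall them from such exports, so treat them as assumptions and compare against an API-format export from your own install. `queue_prompt` is defined but not called, since it needs a running server.

```python
import json
from urllib.request import Request, urlopen


def build_workflow(ckpt_name, positive, negative, seed=0):
    """Return the default text-to-image graph as an API-format dict."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        # Checkpoint outputs: 0 = MODEL, 1 = CLIP, 2 = VAE
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                         "latent_image": ["4", 0], "seed": seed, "steps": 25, "cfg": 7.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "first_sdxl"}},
    }


def queue_prompt(workflow, base_url="http://127.0.0.1:8188"):
    """POST the graph to a running ComfyUI server (assumed /prompt route)."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = Request(f"{base_url}/prompt", data=body,
                      headers={"Content-Type": "application/json"})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


workflow = build_workflow("sd_xl_base_1.0.safetensors",
                          "a tabby cat astronaut, cinematic lighting",
                          "blurry, lowres, extra limbs")
print(len(workflow), "nodes")
```

Seven entries for six kinds of node: the two CLIP Text Encode boxes share a class and differ only in their text.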

What KSampler settings should a beginner use?

The KSampler is where the denoising happens, and five settings matter most. Term definitions, then sane SDXL starting values:

Steps — how many denoising passes. More steps, more refinement, diminishing returns. Start at 25-30.
CFG — how hard the model sticks to your prompt vs. inventing freely. Start at 6-8 for SDXL (lower than SD 1.5's typical 7-11).
Sampler — the denoising algorithm. dpmpp_2m is a reliable default; euler is fine too.
Scheduler — controls the noise schedule. karras pairs well with dpmpp_2m.
Seed — the random starting noise. Fix it to reproduce an image; randomize to explore. Set control_after_generate to randomize while experimenting.

Decision shortcuts:

If images look noisy or unfinished, raise steps toward 30-40.
If the model ignores your prompt, raise CFG toward 9-10.
If images look fried, over-saturated, or "deep-fried," lower CFG toward 5-6.
If you want speed over polish, drop to 15-20 steps with euler.
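The shortcuts above can be written as a small lookup, which is handy if you script your experiments. This is only a sketch: the symptom labels and the step sizes (5 steps, 1 CFG point per nudge) are my own choices for illustration, while the limits come from the ranges listed above.

```python
START = {"steps": 25, "cfg": 7.0, "sampler_name": "dpmpp_2m", "scheduler": "karras"}


def adjust(settings, symptom):
    """Return a copy of settings nudged one notch for the given symptom."""
    new = dict(settings)
    if symptom == "noisy":
        new["steps"] = min(settings["steps"] + 5, 40)
    elif symptom == "ignores_prompt":
        new["cfg"] = min(settings["cfg"] + 1.0, 10.0)
    elif symptom == "fried":
        new["cfg"] = max(settings["cfg"] - 1.0, 5.0)
    elif symptom == "slow":
        new["steps"] = max(15, min(settings["steps"], 20))
        new["sampler_name"] = "euler"
    else:
        raise ValueError(f"unknown symptom: {symptom}")
    return new


for symptom in ("noisy", "ignores_prompt", "fried", "slow"):
    print(symptom, adjust(START, symptom))
```

Change one setting at a time with a fixed seed, so you can tell which nudge actually made the difference.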

Why is my output a PNG with the workflow inside it?

ComfyUI embeds the entire node graph as metadata inside every PNG it saves to ComfyUI/output/. Drag that PNG back onto the ComfyUI canvas and the full workflow — checkpoint, prompts, every setting — reloads exactly. That's the killer feature: your images are self-documenting. Share a PNG and you've shared a reproducible recipe. (Strip metadata before posting publicly if your prompts are private.)
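You can see that embedded recipe without opening ComfyUI. The sketch below walks a PNG's chunks with only the standard library and collects uncompressed text chunks. It assumes the graph is stored as JSON under text keywords such as `prompt` and `workflow`; check the keys on one of your own files. The demo bytes are fabricated to exercise the parser and contain no image data.

```python
import json
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data):
    """Return {keyword: text} for every uncompressed tEXt chunk in PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts, pos = {}, len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, kind = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if kind == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return texts


def chunk(kind, body):
    """Build one PNG chunk (used only to fabricate the demo bytes)."""
    return struct.pack(">I", len(body)) + kind + body + struct.pack(">I", zlib.crc32(kind + body))


demo_graph = {"5": {"class_type": "KSampler"}}
demo_png = (
    PNG_SIGNATURE
    + chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 2, 0, 0, 0))
    + chunk(b"tEXt", b"prompt\x00" + json.dumps(demo_graph).encode("latin-1"))
    + chunk(b"IEND", b"")
)
embedded = json.loads(read_png_text(demo_png)["prompt"])
print(embedded["5"]["class_type"])
```

For a real file, pass `open(path, "rb").read()` to `read_png_text`. The same walk shows what a stranger could read from a PNG you post publicly.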

ComfyUI vs. one-click image tools — which should I use?

	ComfyUI	A1111 / Forge / Fooocus
Interface	Node graph	Form with tabs
Learning curve	Steeper at first	Gentle
Flexibility	Total — splice any pipeline	Limited to built-in features
Reproducibility	Workflow embedded in PNG	Params in text
Best for	Tinkerers, complex pipelines	"I just want an image now"

If you want to understand the machine, ComfyUI. If you want a fast on-ramp before graduating, start with a form-based UI — my broader ComfyUI local Stable Diffusion guide covers when to switch. The same "see-the-internals vs. one-click" tradeoff shows up on the text side in LM Studio vs. Ollama vs. llama.cpp — ComfyUI is the llama.cpp of image generation: maximum control, minimum hand-holding.

Bottom line

Six nodes, one connection at a time: Load Checkpoint to two CLIP Text Encodes, into a KSampler alongside an Empty Latent, out through VAE Decode to Save Image. Build that graph by hand once and ComfyUI stops being scary — you'll read any workflow on the internet at a glance. Start with SDXL base at 1024x1024, 25-30 steps, CFG 6-8, dpmpp_2m/karras, and a randomized seed, then tweak. Everything runs on your own GPU, nothing touches the cloud, and every PNG you export carries its own recipe. When you're ready for ControlNet, LoRAs, and upscaling, come back to the pillar guide — but you've already learned the part that actually matters.
