Disclosure: As an Amazon Associate I earn from qualifying purchases. This site contains affiliate links.
Your First ComfyUI Workflow for Local SDXL
Load checkpoints, wire KSampler, export PNGs locally.
Key takeaways
- Load checkpoints, wire KSampler, export PNGs locally.
- Parent pillar: /blog/comfyui-local-stable-diffusion-guide
10+ years in Digital Marketing & SEO
Your first ComfyUI workflow for local SDXL needs just six kinds of node, seven boxes in total: a checkpoint loader, two text-encode nodes for your prompts, an empty latent, a KSampler, a VAE decode, and a save-image node. Load an SDXL checkpoint, type a prompt, set the latent to 1024x1024, and hit Queue Prompt. ComfyUI runs the whole graph on your local GPU and drops a PNG (with the full workflow embedded) into your output folder. That is the entire loop, and once you've wired it by hand you understand the pipeline that sits underneath every Stable Diffusion UI.
This is a cluster guide under the ComfyUI local Stable Diffusion guide pillar — read that for install, model management, and the bigger picture. Here we're zooming in on the single thing beginners get stuck on: building the default graph from scratch so the nodes stop looking like spaghetti.
What is a ComfyUI workflow?
A ComfyUI workflow is a node graph where each box does one job and you connect their inputs and outputs by hand, so the whole image-generation pipeline is visible and editable instead of hidden behind sliders. Where Automatic1111 gives you a form, ComfyUI gives you the wiring diagram. That's intimidating for about ten minutes and liberating forever after, because you can see exactly where latents, conditioning, and pixels flow.
The graph runs from left to right: a model goes in, conditioning (your prompts) gets attached, noise gets denoised over N steps, and the result gets decoded from latent space into an actual image.
What do I need to run SDXL locally?
SDXL (Stable Diffusion XL) is the open-weight 1024x1024 base model from Stability AI — bigger and sharper than SD 1.5, and the sensible default for new local setups in 2026. Here's the honest hardware picture. Verify exact numbers on your own stack, but these are realistic ballparks:
| Setup | SDXL feasible? | Notes |
|---|---|---|
| NVIDIA 8GB VRAM (3060 Ti, 4060) | Yes | Fine at 1024x1024; enable --lowvram if you hit OOM |
| NVIDIA 12-16GB (4070, 4080) | Comfortably | Room for refiner, ControlNet, upscaling |
| NVIDIA 24GB (3090, 4090) | Easily | Batch generation, multiple models loaded |
| AMD (ROCm on Linux) | Yes | Works, but expect more setup friction than NVIDIA |
| Apple Silicon (M-series, MPS) | Yes | Slower per image than a comparable NVIDIA card, but totally usable on 16GB+ unified memory |
If you're still picking hardware, my best GPU for local AI in 2026 and best used GPU on a budget breakdowns apply directly — the VRAM that runs a 12B LLM comfortably runs SDXL comfortably too.
Generation speed depends heavily on steps, sampler, and card. A mid-range NVIDIA GPU does a 1024x1024 image in single-digit seconds to low double digits at 20-30 steps; Apple Silicon and older cards take longer. Don't trust any fixed seconds-per-image or iterations-per-second number you read online, including mine. Time it yourself.
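Before timing anything, it helps to confirm which row of the hardware table you are actually on. Here is a minimal sketch that asks PyTorch what it can see. The function name `describe_device` is mine, not part of ComfyUI, and it assumes you run it inside the same Python environment ComfyUI uses. ROCm builds of PyTorch normally report through the CUDA interface, so an AMD card should show up on the `cuda` branch.

```python
def describe_device() -> str:
    """Report the accelerator PyTorch would use, with total VRAM where available."""
    try:
        import torch  # imported lazily so the sketch still runs without PyTorch
    except ImportError:
        return "cpu (PyTorch is not installed in this environment)"

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        vram_gb = props.total_memory / 1024**3
        return f"cuda: {props.name}, {vram_gb:.1f} GB VRAM"

    # Older PyTorch builds may not have the mps backend attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps (Apple Silicon, unified memory)"

    return "cpu"


if __name__ == "__main__":
    print(describe_device())
```

If this prints `cpu` on a machine with a discrete GPU, the usual culprit is a CPU-only PyTorch wheel rather than ComfyUI itself.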
How do I install ComfyUI?
Quickest path is the standalone portable build (Windows) or a git clone with a virtual environment. The git-clone route on a CUDA machine:
```bash
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
python main.py
```
On Apple Silicon, skip the CUDA index URL and let pip install the MPS-enabled PyTorch build:
```bash
pip install torch torchvision
python main.py --force-fp16
```
Then open http://127.0.0.1:8188 in a browser. Install ComfyUI-Manager early — it's the extension that lets you install missing nodes and models without touching the command line again. The pillar guide covers Manager setup in depth.
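If the browser tab stays blank, a quick script can tell you whether the server is actually listening. This is a rough sketch using only the standard library. It assumes the default address above and queries the `/system_stats` route that ComfyUI's built-in server exposes; the function name `server_is_up` is my own.

```python
import json
import urllib.request


def server_is_up(base_url="http://127.0.0.1:8188", timeout=2.0):
    """Return True if something at base_url answers /system_stats with JSON."""
    try:
        with urllib.request.urlopen(f"{base_url}/system_stats", timeout=timeout) as resp:
            json.loads(resp.read())  # raises ValueError if the body is not JSON
            return resp.status == 200
    except (OSError, ValueError):
        # OSError covers refused connections, timeouts, and HTTP errors;
        # ValueError covers malformed URLs and non-JSON responses.
        return False


if __name__ == "__main__":
    print("ComfyUI reachable:", server_is_up())
```

A `False` here while `python main.py` is still running usually means a different port or a firewall rule, so check the address printed in the terminal first.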
Where do I put the SDXL checkpoint?
Drop the checkpoint file (a .safetensors, usually 6-7GB for SDXL base) into ComfyUI/models/checkpoints/. Grab sd_xl_base_1.0.safetensors from Hugging Face, or any community SDXL fine-tune — Juggernaut, RealVisXL, and the like all load identically because they share SDXL's architecture.
```bash
# from the ComfyUI root
cd models/checkpoints
# download your SDXL base .safetensors here, e.g. with huggingface-cli or wget
```
Always prefer .safetensors over old .ckpt pickle files. Safetensors can't execute arbitrary code on load — same security logic as preferring vetted GGUF quants for LLMs. If you're new to open-weight file formats generally, the LLM-side explainer in what is GGUF carries over: the format is a container, the weights are the model, and the source matters.
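To see what ComfyUI will find in that folder, and to spot any legacy pickle files that slipped in, a small scan like the one below is enough. It is a sketch, not a ComfyUI utility: `scan_checkpoints` is a hypothetical helper, and treating `.pt` alongside `.ckpt` as a pickle format is my own assumption.

```python
from pathlib import Path


def scan_checkpoints(comfy_root):
    """Split files in models/checkpoints into safetensors and legacy pickle files."""
    folder = Path(comfy_root) / "models" / "checkpoints"
    safe, legacy = [], []
    for path in sorted(folder.iterdir()):
        if path.suffix == ".safetensors":
            safe.append(path.name)
        elif path.suffix in {".ckpt", ".pt"}:  # pickle-based formats
            legacy.append(path.name)
    return safe, legacy


if __name__ == "__main__":
    safe, legacy = scan_checkpoints(".")  # run from the ComfyUI root
    print("safetensors:", safe)
    print("legacy pickle files (replace these if you can):", legacy)
```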
How do I wire the default workflow node by node?
If you load ComfyUI fresh you get this graph by default — but build it once yourself so you understand it. Right-click the canvas, choose Add Node, and place these six:
- Load Checkpoint: pick your SDXL `.safetensors` from the dropdown. It outputs three things: MODEL, CLIP, and VAE.
- CLIP Text Encode (Prompt) x2: one for your positive prompt, one for negative. Connect the checkpoint's CLIP output into both. Type your description in the positive ("a tabby cat astronaut, cinematic lighting") and what to avoid in the negative ("blurry, lowres, extra limbs").
- Empty Latent Image: set width and height to 1024 and 1024 for SDXL, batch size 1 to start. This is your blank canvas in latent space.
- KSampler: the engine. Connect MODEL from the checkpoint, your positive CLIP encode into `positive`, your negative into `negative`, and the empty latent into `latent_image`.
- VAE Decode: connect the KSampler's LATENT output and the checkpoint's VAE output. This turns the denoised latent into actual pixels.
- Save Image: connect the VAE Decode's IMAGE output. Hit Queue Prompt.
The mental model: checkpoint splits into model + prompt-encoder + decoder; prompts become conditioning; KSampler denoises the empty latent guided by that conditioning; VAE decodes latent to image. Every other ComfyUI graph you ever build is this skeleton plus extra nodes spliced in.
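The same skeleton can be written down as data. The sketch below builds the graph in the JSON shape ComfyUI accepts on its `/prompt` endpoint, where each link is a `[source_node_id, output_index]` pair. The `class_type` and input names reflect my understanding of the built-in nodes, so treat them as assumptions and compare against an API-format export from your own install. The function names, node ids, and the `first_sdxl` filename prefix are made up for illustration.

```python
import json
import urllib.request


def build_sdxl_workflow(ckpt_name, positive, negative, seed=0):
    """Return the default text-to-image graph as an API-format dict."""
    return {
        # Load Checkpoint outputs: index 0 = MODEL, 1 = CLIP, 2 = VAE
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode",
              "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 25, "cfg": 7.0,
                         "sampler_name": "dpmpp_2m", "scheduler": "karras",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "first_sdxl"}},
    }


def queue_prompt(workflow, server="http://127.0.0.1:8188"):
    """POST the graph to a running ComfyUI server (needs the server to be up)."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(f"{server}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Reading the dict top to bottom mirrors the canvas left to right: node 1 feeds 2, 3, 5, and 6; nodes 2, 3, and 4 feed the KSampler; the KSampler feeds VAE Decode; VAE Decode feeds Save Image.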
What KSampler settings should a beginner use?
The KSampler is where the denoising happens, and five settings matter most. Term definitions, then sane SDXL starting values:
- Steps: how many denoising passes. More steps, more refinement, diminishing returns. Start at 25-30.
- CFG: how hard the model sticks to your prompt vs. inventing freely. Start at 6-8 for SDXL (lower than SD 1.5's typical 7-11).
- Sampler: the denoising algorithm. `dpmpp_2m` is a reliable default; `euler` is fine too.
- Scheduler: controls the noise schedule. `karras` pairs well with `dpmpp_2m`.
- Seed: the random starting noise. Fix it to reproduce an image; randomize to explore. Set `control_after_generate` to `randomize` while experimenting.
Decision shortcuts:
- If images look noisy or unfinished, raise steps toward 30-40.
- If the model ignores your prompt, raise CFG toward 9-10.
- If images look fried, over-saturated, or "deep-fried," lower CFG toward 5-6.
- If you want speed over polish, drop to 15-20 steps with `euler`.
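The starting values and shortcuts above fit in a few lines of code, which is handy if you end up scripting runs. This is only a sketch of the rules of thumb in this section: the symptom labels are invented, and the exact numbers are picked from inside the ranges given here rather than being magic values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SamplerSettings:
    """Beginner SDXL defaults taken from the starting values above."""
    steps: int = 25
    cfg: float = 7.0
    sampler_name: str = "dpmpp_2m"
    scheduler: str = "karras"


def adjust(settings, symptom):
    """Return a tweaked copy of settings for one observed symptom."""
    if symptom == "noisy":            # noisy or unfinished: more steps
        return replace(settings, steps=max(settings.steps, 35))
    if symptom == "ignores_prompt":   # prompt ignored: raise CFG
        return replace(settings, cfg=max(settings.cfg, 9.0))
    if symptom == "fried":            # over-saturated: lower CFG
        return replace(settings, cfg=min(settings.cfg, 5.5))
    if symptom == "too_slow":         # speed over polish: fewer steps, euler
        return replace(settings, steps=min(settings.steps, 18), sampler_name="euler")
    raise ValueError(f"unknown symptom: {symptom!r}")


if __name__ == "__main__":
    print(adjust(SamplerSettings(), "fried"))
```

Change one thing at a time and keep the seed fixed while you do, otherwise you can't tell whether the setting or the noise made the difference.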
Why is my output a PNG with the workflow inside it?
ComfyUI embeds the entire node graph as metadata inside every PNG it saves to ComfyUI/output/. Drag that PNG back onto the ComfyUI canvas and the full workflow — checkpoint, prompts, every setting — reloads exactly. That's the killer feature: your images are self-documenting. Share a PNG and you've shared a reproducible recipe. (Strip metadata before posting publicly if your prompts are private.)
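You can read that metadata outside ComfyUI too. The sketch below walks the PNG's chunks with the standard library and pulls out text entries. It assumes the graph sits in an uncompressed `tEXt` chunk under the key `workflow`, which matches the files I've inspected but is worth verifying on your own output. It skips CRC checks, and both function names are my own.

```python
import json
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(data):
    """Return {keyword: text} for every tEXt chunk in raw PNG bytes."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if chunk_type == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            chunks[keyword.decode("latin-1")] = text.decode("latin-1")
        elif chunk_type == b"IEND":
            break
    return chunks


def extract_workflow(data):
    """Return the embedded workflow as a dict, or None if there isn't one."""
    raw = read_png_text_chunks(data).get("workflow")
    return json.loads(raw) if raw is not None else None
```

Usage is `extract_workflow(Path(some_png).read_bytes())` on any file from the output folder. The same chunk walk is also a quick way to confirm that a metadata-stripping tool really removed your prompts before you post an image.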
ComfyUI vs. one-click image tools — which should I use?
| | ComfyUI | A1111 / Forge / Fooocus |
|---|---|---|
| Interface | Node graph | Form with tabs |
| Learning curve | Steeper at first | Gentle |
| Flexibility | Total — splice any pipeline | Limited to built-in features |
| Reproducibility | Workflow embedded in PNG | Params in text |
| Best for | Tinkerers, complex pipelines | "I just want an image now" |
If you want to understand the machine, ComfyUI. If you want a fast on-ramp before graduating, start with a form-based UI — my broader ComfyUI local Stable Diffusion guide covers when to switch. The same "see-the-internals vs. one-click" tradeoff shows up on the text side in LM Studio vs. Ollama vs. llama.cpp — ComfyUI is the llama.cpp of image generation: maximum control, minimum hand-holding.
Bottom line
Six nodes, one connection at a time: Load Checkpoint to two CLIP Text Encodes, into a KSampler alongside an Empty Latent, out through VAE Decode to Save Image. Build that graph by hand once and ComfyUI stops being scary — you'll read any workflow on the internet at a glance. Start with SDXL base at 1024x1024, 25-30 steps, CFG 6-8, dpmpp_2m/karras, and a randomized seed, then tweak. Everything runs on your own GPU, nothing touches the cloud, and every PNG you export carries its own recipe. When you're ready for ControlNet, LoRAs, and upscaling, come back to the pillar guide — but you've already learned the part that actually matters.
Frequently asked questions
See /blog/comfyui-local-stable-diffusion-guide for the full cornerstone guide.
