Is this page updated when runners change?

Yes. Cornerstone posts like this one get a fresh updated date whenever Ollama, LM Studio, or llama.cpp ship breaking changes; see the refresh log in Content Ideas.

Do I need a GPU?

A GPU helps for 7B+ models at interactive speed. CPU-only inference is supported for privacy experiments with smaller quants.

ComfyUI Local Stable Diffusion Guide | WikiWayne

ComfyUI Local Stable Diffusion Guide

ComfyUI is a free, open-source node-graph interface for running Stable Diffusion and SDXL image generation entirely on your own machine: no cloud, no subscription, no images leaving your drive. To run it well you need three things. First, a GPU with at least 8 GB of VRAM for comfortable SDXL work. Second, the right model checkpoint. Third, a handful of node connections you can wire up in under ten minutes. This is the cornerstone guide for the WikiWayne ComfyUI cluster; I link out to the deep-dive posts as we go.

What is ComfyUI and why run Stable Diffusion locally?

ComfyUI is a node-based front end for Stable Diffusion models where each step — load checkpoint, encode prompt, sample, decode, save — is a box you wire together into a graph. Stable Diffusion (and its bigger sibling SDXL) is an open-weight text-to-image diffusion model you download once and run offline forever.

The reason I run it locally instead of paying a hosted generator comes down to three things: your prompts and outputs stay on your disk, there are no per-image fees once the hardware is paid off, and you get total control over the pipeline — custom samplers, LoRAs, ControlNet, upscalers, all the knobs the SaaS tools hide. The tradeoff is you own the setup and the VRAM math. That's what this guide is for.

If you're brand new to the node graph itself, start with my first ComfyUI SDXL workflow walkthrough — it builds the default text-to-image graph node by node. This page is the map; that one is the hands-on lab.

How much VRAM do I need for ComfyUI?

VRAM is the dedicated memory on your GPU. For image generation, it's the single number that decides whether a workflow runs or crashes with an out-of-memory error. Image models are heavier on VRAM than text LLMs of similar file size. The weights are only part of the load: the intermediate activations during sampling, and especially the VAE decode back to full-resolution pixels, pile a lot of working memory on top of them.

Here are realistic ballpark tiers. Verify on your own stack — actual usage swings with resolution, batch size, and whether you're stacking ControlNet or upscalers.

VRAM	What runs comfortably	Notes
4 GB	SD 1.5 at 512×512	Tight. Use `--lowvram`, expect slow sampling
6 GB	SD 1.5 comfortably, SDXL with offload	SDXL works but leans on system RAM swapping
8 GB	SDXL at 1024×1024	The practical floor for happy SDXL
12 GB	SDXL + LoRA + ControlNet	Room for a real pipeline
16 GB+	SDXL + upscale + batch, or FLUX-class models	Headroom for the heavy stuff
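It helps to count what the sampler actually works on, because that shows why resolution and batch size move these numbers. The sketch below assumes the 4-channel latent and 8× VAE downsampling used by SD 1.5 and SDXL. FLUX-class VAEs use more channels. The function names are my own.

```bash
# Element counts for one generation, assuming a 4-channel latent and 8x VAE
# downsampling (SD 1.5 / SDXL; FLUX-class VAEs differ).

latent_elements() {  # usage: latent_elements WIDTH HEIGHT [BATCH]
  local w=$1 h=$2 batch=${3:-1}
  echo $(( batch * 4 * (h / 8) * (w / 8) ))
}

pixel_elements() {  # usage: pixel_elements WIDTH HEIGHT [BATCH]; RGB output of VAE Decode
  local w=$1 h=$2 batch=${3:-1}
  echo $(( batch * 3 * h * w ))
}

for res in 512 1024; do
  echo "${res}x${res}: latent $(latent_elements "$res" "$res"), pixels $(pixel_elements "$res" "$res")"
done
```

Going from 512 to 1024 quadruples both counts, and batch size multiplies them linearly. The tensors themselves are small: 65,536 latent values at 1024×1024. The VRAM goes to the intermediate activations inside the UNet and the VAE decoder, which grow at least in step with these counts.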

On Apple Silicon there's no separate VRAM — unified memory is shared, so a 16 GB M-series machine behaves roughly like a 12 GB discrete GPU for these workloads, sometimes better because there's no PCIe transfer cost. The same "verify before you scale" rule from the LLM side applies here; my VRAM requirements guide covers the underlying math if you want the full picture.

How do I install ComfyUI?

There are three main ways to install ComfyUI. Pick based on your platform and how much you like the terminal.

The desktop/portable route is the easiest. Download the official desktop app (Windows and Mac) or, on Windows, the portable archive, extract it, and run the launcher. Both take care of Python and the right Torch build for you.

The manual route suits Linux users and anyone who wants full control. Clone the repo and install with pip:

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python -m venv venv
source venv/bin/activate
# NVIDIA (CUDA):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
python main.py

On Apple Silicon, swap the CUDA Torch line for the standard build, which includes MPS support. The remaining steps are the same:

pip install torch torchvision
pip install -r requirements.txt
python main.py

On AMD under Linux you'll want the ROCm Torch wheel instead of CUDA — the rest is identical. If you're sorting out which GPU camp to be in before buying, I compare them in NVIDIA vs AMD for local LLMs; the same driver-maturity story applies to image generation.
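Whichever wheel you install, confirm that the Torch build inside your venv actually sees the GPU before blaming ComfyUI for slow sampling. The sketch below assumes the venv is active, so `python` is the right interpreter. The function name is hypothetical; the `torch.cuda` and `torch.backends.mps` calls are standard PyTorch.

```bash
check_torch_backend() {
  python - <<'EOF'
try:
    import torch
except ImportError:
    print("torch is not installed in this environment")
else:
    if torch.cuda.is_available():
        # ROCm builds of PyTorch report AMD GPUs through the torch.cuda API too.
        print("GPU:", torch.cuda.get_device_name(0))
    elif torch.backends.mps.is_available():
        print("Apple Silicon GPU via MPS")
    else:
        print("CPU only: check which Torch wheel you installed")
EOF
}
```

Run `check_torch_backend` from the ComfyUI folder. A "CPU only" result on a machine with a GPU usually means the wrong wheel went in.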

Once it's running, open http://127.0.0.1:8188 in your browser. The default graph loads automatically.

Where do I get models, and what format are they in?

ComfyUI loads model checkpoints from ComfyUI/models/checkpoints/. The files come in a few formats, and the format matters for safety and size.

.safetensors — the standard. A safe, fast tensor container that can't execute arbitrary code on load. Always prefer this.
.ckpt — older pickle format that can run code on load. Avoid unless you trust the source completely.
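GGUF — yes, the same quantized format used for LLMs is now showing up for image models like FLUX, letting you shrink a model to fit smaller cards. Stock ComfyUI doesn't open these through Load Checkpoint; they typically load through a community custom node such as ComfyUI-GGUF. If you've never met GGUF, here's what GGUF is on the LLM side; the concept carries straight over.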
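File extensions can lie. If a download's format matters to you (and given the pickle risk, it should), you can peek at its first bytes without loading anything. This is a heuristic sketch, not a validator, and the function name is my own. It relies on three signatures:

- GGUF files start with the ASCII bytes `GGUF`.
- Modern `torch.save` checkpoints are zip archives starting with `PK`.
- A safetensors file opens with an 8-byte header length followed by a JSON header, so its ninth byte is normally `{`.

```bash
# Guess a model file's container from its leading bytes (heuristic only).
sniff_model_format() {  # usage: sniff_model_format FILE
  local magic ninth
  magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
  ninth=$(od -An -tx1 -j8 -N1 "$1" 2>/dev/null | tr -d ' \n')
  if [ "$magic" = "47475546" ]; then          # ASCII "GGUF"
    echo "gguf"
  elif [ "$magic" = "504b0304" ]; then        # "PK\x03\x04": zip-wrapped pickle (.ckpt/.pt)
    echo "pickle (zip)"
  elif [ "$ninth" = "7b" ]; then              # '{' right after the 8-byte header length
    echo "safetensors"
  elif [ "${magic#80}" != "$magic" ]; then    # pickle PROTO opcode 0x80
    echo "pickle (legacy)"
  else
    echo "unknown"
  fi
}
```

Treat any "pickle" result with the same suspicion as a `.ckpt`, whatever the file is named.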

Quantization is the same idea too: a lower-precision build (think Q4 vs Q8) trades a little image fidelity for a big VRAM saving. My rule for image models mirrors the text rule — start with a smaller quant, confirm it fits and runs, then scale up. The Q4 vs Q8 quality tradeoff post explains where the quality actually starts to slip.
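The VRAM saving is plain arithmetic on the weights: parameters × bits per weight ÷ 8. This rough sketch rests on a few assumptions:

- A model of about 12 billion parameters (roughly FLUX.1-dev size).
- The nominal GGUF rates of 8.5 bits per weight for Q8_0 and 4.5 for Q4_0, which include each block's scale factor.

Real GGUF image files keep some layers at higher precision, and the text encoders, VAE, and activations need room on top. Treat the output as a floor.

```bash
# Weight-only footprint in GiB: params * bits_per_weight / 8 bytes.
weights_gib() {  # usage: weights_gib PARAMS BITS_PER_WEIGHT
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 / 1024^3 }'
}

params=12000000000  # approximate parameter count for a FLUX-class model
for bpw in 16 8.5 4.5; do  # fp16, GGUF Q8_0, GGUF Q4_0
  echo "${bpw} bits/weight: $(weights_gib "$params" "$bpw") GiB"
done
```

Halving the bits roughly halves the weight footprint. That is the saving the Q4 vs Q8 post weighs against image quality.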

Drop your checkpoint in the folder, hit the refresh button in the ComfyUI menu, and pick it in the Load Checkpoint node.

What's the minimum workflow to generate an image?

The default text-to-image graph that ComfyUI ships with is already the minimum: seven nodes, wired like this:

Load Checkpoint → loads your model, outputs MODEL, CLIP, VAE
CLIP Text Encode (positive) → your prompt
CLIP Text Encode (negative) → what to avoid
Empty Latent Image → sets resolution and batch size
KSampler → the actual diffusion; this is where steps, CFG, sampler, and seed live
VAE Decode → turns the latent into pixels
Save Image → writes the PNG (with the full workflow embedded in its metadata)

Hit Queue Prompt and watch it run. For SDXL, set Empty Latent Image to 1024×1024; for SD 1.5, use 512×512. Running SD 1.5 at SDXL resolution gives you mutant outputs (duplicated subjects, extra limbs). Running SDXL well below its native resolution tends to produce soft, low-detail images. The full node-by-node build with screenshots is in the first-workflow post.
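The same seven nodes can also be queued from a script, which is handy once you want batches or automation. This sketch assumes ComfyUI's local HTTP endpoint at `/prompt` and the API-format workflow JSON. In that format, node IDs map to a `class_type` plus `inputs`, and links are written as `[node_id, output_index]`. The checkpoint filename and prompts are placeholders. Export your own graph in API format from the menu to confirm the exact field names on your version.

```bash
CKPT="sd_xl_base_1.0.safetensors"  # placeholder: use a file from models/checkpoints/

# Load Checkpoint outputs: 0 = MODEL, 1 = CLIP, 2 = VAE.
write_txt2img_workflow() {  # usage: write_txt2img_workflow OUTFILE
  cat > "$1" <<EOF
{
  "prompt": {
    "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": "$CKPT"}},
    "2": {"class_type": "CLIPTextEncode", "inputs": {"text": "a lighthouse at dusk, oil painting", "clip": ["1", 1]}},
    "3": {"class_type": "CLIPTextEncode", "inputs": {"text": "blurry, watermark", "clip": ["1", 1]}},
    "4": {"class_type": "EmptyLatentImage", "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "5": {"class_type": "KSampler", "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
          "latent_image": ["4", 0], "seed": 42, "steps": 25, "cfg": 7.0,
          "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
    "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
    "7": {"class_type": "SaveImage", "inputs": {"filename_prefix": "ComfyUI", "images": ["6", 0]}}
  }
}
EOF
}

# POST the workflow to a running ComfyUI server.
queue_workflow() {  # usage: queue_workflow FILE
  curl -s -X POST "http://127.0.0.1:8188/prompt" \
    -H "Content-Type: application/json" \
    --data-binary @"$1"
}
```

With ComfyUI running, `write_txt2img_workflow /tmp/wf.json && queue_workflow /tmp/wf.json` should return a small JSON body with a `prompt_id` if the server accepted the graph.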

Which model should I run? A quick decision list

If you have 4–6 GB VRAM → run SD 1.5 checkpoints. Mature, fast, huge LoRA ecosystem, forgiving on memory.
If you have 8–12 GB → SDXL is your sweet spot. Better anatomy, native 1024px, sharper detail.
If you have 16 GB+ and want the current frontier → try FLUX-class models, ideally a GGUF quant so it fits and leaves room for the VAE.
If you're on a Mac with unified memory → SDXL runs well on 16 GB+; expect slower sampling than a comparable NVIDIA card but no OOM drama.
If you only have an iGPU or CPU → it'll work but be patient; generations run in minutes, not seconds. Treat it as a privacy-first experiment, the same calculus as CPU-only local inference.
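If you script your setup, the list above fits in a few lines. This sketch is my own reading of it, with two assumptions the list doesn't spell out:

- In-between numbers (7 GB, 14 GB) round down to the lower tier.
- A Mac under 16 GB falls back to the discrete-GPU rules.

```bash
# Encode the decision list; VRAM in whole GB, optional second arg "mac".
pick_model() {  # usage: pick_model VRAM_GB [mac]
  local vram=$1 platform=${2:-gpu}
  if [ "$platform" = "mac" ] && [ "$vram" -ge 16 ]; then
    echo "SDXL"
  elif [ "$vram" -ge 16 ]; then
    echo "FLUX-class (GGUF quant)"
  elif [ "$vram" -ge 8 ]; then
    echo "SDXL"
  elif [ "$vram" -ge 4 ]; then
    echo "SD 1.5"
  else
    echo "SD 1.5 on CPU/iGPU (slow)"
  fi
}
```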

How does ComfyUI compare to other Stable Diffusion front ends?

ComfyUI isn't the only local option. Here's how the main open-source choices stack up.

Tool	Interface	Best for	Learning curve
ComfyUI	Node graph	Full pipeline control, reproducibility, automation	Moderate — but pays off
AUTOMATIC1111	Tabbed web UI	Beginners, quick one-offs, extension richness	Low
Forge	A1111-style, optimized	Same UI, better low-VRAM performance	Low
InvokeAI	Polished unified UI	Designers who want canvas + nodes	Low-moderate

ComfyUI wins when you care about exactly what each step does and want to save a workflow you can re-run or hand to someone else — the entire graph is baked into every PNG you export. If you just want to type a prompt and get a picture, A1111 or Forge is the gentler door. I run ComfyUI because the node graph is the same mental model I use for chaining anything local; once it clicks, you stop fighting hidden defaults.

How do I run out of memory less often?

If you hit a CUDA out-of-memory error, try ComfyUI's launch flags before buying a bigger card:

# Aggressive memory saving (slower, fits more)
python main.py --lowvram

# Extreme — for 4 GB cards
python main.py --novram

# Force CPU (last resort, very slow)
python main.py --cpu

Other quick wins: drop batch size to 1, generate at base resolution then upscale as a second pass, and close other GPU apps (the browser and your LLM runner are both quietly holding VRAM). Offloading layers to system RAM is the same trick I lean on for big language models — the principle is laid out in GPU offload layers explained.
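Before reaching for flags, check what's already sitting in VRAM. This small sketch is for NVIDIA cards only and uses `nvidia-smi`'s CSV query mode. The two wrapper function names are mine.

```bash
# Turn "index, name, used, total" CSV lines into a readable summary.
format_vram_csv() {
  awk -F', ' '{ printf "GPU %s (%s): %s of %s MiB in use\n", $1, $2, $3, $4 }'
}

# Query every NVIDIA GPU for used/total memory in MiB.
vram_report() {
  nvidia-smi --query-gpu=index,name,memory.used,memory.total \
    --format=csv,noheader,nounits | format_vram_csv
}
```

Run `vram_report` with ComfyUI stopped. If a few GB are already in use, something else is holding memory; plain `nvidia-smi` lists the processes.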

Can I run ComfyUI in Docker?

Yes — Docker is the clean way to keep ComfyUI's Python and CUDA stack isolated from the rest of your system, especially on a homelab box. You'll need the NVIDIA Container Toolkit installed on the host so the container can see the GPU, then pass --gpus all and mount your models directory as a volume so checkpoints persist outside the container. If you're already running an Ollama + Open WebUI stack on the same machine, the pattern is identical to my homelab Docker stack guide — one more service on the same GPU, with VRAM as the shared budget you have to watch.
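The `docker run` for that setup looks roughly like the sketch below. This guide names no image, so `your-comfyui-image` and the `/app/ComfyUI` path inside the container are placeholders for whatever image you build or trust. ComfyUI listens on 127.0.0.1 by default, so the container has to start it with `--listen 0.0.0.0` for the port mapping to reach it.

```bash
IMAGE="your-comfyui-image:latest"   # placeholder image name
MODELS_DIR="$HOME/comfyui/models"   # host folder that survives container rebuilds

run_comfyui_container() {
  # 127.0.0.1 on the host side keeps the UI off your LAN.
  docker run -d --name comfyui \
    --gpus all \
    -p 127.0.0.1:8188:8188 \
    -v "$MODELS_DIR:/app/ComfyUI/models" \
    "$IMAGE"
  # The image's start command should be along the lines of:
  #   python main.py --listen 0.0.0.0 --port 8188
}
```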

Bottom line

ComfyUI turns Stable Diffusion and SDXL into a local, offline, fully controllable image pipeline — and the only hard gate is VRAM. Plan on 8 GB for happy SDXL, install via the desktop app or a pip clone, grab .safetensors checkpoints, and start from the default seven-node graph before you bolt on LoRAs and upscalers. Start small, confirm it fits on your GPU, then scale the model up. Everything beyond the basics — your first full workflow, GGUF quants, memory tuning — lives in the cluster posts linked above.

Related Articles

Your First ComfyUI Workflow for Local SDXL

Best GPU for Local AI (2026)

KoboldCpp Local LLM Guide
