WikiWayne

As an Amazon Associate I earn from qualifying purchases. This site contains affiliate links.

ComfyUI Local Stable Diffusion Guide

Published: June 13, 2026

ComfyUI Local Stable Diffusion Guide is a cornerstone page for the WikiWayne local-AI cluster.

Key takeaways

  • Start with a small GGUF quant and verify VRAM on your own GPU before scaling model size.
  • Use linked cluster posts for install steps and runner-specific commands.
Wayne Lowry

10+ years in Digital Marketing & SEO

ComfyUI is a free, open-source node-graph interface for running Stable Diffusion and SDXL image generation entirely on your own machine — no cloud, no subscription, no images leaving your drive. To run it well you need a GPU with at least 8 GB of VRAM for comfortable SDXL work, the right model checkpoint, and a couple of node connections you can wire up in under ten minutes. This is the cornerstone guide for the WikiWayne ComfyUI cluster; I link out to the deep-dive posts as we go.

What is ComfyUI and why run Stable Diffusion locally?

ComfyUI is a node-based front end for Stable Diffusion models where each step — load checkpoint, encode prompt, sample, decode, save — is a box you wire together into a graph. Stable Diffusion (and its bigger sibling SDXL) is an open-weight text-to-image diffusion model you download once and run offline forever.

The reason I run it locally instead of paying a hosted generator comes down to three things: your prompts and outputs stay on your disk, there are no per-image fees once the hardware is paid off, and you get total control over the pipeline — custom samplers, LoRAs, ControlNet, upscalers, all the knobs the SaaS tools hide. The tradeoff is you own the setup and the VRAM math. That's what this guide is for.

If you're brand new to the node graph itself, start with my first ComfyUI SDXL workflow walkthrough — it builds the default text-to-image graph node by node. This page is the map; that one is the hands-on lab.

How much VRAM do I need for ComfyUI?

VRAM is the dedicated memory on your GPU, and for image generation it's the single number that decides whether a workflow runs or crashes with an out-of-memory error. Image models can need more VRAM than a text LLM of similar file size because, on top of the weights, the sampler's intermediate activations and the VAE decode step both grow with resolution and batch size.

Here are realistic ballpark tiers. Verify on your own stack — actual usage swings with resolution, batch size, and whether you're stacking ControlNet or upscalers.

| VRAM | What runs comfortably | Notes |
|---|---|---|
| 4 GB | SD 1.5 at 512×512 | Tight. Use --lowvram, expect slow sampling |
| 6 GB | SD 1.5 comfortably, SDXL with offload | SDXL works but leans on system RAM swapping |
| 8 GB | SDXL at 1024×1024 | The practical floor for happy SDXL |
| 12 GB | SDXL + LoRA + ControlNet | Room for a real pipeline |
| 16 GB+ | SDXL + upscale + batch, or FLUX-class models | Headroom for the heavy stuff |

On Apple Silicon there's no separate VRAM — unified memory is shared, so a 16 GB M-series machine behaves roughly like a 12 GB discrete GPU for these workloads, sometimes better because there's no PCIe transfer cost. The same "verify before you scale" rule from the LLM side applies here; my VRAM requirements guide covers the underlying math if you want the full picture.
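To see why resolution and batch size move the numbers in the tiers above, here is a small arithmetic sketch. It assumes the SD 1.5 / SDXL style VAE (4 latent channels, 8× spatial downsampling) and only counts tensor elements. It is not a VRAM estimator, since the model weights and intermediate activations take far more memory than these two tensors.

```python
# Sketch: tensor sizes before and after the VAE decode step.
# Assumption: SD 1.5 / SDXL style VAE with 4 latent channels and 8x downsampling.
LATENT_CHANNELS = 4
DOWNSAMPLE = 8
RGB_CHANNELS = 3


def latent_shape(width, height, batch_size=1):
    """Shape of the latent the sampler works on: (batch, 4, height/8, width/8)."""
    return (batch_size, LATENT_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE)


def decoded_shape(width, height, batch_size=1):
    """Shape of the RGB image tensor that comes out of VAE Decode."""
    return (batch_size, RGB_CHANNELS, height, width)


def element_count(shape):
    """Number of values in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


if __name__ == "__main__":
    for width, height, batch in [(512, 512, 1), (1024, 1024, 1), (1024, 1024, 4)]:
        latent = element_count(latent_shape(width, height, batch))
        decoded = element_count(decoded_shape(width, height, batch))
        print(f"{width}x{height} batch {batch}: "
              f"latent {latent:,} values, decoded {decoded:,} values")
```

Doubling each side from 512 to 1024 quadruples both counts, and batch size multiplies them again, which is why dropping to batch 1 at base resolution is the first thing to try on a tight card.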

How do I install ComfyUI?

ComfyUI installs three main ways. Pick based on your platform and how much you like the terminal.

The portable/desktop route is easiest on Windows and Mac — download the official desktop app or the portable ZIP, unzip, and run the launcher. It bundles Python and the right Torch build.

For the manual route on Linux or anyone who wants control, clone and install with pip:

git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
python -m venv venv
source venv/bin/activate
# NVIDIA (CUDA):
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu124
pip install -r requirements.txt
python main.py

On Apple Silicon, swap the Torch line for the standard build with MPS support:

pip install torch torchvision
python main.py

On AMD under Linux you'll want the ROCm Torch wheel instead of CUDA — the rest is identical. If you're sorting out which GPU camp to be in before buying, I compare them in NVIDIA vs AMD for local LLMs; the same driver-maturity story applies to image generation.

Once it's running, open http://127.0.0.1:8188 in your browser. The default graph loads automatically.
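If you'd rather confirm the server is up from a script than from the browser, a minimal sketch like this can help. It assumes the server exposes a /system_stats route that returns a "devices" list with "name", "vram_total", and "vram_free" (in bytes); treat those field names as assumptions and check the JSON your own build returns.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8188"  # default address from this guide


def fetch_system_stats(base_url=BASE_URL, timeout=2.0):
    """GET /system_stats from a running ComfyUI server and return the parsed JSON."""
    with urllib.request.urlopen(f"{base_url}/system_stats", timeout=timeout) as resp:
        return json.load(resp)


def summarize_devices(stats):
    """One line per device: name plus free/total VRAM in GiB.

    Assumes each entry in stats["devices"] carries "name", "vram_total" and
    "vram_free" in bytes; missing fields are reported as 0.
    """
    lines = []
    for dev in stats.get("devices", []):
        total = dev.get("vram_total", 0) / 2**30
        free = dev.get("vram_free", 0) / 2**30
        lines.append(f"{dev.get('name', 'unknown')}: {free:.1f} / {total:.1f} GiB free")
    return lines


if __name__ == "__main__":
    try:
        for line in summarize_devices(fetch_system_stats()):
            print(line)
    except (OSError, ValueError) as exc:  # connection refused, timeout, or bad JSON
        print(f"ComfyUI not reachable at {BASE_URL}: {exc}")
```

The parsing is split from the HTTP call so you can test it against a saved response without a GPU box running.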

Where do I get models, and what format are they in?

ComfyUI loads model checkpoints from ComfyUI/models/checkpoints/. The files come in a few formats, and the format matters for safety and size.

  • .safetensors — the standard. A safe, fast tensor container that can't execute arbitrary code on load. Always prefer this.
  • .ckpt — older pickle format that can run code on load. Avoid unless you trust the source completely.
  • GGUF — yes, the same quantized format used for LLMs is now showing up for image models like FLUX, letting you shrink a model to fit smaller cards. If you've never met GGUF, here's what GGUF is on the LLM side; the concept carries straight over.

Quantization is the same idea too: a lower-precision build (think Q4 vs Q8) trades a little image fidelity for a big VRAM saving. My rule for image models mirrors the text rule — start with a smaller quant, confirm it fits and runs, then scale up. The Q4 vs Q8 quality tradeoff post explains where the quality actually starts to slip.

Drop your checkpoint in the folder, hit the refresh button in the ComfyUI menu, and pick it in the Load Checkpoint node.
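One reason .safetensors is the safer container: the file is an 8-byte little-endian length, a JSON header of that length, then raw tensor bytes, so inspecting it never unpickles anything. The sketch below reads just the header with the standard library. The command-line path is whatever checkpoint you point it at, and the summary fields are my own choice for illustration.

```python
import json
import struct
import sys


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading any tensors."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # 8-byte little-endian length
        return json.loads(f.read(header_len))


def summarize(header):
    """Count tensors and list the dtypes found, skipping the optional metadata entry."""
    tensors = {name: info for name, info in header.items() if name != "__metadata__"}
    dtypes = sorted({info["dtype"] for info in tensors.values()})
    return {"tensor_count": len(tensors), "dtypes": dtypes}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python inspect_header.py path/to/model.safetensors
    print(summarize(read_safetensors_header(sys.argv[1])))
```

A checkpoint whose header lists F16 tensors is a half-precision build, which is a quick way to confirm what you downloaded before loading it.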

What's the minimum workflow to generate an image?

The default text-to-image graph ComfyUI ships with is the whole minimum. Seven nodes, with Load Checkpoint feeding the two prompt encoders, the sampler, and the VAE decode:

  1. Load Checkpoint → loads your model, outputs MODEL, CLIP, VAE
  2. CLIP Text Encode (positive) → your prompt
  3. CLIP Text Encode (negative) → what to avoid
  4. Empty Latent Image → sets resolution and batch size
  5. KSampler → the actual diffusion; this is where steps, CFG, sampler, and seed live
  6. VAE Decode → turns the latent into pixels
  7. Save Image → writes the PNG (with the full workflow embedded in its metadata)

Hit Queue Prompt and watch it run. For SDXL set the Empty Latent to 1024×1024; for SD 1.5 use 512×512. Running SD 1.5 at SDXL resolution tends to give duplicated or mutant subjects, and running SDXL at 512×512 tends to give noticeably worse images because it was trained around 1024px. The full node-by-node build with screenshots is in the first-workflow post.
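The same seven-node graph can be written as the JSON that ComfyUI's HTTP API accepts, which is handy once you want to script generations. This is a sketch under assumptions: the node class names and input names match the stock nodes as I understand them, the checkpoint filename is only an example, and the sampler settings are illustrative rather than recommended.

```python
import json
import urllib.request


def build_txt2img_graph(ckpt_name, positive, negative, width=1024, height=1024, seed=0):
    """The seven-node default graph as an API-format dict.

    A link is written as [source_node_id, output_index]; Load Checkpoint
    exposes MODEL, CLIP, VAE as outputs 0, 1, 2.
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple", "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode", "inputs": {"text": positive, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode", "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0], "negative": ["3", 0],
                         "latent_image": ["4", 0], "seed": seed, "steps": 20, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0}},
        "6": {"class_type": "VAEDecode", "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
    }


def queue_prompt(graph, base_url="http://127.0.0.1:8188"):
    """POST the graph to a running server's /prompt route and return its JSON reply."""
    body = json.dumps({"prompt": graph}).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example checkpoint name; use whatever file sits in models/checkpoints/.
graph = build_txt2img_graph("sd_xl_base_1.0.safetensors",
                            "a lighthouse at dusk", "blurry, low quality")
print(json.dumps(graph["5"], indent=2))
```

Call queue_prompt(graph) only with the server running. Building the dict is separate so you can inspect or diff the wiring first.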

Which model should I run? A quick decision list

  • If you have 4–6 GB VRAM → run SD 1.5 checkpoints. Mature, fast, huge LoRA ecosystem, forgiving on memory.
  • If you have 8–12 GB → SDXL is your sweet spot. Better anatomy, native 1024px, sharper detail.
  • If you have 16 GB+ and want the current frontier → try FLUX-class models, ideally a GGUF quant so it fits and leaves room for the VAE.
  • If you're on a Mac with unified memory → SDXL runs well on 16 GB+; expect slower sampling than a comparable NVIDIA card but no OOM drama.
  • If you only have an iGPU or CPU → it'll work but be patient; generations run in minutes, not seconds. Treat it as a privacy-first experiment, the same calculus as CPU-only local inference.
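The list above boils down to a few thresholds. Here is a rough sketch of that decision as a function. The list leaves gaps between 6 and 8 GB and between 12 and 16 GB, so rounding those down to the lower tier is my assumption, and the function name is made up for illustration.

```python
def pick_model_family(vram_gb):
    """Map available VRAM (in GB) to the model family suggested in the decision list.

    Assumption: values that fall between the listed tiers round down to the
    safer, lower tier.
    """
    if vram_gb >= 16:
        return "FLUX-class (ideally a GGUF quant)"
    if vram_gb >= 8:
        return "SDXL"
    if vram_gb >= 4:
        return "SD 1.5"
    return "CPU or iGPU: expect minutes per image"


if __name__ == "__main__":
    for gb in (0, 4, 6, 8, 12, 16, 24):
        print(f"{gb:>2} GB -> {pick_model_family(gb)}")
```

On a Mac, pass the share of unified memory you expect to have free for the GPU rather than the machine's total.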

How does ComfyUI compare to other Stable Diffusion front ends?

ComfyUI isn't the only local option. Here's how the main open-source choices stack up.

| Tool | Interface | Best for | Learning curve |
|---|---|---|---|
| ComfyUI | Node graph | Full pipeline control, reproducibility, automation | Moderate, but pays off |
| AUTOMATIC1111 | Tabbed web UI | Beginners, quick one-offs, extension richness | Low |
| Forge | A1111-style, optimized | Same UI, better low-VRAM performance | Low |
| InvokeAI | Polished unified UI | Designers who want canvas + nodes | Low-moderate |

ComfyUI wins when you care about exactly what each step does and want to save a workflow you can re-run or hand to someone else — the entire graph is baked into every PNG you export. If you just want to type a prompt and get a picture, A1111 or Forge is the gentler door. I run ComfyUI because the node graph is the same mental model I use for chaining anything local; once it clicks, you stop fighting hidden defaults.
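Because the graph travels inside the PNG, you can pull it back out without opening ComfyUI at all. The sketch below walks the PNG chunks with the standard library and collects the text entries. It assumes the workflow is stored in uncompressed tEXt chunks under keys such as "workflow" and "prompt". Compressed or international text chunks are not handled, and CRCs are not verified.

```python
import struct
import sys

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text_chunks(path):
    """Collect tEXt chunks (keyword -> text) from a PNG using only the stdlib."""
    chunks = {}
    with open(path, "rb") as f:
        if f.read(8) != PNG_SIGNATURE:
            raise ValueError("not a PNG file")
        while True:
            head = f.read(8)  # 4-byte big-endian length + 4-byte chunk type
            if len(head) < 8:
                break
            length, ctype = struct.unpack(">I4s", head)
            data = f.read(length)
            f.read(4)  # skip the CRC; this sketch does not verify it
            if ctype == b"tEXt":
                keyword, _, text = data.partition(b"\x00")
                chunks[keyword.decode("latin-1")] = text.decode("latin-1")
            elif ctype == b"IEND":
                break
    return chunks


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python png_workflow.py path/to/output.png
    for key, value in read_png_text_chunks(sys.argv[1]).items():
        print(f"{key}: {len(value)} characters of embedded text")
```

If a "workflow" key shows up, its value is JSON you can parse with json.loads to see the saved graph.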

How do I run out of memory less often?

If you hit CUDA out of memory, ComfyUI gives you launch flags before you have to buy a bigger card:

# Aggressive memory saving (slower, fits more)
python main.py --lowvram

# Extreme — for 4 GB cards
python main.py --novram

# Force CPU (last resort, very slow)
python main.py --cpu

Other quick wins: drop batch size to 1, generate at base resolution then upscale as a second pass, and close other GPU apps (the browser and your LLM runner are both quietly holding VRAM). Offloading layers to system RAM is the same trick I lean on for big language models — the principle is laid out in GPU offload layers explained.
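The three flags above form an escalation ladder: try the default first, then step down only when you hit an out-of-memory error. A tiny sketch of that ladder as data, using only the flags already shown. The step numbering and function name are mine, not anything ComfyUI defines.

```python
# Ordered from fastest to most memory-frugal; each step trades speed for fit.
MEMORY_LADDER = [
    [],             # default: let ComfyUI manage memory on its own
    ["--lowvram"],  # aggressive memory saving
    ["--novram"],   # extreme, for very small cards
    ["--cpu"],      # last resort, very slow
]


def launch_command(step):
    """Return the argv list for a given ladder step, clamped to the valid range."""
    step = max(0, min(step, len(MEMORY_LADDER) - 1))
    return ["python", "main.py", *MEMORY_LADDER[step]]


if __name__ == "__main__":
    for step in range(len(MEMORY_LADDER)):
        print(step, " ".join(launch_command(step)))
```

Pass the list to subprocess.run from inside the ComfyUI folder if you want a wrapper script that remembers which step last worked on your card.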

Can I run ComfyUI in Docker?

Yes — Docker is the clean way to keep ComfyUI's Python and CUDA stack isolated from the rest of your system, especially on a homelab box. You'll need the NVIDIA Container Toolkit installed on the host so the container can see the GPU, then pass --gpus all and mount your models directory as a volume so checkpoints persist outside the container. If you're already running an Ollama + Open WebUI stack on the same machine, the pattern is identical to my homelab Docker stack guide — one more service on the same GPU, with VRAM as the shared budget you have to watch.
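As a sketch of what that invocation looks like, the helper below assembles the docker run arguments described above. The image tag and the in-container models path are placeholders I made up, so substitute whatever your own image uses.

```python
import shlex
from pathlib import Path


def docker_run_args(models_dir,
                    image="comfyui-local:latest",            # hypothetical image tag
                    container_models="/opt/ComfyUI/models",  # hypothetical path in the image
                    port=8188):
    """Build the argv for running a ComfyUI container with GPU access.

    --gpus all needs the NVIDIA Container Toolkit on the host; the -v mount
    keeps checkpoints on the host so they survive container rebuilds.
    """
    host_models = Path(models_dir).expanduser().resolve()
    return [
        "docker", "run", "--rm",
        "--gpus", "all",
        "-p", f"{port}:{port}",
        "-v", f"{host_models}:{container_models}",
        image,
    ]


if __name__ == "__main__":
    # Prints the command rather than running it, so you can review it first.
    print(shlex.join(docker_run_args("~/ComfyUI/models")))
```

For the port mapping to work, ComfyUI inside the container generally has to listen on all interfaces rather than only 127.0.0.1 (its --listen flag), so check how your image starts main.py.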

Bottom line

ComfyUI turns Stable Diffusion and SDXL into a local, offline, fully controllable image pipeline — and the only hard gate is VRAM. Plan on 8 GB for happy SDXL, install via the desktop app or a pip clone, grab .safetensors checkpoints, and start from the default seven-node graph before you bolt on LoRAs and upscalers. Start small, confirm it fits on your GPU, then scale the model up. Everything beyond the basics — your first full workflow, GGUF quants, memory tuning — lives in the cluster posts linked above.

Frequently asked questions

Yes. Cornerstone posts bump updatedAt when Ollama, LM Studio, or llama.cpp ship breaking changes; see the refresh log in Content Ideas.

A GPU helps for 7B+ models at interactive speed. CPU-only inference is supported for privacy experiments with smaller quants.
